Discussion: Semantic Search instead of Full-Text-Search #1
Comments
Hey @alexferrari88, thanks for the kind words. Agreed, semantic search would be great. I've thought about it as well, but unlike sqlite-in-wasm, I'm unaware of a solution to semantic search in JS. Running a vector store remotely is an option, but one of my goals with this project was to have it be useful as a standalone product. What are your thoughts on how to implement semantic search?
Right after submitting the issue, I started looking for a wasm vector search since, as you say, it would be nicer to have this extension be self-contained. Unfortunately, the solutions are still few and far between. The best solutions I found so far: Of the two, voy seems quite nice, and there are also JS examples showing how to use it. Curious to know your thoughts about this.
Awesome, thanks for the links. After a quick look, I have some thoughts:
I initially created this extension with WebSQL, which works for extensions using Manifest V2. MV2 extensions are no longer allowed though, so while porting to MV3 I wanted to use OPFS and the official sqlite-wasm implementation. I was unable to get OPFS to work in the web extension service worker. It works in browser tabs and in normal web workers, but it would not work in the background service worker that replaced background scripts in MV3. At the time it seemed to be unintentional, i.e. a bug in the Chrome implementation, so perhaps it's now possible. I ended up using IndexedDB as the backing filesystem via the excellent wa-sqlite implementation. That's the current state of things: IndexedDB is in use because it happens to work in service workers.
Thank you for looking more into it. That's unfortunate. Ideally, one could proceed with an external (but local) vector store (e.g. Chromadb) and create a repository layer that would allow an easy swap for a wasm implementation in the future. I understand this is completely outside the scope of this extension. I might fork it and start working on it but can't promise anything 😎
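To make the repository-layer idea concrete, here is a minimal sketch of what such an abstraction might look like. Everything in it is hypothetical and not taken from the extension's code: the `VectorStore` interface, the `InMemoryVectorStore` class, and the `cosineSimilarity` helper are names invented for illustration. A backend talking to a local external store, or a wasm implementation later on, would implement the same interface.

```typescript
// Hypothetical types for illustration; not part of the extension.
interface PageEmbedding {
  url: string;
  vector: number[];
}

interface SearchHit {
  url: string;
  score: number; // cosine similarity, higher is more similar
}

// The repository layer: callers depend only on this interface,
// so the backing store can be swapped without touching them.
interface VectorStore {
  upsert(item: PageEmbedding): Promise<void>;
  search(query: number[], limit: number): Promise<SearchHit[]>;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Brute-force stand-in backend, useful as a placeholder or in tests.
class InMemoryVectorStore implements VectorStore {
  private items = new Map<string, number[]>();

  async upsert(item: PageEmbedding): Promise<void> {
    this.items.set(item.url, item.vector);
  }

  async search(query: number[], limit: number): Promise<SearchHit[]> {
    return Array.from(this.items.entries())
      .map(([url, vector]) => ({ url, score: cosineSimilarity(query, vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit);
  }
}
```

The interface is asynchronous on purpose: an external store would be reached over a local network call, while an in-process implementation can simply resolve immediately.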
Technically (and pedantically) speaking, OPFS should work in any context, including service workers. The restriction is that the OPFS synchronous access handles, which make OPFS file operations fast, are only available in dedicated workers. That is a deliberate choice, not a bug: the rationale is that blocking calls should not be used anywhere else. For Chrome extensions, although their background context is implemented as a service worker, I think there is a workaround. An offscreen document can be attached to an extension, and this document can create a Worker where the entire OPFS API should be usable. Perhaps that path is worth exploring.
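A rough sketch of that workaround, untested and under stated assumptions: the file names `offscreen.html` and `opfs-worker.js` are hypothetical, `chrome` is declared loosely so the snippet stands alone, and the three functions would live in three different scripts (service worker, offscreen document, dedicated worker). The OPFS root is passed in as a parameter so the write logic can be exercised outside a browser.

```typescript
// Assumption: the extension API global, declared loosely for this sketch.
declare const chrome: any;

const OFFSCREEN_URL = "offscreen.html"; // hypothetical file name
const WORKER_URL = "opfs-worker.js"; // hypothetical file name

// Minimal shapes of the OPFS objects this sketch relies on.
interface SyncAccessHandleLike {
  getSize(): number;
  write(data: Uint8Array, options?: { at?: number }): number;
  flush(): void;
  close(): void;
}
interface FileHandleLike {
  createSyncAccessHandle(): Promise<SyncAccessHandleLike>;
}
interface DirectoryHandleLike {
  getFileHandle(name: string, options?: { create?: boolean }): Promise<FileHandleLike>;
}

// 1. Service worker: create the offscreen document if none exists yet.
async function ensureOffscreenDocument(): Promise<void> {
  const contexts = await chrome.runtime.getContexts({
    contextTypes: ["OFFSCREEN_DOCUMENT"],
  });
  if (contexts.length > 0) return;
  await chrome.offscreen.createDocument({
    url: OFFSCREEN_URL,
    reasons: ["WORKERS"],
    justification: "Host a dedicated worker that uses OPFS sync access handles",
  });
}

// 2. Offscreen document script: spawn the dedicated worker.
function startOpfsWorker(): Worker {
  return new Worker(WORKER_URL);
}

// 3. Dedicated worker: append bytes through a synchronous access handle.
//    In the worker, `root` would come from navigator.storage.getDirectory().
async function appendToOpfs(
  root: DirectoryHandleLike,
  fileName: string,
  bytes: Uint8Array,
): Promise<number> {
  const fileHandle = await root.getFileHandle(fileName, { create: true });
  const access = await fileHandle.createSyncAccessHandle();
  try {
    access.write(bytes, { at: access.getSize() }); // write at end of file
    access.flush();
    return access.getSize(); // new file size in bytes
  } finally {
    access.close(); // release the exclusive lock on the file
  }
}
```

Messages from the service worker would still have to be relayed through the offscreen document to the worker, which is the roundabout part of this approach.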
Thanks for chiming in @rhashimoto. Interesting, I had looked at the offscreen document API for DOM parsing, but if it allows access to a normal worker that might be an option. A bit roundabout, but vector search for browsing history may well be worth it.
There is a new, viable option: using pgvector via pglite (https://pglite.dev/extensions/#pgvector). I'm exploring this now.
Hi Ian, great project. It was on my to-do list of projects to build, but thank god you got this done :)
I was wondering: wouldn't it be better to use semantic search instead of full-text search?
At least this was my idea for creating a project similar to yours.
I'd be glad to give more details if my question is not clear.
(also interested in contributing, if you want to go in this direction)