Semantic Search with SQLite (neuml.github.io)
107 points by txtai on Nov 21, 2022 | 7 comments


It's not clear to me if txtai reaches out to the internet for all these queries. I assume it does and processes much of this in the cloud. That probably makes it a non-starter for much of my work. I do wonder, however, since their API docs talk a little about cloud options (making me wonder if non-cloud is the default). But, it's not immediately obvious to me.


It's all local except for downloading the transformers models for vectorization. For offline use, set TRANSFORMERS_OFFLINE=1 and download the models manually.
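
For anyone wanting to try the offline route, here is a minimal sketch. It assumes the Hugging Face environment variables `TRANSFORMERS_OFFLINE` and `HF_HUB_OFFLINE`; the `enable_offline_mode` helper and the local model path in the comments are made up for illustration.

```python
import os


def enable_offline_mode():
    """Ask the Hugging Face libraries not to make network requests."""
    # Set these before importing transformers (or txtai), so the
    # libraries see them when they load.
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    os.environ["HF_HUB_OFFLINE"] = "1"


enable_offline_mode()

# Only import the model code after the variables are set, for example:
# from txtai.embeddings import Embeddings
# embeddings = Embeddings({"path": "/path/to/local/model"})  # hypothetical path
```

The model files still need to be on disk already, either from an earlier online run that filled the cache or from a manual download.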


Interesting choice of database. DuckDB seems like a much better fit for the type of query being done. I wonder why SQLite was chosen over DuckDB.


DuckDB is on the roadmap. SQLite is the first implementation. The plan is to extend the interface to other database types, including DuckDB.
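
To picture what a pluggable database layer could look like, here is a rough sketch. It is not txtai's actual code: the `Database` and `SQLiteDatabase` classes, the `sections` table, and the `insert`/`fetch` methods are all hypothetical names chosen for illustration.

```python
import sqlite3
from abc import ABC, abstractmethod


class Database(ABC):
    """Hypothetical storage interface that other backends could implement."""

    @abstractmethod
    def insert(self, rows):
        """Store (id, text) pairs."""

    @abstractmethod
    def fetch(self, ids):
        """Return (id, text) pairs for the given ids, in the order requested."""


class SQLiteDatabase(Database):
    """SQLite-backed implementation of the hypothetical interface."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sections (id INTEGER PRIMARY KEY, text TEXT)"
        )

    def insert(self, rows):
        self.conn.executemany("INSERT INTO sections (id, text) VALUES (?, ?)", rows)
        self.conn.commit()

    def fetch(self, ids):
        ids = list(ids)
        if not ids:
            return []
        # One placeholder per id keeps the query parameterized.
        placeholders = ",".join("?" for _ in ids)
        cursor = self.conn.execute(
            f"SELECT id, text FROM sections WHERE id IN ({placeholders})", ids
        )
        found = dict(cursor.fetchall())
        # Preserve the caller's ordering (e.g. the ranking from a vector index).
        return [(i, found[i]) for i in ids if i in found]
```

A DuckDB backend would then be another subclass with the same two methods, and the vector index would only ever talk to `Database`.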


If you’re using Python, the Jina DocArray package supports very similar workflows out of the box. An SQLite backend is available, in addition to many others. I'm currently using the SentenceTransformers package with it as well, and you can have your own search engine running in a couple of hours.


Any examples of this being combined with full text search to get a hybrid index?


Sure thing. Examples can be found here: https://neuml.github.io/txtai/examples/

This one re-ranks the output from an Elasticsearch index - https://colab.research.google.com/github/neuml/txtai/blob/ma...

The next major release will have more examples using the local BM25 scoring module.
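
For a feel of what a hybrid score can look like, here is a small self-contained sketch. It is not txtai's scoring module: it uses a textbook Okapi BM25 formula, min-max normalization, and a weighted sum, and every function name here is made up for illustration. The semantic scores are placeholder numbers standing in for whatever a vector index would return.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do more."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    docs = [tokenize(d) for d in documents]
    if not docs:
        return []
    n = len(docs)
    # Fall back to 1.0 so all-empty documents do not divide by zero.
    avgdl = sum(len(d) for d in docs) / n or 1.0
    # Number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def normalize(scores):
    """Min-max scale to [0, 1]; all-equal (or empty) input maps to zeros."""
    if not scores:
        return []
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0] * len(scores)
    return [(s - low) / (high - low) for s in scores]


def hybrid_rank(keyword_scores, semantic_scores, weight=0.5):
    """Return document indices, best first, by weighted combined score.

    weight=1.0 is purely semantic, weight=0.0 is purely keyword.
    """
    kw, sem = normalize(keyword_scores), normalize(semantic_scores)
    combined = [weight * s + (1 - weight) * k for k, s in zip(kw, sem)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)


documents = [
    "sqlite full text search",
    "vector search with transformers",
    "cooking pasta at home",
]
keyword = bm25_scores("sqlite search", documents)
semantic = [0.2, 0.9, 0.1]  # placeholder similarities, not real model output
ranking = hybrid_rank(keyword, semantic, weight=0.5)
```

The weight is the knob to tune: closer to 0 favors exact term matches, closer to 1 favors the embedding similarity.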



