We have relationships with many providers, and I don't want to be seen as promoting or not promoting a specific one. Some decent privacy-preserving vendors: Brave, Exa, Parallel Web Systems, DuckDuckGo, etc.
We will continue to monitor what works best to improve output quality and results. Sometimes a combination of providers yields even better results. If I name one combination now, then find a better one and switch, I would have to broadcast the change each time or risk misrepresenting the feature, whose point is to provide amazing search and research capabilities that augment models for a superior output.
Exa: https://exa.ai/assets/Exa_Labs_Terms_of_Service.pdf "You may not [...] download, modify, copy, distribute, transmit, display, perform, reproduce, duplicate, publish, license, create derivative works from, or offer for sale any information contained on, or obtained from or through, the Services, except for temporary files that are automatically cached by your web browser for display purposes"
Many of the things I want to do with a search API are blocked by these rules! So I need to know which rules I am subject to.
IANAL, but if Ollama says "you can do with the results whatever you want", then they would be the ones liable for any breach of TOS.
That's admittedly a pretty foolish behaviour on their part and doesn't instill trust in Ollama as a service provider, but you as the end-user should be in the clear.
It's pretty wild that Brave's terms of service state as much, considering their search API is entirely derived from storing the results of other search systems. https://support.brave.app/hc/en-us/articles/4409406835469-Wh.... In other words, Brave is blocking exactly what it does to Bing and Google.
My nightmare scenario is that I build my own crucial database of information partially derived from a search API... and then later get into legal trouble which forces me to delete that data, which is now intermingled with other information I've collected.
Yes - it's important to me that I understand the source of the data I've collected and if that source results in restrictions on what I can do with that data.
Especially when I'm building databases that I want other organizations to be able to use.
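One way to keep that scenario contained is to record where every row came from at insert time, so data from a single provider can be found and deleted without touching the rest. A minimal sketch, assuming a SQLite store; the table name `facts`, the columns `source` and `terms_url`, the source labels, and all helper functions are hypothetical names made up for illustration, not part of any search provider's API:

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store in which every row carries the source it came from."""
    conn = sqlite3.connect(path)
    # `source` is a hypothetical label such as "search_api" or "own_research".
    # `terms_url` records where the applicable terms were read, if any.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS facts (
            id INTEGER PRIMARY KEY,
            content TEXT NOT NULL,
            source TEXT NOT NULL,
            terms_url TEXT
        )
        """
    )
    return conn


def add_fact(conn, content, source, terms_url=None):
    """Insert one row together with its provenance."""
    conn.execute(
        "INSERT INTO facts (content, source, terms_url) VALUES (?, ?, ?)",
        (content, source, terms_url),
    )
    conn.commit()


def purge_source(conn, source):
    """Delete every row that came from one source; return how many were removed."""
    cur = conn.execute("DELETE FROM facts WHERE source = ?", (source,))
    conn.commit()
    return cur.rowcount


def remaining(conn):
    """Return the content of all rows still in the store, in insertion order."""
    return [row[0] for row in conn.execute("SELECT content FROM facts ORDER BY id")]
```

This only helps if anything derived from a restricted row inherits that row's `source` label; once derived data is stored without it, the intermingling problem is back.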
Fun fact: many geocoding APIs have restrictions on what you can do with the data you get back from that geocoder - including how long you can store it and whether you are allowed to re-syndicate to other people. That's one of the reasons I like OpenCage: https://opencagedata.com/guides/how-to-compare-and-test-geoc...
This information is very useful to the open source community. What's the rationale for not "building in the public"? Is Ollama turning its back on the open source community? Also, why should we believe Ollama web search is better than my locally run searxng server?
Oh yes! That is why I want to provide the names of the providers we use. I do believe in building in the open. The web search functionality has a very generous free tier (it is behind Ollama's free account to prevent abuse) that lets you try it out and compare it with running a searxng server locally.
On running the search functionality locally: we considered it and gave it a try, but had trouble with result quality and with websites blocking Ollama for operating a crawler. Using a hosted API, we can get results to users much faster. I'd want us to revisit this at some point. I believe in having the power of local.