
What are other potential platforms?



This is a good long list of projects, although it is not narrowly scoped to tracing/evals/prompt management: https://github.com/tensorchord/Awesome-LLMOps?tab=readme-ov-...


One missing from the list below is Agenta (https://github.com/agenta-ai/agenta).

We're OSS and OTel-compliant, with a stronger focus on evals and on enabling collaboration between subject-matter experts and devs.


A bunch of them: Langsmith, Lunary, Arize Phoenix, Portkey, Datadog, and Helicone.

We also picked Langfuse - more details here: https://www.nonbios.ai/post/the-nonbios-llm-observability-pi...


Thanks, this post was insightful. I laughed at the reason you rejected Arize Phoenix; I had similar thoughts while going through their site! =)

> "Another notable feature of Langfuse is the use of a model as a judge ... this is not enabled in the free version/self-hosted version"

I think you can add LLM-as-judge to the self-hosted version of Langfuse by defining your own evaluation pipeline: https://langfuse.com/docs/scores/external-evaluation-pipelin...
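For reference, a minimal sketch of what such an external pipeline could look like, assuming the Langfuse Python SDK's fetch_traces/score methods and an OpenAI model as the judge; the judge prompt and score name are just placeholders:

    # Minimal sketch of an external LLM-as-judge pipeline for self-hosted
    # Langfuse. Assumes the Langfuse Python SDK and the OpenAI client; the
    # judge prompt and score name are hypothetical placeholders.
    from langfuse import Langfuse
    from openai import OpenAI

    langfuse = Langfuse()  # reads LANGFUSE_* env vars
    judge = OpenAI()

    for trace in langfuse.fetch_traces(limit=50).data:
        # Ask a judge model to grade the trace output on a 0-1 scale.
        verdict = judge.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": "Rate this answer from 0 to 1 for helpfulness, "
                           f"reply with the number only:\n{trace.output}",
            }],
        )
        score = float(verdict.choices[0].message.content.strip())

        # Write the score back so it shows up next to the trace in the UI.
        langfuse.score(trace_id=trace.id, name="llm-judge-helpfulness", value=score)

    langfuse.flush()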


Thanks for the pointer!

We are actually toying with building out a prompt evaluation platform and were considering extending Langfuse. Maybe we'll just use this instead.


Thanks for sharing your blog post. We had a similar journey. I installed and tried both Langfuse and Phoenix and ended up choosing Langfuse due to some version conflicts in the Python dependencies. I'm curious whether your thoughts have changed after v3? I also liked that it only depended on Postgres, but the scalable version requires other dependencies.

The thing I liked about Phoenix is that it uses OpenTelemetry. In the end we're building our Agents SDK so that the observability platform can be swapped (https://github.com/zetaalphavector/platform/tree/master/agen...) and the abstraction is OpenTelemetry-inspired.
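Not the actual SDK linked above, but a hypothetical sketch of that kind of OpenTelemetry-inspired abstraction: the agent code depends only on a small tracer interface, and each observability platform gets its own implementation.

    # Hypothetical sketch of an OTel-inspired tracing abstraction that lets
    # the observability backend be swapped; names are illustrative only.
    from abc import ABC, abstractmethod
    from contextlib import contextmanager

    class Tracer(ABC):
        @abstractmethod
        def span(self, name: str, **attributes):
            """Return a context manager that opens and closes a span."""

    class ConsoleTracer(Tracer):
        @contextmanager
        def span(self, name, **attributes):
            print(f"start {name} {attributes}")
            try:
                yield
            finally:
                print(f"end {name}")

    # Swapping platforms means passing a different Tracer, e.g. one that
    # forwards to the Langfuse SDK or to a real OpenTelemetry tracer.
    tracer: Tracer = ConsoleTracer()
    with tracer.span("agent.step", model="gpt-4o"):
        pass  # call the LLM here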


As you mentioned, this was a significant trade-off. We faced two choices:

(1) Stick with a single Docker container and Postgres. This option is simple to self-host, operate, and iterate on, but it performs poorly at scale, especially on the analytical queries that become crucial as a project grows. Additionally, as more features emerged, we needed a queue and benefited from caching and asynchronous processing, which required splitting into a second container and adding Redis (a minimal sketch of that split follows after this list). These features would have been blocked had we stayed with this setup.

(2) Switch to a scalable setup with a robust infrastructure that enables us to develop features that interest the majority of our community. We have chosen this path and prioritized templates and Helm charts to simplify self-hosting. Please let us know if you have any questions or feedback as we transition to v3. We aim to make this process as easy as possible.
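As an illustration of the split described in (1), and not Langfuse's actual code: the API container enqueues ingestion events into Redis and returns immediately, while a worker container drains the queue and writes asynchronously.

    # Illustrative sketch only: why a queue forces a second container.
    # Assumes redis-py; the queue name and event shape are made up.
    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def ingest(event: dict) -> None:
        # API container: enqueue and return immediately.
        r.lpush("ingestion-queue", json.dumps(event))

    def worker_loop() -> None:
        # Worker container: block until an event arrives, then process it.
        while True:
            _, raw = r.brpop("ingestion-queue")
            event = json.loads(raw)
            ...  # batch-insert into the analytical store here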

Regarding OTel, we are considering adding a collector to Langfuse now that the OTel semantic conventions are maturing. The needs of the Langfuse community evolved rapidly, and starting with our own instrumentation allowed us to move quickly while the semantic conventions were still immature. We are tracking this here and would greatly appreciate your feedback, upvotes, or any comments you have on this thread: https://github.com/orgs/langfuse/discussions/2509
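For anyone curious what OTel-native instrumentation looks like, here is a small sketch using the OpenTelemetry Python SDK; the gen_ai.* attribute names follow the draft GenAI semantic conventions, which are still in flux:

    # Sketch of OTel-native LLM instrumentation. The gen_ai.* attributes
    # follow the draft GenAI semantic conventions and may still change.
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry.sdk.trace.export import (
        ConsoleSpanExporter,
        SimpleSpanProcessor,
    )

    provider = TracerProvider()
    provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    tracer = trace.get_tracer("my-app")

    with tracer.start_as_current_span("chat gpt-4o") as span:
        span.set_attribute("gen_ai.system", "openai")
        span.set_attribute("gen_ai.request.model", "gpt-4o")
        # ... make the actual LLM call here ...
        span.set_attribute("gen_ai.usage.input_tokens", 42)  # example value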


So we are still on v2.7 - works pretty well for us. Haven't tried v3 yet, and we're not looking to upgrade. I think the next big feature set we are looking for is a prompt evaluation system.

But we are coming around to the view that it is a big enough problem to warrant a dedicated SaaS, rather than piggybacking on an observability SaaS. At NonBioS, we have very complex requirements - so we might just end up building it from the ground up.


"Langsmith appeared popular, but we had encountered challenges with Langchain from the same company, finding it overly complex for previous NonBioS tooling. We rewrote our systems to remove dependencies on Langchain and chose not to proceed with Langsmith as it seemed strongly coupled with Langchain."

I've never really used Langchain, but set up Langsmith with my own project quite quickly. It's very similar to setting up Langfuse: both are activated with a wrapper around the OpenAI library (a rough sketch of both setups is below). (Though I haven't looked into the metadata and tracing yet.)

Functionally the two seem very similar. I'm looking at both and am having a hard time figuring out the differences.
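For comparison, roughly how the two wrappers are set up, to the best of my understanding of both SDKs:

    # Rough sketch of both wrapper styles; check each SDK's docs for details.

    # Langsmith: wrap an existing OpenAI client so its calls are traced.
    from openai import OpenAI
    from langsmith.wrappers import wrap_openai
    ls_client = wrap_openai(OpenAI())

    # Langfuse: import a drop-in replacement for the OpenAI module.
    from langfuse.openai import openai
    lf_client = openai.OpenAI()

    # Either client is then used exactly like the vanilla OpenAI client:
    ls_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "hello"}],
    )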


We launched Laminar a couple of months ago: https://www.lmnr.ai. Extremely fast, great DX, and written in Rust. Definitely worth a look.


Congrats on the Launch!


apologies for hijacking your launch (congrats btw!)


thanks Marc :)


I'm a maintainer of Opik, an open source LLM evaluation and observability platform. We only launched a few months ago, but we're growing rapidly: https://github.com/comet-ml/opik



