More

mfdupuis · 2025-02-28T16:52:18 1740761538

Disclosure, I'm a founder in the data space[1]

Have you thought about how you would handle much larger datasets? Or is the idea that since this is a spreadsheet, the 10M cell limit is plenty sufficient?

I find WASM really interesting, but I can't wrap my head around how this scales in the enterprise. But I figure it probably just comes down to the use cases and personas you're targeting.

[1] https://www.fabi.ai/

revenga99 · 2025-03-01T05:18:58 1740806338

I am also very deeply invested in this question. It seems like the goto path for huge large data sets is text to sql (clickhouse, snowflake) etc. But all these juicy python data science libraries require code execution based on off the much small data payloads from the sql results. Feel free to reach out, what you are trying to achieve seems very similar to what I am trying to do in a completely different industry/usecase.

mfdupuis · 2025-01-03T01:17:46 1735867066

Fabi.ai | https://www.fabi.ai/| Senior front end engineer | Full-time | Hybrid SF or Remote (US)

We're looking for a senior front end engineer to join our mighty and growing team.

We're transforming the way data analysis is done in the enterprise and already have some amazing customers and are growing rapidly.

This person should have extensive React and Typescript experience be able to operate with minimal design supervision (we're a small team and we expect this person to have a sharp eye).

Full job description: https://www.linkedin.com/jobs/view/4093878394

mfdupuis · 2024-12-19T23:59:48 1734652788

This feels like a good opportunity for a startup. I've seen a lot of startups crop up around Snowflake cost management, I wonder what's in the AWS space.

jprd · 2024-12-20T01:15:48 1734657348

https://duckbillgroup.com/

mfdupuis · 2024-12-18T16:55:14 1734540914

Love DuckDB. Definitely a great place to start.

> A common pattern I’ve seen over the years have been folks in engineering leadership positions that are not super comfortable with extracting and interpreting data from stores

I think this extends beyond just engineering, and I wish more data teams made the raw data (or at least some clean, subset) more readily available for folks across organizations to explore. I've been part of orgs where I had access to read-only replicas, and I quickly got comfortable querying and analyzing data on my own, and I've been part of other orgs where everything had to go through the data team, and I had to be spoon-fed all the data and insights.

ryanwaldorf · 2024-12-18T17:02:41 1734541361

Totally agree. In my last job I was able to create my own ETL jobs as a PM to get data for my own analyses and figured out a fairly minor configuration change could save us $10M per year. It was from one of many random ETL jobs I created myself out of curiosity that, if I had been forced to rely on other people, I may not ever have created.

wjnc · 2024-12-18T17:37:25 1734543445

If you’d just had a business controller, you’d have x*$10M saved and have more time for your PM-role.

Yes, calling BS on leadership running their own SQL. Bring strategy and tactics, find good people, create clear roles and expectations and sure don’t get lost in running naive scripts you’ve written because you can do all roles better than the people actually occupying those roles.

mble_ · 2024-12-18T18:25:01 1734546301

Agreed, if you have the budget for it. There are often times where living off the land is necessary.

wjnc · 2024-12-19T11:49:20 1734608960

I know nothing about working in small firms. So that is probably very true. The smaller the firm, the more you do yourself. But ... if a company can save $ 10 mln. ... it can afford a set of financials.

mfdupuis · 2024-12-17T16:15:39 1734452139

This is actually one of the more interesting LLM observability platforms I've seen. Beyond addressing scaling issues, where do you see yourself going next?

marcklingen · 2024-12-17T16:28:47 1734452927

Positioning/roadmap differs between the different project in the space.

We summarized what we strongly believe in here: https://langfuse.com/why Tldr: open apis, self-hostable, LLM/cloud/model/framework-agnostic, API first, unopinionated building blocks for sophisticated teams, simple yet scalable instrumentation that is incrementally adoptable

Regarding roadmap, this is the near-term view: https://langfuse.com/roadmap

We work closely with the community, and the roadmap can change frequently based on feedback. GitHub Discussions is very active, so feel free to join the conversation if you want to suggest or contribute a feature: https://langfuse.com/ideas

mathiasn · 2024-12-17T16:23:40 1734452620

What are other potential platforms?

marcklingen · 2024-12-17T16:53:09 1734454389

This is a good long-list of projects, although it is not narrowly scoped to tracing/evals/prompt-management: https://github.com/tensorchord/Awesome-LLMOps?tab=readme-ov-...

resiros · 2024-12-17T21:50:35 1734472235

One missing in the list below is Agenta (https://github.com/agenta-ai/agenta).

We're oss, otel compliant with stronger focus on evals and the enabling collaboration between subject matter experts and devs.

suninsight · 2024-12-17T18:37:46 1734460666

Bunch of them : Langsmith, Lunary, Phoenix Arize, Portkey, Datadog and Helicone.

We also picked Langfuse - more details here: https://www.nonbios.ai/post/the-nonbios-llm-observability-pi...

unnikrishnan_r · 2024-12-17T19:08:33 1734462513

Thanks, this post was insightful. I laughed at the reason why you rejected Arize Phoenix, I had similar thoughts while going through their site!=)

> "Another notable feature of Langfuse is the use of a model as a judge ... this is not enabled in the free version/self-hosted version"

I think you can add LLM-as-judge to the self-hosted version of Langfuse by defining your own evaluation pipeline: https://langfuse.com/docs/scores/external-evaluation-pipelin...

suninsight · 2024-12-18T06:22:52 1734502972

Thanks for the pointer !

We are actually toying with building out a prompt evaluation platform and were considering extending langfuse. Maybe just use this instead.

barefeg · 2024-12-17T19:56:10 1734465370

Thanks for sharing your blogpost. We had a similar journey. I installed and tried both Langfuse and Phoenix and ended up choosing Langfuse due to some versioning conflicts on the python dependency. I’m curious if your thoughts change after V3? I also liked that it only depended on Postgres but the scalable version requires other dependencies.

The thing I liked about Phoenix is that it uses OpenTelemetry. In the end we’re building our Agents SDK in a way that the observability platform can be swapped (https://github.com/zetaalphavector/platform/tree/master/agen...) and the abstraction is OpenTelemetry-inspired.

marcklingen · 2024-12-17T21:49:42 1734472182

As you mentioned, this was a significant trade-off. We faced two choices:

(1) Stick with a single Docker container and Postgres. This option is simple to self-host, operate, and iterate on, but it suffers from poor performance at scale, especially for analytical queries that become crucial as the project grows. Additionally, as more features emerged, we needed a queue and benefited from caching and asynchronous processing, which required splitting into a second container and adding Redis. These features would have been blocked when going for this setup.

(2) Switch to a scalable setup with a robust infrastructure that enables us to develop features that interest the majority of our community. We have chosen this path and prioritized templates and Helm charts to simplify self-hosting. Please let us know if you have any questions or feedback as we transition to v3. We aim to make this process as easy as possible.

Regarding OTel, we are considering adding a collector to Langfuse as the OTel semantics are currently developing well. The needs of the Langfuse community are evolving rapidly, and starting with our own instrumentation has allowed us to move quickly while the semantic conventions were not developed. We are tracking this here and would greatly appreciate your feedback, upvotes, or any comments you have on this thread: https://github.com/orgs/langfuse/discussions/2509

suninsight · 2024-12-18T09:36:02 1734514562

So we are still on V2.7 - works pretty good for us. Havent tried V3 yet, and not looking to upgrade. I think the next big feature set we are looking for is a prompt evaluation system.

But we are coming around to the view that it is a big enough problem to have dedicated saas, rather than piggy back on observability saas. At NonBioS, we have very complex requirements - so we might just end up building it up from the ground up.

ianbicking · 2024-12-18T16:36:03 1734539763

"Langsmith appeared popular, but we had encountered challenges with Langchain from the same company, finding it overly complex for previous NonBioS tooling. We rewrote our systems to remove dependencies on Langchain and chose not to proceed with Langsmith as it seemed strongly coupled with Langchain."

I've never really used Langchain, but setup Langsmith with my own project quite quickly. It's very similar to setting up Langfuse, activated with a wrapper around the OpenAI library. (Though I haven't looked into the metadata and tracing yet.)

Functionally the two seem very similar. I'm looking at both and am having a hard time figuring out differences.

skull8888888 · 2024-12-17T21:39:43 1734471583

We launched Laminar couple of months ago, https://www.lmnr.ai. Extremely fast, great DX and written in Rust. Definitely worth a look.

marcklingen · 2024-12-17T21:52:45 1734472365

Congrats on the Launch!

skull8888888 · 2024-12-17T22:10:01 1734473401

apologies for hijacking your launch (congrats btw!)

skull8888888 · 2024-12-17T22:08:36 1734473316

thanks Marc :)

calebkaiser · 2024-12-17T17:54:49 1734458089

I'm a maintainer of Opik, an open source LLM evaluation and observability platform. We only launched a few months ago, but we're growing rapidly: https://github.com/comet-ml/opik

mfdupuis · 2024-12-12T23:36:37 1734046597

Apple vision sales surpass iPad

I_am_tiberius · 2024-12-13T22:46:14 1734129974

Only if they dramatically change to a glasses based design like Meta. Otherwise I'd replace "iPad" with "Google Explorer glasses".

mfdupuis · 2024-12-11T00:51:00 1733878260

I'm curious to see how this plays out when it comes to deploying and maintaining production-grade apps. I know relatively little about infrastructure and DevOps, but that's the stuff that actually always seems complicated when it goes from going to MVP to production. This question feels particularly important if we're expecting PMs and designers to be primary users.

That said, I'm super excited about this space and love seeing smart folks putting energy into this. Even if it's still a bit aspirational, I think the idea of cutting down time spent debugging and refactoring and putting more power in the hands of less technical folks is awesome.

mfdupuis · 2024-12-11T00:40:46 1733877646

Are you looking to validate a market idea? If so, are you thinking of more of a consumer use case? You mentioned Cursor, so it sound like you're maybe thinking more enterprise, but embedded ads are basically not a thing in the enterprise. Most solutions offer freemium mostly as a loss-leader, but this isn't AI specific IMO.

__01000010 · 2024-12-15T08:05:29 1734249929

You're right, embedded ads don’t work in enterprise, and freemium often serves as a loss-leader there. We're looking to validate the market, possibly for consumer use cases, while testing if freemium can drive early adoption or loyalty. Do you think it has potential in consumer AI, or is premium-only the better approach?

mfdupuis · 2024-12-11T00:36:54 1733877414

Congrats on the launch!

I'm building in this space[1] and I'm intrigued. When I checked out the repo, this actually looked like possibly a really convenient way to fine-tune models, but I'm trying to understand the piece about "products simply don’t have datasets, and datasets can’t keep up with product evolution". What does this mean in practice and how does this relate to fine-tuning?

[1] https://www.fabi.ai/

scosman · 2024-12-11T01:02:18 1733878938

Datasets tend to be really rough proxies of product goals. Initial spec is "feature smiling faces", so a "smiling/no-smiling" dataset is built. But over the next year you really realize people can be "smiling but ugly smiling", "neutral faced but pleasant" and a bunch more. There are bugs you need to fix (false positive/negatives), and lots of tweaks to the goals. Any design nuance is lost in the chain: explain product concept to data science team, who writes a spec for data collectors, who collect samples, DS makes a model, eng integrates, and then folks (finally) try it in product.

QA files one off bugs, but not in a way that impacts datasets/training. Someone needs to analyze them in bulk and make calls about which areas to care about (which is slow and expensive).

However, if the time to data is tiny, you can iterate more like software. New model drops often (with fast evals). Subjective feedback can become synth data quickly, the issue fixed, and results evaluated.

Your product looks a bit more like analysis pipelines for new problems? I'm more looking at zero-shot quality and performance.

mfdupuis · 2024-11-24T18:12:00 1732471920

Personally I find plotly hard to beat. Unless you're doing really fancy stuff, it gives you everything you need. Seaborn also a great option IMO.