Yeah, this is the next step. I first wanted to understand whether this would get any traction. I think I will provide a dockerized version of the server part that you can run with a simple command, and maybe some interface to create API keys and distribute them to your users.
Fair enough from a business standpoint, but seeing as there are massive privacy/security risks involved in exposing your data to an opaque service, the open source component is probably a non-optional aspect of the value prop.
"TypeScript is now the most used language on GitHub. In August 2025, TypeScript overtook both Python and JavaScript. Its rise illustrates how developers are shifting toward typed languages that make agent-assisted coding more reliable in production. It doesn’t hurt that nearly every major frontend framework now scaffolds with TypeScript by default. Even still, Python remains dominant for AI and data science workloads, while the JavaScript/TypeScript ecosystem still accounts for more overall activity than Python alone."
I am not sure I agree with the conclusion that "developers are shifting toward typed languages that make agent-assisted coding more reliable in production". I see it more as full-stack development being democratized.
I am originally a Python/BE/ML engineer, but in the last few years I've built a lot of frontend, simply because AI coding enables so much.
>I'll preface this by saying that neither of us has a lot of experience writing Python async code
> I'm actually really interested in spending proper time in becoming more knowledgeable with Python async, but in our context you a) lose precious time that you need to use to ship as an early-stage startup and b) can shoot yourself in the foot very easily in the process.
The best advice for a start-up is to use the tools that you know best. And sometimes that's not the best tool for the job. Let's say you need to build a CLI. It's very likely that Go is the best tool for the job, but if you're a great Python programmer, then just do it in Python.
Here's a clear case where the author was not very good with Python: they used Django instead of FastAPI, which would have been the right tool for the job, and then wrote a blog post about Python being bad when the complaint is really about Django. So yeah, they should have started with Node from day one.
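To make the foot-gun concrete, here's a minimal FastAPI sketch (my own illustration, not anything from their post): a blocking call inside an `async def` endpoint stalls the entire event loop, while a plain `def` endpoint gets pushed to the threadpool.

```python
# Minimal illustration of the classic asyncio foot-gun (illustrative endpoints only).
import asyncio
import time

from fastapi import FastAPI

app = FastAPI()

@app.get("/bad")
async def bad():
    # Blocking call inside an async endpoint: the single event loop stalls,
    # so every other in-flight request waits behind this sleep.
    time.sleep(2)
    return {"ok": True}

@app.get("/better")
async def better():
    # Non-blocking: the loop keeps serving other requests during the wait.
    await asyncio.sleep(2)
    return {"ok": True}

@app.get("/also_fine")
def also_fine():
    # Plain `def` endpoints run in the framework's threadpool, so blocking here
    # doesn't freeze the event loop (though threads are still a finite resource).
    time.sleep(2)
    return {"ok": True}
```

That "one innocent blocking call silently stalls everything" failure mode is exactly the foot-gun being described, and it's easy to hit if async Python isn't your home turf.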
The only issue with writing a CLI in Node is the ecosystem. The CLI libraries for Node are (or were, last time I checked) inspired by React. That's not a paradigm that is fun to write in, and if I'm making a CLI tool it's because I am bored and want to make something for my own entertainment.
Yeah, the last CLI app I built was actually a TUI. It routed stdout and stderr from the scripts and programs it called out to into separate windows. It had animations, effects, and a built-in help system. It also had theming support, because the library I used for the TUI happened to offer that and came with some default themes. It was a bit beyond a simple CLI tool.
If I'm faffing around with the console I'm going to have fun.
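For what it's worth, the core of that stdout/stderr routing is just subprocess plumbing, something roughly like this (a Python asyncio sketch for illustration, with prefixed prints standing in for the separate windows; not the library I actually used):

```python
# Run a child process and stream its stdout and stderr to separate sinks.
import asyncio

async def pump(stream: asyncio.StreamReader, sink) -> None:
    # Forward lines from one pipe into one "window".
    while line := await stream.readline():
        sink(line.decode().rstrip())

async def run(cmd: list[str]) -> int:
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    await asyncio.gather(
        pump(proc.stdout, lambda s: print(f"[out] {s}")),
        pump(proc.stderr, lambda s: print(f"[err] {s}")),
    )
    return await proc.wait()

if __name__ == "__main__":
    asyncio.run(run(["python", "-c", "import sys; print('hi'); print('oops', file=sys.stderr)"]))
```

The TUI library then just gives you the panes, themes, and effects on top of that.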
Gemini CLI used it, and I actually hate the layout. Never thought I'd see a CLI manage to waste terminal window space to the point of needing to zoom it out.
There is a major mistake in the article: the author argues that OpenInference is not OTel compatible. That is false.
>OpenInference was created specifically for AI applications. It has rich span types like LLM, tool, chain, embedding, agent, etc. You can easily query for "show me all the LLM calls" or "what were all the tool executions." But it's newer, has limited language support, and isn't as widely adopted.
> The tragic part? OpenInference claims to be "OpenTelemetry compatible," but as Pranav discovered, that compatibility is shallow. You can send OpenTelemetry format data to Phoenix, but it doesn't recognize the AI-specific semantics and just shows everything as "unknown" spans.
What is written above is false. OpenInference (or, for that matter, OpenLLMetry and the GenAI OTel conventions) is just a set of semantic conventions for OTel. Semantic conventions specify how a span's attributes should be named, nothing more and nothing less. If you are instrumenting an LLM call, you need to record the model used; the semantic convention tells you to save the model name under an attribute like `llm_model`. That's it.
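To make that concrete, instrumenting against a semconv is nothing more than setting attributes on an ordinary OTel span (a minimal Python sketch; the attribute keys shown are illustrative of the two conventions):

```python
# A semconv is just "which attribute key holds which fact" on a normal OTel span.
from opentelemetry import trace

tracer = trace.get_tracer("my-llm-app")

with tracer.start_as_current_span("chat_completion") as span:
    # OpenInference-style keys:
    span.set_attribute("openinference.span.kind", "LLM")
    span.set_attribute("llm.model_name", "gpt-4o")
    # The GenAI OTel semconv spells the same fact differently:
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # Either way this is a perfectly normal OTel span; only the naming differs.
```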
Saying OpenInference is not OTel compatible does not make any sense.
Saying Phoenix (the vendor) is not OTel compatible because it does not show random spans that don't follow its conventions is... well, unfair to say the least (saying this as a competitor in the space).
A vendor is OTel compliant if it has a backend that can ingest data in the OTel format. That's it.
Different vendors are compatible with different semconvs. Generalist observability platforms like Signoz don't care about the semantic conventions: they show all spans the same way, as a JSON blob of attributes. A retrieval span, an LLM call, and a DB transaction all look the same in Signoz; messages and tool calls aren't rendered any differently.
LLM observability vendors (like Phoenix, mentioned in the article, or Agenta, the one I maintain and am shamelessly plugging) care a lot about the semantic conventions. Their UIs are designed to show AI traces in the best possible way: LLM messages, tool calls, prompt templates, and retrieval results are all rendered in user-friendly views. As a result, the UI needs to understand where each attribute lives, so semantic conventions matter a lot to LLM observability vendors.

Now, the point the article is making is that Phoenix only understands the OpenInference semconvs. That's very different from saying that Phoenix is not OTel compatible.
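In practice, the vendor-specific part is little more than looking up the agreed-upon keys (a hedged sketch, not any vendor's actual code):

```python
# Why semconvs matter to an LLM-observability UI: the backend has to know which
# attribute key carries which fact. The keys follow the two conventions above;
# the lookup logic itself is purely illustrative.
MODEL_KEYS = ("llm.model_name", "gen_ai.request.model")  # OpenInference / GenAI semconv

def model_of(span_attributes: dict) -> str:
    for key in MODEL_KEYS:
        if key in span_attributes:
            return span_attributes[key]
    return "unknown"  # what you end up seeing when the convention isn't recognized
```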
Having been on the other side of the table: again, you did not flunk anything.
A hiring process is not an exam where doing well means you get the job.
Your "performance" plays a small role in whether you are accepted (maybe less than 30%). The rest is:
- The pipeline: that is, who your competitors are, whether someone is already late in the process, whether there is someone a manager has worked with or knows
- Your CV: obviously at the point of the interview, you can't change your history
- The position fit: basically, who they're looking for. They might have a profile in mind (say, someone extroverted to do lots of talks, or someone to do devrel for enterprise) that you simply don't fit.
- The biases: and there are a looot of these. For instance, some would open your blog and say it's unprofessional because of the UI. Not saying that is the case here; it's simply their bias.
So, my advice: you reached the HN front page twice in a couple of months. Most people, me included, never have. You clearly have something. Find work with people who see that.
A corporate lawyer at a mid-sized firm exemplified this dynamic. Her organization invested $50,000 in a specialized contract analysis tool, yet she consistently defaulted to ChatGPT for drafting work: "Our purchased AI tool provided rigid summaries with limited customization options. With ChatGPT, I can guide the conversation and iterate until I get exactly what I need. The fundamental quality difference is noticeable, ChatGPT consistently produces better outputs, even though our vendor claims to use the same underlying technology." This pattern suggests that a $20-per-month general-purpose tool often outperforms bespoke enterprise systems costing orders of magnitude more, at least in terms of immediate usability and user satisfaction. This paradox exemplifies why most organizations remain on the wrong side of the GenAI Divide.
That's a similar story for me at $DAYJOB. We have Copilot for our IDEs and it is so much worse than Claude Code or any other CLI integration option. We're restricted from adopting features as fast as they are turned on. I try to use it during the day and end up frustrated that agent mode is restricted and returns "I can't complete that for you" or something similar when asked for pretty reasonable actions.
I've been cranking out personal apps with Claude Code in contrast, and my brain is exploding with ideas for the day job. But this space moves so fast, and corporations move so slowly, that they're still on the cool tool from last year. That has left me demoing personal work to coworkers and hoping it starts to move the needle on getting better tooling. I understand the governance and privacy concerns at $DAYJOB, and as such every tool needs to be approved by a slow human process.
We also have OpenAI access, and I've found myself using that for research more than Copilot as well. Maybe we just picked the worst tool because of the MS vertical integration...
All I have is Copilot as well, but with that I can configure Aider to use the Copilot OpenAI endpoint, and through that access most of the good models with a capable CLI tool. It's a pair-programming experience more than an agentic one, but I need to stay close to the code anyway.
> AI would offer automation equaling $2.3 trillion in labor value affecting 39 million positions
But
>Current automation potential: 2.27% of U.S. labor value
Given that US GDP right now is 27 trillion, I'm not sure this is really mathing out in my head. We're going to potentially optimize 61 billion dollars of US labor value while displacing some 15% of the American labor force, and return 2.3 trillion in value? Who's purchasing all this (clearly not the workforce)? Meanwhile, investment in AI as of 2025 is already hitting half of that.
Granted, GDP is an odd indicator to measure on for this situation. But I'm unsure how else we measure "labor value" here.
I’m not sure how you got that 2.2% of 18.5 trillion in GDP attributed to labor is 61 billion, so I’d agree that math doesn’t seem accurate.
Additionally, you seem to have pulled the cherry-picked quote and compared it with the “current” impact, ignoring the immediately following text on latent automation exposure (partially extracted for the quote) that explains how it could have a greater impact that results in their 2.3t/39m estimate numbers. Seems odd to find those numbers in the report but not read the rest of the same section.
>I’m not sure how you got that 2.2% of 18.5 trillion in GDP attributed to labor is 61 billion
The number I googled for 2024 US GDP was 29.18 trillion, so that's part of it. I'm flexible enough to adjust that if wrong.
>Additionally, you seem to have pulled the cherry-picked quote and compared it with the “current” impact, ignoring the immediately following text on latent automation exposure
There's no time scale presented in that section that I can find for the "latent" exposure, so it's not very useful as presented. That's why I compared it to now.
Over 5 years? I'm not sure, but it could be realistic. Over 20 years, if US GDP doesn't absolutely tank, that's not necessarily as impressive a number as it sounds. You see my confusion here?
>that explains how it could have a greater impact that results in their 2.3t/39m estimate numbers.
Maybe I need to read more of the article, but I need a lot more numbers to be convinced of a 40x efficiency boost (predicted returns divided by current GDP value times their 2.2% labor value) for anything. Even the 20x number, if I use your GDP number, is a hefty claim.
>Or presented a better metric than my formula above on interpreting "impact". I'm open to a better model here than my napkin math.
I would consider reading the actual report more closely rather than an article of questionable accuracy. For example:
> “For instance, an employee can adjust based on new instructions, previous mistakes, and situational needs. A generative AI model cannot carry that memory across tasks unless retrained.”
This is factually false; that is exactly what memory, knowledge, and context can do with no retraining. Not having completely solved self-adjustment is not a barrier, merely a hurdle already under active research. Imagine if, like the human brain, an LLM were to apply training cases identified throughout the day while it “slept”; the author seems to think this would be a massive undertaking of “retraining”. And sorry, if you had worked with many of the same types of employees I have over the years, you'd already know that the suggestion that employees are more easily adaptable, remember across tasks, and are good at adjusting to situational needs can be laughable, and even detrimental to believe, depending on the person.
The statement seems to be based more on the complaint of a lawyer with no actual AI technical expertise; hardly the best source for what AI can and cannot do “currently”. It's useful to consider that almost all of the subjective opinions expressed in this report come from, effectively, 300 or so (maybe fewer) individuals, and that it isn't all that easy to distinguish which findings are truly fact-based and which are opinion-based, especially from the linked post.
It is also important to note that this report seems to focus more on the feedback and data from CEOs who look at P&L, not intrinsic or unquantified values. How do you directly quantify a developer fixing 3 bugs instead of 1 in your internal tool? Unless there are layoffs attributed to this specifically, and not “market changes” or general “reorganizations”, how is this quantified? There are a million things AI might do in the future that may not have a massive, or any, clear return on investment. If I buy a better shovel that saves me an hour on digging a trench in my own backyard, how much money did that save me?
GDP is 29.2t, and one more Google search would tell you that U.S. labor accounts for an estimated 18.5t of that. 2.2% of 18.5t (or of 29.2t, for that matter) is still not 61 billion. In most cases, if the simple part of the math doesn't fit, there are potentially some bigger logic mistakes at play.
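Spelled out, with the figures quoted upthread:

```python
# Napkin check with the numbers already quoted in this thread.
gdp = 29.2e12            # 2024 US GDP
labor_value = 18.5e12    # estimated US labor value
automation_pct = 0.022   # the report's ~2.2% "current automation potential"

print(automation_pct * labor_value / 1e9)  # ~407 (billion)
print(automation_pct * gdp / 1e9)          # ~642 (billion) -- neither is anywhere near 61
```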
Best of luck on your understanding. As I said, I’d suggest maybe starting with direct statements from factual sources and the report rather than those the author (or you) interpreted.
That's my assessment of the report as well... really, some news truly is "fake", in that they are pushing a narrative they think will drive clicks and eyeballs, and the media is severely misrepresenting what is in this report.
The failure is not AI, but that a lot of existing employees are not adopting the tools or at least not adopting the tools provided by their company. The "Shadow AI economy" they discuss is a real issue: People are just using their personal subscriptions to LLMs rather than internal company offerings. My university made an enterprise version of ChatGPT available to all students, faculty, and staff so that it can be used with data that should not be used with cloud-based LLMs, but it lacks a lot of features and has many limitations compared to, for example, GPT-5. So, adoption and retention of users of that system is relatively low, which is almost surely due to its limitations compared to cloud-based options. Most use-cases don't necessarily involve data that would be illegal to use with a cloud-based system.
My team has been chewed out for "just because it didn't work once, you need to keep trying it." That feels, to be blunt, almost religious. Claude didn't bless you because you didn't pray often enough and weren't devout enough.
Maybe we need to not just say "people aren't adopting it" but actually listen to why.
AI is a new tool with a learning curve. But that means it's a luxury choice-- we can spend our days learning the new tool, trying out toy problems, building a workflow, or we can continue to use existing tools to deliver the work we already promised.
It's also a tool with an absolutely abysmal learning model right now. Think of the first time you picked up some heavy-duty commercial software (Visual Studio, Lotus 1-2-3, AutoCAD, whatever). Yes, it's complex. But for those programs, there were reliable resources and clear pathways to learn it. So much of the current AI trend seems to be "just keep rewording the prompt and asking it to think really hard and add more descriptive context, and eventually magic happens." This doesn't provide a clear path to mastery, or even solid feedback so people can correct and improve their process. This isn't programming. It's pleading with a capricious deity. Frustration is understandable.
If I have to use AI, I find I prefer the Cursor experience of "smarter autocomplete" than the Claude experience of prompting and negotiation. It doesn't have the "special teams" problem of having to switch to an entirely different skill set and workflow in the middle of the task, and it avoids dumping 2000 line diffs so you aren't railroaded into accepting something that doesn't really match your vision/style/standards.
What would I want to see in a prompt-based AI product? You'd have much more documented, formal and deterministic behaviour. Less friendly chat and more explicit debugging of what was generated and why. In the end, I guess we'd be reinventing one of those 1990s "Rapid Application Development" environments that largely glues together pre-made components and templates, except now it burns an entire rainforest to build one React SPA. Has anyone thought about putting a chat-box front end around Visual Basic?