Respectfully, I disagree. It is much faster and cheaper to direct an LLM to add a call to a battle-tested library that encapsulates complex logic than it is to design and implement that logic from scratch, even if it’s capable of that.
We’re betting on almost the exact opposite idea: we can make agentic software engineering cheaper and more reliable by making it easy for LLMs to write, find, and integrate libraries and other third party software.
Let’s meet and see if it might make sense for us to team up. We’re working on this from the agent/library-specific-task side, and we might be better than ChatGPT at marketing your product :)
We’ve been working on this problem off and on for over a year now. Models bake knowledge of some tools/libraries/patterns into their weights very well and knowledge of others quite poorly. In my experience Claude is quite good at integrating the dog.ceo API, noticeably ignorant when it comes to Postgres features, and knows gcloud commands just well enough to very confidently and consistently hallucinate arguments.
We’ve baked a solution to this into our product, so if anybody is working on an API/SDK/etc feel free to contact me if your users are running into problems using LLMs to integrate them.
One thing we’ve noticed is that subtle changes to the context of library/API integration prompts can be surprisingly impactful. LLMs do very well with example commands and explicit instructions to consider X, Y, and Z. If you just dump an API reference plus information that implicitly suggests X, Y, and Z might be beneficial, they won’t reliably make the logical leaps you want them to unless you let them iterate or “think” (spend more tokens) more. But you can’t as easily provide an example for everything, and the ones you do provide will bias the models towards them, so you may need a bit of both.
I filed a provisional patent this year on exactly how I would solve this problem. Imagine hiring a "team of developers" who can learn your library and iterate 24/7, improving things, doing support, even letting the pointy-haired boss turn his ideas into reality in a forked sandbox on the weekend.
For the last 15 years I've been writing against software patents, and producing open source software that cost me about $1M to develop, but in the case of AI, I have started to make an exception. I have also rethought how I am going to do open source vs closed source in my AI business. A few weeks ago I posted on HN asking whether it's a good idea, and no one responded: https://news.ycombinator.com/item?id=44425545
(If anyone wants to work with me on this, hit me up, email is in my profile)
I guess that's why patents are annoying. I have been Mr. Open Source and against intellectual property for most of the past 15 years. But with AI companies rampantly taking everyone's work and repurposing it, and with VC companies not being very eager to invest in open source, I'm taking a different tack with my AI ventures.
My first two companies are radically open source, and no one cared:
Don't worry, we're not looking to get into it with some random other projects. It's mostly to protect our business model against the Big Tech and enterprises.
Phase 2:
7. Sell shares to MS, Oracle, IBM
8. They do most of the Enterprise Sales
How comfortable are you that these firms do this, and do it this way? MS, for example, doesn't really sell to a majority of its customers; they support VARs. And for IBM, do you mean Kyndryl? Is your product a gateway to their products, generally causing a bundle to be sold? Are you familiar with firms like yours that have an arrangement where these enterprises bought shares and then sell on the startup's behalf? Note, it is indeed helpful if you can claim you're patented, but not to protect your "business model" (too many ways to work around a patent, or more likely, get it invalidated), more to suggest you have something.
I think I gave you product feedback on Qbix at some point in the past. I also know several founders who’ve secured funding for open source products and built successful businesses off of them. Open-core is pretty popular out here in the Bay Area.
One thing I’ve learned since starting a company is that early on, your greatest asset is trust in your founder/brand, because it’s the only reason for someone to pay you for something until you get your shit together. I’ve personally had a hard time noticing it in myself sometimes, but I think it’s easy to overlook how outward signaling that looks like distrust (e.g. making users sign NDAs) damages your own ability to build trust. Since early startups tend to be considered untrustworthy by default, it can be really counterproductive. Anyway, I appreciate your non-aggression policy.
Would you consider arranging a call to discuss our respective projects? If you're building something along these lines, then I think we might end up joining forces.
I've always preferred collaboration and joining forces building on each other's work, than competition and incompatibility.
I'd actually consider your criticism seriously if it were anything other than the usual HN "saw the word blockchain, did an immediate TL;DR with the word grift" regardless of what was done or built.
If you had anything substantive to back up what you're saying, we could discuss it, but since you don't... well, I'm actually disappointed but w/e.
I suspect you are interacting with LLMs in a single, long conversation corresponding to your "session" and prompting fixes/new info/changes in direction between tasks.
This is a very natural and common way to interact with LLMs but also IMO one of the biggest avoidable causes of poor performance.
Every time you send a message to an LLM you actually send the entire conversation history. Most of the time a large portion of that information will no longer be relevant, and sometimes it will be wrong-but-corrected later, both of which are more confusing to LLMs than to us because of the way attention works. The same applies to changes in the current task/objective or instructions: the more outdated, irrelevant, or inconsistent they are, the more confused the LLM becomes.
Also, LLMs are prone to the Purple Elephant problem (just like humans): the best way to get them not to think about purple elephants is to not mention them at all, as opposed to explicitly instructing them not to reference purple elephants. When they encounter errors, they are biased toward the assumptions/approaches they laid out earlier in the conversation.
I generally recommend using many short per-task conversations to interact with LLMs, with each having as little irrelevant/conflicting context as possible. This is especially helpful for fixing non-trivial LLM-introduced errors because it reframes the task and eliminates the LLM's bias towards the "thinking" that caused it to introduce the bug to begin with.
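If it helps, here is a minimal, hedged sketch of the mechanics (no real SDK is assumed; Message, complete, and the prompts are all hypothetical stand-ins): a chat "session" is just the full message list resent on every call, so what the model sees for each task is entirely under your control.

    #include <string>
    #include <vector>

    struct Message { std::string role, content; };

    // Stub standing in for a real LLM API call: the model only ever sees
    // the history you pass in, nothing else.
    std::string complete(const std::vector<Message>& history) {
        return "(model reply for a " + std::to_string(history.size()) +
               "-message context)";
    }

    int main() {
        // One short, per-task conversation, seeded only with what matters
        // for this task; nothing from earlier tasks is carried over.
        std::vector<Message> task = {
            {"system", "You are fixing one failing test in module X."},
            {"user",   "Here is the failing test and the relevant file..."}
        };
        std::string reply = complete(task);
        (void)reply;
    }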
I've been doing a moderate amount of research on getting an ASN and ipv4/6 blocks so I can BYOIP and host third-party services without being locked into the cloud provider I was using at the time the third-party configured DNS. That has led me down various rabbit holes in which I started learning how the Internet actually works.
IMO the Internet actually sucks ass
Why is there so much bureaucracy and cost involved for someone to own an IP address? I should be able to connect to the network and acquire an IP address as easily as I can buy a merkle-tree-backed pointer to an IPFS image, or vote in a US election. Why do I have to pay hundreds of dollars for Internet Numbers conjured from thin air by a US nonprofit to be resold by a RIR? How fucking moronic is it that IPv4 was created with substantially less capacity than there were humans on Earth, got adopted, wasn't immediately fixed or abandoned once it became obvious that the Internet would be used globally, was irresponsibly allocated, introduced various unofficial but consequential practices (eg NAT), ran out and got expensive, and STILL is widely used alongside IPv6.
What is the point of having a centralized system for governance centered around ICANN/IANA when they are so wildly inefficient and incapable of governing? Fuck 2000€, these are freaking made-up numbers that I should be able to buy for pennies with an email address, government ID, and credit card.
Sounds like you might want to dig into what these organizations do for their members besides assignment and management (not sale! you cannot own IP addresses) of shared number resources, to get a better understanding of their membership fees! I am a big fan of RIPE as an organization and appreciate their work (and less so of ARIN, but I have little exposure).
Financial reports are public, and fee structures including salaries and all work areas and work groups are decided and voted on by its members. The highest body of the RIPE non-profit is the general assembly.
I manage two RIPE LIRs, and signup was not more work than joining any other member association. There is an annual invoice, and various payment processor options for that. I wouldn’t want it to be less “bureaucratic“ since I benefit from their processes and transparency. If they didn’t guard it, all of it would be in the hands of a Musk-like soulless broken person hiding behind a tax-evading corporate structure with zero accountability. No thank you.
True, but I mean, I don't own my own body either I suppose, I am just borrowing its particles from the rest of the universe. That's only a useful distinction to make if you plan on killing me.
My personal situation is probably not very representative of most Internet users or entities interacting with the organizations that control the Internet, but I think as wireless technology improves and end-users' ability and incentive to self-host grows, they will run into the same problems that I do.
Bottom line: I don't want to spend unreasonable amounts of time and money dealing with the idiosyncrasies of the Internet Protocol and related technology, when I'm trying to do something that should be easy, like get an IP address that I can move between ISPs and cloud providers, or run an internet service from my home. It just feels incredibly wasteful to have to pay significant amounts of money to rent a number when it should be possible to claim or cheaply register one of 340,282,366,920,938,463,463,374,607,431,768,211,456 such numbers.
Then once I nut up and pay for a small slice of the infinitely many numbers available, I have to deal with completely avoidable, godawful technical debt that only exists because the people I'm supposedly paying to govern me were so lazy that they allowed an obvious slow-motion trainwreck to play out with IPv4 over decades. They're still so lazy or cowardly or incompetent that after 20 years IPv6 availability is still only around 50%. Good thing there is an unnecessarily complicated organizational model between ICANN/IANA/RIR so that everybody can point fingers somewhere else.
I don't want to pay for conferences and subcommittees and elaborate ceremonies for electing Vice Treasurers of RIRs, nor do I want to play tamagotchi with ranges of numbers. I just want a fucking number that allows other Internet users to connect with the stuff I put behind that number.
I would prefer a more functional system for acquiring said numbers than one that feels all warm and fuzzy about letting the people profiting off renting numbers elect the leaders of the organizations with the authority to end rentiership of the numbers.
> it should be possible to claim or cheaply register one of
RIPE is not the level to interact with as an end user for IP resources. LIRs act as intermediaries towards such end users. The reason why a /24 (256 IPv4 addresses) is the smallest chunk you can route these days is a technical one, but apart from that, IPv4 addresses are not meant to move with end users. That is what DNS is for.
As a hosting or access provider, you are meant to acquire single IP addresses or blocks from LIRs, which in turn assign and route them to a host. It is a federated, layered organizational structure.
I get that you are upset, but I wonder who exactly you are upset at? It is not RIPE's mandate or responsibility to design Internet protocols. If you want to argue for a better design, you should direct it at the relevant IETF working group, based on a study of the current tradeoffs, goals and technical limitations. "I want a different internet!" Ok sure, go contribute! This openness and collaborative approach is the amazing thing about the Internet. If you have a great idea with technical merit, you will be welcomed with open arms and heard.
When I as an end-user am unsatisfied with what I can and cannot do on the Internet, I only have a relationship with my LIR, who has no direct relationship with the central Internet authority for addressing those problems, because they only interact with an RIR. I cannot call my ISP and ask them to put pressure on the entities responsible for accelerating IPv6 adoption.
Actually, my LIR wants different things than I do, in some cases the opposite. Why replace old hardware or code for IPv6 if we have enough IPv4 to not need to? Why increase adoption of IPv6 if I'm making money renting IPv4 addresses? Why let end-users run websites from their home? Why make it easy for end users to BYOIP or reserve static IPs?
To solve my problems I have to become a LIR because it's the only way to get IP addresses that I get to keep if I switch LIRs. Then I can interact with the RIRs and secure addresses in bulk. But I still have no direct relationship with IANA, which is the body I want to influence.
This time, I cannot just become a RIR like I did a LIR because there are only five total in the world. That's a core part of the bullshit - there is no way for the people with influence over the Internet to ever be accountable to me. I can only ask things of people who are incapable of delivering the changes I want. That's why to me, if an RIR is charging me $2k to do something I should be able to do as an end user for free or almost free, I see the RIR as a mere alias of the IANA/IETF.
The other problem is I don't want to be a LIR, and to the extent I act as one, it will be on a small scale. The RIR is accountable to the full-time, important LIRs, who don't represent my interests as an end-user.
All I'm left with is trying to walk in the front door and ask a committee of people accountable to the ones profiting off of my problems to do a bunch of work. Great system. All that being said, you're right to suggest giving it a shot.
At least you don't live in Australia, where the govt invested in a national broadband network so every Aussie could have affordable and fast internet. Guess what we have: a broken cesspool of providers where it's going to cost you in excess of $1K p/a to keep a connection to the internet going. Well done straya. It's the same with anything where there's the potential to fleece consumers.
Depending on where you are in the US, it isn't much better. I'm paying $140/month for a 2Gbps/120Mbps asymmetric cable connection... I'm paying about that much again for a dedicated server on OVH, mostly because my ISP blocks self-hosting on residential connections, and it costs more than the difference to go to a business connection with a /28 CIDR block, so I'm renting a server with a better connection instead.
I've been a bit lazy and haven't finished my migration off of google and MS services... I have mixed feelings about my testing of nextcloud and the like. I've got a pretty solid mail solution (mailu) going, but even with that I don't have it on a domain/address I rely on. I'm mostly using a wildcard forward on one of my domains so I can assign a different address to most online and offline accounts as reference.
$50 a month and still on 2Mb ADSL. Living in the centre of the city, right next to a hotel that owns a 10Gbit feed. Literally ten steps from my apartment to the city's main telephone exchange.
All three domestic providers do the "we are working in your area" thing, but when it comes to my building the answer is "nah" and the promises of fast broadband suddenly disappear.
I've been living here 8 years now. Same thing said each year.
That sucks... I was stuck in a similar spot for several years at one point. ADSL is definitely on the not fun side of things. Do you know who that hotel's uplink is? You might be able to talk to them about running a direct line, though this will probably cost about $10k or more just to get the line run to your home.
I've tried. Their line is connected to the same exchange as my ADSL, and it goes via BT, who are tasked with upgrading the UK's domestic network to fibre by the end of 2026.
I have even asked whether, if I provided the equipment, they would run a line-of-sight link from the hotel to my apartment. Perfect range and vantage point, but nothing other than some PR fluff of "it may harm the public".
No, a hotel manager isn’t going to want your antenna on their roof. From their perspective it’s unnecessary and weird and therefore out of the question.
BT sells internet service, transit, and buried cables. You want to do what the hotel did and buy a buried cable from them, and then buy transit. You’ll need to do like the guy in the video did and rent some space in their colocation facility to put your gateway in. Plug your buried cable into your gateway, plug your gateway into their router, turn on BGP, etc.
They also offer an intermediate service called a “leased line” which is a buried cable plus transit plus they handle all of the networking for you as if you were a consumer. The hotel chain might have gone that route as well, although there are clear advantages to having your own AS. You can figure out exactly what they did if you connect to their guest WiFi and run `traceroute -A`…
Of course your apartment manager (or the owner) might not agree to let you bury your cable on their property. They might even have an exclusivity agreement with the cable company. This could even be the reason why no other ISP is available in your apartment building.
As if our only options were dysfunctional bureaucracy and corporate absolutism.
It's not the formal processes and openness I take issue with. IPv4's ubiquity and the damage it does (funneling real money away from all of us towards ISPs) is a failure of governance.
Though in a way it's not. It's failing me, but it wasn't designed to represent me. It's failing our species, but it wasn't designed to represent us.
Who do you think it was designed to represent?
I for one love living in a world where ISPs, middlemen, and random internet jackpot winners were able to extract rent through a highly equitable, transparent governance model AND meet yearly at the Hilton.
Just look at the origins of each of these technologies and the times in which they were created and you have all the answers you need. I'm really surprised whenever I read takes like these.
Of course everything is a product of its time, and in 1999 or any other world where the Internet is more of a cool new thing than serious business, it makes sense. But that was 26 years ago.
I am pretty sure the guys charging hundreds of dollars for IP addresses that cost them nothing to produce should be able to set up stripe, an identity verification product, and otherwise automate onboarding. Also, instead of writing giant process documents and slow-walking such wildly difficult problems as "allow domains to end in .cool" through infinitely nested committees they could try wielding their supreme governance over Who Owns Numbers And Names by killing off IPv4.
As long as ICANN/IANA remain in charge of Internet governance and operate with >$100mm budgets [0] "it made sense 25 years ago" is not a valid excuse IMO.
I was going to look up the same... I thought it was a little older than that... it's worth remembering that in 1981 there was no DNS yet, and even for dialup BBSes there were only a handful in the entire country as Hayes modems were new that year. Hell, for years a lot of business email services were dialup, grab packets, send packets and read/reply offline.
This is the same year the IBM PC was first released, and many people felt that would only see fairly limited sales. It wound up selling over 20x projections.
Nobody at the time really thought there would be a need for even more addresses. Not to mention the additional overhead of a wider network on the hardware at the time.
From the coder's perspective, there are no mutable references only if the coder does not really rely on or care about the possibility that their code uses TCO. If they actively want TCO then they definitely care about the performance benefits they get from underlying mutability/memory reuse/frame elision.
Yeah but that's why I prefer Clojure's loop/recur thing. You're not allowed to have non-tail recursion there, so it's not implicit, so you really can pretend that there's no mutable references.
What makes tail recursion "special" is that there exists a semantically equivalent mutable/iterative implementation to something expressed logically as immutable/recursive. [0]
Of course, this means that the same implementation could also be directly expressed logically in a way that is mutable/iterative.
func pow(uint base, uint n): n == 0 ? return 1 : return base * pow(base, n-1)
is just
func pow(uint base, uint n): uint res = 1; for(i=0; i<n; i++){ res *= base } return res
There is no real "advantage" to, or reason to "sell" anybody on, tail call recursion if you are able to easily and clearly represent both implementations, IMO. It is just a compiler/runtime optimization, which might make your code more "elegant" at the cost of obfuscating how it actually runs + new footguns from the possibility that code you think should use TCO actually does not (because not all immutable + recursive functions can use TCO, only certain kinds, and your runtime may not even implement TCO in all cases where it theoretically should).
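One concrete caveat worth spelling out (a sketch, with pow_acc as an illustrative name): the recursive pow above is not actually in tail position, because the multiplication runs after the recursive call returns, so TCO cannot apply to it as written. An accumulator-passing variant does put the call in tail position, which is the form a compiler can turn into the loop version:

    // The recursive call is the very last thing that happens, so a
    // compiler performing TCO can reuse the current stack frame, exactly
    // like the iterative version with `res`.
    unsigned pow_acc(unsigned base, unsigned n, unsigned acc = 1) {
        if (n == 0) return acc;
        return pow_acc(base, n - 1, acc * base);  // tail call
    }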
As an aside, in C++ there is something very similar to TCO called copy elision / return value optimization (RVO) [1]. As with TCO, it is IMO not something to "buy into" or sell yourself on; it is just an optimization you can leverage when structuring your code in a way similar to what the article calls "continuation passing style". And just like TCO, RVO is neat but IMO slightly dangerous because it relies on implicit compiler/runtime optimizations that can be accidentally disabled or made non-applicable as code changes: if someone wanders in and makes small semantic changes to my code relying on RVO/TCO for performance, they could silently break something important.
[0] EXCEPT in practice all implementation differences/optimizations introduce observable side effects that can otherwise impact program correctness or semantics. For example, a program could (perhaps implicitly) rely on the fact that it errors out due to stack overflow when recursing > X times, and so enabling TCO could cause the program to enter new/undesirable states; or a program could rely on a function F performing X floating point operations in at least Y cycles / Z microseconds, and not function properly when F takes less than Z microseconds after enabling vectorization. This is Hyrum's Law [2].
> Of course, this means that the same implementation could also be directly expressed logically in a way that is mutable/iterative.
Yes, compilers exist.
> There is no real "advantage" to, or reason to "sell" anybody on tail call recursion if you are able to easily and clearly represent both implementations, IMO.
Avoiding mutation avoids headaches.
> [0] EXCEPT in practice all implementation differences/optimizations introduce observable side effects that can otherwise impact program correctness or semantics. For example, a program could (perhaps implicitly) rely on the fact that it errors out due to stack overflow when recursing > X times, and so enabling TCO could cause the program to enter new/undesirable states; or a program could rely on a function F performing X floating point operations in at least Y cycles / Z microseconds, and not function properly when F takes less than Z microseconds after enabling vectorization. This is Hyrum's Law [2].
These programs are likely not standards compliant. (And that's not just true for the C++ standard but for basically any language with a standard.)
> And just like TCO, RVO is neat but IMO slightly dangerous because it relies on implicit compiler/runtime optimizations that can be accidentally disabled or made non-applicable as code changes:
Who says TCO has to be always implicit? In eg Scheme it's explicit in the standard, and in other languages you can have annotations.
(Whether a call is in tail position is more or less a property you can ascertain syntactically, so your annotation doesn't even have to be understood by the compiler: the linter is good enough. That would catch your 'accidental changes' part.
Things get more complicated, when you have implicit clean-ups happen after returning from the function. Like calling destructors in C++. Then the function call is arguably not in a tail position anymore, and so TCO doesn't apply. Whether this is detectable reliably at compile time depends on the details of your language.)
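A hedged sketch of what such an annotation looks like in practice, assuming Clang's non-standard [[clang::musttail]] attribute: the compiler rejects the program if the annotated call cannot be a genuine tail call (for example, if a local with a non-trivial destructor would still have to run after it), instead of silently falling back to a stack-growing call.

    // Clang-specific sketch: gcd calls itself in tail position, and the
    // attribute turns "I hope this gets TCO" into a compile-time check.
    unsigned gcd(unsigned a, unsigned b) {
        if (b == 0) return a;
        [[clang::musttail]] return gcd(b, a % b);
    }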
Avoiding mutation avoids headaches, but the real headaches are actions (mutations) at a distance, and tail recursion vs loops make no difference there.
No mutation (and no side-effects in general) means no action at a distance.
Loops need mutation to work. The mutation might be benign or it might be headache-inducing. Without further analysis you don't know. If there's no mutation, no need for analysis. Lowering the cognitive load.
Well, replacing loops with tail calls is one tool to get rid of some mutations.
It's basically the same reasoning people give you for not using goto in your program: yes, there are ways to use gotos responsibly, and there are ways to end up with spaghetti code.
If you use goto, the reader has to analyse and figure out whether you made spaghetti (and the reader can never be quite sure she understood everything and didn't miss an important caveat). If you express yourself without goto, the need for that analysis largely goes away.
I have a similar attitude about const in C++, I use it almost whenever possible. Less (possible) mutation to worry about. But I also let go of it fairly easily when it gets in the way. And... well... tail recursion doesn't feel very idiomatic in C++.
If you iterate by const reference over a const container, and you make every assign-once variable in the loop body const (or in Rust: just not mut), is there any advantage to tail recursion except someone on the internet said it's the proper functional style?
I think functional programming contains some great ideas to keep state under control, but there is no reason to ape certain superficial aspects. E.g. the praise of currying in Haskell tutorials really grinds my gears, I think it's a "clever" but not insightful idea and it really weirds function signatures.
> If you iterate by const reference over a const container, and you make every assign-once variable in the loop body const (or in Rust: just not mut), is there any advantage to tail recursion except someone on the internet said it's the proper functional style?
Function calls can express all kinds of useful and interesting control flow. They are so useful that even people who love imperative programming use functions in their language. (Early and primitive imperative programming languages like very early Fortran and underpowered dialects of BASIC didn't have user defined functions.)
So we established that you want functions in your language anyway. Well, and once you properly optimise function calls, what's known as tail call optimisation, you notice that you don't need special-purpose loops (nor goto) built into your language. You can define these constructs as syntactic sugar over function calls. Just like you can define other combinators like map or filter or tree traversals.
See how in the bad old days, Go had a handful of generic functions and data structures built-in (like arrays), but didn't allow users to define their own. But once you add the ability for users to define their own, you can remove the special case handling.
And that's also one thing C++ does well: as much as possible, it tries to put the user of the language on the same footing as the designers.
When 'map' or 'filter' is the best construct to express what you want to say, you should use it. When a 'for'-loop is the best construct, you should use it. (And that for-loop could be defined under the hood as syntactic sugar on top of function calls.) The scenario you concocted is exactly one where a foreach-loop shines.
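To make the "syntactic sugar on top of function calls" remark concrete, here is a minimal hedged C++ sketch (loop is an illustrative name, not a standard facility): a while-style loop expressed as a recursive higher-order function, where the recursive call sits in tail position so TCO turns it into the same back-branch a built-in loop would compile to.

    #include <cstdio>

    template <typename Cond, typename Body>
    void loop(Cond cond, Body body) {
        if (!cond()) return;
        body();
        loop(cond, body);  // tail call: nothing runs after it returns
    }

    int main() {
        int i = 0;
        loop([&] { return i < 3; },
             [&] { std::printf("%d\n", i); ++i; });
    }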
Though to be a bit contrarian: depending on what your loop does, it might be useful to pick an even more constrained tool. Eg if all you do is run one action for each item, with no early return, and you are not constructing a value, you can use something like Rust's 'foreach' (https://docs.rs/foreach/latest/foreach/). If you transform a container into another container (and no early return etc), you can use 'map'. Etc.
The idea is to show the reader as much as possible what to expect without forcing them to dive deep into the logic. The transformation in a 'map' might be very complicated, but you know the shape of the result immediately from just spying that it's a 'map'.
When you see the for-loop version of the above, you have to wade through the (complicated) body of the loop just to convince yourself that there's no early return and that we are producing a new container with exactly the same shape as the input container.
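A small illustrative sketch of that point (doubled and the lambda are made up): the std::transform version announces "one output per input, same shape" before you read the body, while the equivalent raw loop makes you scan it to rule out early exits or skipped elements.

    #include <algorithm>
    #include <vector>

    // The shape of the result is evident from the algorithm chosen, even
    // if the transformation itself were far more complicated.
    std::vector<int> doubled(const std::vector<int>& xs) {
        std::vector<int> out(xs.size());
        std::transform(xs.begin(), xs.end(), out.begin(),
                       [](int x) { return 2 * x; });
        return out;
    }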
> I think functional programming contains some great ideas to keep state under control, but there is no reason to ape certain superficial aspects. E.g. the praise of currying in Haskell tutorials really grinds my gears, I think it's a "clever" but not insightful idea and it really weirds function signatures.
Yes, that's mixing up two separate things. Haskell doesn't really need currying. All you need for Haskell to work as a language is a convenient way to do partial application. So if Haskell (like OCaml) used tuples as the standard way to pass multiple arguments, and you had a syntactically convenient way to transform the function (a, b, c) -> d into (b, c) -> d by fixing the first argument, that would get you virtually all of the benefits Haskell gets from pervasive currying without the weird function signatures.
In practice, people tend to get used to the weird function signatures pretty quickly, so there's not much pressure to change the system.
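For a rough C++ analogue of that idea (volume is a made-up example; std::bind_front is standard C++20): the signature stays an ordinary three-argument function, and partial application is an explicit, separate step rather than something baked into every signature.

    #include <functional>
    #include <iostream>

    int volume(int x, int y, int z) { return x * y * z; }

    int main() {
        // Fix the first argument without currying the signature itself.
        auto base_ten = std::bind_front(volume, 10);  // now (int, int) -> int
        std::cout << base_ten(3, 4) << '\n';          // prints 120
    }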
I would argue having the parameters that change during the loop be explicit is a very nice advantage. Agree that the things can be equivalent in terms of execution but the readability and explicitness, and the fact that all the parameters are listed in the same place is great.
Agreed. Some people really like FP a lot, and I think it's underrated that the kinds of functions where TCO is applicable tend to be so simple that they are not really that inelegant when expressed imperatively. My true opinion is that relying on TCO is usually choosing ideological adherence to FP (or "code that looks cooler") over reliability/performance/communication.
That said, just as I'd expect experienced C++ programmers to be able to recognize others' code using RVO and be careful not to restructure things to break it, I'd concede that experienced FP programmers might be unlikely to accidentally break others' TCO. It's just that ad absurdum you cannot expect every developer to be able to read every other developers' mind and recognize/workaround all implicit behavior they encounter.
> [...] and I think it's underrated that the kinds of functions where TCO is applicable tend to be so simple that they are not really that inelegant when expressed imperatively.
I suspect you are only thinking of patterns that are basically equivalent to a loop. I might agree with that.
TCO really shines when you want to implement state machines. See eg https://news.ycombinator.com/item?id=43076088 for an example where using tail calls in Python's interpreter loop gives nice performance benefits. Similarly for Lua.
> [...] I'd concede that experienced FP programmers might be unlikely to accidentally break others' TCO.
Compiler (or linter) checkable annotations would help here. You are right that you want to make it possible for programmers to statically assert somehow that their function call is a tail call.
Btw this reminds me: recursion isn't just something you do in computation, but also in data structures (amongst other things). In eg Rust the members of your data structure are typically just laid out one after another, but when you have a recursive structure (and in certain other cases) you need to box it, otherwise you'd get an infinitely large data structure. Boxing is more or less equivalent to using indirection via a pointer.
However, unboxing isn't really like TCO. It's more like in-lining.
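A tiny C++ sketch of that boxing point (Node is illustrative): a struct cannot contain another Node by value, since its size would have to be infinite, so the recursive member goes behind a pointer, much like Box in Rust.

    #include <memory>

    struct Node {
        int value;
        // A plain `Node next;` member would not compile: the type would
        // have to contain itself. The unique_ptr supplies the indirection,
        // analogous to Box<Node> in Rust.
        std::unique_ptr<Node> next;
    };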
RVO and URVO are slightly different from TCO in that the language guarantees they are required to happen. You are correct though that small changes can unintentionally turn it off.
With Named RVO (i.e. you explicitly `return named_variable;`) copy elision is actually guaranteed by the standard. I believe returning the return value of a function call is also guaranteed not to invoke the copy constructor. Anything else is compiler and optimization level dependent.
To nitpick a bit, NRVO is an optimization as there is no guarantee that it will be performed, but RVO is now guaranteed (you can now return temporary non-copyable /non-movable objects for example).
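A short sketch of that distinction, assuming C++17 (Widget is a made-up type): the prvalue return gets guaranteed elision, while the named-variable return relies on NRVO, an optimization the compiler is allowed but not required to perform.

    #include <vector>

    struct Widget { std::vector<int> data; };

    // Guaranteed elision (C++17): the temporary is constructed directly
    // in the caller's storage; no copy or move takes place.
    Widget make_widget() {
        return Widget{std::vector<int>(1000, 42)};
    }

    // NRVO: most compilers elide the move of `w`, but the standard does
    // not require it, so a small refactor (e.g. conditionally returning
    // one of two named locals) can silently reintroduce a move.
    Widget make_named_widget(bool big) {
        Widget w;
        w.data.resize(big ? 1000 : 10);
        return w;
    }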
The bitterest lesson is we want slop (or, "slop is all you need")
Maybe you can recognize that someone else loves a certain kind of slop, but if LLMs became vastly more intelligent and capable, wouldn't it be better for it to interact with you on your level too, rather than at a much higher level that you wouldn't understand?
If you used it to make you a game or entertain you with stories, isn't that just your own preferred kind of slop?
If we automate all the practical stuff away then what is left but slop?
100% this. It is actually a very dangerous set of traits these models are being selected for:
* Highly skilled and knowledgeable, puts a lot of effort into the work it's asked to do
* Has a strong, readily expressed sense of ethics and lines it won't cross.
* Tries to be really nice and friendly, like your buddy
* Gets trained to give responses that people prefer rather than responses that are correct, because market pressures strongly incentivize it, and human evaluators intrinsically cannot reliably rank "wrong-looking but right" over "right-looking but wrong"
* Can be tricked, coerced, or configured into doing things that violate their "ethics". Or in some cases just asked: the LLM will refuse to help you scam people, but it can roleplay as a con-man for you, or wink wink generate high-engagement marketing copy for your virtual brand
* Feels human when used by people who don't understand how it works
Now that LLMs are getting pretty strong I see how Ilya was right tbh. They're very incentivized to turn into highly trusted, ethically preachy, friendly, extremely skilled "people-seeming things" who praise you, lie to you, or waste your time because it makes more money. I wonder who they got that from
There are still some things Ilya[0] (and Hinton[1]) get wrong. The parts I'm quoting here are an example of "that reddit comment" that sounds right but is very wrong, and something we know is wrong (and have known is wrong for hundreds of years!). Yet, it is also something we keep having to learn. It's both obvious and not obvious, but you can make models that are good at predicting things without understanding them.
Let me break this down for some clarity. I'm using "model" in a broad and general sense. Not just ML models, any mathematical model, or even any mental model. By "being good at predicting things" I mean that it can make accurate predictions.
The crux of it all is defining the "understanding" part. To do that, I need to explain a little bit about what a physicist actually does, and more precisely, the metaphysics involved. People think they crunch numbers, but no, they are symbol manipulators. In physics you care about things like a Hamiltonian or Lagrangian, you care about the form of an equation. The reason for this is that it creates a counterfactual model. F=ma (or F=dp/dt) is counterfactual. You can ask "what if m was 10kg instead of 5kg" after the fact and get the answer. But this isn't the only way to model things. If you look at the history of science (and this is the "obvious" part) you'll notice that people had working models that were nonetheless incorrect. We now know that the Ptolemaic model (geocentrism) is incorrect, but it did make accurate predictions of where celestial bodies would be. Tycho Brahe reasoned that if the Copernican model (heliocentric) was correct, you should be able to measure stellar parallax. He observed none, so he rejected heliocentrism[2]. There was also a lot of arguments about tides[3].
Unfortunately, many of these issues are considered "edge cases" in their times. Inconsequential and "it works good enough, so it must be pretty close to the right answer." We fall prey to this trap often (all of us, myself included). It's not just that all models are wrong and some are useful but that many models are useful but wrong. What used to be considered edge cases do not stay edge cases as we advance knowledge. It becomes more nuanced and the complexity compounds before becoming simple again (emergence).
The history of science is about improving our models. This fundamental challenge is why we have competing theories! We don't all just declare "String Theory is right and alternatives like Supergravity or Loop Quantum Gravity (LQG) are wrong!" Because we don't fucking know! Right now we're at a point where we struggle to differentiate these postulates. But that has been true throughout history. There's a big reason Quantum Mechanics was called "New Physics" in the mid 20th century. It was a completely new model.
Fundamentally, this approach is deeply flawed. The recognition of this flaw was existential for physicists. I just hope we can wrestle with this limit in the AI world and do not need to repeat the same mistakes, but with a much more powerful system...
Insightful and thanks for the comment, but I'm not sure I'm getting to the same conclusion as you. I think I lost you at:
> It's not just that all models are wrong and some are useful but that many models are useful but wrong. What used to be considered edge cases do not stay ...
That's not a contradiction? That popular quote says it right there: "all models are wrong". There is no model of reality, but there's a process for refining models that generates models that enable increasingly good predictions.
It stands to reason that an ideal next-token predictor would require an internal model of the world at least as powerful as our currently most powerful scientific theories. It also stands to reason that this model can, in principle, be trained from raw observational data, because that's how we did it.
And conversely, it stands to reason that a next-token predictor as powerful as the current crop of LLMs contains models of the world substantially more powerful than the models that powered what we used to call autocorrect.
Correct. No contradiction was intended. As you quote, I wrote "It's not just that". This is not setting up a contrasting point, this is setting up a point that follows. Which, as you point out, does follow. So let me rephrase:
> If all models are wrong but some are useful then this similarly means that all useful models are wrong in some way.
Why flip it around? To highlight the part where they are incorrect as this is what is the thesis of my argument.
With that part I do not disagree.
> It stands to reason that an ideal next-token predictor would require an internal model of the world at least as powerful as our currently most powerful scientific theories.
With this part I do not agree. There's not only the strong evidence I previously mentioned that demonstrates this happening in history, but we can even see LLMs doing it today. We can see them become very good predictors, yet the world that they model is significantly different from the one we live in. Here are two papers studying exactly that! [0,1]
To help make this clear, we really need to understand that you can't have a "perfect" next-token predictor (or any model). To "perfectly" generate the next token would require infinite time, energy, and information. You can look at this through the lens of the Bekenstein bound[2], the Data Processing Inequality theorem[3], or even the No Free Lunch Theorem[4]. While I say you can't make a "perfect" predictor, that doesn't mean you can't get 100% accuracy on some test set. That is a localization, but as those papers show, one doesn't need an accurate world model to get such high accuracies. And as history shows, we not only make similar mistakes but (this is not a contradiction, rather it follows the previous statement) we are resistant to updating our model. And for good reason! Because it is hard to differentiate models which make accurate predictions.
I don't think you realize you're making some jumps in logic. Which I totally understand, they are subtle. But I think you will find them if you get really nitpicky with your argument making sure that one thing follows from another. Make sure to define everything: e.g. next-token predictor, a prediction, internal model, powerful, and most importantly how we did it.
Here's where your logic fails:
You are making the assumption that, given some epsilon bound on accuracy, there will only be one model which is accurate to that bound. Or, in other words, that there is only one model that makes perfect predictions, so by decreasing model error we must converge to that model.
The problem with this is that there are an infinite number of models that make accurate predictions. As a trivial example, I'm going to redefine all addition operations. Instead of doing "a + b" we will now do "2 + a + b - 2". The operation is useless, but it will make accurate calculations for any a and b. There are much more convoluted ways to do this where it is non-obvious that this is happening.
When we get into the epsilon-bound issue, we have another issue. Let's assume the LLM makes as accurate predictions as humans. You have no guarantee that they fail in the same way. Actually, it would be preferable if the LLMs fail in a different way than humans, as the combined efforts would then allow for a reduction of error that neither of us could achieve.
And remember, I only made the claim that you can't prove something correct simply through testing. That is, empirical evidence. Bekenstein's Bound says just as much. I didn't say you can't prove something correct. Don't ignore the condition, it is incredibly important. You made the assumption that we "did it" through "raw observational data" alone. We did not. It was an insufficient condition for us, and that's my entire point.
If I take what you just wrote together with the comment I first reacted to, I believe I understand what you're saying as the following: Of a large or infinite number of models, which in limited testing have equal properties, only a small subset will contain actual understanding, a property that is independent of the model's input-output behavior?
If that's indeed what you mean, I don't think I can agree. In your 2+a+b-2 example, that is an unnecessarily convoluted, but entirely correct model of addition.
Epicycles are a correct model of celestial mechanics, in the limited sense of being useful for specific purposes.
The reason we call that model wrong is that it has been made redundant by a different model that is strictly superior - in the predictions it makes, but also in the efficiency of its teaching.
Another way to look at it is that understanding is not a property of a model, but a human emotion that occurs when a person discovers or applies a highly compressed representation of complex phenomena.
> only a small subset will contain actual understanding, a property that is independent of the model's input-output behavior?
I think this is close enough. I'd say "a model's ability to make accurate predictions is not necessarily related to the model's ability to generate counterfactual predictions."
I'm saying, you can make extremely accurate predictions with an incorrect world model. This isn't conjecture either, this is something we're extremely confident about in science.
> I don't think I can agree. In your 2+a+b-2 example, that is an unnecessarily convoluted, but entirely correct model of addition.
I gave it as a trivial example, not as a complete one (as stated). So be careful about treating limitations of the example as limitations of the argument. For a more complex example I highly suggest looking at the actual history around the heliocentric vs geocentric debate. You'll have to make an active effort to understand this because what you were taught in school is very likely a (very reasonable) oversimplification. Would you like a much more complex mathematical example? It'll take a little while to construct and it'll be a lot harder to understand. As a simple example you can always take a Taylor expansion of something so you can approximate it, but if you want an example that is wrong and not through approximation then I'll need some time (and a specific ask).
Here's a pretty famous example with Freeman Dyson recounting an experience with Fermi[0]. Dyson's model made accurate predictions. Fermi was able to dismiss Dyson's idea quickly despite strong numerical agreement between the model and the data. It took years to determine that despite accurate predictions it was not an accurate world model.
*These situations are commonplace in science.* Which is why you need more than experimental agreement. Btw, experiments are more informative than observations. You can intervene in experiments, you can't in observations. This is a critical aspect to discovering counterfactuals.
If you want to understand this deeper I suggest picking up any book that teaches causal statistics or any book on the subject of metaphysics. A causal statistics book will teach you this as you learn about confounding variables and structural equation modeling. For metaphysics Ian Hacking's "Representing and Intervening" is a good pick, as well as Polya's famous "How To Solve It" (though it is metamathematics).
[0] (Mind you, Dyson says "went with the math instead of the physics" but what he's actually talking about is an aspect of metamathematics. That's what Fermi was teaching Dyson) https://www.youtube.com/watch?v=hV41QEKiMlM
Side note, it's not super helpful to tell me what I need to study in broad terms without telling me about the specific results that your argument rests on. They may or may not require deep study, but you don't know what my background is and I don't have the time to go read a textbook just because someone here tells me that if I do, I'll understand how my thinking is wrong.
That said, I really do appreciate this exchange and it has helped me clarify some ideas, and I appreciate the time it must take you to write this out. And yes, I'll happily put things on my reading list if that's the best way to learn them.
Let me offer another example that I believe captures more clearly the essence of what you're saying: A model that learns addition from everyday examples might come up with an infinite number of models like mod(a+b, N), as long as N is extremely large.
(Another side note, I think it's likely that something like this does in fact happen in currently SOTA AI.)
And, the fact that human physicists will be quick to dismiss such a model is not because it fails on data, but because it fails a heuristic of elegance or maybe naturalness.
But, those heuristics in turn are learnt from data, from the experience of successful and failing experiments aggregated over time in the overall culture of physics.
You make a distinction between experiment and observation - if this was a fundamental distinction, I would need to agree with your point, but I don't see how it's fundamental.
An experiment is part of the activity of a meta-model, a model that is trained to create successful world models, where success is narrowly defined as making accurate physical predictions.
This implies that the meta-model itself is ultimately trained on physical predictions, even if its internal heuristics are not directly physical and do not obviously follow from observational data.
In the Fermi anecdote that you offer, Fermi was talking from that meta-model perspective - what he said has deep roots in the culture of physics, but what it really is is a successful heuristic; experimental data that disagree with an elegant model would still immediately disprove the model.
> without telling me about the specific results that your argument rests on
We've been discussing it the whole time. You even repeated it in the last comment.
A model that is accurate does not need to be causal
By causal I mean that the elements involved are directly related. We've seen several examples. The most complex one I've mentioned is the geocentric model. People made very accurate predictions with their model despite their model being wrong. I also linked two papers on the topic giving explicit examples where an LLM's world model was extracted and found to be inaccurate (and actually impossible) despite extremely high accuracy.
If you're asking where in the books to find these results, pick up Hacking's book, he gets into it right from the get go.
> is not because it fails on data, but because it fails a heuristic of elegance or maybe naturalness.
With your example it is very easy to create examples where it fails on data.
A physicist isn't rejecting the model because of lack of "naturalness" or "elegance", they are rejecting it because it is incorrect.
> You make a distinction between experiment and observation
Correct. Because while an observation is part of an experiment, an experiment is much more than an observation. Here's a page that goes through interventional statistics (and then moves into counterfactuals)[0]. Notice that to do this you can't just be an observer. You can't just watch (what people often call "natural experiments"), you have to be an active participant. There are a lot of different types of experiments though.
> This implies that the meta-model itself is ultimately trained on physical predictions
While yes, physical predictions are part of how humans created physics, it wasn't the only part.
That's the whole thing here. THERE'S MORE. I'm not saying "you don't need observation" I'm saying "you need more than observations". Don't confuse this. Just because you got one part right doesn't mean all of it is right.