
The probability is less than 1, and in fact it goes to 0 exponentially, since the halting condition can be modeled as a biased random walk [0].

[0] https://wiki.bbchallenge.org/wiki/Antihydra#Trajectory


You can solve it with a linear recurrence relation [0]: the halting probability from position n is ((sqrt(5)-1)/2)^(n+1), where n is twice the number of odds minus the number of evens. (In fact, this +2/-1 random walk is precisely how the machine implements its termination condition.) The expected value of n is half the number of iterations, since each step adds +2 or -1 with equal probability. At the end of the longest simulation that has been computed, n is greater than 2^37, so the halting probability is less than 10^(-10^10).

[0] https://wiki.bbchallenge.org/wiki/Antihydra#Trajectory
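
For intuition, here's a quick Monte Carlo sketch of that +2/-1 walk (the starting position, seed, and trial counts are arbitrary illustration, not the Antihydra's actual numbers); the observed halting rate should approach ((sqrt(5)-1)/2)^(n+1):

    #include <cmath>
    #include <cstdio>
    #include <random>

    int main() {
        std::mt19937_64 rng(12345);
        std::bernoulli_distribution coin(0.5);
        const int start = 4;                      // starting position n
        const long long trials = 1000000, max_steps = 10000;
        long long halted = 0;
        for (long long t = 0; t < trials; ++t) {
            long long n = start;
            for (long long s = 0; s < max_steps && n >= 0; ++s)
                n += coin(rng) ? 2 : -1;          // fair coin, but drift is +1/2 per step
            if (n < 0) ++halted;                  // "halting" = walk fell below zero
        }
        double predicted = std::pow((std::sqrt(5.0) - 1.0) / 2.0, start + 1);
        std::printf("observed %.4f vs predicted %.4f\n",
                    (double)halted / trials, predicted);
    }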


The peak run lengths of evens/odds 'should' go to infinity, but these runs become a smaller and smaller component of the overall average, so that it is expected to approach the long-term 50% regardless.

In other words, an unbiased random walk should almost surely return to the origin, but a biased random walk will fail to return to the origin with nonzero probability. This can be considered a biased random walk [0], since the halting condition linearly moves further and further away from the expected value of the 50/50 walk.

[0] https://wiki.bbchallenge.org/wiki/Antihydra#Trajectory


This post is literally about an issue observed in vixie-cron (as included in some distro c. 2013), not about any systemd implementation.


Exactly! The article makes the point that:

> until one of them achieves cron’s level of ubiquity, we have to live with cron at least some places and sometimes

systemd could arguably be described as close to cron's level of ubiquity (maybe behind, maybe ahead, since it's the default for the most popular Linux distros), and it doesn't have this bug as far as we know.


It's considered a parse error [0]: it basically says that a parser may reject the document entirely if it occurs, but if it accepts the document, then it must act as if a space is present. In practice, browsers want to ignore all parse errors and accept as many documents as possible.

[0] https://html.spec.whatwg.org/multipage/parsing.html#parse-er...


> a parser may reject the document entirely if it occurs

Ah, that's what I was missing. Thanks! The relevant part of the spec:

> user agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification.

(https://html.spec.whatwg.org/multipage/parsing.html#parse-er...)


It looks like that project does link against the usual Windows DLLs, it just doesn't use a static or dynamic C runtime.


Windows isn’t quite like Linux, in that apps typically don’t make syscalls directly. Maybe you could say what’s in ntdll is the system call contract, but in practice you call the subsystem-specific API, typically the Win32 API, which is huge compared to the Linux syscall list because it includes all sorts of things like UI, COM (!), etc.

The project has some of the properties discussed above, such as not having a typical main() (or WinMain), because there’s no CRT to call it.
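
For illustration, a minimal sketch of what a CRT-free Windows program can look like (the entry-point name and linker flags are hypothetical choices here, not necessarily what that project does): built with /NODEFAULTLIB and /ENTRY:Start, it still links against kernel32.dll for the Win32 API, but nothing ever calls main() or WinMain() because there's no CRT.

    #include <windows.h>

    // No CRT: the linker's entry point jumps straight here.
    extern "C" void __stdcall Start() {
        HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
        DWORD written = 0;
        WriteFile(out, "hello from win32\n", 17, &written, nullptr);
        ExitProcess(0);  // no CRT shutdown to return into
    }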


To the extent that some champion machines have Collatz-like behavior, it's usually not by iterating a function with randomized scale factors until the value hits zero (like in the original Collatz iteration), but instead by iterating a function with a fixed scale factor until it hits a certain modular value.

E.g., the BB(5) machine [0] repeatedly multiplies a unary counter by 5/3 and adds a small offset, until the value becomes 2 mod 3 and the machine halts. In further domains, these kinds of "halt once a certain modular value is reached" conditions are used to terminate iterated exponentiation and higher operations.

The main similarity with the original Collatz iteration is that the machines repeatedly divide out a certain factor to produce a pseudorandom stream of remainders. In some cases, a machine measures a statistical property of the stream that is expected to hold forever, but cannot easily be proven to do so, as in the infamous Hydra/Antihydra machines [1].

Somewhat curiously, a candidate for the BB(3,3) champion machine [2] does not use any sort of Collatz-like modular-value conditions. Instead, it repeatedly samples points and compares them against the subdivisions of a sort of Cantor set, until one of the points perfectly hits the edge of a subdivision. It's expected to halt after roughly 10^135 iterations.

[0] https://wiki.bbchallenge.org/wiki/5-state_busy_beaver_winner

[1] https://www.sligocki.com/2024/07/06/bb-6-2-is-hard.html

[2] https://wiki.bbchallenge.org/wiki/1RB2LC1RC_2LC---2RB_2LA0LB...
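
For a concrete feel of the BB(5)-style rule described above, here's a sketch; the offsets (6 and 9) are from my recollection of the linked analysis [0] and should be treated as illustrative rather than authoritative.

    #include <cstdint>
    #include <cstdio>

    int main() {
        std::uint64_t n = 0, iters = 0;
        while (n % 3 != 2) {   // halting condition: counter hits 2 mod 3
            // multiply the unary counter by roughly 5/3 and add a small offset
            n = (n % 3 == 0) ? 5 * (n / 3) + 6 : 5 * (n / 3) + 9;
            ++iters;
        }
        std::printf("halted after %llu iterations at n = %llu\n",
                    (unsigned long long)iters, (unsigned long long)n);
    }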


If you want automatic implicit tail calls for literally everything, then you need a solution for

  {
    FooObject foo = FooObject(123);
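    // if this return becomes a tail call, foo's frame is gone before
    // FooObject::bar() can dereference "this"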
    return foo.bar();
  }
ending up in a UAF when FooObject::bar() tries to access the receiver "this", or in any other case where the tail function accesses a pointer to something the caller has put on the stack. Short of some kind of crazy dependency tracking (or shoving stuff onto the heap and using a GC), at the end of the day the programmer will have to explicitly work around the stack frame no longer existing in most return statements, and the only way to opt out would be writing circumlocutions specifically to defeat the tail-call recognition.


Good example. It's a little clearer if desugared:

    {
      FooObject foo = FooObject(123);
      return FooObject::bar(&foo);
    }
The foo object needs to live somewhere where its address means something, because C++ passes 'this' by pointer.

The answer to this is to look at the current stack frame, shuffle everything that needs to stay alive to one end of it, move the stack pointer to deallocate the rest and then jump to bar, where bar is now responsible for deallocating N bytes of stack before it continues onwards.

It's a pain, sure, but in the scheme of compiler backends it's fine. They do very similar things on optimisation grounds, e.g. "shrink wrapping" means holding off on allocating stack in case an early return fires and you don't need it after all.

Though, when memory use is a function of escape analysis, which is a function of how much time you gave the compiler to work with, I do start to empathise with make-the-programmer-do-it as the solution.


> where bar is now responsible for deallocating N bytes of stack before it continues onwards.

That doesn't really seem feasible in general? For a function with pointer parameters, the compiler would need to generate a new version for every possible stack offset left by its callers. And this could be indefinitely large, if, e.g., the caller builds a linked list on the stack before passing it to a function, or if the escape analysis fails to prove that some stack pointers are no longer in use.

Also, in C++/Rust/etc., the stack objects can have destructors that must run, so the callee must remember which destructors it needs to run before deallocating those objects and returning. One generic way to do this would be to pass along the address of a shim that calls the destructors in the correct order, but at that point you've literally just reinvented return addresses.
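
As a rough illustration of why that shim ends up being a return address in disguise, here's a hand-rolled version (all names here are made up):

    #include <cstdio>

    struct Frame { int resource; };

    // plays the role of the "shim": runs the cleanup the caller could not
    // do before tail-calling
    void cleanup_frame(void *p) {
        std::printf("releasing resource %d\n", static_cast<Frame *>(p)->resource);
    }

    int callee(int x, void (*cleanup)(void *), void *frame) {
        int result = x * 2;
        cleanup(frame);   // the caller's "destructors" must run before we return
        return result;
    }

    int main() {
        Frame f{42};
        // passing (cleanup, frame) alongside the call is morally the same as
        // pushing a return address that knows how to unwind the caller's frame
        std::printf("result = %d\n", callee(7, cleanup_frame, &f));
    }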

In general, it can make sense for a compiler to shrinkwrap the stack to an extent, and to automatically use tail calls when it can prove it makes no difference, but ultimately you're going to have to make the programmer do it one way or another. And if you're going to make the programmer manually track the lifetimes differently than they would for any other function call, it's much clearer to have an explicit tail call indication.


I am dense but don't see the issue. I would assume that functions being called with pointers to the stack of the currently executing function would be ineligible for tail recursion.


I agree, some of the Linux people have a very broad notion of what counts as a derived work, but I haven't seen much in the way of actual case law to support the conclusion that "white box reverse engineering is generally illegal".

Software generally receives wide protection for its 'non-literal elements', but it's not the case that every possible iota of its function falls under its protectable expression. Indeed, plenty of U.S. courts have subscribed to the "abstraction-filtration-comparison test" which explicitly describes how some characteristics of a program can be outside the scope of its protection.

But in practice, many of the big software copyright cases have been about literal byte-for-byte copying of some part of a program or data structure, so that the issue comes down to either contract law or fair-use law (which Sega v. Accolade and Google v. Oracle both fall under). The scope of fine-grained protection for 'non-literal elements' seems relatively unexplored, and the maximalist interpretation has seemingly proliferated from people wanting to avoid any risk of legal trouble.


Linux/Linus is not the FSF, hence the FSF's insistence on referring to GNU/Linux.


FSF/Stallman wanted the whole OS (GRUB + Linux + glibc + SysV init + ...) to be called GNU/Linux.


Though at the limit, it would have to be at least O(sqrt(n)) thanks to the Bekenstein bound [0]. And of course, as mentioned in TFA, you can always do better if you can get away with local random access in parallel, rather than global random access.

[0] https://en.wikipedia.org/wiki/Bekenstein_bound


If one wants to start talking cosmology, it's unlikely to be the case that arbitrarily long-lived computers are possible: I don't think any of the theories in [0] are conducive to either an infinite-time or infinite-memory computer, so the strict mathematical definition of Big-O doesn't hold up. IMO it's better to use Big-O as an effective theory for predicting runtime on human-scale computers than to take the mathematical formalism too literally.

[0] https://en.wikipedia.org/wiki/Ultimate_fate_of_the_universe?...


> infinite-time or infinite-memory computer

That doesn't apply for the Bekenstein Bound though.

Literally the first line of the wikipedia article:

> In physics, the Bekenstein bound (named after Jacob Bekenstein) is an upper limit on the thermodynamic entropy S, or Shannon entropy H, that can be contained within a given *finite* region of space which has a *finite* amount of energy—or equivalently, the maximum amount of information that is required to perfectly describe a given physical system down to the quantum level.


I'm arguing against the use of Big-O "in the limit" as GP puts it; our tech is far away from that limit and O(N^{1/3}) is a better model.


I mean, if you want to talk about our actual tech, it's bound by lithography of silicon chips, which are largely n^2, on printed circuit boards, which are n^2, on the surface area of the earth, which is n^2.

This has been observed since at least 2014: https://www.ilikebigbits.com/2014_04_21_myth_of_ram_1/3_fit....

Pretty much every physical metric pegs memory access as O(n^1/2).


Okay, then we just use a bunch of tiny black holes and pack some extra dark energy between them, then we can get back to volumetric scaling. In fact, since dark matter isn't yet fully understood, once we can harness it, we can fit several times as much black hole surface area in the same space as a singular black hole.


Once you're talking about computers that verge on being black holes, I think you're making too many assumptions about how everything works to give it a speed rating.


Strong disagree: this is a fundamental aspect of information theory.


The core aspect of information theory is how much information fits in a sphere. Getting from there to memory access latency in a real computer is several abstractions away and a lot of those abstractions might not hold.


It says that for a sufficiently large storage system, the information within will ultimately be limited by the surface area and not the volume. That you can indeed judge a book by its cover. For the sake of asymptotic analysis of galactic algorithms, one need only consider schemes for reading and writing information on the surface of a sphere. When it comes to "real hardware," this sort of analysis is inapplicable.


The book you are presenting says nothing about latency. I judge the book as fine but not answering the right question.


It's obviously about latency. How do you not see the latency aspect of it?

Latency is directly bound by the speed of light. A computer the size of Earth's orbit will be bound by a latency of 8 minutes. A computer the size of 2x Earth's orbit will be bound by a latency of 16 minutes, but have 4x the maximum information storage capacity.

The size of the book translates directly into latency via the speed of light. Ever heard of ping? Does the universe you live in have an infinite speed of light, such that you don't see how R contributes to latency?
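
To put rough numbers on that (purely back-of-the-envelope, assuming usable capacity scales with surface area):

    #include <cstdio>

    int main() {
        const double c = 2.998e8;              // speed of light, m/s
        const double au = 1.496e11;            // astronomical unit, m
        const double radii[] = {au, 2 * au};
        for (double r : radii) {
            double latency_min = r / c / 60.0;            // one-way light delay
            double capacity_ratio = (r / au) * (r / au);  // surface area ~ R^2
            std::printf("R = %.0f AU: ~%.1f min latency, %.0fx capacity\n",
                        r / au, latency_min, capacity_ratio);
        }
    }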


Could you name one that seems likely to fail?


Off the top of my head: Assuming there's a specific point the data needs to get to. Assuming the size of the data sphere doesn't impact the speeds of anything inside it. Assuming we're using a classical computer. Assuming the support scaffolding of the computer stays a fixed percentage of the mass and doesn't eat into the data budget.

And I know some of those still fit into "at least", but if one of them would make it notably worse than sqrt(n), then I stand by my claim that it's a bad speed rating.


Time dilation for one. As you get closer to a black hole, an observer far away sees it as if time is slowing down for you. Not really sure how this changes the Big O analysis.


Ultimately I think we will go to full NUMA like Sun and others tried. Instead of having L4 and then L5 caches, each core simply has 4GB of local working memory and you use programming languages that are ready for this. Erlang would be easy, and I think Rust has the semantics to make it work but it might take years of compiler advancement to make it efficient.

All shared state is communicated through a shared memory pool that is either accessed directly through segregated address ranges or via DMA.


NUMA has a huge amount of overhead (e.g. in terms of intercore latency), and NUMA server CPUs cost a lot more than single-socket boards. If you look at the servers at Google or Facebook, they will have some NUMA servers for certain workloads that actually need them, but most servers will be single-socket because they're cheaper and applications literally run faster on them. It's a win-win if you can fit your workload on a single-socket server, so there is a lot of motivation to make applications work in a non-NUMA way if at all possible.


Honestly, I doubt it. That exposes many details to programmers that many of them would prefer not to know.

The higher-level the language, the less interest there is in manually managing memory. It is just something to offload to the GC/runtime/etc.

So, I think this is a no-go. The market won't accept it.


You already don’t have a choice. The reason we are all in the cloud is that hardware stopped scaling properly vertically and had to scale horizontally, and we needed abstractions that kept us from going insane doing that.

If you really want to dumb down what I’m suggesting, it’s tantamount to blade servers with a better backplane, treating the box as a single machine instead of a cluster. If IPC replaces a lot of the RPC, you kick the Amdahl’s Law can down the road at least a couple of process cycles before we have to think of more clever things to do.

We didn’t have any of the right tooling in place fifteen years ago when this problem first started to be unavoidable, but it is now within reach, if not in fact extant.


It's tricky to decode all this but there are a lot of misconceptions.

First, Amdahl's law just says that the non-parallel parts of a program become more of a bottleneck as the parallel parts are split up more. It's trivial and obvious; it has nothing to do with being able to scale to more cores, because it has nothing to do with how much can be parallelized.

Second, regarding your other comment, there is nothing special about "Rust having the semantics" for NUMA. People have been programming NUMA machines since they existed (obviously). NUMA just means that some memory addresses are local and some are not. If you want things to be fast, you need to use the addresses that are local as much as possible.
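
For instance, on Linux the usual way to express that locality is libnuma (a minimal sketch; the node number and buffer size are arbitrary, link with -lnuma):

    #include <numa.h>
    #include <cstdio>
    #include <cstring>

    int main() {
        if (numa_available() < 0) {
            std::puts("no NUMA support on this machine");
            return 0;
        }
        const int node = 0;
        numa_run_on_node(node);                      // keep this thread on node 0
        const size_t len = 64UL * 1024 * 1024;
        void *buf = numa_alloc_onnode(len, node);    // memory local to node 0
        if (!buf) return 1;
        std::memset(buf, 0, len);                    // touch it so pages fault in locally
        std::printf("allocated %zu bytes on node %d\n", len, node);
        numa_free(buf, len);
    }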


You don't need every programmer to leverage the architecture for the market to accept it, just a few who hyper-optimize an implementation for a highly valuable problem (e.g., ads or something like that).

