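Not an author, but there's a good alternative. If busybox were changed to ignore argv[2], applets could be called via shebang scripts instead of symlinks: an executable file such as ./myecho whose only content is a #! line naming busybox and the applet.
Right now this doesn't work properly, because the kernel appends the script path ("./myecho", which would otherwise have been argv[0]) to the interpreter's arguments, so it lands in argv[2] of the busybox process. Otherwise, this technique IMHO is better than symlinks: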
- Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).
- Doesn't read or write to argv[0].
- You could finally rename the applets. This is not that useful if busybox is your only POSIX userspace implementation, but very useful if you want many implementations to live side-by-side. E.g. on macOS, I'd like to have readlink point to BSD/macOS's readlink, greadlink to GNU coreutils', bbreadlink to busybox's.
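The argv layout behind the "doesn't work properly" remark can be reproduced without busybox. This is a minimal sketch, assuming /bin/echo exists and using it as a stand-in for the busybox binary, with the literal word applet in place of a real applet name; show_shebang_argv is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical demo: /bin/echo stands in for busybox, "applet" for an applet name.
show_shebang_argv() {
    dir=$(mktemp -d) || return 1
    # The script consists of nothing but its shebang line.
    printf '#!/bin/echo applet\n' > "$dir/myecho"
    chmod +x "$dir/myecho"
    # Run it as ./myecho so the script path is passed exactly as typed.
    (cd "$dir" && ./myecho hello)
    rm -rf "$dir"
}

show_shebang_argv
```

On Linux this prints `applet ./myecho hello`: the interpreter receives the shebang argument as argv[1], the script path as argv[2], and the user's arguments after that. A busybox that skipped argv[2] would see just the applet name and the real arguments.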
But as I said, this doesn't work for now. The best you can do now is to write shell two-liners https://news.ycombinator.com/item?id=41436012. Some of these two-liners may also fit within the inode inlining limit, so that's a plus. But you pay a performance penalty on every call (since sh needs to start up).
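A two-line wrapper of that general kind could be generated like this. It is only a sketch: make_wrapper is a hypothetical helper, and the demo call uses /bin/echo as a stand-in for busybox so it runs on a machine without busybox (with busybox installed you would pass its path instead).

```shell
#!/bin/sh
# make_wrapper FILE PROGRAM APPLET  (hypothetical helper, not part of busybox)
# Writes a two-line script that runs PROGRAM with APPLET as its first argument.
# Assumes PROGRAM and APPLET contain no spaces or shell metacharacters.
make_wrapper() {
    printf '#!/bin/sh\nexec %s %s "$@"\n' "$2" "$3" > "$1"
    chmod +x "$1"
}

dir=$(mktemp -d)
# Stand-in: /bin/echo instead of busybox, so the wrapper just prints its arguments.
make_wrapper "$dir/bbreadlink" /bin/echo readlink
"$dir/bbreadlink" -f /tmp
# Size in bytes, to compare against whatever inline-data limit your filesystem has.
wc -c < "$dir/bbreadlink"
rm -rf "$dir"
```

Using exec avoids leaving an extra sh process around, but the shell still has to start once per call, which is the penalty mentioned above.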
> Each applet uses the same amount of disk space (0 blocks, i.e. the content fits into inode).
Is that really the case? AFAIK, OpenWrt uses SquashFS by default, and a quick web search tells me that "[...] In addition, inode and directory data are highly compacted, and packed on byte boundaries. Each compressed inode is on average 8 bytes in length [...]" (https://www.kernel.org/doc/html/latest/filesystems/squashfs....). That is, even if the content fits into the inode, it will make the inode use more space (SquashFS inodes are variable-size, unlike on traditional filesystems with fixed-size inodes).
And using hardlinks (traditionally, we use hardlinks with busybox, not symlinks) goes even further: all commands use a single inode, the only extra space needed is for the directory entry (which you need anyway).
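The single-inode claim is easy to check on any POSIX filesystem. A minimal sketch, where the file named multi is a stand-in for the busybox binary and inode_of is a made-up helper:

```shell
#!/bin/sh
# Print the inode number of a file (first field of `ls -i`).
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

dir=$(mktemp -d)
# "multi" is a stand-in for the busybox binary.
printf 'stand-in for the multi-call binary\n' > "$dir/multi"
ln "$dir/multi" "$dir/ls"     # hard link: a new directory entry, same inode
ln "$dir/multi" "$dir/cat"
inode_of "$dir/multi"
inode_of "$dir/ls"
inode_of "$dir/cat"
rm -rf "$dir"
```

All three lines of output show the same inode number, so each extra command name costs one directory entry and nothing else.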
Well that would be inefficient. For each command you run the kernel has to read the file, detect that it has a shebang, parse the shebang line, and then finally load the actual executable in memory. That could be a performance problem, since busybox is typically used in embedded systems that don't have a lot of resources. Imagine a shell script that runs a command in a loop: every iteration has to do all of that extra work.
Finally, symlinks can be relative, while the solution you proposed is not. This is particularly useful for distributing software, e.g. distributing a tar file with busybox itself and its symlinks.
In fact, you don't even need symlinks at all: you can use hard links, which could even save disk space on embedded filesystems that are read-only images anyway.
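The relative-symlink point can be illustrated with a tree that gets moved after it is built. This is a sketch under assumptions: make_tree is a hypothetical helper, and the multi script is a stand-in for busybox that only prints the name it was invoked as.

```shell
#!/bin/sh
# Build a tiny tree with a stand-in "multi" script and a relative symlink to it.
make_tree() {
    mkdir -p "$1/bin"
    # Stand-in for busybox: prints the name it was invoked as.
    printf '#!/bin/sh\necho "${0##*/}"\n' > "$1/bin/multi"
    chmod +x "$1/bin/multi"
    # The link target is just "multi", resolved relative to the link's own directory.
    ln -s multi "$1/bin/hello"
}

dir=$(mktemp -d)
make_tree "$dir/old"
mv "$dir/old" "$dir/new"      # relocate the tree, as unpacking a tarball elsewhere would
"$dir/new/bin/hello"
rm -rf "$dir"
```

This prints `hello` even after the move, because the link never recorded where the tree lived.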
> Well that would be inefficient. For each command you run the kernel has to read the file, detect that it has a shebang, parse the shebang line, and then finally load the actual executable in memory.
Those that exist today would, but no kernel would have to work like that.
Once you've agreed that monolithic kernels have merits, you've accepted that the kernel can do whatever it wants to make this efficient—including being complicit in this scheme and leapfrogging over most of what you just described.
And I didn't mention the guidelines (i.e. newsguidelines.html). On that note, though:
> the site guidelines[...] aren't a list of proscribed behaviors but a set of values to internalize. I'd say "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize" covers this case pretty squarely
I’m going to challenge you on the performance angle. Instead of doing the shebang line, it has to traverse the filesystem to resolve the link. I suspect that’s probably more expensive than parsing the shebang line. Besides, a shell script that runs a command in a loop should have busybox detecting the built-in command and executing it inline, without spawning executables via the file system (this is common in bash as well, btw).
There are valid reasons but I think the performance angle is the weakest argument to make.
> Instead of doing the shebang line, it has to traverse the filesystem to resolve the link. I suspect that’s probably more expensive than parsing the shebang line.
I highly doubt that. Path traversal is one of the most optimized pieces of code in the Linux kernel, especially for commonly accessed places like /bin where everything is most likely already in the dentry cache. For the script with a shebang on the other hand it first has to read it from disk (or the page cache), then parse the path from it, and then do a path traversal anyway to find the referenced file.
I was going to say it'd be easier to have a single script, e.g.
    #!/bin/sh
    # pass the invoked name and all arguments through, quoted so spaces survive
    busybox "$0" "$@"
and then every command required could just be a hardlink to the same script, instead of replicating it over and over again for hardcoded command names.
Then I realised the whole point is to posit a world where $0 doesn't exist, and we're not allowed to be clever about it.
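For what it's worth, the single-script idea can be sketched like this. install_dispatcher is a hypothetical helper, /bin/echo is a stand-in for busybox so the demo runs anywhere, and the script strips the directory from $0 before passing it on (an assumption about what the dispatcher should hand over, not something stated above).

```shell
#!/bin/sh
# install_dispatcher DIR PROGRAM NAME...  (hypothetical helper)
# Writes one wrapper script and hard-links it under every NAME.
install_dispatcher() {
    d=$1 prog=$2
    shift 2
    # ${0##*/} strips the directory part, leaving the name the script was called as.
    printf '#!/bin/sh\nexec %s "${0##*/}" "$@"\n' "$prog" > "$d/.dispatch"
    chmod +x "$d/.dispatch"
    for name in "$@"; do
        ln "$d/.dispatch" "$d/$name"
    done
}

dir=$(mktemp -d)
install_dispatcher "$dir" /bin/echo ls cat   # /bin/echo is a stand-in for busybox
"$dir/cat" a b
"$dir/ls" -l
rm -rf "$dir"
```

With the stand-in this prints `cat a b` and then `ls -l`, i.e. the one shared script recovers the command name purely from $0, which is exactly the dependency the thread is trying to avoid.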