The most surprising Unix programs

abetusk · on March 15, 2020

For me, the most surprising one was paste.

paste allowed me to interleave to streams or to split out a single stream into two columns. I'd been writing custom scripting monstrosities before I discovered paste:

    $ paste <( echo -e 'foo\nbar' ) <( echo -e 'baz\nqux' )
    foo     baz
    bar     qux
    $ echo -e 'foo\nbar\nbaz\nqux' | paste - - 
    foo     bar
    baz     qux

I wonder what other unix gems I've been missing...

jerf · on March 15, 2020

Recently I found myself wanting to stop a program after a certain timeout. I found myself thinking "sure there's a UNIX program to do this in shell?" There is. And it's called... brace yourself... "timeout".

20+ years in UNIX, never encountered it. Part of GNU coreutils.

mekster · on March 16, 2020

Sometimes good to use for cron jobs so that the previous one won't keep running on top of the current run.

You should debug for it to not happen though but probably better than seeing your system choke with dozens of instances even slowing the jobs more.

There's also "run-one" that allows you to control which job should survive instead of just killing the old one.

http://manpages.ubuntu.com/manpages/trusty/man1/run-one.1.ht...

majewsky · on March 16, 2020

If you're using systemd timers, you get this behavior for free because those timers only activate service units. If the service is already active, nothing happens.

(Of course, this can be surprising in a different way when you expect a new job to start, but it does not because a previous one is still lingering. Not saying it's better or worse, just pointing out for those who don't know.)

saagarjha · on March 15, 2020

It’s unfortunately not part of POSIX, which means that BusyBox can’t get the options right or even stay consistently wrong :(

JdeBP · on March 15, 2020

I would guess, from experience of people doing things the hard way, at:

* John A. Kunze's jot and rs

* John Kerl's mlr ("Miller")

* jq

* join and comm, as mentioned

* fmt

* ex

And, given what you just wrote:

* printf

AdieuToLogic · on March 15, 2020

A handy one to add to your list is:

seq[0]

EDIT: I didn't see that you had already included 'jot' so removed it from my original reply.

0 - https://www.freebsd.org/cgi/man.cgi?query=seq&apropos=0&sekt...

JdeBP · on March 16, 2020

I didn't include it because in my experience people don't overlook seq, whereas they do overlook jot. I only included things where I've encountered people overlooking such tools and doing things the hard way.

pge · on March 15, 2020

Do I remember correctly that “jot” was also the name of a gui text editor on unix systems as well (different from the program referred to in the parent comment)? I think I used it in the early 90s on an SGI workstation running sun solaris OS.

loeg · on March 17, 2020

Are you thinking of joe? https://joe-editor.sourceforge.io/

mixmastamyk · on March 15, 2020

Yes, I remember a jot being the “notepad” of SGI workstations. Didn’t take me long to move on to nedit instead.

loeg · on March 15, 2020

The related join(1) and comm(1) are oft-missed and occasionally helpful.

cptnapalm · on March 15, 2020

I'm going through the AWK book and join is in there. I was unhappily surprised to find that using a tab as a delimiter is painful with join. Variations to overcome this which I've bumped into:

  join -t $'\t' file1 file2  (BASH only, I think.)
  join -t '<CTRL+v><Tab>' file1 file2
  join -t "`echo '\t'`" file1 file2

Why 'join -t '\t' file1 file2' is apparently beyond the pale has me mystified.

loeg · on March 15, 2020

Given that the delimiter must be a single character and the intended use is in shells, I am equally mystified as to why control codes must be passed literally. Join is smart enough to reject 2 character delimiters, but could easily grok escapes if it so chose:

$ join -t '\t' join: illegal tab character specification

JdeBP · on March 16, 2020

See https://unix.stackexchange.com/a/65819/5132 for some thinking on making every individual program have an escape sequence parser, and why that syntax is not specific nor original to the Bourne Again shell.

cptnapalm · on March 18, 2020

Found a link in your link to something by a Sven Mascheck and found out there is a printf command. Despite having started using Linux in 1997, I still find new things.

bloopernova · on March 15, 2020

Paste is quite wonderful. It adds a lot of flexibility in output and should definitely be more widely known.

thedance · on March 15, 2020

paste is an important part of the shell wizard's diet. paste with one input from `yes` is quite useful.

mci · on March 15, 2020

> Hidden inside WWB (writer's workbench), Lorinda Cherry's Parts annotated English text with parts of speech, based on only a smidgen of English vocabulary, orthography, and grammar.

Writer's Workbench was indeed a marvel of 1970's limited-space engineering. You can see it for yourself [1]: the generic part-of-speech rules are in end.l, the exceptions in edict.c and ydict.c, and the part-of-speech disambiguator in pscan.c. Such compact, rule-based NLP has fallen out of favor these days but (shameless plug alert!) Writer's Workbench inspired my 2018 IOCCC entry that highlights passive constructions in English texts [2].

[1] https://github.com/dspinellis/unix-history-repo/tree/BSD-4_1...

[2] https://ioccc.org/2018/ciura/hint.html

ivan_ah · on March 15, 2020

This Writer's Workbench seems really cool. The wikipedia page indicates there were quite a few more programs in the suite: https://en.wikipedia.org/wiki/Writer%27s_Workbench#Package_c...

Do you know where I could be able to find the source for all of these?

I'd be interested to "revive" these utils, possibly rewriting as python or bash for easy hacking. I have some basic scrips for that, and they are already proving to be useful even though they simply call grep https://github.com/ivanistheone/writing_scripts

dbremner · on March 15, 2020

I don't think it has all of them, but [0] is a tarball from Research Unix that has some Writer's Workbench source code.

The files are in cmd/wwb.

Other tarballs may have more Writer's Workbench code but I haven't looked at them.

[0] https://www.tuhs.org/Archive/Distributions/Research/Dan_Cros...

ahoka · on March 15, 2020

It's very interesting that the readme calls shell scripts "runcom", an archaic name coming from the Compatible Time Sharing System. This is also the origin for rc files.

loeg · on March 15, 2020

"Runcom" is still referenced by that name in BSD init(8):

https://svnweb.freebsd.org/base/head/sbin/init/init.c?view=m...

sn41 · on March 15, 2020

One of the useful applications of trigram-based analysis I have done is the following: for a large web-based application form where about 200000 online applications were made, we had to filter out the dummy applications - often, people would try out the interface using "aaa" as a name, for example.

Since the names were mostly Indian, we did not even have a standard database of names to test against.

What we did was the following: go through the entire database of all applications, and build a trigram frequency table. Then, using that trigram table, do a second pass over the database of names to find names with anomalous trigrams - if the percentage of trigram frequency anomaly in a name was too high (if the name was long enough), or the absolute number of trigrams in name was too high (if the name was short), we flagged the application and examined it manually. Using this alone, we were able to filter out a large number of dummy application forms.

Of course, it is not a comprehensive tool since what forms a valid name is very vague, but I think this kind of a tool is useful and culture-neutral.

amelius · on March 15, 2020

The problem with these methods is that you exclude everybody that deviates from the norm. Yes, it might make your life (as a developer) a little bit easier, but it makes the lives of some of the applicants a lot harder.

pjc50 · on March 15, 2020

I think the "manual review" phase makes this OK, in a way that simply autobanning Mr Null from your system isn't.

amelius · on March 15, 2020

I'm not sure if that would work either.

Why not send them an email to verify? Or use captcha tech developed by big companies that actually has some science behind it.

sn41 · on March 15, 2020

These are high school students in India. Many of them are from rural backgrounds and do not have personal email accounts. From our experience, the forms are many times filled up by employees at cyber cafes who fill their own email addresses and mobile numbers instead of the students.

The only reliable means of communications back to students is by a government approved website, or newspapers, or official media. (The process also has to stand up in court in case some student says that (s)he did not get the communication, and newspaper ads are a documentable evidence of communications on a specified date.)

(BTW, the forms do have captchas, the spurious forms are manually filled in by mischievous/malicious/curious applicants.)

banana_giraffe · on March 15, 2020

Often not in practice. These sort of tools to triage input data for human review generally means the humans will just auto-approve what the computer tells them to.

Yes, they'll catch some extreme cases, but anything that looks like the normal bad case will go by unnoticed generally more so than if the human is doing all of the review work.

znpy · on March 15, 2020

It's surprising that Doug McIlroy still reads and writes about UNIX.

For those who don't know, Dough is the guy that invented pipes.

cmroanirgo · on March 15, 2020

Also interesting:

> Originators of nearly half the list--pascal, struct, parts, eqn--were women, well beyond women's demographic share of computer science.

indigochill · on March 15, 2020

In the 40s, computing was seen as primarily women's work (similar to the stereotype of switchboard operators). Into the 60s, women still comprised up to half of the computing workforce. In 84, they peaked at 37%. So demographically speaking, the ratio was not as bad as it is today.

(Source: https://en.wikipedia.org/wiki/Women_in_computing)

reaperducer · on March 15, 2020

Into the 60s, women still comprised up to half of the computing workforce.

My life experience corroborates this.

When I was in school, girls were taught to type, and boys weren't. Because of that, many of the girls from my school went into computing, while the boys went to more "manly" pursuits. It's also why Catholic nuns were over-represented in early computing.

ruricolist · on March 15, 2020

"It's also why Catholic nuns were over-represented in early computing."

Would you mind expanding on that?

reaperducer · on March 15, 2020

Highly educated, and able to type. The first woman to earn a PhD in Computer Science was a nun in Ohio.

Catholic nuns have done much for the business world that is overlooked.

From memory, so no citations:

The first female CEOs were Catholic nuns — in the early 1900's, a time when women in the American workplace was uncommon, Catholic nuns founded over 800 hospitals in the United States. They also ran schools, colleges, and universities.

The concept of just-in-time delivery was invented by nuns running those hospitals.

The first health insurance company was founded in Missouri by nuns to care for railroad workers.

There are others. I once ran across a list of them on the intarwebs, but these are the ones that stuck with me.

yesenadam · on March 16, 2020

Fascinating, thanks. This all makes Sister Celine's contribution less surprising. I came across her in the classic A=B "about identities in general, and hypergeometric identities in particular, with emphasis on computer methods of discovery and proof." An amazing and wonderfully-written book.

https://www.math.upenn.edu/~wilf/AeqB.html

https://mathworld.wolfram.com/SisterCelinesMethod.html

https://en.wikipedia.org/wiki/Mary_Celine_Fasenmyer

nonesuchluck · on March 15, 2020

I feel like the line between computer user/operator and computer programmer used to be fuzzier. I've always wanted to better understand how and where that distinction has shifted over time. To this day, IBM calls mainframe operators, "systems programmers."

I suspect that shifting narrative may tell a chapter of the story of how women were gradually pushed out of our industry.

chrisfinazzo · on March 15, 2020

Talking to my parents, there was definitely a hierarchy between the two.

Although the programmers might have looked down on the operators as grunts, these people were in the privileged position of actually getting to interact with the machine directly, which is something.

You're pretty much right on re their exodus. Thompson's book, which I mentioned in my earlier comment, has a chapter called 'The ENIAC Girls Vanish'.

perl4ever · on March 15, 2020

In the 40s, I don't think computing was a thing like you're implying, kind of like nuclear reactors were a very very tiny area. The explosion of computer usage and programming was more like the late 50s, when Fortran came out. Even then, I don't think people even thought in terms of a "computing workforce". Nobody majored in computers, a programmer might be a math major or might not. Male engineers had female assistants that were more than typists, but not the same as a Google SWE today either. Nor the same personalities.

My source is my own family history and it makes me annoyed at the anachronistic assumptions and framework that people shoehorn historical tidbits into when discussing this topic.

By the way, how can you have 50% in the 60s and then a peak at 37% later?

morelisp · on March 15, 2020

50% is the workforce; 37% is for formal CS graduates. Most histories show unequal access to computing at home was a major factor discouraging woman, versus when any computing experience was on-the-job experience.

chrisfinazzo · on March 15, 2020

Clive Thompson's book "Coders" goes into this demographic shift in considerable depth - currently reading, would highly recommend. At least in those days, if you were reasonably smart and detail oriented, it was possible to find a way in.

Dad - a history major turned COBOL programmer and later, project manager - found his way into his prior company (and then stayed for 32 years) by this route.

He met Mom there too - also not uncommon for the time - who was a math major who decided that teaching wasn't the right fit for her.

samatman · on March 15, 2020

In the 40s, most computers were women.

This sounds really weird to modern ears but it's literally true. Computer was a job before it was a machine.

mhd · on March 15, 2020

"Workforce" can be a bit misleading though. If you look at Mad Men, the women were the majority of the ad agency workforce -- in the secretary pool.

An interesting statistic would be the gender ratio for people who published papers.

kristianp · on March 16, 2020

I'd be interested to see an example of how the pascal error messages worked.

saati · on March 15, 2020

Wirth is a man so who is he talking about wrt pascal?

atq2119 · on March 15, 2020

I'd suggest you go and actually read the article -- it's not about Pascal the language, but about a specific implementation of it.

saati · on March 15, 2020

I actually did, it did not answer my question, that's why I asked here.

kps · on March 15, 2020

    > pascal
    > 
    > The syntax diagnostics from the compiler made by Sue Graham's group at
    > Berkeley were the mmost helpful I have ever seen--and they were generated
    > automatically. At a syntax error the compiler would suggest a token that
    > could be inserted that would allow parsing to proceed further. No attempt
    > was made to explain what was wrong. The compiler taught me Pascal in
    > an evening, with no manual at hand.

TheDesolate0 · on March 15, 2020

I'm scrapping off some old brain cells here but I believe it's before the language (Pascal) and it was like a linter, but more like rustc's behavior.

kedean · on March 16, 2020

Pedantic but I think important clarification: McIlroy proposed the concept of linking programs together in a pipeline. Ken Thompson created the notation and implementation we all know.

tangue · on March 15, 2020

I didn't knew about typo. One surprising unix program I discovered this year is cal (or ncal). Having a calendar in your terminal is sometimes useful and I wish I knew earlier I could type things like ncal -w 2020

hinkley · on March 15, 2020

A similarly flavored one I’ve always appreciated is the man page for ascii, which shows the octal, decimal, and hex values for each character in the ASCII space.

Most unixes have one, although the format differs.

dotancohen · on March 16, 2020

How have I been googling ASCII codes for two decades with this right under my fingertips?!? Thank you!

aasasd · on March 15, 2020

Personally I prefer using the Mac app Alfred for things like that—basically a graphical one-shot terminal with autocompletion for a bunch of frequently-used stuff, in the vein of Spotlight. I whipped me up a script in Lua just so the calendar is faster than a readymade one in Python. However, Alfred needs to be bent somewhat to output content like a calendar in its suggestions.

butterthebuddha · on March 15, 2020

I would like to invite you to share your setup.

mekster · on March 16, 2020

You might like gcal better as it can show holidays too.

https://unix.stackexchange.com/questions/164555/how-to-empha...

pvaldes · on March 15, 2020

or cal -3 -m 3 2020

ja27 · on March 16, 2020

or cal -3 -m 9 1752

myroon5 · on March 18, 2020

3–13 September were skipped when the British Empire adopted the Gregorian calendar

https://en.wikipedia.org/wiki/1752

saagarjha · on March 15, 2020

And people say theoretical computer science isn’t useful in “the real world”…

I am curious about this one, though, has anyone used it?

> The syntax diagnostics from the compiler made by Sue Graham's group at Berkeley were the mmost helpful I have ever seen--and they were generated automatically. At a syntax error the compiler would suggest a token that could be inserted that would allow parsing to proceed further. No attempt was made to explain what was wrong.

On the surface it sounds a lot like it would produce error messages like “expected ‘;’” that most beginner programmers come to hate: was it any better than this, or was that the extent of its intelligence and everything else at the time was even worse?

thaumasiotes · on March 15, 2020

> On the surface it sounds a lot like it would produce error messages like “expected ‘;’” that most beginner programmers come to hate

Do people really come to hate these? I'd expect the opposite -- that people would start off hating messages like "expected ';'", but fairly quickly become accustomed to what they almost always mean.

As long as you can look at the message and have a good idea of what's wrong, it's not a bad message.

saagarjha · on March 15, 2020

The issue is that the solution to errors like these is often not adding a semicolon, but something else like “the compiler has no idea what is going on in this line” and the actual problem can range from something like an unbalanced delimiter, a misspelled keyword, or even a “syntactically valid-looking” (that a stupid parser, like a code highlighter or formatter, would approve of) but ultimately subtly illegal construct. With practice is usually becomes fairly easy to figure out where the actual error is, but it’s certainly not a very good experience for beginners. (And they’ll “come to hate it” because they’ll see it more than a few times before understanding how to deal with them.)

dottedmag · on March 15, 2020

Pascal grammar is sufficiently different from C-alikes, so it might explain why for Pascal source code these messages are more meaningful.

nicoburns · on March 15, 2020

Once you've used a compiler like Rust or Elm that actually provides suggestions for common solutions to these errors (effectively building the tribal knowledge of what the error "really means" into the compiler itself), it's hard to tolerate these cryptic errors that only really make sense to machines.

loeg · on March 15, 2020

I often found Rust's errors completely confusing, even after chasing down the '--explain CrypticNNNNN' follow-up explainer. This was in 2019 — not some ancient version of Rust.

saagarjha · on March 15, 2020

Yeah, Rust’s compiler errors are decent if you make simple mistakes but degrade to being about as bad as any other modern compiler’s once you start doing complicated things. Which isn’t horrible, but --explain isn’t really useful so it’s just wasting space on my screen.

steveklabnik · on March 15, 2020

Please file bugs for any message that is confusing! We track them like any other bug, and there’s some folks actively working on them.

saagarjha · on March 15, 2020

Hmm, I wouldn’t call them confusing per se, they’re just not useful, and I don’t think any compiler has really solved this problem (but then again, generation of compiler error messages is not something I’m an expert in). Let’s say I forgot to put a “*” in front of something: the compiler’s error might be something like “xyz does not implement SomeTrait, here is a page explaining what traits are”. I’d be more than happy to file bugs for things like these but I have generally refrained from doing so because I am unsure if this is something that is possible to fix. If you’d like, I could file issues for things like this, but I’m genuinely curious to hear if there’s any strategies on improving these or work done in this area.

steveklabnik · on March 15, 2020

Let us determine if it's possible or not. The person who currently works on errors is of the opinion that any time the error isn't useful, it's a bug.

> I’m genuinely curious to hear if there’s any strategies on improving these or work done in this area.

It's just a ton of work. You look at what the compiler produces, look at what information you can have at that point, see if you can craft something better, and then ship. And then look at the next error.

loeg · on March 15, 2020

> Let us determine if it's possible or not. The person who currently works on errors is of the opinion that any time the error isn't useful, it's a bug.

That's a fantastic attitude and I really appreciate that someone is working towards that goal, thanks.

> It's just a ton of work. You look at what the compiler produces, look at what information you can have at that point, see if you can craft something better, and then ship. And then look at the next error.

Exactly. That person is a saint.

saagarjha · on March 15, 2020

I have no issue with filing bugs about errors I think aren’t great, aside from the fact that I might not be able to suggest anything better…I’ll try it out the next time I see something.

loeg · on March 15, 2020

I'll keep that in mind for future interaction with rustc, thanks. Just to double check, you are referring to Github issues on the rust-lang/rust repository, right?

steveklabnik · on March 15, 2020

I am!

jimhefferon · on March 15, 2020

I'd like to know more about how they do this. Do you know a good source?

projektfu · on March 15, 2020

They were frustrating in pascal because of its original ;-as-separator philosophy. Lightspeed/Think pascal would give you those errors and guide you to a compilable program, but it was still too easy to make that mistake in the first place.

On the other hand, a missing semicolon in Microsoft C would often give a litany of unrelated errors.

loeg · on March 15, 2020

Some versions of GCC, as well (like, mid 2000s 3.x or 4.x).

m_j_g · on March 15, 2020

Elm has syntax errors stylized in this manner, but I am not sure how much is only rephrasing of usual "expected" or something more elaborate

Someone · on March 15, 2020

I would guess the most useful part of that is that would allow parsing to proceed further.

I used an Algol compiler that had messages such as:

  Semicolon missing after end (inserted)

  Undeclared identifier ‘foo’ (assumed integer)

Both of these hugely improved the compiler output, as far fewer utterly useless error messages would be produced (yes, I know I didn’t declare ‘foo’. You told me so the previous 12 times I used it)

Parsing valid programs is easy, so are bailing out or going into the woods when encountering invalid syntax. Producing meaningful error messages for line 100 after having seen errors on lines 13, 42 and 78 can be fairly hard.

sitkack · on March 15, 2020

Couple this with ast and source similarity search (how this is enabled is tbd) would allow one to not only suggest how to complete the program via the compiler's analysis, but then also find code that was similar to your program across all source say in crates.io or on github.

Great learning tool, but also the ability to copy-pasta from terabytes of code ... (scurries off to do some analysis).

JdeBP · on March 15, 2020

MetaWare was producing a C/C++ compiler that did much the same thing, in the 1980s.

chris_st · on March 15, 2020

I played around with it a bit in college; in my experience, the "help" wasn't very helpful, although it was suggesting keywords and not semi-colons (though I might not have left any out). I don't remember any specific examples, but I think the thing was that it was suggesting syntactically correct things, but they seemed irrelevant compared to the semantics of what I was trying to do.

I think it's interesting that McIlroy was able to learn Pascal from it!

morelisp · on March 15, 2020

Probably the majority of input validation (compilers or otherwise) errors I see today are still "unexpected x", and I would still often prefer "expected y".

Compilers have gotten significantly better in the past couple years. But in the browser I'm using to write this comment is even worse than "unexpected x", since it gives me the type and not even the token:

    var x = 1 console.log(x + 1)
    SyntaxError: unexpected token: identifier

chmaynard · on March 15, 2020

The author is THE Doug McIlroy. It's wonderful to learn that he's still around and spreading the good word.

https://en.wikipedia.org/wiki/Douglas_McIlroy

mjw1007 · on March 15, 2020

« Typo was as surprising inside as it was outside. Its similarity measure was based on trigram frequencies, which it counted in a 26x26x26 array. The small memory, which had barely room enough for 1-byte counters, spurred a scheme for squeezing large numbers into small counters. To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count. »

This sounds like something from the same family as hyperloglog

Wikipedia traces that back to the Flajolet–Martin algorithm in 1984. When would typo have been written?

mci · on March 15, 2020

You are confusing the approximate counting of distinct elements (done by ingenious algorithms like hyperloglog or Flajolet–Martin) with the approximate counting of each element from a manageable set (done by incrementing the counters less and less often as they grow).

saagarjha · on March 15, 2020

I believe that the paper backing the tool came out in the 70s, but if I ask IEEE for it it gives me back an awful PDF of one page that constrains a poor scan of the cover page of the paper and nothing else so I can’t confirm whether this idea was in it. Perhaps you might find more success: https://ieeexplore.ieee.org/abstract/document/6593963

DagAgren · on March 15, 2020

Probably not related.

Sounds like it's just doing something like replacing `counter++` with `if(rand() % counter == 0) counter++`, so that the counter will increase slower and slower the larger it gets.

morelisp · on March 15, 2020

Absolutely related! This is essentially the same observation that makes Flajolet-Martin and HyperLogLog work - that when comparing counts, the exact low bits of large numbers "matter less" than the low bits of small numbers, so you can store the logarithm of the count. They differ in how they calculate the "incremental log" without storing the real values, based on what they are counting (high-dimensional events vs. high-cardinality sets).

morelisp · on March 15, 2020

https://link.springer.com/article/10.1007%2FBF01934993 which is one of Flajolet's early papers on the topic opens with "Approximate counting is an algorithm proposed by R. Morris". My guess would be Morris wrote it out of engineering need, and Flajolet and Martin followed up with formal analysis and resulting improvements.

kqr · on March 16, 2020

Typo existed at least in Unix V5, with this source code: https://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/...

Unix V5 was released mid '70s, but as others have pointed out, counting is different from count distinct.

aasasd · on March 15, 2020

Seems to me like a variant of a counting Bloom filter.

morelisp · on March 15, 2020

Nope - counting bloom filters store an exact count of approximate events. This stores approximate counts of exact events.

If you count 5 "abc" and 5 "xyz" in a counting bloom filter, it will always say you had 10 events, but might say they were 10 of the same event.

If you count the same in Morris's structure, it will never confuse the two different sets, but might say one occurred 4 times and the other 8.

Of course, that means you can combine the two, for the benefits and downsides of both - storing very high (and inaccurate) counts of very sparse (and maybe misattributed) event sets.

adben · on March 15, 2020

How about GNU parallel? https://www.gnu.org/software/parallel/

nunoferreira · on March 15, 2020

wow! You just saved the future me thousands of hours.

saagarjha · on March 15, 2020

I hope you don’t mind the citation nags ;)

nunoferreira · on March 15, 2020

What about "comm" - compare two sorted files line by line. You can easily get occurrences only in file 1, in both files, only in file 2.

Super powerful and saved me hours of work.

pimlottc · on March 15, 2020

comm is a really useful tool, with one big caveat — you must make sure your input files are all sorted the exact same way. If not, you can get unexpected results, and worse, might not even realize it.

This may seem obvious, but there are many tiny ways that sorts can differ between locales, operating systems and programs (e.g. Excel), especially when dealing with Unicode. It may look the same 99% of the time, and you may not realize until later that you’ve accidentally filtered out values.

TomNomNom · on March 15, 2020

My advice is to sort the files just-in-time using the shell:

    comm <(sort fileA.txt) <(sort fileB.txt)

nunoferreira · on March 15, 2020

Absolutely! From my experience I only use with listings from the same source with the same sort tool (mostly unix sort).

Hello71 · on March 15, 2020

GNU comm prints a warning if either file is not sorted, unless all input lines are pairable.

zamadatix · on March 15, 2020

Comm is perfect for scripting usage but you might find diff better for human usage. Added bonus diff also does binary.

Plus diff was in part written by the author of the linked content :).

loeg · on March 15, 2020

Comm operates on sets. Diff is a patch generator. They serve different needs. They're both useful!

TheGrassyKnoll · on March 15, 2020

You might enjoy tkdiff

  sudo apt-get install tkdiff

beefbroccoli · on March 15, 2020

There's a very simple system tool that clicked on about 50 simultaneous lightbulbs in my brain after only 10 minutes of playing with it: mkfifo

ric2b · on March 22, 2020

The man page for it is awful, 0 explanation of what it actually does.

It allows you to create pipes as files! So you can do:

`echo 'hello world' > mypipe` on one terminal and `cat < mypipe` on another!

Very neat, I'm sure I'll find uses for it in the future.

ganzuul · on March 15, 2020

I learned about it through a plugin for irssi, which lets you have a list of users in an IRC channel in a tmux window partition.

ur-whale · on March 15, 2020

The fact that dc does (or at least tries to) guarantee error bounds on the result is news to me.

And if that does indeed work, that's pretty cool.

bloomer · on March 15, 2020

The default Android calculator app by Hans Boehm (developer of the Boehm Garbage Collector as well) does this by using the computable real numbers.

https://dl.acm.org/doi/10.1145/2911981

Provides a good overview of how it works and perms website has more information.

What’s cool about the computable reals implementation is you can increase the precision after the fact and it will recalculate up to that precision. Basically it memoizes the steps of the calculation and how they affect the precision.

nn3 · on March 15, 2020

I doubt the modern GNU or BSD versions of it that you are likely using do. Noone uses the original anymore.

swixmix · on March 15, 2020

Is scale factor the same as error bounds in http://man.openbsd.org/dc ?

nn3 · on March 15, 2020

I believe it's just the number of digits when the printing cuts off

kmstout · on March 15, 2020

sl

```

                          (  ) (@@) ( )  (@)  ()    @@    O     @     O     @      O
                     (@@@)
                 (    )
              (@@@@)

            (   )
         ====        ________                ___________
     _D _|  |_______/        \__I_I_____===__|_________|
      |(_)---  |   H\________/ |   |        =|___ ___|      _________________
      /     |  |   H  |  |     |   |         ||_| |_||     _|                \_____A
     |      |  |   H  |__--------------------| [___] |   =|                        |
     | ________|___H__/__|_____/[][]~\_______|       |   -|                        |
     |/ |   |-----------I_____I [][] []  D   |=======|____|________________________|_
   __/ =| o |=-O=====O=====O=====O \ ____Y___________|__|__________________________|_
    |/-=|___|=    ||    ||    ||    |_____/~\___/          |_D__D__D_|  |_D__D__D_|
     \_/      \__/  \__/  \__/  \__/      \_/               \_/   \_/    \_/   \_/

```

elteto · on March 15, 2020

During college my friend and I kept an innocent prank going for a couple of years: every time one of us left our laptops unlocked the other would jump in and type 'alias ls=sl' in the prompt and then clear the screen. Good times.

Izkata · on March 16, 2020

Put it in their bashrc ;)

brunoff · on March 16, 2020

  $ cowsay "hey dude"
   __________
  < hey dude >
   ----------
          \   ^__^
           \  (oo)\_______
              (__)\       )\/\
                  ||----w |
                  ||     ||

tomcatfish · on March 16, 2020

In your .bashrc (or whatever config file is run whenever you open up a new terminal).

fortune | cowsay

And voilà, you have a little quote running in your cow friend whenever you open up your terminal.

Also: Does anyone have any more good fortune files? I only have the fortunes that came preinstalled on Ubuntu but would love to have more.

kmstout · on March 17, 2020

I like cowsay (especially cowsay -f dragon) for punching up important git commit messages.

morelisp · on March 15, 2020

> struct - Brenda Baker undertook her Fortan-to-Ratfor converter against the advice of her department head--me. I thought it would likely produce an ad hoc reordering of the orginal, freed of statement numbers, but otherwise no more readable than a properly indented Fortran program. Brenda proved me wrong. She discovered that every Fortran program has a canonically structured form. Programmers preferred the canonicalized form to what they had originally written.

We could've had prettier et al instead of style linters 40(+?) years ago. :(

qubex · on March 15, 2020

I had to look up ‘Ratfor’ because I’d never heard of it — apparently it’s a FORTRAN preprocessor that added control structures.

msla · on March 15, 2020

https://en.wikipedia.org/wiki/Ratfor

The original Ratfor brings FORTRAN 66 nearly up to the level of a respectable programming language.

It turns this:

    if (a > b) {
      max = a
    } else {
      max = b
    }

Into this:

      IF(.NOT.(A.GT.B))GOTO 1
      MAX = A
      GOTO 2
    1 CONTINUE
      MAX = B
    2 CONTINUE

... with proper columnization, of course.

Going the opposite direction is pretty miraculous to me.

Ratfiv is the follow-on, which did the same to FORTRAN 77. However, FORTRAN 77 had control structures beyond the conditional GOTO, so Ratfiv was somewhat less necessary.

FORTRAN 77 would look like this:

      IF (A .GT. B) THEN
        MAX = A
      ELSE
        MAX = B
      ENDIF

https://en.wikipedia.org/wiki/Ratfiv

davidwihl · on March 15, 2020

One of the books that most influenced my coding was Software Tools by Brian W. Kernighan, P.J. Plauger [0]. Even though I never used Ratfor, the clear descriptions were immensely useful.

[0] https://www.goodreads.com/book/show/515603.Software_Tools

qubex · on March 15, 2020

I have that book. I have read snippets of it. Evidently I should read all the way through it.

mhd · on March 15, 2020

As usual, the original paper is paywalled, but it appears that this is about transforming ancient Fortran from GOTOs to structured control-flow (if-then, loops etc.).

That has almost nothing to do with the spaces-and-braces nitpicking of prettier/gofmt etc.

morelisp · on March 15, 2020

Not "almost nothing to do" - putting programs into a readable normal form seems the natural evolution of these tools.

When I started programming in the 90s, "spaces-and-braces" checking - as you say, nitpicking - was basically all we had, along with limited automatic tools to fix them (all more or less as good as `M-x indent-region`). If you were lucky and in a widely-used language you could cobble together compiler warnings, lint, and a few other tools to also get warnings about legacy interfaces (gets), dangerous practices (ignoring error codes), and unusual structure (shadowed variables, loop conditions that seemed impossible). Today we finally have considerably better tools that don't just check if you match a style guide but do a full reformat (not nitpicking, but doing it for you) and linters that can enforce 'deeper' structural demands, sometimes with automatic fixes.

But 40 years ago we had tools to completely restructure programs to a normalized form, and the practical experience to know programmers found this form preferable! And like so many things in our field, 10-20 years later we had to rediscover it, painfully, all over again. Probably because today's programmers think source-to-source Fortran/Ratfor translation has "almost nothing to do" with the challenges facing them today.

TheDesolate0 · on March 15, 2020

I _still_ do this today with new (to me) code bases!

By doing this I really read the code, and really get to understand what the previous programmer was doing.

Also something I took from Asimov's foundation series, code that doesn't look right, doesn't run right.

I know, not really; compiler gives no fucks, but I'm not a compiler however, and GCC error messages (Clang too!) are still about as useful as a hot bikini wax is to a walrus.

This was one of the features that made me really fall in love with Emacs back in the day! I could set it up to force my style requirements, then even yanked (pasted) code would be proper (mostly) and I couldn't fat finger my code to death.

My only emacs complaint is lisp. I get it, I just don't like it. I'll take fortran 77 over lisp any day (not 66 or before, tho. I'm not that crazy). So, sorry mr(s) moar lisp.

jawilson · on March 15, 2020

I've written a few useful scripts that everyone should have.

histogram - simply counts each occurrence of a line and then outputs from highest to lowest. I've implemented this program in several different languages for learning purposes. There are practical tricks that one can apply, such as hashing any line longer than the hash itself.

unique - like uniq but doesn't need to have sorted input! again, one can simply hash very long lines to save memory.

datetimes - looks for numbers that might be dates (seconds or milliseconds in certain reasonable ranges) and adds the human readable version of the date as comments to the end of the line they appear in. This is probably my most used script (I work with protocol buffers that often store dates as int64s).

human - reformats numbers into either powers of 2 or powers of 10. inspired obviously by the -h and -H flags from df.

I'm sure I have a few more but if I can't remember them from the top of my head, then they clearly aren't quite as generally useful.

Anyone else have some useful scripts like these?

jldugger · on March 15, 2020

> histogram

Is this much different than `alias histogram="sort $1 | uniq -c | uniq -nr"`

Sidenote: I started https://github.com/jldugger/moarutils as a means of publishing and sharing these, but it turns out I don't even have a lot of dumb ideas. Will probably end up bookmarking this HN post for "later."

nonesuchluck · on March 15, 2020

I work with csv files a lot. I have a short awk script which truncates/pads each column to a fixed width which I can specify at runtime. It also repeats the top column (headers) every 20 rows in a different ANSI color. I pipe the output to less -SR for interactive use so I can scan delimited data in a scrollable grid, with all columns aligned and labeled.

I understand there's vim plugins for this, but, ehh.

JdeBP · on March 16, 2020

There's also the likes of console-flat-table-viewer . One would have to convert the comma-separated stuff into one of the table types, but that's what Miller is for. (-:

* http://jdebp.uk./Softwares/nosh/guide/commands/console-flat-...

* http://johnkerl.org/miller/

mekster · on March 16, 2020

Where can I find them?

mkchoi212 · on March 15, 2020

“To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count.”

Stuff like this really makes me love what the pioneers of CS did in the past. In the past, they were counting every byte and every register while nowadays, programmers make things without considering the impact it will have on the HW.

londons_explore · on March 15, 2020

> The math library for Bob Morris's variable-precision desk calculator used backward error analysis to determine the precision necessary at each step to attain the user-specified precision of the result.

I wonder if compilers could do this today? If you can bound values for floating point operations, you might be able to replace them with fixed point equivalents and get a big speedup. You might also be able to replace them with ints or smaller floats if you can detect the result is rounded to an int.

CPU's also have the possibility to do this since they know (some of) the actual values at runtime, and could take shortcuts with floating point calculation in places where not needed for the result.

pavlov · on March 15, 2020

Replacing floats with fixed point isn’t usually a meaningful optimization on modern CPUs. The FPU runs in parallel to the integer units, so you can easily end up idling the FPU while the integer units are too busy doing both the math and the necessary state management (counters, pointer arithmetic etc.)

This could make sense for SIMD however, but then the problem is getting the array data in the right format before the computation — if you’re converting from float to int and back within the loop, it destroys any performance gain.

londons_explore · on March 15, 2020

Fixed point uses a lot less power though, and many use cases are effectively power limited rather than functional-unit limited, since if you really do fill all functional units on every cycle you'll soon need to throttle back your clock speed...

Perhaps a good example of that is video encoding, which is mostly fixed point, despite it looking like a pretty close fit for floating point maths.

pavlov · on March 15, 2020

A very good point. My worldview of performance is highly biased towards “full steam ahead” desktop graphics.

Video encoding is a bit of a special case though because the common algorithms are carefully designed for hardware acceleration. For most rendering, it doesn’t make sense to go out of your way to avoid the FPU.

bloomer · on March 15, 2020

This isn’t a performance optimization but rather an accuracy optimization. Even if the requested output is a double (64-bits) the intermediate calculations often need to be done to higher precision to get fully accurate answers. Note that the desktop calculator on Android does the same analysis by using computable numbers.

https://dl.acm.org/doi/10.1145/2911981

Is a nice overview.

nullc · on March 15, 2020

It's also a performance optimization though, since otherwise would might instead just use 400 digits of precision (or whatever) all the time, and round just the output.

bloomer · on March 15, 2020

Somewhat, in that it doesn’t use more precision than needed, but the real issue is you can’t just pick some arbitrarily large precision and round at the end. For some calculations, even 400 digits during intermediate steps would not be enough due to catastrophic cancellation and you would need to go even higher precision to get the right answer. It really is about solving an accuracy issue and not an optimization. And determining that you are using sufficient precision to get an accurate answer is an extra cost, so it is always more expensive than just plowing ahead and calculating an inaccurate answer.

nullc · on March 15, 2020

Presumably there is a maximum precision or otherwise a seemingly innocuous calculation could run you out of memory.

tannhaeuser · on March 15, 2020

What's surprising about eqn, dc, and egrep? I'm using the latter two all the time, and have used eqn (+troff/groff and even tbl and pic) in the 1990's for manuals and as late as (early) 2000's to typeset math-heavy course material. Not nearly as feature-rich as TeX/LaTeX, but much more approachable for casual math, with DSLs for typesetting equations, tables, and diagrams/graphs. I was delighted to see that GNU had a full suite of roff/troff drop-in replacements (which I later learned was implemented by James Clark, of SGML and, recently, Ballerina fame).

Mediterraneo10 · on March 15, 2020

I had never heard of eqn and was surprised to find that the binary is still there on my Linux box.

With regard to roff in general, when I got into Linux-based typesetting around the turn of the millennium, that was already seen as antiquated tech, superseded by LaTeX which was undergoing a frenzy of development and improvement around that time. So, anyone under the age of 30 will probably be hearing of such *roff stuff for the first time (and sadly even familiarity with LaTeX has waned).

tannhaeuser · on March 15, 2020

Ok I'm probably showing my age here then :) Back in the 1980 and 1990s, the roff suite, and most definitely egrep and classic Thompson DFA construction and DFA->NFA conversion was definitely Unix folklore/taught in Uni. Manpages are still rendered using roff/groff today, so probably many of us are using it regularly. Whereas GNU's texinfo has matured less well I'd say, or wasn't even very useful in practice to begin with due to lack of content.

I'm also using TeX/LaTex, but it's still a programming language whereas roff/eqn etc are non-Turing DSLs and renderers for particular narrow purposes. I get your point, but saying these are "antiquated" is like saying HTML is obsoleted by JavaScript.

burntsushi · on March 15, 2020

> and most definitely egrep and classic Thompson DFA construction and DFA->NFA conversion was definitely Unix folklore/taught in Uni

I think you mean "Thompson NFA construction" and "NFA->DFA."

Regardless though, this is not what the OP is pointing out. 'egrep' (or just GNU grep these days) is doing something more clever (emphasis mine):

> Al Aho expected his deterministic regular-expression recognizer would beat Ken's classic nondeterministic recognizer. Unfortunately, for single-shot use on complex regular expressions, Ken's could finish while egrep was still busy building a deterministic automaton. To finally gain the prize, Al sidestepped the curse of the automaton's exponentially big state table by inventing a way to build on the fly only the table entries that are actually visited during recognition.

Russ Cox talks about this a bit in part 3 of his articles on regex matching[1]. Its implementation in RE2 is here: https://github.com/google/re2/blob/master/re2/dfa.cc

[1] - https://swtch.com/~rsc/regexp/regexp3.html

tannhaeuser · on March 15, 2020

> I think you mean "Thompson NFA construction" and "NFA->DFA."

Yep, only noticed it later, then left it in to see who's paying attention :)

shakna · on March 15, 2020

> Manpages are still rendered using roff/groff today, so probably many of us are using it regularly.

I know a number of projects that generate their roff by using pandoc. They don't actually know, or have the inclination to learn, exactly how g/roff works.

saagarjha · on March 15, 2020

> Thompson DFA construction and DFA->NFA conversion

My (very recent) university education was unfortunately quite light on UNIX folklore, but this was converted in our formal automata course as we traversed the Chomsky hierarchy.

macintux · on March 15, 2020

You know who the author of the email is, right? He’s not using “surprising” in the sense of “I didn’t know this existed” but rather “these are quite amazing tools”.

mturmon · on March 16, 2020

I first met eqn in the mid 1980s.

It was amazing that you could write pretty complex math with just a few special literals like “sup” and “sum”, and the braces. It turns out that compositionality is so strong in math that it’s most of what you need. This isn’t obvious until you try it!

Paired with a LaserWriter (vintage 1986, say), and troff, you could get almost book quality typesetting.

Later on, TeX got the details of math much better, but the basic language was the same.

saagarjha · on March 15, 2020

The algorithms behind them.

ur-whale · on March 15, 2020

First time I hear of typo ... it's not on my standard Linux install ... where can I find the source code?

saagarjha · on March 15, 2020

It’s not quite the original, but Rob Pike wrote an implementation in Go: https://github.com/robpike/typo

TheDesolate0 · on March 15, 2020

In go? This is a job for rust!

...there goes my weekend.

tjalfi · on March 15, 2020

Typo was added in Research Unix V5[0] and also present in V6[1]. It isn't in V7, my guess is that it was replaced by spell. I don't think it would be difficult to get it compile on a modern system.

[0] https://github.com/dspinellis/unix-history-repo/blob/Researc...

[1] https://github.com/dspinellis/unix-history-repo/blob/Researc...

ruslan · on March 15, 2020

I would add bc to the list, very useful to make occasional calculations from command line using "human readable" syntax.

bonzini · on March 15, 2020

Fun fact, the first version of bc was just a frontend to dc. It converted the structured input to dc's stack-based form and let dc do the math.

ruslan · on March 15, 2020

Did not know that, thanks. I searched for dc inside bc and could find a reference to /usr/bin/dc, so I think bc still is just a wrapper.

% uname -a

FreeBSD skyrocket 9.3-RELEASE FreeBSD 9.3-RELEASE #1: Fri Nov 27 20:28:19 UTC 2015

bonzini · on March 15, 2020

GNU bc isn't though they share the bignum code, I am not surprised that the BSDs are following the older implementation more closely!

lcall · on March 17, 2020

I have found it useful to survey the existing unix utilities (maybe every several years). I'm no genius but I find things I will use. One way of course is simply to review the names in wherever your system stores manual pages, and read (or skim) those where you don't know what they do, trying out some things, or trying to remember at least where to look it up later when ready to use it. Another is by browsing to https://man.openbsd.org/ , then put a single period (".") in the search field, optionally choose a section (and/or other system, not sure how far the coverage goes), and click the apropos button.

jhoechtl · on March 15, 2020

Doug McIlroy is regularly active in the groff mailing list https://lists.gnu.org/archive/html/groff/

Torwald · on March 15, 2020

What does he man by "record structure in the file system" in re to Multics?

tenebrisalietum · on March 15, 2020

Unix files are simply a stream of bytes and outsource concern of file structure to userland. There's nowhere to set/get a type, no mechanism to create schema in the file like fields, lengths, constraints, etc. You simply can seek to a place in the file (if it's seekable) and read/write the bytes. What they mean is up to the programs/user/convention.

Earlier filesystems were trying much more to be like databases.

unused0 · on March 15, 2020

http://bitsavers.trailing-edge.com/pdf/honeywell/multics/AN5...

vladdoster · on March 15, 2020

Crabs seems likes a really cool program.

Here is a paper from Bell Labs

http://lucacardelli.name/Papers/Crabs.pdf

noisy_boy · on March 15, 2020

I didn't find egrep surprising - I use it quite often. The thing I didn't know about it was that it was Al Aho's creation. I only knew about him from awk.

yegle · on March 15, 2020

killall5 is the most bizarre command that I learned recently.

Read manpage before trying it.

nullc · on March 15, 2020

Sys5 killall: Bane of all regular linux administrators that also sometimes administered solaris boxes.

Once after blowing up an in production database server during the day, I suffered the unfortunate difficulty of having to explain why running a command "killall" on a critical server that killed everything was an innocent mistake and that I didn't have any reason to expect it to kill everything.

It's extremely difficult to not sound like a moron when explaining that you didn't expect "killall" to "kill all".

cat199 · on March 15, 2020

+1 - pgrep and pkill should be more widely 'taught' for this reason

NikkiA · on March 16, 2020

So it's functionally equivalent to `kill` with PID of -1, which is what we used to use back in the old days anyway. `kill -9 -1` should only kill your user processes if you're not root.

saagarjha · on March 15, 2020

killall5 was our favorite way to log off the machine in high school, at least until the (clearly incompetent) lab administrator removed its execute permissions because it had “kill” in its name.

smitty1e · on March 15, 2020

Hadn't heard of most of these.

The peoples' names were more recognizable.

winrid · on March 15, 2020

I found GNU parallel to be very useful/cool.

katharine7 · on March 16, 2020

sed awk tr egrep for processing making special greeting lol converting images

all are so exciting!!

TheDesolate0 · on March 15, 2020

sed & awk for life

pvaldes · on March 15, 2020

both rename and mmv are pretty handy

hyperpallium · on March 15, 2020

xargs parallelizes with -Pn

ur-whale · on March 15, 2020

[flagged]

DagAgren · on March 15, 2020

Gender doesn't matter, as long as you're male.