paste allowed me to interleave two streams or to split a single stream into two columns. I'd been writing custom scripting monstrosities before I discovered paste:
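A minimal sketch of both uses (the file names are made up):

seq 8 | paste - -                 # split one stream into two columns
paste -d '\n' odd.txt even.txt    # interleave two files line by line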
Recently I found myself wanting to stop a program after a certain timeout, thinking "surely there's a UNIX program to do this in shell?" There is. And it's called... brace yourself... "timeout".
20+ years in UNIX, never encountered it. Part of GNU coreutils.
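Usage is as simple as it sounds; a quick sketch (the command name is a placeholder) that gives something 30 seconds, sends SIGTERM, and follows up with SIGKILL 5 seconds later if it still hasn't died:

timeout -k 5 30 ./flaky-command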
If you're using systemd timers, you get this behavior for free because those timers only activate service units. If the service is already active, nothing happens.
(Of course, this can be surprising in a different way when you expect a new job to start, but it does not because a previous one is still lingering. Not saying it's better or worse, just pointing out for those who don't know.)
I didn't include it because in my experience people don't overlook seq, whereas they do overlook jot. I only included things where I've encountered people overlooking such tools and doing things the hard way.
Do I remember correctly that “jot” was also the name of a GUI text editor on Unix systems (different from the program referred to in the parent comment)? I think I used it in the early '90s on an SGI workstation running Sun Solaris.
I'm going through the AWK book and join is in there. I was unhappily surprised to find that using a tab as a delimiter is painful with join.
Variations to overcome this which I've bumped into:
Given that the delimiter must be a single character and the intended use is in shells, I am equally mystified as to why control codes must be passed literally. Join is smart enough to reject 2 character delimiters, but could easily grok escapes if it so chose:
$ join -t '\t'
join: illegal tab character specification
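The usual workarounds are to hand join a literal tab yourself (file names here are made up):

$ join -t "$(printf '\t')" a.txt b.txt    # portable: command substitution yields a real tab
$ join -t $'\t' a.txt b.txt               # bash/zsh/ksh ANSI-C quoting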
See https://unix.stackexchange.com/a/65819/5132 for some thinking on making every individual program have an escape sequence parser, and why that syntax is not specific nor original to the Bourne Again shell.
Found a link in your link to something by a Sven Mascheck and found out there is a printf command. Despite having started using Linux in 1997, I still find new things.
> Hidden inside WWB (writer's workbench), Lorinda Cherry's Parts annotated English text with parts of speech, based on only a smidgen of English vocabulary, orthography, and grammar.
Writer's Workbench was indeed a marvel of 1970's limited-space engineering. You can see it for yourself [1]: the generic part-of-speech rules are in end.l, the exceptions in edict.c and ydict.c, and the part-of-speech disambiguator in pscan.c. Such compact, rule-based NLP has fallen out of favor these days but (shameless plug alert!) Writer's Workbench inspired my 2018 IOCCC entry that highlights passive constructions in English texts [2].
Do you know where I could find the source for all of these?
I'd be interested to "revive" these utils, possibly rewriting them in Python or Bash for easy hacking. I have some basic scripts for that, and they are already proving to be useful even though they simply call grep: https://github.com/ivanistheone/writing_scripts
It's very interesting that the readme calls shell scripts "runcom", an archaic name coming from the Compatible Time Sharing System. This is also the origin of rc files.
One of the useful applications of trigram-based analysis I have done is the following: for a large web-based application form where about 200000 online applications were made, we had to filter out the dummy applications - often, people would try out the interface using "aaa" as a name, for example.
Since the names were mostly Indian, we did not even have a standard database of names to test against.
What we did was the following: go through the entire database of all applications, and build a trigram frequency table. Then, using that trigram table, do a second pass over the database of names to find names with anomalous trigrams - if the percentage of anomalous trigrams in a name was too high (if the name was long enough), or the absolute number of anomalous trigrams was too high (if the name was short), we flagged the application and examined it manually. Using this alone, we were able to filter out a large number of dummy application forms.
Of course, it is not a comprehensive tool since what forms a valid name is very vague, but I think this kind of a tool is useful and culture-neutral.
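The core of it fits in a short awk sketch (not the code we used; the names.txt input, the rarity threshold, and the 50% cutoff are all made up):

awk 'NR == FNR {                                  # pass 1: count trigrams over all names
       n = tolower($0)
       for (i = 1; i <= length(n) - 2; i++) { freq[substr(n, i, 3)]++; total++ }
       next
     }
     {                                            # pass 2: flag names with rare trigrams
       n = tolower($0); rare = 0; seen = 0
       for (i = 1; i <= length(n) - 2; i++) {
         seen++
         if (freq[substr(n, i, 3)] < total / 50000) rare++   # rarity threshold is a guess
       }
       if (seen > 0 && rare / seen > 0.5) print "suspect:", $0
     }' names.txt names.txt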
The problem with these methods is that you exclude everybody that deviates from the norm. Yes, it might make your life (as a developer) a little bit easier, but it makes the lives of some of the applicants a lot harder.
These are high school students in India. Many of them are from rural backgrounds and do not have personal email accounts. From our experience, the forms are many times filled up by employees at cyber cafes who fill their own email addresses and mobile numbers instead of the students.
The only reliable means of communication back to the students is a government-approved website, newspapers, or official media. (The process also has to stand up in court in case some student says that (s)he did not get the communication, and newspaper ads are documentable evidence of communication on a specified date.)
(BTW, the forms do have captchas, the spurious forms are manually filled in by mischievous/malicious/curious applicants.)
Often not in practice. These sorts of tools for triaging input data for human review generally mean the humans will just auto-approve whatever the computer tells them to.
Yes, they'll catch some extreme cases, but anything bad that looks like a normal case will generally slip by unnoticed, more so than if a human were doing all of the review work.
In the 40s, computing was seen as primarily women's work (similar to the stereotype of switchboard operators). Into the 60s, women still comprised up to half of the computing workforce. In 1984, they peaked at 37%. So demographically speaking, the ratio was not as bad as it is today.
> Into the 60s, women still comprised up to half of the computing workforce.
My life experience corroborates this.
When I was in school, girls were taught to type, and boys weren't. Because of that, many of the girls from my school went into computing, while the boys went to more "manly" pursuits. It's also why Catholic nuns were over-represented in early computing.
Highly educated, and able to type. The first woman to earn a PhD in Computer Science was a nun in Ohio.
Catholic nuns have done much for the business world that is overlooked.
From memory, so no citations:
The first female CEOs were Catholic nuns. In the early 1900s, a time when women in the American workplace were uncommon, Catholic nuns founded over 800 hospitals in the United States. They also ran schools, colleges, and universities.
The concept of just-in-time delivery was invented by nuns running those hospitals.
The first health insurance company was founded in Missouri by nuns to care for railroad workers.
There are others. I once ran across a list of them on the intarwebs, but these are the ones that stuck with me.
Fascinating, thanks. This all makes Sister Celine's contribution less surprising. I came across her in the classic A=B "about identities in general, and hypergeometric identities in particular, with emphasis on computer methods of discovery and proof." An amazing and wonderfully-written book.
I feel like the line between computer user/operator and computer programmer used to be fuzzier. I've always wanted to better understand how and where that distinction has shifted over time. To this day, IBM calls mainframe operators “systems programmers.”
I suspect that shifting narrative may tell a chapter of the story of how women were gradually pushed out of our industry.
Talking to my parents, there was definitely a hierarchy between the two.
Although the programmers might have looked down on the operators as grunts, these people were in the privileged position of actually getting to interact with the machine directly, which is something.
You're pretty much right on re their exodus. Thompson's book, which I mentioned in my earlier comment, has a chapter called 'The ENIAC Girls Vanish'.
In the 40s, I don't think computing was a thing in the sense you're implying; like nuclear reactors, it was a very, very tiny area. The explosion of computer usage and programming came more in the late 50s, when Fortran came out. Even then, I don't think people thought in terms of a "computing workforce". Nobody majored in computers; a programmer might be a math major or might not. Male engineers had female assistants who were more than typists, but not the same as a Google SWE today either. Nor the same personalities.
My source is my own family history and it makes me annoyed at the anachronistic assumptions and framework that people shoehorn historical tidbits into when discussing this topic.
By the way, how can you have 50% in the 60s and then a peak at 37% later?
The 50% figure is of the whole computing workforce; the 37% is of formal CS graduates. Most histories show that unequal access to computing at home was a major factor discouraging women, compared with earlier eras when any computing experience was gained on the job.
Clive Thompson's book "Coders" goes into this demographic shift in considerable depth - currently reading, would highly recommend. At least in those days, if you were reasonably smart and detail oriented, it was possible to find a way in.
Dad - a history major turned COBOL programmer and later, project manager - found his way into his prior company (and then stayed for 32 years) by this route.
He met Mom there too - also not uncommon for the time - who was a math major who decided that teaching wasn't the right fit for her.
> pascal
>
> The syntax diagnostics from the compiler made by Sue Graham's group at
> Berkeley were the most helpful I have ever seen--and they were generated
> automatically. At a syntax error the compiler would suggest a token that
> could be inserted that would allow parsing to proceed further. No attempt
> was made to explain what was wrong. The compiler taught me Pascal in
> an evening, with no manual at hand.
Pedantic but I think important clarification: McIlroy proposed the concept of linking programs together in a pipeline. Ken Thompson created the notation and implementation we all know.
I didn't know about typo. One surprising Unix program I discovered this year is cal (or ncal). Having a calendar in your terminal is sometimes useful, and I wish I'd known earlier that I could type things like ncal -w 2020
A similarly flavored one I’ve always appreciated is the man page for ascii, which shows the octal, decimal, and hex values for each character in the ASCII space.
Most unixes have one, although the format differs.
Personally, I prefer using the Mac app Alfred for things like that: basically a graphical one-shot terminal with autocompletion for a bunch of frequently used stuff, in the vein of Spotlight. I whipped up a script in Lua just so the calendar comes up faster than a ready-made one in Python would. However, Alfred needs to be bent somewhat to output content like a calendar in its suggestions.
And people say theoretical computer science isn’t useful in “the real world”…
I am curious about this one, though, has anyone used it?
> The syntax diagnostics from the compiler made by Sue Graham's group at Berkeley were the most helpful I have ever seen--and they were generated automatically. At a syntax error the compiler would suggest a token that could be inserted that would allow parsing to proceed further. No attempt was made to explain what was wrong.
On the surface it sounds a lot like it would produce error messages like “expected ‘;’” that most beginner programmers come to hate: was it any better than this, or was that the extent of its intelligence and everything else at the time was even worse?
> On the surface it sounds a lot like it would produce error messages like “expected ‘;’” that most beginner programmers come to hate
Do people really come to hate these? I'd expect the opposite -- that people would start off hating messages like "expected ';'", but fairly quickly become accustomed to what they almost always mean.
As long as you can look at the message and have a good idea of what's wrong, it's not a bad message.
The issue is that the solution to errors like these is often not adding a semicolon; the compiler really has no idea what is going on in that line, and the actual problem can range from an unbalanced delimiter or a misspelled keyword to a construct that looks syntactically valid (one that a naive parser, like a code highlighter or formatter, would approve of) but is subtly illegal. With practice it usually becomes fairly easy to figure out where the actual error is, but it’s certainly not a very good experience for beginners. (And they’ll “come to hate it” because they’ll see these errors more than a few times before understanding how to deal with them.)
Once you've used a compiler like Rust or Elm that actually provides suggestions for common solutions to these errors (effectively building the tribal knowledge of what the error "really means" into the compiler itself), it's hard to tolerate these cryptic errors that only really make sense to machines.
I often found Rust's errors completely confusing, even after chasing down the '--explain CrypticNNNNN' follow-up explainer. This was in 2019 — not some ancient version of Rust.
Yeah, Rust’s compiler errors are decent if you make simple mistakes but degrade to being about as bad as any other modern compiler’s once you start doing complicated things. Which isn’t horrible, but --explain isn’t really useful so it’s just wasting space on my screen.
Hmm, I wouldn’t call them confusing per se, they’re just not useful, and I don’t think any compiler has really solved this problem (but then again, generation of compiler error messages is not something I’m an expert in). Let’s say I forgot to put a “*” in front of something: the compiler’s error might be something like “xyz does not implement SomeTrait, here is a page explaining what traits are”. I’d be more than happy to file bugs for things like these but I have generally refrained from doing so because I am unsure if this is something that is possible to fix. If you’d like, I could file issues for things like this, but I’m genuinely curious to hear if there’s any strategies on improving these or work done in this area.
Let us determine if it's possible or not. The person who currently works on errors is of the opinion that any time the error isn't useful, it's a bug.
> I’m genuinely curious to hear if there’s any strategies on improving these or work done in this area.
It's just a ton of work. You look at what the compiler produces, look at what information you can have at that point, see if you can craft something better, and then ship. And then look at the next error.
> Let us determine if it's possible or not. The person who currently works on errors is of the opinion that any time the error isn't useful, it's a bug.
That's a fantastic attitude and I really appreciate that someone is working towards that goal, thanks.
> It's just a ton of work. You look at what the compiler produces, look at what information you can have at that point, see if you can craft something better, and then ship. And then look at the next error.
I have no issue with filing bugs about errors I think aren’t great, aside from the fact that I might not be able to suggest anything better…I’ll try it out the next time I see something.
I'll keep that in mind for future interaction with rustc, thanks. Just to double check, you are referring to Github issues on the rust-lang/rust repository, right?
They were frustrating in Pascal because of its original semicolon-as-separator philosophy. Lightspeed/THINK Pascal would give you those errors and guide you to a compilable program, but it was still too easy to make that mistake in the first place.
On the other hand, a missing semicolon in Microsoft C would often give a litany of unrelated errors.
I would guess the most useful part of that is that it would allow parsing to proceed further.
I used an Algol compiler that had messages such as:
Semicolon missing after end (inserted)
Undeclared identifier ‘foo’ (assumed integer)
Both of these hugely improved the compiler output, as far fewer utterly useless error messages would be produced (yes, I know I didn’t declare ‘foo’. You told me so the previous 12 times I used it).
Parsing valid programs is easy; so is bailing out or going off into the weeds when encountering invalid syntax. Producing meaningful error messages for line 100 after having seen errors on lines 13, 42, and 78 can be fairly hard.
Coupling this with AST and source similarity search (how this would be enabled is TBD) would allow one not only to suggest how to complete the program via the compiler's analysis, but also to find code similar to your program across all source, say in crates.io or on GitHub.
Great learning tool, but also the ability to copy-pasta from terabytes of code ... (scurries off to do some analysis).
I played around with it a bit in college; in my experience, the "help" wasn't very helpful, although it was suggesting keywords and not semi-colons (though I might not have left any out). I don't remember any specific examples, but I think the thing was that it was suggesting syntactically correct things, but they seemed irrelevant compared to the semantics of what I was trying to do.
I think it's interesting that McIlroy was able to learn Pascal from it!
Probably the majority of input validation (compilers or otherwise) errors I see today are still "unexpected x", and I would still often prefer "expected y".
Compilers have gotten significantly better in the past couple of years. But the browser I'm using to write this comment is even worse than "unexpected x", since it gives me the token type and not even the token itself:
var x = 1 console.log(x + 1)
SyntaxError: unexpected token: identifier
« Typo was as surprising inside as it was outside. Its similarity measure was based on trigram frequencies, which it counted in a 26x26x26 array. The small memory, which had barely room enough for 1-byte counters, spurred a scheme for squeezing large numbers into small counters. To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count. »
This sounds like something from the same family as hyperloglog
Wikipedia traces that back to the Flajolet–Martin algorithm in 1984. When would typo have been written?
You are confusing the approximate counting of distinct elements (done by ingenious algorithms like hyperloglog or Flajolet–Martin) with the approximate counting of each element from a manageable set (done by incrementing the counters less and less often as they grow).
I believe that the paper backing the tool came out in the 70s, but if I ask IEEE for it, it gives me back an awful PDF of one page that contains a poor scan of the cover page of the paper and nothing else, so I can’t confirm whether this idea was in it. Perhaps you might find more success: https://ieeexplore.ieee.org/abstract/document/6593963
Sounds like it's just doing something like replacing `counter++` with `if(rand() % counter == 0) counter++`, so that the counter will increase slower and slower the larger it gets.
Absolutely related! This is essentially the same observation that makes Flajolet-Martin and HyperLogLog work - that when comparing counts, the exact low bits of large numbers "matter less" than the low bits of small numbers, so you can store the logarithm of the count. They differ in how they calculate the "incremental log" without storing the real values, based on what they are counting (high-dimensional events vs. high-cardinality sets).
https://link.springer.com/article/10.1007%2FBF01934993 which is one of Flajolet's early papers on the topic opens with "Approximate counting is an algorithm proposed by R. Morris". My guess would be Morris wrote it out of engineering need, and Flajolet and Martin followed up with formal analysis and resulting improvements.
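For reference, the core of Morris's scheme is tiny. A sketch of the textbook formulation (increment with probability 1/2^c and report 2^c - 1), which may differ in detail from whatever typo actually did:

awk 'BEGIN {
       srand()
       c = 0
       for (i = 1; i <= 100000; i++)        # 100000 simulated events
         if (rand() < 1 / 2^c) c++          # bump the counter with probability 2^-c
       print "counter:", c, "estimated count:", 2^c - 1
     }'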
Nope - counting bloom filters store an exact count of approximate events. This stores approximate counts of exact events.
If you count 5 "abc" and 5 "xyz" in a counting bloom filter, it will always say you had 10 events, but might say they were 10 of the same event.
If you count the same in Morris's structure, it will never confuse the two different sets, but might say one occurred 4 times and the other 8.
Of course, that means you can combine the two, for the benefits and downsides of both - storing very high (and inaccurate) counts of very sparse (and maybe misattributed) event sets.
comm is a really useful tool, with one big caveat — you must make sure your input files are all sorted the exact same way. If not, you can get unexpected results, and worse, might not even realize it.
This may seem obvious, but there are many tiny ways that sorts can differ between locales, operating systems and programs (e.g. Excel), especially when dealing with Unicode. It may look the same 99% of the time, and you may not realize until later that you’ve accidentally filtered out values.
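The safest habit I know of is to force the same byte-order collation on both sides before comparing (file names are placeholders):

LC_ALL=C sort a.txt > a.sorted
LC_ALL=C sort b.txt > b.sorted
LC_ALL=C comm -23 a.sorted b.sorted    # lines only in a.txt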
Provides a good overview of how it works and perms website has more information.
What’s cool about the computable reals implementation is you can increase the precision after the fact and it will recalculate up to that precision. Basically it memoizes the steps of the calculation and how they affect the precision.
During college my friend and I kept an innocent prank going for a couple of years: every time one of us left our laptops unlocked the other would jump in and type 'alias ls=sl' in the prompt and then clear the screen. Good times.
> struct - Brenda Baker undertook her Fortran-to-Ratfor converter against the advice of her department head--me. I thought it would likely produce an ad hoc reordering of the original, freed of statement numbers, but otherwise no more readable than a properly indented Fortran program. Brenda proved me wrong. She discovered that every Fortran program has a canonically structured form. Programmers preferred the canonicalized form to what they had originally written.
We could've had prettier et al instead of style linters 40(+?) years ago. :(
The original Ratfor brings FORTRAN 66 nearly up to the level of a respectable programming language.
It turns this:
if (a > b) {
max = a
} else {
max = b
}
Into this:
IF(.NOT.(A.GT.B))GOTO 1
MAX = A
GOTO 2
1 CONTINUE
MAX = B
2 CONTINUE
... with proper columnization, of course.
Going the opposite direction is pretty miraculous to me.
Ratfiv is the follow-on, which did the same to FORTRAN 77. However, FORTRAN 77 had control structures beyond the conditional GOTO, so Ratfiv was somewhat less necessary.
One of the books that most influenced my coding was Software Tools by Brian W. Kernighan and P.J. Plauger [0]. Even though I never used Ratfor, the clear descriptions were immensely useful.
As usual, the original paper is paywalled, but it appears that this is about transforming ancient Fortran from GOTOs to structured control-flow (if-then, loops etc.).
That has almost nothing to do with the spaces-and-braces nitpicking of prettier/gofmt etc.
Not "almost nothing to do" - putting programs into a readable normal form seems the natural evolution of these tools.
When I started programming in the 90s, "spaces-and-braces" checking - as you say, nitpicking - was basically all we had, along with limited automatic tools to fix them (all more or less as good as `M-x indent-region`). If you were lucky and in a widely-used language you could cobble together compiler warnings, lint, and a few other tools to also get warnings about legacy interfaces (gets), dangerous practices (ignoring error codes), and unusual structure (shadowed variables, loop conditions that seemed impossible). Today we finally have considerably better tools that don't just check if you match a style guide but do a full reformat (not nitpicking, but doing it for you) and linters that can enforce 'deeper' structural demands, sometimes with automatic fixes.
But 40 years ago we had tools to completely restructure programs to a normalized form, and the practical experience to know programmers found this form preferable! And like so many things in our field, 10-20 years later we had to rediscover it, painfully, all over again. Probably because today's programmers think source-to-source Fortran/Ratfor translation has "almost nothing to do" with the challenges facing them today.
I _still_ do this today with new (to me) code bases!
By doing this I really read the code, and really get to understand what the previous programmer was doing.
Also, something I took from Asimov's Foundation series: code that doesn't look right doesn't run right.
I know, not really; the compiler gives no fucks. But I'm not a compiler, and GCC error messages (Clang too!) are still about as useful as a hot bikini wax is to a walrus.
This was one of the features that made me really fall in love with Emacs back in the day! I could set it up to force my style requirements, then even yanked (pasted) code would be proper (mostly) and I couldn't fat finger my code to death.
My only emacs complaint is lisp. I get it, I just don't like it. I'll take fortran 77 over lisp any day (not 66 or before, tho. I'm not that crazy). So, sorry mr(s) moar lisp.
I've written a few useful scripts that everyone should have.
histogram - simply counts each occurrence of a line and then outputs from highest to lowest. I've implemented this program in several different languages for learning purposes. There are practical tricks that one can apply, such as hashing any line longer than the hash itself.
unique - like uniq but doesn't need to have sorted input! again, one can simply hash very long lines to save memory.
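(The classic one-liner version of this, minus the hashing trick, is just an awk associative array; the file name is a placeholder:)

awk '!seen[$0]++' input.txt    # print each line only the first time it appears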
datetimes - looks for numbers that might be dates (seconds or milliseconds in certain reasonable ranges) and adds the human readable version of the date as comments to the end of the line they appear in. This is probably my most used script (I work with protocol buffers that often store dates as int64s).
human - reformats numbers into either powers of 2 or powers of 10. inspired obviously by the -h and -H flags from df.
I'm sure I have a few more, but if I can't remember them off the top of my head, then they clearly aren't quite as generally useful.
Is this much different than `histogram() { sort "$1" | uniq -c | sort -nr; }`?
Sidenote: I started https://github.com/jldugger/moarutils as a means of publishing and sharing these, but it turns out I don't even have a lot of dumb ideas. Will probably end up bookmarking this HN post for "later."
I work with CSV files a lot. I have a short awk script which truncates/pads each column to a fixed width which I can specify at runtime. It also repeats the top row (the headers) every 20 rows in a different ANSI color. I pipe the output to less -SR for interactive use so I can scan delimited data in a scrollable grid, with all columns aligned and labeled.
I understand there's vim plugins for this, but, ehh.
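Something along these lines, as a simplified sketch rather than the real script (the width, the color, and the 20-row period are arbitrary here, and it ignores quoted fields containing commas):

awk -F, -v w=15 '
  { line = ""
    for (i = 1; i <= NF; i++)
      line = line sprintf("%-" w "." w "s ", $i)      # pad/truncate each field to w chars
    if (NR == 1) { hdr = "\033[36m" line "\033[0m"; print hdr; next }
    if (NR > 2 && (NR - 2) % 20 == 0) print hdr       # repeat the header every 20 rows
    print line
  }' data.csv | less -SR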
There's also the likes of console-flat-table-viewer. One would have to convert the comma-separated stuff into one of the table types, but that's what Miller is for. (-:
“To avoid overflow, counters were updated probabilistically to maintain an estimate of the logarithm of the count.”
Stuff like this really makes me love what the pioneers of CS did. They counted every byte and every register, while nowadays programmers make things without considering the impact they will have on the hardware.
> The math library for Bob Morris's variable-precision desk calculator
> used backward error analysis to determine the precision necessary at
> each step to attain the user-specified precision of the result.
I wonder if compilers could do this today? If you can bound values for floating point operations, you might be able to replace them with fixed point equivalents and get a big speedup. You might also be able to replace them with ints or smaller floats if you can detect the result is rounded to an int.
CPUs could also do this, since they know (some of) the actual values at runtime, and could take shortcuts with floating-point calculations where full precision isn't needed for the result.
Replacing floats with fixed point isn’t usually a meaningful optimization on modern CPUs. The FPU runs in parallel to the integer units, so you can easily end up idling the FPU while the integer units are too busy doing both the math and the necessary state management (counters, pointer arithmetic etc.)
This could make sense for SIMD however, but then the problem is getting the array data in the right format before the computation — if you’re converting from float to int and back within the loop, it destroys any performance gain.
Fixed point uses a lot less power though, and many use cases are effectively power limited rather than functional-unit limited, since if you really do fill all functional units on every cycle you'll soon need to throttle back your clock speed...
Perhaps a good example of that is video encoding, which is mostly fixed point, despite it looking like a pretty close fit for floating point maths.
A very good point. My worldview of performance is highly biased towards “full steam ahead” desktop graphics.
Video encoding is a bit of a special case though because the common algorithms are carefully designed for hardware acceleration. For most rendering, it doesn’t make sense to go out of your way to avoid the FPU.
This isn’t a performance optimization but rather an accuracy optimization. Even if the requested output is a double (64-bits) the intermediate calculations often need to be done to higher precision to get fully accurate answers. Note that the desktop calculator on Android does the same analysis by using computable numbers.
It's also a performance optimization though, since otherwise one might instead just use 400 digits of precision (or whatever) all the time and round only the output.
Somewhat, in that it doesn’t use more precision than needed, but the real issue is you can’t just pick some arbitrarily large precision and round at the end. For some calculations, even 400 digits during intermediate steps would not be enough due to catastrophic cancellation and you would need to go even higher precision to get the right answer. It really is about solving an accuracy issue and not an optimization. And determining that you are using sufficient precision to get an accurate answer is an extra cost, so it is always more expensive than just plowing ahead and calculating an inaccurate answer.
What's surprising about eqn, dc, and egrep? I'm using the latter two all the time, and have used eqn (+troff/groff and even tbl and pic) in the 1990's for manuals and as late as (early) 2000's to typeset math-heavy course material. Not nearly as feature-rich as TeX/LaTeX, but much more approachable for casual math, with DSLs for typesetting equations, tables, and diagrams/graphs. I was delighted to see that GNU had a full suite of roff/troff drop-in replacements (which I later learned was implemented by James Clark, of SGML and, recently, Ballerina fame).
I had never heard of eqn and was surprised to find that the binary is still there on my Linux box.
With regard to roff in general, when I got into Linux-based typesetting around the turn of the millennium, that was already seen as antiquated tech, superseded by LaTeX which was undergoing a frenzy of development and improvement around that time. So, anyone under the age of 30 will probably be hearing of such *roff stuff for the first time (and sadly even familiarity with LaTeX has waned).
Ok, I'm probably showing my age here then :) Back in the 1980s and 1990s, the roff suite, and most definitely egrep and classic Thompson DFA construction and DFA->NFA conversion was definitely Unix folklore/taught in Uni. Manpages are still rendered using roff/groff today, so probably many of us are using it regularly. Whereas GNU's texinfo has matured less well I'd say, or wasn't even very useful in practice to begin with due to lack of content.
I'm also using TeX/LaTeX, but it's still a programming language, whereas roff/eqn etc. are non-Turing DSLs and renderers for particular narrow purposes. I get your point, but saying these are "antiquated" is like saying HTML is obsoleted by JavaScript.
> and most definitely egrep and classic Thompson DFA construction and DFA->NFA conversion was definitely Unix folklore/taught in Uni
I think you mean "Thompson NFA construction" and "NFA->DFA."
Regardless though, this is not what the OP is pointing out. 'egrep' (or just GNU grep these days) is doing something more clever (emphasis mine):
> Al Aho expected his deterministic regular-expression recognizer would beat Ken's classic nondeterministic recognizer. Unfortunately, for single-shot use on complex regular expressions, Ken's could finish while egrep was still busy building a deterministic automaton. To finally gain the prize, Al sidestepped the curse of the automaton's exponentially big state table by inventing a way to build on the fly only the table entries that are actually visited during recognition.
> Manpages are still rendered using roff/groff today, so probably many of us are using it regularly.
I know a number of projects that generate their roff by using pandoc. They don't actually know, or have the inclination to learn, exactly how g/roff works.
> Thompson DFA construction and DFA->NFA conversion
My (very recent) university education was unfortunately quite light on UNIX folklore, but this was covered in our formal automata course as we traversed the Chomsky hierarchy.
You know who the author of the email is, right? He’s not using “surprising” in the sense of “I didn’t know this existed” but rather “these are quite amazing tools”.
It was amazing that you could write pretty complex math with just a few special literals like “sup” and “sum”, and the braces. It turns out that compositionality is so strong in math that it’s most of what you need. This isn’t obvious until you try it!
Paired with a LaserWriter (vintage 1986, say), and troff, you could get almost book quality typesetting.
Later on, TeX got the details of math much better, but the basic language was the same.
Typo was added in Research Unix V5[0] and is also present in V6[1]. It isn't in V7; my guess is that it was replaced by spell. I don't think it would be difficult to get it to compile on a modern system.
I have found it useful to survey the existing unix utilities (maybe every several years). I'm no genius but I find things I will use. One way of course is simply to review the names in wherever your system stores manual pages, and read (or skim) those where you don't know what they do, trying out some things, or trying to remember at least where to look it up later when ready to use it. Another is by browsing to https://man.openbsd.org/ , then put a single period (".") in the search field, optionally choose a section (and/or other system, not sure how far the coverage goes), and click the apropos button.
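The same trick works locally on most systems, since apropos takes a regex and "." matches everything:

apropos . | less
man -k . | less    # equivalent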
Unix files are simply a stream of bytes, and concern for file structure is outsourced to userland. There's nowhere to set/get a type, no mechanism to create schema in the file like fields, lengths, constraints, etc. You can simply seek to a place in the file (if it's seekable) and read/write the bytes. What they mean is up to the programs/user/convention.
Earlier filesystems were trying much more to be like databases.
I didn't find egrep surprising - I use it quite often. The thing I didn't know about it was that it was Al Aho's creation. I only knew about him from awk.
SysV killall: bane of all regular Linux administrators who also sometimes administered Solaris boxes.
Once, after blowing up a production database server in the middle of the day, I had the unfortunate task of explaining why running a command called "killall" on a critical server was an innocent mistake and why I had no reason to expect it to kill everything.
It's extremely difficult to not sound like a moron when explaining that you didn't expect "killall" to "kill all".
So it's functionally equivalent to `kill` with PID of -1, which is what we used to use back in the old days anyway. `kill -9 -1` should only kill your user processes if you're not root.
killall5 was our favorite way to log off the machine in high school, at least until the (clearly incompetent) lab administrator removed its execute permissions because it had “kill” in its name.
I wonder what other unix gems I've been missing...