Hacker News new | past | comments | ask | show | jobs | submit login

Speaking of Emacs:

https://news.ycombinator.com/item?id=22849522

DonHopkins on April 12, 2020 | parent | context | favorite | on: Enemy AI: chasing a player without Navigation2D or...

James Gosling's Emacs screen redisplay algorithm also used similar "dynamic programming techniques" to compute the minimal cost path through a cost matrix of string edit operations (the costs depended i.e. on the number of characters to draw, length of the escape codes to insert/delete lines/characters, padding for slow terminals, etc). https://en.wikipedia.org/wiki/Gosling_Emacs

>Gosling Emacs was especially noteworthy because of the effective redisplay code, which used a dynamic programming technique to solve the classical string-to-string correction problem. The algorithm was quite sophisticated; that section of the source was headed by a skull-and-crossbones in ASCII art, warning any would-be improver that even if they thought they understood how the display code worked, they probably did not.

https://donhopkins.com/home/archive/emacs/skull-and-crossbon...

Trivia: That "Skull and Crossbones" ASCII art is originally from Brian Reid's Scribe program, and is not copyrighted.

https://donhopkins.com/home/archive/emacs/mw/display.c

    /*  1   2   3   4   ....            Each Mij represents the minumum cost of
          +---+---+---+---+-----        rearranging the first i lines to map onto
        1 |   |   |   |   |             the first j lines (the j direction
          +---+---+---+---+-----        represents the desired contents of a line,
        2 |   |  \| ^ |   |             i the current contents).  The algorithm
          +---+---\-|-+---+-----        used is a dynamic programming one, where
        3 |   | <-+Mij|   |             M[i,j] = min( M[i-1,j],
          +---+---+---+---+-----                      M[i,j-1]+redraw cost for j,2
        4 |   |   |   |   |                           M[i-1,j-1]+the cost of
          +---+---+---+---+-----                        converting line i to line j);
        . |   |   |   |   |             Line i can be converted to line j by either
        .                               just drawing j, or if they match, by moving
        .                               line i to line j (with insert/delete line)
     */
https://donhopkins.com/home/documents/EmacsRedisplayAlgorith...

https://dl.acm.org/doi/10.1145/1159890.806463

A redisplay algorithm, by James Gosling.

Abstract

This paper presents an algorithm for updating the image displayed on a conventional video terminal. It assumes that the terminal is capable of doing the usual insert/delete line and insert/delete character operations. It takes as input a description of the image currently on the screen and a description of the new image desired and produces a series of operations to do the desired transformation in a near-optimal manner. The algorithm is interesting because it applies results from the theoretical string-to-string correction problem (a generalization of the problem of finding a longest common subsequence), to a problem that is usually approached with crude ad-hoc techniques.

[...]

6. Conclusion

The redisplay algorithm described in this paper is used in an Emacs-like editor for Unix and a structure editor. It's performance has been quite good: to redraw everything on the screen (when everything has changed) takes about 0.12 seconds CPU time on a VAX 11/780 running Unix. Using the standard file typing program, about 0.025 seconds of CPU time are needed to type one screenful of text. Emacs averages about 0.004 CPU seconds per keystroke (with one call on the redisplay per keystroke).

Although in the interests of efficency we have stripped down algorithm 5 to algorithm 6 the result is still an algorithm which has a firm theoretical basis and which is superior to the usual ad-hoc approach.

carapace on April 12, 2020 [–]

I wonder how the ratio of display overhead to cost to compute the minimal cost path has changed over the years? At some point it must be cheaper to do a "dumb" redraw if the display hardware is fast enough?

Also, it seems like this might also be useful for computing minimal diffs, eh?

DonHopkins on April 12, 2020 | parent [–]

It was definitely worth it if you had a 300 baud modem and a lightly loaded VAX mostly to yourself. But it became an overkill with higher baud rates and high speed networks.

It's based on the "string-to-string correction problem" (edit distance / Levenshtein distance), combined with "dynamic programming" (finding the best cost path through a cost matrix), which is (kind of) how diff works, on a line-by-line basis for inserted and deleted lines, and then on a character-by-character basis between changed lines. Each "edit" has a "cost" determined by the number of characters you have to send to the terminal (length of the escape codes to perform an edit in multiple possible ways, versus just redrawing the characters). Emacs computes its way through a twisty maze of little escape codes each time it updates the screen.

H. M. Wagner and M. J. Fischer. "The string-to-string correction problem." JACM 21, 1 (January 1974), 168-173

Citation: https://dl.acm.org/doi/10.1145/321796.321811

PDF: https://dl.acm.org/doi/pdf/10.1145/321796.321811?download=tr...

https://en.wikipedia.org/wiki/String-to-string_correction_pr...

>In computer science, the string-to-string correction problem refers to determining the minimum number of edit operations necessary to change one string into another (i.e., computing the shortest edit distance). A single edit operation may be changing a single symbol of the string into another, deleting, or inserting a symbol. The length of the edit sequence provides a measure of the distance between the two strings.

https://en.wikipedia.org/wiki/Diff

>In computing, the diff utility is a data comparison tool that calculates and displays the differences between two files. Unlike edit distance notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other. The diff command displays the changes made in a standard format, such that both humans and machines can understand the changes and apply them: given one file and the changes, the other file can be created.

https://en.wikipedia.org/wiki/Levenshtein_distance

>In information theory, linguistics and computer science, the Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. It is named after the Soviet mathematician Vladimir Levenshtein, who considered this distance in 1965.

https://en.wikipedia.org/wiki/Edit_distance

>In computational linguistics and computer science, edit distance is a way of quantifying how dissimilar two strings (e.g., words) are to one another by counting the minimum number of operations required to transform one string into the other. Edit distances find applications in natural language processing, where automatic spelling correction can determine candidate corrections for a misspelled word by selecting words from a dictionary that have a low distance to the word in question. In bioinformatics, it can be used to quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T.

Here's a Levenshtein demo that shows you the cost matrix with the shortest path highlighted. Try calculating the distance between "fotomat" and "tomato", and you can see the shortest path which has a cost of 3 edits (1: delete f at beginning, 2: delete o at beginning, 3: insert o at end):

http://www.let.rug.nl/kleiweg/lev/

>Levenshtein distance is obtained by finding the cheapest way to transform one string into another. Transformations are the one-step operations of (single-phone) insertion, deletion and substitution. In the simplest versions substitutions cost two units except when the source and target are identical, in which case the cost is zero. Insertions and deletions costs half that of substitutions. This demonstration illustrates a simple algorithm which basically looks at all of the different ways for operations to transform one string to another.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: