I just worked on a new diff algorithm for VS Code, in particular for the new merge editor (you can turn it on with "diffEditor.diffAlgorithm": "experimental").
The space efficient Myers diff algorithm has some problems when using it for merging, as tiny changes at the beginning of a file can cause a different path selection later on. Also, Myers doesn't minimize the number of hunks (only their total length), which sometimes leads to unnatural outcomes.
If anybody has good ideas for more advanced diffing algorithms, I'm all ears! (maybe even tree based ones, like difftastic)
It's already very difficult to objectively evaluate the quality of a diff or to see if improving the diff for a special case doesn't make the average case worse.
Given there is no "truth" (IE diff algorithms don't reproduce what you actually did, just how to transform one thing into the other), what is the goal you have for your diff algorithm?
Size of diff?
Speed of diff?
etc
That would be helpful in trying to give you alternative sources :)
The new algorithm better represents the change that the coder actually made.
More generally, I'd say that the point of diffing in something like VSCode is to try to aide the reviewer (for any number of types of reviewer - looking at merge conflicts etc) in evaluating a change.
It's a tool for human understanding, and from that point of view, text diffing is kinda barbaric (don't get me wrong, I love it and use it all day long). You can rename a variable using an LSP, knowing that it's effectively a safe no-op on the code. But once it comes to review the PR, that change is just additional noise mixed in with everything else.
I use Neovim, where I have treesitter. I look at the amazing information in there and long for the day when I'll be able to use its power for reviews. Something like I described above - show me a version of the change with stuff that doesn't matter, like renames, removed. Now show me a diff that's the inverse of that, so I can skim it. I mentioned in another comment that `--color-moved=dimmed-zebra` is a useful tool to help focus on what matters.
It is a very interesting idea to decompose a diff into two diffs - one part that a computer understands (like renames, formatting changes), and one part that a computer does not understand, but that is as small as possible!
In case of the diff viewer in VS Code, the goal is to present the changes to the user in a helpful way, so that the user quickly understands what changed.
In case of the merge editor, the diffs should reflect the actual edits as close as possible, so that non-conflicting edits cause non-conflicting diffs, which then allows a non-conflicting merge.
Diffing is hard and there are many subtleties (e.g. https://github.com/microsoft/vscode/issues/164370).
The space efficient Myers diff algorithm has some problems when using it for merging, as tiny changes at the beginning of a file can cause a different path selection later on. Also, Myers doesn't minimize the number of hunks (only their total length), which sometimes leads to unnatural outcomes.
If anybody has good ideas for more advanced diffing algorithms, I'm all ears! (maybe even tree based ones, like difftastic) It's already very difficult to objectively evaluate the quality of a diff or to see if improving the diff for a special case doesn't make the average case worse.
As for the visualization of Myers, I can recommend to have a look at https://blog.robertelder.org/diff-algorithm/.