Darke Files is a version control and file synchronization system (handmade.network)
94 points by zdw on April 24, 2023 | hide | past | favorite | 35 comments



I've tested a lot of file synchronization software. I rely on Unison via scripts; it was originally authored by the computer scientist Benjamin Pierce and has since seen decades of tweaking by a strong open source community.

https://github.com/bcpierce00/unison

I'd love to see Darke Files get everything right that Unison gets right, and that nearly all commercial projects get wrong through the usual blend of arrogance and ignorance:

* Metadata. It takes a lot to ensure that two copies of a macOS file appear identical to a user. There used to be a test suite on the web that embarrassed everybody.

* Atomic folders, such as ".git" or an application bundle. A prototypical example is a macOS sparse disk bundle, which is backed by a folder of many small files. This makes incremental backup and transfer more efficient, assuming a single source. Unison lets you specify conflict resolution at the folder level: all-or-nothing, decide which copy wins or fix it yourself.

* Symbolic links. This is wildly complicated by users who, sure they're right, want special handling to hack features into sync software that aren't there. A symbolic link is just a file, and its correct use is the responsibility of the user. You wouldn't want sync software stopping to view your porn, right? They're just files; they're not the sync software's business.

I use Dropbox for various purposes because I need to, but they bungle more of this than one would expect. For example, a typical macOS application bundle can have internal symbolic links a typical user never notices, pointing the "current" version of resources to a versioned folder. Last I checked, Dropbox expands such symbolic links into redundant copies, wasting space, though at least without kneecapping the app.

One could go on... I've given up on reporting these issues, though the responses would make a hilarious blog post.


Dev here.

Darke Files isn't available for macOS yet. I'll make sure to understand how macOS's file system works (especially the metadata) before working on a macOS port. If you can provide me with more info about that test suite, I would be very interested.

Right now the version history of a repository is saved in a ".darke" directory that works quite similarly to Git's ".git" directory. I've got a few experiments planned to improve on that system; how well it can be incrementally backed up is a metric I will also consider.

One of Darke Files' core principles is that it does not change your files at all. The same will be true for symbolic links. I'm not handling symbolic links quite yet; I've been bitten before by Git's inadequate handling of symbolic links on Windows.


Dropbox's handling of macOS metadata is the reason I have stuck with it over other sync services over the years. Syncthing recently added support, but I found the overall setup process remarkably obtuse.


It seems to be completely closed source.

I would think more than twice before introducing something closed-source into areas as important as version control or backups.

The problem is not the need to pay (closed-source freeware exists, and I regularly donate to open-source projects), but continuity.


'_Anecdotal_': "I once used a Chinese backup/restore program, because back then it was the only (pirated) copy I could find of this specific kind of 'warez' (closed-source software in an area as important as backups). After restoring some lost .txt files, I was surprised to find they had been 'translated' into traditional Chinese... As for what I got after translating them back again... you may 'wonder'" (-;


I'm going to take a guess that nothing was translated, but instead misinterpreted as UTF-16.
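That guess is easy to demonstrate: pairs of ASCII bytes reinterpreted as UTF-16 often land in the CJK ideograph range, so a plain-text file can look "translated into Chinese". A minimal Python sketch (the sample string here is illustrative):

```python
# Mojibake sketch: ASCII bytes misread as UTF-16-LE.
# Each pair of one-byte ASCII codes becomes a single two-byte code point,
# and many such pairs fall among CJK ideographs.
text = "hello "                      # an even number of ASCII bytes
raw = text.encode("ascii")           # 6 bytes
garbled = raw.decode("utf-16-le")    # 3 code points, mostly CJK-looking
print(garbled)

# The damage is reversible as long as nothing re-encoded the garbled text:
restored = garbled.encode("utf-16-le").decode("ascii")
assert restored == text
```

If the "translation" in the story round-trips like this, no actual translation ever happened, just a wrong charset guess.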


Handmade Network is closely tied to games, so this is probably closer to Perforce, which is heavily used in the video game industry.

It also follows Perforce's client/server architecture, supports large files, etc. I wish it were open source, but at least it's free for now.


When it comes to the inner workings, Darke Files is closer to Git than to Perforce. But when it comes to large files Darke Files is definitely taking its cues from Perforce. (Also I've got a few more Perforce-inspired features planned for the future.)


I'm a bit burned out on testing these different backup/file sync tools. I've used at least Unison, restic, rsync, Kopia, Syncthing, and rclone. What's becoming clear to me is that I don't like sync/backup solutions that run constantly in the background; the longer I use them, the less sure I become of how they work. Many of them also use opaque formats that aren't easy to manipulate with standard CLI tools. When I set up a new machine I have to figure out how they work all over again, which is surprisingly annoying.

Whereas with CLI tools like tar, rclone, and rsync, which I set up to run periodically from a simple bash script, I have a pretty good idea of how they work when I reinstall my system.


I can't tell what this actually is. Is it like git with syncthing?


The explanation on the website seems perfectly good. It's a system that can range from something like Dropbox (automatic sync of files with a cloud) to Git (version history, commits, manual pull/push, conflict resolution). I don't know for sure, but it seems you configure which features you want to activate during setup.

Does this help?


Dev here.

There actually aren't any features to activate; the goal of Darke Files is that different users of the same repository can work at different ends of this sync <-> version control spectrum.


I can’t tell what the use case is, or how it varies from git/git push.


> Regardless whether you're using small text files or huge binary files, Darke Files handles them all.

That is interesting to game developers and creatives (like movie VFX folks). They can have huge assets, and Git is pretty terrible at large binaries.

That's why Perforce is still popular in the gaming industry. It handles big files very well.


Does Git LFS have issues?


LFS can have issues; e.g., using HTTPS alongside SSH can be a problem. It's also not part of the core Git project, though neither of these things may matter to you. There are other caveats, but they go with the territory of the benefits you gain from LFS.

For me, it works fine in Gitea, and I do audio and music for games. Are there more appropriate solutions like Perforce? Probably, but the current pipeline is git-based.


Thanks, interesting to hear the perspective of someone who versions media in Git. Also going to check out your music!


Git LFS is very slow for lots of files. It is also inefficient space-wise and bandwidth-wise if you update big files.

The integration with Git is also very clunky since it's based on smudge/clean filters (that were really meant to update templates in your text files or change line endings, not download big files).

The server options are also limited and not easy to deploy. Not a problem if you use a central repository on the cloud, but a non-starter if you want direct sync between devices.
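For readers unfamiliar with the filter mechanism mentioned above, this is roughly how LFS wires itself into Git: a `.gitattributes` rule routes matching paths through filters that `git lfs install` registers in your Git config:

```
# .gitattributes -- route *.psd through the LFS filter instead of storing it in Git
*.psd filter=lfs diff=lfs merge=lfs -text

# entries that `git lfs install` adds to ~/.gitconfig
[filter "lfs"]
    clean = git-lfs clean -- %f        # on check-in: replace the file with a pointer
    smudge = git-lfs smudge -- %f      # on checkout: replace the pointer with the file
    process = git-lfs filter-process   # long-running batch variant of the above
    required = true
```

So every large-file operation rides on a hook that was designed for things like line-ending normalization, which is exactly the clunkiness being described.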


> The integration with Git is also very clunky since it's based on smudge/clean filters (that were really meant to update templates in your text files or change line endings, not download big files).

There is no better integration though, right? I'm implementing an alternative to Git LFS because I also don't like it - but my past research led me nowhere. Git just doesn't have good support for this, and all we can do is smudge or batch smudge.

IIRC there's some nuance to how Git Annex approaches this problem space, but fundamentally I think everyone does basically the same thing. No?

If there's a better way I'd love to know, since I'm literally replacing/reimplementing Git LFS for my own pipeline.


> There is no better integration though, right?

Yeah, this is not specifically a critique of LFS; it's a critique of the whole thing. Some of these problems are definitely limitations of Git. I'm not trying to point fingers, but the result is bad.


Concretely, how does smudge not solve the problem acceptably?

I dislike smudge in that it feels like an abuse of an API. But I'm not aware of any real problems with it... though I could easily be missing something. So I'd love opinions on the subject :)


Integration is poor. Some tooling is not aware of it and will diff or show the pointer file instead, such as Git hosting platforms. It's also easy to end up in a situation where the pointer file hasn't been smudged; Git might not report a change, and it can be hard to find what is and isn't smudged... at least that has been my experience when dealing with large repositories where I sometimes want to pull large files only in specific folders.
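For context, the pointer file that ends up being shown instead of the real content is a tiny text stub in roughly this shape (the oid and size here are placeholders, not real values):

```
version https://git-lfs.github.com/spec/v1
oid sha256:<64-hex-char SHA-256 of the real file>
size <byte count of the real file>
```

If this stub is what's sitting in your working tree, smudging never happened.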


I think you either get a super-integrated but restricted ecosystem (e.g. you can use Darke Files, but then Darke Files is all you can use), or you build on top of an established protocol and get something maximally compatible (any Git server), but some software will ignore the extra bits.


Where does the inefficiency lie, in the Git LFS spec or in some particular server? I think space is not a spec issue at all, an LFS server can use any deduplicating storage under the hood

The part where filters are used is not elegant; the porcelain for LFS could be better.


For me it's just that Git LFS defaults to a server. I'm rewriting it largely because I loathe that Git LFS needs to hit some HTTP server. Having to run one locally, because they defaulted to a GitHub-webserver mindset, frustrates me.


Personally if files are large I prefer to store all of them and the entire history of their changes somewhere else, so a server makes sense to me...


Yeah, I just don't want to be forced to. I.e., to me, storing them in another folder, on a network filesystem, over an SSH filesystem, in a dumb KV store like S3, etc. are all viable storage locations.

So that's what I'm writing, because I'm just picky, I guess, heh.


The spec says the pointer is just a URL, so it could probably be a file:... or s3:... URL if you have the right resolver logic.


> I think space is not a spec issue at all, an LFS server can use any deduplicating storage under the hood

The client doesn't deduplicate, so you have to download whole objects and store them. Of course you can clean up older versions of objects, but that will force you to re-download them if you change branches or go back in history.


How would you deduplicate on the client at all?

If I have multiple copies of the same file in different subdirectories, and I need all of them for work, there's no way around storing all of them as multiple copies on my machine, right? Unless you do some symlinking or hardlinking magic, but that will likely break platform and software compatibility so I wouldn't want it to happen either...

Similar with history, how can I have my cake (not have previous versions occupy space locally) and eat it too (have them readily available locally without download)?

Or do you mean something like chunking binary files and then deduping the chunks? Is that an LFS limitation, or simply that no Git LFS client/server implementation does it yet? It's still fairly new, after all.
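Chunk-level dedup is essentially what backup tools like restic and borg do: the client stores each distinct chunk once and keeps a per-file "recipe" of chunk hashes. A naive fixed-size sketch of the idea (function names are mine; real tools use content-defined chunk boundaries so an insertion doesn't shift every later chunk):

```python
import hashlib

def dedup_chunks(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks, storing each distinct chunk once."""
    store = {}    # sha256 hex digest -> chunk bytes (the deduplicated store)
    recipe = []   # ordered list of digests needed to rebuild `data`
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return store, recipe

def rebuild(store: dict, recipe: list) -> bytes:
    """Reassemble the original bytes from the store and a recipe."""
    return b"".join(store[d] for d in recipe)

# Two identical 4 KiB blocks separated by a different one: only two
# distinct chunks get stored, but the recipe rebuilds all three.
data = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
store, recipe = dedup_chunks(data)
assert len(store) == 2 and len(recipe) == 3
assert rebuild(store, recipe) == data
```

Applied across versions, the same trick answers the history question halfway: unchanged chunks of an old version cost no extra space locally, and only changed chunks must be kept or re-fetched.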


Either way, it's not ready for production. FTA:

> Can I start using this right now?

> Darke Files is still alpha software so please don't use it in production. Additionally Darke Files is missing many features one might expect from a version control system (or a software forge, or a devops platform). Keep good backups of your data if you decide to try it out.


Can I use it for CAD files?


The name is really unfortunate. It's way too close to an ethnic slur that is spelled a bunch of different ways; "darkie" is almost the same as "Darke".

https://www.merriam-webster.com/dictionary/darky


So this got voted down. That's odd to me. I'm trying to warn the developers that this name is potentially offensive and could remind people of racist images. At one point Colgate sold Darkie toothpaste with a really offensive caricature of a black man on it. They later renamed it Darlie, but the history is still there.

https://www.goldthread2.com/identity/origins-colgate-darlie-...


I'm guessing it got voted down because it's a pretty tenuous link. I read the title as pronounced like "dark", and probably most other people did too, since an "e" is usually silent if placed at the end of the word after a consonant.

I don't think most people would make the connection to the slur, and that's if they even know the slur exists, since it's pretty antiquated.



