So every time my PC is on at 20:00, a shell window pops up, asks me for the password, and runs the backup :). Since the backups are incremental, it takes maybe 10-15 minutes tops.
Note that `restic check` only verifies that the repository metadata is correct, and doesn't detect, say, bit swaps in actual packfiles, which would render your backup unrestorable. You might be interested in the `--read-data` or `--read-data-subset` flags to help double check your backups!
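For example, something along these lines (the repo path is a placeholder) verifies one fifth of the actual pack data per run, so cycling through 1/5 up to 5/5 over several runs covers the whole repository without paying the full cost every time:
restic -r /mnt/backupdrive/myrepo check --read-data-subset=1/5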
Love seeing a code example. It's one thing to hear "restic is fantastic, super easy to set up", it is another to see an example of HOW simple it is. Thank you for sharing.
You might also be interested in how I use restic to back up PostgreSQL and other data onto Backblaze, both for security and to save on cloud costs, as cloud providers charge exorbitant fees for backups[1].
If so, I think an important difference is that mine won't hide the exit code, and you can then answer a sibling commenter (on forgetting without checking) with 'I omitted set -e at the top'.
Never mentioned it's super simple. It's DIY that I stitched together in 1-2 hrs of incremental upgrades and it's been running like this for months. To me it's simple enough but YMMV
Yeah I bet there are good GUI tools out there, but I always want to go for the stuff I can script myself, so I can hook desktop notifications into it and such.
It makes it hard for me to recommend backup tools to the non-technical people in my life, because they're looking for GUI solutions with the corners rounded off, and I want something crunchy and scriptable.
It doesn't have to be one or the other. The GUI would be for one-off or intermittent usage, and the CLI program would still come in as the primary tool for scripting or recurring use.
Well, to be fair, yours does a bit more than the simple solution above. If you back up system configs and the like with restic, it wouldn't be much shorter than what you do here.
Restic is a great backup program. I use it (via rclone) to back up my laptop and it truly saved my bacon when I attached my main hard disk to a VM I was playing with and accidentally formatted it!
Restic can use rclone to reach cloud providers it can't access natively, and that's something we worked on together.
I use rclone directly for backing up media and files which don't change much, but restic nails that incremental backup at the cost of no longer mapping one file to one object on the storage which is where rclone shines.
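For anyone curious, pointing restic at an rclone remote is just a different repository URL (the remote name and path here are placeholders):
# initialise and back up through any configured rclone remote
restic -r rclone:myremote:backups/restic init
restic -r rclone:myremote:backups/restic backup /my/files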
Seems great, but I don't find the "Quickstart" section helpful. A demo video that shows 2-3 unfamiliar commands without context is not a guide, and neither are links to the full documentation. Both make sense on the landing page, but neither tells me how to get started quickly.
This quickstart assumes some context that not every person starting to use Restic has. It could be improved by offering some more context for each line:
# RESTIC QUICKSTART FOR LINUX
# Choose where you want to save your backups
MYREPO=/mnt/mybackupdrive/somefolder
# Initialize a new restic repo at your chosen backup location
restic init -r $MYREPO
# Backup your files to the newly created repo
restic backup /my/files -r $MYREPO
# To restore a backup, first mount the repo
restic mount /mnt/backedup -r $MYREPO
# Browse the latest backup at /mnt/backedup/snapshots/latest
ls -la /mnt/backedup/snapshots/latest
# Copy the files you want to restore out of the repo
cp -r /mnt/backedup/snapshots/latest/directory-to-restore ~/restored
> > unknown commands without context is not a guide
Quick start != guide, to me. A guide will walk you through things at a slow pace; a quick start is the quickest way to get something running/started. And the commands seem fairly self-explanatory, but I could be suffering from the so-called curse of knowledge.
I don't remember really having that issue, though I don't remember the very first use. I think the only time where I remember being a bit confused, was with the 'forget' command, and I would guess that's just because of its nature rather than due to lacking docs. Testing what I thought was correct with --dry-run solved that problem.
Mostly I just use restic --help or restic subcommand --help anyhow, rather than the docs. Perhaps I'm just not doing fancy enough things with it and that's why I didn't run into edge cases yet where it lacks documentation?
It is never encouraging when I see that a project is at least 6 years old and has unapproachable documentation like you describe, and language about how they'll stop changing the backup format "when 1.0.0 is released."
Like...maybe it's time to stop working on new features, and focus on a release.
Duplicati has a similar problem. They're endlessly tinkering with new features instead of squashing bugs and meeting user expectations. I think it still chokes and permanently corrupts backup archives if you interrupt it during the initial backup.
Like: guys. People expect backup software to be able to handle interruptions and disconnects. It's actually one of the things I liked the most about Crashplan, and it could handle being interrupted without issue...nearly a decade ago.
Really looking forward to the day I can ditch borg backup and switch over but really can't until https://github.com/restic/restic/issues/21 is addressed. I have to pay for cloud storage and the lack of compression would easily double my costs based on my testing.
Restic looked interesting - I would never have guessed it didn't support such a fundamental feature as compression?! So thanks for mentioning this and saving me from wasting my time.
Note: I'm not trying to be horrible or disrespectful to the Restic devs; it's just that backup without compression is a complete show-stopper, especially if you want to use cloud storage (where storage space, storage operation, and bandwidth all eat money).
I trigger this using cron every night, but systemd timers will work too.
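For reference, the scheduling side is a single crontab line (the script path, log path and time here are placeholders, not my exact setup):
# m h dom mon dow  command
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1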
The first neat thing about this setup is that the client never even sees the private key that it uses to authenticate to the borg server - the key stays on the server & authentication is tunnelled between client & server via ssh-agent. You don't even need to be able to make a tcp connection from the client to the server - so long as the borg server can make an outgoing tcp connection to the client then everything just works. The client connects back to the server via a socat connection through a unix socket created by the outgoing ssh connection that tunnels any tcp connection made through it back to the sshd on the server. (You could probably tunnel the repo passphrase through as well, if you really wanted to.)
The second neat thing is the use of authorized_keys commands which are tied to an ssh keypair means that you're giving the minimal possible access - each ssh connection can only trigger that specific command & no other. You can issue ssh keys on a per-host basis & revoke them individually if necessary.
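As a hedged illustration of that pattern (not my exact config; the repo path and key are placeholders), the forced command in authorized_keys on the machine holding the repo looks roughly like:
command="borg serve --append-only --restrict-to-path /srv/borg/host1",restrict ssh-ed25519 AAAA...placeholder... host1-backup-key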
You have to use socat as a proxy program for the return ssh connection as ssh doesn't know how to connect to a unix socket & this setup requires
StreamLocalBindUnlink yes
in the .ssh/config on the client (possibly both client + server?), as otherwise the unix socket doesn't get cleaned up after the connection ends & the whole thing only works once before you have to remove the socket by hand. I'm not sure why this isn't the default for ssh to be honest.
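For the curious, the client-side ~/.ssh/config stanza ends up looking roughly like this (the host alias and socket path are placeholders; exactly where StreamLocalBindUnlink needs to live depends on your topology):
Host borg-return
    ProxyCommand socat STDIO UNIX-CONNECT:/home/backup/.ssh/borg-return.sock
    StreamLocalBindUnlink yes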
If you strip out the unix socket stuff (which I need for oddball network config reasons...) it’s just standard ssh authorised keys configs & ssh-agent working exactly as designed. It’s quite elegant really!
It’s the unix socket dance that introduces the gruesome hackery (imo at least!).
Holy crap that's just `pkg install borgbackup`. I had no idea (my phone is already rooted anyhow, so this will also be able to get data folders). This changes everything. There is also `pkg install restic` btw. Based on the problems with append-only in borg and the lack of those in restic's implementation (I did a short audit on that part of the `restic/rest-server` code, looks solid but don't take my word for it), I might go with the latter but this is a great tip regardless.
I never wrote a blog post about it, but it is triggered when I plug in my charger and the phone is on Wifi. There are hooks in termux to do so. Thanks for the suggestion to write a blog post about it ;)
Quid pro quo: I've been using Titanium Backup[1] to make backups of all my apps, however it does not work properly with Android 11 and seems to perhaps be abandoned. So I'm now also using OAndBackupX[2] as well, which seems to be doing the job.
I then use FolderSync[3] to SFTP synchronise those two backup folders across to my server regularly when the phone is on the home wifi. (I also two-way sync my photos folder which is really quite handy.)
I used to also occasionally do a full sync of my phone contents to my server using FTP[4], although since upgrading, Android 11 has clobbered access to the Android/data folder, making that problematic.
Using Termux + Borg (or Restic) to push full backups looks attractive. Never seen Termux before. Thanks.
Also a long time user. I'm only speculating on abandonment because: it hasn't had an update since Nov 2019; I believe the fix for Android 11 would be a fairly simple permissions change; and, from the comments, no one has had a response from the author on the issue.
It is a shame, it has been a mainstay for me, restoring apps and data across at least three phones now. I'm hoping OAndBackupX works out, but have not really battle-tested it yet.
You might like Snebu then, it has always had compression and deduplication, now does public key encryption. No direct cloud support, although you can keep your repository sync'd to a bucket if you want. (Disclosure -- I'm the author)
"Once version 1.0.0 is released, we guarantee backward compatibility of all repositories within one major version; as long as we do not increment the major version, data can be read and restored. We strive to be fully backward compatible to all prior versions.
During initial development (versions prior to 1.0.0), maintainers and developers will do their utmost to keep backwards compatibility and stability, although there might be breaking changes without increasing the major version."
For what it's worth, it has a good track record, but yes you're totally right that restic is not exactly commercial-grade software with proper guarantees.
In one of the early talks at a local hackerspace, the author did also demo decrypting the data manually if something got somehow broken. The tool is just a layer on top of relatively straightforward cryptography. From what I remember (I looked at this in 2018 so forgive any errors), you'd have to write a script that iterates over the index where it says which blocks are in which files (since it's deduplicated) and decrypt it with some standard AES library. Perhaps as a security consultant this seems easier to me than it does to others, though, but it's not as if you're without hope if the tool did break, or as if you couldn't just download a previous version from the GitHub releases page.
One of the reasons I prefer restic to borg is it is trivial to maintain a standalone copy of the executable. Wherever I put backups, I keep a copy of the restic executable used to generate the dump.
For the extra paranoid, you could clone the restic source tree (with vendored dependencies). Go's language backwards compatibility is such that I should always be able to read my data.
Hmm, anyone know if they fixed some of the exponential time issues it was having? I was really excited about some of the features and tried it a couple years ago, and it died before getting very far sync'ing my NAS to a usb3 jbod I plugged directly in to prime it for a rpi4 network attached backup. For something like that I would expect the backup speed to be a few hundred MB/sec and it quickly was just running in the 10's of MB/sec and getting slower.
I've noted before that my NAS has a tendency to kill a lot of backup applications in ways that haven't really been stress-tested. It's about 50-something TB of data, made up of a fair number of compressed video files (family movie kinds of things) and about ~7T of source files. The combination of a crapton of tiny files and 50T of data kills most of these more recent open source backup applications, which seem to be released with "it can back up my laptop" levels of testing.
Also, I see someone mentions it still doesn't have a compressed block option, which I really want because most of those source files compress to about 1/3rd of their size, which matters since my upload speeds are crummy (thanks Spectrum!). I wonder how much of that deficiency is just a lack of good Go compression routines that can run at a few hundred MB/sec.
My main dataset that I back up with restic is around 11 million files or so, and fluctuates between 8-15TB of source data. I think I've run into similar issues testing out various backup tools too, but have settled on restic for now since it's been the most reliable.
I'd definitely recommend trying restic again if it's been a couple years. Somewhat recently they made some nice speed improvements. It used to take me several days to do a forget+purge on my restic repo. Now it takes only a few hours (less than an actual backup takes).
How many files are you backing up? I found the biggest issue for me was that almost every tool tries to keep the list of files in memory, so once you get into millions of files - it starts to require a lot more memory and can crash on low-resource machines.
I've always liked the look of Restic. I should really start using it to backup my Linux servers.
However for desktop use, I've always really struggled with the idea of not having a UI for my backup client. I'm not afraid of the command line, but the idea of browsing backup archives without a GUI feels awkward to me.
I wonder if there is room for some sort of add-on GUI for Restic for those that are more visual (unless such a thing already exists?)
I think many more people would use Restic if this was called out prominently on the home page.
I've known about Restic for years and would likely be using it by now had I realised that!
EDIT: Ah looks like it may not work on Windows. Part of the appeal of Restic, for me, would be being able to cover all of my Windows, Mac and Linux machines with the same system.
Restic doesn’t have a GUI for backup/restore configuration and actions (yet), but I found that Borg has a GUI called Vorta [1] for macOS and Linux. This was mentioned on a HN comment several days ago by @crossroadsguy here. [2]
Indeed, I feel the same way. I used to be a big user of CrashPlan, largely because it had a straightforward UI: highlight a bunch of directories; choose a backup destination; choose a backup frequency.
I do like the idea of being able to tune and configure things from the CLI. It was frustrating configuring CrashPlan on a remote computer.
With that said, I feel like both should be possible. Even just a basic wrapper GUI would be a start.
Edit: and some basic searching has led me to lots of options! Time to do some more research.
I am a restic user. I use it to back up to B2 and Scaleway. It is not without hiccups, but once set up, and as long as I don't back up anything from those protected Mac folders, it has worked smoothly so far.
However I also acutely feel the lack of a standalone GUI, so that I can get rid of the scripts and custom setup, or at least have that as an option. (There's a commercial third-party UI, I think, which is a subscription.)
Another tool I'm looking at closely is https://kopia.io. It comes with a UI by default (Electron, I guess), though its UI and logo have quite some work left.
I trialed restic, kopia and borg and ended up with kopia, backing up to backblaze + opportunistic external hdd. Costs me cents a month for tb+ between all our devices. Agreed on the kopia UI (and logo, was only thinking that the other day!), it's pretty basic .. but to be honest it's all you need. I use it on my family's machines as well, including wife and parents (same bblaze, dif HDDs), and it means they can pretty much manage it without me. I liked restic and probably would have ended up with it if there was a decent oss/stock UI. Borg was significantly slower on my machine for backup and restore.
I’ve thought a lot about this. One alternative solution could be this: Instead of backing the laptop up directly, sync it to a NAS. (Using rsync or a similar tool.) Then run Restic on the NAS and back up the data from the NAS to S3 or similar.
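A rough sketch of that split (hostnames, paths and bucket name are all placeholders):
# on the laptop: mirror the data to the NAS
rsync -a --delete /home/me/ nas:/srv/mirrors/laptop/
# on the NAS: back the mirror up to S3 with restic (credentials via the usual AWS_* env vars)
restic -r s3:s3.amazonaws.com/my-backup-bucket backup /srv/mirrors/laptop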
I'm really glad that you find it interesting, it was my solo project for the long time. Feel free to contact me on discord stan#9673 if you need any help or just for the general shell-scripting related chat :)
I'm still waiting for a backup tool that uses asymmetric encryption (data encrypted using the public key, decrypted using the private key) and has a write-only server mode (so a bad actor can't remove backups).
Indeed, this is one of the major advantages of tarsnap, though a friend also mentioned borg can do this apparently. I should really look into borg (again).
The append-only mode can be implemented using https://github.com/restic/rest-server or services like rsync.net that offer read-only zfs snapshots. Doesn’t solve the asymmetric crypto of course.
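A minimal sketch, with placeholder paths and hostname, of what that looks like with rest-server:
# on the backup host
rest-server --path /srv/restic --append-only
# on the client, point restic at it over HTTP
restic -r rest:http://backuphost:8000/laptop backup /home/me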
(I make Arq, a backup app that supports S3's object-lock API for immutable backups). I can't see how to do incremental backups without using the private key to read the previous backup record. Can you explain how that would work?
Hmm. If anybody can encrypt backups indistinguishably, and you want write-only so that bad guys can't remove stuff, surely you can incur unlimited costs as bad guys fill it with gibberish and you can't stop them?
You would run out of remote storage space quickly and then find out you have been compromised. But the backups you made up to the point of being compromised are still intact, which seems to be the best you can hope for.
> incur unlimited costs as bad guys fill it with gibberish
That's always an option, no matter if it's set to append-only or not. If you don't want to pay infinite costs for storage, you will need to limit it using software on the server.
The remote backup store can easily choose to only store things you actually encrypted, but this isn't possible for a simple public key setup. If you wanted to get fancy, you could use a sign + encrypt setup with separate keys so the store can tell if this is a real backup from you, and not allow things to get stored unless they've got such a signature, yet it still can't actually decrypt the backups it has been given.
As a proof of concept, take a look at a Certificate Transparency log server. Most CT logs are configured to only accept certificates meeting certain criteria. They'll log any such certificates (their SLAs only apply to contractual users, but you don't need an SLA; you're probably just writing one certificate to the log to see that it works), but you cannot fill them with garbage because you can't make any certificates they'd accept; only the legitimate CAs can do that†
† The CAs have their own reasons not to let you produce heaps of garbage, even Let's Encrypt has finite resources and so it imposes rate limits.
But if your box gets compromised, then whatever it was using to prove to the server that it's legit is also going to be available to the attacker. I guess you're thinking of a scenario where backups are not automated and the user either unerringly knows whether their box is compromised (and doesn't type in said proof when that happens) or uses some 2FA hardware device.
Given the chance, there certainly will be people that use 2FA when making an append-only regular backup, but even among command line restic users I expect this will be the exception rather than the rule.
Why would that not be possible? Especially since you say borg can do it, and borg is said to be incremental and deduplicated?
Note that write-only or append-only is a bit of a misnomer, since reading the files is fine (you need the decryption key anyhow before they're of any use). It's about not being able to overwrite or remove backup data without some verification that you're not ransomware or similar.
Well it is written in Go instead of Python.
If I remember correctly, borg is single threaded and quite slow. Kopia is really fast.
Other than performance, the biggest benefit is that multiple clients can write to one repository at once. There is a designated maintenance user, though.
When it comes to backup software, do yourself a favor and do extensive research on their stability and reliability.
From what I learnt, borg has the fewest problems of all the open source backup software with the most wanted features (encryption, dedup, compression and rotation), and the others all have their quirks, be it huge memory usage, data loss, or at worst corruption of the backup repository.
Things may have improved since I last looked, but you want user feedback saying so.
By the time you need your backup and discover the software has been failing, the original data is already unavailable, so choosing by the look of the landing page is a bad approach when it comes to backup.
kopia is still new and last time I used about a year ago, it still had basic problems, so I would never use it in place of borg.
restic, duplicati, duplicity and duplicacy all have some sort of problems especially when the repo gets large but of course there are cases where things are working fine.
borg also has change logs with any major reliability problems mentioned up front which gives you more confidence than serious bugs buried in GitHub issues like other tools.
The only downside of borg is that it can only target ssh hosts natively, but there are services like rsync.net (with special borg pricing) and borgbase, or you could run borg locally and rclone the whole repository anywhere you want.
There are a bunch of options to borg check depending on what you want. A full check with --verify-data is indeed rather slow, because it checks everything (essentially equivalent to seeing if all archives can be extracted plus a bunch of extra checks). If you only want to detect e.g. bit-rot, --repository-only will be sufficient in most cases and will be I/O limited.
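Concretely, that's something like (repo path is a placeholder):
borg check --repository-only /srv/borg/myrepo   # fast, mostly I/O-bound, catches bit-rot in the repo files
borg check --verify-data /srv/borg/myrepo       # slow, decrypts and verifies all archive data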
There is a certain priority to what you treat as issues. Slowness is the least of the problems compared to data loss or repo corruption, and you may be able to somewhat circumvent it by splitting the repo across different locations or disks.
You mention duplicity but do not give a link to an issue. Do you know if there is something wrong with it? Would really appreciate you pointing me towards that if so.
Sorry I didn't link to an issue, but my concern with duplicity is more about its implementation, which makes incremental backups depend on the base full backup (the one you make the first time). That means if you want to prune that base backup later on, you need to take a new full backup for subsequent incrementals to depend on, instead of taking continuous incremental backups indefinitely.
I have a Makefile to invoke restic on just about every Linux machine I have (even Raspberry Pis), usually to do incremental backups to Azure blob storage. It is a great, no-fuss, “forever” tool I’ve relied upon for years, and has made it trivial to clone or restore some pretty weird setups.
Just mind you exclude node_modules and the like from it :)
Do you also run things like 'restic check' on it from time to time? I once managed to corrupt a restic repository after ctrl+c'ing the program and pulling a disk before it was apparently done syncing the write cache (I wanted to leave and catch a bus but noticed that this was still running), not sure if there are other conditions where that might happen. Better to know ahead of time to make a new backup before the primary copy breaks.
I've used restic for over four years. It's been rock solid on all platforms (mac/windows/linux). I've set it up on everything from beefy linux servers to 10 year old Windows machines (scheduled job running for 3+ years, pruning on a separate machine, never had any problems). The support for B2 and S3 compatible backends as well as rclone makes it a breeze to set up. The community and maintainers are also very friendly and helpful. Highly recommended!
borg supports compression, restic doesn't (and there is no way to add it without breaking backwards compatibility because of how the file format was designed). That's all I need to know for my use cases.
I used to use borg and I migrated away from it to Restic when I somehow corrupted my backup archive. I dunno what I did, but I started getting "non-utf-8 filename" python errors every time I tried to access it. It might have had something to do with the archive being on a removable disk.
Anyway! I'm happier with restic now. It's never crashed for me, and it has native cloud backends. But it's ultimately just another backup application.
Is there any statistically-significant data on which backup applications are the most reliable? I'm not married to restic, but I'll judge it by first-hand experience in the absence of anything else.
Not really, no. It usually goes like in this thread: someone had a problem with software X and switched to software Y; someone else had the opposite experience. It's worth pointing out that Borg and other hash-deduplicating backup tools regularly find faulty hardware where other backup tools wouldn't notice the data getting corrupted (e.g. many people advocate for "plain" backup tools like rsnapshot or just having an rsync cronjob, but all of these are unable to check the integrity of backups). Sometimes users blame the backup tool (sometimes they're right and it's a bug, but usually it's a bad stick of RAM or a hard drive losing a few bits here and there).
And it's just a single Go binary: I just throw it on a win/bsd/linux machine, create a key, and start the backup. I love the simplicity of it; however, for more complex plans I use git-annex.
Borg is much older, has seen far more production use, and has had the bugs worked out. IIRC restic is still sub-1.0. Not ideal for backup software.
Borg sequentially scans your filesystem for changes and only then starts backing up changed data. (tbf: a lot of backup tools seem to do this)
Restic scans your filesystem for changes, and then also starts backing up the changes it finds in parallel while it is still scanning for more changes.
When you have millions of files, this makes a huge difference.
This is wrong. The difference between the two is that Restic uses multi-threading and Borg currently doesn't. Both just scan the filesystem and add files to the backup set as they go.
Hmm, maybe it has changed or I'm remembering wrong. It's been years since I tried borg, but I remember it taking something like 10 hours to scan for changes, and then another 4 hours or so to actually backup the data.
With restic, it still took around 10 hours to scan for changes, but it was also already done backing up all the data by the time the scan finished.
Speaking of which, borg in Debian is maintained by the Debian Borg Collective, and the nickname of one of the maintainers is Locutus of Borg:
https://tracker.debian.org/teams/borg/
My full-system backups are done with btrfs snapshots synced to an external disk (actually two disks to have two backup locations). It's nice because you can keep the snapshot on your system and don't need the external disk as long as you have enough space, and both filesystems are in almost exactly the same state which makes it easy to mount it for copying a single file or to even boot your system from the external disk.
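The mechanics are roughly this (subvolume names, dates and mount points are placeholders for mine):
# take a read-only snapshot of the root subvolume
btrfs subvolume snapshot -r / /.snapshots/root-2021-06-01
# send it to the external disk, incrementally against an earlier snapshot that exists on both sides
btrfs send -p /.snapshots/root-2021-05-01 /.snapshots/root-2021-06-01 | btrfs receive /mnt/external/snapshots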
Bup (Python + C) - https://github.com/bup/bup - deduplicated and compressed. The storage format is a git repository, an interesting choice that lets you restore using just git tools, “cat” and some effort.
Do you know which of these support a full Windows 10/11 system backup and restore?
I’m trying to avoid the need to reinstall and configure my system (for example, the registry, custom installed and tweaked programs) in case of complete data loss or a migration to new hardware.
My understanding is that full system backups on Windows require the tool to create VSS snapshots and back up from the snapshot. Any tool that just copies files on the disk won't work.
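For what it's worth, restic can at least take its file-level copy from a VSS snapshot on Windows (I believe via the --use-fs-snapshot flag; the repo path here is a placeholder), though that alone doesn't give you the bare-metal restore story below:
restic -r E:\restic-repo backup C:\Users\me --use-fs-snapshot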
I use Veeam Agent for this purpose (free, but not open source). It can do full system backups and supports both restoring to the same hardware and new hardware. Restores are done via a bootable WinPE-based image that the tool creates.
One cool thing about it I haven't seen in other backup software is that incremental backups work via a driver that tracks which disk blocks are changed as the system is running. It avoids the need to rescan the disk to detect what has been changed (though it will still do that if the filesystem is modified outside of Windows, eg. if dual booting).
The biggest downside is Veeam's website. It's pretty "enterprisey" and they want you to register to be able to download. I install via the Chocolatey package manager to avoid this. Chocolatey's package source has a direct link to the official installer [0].
There are no ads, nagging, nor upselling in the software itself. I have not seen it making any network connections outside of connecting to my backup target host and the auto-updates server.
I've been looking for open source alternative with a similar feature set, but haven't had too much luck. There's Bacula, but that seems to very much be designed for an enterprise use case.
Windows system backup requires support for correctly handling pretty much every NTFS feature, even (especially) the most obscure ones. While a generic file backup tool works fine for Linux and BSD system backups, it's hopeless for Windows. You need a tool that's specifically designed to do that.
A simple duplicity then seems superior, unfortunately. Duplicity seems solid but a bit complicated, and possibly fragile (with so many components fitted together, like par2 stacking on top of any other storage).
That's like comparing rsync to google drive. One is an open source tool where you can use whatever back-end you want, the other is a service. (Which is fine, just different kinds of things.)
However, in this case it's the open source tool that has a much easier user interface (I am actually proficient with tar, but still my tarsnap experience is like comparing 'restic backup /my/files --repo /mnt/backupdisk' with https://xkcd.com/1168/)
Indeed. Restic is just something you apt install and nobody provides you any service (you have to organise your storage space yourself); tarsnap is not simply free to use for yourself with your own storage. (Not saying it has to be free, but that's what makes it the definition of a service you have to purchase.)
Tarsnap works out as ~6$/GB/year for me. That's for a mostly managed backup service. The only thing missing is snapshot pruning, which is slow and a bit of a pain due to the way tarsnap's cache works. Restore is on par with restoring from tape: reliable but slow, but who can really complain about how fast disaster recovery is?
Raw managed storage with rsync.net is 0.18$/GB/year.
Do it all yourself, with the associated peril and time sink that entails, and the disks will cost you 0.04$/GB/replica.
Tarsnap has its place and I’m still a happy customer, but it’s one small part of a wider strategy that includes bulk storage elsewhere — rsync.net with borgbackup and plain rsync, on premises ZFS dumpsters, and offsite drives used like they are tapes.
Is Microsoft OneDrive at $70 for 1TB a good option for the truly paranoid, or will Microsoft share the content if they get a court order or someone with a tank at their front door?
Tarsnap is not expensive at all for its target audience: folks with highly compressible data. For any data that's not very compressible, it's super costly.
For example, if you are into photography it's not uncommon to generate hundreds of GBs of files _per year_. In 2020 alone, I generated over 200GB of photographs. Putting that on Tarsnap would cost me about $60/month. In 5 years' time, I could be paying up to $4000/yr. Tiered services like B2 would cost an order of magnitude less.
Why on earth would someone pay 10x-20x the alternatives for encryption that these days is available in high-quality free open source software such as Restic, Borg or Duplicacy?
For me the one to beat is IBM’s backup utility (known for years as ADSM (adstar storage manager) and later called TSM (tivoli storage manager). I’ve never seen a commercial or open source program that comes close.
I’ve also never seen successful backups anyplace that did not have TSM. Usually the backups are corrupt, or nobody knows how to do a restore, or you need 1000x the storage capacity in order to restore the initial backup and all of the incremental backups until you reach a specific point in time.
At places I worked with TSM it was so simple that individual users could fire up a gui and pull files out of the backup pool.
On the backend we had massive IBM tape libraries and it was hypnotic to watch the robot jet around moving tapes in and out of drives and the storage slots. It never stopped moving either; when backups or restores were not happening, it was busy consolidating tapes, making copies of data from tapes that had been used too many times, or preparing copies to be sent off site. It was a full-time job for someone to load new blanks when TSM requested them and to remove the offsite tapes and put them into a box for FedEx to pick up. (The one thing that has not changed is that it's still quicker to send massive amounts of data by FedEx than it is to send it over the public internet.)
I have used TSM (or ADSM or Spectrum Protect or whatever IBM calls it this week) quite a bit. The basic functionality and performance are not too bad. However, it clearly shows that the software originates in the 1980s. The client is written in C++ and really likes to leak memory. This becomes problematic when backing up more than a few million files. The official "fix" suggested by IBM is to configure a cronjob that restarts the scheduler once a day (seriously).
TSM also has no support for deduplication, so good luck backing up large variable binary files such as VM images or project files (video, CAD, etc).
I’m pretty sure it did originate in the 80s, it had an earlier name than ADSM, then was rebranded back when IBM was going to split itself into “baby blues”, then Lou Gerstner took over and stopped the split-up. Despite its faults it’s still the best I’ve ever encountered.
€0.40/TB/month or €120 per 5 years for 5TB: Old phone or a raspberry pi with a hard drive attached located at a friend's or family member's house.
Any cloud service is going to be 5-10 times more at least, last I checked (2019), and that's already a lot better than five years before that (as a ratio of self-hosted to a managed service, so independent of raw storage prices).
B2 and Glacier seem to be some of the cheaper options these days. Backblaze (the backup software) doesn't run on Linux and is closed source but they pinky promise to support "unlimited" backups for $5/month which is a really good deal if you both trust them and run Windows. Tarsnap is S3 pricing plus markup, but what you get is linux support, open source clients from a person the community trusts, write-only keys, and the hosting part is off your hands, and pay-as-you-go, which is quite a unique combination.
B2 doesn't require any closed-source platform specific software to use; you can just use restic. And I'm not sure why I should care about either the software or the hardware on their end; they should basically be an external hard drive to me, shouldn't they?
About the "raspberry pi" thing... this is kind of answer you immediately regret you didn't preemptively dismiss in the question. I mean, it's hard to even decide if the person saying this is serious or not. Like, setting up a backup server at "your friends house"? Really? Is this seriously something that everybody but me does? Should I do that too? Is it considered normal practice in their cultures? Or is it just something that they say, because they like giving advice they don't follow? To me, that sounds just crazy.
Paying under $100/year to know that all of my junk is safely kept somewhere doesn't sound crazy at all, on the other hand. But which is the best option and if they really keep their promises I don't know, of course, that's exactly why I'm asking. Maybe there's some problem in disguise, maybe hardly anybody even uses their services and I shouldn't trust them. I don't know.
> B2 doesn't require any closed-source platform specific software to use
I never said it did, unless you are confusing Backblaze (the name of their backup solution, https://www.backblaze.com/cloud-backup.html) with their separate and much newer B2 service.
> this is kind of answer you immediately regret you didn't preemptively dismiss in the question
Well I'm sorry.
For what it's worth, this is literally the solution I use so I thought I could be helpful by at least including that as a base price point.
> which is the best option and if they really keep their promises I don't know, of course, that's exactly why I'm asking.
I don't understand, has anyone shared stories of paying Google/Amazon/Microsoft/Backblaze/Dropbox for X amount of storage and them not keeping their end of the contract? I understand your question even less now than I thought I did before.
Ok, sorry, I didn't mean to be rude. I'm sure you had the best intentions in mind when writing your answer. The thing is, for me the main reason to ask questions on forums like HN is to work against "unknown unknowns". Things like B2 storage cost can be easily googled if you don't know them, so it makes much more sense to just google them instead of asking here. In fact, that's what is usually expected from anybody asking advice on forums. But the problem is, that in reality there's always more to that than the basic specifications.
Take B2 for example. I know their storage pricing (that's very easy to find on their website), and I know for a fact it's super affordable compared to other similar services. But I also know that the "fine print" in their case is just that the upload speed is the bottleneck, which will prevent most users from backing up too much. And the fact that they have only 4 facilities. Is the latter a real problem? Well, I don't know. I didn't hear any stories about them losing user data, but that might be just it: I didn't hear them. That's why asking such questions in places like HN has value in my opinion.
Similarly with HDD cost. It's kinda obvious that this is the most affordable solution, so if the person asking doesn't know that, it means he didn't do his homework. I don't use my friends' houses for that (that really sounds super awkward), but in fact something similar is my current "solution" as well. But it feels like something I should be advised to stop rather than to start doing. A backup service backend needs maintenance too, and maintaining it doesn't seem like a fun hobby (that's part of the reason why asking a friend to do this for you seems very weird). Backups are something you want to be reliable, by definition. And HDDs are not.
(As a side-note, I also sometimes contemplate if backing up rarely changing info to more reliable storage, like tapes and optic drives is a viable option. It still seems like no, unfortunately. But keeping everything on HDDs that I personally own makes me feel uneasy as hell. Despite being something I do, this is basically a strategy equivalent to "just hoping that everything will be ok". I don't do any real work, to make sure it's the case. I have no idea, when they are likely to fail. I just hope that they don't get broken at the same time, and that's it. I have no idea what the actual probability of that is.)
rsync.net is awesome. You get a Linux filesystem you can access via SSH. Many tools support it (like git-annex), and since it's ssh-able, you can script anything that doesn't work out of the box.
They also have optional replication, and you can choose the location of both of the servers that host your files. Great support from actual engineers, used by huge companies for their offsite backups, and never been breached (to my knowledge).
Robust in what way? If you mean user error then I would agree, it's super easy to manage backups and not do stupid things (I say this without much experience with other tools, fwiw). But as for file corruption, is it much different from the other standard solutions out there like borg or duplicity?
Why not use both? Recommendations are to have two backups anyway, one local and one remote. I use borg to make local backups to a USB HDD, and restic for remote backups in the cloud. Using different software guards against implementation bugs.
Using Borg for more than a year I haven't had an issue yet. In the past I used a similar tool that had error-correction capability but the tool was buggy and slow when compared to Borg.
I had Borg fail me the same way with index corruption. Since moving to Restic I haven't had any issues backing up or restoring, and it seems quicker too.
both are fantastic. i started on borg, and moved part of my shit to restic so i use both. borg ui is a bit better, but you can normalize that away with a bunch of scripts.
Restic is a great piece of software. I've been using it for more than two years now to do encrypted backups of my home server to Backblaze B2. It took minutes to set up with a couple of lines of script, and it has worked like a charm since then. Highly recommended!
Same here! Really happy with the setup. Got an Odroid HC2 doing daily backups to Backblaze B2. The thing doesn't even sweat scanning around 1TB of data. I find the automated pruning of old snapshots also pretty sleek:
restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 75
Just a heads up in case you aren't already aware: 'restic forget' does not automatically prune data. You also need to pass --prune or run 'restic prune' later. Otherwise, your snapshots are dropped from the index but the data used by them still exists in the repo.
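i.e. for the policy above, something like:
restic forget --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 75 --prune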
> Once version 1.0.0 is released, we guarantee backward compatibility of all repositories within one major version; as long as we do not increment the major version, data can be read and restored. We strive to be fully backward compatible to all prior versions.
> During initial development (versions prior to 1.0.0), maintainers and developers will do their utmost to keep backwards compatibility and stability, although there might be breaking changes without increasing the major version.
Hmm, not sure I would like to rely on a backup tool that might introduce breaking changes that easily.
I used to religiously make backups, and I still rsync my homedir onto a server occasionally.
That said, I keep ~/Documents and ~/dev these days in Syncthing directories, and one of my syncthing nodes is an Ubuntu LTS server with zfs, with the zfs-auto-snapshot package installed.
I still run my old backup system periodically (once or twice a month) but I now think Syncthing is at a point of reliability where realtime cross-machine sync is now my primary safety net wrt "the machine in front of me has turned to entropy", versus some point-in-time backup.
> No. Syncthing is not a great backup application because all changes to your files (modifications, deletions, etc.) will be propagated to all your devices. You can enable versioning, but we encourage you to use other tools to keep your data safe from your (or our) mistakes.
Even if you format your drive and Syncthing copies over the empty files you will still have a history of snapshots on the other host running ZFS snapshots.
What if you accidentally format the drive on the host with the ZFS snapshots? Won't that result in synchronizing the deletion of those files to all your other devices?
Yes, for his setup (if I understand right). I'm suggesting having two hosts with ZFS snapshots running (independent of each other) with Syncthing between. So if you delete the files on one host and they get Syncthing'd across there will still be a ZFS snapshot history on the other host. I'd also have a cold storage backup on hand that is also ZFS. Having your backup as a regular filesystem is a very nice feature once it comes to recovery.
syncthing has send-only and receive-only folders. You can set a receive-only folder on your backup device and send-only on your phone or whatever. With incremental backups it's not an issue. Syncthing is just the tool to get your data to the place it needs to be. Just like rsync or cp
Anyone taking backups seriously already knows the 3-2-1 rule anyway.
Syncthing frequently stops syncing on my phone and requires me to delete everything and resync. I'm using receive-only folders. I think I'm an exception but it's been a pain in the ass.
I'm coming to the same conclusion. I was running Freenas and my file server died and then my files were locked up on that host. Luckily I was using Syncthing to the central file server but I just added new links between individual devices and then it didn't matter that my file server died (except for files that were not on any non-server device). File servers are annoying for disconnected usage. I was using duplicacy for backups but the storage format is annoying (not just regular files) and nothing is more reliable than regular file systems. So now with my new setup I'm using a Rpi4b with NixOS and ZFS data drive over a USB3 dual drive docking station with UASP. I can boot from a USB3 thumb drive (which I do) or an SSD in the docking station and still have another SATA port for the data drive. (The downside of this is RAID doesn't work over USB.)
I'll have two of the same RPi servers in different locations with all software running (like a hot swap) with Syncthing keeping them in sync while ZFS snapshots keeps a history. I can plug in a cold storage drive in the second dock slot once in a while too. I'll have a spare RPi4b on the shelf in case it dies. If my server dies I can take the off site hot backup home and reconfig the network and then it is my primary server. With remote duplicacy backup I'm days away from getting going again. So ZFS snapshots + Syncthing and cold storage is where I'm going (for home use). Also I want to stick with Linux because I set up Freenas 5 years ago and now I forget how to admin it so I'd rather just keep with ZFS and Linux (the zfs send from freenas to zfs recv on Linux works perfectly).
It's part of kubesail.com, which helps folks host at home (but works with any kubernetes cluster). We use restic as part of the backup service which launches a container on your machine to locally encrypt and then upload data. Restic works great to de-dupe files and minimize the size of the backup and restore.
One of my favorite things about restic is not about the software, it's the community. People in the forums and bug reports are respectful and decent, and it's fantastic to see.
Restic is great, I use it to back up my Nextcloud data from my raspberry pi to two locations, a separate USB drive and a remote DigitalOcean space. Very easy to use and all encrypted. I wrote a blog post about it actually! https://compileandrun.com/2021-01-31-nextcloud-traefik-resti...
I was wondering how this compared to Bacula, which I setup for a company with significant amounts of data to backup (and which worked great, especially after I wrote a retention management + pruning tool).
> Can this be used to incrementally backup the whole / (root) directory?
That's how I use it (with --one-file-system), but I haven't tried restoring that. I just check that my data is there (e.g. open some pictures) and call it a day. For me, if I need to reinstall the system anyhow (super rare), I'm also happy to just reinstall the OS, do a bunch of apt installs, restore a few files in /etc perhaps, and put my homedir back. So I can't vouch for whether it will store all special attributes on system files, but I would expect that common things like owner/group/mode are there.
Fwiw, I once rsync'd a remote root partition to the current machine and that worked. Would not recommend to depend on this, but it apparently doesn't take much to be able to restore the root and boot from it :). You can also quite easily test this in a VM, if you'd like to make sure it works.
And how does it compare to duply, which let me boil my duplicity setup down to 8 lines of config, 3 lines of globs (2 includes, 1 global exclude) and this line in my crontab:
I'm a bit confused. For all the descriptions of different convoluted processes below, why not simply handle backup with something even simpler like Backblaze, SpiderOak or maybe sync.com? Seems a lot simpler than the Restic, Rclone and other methods described here.
I need a backup system in which files are never deleted, even if they are deleted locally. These files should also be easy and efficient to search for; you shouldn't have to go through all previous snapshots, trying to guess which snapshot a file might be in.
My backup script runs a 'restic diff' command after every snapshot it takes and puts it in a log file. It's basically a list of every file added, removed, or modified in that snapshot.
Since I have a directory full of these logs, I can search for deleted files by doing something like 'grep "^- " restic-diff-*.log'
I'm not sure if there's a better way to do this, but it works pretty well for me.
$ restic mount --repo /mnt/backupdrive/myrepo /mnt/backedup
$ echo /mnt/backedup/snapshots/*/path/to/file
And it would print which snapshots it is contained in.
If you don't know the path, I don't know how long `find -type f` takes so you might be right about that being inefficient. It is certainly not a use-case that I think restic ever had in mind.
I also don't know of "backup" software to solve this, it seems a bit out of scope for most of them (they are meant to have a backup copy of your disk, not be a file manager with history). You might have more luck with tools like rsync (or Toucan, from the good old days where I used Windows and Portable Apps), those can certainly create only new files and never delete deleted files, but then you typically don't get encryption and deduplication.
Yes, I should have specified that the emphasis is on the use-case of finding and restoring old files.
Yes, a backup tool never deletes by itself, but the standard way of using such tools is that you keep the last n snapshots - this is just delayed sync.
I don't know anything about backup architectures, but is it possible to be more efficient with storage and indexing(for search) if your use-case includes finding files from 5 years ago, but not necessarily recreate the whole filesystem as it was on some date?
> Yes, a backup tool never deletes by itself, but the standard way of using such tools is that you keep the last n snapshots - this is just delayed sync.
Restic, Kopia and most other modern backup solutions allow you to define the retention policy, usually by specifying how many hourly/daily/weekly/monthly/yearly snapshots to keep. They don't just remove all snapshots that are older than n.
They also usually let you restore any given file, from any given snapshot, without having to restore everything. And as long as there's a functional index, this shouldn't be terribly slow either.
If a file existed for 3 days and was then deleted, it might make it to 3 daily snapshots, and no weekly, monthly or yearly snapshot.
Even if you’re lucky and the snapshots are not deleted, if you don’t remember when the file existed, currently you need to search through all snapshots. There’s no index or any data structure to speed this search up.
I agree, restoring is fast once you’ve found the file.
Currently I use backblaze, and it does have infinite retention and snapshots, and restoring is fast. But I’m looking to migrate to something that gives me search as well.
My current backup system involves having all my data on ZFS and then using sanoid for snapshots and syncoid for syncing those snapshots to my server that also runs ZFS. Would I gain anything significant from switching to something like Restic for backups?
I use syncoid to sync to a server that only has 1 open port, ssh, which only allows one user, and that non root user runs rbash and can only do zfs receive, not zfs destroy. Hopefully that stumps the ransomware people trying to delete backups.
Somewhat offtopic but this reminds me: what's the best way to backup a Raspberry Pi? Mine has a 128GB SD card and that makes backing up the whole thing kind of silly. There has to be a better way.
I've lost a few months of data twice because of my ignorance
Not unless they've improved it substantially in the past decade.
A startup I worked for [0] shipped a backup/"DR" appliance using rdiff-backup behind the scenes that I had the pleasure of inheriting ownership of.
What rdiff-backup was good for was creating the false impression that you had working backups you could restore from. But once your available disk space for backups fills up (which is kind of the whole goal of a backup system: accumulate as many revisions going back as far as you have space for), the thing paints itself into a corner you can't recover from without first freeing potentially huge amounts of space.
Here's why:
1. The backup tree is modified in-place in the course of performing a backup. If the backup is prevented from finishing for any reason (admin/user cancelled, ENOSPC, power loss, backup source became unavailable, etc), the backup tree is left partially in the new revision and partially in the previous revision. Any subsequent operation, restore or backup, must first restore the backup tree to its previous version, using the same primitive restore algorithm rdiff-backup uses for general restores.
2. Unless you're restoring from the latest revision, requiring no reassembly from differentials, the restore algorithm needs enough free space to store up to two additional copies of any given file it's reassembling. This doesn't even include the final destination file: if restoring into the backup filesystem (as it does when recovering from an interrupted backup, mentioned in #1), you need space for a third copy too.
Maybe they've fixed these problems since my time dealing with this, it's been years.
I ended up writing a compatible replacement for my employer at the time which used hard link farms to facilitate transactional backups requiring no recovery process when interrupted. This also enabled remote replication to always have a consistent tree to copy offsite while backups were in-progress, something rdiff-backup's in-place modification interfered with. As-is you'd end up just propagating a partially updated backup offsite if it happened to overlap with an ongoing backup.
My replacement also didn't require any temporary space for reassembling arbitrary versions of files from the differentials. So it could always perform a restore, even with no free space available. I even built a FUSE interface and versioned backup fs virtualization shim for QEMU+qcow2 atop those algorithms. But it was all proprietary and some of the stuff got patented unfortunately.
I wouldn't consider rdiff-backup usable if it didn't at least have the ability to restore without free space yet. At least then it might still be able to do its rollback process when ~full, assuming it's still doing the in-place modification of the backup data.
Edit:
In case it's not clear from the above; it's particularly nefarious the way rdiff-backup would fail, since it was typically unattended automated backups that would fill the disks, leaving the backup tree in a rollback-required state to either run another backup OR restore. The customers usually discovered this situation when they urgently needed to restore something, and rdiff-backup couldn't perform any restore without first doing the rollback, which it couldn't do because there was no space available. Not that it could even perform a differential restore without free space, but the rollback-required state almost guaranteed a differential restore was required just to do the rollback.
Back when I was implementing the replacement it was such an urgent crisis that I was logging into customer appliances to manually restore files from sets of differentials without needing temporary space, using unfinished test programs before I had even started on the integration glue to streamline that process.
Basically every back-up system does incremental backups, because indeed uploading or copying everything every time is kind of silly as you say. It's easy to go here "restic does that" but basically every other tool does that as well, so what "the best way" is depends on what other things you need, not on incrementalness.
Personally I would use restic in your situation because I'm familiar with it already and it does what I need (I particularly like the encryption aspect), but that's not to say that borg, bup, rsync, or other tools couldn't also fit your needs.
I do this too. And you can combine rsync with gocryptfs (in reverse mode) to get strong encryption of your backups as well. This is especially important if you are storing the backups on a remote/untrusted device.
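A minimal sketch of that combination (paths and remote are placeholders):
# one-time setup: create the reverse-mode config inside the plaintext directory
gocryptfs -init -reverse ~/data
# expose an encrypted, read-only view of the plaintext directory
gocryptfs -reverse ~/data ~/data.encrypted
# rsync the encrypted view to the untrusted remote
rsync -a --delete ~/data.encrypted/ remote:/backups/laptop/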
I use rsync to back up my desktop (which is mostly used for development): pretty much just rsync /home to a big enough external LUKS drive.
Has saved my bacon a few times. I don't care about incremental snapshots or deltas (though I've used both rsnapshot and rdiff successfully in the past); I'm covering the "if the SSD blows up, how long to chuck a new one in and be back up" case, not the "I might need that file from 6 months ago" case.
I also have syncthing setup via /home/<user>/Shared/{Personal,Work}/ on every machine and important stuff I just chuck in Shared/ and forget about/it's available wherever I need it at any point.
I've had really ornate, bulletproof, snapshot-based backups, but honestly for my particular use case they were more hassle than they were worth. rsync does what I want every time and has never let me down.
Use Borg, and copy the whole image. Borg will compress and deduplicate it, so you only backup the changes each time. You'll get point in time historical backups, so if you accidentally delete stuff, you can find it again. Useful because data loss isn't always just hardware failure!
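A hedged sketch of what that can look like (device, repo path and compression choice are placeholders; check the borg docs on reading from stdin for your version):
# stream the SD card image into a borg archive; borg chunks, compresses and deduplicates it
dd if=/dev/mmcblk0 bs=4M status=progress | borg create --compression zstd /mnt/backups/pi::sdcard-{now} -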
What if I don't want to encrypt? I would like tarsnap/restic/whatever (but unlike tarsnap, I want to be able to self host) and I want it without encryption.
To be clear, "every snapshot [is] always a full backup" does not mean that you have to upload your whole terabytes-large drive every time. I guess what you mean is that each backup stands on its own, referring to the chunks it needs. I don't think that's what people mean when they say that each snapshot is a new full backup but rather that this is the definition of it being incremental.
IIRC restic operates at the file level on the source side and then stores data in blobs, and unchanged blobs are deduplicated (it operates as a content addressed store, kind of like git), which means iterative backups are possible.
Yes, it deduplicates but does not compress. Since most large files (mp3, jpg, mp4, docx, etc.) are already compressed, this is not a big issue but it's not ideal either.
If I remember correctly, another tool either does both or does compression instead of deduplication, this might have been borg or bup. In case that's something you care about.
Sorry about the formatting. But the compression is not completely irrelevant. Dedup of blocks between files and backups is of course the absolutely most crucial part though.
Please correct me if 'one backup' and 'all backups' is an incorrect interpretation of 'all archives'. I wasn't entirely sure what you mean by that but I think I get the point.
So in conclusion, adding compression saves about 17% (a $10 monthly bill would be $8.28 instead, if you pay per GB) for you.
Even if I'd get the better of the two values, I don't have enough systems that a backup being 2/3rds of the original data size reduces the number of backup disks I need to buy, but it's not insignificant either.
Well, for one Tarsnap is incredibly expensive. I get 6TB of cloud storage for roughly $65/year with unlimited transfer. Tarsnap would be in excess of $1500 per month.