Also see backup hardware and backup cloud. Also see personal knowledge base.
There are overlaps between these points.
Also, temporary files might/should be excluded (e.g. /tmp/*; see the rsync sketch after the following list).
- full Linux system including all files, such that recovering it gives me a fully functional system
- home directories
- archives (which are themselves backups of older computers/discs that I no longer have direct access to)
- projects (programming but also other things) (usually Git repos)
- my pictures
- my music collection
- maybe movies (not so important)
- contact details (e.g. Google contacts)
- Tweets
- movie rankings (score11.de, imdb)
- documents / texts / mails
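For the exclusion of temporary files and the full-system case, a minimal rsync sketch, assuming a backup disk mounted at /mnt/backup (the path and the exclude list are examples; run as root for a full system copy):

```sh
# Plain full-system copy with rsync; pseudo-filesystems and temp files are excluded.
rsync -aAXHv \
    --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
    --exclude='/tmp/*' --exclude='/run/*' --exclude='/mnt/*' \
    --exclude='/media/*' --exclude='/lost+found' \
    / /mnt/backup/full-system/
```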
- Certain files are more important than others (e.g. home directory files, programming projects, mails).
- I want to keep several copies around (on different media, PCs, online services). Not all copies must be complete; some could contain just the important files.
- Some overview of where (media, PCs, online services) I have which backups, which files they contain, and maybe even which versions.
- Some of the projects etc. have their own versioning (e.g. Git repos), but this probably does not matter too much. It is not totally clear whether the backup system should have its own versioning support; it probably should not keep the full history of everything. It should also be possible to permanently delete things from the whole system.
- Copying will take a long time because of the huge amount of data, so there has to be some continuous/incremental update.
- Backups on external cloud storage would be nice as well, but should be encrypted there (see the restic sketch after this list).
- Some projects / pictures are published elsewhere (GitHub, Google Photos or so). It would be good if the system knew that, and maybe provided an easy way to publish further directories.
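For the encrypted-cloud and continuous-update points, a minimal restic sketch, assuming an sftp-reachable host and a local password file (the repository URL, password file, and exclude are examples). restic encrypts client side, and repeated runs only upload new/changed chunks:

```sh
export RESTIC_REPOSITORY=sftp:backup@example.com:/srv/restic/home
export RESTIC_PASSWORD_FILE="$HOME/.restic-password"

restic init                                       # once per repository
restic backup --exclude "$HOME/.cache" "$HOME"    # re-run regularly (cron/systemd timer)
restic snapshots                                  # list what this repository contains
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
```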
In any case, I want to have an index of all the files, and that index should contain meta information, e.g. which backups contain the file, and other things.
The index could be part of the backup solution, or external (but it should know about the multiple copies).
Git-Annex might be an option.
Baloo or others are maybe relevant for a search index.
- Wikipedia software list
- ArchLinux software list
- Ubuntu software list
- rsync
- duplicity. Encrypted tar-format volumes, uploading them to a remote or local file server. No central index.
- Bacula
- Perkeep (previously Camlistore). Also for indexing of pics, etc. Looks close to what I want. Comparison.
- Upspin (HN). Similar to Perkeep, but different focus. Also very relevant.
- SeaweedFS stores blobs, objects, files.
- Syncthing (HN)
- FreeFileSync
- bup
- restic
- Kopia (HN)
- Box Backup
- BorgBackup (HN). Used by rsync.net. Special cheap rsync.net cloud storage support. Remotely encrypted backups. No central index.
- Rclone. "rsync for cloud storage". No central index of stored data.
- BackupPC
- Bareos
- Areca Backup
- Burp
- git-annex
- Datalad. On top of git-annex.
- Dat
- Unison File Synchronizer. project dead.
- Seafile
- albertz/backup_system: incomplete
- Resilio. commercial
- Dropbox, Google Drive, etc. commercial
- imap-backup
A lot of the software can be divided into:
- Standard backup software: choose what files to back up, and where. You are responsible for how many copies there are and for keeping track of which files are backed up where, i.e. there is no global index of all files. These tools might be simpler to use, though (see the BorgBackup sketch after this list).
- Global index based systems, like Perkeep or Upspin. They are not designed to work with lots of small files (e.g. Git object files, whole Linux systems, etc.) but more for media files (images, documents).
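As an illustration of the first category, a minimal BorgBackup sketch (the repository path is an example; an ssh:// repository, e.g. on rsync.net, works the same way). Note that `borg list` only knows about this one repository; tracking copies across several repositories/media stays your job:

```sh
borg init --encryption=repokey /mnt/backup/borg-repo
borg create --stats --exclude '*/.cache' \
    /mnt/backup/borg-repo::home-{now} ~/
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /mnt/backup/borg-repo
borg list /mnt/backup/borg-repo     # archives in this repository only
```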
The stored backup can have its own custom format (e.g. for efficient incremental backups) or the files can be stored as-is. A custom format means that accessing it needs custom tools; it might support access via FUSE, but that is not as efficient as direct access to files stored as-is.
It might make sense to decouple the storage of the files (maybe just as-is) from the index (which keeps track of which backup or remote contains which files, etc.).
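One rough way such a decoupled index could look, using only standard tools (mount points and the index directory are examples; a real index would also carry more metadata per file):

```sh
mkdir -p ~/file-index
( cd /mnt/usb1 && find . -type f -print0 | xargs -0 sha256sum | sort ) \
    > ~/file-index/usb1.sha256
( cd /mnt/usb2 && find . -type f -print0 | xargs -0 sha256sum | sort ) \
    > ~/file-index/usb2.sha256

# Entries (hash + path) present on usb1 but missing or different on usb2:
comm -23 ~/file-index/usb1.sha256 ~/file-index/usb2.sha256
```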
Note that not all of this software seems to be maintained anymore. Check the corresponding Git repo to see whether it is still active.
- albertz/google-contacts-sync: syncs Google contacts
- albertz/memos: collects Tweets. Similar: Timeliner.
- albertz/personal_assistant: personal assistant. A backup, or a knowledge base, is kind of an integrated part of this, i.e. knowing where to find what data.
- albertz/system-tools, albertz/helpers: small tools to sync/download things, or create projects, etc
- albertz/iphone-backup
- Baloo or others for indexing
- Solid project, e.g. Solid Google Takeout importer
- unixsheikh.com: How I store my files and why you should not rely on fancy tools for backup (HN): using ZFS
- stavros.io: I found the Holy Grail of backups: using Borg
What's missing from Perkeep for the outlined use case? What would the workflow look like?
- The index of objects/files:
- Is it easily synchronized, so always up-to-date?
- Reasonable small enough, so every backup instance can have the full index? Or do we need partial index support?
- Does it contain information on what media/PC we have the data? If not, can we add that? (I want to see how many copies of some object (or tree) exist, and have control over that.)
- Good idea to just push all Git object files into it?
- Should we then also push the checked out files into it? We already have all the data from the Git objects.
- Can Perkeep directly read and understand the Git object files? Directly accessible (read-only) via FS?
- Would that work well with once-written/offline backup media (DVD, tape)?
- Automatic backup schedules:
- Some trees (e.g. home dir) should automatically be synced to multiple online media.
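Independent of whether Perkeep (or anything else) ends up doing the storage, the scheduling part could be as simple as a cron job; a sketch, where the sync script and paths are hypothetical:

```sh
# Append a nightly 03:00 job to the user crontab that pushes the home dir
# to the various remotes (script path is hypothetical).
( crontab -l 2>/dev/null; \
  echo '0 3 * * * /home/user/bin/sync-home-backups.sh >>/home/user/backup.log 2>&1' ) | crontab -
```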
What's missing from Bup for the outlined use case? What would the workflow look like (see the sketch after the questions below)? It is simpler than Perkeep (no concept of users or access control), but that might not be a dealbreaker.
- Python 2?
- The index of objects/files:
- Is it easily synchronized, so always up-to-date?
- Index is a single file? Can it be distributed? Partial?
- Does it contain information on what media/PC we have the data? If not, can we add that?
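A minimal sketch of the bup index/save cycle (BUP_DIR and the directory are examples; saves are deduplicated against everything already in the repository):

```sh
export BUP_DIR=/mnt/backup/bup-repo
bup init
bup index ~/Pictures                 # refresh bup's own file index
bup save -n pictures ~/Pictures      # snapshot under the "pictures" branch
bup ls /pictures/latest              # inspect the latest snapshot
```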
If a single repo (maybe more convenient, e.g. with git-annex), how would I link existing files into it? E.g. my current picture collection, which is already distributed, with each copy being partial.
Would I just go into one of the copies and do git init and git annex init?
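A hedged sketch for initializing one existing copy in place (~/Pictures, the "laptop" description, and the file name are examples):

```sh
cd ~/Pictures
git init
git annex init "laptop"
git annex add .                        # move the existing files into the annex
git commit -m "add existing picture collection"

git annex numcopies 2                  # demand at least two copies of each file
git annex whereis some-picture.jpg     # which repositories hold this content?
```

The other, partial copies could presumably be initialized the same way and connected via git remote add plus git annex sync; after that, git annex whereis shows how many copies of each file exist across the repositories.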
E.g. via locality-sensitive hashing (LSH).
Good also in the sense of how easy it is to use.