Arch Linux based File Server, Backblaze B2 (cloud storage)

As I mentioned in the abstract, Backblaze B2 storage is quite affordable, and is straightforward to use.  Again, at one point I had over 4TiB stored, with a rough monthly storage cost of about $20 (USD, February 2022).  Backblaze does not charge for uploads, just storage and downloads.  Their pricing page lists a flat $5/TB/month (they use SI units, so 1000GB = 1TB, not 1024GiB = 1TiB), which works out to $0.005 per GB, with the first 10GB free.  Downloads are $0.01 per GB (I had downloaded about 650GB/608GiB when I tested this during my DRE, which came to about $6.50).  They also don't have complicated storage tiers: every stored file is hot, it can be downloaded as soon as it's uploaded, and there is no limit to how much you can store or retrieve.  There is also no minimum storage duration, so you can delete files as soon as you upload them (this is important, since Borg could delete files immediately after a prune and compact step).  If you have a cloud storage provider that is even cheaper than this, with better terms, please leave a comment below!
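
For the curious, here's the back-of-the-envelope arithmetic behind that monthly figure (a zsh one-liner; 4 TiB at the $0.005/GB rate works out to roughly $22, in the same ballpark as my actual bill, ignoring download charges):

# 4 TiB expressed in SI gigabytes, times $0.005 per GB per month
print $(( 4 * 1024**4 / 1000.0**3 * 0.005 ))
# => roughly 21.99 (USD per month)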

So, let's get started!

Initialization

If you haven't created a Backblaze B2 account yet, follow the Backblaze B2 Quick Start Guide.  Those instructions have you create your first B2 bucket; you can do that now, or I show you how to do it later from the command line.  I made my bucket private, so only I (or Backblaze) have access to the bucket.  I also set the Lifecycle rules to only keep the latest version of any uploaded file, since Borg keeps track of these files.  Also, I do not have Backblaze encrypting the files I upload.  Encryption doesn't appear to cost extra, but since my Borg clients already encrypt before they send data to the Borg server (tennessine), there's no need for B2 to encrypt things.  Originally I did all of this setup from the Backblaze website (before I had Borg encrypt files), but you could also do this through the B2 API, which is what the backblaze-b2 CLI tool does.  The main thing is getting the B2 application and bucket keys, since you'll need them for the next steps.  You will be presented with the master account application key only once, so please store it somewhere safe (like your password manager) so you can refer to it later!  This key is needed to create buckets and manage the account.  You can also create per-bucket application keys, which is recommended, so that a key only has access to its bucket.  These tokens are how you authenticate to B2, rather than usernames, passwords, or multi-factor authentication (MFA, sometimes called two-factor authentication, or 2FA).  Since we will be scripting this, using MFA is infeasible; we don't want to be pinged each time the server wants to upload new files, or delete files.

Next, install backblaze-b2 from the AUR.  Backblaze just calls this tool b2, but since Arch already has an officially accepted package in the extra repository that provides an executable named b2 (boost, a popular C++ library), the AUR package maintainer calls the Backblaze B2 CLI tool backblaze-b2.
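
For example, with an AUR helper (paru shown here purely as an assumption; yay or a plain makepkg build works the same way):

# with an AUR helper
paru -S backblaze-b2

# or build the package manually
git clone https://aur.archlinux.org/backblaze-b2.git
cd backblaze-b2
makepkg -si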

Authenticate

The first thing to do with backblaze-b2 is authenticate with B2, using the application key and application key ID.  When you created your account you should have created and stored your master application key and its ID.  To be more secure, you may wish to generate a new application key that is bucket-specific, along with its key ID.  At least on the B2 website, you only see these keys once, so store them somewhere safe, like your password manager.  I first authorized my master account on my laptop, ferrum, using the following command:

backblaze-b2 authorize-account 0123456789ab cdef0123456789abcdef0123456789abcdef012345

This way I can create and manage buckets using backblaze-b2 from ferrum, and there's no need to log into the Backblaze website.  This will create the SQLite file ~/.b2_account_info, so you only need to authenticate with this user once.
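
If you ever want to double-check which account and key the CLI is currently authorized with, get-account-info should print the cached account details and the capabilities of the key in use:

backblaze-b2 get-account-info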

Create Bucket

Once you've authenticated with your master application key and ID, it's now time to create a bucket.  If you're more comfortable doing this from the Backblaze website, the Quick Start Guide linked above shows you how to do it.  If you manage all buckets from the website, you don't need to authorize using your master application key and ID; you can simply replace the master application key ID (the first positional parameter in the backblaze-b2 authorize-account command) with the bucket key's ID, and use the bucket application key as the second positional parameter to authorize-account.  Again, if you do this from the website you only see the application key once, so store it in a safe place!  The rest of these instructions assume you're using the command line.

From the command line/CLI, I need to set up variables for the bucket CORS (Cross-Origin Resource Sharing) rules, and Lifecycle rules.  Actually, since I won't be downloading files via a web browser, CORS rules are not necessary.  I do need to set the Lifecycle rules, since I only ever want to keep the latest version of a file (Borg manages the changing and deletion of files through pruning and compacting).  See the Lifecycle rules page linked above for the JSON (JavaScript Object Notation) structure that needs to be set.  Here is the variable I set up (this requires the jq package, used here just to pretty-print the JSON):

lifecycle_rules='[{"daysFromHidingToDeleting": 1, "daysFromUploadingToHiding": null, "fileNamePrefix": ""}]'
jq . <<< "${lifecycle_rules}"
[
  {
    "daysFromHidingToDeleting": 1,
    "daysFromUploadingToHiding": null,
    "fileNamePrefix": ""
  }
]

Now, I can create the bucket:

backblaze-b2 create-bucket --lifecycleRules "${lifecycle_rules}" myEncryptedB2Bucket allPrivate
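
As an aside, if you ever need to look up the bucket ID again or double-check the Lifecycle rules later, get-bucket should print the bucket's details:

backblaze-b2 get-bucket myEncryptedB2Bucket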

The create-bucket command will output the bucket ID, which is a 24-character hexadecimal string, e.g. 0123456789abcdef01234567.  We will need the bucket name in the next step to create the bucket application key ID and key/token:

backblaze-b2 create-key --bucket myEncryptedB2Bucket myEncryptedB2Bucket-mgmt listAllBucketNames,listBuckets,readBuckets,listFiles,readFiles,shareFiles,writeFiles,deleteFiles,readBucketEncryption,writeBucketEncryption

Since this is a bucket application ID/key, you can't use the --allCapabilities option.  listAllBucketNames is a required capability for this key, so backblaze-b2 can map the bucket name to its ID.  All of the other capabilities are necessary as well.  I had actually grabbed this list from backblaze-b2 list-keys --long for an existing bucket, before I had set up Borg encryption.  The create-key command will output the application key ID and key, which we can use to authorize the account as the backup user on tennessine.  This is the only time the key/token will be output; if you lose it you'll need to generate a new application ID and key.  With these, I used a similar command to the backblaze-b2 authorize-account command above.
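
With the new bucket-scoped credentials, authorizing as the backup user on tennessine looks something like this (the key ID and key below are placeholders; substitute the values printed by create-key):

# run as the backup user on tennessine, with placeholder credentials shown
backblaze-b2 authorize-account 000123456789abcdef0000001 K000abcdefABCDEF0123456789abcdef012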

Back Up to B2

With the bucket, application ID, and key created, and the backup user authorized, we can start backing up to Backblaze B2.  First, I wrote a script to do that, /usr/local/sbin/borg_backblaze.sh:

#!/usr/bin/env zsh

b2_bucket="b2://myEncryptedB2Bucket/"
borg_dir="/mnt/snapshots/borg/encrypted"
data_disk="/dev/sda2"

_ret=0
_err=()

if grep -q "${borg_dir}" /proc/mounts; then
    # latest snapshot is mounted, back up to B2
    if ! backblaze-b2 sync --delete --replaceNewer --excludeDirRegex '.*/.snapshots' "${borg_dir}" "${b2_bucket}"; then
        # record the failure
        _ret=$(( _ret + 1 ))
        _err+=("Backup to B2 failed!")
    fi
else
    _ret=$(( _ret + 1 ))
    _err+=("Latest snapshot not mounted!")
fi

if [[ ${_ret} -gt 0 ]]; then # we had an error
    echo "Backing up to B2 failed!  Retrun code: ${_ret}, Error array:  "
    for err in ${_err[@]}; do
        echo ${err}
    done
    exit ${_ret}
fi

# if we're here, backup was successful!

Be sure to replace the myEncryptedB2Bucket name with your globally unique bucket name!  This assumes the modified /usr/local/sbin/mount-latest-snapshot.sh runs in the ExecStartPost of snapper-timeline.service, via the systemd unit override (as in the Borg setup; a sketch of that override follows the script):

#!/usr/bin/env zsh

#set -x
# Set Btrfs subvolume IDs
enc_snapshots_subvol_id="48246"
latest_enc_snapshot_id=$(sudo btrfs subvolume list -p /data | grep "parent ${enc_snapshots_subvol_id}" | tail -n 1 | grep -Po '^ID \d+' | tr -d '[ID ]')

# Set up variables
btrfs_disk="/dev/sda2"
enc_dir="/mnt/snapshots/borg/encrypted"

_ret=0
_err=()

# unmount previous encrypted snapshot
if grep -q "${enc_dir}" /proc/mounts; then
    if ! umount "${enc_dir}"; then
        _ret=$(( _ret + 1 ))
        _err+=("Could not unmount source directory: ${enc_dir}!")
    fi
fi

# mount latest encrypted snapshot
if ! mount -o ro,compress=zstd,subvolid=${latest_enc_snapshot_id} "${btrfs_disk}" "${enc_dir}"; then
    _ret=$(( _ret + 1 ))
    _err+=("Could not mount latest snapshot to ${enc_dir}!")
fi


if [[ ${_ret} -ne 0 ]]; then
    # something above failed, exit with nonzero status
    echo "Mounting latest snapshots failed!  Return code:  ${_ret}, Error array:  "
    for err in "${_err[@]}"; do
        echo "${err}"
    done
    exit ${_ret}
fi

# If we're here, the latest encrypted snapshot has been mounted
#set +x
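
For reference, the systemd unit override mentioned above looks roughly like this (a sketch only; the drop-in filename and the exact ExecStartPost line are assumptions based on the description, and the file would normally be created with systemctl edit snapper-timeline.service):

# /etc/systemd/system/snapper-timeline.service.d/override.conf
# (sketch: exact contents depend on your Borg setup)
[Service]
ExecStartPost=/usr/local/sbin/mount-latest-snapshot.sh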

Next, I set up the systemd units backblaze.service and backblaze.timer:

# /etc/systemd/system/backblaze.service
[Unit]
Description=Backup Borg data to the Backblaze B2 Cloud
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
User=backup
ExecStart=/usr/local/sbin/borg_backblaze.sh

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/backblaze.timer
[Unit]
Description=Upload Borg data to Backblaze
Wants=network-online.target
After=network-online.target borg-compact.timer

[Timer]
OnCalendar=*-*-* 00/4:00:00
AccuracySec=5min
RandomizedDelaySec=10min

[Install]
WantedBy=timers.target

Finally, I enabled the backblaze.timer:

systemctl enable --now backblaze.timer
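
To confirm the timer is actually scheduled (and see when it will next fire), you can check it with list-timers:

systemctl list-timers backblaze.timer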

Restoring from B2

Eventually, rebuilding the Borg server repositories will become necessary, either as part of a disaster recovery exercise (DRE), or recovering from an actual catastrophic event (flood, fire, petty theft, or other complete failure of the file server).  These instructions are a way to restore these from Backblaze B2 cloud storage.

For these instructions, I'm restoring to my old QNAP NAS, sodium.  The first thing I need to do is use backblaze-b2 to authorize the restoring system user (in this case, backup) to access the encrypted Borg repository in B2:

backblaze-b2 authorize-account 0123456789ab cdef0123456789abcdef0123456789abcdef012345

Next, I created the destination directory and changed to it:

mkdir -p /data/backup/borg/encrypted
cd !$

After some trial and error, I developed the following script to download and store each file in the Backblaze B2 bucket into the current directory:

# for each file in the bucket
for file in $(backblaze-b2 ls \
              --long --recursive --json \
              myEncryptedB2Bucket | jq -r '.[].fileName'); do 
    echo "${file}"
    mkdir -p "$(dirname "${file}")"
    backblaze-b2 download-file-by-name myEncryptedB2Bucket "${file}" "${file}"
done

Replace myEncryptedB2Bucket with your actual B2 bucket name (which is globally unique).  After this, I can continue restoring access to the Borg repositories, as described in the Borg DRE section.

Conclusion

And that's it for backing up to and restoring from Backblaze B2!  If you have any questions or comments, please leave them below!

Next Steps

The following articles describe how I set up the Arch Linux-based file server introduced in the abstract:

Please leave feedback below if you have any comments or questions!