Database backups via mysqldump: from MariaDB container to S3

For a little side project I wanted an easy way to perform regular backups of a MariaDB database and upload the resulting gzipped dump to S3.

Here are the steps to make this happen.

The setup

  • Docker container running MariaDB
  • Docker engine running on an AWS EC2 instance
  • An S3 bucket as the destination for the dumps
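For reference, a MariaDB container for a setup like this could be started roughly as follows. The container name, password, volume path, and image tag are placeholders, not the exact values used here:

```shell
# Hypothetical example: run a MariaDB container with a root password
# and a host volume for the data directory. Adjust names and paths
# to your own setup.
docker run -d \
  --name my-mariadb \
  -e MYSQL_ROOT_PASSWORD=changeme \
  -v /opt/mariadb-data:/var/lib/mysql \
  mariadb:10
```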

Writing the backup script

We begin by writing our shell script, backup.sh, which we will later execute at regular intervals from our host:

#!/bin/bash

set -e

temp_file=$(/usr/bin/mktemp)
trap '/usr/bin/rm -f "$temp_file" "$temp_file.gz"' EXIT

# create db dump
/usr/bin/docker exec <my container> sh -c 'exec mysqldump --hex-blob -uroot -p"$MYSQL_PASSWORD" <my database>' > "$temp_file"

# gzip
/usr/bin/gzip "$temp_file"

# upload to S3
/usr/bin/aws s3api put-object --bucket <my bucket> --key "<some directory/prefix>/$(date -u '+%Y-%m-%d_%H-%M-%S').dump.sql.gz" --body "$temp_file.gz"

exit 0

What are we doing here?

  • First, we create a temp file for our operation (mktemp) and register a trap that cleans it up (with or without the .gz extension) on exit.
  • Next, we perform the dump via mysqldump running inside the MariaDB container. The result is written to our temp file.
  • Note that we’re using --hex-blob so that data from binary columns is hex-encoded instead of garbled.
  • After writing our dump file, we gzip it.
  • Finally, we upload the gzipped file via the aws CLI to a bucket of our choice and give it a timestamped name.
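A side note on the temp-file handling: gzip compresses in place, replacing $temp_file with $temp_file.gz, which is why the trap removes both names. You can see the behavior locally:

```shell
#!/bin/bash
# Demonstrates the gzip-in-place behavior the trap accounts for.
temp_file=$(mktemp)
echo "-- dummy dump" > "$temp_file"

gzip "$temp_file"             # replaces $temp_file with $temp_file.gz

[ ! -e "$temp_file" ] && echo "original file is gone"
[ -e "$temp_file.gz" ] && echo "gzipped file exists"
gunzip -c "$temp_file.gz"     # prints: -- dummy dump

rm -f "$temp_file.gz"
```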

Setting permissions for S3 upload

To allow our EC2 instance to put objects into our bucket, we’ll adjust the policy like so (note that s3:ListBucket applies to the bucket itself, while s3:PutObject applies to the keys within it, so each needs its own resource ARN):

{
  "Effect": "Allow",
  "Action": [
    "s3:ListBucket",
    "s3:PutObject"
  ],
  "Resource": [
    "arn:aws:s3:::<my bucket>",
    "arn:aws:s3:::<my bucket>/<some directory/prefix>/*"
  ]
}

Installing the cron job

To let our backup script run every nth hour, we set up a cronjob via crontab -e:

0 */6 * * * /path/to/script/backup.sh && /usr/bin/curl -s https://ping.allgood.systems/l/j/<random id>

Here, we are running the script every 6 hours and – on success – issuing an HTTP request via curl to a monitoring system. The latter part is optional but a good way to get notified should your script ever stop running at the expected interval. The monitoring service I use is allgood.systems, which I recently built.

Cleaning up old backups

Once the backup files are in your S3 bucket you can decide how to proceed. Generally, it is recommended to archive some of the dumps at a different location.
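Should you ever need to restore one of these dumps, a sketch of the reverse direction might look like this – bucket, key, container, and database names are the same placeholders as in the backup script:

```shell
# Hypothetical restore: download a dump from S3, then pipe it into
# mysql inside the container.
aws s3api get-object --bucket <my bucket> --key "<some directory/prefix>/<timestamp>.dump.sql.gz" /tmp/restore.sql.gz
gunzip -c /tmp/restore.sql.gz | docker exec -i <my container> sh -c 'exec mysql -uroot -p"$MYSQL_PASSWORD" <my database>'
rm -f /tmp/restore.sql.gz
```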

To regularly clean up your old files I would additionally recommend setting up a “Lifecycle rule” for your S3 bucket or even just a prefix in that bucket. This would then automatically assign an expiry date to your uploaded files.
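As a sketch, such a lifecycle rule can be defined as JSON and applied via the aws CLI; the rule ID and the 30-day window below are arbitrary example values:

```shell
# lifecycle.json – expire backups under our prefix after 30 days (example value)
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-old-backups",
      "Filter": { "Prefix": "<some directory/prefix>/" },
      "Status": "Enabled",
      "Expiration": { "Days": 30 }
    }
  ]
}
EOF

# apply the rule to the bucket
aws s3api put-bucket-lifecycle-configuration --bucket <my bucket> --lifecycle-configuration file://lifecycle.json
```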

Final thoughts

If you are working with a larger/busier/more critical database, there are a couple of things you might want to tweak (e.g. with regard to buffering, single transactions, checking for enough disk space, etc.) or you might choose a different approach altogether (e.g. full snapshots, setting up replication, etc.).
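For instance, for larger InnoDB databases the dump line could be extended with mysqldump’s --single-transaction and --quick flags; treat this as a sketch to adapt, not a drop-in:

```shell
# Variant of the dump command for larger InnoDB databases:
# --single-transaction: dump from a consistent snapshot without locking tables
# --quick: stream rows instead of buffering entire tables in memory
/usr/bin/docker exec <my container> sh -c 'exec mysqldump --hex-blob --single-transaction --quick -uroot -p"$MYSQL_PASSWORD" <my database>' > "$temp_file"
```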

Like to comment? Feel free to send me an email or reach out on Twitter.

Did this or another article help you? If you like and can afford it, you can buy me a coffee (3 EUR) ☕️ to support me in writing more posts. In case you would like to contribute more or I helped you directly via email or coding/troubleshooting session, you can opt to give a higher amount through the following links or adjust the quantity: 50 EUR, 100 EUR, 500 EUR. You'll be able to get a receipt for your business.