Using Kubernetes CronJobs

2020-05-05

My backups are fairly simple, but they work for me. I back everything up to GSuite (pseudo-unlimited storage) using Rclone, and I encrypt everything I back up there, because that’s the right thing to do.

Before proceeding through this guide, you should already have a functioning connection to GSuite using Rclone. If you do not have a working rclone.conf, I would recommend using this guide.

Manifests

The Kubernetes Manifests are located here: live_archivist/demo-bu-rclone

Rclone Namespace

Right now I have two Kubernetes clusters, but I’m going to merge them into one big cluster very soon. To prepare for that, I’ve started setting up a separate namespace for each of the services that will be running. Eventually I will add QoS controls to certain namespaces to ensure their workloads get the resources they need.

apiVersion: v1
kind: Namespace
metadata:
  name: rclone

Thankfully, creating a namespace is easy; there isn’t much to declare.

Apply the namespace by running kubectl apply -f manifests/1_demo-bu-rclone-namespace.yaml.
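The QoS work mentioned above isn’t part of this setup yet, but as a rough sketch of what namespace-level resource settings can look like, a LimitRange gives every container in the rclone namespace default CPU and memory requests and limits. The name rclone-defaults and all of the values are placeholders picked for illustration, not tuned numbers.

apiVersion: v1
kind: LimitRange
metadata:
  name: rclone-defaults      # hypothetical name
  namespace: rclone
spec:
  limits:
  - type: Container
    defaultRequest:          # used when a container declares no requests
      cpu: 100m              # placeholder value
      memory: 128Mi          # placeholder value
    default:                 # used when a container declares no limits
      cpu: 500m              # placeholder value
      memory: 512Mi          # placeholder value

Containers that declare no resources pick up these defaults, and because the requests are lower than the limits, their Pods land in Kubernetes’ Burstable QoS class.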

rclone.conf to ConfigMap

Initially, I was just going to map a volume to the container running rclone and put my rclone.conf file in there, but that’s not very k8s-y, so I decided to figure out how to do this with a ConfigMap.

A ConfigMap decouples a pod’s configuration from the pod itself, which makes your pods and containers easier to reuse. The data: section of a ConfigMap is just one or more key-value pairs. In our case, we’re going to turn the entire rclone.conf file into a single key-value pair: the key is rclone and the value is the file’s contents.

apiVersion: v1
kind: ConfigMap
metadata:
  name: rclone-config
  namespace: rclone
data:
  rclone: |
    [gdrive]
    type = drive
    client_id = REDACTED
    client_secret = REDACTED
    scope = drive
    token = REDACTED
    root_folder_id = REDACTED

    [gcache]
    type = cache
    remote = gdrive:/
    chunk_size = 5M
    info_age = 1d
    chunk_total_size = 10G

    [gcrypt]
    type = crypt
    remote = gcache:/
    filename_encryption = standard
    directory_name_encryption = true
    password = REDACTED
    password2 = REDACTED

This is pretty straightforward; just make sure the namespace matches the one we created in the previous step. Create the ConfigMap by running kubectl apply -f manifests/2_demo-bu-rclone-configmap.yaml.
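One caution: this rclone.conf holds OAuth credentials and the crypt passwords. If you’d rather keep them in a Secret than a ConfigMap, here is a minimal sketch. The name rclone-secret is hypothetical, and the config is trimmed to a few lines for brevity. Secrets are only base64-encoded at rest unless encryption at rest is configured on the cluster, but kubectl describe won’t print their values, and RBAC can restrict who reads them separately from ConfigMaps.

apiVersion: v1
kind: Secret
metadata:
  name: rclone-secret        # hypothetical name
  namespace: rclone
type: Opaque
# stringData accepts plain text; the API server stores it base64-encoded under data.
stringData:
  # Same key as the ConfigMap; paste the full rclone.conf here (trimmed below).
  rclone: |
    [gdrive]
    type = drive
    token = REDACTED

The CronJob’s rclone-confmap volume would then use secret: with secretName: rclone-secret in place of configMap: with name: rclone-config; the items list stays the same. Secret volumes are read-only as well, so the InitContainer copy described below is still needed.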

CronJob

At first I thought this would be very straightforward: create the CronJob and mount a volume containing the ConfigMap, like in this blog post. Unfortunately, I ran into issues, because at runtime rclone copies rclone.conf to rclone.conf.old and then works from an in-memory copy of the config in case it needs to make changes. That copy is a write to the filesystem, and ConfigMap volumes are read-only.
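For contrast, here is a sketch of that direct-mount approach, reconstructed for illustration rather than copied from a manifest I actually ran. It is a fragment of the Pod spec: the ConfigMap is mounted straight into the rclone container, so rclone’s write next to rclone.conf fails because the volume is read-only.

# Direct-mount sketch: does NOT work with rclone, because the
# ConfigMap-backed /config directory is read-only inside the container.
containers:
- name: cronjob-demo-backup
  image: lucashalbert/rclone
  volumeMounts:
  - name: rclone-confmap
    mountPath: /config       # rclone.conf appears here, but nothing can be written
volumes:
- name: rclone-confmap
  configMap:
    name: rclone-config
    items:
    - key: rclone
      path: rclone.conf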

Of course, I didn’t figure this out immediately, or even an hour into it. At first I assumed my YAML files were using the wrong syntax, but it turned out I needed an InitContainer to copy the contents of the ConfigMap into a new rclone.conf file in an emptyDir volume.

Here’s the resulting manifest; we’ll walk through it piece by piece below.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cronjob-demo-backup
  namespace: rclone
spec:
  schedule: "0 7 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          initContainers:
          - name: init-cronjob-demo-backup
            image: busybox
            command: ['cp', '/config/rclone.conf', '/rc-config/rclone.conf']
            volumeMounts:
            - name: rclone-confmap
              mountPath: /config/rclone.conf
              subPath: rclone.conf
            - name: rc-config
              mountPath: /rc-config
          containers:
          - name: cronjob-demo-backup
            image: lucashalbert/rclone
            volumeMounts:
            - name: cronjob-demo-data 
              mountPath: /data
            - name: rc-config
              mountPath: /config
            env:
            - name: SUBCMD
              value: "sync"
            - name: PARAMS
              value: "-v /data gcrypt:/_backups/cronjob-demo"
          volumes:
          - name: cronjob-demo-data
            hostPath:
              path: /mnt/homedev/cronjob-demo
              type: Directory
          - name: rclone-confmap
            configMap:
              name: rclone-config
              items:
              - key: rclone
                path: rclone.conf
          - name: rc-config
            emptyDir: {}
          restartPolicy: OnFailure

The first part of the manifest is a pretty basic CronJob declaration: all we’re saying is to create a CronJob with a particular name, in a particular namespace, on a specific schedule ("0 7 * * *" means 07:00 every day). Note: the schedule is evaluated in the time zone of the kube-controller-manager, which is usually UTC, so adjust accordingly.

My backups can sometimes span days, thanks to my cable company’s craptastic upload speeds, so I wanted to ensure that a second job wouldn’t start if the previous one hadn’t finished. Enter concurrencyPolicy: with it set to Forbid, the CronJob controller skips a scheduled run while the previous Job is still running.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cronjob-demo-backup
  namespace: rclone
spec:
  schedule: "0 7 * * *"
  concurrencyPolicy: Forbid
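This manifest uses batch/v1beta1, which was the CronJob API version when I wrote this. On newer clusters CronJob is batch/v1 (GA in Kubernetes 1.21), and batch/v1beta1 was removed in 1.25, so the same header would look roughly like the sketch below. The timeZone, startingDeadlineSeconds, and history-limit fields are optional extras that my manifest doesn’t use; timeZone needs a recent cluster (it is stable as of 1.27), and the one-hour deadline is an arbitrary example value.

apiVersion: batch/v1             # batch/v1beta1 is gone as of Kubernetes 1.25
kind: CronJob
metadata:
  name: cronjob-demo-backup
  namespace: rclone
spec:
  schedule: "0 7 * * *"          # minute hour day-of-month month day-of-week: 07:00 daily
  timeZone: "Etc/UTC"            # optional: pin the zone instead of inheriting the controller's
  concurrencyPolicy: Forbid      # skip a run while the previous Job is still active
  startingDeadlineSeconds: 3600  # optional example: drop a run that can't start within an hour
  successfulJobsHistoryLimit: 3  # optional: finished Jobs to keep (3 is the default)
  failedJobsHistoryLimit: 1      # optional: failed Jobs to keep (1 is the default)
  # jobTemplate: unchanged from the full manifest above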

Next up, an InitContainer.

    spec:
      template:
        spec:
          initContainers:
          - name: init-cronjob-demo-backup
            image: busybox
            command: ['cp', '/config/rclone.conf', '/rc-config/rclone.conf']
            volumeMounts:
            - name: rclone-confmap
              mountPath: /config/rclone.conf
              subPath: rclone.conf
            - name: rc-config
              mountPath: /rc-config

The InitContainer runs one simple command: it copies the ConfigMap’s version of rclone.conf into an empty directory mounted at /rc-config. Once the InitContainer completes its task, it terminates and the main container starts. I should note that emptyDir volumes are created when the Pod is assigned to a node and deleted when the Pod is removed, so any of the containers in that Pod, including init containers, can share them.
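Since the copied rclone.conf contains credentials, one optional tweak is a memory-backed emptyDir: the writable copy then lives in tmpfs instead of on the node’s disk, and it still disappears with the Pod. This is a suggestion, not what my manifest does; keep in mind that files in a memory-backed emptyDir count against the container’s memory usage.

volumes:
# ...other volumes unchanged...
- name: rc-config
  emptyDir:
    medium: Memory   # back the volume with tmpfs (RAM) instead of node disk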

Now for the main show: the Rclone container.

          containers:
          - name: cronjob-demo-backup
            image: lucashalbert/rclone
            volumeMounts:
            - name: cronjob-demo-data 
              mountPath: /data
            - name: rc-config
              mountPath: /config
            env:
            - name: SUBCMD
              value: "sync"
            - name: PARAMS
              value: "-v /data gcrypt:/_backups/cronjob-demo"
          volumes:
          - name: cronjob-demo-data
            hostPath:
              path: /mnt/homedev/cronjob-demo
              type: Directory
          - name: rclone-confmap
            configMap:
              name: rclone-config
              items:
              - key: rclone
                path: rclone.conf
          - name: rc-config
            emptyDir: {}
          restartPolicy: OnFailure

Two volumes get mounted into this container: first, the hostPath directory containing the data we want to back up; second, the previously empty directory, which now holds a writable rclone.conf thanks to our InitContainer.
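One caveat with hostPath: the Pod reads /mnt/homedev/cronjob-demo from whichever node it lands on, and with type: Directory the Pod won’t start if that directory doesn’t exist there. On a multi-node cluster (like the merged cluster I’m planning), pinning the Job to the node that holds the data avoids this. A sketch using the standard kubernetes.io/hostname node label; the node name is hypothetical, and kubectl get nodes --show-labels shows the real values.

# Goes in jobTemplate.spec.template.spec, alongside containers: and volumes:
nodeSelector:
  kubernetes.io/hostname: storage-node-1   # hypothetical: the node that has /mnt/homedev/cronjob-demo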

Pay special attention to the environment variables SUBCMD and PARAMS. Populate them with the subcommand and arguments you would pass to rclone if you were running it by hand; here that’s sync and -v /data gcrypt:/_backups/cronjob-demo.
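For example, assuming the image runs rclone with SUBCMD followed by PARAMS (which is what the variable names suggest), a first run could add rclone’s --dry-run flag to preview what sync would change, and --bwlimit could keep uploads from saturating a slow connection. The 1M limit is a placeholder.

env:
- name: SUBCMD
  value: "sync"
- name: PARAMS
  # Roughly the manual command:
  #   rclone sync --dry-run --bwlimit 1M -v /data gcrypt:/_backups/cronjob-demo
  # Remove --dry-run once the preview looks right.
  value: "--dry-run --bwlimit 1M -v /data gcrypt:/_backups/cronjob-demo"

To try a change without waiting for 07:00, kubectl create job --from=cronjob/cronjob-demo-backup manual-run -n rclone starts a one-off Job from the CronJob’s template (manual-run is just an example name).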

PRs Accepted!

I’m just learning Kubernetes, so I’m sure there are better ways to accomplish this. Please open a PR or an Issue if you have questions or know of a better way to do this.