100 Days of Coding

2020-04-15

I will be spending the next 100 days, coding for at least one hour per day. I will be focusing on my scalable surveillance software, Spectare.

Repository: spectare-pub

Days 15-21

2020-04-30 –> 2020-05-06

I’ve been bad about updating, but I’ve done a pretty decent job at sticking to it. I recently put up a blog post walking through how to utilize rclone and Kubernetes CronJobs to do nightly backups to GSuite. In addition to that, I’ve started working on another demo using RethinkDB Changefeeds to scale a deployment. I’ll be utilizing this code to patch Spectare so it utilizes Changefeeds instead of a query every 10 seconds to the database. This is a complicated issue to fix, so it’ll take me some time. Hoping to have it out in the next few days, but I’m also trying to stay off the computer as much as possible since I’m on furlough and need the break.

Day 14

2020-04-29

I figured I would spend the next few days getting the pipeline ready to build automated tests and various other things for my code. Tonight I got a few Gitlab Runners deployed and plan on starting to work on some of the CI/CD files tomorrow.

Day 13

2020-04-28

Still working on getting the new cluster and dev environment billed out. I manage to clear up some of my confusion on Cert-Manager and Ingress with Traefik and got a wildcard DNS entry passing through to the cluster. This will allow me to do some cool CI/CD things with each commit. More to come with this soon!

Day 12

2020-04-27

I spent today focusing on building out a more stable dev environment. I will be working tomorrow to get Gitlab ported over from running physically on a Pi, to running on Kubernetes, along with some runners that scale out.

Day 11

2020-04-26

Not a lot of code got written today, but started playing a little more with OpenCV and Object Detection. More will come on this soon.

Day 10

2020-04-25

I spent a couple hours getting scale up and down working properly in the cluster. I had to switch the workers away from a deployment to bare pods, then add some logic into Pool Manager.

You can check out the changes in this commit.

Day 9

2020-04-24

This morning was VERY successful, although not terribly sexy. I closed and merged two issues API Refactor and Pool Manager issues if ReplicaSet doesn’t exist.

I finished up migrating to the new API routes for the Workers and PoolMgr. Tested well and seems to be much more stable.

I then took a look at an issue where the pool manager would freak out if a replicaset didn’t exist. I changed the default replicaset info from scale=1 to scale=0, then added the right logic in pool manager to handle that. Pool Manager now handles the scale of workers and smart-detect from 0 to n.

Then I spent some time scoping how to migrate the workers to OpenCV to replace FFMPEG. The workflow remains largely the same, but I sketched it out so I can scope the code out properly.

cameras_get_aval()
workers_update_status()
--> update cameras table
--> update pods table
worker_record()
--> generate uuid for initial footage name
--> begin recording to uuid.mp4
--> once camera_segment_length reached, close mp4
--> update footage table (uuid.mp4, date/time, camera_name)
--> repeat
on_exit()
--> mark camera as grabbed=0

Day 8

2020-04-23

I spent another hour or so working on the API this morning. I also spent some time trying to build CUDA capable containers, but kept hitting a wall. I’ll take another stab at that tomorrow.

Day 7

2020-04-22

Well, I’m still at it. I decided today was a good day to stop and re-factor my API routes and functions. I didn’t have a very clear naming scheme and I was doing a lot of things in the routes that should be done inside of the various workers instead. Not a whole lot of excitement, but very important stuff to ensure this will continue to scale. It’ll probably take me a few more hours to get this re-factored properly.

I am making sure I am documenting things really well, which should make things easier in the future.

A small selection of the API re-write is below:

# /cameras/byname/cam_name
# GET: Returns all data for specified camera
# e.g. /cameras/byname/main-entrance
# expected response:
#     {
#         "details": {
#             "name_friendly": "Main Entrance",
#             "url": "rtsp://admin:password@10.45.3.22:554/Streaming/Channels/1"
#         },
#         "id": "d46df43e-02f7-40fa-af77-3104f2736b36",
#         "name": "main-entrance",
#         "smartdetect": {
#             "enabled": "0",
#             "sd_pod": ""
#         },
#         "worker": {
#             "grabbed": "1",
#             "worker_pod": "worker-8cjz7"
#         }
#     }
@app.route(api_path + "/cameras/byname/<string:cam_name>", methods=['GET'])
def cameras_get_byname(cam_name):
    cam_list = cameras_list()
    for cam in cam_list:
        if cam['name'] == cam_name:
            cam_id = cam['id']
            camera = r.table('cameras').get(cam_id).run(g.rdb_conn)
    return json.dumps(camera)

Day 6

2020-04-21

Well holy crap, I managed to get Object Detection working. It’s not performing very well, but I did manage to get this sexy picture of me pulled from an RTSP stream! It even tagged me as a person with 64% confidence. I feel like I have more confidence than that, but whatever.

Smart-Detect

I have a lot more work to do. Adios.

Day 5

2020-04-20

Decided I needed to add the dates in here because I’m losing track of time with the COVID-19 lock downs. I’m honestly very shocked at how much can be done in such a short focused amount of time. I’m estimating I have about a dozen hours into this project so far, so it feels really good to have a MVP up and running. I could spend tons of time focusing on getting the core right, but what’s a surveillance platform without the ability to see the footage? Today marks the beginning of the process to build a UI.

I opened the repo up for the world today because I’m no longer ashamed of the code, link at the top of the page.

Not a ton of progress was made today, as I’m starting to run into limitations with my implimentation of FFMPEG. With the desire to eventually add motion detection and tracking, OpenCV was going to eventually make it into the platform, so I’ve made the decision to start migrating all FFMPEG code to OpenCV. This will also allow me to better implement the Web UI, as there are a ton of walk-throughs of OpenCV/Python streaming of RTSP to a browser.

I did manage to get the WebUI deployment working using Nginx, but I thought better of going this route and will move to delivering the site with Flask through a route at /.

I started poking at getting one of my NVIDIA Jetsons setup at my desk, so I can start doing Image Recognition testing. Lots of theory to get under my belt before this becomes a real thing.

Last update: I recorded a short demo of adding a camera to the cluster. I got a test camera setup at my desk, so I figured I should add it into the system. Dropbox Link to Video

Day 4

2020-04-19

BUGS and Improvements. Today I built a new route to reset the camera worker status, so I don’t have to keep manually doing it while testing. This exposed a few bugs. If the API crashes or is reloaded, the running workers are Terminated and the ReplicaSet scales back to 1. That is a temporary issue and it recovers once the API recovers and Pool Manager stops being grumpy. The bigger bug it exposed though is related to output file naming. Currently output files are named using the following schema:

output = "/footage/" + cam_shortname + "-output%08d.mp4"

This is fine, if the worker never restarts. But if the camera is assigned a new worker or ffmpeg crashes, then the worker will start re-writing at -output00000000.mp4, overwriting existing footage.

I took the time to properly record the bugs and my notes for improvements in GitLab Issues, so I can track them over time.

I’m also starting to image and prep two NVIDIA Jetson Nanos to be added to the cluster for some future AI/ML goodness - maybe even GPU offloading of transcoding! Stay Tuned!

Not a lot of platform improvements made today, but overall progress was made.

Day 3

2020-04-18

A lot of changes today. The worker pods for cameras scale based on the number of cameras in the database. The database is then updated with the name of the pod that is working the camera as well as marking the camera as grabbed.

The cameras are then recorded in 5 minute segments to a NFS share. Nothing special.

As you add and delete cameras, the pods auto-scale - WHICH IS SUPER FUCKING COOL!

I’m spent. It’s been a long day.

Day 2

2020-04-17

I was not feeling super energetic today, but I was angry with Redis so I went to town. First, I dumped Redis entirely and have started to work on integrating RethinkDB as my backend database. There will be reasons that become clear for this change down the road, but I will tell you there are some very specific features that will make my life a lot easier.

I’ve also started to refactor my kubernetes deployment model. There’s a specific order that the yaml files will need to run, so I’ve renamed them appropriately.

I’ve also build a K8s “job”, which does initial Database setup for the RethinkDB database. Right now I’m building a new docker image for RethinkDB on Raspberry Pi and will be uploading it to my private registry soon. Once I’m confident in my images, I’ll also be putting them on Docker Hub.

Along with refactoring to RethinkDB, I aslo began re-writing the Flask API to utilize @app.before_request and @app.teardown_request. Fun times ahead!

Day 1

2020-04-16

This morning I worked on adding a stats endpoint to the API. This endpoint will return a few items like System Health, Camera Stats (total number of cameras, grabbed cameras and orphaned cameras). In the future, this will also include other details about the system, including motion detection notification, recording info, worker pod info, ALPR/Facial Detection tags, etc.

I also started writing the Dockerfile for the API container. I did a test build of the API in docker and it runs just fine. No issues that I could detect. I need to get the camera data moved from a python dictionary to the Redis database.

I took the first pass at a Worker container. It queries the running API (still with the static cameras list) and asks for the first camera that is not “grabbed”. Currently it chops the videos into 60 second segments for testing. It seems to work really well too!

I need to work on the build pipeline for the project, as well as setting up a container registry on-site. I would like for Gitlab to build the containers then push them into a local registry.

api.py

#!/bin/python3

import flask
from flask import request, jsonify
import redis

app = flask.Flask(__name__)
app.config["DEBUG"] = True

# r = redis.Redis("redis.spectare.svc.cluster.local")

cameras = [
    {'id': 0,
     'friendlyname': 'Garage',
     'name': 'garage',
     'url': 'rtsp://admin:password@10.45.3.21:554/Streaming/Channels/1',
     'tags': [],
     'grabbed': 0},
    {'id': 1,
     'friendlyname': 'Backyard West Gate',
     'name': 'backyard-west-gate',
     'url': 'rtsp://admin:password@10.45.3.22:554/Streaming/Channels/1',
     'tags': [],
     'grabbed': 1},
    {'id': 2,
     'friendlyname': 'Driveway',
     'name': 'driveway',
     'url': 'rtsp://admin:password@10.45.3.23:554/Streaming/Channels/1',
     'tags': [],
     'grabbed': 0},
    {'id': 3,
     'friendlyname': 'Backyard',
     'name': 'backyard',
     'url': 'rtsp://admin:password@10.45.3.24:554/Streaming/Channels/1',
     'tags': [],
     'grabbed': 0}
]

@app.route('/', methods=['GET'])
def home():
    return "<h1>Spectare API</h1><p>Howdy! There's nothing here.</p>"

@app.route('/stats', methods=['GET'])
def stats():
    numCams = len(cameras)
    grabbedCams = 0
    orphanedCams = 0
    # Count Grabbed and Orphaned Cameras - Update proper variables to hold count
    for cam in cameras:
        if cam['grabbed']== 1:
            grabbedCams += 1
        elif cam['grabbed']== 0:
            orphanedCams += 1
    # Return statistics
    return jsonify({'system_health':'OK', 'cam_health': {'num_cams': numCams, 'grabbed_cams': grabbedCams, 'orphaned_cams': orphanedCams}})

# Route to return all entries
@app.route('/api/v1/cameras/all')
def cams_all():
    return jsonify(cameras)

@app.route('/api/v1/cameras', methods=['GET'])
def api_id():
    if 'id' in request.args:
        id = int(request.args['id'])
    else:
        return "Error: No id field provided. Please specify an id."

    results = []

    for camera in cameras:
        if camera['id'] == id:
            results.append(camera)

    return jsonify(results)

@app.route('/api/v1/grabbed', methods=['GET'])
def api_grabbed():
    if 'status' in request.args:
        status = request.args['status']
    else:
        return "Error: No grabbed field provided. Please specify an grabbed status [1/0]."

    results = []

    for camera in cameras:
        if camera['grabbed'] == status:
            results.append(camera)

    return jsonify(results)

app.run(host='0.0.0.0')

worker/entrypoint.py

#!/bin/python3

import ffmpy
from ffmpy import FFmpeg
import json
import requests

r = requests.get('http://172.17.0.2:5000/api/v1/grabbed?status=0')
avail_cams = json.loads(r.content)

avail_cams_length = len(avail_cams)

if avail_cams_length != 0:
    if avail_cams[0]['grabbed']=="0":
        cam_friendlyname = avail_cams[0]['friendlyname']
        cam_shortname = avail_cams[0]['name']
        cam_url = avail_cams[0]['url']
        print("Camera Name: " + cam_friendlyname)
        print("Camera URL: " + cam_url)
    
        output = "/app/" + cam_shortname + "-output%03d.mp4"
        # Start FFMPEG STUFF
        if cam_url:
            ff = FFmpeg(
                inputs={cam_url: '-rtsp_transport tcp'},
                outputs={output: '-map 0 -c copy -f segment -segment_time 60'}
            )
        ff.cmd
        ff.run()
        # If FFMPEG Good, POST {grabbed:1}
    else: 
        print("Available Camera 0 has since been grabbed by another pod.")
        exit(0)
else:
    print("All cams grabbed.")
    exit(0)

Day 0

2020-04-15

Got Kubernetes cluster stood up for development and tied to Gitlab. Need to deploy the Gitlab runner and build CI/CD files. I also scoped out the Redis Database (up and running on K3s) and the Flask API. It’s a very basic start and does nothing with the Redis db yet.

#!/bin/python3

import flask
from flask import request, jsonify

app = flask.Flask(__name__)
app.config["DEBUG"] = True

cameras = [
    {'id': 0,
     'friendlyname': 'Garage',
     'name': 'garage',
     'url': 'rtsp://user:password@10.45.3.21:554/Streaming/Channels/1',
     'grabbed': False},
    {'id': 1,
     'friendlyname': 'Backyard West Gate',
     'name': 'backyard-west-gate',
     'url': 'rtsp://user:password@10.45.3.22:554/Streaming/Channels/1',
     'grabbed': False},
    {'id': 2,
     'friendlyname': 'Driveway',
     'name': 'driveway',
     'url': 'rtsp://user:password@10.45.3.23:554/Streaming/Channels/1',
     'grabbed': False},
    {'id': 3,
     'friendlyname': 'Backyard',
     'name': 'backyard',
     'url': 'rtsp://user:password@10.45.3.24:554/Streaming/Channels/1',
     'grabbed': False}
]

@app.route('/', methods=['GET'])
def home():
    return "<h1>Spectare API</h1><p>Howdy! There's nothing here.</p>"

# Route to return all entries
@app.route('/api/v1/cameras/all')
def cams_all():
    return jsonify(cameras)

app.run()