An introduction to every parent’s trial and travails
When our son was born early last year, I admit I wasn’t ready for it. Fatherhood was not the kind of thing I was ready for (and who really can ever be ready for parenthood, anyways?).
It so turns out that the vast majority of the first year of parenting is simply enduring the gut-wrenching cries of your little one. And cry they do - crying when they are too tired, screaming when they are too energetic, crying when they are gassy, screaming when they are bored, and crying when they just pooped.
The trials that Annie and I went through with our little guy was particularly difficult on us (you can ask me in person if we ever get to chat). The little guy was a prolific screamer and absolutely. hated. sleep.
What’s a geeky dad to do? Quantify household suffering by leveraging machine learning, of course.
I set out to build a system that would in the end determine how well our little guy slept through (or didn’t sleep through) the night. I started by building a system that naively parsed audio samples from his nursery, and then trained a TensorFlow model to do more accurate detection of his cries. Here’s how it worked:
Act 1: The Misery Meter
In version 1 of the system, I bought a cheap USB microphone and hooked it up to a Raspberry Pi 3.
In it, I loaded up a script to record a 30-second audio sample, using the
arecord UNIX command line tool:
1 2 3 4 5 6 7 8 9
…then I ran the
sox command-line tool to grab some simple loudness statistics out from it:
In case you’re curious, here’s the full output from
sox -V3 FILE -n stats. Pretty nifty:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Here, we’re really only interested in
RMS dB, which is the relative loudness levels within this 30-second sample. I chose to push these three stats up to a web service which I use to aggregate and graph these metrics.
RMS lev is the average,
RMS Pk is the peak, and
RMS Tr is the trough (the floor).
I’m not showing the entirety of the script, but the last thing it does is parse and push the results of the analysis of this audio sample to a Web-based metrics aggregation service! In case you were wondering, I have an API service sitting between the Raspberry Pi and a time-series API supplied by Keen.io. But the main point is that now I can load up a cute JS widget that graphs these data points!
Now, how does one read this graph?
- We can follow the peaks of the audio signal and assume that any noise over a certain dB threshold is the little dude’s screaming.
- We can follow the troughs of the graph and assume that if the trough jumps, then man there is some serious crying going on, since the audio floor of the soundscape has been bumped up!
- Or we can follow the average RMS reading and assume some combination of the two?
The truth of the matter, none of the readings from the Misery Meter (so I called it) were particularly reliable indicators of “the little buddy is crying his little head off”. Sometimes, his crying was at the same volume level as other ambient noises in the house (say, when he’s playing in the other room and someone shuts a door). So it turns out that using volume as a proxy for crying is insufficient to give us reliable results.
Act 2: Enter TensorFlow… next!
In my next post, I’ll discuss how I modified this script to use TensorFlow to train a model that could then be used to enhance the accuracy of little dude’s crying. Stick around, it’ll be fun!