Every nerd with a Raspberry Pi has thought about making it into a personal assistant. I am one of those nerds. So in true Shenanigans fashion, I cobbled together a series of micro-services in a Frankenstein’s-monsteresque amalgamation of bash scripting, Python and PHP. It’s glorious. I’m not putting the whole thing on GitHub, because there are some personal keys in there that I’m too lazy to env out. But I’ll highlight some of the more interesting bits of the build.
I’ve always felt like a speech-recognition-based system was awkward. I don’t like talking to Siri or Google in public, and buttons just seem way faster than 1: talking, 2: the server barely recognising what I say, 3: it having to natural-language-process its way into an understandable command.
Plus I prefer not getting spied on by Alexa or Google Home…
Talking to my devices feels weird, but the devices talking back is a pretty handy way of conveying information without a screen. I tried a couple of TTS (text-to-speech) engines that run locally rather than using a remote service that tracks all of my data (call me paranoid, but you never know with these things).
Festival is pretty promising, but everything not backed by a giant company like Google or IBM still sounds like a drunk robot. Ultimately I landed on Amazon Polly. The API is pretty straightforward and fast enough to feel like low/no-latency. It probably still knows everything it’s ever said to me, but until some open-source alternative gets out of the uncanny valley it’s a compromise I’m willing to make. The following uses the IVONA SpeechCloud PHP package. (Amazon Polly used to be called Ivona; they changed it a couple of months back, but all the underlying endpoints still seem to work.)
(I switch up the voices from time to time to find the best/most interesting one. Raveena is Indian-English)
This outputs the content of an MP3 file, which I pipe into mpg123. I suppose you could get aplay (the default Linux audio player) to work with MP3s, but I prefer mpg123 for its simplicity and higher-level API.
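For anyone who would rather stay in Python, here is a minimal sketch of the same synthesize-then-pipe idea. It is not the PHP code I actually run: it assumes the boto3 package is installed, that AWS credentials and a default region are already configured, and that mpg123 is on the PATH. The function names (synthesize_mp3, play_mp3, say) are made up for this example.

```python
import subprocess


def synthesize_mp3(text, voice="Raveena"):
    """Ask Amazon Polly for MP3 audio of `text` and return the raw bytes."""
    # Imported lazily so the playback helper below works without boto3.
    import boto3  # assumption: installed and configured with AWS credentials

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId=voice,
    )
    return response["AudioStream"].read()


def play_mp3(mp3_bytes, player=("mpg123", "-q", "-")):
    """Pipe MP3 bytes into a player's stdin and return its exit code.

    "-q" keeps mpg123 quiet and "-" tells it to read from stdin.
    """
    result = subprocess.run(list(player), input=mp3_bytes, check=False)
    return result.returncode


def say(text):
    """Synthesize `text` and play it straight away."""
    return play_mp3(synthesize_mp3(text))
```

Calling say("Good morning") should speak through whatever output mpg123 is set up to use. Keeping the player as a parameter makes it easy to swap in something else later.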
Motion detection with Raspberry Pi
At first “House” just ran on my laptop, and was triggered via the command line. I had a “house morning” command that does all the usual my-first-assistant stuff. Weather, calendars, reminders,… and I thought it would be pretty futuristic to trigger it when I walked into the living room in the morning. So I ported everything to run on Linux (isn’t Unix the best, by the way?) and started running it on one of the seven thousand unused RPis I had lying around.
Hooking up the Pi with a PIR (passive infrared) motion sensor was pretty straightforward (shoutout to This dude). Those GPIO pins are awesome and let you use the Pi as an Arduino that you can SSH into (and way more probably, but that’s how I use them now). Because I don’t know how to solder and didn’t have a large enough breadboard lying around, I just kind of mapped out what wires went where on the Cobbler breakout cable. If you’re trying anything like this yourself: get a breadboard. Because once those wires fall out it’s a giant pain in the ass to reassemble everything. And those wires will fall out.
Small note: I found that the PIR sensor I bought was slightly more effective and reliable when I read the output as analog instead of digital, but the RPi apparently only has digital inputs, so what am I gonna do? Convert the signal? Who has time for that? Reading the analog output as digital worked just fine for me.
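The polling loop behind this can be sketched roughly as follows. This is an illustration rather than my exact script: it assumes the RPi.GPIO library, BCM pin numbering, and a sensor whose output goes high on motion. The names make_gpio_reader and watch, the cooldown value and the pin number in the usage note are all placeholders.

```python
import time


def make_gpio_reader(pin):
    """Return a zero-argument function that reads `pin` as a digital input."""
    # Imported lazily so the loop below can be tried on a non-Pi machine.
    import RPi.GPIO as GPIO  # assumption: running on a Pi with RPi.GPIO installed

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(pin, GPIO.IN)
    return lambda: bool(GPIO.input(pin))


def watch(read_pin, on_motion, cooldown=60.0, interval=0.1,
          max_polls=None, clock=time.monotonic, sleep=time.sleep):
    """Call `on_motion` when the sensor goes from low to high.

    `cooldown` (seconds) stops one walk through the room from firing
    the action over and over. Returns how many times it fired.
    """
    previous = False
    last_fired = None
    fired = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        current = read_pin()
        now = clock()
        rising_edge = current and not previous
        cooled_down = last_fired is None or now - last_fired >= cooldown
        if rising_edge and cooled_down:
            on_motion()
            last_fired = now
            fired += 1
        previous = current
        polls += 1
        sleep(interval)
    return fired
```

On the Pi this would be wired up as something like watch(make_gpio_reader(4), run_morning_routine), where pin 4 and run_morning_routine stand in for wherever the sensor is plugged in and whatever kicks off “house morning”.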
Reverse-engineering an Infrared Remote
The Pi is hooked up to my sound system. So is my TV and a bunch of other stuff, so the input channel is usually not set to House. It’s hard enough to keep track of the 17 remotes I use for the TV, Apple TV, the cable box, etc. So busting out the remote to my amplifier every time didn’t feel like an option. So: I figured I’d reverse-engineer said remote and set the input to the correct line-in whenever House is about to speak.
It turns out that infrared remotes work pretty intuitively, but are also based on a lot of proprietary protocols, depending on who makes the device you’re trying to control. The underlying principle is kind of a high-frequency Morse code, but the encoding and actual values sent from remote to receiver are different for most vendors.
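To make the Morse-code comparison concrete, here is a small sketch that builds the on/off timings for one frame in the NEC protocol, one of the common consumer IR encodings. I’m using NEC purely as an illustration; your amplifier may well speak something else, and the function name nec_frame is made up. In NEC each frame opens with a 9 ms burst and a 4.5 ms gap, then sends address, inverted address, command and inverted command, least significant bit first. Every bit is a 562.5 µs burst followed by a short gap for 0 or a long gap for 1.

```python
def nec_frame(address, command):
    """Return alternating burst/gap durations in microseconds for one NEC frame.

    The list starts with a burst (IR carrier on), then a gap (off), and so on.
    """
    burst = 562.5      # length of every bit's carrier burst
    gap_zero = 562.5   # gap after the burst for a 0 bit
    gap_one = 1687.5   # gap after the burst for a 1 bit

    payload = [
        address & 0xFF,
        ~address & 0xFF,   # inverted address, used as an error check
        command & 0xFF,
        ~command & 0xFF,   # inverted command, used as an error check
    ]

    timings = [9000.0, 4500.0]  # leading burst and gap
    for byte in payload:
        for bit in range(8):    # least significant bit first
            timings.append(burst)
            timings.append(gap_one if (byte >> bit) & 1 else gap_zero)
    timings.append(burst)       # final burst marks the end of the last bit
    return timings
```

The receiver only ever measures how long the light was on and off, which is why a cheap IR receiver plus a timer is enough to record a remote and play it back.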
Using this super-handy tutorial on how to hook up an Arduino to a receiver and an IR LED, and combining that with the GPIO-pin stuff from the motion detector, I was able to “change the channel” on my amp whenever I triggered a House action. Also mirrors, because of the way my living room is arranged. The Pi is behind a corner and I couldn’t find an appropriate angle on the IR LED to point at the amplifier. So I got some stick-on mirrors to reflect the signal around. It’s not pretty, but it works.
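The “switch the amp, then talk” order can be sketched like this. One way to send a recorded code from a Pi is LIRC’s irsend command, so that’s what the helper builds, but treat that as an assumption: it is not necessarily how my setup fires the LED, and the remote and key names in the usage note are hypothetical.

```python
import subprocess
import time


def irsend_command(remote, key):
    """Build the LIRC command line that sends one key press once."""
    return ["irsend", "SEND_ONCE", remote, key]


def announce(text, speak, switch_input, settle_seconds=0.5, sleep=time.sleep):
    """Switch the amplifier input first, wait briefly, then speak.

    `speak` and `switch_input` are passed in so either half can be
    replaced or tested on its own.
    """
    switch_input()
    sleep(settle_seconds)  # give the amp a moment to change inputs
    speak(text)


def make_input_switcher(remote, key):
    """Return a function that presses `key` on `remote` via irsend."""
    return lambda: subprocess.run(irsend_command(remote, key), check=False)
```

With LIRC configured, something like announce("Good morning", speak=print, switch_input=make_input_switcher("amp", "KEY_AUX")) would flip the input and then hand the text to whatever does the talking.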