Voice Control Without the Cloud: Local-Only Smart Lighting with Shelly RGBW2 + ESPHome
Last month, a client, a cybersecurity analyst, called me in a panic. His “smart” recessed lights had just flashed amber mid-presentation. It turned out his voice assistant had silently updated its firmware, rerouted audio through an overseas server, and briefly disabled local scene triggers. He yanked the hub that same day. We replaced it with four Shelly RGBW2s, ESPHome, and Rhasspy on a Pi Zero 2 W, and the lighting hasn’t touched the internet since.
“Just Use Alexa or Hue” — Nope. Here’s Why That Fails for Real Privacy
The popular take? “Buy a certified smart bulb. It’s plug-and-play.” Sure, if you’re okay with the voice assistant that controls your living room light shipping audio to a third party every time it hears “Hey Siri,” or if you don’t mind firmware updates that rewrite your dimming curves without warning.
This falls flat because “certified” usually means cloud-dependent. Even bulbs labeled “local control” usually require a vendor bridge (like the Philips Hue Bridge) that still phones home for authentication, feature unlocks, or even OTA health checks. For someone auditing their own network, like our security pro, that’s not “local.” That’s “obfuscated remote.”
I think true local-only means: no outbound DNS requests, no TLS handshakes to unknown IPs, no wake-word audio leaving the LAN—even in encrypted form. If it can’t run behind a firewall with all upstream traffic blocked, it doesn’t qualify.
What Actually Works: Shelly RGBW2 + ESPHome + On-Device ASR
You need three layers—and none of them talk to the cloud:
- Hardware: Shelly RGBW2 modules (not the newer Pro models; those add unnecessary BLE radios and cloud hooks). These are dumb enough to be trustworthy: a 12/24V DC input and four PWM outputs, with no relay, no mic, and no camera. Once reflashed, the only network code on board is what ESPHome compiles in.
- Firmware: ESPHome flashed locally over a USB-to-serial adapter or OTA. No vendor app. No account. You write YAML, compile, and push. Full control over every GPIO, timing curve, and MQTT topic.
- Voice: Rhasspy (for wake-word + intent parsing) or Vosk (lightweight speech-to-text only) running on a Raspberry Pi 4 or Pi Zero 2W. Audio stays on-device. Wake word is a 200KB neural model—not streaming bytes to Amazon.
We used Rhasspy because it handles both wake-word spotting (“Hey Lamp”) and intent mapping (“dim to 30% in office”) on the same box, with no round-trip to a server. Total RAM usage: under 180MB at idle.
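For the firmware layer, a minimal ESPHome config for one RGBW2 might look like the sketch below. Treat it as a starting point rather than our exact file: the GPIO numbers are taken from community pinouts for the RGBW2 and should be checked against your unit, and the device name, light id, and !secret keys are placeholders.

```yaml
# Sketch only: pin assignments are assumed, verify them before flashing.
esphome:
  name: office-main          # hypothetical device name

esp8266:
  board: esp01_1m            # generic ESP8266 target commonly used for Shelly modules

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

logger:

ota:
  - platform: esphome
    password: !secret ota_password   # keeps other LAN hosts from pushing firmware

output:
  - platform: esp8266_pwm
    id: out_red
    pin: GPIO12              # assumed R channel
  - platform: esp8266_pwm
    id: out_green
    pin: GPIO15              # assumed G channel
  - platform: esp8266_pwm
    id: out_blue
    pin: GPIO14              # assumed B channel
  - platform: esp8266_pwm
    id: out_white
    pin: GPIO4               # assumed W channel

light:
  - platform: rgbw
    id: office_main
    name: "Office Main"
    red: out_red
    green: out_green
    blue: out_blue
    white: out_white
```

The ota block uses the list syntax of recent ESPHome releases; older releases take password directly under ota: instead.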
Wiring Recessed Cans—Without Rewiring Your Ceiling
You’re not replacing fixtures—you’re retrofitting drivers. Here’s how we did it in a 12’x14’ home office with six 6-inch IC-rated recessed cans:
- Turn off breaker. Verify with non-contact tester.
- Remove existing LED driver (usually a small black box clipped behind the trim).
- Feed the Shelly RGBW2’s DC input (+ and GND) from a 12V or 24V constant-voltage supply. The RGBW2 is a low-voltage DC device, so house line (L) and neutral (N) go to that supply’s input, never to the Shelly itself.
- Connect the R, G, B, and W outputs to the four color channels of a compatible 4-channel LED strip *or* to separate single-color strips rated for the same supply voltage (we used warm-white 2700K and cool-white 5000K strips side by side for tunable white).
- Ground the metal chassis—non-negotiable for safety in insulated ceilings.
No neutral required at the switch location. The Shelly sits *at the fixture*, not the wall. That’s critical: it eliminates the need for neutral wires in old switch boxes—and keeps control logic physically close to the load.
Lumen output? With two 12V/3A drivers (one per white channel), we hit ~2,400 lumens total, enough to comfortably task-light the desk while keeping ambient glow soft. ESPHome has no true current limit, but each output’s max_power setting caps its PWM duty cycle (we sized the cap so red, green, and blue each stay around 1.2A and the module doesn’t overheat), and a cwww light lets you declare each white channel’s color temperature so “2700K” actually looks like 2700K, not some vendor’s guess.
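As a rough sketch of the per-channel cap and the white mixing, the fragment below limits one color output with max_power (a duty-cycle ceiling, not a measured current limit) and defines a tunable-white light with the cwww platform. The pins, ids, and the 0.4 figure are assumptions for illustration: 0.4 corresponds to 1.2A only if that channel would draw 3A at full duty, so measure your own strips before copying numbers.

```yaml
# Fragment, not a full config. Use it instead of an rgbw light on the same pins.
output:
  - platform: esp8266_pwm
    id: out_red
    pin: GPIO12              # assumed pin
    max_power: 0.4           # illustrative duty-cycle cap
  - platform: esp8266_pwm
    id: out_warm
    pin: GPIO4               # assumed pin for the 2700K strip
  - platform: esp8266_pwm
    id: out_cool
    pin: GPIO14              # assumed pin for the 5000K strip

light:
  - platform: cwww
    id: office_white         # hypothetical id
    name: "Office Tunable White"
    warm_white: out_warm
    cold_white: out_cool
    warm_white_color_temperature: 2700 K
    cold_white_color_temperature: 5000 K
    constant_brightness: true   # hold total output steady while the mix shifts
```

With the two color temperatures declared, a request for a given Kelvin value is blended between the strips by ESPHome rather than hand-tuned per scene.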
OTA Updates That Don’t Betray You
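ESPHome’s OTA is safe *because* it’s dumb-simple: your Pi hosts the ESPHome dashboard as a local web server (standalone or as a Home Assistant add-on); you click install there (or run esphome upload from the CLI); the binary goes straight to the Shelly over your LAN. No signing keys. No cloud registry. No “update available” nag banners. The one guard worth adding is an OTA password in the YAML, so nothing else on the LAN can push firmware.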
We version-control our YAML configs in a private Git repo. When we tweak fade time from 300ms to 800ms (smoother for eye strain), we tag it, rebuild, and push—all in under 90 seconds. No waiting for “server approval.”
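One way to keep a tweak like that fade time in a single, diff-friendly place is an ESPHome substitution. This is a sketch rather than our exact config; the light and output ids are hypothetical and must match outputs defined elsewhere in the same file.

```yaml
# Fragment: one tunable value at the top, referenced where the light is defined.
substitutions:
  fade_time: 800ms           # previously 300ms

light:
  - platform: rgbw
    id: office_main
    name: "Office Main"
    red: out_red             # hypothetical output ids
    green: out_green
    blue: out_blue
    white: out_white
    default_transition_length: ${fade_time}
```

After committing the change, esphome run followed by the config file name compiles and uploads in one step, so the Git tag and the firmware on the device stay in sync.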
Triggering Scenes Without Saying “Alexa”
Rhasspy listens for “Hey Lamp” on a USB mic taped to the underside of a bookshelf (low-profile, directional). When triggered, it parses the utterance and publishes to Mosquitto (also local) on rhasspy/lamp/office/command.
Our ESPHome config subscribes to that topic directly, with one on_message trigger per scene:
mqtt:
  broker: 192.168.1.20
  username: !secret mqtt_user
  password: !secret mqtt_pass
  on_message:
    # Each entry fires only when the payload matches exactly.
    - topic: rhasspy/lamp/office/command
      payload: "focus"
      then:
        - light.turn_on:
            id: office_main   # light defined elsewhere in this config
            brightness: 86%   # 220 on a 0-255 scale
            red: 100%         # Warm white dominant
            green: 86%
            blue: 71%
            white: 100%
    - topic: rhasspy/lamp/office/command
      payload: "relax"
      then:
        - light.turn_on:
            id: office_main
            brightness: 31%   # 80 on a 0-255 scale
            red: 78%          # Cool white + blue bias
            green: 86%
            blue: 100%
            white: 78%
No NLU cloud. No custom skill registration. Just MQTT topics and YAML logic—auditable, editable, offline.
Why This Isn’t Just “For Tinkerers”
It’s for people who’ve seen what happens when “convenience” becomes a vector. A Shelly RGBW2 costs $29. An ESP32 dev board for Rhasspy is $12. A Pi Zero 2W is $15. Total hardware under $60 per zone—and zero recurring fees.
More importantly: it’s maintainable. When Rhasspy drops support for a wake word model, you swap in a new one. When Shelly releases a hardware revision, you ignore it—your ESPHome config stays identical. There’s no vendor roadmap dictating your lighting behavior.
That security analyst? His office lights now respond to “Hey Lamp, focus” with no cloud round-trip, zero logging, and zero external packets. And when he presents to clients, he knows, down to the TCP handshake, that nothing leaves his subnet.
That’s not minimalism. That’s sovereignty.
