Home Assistant Alexa - Layer 3: Ollama Local AI + Node-RED Flow
This is the layer that makes the whole system genuinely useful. When you say "Alexa, ask house, will the battery last tonight?", Ollama running locally on your Pi 5 interprets the question, picks the correct Home Assistant script, and the Echo Dot speaks the answer, with the AI inference itself never leaving your LAN. Here is exactly how to build it.
- Layer 1 (Nabu Casa) working: guide here
- Layer 2 (AWS Lambda) working: guide here
- Node-RED add-on installed and running in HA
- Ollama add-on installed in HA
Step 1 - Add HA Helpers and REST Command
Open /homeassistant/configuration.yaml and add:
input_text:
  alexa_query:
    name: "Alexa AI Query"
    max: 255
    icon: mdi:microphone
  alexa_zone_1_request:
    name: "Zone 1"
    icon: mdi:sprinkler-variant
  alexa_zone_2_request:
    name: "Zone 2"
    icon: mdi:sprinkler-variant
  alexa_zone_3_request:
    name: "Zone 3"
    icon: mdi:sprinkler-variant
  alexa_skip_zone:
    name: "Skip Zone"
    icon: mdi:skip-next

rest_command:
  trigger_alexa_ai:
    url: "http://YOUR_PI_IP:1880/endpoint/alexa-ai-query"
    method: POST
    headers:
      Content-Type: "application/json"
      Authorization: "Basic YOUR_BASE64_CREDENTIALS"
    payload: '{"query": "{{ query }}"}'
    content_type: "application/json"
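HTTP Basic authentication expects the Base64 encoding of `username:password`, so the YOUR_BASE64_CREDENTIALS placeholder can be generated with a few lines of Node.js. This is a minimal sketch: the helper name `basicAuthHeader` and the credentials `user` / `pass` are placeholders for illustration, and it assumes the credentials are whatever you configured for Node-RED's HTTP endpoints.

```javascript
// Build the value for the Authorization header used by rest_command.
// 'user' and 'pass' are placeholder credentials; substitute your own.
function basicAuthHeader(username, password) {
  const encoded = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${encoded}`;
}

console.log(basicAuthHeader('user', 'pass')); // Basic dXNlcjpwYXNz
```

Paste only the part after `Basic ` into the placeholder, since the YAML already contains the `Basic ` prefix.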
Adding rest_command to configuration.yaml requires a full Settings → System → Restart. Reloading scripts, automations, or running homeassistant.reload_core_config does NOT load the new rest_command. This caused hours of debugging.
Step 2 - Add the Trigger Automation
Add to /homeassistant/automations.yaml:
- id: alexa_ai_query_trigger
  alias: "Alexa AI Query Trigger"
  trigger:
    - platform: state
      entity_id: input_text.alexa_query
  condition:
    - condition: template
      value_template: >
        {{ trigger.to_state.state not in ['unknown', ''] }}
  action:
    - action: rest_command.trigger_alexa_ai
      data:
        query: "{{ trigger.to_state.state }}"
  mode: queued
  max: 5
Step 3 - Install Ollama and Pull the Model
- HA → Settings → Add-ons → Add-on Store → search Ollama → Install → Start
- Open HA Terminal (Settings → Add-ons → Terminal → Open Web UI)
- Pull the model:
curl http://localhost:11434/api/pull -d '{"model":"qwen2.5:1.5b"}'
Wait for the ~900MB download to complete.
Why qwen2.5:1.5b? Small enough to fit in Pi 5 RAM alongside everything else (~900MB), fast enough at ~7.5 tokens/second, and accurate enough for the one task it does: mapping spoken queries to HA script names as compact JSON. For this use case, a 1.5B model is more than sufficient.
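To see what ~7.5 tokens/second means in practice, the arithmetic is just tokens divided by speed. The sketch below uses the 80-token cap (`num_predict`) from the Process function in Step 4; the 22-token "typical reply" is an assumption for a short one-line JSON answer, not a measured figure.

```javascript
// Rough generation time: tokens to generate divided by tokens per second.
function generationSeconds(tokens, tokensPerSecond) {
  return tokens / tokensPerSecond;
}

const TOKENS_PER_SECOND = 7.5;   // figure quoted above for qwen2.5:1.5b on a Pi 5
const NUM_PREDICT_CAP = 80;      // cap set in the Process function in Step 4
const TYPICAL_REPLY_TOKENS = 22; // assumption: a short one-line JSON reply

console.log(generationSeconds(NUM_PREDICT_CAP, TOKENS_PER_SECOND).toFixed(1));      // 10.7
console.log(generationSeconds(TYPICAL_REPLY_TOKENS, TOKENS_PER_SECOND).toFixed(1)); // 2.9
```

So a compact JSON reply lands around 3 seconds, while a reply that runs to the cap would take closer to 11 seconds.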
Step 4 - Build the Node-RED "Alexa AI" Flow
Open Node-RED at http://YOUR_PI_IP:1880. Click + to create a new tab. Double-click the tab and rename it to Alexa AI.
Node 1: HTTP In (Webhook)
- Type: http in · Method: POST · URL: /alexa-ai-query
- Name: Alexa AI Webhook
Node 2: HTTP Response (Immediate ACK)
- Type: http response · Status code: 200
- Connect directly from HTTP In. Returns 200 immediately so HA doesn't time out waiting.
Node 3: Function - "Process via Ollama"
Also connect from HTTP In (parallel branch). This does the actual AI work asynchronously.
const TOKEN = env.get('SUPERVISOR_TOKEN');
// If require() is not available in your function node, add the http module on the node's Setup tab instead
const http = require('http');

// Strip the |timestamp nonce Lambda appended (and ignore non-string payloads)
const rawValue = (msg.payload && typeof msg.payload === 'object') ? msg.payload.query : msg.payload;
const query = (typeof rawValue === 'string') ? rawValue.split('|')[0].trim() : '';
if (!query || query === 'unknown') return null;

// Pre-filter: zone 4 never exists, so answer directly and save Ollama tokens
if (/zone\s*(4|four)/i.test(query)) {
  const d = JSON.stringify({ entity_id: 'script.voice_announce',
    variables: { message: 'Zone four does not exist. I can run zones one, two, or three.' } });
  const req = http.request({ hostname: 'supervisor', port: 80,
    path: '/core/api/services/script/turn_on', method: 'POST',
    headers: { 'Content-Type': 'application/json',
      'Content-Length': Buffer.byteLength(d),
      'Authorization': 'Bearer ' + TOKEN }}, res => res.resume()); // drain the response
  req.on('error', () => {});
  req.write(d); req.end();
  return null;
}

// SYSTEM PROMPT - must be BYTE-IDENTICAL to the warmup node (see KV cache note below)
const SYSTEM = `You control the Van Buren home in South Africa.
OUTPUT RULES:
- ALWAYS output ONE compact JSON line for ANY home control or status request.
- ONLY for pure general-knowledge questions output plain spoken English.
JSON format: {"service":"script.EXACT_NAME","data":{}}
SCRIPTS:
script.alexa_battery_status
script.alexa_solar_status
script.alexa_system_status
script.alexa_good_night
script.alexa_will_battery_last
script.alexa_charge_overnight
script.alexa_start_eskom_import
script.alexa_stop_eskom_import
script.alexa_loadshedding
script.alexa_morning_brief
script.alexa_geyser_boost
script.alexa_eskom_outage
script.alexa_biltong_status
script.alexa_start_sprinklers data: {"zone1_minutes":N,"zone2_minutes":N,"zone3_minutes":N}
script.alexa_stop_sprinklers
script.alexa_sprinkler_status
script.aircon_status
script.aircon_configure data: {"temperature":N,"mode":"cool|heat|auto|dry|fan_only|off"}
script.aircon_boost
script.aircon_eco
EXAMPLES:
"battery status" → {"service":"script.alexa_battery_status","data":{}}
"run zone 2 for 8 minutes" → {"service":"script.alexa_start_sprinklers","data":{"zone1_minutes":0,"zone2_minutes":8,"zone3_minutes":0}}
"will the battery last" → {"service":"script.alexa_will_battery_last","data":{}}
"set aircon to 22" → {"service":"script.aircon_configure","data":{"temperature":22}}`;

msg.payload = JSON.stringify({
  model: 'qwen2.5:1.5b',
  messages: [{ role: 'system', content: SYSTEM }, { role: 'user', content: query }],
  stream: false,
  keep_alive: -1, // top-level request field in the Ollama API, not part of options
  options: { num_ctx: 2048, num_predict: 80 }
});
msg.url = 'http://YOUR_PI_IP:11434/api/chat';
msg.method = 'POST';
msg.headers = { 'Content-Type': 'application/json' };
return msg;
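The pre-processing in this node (nonce stripping, empty-query check, zone 4 pre-filter) is easier to reason about as small pure functions that can be run outside Node-RED. The following is a sketch only; the function names `cleanQuery`, `isEmptyQuery`, and `mentionsZoneFour` are hypothetical and the timestamp values are made up.

```javascript
// Strip the "|timestamp" nonce and return the trimmed query text ('' if unusable).
function cleanQuery(payload) {
  const raw = payload && typeof payload === 'object' ? payload.query : payload;
  if (typeof raw !== 'string') return '';
  return raw.split('|')[0].trim();
}

// True when the query should be dropped before it reaches Ollama.
function isEmptyQuery(query) {
  return query === '' || query === 'unknown';
}

// True when the query mentions the non-existent zone 4.
function mentionsZoneFour(query) {
  return /zone\s*(4|four)/i.test(query);
}

console.log(cleanQuery({ query: 'battery status|1718000000' })); // battery status
console.log(mentionsZoneFour('run Zone 4 for ten minutes'));      // true
console.log(isEmptyQuery(cleanQuery({ query: '|1718000000' })));  // true
```

Note that the zone regex is deliberately loose: it also matches phrases such as "zone 40", which is harmless here because only zones one to three exist.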
Node 4: HTTP Request - "Call Ollama"
- Method: use msg.method · URL: leave blank (set by function)
- Return: a parsed JSON object · Timeout: 30 seconds
Node 5: Function - "Parse and Execute"
const TOKEN = env.get('SUPERVISOR_TOKEN');
const http = require('http');

// Ollama replies with { message: { content } }; bail out if the reply is missing
const content = ((msg.payload && msg.payload.message && msg.payload.message.content) || '').trim();
if (!content) { node.warn('Empty reply from Ollama'); return null; }

let service, data = {};
try {
  const m = content.match(/\{[\s\S]*\}/);
  if (m) {
    const parsed = JSON.parse(m[0]);
    service = parsed.service;
    data = parsed.data || {};
  }
} catch (e) {
  service = undefined; // malformed JSON: handled by the fallback below
}
// No usable JSON (plain-English answer or malformed output): speak the text instead
if (!service) {
  service = 'script.voice_announce';
  data = { message: content };
}

// Allow-list: the model can only ever trigger these scripts
const ALLOWED = [
  'script.alexa_battery_status','script.alexa_solar_status','script.alexa_system_status',
  'script.alexa_good_night','script.alexa_will_battery_last','script.alexa_charge_overnight',
  'script.alexa_start_eskom_import','script.alexa_stop_eskom_import','script.alexa_loadshedding',
  'script.alexa_morning_brief','script.alexa_geyser_boost','script.alexa_eskom_outage',
  'script.alexa_biltong_status','script.alexa_start_sprinklers','script.alexa_stop_sprinklers',
  'script.alexa_sprinkler_status','script.aircon_status','script.aircon_configure',
  'script.aircon_boost','script.aircon_eco','script.voice_announce'
];
if (!ALLOWED.includes(service)) { node.warn('Blocked: ' + service); return null; }

const payload = JSON.stringify({ entity_id: service, variables: data });
const req = http.request({
  hostname: 'supervisor', port: 80,
  path: '/core/api/services/script/turn_on', method: 'POST',
  headers: { 'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(payload),
    'Authorization': 'Bearer ' + TOKEN }
}, res => res.resume()); // drain the response so the socket is released
req.on('error', e => node.warn('HA call failed: ' + e.message));
req.write(payload); req.end();
return null;
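The parsing and allow-list steps can also be isolated into a pure function and exercised outside Node-RED. This is a sketch under stated assumptions: `parseReply` is a hypothetical name, the allow-list is shortened for illustration, and in this sketch any reply without usable JSON (for example a plain-English answer) is routed to script.voice_announce.

```javascript
// Shortened allow-list for illustration; a real flow would use the full list from the node above.
const ALLOWED = new Set([
  'script.alexa_battery_status',
  'script.alexa_start_sprinklers',
  'script.voice_announce',
]);

// Turn the model's reply text into { service, data }, or null if the service is blocked.
function parseReply(content) {
  const text = content.trim();
  const match = text.match(/\{[\s\S]*\}/);
  let call = null;
  if (match) {
    try {
      const parsed = JSON.parse(match[0]);
      if (typeof parsed.service === 'string') {
        call = { service: parsed.service, data: parsed.data || {} };
      }
    } catch (e) {
      call = null; // malformed JSON: fall through to the spoken fallback
    }
  }
  // No usable JSON: speak the raw text instead
  if (!call) call = { service: 'script.voice_announce', data: { message: text } };
  return ALLOWED.has(call.service) ? call : null;
}

console.log(parseReply('{"service":"script.alexa_battery_status","data":{}}'));
console.log(parseReply('Paris is the capital of France.'));
console.log(parseReply('{"service":"script.unlock_front_door","data":{}}')); // null (blocked)
```

The allow-list matters because a small model can hallucinate script names; anything not explicitly listed is dropped rather than executed.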
Node 6: Inject - "Warmup (startup + 5min)"
- Fires once at startup: ✓ · Repeat every 300 seconds
- Connect to a "Build Warmup" function → HTTP Request → debug
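The "Build Warmup" function itself is not listed in this guide. A minimal sketch of what it could look like follows, assuming it sends the same system prompt as the Process node with a throwaway user message. The helper name `buildWarmup`, the warmup query, and `num_predict: 1` are assumptions; `num_ctx` is kept at the Process node's value so both requests use the same context size.

```javascript
// Placeholder only: paste the exact SYSTEM string from the Process node here.
const SYSTEM = 'PASTE THE SYSTEM PROMPT FROM THE PROCESS NODE HERE';

// Build an Ollama /api/chat request that shares the Process node's system prompt.
function buildWarmup(msg, system) {
  msg.payload = JSON.stringify({
    model: 'qwen2.5:1.5b',
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: 'battery status' }, // assumption: any short query will do
    ],
    stream: false,
    keep_alive: -1, // keep the model loaded between queries
    options: { num_ctx: 2048, num_predict: 1 }, // assumption: one token is enough to warm up
  });
  msg.url = 'http://YOUR_PI_IP:11434/api/chat';
  msg.method = 'POST';
  msg.headers = { 'Content-Type': 'application/json' };
  return msg;
}

// Inside a Node-RED function node the body would end with:
//   return buildWarmup(msg, SYSTEM);
```

Because the inject node repeats every 300 seconds, this request also keeps the prompt prefix resident between real queries.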
Ollama caches the processed system prompt in GPU/CPU memory (the KV cache). If your warmup node and process node use the exact same SYSTEM string, each real query costs ~0.05 seconds for prompt evaluation. If they differ by even ONE character (a space, a newline, a different quote), Ollama re-evaluates the entire prompt on every query, costing ~20 seconds.
Copy the const SYSTEM = ... string from your process node. Paste it verbatim into the warmup node. Do not retype.
| State | Prompt eval | Generation | Total |
|---|---|---|---|
| Cache miss (prompts differ) | ~20 sec | ~3 sec | ~23 sec |
| Cache hit (prompts match) | ~0.05 sec | ~3 sec | ~3-4 sec |
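If you suspect the two prompts have drifted apart, a character-by-character comparison finds the first mismatch quickly. This is a small sketch; `firstDifference` is a hypothetical helper and the two sample strings are shortened stand-ins for the real prompts.

```javascript
// Return the index of the first differing character, or -1 if the strings are identical.
function firstDifference(a, b) {
  const limit = Math.min(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : limit;
}

// Shortened stand-ins: the second string has one extra space.
const processPrompt = 'OUTPUT RULES:\n- ALWAYS output ONE compact JSON line';
const warmupPrompt = 'OUTPUT RULES:\n- ALWAYS output ONE  compact JSON line';

const at = firstDifference(processPrompt, warmupPrompt);
console.log(at === -1 ? 'identical' : `differs at index ${at}: ${JSON.stringify(warmupPrompt.slice(at, at + 10))}`);
```

Printing the slice with JSON.stringify makes invisible differences such as stray spaces or newlines visible.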
Final flow connections
[Warmup inject] → [Build Warmup fn] → [HTTP Request Ollama] → [debug]
[HTTP In: Webhook] → [HTTP Response: 200 ACK]
                  └→ [Process fn] → [HTTP Request Ollama] → [Parse+Execute fn]
[TEST inject: "good night check"] → [Process fn]
Click Deploy. The warmup inject fires immediately on deploy, loading qwen2.5:1.5b and priming the KV cache.
Step 5 - Add the voice_announce Script
First find your Echo Dot's device_id: HA → Settings → Devices → click your Echo Dot → look at the URL: /config/devices/device/XXXXXX. That hex string is the device_id.
Add to /homeassistant/scripts.yaml:
voice_announce:
  alias: "Voice Announce"
  fields:
    message:
      description: "Text to announce"
      required: true
  sequence:
    - action: alexa_devices.send_text_command
      data:
        device_id: YOUR_ECHO_DEVICE_ID
        text_command: "announce {{ message }}"
  mode: single
Step 6 - Set Up the Alexa Devices Integration (for TTS)
- HA → Settings → Integrations → + Add Integration
- Search Alexa Devices → enter your Amazon credentials
- Complete 2FA if prompted
- Your Echo Dot appears as media_player.echo_dot_*
This integration is what enables HA to proactively speak through the Echo Dot at any time: morning briefs, alerts, automation announcements.
Testing Without Speaking
Before testing with Alexa, verify the pipeline directly:
- In Node-RED, click the TEST inject node (payload: "battery status") and check that the Echo speaks
- If the Echo speaks: the NR → Ollama → HA → Echo Dot chain is working
- If silent: check the NR debug panel for errors, then check HA Developer Tools → States → input_text.alexa_query
To test the full chain: "Alexa, ask house what is the battery level" → Echo should say "Checking with the house" then speak the battery % about 3-5 seconds later.
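The webhook can also be exercised from a terminal without HA or Alexa in the loop. The sketch below mimics what the rest_command sends, including a Lambda-style `|timestamp` nonce. The function names are hypothetical, it assumes Node.js 18+ for the built-in `fetch`, and YOUR_PI_IP and YOUR_BASE64_CREDENTIALS remain placeholders.

```javascript
// Build the request the rest_command would send, with a "|timestamp" nonce appended.
function buildTestRequest(query, baseUrl, credentials) {
  return {
    url: `${baseUrl}/endpoint/alexa-ai-query`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${credentials}`,
      },
      body: JSON.stringify({ query: `${query}|${Date.now()}` }),
    },
  };
}

// Send the request with the fetch built into Node.js 18+ and return the HTTP status.
async function postTestQuery(query, baseUrl, credentials) {
  const { url, options } = buildTestRequest(query, baseUrl, credentials);
  const response = await fetch(url, options);
  return response.status; // 200 means Node-RED acknowledged the webhook
}

// Example call (not run here):
//   postTestQuery('battery status', 'http://YOUR_PI_IP:1880', 'YOUR_BASE64_CREDENTIALS').then(console.log);
```

A 200 status only confirms the immediate ACK from the HTTP Response node; the spoken answer still depends on the Ollama and HA branch completing a few seconds later.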
Pipeline working? Now add the HA scripts that do the actual calculations.
The HA Scripts: Battery, Solar, Sprinklers →