Home Assistant Alexa — Layer 3: Ollama Local AI + Node-RED Flow

📅 April 2026 · ⏱️ 15 min read · 🏷️ Home Assistant · Alexa · Ollama · Node-RED

This is the layer that makes the whole system genuinely useful. When you say "Alexa, ask house will the battery last tonight?", Ollama running locally on your Pi 5 interprets the question, picks the correct Home Assistant script, and the Echo Dot speaks the answer. The spoken request still arrives through Alexa and the Lambda from Layer 2, but the AI inference itself never leaves your LAN. Here is exactly how to build it.

What you need first

Layer 1 (Nabu Casa) working — guide here
Layer 2 (AWS Lambda) working — guide here
Node-RED add-on installed and running in HA
Ollama add-on installed in HA

Step 1 — Add HA Helpers and REST Command

Open /homeassistant/configuration.yaml and add:

input_text:
  alexa_query:
    name: "Alexa AI Query"
    max: 255
    icon: mdi:microphone
  alexa_zone_1_request:
    name: "Zone 1"
    icon: mdi:sprinkler-variant
  alexa_zone_2_request:
    name: "Zone 2"
    icon: mdi:sprinkler-variant
  alexa_zone_3_request:
    name: "Zone 3"
    icon: mdi:sprinkler-variant
  alexa_skip_zone:
    name: "Skip Zone"
    icon: mdi:skip-next

rest_command:
  trigger_alexa_ai:
    url: "http://YOUR_PI_IP:1880/endpoint/alexa-ai-query"
    method: POST
    headers:
      Content-Type: "application/json"
      Authorization: "Basic YOUR_BASE64_CREDENTIALS"
    # tojson escapes quotes and backslashes in the spoken query so the body stays valid JSON
    payload: '{{ {"query": query} | tojson }}'
    content_type: "application/json"

⚠️ CRITICAL — Full HA restart required
Adding rest_command to configuration.yaml requires a full Settings → System → Restart. Reloading scripts, automations, or running homeassistant.reload_core_config does NOT load the new rest_command. This caused hours of debugging.
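
The YOUR_BASE64_CREDENTIALS placeholder above is ordinary HTTP Basic auth: the Base64 encoding of username:password. As a minimal sketch, assuming your Node-RED HTTP endpoints are protected with a username and password (the 'user' and 'pass' values here are placeholders, not real credentials), you can generate the header value in Node.js like this:

```javascript
// Build the value for the Authorization header used by rest_command.
// username/password are placeholders for whatever protects your Node-RED endpoints.
function basicAuthHeader(username, password) {
    const encoded = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
    return `Basic ${encoded}`;
}

console.log(basicAuthHeader('user', 'pass')); // prints "Basic dXNlcjpwYXNz"
```

Paste the whole printed string, including the word Basic, into the Authorization line.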

Step 2 — Add the Trigger Automation

Add to /homeassistant/automations.yaml:

- id: alexa_ai_query_trigger
  alias: "Alexa AI Query Trigger"
  trigger:
    - platform: state
      entity_id: input_text.alexa_query
  condition:
    - condition: template
      value_template: >
        {{ trigger.to_state.state not in ['unknown', ''] }}
  action:
    - action: rest_command.trigger_alexa_ai
      data:
        query: "{{ trigger.to_state.state }}"
  mode: queued
  max: 5

Step 3 — Install Ollama and Pull the Model

HA → Settings → Add-ons → Add-on Store → search Ollama → Install → Start
Open HA Terminal (Settings → Add-ons → Terminal → Open Web UI)

Pull the model:

curl http://localhost:11434/api/pull -d '{"model":"qwen2.5:1.5b"}'

Wait for the ~900MB download to complete.

Why qwen2.5:1.5b? Small enough to fit in Pi 5 RAM alongside everything else (~900MB), fast enough at ~7.5 tokens/second, and accurate enough for the one task it does: mapping spoken queries to HA script names as compact JSON. For this use case, a 1.5B model is more than sufficient.
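
Before moving on, it is worth confirming the pull actually finished. The sketch below assumes Ollama's GET /api/tags endpoint, which lists installed models as { models: [{ name: ... }] }, and Node.js 18 or later for the built-in fetch. The helper names hasModel and checkModel are made up for this example.

```javascript
// Pure check against the (assumed) shape of the /api/tags response.
function hasModel(tagsResponse, wanted) {
    const models = (tagsResponse && tagsResponse.models) || [];
    return models.some(m => m.name === wanted);
}

// Ask the local Ollama instance which models are installed.
async function checkModel(baseUrl = 'http://localhost:11434') {
    const res = await fetch(`${baseUrl}/api/tags`);
    if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
    return hasModel(await res.json(), 'qwen2.5:1.5b');
}

// Usage (uncomment to run against your own instance):
// checkModel().then(ok => console.log(ok ? 'model ready' : 'model missing'));
```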

Step 4 — Build the Node-RED "Alexa AI" Flow

Open Node-RED at http://YOUR_PI_IP:1880. Click + to create a new tab. Double-click the tab and rename it to Alexa AI.

Node 1: HTTP In (Webhook)

Type: http in · Method: POST · URL: /alexa-ai-query
Name: Alexa AI Webhook

Node 2: HTTP Response (Immediate ACK)

Type: http response · Status code: 200
Connect directly from HTTP In. Returns 200 immediately so HA doesn't time out waiting.

Node 3: Function — "Process via Ollama"

Also connect from HTTP In (parallel branch). This does the actual AI work asynchronously.

const TOKEN = env.get('SUPERVISOR_TOKEN');
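// Function nodes have no require(): add the built-in "http" module on this node's
// Setup tab (Modules list, imported as the variable http) so http.request works below.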

// Accept either the webhook body ({ query: "..." }) or a plain string from the TEST inject
const rawValue = typeof msg.payload === 'string' ? msg.payload
    : (msg.payload && typeof msg.payload.query === 'string' ? msg.payload.query : '');
// Strip the |timestamp nonce Lambda appended
const query = rawValue.split('|')[0].trim();

if (!query || query === 'unknown') return null;

// Pre-filter: zone 4 never exists — save Ollama tokens
if (/zone\s*(4|four)/i.test(query)) {
    const d = JSON.stringify({ entity_id: 'script.voice_announce',
        variables: { message: 'Zone four does not exist. I can run zones one, two, or three.' } });
    const req = http.request({ hostname: 'supervisor', port: 80,
        path: '/core/api/services/script/turn_on', method: 'POST',
        headers: { 'Content-Type': 'application/json',
                   'Content-Length': Buffer.byteLength(d),
                   'Authorization': 'Bearer ' + TOKEN }}, () => {});
    req.on('error', () => {});
    req.write(d); req.end();
    return null;
}

// SYSTEM PROMPT — must be BYTE-IDENTICAL to the warmup node (see KV cache note below)
const SYSTEM = `You control the Van Buren home in South Africa.

OUTPUT RULES:
- ALWAYS output ONE compact JSON line for ANY home control or status request.
- ONLY for pure general-knowledge questions output plain spoken English.

JSON format: {"service":"script.EXACT_NAME","data":{}}

SCRIPTS:
script.alexa_battery_status
script.alexa_solar_status
script.alexa_system_status
script.alexa_good_night
script.alexa_will_battery_last
script.alexa_charge_overnight
script.alexa_start_eskom_import
script.alexa_stop_eskom_import
script.alexa_loadshedding
script.alexa_morning_brief
script.alexa_geyser_boost
script.alexa_eskom_outage
script.alexa_biltong_status
script.alexa_start_sprinklers  data: {"zone1_minutes":N,"zone2_minutes":N,"zone3_minutes":N}
script.alexa_stop_sprinklers
script.alexa_sprinkler_status
script.aircon_status
script.aircon_configure  data: {"temperature":N,"mode":"cool|heat|auto|dry|fan_only|off"}
script.aircon_boost
script.aircon_eco

EXAMPLES:
"battery status" → {"service":"script.alexa_battery_status","data":{}}
"run zone 2 for 8 minutes" → {"service":"script.alexa_start_sprinklers","data":{"zone1_minutes":0,"zone2_minutes":8,"zone3_minutes":0}}
"will the battery last" → {"service":"script.alexa_will_battery_last","data":{}}
"set aircon to 22" → {"service":"script.aircon_configure","data":{"temperature":22}}`;

msg.payload = JSON.stringify({
    model: 'qwen2.5:1.5b',
    messages: [{ role: 'system', content: SYSTEM }, { role: 'user', content: query }],
    stream: false,
    keep_alive: -1,  // top-level request field, not an entry in options; -1 keeps the model loaded
    options: { num_ctx: 2048, num_predict: 80 }
});
msg.url = 'http://YOUR_PI_IP:11434/api/chat';
msg.method = 'POST';
msg.headers = { 'Content-Type': 'application/json' };
return msg;
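
The query clean-up at the top of this node is easy to get subtly wrong, so here is the same idea as a standalone function you can run outside Node-RED. It is a sketch rather than the node's exact code: normaliseQuery is a hypothetical name, and it assumes the payload is either the webhook body or a plain string from an inject node.

```javascript
// Return the spoken query without the |timestamp nonce, or null if there is nothing usable.
function normaliseQuery(payload) {
    let raw = '';
    if (typeof payload === 'string') raw = payload;                          // inject node
    else if (payload && typeof payload.query === 'string') raw = payload.query; // webhook body
    const query = raw.split('|')[0].trim();
    return (query === '' || query === 'unknown') ? null : query;
}

console.log(normaliseQuery({ query: 'battery status|1714000000' })); // prints "battery status"
console.log(normaliseQuery('good night check'));                     // prints "good night check"
console.log(normaliseQuery({ query: 'unknown' }));                   // prints null
```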

Node 4: HTTP Request — "Call Ollama"

Method: use msg.method · URL: leave blank (set by function)
Return: a parsed JSON object · Timeout: 30 seconds

Node 5: Function — "Parse and Execute"

const TOKEN = env.get('SUPERVISOR_TOKEN');
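// Function nodes have no require(): add the built-in "http" module on this node's
// Setup tab (Modules list, imported as the variable http) so http.request works below.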

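// Guard against an error response from Ollama (no message field)
const content = ((msg.payload && msg.payload.message && msg.payload.message.content) || '').trim();
if (!content) { node.warn('Empty Ollama reply'); return null; }

let service, data = {};
const m = content.match(/\{[\s\S]*\}/);
if (m) {
    try {
        const parsed = JSON.parse(m[0]);
        service = parsed.service;
        data = parsed.data || {};
    } catch (e) {
        // Malformed JSON: speak the raw reply instead
        service = 'script.voice_announce';
        data = { message: content };
    }
} else {
    // No JSON at all: a plain-English answer to a general-knowledge question
    service = 'script.voice_announce';
    data = { message: content };
}

if (!service) return null;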

const ALLOWED = [
    'script.alexa_battery_status','script.alexa_solar_status','script.alexa_system_status',
    'script.alexa_good_night','script.alexa_will_battery_last','script.alexa_charge_overnight',
    'script.alexa_start_eskom_import','script.alexa_stop_eskom_import','script.alexa_loadshedding',
    'script.alexa_morning_brief','script.alexa_geyser_boost','script.alexa_eskom_outage',
    'script.alexa_biltong_status','script.alexa_start_sprinklers','script.alexa_stop_sprinklers',
    'script.alexa_sprinkler_status','script.aircon_status','script.aircon_configure',
    'script.aircon_boost','script.aircon_eco','script.voice_announce'
];
if (!ALLOWED.includes(service)) { node.warn('Blocked: ' + service); return null; }

const payload = JSON.stringify({ entity_id: service, variables: data });
const req = http.request({
    hostname: 'supervisor', port: 80,
    path: '/core/api/services/script/turn_on', method: 'POST',
    headers: { 'Content-Type': 'application/json',
               'Content-Length': Buffer.byteLength(payload),
               'Authorization': 'Bearer ' + TOKEN }
}, () => {});
req.on('error', e => node.warn('HA call failed: ' + e.message));
req.write(payload); req.end();
return null;
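
To see how the parse step treats different model replies, the decision logic can be pulled into a pure function and exercised on its own. This is a sketch under assumptions: interpretReply is a hypothetical name, the allowlist is shortened to three entries for readability, and a reply with no JSON in it is treated as a spoken answer.

```javascript
// Shortened allowlist for the example; the real node lists every script.
const ALLOWED = new Set([
    'script.alexa_battery_status',
    'script.aircon_configure',
    'script.voice_announce'
]);

// Turn the model's reply into { service, data }, or null if nothing should run.
function interpretReply(content) {
    const text = String(content || '').trim();
    if (!text) return null;
    const match = text.match(/\{[\s\S]*\}/);
    if (match) {
        try {
            const parsed = JSON.parse(match[0]);
            if (typeof parsed.service !== 'string' || !ALLOWED.has(parsed.service)) return null;
            return { service: parsed.service, data: parsed.data || {} };
        } catch (e) {
            // Malformed JSON: fall through and speak the raw text
        }
    }
    return { service: 'script.voice_announce', data: { message: text } };
}

console.log(interpretReply('{"service":"script.alexa_battery_status","data":{}}'));
console.log(interpretReply('Paris is the capital of France.'));
console.log(interpretReply('{"service":"script.unlock_front_door","data":{}}')); // prints null
```

The allowlist matters because the model's output is untrusted text: a script name it invents should never reach Home Assistant.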

Node 6: Inject — "Warmup (startup + 5min)"

Fires once at startup: ✓ · Repeat every 300 seconds
Connect to a "Build Warmup" function → HTTP Request → debug
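
The "Build Warmup" function is not listed in full here, so the following is one possible sketch rather than the original node. It assumes the warmup sends the same model, the same SYSTEM string, and the same num_ctx as the process node, with a throwaway user message and a num_predict of 1 so almost nothing is generated. buildWarmupRequest and the 'battery status' user message are choices made for this example.

```javascript
// Build the fields the HTTP Request node needs for a warmup call.
function buildWarmupRequest(systemPrompt, ollamaHost) {
    return {
        url: `http://${ollamaHost}:11434/api/chat`,
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        payload: JSON.stringify({
            model: 'qwen2.5:1.5b',
            messages: [
                { role: 'system', content: systemPrompt },       // must match the process node exactly
                { role: 'user', content: 'battery status' }      // throwaway query (assumption)
            ],
            stream: false,
            keep_alive: -1,                                       // keep the model loaded
            options: { num_ctx: 2048, num_predict: 1 }           // same context size, minimal output
        })
    };
}

// Inside the Function node you would finish with something like:
//   Object.assign(msg, buildWarmupRequest(SYSTEM, 'YOUR_PI_IP'));
//   return msg;
```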

The KV Cache — the most critical detail in this entire build

Ollama keeps the attention state it computed for the system prompt in memory (the KV cache). If your warmup node and process node send the exact same SYSTEM string, Ollama reuses that state and each real query costs ~0.05 seconds for prompt evaluation. If the strings differ by even ONE character (a space, a newline, a different quote), the cached prefix stops matching at that point and Ollama has to re-evaluate the prompt on every query, costing ~20 seconds.

Copy the const SYSTEM = ... string from your process node. Paste it verbatim into the warmup node. Do not retype.
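
If you suspect the two copies have drifted apart, a quick comparison finds the first mismatch. This is a small standalone helper (firstDifference is a made-up name) that you can paste both strings into and run with Node.js:

```javascript
// Return the index of the first differing character, or -1 if the strings are identical.
function firstDifference(a, b) {
    const len = Math.min(a.length, b.length);
    for (let i = 0; i < len; i++) {
        if (a[i] !== b[i]) return i;
    }
    // Same up to the shorter length: identical, or one has extra trailing characters
    return a.length === b.length ? -1 : len;
}

console.log(firstDifference('abc', 'abc'));  // prints -1
console.log(firstDifference('abc', 'abd'));  // prints 2
console.log(firstDifference('abc', 'abc ')); // prints 3 (trailing space)
```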

State                         Prompt eval   Generation   Total
Cache miss (prompts differ)   ~20 sec       ~3 sec       ~23 sec
Cache hit (prompts match)     ~0.05 sec     ~3 sec       ~3-4 sec
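
You can measure these numbers on your own setup from the timing fields Ollama includes in a non-streaming response (prompt_eval_duration, eval_duration, and total_duration, all in nanoseconds). The helper below is a sketch with a made-up name, summariseTimings, and the sample values are illustrative, not measurements:

```javascript
// Convert Ollama's nanosecond timing fields into seconds, rounded to 2 decimals.
function summariseTimings(response) {
    const toSec = ns => Number(((ns || 0) / 1e9).toFixed(2));
    return {
        promptEvalSec: toSec(response.prompt_eval_duration),
        generationSec: toSec(response.eval_duration),
        totalSec: toSec(response.total_duration)
    };
}

// Illustrative input only:
console.log(summariseTimings({
    prompt_eval_duration: 50000000,
    eval_duration: 3000000000,
    total_duration: 3100000000
}));
// prints { promptEvalSec: 0.05, generationSec: 3, totalSec: 3.1 }

// In a Function node placed after the HTTP Request node, something like
//   node.warn(summariseTimings(msg.payload));
// would show the figures in the debug sidebar.
```

A promptEvalSec in the tens of seconds on a real query is the signature of a cache miss.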

Final flow connections

[Warmup inject] → [Build Warmup fn] → [HTTP Request Ollama] → [debug]

[HTTP In: Webhook] → [HTTP Response: 200 ACK]
                  └→ [Process fn] → [HTTP Request Ollama] → [Parse+Execute fn]

[TEST inject: "good night check"] → [Process fn]

Click Deploy. The warmup inject fires immediately on deploy, loading qwen2.5:1.5b and priming the KV cache.

Step 5 — Add the voice_announce Script

First find your Echo Dot's device_id: HA → Settings → Devices → click your Echo Dot → look at the URL: /config/devices/device/XXXXXX — that hex string is the device_id.

Add to /homeassistant/scripts.yaml:

voice_announce:
  alias: "Voice Announce"
  fields:
    message:
      description: "Text to announce"
      required: true
  sequence:
    - action: alexa_devices.send_text_command
      data:
        device_id: YOUR_ECHO_DEVICE_ID
        text_command: "announce {{ message }}"
  mode: single

Step 6 — Set Up the Alexa Devices Integration (for TTS)

HA → Settings → Devices & services → + Add Integration
Search Alexa Devices → enter your Amazon credentials
Complete 2FA if prompted
Your Echo Dot appears as media_player.echo_dot_*

This integration is what enables HA to proactively speak through the Echo Dot at any time — for morning briefs, alerts, automation announcements.

Testing Without Speaking

Before testing with Alexa, verify the pipeline directly:

In Node-RED, click the TEST inject node (its payload is a plain query string such as "battery status") → check that Echo speaks
If the Echo speaks: the NR → Ollama → HA → Echo Dot chain is working
If silent: check the NR debug panel for errors, then check HA Developer Tools → States → input_text.alexa_query

To test the full chain: "Alexa, ask house what is the battery level" → Echo should say "Checking with the house" then speak the battery % about 3-5 seconds later.
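
Between the TEST inject and a spoken request there is one more useful check: posting to the webhook yourself, exactly as the rest_command from Step 1 does. The sketch below assumes Node.js 18 or later (for the built-in fetch) and Basic auth on the endpoint. The host, username, and password arguments are placeholders, and buildWebhookRequest and sendTestQuery are names invented for this example.

```javascript
// Build the same request that rest_command.trigger_alexa_ai sends.
function buildWebhookRequest(host, query, username, password) {
    const credentials = Buffer.from(`${username}:${password}`).toString('base64');
    return {
        url: `http://${host}:1880/endpoint/alexa-ai-query`,
        options: {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'Authorization': `Basic ${credentials}`
            },
            body: JSON.stringify({ query })
        }
    };
}

// Send it and return the HTTP status; 200 means the webhook acknowledged the query.
async function sendTestQuery(host, query, username, password) {
    const { url, options } = buildWebhookRequest(host, query, username, password);
    const res = await fetch(url, options);
    return res.status;
}

// Usage (uncomment and fill in your own values):
// sendTestQuery('YOUR_PI_IP', 'battery status', 'user', 'pass').then(console.log);
```

A 200 with no speech afterwards points at the Ollama or Parse+Execute side of the flow; anything other than 200 points at the URL or credentials.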

Pipeline working? Now add the HA scripts that do the actual calculations.

The HA Scripts: Battery, Solar, Sprinklers →