Home Assistant Alexa, Layer 3: Ollama Local AI + Node-RED Flow

📅 April 2026 · ⏱️ 15 min read · 🏷️ Home Assistant · Alexa · Ollama · Node-RED

This is the layer that makes the whole system genuinely useful. When you say "Alexa, ask house will the battery last tonight?", Ollama, running locally on your Pi 5, interprets the question and picks the correct Home Assistant script, and the Echo Dot speaks the answer. Alexa's speech recognition and the spoken reply still go through Amazon's cloud, but interpreting the question happens entirely on your LAN: no query is sent to a cloud AI service. Here is exactly how to build it.

What you need first

Step 1: Add HA Helpers and REST Command

Open /homeassistant/configuration.yaml and add:

input_text:
  alexa_query:
    name: "Alexa AI Query"
    max: 255
    icon: mdi:microphone
  alexa_zone_1_request:
    name: "Zone 1"
    icon: mdi:sprinkler-variant
  alexa_zone_2_request:
    name: "Zone 2"
    icon: mdi:sprinkler-variant
  alexa_zone_3_request:
    name: "Zone 3"
    icon: mdi:sprinkler-variant
  alexa_skip_zone:
    name: "Skip Zone"
    icon: mdi:skip-next

rest_command:
  trigger_alexa_ai:
    url: "http://YOUR_PI_IP:1880/endpoint/alexa-ai-query"
    method: POST
    headers:
      Content-Type: "application/json"
      Authorization: "Basic YOUR_BASE64_CREDENTIALS"
    payload: '{{ {"query": query} | to_json }}'
    content_type: "application/json"
โš ๏ธ CRITICAL โ€” Full HA restart required
Adding rest_command to configuration.yaml requires a full Settings โ†’ System โ†’ Restart. Reloading scripts, automations, or running homeassistant.reload_core_config does NOT load the new rest_command. This caused hours of debugging.
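The YOUR_BASE64_CREDENTIALS placeholder is the Base64 encoding of username:password for the Node-RED HTTP endpoint. A minimal Node.js sketch for producing the header value, assuming you use the login you configured for Node-RED's HTTP nodes (the credentials below are placeholders, and basicAuthHeader is a hypothetical helper name):

```javascript
// HTTP Basic auth header value: "Basic " + base64("username:password").
function basicAuthHeader(username, password) {
  return 'Basic ' + Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
}

// Placeholder credentials: replace with your own Node-RED HTTP node login.
console.log(basicAuthHeader('nodered_user', 'nodered_pass'));
```

Paste only the part after "Basic " into configuration.yaml, since the YAML already contains the "Basic " prefix.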

Step 2: Add the Trigger Automation

Add to /homeassistant/automations.yaml:

- id: alexa_ai_query_trigger
  alias: "Alexa AI Query Trigger"
  trigger:
    - platform: state
      entity_id: input_text.alexa_query
  condition:
    - condition: template
      value_template: >
        {{ trigger.to_state.state not in ['unknown', ''] }}
  action:
    - action: rest_command.trigger_alexa_ai
      data:
        query: "{{ trigger.to_state.state }}"
  mode: queued
  max: 5
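A state trigger only fires when the text in input_text.alexa_query actually changes, so asking the same question twice in a row would otherwise be ignored. That appears to be why the Lambda appends a |timestamp nonce, which Node 3 later strips. A small simulation of the trigger plus condition, where shouldFire and withNonce are hypothetical names and withNonce imitates the query|timestamp format described in Node 3:

```javascript
// Mirrors the automation: fire only on a real state change, and skip
// the values excluded by the template condition ('unknown' and '').
function shouldFire(oldState, newState) {
  if (oldState === newState) return false; // no state change, no trigger
  return !['unknown', ''].includes(newState);
}

// Hypothetical stand-in for what the Lambda writes to input_text.alexa_query.
function withNonce(query, timestampMs) {
  return `${query}|${timestampMs}`;
}

console.log(shouldFire('battery status', 'battery status')); // false: same text
console.log(shouldFire(withNonce('battery status', 1000),
                       withNonce('battery status', 2000)));  // true: nonce differs
```

The nonce counts toward the helper's 255-character max, so very long queries leave less room for the spoken text.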

Step 3: Install Ollama and Pull the Model

  1. HA → Settings → Add-ons → Add-on Store → search Ollama → Install → Start
  2. Open HA Terminal (Settings → Add-ons → Terminal → Open Web UI)
  3. Pull the model:
    curl http://localhost:11434/api/pull -d '{"model":"qwen2.5:1.5b"}'
    Wait for the ~900MB download to complete.
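To confirm the pull finished before wiring up Node-RED, you can ask Ollama's /api/tags endpoint, which lists the models available locally. A minimal sketch, assuming Node.js 18+ (for the built-in fetch) and that port 11434 is reachable at the same address the flow uses later; hasModel and checkOllama are hypothetical helper names:

```javascript
// True if an /api/tags response body lists the given model name.
function hasModel(tagsBody, modelName) {
  const models = (tagsBody && Array.isArray(tagsBody.models)) ? tagsBody.models : [];
  return models.some(m => m.name === modelName);
}

// Fetch the local model list from Ollama and check for the model.
async function checkOllama(baseUrl, modelName) {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return hasModel(await res.json(), modelName);
}

// Example (replace YOUR_PI_IP):
// checkOllama('http://YOUR_PI_IP:11434', 'qwen2.5:1.5b').then(console.log);
```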

Why qwen2.5:1.5b? Small enough to fit in Pi 5 RAM alongside everything else (~900MB), fast enough at ~7.5 tokens/second, and accurate enough for the one task it does: mapping spoken queries to HA script names as compact JSON. For this use case, a 1.5B model is more than sufficient.

Step 4: Build the Node-RED "Alexa AI" Flow

Open Node-RED at http://YOUR_PI_IP:1880. Click + to create a new tab. Double-click the tab and rename it to Alexa AI.

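Node 1: HTTP In (Webhook)
Method: POST. URL: /alexa-ai-query. The Home Assistant Node-RED add-on serves HTTP In nodes under /endpoint, which is why the rest_command URL is http://YOUR_PI_IP:1880/endpoint/alexa-ai-query.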

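Node 2: HTTP Response (Immediate ACK)
Status code: 200. Connect it directly to the HTTP In node so Home Assistant gets its reply at once. rest_command times out after 10 seconds by default, and an Ollama call can take longer than that on a cache miss.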

Node 3: Function "Process via Ollama"

Also connect this node to the HTTP In node, as a branch parallel to the HTTP Response. It does the actual AI work asynchronously, so the slow Ollama call never holds up the ACK.

const TOKEN = env.get('SUPERVISOR_TOKEN');
const http = require('http');

// Strip the |timestamp nonce the Lambda appends (it makes every write a state change)
const rawValue = typeof msg.payload === 'string'
    ? msg.payload
    : ((msg.payload && msg.payload.query) || '');
const query = String(rawValue).split('|')[0].trim();

if (!query || query === 'unknown' || query === '') return null;

// Pre-filter: zone 4 does not exist, so answer directly and skip the Ollama call
if (/zone\s*(4|four)/i.test(query)) {
    const d = JSON.stringify({ entity_id: 'script.voice_announce',
        variables: { message: 'Zone four does not exist. I can run zones one, two, or three.' } });
    const req = http.request({ hostname: 'supervisor', port: 80,
        path: '/core/api/services/script/turn_on', method: 'POST',
        headers: { 'Content-Type': 'application/json',
                   'Content-Length': Buffer.byteLength(d),
                   'Authorization': 'Bearer ' + TOKEN }}, () => {});
    req.on('error', () => {});
    req.write(d); req.end();
    return null;
}

// SYSTEM PROMPT: must be BYTE-IDENTICAL to the warmup node (see the KV cache note below)
const SYSTEM = `You control the Van Buren home in South Africa.

OUTPUT RULES:
- ALWAYS output ONE compact JSON line for ANY home control or status request.
- ONLY for pure general-knowledge questions output plain spoken English.

JSON format: {"service":"script.EXACT_NAME","data":{}}

SCRIPTS:
script.alexa_battery_status
script.alexa_solar_status
script.alexa_system_status
script.alexa_good_night
script.alexa_will_battery_last
script.alexa_charge_overnight
script.alexa_start_eskom_import
script.alexa_stop_eskom_import
script.alexa_loadshedding
script.alexa_morning_brief
script.alexa_geyser_boost
script.alexa_eskom_outage
script.alexa_biltong_status
script.alexa_start_sprinklers  data: {"zone1_minutes":N,"zone2_minutes":N,"zone3_minutes":N}
script.alexa_stop_sprinklers
script.alexa_sprinkler_status
script.aircon_status
script.aircon_configure  data: {"temperature":N,"mode":"cool|heat|auto|dry|fan_only|off"}
script.aircon_boost
script.aircon_eco

EXAMPLES:
"battery status" → {"service":"script.alexa_battery_status","data":{}}
"run zone 2 for 8 minutes" → {"service":"script.alexa_start_sprinklers","data":{"zone1_minutes":0,"zone2_minutes":8,"zone3_minutes":0}}
"will the battery last" → {"service":"script.alexa_will_battery_last","data":{}}
"set aircon to 22" → {"service":"script.aircon_configure","data":{"temperature":22}}`;

msg.payload = JSON.stringify({
    model: 'qwen2.5:1.5b',
    messages: [{ role: 'system', content: SYSTEM }, { role: 'user', content: query }],
    stream: false,
    keep_alive: -1,  // top-level request field (not a model option): keep the model loaded
    options: { num_ctx: 2048, num_predict: 80 }
});
msg.url = 'http://YOUR_PI_IP:11434/api/chat';
msg.method = 'POST';
msg.headers = { 'Content-Type': 'application/json' };
return msg;
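The query cleaning and request building are easier to test outside Node-RED as plain functions. A sketch under that assumption, with hypothetical function names. Its zone-four regex adds word boundaries, so "zone 40" is not caught, which is slightly stricter than the node above. It also places keep_alive at the top level of the request, where Ollama's /api/chat expects it:

```javascript
// Accepts {query: "..."} (HTTP In) or a bare string (TEST inject),
// then strips the "|timestamp" nonce.
function cleanQuery(payload) {
  const raw = typeof payload === 'string' ? payload
            : (payload && typeof payload.query === 'string') ? payload.query : '';
  return raw.split('|')[0].trim();
}

// Word boundaries stop "zone 40" from matching.
function isZoneFour(query) {
  return /\bzone\s*(4|four)\b/i.test(query);
}

// Same request shape as the process node; keep_alive sits beside model, not in options.
function buildChatBody(system, query) {
  return {
    model: 'qwen2.5:1.5b',
    messages: [{ role: 'system', content: system }, { role: 'user', content: query }],
    stream: false,
    keep_alive: -1,
    options: { num_ctx: 2048, num_predict: 80 },
  };
}

console.log(cleanQuery({ query: 'will the battery last|1714000000000' }));
```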

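Node 4: HTTP Request "Call Ollama"
Method: - set by msg.method -. URL: leave empty (Node 3 sets msg.url). Return: a parsed JSON object, because Node 5 reads msg.payload.message.content.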

Node 5: Function "Parse and Execute"

const TOKEN = env.get('SUPERVISOR_TOKEN');
const http = require('http');

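// Node 4 should return a parsed JSON object; accept a string too, just in case
const body = typeof msg.payload === 'string' ? JSON.parse(msg.payload) : msg.payload;
const content = ((body && body.message && body.message.content) || '').trim();
if (!content) return null;

let service, data = {};
const m = content.match(/\{[\s\S]*\}/);
if (m) {
    try {
        const parsed = JSON.parse(m[0]);
        service = parsed.service;
        data = parsed.data || {};
    } catch(e) {
        // Malformed JSON: speak the raw reply instead
        service = 'script.voice_announce';
        data = { message: content };
    }
} else {
    // No JSON at all: a plain-English answer to a general-knowledge question
    service = 'script.voice_announce';
    data = { message: content };
}

if (!service) return null;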

const ALLOWED = [
    'script.alexa_battery_status','script.alexa_solar_status','script.alexa_system_status',
    'script.alexa_good_night','script.alexa_will_battery_last','script.alexa_charge_overnight',
    'script.alexa_start_eskom_import','script.alexa_stop_eskom_import','script.alexa_loadshedding',
    'script.alexa_morning_brief','script.alexa_geyser_boost','script.alexa_eskom_outage',
    'script.alexa_biltong_status','script.alexa_start_sprinklers','script.alexa_stop_sprinklers',
    'script.alexa_sprinkler_status','script.aircon_status','script.aircon_configure',
    'script.aircon_boost','script.aircon_eco','script.voice_announce'
];
if (!ALLOWED.includes(service)) { node.warn('Blocked: ' + service); return null; }

const payload = JSON.stringify({ entity_id: service, variables: data });
const req = http.request({
    hostname: 'supervisor', port: 80,
    path: '/core/api/services/script/turn_on', method: 'POST',
    headers: { 'Content-Type': 'application/json',
               'Content-Length': Buffer.byteLength(payload),
               'Authorization': 'Bearer ' + TOKEN }
}, () => {});
req.on('error', e => node.warn('HA call failed: ' + e.message));
req.write(payload); req.end();
return null;
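The parse-and-allow-list decision can also be checked outside Node-RED. A standalone sketch (decide is a hypothetical name) that takes the span from the first "{" to the last "}" as JSON, falls back to script.voice_announce when the reply contains no JSON at all (the plain-English answers the SYSTEM prompt allows for general-knowledge questions), and drops any service not on the allow-list:

```javascript
// Turns the model's reply into {service, data}, or null if it should be dropped.
function decide(content, allowed) {
  const text = (content || '').trim();
  if (!text) return null;
  let service;
  let data = {};
  const m = text.match(/\{[\s\S]*\}/); // greedy: first "{" to last "}"
  if (m) {
    try {
      const parsed = JSON.parse(m[0]);
      service = parsed.service;
      data = parsed.data || {};
    } catch (e) {
      // Malformed JSON: speak the raw reply instead
      service = 'script.voice_announce';
      data = { message: text };
    }
  } else {
    // No JSON: treat the reply as a spoken general-knowledge answer
    service = 'script.voice_announce';
    data = { message: text };
  }
  if (!service || !allowed.has(service)) return null;
  return { service, data };
}

const allowed = new Set(['script.alexa_battery_status', 'script.voice_announce']);
console.log(decide('{"service":"script.alexa_battery_status","data":{}}', allowed));
```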

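Node 6: Inject "Warmup (startup + 5min)"
Set it to inject once after deploy and repeat every 5 minutes. It feeds the "Build Warmup" function, which sends the same SYSTEM prompt to Ollama through its own HTTP Request node into a debug node (see the flow connections below).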
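The "Build Warmup" function has to send Ollama the same request shape as the process node. A minimal sketch, assuming it mirrors Node 3 exactly: same model, same options, the SYSTEM string pasted verbatim, and a throwaway user message. The buildWarmupMsg name and the 'battery status' warmup text are hypothetical. In the Node-RED function node itself you would paste SYSTEM and end with return buildWarmupMsg(SYSTEM, 'YOUR_PI_IP');

```javascript
// Builds the msg for the warmup HTTP Request node. Everything except the
// user message must match the process node so the cached prefix is reused.
function buildWarmupMsg(system, piIp) {
  return {
    url: `http://${piIp}:11434/api/chat`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    payload: JSON.stringify({
      model: 'qwen2.5:1.5b',
      messages: [
        { role: 'system', content: system },
        { role: 'user', content: 'battery status' }, // hypothetical throwaway query
      ],
      stream: false,
      keep_alive: -1,
      options: { num_ctx: 2048, num_predict: 80 },
    }),
  };
}

console.log(buildWarmupMsg('(paste SYSTEM here)', 'YOUR_PI_IP').url);
```

The warmup reply only goes to the debug node, so the throwaway query never triggers a script.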

The KV Cache: the most critical detail in this entire build

Ollama keeps the prompt it has already evaluated in memory (the KV cache) and reuses it when the next request starts with the same tokens. If your warmup node and process node use exactly the same SYSTEM string, each real query costs only ~0.05 seconds of prompt evaluation. If they differ by even ONE character (a space, a newline, a different quote), the cached prefix no longer matches and Ollama re-evaluates the entire system prompt on every query, costing ~20 seconds.

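Copy the const SYSTEM = ... string from your process node and paste it verbatim into the warmup node. Do not retype it. Keep the model name and options (num_ctx in particular) identical as well, since a request with a different context size makes Ollama reload the model.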
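Invisible differences (a trailing space, a tab, curly versus straight quotes) are hard to spot by eye. A small Node.js sketch for comparing the two strings on your own machine: paste each node's SYSTEM text into it and look for a fingerprint mismatch or a differing index. The helper names are hypothetical:

```javascript
const crypto = require('crypto');

// Index of the first differing character, or -1 if the strings are identical.
function firstDifference(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return a.length === b.length ? -1 : n;
}

// Short SHA-256 fingerprint of the string's UTF-8 bytes.
function fingerprint(s) {
  return crypto.createHash('sha256').update(s, 'utf8').digest('hex').slice(0, 12);
}

const processSystem = 'You control the Van Buren home in South Africa.';
const warmupSystem = 'You control the Van Buren home in South Africa. ';
console.log(fingerprint(processSystem), fingerprint(warmupSystem));
console.log(firstDifference(processSystem, warmupSystem)); // index of the extra space
```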

| State | Prompt eval | Generation | Total |
|---|---|---|---|
| Cache miss (prompts differ) | ~20 sec | ~3 sec | ~23 sec |
| Cache hit (prompts match) | ~0.05 sec | ~3 sec | ~3-4 sec |
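The generation column follows from the model's output rate. At the ~7.5 tokens/second quoted earlier, num_predict: 80 caps generation at about 80 / 7.5 ≈ 10.7 seconds, and ~3 seconds of generation corresponds to a reply of roughly 22 tokens. A sketch of that arithmetic; the 22-token reply length is illustrative, not measured:

```javascript
// Estimated generation time in seconds; num_predict caps the output length.
function generationSeconds(outputTokens, tokensPerSecond = 7.5, numPredict = 80) {
  return Math.min(outputTokens, numPredict) / tokensPerSecond;
}

// Total = prompt evaluation + generation (prompt eval values from the table above).
function totalSeconds(promptEvalSeconds, outputTokens) {
  return promptEvalSeconds + generationSeconds(outputTokens);
}

console.log(generationSeconds(200).toFixed(1)); // 10.7: output capped at 80 tokens
console.log(totalSeconds(0.05, 22).toFixed(1)); // 3.0: cache hit, ~22-token reply
console.log(totalSeconds(20, 22).toFixed(1));   // 22.9: cache miss
```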

Final flow connections

[Warmup inject] → [Build Warmup fn] → [HTTP Request Ollama] → [debug]

[HTTP In: Webhook] → [HTTP Response: 200 ACK]
                  └→ [Process fn] → [HTTP Request Ollama] → [Parse+Execute fn]

[TEST inject: "good night check"] → [Process fn]

Click Deploy. The warmup inject fires immediately on deploy, loading qwen2.5:1.5b and priming the KV cache.

Step 5: Add the voice_announce Script

First find your Echo Dot's device_id (the device only exists once the Alexa Devices integration from Step 6 is set up): HA → Settings → Devices & services → Devices → click your Echo Dot → look at the URL, /config/devices/device/XXXXXX. That hex string is the device_id.

Add to /homeassistant/scripts.yaml:

voice_announce:
  alias: "Voice Announce"
  fields:
    message:
      description: "Text to announce"
      required: true
  sequence:
    - action: alexa_devices.send_text_command
      data:
        device_id: YOUR_ECHO_DEVICE_ID
        text_command: "announce {{ message }}"
  mode: single
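To hear the script without involving Node-RED or Ollama, you can call it through Home Assistant's REST API from another machine. A sketch assuming Node.js 18+, a long-lived access token created on your HA user profile page, and HA on the default port 8123; the helper names and the token are placeholders:

```javascript
// Same service call the Node-RED nodes make, but via HA's public REST API.
function buildAnnounceRequest(haUrl, token, message) {
  return {
    url: `${haUrl}/api/services/script/turn_on`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ entity_id: 'script.voice_announce', variables: { message } }),
    },
  };
}

async function announce(haUrl, token, message) {
  const { url, init } = buildAnnounceRequest(haUrl, token, message);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`HA returned HTTP ${res.status}`);
}

// Example (placeholders):
// announce('http://YOUR_PI_IP:8123', 'YOUR_LONG_LIVED_TOKEN', 'Testing the Echo Dot')
//   .catch(console.error);
```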

Step 6: Set Up the Alexa Devices Integration (for TTS)

  1. HA → Settings → Devices & services → + Add Integration
  2. Search Alexa Devices → enter your Amazon credentials
  3. Complete 2FA if prompted
  4. Your Echo Dot appears as media_player.echo_dot_*

This integration is what lets HA speak through the Echo Dot proactively, at any time: morning briefs, alerts, automation announcements.

Testing Without Speaking

Before testing with Alexa, verify the pipeline directly:

  1. In Node-RED, set the TEST inject node's payload to a query such as "battery status" and click it → check that the Echo speaks
  2. If the Echo speaks, the NR → Ollama → HA → Echo Dot chain is working
  3. If it stays silent, check the NR debug panel for errors. When testing through Alexa, also check HA Developer Tools → States → input_text.alexa_query to confirm the query arrived
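The TEST inject skips the webhook, the Basic auth, and the HTTP Response node. A middle step before a real Alexa request is to POST to the Node-RED endpoint yourself, the way rest_command.trigger_alexa_ai does. A sketch assuming Node.js 18+; the IP, credentials, and helper names are placeholders, and the |timestamp suffix imitates the Lambda's nonce:

```javascript
// Builds the same POST that rest_command.trigger_alexa_ai sends to Node-RED.
function buildWebhookRequest(piIp, username, password, query, nowMs = Date.now()) {
  const auth = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return {
    url: `http://${piIp}:1880/endpoint/alexa-ai-query`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'Authorization': `Basic ${auth}` },
      body: JSON.stringify({ query: `${query}|${nowMs}` }),
    },
  };
}

async function sendTestQuery(piIp, username, password, query) {
  const { url, init } = buildWebhookRequest(piIp, username, password, query);
  const res = await fetch(url, init);
  console.log(`Node-RED answered HTTP ${res.status}`); // expect the immediate 200 ACK
}

// sendTestQuery('YOUR_PI_IP', 'nodered_user', 'nodered_pass', 'battery status');
```

A 401 here points at the credentials; a 200 with a silent Echo points further down the flow.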

To test the full chain, say "Alexa, ask house what is the battery level". The Echo should say "Checking with the house", then speak the battery percentage about 3-5 seconds later.

Pipeline working? Now add the HA scripts that do the actual calculations.
