An Interactive Explainer

How a Brain-Inspired Algorithm Learns to Predict CNC Machine Behavior

Walk through each stage of Hierarchical Temporal Memory — from raw sensor readings to anomaly detection — and understand every tunable parameter along the way.

scroll to begin
The Big Picture

What Are We Building?

A CNC machine cuts metal. While cutting, sensors measure things like axis load (how hard the motor is working), temperature, spindle speed, and feedrate. These numbers arrive every 5 seconds.

We want a system that learns what "normal" looks like for this machine, and alerts us when something is "abnormal" — like a worn tool, a bearing failure, or a crash about to happen.

Instead of programming rules like "alert if load > 80%", we use an algorithm inspired by how the human brain works. It learns patterns from experience and gets surprised when something new happens. That surprise is the anomaly signal.

Raw Sensor
Normalize
Encoder
Spatial Pooler
Temporal Memory
Anomaly Score
Likelihood

Let's walk through each stage. We'll use a real config file for the XM axis load sensor on a Fanuc CNC machine.

Stage 1

Normalization

The raw sensor reads something like 30.27% load. But the algorithm needs a number between 0 and 1. So we normalize:

normalized = (30.27 - 0) / (100 - 0) = 0.3027

The normalization range (0 to 100 for load) is configured in mtc2anomaly. If the range is wrong — say you set max to 50 but the load sometimes hits 80 — values get clipped and information is lost.
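The mapping is just a linear rescale with clipping. A minimal sketch (the 0-100 range is the configured load range; a real implementation would read min/max from the per-signal config):

```python
def normalize(raw, min_val=0.0, max_val=100.0):
    """Map a raw sensor reading into [0, 1]; out-of-range readings are clipped."""
    norm = (raw - min_val) / (max_val - min_val)
    return min(max(norm, 0.0), 1.0)

print(normalize(30.27))   # ~0.3027
print(normalize(120.0))   # 1.0: clipped, information lost
```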

Why it matters
The normalized value determines which encoder bucket the input falls into. Wrong normalization = wrong encoding = wrong predictions = false anomalies.
Stage 2

The Encoder

The encoder converts the normalized number into a Sparse Distributed Representation (SDR) — a long row of bits (0s and 1s) where only a few bits are turned ON.

Think of it like a ruler with 256 marks. The value 0.3027 lights up 41 consecutive marks centered around position 65. If the value changes to 0.31, the lit-up window slides one mark to the right.

SDR Encoder — drag to change value 0.3027
inputBits
256
Total bits in the SDR. More bits = finer resolution, but more sensitive to noise.
w
41
How many bits are ON. Wider = more overlap between nearby values = less sensitive to tiny changes.
minVal / maxVal
0 — 1
The encoder's input range. Values outside get clipped.
clipInput
true
Clamp out-of-range values instead of throwing an error.
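The windowing logic above fits in a few lines. This is a sketch, not the production encoder (it places the bucket index at the window's start rather than its center):

```python
def encode(value, input_bits=256, w=41, min_val=0.0, max_val=1.0, clip=True):
    """Return the set of ON bit indices: a w-bit window whose position
    is proportional to the value (clipInput behavior included)."""
    if clip:
        value = min(max(value, min_val), max_val)
    positions = input_bits - w + 1                 # possible window positions
    frac = (value - min_val) / (max_val - min_val)
    start = int(round(frac * (positions - 1)))     # which bucket the value lands in
    return set(range(start, start + w))

sdr = encode(0.3027)
print(min(sdr), len(sdr))   # window starts near bit 65; exactly 41 bits ON
```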

Sparsity — Why Only 41 of 256 Bits?

The SDR is sparse — only 16% of bits are ON (41 / 256). This isn't arbitrary. Sparsity gives HTM two superpowers:

1. Massive storage capacity. The number of unique patterns you can make by choosing 41 bits out of 256 is astronomically large — roughly 10⁴⁸. Every input value gets its own unique pattern with essentially zero chance of collision.

2. Noise tolerance. Two SDRs are considered "matching" if enough bits overlap (the activationThreshold). With 41 active bits, even if noise corrupts 10 of them, the remaining 31 still identify the pattern. A dense representation (say 200 of 256 ON) would be fragile — most patterns would look the same.

The original Numenta research targets ~2% sparsity (w/n). Our encoder is at 16% (41/256) which is denser than the 2% guideline, but we chose wider w deliberately to increase overlap between adjacent values. The SP output restores sparsity: 10 active / 256 columns = 3.9%, closer to the 2% target.

The two levels of sparsity
Encoder sparsity (w / inputBits = 41/256 = 16%) controls how much adjacent values overlap. Wider w = more overlap = less sensitivity to jitter.

SP output sparsity (numActiveColumnsPerInhArea / numColumns = 10/256 = 3.9%) controls the final representation that the TM learns from. This is the sparsity that matters for memory capacity and noise tolerance.
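Both claims are easy to check numerically: the pattern count is a binomial coefficient, and noise tolerance is just set overlap. A quick sketch:

```python
import math

# Capacity: ways to choose 41 ON bits out of 256 is on the order of 10^48
patterns = math.comb(256, 41)
print(len(str(patterns)))   # number of decimal digits

# Noise tolerance: corrupt 10 of the 41 ON bits and check what still matches
original = set(range(65, 65 + 41))
corrupted = (original - set(list(original)[:10])) | set(range(200, 210))
overlap = len(original & corrupted)
print(overlap)   # 31 bits still match the original pattern
```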

Resolution

With 256 bits and w=41, the encoding window can sit at 256 − 41 + 1 = 216 positions, so adjacent buckets are 1.0 / 215 = 0.00465 apart in the input range. For load (0-100%), that's 0.47% per bucket.

Resolution — what the encoder sees vs ignores

A load change from 30.0% to 30.4% stays in the same bucket — invisible to the system. A change from 30% to 35% crosses ~10 buckets — clearly visible. Try it yourself — drag the two values below and watch the overlap:

SDR Overlap — drag either value to compare
The w=41 trick
With w=41, two values one bucket apart share 40 of 41 active bits (97% overlap). The Spatial Pooler sees nearly identical input and activates the same columns. This makes the system tolerant to normal cutting load jitter.
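The overlap arithmetic can be verified directly with the same sketch encoder (reproduced here so the snippet runs standalone, with values already normalized to [0, 1]):

```python
def encode(value, input_bits=256, w=41):
    positions = input_bits - w + 1
    start = int(round(value * (positions - 1)))
    return set(range(start, start + w))

def overlap(a, b):
    return len(a & b)

# One bucket apart: nearly identical SDRs
near = overlap(encode(0.300), encode(0.302))
print(near)   # 40 of 41 bits shared

# A 5% load jump (~11 buckets): clearly different, but still related
far = overlap(encode(0.30), encode(0.35))
print(far)    # 30 of 41 bits shared
```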
Stage 3

The Spatial Pooler

The encoder outputs 256 bits (41 are ON, 215 are OFF). Now we need to figure out which columns should respond to this pattern. The Spatial Pooler does this through a competition.

Step 1: Wiring — Each Column Gets Random Connections

At startup, each of the 256 columns gets randomly wired to about half of the 256 input bits (potentialPct: 0.5 = ~128 connections per column). This wiring is random and permanent — column 42 might connect to bits [3, 7, 15, 22, 41, 55, 89, ...] while column 200 connects to a completely different set.

Each connection has a permanence value — a number between 0 and 1 representing how strong that connection is. Connections above 0.25 (synPermConnected) are "connected" and actually count. Below that, they exist but don't contribute.

Random Wiring — each column connects to different input bits

Step 2: Overlap — Counting Matches

When the encoder lights up 41 bits, each column counts: "How many of my connected wires touch a lit-up bit?" This count is called the overlap score.

Say column A has 128 wires, and 14 of them happen to touch the 41 active bits. Column A's overlap = 14. Column C might only have 6 wires touching active bits. Column A has a stronger response to this input.

Step 3: Competition — Top 10 Win

All 256 columns compare their overlap scores. The top 10 (numActiveColumnsPerInhArea: 10) win. The other 246 are suppressed. This winner-takes-all competition is called inhibition — the strong columns "inhibit" the weaker ones.

With globalInhibition: true, every column competes against every other column. (The alternative, local inhibition, only competes against neighbors — but since our column positions are meaningless, global makes sense.)

Step 4: Why These Specific 10?

The 10 winners are the columns whose random wiring happens to overlap the most with the current 41 active bits. Because the wiring is random and initialized from randomSeed: 42, the same encoder pattern will always activate the same 10 columns (at least initially, before learning changes things).

Different encoder patterns (different load values) light up different bits, which means different columns win. That's how the SP maps input values to column representations. Try it — drag the slider to change load and watch which columns win:

SP Column Mapping — drag to change load 30.0%

Notice how nearby values (30.0% vs 30.2%) produce the same columns, while distant values (30% vs 50%) produce mostly different columns. This is the SP translating encoder overlap into column stability.

Spatial Pooler — 10 columns compete by overlap score
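Steps 1 through 3 fit in a short sketch: random wiring, overlap counting, and global top-10 inhibition. This is illustrative only; a real SP also handles boosting, duty cycles, and tie-breaking:

```python
import random

INPUT_BITS, NUM_COLUMNS, N_ACTIVE = 256, 256, 10
POTENTIAL_PCT, SYN_PERM_CONNECTED = 0.5, 0.25

rng = random.Random(42)   # randomSeed: 42, so the wiring is the same every run

# Step 1: each column is permanently wired to a random half of the input bits
potential = [rng.sample(range(INPUT_BITS), int(INPUT_BITS * POTENTIAL_PCT))
             for _ in range(NUM_COLUMNS)]
permanences = [{bit: rng.uniform(0.1, 0.4) for bit in bits} for bits in potential]

def active_columns(input_sdr):
    # Step 2: overlap = number of *connected* synapses touching ON bits
    overlaps = [sum(1 for bit, perm in permanences[col].items()
                    if perm >= SYN_PERM_CONNECTED and bit in input_sdr)
                for col in range(NUM_COLUMNS)]
    # Step 3: global inhibition, the 10 highest-overlap columns win
    return set(sorted(range(NUM_COLUMNS),
                      key=lambda col: overlaps[col], reverse=True)[:N_ACTIVE])

winners = active_columns(set(range(65, 65 + 41)))
print(sorted(winners))   # the same 10 columns every run, thanks to the fixed seed
```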

Step 5: Learning — Strengthening the Winners

After the 10 winners are chosen, the SP adjusts their connection strengths. Only the winning columns learn — the 246 losers are untouched:

Click "Learn" below to watch a winning column's connections change over repeated cycles. Notice how connections to active bits (green) get stronger while connections to inactive bits (grey) decay. After enough cycles, the column becomes a specialist — strongly wired to its input region and weakly wired to everything else.

SP Learning — one column's permanences over cycles

Over many cycles of seeing the same input, the winning columns develop very strong connections to that input's bit pattern. They become "specialists" for that value range. Column 42 might become the go-to column for loads around 28-32% because its permanences for those encoder bits are near 1.0.
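Step 5 in miniature: only a winning column's permanences are touched, using the asymmetric increment/decrement from the config. A sketch of the update rule, not the full SP:

```python
SYN_PERM_ACTIVE_INC = 0.15     # strengthen synapses to active bits
SYN_PERM_INACTIVE_DEC = 0.005  # slowly decay synapses to inactive bits

def learn(column_perms, input_sdr):
    """Apply one learning cycle to a single winning column's permanences."""
    for bit in column_perms:
        if bit in input_sdr:
            column_perms[bit] = min(1.0, column_perms[bit] + SYN_PERM_ACTIVE_INC)
        else:
            column_perms[bit] = max(0.0, column_perms[bit] - SYN_PERM_INACTIVE_DEC)

perms = {3: 0.21, 9: 0.30}      # two synapses: one touching an active bit, one not
learn(perms, input_sdr={3})
print(perms)   # bit 3 jumps to ~0.36 (now connected); bit 9 drifts down to 0.295
```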

Why 10 columns and not 1?
If only 1 column represented each value, a tiny input change could activate a completely different column — the TM would see a totally new pattern and burst. With 10 columns, a small value change might swap 1-2 columns while the other 8-9 stay the same. The TM's predictions survive because most of the pattern is unchanged.

SP Output Sparsity — The 2% Rule

The SP output is also sparse: 10 active / 256 total = 3.9%. The Numenta research recommends ~2% sparsity for optimal memory capacity. Here's why sparsity matters at the SP level:

The Temporal Memory downstream needs to learn "when I see column pattern A, predict column pattern B next." If 200 of 256 columns were active (dense), almost every pair of timesteps would share most of the same columns — the TM couldn't tell one pattern from another. With only 10 active, different input values produce clearly distinct column patterns.

The tradeoff: fewer active columns = more distinct patterns but less overlap between adjacent values. More active columns = more overlap (better predictions through value changes) but less distinct patterns. Our choice of 10 balances both.

Over time, each column becomes a "specialist" for a particular input region. Column 42 might respond best to loads around 28-32%. Column 198 might respond best to 45-50%.

numColumns
256
Total columns available. Same as inputBits.
numActiveColumnsPerInhArea
10
How many columns win. More = more overlap between timesteps = better predictions.
potentialPct
0.5
Each column randomly connects to 50% of input bits.
synPermConnected
0.25
Connection strength threshold. Above = "connected" (counts). Below = "potential" (exists but doesn't count yet).
synPermActiveInc
0.15
How much to strengthen a connection each cycle.
synPermInactiveDec
0.005
How much to weaken an unused connection each cycle.
globalInhibition
true
All columns compete against each other (not just neighbors).
maxBoost
1.2
Underactive columns get a slight boost to stay competitive.
The SP never stops learning
Every cycle, the SP adjusts its connections — even if it's seen the same input 10,000 times. This can cause "drift" where columns slowly change for the same input. The anomaly likelihood layer compensates for this.
Stage 4

Temporal Memory

This is where learning happens. The Temporal Memory (TM) is the brain of the system. It asks: "Given what just happened, what will happen next?"

Each of the 256 columns has 16 cells stacked vertically. That's 4,096 cells total. Each cell can store up to 64 prediction rules called segments. Each segment has connections called synapses that link to other cells.

The Structure: Columns, Cells, Segments, Synapses

Think of it as a building. Each column is a floor in the building. Each floor has 16 rooms (cells). Each room has a whiteboard with up to 64 notes (segments). Each note says something like: "If rooms 42, 89, and 201 were active last timestep, predict that I'll be active next."

Anatomy of the Temporal Memory

Step 1: The SP Activates Columns

The SP picks 10 active columns. Now the TM needs to decide: for each active column, which of the 16 cells should fire?

Step 2: Check Predictions

For each active column, the TM asks: "Did any cell in this column predict this would happen?"

A cell "predicts" if it has a segment (a rule) where enough synapses connect to cells that were active in the previous timestep. "Enough" means at least activationThreshold: 4 synapses match.
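The prediction check itself is simple set arithmetic. In this sketch a segment is just the set of cells it synapses onto (assuming all its synapses are connected):

```python
ACTIVATION_THRESHOLD = 4

def predictive_cells(segments, prev_active_cells):
    """segments maps cell -> list of segments; each segment is a set of
    presynaptic cells. A cell predicts if any segment has >= 4 matches."""
    return {cell for cell, segs in segments.items()
            if any(len(seg & prev_active_cells) >= ACTIVATION_THRESHOLD
                   for seg in segs)}

# Cell 42 learned the rule: "if cells 1, 2, 3, 4 were active, predict me"
segments = {42: [{1, 2, 3, 4}], 99: [{7, 8}]}
print(predictive_cells(segments, {1, 2, 3, 4, 5}))   # {42}
print(predictive_cells(segments, {1, 2}))            # set()
```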

Predicted vs Bursting — what happens in each column

Two outcomes: if some cell in the column predicted this activation, only that cell fires (the column was predicted). If no cell predicted it, all 16 cells fire at once: the column bursts, which is the TM's way of signaling surprise.

Step 3: Learning — Building New Rules

Learning happens in two cases:

Case A: Column bursted (wasn't predicted)

The TM picks a winner cell (the one with the fewest existing segments) and creates a brand new segment on it. This segment grows synapses that connect to the previous timestep's winner cells.

It's saying: "The cells that were active before this happened? Remember them. Next time I see them, predict me."

Learning after a burst — new segment created

Case B: Column was correctly predicted

The segment that successfully predicted gets reinforced. Synapses to cells that were active get stronger (+0.10). Synapses to cells that were NOT active get weaker (-0.05). The rule becomes more confident.

Reinforcement — strengthening correct predictions

Step 4: Punishment — Wrong Predictions Weaken

Sometimes the TM predicts a column will be active, but the SP doesn't select it. How? The TM and SP have different jobs:

Column 100 was predicted but the SP picked column 103 instead — the input value changed just enough that a different column won the competition. The TM's prediction was wrong because the value moved.

When this happens, the segments that incorrectly predicted column 100 get gently weakened by predictedSegmentDecrement: 0.001. This is very slow — takes hundreds of wrong predictions to destroy a connection. The system forgets slowly, giving those segments a chance to be right next time the value returns to that range.

Step 5: Permanence — The Strength of a Connection

Every synapse has a permanence — a number between 0 and 1 that represents how strong the connection is.

Permanence lifecycle of a synapse

The asymmetry matters: building is 2x faster than tearing down. The network biases toward retaining learned connections.
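The three permanence updates (reinforcement, decrement, and slow punishment) can be sketched with the config values above. Here a segment is a dict mapping presynaptic cell to permanence:

```python
PERM_INC = 0.10              # correct prediction: strengthen active synapses
PERM_DEC = 0.05              # correct prediction: weaken inactive synapses
PREDICTED_SEG_DEC = 0.001    # wrong prediction: very slow background decay

def reinforce(segment, prev_active_cells):
    """The segment predicted correctly; make its rule more confident."""
    for cell in segment:
        delta = PERM_INC if cell in prev_active_cells else -PERM_DEC
        segment[cell] = min(1.0, max(0.0, segment[cell] + delta))

def punish(segment):
    """The segment predicted, but its column never activated."""
    for cell in segment:
        segment[cell] = max(0.0, segment[cell] - PREDICTED_SEG_DEC)

seg = {7: 0.35, 8: 0.35}
reinforce(seg, prev_active_cells={7})
print(seg)   # cell 7 rises to ~0.45, cell 8 falls to ~0.30: building is 2x faster
```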

The Complete Prediction Cycle

Putting it all together, here's what happens every 5 seconds when a new sensor reading arrives:

  1. Encoder converts value to 256-bit SDR (41 bits ON)
  2. SP picks 10 winning columns
  3. TM checks: were any cells in these columns predicting?
  4. Predicted columns → only predicted cell activates
  5. Unpredicted columns → all 16 cells burst, one winner learns
  6. Winner cells grow synapses to previous timestep's winners
  7. Correctly predicted segments get reinforced
  8. Wrong predictions get slightly weakened
  9. Anomaly score = (bursting columns) / (total active columns)
  10. The predictions made NOW become the "previous predictions" for the NEXT timestep
cellsPerColumn
16
Cells stacked per column. More = more sequence contexts the value can participate in.
activationThreshold
4
Synapses needed for a segment to activate (trigger prediction). Must be less than active columns (10).
minThreshold
3
Synapses needed for a segment to "almost match" (eligible for learning).
maxNewSynapseCount
20
Max synapses grown per learning cycle.
maxSegmentsPerCell
64
Max prediction rules per cell. More = remembers more patterns before forgetting old ones.
maxSynapsesPerSegment
32
Max connections per rule.
initialPermanence
0.35
New synapses start above threshold (0.25) — immediately connected. Survives 2 wrong predictions.
permanenceIncrement
0.10
Strengthen correct predictions aggressively.
permanenceDecrement
0.05
Weaken wrong predictions gently. Half the increment = bias toward keeping connections.
learningRadius
-1
Unlimited. Any cell can connect to any other cell. (-1 = no distance restriction.)
predictedSegmentDecrement
0.001
Very slow background decay on wrong predictions. Takes hundreds of cycles to weaken a connection.
skipLearningAfterStableCount
50
After 50 identical inputs in a row (~4 min), stop learning. Prevents wasting segment slots on idle patterns.
The critical rule
activationThreshold must be less than numActiveColumnsPerInhArea. A segment can only grow synapses to previous winner cells (one per active column). If the threshold exceeds the number of active columns, segments can NEVER activate and the TM can NEVER predict. We learned this the hard way.
Stage 5

Anomaly Score (Prediction Error)

After the TM processes the input, we compare what was predicted with what actually happened. The formal equation from the Numenta research paper:

sₜ = 1 − (π(xₜ₋₁) · a(xₜ)) / |a(xₜ)|

In plain English: take the prediction from the previous timestep (π), compare it with the actual encoding this timestep (a), and measure how much they overlap. The dot product counts matching bits, divided by total active bits.
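In code, the score is just the fraction of active columns that were not predicted (the bursting fraction from the prediction cycle):

```python
def anomaly_score(predicted_columns, active_columns):
    """1 - |predicted ∩ active| / |active|: 0 when fully predicted, 1 on total surprise."""
    if not active_columns:
        return 0.0
    hits = len(predicted_columns & active_columns)
    return 1.0 - hits / len(active_columns)

active = set(range(10))
print(anomaly_score(set(range(10)), active))   # 0.0: perfect prediction
print(anomaly_score(set(range(5)), active))    # 0.5: half the columns burst
print(anomaly_score(set(), active))            # 1.0: total surprise
```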

Two Types of Anomalies

Not all anomalies are the same. The Numenta research identifies two distinct types:

Spatial anomalies — a value that's unusual on its own, regardless of context. Like axis load suddenly spiking to 90% when it's normally around 30%. Any detection method can catch these, even a simple threshold.

Temporal anomalies — a value that's normal on its own but unusual in this sequence. Like load at 30% during what should be an active cutting cycle (where you'd expect 35-50%). The value isn't extreme — the timing is wrong. Only sequence-learning methods like HTM can catch these.

This is HTM's superpower. A simple threshold detector scores 0.00 on temporal anomalies in the Numenta benchmark. HTM scores 0.40. It detects subtle pattern changes — like a bearing vibration that slowly shifts days before failure — not just obvious spikes.

Why raw anomaly score isn't enough
In noisy systems (like a CNC machine with mechanical vibration), the raw score spikes frequently during normal operation. A tiny input change can cause one SP column to swap, producing a score of 0.1 even though nothing is actually wrong. If we alerted on every spike, we'd drown in false alarms. That's why we need one more stage...
anomalyThreshold
0.75
Raw score threshold (only used if anomaly likelihood is disabled). Not recommended — too many false positives on noisy data.
anomalyLearningPeriod
500
First 500 timesteps (~42 min): ramp up scores gradually instead of slamming to 1.0. Smooths cold start.
Stage 6

Anomaly Likelihood

This is the most important stage. The Numenta research showed that using anomaly likelihood instead of raw scores improves detection accuracy by 16.5 points on their benchmark (53.6 → 70.1). It's the difference between a noisy mess and a useful alert system.

The Washing Machine Analogy

Think of your washing machine. It vibrates during every cycle — that's normal. You wouldn't worry about the vibration itself; you'd worry if it suddenly became much stronger than usual, or showed up when the machine should be quiet.

The likelihood doesn't ask "is this value weird?" — it asks "is the prediction error weird?" This is a crucial distinction.

The Deep Insight: Modeling Error, Not Data

Here's the key idea from the Numenta paper that makes the whole system work:

The anomaly likelihood models the distribution of ERRORS, not the distribution of data values
It doesn't care what the load value is. It doesn't know if 30% is normal or abnormal. It only knows: "the prediction error is usually around 0.08 with a standard deviation of 0.03. Right now the error is 0.45. That's 12 standard deviations above normal. Something is very wrong."

This is why the system works across completely different machines without tuning. The error distribution adapts automatically to whatever signal is being monitored.

The Math (from the Numenta paper)

The likelihood calculation has four steps:

Step 1: Rolling baseline — track the mean and variance of raw scores over a window of W samples:

μₜ = Σ sₜ₋ᵢ / W     (mean of the last W scores, i = 0 … W−1)

σ²ₜ = Σ (sₜ₋ᵢ − μₜ)² / (W − 1)     (variance over the same window)

Step 2: Short-term average — smooth the last W' raw scores to filter single-timestep spikes:

μ̃ₜ = Σ sₜ₋ᵢ / W′     (average of the last W′ scores, i = 0 … W′−1)

Step 3: How unusual is the current error? — compute the z-score (how many standard deviations above the mean):

z = (μ̃ₜ − μₜ) / σₜ

Step 4: Convert to probability — use the Gaussian Q-function (tail probability):

Lₜ = 1 − Q(z)     (anomaly likelihood)

Lₜ close to 0 = the current error is within normal range. Lₜ close to 1 = the current error is extremely unusual. We flag an anomaly when Lₜ ≥ 1 − ε.
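The four steps translate directly into code. This sketch shrinks the window to 200 so the behavior is easy to see; the production system uses W = 20,000 and would update the statistics incrementally rather than recomputing sums:

```python
import math
from collections import deque

class AnomalyLikelihood:
    """Model the distribution of raw anomaly scores, not the data itself."""
    def __init__(self, window=200, short_window=10):   # W and W' (W' << W)
        self.scores = deque(maxlen=window)     # Step 1: rolling baseline window
        self.recent = deque(maxlen=short_window)

    def update(self, raw_score):
        self.scores.append(raw_score)
        self.recent.append(raw_score)
        n = len(self.scores)
        mu = sum(self.scores) / n                             # Step 1: mean
        var = sum((s - mu) ** 2 for s in self.scores) / max(n - 1, 1)
        sigma = max(math.sqrt(var), 1e-6)                     # guard flat data
        mu_short = sum(self.recent) / len(self.recent)        # Step 2
        z = (mu_short - mu) / sigma                           # Step 3: z-score
        q = 0.5 * math.erfc(z / math.sqrt(2))                 # Step 4: Q(z)
        return 1.0 - q                                        # L_t

al = AnomalyLikelihood()
for i in range(100):                 # normal operation: noisy but stable errors
    normal = al.update(0.08 if i % 2 else 0.12)
for _ in range(10):                  # sustained spike in prediction error
    spiked = al.update(0.9)
print(round(normal, 2), spiked > 0.99)   # ~0.5 during normal, near 1.0 after
```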

Anomaly Likelihood — interactive calculator
How to use this demo
1. See normal operation: Click + Normal several times. The blue bars stay low, the baseline (green dashed line) stays around 0.10, and the result says "NORMAL."

2. Simulate an anomaly: Drag the slider to 0.5 or higher, then click + Spike. Watch the yellow bar appear above the baseline, the z-score jump, the tail area on the Gaussian turn red, and the result flip to "ANOMALY DETECTED."

3. See concept drift: Click + Drift (10). Ten gradually increasing scores are added. The baseline shifts upward as the window fills with higher scores — after enough drift, what was anomalous becomes the new normal.

4. Reset and try again: Click Reset to start fresh with a clean baseline.

Why This Works on Noisy CNC Data

A CNC machine's load signal is inherently noisy during cutting. The raw anomaly score might sit around 0.1-0.2 as the load oscillates, and the likelihood calculator learns exactly that as its baseline.

The same alert threshold works whether a signal is quiet or noisy, because the likelihood adapts to whatever the normal noise level is.

Concept Drift: Adapting to Change

When the machine switches to a new part program, the load profile changes. Raw scores spike as the TM encounters new patterns. But within a few minutes, the TM learns the new patterns and scores drop. The likelihood window eventually fills with the new baseline, and the system is recalibrated — no manual intervention needed.

This is called concept drift — the underlying statistics change over time. HTM handles it automatically because:

  1. The TM continuously learns new patterns (never stops adapting)
  2. The likelihood window is rolling (old scores drop out, new ones enter)
  3. After the window fills with new-baseline scores, the old baseline is forgotten
enableAnomalyLikelihood
true
Must be on. Without it, every tiny score spike triggers an alert. The Numenta benchmark shows a 16.5-point improvement with likelihood enabled.
anomalyWindowSize (W)
20,000
Rolling window for baseline statistics. At 5s intervals = ~28 hours. The Numenta paper used 8,000 and noted it's "not sensitive to W as long as large enough." Larger = more stable baseline, slower to adapt after concept drift.
anomalyShortTermWindowSize (W')
10
Smooths the last 10 raw scores before comparing to baseline. A single spike won't trigger — the spike must persist for ~50 seconds. Must be much smaller than W (W′ ≪ W).
anomalyLikelihoodThreshold (ε)
0.99999
Flag when Lₜ ≥ 1 − ε = 0.99999. The Numenta paper used ε = 10⁻⁵ (the same value!) and found it "works well across a large range of domains" without per-application tuning.
This is the real output
Everything before this — the encoder, SP, TM, raw score — is internal machinery. The anomaly likelihood is what gets shown on dashboards, what triggers alerts, what operators see. If the likelihood is working, the system is working, even if internal metrics look messy. The Numenta paper's best-performing detector used these exact formulas.
Multi-Signal

Multi-Encoder: Monitoring Signals Together

Everything so far describes monitoring one signal — like XM_load by itself. But a CNC machine has dozens of signals, and some anomalies only show up when you look at combinations.

Why Monitor Signals Together?

Imagine X-axis load is at 35% and spindle speed is at 6000 RPM. Both are individually normal. But if you've never seen 35% load at 6000 RPM before (usually it's 35% at 3000 RPM), something might be wrong — wrong tool, wrong feed rate, or a programming error.

An individual HTM watching load would say "35% is normal." An individual HTM watching spindle speed would say "6000 RPM is normal." But a multi-encoder HTM watching both together would say "this combination is new — anomaly."

How It Works: Concatenated SDRs

A multi-encoder takes 2, 3, or 4 signals and concatenates their individual encodings into one long SDR:

Multi-Encoder: xyz-load (3 signals concatenated)

Each member signal gets its own encoder (same w=41, same settings as the individual encoders). The outputs are placed side by side, so three 256-bit encodings become one 768-bit SDR.

This combined SDR feeds into its own SP and TM — completely separate from the individual signal HTMs. The SP has 768 columns and picks 15 winners. The TM learns temporal patterns across all three signals simultaneously.
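Concatenation is just a per-member bit offset. Using the same sketch encoder as before (the three load values are made up for illustration):

```python
def encode(value, input_bits=256, w=41):
    positions = input_bits - w + 1
    start = int(round(value * (positions - 1)))
    return set(range(start, start + w))

def multi_encode(values, input_bits=256, w=41):
    """Concatenate member encodings: member i owns bits [i*256, (i+1)*256)."""
    combined = set()
    for i, value in enumerate(values):
        combined |= {i * input_bits + bit for bit in encode(value, input_bits, w)}
    return combined

sdr = multi_encode([0.30, 0.45, 0.12])   # hypothetical XM, YM, ZM loads
print(len(sdr), max(sdr) < 768)          # 123 bits ON across a 768-bit SDR
```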

Individual vs Multi-Encoder: What Each Catches

Individual Encoder

Monitors: one signal in isolation

Catches: "XM_load spiked to 60%"

Misses: unusual combinations of normal values

SP columns: 256, 10 active

Faster learning: only 1 signal changing at a time

Multi-Encoder

Monitors: 2-4 signals together

Catches: "XM_load + YM_load + ZM_load are in an unusual combination"

Misses: nothing extra (has all the info individual has, plus correlations)

SP columns: 512-1024, 10-20 active

Slower learning: multiple signals change simultaneously = more column churn

The Tradeoff: More Context, More Noise

Multi-encoders are more powerful in theory — they see correlations between signals that individual encoders miss. But they're harder to learn because multiple signals change between timesteps.

With an individual encoder, the load might change by 1 encoder bucket between readings. The SP swaps maybe 1 of 10 columns. Easy for the TM to predict.

With a 3-signal multi-encoder, each signal might change by 1 bucket = 3 total changes across the concatenated SDR. The SP might swap 3-4 of 15 columns. Harder for the TM to predict, even when the machine is operating normally.

This is why we run both: individual encoders for clean per-signal anomaly detection, and multi-encoders for catching unusual signal combinations. The individual encoders are the workhorses; the multi-encoders are the specialists.

Our Multi-Encoder Groups

The system runs 23 multi-encoder groups simultaneously, monitoring different signal combinations:

xyz-load: XM + YM + ZM load (force balance across axes)
xyz-temperature: XM + YM + ZM temp (thermal balance)
xyz-power: XM + YM + ZM power (power distribution)
x-motor-health: XM load + XM temp + XM feedrate (per-axis health)
spindle-health: spindle load + spindle speed
complete-force: XM + YM + ZM load + spindle load (full force picture)
x-load-execution: XM load + execution state (behavioral context)
...and 16 more combinations
Multi-encoder parameter cascade
Multi-encoder inputBits must equal the sum of member encoder bits. If you change a scalar encoder from 256 to 128 bits, every multi-encoder containing that signal must be updated too. With 23 groups referencing 15 signals, one change can cascade to 10+ config files.
Processing cost
Each signal update triggers its individual encoder PLUS every multi-encoder group it belongs to. XM_load participates in 8 groups — so one load reading triggers 9 HTM instances (1 individual + 8 groups). With 20 signals, that's ~40 HTM calls per 5-second cycle. The SP takes 10-100ms per call, so processing time adds up. Monitor queue depth to ensure the system keeps up with real-time data.
Full Picture

What Happens When a Machine Cuts Metal

1. Machine is Idle

Load sits at 30%. The encoder produces the same SDR every 5 seconds. The SP picks the same 10 columns. The TM predicts perfectly — score 0.0. After 50 identical readings, learning stops to preserve memory.

2. Cutting Starts

Load jumps to 35%. The encoder slides ~10 buckets. Some SP columns change. The TM's predictions don't fully match — a few columns burst. Score spikes to 0.2-0.3 briefly. The TM builds new segments connecting the new pattern to the old. Within seconds, it's predicting again.

3. Normal Cutting Oscillation

Load varies between 28-35% as the tool moves. With w=41, these values share most of the same SDR bits. The SP activates mostly the same columns. The TM has learned "load oscillates in this range" and predicts correctly. Score stays near 0.

4. Something Goes Wrong

A tool starts wearing out. Load creeps from 35% to 50% over an hour. Each new high value crosses encoder buckets the TM hasn't seen. Scores spike. The likelihood calculator sees scores well above the normal baseline. Alert fires.

Or: a bearing seizes. Load drops to 0% instantly. The TM predicted "35% next" and got 0%. Score = 1.0. Likelihood spikes immediately. Alert fires.

5. New Part Program

The machine switches to a different part. Load patterns are completely different — 45% instead of 30%. The TM bursts on the new columns, learns the new pattern over a few minutes. Old patterns survive in dormant segments (maxSegmentsPerCell: 64). When the old part returns, the TM may recognize it instantly.

The key insight
HTM doesn't detect "load is too high" (that's a threshold). It detects "this pattern has never happened before." A load of 90% during heavy cutting might be normal. A load of 40% during what should be idle is anomalous. The system learns context, not thresholds.
Summary

The Complete Config, Explained

Every parameter in the config file serves one of four purposes. Click any card to highlight its connections. Red warnings = hard constraints.
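Pulling the cards together, a single signal's HTM config has roughly this shape. This is a hypothetical JSON layout for illustration: the key names match the cards below, but the actual file structure in the project may differ.

```json
{
  "encoder": { "inputBits": 256, "w": 41, "minVal": 0, "maxVal": 1, "clipInput": true },
  "spatialPooler": {
    "numColumns": 256, "numActiveColumnsPerInhArea": 10,
    "potentialPct": 0.5, "synPermConnected": 0.25,
    "synPermActiveInc": 0.15, "synPermInactiveDec": 0.005,
    "globalInhibition": true, "maxBoost": 1.2
  },
  "temporalMemory": {
    "cellsPerColumn": 16, "activationThreshold": 4, "minThreshold": 3,
    "maxNewSynapseCount": 20, "maxSegmentsPerCell": 64, "maxSynapsesPerSegment": 32,
    "initialPermanence": 0.35, "permanenceIncrement": 0.10, "permanenceDecrement": 0.05,
    "predictedSegmentDecrement": 0.001, "learningRadius": -1
  },
  "anomaly": {
    "enableAnomalyLikelihood": true, "anomalyWindowSize": 20000,
    "anomalyShortTermWindowSize": 10, "anomalyLikelihoodThreshold": 0.99999
  }
}
```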

Encoder — How precisely to measure

inputBits256
The total number of bits in the encoder's output. This is the foundation that everything else builds on. Reducing it makes the system less sensitive to small changes (good for noisy CNC signals) but also reduces the total number of distinct patterns the encoder can represent. When you change this, numColumns must change to match, and every multi-encoder that includes this signal must update its inputBits too.
Cascades to all multi-encoder inputBits (sum of members)
w41
How many bits are ON in every encoding. This is the single most important parameter for controlling noise sensitivity. With w=41, two values that are one bucket apart share 40 of 41 bits (97% overlap). The Spatial Pooler sees nearly identical input and picks the same columns. This makes the system tolerant to normal load jitter during cutting. Must be odd.
Sparsity: 41/256 = 16% • 216 bucket positions • 0.47% load per bucket
minVal / maxVal0 — 1
The range of values the encoder can represent. Set to 0-1 because mtc2anomaly normalizes all raw sensor values into this range before sending them. With clipInput=true, any value outside this range gets clamped to the boundary instead of causing an error. Tighter ranges increase resolution within that region but clip everything outside.
Normalization ranges are configured per-signal in mtc2anomaly appsettings.json

Spatial Pooler — How columns are selected

numColumns256
The total number of minicolumns in the Spatial Pooler. Each column competes to represent the current input. Out of these 256, only 10 win each cycle. This determines the total capacity of the system: 256 columns × 16 cells = 4,096 cells that can store learned patterns.
MUST equal inputBits
numActiveColumnsPerInhArea10
How many columns win the competition each cycle. With 10 active out of 256, the SP output is 3.9% sparse. More active columns means more overlap between consecutive timesteps (small value changes swap fewer columns), which makes TM predictions more robust. But too many active columns reduces the distinctness between different input patterns.
activationThreshold MUST be less than this, or the TM can never predict
SP output sparsity: 10/256 = 3.9%
potentialPct0.5
The fraction of input bits each column is randomly wired to at startup. At 0.5, each column connects to ~128 of the 256 input bits. Higher values mean columns respond to broader input regions. Lower values mean more specialized columns but potentially less stable representations.
synPermConnected0.25
The permanence threshold that determines whether a synapse is "connected" and counts toward a column's overlap score. Synapses with permanence above 0.25 contribute to the competition; below 0.25 they exist but don't help the column win. This threshold also applies to the TM's distal synapses when determining if a segment activates.
initialPermanence MUST be > this, or new TM synapses start disconnected
synPermActiveInc0.15
How much the SP strengthens connections to active input bits each cycle. At 0.15 per cycle, a synapse goes from initial (0.21) to well-connected (0.36) in just one cycle. The ratio with synPermInactiveDec (30:1) means the SP builds connections much faster than it tears them down, creating stable column specialists over time.
synPermInactiveDec0.005
How much the SP weakens connections to inactive input bits each cycle. At 0.005, it takes 50 cycles for a synapse to drop by 0.25 (from connected to disconnected). This slow decay means columns retain their learned specializations even through periods of different input. Too high and columns forget; too low and columns can never change.
maxBoost: 1.2
Columns that rarely win get a boost multiplier to stay competitive. At 1.2, underactive columns get up to 20% more overlap credit. This prevents "dead columns" that never participate. Setting to 1.0 disables boosting entirely. Recalculated every dutyCyclePeriod (600) cycles.
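One simple way to realize this is a linear ramp from maxBoost (for a column that never fires) down to 1.0 (for a column at its target duty cycle). The exact curve the SP uses may differ; this sketch just shows the shape of the idea:

```python
MAX_BOOST = 1.2

def boost_factor(duty_cycle, target_duty_cycle, max_boost=MAX_BOOST):
    # Columns firing at or above the target duty cycle get no boost;
    # below it, the boost interpolates linearly up to max_boost.
    if duty_cycle >= target_duty_cycle:
        return 1.0
    return max_boost - (max_boost - 1.0) * (duty_cycle / target_duty_cycle)

print(boost_factor(0.0, 0.039))    # 1.2  (dead column, full boost)
print(boost_factor(0.039, 0.039))  # 1.0  (healthy column, no boost)
```

With maxBoost at 1.0 the function returns 1.0 everywhere, which is exactly the "boosting disabled" case mentioned above.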

Temporal Memory — How it learns and remembers

cellsPerColumn: 16
How many cells are stacked in each column. Each cell can represent the same input value in a different temporal context. Cell 3 in column 42 might mean "30% load after idle" while cell 7 means "30% load during roughing." More cells = more contexts = the TM can distinguish more situations where the same value means different things.
Total cells: 4,096 • Max segments: 262,144 • Max synapses: 8.4M
activationThreshold: 4
The number of connected synapses needed for a segment (prediction rule) to activate and trigger a prediction. This is the most critical constraint in the system: it MUST be less than numActiveColumnsPerInhArea. With 10 active columns, segments can have at most 10 synapses. Setting threshold to 4 means a segment only needs 4 of those 10 to match — giving it tolerance for columns that change between timesteps.
MUST be < numActiveColumnsPerInhArea (10). Current gap: 6
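The activation check itself is just a count and a comparison. A sketch with invented cell indices (real segments store permanence-weighted synapses; here every synapse is assumed already connected):

```python
ACTIVATION_THRESHOLD = 4

def segment_predicts(synapse_sources, active_cells, threshold=ACTIVATION_THRESHOLD):
    # A segment fires (putting its cell into the predictive state) when
    # at least `threshold` of its connected synapses point at cells that
    # are active right now.
    matches = sum(1 for cell in synapse_sources if cell in active_cells)
    return matches >= threshold

segment = [3, 17, 42, 99, 201, 230]       # cells this segment listens to
active  = {3, 17, 42, 99, 150, 180}       # cells active this timestep
print(segment_predicts(segment, active))  # True: 4 of 6 synapses match
```

Two of the segment's cells (201, 230) are silent, yet the prediction still fires — that slack is the tolerance the text describes.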
minThreshold: 3
The minimum number of potential (not necessarily connected) synapses needed for a segment to be considered a "near miss." Segments above this threshold but below activationThreshold are candidates for learning — the TM will reinforce them so they can eventually cross the activation threshold and start predicting.
initialPermanence: 0.35
The permanence value assigned to newly created synapses. At 0.35, new synapses start well above the connected threshold (0.25), meaning they work immediately without needing to be reinforced first. The gap of 0.10 between initial permanence and the threshold means a synapse can survive 2 wrong predictions (losing 0.05 each time) before dropping below the threshold and disconnecting.
MUST be > synPermConnected (0.25)
Margin: 0.10 • Survives 2 wrong predictions before disconnecting
permanenceIncrement: 0.10
How much a synapse is strengthened when it correctly contributes to a prediction. At +0.10 per correct prediction, a synapse at 0.35 reaches 0.45 after one success. The ratio with permanenceDecrement (2:1) means the network builds connections twice as fast as it tears them down, creating a bias toward retaining learned patterns. This helps the system remember multiple cutting programs.
After 5 correct predictions: 0.35 → 0.85 (very strong)
permanenceDecrement: 0.05
How much a synapse weakens when its source cell was not active during a prediction. At -0.05 per wrong prediction, a synapse starting at 0.35 crosses below the connected threshold (0.25) after just 2 consecutive errors. This is intentionally half the increment rate (0.05 vs 0.10) so the system forgets more slowly than it learns — old patterns survive longer in dormant segments.
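The asymmetric update can be traced numerically. A sketch of the Hebbian-style rule described above, starting both synapses at the configured initialPermanence of 0.35:

```python
PERMANENCE_INCREMENT = 0.10
PERMANENCE_DECREMENT = 0.05

def reinforce_segment(permanences, source_was_active):
    # After a prediction, synapses from cells that were active get
    # stronger; synapses from silent cells get weaker, at half the rate.
    return [p + PERMANENCE_INCREMENT if active else p - PERMANENCE_DECREMENT
            for p, active in zip(permanences, source_was_active)]

perms = [0.35, 0.35]
perms = reinforce_segment(perms, [False, True])
perms = reinforce_segment(perms, [False, True])
# Two misses drop the first synapse to the 0.25 connected threshold,
# while two hits lift the second to 0.55.
print(perms)
```

Run it a few more cycles and the gap widens fast — which is why learned cutting programs stick while stale rules fade.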
maxSegmentsPerCell: 64
The maximum number of segments (prediction rules) each cell can hold. When a cell is full and a new pattern needs a segment, the least recently used segment gets evicted. At 64, each cell can remember 64 different temporal contexts — enough for multiple cutting programs to coexist. When an old program returns, its dormant segments may still be there, enabling instant recognition without re-learning.
Capacity: 4,096 cells × 64 segs × 32 syns = 8.4M max synapses
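Least-recently-used eviction can be sketched with a dict mapping each segment to the timestep it last matched. The segment names here are hypothetical:

```python
MAX_SEGMENTS_PER_CELL = 64

def add_segment(segments, new_segment, timestep, max_segments=MAX_SEGMENTS_PER_CELL):
    # segments: dict of segment -> timestep it last matched.
    # When the cell is full, evict the least recently used segment to
    # make room; dormant rules from old programs survive until then.
    if len(segments) >= max_segments:
        lru = min(segments, key=segments.get)
        del segments[lru]
    segments[new_segment] = timestep
    return segments

cell = {f"seg{i}": i for i in range(64)}   # hypothetical segments, seg0 oldest
add_segment(cell, "seg_new", timestep=100)
print("seg0" in cell, "seg_new" in cell)   # False True
```

Only the stalest rule pays the price; the other 63, including dormant ones from programs not run in weeks, stay intact.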
skipLearnAfterStableCount: 50
After this many consecutive identical encoder outputs, learning is suppressed for both the SP and TM. During long idle periods, the same value repeats thousands of times. Without this limit, the TM builds redundant segments on multiple cells, wasting capacity needed for cutting patterns. At 50 (~4 minutes of identical readings), the system learns the pattern then preserves it. Set to 0 for discrete signals like spindle speed or execution state that need continuous learning.
Disabled (0) for: spindle speed, category encoder, multi-encoders
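The gate amounts to a counter of consecutive identical encodings. A minimal sketch (the class name is invented; mtc2anomaly's internal structure may differ):

```python
SKIP_LEARN_AFTER_STABLE = 50  # 0 disables the check entirely

class StableInputGate:
    # Counts consecutive identical encoder outputs; once the count
    # reaches the limit, learning switches off until the input changes.
    def __init__(self, limit=SKIP_LEARN_AFTER_STABLE):
        self.limit, self.last, self.count = limit, None, 0

    def should_learn(self, encoding):
        if encoding == self.last:
            self.count += 1
        else:
            self.last, self.count = encoding, 0
        return self.limit == 0 or self.count < self.limit

gate = StableInputGate()
results = [gate.should_learn("idle-sdr") for _ in range(60)]
print(results.count(True))  # learning stays on for the first 50 repeats only
```

Any change in the encoding resets the counter, so learning resumes the moment the machine leaves idle.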

Sensitivity — When to alert

anomalyWindowSize (W): 20,000
The rolling window of raw anomaly scores used to compute the baseline mean and standard deviation. The likelihood calculator asks "is the current score unusual compared to this window?" At 20,000 samples with 5-second intervals, this covers ~28 hours. Larger windows produce more stable baselines but take longer to adapt after machine changes. The Numenta paper used 8,000 and noted the system is "not sensitive to W as long as large enough."
anomalyShortTermWindow (W′): 10
The short-term window that smooths recent raw scores before comparing to the baseline. Instead of reacting to a single timestep spike, the system averages the last 10 scores. A spike must persist for ~50 seconds to affect the short-term mean enough to trigger. This prevents false alerts from one-off prediction errors. Must be much smaller than the main window (10 vs 20,000).
W′ (10) ≪ W (20,000) — short-term vs long-term comparison
anomalyLikelihoodThreshold: 0.99999
The anomaly likelihood must exceed this threshold to trigger an alert. At 0.99999 (five 9s), the system only flags when it's 99.999% certain the current error level is unusual. This matches the Numenta research paper's recommendation of ε = 10⁻⁵, which they found "works well across a large range of domains" without per-application tuning. Very conservative — minimizes false positives at the cost of slightly delayed detection.
anomalyLearningPeriod: 500
During the first 500 timesteps after startup (~42 minutes), the raw anomaly score ramps up gradually from 0 instead of immediately reporting 1.0. This prevents a burst of false anomaly alerts during cold start when the TM has no learned patterns yet. After 500 timesteps, the score reflects the actual prediction quality. Only matters after a restart — irrelevant during normal operation.
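One plausible realization is a linear weight that scales the raw score up over the learning period; the exact ramp shape mtc2anomaly uses is an assumption here, but the effect is the same:

```python
ANOMALY_LEARNING_PERIOD = 500

def ramped_score(raw_score, timestep, period=ANOMALY_LEARNING_PERIOD):
    # During cold start the TM predicts nothing, so the raw score would
    # sit at 1.0. Scale it in gradually over the learning period instead.
    weight = min(1.0, timestep / period)
    return raw_score * weight

print(ramped_score(1.0, 0))    # 0.0 - startup, fully suppressed
print(ramped_score(1.0, 250))  # 0.5 - halfway through the ramp
print(ramped_score(1.0, 900))  # 1.0 - ramp done, score passes through
```

After timestep 500 the weight is pinned at 1.0, so during normal operation this stage is a no-op.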