2026.06.02 · fpga · architecture · 2 min read

Patterns are data, not gates

Why the Regex Accelerator keeps its rules in runtime-programmable tables instead of compiling them into logic - and what that trade actually costs.

VoskenAI · Jun 2, 2026

There are two ways to put a regular expression into hardware. You can compile the pattern into logic - every state a flop, every transition a LUT - or you can keep the pattern in memory and build an engine that walks it. The first is how most published regex accelerators work. We chose the second, and the reason is operational, not academic.

The hidden cost of hard-wired patterns

A hard-wired engine is fast and compact, right up until a rule changes. Then the change is not a configuration update; it is a synthesis run, a place-and-route run, a timing closure exercise, and a bitstream deployment. On a large FPGA that takes hours, with a real chance that the new rule set fails to close timing where the old one did.

For the workloads this engine targets - intrusion detection, deep packet inspection, content filtering - rule feeds update daily, sometimes hourly. An engine that needs re-synthesis per update is an engine that is permanently out of date.

Tables instead

The Regex Accelerator executes an NFA whose transition tables live in on-chip memory. Rule sets are compiled to tables offline - seconds, on a host CPU - and loaded into the engine over a control interface while the device stays in service. The bitstream never changes.
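To make "patterns are data" concrete, here is a minimal software sketch of a table-driven NFA walk. It is an illustration under assumptions, not the engine's actual table format: the `Tables` layout, the `compile_literal` helper (which handles only literal byte strings), and the rule ids are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Tables:
    """Hypothetical table layout; the real engine's format is not described here."""
    class_of: List[int]                             # byte value -> character-class id
    delta: Dict[Tuple[int, int], FrozenSet[int]]    # (state, class) -> next states
    accept: Dict[int, int]                          # accepting state -> rule id
    start: int


def compile_literal(pattern: bytes, rule_id: int) -> Tables:
    """Toy 'offline compiler': turns one literal byte string into tables."""
    class_of = [0] * 256                            # class 0 = any byte not in the pattern
    for cls, byte in enumerate(sorted(set(pattern)), start=1):
        class_of[byte] = cls
    delta = {(i, class_of[b]): frozenset({i + 1}) for i, b in enumerate(pattern)}
    return Tables(class_of, delta, {len(pattern): rule_id}, start=0)


def run(tables: Tables, data: bytes) -> List[Tuple[int, int]]:
    """Walk the tables over the input; return (end_offset, rule_id) per match."""
    matches = []
    active = {tables.start}
    for offset, byte in enumerate(data):
        cls = tables.class_of[byte]                 # one class lookup per input byte
        nxt = {tables.start}                        # re-arm start: matches may begin anywhere
        for state in active:                        # one table read per active state
            nxt |= tables.delta.get((state, cls), frozenset())
        for state in sorted(nxt):
            if state in tables.accept:
                matches.append((offset, tables.accept[state]))
        active = nxt
    return matches
```

The `run` function plays the role of the fixed engine and never changes. Calling it with `compile_literal(b"ab", 7)` and then with `compile_literal(b"ba", 9)` yields different matches on the same input, which is the software analogue of loading new tables without touching the bitstream.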

The capacity numbers, with their conditions:

Parameter	Value	Conditions
NFA states	262,144	default configuration, banked on-chip RAM
Rules	8,192	with per-rule metadata
Character classes	512	shared, table-driven
Clock	250 MHz	AWS F2 · Virtex UltraScale+ VU47P

The default table configuration occupies roughly 9% of the device’s URAM, which is the point: the cost of programmability is memory, and on a modern FPGA, memory is the resource you have to spare.
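A rough back-of-envelope for what 9% means in absolute terms. The block count used below (960 UltraRAM blocks of 288 Kb each for the VU47P) is an assumption to check against the device datasheet, and the per-state figure is only an average budget across everything stored in the tables, not a statement about the real entry layout.

```python
URAM_BLOCK_BITS = 288 * 1024    # one UltraRAM block holds 288 Kb
DEVICE_URAM_BLOCKS = 960        # ASSUMPTION: VU47P block count, verify in the datasheet
TABLE_FRACTION = 0.09           # "roughly 9%" from the text above
NFA_STATES = 262_144            # default configuration, from the capacity table


def table_budget(blocks: int = DEVICE_URAM_BLOCKS,
                 fraction: float = TABLE_FRACTION,
                 states: int = NFA_STATES) -> dict:
    """Translate a URAM fraction into blocks, MiB, and average bits per NFA state."""
    table_bits = blocks * URAM_BLOCK_BITS * fraction
    return {
        "uram_blocks": blocks * fraction,
        "mebibytes": table_bits / 8 / 2**20,
        "bits_per_state": table_bits / states,
    }


if __name__ == "__main__":
    for name, value in table_budget().items():
        print(f"{name}: {value:.1f}")
```

Under those assumptions the tables come to about 3 MiB, or just under 100 bits of storage per state on average, shared between transition entries, rule metadata, and character classes.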

What we gave up

Honesty requires the other column. A hard-wired engine of the same state count would use far fewer on-chip RAM blocks and could, pattern by pattern, clock higher. A table-driven engine pays a memory read per transition and lives within the bandwidth of its memory banks. Those are real costs, and for a fixed, rarely-changing pattern set, hard-wiring can be the right answer.
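The bandwidth limit can be pictured with a small model. This is a sketch under assumptions, not a description of the real memory system: it supposes that a state's table entry lives in bank `state % banks` and that each bank serves a fixed number of reads per cycle. The function name and parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable


def cycles_for_step(active_states: Iterable[int],
                    banks: int,
                    ports_per_bank: int = 1) -> int:
    """Cycles needed to fetch one table entry for every active state.

    ASSUMPTION: state s is stored in bank (s % banks); the real mapping
    is not given in the text.
    """
    reads_per_bank = Counter(state % banks for state in active_states)
    if not reads_per_bank:
        return 0
    worst = max(reads_per_bank.values())        # the busiest bank sets the pace
    return -(-worst // ports_per_bank)          # ceiling division
```

In this model, four active states spread over four banks cost one cycle, while three active states that all land in the same bank cost three. That is the kind of constraint a hard-wired engine never meets, since its transitions are wires rather than reads.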

But “the rules never change” is a claim almost no security workload can make. We optimized for the world where patterns are operational data - versioned, audited, deployed on their own schedule - and the silicon is the stable substrate underneath them.

That separation has a second-order benefit: the verification story splits cleanly in two. The engine is verified once, against the table semantics. A rule set is validated by construction in the compiler. Neither has to be re-proven when the other changes.
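One way to picture the split: the engine only ever relies on a small set of table invariants, so those invariants are the whole contract between the two halves. The sketch below checks such a contract after the fact. It is purely illustrative: the text says the real compiler guarantees validity by construction, and the table layout and function name here are hypothetical. Only the three limits come from the capacity table.

```python
from typing import Dict, FrozenSet, List, Tuple

MAX_STATES = 262_144    # limits taken from the capacity table
MAX_RULES = 8_192
MAX_CLASSES = 512


def check_tables(num_states: int,
                 class_of: List[int],
                 delta: Dict[Tuple[int, int], FrozenSet[int]],
                 accept: Dict[int, int]) -> List[str]:
    """Return a list of contract violations; an empty list means the tables fit."""
    problems = []
    if not 0 < num_states <= MAX_STATES:
        problems.append(f"state count {num_states} outside 1..{MAX_STATES}")
    if len(class_of) != 256:
        problems.append("class map must have one entry per byte value")
    if any(not 0 <= c < MAX_CLASSES for c in class_of):
        problems.append("class id out of range")
    if len(set(accept.values())) > MAX_RULES:
        problems.append("too many rules")
    for (state, cls), targets in delta.items():
        if not 0 <= state < num_states or not 0 <= cls < MAX_CLASSES:
            problems.append(f"bad table key ({state}, {cls})")
        for target in targets:
            if not 0 <= target < num_states:
                problems.append(f"transition ({state}, {cls}) -> {target} leaves the table")
    for state in accept:
        if not 0 <= state < num_states:
            problems.append(f"accepting state {state} out of range")
    return problems
```

Nothing in the check refers to any particular pattern, and nothing in it depends on how the engine is built. That is why, in principle, neither side has to be re-proven when the other changes.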
