| Architecture | Cycles | Time @ 150 Hz |
|---|---|---|
| Single Perceptron | 63 | 0.42 s |
| 2-Layer MLP (4→4→2) | 495 | 3.3 s |
| 3-Layer MLP (4→4→4→2) | 927 | 6.2 s |
| Simple RNN (8 steps) | 3,960 | 26.4 s |
| Tiny Transformer | ~270K | ~30 min |
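These wall-clock figures are just cycles divided by the clock rate. A minimal Python sketch that reproduces them (the helper name `wall_clock` is illustrative, not from the original project):

```python
# Convert the cycle counts above into wall-clock time at 150 Hz.
CLOCK_HZ = 150

def wall_clock(cycles, clock_hz=CLOCK_HZ):
    """Return a human-readable duration for `cycles` at `clock_hz`."""
    seconds = cycles / clock_hz
    if seconds < 60:
        return f"{seconds:.2f} s"
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    return f"{seconds / 3600:.1f} h"

for name, cycles in [
    ("Single Perceptron", 63),
    ("2-Layer MLP (4→4→2)", 495),
    ("3-Layer MLP (4→4→4→2)", 927),
    ("Simple RNN (8 steps)", 3_960),
    ("Tiny Transformer", 270_000),
]:
    print(f"{name}: {cycles} cycles -> {wall_clock(cycles)}")
```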
[Chart: Cycles vs Hidden Layer Width (2-Layer MLP), with the 1-second threshold at 150 Hz marked]

Cycle Formula (2-Layer)
C = n_in × n_hidden × 21 + n_hidden × n_out × 21 + overhead

[Chart: Cycles vs Network Depth (Width = 4), MLP with width 4, with the 1-second threshold marked]
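The 2-layer formula can be evaluated directly. The sketch below sweeps the hidden-layer width for a 4-input, 2-output network and reports where inference crosses the 1-second threshold at 150 Hz; the overhead term is not specified above, so it defaults to zero here.

```python
# Cycles for a 2-layer MLP, per the formula above:
#   C = n_in * n_hidden * 21 + n_hidden * n_out * 21 + overhead
# 21 is the per multiply-accumulate cycle cost; overhead is unknown,
# so it is treated as a configurable constant (0 by default).
CLOCK_HZ = 150

def mlp2_cycles(n_in, n_hidden, n_out, overhead=0, cycles_per_mac=21):
    return (n_in * n_hidden + n_hidden * n_out) * cycles_per_mac + overhead

prev_seconds = 0.0
for width in range(1, 17):
    cycles = mlp2_cycles(4, width, 2)
    seconds = cycles / CLOCK_HZ
    marker = "  <-- crosses the 1-second threshold" if prev_seconds <= 1.0 < seconds else ""
    print(f"width={width:2d}: {cycles:5d} cycles, {seconds:6.2f} s{marker}")
    prev_seconds = seconds
```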
⚡ Key Insight
Depth scaling is linear in cycles, but representational power can grow exponentially with depth: functions that a deep network computes compactly can require exponentially wider shallow networks.
This is why deep networks revolutionized AI, and also why they demand massive compute.
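To make the linear cycle growth concrete, here is a rough extension of the same cost model to deeper networks. The generalization is an assumption on my part: the original only gives the 2-layer formula, and the overhead term is dropped here, so the counts differ slightly from the 495 and 927 quoted above.

```python
# Approximate cycles for an MLP from its layer sizes, assuming ~21 cycles
# per connection (per the 2-layer formula) and ignoring overhead.
CYCLES_PER_MAC = 21
CLOCK_HZ = 150

def mlp_cycles(layer_sizes):
    connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return connections * CYCLES_PER_MAC

# Fixed width of 4 (as in the depth chart): 4 inputs, N hidden layers, 2 outputs.
for hidden_layers in range(1, 9):
    sizes = [4] + [4] * hidden_layers + [2]
    cycles = mlp_cycles(sizes)
    print(f"{hidden_layers} hidden layer(s): {cycles:5d} cycles "
          f"({cycles / CLOCK_HZ:5.1f} s @ 150 Hz)")
```

Each extra width-4 hidden layer adds the same 4 × 4 × 21 = 336 cycles, so the curve is a straight line.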
Architecture Comparison @ 150 Hz
| Architecture | Parameters | Cycles | Time @ 150 Hz |
|---|---|---|---|
| Perceptron | 5 | 63 | 0.42s |
| MLP 4→4→2 | 26 | 495 | 3.30s |
| MLP 4→8→8→2 | 114 | 1,791 | 11.9s |
| Simple RNN (seq=8) | ~50 | 3,960 | 26.4s |
| Tiny Attention | ~800 | ~18,000 | 2 min |
| Mini Transformer | ~5,000 | ~270,000 | 30 min |
Architecture Evolution & Feasibility
Perceptron (1958)
Single layer, linear classification
63 cycles • ✓ Real-time feasible
Multi-Layer Perceptron (1980s)
Hidden layers enable nonlinear boundaries
300-1000 cycles • ✓ Near real-time
Recurrent Neural Network (1980s)
Sequential processing, temporal memory
~4,000 cycles • ⚠ Tens of seconds per inference
Attention Mechanism (2014)
"Which inputs matter?" - dynamic weighting
~18K cycles • ⚠ Minutes per inference
Transformer (2017)
Self-attention + parallelization
~270K+ cycles • ✗ 30+ minutes per token
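The transformer cycle counts above are dominated by matrix multiplies, so a rough sanity check is just counting multiply-accumulates (MACs). The block dimensions below are assumptions, since the original does not spell out its Tiny/Mini Transformer configuration, and the ~21 cycles-per-MAC constant is carried over from the MLP formula; softmax, layer norm, and biases are ignored.

```python
# Rough MAC count for one single-head transformer block, converted to
# cycles at an assumed ~21 cycles per multiply-accumulate.
CYCLES_PER_MAC = 21
CLOCK_HZ = 150

def transformer_block_macs(seq_len, d_model, d_ff):
    qkv = 3 * seq_len * d_model * d_model     # Q, K, V projections
    scores = seq_len * seq_len * d_model      # Q @ K^T
    weighted = seq_len * seq_len * d_model    # softmax(scores) @ V
    out_proj = seq_len * d_model * d_model    # attention output projection
    ffn = 2 * seq_len * d_model * d_ff        # two feed-forward matmuls
    return qkv + scores + weighted + out_proj + ffn

# Toy configuration: 8 tokens, model width 8, feed-forward width 16.
macs = transformer_block_macs(seq_len=8, d_model=8, d_ff=16)
cycles = macs * CYCLES_PER_MAC
print(f"{macs} MACs -> {cycles} cycles -> {cycles / CLOCK_HZ / 60:.1f} min @ 150 Hz per block")
```

Even this toy block lands in the hundred-thousand-cycle range, which is why the single-ALU build pushes well past real time.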
💡 The Parallelization Insight
Transformers aren't faster than RNNs per operation; they win because attention over a whole sequence is embarrassingly parallel, while an RNN must step through tokens one at a time. At 150 Hz with a single ALU, that advantage vanishes. A GPU with 10,000 cores at 1 GHz has roughly 10^11× the raw throughput.
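That throughput gap is a one-line calculation, assuming one operation per core per cycle on both sides:

```python
# Raw throughput ratio: 10,000 GPU cores at 1 GHz vs. one ALU at 150 Hz,
# assuming one operation per core per cycle on each side.
gpu_ops_per_sec = 10_000 * 1e9
sim_ops_per_sec = 1 * 150

print(f"~{gpu_ops_per_sec / sim_ops_per_sec:.1e}x more raw throughput")  # ~6.7e+10, i.e. ~10^11
```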
Scaling Laws: Why Modern AI Needs Modern Compute
| Model | Parameters | Cycles/token | Time per token @ 150 Hz |
|---|---|---|---|
| GPT-2 Small | 124M | ~500B | ~100 years |
| GPT-3 | 175B | ~700T | ~150,000 years |
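The years-per-token figures fall straight out of the same cycles-over-clock arithmetic:

```python
# Time per token at 150 Hz, from the cycles-per-token estimates above.
CLOCK_HZ = 150
SECONDS_PER_YEAR = 3600 * 24 * 365

for name, cycles_per_token in [
    ("GPT-2 Small", 500e9),   # ~500 billion cycles per token
    ("GPT-3", 700e12),        # ~700 trillion cycles per token
]:
    years = cycles_per_token / CLOCK_HZ / SECONDS_PER_YEAR
    print(f"{name}: ~{years:,.0f} years per token @ 150 Hz")
```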
Human Brain: 86B neurons, ~1000 Hz per neuron, massively parallel, ~20 watts
🔬 Research Question
The brain operates at roughly 1000 Hz per neuron, less than 10× faster than our 150 Hz simulation. Yet it performs tasks no transformer can match. The difference isn't raw speed; it's 86 billion parallel units with about 100 trillion connections.
What's the minimum parallelism needed for "intelligent" behavior?