| Architecture | Cycles | Time @ 150 Hz |
|---|---|---|
| Single Perceptron | 63 | 0.42 s |
| 2-Layer MLP (4→4→2) | 495 | 3.3 s |
| 3-Layer MLP (4→4→4→2) | 927 | 6.2 s |
| Simple RNN (8 steps) | 3,960 | 26.4 s |
| Tiny Transformer | ~270K | ~30 min |
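These wall-clock figures are just cycles divided by the clock rate. A minimal Python sketch that reproduces them (the helper name `wall_clock` is illustrative, not from the original project):

```python
# Convert the cycle counts above into wall-clock time at 150 Hz.
CLOCK_HZ = 150

def wall_clock(cycles, clock_hz=CLOCK_HZ):
    """Return a human-readable duration for `cycles` at `clock_hz`."""
    seconds = cycles / clock_hz
    if seconds < 60:
        return f"{seconds:.2f} s"
    if seconds < 3600:
        return f"{seconds / 60:.1f} min"
    return f"{seconds / 3600:.1f} h"

for name, cycles in [
    ("Single Perceptron", 63),
    ("2-Layer MLP (4→4→2)", 495),
    ("3-Layer MLP (4→4→4→2)", 927),
    ("Simple RNN (8 steps)", 3_960),
    ("Tiny Transformer", 270_000),
]:
    print(f"{name}: {cycles} cycles -> {wall_clock(cycles)}")
```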
[Chart: Cycles vs Hidden Layer Width (2-Layer MLP), with the 1-second threshold at 150 Hz marked]

Cycle Formula (2-Layer)
C = n_in × n_hidden × 21 + n_hidden × n_out × 21 + overhead

[Chart: Cycles vs Network Depth (Width = 4), MLP with width 4, with the 1-second threshold marked]
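The 2-layer formula can be evaluated directly. The sketch below sweeps the hidden-layer width for a 4-input, 2-output network and reports where inference crosses the 1-second threshold at 150 Hz; the overhead term is not specified above, so it defaults to zero here.

```python
# Cycles for a 2-layer MLP, per the formula above:
#   C = n_in * n_hidden * 21 + n_hidden * n_out * 21 + overhead
# 21 is the per multiply-accumulate cycle cost; overhead is unknown,
# so it is treated as a configurable constant (0 by default).
CLOCK_HZ = 150

def mlp2_cycles(n_in, n_hidden, n_out, overhead=0, cycles_per_mac=21):
    return (n_in * n_hidden + n_hidden * n_out) * cycles_per_mac + overhead

prev_seconds = 0.0
for width in range(1, 17):
    cycles = mlp2_cycles(4, width, 2)
    seconds = cycles / CLOCK_HZ
    marker = "  <-- crosses the 1-second threshold" if prev_seconds <= 1.0 < seconds else ""
    print(f"width={width:2d}: {cycles:5d} cycles, {seconds:6.2f} s{marker}")
    prev_seconds = seconds
```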
⚡ Key Insight
Depth scaling is linear in cycles, but representational power can grow exponentially with depth: functions that a deep network computes compactly can require exponentially wider shallow networks.
This is why deep networks revolutionized AI, and also why they demand massive compute.
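To make the linear cycle growth concrete, here is a rough extension of the same cost model to deeper networks. The generalization is an assumption on my part: the original only gives the 2-layer formula, and the overhead term is dropped here, so the counts differ slightly from the 495 and 927 quoted above.

```python
# Approximate cycles for an MLP from its layer sizes, assuming ~21 cycles
# per connection (per the 2-layer formula) and ignoring overhead.
CYCLES_PER_MAC = 21
CLOCK_HZ = 150

def mlp_cycles(layer_sizes):
    connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    return connections * CYCLES_PER_MAC

# Fixed width of 4 (as in the depth chart): 4 inputs, N hidden layers, 2 outputs.
for hidden_layers in range(1, 9):
    sizes = [4] + [4] * hidden_layers + [2]
    cycles = mlp_cycles(sizes)
    print(f"{hidden_layers} hidden layer(s): {cycles:5d} cycles "
          f"({cycles / CLOCK_HZ:5.1f} s @ 150 Hz)")
```

Each extra width-4 hidden layer adds the same 4 × 4 × 21 = 336 cycles, so the curve is a straight line.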
Architecture Comparison @ 150 Hz
| Architecture | Parameters | Cycles | Time @ 150 Hz |
|---|---|---|---|
| Perceptron | 5 | 63 | 0.42s |
| MLP 4→4→2 | 26 | 495 | 3.30s |
| MLP 4→8→8→2 | 114 | 1,791 | 11.9s |
| Simple RNN (seq=8) | ~50 | 3,960 | 26.4s |
| Tiny Attention | ~800 | ~18,000 | 2 min |
| Mini Transformer | ~5,000 | ~270,000 | 30 min |
Architecture Evolution & Feasibility
Perceptron (1958)
Single layer, linear classification
63 cycles • ✓ Real-time feasible
Multi-Layer Perceptron (1980s)
Hidden layers enable nonlinear boundaries
300-1000 cycles • ✓ Near real-time
Recurrent Neural Network (1980s)
Sequential processing, temporal memory
~4,000 cycles • ⚠ Tens of seconds per inference
Attention Mechanism (2014)
"Which inputs matter?" - dynamic weighting
~18K cycles • ⚠ Minutes per inference
Transformer (2017)
Self-attention + parallelization
~270K+ cycles • ✗ 30+ minutes per token
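The transformer cycle counts above are dominated by matrix multiplies, so a rough sanity check is just counting multiply-accumulates (MACs). The block dimensions below are assumptions, since the original does not spell out its Tiny/Mini Transformer configuration, and the ~21 cycles-per-MAC constant is carried over from the MLP formula; softmax, layer norm, and biases are ignored.

```python
# Rough MAC count for one single-head transformer block, converted to
# cycles at an assumed ~21 cycles per multiply-accumulate.
CYCLES_PER_MAC = 21
CLOCK_HZ = 150

def transformer_block_macs(seq_len, d_model, d_ff):
    qkv = 3 * seq_len * d_model * d_model     # Q, K, V projections
    scores = seq_len * seq_len * d_model      # Q @ K^T
    weighted = seq_len * seq_len * d_model    # softmax(scores) @ V
    out_proj = seq_len * d_model * d_model    # attention output projection
    ffn = 2 * seq_len * d_model * d_ff        # two feed-forward matmuls
    return qkv + scores + weighted + out_proj + ffn

# Toy configuration: 8 tokens, model width 8, feed-forward width 16.
macs = transformer_block_macs(seq_len=8, d_model=8, d_ff=16)
cycles = macs * CYCLES_PER_MAC
print(f"{macs} MACs -> {cycles} cycles -> {cycles / CLOCK_HZ / 60:.1f} min @ 150 Hz per block")
```

Even this toy block lands in the hundred-thousand-cycle range, which is why the single-ALU build pushes well past real time.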
💡 The Parallelization Insight
Transformers aren't faster than RNNs per operation; they win because attention over a whole sequence is embarrassingly parallel, while an RNN must step through tokens one at a time. At 150 Hz with a single ALU, that advantage vanishes. A GPU with 10,000 cores at 1 GHz has roughly 10^11× the raw throughput.
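That throughput gap is a one-line calculation, assuming one operation per core per cycle on both sides:

```python
# Raw throughput ratio: 10,000 GPU cores at 1 GHz vs. one ALU at 150 Hz,
# assuming one operation per core per cycle on each side.
gpu_ops_per_sec = 10_000 * 1e9
sim_ops_per_sec = 1 * 150

print(f"~{gpu_ops_per_sec / sim_ops_per_sec:.1e}x more raw throughput")  # ~6.7e+10, i.e. ~10^11
```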
Scaling Laws: Why Modern AI Needs Modern Compute
| Model | Parameters | Cycles/token | Time per token @ 150 Hz |
|---|---|---|---|
| GPT-2 Small | 124M | ~500B | ~100 years |
| GPT-3 | 175B | ~700T | ~150,000 years |
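The years-per-token figures fall straight out of the same cycles-over-clock arithmetic:

```python
# Time per token at 150 Hz, from the cycles-per-token estimates above.
CLOCK_HZ = 150
SECONDS_PER_YEAR = 3600 * 24 * 365

for name, cycles_per_token in [
    ("GPT-2 Small", 500e9),   # ~500 billion cycles per token
    ("GPT-3", 700e12),        # ~700 trillion cycles per token
]:
    years = cycles_per_token / CLOCK_HZ / SECONDS_PER_YEAR
    print(f"{name}: ~{years:,.0f} years per token @ 150 Hz")
```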
Human Brain: 86B neurons, ~1000 Hz per neuron, massively parallel, ~20 watts
🔬 Research Question
The brain operates at roughly 1000 Hz per neuron, less than 10× faster than our 150 Hz simulation. Yet it performs tasks no transformer can match. The difference isn't raw speed; it's 86 billion parallel units with about 100 trillion connections.
What's the minimum parallelism needed for "intelligent" behavior?