Mini-Transformer Block

Attention + FFN + LayerNorm + Residuals @ 130-160 Hz
Configuration
  Sequence length: 4
  Embedding dimension: 8
  FFN hidden dimension: 16
  Attention heads: 1
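
The same configuration as a minimal Python sketch; the field names below (seq_len, d_model, d_ff, n_heads) are my own labels, and only the four values come from the list above.

from dataclasses import dataclass

@dataclass
class BlockConfig:
    seq_len: int = 4    # Sequence length
    d_model: int = 8    # Embedding dimension
    d_ff: int = 16      # FFN hidden dimension
    n_heads: int = 1    # Attention heads

cfg = BlockConfig()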
Processing Phases (estimated cycles)
  Input embed: 0
  LayerNorm 1: ~320
  Self-attention: ~2,400
  Residual add: ~32
  LayerNorm 2: ~320
  Feed-forward: ~3,200
  Residual add: ~32
  Output: 0
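
Summing these estimates reproduces the ~6,300-cycle total quoted in the scaling note further down. A quick check in Python (the dictionary keys are my own labels; the cycle counts are the page's estimates):

PHASE_CYCLES = {
    "input_embed":    0,
    "layernorm_1":    320,
    "self_attention": 2_400,
    "residual_add_1": 32,
    "layernorm_2":    320,
    "feed_forward":   3_200,
    "residual_add_2": 32,
    "output":         0,
}

CLOCK_HZ = 150                                # middle of the 130-160 Hz range
total = sum(PHASE_CYCLES.values())            # 6,304, i.e. ~6,300 cycles
print(f"total: {total} cycles")
print(f"time @ {CLOCK_HZ} Hz: {total / CLOCK_HZ:.1f} s")   # ~42.0 s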
Transformer Block Architecture
  [Diagram: input (4 positions × 8 dims) → LayerNorm 1 (μ, σ → normalize; ~320 cyc) → self-attention (Q·Kᵀ/√d → softmax → V; ~2,400 cyc) → residual add → LayerNorm 2 (μ, σ → normalize; ~320 cyc) → feed-forward (Linear(8→16) → GELU → Linear(16→8); ~3,200 cyc) → residual add → output]
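
A minimal NumPy sketch of the pipeline in the diagram, assuming a pre-LayerNorm layout and randomly initialized weights purely for illustration; only the shapes (4 positions × 8 dims, single head, 8→16→8 feed-forward) and the order of operations come from the page.

import numpy as np

rng = np.random.default_rng(0)
SEQ, D, D_FF = 4, 8, 16

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)          # μ
    sigma = x.std(axis=-1, keepdims=True)        # σ
    return (x - mu) / (sigma + eps)              # normalize (gain/bias omitted)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Random parameters (single head, so the head dimension equals D)
Wq, Wk, Wv, Wo = (rng.normal(0, 0.02, (D, D)) for _ in range(4))
W1, b1 = rng.normal(0, 0.02, (D, D_FF)), np.zeros(D_FF)
W2, b2 = rng.normal(0, 0.02, (D_FF, D)), np.zeros(D)

def block(x):                                    # x: (SEQ, D)
    # Attention sub-layer: Q·Kᵀ/√d → softmax → V
    h = layer_norm(x)                            # LayerNorm 1
    Q, K, V = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(Q @ K.T / np.sqrt(D)) @ V
    x = x + attn @ Wo                            # residual add

    # Feed-forward sub-layer: Linear(8→16) → GELU → Linear(16→8)
    h = layer_norm(x)                            # LayerNorm 2
    x = x + gelu(h @ W1 + b1) @ W2 + b2          # residual add
    return x

out = block(rng.normal(size=(SEQ, D)))           # 4 positions × 8 dims
print(out.shape)                                 # (4, 8)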
⚠ Full Transformer Scaling
  This block (tiny): ~6,300 cycles
  Time @ 150 Hz: ~42 s
  GPT-2 block: ~500M cycles
  GPT-2 full (12 layers): ~6B cycles
  GPT-2 @ 150 Hz: ~1.3 years per token
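
The years-per-token figure is straightforward arithmetic on the cycle counts above; here is a quick check at a 150 Hz clock (the ~500M-cycle GPT-2 block figure is the page's estimate, not a measurement):

CLOCK_HZ = 150
SECONDS_PER_YEAR = 365 * 24 * 3600            # ~3.15e7 s

tiny_block_cycles = 6_300                     # this demo block
gpt2_block_cycles = 500_000_000               # ~500M cycles per GPT-2 block
gpt2_full_cycles = 12 * gpt2_block_cycles     # 12 layers → ~6B cycles

print(tiny_block_cycles / CLOCK_HZ)                      # ~42 s
print(gpt2_full_cycles / CLOCK_HZ / SECONDS_PER_YEAR)    # ~1.27 years per token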