Mini-Transformer Block

Attention + FFN + LayerNorm + Residuals @ 130-160 Hz
Configuration
  Sequence length: 4
  Embedding dimension: 8
  FFN hidden dimension: 16
  Attention heads: 1
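
The same configuration as a minimal Python sketch; the field names below (seq_len, d_model, d_ff, n_heads) are my own labels, and only the four values come from the list above.

from dataclasses import dataclass

@dataclass
class BlockConfig:
    seq_len: int = 4    # Sequence length
    d_model: int = 8    # Embedding dimension
    d_ff: int = 16      # FFN hidden dimension
    n_heads: int = 1    # Attention heads

cfg = BlockConfig()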
Processing Phases (estimated cycles)
  Input embed: 0
  LayerNorm 1: ~320
  Self-attention: ~2,400
  Residual add: ~32
  LayerNorm 2: ~320
  Feed-forward: ~3,200
  Residual add: ~32
  Output: 0
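
Summing these estimates reproduces the ~6,300-cycle total quoted in the scaling note further down. A quick check in Python (the dictionary keys are my own labels; the cycle counts are the page's estimates):

PHASE_CYCLES = {
    "input_embed":    0,
    "layernorm_1":    320,
    "self_attention": 2_400,
    "residual_add_1": 32,
    "layernorm_2":    320,
    "feed_forward":   3_200,
    "residual_add_2": 32,
    "output":         0,
}

CLOCK_HZ = 150                                # middle of the 130-160 Hz range
total = sum(PHASE_CYCLES.values())            # 6,304, i.e. ~6,300 cycles
print(f"total: {total} cycles")
print(f"time @ {CLOCK_HZ} Hz: {total / CLOCK_HZ:.1f} s")   # ~42.0 s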
Transformer Block Architecture
  [Diagram: input (4 positions × 8 dims) → LayerNorm 1 (μ, σ → normalize; ~320 cyc) → self-attention (Q·Kᵀ/√d → softmax → V; ~2,400 cyc) → residual add → LayerNorm 2 (μ, σ → normalize; ~320 cyc) → feed-forward (Linear(8→16) → GELU → Linear(16→8); ~3,200 cyc) → residual add → output]
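
A minimal NumPy sketch of the pipeline in the diagram, assuming a pre-LayerNorm layout and randomly initialized weights purely for illustration; only the shapes (4 positions × 8 dims, single head, 8→16→8 feed-forward) and the order of operations come from the page.

import numpy as np

rng = np.random.default_rng(0)
SEQ, D, D_FF = 4, 8, 16

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)          # μ
    sigma = x.std(axis=-1, keepdims=True)        # σ
    return (x - mu) / (sigma + eps)              # normalize (gain/bias omitted)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Random parameters (single head, so the head dimension equals D)
Wq, Wk, Wv, Wo = (rng.normal(0, 0.02, (D, D)) for _ in range(4))
W1, b1 = rng.normal(0, 0.02, (D, D_FF)), np.zeros(D_FF)
W2, b2 = rng.normal(0, 0.02, (D_FF, D)), np.zeros(D)

def block(x):                                    # x: (SEQ, D)
    # Attention sub-layer: Q·Kᵀ/√d → softmax → V
    h = layer_norm(x)                            # LayerNorm 1
    Q, K, V = h @ Wq, h @ Wk, h @ Wv
    attn = softmax(Q @ K.T / np.sqrt(D)) @ V
    x = x + attn @ Wo                            # residual add

    # Feed-forward sub-layer: Linear(8→16) → GELU → Linear(16→8)
    h = layer_norm(x)                            # LayerNorm 2
    x = x + gelu(h @ W1 + b1) @ W2 + b2          # residual add
    return x

out = block(rng.normal(size=(SEQ, D)))           # 4 positions × 8 dims
print(out.shape)                                 # (4, 8)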
⚠ Full Transformer Scaling
  This block (tiny): ~6,300 cycles
  Time @ 150 Hz: ~42 s
  GPT-2 block: ~500M cycles
  GPT-2 full (12 layers): ~6B cycles
  GPT-2 @ 150 Hz: ~1.3 years per token
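
The years-per-token figure is straightforward arithmetic on the cycle counts above; here is a quick check at a 150 Hz clock (the ~500M-cycle GPT-2 block figure is the page's estimate, not a measurement):

CLOCK_HZ = 150
SECONDS_PER_YEAR = 365 * 24 * 3600            # ~3.15e7 s

tiny_block_cycles = 6_300                     # this demo block
gpt2_block_cycles = 500_000_000               # ~500M cycles per GPT-2 block
gpt2_full_cycles = 12 * gpt2_block_cycles     # 12 layers → ~6B cycles

print(tiny_block_cycles / CLOCK_HZ)                      # ~42 s
print(gpt2_full_cycles / CLOCK_HZ / SECONDS_PER_YEAR)    # ~1.27 years per token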