I study how language models fail and build tools to catch those failures.

AI Engineer, 7 years. I design production agents and research LLM trust at the architectural level.

Trust Bench (in progress)

Open-source profiler that looks inside LLMs to find failure modes benchmarks miss. Next: training sparse autoencoders on Qwen3.5's hybrid DeltaNet layers.

microGPT Playground: train a transformer in your browser.