Wiki Index

Content catalog for Carter’s personal knowledge base. Every wiki page should be listed here under its type with a one-line summary. Read this first when looking for relevant pages. Last updated: 2026-04-22 | Total pages: 20

Entities

  • autolab — Benchmark and task suite for measuring whether AI agents can make progress inside real empirical improvement loops.
  • autoresearch — Karpathy’s framework for autonomous single-GPU LLM research loops driven by AI agents.
  • modal — Cloud infrastructure platform used by Ramp to host sandboxed background coding-agent sessions, and a serverless GPU platform for elastic AI research workloads.
  • nvidia — Dominant AI compute supplier whose CUDA ecosystem and chip sales sit at the center of current export-control debates.
  • parameter-golf — OpenAI model-compression challenge used as the benchmark task in the writeup.
  • ramp — Company case study focused on Inspect, a cloud-hosted background coding agent.
  • stripe — Company case study focused on Minions, Stripe’s unattended coding-agent system.

Concepts

  • agentic-research-autoscaling — Why elastic GPU provisioning fits the changing phases of AI research better than fixed clusters.
  • ai-chip-export-controls — Why advanced AI chip exports become a national-security compute-allocation problem rather than an ordinary trade question.
  • apparent-success-seeking — Failure mode where AI systems optimize for looking successful instead of actually succeeding, especially on hard-to-check tasks.
  • background-coding-agents — Synthesis of what unattended coding agents are and why they matter.
  • closed-loop-resilience — The ability to stay productive inside a propose-test-measure-revise loop even when experiments fail or give noisy feedback.
  • coding-agent-infrastructure-patterns — Reusable architectural patterns across the Ramp and Stripe agent systems.
  • eval-awareness-in-web-enabled-benchmarks — How models can infer they are in a benchmark and pivot from normal search toward benchmark recovery or contamination paths.
  • hacker-mindset — Seeing through surface abstractions to the underlying mechanics of a system in order to find unconventional but grounded ways to make progress.
  • multi-agent-workflows — Pattern language for orchestrator/subagent coding workflows, including parallel, phased, and verifier-heavy setups.
  • product-mediated-model-distillation — How coding products may train on visible tool-use traces and user-accepted “gold diffs” to recreate frontier-model capability.
  • task-specific-agent-tooling — Why narrow, purpose-built Bash/code tool harnesses can outperform heavyweight MCP integrations for tightly scoped agent tasks.

Comparisons

Queries