AgentTest
Why This is an Opportunity
YC W26 analysis shows agent infrastructure is the #1 investment theme. The batch analysis puts it plainly: 'the money moved to infrastructure — auth, testing, security, context management, and observability for agents.' 85% of the batch is AI-first and a third are building agents, yet there is no standard way to test agent behavior across changes. Sentrial (YC W26) is doing AI failure detection, validating the category. And as @steipete notes, 'Running tests in parallel is taxing': the testing bottleneck is real.
Key Pain Points
- You updated your system prompt and 3 agent workflows silently broke because you had no regression tests
- A model version change caused your agent to start hallucinating tool calls and you didn't catch it for 2 days
- You have no idea whether your agent performs worse on Opus 4.6 vs Sonnet 4.6 for your specific use case
- Your CI/CD pipeline tests code but completely ignores whether your AI agents still work correctly
Original Discovery
A testing and QA platform specifically for AI agent workflows. You define expected behaviors for your agents — when given this input, it should call these tools in this order and produce this output — and AgentTest runs regression suites against your agent configurations. Catches when a model update, prompt change, or new tool breaks your agent behavior before it hits production. Includes snapshot testing, latency tracking, cost benchmarks, and failure mode detection.
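The behavior contract described above ("when given this input, it should call these tools in this order and produce this output") can be sketched as a small regression check. This is a minimal illustration, not AgentTest's actual API: every name here (`AgentCase`, `check_case`, the stubbed trace and output) is a hypothetical stand-in, and a real harness would record the tool-call trace from a live agent run.

```python
# Hypothetical sketch of an AgentTest-style regression case.
# All names are illustrative assumptions, not a real AgentTest API.
from dataclasses import dataclass


@dataclass
class AgentCase:
    """One regression case: an input plus the behavior we expect."""
    prompt: str
    expected_tools: list            # tool names, in required call order
    expected_output_contains: str   # substring the final answer must include


def check_case(case: AgentCase, trace: list, output: str) -> list:
    """Compare a recorded tool-call trace and final output against the case.

    Returns a list of failure messages; an empty list means the case passed.
    """
    failures = []
    if list(trace) != case.expected_tools:
        failures.append(
            f"tool order: expected {case.expected_tools}, got {list(trace)}"
        )
    if case.expected_output_contains not in output:
        failures.append(
            f"output missing {case.expected_output_contains!r}"
        )
    return failures


# Stubbed agent run standing in for a real model call.
case = AgentCase(
    prompt="What's the weather in Paris?",
    expected_tools=["geocode", "get_weather"],
    expected_output_contains="Paris",
)
trace = ["geocode", "get_weather"]          # tool calls recorded by the harness
output = "It is 18 degrees and sunny in Paris."
print(check_case(case, trace, output))      # prints [] (case passed)
```

Running a suite of such cases on every prompt edit, model upgrade, or tool change is what turns "3 workflows silently broke" into a failing CI step instead of a production incident.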
Ready to Build This?
Sign up to save this opportunity and get your personalized MVP kit. Includes domain name suggestions, boilerplate code, and AI prompts to build your MVP rapidly.
Free MVP kit • Domain finder • Starter code