About Us
BenchFlow is creating a unified runtime for AI benchmarks. We host the largest library of rigorously designed evaluations (e.g., CMU's WebArena and coding-agent benchmarks) and enable enterprises to run them via API.
Role
- Design scalable systems to execute benchmarks on cloud infrastructure
- Optimize runtime performance for Python/Node.js-based AI workflows
- Build tools for tracing, logging, and dynamic leaderboards
Requirements
- Proficiency in Python, async programming, and cloud platforms (AWS/GCP)
- Experience with distributed systems or developer tools (e.g., CI/CD)
- [Intern] Current student or recent graduate in CS/Engineering
Nice to Have
- Knowledge of AI agent frameworks (LangChain, LlamaIndex)
- Familiarity with benchmarks like SWE-bench, WebArena
Salary and Perks
- Full-time (Bay Area): $130k-170k/yr base plus 0.5%-1.5% equity; internship (Bay Area): $6k/mo base
- Full-time (remote): salary negotiable; internship (remote): ¥13k-¥24k
- Participate in developing open-source tools used by thousands of developers
- Full-time positions receive equity in a rapidly growing AI infrastructure company