OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.
OpenAI's Evals & Understanding Team is responsible for evaluating OpenAI's models on both performance and safety. The team provides metrics and eval frameworks that allow researchers to understand the safety, efficacy, and performance of models as they're developed. Products like ChatGPT, DALL·E, plugins, browsing, code interpreter, and GPT-V rely on human and synthetic data, as well as model-based experimentation, to evaluate success.
Our team builds and deploys the products and experiences needed to evaluate, debug, and understand our models at scale, drawing on data from a variety of sources. It also builds the ML operations, data-management tooling, quality and eval systems, and model-experimentation and insights tools that are leveraged to improve our AI models.
In this role, you will:
You might thrive in this role if you: