We represent an AI-powered workplace safety company that recently secured significant funding and is recognized as a leader in AI innovation. The company applies artificial intelligence, computer vision, and distributed systems to build solutions that enhance safety in the workplace. Its technologies are setting new standards in accident prevention across various industries.
Job Summary: Join our client's team as a Staff Distributed Systems Engineer and play a pivotal role in building and enhancing the infrastructure that underpins their ML-driven safety platforms. In this senior position, you will build the systems that support machine learning applications, particularly in computer vision, and guide a team working to advance workplace safety technology.
Responsibilities:
- Architect and implement sophisticated distributed systems that form the backbone of our client's ML and computer vision platforms.
- Offer technical leadership and mentorship within your team, promoting a culture of excellence and collaboration.
- Employ agile project management methodologies to ensure deliverables are completed on schedule and to a high standard of quality.
- Develop robust and scalable data pipelines, emphasizing high availability and fault tolerance.
- Integrate DevOps best practices into the ML lifecycle to streamline deployment, scaling, and management using tools like Docker and Kubernetes.
- Apply your expertise to advance ML operations, particularly for distributed training and production environments.
- Navigate the dynamic landscape of a startup to design and deploy infrastructure solutions that have a real-world impact on safety.
Qualifications:
Must-Haves:
- A Bachelor's degree in Computer Science or a related technical discipline.
- Over 5 years of experience in software engineering with a demonstrated track record in distributed systems.
- A history of technical leadership, whether through project lead roles or as a senior member of a software engineering team.
- Hands-on experience with machine learning system design and working collaboratively on ML-focused teams.
- Deep understanding of distributed system architecture and infrastructure design principles.
- Mastery of container technologies like Docker and orchestration systems like Kubernetes.
- Proficiency in applying DevOps principles to ML operations, enhancing the efficiency and reliability of ML systems.
Nice-to-Haves:
- Insight into the application of computer vision within machine learning frameworks.
- Experience with cloud service platforms and the nuances of infrastructure management in a cloud environment.
- Knowledge of state-of-the-art ML operations, including model deployment and real-time monitoring.
- Expertise in automating ML pipelines and managing large-scale data within ML ecosystems.
- Familiarity with big data tools like Apache Spark and their integration into ML workflows.
Location: This is a hybrid role based in San Francisco, combining remote work with in-person sessions.
As part of our client's team, you will directly contribute to safeguarding workplaces with AI innovations, making a tangible difference in the lives of workers across industries. Your role is not just about building systems; it's about shaping a safer future.