Staff Distributed Systems Engineer

San Francisco | Up to $200,000 | Full-time

About the role

We are representing an innovative AI-powered workplace safety company that has recently secured significant funding and is recognized as a leader in AI innovation. They specialize in harnessing the power of artificial intelligence, applied computer vision, and distributed systems to create state-of-the-art solutions that significantly enhance safety in the workplace. Their pioneering technologies are setting new standards in accident prevention across various industries.

Job Summary: Join our client's dedicated team as a Staff Distributed Systems Engineer and play a pivotal role in building and enhancing the infrastructure that underpins their groundbreaking ML-driven safety platforms. In this senior position, you will not only craft the systems that support machine learning applications, particularly in computer vision, but also guide a team towards achieving new heights in workplace safety technology.

Responsibilities:

  • Architect and implement sophisticated distributed systems that form the backbone of our client's ML and computer vision platforms.
  • Offer technical leadership and mentorship within your team, promoting a culture of excellence and collaboration.
  • Employ agile project management methodologies to ensure work is delivered on schedule and to a high standard.
  • Develop robust and scalable data pipelines, emphasizing high availability and fault tolerance.
  • Integrate DevOps best practices into the ML lifecycle to streamline deployment, scaling, and management using tools like Docker and Kubernetes.
  • Advance the field of ML operations with your expertise, particularly in the context of distributed training and production environments.
  • Navigate the dynamic landscape of a startup to design and deploy infrastructure solutions that have a real-world impact on safety.

Qualifications:

Must-Haves:

  • A Bachelor's degree in Computer Science or a related technical discipline.
  • Over 5 years of experience in software engineering with a demonstrated track record in distributed systems.
  • A history of technical leadership, whether through project lead roles or as a senior member of a software engineering team.
  • Hands-on experience with machine learning system design and working collaboratively on ML-focused teams.
  • Deep understanding of distributed system architecture and infrastructure design principles.
  • Mastery of container technologies like Docker and orchestration systems like Kubernetes.
  • Proficiency in applying DevOps principles to ML operations, enhancing the efficiency and reliability of ML systems.

Nice-to-Haves:

  • Insight into the application of computer vision within machine learning frameworks.
  • Experience with cloud service platforms and the nuances of infrastructure management in a cloud environment.
  • Knowledge of state-of-the-art ML operations, including model deployment and real-time monitoring.
  • Expertise in automating ML pipelines and managing large-scale data within ML ecosystems.
  • Familiarity with big data tools like Apache Spark and their integration into ML workflows.

Location: This is a hybrid role based in San Francisco, designed to blend the flexibility of remote work with the synergy of in-person sessions.

As part of our client's team, you will directly contribute to safeguarding workplaces with AI innovations, making a tangible difference in the lives of workers across industries. Your role is not just about building systems; it's about shaping a safer future.