Staff Distributed Systems Engineer @ Skill Torch

San Francisco | Up to $200,000 | Full-time

About the role

We represent an innovative AI-powered workplace safety company that recently secured significant funding and is recognized as a leader in AI innovation. The company applies artificial intelligence, computer vision, and distributed systems to build state-of-the-art solutions that significantly improve workplace safety, and its technology is setting new standards for accident prevention across a range of industries.

Job Summary: Join our client's team as a Staff Distributed Systems Engineer and play a pivotal role in building and improving the infrastructure behind their ML-driven safety platforms. In this senior position, you will build the systems that support machine learning applications, particularly in computer vision, and provide technical leadership to your team as it advances workplace safety technology.

Responsibilities:

Architect and implement sophisticated distributed systems that form the backbone of our client's ML and computer vision platforms.
Offer technical leadership and mentorship within your team, promoting a culture of excellence and collaboration.
Use agile project management practices to deliver high-quality work on schedule.
Develop robust and scalable data pipelines, emphasizing high availability and fault tolerance.
Integrate DevOps best practices into the ML lifecycle to streamline deployment, scaling, and management using tools like Docker and Kubernetes.
Advance the team's ML operations practices with your expertise, particularly for distributed training and production environments.
Navigate the dynamic landscape of a startup to design and deploy infrastructure solutions that have a real-world impact on safety.

Qualifications:

Must-Haves:

A Bachelor's degree in Computer Science or a related technical discipline.
More than 5 years of software engineering experience, with a demonstrated track record in distributed systems.
A history of technical leadership, whether through project lead roles or as a senior member of a software engineering team.
Hands-on experience with machine learning system design and working collaboratively on ML-focused teams.
Deep understanding of distributed system architecture and infrastructure design principles.
Mastery of container technologies like Docker and orchestration systems like Kubernetes.
Proficiency in applying DevOps principles to ML operations, enhancing the efficiency and reliability of ML systems.

Nice-to-Haves:

Insight into the application of computer vision within machine learning frameworks.
Experience with cloud platforms and the nuances of managing infrastructure in a cloud environment.
Knowledge of state-of-the-art ML operations, including model deployment and real-time monitoring.
Expertise in automating ML pipelines and managing large-scale data within ML ecosystems.
Familiarity with big data tools like Apache Spark and their integration into ML workflows.

Location: This is a hybrid role based in San Francisco, combining the flexibility of remote work with in-person collaboration.

As part of our client's team, you will directly contribute to safeguarding workplaces with AI innovations, making a tangible difference in the lives of workers across industries. Your role is not just about building systems; it's about shaping a safer future.