As a Senior DevOps Engineer, you'll play a crucial role in maintaining and improving systems that require constant availability. This position is ideal for someone with experience in AI and machine learning infrastructure.
In this role, you'll be responsible for ensuring the reliability and performance of AI and machine learning systems. You'll participate in on-call rotations, providing support during production incidents and acting as an escalation point for issues that arise. Your expertise will help keep these systems running 24/7, an uptime requirement that is critical to the company's operations.
Your day-to-day tasks will involve monitoring system performance, troubleshooting issues, and implementing improvements to enhance system availability. You'll work closely with other engineers to optimize GPU workloads and support video and media processing tasks. This role suits someone who thrives in a fast-paced environment and has a strong understanding of DevOps practices.
Key responsibilities include:
• Participating in on-call rotations
• Maintaining AI/ML infrastructure
• Improving system uptime and reliability
• Supporting GPU workloads and media processing
To succeed in this position, you should have a solid background in DevOps, particularly with AI and machine learning technologies. Experience with high-availability systems and a proactive approach to problem-solving will be essential. If you're passionate about technology and enjoy working in a collaborative environment, this role could be a great fit for you.