As a Senior DevOps Engineer, you'll play a crucial role in maintaining and enhancing our AI and ML infrastructure. This position is ideal for someone who thrives in high-pressure environments and has a passion for optimizing complex systems.
In this role, you will maintain and improve the pipelines that support our AI and ML infrastructure, work with GPU workloads, and keep our distributed processing environments running smoothly. You will also manage high-availability systems that require 24/7 uptime, which makes your role critical to our operations.
Your day-to-day responsibilities will include monitoring system performance, troubleshooting production incidents, and implementing changes that improve system reliability. You will collaborate with other teams to ensure that our infrastructure meets the demands of our AI and ML projects.
This position is well-suited for someone with a strong background in DevOps practices, particularly in environments that leverage AI and ML technologies. You should have experience with high-availability systems and be comfortable working under pressure to resolve incidents quickly.
Key requirements for this role include a solid understanding of AI and ML infrastructure, experience with GPU workloads, and a proven track record in managing distributed systems. If you are passionate about technology and enjoy optimizing complex systems, this could be the perfect opportunity for you.