Job Description
At Nexus AI Labs, we are pushing the boundaries of generative AI and predictive modeling. We are looking for a Senior Machine Learning Engineer to join our core infrastructure team and build the scalable, high-performance systems that power our next-generation research platforms.
You will work at the intersection of data science and production engineering, ensuring our models perform reliably under heavy real-time loads. If you are passionate about optimization, distributed computing, and solving complex architectural challenges, this role is for you.
Responsibilities
- Architect and deploy large-scale machine learning models into production environments.
- Collaborate with research scientists to translate experimental code into scalable production services.
- Optimize model training pipelines and inference latency for high-throughput systems.
- Implement MLOps best practices, including CI/CD for models, monitoring, and automated retraining.
- Mentor junior engineers and contribute to our internal engineering documentation and standards.
- Diagnose performance bottlenecks and develop solutions for hardware acceleration across distributed systems.
Qualifications
- M.S. or Ph.D. in Computer Science, Artificial Intelligence, or a related quantitative field.
- 5+ years of experience in production-grade software engineering with Python and C++.
- Extensive expertise in deep learning frameworks such as PyTorch, TensorFlow, or JAX.
- Proficiency with cloud infrastructure (AWS/GCP) and container orchestration using Kubernetes.
- Proven track record of optimizing large-scale distributed training on GPU clusters.
- Strong understanding of data structures, algorithms, and system design patterns.
- Excellent communication skills and the ability to thrive in a fast-paced, research-driven environment.