DevOps Engineer (AI Infrastructure)
29 November 2025
City:
Astana
Employment:
Full-time
Company: "Armeta KZ"
Armeta Inc. is developing advanced AI-driven systems that transform how large-scale engineering and construction projects are evaluated and approved. Our technology automates complex, compliance-heavy processes, ensuring accuracy and trustworthiness.
We are building a high-performance, on-premise computing platform to power our complex multi-agent, data, and backend systems, and we are looking for a DevOps engineer to build and manage this critical infrastructure.
Key Responsibilities
- Design, build, and maintain our high-availability on-premise infrastructure, running on Kubernetes and bare-metal servers (including supercomputers and NVIDIA DGX systems).
- Develop and manage robust CI/CD pipelines (e.g., GitLab CI, Jenkins) for automated building, testing, and deployment of all services.
- Manage the deployment, scaling, and operation of our core technology stack, including:
  - Backend microservices (FastAPI);
  - AI multi-agent systems and LLM-serving platforms;
  - Distributed compute clusters (specifically Ray);
  - Object storage systems (specifically MinIO).
- Implement and manage comprehensive monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK/Loki) to ensure system health and performance.
- Manage NVIDIA DGX hardware, including GPU drivers, CUDA, and high-performance networking (e.g., InfiniBand).
- Automate infrastructure provisioning and configuration management using IaC tools (e.g., Ansible, Terraform).
- Work closely with AI and Backend teams to ensure a smooth, reliable path from research and development to production.
- Implement and maintain on-premise security best practices, including network policies, access control, and vulnerability management.
Qualifications
- Expert-level knowledge of Kubernetes (K8s) and the container ecosystem (Docker).
- Proven experience managing on-premise, bare-metal server environments. Experience with public cloud (AWS, GCP) is a plus, but on-premise expertise is essential.
- Strong experience with CI/CD tools (e.g., GitLab CI, Jenkins, GitHub Actions).
- Strong experience with Infrastructure as Code (IaC) tools (especially Ansible and Terraform).
- 5+ years of hands-on experience in DevOps, SRE, or a similar role.
- Deep understanding of networking principles (TCP/IP, load balancing, firewalls, VPCs).
- Proficiency in scripting and automation (e.g., Python, Bash).
- Experience with monitoring and logging stacks (e.g., Prometheus, Grafana).
Preferred Qualifications (Bonus Points)
- Strong experience with MLOps tools and platforms (e.g., Kubeflow, MLflow, Seldon Core, KServe).
- Hands-on experience with NVIDIA GPU management, CUDA, and the NVIDIA GPU Operator for K8s.
- Direct experience deploying and managing Ray clusters.
- Direct experience deploying and managing MinIO clusters.
- Experience with high-performance networking (e.g., InfiniBand).
- Experience with distributed storage systems (e.g., Ceph).