Senior Site Reliability Engineer
June 12, 2025

City:
Astana
Employment:
Full-time
Experience:
More than 6 years
Company: "Балхаш Системс"
We have 30 years of expertise in designing and building custom software systems. We provide software development services focusing on complex high-load applications, AI and BI solutions, and mobile apps.
Our client is a Luxembourg-based company that specializes in a knowledge assessment system and has expertise in various areas, including academia (universities and schools).
As a DevOps Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, scalability, and performance of our systems. You will bridge the gap between development and operations by applying software engineering principles to infrastructure and operations problems. Your role will focus on automation, incident response, monitoring, capacity planning, and improving system resilience while supporting production workloads on Google Cloud Platform (GCP).
Responsibilities:
- Design, implement, and maintain highly available, scalable, and resilient cloud-based infrastructure using Google Cloud Platform (GCP).
- Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Conduct capacity planning, performance tuning, and load testing to optimize system performance.
- Develop chaos engineering practices to identify and mitigate failure scenarios.
- Develop and maintain Infrastructure as Code (IaC) using Terraform, Ansible, or equivalent tools.
- Automate system provisioning, configuration management, and deployments using CI/CD pipelines (ArgoCD, GitOps, GitHub Actions).
- Improve auto-healing and self-recovery capabilities in production environments.
- Monitor system health and performance using Google Cloud Operations Suite (formerly Stackdriver), Prometheus, Dynatrace, Grafana, and Datadog.
- Participate in the on-call rotation; troubleshoot and resolve production incidents using root cause analysis (RCA).
- Implement postmortem processes and drive corrective actions to prevent recurrence.
- Implement and enforce security best practices, ensuring compliance with ISO 27001, SOC 2, and GDPR.
- Apply IAM (Identity & Access Management) best practices for secure cloud operations.
- Manage network security, including firewalls, VPNs, and service mesh (e.g., Istio).
- Work closely with development, security, and operations teams to improve deployment strategies.
- Advocate for blameless postmortems, knowledge sharing, and documentation improvements.
- Lead SRE best practices adoption, including error budgeting and toil reduction.
Required experience and skills:
- 3+ years of experience in a DevOps, SRE, or Cloud Engineering role.
- Strong expertise in Google Cloud Platform (GCP) services, including GKE, Cloud Run, Cloud Functions, Cloud SQL, BigQuery, and Pub/Sub.
- Experience with Kubernetes (GKE) and container orchestration.
- Proficiency in Terraform, Helm, and Kubernetes operators for infrastructure automation.
- Strong scripting and automation skills in Python, Bash, or Go.
- Experience with monitoring, logging, and tracing tools (e.g., Google Cloud Operations Suite, Prometheus, OpenTelemetry).
- Strong understanding of CI/CD pipelines using tools like ArgoCD, Jenkins, or GitHub Actions.
- Knowledge of GitOps methodologies and IaC best practices.
- Strong experience with PostgreSQL, Redis, and NoSQL databases.
- Strong problem-solving and critical-thinking skills.
- Ability to work collaboratively in a fast-paced environment.
- Strong communication and documentation skills.
- Ability to manage incidents under pressure and work on call as needed.
- Experience with multi-cloud (AWS/GCP) and hybrid environments.
- Knowledge of site reliability engineering principles (Google SRE).
- Understanding of security best practices for cloud-native applications.
- Google Cloud Certification (Professional Cloud DevOps Engineer, Professional Cloud Architect) is a plus.
Our offer as your future employer:
- Full-time job with a flexible work schedule.
- Possibility to work remotely.
- Opportunities for professional growth.