Site Reliability Engineer Lead - Bandung, Indonesia
Description
About the Role
As an SRE Lead, you will be responsible for leading a team of Site Reliability Engineers in designing, implementing, and maintaining highly scalable and reliable systems. You will collaborate with cross-functional teams, including software engineers and system administrators, to ensure the seamless operation of critical production services.
Location – Hybrid | Bandung, Indonesia (at least 2 days per week in the office)
What You’ll Be Doing
Team Leadership
- Lead, mentor, and guide a team of Site Reliability Engineers.
- Foster a collaborative and innovative team culture.
- Provide technical guidance and support for team members across all activities of the SiteOps team.
- Be accountable for the work outcomes of the SRE team, including production uptime and optimization projects.
- Own operational projects and the Network Operations Centre (NOC). Implement standard operating procedures (SOPs) for the NOC to maximise coverage.
System Architecture and Design
- Collaborate with software engineering teams to design and implement scalable and reliable systems.
- Participate in code reviews to ensure adherence to DevOps / SRE best practices.
- Work closely with system administrators, network engineers, and security teams to ensure a holistic approach to system reliability.
- Manage and version control infrastructure configurations.
Automation and Tooling
- Develop and maintain automation scripts and tools to streamline operational tasks.
- Develop and maintain automation tools for infrastructure provisioning, configuration management, and deployment using Terraform or Ansible.
- Implement monitoring and alerting solutions to proactively identify and address potential issues.
- Evaluate, implement, and manage DevOps/SRE-related tools for configuration management, monitoring, and logging.
Incident Management
- Lead incident response efforts, ensuring timely resolution of production issues.
- Conduct post-incident reviews and implement improvements to prevent future incidents.
Performance Optimization
- Analyse system performance and implement optimizations to enhance reliability and efficiency.
- Work on capacity planning to accommodate future growth.
Security and Compliance
- Work with security teams to implement and enhance security measures in the DevOps pipeline.
- Ensure compliance with industry standards and regulatory requirements.
- Ensure adherence to established security standards across all environments, working with both the engineering and SiteOps teams.
Documentation
- Maintain and update documentation related to system architecture, processes, and best practices.
On-call Support
- Participate in an on-call rotation schedule to provide 24/7 support for production systems.
About You
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Site Reliability Engineer / DevOps leader or in a similar role.
- In-depth knowledge of cloud computing platforms (e.g., AWS, Azure, GCP).
- Strong leadership and communication skills.
- Proficiency in programming/scripting languages (e.g., Python, Shell, Ruby).
- Proficiency in system administration of production servers.
- Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with infrastructure as code (e.g., Terraform, Ansible).
- Expertise in build and release strategies, with the ability to implement the right strategy for the team.
- Expertise in monitoring and logging tools (e.g., Datadog, Zabbix, Prometheus, Grafana).
- Solid understanding of networking concepts and protocols.
- Understanding of security best practices in DevOps processes.
- Excellent problem-solving skills.
- Ability to work independently and collaboratively in a team environment.
About Us
NinjaOne unifies IT to simplify work for nearly 40,000 customers in 140+ countries.
The NinjaOne Unified IT Operations Platform delivers endpoint management, autonomous patching, backup, and remote access in a single console to improve efficiency, increase resilience, and reduce spend. By automating IT and managing all endpoints, organizations give employees a great technology experience at work.
NinjaOne is obsessed with customer success and has retained a 98% customer satisfaction score for more than 5 years.
What You’ll Love
- We are a collaborative, kind, and curious community
- We prioritise your work/life balance, offering a hybrid work environment and free in-office lunches throughout the week
- We reward your work with opportunities for growth and advancement
- Grow personally and together with one of the fastest-growing companies globally
- Develop your skills through our renowned training platform
- Receive competitive compensation
- Collaborate with an amazing international workforce
Additional Information
This position is NOT eligible for visa sponsorship.
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, genetic information, marital status, veteran status, or any other status protected by applicable law. We are committed to providing an inclusive and diverse work environment.