Member of Technical Staff - Cloud Technical Support
Description
Job Title: Senior Site Reliability Engineer (Senior SRE)
ABOUT WIND RIVER
Wind River is a global leader delivering software for mission-critical intelligent systems. For over four decades, Wind River has powered billions of systems requiring the highest levels of security, safety, and reliability. Our software supports groundbreaking NASA missions such as Artemis I, the James Webb Space Telescope, multiple Mars rovers, and pioneering 5G initiatives.
ABOUT THE OPPORTUNITY
Wind River Systems is seeking a Senior Site Reliability Engineer (SRE) experienced in deploying, managing, and scaling highly available, secure, and resilient software services across multi-cloud (AWS, Azure, GCP) and on-premises environments. You will collaborate closely with developers, architects, and operations teams to enhance system reliability, automation, security, and overall platform performance.
RESPONSIBILITIES
Kubernetes and Container Orchestration:
- Deploy, manage, optimize, and troubleshoot large-scale Kubernetes clusters in multi-cloud (AWS, Azure, GCP) and hybrid environments (OpenStack, VMware vSphere).
- Implement cluster autoscaling and resource management strategies with tools such as Karpenter.
Cloud and Hybrid Infrastructure Management:
- Architect, implement, and manage infrastructure in multi-cloud (AWS, GCP, Azure) and hybrid environments.
- Optimize cloud resource usage with AWS Cost Explorer, Savings Plans, and equivalent tools from other cloud providers.
Monitoring, Observability, and Reliability:
- Develop and maintain comprehensive monitoring, logging, tracing, and alerting solutions using Prometheus, Grafana, CloudWatch, Datadog, or similar tools.
- Conduct root cause analysis (RCA) and implement proactive improvements to maximize system uptime, reliability, and performance.
CI/CD Pipelines and Automation:
- Design, implement, and maintain robust CI/CD pipelines using Jenkins, GitLab CI/CD, GitHub Actions, or Tekton.
- Promote and implement DevSecOps best practices across teams to automate testing, security scanning, and deployments.
Security, Compliance, and Governance:
- Integrate comprehensive security practices throughout the software lifecycle (DevSecOps), including vulnerability scanning and secure coding practices.
- Manage secrets securely using Vault, AWS Secrets Manager, Azure Key Vault, or similar tools.
- Ensure adherence to compliance standards and regulatory requirements.
Cost Optimization and Efficiency:
- Implement and enforce governance policies and frameworks to optimize infrastructure usage, reduce costs, and enhance operational efficiency.
- Regularly review and optimize cloud expenditure, performance, and scaling strategies.
Collaboration and Communication:
- Collaborate closely with architects, developers, QA, product teams, and management stakeholders.
- Clearly communicate complex infrastructure concepts and strategies to diverse stakeholders.
QUALIFICATIONS
- Bachelor's degree in Computer Science, Information Technology, or a related technical discipline (Master's preferred).
- 14+ years of experience as a Site Reliability Engineer, DevOps Engineer, Platform Engineer, or in a similar role.
- Extensive expertise in Kubernetes, container orchestration, and the related ecosystem.
- Hands-on experience with cloud platforms (AWS, Azure, GCP), OpenStack, VMware vSphere, and hybrid environments.
- Proficiency in scripting and automation languages (Python, Bash, Go, or similar).
- Solid experience with infrastructure as code (Terraform, CloudFormation, Pulumi).
- Strong knowledge of CI/CD tools and pipeline design (Jenkins, GitLab CI/CD, GitHub Actions, Tekton).
- Exceptional troubleshooting and problem-solving skills, coupled with a proactive and continuous learning mindset.
PREFERRED QUALIFICATIONS
- Certifications in Kubernetes (CKA/CKAD/CKS), AWS (Solutions Architect, DevOps Engineer), Azure, or GCP.
- Familiarity with multi-cloud management tools and strategies.
- Background in software development or software infrastructure management.
Join our team at Wind River, contribute to building highly reliable, secure, and innovative software systems, and help shape the future of software-defined environments!
Wind River is an Equal Opportunity Employer with a commitment to diversity. We prohibit discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.