Senior SRE Engineer
Description
Company Overview:
Global Technology Services is a rapidly expanding organization based in Medellín, Colombia. We pride ourselves on having one of the most influential networks in software development and IT services for the entertainment, financial, and logistics sectors. Our projected growth offers many opportunities for professionals to advance their careers. Joining our team means working with large engineering teams across Latin America, the Philippines, and the United States, and contributing to cutting-edge developments in multiple industries.
We are seeking a highly experienced Senior SRE Engineer to lead the reliability, scalability, and evolution of a high-traffic, AI-driven production environment. This role sits at the intersection of traditional Site Reliability Engineering and modern LLMOps, supporting next-generation AI-powered systems and agentic workflows.
Position: Senior SRE Engineer
Location: LATAM
What you will be doing:
You will be responsible for ensuring platform stability, optimizing performance, and implementing secure, scalable infrastructure across a complex, cloud-first ecosystem. This includes managing AI infrastructure, large-scale data systems, and distributed application architectures.
Required Skills & Experience
- 5+ years of experience in SRE, DevOps, or Platform Engineering
- 3+ years managing high-scale AWS production environments
- Strong scripting skills in Python
- Proven experience with Infrastructure as Code (Terraform, CDK, or similar)
- Demonstrated ownership of high-availability systems (99.9%+ uptime)
Nice-to-Have Skills
- Familiarity with AI/ML systems, LLM APIs, or agent-based architectures
- Experience with large-scale data migrations and lakehouse architectures
- Familiarity with AI security risks (prompt injection, model misuse)
- Experience with CDN optimization and edge security
- Background in high-traffic eCommerce or distributed systems
- AI Fluency: Understanding of modern AI infrastructure and operational constraints
- Automation Mindset: Focus on reducing manual effort through tooling and systems design
- Familiarity with Node.js
Soft Skills
- Problem Solving: Ability to diagnose and resolve complex system issues quickly
- Communication: Ability to explain technical concepts to cross-functional stakeholders
- Ownership: Strong accountability for system performance and reliability