DevOps Engineer, Monitoring
Sizmek is the largest independent buy-side advertising platform that creates impressions that inspire. In the digital world, creating impressions that inspire is vital to building meaningful, long-lasting relationships with your customers. Sizmek provides powerful, integrated solutions that enable data, creative, and media to work together for optimal campaign performance across the entire customer journey. Our AI-driven decisioning engine can identify robust insights within data across the five key dimensions of predictive marketing—campaigns, consumers, context, creative, and cost. We bring all the elements of our clients’ media plans together in one place to gain better understanding for more meaningful relationships, make every moment of interaction matter, and drive more value across the entire plan. Sizmek operates its platform in more than 70 countries, with local offices in many countries providing award-winning service throughout the Americas, EMEA, and APAC, and connecting more than 20,000 advertisers and 3,600 agencies to audiences around the world.
The Monitoring Engineer has responsibility for the creation and refinement of Monitoring Service policies, processes and procedures for application, system software and infrastructure monitoring of Sizmek production systems and infrastructures.
The Monitoring Engineer is responsible for:
- Operational excellence of new product releases, focusing on monitoring.
- Overseeing best practice in the implementation of rigorous monitoring procedures covering the full product lifecycle.
- Producing new, and maintaining existing, automated functional and non-functional monitoring scripts and procedures.
- Working closely with, and providing consultancy, outreach and training services to, the NOC team.
- Working closely with, and providing consultancy, outreach and training services to, the R&D and DevOps teams around the business and operations in the use of various monitoring tools and the development of effective monitoring, to ensure that only fully monitored releases are deployed to live.
The Monitoring Engineer will achieve this through collaboration with the other members of the Deployment and NOC Teams. This role reports to the Deployment Team Leader.
Role Responsibility
- Oversee the daily requirements for support of the monitoring tools to provide monitoring for critical production applications, and interface with development and operations teams to identify and fulfill monitoring requirements.
- Troubleshoot monitor issues, build new monitors, gather requirements, provide instruction to users of the monitoring tools, adjust thresholds as needed and occasionally build custom URL, web service, SQL or script monitors.
- Maintain health of the environment by keeping the environment current with upgrades and patches and by troubleshooting and resolving issues with the associated tools.
- Write scripts (PowerShell, Bash, Perl, etc.) to execute monitoring tasks that are outside the scope and capabilities of the monitoring systems.
- Participate in and support troubleshooting and analysis processes when the dimensioning of services or unexpected system behavior impacts the quality of the services offered by the company.
- Ensure that Monitoring policies tie into the wider Monitoring remit, and work to join the development, operations and usage lifecycles as seamlessly as possible, enabling end-to-end monitoring of the entire product lifecycle chain and service.
- Work with Operations and development teams to understand improvement, new feature and enhancement requirements for the monitoring development teams to implement.
- Champion Monitoring processes and tools. Ensure that Monitoring principles, processes and tools are established and adhered to.
- Provide a robust service for monitoring products deploying onto the platform. Utilise this service to validate that all expected KPIs, events, alerts, actions, documentation and trend analysis graphs are fit for purpose for the product and infrastructure. Work with development and DevOps teams to ensure alert thresholds are fit for purpose and do not generate false positives or spam alerts / events.
- Participate in defining and managing quality assurance processes and procedures.
- Analyse the technical architecture of systems and applications to understand dependencies, points of failure, impacts, and external and internal interfaces, in order to provide monitoring recommendations for systems and infrastructure, along with time estimates.
- Provide outreach activities to Development and DevOps teams to diagnose and facilitate resolution of monitoring bugs/issues.
- Be the main point of contact for all monitoring communication, ensuring that all relevant teams are aware of any monitoring developments, releases and issues so that expectations are managed.
The Ideal Candidate
Essential
- The ability to analyze and understand complex technical systems built from many separate software and hardware components, specifically web application architectures.
- An extremely good technical understanding of working within a connected and distributed environment model of multiple applications and the issues and constraints this places on the physical environment.
- Strong knowledge of application and system software monitoring principles and practice. Some knowledge of infrastructure monitoring principles and practice.
- The ability to work effectively with 3rd party and internal service providers who manage the application, system software and production or distribution infrastructure.
- Strong experience in using internal and external, open source and commercial monitoring tools for applications, system software and infrastructure (Microsoft SCOM, CA SOI/Spectrum, Nagios, Cacti, Grafana, CatchPoint, NewRelic), together with setting of KPIs and a good understanding of how the monitoring tools are used by operations.
- Proficiency with Unix/shell scripting, Perl or Python, and PowerShell.
- Knowledge of requirements and best practices for systems management support of Windows and Unix.
- Working knowledge of TCP/IP, SNMP management protocols, NetFlow, telemetry, SSH and DNS.
- Experience working with and monitoring web applications, Linux and Windows servers, load balancers, switches, routers, MSSQL, PostgreSQL, Hadoop and Couchbase databases, Java, .NET, Apache web services.
- Experience in a medium/large Internet environment.