(Remote, Latin America) Sr. Site Reliability Engineer, Big Data
Description
About PulsePoint:
PulsePoint is a fast-growing healthcare technology company (with adtech roots) using real-time data to transform healthcare. We help brands and agencies interpret the hard-to-read signals across the health journey and unify these digital determinants of health with real-world data to produce the most dimensional view of the customer. Our award-winning advertising platforms use machine learning and programmatic automation to seamlessly activate this data, making marketing, predictive analytics, and decision support easy and instantaneous.
Watch this video here to learn more about our culture and get a sense of what it’s like to work at PulsePoint!
Note that this is a long-term contractor role.
What you’ll be doing:
- Deploying, configuring, monitoring, and maintaining multiple big data stores across multiple datacenters, including the planning, configuration, deployment, and maintenance work relevant to the environment. Managing large-scale Linux infrastructure to ensure maximum uptime.
- Developing and documenting system configuration standards and procedures.
- Performance and reliability testing. This may include reviewing configuration, software choices/versions, hardware specs, etc.
- Advancing our technology stack with innovative ideas and new creative solutions.
Who are you:
- Collaboration is in your DNA. You enjoy contributing to a shared cause because you know that when the team succeeds, you succeed.
- You are always looking for ways to grow your skills. You are hungry to learn new technologies and share your insights with your team.
- You like a big-picture perspective but also enjoy digging into the fine details. You can think strategically, yet also dive into complex systems, break them down, and build them back better.
- You are a proactive problem solver. You are irked by an unreliable infrastructure and your first instinct is to find ways to fix it.
What you'll need:
- A multi-faceted understanding of Alluxio and Apache Hadoop (managed via GitOps), including Kerberos, for data storage, and of Trino and Hive for data retrieval.
- Experience managing Kafka clusters on Linux.
- Experience administering Percona XtraDB Cluster.
- Thorough understanding of Linux (we use Rocky Linux in production).
- Any scripting language (Python/Ruby/Shell etc).
- Understanding of basic networking concepts (TCP/IP stack, DNS, CDN, load balancing).
- Willing and able to work East Coast U.S. hours (9am-6pm ET).
Bonus, but not required:
- Experience with security best practices.
- Experience with the Puppet configuration management tool.
- Experience with scalable infrastructure monitoring solutions such as Icinga, Prometheus, Graphite, Grafana and ELK.
- Experience with container technologies such as Docker and Kubernetes.
- Ability to work with a Cassandra cluster, from installation through troubleshooting and maintenance.
- Experience training/mentoring junior-level staff.
- Experience in AdTech or High-Frequency Trading.
Selection Process:
- Initial Screening Call (30 mins)
- Team Lead Screening (30 mins)
- Technical Interview with SREs (1 hour)
- General Discussion with VP of Data Engineering (30 mins)
- Technical Interview with Principal Architect (1 hour)
- Meet & Greet w/ SVP of Engineering (15 mins)
- Final Video Call with Sr. Director of Data Management at WebMD (30 mins)
WebMD and its affiliates are Equal Opportunity/Affirmative Action employers and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.