Sr. Site Reliability Engineer (U.S.)

Technology Newark, New Jersey


Description

Senior SRE Job Description

There's likely a reason you've taken the time out of your busy day to review this opportunity at PulsePoint. Maybe you're in need of a change or there's "an itch you're looking to scratch." Whatever the reason, ask yourself the following questions:

  • Do you want to join a company that takes pride in the work they do?

  • Do you want to work for a company that helps you navigate your career and invests in your development?

  • Is having an open and transparent leadership team important to you in your next role?

If you answered yes to the above questions, you are in luck: PulsePoint is hiring!

About PulsePoint:   

PulsePoint is a leading technology company that uses real-world data in real-time to optimize campaign performance and revolutionize health decision-making. Leveraging proprietary datasets and methodology, PulsePoint targets healthcare professionals and patients with an unprecedented level of accuracy—delivering unparalleled results to the clients we serve. The company is now a part of Internet Brands, a KKR portfolio company and owner of WebMD Health Corp.

Watch this video here to learn more about our culture and get a sense of what it’s like to work at PulsePoint! 

We are looking for a Senior Site Reliability Engineer to join our team. Note that this is a full-time employee position for U.S. candidates and we cannot accept C2C for the U.S.

Location:   

Anywhere in the U.S. and EU/EE, as long as you can work East Coast U.S. hours. 

What you'll be doing:
  • Ensuring the reliability and scalability of our multi-datacenter and hybrid Linux environments
  • Managing large-scale Linux infrastructure to ensure maximum uptime
  • Running performance and reliability testing, which may include reviewing configuration, software choices/versions, hardware specs, etc.
  • Advancing our technology stack with innovative ideas and creative new solutions
  • Participating in capacity management of core systems and services, application analysis, and performance and security tuning
  • Providing operational support of systems and building automation that remediates issues and addresses their root causes, with the goal of automating the response to all non-exceptional service conditions
  • Creating strategies for long-term, permanent fixes to critical production incidents
  • Maintaining documentation, building tooling, and creating alerts to both identify and address infrastructure reliability issues
  • Proactively identifying system anomalies
  • Collaborating with the security team on new initiatives and ongoing changes

Who are you:

  • Collaboration is in your DNA. You enjoy contributing to a shared cause, and you know that when the team succeeds, you succeed.
  • You are always looking for ways to grow your skills. You are hungry to learn new technologies and share your insights with your team.
  • You like a big picture perspective and also digging into the fine details. You can think strategy but also dive into complex systems, break them down and build them back better.
  • You are a proactive problem solver. You are irked by an unreliable infrastructure and your first instinct is to find ways to fix it.
  • You stay up to date with security best practices and implement them in everything you do.

 

What you’ll need:

 

  • Minimum 7 years of relevant experience  
  • Thorough understanding of Linux (we use CentOS and Rocky Linux in production)
  • Deep understanding of Puppet stack (roles & profiles, Hiera, PuppetDB)
  • Experience with Foreman 
  • You know what Git is and can easily resolve a merge conflict
  • Experience with Jenkins CI
  • Experience administering SQL/NoSQL databases (MySQL, PostgreSQL, MongoDB, Elasticsearch, Redis, Memcached)
  • Ability to work with Cassandra database clusters from installation through troubleshooting and maintenance.    
  • Experience with scalable infrastructure monitoring solutions such as Icinga, Prometheus, ELK, Graphite
  • Strong scripting and automation skills using languages like Ruby, Python, and Bash
  • Understanding of networking concepts (TCP/IP stack, DNS, PKI, CDN, load balancing)
  • Experience operating on-prem/bare-metal servers
  • Knowledge of virtualization solutions (KVM)
  • Storage configuration experience (NetApp, EMC)
  • Experience with container technologies such as Docker and containerd
  • Diverse experience with IT Security-related best practices in the SRE context
  • Willing and able to work East Coast U.S. hours (9am-6pm EST)

Bonus, but not required:

  • Knowledge of K8s and its ecosystem
  • Experience training and mentoring junior-level staff
  • Experience in AdTech or high-frequency trading
  • Hands-on experience with cloud platforms (AWS and GCP)

 

Selection Process:
1. Initial phone screen (30 mins)
2. Hiring manager video interview (1 hour)
3. Team video interview (40 mins each with 3-4 team members)
4. IB leadership video interview (30 mins)
There will also be an online assessment at some point in the process. We are still working on creating the assessment.

What we’ll give to you:
  • Comprehensive healthcare with medical, dental, and vision options, and 100%-paid life & disability insurance
  • 401(k) Match
  • Generous paid vacation and sick time
  • Paid parental leave & adoption assistance
  • Annual tuition assistance
  • Better Yourself Wellness program
  • Commuter benefits and commuting subsidy
  • Group volunteer opportunities and fun events
  • A referral bonus program -- we love hiring referrals here at PulsePoint
And there’s a lot more!

WebMD and its affiliates are Equal Opportunity/Affirmative Action employers and do not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.