Principal Data Scientist

Data Analysis United States


WebMD and its affiliates is an Equal Opportunity/Affirmative Action employer and does not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.

Position Overview

PulsePoint’s award-winning platforms accelerate data and programmatic technology to deliver contextually relevant and personalized health information. We help brands and agencies better understand audience engagement and are revolutionizing health decisions through real-time data.

As a member of our Data Science Engineering team, the Principal Data Scientist will focus on the following:

  • Optimizing real-time bidding strategies and auction mechanics to efficiently spend ad budgets and deliver against campaign targets.

In addition to the above, they will work with the greater Data Science/Engineering teams on:

  • Improving existing or developing new traffic segmentation algorithms and estimations of bid landscapes within each segment;
  • Supporting and enhancing the existing work on health user profiling, prediction, and targeting tools;
  • Contributing to projects relating to patient/physician identity for cross-device tracking, profiling and targeting;
  • Supporting the existing codebase for data integration and providing production support for our core models;
  • Improving page contextualizer technology: working with healthcare topic detection algorithms, keyword/phrase extraction, and general and aspect-based sentiment analysis.

These are the things that we'll be looking for from a candidate:

  • Exposure to RTB auctions or similar experience (note: please provide detail on this important requirement in your cover letter);
  • Advanced knowledge of Python using numpy & pandas;
  • Ability to optimize and speed up code;
  • Past experience managing a team.

In addition to the above, you'll need to have strong knowledge in the following areas (along with a breakdown of the topics we'd like you to have exposure to):

  • Algorithms and Data Structures--Sorting, search trees, binary heaps, tries; Time and memory complexity of algorithms.
  • Probability & Statistics--Markov processes and their stationary distributions; Stochastic matrices and properties of their eigenvalues; Bayesian inference and conjugate distributions; Two-sample hypothesis testing.
  • ML & DS--Dimensionality reduction; Geometry of PCA and SVD; Geometry of L1 and L2 regularisation (Why does L1 result in feature selection?); Decision Trees; Collaborative filtering; Thompson sampling; MCMC; Boosting (biases in boosted decision trees); Bagging.
  • Neural Networks--Embeddings; Encoders; Dropout; CNN, RNN; Internal covariate shift.