Data Engineer

Job ID 2022-4492

Technology | New York, New York, United States


Description

Position at WebMD

WebMD is the most recognized and trusted brand of health information and the leading provider of health information services, serving consumers, physicians, healthcare professionals, employers and health plans through our public and private online portals and WebMD the Magazine. The WebMD Health Network includes WebMD, Medscape, MedicineNet, eMedicine, RxList, theheart.org and Medscape Education. Our consumer portals and mobile health applications provide engaging, relevant and credible health and wellness information, personalized health assessment tools and access to online communities.

WebMD is an Equal Opportunity/Affirmative Action employer and does not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.

Position Summary:

WebMD is looking for a Data Engineer with a diverse background in data integration to join the Data Management team. You don't need to be a doctor to work at WebMD, but we're looking for individuals who can diagnose and resolve data problems. Some of our data sets are small and some are very large (1 trillion+ rows); some are structured and some are not. Our data comes in all sizes, shapes and formats, from databases such as PostgreSQL, Oracle, SQL Server, Vertica, BigQuery, MongoDB, CockroachDB, Elasticsearch and Hive to file and table formats such as Parquet, ORC and Iceberg, to name a few.


We are looking for individuals who can design solutions to data problems using the different types of databases and technologies our team supports. We use MPP databases to analyze billions of rows in seconds. We use Spark and Iceberg, in batch or streaming mode, to process data however the need requires. We also use Trino to query many different data sources and files in place, without moving the data around.

Besides a competitive compensation package, you'll be working with a great group of technologists who care about finding the right database and the right technology for each job, in a culture that encourages innovation. If you're ready to step up and take on new technical challenges at a well-respected company, this is a unique opportunity for you.

Responsibilities:

  • Design and develop data projects against RDBMSs such as PostgreSQL
  • Work in and enhance our Vertica MPP data warehouse
  • Contribute to the WebMD Data Lake and Data Warehouse initiative using Hive, Spark, Iceberg and Presto/Trino
  • Implement ETL/ELT processes using the tools (Pentaho) and programming languages (Java, Python) at our disposal
  • Support reporting needs using Tableau, Pentaho BI and Excel
  • Analyze business requirements, design and implement required data models

Qualifications (must have):

  • BA/BS in Computer Science or a related field
  • 4+ years of experience with relational databases (RDBMS) such as Oracle, MSSQL or PostgreSQL
  • 1+ years of experience with MPP/distributed databases such as Vertica, Snowflake, Google BigQuery or Hive
  • Programming background in Python, Scala, Java or C/C++
  • Experience with Spark (PySpark, Spark SQL, Spark Streaming, etc.)
  • Proficiency with the Linux environment and shell scripting
  • Experience working in both OLAP and OLTP environments
  • Experience working on-prem, not just cloud environments

Desired (nice to have):

  • Experience in the Hadoop ecosystem: loading, storing and processing terabytes of data
  • Experience with Hadoop components such as Hive, Ranger, ZooKeeper and Sqoop
  • Experience with distributed query and data orchestration layers such as Presto/Trino and Alluxio
  • Experience with web analytics or business intelligence
  • Understanding of the ad stack and ad data (ad servers, DSM, programmatic, DMP, etc.)