Senior Data Engineer

#4900   |   Engineering   |   New York, New York


Position at Corporate Technology

 

 

The Senior Data Engineer works within the ML/AI Group and is responsible for the ongoing development of the Data & Machine Learning Platform.  

This role will participate in all aspects of the data application life cycle, including analysis, design, development, testing, production deployment and support.

 

The role will help formulate approaches to improve existing processes, identify opportunities, and present innovative solutions built on cutting-edge technologies. You will create big data accelerators that help deploy scalable solutions quickly.

 

The Senior Data Engineer will work with Data Architects and Data Scientists to evolve the Data/ML platform while delivering both strategic and tactical projects that provide iterative value through impactful business outcomes. This individual should be highly self-motivated, with strong problem-solving and analytical skills, a love of learning, a passion for insight, and a drive to achieve results.

 

 

Duties and responsibilities

  Design and build data processing pipelines for structured and unstructured data using tools and frameworks in the big data ecosystem

  Architect, code, test and debug data applications

  Design and develop new systems and tools in a number of languages to facilitate effective data ingestion and curation, including batch processing, scalable event frameworks, and streaming and real-time data analytics pipelines

  Build data APIs and data delivery services that support critical operational and analytical applications for our internal business operations, customers and partners.

  Implement and configure big data technologies as well as tune processes for performance at scale

  Design, develop and maintain conceptual, logical and physical data models for big data systems through schema-on-write and schema-on-read delivery.

  Perform data analysis with business stakeholders, understanding business processes and structuring both relational and distributed data sets.

  Build and provision analytical sandboxes for data scientists and analysts. Work with the data science team to bring machine learning models into production

  Conduct timely and effective research in response to specific requests (e.g. data collection, summarization, analysis, and synthesis of relevant data and information)

  Apply DevOps practices to unit, functional, integration and regression test plans. Communicate with QA and port data engineering test scripts to the QA team

  Evaluate, benchmark and integrate cutting-edge open-source data tools and technologies

  Monitor performance of the data platform and optimize as needed

  Work with Machine Learning and Deep Learning pipelines

  Work with diverse data, including images, text, video and audio

 

Qualifications

  Proficient understanding of distributed computing principles

  Development experience on Hadoop, including an understanding of ZooKeeper, Oozie and YARN

  Working knowledge of HDFS storage formats (such as Parquet, ORC and ORC with Snappy compression)

  Familiarity with networking and network application programming, including HTTP/HTTPS, JSON, and REST APIs

  Experience with at least one scripting language (e.g., Python) and one object-oriented language (e.g., Java)

  Experience building ETL flows in Spark or PySpark; experience in a production setting is preferred

  Knowledge of SQL-on-Hadoop engines (Presto, PolyBase, Phoenix, etc.)

  Experience deploying with containers and virtual environments

  Ability to solve complex problems in a fast-paced environment with limited guidance

  An eye for quality and a willingness to do what is necessary to meet deadlines in a dynamic environment with frequent priority changes

  Ability to work efficiently both in teams and independently

  Top-notch oral and written communication skills, especially the ability to make complex data and data science techniques accessible to non-technical team members

  B.S./M.S. in Computer Science or a related field, or equivalent experience

 

Desired Skills:

  Unix/Linux OS level knowledge with Bash/Shell scripting is a plus

  Data Analysis: Understand business processes, logical data models and relational database implementations

  Expert knowledge of SQL, including optimizing complex queries

  Administration of a Hadoop cluster and all of its included services, with the ability to resolve performance and workload management issues on the cluster

  Experience with writing Kafka producers and consumers is a plus

  Experience with Spark Streaming and Spark SQL in a production setting is a plus

 

 

A Great Employee:

  Exhibits courage

  Navigates ambiguity

  Relies on common sense and experience

  Leverages their educational background

  Communicates openly

  Likes to laugh

  Loves a good challenge

  Is resilient when faced with setbacks

  Teams well with others

  Insists on quality results

  Is forward-thinking

  Adapts easily to change

  Solves problems creatively

 

 

 


Hearst Magazines is an equal opportunity/affirmative action employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected veteran status, age, or any other characteristic protected by law.

Pay Transparency Nondiscrimination Provision

The contractor will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor's legal duty to furnish information. 41 CFR 60-1.35(c)
