Senior Data Engineer
The Senior Data Engineer works within the ML/AI Group and is responsible for the ongoing development of the Data & Machine Learning Platform.
This role will participate in all aspects of the data application life cycle, including analysis, design, development, testing, production deployment and support.
The role will help formulate approaches to improve existing processes, identify opportunities, and present innovative solutions built on cutting-edge technologies. You will create big data accelerators to help deploy scalable solutions quickly.
The Senior Data Engineer will work with Data Architects and Data Scientists to evolve the Data/ML platform, while delivering both strategic and tactical projects that provide iterative value through impactful business outcomes. This individual should be highly self-motivated, with a strong problem-solving and analytical nature, a love of learning, a passion for insight, and a drive for results.
Duties and responsibilities
• Design and build data processing pipelines for structured and unstructured data using tools and frameworks in the big data ecosystem
• Architect, code, test and debug data applications
• Design and develop new systems and tools in multiple languages to support effective data ingestion and curation, including batch processing, scalable event frameworks, and streaming/real-time analytic pipelines
• Build data APIs and data delivery services that support critical operational and analytical applications for our internal business operations, customers and partners
• Implement and configure big data technologies as well as tune processes for performance at scale
• Design, develop and maintain conceptual, logical and physical data models for big data systems, delivered via schema-on-write and schema-on-read approaches
• Perform data analysis with business stakeholders, understanding business processes and structuring both relational and distributed data sets
• Build and provision analytical sandboxes for data scientists and analysts; work with the data science team to bring machine learning models into production
• Conduct timely and effective research in response to specific requests (e.g. data collection, summarization, analysis, and synthesis of relevant data and information)
• Apply DevOps practices to unit, functional, integration and regression test plans; communicate with QA and hand off data engineering test scripts to the QA team
• Evaluate, benchmark and integrate cutting edge open source data tools and technologies
• Monitor performance of the data platform and optimize as needed
• Work with machine learning and deep learning pipelines
• Work with diverse data, including images, text, video and audio
Qualifications
• Proficient understanding of distributed computing principles
• Development experience on Hadoop, including an understanding of ZooKeeper, Oozie and YARN
• Working knowledge of HDFS storage formats (such as Parquet and ORC, including ORC with Snappy compression)
• Familiarity with networking and network application programming, including HTTP/HTTPS, JSON, and REST APIs
• Experience with at least one scripting language (e.g., Python) and one object-oriented language (e.g., Java)
• Experience building ETL flows in Spark/PySpark; production experience is preferred
• Knowledge of SQL-on-Hadoop engines (e.g., Presto, PolyBase, Phoenix)
• Experience deploying with containers and virtual environments
• Ability to solve complex problems in a fast-paced environment with limited guidance
• An eye for quality and a willingness to do what is necessary to meet deadlines in a dynamic environment with frequently changing priorities
• Able to work efficiently in teams and/or as an individual
• Top-notch oral and written communication skills, especially the ability to make complex data and data science techniques accessible to non-technical team members
• B.S./M.S. in Computer Science or a related field, or equivalent experience
• Unix/Linux OS level knowledge with Bash/Shell scripting is a plus
• Data Analysis: Understand business processes, logical data models and relational database implementations
• Expert knowledge of SQL, including the ability to optimize complex queries
• Experience administering a Hadoop cluster and all of its services, with the ability to resolve performance and workload-management issues on the cluster
• Experience with writing Kafka producers and consumers is a plus
• Experience with Spark Streaming, Spark SQL in a production setting is a plus
A Great Employee:
• Exhibits courage
• Navigates ambiguity
• Relies on common sense and experience
• Leverages their educational background
• Communicates openly
• Likes to laugh
• Loves a good challenge
• Is resilient when faced with setbacks
• Teams well with others
• Insists on quality results
• Is forward-thinking
• Adapts easily to change
• Solves problems creatively
Hearst Magazines is an equal opportunity/affirmative action employer. All qualified applicants will receive consideration for employment without regard to sex, gender identity, sexual orientation, race, color, religion, national origin, disability, protected veteran status, age, or any other characteristic protected by law.
Pay Transparency Nondiscrimination Provision
The contractor will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor's legal duty to furnish information. 41 CFR 60-1.35(c)