Data Scientist

Data Science Remote, United States


About O'Reilly Media

O’Reilly’s mission is “changing the world by sharing the knowledge of innovators.” For the last 40 years, we’ve been helping people learn new skills, track significant new technologies, and build careers in technology and business. This extends to our employees: we have a long and proud history of encouraging and enabling the people who work here to take advantage of O’Reilly’s resources and network to keep learning, take on new challenges, and build careers. O’Reilly Media has two main offices, in Boston and Sebastopol (San Francisco Bay Area), with a large portion of our employees working remotely.


About Your Team

Data drives much of what we at O’Reilly Media do, making the Data Science Team central to how the company operates and continues to grow its influence. We provide business analysis and reporting, build machine learning models, develop prototypes for data-intensive components of the platform, and maintain data systems. We are a distributed team working remotely from home offices, but we place a strong emphasis on teamwork and collaboration: an ideal candidate is self-motivated and enjoys working with others. Team members are encouraged to learn new skills and technologies as they relate to our mission. We believe work should be fun and friendly, and we hold dear our core belief in maintaining a healthy work-life balance.

About the Job

We have an opening for a Data Scientist who will perform a variety of analyses and data tasks to support core business capabilities such as marketing, sales and product improvements.

Both quantitative and qualitative aspects of analysis are important to this role. The Data Scientist’s primary responsibilities include pulling, cleaning, transforming and analyzing both internal and external data, as well as performing more advanced modeling and machine learning tasks. Some of our projects are based on requests from other departments; others are driven by larger data science team goals. Often we are asked to look into fairly open-ended problems, which requires a degree of creativity, curiosity and research skill.

Crucial to this role is seeing the bigger picture: not just following the instructions of a JIRA ticket or request, but realizing the implications and actionable potential of an analysis. The Data Scientist must understand what data we have – or need to obtain – in order to provide significant insights and value to our learning customers and internal organization. This demands creativity and drive to figure out how to get the data, and how to use the data to keep O’Reilly at the forefront of learning.

The Data Scientist works on problems that are core to the company’s mission. Our team is at the forefront of some of the business’s major challenges, such as understanding and modeling how users progress through content and developing metrics that describe their learning experience. This role offers a significant opportunity to positively impact the company through data-driven means.

Most of our analysis is performed in Python and SQL. Our primary data storage systems are based on Postgres, Redshift, MySQL, BigQuery and Google Cloud Storage. The Data Scientist will:

  • Extract, transform and analyze platform and customer data based on requests from product, sales, marketing and executive teams
  • Write code to automate processes for routine data reporting requirements
  • Undertake research to explore more open-ended data problems
  • Effectively present and communicate research results with coherent visualization and actionable recommendations
  • Work with product teams on platform features such as user-facing dashboards, personalization, topic organization, search and content discovery
  • Maintain our backend data ETL processes and automated reporting services
  • Drive the Data Science Team’s impact in the company by taking the initiative with data-driven solutions for company problems

About You

You should have a strong foundation in Python programming for data-related tasks, including experience with data science packages such as NumPy and scikit-learn, and strong SQL skills. The level of coding required will go beyond simple scripts and queries: we are always trying to automate the boring stuff with well-designed and efficiently engineered packages of code. Some knowledge of Git, Unix commands and big data / distributed computing solutions is highly valued.

Key Skills and Experience

  • 2+ years in data science / analytics, or equivalent
  • Excellent Python skills
  • Strong SQL skills
  • Personable and able to work with others in a team
  • Significant experience combining and transforming data
  • Experience with wrangling dirty data
  • Excellent critical thinking skills and the ability to develop creative solutions to challenging problems
  • Ability to create effective data visualizations
  • Excellent written, verbal, and presentation skills
  • Experience with scientific computing and machine learning, especially NumPy, SciPy and scikit-learn
  • Experience working with unstructured text data and performing NLP tasks
  • Comfortable working on a Linux server
  • Self-motivated and able to work independently when necessary



At O’Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.
