Sr Data Scientist [India]

Engineering - India Mumbai, India or Remote, India


Description

Job Description

We are looking for an experienced engineer to join our data science team, who will help us design, develop, and deploy machine learning models in production. You will develop robust models, prepare their deployment into production in a controlled manner, while providing appropriate means to monitor their performance and stability after deployment.


What You’ll Do will include (But not limited to):

  • Preparing datasets needed to train and validate our machine learning models
  • Anticipate and build solutions for problems that interrupt availability, performance, and stability in our systems, services, and products at scale.
  • Defining and implementing metrics to evaluate the performance of the models, both for computing performance (such as CPU & memory usage) and for ML performance (such as precision, recall, and F1)
  • Supporting the deployment of machine learning models on our infrastructure, including containerization, instrumentation, and versioning
  • Supporting the whole lifecycle of our machine learning models, including gathering data for retraining, A/B testing, and redeployments
  • Developing, testing, and evaluating tools for machine learning models deployment, monitoring, retraining.
  • Working closely within a distributed team to analyze and apply innovative solutions over billions of documents
  • Supporting solutions ranging from rule-bases, classical ML techniques to the latest deep learning systems.
  • Partnering with cross-functional team members to bring large scale data engineering solutions to production
  • Communicating your approach and results to a wider audience through presentations

Your Qualifications:

  • Demonstrated success with machine learning in a SaaS or Cloud environment, with hands–on knowledge of model creation and deployments in production at scale
  • Good knowledge of traditional machine learning methods and neural networks
  • Experience with practical machine learning modeling, especially on time-series forecasting, analysis, and causal inference.
  • Experience with data mining algorithms and statistical modeling techniques for anomaly detection in time series such as clustering, classification, ARIMA, and decision trees is preferred.
  • Ability to implement data import, cleansing and transformation functions at scale
  • Fluency in Docker, Kubernetes
  • Working knowledge of relational and dimensional data models with appropriate visualization techniques such as PCA.
  • Solid English skills to effectively communicate with other team members

Due to the nature of the role, it would be nice if you have also:

  • Experience with large datasets and distributed computing, especially with the Google Cloud Platform
  • Fluency in at least one deep learning framework: PyTorch, TensorFlow / Keras
  • Experience with No–SQL and Graph databases
  • Experience working in a Colab, Jupyter, or Python notebook environment
  • Some experience with monitoring, analysis, and alerting tools like New Relic, Prometheus, and the ELK stack
  • Knowledge of Java, Scala or Go-Lang programming languages
  • Familiarity with KubeFlow
  • Experience with transformers, for example the Hugging Face libraries
  • Experience with OpenCV

About Egnyte

In a content critical age, Egnyte fuels business growth by enabling content-rich business processes, while also providing organizations with visibility and control over their content assets. Egnyte’s cloud-native content services platform leverages the industry’s leading content intelligence engine to deliver a simple, secure, and vendor-neutral foundation for managing enterprise content across business applications and storage repositories. More than 16,000 customers trust Egnyte to enhance employee productivity, automate data management, and reduce file-sharing cost and complexity. Investors include Google Ventures, Kleiner Perkins, Caufield & Byers, and Goldman Sachs. For more information, visit www.egnyte.com

#LI-Remote