Hadoop and Big Data Administrator

Engineering Requisition ID 5000 Pune, India

Description

We are seeking a highly motivated Hadoop Admin or Big Data Stack Admin to work on Qualys’ next-generation data platform. Working alongside a talented team of engineers and architects, you will be responsible for running and managing our big data stack in production and supporting a highly scalable, SaaS-based cloud security data platform. This is a great opportunity to be an integral part of a team building Qualys’ next-generation, microservices-based technology platform that processes over 100 million transactions and terabytes of data per day, to leverage open-source technologies, and to work on challenging, business-impacting projects.
 
Responsibilities 

  • Implement and perform ongoing administration of the overall Hadoop infrastructure.
  • Own the day-to-day running of Hadoop clusters.
  • Work closely with the database, network, BI, and application teams to ensure that all big data applications are highly available and performing as expected.
  • Handle capacity planning and estimate requirements for scaling the Hadoop cluster up or down.
  • Enable appropriate levels of security across the Hadoop ecosystem.
  • Tune the performance of Hadoop clusters and the surrounding ecosystem.
  • Monitor and improve the performance of Hadoop cluster jobs.
  • Align with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop, and to expand existing environments.
  • Maintain clusters, including adding and removing nodes, using Ambari or similar tools and CI/CD pipelines.
  • Monitor Hadoop cluster connectivity and security.
  • Strengthen security for data at rest and data in motion.
  • Manage and review Hadoop log files and refine retention policies.
  • Manage and monitor the file system.
  • Partner with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
  • Collaborate with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
  • Perform backup and recovery tasks.
  • Manage resources and security.
  • Apply a solid understanding of OS concepts, process management, and resource scheduling, plus the basics of networking, CPU, memory, and storage.
  • Use strong shell scripting skills for automation and routine health checks (see the sketch after this list).
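
Much of the day-to-day automation in this role comes down to small health-check scripts. Below is a minimal, illustrative sketch (not a Qualys tool) of the kind of scripting involved: it shells out to the standard hdfs CLI to check safe mode, summarize DataNode status, and spot-check filesystem integrity. It assumes the hdfs binary is on PATH and that the caller is already authenticated to the cluster.

    #!/usr/bin/env python3
    """Minimal HDFS health-check sketch (illustrative, not a Qualys tool).
    Assumes the hdfs CLI is on PATH and the caller is already
    authenticated (e.g., holds a valid Kerberos ticket)."""
    import subprocess
    import sys

    def run(cmd):
        # Capture stdout of a Hadoop CLI command; raise on non-zero exit.
        return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

    def main():
        # Refuse to proceed if the NameNode is in safe mode.
        safemode = run(["hdfs", "dfsadmin", "-safemode", "get"])
        if "Safe mode is ON" in safemode:
            sys.exit(safemode.strip())

        # Summarize capacity and live/dead DataNodes from the admin report.
        report = run(["hdfs", "dfsadmin", "-report"])
        for line in report.splitlines():
            if line.startswith(("Live datanodes", "Dead datanodes", "DFS Used%")):
                print(line)

        # fsck prints a HEALTHY/CORRUPT verdict; it can exit non-zero on
        # corruption, so don't raise here.
        fsck = subprocess.run(["hdfs", "fsck", "/"], capture_output=True, text=True)
        healthy = "is HEALTHY" in fsck.stdout
        print("fsck verdict:", "HEALTHY" if healthy else "inspect fsck output")

    if __name__ == "__main__":
        main()

In production, a script like this would typically feed a monitoring or alerting system rather than print to stdout.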

 

Who you are... 
 

  • A trendsetter. You thrive in an intellectually challenging environment with leading-edge technologies. 
  • A teacher. You’re able to mentor your peers and help our team grow. 
  • A learner. You have an insatiable thirst for knowledge and greater understanding. 
  • A pragmatist. Your goal is to create useful products, not build technology for technology’s sake. 
  • An empath. You understand what the customer needs and use that perspective to create the best user experience. 

 

How we work... 
 

  • With others. We have remote pairing tools where engineers often work together, as well as virtual collaboration tools for building out architecture solutions. 
  • With transparency. We encourage open dialog and discourse. We don’t encourage silos. 
  • With agility. We don’t believe in following a process for process’s sake. We ship frequently and focus on delivering incremental value. 
  • With open minds. We are committed to building a diverse team of people with unique perspectives. This encourages a healthy and inclusive environment that builds a more sustainable, successful company. 
  • With pride. We value our people most of all. We invest in ourselves by applying our own strengths and interests to company needs. 

 
Qualifications 
 

Must have 

  • BS/MS degree in Computer Science, Applied Math or related field. 
  • 5+ years of hands-on experience managing big data clusters built on Hadoop, HDFS, Hive, and Spark.
  • 4-6 years of experience with administrative duties: maintenance, control, and optimization of Hadoop capacity, performance tuning, security, configuration, process scheduling, and error handling.
  • 4-6 years of experience with big data event-flow pipelines.
  • Knowledge of cluster-monitoring tools such as Ambari and Cloudera Manager (a small Ambari API sketch follows this list).
  • Experience planning and implementing Backup, Archival and Recovery (BAR) and High Availability (HA).
  • Experience planning and supporting hardware and Hadoop software installation, upgrades, design, configuration, monitoring, and performance tuning of large-scale Hadoop production environments, preferably with Apache, Hortonworks, or Cloudera distributions.
  • Experience implementing standards and best practices to manage and support data platforms.
  • Management of data, users, and job execution on the Hadoop system.
  • Knowledge of file system management and monitoring, and of shell scripting.
  • Excellent knowledge of CentOS, Ubuntu, and Red Hat Linux.
  • Experience with capacity planning and estimating requirements for a Hadoop cluster.
  • Prior experience with RDBMS and NoSQL databases.
  • Strong analytical and debugging skills.
  • Proven expertise in data-driven critical thinking and problem solving.
  • Demonstrated passion for innovation and the ability to work within agile software development methodologies.
  • Ability to clearly articulate and communicate technical concepts within and across teams.
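
As one concrete illustration of the monitoring-tool knowledge listed above, the sketch below polls the standard Ambari v1 REST API for per-host health. It is illustrative only: the server address, credentials, and cluster name are placeholders, not Qualys values.

    #!/usr/bin/env python3
    """Sketch: polling the Ambari v1 REST API for per-host health.
    The endpoint and fields are standard Ambari; the server, credentials,
    and cluster name below are placeholders."""
    import requests

    AMBARI = "http://ambari.example.com:8080/api/v1"  # placeholder server
    AUTH = ("admin", "admin")                         # placeholder credentials
    HEADERS = {"X-Requested-By": "ambari"}            # header Ambari expects

    def host_health(cluster):
        # List every host in the cluster with its Ambari health status.
        url = f"{AMBARI}/clusters/{cluster}/hosts"
        params = {"fields": "Hosts/host_status"}
        resp = requests.get(url, auth=AUTH, headers=HEADERS,
                            params=params, timeout=10)
        resp.raise_for_status()
        for item in resp.json().get("items", []):
            host = item["Hosts"]
            print(host["host_name"], host["host_status"])

    if __name__ == "__main__":
        host_health("my_cluster")  # placeholder cluster name

Cloudera Manager exposes a comparable REST API, so the same pattern carries over.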

 

Preferred 

  • Experience with disaster recovery (DR) strategies and principles.
  • Experience deploying and managing Hadoop clusters on public cloud providers.
  • Experience with infrastructure orchestration and automation tools such as Terraform.
  • Experience working with ML models and their deployment.
  • Experience with monitoring via Prometheus, Grafana, Alertmanager, and Kibana.
  • Hands-on experience with Linux administration.
  • Hands-on experience using CI/CD pipelines.

EEO Employer/Vet/Disabled