Data Analyst
Description
iSoftStone, Inc. is seeking Data Analysts to join our team in Menlo Park, CA!
This is an onsite contract position in Menlo Park, CA.
Summary:
Despite rapid advancements in generative AI, achieving high-quality generation remains a challenge. This is primarily due to the scarcity of high-quality training data and the lack of reliable and robust evaluation metrics that can effectively capture subtle details in model outputs, which can significantly impact user experience.
Our team is dedicated to developing comprehensive data curation and evaluation solutions to enhance our model across various quality dimensions. These include visual quality, prompt adherence, identity preservation, naturalness, and visual text generation, among others. We employ diverse approaches, such as sourcing billions of images and identifying suitable ones through a combination of manual annotations and signals from machine learning models. We also utilize both manual and automated evaluation methods to pinpoint quality gaps and data requirements.
Job Responsibilities:
Data Curation: Manage data labeling workflows, including data enqueueing for labeling, UI for labeling, and extracting labels into datasets for the modeling team.
Data Engineering (Pipelines): Maintain large-scale, efficient, and reliable data processing pipelines (billions of images). This includes data sourcing, running machine learning models to understand content, and using LLMs to clean data.
Data Engineering (Governance): Maintain our portfolio of datasets, ensuring governance of access, retention, and privacy compliance.
Additional Responsibilities:
Annotations:
Spend time manually annotating training data based on modeling team requirements.
Use LLMs and other models to annotate training data or to evaluate generated content, then audit the results to understand how well these models perform.
Analysis: Collaborate with engineers to identify and summarize model gaps based on evaluations. Utilize these findings to identify necessary data, and then mine and prepare that data for subsequent model training iterations.
Auditing: Scale validated evaluation protocols with PDO teams, including coordination and auditing. Also, audit and correct human-labeled data.
Minimum Qualifications:
Verbal and written communication skills, problem-solving skills, and interpersonal skills.
Attention to detail and an aptitude for experimental investigation.
Basic ability to work independently and manage one’s time.
Basic knowledge of Python and SQL.
Basic knowledge of computer vision and generative models.
Basic knowledge of data ETL workflows and pipelines.
Experience using LLMs for data labeling work.
Education/Experience:
Associate's degree or equivalent training required in Computer Science, Electronic Engineering, Physics, Bioinformatics, or other STEM subjects.
Prior industry experience in software development and testing and/or research experience in human-computer interaction is preferred.
Prior experience working at Meta is preferred.
Additional requirements:
Must be onsite in Menlo Park, CA, working with engineers.
Primary Location Pay Range: $27.00 - $32.00 per hour
Benefits:
1099 – No Benefits
Temp hourly employee benefits, if scheduled to work at least 30 hours per week: medical, dental, vision, 401k.
Temp salaried employee benefits, if scheduled to work at least 30 hours per week: medical, dental, vision, 401k, holidays.
iSoftStone is a global IT service and consulting company that creates value and drives success through technology solutions, service excellence, and digital innovation. We specialize in web and application development, software testing and support, data and content management, digital experience, accessibility, and data for machine learning and AI. With 20 delivery centers and more than 90,000 employees worldwide, iSoftStone is proud to serve some of the world’s most well-known businesses, including 90+ Fortune Global 500 companies.
iSoftStone is committed to the practice of equal opportunity for all its employees and applicants in employment, and does not discriminate on the basis of race or ethnicity, color, age, national origin, religion, creed, marital status, sex, pregnancy, gender, gender identity, sexual orientation, status as an honorably discharged veteran or disabled veteran or military status, political affiliation or belief, citizenship/status as a lawfully admitted immigrant authorized to work in the United States, or presence of any physical, sensory, or mental disability. In addition, reasonable accommodation will be made for known physical or mental limitations for all otherwise qualified persons with disabilities.