We are looking for a highly skilled Senior Data Collection Lead - AI Training with proven experience in managing large-scale data collection projects for AI/LLM training. This role involves leading a team of data collectors, defining data quality standards, and ensuring that the datasets we build are accurate, diverse, and compliant with ethical and legal requirements. You will work closely with ML engineers, researchers, and project managers to create and maintain high-quality training data pipelines that directly shape the performance of our language models.
Key Responsibilities:
· Lead, mentor, and manage a team of data collectors and annotators.
· Define and implement best practices for data gathering, curation, and quality control.
· Design scalable workflows for collecting, annotating, and cleaning large text datasets.
· Collaborate with ML engineers and researchers to align data strategy with model requirements.
· Ensure compliance with data privacy, copyright, and ethical AI standards.
· Track the data collection team's progress, assign tasks, and maintain project timelines.
· Review and audit datasets for accuracy, consistency, and completeness.
· Continuously improve processes by identifying new data sources, tools, and methodologies.
Qualifications:
· 3+ years of hands-on experience in data collection, annotation, or dataset management for AI/ML projects.
· Proven track record of leading or managing small-to-medium teams.
· Strong understanding of LLM training data needs (text diversity, balance, quality).
· Excellent organizational, project management, and leadership skills.
· Familiarity with tools for annotation, data cleaning, and workflow management.
· Strong written and verbal communication skills.
· Bachelor’s or Master’s degree in Computer Science, Data Science, Linguistics, or a related field (or equivalent experience).
Preferred Skills:
· Experience with Python, regex, or scripting for large-scale data processing.
· Knowledge of data governance, copyright law, and responsible AI practices.
· Prior experience in designing and scaling annotation or data-labeling pipelines.
· Multilingual background or experience working with cross-lingual datasets.
What We Offer:
· Leadership opportunity in a cutting-edge AI/LLM project.
· Chance to shape data practices that directly influence AI model performance.
· Competitive compensation and growth opportunities.
· Collaborative, mission-driven work environment.
More Information:
· Salary offer: based on experience (USD, per month).
How to Apply:
· Send your updated CV to [email protected] and mention the position in the subject line.