INTERNSHIP DETAILS

Summer Research Intern

Company
Abaka AI
Location
Mountain View
Work Mode
On Site
Posted
May 6, 2026
Internship Information
Core Responsibilities
Interns will design and construct high-quality datasets and benchmarks across areas like LLM reasoning, vision, video, and 3D/4D perception, while also evaluating various multimodal models on reasoning and temporal tasks.
Internship Type
Full-time
Salary Range
$25–$60 per hour
Company Size
346
Visa Sponsorship
No
Language
English
Working Hours
40 hours

About The Company
Abaka AI delivers human intelligence data for frontier AI, helping power advanced systems through a trustworthy data platform. Abaka is all you need when building AI, providing comprehensive data processing support across the entire AI data lifecycle. Its core services include high-quality off-the-shelf datasets, data cleaning, data collection, data annotation, model evaluation, and self-developed data annotation tools.
About the Role
We’re looking for Summer Research Interns to help build high-quality datasets, benchmarks, and evaluation pipelines across LLMs, vision, video, 3D/4D, multimodal reasoning, agentic systems, and world models.
In this role, you’ll work closely with our internal research team and external collaborators from the 2077AI Foundation, contributing to research artifacts that are actively used by leading AI labs and academic groups. This internship is ideal for students passionate about evaluation science, dataset construction, and applied AI research at scale.
 
Responsibilities
  • Design and construct high-quality datasets and benchmarks for one or more of the following areas:
    • LLM reasoning and QA (graduate / PhD-level difficulty)
    • Vision and vision-language modeling
    • Video understanding, temporal reasoning, and multimodal QA
    • 3D/4D perception, embodied AI, and spatial reasoning
  • Evaluate LLMs, VLMs, Video-LLMs, and multimodal models on reasoning, factuality, temporal understanding, and spatial tasks.
  • Develop and maintain evaluation pipelines, metrics, and quality-control criteria for expert-level data generation.
  • Analyze model outputs, build error taxonomies, conduct failure analysis, and summarize insights for internal reports and research papers.
  • Support research on long-context modeling, data efficiency, compression strategies, and benchmark standardization.
  • Contribute to open-source datasets, benchmarks, and public leaderboards in collaboration with the 2077AI Foundation.
 
Qualifications
  • Strong background in computer science, artificial intelligence, robotics, data engineering, or related fields.
  • Hands-on experience with machine learning or multimodal systems, including LLMs, vision models, or video models.
  • Proficient in Python; experience with PyTorch or similar frameworks.
  • Strong analytical reasoning skills and ability to reason about model behavior and data quality.
  • Excellent written and verbal English communication skills.
 
Preferred Qualifications
  • Experience with LLM or multimodal evaluation frameworks (e.g., LM Eval Harness, OpenCompass).
  • Background in computer vision, video understanding, or multimodal learning.
  • Experience with 3D/4D data pipelines, graphics, or robotics tools (e.g., Blender, COLMAP, PyTorch3D, Open3D).
  • Familiarity with NeRFs, Gaussian Splatting, SLAM, or embodied AI datasets and simulators.
  • Experience with video QA, action recognition, or long-context transformer models.
  • Relevant research experience or publications at top-tier conferences.
 
Compensation & Benefits
This is a paid internship, with a compensation range of $25–$60 per hour, depending on experience and qualifications. This will be an onsite internship based in our Palo Alto office.
Interns will work directly with experienced researchers, contribute to high-impact open-source benchmarks and datasets, and gain high-ownership experience shaping evaluation pipelines used by real AI teams. Exceptional performance may lead to future consideration for full-time opportunities.
Key Skills
Python, PyTorch, LLM Reasoning, Vision Modeling, Video Understanding, 3D/4D Perception, Embodied AI, Multimodal Reasoning, Dataset Construction, Benchmark Development, Evaluation Pipelines, Error Taxonomy, Long-Context Modeling, Agentic Systems, World Models
Categories
Science & Research, Software, Data & Analytics, Engineering
Benefits
Paid Internship, Future Consideration For Full-Time Opportunities