INTERNSHIP DETAILS

Hunyuan Multimodal Reinforcement Learning Research Intern

Company: Tencent
Location: Palo Alto
Work Mode: On Site
Posted: March 10, 2026
Internship Information
Core Responsibilities
The role involves research on reinforcement learning (RL) algorithms for multimodal models, including diffusion and autoregressive models, and the design of RL infrastructure and reward modeling strategies for efficient large-scale training. It also includes exploring next-generation RL paradigms that learn effectively from environment feedback.
Internship Type
Full-time
Salary Range
$80,168 - $124,800
Company Size
89,043
Visa Sponsorship
No
Language
English
Working Hours
40 hours per week

About The Company
Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life of people around the world. Founded in 1998 with its headquarters in Shenzhen, China, Tencent's guiding principle is to use technology for good. Our communication and social services connect more than one billion people around the world, helping them to keep in touch with friends and family, access transportation, pay for daily necessities, and even be entertained. Tencent also publishes some of the world's most popular video games and other high-quality digital content, enriching interactive entertainment experiences for people around the globe. Tencent also offers a range of services such as cloud computing, advertising, FinTech, and other enterprise services to support our clients' digital transformation and business growth. Tencent has been listed on the Stock Exchange of Hong Kong since 2004.
About the Role

What the Role Entails

Responsibilities:

1. Conduct research on RL algorithms for multimodal models, including diffusion models for image, video, and 3D generation, autoregressive models for multimodal understanding, and potentially unified multimodal frameworks.

2. Design and develop RL infrastructure and reward modeling strategies to enable efficient large-scale training, improve training stability, and mitigate reward hacking and related failure modes.

3. Explore next-generation RL paradigms that more directly and effectively learn from environment feedback.

Who We Look For

Requirements:

1. Currently enrolled as a PhD student in Computer Science or a closely related field.

2. Demonstrated strong research capability, with publications in top-tier conferences such as ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, and SIGGRAPH.

3. Strong hands-on programming skills, with solid experience in deep learning system implementation, model training and inference optimization, CPU/GPU acceleration, and distributed training and inference.

4. Prior experience with diffusion models, autoregressive models, and/or text-to-image or text-to-video generation is highly preferred.

5. Participation in competitive programming contests such as ACM-ICPC or NOIP is a strong plus.

Location State(s)

US-California-Palo Alto

The expected base pay range for this position in the location(s) listed above is $80,168.40 to $124,800.00 per year. Actual pay may vary depending on job-related knowledge, skills, and experience. This position will be eligible for 1 hour of paid sick leave for every 30 hours worked and up to 13 paid holidays throughout the calendar year. Subject to the terms and conditions of the applicable plans then in effect, full-time interns are also eligible to enroll in the Company-sponsored medical plan.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Key Skills
Reinforcement Learning, Multimodal Models, Diffusion Models, Autoregressive Models, Reward Modeling, Deep Learning System Implementation, Model Training Optimization, Inference Optimization, CPU/GPU Acceleration, Distributed Training, Image Generation, Video Generation, 3D Generation
Categories
Science & Research, Engineering, Software, Data & Analytics
Benefits
Paid Sick Leave, Paid Holidays, Medical Plan