INTERNSHIP DETAILS

Cloud and AI System Intern

Company: Intel Corporation

Location: Minhang District

Work Mode: On Site

Posted: April 24, 2026

Internship Information

Core Responsibilities

The intern will research system reliability, specifically focusing on silent data error characterization and mitigation for AI and general-purpose compute platforms. Responsibilities include analyzing platform telemetry, designing fault injection experiments, and developing tools to improve data integrity across the hardware and software stack.

Internship Type

Full time

Company Size

109101

Visa Sponsorship

Language

English

Working Hours

40 hours


About The Company

Our mission is to shape the future of technology to help create a better future for the entire world. That’s the power of Intel Inside. With more ingenuity and creativity inside, our work is at the heart of countless innovations. From major breakthroughs to things that make everyday life better, they’re all powered by Intel technology. With a career at Intel, you can help make the future more wonderful for everyone.

Need help or have a support question? Visit Intel Support: http://ms.spr.ly/6054tmaop

About the Role

Job Details:

Job Description:

In this position, you will work with a system reliability research team focusing on RAS (Reliability, Availability, Serviceability) and silent data error (SDE) characterization and mitigation on AI and general-purpose compute platforms, including heterogeneous systems (CPU + GPU/accelerators) and large-scale server clusters. You will help design and run experiments under representative AI training/inference and cloud workloads, analyze fleet-scale logs/telemetry, and prototype detection/diagnosis methods to improve end-to-end data integrity and platform robustness across the HW/FW/OS/runtime stack.
Your responsibilities will include but not be limited to:
-Collect, clean, and analyze platform telemetry / error logs from CPU servers and accelerator-enabled nodes (e.g., memory/DDR/HBM, storage, interconnect, PCIe/CXL, fabrics) to identify error signatures and failure patterns.
-Design and execute fault injection, stress tests, or workload-driven experiments to reproduce silent data corruption scenarios for AI training/inference and general compute workloads, and validate hypotheses.
-Research and analyze in-field scan and lockstep mode features (coverage, limitations, trigger conditions, and impact on AI/CPU workloads), and help evaluate how they can be leveraged to improve silent error detection and data integrity in production.
-Research and analyze Silicon Lifecycle Management (SLM) solutions, and integrate them with platform telemetry to enable in-field health monitoring, degradation/trend analysis, and proactive reliability improvements for AI/CPU platforms.
-Develop scripts/tools (Python preferred) to automate data processing, experiment orchestration, and report generation; build dashboards or repeatable pipelines when needed.
-Study and evaluate mitigation techniques for AI + CPU platforms (e.g., ECC/CRC/EDAC, scrubbing policies, retry/recovery, checkpoint/restart, end-to-end checks at data/communication boundaries) and quantify effectiveness vs. performance/cost impact.
-Collaborate with cross-functional teams (HW, FW, OS, driver/runtime, datacenter operations) to trace error propagation paths and drive actionable improvements; document findings and present progress regularly.

Qualifications
Preference will be given to candidates who are interested in system reliability / data integrity research on AI and general-purpose compute platforms. Qualifications include, but are not limited to:

-PhD students (CS/CE/EE/Math/Statistics or related majors).

-Solid programming skills in Python; experience with Linux and basic scripting; familiarity with GitHub Copilot is a plus.

-Strong data analysis skills; experience with pandas/numpy/matplotlib, SQL, or log analytics is a plus.

-Basic understanding of computer architecture and systems (memory hierarchy, storage, networking) is preferred; familiarity with RAS concepts (ECC, CRC, parity, scrubbing, checkpoints) is a plus.

-Understanding of the AI system stack is a plus: GPU/accelerators, driver/runtime, distributed training/inference, communication collectives, data pipelines, and performance/reliability trade-offs.

-Good Mandarin and English communication skills, both verbal and written, are required.

-Research mindset: ability to form hypotheses, design experiments, and write clear technical reports.

Job Type:

Student / Intern

Shift:

Shift 1 (China)

Primary Location:

PRC, Shanghai

Additional Locations:

Business group:

The Sales and Marketing Group (SMG) leverages the product portfolio to drive Intel's revenue growth and market expansion, blending strategic initiatives with dynamic sales efforts to capture and retain customers. SMG is responsible for empowering the sales force with tools and insights needed to close deals and build lasting customer relationships. Sales analytics and market research ensure strategies are both targeted and impactful. In SMG, disciplined execution, creativity, and ambition are celebrated, providing ample opportunities for career advancement and skill development.

Posting Statement:

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Position of Trust

N/A

Work Model for this Role

This role will require an on-site presence. * Job posting details (such as work model, location or time type) are subject to change.

ADDITIONAL INFORMATION: Intel is committed to Responsible Business Alliance (RBA) compliance and ethical hiring practices. We do not charge any fees during our hiring process. Candidates should never be required to pay recruitment fees, medical examination fees, or any other charges as a condition of employment. If you are asked to pay any fees during our hiring process, please report this immediately to your recruiter.

Key Skills

Python, Data Analysis, Computer Architecture, Linux, System Reliability, Telemetry Analysis, Fault Injection, AI Training, Inference, Pandas, NumPy, Matplotlib, SQL, RAS Concepts, Silicon Lifecycle Management