INTERNSHIP DETAILS

AI/ML Data Scientist Intern

Company: Command Post Technologies, Inc.
Location: Suffolk
Work Mode: On Site
Posted: May 8, 2026
Internship Information
Core Responsibilities
Develop and fine-tune large language models and implement retrieval-augmented generation pipelines. Curate high-quality datasets and build agentic AI workflows to support organizational objectives.
Internship Type
Full-time
Company Size
112
Visa Sponsorship
No
Language
English
Working Hours
40 hours

About The Company
We are Command Post Technologies, Inc. (CPT). Founded in 2008 and headquartered in Suffolk, VA, CPT has grown to a national scale, with a presence in Orlando, FL, Chicago, IL, and Norfolk, VA. CPT is a Service-Disabled, Veteran-Owned Small Business (SDVOSB), providing engineering services in the areas of Cyber Security, Software Development, Test & Evaluation, and Strategic Planning. Over the years, CPT has cultivated a dynamic work environment by developing a strong culture rooted in our core principles of integrity, determination, and innovation. In all of CPT’s collaboration efforts, our team prioritizes communication, accountability, and being resourceful in order to maximize efficiency and results. The diversity of our team is one of our greatest strengths. CPT consists of individuals with a proven record of their own wide-ranging accomplishments. These experiences include tactical and technical assignments serving in various elite units, in support of global contingency operations throughout the Afghanistan and Iraq theaters and elsewhere. Additionally, niche experience in compartmented special operations and intelligence activities allows us to provide in-depth operational and analytical support to strategic, operational, and tactical planning.
About the Role

Description

We are looking for a curious and driven AI/ML Data Scientist Intern to join our team in Suffolk, Virginia. This internship offers a hands-on opportunity for students or early-career professionals with a foundation in Computer Science to gain real-world experience in artificial intelligence, machine learning, and data science. You will work alongside experienced engineers and data professionals to build, fine-tune, and deploy machine learning models, construct retrieval-augmented generation pipelines, and curate high-quality datasets that support organizational objectives. 


What You’ll Do 

  • Assist in the development and fine-tuning of large language models using techniques such as LoRA to optimize model performance for specific use cases. 
  • Support the design and implementation of retrieval-augmented generation (RAG) pipelines to enhance AI-driven applications with relevant, contextual data. 
  • Curate, clean, and prepare datasets for training and evaluation, ensuring data quality and relevance across projects. 
  • Work with embedding models to convert text and documents into vector representations for search and retrieval systems. 
  • Develop and refine chunking strategies for processing large documents into manageable, semantically meaningful segments. 
  • Extract structured information from unstructured documents using automated document extraction techniques. 
  • Build and experiment with agentic AI workflows that enable autonomous task execution and decision-making. 
  • Contribute to front-end interfaces and internal tools using HTML, JavaScript, and React to support data visualization and model interaction. 
  • Document processes, experiments, and findings for internal knowledge sharing and reproducibility. 

Requirements

To be considered for this position, candidates must demonstrate foundational knowledge in the following areas: 

  • Linux Foundations – Basic understanding of Linux operating systems, including file system navigation, user management, permissions, and command-line operations. 
  • Python Basics – Foundational proficiency in Python programming, including the ability to write scripts, work with libraries, manipulate data structures, and debug code. 
  • Agentic AI – Familiarity with the concepts and architecture behind agentic AI systems, including how autonomous agents plan, reason, and execute multi-step tasks. 
  • Hugging Face – Experience navigating the Hugging Face ecosystem, including the ability to load pre-trained models, tokenizers, and datasets from the Hugging Face Hub. 
  • Dataset Curation – Understanding of how to source, clean, label, and organize datasets for machine learning training and evaluation purposes. 
  • LoRA Fine-Tuning – Knowledge of Low-Rank Adaptation (LoRA) techniques for efficiently fine-tuning large language models with reduced computational overhead. 
  • RAG Pipelines – Understanding of retrieval-augmented generation architecture, including how to connect language models with external knowledge sources to improve response accuracy. 
  • Document Extraction – Familiarity with techniques and tools for extracting structured data from unstructured documents such as PDFs, scanned images, and web pages. 
  • Chunking Strategies – Knowledge of methods for splitting large documents into smaller, semantically coherent segments optimized for embedding and retrieval. 
  • Embedding Models – Understanding of how text embedding models work and how they are used to represent documents as vectors for similarity search and retrieval applications. 
  • Basic Networking – Understanding of core networking concepts including IP addresses, subnetting, the OSI model, and the functional differences between Layer 2 and Layer 3 protocols. 
  • Azure Virtual Desktop Concepts – Familiarity with Azure Virtual Desktop components, including Host Pools, Workspaces, and Application Groups. 
  • HTML, JavaScript, React – Foundational knowledge of front-end web technologies, including the ability to read and understand HTML structure, JavaScript logic, and React component architecture. 

 

Nice to Have 

The following skills are not required but would strengthen your candidacy: 

  • Vector Databases – Experience working with vector database platforms such as Pinecone, Weaviate, or ChromaDB for storing and querying high-dimensional embeddings. 
  • LangChain or LlamaIndex – Familiarity with orchestration frameworks used to build applications powered by large language models. 
  • Prompt Engineering – Knowledge of techniques for crafting effective prompts to guide large language model behavior and improve output quality. 
  • MLOps and Model Deployment – Experience with tools and workflows for packaging, deploying, and monitoring machine learning models in production environments. 
  • Docker & Containerization – Basic understanding of container concepts and experience running applications in Docker or Kubernetes environments. 
  • Transformer Architectures – Understanding of the transformer model architecture, including self-attention mechanisms and how they power modern language models. 
  • Data Annotation and Labeling – Experience with data annotation workflows and labeling tools used to prepare supervised learning datasets. 
  • Evaluation Metrics for Generative AI – Knowledge of how to assess the quality of generative AI outputs using metrics such as BLEU, ROUGE, perplexity, or human evaluation frameworks. 
  • Cloud Platforms for ML Workloads – Exposure to cloud-based machine learning services on AWS, GCP, or Azure for training, hosting, and scaling models. 
  • Version Control Systems (Git) – Familiarity with Git workflows for managing code, collaborating with teams, and tracking project history. 
  • Microsoft Entra ID – Familiarity with Microsoft’s identity and access management platform for managing user authentication and permissions.
  • API Calls – Experience making and testing API calls using tools such as Postman, cURL, or similar utilities. 
  • Azure Services – Broader exposure to Azure services beyond the fundamentals, such as Azure Storage, Azure Networking, or Azure Active Directory. 
  • Node.js / .NET API – Experience building or consuming APIs using Node.js or the .NET framework. 
  • Azure Serverless Functions – Familiarity with event-driven, serverless computing in Azure for running lightweight backend processes. 
  • Visio or Other Drawing Application – Ability to create data flow diagrams, system architecture visuals, or workflow documentation using Microsoft Visio or comparable tools such as draw.io or Lucidchart. 

  

About us: We are Command Post Technologies, Inc. (CPT). CPT is a Service-Disabled, Veteran-Owned Small Business (SDVOSB), providing engineering services in the areas of Cyber Security, Software Development, Test & Evaluation, and Strategic Planning. CPT employees appreciate working in a caring environment that promotes a healthy work-life balance. As individuals, we come together as a team, supporting a culture rooted in our core principles of integrity, determination, and innovation. In all of CPT’s collaboration efforts, our team prioritizes communication, accountability, and being resourceful in order to maximize efficiency and results.


What’s In It for You

  • Leadership training
  • Career professional development
  • Work/Life balance
  • Rewards and recognition  

Command Post Technologies, Inc. (CPT) is a Service-Disabled Veteran-Owned Small Business (SDVOSB) founded in 2008 and headquartered in Suffolk, VA with personnel in various states including Virginia, Maryland, Florida, and Texas. With 2/3 of our staff being former military, CPT firmly believes in employing veterans. Command Post Technologies, Inc. is a unique provider of innovative solutions that enhance our corporate clients’ productivity and empower our government clients with the ability to protect against all enemies: foreign and domestic. CPT adapts its successful military experiential approach to the needs of leaders in a global business environment and provides an elite leadership curriculum that results in a world-class, leadership-altering event.


Command Post Technologies, Inc. (CPT) is an Equal Employment Opportunity and Affirmative Action employer. We consider applicants without regard to race, color, religion, age, national origin, ancestry, ethnicity, gender, gender identity, gender expression, sex, sexual orientation, marital status, veteran status, disability, genetic information, citizenship status, or membership in any other group protected by federal, state, or local law. We take Affirmative Action to ensure equal opportunities for employees and potential employees without regard to race, color, religion, age, national origin, ancestry, ethnicity, gender, gender identity, gender expression, sex, sexual orientation, marital status, veteran status, disability, genetic information, citizenship status, or membership in any other group protected by federal, state, or local law.


We abide by the Pay Transparency Nondiscrimination Provision and will refrain from discharging or otherwise discriminating against employees or applicants who inquire about, discuss, or disclose their compensation or the compensation of other employees or applicants. An exception exists where the employee or applicant makes the disclosure based on information obtained while performing his or her essential job functions.

Key Skills
Python, Large Language Models, LoRA Fine-Tuning, RAG Pipelines, Hugging Face, Agentic AI, Dataset Curation, Embedding Models, Document Extraction, Linux, React, JavaScript, HTML, Azure Virtual Desktop, Basic Networking, Vector Databases
Categories
Data & Analytics, Technology, Software, Engineering, Science & Research
Benefits
Leadership Training, Career Professional Development, Work/Life Balance, Rewards And Recognition