Every Meta AI Model Explained: Llama, SAM, Emu, AudioCraft, SeamlessM4T and More — The Complete 2023–2026 Guide

Complete guide to every Meta AI model — Llama 1 through 3.3, Code Llama, SAM 2, ImageBind, Emu, MusicGen, SeamlessM4T — with comparisons, use cases, and key differences explained clearly.

Meta has quietly built one of the most consequential AI portfolios in the world — and given most of it away for free. Here is every model they have released, what each one does, and why the open-source strategy behind it changes everything.

Introduction

When most people think of Meta, they think of Facebook, Instagram, and WhatsApp. When AI researchers think of Meta, they think of something different: the company that sparked the open-source AI revolution, released a 405-billion-parameter model for free, built the tool that lets you isolate any object in any image with a single click, and published a model that understands six completely different types of sensory data simultaneously.

Meta's AI ambition runs deeper and wider than its social media reputation suggests. The company's AI research division — FAIR (Fundamental AI Research), founded in 2013 — has produced a stream of genuinely influential work that has shaped how the entire industry builds and deploys AI systems.

The thread connecting all of it is a strategic bet that Mark Zuckerberg has made personally and publicly: open-source AI is the right path forward. Where OpenAI and Anthropic build closed, proprietary systems, Meta releases model weights, reference code, and safety tools for anyone to download, modify, and, within the license terms, deploy commercially. That choice has made Llama, Meta's flagship language model family, one of the most widely used open-weight model families in the world, with over 350 million downloads and more than 100,000 companies running it across their products.

This guide covers every model Meta has released across all categories — language, vision, audio, translation, safety — and explains what each one is, what makes it significant, and where it fits in the broader picture.

Meta's AI Model Families at a Glance

Family	Type	Open Source	Key Capability
Llama 1 to 3.3	Language models	Yes	Text generation, reasoning, coding
Code Llama	Code-specialized LLM	Yes	Code completion, generation, 100K context
Llama Guard 1 to 3	Safety classifier	Yes	Content safety classification
Prompt Guard	Security classifier	Yes	Prompt injection detection
Emu	Image generation	Partial	Text-to-image, image editing, video
SAM and SAM 2	Image and video segmentation	Yes	Segment any object in images and video
ImageBind	Multi-modal embeddings	Yes (research)	Six-modality unified understanding
AudioCraft	Audio and music generation	Yes (MIT code, CC-BY-NC weights)	Music, sound effects, audio compression
SeamlessM4T	Speech and text translation	Yes (research)	Speech and text translation across nearly 100 languages
DINOv2	Vision foundation model	Yes	Self-supervised visual features
Meta AI Assistant	AI assistant product	No	Consumer AI across Meta platforms

Part One: The Llama Language Model Family

The Llama series is Meta's flagship contribution to the AI landscape and the foundation of the open-source LLM ecosystem. Understanding each generation requires knowing not just what changed but why each release mattered beyond the numbers.

The Complete Llama Lineup

Model	Released	Sizes	Context Window	Key Advance
Llama 1	February 2023	7B, 13B, 33B, 65B	2K tokens	First public LLM from Meta, research only
Llama 2	July 2023	7B, 13B, 70B (34B trained, not released)	4K tokens	Commercial license, RLHF, 2T tokens
Code Llama	August 2023	7B, 13B, 34B (70B added January 2024)	Up to 100K tokens	Code specialist, long-context inputs
Llama 3	April 2024	8B, 70B	8K tokens	15 trillion token training, new tokenizer
Llama 3.1	July 2024	8B, 70B, 405B	128K tokens	Frontier-level 405B, 16x context jump
Llama 3.2	September 2024	1B, 3B, 11B, 90B	128K tokens	First vision models, edge deployment
Llama 3.3	December 2024	70B	128K tokens	405B quality in a 70B model

Llama 1 — The Model That Started Everything

Released: February 2023
Sizes: 7B, 13B, 33B, 65B
Context: 2K tokens
License: Research only (non-commercial)
Training data: 1.0 trillion tokens (7B, 13B) or 1.4 trillion tokens (33B, 65B)

Before Llama 1, running a capable large language model required access to proprietary APIs or specialized hardware. Meta changed that equation by releasing model weights directly to researchers — the actual parameters of the model itself, not just an interface to query it.

The capability numbers were immediately striking. Llama 1's 13B model outperformed OpenAI's GPT-3 at 175B parameters on most benchmarks in Meta's evaluation, meaning a model more than thirteen times smaller achieved better results. This demonstrated something the field had suspected but not proven at this scale: a model trained longer on carefully selected data could match or beat a much larger model trained on less. The principle became foundational to everything that followed.

The data sources — CommonCrawl, C4, GitHub, Wikipedia, ArXiv, Stack Exchange, and books — were entirely publicly available. This meant researchers could understand exactly what the model had learned from, which was itself a significant advance in model transparency.

Llama 1 was research-only, but that did not stop it from transforming the field. Within about a week of release, the model weights were leaked online, which dramatically accelerated community access. Within months, fine-tuned derivatives, including Alpaca from Stanford, Vicuna from a UC Berkeley-led team, and dozens more, had demonstrated that a relatively small team with modest compute could produce genuinely useful instruction-following models by fine-tuning Llama 1. The open-source AI ecosystem as it currently exists traces directly to this moment.

Llama 2 — The Commercial Breakthrough

Released: July 2023
Sizes: 7B, 13B, 70B (a 34B model was trained but not released)
Context: 4K tokens
License: Meta custom commercial license (free for most companies)
Training data: 2 trillion tokens
Distribution partner: Microsoft

Llama 2 addressed the single largest limitation of its predecessor: commercial use. The new license allowed most businesses to deploy Llama 2 in production applications without royalties or fees, subject to Meta's usage policy and a threshold of 700 million monthly active users above which licensing terms differ.

The model itself improved substantially over Llama 1. Training data grew from 1.4 trillion to 2 trillion tokens, roughly 40% more, and the context window doubled from 2K to 4K tokens. Meta applied Reinforcement Learning from Human Feedback (RLHF) to the instruction-tuned variants, aligning the model's outputs more closely with what humans actually find helpful and appropriate. The Chat variants, released as Llama 2 Chat, became some of the most widely fine-tuned open models of their generation. The 70B model demonstrated performance competitive with GPT-3.5 on many evaluation categories.

Microsoft's involvement as primary distribution partner was strategically significant. Llama 2 became available through Azure AI immediately at launch, giving enterprise customers a path to run Meta's open model within infrastructure they already managed and trusted. AWS and Google Cloud followed, establishing the pattern of Llama availability across all major cloud platforms that continues today.

Code Llama — The Code Specialist

Released: August 2023 (70B added January 2024)
Based on: Llama 2
Sizes: 7B, 13B, 34B, 70B
Context: Trained on 16K-token sequences; inputs of up to 100K tokens supported
License: Same as Llama 2

Code Llama's defining feature is not just its code specialization but its long-context handling. The models were trained on 16K-token sequences and, by Meta's account, remain usable on inputs of up to 100K tokens, at a time when most available models were limited to 4K to 32K. One hundred thousand tokens is large enough to hold substantial portions of a codebase, entire application modules, or multiple files simultaneously in a single context. This made Code Llama genuinely useful for the kind of large-scale code understanding that code completion tools had previously handled poorly.

Three distinct variants serve different use cases. The base Code Llama handles code completion — filling in missing sections of code in the style and patterns of the surrounding context. The Instruct variant follows natural language instructions for coding tasks, suitable for building AI coding assistants that respond to developer requests. The Python-specialized variant concentrates its fine-tuning on Python specifically, producing stronger results for Python developers at the cost of breadth across other languages. Supported languages include Python, C++, Java, PHP, TypeScript, C#, and Bash.

The fill-in-the-middle capability, where the model receives the code before and after a gap and generates what belongs in between, was particularly strong and directly applicable to IDE integration scenarios. At launch it was supported by the 7B and 13B base and Instruct models rather than by every variant.
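To make the fill-in-the-middle idea concrete, here is a minimal sketch of how an editor plugin might arrange the text around the cursor before sending it to an infilling model. The sentinel strings `<PRE>`, `<SUF>`, and `<MID>` and both helper names are illustrative assumptions, not a statement of Code Llama's exact prompt format; check the model card for the tokens a given checkpoint expects. The model call itself is replaced by a hard-coded stand-in.

```python
def build_infill_prompt(prefix: str, suffix: str,
                        pre: str = "<PRE>", suf: str = "<SUF>",
                        mid: str = "<MID>") -> str:
    """Arrange the code before and after a gap in prefix-suffix-middle order.

    The sentinel strings are placeholders (an assumption for illustration);
    a real deployment must use the tokens its model was trained with.
    """
    return f"{pre} {prefix} {suf}{suffix} {mid}"


def splice_completion(prefix: str, middle: str, suffix: str) -> str:
    """Insert the generated middle back between the original prefix and suffix."""
    return prefix + middle + suffix


# Code on either side of the cursor in a hypothetical editor buffer.
prefix = "def mean(values):\n    total = "
suffix = "\n    return total / len(values)\n"

prompt = build_infill_prompt(prefix, suffix)

# Stand-in for the model's output; a real system would generate this text.
middle = "sum(values)"

completed = splice_completion(prefix, middle, suffix)
print(completed)
```

The design point is that the model sees the suffix before it starts generating, so the completion can be written to join up with the code that follows the gap rather than only continuing the code before it.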

Llama 3 — The Training Data Leap

Released: April 2024
Sizes: 8B, 70B
Context: 8K tokens
License: Llama 3 Community License
Training data: 15 trillion tokens

The headline number for Llama 3 is the training data: 15 trillion tokens, compared to 2 trillion for Llama 2. That is a 7.5-fold increase in the information the model learned from, and the quality of that data was also substantially improved through more aggressive filtering and curation. More data, better selected, trained more efficiently — the combination produced results that made the previous generation look immediately dated.

Two architectural changes compounded the data advantage. A new tokenizer with a vocabulary of roughly 128,000 tokens, compared to Llama 2's 32,000, encodes the same text in fewer tokens, which is particularly important for multilingual content and technical material. Grouped Query Attention (GQA), which Llama 2 had used only in its larger models, was applied to both Llama 3 sizes. GQA lets several query heads share one set of key and value heads, which shrinks the key-value cache and allows the model to run faster and at lower cost for a given quality level.
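A rough back-of-the-envelope calculation shows why both changes matter at inference time. The sketch below assumes a model shaped like an 8B-class Llama (32 layers, 32 query heads, head dimension 128, hidden size 4096) with 16-bit values; those figures and the function names are assumptions for illustration, and real memory use also depends on batching and runtime overhead.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of key-value cache stored for one token.

    Each layer keeps one key vector and one value vector per KV head,
    hence the leading factor of 2.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Number of parameters in a token embedding table."""
    return vocab_size * hidden_dim


# Assumed shape: 32 layers, head_dim 128, 16-bit cache values.
mha_bytes = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=32, head_dim=128)
gqa_bytes = kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)

print(mha_bytes // 1024, "KiB per token with 32 KV heads")  # 512
print(gqa_bytes // 1024, "KiB per token with 8 KV heads")   # 128

# Cache size for a full 8K-token context, in GiB.
context = 8192
print(mha_bytes * context / 2**30, "GiB without GQA")
print(gqa_bytes * context / 2**30, "GiB with GQA")

# The larger vocabulary costs parameters in the embedding table.
print(embedding_params(32_000, 4096), "embedding parameters at 32K vocab")
print(embedding_params(128_000, 4096), "embedding parameters at 128K vocab")
```

Under these assumptions, sharing key-value heads four ways cuts the cache to a quarter, while the larger vocabulary spends extra embedding parameters in exchange for shorter token sequences.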

The practical results: Llama 3's 8B model outperformed Llama 2's 70B model on many standard benchmarks. A model nearly nine times smaller delivering better results is a concrete demonstration of how much training quality matters alongside raw scale. The 70B model was competitive with Gemini 1.5 Pro and Claude 3 Sonnet at the time of release, which placed an open-weight model in genuine contention with the most capable closed models available.

Llama 3.1 — The Frontier Moment

Released: July 2024
Sizes: 8B, 70B, 405B
Context: 128K tokens
License: Llama 3.1 Community License
Training data: 15.6 trillion tokens

Llama 3.1 is the most consequential single release in the history of open-weight AI models, for one reason: the 405B variant.

Before Llama 3.1 405B, the largest publicly available model weights were a fraction of the size of what leading closed labs were running internally. The capability gap between open and closed AI was substantial and assumed to be structural. Llama 3.1 405B competed directly with GPT-4o and Claude 3.5 Sonnet on major benchmarks — the first open-weight model to do so convincingly. Meta itself described it as "the first frontier-level open-source AI model," and the benchmark numbers supported that framing.

At the same time, the context window jumped from 8K to 128K tokens across all sizes — a 16-fold increase that fundamentally changed what the models could handle in a single pass. Long documents, large codebases, extended research papers, and multi-document analysis tasks that previously required chunking and retrieval engineering could now be processed directly.
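As a rough illustration of what the larger window changes in practice, the sketch below estimates how many separate passes a long document needs under each limit. The four-characters-per-token ratio is a common rule of thumb rather than a property of Llama's tokenizer, the window sizes read "8K" and "128K" as 8,192 and 131,072 tokens, and the function names and output reserve are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; a real tokenizer should be used in practice."""
    return math.ceil(len(text) / chars_per_token)


def passes_needed(n_tokens: int, context_window: int,
                  reserved_for_output: int = 1024) -> int:
    """Number of chunks needed if part of each window is kept for the reply."""
    budget = context_window - reserved_for_output
    if budget <= 0:
        raise ValueError("context window is too small for the reserved output")
    return math.ceil(n_tokens / budget)


# A stand-in document of 400,000 characters, about 100,000 estimated tokens.
document = "x" * 400_000
doc_tokens = estimate_tokens(document)

print(passes_needed(doc_tokens, context_window=8_192))    # 14
print(passes_needed(doc_tokens, context_window=131_072))  # 1
```

The estimate ignores chunk overlap and retrieval, both of which add further passes and engineering work in the short-context case.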

The 8B and 70B models also improved substantially over Llama 3. All three sizes support eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. Native tool use and function calling — capabilities needed for AI agents to take actions and call external systems — were built in rather than requiring fine-tuning.

The impact on the ecosystem was immediate. Cloud providers, model hosting companies, enterprise software platforms, and independent developers all adopted Llama 3.1 as a new baseline. The 405B model became available on AWS Bedrock, Azure AI, Google Cloud Vertex AI, and NVIDIA NIM simultaneously at launch.

Llama 3.2 — Vision Arrives

Released: September 2024
Sizes: 1B, 3B (text), 11B, 90B (vision)
Context: 128K tokens
License: Llama 3.2 Community License

Llama 3.2 introduced two things simultaneously: the smallest Llama models ever released, and the first Llama models with vision capability.

The 1B and 3B text models are designed for deployment scenarios where larger models are impractical: mobile devices, embedded applications, edge computing, and anywhere memory and power consumption are constrained. Despite their small size, both carry the 128K token context window. Meta reported the 3B model as competitive with or ahead of similarly sized open models such as Gemma 2 2.6B and Phi 3.5-mini on instruction following, summarization, and tool use, and it worked with Qualcomm and MediaTek to optimize both lightweight models for on-device hardware.

The 11B and 90B vision models are the more significant addition. They accept both text and images as input, enabling tasks like visual question answering, chart and document interpretation, screenshot analysis, and image description.

Llama 3.3 — Maximum Efficiency

Released: December 2024
Size: 70B
Context: 128K tokens
License: Llama 3.3 Community License

Llama 3.3 makes a specific and practical argument: you do not need to run a 405B model to get 405B-class results. The 70B Llama 3.3 delivers performance on most benchmarks that is comparable to Llama 3.1 405B — at a fraction of the memory, compute, and cost requirements.

Running a 405B model requires substantial infrastructure: multiple high-end GPUs, significant RAM, and serious hardware investment. Running a 70B model is accessible to a much wider range of organizations and individual developers. Llama 3.3 essentially democratizes access to near-frontier performance by compressing it into a size that does not require a data center.
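The hardware gap can be estimated with simple arithmetic: weight memory is roughly the parameter count multiplied by the bytes used per parameter. The sketch below is an approximation under stated assumptions (weights only, decimal gigabytes, common precisions); it leaves out the key-value cache, activations, and runtime overhead, and the names are illustrative.

```python
# Bytes per parameter at common precisions (weights only).
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in decimal GB."""
    return n_params * bytes_per_param / 1e9


for precision, nbytes in BYTES_PER_PARAM.items():
    big = weight_memory_gb(405e9, nbytes)
    small = weight_memory_gb(70e9, nbytes)
    print(f"{precision}: 405B needs about {big:.1f} GB, 70B about {small:.1f} GB")
```

At 16-bit precision this works out to about 810 GB versus 140 GB of weights, which is the difference between a multi-node deployment and a single multi-GPU server.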

Part Two: Safety Models — The Purple Llama Initiative

Building capable AI models without building the safety infrastructure around them creates risk. Meta addressed this through Purple Llama — an umbrella safety initiative launched in December 2023 that open-sourced the tools needed for responsible LLM deployment.

Llama Guard — Content Safety Classification

Released: December 2023 (v1), April 2024 (v2), July 2024 (v3)

Llama Guard is a safety classifier built on top of Llama models — not a conversational AI, but a specialized system that reads inputs and outputs and classifies them as safe or unsafe across defined categories including violence, hate speech, sexual content, and criminal activity.

Version 1 was built on Llama 2 7B. Version 2 upgraded to Llama 3 8B and aligned classifications with the MLCommons taxonomy, a standardized framework for describing AI safety categories that allows consistent evaluation across different systems. Version 3, built on Llama 3.1 8B, added multilingual classification across the eight languages Llama 3.1 supports, extending safety coverage beyond English-only content.

For developers building applications on top of Llama or any other LLM, Llama Guard provides a ready-made safety layer that can run before and after model calls to flag problematic content without requiring the developer to build classification logic from scratch.
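The "before and after" pattern can be sketched in a few lines. Everything below is a stand-in: `toy_is_safe` and `toy_generate` are hypothetical placeholders for a real safety classifier and a real language model, and the keyword check is not how Llama Guard works. The point is only the control flow of screening the input, generating, then screening the output.

```python
from typing import Callable

# A classifier returns True when the text is judged safe.
Classifier = Callable[[str], bool]
Generator = Callable[[str], str]

REFUSAL = "Sorry, I can't help with that."


def guarded_generate(prompt: str, generate: Generator, is_safe: Classifier,
                     refusal: str = REFUSAL) -> str:
    """Run a safety check before and after the model call."""
    if not is_safe(prompt):   # screen the user input
        return refusal
    reply = generate(prompt)
    if not is_safe(reply):    # screen the model output
        return refusal
    return reply


def toy_is_safe(text: str) -> bool:
    """Placeholder classifier: a keyword check standing in for a real model."""
    return "forbidden" not in text.lower()


def toy_generate(prompt: str) -> str:
    """Placeholder generator that simply echoes the prompt."""
    return f"Echo: {prompt}"


print(guarded_generate("hello there", toy_generate, toy_is_safe))
print(guarded_generate("something forbidden", toy_generate, toy_is_safe))
```

Keeping the classifier behind a plain function boundary means it can be swapped or upgraded without touching the application logic, which is the practical appeal of a separate safety model.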

Prompt Guard — Injection Attack Detection

Released: July 2024

Distinct from content safety, Prompt Guard addresses a different attack surface: prompt injection. This is the technique of embedding instructions within user input or retrieved documents that attempt to override a model's intended behavior — telling it to ignore its system prompt, reveal confidential information, or take unauthorized actions.

Prompt Guard classifies whether an incoming prompt contains injection or jailbreak attempts, providing a detection layer that developers can use to screen inputs before they reach the main model.

Code Shield — Insecure Code Detection

Code Shield specifically targets the security vulnerabilities that LLMs sometimes introduce into generated code. SQL injection, cross-site scripting (XSS), and other common vulnerability patterns appear in AI-generated code often enough to warrant dedicated detection. Code Shield flags these issues before generated code reaches production.

Part Three: Vision Models

Segment Anything Model (SAM and SAM 2)

SAM Released: April 2023
SAM 2 Released: July 2024
License: Apache 2.0
Training data (SAM): SA-1B, with 11 million images and about 1.1 billion masks

SAM may be the single most practically impactful model Meta has ever released outside of Llama. The capability it introduced sounds simple: point at any object in any image, and the model draws a precise boundary around it. The implications are far-reaching.

The "zero-shot" characteristic is what makes SAM genuinely transformative. Most image segmentation systems were trained on specific categories: they could segment cars, people, and dogs because those were in their training data, but would fail on a novel object type. SAM was trained on a dataset of such scale and diversity that it can segment objects it has never specifically seen before, responding to point clicks or bounding boxes to identify where an object ends and the background begins. Text prompts were explored in the research paper but were not part of the released model.

Applications span medical imaging (segmenting tumors or organs in scans), augmented reality (isolating real-world objects for digital overlay), photo editing (removing or replacing backgrounds), robotics (understanding what objects are present and where they are), and autonomous vehicles (identifying road objects).

SAM 2 extended the capability to video — tracking the same object across frames in real time. An object pointed out in a single frame of video can be followed through movement, occlusion, and changing lighting conditions. Both SAM and SAM 2 are fully open-sourced under Apache 2.0.

Emu — Image Generation and Editing

Released: September 2023 (announced at Meta Connect)
Access: Integrated into Meta AI products

Emu is Meta's text-to-image generation model, powering the image generation features embedded in the Meta AI assistant across WhatsApp, Instagram, Facebook, and Messenger. The "Imagine" feature that generates images from text descriptions in WhatsApp and Messenger runs on Emu.

Beyond generation, Meta released Emu Edit — a companion model for precise image editing based on text instructions. Rather than regenerating an entire image to make a change, Emu Edit modifies only the specific elements described in the edit instruction. Emu Video extended the family to short text-to-video generation.

Meta has not fully open-sourced Emu's weights, distinguishing it from the Llama and SAM releases. It remains accessible primarily through Meta's own product surfaces.

DINOv2 — Self-Supervised Vision Understanding

Released: April 2023
License: Apache 2.0
Training data: 142 million curated images

DINOv2 takes a fundamentally different approach to vision AI than models trained with labeled data. Rather than learning from images labeled by humans — "this is a cat," "this is a car" — DINOv2 learns to understand visual content without any labels at all, using a self-supervised approach that finds structural patterns across a massive curated image dataset.

The result is a model that produces universal visual features: representations of images that capture meaningful visual information useful across many downstream tasks. On image classification, depth estimation, and semantic segmentation, DINOv2 delivers strong results without fine-tuning the backbone. Its frozen features are simply fed to a lightweight task-specific head, such as a linear classifier.

For developers building custom vision applications who need a strong foundation, DINOv2 provides a starting point that transfers more broadly than models trained narrowly for specific categories.

ImageBind — Six Senses in One Model

Released: May 2023
License: Open source (non-commercial research)

ImageBind is one of the most conceptually distinctive models in Meta's portfolio. Where most AI systems handle one or two modalities — text and images, or audio and text — ImageBind builds a single unified embedding space across six completely different types of sensory data:

Text, images, audio, depth maps, thermal (infrared) data, and IMU data (the inertial measurement data from accelerometers and gyroscopes that captures movement and orientation).

The remarkable property of ImageBind is that it can find meaningful relationships between any two of these six modalities even when it was not directly trained on examples that pair them together. If you have an audio recording of ocean waves, ImageBind can find images that visually match that sound. If you have a thermal image of a person, it can match it to a regular photograph of the same scene. This cross-modal matching emerges because every modality is aligned to images during training, so images act as the bridge that places all six in one embedding space without explicit training on every pairing.
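Once every modality lives in the same vector space, cross-modal search reduces to a nearest-neighbor lookup. The toy sketch below uses made-up three-dimensional vectors in place of real embeddings, and the file names and function names are hypothetical. It shows only the retrieval step, not how ImageBind produces the embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: list[float], candidates: dict[str, list[float]]) -> str:
    """Return the name of the candidate most similar to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


# Made-up vectors: pretend this is the embedding of an ocean-waves recording.
audio_query = [0.9, 0.1, 0.0]

# Pretend these are image embeddings living in the same shared space.
image_embeddings = {
    "beach.jpg": [0.8, 0.2, 0.1],
    "city.jpg": [0.1, 0.9, 0.2],
    "forest.jpg": [0.0, 0.3, 0.9],
}

print(nearest(audio_query, image_embeddings))  # beach.jpg
```

The same lookup works in any direction, for example from a thermal image to photographs, because nothing in the retrieval step depends on which modality produced a vector.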

The applications point toward future AR/VR systems, robotics that need to integrate multiple sensor streams, and any environment where AI must make sense of the physical world through more than text and images alone.

Part Four: Audio Models

AudioCraft — Music, Sound, and Compression

Released: August 2023
License: MIT for the code; pretrained model weights under CC-BY-NC 4.0

AudioCraft is Meta's open-source audio generation framework, and it contains three distinct systems serving different audio needs.

MusicGen generates musical compositions from text descriptions. Describe the genre, tempo, instrumentation, and mood, and MusicGen produces original audio matching that description. It comes in four released variants, Small, Medium, Large, and Melody, with the Melody variant able to condition on a reference audio clip so that the output follows its melody. A musician, filmmaker, or game developer can describe the music they need and receive a custom composition without sessions with composers, although the non-commercial license on the released weights limits commercial use.

AudioGen handles a different category: environmental sounds and audio effects rather than music. Footsteps on different surfaces, crowd noise, mechanical sounds, weather effects — AudioGen generates realistic audio from text descriptions. For film production, game audio design, and accessibility applications that need non-music audio, AudioGen fills a gap that MusicGen was not designed for.

EnCodec is the third component: a neural audio codec. Rather than using traditional compression algorithms, EnCodec learns a compact discrete representation of audio, which lets it hold quality at much lower bitrates. It also serves as the tokenizer for the other two models: MusicGen and AudioGen generate EnCodec tokens, which EnCodec then decodes back into a waveform.
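The scale of the savings is easiest to see as bitrate arithmetic. The sketch below compares uncompressed 16-bit mono audio at 24 kHz with a codec running at 6 kbps. The 6 kbps target is an illustrative assumption rather than a claim about any particular EnCodec configuration, and the function names are hypothetical.

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(source_kbps: float, codec_kbps: float) -> float:
    """How many times smaller the coded stream is than the source."""
    return source_kbps / codec_kbps


def size_kilobytes(kbps: float, seconds: float) -> float:
    """Size of a stream of the given bitrate and duration, in kilobytes."""
    return kbps * seconds / 8


raw = pcm_kbps(sample_rate_hz=24_000, bits_per_sample=16)  # 384.0 kbps
target = 6.0  # assumed codec bitrate in kbps, for illustration only

print(compression_ratio(raw, target))  # 64.0
print(size_kilobytes(raw, 60), "KB for one minute of raw audio")
print(size_kilobytes(target, 60), "KB for one minute at the target bitrate")
```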

All three are released through the AudioCraft repository. The code is under the MIT license, one of the most permissive in Meta's portfolio, while the pretrained weights are under CC-BY-NC 4.0.

Part Five: Translation

SeamlessM4T — One Model, All Translation Directions

Released: August 2023
SeamlessM4T v2: December 2023
License: CC-BY-NC (research)
Coverage: Nearly 100 languages for speech input and for text; about 36 languages for speech output

The name explains the ambition: Massively Multilingual and Multimodal Machine Translation. Before SeamlessM4T, building a complete translation pipeline required multiple separate systems — one for speech recognition, one for translation, one for speech synthesis. SeamlessM4T handles all four translation directions — speech-to-speech, speech-to-text, text-to-speech, and text-to-text — within a single unified model.

The scale of language coverage makes it particularly useful in global contexts. Nearly 100 languages are supported for speech input and for text translation, while speech output covers a smaller set of about 36 languages.

Two extensions address specific real-world needs. SeamlessExpressive preserves the speaking style, emotion, and cadence of the original speaker in the translated output, which is critical for applications like diplomatic interpretation where tone carries as much meaning as words. SeamlessStreaming enables real-time simultaneous translation, where output begins before the speaker has finished. That is the technology needed for live conference interpretation and real-time communication applications.

SeamlessM4T v2 improved both quality and processing efficiency across all supported translation directions.

Part Six: The Meta AI Assistant

Launched: September 2023
Surfaces: Facebook, Instagram, WhatsApp, Messenger, meta.ai, Ray-Ban Meta smart glasses, Quest VR headsets

Meta AI is the consumer-facing product that packages Llama's language capabilities alongside Emu's image generation, real-time web search through a Bing partnership, and a conversational interface embedded across Meta's entire platform ecosystem.

The scale of distribution is what makes Meta AI structurally different from competing AI assistants. ChatGPT requires users to visit chatgpt.com or install an app. Meta AI appears inside applications that hundreds of millions of people use every day for other purposes, so it meets users where they already are rather than requiring them to seek it out.

The Ray-Ban Meta smart glasses integration extended Meta AI into the physical world as a voice-first assistant capable of seeing what the wearer sees and responding to questions about the environment — a genuinely novel deployment surface that no other major AI company had matched at the time of its introduction.

How the Llama Generations Compare

Dimension	Llama 1	Llama 2	Llama 3	Llama 3.1	Llama 3.2	Llama 3.3
Training tokens	Up to 1.4T	2T	15T	15.6T	Up to 9T (1B, 3B)	15T+
Max context	2K	4K	8K	128K	128K	128K
Commercial use	No	Yes	Yes	Yes	Yes	Yes
Vision	No	No	No	No	Yes (11B, 90B)	No
Largest size	65B	70B	70B	405B	90B	70B
Key advance	Open weights	Commercial license	7.5x more data	Frontier 405B	Vision and edge	405B quality at 70B

Meta's Open Source Strategy: Why It Matters

The decision to release Llama weights publicly, with commercial licensing, is not simply altruism. Zuckerberg has articulated a coherent strategic argument in multiple public statements.

First, open models allow security auditing. A model whose weights and architecture are public can be examined by security researchers, academics, and independent evaluators in ways that a closed API cannot. Problems discovered externally can be addressed before they cause harm.

Second, open models prevent monopolization. If AI capability is locked behind a small number of proprietary APIs, those providers gain structural power over every application and business that depends on them. Widely available open models distribute that power.

Third, and most practically for Meta: open models benefit Meta directly by creating an ecosystem of fine-tuned variants, research advances, and deployment tools that would cost billions to develop internally. The community that builds on Llama advances the model's capability in ways that accrue back to Meta's own products.

The numbers reflect how successfully this strategy has executed. Over 350 million Llama model downloads. More than 100,000 companies using Llama in their products. Llama models running on AWS, Azure, Google Cloud, and NVIDIA infrastructure simultaneously — achieving distribution that no proprietary model has matched.

Which Meta AI Model Should You Actually Use?

For general language tasks — writing, reasoning, coding, analysis — the answer depends on your hardware and context needs. Llama 3.3 70B delivers near-frontier quality in a size that does not require specialized infrastructure. For simpler tasks at scale where cost and speed matter most, Llama 3.2 3B or 1B provides capable lightweight inference.

For tasks requiring image understanding — analyzing photographs, charts, documents, screenshots — Llama 3.2 11B or 90B brings vision capability to the open-weight ecosystem.

For coding specifically, Code Llama remains the specialized choice. Its support for inputs of up to 100K tokens lets it take in several files or a large module in a single pass.

For the highest-capability open-weight inference where infrastructure allows, Llama 3.1 405B is the largest model covered in this guide and the one that benchmarks most directly against closed frontier systems.

For image segmentation in any computer vision application, SAM 2 is the default starting point for zero-shot, promptable object segmentation in images and video.

For music generation, MusicGen via AudioCraft is one of the most capable freely available options. For speech translation, SeamlessM4T covers nearly 100 languages in a single model.

For adding safety classification to any LLM application — not just Llama — Llama Guard 3 provides an open, auditable, multilingual content classifier that can be deployed independently of the underlying model.

FAIR's Historical Contribution

Meta's current model portfolio builds on more than a decade of foundational research from FAIR that shaped how the entire industry works. FAIR's 2017 work on attention-based sequence-to-sequence models was part of the research wave surrounding the transformer architecture that underlies virtually every modern language model. The 2020 DETR model brought transformers to object detection. The 2021 DINO work established self-supervised vision learning at scale. OPT in 2022 was an early open-weight language model that set the precedent for Llama.

The through-line from FAIR's academic research to the Llama downloads happening today is direct and traceable — which is unusual in an industry where research and product often diverge sharply.

Final Takeaway

Meta's AI portfolio is broader and deeper than its social media reputation suggests. Llama has become the backbone of the open-source AI ecosystem — the model that more developers, companies, and researchers build on than any other. SAM and SAM 2 set the standard for open computer vision. AudioCraft made high-quality music and audio generation freely available. SeamlessM4T brought multilingual speech translation into a single open model. ImageBind pointed toward a future where AI understands the physical world through multiple senses simultaneously.

The unifying logic is Meta's open-source conviction: that releasing capable models publicly accelerates progress, distributes power more broadly, and ultimately creates more value — including for Meta itself — than keeping them proprietary. With 350 million downloads and counting, the evidence that this bet is paying off is substantial.