LangChain’s architecture is built around a small set of powerful, composable primitives. Understanding these components is the key to understanding how the framework works.
1. Models (LLMs and Chat Models)
The model layer is the foundation of LangChain. It provides a unified interface to interact with different LLM providers, abstracting away the differences in their APIs.
LangChain distinguishes between two types of models:
- LLMs: Text-in, text-out models. You pass a string prompt and receive a string response. These correspond to older completion-style APIs.
- Chat Models: Message-in, message-out models. You pass a list of messages (system, human, AI) and receive a message response. This is the interface used by modern models like GPT-4, Claude, and Gemini.
Both types implement a common Runnable interface, meaning they can be dropped into any chain or pipeline interchangeably. Switching from one model provider to another is as simple as changing the model class — the rest of your chain remains unchanged.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
llm = ChatOpenAI(model="gpt-4o")
# or swap to: llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")
LangChain also supports local models via Ollama, HuggingFace, and other providers, making it suitable for privacy-sensitive or offline deployments.
2. Prompts
Prompts are the instructions you give to the model. LangChain’s prompt system transforms raw string templates into structured, reusable, and type-safe objects. Rather than building prompts by concatenating strings, you define templates with named variables and let LangChain handle formatting.
The key prompt classes are:
- PromptTemplate: A simple string template with variable substitution. Ideal for single-turn, text-completion-style interactions.
- ChatPromptTemplate: A list of message templates (SystemMessagePromptTemplate, HumanMessagePromptTemplate, AIMessagePromptTemplate) that together define a structured conversation. This is the standard for chat models.
- FewShotPromptTemplate: Automatically injects example input-output pairs into the prompt to guide the model through demonstration rather than instruction alone.
- MessagesPlaceholder: Reserves a slot in a ChatPromptTemplate for dynamic message lists — used to inject conversation history or retrieved documents.
Prompts in LangChain are Runnables, meaning they can be chained directly with models and output parsers using the pipe operator.
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{question}"),
])
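Under the hood, formatting a chat prompt is essentially named-variable substitution applied to each message template. A rough stdlib sketch of the idea (the format_messages helper here is illustrative, not LangChain's API):

```python
# Illustrative sketch: how a chat prompt template turns (role, template)
# pairs plus input variables into concrete messages.
def format_messages(templates, **variables):
    """Substitute named variables into each message template."""
    return [(role, text.format(**variables)) for role, text in templates]

templates = [
    ("system", "You are a helpful assistant."),
    ("human", "{question}"),
]

messages = format_messages(templates, question="What is LangChain?")
# messages == [("system", "You are a helpful assistant."),
#              ("human", "What is LangChain?")]
```

The real classes add validation, partial formatting, and Runnable behavior on top of this substitution step.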
3. Output Parsers
LLMs return text. Most applications need structured data — a JSON object, a list, a Pydantic model, or a boolean. Output parsers bridge this gap by converting the model’s raw text response into a typed Python (or JavaScript) object.
LangChain provides several built-in parsers:
- StrOutputParser: Simply extracts the text content from the model’s response. The identity parser, useful for text generation tasks.
- JsonOutputParser: Parses a JSON-formatted response into a Python dictionary.
- PydanticOutputParser: Parses the response into a Pydantic model instance, with full type validation and automatic format instruction injection.
- CommaSeparatedListOutputParser: Splits a comma-separated response into a Python list.
- StructuredOutputParser: Allows you to define a custom schema using ResponseSchema objects.
Modern LLM APIs also support function calling and JSON mode, which LangChain wraps with the with_structured_output() method, binding a schema directly to the model for structured output without relying on prompt instructions alone.
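To see what such a parser has to deal with, here is a simplified stdlib sketch: models often wrap their JSON in Markdown fences, so the parser strips them before loading. The parse_json_response function is illustrative, not LangChain's implementation.

```python
import json

def parse_json_response(text: str) -> dict:
    """Sketch of a JSON output parser: strip optional Markdown fences,
    then parse the remaining text as JSON."""
    cleaned = text.strip()
    if cleaned.startswith("```"):
        # Drop the opening fence (possibly "```json") and the closing fence.
        cleaned = cleaned.split("\n", 1)[1]
        cleaned = cleaned.rsplit("```", 1)[0]
    return json.loads(cleaned)

raw = '```json\n{"name": "LangChain", "stars": 5}\n```'
print(parse_json_response(raw))  # {'name': 'LangChain', 'stars': 5}
```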
4. LangChain Expression Language (LCEL)
LCEL is the modern, recommended way to compose LangChain components into chains. It uses the pipe operator (|) to connect Runnables in a declarative, readable syntax — similar to Unix pipes or functional composition.
from langchain_core.output_parsers import StrOutputParser
chain = prompt | llm | StrOutputParser()
response = chain.invoke({"question": "What is LangChain?"})
Every component in LangChain — prompts, models, retrievers, output parsers, custom functions — implements the Runnable interface, which means they all share the same set of methods: invoke(), stream(), batch(), and their async counterparts ainvoke(), astream(), abatch().
LCEL chains support several powerful features out of the box:
- Streaming: Responses are streamed to the end user token by token with no extra code.
- Async: Every chain can be invoked asynchronously with await chain.ainvoke(…).
- Parallel execution: RunnableParallel runs multiple branches simultaneously and merges their outputs.
- Fallbacks: chain.with_fallbacks([backup_chain]) automatically falls back to an alternative chain if the primary fails.
- Tracing: Every LCEL invocation is automatically traced in LangSmith once tracing is enabled.
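The composition mechanics behind the pipe operator can be sketched in a few lines of plain Python. The Step class below is a toy stand-in for Runnable, not LangChain's actual implementation:

```python
# Toy sketch of the composition idea behind LCEL: each component exposes
# invoke(), and `|` builds a pipeline that feeds one output into the next
# input. All class and function names here are illustrative.
class Step:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # Composing two steps yields a new step that runs them in sequence.
        return Step(lambda value: other.invoke(self.invoke(value)))

format_prompt = Step(lambda q: f"Q: {q}\nA:")
fake_llm = Step(lambda p: p + " LangChain is a framework for LLM apps.")
parse = Step(lambda text: text.split("A:")[1].strip())

chain = format_prompt | fake_llm | parse
print(chain.invoke("What is LangChain?"))
# → "LangChain is a framework for LLM apps."
```

The real Runnable interface adds stream(), batch(), async variants, and tracing on top of this same composition principle.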
5. Document Loaders
Before an LLM can reason about your data, that data needs to be loaded and converted into a format the framework understands. LangChain’s document loaders handle this ingestion step, reading from dozens of sources and returning a list of Document objects — each containing page_content (the text) and metadata (source, page number, etc.).
LangChain provides loaders for:
- Files: PDF, Word, Excel, CSV, JSON, Markdown, HTML, PowerPoint
- Web: URLs, sitemaps, Wikipedia, YouTube transcripts, arXiv papers
- Databases: SQL, MongoDB, Notion, Airtable, Google Drive
- Code: GitHub repositories, Python files, Jupyter notebooks
- Communication: Slack, Discord, email
Each loader normalizes its source into the same Document schema, so the rest of the pipeline — splitting, embedding, storing — works identically regardless of the original format.
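The shared schema can be sketched as a small dataclass. The Document class and load_text_file loader below are illustrative stand-ins, not LangChain's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Sketch of the common schema every loader returns."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def load_text_file(path: str) -> list[Document]:
    """Minimal illustrative loader: one file becomes one Document
    with its source path recorded in metadata."""
    with open(path, encoding="utf-8") as f:
        return [Document(page_content=f.read(), metadata={"source": path})]

doc = Document("Some page text", {"source": "report.pdf", "page": 3})
print(doc.metadata["source"])  # → report.pdf
```

Because every loader emits the same shape, downstream splitters and embedders never need to know where the text came from.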
6. Text Splitters
LLMs have a finite context window. A 50-page PDF cannot be passed directly to the model — it must be split into smaller, semantically coherent chunks. Text splitters handle this transformation, converting a list of Documents into a larger list of smaller chunk Documents.
LangChain offers several splitter strategies:
- RecursiveCharacterTextSplitter: The recommended default. Splits on a hierarchy of separators (paragraphs, sentences, words) to preserve semantic coherence as much as possible.
- CharacterTextSplitter: Splits on a single character (e.g., newline). Simple but less context-aware.
- TokenTextSplitter: Splits based on token count rather than characters, ensuring chunks never exceed a model’s token limit.
- MarkdownHeaderTextSplitter: Splits Markdown documents at heading boundaries and preserves headers as metadata.
- SemanticChunker: Uses embeddings to split text at semantic boundaries rather than fixed character positions, often producing more coherent chunks for RAG at the cost of extra embedding calls.
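The idea behind recursive splitting can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: it tries the coarsest separator first, recurses with finer ones on oversized pieces, and greedily merges small pieces back toward the chunk size.

```python
def merge(pieces, chunk_size, sep):
    """Greedily re-join small pieces so chunks approach chunk_size."""
    chunks, current = [], ""
    for piece in pieces:
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Sketch of recursive splitting: paragraphs first, then lines,
    then words, hard-cutting only as a last resort."""
    if len(text) <= chunk_size:
        return [text]
    if not separators:
        # No separators left: hard-cut at chunk_size.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces = []
    for piece in text.split(sep):
        if len(piece) <= chunk_size:
            pieces.append(piece)
        else:
            pieces.extend(recursive_split(piece, chunk_size, rest))
    return merge(pieces, chunk_size, sep)

doc = "First paragraph.\n\nSecond paragraph that is a bit longer."
for chunk in recursive_split(doc, chunk_size=30):
    print(repr(chunk))
```

The production splitter additionally supports chunk overlap and per-token length functions, but the separator hierarchy above is the core of why it preserves coherence.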
7. Embeddings and Vector Stores
Retrieval-Augmented Generation (RAG) — the pattern of fetching relevant documents and injecting them into the prompt — is one of the most important patterns in LLM applications. Embeddings and vector stores are the infrastructure that makes RAG possible.
Embeddings convert text into a dense numerical vector that captures its semantic meaning. LangChain wraps embedding models from OpenAI, Cohere, HuggingFace, Google, and others behind a unified Embeddings interface.
Vector stores index these embedding vectors and enable fast similarity search — given a query embedding, return the k most similar document embeddings. LangChain integrates with:
- Cloud-native: Pinecone, Weaviate, Qdrant, Milvus
- Local and in-memory: Chroma, FAISS, DocArrayInMemorySearch
- Relational: pgvector (PostgreSQL), SQLite-VSS
- Managed: MongoDB Atlas Vector Search, Azure AI Search, Elasticsearch
Once a vector store is populated, it can be wrapped in a retriever and plugged directly into an LCEL chain, enabling end-to-end RAG in just a few lines of code.
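At its core, similarity search is nearest-neighbor lookup over vectors. A toy sketch with hand-written three-dimensional vectors standing in for a real embedding model (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "index": in practice the vectors come from an embedding model.
index = {
    "LangChain is an LLM framework": [0.9, 0.1, 0.0],
    "Paris is the capital of France": [0.0, 0.2, 0.9],
}

def similarity_search(query_vec, k=1):
    """Return the k documents whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda doc: cosine(index[doc], query_vec),
                    reverse=True)
    return ranked[:k]

print(similarity_search([1.0, 0.0, 0.1]))
# → ['LangChain is an LLM framework']
```

Production vector stores replace the linear scan with approximate nearest-neighbor indexes (HNSW, IVF) so search stays fast at millions of vectors.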
8. Retrievers
A retriever is an interface that takes a query string and returns a list of relevant Documents. The simplest retriever wraps a vector store’s similarity search. But LangChain offers many more sophisticated retrieval strategies:
- MultiQueryRetriever: Generates multiple reworded versions of the user’s query, runs them all, and merges the results — improving recall when a single query misses relevant documents.
- ContextualCompressionRetriever: Compresses and filters retrieved documents to remove irrelevant content before passing them to the model, reducing token usage.
- EnsembleRetriever: Combines results from multiple retrievers (e.g., BM25 keyword search + semantic search) using reciprocal rank fusion for better precision.
- ParentDocumentRetriever: Indexes small chunks for precise retrieval but returns their larger parent documents for richer context.
- SelfQueryRetriever: Uses the LLM to parse a natural language query into a structured filter + semantic query, enabling metadata-filtered retrieval.
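The fusion step behind EnsembleRetriever can be sketched directly. Reciprocal rank fusion scores each document by 1 / (k + rank) in every result list where it appears and sums the scores; this is a simplified illustration, with k = 60 as a commonly used constant:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked result lists: each document earns 1 / (k + rank)
    per list it appears in, and documents are re-ranked by total score."""
    scores = {}
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from BM25
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. from vector search
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

Documents ranked well by both retrievers (doc_b here) float to the top, which is exactly why hybrid search tends to beat either method alone.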
9. Memory
LLMs are stateless — they have no memory of past interactions. Every call is independent. For conversational applications, you need to explicitly manage and inject conversation history into each prompt. LangChain’s memory module automates this.
Key memory types include:
- ConversationBufferMemory: Stores the full message history and injects it verbatim. Simple and accurate, but grows without bound.
- ConversationBufferWindowMemory: Keeps only the last k turns, discarding older messages to stay within the context window.
- ConversationSummaryMemory: Uses an LLM to progressively summarize the conversation, trading fidelity for token efficiency.
- ConversationTokenBufferMemory: Keeps the most recent messages up to a token limit, automatically trimming older content.
- VectorStoreRetrieverMemory: Stores past interactions in a vector store and retrieves the most semantically relevant memories for each new message.
In LCEL-based applications, memory is typically managed explicitly with RunnableWithMessageHistory, which wraps a chain and loads and saves history to any supported store (in-memory, Redis, DynamoDB, etc.).
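The buffer-window idea is simple enough to sketch in a few lines. The BufferWindowMemory class below is an illustrative stand-in, not LangChain's implementation:

```python
class BufferWindowMemory:
    """Sketch of buffer-window memory: keep only the last k exchanges."""

    def __init__(self, k=2):
        self.k = k
        self.turns = []  # list of (human, ai) pairs

    def save(self, human, ai):
        self.turns.append((human, ai))

    def load(self):
        # Only the most recent k turns are injected into the next prompt.
        return self.turns[-self.k:]

memory = BufferWindowMemory(k=2)
memory.save("Hi", "Hello!")
memory.save("What is LangChain?", "An LLM framework.")
memory.save("Thanks", "You're welcome.")
print(memory.load())  # the two most recent turns only
```

The other memory types vary only in the load() policy: summarize instead of truncate, trim by tokens instead of turns, or retrieve by similarity instead of recency.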
10. Chains
Chains are sequences of calls to components — LLMs, prompts, retrievers, tools — that together accomplish a task more complex than a single model call. In the modern LangChain ecosystem, chains are built with LCEL. However, LangChain also ships several prebuilt chain classes for common patterns:
- LLMChain: The classic prompt-and-model chain. Superseded by LCEL and deprecated in recent releases, but still common in older code.
- RetrievalQA / ConversationalRetrievalChain: End-to-end RAG chains that handle document retrieval, context injection, and answer generation.
- SequentialChain: Connects multiple chains so the output of one becomes the input of the next.
- RouterChain: Dynamically routes queries to different sub-chains based on the content of the input.
- TransformChain: Applies a custom Python function to transform data between chain steps.
With LCEL, most of these prebuilt chains can be replicated in just a few lines of code while gaining streaming, async, and tracing support for free.
11. Agents and Tools
Agents represent the most powerful pattern in LangChain. Rather than following a fixed sequence of steps, an agent uses the LLM itself to decide what actions to take, what tools to invoke, and when to return a final answer. Agents implement the ReAct (Reason + Act) loop: the model reasons about the current state, selects an action, observes the result, and repeats.
A Tool is any function the agent can call — a web search, a database query, a calculator, a code interpreter, or a custom API endpoint. LangChain ships with tools for:
- Web search: Tavily, Google Search, DuckDuckGo, Bing
- Code execution: Python REPL, E2B sandboxed environment
- Data: SQL database query, Pandas DataFrame agent
- Retrieval: vector store search, Wikipedia, arXiv
- Communication: Gmail, Slack, Zapier
Tools are defined with a name, a description (which the LLM reads to decide whether to use the tool), and a function to execute. Custom tools can be created with a simple decorator:
from langchain_core.tools import tool
@tool
def get_word_count(text: str) -> int:
    """Returns the number of words in the given text."""
    return len(text.split())
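The ReAct loop itself can be sketched with a scripted stand-in for the model's reasoning step. Everything below is an illustrative toy, not LangChain's agent executor:

```python
# Toy ReAct-style loop: a scripted "model" picks a tool by name, the agent
# executes it, and the observation feeds the next reasoning step.
tools = {
    "word_count": lambda text: str(len(text.split())),
}

def fake_model(question, observations):
    """Stand-in for the LLM's reasoning step: decide on the next action."""
    if not observations:
        return ("call", "word_count", question)  # act: invoke a tool
    return ("finish", f"The text has {observations[-1]} words.")

def run_agent(question, max_steps=3):
    observations = []
    for _ in range(max_steps):
        decision = fake_model(question, observations)
        if decision[0] == "finish":
            return decision[1]
        _, tool_name, tool_input = decision
        observations.append(tools[tool_name](tool_input))  # observe result
    return "Gave up."

print(run_agent("LangChain makes agents easy"))
# → "The text has 4 words."
```

A real agent replaces fake_model with an LLM call that reads the tool descriptions, which is why well-written descriptions matter so much for tool selection.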
LangGraph, LangChain’s companion library, extends agent capabilities to multi-agent workflows with stateful graphs, human-in-the-loop interrupts, and parallel branches — enabling sophisticated agentic systems like research assistants, coding agents, and autonomous data pipelines.
12. Callbacks and Observability
LangChain’s callback system provides hooks into every stage of a chain’s execution — prompt formatting, LLM start, LLM token streaming, tool invocation, and chain completion. Callbacks are used for logging, tracing, streaming to UIs, and custom monitoring.
LangSmith, LangChain’s observability platform, integrates via callbacks to provide full trace visibility into every LLM call, prompt, and tool use — along with evaluation, dataset management, and prompt playground features. Enabling LangSmith requires just two environment variables:
LANGCHAIN_TRACING_V2="true"
LANGCHAIN_API_KEY="<your-key>"
With tracing enabled, every invoke() call in your LCEL chain is automatically recorded with its inputs, outputs, latency, token usage, and cost — making debugging LLM applications dramatically easier.
Putting It All Together: A RAG Application
To see how the components work in concert, here is a sketch of a complete RAG application built with LangChain:
- Load documents using a document loader (e.g., PyPDFLoader).
- Split documents into chunks using RecursiveCharacterTextSplitter.
- Embed and store chunks in a vector store (e.g., Chroma or Pinecone).
- Create a retriever from the vector store.
- Build a RAG chain using LCEL: retriever | context_formatter | prompt | llm | output_parser.
- Add memory via RunnableWithMessageHistory for multi-turn conversation.
Each step uses a purpose-built LangChain primitive, and the entire pipeline is connected with LCEL — giving you streaming, async, batching, and tracing without any extra work.
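The shape of the pipeline above can be miniaturized into plain Python, with word overlap standing in for vector search and a scripted function standing in for the chat model. Every name here is illustrative:

```python
# Toy end-to-end RAG sketch: retrieve → format prompt → "model".
docs = [
    "LangChain composes prompts, models, and parsers into chains.",
    "Paris is the capital of France.",
]

def retrieve(query, k=1):
    """Rank documents by word overlap with the query
    (a stand-in for embedding-based similarity search)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def format_prompt(query, context):
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

def fake_llm(prompt):
    # A real chain would call a chat model here.
    return "Answer based on: " + prompt.split("\n")[1]

query = "What does LangChain compose?"
context = "\n".join(retrieve(query))
print(fake_llm(format_prompt(query, context)))
```

Swapping the stand-ins for a real retriever and chat model turns this sketch into the LCEL chain described in the steps above.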
The Broader LangChain Ecosystem
- LangGraph: A library for building stateful, multi-actor applications and agent workflows as directed graphs. It provides fine-grained control over agent loops, supports human-in-the-loop patterns, and enables complex multi-agent systems.
- LangSmith: A developer platform for LLM application observability, evaluation, and testing. It provides tracing, prompt playgrounds, dataset management, and automated evaluations.
- LangServe: A library for deploying LangChain chains as REST APIs with built-in streaming and playground UI, built on top of FastAPI.
- LangChain Hub: A public registry of community-contributed prompt templates for common tasks — RAG, summarization, extraction, agents, and more.
LangChain has established itself as the dominant framework for building LLM applications because it solves the right problems at the right level of abstraction. Its components — models, prompts, output parsers, LCEL, document loaders, text splitters, embeddings, vector stores, retrievers, memory, chains, and agents — each address a specific challenge in the LLM application stack.
By treating every component as a composable Runnable and providing LCEL as a unified composition layer, LangChain makes it possible to build sophisticated, production-ready LLM applications quickly — and to maintain, test, and observe them over time.
Whether you are building a domain-specific Q&A system, a coding assistant, an autonomous research agent, or a multi-modal document processor, LangChain gives you the primitives to do it cleanly, efficiently, and with confidence.