30 models tracked · Updated daily · Prices verified against primary sources

Language Models

Language models are the foundation layer of the AI stack. sourc.dev tracks 30 models at launch — covering pricing per token (input and output), context window size, multimodal capability, open weights status, EU data residency, and generational development history. Every price has a source URL. Every change is timestamped. No estimates. No sponsored placements.

input price · output price · context window · open weights · multimodal · EU residency · API rate limits · drift index · release date · version history

The price collapse

The cost of accessing a frontier language model has fallen 97% in five years. That is not a typo, and it is not a projection. It is the measured decline from GPT-3's launch pricing in 2020 to the models available in late 2024.

GPT-3 launched in June 2020 at $60.00 per million input tokens. It was the only commercial large language model API available, and that price reflected monopoly positioning plus genuine infrastructure costs — inference on 175 billion parameters was expensive. For over two years, that price was the market. Then compression started. GPT-3.5 Turbo arrived in March 2023 at $2.00 per million tokens — a 97% reduction from GPT-3 — while delivering substantially better capability. It was the model that powered the original ChatGPT. GPT-4 launched the same month at $30.00, establishing a new tier: the frontier model premium. You could have good-enough for $2.00, or the best-available for $30.00.

Competition arrived in mid-2023. Anthropic launched Claude 2 in July 2023 at $8.00 per million input tokens, positioning below GPT-4 on price while offering a 100,000-token context window — 12x larger than GPT-4's 8,192. OpenAI responded with GPT-4 Turbo in November 2023 at $10.00, cutting their own frontier price by 67% and expanding context to 128,000 tokens. Google entered with Gemini Pro at $0.50 — a sub-dollar price point that signaled infrastructure-scale competition.

The 2024 acceleration was faster still. GPT-4o launched in May 2024 at $5.00. GPT-4o mini followed in July at $0.15 — frontier-adjacent capability for less than the price of a single API call to GPT-3 four years earlier. Then came the DeepSeek moment. DeepSeek V3 arrived in December 2024 at $0.27 per million tokens. DeepSeek R1, a reasoning model released in January 2025, matched OpenAI's o1 on benchmarks while being open weights and priced at a fraction of the cost — and triggered approximately $600 billion in single-day stock market losses when investors absorbed what that efficiency implied. The market interpreted this as evidence that massive compute budgets were not the only path to frontier performance.

The geographic dimension matters. US-China competition in AI is directly benefiting European builders. Every price cut from OpenAI forces a response from DeepSeek, which forces a response from Google, which pushes Mistral to compete. EU developers and companies are the beneficiaries of a subsidy war they did not start — and the input prices they pay reflect it.

Model           Date      Input Price   Output Price
GPT-3           Jun 2020  $60.00        $60.00
GPT-3.5 Turbo   Mar 2023  $2.00         $2.00
GPT-4           Mar 2023  $30.00        $60.00
Claude 2        Jul 2023  $8.00         $24.00
GPT-4 Turbo     Nov 2023  $10.00        $30.00
Gemini Pro      Dec 2023  $0.50         $1.50
GPT-4o          May 2024  $5.00         $15.00
GPT-4o mini     Jul 2024  $0.15         $0.60
DeepSeek V3     Dec 2024  $0.27         $1.10

LLM input pricing 2020–2025 ($ per million tokens)

[Bar chart, logarithmic scale ($0.10–$100): LLM input price per million tokens falls from $60.00 (GPT-3, 2020) to $0.27 (DeepSeek V3, 2024) — a 97% reduction in five years. Source: provider pricing pages. sourc.dev 2026.]

Context window expansion

The context window — the maximum amount of text a model can process in a single request — has grown 244x in four years. GPT-3 launched in 2020 with a 4,096-token window. That is roughly three pages of text. If you wanted to summarize a 20-page document, you could not. The document simply did not fit.

The first major breakthrough came from Anthropic. Claude 2, launched in July 2023, offered a 100,000-token context window — the first model capable of processing an entire book in a single call. This was not an incremental improvement. It was a 12x jump from GPT-4's 8,192 tokens and enabled an entirely new category of applications: long-document analysis, full codebase understanding, and multi-document synthesis. Developers who had been building complex chunking and retrieval pipelines could suddenly pass entire documents directly to the model.

GPT-4 Turbo responded with 128,000 tokens in November 2023. Then Google set the current record in February 2024: Gemini 1.5 Pro at 1,000,000 tokens — approximately 10 full-length novels in a single prompt, and 244x the original GPT-3 window. Claude 3 followed in March 2024 at 200,000 tokens. At 1 million tokens, the constraint shifts from "what fits in context" to "what is it cost-effective to put in context," because you pay for every token in the window.

An honest caveat: larger context windows are not always better. Research shows that model attention quality can degrade in the middle of very long contexts — a phenomenon called "lost in the middle." Cost scales linearly with context length. And most applications do not need 1 million tokens. The practical question is not "what is the maximum?" but "what is the minimum context window that covers 95% of my use cases?" sourc.dev tracks context window size for every model precisely so builders can make that calculation.
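The 95%-coverage calculation is straightforward to automate. A minimal sketch, assuming you have a log of per-request token counts — the sample log and the set of candidate windows below are invented for illustration:

```python
def percentile(values, pct):
    """Value at the given percentile, nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def smallest_sufficient_window(request_tokens,
                               windows=(16_384, 32_768, 128_000, 200_000, 1_000_000)):
    """Pick the smallest context window covering 95% of observed requests."""
    need = percentile(request_tokens, 95)
    for w in sorted(windows):
        if w >= need:
            return w
    return max(windows)

# Hypothetical request log: mostly short prompts, a few long documents
log = [2_000] * 90 + [30_000] * 8 + [150_000] * 2
print(smallest_sufficient_window(log))  # 32768
```

The remaining 5% of oversized requests get chunking or summarization, as the paragraph above suggests.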

Model           Date      Context Window  Approx. Equivalent
GPT-3           Jun 2020  4,096           3 pages
GPT-4           Mar 2023  8,192           6 pages
GPT-3.5 Turbo   Mar 2023  16,384          12 pages
Claude 2        Jul 2023  100,000         1 novel
GPT-4 Turbo     Nov 2023  128,000         1.3 novels
Gemini 1.5 Pro  Feb 2024  1,000,000       10 novels
Claude 3        Mar 2024  200,000         2 novels

Context window size 2020–2024 (tokens)

[Horizontal bar chart: context window growth from GPT-3 at 4,096 tokens (2020, ≈3 pages) to Gemini 1.5 Pro at 1,000,000 tokens (2024, ≈10 novels), a 244x increase. Source: provider documentation. sourc.dev 2026.]

Who is building at scale

Adoption data tells a clear story: LLM usage crossed from early adopter to mainstream in 2024. GitHub Copilot reached 1.8 million paid subscribers by the end of 2023, making it one of the fastest-growing developer tools ever measured. The Stack Overflow 2024 Developer Survey found that 76% of developers are using or planning to use AI tools — up from 70% the prior year. That is not a niche technology.

Enterprise adoption is moving equally fast. McKinsey's 2024 State of AI report found that 65% of organizations are regularly using generative AI, up from 33% in their 2023 survey — a doubling in twelve months. McKinsey called this the fastest technology adoption curve they have documented. The JetBrains 2023 Developer Ecosystem survey found that 55% of developers had used AI coding assistants.

On the consumer side, ChatGPT reached 100 million weekly active users by November 2023, making it one of the most-used software products globally. The open source ecosystem is scaling at a different level: Hugging Face hosts over 900,000 models as of early 2025. The tools built on these models — code assistants, agents, RAG pipelines, content generators — represent the application layer that sourc.dev tracks separately.

Developer AI tool adoption 2023–2024

[Horizontal bar chart of developer AI adoption rates: Stack Overflow 2024, 76%; McKinsey 2024 (orgs), 65%; JetBrains 2023 (AI assistants), 55%; McKinsey 2023 (orgs), 33%. McKinsey: fastest technology adoption curve documented.]

What sourc.dev tracks for every model

Every language model in the sourc.dev directory is tracked across a consistent set of attributes. Each attribute has a dedicated learn page explaining what it measures, why it matters, and how it is collected.

Model families and generational development

Language models are developed in families — successive generations from the same organization, each building on the architecture and training of the previous version. Understanding model families matters because it reveals pricing trajectories, capability improvements, and the competitive dynamics that drive both.

The GPT family (OpenAI). OpenAI reached a $157 billion valuation in October 2024, making it one of the most valuable private companies in the world. The GPT lineage runs from GPT-3 (June 2020, 175B parameters, $60.00/1M tokens) through GPT-3.5 Turbo (March 2023, $2.00) to GPT-4 (March 2023, $30.00), GPT-4 Turbo (November 2023, $10.00), GPT-4o (May 2024, $5.00), and GPT-4o mini (July 2024, $0.15). Each generation has delivered more capability at lower cost. OpenAI also produces the o1 reasoning model family, a separate development branch optimized for multi-step logical tasks.

The Claude family (Anthropic). Anthropic, valued at $61.5 billion in early 2025, was founded by former OpenAI researchers and differentiates on safety research and long-context capability. The Claude lineage runs from Claude 1 (March 2023) through Claude 2 (July 2023, first 100k context window), Claude 3 (March 2024, Haiku/Sonnet/Opus tiers), to Claude Sonnet 4.6 (late 2024). Anthropic pioneered the tiered model approach — offering Haiku (fast and cheap), Sonnet (balanced), and Opus (maximum capability) under a single family.

The Gemini family (Google DeepMind). Google merged its AI research groups into Google DeepMind and launched the Gemini family in December 2023. Gemini Pro offered sub-dollar pricing ($0.50). Gemini 1.5 Pro set the context window record at 1 million tokens. Gemini 2.0 Flash (late 2024) at $0.10 per million input tokens became one of the cheapest frontier-adjacent models available. Google's advantage is infrastructure scale — they own the TPU hardware, the data centers, and the distribution through Google Cloud and Android.

The Llama family (Meta). Meta released the original Llama in February 2023 and Llama 2 as open weights in July 2023, fundamentally altering the market structure. Before Llama, open weights models were significantly behind proprietary models. Llama 2, and subsequently Llama 3 (April 2024) and Llama 3.3 (late 2024), demonstrated that competitive performance could be achieved in open weights form. Meta does not charge for the models directly — their business model uses AI to improve their advertising and social media platforms, and releasing open weights models builds ecosystem and talent.

The market has settled into a durable structural split: hosted API models (OpenAI, Anthropic, Google) versus open weights models (Meta, Mistral, DeepSeek). Hosted APIs offer convenience, managed infrastructure, and frequent updates. Open weights models offer control, data privacy, and the ability to fine-tune for specific tasks. Most production architectures will use both — APIs for complex tasks requiring frontier capability, self-hosted models for high-volume tasks where cost and latency dominate.

Language models and European data sovereignty

The EU AI Act, signed into law in August 2024, is the world's first comprehensive legal framework for artificial intelligence. Combined with GDPR's existing data residency requirements, it creates a regulatory environment that shapes which language models European organizations can use and how. This is not theoretical — procurement teams at European enterprises and public sector organizations are already filtering model choices by EU data residency capability.

Mistral AI, headquartered in Paris, has emerged as the EU champion for language models. Having raised over $1.1 billion, Mistral offers models that process data within European jurisdiction by default. Aleph Alpha, based in Heidelberg, Germany, builds sovereign AI infrastructure specifically for European government and enterprise customers. These are not the only options — Azure OpenAI offers EU data residency through European regions, Google Cloud's Vertex AI supports EU data location constraints, and open weights models can be self-hosted entirely on European cloud providers like OVHcloud, Hetzner, or Scaleway.

The Nordic dimension is worth noting. The Nordics have among the highest developer density per capita globally, strong digital infrastructure, and government procurement frameworks that increasingly require data sovereignty. For Nordic builders, the practical reality is a three-option matrix: EU-native providers (Mistral, Aleph Alpha), US providers with EU data residency options (Azure OpenAI, Google Vertex), or self-hosted open weights models on European infrastructure. sourc.dev tracks EU data residency as a first-class attribute for every model to support exactly this decision.

Models tracked

placeholder — live data coming

GPT-4o

OpenAI

pricing from $5.00/1M tokens

context: 128k tokens

verified 2026-03-24

placeholder — live data coming

Claude Sonnet 4.6

Anthropic

pricing from $3.00/1M tokens

context: 200k tokens

verified 2026-03-24

placeholder — live data coming

Gemini 2.0 Flash

Google

pricing from $0.10/1M tokens

context: 1M tokens

verified 2026-03-24

placeholder — live data coming

Mistral Large

Mistral AI

pricing from $2.00/1M tokens

context: 128k tokens

verified 2026-03-24

placeholder — live data coming

DeepSeek V3

DeepSeek

pricing from $0.27/1M tokens

context: 128k tokens

verified 2026-03-24

placeholder — live data coming

Llama 3.3 70B

Meta

pricing from $0.40/1M tokens

context: 128k tokens

verified 2026-03-24

New to LLM pricing?

Frequently asked questions

What is a language model?

A language model is a statistical system trained on text data to predict and generate sequences of words. Modern large language models (LLMs) like GPT-4o, Claude, and Gemini use transformer architectures with billions of parameters to perform tasks including text generation, summarization, translation, code writing, and reasoning. They accept text input (a prompt) and produce text output (a completion). Language models are accessed through APIs with per-token pricing or deployed locally using open weights. sourc.dev tracks 30 language models across pricing, context windows, and capability dimensions.

What is a token?

A token is the fundamental unit of text that language models process. One token is approximately 0.75 English words, or conversely, one English word averages about 1.3 tokens. The word "hamburger" becomes three tokens (ham-bur-ger). A typical page of English text contains roughly 400-500 tokens. Tokenization varies across models — each provider uses its own tokenizer (GPT models use tiktoken, Claude uses its own BPE tokenizer). All LLM API pricing is denominated in tokens, typically quoted per million tokens for both input and output. Understanding token counts is essential for estimating API costs and working within context window limits.
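Because every provider tokenizes differently, exact counts require the provider's own tokenizer (OpenAI publishes the tiktoken library for its models). For rough cost estimates, the ~1.3 tokens-per-word rule of thumb above is often enough. A minimal sketch of that heuristic:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~1.3 tokens-per-word rule of thumb.
    For billing-accurate counts, use the provider's own tokenizer
    (e.g. tiktoken for GPT models); this is only an approximation."""
    words = len(text.split())
    return round(words * 1.3)

page = "word " * 450  # roughly one page of English text
print(estimate_tokens(page))  # 585
```

Good enough for back-of-envelope budgeting; switch to the real tokenizer before committing to a cost model.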

What is the difference between input and output pricing?

LLM API providers charge separately for input tokens (what you send to the model) and output tokens (what the model generates back). Input tokens include your prompt, system instructions, conversation history, and any documents or context you provide. Output tokens are the model's response. Output tokens are almost always more expensive than input tokens — typically 2x to 4x the input price. This pricing split exists because generating output requires more computation than processing input. For GPT-4o, input costs $5.00 per million tokens while output costs $15.00 per million tokens — a 3x ratio. For Claude Sonnet 4.6, input is $3.00 and output is $15.00 per million tokens — a 5x ratio. When estimating costs, you need to model both sides. A chatbot application with long system prompts will be input-heavy. A content generation application will be output-heavy. The ratio between input and output spending varies significantly by use case, which is why sourc.dev tracks both input price and output price separately.

How do I calculate my monthly API cost?

To calculate monthly API cost, you need three numbers: your monthly input token volume, your monthly output token volume, and the per-million-token rates for each. Here is a worked example using GPT-4o pricing. Suppose your application sends 1 million input tokens and receives 200,000 output tokens per month. GPT-4o charges $5.00 per million input tokens and $15.00 per million output tokens. Input cost: 1,000,000 tokens x ($5.00 / 1,000,000) = $5.00. Output cost: 200,000 tokens x ($15.00 / 1,000,000) = $3.00. Total monthly cost: $5.00 + $3.00 = $8.00. The same workload on Claude Sonnet 4.6 ($3.00 input, $15.00 output): $3.00 + $3.00 = $6.00. On Gemini 2.0 Flash ($0.10 input, $0.40 output): $0.10 + $0.08 = $0.18. On DeepSeek V3 ($0.27 input, $1.10 output): $0.27 + $0.22 = $0.49. These are raw API costs only — they exclude infrastructure, caching, retry overhead, and engineering time. For production workloads, multiply your estimate by 1.2-1.5x to account for retries, prompt iteration, and monitoring overhead. sourc.dev tracks input pricing and output pricing for all 30 models with daily verification.
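The arithmetic above can be wrapped in a few lines. A minimal sketch using the prices quoted in this section — verify current rates against each provider's pricing page before relying on them:

```python
def monthly_cost(input_tokens, output_tokens, input_price, output_price):
    """Raw API cost in dollars; prices are quoted per million tokens."""
    return input_tokens * input_price / 1e6 + output_tokens * output_price / 1e6

# ($/1M input, $/1M output) as listed on this page
models = {
    "GPT-4o":            (5.00, 15.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Gemini 2.0 Flash":  (0.10, 0.40),
    "DeepSeek V3":       (0.27, 1.10),
}

# Workload from the worked example: 1M input, 200k output tokens/month
for name, (inp, out) in models.items():
    print(f"{name}: ${monthly_cost(1_000_000, 200_000, inp, out):.2f}")
```

This prints $8.00, $6.00, $0.18, and $0.49 — matching the worked example. Apply the 1.2-1.5x production multiplier on top.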

How have LLM prices changed since 2020?

LLM pricing has dropped 97% in five years. GPT-3 launched in 2020 at $60.00 per million input tokens — the first commercially available large language model API. By early 2023, GPT-3.5 Turbo brought that to $2.00. GPT-4 launched at $30.00 in March 2023, then GPT-4 Turbo reduced that to $10.00 by November 2023. Claude 2 entered at $8.00 in July 2023. Google's Gemini Pro launched at $0.50 in December 2023. GPT-4o arrived at $5.00 in May 2024, then GPT-4o mini at $0.15 in July 2024. DeepSeek V3 launched in December 2024 at $0.27 per million tokens. The pattern is consistent: each generation delivers equivalent or better capability at a fraction of the previous price. This collapse benefits builders — applications that were economically impossible at $60.00 per million tokens become trivial at $0.15.

How have context windows expanded since GPT-3?

Context windows have grown 244x in four years. GPT-3 launched in 2020 with a 4,096-token context window — roughly three pages of text. That meant the model could only process very short documents. GPT-3.5 Turbo extended to 16,384 tokens in 2023. Claude 2 was the first major model to break the 100,000-token barrier in July 2023, enabling practical long-document processing for the first time. GPT-4 Turbo followed with 128,000 tokens. Claude 3 reached 200,000 tokens in March 2024. Gemini 1.5 Pro set the current record at 1,000,000 tokens — approximately 10 full-length novels in a single prompt. Larger context windows enable new application categories: entire-codebase analysis, long-document summarization, and multi-document reasoning. However, larger is not always better — cost scales linearly with context length, and model attention quality can degrade at extreme lengths.

Should I use a proprietary API or self-host?

The choice between a proprietary API (GPT-4o, Claude, Gemini) and self-hosting an open weights model (Llama 3, Mistral, DeepSeek) depends on five factors. Data sensitivity: if data cannot leave your infrastructure, self-hosting eliminates third-party data exposure. Cost at scale: API pricing is simpler at low volume, but at roughly 10 million+ tokens per day, self-hosting on GPU instances often becomes cheaper. Latency requirements: self-hosted models on dedicated GPUs eliminate network round-trips and provider queue times. Capability requirements: as of early 2025, the largest proprietary models (GPT-4o, Claude Sonnet 4.6) still outperform the best open weights models on complex reasoning tasks, though the gap is narrowing. Operational complexity: self-hosting requires GPU procurement, model serving infrastructure, monitoring, and scaling — a minimum of one dedicated ML engineer. The practical decision rule: start with APIs for development speed, measure actual token volumes and latency needs for 30-60 days, then evaluate self-hosting only if you exceed 5-10 million tokens per day or have strict data residency requirements. Many production systems use both — API models for complex tasks, self-hosted models for high-volume simple tasks.
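The tokens-per-day threshold can be sanity-checked with a one-line break-even calculation. A sketch under illustrative assumptions — the blended API price and the flat daily GPU cost below are invented numbers, not quotes:

```python
def breakeven_tokens_per_day(api_price_per_m, gpu_cost_per_day):
    """Daily token volume at which a dedicated GPU matches API spend.
    Assumes a blended API price per million tokens and a flat daily
    GPU cost (instance + ops overhead); both inputs are assumptions
    you must replace with your own measured figures."""
    return gpu_cost_per_day / api_price_per_m * 1_000_000

# Illustrative: $2.00/M blended API price vs a $60/day GPU instance
print(f"{breakeven_tokens_per_day(2.00, 60):,.0f} tokens/day")  # 30,000,000
```

Below the break-even volume the API is cheaper; above it, self-hosting starts to pay for itself — before counting the engineering time the answer above warns about.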

What is the difference between open weights and open source?

Open weights and open source are different things, though they are frequently conflated. Open weights means the trained model parameters are publicly downloadable — you can run the model on your own hardware. Llama 3, Mistral, and DeepSeek V3 are open weights models. However, open weights alone does not mean open source. True open source, by the Open Source Initiative definition, requires releasing the training data, training code, and model weights under an OSI-approved license with no usage restrictions. Think of it like cooking: open weights gives you the finished dish to reheat at home, but not the recipe or ingredient sourcing. Open source gives you the complete recipe, the ingredient list, and permission to open your own restaurant. Most "open" models use custom licenses with restrictions — Meta's Llama license prohibits use by companies with over 700 million monthly active users. Mistral uses Apache 2.0 for some models, which is a genuine open source license. The distinction matters for procurement, legal compliance, and long-term vendor risk.

What context window size do I actually need?

The context window you need depends on your use case. Here are practical token estimates. A single customer support message: 100-300 tokens. A one-page document: 400-500 tokens. A full conversation history (20 turns): 2,000-5,000 tokens. A 10-page research paper: 4,000-6,000 tokens. A full software file (500 lines): 3,000-5,000 tokens. An entire small codebase (50 files): 100,000-200,000 tokens. A book-length document: 80,000-120,000 tokens. For chatbot applications, 16,000-32,000 tokens handles most conversations. For document analysis, you need at least 32,000 tokens. For codebase-level work, 128,000+ tokens. For multi-document research or book-length analysis, 200,000+ tokens. Remember: you pay for every token in your context window, not just the new input. If you stuff 100,000 tokens of context into every API call, your costs multiply accordingly. The practical approach: choose the smallest context window that covers 95% of your use cases, and handle the remaining 5% with chunking or summarization strategies.
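The "you pay for every token in the window" point is worth quantifying. A minimal sketch using GPT-4o's $5.00/M input price from this page, assuming no prompt-caching discounts:

```python
def context_cost_per_call(context_tokens, input_price_per_m):
    """Input-side cost of carrying a context of this size on one call."""
    return context_tokens * input_price_per_m / 1e6

# Stuffing 100k tokens of context into every GPT-4o call ($5.00/M input)
per_call = context_cost_per_call(100_000, 5.00)
print(f"${per_call:.2f} per call")                  # $0.50 per call
print(f"${per_call * 10_000:,.2f} per 10k calls")   # $5,000.00 per 10k calls
```

Fifty cents per request is invisible in a demo and ruinous at scale — which is why the minimum-sufficient-window approach matters.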

What is a reasoning model?

A reasoning model is a language model specifically trained to perform multi-step logical reasoning before producing a final answer. Unlike standard models that generate responses in a single pass, reasoning models produce an internal chain-of-thought — breaking complex problems into intermediate steps. OpenAI's o1 (September 2024) was the first major reasoning model, showing significant improvements on mathematics, coding, and scientific reasoning benchmarks. DeepSeek R1 (January 2025) demonstrated that reasoning capabilities could be achieved at dramatically lower cost — it is an open weights reasoning model that matched o1 performance on several benchmarks. Reasoning models typically cost more per token and take longer to respond because they generate many internal reasoning tokens. They excel at tasks requiring logical deduction, mathematical proof, code debugging, and complex planning. They are less suited for simple text generation, creative writing, or tasks where speed matters more than accuracy.

How many developers use LLM APIs?

Developer adoption of LLM tools has grown faster than any previous technology wave. GitHub Copilot reached 1.8 million paid subscribers by the end of 2023 — just two years after launch. The Stack Overflow 2024 Developer Survey found that 76% of developers are using or planning to use AI tools in their workflow. McKinsey's 2024 State of AI report found that 65% of organizations are regularly using generative AI, up from 33% in their 2023 survey — effectively doubling in one year. JetBrains' 2023 developer survey found that 55% of developers had used AI code assistants. ChatGPT reached 100 million weekly active users by November 2023. Hugging Face hosts over 900,000 models. McKinsey called this the fastest technology adoption curve they have documented. The adoption is not uniform — it is concentrated in software development, content creation, and data analysis, with slower uptake in regulated industries like healthcare and finance.

Which companies lead LLM development?

As of early 2025, LLM development is concentrated among six organizations. OpenAI (GPT family) reached a $157 billion valuation in October 2024 and remains the market leader by API revenue and brand recognition. Anthropic (Claude family) reached a $61.5 billion valuation, differentiating on safety research and long-context capability. Google DeepMind (Gemini family) has the advantage of Google's infrastructure and data assets, plus the largest context window at 1 million tokens. Meta (Llama family) released Llama 2 as open weights in July 2023, fundamentally changing the market by giving away competitive models. Mistral AI, based in Paris, has raised over $1.1 billion and serves as the leading European LLM developer. DeepSeek, a Chinese AI lab, disrupted the market in late 2024 and early 2025 with models matching frontier performance at a fraction of the cost. The market is split between proprietary API providers (OpenAI, Anthropic, Google) and open weights providers (Meta, Mistral, DeepSeek).

What is model drift?

Model drift is the phenomenon where a language model's behavior changes over time without any change to your code or prompts. Think of it like a scale that slowly loses calibration — you are weighing the same items but getting different results. Drift occurs because providers update, fine-tune, or replace model versions behind the same API endpoint. A prompt that worked reliably in January may produce different outputs in March. Stanford researchers documented measurable drift in GPT-4 and GPT-3.5 over a three-month period in 2023, with accuracy on certain tasks changing by 10-20 percentage points. Drift matters for production applications that depend on consistent model behavior. sourc.dev tracks drift index as a first-class attribute — documenting version changes, behavioral shifts, and provider update announcements for each model.
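One common mitigation is a canary evaluation: run a fixed prompt set on a schedule, record accuracy, and alert when it moves beyond a tolerance. A minimal sketch — the threshold and accuracy figures are illustrative choices, not a standard:

```python
def drift_alert(baseline_accuracy, current_accuracy, threshold=0.05):
    """Flag drift when accuracy on a fixed canary prompt set moves more
    than `threshold` (absolute) from the recorded baseline. The 5-point
    default tolerance is an arbitrary illustration — tune it per task."""
    return abs(current_accuracy - baseline_accuracy) > threshold

# Hypothetical canary results: 92% at baseline, 78% this week
print(drift_alert(0.92, 0.78))  # True
print(drift_alert(0.92, 0.90))  # False
```

Pinning a dated model version (where the provider offers one) plus a canary set like this catches most silent endpoint swaps before users do.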

Large model vs small model — when to use which?

Large models (GPT-4o, Claude Sonnet 4.6, Gemini 1.5 Pro) and small models (GPT-4o mini, Gemini 2.0 Flash, Llama 3.1 8B) serve different purposes, and the optimal architecture often uses both. Large models excel at complex reasoning, nuanced writing, multi-step problem solving, and tasks requiring broad world knowledge. They cost more per token ($3.00-$5.00 per million input tokens) and respond more slowly. Small models excel at classification, extraction, simple summarization, routing, and high-volume tasks where speed and cost matter more than reasoning depth. They cost 10-50x less ($0.10-$0.40 per million input tokens) and respond faster. The practical pattern in production is a routing architecture: a small, fast model handles 80% of requests (simple queries, classification, extraction), and routes complex requests to a large model for the remaining 20%. This can reduce costs by 60-80% compared to using a large model for everything. Decision rule: if a task can be solved with a clear prompt and structured output, use a small model. If it requires multi-step reasoning, ambiguity handling, or creative generation, use a large model. Test both — small models are better than most developers expect.
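The routing pattern can be sketched in a few lines. This toy version uses a keyword rule purely for illustration — production routers typically use a trained classifier or a cheap model as the router itself, and the model names in the comments are just examples:

```python
def route(prompt: str) -> str:
    """Toy router: send obviously simple, short requests to a small model
    and everything else to a large one. The keyword list is an invented
    illustration, not a recommended heuristic."""
    simple_markers = ("classify", "extract", "translate", "summarize briefly")
    if len(prompt) < 500 and any(m in prompt.lower() for m in simple_markers):
        return "small-model"   # e.g. GPT-4o mini / Gemini 2.0 Flash
    return "large-model"       # e.g. GPT-4o / Claude Sonnet 4.6

print(route("Classify this ticket as bug or feature request"))  # small-model
print(route("Design a migration plan for our billing system"))  # large-model
```

Even this crude split captures the economics: every request the router keeps on the small tier costs 10-50x less.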

What is multimodal?

Multimodal refers to language models that can process and generate multiple types of media — not just text. GPT-4o, Claude Sonnet 4.6, and Gemini 1.5 Pro can all accept images as input alongside text, enabling tasks like image description, chart reading, document OCR, and visual question answering. Gemini models additionally support video and audio input. Some systems can generate images (DALL-E 3 via ChatGPT) or speech (GPT-4o voice mode). Multimodal capability matters for applications that need to process real-world documents (PDFs, screenshots, photographs) rather than just clean text. Pricing for multimodal input varies — image tokens are typically more expensive than text tokens.

Which LLMs offer EU data residency?

For organizations subject to GDPR or EU data residency requirements, several options exist. Mistral AI, headquartered in Paris, processes data within EU jurisdiction by default — their La Plateforme API operates from European data centers. Azure OpenAI Service offers EU data residency through Azure's European regions (Netherlands, France, Sweden, Germany). Google Cloud's Vertex AI for Gemini models can be configured with EU data location constraints. Anthropic offers EU processing through AWS European regions for enterprise customers. For maximum control, open weights models (Llama 3, Mistral, DeepSeek) can be self-hosted entirely within EU infrastructure on European cloud providers like OVHcloud, Hetzner, or Scaleway. German company Aleph Alpha builds sovereign AI infrastructure specifically for European government and enterprise customers. sourc.dev tracks EU data residency as a first-class attribute for every model.

What is DeepSeek and why did it matter?

DeepSeek is a Chinese AI research lab that released two models that reshaped the LLM market. DeepSeek V3 launched in December 2024 as an open weights model with performance competitive with GPT-4o and Claude Sonnet 4.6 — but priced at $0.27 per million input tokens, roughly 10-20x cheaper than Western equivalents. DeepSeek R1, a reasoning model, followed in January 2025, matching OpenAI's o1 on several benchmarks while being open weights and dramatically cheaper. The market reaction was immediate: on January 27, 2025, Nvidia lost nearly $600 billion in market capitalization — the largest single-day market cap loss for any company in history at that time — and AI-related stocks sold off sharply across the board. DeepSeek mattered for three reasons. First, it demonstrated that frontier-level AI performance did not require the massive compute budgets assumed by Western labs. Second, as open weights models, DeepSeek V3 and R1 could be self-hosted by anyone, anywhere. Third, it introduced a US-China competitive dynamic that benefits global builders through lower prices and more options.

What is the difference between GPT-4o and Claude?

GPT-4o (OpenAI) and Claude Sonnet 4.6 (Anthropic) are both frontier language models, but they differ in design philosophy, pricing, and capabilities. GPT-4o processes text, images, and audio natively, with a 128,000-token context window. It is priced at $5.00 per million input tokens and $15.00 per million output tokens. It has the largest third-party ecosystem (plugins, integrations, fine-tuning). Claude Sonnet 4.6 focuses on safety, instruction following, and long-context performance. It has a 200,000-token context window — 56% larger than GPT-4o. It is priced at $3.00 per million input tokens and $15.00 per million output tokens — cheaper on input. Anthropic emphasizes Constitutional AI and interpretability research. In independent benchmarks, the models trade leads depending on the task category. GPT-4o tends to score higher on coding and multimodal tasks. Claude tends to score higher on long-document analysis and instruction following. Both are updated frequently. sourc.dev tracks both models with daily price verification and does not rank one above the other.

What is the EU AI Act?

The EU AI Act is the world's first comprehensive legal framework for artificial intelligence, signed into law in August 2024. It classifies AI systems by risk level: unacceptable risk (banned), high risk (strict requirements), limited risk (transparency obligations), and minimal risk (no restrictions). For LLM providers, the Act introduces specific obligations for "general-purpose AI models" (GPAI). Providers of GPAI models must publish training data summaries, comply with EU copyright law, and implement technical documentation requirements. Models classified as presenting "systemic risk" (trained with compute exceeding 10^25 FLOPs) face additional obligations including adversarial testing, incident reporting, and cybersecurity measures. The Act's provisions take effect in phases: prohibited practices from February 2025, GPAI rules from August 2025, and high-risk system requirements from August 2026. For European builders, the practical impact is threefold: increased documentation requirements from LLM providers, stronger incentives to use EU-based providers like Mistral, and new compliance obligations for applications built on top of LLMs in high-risk categories (healthcare, finance, employment).

Is AI going to replace software developers?

This question generates strong opinions. Here is what the data shows from both sides. The case for significant displacement: GitHub Copilot studies show 55% faster task completion. Cognition AI's Devin (2024) demonstrated autonomous multi-step coding. Google reported that 25% of new code at Google is now AI-generated (October 2024). Reasoning models like o1 and DeepSeek R1 can solve complex programming problems that would challenge mid-level developers. Stack Overflow traffic dropped roughly 50% after ChatGPT launched, suggesting developers are shifting where they seek answers.

The case against replacement: software development is more than writing code — it involves understanding requirements, system design, debugging distributed systems, navigating organizational politics, and making tradeoffs that require human judgment. AI-generated code requires human review, testing, and integration. Companies that adopted AI coding tools report they need fewer junior developers but more senior developers to review and architect. The historical pattern with every previous automation technology (compilers, IDEs, frameworks, cloud services) has been that developer productivity increased but total demand for developers also increased — because lower costs expanded the market for software.

The most likely outcome based on current trajectory: AI will change what developers do (less boilerplate, more review and architecture) rather than eliminate the role. But the skill floor will rise. The Stack Overflow 2024 survey found 76% of developers are already using or planning to use AI tools — adaptation is happening whether individual developers choose it or not.

Submit a model or correction →