Version history — Text Generation Inference
15 data points across 4 dates
2026-04-15
2026-04-14
2026-04-13
2026-04-12
Description: Production inference server for LLMs by Hugging Face. Optimized for transformer models, with continuous batching, tensor parallelism, and an OpenAI-compatible API. Powers HuggingChat, the HF Inference API, and Inference Endpoints. v1.x releases were licensed under HFOIL (commercial SaaS use requires a license); v2.0 returned to Apache 2.0. Now in maintenance mode.