Tagged Content
Everything on the platform tagged with inference.
OpenRouter is a unified API gateway for large language models. Through a single OpenAI-compatible endpoint, developers can reach 400+ models from 60+ providers, with automatic routing, price comparison, and failover across vendors. It removes the need to maintain separate integrations for OpenAI, Anthropic, Google, Mistral and dozens of others, and now processes around 100 trillion tokens per month for more than 8 million users.
GMI Cloud is an AI-native GPU cloud built around NVIDIA's H100 and H200 accelerators. Founded in 2021 and headquartered in Mountain View, it sells on-demand and reserved GPU compute, an orchestration layer (Cluster Engine), and a low-latency Inference Engine to AI labs and enterprises building generative models. In 2025 NVIDIA named it one of seven Reference Platform Cloud Partners worldwide.
MatX is a Mountain View semiconductor company building chips designed exclusively for large language models. Founded in 2022 by ex-Google TPU engineers Reiner Pope and Mike Gunter, it aims to deliver an order of magnitude more performance per dollar for frontier model training and inference than current GPUs.
Dean Leitersdorf is the 27-year-old Co-Founder and CEO of Decart, an AI research lab building real-time world models and ultra-fast inference infrastructure. He completed a PhD at the Technion at 23 and is a veteran of Israel's elite Unit 8200. He co-founded Decart in late 2023 with Moshe Shalev. The company went from stealth to unicorn status in under a year and has raised over $453M in total, including a $300M Series C in May 2026 backed by NVIDIA, Radical Ventures, Adobe, Toyota, and angel investors including Andrej Karpathy. Decart's products - DOS (inference stack), Lucy (real-time video transformation), and Oasis (the viral AI-generated Minecraft-like game) - position it as a vertically integrated AI company targeting a billion-user consumer app.
Together AI is a San Francisco-based AI acceleration cloud that lets developers and enterprises train, fine-tune, and run open-source generative AI models on a high-performance GPU platform. Backed by $533M+ from General Catalyst, NVIDIA, Salesforce Ventures and others, it has become one of the most prominent challengers to the closed-model status quo.
FlexAI is a Paris-based AI infrastructure company building a 'universal AI compute' layer that lets teams deploy, train, and serve models across diverse GPU architectures and cloud providers without wrestling with the underlying hardware. Founded in 2023 by former Intel, NVIDIA, Apple, and Tesla veterans, it raised a $30M seed round in April 2024 and is positioning itself as Europe's answer to the GPU-as-a-service crunch.
SambaNova Systems is a Palo Alto AI company building purpose-built chips and a full-stack platform for fast, efficient inference on the largest open-source models, taking direct aim at NVIDIA in the enterprise AI market.
Byung-Gon Chun is the CEO and Co-founder of FriendliAI, and a professor of Computer Science and Engineering at Seoul National University, currently on leave. A systems researcher turned founder, he is best known for inventing continuous batching - the scheduling technique that became standard in major LLM inference engines, from vLLM to TensorRT-LLM. His lab published the foundational ORCA paper at OSDI 2022, and he then turned that academic insight into FriendliAI, an enterprise AI inference platform that has raised $26.7M and supports over 550,000 models from Hugging Face. With a career spanning Intel, Yahoo!, Microsoft, and Facebook, Chun brings rare depth across both research and production AI infrastructure.
Gavin Uberti is the co-founder and CEO of Etched, a Cupertino-based AI chip startup building Sohu, which it describes as the world's first transformer-specific ASIC. A Harvard dropout and 2024 Thiel Fellow, Uberti co-founded Etched in 2022 alongside Chris Zhu and Robert Wachen after betting that the transformer architecture would dominate AI for years to come. That bet has paid off spectacularly: Etched has raised over $625 million (including a $500M round in January 2026 at a $5 billion valuation), and the company claims Sohu runs transformer inference 20x faster than NVIDIA's H100 at a fraction of the cost - positioning Etched as one of the most serious challengers to NVIDIA's dominance in AI compute.
Rodrigo Liang is the co-founder and CEO of SambaNova Systems, an AI infrastructure company he co-founded in 2017 with Stanford professors Kunle Olukotun and Chris Ré. Born in Taipei, raised in Brazil, and trained in electrical engineering at Stanford, Liang spent two decades designing high-performance processors at Hewlett-Packard, Sun Microsystems, and Oracle before betting that the entire computing paradigm for AI needed to be reimagined. SambaNova's Reconfigurable Dataflow Unit (RDU) is the architectural expression of that conviction - chips designed around data movement rather than instructions. In February 2026 the company announced the SN50, claiming it runs agentic AI 5x faster than competing chips at 3x lower cost, alongside $350M in Series E funding and a strategic partnership with Intel.

Dylan Patel is the founder and chief analyst of SemiAnalysis, a boutique AI infrastructure research and consulting firm he started as a solo blog on his 24th birthday. Growing up working night shifts at his immigrant parents' motel in rural Georgia, he taught himself semiconductor analysis while toggling between RuneScape and chip-geek forums. Today SemiAnalysis has 85+ employees and 260,000+ subscribers, and it is on track to surpass $100 million in revenue in 2026. That growth has made Patel one of the most influential voices in AI infrastructure, cited by Jensen Huang at GTC and referenced by Sam Altman.

Yao Fu (符尧) is an AI researcher at xAI specializing in large language model reasoning, efficient inference, and distributed systems. A PhD graduate of the University of Edinburgh, he previously worked at Google DeepMind on Gemini 3 and Project Astra. With over 5,000 citations and key papers like ServerlessLLM (OSDI '24) and DuoAttention (ICLR '25), Fu bridges systems engineering and ML research. He writes the 'Yao Fu' newsletter on Notion and is known for the Chain-of-Thought Hub benchmark repository, which helped track LLM reasoning progress across the field.

OctoAI (formerly OctoML) was a Seattle-based AI infrastructure company founded in 2019 by University of Washington researchers — including Apache TVM creator Tianqi Chen and CEO Luis Ceze. The company built a generative AI inference platform that gave developers fast, affordable API access to leading open-source LLMs and image generation models, along with OctoStack, an enterprise-grade private AI deployment stack. After raising ~$132M and pivoting from ML optimization to GenAI infrastructure, OctoAI was acquired by NVIDIA in September 2024 and wound down its commercial services by October 31, 2024.
Fireworks AI is a generative AI inference platform founded in 2022 by seven engineers, five of whom built PyTorch at Meta, that gives enterprises fast, cost-efficient, and customizable access to hundreds of open-source models. The company says its proprietary FireAttention kernels and speculative-execution engine deliver up to 40× faster inference and 8× cost reduction versus alternatives, while its fine-tuning and model-deployment tooling lets companies own their AI stack end-to-end. With $327M+ raised, a $4B valuation, 10,000+ customers including Samsung, Uber, Shopify, and Cursor, and a $315M annualized run-rate as of early 2026, Fireworks AI has become a widely used inference layer for production generative AI applications.

Predibase was a San Francisco-based AI infrastructure company (founded 2020, acquired by Rubrik in June 2025) that pioneered efficient LLM fine-tuning and serving at scale. Built by the creators of Uber AI's Ludwig and Horovod frameworks, Predibase made it easy for enterprises to fine-tune and deploy open-source LLMs using LoRA adapters — often outperforming GPT-4 on domain-specific tasks for under $8 of compute. Its open-source LoRAX inference server enabled serving thousands of fine-tuned models from a single GPU, dramatically cutting costs. After raising $28M from Greylock and Felicis, Predibase was acquired by cybersecurity firm Rubrik for over $100M to accelerate agentic AI adoption.

Baseten is a San Francisco-based AI inference infrastructure company that provides dedicated and serverless GPU compute for running AI models at scale. Founded in 2019 by four ex-Gumroad engineers, the company has grown into a unicorn with a $5B valuation and $585M in total funding, backed by NVIDIA and other top-tier investors. Baseten powers inference workloads for 100+ enterprises including Cursor, Notion, HeyGen, and Clay, offering an inference stack with near-zero cold starts, proprietary networking, and open-source tooling like Truss for model packaging.

Modal (Modal Labs) is an AI-native serverless cloud computing platform that gives developers instant, elastic access to GPUs and CPUs through a clean Python SDK — no YAML, no Dockerfiles, no infrastructure management required. Founded in 2021 by Spotify ML veteran Erik Bernhardsson, Modal enables AI and ML teams to scale from zero to thousands of GPUs in seconds, paying only for what they use. With customers like Suno, Mistral AI, Harvey, Ramp, and Substack, Modal reached unicorn status at a $1.1B valuation in September 2025 and was reportedly in talks to raise at $2.5B just five months later.

RunPod is an AI cloud infrastructure company that provides on-demand GPU compute for training, fine-tuning, and deploying AI/ML models. Founded in 2022 by two former Comcast engineers who pivoted their Ethereum mining rigs into AI servers, RunPod grew to $120M ARR with just $22M raised by early 2026, serving 500,000+ developers across 183 countries. Its marketplace model, per-second billing, and support for 30+ GPU SKUs — from consumer RTX 4090s to enterprise H100s and B200s — make it a capital-efficient disruptor to hyperscaler GPU clouds like AWS, GCP, and Azure.