There is a moment Sid Manchkanti keeps returning to. He was working at NVIDIA - one of the fastest-moving companies in the history of semiconductors - and watching supply chain teams manually transcribe data from documents. PDFs. Scans. Spreadsheets so old they had been faxed into existence. Engineers who cost six figures a year were hand-keying numbers because no piece of software could reliably read the things those numbers lived inside. The problem was not a lack of intelligence. It was a failure of infrastructure.

Manchkanti filed it away. Then he went to D.E. Shaw in Manhattan, where precision is not a value - it is a survival condition. If your data pipeline drops 30% of the signal, you do not lose a customer. You lose a trade. He filed that away too.

In 2024, he co-founded Pulse with Ritvik Pandey - a former Tesla ML engineer and Goldman Sachs data engineer with a dual degree in CS and Mathematics from Georgia Tech. Together they built what Manchkanti had been mentally drafting for years: a production-grade document intelligence platform that converts unstructured documents into LLM-ready structured data. Not a demo. Not a research project. A system built to run at enterprise scale, quietly, without breaking.

600 Million Pages in Stealth

Most companies announce first and build second. Pulse did the opposite. Before the product was ever publicly announced, Pulse had already processed 600 million pages for Fortune 100 enterprises. The milestone was not a launch - it was the proof that the thing worked before anyone was watching. By December 2025, the counter had crossed one billion.

The number is not the point. What's interesting about a billion pages is the constraint it imposes. You cannot get there by getting lucky. You get there by handling the worst cases: documents scanned at a 12-degree skew, tables that span three pages and don't repeat their headers, handwritten clinical notes that a human would struggle with. Pulse built a five-stage vision pipeline - layout detection, OCR, reading order algorithms, table parsing, fine-tuned visual models - to handle exactly that. On handwritten medical notes, the system achieves 92% accuracy. Legacy systems average 54%.
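One of the worst cases named above, a table that spans several pages without repeating its header, can be illustrated with a small sketch. This is not Pulse's implementation. The function name `stitch_table` and the input shape (one list of rows per page) are hypothetical, and the sketch assumes the header appears only as the first row of the first page.

```python
from typing import Dict, List


def stitch_table(fragments: List[List[List[str]]]) -> List[Dict[str, str]]:
    """Merge per-page table fragments into one list of records.

    Assumption: the first row of the first fragment is the header, and
    later fragments hold only data rows with the same number of columns.
    """
    if not fragments or not fragments[0]:
        return []
    header = fragments[0][0]
    records: List[Dict[str, str]] = []
    for page_index, rows in enumerate(fragments):
        # Only the first page carries the header row; skip it there.
        data_rows = rows[1:] if page_index == 0 else rows
        for row in data_rows:
            if len(row) != len(header):
                # Fail loudly rather than silently misaligning columns.
                raise ValueError(
                    f"page {page_index}: expected {len(header)} cells, got {len(row)}"
                )
            records.append(dict(zip(header, row)))
    return records


# Hypothetical three-page table; only page one has a header row.
pages = [
    [["part", "qty"], ["A-100", "4"]],
    [["B-200", "7"]],
    [["C-300", "1"]],
]
rows = stitch_table(pages)
```

The design choice worth noting is the explicit column-count check: a row that does not match the header raises an error instead of being quietly mapped to the wrong fields.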

"Document intelligence only matters if it works quietly, consistently, and correctly at scale."

- Sid Manchkanti, CEO of Pulse

The Investors Who Said Yes

In February 2025, Pulse announced a $3.9 million seed round. The lead investors were Nat Friedman and Daniel Gross - NFDG - two of the most signal-dense investors in AI infrastructure. Friedman ran GitHub after its Microsoft acquisition. Gross ran Y Combinator's AI track and has backed companies at the intersection of compute and capability. When those two write a seed-stage check together, it is worth paying attention.

The round also included Y Combinator (Pulse was part of the S24 batch), Sequoia Capital Scout, Soma Capital, Liquid 2 Ventures, Olive Tree Capital, Tiferes, and executives from NVIDIA, OpenAI, and Ramp. The cap table is a map of where AI infrastructure money is coming from in 2025.

Y Combinator partner Jared Friedman worked with the team during the batch. Y Combinator's account called Pulse's PDF data extraction "SOTA" when it went live - which, for a batch that included hundreds of AI companies, is not a thing YC says about every product.

What Pulse Actually Does

The clearest way to understand Pulse is to understand the problem it solves. Every major enterprise AI initiative - whether it is a RAG pipeline for a global bank, a claims processing system for an insurance carrier, or a due diligence tool for a venture fund - runs on documents. PDFs, Word files, Excel spreadsheets, scanned forms, clinical notes, rent rolls, contracts, tax filings. The information inside those documents is not usable until it is extracted, structured, and normalized.

For decades, that process was either manual (slow, expensive, error-prone) or reliant on legacy OCR tools that dropped context and destroyed table structure. AI startups tried to fix it with LLMs and hallucinated numbers into financial models. Pulse took a different approach: a vision model fine-tuned specifically for document layout understanding, combined with schema-first extraction that outputs directly to JSON schemas customers define themselves.
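The idea of schema-first extraction can be sketched in a few lines. This is a minimal illustration, not Pulse's API: the schema format (field name mapped to a Python type), the `conform` function, and the invoice fields are all hypothetical. The point it shows is that a value the schema asks for but the document does not supply is reported as missing rather than invented.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical customer-defined schema: field name -> expected Python type.
INVOICE_SCHEMA: Dict[str, type] = {
    "invoice_number": str,
    "total": float,
    "currency": str,
}


def conform(
    raw: Dict[str, Any], schema: Dict[str, type]
) -> Tuple[Dict[str, Any], List[str]]:
    """Coerce raw extracted values to the schema and list fields that fail.

    Fields not named in the schema are dropped; fields that are missing or
    cannot be coerced are set to None and reported, never guessed.
    """
    record: Dict[str, Any] = {}
    problems: List[str] = []
    for name, expected in schema.items():
        if name not in raw:
            record[name] = None
            problems.append(f"{name}: missing")
            continue
        try:
            record[name] = expected(raw[name])
        except (TypeError, ValueError):
            record[name] = None
            problems.append(
                f"{name}: cannot read {raw[name]!r} as {expected.__name__}"
            )
    return record, problems


# Hypothetical raw output from an upstream OCR or layout stage.
raw_fields = {"invoice_number": "INV-0042", "total": "1280.50", "notes": "net 30"}
record, problems = conform(raw_fields, INVOICE_SCHEMA)
print(json.dumps(record))
print(problems)
```

In this sketch the output record always has exactly the keys the schema defines, so downstream code can rely on its shape and inspect `problems` to decide whether a document needs human review.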

The result: up to 40% higher retrieval accuracy on downstream search and analytics workloads. A growth-stage company using Pulse for accounting workflows saves 2,000+ hours per month. A global bank feeds Pulse outputs directly into its credit-risk models. A YC startup automated its entire investment due diligence workflow. The use cases span finance, healthcare, insurance, legal, real estate, and supply chain - all sectors where documents are not edge cases. They are the product.

"PDF parsing and OCR tools have been around for decades, yet both legacy and AI startups struggle with real-world document processing."

- Sid Manchkanti

Compliance as a Feature, Not a Checkbox

Enterprise document processing means handling sensitive data at scale. Pulse built the compliance stack that made Fortune 100 customers comfortable: SOC 2 Type II, ISO 27001, GDPR compliance, HIPAA BAA, and zero-data retention policies. Deployment options include private VPCs, on-premises, and multi-cloud configurations. For regulated industries - healthcare, finance, insurance - these are not optional features. They are the table stakes for even having the conversation.

In September 2025, Pulse joined Cloudera's Enterprise AI Ecosystem at the EVOLVE25 conference in New York - a move that opens the platform to Cloudera's base of enterprise customers running AI workflows at scale. The integration enables an end-to-end pipeline: documents in, Pulse processing, Cloudera data lakehouse, then into ERP, CRM, and compliance systems.

Building for Failure Modes, Not Milestones

Manchkanti's engineering philosophy is visible in how Pulse talks about its own milestones. The company did not present "one billion pages processed" as a growth metric. It framed the milestone as a constraints paper - a detailed account of what it takes for a document processing system to reach that scale reliably: preprocessing failures, normalization edge cases, layout shifts, and inconsistent formatting across document families.

"Building for failure modes, not milestones" - that is a line from Pulse's own milestone post. It is a rare thing for a startup to say. Most founders write about growth. Manchkanti writes about the edge cases that would have killed the growth if he hadn't addressed them first.

There is something in that orientation that traces back to D.E. Shaw. In quantitative finance, the interesting problems are never the median cases. They are the tails. The documents that arrive rotated, with merged cells and missing headers, with handwriting over printed text. Pulse built a rotation model specifically for document orientation. It publishes notes on its extraction methodology. The blog posts read less like marketing and more like engineering specifications - which, for the kind of enterprise buyer Pulse is targeting, is exactly the right register.

What's Next

In 2025, Pulse opened its platform to all users with a free tier covering up to 20,000 pages - the same technology running inside Fortune 100 pipelines. The product lineup now includes Pulse STUDIO (zero-shot production extraction), Pulse Ultra (advanced document processing), Pulse Ultra Nano (self-hosted enterprise-scale deployment), and a Vision API for custom integrations.

With 33 employees, open roles across nearly every function, and a YC X25 deal that gives the current batch free access to the platform, Manchkanti is building the distribution layer while the core product compounds. The underlying bet is straightforward: every enterprise AI application needs clean structured data. Most of that data starts as an unstructured document. The company that runs that pipeline reliably owns a critical choke point in enterprise AI infrastructure.

Pulse is not the loudest voice in the document AI space. It is the one that has processed a billion pages.