In 2013, while most PhD students were debugging code in windowless labs, Ashesh Jain was pointing cameras at human faces to predict what drivers would do next - before they did it. The project was called Brain4Cars. It combined head-pose and gaze tracking to anticipate lane changes and turns seconds before the maneuver actually happened. MIT Technology Review wrote about it. So did Wired. The thesis was simple and a little unsettling: a machine that watches you can know you better than you know yourself.

That bet - that perception, applied carefully enough, becomes prediction - has been the throughline of everything Jain has built since. He carried it from Cornell (where he earned his PhD) and Stanford (where he was a visiting scholar at the AI Lab, working on RoboBrain alongside Brain4Cars) into the autonomous vehicle industry, and finally into the physical security market. The vehicle changed. The core insight didn't.

With the new generation of AI, human-level understanding of video is finally possible. We envision cameras evolving far beyond being mere video recorders - they will become a constant pair of eyes that keeps the public safe and secure in a privacy-sensitive manner.

- Ashesh Jain, Co-Founder & CEO, Coram AI

After Cornell, Jain went to Zoox - the autonomous vehicle startup that Amazon would later acquire for over $1 billion. He led the sensor fusion and 3D tracking team: the part of a self-driving car's brain that takes raw sensor data and turns it into a precise, real-time model of everything around the vehicle. Then came Lyft, which he joined in early 2018 as one of the first engineers in a nearly empty office. He rose to become Director of Engineering and Head of Autonomy, overseeing perception, planning, and the machine learning infrastructure stack for the entire self-driving program.

When Toyota's Woven Planet acquired Lyft's self-driving division in 2021, Jain and his colleague Peter Ondruska - a PhD from Oxford - started looking for the next domain where the same class of perception problem existed at scale. They landed on physical security. Specifically, on the absurd gap between what security cameras could do and what they actually did. Most cameras in America are, functionally, expensive hard drives that record video no one watches until something already went wrong.

The Problem They Spotted

There are over 70 million security cameras in the United States. Fewer than 5% are monitored in real time. The rest are storing footage that gets reviewed only after an incident - when it's already too late.

Jain and Ondruska founded Coram AI in 2021 with a specific thesis: the same computer vision breakthroughs that made self-driving cars work - large vision models, multi-modal sensor fusion, real-time inference - could be layered onto existing IP security cameras without requiring customers to rip out their hardware. Plug in Coram's cloud platform, and your $200 camera becomes an AI security agent that can detect firearms, recognize license plates, flag safety violations, find specific footage through natural language search, and alert a human within seconds of an anomaly.
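What that layering looks like is easier to see in code. The sketch below is illustrative only - Coram's models and cloud APIs are not public - and uses off-the-shelf stand-ins: OpenCV to pull the camera's standard RTSP stream and a stock YOLO detector. The stream URL, alert classes, and confidence threshold are all made up.

```python
# Illustrative sketch: layering inference onto an existing IP camera.
# OpenCV + a stock YOLO model stand in for Coram's proprietary stack;
# the RTSP URL, alert classes, and threshold below are hypothetical.
import cv2
from ultralytics import YOLO

RTSP_URL = "rtsp://user:pass@192.168.1.42:554/stream1"  # any standard IP camera
ALERT_CLASSES = {"person"}  # e.g. after-hours presence; custom events need custom models

model = YOLO("yolov8n.pt")         # off-the-shelf detector
cap = cv2.VideoCapture(RTSP_URL)   # the camera itself is unmodified

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Run detection on each frame; a production system would sample,
    # batch, and run inference in the cloud rather than inline like this.
    for result in model(frame, verbose=False):
        for box in result.boxes:
            label = result.names[int(box.cls)]
            if label in ALERT_CLASSES and float(box.conf) > 0.6:
                print(f"ALERT: {label} ({float(box.conf):.2f})")  # real system pages a human

cap.release()
```

The point of the architecture is in the loop: the intelligence lives in software, so the camera never needs replacing.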

The platform works across schools, hospitals, warehouses, manufacturing plants, cannabis facilities, and multi-site retail. A school administrator can type "show me anyone who entered the east wing carrying a backpack yesterday afternoon" and get results in seconds. A warehouse safety manager gets an automatic alert when someone enters a restricted zone without PPE. A church's IT director gets a weekly digest of incidents the system caught and resolved.
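That kind of sentence-level search generally rests on joint text-image embeddings. Here is a minimal sketch of the retrieval pattern, assuming a CLIP-style model via the open-source open_clip library; Coram's actual search stack is not public, and the frame paths below are hypothetical.

```python
# Minimal text-to-video-frame search, assuming CLIP-style embeddings.
# This illustrates the retrieval pattern, not Coram's implementation.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def index_frames(paths):
    """Indexing pass: embed sampled frames once, keep vectors with timestamps."""
    images = torch.stack([preprocess(Image.open(p)) for p in paths])
    with torch.no_grad():
        feats = model.encode_image(images)
    return feats / feats.norm(dim=-1, keepdim=True)

def search(query, frame_feats, k=5):
    """Query pass: embed the sentence, rank frames by cosine similarity."""
    with torch.no_grad():
        q = model.encode_text(tokenizer([query]))
    q = q / q.norm(dim=-1, keepdim=True)
    scores = (frame_feats @ q.T).squeeze(1)
    return scores.topk(min(k, len(scores))).indices.tolist()

# feats = index_frames(["east_wing/14_02.jpg", "east_wing/14_07.jpg"])  # hypothetical paths
# hits = search("a person carrying a backpack", feats)
```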

Battery Ventures led Coram's $13.8M Series A in January 2025, with 8VC and Mosaic Ventures participating. Marcus Ryu, the Battery partner who led the deal, put it plainly: "Ashesh and Peter combine deep academic expertise in video processing and machine learning with technical pragmatism developed in the demanding domain of self-driving vehicles. This combination makes them, in my view, the ideal founding team to build winning consumer and enterprise video AI products."

Coram's total funding sits at over $19 million. The company now has 60 employees with offices in California and London. G2 users rate the product 4.9 out of 5, with a 9.5 out of 10 for ease of use - numbers that, for enterprise security software, are genuinely unusual.

Technology can get you excited day-to-day, but in the long run you'll only have an impact when you build a successful product.

- Ashesh Jain

The academic record behind Coram is formidable in ways that matter for what the company actually does. Jain's 2016 CVPR paper, Structural-RNN, introduced a method for doing deep learning on spatiotemporal graphs - modeling how objects and agents relate to each other across time and space. It has been cited over 1,600 times. His 2018 PointFusion paper, on fusing camera and LiDAR data for 3D object detection, has over 1,000 citations. These aren't theoretical footnotes; they are the kind of papers that get read by the engineers building the systems Coram now competes with.
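The core idea of Structural-RNN is concrete enough to sketch: every edge of a spatiotemporal graph gets an RNN that models an interaction over time, and every node's RNN consumes the node's own features plus the hidden states of its incident edges. Below is a single-node, single-edge toy version in PyTorch; the dimensions are arbitrary, and the paper itself shares RNN weights across all nodes and edges of the same semantic type.

```python
# Toy Structural-RNN cell: one node, one edge. The real model shares
# RNN weights across nodes/edges of the same semantic type.
import torch
import torch.nn as nn

class StructuralRNNCell(nn.Module):
    def __init__(self, node_dim, edge_dim, hidden=64):
        super().__init__()
        self.edge_rnn = nn.LSTMCell(edge_dim, hidden)           # tracks an interaction over time
        self.node_rnn = nn.LSTMCell(node_dim + hidden, hidden)  # node features + edge summary

    def forward(self, node_x, edge_x, node_state, edge_state):
        # The edge RNN updates first; its hidden state feeds the node RNN,
        # which is how relationships inform per-object predictions.
        eh, ec = self.edge_rnn(edge_x, edge_state)
        nh, nc = self.node_rnn(torch.cat([node_x, eh], dim=-1), node_state)
        return (nh, nc), (eh, ec)

# States start at zero, e.g. node_state = (torch.zeros(1, 64), torch.zeros(1, 64))
```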

He holds an h-index of 22, meaning 22 of his papers have each been cited at least 22 times - a benchmark that puts him comfortably in the top tier of applied ML researchers. His total citation count exceeds 6,200, with nearly 4,000 of those accumulating after 2021 - evidence that his academic work is still actively influencing the field even as he runs a startup.
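For anyone unfamiliar with the metric, it is simple enough to compute in a few lines:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

# h_index([100, 50, 40, 3, 1]) == 3  (a 4th paper would need >= 4 citations)
```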

The Lyft dataset he helped open-source - One Thousand and One Hours of self-driving motion prediction data - has been cited over 640 times on its own. It is still training the researchers who, in turn, train the models that companies like Coram now deploy.

Jain speaks with the cadence of someone who has spent years translating between research and engineering teams. His public quote on where physical security is heading - "The future of physical security is agentic" - is the kind of statement that is either exactly right or five years early, and in his case, the track record suggests it's the former. Brain4Cars was showing anticipatory machine behavior in 2013. Structural-RNN was modeling spatiotemporal relationships before transformers made it fashionable. PointFusion was fusing sensor modalities before "multimodal" became a pitch deck word.

The throughline, from a PhD student pointing cameras at drivers' faces to a startup CEO watching every school hallway in real time, is the same: the machine sees, the machine anticipates, and the human is safer for it.