For years, AI has been great at seeing things. It can recognize faces, label objects and summarize the contents of a blurry image better than most humans. But ask it to explain why a person is pacing nervously near a fence, or predict what might happen next in a crowded room — and suddenly, the illusion of intelligence falls apart.
Add the fact that AI largely remains a black box, with engineers still struggling to explain why models behave erratically or how to correct them, and the industry's central dilemma comes into focus.
But that’s where a growing wave of researchers and startups believe the next leap lies: not just in faster model training or flashier generative outputs, but in machines that truly understand the physical world — the way it moves, reacts and unfolds in real time. They’re calling it “physical AI”.
The term was popularized by Nvidia CEO Jensen Huang, who has called physical AI the next AI wave, describing it as "AI that understands the laws of physics": systems that move beyond pixel labeling to bodily awareness of space, motion and interaction.
From Passive Cameras To Active Sensors
At its core, physical AI merges computer vision, physics simulation and machine learning to teach machines cause and effect. Essentially, it enables AI systems to not just recognize objects or people, but to understand how they interact with their surroundings — like how a person’s movement might cause a door to swing open or how a ball might bounce off a wall.
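The ball-off-a-wall example can be made concrete with a toy simulation. A purely appearance-based model labels what is in a frame; a physics-aware model can answer "where will it be next?" The sketch below is purely illustrative, not any vendor's implementation:

```python
# Toy illustration of "physical" prediction: given an object's current
# position and velocity, extrapolate its path, including a bounce off a wall.

def predict_path(x, vx, wall=10.0, dt=0.1, steps=5):
    """Extrapolate 1-D positions over time, reflecting off a wall at x=wall."""
    path = []
    for _ in range(steps):
        x += vx * dt           # advance position by velocity * timestep
        if x >= wall:          # collision: model an elastic bounce
            x = 2 * wall - x   # reflect the overshoot back inside
            vx = -vx           # reverse direction of travel
        path.append(round(x, 2))
    return path

# A ball near the wall, moving toward it: the model predicts the bounce.
print(predict_path(x=9.7, vx=5.0))  # [9.8, 9.3, 8.8, 8.3, 7.8]
```

Even this trivial model captures something a frame classifier cannot: a prediction about the future that follows from cause and effect rather than from appearance alone.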
At Lumana, a startup backed by global venture capital and growth equity firm Norwest, that phrase isn’t just branding; it’s a full-blown product shift. Known for AI video analytics, the company is now training its models not only to detect motion, but to recognize human behavior, interpret intent and automatically generate real-time alerts.
“We define physical AI as the next evolution of video intelligence,” Lumana CEO Sagi Ben-Moshe said in an interview. “It’s no longer just about identifying a red car or a person in a hallway — it’s about inferring what might happen next, and taking meaningful action in real-world conditions.”
In one real-world deployment, Lumana’s system flagged a possible assault after detecting unusual body language and close proximity between two men and a pair of unattended drinks, prompting an alert that allowed staff to step in before anything escalated. In another case, it caught food safety violations in real time, including workers skipping handwashing, handling food without gloves and leaving raw ingredients out too long. These weren’t issues discovered after the fact, but ones that the system caught as they unfolded. This kind of layered inference, Ben-Moshe explained, transforms cameras into “intelligent sensors.”
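Lumana has not published its architecture, but the "layered inference" idea can be sketched generically: each layer consumes the previous layer's output, from raw detections up through behavior tags to an alert decision. All function names, labels and thresholds below are hypothetical, chosen only to illustrate the pattern:

```python
# Hypothetical sketch of layered video inference: detections -> behavior
# tags -> risk score -> alert. Not Lumana's actual API or logic.

def detect_objects(frame):
    # Layer 1: stand-in for an object/event detector over a video frame.
    return frame["detections"]

def infer_behaviors(detections):
    # Layer 2: combine co-occurring detections into behavior tags.
    tags = []
    if "person_pacing" in detections:
        tags.append("agitation")
    if "unattended_drink" in detections and "close_proximity" in detections:
        tags.append("possible_tampering")
    return tags

def score_intent(tags):
    # Layer 3: weight behavior tags into a single risk score.
    weights = {"agitation": 0.3, "possible_tampering": 0.6}
    return sum(weights.get(t, 0.1) for t in tags)

def process(frame, threshold=0.5):
    # Full pipeline: raise a real-time alert only when risk crosses threshold.
    tags = infer_behaviors(detect_objects(frame))
    risk = score_intent(tags)
    return {"tags": tags, "risk": risk, "alert": risk >= threshold}

frame = {"detections": ["unattended_drink", "close_proximity", "person_pacing"]}
print(process(frame))
```

The point of the layering is that no single detector fires the alert; it is the combination of lower-level signals, evaluated in context, that turns a camera feed into an "intelligent sensor."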
Real-World Impact
Huang has linked physical AI to embodied intelligence and real-world simulation, reflecting a broader industry shift toward AI systems that understand the laws of physics and can reason more intelligently. Physics, in this context, is shorthand for cause and effect: the ability to reason about motion, force and interaction, not just appearances.
That framing resonated with investors at Norwest, who incubated Lumana during its earliest phase. “You can’t build the future of video intelligence by just detecting objects,” said Dror Nahumi, a general partner at Norwest. “You need systems that understand what’s happening, in context and can do it better than a human watching a dozen screens. In many cases, businesses also need this information in real-time.”
Norwest isn’t alone. Other players, from Hakimo to Vintra, are exploring similar territory — using AI to spot safety violations in manufacturing, detect loitering in retail, or prevent public disturbances before they escalate.
For example, Hakimo recently built an autonomous surveillance agent that prevented assaults, identified vandalism and even saved a collapsed individual using live video feeds and AI. At GTC in March, Nvidia demoed robotic agents learning to reason about gravity and spatial relationships directly from environment-based training, echoing the same physical reasoning that Lumana is building into its surveillance stack.
And just yesterday, Meta announced the release of V-JEPA 2, "a self-supervised foundation world model to understand physical reality, anticipate outcomes and plan efficient strategies." As Michel Meyer, group product manager at the company's Fundamental AI Research arm, noted on LinkedIn, quoting Meta chief AI scientist Yann LeCun, "this represents a fundamental shift toward AI systems that can reason, plan, and act through physical world models. To reach advanced machine intelligence, AI must go beyond perception and understand how the physical world works — anticipating dynamics, causality, and consequences. V‑JEPA 2 does just that."
When asked what the real-world impact of physical AI might look like, Nahumi noted that it’s more than mere marketing. “Anyone can detect motion, but if you want real AI in video surveillance, you must go beyond that to understand context.” He sees Lumana’s full-stack, context-driven architecture as a foundation and not a vanity pitch.
“We think there’s a big business here and the technology is now reliable enough to augment and outperform humans in real time,” he told me.
Trust And Transparency
The reality is that the success of physical-AI systems will hinge on more than the technology. As AI advances, it is increasingly clear that most AI systems succeed or fail on ethics, trust and accountability. Put differently, trust is the currency of AI success. And the big question companies must keep answering is: can we trust your AI system to be safe?
In a security context, false positives can shut down sites or wrongly accuse innocent people. In industrial settings, misinterpreted behavior could trigger unnecessary alarms.
Privacy is another concern. While many physical AI systems operate on private premises — factories, campuses, hotels — critics warn that real-time behavior prediction, if left unchecked, could drift into mass surveillance. As Ben-Moshe himself acknowledged, this is powerful technology that must be used with guardrails, transparency and explicit consent.
But, according to Nahumi, Lumana’s multi-tiered model delivers actionable alerts, but also protects privacy and supports seamless integration into existing systems. “Lumana engineers systems that layer physical AI on current infrastructure with minimal friction,” he noted, “ensuring operators aren’t overwhelmed by false positives.”
A Market On The Brink
Despite these questions, demand is accelerating. Retailers want to track foot traffic anomalies. Municipalities want to prevent crime without expanding staff. Manufacturers want safety compliance in real time, not post-event reviews. In every case, the challenge is the same: too many cameras, too little insight.
And that’s the business case behind physical AI. As Norwest’s Nahumi put it, “We’re seeing clear ROI signals — not just in avoided losses, but in operational efficiency. This is no longer speculative deep tech. It’s a platform bet.”
That bet hinges on systems that are scalable, adaptable and cost-effective. Lumana’s approach, which layers physical AI on top of existing camera infrastructure, avoids the “rip-and-replace” problem and keeps adoption friction low. Nahumi pointed to rising enterprise demand across retail, manufacturing, hospitality and public safety — fields where video footage is ubiquitous, but analysis remains manual and inefficient.
Across boardrooms and labs alike, the appetite for machines that "understand" rather than merely "observe" is growing. That's why companies like Norwest, Nvidia, Hakimo and Lumana are doubling down on physical AI.
“In five years,” Ben-Moshe envisions, “physical AI will do more than perceive — it will suggest actions, predict events and give safety teams unmatched visibility.” This, he noted, is about systems that not only see, but also act.
The Takeaway
Ultimately, the goal of physical AI isn't just to help machines see better; it's to help them perceive, understand and reason about the messy physical world we inhabit.
Ben-Moshe envisions a future where physical AI suggests actions, prevents escalation and even predicts incidents before they unfold. “Every second of video should generate insight,” he said. “We want machines to reason about the world as a system — like particles tracing possible paths in physics — and highlight the most likely, most helpful outcome.”
That’s a far cry from today’s basic surveillance. From thwarting crime and preventing accidents to uncovering new operational insights and analyzing activity trends, reasoning engines over cameras promise real, demonstrable value.
But scaling them is where the real work is. It’ll require systems that are accurate, ethical, auditable and trustworthy. If that balance is struck, we could enter a world where AI won’t just help us see what happened, but help us know what matters most.


