Ivan Ruchkin Is Building Worlds

Ivan Ruchkin, Ph.D.

Ivan Ruchkin, Ph.D., builds worlds. His recent work concerns the way autonomous agents perceive the world and, indeed, the way in which the agents’ worlds are constructed. It’s work that verges on the philosophical, but is strongly rooted in machine learning, neural networks, and cyber-physical systems.

Some of this work is happening under the umbrella of Ruchkin’s recent project, funded by the National Science Foundation, “Neuro-Symbolic Bridge: From Perception to Estimation & Control,” as well as his recent NSF CAREER Award.

Some Background

Deep neural nets are everywhere in autonomy and robotics, but many in the field doubt that they alone are the ultimate future. There is growing interest in integrating classical approaches into deep neural networks. These approaches go by many names: first-principles, theory-driven, model-based, physics-informed, and so on. The unifying term for them all has been “symbolic,” meaning based on abstract symbols rooted in how humans think and communicate. Neuro-symbolic AI holds a great deal of promise because it combines the high performance and flexibility of neural nets with the robustness, reliability, and generality of symbolic techniques.

Improving How an AI System Understands

Deep learning models are increasingly employed for perception, prediction, and control in complex systems. Embedding physical knowledge into these models is crucial for achieving realistic and consistent outputs, a challenge often addressed by physics-informed machine learning. However, integrating physical knowledge with representation learning becomes difficult when dealing with high-dimensional observation data, such as images, particularly under conditions of incomplete or imprecise state information.
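
To make the idea concrete, here is a minimal, hedged sketch of physics-informed machine learning in PyTorch: a small network fits observed data while an extra residual term penalizes violations of a known law (here, free fall, y'' = -g). The network architecture, the free-fall example, and the loss weighting are illustrative assumptions, not code from Ruchkin's projects.

```python
# Minimal sketch of a physics-informed loss (illustrative assumptions throughout).
# A small network maps time t to a predicted height y(t); besides fitting data,
# it is penalized when its second derivative violates free-fall dynamics y'' = -g.
import torch
import torch.nn as nn

g = 9.81
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

def physics_residual(t):
    """Residual of y'' + g = 0, computed with autograd."""
    t = t.requires_grad_(True)
    y = net(t)
    dy = torch.autograd.grad(y.sum(), t, create_graph=True)[0]
    d2y = torch.autograd.grad(dy.sum(), t, create_graph=True)[0]
    return d2y + g

# Observed (t, y) pairs -- synthetic placeholders standing in for real data.
t_obs = torch.rand(32, 1)
y_obs = 10.0 - 0.5 * g * t_obs ** 2

# Collocation points where only the physics law is enforced (no labels needed).
t_col = torch.rand(128, 1)

data_loss = ((net(t_obs) - y_obs) ** 2).mean()
phys_loss = (physics_residual(t_col) ** 2).mean()
loss = data_loss + 1.0 * phys_loss   # weighted combination drives training
loss.backward()
```

The physics term acts on unlabeled collocation points, which is how such methods inject first-principles knowledge even where state information is incomplete or imprecise.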

The main challenge is that these models aren’t able to internalize the constraints and laws of the physical world. We’ve all seen deepfake videos in which people and objects morph into or pass through one another. These kinds of mistakes occur when an AI system doesn’t have a full grasp of the rules and constraints of the physical world it is attempting to portray. So, how do we get these rules and constraints into the cyber world?

Ruchkin works to make world models interpretable using neuro-symbolic techniques, combining neural networks and first-principles math.

Physically Interpretable World Models

Ruchkin proposes Physically Interpretable World Models, a novel architecture that aligns learned latent representations with real-world physical quantities. The method enables the discovery of physically meaningful representations and eliminates the reliance on ground-truth physical annotations.
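
As a rough illustration of how a latent representation can be aligned with physical quantities without ground-truth annotations, the sketch below treats an encoder's two latent dimensions as a pendulum's angle and angular velocity and trains them for consistency with known dynamics. The encoder, the pendulum model, and all names are assumptions chosen for illustration; this is not the architecture from Ruchkin's paper.

```python
# Hedged sketch of a physically interpretable latent space (illustrative only).
# An encoder maps an image to a latent z that is *read* as a physical state
# (angle, angular velocity); consistency with known dynamics replaces labels.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(2))   # z = (theta, theta_dot)

    def forward(self, img):
        return self.conv(img)

def pendulum_step(z, dt=0.05, g=9.81, length=1.0):
    """Known first-principles dynamics applied to the latent, treated as physical state."""
    theta, theta_dot = z[:, :1], z[:, 1:]
    theta_dot_next = theta_dot - (g / length) * torch.sin(theta) * dt
    theta_next = theta + theta_dot_next * dt
    return torch.cat([theta_next, theta_dot_next], dim=1)

enc = Encoder()
img_t, img_next = torch.rand(8, 3, 64, 64), torch.rand(8, 3, 64, 64)

z_t, z_next = enc(img_t), enc(img_next)
# Physical-consistency loss: the encoding of the next frame should match the
# dynamics-propagated encoding of the current frame -- no ground-truth states needed.
consistency_loss = ((pendulum_step(z_t) - z_next) ** 2).mean()
consistency_loss.backward()
```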

World models work by creating a surrogate world in which to train an AI system, essentially learning an internal model of the system’s dynamics. However, existing world models rely solely on statistical learning of how observations change in response to actions, with no precise quantification of how accurate the surrogate dynamics are, which is a significant challenge in safety-critical systems. They are consequently unable to, for example, predict safety violations. In other words, the models don’t ‘know’ the world.
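
For contrast, a conventional world model of this purely statistical kind might look like the following sketch: an encoder, a latent dynamics network, and a decoder trained only to reconstruct the next observation. The module sizes and names are assumptions; the point is that nothing in the objective ties the latent to physical quantities.

```python
# Bare-bones latent world model of the conventional, purely statistical kind
# (illustrative sketch; all module names and sizes are assumptions).
import torch
import torch.nn as nn

latent_dim, action_dim = 32, 2

encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))        # obs     -> z
dynamics = nn.Sequential(nn.Linear(latent_dim + action_dim, 128),
                         nn.ReLU(), nn.Linear(128, latent_dim))         # (z, a)  -> z'
decoder = nn.Linear(latent_dim, 3 * 64 * 64)                            # z'      -> predicted obs

obs = torch.rand(16, 3, 64, 64)
action = torch.rand(16, action_dim)
next_obs = torch.rand(16, 3, 64, 64)

z = encoder(obs)
z_next = dynamics(torch.cat([z, action], dim=1))
pred = decoder(z_next).view(16, 3, 64, 64)

# Trained purely to reconstruct the next observation: nothing ties z to
# physical quantities, so the model cannot say how trustworthy its surrogate
# dynamics are or whether a predicted state violates a safety constraint.
recon_loss = ((pred - next_obs) ** 2).mean()
recon_loss.backward()
```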

An interpretable world model uses the latest AI to describe and predict how the world around an autonomous system behaves over time (especially in terms of very rich signals like images and lidar). Rich data like these are indispensable to making high-stakes decisions for cyber-physical systems like robots or battle vehicles.

Ruchkin also proposes what he calls foundation world models. While all world models predict rich signals, Ruchkin’s are grounded in physical reality: they embed observations into meaningful, causal latent representations. The approach results in models that:

  • are less likely to hallucinate physically impossible outcomes
  • generalize better across scenarios, tasks, and environments because they learn physical laws
  • support assurance of the system’s safety and reliability through techniques such as formal verification or run-time shielding

This enables the surrogate dynamics to directly and causally predict future physical states. One way to do so is by leveraging training-free large language models. On two common benchmarks, this novel model outperforms standard world models on the safety prediction task and performs comparably to supervised learning despite not using any annotated data.
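
Here is a hedged sketch of what safety prediction can look like once the latent state is read as physical quantities: roll the surrogate dynamics forward under a candidate action sequence and check each predicted state against a symbolic constraint. The vehicle state variables, the obstacle constraint, and the small dynamics network are illustrative assumptions, not Ruchkin's benchmark setup.

```python
# Sketch of safety prediction over a physically grounded latent (assumed names).
# Each latent dimension is read as a physical quantity (position, velocity), so a
# rollout can be checked directly against a symbolic constraint such as
# "distance to obstacle > margin".
import torch
import torch.nn as nn

dynamics = nn.Sequential(nn.Linear(2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))  # (state, action) -> next state

def predict_safety(state, actions, obstacle_pos=10.0, margin=1.0):
    """Roll the surrogate dynamics forward and flag any predicted violation."""
    for a in actions:
        state = dynamics(torch.cat([state, a], dim=-1))
        position = state[..., 0]
        if (obstacle_pos - position < margin).any():
            return False     # a predicted future state violates the safety constraint
    return True

state0 = torch.tensor([[0.0, 1.0]])                 # (position, velocity)
plan = [torch.tensor([[0.5]]) for _ in range(20)]   # candidate action sequence
print("plan predicted safe:", predict_safety(state0, plan))
```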

A few of Ruchkin’s key papers in the area are linked below.

A recent story from the Guardian describes other applications of world models.

Image created by Microsoft Copilot.