DENSEWORLD
The First World Action Model (WAM) for India
DenseWorld learns how people, vehicles, crowds, animals, and streets interact across India’s dense, dynamic, and unpredictable environments.

Six Indian Street Worlds
Each example isolates a different challenge: future prediction, multi-agent interaction, occlusion, social navigation, crowd flow, and India-specific street dynamics.

Market Crossing
Future prediction under dense mixed traffic.

Signal Junction
Multi-agent interaction and causal influence.

Bike Occlusion
Hidden actor recovery behind a moving bike.

Narrow Street
Compressed motion and social navigation.

Bus Stop Crowd
Crowd dynamics around public transport.

Animal Crossing
India-specific dynamics in mixed road use.
Try DenseWorld on Your Own Video
On Hugging Face Spaces, this panel can be wired to a Gradio app. On this static page, the upload panel is ready, while the curated examples show the full model behavior.
Upload Video
MP4 · 5–10 sec · traffic / crowd / street scene
Demo mode
- Live upload: lightweight frame sampling and motion proxy.
- Curated assets: qualitative V-JEPA 2.1 vs DenseWorld comparison.
- All images are expected in the same folder as this HTML file.
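The live-upload path is described only as lightweight frame sampling plus a motion proxy. One minimal way to do that is sketched below, assuming the uploaded MP4 has already been decoded into a grayscale NumPy array of shape (T, H, W). The helper names `sample_frames` and `motion_proxy` are hypothetical, and mean absolute frame difference is just one simple proxy, not necessarily the one this page uses.

```python
import numpy as np


def sample_frames(video: np.ndarray, num_samples: int) -> np.ndarray:
    """Pick `num_samples` evenly spaced frames from a (T, H, W) grayscale clip."""
    if num_samples < 2:
        raise ValueError("need at least two frames for a motion proxy")
    total = video.shape[0]
    if num_samples >= total:
        return video
    # Evenly spaced indices from the first frame to the last, inclusive.
    idx = np.linspace(0, total - 1, num_samples).round().astype(int)
    return video[idx]


def motion_proxy(frames: np.ndarray) -> np.ndarray:
    """One score per consecutive pair: mean absolute pixel difference."""
    f = frames.astype(np.float32)  # avoid uint8 wrap-around when subtracting
    return np.abs(f[1:] - f[:-1]).mean(axis=(1, 2))


if __name__ == "__main__":
    # Synthetic 8 s clip at 25 fps: a bright block sliding across a dark frame.
    clip = np.zeros((200, 64, 64), dtype=np.uint8)
    for t in range(200):
        x = (t // 4) % 56
        clip[t, 28:36, x:x + 8] = 255
    frames = sample_frames(clip, 16)
    print(frames.shape, motion_proxy(frames).round(2))
```

Sampling a fixed number of frames keeps the cost independent of clip length, which suits a short 5–10 second upload.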
How DenseWorld Factorizes a Scene
In v1, every factor card reuses the original frame. Later versions can replace it with true overlays for layout, agents, interactions, and motion.

Input world state
Dense market crossing used for factor decomposition.
Layout Factor 0.92
- Road geometry
- Crosswalk and junction structure
- Buildings, shops, static context
Agent Factor 0.89
- Pedestrians, motorbikes, autos, cars
- Cyclists and crowd clusters
- Agent density and proximity
Interaction Factor 0.77
- Pedestrian ↔ bike
- Bike ↔ auto
- Crowd ↔ vehicle negotiation
Motion Factor 0.83
- Crossing, yielding, avoiding, passing
- Mixed traffic flow
- Lane-less motion patterns

Layout
Stable structure of the world.

Agents
People and vehicles in the scene.

Interactions
Who influences whom.

Motion
How the world is changing.
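The four factors can be pictured as one record per scene. The sketch below is a hypothetical data structure, not DenseWorld's internal format; the class names `Factor` and `FactorizedScene` are illustrative, and the scores are the demo values displayed on this page (92% written as 0.92).

```python
from dataclasses import dataclass, field


@dataclass
class Factor:
    name: str
    score: float  # displayed confidence in [0, 1]; demo value, not a benchmark
    cues: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"{self.name}: score must be in [0, 1], got {self.score}")


@dataclass
class FactorizedScene:
    scene_id: str
    layout: Factor        # stable structure of the world
    agents: Factor        # people and vehicles in the scene
    interactions: Factor  # who influences whom
    motion: Factor        # how the world is changing

    def factors(self) -> list[Factor]:
        return [self.layout, self.agents, self.interactions, self.motion]

    def weakest(self) -> Factor:
        """Lowest-scoring factor, e.g. the first one to inspect in a review."""
        return min(self.factors(), key=lambda f: f.score)


scene = FactorizedScene(
    scene_id="market_crossing",
    layout=Factor("Layout", 0.92, ["road geometry", "crosswalk", "shops"]),
    agents=Factor("Agent", 0.89, ["pedestrians", "motorbikes", "autos"]),
    interactions=Factor("Interaction", 0.77, ["pedestrian-bike", "bike-auto"]),
    motion=Factor("Motion", 0.83, ["crossing", "yielding", "passing"]),
)
print(scene.weakest().name)  # Interaction
```

Keeping each factor's cues next to its score makes it easy to render the cards above from one object.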
Motion Understanding
Before predicting the future, the model must understand where people and vehicles are moving.

Actual Optical Flow
Reference motion field.

V-JEPA Motion
Noisier motion estimate.

DenseWorld Motion
Better-aligned flow and agent trajectories.

Motion Observatory
Technical visualization of dense motion.
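A standard way to quantify how well an estimated motion field matches the reference optical flow is the average endpoint error (EPE): the mean Euclidean distance between predicted and reference flow vectors. The page does not state which metric it uses, so treat this as an assumption. The sketch expects two NumPy flow fields of shape (H, W, 2); the optional mask lets you score only moving-agent pixels.

```python
from typing import Optional

import numpy as np


def average_epe(pred_flow: np.ndarray, ref_flow: np.ndarray,
                mask: Optional[np.ndarray] = None) -> float:
    """Average endpoint error between two (H, W, 2) flow fields.

    If `mask` (H, W, bool) is given, only those pixels are scored.
    """
    if pred_flow.shape != ref_flow.shape or pred_flow.shape[-1] != 2:
        raise ValueError("expected two flow fields of identical shape (H, W, 2)")
    epe = np.linalg.norm(pred_flow - ref_flow, axis=-1)  # per-pixel Euclidean error
    if mask is not None:
        epe = epe[mask]
    return float(epe.mean())


ref = np.zeros((4, 4, 2))
pred = np.zeros((4, 4, 2))
pred[0, 0] = (3.0, 4.0)  # one pixel off by a 3-4-5 vector
print(average_epe(pred, ref))  # 0.3125 (error 5 averaged over 16 pixels)
```

Lower EPE means better-aligned flow; masking to agents avoids a static background diluting the score.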
Motion DNA
A high-level temporal signature: not just optical flow, but motion primitives such as crossing, yielding, avoiding, negotiating, and moving through occlusion.

Selected actor
Pedestrian #12 in the market crossing scene.
DenseWorld motion sequence
DenseWorld preserves crossing, yielding, avoidance, and recovery after partial occlusion.
V-JEPA 2.1 motion sequence
V-JEPA captures local movement but loses negotiation and hidden-state continuity.
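The page does not specify how motion primitives are extracted. Purely as an illustration, the sketch below turns one actor's 2D track into a run-length-encoded sequence of labels using speed heuristics. The function name, thresholds, and label set are hypothetical and cover only stopping, yielding, and occlusion; this is not DenseWorld's representation.

```python
import math
from itertools import groupby
from typing import Optional

Point = Optional[tuple[float, float]]  # None = actor not visible in that frame


def motion_signature(track: list[Point], stop_speed: float = 0.2,
                     slow_ratio: float = 0.5) -> list[tuple[str, int]]:
    """Label each frame-to-frame step, then run-length encode the labels."""
    labels: list[str] = []
    prev_speed: Optional[float] = None
    for a, b in zip(track, track[1:]):
        if a is None or b is None:
            labels.append("occluded")
            prev_speed = None  # speed history is unreliable across a gap
            continue
        speed = math.dist(a, b)  # displacement per frame
        if speed < stop_speed:
            labels.append("stopped")
        elif prev_speed is not None and speed < slow_ratio * prev_speed:
            labels.append("yielding")  # sharp slowdown relative to the last step
        else:
            labels.append("moving")
        prev_speed = speed
    return [(label, len(list(run))) for label, run in groupby(labels)]


track = [(0, 0), (1, 0), (2, 0), (2.3, 0), (2.35, 0), None, (4, 0), (5, 0)]
print(motion_signature(track))
# [('moving', 2), ('yielding', 1), ('stopped', 1), ('occluded', 2), ('moving', 1)]
```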
Understanding What Cannot Be Seen
DenseWorld should infer hidden actors from context, motion, and interaction cues, rather than only detecting visible objects.

Original
Occlusion setup.

Occluded
Actor hidden behind bike.

V-JEPA Recovery
Lower-confidence hidden actor estimate.

DenseWorld Recovery
Clearer hidden actor hypothesis.
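A common baseline for keeping a hidden actor's state alive during occlusion is constant-velocity extrapolation from its last visible positions. The sketch below shows only that baseline, assuming tracks are lists of (x, y) positions with `None` for hidden frames; `fill_occlusion` is a hypothetical name, and this is not DenseWorld's method, which also draws on context and interaction cues.

```python
from typing import Optional

Point = tuple[float, float]


def fill_occlusion(track: list[Optional[Point]]) -> list[Point]:
    """Constant-velocity baseline: carry the last visible velocity through gaps.

    `None` marks frames where the actor is hidden (e.g. behind a bike).
    """
    filled: list[Point] = []
    velocity: Point = (0.0, 0.0)
    prev_visible = False
    for p in track:
        if p is None:
            if not filled:
                raise ValueError("track must start with a visible position")
            x, y = filled[-1]
            filled.append((x + velocity[0], y + velocity[1]))  # extrapolate one frame
            prev_visible = False
        else:
            if prev_visible:  # only measure velocity between two observed frames
                x, y = filled[-1]
                velocity = (p[0] - x, p[1] - y)
            filled.append(p)
            prev_visible = True
    return filled


print(fill_occlusion([(0.0, 0.0), (1.0, 0.5), None, None, (4.0, 2.0)]))
# [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5), (4.0, 2.0)]
```

Because it ignores other agents, this baseline fails exactly when the hidden actor yields or swerves, which is where context-aware recovery should help.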
Can the Model Predict the Next Moment?
Observed frame, V-JEPA prediction, DenseWorld prediction, reference frame, and model-specific error maps.

Observed
Input frame.

V-JEPA
Prediction with motion ambiguity.

DenseWorld
Stronger interaction continuity.

Reference
Future frame.

V-JEPA Error Heatmap
Higher error around moving agents.

DenseWorld Error Heatmap
Reduced error in interaction zones.
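The error heatmaps can be read as per-pixel prediction error against the reference frame. A sketch, assuming same-sized grayscale NumPy frames; normalizing all models by one shared maximum (an assumption here) keeps the heatmaps on the same color scale, so a brighter pixel means larger error regardless of model. The hypothetical `zone_error` helper averages a map inside a boolean mask, such as an interaction zone. Model names and arrays in the example are placeholders, not measured results.

```python
import numpy as np


def error_heatmaps(preds: dict[str, np.ndarray],
                   reference: np.ndarray) -> dict[str, np.ndarray]:
    """Per-pixel absolute error per model, scaled to [0, 1] by one shared maximum."""
    ref = reference.astype(np.float32)
    raw = {name: np.abs(p.astype(np.float32) - ref) for name, p in preds.items()}
    peak = max(float(e.max()) for e in raw.values())
    if peak == 0.0:  # every prediction matches the reference exactly
        return raw
    return {name: e / peak for name, e in raw.items()}


def zone_error(heatmap: np.ndarray, zone: np.ndarray) -> float:
    """Mean heatmap value inside a boolean zone, e.g. an interaction region."""
    return float(heatmap[zone].mean())


reference = np.zeros((2, 2), dtype=np.uint8)
preds = {
    "model_a": np.array([[0, 10], [0, 0]], dtype=np.uint8),
    "model_b": np.array([[5, 0], [0, 0]], dtype=np.uint8),
}
maps = error_heatmaps(preds, reference)
print({name: float(m.max()) for name, m in maps.items()})
# {'model_a': 1.0, 'model_b': 0.5}
```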
Who Influences Whom?
Using scene A2, DenseWorld and V-JEPA are compared at three levels: local interaction, the multi-agent scene graph, and the causal influence chain.

Interaction source scene
Signal junction with pedestrians, motorbikes, cars, autos, and bus-like agents.
Local Interaction: Pedestrian–Bike

V-JEPA
Lower-confidence local interaction.

DenseWorld
Stronger pedestrian–bike influence estimate.
Junction Interaction: Multi-Agent Graph

V-JEPA
Sparser graph and weaker coverage.

DenseWorld
Denser multi-agent interaction graph.
Causal Influence: Who caused what?

V-JEPA
Shorter, uncertain causal chain.

DenseWorld
Clearer causal influence over multiple agents.
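The page does not say how the interaction graph is constructed. As a hypothetical stand-in, the sketch links any two agents closer than a distance threshold and then walks the graph breadth-first from one agent. Reachability here is only a crude proxy for how far influence could spread, not a causal estimate. The agent IDs other than Pedestrian #12, the positions, and the 3-unit radius are illustrative.

```python
import math
from collections import deque


def proximity_graph(positions: dict[str, tuple[float, float]],
                    radius: float) -> dict[str, set[str]]:
    """Undirected graph: an edge joins two agents within `radius` of each other."""
    ids = sorted(positions)
    graph: dict[str, set[str]] = {a: set() for a in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(positions[a], positions[b]) <= radius:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def reachable_from(graph: dict[str, set[str]], source: str) -> list[str]:
    """Breadth-first order of agents connected to `source` (including itself)."""
    seen = {source}
    order = [source]
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in sorted(graph[node]):  # sorted for a deterministic order
            if nb not in seen:
                seen.add(nb)
                order.append(nb)
                queue.append(nb)
    return order


positions = {"ped_12": (0.0, 0.0), "bike_3": (2.0, 0.0),
             "auto_1": (4.5, 0.0), "car_7": (20.0, 0.0)}
graph = proximity_graph(positions, radius=3.0)
print(reachable_from(graph, "ped_12"))  # ['ped_12', 'bike_3', 'auto_1']
```

A learned model would replace the fixed radius with predicted, directed influence weights; the graph-walking step stays the same.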
Future World Simulation
The finale: compare reference futures, V-JEPA futures, and DenseWorld futures at +5, +10, and +15 seconds.

Observed World State
Scene A1 at T0: the starting point for the future rollout.
+5 seconds: Future consistency

Reference Future
Street state at +5 sec.

V-JEPA vs DenseWorld
DenseWorld maintains sharper pedestrian and vehicle motion.
+10 seconds: Crowd preservation

Reference Future
Street state at +10 sec.

V-JEPA vs DenseWorld
DenseWorld preserves crowd flow and social structure more clearly.
+15 seconds: Long-horizon reasoning

Reference Future
Street state at +15 sec.

V-JEPA vs DenseWorld
DenseWorld avoids long-horizon simplification and preserves interactions.
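Comparing futures at +5, +10, and +15 seconds first requires mapping each horizon to a frame offset from T0. A sketch, assuming a constant frame rate and rollouts stored as NumPy arrays where `rollout[k]` predicts reference frame `t0 + k`; the function names are hypothetical, and mean absolute error is a placeholder metric, not the one behind this page's comparisons.

```python
import numpy as np

HORIZONS_S = (5, 10, 15)  # seconds after T0, as on this page


def horizon_offsets(fps: float, horizons_s=HORIZONS_S) -> dict[int, int]:
    """Map each horizon in seconds to a frame offset from T0."""
    return {h: int(round(h * fps)) for h in horizons_s}


def per_horizon_mae(rollout: np.ndarray, reference: np.ndarray,
                    t0: int, fps: float) -> dict[int, float]:
    """Mean absolute error at each horizon.

    rollout[k] is the model's prediction for reference frame t0 + k.
    """
    scores = {}
    for h, k in horizon_offsets(fps).items():
        if k >= len(rollout) or t0 + k >= len(reference):
            raise ValueError(f"+{h} s needs offset {k}, beyond the available frames")
        diff = rollout[k].astype(np.float32) - reference[t0 + k].astype(np.float32)
        scores[h] = float(np.abs(diff).mean())
    return scores


print(horizon_offsets(25))  # {5: 125, 10: 250, 15: 375}
```

Reporting one score per horizon, rather than a single average, shows whether quality degrades gradually or collapses at long range.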
DenseWorld vs V-JEPA 2.1
Summary scoreboard across world-model capabilities. Values are demo placeholders and should be replaced by measured benchmark values when available.
DenseWorld Diagnosis
A compact explanation panel for reviewers, visitors, and demo users.
What the scene contains
Dense pedestrian, motorcycle, auto-rickshaw, cyclist, and crowd interactions. The main challenge is not object recognition; it is motion, negotiation, occlusion, and long-horizon consistency.
What DenseWorld improves
- Preserves motion and crowd flow better over time.
- Infers hidden actors from interaction context.
- Builds stronger causal and multi-agent interaction graphs.
- Maintains plausible futures at +5, +10, and +15 seconds.