🇮🇳 DenseWorld Research Demo · World Action Model

DENSEWORLD

The First World Action Model (WAM) for India

Predict. Understand. Anticipate.

DenseWorld learns how people, vehicles, crowds, animals, and streets interact across India’s dense, dynamic, and unpredictable environments.


700+ hours of video
22 Indian cities
115K+ street clips
6 world capabilities
0.83 motion complexity
0.77 interaction density
15 s future rollout

Six Indian Street Worlds

Each example isolates a different challenge: future prediction, multi-agent interaction, occlusion, social navigation, crowd flow, and India-specific street dynamics.

Market Crossing

Future prediction under dense mixed traffic.

Signal Junction

Multi-agent interaction and causal influence.

Bike Occlusion

Hidden actor recovery behind a moving bike.

Narrow Street

Compressed motion and social navigation.

Bus Stop Crowd

Crowd dynamics around public transport.

Animal Crossing

India-specific dynamics in mixed road use.

Try DenseWorld on Your Own Video

For Hugging Face (HF), this panel can be wired to Gradio. This static page keeps the upload panel in place, while curated assets show full model behavior.

Upload Video

MP4 · 5–10 sec · traffic / crowd / street scene

Demo mode

Live upload: lightweight frame sampling and motion proxy.
Curated assets: qualitative V-JEPA 2.1 vs DenseWorld comparison.
All images are expected in the same folder as this HTML file.
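
The live-upload path above (lightweight frame sampling plus a motion proxy) could look roughly like the sketch below. It is only an illustration, not the demo's actual code: the function names, the evenly spaced sampling, and the choice of mean absolute frame difference as the proxy are all assumptions, and frames are modeled as plain nested lists of grayscale values.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def sample_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick evenly spaced frame indices, starting at the first frame."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def motion_proxy(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def clip_motion_score(frames: Sequence[Frame], num_samples: int = 8) -> float:
    """Average the motion proxy over consecutive sampled frames."""
    idx = sample_indices(len(frames), num_samples)
    pairs = list(zip(idx, idx[1:]))
    if not pairs:
        return 0.0
    return sum(motion_proxy(frames[i], frames[j]) for i, j in pairs) / len(pairs)
```

A frame difference is cheap enough for a browser-facing demo, but it reacts to camera shake and lighting changes as well as to agent motion, so it is a rough activity signal rather than an optical-flow estimate.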

How DenseWorld Factorizes a Scene

For v1, every factor reuses the original frame. Later, these cards can receive true overlays for layout, agents, interactions, and motion.

Input world state

Dense market crossing used for factor decomposition.

A1 / T0

Layout Factor 92%

Road geometry
Crosswalk and junction structure
Buildings, shops, static context

Agent Factor 89%

Pedestrians, motorbikes, autos, cars
Cyclists and crowd clusters
Agent density and proximity

Interaction Factor 0.77

Pedestrian ↔ bike
Bike ↔ auto
Crowd ↔ vehicle negotiation

Motion Factor 0.83

Crossing, yielding, avoiding, passing
Mixed traffic flow
Lane-less motion patterns

Layout: Stable structure of the world.
Agents: People and vehicles in the scene.
Interactions: Who influences whom.
Motion: How the world is changing.
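
One way to picture the four-factor decomposition as a data structure is sketched below. The class name, field names, and the `summary` helper are hypothetical and are not taken from DenseWorld itself; the example values are copied from the factor cards for the market crossing scene.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneFactors:
    """Hypothetical container for the four factors shown on the cards."""

    layout: List[str] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)
    interactions: List[Tuple[str, str]] = field(default_factory=list)
    motion: List[str] = field(default_factory=list)
    scores: Dict[str, float] = field(default_factory=dict)

    def summary(self) -> str:
        """Return the factor scores as a short, alphabetically sorted string."""
        return ", ".join(
            f"{name}={value:.2f}" for name, value in sorted(self.scores.items())
        )


# Values below mirror the factor cards on this page.
market_crossing = SceneFactors(
    layout=["road geometry", "crosswalk and junction structure", "static context"],
    agents=["pedestrians", "motorbikes", "autos", "cars", "cyclists"],
    interactions=[("pedestrian", "bike"), ("bike", "auto"), ("crowd", "vehicle")],
    motion=["crossing", "yielding", "avoiding", "passing"],
    scores={"layout": 0.92, "agent": 0.89, "interaction": 0.77, "motion": 0.83},
)
```

Keeping interactions as pairs of agent labels makes it straightforward to render them later as graph edges or overlays.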

Motion Understanding

Before predicting the future, the model must understand where people and vehicles are moving.

Actual Optical Flow

Reference motion field.

V-JEPA Motion

Noisier motion estimate.

DenseWorld Motion

Better-aligned flow and agent trajectories.

Motion Observatory

Technical visualization of dense motion.

Metric                      V-JEPA 2.1    DenseWorld
Motion alignment ↑          (not shown)   (not shown)
Trajectory consistency ↑    (not shown)   (not shown)
Flow EPE ↓                  3.82          1.72
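
Flow EPE (endpoint error) is the average Euclidean distance between predicted and reference flow vectors, so lower is better. The sketch below shows that computation on a flat list of (u, v) vectors. The function name and input layout are illustrative assumptions, not part of the demo.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]  # one flow vector as (u, v)


def endpoint_error(pred: Sequence[Vec], ref: Sequence[Vec]) -> float:
    """Mean Euclidean distance between predicted and reference flow vectors."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must contain the same number of vectors")
    if not pred:
        return 0.0
    total = sum(
        math.hypot(pu - ru, pv - rv) for (pu, pv), (ru, rv) in zip(pred, ref)
    )
    return total / len(pred)
```

For example, `endpoint_error([(3.0, 4.0)], [(0.0, 0.0)])` returns `5.0`, the length of the single error vector.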

Motion DNA

A high-level temporal signature: not just optical flow, but primitives such as crossing, yielding, avoiding, negotiation, and occlusion.

Selected actor

Pedestrian #12 in the market crossing scene.

DenseWorld motion sequence

Free → Cross → Yield → Cross → Avoid → Occluded → Walk

DenseWorld preserves crossing, yielding, avoidance, and recovery after partial occlusion.

V-JEPA 2.1 motion sequence

Free → Cross → Walk

V-JEPA captures local movement but loses negotiation and hidden-state continuity.

State legend: Free · Interaction · Negotiation · Occlusion · Unknown
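
A sequence like the ones above can be derived from per-frame state labels by collapsing consecutive repeats. The sketch below shows that step together with a simple per-frame accuracy. Both function names are hypothetical, and the accuracy definition is an assumption for illustration, not the metric DenseWorld reports.

```python
from itertools import groupby
from typing import List, Sequence


def collapse_states(per_frame: Sequence[str]) -> List[str]:
    """Collapse runs of identical per-frame labels into one primitive each."""
    return [state for state, _ in groupby(per_frame)]


def state_accuracy(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Fraction of frames where the predicted state matches the reference."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must cover the same frames")
    if not ref:
        return 0.0
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)
```

Collapsing keeps repeated primitives that are separated by another state, which is why a sequence can contain Cross twice around a Yield.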

Motion DNA Metric           V-JEPA 2.1    DenseWorld
Motion state accuracy ↑     (not shown)   (not shown)
Intent consistency ↑        (not shown)   (not shown)
Negotiation detection ↑     (not shown)   (not shown)

Understanding What Cannot Be Seen

DenseWorld should infer hidden actors from context, motion, and interaction cues rather than only detecting visible objects.

Original

Occlusion setup.

Occluded

Actor hidden behind bike.

V-JEPA Recovery

Lower-confidence hidden actor estimate.

DenseWorld Recovery

Clearer hidden actor hypothesis.

Metric                              V-JEPA 2.1    DenseWorld
Occlusion recovery ↑                (not shown)   (not shown)
Hidden actor localization error ↓   18 px         7 px
Hidden intent accuracy ↑            (not shown)   (not shown)
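
Hidden actor localization error can be read as the pixel distance between the estimated and the reference position of the occluded actor. The sketch below pairs that measurement with a constant-velocity extrapolation, which is a deliberately naive baseline chosen for illustration and not DenseWorld's inference method. All names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def extrapolate(last_seen: Point, velocity: Point, frames_hidden: int) -> Point:
    """Constant-velocity guess for an actor hidden for `frames_hidden` frames."""
    return (
        last_seen[0] + velocity[0] * frames_hidden,
        last_seen[1] + velocity[1] * frames_hidden,
    )


def localization_error(pred: Point, ref: Point) -> float:
    """Euclidean distance in pixels between predicted and reference positions."""
    return math.hypot(pred[0] - ref[0], pred[1] - ref[1])
```

A constant-velocity guess ignores yielding and avoidance, so it makes a useful lower bar: a model that uses interaction context should beat it whenever the hidden actor changes speed or direction behind the occluder.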

Can the Model Predict the Next Moment?

Observed frame, V-JEPA prediction, DenseWorld prediction, reference frame, and model-specific error maps.

Observed

Input frame.

V-JEPA

Prediction with motion ambiguity.

DenseWorld

Stronger interaction continuity.

Reference

Future frame.

V-JEPA Error Heatmap

Higher error around moving agents.

DenseWorld Error Heatmap

Reduced error in interaction zones.

Metric               V-JEPA 2.1    DenseWorld
Future-frame L1 ↓    0.186         0.142
SSIM ↑               0.71          0.81
LPIPS ↓              0.298         0.195
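
Two of the three metrics above can be sketched in a few lines. Future-frame L1 is the mean absolute pixel difference. SSIM is normally averaged over local windows, so the single-window version below is a simplification for illustration only. LPIPS depends on a learned network and is not sketched. Inputs are assumed to be flattened grayscale frames with values in [0, 1], and the function names are hypothetical.

```python
from typing import Sequence


def mean_l1(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute difference between two flattened frames."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("frames must be non-empty and equally sized")
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """Single-window SSIM over whole frames (simplified; no local windows)."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and equally sized")
    c1 = (0.01 * data_range) ** 2  # standard stabilizing constants
    c2 = (0.03 * data_range) ** 2
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )
```

Identical frames give an L1 of 0 and an SSIM of 1, which is a quick sanity check before comparing model predictions against a reference frame.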

Who Influences Whom?

Using scene A2 (Signal Junction), DenseWorld and V-JEPA are compared on local interaction, the multi-agent scene graph, and the causal influence chain.

Interaction source scene

Signal junction with pedestrians, motorbikes, cars, autos, and bus-like agents.

Local Interaction: Pedestrian–Bike

V-JEPA

Lower-confidence local interaction.

DenseWorld

Stronger pedestrian-bike influence estimate.

Junction Interaction: Multi-Agent Graph

V-JEPA

Sparser graph and weaker coverage.

DenseWorld

Denser multi-agent interaction graph.

Causal Influence: Who caused what?

V-JEPA

Shorter, uncertain causal chain.

DenseWorld

Clearer causal influence over multiple agents.

Metric                    V-JEPA 2.1    DenseWorld
Interaction accuracy ↑    (not shown)   (not shown)
Graph coverage ↑          (not shown)   (not shown)
Causal consistency ↑      (not shown)   (not shown)
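
To make the graph comparison concrete, the sketch below builds an undirected interaction graph from agent positions with a plain distance threshold, then measures coverage as the share of reference edges that a predicted graph recovers. Both the proximity rule and this coverage definition are assumptions for illustration. The page does not say how DenseWorld builds or scores its graphs.

```python
import math
from typing import Dict, FrozenSet, Set, Tuple

Edge = FrozenSet[str]  # undirected edge between two agent names


def proximity_edges(positions: Dict[str, Tuple[float, float]], radius: float) -> Set[Edge]:
    """Connect every pair of agents whose distance is at most `radius`."""
    names = sorted(positions)
    edges: Set[Edge] = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(positions[a], positions[b]) <= radius:
                edges.add(frozenset((a, b)))
    return edges


def graph_coverage(pred: Set[Edge], ref: Set[Edge]) -> float:
    """Fraction of reference edges that also appear in the predicted graph."""
    if not ref:
        return 1.0
    return len(pred & ref) / len(ref)
```

Proximity alone cannot tell who influences whom, so a causal chain would additionally need directed edges and temporal ordering.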

Future World Simulation

The finale: compare reference futures, V-JEPA futures, and DenseWorld futures at +5, +10, and +15 seconds.

Observed World State

A1 at T0: the starting point for future rollout.

+5 seconds: Future consistency

Reference Future

Street state at +5 sec.

V-JEPA vs DenseWorld

DenseWorld maintains sharper pedestrian and vehicle motion.

V5 / D5

V-JEPA future consistency

DenseWorld future consistency

+10 seconds: Crowd preservation

Reference Future

Street state at +10 sec.

P10

V-JEPA vs DenseWorld

DenseWorld preserves crowd flow and social structure more clearly.

V10 / D10

V-JEPA crowd preservation

DenseWorld crowd preservation

+15 seconds: Long-horizon reasoning

Reference Future

Street state at +15 sec.

P15

V-JEPA vs DenseWorld

DenseWorld avoids long-horizon simplification and preserves interactions.

V15 / D15

V-JEPA long-horizon reasoning

DenseWorld long-horizon reasoning
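
The +5, +10, and +15 second snapshots can be thought of as checkpoints in one autoregressive rollout. The sketch below steps a state forward frame by frame and records it at the requested horizons. The `step_fn` callable, the frame rate, and the function name are hypothetical stand-ins, since the page does not describe the model's real interface.

```python
from typing import Callable, Dict, Sequence, TypeVar

S = TypeVar("S")  # any world-state representation


def rollout(
    state: S,
    step_fn: Callable[[S], S],
    horizons_sec: Sequence[float],
    fps: float,
) -> Dict[float, S]:
    """Advance `state` one frame at a time and snapshot it at each horizon."""
    # Map each horizon (seconds) to the frame count at which it is reached.
    targets = {round(h * fps): h for h in horizons_sec}
    snapshots: Dict[float, S] = {}
    for step in range(1, max(targets, default=0) + 1):
        state = step_fn(state)
        if step in targets:
            snapshots[targets[step]] = state
    return snapshots
```

Because each step consumes the previous prediction, errors compound with the horizon, which is why the +15 second comparison is the hardest of the three.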

DenseWorld vs V-JEPA 2.1

Summary scoreboard across world-model capabilities. Values are demo placeholders and should be replaced by measured benchmark values when available.

Each capability is scored for V-JEPA and DenseWorld:

Layout Understanding
Agent Understanding
Motion Understanding
Occlusion Recovery
Interaction Reasoning
Future Prediction
Long-Horizon Prediction

DenseWorld Diagnosis

A compact explanation panel for reviewers, visitors, and demo users.

What the scene contains

Dense pedestrian, motorcycle, auto-rickshaw, cyclist, and crowd interactions. The main challenge is not object recognition; it is motion, negotiation, occlusion, and long-horizon consistency.

What DenseWorld improves

Preserves motion and crowd flow better over time.
Infers hidden actors from interaction context.
Builds stronger causal and multi-agent interaction graphs.
Maintains plausible futures at +5, +10, and +15 seconds.