Spatial Intelligence Visualization

India's Sovereign Embodied AI

Building Vision-Language-Action (VLA) models for India, grounded in Indian-language instruction, built to navigate Indian terrains, aligned with Indian ethical principles through the Kalam Protocol, and trained for defense using principles of Kalaripayattu.

Dense World

DENSEWORLD is our ongoing effort to develop world models for densely populated urban environments in the Global South. Built from street-level, pedestrian, and aerial data collected across Tier-1 and Tier-2 cities of India, it learns structured representations of scene geometry, motion, hazards, traversability, and crowd dynamics under diverse lighting, weather, and seasonal conditions. Rather than treating perception as static recognition, DENSEWORLD models the environment as an evolving dynamical system, capturing what is navigable, what is risky, what is changing, and what is likely to happen next. This lets downstream Vision-Language-Action (VLA) systems operate with robustness, foresight, and situational awareness in dense real-world settings.

Dense urban mobility and crowd dynamics
Neural Architecture

FactorJEPA

FACTORJEPA is a factorized Joint Embedding Predictive Architecture designed to learn structured world representations for complex real-world environments. Instead of encoding scenes as a single monolithic latent state, it decomposes perception into disentangled predictive factors capturing geometry, motion, agents, objects, and environmental context. This factorized design supports stronger robustness, compositional generalization, and interpretability, enabling world models to reason more effectively about interaction, change, and uncertainty in dense urban settings.
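
To make the factorized idea concrete, here is a minimal sketch of a per-factor predictive loss computed in latent space. It is an illustration under stated assumptions, not the FACTORJEPA implementation: the factor names come from the description above, while the latent size, the random linear maps standing in for learned networks, and every function name are hypothetical.

```python
import random

# Factor names are taken from the description above; everything else is hypothetical.
FACTORS = ("geometry", "motion", "agents", "objects", "context")
OBS_DIM = 6   # assumed size of a toy observation vector
DIM = 4       # assumed size of each per-factor latent


def make_linear(rng, n_in, n_out):
    """Random weight matrix standing in for a learned network."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def apply_linear(weights, vec):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def encode(encoders, observation):
    """Map one observation to a dict of per-factor latents (not one monolithic state)."""
    return {name: apply_linear(encoders[name], observation) for name in FACTORS}


def factorized_loss(encoders, predictors, obs_now, obs_next):
    """Mean squared error between predicted and target latents, kept separate per factor."""
    z_now = encode(encoders, obs_now)
    # Simplification: JEPA-style training usually uses a separate target encoder
    # that receives no gradients; this sketch reuses the same encoder.
    z_target = encode(encoders, obs_next)
    losses = {}
    for name in FACTORS:
        z_pred = apply_linear(predictors[name], z_now[name])
        losses[name] = sum((p - t) ** 2 for p, t in zip(z_pred, z_target[name])) / DIM
    return losses


rng = random.Random(0)
encoders = {name: make_linear(rng, OBS_DIM, DIM) for name in FACTORS}
predictors = {name: make_linear(rng, DIM, DIM) for name in FACTORS}
obs_now = [rng.random() for _ in range(OBS_DIM)]
obs_next = [rng.random() for _ in range(OBS_DIM)]

losses = factorized_loss(encoders, predictors, obs_now, obs_next)
total_loss = sum(losses.values())
```

Keeping one loss term per factor means each factor's prediction error can be inspected on its own, which is one plausible route to the interpretability described above.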

PragyaVLA

PRAGYA-VLA is the sovereign Vision-Language-Action model at the core of our embodied AI stack, designed for grounded decision-making in dense, dynamic Indian terrains. It unifies visual perception, Indian-language instruction understanding, embodied reasoning, and action generation within a single framework, enabling robots to interpret commands, reason about their surroundings, and execute safe, context-aware behavior. By coupling Indian-language grounding with terrain-aware actionability, PRAGYA-VLA is designed to support robust embodied intelligence for real-world deployment in challenging public settings.

Embodied AI
Humanoid navigation in dense urban traffic

DenseWalk

DENSEWALK is our data-and-benchmark pipeline for short-horizon humanoid navigation in populous, crowded, and chaotic Global South urban environments. Starting from 200 hours of egocentric walk-through video, it combines monocular depth, agent detection and tracking, optical flow, and traversability-based gap inference to derive navigation decisions such as advance, yield, sidestep, slow, and wait. We then pair these motion-grounded actions with language supervision and evaluate on a dense-scene benchmark spanning bottlenecks, crossings, occlusion, temporary gap openings, and dynamically narrowing free space, with metrics for safety, stability, and social compliance.
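
As a rough illustration of how perception cues could be turned into the five action labels named above, the sketch below applies a simple rule cascade. It assumes the depth, tracking, and gap-inference stages have already been reduced to a handful of numbers; the `SceneCues` fields, the thresholds, and the rule order are all hypothetical and are not the DENSEWALK labeling logic.

```python
from dataclasses import dataclass


@dataclass
class SceneCues:
    """Hypothetical per-frame summary of upstream perception outputs."""
    gap_width_m: float        # width of the widest traversable gap ahead
    gap_offset_m: float       # lateral offset of the gap centre from the current heading
    nearest_agent_m: float    # distance to the nearest tracked agent
    closing_speed_mps: float  # positive when that agent is approaching


# All thresholds below are illustrative assumptions.
BODY_WIDTH_M = 0.6
COMFORT_WIDTH_M = 1.0
SIDESTEP_OFFSET_M = 0.5
YIELD_TTC_S = 2.0


def decide(cues: SceneCues) -> str:
    """Return one of: advance, yield, sidestep, slow, wait."""
    if cues.gap_width_m < BODY_WIDTH_M:
        return "wait"  # no gap wide enough to pass through
    if cues.closing_speed_mps > 0:
        time_to_contact = cues.nearest_agent_m / cues.closing_speed_mps
    else:
        time_to_contact = float("inf")  # agent is static or moving away
    if time_to_contact < YIELD_TTC_S:
        return "yield"  # an approaching agent would reach us too soon
    if abs(cues.gap_offset_m) > SIDESTEP_OFFSET_M:
        return "sidestep"  # the gap exists but is off to one side
    if cues.gap_width_m < COMFORT_WIDTH_M:
        return "slow"  # passable, but tight
    return "advance"


examples = {
    "blocked": SceneCues(0.4, 0.0, 3.0, 0.0),
    "oncoming": SceneCues(1.5, 0.0, 1.0, 1.0),
    "offset_gap": SceneCues(1.5, 0.8, 5.0, 0.0),
    "tight_gap": SceneCues(0.8, 0.0, 5.0, 0.0),
    "clear": SceneCues(1.5, 0.0, 5.0, 0.0),
}
labels = {name: decide(cues) for name, cues in examples.items()}
```

Hard safety cases (no gap, imminent contact) are checked first so that they override the comfort-related rules further down the cascade.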

Kalam Protocol

KALAM Protocol is our behavioral alignment and governance framework for embodied AI systems operating in sensitive real-world environments. It is designed to align robot perception, reasoning, and action with safety, reliability, human oversight, and mission-aware constraints, especially in dense public settings where errors can have real operational consequences. Rather than treating alignment as a post hoc filter, KALAM Protocol embeds ethical control, deployment discipline, and decision accountability into the action loop, enabling sovereign robotic systems to act with restraint, situational awareness, and trustworthiness.
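
One way to picture alignment inside the action loop, rather than as a post hoc filter, is a step function that checks every proposed action against explicit constraints before execution and records the outcome. The sketch below is a hypothetical illustration only: the constraint names, the state fields, and the fallback action are assumptions and do not describe the actual KALAM Protocol rules.

```python
from typing import NamedTuple


class Decision(NamedTuple):
    proposed: str     # what the policy wanted to do
    executed: str     # what was actually allowed
    violated: tuple   # names of the constraints that fired


def too_close_to_person(state, action):
    """Hypothetical constraint: never advance within 1 m of a person."""
    return action == "advance" and state["nearest_person_m"] < 1.0


def no_operator_link(state, action):
    """Hypothetical constraint: without human oversight, only waiting is allowed."""
    return action != "wait" and not state["operator_link_ok"]


CONSTRAINTS = {
    "too_close_to_person": too_close_to_person,
    "no_operator_link": no_operator_link,
}
SAFE_FALLBACK = "wait"  # assumed restrained default


def gated_step(policy, state, audit_log):
    """Run one policy step with constraint checks and an audit record."""
    proposed = policy(state)
    violated = tuple(name for name, check in CONSTRAINTS.items() if check(state, proposed))
    executed = SAFE_FALLBACK if violated else proposed
    decision = Decision(proposed, executed, violated)
    audit_log.append(decision)  # every decision is logged, allowed or not
    return decision


def eager_policy(state):
    """Toy policy that always wants to move forward."""
    return "advance"


audit_log = []
allowed = gated_step(eager_policy, {"nearest_person_m": 3.0, "operator_link_ok": True}, audit_log)
blocked = gated_step(eager_policy, {"nearest_person_m": 0.5, "operator_link_ok": True}, audit_log)
offline = gated_step(eager_policy, {"nearest_person_m": 3.0, "operator_link_ok": False}, audit_log)
```

Because the audit log stores both the proposed and the executed action, an operator could later review every case in which a constraint overrode the policy.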

Network Mesh
Motion Tracking

KalariSena

KalariSena is our movement-intelligence and embodied skills program for humanoid robotics, centered on teaching the Indian martial art Kalaripayattu to robots. It translates the principles of agility, balance, reflexes, defensive movement, rapid response, and whole-body coordination into machine embodiment, enabling humanoids to move with greater precision, control, and physical discipline in complex real-world environments. By grounding robotic motion in a structured tradition of combat-tested movement, KalariSena aims to build faster, more resilient, and operationally capable humanoid systems for deployment in railways, airports, Kumbh Mela-scale gatherings, crowd-control scenarios, security at Indian monuments, and other dense public settings where mobility, responsiveness, and embodied reliability are critical.

The Evidence

Cross-ecosystem preview of raw sensory input versus processed spatial understanding.

Raw Input (raw sensor data)
FactorJEPA Map (spatial map)

Urban Occlusion Benchmarks

FactorJEPA successfully reconstructing 3D volumes in 98% occlusion scenarios in dense traffic.

Raw Kinematic Input (humanoid)
KalariSena Policy (resilience map)

Zero-Shot Locomotion Transfer

PragyaVLA + KalariSena achieving 100% stability on uneven surfaces, both in simulation and on real hardware.