PragyaVLA

An Instruction-Finetuned Vision-Language-Action Model engineered for high-fidelity humanoid control. PragyaVLA introduces Locomotion-Aware Chain-of-Thought (LA-CoT), bridging the gap between high-level reasoning and millisecond-level motor primitive execution.

Project Status

Research Phase: V3.2 Deployment (Active Inference Engine)

| Metric          | Value | Detail                   |
| --------------- | ----- | ------------------------ |
| Model Scale     | 12.4B | Dense parameters         |
| CoT Reliability | 94.2% | Success on complex tasks |
| Avg Latency     | 42ms  | Action token generation  |
| Adaptability    | High  | Zero-shot capabilities   |

01. RAW INPUT DATA

- Scenario: Dynamic Manipulation
- Scenario: Locomotion Terrain
- Scenario: Spatial Mapping

02. PROCESSED RESULTS
Task: Precise Placement

> vision_input: detected_object_sphere
> cot_init: plan_approach_vector
> reasoning: calculate_torque_limits
> action_tokens: [0.12, -0.44, 0.89, 0.0]
> feedback: contact_established

The model decomposes high-level instructions into executable joint-space vectors with continuous feedback loops.
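The decompose-then-execute loop can be pictured as a toy control cycle. Everything here is illustrative: the random planner, the 4-dim action vectors, and the `run_feedback_loop` helper are stand-ins, not PragyaVLA's actual API.

```python
import numpy as np

def decompose_instruction(instruction, num_steps=3, rng=None):
    """Toy stand-in for the planner: map a high-level instruction
    to a short sequence of 4-dim joint-space action vectors."""
    rng = rng or np.random.default_rng(0)
    return [rng.uniform(-1.0, 1.0, size=4) for _ in range(num_steps)]

def run_feedback_loop(instruction, tolerance=0.05):
    """Apply each action vector, checking a feedback signal after
    every step (a norm threshold stands in for contact sensing)."""
    state = np.zeros(4)
    for action in decompose_instruction(instruction):
        state = state + action  # apply joint-space delta
        if np.linalg.norm(action) < tolerance:
            break               # feedback: goal reached, stop early
    return state

final_state = run_feedback_loop("place the sphere on the pedestal")
```

The key design point is that feedback is read after every action token rather than once per plan, which is what allows mid-trajectory correction.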

Task: Terrain Recovery

> state: walking_gait_unstable
> loc_aware: adjust_center_of_mass
> reasoning: shift_left_pedal_pressure
> action_tokens: [0.05, 0.22, -0.11, 0.9]
> status: balance_restored

Integration of IMU data into the transformer context window allows for immediate postural correction.
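One minimal way to picture IMU-in-context is a linear projection of each raw reading into token space, appended to a sliding context window every control tick. The dimensions, the projection, and the `extend_context` helper are all hypothetical, chosen only to make the mechanism concrete.

```python
import numpy as np

IMU_DIM = 6    # 3-axis gyroscope + 3-axis accelerometer
TOKEN_DIM = 8  # toy embedding width

def imu_to_token(imu_reading, proj):
    """Project a raw IMU reading into the model's token space so it
    can be appended to the transformer context each control tick."""
    return imu_reading @ proj  # (IMU_DIM,) @ (IMU_DIM, TOKEN_DIM)

def extend_context(context, imu_reading, proj, max_len=16):
    """Append the new IMU token and truncate to the context window,
    keeping only the most recent tokens."""
    token = imu_to_token(imu_reading, proj)
    context = np.vstack([context, token])
    return context[-max_len:]

rng = np.random.default_rng(0)
proj = rng.normal(size=(IMU_DIM, TOKEN_DIM))
ctx = np.zeros((0, TOKEN_DIM))
for _ in range(20):  # 20 ticks against a 16-token window
    ctx = extend_context(ctx, rng.normal(size=IMU_DIM), proj)
```

Because the freshest inertial state is always the last token in the window, the model can condition its very next action token on it, which is what makes "immediate" postural correction possible.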

Task: Zero-Shot Fetch

> user_input: "fetch the blue beaker"
> visual_search: active_scan
> cot: identify_beaker_depth_map
> reasoning: grasp_point_optimization
> action_tokens: [0.88, 0.12, -0.34, 0.5]

Natural-language instructions are mapped to complex actions via an internal semantic reasoning chain.
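A toy sketch of such a chain: parse the instruction, extract the target, and emit the intermediate steps seen in the trace above. The `reasoning_chain` helper and its parsing rule are invented for illustration and say nothing about how PragyaVLA actually tokenizes instructions.

```python
def reasoning_chain(instruction):
    """Toy semantic chain: strip filler words from the instruction,
    then emit (stage, content) steps mirroring the trace format."""
    words = instruction.lower().strip().split()
    target = " ".join(w for w in words if w not in {"fetch", "the"})
    return [
        ("visual_search", "active_scan"),
        ("cot", f"identify_{target.replace(' ', '_')}_depth_map"),
        ("reasoning", "grasp_point_optimization"),
    ]

steps = reasoning_chain("fetch the blue beaker")
for stage, content in steps:
    print(f"> {stage}: {content}")
```

The zero-shot property comes from the chain being composed of generic stages (search, identify, optimize) that are parameterized by the instruction rather than trained per task.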

Analysis 01


[Chart: task success vs. spatial complexity (L1: Basic, L5: Dynamic, L10: Adaptive), peaking at 94.2% efficiency]

Performance degrades only logarithmically as spatial complexity increases, demonstrating the robustness of LA-CoT reasoning.

Analysis 02


[Chart: inference latency for SOTA-VLA, PragyaVLA, and Base-Transformer; PragyaVLA's optimized path at ~42ms]

PragyaVLA achieves a 2.4x latency reduction compared to existing vision-action architectures by offloading non-critical reasoning to background threads.
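The offloading pattern can be sketched as a producer/consumer split: the hot path emits action tokens immediately and hands non-critical work to a background thread via a queue. The queue names and the sentinel-based shutdown are illustrative choices, not PragyaVLA's implementation.

```python
import queue
import threading

work_q = queue.Queue()    # non-critical reasoning jobs
result_q = queue.Queue()  # results surfaced off the hot path

def reason_in_background():
    """Run expensive, non-critical analysis (logging, long-horizon
    replanning) off the hot path so action tokens are never delayed.
    A None sentinel signals shutdown."""
    while True:
        item = work_q.get()
        if item is None:
            break
        # ...expensive analysis would happen here...
        result_q.put(f"analyzed:{item}")

worker = threading.Thread(target=reason_in_background, daemon=True)
worker.start()

# Hot path: emit the action token immediately, defer the analysis.
work_q.put("step_1")
work_q.put(None)  # sentinel: no more work
worker.join()
result = result_q.get()
```

The latency win comes from the hot path doing only a constant-time `Queue.put` before returning, while the variable-cost reasoning proceeds concurrently.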