PragyaVLA

An Instruction-Finetuned Vision-Language-Action Model engineered for high-fidelity humanoid control. PragyaVLA introduces Locomotion-Aware Chain-of-Thought (LA-CoT), bridging the gap between high-level reasoning and millisecond-level motor primitive execution.

Project Status

Research Phase: V3.2 Deployment (Active Inference Engine)

| Metric          | Value | Detail                   |
| --------------- | ----- | ------------------------ |
| Model Scale     | 12.4B | Dense parameters         |
| CoT Reliability | 94.2% | Success on complex tasks |
| Avg Latency     | 42ms  | Action token generation  |
| Adaptability    | High  | Zero-shot capabilities   |

01. RAW INPUT DATA

- Scenario: Dynamic Manipulation
- Scenario: Locomotion Terrain
- Scenario: Spatial Mapping

02. PROCESSED RESULTS
Task: Precise Placement

> vision_input: detected_object_sphere
> cot_init: plan_approach_vector
> reasoning: calculate_torque_limits
> action_tokens: [0.12, -0.44, 0.89, 0.0]
> feedback: contact_established

The model decomposes high-level instructions into executable joint-space vectors with continuous feedback loops.
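The decompose-then-execute loop can be pictured as a toy control cycle. Everything here is illustrative: the random planner, the 4-dim action vectors, and the `run_feedback_loop` helper are stand-ins, not PragyaVLA's actual API.

```python
import numpy as np

def decompose_instruction(instruction, num_steps=3, rng=None):
    """Toy stand-in for the planner: map a high-level instruction
    to a short sequence of 4-dim joint-space action vectors."""
    rng = rng or np.random.default_rng(0)
    return [rng.uniform(-1.0, 1.0, size=4) for _ in range(num_steps)]

def run_feedback_loop(instruction, tolerance=0.05):
    """Apply each action vector, checking a feedback signal after
    every step (a norm threshold stands in for contact sensing)."""
    state = np.zeros(4)
    for action in decompose_instruction(instruction):
        state = state + action  # apply joint-space delta
        if np.linalg.norm(action) < tolerance:
            break               # feedback: goal reached, stop early
    return state

final_state = run_feedback_loop("place the sphere on the pedestal")
```

The key design point is that feedback is read after every action token rather than once per plan, which is what allows mid-trajectory correction.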

Task: Terrain Recovery

> state: walking_gait_unstable
> loc_aware: adjust_center_of_mass
> reasoning: shift_left_pedal_pressure
> action_tokens: [0.05, 0.22, -0.11, 0.9]
> status: balance_restored

Integration of IMU data into the transformer context window allows for immediate postural correction.
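One minimal way to picture IMU-in-context is a linear projection of each raw reading into token space, appended to a sliding context window every control tick. The dimensions, the projection, and the `extend_context` helper are all hypothetical, chosen only to make the mechanism concrete.

```python
import numpy as np

IMU_DIM = 6    # 3-axis gyroscope + 3-axis accelerometer
TOKEN_DIM = 8  # toy embedding width

def imu_to_token(imu_reading, proj):
    """Project a raw IMU reading into the model's token space so it
    can be appended to the transformer context each control tick."""
    return imu_reading @ proj  # (IMU_DIM,) @ (IMU_DIM, TOKEN_DIM)

def extend_context(context, imu_reading, proj, max_len=16):
    """Append the new IMU token and truncate to the context window,
    keeping only the most recent tokens."""
    token = imu_to_token(imu_reading, proj)
    context = np.vstack([context, token])
    return context[-max_len:]

rng = np.random.default_rng(0)
proj = rng.normal(size=(IMU_DIM, TOKEN_DIM))
ctx = np.zeros((0, TOKEN_DIM))
for _ in range(20):  # 20 ticks against a 16-token window
    ctx = extend_context(ctx, rng.normal(size=IMU_DIM), proj)
```

Because the freshest inertial state is always the last token in the window, the model can condition its very next action token on it, which is what makes "immediate" postural correction possible.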

Task: Zero-Shot Fetch

> user_input: "fetch the blue beaker"
> visual_search: active_scan
> cot: identify_beaker_depth_map
> reasoning: grasp_point_optimization
> action_tokens: [0.88, 0.12, -0.34, 0.5]

Natural-language instructions are mapped to complex actions via an internal semantic reasoning chain.
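A toy sketch of such a chain: parse the instruction, extract the target, and emit the intermediate steps seen in the trace above. The `reasoning_chain` helper and its parsing rule are invented for illustration and say nothing about how PragyaVLA actually tokenizes instructions.

```python
def reasoning_chain(instruction):
    """Toy semantic chain: strip filler words from the instruction,
    then emit (stage, content) steps mirroring the trace format."""
    words = instruction.lower().strip().split()
    target = " ".join(w for w in words if w not in {"fetch", "the"})
    return [
        ("visual_search", "active_scan"),
        ("cot", f"identify_{target.replace(' ', '_')}_depth_map"),
        ("reasoning", "grasp_point_optimization"),
    ]

steps = reasoning_chain("fetch the blue beaker")
for stage, content in steps:
    print(f"> {stage}: {content}")
```

The zero-shot property comes from the chain being composed of generic stages (search, identify, optimize) that are parameterized by the instruction rather than trained per task.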

Analysis 01


[Chart: task success vs. spatial complexity (L1: Basic, L5: Dynamic, L10: Adaptive), peaking at 94.2% efficiency]

Performance degrades only logarithmically as spatial complexity increases, demonstrating the robustness of LA-CoT reasoning.

Analysis 02


[Chart: inference latency for SOTA-VLA, PragyaVLA, and Base-Transformer; PragyaVLA's optimized path at ~42ms]

PragyaVLA achieves a 2.4x latency reduction compared to existing vision-action architectures by offloading non-critical reasoning to background threads.
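The offloading pattern can be sketched as a producer/consumer split: the hot path emits action tokens immediately and hands non-critical work to a background thread via a queue. The queue names and the sentinel-based shutdown are illustrative choices, not PragyaVLA's implementation.

```python
import queue
import threading

work_q = queue.Queue()    # non-critical reasoning jobs
result_q = queue.Queue()  # results surfaced off the hot path

def reason_in_background():
    """Run expensive, non-critical analysis (logging, long-horizon
    replanning) off the hot path so action tokens are never delayed.
    A None sentinel signals shutdown."""
    while True:
        item = work_q.get()
        if item is None:
            break
        # ...expensive analysis would happen here...
        result_q.put(f"analyzed:{item}")

worker = threading.Thread(target=reason_in_background, daemon=True)
worker.start()

# Hot path: emit the action token immediately, defer the analysis.
work_q.put("step_1")
work_q.put(None)  # sentinel: no more work
worker.join()
result = result_q.get()
```

The latency win comes from the hot path doing only a constant-time `Queue.put` before returning, while the variable-cost reasoning proceeds concurrently.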