PragyaVLA
An Instruction-Finetuned Vision-Language-Action Model engineered for high-fidelity humanoid control. PragyaVLA introduces Locomotion-Aware Chain-of-Thought (LA-CoT), bridging the gap between high-level reasoning and millisecond-level motor primitive execution.
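As a rough illustration of the LA-CoT control flow, the sketch below separates a reasoning phase from an action-token phase. The `la_cot_decode` function and the `plan()` / `act()` methods are hypothetical stand-ins; the actual PragyaVLA interface is not described here.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class LACoTStep:
    reasoning: List[str]        # intermediate chain-of-thought entries
    action_tokens: List[float]  # normalized joint-space command

def la_cot_decode(model: Any, image: Any, instruction: str, imu_state: Any) -> LACoTStep:
    """Two-phase decode: reason first, then emit motor primitives.

    `model.plan` and `model.act` are hypothetical methods used only to
    illustrate the reasoning-to-action bridge described above.
    """
    # Phase 1: locomotion-aware reasoning over vision, language, and IMU state.
    plan = model.plan(image=image, instruction=instruction, imu=imu_state)
    # Phase 2: action-token generation conditioned on the reasoning prefix.
    action = model.act(plan=plan)
    return LACoTStep(reasoning=plan, action_tokens=action)
```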
Research Phase: V3.2 Deployment
12.4B dense parameters
94.2% success on complex tasks
42 ms action token generation
High zero-shot capabilities
Scenario: Dynamic Manipulation
> cot_init: plan_approach_vector
> reasoning: calculate_torque_limits
> action_tokens: [0.12, -0.44, 0.89, 0.0]
> feedback: contact_established
Task: Precise Placement
The model decomposes high-level instructions into executable joint-space vectors with continuous feedback loops.
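A sketch of that feedback loop is below, assuming hypothetical `decompose`, `read_contact_feedback`, and `apply` methods on the deployed model; it illustrates the control flow only, not the shipped controller.

```python
import numpy as np

def placement_loop(model, instruction: str, max_steps: int = 200, tol: float = 1e-3) -> bool:
    """Closed-loop precise placement: re-plan from contact feedback each tick.

    `model.decompose`, `model.read_contact_feedback`, and `model.apply` are
    hypothetical stand-ins for the instruction-to-joint-space decomposition
    and the feedback channel mentioned above.
    """
    for _ in range(max_steps):
        feedback = model.read_contact_feedback()           # e.g. {"contact_established": True}
        joint_cmd = np.asarray(model.decompose(instruction, feedback))
        model.apply(joint_cmd)                             # send joint-space vector to the actuators
        # Stop once contact is made and the residual correction is negligible.
        if feedback.get("contact_established") and np.linalg.norm(joint_cmd) < tol:
            return True
    return False
```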
Scenario: Locomotion Terrain
> loc_aware: adjust_center_of_mass
> reasoning: shift_left_pedal_pressure
> action_tokens: [0.05, 0.22, -0.11, 0.9]
> status: balance_restored
Task: Terrain Recovery
Integration of IMU data into the transformer context window allows for immediate postural correction.
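One plausible way to fold IMU readings into the context window is sketched below as a simple discretizer for proprioceptive channels; the bin count and value range are assumptions, not published model details.

```python
import numpy as np

def imu_to_tokens(imu: np.ndarray, n_bins: int = 256,
                  lo: float = -1.0, hi: float = 1.0) -> list[int]:
    """Discretize a 6-DoF IMU reading (accel xyz + gyro xyz) into token ids.

    Clamps each channel to [lo, hi] and maps it to one of n_bins ids so the
    reading can be appended to the transformer context alongside vision and
    language tokens. The binning scheme here is purely illustrative.
    """
    clipped = np.clip(imu, lo, hi)
    ids = ((clipped - lo) / (hi - lo) * (n_bins - 1)).astype(int)
    return ids.tolist()

# Example: a sudden roll disturbance becomes six context tokens that the
# model can attend to on the very next decoding step.
print(imu_to_tokens(np.array([0.02, -0.10, 0.98, 0.40, 0.01, -0.03])))
```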
Scenario: Spatial Mapping
> visual_search: active_scan
> cot: identify_beaker_depth_map
> reasoning: grasp_point_optimization
> action_tokens: [0.88, 0.12, -0.34, 0.5]
Task: Zero-Shot Fetch
Natural-language instructions are mapped to complex action sequences via an internal semantic reasoning chain.
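As a concrete stand-in for the `grasp_point_optimization` step in the trace above, here is a toy depth-map heuristic; the actual model performs this selection inside its reasoning chain rather than via an explicit routine like this one.

```python
import numpy as np

def pick_grasp_point(depth: np.ndarray, patch: int = 5) -> tuple[int, int]:
    """Toy grasp-point selection over a depth map.

    Scores each pixel by local depth smoothness (flat patches are easier to
    grasp) and proximity to the camera, then returns the best (row, col).
    This is an illustrative heuristic, not PragyaVLA's learned policy.
    """
    h, w = depth.shape
    r = patch // 2
    best, best_score = (0, 0), -np.inf
    for i in range(r, h - r):
        for j in range(r, w - r):
            window = depth[i - r:i + r + 1, j - r:j + r + 1]
            smoothness = -np.std(window)   # flatter is better
            closeness = -depth[i, j]       # nearer is better
            score = smoothness + 0.1 * closeness
            if score > best_score:
                best, best_score = (i, j), score
    return best

# Example: a synthetic depth map with a raised, flat "beaker rim" region.
depth = np.ones((32, 32)) * 0.8
depth[10:16, 10:16] = 0.45
print(pick_grasp_point(depth))
```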
Task performance degrades only logarithmically as spatial complexity increases, demonstrating the robustness of LA-CoT reasoning.
PragyaVLA achieves a 2.4x latency reduction compared to existing vision-action architectures by offloading non-critical reasoning to background threads.
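A minimal sketch of that offloading pattern, assuming the runtime exposes a fast action head and a slower auxiliary reasoner as separate callables; a thread pool is one way to realize the split, not necessarily how PragyaVLA's runtime is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def control_tick(fast_action_head, slow_reasoner, observation, pool: ThreadPoolExecutor):
    """Keep the 42 ms action path hot; push non-critical reasoning off-thread.

    `fast_action_head` and `slow_reasoner` are hypothetical callables: the
    former must return within the control deadline, while the latter
    (e.g. long-horizon re-planning) is allowed to lag behind.
    """
    # Non-critical reasoning runs in the background; its result is picked up
    # on a later tick rather than blocking this one.
    future_plan = pool.submit(slow_reasoner, observation)
    # Critical path: produce action tokens immediately.
    action = fast_action_head(observation)
    return action, future_plan

# Usage sketch with trivial stand-ins for the two heads.
with ThreadPoolExecutor(max_workers=1) as pool:
    act, plan = control_tick(lambda o: [0.1, -0.2, 0.0, 0.4],
                             lambda o: "refine approach vector",
                             observation={"frame": None}, pool=pool)
    print(act, plan.result())
```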