Installing a "Super Brain": Demystifying JetArm's AI Decision-Making
As a leader in robotics education, Hiwonder has launched intelligent robot platforms like JetRover and JetAuto for ROS and AI education, earning widespread acclaim for their cutting-edge technology integration and high-performance hardware. Take the JetArm ROS robotic arm as an example. It is not merely an execution tool but an intelligent agent equipped with a "Super Brain." So, how exactly does JetArm perceive the world and make decisions?
1. Multimodal Perception: The Synergy of JetArm's "Eyes, Ears, and Brain"
The secret behind JetArm's AI decision-making pipeline lies in its powerful "Super Brain": multimodal large AI models. Unlike robots that rely on a single sensing modality, JetArm integrates three perceptual channels, vision, voice, and text, to build a far richer understanding of its environment.
Its vision models, fed by a 3D depth camera, capture detailed environmental information. This allows the arm not only to identify objects, colors, and shapes but also to understand scene semantics, making the leap from "seeing" to "comprehending." At the same time, the AI Voice Interaction Box mounted on the body gives JetArm hearing and speech synthesis capabilities, supporting real-time voice dialogue and command response for a more natural and fluid human-robot experience.
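As a rough illustration of the vision side, the snippet below picks out candidate blue blocks from a single color frame with OpenCV. It is a minimal sketch, not JetArm's actual perception stack: the HSV thresholds, the contour-area cutoff, and the use of a plain BGR frame (with depth fusion and trained models omitted) are all illustrative assumptions.

```python
import cv2
import numpy as np

def find_blue_blocks(bgr_frame):
    """Return bounding boxes (x, y, w, h) of blue regions in a camera frame.

    Illustrative only: thresholds are placeholders and should be tuned
    for the actual camera and lighting conditions.
    """
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)
    # Approximate HSV range for "blue"; adjust for your setup.
    lower = np.array([100, 120, 70])
    upper = np.array([130, 255, 255])
    mask = cv2.inRange(hsv, lower, upper)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only regions large enough to be a block, not noise.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 500]
```

A bounding box like this, combined with the depth image, is what gives the planner a 3D grasp point for each block.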
Furthermore, by calling mainstream large language models (LLMs) like Tongyi Qianwen, DeepSeek, and 01 AI via network connectivity, JetArm gains powerful semantic understanding and reasoning abilities. This enables it to parse complex instructions, engage in multi-turn dialogues, and perform task planning and AI-driven decision-making.
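On the language side, calling a hosted LLM can be as simple as an HTTP request. The sketch below assumes an OpenAI-compatible chat-completions endpoint; the endpoint URL, API key, and model name are placeholders, and each provider mentioned above documents its own exact API.

```python
import requests

API_URL = "https://example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                             # placeholder credential

def parse_command(utterance: str) -> str:
    """Ask an LLM to turn a spoken instruction into a structured sorting goal."""
    prompt = (
        "Rewrite the user's request as JSON with keys 'keep_color' and "
        "'action'. Request: " + utterance
    )
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "your-model-name",  # placeholder model id
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    # Standard chat-completions response shape.
    return resp.json()["choices"][0]["message"]["content"]

# parse_command("Keep the items that are the same color as the sky")
# might return: {"keep_color": "blue", "action": "remove_others"}
```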

💡Get JetArm tutorials here, or go to Hiwonder GitHub for more repositories.
2. The Decision Hub: The Intelligent Chain from Understanding to Planning
These three classes of models do not operate in isolation. They are deeply fused into a collaborative cognitive system capable of cross-model understanding and autonomous decision-making. Let's look at how the multimodal AI models carry out the decision-making process in a basic color block sorting task.
Place multi-colored blocks on a table and say to JetArm: "Keep the items that are the same color as the sky, and remove the others." Upon recognizing the voice command, JetArm's "Super Brain" begins efficient operation. The entire decision pipeline can be broken down into three core layers:
● Intent Understanding Layer: The LLM first performs a deep parse of the instruction. It accurately interprets "the same color as the sky" as "blue" and confirms the core requirement is to "retain blue items and remove items of other colors." The key here is enabling the machine to understand not just the literal meaning but also the natural, sometimes ambiguous, expressions used by humans.
● Task Planning Layer: After understanding the intent, JetArm enters the planning phase. Using the information provided by the vision model, such as the color, shape, and spatial layout of all blocks, the system matches the visual recognition results against the semantic goal. It locates every blue block and formulates a plan: which non-blue blocks to remove first, the grasping sequence and path to use, and where to place them, including obstacle avoidance around the drop-off zone.
● Behavior Choreography Layer: Finally, JetArm uses inverse kinematics and PID closed-loop control to convert the planned grasp-point coordinates into precise angles for each servo, monitoring and dynamically adjusting the arm's motion in real time for accurate, smooth execution (a simplified sketch of these two algorithms follows below).
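To make the last layer concrete, here is a simplified sketch of the two techniques it names: a closed-form inverse-kinematics solution for a planar two-link arm and a minimal PID position loop. The link lengths, gains, and reduction to two joints are illustrative assumptions, not JetArm's real kinematic model or controller.

```python
import math

def ik_two_link(x, y, l1=0.13, l2=0.13):
    """Closed-form IK for a planar two-link arm; returns shoulder and elbow angles in radians."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))          # clamp for numerical safety
    q2 = math.acos(c2)                    # elbow-down solution
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2

class PID:
    """Minimal PID loop that drives one joint toward its target angle."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, target, measured, dt):
        error = target - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Convert a planned grasp point into joint targets, then let a PID loop
# steer each servo toward its target every control cycle.
q1_goal, q2_goal = ik_two_link(0.18, 0.05)
shoulder_pid = PID(kp=2.0, ki=0.1, kd=0.05)   # gains are placeholders
command = shoulder_pid.update(q1_goal, measured=0.0, dt=0.02)
```

In a real arm with more degrees of freedom, the same pattern holds: an IK solver turns a Cartesian grasp point into joint angles, and a closed-loop controller per joint tracks those angles smoothly.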

This highly efficient decision pipeline isn't limited to color block sorting. It similarly underpins JetArm's more complex, high-level applications like waste sorting, intelligent home assistance, and smart material handling. While traditional robotic arms remain in the stage of "executing pre-programmed routines," JetArm can achieve true "Understand-Plan-Act" autonomous intelligence through multimodal AI models. This embodies the core value of Embodied AI.
Through its three-stage decision pipeline of multimodal perception, large-model reasoning, and precise execution, JetArm achieves the leap from "following orders" to "making autonomous decisions." It is not only an ideal platform for ROS learning and AI research but also a crucial vehicle for realizing embodied intelligence. JetArm is empowering every developer and educator to jointly shape a smarter, more natural future for robotics.