Zero-Shot LLM-Driven Multi-Robot Coordination
A framework enabling users to orchestrate a fleet of sensor-less, differential-drive mobile robots using natural language.
Project Overview
This research presented a comprehensive “zero-shot” framework that let users orchestrate a fleet of sensor-less, differential-drive mobile robots using natural-language instructions entered through a Python-based GUI. The system bridged the gap between abstract, high-level user intent and precise, low-level physical execution by delegating task decomposition to Large Language Models.
System Architecture
The framework integrated LLM tool-calling, perception, planning, and control layers, connected over a high-frequency RPC backbone that allowed the system to create, schedule, and execute tasks directly from user text input.
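The task-dispatch flow can be sketched as follows. This is a minimal illustration, not the actual implementation: the real system carries messages over ZMQ, whereas a stdlib queue stands in for the transport here, and `drive_to` and the message format are hypothetical placeholders.

```python
import json
import queue
import threading

task_queue: "queue.Queue" = queue.Queue()
results: list = []

def drive_to(robot_id: str, x: float, y: float) -> str:
    # Placeholder for the low-level motion command sent to a robot.
    return f"{robot_id} -> ({x}, {y})"

# Registry mapping tool names (as the LLM emits them) to handlers.
ACTIONS = {"drive_to": drive_to}

def worker() -> None:
    # Pull decomposed sub-tasks off the queue and execute them in order.
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut the worker down
            break
        handler = ACTIONS[task["tool"]]
        results.append(handler(**task["args"]))
        task_queue.task_done()

t = threading.Thread(target=worker)
t.start()

# One sub-task as the LLM layer might emit it after decomposing a prompt:
for msg in ['{"tool": "drive_to", "args": {"robot_id": "r1", "x": 1.0, "y": 2.0}}']:
    task_queue.put(json.loads(msg))

task_queue.join()
task_queue.put(None)
t.join()
print(results)  # ['r1 -> (1.0, 2.0)']
```

The registry pattern keeps the LLM layer decoupled from execution: adding a new robot capability only means registering another handler.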
The perception layer relied on an overhead multi-camera setup calibrated using a Charuco board and stitched into a unified top-down view using homography alignment. Robot localization was handled externally by tracking ArUco tags mounted on the chassis.
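Once calibration yields a 3x3 homography mapping a camera's pixels into the shared top-down frame, each detected ArUco tag centre can be warped into arena coordinates. A sketch of that warping step, assuming an illustrative translation-only matrix `H` rather than a real calibration result:

```python
import numpy as np

def warp_points(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply homography H to an Nx2 array of pixel points, returning Nx2 warped points."""
    ones = np.ones((pts.shape[0], 1))
    homog = np.hstack([pts, ones]) @ H.T    # lift to homogeneous coordinates
    return homog[:, :2] / homog[:, 2:3]     # perspective divide back to 2D

# Illustrative homography: a pure translation of (+10, -5) pixels.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, -5.0],
              [0.0, 0.0, 1.0]])

print(warp_points(H, np.array([[100.0, 200.0]])))  # [[110. 195.]]
```

The same function handles genuinely projective matrices (non-trivial bottom row), which is why the perspective divide is kept even though this example's `H` leaves the third coordinate at 1.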
High-level cognitive reasoning was powered by Gemini 2.5 Pro, which decomposed complex user prompts into actionable sub-tasks, while Moondream handled VLM-based object queries. An RPC server was established to coordinate task, robot, and data services, communicating with the physical hardware via an ESP32 bridge supporting Wi-Fi, BLE, and ESP-NOW.
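A robot action exposed to the LLM for tool-calling might be declared like this. The dict mirrors the JSON-schema style used by function-calling APIs such as Gemini's, but the exact field names should be checked against the official documentation; `drive_to` and its parameters are illustrative assumptions.

```python
# Hypothetical tool declaration in a JSON-schema style.
drive_to_tool = {
    "name": "drive_to",
    "description": "Drive a robot to an (x, y) position in arena metres.",
    "parameters": {
        "type": "object",
        "properties": {
            "robot_id": {"type": "string"},
            "x": {"type": "number"},
            "y": {"type": "number"},
        },
        "required": ["robot_id", "x", "y"],
    },
}

def validate_call(tool: dict, args: dict) -> bool:
    """Check that a model-emitted call supplies every required argument."""
    return all(k in args for k in tool["parameters"]["required"])

print(validate_call(drive_to_tool, {"robot_id": "r1", "x": 0.5, "y": 1.2}))  # True
print(validate_call(drive_to_tool, {"x": 0.5}))                              # False
```

Validating model output against the declared schema before dispatch is a cheap guard against malformed or hallucinated calls reaching the hardware.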
For physical execution, the framework utilized a highly responsive motion stack combining an RRT* planner with online replanning. A hybrid controller was deployed utilizing Pure Pursuit for general navigation and a tuned PID terminal controller to ensure millimeter-precision alignment. The framework was rigorously validated through complex physical experiments, including collaboratively ordering numbered blocks and two-robot color-sorting with live collision avoidance.
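The Pure Pursuit step above can be sketched for a differential-drive robot: given the robot pose and a lookahead point on the planned path, compute the curvature needed to reach it and convert that into left/right wheel speeds. The speed, wheel base, and function signature here are illustrative placeholders, not the tuned values from the actual controller.

```python
import math

def pure_pursuit(pose, lookahead, v=0.2, wheel_base=0.1):
    """pose = (x, y, theta) in world frame; lookahead = (lx, ly) on the path."""
    x, y, theta = pose
    dx, dy = lookahead[0] - x, lookahead[1] - y
    # Rotate the lookahead point into the robot's body frame.
    xr = math.cos(-theta) * dx - math.sin(-theta) * dy
    yr = math.sin(-theta) * dx + math.cos(-theta) * dy
    L2 = xr * xr + yr * yr
    curvature = 2.0 * yr / L2          # classic pure-pursuit curvature formula
    omega = v * curvature              # angular-velocity command
    # Map (v, omega) to differential-drive wheel speeds.
    v_left = v - omega * wheel_base / 2.0
    v_right = v + omega * wheel_base / 2.0
    return v_left, v_right

# Robot at the origin facing +x with the target dead ahead: wheels match.
print(pure_pursuit((0.0, 0.0, 0.0), (1.0, 0.0)))  # (0.2, 0.2)
```

Because Pure Pursuit only steers toward a point ahead on the path, its terminal accuracy is limited, which is why a separate PID controller takes over for the final millimeter-precision alignment.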
Technologies used: Gemini API, ZMQ, RRT*, and control systems.
Collaborative multi-robot task execution
System interface and tracking