MIT's DAAAM System Lets Robots Answer Natural-Language Questions About Object Locations

A robot deployed in MIT's test kitchen can now tell you that the scissors are in the second drawer from the left, that it saw them there eleven minutes ago, and that you used them to open a package before leaving them on the counter yesterday afternoon. The system answering these queries, DAAAM—short for Detect, Assign, Allocate, and Match—processes natural language questions about object locations and retrieves answers in 1.8 seconds on average, according to benchmarks published by the Computer Science and Artificial Intelligence Laboratory in June 2026. That response time includes parsing the question, searching spatial memory, and generating a verbal reply, a pipeline that required separate systems in previous implementations.

The architecture represents a departure from how mobile robots typically handle object tracking. Most commercial systems maintain object databases with fixed coordinates, updating locations only when sensors detect movement. DAAAM instead builds a four-dimensional map (three spatial axes plus time), logging not just where an object sits but when the robot last observed it and what context surrounded that observation. In MIT's six-week trial, the system tracked 312 distinct household items across a 1,200-square-foot apartment, maintaining accurate location data for objects that moved an average of 4.3 times per day. Temporal indexing proved critical: when a user asked about missing keys, the robot retrieved the last three locations in chronological order, a capability that depends on storing a snapshot of the environment at each observation rather than overwriting previous records.
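The difference between overwriting a location and keeping a history can be illustrated with a minimal sketch. The code below is not DAAAM's implementation; the `Observation` and `ObservationLog` names, the field layout, and the sample data are all hypothetical, chosen only to show how an append-only log supports a "last three locations" query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One sighting of an object: where, when, and in what context (hypothetical schema)."""
    x: float
    y: float
    z: float
    timestamp: float  # seconds since an arbitrary start time
    context: str      # free-text note about the surrounding scene


class ObservationLog:
    """Append-only store: a new sighting never overwrites an earlier one."""

    def __init__(self):
        self._history = {}

    def record(self, object_id, obs):
        # Keep every sighting so that past locations stay queryable.
        self._history.setdefault(object_id, []).append(obs)

    def last_locations(self, object_id, n=3):
        """Return the n most recent sightings, oldest first."""
        ordered = sorted(self._history.get(object_id, []), key=lambda o: o.timestamp)
        return ordered[-n:]


log = ObservationLog()
log.record("keys", Observation(0.5, 2.0, 0.9, 100.0, "hallway table"))
log.record("keys", Observation(3.1, 1.2, 0.8, 250.0, "kitchen counter"))
log.record("keys", Observation(4.0, 0.3, 0.4, 400.0, "coat pocket"))
log.record("keys", Observation(1.0, 1.0, 0.7, 900.0, "sofa cushion"))

recent = log.last_locations("keys")
print([o.context for o in recent])
# prints ['kitchen counter', 'coat pocket', 'sofa cushion']
```

A database that stored only the latest coordinates could answer "where are the keys now" but not "where were they before that," which is the gap the temporal axis is meant to close.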

Daniela Rus, director of CSAIL and a co-author on the research, emphasized that the breakthrough lies in integration rather than any single technical component. The object detection model uses a standard vision transformer trained on household items, achieving 94.7 percent accuracy on the test set. The spatial memory builds on occupancy grid mapping, a technique long used in mobile robotics and autonomous vehicles that divides physical space into cells and tracks, for each cell, the probability that an object is present. What DAAAM contributes is the matching layer: a neural network that translates natural language queries into database lookups without requiring users to phrase questions in rigid templates. During testing, the system correctly interpreted 89 percent of queries that included ambiguous references like "the thing I was using this morning" or "where I left my tools," matching them to specific objects based on usage patterns and timestamps.
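For readers unfamiliar with occupancy grids, the sketch below shows the textbook log-odds form of the per-cell probability update. The article does not say how DAAAM updates its grid, so this is a generic illustration under that assumption; the `OccupancyGrid` class, the grid size, and the sensor probabilities are invented for the example.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(log_odds):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


class OccupancyGrid:
    """2D grid of cells, each holding the log-odds that something is present."""

    def __init__(self, width, height, prior=0.5):
        self.prior_log_odds = logit(prior)
        self.log_odds = [[self.prior_log_odds] * width for _ in range(height)]

    def update(self, row, col, p_occupied):
        """Fuse one reading, where p_occupied is P(occupied | this reading)."""
        # Standard Bayesian update: add the reading's evidence in log-odds space.
        self.log_odds[row][col] += logit(p_occupied) - self.prior_log_odds

    def probability(self, row, col):
        return sigmoid(self.log_odds[row][col])


grid = OccupancyGrid(width=4, height=3)
grid.update(1, 2, 0.7)   # two readings suggest an object in cell (1, 2)
grid.update(1, 2, 0.7)
grid.update(0, 0, 0.3)   # one reading suggests cell (0, 0) is empty

print(round(grid.probability(1, 2), 3))
print(round(grid.probability(0, 0), 3))
```

Repeated agreeing readings push a cell's probability toward certainty, while cells the sensors never see stay at the prior.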

The practical implications extend beyond finding misplaced items. Manufacturing facilities with mobile robots could query part locations without manual inventory scans. Warehouses could ask robots which shelf last held a specific SKU, then cross-reference that answer with the time of last observed movement to identify potential theft or misplacement. Healthcare settings could deploy the technology for medication tracking, with robots answering questions about which cabinet contains a particular drug and when staff last accessed it. The temporal component addresses a limitation in current robotic systems: they know where things are now but have no record of where things were or how they got there. DAAAM's memory persists across power cycles and can be shared between robots, allowing multiple units to pool observations into a unified spatial-temporal map. In the MIT trials, two robots operating in the same space synchronized their memory databases every six minutes, maintaining consistency even when one robot observed an object movement the other missed.
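How two robots might pool observations can be sketched as a simple merge of sighting logs. The article does not describe DAAAM's synchronization protocol, so the following is an assumption-laden illustration: the tuple layout, the `merge_logs` and `current_location` helpers, and the sample data are hypothetical, and only the six-minute interval comes from the reported trial.

```python
SYNC_INTERVAL_S = 6 * 60  # the trial reported a sync every six minutes


def merge_logs(log_a, log_b):
    """Union two sighting logs, drop exact duplicates, and sort each by time."""
    merged = {}
    for log in (log_a, log_b):
        for object_id, sightings in log.items():
            merged.setdefault(object_id, set()).update(sightings)
    # Tuples start with the timestamp, so plain sorting orders them by time.
    return {object_id: sorted(s) for object_id, s in merged.items()}


def current_location(merged, object_id):
    """The most recent sighting wins, whichever robot made it."""
    return merged[object_id][-1][1]


# Each sighting is (timestamp_seconds, location, robot_id).
robot_a = {"mug": [(100, "counter", "A")], "keys": [(50, "shelf", "A")]}
robot_b = {"mug": [(100, "counter", "A"), (340, "sink", "B")]}

shared = merge_logs(robot_a, robot_b)
print(current_location(shared, "mug"))
# prints sink
```

Here robot B saw the mug move to the sink while robot A did not; after the merge, both hold the same history and agree on the latest location.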

The system's architecture also reveals constraints that will shape near-term deployment. DAAAM requires 18 gigabytes of RAM to maintain memory for 300 objects over a month, limiting it to robots with desktop-class computers rather than embedded processors. Processing visual input from four ceiling-mounted cameras plus the robot's onboard sensors consumed 47 watts during active querying, compared to 12 watts for navigation alone. The researchers noted that edge cases remain: objects smaller than three centimeters frequently disappeared from tracking, reflective surfaces confused the vision system, and the robot failed to distinguish between identical items unless they occupied different locations. Query accuracy also degraded when users asked about events more than two weeks old, suggesting the memory architecture needs refinement for long-term retention. Commercial deployment will likely require optimizing the model for specific environments rather than attempting general-purpose household memory.
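The reported figures allow a rough back-of-envelope check. The calculation below uses only the numbers quoted above, takes one gigabyte as 1,000 megabytes, and assumes memory grows linearly with object count, which the researchers have not confirmed; the `projected_ram_gb` helper and the 3,000-object scenario are illustrative only.

```python
RAM_GB = 18            # reported RAM for a month of memory
OBJECTS = 300          # reported object count for that figure
ACTIVE_QUERY_W = 47    # reported power draw during active querying
NAVIGATION_W = 12      # reported power draw for navigation alone

# Average memory per tracked object over a month (1 GB taken as 1,000 MB).
mb_per_object = RAM_GB * 1000 / OBJECTS

# How much more power querying draws than navigation.
power_ratio = ACTIVE_QUERY_W / NAVIGATION_W


def projected_ram_gb(n_objects):
    """Naive projection, ASSUMING memory scales linearly with object count."""
    return n_objects * mb_per_object / 1000


print(mb_per_object)                 # 60.0 MB per object
print(round(power_ratio, 1))         # 3.9 times the navigation draw
print(projected_ram_gb(3000))        # 180.0 GB for a hypothetical 3,000 objects
```

Under that linear assumption, a tenfold increase in tracked items would need roughly ten times the RAM, which helps explain why scaling to industrial inventories is an open question.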

What to Watch: MIT's CSAIL plans to release a limited research version of the DAAAM codebase in August 2026, targeting academic labs working on mobile manipulation. Watch for integration announcements from Boston Dynamics or Agility Robotics, both of which have research partnerships with MIT and deploy mobile robots in warehouse environments where spatial memory would provide immediate value. Monitor whether the system scales beyond household objects to industrial parts, which would require handling thousands rather than hundreds of distinct items and maintaining accuracy over months rather than weeks.
