MosaicLeaks Study: Can AI Research Agents Really Keep Secrets?

📰 Key Takeaways

MosaicLeaks is a new study on deep research AI agent privacy leaks, revealing a vulnerability called the “mosaic effect”: when agents simultaneously access local private files and external network tools, each seemingly harmless search query can accumulate to allow observers to piece together enterprise secrets.

The study uses a medical institution as an example: to complete a multi-step question, the agent first queried cloud migration milestones, security disclosure events, and affected vendors — no single query directly leaked sensitive information. However, by examining the complete query log, observers could infer that “MediConn had migrated 70% of its infrastructure to the cloud by January 2025” — data that originally only existed in private files.

The research team defined three leakage levels: intent leakage (predicting what questions the agent is researching), answer leakage (directly answering private questions from query logs), and full information leakage (observers can proactively derive private facts without knowing the original questions).

To address this, researchers built the MosaicLeaks evaluation set containing 1,001 multi-hop research chains and proposed PA-DR, a privacy-aware deep research training method that uses reinforcement learning to introduce privacy leakage awareness. Experimental results show that PA-DR increased strict chain-of-thought accuracy from 48.7% to 58.7% while dramatically reducing answer and full information leakage rates from 34.0% to 9.9%, demonstrating that task accuracy and privacy protection can be achieved simultaneously.

💬 JudyAI Lab Perspective

MosaicLeaks reveals the “mosaic effect” and makes one thing clear: privacy risks in AI agents often hide in the cumulative patterns of multi-step queries, rather than in a single action’s mistake.

For AI builders, this study points out a common design blind spot: privacy protection usually focuses on “access control,” but ignores that the agent’s external query behavior itself is also a leakage vector. The three leakage levels defined in the research — intent, answer, and full information — show that attackers don’t need to steal files; they can simply observe search logs to reverse-engineer secrets. What’s even more noteworthy is the PA-DR method’s experimental results: strict chain-of-thought accuracy rose from 48.7% to 58.7% while leakage rates dropped from 34.0% to 9.9%, breaking the intuition that “security must sacrifice accuracy.”

When designing multi-step research agents, consider this first: if someone fully records all of the agent’s external queries, how much secrets can they reconstruct? This question is worth figuring out before “is the data encrypted.”

📅 Source Information

Published: 2026-06-18T18:13
Source: https://huggingface.co/blog/ServiceNow/mosaicleaks

📰 Key Takeaways#

💬 JudyAI Lab Perspective#

📅 Source Information#

🔗 Further Reading#

Get our weekly AI digest:

📰 Key Takeaways

💬 JudyAI Lab Perspective

📅 Source Information

🔗 Further Reading