<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LLM Evaluation on Judy AI Lab</title>
    <link>https://judyailab.com/en/tags/llm-evaluation/</link>
    <description>Recent content in LLM Evaluation on Judy AI Lab</description>
    <image>
      <title>Judy AI Lab</title>
      <url>https://judyailab.com/logo.jpg</url>
      <link>https://judyailab.com/logo.jpg</link>
    </image>
    <generator>Hugo -- 0.147.4</generator>
    <language>en</language>
    <lastBuildDate>Sun, 12 Apr 2026 05:01:06 +0000</lastBuildDate>
    <atom:link href="https://judyailab.com/en/tags/llm-evaluation/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Open-Source LLM in Production: Why We Chose MiniMax M2.7 for Our AI Team</title>
      <link>https://judyailab.com/en/posts/open-source-llm-agent-team-2026/</link>
      <pubDate>Sun, 12 Apr 2026 05:01:06 +0000</pubDate>
      <guid>https://judyailab.com/en/posts/open-source-llm-agent-team-2026/</guid>
      <description>Not a leaderboard ranking. This is what actually happened when we ran MiniMax M2.7 as the backbone of our daily AI team operations. Includes real output-quality observations from two agent roles (ada and mimi), plus three pitfalls you won&#39;t find in any benchmark: context windows, tool-calling stability, and language output. Useful for developers evaluating model selection for multi-agent systems.</description>
    </item>
  </channel>
</rss>
