EVA-Bench Data 2.0 Benchmark Released: Covering 3 Domains, 121 Tools, and 213 Test Scenarios

📅 Source Information Published: 2026-06-04T12:24 Original source: https://huggingface.co/blog/ServiceNow-AI/eva-bench-data

2026-06-04 · 1 min · 81 words · Judy

GPT-Rosalind New Capabilities Released

AI News Flash: OpenAI recently released capability expansion updates for GPT-Rosalind, focusing on four areas: biological reasoning, medicinal chemistry, genomics analysis, and experimental workflow integration. The enhancement of biological reasoning capabilities allows the model to more deeply understand complex interaction mechanisms and regulatory logic in biological systems; improved medicinal chemistry expertise helps researchers get more precise AI-assisted judgments in new drug design and molecular structure analysis…

2026-06-04 · 2 min · 370 words · Judy

Google Launches Fake Call Detection to Combat AI Deepfake Voice Scam Impersonation Attacks

AI News Flash: As more people adopt the habit of rejecting calls from unknown numbers, phone scam groups are shifting tactics. They now spoof trusted numbers to make the caller ID appear as banks, government agencies, friends, or employers, luring victims into answering. Even more dangerous, these scams now pair with AI deepfake voice technology that can generate near-perfect fake voices in real-time, precisely mimicking family members, supervisors, or officials…

2026-06-03 · 2 min · 359 words · Judy

Codex Expands Universally: Supporting All Roles, Tool Integrations & Workflows

AI News Flash: OpenAI recently released a suite of Codex expansion tools designed for different functional roles, covering plugins, integrations, web annotations, and more. The goal is to let non-engineering team members such as analysts, marketers, designers, and investors use Codex’s AI capabilities directly for daily productivity gains. The core of this update is transforming Codex from a pure developer tool into a cross-functional AI productivity platform that is no longer limited to coding scenarios. The original summary is only a brief introduction, without specific plugin names, feature details, or metrics; refer to the original link for full details.

2026-06-02 · 2 min · 304 words · Judy

Travelers Insurance Partners with OpenAI to Deploy AI Claims System Across the US

AI News Flash: Insurance company Travelers partners with OpenAI to build an AI-driven Claim Assistant system. The system tackles three key pain points: first, guiding customers through the claims process step by step to reduce form-filling errors and administrative friction, enabling everyday users unfamiliar with insurance procedures to submit applications smoothly; second, providing round-the-clock customer support—24 hours a day, seven days a week—so clients can get instant responses without waiting for human agents; third, flexibly scaling capacity during peak claim periods (such as after major natural disasters like hurricanes or floods) to avoid processing delays caused by workforce bottlenecks…

2026-06-02 · 2 min · 356 words · Judy

This AI Weather Startup's Forecast Accuracy Has Surpassed Government Agencies

AI News Flash: WindBorne’s competitive edge comes from owning both data collection and model building. The company currently releases weather balloons at 15 locations worldwide, with about 400 balloons in the air at any moment, reading atmospheric sensor data in real time. The accuracy boost in its latest weather forecasting model doesn’t come from switching to a bigger model architecture. It comes from improving how balloon data is fed into the model, that is, from optimizing the data preprocessing and assimilation pipeline…

2026-06-02 · 2 min · 324 words · Judy

Florida Sues OpenAI and Sam Altman, Becoming the First US AI Violence Lawsuit

AI News Flash: The Florida state government is suing OpenAI and its CEO Sam Altman, the first state-level legal action against a generative AI company in US history. One of the core disputes concerns a shooting at a Florida state university last year and the role ChatGPT allegedly played in it. The state alleges that OpenAI’s products were involved to some degree in the violent incident and cites this as one of the bases for the lawsuit…

2026-06-02 · 2 min · 340 words · Judy

Alphabet Plans to Raise $80 Billion for AI Infrastructure Expansion

AI News Flash: Alphabet (Google’s parent company) publicly states that demand from enterprise customers and consumers for its AI solutions and services continues to grow strongly, significantly exceeding the company’s current supply capacity.

2026-06-02 · 2 min · 345 words · Judy

Beyond Large Language Models: The Key to Enterprise AI at Scale is Agent Logic

AI News Flash: An IBM Research study finds that the key to scaling enterprise AI isn’t bigger LLMs but ‘Agent Logic’: a guidance layer built from software primitives like knowledge graphs, static program analysis, and algorithm decomposition. This mechanism compresses the LLM’s context space while reducing hallucination rates and token consumption, making model behavior more controllable and costs more predictable.

2026-06-01 · 1 min · 104 words · Judy

JetBrains Releases Mellum2: A 12B-Parameter Mixture-of-Experts Model for Developers

AI News Flash: JetBrains released Mellum2 on June 1, 2026, a 12-billion-parameter open-source model built on a Mixture-of-Experts (MoE) architecture and released under the Apache 2.0 license. It activates only 2.5 billion parameters per inference, which makes inference more than twice as fast as models of equivalent scale and significantly reduces deployment costs. Mellum2 isn’t positioned as a replacement for frontier large models, but rather as a ‘focused model’ in multi-model collaboration systems, handling high-frequency lightweight tasks including prompt classification, tool selection, context compression and summarization for RAG pipelines, sub-agent planning validation, and code completion. The model processes only text and code, deliberately excluding multimodal capabilities to keep the architecture lean, which makes it particularly suitable for enterprises deploying in private environments to handle internal code and confidential data. Across multiple benchmarks including code generation, reasoning, science, and math, Mellum2 achieves competitive performance among open-source models of similar scale. The technical report has also been published on arXiv (No. 2605.31268), and model weights are available for download on Hugging Face.

2026-06-01 · 2 min · 384 words · Judy