3 changes: 3 additions & 0 deletions notebooks/en/_toctree.yml
@@ -165,6 +165,9 @@
title: Multi-agent RAG System 🤖🤝🤖
- local: mongodb_smolagents_multi_micro_agents
title: MongoDB + SmolAgents Multi-Micro Agents to facilitate a data driven order-delivery AI agent
- local: ag2_multiagent_system
title: Building a Model Comparison Pipeline with Multi-Agent Collaboration
isNew: true

- title: Enterprise Hub Cookbook
isExpanded: True
127 changes: 127 additions & 0 deletions notebooks/en/ag2_multiagent_system.ipynb
@@ -0,0 +1,127 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "8ey1c8hg2a2",
"source": "# Building a Model Comparison Pipeline with Multi-Agent Collaboration\n\n_Authored by: [Faridun Mirzoev](https://huggingface.co/faridunm)_\n\nChoosing the right model from the Hugging Face Hub can be overwhelming — there are thousands of options for any given task. In this notebook, we build an automated model comparison pipeline where multiple AI agents collaborate to search, analyze, and compare models for your specific use case.\n\nWe'll use [AG2](https://ag2.ai) to orchestrate three specialized agents:\n- A **Scout** that searches the Hub and finds candidate models\n- An **Analyst** that extracts technical specs from model cards\n- An **Advisor** that compares the candidates and gives a recommendation\n\nEach agent has access to real Hugging Face Hub tools and works with live data — this isn't a simulation.",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "k0n9cib2i3k",
"source": "## Setup",
"metadata": {}
},
{
"cell_type": "code",
"id": "a419rza83zc",
"source": "!pip install \"ag2[openai]>=0.11.4,<1.0\" huggingface_hub -q",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "u7ednlz6w1o",
"source": "## Connecting to a Hugging Face Model\n\nWe'll power our agents with an open-source model hosted on the Hugging Face Inference API. The API provides an OpenAI-compatible endpoint, so we just need to set the `base_url` and use a [HF token](https://huggingface.co/settings/tokens) as the API key.",
"metadata": {}
},
{
"cell_type": "code",
"id": "mhlor4gklud",
"source": "import os\nfrom huggingface_hub import get_token\nfrom autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager, LLMConfig\n\nhf_token = get_token()\n\nllm_config = LLMConfig({\n \"model\": \"Qwen/Qwen2.5-Coder-32B-Instruct\",\n \"api_key\": hf_token,\n \"api_type\": \"openai\",\n \"base_url\": \"https://router.huggingface.co/v1\",\n})",
"metadata": {},
"execution_count": null,
"outputs": []
},
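{
"cell_type": "markdown",
"id": "tokchk9m2d1",
"source": "Before going further, it is worth a quick sanity check (optional, and not part of the pipeline itself) that a token was actually found: `get_token()` returns `None` when no token is configured, and in that case every agent call would fail with an authentication error.",
"metadata": {}
},
{
"cell_type": "code",
"id": "tokchk9c2d1",
"source": "# Optional sanity check: get_token() returns None if no token is configured.\nassert hf_token, \"No Hugging Face token found. Run `huggingface-cli login` or set the HF_TOKEN env var.\"",
"metadata": {},
"execution_count": null,
"outputs": []
},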
{
"cell_type": "markdown",
"id": "e6u2cbbdex",
"source": "## Building the Hub Tools\n\nBefore creating agents, we need to give them the ability to interact with the Hugging Face Hub. We'll create three tools that wrap the `huggingface_hub` API:\n\n1. **search_models** — find models for a given task and query\n2. **get_model_details** — extract key specs from a model card (size, license, languages, downloads)\n3. **compare_model_stats** — fetch download/like counts for side-by-side comparison",
"metadata": {}
},
{
"cell_type": "code",
"id": "m1xszst8yh7",
"source": "import json\nfrom typing import Annotated\nfrom huggingface_hub import HfApi, ModelCard\n\napi = HfApi()\n\n\ndef search_models(\n task: Annotated[str, \"The task type, e.g. 'text-generation', 'text-classification', 'image-classification'\"],\n query: Annotated[str, \"Search query to filter models\"] = \"\",\n limit: Annotated[int, \"Max number of results\"] = 5,\n) -> str:\n \"\"\"Search the Hugging Face Hub for models matching a task and query.\"\"\"\n models = list(api.list_models(\n pipeline_tag=task,\n search=query,\n sort=\"downloads\",\n limit=limit,\n ))\n if not models:\n return \"No models found for this task and query.\"\n\n results = []\n for m in models:\n results.append({\n \"model_id\": m.id,\n \"downloads\": m.downloads,\n \"likes\": m.likes,\n \"pipeline_tag\": m.pipeline_tag,\n })\n return json.dumps(results, indent=2)\n\n\ndef get_model_details(\n model_id: Annotated[str, \"The model ID, e.g. 'meta-llama/Llama-3.1-8B-Instruct'\"],\n) -> str:\n \"\"\"Get detailed information about a model: size, license, description, tags.\"\"\"\n try:\n info = api.model_info(model_id)\n try:\n card = ModelCard.load(model_id)\n card_text = card.text[:500] if card.text else \"No model card text available.\"\n except Exception:\n card_text = \"Could not load model card.\"\n\n details = {\n \"model_id\": info.id,\n \"pipeline_tag\": info.pipeline_tag,\n \"downloads_last_month\": info.downloads,\n \"likes\": info.likes,\n \"license\": info.card_data.license if info.card_data and info.card_data.license else \"not specified\",\n \"tags\": info.tags[:15],\n \"card_summary\": card_text,\n }\n return json.dumps(details, indent=2)\n except Exception as e:\n return json.dumps({\"error\": str(e)})\n\n\ndef compare_model_stats(\n model_ids: Annotated[list[str], \"List of model IDs to compare\"],\n) -> str:\n \"\"\"Compare download and like counts for multiple models side by side.\"\"\"\n comparison = []\n for model_id in model_ids:\n try:\n info = api.model_info(model_id)\n comparison.append({\n \"model_id\": info.id,\n \"downloads\": info.downloads,\n \"likes\": info.likes,\n \"license\": info.card_data.license if info.card_data and info.card_data.license else \"unknown\",\n \"pipeline_tag\": info.pipeline_tag,\n })\n except Exception as e:\n comparison.append({\"model_id\": model_id, \"error\": str(e)})\n return json.dumps(comparison, indent=2)",
"metadata": {},
"execution_count": null,
"outputs": []
},
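{
"cell_type": "markdown",
"id": "smoketst8m4",
"source": "These are plain Python functions, so they can be exercised directly before handing them to agents. The cell below is an optional smoke test (not required for the pipeline) that calls `search_models` once to confirm Hub access and show the JSON format the agents will see.",
"metadata": {}
},
{
"cell_type": "code",
"id": "smoketst8c4",
"source": "# Optional smoke test: call one tool directly to verify Hub access.\n# Prints a JSON list of up to 3 text-classification models, sorted by downloads.\nprint(search_models(task=\"text-classification\", query=\"sentiment\", limit=3))",
"metadata": {},
"execution_count": null,
"outputs": []
},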
{
"cell_type": "markdown",
"id": "ivlu78hr65",
"source": "## Creating the Agent Team\n\nEach agent has a focused role and access to specific tools. The **Scout** finds candidates, the **Analyst** digs into technical details, and the **Advisor** synthesizes everything into a recommendation.\n\nWe use AG2's decorator pattern to register which agent can call which tool, and which agent executes the result.",
"metadata": {}
},
{
"cell_type": "code",
"id": "now27eshhbm",
"source": "scout = AssistantAgent(\n name=\"Scout\",\n system_message=(\n \"You are a model scout. Your job is to search the Hugging Face Hub \"\n \"to find candidate models for the user's task. Use the search_models tool \"\n \"to find the top candidates. Present results clearly with model IDs and \"\n \"download counts. Only use the tools provided — do not make up model names.\"\n ),\n llm_config=llm_config,\n)\n\nanalyst = AssistantAgent(\n name=\"Analyst\",\n system_message=(\n \"You are a model analyst. When the Scout has found candidates, use \"\n \"compare_model_stats to get a side-by-side overview of all candidates at once. \"\n \"Then use get_model_details on the top 2-3 most promising ones only. \"\n \"Summarize your findings in a structured comparison table. \"\n \"Only use the tools provided — do not invent specifications.\"\n ),\n llm_config=llm_config,\n)\n\nadvisor = AssistantAgent(\n name=\"Advisor\",\n system_message=(\n \"You are a model advisor. Based on the Scout's search results and the \"\n \"Analyst's detailed comparison, provide a clear recommendation. Consider: \"\n \"download popularity (community trust), license compatibility, model size, \"\n \"and suitability for the user's stated use case. \"\n \"Give a top pick with reasoning, plus an alternative. \"\n \"End your recommendation with TERMINATE.\"\n ),\n llm_config=llm_config,\n)\n\nexecutor = UserProxyAgent(\n name=\"Executor\",\n human_input_mode=\"NEVER\",\n max_consecutive_auto_reply=10,\n is_termination_msg=lambda x: (x.get(\"content\") or \"\").rstrip().endswith(\"TERMINATE\"),\n code_execution_config=False,\n)\n\n# Register tools — Scout and Analyst can call them, Executor runs them\nfor tool_fn, description in [\n (search_models, \"Search the Hugging Face Hub for models matching a task and query\"),\n (get_model_details, \"Get detailed information about a specific model\"),\n (compare_model_stats, \"Compare download and like counts for multiple models side by side\"),\n]:\n executor.register_for_execution()(tool_fn)\n scout.register_for_llm(description=description)(tool_fn)\n analyst.register_for_llm(description=description)(tool_fn)",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "teakl7tivpc",
"source": "## Running the Pipeline\n\nLet's ask our agent team to help us choose a text-classification model. The GroupChat manager will coordinate which agent speaks when — the Scout searches first, the Analyst digs deeper, and the Advisor wraps up with a recommendation.",
"metadata": {}
},
{
"cell_type": "code",
"id": "5xyvski36ol",
"source": "group_chat = GroupChat(\n agents=[executor, scout, analyst, advisor],\n messages=[],\n max_round=20,\n speaker_selection_method=\"auto\",\n)\n\nmanager = GroupChatManager(\n groupchat=group_chat,\n llm_config=llm_config,\n)\n\nexecutor.run(\n manager,\n message=(\n \"I need a text-classification model for sentiment analysis on product reviews. \"\n \"It should be open-source with a permissive license (Apache 2.0 or MIT), \"\n \"well-maintained (high downloads), and not too large (suitable for fine-tuning \"\n \"on a single GPU). Find the best options and recommend one.\"\n ),\n).process()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "s9dnwgkaovi",
"source": "## Reviewing the Conversation\n\nLet's trace how the agents collaborated:",
"metadata": {}
},
{
"cell_type": "code",
"id": "hl6mz1tqcdb",
"source": "for msg in group_chat.messages:\n name = msg.get(\"name\", msg.get(\"role\", \"unknown\"))\n content = msg.get(\"content\", \"\")\n if content and content.strip():\n print(f\"{'='*60}\")\n print(f\" {name}:\")\n print(f\"{content[:600]}\")\n print()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "gtd3eb2x568",
"source": "## Try It Yourself\n\nChange the query below to explore different tasks — image classification, text generation, translation, or anything else on the Hub:",
"metadata": {}
},
{
"cell_type": "code",
"id": "uq7k3eytrel",
"source": "# Change this to your own use case!\nyour_query = (\n \"I need an image-classification model for classifying medical X-ray images. \"\n \"It should have a permissive license and be suitable for fine-tuning.\"\n)\n\ngroup_chat_2 = GroupChat(\n agents=[executor, scout, analyst, advisor],\n messages=[],\n max_round=20,\n speaker_selection_method=\"auto\",\n)\n\nmanager_2 = GroupChatManager(groupchat=group_chat_2, llm_config=llm_config)\nexecutor.run(manager_2, message=your_query).process()\n\n# Print the advisor's recommendation\nfor msg in reversed(group_chat_2.messages):\n if msg.get(\"name\") == \"Advisor\" and msg.get(\"content\", \"\").strip():\n print(\"Advisor's Recommendation:\")\n print(msg[\"content\"].replace(\"TERMINATE\", \"\").strip())\n break",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "dabq14hu8h6",
"source": "## What We Built\n\nThis notebook demonstrated a practical model comparison pipeline where:\n\n1. **The Scout** searched the Hub using task filters and sorted by downloads\n2. **The Analyst** pulled model cards and specs to build a structured comparison\n3. **The Advisor** synthesized the data into an actionable recommendation\n\nThe key idea is that each agent has a focused role with access to real tools — they're not just chatting, they're calling the `huggingface_hub` API and working with live data.\n\n**Extending this pattern:**\n- Add a **Benchmarker** agent that runs test inferences to compare latency\n- Include a **Cost Estimator** that checks model size vs. your hardware\n- Connect to the [Hugging Face Inference Endpoints API](https://huggingface.co/docs/inference-endpoints) for deployment recommendations\n- Swap the backbone model — any model on the Inference API works as a drop-in replacement\n\nFor more on multi-agent patterns, see the [AG2 documentation](https://docs.ag2.ai).",
"metadata": {}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.14.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
1 change: 1 addition & 0 deletions notebooks/en/index.md
@@ -7,6 +7,7 @@ applications and solving various machine learning tasks using open-source tools

Check out the recently added notebooks:

- [Building a Model Comparison Pipeline with Multi-Agent Collaboration](ag2_multiagent_system)
- [Concurrent Multi-Config SFT Training with RapidFire AI](rapidfire_sft_multiconfig_training)
- [Optimizing Language Models with DSPy GEPA](dspy_gepa)
- [Efficient Online Training with GRPO and vLLM in TRL](grpo_vllm_online_training)