This example walks through a complete evaluation of the A-MEM memory layer on the LongMemEval dataset using the three-stage pipeline: memory construction, memory retrieval, and question answering with evaluation.
You can replace A-MEM with any other supported memory layer by swapping the config and the `--memory-type` argument.
Download the LongMemEval dataset from HuggingFace:
https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned
Save it to a local path, e.g., `/path/to/longmemeval.json`.
Each memory layer requires its own configuration JSON file. This example uses A-MEM; see `amem_config.json`.
Note: The `user_id` field is a placeholder that will be overwritten during execution. API keys and base URLs are read from the environment variables `OPENAI_API_KEY` and `OPENAI_API_BASE` by default. You can also set them explicitly via `llm_api_key`/`llm_base_url` and `embedding_api_key`/`embedding_base_url` in the config if needed.
The full list of configuration fields can be found in `membase/configs/amem.py`.
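As an illustration, a minimal `amem_config.json` might look like the sketch below. Only the fields mentioned above are shown; the real file may contain additional fields (see `membase/configs/amem.py` for the authoritative list), and the values here are placeholders.

```json
{
  "user_id": "placeholder_overwritten_at_runtime",
  "llm_api_key": "sk-your-api-key",
  "llm_base_url": "https://api.openai.com/v1",
  "embedding_api_key": "sk-your-api-key",
  "embedding_base_url": "https://api.openai.com/v1"
}
```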
The evaluation stage requires an API config to call LLM-based QA and judge models. See `api_config.json`:

```json
{
  "api_keys": ["sk-your-api-key-1", "sk-your-api-key-2"],
  "base_urls": ["https://api.openai.com/v1", "https://api.openai.com/v1"]
}
```

Alternatively, set environment variables instead of using `--api-config-path`:
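Supplying multiple keys lets the evaluation spread requests across them. How the harness actually schedules keys is internal; the sketch below (hypothetical `key_cycle` helper) assumes a simple round-robin pairing of keys and base URLs for illustration.

```python
import itertools
import json


def key_cycle(path: str):
    """Yield (api_key, base_url) pairs, cycling through the config lists.

    Round-robin pairing is an assumption for illustration; the evaluation
    harness's actual key scheduling may differ.
    """
    with open(path) as f:
        cfg = json.load(f)
    return itertools.cycle(zip(cfg["api_keys"], cfg["base_urls"]))
```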
```bash
export OPENAI_API_KEY="sk-your-api-key"
export OPENAI_API_BASE="https://api.openai.com/v1"
```

Edit `run_construction.sh` to set your dataset path, API keys, and base URLs, then run:
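The precedence described earlier (explicit config fields win, environment variables are the fallback) can be sketched as follows. The `resolve_credentials` helper is hypothetical, purely to illustrate the resolution order; the actual logic lives inside the memory-layer config loading.

```python
import os


def resolve_credentials(config: dict) -> tuple[str, str]:
    """Resolve the LLM API key and base URL.

    Explicit config fields take precedence; otherwise fall back to the
    OPENAI_API_KEY / OPENAI_API_BASE environment variables.
    Hypothetical helper for illustration only.
    """
    api_key = config.get("llm_api_key") or os.environ.get("OPENAI_API_KEY", "")
    base_url = config.get("llm_base_url") or os.environ.get("OPENAI_API_BASE", "")
    return api_key, base_url
```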
```bash
bash examples/evaluate_amem_on_longmemeval/run_construction.sh
```

This example processes 4 trajectories split across 2 parallel processes (ranges 0-2 and 2-4), each with `num_workers=2`. Monitor progress in `amem_logs/`.
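The index ranges above follow a simple contiguous split of the trajectory indices across processes. A sketch of that split (the function name is hypothetical; the script may compute ranges differently):

```python
def split_ranges(n_items: int, n_procs: int) -> list[tuple[int, int]]:
    """Split [0, n_items) into n_procs contiguous (start, end) ranges.

    Earlier ranges absorb the remainder when n_items is not divisible
    by n_procs. Illustrative helper, not the script's actual code.
    """
    base, rem = divmod(n_items, n_procs)
    ranges, start = [], 0
    for i in range(n_procs):
        end = start + base + (1 if i < rem else 0)
        ranges.append((start, end))
        start = end
    return ranges
```

For the example above, `split_ranges(4, 2)` yields the ranges `(0, 2)` and `(2, 4)`.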
After memory construction completes, edit `run_search.sh` and run:

```bash
bash examples/evaluate_amem_on_longmemeval/run_search.sh
```

The output will be saved to `{save_dir}/{top_k}_{start_idx}_{end_idx}.json` (e.g., `amem_output/10_0_4.json`).
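The naming convention above can be reproduced directly, which is handy when scripting over many retrieval runs (the helper name is hypothetical):

```python
import os


def search_output_path(save_dir: str, top_k: int, start_idx: int, end_idx: int) -> str:
    """Build the retrieval output path {save_dir}/{top_k}_{start_idx}_{end_idx}.json."""
    return os.path.join(save_dir, f"{top_k}_{start_idx}_{end_idx}.json")
```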
Edit `run_evaluation.sh` and run:

```bash
bash examples/evaluate_amem_on_longmemeval/run_evaluation.sh
```

The evaluation results will be saved as `{search_results_path}_evaluation.json`.
- API Rate Limits: Set `num_workers` conservatively (e.g., 4-8) to avoid upstream API overload.
- Resume Interrupted Runs: If the process is interrupted, simply re-run the same command. Completed trajectories will be skipped automatically.
- Token Cost Tracking: Check the generated `token_cost_*.json` files for detailed token consumption statistics.
- Log Files: Monitor `{log_dir}/process_*.log` files for real-time progress and debugging.
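To aggregate token usage across parallel processes, the per-process `token_cost_*.json` files can be summed. The sketch below assumes each file maps field names to numeric counts; the real schema is not documented here, so treat the field names as assumptions.

```python
import glob
import json
import os


def total_token_cost(log_dir: str) -> dict:
    """Sum numeric fields across all token_cost_*.json files in log_dir.

    Assumes each file is a flat JSON object of numeric counters
    (e.g. prompt/completion token counts); the actual schema may differ.
    """
    totals: dict[str, float] = {}
    for path in glob.glob(os.path.join(log_dir, "token_cost_*.json")):
        with open(path) as f:
            for key, value in json.load(f).items():
                if isinstance(value, (int, float)):
                    totals[key] = totals.get(key, 0) + value
    return totals
```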