Skip to content

Chanakya1305/ai-log-analyzer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

1 Commit
ย 
ย 
ย 
ย 

Repository files navigation

๐Ÿ” AI Log Analyzer - Intelligent Anomaly Detection

Real-time log monitoring system powered by Claude AI. Combines statistical pattern analysis with AI intelligence to detect anomalies, identify root causes, and recommend automated remediation actions. Built for production systems requiring proactive incident detection.

๐ŸŽฏ Overview

Production-grade log analysis platform that processes application logs to detect anomalies before they impact users. Uses baseline comparison for statistical anomaly detection, then leverages Claude AI for intelligent root cause analysis and action recommendations.

Key Capabilities:

  • Real-time log ingestion and analysis
  • Statistical baseline tracking with exponential moving averages
  • AI-powered root cause analysis and impact assessment
  • Automated alert generation with rate limiting
  • Actionable remediation recommendations
  • Multi-service anomaly correlation

๐Ÿ—๏ธ Architecture

Application Logs โ†’ FastAPI Endpoint โ†’ Background Processing
                                              โ†“
                                    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                    โ”‚ Pattern Analyzer โ”‚
                                    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                             โ†“
                          โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                          โ”‚                                     โ”‚
                    Current Stats                        Baseline Stats
                 (5-min window)                         (Redis cache)
                          โ”‚                                     โ”‚
                          โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                             โ†“
                                   Statistical Comparison
                                   (Error rate, volume, etc.)
                                             โ†“
                                      Anomaly Detected?
                                             โ†“
                                            Yes
                                             โ†“
                              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                              โ”‚  Claude AI Analysis      โ”‚
                              โ”‚  - Root cause            โ”‚
                              โ”‚  - Impact assessment     โ”‚
                              โ”‚  - Recommended actions   โ”‚
                              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                        โ†“
                              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                              โ”‚  Alert System    โ”‚
                              โ”‚  - Rate limiting โ”‚
                              โ”‚  - PostgreSQL    โ”‚
                              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Data Flow:

  1. Logs ingested via POST endpoint
  2. Background task analyzes 5-minute window
  3. Compare statistics to Redis-cached baseline
  4. If anomaly detected โ†’ Claude AI analyzes root cause
  5. Create alert (if not in cooldown period)
  6. Update baseline with exponential moving average

๐Ÿš€ Key Features

1. Statistical Anomaly Detection

Metrics Tracked:

  • Error rate (ERROR + CRITICAL logs)
  • Log volume (total logs per window)
  • Service count (unique services logging)
  • Level distribution (INFO, WARN, ERROR, CRITICAL)

Detection Algorithms:

  • Error rate spike: 5x baseline AND >5% absolute
  • Volume spike: 3x baseline
  • Service changes: ยฑ3 services from baseline

Baseline Management:

  • Exponential moving average (ฮฑ = 0.2)
  • 24-hour Redis cache per window size
  • Automatic baseline updates

2. AI-Powered Root Cause Analysis

Claude AI Integration:

  • Analyzes anomaly context and log samples
  • Identifies likely root causes
  • Assesses user/business impact
  • Generates remediation recommendations
  • Provides confidence scores (0.0-1.0)

Prompt Engineering:

Analyze this system anomaly:

ANOMALIES DETECTED:
- error_spike: high severity (0.15 vs baseline 0.01)

CURRENT METRICS:
- Error rate: 15%
- Services affected: 3

ERROR SAMPLES:
- ERROR: Database connection timeout
- CRITICAL: Payment processing failed

Provide: root_cause, impact, recommended_actions, confidence

3. Intelligent Alerting

Alert Structure:

  • Anomaly ID (timestamp-based)
  • Severity (critical, high, medium, low)
  • Category (error_spike, volume_spike, service_change)
  • Affected services
  • AI analysis and recommendations
  • Confidence score

Rate Limiting:

  • Max 1 alert per category per 15 minutes
  • Prevents alert fatigue
  • Redis-based cooldown tracking

4. Production Features

Async Processing:

  • Non-blocking log ingestion
  • Background analysis tasks
  • Parallel AI calls

Data Persistence:

  • PostgreSQL for alert history
  • Redis for baselines and cooldowns
  • Indexed queries for fast retrieval

Observability:

  • Structured JSON logging
  • Health check endpoints
  • Processing time tracking

๐Ÿ’ป Technical Implementation

Baseline Calculation

# Exponential moving average for smooth baseline
alpha = 0.2
updated_baseline = {
    "avg_error_rate": 
        alpha * current_error_rate + 
        (1 - alpha) * previous_baseline,
    
    "avg_total_logs": 
        alpha * current_total_logs + 
        (1 - alpha) * previous_baseline
}

Why exponential moving average?

  • Recent data weighted more heavily
  • Adapts to gradual changes
  • Smooths out temporary spikes

Anomaly Detection Logic

# Error rate spike detection
if (current_error_rate > baseline * 5 and 
    current_error_rate > 0.05):
    
    anomaly = {
        "type": "error_spike",
        "severity": "high" if current_error_rate > 0.2 else "medium",
        "multiplier": current_error_rate / baseline
    }

Multi-threshold approach:

  • Relative threshold (5x baseline)
  • Absolute threshold (>5%)
  • Prevents false positives from low baselines

Claude AI Analysis

async def analyze_anomaly(anomaly_data):
    """
    Uses Claude to analyze detected anomaly
    Returns structured insights
    """
    prompt = build_analysis_prompt(anomaly_data)
    
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1500,
        temperature=0.2,  # Low for factual analysis
        messages=[{"role": "user", "content": prompt}]
    )
    
    return parse_json_response(response.content[0].text)

Response Format:

{
  "root_cause": "Database connection pool exhaustion",
  "impact": "Payment processing degraded, ~15% of transactions failing",
  "recommended_actions": [
    "Scale database connection pool from 20 to 50",
    "Investigate long-running queries blocking connections",
    "Enable query timeout enforcement"
  ],
  "confidence": 0.87
}

๐Ÿ“ฆ Installation

git clone https://github.com/yourusername/ai-log-analyzer.git
cd ai-log-analyzer

python -m venv venv
source venv/bin/activate

pip install -r requirements.txt

# Set environment variables
export ANTHROPIC_API_KEY=your_key_here
export DATABASE_URL=postgresql://localhost/log_analyzer
export REDIS_URL=redis://localhost:6379

# Run server
python src/main.py

Dependencies:

fastapi>=0.104.0
uvicorn>=0.24.0
anthropic>=0.8.0
asyncpg>=0.29.0
redis>=5.0.0
pydantic>=2.0.0

๐Ÿ”Œ API Usage

Ingest Logs

curl -X POST http://localhost:8000/logs/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "logs": [
      {
        "timestamp": "2024-01-15T10:30:00Z",
        "level": "ERROR",
        "service": "payment-api",
        "message": "Database connection timeout",
        "context": {"duration_ms": 5000}
      }
    ]
  }'

Get Recent Alerts

curl http://localhost:8000/alerts/recent?limit=10

Response:

{
  "alerts": [
    {
      "anomaly_id": "anom_20240115_103045",
      "detected_at": "2024-01-15T10:30:45Z",
      "severity": "high",
      "category": "error_spike",
      "description": "Database connection pool exhaustion...",
      "affected_services": ["payment-api", "user-service"],
      "recommended_actions": [
        "Scale database connection pool",
        "Investigate long-running queries"
      ],
      "confidence": 0.87
    }
  ]
}

๐Ÿ“Š Performance

Metric Value
Log Ingestion <50ms (async)
Analysis Window 5 minutes
Baseline Update <10ms (Redis)
AI Analysis 1-2s (Claude API)
Alert Creation <100ms (PostgreSQL)

Scalability:

  • 10K+ logs/minute throughput
  • Sub-second anomaly detection
  • Redis-cached baselines (no DB reads)

๐ŸŽ“ Technical Highlights

AI/ML:

  • Claude Sonnet 4 for root cause analysis
  • Structured prompt engineering
  • Confidence scoring
  • Impact assessment

Algorithms:

  • Exponential moving average for baselines
  • Multi-threshold anomaly detection
  • Statistical pattern recognition

Production Patterns:

  • Async Python (FastAPI)
  • Background task queues
  • Rate-limited alerting
  • Redis caching
  • PostgreSQL persistence

SRE Concepts:

  • Anomaly detection
  • Incident response automation
  • Alert fatigue prevention
  • Baseline drift handling

๐Ÿ”ฎ Use Cases

Production Monitoring

  • Error rate spike detection
  • Service degradation alerts
  • Anomaly root cause analysis

DevOps Automation

  • Automated incident triage
  • Remediation recommendation
  • Alert correlation

SRE Operations

  • Proactive issue detection
  • Service health monitoring
  • Pattern-based alerting

๐Ÿ“„ License

MIT License


About

AI-powered log monitoring with Claude for anomaly detection, root cause analysis, and automated alerting

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages