Wisdom of Crowds Simulation

Explore how different aggregation methods perform when combining predictions from multiple forecasters

Simulation Parameters

Aggregation Method Comparison

Lower Brier score = better accuracy
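For reference, the Brier score for binary questions can be sketched as below. This assumes the common binary form (mean squared difference between the forecast probability and the 0/1 outcome); the page does not state which variant the simulation uses, and the function name `brier_score` is illustrative.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Assumes the binary form of the Brier score: 0.0 is a perfect score and
    an uninformative forecast of 0.5 on every question scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("at least one forecast is required")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical example: two questions, the first resolved yes, the second no.
score = brier_score([0.9, 0.2], [1, 0])
```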

Methods compared:

  • Mean: Simple average of all forecasts
  • Median: Middle forecast value
  • Skill-Weighted: Weights forecasts by past performance
  • Extremized: Pushes mean forecast away from 50%
  • Combined: Skill-weighted + extremized
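The five methods listed above might be sketched as follows. Several details are assumptions, since the page does not specify them: the weights are taken to be any non-negative skill scores (for example, derived from past Brier scores), and extremizing uses the transform p^a / (p^a + (1 - p)^a) with a hypothetical exponent `a = 2.0`. The simulation may use different formulas.

```python
from statistics import mean, median


def skill_weighted(forecasts, weights):
    """Weighted average of forecasts; weights are assumed non-negative skill scores."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, forecasts)) / total


def extremize(p, a=2.0):
    """Push a probability away from 0.5 (assumed transform; a > 1 extremizes)."""
    return p ** a / (p ** a + (1 - p) ** a)


def combined(forecasts, weights, a=2.0):
    """Skill-weighted average followed by extremizing."""
    return extremize(skill_weighted(forecasts, weights), a)


# Hypothetical forecasts from three forecasters on one question.
forecasts = [0.6, 0.7, 0.8]
weights = [1.0, 1.0, 2.0]  # assumed skill weights, higher = more trusted

aggregates = {
    "mean": mean(forecasts),
    "median": median(forecasts),
    "skill_weighted": skill_weighted(forecasts, weights),
    "extremized": extremize(mean(forecasts)),
    "combined": combined(forecasts, weights),
}
```

With `a = 1.0` the transform leaves the probability unchanged, so the exponent controls how strongly the aggregate is pushed toward 0% or 100%.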

Calibration Plot

Closer to diagonal line = better calibration

Interpretation:

  • Each point represents a bin of predictions
  • X-axis: Average predicted probability in that bin
  • Y-axis: Fraction of events that actually occurred
  • Points on diagonal line = perfect calibration
  • Points above line = events occurred more often than predicted (underconfidence)
  • Points below line = events occurred less often than predicted (overconfidence)
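The points described above could be computed roughly as follows. This is a sketch assuming equal-width probability bins; the bin count and the function name `calibration_points` are illustrative and not taken from the simulation.

```python
def calibration_points(forecasts, outcomes, n_bins=10):
    """Return (average predicted probability, observed frequency) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 belongs in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, o))

    points = []
    for members in bins:
        if not members:
            continue  # empty bins contribute no point to the plot
        avg_predicted = sum(p for p, _ in members) / len(members)
        observed = sum(o for _, o in members) / len(members)
        points.append((avg_predicted, observed))
    return points


# Hypothetical data: two forecasts near 10% and two near 90%.
points = calibration_points([0.1, 0.1, 0.9, 0.9], [0, 1, 1, 1])
```

A point is on the diagonal when its two coordinates are equal; the vertical gap between a point and the diagonal is that bin's calibration error.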

Forecaster Skill vs Performance

Higher skill should correlate with lower Brier score

Observations:

The trend line shows how forecaster skill correlates with prediction accuracy. Lower Brier scores indicate better performance.
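One way to obtain such a trend line is an ordinary least-squares fit of Brier score against skill. The sketch below assumes that approach; the page does not say how its trend line is fitted, and the skill and score values are made up for illustration.

```python
def fit_trend_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical forecasters: skill on a 0-1 scale and each one's Brier score.
skills = [0.0, 0.5, 1.0]
brier_scores = [0.3, 0.2, 0.1]
slope, intercept = fit_trend_line(skills, brier_scores)
```

A negative slope is the expected pattern here: as skill rises, the Brier score falls.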

Aggregation Performance by Question

Shows how different methods perform across various questions

Analysis:

This chart shows performance across the first 20 questions, allowing us to see when certain aggregation methods outperform others.
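A per-question comparison of this kind can be sketched by scoring each method's aggregate on each question separately. Only the mean and median are shown for brevity, the squared error is assumed as the per-question score, and the forecasts are hypothetical.

```python
from statistics import mean, median


def per_question_errors(question_forecasts, outcomes):
    """Squared error of the mean and median aggregate for each question."""
    rows = []
    for forecasts, outcome in zip(question_forecasts, outcomes):
        rows.append({
            "mean": (mean(forecasts) - outcome) ** 2,
            "median": (median(forecasts) - outcome) ** 2,
        })
    return rows


# Hypothetical data: two questions with three forecasters each.
rows = per_question_errors(
    [[0.6, 0.7, 0.95], [0.1, 0.2, 0.9]],
    [1, 0],
)
# The method with the smallest error wins each question.
winners = [min(row, key=row.get) for row in rows]
```

In this made-up example the mean wins the first question, while the median wins the second because it is less affected by the single outlying forecast of 0.9.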

Key Findings
