Wisdom of Crowds Simulation

Explore how different aggregation methods perform when combining predictions from multiple forecasters

Simulation Parameters

Number of Forecasters

Number of Questions

Mean Skill Level

Skill Standard Deviation
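The page does not describe how it generates forecasts from these four parameters. The sketch below assumes one plausible model. Each question has a hidden true probability. Each forecaster's skill is drawn from a normal distribution with the chosen mean and standard deviation and clipped to [0, 1]. Lower skill adds more Gaussian noise to the true probability in log-odds space. The function name `simulate`, the 0.05 to 0.95 range of true probabilities, and the noise scale `2 * (1 - skill) + 0.1` are all hypothetical choices made for illustration.

```python
import math
import random


def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n_forecasters: int, n_questions: int, mean_skill: float,
             skill_sd: float, seed: int = 0):
    """Hypothetical generative model driven by the four simulation parameters."""
    rng = random.Random(seed)
    # Skill ~ Normal(mean_skill, skill_sd), clipped to [0, 1].
    skills = [min(1.0, max(0.0, rng.gauss(mean_skill, skill_sd)))
              for _ in range(n_forecasters)]
    # Each question has a hidden true probability; its outcome is drawn from it.
    true_probs = [rng.uniform(0.05, 0.95) for _ in range(n_questions)]
    outcomes = [1 if rng.random() < p else 0 for p in true_probs]
    # forecasts[i][q] is forecaster i's probability for question q.
    # Assumed noise model: log-odds noise shrinks as skill rises.
    forecasts = []
    for s in skills:
        noise_sd = 2.0 * (1.0 - s) + 0.1
        forecasts.append([sigmoid(logit(p) + rng.gauss(0.0, noise_sd))
                          for p in true_probs])
    return skills, true_probs, outcomes, forecasts


skills, true_probs, outcomes, forecasts = simulate(20, 100, 0.6, 0.2)
```

Working in log-odds keeps every noisy forecast strictly between 0 and 1, which adding noise directly to probabilities would not guarantee.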

Lower Brier score = better accuracy
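For yes/no questions, the Brier score is the mean squared difference between the forecast probability and the outcome (1 if the event happened, 0 if not). The page does not say which variant it plots. This sketch assumes the common binary form, which ranges from 0 (perfect) to 1, rather than the original multi-category form, which ranges from 0 to 2.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Always answering 50% scores 0.25, a useful "no skill" baseline.
print(brier_score([0.5, 0.5], [1, 0]))  # 0.25
print(brier_score([0.9, 0.2], [1, 0]))  # confident and right: well below 0.25
```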


Closer to diagonal line = better calibration
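A calibration plot is usually built by grouping forecasts into probability bins and comparing each bin's average forecast (x-axis) with the fraction of those questions that resolved yes (y-axis). Points on the diagonal mean stated probabilities match observed frequencies. The page does not give its bin count. The ten equal-width bins and the function name `calibration_table` below are assumptions.

```python
def calibration_table(forecasts, outcomes, n_bins=10):
    """Return (bin index, mean forecast, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for f, o in zip(forecasts, outcomes):
        # min() keeps a forecast of exactly 1.0 in the top bin.
        idx = min(int(f * n_bins), n_bins - 1)
        bins[idx].append((f, o))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue  # empty bins have no point to plot
        mean_forecast = sum(f for f, _ in items) / len(items)
        observed_rate = sum(o for _, o in items) / len(items)
        rows.append((idx, mean_forecast, observed_rate, len(items)))
    return rows


# Ten 70% forecasts that come true 7 times, five 20% forecasts that come true once.
forecasts = [0.7] * 10 + [0.2] * 5
outcomes = [1] * 7 + [0] * 3 + [1] + [0] * 4
for idx, mean_f, rate, n in calibration_table(forecasts, outcomes):
    print(f"bin {idx}: forecast {mean_f:.2f}, observed {rate:.2f}, n={n}")
```

Both bins land on the diagonal here. A bin whose observed rate sits below its mean forecast signals overconfidence at that probability level.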


Higher skill should correlate with lower Brier score

Observations:

The trend line shows how forecaster skill correlates with prediction accuracy. Because lower Brier scores indicate better performance, a downward-sloping line means that more skilled forecasters are more accurate.
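The page does not state how its trend line is fitted. This sketch assumes an ordinary least-squares line and also reports Pearson's correlation coefficient r. A negative slope and negative r mean higher skill goes with lower (better) Brier scores. The function name `trend_line` and the sample data are illustrative.

```python
import math


def trend_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("need variation in both skill and Brier score")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


skill = [0.2, 0.4, 0.6, 0.8]
brier = [0.30, 0.24, 0.18, 0.12]  # made-up scores that fall as skill rises
slope, intercept, r = trend_line(skill, brier)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```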

Shows how each aggregation method performs on individual questions

Analysis:

This chart shows performance across the first 20 questions, allowing us to see when certain aggregation methods outperform others.
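On a single question, the binary Brier score reduces to the squared error (f - o)^2, so "which method won this question" is just the method with the smallest squared error. The sketch below tallies winners over the first 20 questions. It assumes each method's aggregated forecasts are stored in a dict keyed by a method name; that layout and the function name `per_question_winners` are hypothetical.

```python
from collections import Counter


def per_question_winners(method_forecasts, outcomes, first_n=20):
    """Name the lowest-error method for each of the first `first_n` questions."""
    winners = []
    for q, outcome in enumerate(outcomes[:first_n]):
        # Squared error on one question equals that question's Brier score.
        errors = {name: (preds[q] - outcome) ** 2
                  for name, preds in method_forecasts.items()}
        # Ties go to whichever method appears first in the dict.
        winners.append(min(errors, key=errors.get))
    return winners


methods = {"mean": [0.6, 0.4], "extremized": [0.8, 0.2]}
winners = per_question_winners(methods, [1, 1])
print(winners)
print(Counter(winners))
```

In this toy case, extremizing helps when the crowd leans the right way (question 1) and hurts when it leans the wrong way (question 2), which is why per-question results vary even when one method wins on average.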

Skill-weighted aggregation typically outperforms simple mean or median predictions, similar to Metaculus's approach of weighting forecasts by track record.
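Extremization works by pushing forecasts further from 50%, correcting for the tendency of aggregated forecasts to be underconfident: averaging dilutes confident forecasts, and each forecaster holds only part of the available evidence.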
Combined approaches that both weight by skill and extremize typically perform best, which is plausibly close to how Metaculus implements its prediction algorithm.
Noise reduction is a key benefit of aggregation: individual forecasters may have high variance, but the aggregate is more stable because independent errors partly cancel out.
Calibration shows whether forecasters are over- or underconfident. A well-calibrated forecaster's 70% predictions come true about 70% of the time.
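The aggregation methods above can be sketched side by side. This is not the simulation's or Metaculus's actual code. The following are all illustrative assumptions: skill weights proportional to a forecaster's skill score, extremization implemented as scaling the log-odds by a factor `a` greater than 1 (one common form, equivalent to p^a / (p^a + (1 - p)^a)), and `a = 1.5`.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def skill_weighted(probs, skills):
    """Weighted mean of probabilities, weights proportional to (assumed) skill scores."""
    return sum(p * w for p, w in zip(probs, skills)) / sum(skills)


def extremize(p, a=1.5):
    """Scale log-odds by a > 1 to push p away from 0.5; a = 1 leaves p unchanged."""
    p = min(max(p, 1e-6), 1 - 1e-6)  # guard against log(0)
    return sigmoid(a * logit(p))


def aggregate_all(probs, skills, a=1.5):
    """Apply each aggregation method to one question's forecasts."""
    return {
        "mean": statistics.mean(probs),
        "median": statistics.median(probs),
        "skill_weighted": skill_weighted(probs, skills),
        "extremized_mean": extremize(statistics.mean(probs), a),
        "weighted_extremized": extremize(skill_weighted(probs, skills), a),
    }


probs = [0.6, 0.7, 0.65, 0.9]   # four forecasts for one question
skills = [0.5, 0.5, 0.5, 1.0]   # the last forecaster has the best track record
for name, value in aggregate_all(probs, skills).items():
    print(f"{name:>20}: {value:.3f}")
```

The weighted mean moves toward the most skilled forecaster, and extremizing then pushes the result further from 50%. On noise reduction: if individual errors are independent with variance σ², the mean of n forecasts has error variance σ²/n. Correlated errors shrink more slowly.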