Murmur is a synthetic forecasting swarm. It runs dozens of AI personas with diverse expertise through a structured analytical process, clusters their predictions into natural scenarios, and surfaces the key disagreements that matter.
Most predictions fail for predictable reasons. People anchor on a single narrative. They confuse confidence with accuracy. They ignore base rates. They don't update when evidence changes. And when asked "what's the probability?", they round to 0%, 50%, or 100%.
Philip Tetlock's Good Judgment Project spent decades studying what separates accurate forecasters from everyone else. The answer wasn't domain expertise or intelligence — it was a specific set of cognitive habits: thinking in probabilities, breaking questions into components, balancing inside and outside views, and actively looking for reasons you might be wrong.[1]
The problem is that these habits are hard to maintain. Even trained superforecasters regress when they're tired, rushed, or emotionally invested. Murmur automates the discipline.
Vague questions produce vague forecasts. Before anything runs, Murmur asks you 2–4 clarifying questions based on Tetlock's commandments: What's the timeframe? What would count as resolution? What's your prior estimate? What's the strongest force for and against?
Your answers get synthesized into a precise, falsifiable question with explicit resolution criteria — the kind of question that can actually be scored right or wrong later.
Murmur sends your question to multiple AI personas, each with distinct expertise, analytical frameworks, known cognitive biases, and blind spots. A CEO thinks about market timing. An actuary thinks about tail risk. A red teamer thinks about how assumptions break under pressure. An artist thinks about cultural adoption. A philosopher questions the hidden assumptions everyone else takes for granted.
The system intelligently selects the most relevant personas based on the question's domain — always including an adversarial challenger and a humanistic perspective. Each persona runs multiple times with varied parameters — different temperatures, evidence emphases, and temporal anchors — producing dozens of independent forecasts.
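The run matrix described above can be sketched as a simple cross-product of personas and sampling parameters. This is an illustrative sketch, not Murmur's actual code: the names `ForecastRun`, `build_runs`, and the specific emphasis and anchor values are hypothetical.

```python
import itertools
from dataclasses import dataclass

@dataclass(frozen=True)
class ForecastRun:
    """One independent forecast: a persona plus varied sampling parameters."""
    persona: str
    temperature: float
    evidence_emphasis: str
    temporal_anchor: str

def build_runs(personas,
               temperatures=(0.3, 0.7, 1.0),
               emphases=("base_rates", "recent_evidence"),
               anchors=("status_quo", "trend_extrapolation")):
    """Cross each persona with every parameter combination so that a handful
    of personas yields dozens of independent forecasts."""
    return [ForecastRun(p, t, e, a)
            for p, t, e, a in itertools.product(personas, temperatures, emphases, anchors)]
```

With two personas and the default parameter grids, this produces 2 × 3 × 2 × 2 = 24 distinct runs.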
Every forecast follows a structured reasoning chain: an outside-view base rate estimate, an inside-view adjustment for case-specific evidence, decomposition into sub-questions, a final probability, and the specific evidence that would change the forecaster's mind.
Dozens of probability estimates don't speak for themselves. Murmur clusters them into 2–3 natural scenarios using a combination of DBSCAN (density-based clustering that finds natural groupings) and k-means with silhouette score optimization. The cap at 3 scenarios is deliberate — more than 3 produces blurry, overlapping futures that don't help decision-making.
The result isn't a single number. It's a map of possible futures: "35% of forecasters think gradual augmentation, 28% think rapid displacement, 22% think hybrid equilibrium." Each cluster represents a coherent, distinct story about what could happen.
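The clustering step above can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: the `eps` and `min_samples` values and the DBSCAN-first-then-k-means fallback order are illustrative, not Murmur's actual parameters.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score

def cluster_forecasts(probs, max_k=3):
    """Group 1-D probability estimates into at most max_k scenarios."""
    X = np.asarray(probs, dtype=float).reshape(-1, 1)

    # First, let DBSCAN look for natural density-based groupings.
    labels = DBSCAN(eps=0.05, min_samples=3).fit_predict(X)
    n_found = len(set(labels) - {-1})  # -1 marks noise points
    if 2 <= n_found <= max_k:
        return labels

    # Otherwise fall back to k-means, picking k by silhouette score,
    # capped at max_k to keep the scenarios distinct.
    best_k, best_score = 2, -1.0
    for k in range(2, max_k + 1):
        km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = silhouette_score(X, km_labels)
        if score > best_score:
            best_k, best_score = k, score
    return KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
```

Clustering on the raw probabilities is the simplest version; clustering on richer features of each forecast (sub-question answers, cited evidence) would separate scenarios that happen to share a number but tell different stories.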
Murmur shows two aggregate probabilities, not one, because there is genuine uncertainty about the right way to combine the estimates.
Panel mean is the simple average across all forecasters. This is the right number if you believe the personas share systematic biases from the same base model — which they do. They all read the same training data. When they agree, it might reflect genuine evidence or a shared blind spot. The mean treats their agreement cautiously.
Extremized aggregate uses Tetlock's formula from the Good Judgment Project: geometric mean of odds, then push away from 50% by a factor of d=2.5.[2] The intuition: if independent forecasters mostly agree, the true probability is probably more extreme than the average. This was validated on genuinely independent human superforecasters in the IARPA tournament.
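The two aggregates can be sketched as follows. The panel mean is a plain average; the extremized aggregate takes the geometric mean of the odds and raises it to the power d = 2.5 before converting back to a probability, which pushes the result away from 50%. This is a sketch of the standard formula, assuming probabilities are clipped away from 0 and 1 to keep the odds finite.

```python
import math

def panel_mean(probs):
    """Simple average: cautious when forecasters may share biases."""
    return sum(probs) / len(probs)

def extremized_aggregate(probs, d=2.5, eps=1e-6):
    """Geometric mean of odds, then extremize by exponent d."""
    clipped = [min(1 - eps, max(eps, p)) for p in probs]
    odds = [p / (1 - p) for p in clipped]
    gm_odds = math.exp(sum(math.log(o) for o in odds) / len(odds))
    ext_odds = gm_odds ** d  # push away from even odds (50%)
    return ext_odds / (1 + ext_odds)
```

Note the behavior at the fixed point: if every forecaster says 50%, both aggregates return 50%, because there is no agreement to amplify. Agreement at 70%, by contrast, extremizes to well above 70%.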
After clustering, Murmur identifies the two scenarios with the highest disagreement and picks a "champion" persona from each — the persona whose viewpoint dominates that cluster. Then it runs a structured debate: each champion sees the other's strongest argument and must rebut it.
The debate doesn't revise the numbers. Its purpose is to surface the core analytical tension — the structural disagreement that explains why the forecasters diverge. This is often the most useful output: not "38% probability" but "the real question is whether regulatory friction or market pressure wins."
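Selecting the debate pair can be sketched as finding the two clusters whose mean forecasts disagree most. The data shape here (`mean` and `champion` per cluster) is an assumption for illustration, not Murmur's actual representation.

```python
from itertools import combinations

def pick_debate_pair(clusters):
    """clusters maps a cluster id to {"mean": float, "champion": str}.
    Return the two (cluster_id, champion) pairs with the largest
    disagreement between their mean forecasts."""
    a, b = max(combinations(clusters, 2),
               key=lambda pair: abs(clusters[pair[0]]["mean"] -
                                    clusters[pair[1]]["mean"]))
    return [(a, clusters[a]["champion"]), (b, clusters[b]["champion"])]
```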
Each cluster gets a narrative: a vivid 2–3 sentence description of what this future looks like, the key assumption it depends on, and the condition that would break it. This turns statistical clusters into stories you can reason about.
Every scenario is also expandable — you can drill into the reasoning of individual forecasters to see their base rate estimate, inside view adjustment, sub-question decomposition, and what specific evidence would change their mind. This transparency lets you evaluate why the number is what it is, not just what the number is.
The final step is often the most valuable. Murmur examines all the scenarios and extracts the load-bearing assumptions — the specific, falsifiable claims about the world that must be true for each scenario to play out.
For each assumption, Murmur identifies which scenarios depend on it and the condition that would break it.
Critically, Murmur also identifies shared assumptions — assumptions that appear across multiple scenarios. These are the highest-leverage monitoring targets, because if a shared assumption breaks, it doesn't just shift one scenario — it reshuffles the entire forecast.
The linchpin assumption is the single assumption whose reversal would cause the largest redistribution of probability across all scenarios. This is the thing to watch. If you're going to monitor one signal to know whether the forecast is still valid, it's this one.
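A simple proxy for the linchpin selection above: treat the probability mass riding on an assumption as the total probability of the scenarios that depend on it, and pick the assumption with the most mass at risk. This is a sketch of one plausible heuristic, not Murmur's actual scoring; a fuller version would model how that mass redistributes when the assumption fails.

```python
def find_linchpin(assumptions, scenario_probs):
    """assumptions maps an assumption name to the set of scenario ids
    that depend on it; scenario_probs maps scenario id to probability.
    The linchpin is the assumption whose failure puts the most
    probability mass in question."""
    def mass_at_risk(name):
        return sum(scenario_probs[s] for s in assumptions[name])
    return max(assumptions, key=mass_at_risk)
```

Shared assumptions naturally score highest here, since they accumulate mass from every scenario that depends on them.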
Murmur ships with a diverse roster of personas spanning cybersecurity, technology, business, policy, finance, and humanities. Each has distinct expertise, an analytical framework, known cognitive biases, and blind spots.
The diversity is the point. A CEO and an actuary will forecast the same question through completely different lenses. That's not noise — it's signal. The scenarios that emerge from clustering many different perspectives are richer than any single expert's prediction.
Murmur is not an oracle. It's a structured thinking tool. The output is not "the answer" — it's a map of plausible futures weighted by probability, with the key assumptions and breaking conditions made explicit.
The value isn't the point estimate. It's the decomposition: what are the real sub-questions? Where do smart people disagree, and why? What specific evidence would change the picture?
Use it to think better, not to think less.
[1] Tetlock, P.E. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown Publishers. The foundational work on what makes forecasters accurate.
[2] Baron, J. et al. (2014). Two Reasons to Make Aggregated Probability Forecasts More Extreme. Decision Analysis, 11(2), 133–145. The empirical basis for extremized aggregation with d=2.5. doi:10.1287/deca.2014.0293
[3] Halawi, D. et al. (2024). Approaching Human-Level Forecasting with Language Models. arXiv preprint. Demonstrates structured prompting improves LLM forecasting accuracy by up to 41% over baseline. arXiv:2402.18563
[4] Schoenegger, P. et al. (2024). AI Superforecasting: Can AI Beat Human Forecasters? Multi-agent experiments showing independent analysis followed by selective debate outperforms consensus-seeking approaches. arXiv:2409.08322
[5] Zou, A. et al. (2024). Forecasting with Large Language Models. arXiv preprint. Structured single-pass prompts with decomposition and base rate anchoring outperform multi-round self-revision. arXiv:2402.01426