If you’ve just read “What is a high-quality replication”, you know that our replications will be high-power, usually much higher than the original study, but not quite the “100 replications” ideal set out by DARPA.

If you’re a forecaster, should this difference affect you, and if so, how?  One suggestion:

  • Big-picture person, in a hurry, or new? Forecast the ideal. Maybe hedge a little to allow for noise.
  • Detail person, experienced, have time? Maximize your accuracy. Especially in the market.

With high enough power, the two approaches give the same forecast, but in practice there will be some difference. DARPA has asked us for Confidence Scores, and we can recover those from accurate forecasts, though not necessarily the reverse. This matters most in the market, because the market pays out on the actual replication results, so market incentives are closely tied to real accuracy.

But what should forecasters do? It depends.

  • Either approach should do pretty well, because power is pretty high.
  • In the survey, there is a separate question for plausibility. Put your ideal Confidence Score there, and then adjust it for the Replication question.
  • In the market, use whichever approach suits you. A division of labor is efficient.

The market is probably most accurate when forecasters specialize, because one kind of forecaster alone won’t have enough points. Two strategies are ballparking and calibrating:

  • Ballparking: make early cheap edits to the ballpark of the ideal. 
  • Calibrating: adjust the ballpark when appropriate.

Calibrators will want to get good at estimating the difference between the ideal and the actual replication outcome, and at maximizing predictive accuracy. Some of their edits will be costly, so they need to know the logarithmic market scoring rule (LMSR) well, and get good at recovering points and other technical trading.
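One way to make the ideal-versus-actual gap concrete is a simple mixture model. This is only a sketch under stated assumptions, not the project's own method: it treats the ideal Confidence Score as the chance the claim is real, and assumes a real effect replicates with probability equal to the replication's power while a spurious one "replicates" only at a small false-positive rate. The function names and the example values (70% confidence, 90% power, 2.5% false-positive rate) are hypothetical.

```python
def ideal_to_actual(confidence, power, false_positive_rate=0.025):
    """Chance that one finite-power replication succeeds.

    Assumed model: a real effect replicates with probability `power`;
    a spurious one passes only by chance (`false_positive_rate`).
    """
    return confidence * power + (1.0 - confidence) * false_positive_rate


def actual_to_ideal(forecast, power, false_positive_rate=0.025):
    """Invert the same assumed model to recover an ideal Confidence Score."""
    return (forecast - false_positive_rate) / (power - false_positive_rate)


# Hypothetical example: 70% ideal confidence, 90% replication power.
forecast = ideal_to_actual(0.70, 0.90)
print(f"actual forecast: {forecast:.4f}")
print(f"recovered ideal: {actual_to_ideal(forecast, 0.90):.2f}")
```

Under these assumptions, a 70% ideal score corresponds to an actual forecast of about 64%, so the adjustment is modest when power is high and grows as power falls.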

Ballparkers can be bigger-picture, but may need to get fast. They also have to watch their budget: smaller edits on more questions are likely more efficient. They need to know that the LMSR cost is bowl-shaped: edits get more expensive as you push the ball (the forecast) uphill toward the extremes, and are cheapest when you move it back toward the middle.
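The bowl shape can be illustrated with the standard logarithmic scoring formula. This is a minimal sketch, assuming an edit from `p_old` to `p_new` pays `b * ln(p_new / p_old)` if the claim replicates and `b * ln((1 - p_new) / (1 - p_old))` if it does not. The function name `edit_outcomes` and the scale `b = 100` are hypothetical, and the market's actual log base and scaling may differ.

```python
import math


def edit_outcomes(p_old, p_new, b=100.0):
    """Points gained (+) or lost (-) by an edit under an assumed LMSR.

    Returns (points if the claim replicates, points if it does not).
    """
    if_replicates = b * math.log(p_new / p_old)
    if_fails = b * math.log((1.0 - p_new) / (1.0 - p_old))
    return if_replicates, if_fails


# The same 10-point move, in the middle, near the extreme, and in reverse.
for p_old, p_new in [(0.50, 0.60), (0.85, 0.95), (0.95, 0.85)]:
    if_rep, if_fail = edit_outcomes(p_old, p_new)
    print(f"{p_old:.0%} -> {p_new:.0%}: "
          f"{if_rep:+.1f} if it replicates, {if_fail:+.1f} if it does not")
```

Under these assumptions, pushing from 85% to 95% puts roughly five times as many points at risk as pushing from 50% to 60%, while the reverse edit from 95% back to 85% risks comparatively little.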

Both can prosper. The market pays for information: an early ballpark edit that moves the replication chance from 68% to 85% receives about twice the reward of a later edit that moves it from 85% to 95%, assuming the claim replicates.
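The two-to-one comparison can be checked with a few lines of arithmetic. This sketch assumes the usual logarithmic reward, `b * ln(p_new / p_old)`, for an edit on a claim that then replicates. The function name `reward_if_replicates` and the scale `b = 100` are hypothetical, but the ratio between the two edits does not depend on `b` or on the base of the logarithm.

```python
import math


def reward_if_replicates(p_old, p_new, b=100.0):
    """Points earned by moving the forecast from p_old to p_new,
    assuming an LMSR-style log reward and that the claim replicates."""
    return b * math.log(p_new / p_old)


early = reward_if_replicates(0.68, 0.85)  # ballpark edit
late = reward_if_replicates(0.85, 0.95)   # later calibrating edit

print(f"early edit: {early:.1f} points")
print(f"late edit:  {late:.1f} points")
print(f"ratio:      {early / late:.2f}")
```

The early edit multiplies the price by 1.25 and the later one by about 1.12, which is why the first earns roughly double even though the two moves look similar in percentage points.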

Participants in previous replication markets (70-80% accurate!) report doing well with two basic strategies:

  1. Estimate the plausibility of the claim itself (a ballpark Confidence Score).
  2. Attend to the statistics of the original study, chiefly p-value, power, and effect size. (Our starting prices now take some of this into account, so you have to work a little harder than before, which means more calibrating.)

Good luck! Be sure to follow the conversation and share your own thoughts.

Follow Us

Although final results will not be available until mid-2020, you may follow the project to receive updates and relevant posts.

Replication Markets will be launched in August 2019 pending IRB approval; you can sign up here to be notified.