Evidence from the first four rounds
tl;dr: In our monthly surveys, high participation does not necessarily lead to high prizes – some highly active forecasters earned $0 – suggesting our surrogate score is robust to participation bias.
The monthly survey prizes are risky. Unlike markets that pay for ground truth (eventually, and for only 5-10% of claims), the surveys pay out monthly, for all claims, but based only on a noisy estimate. Therefore we allocate ⅔ of our prize money to markets, and ⅓ to surveys – but need to check if we’re being taken. We expect some noise of course (about 30% based on past data), but might there be a systematic way to exploit the proxy so that participants prefer to provide misleading forecasts? One such possibility is participation bias, or spamming. Because the proxy uses the crowd forecasts to estimate truth, does it simply reward participation?
We examined the performance of participants who completed surveys for at least 100 of the 1,100 claims available in the first four rounds. Rounds occur about monthly, with surveys open for one week. In each round, there were either 20 or 30 batches of ~10 surveys each. Each survey asks 6 questions about one claim. Participants can complete as many batches as they like. Rewards are given to the Top 4 participants in each batch: $80, $40, $20, and $20. A participant could in theory earn up to $2,400 in a round by placing first in all 30 batches. Rewards are based on a surrogate score that estimates each participant’s true rank from the total set of survey answers. Within this group of active forecasters, we examine two relationships: prizes versus number of forecasts, and average rank versus number of forecasts.
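The prize arithmetic above can be sketched in a few lines of Python. This is only an illustration of the stated schedule ($80, $40, $20, $20 for ranks 1 to 4 in each batch); the function names and the example participants are hypothetical, and it does not reproduce how the surrogate score itself assigns ranks.

```python
# Prize schedule per batch, as stated in the text: ranks 1-4 earn $80, $40, $20, $20.
BATCH_PRIZES = {1: 80, 2: 40, 3: 20, 4: 20}


def batch_prize(rank: int) -> int:
    """Return the prize in dollars for a given rank within one batch (0 outside the Top 4)."""
    return BATCH_PRIZES.get(rank, 0)


def round_payout(ranks: list) -> int:
    """Total prize for one round, given the participant's rank in each completed batch."""
    return sum(batch_prize(r) for r in ranks)


# Placing first in all 30 batches of a round gives the theoretical maximum.
print(round_payout([1] * 30))  # 2400

# Hypothetical participant who completes all 30 batches but never reaches the Top 4.
print(round_payout([5] * 30))  # 0
```

The second call shows why volume alone cannot guarantee a payout: the cap grows with the number of batches completed, but the floor stays at $0.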
As shown in Figure 1, there is a correlation between number of tasks and reward, which is natural because maximum winnings are capped by the number of batches completed. But at each level, the distribution is broad. Even among those finishing all surveys, payout ranges from more than $1,000 all the way to $0. More clearly, as shown in Figure 2, the average rank a user achieved over the batches they completed is NOT correlated with the total number of tasks they finished in a round. This suggests that our surrogate scoring rule is not swamped merely by the volume of forecasts; rewards appear to depend on the quality of the forecasts.
Figure 1. The number of claims forecasted and the money prize won by users who finished at least 100 claim surveys in the first four rounds.
Figure 2. The number of claims forecasted and the average rank over batches of users who finished at least 100 claim surveys in the first four rounds.
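The kind of check behind Figure 2 can be sketched as a correlation between each user's forecast volume and their average rank. The sketch below is a minimal illustration, not the analysis actually run: it assumes a Pearson coefficient (the text does not say which measure was used), and the three user records are made-up toy values, not survey data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_rank(batch_ranks):
    """Mean rank over the batches a user completed."""
    return sum(batch_ranks) / len(batch_ranks)


# HYPOTHETICAL toy records, not real data: one list of per-batch ranks per user.
toy_batch_ranks = [
    [5, 7] * 5,   # 10 batches completed
    [3, 5] * 10,  # 20 batches completed
    [5, 7] * 15,  # 30 batches completed
]

# Assumption for illustration: exactly 10 claims per batch (the text says ~10).
volumes = [10 * len(ranks) for ranks in toy_batch_ranks]
avg_ranks = [average_rank(ranks) for ranks in toy_batch_ranks]

# A coefficient near 0 would mean volume tells us little about average rank.
print(pearson(volumes, avg_ranks))
```

A rank-based measure such as Spearman's coefficient would be a reasonable alternative if the relationship is suspected to be monotone but not linear.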
Furthermore, Figure 3 shows the reward dynamics of the 5 users who finished the most tasks. Even users who completed a high volume of claims, like Top users 1 and 5, could still get no reward if they made random predictions. Conversely, participants who provide (apparently) quality forecasts on a large number of claims can earn around $1,000 in a round, as Top users 2 and 3 did, though that would be expected to decrease as other participants increase their effort to match.