What makes a good judgemental forecasting question

Specificity and clarity

Imagine a group of people trying to answer the seemingly innocuous question “Will we experience war in our lifetime?” Initially, they may feel utter confusion. The question is very broad and there are many ways to interpret it. When several people try to discuss such a question, each of them will have a slightly different interpretation in mind.

First of all, what do we mean by “we”? The group of people asking this question? All of us? At least one of us? How directly or personally would a person need to be affected to count (e.g. enlisting, or part of their city being bombed)? Or do we mean our country in general? What if some of us move to a different country (which could itself be more probable than a war breaking out in our current country)? Similarly, what about “our lifetime”? Do we now have to factor in estimates of life expectancy? (Again, even several of us dying prematurely may be more likely than a war breaking out, so it cannot simply be ignored.)

And perhaps even more importantly, what falls under our definition of “war”? Do we count cyber warfare, for example? Or do we mean a situation where soldiers from another country march into our country, or people from our country actively enter another country with the goal of conquest? Each of these variations will probably lead to a very different answer.

Even when we consider a seemingly much more exact question such as “By what percentage will the GDP of the US rise over the next 10 years?”, problems arise.

The US is the largest economy in the world, and there are many statistics that deal with US GDP, which seems to make this question easy to evaluate. On closer inspection, however, the meaning of GDP is not as clear as it seems. It can be interpreted to mean either nominal GDP or real GDP (corrected for inflation). We could also be asking about the GDP of the entire country (which is also influenced by population changes) or about GDP per capita. When a question can be interpreted in several different ways, the forecasters are not on the same page when responding to it. We would essentially be aggregating answers to several different questions, and the result would have very little predictive value; it could be completely inaccurate and misleading. Nor would it be possible to compare individual forecasts with each other or to determine the best forecaster. And when it is doubtful whether the question can be evaluated at all, rational participants become less willing to research the topic.
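To see how far these interpretations can drift apart, here is a minimal arithmetic sketch in Python. All figures are hypothetical and chosen only for illustration; they are not real US data or forecasts.

```python
# Hypothetical ten-year figures for illustration only; not real US data.
nominal_growth = 0.50      # nominal GDP rises by 50%
price_level_growth = 0.25  # the price level rises by 25%
population_growth = 0.06   # the population rises by 6%

# Real growth removes the effect of inflation.
real_growth = (1 + nominal_growth) / (1 + price_level_growth) - 1

# Real per-capita growth additionally removes the effect of population change.
real_per_capita_growth = (1 + real_growth) / (1 + population_growth) - 1

for label, value in [
    ("nominal GDP", nominal_growth),
    ("real GDP", real_growth),
    ("real GDP per capita", real_per_capita_growth),
]:
    print(f"{label}: {value:.1%}")
```

Under these assumed numbers, the same economy yields three quite different "correct" answers, so forecasters who silently pick different readings of GDP are answering different questions.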

In short, all the terms in the question must be clearly defined, even seemingly obvious ones.

Resolvability

A clearly worded question is not enough; it must also be possible to determine the answer when the time comes. For example, we can imagine asking the question: “What will the number of indigenous people in the Amazon who live at least 50 km from a paved road be in 2035?” It is of course an interesting question, and if we define terms like “Amazon” and “indigenous people” clearly, we will all know what is meant by the question.

The problem with such a question, however, is that so far no international institution, local institution or non-governmental organisation anywhere counts the indigenous people in the Amazon who live 50 km or more from a paved road.

In order to evaluate the question, we would have to go to the Amazon ourselves, and even then, we would not be able to count all of them during one year because the Amazon is too large and impenetrable.

So this question is clearly asked, but we lack a source for its evaluation, which would rob us of one of the great features of forecasting – feedback.

Determining the source against which the question will be evaluated is therefore essential for each question. The best sources are usually official reports or regularly updated statistics published on the websites of respected national and international institutions.

If the topic of the question does not allow the use of established institutions with high credibility, the question can be evaluated on the basis of information from reputable news sites.

If no source for evaluation is identified, the resulting uncertainty about how the question will be evaluated may significantly reduce the willingness of participants to make predictions on it.

Each question should include additional information that clearly says how the question will be evaluated in light of all possible future scenarios. If an outcome occurs that was not considered in the question, the question must be cancelled. Such cases should be minimised as, in addition to the loss to the asker of not receiving the necessary prediction, such oversights lead to a loss of motivation for participants. However, such events can sometimes occur regardless of how much we try to avoid them.

Informativeness

All this specificity and clarity also has its dark side. Often when asking a question, we have something on our minds that we would actually want to know, but it is rather vague and cannot be asked directly as a forecasting question. For example: “How much more infectious is the Omicron coronavirus variant?” (But it may be something even more abstract, e.g. “Which of these research directions shows more promise or should be prioritised?”)

Then we need to operationalize – formulate a much more specific question (or several questions) intended to be somehow relevant to answering the original question.

But we have to be very careful to ensure that the operationalization is actually informative about our original question and that its answer is not in fact mostly determined by the details of the operationalization.

For example, if we tried to operationalize the above question as “How many confirmed cases of the Omicron variant will there be in the US in the next month?”, the answer could be mostly determined by how much discriminative testing and sequencing is done, and even the actual after-the-fact answer would tell us very little about the true spread of Omicron.
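A toy model makes the problem concrete. The sketch below assumes, purely for illustration, that confirmed cases equal true infections multiplied by the share of infections that get tested and sequenced. The function name, the model and all numbers are hypothetical.

```python
def confirmed_cases(true_infections: int, detection_rate: float) -> int:
    """Toy model: only the detected share of infections is ever confirmed."""
    return round(true_infections * detection_rate)


# Hypothetical numbers: the true spread is identical in both scenarios.
true_infections = 1_000_000

low_testing = confirmed_cases(true_infections, detection_rate=0.05)
high_testing = confirmed_cases(true_infections, detection_rate=0.40)

print(f"confirmed cases with little sequencing: {low_testing}")
print(f"confirmed cases with a lot of sequencing: {high_testing}")
print(f"ratio: {high_testing / low_testing:.0f}x")
```

In this toy setting the resolved value differs eightfold although the underlying spread is the same, which is exactly the case where the operationalization, rather than the original question, drives the answer.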

Similarly, if you want to know something about the progress of AI and you ask about the success of AI at e.g. playing a specific game, this may be influenced more by the inherent ease or difficulty of that game than the level of AI capability.

In general, it quite often turns out that the question you asked is actually about something other than what you wanted it to be about.

And as good forecasters are trained to predict on the question as asked (not as intended), such predictions will probably also not be of much value.

To make your forecasting question as informative as possible, you can use a strategic question decomposition, or simply “a decomposition.” A question decomposition is the result of breaking down a big strategic question into smaller parts in order to identify forecasting questions that could help us better understand the original question and develop a sense of what is likely to occur. You can read more about it here.

Here is a checklist that summarizes criteria each forecasting question should meet:

It is clearly stated so that it can only be interpreted in one particular way.
Each of the terms used is clearly defined in the additional information for the question. For example: conflict, assault, breach of convention, etc.
The question allows for all possible scenarios including unlikely variants such as edge cases, ties, postponed/cancelled events, etc. and specifies how it will be resolved in these edge cases.
Dates and times are precisely specified. For example, instead of asking whether something will happen within two years, we specify a period ending at midnight on January 1, 2024. If the time zone is not clear, it should be added in the clarifying information.
The question and its context are made sufficiently interesting to motivate forecasters to make predictions, and it is explained why the question is important or which institution may value the result of the prediction.
It is accompanied by information about the source from which it will be evaluated. At the same time, the criteria for its evaluation remain unambiguous even if the data sources evolve.
The wording of the question is tasteful; for example, the death of a particular person or group should not be predicted. In the case of public interest, the question can be rephrased, for example, "When will X no longer be a member of the court?" and so on.
The question is not defamatory and respects privacy (for example, it does not interfere with the personal lives of persons who are not publicly known).
Before asking the question, an expert who has long been involved in the subject matter of the question is consulted. This point is particularly relevant for questions whose result is given on a scale.
Tournament participants must not be able to influence the outcome of the question, for example by voting in a poll on the topic or through technical means such as Google Trends or automatically generated polls.
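Some of the more mechanical items on this checklist can be verified automatically before a question is published. The following is only a sketch: the ForecastQuestion class, its fields and the checklist_problems helper are hypothetical names invented for this illustration, and the example question and its source are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ForecastQuestion:
    """Hypothetical container for a question and its clarifying information."""
    text: str
    definitions: dict          # term -> definition used for resolution
    closes_at: datetime        # exact end of the forecast period
    resolution_source: str     # where the outcome will be looked up
    edge_case_rules: list = field(default_factory=list)


def checklist_problems(question: ForecastQuestion) -> list:
    """Return the mechanical checklist items that the question fails."""
    problems = []
    if not question.definitions:
        problems.append("no terms are defined")
    if question.closes_at.tzinfo is None:
        problems.append("the deadline has no time zone")
    if not question.resolution_source.strip():
        problems.append("no resolution source is named")
    if not question.edge_case_rules:
        problems.append("edge cases are not addressed")
    return problems


vague = ForecastQuestion(
    text="Will GDP rise?",
    definitions={},
    closes_at=datetime(2024, 1, 1),  # naive datetime: time zone unknown
    resolution_source="",
)

precise = ForecastQuestion(
    text="Will real GDP per capita be higher at the deadline than today?",
    definitions={"real GDP per capita": "placeholder definition text"},
    closes_at=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    resolution_source="placeholder: a named official statistics release",
    edge_case_rules=["If the source is discontinued, the question is cancelled."],
)

print(checklist_problems(vague))
print(checklist_problems(precise))
```

A check like this only catches missing pieces. Whether the definitions are actually unambiguous, tasteful and informative still needs human judgment.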
