Introduction
To creatures of habit, the passage of time is marked by the seasons of football, ballet, or grouse. For Gulf Coast insurers, it is the Atlantic hurricane season.
Prior to the start of the hurricane season, forecasters offer predictions of how active the season will be. Such forecasts may have some input into reinsurance pricing, perhaps more so with short-dated structures. At the least, they provide a helpful context for investor discussions.
But how accurate are these forecasts? What does ‘accuracy’ even mean? What is the benchmark?
Finally, the rather existential question: does accuracy matter? If not, why not?
Assessment of Forecasting Skill
The Atlantic hurricane season runs from 1 June to 30 November. Prior to the season start, a number of forecasters will predict the number of named storms (NS); hurricanes (H); major hurricanes (MH); and accumulated cyclone energy (ACE), a measure of the aggregate season activity [1].
In doing so, they may rely on expectations of sea-surface temperatures, trade winds, wind shear, ENSO (El Niño-Southern Oscillation) phase, West African monsoons etc.
To the credit of the forecasters, the estimates are clearly articulated; the inputs to forecasts are identified; and historic forecasts are retained.
How good are these pre-season forecasts? Indeed, how should we even assess them? We are all used to the financial disclaimer ‘past performance is not indicative of future results’, but if we simply take the average of the last 10 years of storm data, how does that compare against professional forecasting?
Our starting point is to compile a clean history of actual NS/H/MH counts by year that fall within the season (‘Actual’). As described in the appendix, this is a little more involved than might first be imagined. Next, we create a baseline forecast which is simply the average of the last ten years of NS/H/MH Actual counts. For example, our forecast for the number of hurricanes in 2022 would be the average of the number of hurricanes seen between 2012 and 2021 (inclusive). This does not depend upon any meteorological data and as such we refer to it as our Naïve Forecast.
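For concreteness, here is a minimal sketch of the Naïve Forecast construction in Python. The counts below are placeholders for illustration only, not the cleaned Actuals described in the appendix:

```python
# Naive Forecast: trailing 10-year average of actual counts - no meteorology involved.
# The counts below are placeholders for illustration, not the cleaned Actuals.
actual_hurricanes = {2012: 10, 2013: 2, 2014: 6, 2015: 4, 2016: 7,
                     2017: 10, 2018: 8, 2019: 6, 2020: 14, 2021: 7}

def naive_forecast(actuals: dict[int, int], year: int, window: int = 10) -> float:
    """Average of the previous `window` seasons' actual counts."""
    history = [actuals[y] for y in range(year - window, year)]
    return sum(history) / len(history)

print(naive_forecast(actual_hurricanes, 2022))  # mean of 2012-2021 inclusive
```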
We define a Forecast Difference as the average of the squared differences between a forecast and Actual for each year. Naïve Difference is similarly defined. Finally, we define the 'skill' of a forecast as [2]:
Skill = 1 – Forecast Difference / Naïve Difference
With this definition, Skill=100% when the Forecast Difference is zero, i.e. the forecast corresponds exactly to Actual. Skill=0% if the Forecast Difference is the same as the Naïve Difference. There is no reason why Skill could not be negative – that would correspond to the Forecast Difference being bigger than the Naïve Difference (i.e. an actual forecast being worse than the Naïve Forecast).
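As a concrete illustration, a minimal Python sketch of the Skill calculation, using made-up numbers purely for illustration:

```python
import numpy as np

def skill(forecast, actual, naive):
    """Mean-squared-error skill score: 1 - MSE(forecast) / MSE(naive)."""
    forecast_diff = np.mean((np.asarray(forecast) - np.asarray(actual)) ** 2)
    naive_diff = np.mean((np.asarray(naive) - np.asarray(actual)) ** 2)
    return 1.0 - forecast_diff / naive_diff

# Made-up named-storm counts for five seasons, purely for illustration
actual   = [19, 14, 21, 14, 20]
forecast = [14, 14, 15, 14, 15]
naive    = [13, 14, 14, 15, 16]
print(f"Skill = {skill(forecast, actual, naive):.0%}")  # Skill = 16%
```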
Enough preamble! What do the results look like? Table 1 shows the Skill measure for three well known forecasters, and Figure 1 shows the difference between the forecast and actual number of named storms by year.
Table 1. Forecasting Skill relative to a Naïve Forecast
Source: Man Group database, NOAA, Met Office, Tropical Storm Risk.
From Table 1 (top, using all forecast data), we immediately see that the Skill percentages are a mixture of both positive and negative values. By this measure, over 40% of forecasts are actually worse than the Naïve Forecast.
How are we to interpret a positive Skill percentage? One point of reference is to imagine that, each year, a forecast lands exactly halfway between the Naïve Forecast and a Perfect Forecast. Under such circumstances, the computed Skill would be exactly 75% (halving each error quarters its square). Even if the forecast error for each year were just 30% smaller than the Naïve Forecast's, the Skill would still be roughly 50%. The fact that all Skill values are less than 20% suggests that all forecasts are 'closer' to the Naïve Forecast than they are to a Perfect Forecast [3].
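These two statements are easy to verify numerically; a quick check, using illustrative error values only:

```python
import numpy as np

naive_err = np.array([-6.0, 0.0, -7.0, 1.0, -4.0])  # illustrative Naive Forecast errors

# Scaling every error by s scales the mean squared error by s**2, so Skill = 1 - s**2
for scale, label in [(0.5, "halfway to perfect"), (0.7, "errors 30% smaller")]:
    s = 1.0 - np.mean((scale * naive_err) ** 2) / np.mean(naive_err ** 2)
    print(f"{label}: Skill = {s:.0%}")
# halfway to perfect: Skill = 75%
# errors 30% smaller: Skill = 51%
```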
Figure 1. Forecast minus actual named storms by year
Source: Man Group database, NOAA, Met Office, Tropical Storm Risk.
Figure 1 plots the forecast minus actual NS by year. A few features pop out:
- Forecasters tend to miss outlier years. In recent years, the two most active seasons (2005 and 2020) were underestimated, and the two least active (2009 and 2014) overestimated, by all forecasters, including the Naïve Forecast
- The Naïve Forecast is visually competitive with meteorological expertise. Unless you knew what feature to look for (successive Naïve Forecasts being 'smooth'), it would be hard to pick this forecast out as having no climate input
Might forecasting have improved over the years, such that using all available forecast data fails to give it credit? This is certainly possible, but the Skill measure is impacted disproportionately by the 'misses' of 2005 and 2020. As a result, when measured over the last ten years rather than over all available data (Table 1, bottom), the Skill for named storms is actually worse for all three forecasters.
Do forecasting errors matter?
Pre-season forecasts are meant to give a sense of the season’s activity – not the number of storms that make landfall. The issue with the latter is that landfall activity is guided by local weather patterns as the hurricane approaches, and these are only known a few days in advance.
Even if we knew the exact number of hurricanes that would make landfall, that would not be sufficient to estimate the economic impact. Hurricane Milton of 2024 illustrated the point. By making landfall just south of Tampa Bay: a) a surge of the bay itself was avoided; and b) the densely populated Tampa region north of the bay missed a direct hit. Making landfall a few tens of miles further south made a large difference to the economic (and insured) impact.
Forecasting the number of storms in a season, their peak intensity, and their landfall location is still not sufficient to predict insured losses:
- The lateral speed of a storm affects total flooding and needs to be factored in. Hurricane Harvey of 2017 illustrated the point, with its slow motion over southeast Texas leading to catastrophic flooding; and
- We care about the hurricane intensity at the point of landfall, not the peak intensity along its path. Forecasting hurricane intensities at a location is further complicated by some storms experiencing rapid intensification (an effect which is very localised in nature). As we have discussed previously, explosive intensification seems to be becoming more common.
Several of these issues came together in 1992. The season was relatively quiet, with six named storms, four hurricanes, and one major hurricane. Yet that one hurricane was Andrew, making landfall in south Florida as a category five hurricane, and causing record-breaking economic loss.
So, if what one cares about is an economic loss forecast for the season, inaccuracy in the season activity forecast is not terribly important, because so many other factors drive an impactful season.
“the exact present determines the future, but the approximate present does not approximately determine the future”
Edward Lorenz, mathematician and meteorologist
This quote from Lorenz, a pioneer of chaos theory, captures the forecasting conundrum: small uncertainties in today's measurements lead to ever-growing uncertainty in forecasts, which explains many of the difficulties above [4].
If a catastrophe bond portfolio manager cannot predict the future, what is (s)he to do?
Our view is that, rather than seeking opportunities based on forecasts, we should embrace unpredictability, and focus on robust portfolio construction. Specifically, we recognise that (through supply and demand) certain perils, triggers, and seniorities are more attractive than others, but this should be balanced by a desire for diversification. Doing so will naturally lead to a lower US wind and US quake exposure (relative to a market-cap allocation), balanced by being overweight some of the smaller perils.
Appendix – the nitty-gritty of storm-counting
We begin with a few definitions. 'Tropical cyclone' is the term for a rotating, organised system of clouds and thunderstorms with closed circulation. Depending upon peak sustained windspeeds, we have the following (a classification sketch in code follows the list):
- Tropical depression (speed 33 knots [kt] or less);
- Tropical storm, subtropical storm (speed 34-63kt). The difference between the two lies in their structure and dynamics;
- Hurricane (speed 64kt or more). Hurricanes are further divided into category one to category five on the Saffir-Simpson scale, according to their intensity; and
- Major hurricane (speed 96kt or more), equating to category three or higher
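In code, the classification reduces to a few windspeed thresholds (a sketch; windspeeds in knots):

```python
def classify(peak_wind_kt: float) -> str:
    """Bucket a (sub)tropical cyclone by peak sustained windspeed in knots."""
    if peak_wind_kt <= 33:
        return "tropical depression"
    if peak_wind_kt <= 63:
        return "tropical/subtropical storm"  # a named storm
    if peak_wind_kt < 96:
        return "hurricane"                   # Saffir-Simpson category 1-2
    return "major hurricane"                 # category 3 or higher
```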
Today, tropical and subtropical storms are given names from an alphabetical list (which rotates on a six-year cycle). Prior to 2002, subtropical storms were not named.
For the Atlantic basin, the official ‘season’ runs from 1 June to 30 November, although storms can (and do) form outside this period.
By year, how many named storms, hurricanes and major hurricanes have formed in each hurricane season? Very conveniently, NOAA [5] provides historic track data and, using these definitions, we can answer the question consistently by year. Applying the definitions does, however, require certain adjustments (a counting sketch in code follows the examples below). For example:
- Hurricane Alex of January 2016 is excluded because it formed and dissipated before the season began. The same applies to Arthur and Bertha of 2020, and numerous others
- While Bonnie of 2022 became a named storm on 1 July, it crossed Nicaragua and Costa Rica into the Pacific before becoming a hurricane and then a major hurricane. Therefore it only contributes as a named storm
- Rebekah of 2019 reached windspeeds of 70kt, which would otherwise qualify it as a hurricane. However, these high winds were attained while it was extratropical, so it contributes to the statistics only as a named storm
- Arthur of 2008 formed before the season began, but it persisted into the season and is therefore included. For similar reasons, we count Epsilon of 2005 only as a named storm: while it became a hurricane, it did so after the season ended
- Subtropical storm AL192000 of October 2000 met the windspeed threshold but, prior to 2002, NOAA did not assign names to subtropical storms. As such, we count it as a named storm even though it never actually received a name
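To make these counting rules concrete, here is a minimal sketch of the logic. The track-point structure and the extratropical flag are simplifying assumptions rather than NOAA's actual HURDAT2 schema, and edge cases such as basin crossings (Bonnie, 2022) would need additional handling:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TrackPoint:
    when: date
    wind_kt: float
    extratropical: bool  # simplified status flag - an assumption, not NOAA's schema

def season_counts(storm_tracks: list[list[TrackPoint]], year: int) -> tuple[int, int, int]:
    """Count named storms / hurricanes / major hurricanes for one season."""
    start, end = date(year, 6, 1), date(year, 11, 30)
    ns = h = mh = 0
    for track in storm_tracks:
        in_season = [p for p in track if start <= p.when <= end]
        if not in_season:
            continue  # formed and dissipated outside the season (e.g. Alex, January 2016)
        # Only intensity attained in-season, while not extratropical, counts
        # (this handles the Rebekah 2019 and Epsilon 2005 cases described above)
        peak = max((p.wind_kt for p in in_season if not p.extratropical), default=0.0)
        ns += int(peak >= 34)
        h += int(peak >= 64)
        mh += int(peak >= 96)
    return ns, h, mh
```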
Forecast data is in respect of the hurricane season itself (rather than the calendar year). Where a probable range of storms is given, we used the middle value. Some years had experienced storms by the time of the forecast (e.g. Ana in 2015) and these were included in some season estimates (and excluded in others). For consistency, we removed them.
1. Notionally, ACE is a measure of the total cyclone energy released during a season. In practice, it should be very much regarded as a guide. For example, the computation depends upon the square of the windspeed, but not the size of the storm. Thus, it does not even have the units of energy.
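For reference, a sketch of the standard computation (10^-4 times the sum of squared six-hourly sustained windspeeds, in knots, while at tropical-storm strength or above):

```python
def ace(six_hourly_winds_kt: list[float]) -> float:
    """Accumulated cyclone energy, in the usual 1e4 kt^2 units."""
    return 1e-4 * sum(v ** 2 for v in six_hourly_winds_kt if v >= 34)
```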
2. This is often referred to as the 'mean-squared error skill score' (MSESS). See, for example, https://gmd.copernicus.org/articles/18/361/2025/
3. For any mathematicians out there, computing the relevant L2 norms leads to the same conclusion.
4. We should note that while seasonal forecasts have limited skill and questionable utility, shorter term forecasts are critical for preparedness planning.
5. National Oceanic and Atmospheric Administration