New York City Marathon 2014

120 Minutes Over 26.2 Miles: A Statistical Approach

Oct 28, 2014 by Gordon Mack

By: Scott Olberding @isthatsol

All it will take to secure a permanent fixture in sporting lore is running 68.25 seconds per 400 meters for 26.219 miles.  And as a species, we are getting close.  How close, precisely, has been a heated debate as of late.
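The lap pace quoted above can be checked with a few lines of arithmetic.  This is only a minimal sketch: the constant names are my own, and it assumes the standard marathon distance of 42,195 meters and 1,609.344 meters per mile.

```python
# Assumed constants (not from the article): standard marathon distance
# in meters and the length of an international mile in meters.
MARATHON_METERS = 42_195
METERS_PER_MILE = 1_609.344
TARGET_SECONDS = 2 * 60 * 60  # a 2:00:00 marathon

laps_of_400m = MARATHON_METERS / 400            # number of 400 m laps
seconds_per_400m = TARGET_SECONDS / laps_of_400m
distance_miles = MARATHON_METERS / METERS_PER_MILE

print(f"{seconds_per_400m:.2f} s per 400 m over {distance_miles:.3f} miles")
```

Under these assumptions the result rounds to 68.25 seconds per lap over 26.219 miles, matching the figures in the text.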

There are a lot of bad ways to estimate when this feat will be accomplished.  One would be to take the gap between the current world record (2:02:57) and 2 hours, which is 2 minutes and 57 seconds, and look at how long it took us to drop that much to reach our current record.  On September 20th, 1998, Ronaldo da Costa ran 2:06:05 at the Berlin Marathon to break the world record of the time.  This was 16 years ago.  If Dennis Kimetto’s mark of 2:02:57 were lowered by the gap between these two times (3:08), our new record would be 1:59:49.  Extrapolating this time reduction into the future, we would have a sub-2 hour marathon in 2030.

However, this is not really a fair way to calculate a prediction; this linear approach will ultimately prove futile.  A straight line, followed far enough, eventually predicts a marathon run in no time at all.  I say this because as records get faster and faster, we must realize there exists an insurmountable, unbeatable opponent.  That being: the concept of zero.  Short of developing a method to displace time, no being will ever run a race faster than 0 seconds.  Trust me, no one is more excited than I am about the idea of engaging in a footrace and traveling backwards in time, perhaps to witness a historically significant event or maybe to suggest to 2001-Scott that frosted tips are not a good look.  But I digress.
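For anyone who wants to retrace the naive linear estimate, here is a rough sketch.  The helper names are my own, and it takes the 1998 mark as 2:06:05, the value that reproduces the 1:59:49 figure quoted above.  The last two lines show why a straight line cannot be trusted: carried forward indefinitely, it drives the record to zero.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Convert whole seconds back to 'H:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


record_1998 = to_seconds("2:06:05")  # assumed 1998 Berlin mark
record_2014 = to_seconds("2:02:57")  # Kimetto's record
span_years = 2014 - 1998

drop = record_1998 - record_2014          # seconds gained in 16 years
projected = to_hms(record_2014 - drop)    # same drop applied once more
projected_year = 2014 + span_years

# The flaw in the method: at a constant rate the record hits zero.
seconds_per_year = drop / span_years
years_until_zero = record_2014 / seconds_per_year

print(projected, projected_year, round(years_until_zero))
```

The same 188-second drop applied again gives 1:59:49 in 2030, and the same line would have someone finishing a marathon in zero seconds a little over six centuries from now.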
In all honesty, we have been pretty spoiled as of late with some phenomenal marathon performances.  In the past ten years, the world record has been broken a total of five times.  Since 1997, it has been bettered nine times.  In the ten years preceding 1997, it was broken twice.  Something special occurred in 1997 - no doubt, it was a magical year.  For instance, IBM’s Deep Blue computer beat the reigning world chess champion (Grandmaster Garry Kasparov) in a full match for the first time.  James Cameron’s iconic Titanic premiered.  And Notorious B.I.G.’s Life After Death was released posthumously, changing college dance parties forever.  But more importantly, at least in the context of marathon running, 20 Kenyan men broke 2:10 that year, more than runners from all nations combined had managed in any single year up to that point.  And after that, as they say, nothing was the same:

To be completely fair, runners hailing from Ethiopia have also been doing much of the heavy lifting when it comes to running fast marathons both consistently and often.  Shown here, for fun, is the breakdown of every sub-2:10 performance by year and nationality.  I do not recommend staring at this chart for too long, as I cannot guarantee it won’t induce a neurological condition.

The above information is more aptly displayed below, showcasing the amazing modern rise of East African distance-running dominance.

The point of analyzing the past three graphs is to drive home the idea that the pool of runners who are in the ballpark of running 2:00 for a marathon is growing at an extremely rapid pace.  And that is important especially given that many elite runners do not take a shot at a fast marathon more than once or twice in a given year.  In order to keep inching the record closer and closer to 120 minutes, we will need hundreds, if not thousands, of athletes who are in the realm of that type of performance.
As an aside, I think it is also important to note that the average age of athletes running faster than 2:10 has been decreasing since the mid-1990s, albeit slowly.

Getting back to the projection of a date for this magical performance, one metric to consider is what a comparable time would be in a similar event, such as the 10,000 meters or half-marathon.  From here, we could evaluate how the current record in that event stacks up against the 2-hour barrier in the marathon.  Below is a chart showing the 10,000-meter and marathon PBs for all athletes who have run faster than both 28:30 and 2:10, respectively.

Here is the same chart for the half-marathon and marathon, including every athlete who has PBs of 61:30 or faster and 2:10 or faster, respectively.

Note that the equation for the line of best fit has been included in each chart.  This allows us to extrapolate a comparable time in each event.  Setting the y variable of each equation to 2 hours, we get a 10,000-meter time of 20:15 and a half-marathon time of 52:32.  Intuitively, these performances are ludicrous.  There is good reason for this.  Our R², or coefficient of determination, is comically low in both instances.  As a bit of background, the coefficient of determination measures how much of the variation in the data is explained by our regression line.  It is expressed as a value between 0 and 1.  Given our respective R² values of 0.15 and 0.21, the statistical community would regard these correlations as somewhere between complete garbage and worthless.  Taking 1 - R², we can say that 85% and 79% of the variation in marathon times is left unexplained by the respective models.  This is not good for predictive purposes.  If I hear that you are telling people “Hey, my buddy Scott said that a 2 hour marathon is the equivalent of a 20:15 10K and 52:32 half marathon” I will come find you and sternly wag my finger.  This exercise was largely conducted to show that the 10,000 meters and half-marathon have little predictive value when it comes to extrapolating marathon times.
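For readers who want to see where a coefficient of determination comes from, here is a minimal sketch of an ordinary least-squares line fit in plain Python.  The function name and the four toy points are my own invention, purely to exercise the code; they are not the athlete PBs behind the charts, which are not reproduced here.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared), where r_squared is the share
    of the variance in ys that the fitted line explains.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the explained share.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy data (hypothetical, perfectly linear), used only as a sanity check.
slope, intercept, r_squared = fit_line([1, 2, 3, 4], [3, 5, 7, 9])

# Unexplained share for the two values reported in the article.
unexplained = [round(1 - r2, 2) for r2 in (0.15, 0.21)]
print(slope, intercept, r_squared, unexplained)
```

A perfectly linear data set gives a value of 1; the article's 0.15 and 0.21 leave 85% and 79% of the variation unaccounted for, which is why the extrapolated 10K and half-marathon times should not be taken seriously.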
A final approach would be to evaluate the trajectory of the world record year-by-year.  Since we have had very consistent time-keeping practices over the past hundred years or so, there may be some predictive value in projecting future times based on all of our previous bests.  In a sense, after all, we are standing on the shoulders of all those who came before.  Below is a chart showing the world record since 1908, with a logarithmic line of best fit inserted for projection purposes.

Utilizing the formula displayed on the chart, we are able to project, using the progression of prior records as a basis, when the 2-hour barrier will be broken.  As you can see above, the R² is very solid at 0.97, meaning that the fitted line accounts for about 97% of the variation in the record times.  Doing a little bit of arithmetic, we can solve for x when inserting 2:00 for the y variable, giving us a date of September 7th, 2046.  I must urge that extrapolating in this manner can be a bit dangerous, as we are in fact dealing with human beings, who are subject to the effects of our environment.  I suspect that in order for this 120-minute barrier to fall, we will not only need perfect race conditions, but also the perfect course, not to mention an athlete in very, very good shape.  And the proverbial stars only align like this on very rare occasions.  It’s what makes such a feat so special.
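The "little bit of arithmetic" amounts to inverting a logarithmic fit.  The chart's actual formula is not reproduced in the text, so the sketch below assumes the common form y = a * ln(x) + b and uses placeholder coefficients that are entirely hypothetical; only the algebra is the point.

```python
import math


def predict(a: float, b: float, x: float) -> float:
    """Evaluate a logarithmic trend line y = a * ln(x) + b."""
    return a * math.log(x) + b


def solve_for_x(a: float, b: float, y: float) -> float:
    """Invert the trend line: the x at which it reaches the value y."""
    return math.exp((y - b) / a)


# Placeholder coefficients, NOT the ones on the chart: y in minutes,
# x in whatever time units the fit was built on.
a, b = -10.0, 170.0
target_minutes = 120.0

x_at_target = solve_for_x(a, b, target_minutes)
# Round trip: feeding the solution back in should return 120 minutes.
print(x_at_target, predict(a, b, x_at_target))
```

With the real coefficients and the chart's own x-axis convention, the same inversion is what turns a 2:00:00 target into a calendar date.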
To clarify, I would interpret this data in a broader sense.  Based on the above analysis, I am willing to bet that more likely than not, the 2-hour barrier will be broken within the next fifty years or so.  That isn’t as exciting as a clear, concise, matter-of-fact statement, but it is much more valuable than a guess.  If you ever hear someone begin a prediction with, “I’m no scientist, but...”, run in the opposite direction.  As the famous Athenian historian Thucydides once quipped, “When a person finds a conclusion agreeable, they accept it without argument, but when they find it disagreeable, they will bring against it all the forces of logic and reason.”  It is our goal, especially when it comes to predictive analysis, to throw aside any and all bias in seeking the truth.  And it is a sincere hope of mine that within my lifetime, I am able to witness a harrier clip through the half-marathon point in 59:52 and continue this intrepid pace for the better part of 13 miles, earning not only a place in our people’s history, but in the process proving it is indeed possible to accomplish that which was once considered utterly inconceivable.
