Running in the lab versus running in the real world: why they differ and what to do about it

My article from a few weeks ago on Emile Cairess’ training before his 3rd-place finish at this year’s London Marathon was one of the most popular articles I’ve written in a long time. However, it actually wasn’t the only article I published that week! A few days prior, one of my dissertation studies was published, and it was titled:

“Are Gait Patterns during In-Lab Running Representative of Gait Patterns during Real-World Training?”—and the answer is “not really.”

I’d like to go over both what we did in this study and what implications it has for people (like me!) who want to apply findings from biomechanics research to real-world training.

Studying gait mechanics in gait labs

If you come into a gait lab for a biomechanical evaluation, you’ll be greeted by a scene like this:

Black and white photo of a motion capture laboratory with a force treadmill in the middle

That, in the middle, is a force treadmill—like a regular treadmill, except underneath the belt it has a steel force-sensing platform.[1] Surrounding it, mounted on the rails by the ceiling, are motion capture cameras.

Before running on the treadmill, you’d be outfitted with reflective markers to measure the motion of your body. There are many cool things you can do by combining the marker data and the force data, such as the following:

That is a musculoskeletal model that can estimate the forces in every major muscle of the lower body, and important locations like the patellofemoral joint (pictured on the right), across the entire gait cycle—and all it needs are the force data and the marker data you’d get from an in-lab gait analysis.[2]

The big question about in-lab gait data

You might see where this is going: we can measure how you run in the lab with high precision.[3] But…do you run the same way when you leave the lab and go for a run outside?

That’s an important question because it has big implications for the degree to which you can extrapolate from an in-lab gait analysis to your real-world training.

For example, if an in-lab gait analysis showed that your patellofemoral joint contact force increased dramatically at faster speeds, you might do faster workouts less often to reduce the risk of runner’s knee.

However, if you don’t run the same way outside of the lab, it’s a lot harder to say anything definitive about real-world running from in-lab data.

Whether people run the same way in the real world versus in the lab is not such an easy question to answer—after all, the whole reason you need all that equipment in the lab is that we can’t measure how you run without it!

To get some idea of whether people actually do run the same way inside and outside, you’d need some way of assessing running gait (or some important parts of it) that works both inside the lab and outside, in the real world.

Garmin and Stryd running dynamics as a proxy for gait pattern

My dissertation happened to provide just such a way to assess gait both in the lab and in the real world. That wasn’t actually the main goal of my dissertation—I was aiming to predict Achilles tendon and knee joint forces using wearable sensors—but the experimental design happened to allow a look at running gait in the lab versus in the real world.[4]

My idea for the main part of my dissertation was to use “consumer-grade” devices—a Garmin HRM and a Stryd foot pod—to measure a few key parameters of running gait (“Running Dynamics” in the commercial parlance), like cadence, stance time, and vertical oscillation, then use the in-lab data to build a predictive model that used these key parameters of gait to reconstruct the full, high-resolution data we get in the lab.[5]

That way, you could run outside in the real world, and the predictive model could (in theory) predict what we would be measuring about your gait, if you were running in the lab at that moment. This is a classic, straightforward predictive modeling study design.
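If you’re curious what that kind of model looks like in practice, here’s a toy sketch in Python. To be clear, the data, feature set, and model choice below are all hypothetical placeholders, not the actual model from my dissertation; it just shows the shape of the pipeline: coarse running dynamics in, lab-measured quantities out.

```python
# Toy sketch of the predictive-modeling idea: map coarse running-dynamics
# metrics onto a lab-measured target. All data here are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_steps = 500

# Per-step running dynamics, the kind of thing Garmin/Stryd report
X = np.column_stack([
    rng.normal(3.5, 0.5, n_steps),    # speed (m/s)
    rng.normal(1.25, 0.10, n_steps),  # step length (m)
    rng.normal(0.25, 0.02, n_steps),  # ground contact time (s)
    rng.normal(0.085, 0.01, n_steps), # vertical oscillation (m)
])

# Fake lab-measured target, e.g. peak patellofemoral force (in bodyweights)
y = 4.0 + 1.2 * X[:, 0] - 5.0 * X[:, 2] + rng.normal(0, 0.3, n_steps)

model = GradientBoostingRegressor()
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Cross-validated R^2: {scores.mean():.2f}")
```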

But, since we had the sensors anyway, why not send people back home with them for a few days and get a few real-world runs as well? That way, we could test whether their running gait looked the same in the real world as it did in the lab, since they’d be wearing the devices for both the in-lab and real-world sessions.

With this in mind, the protocol for any given subject looked like this:

  1. Complete an in-lab gait analysis while wearing the Garmin HRM and the Stryd foot pod
  2. Take the devices home and do five runs with them—any distance, any speed
  3. Return the devices to the lab

Capturing gait patterns with running dynamics

“Gait pattern” is a funny term. Everyone sort of knows what you mean—more or less, “the way you run,” but captured in a way that’s general enough to ignore smaller things that don’t matter, like the way you position your pinky finger, but specific enough to differentiate important aspects of running gait across different speeds and between different runners.

Operationalizing “gait pattern” into something specific is trickier. We went with defining a runner’s gait pattern as a set of five key components of gait that we could measure with the sensors: speed, step length,[6] ground contact time, vertical oscillation, and leg stiffness.[7]

This set of five metrics gives a good, but not perfect, representation of how a runner moves: it captures some key components, like speed, cadence, and how “bouncy” your stride is, but neglects others, like how much your knee bends or whether you forefoot strike or heelstrike (or something in between).
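In data terms, each step then becomes a single point in a five-dimensional space, which is the representation the rest of the analysis works with. A minimal sketch (the field names and units here are mine, not the paper’s):

```python
# One step = one point in a 5-D "gait pattern" space.
# Field names and units are illustrative, not from the paper.
from dataclasses import dataclass, astuple

@dataclass
class Step:
    speed: float                 # m/s
    step_length: float           # m
    ground_contact_time: float   # s
    vertical_oscillation: float  # m
    leg_stiffness: float         # kN/m

step = Step(3.5, 1.25, 0.25, 0.085, 10.2)
print(astuple(step))  # a single point in the 5-D gait space
```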

Comparing in-lab and real-world gait patterns

Now that we had a way to quantify a runner’s gait pattern in the lab and in the real world, we needed a way to quantitatively compare gait patterns—what does it mean to run “the same way” outside vs. in the lab?

Conceptually, I think of it as follows: no two steps are ever exactly the same, but when you run, there’s a certain distribution of gait parameters. When running at 8:00/mi, for example, your step length, vertical oscillation, and so on all fall within a certain range of values.

That range shifts as a function of speed, of course, but if you imagine the “cloud” of gait patterns that spans across the range of speeds you typically run, we can say that the outline of that cloud forms a range of gait patterns that are representative of the way you run.

So, the analysis is simple at a conceptual level: capture the “cloud” of gait patterns in the lab, capture the “cloud” of gait patterns in the real world, and see how much they overlap. Very high overlap would indicate that most of the steps you take in the real world are pretty similar to steps you’d take in the lab, while low or no overlap would indicate a distributional shift—a qualitative change in how you run in the lab versus in the real world.

Moreover, by quantifying what percentage of the steps taken during real-world running fell within the bounds of the “cloud” formed by your steps taken in the lab, we could put a number—as a percentage—on how well your in-lab gait represented your real-world gait.[8]
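If you want to experiment with this kind of overlap analysis yourself, here’s a rough sketch using SciPy. The way I trim to the most central 95% of in-lab steps below (by Mahalanobis distance from the centroid) is a simplification of my own; the paper’s actual implementation is in the code linked at the end of this article.

```python
import numpy as np
from scipy.spatial import Delaunay

def fraction_in_hull(lab_steps, real_steps, keep=0.95):
    """Fraction of real-world steps inside a convex hull around the
    most central `keep` fraction of in-lab steps. Both arguments are
    (n_steps, n_metrics) arrays of per-step gait parameters."""
    # Trim outlier lab steps by Mahalanobis distance from the centroid
    # (a simplification of the paper's 95% shell)
    mu = lab_steps.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(lab_steps, rowvar=False))
    centered = lab_steps - mu
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    core = lab_steps[d2 <= np.quantile(d2, keep)]

    # Delaunay.find_simplex returns -1 for points outside the hull
    hull = Delaunay(core)
    return (hull.find_simplex(real_steps) >= 0).mean()

# Example with synthetic 5-D gait data, shifted in the real world:
rng = np.random.default_rng(0)
lab = rng.normal(size=(1000, 5))
real = rng.normal(0.5, 1.0, size=(5000, 5))
print(f"{fraction_in_hull(lab, real):.0%} of real-world steps in lab hull")
```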

The results: Most people run quite differently in the real world!

We had 49 runners who completed the study successfully, and who collectively logged 150 hours of real-world running in addition to the in-lab gait analysis.

The results were pretty surprising: on average, only about 33% of the steps taken during real-world running were well-represented by an in-lab gait analysis!

Another way of thinking about this finding is that two-thirds of the time, when you take a step outside in the real world, you’re doing so in a way that’s totally unlike any step you would take in the lab.

What explains this finding? There are a few possibilities we quickly ruled out:

First, it was not that people ran at different speeds in the lab versus in the real world. The in-lab protocol was specifically designed to capture a wide range of speeds, relative to the runner’s own typical training pace, and indeed, the range of speeds in the real world fell within the in-lab speed measurements ~95% of the time.

Second, it wasn’t because gait was affected by inclines, declines, or turns. Because we collected GPS and elevation data, I was able to use GPS position and elevation rates of change to filter out uphills, downhills, and turns:

Using GPS location data, it's pretty easy to calculate whether someone is running straight, turning left, or turning right.
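Here’s a rough sketch of that kind of filtering from raw 1 Hz GPS data. The flat-Earth projection and the specific thresholds below are my own illustrative choices, not the exact method from the paper:

```python
import numpy as np

def straight_and_flat(lat, lon, elev, t,
                      max_turn_rate=10.0,  # deg/s, illustrative threshold
                      max_grade=0.01):     # 1% grade, illustrative threshold
    """Flag GPS samples falling on straight, flat segments.
    Inputs are equal-length arrays: latitude/longitude (degrees),
    elevation (m), and time (s)."""
    # Local flat-Earth approximation: degrees to meters
    lat0 = np.radians(lat.mean())
    x = np.radians(lon) * 6.371e6 * np.cos(lat0)
    y = np.radians(lat) * 6.371e6

    dt = np.diff(t)
    heading = np.degrees(np.arctan2(np.diff(y), np.diff(x)))
    # Heading change per second, wrapped to [-180, 180)
    turn = (np.diff(heading) + 180) % 360 - 180
    turn_rate = np.abs(turn) / dt[1:]

    dist = np.hypot(np.diff(x), np.diff(y))
    grade = np.abs(np.diff(elev)) / np.maximum(dist, 1e-6)

    # One flag per interior sample
    return (turn_rate < max_turn_rate) & (grade[1:] < max_grade)
```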

Even when we compared only real-world running on flat, straight segments to the in-lab data, the difference was the same.

Third, it wasn’t caused by just one gait metric that might’ve been measured poorly. Even when you only looked at speed, step length, and stance time—the three most accurate gait metrics measured by the sensors—there was still only about 50% overlap between real-world and in-lab running.

Distributional shifts in gait from the lab to the real world

The best illustration of what’s going on comes from this figure:

2D visualization of in-lab versus real-world gait patterns, represented as points in a scatterplot.

The red shows the "cloud" of gait patterns for one runner during the in-lab gait analysis. The blue shows the "cloud" of gait patterns from that runner's real-world training.[9]

When they’re overlaid, only part of these “clouds” overlap! In other words, there is indeed a distributional shift in gait patterns. The direction and manner of this shift is different for different runners, but the most reasonable explanation is simply that most people run differently in the real world compared to how they run in the lab.

Most individual gait metrics overlapped pretty well on a “univariate” basis—step length, stance time, etc., all spanned pretty similar values inside the lab versus out in the real world. What changed was the relationship between gait metrics.
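If you’d like to reproduce this kind of figure, the 2D view can be approximated by embedding the 5D step data with scikit-learn’s MDS implementation (synthetic data here, purely for illustration; see footnote 9):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS

rng = np.random.default_rng(1)
lab = rng.normal(size=(300, 5))             # in-lab steps, 5 gait metrics
real = rng.normal(0.7, 1.0, size=(300, 5))  # real-world steps, shifted

# Embed both clouds together so they share a single 2-D space
xy = MDS(n_components=2, random_state=0).fit_transform(np.vstack([lab, real]))

plt.scatter(*xy[:300].T, s=5, c="red", label="in-lab")
plt.scatter(*xy[300:].T, s=5, c="blue", label="real-world")
plt.legend()
plt.show()
```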

Why does real-world running differ from in-lab running?

There are a few plausible explanations for why people run differently, and these intersect a bit with some of the weaknesses of our study. Why might we be measuring differences in the lab vs. in the real world? It could be because:

  1. Treadmill running is intrinsically different from overground running
  2. There is something about the in-lab environment that “cues” people to run differently
  3. There is a systematic difference in the sensor measurements (even if underlying gait stays the same)
  4. People really are running differently, regardless of #1-3

Point 1, about treadmill running being different, is only true in a weak sense. A meta-analysis from 2019 that compared in-lab treadmill running with in-lab “overground” running found only small average differences—not enough, in my opinion, to account for the differences we found.

Point 2 (people are cued to run differently inside a lab) is a bit more plausible: that 2019 meta-analysis would not capture the differences between being outside versus being in the lab—even “overground” in-lab running is still not at all like running outside. Here’s what a typical overground setup looks like:

Hardly the open road. You've got about ten meters total to get up to speed, then decelerate before hitting the wall at the end.

So, maybe the fact that you’re inside, in a confined space, significantly alters your gait? Seems plausible to me.

Even though we filtered out turns, inclines, and declines, it’s also possible that road running involves a lot of small undulations: ups and downs of just one or two centimeters per step, as compared to the perfectly flat in-lab treadmill. Or perhaps the subtle step-to-step increases and decreases in speed during real-world running are to blame.

Maybe just letting your mind wander when you’re outside makes your gait different. When you’re in a gait lab, aware that your gait is being recorded, it’s hard not to think about how you’re running.

Point 3 (the sensors are off, either in the lab or in the real world) is also worth considering: these are consumer-grade devices and their accuracy is not perfect. We didn’t include stance-time balance as a gait metric because it was so poorly measured compared with a ground-truth measurement from the motion capture system.

Maybe the sensors get less accurate outside the lab? I don’t find this very likely, though, since the sensitivity analysis we did suggested that it wasn’t radical changes in one individual metric that were driving the results. Other studies have used Stryd and Garmin HRM data during overground running and have not found gross errors.

That leaves us with Point 4: there really is a difference in how most people run, and it can’t be attributed to smaller experimental or environmental issues. It’s worth thinking through the implications of stark, persistent differences in in-lab versus real-world gait patterns.

Should we throw out all in-lab biomechanics studies?

The pessimistic view on these results would be “all in-lab biomechanics work is suspect,” but I don’t think it’s quite so bad, for a few reasons.

First, group averages from in-lab gait analyses are much more representative of real-world running than data from just one person.

If, instead of comparing the “cloud” of real-world gait patterns from one person to their own in-lab gait patterns, you compare them to a pooled sample of many other runners, the overlap improves from 33% to 90%. So, I’m more willing to trust conclusions from in-lab data when they’re about what happens on average across runners.
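In terms of the fraction_in_hull() sketch from earlier, this comparison just swaps out the reference cloud (again with synthetic, purely illustrative data; in the study, the real numbers were ~33% and ~90%):

```python
# Reuses fraction_in_hull() from the sketch above; data are synthetic.
import numpy as np

rng = np.random.default_rng(2)
my_lab = rng.normal(size=(1000, 5))
my_real = rng.normal(0.5, 1.0, size=(3000, 5))
pooled_lab = rng.normal(0.3, 1.6, size=(5000, 5))  # many runners' lab steps

print(fraction_in_hull(my_lab, my_real))      # reference: my own lab steps
print(fraction_in_hull(pooled_lab, my_real))  # reference: pooled lab steps
```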

Second, it’s still the case that individual gait parameters match up pretty well. People with low cadence in the lab have low cadence in the real world, and ditto for most other aspects of gait. So, relative differences across runners probably hold up reasonably well.

Where I’m more skeptical of in-lab gait data

All of the above notwithstanding, there are still several scenarios where I’m more skeptical of the value of in-lab gait data after doing this study.

The first is the traditional “N of 1” gait analysis on an individual runner. The whole point of, say, sending a chronically injured runner into a lab for a gait analysis is to find out what aspects of that runner’s gait might be contributing to their injury problems.

If that gait analysis highlights something that only happens during in-lab running, not during real-world running, the runner and their physical therapist could end up going on a wild goose chase, wasting time on changing a gait pattern that doesn’t even occur in the real-world—or completely missing the true cause of the runner’s problems because it never shows up during the in-lab analysis.

I’m also more lukewarm on “personalized” models of musculoskeletal stress during running. I mentioned above that the main goal of my dissertation was to build predictive models that could take wearable sensor data and predict Achilles tendon and patellofemoral joint forces during real-world running.

My fallback plan, if this didn’t work at the group level, was to develop personalized models, where you’d come into the lab, do a treadmill run, and get a “customized” predictive model tuned to your biomechanics.

After doing this study, I’m more skeptical that this personalized approach would work—in fact, even if you are building a personalized model, it probably makes sense to combine a runner’s own data with a pool of data from other runners before developing your model, given that there’s likely to be a distributional shift in the runner’s own gait pattern.

Unanswered questions and what comes next

By far the biggest limitation of this study was the fact that it had to rely on the relatively crude gait metrics available on Garmin and Stryd devices. It’s an open question whether other aspects of gait, like left/right asymmetries or ankle/knee/hip angles, also show distributional shifts when moving outside of the lab, and to what degree.

I would love to see a follow-up study that uses a more carefully controlled gradation of settings—for example:

  1. In-lab treadmill running
  2. In-lab overground running on a runway in a small room
  3. Outdoor running on flat, straight, smooth pavement, while observed by the research team
  4. Uncontrolled outdoor running on roads of any kind, not being observed by the research team

This kind of comparison, especially with a more comprehensive suite of gait metrics, would help pin down the exact source of distributional shifts in gait patterns.

Smart sensors or smartphones?

My other big takeaway from this and my other sensor-based research projects is that the best “wearable sensors” for studying running biomechanics in the real world might not be a heart rate monitor or a foot pod, but your smartphone.

Why? Because smartphones have cameras, and cameras can, in principle, capture a lot more about gait than any individual sensor on your body.

We’re now at the point where video from smartphone cameras, fed into deep learning models, can do things like this:

State-of-the-art pose detection in video. From David Pagnon's Sports2D project

This tech is tantalizingly close to being able to do full 3D gait analysis outside of the lab, without the need for expensive cameras, proprietary software, or a decade of graduate and postdoctoral training.
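To give a sense of how low the barrier to entry has gotten, here’s a minimal sketch using Google’s MediaPipe, one of several open pose-estimation toolkits (not necessarily the one used by the projects mentioned here). This is per-frame 2D keypoint detection only, a long way from a full 3D gait analysis:

```python
# Minimal 2-D pose detection on a smartphone video with MediaPipe.
# Per-frame keypoints only; a full 3-D gait analysis needs much more.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture("run_video.mp4")  # hypothetical filename

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        # Left knee keypoint, in normalized image coordinates
        knee = results.pose_landmarks.landmark[
            mp.solutions.pose.PoseLandmark.LEFT_KNEE]
        print(f"left knee at ({knee.x:.3f}, {knee.y:.3f})")

cap.release()
```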

Stanford’s OpenCap project is a huge advance on this front, though it currently still requires multiple synchronized smartphones, mounted on tripods, and a calibration board in the background (not to mention its relatively restrictive research-only license).

To be clear, we aren’t there yet—there are still major challenges to developing and validating this pose detection technology, but I think it’s by far the most promising direction for studying running biomechanics, at scale, in the real world.

Recap

Your gait pattern during your day-to-day training is much more variable than the gait pattern you adopt during a comprehensive in-lab gait analysis. Only about one-third of the steps you take during real-world running are well-represented by an in-lab gait analysis, with many of the others being far outside the range of gait patterns seen in the lab.

Though group averages are less affected, the findings are a bit worrying for people who are relying on in-lab gait data to tell them about issues with their gait that might be causing injury—if their real-world running is different from how they run in the lab, the cause of their problems may not even show up when they run in the lab.

This study certainly had some limitations, the biggest of which was the reliance on just five coarse-grained gait metrics to summarize an entire gait pattern, and it’s not certain that the findings are fully attributable to a true difference in gait mechanics between in-lab and real-world running.

Even in light of these limitations, I’m still more skeptical of the ability of in-lab gait data to generalize well to real-world running. Fully addressing these concerns is going to require doing high-quality biomechanical studies outside, in the real world, on real runners doing their real daily training.

Learn more about the science of running

If you enjoyed this article, subscribe to my free email list! It’s the best way to find out when I’ve got new content coming out.

I also have a book, Modern Training and Physiology for Middle and Long-Distance Runners, that focuses more on the science of performance and training for events from 800m to the 10k. Check it out!

Disclosures & funding

This was a very scrappy project—I had to pull together funding and equipment from a lot of different sources to make it happen. Stryd Inc. provided some of the on-shoe foot pods used in this study. Equipment, travel costs, and subject payments for this study were funded in part by grants from the American College of Sports Medicine, World Athletics, the American Society of Biomechanics, the De Luca Foundation, and the IU Graduate and Professional Student Government. None of these companies or agencies had any role in designing, conducting, or analyzing the results of this study.

Reproducibility and data availability

Are you a running data nerd? This paper is not just open-access, but open-source and open-data. The code to reproduce the full analysis is on my GitHub, and the raw sensor data from the Stryd and Garmin devices are available on FigShare, both under the highly permissive MIT license.

Footnotes


[1] Because of the force sensors, force treadmills are extremely rigid—they don’t have the cushion of a typical commercial or home treadmill, so in this sense they are more similar to outdoor running on concrete or pavement as compared with your standard treadmill.

[2] Most university gait labs that do “gait analysis” for runners do not use the kind of musculoskeletal models pictured above, so if you go into a lab for a paid gait analysis that is not part of a research study, you are unlikely to get this level of detail in your gait analysis results. It’s not because of cost (OpenSim is free software) but because of expertise—there’s perhaps a few dozen gait labs in the world that know how to do this kind of modeling work.

[3] Some gait lab measurements are more accurate than others. One of the major sources of error is the movement of the markers on your skin, relative to the underlying position of the bones. For that reason, measurements of hip rotation and adduction/abduction angles are significantly less accurate than measurements of, say, knee flexion angle.

[4] The Achilles / knee joint force prediction papers are still in the works, though technically they are already available in my actual dissertation document. I recommend waiting on the publication of the papers, though!

[5] My motivation for using consumer-grade devices was to scale up this predictive model to massive datasets—e.g. everyone on Strava who has a Garmin HRM—to solve the inherent problems with small sample sizes in biomechanics research on running injuries.

[6] We used step length instead of cadence for a mundane technical reason: speed, cadence, and step length are mathematically linked, such that speed = cadence × step length. Garmin and Stryd both round cadence to half-integer or integer stride frequencies (e.g. 80 spm for a cadence of 160 steps per minute), and this discretization reduces the accuracy of the data—it’s perfectly possible to take 160.3 steps in a minute. Step length, by contrast, is measured to the millimeter, so we went with it for its higher precision.

[7] Why not other Garmin Run Dynamics, like vertical ratio and stance time percent? These metrics are “constructed” in that they are purely derived from other variables. Vertical ratio, for example, is just vertical oscillation divided by step length—it doesn’t add any new information beyond what’s already conveyed by the main five metrics.

[8] To capture the outline of this “cloud” of gait patterns, we calculated a shell (a convex hull, technically) that encompassed 95% of the individual steps—so for example, if you took 1000 steps during the in-lab gait analysis, the shell would encompass the most central 950 of them. This helped reduce the effects of weird, one-off outlier steps where you may have taken an awkward step. As a consequence, our threshold for gait patterns being “well-represented” was 95%, not 100%.

[9] What you’re actually looking at is a 2D projection of the 5D gait pattern data, using a technique called multi-dimensional scaling—very similar to principal component analysis (PCA), if you’ve encountered that before.

About the Author

John J Davis, PhD

I have been coaching runners and writing about training and injuries for over ten years. I've helped total novices, NXN-qualifying high schoolers, elite-field competitors at major marathons, and runners everywhere in between. I have a Ph.D. in Human Performance, and I do scientific research focused on the biomechanics of overuse injuries in runners. I published my first book, Modern Training and Physiology for Middle and Long-Distance Runners, in 2013.
