The FMS (Functional Movement Screen) is a great tool to uncover asymmetry and movement dysfunction in fitness clients, as well as in patients in the clinic who are ready to transition back to sport.  I have been using this tool consistently for 2.5 years in my practice.  One question I have asked myself about the screen is: how reliable is it?


Click here for an earlier post I wrote on this topic regarding what it tells us as practitioners.  One of the challenges with any screen or test is not only validity but reliability.  In the April edition of the Journal of Strength and Conditioning Research, we gain some new insight into intra- and interrater reliability via two new articles.

The first article describes a controlled laboratory study with repeated measures, investigating how experience using the FMS and clinical experience as an athletic trainer (AT) affect the intrarater reliability of FMS testing.  The raters (17 men and 21 women recruited from the university’s athletic training clinical staff and academic programs) had different levels of FMS and clinical experience (AT students, ATs, or ATs with at least 6 months of experience using the FMS) and viewed each of the 3 videotaped models.

Unlike the AT group with at least 6 months of FMS experience, none of the AT students or ATs had previously seen or used the FMS. Each group rated the models on each of the FMS exercises according to a script presented by the lead investigator.  One week later, the raters watched the same videos in a different randomized order and again rated each model on each exercise.

The intersession scores were examined to establish the intrarater reliability of all participants.  In addition, intrarater reliability was compared across the different groups of participants (students and clinicians) to infer the influence of clinical experience as an AT and of previous experience using the FMS.

Key findings:

  1. The average FMS score was 13.68 +/- 0.98
  2. Moderate intrarater reliability was observed when all participants were analyzed together
  3. The AT group with FMS experience had the strongest intrarater reliability, followed by the AT group without FMS experience
  4. The AT students demonstrated poor reliability with a large 95% confidence interval

Key takeaways:

  • Previous research by Minick et al. had established excellent agreement on all components of the FMS between expert and novice raters, indicating strong interrater reliability, but the authors point out that this is the first study to examine intrarater reliability
  • This helps establish the reliability of the assessment, lending more credibility to the FMS as an effective assessment tool
  • All raters were either ATs or senior undergraduate AT students – future studies need to include other professionals before the external validity of these particular results can be confidently applied
  • The primary limitation of this study is that video was used rather than live assessment
  • Possessing clinical experience and experience with the FMS strengthens intrarater reliability, so learning to use the tool alongside experienced clinicians may be a wise way to improve reliability

Click here to read the abstract of this article.

The second article describes a study examining both intra- and interrater reliability of the FMS between raters of different experience and education, with real-time administration in healthy, injury-free men and women.  The authors hypothesized that intra- and interrater reliability would be good, and that FMS certification would result in increased intrarater reliability. Four raters simultaneously scored the FMS using standard guidelines.  Then, all subjects returned one week later for re-testing.

Twenty healthy subjects (10 men, 10 women) with normal BMI were assessed.  The raters included an entry-level physical therapy student who had completed 100 FMS tests but was not certified, a certified FMS rater, a faculty member in Athletic Training with a PhD in Biomechanics and Movement Science and no previous FMS experience, and an entry-level physical therapy student with no previous FMS experience.

A 2-hour training session was conducted using materials from the FMS developers.  The session was led by the entry-level PT student who had completed 100 FMS tests, and covered the 7 movements, 3 clearing tests, verbal instructions, and the scoring criteria.

Key findings:

  1. Overall scores ranged from 11 to 17 (mean = 14.3 +/- 1.5)
  2. Interrater reliability was good for session 1 (ICC = 0.89) and session 2 (ICC = 0.87), with 95% confidence intervals reported
  3. There was 100% agreement between raters on all 3 clearing tests
  4. Interrater reliability was high for hand length and tibia length measurements in sessions 1 and 2
  5. The hurdle step was the least reliable of all tests (ICC = 0.30 for session 1 and 0.35 for session 2)
  6. The most reliable measure was shoulder mobility (ICC = 0.98 for session 1 and 0.96 for session 2)
  7. Interrater reliability was good overall; the lowest was for the certified FMS rater (ICC = 0.81) and the highest for the faculty member with an extensive background in movement (ICC = 0.91)
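For readers curious what these ICC numbers actually quantify: an interrater intraclass correlation coefficient estimates how much of the variance in scores reflects true differences between subjects rather than disagreement between raters. As a rough illustration only (not necessarily the exact model or software the study authors used), here is a minimal sketch of ICC(2,1), a common two-way random-effects, absolute-agreement form, with made-up example ratings:

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random-effects, absolute agreement, single rater.

    ratings: array-like of shape (n_subjects, k_raters).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    subj_means = x.mean(axis=1)   # per-subject means across raters
    rater_means = x.mean(axis=0)  # per-rater means across subjects

    # Two-way ANOVA sums of squares
    ss_subj = k * ((subj_means - grand) ** 2).sum()
    ss_rater = n * ((rater_means - grand) ** 2).sum()
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_subj - ss_rater

    # Mean squares
    ms_subj = ss_subj / (n - 1)
    ms_rater = ss_rater / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k * (ms_rater - ms_err) / n
    )

# Hypothetical example: 5 subjects scored by 4 raters who mostly agree
ratings = [
    [14, 14, 15, 14],
    [11, 12, 11, 11],
    [17, 16, 17, 17],
    [13, 13, 13, 14],
    [15, 15, 14, 15],
]
print(round(icc_2_1(ratings), 2))  # high agreement -> ICC near 1
```

With these invented scores the ICC comes out around 0.94; an ICC of 0.89, as reported for session 1, similarly means most of the score variance is attributable to real differences between subjects, not to which rater did the scoring.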

Key takeaways:

  • The results support the first hypothesis of good interrater reliability for the total FMS score, meaning the screen can be effectively and consistently scored by raters of varying FMS experience
  • The aforementioned point means the FMS can be used by a multi-disciplinary team, and used as athletes and clients move from rehab to performance training (we use it this way all the time in my clinic)
  • FMS certification may be less important than experience in assessing human movement, but this cannot necessarily be generalized: the sample of subjects and raters was small, and the assessments were not videotaped to allow for any post-hoc evaluation of scoring the movements
  • The hurdle step was the least reliable test – this may mean more attention needs to be given to teaching the movement and its scoring criteria to improve interrater reliability

Click here to read the abstract of this article.

Both of these studies are positive and indicate we can feel confident about administering the FMS as a tool with our clients and patients.  In the future, we need to continue looking at different populations of subjects and raters to further identify trends and improve reliability measures.