Research Article | Volume 53, Issue 7, P2625-2634, July 2022

Vision Transformer for femur fracture classification


      • The proposed approach is the first to use a Vision Transformer (ViT) for femur fracture classification.
      • Attention maps and clustering showed the reliability of this architecture.
      • An evaluation carried out by clinicians with and without the help of our method showed the utility of this tool.



      In recent years, the scientific community has focused on developing Computer-Aided Diagnosis (CAD) tools, primarily based on Convolutional Neural Networks (CNNs), to improve clinicians’ diagnosis of bone fractures. However, the accuracy of these tools in discriminating fracture subtypes remained far from optimal. The aim of this study was (1) to evaluate a new CAD system based on Vision Transformers (ViT), a recent and powerful deep learning technique, and (2) to assess whether this system could improve clinicians’ diagnostic accuracy.

      Materials and methods

      A total of 4207 manually annotated images were used and divided into fracture types according to the AO/OTA classification. The ViT architecture was compared with a classic CNN and with a multistage architecture composed of successive CNNs. To demonstrate the reliability of this approach, (1) attention maps were used to visualize the most relevant areas of the images, (2) the performance of a generic CNN and the ViT was compared through unsupervised learning techniques, and (3) 11 clinicians were asked to evaluate and classify 150 proximal femur fracture images with and without the help of the ViT; the results were then compared for potential improvement.
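      As a rough illustration of how a ViT turns an image into a sequence, the sketch below splits a toy grayscale image into non-overlapping square patches and flattens each one into a token. The function name `split_into_patches`, the toy 4x4 image and the patch size of 2 are assumptions chosen for illustration; the abstract does not state the patch size or preprocessing used in the study.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened square patches.

    Hypothetical helper: it only mirrors the patch-tokenization idea behind
    Vision Transformers, not the pipeline used in the study.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    patches = []
    # Walk the image in row-major order, one patch at a time.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 "image" with pixel values 0..15 (assumed data, not from the study).
toy_image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = split_into_patches(toy_image, patch_size=2)
print(len(tokens), tokens[0])
```

      In a full model each flattened patch would then be linearly projected and combined with a position embedding before entering the transformer encoder, whose attention weights are what attention maps visualize.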


      Results

      The ViT correctly predicted 83% of the test images. Precision, recall and F1-score were 0.77 (CI 0.64–0.90), 0.76 (CI 0.62–0.91) and 0.77 (CI 0.64–0.89), respectively. When supported by the ViT's predictions, the clinicians’ diagnostic accuracy improved by 29% (accuracy 97%; p 0.003), outperforming the algorithm alone.
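      For readers less familiar with these metrics, the sketch below shows one common way to compute accuracy together with precision, recall and F1-score averaged over classes (macro averaging). The averaging scheme, the function name `macro_scores` and the toy labels are assumptions; the abstract does not say how the reported values were averaged, and the data here are not from the study.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper: macro averaging is an assumption, since the
    abstract does not specify the averaging scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Placeholder class labels and predictions (assumed, for illustration only).
y_true = ["A1", "A1", "A2", "A2", "B1", "B1"]
y_pred = ["A1", "A2", "A2", "A2", "B1", "A1"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

      With imbalanced fracture classes, macro averages weight every class equally, which is why they can sit below overall accuracy, as the reported 0.77 F1-score does relative to 83% accuracy.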


      Conclusions

      This paper showed the potential of Vision Transformers in bone fracture classification. For the first time, good results were obtained in fracture subtype classification, outperforming the state of the art. Moreover, assisted diagnosis yielded the best results, demonstrating the effectiveness of collaboration between neural networks and clinicians.



      AO (Arbeitsgemeinschaft für Osteosynthesefragen), OTA (Orthopaedic Trauma Association), CNN (Convolutional Neural Network), ViT (Vision Transformer)

