Evaluation of Physiotherapy Exercise by Motion Capturing Based on Artificial Intelligence: A Review

Physical therapy is an important form of rehabilitation for patients suffering from a variety of disorders. Since professional physiotherapists are not always available, there is a need for an intelligent system that assists patients in performing exercises by themselves. Any evaluation system consists of hardware interfacing, computers, processing, and evaluation tools. These tools have made it easier to build methods for automating the evaluation of patient performance and progress in functional rehabilitation. In this review, about one hundred research papers are classified according to the above-mentioned system parts. The review of current tools for capturing rehabilitative motions shows that the Kinect camera has been used in about 35% of the studies. This review concentrates on the use of machine learning techniques to evaluate motion in rehabilitation. The most relevant research on physiotherapy evaluation using deep learning shows that the Convolutional Neural Network (CNN) is the most widely used model, appearing in 44% of the studies. An overview of the reference datasets shows that the KIMORE dataset is the most popular, used by 38% of the studies. The peer-reviewed literature covered in this review (2016–2022) includes primary studies and organized reviews.


INTRODUCTION
Physical rehabilitation is often given to patients who need to improve functional abilities after surgery or injury or who have limited mobility or disabilities [1]. Physiotherapy also has a positive effect on balance, muscular strength, mobility, and the management of age-related disease [2][3][4]. Numerous studies emphasize the importance of physical therapy for improving patient outcomes and show a significant relation between exercise intensity and rehabilitation program results [5]. However, offering patients access to doctors during all rehabilitation sessions is neither practical nor cost-effective. As a result, current health services around the world concentrate the initial phase of rehabilitation in hospitals under the direct supervision of physicians, followed by a phase in which patients complete a series of prescribed activities in their homes or outpatient clinics [6]. If a patient is sufficiently motivated and does the exercises correctly at home, they recover faster than they would with traditional physical therapy that is provided exclusively in clinics. Feedback and direction are given to help patients have a better experience receiving physical therapy at home. Consequently, equipment and technology such as robotic assistance devices, exoskeletons, haptic devices, and virtual gaming environments are employed to promote home-based rehabilitation [7]-[9].
scene that was collected, which has restrictions on the evaluation accuracy. One method to overcome this shortcoming is the use of optical motion tracking systems, which use a collection of markers placed at important points on a participant's body and recorded by numerous high-resolution cameras. These systems use computer techniques to reconstruct the three-dimensional scene by analyzing and comparing the images captured by the various cameras. Recently, three-dimensional scene reconstruction technology has gained popularity due to its simplicity of use and low cost [10]-[12].
Low-cost motion capture sensors, such as the Asus Xtion and Microsoft Kinect, have recently come into use. Examples of systems that support rehabilitation activities include the Virtual Exercise Rehabilitation Assistant (VERA) and the Kinect Rehabilitation System (KiReS). These systems use a depth sensor to capture patient motions, and a user interface shows two images in real time: one performing the exercise that the doctor has recommended and one showing the patient's ongoing movements and postures [13]. Patients can improve their exercise performance and self-correct when necessary with the help of such visual feedback [14].
To obtain real images of exercises for processing, a large number of optical sensors have been used, such as cameras, 3D cameras, motion sensors, and depth sensors. Sensors and data analytics can improve evaluation by offering a suitable quantitative and objective measurement of patient performance [15], [16]. This matters because many exercises and movements are otherwise evaluated subjectively, which can be difficult. Recently, a variety of computational techniques have been proposed to create systems that can remotely evaluate abnormal human motion and help with at-home rehabilitation [17], [18].

Feature Extraction
Computer-aided assessment of physical rehabilitation involves analyzing movement data recorded by a sensory system to measure patient performance in prescribed rehabilitation exercises. The process of identifying the human body joints in order to reconstruct the humanoid skeletal structure and pose and to extract gesture and body motion information is known as "human pose estimation" [19]-[21]. Due to its significance in numerous scientific domains, it is regarded as one of the most difficult problems in computer vision and has gained a lot of interest from researchers in this field. Pose estimation techniques have been applied in a wide range of technological fields to automate various processes and tasks and to provide more natural, human-centric application interfaces [22]-[24].
Skeletal data generated by color cameras, depth cameras, and inertial sensors are commonly used in the evaluation of rehabilitation. These data are collections of angular or positional coordinates of the human body joints, organized chronologically. For example, Fig. 1 shows the 33 landmarks detected using the MediaPipe pose detector [25]. Due to their high levels of redundancy, full-body skeleton data are rarely used directly for modeling and analyzing human movements. As a result, feature engineering is frequently used to extract pertinent information from skeletal data by choosing significant skeletal dimensions or distances [26].
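As an illustration of this kind of feature engineering, the sketch below reduces each skeleton frame to a single joint angle computed from three landmarks. It is a minimal example and not the method of any cited study; the function names and the landmark indices passed to `angle_series` are hypothetical and would have to be matched to the landmark numbering of the pose detector actually used.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b formed by the segments b->a and b->c.

    Each argument is an (x, y, z) coordinate tuple for one landmark.
    """
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(u * v for u, v in zip(ba, bc))
    norm = math.sqrt(sum(u * u for u in ba)) * math.sqrt(sum(v * v for v in bc))
    if norm == 0.0:
        raise ValueError("degenerate joint: two landmarks coincide")
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))

def angle_series(frames, i, j, k):
    """Reduce a chronological list of frames to one angle per frame.

    frames: list of frames, each a list of (x, y, z) landmarks.
    i, j, k: hypothetical landmark indices (for example hip, knee, ankle).
    """
    return [joint_angle(frame[i], frame[j], frame[k]) for frame in frames]
```

A full-body frame with dozens of coordinates is thereby replaced by one interpretable number per frame, which is the kind of redundancy reduction the paragraph above describes.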
Fig. 1 Landmarks detected using MediaPipe pose detection.
Another common technique for feature extraction is to develop new representations of the motions based on a collection of evaluation criteria that are specified for an exercise, or to use functional mapping. Similarly, dimensionality reduction is a frequently used processing step in feature engineering for removing highly correlated or unimportant dimensions from the data [27]-[29]. To reduce the dimensionality of movement data, Principal Component Analysis (PCA) and its variations are frequently used. Recent research has focused on automatic feature extraction from gathered rehabilitation data using machine learning models [30], [31]. Effective feature engineering gives these algorithms a better representation of the raw input data for movement evaluation, and it is an important factor in choosing the appropriate techniques.
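The PCA step mentioned above can be sketched in a few lines. The example below is an illustrative, standard-library-only version that finds just the first principal component by power iteration on the sample covariance matrix; the function name and the iteration count are assumptions, and a practical system would more likely rely on a numerical library. Power iteration can fail if the starting vector happens to be orthogonal to the dominant direction, which this sketch does not handle.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length feature vectors (at least two rows).
    direction: unit vector along which the centered data varies most.
    scores: projection of each centered row onto that direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # data has no variance along the current vector
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

Replacing each feature vector by its score along the first component (or the first few components) removes highly correlated dimensions while keeping most of the variance in the movement data.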
Artificial intelligence (AI) is a developing field of research that aims to create systems that resemble human intelligence, and it has applications in several fields, particularly in medicine. It is a field of computer science that can improve the healthcare system through new approaches and better-informed decisions that facilitate patient engagement. AI can be used to predict outcomes and automate decision-making based on patient data [32].

Rehabilitation Exercises Evaluation
Evaluation of rehabilitation exercises needs a set of methods and devices to achieve an efficient evaluation of treatment exercises. Researchers have generally worked in several directions, including research on motion sensors, the development of algorithms for the evaluation method, the use of the wide range of databases available for each evaluation case, and the use of smart systems to increase the efficiency of the evaluation process [33]. We briefly review the devices used and the developments in these fields.
An important part of home rehabilitation systems is improving them so that they can record human motion, analyze the recorded data automatically, and evaluate the recorded movements. These systems are developed using low-cost sensors together with algorithms that analyze and model movements and thus assess rehabilitation. The therapist concerned can also follow exercise recordings online and evaluate them to make the necessary corrections [34].
Traditional clinical evaluation of a patient's progress often relies on the physician observing the patient moving or performing an exercise [35]. Tests commonly used for this purpose include the Fugl-Meyer Assessment (FMA), the Wolf Motor Function Test (WMFT), the Jebsen-Taylor Hand Function Test (JTHFT), and the Optimum Motion Execution Ratio, which are case-specific tests [14], [36], [37]. The manual muscle test and the range of motion test are also used to evaluate the muscles and joints during the exercise.
While certain frequently used physical examinations or checking procedures are more objective and quantitative, others depend on a physician's judgment of the patient's performance. Clinical tests and examinations that are more subjective may suffer in accuracy and reliability, depending on the task, exercise, or activity being examined. On the sensing side of motion assessment, obtaining accurate motion data through motion sensors is essential.
This article reviews the measurement systems that have been used to evaluate rehabilitation exercises based on data captured from motion sensors. The computational methods for evaluating physical therapy exercises are classified into discrete movement score, rule-based, and template-based methods [38], [39]. Using discrete movement score techniques, each repetition of a rehabilitation activity is classified into specific categories, such as correct or incorrect. Machine learning classifiers are frequently used for this purpose, producing discrete class values of 0 or 1 (i.e., repetition done correctly or incorrectly). These techniques do not provide intermediate levels of movement quality (for example, scores between 0 and 1), so they are limited in their capability to detect minor variations in patient performance [40].
In the rule-based approach, experts identify motion descriptors (angles, joint positions, relative distances, velocities) that define the "motion sample". The rules serve as the benchmark for measuring the degree of correctness [41], [42]. These methods have the drawback that the rules are exercise-specific and cannot be applied to other activities [43]. In the template-based method, the motion sequence is first recorded, and the measured motions are then compared with a template of the original motions. The template is usually created by recording healthy participants completing the exercises correctly [44], [45]. Comparisons with the template are made either through action-matching methods or through machine learning systems, which can be easily applied to various types of exercise [46]-[48]. One group of related methods uses distance functions to obtain a matching score between the repetitions performed by the patient and the reference template repetitions. The most often used functions for this purpose are dynamic time warping distances, Euclidean distances, and Mahalanobis distances [49]-[52].
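To make the distance-based matching concrete, the following sketch computes a dynamic time warping distance between a patient repetition and a template repetition, each represented as a one-dimensional sequence such as a joint angle over time. It is the textbook dynamic-programming formulation with an absolute-difference local cost and no windowing constraint; the function name is an assumption and the code is not taken from any of the cited works.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    samples of seq_a with the first j samples of seq_b.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

Unlike a sample-by-sample Euclidean distance, this measure tolerates a patient performing the same movement more slowly or quickly than the template, because samples may be aligned one-to-many.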
The evaluation of rehabilitation is organized in this review in two sections: Section 2 reviews the methods for assessing patient motions as they perform rehabilitation exercises, while Section 3 presents the most relevant research on physiotherapy evaluation using deep learning.

EVALUATION METHODS
Successful assessment of motion in rehabilitation systems requires accurately quantifying how well the therapeutic exercises are executed, using recorded movement data. The methods for assessing rehabilitation exercises are mostly divided into three categories: discrete movement score methods, rule-based methods, and template-based methods.

Discrete Movement Score Methods
The approaches in this group classify each repetition of a single exercise into different classes. Correct and incorrect movements are the most frequent classes. For example, the outputs are frequently binary class values of 0 for an incorrect repetition or 1 for a correct one.
To distinguish between the two groups, the AdaBoost classifier, Bayesian classifier, k-nearest neighbors, and multi-layer perceptron neural networks have all been used. For example, dimensionality reduction by PCA and noise filtering of the data were applied before using the k-nearest neighbors classifier for exercise categorization. Machine learning classification was also used in earlier studies for grouping motions into score classes. For instance, the automated Fugl-Meyer Assessment (FMA) used Support Vector Machines (SVM), artificial neural networks, and random forest; the Wolf Motor Function Test (WMFT) used a naive Bayes classifier; and the Functional Ability Scale (FAS) evaluation used random forest [53], [54].
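A discrete movement score of this kind can be illustrated with a very small k-nearest neighbors classifier. In the sketch below, each repetition is summarized by a hypothetical feature vector (for example, a peak joint angle and a duration), and the label is 1 for a correct repetition and 0 for an incorrect one. The feature choice and function name are assumptions for illustration; features would normally be scaled or reduced (for example by PCA) beforehand so that no single feature dominates the distance.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train_x: list of feature vectors, one per labeled repetition.
    train_y: list of labels (1 = correct repetition, 0 = incorrect).
    query:   feature vector of the repetition to score.
    """
    # Pair each Euclidean distance with its label and sort by distance.
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]
```

The output is strictly 0 or 1, which shows why such methods separate correct from incorrect repetitions well but carry no information about how close a repetition was to being correct.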
Studies using discrete movement scores have found them highly accurate in differentiating between correct and incorrect movement sequences. Despite this, these approaches all share the inability to track ongoing changes in the quality of movement or to measure the improvement in patient execution throughout the rehabilitation exercise. Therefore, discrete movement score categorization is less suitable for creating systems that quantify the effectiveness of rehabilitation.

Rule-based Methods
Rule-based techniques make use of a predetermined collection of rules for a rehabilitation activity, created beforehand by medical professionals or experts in human motion. The rules serve as a standard for determining the degree of movement correctness. Although a small number of rules, such as relative angles or distances, suffices to represent simple activities, a more comprehensive set of rules is required to represent more complex exercises. For example, the knee and ankle angles were used to evaluate the effectiveness of sit-to-stand and squat exercises [55]. W. Zhao et al. proposed three categories of kinematic rules to model rehabilitation exercises: rules for static postures, rules for dynamic movement, and rules for movement invariance. The efficiency of a rehabilitation program was then measured by one final score generated using fuzzy logic. For simpler exercises, rule-based approaches are a beneficial alternative; however, for more complicated forms of rehabilitation exercises, it becomes more challenging to extract trustworthy features and provide a meaningful rating. These systems also lack flexibility and the ability to generalize to new exercises because they require a new collection of rules chosen for each specific exercise [10].
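A rule-based score can be sketched as follows for a squat. Two hand-written rules are evaluated, one on knee flexion depth and one on trunk lean, each mapped to a degree of satisfaction between 0 and 1 by a linear ramp and then combined with a minimum, a common fuzzy-logic AND. All thresholds, rule choices, and names here are hypothetical and chosen only for illustration; they are not the rules of the cited studies, and a clinician would define the real ones.

```python
def ramp(value, full, zero):
    """Linear membership: 1.0 at `full`, 0.0 at `zero`, clamped outside."""
    if full == zero:
        raise ValueError("`full` and `zero` must differ")
    t = (value - zero) / (full - zero)
    return max(0.0, min(1.0, t))

def squat_score(knee_angles_deg, trunk_lean_deg):
    """Score one squat repetition in [0, 1] from two hypothetical rules.

    knee_angles_deg: knee angle per frame (180 = straight leg).
    trunk_lean_deg:  forward trunk lean per frame (0 = upright).
    """
    # Rule 1 (dynamic): the deepest knee angle should reach about 90 degrees;
    # no credit is given if it never drops below 140 degrees.
    depth = ramp(min(knee_angles_deg), full=90.0, zero=140.0)
    # Rule 2 (posture): trunk lean should stay under about 20 degrees;
    # no credit is given at 45 degrees or more.
    posture = ramp(max(trunk_lean_deg), full=20.0, zero=45.0)
    # Fuzzy AND: the repetition is only as good as its worst rule.
    return min(depth, posture)
```

The sketch also shows the limitation noted above: every threshold and rule is specific to the squat, so a different exercise needs a newly designed rule set.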

Template-based Methods
In template-based methods, the evaluation of the patient's exercise performance is based on the difference between the patient's training movement sequences and the template movement sequences. For instance, the template sequences may be reference movements carried out by healthy individuals, professionals, or patients who are being supervised by a clinician, while the training sequences are recorded while a patient does an exercise. Distance functions and probability density functions are the two types of metrics applied to determine motion matches in template-based methods [56]-[58].
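As a sketch of a distance-function metric, the example below builds a template from reference repetitions and scores a patient repetition by a Mahalanobis-style distance. For simplicity it assumes a diagonal covariance (features treated as uncorrelated), which is a simplification and not the general Mahalanobis distance; the feature vectors and function names are hypothetical.

```python
import math

def fit_template(reference_reps):
    """Per-feature mean and sample variance of the reference repetitions.

    reference_reps: list of equal-length feature vectors recorded from
    correctly performed repetitions (at least two).
    """
    n, d = len(reference_reps), len(reference_reps[0])
    means = [sum(r[j] for r in reference_reps) / n for j in range(d)]
    variances = [
        sum((r[j] - means[j]) ** 2 for r in reference_reps) / (n - 1)
        for j in range(d)
    ]
    return means, variances

def diagonal_mahalanobis(x, means, variances):
    """Distance of x from the template, in units of per-feature spread.

    Assumes every variance is non-zero and features are uncorrelated.
    """
    return math.sqrt(
        sum((xi - m) ** 2 / v for xi, m, v in zip(x, means, variances))
    )
```

Because each deviation is divided by the natural spread of the reference group, a patient is penalized only for deviations that exceed the variation seen among correct performances, which yields a continuous score rather than a binary label.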

ALGORITHMS OF AI IN THE MEDICAL FIELDS
AI technologies have recently been employed in the healthcare sector, which raises questions and discussions about whether AI doctors will eventually replace human doctors. In the near future, AI can assist physicians in making more accurate clinical decisions or sometimes substitute for human decisions in certain practical areas of healthcare.
Machine learning (ML) is an application of AI that gives systems the capacity to learn automatically from experience and improve over time without being explicitly programmed with previously chosen features. ML algorithms fall into two main divisions, supervised and unsupervised learning. Unsupervised learning is popular for feature extraction, whereas supervised learning is appropriate for predictive modeling, establishing correlations between the characteristics of the patient and the desired result [59], [60].
One tool of AI that researchers have frequently used for rehabilitation assessment is the Artificial Neural Network (ANN). An ANN can be visualized as a network of nodes connected in layers, where each node's output feeds into the next processing layer's input. The most prevalent of these is the Multilayer Perceptron (MLP).
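The layered structure described above can be written out directly. The sketch below performs the forward pass of a small multilayer perceptron, where each layer is a set of weighted sums followed by an activation function and the output of one layer is the input of the next. The weights are supplied by the caller; training by backpropagation is omitted, and all names are illustrative assumptions.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: each node is activation(w . x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(features, layers):
    """Feed `features` through the layers in order.

    layers: list of (weights, biases, activation) tuples, where `weights`
    has one row per node in that layer.
    """
    out = features
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out
```

With a sigmoid on the final node, the output lies between 0 and 1 and can be read as the network's estimate that a repetition was performed correctly.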
Convolutional Neural Network (CNN) architectures use a variety of building blocks, including convolution layers, pooling layers, and fully-connected layers, to learn spatial feature hierarchies efficiently and automatically by backpropagation (see Fig. 2). Although CNNs have achieved remarkable accuracy in the field of computer vision, this did not carry over directly to the time-series format in which the sensor data is organized [61]. Despite the crucial contribution that rehabilitation assessment makes to better patient outcomes and lower healthcare costs, the current methodologies lack adaptability, robustness, and applicability in real-world settings. One proposed solution to these drawbacks is Deep Learning (DL). Deep learning approaches for automating the evaluation of the effectiveness of physical rehabilitation exercises have been presented by many researchers, as described below.
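The convolution and pooling building blocks can be illustrated on a one-dimensional signal, which is closer to sensor time series than to images. The sketch below is only a toy: the kernel is fixed by hand, whereas in a CNN the kernel values are learned by backpropagation, and, as in most deep learning frameworks, the "convolution" is implemented as a sliding dot product without flipping the kernel. The function names are assumptions.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(values, size):
    """Keep the maximum of each non-overlapping window of length `size`."""
    return [
        max(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]
```

For a joint-angle sequence that rises and then falls, the kernel `[-1, 0, 1]` responds positively while the angle increases and negatively while it decreases, and pooling then summarizes those responses at a coarser time scale, which is the hierarchy of features the paragraph above refers to.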
Yalin L. et al. presented a framework whose key elements include metrics for measuring movement performance and a scoring function for converting the performance metrics into numerical ratings of movement quality. DL models are used to produce quality assessments of input motions through supervised learning. The benefit of deep neural networks for this task comes from their ability to represent human movements hierarchically at various spatial and temporal levels of abstraction [62].
The active range of motion (ARoM) of the fingers is currently measured with a hand-held goniometer, which introduces measurement error; the same applies to knee motion measurement. Therefore, CNNs are used to measure the ARoM of the fingers and the range of motion of the knees to properly evaluate results following hand surgery and throughout rehabilitation. Aws et al. proposed a classifier that automatically determines the hand attitude by moving an Intel® RealSense™ Camera (SR300) as needed by each algorithm to provide the best view. In tests of four different classifiers, AlexNet, Support Vector Machine (SVM), Speeded-Up Robust Features with SVM (SURF-SVM), and the proposed CNN, the designed CNN achieved 99% accuracy in classification [63], [64].
Human pose estimation approaches have been used in several fields, including human-computer interaction, entertainment, surveillance, human motion analysis, healthcare, and robotics. However, there have been few works in recent years on CNN-based 3D human pose estimation from depth (see Fig. 3), and little 3D data is available. Earlier approaches have not been able to outperform the current ones, in part because depth maps are used as simple 2D single-channel images rather than a true 3D world representation [65].
Fig. 3 3D skeleton for different exercises.
Manolis et al. [66] presented a 3D-CNN detection architecture for 3D human pose estimation from 3D data to get around this constraint and take into account recent developments in similar 3D detection tasks. The system can estimate numerous human poses simultaneously, and its running speed is not affected by the number of people in the scene. A 3D CNN that estimates hand poses from single depth images is robust to differences in hand size and global orientation.
Jun et al. [67] proposed a feature-boosting network for estimating both the 3D hand pose and the 3D body pose using a single RGB image. The Long-Short-Term Dependence-aware (LSTD) module augments the features learned by the convolutional layers and allows the intermediate convolutional feature maps to understand the graphical long-short-term dependency among various hand (or body) parts using the created Graphical ConvLSTM.
Vakanski et al. [68] proposed a deep learning approach that learns a general representation from a huge corpus of motion capture data and extrapolates effectively to new, unseen motions. They use recent developments in machine learning and recurrent neural networks (RNN and LSTM) to construct a parametric model of human motions, exploiting the representational power of these techniques to capture nonlinear input-output correlations over long temporal horizons. RNN and LSTM networks generate their current response by considering the current input and the previous output, and they can also remember the previous input sequence [69]. Since skeletal data is structured differently, they propose and assess various network topologies that depend on various assumptions regarding time dependencies and limb correlations.
Vol. 28, No. 2, September 2023, pp. 237-251
Sardari et al. [70] proposed a view-invariant RGB-based method to evaluate the quality of human motion. Quality of Movement Assessment for Rehabilitation (QMAR), the only multi-view, non-skeleton, non-mocap (motion capture system) rehabilitation movement dataset, was used to compare methods and assess how well the method works.
Most research works have developed probability density functions for modeling and assessing rehabilitation programs, since probabilistic models are capable of dealing with the stochastic variation of human motion. For instance, movement quality has been evaluated using the log-likelihood of individual sequences under a trained Gaussian mixture model [71]. Human motion data was analyzed and segmented for therapeutic exercises using discrete hidden Markov models (HMM). These methods were assessed using data from exercises frequently done following hip or knee replacement surgery [72]. For example, Marianna et al. [73] presented a methodology based on hidden semi-Markov models (HSMM) to assess five different exercise programs and generate a score. The limitations of splitting the exercises into particular repetitions with a discrete HMM or HSMM were overcome by using a continuous HMM.
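The log-likelihood scoring idea can be sketched for a one-dimensional Gaussian mixture model. The example assumes the mixture parameters have already been fitted to correctly performed repetitions (the fitting step, typically expectation-maximization, is omitted) and simply evaluates how probable a new sequence of feature values is under that model. The parameter values used with it would be hypothetical, and samples are treated as independent, which is a simplification.

```python
import math

def gmm_log_likelihood(samples, weights, means, variances):
    """Sum of log densities of `samples` under a 1-D Gaussian mixture.

    weights, means, variances: one entry per mixture component; the weights
    are expected to sum to 1 and the variances to be positive.
    """
    total = 0.0
    for x in samples:
        density = sum(
            w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        # Note: for samples extremely far from every component the density
        # can underflow to 0.0, in which case math.log raises ValueError.
        total += math.log(density)
    return total
```

A repetition that stays close to the modeled behaviour receives a higher (less negative) log-likelihood than one that deviates from it, so the value can serve as a continuous quality score.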
Parkinson's disease (PD) is a neurological condition that worsens over time and causes severe mobility loss. Although both drugs and surgery may improve various aspects of movement, exercise is receiving attention as an additional option that may help enhance mobility as well as non-motor symptoms relating to cognition and emotion [74].
Several studies [70], [75]-[79] have developed automated systems to assess the rehabilitation of Parkinson's and stroke patients. For example, Wenchuan et al. [77] proposed a virtual Physical Therapist (PT) system that uses machine learning to offer individualized remote training for PD patients. To assist individuals with PD in gaining better balance and mobility, three physical therapy exercises with varying degrees of difficulty were chosen. A Kinect sensor records patient movement. A PT carefully crafts the criteria for each task so that the performance of the patients can be assessed automatically. They provide a Two-Phase Human Action Understanding (TPHAU) algorithm to interpret the patient's movements from the motion data, as well as an error identification model to identify the patient's movement errors.
The year 2020 marked an important turning point for the entire world, requiring everyone to change their daily routines due to the pandemic. According to the Eurofound study conducted in July 2020, 33.7% of European Union (EU) employees worked at home full-time [80], while an additional 14.2% worked at home part-time. On the other hand, spending a lot of time in front of a computer can result in injuries to the spine, neck, shoulders, arms, and/or wrists; these injuries frequently result from incorrect postures in front of the computer [81]. For this reason, Enrique Piñero-Fuentes et al. [82] proposed a CNN-based system for detecting a person's posture. This system used specialized hardware to process video in real time. According to the results, posture detection accuracy was over 80% for the original 4-class problem and over 90% for the 2-class classification system.
Additionally, childhood skill acquisition is crucial for both physical and psychological development. Many physiometry tests are designed to evaluate a child's performance, but the evaluation process is a tedious manual task. Satoshi Suzuki et al. [83] presented a CNN-based deep learning network to classify and evaluate the Gross Motor (GM) skills of children. The methods were applied to actual GM evaluation, including 13 types of GM motions with 155 combinations of GM assessment scores. A summary of the use of deep learning in the evaluation of physical rehabilitation is provided in Table 1.

DISCUSSION
The feasibility of rehabilitation evaluation technology has been supported by the most relevant research in the literature, as it can have a positive impact on patients as well as healthcare systems.
This paper presents a review of computational techniques for monitoring patient performance during rehabilitation exercises, focusing on machine learning techniques to assess the quality of patient movements performed at home. The relevant evaluation methods are classified into three main categories: discrete movement score, rule-based, and template-based methods. The discrete movement score is effective and obtains high precision but is not able to measure varying levels of functional capability. Rule-based methods provide various performance scores with less complicated computation, but a new rule must be designed for every new exercise. Finally, the template-based method does not require rules to be designed, as it gives a final score for the performance.
In earlier studies, important motion characteristics were identified manually. These techniques have drawbacks because manual extraction of features from motion data requires expert knowledge of kinematics, and the resulting characteristics cannot be applied to different exercises. This is why many researchers have turned to deep learning.
Deep learning techniques are a desirable approach to motion modeling and analysis because they can capture highly nonlinear interactions between groups of variables and they can encode data features at many levels.
Table 1 shows that CNN algorithms are the most widely used in evaluation methods, at approximately 44% compared with the other methods (see Fig. 4). The accuracy of CNNs in evaluation systems is approximately 77%-99%, depending on the application and the dataset used. In the reviewed studies, CNNs have been applied to several standard datasets such as KIMORE, UTD-MHAD, and UI-PRMD, as well as other collected data, as shown in Fig. 5. KIMORE is the most used dataset, appearing in 38% of the reviewed studies.
Different types of motion capture sensors were used in the earlier studies, as shown in Fig. 6. Kinect v1 and v2 are the most used sensors at 40%, followed by inertial sensors at 35%, the Leap Motion Controller at 15%, and Intel RealSense at 10%.

CONCLUSION
The evaluation methods to assess rehabilitation exercises are classified into three main categories: discrete movement score, rule-based, and template-based methods. The discrete movement score is preferred for systems that require higher accuracy. The rule-based method is used for systems that produce various scores with simple computation, while the template-based method is used for systems that indicate the level of physical skill.
It can be concluded that CNN algorithms are the most widely used in evaluation. This is due to several reasons, including the ease of applying these techniques to different types of data for various therapeutic exercises and their ability to classify and detect a wide range of data. The KIMORE dataset is the most preferred in assessment, as it contains a large heterogeneous population and various exercises for different groups: healthy subjects and subjects with pain and disorders.
The RealSense camera can be seen as a similar, if not preferable, alternative to Microsoft Kinect due to its capability for highly accurate sampling when tracking the 3D coordinates of body joints, especially during precise or fast movements. It is therefore seen as a suitable system for research and development in the healthcare sector, as it can track movement during therapeutic exercises and track improvements during rehabilitation simply by performing these tests in small spaces, without attaching sensors or tools to the patient.

Table 1
Summary of using Deep Learning in the evaluation of physical rehabilitation.