
Beryl: Motion Capture Analysis

Cezar Babin
cezbabin@seas.upenn.edu
University of Pennsylvania
Philadelphia, PA

Fahim Abouelfadl
fahim@seas.upenn.edu
University of Pennsylvania
Philadelphia, PA

ABSTRACT

The goal of this project was to perform analysis on human motion capture data and provide
useful insight for applications necessitating movement correction. We built a library of tools that
could parse .bvh files and then visualize, filter, compress and structure the data so that it can be
analyzed with traditional data analysis techniques.

Equipped with this custom set of tools, we sought to demonstrate the application of motion
capture to the field of sports analytics. We built several classifiers that could determine the
success of a free-throw shot in basketball by making use of input parameters that consisted of
key features related to the shooting motion. These tools allowed us to investigate the various
attributes that lead to a shot's success and assess their importance based on the individual
performing the shot. The work conducted here serves as a strong base for developing programs that
can provide customized coaching feedback for individuals attempting to strengthen their
free-throw shot, and can be used as a framework for conducting motion capture analysis in other
areas relating to sports.

1. INTRODUCTION

Motion capture is the process or technique of recording patterns of movement digitally.


Traditionally, motion capture has been most closely associated with the entertainment sector, and
progress in the field has largely been motivated by a desire to create more immersive experiences
for gamers and by a desire to create computer-generated characters that are increasingly realistic
in the film industry. In addition to its prominent role in the entertainment sector, motion capture
has a variety of other applications in the fields of sports analytics, medicine, and scientific
research.

The primary goal of this project is to explore the intricacies associated with the collection and
analysis of motion capture data. Many of the tools required for the compilation and basic
analysis of motion capture data are consistent across the various applications. As a result, a large
portion of this project was dedicated to building a toolbox of operations that play an essential
role in reaching tangible results and identifying interesting patterns in a variety of motion
capture use cases. To demonstrate the utility of such a toolkit, the remainder of the project was
dedicated to applying these tools to the field of sports analytics.

A free throw in basketball is an unimpeded attempt at a basket awarded to a player following a
foul or other infringement. When awarded a free throw, a player lines up at the foul line and
must remain behind the line for the entirety of the shot. Improving one's free throw accuracy is
important at every level of basketball, as free throws account for approximately 20% of all points
scored in an NBA game.

Attempts have been made in the past to quantify the features that contribute to the success of a
free throw. There are a multitude of views as to which of these features actually play a role in
determining the success of a free throw, and even when assessing a specific feature, there is a
lack of consensus as to what the optimal value should be. Due to the conflicting opinions about
the aspects of a player's shot that determine its success, our team wondered if it would be
possible to predict whether or not a shot was made based on the shooter's motion. The aim of
this portion of the project was to build a binary classifier that can compare the motion associated
with a given shot to a database and make such a prediction. We further analyzed how such a
classifier could be used to determine the importance of certain features based on their impact on
the accuracy of the classifier.

2. APPROACH

The first step in our project was to be able to parse the .bvh file format and convert all the data
into a format that would be usable for our analyses. The typical .bvh file structure consists of a
hierarchy at the beginning of the file that states which joints are connected and how they relate to
one another through parent-child relationships. Following this hierarchy is a specification of the
total number of frames in the recording and the length of each frame in seconds. The .bvh file
then contains information that specifies the movement of a given joint in relation to its parent
joint. In order to determine the absolute position of the joints, we had to compute the
transformation matrices and apply them in a manner that reflected the hierarchy. The .bvh
format admits variations in the way the joints are structured hierarchically.
In our case, the format of the data that we collected from the online CMU database was
structured in a manner that differed from the structure of the data captured using our hardware.
Such differences had to be accounted for in our parsing tool to enable others to make use of our
software independently of the structure of their .bvh files. Because the motion of a given joint is
recorded relative to that of its parent, and because some .bvh files carry only two decimal places
of precision, values recorded for joints lower in the hierarchy were inherently less accurate: the
rounding errors associated with the motion of the parent joints compound down the chain.
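
As a rough illustration of the forward-kinematics step described above, the following sketch
accumulates absolute joint positions down the hierarchy. The `joint` dictionary layout and the
Z-X-Y Euler order are illustrative assumptions (BVH files declare their own channel order), and
the root's per-frame translation is omitted for brevity:

```python
import numpy as np

def euler_to_matrix(z, x, y):
    """Rotation matrix from Z, X, Y Euler angles in degrees (a common BVH order)."""
    z, x, y = np.radians([z, x, y])
    Rz = np.array([[np.cos(z), -np.sin(z), 0],
                   [np.sin(z),  np.cos(z), 0],
                   [0,          0,         1]])
    Rx = np.array([[1, 0,          0],
                   [0, np.cos(x), -np.sin(x)],
                   [0, np.sin(x),  np.cos(x)]])
    Ry = np.array([[ np.cos(y), 0, np.sin(y)],
                   [ 0,         1, 0],
                   [-np.sin(y), 0, np.cos(y)]])
    return Rz @ Rx @ Ry

def absolute_positions(joint, parent_rot=np.eye(3), parent_pos=np.zeros(3), out=None):
    """Walk a parsed hierarchy and accumulate each joint's world-space position.

    `joint` is assumed to be a dict with keys 'name', 'offset' (3-vector),
    'rotation' (Z, X, Y angles in degrees for the current frame), and 'children'.
    """
    if out is None:
        out = {}
    pos = parent_pos + parent_rot @ np.asarray(joint['offset'], dtype=float)
    rot = parent_rot @ euler_to_matrix(*joint['rotation'])
    out[joint['name']] = pos
    for child in joint['children']:
        absolute_positions(child, rot, pos, out)
    return out
```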

Our next step was to record a shooter's movement and the result of each shot in order to identify
any patterns that may exist. To perform the motion capture, we used a suit that consisted of nine
inertial measurement unit (IMU) sensors that were connected to one another and that transmitted
data wirelessly to a computer. The nine sensors were placed on the left/right wrist, the left/right
forearm, the left/right upper arm, the left/right shoulder, and the head.

Figure 1. A single IMU sensor
Figure 2. The complete system of sensors used

Each of our data capture sessions involved a continuous recording of 10 free-throw shots at a
time. We limited each recording session to 10 shots in order to minimize the effects of positional
drift, which can compound over time to result in erroneous data. On the other hand, each session
required a fairly cumbersome calibration process, so having fewer shots per session would not
have been optimal in terms of time. A set of predefined actions was performed at the start of
each recording session to provide a simple way of resolving any inconsistencies and
standardizing the data captured across different sessions. In order to classify the shots, we needed
to extract each individual shot from the session.

The activity of segmenting motion capture sessions that consist of a repeated movement into
each individual movement can be used for any repeated action, and not just in this specific
application to free throws. As a result, we wanted to build a tool that would help automate
segmentation for any motion that is repeated, such as segmenting a person's walk into individual
steps, or segmenting the motion of a person jumping rope into each individual jump. We thus
decided to build a cycle detection tool. The input to the tool is the number of distinct repeating
events (number of total free throws), the approximate frame-length of each event, and a stream
of the recorded data that consists of the repeated motion. The tool outputs a set of tuples, each of
which contains the number of the frame where each repeating event starts and ends. In order to
estimate the length of the event we manually looked at a couple of shots and arrived at an
approximation of ~150 frames. Sometimes, the subject would dribble the ball before shooting or
stretch his arms. As a result, the cycle detection tool would output erroneous results that had to
be manually adjusted.
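
A minimal sketch of the idea behind the tool, assuming the stream has been collapsed into a
one-dimensional per-frame activity signal (for instance, the summed joint speeds); the exact
heuristics of our implementation differ:

```python
import numpy as np
from scipy.signal import find_peaks

def detect_cycles(activity, n_events, approx_len):
    """Return (start, end) frame tuples for n_events repetitions of a motion.

    `activity` is a 1-D per-frame measure of movement intensity; each event is
    assumed to produce one dominant peak (e.g. the explosive part of a shot).
    """
    # A minimum peak spacing prevents two detections inside a single event.
    peaks, _ = find_peaks(activity, distance=max(1, int(0.7 * approx_len)))
    # Keep the n_events strongest peaks, then restore temporal order.
    strongest = np.sort(peaks[np.argsort(activity[peaks])[-n_events:]])
    half = approx_len // 2
    return [(max(0, p - half), min(len(activity) - 1, p + half))
            for p in strongest]
```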

Naturally, some shots took longer than others so it was important to restructure the shots of
varying lengths into a standardized format that made each shot comparable to all other shots. We
thus decided to build a keyframe reduction tool. The tool works by averaging out certain data
points of shots in order to give each shot an equal number of intervals. This method has the
drawback of abstracting away certain useful features that could be used for classification such as
the speed of release. We decided that this drawback was acceptable for this portion of our
analysis, as we could later determine the impact of omitting these features on our classifier, and
we would be able to experiment with different methods of keyframe reduction to eventually use
the one that returns the best results.
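
A sketch of this binning approach, assuming each shot is stored as a (frames, features) array of
joint coordinates; our tool additionally experimented with other reduction schemes:

```python
import numpy as np

def reduce_keyframes(shot, n_intervals=50):
    """Collapse a (frames, features) shot into n_intervals bins by averaging.

    Shots of different lengths come out as identical (n_intervals, features)
    arrays, at the cost of smoothing away timing cues such as release speed.
    """
    shot = np.asarray(shot, dtype=float)
    bins = np.array_split(np.arange(len(shot)), n_intervals)
    return np.stack([shot[idx].mean(axis=0) for idx in bins])
```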

Afterwards, we noticed that the shots were not perfectly aligned. This means that even though
each shot had an equal number of intervals, the aspect of the shot that was stored in each interval
varied across shots. We thus decided to build an alignment tool. Initially, we aligned the shots
by determining where the explosive movement is, and then adjusted each of the shot sequences
so that the release of the ball took place at around frame number 70 for every single shot (see
Appendix A). This decision was based on the hypothesis that the activity in the joints in the time
preceding the shot is a lot more influential in the outcome of the shot than the activity that takes
place after the basketball leaves the hands of the subject.
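
The alignment step can be sketched as follows, assuming the release corresponds to the frame of
peak overall speed; the target frame and the edge-padding strategy are illustrative choices:

```python
import numpy as np

def align_to_release(shot, target_frame=70):
    """Shift a (frames, features) shot so that its most explosive frame
    (taken here as the peak of overall speed) lands on target_frame."""
    speed = np.linalg.norm(np.diff(shot, axis=0), axis=1)
    shift = target_frame - int(np.argmax(speed))
    if shift > 0:   # release too early: pad the front with the first frame
        padded = np.vstack([np.repeat(shot[:1], shift, axis=0), shot])
    else:           # release too late: drop leading frames, pad the back
        padded = np.vstack([shot[-shift:], np.repeat(shot[-1:], -shift, axis=0)])
    return padded[:len(shot)]
```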

One of the most important aspects that we wanted to analyze in our project was how different
features affect the outcome of the shot. In order to do so, we needed to be able to write a method
that takes a given feature as input and extracts that feature by performing a transformation on our
joint positional data. The transformed positional data is then passed in as a parameter to the
classifier. As a result, we simplified and abstracted the feature computation so that we could test
different evaluation features on the spot. One feature that we decided to extract was the area of a
triangle formed by representing a subset of the joints as vertices. This area was computed by
using the 3-dimensional positions of each relevant joint. The area was extracted across time as a
means of evaluating the impact of the distance between joints on the accuracy of a shot. We were
inspired by a paper that used triangular mapping for gait identification [6].

Figure 3. An example of triangular mapping using the elbow, knee, and ankle joints
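
A minimal sketch of this per-frame area computation, given (frames, 3) arrays of absolute joint
positions produced by the parsing step; the joints named in the usage line are only an example:

```python
import numpy as np

def triangle_area_over_time(a, b, c):
    """Per-frame area of the triangle spanned by three joints.

    a, b, c are (frames, 3) position arrays; the area is half the magnitude
    of the cross product of two edge vectors of the triangle.
    """
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)

# e.g. feature = triangle_area_over_time(shoulder_xyz, elbow_xyz, wrist_xyz)
```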

After preparing all the data for analysis and building the code so that we could change
parameters, we manually mapped each shot to a binary outcome depending on whether the shot
was made or not. For training the classifier we used the leave-one-out approach, where we
would exclude the shot we were trying to classify and use all other shots as a reference.
Due to the imprecision of our sensors, and the fact that we only had access to a total of nine
sensors, we knew that our resulting classification accuracy would be low. In order to account for
these inadequacies, we used principal component analysis (PCA) as a benchmark for our
accuracy. More specifically, we used the x,y,z position/velocity matrices across time and we
classified them after we determined their principal components. Upon calculating and
incorporating our feature computations, we tested them and compared them to the benchmark.
Features that improved our accuracy were considered relevant, and those that decreased our
accuracy were considered irrelevant. Our goal was to be able to give suggestions to a user as to
how to improve their shot by computing the relevance of certain features using the classifier
accuracy. A potential user of our software could write their own Python code that computes a
certain parameter and incorporate it as an additional criterion for evaluating the free-throw shot.
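
The evaluation loop can be sketched as below, using scikit-learn equivalents of our components;
the number of principal components and neighbors shown are placeholder values, not our tuned
settings:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

def loo_accuracy(features, outcomes, n_components=10, k=3):
    """Leave-one-out accuracy of KNN on PCA-reduced features.

    `features` is a (shots, dims) matrix with one flattened shot per row;
    `outcomes` holds the 0/1 made-or-missed label of each shot.
    """
    hits = 0
    for train, test in LeaveOneOut().split(features):
        pca = PCA(n_components=n_components).fit(features[train])
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(pca.transform(features[train]), outcomes[train])
        hits += int(clf.predict(pca.transform(features[test]))[0] == outcomes[test][0])
    return hits / len(features)
```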

3. RESULTS & MEASUREMENTS

Our first task in assessing the viability of classifying the success of a given free throw attempt
based on its associated motion was to determine the type of classifier to focus on. We
experimented with a variety of classifiers including a k-nearest neighbor (KNN) classifier, a
support vector machine (SVM), a decision tree classifier, a random forest classifier, and others.
With regard to the KNN classifier, we used a variety of values for k, from a minimum of 1 to a
maximum of 30. Figure 4 reports the accuracy of the various classifiers for three different
feature extraction methods: the PCA reduction of the x,y,z positions across time, the
root-mean-square (RMS) of the x,y,z positions, and the triangular mapping feature. The results
displayed indicate that the most accurate classifications occurred when we used the various KNN
classifiers and the SVM, thus we decided to focus on these two types of classifiers for the
remainder of our analysis.
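
For reference, the bank of candidates in Figure 4 could be assembled with scikit-learn roughly as
follows; the default hyperparameters shown are a simplification, our custom NN variant is
omitted, and each model is then scored with the same leave-one-out procedure:

```python
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One entry per row of Figure 4.
candidates = {f"KNN{k}": KNeighborsClassifier(n_neighbors=k)
              for k in (1, 2, 3, 4, 5, 10, 15, 20, 25, 30)}
candidates.update({
    "SVM": SVC(),
    "DTREE": DecisionTreeClassifier(),
    "RFOREST": RandomForestClassifier(),
    "ADABOOST": AdaBoostClassifier(),
    "GAUSS": GaussianNB(),
    "LINEARD": LinearDiscriminantAnalysis(),
    "QUADRATIC": QuadraticDiscriminantAnalysis(),
})
```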

We initially chose PCA as the benchmark because we assumed that it would give us the most
accurate results. We also built our own classifier, denoted NN in the table, which is a
modification of the KNN algorithm and performed relatively well. The triangular mapping
feature extraction performs particularly well on KNN with fewer than 4 neighbors.

Figure 4. Accuracy of various classifiers when given a feature as input


Classifier    PCA of (x,y,z)           RMS           Triangular Mapping
              coordinates across time

KNN1          0.523529412              0.470588235   0.617647059
KNN2          0.535294118              0.529411765   0.652941176
KNN3          0.529411765              0.488235294   0.629411765
KNN4          0.552941176              0.564705882   0.635294118
KNN5          0.5                      0.529411765   0.594117647
KNN10         0.588235294              0.623529412   0.6
KNN15         0.588235294              0.576470588   0.617647059
KNN20         0.617647059              0.576470588   0.605882353
KNN25         0.617647059              0.6           0.570588235
KNN30         0.617647059              0.617647059   0.594117647
SVM           0.617647059              0.6           0.617647059
DTREE         0.535294118              0.552941176   0.582352941
RFOREST       0.558823529              0.588235294   0.547058824
ADABOOST      0.517647059              0.511764706   0.547058824
GAUSS         0.394117647              0.611764706   0.582352941
LINEARD       0.529411765              0.5           0.564705882
QUADRATIC     0.517647059              0.582352941   0.576470588
NN            0.564705882              0.552941176   0.523529412

The triangular mapping algorithm, which computes the area of a triangle formed by right-arm
joints in 3D space, outperformed PCA and RMS as a feature extraction method for classification.
This result demonstrated that it is possible to outperform a purely numerical transformation
(PCA) with a qualitative metric of what a good shot actually is. We kept the PCA of (x,y,z) as
the benchmark for any newly implemented feature, even though the other methods gave higher
accuracies, because PCA is the industry standard for motion analysis and would be more
relevant to researchers trying to understand the problem. In addition to performing
approximately 5% better on the KNN classifiers and the SVM, triangular mapping improved the
average accuracy of all classifiers and decreased the standard deviation, as can be seen in
Figure 6. This reiterated that our feature extraction method is highly relevant in this context.

Figure 6. Accuracy of various classifiers when using different feature extraction methods
                     Benchmark PCA   RMS           Triangular Mapping

Avg. Accuracy        0.550326797     0.559803922   0.592156863
Max. Accuracy        0.617647059     0.623529412   0.652941176
Standard Deviation   0.05552049      0.046108484   0.033941797
KNN + SVM Avg.       0.571657754     0.561497326   0.612299465

Our next focus was to investigate the extent to which a relationship exists between the motion
preceding the shot and its success rate. As a benchmark, we used a 35-frame buffer surrounding
the shot, with 30 frames dedicated to the motion preceding the shot and 5 frames dedicated to
the motion following it. We compared the accuracy of the KNN classifiers and the SVM
classifier as we changed the number of frames that are used in representing the motion preceding
the shot. We ran the classifiers repeatedly while incrementing the number of frames used to
represent the motion preceding the shot by 10. As demonstrated in Figure 7, the majority of
classifiers had the highest accuracy when we dedicated 70 frames to the motion preceding the
shot. We can therefore conclude that the motion preceding the shot does have an impact on its
likelihood of being successful.
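
The sweep can be sketched as below, reusing the `loo_accuracy` helper from above; `aligned` is
assumed to be the (shots, frames, features) stack produced by the alignment step, with the
release at frame 70, and `outcomes` the made/missed labels:

```python
import numpy as np

def window_features(aligned, release_frame=70, pre=30, post=5):
    """Flatten each shot to the frames from `pre` before to `post` after release."""
    lo, hi = release_frame - pre, release_frame + post
    return aligned[:, lo:hi, :].reshape(len(aligned), -1)

# Hold the post-release window at 5 frames and grow the pre-release window.
for pre in range(30, 71, 10):
    print(pre, loo_accuracy(window_features(aligned, pre=pre), outcomes))
```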

Figure 7. Accuracy of various classifiers when incrementing the number of frames allocated to
the motion preceding the shot

We furthered our investigation of the triangular mapping feature to ensure that the improvement
in accuracy attributed to its use was well-founded, and not merely based on it outperforming the
other feature extraction methods on KNN and SVM. To verify our previous result, we compared the
accuracy of the classifiers when using triangular mapping on the shooting side as a feature, as
opposed to their accuracy when using triangular mapping on the non-shooting side. As shown in
Figure 8, the classifiers using the benchmark PCA and the triangular mapping for the
non-shooting side yielded relatively similar accuracies, while the classifiers using the triangular
mapping for the shooting side yielded noticeably higher accuracies. This leads us to the
conclusion that the triangular mapping of the shooting side is a feature that correlates with the
accuracy of the shot, while the triangular mapping of the non-shooting side does not.

Figure 8. Accuracy of classifiers when varying the side used for the triangular mapping
feature

The final part of our analysis consisted of determining whether or not having a smaller number
of sensors would greatly impact the results of our classifiers. Naturally, one would assume that
the sensors placed on the shooting side of the subject would be most relevant, and so we
investigated the difference in accuracy when omitting the data collected from sensors placed
elsewhere on the body. We first omitted the data collected from the sensor on the subject's head,
then omitted the data from the sensors on the head and non-shooting shoulder, then additionally
omitted the data from the sensor on the non-shooting arm, and finally omitted the data from the
sensor on the non-shooting hand in addition to those previously omitted. For all the classifiers in
this part, we used the optimal 75-frame buffer surrounding the shot: 70 frames dedicated to the
motion preceding the shot and 5 frames to the motion following it.
From Figure 9, we can see that the omission of the data collected from the head resulted in an
improvement in the accuracy of almost all the classifiers, which is intuitive given that the
motion of the head should not correlate with the success of the shot. Omitting the data collected
from the sensors on the non-shooting side resulted in a decrease in accuracy across the board,
and this decrease was compounded as we omitted the data collected
from additional sensors on that side. It is worth noting that for the majority of classifiers, the
decrease in the accuracy was rather marginal as can be seen in Figure 10. This means that a
future researcher attempting to conduct similar experiments with a more limited quantity of
sensors should still be able to extract meaningful data that can be used for analyzing and
classifying the success of free throw shots.
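
Sensor omission amounts to dropping the corresponding coordinate columns before reclassifying,
which might look like the following sketch; `joint_names` is assumed to list the joints in the
column order of the data, and the names in the usage line are illustrative:

```python
import numpy as np

def drop_joints(stack, joint_names, omit):
    """Remove the (x, y, z) columns of omitted joints from a
    (shots, frames, 3 * n_joints) stack."""
    keep = [i for i, name in enumerate(joint_names) if name not in omit]
    cols = np.concatenate([[3 * i, 3 * i + 1, 3 * i + 2] for i in keep])
    return stack[:, :, cols]

# e.g. drop_joints(stack, joint_names, omit={"head", "l_shoulder"})
```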

Figure 9. The impact of joint removal on accuracy

Figure 10. The impact of joint removal on accuracy


                     None          Head          Previous &    Previous &    Previous &
                                                 L. Shoulder   L. Arm        L. Hand

Avg. Accuracy        0.574509804   0.569934641   0.568627451   0.564052288   0.570261438
Max. Accuracy        0.629411765   0.635294118   0.635294118   0.623529412   0.617647059
Standard Deviation   0.033701072   0.045693378   0.04644934    0.046706679   0.036899599
KNN + SVM Avg.       0.580748663   0.593048128   0.593048128   0.577005348   0.576470588

4. ETHICAL/PRIVACY CONSIDERATIONS

The research conducted and the tools used in this project do not pose ethical or privacy issues,
nor do they solve any such existing issues. This would hold true even if the research were
conducted on a larger scale involving a greater number of subjects. Any data collection
conducted with the intention of compiling a large database to be used for an analysis of the
biomechanics of a free throw would not contain the identity of the subjects, thus alleviating any
privacy concerns that may arise. The project does not collect or deal with any private
information, and thus there is no risk of misusing or misplacing such information.

5. DISCUSSION

One of the most important lessons of this study was that the data capture process is key to
producing meaningful results in motion analysis. More specifically, in studying the correlation
between body motion and the outcome of a free throw, each individual shot has to be preceded
by a calibration motion and follow a carefully chosen protocol. Our work
would have been greatly simplified if, instead of segmenting the data based on the basketball
leaving the hands, we had used the predetermined motions as indicators of the start and end of a
shot. Additionally, the data collection process would benefit from a camera that records the
trajectory of the ball and whether it enters the hoop. These additional steps would help
differentiate clean shots from shots that go in only after bouncing off the rim or the backboard.
Using the same shot pattern would limit the degrees of freedom of each shot and make the shots
more comparable. The subject's movement should be limited to the shot so that anything that is
not a shot can be easily identified. In our scenario, the subject turned at times and attempted to
adjust the sensors, which resulted in ghost shots during segmentation, where the algorithm
mistook the hand movement for a shot.

The sensors give us the x,y,z position across time; however, we know that this position is
computed by the firmware from the angular acceleration. That angular acceleration is
transformed into x,y,z acceleration, which is then doubly integrated to obtain the position. This
process obscures several features of the motion due to rounding errors. Velocity and
acceleration could have a noticeable impact on the outcome of the shot, especially if the speed
of the ball or the angle of release were to be calculated. In our case, we calculated those values
through differentiation; however, since the process of obtaining the x,y,z positions involved
double integration, the positional coordinates already numerically encoded that information. The
reason the x,y,z positions are provided, rather than the initially captured angular accelerations,
is that the motion capture product we used is tailored to animation applications. Motion capture
suits designed specifically for biomechanics are built by only a few companies and are
prohibitively expensive for our purposes. The specification of the sensors' components states
that each value is a 32-bit floating-point number calibrated to the small range of accelerations
that are humanly achievable. Such values would give an immense advantage over the doubly
integrated, two-decimal-place values that we worked with. In our case, we did not track the legs
of the subject, which significantly reduced the accuracy of our data, particularly because leg
movement is very important to a basketball free-throw shot [1]. Thus, having more sensors can
only improve classification accuracy; as shown above, however, not all parts of the body are
equally significant to the outcome.
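
A sketch of the differentiation we describe, using a simple central-difference gradient; any
smoothing before differentiating, which a real pipeline would likely need, is omitted:

```python
import numpy as np

def velocity_and_acceleration(positions, frame_time):
    """Numerical velocity and acceleration from (frames, 3) joint positions.

    Because the suit's positions are themselves doubly integrated from
    acceleration, differentiating them only approximately recovers the
    original signal; rounding errors survive the round trip.
    """
    vel = np.gradient(positions, frame_time, axis=0)
    acc = np.gradient(vel, frame_time, axis=0)
    return vel, acc
```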

Some error correction could be performed using kinematic equations, which would detect when
the recorded motion falls outside the plausible range of human motion. For example, during one
of our sessions, we recorded an anomalous position for the wrist due to the effects of drift. Such
mistakes could be prevented through the use of kinematic constraints. Having a set of limits on
the subject's degrees of motion would shrink the mapping space, which would improve
classification, as the best-performing classifier was found to be the KNN algorithm. The
kinematic equations would have to be tailored to the body of each subject, however, adding a
layer of complexity.
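
One simple instance of such a constraint is checking that the distance between adjacent joints
stays close to the known bone length, as in the sketch below; the tolerance value is an
illustrative assumption:

```python
import numpy as np

def flag_implausible_frames(parent_pos, child_pos, bone_length, tol=0.15):
    """Flag frames where the distance between two adjacent joints drifts
    from the known bone length by more than a fraction `tol`, e.g. the
    elbow-to-wrist distance versus the measured forearm length."""
    dist = np.linalg.norm(child_pos - parent_pos, axis=1)
    return np.abs(dist - bone_length) / bone_length > tol
```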

Another limiting factor in our analysis was the relatively small number of samples we collected.
With only 200 shots in total, the training set consisted of 100 shots each for made and missed
attempts. Even spam filters, which rely on a mature set of heuristics for flagging malicious
e-mail, need training sets at least ten times this size. Not only does the training set have to be
large, but there needs to be a way to flag recordings that are anomalous relative to the general
sample set. These could be tracked on a per-session basis, and a session with a low average
classification accuracy would need to be manually reviewed.

We laid the groundwork for analyzing different parameters that influence the shot outcome by
using them as features for classification. This approach has proven to be successful with
triangular mapping. However, there is an interesting relationship between the number of
classification features incorporated and the weight that would need to be allocated to each of
these features. This is where the qualitative assessment of a professional shooter and a larger
dataset could contribute to an improvement in the accuracy of the analysis. An interesting
approach would be to have an artificial intelligence agent that would try different combinations
of features over a vast number of samples in order to find an optimal balance. The valuable part
of such an approach is that every player would have a tailored assessment of their technique
depending on their shot biomechanics. However, at the current stage of our analysis, we have no
way to instruct the subject on how to improve their shot, or even to determine whether the shot
needs improvement, even if we were to identify key drivers of the outcome. Our tool is merely
capable of determining the key features that a shooter should pay attention to.

6. REFERENCES

1. Jackie L. Hudson. A Biomechanical Analysis by Skill Level of Free Throw Shooting in
Basketball. In J. Terauds (Ed.), Biomechanics in Sports (pp. 95-102). Del Mar, CA:
Academic Publishers, 1982.
2. Joerg M. Gablonsky and Andrew S. I. D. Lang. Modeling Basketball Free Throws. SIAM
Review (Vol. 47, No. 4, pp. 775-798), 2005.
3. Gary Yates. The Development of Multiple Linear Regression Equations to Predict
Accuracy in Basketball Jump Shooting. L.E. Holt School of Recreation, Physical and
Health Education, Dalhousie University. Halifax, N.S.
4. Jernej Barbic, Alla Safonova, Jia-Yu Pan, Christos Faloutsos, Jessica K. Hodgins, and
Nancy S. Pollard. Segmenting Motion Capture Data into Distinct Behaviors. Computer
Science Department, Carnegie Mellon University.
5. Dirk Ormoneit, Trevor Hastie, and Michael Black. Functional Analysis of Human Motion
Data. Dept. of Statistics, Stanford University. Xerox Palo Alto Research Center, Palo
Alto, CA 94304.
6. Sajid Ali, Zhongke Wu, Mingquan Zhou, Abdul Razzaq, Hafeez Ahmad. Human
Identification based on Gait Joints Area through Straight Walking View. Department of
Electrical Engineering, Institute of Space Technology. Islamabad, Pakistan.
Appendix A. RMS vs. frame count (shots 5, 6, 7, and 9 made). Shots aligned to take place at
frame number 30.
