

This project aims to develop a toolkit that aids in the analysis of finger movement in a video stream. The toolkit will consist of a hardware setup that includes a camera and an input device such as an iPad, together with a software component: a program to analyse the recorded videos.

The toolkit will be beneficial as a research tool for academics such as PhD students, as it helps them better understand the movement habits of the intended target user by producing an analysis result containing the finger path, coordinates and angle. Computer vision libraries such as OpenCV will be used to analyse the recorded video streams.

The aim of this project is to use this research toolkit to record, analyse and produce a suitable result sheet for any experiment that involves finger movement within a video stream.


Thanks to Jason Alexander, my project supervisor, for providing me with feedback and guidance.

Chapter 1


1.1 Chapter Overview

In this chapter, we will look at the proposed aims and motivations behind this project. In addition, there will be an explanation of the structure of the paper and how everything has come together for this project.

1.2 Introduction

Today there are almost 2.53 billion smartphone users in the world, and the market for touch-screen devices is growing, with an increase of 8.3% since 2017 [9]. For a long time, research on human-computer interaction was restricted to techniques based on the use of a graphics display, a keyboard and a mouse [2]. Nowadays users want to get as close to the software as possible, and the way users interact with their phones and other electronic devices has changed drastically since the 2007 iPhone release [9]. In line with these changes, it is only natural that the way user interaction is recorded and analysed should also adapt; this is what this project hopes to achieve.

With this growing use of touchscreens, multi-touch interaction techniques have recently become widely available [3]. This project is relevant to the current growth in how users interact with technology; with this continued growth, we need to analyse how users approach new technology and how we can adapt devices to be easier to use, with human-centred requirements such as low system latency [1].

The main challenge facing this project, and similar projects, is achieving accuracy in the analysis. There are a large number of factors to be considered when looking into user interaction analysis, and these factors can be a hindrance when accurately analysing human-computer interaction. Producing a final qualitative result of these user input analysis evaluations can be challenging. This paper will break down the steps needed for the development of a research toolkit that aids in the analysis of finger movements captured on video, and explain how and why certain features were chosen to be a part of this toolkit. The research included looks at a range of hand and finger movement analysis, from pianists to simple bare-hand finger tracking, and at some open-source toolkits that are available for use now [4, 1, 6].

A research toolkit such as this would be useful to a researcher for three main reasons. Firstly, the toolkit includes both software and hardware aspects, making it easy for the researcher to conduct their tests: everything the researcher needs is in one place and works well together. Secondly, the hardware is easy to construct and portable, while the software should be user-friendly and easy to use. Lastly, the research toolkit should be adaptable to the intended needs of the researcher, with a modular structure that constraints can be added on top of; for example, the researcher could swap the hardware or add elements to the software.

The aim for this research toolkit is to have the functionality to process and detect hand positions without prior knowledge of their existence [3]. The difficulty with this task is determining the background and foreground of the video to make the data intake easier, while also making sure the lighting and environment do not hinder the results. Distinguishing the background from the foreground (the foreground being the object in question, and the background everything else in a frame) enables the progression of data analysis from the video; this process is called segmentation [5].

Furthermore, to detect an individual's fingers on the hand and to track these fingers as they move, we will need a high-resolution RGB camera to observe a path. The intended goal is to produce useful information that would be required by a researcher. Areas of analysis will include ‘hit’ locations, ‘hit’ times and ‘hit’ angles.

Finally, a suitable output of data in various forms, graphs and charts would be available for the researchers to choose from, and the toolkit would produce an appropriate data sheet. The toolkit itself will be evaluated on how easy it is to use, how responsive the program is and, most importantly, how intuitive the whole sequence of gathering a library of data, analysing it and publishing a final report is.

1.3 Aims and Objectives

The aim of this project is to develop a simple and easy-to-use toolkit that analyses the finger movement path and hit detection, the time between hits, the angle and the errors for a user and their particular device, with the aim of recording at least two of the initial criteria suggested. The toolkit should be able to analyse a large library of video data recorded on a high-resolution RGB camera and should be simple to use, while presenting the data collected in a manner useful to the researcher.

My Aims for this project are:

To build a toolkit whose hardware is quick and easy to assemble, with minimal equipment.

To record a library of 20-40 video streams of participants undertaking various tasks, using the RGB camera.

To extrapolate information/data from the library gathered, including ‘hit’ time, ‘hit’ angle, ‘hit’ locations, time taken to complete tasks, etc.

To evaluate the toolkit against the researchers' needs and wants (this may include ‘hit’ area, ‘hit’ time, ‘hit’ angle, etc.), and to make sure it is capable of doing everything that was intended of it for this project.

For the user to be able to carry out a Fitts's law study while their finger movement is analysed by the toolkit.

1.4 Report Overview

The rest of this report is structured as follows, and outlines the key areas of research and design.

Chapter 2 – Background: The initial research and findings from past works will be discussed and analysed. The results of similar projects were examined, and the implementations of those designs were used to aid the design of this project.

Chapter 3 – Design: After condensing the background research, the design chapter looks at a high-level construct of the project. This chapter gives examples of equipment that could be used, along with a structure for how the project should be conducted.

Chapter 4 – Implementation: An overview of the program written, with examples from the code itself and how the implementation of the program is structured.

Chapter 5 – Process Description: A breakdown of the processes that take place behind the code, along with a manual breakdown of what happens when analysing the video stream.

Chapter 6 – Testing and Evaluation: A critical review of the program, with rigorous testing and examples of results, along with evaluations of whether the aims were achieved.

Chapter 7 – Conclusions: A full review of the project, with a reflection on the introduction's aims and objectives chapter, the project as a whole, challenges faced, lessons learned and what would be changed if the project were conducted again.

Chapter 2

Background


This chapter will highlight findings and research prior to the design and implementation phase; it will look at areas of research such as segmentation techniques, contouring and OpenCV.

2.1 OpenCV Introduction

In this project, computer vision libraries will be needed to help break down and analyse the multiple video streams collected. Thus the use of OpenCV, a library of programming functions mainly aimed at real-time computer vision, is vital for this project.

OpenCV (Open Source Computer Vision), as stated before, is a library of programming functions mainly aimed at computer vision; this library of functions can be used for image and video processing. The library has more than 2500 optimised algorithms, comprising a comprehensive set of computer vision and machine learning algorithms. These algorithms can be used to detect and recognise faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, etc. [17]

OpenCV is a cornerstone of computer vision analysis and has been used for countless projects over the years. Yeo, Lee and Lim [9] exclusively use features from the OpenCV library, along with a Kinect, to create a hand/finger tracking and gesture recognition system for human-computer interaction using low-cost hardware. On the other hand, Godbehere, Matsukawa and Goldberg [10] developed a tracking algorithm within their own computer vision system that is said to demonstrate significant performance improvement over existing methods in OpenCV 2.1. For this project, the functions provided in the OpenCV library are of great aid when deconstructing a video stream.

2.2 Background/Foreground Segmentation

Once the video has been captured, it is necessary for the software to break down the video to help with analysis. One technique that can be used to break down the video is background/foreground segmentation to help the software distinguish between the subject that is being analysed and everything else in the background that is irrelevant.

The basic problem lies in the extraction of information from vast amounts of data. The most important part of the project for the researchers is the results, and how and why they are relevant. The goal of segmentation is to decrease the amount of image information by selecting areas of interest. Typically, hand segmentation techniques are based on stereo information, colour, contour detection, connected component analysis and image differencing [2].

Similarly, Moscheni et al. [18] highlight the intimate relationship between background/foreground segmentation and the computation of global motion. By breaking down the global motion (background), we can estimate the local motion (foreground); thus the foreground can be assumed to be the displacement of the object in a scene. This approach of tracking or clustering pixels works well at determining the foreground and background. This paper was helpful as it encouraged approaching the project in a layered structure, building on existing concepts and adapting and adding to them to gain more accurate results. The disadvantages of background/foreground segmentation found by Von Hardenberg and Bérard include: “unrealisable cluttered backgrounds, sensitive to changes in the overall illumination, sensitive to shadows, prone to segmentation errors caused by objects with similar colours” [2]. Letessier and Bérard developed new techniques to overcome these disadvantages: Image Differencing Segmentation (IDS) is insensitive to shadows, and the Fast Rejection Filter (FRF) is insensitive to finger orientation [1].


2.3 Haar Cascades

Haar Cascades are used in computer vision software such as OpenCV to help with detection [13]. In the OpenCV Python environment we are able to train a cascade by providing the program with thousands of positive and negative images. The program then learns the features that distinguish the positive images from the negative ones; when training is complete, the cascade should be able to distinguish a positive object from its negative environment.

This concept is perfectly demonstrated in Padilla et al.'s face detection paper [19]. Faces, much like fingers and hands, come in various shapes and sizes. The four steps suggested in this paper are similar to the suggestions made in the Python programming tutorial [13]; these consist of face detection, image pre-processing, feature extraction and matching. This, along with varying the image colour, lighting, rotation and quantity of training images, gives a more precise detection of the frame. Ultimately, to gain an accurate representation of the subject of interest, it is vital to do everything possible to make the object (the finger, in this case) as easy as possible for the system to detect, with the highest precision.

2.5 Chapter Summary

Through the research conducted, several ways to approach this project have emerged. The following steps will be to undertake each method and evaluate which gains the most precise result required while fitting our timeline and resources. Ultimately, the background research that has been conducted will determine the shape and structure of the project's design.

Chapter 3

Design (high level system design)


3.1 Chapter Overview

Throughout this chapter, the various design decisions made prior to the final implementation of the toolkit will be discussed, along with the reasoning behind each decision.

3.2 Interview with PhD student

As the intended target users for this research toolkit are academics such as PhD students, to aid them in their research, a logical starting point was to find out what researchers are actually looking for in a product such as this and what they would intend its design to be like.

The conversations with PhD students were helpful as they brought a different perspective to this project: the students looked at this project as a stepping stone for further study, considering how this toolkit could be adapted and evolved to make it better and to be used in other areas of work such as medicine and physiology. Additionally, it allowed the toolkit to be visualised in a broader picture, showing how it could be used in multiple other ways as a small part of a much larger project. This added significance to the results that would be achieved from this toolkit. The questioning opened with "What analytical information would you like to gain from tracking hand/finger movement in a video stream?"



The higher-level and general answer given by most students was that the requirements for projects vary, and a useful toolkit should accommodate multiple results while also being open to adapting and changing for the user's desired output. This indicated the need to publish the source code so that other users can modify and adapt it for their specific needs. As a default, the most important results the toolkit should produce include the time taken to track and analyse the finger through the video stream, as well as the "pause" time taken by the individual between the tasks they carry out. An additional add-on could be the ability to calculate errors during the experiment.

Following on from the initial talk, the next question posed to the PhD students was "Would a toolkit which analyses finger movement in a video stream be useful to your work?". As previously addressed, the feedback from the students depended on their work. Most HCI (Human-Computer Interaction) students said that part of their work would benefit from this toolkit. One particular student stated that she was working on a medicine-based project and that this kind of toolkit would be extremely useful in that line of work. The toolkit could help track and analyse a surgeon's hand movements during surgery, and this information could then be used to work towards getting robots to replicate the surgeon's work. With the use of movement-tracking software, if robots are able to replicate precise human tasks, this could lead to safer medical operations that eliminate human error and increase precision. Technology such as this is still a far distance in the future, but it is, and should be, slowly worked towards.

The final questions expanded on the idea of how a movement-tracking toolkit could be adapted and used in the future for more extensive research. For a wider range of uses, the PhD students advised looking outside of computer science to find common ground with departments such as physiology. Beyond finger and hand movement tracking, this could also be expanded to analyse facial features and detect facial expressions, which would be very useful for website builders and content producers, as they use individuals' physiological reactions to certain content. This would include gaining information on hot spots on interfaces and on how to achieve an action most efficiently, and we would be able to gauge emotional reactions (happy, sad), etc.

In summary, there are three key areas that the HCI PhD students are looking for. Firstly, adaptability: as projects differ and the results needed differ, it is important that a research toolkit can be adapted to the researcher's needs. This could include being able to code specific reading requirements that are needed from the software. Secondly, a clear representation of results is extremely useful for further analysis; this includes producing the output data in a CSV or Excel format. Lastly, the whole toolkit, which includes the hardware and software, should be easy and quick to set up and use.

3.3 Software Resources

As discussed in prior research and in the project brief, the OpenCV library provides a vast set of tools capable of gathering all the information required for this toolkit. Furthermore, the functions available in the library aid in analysing this information; this includes being able to convert video files to grayscale and using that information to further analyse the video clips. To accompany the OpenCV library, Python was the language of choice for this toolkit. The simplicity of the Python language, along with its consistent syntax and large standard library, complements the OpenCV functions in a way that enables them to work fluently together. These were the tools available to undertake this project; to get a high-level overview of the design required by the intended target audience, interviews were carried out with PhD students to ask what results would be helpful to them in their line of work.

3.4 Video Clip Capture

A large part of this project is being able to capture video from your tests to run through the software for analysis. It is important that the videos captured are fitting for what the software is capable of analysing, which is why it is important to capture video clips using one of the methods explained here.

For any analysis work to take place, a large quantity of raw data needs to be on hand to put through the toolkit's software so that the results of the analysis can be evaluated. As stated in the proposal for this project, the video capture needs to be simple, easy to assemble and portable. The camera for capturing the footage may vary, but the video should be of good quality, with a minimum of 1080p; this produces a resolution of 1,920x1,080 pixels, allowing the edges of the fingers in the video to be clear and crisp, making them easier for the software to detect and resulting in more accurate results. In addition, 60 fps is standard on most webcams and cameras and is therefore perfect for the recording. Anything less than 60 fps, such as 24 fps or 30 fps, could result in missed frames, and the footage will not be as smooth as that collected from a 60 fps camera. This is especially important when there are fast-moving fingers in a video stream.

An iPad will be used as the input device for this project. The camera will be placed directly above the iPad, giving a bird's-eye view, and will capture videos of participants while they attempt the tests for the experiment. The iPad (or any equivalent device) must be portable and easy to set up; it should be able to load any tasks and should be simple to use. An iPad was the ideal device for this project, but this can be altered depending on the researcher's preference.

The iPad should be placed on a flat surface with the brightness of the screen adjusted for the camera. The camera will be connected to a tripod, allowing it to gain a bird's-eye view of the iPad. The camera's purpose is to capture the iPad screen and to minimise recording of the surroundings; therefore, the video should only have footage of the iPad screen in its frames, with the brightness adjusted so that any writing on the iPad is clearly readable, balancing out the exposure of the screen. The camera should be connected to a laptop or any other portable device that is able to host and execute the program. This is the final setup for taking in data/video clips.


Once everything is in place, the experiment will begin and the participant will be asked to undertake various tasks; these could involve typing a sentence, navigating a website, doing the Fitts's law test, etc. The video stream is then uploaded onto the laptop and saved. Following this, the video file is run through the finger movement program, which should analyse the time, position and angle of the subject's finger movement in the video stream. This information will be represented in the form of a spreadsheet or document that the researcher can then use further in their work.


3.5 Camera Capture Types


Figure 1: Samsung S9 [29]

Figure 2: Logitech C922 [30]

Figure 3: iPhone X [31]

Figures 1, 2 and 3 show different cameras that are able to capture the intended video input, including smartphones capable of capturing the desired footage. Ultimately, the chosen device should be able to precisely capture the video clips while being in focus and clear, and should make the file transfer easy. A webcam works well, as the video captured can be saved directly onto a laptop and conveniently input into the program. On the other hand, a smartphone also works well, as it does not need to be connected to a laptop straight away; multiple video streams can be recorded and saved, then transferred to a laptop at a later date for analysis. The choice of camera is at the researcher's discretion.

3.6 System Constraints

As discussed by Yu et al. [33], the problems of background subtraction include the resulting image being filled with noise, requiring edge-preserving filtering to remove this noise and make the system more accurate. This is a very costly process and would not be suitable for this project. However (as explained in the implementation chapter), other techniques such as thresholding and contouring will be used to produce an accurate result in this project.

Furthermore, foreground constraints include accurate detection being hindered by colour noise and colour tracking. One solution would be to track a specific colour range, similar to the approach used by Ghorkar et al. [34]. Another solution is grayscaling the images, eliminating the need for colour tracking altogether.

When contouring is used to analyse the shape of the finger, the algorithm goes through the frames and images to find the most accurate shape it can, but this function only allows you to gauge an approximation of the shape, not the actual shape [28]; ultimately this has an effect on the accuracy of the results.

Another option with regard to obtaining the shape of the finger would be to use Song et al.'s finger shape tracking [35]. Similar to implementing Haar Cascades with a predefined image of a shape, this system requires a very complicated process of checking for certain colours, and the extraction of fingertips that do not include the full hand (if the hand is out of frame) will cause an issue for the system.

3.7 Desired Output and System Run-Through

The overall desired output for this system is a program that can accurately track the finger movements within a video stream and produce these results in a useful Excel spreadsheet format. The system should be able to output various results depending on the researcher's preference. This can be applied in HCI analysis of various input devices. The system should also be adaptable, with functional add-ons that can be coded on by any user.



The system run-through consists of the following steps:

1. Set up the hardware, with camera and input device.

2. Take video recordings of the hand movements and upload them onto a computer.

3. Run the video files through the software on the computer.

4. Review the result file that is produced, with the details of the experiment.

3.8 Chapter Summary

In this chapter the high-level design features of this project have been established. This will be broken-down further in the implementation phase.

Chapter 4

Implementation


4.1 Chapter Overview

In this chapter, the concepts involved in the project will be looked at further, to gain a better understanding of what occurs behind the scenes in the toolkit's software. These concepts include video capture, grayscale conversion, binarization, contouring, polygon approximation, morphology, the law of cosines, etc.




4.2 Video Capture

The requirement for a simple and easy-to-use command is a must for this toolkit, as it is intended to be quick and efficient for the intended research purpose.

As the software is initially run using the command “python <full_path_of_video_file>”, the “cv2.VideoCapture” function sets the reader to read from the video file input. This “VideoCapture” instance is able to capture the video frame-by-frame until it is released.


4.3 Grayscale Conversion

After the video stream is read into the program frame-by-frame, the images are converted into grayscale. Grayscale conversion simply reduces complexity: from a 3D pixel value (R, G, B) to a 1D value. Grayscale is extremely important in image processing for several reasons. Firstly, it eliminates noise in coloured images: due to the changes in pixels, hue and the various different colours, it is difficult to identify edges, and the extra colours are simply considered noise.

Secondly, colour is complex: unlike humans, who perceive and identify colour with ease, computers need a lot more processing power, and this results in an increase in the complexity of the code. Grayscale, on the other hand, is fairly easy to conceptualise, because we can think of two spatial dimensions and one brightness value. Lastly, processing speed is a major factor, as coloured images take a lot longer to process than grayscale images. As hundreds of these frames are analysed, the time taken to analyse coloured images is much longer than for grayscale images, so grayscale images are used for the next stages of processing.

In the following images, grayscale has been implemented along with foreground/background subtraction. With this technique, the object that is moving (the foreground) is white and the objects that are still in the frame (the desk, etc.) are black.


4.4 Thresholding (Extracting the finger from the video)

The need to separate the subject, in this case the individual's finger in a video, is highly important. It is a priority to establish the difference between the background/irrelevant noise and the specific finger that will be used for analysis. Once the finger is established, the initial stages of analysis on the finger will be easier to accomplish. There are many forms of image segmentation; these include clustering, compression, edge detection, region-growing, graph partitioning, watershed, etc. The most basic type, and the one looked at in this project, is thresholding.

Firstly, the screen area of the video stream is segmented; this is done by thresholding and looking for bright rectangles/squares. Thresholding works as follows: if a pixel value is greater than a threshold value, it is assigned one value (perhaps white); otherwise, it is assigned another value (perhaps black). The function used is cv2.threshold [22].


Looking at the signature of the thresholding function, the first parameter is the source image, i.e. the image that we want to perform thresholding on; this image should be grayscale. The second parameter, thresh, is the threshold value used to classify the pixel intensities in the grayscale image. The third parameter, maxval, is the pixel value used if any given pixel in the image passes the thresh test [21]. Finally, the fourth parameter is the thresholding method to be used; these methods include:


cv2.THRESH_BINARY – If the source pixel is greater than the threshold, it is set to maxval; otherwise, it is set to zero.

cv2.THRESH_BINARY_INV – Inverts the colours of cv2.THRESH_BINARY.

cv2.THRESH_TRUNC – If the source pixel is greater than the supplied threshold, it is truncated down to the threshold value; otherwise, the pixel is left as it is.

In this project, after the video stream is converted into grayscale, it is binarized using “cv2.THRESH_BINARY”: pixels brighter than the supplied threshold are set to the maximum value, and all remaining pixels are set to zero.

"Neutrophils" by Dr Graham Beards

The binarization method converts the grayscale image (0 up to 256 gray levels) into a black-and-white image (0 or 1). Simply put, binarization is the process of converting a pixel image to a binary image [24]. A high-quality binarized image can give more accuracy in character recognition than the original image, due to the noise that is present in the original image [23].

4.5 Morphological Transformations

The binary image is then used to find contours in the image. Once contours are found, polygon approximation is used to obtain a rectangle from the contour. This is done to find the location of the screen, so that the hand motions can be observed only within the screen. To obtain the hand contour, a couple of steps need to be completed first: looking at darker objects in the frames, then applying the blurring, thresholding and other morphology operations that we have looked at. This is all done to obtain a more accurate hand contour.

In linguistics, morphology is the study of the internal structure of words, while in computer vision analysis, morphological transformations are simple operations based on image shapes. The transformation is normally performed on binary images (gained after binarization). It needs two inputs: one is the original image; the second is called the structuring element or kernel, which decides the nature of the operation. Two basic morphological operators are erosion and dilation [25].

Erosion works by sliding a window, called the 'kernel', over the image. In this project “MORPH_RECT” was used as the kernel, a rectangle with dimensions of 7 by 7 pixels. There are three shapes that can be used for the kernel:

§ Rectangular box: MORPH_RECT

§ Cross: MORPH_CROSS

§ Ellipse: MORPH_ELLIPSE

Figure 3: MORPH_RECT is shown being used in the code to obtain a rectangular kernel.

The kernel slides through the image; a pixel is kept white (1) only if all the pixels under the kernel are white, otherwise it is eroded away to black (0). This is useful for removing small white noise, detaching two connected objects, and so on.

Dilation, on the other hand, is the opposite of erosion: it grows the white regions outwards. Normally, in cases like noise removal, erosion is followed by dilation. Erosion removes white noise, but it also shrinks the object (as seen in the images below [26]), so we dilate afterwards. Since the noise is gone, it won't come back, but the object area is restored. This is also useful in accurately determining the whereabouts of the finger/hand within the area of the screen.


Figure 4: Original image

Figure 5: Erosion

4.5 Contour Features

Figure 6: Dilation

As the most important parts of the screen have been established, namely those where the hand movement occurs, the next step is to use contour techniques to determine the outline and position of the finger itself so that tracking can begin. There are several contour features, such as moments, contour area, contour perimeter, contour approximation and convex hull. The prerequisites for contouring are, first, a binary image, and secondly applied thresholding or Canny edge detection; both have been done, which helps ensure an accurate hand contour.

Contours can be explained simply as a curve joining all the continuous points (along a boundary) having the same colour or intensity [27]. Contours are a useful tool for shape analysis and for object detection and recognition. This makes them the perfect feature for the finger analysis portion of the software.


Figure 7: Image shows the start of using contouring to find the finger

Starting with the "cv2.findContours()" function (in the picture above): this function allows us to detect an object in an image. The "findContours" function takes three arguments: first the source image, second the contour retrieval mode, and third the contour approximation method; the output is a modified image. The contour retrieval mode "RETR_TREE" retrieves all the contours and creates a full family hierarchy list, meaning it is able to detect objects that are in different locations, or shapes that are inside other shapes, while still connecting the contours together.

Moments help to calculate features such as the centre of mass of the object, the area of the object, etc. From these moments, useful data such as area and centroid can be extracted. The centroid is given by the relations cx = M10/M00 and cy = M01/M00 [28].


This calculation is used to determine the centre of the finger so that the tracking can follow a fixed point. The centre of the contour is also calculated to aid angle detection further on in the analysis. The “contourArea” function is used to gain a list of all the contour areas within the screen region so that the maximum contour area can be picked.

Once the hand contour is obtained, we then look for the finger by analysing the position of the hand and looking for extreme points, which correspond to pointed fingers. The “calculate_fingertip” function is used to calculate the location of the fingertip. It does this by looking at all the contour points and finding the points on the extreme left and extreme right. Next, the top-left, top-right, bottom-left and bottom-right corners are searched for the presence of the hand. Based on where the hand is present, the extreme-left or extreme-right point is picked as the fingertip: if the hand is present in the top left, then the point on the extreme left is expected to be the fingertip. This allows us to identify the fingertip so that we can track it.
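The extreme-point rule can be sketched as follows. This is a hypothetical simplification of the project's “calculate_fingertip” step, using a tiny hand-made contour and only the left/right cases; the real function also inspects the four corners:

```python
import numpy as np

# Hypothetical hand contour in OpenCV's (N, 1, 2) point layout.
contour = np.array([[[30, 80]], [[10, 20]], [[60, 85]], [[55, 90]]])

# Extreme left/right points of the contour (smallest / largest x).
leftmost = tuple(contour[contour[:, :, 0].argmin()][0])
rightmost = tuple(contour[contour[:, :, 0].argmax()][0])

def calculate_fingertip(hand_in_top_left: bool):
    # If the hand occupies the top-left corner, the extreme-left point is
    # taken to be the fingertip; otherwise the extreme-right point is.
    return leftmost if hand_in_top_left else rightmost

print(calculate_fingertip(True))   # (10, 20)
print(calculate_fingertip(False))  # (60, 85)
```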


4.6 Haar Cascade

This section will look at an object detection technique called Haar Cascades. This technique was implemented during an initial iteration of the toolkit but was not used in the final version, because Haar Cascades were not as accurate as this project required and are very time-consuming to develop, making them an inefficient technique for this project.

Haar Cascades are a machine learning based approach where a cascade function is trained from many positive and negative images and is then used to detect objects in other images. For this project, four thousand positive images of hands/fingers and four thousand images that did not include hands or fingers were used to train a hand detection Haar Cascade. Training proved to take a long time, a couple of hours and sometimes overnight. Furthermore, the detection accuracy of this technique was poor for fingers, although it worked well for faces and eyes.

Haar Cascades are beneficial when wanting to track a large object that won’t necessarily change shape or orientation. This includes being able to detect eyes and a head, but they can also be used to detect an object such as a ball or a pen. However, trying to detect finger movement, which can be small and intricate, is challenging with a Haar Cascade. The pictures below demonstrate a trained Haar Cascade detecting a face, eyes and a hand. As demonstrated, it is not accurate and therefore unsuitable for this project.


4.7 Finger Path

To visualise the finger path being tracked, it was useful for researchers to have a video clip of the actual tracking taking place. We use the pixels in the screen to gain distances around the finger, from which three lengths a, b and c are calculated. Next, the cosine rule is used to find the angle of the finger. The cosine rule is given as: cos(C) = (a² + b² − c²) / (2ab). In the code this same formula is replicated, and the three lengths found previously are used to find the angle of the finger relative to the iPad.


Figure 8: The points obtained and the use of the cosine rule to work out the angle of the finger.

The centre of the hand contour is found and the angle is measured between the centre and the pointed finger to get the angle of the finger.
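The cosine-rule step above can be sketched directly. The function name `finger_angle` is hypothetical; the calculation is the same rearrangement of cos(C) = (a² + b² − c²)/(2ab) used in the code:

```python
import math

# Given the three distances a, b, c measured in pixels around the finger,
# the angle C opposite side c follows from the cosine rule:
#   cos(C) = (a^2 + b^2 - c^2) / (2ab)
def finger_angle(a: float, b: float, c: float) -> float:
    cos_c = (a * a + b * b - c * c) / (2 * a * b)
    return math.degrees(math.acos(cos_c))

# Sanity check: a 3-4-5 right triangle gives 90 degrees opposite c.
print(finger_angle(3.0, 4.0, 5.0))  # 90.0
```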

The ‘self.draw’ method runs through the points that were calculated and uses the .circle and .line functions to draw the path of the finger over the video file. The results are stored as the x and y coordinates of the finger on each frame of the video stream. Finally, the ‘write_output’ function publishes the results in a .csv file. That file can be used for further analysis, by creating graphs for the diverse experiments that can be run with this research toolkit. The full analysis of the toolkit will be presented in the next section.
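A hypothetical sketch of the ‘write_output’ step, writing one row per frame (the column names and sample coordinates below are assumptions, not the toolkit's exact schema):

```python
import csv
import io

# One row per frame of the video stream: (frame index, x, y).
points = [(0, 612, 340), (1, 613, 341), (2, 613, 341)]

# Write to an in-memory buffer here; the toolkit writes to a .csv file
# on disk instead.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["frame", "x", "y"])   # header row (assumed names)
writer.writerows(points)

print(buf.getvalue())
```

Researchers can then open the resulting .csv in any spreadsheet or plotting tool to graph the finger path.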

4.8 Chapter Review

In this chapter, a clear understanding of the stages required to accomplish this project's aims has been established. The techniques mentioned in the implementation chapter are able to track the path of a finger whilst providing the locations of the finger with respect to the screen as well as its angle; this meets the requirements brought forward in the aims.

Chapter 5

System in Operation and Process Description

5.1 Chapter Overview

This section will look at the process of conducting an experiment using this research toolkit. The section will be broken down into the software handling and hardware handling, as well as a general user guide or manual.

5.2 Process Description

This following section is a high-level run-through of the manual in Appendix A.


Figure 9: The desired setup: a camera with a bird's-eye view of the iPad.

As with any experiment, it is key to have a goal and understand what you are trying to achieve; this should be the basis of the experiment. When using this particular finger tracking research toolkit, the hardware needs to be installed/set up first to get data to analyse later. As described in the design chapter, the set-up includes an overhead camera with a bird's-eye view of an input device (iPad).

The goal of this experiment is to determine whether an input delay exists among the results of the toolkit. The experiment begins with the particular test (such as a Fitts's law or any finger tracking test) uploaded onto the iPad and the camera recording (bird's-eye view on top of the iPad) started. The candidate being evaluated sits down and begins the tests on the iPad using their pointer finger. The videos should then be uploaded to the computer that runs the software portion of the toolkit. By default, the toolkit tracks the finger movement by giving the x and y coordinates of the finger at various points (frames) in the video stream, along with the time of the experiment and the 'hit' angle of the finger. The video should be fed into the software and a .CSV file (in the image below) will be produced with the details of the experiment.


5.4 Chapter review

This chapter has looked into the process followed in this experiment to achieve an accurate and precise results file of a tracked finger. Furthermore, visual representations of the final output file and the setup were given.

Chapter 6

Testing and Evaluation

The final toolkit that was designed will be assessed and evaluated through several different testing methods. The accuracy of the toolkit will be examined to see if it meets the criteria; these criteria involve accurately determining the time, location and angle of a finger as it is traced in a video stream.

6.1 Accuracy

The main criterion researchers look for in a research toolkit is how accurate the results are. This evaluation will be broken down into three categories, reflecting the aims of this project. The key areas of accuracy that will be looked at are the ‘hit’ time, ‘hit’ angle and ‘hit’ location.

6.11 Time

The reason for conducting this investigation is to determine whether there is a delay between the raw footage and the results produced after the analysis. If there is a delay, a quantitative result should be evaluated to determine the time offset present in the toolkit.

To evaluate the ‘hit’ time, this experiment will consist of recording a video stream where an individual is asked to move their hand from one square to another on the iPad. The squares will be 15cm apart and will be at the same level. The individual will be asked to press a square (for example square A) and hold their finger there for two seconds, then they will move their finger onto the next square (square B) and hold for two seconds, finally they will move their finger back to the original square (square A). This experiment will be repeated with changes in the duration of the press (there will be a timer running beside the iPad).

After the video is recorded, the times and frames of the video will be looked at and analysed. The time when the finger initially presses down on a square will be recorded, as well as the time when the finger is released; the number of frames between press and release will also be counted and noted.

Next, the video stream will be run through the software of the toolkit and the results will be evaluated. In the results, concurrent repeated values of the x and y coordinates will be looked for: if the x and y coordinates are the same for multiple frames/seconds, the finger hasn't moved in the 2D plane and can be concluded to be "pressing the square". The time of the first repeated coordinates will be noted along with the duration.
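The repeated-coordinates heuristic can be sketched as follows. This is a hypothetical illustration, not the toolkit's code: the function name `find_presses`, the frame rate and the jitter tolerance are all assumptions:

```python
# Hypothetical sketch of the "press" heuristic: a run of frames with
# (near-)identical x/y coordinates is treated as the finger holding
# still on a square. Frame rate and tolerance are assumed values.
FPS = 30
TOLERANCE = 1  # pixels of allowed jitter

# Synthetic result rows (frame, x, y): 75 stationary frames, then motion.
rows = [(f, 610, 347) for f in range(0, 75)] + \
       [(f, 400 + f, 200) for f in range(75, 90)]

def find_presses(rows, min_frames=15):
    presses, start = [], 0
    for i in range(1, len(rows) + 1):
        moved = i == len(rows) or \
            abs(rows[i][1] - rows[start][1]) > TOLERANCE or \
            abs(rows[i][2] - rows[start][2]) > TOLERANCE
        if moved:
            if i - start >= min_frames:
                presses.append((start / FPS, (i - start) / FPS))
            start = i
    return presses  # list of (start_time_s, duration_s)

print(find_presses(rows))  # [(0.0, 2.5)]
```

A run of 75 stationary frames at 30 fps is reported as a 2.5 second press starting at t = 0.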

Finally, the results gained from the software will be compared to the results recorded when looking at the video stream. The accuracy of the toolkit can be determined by if the results match up or if there is a delay or offset in the time taken by the software.

The following explains the results from the experiment:


Figure 10: Screenshot of the initial touch at A at 0:02


Figure 11: Results screen for the first 3 seconds of the video

As shown in figure 10, the initial press onto A was made at 2 seconds; however, in figure 11 the results from the toolkit show the finger was stationary from about 2.5 seconds. This is determined by the x coordinate varying by only 1 between 2.375 and 3.08, and the y coordinate varying by 2 between 2.5 seconds and 3 seconds.

We can conclude that the offset (delay) of this toolkit is around 0.5 seconds between the raw video file and the result sheet produced, which is a very successful result. This experiment was carried out several times, along with a 5 second press; all the raw results and data are present on the website.

6.12 Angle

To accurately measure the angle, the next experiment involved an individual initially placing their finger flat on the iPad. They lifted their finger up to 90 degrees, then placed it back down on the other side (as shown in the pictures below).


Figure 12: Side view of the experiment, the actual recording will be taken from a birds eye view.


This helped evaluate the accuracy of the toolkit. The desired output is for the program to produce a data set that goes from 0 degrees (at the start of the video) up to 90 degrees and then to 180 degrees (at the end of the video); the accuracy of this experiment will be determined by whether the system can record the change in angle consistently.

The following experiment was carried out multiple times, with all the raw data collected available on the website.

From the data in figure 13 below, it is clear that at the beginning of the video the angle recorded is 98.40 degrees, which is not the desired output (the desired output being 0 degrees).

At the midway point of the video, around 3.65 seconds (with the 0.5 seconds added for the delay from the previous section), the angle at 3.67 seconds is 145.46 degrees and at 4.12 seconds is 152.65 degrees; these values are again incorrect, as the desired output should be 90 degrees. Finally, at the end of the video the angle at 7.33 seconds is 100.40 degrees, when it should be 180 degrees.


Figure 13: Results sheet from the Angles experiment

In conclusion, this toolkit is unable to accurately evaluate the angles of fingers while they are being tracked. The results showed that the recorded angle stays between 98 degrees and 150 degrees, and this is therefore something that would need to be refined in further work on this toolkit.

6.13 Location

For this portion of the evaluation, the accuracy of the finger's location was tested. To conduct this experiment, an iPad was used along with two rulers on either side of the screen to measure the iPad screen. To determine where the individual has touched, PPI (pixels per inch) is used to calculate where on the iPad a ‘hit’ has occurred. This is done by measuring the location using the ruler (in inches); given that the iPad used has a PPI of 264, the screen pixels can be calculated [34]. The result sheet data is then checked to locate the finger at specific times, and the x/y coordinates are compared for the same location. The accuracy is judged according to how precise the output results are; the desired output would be to get the same coordinates.
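The inches-to-pixels conversion used above is a single multiplication by the device's PPI; a minimal sketch (the function name is hypothetical, the 264 PPI figure is the iPad value from the source):

```python
# Convert a ruler measurement in inches to screen pixels using the
# iPad's 264 pixels-per-inch density.
PPI = 264

def inches_to_pixels(x_in: float, y_in: float) -> tuple:
    return (round(x_in * PPI, 2), round(y_in * PPI, 2))

# Mark A, measured at (1.74, 5.52) inches:
print(inches_to_pixels(1.74, 5.52))  # (459.36, 1457.28)
```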

The following experiment was carried out and the results are evaluated below, the full data and links to the video of the experiments being run are available on the website.


The marks A to E were made at 0:11, 0:15, 0:19, 0:23 and 0:28 respectively, so these are also the locations of the finger at those specific times. To work out the locations of the finger using PPI, the measurements of the marks were taken; for example, A measured in at (1.74, 5.52) (x and y coordinates, in inches). Given that this iPad has a PPI of 264, the location of the finger at A would be 1.74 × 264 = 459.36px and 5.52 × 264 = 1457.28px, so A's x/y coordinates are (459.36, 1457.28). This result was then compared to the result output from the video file, and large differences were found. Taking into account the 0.5 second delay found in 6.11, on line 37 of the results page (figure 14) the coordinates are noted as (610, 347), so it can be concluded that these results are not accurate. Mark B was calculated as (459.30, 810.48) but, as recorded in figure 15, the coordinates were (600, 128). This experiment was done with all five marks and no correlation between the data was found; it can therefore be concluded that there was no specific offset number for the location.


Figure 14: Results for B at 15.5 seconds


Figure 15: Results for A at 11.5 seconds

6.3 Set Up and Toolkit Run Through by Another Person

For this evaluation, another person was given the equipment, the software and a user manual guide for the toolkit. They were asked to set up the toolkit, record a video, upload it onto their computer, open the software, run the video stream through it and produce an output file. At the end of the experiment, the individual was asked some questions about how they found the experience, what they thought worked well, and what could be changed or done better.


The full questionnaire for this experiment is detailed in Appendix B. The user had no prior knowledge of coding or of this project; since the experiment was to evaluate the ease of use of the toolkit, it was appropriate to pick someone without technical knowledge. The user said “it was easy and took a short time to do” when referring to the hardware setup, as they used the pictures to guide them. In addition, even with no knowledge of the code, they were able to understand the output of the toolkit and make appropriate modifications (changing the threshold in line 74) to gain their desired output. They did notice some glitches with the tracing, as the marker would occasionally register something other than their fingertip, but overall they were happy with their experiment.

6.4 Tracking Different Object Movement

The final experiment consisted of using different objects in the video stream and evaluating whether the software recognises only a finger, or anything that is the shape of a finger. The video streams consisted of using a pen to mimic the movement of a finger on a keyboard, using different fingers (the pinky finger), and wearing a glove while doing the experiment (this final experiment was not included here, but footage and results can be found on the website).

The following experiment was carried out, initially using a pen to replicate the movements of a finger typing.


Figure 16: The start of the video as the pen enters the frame.


Figure 17: Middle of the video with the pen 'typing'


Figure 18: Final outcome after the pen has exited the frame.

As represented in figures 16, 17 and 18, the system was able to track the pen at certain points but was unable to fluidly track its path as it would with a finger. The desired output of this experiment would have been for the software not to recognise the pen at all; it can be concluded that the software was able to partially track the pen, but not with great accuracy.

The next experiment was using the little finger or pinky finger to replicate what has previously been done by a pointer finger.


Figure 19: Initial set-up when starting the video


Figure 20: Mid video at 00:09 seconds


Figure 21: End of the video at 00:18 seconds

From the test run, the conclusion can be made that the toolkit does track the pinky finger. From initial observations of the output video with the traced routes, the software appears to track the pinky with the same accuracy as the pointer finger. In contrast with the pen, this shows that the system is effective at determining fingers in comparison to objects.

6.5 Chapter Review

This section has seen the evaluation of the toolkit. It can be concluded that the toolkit needs more work to refine and make the system more accurate. However, it does meet the aim of tracking the path of a finger in a video stream and produces results with a 0.5 second delay.

Chapter 7


In this chapter the overall project is discussed. The overall outcome of the project, its successes and failures, will be evaluated. The aims will be reflected upon, as well as the knowledge and experience gained over the process.

7.1 Conclusion of Aims

Aim 1: To build a toolkit, which includes hardware that should be quick and easy to assemble, with minimal equipment.

This aim was successfully completed. In section 6.3 of the evaluation, the individual that was given the user manual along with the toolkit was able to efficiently and quickly set up the equipment and conduct the experiment. In their feedback they said it was quick and easy to run and required a minimum of three pieces of equipment, which was good, but that it could have benefited from a more user-friendly interface for the software. In addition, the hardware for the toolkit was adaptable, allowing researchers to use whatever device they had available to them; the device used during this project was also quick to set up and run.

Aim 2: Record a library of 20 - 40 video streams of participants undergoing various tasks using the RGB camera.

Through the various experiments and investigations, the target of 20 video streams was met and exceeded, with nearly 50 video streams collected and evaluated. The evidence for this can be located on the website, which contains all the raw footage recorded during this project along with all the outputs produced.

Aim 3: To be able to extrapolate information/data from the library gathered, including ‘hit’ time, ‘hit’ angle, ‘hit’ locations, the time taken for tasks to complete etc.

From the evaluation of the video streams, the data regarding the angle can be collected, along with the location. The ‘hit’ time could be calculated, as shown in 6.11, where the toolkit was able to efficiently track the finger movement but did exhibit a 0.5 second input delay. However, the toolkit wasn’t able to determine at what stages in the video stream the subject actually clicked on the input device; this was only inferred from repeated x and y values (coordinates), as the finger would have been stationary at those stages.

As evaluated in sections 6.12 and 6.13 the toolkit was not fully accurate when recording the location and angle. This is due to the values that were calculated not matching up with the results produced by the toolkit.

Aim 4: To evaluate the toolkit and whether it meets researchers' needs and wants (this may include ‘hit’ area, ‘hit’ time, ‘hit’ angle etc.), and to make sure it is capable of doing everything that was intended of it for this project.

This aim is subject to the situation the toolkit is being used in. For this project the three main needs were ‘hit’ time, angle and location. The toolkit was able to accomplish one of these needs but wasn’t able to accurately determine all three. Therefore, this current toolkit was not capable of doing everything that was intended by the project.

Aim 5: For the user to be able to carry out a Fitts's law study while finger movement is analysed by the toolkit.

The Fitts’s law study was used as one of the testing methods when gathering video streams for the evaluation of the toolkit. The images below show the Fitts’s law test in action; these were taken during the beginning stages of development of the toolkit and therefore show sporadic tracking. This helped to gauge the threshold level required from the toolkit to provide smooth and accurate tracking.


7.2 Project Revision

Although most of the aims were met and the toolkit was able to answer several questions, there is still room for improvement. These are a few ideas and improvements that would be added if this project were done again or if extra time for further development were given. Firstly, the toolkit's intention was to record the 'hit' of the finger onto an input device, i.e. the actual input of the finger. For example, if a finger is typing on a keyboard, the toolkit should be able to recognise the letters being typed. This doesn't work when the toolkit is used for the Fitts's law experiment or for browsing the web, because the actual input is not reflected by a finite letter combination.

Secondly, the user interface of this toolkit is poor. It was developed for HCI researchers and assumed they had prior knowledge of compiling and running programs in Java. In reality, if this toolkit were used by researchers in a different department, for example psychology, a researcher who is unable to follow the manual will not be able to use the toolkit efficiently.

7.3 Further Work

The most important section in this project is the further work. This section looks at the concluded project and the ideas proposed, and makes the case for further research and work to make the idea bigger and better in the future. It will look at how finger tracking in a video stream could be implemented in the future for a variety of uses in different sectors, including medicine, sign language translation, input device research and website development.

For this project there are several avenues for improvement. Firstly, the location and angle detection is very inaccurate and would therefore require a different method of obtaining these results; this could include adding sensor gloves to the toolkit, but that would require more equipment. In addition, the delay time of 0.5 seconds needs to be reduced, with the goal of obtaining a delay of 0 seconds. As mentioned in the project revision, the user interface needs to be more accessible and user-friendly.

7.31 Medical Use

As previously outlined in the interview with the PhD student, a great use for tracking fingers in a video stream would be in medicine, more specifically during surgery: minor surgeries that are routine and mostly repetitive. A video would record the surgeon's hand movements during a surgery and output the analysed information, including the x and y coordinates of their hands and the angles of their fingers. This information could then be uploaded into a robotic pair of hands that would be able to replicate the surgery; this would increase accuracy during the surgery and would also free up the surgeons' time for more complicated work.

This approach works better than the current approach to hand/finger monitoring, which uses sensors clipped onto the individual's hands. The major challenges facing this further work are, first, the three-dimensional aspects of the surgery; this would possibly require a minimum of two cameras along with a more powerful software toolkit. In addition, the contrast between the surgeon and the patient might be an issue, but this could be solved with bright gloves creating a greater contrast. There are more obstacles before finger tracking can be implemented in the health-care industry, but hopefully this project provides some ideas for getting started.

7.32 Sign Language Translation

Another idea for further work that would benefit from the ideas proposed in this project is a sign language translation system. The basis of the idea is to capitalise on a unique input method: by recording a video stream of a pair of hands saying a sentence in sign language, software using computer vision techniques can analyse the hand/finger movements and relate them to words.

After a library is filled and the software is taught how to translate sign language into words, individuals could use sign language to type: the video stream analyses the finger movements, translates them into words, and the words are added into a document, email or anywhere else. This would be a unique input device that could challenge the traditional keyboard, but also encourage people to learn sign language and thus help communication with hearing-impaired individuals.

7.33 Input Device Research

One of the goals of HCI (human-computer interaction) is to make the communication between the human and the computer as seamless and effortless as possible. This starts with the input device; as mentioned before, the sign language translator is a unique take on using the ideas of this project to create an input device. Another use of this project would be to monitor the approach taken by individuals when using a keyboard and other input devices. By analysing the travel time between keys, this toolkit can be used to gain information that could help with creating a more efficient and easier-to-use keyboard. Such keyboards are already on the market, but by gathering information about individuals we can determine what exactly they are looking for, potentially recording individuals for a certain amount of time and personalising a keyboard specifically for their typing habits.

7.34 Website Development

Just as the toolkit can be used to gain information about users' input habits and the devices they use, it can also be applied to website usage on iPads, smartphones, or any other touchscreen device. The toolkit can track users' habits as they use a website and how they navigate through it; over time, website developers can determine which buttons and pathways are the most common and can change the layout to be more user-friendly and efficient. This will help the development of websites and will also improve the user experience, because users will no longer be frustrated trying to navigate a website (and everyone has been there).
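Finding the most common pathways through a site could be sketched as follows. This is a hypothetical illustration; the session log format and page names are invented for the example:

```python
from collections import Counter

# Hypothetical sketch: tally how often each navigation path (a sequence of
# buttons/pages) occurs across recorded sessions, to find the most common ones.
sessions = [
    ["home", "shop", "cart"],
    ["home", "shop", "cart"],
    ["home", "about"],
]

# Paths are converted to tuples so they can be counted as dictionary keys.
path_counts = Counter(tuple(path) for path in sessions)
most_common_path, count = path_counts.most_common(1)[0]
# The layout could then prioritise the buttons along the most common path.
```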

7.35 Robotic Arm

Finally, another optimistic idea for further research. As proposed in work on Internet-connected robotic arms, “the movement of the robot arm can be controlled by a computer via the internet” [32]. This vision could be aided by the research and ideas brought forward in this project. If an individual's hand movements, for example while playing chess, can be traced and recorded, this data can be transferred over the internet to a robotic arm that recreates the individual's hand movements. Physical chess could then be played over the internet, and all that is needed is a webcam.

7.4 Reflection

During this project, I believe I have learned a huge amount about computer vision, something that I previously had little to no knowledge of. I have been introduced to the world of OpenCV, with its massive library of functions capable of various operations. Initially, when the project started, I did not have a definite idea of the direction I wanted to take, so I was able to experiment and learn how to use different concepts such as Haar cascades and background subtraction, and I came to understand the advantages and disadvantages of using various techniques to achieve my goal. This was all before I understood the steps I needed to take to ensure I could create an efficient and accurate toolkit and overall project.

Going into detail about how images can be manipulated and changed to obtain certain information really fascinated me during my research. I enjoyed learning how binarization works and why, and how we can essentially manipulate individual pixels and alter an image or video to analyse and extract specific information, such as the shape of a finger. Furthermore, I came to understand the intricate relationship between thresholding, contouring and blurring: how they all work together, and how the order in which the processes are applied is very important depending on what information you would like to obtain and how you want to use it.
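The binarization idea described above can be shown with a minimal sketch. This is plain Python rather than the toolkit's actual OpenCV calls, and the toy image values are invented for the example:

```python
# A minimal, plain-Python sketch of binarization: every pixel brighter than the
# threshold becomes white (255) and everything else black (0), mirroring the
# effect of a simple threshold operation on a greyscale image.

def binarize(image, threshold):
    """Turn a greyscale image (rows of 0-255 values) into pure black and white."""
    return [[255 if pixel > threshold else 0 for pixel in row] for row in image]

# Toy 3x3 "image": the bright cross shape stands in for part of a finger.
frame = [[10, 200, 30],
         [220, 250, 210],
         [20, 190, 40]]

binary = binarize(frame, threshold=127)
# binary == [[0, 255, 0], [255, 255, 255], [0, 255, 0]]
```

Contouring would then trace the boundary of the white region, which is why binarization has to happen before contours can be found.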

I enjoyed talking to the HCI PhD students along with my peers. As this toolkit is intended to be used by others, it was important to me that when someone is given the user manual or the instructions, they are able to understand where I am coming from and why this is of interest. In particular, I enjoyed talking to a PhD student about future innovations involving HCI, such as robotic arms conducting their own surgeries, or recording your own hand movements to play chess with someone across the world, simply because the toolkit can analyse the movement of your hand and replicate it with a robotic arm, all without expensive sensors or high-tech equipment.

Finally, thanks to this project I have been able to learn a wide range of skills. The experience has taught me how to put together a large report along with the accompanying research, in particular structuring the project, meeting personal deadlines and working largely independently. It has improved my communication skills, because I needed to be clear and concise to get my points and ideas across to someone unfamiliar with computer science altogether. In addition, doing the project in Python has aided me immensely, as I am now more confident in my coding. Python is in high demand, and this has allowed me to be more confident during assessment days and job interviews when talking about Python and computer vision. Finally, I believe I have improved my ability to undertake a large project and execute it efficiently on my own.

7.5 Closing statement

In conclusion, I believe this project was a great success, as it has helped me develop personally in various ways, including communication, time management, structure and work management. While I was not able to complete all my desired objectives to the fullest, such as accurately calculating the angle, I believe I put in my best effort and I am proud of the outcome. I was able to pick an area of computer science I was interested in (HCI), and I was given the opportunity to learn much more, including computer vision, the OpenCV library and techniques for altering video frames and images. Furthermore, I have acquired many invaluable skills which I can now apply to any number of fields. Finally, even with a challenging project such as this one, with scary hurdles and stressful times, I am ultimately proud of the work I was able to achieve.


[1] Letessier, J. & Bérard, F., 2004. Visual tracking of bare fingers for interactive surfaces. In Proceedings of the 17th Annual ACM Symposium on User Interface Software and Technology, pp. 119–122. (Accessed Oct 2018)

[2] Von Hardenberg, C. & Bérard, F., 2001. Bare-hand human-computer interaction. In Proceedings of the 2001 Workshop on Perceptive User Interfaces. ACM. (Accessed Oct 2017)

[3] Hackenberg, G., McCall, R. & Broll, W., 2011. Lightweight palm and finger tracking for real-time 3D gesture control. In Virtual Reality Conference (VR), 2011 IEEE, pp. 19–26. (Accessed Oct 2017)

[4] Gorodnichy, D.O. & Yogeswaran, A., 2006. Detection and tracking of pianist hands and fingers. In Computer and Robot Vision, 2006, The 3rd Canadian Conference on, p. 63. (Accessed Oct 2017)

[5] Dorfmüller-Ulhaas, K. & Schmalstieg, D., 2001. Finger tracking for interaction in augmented environments. In Augmented Reality, 2001, Proceedings, IEEE and ACM International Symposium on, pp. 55–64.

[6] Rico, J., Crossan, A. & Brewster, S. Gesture-based interfaces: practical applications of gestures in real world mobile settings. (Accessed Oct 2017)

[7] Straw, A.D. & Dickinson, M.H., 2009. Motmot, an open-source toolkit for realtime video acquisition and analysis. Source Code for Biology and Medicine, 4, p. 5. (Accessed Oct 2017)

[8] Popa, D., Gui, V. & Otesteanu, M., 2015. Real-time finger tracking with improved performance in video sequences with motion blur. In 2015 38th International Conference on Telecommunications and Signal Processing (TSP).

[9] Statista, 2018. Smartphones: statistics and facts. [online] (Accessed 7 Oct. 2017)

[10] Wallner, S., Danet, O., Eilersen, T. & Tved, J., 2018. An interactive visualisation of Fitts's law with JavaScript and D3. [online] (Accessed 14 Oct. 2017)

[11] Romo, J., 2018. Basic gesture recognition with a Kinect and OpenCV – finding the hand. [online] (Accessed 9 Dec. 2017)

[12] Hand tracking and recognition with OpenCV, 2018. [online] (Accessed 7 Mar. 2018)

[13] Python programming tutorials: Haar Cascade face and eye detection with Python and OpenCV, 2018. [online] (Accessed 7 Mar. 2018)

[14] Learning OpenCV, 2018. [online] Google Books. (Accessed 7 Mar. 2018)





[21] Rosebrock, A., 2018. Thresholding: simple image segmentation using OpenCV. [online] PyImageSearch. (Accessed 7 Mar. 2018)

[22] OpenCV: Image Thresholding, 2018. [online] (Accessed 7 Mar. 2018)

[23] Sezgin, M. & Sankur, B., 2004. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 13(1), pp. 146–168.

[24] Image Processing – Binarization, 2018. [online] (Accessed 7 Mar. 2018)

[25] OpenCV: Morphological Transformations, 2018. [online] (Accessed 8 Mar. 2018)

[26] Eroding and Dilating — OpenCV documentation, 2018. [online] (Accessed 8 Mar. 2018)

[27] OpenCV: Contours – Getting Started, 2018. [online] (Accessed 8 Mar. 2018)

[28] OpenCV: Contour Features, 2018. [online] (Accessed 18 Mar. 2018)

[29] Logitech C922 Pro Stream 1080p Webcam for Game Streaming, 2018. [online] (Accessed 17 Mar. 2018)

[30] Papitawholesale, 2018. Samsung Galaxy S9 Plus Dual Sim – 64GB, 6GB RAM, 4G LTE – Grey. [online] (Accessed 17 Mar. 2018)

[31] (2018). [online] (Accessed 17 Mar. 2018)

[32] Kadir, Samin & Ibrahim, 2012. Internet controlled robotic arm. Procedia Engineering, 41, pp. 1065–1071.

[33] Yu, T., Zhang, C., Cohen, M., Rui, Y. & Wu, Y., 2007. Monocular video foreground/background segmentation by tracking spatial-color Gaussian mixture models. In Motion and Video Computing, 2007, 23 Feb. 2007.


[34] Ghotkar, A.S. & Kharate, G.K., 2012. Hand segmentation techniques to hand gesture recognition for natural human computer interaction. International Journal of Human Computer Interaction, University of Pune, India.

[35] Song, P., Yu, H. & Winkler, S., 2008. Vision-based 3D finger interactions for mixed reality games with physics simulation. In Proceedings of the 7th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and Its Applications in Industry, Singapore, 8 December 2008. ACM.

Appendix A – User Manual

Crisanto Da Cunha Toolkit support for the analysis of finger movement in a video stream

Finger Movement Toolkit User Manual

Product Description

This toolkit takes recordings and runs them through software that analyses the finger movements within the video stream. The threshold (how sensitive the software needs to be in order to eliminate background noise) can be altered to achieve the best results when analysing a video sample.

System Requirements

The video camera used to capture the footage for analysis needs to be of high quality, preferably 1080p at 60 fps or higher. This ensures the highest possible accuracy when the footage is run through the software, and also helps avoid missed frames.

Hardware setup

[Pictures of the hardware setup]

Download and Install the toolkit software

To begin, download the code linked below and copy and paste it into any code editor, preferably PyCharm.

Open Toolkit

Windows — open the toolkit using the following steps:

Create a folder called ‘finger movement toolkit’.

Download the code into a code editor and save it in the folder.

Insert any video file that you want to run through the toolkit into the same folder.

Follow the link and instructions to download Python for Windows:

Go into the folder, right click and select the option “open command window here”.

While in the command prompt, enter the following line: python <full_path_of_video_file>

Mac — open the toolkit using the following steps:

Create a folder called ‘finger movement toolkit’.

Download the code into a code editor and save it in the folder.

Insert any videos that are going to be used into the same folder.

Open “terminal” and go into the folder that contains your project.

While in the folder, use the following command to compile the code and analyse your video file: python <full_path_of_video_file>


The values on lines 46, 72 and 73 of the code can be varied to change the threshold area used for hand detection as well as screen detection.

Additional modules (methods) can be added to the toolkit to detect or analyse other things.

On line 229, you can alter the output that is written to the results file.
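As a rough illustration of what such a threshold-area value does, the filtering it controls can be sketched as follows. This is a hypothetical stand-alone sketch, not the toolkit's actual code; the function name and numbers are invented:

```python
# Hypothetical sketch: a minimum-area threshold discards small detected regions
# (background noise) and keeps large ones (a hand or a screen), which is the
# role the adjustable threshold values play.

def filter_by_area(areas, min_area):
    """Keep only detections whose pixel area is at least min_area."""
    return [a for a in areas if a >= min_area]

# Toy detection areas in pixels: small values are noise, large ones are real objects.
detected = [12, 450, 3, 980, 60]

kept = filter_by_area(detected, min_area=100)
# kept == [450, 980]; raising min_area makes detection less sensitive to noise.
```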

Appendix B – 6.3 Questionnaire


Are you from a technical background (i.e. computer science, engineering, physics etc.)?

No I am not.


Do you have any experience coding?

No. I may have done some coding in ICT at school, but I don't really remember; even if I did, it might have been Excel or something very basic.


How did you find the set up process?

So I had the user manual, but I mainly looked at the pictures and just tried to replicate that. It wasn't too difficult; the hardest part was probably putting the webcam onto the tripod. Other than that it was simple, and when I plugged the webcam into the computer it worked straight away. So it was very easy and took a short time to do.


What experiments did you run?

I wanted to start slow to understand what I was doing, so I just started by typing my name into Notes, and then I tried to draw a smiley face in Notes.


Did you understand the code/ what was going on?

No I didn't, but I knew what the actual program did, and I knew I had to change one number on line 74 to change how accurate I wanted the tracking to be. So I took different videos, changed the number for each one and ran the code to see how it changed.


How accurate is the tracking?

From what I can see the tracking is very accurate; I was very surprised at how accurate it actually is. However, it does bug out at times and picks up other parts of my body, like my knuckle, and gets confused, but it normally goes back to tracking my fingertips.


Can you see this toolkit being used?

Personally, I would never use it, but for someone who's doing research work in this field I can see it being an easy-to-assemble tool for getting quick results, especially if they know coding and can understand what's going on.