
Netw Model Anal Health Inform Bioinforma (2013) 2:71–79

DOI 10.1007/s13721-013-0026-x

ORIGINAL ARTICLE

Finding objects for assisting blind people


Chucai Yi • Roberto W. Flores • Ricardo Chincha • YingLi Tian

Received: 18 September 2012 / Revised: 21 December 2012 / Accepted: 17 January 2013 / Published online: 7 February 2013
© Springer-Verlag Wien 2013

Abstract  Computer vision technology has been widely used for blind assistance, such as navigation and way finding. However, few camera-based systems have been developed to help blind or visually impaired people find daily necessities. In this paper, we propose a prototype system for blind-assistant object finding based on a camera network and matching-based recognition. We collect a dataset of daily necessities and apply Speeded-Up Robust Features (SURF) and Scale Invariant Feature Transform (SIFT) feature descriptors to perform object recognition. Experimental results demonstrate the effectiveness of our prototype system.

C. Yi • Y. Tian
Department of Computer Science, The Graduate Center,
The City University of New York, New York, NY 10016, USA
e-mail: cyi@gc.cuny.edu

R. W. Flores • R. Chincha • Y. Tian (corresponding author)
Department of Electrical Engineering,
The City College of New York, New York, NY 10031, USA
e-mail: ytian@ccny.cuny.edu

R. W. Flores
e-mail: rflores01@ccny.cuny.edu

R. Chincha
e-mail: rchinch00@ccny.cuny.edu

1 Introduction

The World Health Organization (WHO) estimated that in 2002, 2.6 % of the world's total population was visually impaired. The American Foundation for the Blind (AFB) further estimates that more than 25 million people in the United States live with vision loss. Visually impaired people face much inconvenience when interacting with their surrounding environments, and one of the most common challenges is finding dropped or misplaced personal items (e.g., keys, wallets). While much of the literature and many systems focus on navigation (Kao et al. 1996), way finding (Guide et al. 2008), scene object recognition (Nikolakis et al. 2005), text reading (http://www.seeingwithsound.com/ocr.htm), bar code reading, and banknote recognition (Sudol et al. 2010), few camera-based systems are available on the market for finding daily necessities for the blind. According to the AFB (2012), visually impaired individuals can live comfortably in a house or apartment by following these principles: increased lighting, hazard elimination, creation of color contrasts, item organization and labeling, and glare reduction. This means that blind or visually impaired people need effective assistance, including natural daylight simulation, fluorescent tape, colored objects against dark backgrounds, and the tagging of daily necessities for recognition. However, in most cases it is inconvenient and expensive to make these complex arrangements. In contrast, a camera setup is easier and more economical: a 1080p full High-Definition camera, which supports video at 1920 × 1080 resolution, costs less than $500. Thus computer vision and pattern recognition technology could be an effective alternative for assisting blind or visually impaired people.


To the best of our knowledge, few blind-assistive systems are able to help blind or visually impaired people find their daily necessities. Previous blind-assistant systems focus more on navigation, path planning, and tracking of blind people. In Orwell et al. (2005), a method was designed to track football players with a network of multiple cameras. In Marinakis and Dudek (2005), a method was designed to infer the relative locations of cameras and to construct an indoor layout by analyzing their respective observations. In Hoover and Olsen (2000), a sensor network cooperating with a first-person-view camera was built to help a robot perceive obstacles and plan effective paths. In Xie et al. (2008), a camera network was built for object retrieval, where the Scale Invariant Feature Transform (SIFT) descriptor was employed for object detection and recognition. Locomotion analysis (Tang and Su 2012) could be used to design specific room layouts that help blind people live more conveniently. These systems assume that the blind user is located in an unfamiliar environment, and they provide general instructions to help blind people reach their destination. However, blind or visually impaired people spend relatively little time in unfamiliar environments (as do people with normal vision), so a blind-assistant system should care more about the daily life of blind people. They need better perception and control of their personal objects, so we employ computer vision technology to help them manage their daily necessities.

In our proposed system, a multiple-camera network is built by placing a camera at each of the important locations in the indoor environment of the blind user's daily life. The important locations are usually around tables, cabinets, and the wash sink. The cameras monitor the scene around these fixed locations and inform blind users of the locations of their demanded objects. In this process, matching-based object recognition is performed to find the objects. Fast and efficient object recognition is a very popular area in the computer vision community. Its goal is to localize and verify given objects in images or video sequences. Humans are able to recognize a wide variety of objects within images or videos with little effort, regardless of differences in viewpoint, scenario, image scale, translation, rotation, a range of distortions, and illumination (Kreiman 2008). It has been shown that the human visual system can discriminate among tens of thousands of different object categories (Biederman 1987) in an efficient way, in about 100–200 ms (Potter and Levy 1969; Thorpe et al. 1996; Hung et al. 2005). The human visual system can also effectively handle background clutter and object occlusions. However, it is extremely difficult to build robust and selective computer vision algorithms for object recognition that handle objects with large similarity or variations. Fortunately, present and future technological innovations in object detection and recognition systems will contribute extensively to helping blind individuals. LookTel was presented in Sudol et al. (2010) as an extensive platform that integrates state-of-the-art computer vision techniques with mobile communication devices and returns real-time banknote recognition results announced by a text-to-speech engine. OCR engines (http://www.seeingwithsound.com/ocr.htm) were developed to transform the text information in camera-based scene images into readable text codes. Signage and text detection (Wang et al. 2012) was designed to extract abstract symbol information directly from natural scene images. Besides, some other recognition systems used FM sonar sensors (Kao et al. 1996) and camera-integrated walking canes (Guide et al. 2008) for blind navigation. Sensor modules could be used for searching tasks in the surrounding environments (Hub et al. 2004). Some other prototypes for blind-assistant localization and recognition were introduced in Hub et al. (2006), Gehring (2008) and Bobo et al. (2008). These devices could offer an equivalent of raw visual input to blind people via complex soundscapes (head-mounted camera and stereo headphones), thus leaving the recognition tasks to the human brain. Furthermore, many other techniques have been developed for efficient object detection and recognition. Jauregi et al. (2009) proposed a two-step algorithm based on region detection and feature extraction. This approach aims to improve the extracted features by reducing unnecessary keypoints, increasing efficiency in terms of both accuracy and computational time. Ta et al. (2009) presented an efficient algorithm for continuous image recognition and feature descriptor tracking in video.

2 System design

Our system consists of a wearable camera together with multiple fixed cameras. Figure 1 illustrates the layout design of the system. The blind user is equipped with a wearable camera that is connected (wired or wirelessly) to a computer (PDA or laptop), as shown in Fig. 2. The user can send a request to find an item by speech command and then wear the camera system to look for the item. When the system finds the requested item, an audio signal is produced. A dataset of the user's personal items is created as reference samples. In the dataset, multiple images of each item are captured under different camera views, scales, lighting changes, and occlusions.

Fig. 1 An illustration of the layout design of our prototype system. Cameras are fixed at the important locations where blind users usually leave their daily necessities. All the cameras have access to a local network, which is managed by a host

Fig. 2 An illustration of the wearable devices for the blind user, including a sunglasses camera, a mini-computer for data processing, and an audio system to output recognition results

Fig. 3 Flow chart of our proposed object recognition algorithm

Multiple cameras are fixed at the important locations where the blind user probably leaves his/her items, and they compose a blind-assistant network. When a blind user sends a request for object finding to the system, all the fixed cameras start object recognition by comparing their captured objects with the reference objects in the dataset. The system then reports the most similar object according to the matching distance and instructs the blind user to get close to its location. Next, the camera attached to the blind user performs further recognition to verify the existence of the demanded object.

The flowchart of our proposed object recognition algorithm is shown in Fig. 3. First, based on the blind user's request, features are extracted from the camera-captured images by Speeded-Up Robust Features (SURF) or SIFT. Then, these features are compared to pre-calculated features from reference images of the requested object in the dataset. If matches are found, the algorithm outputs whether the object has been found according to the pre-established thresholds for each object.
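The following minimal Python sketch mirrors these three steps, assuming OpenCV's built-in SIFT (cv2.SIFT_create, available in recent opencv-python releases) and a brute-force matcher; the object names, the ratio used to accept a match, and the threshold values are illustrative placeholders rather than the values used in our system.

```python
import cv2

# Illustrative per-object thresholds (minimum number of accepted matches);
# in the actual system these are pre-established experimentally per object.
MATCH_THRESHOLDS = {"keys": 10, "sunglasses": 10}

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def extract_features(gray_img):
    """Step 1: extract keypoints and descriptors from a camera image."""
    return sift.detectAndCompute(gray_img, None)

def count_good_matches(query_des, ref_des, ratio=0.75):
    """Step 2: compare query features with pre-calculated reference features."""
    if query_des is None or ref_des is None:
        return 0
    good = 0
    for pair in matcher.knnMatch(query_des, ref_des, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1
    return good

def object_found(query_gray, ref_des, object_name):
    """Step 3: decide found / not found using the per-object threshold."""
    _, query_des = extract_features(query_gray)
    return count_good_matches(query_des, ref_des) >= MATCH_THRESHOLDS[object_name]
```

In the full system this decision runs on every fixed camera and once more on the wearable camera when the user approaches the reported location.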
3 Object feature extraction

A large variety of features can be extracted from a camera-based scene image, so the combination and selection of features play an important role in data analysis (Xiang et al. 2012; Husle et al. 2012). Interest point detectors and descriptors (Bay et al. 2006; Lowe 2004) are able to extract representative and discriminative features from reference images. Based on the state-of-the-art local feature descriptors SIFT (Lowe 2004) and SURF (Bay et al. 2006; SURF source code 2008), precise matching is possible between images containing identical objects. Both SIFT and SURF are able to extract representative and discriminative keypoints from an image, which carry significant information about the appearance and structure of the object in the image.

SURF is a robust image detector and descriptor that can be used in computer vision tasks such as object recognition and 3D reconstruction. The standard version of SURF is several times faster than SIFT. The SURF detector is based on the Hessian matrix, which provides excellent performance in computational time and accuracy. Given a point x = (x, y) in an image I, the Hessian matrix H(x, σ) at x and scale σ is defined as Eq. (1):

    H(x, σ) = | L_xx(x, σ)   L_xy(x, σ) |
              | L_xy(x, σ)   L_yy(x, σ) |                                  (1)

where L_xx(x, σ), L_xy(x, σ) and L_yy(x, σ) are the convolutions of the Gaussian second-order derivatives with the image I at point x = (x, y). Gaussians are optimal for scale-space analysis, but in practice they must be discretized and cropped. The detection of interest points in the image is determined by non-maximum suppression in a 3 × 3 × 3 neighborhood.

On the other hand, the SURF descriptor is based on sums of approximated Haar wavelet responses within a circular interest point region of radius 6S, with S being the scale at which the interest point was detected. Since the wavelets are large at high scales, integral images are used for efficient, fast filtering. Once the wavelet responses are calculated, the dominant orientation can be estimated within a sliding orientation window covering an angle of π/3. The horizontal and vertical responses within this window yield a vector, and the longest such vector lends its orientation to the interest point. Square descriptors are then constructed around the interest points along the orientation determined by this vector. Each square is then subdivided regularly into smaller 4 × 4 sub-regions, keeping important spatial information in the descriptor construction.
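A small sketch of the detector response behind Eq. (1) is given below, computed here with continuous Gaussian derivatives (via SciPy) rather than the box-filter and integral-image approximation that SURF actually uses; it is meant only to illustrate the quantity being thresholded.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_determinant_response(img, sigma):
    """Entries of H(x, sigma) in Eq. (1) via Gaussian second-order derivatives.
    The real SURF detector approximates these with box filters on an integral
    image and down-weights the mixed term by about 0.9."""
    img = img.astype(np.float64)
    Lxx = gaussian_filter(img, sigma, order=(0, 2))   # d^2/dx^2 (x = column axis)
    Lyy = gaussian_filter(img, sigma, order=(2, 0))   # d^2/dy^2
    Lxy = gaussian_filter(img, sigma, order=(1, 1))   # mixed derivative
    return Lxx * Lyy - Lxy ** 2                       # det(H): blob response

# Interest points are local maxima of this response over x, y and scale,
# found by non-maximum suppression in a 3 x 3 x 3 neighborhood.
```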
SIFT extracts local features by mapping the gradient orientations within predefined blocks and cells into a histogram. It calculates the Difference of Gaussian (DOG) of image maps in scale space as Eqs. (2) and (3):

    L(x, y, σ) = G(x, y, σ) * I(x, y)                                      (2)
    D(x, y, σ) = L(x, y, kσ) − L(x, y, σ)                                  (3)

where x, y, and σ denote the spatial coordinates and the scale respectively, G(x, y, σ) is a Gaussian filter with variable scale σ, and D(x, y, σ) represents the DOG map. The DOG function approximates the scale-normalized Laplacian of Gaussian, that is, D(x, y, σ) ≈ σ²∇²G. Previous work shows that the local maxima and minima of σ²∇²G are among the most stable image features in comparison with gradient, Hessian, or Harris responses. Thus, the local maxima and minima of σ²∇²G are extracted as SIFT feature points.
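Eqs. (2) and (3) can be sketched directly with two Gaussian blurs, as below; the full SIFT detector repeats this across a pyramid of scales and octaves and then searches for local extrema of D.

```python
import cv2
import numpy as np

def difference_of_gaussian(img, sigma, k=np.sqrt(2)):
    """Eqs. (2)-(3): L = G * I at two nearby scales, D = L(k*sigma) - L(sigma)."""
    img = img.astype(np.float32)                     # avoid uint8 underflow
    L1 = cv2.GaussianBlur(img, (0, 0), sigma)        # L(x, y, sigma)
    L2 = cv2.GaussianBlur(img, (0, 0), k * sigma)    # L(x, y, k*sigma)
    return L2 - L1                                   # D(x, y, sigma)
```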


Fig. 4 An illustration of the SIFT descriptor. The left presents the original image and a keypoint in the red circle. The middle presents the 16 blocks around the keypoint and their respective gradient orientations. The right denotes a 128-dimensional feature vector generated by the votes of quantized gradient orientations (color figure online)

The SIFT keypoints take the form of oriented disks attached to representative structures of the objects in the image. The detected keypoints remain invariant to translations, rotations, scale changes, and other deformations. They are then used as representative local features of the objects in the image. Around each SIFT keypoint, a 4 × 4 grid of blocks is defined and a histogram of gradient orientations is generated for each block. Since the gradient orientation is quantized into eight values, each histogram has eight corresponding bins. All histograms of the blocks are then cascaded into a 16 × 8 = 128 dimensional vector, as shown in Fig. 4. This feature vector is used as the SIFT descriptor at the keypoint.
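The cascade can be sketched as follows for a single keypoint, assuming the gradient magnitude and orientation have already been sampled on a 16 × 16 grid around it; the Gaussian weighting, rotation to the dominant orientation, and trilinear interpolation of the full SIFT descriptor are omitted.

```python
import numpy as np

def sift_like_descriptor(magnitude, orientation, n_bins=8):
    """Cascade 4 x 4 blocks x 8 orientation bins into a 128-D vector (Fig. 4).
    `magnitude` and `orientation` are 16 x 16 arrays of gradient magnitude and
    angle (radians) sampled around one keypoint."""
    assert magnitude.shape == orientation.shape == (16, 16)
    hist = []
    for by in range(4):
        for bx in range(4):
            m = magnitude[4 * by:4 * by + 4, 4 * bx:4 * bx + 4]
            o = orientation[4 * by:4 * by + 4, 4 * bx:4 * bx + 4]
            h, _ = np.histogram(o, bins=n_bins, range=(0.0, 2 * np.pi), weights=m)
            hist.append(h)
    vec = np.concatenate(hist).astype(np.float64)    # 16 blocks x 8 bins = 128
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```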
The SIFT descriptor plays an important role in many applications involving content-based information retrieval, object matching, and object recognition. It is able to find distinctive keypoints in images that are invariant to location, scale, rotation, affine transformations and illumination changes. Given two images, one a complete object with a clean background and a standard viewpoint, and the other containing the same object under a different viewpoint, a complex background, or partial occlusion, we apply the SIFT detector and descriptor to find matches between the two images. Each match is assigned a score based on the Euclidean distance between the SIFT descriptors of the two matched points. The number and scores of the keypoint matches are used to design our object recognition algorithm.

4 System implementation

In order to effectively recognize objects, we collect a dataset of daily necessities as reference objects. This dataset contains personal items that are essential to visually impaired individuals, such as keys and sunglasses. In addition, the dataset covers a variety of conditions such as image scale, translation, rotation, change in viewpoint, and partial occlusion. All these images are captured in the presence of cluttered backgrounds. Some examples are displayed in Fig. 5.

4.1 Camera network

Our proposed system is based on a network of cameras. A blind user sends a request to find a specific object, and the system starts object recognition from the cameras to search their respective regions. Each camera outputs a recognition score by comparing its captured objects with image samples of the reference object in the dataset. The score is calculated from the average matching distance of the SURF/SIFT keypoints. Thus, several possible locations of the requested object can be obtained from small matching distances.

In our system, all the cameras are connected to a local network for information sharing. Each camera reports its results of matching the reference objects. A host takes charge of information collection and analysis. It notifies the blind user of the most probable locations of his/her expected objects.
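A minimal sketch of this host-side aggregation follows, assuming each fixed camera has already extracted descriptors from its current view; the dictionary layout and camera identifiers are hypothetical, and a real deployment would also handle the network transport and the speech interface.

```python
import cv2
import numpy as np

matcher = cv2.BFMatcher(cv2.NORM_L2)

def camera_score(camera_des, ref_des):
    """Average matching distance between one camera's descriptors and the
    reference object's descriptors (smaller means more similar)."""
    if camera_des is None or ref_des is None or len(camera_des) == 0:
        return float("inf")
    matches = matcher.match(camera_des, ref_des)     # nearest neighbour per keypoint
    if not matches:
        return float("inf")
    return float(np.mean([m.distance for m in matches]))

def most_probable_location(per_camera_des, ref_des):
    """Host side: collect every camera's score and report the location
    (camera id) with the smallest average matching distance."""
    scores = {cam: camera_score(des, ref_des) for cam, des in per_camera_des.items()}
    best = min(scores, key=scores.get)
    return best, scores
```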
4.2 Matching-based recognition

To identify an object in every query image without counting any false matches, thresholds are set up for every reference image based on experiments. In our system, a threshold level of ten key matching points is employed for the cell phone, sunglasses, and keys, and a threshold level of 25 key matching points is used for the wrist watch and juice cup.

Figure 6 illustrates the interest points detected from different reference objects, marked in red circles. The same method is also applied to the query object. After the construction of descriptors around the interest points of the reference and query images, SURF features from the reference images are extracted and matched against the features from the query images in our dataset. If the SURF features in the query image match the ones in the selected object reference image, then we employ the predefined thresholds to determine whether the object is the expected item which the blind user is looking for. Figure 8 illustrates that our algorithm can successfully predict the category of a query image in the presence of background clutter, partial occlusion, and viewpoint changes.
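As a sketch of this step, reference interest points like those in Fig. 6 can be reproduced with OpenCV; note that SURF lives in the opencv-contrib xfeatures2d module and is only available in builds with the non-free algorithms enabled, so the snippet falls back to SIFT otherwise, and the file name is hypothetical.

```python
import cv2

try:
    # SURF requires an opencv-contrib build with non-free algorithms enabled.
    detector = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
except (AttributeError, cv2.error):
    detector = cv2.SIFT_create()   # fallback so the sketch still runs

ref = cv2.imread("reference_keys.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
keypoints, descriptors = detector.detectAndCompute(ref, None)

# Draw the detected interest points as circles, similar to Fig. 6.
vis = cv2.drawKeypoints(ref, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imwrite("reference_keys_interest_points.png", vis)
```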


Fig. 5 Samples of object images taken under different conditions. In each group, the first one is a standard reference image, and the later five respectively represent scaling change, translation, rotation, viewpoint change, and partial occlusion

Fig. 6 Interest points of the main reference images. For each group, the left one denotes the original image and the right one denotes the detected SURF features

Next, the SIFT descriptor is applied to the dataset for performance evaluation of object recognition through keypoint matching. At first, we collect an object dataset of reference images, which consists of ten categories of common-use objects, such as keys, glasses, coffee cup, etc. To predict the object category of a query image I_Q, we compare it with each reference image I_R in the dataset. The SIFT detector is applied to obtain two sets of keypoints, K_Q and K_R, respectively from the two images. The corresponding 128-dimensional SIFT descriptor is calculated at each keypoint. Based on the Euclidean distance between SIFT descriptors, each keypoint is assigned a match from the other image, as follows.


Firstly, a keypoint P (P ∈ K_Q) is measured to obtain its distance to each keypoint in K_R. If the keypoint P′ (P′ ∈ K_R) has the minimum distance to P, the two are regarded as a match of each other. Secondly, we only preserve the keypoints where the ratio between the nearest-neighbor distance and the second-nearest-neighbor distance is greater than 0.6; this reduces false positives and increases robustness. The other keypoints and their corresponding matches are removed. Thirdly, we calculate the mean distance d(I_Q, I_R) over the remaining keypoint matches. As mentioned above, I_R is one of the reference images in the dataset. We then calculate all the mean distances from the query image I_Q to the sibling reference images of I_R, which belong to the same category, as in Eq. (4):

    L_C(I_Q) = (1 / ||C||) Σ_{I_R ∈ C} d(I_Q, I_R)                         (4)

where L_C(I_Q) represents the mean distance from the query image to a category C of reference images, and ||C|| represents the number of reference images in category C. The minimum distance models the similarity between the object in the query image and those in the dataset, and it can be used to predict which category the object in the query image belongs to. In the same way, we calculate the mean distance to each category in the dataset. The query image is then assigned to the category with the minimum mean distance, as in Eq. (5):

    C* = arg min_{C ∈ dataset} L_C(I_Q)                                    (5)
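A compact sketch of Eqs. (4) and (5) is given below, assuming SIFT descriptors and a brute-force matcher; following common practice, the ratio filter in the sketch keeps a match when the nearest-to-second-nearest distance ratio is below the threshold, which is what the filtering step above is intended to achieve. The dataset layout (category name mapped to a list of reference descriptor arrays) is an assumption for illustration.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def mean_match_distance(des_q, des_r, ratio=0.6):
    """d(I_Q, I_R): mean distance over ratio-filtered nearest-neighbour matches."""
    if des_q is None or des_r is None:
        return float("inf")
    kept = []
    for pair in matcher.knnMatch(des_q, des_r, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            kept.append(pair[0].distance)
    return float(np.mean(kept)) if kept else float("inf")

def predict_category(query_gray, dataset):
    """Eq. (4): average d(I_Q, I_R) over each category C.
    Eq. (5): return the category with the minimum mean distance."""
    _, des_q = sift.detectAndCompute(query_gray, None)
    scores = {c: float(np.mean([mean_match_distance(des_q, des_r) for des_r in refs]))
              for c, refs in dataset.items()}
    return min(scores, key=scores.get)
```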
Figure 7 depicts the keypoint matching between objects from an identical category, but under different viewpoints, distances, and backgrounds. This figure demonstrates that the SIFT feature detector and descriptor can handle arbitrary translations, rotations, and scale changes. It ensures that our object finding system will not be influenced by the relative positions of the blind user and the object, or by the surrounding environment of the object.

5 Experimental results and discussion

To evaluate the performance of SURF- and SIFT-based object recognition, we collect a testing dataset for the ten classes of daily necessities. Each class contains ten image samples. It covers a variety of conditions such as image scale, translation, rotation, change in viewpoint, and partial occlusion.

The proposed algorithm can effectively distinguish different classes of objects from test images of the same objects in the presence of a cluttered background, in different scenarios, and under a variety of conditions. When further testing is performed with multiple query images, the algorithm successfully identifies objects, as shown in Fig. 8. The white lines show the traces from the reference image to the test image, which represent the positions where the matching features have been detected. If the number of matching points is greater than the threshold, the object is detected. Furthermore, there are no falsely identified items for any of the classes of objects, even when they are partially occluded by other objects.

We observe that under some situations the algorithm fails to find enough matching points to identify an expected object when it is captured by the camera. Further, when the algorithm initially identifies an object of interest based on the thresholds of each object class, its matching features may still not be enough to correctly identify it. This happens due to the challenging conditions of some test images, which are taken under a variety of conditions.

SURF features are only robust to some degree of rotation and viewpoint change. These errors are caused by images with very large variations in illumination, background, scaling and/or rotation. Changes in illumination and cluttered backgrounds generate few matching points, which affects object recognition. The large scale change has been described as the strong relief effect in Yu and Morel (2011) (ASIFT) and also affects recognition.

Fig. 7 SIFT feature based interest point matching of the main reference images in the dataset, where blue lines represent matched keypoints (color figure online)

Fig. 8 Matching between reference and test images in the presence of cluttered background under various conditions, keys (image scale) and sunglasses (translation). It shows that the SURF detector and descriptor can recognize the expected object under various environmental conditions

Table 1 SURF-based recognition accuracy for each reference object

Objects          No. of images   Correct detected   False positive   Accuracy (%)
Book             10              7                  3                70
TV-control       10              5                  5                50
Antiseptic pad   10              5                  5                50
Coffee cup       10              8                  2                80
DVD control      10              8                  2                80
Juice cup        10              8                  2                80
Water bottle     10              6                  4                60
Watch wrist      10              10                 0                100
Keys             10              6                  4                60
Sunglasses       10              6                  4                60
Total            100             69                 31               69

Rotation plays a big role as well, since the SURF descriptors are constructed around interest points whose largest orientation vector lies within a sliding orientation window of π/3. Any orientation vector outside this range will result in mismatching. Therefore, if we were not dealing with images that contain extreme scaling changes and rotations, we would have a very robust and efficient object recognition algorithm.

Because we deal with images that contain a variety of conditions, the test accuracy for each class of objects is between 50 and 95 %. Our algorithm achieves an average accuracy of 69 %, as shown in Table 1.

Table 2 presents the results of SIFT-based object recognition. We can see that objects with discriminative surface texture, such as the book and the juice cup, obtain higher recognition accuracy. This is because the SIFT descriptor is designed for texture matching. However, the recognition results for keys and sunglasses are lower, because their appearance depends heavily on the viewpoint of capture. The two tables demonstrate that the SIFT-based descriptor obtains better performance than the SURF-based descriptor.

5.1 Comparisons between SURF-based recognition and SIFT-based recognition

The experimental results demonstrate that SURF-based recognition has higher efficiency, while SIFT-based recognition obtains higher accuracy over all testing object categories. The SIFT detector extracts DOG maxima and minima as keypoints, which are more stable than SURF keypoints based on the Hessian matrix. Besides, the SIFT descriptor in our experiments has 128 dimensions, while the SURF descriptor has only 64 dimensions, so the SIFT descriptor preserves more information about the local features. From another aspect, SURF improves the computational efficiency of feature extraction because (1) it lowers the feature dimension compared to SIFT, and (2) it simply sums first-order Haar wavelet responses to form the feature descriptor, instead of computing statistics of gradients.
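The efficiency and dimensionality gap can be checked with a rough timing sketch such as the one below; SURF again requires an opencv-contrib build with the non-free code enabled, and the test image name is a placeholder.

```python
import time
import cv2

img = cv2.imread("test_scene.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
assert img is not None, "provide a test image"

detectors = {"SIFT": cv2.SIFT_create()}
try:
    # SURF descriptors are 64-dimensional by default.
    detectors["SURF"] = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
except (AttributeError, cv2.error):
    print("SURF not available in this OpenCV build")

for name, det in detectors.items():
    t0 = time.perf_counter()
    kps, des = det.detectAndCompute(img, None)
    dt = time.perf_counter() - t0
    dim = des.shape[1] if des is not None else 0
    print(f"{name}: {len(kps)} keypoints, {dim}-D descriptor, {dt * 1000:.1f} ms")
```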


Table 2 SIFT-based recognition accuracy for each reference object

Objects          No. of images   Correct detected   False positive   Accuracy (%)
Book             10              9                  1                90
TV-control       10              8                  2                80
Antiseptic pad   10              8                  2                80
Coffee cup       10              9                  1                90
DVD control      10              8                  2                80
Juice cup        10              9                  1                90
Water bottle     10              7                  3                70
Watch wrist      10              6                  4                60
Keys             10              5                  5                50
Sunglasses       10              5                  5                50
Total            100             74                 26               74

It is a challenging task to combine SIFT and SURF, because the two detectors extract different groups of keypoints from an identical object, which correspond to different aspects of object appearance and structure. A simple cascade of the two descriptors cannot improve the recognition accuracy, but further lowers the efficiency. In future work, we will integrate both SIFT and SURF into a histogram of visual words under the Bag-of-Words framework, which should generate a more robust and efficient object recognizer.

6 Conclusion and future work

This paper presents a prototype blind-assistant system. It helps blind users find their personal items in daily life through a camera-based network and a matching-based object recognition algorithm. We employ two types of local feature descriptors for detecting keypoint matches. The SURF and SIFT interest point detectors and descriptors are scale- and rotation-invariant, and they provide our object recognition algorithm with ways to handle image scaling, translation, rotation, change in viewpoint, and partial occlusion between objects in the presence of cluttered backgrounds.

In order to evaluate the performance of our algorithm, a reference dataset is built by collecting daily necessities that are essential to visually impaired or blind people. Experimental results demonstrate that the proposed algorithm can effectively identify objects under conditions of cluttered background and occlusion without falsely identifying any reference object. However, based on the pre-learned thresholds, the algorithm also failed to find enough matches to identify an object of interest present in the image, such as the "cell phone", due to a lack of distinguishable features.

Our future work will focus on enhancing the object recognition system so that it can better detect and identify objects under extreme and challenging conditions. We will enhance the cooperation of the different cameras within the system, and address human interface issues for image capture and auditory display of the object recognition results on computers and cell phones.

Acknowledgments This work was supported by NSF grants IIS-0957016 and EFRI-1137172, NIH 1R21EY020990, ARO grant W911NF-09-1-0565, DTFH61-12-H-00002, Microsoft Research, and a CITY SEEDs grant.

References

American Foundation for the Blind (2012) http://www.afb.org/. Accessed 2012
Bay H, Tuytelaars T, Van Gool L (2006) SURF: speeded up robust features. In: European Conference on Computer Vision
Biederman I (1987) Recognition-by-components: a theory of human image understanding. Psychol Rev 94:115–147
Bobo B, Chellapa R, Tang C (2008) Developing a real-time identify-and-locate system for the blind. In: Workshop on computer vision applications for the visually impaired
Gehring S (2008) Adaptive indoor navigation for the blind. Proc GI Jahrestagung 1:293–294
Guide R, Østerby M, Soltveit S (2008) Blind navigation and object recognition. Laboratory for Computational Stochastics, University of Aarhus, Denmark. http://www.daimi.au.dk/~mdz/BlindNavigation_and_ObjectRecognition.pdf. Accessed 2008
Hoover A, Olsen B (2000) Sensor network perception for mobile robotics. IEEE Int Conf Robotics Autom 1:342–347
Hub A, Diepstraten J, Ertl T (2004) Design and development of an indoor navigation and object identification system for the blind. In: Proceedings of ASSETS, pp 147–152
Hub A, Hartter T, Ertl T (2006) Interactive tracking of movable objects for the blind on the basis of environmental models and perception-oriented object recognition methods. In: Proceedings of ASSETS, pp 111–118
Hung C, Kreiman G, Poggio T, DiCarlo J (2005) Fast read-out of object identity from Macaque inferior temporal cortex. Science 310:863–866
Husle J, Khoshgoftaar M, Napolitano A, Wald R (2012) Threshold-based feature selection techniques for high-dimensional bioinformatics data. Netw Model Anal Health Inform Bioinform 1(1–2):47–61
Jauregi E, Lazkano E, Sierra B (2009) Object recognition using region detection and feature extraction. In: Proceedings of towards autonomous robotic systems (TAROS) (ISSN: 2041-6407)
Kao G, Probert P, Lee D (1996) Object recognition with FM sonar: an assistive device for blind and visually-impaired people. In: AAAI fall symposium on developing assistive technology for people with disabilities. MIT, Cambridge
Kreiman G (2008) Biological object recognition. Scholarpedia 3(6):2667. http://www.scholarpedia.org/article/Biological_object_recognition
Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision
Marinakis D, Dudek G (2005) Topology inference for a vision-based sensor network. In: Proceedings of Canadian conference on computer and robot vision, pp 121–128
Mobile OCR, face and object recognition for the blind. http://www.seeingwithsound.com/ocr.htm. Accessed 1996–2013
Nikolakis G, Tzovaras D, Strintzis MG (2005) Object recognition for the blind. In: Proceedings of 13th European signal processing conference (EUSIPCO 2005), Antalya, Turkey


Orwell J, Lowey L, Thirde D (2005) Architecture and algorithms for tracking football players with multiple cameras. IEEE Proc Vision Image Signal Process 152(2):232–241
Potter M, Levy E (1969) Recognition memory for a rapid sequence of pictures. J Exp Psychol 81:10–15
SURF source code (2008) http://www.vision.ee.ethz.ch/~surf/. Accessed 2008
Sudol J, Dialameh O, Blanchard C, Dorcey T (2010) LookTel—a comprehensive platform for computer-aided visual assistance. In: IEEE Conference on Computer Vision and Pattern Recognition
Ta DN, Chen WC, Gelfand N, Pulli K (2009) SURFTrac: efficient tracking and continuous object recognition using local feature descriptors. In: IEEE Conference on Computer Vision and Pattern Recognition, pp 2937–29
Tang W, Su D (2012) Locomotion analysis and its applications in neurological disorders detection: state-of-art review. Netw Model Anal Health Inform Bioinform
Thorpe S, Fize D, Marlot C (1996) Speed of processing in the human visual system. Nature 381:520–522
Wang S, Yi C, Tian Y (2012) Signage detection and recognition for blind persons to access unfamiliar environments. J Comput Vision Image Process 2(2)
Xiang Y, Fuhry D, Kaya K, Jin R, Catalyurek U, Huang K (2012) Merging network patterns: a general framework to summarize biomedical network data. 1(3):103–116
Xie D, Yan T, Ganesan D, Hanson A (2008) Design and implementation of a dual-camera wireless sensor network for object retrieval. In: Proceedings of the 7th international conference on information processing in sensor networks, pp 469–480
Yu G, Morel J-M (2011) ASIFT: an algorithm for fully affine invariant comparison. Image Processing On Line

