
An Open Architecture to Develop a Handheld Device for Helping Visually Impaired People

Filippo Battaglia and Giancarlo Iannizzotto, Member, IEEE


Abstract: The wide availability of modern handheld devices equipped with powerful hardware, large memory and mass storage, a webcam and a wireless connection suggests that they can also be used to develop aids for visually impaired people. Most of the handheld devices and software currently available for this purpose are quite expensive, cannot easily be expanded and are based on closed systems, while little or no effort seems to have been made to achieve integration between different applications. This paper introduces Blind Assistant, an integrated, modular, expandable, open source software package intended to show that effective and affordable aids for the visually impaired can be produced, with little effort and at low cost, by porting software currently available only for cumbersome personal computers.

Index Terms: Assistive technologies, handheld devices, open source software, image analysis

I. INTRODUCTION AND MOTIVATION

The development of portable devices that can perform primary visual recognition functions could significantly contribute towards improving the level of autonomy, and thus the quality of life, of many visually impaired men and women. Artificial vision functions such as face recognition, automatic text reading, automatic detection of obstacles, detecting and counting people in a room, and recognition of rooms or environments the user has previously been in can currently be performed in real time, and with a degree of accuracy sufficient for many applications, by a personal computer equipped with an adequate video camera [1][2]. On the other hand, although it is not inconceivable to equip a visually impaired user with a device the size of a notebook in order to provide powerful and highly useful visual functions, weight and size remain a serious problem when attempting to develop aids that really ensure autonomy and mobility.

A second problem is cost: whereas the average cost threshold for medical aids designed for impaired people is traditionally considered to be higher than that of other applications (for reasons that today are perhaps not fully acceptable [3]), it is also true that costs can be decidedly reduced by using widely available consumer components, with evident social benefits.
Filippo Battaglia is with VISILab, University of Messina, 98100 Italy (e-mail: filbattaglia@libero.it). Giancarlo Iannizzotto is with DISIA, University of Messina, 98100 Italy (e-mail: ianni@unime.it).

A third problem is the need for customizability: during our interviews with visually impaired people and care operators belonging to a nation-wide volunteers association, it was repeatedly pointed out that visually impaired people may have different needs and different abilities. They have learnt different ways to cope with everyday difficulties and different work-arounds to overcome the constraints of a physical or social environment. This is also widely reported in the literature [4]. In order to optimize the interaction of the user with the aiding device, and to maximize its usability and utility, personalization of the interface and a custom choice of some specific aiding functions are needed. In most cases such customization requires the underlying software/hardware to be an open platform, i.e. an architecture with a published external programming interface that allows the available functions to be extended, or their technology to be integrated into a new application or framework, without requiring modification of the source code. Using such an interface, typically known as an Application Programming Interface (API), third-party software can be integrated with the platform to add functionality. Unfortunately, to our knowledge there are currently no open platforms for aiding visually impaired people that provide full customizability.

A viable choice, as pointed out by some of the volunteer care operators we interviewed, would be to use open source software. Most volunteer associations are already accustomed to using open source software for economic reasons (open source software is usually free) and in several cases they can count on skilled volunteering programmers who can modify and customize software when the source code is available. Finally, open source software can be developed, improved and enhanced on a collaborative basis by volunteering programmers, which may result in better, more effective and cheaper products [5].

In this paper we introduce Blind Assistant, a modular and expandable software package based on a free and open-source software development kit (SDK) named Nanodesktop. The main contribution of the presented work is to describe and demonstrate an open source package providing high-end computer vision, speech recognition and synthesis, and communication functions for aiding visually impaired people. Blind Assistant features full customizability, modularity, expandability and a complete, standard toolchain compliant with the requirements of legacy PC software building and development toolchains. We also suggest that the open source approach, together with the availability of a standard toolchain (in our case, largely compatible with Linux), can encourage the large community of open source volunteering programmers to improve the original projects and help in adapting them to the real needs of the potential users.

In the next sections we briefly present some relevant related work and illustrate the Nanodesktop architecture and the Blind Assistant features and functions. We also describe our approach to testing and evaluating the system. Section VIII reports our final remarks and introduces our plans for future development of the presented work.

II. RELATED WORK

The idea of developing aiding devices for visually impaired people has fascinated researchers for many years. Most works available in the literature refer to prototype systems aimed at providing functions such as indoor and outdoor orientation, obstacle avoidance, object detection and recognition, optical character recognition (OCR), barcode reading, speech recognition and text-to-speech applications. Reading assistance is one of the most explored research fields: in [6] Hedgpeth and Black proposed a system aimed at helping the blind to read a book. The presented architecture was based on a personal computer, but the authors expressed the intention to create a portable, and even a wearable, version of their system. A different target is pursued by the SWANS (Semantic Web Accessibility Network Services) research project: in [7] M. Rana and M. Cirstea presented a prototype aimed at enabling the blind to navigate the web from a mobile device. The system uses a screen reader (JAWS), a portable keyboard and a PDA with a wireless connection. Other systems were designed to provide different functions: in [8] Helal and Moore presented Drishti, a navigation system able to guide the blind along a predefined pathway, but using cumbersome wearable equipment.

All the previous systems are specialized for a single function, and thus using two or more of them at the same time is difficult: a multi-functional system capable of providing two or more services could be of great advantage. A project of this kind is Tyflos [9], a very powerful wearable system for movement within an environment and also for text reading. The input information is acquired by two small video cameras and a microphone, assisted by an RFID reader, a GPS device and a proximity sensor. The output is conveyed to the user via a vibrating device and a speech synthesizer. Tyflos can compute the distance map of the scene captured by the two video cameras in real time and convey the information to the user by means of a vibration array cell. On the basis of this information the user understands, for example, whether there is a door in front of him. It is also equipped with face detection and face recognition functions. Another project worth mentioning is Sypole [10], which provides a system for the recognition of banknotes, a text reader and an object recognition system, running on a simple handheld device.

Unfortunately, both Tyflos and Sypole appear to be still at an experimental stage and have never been made publicly available (our platform, on the other hand, is currently available to the public). It has been observed [11] that hardware aimed at assisting visually impaired people should cost no more than $1000 if it is to gain wide diffusion. Higher costs would only be justified for a platform providing enormous and obvious advantages. Although objections could be raised as to the exact amount mentioned above, it is reasonable to consider cost a decisive factor for the widest diffusion of aids for the visually impaired. For example, the hardware required by Tyflos is expensive: the apparatus comprises two small cameras mounted on a pair of dark glasses, a microphone, a set of headphones, a proximity sensor, a GPS, an RFID antenna, a laptop and a wearable 2D vibration matrix. The devices listed above would require mass production to reduce their cost to a level that might make them really accessible to a large number of users.

III. AN OPEN SOFTWARE PLATFORM FOR HANDHELD DEVICES

In designing Blind Assistant we primarily had to understand what blindness means in everyday life and how a blind user could be helped by a portable device with computer vision and speech recognition/synthesis technologies. We referred to the experiences of Blevis and Siegel [12], Shinohara [13] and Blythe et al. [14] to understand how to investigate the needs of a blind user and the way technology can aid him or her in everyday life. Our investigation of users' needs is not exhaustive and should not be considered a significant contribution to the field: it was only aimed at determining a basic set of functions that could be introduced in Blind Assistant to explore the suitability of our approach.

We started with a tour into the everyday life of a group of 5 potential users (three blind from birth, one with acquired total blindness and one with low, non-functional vision), all legally blind. We met, and spent at least three hours with, each of them individually, taking notes and recording some parts of the conversations. Each tour spanned various places, each one characteristic of some significant part of the user's life: home, workplace, most frequented streets. The aim of this first investigation was to gather information about the habits of the users and the most significant problems and difficulties they met in their everyday lives. We also asked them to fill in an anonymous form with a set of questions about some significant personal information: age, gender, education, technological skill (as perceived by the user: see below) and degree of confidence (expressed as a numerical value ranging from 0 to 4) in computer technology as an aid to their problems. As the final step of the interview, the form had three blank sections headed with the words "I wish I had...". The participant was then asked to fill in the sections with descriptions of up to three different problems that could be solved by potential technological developments. The participants were all aged between 21 and 42, 4 men and one woman. This distribution was not our choice: it depends on the number of people who volunteered to participate in the interviews.

We then separately met and interviewed 5 care operators belonging to a volunteers association caring for visually impaired people. The interviews were quite informal but very informative. We gained an insight into what the operators think about some technological aids for visually impaired people and how the operators could intervene in improving the way such aids work. We were also informed about some severe issues related to technical and economic factors, such as the initial price of the aids, the cost of upgrading and updating technologies and devices, the need for customization, the wish for an open architecture, and so on.

We collected the information resulting from the analysis of the interviews and the questionnaires and built a list of requirements for our aiding system. Most potential users required specific functionalities, while the care operators mainly concentrated on structural specifications such as reliability, upgradeability and acceptable cost. We received one particularly strong structural requirement from the potential users: the aid should be easy to use and to carry around. We were quite prepared for the first half of this requirement, as the literature clearly states that developers often fail in designing adequate user interfaces for aiding devices [13]. On the other hand, it took us quite an effort to fully understand that "easy to carry around" does not refer only to weight and size, but also to the way the users perceive the device and to the way the user supposes that other (sighted) people perceive it [4]. Keeping a cell phone or a digital camera in hand while walking is considered normal by sighted people, but a small video camera strapped to the chest or forehead of someone walking is almost immediately perceived as a prosthesis to be concerned about. We therefore focused our attention on handheld, widely used devices instead of wearable, concealable ones.

We then roughly divided the functional requirements into two categories: those requiring well-established, stable technologies (such as OCR, barcode reading, speech synthesis and speech recognition) and those requiring experimental, cutting-edge technologies, prone to be unstable or somewhat unreliable. We consider the latter functions unsuitable for end-user applications, but very useful for revealing the real limits of current handheld technology and of our software platform. Also, some of the most intriguing wishes coming from the users fell into this category: for example, the face recognition function was requested by a user who simply wanted the ability to do what most sighted people seamlessly do all the time: recognize someone "in person" after seeing him or her in a set of photographs (for example, in a journal). When we later proposed this request to other potential users, they suggested that a better use would be a shared set of mug shots of people of interest to a community of (visually impaired) users, for example care operators, in order to be able to recognize them immediately when met in person.

Another intriguing request related to the ability to recognize a place from a set of pictures. Again, the most interesting application would be the ability to share those pictures, in order to share the ability to visually recognize a specific place: for example, a specific room in a building of interest to a whole community. Note that, while GPS technology enables a handheld device to recognize a building from the outside, it usually does not help in locating a position inside the building. We therefore decided to add some of the most intriguing advanced applications, even though we do not consider them fully reliable, and evaluated them separately from the other, stable functions.

Among the applications requiring more stable and well-established technologies, we found particularly interesting the ability to remotely read 1D and 2D barcodes. Though there are a number of devices able to read barcodes, in most cases they rely on a laser probe which has to be positioned very close to the barcode label. We were asked for a function able to read barcodes from a distance of at least 40-50 centimeters, without the need to aim accurately at the label.

While the Blind Assistant platform is fully modular, expandable and customizable, after the above considerations we chose to start with a basic set of functionalities, all suggested by the users:
- face recognition, able to recognize a restricted set of well-described faces;
- text recognition, for automatically reading labels and short sentences;
- place (room) recognition;
- e-mail and message reading and dictating;
- color recognition;
- barcode remote reading.

As discussed in detail in the previous sections, in the literature and on the market there are a number of software packages that could be extremely useful if correctly integrated into a portable device designed as an aid for the visually impaired. In this case it is not necessary to develop the software from scratch: it should be sufficient to port existing software created for other platforms (typically x86).

IV. NANODESKTOP

The choice of the hardware/software platform on top of which to build Blind Assistant was mainly driven by our design constraints, but also somewhat influenced by some contingencies. First of all, we were oriented towards handheld devices owing to the considerations listed above. We wanted a commercial off-the-shelf device for economic reasons, and because we wanted our users to carry a very anonymous device, i.e. a device so widespread as to be hardly noticed by sighted people. We also needed a powerful device, with a CPU powerful enough for image processing and analysis and with wireless connectivity for Internet access and resource sharing.

We finally needed an on-board camera for practical reasons. By chance, we also had a software development kit supporting full portability of legacy applications (applications made for the PC) and standard libraries and building tools, which we wanted to test with some demanding and challenging application. This SDK, named Nanodesktop, is free and open-source software aimed at developing computer vision applications on embedded systems. It is particularly suitable for non-standard or exotic hardware devices, that is, devices for which a fully functioning Linux kernel is not available or devices based on proprietary operating systems. A full description of Nanodesktop and the advantages of exploiting it for programming or for porting legacy (i.e. PC-based) programs to mobile/handheld platforms was published in a previous paper [15]. The SDK is, in fact, a system of libraries acting as an intermediate layer between the operating system of the device and the (so-called) Nanodesktop-version libraries. These are libraries for x86 that would normally run on a PC and have been modified so as to access the hardware and the GUI only through Nanodesktop. Its role, in other words, is to abstract the hardware behavior for the libraries placed at the upper layers. It is technically possible for a single programmer to port the SDK to a new device in a few days.

The current version of Nanodesktop (0.4) was developed to work on devices like phones or widespread game consoles. The hardware that we use for Blind Assistant consists of a handheld console equipped with a pair of RISC microprocessors (working at 333 and 166 MHz), 32 (or 64) MB of RAM, a video accelerator, a wireless connection, a USB port and a slot for flash memory cards. This class of hardware was chosen because of the computational power of the integrated microprocessors, which makes the device particularly suitable for the execution of complex image processing algorithms. In its current version, Nanodesktop provides primitives for:
- handling the webcam connected to the console;
- sound and compressed audio reproduction;
- network connection;
- external wireless or virtual keyboard management;
- serial port handling;
- mouse emulation.

Mathematical operations are sped up by special routines that use a dedicated hardware component of the chosen console, specialized for the execution of matrix and floating point computations.

V. BLIND ASSISTANT

Blind Assistant is the software platform stemming from our investigation and design effort. The application, whose source code is released under the GPL license, currently implements the six functionalities listed at the end of Section III and can be automatically and remotely upgraded over the Internet. The user interface (a GUI is, by definition, of no use to the visually impaired) exploits the Nanodesktop version of PocketSphinx, a free and open source speech recognition package released by Carnegie Mellon University (CMU) for mobile devices. This allows the current function of the system to be switched on the basis of voice commands issued by the user, as sketched below. As an alternative, the user can switch between functions using the two "shoulder" buttons at the sides of the device. The user interface also uses the ndFLite speech synthesis software, through which it provides spoken feedback to the user (multilanguage: currently Italian and English are supported). The software assumes that the blind user has the assistance of an operator who handles the initial configuration of the program and the training of the various subsystems; all settings are, however, saved on the memory stick integrated in the console and automatically restored at system start-up.

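For illustration, the following Python sketch shows one way such a command-driven interface can be structured. It is not taken from the Blind Assistant sources: the speech recognizer and synthesizer are stubbed out (on the device, PocketSphinx and ndFLite play those roles), and all function and command names are hypothetical.

```python
# Hypothetical sketch of a voice-command dispatcher in the spirit of the
# Blind Assistant interface. On the real device, PocketSphinx supplies the
# recognized transcripts and ndFLite supplies the spoken feedback; here
# both are replaced by plain strings and print().

def enable_face_recognizer():
    print("Face recognition mode enabled")   # placeholder for the real mode

def enable_color_scanner():
    print("Color scanner mode enabled")

def enable_mail_reader():
    print("Mail reader mode enabled")

# Map of spoken commands (as recognized transcripts) to mode handlers.
COMMANDS = {
    "enable face recognizer": enable_face_recognizer,
    "enable color scanner": enable_color_scanner,
    "enable mail reader": enable_mail_reader,
}

def dispatch(transcript):
    """Switch the active function based on a recognized voice command."""
    handler = COMMANDS.get(transcript.strip().lower())
    if handler is not None:
        handler()
    else:
        # Unrecognized commands trigger spoken feedback instead of failing,
        # so the user can simply repeat the command.
        print("Command not understood, please repeat")

if __name__ == "__main__":
    dispatch("Enable color scanner")   # -> Color scanner mode enabled
```

A table-driven dispatcher of this kind also makes it straightforward to bind the same handlers to the "shoulder" buttons, providing the button-based fallback discussed in Section VII.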
A. Face Recognition Mode

A face recognition system is able to recognize people from a photo or a video. Such recognition systems are usually executed on powerful processors. In Blind Assistant, on the other hand, it is possible to run a face recognition algorithm on a small embedded RISC processor working at 333 MHz, such as the one installed in the console (the console has two processors; the second is used by Nanodesktop for media encoding/decoding).

Fig. 1. A screenshot of Blind Assistant in face recognition mode.

The Blind Assistant face recognition system can currently operate on a set of at most 10 people. The image is captured by the webcam in 8-bit gray levels at a resolution of 320x240 pixels (see Fig. 1). As the face recognition system is sensitive to variations in both illumination and face pose, a training phase is run for each person, during which a set of 8 samples is acquired. Since such variations can significantly alter a gray-level image with respect to the initial sample, if a single photo were used the performance of the system in the presence of varying poses of the same person would be very poor. The system guides the blind user's personal assistant through the training by means of a wizard. The details of the technology used for face recognition are beyond the scope of this paper; in short, an image is first normalized with regard to luminosity, then subjected to a face detection process by means of the Viola-Jones algorithm [16], and finally to a face recognition process by means of the PCA (Principal Component Analysis) algorithm [17]. The software determines whether a known face is in the image, along with its position, and reproduces a spoken message alerting the user about the presence and approximate position of the subject ("Mario is in front of you", "Mario is on your left", "Mario is on your right"). There is no need to aim exactly at the face: the software detects any number of faces in the video in real time, and when at least one face is visible enough to be potentially recognized, a short sound is emitted to signal that a person is close to the user. At that point, if the face is recognized, the spoken message is emitted.

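As a hedged illustration of the pipeline just described (Viola-Jones detection followed by PCA/eigenfaces matching), the sketch below uses desktop OpenCV and NumPy rather than the Nanodesktop ports; the working size, component count and distance threshold are assumptions for the sake of the example, not values taken from the paper.

```python
# Illustrative desktop sketch of the Blind Assistant face pipeline:
# Viola-Jones detection (OpenCV Haar cascade) + PCA "eigenfaces" matching.
import cv2
import numpy as np

FACE_SIZE = (64, 64)  # assumed working size for the eigenspace

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(gray):
    """Viola-Jones detection: returns (x, y, w, h) boxes of candidate faces."""
    return cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)

def crop(gray, box):
    """Cut out and normalize a detected face to the common working size."""
    x, y, w, h = box
    return cv2.resize(gray[y:y + h, x:x + w], FACE_SIZE)

def train_eigenspace(samples, num_components=16):
    """Build a PCA basis from equal-sized training crops (e.g. 8 per person)."""
    data = np.stack([s.flatten().astype(np.float64) for s in samples])
    mean = data.mean(axis=0)
    # Principal components via SVD of the mean-centered sample matrix.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:num_components]

def project(face, mean, basis):
    """Project a normalized face crop onto the eigenspace."""
    return basis @ (face.flatten().astype(np.float64) - mean)

def recognize(face, mean, basis, gallery, threshold=2500.0):
    """Nearest neighbor in eigenspace; gallery maps name -> stored weights."""
    w = project(face, mean, basis)
    name, dist = min(((n, np.linalg.norm(w - g)) for n, g in gallery.items()),
                     key=lambda t: t[1])
    return name if dist < threshold else None

def side_of_view(x, w, frame_width=320):
    """Rough left/front/right position used to phrase the spoken message."""
    cx = x + w / 2
    if cx < frame_width / 3:
        return "on your left"
    if cx > 2 * frame_width / 3:
        return "on your right"
    return "in front of you"
```

On the device, the returned name and position string would be composed into the spoken message and handed to the ndFLite synthesizer.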
B. Room Recognition Mode

Blind Assistant can recognize a room filmed by the webcam, so as to allow a blind user to move around and orient himself or herself inside a building. The problem of room recognition is quite complex. It has been observed in the literature that one of the most robust algorithms for this purpose is based on SIFT keypoint histograms classified using AdaBoost [18][19]. SIFT identifies a set of distinctive keypoints in an image; as these are nearly invariant with respect to illumination and viewpoint changes, by comparing the keypoints of a photo of an unknown environment with those of a set of images stored in a database we can identify the actual location of the user. As the power of the main CPU of the portable console is insufficient to execute the SIFT algorithm on board, a workaround has been implemented: when the user presses a button, a photo captured by the camera is sent via Wi-Fi to a server, called BlindServer, which performs the analysis (see Fig. 2). The result is then sent back to the console and passed to the speech synthesizer.

Fig. 2. BlindServer. The program has just received from the console a photo of the room where the blind user currently is; it runs the SIFT analysis and returns the result to the mobile device via the wireless link.
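The following sketch gives a rough idea of the server-side matching. Note that it simplifies the approach described above: instead of classifying SIFT keypoint histograms with AdaBoost as in [18][19], it matches SIFT descriptors directly against a small database of reference photos; the database paths and thresholds are hypothetical, and the Wi-Fi transport is omitted.

```python
# Simplified stand-in for the BlindServer room matcher: SIFT descriptors
# of the query photo are matched against reference photos of known rooms,
# and the best-scoring room label is returned.
import cv2

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

def descriptors(path):
    """Compute SIFT descriptors for an image file (assumed to exist)."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc

# Hypothetical reference database: room label -> reference photo.
ROOMS = {"kitchen": "db/kitchen.jpg", "office": "db/office.jpg"}
DB = {label: descriptors(p) for label, p in ROOMS.items()}

def good_matches(query_desc, ref_desc, ratio=0.75):
    """Lowe's ratio test: keep matches clearly better than the runner-up."""
    pairs = matcher.knnMatch(query_desc, ref_desc, k=2)
    return [p[0] for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]

def identify_room(photo_path, min_matches=20):
    """Return the best-matching room, or 'unknown room' below threshold."""
    query = descriptors(photo_path)
    label, score = max(((l, len(good_matches(query, d)))
                        for l, d in DB.items()), key=lambda t: t[1])
    # In the real system this string is sent back over Wi-Fi and spoken
    # by the console's synthesizer.
    return label if score >= min_matches else "unknown room"
```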

C. Optical Character Recognition Mode

Blind Assistant can also read out a text captured by the video camera (see Fig. 3). The console can be equipped with a dedicated camera featuring a conveniently high resolution for this task (images of the text are captured at a resolution of 1024x768). Unfortunately, it uses a manually adjustable focus, which is unsuitable for use by a blind person. This results in poor performance when the text is small; it is reasonable to expect that a better, autofocus camera would produce better results. However, performance is fully satisfactory for the original requirements we received from the potential users, who requested a reader for short texts printed in large type, such as the labels on food boxes. The optical character recognition system in Blind Assistant works as follows: the user presses a key on the keypad and utters the command "Enable optical char recognizer". At this point, Nanodesktop switches to OCR mode and starts capturing high-resolution images. When a text is to be read, the device is placed about 5 cm away from the page and a second key is pressed: a high-resolution image is captured by the video camera.

Fig. 3. A screenshot of Blind Assistant in optical character recognition mode.


For the recognition we used software called Tesseract, currently one of the best open-source OCR engines available. We developed a version of the program, called ndTesseract, specifically designed for the Nanodesktop environment. In our tests, on the 333 MHz RISC processor equipping the console, ndTesseract was able to scan a 1024x768-pixel image in 120 seconds, with an average recognition rate of 80%. The recognition time may seem quite long; however, the user can stop the scanning process at any moment by simply pressing a button on the console. The text produced by OCR engines is generally affected by recognition errors. For this reason, a correction algorithm based on a linguistic dictionary is always applied before synthesizing the results.
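On a desktop, the same Tesseract engine can be driven through the pytesseract wrapper, and a dictionary-based correction can be approximated with fuzzy string matching, as in the hedged sketch below; the word list is a small stand-in, since the paper does not detail the correction algorithm.

```python
# Illustrative sketch of the OCR step: Tesseract via pytesseract, followed
# by a simple dictionary-based correction pass. The dictionary here is a
# toy stand-in for the linguistic dictionary mentioned in the paper.
import difflib
import pytesseract
from PIL import Image

# Hypothetical word list standing in for the linguistic dictionary.
DICTIONARY = ["milk", "sugar", "flour", "coffee", "biscuits"]

def correct(word):
    """Snap a misrecognized word to the closest dictionary entry, if any."""
    if word.lower() in DICTIONARY:
        return word
    close = difflib.get_close_matches(word.lower(), DICTIONARY,
                                      n=1, cutoff=0.8)
    return close[0] if close else word

def read_label(path):
    """OCR an image file and return the dictionary-corrected text."""
    text = pytesseract.image_to_string(Image.open(path))
    # On the device, this corrected string is what goes to the synthesizer.
    return " ".join(correct(w) for w in text.split())
```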

D. Color Scanner Mode

There are cases where a blind person may want to know the color of an object or an item of clothing. To address this need, Blind Assistant includes a color scanner, a software module that detects the dominant color of the pixels in an image picked up by the webcam. To enable the color scanner, the user presses a key and utters the command "Enable color scanner". The system switches the video camera to 8-bit RGB color mode at a resolution of 480x272 pixels. Pressing another key triggers the analysis of the colors present in a frame. The system calculates the average RGB value of the pixels, performs a correction taking the average brightness of the image into account, and then determines the dominant color by means of a maximum-similarity comparison between the RGB values and those contained in a look-up table. The result is communicated to the user through the speech synthesizer.
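A minimal sketch of this logic is shown below, assuming an RGB frame held in a NumPy array; the look-up table entries are illustrative, and the brightness correction step, whose details the paper does not give, is only noted in a comment.

```python
# Minimal sketch of the color scanner: average the frame's RGB values and
# pick the closest entry in a look-up table. The paper additionally applies
# a brightness correction before the comparison; its details are not given,
# so that step is omitted here.
import numpy as np

# Hypothetical look-up table: color name -> reference RGB value.
COLORS = {
    "red": (200, 40, 40), "green": (40, 170, 60), "blue": (40, 60, 200),
    "yellow": (220, 210, 60), "white": (240, 240, 240), "black": (25, 25, 25),
}

def dominant_color(frame):
    """frame: HxWx3 uint8 RGB array (e.g. a 480x272 webcam frame)."""
    avg = frame.reshape(-1, 3).mean(axis=0)
    # Maximum-similarity comparison = minimum Euclidean distance in RGB.
    return min(COLORS, key=lambda n: np.linalg.norm(avg - np.array(COLORS[n])))

# Example: a synthetic reddish frame.
frame = np.zeros((272, 480, 3), dtype=np.uint8)
frame[:] = (210, 50, 50)
print(dominant_color(frame))  # -> "red"
```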
E. Mail Reader Mode

Blind Assistant can also be used for services which exploit the speech synthesizer, the speech recognition system and the wireless connection made available by the game console to provide the user with information. This is of great importance for blind users because it allows them to receive information and messages without needing to use a PC. To activate the mail reader mode the user presses a key and utters the command "Enable mail reader". He or she then presses a second key to activate a reading session. The system connects to the POP3 server and downloads any new email. The blind user is notified of the sender and subject of each message (see Fig. 4). The system then asks for authorization to read it (the user replies with a spoken command), and the content of the message is read word by word by the speech synthesizer.

Fig. 4. A screenshot of Blind Assistant in mail reader mode: the content of the message is read word-by-word by the speech synthesizer.
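The POP3 flow described above can be sketched with Python's standard-library poplib, as below; the host name and credentials are placeholders, and print() stands in for the ndFLite synthesizer.

```python
# Sketch of the mail-reading flow using the standard-library POP3 client.
# Host and credentials are placeholders; on the console, the text is handed
# to the ndFLite synthesizer instead of being printed.
import poplib
from email.parser import BytesParser
from email.policy import default

def read_mailbox(host="pop.example.com", user="user@example.com",
                 password="secret"):
    box = poplib.POP3_SSL(host)
    box.user(user)
    box.pass_(password)
    count, _ = box.stat()                       # number of messages waiting
    for i in range(1, count + 1):
        _, lines, _ = box.retr(i)               # download message i
        msg = BytesParser(policy=default).parsebytes(b"\r\n".join(lines))
        # Announce sender and subject; in the real system the body is read
        # only after the user gives spoken authorization.
        print("From:", msg["From"], "- Subject:", msg["Subject"])
        body = msg.get_body(preferencelist=("plain",))
        if body is not None:
            for word in body.get_content().split():
                print(word)                     # one word at a time, as in V.E
    box.quit()
```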
F. Barcode Reader Mode

A barcode reader can help a visually impaired person to recognize an object or an obstacle, provided it is adequately tagged with a barcode label. We use a more sophisticated version of this concept: a two-dimensional barcode, coded in an open standard, offers robustness against noise, automatic error correction and automatic detection in the frame. Blind Assistant, starting from version CFW0006, integrates a Data Matrix scanner, a module able to automatically recognize a label containing a 2D barcode (see Fig. 5). The system uses the DMTX library, compatible with the ISO/IEC 16022:2006 standard, which has been ported for this purpose to the Nanodesktop environment. In our tests, the system was able to recognize a 5x5 cm label at a maximum distance of 80 cm (with the standard camera provided by the producer of the console; a better sensor would probably yield better performance). The behavior of the system is invariant with respect to rotations of the camera or variations of the pose, and the scanner runs at 1 Hz. The system automatically detects when a matrix is present in the frame: in this case, it decodes the matrix and passes its content to the ndFLite speech synthesizer, which reads it to the user.

Fig. 5. A screenshot of Blind Assistant in barcode reader mode, recognizing a tagged door from a distance of about 80 cm.
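On a PC the same decoding can be reproduced with the pylibdmtx bindings to libdmtx, the library the authors ported to Nanodesktop; the sketch below is illustrative and the image path is a placeholder.

```python
# Hedged sketch of Data Matrix decoding with the pylibdmtx bindings to
# libdmtx (the library behind the ndDMTX port). Image path is a placeholder.
from PIL import Image
from pylibdmtx.pylibdmtx import decode

def scan_frame(path):
    """Decode any Data Matrix tags in an image and read out their content."""
    results = decode(Image.open(path), timeout=1000)  # cap work per frame (ms)
    for r in results:
        label = r.data.decode("utf-8", errors="replace")
        # On the device the decoded text is passed to the ndFLite
        # synthesizer and read aloud to the user.
        print("Tag says:", label)
    return bool(results)  # True if a matrix was detected in the frame
```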


VI. DEVELOPING BLIND ASSISTANT ON TOP OF NANODESKTOP


Designing and developing Blind Assistant on top of Nanodesktop and a video game console was quite an experience. The software platform itself offered us a wide range of functions allowing us, in theory, to fulfill most of the requirements, and the underlying hardware is powerful enough to perform most of the required computation at acceptable speed. The availability of full support for porting open source applications made for the PC let us pick freely from a huge collection of powerful applications, mostly designed for Linux, to implement those functionalities not covered by the platform.

The console also features well-designed, reliable controls (buttons and a cloche) that are easily distinguishable without needing to be seen (blind operation). In particular, the buttons are well spaced, located in distinguishable places and have informative (e.g. arrow) shapes. The wireless connectivity allows Internet communication, and a proprietary camera allows image acquisition at an acceptable frame rate. Given these promising premises, we expected the real challenges to be the integration, within the Nanodesktop environment, of the applications specifically developed or ported to the platform, and the realization of an adequate user interface able to respond reliably and promptly to the blind user's commands. Indeed, the user interface is still under refinement and will probably take the longest and hardest effort, and application integration required a huge effort and a certain amount of interaction with the developers of some of the applications.

We also had a hard time because of the lack of host USB connectivity on the console we chose. We originally thought that the availability of a plug-in camera, albeit proprietary, would suffice. As described in Section VII, this was not the case. We found that the proprietary camera is definitely not the best choice for real use, and that a host USB port is also needed for several other reasons, such as the ability to connect a GPS module (useful for outdoor orientation) or a Bluetooth dongle (useful for connecting the device to a cellular phone to obtain 3G/GPRS data connectivity).

Developing and porting the required applications under Nanodesktop was quite easy, even though we had to modify some parts of the SDK in order to support some special needs with the necessary speed and reliability. For example, we had to develop a new memory management module, because the performance of the Tesseract OCR decays very quickly during use owing to the huge number of requests for small memory blocks it makes. We consider such additional work an improvement of the Nanodesktop platform, which finally proved its ability to support a very wide range of applications.

VII. TESTING AND EVALUATION

We conducted a set of tests asking a group of visually impaired people to accomplish a sequence of tasks by exploiting Blind Assistant. The participants were initially instructed in how to use the system, and the device was calibrated for each participant. We took notes on each test and, at the end of the campaign, distributed a questionnaire with a set of questions about how the user felt while using the system, how accurate it was, what it was lacking, and which functions worked and which did not. Most of the questions required a numerical answer ranging from 1 to 5, or a yes/no answer. Again, the questionnaire ended with a request for "I wish I had..." indications regarding further improvements to the device at hand. For the sake of brevity we only report some salient information which we could extract from the compiled forms, describing the numerical scores in terms of mean value (M) and standard deviation (SD).

The participants in the tests were 15 visually impaired people, 11 men and 4 women, aged from 27 to 41. All of them were accustomed to using cell phones. The overall score for the tested device and the described set of functions was encouraging (M=4.73, SD=0.44), while the specific tests showed that the functions perceived as most reliable, useful and enjoyable were barcode reading, OCR, email reading and face recognition. In more detail:
- OCR received a high score (M=4.12, SD=0.38) for both interest and reliability: we invited the users to try it on food boxes, book covers and CD covers.
- Face recognition was considered reliable and interesting (M=3.82, SD=1.16) even though the exploited algorithm featured an accuracy close to 80%. After further investigation we found that the high ranking of this application was due to the very efficient and effective face detection algorithm: the reliable face detection gave the users the perception that the whole face recognition pipeline was reliable.
- The room recognition function was not considered fully reliable and thus received a low score (M=2.21, SD=0.87). This was mainly due to the camera characteristics and to an excessive sensitivity of our implementation of the algorithm to changes in illumination.
- The barcode reading function received a very high score (M=4.79, SD=0.37) because it was considered reliable, and the idea of tagging objects and places with 2D barcodes (see [20]) proved quite effective.
- Color reading was considered reliable but not particularly needed.

All the users confirmed that they did not feel encumbered by the necessity of keeping the handheld device at hand. As expected, most of them noticed that the size could have been smaller, but they enjoyed the disposition of the buttons and the cloche much more than the layout of most common handheld devices. The speech-based interface was considered not fully reliable (M=3.81, SD=1.16): in some cases the need to repeat spoken commands that were not correctly recognized was considered irritating, and all the users asked for some form of small headset, preferably wireless, to render the speech-based interaction less evident. We therefore plan to reduce the importance of spoken commands in Blind Assistant, while keeping the spoken feedback. Overall, the perceived reliability of the proposed functions is strongly influenced by the user interface: when a simple and reliable fallback is immediately available, the failure of the primary function is often overlooked by the user. A typical example is the availability of a button-based interface as a fallback for the speech-based interface.

Finally, the layout of the hardware device is critical: most users expressed satisfaction (M=4.58, SD=0.11) with the presence of physical buttons and the cloche, with informative shapes and a good spatial distribution. None of them was interested in touch-screen devices and other "virtual" interfaces. Also, most of them manifested some degree of intolerance towards very small keyboards such as those equipping some smartphones and handheld devices, and explicitly asked to keep the current layout featuring relatively large and well-spaced buttons.

VIII. CONCLUSION AND FUTURE WORK

Blind Assistant is not intended to be a full-fledged product for end users: it is rather a proof of concept, showing that several extremely useful technologies, able to offer the user a better quality of life, can be considered mature enough to be ported to portable or handheld devices, and that an open source platform running on widely available hardware is a viable choice for creating a fully open architecture, accessible to the largest community of users and developers. We learned some interesting lessons, which we report in this paper, from both the design and implementation processes and from the test campaign. The next version of Blind Assistant will probably be developed on top of different hardware; we are still considering which platform would be most interesting, given the large set of constraints listed above. We are evaluating other real-time face recognition algorithms available in the literature that may be more suitable in terms of accuracy and computational complexity, such as Fisherfaces (LDA, Linear Discriminant Analysis) [21]. We are also working on porting a series of applications for the recognition of road signs and traffic lights. Finally, we are investigating the possibility of adding an ultrasound sensor driver, thus developing a collision avoidance system which could help the user avoid obstacles.

REFERENCES
[1] W. Barfield and T. Caudell, Fundamentals of Wearable Computers and Augmented Reality, Lawrence Erlbaum Associates, 2001.
[2] J. Kim and H. Jun, "Vision-based location positioning using augmented reality for indoor navigation," IEEE Trans. Consumer Electron., vol. 54, no. 3, pp. 954-962, Aug. 2008.
[3] P. Narasimhan, "Assistive embedded technologies," IEEE Computer, vol. 39, no. 7, pp. 85-87, Jan. 2006.
[4] K. Shinohara, "Designing assistive technology for blind users," in Assets '06: Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility, Portland, OR, pp. 293-294, Oct. 2006.
[5] T. Watanabe, "Merits of open-source resolution to resolve a digital divide in information technology," in Proceedings of the First International Conference on the Human Society and the Internet - Internet Related Socio-Economic Issues, Seoul, Korea, pp. 92-99, July 2001.
[6] T. Hedgpeth and J. A. Black, "A demonstration of the iCARE portable reader," in Assets '06: Proceedings of the 8th International ACM SIGACCESS Conference on Computers and Accessibility, Portland, OR, pp. 279-280, Oct. 2006.
[7] M. Rana, M. Cirstea, and T. Reynolds, "Developing a prototype using mobile devices to assist visually impaired users," in IEEE International Symposium on Industrial Electronics, Cambridge, UK, pp. 1826-1830, June 2008.
[8] A. Helal and S. E. Moore, "Drishti: an integrated navigation system for visually impaired and disabled," in Proceedings of the 5th International Symposium on Wearable Computers (ISWC 2001), Zurich, Switzerland, pp. 149-156, Oct. 2001.
[9] N. Bourbakis and D. Kavraki, "Tyflos: an intelligent assistant for navigation of visually impaired people," in Proceedings of the IEEE 2nd International Symposium on Bioinformatics and Bioengineering, Bethesda, MD, pp. 230-235, Nov. 2001.
[10] C. Mancas-Thillou, S. Ferreira, J. Demeyer, C. Minetti, and B. Gosselin, "A multifunctional reading assistant for the visually impaired," EURASIP Journal on Image and Video Processing, vol. 2007, no. 3, pp. 1-11.
[11] S. Upson, "Tongue vision: a fuzzy outlook for an unpalatable technology," IEEE Spectrum, vol. 44, no. 1, pp. 44-45, May 2007.
[12] E. Blevis and M. A. Siegel, "The explanation for design explanations," in 11th International Conference on Human-Computer Interaction: Interaction Design Education and Research: Current and Future Trends, Las Vegas, NV, 2005.
[13] K. Shinohara and J. Tenenberg, "A blind person's interactions with technology," Communications of the ACM, vol. 52, no. 8, pp. 58-66, Aug. 2009.
[14] M. Blythe, A. F. Monk, and J. Park, "Technology biographies: field study techniques for home use product development," in Computer Human Interaction 2002 Extended Abstracts, pp. 658-659, 2002.
[15] F. Battaglia, G. Iannizzotto, and F. La Rosa, "An open and portable software development kit for handheld devices with proprietary operating systems," IEEE Trans. Consumer Electron., vol. 55, no. 4, pp. 2436-2444, Nov. 2009.
[16] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 511-518, Dec. 2001.
[17] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '91), pp. 586-591, 1991.
[18] D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.
[19] B. Ayers and M. Boutell, "Home interior classification using SIFT keypoint histograms," in CVPR '07: Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, pp. 1-6, June 2007.
[20] G. Iannizzotto, C. Costanzo, P. Lanzafame, and F. La Rosa, "Badge3d for visually impaired," in CVPR '05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, pp. 29-37, June 2005.
[21] P. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.

BIOGRAPHIES

Filippo Battaglia received the M.S. degree in electronics engineering from the University of Messina, Italy, in 2008. He is the main developer of the Nanodesktop team and currently collaborates with VisiLAB, the Computer Vision Laboratory of the University of Messina. His research interests include artificial vision, vocal synthesis, the theory of operating systems, micro-optoelectronics and communication systems.

Giancarlo Iannizzotto (M'99-SM'09) received the M.D. degree in electronics engineering from the University of Catania, Italy, in 1994 and the Ph.D. in computer science from the same university in February 1998. From 1996 to 2006 he was Assistant Professor at the Faculty of Engineering, University of Messina, Italy; since 2006 he has been Associate Professor at the same faculty. His research activity is in the field of image analysis and processing, as the leader of VisiLAB at the University of Messina.
