
ENHANCING ART HISTORY EDUCATION THROUGH MOBILE AUGMENTED REALITY
Ann M. McNamara
Department of Visualization, Texas A&M University
College Station, Texas, USA
e-mail: ann@viz.tamu.edu

Abstract


This paper describes a new project which will focus on the integration of eye-tracking technology with mobile Augmented Reality (AR) systems. AR provides an enhanced vision of the physical world by integrating virtual elements, such as text and graphics, with real-world environments. The advent of affordable mobile technology has sparked a resurgence of interest in mobile AR applications. Inherent in mobile AR applications is the powerful ability to visually highlight information in the real world. We are working on new algorithms to harness this ability to direct gaze to Points of Interest (POIs). Combining mobile AR and image manipulation gives visual distinction to POIs in order to directly influence and direct gaze in real-world scenes. Our initial test domain is that of Art History Education. POIs are determined based on salient regions of paintings, as identified by the visual narrative of the painting. We are developing a new system, to be deployed in the Museum of Fine Arts in Houston, that will enhance visitor education through the use of gaze-directed mobile AR.

Introduction

Imagine a scenario in which an Art History major is trying to improve his visual literacy skills. Narrative art tells a story, either as a moment in an ongoing story or as a sequence of events unfolding over time. A synoptic narrative depicts a single scene in which a character is portrayed multiple times within a frame to convey that multiple actions are taking place. This can cause the sequence of events to be unclear. Synoptic narratives typically provide visual cues that convey the sequence, but still might be difficult to decipher for those unfamiliar with the story. For example, the student is studying The Tribute Money by the Renaissance artist Masaccio (Figure 1). This painting depicts a scene from the Gospel of Matthew, in which Jesus directs Peter to find a coin in the mouth of a fish in order to pay the temple tax. The optimal way to visually navigate this piece is to begin in the center, where the tax collector demands the money and Jesus, surrounded by his disciples, instructs Peter to retrieve the money from the mouth of a fish. By moving their gaze to the left of the painting (perhaps counter-intuitive for Western viewers, who normally read left to right), viewers notice Peter executing Jesus' instruction. The viewer's eyes finally need to travel to the extreme right of the painting to view the third episode, in which Peter pays the tax collector. At the time it was painted, audiences understood the order in which each episode of the painting was to be viewed to convey the correct story. However, our ability, as artists and audiences, to correctly read these paintings may not be so accurate in the present day because our visual literacy is not conditioned to follow the viewing pattern the artist intended. While web-based solutions exist to show the narrative, they manipulate a digital representation of a painting using strong outlines, or interruptive text over the image, to explain where the viewer should look (Figure 1). While these represent a good first step, a more elegant solution would not interrupt the visual experience of the audience. Employing mobile AR devices with eye-tracking capabilities would allow the viewer to see the actual painting with areas of interest accentuated in a manner which protects the visual experience. This scenario illustrates the need to display information in a manner that minimizes disruption to the view, but can direct gaze to certain locations of an image, in a specific sequence.

Figure 1: Current web-browser-based educational tools use text pop-ups and rectangular outlines to highlight important information in a visual narrative. This not only distracts the viewer from appreciating the image, but also breaks the image into smaller pieces so it is not viewed in a holistic manner. The red rectangle destroys the visual experience by superimposing a distracting overlay on the original painting.
AR applications use factors such as location and navigation direction to deliver contextual information to the user [Azuma 1997] [Tumler et al. 2008] [Feng et al. 2008] [Sielhorst et al. 2008] [Carmigniani et al. 2011]. The opportunity to capitalize on AR to influence gaze and visually guide the viewer through a scene has gone largely unexplored. The goal of this work is to use the dynamic nature of AR elements to provide a heightened perception of the real world. We propose using eye-tracking to determine where users look in an augmented scene, and then use this information to accomplish these goals. The objectives for this work are to a) develop models that can transform eye-tracked information into interest, b) develop display visualizations that use these models to present more informed content which does not distract from the user's task, and c) use AR to give visual distinction to areas in the real world in order to visually guide the viewer to important or interesting features.

Now imagine an AR scenario that accounts for where the user is looking on their mobile device, and delivers content based on gaze location. Not only that, but it delivers that information to an area of the screen that will not obstruct image features that are (or will become) important to the user. Also, imagine a complementary AR system that can influence where viewers look in a scene, both spatially and temporally. This work proposes strategies to realize these AR scenarios. The ideal outcome is an eye-tracking AR system that is fully integrated into mobile devices and can inform AR applications on the optimal placement of AR elements based on gaze information, and also manipulate AR elements to direct visual attention to specific regions of interest in the real world.



1 Approach

1.1 Eye-Tracking

Eye tracking refers to techniques used to record and measure eye movements [Yarbus 1967]. Recent years have seen a rapid evolution of eye-tracking technology, with systems becoming cheaper, easier to operate and less intrusive on the viewer [Duchowski 2003]. Generally, eye-tracking data is analyzed in terms of fixations and saccades. Saccades are rapid eye movements used to position the gaze. During each saccade, visual acuity is suppressed and the visual system is in effect blind. Only during fixations is clear vision possible. The brain virtually integrates the visual images that we acquire through successive fixations into a visual scene or object. Eye-tracking systems first emerged in the early 1900s [Dodge 1900; Dodge and Cline 1901] (see [Duchowski 2003] for a review of the history of eye-tracking). Until the 1980s, eye-trackers were primarily used to collect eye movement data during psychophysical experiments, and this data was typically analyzed after the completion of the experiments. During the 1980s, the benefits of real-time analysis of eye movement data were realized as eye-trackers evolved into a channel for human-computer interaction [Levoy and Whitaker 1990]. More recently, real-time eye-tracking has been used in interactive graphics applications [Cole et al. 2006] [DeCarlo and Santella 2002] [Hyona et al. 2003] [Bourlon et al. 2011] and large-scale display systems to improve computational efficiency and perceived quality. These techniques follow gaze; for this work we need to influence gaze. Subtle Gaze Direction (SGD) [Bailey et al. 2011] [Bailey et al. 2009] [Bailey et al. 2007] [McNamara et al. 2009] [McNamara et al. 2008] is a technique that exploits the fact that our peripheral vision has very poor acuity compared to our foveal vision. By presenting brief, subtle modulations to the peripheral regions of the field of view, the technique draws the viewer's foveal vision to the modulated region. Additionally, by monitoring saccadic velocity and exploiting the visual phenomenon of saccadic masking, modulation is automatically terminated before the viewer's foveal vision enters the modulated region. Hence, the viewer never sees the stimuli that attracted her gaze. This gaze-directing technique can successfully guide gaze about an image.

1.2 Gaze Direction using Mobile AR

Tablet computing and mobile devices promise to have a dramatic impact on education [El Sayed et al. 2011] [Chen et al. 2011]. Imagine a student holding a tablet computer in front of an artifact or image and, instantly, that object is annotated with more information. The overlay may include virtual text, images, web links or even video. AR applications for education are exploding in many academic arenas [Medicherla et al. 2010] [Kaufmann and Schmalstieg 2002] [Kaufmann and Meyer 2008]. What separates this work from existing applications is the integration of eye-tracking. Eye-tracking gives information about where the student is looking, what they are looking at, and whether they have actually looked at all the most pertinent regions. Also, if they are not looking where they are supposed to, subtle techniques will be introduced to draw attention back to those regions. These may be subtle but do not have to be: the most promising solution may iteratively increase in strength until the user's gaze is drawn to that location. Building on SGD, we plan to incorporate innovative ways to attract and focus attention to visual information in AR mobile applications. The focus (initially) is on Art History Education, although the ideas presented here have potential application in many disciplines. A healthy number of (mobile) AR applications have successfully been applied in the Art domain [Gwilt 2009] [Damala et al. 2008] [Andolina et al. 2009] [Bruns et al. 2007] [Choudary et al. 2009] [Chou et al. 2005] [Srinivasan et al. 2009]. To date, however, none have proposed eye-tracking as an added dimension. The novelty of this approach lies in the eye-tracking and in attracting and directing the gaze to the correct region of the artwork in a sequence that will encourage appropriate visual navigation and understanding of the image and strengthen observation skills.

Obviously, conspicuous objects in a scene (such as a black sheep in a white flock) will draw the viewer's attention first. However, there are more subtle image characteristics that can also draw our gaze. Image properties such as color, size and orientation can be used to control attention [Veas et al. 2011] [Underwood and Foulsham 2006] [Underwood et al. 2009]. We are also researching complementary ways to use AR elements to filter the image under scrutiny by overlaying virtual templates to highlight or defocus image details. In movies, directors use an arsenal of cinematographic tricks to lead the audience to look where they want them to look (see [Bordwell 2011]). Taking an automated approach, Itti and Koch [Itti and Koch 2000] [Itti and Koch 2001] developed an algorithm to measure visual saliency (how likely people are to look at parts of an image) on the basis of image characteristics such as intensity distribution, color changes, and orientation. Saliency maps could prove to be a good candidate to indicate the initial attention in a painting. Then, by modifying the digital version of the painting to redistribute saliency, we build several versions of the painting with the pre-selected interesting regions manipulated to increase saliency. For example, in The Tribute Money, when it is time to look at Peter retrieving the coin from the mouth of the fish, image processing of the AR overlay could boost the intensity contrast in that region and thereby influence the viewer to redirect their gaze. We also plan to iteratively adjust emphasis until the desired result is achieved. Such a scenario is shown in Figure 2. We will also investigate gracefully degrading the regions of the image that are not important at that moment in time. Taking inspiration from work by DeCarlo and Santella, we can apply a softening filter to all areas of the AR overlay apart from the area where the user should be looking. DeCarlo and Santella [Cole et al. 2006] [DeCarlo and Santella 2002] used an eye tracker to identify regions in photographs that viewers tended to focus on. Taking this information, they generated abstract renderings of these photographs with the other, less interesting regions presented in reduced detail. They then used eye tracking to confirm that this abstraction was effective at directing the viewer's gaze. This would in essence visually fade unimportant information.
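As a rough illustration of these two manipulations (and of the mock-ups in Figure 2), the sketch below boosts contrast inside an assumed region of interest and softens everything outside it using OpenCV. The file name, region coordinates and filter parameters are placeholders, not values from the actual system.

#include <opencv2/opencv.hpp>

int main() {
    cv::Mat painting = cv::imread("tribute_money.jpg");   // placeholder image path
    if (painting.empty()) return 1;

    cv::Rect roi(100, 50, 200, 250);   // assumed region, e.g. Peter retrieving the coin

    // Variant 1: raise the intensity contrast inside the region to increase its saliency.
    cv::Mat emphasized = painting.clone();
    cv::Mat boosted;
    painting(roi).convertTo(boosted, -1, 1.4, 10.0);       // gain and brightness boost
    boosted.copyTo(emphasized(roi));

    // Variant 2: softly defocus everything except the region, visually fading
    // the information that is unimportant at that moment.
    cv::Mat softened;
    cv::GaussianBlur(painting, softened, cv::Size(21, 21), 0);
    painting(roi).copyTo(softened(roi));                   // keep the region itself sharp

    cv::imwrite("emphasized.png", emphasized);
    cv::imwrite("softened.png", softened);
    return 0;
}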

Figure 2: A mock-up of how manipulating the AR version of the image has the potential to draw visual attention. The image on the iPad shows the main characters brighter than the other elements to highlight them (top). An alternative approach is to blur out unimportant information, as shown in the mock-up on the bottom.

This approach will also ensure that the viewer does not inadvertently miss any areas of importance by directing the gaze about the painting. Using eye-tracking we can ensure that viewers hit the high points of an image and receive all the salient visual information. The novelty lies in the use of eye-tracking and image features to guide the eye, neither of which will interrupt the visual experience of viewing the painting, or disrupt the original painting in any way (unlike current technical approaches, Figure 1). One of the most attractive features of incorporating eye-tracking in mobile AR is that the virtual layer of information lights up only when the user looks at a certain point. That means the information delivered is relevant at that particular gaze location, and at that particular time.


2 Implementation

2.1 Image Retrieval

To retrieve the appropriate AR information to present, image recognition is integrated into the application. OpenCV, a library of programming functions for real-time computer vision, will be used in the proposed work as these functions can easily capture and analyze images and video [openCV 2011]. Also, OpenCV has been successfully ported to work with iOS, the mobile device's operating system. OpenCV can also handle event input (such as mouse events). Rather than use the x,y position from the mouse, we measure the x,y gaze position in order to drive the gaze direction events. OpenCV commands are used to stream video capture to the device (e.g., CvCapture* capture = cvCaptureFromCAM(0);).
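A minimal sketch of such a capture loop is given below, using OpenCV's newer C++ VideoCapture interface in place of the C-style cvCaptureFromCAM call above. getCurrentGaze() is a hypothetical stand-in for the eye-tracker interface, and the point of interest is an assumed placeholder rather than a value from the real application.

#include <opencv2/opencv.hpp>

// Hypothetical eye-tracker hook; a real system would query the tracker SDK here.
static cv::Point2f getCurrentGaze() { return cv::Point2f(320.f, 240.f); }

int main() {
    cv::VideoCapture capture(0);             // modern equivalent of cvCaptureFromCAM(0)
    if (!capture.isOpened()) return 1;

    cv::Rect pointOfInterest(200, 150, 240, 180);   // assumed POI in frame coordinates
    cv::Mat frame;

    while (capture.read(frame)) {
        cv::Point2f gaze = getCurrentGaze();
        // Drive gaze-direction events from the gaze position rather than the mouse x,y.
        if (pointOfInterest.contains(cv::Point(cvRound(gaze.x), cvRound(gaze.y)))) {
            // Gaze reached the POI: reveal the relevant annotation / stop any modulation.
            cv::putText(frame, "POI attended", cv::Point(10, 30),
                        cv::FONT_HERSHEY_SIMPLEX, 0.8, cv::Scalar(0, 255, 0), 2);
        }
        cv::imshow("AR view", frame);
        if (cv::waitKey(1) == 27) break;     // Esc to quit
    }
    return 0;
}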

2.2 Image Alignment

Aligning the real-world image on the mobile device with the (enhanced) augmented version of the image is necessary for this work. Two popular algorithms for achieving this are SIFT and SURF. The Scale-Invariant Feature Transform (SIFT) detects and describes local features in images. SIFT transforms an image into a large collection of local feature vectors, each of which is invariant to any scaling, rotation or translation of the image. Speeded Up Robust Features (SURF) has similar performance to SIFT, but executes faster, which is important for mobile devices with limited processing power. OpenSURF, an open-source vision algorithm that finds salient regions in images, forms the basis of many vision-based tasks including object recognition and image retrieval, and will be used to address image recognition and registration [openSURF 2011] [Takacs et al. 2008] [Chen and Koskela 2011].

Conclusion

Existing mobile AR devices use cameras, angular velocity sensors, and accelerometers to gauge the absolute position of the user [Lane et al. 2010]. This information is then used to inform the placement of annotations in the virtual view. By contrast, very little work has focused on gauging where the user's attention is focused and leveraging that information for the placement and delivery of AR elements. Our work uses eye-tracking in conjunction with AR applications to determine where the viewer is looking at each point in time. We can then use where people look as a mechanism to inform the placement of AR elements in a manner that aligns with the user's visual attention and eliminates ambiguity by hiding information that the user ignores. This will ultimately lead to gaze-aware mobile AR applications that minimize visual clutter and enhance visual literacy by eliminating elements that are not being attended to. This project is in its preliminary stages, and has great potential to impact human-centered AR systems.



References
Andolina, S., Santangelo, A., Cannella, M., Gentile, A., Agnello, F., and Villa, B. 2009. Multimodal virtual navigation of a cultural heritage site: the medieval ceiling of Steri in Palermo. In Proceedings of the 2nd Conference on Human System Interactions, IEEE Press, Piscataway, NJ, USA, HSI '09, 559-564.

Azuma, R. T. 1997. A survey of augmented reality. Presence: Teleoperators and Virtual Environments 6, 4 (Aug.), 355-385.

Bailey, R., McNamara, A., Sudarsanam, N., and Grimm, C. 2007. Subtle gaze direction. In ACM SIGGRAPH 2007 Sketches, ACM, New York, NY, USA, SIGGRAPH '07.

Bailey, R., McNamara, A., Sudarsanam, N., and Grimm, C. 2009. Subtle gaze direction. ACM Trans. Graph. 28 (September), 100:1-100:14.

Bailey, R., McNamara, A., Costello, A., and Grimm, C. 2011. Impact of subtle gaze direction on short-term spatial information recall. In ACM SIGGRAPH 2011 Talks, ACM, New York, NY, USA, SIGGRAPH '11.

Bordwell, D., 2011. http://www.davidbordwell.net/blog/2011/02/14/watching-you-watch-there-will-be-blood/.

Bourlon, C., Oliviero, B., Wattiez, N., Pouget, P., and Bartolomeo, P. 2011. Visual mental imagery: What the head's eye tells the mind's eye. Brain Research 1367, 287-297.

Bruns, E., Brombach, B., Zeidler, T., and Bimber, O. 2007. Enabling mobile phones to support large-scale museum guidance. IEEE MultiMedia 14 (April), 16-25.

Carmigniani, J., Furht, B., Anisetti, M., Ceravolo, P., Damiani, E., and Ivkovic, M. 2011. Augmented reality technologies, systems and applications. Multimedia Tools Appl. 51 (January), 341-377.

Chen, N.-S., Teng, D. C.-E., Lee, C.-H., and Kinshuk. 2011. Augmenting paper-based reading activity with direct access to digital materials and scaffolded questioning. Comput. Educ. 57 (September), 1705-1715.

Chen, X., and Koskela, M. 2011. Mobile visual search from dynamic image databases. In Proceedings of the 17th Scandinavian Conference on Image Analysis, Springer-Verlag, Berlin, Heidelberg, SCIA '11, 196-205.

Chou, S.-C., Hsieh, W.-T., Gandon, F. L., and Sadeh, N. M. 2005. Semantic web technologies for context-aware museum tour guide applications. In Proceedings of the 19th International Conference on Advanced Information Networking and Applications - Volume 2, IEEE Computer Society, Washington, DC, USA, AINA '05, 709-714.

Choudary, O., Charvillat, V., Grigoras, R., and Gurdjos, P. 2009. MARCH: mobile augmented reality for cultural heritage. In Proceedings of the 17th ACM International Conference on Multimedia, ACM, New York, NY, USA, MM '09, 1023-1024.

Cole, F., DeCarlo, D., Finkelstein, A., Kin, K., Morley, K., and Santella, A. 2006. Directing gaze in 3D models with stylized focus. Eurographics Symposium on Rendering (June), 377-387.

Damala, A., Cubaud, P., Bationo, A., Houlier, P., and Marchal, I. 2008. Bridging the gap between the digital and the physical: design and evaluation of a mobile augmented reality guide for the museum visit. In Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts, ACM, New York, NY, USA, DIMEA '08, 120-127.

DeCarlo, D., and Santella, A. 2002. Stylization and abstraction of photographs. ACM Trans. Graph. 21 (July), 769-776.

Dodge, R., 1900. Visual perception during eye movement.

Dodge, R., and Cline, T., 1901. The angle velocity of eye movements.

Duchowski, A. T. 2003. Eye Tracking Methodology: Theory and Practice. Springer-Verlag New York, Inc., Secaucus, NJ, USA.

El Sayed, N. A. M., Zayed, H. H., and Sharawy, M. I. 2011. ARSC: Augmented reality student card. Comput. Educ. 56 (May), 1045-1061.

Feng, Z., Duh, H. B. L., and Billinghurst, M. 2008. Trends in augmented reality tracking, interaction and display: A review of ten years of ISMAR. In Proceedings of the 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, IEEE Computer Society, Washington, DC, USA, ISMAR '08, 193-202.

Gwilt, I. 2009. Augmented reality and mobile art. In Handbook of Multimedia for Digital Entertainment and Arts, B. Furht, Ed. Springer US, 593-599.

Hyona, J., Radach, R., and Deubel, H. 2003. The Mind's Eye: Cognitive and Applied Aspects of Eye Movement Research. North-Holland, Boston.

Itti, L., and Koch, C. 2000. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research 40, 10-12 (May), 1489-1506.

Itti, L., and Koch, C. 2001. Computational modelling of visual attention. Nature Reviews Neuroscience 2, 3 (Mar.), 194-203.

Kaufmann, H., and Meyer, B. 2008. Simulating educational physical experiments in augmented reality. In ACM SIGGRAPH ASIA 2008 Educators Programme, ACM, New York, NY, USA, SIGGRAPH Asia '08, 3:1-3:8.

Kaufmann, H., and Schmalstieg, D. 2002. Mathematics and geometry education with collaborative augmented reality. In ACM SIGGRAPH 2002 Conference Abstracts and Applications, ACM, New York, NY, USA, SIGGRAPH '02, 37-41.

Lane, N., Miluzzo, E., Peebles, D., Choudhury, T., and Campbell, A. T. 2010. A survey of mobile phone sensing.

Levoy, M., and Whitaker, R. 1990. Gaze-directed volume rendering. In Proceedings of the 1990 Symposium on Interactive 3D Graphics, ACM, New York, NY, USA, I3D '90, 217-223.

McNamara, A., Bailey, R., and Grimm, C. 2008. Improving search task performance using subtle gaze direction. In Proceedings of the 5th Symposium on Applied Perception in Graphics and Visualization, ACM, New York, NY, USA, APGV '08, 51-56.

McNamara, A., Bailey, R., and Grimm, C. 2009. Search task performance using subtle gaze direction with the presence of distractions. ACM Trans. Appl. Percept. 6 (September), 17:1-17:19.

Medicherla, P. S., Chang, G., and Morreale, P. 2010. Visualization for increased understanding and learning using augmented reality. In Proceedings of the International Conference on Multimedia Information Retrieval, ACM, New York, NY, USA, MIR '10, 441-444.

openCV, 2011. http://opencv.willowgarage.com/wiki/.

openSURF, 2011. http://www.chrisevansdev.com.

Sielhorst, T., Feuerstein, M., and Navab, N. 2008. Advanced medical displays: A literature review of augmented reality. J. Display Technol. 4, 4 (Dec.), 451-467.

Srinivasan, R., Boast, R., Furner, J., and Becvar, K. M. 2009. Digital museums and diverse cultural knowledges: Moving past the traditional catalog. The Information Society 25 (July), 265-278.

Takacs, G., Chandrasekhar, V., Gelfand, N., Xiong, Y., Chen, W.-C., Bismpigiannis, T., Grzeszczuk, R., Pulli, K., and Girod, B. 2008. Outdoors augmented reality on mobile phone using loxel-based visual feature organization. In Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, ACM, New York, NY, USA, MIR '08, 427-434.

Tumler, J., Doil, F., Mecke, R., Paul, G., Schenk, M., Pfister, E. A., Huckauf, A., Bockelmann, I., and Roggentin, A. 2008. Mobile augmented reality in industrial applications: Approaches for solution of user-related issues. In Proceedings of the 7th IEEE/ACM International Symposium on Mixed and Augmented Reality, IEEE Computer Society, Washington, DC, USA, ISMAR '08, 87-90.

Underwood, G., and Foulsham, T. 2006. Visual saliency and semantic incongruency influence eye movements when inspecting pictures. Q. J. Exp. Psychol. (Colchester) 59, 11 (Nov.), 1931-1949.

Underwood, J., Templeman, E., and Underwood, G. 2009. Attention in Cognitive Systems. Springer-Verlag, Berlin, Heidelberg, ch. Conspicuity and Congruity in Change Detection, 85-97.

Veas, E. E., Mendez, E., Feiner, S. K., and Schmalstieg, D. 2011. Directing attention and influencing memory with visual saliency modulation. In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, ACM, New York, NY, USA, CHI '11, 1471-1480.

Yarbus, A. 1967. Eye Movements and Vision. Plenum Press, New York.
