Documente Academic
Documente Profesional
Documente Cultură
2
(a) (b) (c) (d)
Figure 2. (a) The active user appears at the center of the radar. The distance of the dot at the center of the radar to any other dot
represents the proximity of the users represented by these dots. The color of a dot represents the status of the associated user. By
moving the mouse pointer over the radar we can discover more information for each user. (b) Activity indicators inform a user about
the recent activities of other people in the neighborhood. (c) By right-clicking on a person, options to invite him to a private chat
or to put him at the center of the radar are provided. (d) Choosing another person at the center of the radar reveals the users in his
neighborhood.
3.1.1 The Semantic Neighborhood Radar equate in our case; they cannot be applied on a dynamic
environment where the 2D layout needs to be regularly up-
Visualizing a concept is often challenging since one needs dated.
to balance between informative, comprehensive and compu- The radar metaphor evokes the proximity functionality
tationally feasible interfaces. We choose to represent the in- we discussed, but also adds new features such as:
formation of recently relevant users using a radar metaphor.
A radar in the real world, that operates on an object x (the • Representing the Time-axis on the Radar: An es-
reflector), scans a wide area, measures the distance of other sential aspect when representing users on the radar is
objects to x and presents these objects along with their dis- to clearly indicate whether they are (recently) active or
tances from x on a display. inactive. Active users are represented as green dots,
In our case, objects are users and distance is a metric of while inactive users are represented as red dots. Fur-
user to user proximity. This visualization enables users to thermore, we would like to represent how recently a
instantly conceive the details of our conceptual model. It specific user has been relevant to the active user. To
also has the nice property of displaying a lot of information represent this information for each dot we use a spec-
in a restricted area. trum of its color (either green or red) that spans from
The radar represents people as dots. The active user, for dark to light, with darker meaning more recently. The
which the radar is defined, appears at the center of the radar, active user is therefore always represented as a dark
while the people most relevant to the active user are plotted green dot at the center of the radar (see Figure 2(a)).
on the radar in distances from the center that respect the
• Action indicators: In addition to who is relevant and
computed proximity of each user to the active user (see Fig-
to what degree, we would like to also make available
ure 2(a)). Conceptually, this represents a semantic neigh-
information of what people in our neighborhood are
borhood around the active user, with the captured semantic
doing. To this end, we design action indicators that
being the correlation of recent user navigational patterns.
track the activity of people in the neighborhood and
Note, however, that when we are placing dots (users) on communicate the actions to the user (e.g., who got on-
the radar we do not ask to respect all pair-wise similari- line, etc.). Action indicators are visualized in the form
ties of all users. That case usually appears in the litera- of a balloon assigned to a specific user. By this way,
ture as the ordination problem, where we need to represent despite the stateless environment on which browsers
n-dimensional data by a small number of salient dimen- operate, we are able to track the state of the radar in-
sions and thus be able to display multivariate data on the formation (see Figure 2(b)).
two-dimensional surface. The main tool for the ordination
problem is the family of the dimensionality reduction tech- • Private Chat: One of the direct communication fea-
niques, including eigenvalue decomposition, multidimen- tures that are provided by the system is the private chat.
sional scaling, latent semantic analysis and more. Despite A user is able to start a private chat conversation (af-
the wide adoption of these techniques in various problem ter invitation) with any of the users presented in the
areas, they suffer from limitations that render them inad- semantic neighborhood radar. (see Figure 2(c)).
3
• Exploration of other Neighborhoods: Another fea- Such click-through information presents a challenging
ture of the radar is that it provides the possibility to set opportunity for analysis and mining with the goal of per-
another user at its center. By this way, one can explore sonalization and has been extensively used in research
the neighborhood (the relevant users) of another user. [15, 2, 16]. However, most existing approaches use the
By traversing from a neighborhood to another, one can click-through data to devise similarity measures with lit-
discover and communicate with more people (see Fig- tle consideration of the temporal factor. At the same time,
ure 2(c) and Figure 2(d)). these data are often dynamic and contain rich temporal in-
formation [19]. In this section we present a time-dependent
3.1.2 Website-based Chat similarity model that exploits the temporal characteristics of
historical click-through data. The intuition is that since the
The website-based chat allows people that coexist in a web- information needs of users change through time, user pro-
site, during their navigation, to directly communicate with filing algorithms should take into account the timestamps
each other. Navigating a website automatically makes you of the historical click-through data in order to identify re-
a part of that website’s virtual public chat room. There- gions of significant similarity that may be a consequence of
fore communication between all users in a specific website functional relationship.
is enabled (see Figure 1). Note, that our system enables
Formally, given the unified web history log H of all users
the concurrent communication of users at any website and
and with respect to a temporal factor T , we would like to
is not the same with the Web applications that allow to a
find a set of users that have similar web history in the time
website owner to directly communicate with its visitors. In-
period bounded by T . In other words, we would like to
terestingly, the website-based chat forms the foundation for
identify users that recently navigated same websites. To ex-
transforming a space to place [8].
press the temporal factor we use as a surrogate for time the
number of the last T records in the unified log H and oper-
3.1.3 Sharing Information Spaces ate on the subset H T ⊆ H of the last T visits. In similar
Pointing out interesting information in a collaborative sys- manner if we would like to constrain a specific user’s u his-
tem is essential. Beyond (private or public) chatting, our tory log to its last S visits we write HuS ⊆ Hu .
system further supports exchange of information in the form Note that in order a web browser client to show the radar
of a shared history feature. During navigation, a user may to a user u, it needs to communicate with the main server
share websites (along with tags) with the people that ap- and retrieve the required information. This information con-
pear in his radar. A user receiving suggestions for websites sists of the set of users relevant to u along with their asso-
would need to click on a user’s dot at the radar to indicate ciated similarity values. Instead of computing this informa-
his intention to see this list (see Figure 1). tion whenever a client sends a request asking for the infor-
mation to show on the radar, we perform all computations
3.1.4 Collaborative Annotation System in a preprocessing phase that takes place in specific time
intervals and cache the results. The premise of the prepro-
Our system provides the functionality to annotate a website cessing is that whenever a client requests information, this
with a set of keywords. By this way, a set of keywords would be promptly available. Therefore we require that the
is assigned to each website coming from different users at client-server communication takes place in periodic time in-
different times and therefore define a collaborative website tervals that allow for the pre-computation phase to finish.
annotation system (see Figure 1). These keywords can then Algorithm 1 describes the steps of the preprocessing phase.
form the basis for a number of applications. Algorithm 1 takes as parameters the unified history log
H, the temporal factor T , the temporal factor S and the
4 Algorithms maximum number of relevant users to be retrieved for each
user k. First, it forms the set H T of the last T records of
With millions of users accessing billions of webpages the unified log H. Then, it identifies the set of unique users
everyday one would consider a visit of a user u to a web- U in the set H T and for each user u ∈ U retrieves the set
site w at time t to be the primitive action that takes place of its temporal history log HuS that corresponds to its last
in the overall web browsing activity. From a user’s u per- |S| log records from H. For each pair of users ui , uj in U
spective the chronologically ordered sequence of visits to it computes their affinity by computing the overlap of their
a number of webpages defines a web history log Hu . Our temporal history sets HuSi and HuSj and saves the score to
system functions as the aggregation point of these individ- array A. For each user u ∈ U it computes the top-k users
ual history logs by defining a unified web history log H. In according to the scores in A and saves them in L. Finally,
its most simple form, H consists of a chronologically or- L is returned that keeps information about the most relevant
dered sequence of visit records of the form (u, w, t). users of each user along with their scores.
4
Algorithm 1 Finds Relevant Users where nw is the number of times w appears in H T . This
1: procedure C OMPUTE R ELEVANT U SERS(H, T , S, k) count is normalized to prevent a bias towards longer logs
2: L is a HashTable of the form L < u, < Set >> and to give a measure of the importance of the website w
3: A is a HashTable of the form A << i, j >, Ai,j > within the particular log H T . Then, a weight zw of an ele-
4: H S is a HashTable of the form H S < u, HuS > ment in the set is defined as follows:
5: H T = getLastRecords(H, T ) 1
6: U = getU niqueU sers(H T ) zw =
fw
7: for all u ∈ U do
8: HuS = getLastRecords(Hu , S) We may now consider a weighted version of our set simi-
9: end for larity algorithms, where there is a weight ze associated with
10: for all ui ∈ U do each set element e (i.e., with each website). Our approach
11: for all uj ∈ U do is to convert a weighted set into an unweighted bag by mak-
12: if ui 6= uj && i < j then ing ze copies of each element e. We use standard rounding
13: Ai,j = Af f inity(HuSi , HuSj ) = s(i, j) techniques if weights ze are non-integral. In that case, the
14: end if similarity s between i and j is defined as:
15: end for
16: end for P
17: for all u ∈ U do S ∩H S | zw
w∈|Hu
iu j
18: Lu = getT opK(A, k, u) s(i, j) = P P (3)
max( S |
w∈|Hu zw , S | zw )
w∈|Hu
19: end for i i
5
their intent into three classes: navigational, informational the subjects into four groups of four subjects each. The
and transactional. For the needs of our experimental evalu- first two groups (group1, group2) were assigned the query
ation we focus on informational queries (IQ), for which an- U IQ1 and the other two groups (group3, group4) were as-
swers are assumed to be present on many web pages. These signed the query U IQ2 from Table 1.
queries are the most likely to be benefited by social navi- Then, subjects were asked to fulfill their assigned task
gation tools, since the information seeking task requires to for a predefined time period. The subjects of group1 and
constructively select information from several sources. We group3 were allowed to use the browser extensions provided
further distinguish the informational queries into two cate- by our application to communicate their findings, while the
gories; ambiguous (AIQ) and unambiguous (UIQ). AIQs do subjects of group2 and group4 were asked to use a typical
not require that a specific answer is sought, while UIQs usu- web browser.
ally look for a specific answer. For the various experimental The objective of this experiment is to evaluate whether
scenarios we consider the queries of Table 1. and to what degree direct collaboration, that is inherent to
systems like ours, could improve the performance of an
informational task. The intuition behind this experiment
Table 1. Queries Data Set
Type Query is that as people in group1 (group3) collaborate towards
AIQ1 Find information and reviews about iPhone a common goal they will outperform group2 (group4) and
AIQ2 Find information about Ancient Rome will produce better results (either more or the same amount
AIQ3 Find reviews about Xbox but in less time).
AIQ4 Find information about Egyptian pyramids
AIQ5 Collect information about USA presidents 5.3 Results
AIQ6 Collect information about space exploration
First Study: A simple radar observation revealed the fol-
U IQ1 Find the list of the current prime ministers
lowing: At the end of phase-1 subjects were far away from
of the world that are lawyers
each other which indicates that they did not share the same
U IQ2 Find the list of paintings that have been sold
interests. Then, at the end of phase-2, subjects that had the
for more than 1 million dollars
same search task approached each other. Finally, in phase
3, subjects drew away from each other.
5.2 Experimental Studies For this experiment we used as evaluation metric the av-
erage distance avgd between the set of users defined by
First Study: To test the first hypothesis we asked from 24 each query. Formally, let the set of subjects that have been
students (subjects), to participate in a study. The subjects assigned the query AIQx be Gx . Then the average distance
were familiar with the Web searching task but not aware of between subjects of Gx and Gy is defined as:
our browser extensions. P
i∈Gx ,j∈Gy (d(i, j))
First, subjects were asked to surf the web using our ap- avgd(Gx , Gy ) = , ∀i < j (4)
n
plication without a specific objective for a time interval
(phase-1). Then, each subject was assigned a search task where n is the number of distinct pairs of subjects between
from the set of the ambiguous informational queries of Ta- Gx and Gy and d(i, j) is the distance of subjects i and j as
ble 1 (AIQ1 ...AIQ6 ). Each query was assigned to ex- defined in Equation 2. Table 2 shows the results for phase-2.
actly four participants. Participants were not aware of each
other’s assigned search tasks. Subjects were asked to fulfill Table 2. First User Study Results
their assigned task for a predefined time period (phase-2). G1 G2 G3 G4 G5 G6
After this period has elapsed subjects were asked to surf the G1 0.77 0.99 0.99 0.95 0.97 0.98
web without a specific objective for another time interval G2 - 0.75 0.99 0.97 0.99 0.97
(phase-3). G3 - - 0.73 0.99 0.98 0.96
The objective of this experiment is to evaluate how well G4 - - - 0.89 0.95 0.99
the radar groups together subjects that have been assigned G5 - - - - 0.78 0.98
the same searching scenario. The intuition behind this ex- G6 - - - - - 0.77
periment is that although subjects are free to perform the
given searching task following their judgement, there is a The results indicate that our algorithm was able to cap-
high probability that some intersection between the subjects ture the fact that some users had similar search interest.
that have been assigned the same task will occur. Note the diagonal of the Table 2, where it is clear that sub-
Second Study: To test the second hypothesis we asked from jects that have been assigned the same search task came
16 students (subjects) to participate in a study. We separated closer at the end of phase-2.
6
Second Study: For this experiment we need metrics to eval- made sure that it entails a balance of visibility, awareness of
uate the performance of individuals and of groups. For the others, and accountability.
former, we define the mean subject performance to be the The main idea of our system is to utilize temporal cor-
average number of distinct results found by the subjects of a relations between users’ web history logs. Though intu-
group. For the latter, we define the performance of a group itive the realization of such a system is not trivial since it
to be the number of distinct results found by all subjects in poses a number of challenges spanning from technical to
a group. Table 3 presents the results of this experiment. social aspects. Overall, the proposed application operates
as a system for storing meta-data of web browsing activity,
chat conversations and URL suggestions. By seamlessly
Table 3. Second User Study Results merging the available collaboration data our system forms
Query Group Average Subject Group the foundation for a practical social navigation system.
Performance Performance
U IQ1 1 9 35
U IQ1 2 9.75 13
References
U IQ2 3 12 42 [1] Web server survey (by www.netcraft.com), January 2008.
U IQ2 4 33.5 37 [2] E. Agichtein, E. Brill, S. T. Dumais, and R. Ragno. Learn-
ing user interaction models for predicting web search result
Regarding U IQ1 , subjects in group1 and group2 had preferences. In SIGIR, pages 3–10, 2006.
[3] J. Budzik, S. Bradshaw, X. Fu, and K. J. Hammond. Clus-
similar individual performance. However, group1 outper-
tering for opportunistic communication. In WWW, 2002.
formed group2. Users of group1 coordinated their effort by [4] S.-J. Chang and R. E. Rice. Browsing: A multidimen-
communicating a simple strategy. They separated the work- sional framework. Annual Review of Information Science
load (e.g., by continent) to four equal parts and assigned and Technology, 28, 1993.
each part to a subject. On the other hand, all subjects in [5] A. Dieberger. Supporting social navigation on the world-
group2, which where not collaborating, followed a natural wide web. Int. J. of Human-Computer Studies, 46, 1997.
strategy of looking for prime ministers of popular countries [6] P. Dourish and M. Chalmers. Running out of space: Models
causing their results to have a large overlap. Eventually, the of information navigation. Proceed. of HCI’94.
[7] T. Erickson and W. A. Kellogg. Social translucence: an ap-
results of the one were subsumed by the results of the other. proach to designing systems that support social processes.
Regarding U IQ2 , subjects of group3 coordinated their ACM Trans. CHI, 7(1), 2000.
effort by assigning only one member to the straightforward [8] S. Harrison and P. Dourish. Re-place-ing space: the roles of
task of collecting the top paintings from available listings place and space in collaborative systems. In CSCW, 1996.
(e.g., Wikipedia). The rest of the subjects continued search- [9] J. Kahan and M.-R. Koivunen. Annotea: an open rdf infras-
ing for expensive paintings by either submitting more so- tructure for shared web annotations. In WWW, 2001.
[10] A. Large, J. Beheshti, and T. Rahman. Gender differences in
phisticated queries in search engines or by visiting the web-
collaborative web searching behavior: An elementary school
sites of known galleries and auction houses. On the other study. Inf. Process. Manage., 38, 2002.
hand, subjects of group4 spent most of their time collecting [11] M. R. Morris. A survey of collaborative web search prac-
the same paintings from listings of the top paintings. tices. In CHI, 2008.
[12] M. R. Morris and E. Horvitz. Searchtogether: an interface
for collaborative web search. In UIST, 2007.
6 Concluding Remarks [13] S. Pandit and C. Olston. Navigation-aided retrieval. In
WWW, 2007.
To enable social navigation on the web, we had to design [14] T. Rodden, J. Mariani, and G. Blair. Supporting cooperative
a system that makes use of web history logs to encourage applications. In CSCW, 1992.
communication and collaboration among large groups of [15] J.-T. Sun, H.-J. Zeng, H. Liu, Y. Lu, and Z. Chen. Cubesvd:
a novel approach to personalized web search. In WWW,
people. To support these features some personal informa-
2005.
tion and a certain, limited amount of visibility of users’ ac- [16] J. Teevan, S. T. Dumais, and E. Horvitz. Personalizing
tions is required, which eventually infringes on user privacy. search via automated analysis of interests and activities. In
Privacy concerns could therefore serve as a major stumbling SIGIR, 2005.
block towards acceptance of our system. [17] M. B. Twidale, D. M. Nichols, and C. D. Paice. Browsing is
Erickson and Kellogg [7] introduced the concept of so- a collaborative process. Inf. Process. Manage., 33(6), 1997.
cial translucence as an approach to designing systems that [18] A. Wexelblat and P. Maes. Footprints: history-rich tools for
information foraging. In CHI, 1999.
support social processes. According to this concept it is not
[19] Q. Zhao and S. C. H. H. et al. Time-dependent semantic
only necessary to see other users, but to clearly communi- similarity measure of queries using historical click-through
cate what information is disclosed and how it is used. We data. In WWW ’06, pages 543–552.
followed the same approach when designing our system and