www.elsevier.com/locate/tele
Abstract
The quest for quality of life (QoL) is a growing concern for individuals and communities
seeking to find sustainable life satisfaction in a technologically changing world. Industry,
consumer groups, academics, and policy makers have sought to better understand how the
Internet contributes to or detracts from society. This study examined the effects of Internet
activities, new media use, social support, and leisure activities on perceived quality of life.
Correlational results showed that Internet activities, such as using the Internet for sociability,
fun seeking and information seeking, and new media use, correlate positively with various
dimensions of social support. However, use of the Internet, especially for sociability, and
computer use were inversely linked to QoL. Furthermore, hierarchical regression analysis
revealed that affectionate, positive social interaction, and emotional and informational social
support, received from either online or offline sources, are the strongest determinants of
quality of life. More important, QoL can also be enhanced if suitable amounts of time are
spent on media-related activities, namely, less time on using the Internet for intimate self-disclosure and in playing computer games, and more time on listening to music on CD/MD/MP3.
Finally, participating in community or religious activities for leisure was also a significant predictor of QoL. Implications regarding policy formulation to improve life quality are
discussed.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Internet; Quality of life; Social support; Leisure activities
The work described in this paper was fully supported by a grant from the Research Grant Council of
the Hong Kong Special Administrative Region (project no. CUHK 4315/01H).
* Corresponding author. Tel.: +852-26097703; fax: +852-26035007.
E-mail address: louisleung@cuhk.edu.hk (L. Leung).
0736-5853/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.tele.2004.04.003
1. Introduction
Over the past decade, the Internet has changed the way people work, play, learn,
and communicate. Today, there is scarcely an aspect of our lives that is not
affected by the torrent of information available on the hundreds of millions of sites
crowding the Internet, not to mention its ability to keep us in constant touch with
each other via electronic mail (Henderson, 2001). The Internet adds a new entry to
the list of older mechanisms such as the telephone, postal mail, TV, radio, and
newspaper, all of which import communication and information into the household.
In fact, many view the growth of the Internet and e-commerce as a global megatrend
along the lines of the printing press, the telephone, the computer, and electricity.
Since these relatively recent developments, technology in much of the world has just
about taken over our lives.
The quest for quality of life is a growing concern for individuals and communities
seeking to find sustainable life satisfaction in a technologically changing world
(Mercer, 1994). Globalization and rapid advances in information technology offer us
vast, unprecedented opportunities to improve life quality. Yet these opportunities may
also be burdened with undesirable consequences. With the Internet, people, living
in the most plugged-in and mechanized society in history, may be working harder
than ever. Rather than creating time for leisure, our technology is creating ways that
make it possible to undertake more work at home. Cellular phones, palmtops, and
Internet access devices may be making it virtually impossible to escape our jobs.
Technology may diminish our leisure time, not increase it (Anderson and Tracey,
2001). Does using the Internet make people happier or unhappier? Is the Internet
empowering, to which specific groups of people, and under what circumstances?
Does virtual community erode face-to-face community? These are some of the key
questions social scientists are exploring today.
Previous research assessing life quality has included selected attributes such as
access to leisure activities, amount of non-work time, telework, and use of new
media technology (Kernan and Unger, 1987; Leung, 2004; Moller, 1992; Wei and
Leung, 1998), among others. However, little research has been carried out to further
explore the potential relationship between the Internet and QoL. For the time being,
both theoretical and empirical research on the impact of the Internet is still in
its infancy. This study examines the possible influence of the Internet, with particular emphasis on the roles that Internet activities, use of new media, social support,
and leisure activities play in quality of life.
2. Theoretical frameworks
2.1. Quality of life
Quality of life, a cognitive judgmental process, is defined as a global assessment
of a person's life satisfaction according to his chosen criteria (Shin and Johnson,
1978). Diener (1984) suggested that the judgment of how satisfied people are with
their present state of affairs is based on a comparison with a standard, which each
individual sets for himself or herself. It is not externally imposed. Although many
people see wealth, health, employment, leisure, personal life, and fame as desirable,
different individuals may place different values on them. As defined by Argyle (1987),
"[the] meaning of happiness is a state of joy or positive emotion; or the satisfaction
with life as a whole, or with work, leisure, and other parts of it." Therefore, quality
of life is a measure of overall life satisfaction, rather than a summation of life satisfaction across specific domains.
In the quality of life literature, two constructs have been used to explain
the determinants of life satisfaction or quality of life: subjective and objective perspectives (Diener, 1984). The subjective construct hypothesizes that perceived quality
of life is influenced by personality or dispositional factors (e.g., optimism, pessimism,
isolation, self-worth, and neuroticism). On the other hand, the objective construct
proposes that life quality is affected by environmental or situational factors (e.g.,
family, job, leisure, neighborhood, community, and satisfaction with standard of
living). According to the objective determinants of life quality, people's quality of life
tends to be a direct function of their evaluations of important life domains such as
social support, leisure activities, and standard of living (e.g., Andrew,
1986; Andrews and Withey, 1976; Diener, 1984). Satisfaction or dissatisfaction with
standard of living is likely to spill over to influence subjective well-being. Therefore,
the greater the satisfaction with one's standard of living, the greater the satisfaction
with life and vice versa. Here, standard of living usually means being materially
better off than a typical family (Andrews and Withey, 1976; Diener, 1984; Prenshaw, 1994).
Technologies and innovations have always played a major role in attaining and maintaining a high standard of living (McPheat, 1996). Household
technologies introduced around the middle of the last century, such as television,
refrigerators, air-conditioners, vacuum cleaners, and clothes dryers, are permanently embedded in society. Even more taken-for-granted are changes in workplace technology such as the use of mobile phones, faxes, and e-mails. The
impact of the Internet on society as a whole has been debated continuously since
its widespread use in the 1990s. Industry, consumer groups, academics, and
policy makers have sought to better understand how the Internet contributes to
or detracts from society. Communications media are so fundamental to society
that new media forms have the capacity to reshape our work, leisure, lifestyle,
social relationships, national and cultural groups and identities in ways that are
difficult but important to predict. As the Internet continues to expand its technological capabilities and global penetration, one of the most pressing questions
is: Does the Internet have a positive or negative effect on life quality? As
shown in Fig. 1, this study examines, from an objective perspective, the impact
of social support, leisure activities, and standard of living (as supported and
maintained by the use of information technologies such as the Internet) on
quality of life.
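[Fig. 1. Hypothesized model: Internet activities (H2.1, H2.2), use of new media technology (H4.1, H4.2), and leisure activities (H3.1, H3.2) are linked to social support (emotional and informational, affectionate, positive social interaction) and to quality of life; social support is linked to quality of life (H1); demographics are also included.]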
patients with AIDS (Brennan et al., 1991). These studies have demonstrated that the
use of a computer-based communication system reduced self-reported isolation in an
AIDS trial and led to greater perceived confidence in the ability to care for family
members in the Alzheimer's caregivers study. Internet-based peer support groups
for depression have also been found to provide information and support, and
heavy users of these Internet groups were more likely to have resolution of depression
during follow-up than less frequent users (Houston et al., 2002). Similarly, in
addition to research focused on the impact of the Internet on disabled people, past
research has also investigated social support in the computer-mediated environment for
able-bodied people and found that older adult Internet users reported higher satisfaction with Internet providers of social support, and that greater involvement with an
online community was predictive of lower perceived life stress (Wright, 2000).
It is impossible to consider all the variables from the subjective and objective
perspectives in assessing quality of life for any individual. The list of possible indicators is endless. One solution is to see which of them increases the objective quality
of life within the domain of mediated social impact by information technologies.
Furthermore, building on Putnam's (1995) conceptual links between quality of
life, community involvement, and social capital, further research has demonstrated
that frequent and increasing use of the community computer network and the Internet significantly influences social capital formation (Kavanagh and Patterson,
2001). Therefore, we expect that:
H1: Social support is positively associated with QoL.
H2.1: Internet activities (especially for sociability) are positively associated with
social support.
H2.2: Internet activities (especially for sociability) are positively associated with
QoL.
2.3. Leisure activities
As reviewed earlier, one important objective determinant of life quality is leisure
activities. In studying leisure, scholars like to ask whether place-centered leisure
activities, which take place in urban parks, or sporting and entertainment venues,
contribute more to a person's self-reported quality of life or whether QoL is primarily influenced by people-centered factors such as social interaction, sense of
achievement, and level of satisfaction with one's leisure lifestyle. Social interaction is
a central component of leisure activities (Auld and Case, 1997) and the most positive
experiences people report are usually those with friends (Csikszentmihalyi, 1997). In
a study that examined the relative importance of selected place and people-centered
leisure attributes in predicting quality of life, Lloyd and Auld (2001) found that the
people-centered leisure activities were the best predictor of quality of life. In particular, the domain of social support from family, friends, and marriage has the most
effect on life quality and social leisure activity has the most positive influence on QoL
for a diverse range of social groups (Siegenthaler and Vaughan, 1998). Moreover,
previous research has demonstrated a positive relationship between engaging in
leisure activities such as sports (Wankel and Berger, 1990) and fitness exercises
(Dowall et al., 1988) and improved life quality. Foong (1992) explained that these
significant relationships are due to the salutary consequences of social interaction
with other people resulting from engagement in active leisure. This study will use
both people-centered as well as place-centered indicators to assess leisure activities. As
a result, we hypothesize that:
H3.1: Leisure activities are positively associated with social support.
H3.2: Leisure activities are positively associated with quality of life.
2.4. Impact of the Internet and new media
Extensive qualitative and quantitative evidence also supports the Internet's
potential: home Internet access has enabled the informationally disadvantaged or
low-income families to experience powerful emotional and psychological transformations in identity (self-perception), self-esteem, personal empowerment, a new
sense of confidence, and social standing or development of personal relationships on
the Internet (Anderson and Tracey, 2001; Bier and Gallo, 1997; Henderson, 2001).
The appropriate use of computers, mobile phones, online newspapers, online
forums, and the like can help to promote self-sufficiency, psychological empowerment, lifelong learning, and rehabilitation (Bier and Gallo, 1997; Hu and Leung, 2003;
Wellman and Haythornthwaite, 2002). Wright (2000) found that greater involvement
with the online community was predictive of lower perceived life stress for older
adults. A trend toward decreased loneliness and improved psychological well-being
among older adults was observed when e-mail and Internet access was provided
(White et al., 1999). Based on these findings and the theoretical frameworks reviewed, we propose two additional hypotheses and ask one research question:
H4.1: Use of new media technology is positively associated with social support.
H4.2: Use of new media technology is positively associated with QoL.
RQ: To what extent can Internet activities, use of new media, and traditional media
use affect quality of life when other influences, such as social support, leisure activities, and demographics, are considered simultaneously for Internet users?
3. Method
3.1. Sample and sampling procedures
Data were gathered from a probability sample of 1192 respondents, using a face-to-face structured questionnaire interview during the months of October to December
2002. Respondents were eligible members of randomly generated households from
the Census and Statistics Department in Hong Kong. If there was more than one
eligible respondent living in the household, the person who was between the ages of
15 and 64 and had had the most recent birthday was interviewed. Interviewers were
trained university students. A total of 238 households were discarded because interviewers found them to be vacant, in non-residential use, or ineligible; because there was no response after more than three visits; or because the residents refused to take part. Of the 954 qualified households, 696 successfully completed the questionnaires, resulting in a 73% response rate.
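The response-rate arithmetic reported above can be checked directly. The following is a minimal sketch using only the figures given in the text; the variable names are illustrative and are not taken from the study.

```python
# Figures as reported in the sampling description.
sampled = 1192      # units in the probability sample
discarded = 238     # vacant, non-residential, ineligible, no response, or refusal
completed = 696     # completed questionnaires

# Qualified households are what remains after discards.
qualified = sampled - discarded

# Response rate = completed interviews / qualified households.
response_rate = completed / qualified

print(qualified)                # 954
print(f"{response_rate:.1%}")   # 73.0%
```

The result agrees with the 954 qualified households and the 73% response rate stated in the text.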
The sample consisted of 46.7% males and 53.3% females. The mean age was 36.8
with 30.3% in the 35-44 age group, 21.6% in 25-34, 20% in 15-24, 19.7% in
45-54, and 8.5% in 55-64. This age distribution very closely resembled the 2001
population census in Hong Kong. Of the 696 respondents, 41.9% were high school
graduates, 24% college graduates, 19.5% had completed junior high, and 13.4% only
had grade school education. In terms of income, the mean was at the income bracket
of US$2565-$3205 a month, with 16.9% earning less than US$1282 a month, 21%
between US$1282 and $1923, 13.6% between US$1924 and $2564, 12.8% between
US$2565 and $3205, 17.6% between US$3206 and $5128, 9.9% between US$5129
and $7692, and 8.3% more than US$7692 a month. Over 38% were managers,
administrators, professionals, or associate professionals, 19.4% clerks, 14.3% service
or sales workers, 10.8% craft and related workers, 9.8% had elementary occupations,
and about 5% were plant and machine operators and assemblers.
3.2. Measurements
Quality of life. To measure quality of life, the Satisfaction with Life Scale (SWLS)
developed by Diener et al. (1985) was employed. With good internal consistency and
high reliability, SWLS is narrowly focused to assess global life satisfaction and does
not tap related constructs such as positive affect or loneliness. Respondents were
asked about their agreement with five items on a 5-point scale with
1 = strongly disagree, 2 = disagree, 3 = ordinary, 4 = agree, and 5 =
strongly agree. The five items include: (1) in most ways my life is close to my ideal;
(2) the conditions of my life are excellent; (3) I am satisfied with my life; (4) so far I
have gotten the important things I want in life; and (5) if I could live my life over, I
would change almost nothing. Reliability alpha was high at 0.83.
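For readers unfamiliar with the reliability coefficient reported here, the sketch below shows one common way to compute Cronbach's alpha for a five-item scale such as the SWLS. It is an illustration only: the function name `cronbach_alpha` and the six response rows are hypothetical and are not data from this study.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(responses[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings on a 5-point scale (rows = respondents, columns = items).
example = [
    [4, 4, 5, 4, 3],
    [2, 3, 2, 2, 1],
    [3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4],
    [1, 2, 2, 1, 2],
    [3, 4, 3, 3, 2],
]

alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.2f}")
```

Alpha approaches 1 as the items vary together; values above roughly 0.8, such as the 0.83 reported for the SWLS, are conventionally read as good internal consistency.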
Social support. To assess social support, a battery of 19 items within four subscales developed by The Rand and Medical Outcome Study (MOS) teams was
adopted with slight modification. The five original dimensions of social support were
reduced to four because items from emotional support and informational support were
highly correlated and overlapped considerably; emotional and informational support were therefore merged into one. As a result, the four subscales were "tangible,"
"affectionate," "positive social interaction," and "emotional or informational"
support. It was recommended that the subscale scores rather than the total score be
used (McDowell and Newell, 1996). Moreover, items from the tangible support
subscale were excluded because tangible support refers mostly to medical or health-related
assistance from friends or close relatives rather than to affective or
emotional support. Respondents were asked how often each of the support items,
measured in the remaining three dimensions, is available to them if they need it,
from either the online or offline world. A 5-point scale was used, with
1 = none of the time, 2 = a little of the time, 3 = some of the time, 4 = most
of the time, and 5 = all of the time. Principal components factor analysis in Table
1 extracted three factors and explained 71.8% of the variance. The three factors were
"emotional and informational support" (alpha = 0.86), "positive social interaction" (alpha = 0.83), and "affectionate" (alpha = 0.84).

Table 1
Factor analysis of social support

How often is each of the following kinds of support available to you if you need it?

Item                                                                               Mean   SD     Loading
Emotional and informational
 1. Someone whose advice you really want                                           3.58   0.83   0.77
 2. Someone to give you good advice about a crisis                                 3.54   0.89   0.77
 3. Someone to give you information to help you understand a situation             3.57   0.86   0.71
 4. Someone to turn to for suggestions about how to deal with a personal problem   3.47   0.95   0.61
Positive social interaction
 5. Someone to get together with for relaxation                                    3.63   0.84   0.80
 6. Someone to do something enjoyable with                                         3.56   0.86   0.78
 7. Someone to do things with to help you get your mind off things                 3.35   0.90   0.67
Affectionate
 8. Someone who shows you love and affection                                       3.69   0.87   0.86
 9. Someone to love and make you feel wanted                                       3.61   0.91   0.69
10. Someone who comforts you sincerely (hugs you)                                  3.64   0.90
11. Someone you can count on to listen to you when you need to talk                3.72   0.84

Factor                         Eigenvalue   Variance explained (%)   Cronbach's alpha
Emotional and informational    6.41         58.27                    0.86
Positive social interaction    0.80         7.27                     0.83
Affectionate                   0.69         6.26                     0.84

Scale used: 1 = none of the time, 2 = a little of the time, 3 = some of the time, 4 = most of the time, and
5 = all of the time; N = 388. Five further loadings (0.41, 0.44, 0.42, 0.65, and 0.53), which include those for items 10 and 11, cannot be assigned to specific cells.
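The percentages of variance explained follow from the eigenvalues reported in Table 1. The sketch below assumes, as is usual for a principal components analysis of standardized items, that each factor's share of variance is its eigenvalue divided by the number of items (11 here). The dictionary keys are illustrative labels only.

```python
# Eigenvalues as reported in Table 1 for the three social support factors.
eigenvalues = {
    "emotional and informational": 6.41,
    "positive social interaction": 0.80,
    "affectionate": 0.69,
}
n_items = 11  # items retained in the factor analysis

# Share of total variance for each factor, in percent.
pct_variance = {name: 100 * ev / n_items for name, ev in eigenvalues.items()}
total_pct = sum(pct_variance.values())

for name, pct in pct_variance.items():
    print(f"{name}: {pct:.2f}%")
print(f"total: {total_pct:.1f}%")
```

Under this assumption the three shares sum to about 71.8%, matching the total reported in the text. The small difference from the tabled 6.26% for the third factor is consistent with rounding of the eigenvalue.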
Leisure activities. Respondents were asked how often they engage in five popular
people-centered and place-centered leisure activities in Hong Kong including: talking
to family and friends face-to-face for more than 10 min, playing mahjong, participating in community or religious activities, physical exercise, and window shopping.
A 5-point scale was used with 1 = never, 2 = seldom, 3 = sometimes,
4 = quite often, and 5 = very often.
Internet activities. Respondents were asked how often they engage in the following Internet activities: learning from the Internet, searching for information, reading news
online, listening to music, playing games, surfing for leisure and entertainment,
purchasing, using services on the Internet (such as paying bills, account transfer,
booking tickets, etc.), communicating with somebody you did not know before,
communicating with somebody you knew before, and talking about aspects of your
inner world to other people. A 5-point Likert scale was used with 1 = never,
2 = seldom, 3 = sometimes, 4 = often, and 5 = very often. After
excluding two items, principal components factor analysis with Varimax rotation
yielded four factors with eigenvalues greater than 1.0, explaining 64.37% of the
variance. As shown in Table 2, these factors are fun seeking, sociability, information
seeking, and e-commerce, with alpha ratings of 0.71, 0.70, 0.58, and 0.62
respectively.

Table 2
[Factor analysis of Internet activities. Item labels and item-level means, SDs, and loadings are not recoverable; the factor-level rows are kept.]

Factor                 Eigenvalue   Variance explained (%)   Cronbach's alpha
Fun seeking            3.32         30.16                    0.71
Sociability            1.56         14.13                    0.70
Information seeking    1.17         10.66                    0.58
E-commerce             1.04         9.42                     0.62
New media use. Respondents were asked how much time they spent on the eight
most popular new media technologies in their leisure time, namely, Internet use,
computer use, ICQ, e-mail, and talking on the phone in minutes per day and playing
computer games, listening to CD, MD, MP3, and watching VCD and DVD in
minutes per week.
Traditional media use. Four traditional mass media variables were included in the
analyses: printed newspaper reading, TV watching, magazine reading, and radio
listening. Respondents were asked to report the time on average spent on these media
in a normal day. Newspaper reading, TV watching, and radio listening were measured
in minutes per day while magazine reading was measured in minutes per week.
4. Results
4.1. Hypotheses testing
H1 predicted that social support is positively associated with quality of life. As
expected, correlation results in Table 3 showed that the emotional and informational
(r = 0.36, p < 0.001), positive social interaction (r = 0.40, p < 0.001), and affectionate
(r = 0.47, p < 0.001) dimensions of social support were significantly correlated with quality of life. This suggests that people who have strong social support
available when they need it, such as affirmation, aid, encouragement, information,
affect, and validation of their feelings, are those who enjoy a high quality of life.
Thus, H1 received strong support from the data.

Table 3
Correlation analyses of all criterion variables and social support and quality of life
[Correlation matrix not recoverable cell by cell. Columns: the emotional and informational, positive social interaction, and affectionate dimensions of social support, and quality of life (QoL). Labeled row groups: Internet activities (fun seeking, sociability, information seeking, e-commerce) and social support (emotional and informational, positive social interaction, affectionate); the labels of the remaining rows are missing. Significance levels used: p < 0.1, p < 0.05, p < 0.01, and p < 0.001; n.s. = not significant.]
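As a rough check on the significance levels attached to these correlations, the sketch below converts a Pearson r into a test statistic and an approximate two-sided p-value. Two assumptions are made here that the text does not confirm: that N = 388 (the figure given in the table notes) applies to these correlations, and that a normal approximation to the t distribution is adequate at this sample size. The function name is illustrative.

```python
import math

def approx_two_sided_p(r, n):
    """Approximate two-sided p-value for a Pearson correlation r based on n cases.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) and treats t as standard normal,
    which is an approximation that is only reasonable for large n.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))

N = 388  # assumed sample size, taken from the table notes

for label, r in [
    ("emotional and informational", 0.36),
    ("positive social interaction", 0.40),
    ("affectionate", 0.47),
]:
    p = approx_two_sided_p(r, N)
    print(f"{label}: r = {r:.2f}, approximate p = {p:.1e}")
```

Under these assumptions all three correlations fall well below p = 0.001, in line with the levels reported for H1.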
H2.1 predicted that Internet activities are significantly linked to social support. Of
the four main categories of Internet activities, results in Table 3 indicated that fun
seeking, sociability, and information seeking were significantly related to the emotional and informational dimension of social support (each with r = 0.11, p < 0.05).
This means that people who often receive advice in crises in the real world are those
who are active on the Internet talking about aspects of their inner world with friends
and strangers, relying heavily on the Internet for advice and information to help
them understand their personal problems, playing games, listening to music, and
surfing the web. Secondly, fun seeking was also significantly related to the positive
social interaction dimension of social support (r = 0.16, p < 0.01). This indicates
that people who enjoy a large social network for interaction and relaxation offline
are those who are active game players and fun seekers on the Internet. Thirdly, bivariate relationships between information seeking and e-commerce and the affectionate dimension of social support were also significant (r = 0.12, p < 0.05 and
r = 0.11, p < 0.05 respectively). This suggests that people who have a large circle of
friends providing them with love and affection in the offline world are also those who
are active on the Internet seeking information, advice, and receiving support.
Therefore, H2.1 is largely supported.
Contrary to what H2.2 hypothesized, that Internet activities are positively associated with life quality, results in Table 3 showed that sociability was negatively
linked to quality of life (r = -0.16, p < 0.01). Such a finding indicates that people
who spend a lot of time disclosing their inner world to others on the Internet are
those with a lower assessment of their overall life quality. This relationship could be
explained in that when people spend a lot of time talking about their personal feelings
online, this may take away time from more valuable activities offline, including social
contact, sleep, leisure activities, or reading books. Therefore, H2.2 is not supported.
H3.1 proposed that leisure activities would influence social support. Results in
Table 3 showed that emotional and informational social support is significantly
related to people-centered leisure activities such as talking with family and friends
face-to-face (r = 0.24, p < 0.001) and participating in community or religious
activities (r = 0.14, p < 0.001). This means that, at times of crisis, people tend to
obtain information and advice by engaging in face-to-face conversation with other
people and/or by actively involving themselves in religious or community activities. In addition,
people also find informational and emotional social support through place-centered
leisure activities, e.g., physical exercise (r = 0.10, p < 0.01) and window shopping
(r = 0.11, p < 0.01). Similarly, when people get together for relaxation and fun for
positive social interaction, they tend to talk with family and friends (r = 0.27,
p < 0.001), play mahjong (r = 0.11, p < 0.05), and go window shopping with
friends (r = 0.22, p < 0.001), a popular place-centered leisure activity in Hong
Kong. As expected, people who receive a lot of affection and love for social support
are those who often engage actively in people-centered, face-to-face chat with family
and friends (r = 0.33, p < 0.001) and place-centered window shopping (r = 0.11,
p < 0.05). Thus, H3.1 is largely supported.
H3.2 proposed that leisure activities would influence quality of life. As anticipated,
talking with family and friends face-to-face (r = 0.25, p < 0.001) and participating in
community or religious activities (r = 0.13, p < 0.01), both people-centered leisure
activities, are significantly linked to quality of life. Furthermore, physical exercise or
sports (r = 0.11, p < 0.05), a place-centered leisure activity, was also found to be significantly linked to QoL. Hence, H3.2 received strong support.
H4.1 predicted that use of new media technology is positively associated with
social support. Correlational results in Table 3 showed that, of the eight new media
technologies commonly used in daily life, emotional and informational social support was significantly linked to talking on the phone (r = 0.14, p < 0.001), playing
computer games (r = 0.11, p < 0.01), and listening to CD/MD/MP3 (r = 0.15,
p < 0.001). This shows that people who can receive advice and information about a
crisis when they need them are those who tend to spend a lot of time on the phone
seeking counsel, guidance, or encouragement; others receive emotional and informational social support through computer gaming and listening to or sharing music
with online/offline friends. Similarly, the positive social interaction dimension of social support
was also significantly related to talking on the phone (r = 0.14, p < 0.01), playing
computer games (r = 0.14, p < 0.01), and listening to CD/MD/MP3 (r = 0.23,
p < 0.001). These findings indicate that people who receive social support by getting
together and doing something enjoyable with friends are those who often like to talk
on the phone, play games with computers, and listen to CD/MD/MP3. Finally,
people who receive a lot of love, affection, and hugs in real life are those who also
listen to music online regularly to release social pressure (r = 0.21, p < 0.001). As a
result, H4.1 received partial support.
H4.2 predicted that use of new media technologies is positively associated with
quality of life. Results showed that only three out of eight new technologies and QoL
were significantly linked. Surprisingly, use of the Internet and use of computer were
negatively related to QoL (r = -0.16, p < 0.01 and r = -0.13, p < 0.05 respectively), while watching VCD/DVD/LD for entertainment and life quality were
positively linked (r = 0.18, p < 0.01). Therefore, H4.2 received little support.
4.2. Predicting quality of life
Finally, to compare the relative influence of Internet activities, use of new media,
and traditional media use on quality of life when other factors, such as social support, leisure activities, and demographics are considered simultaneously for Internet
users, a hierarchical regression analysis was run. Results in Table 4 show that
sociability was a significant predictor (β = -0.11, p < 0.05) under the Internet
activities block. The negative coefficient indicates that the people who
spend more time communicating their inner thoughts to other people on the Internet
are those who tend to have a lower level of life quality. The first block accounted for
2% of the variance.
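Table 4
Stepwise regression of Internet activities, new media use, traditional media use, social support, leisure
activities, and demographics on quality of life (QoL)

Predictor variables                                  β          ΔR²
Block 1: Internet activities                                    0.02
  Sociability                                        -0.11*
  Other Internet activities                          n.s.
Block 2: New media use                                          0.08
  Playing computer games                             -0.14**
  Listening to CD/MD/MP3                             0.18**
  Other new media variables                          n.s.
Block 3: Traditional media use                                  0.00
  All traditional media variables                    n.s.
Block 4: Social support                                         0.20
  Emotional and informational                        0.13**
  Positive social interaction                        0.23***
  Affectionate                                       0.35***
Block 5: Leisure activities                                     0.01
  Participating in community or religious activities 0.08*
  Other leisure activities                           n.s.
Block 6: Demographics                                           0.03
  Gender (female = 1)                                n.s.
  Age                                                0.16**
  Education                                          n.s.
  Monthly household income                           0.11*
R²                                                              0.36
Final adjusted R²                                               0.34

Notes: Figures are standardized beta coefficients from final regression equation with all blocks of variables
included for the entire sample.
* p < 0.05; ** p < 0.01; *** p < 0.001; N = 388.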
Use of new media technologies was entered into the equation next. Results
showed that playing computer games (β = -0.14, p < 0.01) and listening to CD/MD/MP3
(β = 0.18, p < 0.001) were the only two significant predictors. The negative link between playing computer games and QoL reveals that the violent nature of
most computer games has led people to view computer games as a negative force in
affecting their self-evaluation of life quality. Quality of life was also predicted by
heavy listening to CD/MD/MP3. This indicates the effect of a wide range of music on
users' well-being. These two variables contributed 8% of the variance. However,
traditional mass media had no significant impact on life quality.
The three dimensions assessing social support were the next entries in the equation. Affectionate (β = 0.35, p < 0.001), positive social interaction (β = 0.23,
p < 0.001), and emotional and informational (β = 0.13, p < 0.01) dimensions contributed significantly to the regression equation and explained a total of 20% of the
variance. Five variables from the leisure activities block were entered next. Participating in community or religious activities (β = 0.08, p < 0.05) was a significant
predictor that accounted for another 1% of the variance.
Demographic predictors were entered last, and age (β = 0.16,
p < 0.01) and monthly household income (β = 0.11, p < 0.05) were found to be significant. The
equation explained 34% of the variance in total, with the first three blocks of
media-related predictors contributing a significant proportion of 10%. This suggests that
while social support dimensions were the strongest predictors, appropriate use of the
Internet and new media technologies does have an impact on quality of life.
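The block-by-block entry described above can be sketched in code. The following is a minimal illustration, assuming ordinary least squares and using synthetic data only; the function names `r_squared` and `incremental_r2`, the block sizes, and the simulated effect sizes are hypothetical and are not taken from the study.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (an intercept column is added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def incremental_r2(blocks, y):
    """Enter blocks of predictors in order; return the R^2 gained at each step."""
    gains, entered, previous = [], [], 0.0
    for block in blocks:
        entered.append(block)
        current = r_squared(np.column_stack(entered), y)
        gains.append(current - previous)
        previous = current
    return gains

# Synthetic data only (hypothetical, NOT the study's survey data).
rng = np.random.default_rng(0)
n = 388                               # sample size reported in the table notes
media = rng.normal(size=(n, 2))       # stand-in for a media-use block
support = rng.normal(size=(n, 3))     # stand-in for the social support block
qol = 0.2 * media[:, 0] + 0.5 * support[:, 0] + rng.normal(size=n)

gains = incremental_r2([media, support], qol)
total = r_squared(np.column_stack([media, support]), qol)
print([round(g, 3) for g in gains], round(total, 3))
```

Because each step adds predictors to a model that already contains the earlier blocks, the incremental gains are non-negative and sum to the R² of the full model, which is why the per-block percentages in the text can be added together.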
5. Discussion
5.1. Social support and QoL
This study has shown that people with strong social support, such as affirmation,
aid, encouragement, and affect, available to them when they need it, from either
the online or the offline world, reported a higher quality of life. This finding means that
receiving support from strong ties increases life quality. This is consistent with past
research showing that lower levels of perceived social support, satisfaction with social
contacts, and participation in social activities are all related to poorer
psychological well-being or life quality (House, 1986). Conversely, when people have
high levels of emotional support, mediated entirely by the perception that one has
someone to call on when they need to, they expect to live longer (Ross and Mirowsky,
2002). Furthermore, hierarchical regression results confirmed that affectionate,
positive social interaction, and emotional and informational dimensions of
social support were all significant predictors of QoL and explained the majority of
the variance.
5.2. Internet activities and social support
Internet activities, such as using the Internet for sociability, fun seeking, and
information seeking, were found to be positively related to various dimensions of
social support. This implies that people who communicate their inner world with
friends and strangers online and rely heavily on the Internet for advice and
information to help them understand personal problems are those who often receive
guidance and assistance in times of crisis. This finding is in line with past research
which indicates that individuals who regularly offer advice and information offline
receive more help more quickly when they ask for something in the online world
(Rheingold, 1993; Wellman and Gulia, 1999). Wellman and Haythornthwaite (2002)
also found that those who have more real support receive more Internet support.
Thus the receipt of support happens synergistically online and offline.
5.3. Internet activities and QoL
However, contrary to what was originally hypothesized, using the Internet for
sociability and respondents' overall assessment of quality of life were inversely linked. There
are several possible explanations. First, many of the social relationships people
maintain online are less substantial and less sustaining than relationships that people
have in their actual lives. Second, more time spent online may take away from more
valuable activities, including social contacts offline, sleeping, or reading books.
Third, online communication is a less adequate medium for close social communication
than the telephone or face-to-face interactions it displaces. Fourth,
computer-mediated relationships are usually superficial with easily broken bonds. This finding
is in line with previous research suggesting that relationships maintained over long
distances through the Internet erode personal security and happiness (Kraut et al.,
1998). In the end, the Internet is useful for linking people to information and social
resources which are unavailable in people's closest local groups (e.g., professional
groups), but may be poor for deep feelings of affection and obligations. Thus, the
weak social ties supported by the Internet network are likely to be more limited than
friendships supported by physical proximity. As a result, this negative relationship
may lead to a decrease in the assessment of life quality.
5.4. New media use and social support
It is also interesting to note that frequencies of participation in new media
activities, such as talking on the telephone regularly, playing games on the computer,
and listening to music on CD/MD/MP3, showed positive relationships with social
support, especially in the emotional/informational and positive social interaction
dimensions, among Internet users. This means that use of new media technologies,
such as the telephone, computer, and CD/MD/MP3, may serve various needs such
as companionship, entertainment, and relaxation (Wachter and Kelly, 1998). In past
research, companionship has been linked to the direct effects model of social support
(Antonucci, 1990). In other words, people might receive advice, information,
suggestions, relaxation, and various types of social support from a wide range
of new media activities.
5.5. New media use and QoL
Interestingly, use of ICQ, e-mail, and talking on the phone did not significantly
influence QoL as expected (see Table 3 for details). In fact, use of the Internet and
computer were negatively linked to QoL. These findings may mean that heavy use of
the Internet and computer, such as playing computer games and use of the Internet
for sociability purposes, may actually degrade quality of life if these technologies
are used excessively or for unhealthy reasons. Furthermore, with the Internet,
we are living in the most plugged-in society in history. Rather than creating time for
leisure, computers and the Internet may have created ways by which we can do more
work while we are away from the office. Similarly, cellular phones, e-mails, and
Internet access devices are making it virtually impossible to escape our jobs. As a
result, technology may be diminishing our leisure time, not increasing it. In a study
of the impact of TV, Brock (2002) also found that excessive or frequent TV viewing
contributes to a number of issues, including fractured family time, poor reading and
academic performance, increased violence, inactive lifestyles, and obesity. However,
TV-free individuals fill their newly discovered free time with a variety of hobbies,
community involvement, conversation, reading, writing, cooking, and playing (Sirgy
et al., 1998). By turning off the TV and taking back their time, they gained more
communication with children and spouses, improved marriages, experienced less
conflict among siblings, and increased community involvement (Brock, 2002; Kubey
and Csikszentmihalyi, 1990).
In sum, the results from the hierarchical regression analysis show that the
extent to which the QoL of the individual can be enhanced does in part hinge on a
suitable amount of time spent on media-related activities, namely, less time on using
the Internet for intimate self-disclosure, less time in playing computer games, and
more time on listening to music on CD/MD/MP3.
5.6. Leisure activities, social support, and QoL
Finally, although past research indicates that the people-centered leisure attribute,
especially leisure satisfaction, was the best predictor of quality of life and that
place-centered attributes failed to influence life quality (Lloyd and Auld, 2001), most
bivariate relationships in this study between people-centered and place-centered
leisure activities and social support, as well as quality of life, were found to be
significant. These results are consistent with findings by McCormick and McGuire
(1996) that the primary leisure attribute that creates and maintains life quality is not
exclusively person-centered or place-centered leisure activities, but their interaction.
This means that people who engaged in social activities more frequently and who are
more satisfied with the psychological benefits they derive from leisure, whether
people-centered or place-centered, experienced a higher level of perceived quality of
life (Lloyd and Auld, 2001). Despite these results, however, the hierarchical
regression analysis revealed that participating in community and religious activities was
the only people-centered leisure activity predictor which contributed significantly to
the objective assessment of living quality when the influences of Internet activities,
new media use, social support, and demographics were controlled.
Furthermore, it is also interesting to note that socioeconomic status, indicated by
age, gender, education, and income variables, contributed only 3% of incremental
variance to the 34% total explained, while social support accounted for 20%,
media-related activities 10%, and leisure activities 1% of the variance. This suggests
that economic status is not a key determinant in predicting life quality in these data.
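As a quick arithmetic check on the figures above, the short sketch below tallies the reported increments and expresses each as a share of the explained variance. The percentages come from the text; the dictionary layout and variable names are illustrative assumptions.

```python
# Incremental variance (Delta R^2) per group of predictors, as reported in the text.
delta_r2 = {
    "media-related activities": 0.10,  # first three blocks combined
    "social support": 0.20,
    "leisure activities": 0.01,
    "demographics": 0.03,
}

total = round(sum(delta_r2.values()), 2)   # 0.34, the total variance explained
shares = {name: value / total for name, value in delta_r2.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0%} of the explained variance")
```

On these numbers, social support accounts for roughly 59% of the explained variance (0.20 of 0.34), which is the sense in which it explains the majority, while demographics account for about 9%.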
6. Conclusion
To conclude, this study has demonstrated that social relationships and social
supports are potent variables that can enhance quality of life. This suggests that
happy people may be those who receive and give love, affection, sympathy, guidance,
advice, information, and social companionship, which involves spending time with
others in leisure and recreational activities. As a result, being well connected, both
online and offline, through strong socially supportive relationships would contribute
greatly to both quality and quantity of life (House, 1986). Furthermore, use of the
Internet and some new media technologies does play an important role in enhancing life
quality, especially music listening from CD/MD/MP3 and non-pathological use of
computer games, ICQ, or chat rooms on the Internet. However, the addictive
potential of the Internet, with its harmful consequences, could silently run rampant in our
schools, our universities, and our homes. These are the new societal challenges that
must be addressed through education. Only when parents and teachers recognize
Internet addiction as a true disorder and offer ways to combat it can schools and
parents start regaining the benefits that certain applications of the Internet have
unwittingly taken away. This research supports the need for the formulation of problem
deterrence policies to prevent excessive non-productive use of the Internet if a high
and sustainable quality of life is to be maintained.
Furthermore, while many genuinely appreciate the wonder of technology and the
accommodations it continues to provide, many still find it disconcerting that
technology may have created an environment for even greater intrusion, expectations,
and stress. For example, many workers today are perhaps concerned that with their
mobile phones, e-mails, and Internet at home, their work may appear to be a
24-hour job intruding into every other aspect of their lives. In the past, it used to take a
day or two for a memo to reach the employee; now we have instant e-mail, which
demands an instant response. In fact, the long-hours culture is seriously
undermining the quality of family life.
Where technology takes us from here is an issue that is widely discussed. It is also
an issue that is hotly disputed. While technological change will always occur, there
will always be a section of society that is unable to accept the change
comfortably. With changes as widespread and dramatic as those brought by the Internet,
the associated social changes are also very important. Not everybody is included in
the advantages brought by the Internet, and those included may not be included
evenly. Nevertheless, regardless of the positives and negatives, the Internet will
clearly continue to be part of contemporary life. It is hoped that we will use it wisely
and remain vigilant about how it can truly bring about a
better quality of life.
References
Abbey, A., 1993. The effect of social support on emotional well-being. Paper presented at the First
International Symposium on Behavioral Health. Nags Head, North Carolina.
Anderson, B., Tracey, K., 2001. Digital living: the impact (or otherwise) of the Internet on everyday life.
American Behavioral Scientist 45 (3), 456–475.
Andrew, F.M., 1986. Research on the quality of life. Survey Research Center, Institute for Social
Research, University of Michigan, MI.
Andrews, F.M., Withey, S.B., 1976. Social Indicators of Well-being: America's Perception of Life Quality.
Plenum, New York.
Antonucci, T.C., 1990. Social support and social relationships. In: Binstock, R.H., George, L.K. (Eds.),
Handbook of Aging and the Social Sciences. Academic Press, San Diego, CA, pp. 205–226.
Argyle, M., 1987. The Psychology of Happiness. Methuen, London.
Auld, C., Case, A., 1997. Social exchange processes in leisure and non-leisure settings: a review and
exploratory investigation. Journal of Leisure Research 29, 183–200.
Bier, M., Gallo, M., 1997. Personal empowerment in the study of home Internet use by low-income
families. Journal of Research on Computing in Education 30 (2), 107–121.
Brennan, P.F., Ripich, S., Moore, S.M., 1991. The use of home-based computers to support persons living
with AIDS/ARC. Journal of Community Health Nursing 8, 3–14.
Brennan, P.F., Moore, S.M., Smyth, K., 1995. The effects of a special computer network on caregivers of
persons with Alzheimer's disease. Nursing Research 44, 166–172.
Brock, B., 2002. Life without TV. Parks & Recreation 37 (11), 68–72.
Cobb, S., 1976. Social support as a moderator of life stress. Psychosomatic Medicine 38, 301–314.
Cohen, S., Syme, L., 1985. Social Support and Health. Academic Press, Orlando, FL.
Csikszentmihalyi, M., 1997. Finding Flow: The Psychology of Engagement in Everyday Life. Basic Books,
New York.
Diener, E., 1984. Subjective well-being. Psychological Bulletin 95 (3), 542–575.
Diener, E., Emmons, R., Larsen, R., Griffin, S., 1985. The satisfaction with life scale. Journal of
Personality Assessment 49, 71–75.
Donald, C.A., Ware, J.E., 1984. The measurement of social support. In: Greenley, R. (Ed.), Research in
Community and Mental Health, vol. 4. JAI Press, Greenwich, CT, pp. 325–370.
Dowall, J., Bolter, C., Flett, R., Kammann, R., 1988. Psychological well-being and its relationship to
fitness and activity levels. Journal of Human Movement Studies 14, 39–45.
Foong, A., 1992. Physical exercise/sports and biopsychosocial well-being. Journal of the Royal Society of
Health 112, 227–230.
Gallienne, R.L., Moore, S.M., Brennan, P.F., 1993. Alzheimer's caregivers: psychosocial support via
computer networks. Journal of Gerontology Nursing 19, 15–22.
Henderson, C., 2001. How the Internet is changing our lives. Futurist 35 (4), 38–45.
House, J.S., 1986. Social support and the quality and quantity of life. In: Andrew, F.M. (Ed.), Research on
the Quality of Life. Survey Research Center, Institute for Social Research, University of Michigan,
Ann Arbor, MI.
Houston, T.K., Cooper, Ford, D.E., 2002. Internet support groups for depression: a 1-year prospective
cohort study. The American Journal of Psychiatry 159 (12), 2062–2068.
Hu, S., Leung, L., 2003. Effects of expectancy-value, attitudes, and use of the Internet on psychological
empowerment experienced by Chinese women at the workplace. Telematics and Informatics 20 (4),
365–382.
Kahn, R.L., Antonucci, T.C., 1980. Convoys over the life course: attachment, roles and social support.
In: Baltes, P.B., Brim, O. (Eds.), Life-Span Development and Behavior, vol. 3. Lexington Press,
Boston.
Kavanagh, A.L., Patterson, S.J., 2001. The impact of community computer networks on social capital and
community involvement. American Behavioral Scientist 45 (3), 496–510.
Kernan, J., Unger, L., 1987. Leisure, quality of life, and marketing. In: Samli, A. (Ed.), Marketing and the
Quality of Life Interface. Quorum Books, New York, pp. 236–252.
Kraut, R., Patterson, M., Lundmark, V., Kiesler, S., Mukopadhyay, T., Scherlis, W., 1998. Internet
paradox: a social technology that reduces social involvement and psychological well-being? American
Psychologist 53, 1017–1031.
Kubey, R., Csikszentmihalyi, M., 1990. Television and the Quality of Life: How Viewing Shapes Everyday
Experience. LEA, Hillsdale, NJ.
Leung, L., 2004. Societal, organizational and individual factors in the adoption of telework. In: Lee, P.,
Leung, L., So, C.Y.K. (Eds.), Impact and Issues in News Media: Toward Intelligent Societies.
Hampton Press, Cresskill, NJ.
Lloyd, K.M., Auld, C.J., 2001. The role of leisure in determining quality of life: issues of content and
measurement. Social Indicators Research 57, 43–71.
McCormick, B., McGuire, F., 1996. Leisure in community life of older rural residents. Leisure Sciences 18,
77–93.
McDowell, I., Newell, C., 1996. Measuring Health: A Guide to Rating Scales and Questionnaires, second
ed. Oxford University Press, New York.
McPheat, D., 1996. Technology and life-quality. Social Indicators Research 38 (1), 29–52.
Mercer, C., 1994. Assessing liveability: from statistical indicators to policy benchmarks. In: Mercer, C.
(Ed.), Urban and Regional Quality of Life Indicators. Institute for Cultural Policy Studies, Griffith
University, Brisbane, pp. 3–12.
Moller, V., 1992. Spare time use and perceived well-being among black South African youth. Social
Indicators Research 26, 309–351.
Prenshaw, P.J., 1994. Good life images and brand name associations: evidence from Asia, America, and
Europe. In: Allen, C., John, D.R. (Eds.), Advances in Consumer Research, vol. 21. Association for
Consumer Research, Provo, UT.
Putnam, R.D., 1995. Bowling Alone: The Collapse and Revival of American Community. Simon and
Schuster, NY.
Rheingold, H., 1993. The Virtual Community: Homesteading on the Electronic Frontier. Addison-Wesley,
Reading, MA.
Ross, C.E., Mirowsky, J., 2002. Family relationships, social support and subjective life expectancy.
Journal of Health and Social Behavior 43 (4), 469–489.
Sherbourne, C.D., Stewart, A., 1991. The MOS social support survey. Social Science & Medicine 32,
705–714.
Shin, C.C., Johnson, D.M., 1978. Avowed happiness as an overall assessment of quality of life. Social
Indicators Research 5, 475–492.
Siegenthaler, K., Vaughan, J., 1998. Older women in retirement communities: perceptions of recreation
and leisure. Leisure Sciences 20, 53–66.
Sirgy, M.J., Lee, D.J., Kosenko, R., Meadow, H.L., 1998. Does television viewership play a role in the
perception of quality of life? Journal of Advertising 27 (1), 125–142.
Wachter, C., Kelly, J., 1998. Exploring VCR use as a leisure activity. Leisure Sciences 20, 213–227.
Wankel, L., Berger, B., 1990. The psychological and social benefits of sport and physical activity. Journal
of Leisure Research 22, 167–182.
Wei, R., Leung, L., 1998. Owning and using new media technology as predictors of quality of life.
Telematics and Informatics 15 (4), 237–251.
Wellman, B., Gulia, M., 1999. Net surfers don't ride alone. In: Wellman, B. (Ed.), Networks in the Global
Village. Westview, Boulder, CO, pp. 331–366.
Wellman, B., Haythornthwaite, C., 2002. The Internet in Everyday Life. Blackwell, Malden, MA.
White, H., McConnell, E., Clipp, E., Bynum, L., 1999. Surfing the Net in later life: a review of the
literature and pilot study of computer use and quality of life. Journal of Applied Gerontology 18 (3),
358–378.
Wright, K., 2000. Computer-mediated social support, older adults, and coping. Journal of
Communication 50 (3), 100–118.
Abstract
Competing claims have been presented in the literature regarding the impact of Internet use
on social support. Some theorists have suggested that Internet use increases social interaction
and support (Silverman, 1999, American Psychologist 54, 780–781), while others have argued
that it leads to decreased interaction and support (Kiesler & Kraut, 1999, American
Psychologist 54, 783–784). This study was designed to address this issue by examining the
relationships among Internet use, personality, and perceived social support. Two hundred and six
participants completed questionnaires that assessed Internet use, personality (agreeableness,
conscientiousness, extraversion, neuroticism, openness), and perceived social support. Using
principal components analysis, individual computer activities were combined into three primary factors: Technical, Information Exchange, and Leisure. Correlation and regression
analyses revealed only a marginal relationship between computer use and social support.
Similarly, only modest associations were found between personality and computer use. However, personality did moderate the relationship between computer use and social support.
That is, on two occasions, high computer use coupled with high personality was associated
with decreased perceived social support and on a third occasion this combination resulted in
increased perceived social support. These results help to address some of the inconsistencies
that have been reported in the literature. © 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Internet; Computer use; Social support; Personality
It can be argued that the Internet has opened up a new frontier for human interaction. Like any new frontier there are many unknown factors and challenges associated with its exploration. The Internet is no exception to this rule, especially when
one is attempting to understand the impact of online activity on social interaction.
* Corresponding author. Fax: +1-843-953-7151.
E-mail address: swickertr@cofc.edu (R.J. Swickert).
0747-5632/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved.
PII: S0747-5632(01)00054-1
In particular, the role that the Internet might play in influencing an individual's
social support system is, as of yet, unclear. Some researchers have suggested that
online activity might serve to facilitate an individual's feeling of social support
(Bromberg, 1996; Mickelson, 1997; Parks & Floyd, 1996; Silverman, 1999;
Winzelberg, 1997). Others have indicated that Internet use can actually degrade social
relationships and reduce an individual's feeling of support (Jones, 1997; Kiesler &
Kraut, 1999; Kraut, Patterson, Lundmark, Kiesler, Mukopadhyay, & Scherlis,
1998b). This study was designed to test these competing claims by investigating the
relationship between Internet use and perceived social support.
Researchers who argue that Internet use facilitates feelings of social connectedness
and social support cite a variety of factors that appear to contribute to this effect.
One of the most important of these factors concerns the opportunity that the
Internet affords individuals to meet and interact with people who have similar interests
(McKenna & Bargh, 2000). Relationships formed online via chat rooms or
discussion groups might allow individuals with mutual interests or experiences to obtain
information and encouragement from others who are like-minded. Similarity has
long been known to contribute to friendship formation (Martin & Anderson, 1995;
Newcomb, 1961) and the Internet seems to maximize this effect. Indeed, researchers
have determined that it is common for individuals to form friendships with others
online (Katz & Aspden, 1997; The UCLA Internet Report, 2000) and to consider
those relationships to be as close as face-to-face non-Internet relationships
(McKenna, 1998; Parks & Floyd, 1996). Furthermore, research has demonstrated
that online relationships can be an important source of social support. For instance,
Winzelberg (1997), using an archival analysis approach, analyzed the postings of an
eating disorder discussion group over a 3-month period. Comments posted were
categorized into different types of social interaction. While it was found that the
most common message content involved self-disclosure (31%), requests for
information (23%) and the direct provision of emotional support (16%) were also
recorded. These results are consistent with the conclusion that individuals do receive
(and provide) social support through online interaction, and similar research has
supported this finding (King & Moreggi, 1998; Mickelson, 1997). Unfortunately,
though, this work is based primarily on discussion group participants and therefore
may not generalize to other types of online contact (e.g., chat rooms, multiuser
dungeons). In addition, other research has suggested that online interaction
may actually reduce social connections and feelings of social support (Kraut et
al., 1998b).
The Home Net Project (Kraut, Kiesler, Mukopadhyay, Scherlis, & Patterson,
1998a) is the seminal study to date that provides evidence for the negative social
impact of the Internet. In this study, a sample of 169 people in the Pittsburgh,
Pennsylvania area were followed during their first 2 years online. Kraut et al.
(1998a) reported that as participants used the Internet more, their social
connectedness, as measured by contact with family and friends, was reduced. Participants'
perceptions of their social support were also measured over the 2-year period.
Although a negative relationship was found between Internet use and perceived
support, this relationship failed to meet the traditional level of statistical
significance. One reason why this effect may have failed to reach significance is that the
measure used to assess social support was an abbreviated version of a larger scale
(the Interpersonal Support Evaluation List) and therefore the range of the scale may
have been restricted, making it difficult to detect a significant effect. Also, because
only part of the scale was used, the measure may not have been psychometrically
reliable or valid. Because of these methodological problems the relationship between
Internet use and perception of social support remains unclear.
Given the conflicting theoretical views, inconsistent research findings, and paucity
of strong empirical evidence, further study is required to clarify the relationship
between Internet use and social support. We were particularly interested in
determining the relationship between online activity and a type of support called
perceived social support. Measures of perceived support assess whether individuals
perceive that they have others they can turn to for support (Cohen & Hoberman,
1983). We chose to focus on this facet of social support because recent research
suggests that perceived support is more psychologically salient and meaningful than
other types of support (e.g., objective or structural support; Hittner & Swickert,
2002). In addition, perceived support has been shown to be more strongly associated
with effective coping efforts than are other types of social support processes (Lakey
& Drew, 1997; Mankowski & Wyer, 1997). Given the importance of this type of
social support, it appears reasonable to assume that if Internet use does indeed
influence social support, then, by measuring perceived levels of support, one should
be able to assess this putative effect. The question remains, however, as to the nature
of this effect. That is, does Internet use increase the amount of support a person
perceives because he or she now has more people in their support network? Or,
conversely, would it reduce the quality of the Internet user's face-to-face social
contacts and lead to a degraded sense of support? Addressing this issue was one goal
of this study.
In addition to investigating the relationship between online activity and
perceived social support, we were also interested in exploring the relationship between
Internet use and personality. One potentially fruitful place to start in addressing
the relationship between personality and online activity is with the Five Factor
Model (FFM) of personality. Extraversion (E) and neuroticism (N) are two of the
proposed Big Five personality traits; the other Big Five traits are
agreeableness (A), openness (O), and conscientiousness (C) (Costa & McCrae, 1992a). The
Big Five, while not universally accepted (see Block, 1995, for a dissenting opinion),
are generally viewed as the essential traits of personality (McCrae & Costa,
1999), and they have been demonstrated to account for a wide variety of behaviors
from job performance to stress and coping (Barrick & Mount, 1991; O'Brien &
DeLongis, 1996; Watson & Hubbard, 1996). Likewise, there is reason to believe
that some of these personality traits may be predictive of Internet use. For
instance, it could be argued that individuals who are high in openness, with their
curious manner and their tendency toward adventure seeking (McCrae, 1996),
might be very attracted to online activity as an opportunity to explore and seek out
the new and novel. Recent research seems to bear out this prediction (Tuten &
Bosnjak, 2001). Similarly, individuals high in agreeableness are often described as
very nice and easy to get along with (Costa & McCrae, 1992b). Given the
sometimes hostile nature of Internet interactions (Joinson, 1998), this trait might
make them very attractive to others when they go online and make it easier for
them to form friendships online. Likewise, individuals who are high in extraversion
tend to be gregarious and are attracted to stimulating environments (Eysenck,
1967). This tendency may influence the extravert to go online to seek out the new
and exciting. In fact, researchers have documented, at least for males, a positive
association between extraversion and surfing sex web sites (Hamburger &
Ben-Artzi, 2000). However, in the same study, a negative correlation was found
between extraversion and traditional social online activities (e.g., visiting chat rooms,
participating in discussion groups). Finally, it has been documented that individuals
who are high in neuroticism report lower levels of Internet usage (Tuten &
Bosnjak, 2001), and, in particular, of information-based activities (e.g., utilizing
search engines). This tendency may be due to the neurotic's higher level of
anxiety and lowered self-efficacy in this particular domain. While these findings
are suggestive, much of this work is based on single studies employing relatively
small numbers of subjects drawn from psychology courses. We were interested in
replicating these findings in a larger and broader-based sample. Moreover, we
chose to have participants report the specific amount of time they spent engaged
in online activities, rather than asking subjects to approximate their time
online with Likert-scale descriptors (1=not at all; 5=a lot). Because of this
specificity, we believe our approach will yield a more precise assessment of Internet
use.
In addition to addressing the association between Internet use and personality, a
third goal of this study was to determine the potential moderating role that
personality might play between Internet use and social support. Personality factors have
been demonstrated to affect both Internet use (Hamburger & Ben-Artzi, 2000;
Kraut et al., 1998b) and perceived social support (Halamandaris & Power, 1999;
Procidano, 1992; Turner, 1999). Because of this, we believe that personality and
Internet use might interact to influence perceived support. To illustrate, individuals
high and low in extraversion (E) might be differentially affected by the same level of
Internet use. Whereas an individual high in E might report no change in perceived
support based on online social experiences, an individual low in E might report
enhanced support. This difference could be due to the fact that low E individuals,
compared to high E individuals, have more to gain from these interactions because
their social support network is typically smaller. Determining the precise nature of
the interaction between personality (A, C, E, N, O) and Internet use was a final goal
of this study.
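The moderation idea in the preceding paragraph can be sketched as a regression with a product term. This is a minimal illustration under stated assumptions: the data are synthetic and constructed so that Internet use raises support only when extraversion is low, and the helper names `moderated_fit` and `simple_slope` are hypothetical rather than the procedure used in this study.

```python
import numpy as np

def moderated_fit(use, trait, support):
    """OLS with mean-centered predictors and their product term.

    Returns coefficients in the order: intercept, use, trait, use x trait.
    """
    use_c = use - use.mean()
    trait_c = trait - trait.mean()
    design = np.column_stack([np.ones(len(use)), use_c, trait_c, use_c * trait_c])
    coef, *_ = np.linalg.lstsq(design, support, rcond=None)
    return coef

def simple_slope(coef, trait_offset):
    """Slope of support on use at a trait value `trait_offset` away from its mean."""
    return coef[1] + coef[3] * trait_offset

# Synthetic data (hypothetical, NOT the study's data).
rng = np.random.default_rng(1)
n = 206
use = rng.normal(size=n)
extraversion = rng.normal(size=n)
support = 30 + 1.0 * use - 1.0 * use * extraversion + rng.normal(scale=0.5, size=n)

coef = moderated_fit(use, extraversion, support)
low_e = simple_slope(coef, -1.0)    # one unit below the mean of E
high_e = simple_slope(coef, +1.0)   # one unit above the mean of E
print(round(low_e, 2), round(high_e, 2))
```

A non-zero coefficient on the product term is what "moderation" means here: the slope relating Internet use to perceived support differs across levels of the personality trait, and the simple slopes make that difference concrete.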
To summarize, there were three major aims of this study. First, this study
examined the relationship between Internet use and perceived social support to
determine what effect, if any, Internet use may have on social support. A second aim of
the study was to determine if personality dimensions might influence frequency and
type of Internet use. Finally, a third aim of the study was to explore how
personality might moderate the relationship between Internet use and perceived social
support.
1. Method
1.1. Participants
Two hundred and six participants were recruited from computer science, political
science, psychology, and sociology classes at a medium-sized public liberal arts
college in the southeastern United States. Sixty-one percent of the participants were
female (39% male) and their ages ranged from 18 to 45 (M=21.34). Finally, 18% of
the participants were African-American, 1% Asian, 78% Caucasian, and 3%
classified themselves as other.
1.2. Materials
1.2.1. Social support
The Interpersonal Support Evaluation List (ISEL; Cohen & Hoberman, 1983) was
used to assess the perceived availability of social support. This 48-item questionnaire
assesses four types of social support and yields an overall support measure. The
Appraisal subscale assesses the perceived availability of someone to talk to about
one's problems; the Belonging subscale, the perceived availability of people to do
things with; the Self-esteem subscale, the perceived availability of a positive comparison when comparing oneself to others; and the Tangible subscale, the perceived
availability of someone to provide material aid. Individuals are asked to indicate
whether statements concerning the availability of social support are probably true or
probably false. Items associated with each subscale are summed together to yield
four subscale totals and all of the items are added together to arrive at a total score.
The subscale scores can range from 0 to 12 and the total score can range from 0 to
48. The higher the total score, the higher the level of perceived support. The internal
reliability of the overall scale is good (alpha=0.77). Internal reliabilities of the
appraisal, belonging, self-esteem, and tangible subscales are adequate as well
(alpha=0.77, 0.75, 0.68, and 0.71, respectively). In the present study, alpha coefficients for the subscales could not be calculated because only the sum scale scores,
rather than the individual items, were recorded. However, the internal reliability of
the overall scale could be estimated by treating the four subscales as items. The
resulting alpha coefficient for the total ISEL was 0.77. Descriptive statistics, including mean, standard deviation, and range for the ISEL are reported in Table 1.
Information regarding the construct and convergent validity of the ISEL can be
found in Cohen and Hoberman (1983).
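The estimate described above, in which the four subscale totals are treated as items, can be sketched as follows. This is a minimal illustration rather than the authors' actual computation: the function name `cronbach_alpha` and the subscale scores are hypothetical, and the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of totals) is assumed.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    Each element of `items` holds one item's scores for the same
    respondents, in the same order. Here an "item" is a subscale total.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(column) for column in items)
    total_variance = pvariance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical subscale totals (0-12 each) for six respondents.
appraisal = [12, 10, 8, 11, 6, 9]
belonging = [10, 9, 7, 10, 5, 8]
self_esteem = [9, 9, 6, 10, 6, 7]
tangible = [12, 11, 9, 11, 8, 10]

alpha_isel = cronbach_alpha([appraisal, belonging, self_esteem, tangible])
```

Because the ratio of variances is what matters, using population or sample variance gives the same alpha as long as the choice is consistent.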
1.2.2. Internet use
The Computer Use Survey (CUS) was developed to assess Internet use and social
contact through online interactions. The survey requires the participant to record
the amount of time (hours/minutes) in an average week that she or he engages in a
variety of online activities including: search and do research, visit bulletin boards,
visit chat rooms, create/update websites, play games, use email, use instant messaging, visit multiuser dungeons, and access information as a form of entertainment
(e.g., read newspaper, listen to music). Descriptive statistics for these variables are
presented in Table 1.

Table 1
Descriptive statistics for the ISEL, the CUS, and the NEO-FFI

Variable                  Mean      Standard Deviation   Range
Social support
  Appraisal               10.16     2.46                 0–12
  Belonging               8.26      2.59                 0–12
  Self-esteem             8.55      2.21                 0–12
  Tangible                10.67     1.90                 0–12
  ISEL total              37.62     7.08                 10–48
Computer use (a)
  Search/research         131.17    204.83               0–2100
  Bulletin board          12.51     51.54                0–600
  Chat room               18.03     56.95                0–390
  Create webpage          8.52      35.34                0–300
  Play games              60.54     204.07               0–2100
  Email                   160.19    244.89               0–2400
  Instant messaging       77.65     176.70               0–1500
  Multiuser dungeon       2.95      23.10                0–300
  Access information      150.16    230.49               0–1200
Personality
  Agreeableness           31.30     6.57                 6–44
  Conscientious           30.74     7.04                 11–46
  Extraversion            30.87     6.40                 3–43
  Neuroticism             21.77     8.77                 2–42
  Openness                28.81     5.72                 15–44
1.2.3. Personality
The NEO Five-Factor Inventory (NEO-FFI; Costa & McCrae, 1992b) was used
to assess agreeableness (A), conscientiousness (C), extraversion (E), neuroticism (N),
and openness (O). It consists of 60 items that participants respond to using a five-point Likert scale format (strongly disagree to strongly agree). Each factor is made
up of 12 items that collectively yield a score of 0–48. In each case, higher numbers
are associated with higher levels of the personality factor. Internal consistency of
this scale was calculated using coefficient alpha. Coefficients for A, C, E, N, and O
were 0.68, 0.81, 0.77, 0.86, and 0.73, respectively. Construct validity for this test is
reported in the NEO PI-R Manual (Costa & McCrae, 1992b). Descriptive statistics
for the NEO-FFI can be found in Table 1.
1.3. Procedure
Participants were tested while in class, at various sites on campus, typically in a
group of 20–25, during 45-min testing sessions. All testing occurred between the
hours of 9:00 a.m. and 3:00 p.m. Participants were told that the purpose of the study
was to examine factors associated with Internet use. The assessment packets were
then administered to the participants in the following order: the demographic form,
the ISEL, the CUS, and the NEO-FFI. After completing the study, participants
were thanked for their participation and debriefed.
2. Results
Prior to addressing the major aims of the study, a variety of preliminary analyses
were conducted to reduce the data, transform skewed variables, and screen for
multivariate outliers. Regarding the data reduction procedure, the nine Internet use
variables were subjected to a principal components factor analysis with varimax
rotation and Kaiser normalization. Inspection of the scree plot indicated a three-factor solution and the types of Internet use loading on each factor are as follows
(Cronbach alpha values in parentheses): (1) bulletin board use, chat room visitation,
creating web pages, and multiuser dungeon visitation (0.69), (2) search/research,
email use, and accessing information (0.60), and (3) utilizing instant messenger and
playing games (0.75). In light of the low Cronbach alpha coefficient for factor No. 2,
we examined the intercorrelations among the three Internet use variables. Although
the correlation between email and accessing information was moderately large in
magnitude (r=0.54), the correlations between these two variables and search/
research were considerably smaller (rs of 0.23 and 0.19 for email and accessing
information, respectively). Given these results, we excluded search/research from the
second factor and recalculated the Cronbach alpha. The new alpha coefficient was
0.70 and the three principal component factors accounted for 70% of the variance in
the Internet use intercorrelation matrix. Three Internet use factor variables were
then created by summing the individual Internet use variables within each factor.
Upon visual inspection of the component variables, we decided to label the first
factor Technical. This factor was made up of bulletin board use, chat room visitation, creating web pages, and multiuser dungeon visitation. While these activities
appear to be quite diverse, we believe that there is a common underlying theme for
this factor. That is, one must be fairly technologically savvy to be able to engage in
all of these online activities, hence, the label of Technical seemed appropriate. The
second factor is comprised of email use and accessing information. Both of these
Internet activities involve either generating (email) or receiving (accessing) information. Therefore, we chose to label this factor Information Exchange. Finally, the
third factor, Leisure, is made up of utilizing instant messaging and playing games.
We felt that both of these activities were consistent with relaxing and having fun
through playing games and interacting with others online.
While these three factors collectively seemed to effectively capture online computer use, not all of our participants engaged in these online activities to an equal
degree. In fact, upon inspection of the factors it was found that many participants
reported that they did not engage in one or more of these types of online activities.
For instance, out of 206 participants only 70 individuals reported computer use
consistent with the Technical factor, 183 participants reported computer use
consistent with the Information Exchange factor, and 122 participants reported
computer use consistent with the Leisure factor. In an attempt to effectively deal
with this issue we decided to exclude non-users from the analyses. We reasoned that
it would be irrelevant to ask if computer use was influencing social support for these
individuals given the fact that they were not participating in these types of online
activities. Therefore, all subsequent analyses were conducted solely on participants
who reported use consistent with each factor (Technical: 70 participants, Information Exchange: 183 participants, Leisure: 122 participants).
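The construction of the three factor variables and the per-factor exclusion of non-users can be sketched as below. The factor membership follows the text (with search/research dropped from the second factor); the variable names, the helper functions, and the three example participants are hypothetical, and "non-user" is assumed to mean a summed score of zero minutes.

```python
# Factor membership as described in the text; the keys are illustrative names.
FACTORS = {
    "Technical": ["bulletin_board", "chat_room", "create_webpage", "multiuser_dungeon"],
    "Information Exchange": ["email", "access_information"],
    "Leisure": ["instant_messaging", "play_games"],
}


def factor_scores(participant):
    """Sum the weekly minutes of the activities that make up each factor."""
    return {
        name: sum(participant.get(variable, 0) for variable in variables)
        for name, variables in FACTORS.items()
    }


def users_of(factor, participants):
    """Keep only participants who reported any use on the given factor."""
    return [p for p in participants if factor_scores(p)[factor] > 0]


# Hypothetical weekly minutes for three participants.
sample = [
    {"email": 120, "access_information": 60, "instant_messaging": 90},
    {"email": 30, "chat_room": 45, "play_games": 200},
    {"email": 15},
]

technical_users = users_of("Technical", sample)
information_users = users_of("Information Exchange", sample)
leisure_users = users_of("Leisure", sample)
```

Each factor therefore gets its own analysis sample, which is why the participant counts differ across the three factors.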
After completing the principal components analyses and excluding individuals
who reported no computer use by type of online activity, the normality of the distributions for each of the three computer use factors was examined. Due to the significant positive skewness of all three factors, each distribution was logarithmically
transformed to increase normality. We also screened for multivariate outliers before
conducting multiple regression analyses. In particular, all participants who evidenced statistically significant Mahalanobis D² values were excluded from the
regression analyses. Finally, because so little is known about the relationships
among specific types of Internet use, personality and perceived social support, we
felt it was important to not overlook potential associations among these variables.
Therefore, in order to minimize the likelihood of committing Type II errors, we set
our critical P-value for all analyses at a more liberal value of P < 0.10. We report all
of our findings on the basis of this value; however, we label P-values between 0.06
and 0.10 as marginally significant.
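The effect of a logarithmic transformation on positive skew can be illustrated with a short sketch. The text does not state which logarithm was used or whether a constant was added, so the base-10 log of (minutes + 1) below is an assumption, as are the function names and the example data.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Assumed transform: log10(x + 1), which is defined even for x = 0."""
    return [math.log10(v + 1) for v in values]


# Hypothetical weekly minutes with a long right tail.
minutes = [5, 10, 15, 20, 30, 45, 60, 120, 600, 2100]

raw_skew = skewness(minutes)
log_skew = skewness(log_transform(minutes))
```

For data like these, a few very heavy users dominate the raw distribution, and the transformed scores are considerably less skewed.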
The first aim of the study was to determine if Internet use is related to perceived
social support. Correlational procedures were used to investigate this issue. No significant correlations were found between two of the three types of Internet use and
the ISEL. Specifically, nonsignificant associations were found between Technical
and the ISEL (r=-0.11, P=0.18) and Information Exchange and the ISEL
(r=0.03, P=0.32). However, a marginally significant correlation was found between
Leisure and the ISEL (r=0.13, P=0.08). To further examine this issue a simultaneous multiple regression analysis (SMR) was conducted by entering all online
activities in one block to predict ISEL. No significant effects were found in this
analysis.
To explore the second aim of the study, the relationship between personality, as
measured by the NEO-FFI, and Internet use was examined. No significant correlations were found between personality and Technical. However, personality was significantly correlated with Information Exchange and Leisure. Regarding
Information Exchange, both neuroticism (r=-0.11, P=0.07) and agreeableness
(r=-0.10, P=0.09) evidenced marginally significant correlations. An SMR analysis
was conducted by entering all five personality traits in one block to predict Information Exchange. The results from this analysis were somewhat consistent with the
correlational findings in that there was a marginally significant effect for neuroticism
[t(175)=-1.69, P=0.09, β=-0.140]. However, no significant effect was found for
agreeableness. Regarding the computer use factor of Leisure, significant correlations
were found with neuroticism (r=-0.16, P=0.04) and conscientiousness (r=0.15,
P=0.05), and a marginally significant association was found with extraversion
(r=0.13, P=0.08). However, results of an SMR analysis failed to reveal any significant personality predictors of Leisure.
To explore the third aim of the study, hierarchical multiple regression analyses
were conducted to examine the interactive effects of Internet use and personality on
perceived social support. The first set of analyses explored the interactive effects of
Technical and personality in predicting the ISEL. The main effects of personality (A,
C, E, N, and O) and Technical were entered in the first block of the equation and the
interactions between personality and Technical (A×Technical, C×Technical,
E×Technical, N×Technical, O×Technical) were entered in the second block of the
equation. Results of these analyses demonstrated a significant main effect for extraversion [t(60)=4.74, P=0.001, β=0.587] and a marginal main effect for openness
[t(60)=-1.91, P=0.06, β=-0.195]. In addition, a marginally significant interaction
effect was found between neuroticism and Technical [t(55)=-1.75, P=0.09,
β=-0.957]. In order to explore the nature of the interaction effect, we plotted the
four mean ISEL scores that are obtained by factorially crossing the neuroticism and
Technical factor (i.e., Low N, Low T; Low N, High T; High N, Low T; High N,
High T). These group means indicated that individuals who are high in neuroticism
and high in Technical have lower levels of perceived social support than any other
group (Fig. 1).
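The two-block entry described above (main effects first, product terms second) can be sketched with a small ordinary least squares routine. This is only an illustration under stated assumptions: it uses one personality trait instead of five, mean-centers the predictors before forming the product (a common choice that the text does not mention), and runs on made-up data constructed to contain an interaction. All names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS fit with intercept; `predictors` is a list of columns."""
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in p] for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical scores: neuroticism and log-transformed Technical use.
n_c = center([10, 14, 18, 22, 26, 30, 34, 38])
t_c = center([1.0, 2.5, 1.5, 3.0, 1.2, 2.8, 1.8, 3.2])
n_by_t = [n * t for n, t in zip(n_c, t_c)]
# Made-up outcome built to include a negative N x Technical interaction.
support = [38 - 0.2 * n + 0.5 * t - 0.4 * nt for n, t, nt in zip(n_c, t_c, n_by_t)]

r2_block1 = r_squared([n_c, t_c], support)          # main effects only
r2_block2 = r_squared([n_c, t_c, n_by_t], support)  # plus interaction term
delta_r2 = r2_block2 - r2_block1
```

The increase in R² from block 1 to block 2 is the share of variance attributable to the interaction beyond the main effects, which is the quantity a moderation test evaluates.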
Fig. 1. Mean perceived social support scores by Technical computer use and Neuroticism.
The second analysis examined the interactive effects of Information Exchange and
personality in predicting the ISEL. The same procedure as mentioned above was
utilized in entering the variables into the equation to predict the ISEL. Significant
main effects were found for neuroticism [t(167)=-2.49, P=0.01, β=-0.181],
extraversion [t(167)=5.58, P=0.001, β=0.425], and openness [t(167)=-2.03,
P=0.04, β=-0.137]. A marginally significant interaction effect was found for neuroticism and Information Exchange [t(167)=-1.82, P=0.07, β=-0.707]. Visual
inspection of the mean ISEL scores for the four groups indicated that individuals
high in neuroticism and high in Information Exchange reported lower levels of perceived social support than any other group (Fig. 2).
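The four group means used to interpret an interaction can be sketched as a 2 × 2 table of cell means. The text does not say how the Low and High groups were formed, so the median split below is an assumption, and the function name and scores are hypothetical.

```python
from statistics import mean, median


def cell_means(trait, usage, outcome):
    """Mean outcome in each Low/High trait by Low/High usage cell.

    Groups are formed by a median split on each variable (an assumption);
    scores above the median count as High.
    """
    trait_cut, usage_cut = median(trait), median(usage)
    cells = {}
    for t, u, y in zip(trait, usage, outcome):
        key = (
            "High" if t > trait_cut else "Low",
            "High" if u > usage_cut else "Low",
        )
        cells.setdefault(key, []).append(y)
    return {key: mean(values) for key, values in cells.items()}


# Hypothetical neuroticism, usage, and ISEL scores for eight participants.
neuroticism = [10, 12, 30, 32, 11, 13, 31, 33]
usage = [1, 1, 1, 1, 5, 5, 5, 5]
isel = [40, 42, 38, 36, 41, 43, 30, 28]

means = cell_means(neuroticism, usage, isel)
lowest_cell = min(means, key=means.get)
```

In this made-up data the High neuroticism, High usage cell has the lowest mean, mirroring the pattern the text reports for the neuroticism interactions.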
The third analysis examined the interactive effects of Leisure and personality in
predicting the ISEL. The same procedure as noted above was utilized in entering the
variables into the equation to predict the ISEL. Once again, significant main effects
were found for neuroticism [t(110)=-2.71, P=0.01, β=-0.246] and extraversion
[t(110)=4.75, P=0.001, β=0.438]. A significant interaction effect was also found for
agreeableness and Leisure [t(105)=2.38, P=0.02, β=1.691]. An examination of the
four group means indicated that individuals high in agreeableness and high in Leisure
reported higher levels of perceived social support than any other group (Fig. 3).
3. Discussion
There were three major aims of this project. First, this study attempted to determine the association between Internet use and perceived availability of social support. Results of this study indicated no strong relationships between these variables.
However, a marginally significant positive correlation was found between Leisure
and the ISEL. The factor of Leisure involves social Internet activities like instant
messaging and playing games with others online. The positive relationship between
these variables indicates that individuals who reported higher Leisure use perceived
greater social support when compared with individuals who reported less Leisure-based online activity. Although this positive correlation is suggestive, it is important
to note that this finding is only marginally significant and SMR analyses did not
replicate this association.
Fig. 2. Mean perceived social support scores by Information Exchange computer use and Neuroticism.
Fig. 3. Mean perceived social support scores by Leisure computer use and Agreeableness.
The second aim of this project was to determine the relationship between Internet
use and five basic personality factors. There were no significant relationships found
between Technical Internet use and any of the personality traits. However, personality was marginally related to Information Exchange (email and accessing information) and Leisure (instant messaging and playing games). The personality
dimension of neuroticism seemed to be most consistently related to these types of
online activities. While other personality traits were correlated with Information
Exchange (agreeableness) and Leisure (conscientiousness, extraversion), these correlations were not supported by regression analyses and therefore do not merit further discussion. Regarding the effects of neuroticism, both correlation and
regression analyses revealed marginally significant negative associations between
neuroticism and Information Exchange and neuroticism and Leisure. These findings
indicate that individuals who are high in neuroticism are less likely to utilize these
types of Internet activities. While these findings are consistent with some research
presented in the literature (Tuten & Bosnjak, 2001), these results contradict other
published results. Specifically, although Hamburger and Ben-Artzi (2000) reported a
positive relationship between neuroticism and social-leisure activities, our research
was not supportive of this finding. We found a negative relationship between
neuroticism and Leisure activity. One explanation for the inconsistency between the
present results and past research concerns the degree of measurement specificity that
has been utilized when assessing Internet activity. In particular, whereas previous
studies have generally examined only global measures of activity on the Internet, the
present study measured online activities much more precisely (e.g., reported minutes
online). In addition, whereas most previous work has examined individual Internet
use variables as the unit of analysis, this study utilized principal component factors
in all inferential statistical analyses. This approach is generally regarded by psychometricians as being superior to individual variable analyses because principal components are typically more reliable than individual variables (Tabachnick & Fidell,
1989). Regardless of the merits of this study, the association between neuroticism
and leisure Internet use requires further attention in order to clarify the inconsistencies in the literature.
The third aim of this study was to examine whether personality serves to moderate
the association between Internet use and perceived social support. Both significant
and marginally significant interaction effects were found between personality and
Internet use. Regarding the marginal effects, neuroticism was found to interact with
Technical Internet use (bulletin board, chat room, web page, multiuser dungeon) in
that individuals high in neuroticism and high in Technical use reported lower perceived support than any other group. This same trend was found between neuroticism and Information Exchange. Individuals high in neuroticism and high in
Information Exchange reported lower perceived support compared with the other
groups. These effects imply that highly neurotic individuals who use these types of
Internet activities do seem to be at risk for lowered perception of social support.
However, the causal direction of this effect may not be so clear, and in fact, may
actually operate in a reverse manner. That is, highly neurotic individuals who have
very low levels of perceived support might seek out these types of Internet activities
in an effort to compensate for their lowered sense of support. While this issue is
beyond the purpose and scope of this study, future work in this area should try to
elucidate the nature and causal direction of the associations between neuroticism
and these types of Internet activities.
Finally, a significant interaction effect was found between agreeableness and
Leisure. Participants who reported high levels of agreeableness and high levels of
Leisure Internet use perceived themselves as having higher levels of social support,
compared to the other groups. While the current study does not allow for any definitive explanation of this effect, perhaps it is the case that highly agreeable individuals experience more positive interactions when engaging in instant messaging and
online games, which leads to higher quality social interactions and higher levels of
perceived support. Obviously the merits of this "positive social interaction"
hypothesis cannot be discerned from the current study and hence necessitate further
research. It is also unclear as to why agreeableness was the only personality factor to
interact with Leisure in this manner. This issue would likewise benefit from further
research.
In summary, while this study seems to invite as many questions as it addresses,
one should not be surprised by this, given that we are exploring a new area of
research, a new frontier. However, what can be said about these findings is that
although Internet use alone may not strongly influence perceived social support, it
does seem to interact with personality in an important way to influence perceptions
of support. Furthermore, these findings help to address some of the inconsistencies
that have been reported in the literature. Specifically, research in this area has
alternately indicated that Internet use either facilitates or degrades social relationships and social support (Kraut et al., 1998a, 1998b; Parks & Floyd, 1996). What
this study demonstrates is that both of these effects can occur; it is not simply a
question of one or the other. To illustrate, high levels of neuroticism, when combined with high levels of specific types of Internet use, are associated with reduced
feelings of social support. In contrast, high levels of agreeableness, coupled with
high levels of Leisure Internet use, are associated with enhanced perceived support.
While this study may help to address some of the inconsistencies in the Internet–social support literature, we acknowledge that there are some limitations of the
present study that should be considered. First, while the participants were selected to
represent a wide variety of majors, the sample used in this study is still based on
college students and therefore is somewhat limited in its generalizability. Also, while
this study assessed Internet use more precisely than past studies, the measurement of
this variable was based on a self-report approach, rather than on objective behavioral criteria. Therefore, the assessment of Internet use may be somewhat biased due
to memory errors. In addressing these limitations, future research should attempt to
survey a more representative sample that includes both college students as well as
traditional adults. Furthermore, rather than relying on self-reported Internet use, a
behavioral measure (e.g., a computer program that records time spent online) could
be employed which would perhaps yield a more reliable measurement of Internet
activity.
To conclude, while this study has several limitations that need to be addressed in
future work, it nevertheless makes an important contribution to our understanding
of the effects of Internet use. Specifically, these findings indicate that researchers can
no longer look at bivariate relationships or simple main effects of online activity on
social support and expect to understand the complexity of the association between
these two constructs. Other relevant variables, such as personality factors, should
also be considered as they might exert important moderating effects. Future work
should attempt to understand why certain personality traits are associated with
beneficial effects of Internet use (enhanced support) while other personality factors
seem to be associated with more problematic experiences (degraded support). In
focusing on such issues, a more accurate understanding of the relationship between
Internet use and social support may be possible.
Acknowledgements
The authors would like to thank Andy Abrams, Von Bakanic, and Walter Pharr
for allowing us to recruit participants in their classes.
References
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: a
meta-analysis. Personnel Psychology, 44, 1–26.
Block, J. (1995). A contrarian view of the five-factor approach to personality description. Psychological
Bulletin, 117, 187–215.
Bromberg, H. (1996). Are MUDs communities? Identity, belonging and consciousness in virtual worlds.
In R. Shields (Ed.), Cultures of the Internet: virtual spaces, real histories, living bodies (pp. 143–152).
London: Sage.
Cohen, S., & Hoberman, H. M. (1983). Positive events and social supports as buffers of life change stress.
Journal of Applied Social Psychology, 13, 99–125.
Costa, P. T., & McCrae, R. R. (1992a). Four ways five factors are basic. Personality and Individual Differences, 13, 653–665.
Costa, P. T., & McCrae, R. R. (1992b). NEO PI-R professional manual. Odessa, Florida: Psychological
Assessment Resources.
Eysenck, H. J. (1967). The biological basis of personality. Springfield, Illinois: Charles Thomas.
Halamandaris, K. F., & Power, K. G. (1999). Individual differences, social support and coping with the
examination stress: a study of the psychosocial and academic adjustment of first year home students.
Personality and Individual Differences, 26, 665–685.
Hamburger, Y. A., & Ben-Artzi, E. (2000). The relationship between extraversion and neuroticism and
the different uses of the Internet. Computers in Human Behavior, 16, 441–449.
Hittner, J. B., & Swickert, R. J. (2002). Modeling functional and structural social support via confirmatory factor analysis: evidence for a second-order global support construct. Journal of Social
Behavior and Personality (in press).
Joinson, A. (1998). Causes and implications of disinhibited behavior on the Internet. In J. Gackenbach
(Ed.), Psychology and the Internet: intrapersonal, interpersonal, and transpersonal implications (pp. 43–60).
San Diego: Academic Press.
Jones, S. G. (1997). The Internet and its social landscape. In S. G. Jones (Ed.), Virtual culture: identity and
communication in cybersociety (pp. 7–35). London: Sage Publications.
Katz, J. E., & Aspden, P. (1997). A nation of strangers? Communications of the ACM, 40, 81–86.
Kiesler, S., & Kraut, R. (1999). Internet use and ties that bind. American Psychologist, 54, 783–784.
King, S. A., & Moreggi, D. (1998). Internet therapy and self-help groups – the pros and cons. In
J. Gackenbach (Ed.), Psychology and the Internet: intrapersonal, interpersonal, and transpersonal implications (pp. 77–109). San Diego: Academic Press.
Kraut, R., Kiesler, S., Mukopadhyay, T., Scherlis, W., & Patterson, M. (1998a). Social impact of the
Internet: what does it mean? Communications of the ACM, 41, 21–22.
Kraut, R., Patterson, M., Lundmark, V., Kiesler, S., Mukopadhyay, T., & Scherlis, W. (1998b). Internet
paradox: a social technology that reduces social involvement and psychological well-being? American
Psychologist, 53, 1017–1031.
Lakey, B., & Drew, J. B. (1997). A social-cognitive perspective on social support. In G. R. Pierce,
B. Lakey, I. Sarason, & B. Sarason (Eds.), Sourcebook of social support and personality (pp. 107–140).
New York: Plenum Press.
Mankowski, E. S., & Wyer, R. S. (1997). Cognitive causes and consequences of perceived social support.
In G. R. Pierce, B. Lakey, I. Sarason, & B. Sarason (Eds.), Sourcebook of social support and personality
(pp. 141–168). New York: Plenum Press.
Martin, M. M., & Anderson, C. M. (1995). Roommate similarity: are roommates who are similar in their
communication traits more satisfied? Communication Research Reports, 12, 46–52.
McCrae, R. R. (1996). Social consequences of experiential openness. Psychological Bulletin, 120, 323–337.
McCrae, R. R., & Costa, P. T. (1999). A five-factor theory of personality. In L. A. Pervin, & O. P. John
(Eds.), Handbook of personality: theory and research (pp. 139–153). New York: Guilford.
McKenna, K. Y. A. (1998). The computers that bind: relationship formation on the Internet. Unpublished
doctoral dissertation, Ohio University.
McKenna, K. Y. A., & Bargh, J. A. (2000). Plan 9 from cyberspace: the implications of the Internet for
personality and social psychology. Personality and Social Psychology Review, 4, 57–75.
Mickelson, K. D. (1997). Seeking social support: parents in electronic support groups. In S. Kiesler (Ed.),
Culture of the Internet (pp. 157–178). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Newcomb, T. M. (1961). The acquaintance process. New York: Holt, Rinehart & Winston.
O'Brien, T. B., & DeLongis, A. (1996). The interactional context of problem-, emotion-, and relationship-focused coping: the role of the big five personality factors. Journal of Personality, 64, 775–813.
Parks, M. R., & Floyd, K. (1996). Making friends in cyberspace. Journal of Communication, 46, 80–97.
Procidano, M. E. (1992). The nature of perceived social support: findings of meta-analytic studies. In
C. D. Spielberger (Ed.), Advances in personality assessment (Vol. 9) (pp. 1–26). Hillsdale, NJ: Lawrence
Erlbaum Associates.
Silverman, T. (1999). The Internet and relational theory. American Psychologist, 54, 780–781.
Tabachnick, B. G., & Fidell, L. S. (1989). Using multivariate statistics (2nd ed.). New York: HarperCollins.
Turner, R. J. (1999). Social support and coping. In A. V. Horwitz, & T. L. Scheid (Eds.), A handbook for
the study of mental health: social contexts, theories, and systems (pp. 198–210). New York: Cambridge
University Press.
Tuten, T., & Bosnjak, M. (2001). Understanding differences in web usage: the role of need for cognition
and the five factor model of personality. Social Behavior and Personality, 29, 391–398.
The UCLA Internet Report (2000). Surveying the digital future. Available: www.ccp.ucla.edu.
Watson, D., & Hubbard, B. (1996). Adaptational style and dispositional structure: coping in the context
of the five-factor model. Journal of Personality, 64, 737–773.
Winzelberg, A. (1997). The analysis of an electronic support group for individuals with eating disorders.
Computers in Human Behavior, 13, 393–407.
Article info
Article history:
Available online 19 May 2011
Keywords:
Internet use
Personality
Happiness
Introversion
Social support
Abstract
The Internet is no longer an advanced technology accessible to a select few. It has become a ubiquitous
tool for users ranging from professional programmers to casual surfers and young children. The exponential increase in time online has prompted curiosity and speculation about the interaction between this
technology and individual person variables. While general survey data exist regarding broad patterns
of Internet use, less is known about the relationship between specific usage and individual personality
dimensions, mood variables, or social activity. This study sought to clarify several of these relationships.
One hundred eighty-five undergraduate student volunteers completed two detailed measures of Internet
use across various domains (for example: work/school, tasks/services, entertainment), as well as measures of happiness, perceived social support, and introversion. Specific types of Internet use, including
gaming and entertainment usage, were found to predict perceived social support, introversion and happiness. Use of the Internet for mischief-related activities (for example: downloading without payment,
fraud, snooping) was associated with lower levels of happiness and social support. These findings support
the utility of and need for specific rather than general Internet research. Directions for future research
clarifying the role of the Internet in quality of life and interpersonal relations are suggested.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Members of almost every demographic background use the
Internet in order to stay better connected with loved ones, to
quickly and efficiently complete daily tasks and transactions, and
stay abreast of the most up-to-date current events. Broad survey
studies confirm that Internet use continues to rise, and that previously cited gaps based on age, gender, technology access and socioeconomic status, are quickly disappearing (cf. Fallows, 2004;
Lenhart, Madden, Macgill, & Smith, 2007; Madden, Fox, Smith, &
Vitak, 2007).
The Pew Internet and American Life Project represents one of
the main efforts to gather large-scale data on Internet use. Using
nationwide telephone surveys, most recently in December 2008
(N = 2253), Pew has been a leader in documenting the activities
of the Internet. Those data support and verify the rapid continued
expansion of Internet use. While the Pew project has characterized
teens as one of the most wired segments of the American population for the past 10 years, they also reported that Internet penetration reached 74% for all American adults in 2008 (Jones & Fox,
Corresponding author. Address: 3105 S. Dearborn, Illinois Institute of Technology, 252 LS, Chicago, IL 60616, United States. Tel.: +1 312 567 3501; fax: +1 312 567
3493.
E-mail address: mitchelle@iit.edu (M.E. Mitchell).
0747-5632/$ - see front matter 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.chb.2011.04.008
broadening in ways that, only a few years ago, would have seemed
unlikely or impossible. In addition to access to products, information, and transactions, users increasingly turn to the Internet for
social reasons. The Internet enables individuals to find new relationships, fosters more efficient communication within existing relationships, and offers a multitude of new ways to
develop and maintain friendships and romances. It is unclear, however, how individual person variables and interpersonal variables
interact with this burgeoning use.
Perceived social support has long been recognized (e.g., Barrera,
1986; Cohen & Wills, 1985; Winemiller, Mitchell, Sutliff, & Cline,
1993) to provide a buffer in times of stress, increase happiness,
and enhance psychological well-being. Internet relationships offer
a new avenue for potential experiences of perceived social support,
in which relationships may exist entirely without any face-to-face
interaction. It is an empirical question whether or not interpersonal relationships developed and maintained predominantly or
even entirely over the Internet increase levels of perceived support
and/or convey the same benefits that social support has been
shown to provide in the past.
Some (e.g., Parks & Floyd, 1996) contend that online interactions are shallow approximations of quality real-life relationships,
and that cyberspace creates an easily penetrated illusion of community. This argument suggests the possibility that time spent online, in lieu of participating in the face-to-face world, might
actually detract from an individual's assessment of perceived social
support. Indeed, preliminary survey data suggest that online relationships may not be equivalent to their face-to-face counterparts.
Virtual interactions are generally marked by higher levels of self-disclosure than face-to-face interactions (Underwood & Findlay,
2004). Deception and misrepresentation on the Internet are easy
and frequent, and misinterpretation of specific interactions, due
in part to the absence of nonverbal cues, is also a common concern (Wallace, 1999; Whitley, 1997). The somewhat limited data
are mixed; some studies support better outcomes in face-to-face
interactions, whereas others show evidence that online support
carries unique benefits (Bargh, Katelyn, & McKenna, 2004). Controversy exists as a consequence of contradictory findings, and much
remains unknown regarding the benefits and drawbacks of online
social support.
Initial investigations of the emotional benefits and consequences of the Internet (Kraut et al., 1998; Shklovski, Kraut, &
Cummings, 2006) found high levels of Internet use to be associated
with depression and social isolation. Specifically, increased time
online was associated with declines in individuals' communication
with members of their household, declines in the size of their
face-to-face social circle and increased feelings of loneliness.
Amichai-Hamburger, Fine, and Goldstein (2004) found that Internet use was directly related to feelings of loneliness. These findings
have not been consistently supported by other studies. For example, Bargh et al. (2004) reviewed the existing Internet research
and disputed the conclusion that Internet use contributes to
depression and loneliness, characterizing those findings as exaggerated media-friendly fallacies. Their review found that the Internet
helped reduce these symptoms, facilitating relationships with
long-distance friends and family members and enhancing feelings
of connectivity and community.
In addition to these studies, several authors (e.g., Amichai-Hamburger et al., 2004; Kraut et al., 1998) have attempted to identify patterns of Internet use in relation to personality variables. The
notion that individuals may be predisposed to excessive use or
avoidance of Internet use is predicated on the view that individual
characteristics underlie this behavior in much the same way as
these same variables would influence face-to-face behavior. Amichai-Hamburger et al. (2004) examined introversion, identity, and
level of neurotic behavior in relation to establishing group mem-
Social Support Appraisal Scale (SS-A; Vaux et al., 1986). The SS-A is a 23-item questionnaire evaluating perceived support of
friends, family, and others. It has been shown to have good internal
consistency, and adequate concurrent, divergent, and convergent
validity.
Multidimensional Scale of Perceived Social Support (MSPSS; Zimet, Dahlem, Zimet, & Farley, 1988). The MSPSS is a 12-item measure designed to examine one's subjective assessment of social
support adequacy from family, friends, and other significant others. The authors reported excellent total scale reliability (r = .88),
as well as excellent test-retest reliability (r = .85).
Myers-Briggs Type Indicator (MBTI; Hirsch & Kummerow,
1989). The MBTI provides a useful measure of personality based
on eight personality preferences. The eight preferences are organized into four bipolar scales (extroversion-introversion; sensing-intuiting; thinking-feeling; judging-perceiving). For the
current study, only the extroversion-introversion scale was used.
Self-Assessment for Introverts (SAI; Laney, 2002). This measure assesses an individual's level of introversion and contains 30
true-or-false items. Scores are categorized into three groups:
introvert, middle of the continuum, or extrovert. This measure
is unique because it conceptualizes introversion as distinct from
extroversion. No psychometric data have been reported for this
measure.
Internet Usage Survey (IUQ). This questionnaire was developed
for the current study to measure solitary Internet usage not involving interaction, real or virtual, with others. The measure consists of
thirty questions across six domains of solitary Internet activity:
purchasing, information-seeking, tasks/services, entertainment,
work/school-related activities and mischief. Each domain contains
five items assessing the frequency of different aspects of use. The
five items within each domain include endorsement (or not) of
participation in that domain at any time, acknowledgment (or denial) of current use, and an item asking about specific activities
within the domain. Examples of domain-specific activities are
illustrated in Table 1. The IUQ yielded a Cronbach's alpha coefficient of r = .80. Test-retest reliabilities for domains ranged from
r = 1 to r = .22 for information seeking; r = .60 to
r = .02 for purchasing; r = .83 to r = .22 for entertainment; r = .71 to
r = .23 for mischief; r = .73 to .11 for tasks and services; and r = 1
to r = .19 for work and school. The lowest reliabilities were associated with activities
that logically are low frequency.
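As a minimal illustration of the internal-consistency statistic reported above, the sketch below computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). The function name and the toy data are hypothetical; the study's item-level data are not reproduced here, so this is not a recomputation of the reported coefficient.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper for illustration; not the authors' analysis code.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose so that each tuple holds one item's scores across respondents.
    columns = list(zip(*item_scores))
    item_variance_sum = sum(pvariance(col) for col in columns)
    # Variance of each respondent's total score across all items.
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data: three respondents answering two items.
print(cronbach_alpha([[1, 2], [2, 4], [3, 6]]))
```

The same variance estimator (population or sample) must be used for both the item variances and the total-score variance; the ratio, and therefore alpha, is identical either way.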
Modes of Interaction Questionnaire (MOIQ). Developed for the
current study, this questionnaire assesses the different ways an
individual can interact with others on the Internet, and the frequency and duration of those activities. It is comprised of seven
categories of interactive activity including: instant messaging,
audio/video conferencing, virtual dating, interactive online
games/activities, emailing, chatrooms, and cybersex. Respondents
are asked to indicate the frequency with which they engage in each
mode of interaction. Within each domain, respondents are asked
first to endorse or deny ever having engaged in activities within
the domain; a second item then requires the respondent to
acknowledge or deny current activities within the domain. For
each item, the respondent is asked whether a real identity, a virtual identity, or a combination of the two was used when engaging in
the activity, the frequency of the activity within the past 30 days,
and finally, the average amount of time spent per session in
the activity.
Coefficient alpha of the MOIQ was r = .89. Spearman reliabilities ranged from r = .81 to .64 for use of instant messaging; r = 1
to r = .57 on email use; r = .66 to .05 on audio-video conferencing; r = .70 to r = .21 on use of chat rooms; r = .92 to r = .27 on
virtual dating; r = .87 to .12 on cybersex; and finally r = 1 to .17
for online gaming. The most unstable items were those reporting
levels of current use, suggesting that use varied rapidly within a
Table 1
Example activities associated with Internet domains.
Purchasing: Travel related; Entertainment/leisure; Food; Clothing and accessories; Sporting goods; Books; Medical; Electronics; Other
Information seeking: Job hunting/researching employers; School application process; Diet/exercise; Potential purchases; News; Reference; Housing; Transportation; Finance; Personal interests/hobbies; Electronic repairs; Religious/spiritual purposes; Environment; Other
Entertainment: Music; Online games; Movies; Viewing pornography; Gambling; Internet surfing; Literature; Sports; Fantasy sports; Other
Tasks/services: Banking; Bills; Investing; Personal information tasks; Electronic repairs; Selling; Other
Mischief: Hacking; Snooping/lurking; Downloading without payment; Stealing; Fraud; Plotting; Other
Work/school related tasks: Online classes; Class assignments; Internet based work; Other
3. Results
Almost all participants used the Internet for purchasing, information seeking, work/school, tasks and services and entertainment
purposes. Online mischief (e.g., theft, illegal downloads, etc.) was a
unique category with 119 participants reporting that they did not
engage in mischief whereas 63 endorsed such activities. Eighty-four participants indicated that they had engaged in mischief in
the past and 99 denied ever engaging in mischief activities.
A series of regressions were computed to determine if Internet
use across the various domains could predict level of social support, happiness, or introversion. Gaming and mischief predicted total support as measured by the SSAS (R² = .11; F = 10.66;
p < .001). Mischief predicted the level of happiness as measured
by the SHS (R² = .04; F = 6.68; p = .01). As expected, time
spent on solitary tasks predicted introversion as measured by the
SAI (R² = .02; F = 3.92; p < .05). Entertainment predicted introversion
as measured by the MBTI (R² = .04; F = 7.36; p < .007).
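For readers who want to relate the reported R² and F values, the following sketch applies the standard identity for the overall F test of a linear regression, F = (R²/k) / ((1 - R²)/(n - k - 1)), where n is the number of cases and k the number of predictors. The function name and the example values are hypothetical; the n and k entering each regression above are not stated in this section, so the sketch does not attempt to reproduce the reported statistics.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for a regression with k predictors fitted to n cases.

    Hypothetical helper for illustration; n and k must be supplied by the reader.
    """
    if not 0 <= r2 < 1:
        raise ValueError("R-squared must lie in [0, 1)")
    df_residual = n - k - 1
    if k < 1 or df_residual < 1:
        raise ValueError("need k >= 1 and n > k + 1")
    # Explained variance per predictor over unexplained variance per residual df.
    return (r2 / k) / ((1 - r2) / df_residual)


# Illustrative values only (not taken from the study).
print(f_from_r2(0.5, 12, 1))
```

Because F grows with the residual degrees of freedom, a small R² can still reach significance in a moderately large sample, which is worth keeping in mind when reading the effect sizes above.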
A comparison between groups by mischief (yes/no) demonstrated no difference in social support, happiness or introversion.
However, the groups differed in time spent purchasing, chatting online
and engaging in cybersex: individuals endorsing mischief engaged in more online purchasing, chatting and cybersex.
The groups were then divided by the level of mischief as follows:
4. Discussion
The penetration of the Internet into both professional and interpersonal life domains has the potential for far-reaching implications for quality of life and social interactions. This study
confirmed Pew findings (Jones & Fox, 2009) that the Internet has
become ubiquitous for a wide variety of uses, including entertainment, purchasing, information-seeking, tasks and services and
work and school purposes. Participants in this study varied in the
amount of time they spent online, and higher levels of participation in certain activities significantly predicted lower levels of
happiness and social support and higher levels of introversion. Specifically, these results suggest that heavy Internet use in specific domains (i.e., gaming and mischief) is associated with a diminution of
an individual's perceived social support, which in turn suggests a
risk for a variety of problems, since the relationship between social support and well-being has been so robust.
Also, individuals who spent more time online engaged in activities
categorized as entertainment were more introverted. It appears
that merely examining overall use of the Internet in relation to
well-being or happiness may not be as useful as a more fine-grained
analysis of Internet activities in relation to specific person
variables. This study was the first step in developing a model that
can be tested to determine if the relationship between types of
Internet use and person variables is of sufficient strength to have
utility in, for example, identifying youth at risk.
Internet use for the purpose of mischief was associated with
lower levels of happiness. Differences also were found in specific
types of Internet use between participants who reported engaging in
mischief and those who did not. Participants endorsing mischief
spent significantly more time online engaged in cybersex, purchasing and chatting. When participants were further divided into
three groups (less-serious mischief [downloads only], serious mischief, and no mischief), additional group differences were found.
Individuals endorsing serious mischief were less happy, and yet
had higher levels of perceived social support than the less-serious
mischief and no mischief groups. These results suggest that there
might be a unique profile for individuals who engage in more serious mischief activities. The unexpected combination of high perceived social support and low happiness warrants further
investigation. A reasonable but perhaps counterintuitive interpretation is that this combination of lower happiness and high social support indicates the existence of a subpopulation of relatively
well-connected mischief engagers.
The specific combination of activities endorsed by the individuals within the most serious mischief group (i.e., higher levels of cyber-sexual activity, high levels of talking online and spending) is
consistent with behavior exhibited by individuals frequently described as hypomanic or bipolar. It is unclear, however, if this psychological profile would represent an accurate characterization of
the overall adjustment or functioning of these participants, particularly since the sample was small and homogeneous. This also warrants additional investigation, as such individuals are at risk for an
array of problems, and mischief on the Internet may be a good marker variable for early identification of such individuals.
Limitations of this study pertain directly to the sample size and
limits of generalizability. Participants were predominantly male
college students attending a science and technology-focused university. It is possible that Internet use may be somewhat different
in this sample as compared with the general population. Additionally, because the Internet is such a dynamic and rapidly changing
technology, it seems unlikely that the Internet use measures created for this study were able to tap into all the possible domains
of specific types of use. This study underscores the need for focused
research examining specific aspects of Internet use that take into
account the dynamic nature of the Internet and how it relates to
individual differences and interpersonal interaction.
Acknowledgments
The authors wish to thank other members of the research team
who contributed to this effort including Frank Connors, Sapna Ram,
Manasa Kasinath, Alexis Kramer, Bethany Grix, Jennifer Marola,
and Morgan Carey, Illinois Institute of Technology, Chicago, IL.
References
Amichai-Hamburger, Y., Fine, A., & Goldstein, A. (2004). The impact of Internet
interactivity and need for closure on consumer preference. Computers in Human
Behavior, 20, 103–117.
Bargh, J., Katelyn, Y., & McKenna, A. (2004). The Internet and social life. Annual
Review of Psychology, 55, 573.
Barrera, M., Jr. (1986). Distinctions between social support concept, measures, and
models. American Journal of Community Psychology, 14, 413–445.
Bradburn, N. (1969). The structure of psychological well being. Chicago, IL: Aldine.
Cohen, S., & Wills, T. A. (1985). Stress, social support, and the buffering hypothesis.
Psychological Bulletin, 98, 310–357.
Diener, E., Larsen, R. J., & Emmons, R. A. (1984). Person × situation interactions:
Choice of situations and congruence response models. Journal of Personality and
Social Psychology, 47, 580–592.
Fallows, D. (2004). The Internet and daily life: Many Americans use the Internet in
everyday activities, but traditional offline habits still dominate. Pew Internet and
American life project. <http://pewInternet.org/pdfs/>.
Fleeson, W., Malanos, A. B., & Achille, N. M. (2002). An intra individual process
approach to the relationship between extraversion and positive affect: Is acting
extraverted as good as being extraverted. Journal of Personality and Social
Psychology, 83(6), 1409–1422.
Hirsch, S., & Kummerow, J. (1989). Life types. New York: Warner Books, Inc.
Jones, S., & Fox, S. (2009). Generations online in 2009. Pew Internet and American
life project. <http://www.pewInternet.org/Reports/2009/Generations-Online-in2009.aspx>.
Kraut, R., Lundmark, V., Patterson, M., Kiesler, S., Mukopadhyay, T., & Scherlis, W.
(1998). Internet paradox: A social technology that reduces social involvement
and psychological well-being? American Psychologist, 53(9), 1017–1031.
Laney, M. O. (2002). The introvert advantage: How to thrive in an extroverted world.
New York, NY: Workman Publishing Company, Inc.
Lenhart, A. (2000). Who's not online: 57% of those without Internet access say they
do not plan to log on. Pew Internet and American life project.
<http://www.pewInternet.org//media//Files/Reports/2000/Pew Those_Not_Online_ Report.pdf.pdf>.
Lenhart, A., Madden, M., Macgill, A. R., & Smith, A. (2007). Teens and social media:
The use of social media gains a greater foothold in teen life as they embrace the
conversational nature of interactive online media. Pew Internet and American life
project. <http://www.pewInternet.org/pdfs/PIP_Teens_Social_Media_Final>.
Lyubomirsky, S., & Lepper, H. S. (1999). A measure of subjective happiness:
Preliminary reliability and construct validation. Social Indicators Research, 46(2),
137–155.
Madden, M., Fox, S., Smith, A., & Vitak, J. (2007). Digital footprints: Online identity
management and search in the age of transparency. Pew Internet and American
life project. <http://www.pewInternet.org/pdfs/PIP_Digital_Footprints>.
Wallace, P. (1999). The Psychology of the Internet. New York, NY: Cambridge
University Press.
Watson, D., & Friend, R. (1969). Measurement of social-evaluative anxiety. Journal of
Consulting and Clinical Psychology, 33, 448–457.
Whitley, E. (1997). In cyberspace all they see are your words: A review of the
relationship between body, behavior and identity drawn from the sociology of
knowledge. Information, Technology and People, 10(2), 147–163.
Winemiller, D., Mitchell, M. E., Sutliff, J., & Cline, D. (1993). Measurement strategies
in social support. Journal of Clinical Psychology, 49, 638–648.
Zimet, G. D., Dahlem, N. W., Zimet, S. G., & Farley, G. K. (1988). The
multidimensional scale of perceived social support. Journal of Personality
Assessment, 52, 30–31.
www.elsevier.com/locate/diin
Abstract The Trojan defence; "I didn't do it, someone else did" - myth or
reality? This two-part article investigates the fascinating area of Trojan & network
forensics and puts forward a set of processes to aid forensic practitioners in this
complex and difficult area. Part I examines the Trojan defence, how Trojan horses
are constructed and considers the collection of volatile data. Part II takes this
further by investigating some of the forensic artefacts and evidence that may be
found by a forensic practitioner and considers how to piece together the evidence
to either accept or refute a Trojan defence.
© 2005 Elsevier Ltd. All rights reserved.
http://www.thisislondon.co.uk/news/articles/6026981?source=evening%20standard.
3 http://news.bbc.co.uk/1/hi/technology/3202116.stm.
1742-2876/$ - see front matter 2005 Elsevier Ltd. All rights reserved.
doi:10.1016/j.diin.2005.01.010
said "The Caffrey case suggests that even if no
evidence of a computer break-in is unearthed on
a suspect's PC, they might still be able to successfully claim that they were not responsible for what
their computer does, or what is found on its hard
drive."4
The Trojan defence places a lot of pressure on
the prosecution, which in turn places pressure on
the forensic investigators to prove, beyond all
reasonable doubt, that the accused is responsible
for the evidence located on the computer.
Mark Rasch of SecurityFocus comments in his
article, "The Giant Wooden Horse Did It!"5 that
this defence is all the more frightening because it
could be true. He asks, "...if you were a hacker,
would you want to store your contraband files on
your own machine, or, like the cuckoo, would you
keep your eggs in another bird's nest?"
Storing files on other systems is a common tactic
for attackers. Individuals who share copyright
protected materials store their contraband on
high-speed servers; hackers store their rootkits
or other tools on compromised systems or other
publicly accessible servers. No doubt many forensic practitioners have seen examples of this;
however, the Honeynet Project6 has several challenges, which show evidence of this practice.
Rasch further points out, In late December
2003, companies around the world began to report
a new kind of cyber-attack that had been apparently going on for about a year. Cyber extortionists
(reportedly from Eastern Europe) threatened to
plant child pornography on their computers and
then call the cops if they didn't agree to pay
a small fee. Unless the recipient pays a nominal
amount ($30), the hacker claims he will either
wipe the hard drive or plant child porn. The
possibility of Trojans and the relative ease with
which they could be used to promulgate such an
attack made the threats credible.
It is clear that the Trojan defence needs to be
carefully considered. As forensic practitioners, it is
important that whenever an examination is conducted, we should keep the Trojan defence possibility at the forefront of our minds. All existing
Trojans can be detected provided forensic examiners know how to identify and process the digital
traces. The methodologies used to conduct an investigation differ from practitioner to practitioner;
however, this two-part article aims to show some
D. Haagman, B. Ghavalas
steps that should be considered which might substantiate or refute the Trojan defence.
Definitions
First, it is worth looking at the definition of
a Trojan and how it relates to backdoors.
According to Wikipedia,7 "a Trojan horse or Trojan
is a malicious program that is disguised as legitimate software ... Trojan horse programs cannot
replicate themselves, in contrast to some other
types of malware, like viruses or worms. A Trojan
horse can be deliberately attached to otherwise
useful software by a programmer, or it can be
spread by tricking users into believing that it is
a useful program."
A Trojan is simply a delivery mechanism. It
contains a payload to be delivered elsewhere.
The payload may consist of almost anything such
as a piece of spyware, adware, a backdoor, implanted data or simply a routine contained within
a batch file. Additional tools such as keyloggers,
packet generation tools (for denial-of-service attacks) and sniffers may form part of the payload. It
is beyond the scope of this article to discuss each
of these in turn as we would simply not have
enough space so we will instead concentrate on
backdoors themselves as part of the overall Trojan
debate.
The above properties are important to an
analyst. Finding the original infection vector or
artefacts relating to the Trojan could influence the
timeline and validity of evidence. Locating the
actual Trojan and understanding its payload and
capabilities is exceptionally useful when building
(or defending) a case.
Wikipedia explains, "A backdoor in a computer
system (or a cryptosystem, or even in an algorithm) is a method of bypassing normal authentication or obtaining remote access to a computer,
while intended to remain hidden to casual inspection. The backdoor may take the form of an
installed program (e.g., Back Orifice) or could be
a modification to a legitimate program. ... Many
computer worms, such as Sobig and Mydoom,
install a backdoor on the affected computer
(generally a PC on broadband running insecure
versions of Microsoft Windows and Microsoft Outlook). Such backdoors appear to be installed so
that spammers can send junk email from the
machines in question."8
4 http://news.bbc.co.uk/1/hi/technology/3202116.stm.
5 http://www.theregister.co.uk/2004/01/20/the_giant_wooden_horse_did/.
6 http://project.honeynet.org/scans/index.html.
7 http://en.wikipedia.org/wiki/Trojan_horse_%28computing%29.
8 http://en.wikipedia.org/wiki/Backdoor.
the malware author can take a game called newgame.exe and some malicious payload called
malware.exe and bind them together.
Over time this process has become widespread, with graphical tools making
it extremely simple.
[Figure 1. Diagram labels: GAME, TROJAN HORSE, BACKDOOR.]
Changing shape
Again, the properties of the backdoor can influence the case. For example, many investigators
use an anti-virus (AV) tool to process the forensic
image. The AV tool will highlight files containing
malware, including backdoors. However, the mere
existence of the files does not necessarily mean
that the backdoor was ever active. Establishing
this fact can be crucial and will be revisited in Part
II of this article.
P2P
email
file sharing and removable media
direct implant through hacking, etc.
10 http://www.programmerstools.org/packers.htm.
11 http://www.illmob.org.
Figure 2 EliteWrap used to wrap two pieces of software together; the original game (game.exe) will unpack in the
foreground when executed whilst malware.exe unpacks itself in a stealthy manner.
Trojan scenarios
[Figure: diagram labels DETECTABLE, COMPRESSION, UNDETECTABLE.]
Scenario 1
[Figure: TROJAN HORSE comprising GAME.EXE, AV KILLER, FW KILLER, COMPRESSED BACKDOOR and FALSE REGISTRY ENTRIES.]
Scenario 2
If the above scenario were not bad enough, then
consider the same type of Trojan deployment, but
also with an FW killer to disable personal firewall
software and a routine that could implant false
registry keys into the victim's system. Such keys
could ensure stealthy start-up of rogue processes
or could even add falsified histories relating to
Internet surfing activity. The possibilities are numerous as shown in Fig. 5.
Whilst all this may seem rather complex and
possibly too difficult to achieve, remember that
tools have emerged that automate much of the
above. We now see all-in-one kits such as Optix-
[Figure: TROJAN HORSE comprising GAME.EXE, AV KILLER and COMPRESSED BACKDOOR.]
Figure 6
Network evidence
Having a well-rehearsed plan for acquiring live
evidence is critical. Using trusted and forensically
sound tools is a must. Before gathering the
evidence from the suspect system, it could be
worth considering a network forensic approach by
sniffing the communication flows to and from the
suspect system. Unfortunately, this tends to be
easier said than done, both from a legal and
a technical perspective.
In some situations, such as a corporate environment or a home-networked environment, it may
be possible to intercept communication through
the use of the port spanning function of a switch.
Plugging in to an existing hub or placing a hub
between the suspect system and the network may
also be an option. The investigator's machine
would then be configured to capture all traffic
to and from the suspect machine. It may be
http://www.tracespan.com/2_2LI%20Monitoring.html.
http://www.hmso.gov.uk/acts/acts2000/20000023.htm.
http://www.hmso.gov.uk/si/si2000/20002699.htm.
Figure 7 A screenshot of WFT.
16 http://www.foolmoon.net/security/wft/.
http://www.giac.org/practical/GCFA/Monty_McDougal_GCFA.pdf.
Coming up in Part II
In the next article, we will show how the volatile
information we have gathered can be used to aid
an offline forensic analysis of the computer. We
will also discuss the virtues of network analysis and
the use of Virtual Machines to aid an investigation.
17
18
http://users.erols.com/gmgarner/forensics/.
The use and limitations of AV products and their
benefit to investigations will also be addressed.
Dan Haagman (BSc, CSTP, CFIA) and Byrne Ghavalas (CSTP,
CFIA, GCFA) instruct and practice in computer forensics for
7Safe, an independent Information Security practice delivering an innovative portfolio of services including: Forensic
Investigation, BS7799 Consulting, Penetration Testing & Information Security Training.
G Model
DRUPOL-1172; No. of Pages 7
ARTICLE IN PRESS
International Journal of Drug Policy xxx (2013) xxxxxx
Commentary
Silk Road, the virtual drug marketplace: A single case study of user experiences
Marie Claire Van Hout a, , Tim Bingham b
a
b
Article info
Article history:
Received 30 September 2012
Received in revised form 1 January 2013
Accepted 14 January 2013
Keywords:
Silk Road
Internet
Online drug forums
New psychoactive substances
Psychonautics
Ethnopharmacy
Abstract
Background: The online promotion of drug shopping and user information networks is of increasing public
health and law enforcement concern. An online drug marketplace called Silk Road has been operating on
the Deep Web since February 2011 and was designed to revolutionise contemporary drug consumerism.
Methods: A single case study approach explored a Silk Road user's motives for online drug purchasing,
experiences of accessing and using the website, drug information sourcing, decision making and purchasing, outcomes and settings for use, and perspectives around security. The participant was recruited
following a lengthy relationship building phase on the Silk Road chat forum. Results: The male participant
described his motives, experiences of purchasing processes and drugs used from Silk Road. Consumer
experiences on Silk Road were described as euphoric due to the wide choice of drugs available, relatively easy once navigating the Tor Browser (encryption software) and using Bitcoins for transactions,
and perceived as safer than negotiating illicit drug markets. Online researching of drug outcomes, particularly for new psychoactive substances was reported. Relationships between vendors and consumers
were described as based on cyber levels of trust and professionalism, and supported by stealth modes,
user feedback and resolution modes. The reality of his drug use was described as covert and solitary
with psychonautic characteristics, which contrasted with his membership, participation and feelings of
safety within the Silk Road community. Conclusion: Silk Road as an online drug marketplace presents an
interesting displacement away from traditional online and street sources of drug supply. Member support and a harm reduction ethos within this virtual community maximise consumer decision-making and
positive drug experiences, and minimise potential harms and consumer perceived risks. Future research
is necessary to explore experiences and backgrounds of other users.
© 2013 Elsevier B.V. All rights reserved.
Introduction
The Internet is increasingly viewed as the driver of the contemporary drug markets by the promotion of drug shopping in web-based
retail outlets and settings for user communication of information (Burillo-Putze, Domínguez-Rodríguez, Abreu-González, &
Nogué Xarau, 2011; Califano, 2007; Corazza et al., 2011, 2012;
Davey, Corazza, Schifano, Deluca, & Psychonaut Web Mapping
Group, 2010; Davey, Schifano, Corazza, & Deluca, 2012; Davies,
2012; Eurobarometer, 2011; Forsyth, 2012; Hill & Thomas, 2011;
Jones, 2010; Measham, 2011; Oyemade, 2010; Prosser & Nelson,
2011; Psychonaut Web Mapping Research Group, 2009; Solberg,
2012; Sumnall, Evans-Brown, & McVeigh, 2011; Vardakou, 2011;
Winstock, Marsden, & Mitcheson, 2010). Research has underscored
how the cyber drug market has become increasingly dynamic and
innovative in its capacity to retail drugs, create new compounds
Corresponding author at: School of Health Sciences, Waterford Institute of Technology, Waterford, Ireland. Tel.: +353 51 302166.
E-mail address: mcvanhout@wit.ie (M.C.V. Hout).
0955-3959/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.drugpo.2013.01.005
Please cite this article in press as: Hout, M. C. V., & Bingham, T. Silk Road, the virtual drug marketplace: A single case study of user experiences.
International Journal of Drug Policy (2013), http://dx.doi.org/10.1016/j.drugpo.2013.01.005
G Model
DRUPOL-1172; No. of Pages 7
2
ARTICLE IN PRESS
M.C.V. Hout, T. Bingham / International Journal of Drug Policy xxx (2013) xxxxxx
briefing sessions were held between researchers in order to circumvent this and we strove to minimise bias towards verification by
checking for validity and reliability within the collection, analysis
and subsequent presentation of resultant themes. Data credibility
was further improved by employing a focused analysis by consistently exploring the data within the scope of the research questions
and existing literature on Silk Road (Darke et al., 1998; Miles &
Huberman, 1994; Russell, Gregory, Ploeg, DiCenso, & Guyatt, 2005;
Yin, 2003). The focused collection and comparison of these single
case study narratives with extant literature enhanced the quality of
resultant deconstruction and reconstruction of various Silk Road
phenomena by virtue of idea convergence and confirmation of findings (Baxter & Jack, 2008; Knafl & Breitmayer, 1989). Five themes
emerged from the data, and are presented in the following section: 'Participant drug use history', 'Internet drug sourcing and risk
perceptions', 'Preparing to access Silk Road', 'Silk Road purchasing
mechanisms' and 'Drug use, testing and setting'.
1 A person who intelligently experiments with mind-altering chemicals, sometimes to the extent of taking exact measurements and keeping records of
experiences. Also defined as a 'scientific explorer of inner space' (Newcombe, 2008;
Newcombe & Johnson, 1999).
2 Erowid Online is an online library containing information about psychoactive
drugs, plants, and research chemicals.
3 Bluelight is an international message board that educates the public about
responsible drug use by promoting free discussion.
Getting hold of the Bitcoins, that's probably the only hard part.
I have managed to set up a bank account fraudulently. I suppose
probably the worse crime I have ever committed. I can now
deposit money under a false name and get it into this online
account, so I can then get it out and cipher it through Bit wallets of my own that are completely anonymous and not linked
to me any shape or form, and from there transfer them into
my account on Silk Road. It has become a lot more complicated
to do it very securely. But, the nature of Silk Road itself and
the fact that they have a tumbler system that the Bitcoin go
through, the chances of it being linked back to your bank account
would be slim to none really. I have not heard of any users being
arrested for going on Silk Road and if they have, it has not been
publicised.
Discussion
Online research methodologies for the recruitment, surveying
of and engagement with drug users are increasingly utilised, given
the emergent importance of the Internet in people's associational
day to day lives and recent explosion of online pharmacies, drug
user forums and sites selling new psychoactive substances (Barratt
& Lenton, 2010; De Luca et al., 2012; Fielding, Lee, & Blank, 2008;
Miller & Sønderlund, 2010). Research to date has focused on the
web mapping of online retailing, marketing and use of drugs, and
equally the potential for Internet based interventions to reduce
harm (De Luca et al., 2012; Kypri, 2009; Sinadinovic, Wennberg,
& Berman, 2012). This unique exploratory single case study followed protocols advocated by Yin (2003, p. 184) and Flyvbjerg
(2011), and is defined as "an empirical inquiry that investigates
a contemporary phenomenon within its real-life context; when the
boundaries between phenomenon and context are not clearly evident;
and in which multiple sources of evidence are used" (Yin, 1984, p.
23). Resultant findings provide a phenomenological insight into an
expert account of the case's experiences of Silk Road and associated drug taking. The hidden nature of Silk Road on the Deep
Web and its covert operation limit access to its members by
outsiders. This severely hampered recruitment and snowballing
efforts by the research team. Despite these shortcomings, the
case himself was an active participant in the Silk Road forums
and willing to be interviewed. He described his experiences in an intelligent and erudite manner, and illustrated site
characteristics and purchasing mechanisms corroborated by published literature, recent media reporting and law enforcement
statements. We recognise the limitations associated with this
potentially unverifiable single case study, and recommend further
research into other unrelated users' accounts and experiences of
Silk Road.
Accessing Silk Road was described as a joyful child in a sweet
shop type experience by virtue of its host of quality products and
References
Barratt, M. (2012). Letters to the editor Silk Road: Ebay for drugs. Addiction, 107,
683-684.
Barratt, M., & Lenton, S. (2010). Beyond recruitment? Participatory online research
with people who use drugs. International Journal of Internet Research Ethics, 3,
69-86.
Baxter, P., & Jack, S. (2008). Qualitative case study methodology: Study design
and implementation for novice researchers. The Qualitative Report, 13(4),
544-559.
Becker, H. (1963). Outsiders: Studies in the sociology of deviance. London: Free Press
of Glencoe.
Bitcoin. (2011). Bitcoin. Bitcoin P2P digital currency. Retrieved from http://bitcoin.org/
(27.09.12).
Brandt, S. D., Sumnall, H. R., Measham, F., & Cole, J. (2010). Second generation mephedrone. The confusing case of NRG-1. British Medical Journal, 341,
3564.
Burillo-Putze, G., Domínguez-Rodríguez, A., Abreu-González, P., & Nogué Xarau, S.
(2011). Khat, mefedrona y dolor torcicom. Medicina Clínica [Medicina Clinica
(Barcelona)], 137, 712-713.
Califano, J. A. (2007). Press release: You've Got Drugs! Retrieved from
http://www.casacolumbia.org/absolutenm/templates/PressReleases.aspx?articleid=492&zoneid=65
(22.09.12).
Chen, A. (2011). The underground website where you can buy any drug
imaginable. Retrieved from http://gawker.com/5805928/the-underground-website-where-you-can-buy-any-drug-imaginable (20.09.12).
Christin, N. (2012). Traveling the Silk Road: A measurement analysis of a large anonymous online marketplace. July, Id: paper.tex 1286 2012-07-30 21:29:14Z nicolasc.
Corazza, O., Schifano, F., Simonato, P., Fergus, S., Assi, S., Stair, J., et al. (2012).
Phenomenon of new drugs on the Internet: The case of ketamine derivative methoxetamine. Human Psychopharmacology: Clinical and Experimental, 27,
145-149.
Corazza, O., Schifano, F., Farre, M., Deluca, P., Davey, Z., Drummond, C., et al. (2011).
Designer drugs on the internet: A phenomenon out-of-control? The emergence of hallucinogenic drug Bromo-Dragonfly. Current Clinical Pharmacology,
6, 125-129.
Darke, P., Shanks, G., & Broadbent, M. (1998). Successfully completing case study
research: Combining rigour, relevance and pragmatism. Information Systems
Journal, 8, 273-289.
Davey, Z., Corazza, O., Schifano, F., Deluca, P., & Psychonaut Web Mapping Group.
(2010). Mass-information: Mephedrone, myths, and the new generation of legal
highs. Drugs and Alcohol Today, 10, 24-28.
Davey, Z., Schifano, F., Corazza, O., & Deluca, P. (2012). e-Psychonauts: Conducting research in online drug forum communities. Journal of Mental Health, 21,
386-394.
Davies, B. (2012). Dangerous drugs online. The Australian Prescriber, 35, 32-33.
Davis, J. (2011). The crypto-currency. The New Yorker. Condé Nast., p. 62
De Luca, P., Davey, Z., Corazza, O., Di Furia, L., Farre, M., Holmefjord Flesland, L., et al.
(2012). Identifying emerging trends in recreational drug use; outcomes from
the Psychonaut Web Mapping Project. Progress in Neuro-Psychopharmacology
and Biological Psychiatry (Early Online).
Dixon, B. (2010). Worries over legal drugs. Current Biology, 20, 298-299.
EMCDDA. (2011a). The state of the drugs problem in Europe Annual report. Lisbon,
Portugal: European Monitoring Centre for Drugs and Drug Addiction.
EMCDDA. (2011b). Report on the risk assessment of mephedrone in the framework of
the Council decision on new psychoactive substances. Luxembourg: Publications
Office of the European Union.
Eurobarometer. (2011). Eurobarometer: Youth attitudes on drugs. Analytical
report. Retrieved from http://ec.europa.eu/public_opinion/flash/fl_330_en.pdf
(20.09.12).
Fielding, N. G., Lee, R. M., & Blank, G. (Eds.). (2008). The handbook of online research
methods. London: Sage.
Flyvbjerg, B. (2011). Case study. In N. K. Denzin & Y. S. Lincoln (Eds.),
The Sage handbook of qualitative research (4th ed., pp. 301-316). Thousand Oaks,
CA: Sage.
Flyvbjerg, B. (2006). Five misunderstandings about case study research. Qualitative
Inquiry, 12(2), 219-245.
Forsyth, A. J. M. (2012). Virtually a drug scare: Mephedrone and the impact of the
Internet on drug news transmission. International Journal of Drug Policy, 23,
198-209.
Furlong, A., & Cartmel, F. (1997). Young people and social change: Individualization
and risk in late modernity. Buckingham: Open University Press.
Gordon, S. M., Forman, R. F., & Siatkowski, C. (2006). Knowledge and use of the Internet as a source of controlled substances. Journal of Substance Abuse Treatment,
30, 271-274.
Govier, M. (2011). Research chemicals: An approach to filling the information gap.
Drugs and Alcohol Today, 11, 71-76.
Greenhalgh, T. (1997). How to read a paper: The basics of evidence based medicine. UK:
BMJ Publishing Group., pp. 151-162
Griffiths, P., Sedefov, R., Gallegos, A., & Lopez, D. (2010). How globalization and market innovation challenge how we think about and respond to drug use: Spice
a case study. Addiction, 105, 951-953.
Hill, S., & Thomas, S. H. (2011). Clinical toxicology of newer recreational drugs.
Clinical Toxicology, 49, 705-719.
Hughes, B., & Winstock, A. R. (2012). Controlling new drugs under marketing regulations. Addiction, http://dx.doi.org/10.1111/j.1360-0443.2011.03620.x
Inciardi, J. A., Surratt, H. L., Cicero, T. J., Rosenblum, A., Ahwah, C., Bailey, E., et al.
(2010). Prescription drugs purchased through the internet: Who are the end
users? Drug and Alcohol Dependence, 110, 21-29.
Jay, M. (1999). Artificial paradises: A drugs reader. London: Penguin.
Jones, A. L. (2010). Legal highs available through the Internet - implications and
solutions? Quarterly Journal of Medicine, 103, 535-536.
Karila, L., & Reynaud, M. (2011). GHB and synthetic cathinones: Clinical effects and
potential consequences. Drug Testing and Analysis, 3, 552-559.
Knafl, K., & Breitmayer, B. J. (1989). Triangulation in qualitative research: Issues of
conceptual clarity and purpose. In J. Morse (Ed.), Qualitative nursing research: A
contemporary dialogue (pp. 193-203). Rockville, MD: Aspen.
Kypri, K. (2009). New technologies in the prevention and treatment of substance
use problems. Drug and Alcohol Review, 28, 1-2.
Lather, P. (1992). Critical frames in educational research: Feminist and poststructural perspectives. Theory into Practice, 31(2), 87-99.
Leary, T., Metzner, R., & Alpert, R. (1964). The psychedelic experience. NY: Citadel
Press.
Lilly, J. (1972). The Centre of the Cyclone: An autobiography of inner space. London:
Marion Boyars.
Measham, F. (2011). Legal highs: The challenge for government. Criminal Justice
Matters, 84, 28-30.
Mendelson, C. (2007). Recruiting participants for research from online communities.
Computers, Informatics, Nursing, 25, 317-323.
Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook. CA: Sage.
Miller, P. G., & Sønderlund, A. L. (2010). Using the internet to research hidden populations of illicit drug users: A review. Addiction, 105, 1557-1567.
Miller, P. (2005). Scapegoating, self-confidence and risk comparison: The functionality of risk neutralisation and lay epidemiology by injecting drug users.
International Journal of Drug Policy, 16, 246-253.
Moore, K., & Miles, S. (2004). Young people, dance and the sub-cultural consumption
of drugs. Addiction Research and Theory, 12, 507-523.
Murguía, E., & Tackett-Gibson, M. (2007). The new drugs Internet survey: A portrait
of respondents. In E. Murguía, M. Tackett-Gibson, & A. Lessem (Eds.), Real drugs
in a virtual world: Drug discourse and community online (pp. 45-58). Lanham,
MD: Lexington Books.
Newcombe, R. (2008). Ketamine case study: The phenomenology of a ketamine
experience. Addiction Research and Theory, 16, 209-215.
Newcombe, R., & Johnson, M. (1999, November). Psychonautics: A model and method
for exploring the subjective effects of psychoactive drugs. Paper presented at
Club Health 2000 First International Conference on Nightlife and Substance
Use, Royal Tropical Institute, Amsterdam, Netherlands.
Norrie, J., & Moses, A. (2011). Drugs bought with virtual cash. The Sydney
Morning Herald. Fairfax Media. Retrieved from
http://www.smh.com.au/technology/technology-news/drugs-bought-with-virtual-cash-20110611-1fy0a.html (20.09.12).
Oyemade, A. (2010). Meow Meow or Miaow Miaow a new drug of concern. Psychiatry, 7, 10.
Peretti-Watel, P. (2003). Neutralization theory and the denial of risk: Some evidence
from cannabis use among French adolescents. The British Journal of Sociology, 54,
21-42.
Prosser, J. M., & Nelson, L. S. (2011). The toxicology of bath salts: A review of synthetic
cathinones. Journal of Medical Toxicology, 8, 33-42.
Psychonaut Web Mapping Research Group. (2009). Mephedrone report. London, UK:
Institute of Psychiatry.
Rosenbaum, C. D., Carreiro, S. P., & Babu, K. M. (2012). Here Today, Gone
Tomorrow... and Back Again? A review of herbal marijuana alternatives (K2,
Spice), synthetic cathinones (bath salts), kratom, salvia divinorum, methoxetamine, and piperazines. Journal of Medical Toxicology, 8, 15-32.
Russell, C., Gregory, D., Ploeg, J., DiCenso, A., & Guyatt, G. (2005). Qualitative research.
In A. DiCenso, G. Guyatt, & D. Ciliska (Eds.), Evidence-based nursing: A guide to
clinical practice (pp. 120-135). St. Louis, MO: Elsevier Mosby.
Schepis, T., Marlowe, D. B., & Forman, R. F. (2008). The availability and portrayal of
stimulants over the Internet. Journal of Adolescent Health, 42, 458-465.
Schifano, F., Albanese, A., Fergus, F., Stair, J. L., Deluca, P., Corazza, O., et al. (2011).
Mephedrone 4-methylmethcathione: meow meow: Chemical pharmacological and clinical issues. Psychopharmacology, 214, 593-602.
Schumer, C. (2011). Schumer pushes to shut down online drug marketplace. Associated Press (NBC New York). Retrieved from http://www.nbcnewyork.com/
news/local/123187958.html (20.09.12).
Shulgin, A., & Shulgin, A. (1992). PIHKAL: A chemical love story. Berkeley, CA: Transform Books.
Shulgin, A., & Shulgin, A. (1997). TIHKAL: The continuation. Berkeley, CA: Transform
Books.
Silk Road forums. (2012). Retrieved from http://dkn255hz262ypmii.onion
(20.09.12).
Silk Road Sellers Guide. (2011). Restricted items. Sellers guide, Silk Road.
Retrieved from http://ianxz6zefk72ulzz.onion/index.php/silkroad/sellers_guide
(22.09.12).
Sinadinovic, K., Wennberg, P., & Berman, A. H. (2012). Targeting problematic users of
illicit drugs with Internet-based screening and brief intervention: A randomized
controlled trial. Drug and Alcohol Dependence (Early Online)
Sixsmith, J., Boneham, M., & Goldring, J. E. (2003). Accessing the community: Gaining
insider perspectives from the outside. Qualitative Health Research, 13, 578-589.
Solberg, U. (2012). Websites as a source of new drugs/legal highs. Recreational Drugs
European Network (RedNet News), 8.
ACCESSING THE
DEEP WEB
[Figure 1. Screenshots of deep Web query interfaces (HTML search forms) from Cars.com, Amazon.com, Biography.com, Apartments.com, 401carfinder.com, and 411locate.com, together with simple and advanced search forms for books and music.]
With its myriad databases and hidden content, this deep Web is an
important yet largely unexplored frontier for information search.
May 2007/Vol. 50, No. 5 COMMUNICATIONS OF THE ACM
[Figure 1b. Site, databases, and interface: bn.com, with a book database and a music database, each reachable through simple search and advanced search forms.]

end Web databases, each of which is searchable through one or more HTML forms as its query interfaces. For instance, as Figure 1(b) shows, bn.com is a deep Web site, providing several Web databases (a book database, a music database, among others) accessed via multiple query interfaces ("simple search" and "advanced search"). Note that our definition of deep Web site did not account for the virtual hosting case, where multiple Web sites can be hosted on the same physical IP address. Since identifying all the virtual hosts within an IP address is rather difficult to conduct in practice, we do not consider such cases in our survey. Our IP sampling-based estimation is thus accurate modulo the effect of virtual hosting.

When conducting the survey, we first find the number of query interfaces for each Web site, then the number of Web databases, and finally the number of deep Web sites.

First, as our survey specifically focuses on online databases, we differentiate and exclude non-query HTML forms (which do not access back-end databases) from query interfaces. In particular, HTML forms for login, subscription, registration, polling, and message posting are not query interfaces. Similarly, we also exclude "site search," which many Web sites now provide for searching HTML pages on their sites. These pages are statically linked at the "surface" of the sites; they are not dynamically assembled from an underlying database. Note that our survey considered only unique interfaces and removed duplicates; many Web pages contain the same query interfaces repeatedly, for example, in bn.com, the simple book search in Figure 1(b) is present in almost all pages.

Second, we survey Web databases and deep Web sites based on the discovered query interfaces. Specifically, we compute the number of Web databases by finding the set of query interfaces (within a site) that refer to the same database. In particular, for any two query interfaces, we randomly choose five objects from one and search them in the other. We judge that the two interfaces are searching the same database if and only if the objects from one interface can always be found in the other one. Finally, the recognition of deep Web site is rather simple: A Web site is a deep Web site if it has at least one query interface.

RESULTS

(Q1) Where to find "entrances" to databases? To access a Web database, we must first find its entrances: the query interfaces. How does an interface (if any) locate in a site, that is, at which depths? For each query interface, we measured the depth as the minimum number of hops from the root page of the site to the interface page.1 As this study required deep crawling of Web sites, we analyzed one-tenth of our total IP samples: a subset of 100,000 IPs. We tested each IP sample by making HTTP connections and found 281 Web servers. Exhaustively crawling these servers to depth 10, we found 24 of them are deep Web sites, which contained a total of 129 query interfaces representing 34 Web databases.

1 Such depth information is obtained by a simple revision of the wget software.
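
The depth measure used in Q1 (the minimum number of hops from a site's root page to an interface page, with crawling capped at depth 10) can be illustrated with a breadth-first search. The sketch below is only an illustration under stated assumptions: it works on a hypothetical in-memory link graph rather than a live crawl, and the function name `min_hops` and the toy site are invented here; the survey itself obtained depths from a modified wget.

```python
from collections import deque

def min_hops(links, root, max_depth=10):
    """Breadth-first search over a site's link graph.

    links: dict mapping a page to the pages it links to (hypothetical input).
    Returns {page: minimum number of hops from root}, exploring to max_depth.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if depth[page] >= max_depth:
            continue  # do not follow links beyond the crawl limit
        for target in links.get(page, ()):
            if target not in depth:  # first visit in BFS order is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Toy site (assumed for illustration): two search forms at different depths.
site = {
    "/": ["/books", "/about"],
    "/books": ["/books/search"],
    "/about": ["/help"],
    "/help": ["/books/advanced-search"],
}
depths = min_hops(site, "/")
print(depths["/books/search"], depths["/books/advanced-search"])
```

Breadth-first order is what makes the first recorded depth the minimum one, which matches the "minimum number of hops" wording of the survey.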
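
The rule the survey uses to decide whether two query interfaces search the same database (sample five objects from one interface and check that all of them can be found through the other) can be sketched as follows. This is a toy model, not the survey's tooling: each interface is represented by the set of objects it can return, and the names `same_database`, `group_interfaces`, and the example data are assumptions made for illustration.

```python
import random

def same_database(objects_a, found_in_b, k=5, rng=None):
    """Judge two interfaces as sharing a database if k random objects
    retrieved through interface A can all be found through interface B."""
    rng = rng or random.Random()
    pool = list(objects_a)
    sample = rng.sample(pool, min(k, len(pool)))
    return bool(sample) and all(found_in_b(obj) for obj in sample)

def group_interfaces(interfaces, k=5, seed=0):
    """Group a site's interfaces into databases using the pairwise test.

    interfaces: dict name -> set of objects reachable through that interface
    (a stand-in for issuing real queries against a form).
    """
    rng = random.Random(seed)
    groups = []  # each group lists interfaces judged to share one database
    for name, objects in interfaces.items():
        for group in groups:
            representative = interfaces[group[0]]
            if same_database(objects, representative.__contains__, k, rng):
                group.append(name)
                break
        else:
            groups.append([name])  # no match found: treat as a new database
    return groups

# Toy site (assumed): two book search forms over one catalogue, one music form.
site_interfaces = {
    "simple_books": set(range(100)),
    "advanced_books": set(range(100)),
    "music": set(range(1000, 1100)),
}
groups = group_interfaces(site_interfaces)
print(len(groups), groups)
```

The number of groups is the number of Web databases counted for the site, and a site with at least one query interface counts as a deep Web site.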
[Figure 2a. Distribution of Web databases over depth (x-axis: Depth, 0 to 10).]
[Figure 2b. Distribution of Web databases over subject category (x-axis: Subject Categories be, ci, nm, en, rs, he, go, rg, sc, ed, ah, si, re, ot).]

We found that query interfaces tend to locate shallowly in their sites: none of the 129 query interfaces had depth deeper than 5. To begin with, 72% (93 out of 129) interfaces were found within depth 3. Further, since a Web database may be accessed through multiple interfaces, we measured its depth as the minimum depths of all its interfaces: 94% (32 out of 34) Web databases appeared within depth 3; Figure 2(a) reports the depth distribution of the 34 Web databases. Finally, 91.6% (22 out of 24) deep Web sites had their databases within depth 3. (We refer to these ratios as depth-three coverage, which will guide our further larger-scale crawling in Q2.)

(Q2) What is the scale of the deep Web? We then tested and analyzed all of the 1,000,000 IP samples to estimate the scale of the deep Web. As just identified, with the high depth-three coverage, almost all Web databases can be identified within depth 3. We thus crawled to depth 3 for these one million IPs.

The crawling found 2,256 Web servers, among which we identified 126 deep Web sites, which contained a total of 406 query interfaces representing 190 Web databases. Extrapolating from the s = 1,000,000 unique IP samples to the entire IP space of t = 2,230,124,544 IPs, and accounting for the depth-three coverage, we estimate the number of deep Web sites as shown in Equation 2, the number of Web databases as shown in Equation 3, and the number of query interfaces as shown in Equation 4 (the results are rounded to 1,000).

[Equation 2. Estimated number of deep Web sites.]
[Equation 3. Estimated number of Web databases.]
[Equation 4. Estimated number of query interfaces.]

The second and third columns of Table 1 summarize the sampling and the estimation results respectively. We also compute the confidence interval of each estimated number at 99% level of confidence, as the 4th column of Table 1 shows, which evidently indicates the scale of the deep Web is well on the order of 10^5 sites. We also observed the multiplicity of access on the deep Web. On average, each deep Web site provides 1.5 databases, and each database supports 2.8 query interfaces.

The earlier survey of [1] estimated 43,000 to 96,000 deep Web sites by overlap analysis between pairs of search engines. Although [1] did not explicitly qualify what it measured as a search site, by comparison, it still indicates that our estimation of the scale of the deep Web (on the order of 10^5 sites), is quite accurate. Further, it has been expanding, resulting in a 3-7 times increase in the four years from 2000-2004.

(Q3) How structured is the deep Web? While information on the surface Web is mostly unstructured HTML text (and images), how is the nature of the deep Web data different? We classified Web databases into two types: unstructured databases, which provide data objects as unstructured media (text, images, audio, and video); and structured databases, which provide data objects as structured relational records with attribute-value pairs. For
instance, cnn.com has an unstructured database of news articles, while amazon.com has a structured database for books, which returns book records (for example, title = "gone with the wind", format = "paperback", price = "$7.99").

By manual querying and inspection of the 190 Web databases sampled, we found 43 unstructured and 147 structured. We similarly estimate their total numbers to be 102,000 and 348,000 respectively, as summarized in Table 1. Thus, the deep Web features mostly structured data sources, with a dominating ratio of 3.4:1 versus unstructured sources.

(Q4) What is the subject distribution of Web databases? With respect to the top-level categories of the yahoo.com directory as our taxonomy, we manually categorized the sampled 190 Web databases. Figure 2(b) shows the distribution of the 14 categories: Business & Economy (be), Computers & Internet (ci), News & Media (nm), Entertainment (en), Recreation & Sports (rs), Health (he), Government (go), Regional (rg), Society & Culture (sc), Education (ed), Arts & Humanities (ah), Science (si), Reference (re), and Others (ot).

The distribution indicates great subject diversity among Web databases, indicating the emergence and proliferation of Web databases are spanning well across all subject domains. While there seems to be a common perception that the deep Web is driven and dominated by e-commerce (for example, for product search), our survey indicates the contrary. To contrast, we further identify non-commerce categories from Figure 2(b): he, go, rg, sc, ed, ah, si, re, ot, which together occupy 51% (97 out of 190 databases), leaving only a slight minority of 49% to the rest of commerce sites (broadly defined). In comparison, the subject distribution of the surface Web, as characterized in [7], showed that commerce sites dominated with an 83% share. Thus, the trend of "deepening" emerges not only across all areas, but also relatively more significantly in the non-commerce ones.

(Q5) How do search engines cover the deep Web? Since some deep Web sources also provide "browse" directories with URL links to reach the hidden content, how effective is it to "crawl-and-index" the deep Web as search engines do for the surface Web? We thus investigated how popular search engines index data on the deep Web. In particular, we chose the three largest search engines Google (google.com), Yahoo (yahoo.com), and MSN (msn.com).

We randomly selected 20 Web databases from the 190 in our sampling result. For each database, first, we manually sampled five objects (result pages) as test data, by querying the source with some random words. We then, for each object collected, queried every search engine to test whether the page was indexed by formulating queries specifically matching the object page. (For instance, we used distinctive phrases that occurred in the object page as keywords and limited the search to only the source site.)

Figure 3 reports our finding: Google and Yahoo both indexed 32% of the deep Web objects, and MSN had the smallest coverage of 11%. However, there was significant overlap in what they covered: the combined coverage of the three largest search engines increased only to 37%, indicating they were indexing almost the same objects. In particular, as Figure 3 illustrates, Yahoo and Google overlapped

[Figure 3. Coverage of search engines, relative to the entire deep Web (100%): Google.com 32%, Yahoo.com 32%, MSN.com 11%, All 37%.]

Table 1. Sampling and estimation of the deep Web scale.

                      Sampling Results   Total Estimate   99% Confidence Interval
Deep Web sites              126              307,000        236,000 - 377,000
Web databases               190              450,000        366,000 - 535,000
  unstructured               43              102,000         62,000 - 142,000
  structured                147              348,000        275,000 - 423,000
Query interfaces            406            1,258,000      1,097,000 - 1,419,000
Figure 2a. Distribution of Web databases over depth (horizontal axis: Depth, 0 to 10).
Figure 2b. Distribution of Web databases over subject category (horizontal axis: Subject Categories be, ci, nm, en, rs, he, go, rg, sc, ed, ah, si, re, ot).
We found that query interfaces tend to locate shallowly in their sites: none of the 129 query interfaces had depth deeper than 5. To begin with, 72% (93 out of 129) interfaces were found within depth 3. Further, since a Web database may be accessed through multiple interfaces, we measured its depth as the minimum depths of all its interfaces: 94% (32 out of 34) Web databases appeared within depth 3; Figure 2(a) reports the depth distribution of the 34 Web databases. Finally, 91.6% (22 out of 24) deep Web sites had their databases within depth 3. (We refer to these ratios as depth-three coverage, which will guide our further larger-scale crawling in Q2.)
(Q2) What is the scale of the deep Web? We then tested and analyzed all of the 1,000,000 IP samples to estimate the scale of the deep Web. As just identified, with the high depth-three coverage, almost all Web databases can be identified within depth 3. We thus crawled to depth 3 for these one million IPs.
The crawling found 2,256 Web servers, among which we identified 126 deep Web sites, which contained a total of 406 query interfaces representing 190 Web databases. Extrapolating from the s = 1,000,000 unique IP samples to the entire IP space of t = 2,230,124,544 IPs, and accounting for the depth-three coverage, we estimate the number of deep Web sites as shown in Equation 2, the number of Web databases as shown in Equation 3, and the number of query interfaces as shown in Equation 4 (the results are rounded to 1,000). The second and third columns of Table 1 summarize the sampling and the estimation results respectively. We also compute the confidence interval of each estimated number at 99% level of confidence, as the 4th column of Table 1 shows, which evidently indicates the scale of the deep Web is well on the order of 10^5 sites. We also observed the multiplicity of access on the deep Web. On average, each deep Web site provides 1.5 databases, and each database supports 2.8 query interfaces.
(Equations 2, 3, and 4 appear as figures in the original layout.)
The earlier survey of [1] estimated 43,000 to 96,000 deep Web sites by overlap analysis between pairs of search engines. Although [1] did not explicitly qualify what it measured as a search site, by comparison, it still indicates that our estimation of the scale of the deep Web (on the order of 10^5 sites), is quite accurate. Further, it has been expanding, resulting in a 3–7 times increase in the four years from 2000–2004.
(Q3) How structured is the deep Web? While information on the surface Web is mostly unstructured HTML text (and images), how is the nature of the deep Web data different? We classified Web databases into two types: unstructured databases, which provide data objects as unstructured media (text, images, audio, and video); and structured databases, which provide data objects as structured relational records with attribute-value pairs. For
instance, cnn.com has an unstructured database of news articles, while amazon.com has a structured database for books, which returns book records (for example, title = "gone with the wind", format = "paperback", price = $7.99).
By manual querying and inspection of the 190 Web databases sampled, we found 43 unstructured and 147 structured. We similarly estimate their total numbers to be 102,000 and 348,000 respectively, as summarized in Table 1. Thus, the deep Web features mostly structured data sources, with a dominating ratio of 3.4:1 versus unstructured sources.
Table 1. Sampling and estimation of the deep Web scale.
                     Sampling Results   Total Estimate   99% Confidence Interval
Deep Web sites       126                307,000          236,000 - 377,000
Web databases        190                450,000          366,000 - 535,000
  unstructured       43                 102,000          62,000 - 142,000
  structured         147                348,000          275,000 - 423,000
Query interfaces     406                1,258,000        1,097,000 - 1,419,000
(Q4) What is the subject distribution of Web databases? With respect to the top-level categories of the yahoo.com directory as our taxonomy, we manually categorized the sampled 190 Web databases. Figure 2(b) shows the distribution of the 14 categories: Business & Economy (be), Computers & Internet (ci), News & Media (nm), Entertainment (en), Recreation & Sports (rs), Health (he), Government (go), Regional (rg), Society & Culture (sc), Education (ed), Arts & Humanities (ah), Science (si), Reference (re), and Others (ot).
The distribution indicates great subject diversity among Web databases, indicating the emergence and proliferation of Web databases are spanning well across all subject domains. While there seems to be a common perception that the deep Web is driven and dominated by e-commerce (for example, for product search), our survey indicates the contrary. To contrast, we further identify non-commerce categories from Figure 2(b): he, go, rg, sc, ed, ah, si, re, ot, which together occupy 51% (97 out of 190 databases), leaving only a slight minority of 49% to the rest of commerce sites (broadly defined). In comparison, the subject distribution of the surface Web, as characterized in [7], showed that commerce sites dominated with an 83% share. Thus, the trend of deepening emerges not only across all areas, but also relatively more significantly in the non-commerce ones.
(Q5) How do search engines cover the deep Web? Since some deep Web sources also provide browse directories with URL links to reach the hidden content, how effective is it to crawl-and-index the deep Web as search engines do for the surface Web? We thus investigated how popular search engines index data on the deep Web. In particular, we chose the three largest search engines Google (google.com), Yahoo (yahoo.com), and MSN (msn.com).
We randomly selected 20 Web databases from the 190 in our sampling result. For each database, first, we manually sampled five objects (result pages) as test data, by querying the source with some random words. We then, for each object collected, queried every search engine to test whether the page was indexed by formulating queries specifically matching the object page. (For instance, we used distinctive phrases that occurred in the object page as keywords and limited the search to only the source site.)
Figure 3. Coverage of search engines (share of the entire deep Web indexed: Google.com 32%, Yahoo.com 32%, MSN.com 11%, All 37%).
Figure 3 reports our finding: Google and Yahoo both indexed 32% of the deep Web objects, and MSN had the smallest coverage of 11%. However, there was significant overlap in what they covered: the combined coverage of the three largest search engines increased only to 37%, indicating they were indexing almost the same objects. In particular, as Figure 3 illustrates, Yahoo and Google overlapped
on 27% of the objects out of their 32% coverage: an 84% overlap. Moreover, MSN's coverage was entirely a subset of Yahoo, and thus a 100% overlap.
The coverage results reveal some interesting phenomena. On one hand, in contrast to the common perception, the deep Web is probably not inherently hidden or invisible: the major search engines were able to each index one-third (32%) of the data. On the other hand, however, the coverage seems bounded by an intrinsic limit. Combined, these major engines covered only marginally more than they did individually, due to their significant overlap. This phenomenon clearly contrasts with the surface Web where, as [7] reports, the overlap between engines is low, and combining them (or metasearch) can greatly improve coverage. In this case, for the deep Web, the fact that 63% of the objects were not indexed by any engine indicates certain inherent barriers for crawling and indexing data. Most Web databases remain invisible, providing no link-based access, and are thus not indexable by current crawling techniques; and even when crawlable, Web databases are rather dynamic, and thus crawling cannot keep up with their updates.
(Q6) What is the coverage of deep Web directories? Besides traditional search engines, several deep Web portal services have emerged online, providing deep Web directories that classify Web databases in some taxonomies. To measure their coverage, we surveyed four popular deep Web directories, as summarized in Table 2. For each directory service, we recorded the number of databases it claimed to have indexed (on their Web sites). As a result, completeplanet.com was the largest such directory, with over 70,000 databases (see footnote 2). As shown in Table 2, compared to our estimate, it covered only 15.6% of the total 450,000 Web databases. However, other directories covered even less, in the limited range of 0.2%–3.1%. We believe this extremely low coverage suggests that, with their apparently manual classification of Web databases, such directory-based indexing services can hardly scale for the deep Web.
Table 2. Coverage of deep-Web directories.
                     Number of Web Databases   Coverage
completeplanet.com   70,000                    15.6%
lii.org              14,000                    3.1%
turbo10.com          2,300                     0.5%
invisible-web.net    1,000                     0.2%
Footnote 2. However, we noticed that completeplanet.com also indexed "site search," which we have excluded; thus, its coverage could be overestimated.
CONCLUSION
For further discussion, we summarize the findings of this survey for the deep Web in Table 3 and make the following conclusions. While important for information search, the deep Web remains largely unexplored and is currently neither well supported nor well understood. The poor coverage of both its data (by search engines) and databases (by directory services) suggests that access to the deep Web is not adequately supported. In seeking to better understand the deep Web, we've determined that in some aspects it resembles the surface Web: it is large, fast-growing, and diverse. However, they differ in other aspects: the deep Web is more diversely distributed, is mostly structured, and suffers an inherent limitation of crawling.
To support effective access to the deep Web, although the crawl-and-index techniques widely used in popular search engines have been quite successful for the surface Web, such an access model may not be appropriate for the deep Web. Crawling will likely encounter the limit of coverage, which seems intrinsic because of the hidden and dynamic nature of Web databases. Further, indexing the crawled data will likely face the barrier of structural heterogeneity across the wide range of deep Web data. The current keyword-based indexing (which all search engines do), while serving the
Table 3. Summary of findings in our survey.
Aspect: Findings
scale: The deep Web is of a large scale of 307,000 sites, 450,000 databases, and 1,258,000 interfaces. It has been rapidly expanding, with a 3–7 times increase between 2000–2004.
diversity: The deep Web is diversely distributed across all subject areas. Although e-commerce is a main driving force, the trend of deepening emerges not only across all areas, but also relatively more significantly in the non-commerce ones.
structural complexity: Data sources on the deep Web are mostly structured, with a 3.4 ratio outnumbering unstructured sources, unlike the surface Web.
depth: Web databases tend to locate shallowly in their sites; the vast majority of 94% can be found at the top-3 levels.
search engine coverage: The deep Web is not entirely hidden from crawling: major search engines cover about one-third of the data. However, there seems to be an intrinsic limit of coverage: search engines combined cover roughly the same data, unlike the surface Web.
directory coverage: While some deep-Web directory services have started to index databases on the Web, their coverage is small, ranging from 0.2% to 15.6%.
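The headline figures of the survey can be cross-checked from the sampling counts quoted in the text. The following is a minimal Python sketch, not the authors' own computation. It assumes that each total estimate has the form count × (t / s) / coverage, which is how the text describes Equations 2 to 4, and it uses the exact depth-three fractions (22/24, 32/34, 93/129) rather than the rounded percentages. All function and variable names are illustrative.

```python
from fractions import Fraction

# Sampling parameters stated in the survey.
S_SAMPLED_IPS = 1_000_000
T_TOTAL_IPS = 2_230_124_544


def extrapolate(count, coverage):
    """Scale a sampled count to the whole IP space and correct for depth-three coverage.

    Assumed form: count * (t / s) / coverage.
    """
    return float(count * Fraction(T_TOTAL_IPS, S_SAMPLED_IPS) / coverage)


# Sampled counts and depth-three coverage ratios reported in the text.
estimates = {
    "sites": extrapolate(126, Fraction(22, 24)),
    "databases": extrapolate(190, Fraction(32, 34)),
    "interfaces": extrapolate(406, Fraction(93, 129)),
}

# Structured versus unstructured databases in the sample (147 versus 43).
structured_ratio = 147 / 43

# Directory coverage: claimed database counts relative to the 450,000 estimate.
claimed = {
    "completeplanet.com": 70_000,
    "lii.org": 14_000,
    "turbo10.com": 2_300,
    "invisible-web.net": 1_000,
}
directory_coverage = {name: n / 450_000 for name, n in claimed.items()}

# Search engine coverage in percent of sampled objects. MSN is a subset of
# Yahoo, so the union of all three equals the union of Google and Yahoo.
google, yahoo, both = 32, 32, 27
combined = google + yahoo - both      # percent indexed by at least one engine
overlap = both / google               # share of Google's objects also in Yahoo
unindexed = 100 - combined            # percent indexed by no engine

if __name__ == "__main__":
    for name, value in estimates.items():
        print(f"{name}: about {value:,.0f}")
    print(f"structured:unstructured = {structured_ratio:.1f}:1")
    for name, share in directory_coverage.items():
        print(f"{name}: {share:.1%}")
    print(f"combined {combined}%, overlap {overlap:.0%}, unindexed {unindexed}%")
```

Under these assumptions the three estimates agree with Table 1 to within about half a percent; small differences are expected because the published values are rounded to 1,000 and may have used rounded coverage percentages.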
THE EUROPEAN
PHYSICAL JOURNAL B
Regular Article
1 Introduction
Most studies of social networks concentrate on properties of groups formed due to attraction among participants. In such situations the links form between actors
sharing some similarity, for example common interest (e.g.
scientific collaboration networks or cross-linked Internet
sites) or likeness of views (e.g. political associations). Differences and conflicts are viewed as limitations and barriers to network formation. A frequent reaction to meeting
someone who holds an opposing view is not an attempt to convince (a phenomenon widely assumed in the
consensus formation models), but rather cutting off
the connection. In face-to-face encounters this avoidance
limits growth of networks based on contrariness. Perhaps
the best known form of links between communities based
on hate is provided by long term family or tribal feuds, and
these are usually limited in scope. However, the advance
of modern technologies has provided opportunities for indirect contacts, where it is possible to express hate and
aggression without the risk of reciprocal physical injury
or personal danger. This "bravery of being out of range"
allows hate based networks to form and flourish. In this
paper we present a study of specific communities that grow
thanks to disagreements, without any attempt to find consensus. Despite the fundamental motivational difference, some
properties of the studied groups are similar to the more
common friendship networks.
The system we study is based on records of linked
user comments related to news items published on the
Internet. Technology provides the necessary ease of access and anonymity. From the research point of view, such
discussions are relatively easy to document and can provide the necessary data for meaningful statistical work. We
have chosen discussion forums at one of the most popular
Internet portals and news sites in Poland, http://www.gazeta.pl.
We have limited our research to discussions
spurred by the Politics subsection of the news. The current
situation in Poland makes it an almost ideal ground for
such a study. There is an almost clearly bipolar split between
the two main parties (Platforma Obywatelska, PO, and
Prawo i Sprawiedliwość, PiS). The conflict shown at the
highest positions of the state is even more visible in the
group of active readers of Internet portals.
a e-mail: pawelsobko@gmail.com
The reason for choosing this particular forum is that
while preserving anonymity of the users, it also provides
relative recognizability of participants. This results from
the fact that only registered users are allowed to post
comments, and can be identified by registered nicknames.
We may assume that participant XYZ in one discussion
thread is the same person as participant XYZ in a different one. Of course this leaves open the possibility of
a single real person using multiple Internet personalities.
However, even with this limit on true identities, it is possible to try to find hubs of communication, both in the
comment writing and in reaction to published comments.
Our goal is to find whether the change in motivations for
linking from positive to negative influences general network properties.
2 Methods
The data for the study have been gathered using a dedicated program, written for the purpose of loading and initial analysis of the discussion threads at the selected site.
The program performed automated tasks of data collection and cleaning and enabled the next step, which consisted of assigning a political stance to discussion participants and classifying the comments. This part of
the analysis, by far the most time consuming, had to be done
by a human, by reading all the comments in a thread.
It should be noted that in almost all cases (with a single exception only), the whole discussions were actually
linked not to the original news article but to the first comment. This is a result of the operational process of the portal,
and the fact that pushing the "comment" button in typical
situations links the post not to the original source but to
the earliest existing post. To avoid spurious statistics this
phenomenon has been corrected for by the program.
The participants were assigned one of three possible types
(called nodeclass). Commentators whose viewpoints
were visibly in agreement with one of the two political factions
were assigned nodeclasses A and B. The remaining participants, for whom it was impossible to clearly determine
political views, were given nodeclass NA.
The comments were classified according to the following scheme:
Agr: comment agrees with the covered material (either
the original news coverage or the preceding comment
in a thread);
Dis: comment disagrees with the covered material;
Inv: comment is a direct invective and personal abuse of
the previous commentator;
Prv: provocation, a comment aimed at causing dissent,
often only weakly related to the topic of discussion;
Neu: comment is neutral in nature, neither in obvious
agreement nor disagreement;
Jst: "just stupid" comment, which is totally unrelated to
the topic of discussion, but without malicious intent;
Swi: comment signifying a switch in a participant's position leading to agreement between two previously opposing commentators.
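The participant types and comment categories above map naturally onto a small data structure. The following Python sketch is purely illustrative and is not the authors' program: the class names, field names and sample nicknames are hypothetical, and only the category labels themselves come from the scheme above.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeClass(Enum):
    """Participant type (hypothetical encoding of the three nodeclasses)."""
    A = "A"    # visibly in agreement with one political faction
    B = "B"    # visibly in agreement with the other faction
    NA = "NA"  # political views could not be clearly determined


class CommentType(Enum):
    """Comment categories, using the labels of the classification scheme."""
    AGR = "Agr"  # agrees with the covered material
    DIS = "Dis"  # disagrees with the covered material
    INV = "Inv"  # direct invective against the previous commentator
    PRV = "Prv"  # provocation aimed at causing dissent
    NEU = "Neu"  # neutral
    JST = "Jst"  # unrelated to the topic, without malicious intent
    SWI = "Swi"  # switch of position leading to agreement


@dataclass(frozen=True)
class Comment:
    author: str
    author_class: NodeClass
    kind: CommentType
    # Nickname of the user being answered; None when the post is attached
    # directly to the news item.
    reply_to: Optional[str] = None


def type_counts(comments):
    """Count how many comments fall into each category."""
    return Counter(c.kind for c in comments)


# Hypothetical three-post thread.
sample = [
    Comment("userA", NodeClass.A, CommentType.NEU),
    Comment("userB", NodeClass.B, CommentType.DIS, reply_to="userA"),
    Comment("userA", NodeClass.A, CommentType.INV, reply_to="userB"),
]
counts = type_counts(sample)
```

Keeping the labels as enum values means a manually coded record such as "Dis" can be converted with `CommentType("Dis")`, which raises an error on any label outside the scheme.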
Other works on computer mediated discussions in closed
communities used different message classification schemes,
for example Jeong [1,2] has proposed grouping comments
by categories such as Arg: argument for a given thesis
(corresponding to our Agr category); But: a challenge
(corresponding to our Dis category); Expl: for posts giving explanations; and Evid: for posts giving factual evidence. In our case, the explanations and evidence posts
were rather scarce, due to the political nature of the disputes.
We have therefore opted for a categorization that reflected
the emotional nature of communication, rather than a factual one.
Following the process of categorization we performed
standard analyses typical for network systems. As the literature on the subject is very rich we refer here to the
general overviews, for example [3–6]. It should be noted
that the average size of the network formed by posts related to a single news item was relatively small (from a few
tens to a few hundreds of comments), thus the statistical
spread of results for single discussion threads was rather
significant.
In our analysis we have used the publicly available program GUESS, developed and maintained by Eytan Adar
(http://graphexploration.cond.org/index.html, see
also [7]).
To understand the data we have developed a computer simulation model, which has resulted in quantitatively comparable system characteristics, allowing us to understand the role of the most important factors driving
the growth of comment networks. Details of the model
are given in Section 3.5. The programs and scripts used in
the analysis are available from the authors on request.
3 Results
3.1 General statistics of discussions and temporal
dynamics
The statistical properties of discussion threads depend,
obviously, on the visibility of the news stories they relate
to. Some news items are featured on the portal's opening page, so one would expect them to receive a
greater number of user comments. Our observations do not
confirm these expectations: the advantage of the front
page news is not significant. The users' activity does not
follow the editors' choices. Within the Politics category the
portal carries between 5 and 20 news items each day. On
a typical graphics display, a visitor sees the 4–6 most recent
news items (although the web page also has a "most commented" section, providing a shortcut to older, but popular stories). While the screen space and graphical clues
give no preferences (order of presentation is strictly temporal), the number of comments spurred by each story
varies significantly, depending on its content.
Discussion size distribution shows a definite fat-tail behaviour. In addition to news items that raise no commentary at all, and weakly commented ones (below 50–70 comments), there are quite a few mid-sized discussions (up to
about 200 comments) and occasional extended discussions
(between 200 and 500 posts).
The news related discussions are, by their very nature,
short-lived. While the portal allows viewing and commenting on
stories dating back more than 2 weeks, the comment frequency vanishes rapidly with time. Usually, there are very
few comments later than 24 h after publication, and practically none after 48 h. Numerical analysis of the threads
shows for many discussions a reasonably good fit with an exponential decay timeline, with a half-life of between 1 and
4 h. There are some exceptions, for example news which
gain popularity many hours after publication (this happens usually for stories published at night and commented
during business hours) or stories which get a "second life"
due to a quarrel between a few participants.
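As an illustration of how a half-life in this range could be read off a thread, the sketch below assumes that comment delays after publication are exponentially distributed. In that case the maximum-likelihood decay rate is the inverse of the mean delay, so the half-life is ln 2 times the mean delay. This is a simplified stand-in rather than the fitting procedure used for the results above, and the function names and synthetic data are hypothetical.

```python
import math


def half_life_hours(delays_h):
    """Estimate the half-life (in hours) of comment activity.

    delays_h: delays of individual comments after publication, in hours.
    Assumes exponentially distributed delays, so half-life = ln(2) * mean.
    """
    if not delays_h:
        raise ValueError("at least one comment delay is required")
    mean_delay = sum(delays_h) / len(delays_h)
    return math.log(2) * mean_delay


def synthetic_delays(half_life_h, n=1000):
    """Deterministic test data: evenly spaced quantiles of an exponential
    distribution with the requested half-life."""
    rate = math.log(2) / half_life_h
    return [-math.log(1 - (i + 0.5) / n) / rate for i in range(n)]


# A synthetic thread with a 2 h half-life is recovered to within about 1%.
estimate = half_life_hours(synthetic_delays(2.0))
```

Threads that gain a "second life" or are published at night would violate the exponential assumption, so in practice such cases would need to be inspected separately.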
It is important to note the difference between the time
scales typical for individual news items and related comments and the time scales of user activities and interactions.
While the comments have "on the spot", non-deliberative
characteristics, the user relationship network has persistence time scales of at least the duration of recorded observations (2 months). The interplay between the short-lived
635
such the ratio was much higher, for example 21% of the
220 posts in one of the discussions resulted from just two
long exchanges.
Recurrence of user nicknames connected with quarrels
in various threads has added plausibility to the hypothesis that they are largely due to the presence of "duellists":
users seeking each other's comments almost regardless
of the topic and creating or joining in the fights. For such
users the growth of ki and ko should be correlated. To
test these ideas, we have performed a cumulative statistical analysis for all participants in a set of 58 discussions.
This has been done using the assumption that the identity of
people remains fixed to nicknames within the whole scope
of the portal. The results are quite interesting. Out of almost
2000 users there were only a few with very high values
of indegree (16 with ki ≥ 50). Similarly, there were only
23 users with ko ≥ 50. Eleven users belonged to both
groups. The average outdegree was ko = 4.62, while the average indegree (excluding references to the original news sources,
to count only post-to-post links) was ki = 3.01.
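The degree bookkeeping described here can be sketched in a few lines. The Python example below is an illustration with hypothetical names and data, not the authors' program: every post adds one to its author's outdegree, while indegree counts only post-to-post links, so posts attached directly to a news item add nothing to any user's indegree.

```python
from collections import Counter


def degree_counts(posts):
    """Return (k_in, k_out) counters keyed by nickname.

    posts: iterable of (author, target) pairs, where target is the nickname
    of the user being answered, or None when the post is attached directly
    to the news item.
    """
    k_in, k_out = Counter(), Counter()
    for author, target in posts:
        k_out[author] += 1
        if target is not None:
            k_in[target] += 1
    return k_in, k_out


# Hypothetical thread: one post on the news item, then a short exchange.
posts = [
    ("userA", None),
    ("userB", "userA"),
    ("userA", "userB"),
    ("userC", "userA"),
]
k_in, k_out = degree_counts(posts)

# Consistency check against totals quoted elsewhere in the text:
# 1977 users, 9135 posts, 3194 of them linked directly to news items.
mean_k_out = 9135 / 1977
mean_k_in = (9135 - 3194) / 1977
```

With these totals the averages come out at about 4.62 and 3.01, matching the values quoted above.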
In addition to duellists, in the studied discussions we
have found a group of hyperactive users specializing in
abusive comments (known as trolls) who, while publishing a lot of comments, receive a much smaller number of
replies. For example one user has posted 236 times, receiving only 51 replies. Although trolls post highly provocative
comments, they are frequently ignored: most users seem
to know the rule "don't feed the troll".
Figure 1 shows the cumulative network topology for
the 58 analysed discussions. There were 1977 users, 9135
posts, out of which 3194 were linked directly to news
items. In the figures we have removed all such links, leaving only connections between the users. The two views
focus on outdegree (upper panel) and indegree. Multiple
connections between pairs of users are emphasized by link
width. We can clearly see how a few of the users dominate
the whole forum. Figure 2 presents the cumulative distributions P(ki) and P(ko), as well as the correlation between
ki and ko values. The two quantities are highly correlated,
especially for high values, with an overall correlation coefficient of 0.85.
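The text does not state which correlation measure was used. As an illustration only, the sketch below computes the Pearson coefficient over per-user (ki, ko) pairs; the choice of Pearson, the function name and the sample values are all assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-user degrees, one entry per nickname.
k_in_values = [0, 1, 3, 10, 52]
k_out_values = [1, 2, 2, 12, 60]
r = pearson(k_in_values, k_out_values)
```

Users with zero indegree should be kept in the input, since one-comment participants who receive no replies make up a large part of the population.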
Additional information about the network properties
may be provided by its clustering coefficient. As the studied network is a directed one, this quantity is sometimes
called transitivity. To focus on relationships between the
users, we first remove all links to the original news sources.
Because of the presence of multiple connections between
the same users there are two options for characterising
the network. In the first option, we simply register the
presence of a link between the users regardless of the actual number of connections. In this option, which we call
unweighted, all links have the same strength. In the second option, weighted, the weight of the link between two
users is naturally given by the number of comments. For
this scenario there are many ways of defining the clustering coefficient. We use the geometric mean
method proposed by Opsahl and Panzarasa [10]. The
calculated value for the unweighted option is CiU(Politics) = 0.0665,
while for the weighted option CiW(Politics) = 0.0866. Large
Fig. 1. Two views of the topology of the network connecting the users participating in 58 large and mid-size discussions within
one month. Right panel: size of the nodes corresponds to outdegree of the user. Left panel: size of the node corresponds to
indegree. Link width reflects the number of communications between the users: binary exchanges are clearly identifiable. Some
users have been identified by their nicknames. This makes it possible to identify notorious trolls (such as wrojoz and koloratura1), who
have many posts but relatively few responses, and controversy leaders, such as tuskomatolek and junkier (who have more
responses than posts). Despite the fact that almost 2000 users have participated in the discussions, only a few of them
dominate the exchanges, by their posting activity and by the concentration of responses, such as kralik111. A perfect example
of a user whose participation in discussions is motivated by negation and abuse of a particular opponent is given by rooboy
whose main target is kralik111.
Fig. 2. Cumulative indegree and outdegree distributions for 58 Politics mid-size and large discussions over a period of 30 days.
The third panel shows the correlation between k_i and k_o, with two squares indicating the notorious trolls, i.e. individuals posting
a lot of comments but getting only a few answers. The triangles indicate the controversy leaders, receiving significantly more
comments than they post. To be able to show the posts that have not resulted in any comment (with indegree equal to zero) on
the log-log scales we have artificially shifted them to k_i = 0.1.
panel in Fig. 2). We observe a crowd of one-comment participants and several popular and prolific ones. Moreover,
the duellists recognize each other and tend to join in the
sub-threads simply to spur new rounds of abuse.
An interesting psychological observation is the existence of impersonators of famous commentators. They
choose a nickname that is at first glance identical to
the original one, for example by adding unobtrusive parts
to the user name, such as changing from "XYZ" to "XYZ.",
which often goes unnoticed. This is the most aggressive
form of trolling. In most cases the views of the original user
and the impostor are radically different. The troll's intention is to create chaos and confusion, as an unsuspecting
reader often finds comments with radically different views
or even exchanges between apparently the same participant, quarrelling with himself.
Fig. 3. Top left panel: distribution P(L) of discussion sizes L for three forum topics: politics, sport and science. Filled points
show the number of news items that did not elicit any comments. Lines show power law fits. The large difference in exponent
for the sport forum is due to a much larger number of news items, most of which get no or almost no reaction at all. Bottom left
panel: average outdegree k_o for the Politics forum as a function of L. Right panel: correlation between k_o and the percentage
of discussion spent on binary exchanges (quarrels). Points show percentages of quarrels longer than 6 posts and longer than
4 posts. The data support the supposition that large k_o values are due to extended quarrels between a few participants.
Fig. 4. Indegree and outdegree distributions obtained from computer simulations without quarrels. The third panel shows poor
correlation between k_i and k_o.
Fig. 5. Indegree and outdegree distributions obtained from computer simulations including quarrels. The third panel shows
strong correlation between k_i and k_o.
4 Discussion
4.1 Internet networks and friendly discussion forums
Modern Internet activities contain many examples of social activities based on similarity of interests: music communities, online role playing games, friendship websites.
Several such networks have been studied by Grabowski
et al. [12,13]. It seems interesting to compare the comment
network and the ones formed by interlinked web sites. Here
again we nd some resemblance and some dierences. The
most important dierences are that web sites and their
links are much more stable than comments usually more
thought is given by the authors when deciding what their
pages should be linked to. Also, these links are usually
driven by common interest and views. One seldom nds
links to web sites showing opposite viewpoint . The last
dierence might be that generally there is less emotion and
more content in traditional web pages. Despite these differences the general properties of both types of networks
the two systems is the lack of observed significant correlation between indegree and outdegree for the blogs, with
a k_i and k_o correlation coefficient of only 0.16, much lower
than 0.85 in the Politics network dominated by bilateral
exchanges. Leskovec et al. propose a cascading model of
blog links and provide data on relative probability of various patterns of link connections. The binary exchanges of
our approach which would correspond to linear topology
of the cascade model are relatively less probable in the
blog case, where the cascades tend to be wide rather than
deep.
The discussions studied in this work are by no means
the only examples of hate present in the vast space of
the Internet. Chau et al. [16,17] study the network structure of Hate groups. These studies are important for us
for two reasons. First, they focus on bloggers, who enjoy
a lot of freedom to express their opinions and emotions.
Second, the authors use networking methods, similar to
the ones employed here. The network of users is formed
through formal subscriptions between blogs and through
impromptu comments posted to each other. This last aspect corresponds directly to our situation. While the political views studied by Chau and Xu are probably more extreme than the ones of the readers of the www.gazeta.pl
portal, the emotional reactions seem to be as strong. It
is quite interesting that the degree distribution for the
giant component of 273 nodes in the Chau and Xu network
exhibits power-law behaviour, $P(k) \sim k^{-\gamma}$, with exponent
$\gamma = 1.38$.
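As a rough illustration of how a power-law exponent of this kind can be estimated, the sketch below applies the standard continuous maximum-likelihood estimator, gamma = 1 + n / sum(ln(x / x_min)), to synthetic data. This is an assumption-laden example: the source does not say how the exponent 1.38 was fitted, real degree data are discrete, and the function names, sample size and the value 2.5 used for the synthetic sample are hypothetical.

```python
import math
import random


def powerlaw_mle(values, x_min):
    """Continuous maximum-likelihood estimate of gamma in P(x) ~ x^(-gamma), x >= x_min."""
    tail = [x for x in values if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    return 1.0 + len(tail) / log_sum


def sample_powerlaw(n, gamma, x_min, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]


# Synthetic check: generate data with a known exponent and recover it.
rng = random.Random(0)
synthetic = sample_powerlaw(20000, 2.5, 1.0, rng)
gamma_hat = powerlaw_mle(synthetic, 1.0)
```

For a small network such as the 273-node component mentioned above, the statistical uncertainty of any such estimate would be considerably larger than in this synthetic example.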
Despite the fact that the motivation for posting is radically different, the same phenomenon is observed in the
network discussions. We expect that the reason is again
technical. We noted that most of the heated exchanges
were related to early posts. This is due to the way the
discussion is visually fragmented into pages containing
100 posts; viewing the later comments requires more effort. Thus late comments linking directly to the original
news story are not immediately visible and at a disadvantage compared to early ones. Only in rare cases, if there is
an interesting discussion, might some later posts get a high
response rate despite this disadvantage.
4.4 Implications for consensus formation modelling
The last conclusion from our observation relates to a different domain of social modelling. We refer to computer
models of opinion formation (for a recent review see [27]).
Most such models use so-called agent-based societies and
assume that consensus is achieved through a series of exchanges between agents. Some models postulate a form
of averaging of opinions towards a mean value (for example [28,29]); others assume that as a result of
interaction between two agents one of them changes his
or her opinion to fit the other's [30]. Unfortunately, in large
part the studies concentrate on mathematical formalisms
or Monte Carlo simulations, and not on descriptions of
real-life phenomena. The need to bring simulations and
models closer to reality has been realized and voiced quite
a few times [31-33].
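The two families of interaction rules described above can be sketched in a few lines. This is a toy illustration only, not an implementation of the specific models in refs. [28-30]: the rule names, the parameter mu, the population size and the random pairing scheme are all assumptions made for the example.

```python
import random


def averaging_step(opinions, i, j, mu=0.5):
    """Both agents move toward each other by a fraction mu of their difference."""
    diff = opinions[j] - opinions[i]
    opinions[i] += mu * diff
    opinions[j] -= mu * diff


def switching_step(opinions, i, j):
    """Agent i abandons its own opinion and adopts that of agent j."""
    opinions[i] = opinions[j]


def simulate(step, n_agents=50, n_steps=5000, seed=1):
    """Apply an interaction rule to randomly chosen pairs of agents."""
    rng = random.Random(seed)
    opinions = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
    for _ in range(n_steps):
        i, j = rng.sample(range(n_agents), 2)  # two distinct agents
        step(opinions, i, j)
    return opinions


def spread(opinions):
    """Distance between the most extreme opinions in the population."""
    return max(opinions) - min(opinions)


averaged = simulate(averaging_step)
switched = simulate(switching_step)
```

Under either rule repeated interaction can only narrow the range of opinions in the population, which is why such models predict consensus and why the persistent disagreement reported in this study is hard for them to reproduce without modification.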
An interesting result from the present study is that
the exchanges studied here (voicing of opinions in a quasi-anonymous medium) may not lead to consensus formation at all, despite repeated interactions between participants. In certain situations, such interactions lead rather
to an increased rift between the participants. This effect should
be studied in more detail, as it possibly suggests modifications of the models of consensus formation in other
situations.
On the one hand, we could assume that this is a phenomenon specific to computer-mediated interactions, with
their lack of the face-to-face effects of increased responsibility, shyness, induced submissiveness and even sympathy.
Anonymity and lack of fear of retribution might embolden
the participants and also promote additional mischievousness (clearly visible in the presence of provocative posts).
Thus one might assume that the studied form of exchanges
is an exception to the general rule of opinions getting
closer as a result of interactions.
But everyday experience shows that even when people
meet face-to-face, with full use of non-verbal and emotional communication, the conflicting views may remain
stable. Both history and literature are full of examples of
undying feuds, where acts of aggression follow each other,
from Shakespearean Verona families to modern political
or ethnic strife. Observations of the Internet discussions
should therefore be augmented by sociological data on
flesh-and-blood conflicts and arguments, and the dynamics of the opinion shifts. But even before such studies are
done or referred to (which the present authors feel is beyond their competence) the basic assumptions of the sociophysical modelling of consensus formation should be expanded. This is a very interesting task, because ostensibly
we are faced with two incompatible sets of observations:
- Hard data and evidence to support their viewpoint,
participants in the studied Internet discussions tend
to hold to their opinions, strengthening their resolve
with each exchange. Within the analysed subset of the
discussions the conversion of opinion (even a simple
agreement with a statement from the opposing side) was
virtually absent. Interactions do not seem to lead to
opinion averaging or switching.
- Yet, most of the participants do have well-defined opinions. These must have formed in some way. There are
studies indicating a genetic/biological basis for some of
the political tendencies [34-37]. So perhaps the participants in our discussions did have a built-in tendency
to pick one of the sides of the divide, and to stick to it.
Regardless of genetic considerations the political attitudes are thought to be dependent on fairly stable
elements, such as childhood environment, which again
decreases the chances of reaching a consensus. But specific opinions on concrete events or people cannot be
genetically coded, nor can they be due to general cultural formation; they must be reached individually in each case.
Where do such influences come from?
Judging by the content of the analysed posts, we suggest in
our case the existence of two mechanisms: fast consensus formation within one's own group (including adoption of
common, stereotyped views and beliefs); and persistence
of differences with other groups. An interesting experimental confirmation of such a phenomenon has been published recently [38]. Knobloch-Westerwick and Meng note
that their "findings demonstrate that media users generally
choose messages that converge with pre-existing views. If
they take a look at the other side, they probably do not
anticipate being swayed in their views. [. . . ] The observed
selective intake may indeed play a large role for increased
polarization in the electorate and reduced mutual acceptance of political views." This finding is in full agreement
with the behaviour we report.
The persistence of differences of opinion exhibited in
online discussions studied in this work stands in contrast
to observations of Wu and Huberman [39,40], who measured a strong tendency towards moderate views in the
course of time for book ratings posted on Amazon.com.
However, there are significant differences between book
ratings and expression of political views. In the first case
the comments are generally favourable and the voiced
opinions are not influenced by personal feuds with other
commentators. Moreover, the spirit of book review is a
positive one, with the official aim of providing useful information for other users. This helpfulness of each of the
reviews is measured and displayed, which promotes prosociality and good behaviour. In the case of political disputes it is often the reception in one's own community
that counts, the show of force and verbal bashing of the
opponents. The goal of being admired by supporters and
References
1. A.C. Jeong, The American Journal of Distance Education 17, 25 (2003)
2. A.C. Jeong, Distance Education 26, 367 (2005)
3. S.N. Dorogovtsev, J.F.F. Mendes, Evolution of Networks: From Biological Nets to the Internet and WWW (Oxford University Press, 2003)
4. R. Albert, A.L. Barabási, Rev. Mod. Phys. 74, 67 (2002)
5. M.E.J. Newman, J. Stat. Phys. 101, 819 (2000)
6. M.E.J. Newman, D.J. Watts, S.H. Strogatz, Proc. Natl. Acad. Sci. USA 99, 2566 (2002)
7. E. Adar, in Proceedings of the SIGCHI conference on Human Factors in computing systems, ACM Press (2006), pp. 791-800
8. M.E.J. Newman, S.H. Strogatz, D.J. Watts, Phys. Rev. E 64, 026118 (2001)
9. A.L. Barabási, R. Albert, Science 286, 509 (1999)
10. T. Opsahl, P. Panzarasa, Social Networks 31, 155 (2009)
11. B.A. Huberman, D.M. Romero, F. Wu, Arxiv preprint arXiv:0809.3030 (2008)
12. A. Grabowski, N. Kruszewska, R.A. Kosiński, Eur. Phys. J. B 66, 107 (2008)
13. A. Grabowski, Eur. Phys. J. B 69, 605 (2009)
14. L.A. Adamic, N. Glance, in Proceedings of the 3rd international workshop on Link discovery (2005), pp. 36-43
15. J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst, in SIAM International Conference on Data Mining (SDM 2007) (2007)
Indexing and Access for Digital Libraries and the Internet: Human, Database, ...
Marcia J Bates
Journal of the American Society for Information Science (1986-1998); Nov 1998; 49, 13;
ABI/INFORM Global
pg. 1185
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
students to acquire advanced competence through individual study or optional courses, but if
there is no integrated and required training in the subject, it is probable that only a minority of
students will take advantage of them, and thus benefit from the full potential. Some universities
have chosen the opposite point of view, by organizing courses dedicated to the students of the
humanities within their computer science departments.
In recent times (but mainly in the Anglo-American environment!) attention has been increasingly
drawn to the essence of humanities computing, investigated from the point of view of teaching
and from the point of view of institutional organization. I note the important
conference on "Is Humanities Computing an Academic Discipline?", held under the auspices of
the Institute for Advanced Technology in the Humanities (IATH), at the University of Virginia [4];
a gathering of prominent individuals in the fields of computing and communications science,
and arts and humanities research [5]; two contributions of Willard McCarty [6]; a lively
discussion inside the Humanist Discussion Group [7], active on the internet; the book produced
by the Socrates European Program, cited above; and an interesting paper by John Lavagnino
[8].
The result of all these contributions is that the essential questions have been identified, along with
most of the right answers. I shall summarize the points that I consider definitely settled, although
they do not completely solve the problem which we are discussing in this paper.
These are the opinions of Willard McCarty.
Just a tool: otherwise intelligent colleagues refer to the computer as "just a tool" or simply "a bunch of techniques", as if ways of knowing did not have much to do with what is known. Because the computer is a meta-instrument (a means of constructing virtual instruments or models of knowing), we need to understand the effects of modelling on the work we do as humanists.
Creative expression and mechanical analysis: What is the relationship between creative expression and mechanical analysis? What scholarly role can the algorithmic machine play in the life of the mind as practising scholars live it, and how might this role best be carried out? The effects of computing may easily be overemphasized, and often are, but we have good reason to suspect that fundamental changes are afoot.
Mediation of thought by the machine: From the beginning it has been quite clear that humanities computing is centred on the mediation of thought by the machine and the implications and consequences of this mediation for scholarship. We are reminded by the cultural sea-change of which the computer is a most prominent manifestation that our older scholarly technologies, such as alphabetic writing, the codex, and printing, are technologies, and that they also shape our thinking.
Methodologies: What jumps immediately into focus is the importance of methodologies. When you teach humanities computing, what immediately becomes obvious is that the only subject you have to talk about is the methodology.
Computing and the humanities not separated: That computing and the humanities are fundamentally separate is an illusion caused by a lack of historical perspective and perpetuated in the discipline-based structure of our institutions.
Philosophical training: In the broad sense, philosophical questions naturally arise out of a machine that mediates knowledge and whose modelling of cognition reflects back on the question of how we know what we know. Philosophical training would seem a sine qua non because of its disciplined and systematic focus on logic and critical thinking skills, as well as a concern with how to interpret diverse representations of knowledge, including what philosophers and literary critics jointly refer to as hermeneutics.
Computing not purely utilitarian: The assumption that computing mimics what we already do, that it is purely utilitarian, would mean that projects were thoughtlessly undertaken, software then written and put out into the field, but it seems that we can save much grief by prior thought about the questions we'd want to ask.
The labour-saving myth: We know this myth to be silly; we know that only the dull, unimaginative scholar would not be inclined to do a better job with the time liberated from mechanical work. We also know that the computer does not so much save labour as change the nature as well as scope of what we labour at.
Research methods: We must objectify our research methods before we can compute the artefacts we study, and in so doing we bring out into the open what has formerly been hidden from view. Part of the problem has been the attitude in the humanities by which the physical bits and craftsmanship of research, its technology, are relegated to a lesser status.
Roly Sussex [9], about a new epistemology, observed that what is interesting about
computational methods is that these methods are providing us with both a new methodology
and a new epistemology. The notion of data is undergoing a reworking. Humanists are
learning to interpret statistical reports on what our software says the text is doing. This whole
process is tending to bring some areas of the Humanities closer to questions of methodology in
other disciplines, and indeed to make the Humanities more scientific.
Manfred Thaller [10]: We are dealing with methods, that is, the canon (or set of tools) needed to
increase the knowledge agreed to be proper to a particular academic field. Computer science is
a very wide ranging field. At one extreme, it is almost indistinguishable from mathematics and
logic; at another, it is virtually the same as electrical engineering. This, of course, is a
consequence of the genealogy of the field. Having widely different ancestors in itself, computer
science in turn became parent to a very mixed crowd of offspring. The existence of this wide
variety of disciplines, related to or spun off from computer science in general, implies two things.
First, there must be a core of computer science methods, which can be applied to a variety of
subjects. Second, for the application of this methodological core, a thorough understanding of
the knowledge domain to which it is applied is necessary. The variety of area specific computer
sciences is understandable from the need for specialized expertise in the knowledge domain of
each application. The core of all applied computer sciences is more than the sum of its
intellectual ancestors, which may themselves be inextricably associated with particular
knowledge domains. If we accept the assumption that the successful application of
computational methods strongly depends on the domain of knowledge to which it is applied,
then we also have to accept that applying computational methods without an understanding of
that domain will be disastrous.
We conclude that it is pointless to teach computer science to humanities scholars or students
unless it is directly related to their domain of expertise. We conclude that humanities
computing courses are likely to remain a transient phenomenon, unless they include an
understanding of what computer science is all about.
As I said, I consider the points examined so far as settled, but this does not entirely solve the
problem which we are discussing now. Before illustrating my opinion on it, it seems convenient
to clarify the reason why it is important to discuss the problem, and the limits of the discussion.
In fact, all this would not be worth spending our time on if it had no practical consequences for
academic organization. A discipline exists independently of the will of the scholars. It can be
acknowledged or refused, but if it really exists, it cannot be either created or destroyed. Besides
this, knowledge is theoretically unitary and interdisciplinary, and the separation of the
disciplines is only valid as a useful means for teaching, and partially for research.
The proposal of a (new) discipline concerns the official academic organization of the different
states. They are at last beginning to acknowledge the importance of teaching computer
applications (in this case, to the humanities) to the students, but, as we have seen above, their
approach is far from consistent. We must distinguish between basic computer literacy, which
may be usefully left to the computer scientists, and the teaching of applications for research. In this
case it is important to pose the problem of how the teachers themselves will be trained. The
idea of mechanically blending some courses of general computer science with the normal,
traditional courses of humanities in a curriculum is dangerous, and, as far as we can judge
from present experience, disastrous.
If it is not too late, we must try and persuade the academic organizations that humanities
computing as a discipline in fact exists, and how it is shaped. These are my arguments. I begin
by observing that the application of computing is not the same as the application of computers.
The computer, as I see it, is not the type of machine of which the tokens are in front of us,
on our desks or laps or palms, but the set of devices (not one device!) described by von
Neumann, as the realisation (we add) of the Turing universal machine, along the lines of the
construction of the ENIAC, EDVAC, and the Mark I.
The Turing machine is central in my approach to the problem of humanities computing, because
it is the abstract, logical (I prefer to avoid the term "mathematical") model underlying every
realisation of a computer. Only an abstract, logical model can clarify the methodological problems
raised by the meeting of the humanities with computers. In other words, I am separating the
concept of a normal machine, like the book or the typewriter or the calculator, from that of the
universal machine, of the automaton per se.
Such a view may of course be disputed, but if it is accepted, the next step is to realize that the
computer may be used in two different ways: (1) to simulate the behaviour of another machine,
because the computer can simulate any possible machine; (2) in its full capacity of computing
machine, that is, for the peculiarity which distinguishes the computer from all other machines,
which consists in the possibility to do computation as developed in the theory of recursive
functions. The first option is that adopted by those who would be content with teaching
basic computer literacy courses. The second requires as a matter of course the institution of an
independent discipline.
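The idea of one machine simulating another can be made concrete with a very small sketch. Here a general-purpose interpreter (the function) plays the role of the universal machine, while a table of rules describes the particular machine being simulated, in this case one that adds 1 to a binary number. The function name, the rule-table format and the example machine are hypothetical choices made for illustration, not anything taken from the essay.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Simulate a single-tape machine described by a rule table.

    rules maps (state, symbol) to (new_state, symbol_to_write, head_move),
    with head_move in {-1, 0, +1}. The machine stops in the state "halt".
    """
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol = cells.get(head, blank)
        state, write, move = rules[(state, symbol)]
        cells[head] = write
        head += move
        steps += 1
    if not cells:
        return ""
    low, high = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(low, high + 1)).strip(blank)


# Rule table for a machine that increments a binary number by one.
INCREMENT = {
    ("start", "0"): ("start", "0", +1),  # scan right to the end of the number
    ("start", "1"): ("start", "1", +1),
    ("start", "_"): ("carry", "_", -1),  # past the last digit: start carrying
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry = 0, carry moves left
    ("carry", "0"): ("halt", "1", 0),    # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", 0),    # overflow: write a new leading 1
}

result = run_turing_machine(INCREMENT, "1011")
```

Swapping in a different rule table changes which machine is being simulated without touching the interpreter, which is the sense of option (1) above; designing the rule table itself requires the kind of explicit formalization associated with option (2).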
The distinction is important, because it helps to establish why the application of computers
raises methodological problems, and to what extent it does so. Because it seems evident that
when the computer is applied in the humanities only so far as it simulates (does the work of) a
traditional machine, then no new methodological problems arise, because there is no
substantial difference from the traditional procedures, if not of speed and convenience.
On the contrary, when the computer is applied in its full capacity of running algorithms,
humanities are confronted with a radically new situation, for which there is no commonly
recognised methodology. Something new happened in the field of epistemology when A. Turing
published his famous paper "On Computable Numbers", because after it some of the rules which
help to build our knowledge were changed in a basic way.
The use of computers may require (or sometimes produces) a change in our minds. I would say
that the Turing machine is in fact a way of thinking, the formal way of thinking, which might have
remained restricted to the discipline of mathematics, had it not given birth, as a by-product, to
the computers. Although some of the elements of the new methodology were present in many
disciplines before the advent of computers, the systematic use of the Turing scheme, and the
possibility to use computers in humanities, is fundamentally altering part of all humanities
disciplines.
In order to be used in a proper way, that is, in order that it may give good results, or in any case
the wanted results, the Turing machine dictates some conditions, and particularly it dictates the
formalization of reasoning, and the formalization of data. If we accept this, we understand the
importance of teaching a good theory of formalization, and especially one which is valid in the
field of the humanities. As often in such instances, everybody has an intuitive idea of what
formalization is, but only a specialist in humanities computing can teach the right idea.
Computation is introducing into the humanities new methodological concepts and procedures,
especially as regards the formalization of problems and data representation.
On the other hand, it is easy to realise (a) that part of the humanities was "computed" well
before computers were used, and (b) that even where the computer is used as if it were a
common machine, it imposes some constraints on the form of data, which did not exist before.
The reflection on, and clarification of, all these fundamental issues seems both necessary and
urgent, as is, consequently, the foundation of an independent scientific discipline,
humanities computing, which studies the problems of formalization and models, crossing all
humanities disciplines (linguistics, literature, history, archaeology, history of art, history of music),
but which none of them can fully develop by itself.
(17 May 2002)
NOTES
[1] Robert Proctor: Defining the Humanities. Bloomington/Indianapolis: Indiana University Press 1998 (2nd
ed.). Another reason for me to cite this book is the interesting part about humanities curriculum, which should
be carefully considered, although, of course, the idea to introduce humanities computing is far from his view.
[2] See the URL <http://www.hd.uib.no/AcoHum/aco-hum.html> (21.4.2002).
[3] <http://www.hd.uib.no/AcoHum/book/> (21.4.2002), chapter 2: European studies on formal methods in the
humanities. Cf. also the list of academic centres of humanities computing by W. McCarty and M.
Kirschenbaum, in : Humanities computing units and institutional resources,
<http://www.kcl.ac.uk/humanities/cch/wlm/hcu/> (21.4.2002).
[4] Guy Fawkes Day 1999, cf. URL: <http://www.iath.virginia.edu/hcs/> (21.4.2002).
[5] Sponsored by The Computer Science and Telecommunications Board (CSTB) of the National Research
Council, in an attempt to explore the complexities of cross-disciplinary collaboration = American Council of
Learned Societies, Occasional Paper No. 41: Computing and the Humanities, cf. URL:
<http://www.acls.org/op41-toc.htm> (21.4.2002).
[6] W. McCarty: Poem and Algorithm. Humanities Computing in the Life and Place of the Mind. Keynote
speech for: HumanITies. Information Technology in the Arts and Humanities: Present Applications and Future
Perspectives, The Open University Milton Keynes 10 October 1998. W. McCarty: We would know how we
know what we know. In: The Transformation of Science: Research between Printed Information and the
Challenges of Electronic Networks. Max Planck Gesellschaft, Schloss Elmau, 31 May - 2 June 1999, URL:
<http://ilex.cc.kcl.ac.uk/wlm/essays/know/> (21.4.2002).
[7] Vol. 12; Centre for Computing in the Humanities, King's College London Cp. URL:
<http://www.princeton.edu/~mccarty/huanist/> (21.4.2002).
[8] Forms of Theory. Some Models for the Role of Theory in Humanities-Computing Scholarship, abstract in:
International seminars Computers, Literature and Philology (CLiP) 06.-09.12.2001, URL: <http://www.uniduisburg.de/FB3/CLiP2001/abstracts/Lavagnino-en.htm> (21.4.2002).
[9] Centre for Computing in the Humanities, vol. 13, No. 351 (note 7).
[10] Advanced Computing, cap. 2 (note 3).
You may send your comments and suggestions to:
orlandi@rmcisadu.let.uniroma1.it
Bibliographic reference: Orlandi, Tito. Is Humanities computing a discipline? Jahrbuch für
Computerphilologie [online], 2002, no. 4, pp. 51-58. Available at:
<http://computerphilologie.uni-muenchen.de/jg02/orlandi.html>. (Accessed on ...).
doi:10.1111/j.1083-6101.2009.01478.x
In a little more than a decade, the Internet has revolutionized mediated communication and communication flow. With the pace of change and the emergence of
new uses of the Internet (e.g., YouTube, MySpace) over this time, researchers have
continued to struggle with explaining various positive and negative effects of Internet
use that have garnered attention. Some have suggested that Internet use can enhance
living conditions by providing access to diverse information (Bauer, Gai, Kim,
Muth, & Wildman, 2002), widen users social circles (e.g., Hampton & Wellman,
2003; Katz & Aspden, 1997; Rheingold, 1993), and enhance psychological well-being
(Chen, Boase, & Wellman, 2002; Kang, 2007). Others have considered some potential
negative effects of the Internet, arguing that it can be an isolating medium leading
to loneliness, less social interaction with family members and friends (e.g., Kraut,
Patterson, Lundmark, Kiesler, Mukopadhyay, & Scherlis, 1998; Sanders, Field,
Diego, & Kaplan, 2000; Stoll, 1995; Turkle, 1996), and clinical depression (Young &
Rogers, 1998).
One negative effect that has received considerable attention over the last several
years is the extent to which people may become addicted to the Internet. The ongoing
evolution of Internet use and growth in the amount of time people spend using the
Internet has fueled this concern. Researchers have used different terms to describe
very similar types of behavior. These include problematic Internet use (Caplan,
2002; Davis, Besser, & Flett, 2002), pathological Internet use (Morahan-Martin
& Schumacher, 2000), Internet dependency (Anderson, 1998; Scherer, 1997), and
Internet addiction (Beard & Wolf, 2001; Griffiths, 1996; Young, 1996a). In the current
study we use the term Internet addiction for consistency. However, it must be noted
that conceptual confusion surrounding this emotion-laden term has made it difficult
to ascertain the precise psychopathology arguably associated with it (Shaffer, 2004).
For example, whereas terms such as dependency and addiction have a longstanding
history of being used interchangeably in the context of drug and alcohol abuse
(Eisenman, Dantzker, & Ellis, 2004), in media studies such terms have very different
historical meanings. For instance, dependence or reliance on a particular medium or
Journal of Computer-Mediated Communication 14 (2009) 988-1015 © 2009 International Communication Association
channel has been viewed as a normal consequence of using a medium to satisfy one's
communication needs (e.g., Ball-Rokeach, 1985; Rubin & Windahl, 1986), even if it
is associated with heavy use and extreme affinity with the medium.
Such divergent conceptualizations of Internet addiction result in two glaring
problems that researchers must remedy. First, without clarification, we are left to
struggle with distinguishing between use that may reflect mere dependency on a
medium (which media researchers have suggested is a normal consequence of media
use), mere heavy use (which may or may not be healthy), and actual addiction
(a pathological state as understood in contexts such as substance addiction). This,
in turn, leads to a lack of clarity among professionals and policymakers who
need to understand exactly what problems and symptoms, if any, they have to
address. Second, divergent conceptualizations hinder the advancement of theoretical
explanations about when Internet users exhibit characteristics of use that amounts
to addiction and the identification of antecedent factors/conditions that may
influence this psychological state.
In the current study, we draw on prior addiction research in an effort to
synthesize prior thinking on the current subject, and attempt to conceptualize
Internet addiction in a manner that is consistent with conceptualizations of addiction
in other contexts, such as substance abuse. This is necessary, because if Internet
addiction is a problematic phenomenon, it should have similar indicators and
psychosocial risk factors as other addictions (Shaffer, 2004). Because certain traits
or background characteristics have been considered to be significant predictors of
addiction in both Internet and other contexts such as alcoholism (Loos, 2002; Medora
& Woodward, 1991) and drug abuse (Rokach & Orzeck, 2003), we also suggest the
need to explore more deliberately users' background characteristics that may make a
user prone to Internet addictive behavior. We also examine whether motives for using
the Internet help explain such behavior. Research conducted within the uses and
gratifications perspective (U&G) over the past 3 decades has shown that background
characteristics and media-use motives can enhance or mitigate media effects (e.g.,
Rubin, 2002). Therefore, we examine how Internet-use motives and background
characteristics work together and help explain Internet addiction.
In this study we focus on the Internet, generally, rather than addiction to specific
content that may be available via the Internet. The possibility that people can be
addicted to general use of the Internet has been investigated in a group of previous
studies (e.g., Caplan, 2002; Davis, 2001; Young, 1996b). It may be that some people
turn to the Internet to fulfill needs for particular content (e.g., violence, pornography)
or behavior (e.g., gambling). However, our goal here is simply to add to prior research
that has suggested that people can be addicted to the Internet itself, but failed to
account for differences in users' background factors and motives that may contribute
to such a consequence.
Addiction
A number of scholars have suggested that addiction does not necessarily have to
involve abuse of a chemical intoxicant or substance (Griffiths, 1999; Young, 2004).
For example, the term addiction has been used to refer to a range of excessive
behaviors, such as gambling (Griffiths, 1990), video game playing (Keepers, 1990),
eating disorders (Lesieur & Blume, 1993), physical exercise (Morgan, 1979), and
media use (e.g., Horvath, 2004; Kubey, Lavin, & Barrows, 2001). Although such
behavioral addictions do not involve a chemical intoxicant or substance, a group
of researchers have proposed that some core indicators of behavioral addiction are
similar to those of chemical or substance addiction, such as loss of control, tolerance,
withdrawal, and negative life consequences (Brown, 1993; Lesieur & Blume, 1993;
Marks, 1990). It also has been suggested that individuals who engage in different
types of addictive behaviors share similar reasons, such as relief of anxiety, boredom,
and depression (Lesieur & Rosenthal, 1991; Zweben, 1987).
The Diagnostic and Statistical Manual of Mental Disorders (DSM) has been one
widely used source for identifying indicators of addiction. The DSM, published by the
American Psychiatric Association (APA), is a handbook that lists a diverse range of
mental disorders, including addiction, and criteria for diagnosing them. The DSM-IV
is the latest major revision, published in 1994. It divides disorders into four categories:
clinical disorders, cognitive disorders, mental retardation, and personality disorders.
Although Internet addiction, specifically, has not been recognized as a disorder by the
APA, it did recommend further research on overuse of the Internet and video games
(American Psychiatric Association, 2006). The APA's recommendation suggests the
value of exploring further whether Internet addiction can or should be categorized
as another type of addiction, as promoted by some researchers (Griffiths, 1999;
Young, 2004).
Internet Addiction
Concerns that people can become addicted to a medium pre-date the Internet. For
example, popular books such as The Plug-In Drug (Winn, 1977) referenced addictive
properties of television, and researchers have explored this further in recent years
(Horvath, 2004).
Regardless of medium, using an emotion-laden term such as addiction has
been controversial. This has been the case with the Internet as well. Nonetheless, it has
caught the attention of and spurred debate among the APA (American Psychiatric
Association, 2006), medical professionals, and social scientists. For example, at its
annual conference in June 2007, members of the APA considered a proposal to
include excessive Internet use as an addiction, but decided to table it for further
investigation (Mandell, 2007). Jerald J. Block, M.D., in an editorial published in
The American Journal of Psychiatry, suggested that Internet addiction has become an
increasingly commonplace compulsive-impulsive disorder and merits inclusion
as a common disorder in DSM-V (p. 306). However, other
medical professionals, such as Dr. Stuart Gitlow of the American Society of Addiction
Medicine, have rejected such a suggestion and argued that there is not enough evidence
that Internet addiction is a complex physiological state close to alcoholism or drug
addiction (Martin, 2007).
There also has been a lack of agreement among social scientists. While some
have promoted the notion that Internet addiction can or should be categorized as
another type of addiction (Griffiths, 1999; Young, 2004), others have contended
that we should focus more on other sources of maladjustment that lead people to
unhealthy use of the Internet rather than on the Internet itself (e.g., Walther, 1999).
The lack of agreement among the medical and scholarly communities implies that
a clear definition of this disorder has yet to be developed (Shaffer, Hall, & Vander
Bilt, 2000).
Even with some unresolved issues, a growing body of research has suggested that
DSM-IV may offer the most promise for identifying Internet addiction (Brenner,
1997; Thatcher & Goolam, 2005; Widyanto & McMurran, 2004; Young, 1996a) or
addiction to specific online content (e.g., sexual content) (Bingham & Piotrowski,
1996). Others have used the DSM-IV criteria to conceptualize and operationalize
addiction to other media such as television (e.g., Horvath, 2004; Winn, 1977).
Using the DSM diagnostic criteria of substance addiction (e.g., alcohol,
cocaine),1 Goldberg (1996) specified four criteria for diagnosing Internet addiction: 1)
one needs to increase the amount of time spent online to achieve the same effect
(tolerance), 2) one experiences an unpleasant feeling when he/she is not online
(withdrawal), 3) one needs to access the Internet more often and for longer periods
of time (craving), and 4) one experiences conflicts between Internet use and other
activities (negative life outcomes). Griffiths (1998) added three more criteria: 1) using
the Internet becomes the most important activity in one's life (salience), 2) one uses
the Internet to alter his/her mood (mood modification), and 3) one keeps going back
to his/her old Internet use pattern after unsuccessful efforts to cut down (relapse).
Young (1997) defined Internet addiction as a type of impulse control disorder. She
created a 20-item Internet addiction scale based on the DSM-IV diagnostic criteria
used for diagnosing substance addiction and pathological gambling. Indeed, Young
(1996b, 1998) found that addictive Internet users exhibited tolerance, withdrawal,
and negative academic and occupational consequences that were consistent with
those exhibited by substance abusers.
In light of this line of research, Kubey et al. (2001) suggested that pathological
users of the Internet were engaged in a much more excessive form of use than mere
reliance or dependence. Whereas many Internet users may spend a great deal of time
online, heavy use or reliance does not necessarily reflect what may be one of the
most important characteristics of Internet addiction: the loss of control. It has been
suggested, for example, that those who struggle with Internet addiction are compelled
to spend significant time involved with various Internet activities even though these
activities cause them to neglect family, work, or school obligations. These intemperate
problems reflect a user's loss of control over Internet use, increasing involvement
with the Internet and an inability to curtail this involvement in spite of adverse
consequences associated with such use (Shaffer, 2004). Such a loss of control is
reflected in DSM-IV criteria for identifying addiction in various contexts.
Unfortunately, although the DSM-IV criteria for diagnosing pathological
gambling and substance addiction have provided criteria that have been used for
identifying Internet addiction, most research has not been theoretically grounded.
Therefore, we do not have a good overarching theoretical picture of relationships
among variables that may predict Internet addiction. As Kubey et al. (2001) argued,
there is a need, at a minimum, for theoretical explanations why the Internet may have
a hold on some individuals. In the current study, we use an audience-centered media
effects approach, uses and gratifications (U&G) theory, to study Internet addiction.
U&G focuses specifically on how various media user background characteristics,
motives for using media, and media use patterns work in concert to influence
effects. Thus, it provides a theoretical framework with which we can consider
the relative contribution of social and psychological antecedent factors that have
predicted addiction in other contexts (e.g., substance addiction), and media-use
motive variables that have been linked to addiction to other media (e.g., television)
to Internet addiction.
Uses and Gratifications Theory (U&G)
Shyness
According to McKenna and Bargh (2000), individuals who feel lonely because of
their lack of good social skills try to overcome their problems through online
social interactions. As in the case of shy individuals, lonely people may use the
Internet for social compensation when they are not satisfied with their offline
interpersonal relationships (Papacharissi & Rubin, 2000). Reliance on the Internet to
alleviate loneliness may lead to problematic Internet use (Caplan, 2002, 2003; Davis,
2001). Kubey et al. (2001) also suggested a link between loneliness and Internet
addiction, claiming that lonely people feel socially incompetent and tend to feel
more comfortable with online activities. Outside of the Internet context, loneliness
has been linked to drug use (Grunbaum, Tortolero, Weller, & Gingiss, 2000) and
alcoholism (e.g., Akerlind & Hornquist, 1989; Loos, 2002; Medora & Woodward,
1991; Nerviano & Gross, 1976). Based on this prior research, we predicted a positive
association between loneliness and Internet addiction.
Locus of control
Locus of control refers to an individual's belief about the extent to which he/she
is in control of his/her life (i.e., internal locus of control) vis-à-vis the extent to
which he/she believes external forces (e.g., other people or chance) are in control
of his/her life (i.e., external locus of control) (Rotter, 1966). According to Chak
and Leung (2004), individuals who believed that they had control over their lives
were less likely to be addicted to the Internet, because they believed that they could
maintain healthy Internet use behaviors. If that argument has merit, individuals who
believe that external factors control their lives may be more susceptible to Internet
addiction. In other media contexts, Wober and Gunter (1982) found that individuals
who were externally controlled were heavier TV viewers than those who were
internally controlled. External control also has been linked to problematic effects of
television use, such as increased aggression (Haridakis, 2002). Researchers have found
that high external locus of control scores in adolescents predicted heavy substance
use (Bearinger & Blum, 1997) and alcohol use (Steele, Forehand, Armistead, &
Brody, 1995).
Self-esteem
According to Baumeister (1993) and Swann (1996), individuals with low self-esteem
have negative evaluations about themselves and are suspicious of praise. In order to
withdraw or escape from these negative evaluations and stresses, individuals with
low self-esteem tend to engage in addictive behavior such as substance abuse (e.g.,
Craig, 1995; Hirschman, 1992; Marlatt, Baer, Donovan, & Kivlahan, 1988). In the
context of Internet use behavior, Armstrong, Phillips, and Saling (2000) found that
low self-esteem was a significant positive predictor of addictive Internet use. Outside
of the Internet context, Peele (1985) suggested that one of the reasons people may
become addicted to media use is to bolster their self-esteem. Consistent with the
research results of Armstrong et al. (2000) and Peele (1985), we treat self-esteem as a
possible negative predictor of Internet addiction.
Applying these findings from prior research on the relationships between diverse
background characteristics and substance or behavioral addiction, the following
hypotheses are posed:
H1a: Shyness, sensation-seeking, and loneliness will be positively related to Internet addiction.
H1b: Internal locus of control and self-esteem will be negatively related to Internet addiction.
Amount of Use
U&G, the theoretical framework guiding this study, suggests that exposure to a
medium is an important antecedent to media effects. U&G also suggests that media
use can be related to unintended consequences of use, such as Internet addiction.
In fact, Widyanto and McMurran (2004) found that the higher the amount of time
spent online, the greater the extent of symptoms of Internet addiction. Leung (2004)
also suggested that hours spent on the Internet per day was a positive predictor of
Internet addiction. Similarly, Horvath (2004) found that those who measured higher
than their counterparts on a measure of television addiction tended to be heavier
television viewers. The results of these studies indicate that amount of Internet use
and Internet addiction have been treated as distinct but related concepts in prior
Internet addiction research. If, as prior research suggests, heavier users of a medium
are likely to be more prone to be addicted to the medium, the amount of use is an
important variable to consider.
H2: The amount of Internet use will be positively associated with Internet addiction.
Kubey et al.'s (2001) claim that addictive Internet users use the Internet to meet
others suggests the importance of examining communication motives for using
the Internet. Peele's (1985) claim that individuals addicted to media use them to
gain a sense of control in their lives and to bolster self-esteem also suggests the
importance of considering the role of motives when exploring predictors of Internet
addiction. U&G has been one of the predominant theoretical frameworks used to
study the influence of media use motives on media effects over the last 30 years or so.
Researchers specifically have suggested that people use the Internet for a variety of
interpersonal (e.g., affection, inclusion, social interaction) and media-related reasons
(e.g., entertainment, information seeking, passing time, escape) (e.g., Charney &
Greenberg, 2002; Ebersole, 2000; Ferguson & Perse, 2000; Kaye & Johnson, 2004;
Papacharissi & Rubin, 2000). Accordingly, we assessed a range of such motives
individuals may have for using the Internet.
Some researchers have considered the influence of motives for using the Internet
on both Internet dependency (authors, in press) and Internet addiction (Chou &
Hsiao, 2000; LaRose, Lin, & Eastin, 2003). However, little research has truly
explored possible links between the range of motives individuals may have for
using the Internet and Internet
addiction. This is a significant gap in the research, because prior media use research
suggests that motives impact effects (see Rubin, 2002 for a review of studies).
Specifically, it has been suggested that more purposive and instrumental use (e.g.,
information seeking, control, caring for others) may inhibit negative outcomes and
that more habitual use (e.g., habitual entertainment, escape) enhances the likelihood
of unintended, and potentially negative outcomes of use (Song, LaRose, Eastin, &
Lin, 2004). We wanted to see if this was the case for Internet addiction.
Integrating all the previous research about the effects of user characteristics, media
use motives, and the amount of use on addiction, the following research question is
put forth:
RQ1: How do users' background characteristics, motives, and the amount of Internet use
contribute to Internet addiction?
Research Methods
Sample
The sample included 203 undergraduate students ranging from freshmen to seniors
from a variety of majors enrolled in a multisection course required as part of a large
Midwestern U.S. university's liberal education requirement. The sample was 48%
men and 52% women. The mean age was 21.5 years (SD = 5.32). Students were asked
to come into a classroom and complete a pen-and-paper survey. Given the exploratory
nature of this research, we felt the sample was appropriate. College students tend
to use a variety of Internet functions (Morahan-Martin & Schumacher, 2000). In
addition, computers and the Internet were widely available across campus, and all
students were required to use the Internet.
Measures
Internet addiction scale
Internet addiction was measured by asking respondents how often they engaged
in each of 31 indicators of Internet addiction (1 = Never, 5 = Very Often). This
index consisted of 20 items from Young's (1996a) Internet Addiction Test (IAT)
and 11 items from Horvath's (2004) Television Addiction Scale. Both measures are
based on DSM-IV criteria in line with the assumption that media addiction shows
symptoms that are similar to those of addiction to other devices/substances (e.g., drugs).
We chose Young's scale for this study because it had been widely used
for measuring Internet addiction in previous research (e.g., Chak & Leung, 2004;
Hur, 2006; Pratarelli, Browne, & Johnson, 1999; Thatcher & Goolam, 2005). We
added items from Horvath's (2004) Television Addiction Scale because
we felt it was prudent to encompass additional DSM-IV criteria that were not
included in Young's scale. Because the measure we used comprised items
drawn from different media addiction scales, we subjected it to principal components
factor analysis with varimax rotation to uncover any possible underlying component
structure. Factors with eigenvalues of at least 1.0, primary loadings of at least .50, and
no items that loaded significantly on another factor (i.e., a larger than .20 difference
between primary and secondary loadings) were retained. Five factors emerged when
all 31 items were entered into the factor analysis. However, three items from escaping
reality, two items from attachment, all four items from the fourth factor, and all three
items from the fifth factor were removed because they cross-loaded on more
than one factor. The remaining 19 items formed three factors, which
explained 62.5% of the variance after rotation. Responses to the items that
loaded on each factor were summed and averaged to create an index of each Internet
addiction dimension.
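The item-retention rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `retained_items`, the item names, and the loadings are hypothetical, and the sketch covers the two item-level criteria (primary loading of at least .50, and a gap larger than .20 between primary and secondary loadings), not the eigenvalue criterion, which applies to factors rather than items.

```python
from typing import Dict, List

PRIMARY_MIN = 0.50  # minimum primary loading stated in the text
GAP_MIN = 0.20      # required gap between primary and secondary loadings


def retained_items(loadings: Dict[str, List[float]]) -> Dict[str, int]:
    """Map each retained item to the index of the factor it loads on."""
    kept = {}
    for item, row in loadings.items():
        # Rank absolute loadings so the sign of a loading does not matter.
        ranked = sorted((abs(v) for v in row), reverse=True)
        primary, secondary = ranked[0], ranked[1]
        if primary >= PRIMARY_MIN and primary - secondary > GAP_MIN:
            kept[item] = max(range(len(row)), key=lambda i: abs(row[i]))
    return kept


# Hypothetical rotated loadings (one row per item, one column per factor).
example = {
    "item_a": [0.75, 0.16, 0.24],  # clean primary loading: kept
    "item_b": [0.55, 0.48, 0.10],  # cross-loads (gap of .07): dropped
    "item_c": [0.10, 0.83, 0.14],  # clean loading on the second factor: kept
    "item_d": [0.42, 0.20, 0.05],  # primary loading below .50: dropped
}

print(retained_items(example))
```

Items that fail either check would be removed before the factor indexes are computed, which is the role the cross-loading removals play in the procedure reported here.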
Factor 1, intrusion (eigenvalue = 8.78), explained 46.2% of the variance after
rotation. Items comprising this factor reflected that using the Internet became
intrusive to participants' everyday life (M = 1.66, SD = 0.77, α = .92) (e.g., "I often
find that I stay online longer than I intended," "I often neglect household chores to
spend more time online"). Factor 2, escaping reality (eigenvalue = 2.06), explained
10.8% of the variance. This factor suggested that the Internet was a tool for escaping
reality (M = 2.63, SD = 0.91, α = .90) (e.g., "I often block out disturbing thoughts
about my life with soothing thoughts of using the Internet," "I often snap, yell,
or act annoyed if someone bothers me while I am online"). Factor 3, attachment
(eigenvalue = 1.04), explained 5.5% of the variance. This factor reflected a strong
attachment or affinity for the Internet (M = 1.93, SD = 0.92, r = .43) (i.e., "I can't
imagine living without the Internet," "When I am unable to use the Internet, I miss
it so much that I feel upset"). The final results of the factor analysis are depicted in
Table 1.
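As a quick arithmetic check on the figures above: in a principal components analysis of standardized items, the total variance equals the number of items, so a factor's share of variance is its eigenvalue divided by the item count. The sketch below assumes the 19 retained items are the relevant base; the helper name `percent_of_variance` is illustrative only.

```python
N_ITEMS = 19  # items retained in the final three-factor solution

# Eigenvalues as reported in the text.
eigenvalues = {"intrusion": 8.78, "escaping reality": 2.06, "attachment": 1.04}


def percent_of_variance(eigenvalue: float, n_items: int = N_ITEMS) -> float:
    """Share of total variance (in percent) attributed to one factor."""
    return round(100 * eigenvalue / n_items, 1)


shares = {name: percent_of_variance(ev) for name, ev in eigenvalues.items()}
total = round(100 * sum(eigenvalues.values()) / N_ITEMS, 1)

print(shares)
print(total)
```

Under that assumption, the per-factor shares and their total agree with the percentages reported for the three factors.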
Table 1 Factor loadings of the Internet addiction items

Item                   Intrusion   Escaping reality   Attachment
Intrusion 1               .75            .16              .24
Intrusion 2               .74            .07              .00
Intrusion 3               .74            .31             -.01
Intrusion 4               .74            .05             -.08
Intrusion 5               .70            .26              .30
Intrusion 6               .68            .22              .16
Intrusion 7               .66            .43              .24
Intrusion 8               .63            .44              .23
Intrusion 9               .62            .35              .29
Intrusion 10              .54            .41              .31
Escaping reality 1        .10            .83              .14
Escaping reality 2        .33            .79              .11
Escaping reality 3        .23            .73              .01
Escaping reality 4        .13            .71              .26
Escaping reality 5        .34            .70              .12
Escaping reality 6        .12            .68              .37
Escaping reality 7        .36            .67              .18
Attachment 1              .25            .12              .78
Attachment 2              .05            .36              .73
M                        1.66           2.63             1.93
SD                        .77            .91              .92

Amount of Internet use
General Internet use behavior was measured with two questions asking how much
time participants spent using the Internet yesterday (the day before they participated
in the survey) and how much time they spent using the Internet on a typical day. These
two items have been used in research to measure the use of other media, such as television,
and have produced reliable estimates (Haridakis, 2002). Answers to the two questions
were summed and averaged (M = 194 minutes, SD = 117.7).
Motives
Internet-use motives were measured with a 45-item Internet motives scale used
in prior research (Papacharissi & Rubin, 2000). This scale taps several motives
associated with using the Internet, ranging from interpersonal motives (e.g., inclusion,
control, affection) to media-use motives gleaned from prior media research (e.g.,
entertainment, escape, pass time, information seeking). We added four
items taken from Rubin's (1983) television motives scale that were not covered in
the Internet motives scale. These items reflected using the Internet for thrill and
excitement. Respondents were asked how well each of the 49 statements matched
their own reasons for using the Internet (1 = Not at all, 5 = Exactly). All items
were subjected to principal components factor analysis with varimax rotation. To
retain a factor, we used the same criteria used in the factor analysis of the Internet
addiction measure. Eight factors emerged when all 49 items were entered into the
factor analysis, but 20 items were removed. Specifically, five items from habitual
entertainment, one item from seeking information, three items from escapism, four
items from control, all four items from the seventh factor, and all three items from
the eighth factor were removed because of their high cross-loading values. The
remaining 29 items formed six factors that explained 62.4% of the variance after
rotation. Responses to the items that loaded on each factor were summed and averaged
to create respective indexes of motives.
The first motive, habitual entertainment (eigenvalue = 11.44), explained 34.7%
of the variance after rotation. This factor was composed of items that reflected
both habitual use and using the Internet to be entertained (M = 1.66, SD = 0.77,
α = .93) (e.g., "Because it's fun just to play around and check things out," "Because
it's just a habit, just something to do"). The second motive, caring for others
(eigenvalue = 3.06), explained 9.3% of the variance. This factor reflected using the
Internet to show others affection and care (M = 2.45, SD = 0.83, α = .85) (e.g.,
"To help others," "To let others know I care about their feelings"). Factor 3,
economical information seeking (eigenvalue = 2.24), explained 6.8% of the variance
and contained items reflecting using the Internet to search for and share information
conveniently (M = 4.00, SD = 0.66, α = .78) (e.g., "To get information for free"
and "Because it is cheaper than other ways of sending information to other people").
The fourth factor, excitement (eigenvalue = 1.51), explained 4.6% of the variance.
This three-item factor reflected using the Internet to seek excitement and thrill
(M = 2.72, SD = 1.10, α = .90) (e.g., "Because it is thrilling," "Because it is
exciting"). Factor 5, control (eigenvalue = 1.27), explained 3.8% of the variance.
Items comprising this factor reflected that people used the Internet to affect and
control others' behavior (M = 2.69, SD = 0.67, α = .73) (e.g., "To tell others what
to watch or see," "Because I want someone to do something for me"). The final factor,
escape (eigenvalue = 1.05), explained 3.2% of the variance. This factor included two
items, "So I can get away from what I'm doing" and "So I can forget about school,
work or other things" (M = 2.96, SD = 1.20, r = .67). The final results of the factor
analysis of the motives scale are depicted in Table 2.
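The index construction and internal-consistency statistics reported for the multi-item factors can be illustrated with a small sketch. It assumes that the reliability coefficient is Cronbach's alpha and that each index is a respondent's item scores summed and divided by the number of items; the response data below are hypothetical and are not taken from the study.

```python
from statistics import variance
from typing import List


def composite_index(responses: List[List[int]]) -> List[float]:
    """Sum each respondent's item scores and divide by the number of items."""
    return [sum(row) / len(row) for row in responses]


def cronbach_alpha(responses: List[List[int]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])
    item_columns = list(zip(*responses))          # one tuple of scores per item
    item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var / total_var)


# Hypothetical 5-point responses: rows are respondents, columns are three items.
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 5, 4],
    [5, 4, 5],
]

print(composite_index(responses))
print(round(cronbach_alpha(responses), 2))
```

For two-item factors such as attachment and escape, the article reports a Pearson correlation between the two items instead of alpha.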
Background characteristics
[Table 2. Factor loadings of the Internet-use motives items. Only two columns of loadings (one headed CO) survived extraction, without item labels; values omitted.]
[Table: statistics for the Escaping reality and Attachment dimensions (significance flagged with * and **); row labels and layout not recoverable.]
[Table: regression results for Intrusion, Escaping Reality, and Attachment; row labels and cell values not recoverable.]
Note. All βs are final βs on the last step of the regression. N = 204.
*p < .05, **p < .01
Discussion
The first dimension, intrusion, reflects a manifestation of Internet use in which users
neglect activities in their everyday lives (e.g., chores, etc.) due to their unhealthy
Internet use. Individuals who exhibit this form of use tend to use the Internet for
longer periods than they intend. They seem aware of their problematic Internet
use, but are unable to correct it satisfactorily. The second dimension, which we
term escaping reality, seems to be a more intense manifestation of possible Internet
addiction than either of the other two dimensions. Whereas intrusion reflects a sense
that the Internet is interfering with one's offline life, those whose Internet use behavior
reflects escaping reality see offline activities as interfering with their online lives.
Those exhibiting this form of use experience anger when others hinder their Internet
use, prefer time online over time with friends and family, and are preoccupied
with Internet use even when they are not online. The final dimension, attachment,
reflects a strong emotional connection to the Internet. Users exhibiting this form
of Internet-use behavior could not imagine living without the Internet. Although they
might get upset or agitated if unable to go online, this attachment to the Internet
does not seem to be as disruptive to users' offline activities as when their Internet
use is manifested in the form of intrusion or escaping reality. It does, however, involve
becoming upset when one cannot use the Internet, which may reflect a more
intense feeling of loss or withdrawal than that experienced by those who simply have
an affinity for or reliance on the medium.
Reaching definitive conclusions from just one study using a convenience
sample, however, would be premature. It is tempting, for example, to
speculate that intrusion and attachment may be less intense forms of Internet use that
may be precursors of the more intense form of use, escaping reality, if not negated
through intervention. This might reflect that there can be a progression in Internet
addiction moving from the milder to the intense level (Charlton & Danforth, 2007).
It would be similar to claims made in the context of substance addiction that the use
of soft drugs can lead to the use of hard drugs as addiction progresses (Hopper,
1995). It is also possible, though, that intrusion, attachment, and escaping reality
are three distinct forms of Internet use and that one does not necessarily lead to the
other. This would be consistent with claims made by Caplan (2002) that dimensions
of addiction are distinct and not a continuum of progression. Again, though, either
speculation is premature from the results of just one study. One reason we can't
reach a definitive conclusion about the related or distinct nature of the different
dimensions of Internet addiction is that no consensus has been reached on the
dimensions or stages of Internet addiction in previous research. Another reason is
that there was not a consistent set of predictors across the three different dimensions
of Internet use behavior in the current study.
In addition, among this convenience sample, the mean values of each dimension
(intrusion M = 1.66, escaping reality M = 2.63, attachment M = 1.94) were low.
Thus, even if our measure is a valid and reliable measure of addictive behavior, on the
whole, this student sample did not seem to exhibit such behaviors to an inordinate degree. Future
research should target particular populations that do measure high on such indicators
to see if the factors identified here (intrusion, escaping reality, attachment) prove to be
stable in confirmatory factor analyses, and valid and reliable when studied with other
variables to which addiction should be linked theoretically.
Items composing intrusion, escaping reality, and attachment are from DSM-IV
criteria that have been used for diagnosing addiction in diverse contexts. However,
when applied to media contexts, we should be cautious in unabashedly interpreting
the three dimensions found in the current study as addiction. Rather, they might
reflect a tendency toward addiction or addictive behaviors. Though much more
research is needed, the fact that antecedents that had been associated with addiction
in different contexts were linked with the three dimensions identified in this study
suggests we should at least consider the possibility that these measures reflect aspects
of addictive behavior or a tendency toward it.
Background Characteristics
2002), cultivation effects such as fear (Wober & Gunter, 1982), and concern with
safety (Haridakis & Rubin, 2005). Given its predictive strength, locus of control
should continue to receive greater attention in media addiction studies. As in the case
of shyness, though, results regarding the connection between locus of control and the
Internet-use dimensions here should be interpreted cautiously. For some externals,
the Internet may provide them with an opportunity to attempt to exercise some
control in their lives that they otherwise lack (e.g., Peele, 1985). Future research should
seek to differentiate between such positive effects, and the potentially unhealthy links
between locus of control and Internet use that our results may suggest.
Zero-order correlation analysis suggested that loneliness related positively with
all three dimensions of Internet addiction. However, it was a significant negative
predictor of intrusion in the multiple regression analysis. Thus, when a wider array of
variables was considered, the relationship between loneliness and Internet addiction
was not so straightforward. This finding suggests that prior research linking loneliness
to at least some aspects of addictive behavior could be an artifact of the failure to
account for other variables that may mediate that relationship. This possibility may
also explain why self esteem and sensation seeking were related to dimensions of
addiction, but failed to predict any specific dimension in the regression analysis.
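One way a predictor can correlate positively with an outcome yet carry a negative weight in a multiple regression is overlap with a stronger predictor. The sketch below works this out for two standardized predictors using only their correlations. The correlation values and the pairing of loneliness with shyness are invented for illustration; they are not taken from this study's data.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized regression weights for two predictors.

    r_y1, r_y2: zero-order correlations of each predictor with the outcome.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Hypothetical correlations (assumptions, not the study's results).
r_lonely_intrusion = 0.20   # loneliness with intrusion
r_shy_intrusion = 0.50      # shyness with intrusion
r_lonely_shy = 0.60         # loneliness with shyness

beta_lonely, beta_shy = standardized_betas(
    r_lonely_intrusion, r_shy_intrusion, r_lonely_shy
)

print(f"zero-order r, loneliness: {r_lonely_intrusion:+.3f}")
print(f"beta, loneliness:         {beta_lonely:+.3f}")
print(f"beta, shyness:            {beta_shy:+.3f}")
```

With these made-up inputs the loneliness weight turns negative even though its zero-order correlation is positive, because most of what loneliness shares with the outcome is already carried by the correlated predictor.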
This latter point should be stressed. We included specific background factors in
this study because of their links with addiction in prior contexts. While no single
study can include all of the individual differences that may impact media effects, there
are numerous other possible confounding variables that could be important to assess
in future research. For example, pursuant to uses and gratifications theory, various
psychological circumstances (e.g., depression, anxiety) and social circumstances (lack
of mobility, health problems, communication disabilities) could be relevant factors
that could make one more or less prone to Internet addiction or other problematic
use. Accordingly, future research should examine the influence of a wider array of
background factors.
Motives for Using the Internet
The second goal of this study was to ascertain whether certain motives for using the
Internet might predict addiction. Prior studies have not explored systematically the
potential influences of motives for using the Internet on Internet addiction. Here
we found that a number of motives differentially predicted different dimensions
of Internet use. The fact that different sets of motives predicted the three different
dimensions of Internet addiction (as measured here) might provide some hints on
distinguishing different intensity levels of addiction. Using the Internet for purposes
of habitual entertainment and escape predicted escaping reality, allegedly the most
intense dimension of Internet addiction. This may corroborate U&G research suggesting
that more habitual use was less likely to mitigate, and at times might even enhance,
the likelihood of unintended negative effects of media use (e.g., Haridakis, 2002;
Rubin, 2002). On the other hand, neither of those motives predicted attachment.
Instead, motives that reflected using the Internet to care for others and to seek
Most media effects research focusing on the role of the media in negative effects (e.g.,
violence) suggests that exposure is a central variable contributing to the effects. Prior
research also linked the amount of use with Internet addiction (e.g., Morahan-Martin
& Schumacher, 2000; Young & Rogers, 1998). Here, we found that amount of Internet
use correlated positively with all three dimensions of Internet-use behaviors. In the
multiple regression analyses, the amount of Internet use was a significant predictor
of both intrusion and escaping reality. In each instance, entering amount of Internet
use into the regression equations resulted in a significant increase in the explained
variance.
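The increase in explained variance from entering a variable at a later step can be sketched as the difference between two R-squared values, with an F ratio for the change. The example below assumes one earlier predictor and one added predictor, both standardized; the correlations, the sample size, and the function names are hypothetical and are not drawn from this study.

```python
def r2_one_predictor(r_y1):
    """R-squared for a regression with a single predictor."""
    return r_y1 ** 2


def r2_two_predictors(r_y1, r_y2, r_12):
    """R-squared for two predictors, computed from correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F ratio for the R-squared change when k_added predictors are entered.

    n: sample size; k_full: number of predictors in the larger model.
    """
    df_resid = n - k_full - 1
    return ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / df_resid)


# Hypothetical inputs (assumptions, not the study's results).
n = 200
r_motive = 0.40    # earlier-step predictor with the outcome
r_amount = 0.35    # amount of Internet use with the outcome
r_between = 0.30   # correlation between the two predictors

r2_reduced = r2_one_predictor(r_motive)
r2_full = r2_two_predictors(r_motive, r_amount, r_between)
delta_r2 = r2_full - r2_reduced
f_ratio = f_change(r2_reduced, r2_full, n, k_full=2, k_added=1)

print(f"R2 before: {r2_reduced:.3f}")
print(f"R2 after:  {r2_full:.3f}")
print(f"delta R2:  {delta_r2:.3f}")
print(f"F change (df = 1, {n - 2 - 1}): {f_ratio:.2f}")
```

The F ratio would then be compared against the F distribution with the stated degrees of freedom to judge whether the added step explains a significant share of variance.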
As with the other variables in this study, though, the relationship between amount
of use and the dimensions of Internet-use behavior identified here should be explored
further. If Internet use can be addictive, it is logical to assume that those who are
addicted would use it extensively. But, not all heavy use is tantamount to addiction.
If that were the case, all heavy use of media could be deemed addiction. On the
whole, these college students used the Internet for a significant amount of time, more
than 3 hours per day (194 minutes). Despite this heavy use,
as referenced above, the low means on the addiction
scale suggest they did not on the whole exhibit a high level of addictive Internet-use
behavior. Accordingly, future research should focus more deliberately on the loss
of control over one's media use that is reflected in the DSM-IV criteria that comprised
the factors of Internet use identified here. The loss of control and the disruptive
effects it may have on the user and his/her relationship with others may be a major
distinguishing characteristic between mere heavy use and addiction.
In summary, the results of this study suggest that Internet addiction may be
manifested in different ways. Here we identified three possible dimensions: intrusion,
escaping reality, and attachment. If, as the results suggest, some forms of Internet-use
behaviors are more intense and more detrimental than others, then future
research should be directed toward identifying with greater specificity exactly what
background characteristics of Internet users and motives for using the Internet
explain how addiction is manifested and which users are more susceptible to these
different manifestations of addiction.
The results also suggest that motives and background factors are
important potential contributors to addiction that should be included in future
research, particularly research that considers more specifically possible addiction to
particular Internet fare or functions. It may be that some Internet users who exhibit
indicators of addiction may be addicted to the Internet. It may also be that they are
addicted to content the Internet permits them to access, rather than the Internet
itself. Perhaps it is possible to be addicted to both the Internet and to particular
fare. But more research has to focus on distinguishing between potential addiction
to the medium and addiction to content that may be accessed via that medium.
With respect to the former, some researchers have suggested that those who are
addicted to the Internet spend more time with a variety of functions such as browsing
without specific goals (e.g., Caplan, 2002; Davis, 2001). For those who are addicted to
particular content, such as pornography, the Internet may simply be a delivery device
in the same way that a syringe is a delivery device for a substance abuser. In addition,
the Internet may only be one medium among others (e.g., videos, magazines) through
which they obtain that content. Whether future research focuses on the Internet or
particular content, the results here suggest that the inquiry should not ignore the
important influence of motives and background characteristics of users that may
make some more prone to addictive behavior than others. For example, over the
years, media research has suggested that some people use and develop an affinity for
media (such as television) whereas others develop an affinity for particular content
(e.g., see Rubin, 2002). Future Internet addiction research should consider profiles
of these different media-use orientations to see if those evidencing either are more or
less prone to addiction to a medium such as the Internet and/or addiction to content
available via the media.
In addition, the results of the current study also lead to a series of questions related
to Internet users. Especially, how far can we generalize the findings from a study of
college students, who did not exhibit a high level of problematic Internet-use behavior,
to potentially more at-risk groups who may be more prone to addictive media-use
behavior? Can the results here be considered applicable to other populations who do
not have the level of Internet access that college students have? Finally, many of the
scales measuring variables included in the current study were developed in decades
preceding the Internet or adapted from research in the 1990s, when the Internet was
in its infancy. The changing nature of Internet use, functions, and use environments
may require more advanced and up-to-date measures of variables that may be more
amenable to the study of media use and effects in ever changing media environments.
Notes
1 A maladaptive pattern of substance use, leading to clinically significant impairment
or distress, as manifested by three (or more) of the following, occurring at any time
in the same 12-month period.
(1) Tolerance, as defined by either of the following:
(a) a need for markedly increased amounts of the substance to achieve intoxication
or desired effect
(b) markedly diminished effect with continued use of the same amount of the
substance
(2) Withdrawal, as manifested by either of the following:
(a) the characteristic withdrawal syndrome for the substance (refer to Criteria A
and B of the criteria sets for Withdrawal from the specific substances)
(b) the same (or a closely related) substance is taken to relieve or avoid withdrawal
symptoms
(3) The substance is often taken in larger amounts or over a longer period than was
intended.
(4) There is a persistent desire or unsuccessful efforts to cut down or control substance
use.
(5) A great deal of time is spent in activities necessary to obtain the substance (e.g., visiting
multiple doctors or driving long distances), use the substance (e.g., chain-smoking),
or recover from its effects.
(6) Important social, occupational, or recreational activities are given up or reduced
because of substance use.
(7) The substance use is continued despite knowledge of having a persistent or recurrent
physical or psychological problem that is likely to have been caused or exacerbated
by the substance (e.g., current cocaine use despite recognition of cocaine-induced
depression, or continued drinking despite recognition that an ulcer was made worse
by alcohol consumption) (Behavenet.com, 2007).
References
Akerlind, I., & Hornquist, J. O. (1989). Stability and change in feelings of loneliness: A two-year
prospective longitudinal study of advanced alcohol abuse. Scandinavian Journal of
Psychology, 30(2), 102–112.
American Psychiatric Association. (2006). DSM: Diagnostic and statistical manual of mental
disorders (4th ed.). Retrieved January 23, 2008, from
http://www.psych.org/research/dor/dsm/dsmintro81301.cfm.
Anderson, K. (1998). Internet dependency among college students: Should we be concerned?
Presented at the Meeting of the American College Personnel Association, St. Louis, MO.
Armstrong, L., Phillips, J. G., & Saling, L. L. (2000). Potential determinants of heavier internet
usage. International Journal of Human-Computer Studies, 53, 537–550.
Baumeister, R. F. (1993). Self-esteem: The puzzle of low self-regard. New York: Plenum Press.
Bauer, J. M., Gai, P., Kim, J-H., Muth, T., & Wildman, S. (2002). Broadband: Benefits and
policy challenges. A report prepared for Merit Network, Inc.
Ball-Rokeach, S. J. (1985). The origins of individual media-system dependency:
A sociological framework. Communication Research, 12(4), 485–510.
Bearinger, L. H., & Blum, R. W. (1997). The utility of locus of control for predicting
adolescent substance use. Research in Nursing, 20, 229–249.
Beard, K. W., & Wolf, E. M. (2001). Modification in the proposed diagnostic criteria for
Internet addiction. Cyberpsychology and Behavior, 4, 377–383.
Bingham, J. E., & Piotrowski, C. (1996). On-line sexual addiction: A contemporary enigma.
Psychological Reports, 79, 257–258.
Block, J. (2008). Issues for DSM-IV: Internet addiction. American Journal of Psychiatry, 165,
306–307.
Brenner, V. (1997). Psychology of computer use: XLVII: Parameters of Internet use, abuse
and addiction: The first 90 days of the Internet usage survey. Psychological Reports, 80,
879–882.
Brown, R. I. F. (1993). Some contributions of the study of gambling to the study of other
addictions. In W. R. Eadington & J. A. Cornelius (Eds.), Gambling behavior and problem
gambling (pp. 241–272). Reno: University of Nevada Press.
Caplan, S. E. (2002). Problematic Internet use and psychosocial well-being: Development of
a theory-based cognitive-behavioral measurement instrument. Computers in Human
Behavior, 18, 553–575.
Caplan, S. E. (2003). Preference for online social interaction: A theory of problematic
Internet use and psychosocial well-being. Communication Research, 30(6), 625–648.
Chak, K., & Leung, L. (2004). Shyness and locus of control as predictors of internet addiction
and internet use. Cyberpsychology & behavior: The impact of the Internet, multimedia and
virtual reality on behavior and society, 7(5), 559–570.
Charney, T., & Greenberg, B. S. (2002). Uses and gratification of the Internet:
Communication, technology and science. In C. Lin & D. Atkin (Eds.), Communication,
technology and society: New media adoption and use (pp. 379–407). Cresskill, NJ:
Hampton Press.
Charlton, J. P., & Danforth, I. D. W. (2007). Distinguishing addiction and high engagement
in the context of online game playing. Computers in Human Behavior, 23, 1531–1548.
Cheek, J. M., & Buss, A. H. (1981). Shyness and sociability. Journal of Personality and Social
Psychology, 41(2), 330–339.
Chen, W. J., Boase, J., & Wellman, B. (2002). The Global villagers: Comparing Internet users
and uses around the world. In B. Wellman & C. Haythornthwaite (Eds.), The Internet in
Everyday Life (pp. 74–113). Oxford: Blackwell.
Chou, C., & Hsiao, M-C. (2000). Internet addiction, usage, gratification, and pleasure
experience: The Taiwan college students' case. Computers and Education, 35(1), 65–80.
Mandell, J. (2007, July 3). Are gadgets, and the Internet, actually addictive? CNN.com.
Retrieved November 1, 2008, from
http://edition.cnn.com/2007/TECH/ptech/07/01/la.tech.addictions/index.html.
Marks, I. (1990). Non-chemical (behavioral) addictions. British Journal of Addiction, 85,
1389–1394.
Marlatt, A. G., Baer, J. S., Donovan, D. M., & Kivlahan, D. R. (1988). Addictive behaviors:
Etiology and treatment. Annual Review of Psychology, 39, 223–252.
Martin, M. (2007). Doctors dismiss video game addiction claim. Retrieved March 12, 2009,
from
http://www.gamesindustry.biz/articles/doctors-dismiss-videogame-addiction-claim.
McKenna, K. Y. A., & Bargh, J. A. (2000). Plan 9 from cyberspace: The implication of the
Internet for personality and social psychology. Personality and Social Psychology Review,
4, 57–75.
Medora, N. P., & Woodward, J. C. (1991). Factors associated with loneliness among
alcoholics in rehabilitation centers. Journal of Social Psychology, 131(6), 769–779.
Morahan-Martin, J. (2007). Internet use and abuse and psychological problems. In
A. Joinson, K. McKenna, T. Postmes, & U.-D. Reips (Eds.), The Oxford handbook of
Internet psychology (pp. 331–345). Oxford: Oxford University Press.
Morahan-Martin, J., & Schumacher, P. (2000). Incidence and correlates of pathological
Internet use among college students. Computers in Human Behavior, 16, 13–29.
Morgan, W. (1979). Negative addiction in runners. Physician and Sports Medicine, 7, 56–69.
Murali, V., & George, S. (2007). Lost online: An overview of Internet addiction. Advances in
Psychiatric Treatment, 13, 24–30.
Nerviano, V. J., & Gross, W. F. (1976). Loneliness and locus of control for alcoholic males:
Validity against Murray need and Cattell trait dimensions. Journal of Clinical Psychology,
32, 479–484.
Oliver, M. B. (2002). Individual differences in media effects. In D. Zillman & J. Bryant (Eds.),
Media effects: Advances in theory and research (pp. 507–524). Mahwah, NJ: Erlbaum.
Papacharissi, Z., & Rubin, A. M. (2000). Predictors of Internet use. Journal of Broadcasting &
Electronic Media, 44(2), 175–196.
Peele, S. (1985). The Meaning of Addiction. Lexington, MA: Lexington Books.
Perse, E. M. (1996). Sensation seeking and the use of television for arousal. Communication
Reports, 9(1), 37–48.
Pratarelli, M. E., Browne, B., & Johnson, K. (1999). The bits and bytes of computer/Internet
addiction: A factor analytic approach. Behavior Research Methods, Instruments and
Computers, 31, 305–314.
Rawlings, J. O., Pantula, S. G., & Dickey, D. A. (1998). Applied Regression Analysis: A Research
Tool. Springer-Verlag, New Jersey.
Rheingold, H. (1993). The virtual community: Homesteading on the electronic frontier.
Reading, MA: Addison Wesley.
Robbins, R. N., & Bryan, A. (2004). Relationships between future orientation, impulsive
sensation seeking, and risk behavior among adjudicated adolescents. Journal of Adolescent
Research, 19, 428–445.
Rokach, A., & Orzeck, T. (2003). Coping with loneliness and drug use in young adults. Social
Indicators Research, 61(3), 259–283.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton: Princeton University
Press.
Rotter, J. B. (1966). Generalized expectancies for internal versus external control of
reinforcement. Psychological Monographs, 80, 1–28.
Rubin, A. M. (1983). Television uses and gratifications: The interactions of viewing patterns
and motivations. Journal of Broadcasting, 27, 37–52.
Rubin, A. M. (1993). The effect of locus of control on communication motivation, anxiety,
and satisfaction. Communication Quarterly, 41, 161–171.
Rubin, A. M. (2002). The uses-and-gratifications perspective of media effects. In J. Bryant &
D. Zillmann (Eds.), Media effects: Advances in theory and research (pp. 525–548).
Mahwah, NJ: Erlbaum.
Rubin, A. M., & Windahl, S. (1986). The uses and dependency model of mass
communication. Critical Studies in Mass Communication, 3, 184–199.
Russell, D. (1996). The UCLA Loneliness Scale (Version 3): Reliability, validity, and factor
structure. Journal of Personality Assessment, 66, 20–40.
Sanders, C. E., Field, T. M., Diego, M., & Kaplan, M. (2000). The relationship of Internet use
to depression and social isolation among adolescents. Adolescence, 35, 237–242.
Santesso, D. L., Schmidt, L. A., & Fox, N. A. (2004). Are shyness and sociability still a
dangerous combination for substance use? Evidence from a US and Canadian sample.
Personality and Individual Differences, 37, 5–17.
Shaffer, H. (2004). Internet gambling and addiction position paper. Boston: Harvard Medical
School, Division on Addictions.
Shaffer, H., Hall, M., & Vander Bilt, J. (2000). Computer addiction: A critical consideration.
American Journal of Orthopsychiatry, 70, 162–168.
Scherer, K. (1997). College life on-line: Healthy and unhealthy Internet use. The Journal of
College Student Development, 38, 655–664.
Song, I., LaRose, R., Eastin, M., & Lin, C. (2004). Internet gratifications and Internet
addiction: On the uses and abuses of new media. Cyberpsychology & Behavior, 7(4),
384–394.
Steele, R. G., Forehand, R., Armistead, L., & Brody, G. (1995). Predicting alcohol and drug
use in early adulthood: The role of internalizing and externalizing behavior problems in
early adolescence. American Journal of Orthopsychiatry, 65, 380–388.
Stoll, C. (1995). Silicon snake oil. New York: Doubleday.
Swann, W. B., Jr. (1996). Self-traps: The elusive quest for higher self-esteem. New York:
Freeman.
Thatcher, A., & Goolam, S. (2005). Development and psychometric properties of the
problematic Internet use questionnaire. South Africa Journal of Psychology, 35, 793–805.
Turkle, S. (1996). Virtuality and its discontents: Searching for community in cyberspace. The
American Prospect, 24, 50–57.
Young, K. S. (1996a). Psychology of computer use XI: Addictive use of the Internet: A case
study that breaks the stereotype. Psychological Reports, 79, 899–902.
Young, K. S. (1996b). Internet addiction: The emergence of a new clinical disorder. Paper
presented at the American Psychological Association. Toronto, Canada.
Young, K. S. (1997). What makes online usage stimulating? Potential explanations for
pathological Internet use. Symposia paper presented at the 105th Annual Meeting of the
American Psychological Association, Chicago.
Young, K. S. (1998). Caught in the net. Chichester: Wiley.
Young, K. S. (2004). Internet addiction: A new clinical phenomenon and its consequences.
American Behavioral Scientist, 48(4), 402–415.
Young, K. S., & Rogers, R. C. (1998). The relationship between depression and Internet
addiction. CyberPsychology and Behavior, 1(1), 25–28.
Yuen, N. C., & Lavin, M. J. (2004). Internet dependence in the collegiate population: The
role of shyness. CyberPsychology & Behavior, 7(4), 379–383.
Wagner, M. K. (2001). Behavioral characteristics related to substance abuse and risk-taking,
sensation-seeking, anxiety sensitivity, and self-reinforcement. Addictive Behavior, 26,
115–120.
Walther, J. (1999). Communication addiction disorder: Concern over media, behavior and
effects. Paper presented at the annual meeting of American Psychological Association,
Boston.
Widyanto, L., & McMurran, M. (2004). The psychometric properties of the Internet
Addiction Test. CyberPsychology & Behavior, 7(4), 443–450.
Winn, M. (1977). The plug-in drug: Television, children, and the family. New York: Viking.
Wober, M., & Gunter, B. (1982). Television and personal threat: Fact or artifact? A British
survey. British Journal of Social Psychology, 21, 239–247.
Zuckerman, M. (1979). Sensation-seeking: Beyond the optimal level of arousal. Hillsdale, NJ:
Erlbaum.
Zweben, J. E. (1987). Recovery-oriented psychotherapy: Facilitating the use of 12-step
programs. Journal of Substance Abuse Treatment, 19, 243–251.
A Dissertation
Submitted to the School of Graduate Studies and Research
in Partial Fulfillment of the
Requirements for the Degree
Doctor of Psychology
Kimberlee D. DeRushia
Indiana University of Pennsylvania
May 2010
Signature on file
Kimberely J. Husenits, Psy.D.
Associate Professor of Psychology, Advisor
Signature on file
Beverly J. Goodwin, Ph.D.
Professor of Psychology
Signature on file
John A. Mills, Ph.D., ABPP
Professor of Psychology
ACCEPTED
ABSTRACT
Title: Internet Usage among College Students and its Impact on Depression, Social
Anxiety, and Social Engagement
Author: Kimberlee D. DeRushia, M.A.
Dissertation Chair: Kimberely J. Husenits, Psy.D.
Dissertation Committee Members:
ACKNOWLEDGMENTS
Although the final product of a dissertation has just one name on the front
cover, in my experience at least, it takes an entire village of people to bring one to
fruition. This dissertation was a herculean task that I occasionally considered
abandoning and if it had not been for the encouragement, support, and even once or
twice outright pushing, from Jamie Brass, Steven Behling, Jessica Buckland, Karen
Graves, Hey-Mi Ahn, Marc Palmer and my dad, John DeRushia, I'm not certain that I
would have ever finished. Thank you each for understanding the process and being
there when I needed someone to lean on.
It is also with deep gratitude that I thank the members of my committee,
Kimberely J. Husenits, Beverly J. Goodwin, and John A. Mills for their guidance,
patience and dedication throughout this entire process. Thank you Jennifer
Hambaugh, members of the Indiana University of Pennsylvania Applied Research
Lab and to Dana Reed at Student Voice, and Beverly Obitz in the School of Graduate
Studies and Research for helping with the technical aspects of this dissertation. Thank
you also to Nathaniel Mills, Jed Brubaker, and Daniel Lennen for your help with
analyzing my data and taking the time to remind me that the anxiety that comes from
a dissertation makes it easy to forget or over-think basic statistics. I would be remiss
if I didn't also thank my colleagues at University of the Pacific: Stacie Turks,
Charlene Patterson, Liz Thompson, and Kristina Dulcey-Wang for their never-ending
support and gentle prodding.
Finally, I want to thank my family for their encouragement, not only through
the dissertation process but throughout the totality of graduate school. Thank you to
my husband Jason Clark for your limitless patience, for being there on the worst of
days and the best of days, and for moving not once, but twice across the country in
order to support my dream. Thank you also to my son Sebastian Clark, you were too
young to know this, but on the days that it seemed darkest I would come home, hear
your giggles and see your smile, and be reminded that in the end, the journey was
worth the hardship.
TABLE OF CONTENTS
INTRODUCTION
METHOD
Participants
Materials
Demographic questionnaire
Measure of Internet usage
Measures of social engagement
Measure of depression
Measure of social anxiety
Procedures
Selecting participants
Phase one
Phase two
Phase three
RESULTS
Descriptive Statistics
Internal Consistency of the Social Rhythm Metric
DISCUSSION
REFERENCES
APPENDICES
Informed Consent
Demographics Questionnaire
Part One: Internet Usage Tracking Chart
Part Two: Internet Usage Follow-up Questions
Social Rhythmic Metric
UCLA Loneliness Scale (Version 3)
Center for Epidemiologic Studies Depression Scale
Brief Fear of Negative Evaluation, Revised
Debriefing
Campus and Community Resources
LIST OF TABLES
Time Spent on the Internet and its Influence on Social Engagement, Social Anxiety, and Loneliness with Face-to-Face Relationships
Time Spent on the Internet and its Influence on Social Anxiety, and Loneliness with Online Relationships
Social Activity on the Internet and its Influence on Social Anxiety, and Loneliness with Online Relationships
CHAPTER 1
INTRODUCTION
Statement of the Problem
The Internet has become an integral part of Western society, with
approximately 72.5% of the population of the United States using the Internet on a
regular basis (Internet World Stats, 2008). With only a click of the mouse, the
Internet allows individuals to learn information about almost any topic they care to
research, and to communicate with or learn about future romantic partners,
prospective employees, long-lost friends, or family members (Davis, 2007; Kraut et
al., 2002; Teske, 2002; White, 2007). The present study investigated the effect of
Internet use on social interaction with particular attention to the levels of social
anxiety and depression experienced by college students who engage in frequent, nonacademic Internet use.
In 2005, the primary researcher noticed a social pattern reported by college
freshmen and sophomores presenting for therapy at a rural university counseling
center. In particular, these students frequently reported that they were more
comfortable talking to their friends using technology such as the Internet or text
messaging on their cell phones than with traditional forms of communication such as face-to-face
conversations or speaking on the telephone. Anecdotally, a particular client
reported that she frequently froze up and was unable to have an in-person
conversation with her male friends but had no difficulty talking with text via a
computer instant messaging program.
CHAPTER 2
REVIEW OF THE LITERATURE
The Internet and Related Terms
In 1995, the term "Internet" was officially defined as "the global information
system that is logically linked together by a globally unique address space based on
the Internet Protocol (IP), that is able to support communications using Transmission
Control Protocol/Internet Protocol (TCP/IP) and provides, uses or makes accessible,
either publicly or privately, high level services layered on the communications and
related infrastructure" (Federal Networking Council, 1995, p. 1). However, when
individuals talk about the Internet, they are typically referring to more than this
technical definition.
When individuals access the Internet they typically do it via the World Wide
Web (web). The web is actually a collection of electronic documents that are stored
on computers throughout the world (World Wide Web, 2002; Howe, 2007). Through
the use of a web browser these documents can be easily accessed by anyone who
knows what to look for and are frequently identified through the use of search engines
designed to access these documents based on key words (Search Engine, 2009). This
information can then be communicated to others through the use of email or instant
messaging/chat programs. Email is an electronic message that is sent and/or received
over a system that is designed specifically for the transmission of electronically
written messages between computers (Email, 2009; Howe, 2007). Due to its virtually
instantaneous delivery, email is a quick and easy form of communication that
individuals use for professional and personal reasons throughout their day.
Communication also happens on the Internet through instant messaging programs and
Internet Relay Chat (IRC). Instant messaging programs are designed to allow real-time
conversation to occur between individuals who access the same service by
means of a program installed on their personal computers (Instant messaging, 2009;
Howe, 2007). Similar to instant messaging, IRC allows real-time conversation to occur
between groups of individuals in locations typically referred to as chat rooms
through a worldwide network of computers (IRC, 2009; Howe, 2007). In the last
decade, with the advent of social networking sites, a new form of communication has
emerged on the Internet. Social networking sites, such as Facebook or Twitter, are
typically websites designed to allow individuals to publish information about
themselves, with the intention of sharing that information with others in a way that
does not require direct conversation (Howe, 2007).
The Internet has expanded in ways that were not foreseeable at its inception.
As the tools that are used to access the Internet increase, so do the number of online
activities and the amount of time spent engaging in online activities. This is
particularly true for younger generations, as represented by the statistics presented in
a recent Pew Internet Survey that reported 83% to 87% of individuals ages 18 to 49
use the Internet compared with 65% of individuals age 50 to 64 and 32% of
individuals age 65+ (Pew Internet Tracking Survey, 2007a). The types of activities
that individuals report engaging in most often online are sending or reading email
(56%), searching for information (41%), getting news (37%), looking for information
on a hobby or other interest (29%), or browsing websites for fun (28%) (Pew Internet
Tracking Survey, 2007b). These statistics are particularly salient for younger
generations who have grown up with the Internet as part of their daily lives and
cannot imagine a time when constant contact to the world via the Internet did not
exist.
Gender Differences and the Internet
Although both genders reported equal use of the Internet in a Pew Internet
Tracking Survey (2007a), the psychological research of Internet usage presents mixed
results when looking at gender differences. A study by Odell, Korgen, Schumacher, and
Delucchi (2000) measured the responses of 843 students at five public
institutions and three private institutions to compare Internet usage and gender.
Participants were asked basic demographic questions, including major and year in
college, and Internet related questions including amount of access to the Internet
while growing up, how much time they currently spent on the Internet, and why they
accessed the Internet. The study reported that for public institutions, there were no
gender differences in the amount of time spent on the Internet, and that at private
institutions males spent significantly more time online than females (p = 0.019).
However, Odell and colleagues (2000) reported gender differences when examining
the specific activities or services accessed. Females spent significantly more time
checking email (p = 0.015), and conducting research for school (p = 0.002), while
their male counterparts spent significantly more time researching purchases (p =
0.002), visiting sex sites (p < 0.001), reading news (p < 0.001), playing games (p <
0.001), and listening to or downloading music (p < 0.001). A study by Sabrina Neu
(2009) looked at gender and perceptions of boredom, social interaction and social
anxiety among 200 college students ranging in age from 18 to 30 who reported
Over the last six years, social engagement has expanded to include the Internet
through the use of social networking (Sellers, 2006, para. 5). Social networking
online is typically accomplished through sites that allow individuals to search for
others that have the same interests, establish friendships, and reconnect with friends
from their past (Luo, 2007, para. 1). The impact of the Internet on social engagement
is frequently discussed in a negative light in both the popular media and the
psychological literature.
In the psychological literature one meta-analysis has posited that as
individuals become more accustomed to interacting through the Internet there will be
negative consequences on their ability to communicate appropriately in face-to-face
situations (Brignall & Van Valey, 2005). Additionally, a study that focused on
college students asked 649 men and 647 women about their Internet use and found
that the students who reported greater levels of Internet use also reported that, in
addition to a decrease in their amount of daily sleep (p = 0.05) and lower grades
academically (p = 0.05), they also perceived fewer opportunities to interact with
individuals in face-to-face situations (Anderson, 2001). Another study that focused on
adolescent use of the Internet asked 52 female high school seniors and 37 male high
school seniors to complete several self report measures concerning Internet use,
quality of relationships, and depression (Sanders, Field, Diego & Kaplan, 2000).
Sanders and colleagues (2000) found that higher levels of Internet use were
associated with declines in face-to-face relationships with both friends and mothers
when compared with adolescents that used the Internet less than one hour per day (p
= 0.01). Finally, a recent study that asked 300 participants of an online multiplayer
role playing game to complete measures of social engagement and social anxiety
found that individuals were likely to report that as a result of high levels of Internet
use they had missed meals, decreased their amount of sleep, were more likely to
argue with friends and/or family members and perceived that their face-to-face social
life had suffered as a result (Neu, 2009).
In addition to the negative effects of Internet use on social engagement, the
psychological literature on this topic has also found both neutral and positive results
concerning the impact of Internet usage on social engagement. In 1998, a
comprehensive study of the topic was conducted at Carnegie Mellon University (Kraut et
al., 1998). These researchers conducted a longitudinal study that gave computers and
Internet access to 93 families (256 individuals) in the Pittsburgh, Pennsylvania area
who had not previously had such access. Participants completed measures of anxiety,
depression, and social activity before they were given Internet access and then again
after they had been given access. The study authors reported that higher amounts of
Internet usage were correlated with declines in communication, and with smaller
social networks (Kraut et al., 1998). However, in contrast to this earlier study, a
follow-up study conducted in 2002 by the same researchers with 208 of the original
participants found no correlations between Internet usage,
communication, and social networks. The authors attributed this change in findings
potentially to maturation in their participants over time or to the Internet having
become more socially inclined (Kraut et al., 2002, p. 69). Additionally, a study completed
by Eric Weiser (2000) had 140 males and 295 females from a student population (n =
134) and an online population (n = 301) complete several measures of well-being via
the World Wide Web. Weiser found that when the Internet was used
primarily for social activities there was a decline in the psychological well-being of the
individual, and when it was used primarily for non-social activities there was an
increase in psychological well-being (Weiser, 2000, p. 257). Conversely, other studies
investigating the effects of Internet use on communication and levels of social
interaction reported that college students who chatted anonymously on the Internet
over a period of four to eight weeks were more likely to report at the end of the study
that their perceptions of social support had increased, and that individuals who used chat
rooms on a regular basis scored lower on measures of social fearfulness than non-chat
users (Campbell, Cumming, & Hughes, 2006; Shaw & Gant, 2002). A study by Madell and
Muncer (2007) that focused on the use of communication and social
interactions reported that individuals preferred to use email and instant messaging
when communicating emotion-laden concerns in particular. Thus, the relational
consequences of Internet communication may differ by the type of conversation
facilitated.
Articles in the popular media frequently focus on the negative interactions that
are caused by use of the Internet. An example of this was seen on July 15, 2007 when
several articles were written in the popular media about a parenting couple from
Reno, Nevada who had neglected their children in order to play online games (CBS
News, 2007, para. 1; Fox News, 2007, para. 1; USA Today, 2007, para. 1). The
prosecutor in that case stated that the couple was too distracted by online games
to give their children proper care (USA Today, 2007, para. 4). The outcome of the
prosecution of this case has not been determined at this date. Similarly, a recent
editorial begins with "MySpace is ruining my social life" and continues to elucidate
the opening statement by detailing how the author no longer goes out with friends,
preferring instead to stay at home and improve her MySpace page (Geldof, 2007,
para. 1). An article in Time magazine in 2008 stated that the social aspects of the
Internet, namely the ability to comment on articles that are posted, result in
individuals being cruel and loathsome, and posited that this is due to the illusion of
anonymity online and a general disregard of cultural restraints (Grossman, 2008, para.
2 and 3). In contrast to these media accounts is an editorial in Primary Psychiatry
which recommended that social networking sites be used to connect professionals in
healthcare fields in order to take advantage of the ways that these sites allow
individuals to interact with their peers and exchange information with ease (Luo,
2007).
The connection between Internet use and social engagement has been portrayed
inconsistently in both the popular media and the psychological research. Some studies
have found that increased use of the Internet leads to a decrease in social engagement
(Anderson, 2001; Kraut et al., 1998), while others have found that increased use leads
to increased social engagement (Campbell et al., 2006; Kraut et al., 2002; Madell &
Muncer, 2007; Sanders et al., 2000; Shaw & Gant, 2002). This mixture of results may
be due to the relative lack of research literature and the instinctive response that
guides most popular media to suppose that increased use of the Internet would result
in decreased social engagement. Such common sense may not stand up to scrutiny
when compared with stringent research.
looked at online game playing and the self-reported levels of social anxiety (Neu,
2009). Taken together, these studies suggest that socially anxious individuals do not
use the Internet for interpersonal communication as is assumed in the popular media.
Conversely, a study investigating participants' ability to express their real self in a
social environment reported that high scores on measures of introversion and
neuroticism were associated with greater comfort being their real self on the
Internet, whereas high scores on extroversion and low scores on neuroticism
were associated with being more comfortable in face-to-face social situations
(Amichai-Hamburger et al., 2002). A related finding was
reported in a study conducted by Scott Caplan (2007), who found that high social
anxiety was predictive of an individual's preference for online social interaction over
face-to-face social interaction.
Intuitively it makes sense that individuals who experience anxiety in social
situations would be more comfortable on the Internet where the perception of
anonymity allows individuals to present only what they want others to see. However,
given the psychological research, it remains to be seen if this intuitive reaction
concerning social anxiety is something that can be adequately measured.
Depression and the Internet
Depression is one of the most common mental health disorders and is
diagnosed when individuals experience a depressed mood most of the day, show a
diminished interest in pleasurable activities, report changes in appetite, and in levels
of concentration, and have feelings of worthlessness or guilt (American Psychiatric
Association, 2000; Young, Weinberger, & Beck, 2001). The prevalence of Major
harassment on the Internet (Ybarra, 2004). Finally, a Campbell, Cumming & Hughes
(2006) study indicated that depressive symptoms were associated simply with
frequent Internet use, regardless of the amount of time or activity, suggesting that
those who reported spending time on the Internet were more likely to report
depressive symptoms than those who report not spending time online.
The psychological literature is scant on the topic of depression and Internet
use and, as with social engagement and social anxiety, the literature that does exist is
contradictory in nature. The overall consensus is that an increase in Internet use is not
implicated in an increase in levels of depression. In fact, the cause of any increase in
reported depressive symptoms among Internet users is currently undetermined, with some
studies implicating gender, others implicating the type of activity engaged in online,
and still others stating that it is simply that chronically depressed individuals are more
prone to using the Internet than non-depressed individuals (Campbell et al., 2006;
Morgan & Cotton, 2003; Ybarra, 2004).
Hypotheses
This study investigated three primary questions to address this topic: 1) Can
time spent and amount of social interaction online predict loneliness and social
anxiety in face-to-face settings, loneliness and social anxiety in online settings, and
social interaction in face-to-face settings? 2) Can Internet use, or social interaction
online, predict participants' levels of depression? 3) Does gender influence the
amount of time spent online, the type of activities accessed online, or participants'
level of depression?
online will be negatively related to depression with higher levels of social interaction
online predictive of lower levels of depression.
Hypothesis three. To address question three, Hypothesis 3a posits that there
will be gender differences in the amount of time individuals spend on the Internet.
Men will spend more time than women on the Internet. Hypothesis 3b posits that
there will be gender differences in the amount of social interaction online. Women
will spend more time engaging in social activities online than men. Hypothesis 3c
posits that there will be gender differences in the level of depression reported by
participants. Men will demonstrate higher levels of depression than women.
CHAPTER 3
METHOD
Participants
Sixty-eight female and 31 male undergraduate students attending a state
university located in rural Pennsylvania served as participants in the current study.
These participants had enrolled in the Psychology Department's subject pool to fulfill
their general psychology course research requirement. All participants were
randomly selected by the subject pool coordinator and were subsequently emailed an
initial request to participate and sent a second email invitation to participate if they
did not respond to the first request. Students who did not respond to either email
request were invited to participate via a subsequent telephone contact. All
participants were informed of the nature of the study and the time commitment
expected when invited to participate. The names of students who declined
participation were returned to the subject pool.
Participants were required to sign an informed consent form (Appendix A) by
which they were again informed of the time commitment and given the opportunity to
opt out of the study. Of the initial 150 students contacted for participation, 138
initially chose to participate in this study and 99 students completed all three phases
of the study.
Materials
Six measures were used in this study: an experimenter-developed
demographic questionnaire (Appendix B), an experimenter-developed self-report
measure of Internet usage (Appendix C), two measures of social engagement: one
that measured day-to-day social interactions (Appendix D) and one that measured
perceptions of loneliness (Appendix E), one questionnaire concerning social anxiety
symptoms (Appendix F), and one questionnaire measuring symptoms of depression
(Appendix G).
Demographic questionnaire. The demographic questionnaire (Appendix B)
consisted of 10 questions that included participants' current academic standing,
gender, family income, and parental levels of education (Braveman, Cubbin, Marchi,
Egerter, & Chavez, 2001). This questionnaire also assessed participants' current
ability to access the Internet and the typical locations of their access. Additionally, the
demographic questionnaire asked participants to list the three most important
activities in which they engage on the Internet.
Measure of Internet usage. The Internet Usage Tracking Chart (Appendix C,
parts 1 and 2) consisted of a grid designed to allow participants to quickly check off
the hours they engaged in Internet usage in a 24-hour period. Individuals were
instructed to round off times of use to the nearest hour and enter their responses into
an online computer database they were instructed to access each evening from a
personal computer. After tracking their Internet use for one week, participants were
given a series of questions that required them to estimate the amount of time they
spent studying and using the Internet, and to rank-order 13 potential activities (e.g.,
email, social networking, gambling, etc.) in which they engaged while online. This
rank-order list was then used to determine whether the types of Internet activities accessed by
each participant were of a social or solitary nature by assigning each item a social or
non-social value and weighting the value based on the rank assigned by the
participant.
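The scoring rule is described only in general terms here, so the following Python sketch shows one hypothetical way a rank-weighted social score could be computed. It is not the study's actual procedure: the activity names beyond those listed in the text, the social/non-social coding, and the linear rank weights are all assumptions made for illustration.

```python
# Hypothetical illustration only: the flags and the weighting rule below are
# assumptions, not the scoring procedure actually used in the study.

# 1 = social activity, 0 = non-social (solitary) activity (assumed coding)
SOCIAL_FLAG = {
    "email": 1,
    "social networking": 1,
    "instant messaging": 1,
    "searching for information": 0,
    "gambling": 0,
}


def social_score(ranked_activities):
    """Return the rank-weighted share of social activities (0.0 to 1.0).

    ranked_activities lists activities from most used (rank 1) to least used.
    An activity at rank r out of n receives the weight n - r + 1, so
    higher-ranked activities count more.
    """
    n = len(ranked_activities)
    weights = [n - i for i in range(n)]  # rank 1 -> n, rank n -> 1
    social_weight = sum(
        w for w, activity in zip(weights, ranked_activities)
        if SOCIAL_FLAG[activity] == 1
    )
    return social_weight / sum(weights)


mostly_social = ["social networking", "email", "searching for information"]
mostly_solitary = ["gambling", "searching for information", "email"]

print(social_score(mostly_social))    # (3 + 2) / 6
print(social_score(mostly_solitary))  # 1 / 6
```

Under this assumed scheme, a participant whose top-ranked activities are social receives a score near 1, and one whose top-ranked activities are solitary receives a score near 0.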
Measures of social engagement. The Social Rhythm Metric (SRM)
(Appendix D) consists of 17 events that occur in an individual's life over the course
of a day, and was designed to assess the social support and social networks of an
individual. Participants keep track of when each activity occurred, who was present
during the activity, and their own level of involvement. Individuals were asked to
manually track these 17 activities and enter them into an online computer database
each evening from a personal computer. These items include when participants get
out of bed each morning, when they have meals and when they participate in
activities such as school, exercise, or watching television. For each item the
participant is asked to enter the time the item was completed, whether or not they
were alone at the time, and, if others were present, whether they were just present
or actively involved. The SRM is calculated using an algorithm found in Monk,
Kupfer, Frank, & Ritenour (1990) and several indices can be calculated including
active social engagement, and minimal to no social engagement (Carney, Edinger,
Meyer, Lindman & Istre, 2006). The test-retest reliability for the SRM is moderate
with a significant correlation between week 1 and week 2 (rho=0.60, p < 0.001)
(Monk, Petrie, Hayes & Kupfer, 1994). Additionally the SRM has been described as a
valid instrument by several studies and in a personal communication by the creator of
the measure (Haynes, Ancoli-Israel, & McQuaid, 2005; Meyer & Maier, 2005; T.H.
Monk, personal communication, July 23, 2009; Monk, et al., 1994; Monk, Frank,
Potts, & Kupfer, 2002; Monk, Kupfer, Frank, & Ritenour, 1990).
Procedures
Selecting participants. All participants were randomly selected by the subject
pool coordinator and contacted via email or telephone to request their participation in
this study. All participants were informed of the nature of the study and the time
commitment involved at the time of first contact and given the opportunity to decline
participation. Students electing to participate were met by an assistant experimenter
who explained the time requirements of the study and again gave participants the
chance to decline participation. Those who elected to participate were required to sign
an informed consent form (Appendix A). Participants were informed that the
researcher was looking for possible connections between Internet usage,
psychological well-being, and relationships. No deception was used during this study.
Additionally, participants were given a resource sheet for campus and community
referrals (Appendix I) as a precaution should they experience feelings of concern
when completing the study measures.
Phase one. After signing the informed consent form, participants were
directed to a university computer with Internet access where they completed the
demographic questionnaire and the measures of depression (CES-D), social anxiety
(BFNE-II), and one of the social engagement measures (UCLA Loneliness Scale).
Participants were asked to complete the BFNE-II and UCLA Loneliness Scale twice.
The first time they completed these two measures they were asked to focus on
face-to-face relationships; the second time, the focus was on online relationships.
Participants were asked to consider face-to-face and online relationships separately in
order to determine if there was a difference in their perception of experienced anxiety
or loneliness based on the population with which the participant was interacting.
Participants completed this first phase of the study in approximately 30 minutes.
Phase two. After completing these psychological measures, participants were
given verbal directions for tracking their Internet use and daily social interactions.
Additionally, they were instructed in how they were to enter their Internet use and
social interactions online using their personal computers. Participants were also given
paper copies of the measures to aid in their ability to keep track of their interactions
while not at a computer. Finally, an email reminder was sent from the Applied
Research Lab, a campus department devoted to assisting in research, to participants
each day for seven days to prompt participants to respond. This email reminder was
based on the email contact address provided by the subject and was not tied to
specific results in order to protect confidentiality of responses. It is estimated that this
aspect of the study took approximately 15 minutes each evening for the course of
seven days.
Phase three. At the end of seven days, participants were sent an email with a
link to access the final part of the study, a questionnaire (Appendix C, part 2) that
asked participants to estimate the amount of time they spent studying and using the
Internet, and to rank-order 13 potential activities in which they engaged while online.
After answering these questions, participants were thanked and debriefed (Appendix
H) online and provided with the experimenter's contact information should they wish
to receive the results of the study. Additionally, participants were again provided with
a copy of local community and campus resources (Appendix I) to access if they felt
concerned about any of the information that they were prompted to think about over
the course of this study.
CHAPTER 4
RESULTS
Descriptive Statistics
Out of the 150 individuals that were originally approached to participate in
this study, 12 declined to participate after being informed of the time commitment for
this study. Of the remaining 138 individuals, 99 successfully completed all three
phases of the study and were included for analysis. Of the 99 participant scores
included in the analyses, 31 (31.3%) were male and 68 (68.7%) were female. A chi-square
test of goodness-of-fit was performed to determine whether the difference in group
size by sex of participant was significant. Sex was not equally distributed
across the sample, χ2(1, n = 99) = 13.828, p < 0.001. This means that possible
gender effects may not have been detected due to the difference in group sizes.
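The reported goodness-of-fit statistic can be reproduced from the group counts alone. The short Python sketch below assumes the null hypothesis was an even split between male and female participants, which is the usual default for this test but is not stated explicitly in the text.

```python
import math

# Observed counts reported in the text: 31 male and 68 female participants.
observed = [31, 68]

# Assumed null hypothesis: equal group sizes, so each group is expected to
# contain half of the sample.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Pearson chi-square statistic: sum of (O - E)^2 / E over the categories.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With 1 degree of freedom, the upper-tail p-value has the closed form
# erfc(sqrt(chi_square / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 3))  # 13.828
print(p_value < 0.001)       # True
```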
The majority of the sample consisted of college freshmen: 87
(87.9%) of the participants were in their first year of college at the time of this study, nine
(9.1%) were sophomores, two (2%) were juniors, and one student (1%) reported
being a continuing education student. Participants reported that they were in 43
different majors, with 16 (16.2%) listing their major as undecided. The largest share of
participants with chosen majors was in the college of Health and Human Services
(24.2%), with 18.2% in the college of Natural Sciences and Mathematics, 15.2% in
the college of Business and Information Technology, 15.2% in the college of
Education and Education Technology, 10.1% in the college of Humanities and Social
Sciences, and 1% in the college of Fine Arts. All participants were enrolled in an
undergraduate general psychology course at the time of this study.
Table 1
Time Spent on the Internet and its Influence on Social Engagement, Social Anxiety,
and Loneliness with Face-to-Face Relationships
Measure    R2       B        SE       β        p
SRM        0.040    0.142    0.255    0.57     0.578
                    0.020    0.020    0.112    0.322
                    0.024    0.023    0.116    0.302
The second linear regression similarly revealed no support for the hypothesis that
participants' loneliness, social engagement, and social anxiety in online settings were a
function of the amount of time they spent on the Internet. This model produced an R2
of 0.028, F(2,98) = 1.393, p = 0.253, and is displayed in Table 2.
Table 2
Time Spent on the Internet and its Influence on Social Anxiety, and Loneliness with
Online Relationships
Measure                          R2       B        SE       β        p
BFNE-II: Online Relationships    0.028    0.021    0.021    0.107    0.301
                                          0.021    0.020    0.109    0.293
These analyses indicated that the amount of time spent on the Internet was not predictive
of participants' reported social anxiety, social engagement, and loneliness in either offline
or online settings.
Measure    R2       B         SE       β         p
SRM        0.043    -0.181    0.092    -0.199    0.053
                    -0.003    0.007    -0.042    0.705
                    -0.003    0.008    -0.034    0.763
A second regression testing this hypothesis for online settings was performed and
similarly did not support this prediction. This model produced an R2 of 0.019,
F(2,98) = 0.916, p = 0.404, and is displayed in Table 4.
Table 4
Social Activity on the Internet and its Influence on Social Anxiety, and Loneliness
with Online Relationships
Measure                          R2       B         SE       β         p
BFNE-II: Online Relationships    0.019    -0.002    0.007    -0.021    0.838
                                          -0.009    0.007    -0.131    0.209
These analyses indicated that the type of activity engaged in while on the Internet was not
predictive of participants' reported social anxiety, social engagement, and loneliness in
either offline or online settings.
Internet Use and Social Activity on the Internet as a Predictor of
Depression
A series of linear regressions was used to investigate the relationship
between depression, Internet use, and social activity on the Internet. The first
regression was performed to predict participants' depression as a function of the
amount of time they spent on the Internet. This model produced an R2 of 0.007,
F(1,98) = 0.682, p = 0.411. A second linear regression was performed to predict
participants' depression as a function of the type of activities in which they engaged
while using the Internet. This model produced an R2 of 0.002, F(1,98) = 0.198, p =
0.657. Neither analysis supported the hypotheses that time or activity was linked to
participants' scores on a measure of their reported depression. Both models are
displayed in Table 5.
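For a regression with a single predictor, the overall F statistic follows directly from R2, so the two reported models can be checked for internal consistency. The sketch below is only a rough check: it assumes 99 participants and therefore 99 - 1 - 1 = 97 residual degrees of freedom, and because the reported R2 values are rounded to three decimals the recomputed F values match the reported ones only approximately.

```python
def f_from_r_squared(r_squared, n_predictors, df_residual):
    """Overall F statistic of a linear regression computed from its R-squared."""
    explained = r_squared / n_predictors
    unexplained = (1 - r_squared) / df_residual
    return explained / unexplained


# Assumption: 99 participants and one predictor give 97 residual df.
f_time = f_from_r_squared(0.007, n_predictors=1, df_residual=97)
f_activity = f_from_r_squared(0.002, n_predictors=1, df_residual=97)

print(round(f_time, 2))      # close to the reported 0.682
print(round(f_activity, 2))  # close to the reported 0.198
```

Both recomputed values fall within rounding distance of the reported statistics, which is what would be expected if the R2 and F values come from the same models.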
Table 5
Depression, Internet Use, and Types of Activities Engaged in Online
Predictor    R2       B        SE       β        p
                                                 0.411
             0.002    0.504    1.131    0.045    0.657
Measure                      Source            SS          df    MS         F        p
                             Between Groups    0.424       1     0.424      0.080    0.778
                             Within Groups     516.369     97    5.323
                             Total             516.793     98
Social Interaction Online    Between Groups    0.355       1     0.355      0.510    0.477
                             Within Groups     67.484      97    0.696
                             Total             67.838      98
Depression (CES-D)           Between Groups    298.675     1     298.675    3.561    0.062
                             Within Groups     8135.164    97    83.868
                             Total             8433.838    98
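The F ratios in the analysis of variance output above can be recomputed from the reported sums of squares. The Python sketch below assumes a between-groups df of 1 (two groups, male and female), which is consistent with the total df of 98 and within-groups df of 97; the pairing of each block of values with a measure label is also an assumption, since the label of the first block is not shown in the source.

```python
# Sums of squares (between, within) as reported in the ANOVA output above.
anova_rows = {
    "first measure (label not shown in the source)": (0.424, 516.369),
    "Social Interaction Online": (0.355, 67.484),
    "Depression (CES-D)": (298.675, 8135.164),
}
DF_BETWEEN = 1   # assumption: two groups compared
DF_WITHIN = 97   # as reported


def f_ratio(ss_between, ss_within):
    """F = mean square between groups / mean square within groups."""
    ms_between = ss_between / DF_BETWEEN
    ms_within = ss_within / DF_WITHIN
    return ms_between / ms_within


for label, (ss_b, ss_w) in anova_rows.items():
    print(label, round(f_ratio(ss_b, ss_w), 3))
```

The recomputed ratios agree with the reported F values of 0.080, 0.510, and 3.561 to three decimal places.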
Measure                     Setting    Mean     SD        SE
Social Anxiety (BFNE-II)    Offline    21.26    12.811    1.288
                            Online     13.19    11.485    1.154
Loneliness (UCLA)           Offline    37.37    11.109    1.116
                            Online     39.05    11.807    1.187
perceived significantly more social anxiety when interacting with others in
face-to-face relationships than when socializing in online formats. Results from both the
social anxiety and loneliness paired-samples t-tests can be found in Table 8.
Table 8
Paired Samples T-Tests for Social Anxiety and Loneliness Measures
Measure                     t         df    p
Social Anxiety (BFNE-II)    7.319     98    < 0.001
Loneliness (UCLA)           -1.827    98    0.071
Table 9
Means of Participant Time Spent Online
           Mean     N     SD       SE
           4.491    99    4.733    0.476
           3.911    99    2.296    0.230
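The standard errors in Table 9 can be checked against the reported standard deviations. This sketch assumes that the value 99 in each row is the sample size and that SE was computed as SD divided by the square root of N; small differences in the last decimal are expected because the reported SD values are themselves rounded.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# SD values as reported in Table 9; n = 99 is assumed to be the sample size.
se_first = standard_error(4.733, 99)
se_second = standard_error(2.296, 99)

print(round(se_first, 3))   # 0.476
print(round(se_second, 3))  # 0.231, versus the reported 0.230
```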
Summary of Results
Overall, the analyses used to test the three primary hypotheses did not lend
support to the predictions. No significant results were found for
predicting scores on the measures of social engagement, social anxiety, or depression
based on time spent on the Internet or amount of social activity engaged in while
online. There was, however, the significant finding that participants reported greater
levels of social anxiety when referring to their face-to-face relationships than
when referring to their online relationships, even though time and activity online did not
predict their overall level of social anxiety.
CHAPTER 5
DISCUSSION
The intention of this study was to clarify discrepant portrayals of Internet use
for social interaction by exploring the impact of Internet use on social engagement in
offline and online settings in a college-aged population with particular attention to
symptoms of social anxiety and depression. Sixty-eight female and 31 male
undergraduate college students spanning 43 different majors served as participants in
this study.
Gender Differences and the Internet
Previous research reported differences in the way the gender of participants
influenced one's interactions with the Internet (Neu, 2009; Ybarra, 2004). Based on
this literature, it was hypothesized that the sex of the participant would result in a
difference in either the amount of time spent on the Internet or in the types of
activities (e.g., social or non-social) in which they engaged while online. It was also
hypothesized that males who spent more time on the Internet would report higher
levels of depression than would females. Contrary to these hypotheses, the
analyses detected no differences in the amount of time spent on the Internet, the
type of activities accessed while online, or reported levels of depression between
male and female participants. However, it is important to note that although the
researcher attempted to recruit an equal number of male and female participants, a
clear majority of the participants were female. The fact that a focus was
placed on having an equal number of male and female participants and the sample
was still disproportionately female may point to some effects of gender that are not
visible and thus not measurable. It is hypothesized that men declined to participate in
this study because of the open nature of what was being measured and because they did not
want to report the types of activities they engage in online. Due to the low number of
male participants, it is possible that differences exist that could not be detected
by these analyses as a result of the low statistical power.
Social Engagement and the Internet
This study defined social engagement as the quality and quantity of
interactions that an individual had with others on a daily basis. Previous literature
reported mixed results that indicated that the amount of Internet usage was linked
with both increases and decreases in face-to-face social interaction (Anderson, 2001;
Campbell et al., 2006; Kraut et al., 1998; Kraut et al., 2002; Madell & Muncer, 2007;
Sanders et al., 2000; Shaw & Gant, 2002), whereas popular media articles
frequently focus on a perceived negative effect of Internet usage on face-to-face social
interactions (Geldof, 2007; Grossman, 2008; USA Today, 2007).
Based on this review of both the psychological literature and the popular
media, it was hypothesized that either the amount of time spent on the Internet or the
amount of social interaction engaged in while online could be used to predict
loneliness and social interaction in offline settings (i.e., face-to-face relationships) and
in online settings (i.e., online relationships). Results suggest that neither the amount
of time spent on the Internet nor the amount of social activity engaged in while online
was predictive of participants' scores on measures of social engagement and
loneliness in offline and online settings. This lack of a statistically significant result
should not be dismissed, because it adds to the previous literature suggesting that use of
the Internet does not result in individuals who are less socially engaged in
their day-to-day lives.
After investigating the primary hypothesis concerning social engagement and
the Internet, an additional analysis was completed to look at potential differences in
participants' perceptions of loneliness for face-to-face social engagement settings and
online social engagement settings (e.g., online relationships). The primary reason for
conducting this analysis was to investigate whether the frequently negative portrayal of
the Internet's effect on social engagement in the popular media is related to the
intuitive perception of individuals. Previous research has shown that individuals will
overlook information that does not fit with their intuitive sense of how things should
occur, particularly if they are already confident that the information should fit in a
particular intuitive way (Simmons & Nelson, 2006). With this in mind, participants'
scores on the loneliness measure for face-to-face relationships and online relationships
were compared to look for apparent differences in their perception of loneliness.
Contrary to popular media accounts of social engagement and the Internet, there were
no apparent differences in the respondents' perception of the loneliness aspect of
social engagement for these seemingly disparate relationships. Thus it can be
hypothesized that when the popular media refer to the negative impact of the Internet
on social engagement, they are not referring to the loneliness aspects of social
engagement.
Social Anxiety and the Internet
This study used the traditional definition of social anxiety as defined by the
Diagnostic and Statistical Manual, fourth edition, text revision (American Psychiatric
Association, 2000). The psychological literature on social anxiety and the Internet
(Campbell et al., 2006; Madell & Muncer, 2006; Neu, 2009) contradicted the popular
perception that socially anxious individuals were more likely to use the Internet for
interpersonal interactions (Ayushveda, 2008; Cuncic, 2009; Sorryforsilence, 2009).
Based on the review of both the psychological literature and the popular
media accounts of social anxiety, it was hypothesized that either the amount of time
spent online or the amount of social interaction engaged in while online could be used
to predict levels of social anxiety. This study confirmed previous findings in the
psychological literature that neither the amount of time spent on the Internet nor the
amount of social activity engaged in while online was predictive of the level of social
anxiety reported by participants when interacting with both offline and online
relationships.
After the analysis of the primary hypothesis was completed and found not to
be statistically significant, additional analysis was completed in order to investigate
the potential differences in participants' perceptions of social anxiety while engaging
with the Internet. As stated previously, research has shown that individuals
are more likely to overlook information that is counterintuitive based on their own
level of confidence in the erroneous information (Simmons & Nelson, 2006), and it
was theorized that this may account for some of the discrepancy between the
psychological literature and the popular media. Unlike with the loneliness analysis,
the additional analysis of the participants' perceptions when they were asked to
respond to questions measuring social anxiety showed that they were more likely to
perceive differences in their social anxiety level when asked to focus on offline
CHAPTER 6
CONCLUSION, LIMITATIONS, AND RECOMMENDATIONS
This study was intended to clarify the psychological literature concerning use
of the Internet and its effect on social engagement in a college-aged population with
particular attention to levels of social anxiety and depression. Thorough investigation
of three primary questions revealed no correlation between the amount of time spent
on the Internet and levels of social engagement, social anxiety and depression in
either offline (e.g., face-to-face) relationships or online relationships. Additionally, no
correlation between the type of activity engaged in while on the Internet and levels of
social engagement, social anxiety, or depression in offline or online relationships was
found in the current study. One finding of note was that the perception of social
anxiety decreased for participants when asked to answer for online relationships, even
though their actual levels of social anxiety were still not significantly influenced by
Internet use.
The seeming implication of this study is that the Internet, like so many other
aspects of daily life, is merely a tool that individuals access and use in ways that they
can choose. The amount of social engagement in which a person engages, both on- and
offline, is not significantly influenced by this tool, nor are their reported levels of
depression symptoms or social anxiety.
There are several limitations to the findings of this study that must be
considered. First, the population at the rural university where this study was
conducted is 87% White, non-Hispanic (IUP, 2010), and thus the sample can be
assumed to have been disproportionately White. This assumption is made because
ethnicity was inadvertently omitted from the demographic questionnaire this study used.
This prevented exploration of differences based on ethnicity and represents a
limitation for generalizing the results of this study to non-White populations.
Collection of this variable would benefit future investigations of this topic.
A second and common limitation is the analog nature of the current study.
Although participants were asked to enter their Internet use and social engagement
into a computer database each evening, they were still required to manually keep
track and enter their self-report. Due to the nature of self-report it is possible that the
data entered is not as accurate as it would be if their usage had been tracked digitally.
Future studies would benefit from gaining permission from participants to install a
computer tracking program to automatically gather the information needed.
The third major limitation of this study is the age group that was tracked.
This particular study focused on college students because this population is
assumed to have had access to the Internet as a part of their daily lives for the
majority of their lives. However, it is possible that different results
regarding the predictive nature of Internet usage would have been found in older
populations. Future research would benefit from exploring the hypothesized links of
this study across both ethnicity and the lifespan.
The last major limitation is that the population of this study was
disproportionately female despite investigator efforts to obtain equal representation of
sex across participants and thus it is possible that the lack of significant gender effects
was due to this discrepancy. Future studies would benefit from using a population
with equally distributed sex in order to explore or rule out any potential effects due to
the sex or gender of the standard Internet user. Perhaps future studies would benefit
by masking the study to help prevent reactivity effects, and thus encourage more
individuals of all genders to participate.
This study confirmed what previous psychological studies have alluded to,
and what the popular media has appeared to deny: the Internet is a valuable tool that
individuals use on a daily basis in order to access information concerning the world
around them. It appears that this tool does not significantly affect a person's
reported levels of depression, social engagement, or social anxiety. However, one
finding that has not been seen in other studies is that although the Internet does not
change a person's actual level of social anxiety, it may decrease their perception of
social anxiety when interacting online. Future studies would benefit from continuing
to explore this difference between the individual's perceived and actual levels of
social anxiety to determine what, if any, aspect of individuals' online relationships
helps. Specifically, more research needs to be done looking at both online and offline
relationships in all aspects of mental health and across all ethnicities and ages.
Additionally, given the rate at which the Internet and all aspects of social media are
expanding, it would be beneficial to have participants rate their activities online in
terms of how social they consider each activity. For example, one person playing
chess online may find the interaction to be highly social while another individual may
find it to be a solitary activity. This perception of social interaction online would be a
rich area to explore in future research.
Thus, contrary to the original hypotheses of this study, the Internet is simply a
tool that can be used to broaden a person's experience of the world in any way that
they see fit. The Internet puts the world at a person's fingertips and, in the United
States, is theoretically the type of tool that any individual can access regardless of
their socioeconomic status, ethnicity, class standing, or geographic location.
REFERENCES
American Psychiatric Association. (2000). Diagnostic and Statistical Manual of
Mental Disorders (text revision). Washington, DC: Author.
Amichai-Hamburger, Y., Wainapel, G., & Fox, S. (2002). On the Internet No One
Knows I'm an Introvert: Extroversion, Neuroticism, and Internet Interaction.
CyberPsychology & Behavior, 5(2), 125-128.
Anderson, K.J. (2001). Internet Use Among College Students: An Exploratory Study.
Journal of American College Health, 50(1), 21-26.
Ayushveda (2008, July). How to Reduce Anxiety. Online magazine by Ayushveda.
Retrieved from http://www.ayushveda.com/
Braveman, P., Cubbin, C., Marchi, K., Egerter, S., & Chavez, G. (2001). Measuring
Socioeconomic Status/Position in Studies of Racial/Ethnic Disparities:
Maternal and Infant Health. Public Health Reports, 116, 449-463.
Brignall, T.W., III & Van Valey, T. (2005). The Impact of Internet Communications
on Social Interactions. Sociological Spectrum, 25, 335-348.
Canadian Council on Social Development (2006). Social Engagement: The Progress
of Canada's Children and Youth, 2006. Retrieved from
http://www.ccsd.ca/pccy/2006/pdf/pccy_socialengagement.pdf.
Campbell, A.J., Cummings, S.R., & Hughes, I. (2006). Internet Use by the Socially
Fearful: Addiction or Therapy? CyberPsychology & Behavior, 9(1), 69-81.
Caplan, S.E. (2007). Relations Among Loneliness, Social Anxiety, and Problematic
Internet Use. CyberPsychology & Behavior, 10(2), 234-242.
Carleton, R.N., McCreary, D.R., Norton, P.J., & Asmundson, G.J.G. (2006). Brief
Fear of Negative Evaluation Scale Revised. Depression and Anxiety, 23,
297-303.
Carney, C.E., Edinger, J.D., Meyer, B., Lindman, L., & Istre, T. (2006). Daily
Activities and Sleep Quality in College Students. Chronobiology
International, 23(3), 623-637.
CBS News (2007, July 15). Parents Played Video Games As Kids Starved. Retrieved
from http://www.cbsnews.com/stories/2007/07/15/national/
main3058816.shtml. Article on file with author.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Cuncic, A. (2009, June 30). Is Facebook Good for Social Anxiety [Web log post].
Retrieved from
http://socialanxietydisorder.about.com/b/2009/06/30/facebook-good-forsocial-anxiety.htm
Davis, D.C. (2007). MySpace Isn't Your Space. ExpressO Preprint Series. Working
Paper 1943. Retrieved from http://law.bepress.com/expresso/eps/1943
Email (2009). The American Heritage Dictionary of the English Language, Fourth
Edition. Retrieved from http://dictionary.reference.com/browse/email
Faul, F., Erdfelder, E., Lang, A.-G. & Buchner, A. (2007). G*Power 3: A flexible
statistical power analysis program for the social, behavioral, and biomedical
sciences. Behavior Research Methods, 39, 175-191.
Internet World Stats (2008) United States of America: Internet Usage and Broadband
Usage Report. Retrieved from http://www.internetworldstats.com/am/us.htm
IRC (2009). The American Heritage Dictionary of the English Language, Fourth
Edition. Retrieved from http://dictionary.reference.com/browse/irc
IUP (2010). Facts about IUP. Retrieved from http://www.iup.edu/about/default.aspx
Jones, S. & Fox, S. (2009). Pew Internet & American Life Project: Generations
online in 2009. Retrieved from http://www.pewinternet.org
Kraut, R., Patterson, M., Lundmark, V., Kiesler, S., Mukopadhyay, T., & Scherlis, W.
(1998). Internet Paradox: A Social Technology that Reduces Social
Involvement and Psychological Well-Being? American Psychologist, 53(9),
1017-1031.
Kraut, R., Kiesler, S., Boneva, B., Cummings, J., Helgeson, V., & Crawford, A.
(2002). Internet Paradox Revisited. Journal of Social Issues, 58(1), 49-74.
Luo, J.S. (2007). Social Networking: Now Professionally Ready. Primary Psychiatry,
14(2), 21-24.
Madell, D., & Muncer, S. (2006). Internet Communication: An Activity that Appeals
to Shy and Socially Phobic People? CyberPsychology & Behavior, 9(5), 618-622.
Madell, D., & Muncer, S.J. (2007). Control over Social Interactions: An Important
Reason for Young People's Use of the Internet and Mobile Phones for
Communication. CyberPsychology & Behavior, 10(1), 137-140.
Meyer, T.D., & Maier, S. (2006). Is there evidence for social rhythm instability in
people at risk for affective disorders? Psychiatry Research, 141, 103-114.
Monk, T.H., Frank, E., Potts, J.M., & Kupfer, D.J. (2002). A simple way to measure
daily lifestyle regularity. J. Sleep Res., 11, 183-190.
Monk, T.H., Kupfer, D.J., Frank, E., & Ritenour, A.M. (1990). The Social Rhythm
Metric (SRM): Measuring Daily Social Rhythms Over 12 Weeks. Psychiatry
Research, 36, 195-207.
Monk, T.H., Petrie, S.R., Hayes, A.J., & Kupfer, D.J. (1994). Regularity in daily life
in relation to personality, age, gender, sleep quality, and circadian rhythms. J.
Sleep Res, 3, 196-205.
Morgan, C., & Cotton, S.R. (2003). The Relationship between Internet Activities and
Depressive Symptoms in a Sample of College Freshmen. CyberPsychology &
Behavior, 6(2), 133-142.
Neu, S. (2009). Use of Massively Multiplayer Online Role Play Games by College
Students (Doctoral dissertation). Available from ProQuest Dissertations and
Thesis Database (ATT No. 3344420).
Odell, P.M., Korgen, K.O., Schumacher, P., & Delucchi, M. (2000). Internet Use
Among Female and Male College Students. CyberPsychology & Behavior,
3(5), 855-862.
Pew Internet Tracking Survey (2007a) Demographics of Internet Users. Retrieved
from http://www.pewinternet.org/
Pew Internet Tracking Survey (2007b) Daily Internet Activities. Retrieved from
http://www.pewinternet.org/
Radloff, L.S. (1977). The CES-D Scale: A Self-Report Depression Scale for
Research in the General Population. Applied Psychological Measurement, 1,
385-401.
Russell, D. (1996). UCLA Loneliness Scale (Version 3): Reliability, Validity, and
Factor Structure. Journal of Personality Assessment, 66(1), 20-40.
Sanders, C.E., Field, T.M., Diego, M. & Kaplan, M. (2000). The Relationship of
Internet Use to Depression and Social Isolation Among Adolescents.
Adolescence, 35(138), 237-242.
Sellers, P. (2006, August 29). MySpace Cowboys. Fortune Magazine. Retrieved from
http://money.cnn.com/magazines/fortune/
Shaw, L.H., & Gant, L.M. (2002). In Defense of the Internet: The Relationship
between Communication and Depression, Loneliness, Self-Esteem, and
Perceived Social Support. CyberPsychology & Behavior, 5(2), 157-171.
Simmons, J.P. & Nelson, L.D. (2006). Intuitive Confidence: Choosing Between
Intuitive and Nonintuitive Alternatives. Journal of Experimental Psychology:
General, 135(3), 409-428.
Sorryforsilence (2009, June 22). Are online friends BAD for people with social
anxiety disorder? [Web log post]. Retrieved from
http://sorryforsilence.wordpress.com/2009/06/22/is-having-online-friendsbad-for-people-with-social-anxiety-disorder/
Teske, J.A. (2002). Cyberpsychology, Human Relationships, and Our Virtual
Interiors. Zygon, 37(3), 677-700.
Thibaut, J., & Kelley, H. (1986). Interference and Facilitation in Interaction. In The
Social Psychology of Groups. (p. 60). Edison, New Jersey: Transaction
Publishers.
Turk, C.L., Heimberg, R.G., & Hope, D.A. (2001). Social Anxiety Disorder. In D.H.
Barlow (Ed.), Clinical Handbook of Psychological Disorders (3rd edition).
(pp. 114-153). New York: The Guilford Press.
USA Today (2007, July 15). Couple Accused of starving children while on the
Internet. Retrieved from http://www.usatoday/news/nation/2007-07-15internet-neglect_N.htm. Article on file with author.
Watters, E. (2003). Urban Tribes. New York: Bloomsbury
Weiser, E. (2000). The Functions of Internet use and their social, psychological, and
interpersonal consequences (Doctoral dissertation). Available from ProQuest
Dissertations and Thesis Database (ATT No. 9980637).
White, E. (2007). Text Appeal: In the Age of Computers and Cell Phones,
Relationships Progress from Email to Text to the Real Commitment: A Phone
Call. The Houston Chronicle. Retrieved from http://www.chron.com/
World Wide Web (2002). The American Heritage Science Dictionary. Retrieved
from http://dictionary.reference.com/browse/world_wide_web
Ybarra, M.L. (2004). Linkages between Depressive Symptomatology and Internet
Harassment among Young Regular Internet Users. CyberPsychology &
Behavior, 7(2), 247-257.
Young, J.E., Weinberger, A.D., & Beck, A.T. (2001). Cognitive Therapy for
Depression. In D.H. Barlow (Ed.), Clinical Handbook of Psychological
Disorders (3rd edition). (pp. 114-153). New York: The Guilford Press.
Yule, T. (2004). Lotus Illustrated Dictionary of Internet. Twin Lakes, WI: Lotus
Press.
APPENDIX A
Informed Consent
Informed Consent
You are invited to participate in this research study. The following information is
provided in order to help you make an informed decision about whether or not to
participate in this study. If you have any questions please do not hesitate to ask via
the provided researcher email listed below. You are eligible to participate because
you are an undergraduate at Indiana University of Pennsylvania and enrolled in PSYC
101 General Psychology.
The purpose of this study is to learn about college students' habits when using the
Internet and the impact that it may have on social relationships and psychological
well-being. This study is particularly interested in looking at the amount of time you
spend on the Internet per week and the particular activities you engage in while on the
Internet. In an effort to get a complete picture of respondents, some demographic
information is included for this study. Several questionnaires include items of a
personal nature related to feelings of loneliness, depression and anxiety. It is
estimated that completion of questionnaires will take one hour at the initial interview
and an additional 15 minutes each night for a period of 7 days for no longer than a
total commitment of 3 hours. Your completed participation in this study will earn 4
of the 6 points required to complete your research participation in your PSYC 101
course.
Your participation in this study is voluntary. You may choose not to participate in this
study or to withdraw at any time without adversely affecting your relationship with
the investigators, with IUP, or your psychology professor. If you choose not to
participate, your name will be returned to the subject pool and your research
participation obligation will remain the same. If you choose to participate you may
withdraw at any time by notifying the researcher. Upon your request to withdraw, all
information pertaining to you will be destroyed. If you choose to participate, all
information will be held in strict confidence. Your responses will be considered only
in combination with those from other participants. The information obtained in this
study may be published in scientific journals or presented at scientific meetings but
your identity will always be kept strictly confidential.
Faculty Sponsor:
Kimberely J. Husenits, Psy.D.
Associate Professor
Psychology Department
238A Uhler Hall
Indiana, PA 15705
husenits@iup.edu
If you are willing to participate in this study, please sign the statement below. If you
choose not to participate, please inform the researcher now.
I have read the above information and understand that participation in this study is
voluntary. I agree to be a part of this research.
Signature of Participant
Date
APPENDIX B
Demographic Questionnaire
Where do you currently reside?
_____ On campus in student housing
_____ Off campus in student housing
_____ Off campus with family
_____ Off campus with friends
_____ Off campus alone
_____ Other:
What are the 3 most important activities you use the Internet for?
(1)
(2)
(3)
APPENDIX C
Part One: Internet Usage Tracking Chart
Time slot           Monday  Tuesday  Wednesday  Thursday  Friday  Saturday  Sunday
Midnight - 1 am
1 am - 2 am
2 am - 3 am
3 am - 4 am
4 am - 5 am
5 am - 6 am
6 am - 7 am
7 am - 8 am
8 am - 9 am
9 am - 10 am
10 am - 11 am
11 am - 12 pm
12 pm - 1 pm
1 pm - 2 pm
2 pm - 3 pm
3 pm - 4 pm
4 pm - 5 pm
5 pm - 6 pm
6 pm - 7 pm
7 pm - 8 pm
8 pm - 9 pm
9 pm - 10 pm
10 pm - 11 pm
11 pm - Midnight
Scoring: Total amount of hours per day, divided by number of days = average amount per day online.
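The scoring rule above amounts to a simple mean over the tracked days. A minimal sketch follows; the function name and the sample week of values are hypothetical and are not data from this study.

```python
def average_hours_online(daily_hours):
    """Return the mean number of hours online per tracked day."""
    if not daily_hours:
        raise ValueError("at least one tracked day is required")
    # Sum the daily totals, then divide by the number of days tracked.
    return sum(daily_hours) / len(daily_hours)


# Hypothetical daily totals for Monday through Sunday.
example_week = [3, 2, 4, 1, 5, 6, 0]
print(average_hours_online(example_week))
```

Each daily total is the count of hourly slots marked on the chart for that day, so a participant who tracked fewer than seven days is averaged over only the days actually recorded.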
APPENDIX D
Columns:
Activity
Time: Clock time (AM / PM), or Check if Did not do
People (1 = just present, 2 = actively involved): Check if Alone; Spouse / partner; Children; Other family members; Other person(s)
Activities:
Out of bed
First contact (in person or by phone) with another person
Have breakfast
Have lunch
Have dinner
Physical exercise
Activity A
Activity B
Go to bed
APPENDIX E
UCLA Loneliness Scale (Version 3)
The following statements describe how people sometimes feel. For each statement, please indicate how often you feel
the way described by writing a number in the space provided.
Here is an example: How often do you feel happy?
If you never felt happy, you would respond "never"; if you always feel happy, you would respond "always."
Never    Rarely    Sometimes    Often
1. How often do you feel that you are in tune with the people
around you?
2.
3. How often do you feel that there is no one you can turn to?
4.
5.
6. How often do you feel that you have a lot in common with
the people around you?
7.
8. How often do you feel that your interests and ideas are not
shared by those around you?
9.
12. How often do you feel that your relationships with others are
not meaningful?
13. How often do you feel that no one really knows you well?
15. How often do you feel that you can find companionship
when you want it?
16. How often do you feel that there are people who really
understand you?
18. How often do you feel that people are around you but not
with you?
19. How often do you feel that there are people you can talk to?
20. How often do you feel that there are people you can turn to?
APPENDIX F
Center For Epidemiologic Studies Depression Scale (CES-D)
of the time (1-2 days)
Occasionally or a moderate amount of time (3-4 days)
Most or all of the time (5-7 days)
6. I felt depressed.
APPENDIX G
Brief Fear of Negative Evaluation, Revised (BFNE-II)
For the following statements please indicate how characteristic each is of you using the following rating scale:
Not at all characteristic of me
Slightly characteristic of me
Moderately characteristic of me
Very characteristic of me
Extremely characteristic of me
1.
2.
3.
4.
5.
6.
7.
8.
9.
APPENDIX H
Debriefing
Clinical Psychology Doctoral Program
Psychology Department
Uhler Hall, Room 201 / 1020 Oakland Avenue
Indiana, Pennsylvania 15705-1064
724-357-4519 (office) 724-357-4519 (fax)
Debriefing
Thank you for participating in this research study. The Internet has become an
integral part of Western society, with approximately 69.2% of the population of the
United States using the Internet on a regular basis (Internet World Stats, 2007). This
study was conducted with the purpose of learning about college students' habits when
using the Internet and the impact that it may have on the social relationships and
psychological well-being of regular users of this medium. The connection between
Internet use and social engagement, depression, and social anxiety has received mixed
results in both the popular media and the psychological research; an example of this
can be found in the journal article "Internet Paradox Revisited" (Kraut et al., 2002). The
study in which you participated is designed to more accurately track students' daily
use of the Internet in terms of time spent on a variety of online activities in order to
clarify links between use and psychological outcomes.
The responses that you gave will be considered only in combination with those from
other participants in the study so that you cannot be personally identified. Although
the information obtained in this study may be published in scientific journals or
presented at scientific meetings, your identity will always be kept strictly
confidential.
This research is sponsored by Indiana University of Pennsylvania's Department of
Psychology. If you have any questions concerning this study, if you would like more
examples of the mixed results found between popular media and the psychological
research, or if you feel that you need to speak with a professional and would like a
referral, please contact the primary researcher listed below:
Primary Researcher:
Kimberlee D. DeRushia, M.A.
Graduate Student
Psychology Department
201 Uhler Hall
Indiana, PA 15705
k.d.derushia@iup.edu
Faculty Sponsor:
Kimberely J. Husenits, Psy.D.
Associate Professor
Psychology Department
238A Uhler Hall
Indiana, PA 15705
husenits@iup.edu
APPENDIX I
Campus and Community Resources
Counseling / Psychotherapy Resources:
1. IUP Counseling and Student Development Center
307 Pratt Hall (IUP campus)
724.357.2621
2. Crisis Intervention, Drug and Alcohol Counseling:
Open Door Counseling & Crisis Center
334 Philadelphia Street
Indiana, PA
724.465.2605
Suicide Hotline: 800.794.2112
3. Indiana County Guidance Center
793 Old Route 119 Highway North
Indiana, PA
724.465.5576
4. Center for Applied Psychology
Includes Stress & Habit Disorders Clinic, Child & Family Clinic, and Assessment
Clinic
210 Uhler Hall (IUP campus)
724.357.6228
Domestic Violence or Rape Crisis:
1. Alice Paul House
724.349.4444 or 800.435.7249
Child Abuse or Neglect:
1. Indiana County Children and Youth Services
350 N. 4th Street
Indiana, PA
724.465.3895
Recommending or Persuading?
The Impact of a Shopping Agent's
Algorithm on User Behavior
Gerald Häubl
School of Business
University of Alberta
Edmonton, AB
Canada, T6G 2R6
Gerald.Haeubl@ualberta.ca
Kyle B. Murray
School of Business
University of Alberta
Edmonton, AB
Canada, T6G 2R6
kbmurray@ualberta.ca
ABSTRACT
Keywords
1. INTRODUCTION
The constraints of physical space no longer dictate the
organization of information in electronic shopping environments
[6]. One consequence of this is that online vendors are able to
offer a very large number of products due to their virtually infinite
shelf space, i.e., the lack of physical constraints with respect to
product display. Combined with the fact that the cost of searching
for product information across merchants is substantially lower in
electronic marketplaces than in the physical world [1, 7], this
results in the availability of a potentially vast amount of
information about market offerings to consumers.
Easy access to large amounts of product information is both a
blessing and a curse. It is a blessing in the sense that more
information may allow consumers to make better purchase
decisions (e.g., to select products that better match their personal
preferences) than they would otherwise. However, the curse of
having access to vast amounts of information is that consumers,
due to their limited cognitive capacity, may be unable to
adequately process this information. The idea that human decision
makers have limited resources for information processing,
whether those limits are in memory, attention, motivation, or
elsewhere, has deep roots in the literature of both marketing
and psychology [9, 11, 12]. In electronic shopping environments,
consumers are less constrained by the availability of product
information, yet they remain bounded by the cognitive limitations
of human information processing.
General Terms
Algorithms, Management, Design, Economics, Experimentation,
Human Factors, Theory.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
EC'01, October 14-17, 2001, Tampa, Florida, USA.
Copyright 2001 ACM 1-58113-387-1/01/0010$5.00.
2. THE POTENTIAL OF AN
ELECTRONIC AGENT TO PERSUADE
3. METHOD
3.1 Overview
The objective of this experiment was to examine the possibility of
preference construction due to the selective inclusion of attributes
in a recommendation agent. The study was fully computer-based,
and involved a simulated shopping trip in an Internet-based
electronic store equipped with a recommendation agent and the
subsequent completion of an online questionnaire. Subjects were
informed that the purpose of the research was to test a new
electronic shopping environment and its features. Their task was
to shop for a backpacking tent in the Internet-based store and to
complete their simulated shopping trip by selecting from the set of
available tents the one that was the most attractive to them
personally. A total of 347 subjects completed the study remotely,
via a secure Internet site. Participants were randomly assigned to
one of the treatment conditions (see below).
No.  Model       Durability Rating  Fly Fabric        Weight (kilograms)
1    Coyote      76                 2.3 oz Nylon      3.4
2    Adventurer  76                 1.9 oz Polyester  3.4
3    Sunlight    79                 2.3 oz Nylon      3.5
4    Grizzly     79                 1.9 oz Polyester  3.5
5    Oasis       82                 2.3 oz Nylon      3.6
6    Solitude    82                 1.9 oz Polyester  3.6
7    Summit      85                 2.3 oz Nylon      3.3
8    Drifter     85                 1.9 oz Polyester  3.3
9    Challenger  88                 2.3 oz Nylon      3.8
10   Serenity    88                 1.9 oz Polyester  3.8
11   Raven       91                 2.3 oz Nylon      3.9
12   Waterfall   91                 1.9 oz Polyester  3.9
13   Naturalist  94                 2.3 oz Nylon      4.0
14   Skyline     94                 1.9 oz Polyester  4.0
15   Neptune     97                 2.3 oz Nylon      3.7
16   Freestyle   97                 1.9 oz Polyester  3.7
Note: The most attractive level of each of the two primary attributes is indicated by gray shading.
Warranty (years), rows 1-16: 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4, 3
4. RESULTS
As a test of the predicted inclusion effect, we examine the relative
choice shares in the shopping task for alternatives that have the
most attractive level of the primary included attribute, i.e., the
primary attribute that was considered by the recommendation
agent. Our directional prediction is that alternatives that are
superior on the included attribute are more likely to be chosen
than ones that are superior on the excluded attribute. The
corresponding null hypothesis is that the extent to which an
attribute drives subjects' choices of products is independent of
whether that attribute was included in the recommendation agent
or not, i.e., that half of the subjects select an alternative that has
the most attractive level of the included attribute and the other
half choose a product that has the most attractive level of the
excluded attribute (when controlling for potential differences in
the ecological importance of the actual attributes through
counterbalancing). A significant departure from such a fifty-fifty
split in choice shares in the predicted direction (i.e., greater
importance of an attribute when it is included in the
recommendation agent) would provide support for the inclusion
effect. Since attribute-specific characteristics were controlled for
through the counterbalancing of the two blocks of attributes, any
such departure would be independent of the relative importance of
the actual attributes used.
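One way to check for a departure from a fifty-fifty split is an exact one-sided binomial test. The sketch below is illustrative only: the text above does not say which test was applied, the function name is hypothetical, and the counts used (61 of 100 choices) are made up rather than taken from the experiment.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical counts (not from the study): 61 of 100 subjects choose the
# alternative that is superior on the included attribute.
p_value = binomial_upper_tail(61, 100)
print(f"one-sided p-value under the fifty-fifty null: {p_value:.4f}")
```

A small p-value here would indicate that the observed share is unlikely under the null hypothesis of an even split.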
                Durability Rating  Fly Fabric        Weight (kilograms)  Warranty (years)
 1  Traveler    76                 2.3 oz Nylon      3.9                 4
 2  Journey     76                 1.9 oz Polyester  3.9                 3
 3  Seabreeze   79                 2.3 oz Nylon      4.0                 4
 4  Moonscape   79                 1.9 oz Polyester  4.0                 3
 5  Galaxy      82                 2.3 oz Nylon      3.5                 4
 6  Lakeside    82                 1.9 oz Polyester  3.5                 3
 7  BackTrail   85                 2.3 oz Nylon      3.6                 4
 8  Eagle       85                 1.9 oz Polyester  3.6                 3
 9  Eclipse     88                 2.3 oz Nylon      3.7                 4
10  Daydream    88                 1.9 oz Polyester  3.7                 3
11  Spirit      91                 2.3 oz Nylon      3.8                 4
12  Westwind    91                 1.9 oz Polyester  3.8                 3
13  Glacier     94                 2.3 oz Nylon      3.3                 4
14  Wanderer    94                 1.9 oz Polyester  3.3                 3
15  Mountain    97                 2.3 oz Nylon      3.4                 4
16  Outfitter   97                 1.9 oz Polyester  3.4                 3
Note: The most attractive level of each of the two primary attributes is indicated by gray shading.
[Bar chart of Choice Shares: Alternative Superior on Included Attribute, 60.7%; Alternative Superior on Excluded Attribute, 39.3%.]
Figure 3: Attribute Inclusion in the Agent and Choice Shares in Agent-Assisted Shopping
[Bar chart. Vertical axis: Choice Shares; horizontal axis: Inter-Attribute Correlation (Negative, Positive); bars: Alternative Superior on Included Attribute and Alternative Superior on Excluded Attribute; data labels: 71.0%, 51.5%, 48.5%, 29.0%.]
5. DISCUSSION
Although electronic shopping environments are not subject to the
space constraints of bricks-and-mortar stores, consumers remain
bounded by the familiar cognitive constraints in terms of their
ability to process information. Electronic recommendation agents
can play a key role in reducing the amount of information about
Our key hypothesis has been that, everything else being equal, the
inclusion of an attribute in a selective recommendation agent
renders this attribute more prominent in consumers' purchase
decisions in an electronic shopping environment. The results of
our controlled experiment provide strong support for the existence
of such an inclusion effect under typical market conditions where
no alternative is clearly superior to another and choosing a
product involves making trade-offs among attributes (i.e.,
negative inter-attribute correlation). Our findings suggest that, in
addition to providing a recommendation, an electronic agent has
the potential, whether intentionally or unintentionally, to
persuade users that certain alternatives are preferable to others.
The research presented here demonstrates that the preferences of
human decision makers can be influenced in a systematic and
predictable manner by merely altering the composition of the set
of product attributes that are included in a recommendation agent
for online shopping. In combination with the results from Häubl
and Murray [4], which demonstrate that the inclusion effect may
persist over time and into settings where an electronic agent is no
longer present, this stream of research illustrates the considerable
potential for systematically manipulating consumer behavior and
consumer preferences in digital marketplaces through the design
of electronic decision aids.
This paper extends the existing body of literature on constructive
consumer preferences by proposing and demonstrating a new type
of preference-construction effect that, given the rapidly increasing
prevalence of electronic decision aids for online shopping, is of
growing importance. In addition, this research also makes a
contribution to the emerging literature on consumer behavior in
the context of electronic commerce, in that it represents a step
towards a more complete understanding of human decision
making in agent-assisted electronic shopping environments.
6. ACKNOWLEDGMENTS
7. REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
The deep Web is the largest growing category of new information on the Internet.
Deep Web sites tend to be narrower, with deeper content, than conventional surface sites.
Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface
Web.
Deep Web content is highly relevant to every information need, market, and domain.
More than half of the deep Web content resides in topic-specific databases.
A full ninety-five per cent of the deep Web is publicly accessible information not subject to fees
or subscriptions.
To put these findings in perspective, a study at the NEC Research Institute [1], published in
Nature, estimated that the search engines with the largest number of Web pages indexed (such as
Google or Northern Light) each index no more than sixteen per cent of the surface Web. Since they
are missing the deep Web when they use such search engines, Internet searchers are therefore
searching only 0.03%, or one in 3,000, of the pages available to them today. Clearly,
simultaneous searching of multiple surface and deep Web sources is necessary when
comprehensive information retrieval is needed.
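The 0.03% figure can be reproduced with a short calculation. The sketch below assumes the deep Web is roughly 500 times the size of the surface Web (the ratio this paper arrives at) and that a single engine indexes at most sixteen per cent of the surface Web; the variable names are illustrative.

```python
# Share of all Web content (surface plus deep) reached through one large
# search engine, under the assumptions stated above.
SURFACE_COVERAGE = 0.16   # at most 16% of the surface Web is indexed
DEEP_TO_SURFACE = 500     # assumed deep Web size relative to the surface Web

surface = 1.0
deep = DEEP_TO_SURFACE * surface

searched_share = SURFACE_COVERAGE * surface / (surface + deep)
one_in = 1 / searched_share

print(f"share searched: {searched_share:.2%}")   # about 0.03%
print(f"roughly one page in {one_in:,.0f}")      # on the order of one in 3,000
```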
Figure 1. Search Engines: Dragging a Net Across the Web's Surface
54,000 documents. [12] Since then, the compound growth rate in Web documents has been on
the order of more than 200% annually! [13a]
Sites that were required to manage tens to hundreds of documents could easily do so by posting
fixed HTML pages within a static directory structure. However, beginning about 1996, three
phenomena took place. First, database technology was introduced to the Internet through such
vendors as Bluestone's Sapphire/Web (Bluestone [http://www.bluestone.com] has since been bought
by HP) and later Oracle [http://www.oracle.com/]. Second, the Web became commercialized initially
via directories and search engines, but rapidly evolved to include e-commerce. And, third, Web
servers were adapted to allow the "dynamic" serving of Web pages (for example, Microsoft's ASP
and the Unix PHP technologies).
This confluence produced a true database orientation for the Web, particularly for larger sites. It is
now accepted practice that large data producers such as the U.S. Census Bureau
[http://www.census.gov], Securities and Exchange Commission [http://www.sec.gov], and Patent and
Trademark Office [http://www.uspto.gov], not to mention whole new classes of Internet-based
companies, choose the Web as their preferred medium for commerce and information transfer.
What has not been broadly appreciated, however, is that the means by which these entities provide
their information is no longer through static pages but through database-driven designs.
It has been said that what cannot be seen cannot be defined, and what is not defined cannot be
understood. Such has been the case with the importance of databases to the information content of
the Web. And such has been the case with a lack of appreciation for how the older model of
crawling static Web pages, today's paradigm for conventional search engines, no longer applies
to the information content of the Internet.
In 1994, Dr. Jill Ellsworth first coined the phrase "invisible Web" to refer to information content
that was "invisible" to conventional search engines. [14] The potential importance of searchable
databases was also reflected in the first search site devoted to them, the AT1 engine that was
announced with much fanfare in early 1997. [15] However, PLS, AT1's owner, was acquired by
AOL in 1998, and soon thereafter the AT1 service was abandoned.
For this study, we have avoided the term "invisible Web" because it is inaccurate. The only thing
"invisible" about searchable databases is that they are not indexable nor able to be queried by
conventional search engines. Using BrightPlanet technology, they are totally "visible" to those who
need to access them.
Figure 2 represents, in a non-scientific way, the improved results that can be obtained by
BrightPlanet technology. By first identifying where the proper searchable databases reside, a
directed query can then be placed to each of these sources simultaneously to harvest only the
results desired with pinpoint accuracy.
Figure 2. Harvesting the Deep and Surface Web with a Directed Query Engine
Additional aspects of this representation will be discussed throughout this study. For the moment,
however, the key points are that content in the deep Web is massive, approximately 500 times
greater than that visible to conventional search engines, with much higher quality throughout.
BrightPlanet's technology is uniquely suited to tap the deep Web and bring its results to the
surface. The simplest way to describe our technology is a "directed-query engine." It has other
powerful features in results qualification and classification, but it is this ability to query multiple
search sites directly and simultaneously that allows deep Web content to be retrieved.
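The idea of a directed query, sending one query to several searchable sources at the same time and merging what comes back, can be sketched as follows. This is only an illustration of the general pattern, not BrightPlanet's implementation: the source names, the in-memory "databases", and the `search_source` helper are all hypothetical stand-ins for real search forms.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for searchable databases; a real engine would submit
# the query to each site's own search form instead.
SOURCES = {
    "patents-db": ["deep web search patent", "tent pole patent"],
    "census-db": ["county population tables", "deep web census records"],
    "news-archive": ["search engine coverage story"],
}


def search_source(name: str, query: str) -> list[tuple[str, str]]:
    """Return (source, record) pairs whose text contains every query term."""
    terms = query.lower().split()
    return [
        (name, record)
        for record in SOURCES[name]
        if all(term in record.lower() for term in terms)
    ]


def directed_query(query: str) -> list[tuple[str, str]]:
    """Send the same query to every source concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        per_source = pool.map(lambda name: search_source(name, query), SOURCES)
        return [hit for hits in per_source for hit in hits]


print(directed_query("deep web"))
```

Each source is queried in its own worker thread, and only records matching the query are returned, which mirrors the "harvest only the results desired" idea in a toy setting.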
Study Objectives
To perform the study discussed, we used our technology in an iterative process. Our goal was to:
Quantify the size and importance of the deep Web.
Characterize the deep Web's content, quality, and relevance to information seekers.
Discover automated means for identifying deep Web search sites and directing queries to them.
Begin the process of educating the Internet-searching public about this heretofore hidden and
valuable information storehouse.
Like any newly discovered phenomenon, the deep Web is just being defined and understood. Daily,
as we have continued our investigations, we have been amazed at the massive scale and rich
content of the deep Web. This white paper concludes with requests for additional insights and
information that will enable us to continue to better understand the deep Web.
which may be partially "hidden" to the major traditional search engines nor the contents of
major search engines themselves. This latter category is significant. Simply accounting for the
three largest search engines and average Web document sizes suggests search-engine contents
alone may equal 25 terabytes or more [17] or somewhat larger than the known size of the
surface Web.
In partnership with Inktomi, NEC updated its Web page estimates to one billion documents in
early 2000. [21] We have taken this most recent size estimate and updated total document
storage for the entire surface Web based on the 1999 Nature study:
Table 1. Baseline Surface Web Size Assumptions
Total No. of Documents  Content Size (GBs) (HTML basis)
1,000,000,000           18,700
These are the baseline figures used for the size of the surface Web in this paper. (A more recent
study from Cyveillance [5e] has estimated the total surface Web size to be 2.5 billion documents,
growing at a rate of 7.5 million documents per day. This is likely a more accurate number, but the
NEC estimates are still used because they were based on data gathered closer to the dates of our
own analysis.)
Other key findings from the NEC studies that bear on this paper include:
Surface Web coverage by individual, major search engines has dropped from a maximum of 32%
in 1998 to 16% in 1999, with Northern Light showing the largest coverage.
Metasearching using multiple search engines can improve retrieval coverage by a factor of 3.5 or
so, though combined coverage from the major engines dropped to 42% from 1998 to 1999.
More popular Web documents, that is, those with many link references from other documents,
have up to an eight-fold greater chance of being indexed by a search engine than those with no
link references.
stand-alone use, a query term known not to occur on the site such as "NOT ddfhrwxxct" was
issued. This approach returns an absolute total record count. Failing these two options, a
broad query was issued that would capture the general site content; this number was then
corrected for an empirically determined "coverage factor," generally in the 1.2 to 1.4 range. [22]
5. A site that failed all of these tests could not be measured and was dropped from the results
listing.
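The last fallbacks described above can be expressed as a small decision chain. The sketch below is an illustration under stated assumptions: the function and parameter names are hypothetical, the default coverage factor of 1.3 is simply the midpoint of the 1.2 to 1.4 range, and applying the factor as a multiplier is one reading of "corrected for".

```python
from typing import Optional


def estimate_record_count(
    not_query_count: Optional[int],
    broad_query_count: Optional[int],
    coverage_factor: float = 1.3,  # assumed midpoint of the 1.2-1.4 range
) -> Optional[int]:
    """Estimate a site's total record count, or return None if unmeasurable.

    not_query_count: total returned by a stand-alone NOT query, if the site
        supports one (taken as an absolute count).
    broad_query_count: hits for a broad query covering general site content.
    """
    if not_query_count is not None:
        return not_query_count
    if broad_query_count is not None:
        # Scale the broad-query count up by the coverage factor (assumption).
        return round(broad_query_count * coverage_factor)
    return None  # site fails every test and is dropped from the listing


print(estimate_record_count(None, 1000))
```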
Figure 3. Schematic Representation of "Overlap" Analysis
Overlap analysis involves pairwise comparisons of the number of listings individually within two
sources, $n_a$ and $n_b$, and the degree of shared listings or overlap, $n_0$, between them. Assuming
random listings for both $n_a$ and $n_b$, the total size of the population, $N$, can be estimated. The
estimate of the fraction of the total population covered by $n_a$ is $n_0/n_b$; when applied to the total
size of $n_a$, an estimate for the total population size can be derived by dividing this fraction into
the total size of $n_a$, that is, $N = n_a / (n_0 / n_b)$. These pairwise estimates are repeated for all of
the individual sources used in the analysis.
To illustrate this technique, assume, for example, we know our total population is 100. Then if two
sources, A and B, each contain 50 items, we could predict on average that 25 of those items would
be shared by the two sources and 25 items would not be listed by either. According to the formula
above, this can be represented as: $100 = 50 / (25/50)$
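The estimate is a one-line calculation. The sketch below (the function name is illustrative) applies it to the worked example and to the Lycos/Internets pair reported later in Table 5.

```python
def estimate_population(n_a: int, n_b: int, n_0: int) -> float:
    """Estimate total population N from two listings and their overlap.

    n_a, n_b: number of listings in sources A and B
    n_0:      number of listings shared by A and B
    """
    if n_0 <= 0:
        raise ValueError("overlap must be positive to estimate N")
    fraction_covered_by_a = n_0 / n_b   # share of the population seen by A
    return n_a / fraction_covered_by_a


print(estimate_population(50, 50, 25))              # worked example: 100.0
print(round(estimate_population(5081, 3449, 256)))  # Table 5, Lycos/Internets: 68455
```

A small overlap relative to the size of the sources implies a large population, which is why the estimate is sensitive to anything that inflates the overlap.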
There are two keys to overlap analysis. First, it is important to have a relatively accurate estimate
for total listing size for at least one of the two sources in the pairwise comparison. Second, both
sources should obtain their listings randomly and independently from one another.
This second premise is in fact violated for our deep Web source analysis. Compilation sites are
purposeful in collecting their listings, so their sampling is directed. And, for search engine listings,
searchable databases are more frequently linked to because of their information value, which
increases their relative prevalence within the engine listings. [5f] Thus, the overlap analysis
represents a lower bound on the size of the deep Web since both of these factors will tend to
increase the degree of overlap, $n_0$, reported between the pairwise sources.
Exactly 700 sites were inspected in their randomized order to obtain the 100 fully characterized
sites. All sites inspected received characterization as to site type and coverage; this information was
used in other parts of the analysis.
The 100 sites that could have their total record/document count determined were then sampled
for average document size (HTML-included basis). Random queries were issued to the searchable
database with results reported as HTML pages. A minimum of ten of these were generated, saved to
disk, and then averaged to determine the mean site page size. In a few cases, such as bibliographic
databases, multiple records were reported on a single HTML page. In these instances, three total
query results pages were generated, saved to disk, and then averaged based on the total number of
records reported on those three pages.
"The invisible portion of the Web will continue to grow exponentially before the tools to uncover the hidden Web are ready for general use"
fee access.
Growth Analysis
The best method for measuring growth is with time-series analysis. However, since the discovery of
the deep Web is so new, a different gauge was necessary.
Whois [http://www.whois.net] [28] searches associated with domain-registration services [25b]
return records listing domain owner, as well as the date the domain was first obtained (and
other information). Using a random sample of 100 deep Web sites [26b] and another sample of
100 surface Web sites [29] we issued the domain names to a Whois search and retrieved the
date the site was first established. These results were then combined and plotted for the deep vs.
surface Web samples.
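Once the registration dates are in hand, comparing the two samples reduces to summarizing two lists of dates. The sketch below uses a handful of made-up dates purely for illustration (they are not the study's data) and computes the median first-registration date of each sample; the helper name is hypothetical.

```python
from datetime import date
from statistics import median_low


def median_date(dates: list[date]) -> date:
    """Return the (lower) median of a list of dates."""
    return date.fromordinal(median_low(d.toordinal() for d in dates))


# Made-up registration dates, for illustration only.
deep_sample = [date(1994, 6, 1), date(1995, 9, 15), date(1996, 2, 1)]
surface_sample = [date(1994, 1, 10), date(1995, 3, 20), date(1995, 11, 5)]

print("deep median:   ", median_date(deep_sample))
print("surface median:", median_date(surface_sample))
```

A later median date for one sample would suggest that its sites are, on the whole, younger.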
Quality Analysis
Quality comparisons between the deep and surface Web content were based on five diverse,
subject-specific queries issued via the BrightPlanet technology to three search engines (AltaVista,
Fast, Northern Light) [30] and three deep sites specific to that topic and included in the 600
sites presently configured for our technology. The five subject areas were agriculture, medicine,
finance/business, science, and law.
The queries were specifically designed to limit total results returned from any of the six sources to a
maximum of 200 to ensure complete retrieval from each source. [31] The specific technology
configuration settings are documented in the endnotes. [32]
The "quality" determination was based on an average of our technology's VSM and mEBIR
computational linguistic scoring methods. [33] [34] The "quality" threshold was set at our
score of 82, empirically determined as roughly accurate from millions of previous scores of surface
Web documents.
Deep Web vs. surface Web scores were obtained by using the BrightPlanet technology's selection by
source option and then counting total documents and documents above the quality scoring
threshold.
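The counting step is straightforward once each document has a score. In the sketch below the two per-document scores stand in for the VSM and mEBIR results, whose computation is not described here; the function name, the sample scores, and treating the threshold as inclusive are assumptions.

```python
QUALITY_THRESHOLD = 82  # threshold score quoted in the text


def quality_yield(scored_docs: list[tuple[float, float]]) -> tuple[int, int, float]:
    """Return (total documents, quality documents, yield).

    Each entry holds two scores for one document (stand-ins for the VSM and
    mEBIR scores); a document counts as "quality" when their average reaches
    the threshold.
    """
    total = len(scored_docs)
    quality = sum(1 for a, b in scored_docs if (a + b) / 2 >= QUALITY_THRESHOLD)
    return total, quality, (quality / total if total else 0.0)


# Hypothetical scores for four documents returned by one source.
sample = [(90, 80), (70, 60), (82, 82), (100, 50)]
print(quality_yield(sample))  # (4, 2, 0.5)
```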
About.com's Web search guide concluded the size of the deep Web was "big and getting bigger." [37]
A paper at a recent library science meeting suggested that only "a relatively small fraction of
the Web is accessible through search engines." [38]
The deep Web is about 500 times larger than the surface Web, with, on average, about three times
higher quality based on our document scoring methods on a per-document basis. On an absolute
basis, total deep Web quality exceeds that of the surface Web by thousands of times. Total number
of deep Web sites likely exceeds 200,000 today and is growing rapidly. [39] Content on the
deep Web has meaning and importance for every information seeker and market. More than 95%
of deep Web information is publicly available without restriction. The deep Web also appears to be
the fastest growing information component of the Web.
Name                                                                   Type                 URL                                                            Web Size (GBs)
National Climatic Data Center (NOAA)                                   Public               http://www.ncdc.noaa.gov/ol/satellite/satelliteresources.html  366,00
NASA EOSDIS                                                            Public               http://harp.gsfc.nasa.gov/~imswww/pub/imswelcome/plain.html    219,60
National Oceanographic (combined with Geophysical) Data Center (NOAA)  Public/Fee           http://www.nodc.noaa.gov/, http://www.ngdc.noaa.gov/           32,940
Alexa                                                                  Public (partial)     http://www.alexa.com/                                          15,860
Right-to-Know Network (RTK Net)                                        Public               http://www.rtk.net/                                            14,640
MP3.com                                                                Public               http://www.mp3.com/                                            4,300
Terraserver                                                            Public/Fee           http://terraserver.microsoft.com/                              4,270
HEASARC (High Energy Astrophysics Science Archive Research Center)     Public               http://heasarc.gsfc.nasa.gov/W3Browse/                         2,562
US PTO Trademarks + Patents                                            Public               http://www.uspto.gov/tmdb/, http://www.uspto.gov/patft/        2,440
Informedia (Carnegie Mellon Univ.)                                     Public (not yet)     http://www.informedia.cs.cmu.edu/                              1,830
Alexandria Digital Library                                             Public               http://www.alexandria.ucsb.edu/adl.html                        1,220
JSTOR Project                                                          Limited              http://www.jstor.org/                                          1,220
10K Search Wizard                                                      Public               http://www.tenkwizard.com/                                     769
UC Berkeley Digital Library Project                                    Public               http://elib.cs.berkeley.edu/                                   766
SEC Edgar                                                              Public               http://www.sec.gov/edgarhp.htm                                 610
US Census                                                              Public               http://factfinder.census.gov                                   610
NCI CancerNet Database                                                 Public               http://cancernet.nci.nih.gov/                                  488
Amazon.com                                                             Public               http://www.amazon.com/                                         461
IBM Patent Center                                                      Public/Private       http://www.patents.ibm.com/boolquery                           345
NASA Image Exchange                                                    Public               http://nix.nasa.gov/                                           337
InfoUSA.com                                                            Public/Private       http://www.abii.com/                                           195
Betterwhois (many similar)                                             Public               http://betterwhois.com/                                        152
GPO Access                                                             Public               http://www.access.gpo.gov/                                     146
Adobe PDF Search                                                       Public               http://searchpdf.adobe.com/                                    143
Internet Auction List                                                  Public               http://www.internetauctionlist.com/search_products.html        130
Commerce, Inc.                                                         Public               http://search.commerceinc.com/                                 122
Library of Congress Online Catalog                                     Public               http://catalog.loc.gov/                                        116
Sunsite Europe                                                         Public               http://src.doc.ic.ac.uk/                                       98
Uncover Periodical DB                                                  Public/Fee           http://uncweb.carl.org/                                        97
Astronomer's Bazaar                                                    Public               http://cdsweb.u-strasbg.fr/Cats.html                           94
eBay.com                                                               Public               http://www.ebay.com/                                           82
REALTOR.com Real Estate Search                                         Public               http://www.realtor.com/                                        60
Federal Express                                                        Public (if shipper)  http://www.fedex.com/                                          53
Integrum                                                               Public/Private       http://www.integrumworld.com/eng_test/index.html               49
NIH PubMed                                                             Public               http://www.ncbi.nlm.nih.gov/PubMed/                            41
Visual Woman (NIH)                                                     Public               http://www.nlm.nih.gov/research/visible/visible_human.html     40
AutoTrader.com                                                         Public               http://www.autoconnect.com/index.jtmpl/?LNX=M1DJAROSTEXT       39
UPS                                                                    Public (if shipper)  http://www.ups.com/                                            33
NIH GenBank                                                            Public               http://www.ncbi.nlm.nih.gov/Genbank/index.html                 31
AustLi (Australasian Legal Information Institute)                      Public               http://www.austlii.edu.au/austlii/                             24
Digital Library Program (UVa)                                          Public               http://www.lva.lib.va.us/                                      21
Subtotal Public and Mixed Sources                                                                                                                          673,0
DBT Online                                                             Fee                  http://www.dbtonline.com/                                      30,500
Lexis-Nexis                                                            Fee                  http://www.lexis-nexis.com/lncc/                               12,200
Dialog                                                                 Fee                  http://www.dialog.com/                                         10,980
Genealogy ancestry.com                                                 Fee                  http://www.ancestry.com/                                       6,500
ProQuest Direct (incl. Digital Vault)                                  Fee                  http://www.umi.com                                             3,172
Dun & Bradstreet                                                       Fee                  http://www.dnb.com                                             3,113
Westlaw                                                                Fee                  http://www.westlaw.com/                                        2,684
Dow Jones News Retrieval                                               Fee                  http://dowjones.wsj.com/p/main.html                            2,684
infoUSA                                                                Fee/Public           http://www.infousa.com/                                        1,584
Elsevier Press                                                         Fee                  http://www.elsevier.com                                        570
EBSCO                                                                  Fee                  http://www.ebsco.com                                           481
Springer-Verlag                                                        Fee                  http://link.springer.de/                                       221
OVID Technologies                                                      Fee                  http://www.ovid.com                                            191
Investext                                                              Fee                  http://www.investext.com/                                      157
Blackwell Science                                                      Fee                  http://www.blackwell-science.com                               146
GenServ                                                                Fee                  http://gs01.genserv.com/gs/bcc.htm                             106
Academic Press IDEAL                                                   Fee                  http://www.idealibrary.com                                     104
Tradecompass                                                           Fee                  http://www.tradecompass.com/                                   61
INSPEC                                                                 Fee                  http://www.iee.org.uk/publish/inspec/online/online.html        16
Subtotal Fee-Based Sources                                                                                                                                 75.46
TOTAL                                                                                                                                                      748,5
This listing is preliminary and likely incomplete since we lack a complete census of deep Web sites.
Our inspection of the 700 random-sample deep Web sites identified another three that were not in
the initially identified pool of 100 potentially large sites. If that ratio were to hold across the entire
estimated 200,000 deep Web sites (see next table), perhaps only a very small percentage of sites
shown in this table would prove to be the largest. However, since many large sites are anecdotally
known, we believe our listing, while highly inaccurate, may represent 10% to 20% of the actual
largest deep Web sites in existence.
This inability to identify all of the largest deep Web sites today should not be surprising. The
awareness of the deep Web is a new phenomenon and has received little attention. We solicit
nominations for additional large sites on our comprehensive CompletePlanet site and will
document new instances as they arise.
Search Engine A  A no dupes  Search Engine B  B no dupes  A plus B  Unique  Database Fraction  Database Size  Total Est. Deep Web Sites
AltaVista                    Northern Light   60                            0.133              20,635         154,763
AltaVista                    Fast             57                            0.140              20,635         147,024
Fast             57          AltaVista                              49      0.889              27,940         31,433
Northern Light   60          AltaVista                              52      0.889              27,195         30,594
Northern Light   60          Fast             57          44        16      0.772              27,195         35,230
Fast             57          Northern Light   60          44        13      0.733              27,940         38,100
This table shows greater diversity in deep Web site estimates as compared to normal surface Web
overlap analysis. We believe the reasons for this variability are: 1) the relatively small sample size
matched against the engines; 2) the high likelihood of inaccuracy in the baseline for total deep Web
database sizes from Northern Light [42]; and 3) the indiscriminate scaling of Fast and
AltaVista deep Web site coverage based on the surface ratios of these engines to Northern Light. As
a result, we have little confidence in these results.
An alternate method is to compare NEC reported values [5g] for surface Web coverage to the
reported deep Web sites from the Northern Light engine. These numbers were further adjusted by
the final qualification fraction obtained from our hand scoring of 700 random deep Web sites.
These results are shown below:
Table 4. Estimation of Deep Web Sites, Search Engine Market Share Basis
Search Engine   Reported Deep Web Sites  Surface Web Coverage %  Qualification Fraction  Total Est. Deep Web Sites
Northern Light  27,195                   16.0%                   86.4%                   146,853
AltaVista       20,635                   15.5%                   86.4%                   115,023
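The Table 4 figures follow from scaling each engine's reported deep Web site count by its surface Web coverage and then applying the qualification fraction. The sketch below reproduces that arithmetic; the function name is illustrative.

```python
def estimate_total_sites(reported: int, coverage: float, qualification: float) -> int:
    """Scale reported deep Web sites up to the whole Web, then keep the
    fraction that hand scoring qualified as true deep Web sites."""
    return round(reported / coverage * qualification)


print(estimate_total_sites(27_195, 0.160, 0.864))  # Northern Light
print(estimate_total_sites(20_635, 0.155, 0.864))  # AltaVista
```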
This approach, too, suffers from the limitations of using the Northern Light deep Web site baseline.
It is also unclear, though likely, that deep Web search coverage is more highly represented in the
search engines' listing as discussed above.
Our third approach is more relevant and is shown in Table 5.
Under this approach, we use overlap analysis for the three largest compilation sites for deep Web
sites used to build our original 17,000 qualified candidate pool. To our knowledge, these are the
three largest listings extant, excepting our own CompletePlanet site.
This approach has the advantages of:
providing an absolute count of sites
ensuring final qualification as to whether the sites are actually deep Web search sites
relatively large sample sizes.
Because each of the three compilation sources has a known population, the table shows only three
pairwise comparisons (e.g., there is no uncertainty in the ultimate A or B population counts).
Table 5. Estimation of Deep Web Sites, Searchable Database Compilation Overlap Analysis
DB A       A no dups  DB B       B no dups  A + B  Unique  DB Fract.  DB Size  Total Estimated Deep Web Sites
Lycos      5,081      Internets  3,449      256    4,825   0.074      5,081    68,455
Lycos      5,081      Infomine   2,969      156    4,925   0.053      5,081    96,702
Internets  3,449      Infomine   2,969      234    3,215   0.079      3,449    43,761
As discussed above, there is certainly sampling bias in these compilations since they were
purposeful and not randomly obtained. Despite this, there is a surprising amount of uniqueness
among the compilations.
The Lycos and Internets listings are more similar in focus in that they are commercial sites. The
Infomine site was developed from an academic perspective. For this reason, we adjudge the Lycos-Infomine pairwise comparison to be most appropriate. Though sampling was directed for both
sites, the intended coverage and perspective is different.
There is obviously much uncertainty in these various tables. Because of lack of randomness, these
estimates are likely at the lower bounds for the number of deep Web sites. Across all estimating
methods the mean estimate for number of deep Web sites is about 76,000, with a median of about
56,000. For the searchable database compilation only, the average is about 70,000.
The under count due to lack of randomness and what we believe to be the best estimate above,
namely the Lycos-Infomine pair, indicate to us that the ultimate number of deep Web sites today is
on the order of 200,000.
Figure 4. Inferred Distribution of Deep Web Sites, Total Record Size
Plotting the fully characterized random 100 deep Web sites against total record counts produces
Figure 4. Plotting these same sites against database size (HTML-included basis) produces Figure 5.
Multiplying the mean size of 74.4 MB per deep Web site times a total of 200,000 deep Web sites
results in a total deep Web size projection of 7.44 petabytes, or 7,440 terabytes. [43] [44a]
Compared to the current surface Web content estimate of 18.7 TB (see Table 1), this suggests a
deep Web size about 400 times larger than the surface Web. Even at the lowest end of the deep
Web size estimates in Table 3 through Table 5, the deep Web size calculates as 120 times larger
than the surface Web. At the highest end of the estimates, the deep Web is about 620 times the size
of the surface Web.
Alternately, multiplying the mean document/record count per deep Web site of 5.43 million times
200,000 total deep Web sites results in a total record count across the deep Web of 543 billion
documents. [44b] Compared to the Table 1 estimate of one billion documents, this implies a
deep Web 550 times larger than the surface Web. At the low end of the deep Web size estimate this
factor is 170 times; at the high end, 840 times.
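The central ratios quoted above can be checked directly from the projected totals and the Table 1 baseline. The short calculation below uses only figures stated in the text; the variable names are illustrative.

```python
# Figures as stated in the text.
deep_size_tb = 7_440          # projected deep Web size, terabytes
surface_size_tb = 18.7        # surface Web size from Table 1, terabytes
deep_docs = 543e9             # projected deep Web documents
surface_docs = 1e9            # surface Web documents from Table 1

size_ratio = deep_size_tb / surface_size_tb
doc_ratio = deep_docs / surface_docs

print(f"size ratio:     {size_ratio:.0f}x")   # about 400 times
print(f"document ratio: {doc_ratio:.0f}x")    # about 550 times
```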
Clearly, the scale of the deep Web is massive, though uncertain. Since 60 deep Web sites alone are
nearly 40 times the size of the entire surface Web, we believe that the 200,000 deep Web site basis
is the most reasonable one. Thus, across database and record sizes, we estimate the deep Web to be
about 500 times the size of the surface Web.
Figure 5. Inferred Distribution of Deep Web Sites, Total Database Size (MBs)
                    2.7%
Arts                6.6%
Business            5.9%
Computing/Web       6.9%
Education           4.3%
Employment          4.1%
Engineering         3.1%
Government          3.9%
Health              5.5%
Humanities          13.5%
Law/Politics        3.9%
Lifestyles          4.0%
News, Media         12.2%
Recreation, Sports  3.5%
References          4.5%
Science, Math       4.0%
Travel              3.4%
Shopping            3.2%
Figure 6. Distribution of Deep Web Sites by Content Type
More than half of all deep Web sites feature topical databases. Topical databases plus large internal
site documents and archived publications make up nearly 80% of all deep Web sites. Purchase-transaction sites, including true shopping sites with auctions and classifieds, account for
another 10% or so of sites. The other eight categories collectively account for the remaining 10% or
so of sites.
             Surface Web                Deep Web
             Total  "Quality"  Yield    Total  "Quality"  Yield
Agriculture  400    20         5.0%     300    42         14.0%
Medicine     500    23         4.6%     400    50         12.5%
Finance      350    18         5.1%     600    75         12.5%
Science      700    30         4.3%     700    80         11.4%
Law          260    12         4.6%     320    38         11.9%
TOTAL        2,210  103        4.7%     2,320  285        12.3%
This table shows that there is about a three-fold improved likelihood for obtaining quality results
from the deep Web as from the surface Web on average for the limited sample set. Also, the
absolute number of results shows that deep Web sites tend to return 10% more documents than
surface Web sites and nearly triple the number of quality documents. While each query used three
of the largest and best search engines and three of the best known deep Web sites, these results are
somewhat misleading and likely underestimate the "quality" difference between the surface and
deep Web. First, there are literally hundreds of applicable deep Web sites for each query subject
area. Some of these additional sites would likely not return as high an overall quality yield, but
would add to the total number of quality results returned. Second, even with increased numbers of
surface search engines, total surface coverage would not go up significantly and yields would
decline, especially if duplicates across all search engines were removed (as they should be). And,
third, we believe the degree of content overlap between deep Web sites to be much less than for
surface Web sites. [45] Though the quality tests applied in this study are not definitive, we believe
they point to a defensible conclusion that quality is many times greater for the deep Web than for
the surface Web. Moreover, the deep Web has the prospect of yielding quality results that cannot
be obtained by any other means, with absolute numbers of quality results increasing as a function
of the number of deep Web sites simultaneously searched. The deep Web thus appears to be a
critical source when it is imperative to find a "needle in a haystack."
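The yield figures in the table can be recomputed directly from the result counts. The following is a minimal sketch, assuming that "yield" means quality results divided by total results; the dictionary layout and function names are illustrative and are not taken from the study.

```python
# Counts transcribed from the table: (total results, "quality" results).
RESULTS = {
    "Agriculture": {"surface": (400, 20), "deep": (300, 42)},
    "Medicine":    {"surface": (500, 23), "deep": (400, 50)},
    "Finance":     {"surface": (350, 18), "deep": (600, 75)},
    "Science":     {"surface": (700, 30), "deep": (700, 80)},
    "Law":         {"surface": (260, 12), "deep": (320, 38)},
}


def quality_yield(total, quality):
    """Fraction of returned results judged to be quality results."""
    return quality / total


def totals(source):
    """Sum total and quality counts over all queries for one source."""
    total = sum(RESULTS[query][source][0] for query in RESULTS)
    quality = sum(RESULTS[query][source][1] for query in RESULTS)
    return total, quality


surface_total, surface_quality = totals("surface")
deep_total, deep_quality = totals("deep")
surface_yield = quality_yield(surface_total, surface_quality)
deep_yield = quality_yield(deep_total, deep_quality)

print(f"surface: {surface_quality}/{surface_total} = {surface_yield:.1%}")
print(f"deep:    {deep_quality}/{deep_total} = {deep_yield:.1%}")
print(f"yield ratio (deep / surface): {deep_yield / surface_yield:.2f}")
```

Keeping the per-query counts rather than the percentages makes it possible to recheck both the row yields and the TOTAL row from the same data.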
Figure 7. Comparative Deep and Surface Web Site Growth Rates
Use of site domain registration as a proxy for growth has a number of limitations. First, sites are
frequently registered well in advance of going "live." Second, the domain registration is at the root
or domain level (e.g., www.mainsite.com). The search function and page,
whether for surface or deep sites, often is introduced after the site is initially unveiled and may
itself reside on a subsidiary form not discoverable by the whois analysis.
The best way to test for actual growth is a time series analysis. BrightPlanet plans to institute such
tracking mechanisms to obtain better growth estimates in the future.
However, this limited test does suggest faster growth for the deep Web. Both median and average
deep Web sites are four or five months "younger" than surface Web sites (Mar. 95 v. Aug. 95). This
is not surprising. The Internet has become the preferred medium for public dissemination of
records and information, and more and more information disseminators (such as government
agencies and major research projects) that have enough content to qualify as deep Web are moving
their information online. Moreover, the technology for delivering deep Web sites has been around
for a shorter period of time.
To find out whether the specialty search engines really do offer unique information, we used
similar retrieval and qualification methods on them (pairwise overlap analysis) in a new
investigation. The results of this analysis are shown in the table below.
Search Engine A      A no      Search Engine B    B no      A plus    Unique    Search Engine    Search Engine    Est. # of
                     dupes                        dupes     B                   Fraction         Size             Search Engines
FinderSeeker         2,012     SEG                1,268     233       1,779     0.184            2,012            10,949
FinderSeeker         2,012     Netherlands        1,170     167       1,845     0.143            2,012            14,096
FinderSeeker         2,012     LincOne              783     129       1,883     0.165            2,012            12,212
SearchEngineGuide    1,268     FinderSeeker       2,012     233       1,035     0.116            1,268            10,949
SearchEngineGuide    1,268     Netherlands        1,170     160       1,108     0.137            1,268             9,272
SearchEngineGuide    1,268     LincOne              783      28       1,240     0.036            1,268            35,459
Netherlands          1,170     FinderSeeker       2,012     167       1,003     0.083            1,170            14,096
Netherlands          1,170     SEG                1,268     160       1,010     0.126            1,170             9,272
Netherlands          1,170     LincOne              783      44       1,126     0.056            1,170            20,821
LincOne                783     FinderSeeker       2,012     129         654     0.064              783            12,212
LincOne                783     SEG                1,268      28         755     0.022              783            35,459
LincOne                783     Netherlands        1,170      44         739     0.038              783            20,821
These results suggest there may be on the order of 20,000 to 25,000 total search engines currently
on the Web. (Recall that all of our deep Web analysis excludes these additional search engine sites.)
M. Hofstede, of the Leiden University Library in the Netherlands, reports that one compilation
alone contains nearly 45,000 search site listings. [46] Thus, our best current estimate is that
deep Web searchable databases and search engines have a combined total of 250,000 sites.
Whatever the actual number proves to be, comprehensive Web search strategies should include the
specialty search engines as well as deep Web sites. Thus, BrightPlanet's CompletePlanet Web site
also includes specialty search engines in its listings.
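The estimates in the last column of the overlap table are consistent with a standard two-listing overlap calculation: the share of listing B that also appears in listing A is taken as A's coverage of the whole population, and A's size divided by that share gives the estimated total. The sketch below works under that assumption; the function name is illustrative and is not BrightPlanet's.

```python
def estimate_total(size_a, size_b, shared):
    """Estimate a population size from two de-duplicated listings.

    size_a, size_b: number of entries in listings A and B
    shared:         number of entries that appear in both
    """
    if shared <= 0:
        raise ValueError("overlap analysis needs at least one shared entry")
    coverage_of_a = shared / size_b   # fraction of B that A also lists
    return size_a / coverage_of_a


# FinderSeeker (2,012 entries) against SEG (1,268 entries), 233 shared.
print(round(estimate_total(2012, 1268, 233)))
```

The estimate is symmetric in A and B (it equals size_a * size_b / shared), which is why each pair of engines yields the same estimate in both row orders. Small overlaps, such as the 28 shared entries between SearchEngineGuide and LincOne, make the estimate very sensitive to a few entries.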
Commentary
The most important findings from our analysis of the deep Web are that there is massive and
meaningful content not discoverable with conventional search technology and that there is a nearly
uniform lack of awareness that this critical content even exists.
Figure 8. 10-yr Growth Trends in Cumulative Original Information Content (log scale)
The total volume of printed works (books, journals, newspapers, newsletters, office documents)
has held steady at about 390 terabytes (TBs). [48b] By about 1998, deep Web original
information content equaled all print content produced through history up until that time. By
2000, original deep Web content is estimated to have exceeded print by a factor of seven and is
projected to exceed print content by a factor of sixty-three by 2003.
Other indicators point to the deep Web as the fastest growing component of the Web and suggest that it will
continue to dominate it. [49] Even today, at least 240 major libraries have their catalogs
online; [50] UMI, a former subsidiary of Bell & Howell, has plans to put more than 5.5 billion
document images online; [51] and major astronomy data initiatives are moving toward putting
petabytes of data online. [52]
These trends are being fueled by the phenomenal growth and cost reductions in digital, magnetic
storage. [48c] [53] International Data Corporation estimates that the amount of disk
storage capacity sold annually grew from 10,000 terabytes in 1994 to 116,000 terabytes in 1998,
and it is expected to increase to 1,400,000 terabytes in 2002. [54] Deep Web content accounted
for about 1/338th of magnetic storage devoted to original content in 2000; it is projected to
increase to 1/200th by 2003. As the Internet is expected to continue as the universal medium for
publishing and disseminating information, these trends are sure to continue.
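To put the disk storage figures on a common footing, they can be expressed as a compound annual growth rate. This is a small worked calculation on the three capacity figures quoted above, assuming steady year-over-year growth within each interval; the function name is illustrative.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


# Disk storage capacity sold annually, in terabytes, as quoted in the text.
rate_1994_1998 = annual_growth_rate(10_000, 116_000, 4)
rate_1998_2002 = annual_growth_rate(116_000, 1_400_000, 4)

print(f"1994-1998: {rate_1994_1998:.1%} per year")
print(f"1998-2002 (projected): {rate_1998_2002:.1%} per year")
```

Under this assumption both intervals correspond to capacity sold growing by somewhat more than 80% per year.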
We issued a query on "NCAA basketball" with a restriction to review only annual filings filed
between March 1999 and March 2000. One result was produced for Sportsline USA, Inc. Clicking
on that listing produces full-text portions for the query string in that annual filing. With another
click, the full filing text can also be viewed. The URL resulting from this direct request is:
http://www.10kwizard.com/blurbs.php?repo=tenk&ipage=1067295&exp=%22ncaa+basketball%22&g=
Note two things about this URL. First, our query terms appear in it. Second, the "ipage=" shows a
unique record number, in this case 1067295. It is via this record number that the results are served
dynamically from the 10KWizard database.
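The two observations about this URL can be checked by splitting it into its query parameters. The sketch below uses Python's standard urllib.parse module and assumes that the spaces around each "&" in the printed URL are typesetting artifacts rather than part of the address.

```python
from urllib.parse import urlsplit, parse_qs

url = ("http://www.10kwizard.com/blurbs.php"
       "?repo=tenk&ipage=1067295&exp=%22ncaa+basketball%22&g=")

parts = urlsplit(url)
# keep_blank_values=True retains the empty "g" parameter.
params = parse_qs(parts.query, keep_blank_values=True)

query_terms = params["exp"][0]    # percent-decoded query string
record_number = params["ipage"][0]

print("script:       ", parts.path)
print("query terms:  ", query_terms)
print("record number:", record_number)
```

The decoded "exp" value carries the quoted query phrase, and "ipage" carries the record number from which the page is served dynamically.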
Now, if we were doing comprehensive research on this company and posting these results on our
own Web page, other users could click on this URL and get the same information. Importantly, if
we had posted this URL on a static Web page, search engine crawlers could also discover it, use the
same URL as shown above, and then index the contents.
It is by doing searches and making the resulting URLs available that deep content can be brought
to the surface. Any deep content listed on a static Web page is discoverable by crawlers and
therefore indexable by search engines. As the next section describes, it is impossible to completely
"scrub" large deep Web sites for all content in this manner. But it does show why some deep Web
content occasionally appears on surface Web search engines.
This gray zone also encompasses surface Web sites that are available through deep Web sites. For
instance, the Open Directory Project (http://dmoz.org) is an effort to organize the best of surface
Web content using voluntary editors or "guides." [56] The Open Directory looks something like
Yahoo!; that is, it is a tree structure with directory URL results at each branch. The results pages
are static, laid out like disk directories, and are therefore easily indexable by the major search
engines.
The Open Directory claims a subject structure of 248,000 categories, [57] each of which is a
static page. [58] The key point is that every one of these 248,000 pages is indexable by major
search engines.
Four major search engines with broad surface coverage allow searches to be specified based on
URL. The query "URL:dmoz.org" (the address for the Open Directory site) was posed to these
engines with these results:
Table 9. Incomplete Indexing of Surface Web Sites
Engine            Pages Indexed    Yield
AltaVista                17,833     7.2%
Fast                     12,199     4.9%
Northern Light           11,120     4.5%
Go (Infoseek)             1,970     0.8%
Although there are almost 250,000 subject pages at the Open Directory site, only a tiny percentage
are recognized by the major search engines. Clearly the engines' search algorithms have rules about
either depth or breadth of surface pages indexed for a given site. We also found a broad variation in
the timeliness of results from these engines. Specialized surface sources or engines should
therefore be considered when truly deep searching is desired. That bright line between the deep and
surface Web is really shades of gray.
Content
Consider how a directed query works: specific requests need to be posed against the searchable
database by stringing together individual query terms (and perhaps other filters such as date
restrictions). If you do not ask the database specifically what you want, you will not get it.
Let us take, for example, our own listing of 38,000 deep Web sites. Within this compilation, we
have some 430,000 unique terms and a total of 21,000,000 terms. If these numbers represented
the contents of a searchable database, then we would have to issue 430,000 individual queries to
ensure we had comprehensively "scrubbed" or obtained all records within the source database. Our
database is small compared to some large deep Web databases. For example, one of the largest
collections of text terms is the British National Corpus containing more than 100 million unique
terms. [59]
It is infeasible to issue many hundreds of thousands or millions of direct queries to individual deep
Web search databases. It is implausible to repeat this process across tens to hundreds of thousands
of deep Web sites. And, of course, because content changes and is dynamic, it is impossible to
repeat this task on a reasonable update schedule. For these reasons, the predominant share of the
deep Web content will remain below the surface and can only be discovered within the context of a
specific information request.
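The "scrubbing" argument can be illustrated with a toy searchable database. The sketch below is hypothetical throughout: the four records, the search function, and the term list stand in for a real deep Web site, and it assumes the searcher already knows the site's full vocabulary, which an outside crawler normally would not.

```python
from collections import defaultdict

# Hypothetical records held by a searchable database.
RECORDS = {
    1: "corn yield forecast",
    2: "wheat export prices",
    3: "corn export subsidy",
    4: "sheep wool prices",
}


def build_index(records):
    """Map each term to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in text.split():
            index[term].add(record_id)
    return index


INDEX = build_index(RECORDS)


def search(term):
    """Stand-in for a site's search form: one query term in, matching ids out."""
    return INDEX.get(term, set())


def scrub(vocabulary):
    """Issue one directed query per term and collect every record returned."""
    retrieved = set()
    queries = 0
    for term in vocabulary:
        queries += 1
        retrieved |= search(term)
    return retrieved, queries


vocabulary = sorted(INDEX)
retrieved, queries = scrub(vocabulary)
print(f"{queries} queries retrieved {len(retrieved)} of {len(RECORDS)} records")
```

One query per unique term is enough to guarantee complete retrieval, since every record contains at least one term, but the query count grows with the vocabulary. That is the cost the text describes for a compilation with 430,000 unique terms.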
Web as well.
Effective searches should both identify the relevant information desired and present it in order of
potential relevance, or quality. Sometimes what is most important is comprehensive discovery:
everything referring to a commercial product, for instance. Other times the most authoritative
result is needed: the complete description of a chemical compound, as an example. The searches
may be the same for the two sets of requirements, but the answers will have to be different.
Meeting those requirements is daunting, and knowing that the deep Web exists only complicates
the solution because it often contains useful information for either kind of search. If useful
information is obtainable but excluded from a search, the requirements of either user cannot be
met.
We have attempted to bring together some of the metrics included in this paper, [60] defining
quality as both actual quality of the search results and the ability to cover the subject.
Table 10. Total "Quality" Potential, Deep vs. Surface Web
Search Type                 Total      "Quality"
Surface Web
  Single Site Search          160
  Metasite Search             840             38
                                              45
Deep Web
  Mega Deep Search        110,000         14,850
                            688:1        2,063:1
  Metasite Search           131:1          393:1
  TOTAL POSSIBLE            655:1        2,094:1
These strict numerical ratios ignore that including deep Web sites may be the critical factor in
actually discovering the information desired. In terms of discovery, inclusion of deep Web sites
may improve discovery by 600-fold or more.
Surface Web sites are fraught with quality problems. For example, a study in 1999 indicated that
44% of 1998 Web sites were no longer available in 1999 and that 45% of existing sites were half-finished, meaningless, or trivial. [61] Lawrence and Giles' NEC studies suggest that individual
major search engine coverage dropped from a maximum of 32% in 1998 to 16% in 1999. [7b]
Peer-reviewed journals and services such as Science Citation Index have evolved to provide the
authority necessary for users to judge the quality of information. The Internet lacks such authority.
An intriguing possibility with the deep Web is that individual sites can themselves establish that
authority. For example, an archived publication listing from a peer-reviewed journal such as
Nature or Science or user-accepted sources such as the Wall Street Journal or The Economist
carry with them authority based on their editorial and content efforts. The owner of the site vets
what content is made available. Professional content suppliers typically have the kinds of database-based sites that make up the deep Web; the static HTML pages that typically make up the surface
Web are less likely to be from professional content suppliers.
By directing queries to deep Web sources, users can choose authoritative sites. Search engines,
because of their indiscriminate harvesting, do not direct queries. By careful selection of searchable
sites, users can make their own determinations about quality, even though a solid metric for that
value is difficult or impossible to assign universally.
Conclusion
Serious information seekers can no longer avoid the importance or quality of deep Web
information. But deep Web information is only a component of total information available.
Searching must evolve to encompass the complete Web.
Directed query technology is the only means to integrate deep and surface Web information. The
information retrieval answer has to involve both "mega" searching of appropriate deep Web sites
and "meta" searching of surface Web search engines to overcome their coverage problem. Client-side tools are not universally acceptable because of the need to download the tool and issue
effective queries to it. [62] Pre-assembled storehouses for selected content are also possible, but
will not be satisfactory for all information requests and needs. Specific vertical market services are
already evolving to partially address these challenges. [63] These will likely need to be
supplemented with a persistent query system customizable by the user that would set the queries,
search sites, filters, and schedules for repeated queries.
These observations suggest a splitting within the Internet information search market: search
directories that offer hand-picked information chosen from the surface Web to meet popular
search needs; search engines for more robust surface-level searches; and server-side content-aggregation vertical "infohubs" for deep Web information to provide answers where
comprehensiveness and quality are imperative.
Michael K. Bergman is chairman and VP, products and technology of BrightPlanet Corporation, a
Sioux Falls, SD automated Internet content-aggregation service. Although he trained for a Ph.D. in
population genetics at Duke University, he has been involved in Internet and database-software
ventures for the last decade. He was chairman of The WebTools Co., and is president and chairman
of VisualMetrics Corporation in Iowa City, IA, which developed a genome informatics data system.
He has frequently testified before the U.S. Congress on technology and commercialization issues,
and has been a keynote or invited speaker at more than 80 national industry meetings. He is also
the author of BrightPlanet's award-winning "Tutorial: A Guide to Effective Searching of the
Internet," http://completeplanet.com/Tutorials/Search/index.asp. You may reach him by e-mail at
mkb@brightplanet.com.
Endnotes
1. Data for the study were collected between March 13 and 30, 2000. The study was originally
published on BrightPlanet's Web site on July 26, 2000 [formerly
http://www.completeplanet.com/Tutorials/DeepWeb/index.asp]. Some of the references and Web
status statistics were updated on October 23, 2000, with further minor additions on February 22,
2001.
2. A couple of good starting references on various Internet protocols can be found at
http://wdvl.com/Internet/Protocols/ and
http://www.webopedia.com/Internet_and_Online_Services/Internet/Internet_Protocols/.
3. Tenth edition of GVU's (graphics, visualization and usability) WWW User Survey, May 14, 1999
[formerly http://www.gvu.gatech.edu/user_surveys/survey-1998-10/tenthreport.html].
4. 4a, 4b. "4th Q NPD Search and Portal Site Study," as reported by SearchEngineWatch [formerly
http://searchenginewatch.com/reports/npd.html]. NPD's Web site is at http://www.npd.com/.
5. 5a, 5b, 5c, 5d, 5e, 5f, 5g. "Sizing the Internet," Cyveillance [formerly
http://www.cyveillance.com/web/us/downloads/Sizing_the_Internet.pdf].
6. 6a, 6b. S. Lawrence and C.L. Giles, "Searching the World Wide Web," Science 280:98-100, April
3, 1998.
7. 7a, 7b. S. Lawrence and C.L. Giles, "Accessibility of Information on the Web," Nature 400:107-109,
July 8, 1999.
8. See http://www.google.com.
10. Northern Light is one of the engines that allows a "NOT meaningless" query to be issued to get
an actual document count from its data stores. See http://www.northernlight.com. NL searches
used in this article exclude its "Special Collections" listing.
11. 11a, 11b. An excellent source for tracking the currency of search engine listings is Danny
Sullivan's site, Search Engine Watch (see http://www.searchenginewatch.com).
12. See http://www.wiley.com/compbooks/sonnenreich/history.html.
13. 13a, 13b. This analysis assumes there were 1 million documents on the Web as of mid-1994.
18. Many of these databases also store their information in compressed form. Actual disk storage
space on the deep Web is therefore perhaps 30% of the figures reported in this paper.
19. See further, BrightPlanet, LexiBot Pro v. 2.1 User's Manual, April 2000, 126 pp.
20. This value is equivalent to page sizes reported by most search engines and is equivalent to
reported sizes when an HTML document is saved to disk from a browser. The 1999 NEC study also
reported average Web document size after removal of all HTML tag information and white space to
be 7.3 KB. While a more accurate view of "true" document content, we have used the HTML basis
because of the equivalency in reported results from search engines themselves, browser document
saving and our technology.
21. Inktomi Corp., "Web Surpasses One Billion Documents," press release issued January 18,
2000; see http://www.inktomi.com/new/press/2000/billion.html and
http://www.inktomi.com/webmap/.
22. For example, the query issued for an agriculture-related database might be "agriculture." Then,
by issuing the same query to Northern Light and comparing it with a comprehensive query that
does not mention the term "agriculture" [such as "(crops OR livestock OR farm OR corn OR rice
OR wheat OR vegetables OR fruit OR cattle OR pigs OR poultry OR sheep OR horses) AND NOT
agriculture"], an empirical coverage factor is calculated.
23. The compilation sites used for initial harvest were:
AlphaSearch [formerly http://www.calvin.edu/library/searreso/internet/as/]
Direct Search http://www.freepint.com/gary/direct.htm
29. The surface Web domain sample was obtained by first issuing a meaningless query to Northern
Light, 'the AND NOT ddsalsrasve' and obtaining 1,000 URLs. This 1,000 was randomized to
remove (partially) ranking prejudice in the order Northern Light lists results.
30. These three engines were selected because of their large size and support for full Boolean
queries.
31. An example specific query for the "agriculture" subject areas is "agricultur* AND (swine OR
pig) AND 'artificial insemination' AND genetics."
32. The BrightPlanet technology configuration settings were: max. Web page size, 1 MB; min. page
size, 1 KB; no date range filters; no site filters; 10 threads; 3 retries allowed; 60 sec. Web page
timeout; 180 minute max. download time; 200 pages per engine.
33. The vector space model, or VSM, is a statistical model that represents documents and queries
as term sets, and computes the similarities between them. Scoring is a simple sum-of-products
computation, based on linear algebra. See further: Salton, Gerard, Automatic Information
Organization and Retrieval, McGraw-Hill, New York, N.Y., 1968; and, Salton, Gerard, Automatic
35. See the Help and then FAQ pages at [formerly http://www.invisibleweb.com].
36. K. Wiseman, "The Invisible Web for Educators," [formerly
http://www3.dist214.k12.il.us/invisible/article/invisiblearticle.html]
37. C. Sherman, "The Invisible Web," [formerly
http://websearch.about.com/library/weekly/aa061199.htm]
38. I. Zachery, "Beyond Search Engines," presented at the Computers in Libraries 2000
Conference, March 15-17, 2000, Washington, DC; [formerly
http://www.pgcollege.org/library/zac/beyond/index.htm]
39. The initial July 26, 2000, version of this paper stated an estimate of 100,000 potential deep
Web search sites. Subsequent customer projects have allowed us to update this analysis, again
using overlap analysis, to 200,000 sites. This site number is updated in this paper, but overall deep
Web size estimates have not. In fact, still more recent work with foreign language deep Web sites
strongly suggests the 200,000 estimate is itself low.
40. Alexa Corp., "Internet Trends Report 4Q 99."
41. B.A. Huberman and L.A. Adamic, "Evolutionary Dynamics of the World Wide Web," 1999; see
http://www.hpl.hp.com/research/idl/papers/webgrowth/.
42. The Northern Light total deep Web sites count is based on issuing the query "search OR
database" to the engine restricted to Web documents only, and then picking its Custom Folder on
Web search engines and directories, producing the 27,195 count listing shown. Hand inspection of
the first 100 results yielded only three true searchable databases; this increased in the second 100
to 7. Many of these initial sites were for standard search engines or Web site promotion services.
We believe the yield of actual search sites would continue to increase with depth through the
results. We also believe the query restriction eliminated many potential deep Web search sites.
Unfortunately, there is no empirical way within reasonable effort to verify either of these assertions
nor to quantify their effect on accuracy.
43. 1024 bytes = 1 kilobyte (KB); 1000 KB = 1 megabyte (MB); 1000 MB = 1 gigabyte (GB); 1000
GB = 1 terabyte (TB); 1000 TB = 1 petabyte (PB). In other words, 1 PB = 1,024,000,000,000,000
bytes or 10^15.
44. 44a, 44b. Our original paper published on July 26, 2000, used estimates of one billion surface
Web documents and about 100,000 deep Web searchable databases. Since publication, new
information suggests a total of about 200,000 deep Web searchable databases. Since surface Web
document growth is now on the order of 2 billion documents, the ratios of surface to Web
documents (400 to 550 times greater in the deep Web) still approximately hold. These trends
would also suggest roughly double the amount of deep Web data storage to fifteen petabytes than is
indicated in the main body of the report.
45. We have not empirically tested this assertion in this study. However, from a logical standpoint,
surface search engines are all indexing ultimately the same content, namely the public indexable
Web. Deep Web sites reflect information from different domains and producers.
46. M. Hofstede, pers. comm., Aug. 3, 2000, referencing [formerly http://www.alba36.com/].
47. As reported in Sequoia Software's IPO filing to the SEC, March 23, 2000; see
http://www.10kwizard.com/filing.php?repo=tenk&ipage=1117423&doc=1&total=266&back=2&g=.
48. 48a, 48b, 48c. P. Lyman and H.R. Varian, "How Much Information," published by the UC
Berkeley School of Information Management and Systems, October 18, 2000. See
http://www.sims.berkeley.edu/research/projects/how-much-info/index.html. The comparisons here are limited to
archivable and retrievable public information, exclusive of entertainment and communications
content such as chat or e-mail.
49. As this analysis has shown, in numerical terms the deep Web already dominates. However,
from a general user perspective, it is unknown.
50. See http://lcweb.loc.gov/z3950/.
54. From Advanced Digital Information Corp., Sept. 1, 1999, SEC filing; [formerly
http://www.tenkwizard.com/fil_blurb.asp?iacc=991114&exp=terabytes%20and%20online&g=].
55. See http://www.10kwizard.com/.
56. Though the Open Directory is licensed to many sites, including prominently Lycos and
Netscape, it maintains its own site at http://dmoz.org. An example of a node
reference for a static page that could be indexed by a search engine is:
[formerly http://dmoz.org/Business/E-Commerce/Strategy/New_Business_Models/E-Markets_for_Businesses/]. One characteristic of
most so-called search directories is they present their results through a static page structure. There
are some directories, LookSmart most notably, that present their results dynamically.
57. As of Feb. 22, 2001, the Open Directory Project was claiming more than 345,000 categories.
58. See previous reference. This number of categories may seem large, but is actually easily
achievable, because subject node number is a geometric progression. For example, the URL
example in the previous reference represents a five-level tree: 1 - Business; 2 - E-commerce; 3 - Strategy; 4 - New Business Models; 5 - E-markets for Businesses. The Open Project has 15 top-level
node choices, on average about 30 second-level node choices, etc. Not all parts of these subject
trees are as complete or "bushy" as other ones, and some branches of the tree extend deeper
because there is a richer amount of content to organize. Nonetheless, through this simple
progression of subject choices at each node, one can see how total subject categories - and the static
pages associated with them for presenting results - can grow quite large. Thus, for a five-level
structure with an average number of node choices at each level, Open Directory could have ((15 *
30 * 15 * 12 * 3) + 15 + 30 + 15 + 12) choices, or a total of 243,072 nodes. This is close to the
248,000 nodes actually reported by the site.
59. See http://info.ox.ac.uk/bnc/.
60. Assumptions: SURFACE WEB: for single surface site searches - 16% coverage; for metasearch
surface searchers - 84% coverage [higher than NEC estimates in reference 4; based on empirical
BrightPlanet searches relevant to specific topics]; 4.5% quality retrieval from all surface searches.
DEEP WEB: 20% of potential deep Web sites in initial CompletePlanet release; 200,000 potential
deep Web sources; 13.5% quality retrieval from all deep Web searches.
61. Online Computer Library Center, Inc., "June 1999 Web Statistics," Web Characterization
Project, OCLC, July 1999. See the Statistics section in http://wcp.oclc.org/.
62. Most surveys suggest the majority of users are not familiar or comfortable with Boolean
constructs or queries. Also, most studies suggest users issue on average 1.5 keywords per query;
even professional information scientists issue 2 or 3 keywords per search. See further
BrightPlanet's search tutorial at http://www.completeplanet.com/searchresources/tutorial.htm.
Some of the information in this document is preliminary. BrightPlanet plans future revisions as
better information and documentation is obtained. We welcome submission of improved
information and statistics from others involved with the Deep Web. Copyright BrightPlanet
Corporation. This paper is the property of BrightPlanet Corporation. Users are free to copy and
distribute it for personal use.
jep-info@umich.edu
ISSN 1080-2711
Bryce Westlake
Martin Bouchard
School of Criminology
Simon Fraser University
Burnaby, BC, Canada
School of Criminology
Simon Fraser University
Burnaby, BC, Canada
bwestlak@sfu.ca
mbouchard@sfu.ca
rfrank@sfu.ca
ABSTRACT
The emergence of the Internet has provided people with the ability
to find and communicate with others of common interests.
Unfortunately, those involved in the practices of child
exploitation have also received the same benefits. Although law
enforcement continues its efforts to shut down websites dedicated
to child exploitation, the problem remains uncurbed. Despite this,
law enforcement has yet to examine these websites as a network
and determine their structure, stability and susceptibleness to
attack. We extract the structure and features of four online child
exploitation networks using a custom-written webpage crawler.
Social network analysis is then applied with the purpose of
finding key players: websites whose removal would result in the
greatest fragmentation of the network and largest loss of hardcore
material. Our results indicate that websites do not link based on
the hardcore content of the target website; however, blogs do
contain more hardcore content per page than non-blog websites.
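The fragmentation criterion can be illustrated on a toy graph. This is a minimal sketch and not the method used in the study: it treats links as undirected, scores a node only by how many connected components remain after it is removed, and ignores content measures entirely. The five-node network and all names are hypothetical.

```python
def count_components(adjacency, removed=None):
    """Count connected components, skipping the node `removed` if given."""
    seen = set()
    count = 0
    for start in adjacency:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:                      # depth-first traversal
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor != removed and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return count


def key_player(adjacency):
    """Node whose removal leaves the largest number of components."""
    return max(adjacency, key=lambda node: count_components(adjacency, node))


# Hypothetical network: node -> set of linked nodes.
NETWORK = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}

print(key_player(NETWORK))
```

In this toy network, removing B separates A and C from the pair D and E, so B scores highest under the component-count measure.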
General Terms
Algorithms, Measurement, Security, Human Factors
Keywords
Child exploitation, social network analysis, target prioritization,
Internet
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, to republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
ISI-KDD 2010, July 25, 2010, Washington, D.C., USA
Copyright 2010 ACM ISBN 978-1-4503-0223-4/10/07 $10.00.
1. INTRODUCTION
It is estimated that 1.8 billion individuals worldwide use the
Internet, with 260 million users being from North America [13].
Of the 1.8 billion users, adolescents and college students make up
the largest proportion [10, 24, 31]. Through access at home and at
school, it is estimated that 90% of youth have regular access to the
Internet [5]. Although the vast majority of individuals who use the
Internet for sexual pursuits do so in a safe and legal way [4, 9],
the anonymity of the Internet has resulted in a growing percentage
who sexually solicit youth [23]. What makes this problem worse
is the ease with which one can obtain illegal pornographic
material [30, 35]. Searching the words "boy," "teen," or "child"
brings up countless websites and photos of youth in sexually
exploitive roles [24, 34].
The growth of the Internet has resulted in a substantial
increase in research aimed at understanding online networks [8,
17, 29, 33]. However, most of the research to date has focused on
the structure of social networking websites such as Facebook and
MySpace, and has stopped short of investigating child
exploitation networks. This is despite the United Nations
announcement that there are more than four million websites
containing child pornography [6].
Many of the existing efforts to curb child exploitation have
taken the form of Internet chat room stings and injunctions against
online groups seen to be facilitating the proliferation of child
sexual abuse (e.g., North American Man-Boy Love Association,
Pedophile Information Network, Freespirit and BoyChat). At
times this process has run into roadblocks from those who
argue that Internet stings are a form of entrapment¹ [7]. In addition,
website owners often find loopholes, arguing that their websites
are merely support forums that do not host exploitative material
and that they cannot be held responsible for the private messages
people send back and forth, which may or may not contain
information on obtaining illegal material².
As online child exploitation is seen as a global issue, the
International Criminal Police Organization (INTERPOL) has
taken a leading role in addressing the problem.
One of the ways child exploitation has been combated is with the
creation of a database containing all known sexually explicit
photos of children (the International Child Sexual Exploitation
image database) [14]. Additionally, INTERPOL partners with the
COSPOL Internet Related Child Abuse Material Project and the
Virtual Global Taskforce to help coordinate multi-country
investigations and spread awareness of the problem. These efforts
1
For instance, one of the most well-known sites, Free Spirits, states
that "the sites linked from these pages are operated by private
citizens exercising their right to free speech under the U.S.
Constitution and Universal International Human Rights
Convention" [12].
[Algorithm listing (crawler pseudocode, numbered lines 3-19): only fragments survive.
Comment: //initialize variables
Line 9: FollowedPages ← FollowedPages + P
Line 16: DL ← domain of L]
2. METHODS
3
Figure 2 - The 4 networks: a) Blog A, b) Blog B, c) Site A, d) Site B
Website   # of Pages on      Severity   % of All Websites   Degree Centrality
          Starting Website   Score      Connected To It     (Normalized)
Blog A    285                62.93      100.0               100.0
Blog B    583                1.82       81.8                81.6
Site A    237                80.19      78.8                78.6
Site B    [...]              2.00       27.1                68.4
                               Blog A         Blog B         Site A         Site B
Density (Ties)                 0.13 (1214)    0.21 (2006)    0.09 (866)     0.04 (371)
Severity Score Density: High   0.23 (n=22)    0.37 (n=27)    0.11 (n=23)    0.08 (n=27)
Severity Score Density: Low    0.12 (n=77)    0.16 (n=72)    0.08 (n=76)    0.02 (n=69)
Clustering Coefficient         0.39           0.48           0.28           0.22
Fragmentation                  0.04           0.06           0.04           0.02
Centralization: Out-Degree     88.38%         61.58%         70.36%         65.03%
Centralization: In-Degree      71.89%         31.65%         60.05%         36.31%
Reciprocity                    0.25           0.37           0.17           0.02
3. RESULTS
First, we draw on SNA to examine the structure of the four
extracted networks. More specifically, we derive the following
measures:
- Density: the percentage of network connections present
  in relation to all possible network connections [11, 15]
- Clustering coefficient: the likelihood that two websites,
  both connected to another website, are connected to
  each other [11, 19]
- Fragmentation: the percentage of the network
  connections disconnected by the removal of any one
  website [3]
- In-degree centrality: for website a, the number of
  other network websites that link to a
- Out-degree: the number of other websites that
  website a links to [11]
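The definitions above can be made concrete with a minimal sketch. It is not the tooling used in this study: the toy network, the site names and the function names are all hypothetical, and the clustering coefficient is computed here in its local form while ignoring the direction of ties, which is only one of several conventions. Fragmentation is left out because the definition above does not pin down an exact formula.

```python
from itertools import combinations

# Hypothetical toy network: each key links to the websites in its set.
links = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": {"a"},
    "d": {"a"},
}


def density(links):
    """Share of all possible directed ties, n * (n - 1), that are present."""
    n = len(links)
    ties = sum(len(targets) for targets in links.values())
    return ties / (n * (n - 1))


def out_degree(links, site):
    """Number of other websites that `site` links to."""
    return len(links[site] - {site})


def in_degree(links, site):
    """Number of other websites that link to `site`."""
    return sum(1 for source, targets in links.items()
               if source != site and site in targets)


def clustering_coefficient(links, site):
    """Share of pairs of `site`'s neighbours that are connected to each other.

    Assumption: tie direction is ignored, so a link either way counts.
    """
    def connected(u, v):
        return v in links[u] or u in links[v]

    neighbours = sorted(other for other in links
                        if other != site and connected(site, other))
    pairs = list(combinations(neighbours, 2))
    if not pairs:
        return 0.0
    return sum(connected(u, v) for u, v in pairs) / len(pairs)


print(density(links))
print(in_degree(links, "a"), out_degree(links, "a"))
print(clustering_coefficient(links, "a"))
```

In this toy network, website a is linked to by two others and links to two others, and one of the three pairs among its neighbours is itself connected.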
The starting page for Site B was a front page for a much larger
site. For example, all sections of the website www.hostsite.xxx
followed the URL pattern www.section.hostsite.xxx. Therefore, the
number of pages and hardcore words are low, as there were no
additional pages on the front page.
Keyword           Blog A   Blog B   Site A   Site B
Boy               60.82    35.89    55.78    70.59
Girl              0.61     4.90     0.43     4.54
Child             1.42     4.20     1.06     6.42
Love              6.75     30.53    19.15    7.66
Teen              4.09     2.30     4.04     0.95
Lolli*            0.00     0.15     0.00     0.04
Young             2.42     3.73     1.70     0.48
Bath*             0.02     0.01     0.12     0.04
Innocent          0.01     0.04     0.00     0.03
Smooth/Hairless   0.21     0.27     0.16     0.41
Mastur*           0.65     0.03     0.27     0.03
Sex               9.58     8.95     8.70     2.46
Penis             2.76     1.10     0.30     0.06
Vagina            0.00     0.12     0.00     0.03
Anal              4.23     3.38     1.03     4.17
Oral              0.74     0.45     0.35     0.55
Naked             5.42     3.27     6.70     0.81
Virgin            0.06     0.40     0.04     0.32
                           Blog A             Blog B               Site A               Site B
Number of Websites         99                 99                   99                   96
Number of Pages/Website
  Range                    0-651              0-470                0-1,420              0-1,575
  Average                  405                265                  268                  394
Hardcore Words
  Average (Range)          1,501 (0-14,203)   9,352 (0-133,526)    7,435 (0-107,016)    1,287 (0-21,226)
  Average/Page (Range)     3 (0-27)           38 (0-583)           52 (0-593)           3 (0-30)
  % of Keywords            23.64              17.98                17.55                8.83
Softcore Words
  Average (Range)          6,847 (0-41,588)   30,214 (0-298,602)   34,934 (0-63,951)    13,283 (0-617,748)
  Average/Page (Range)     15 (0-93)          108 (0-1,061)        97 (0-896)           39 (0-546)
  % of Keywords            76.37              82.02                82.45                91.17
Total Words
  Range                    0-45,061           0-380,348            0-746,526            0-618,586
  Average                  8,348              39,566               42,369               14,570
  Network Total            3,917,045          826,441              4,194,544            1,398,756
Website            Blog A                Blog B                Site A                Site B
Ranking            In-degree  Severity   In-degree  Severity   In-degree  Severity   In-degree  Severity
1                  82         1.20       50         2.05       77         80.19      38         3.50
2                  80         1.20       49         14.17      23         4.01       15         1.12
3                  79         1.00       48         3.61       22         62.00      13         1.17
4                  79         1.13       47         4.16       19         29.00      12         10.04
5                  78         0.91       46         2.34       18         17.36      10         9.64
6                  30         62.93      45         2.23       17         119.00     [...]      0.15
7                  26         1.13       45         4.31       17         36.00      [...]      0.33
8                  25         51.21      45         3.27       16         39.56      [...]      0.75
9                  22         20.94      43         7.89       16         12.39      [...]      11.00
10                 22         10.07      42         1.97       15         137.00     [...]      3.76
Mean for Top 10    52.30      15.17      46.00      4.60       18.40      53.65      13.10      4.15
Mean for Network   12.26      38.26      20.26      3.16       8.75       52.21      3.86       2.98
Figure 8 - Top 10 In-degree websites in each network compared to the overall network
significant, the pattern suggested that hardcore blogs and sites
have a tendency to reach out more to others (r=0.10, 0.13, 0.12, 0.05) than others reach out to them (r=-0.09, 0.04, -0.16, 0.12).
These findings support the previous analyses, which showed that there
are mega websites with a lot of material and many connections, as well
as small independent websites with only a little material that are
relatively unconnected to the rest of the network. Put another way,
the mean numbers of hardcore words per website and per page are
mainly driven by several extreme websites at both ends (websites
with a lot of content and websites with little to no content).
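The point about extreme websites can be illustrated with a small sketch. The values below are hypothetical and are not taken from the four networks, and the code assumes that the reported r values are Pearson coefficients, which the text does not state. It shows how a single extreme website can pull the mean far from the median and dominate a correlation.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences.

    Assumes both sequences vary (non-zero variance).
    """
    mean_x = mean(xs)
    mean_y = mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-website values; the last website is an extreme case.
hardcore_words_per_page = [0, 0, 1, 2, 3, 4, 590]
out_degree = [1, 0, 2, 1, 3, 2, 40]

# The single extreme website drives the mean but not the median.
print(mean(hardcore_words_per_page), median(hardcore_words_per_page))

# Correlation with and without the extreme website.
print(pearson_r(hardcore_words_per_page, out_degree))
print(pearson_r(hardcore_words_per_page[:-1], out_degree[:-1]))
```

Comparing the coefficient with and without the extreme website is one simple way to check how much of a reported pattern rests on a handful of cases.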
4. DISCUSSION
The Internet has changed the way society communicates and
obtains information. Despite the positive contributions the
Internet has made to society, it has also created a new avenue
Website            Blog A                Blog B                Site A                Site B
Ranking            In-degree  Severity   In-degree  Severity   In-degree  Severity   In-degree  Severity
1                  [...]      583.08     [...]      27.50      [...]      593.00     [...]      30.00
2                  [...]      543.14     12         24.07      [...]      531.00     [...]      26.00
3                  [...]      193.71     36         17.33      [...]      431.00     [...]      16.00
4                  [...]      177.53     49         14.17      [...]      292.45     [...]      14.92
5                  15         130.06     18         12.97      [...]      244.00     [...]      13.48
6                  10         128.05     30         11.39      [...]      244.00     [...]      12.00
7                  [...]      125.13     [...]      11.11      [...]      182.00     [...]      11.00
8                  14         123.95     20         10.40      11         149.88     [...]      10.08
9                  16         113.83     [...]      10.00      [...]      146.00     12         10.04
10                 [...]      95.81      37         8.90       [...]      137.00     [...]      10.00
Mean for Top 10    9.50       221.43     21.5       14.78      3.20       295.03     4.60       15.35
Mean for Network   12.26      38.26      20.26      3.16       8.75       52.21      3.86       2.98
5. CONCLUSIONS
The current study drew on social network analysis to
examine the content and structure of online child exploitation
networks. We extracted the structure and features of child
exploitation networks by performing a guided crawl of the
Internet. Our crawler, CENE, was guided by a set of keywords
and exclusion websites, which kept it on topic. This provided very
focused networks for analysis.
Using social network analysis, we attempted to find the key
players: those websites displaying a combination of connectivity
and hardcore material. This analysis looked at two types of
websites, blogs and sites, covering four independent starting
points. Our results indicate, first, that the presence of hardcore
content is not the basis for linkages between websites and, second,
that blogs contain more hardcore content per page than sites.
This exploratory study adds to our current understanding of
online child exploitation and lays the groundwork for the
incorporation of SNA into future research on this topic. Subsequent
research needs to expand on the network size(s) and shift to a more
detailed analysis of the attributes, including the content of forums,
6. ACKNOWLEDGEMENTS
Partial funding for this project was provided by the
International Cybercrime Research Centre, Simon Fraser
University.
7. REFERENCES
1)
2)
3)
4)
5)
6)
from
http://abcnews.go.com/Technology/wireStory?id=8591118
7)
8)
9)