as YouTube, via a phenomenon called “raids.”
Contributions. In summary, this paper makes several con-
tributions. First, we provide a large scale analysis of /pol/’s
posting behavior, showing the impact of 4chan’s unique fea-
tures, that /pol/ users are spread around the world, and that,
although posters remain anonymous, /pol/ is filled with many
different voices. Next, we show that /pol/ users post many links
to YouTube videos, tend to favor “right-wing” news sources,
and post a large amount of unique images. Finally, we pro-
vide evidence that there are numerous instances of individual
YouTube videos being “raided,” and provide a first metric for
measuring such activity.
Paper Organization. The rest of the paper is organized as follows.
The next section provides an overview of 4chan and its main
characteristics. Section 3 reviews related work, and Section 4
discusses our dataset. Section 5 and Section 6 present, respectively,
a general characterization and a content analysis of /pol/. Finally,
we analyze raids toward other services in Section 7, and the paper
concludes in Section 8.
2 4chan
4chan.org is an imageboard site. A user, the “original poster” (OP), creates a new thread by posting a message, with an image attached, to a board with a particular topic. Other users can also post in the thread, with or without images, and refer to previous posts by replying to them or quoting portions of them.
Figure 1: Examples of typical /pol/ threads. (A) illustrates the derogatory use of “cuck” in response to a Bernie Sanders image; (B) a casual call for genocide with an image of a woman’s cleavage and a “humorous” response; (C) /pol/’s fears that a withdrawal of Hillary Clinton would guarantee Trump’s loss; (D) shows Kek, the “God” of memes, via which /pol/ “believes” they influence reality.
Boards. As of January 2017, 4chan features 69 boards, split into 7 high level categories, e.g., Japanese Culture (9 boards) or Adult (13 boards). In this paper, we focus on /pol/, the “Politically Incorrect” board.2 Figure 1 shows four typical /pol/ threads. Besides the content, the figure also illustrates the reply feature (‘>>12345’ is a reply to post ‘12345’), as well as other concepts discussed below. Aiming to create a baseline to compare /pol/ to, we also collect posts from two other boards: “Sports” (/sp/) and “International” (/int/). The former focuses on sports and athletics, the latter on cultures, languages, etc. We choose these two since they are considered “safe-for-work” boards, and are, according to 4chan rules, more heavily moderated, but also because they display the country flag of the OP, which we discuss next.
Anonymity. Users do not need an account to read/write posts. Anonymity is the default (and preferred) behavior, but users can enter a name along with their posts, and they can change it with each post if they wish. Naturally, anonymity here is meant to be with respect to other users, not the site or the authorities, unless using Tor or similar tools.3
Tripcodes (hashes of user-supplied passwords) can be used to “link” threads from the same user across time, providing a way to verify pseudo-identity. On some boards, intra-thread trolling led to the introduction of poster IDs. Within a thread (and only that thread), each poster is given a unique ID that appears along with their post, using a combination of cookies and IP tracking. This preserves anonymity, but mitigates low-effort sock puppeteering. To the best of our knowledge, /pol/ is currently the only board with poster IDs enabled.
2 http://boards.4chan.org/pol/
3 In fact, moot (4chan’s creator) reported turning server logs and other records over to the FBI. See http://www.thesmokinggun.com/buster/fbi/turns-out-4chan-not-lawless-it-seems.
Flags. /pol/, /sp/, and /int/ also include, along with each post, the flag of the country the user posted from, based on IP geo-location. This is meant to reduce the ability to “troll” users by, e.g., claiming to be from a country where an event is happening (even though geo-location can obviously be manipulated using VPNs and proxies).
Ephemerality. Each board has a finite catalog of threads. Threads are pruned after a relatively short period of time via a “bumping system.” Threads with the most recent post appear first, and creating a new thread results in the one with the least recent post getting removed. A post in a thread keeps it alive by bumping it up; however, to prevent a thread from never getting purged, 4chan implements bump and image limits. After a thread is bumped N times or has M images posted to it (with N and M being board-dependent), new posts will no longer bump it up. Originally, when a thread fell out of the catalog, it was permanently gone; however, an archive system for a subset of boards has recently been implemented: once a thread is purged, its final state is archived for a relatively short period of time – currently seven days.
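The bumping system described above can be illustrated with a minimal Python sketch. It is a toy model rather than 4chan's actual implementation: the class and field names, the logical clock, the catalog capacity, and the choice to count the OP's image toward the image limit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    thread_id: int
    last_bump: int      # logical time of the last post that bumped the thread
    replies: int = 0
    images: int = 1     # assumption: the OP's image counts toward the limit


@dataclass
class Board:
    capacity: int       # catalog size (hypothetical value)
    bump_limit: int     # N in the text
    image_limit: int    # M in the text
    clock: int = 0
    catalog: dict = field(default_factory=dict)
    archive: list = field(default_factory=list)

    def new_thread(self, thread_id: int) -> None:
        self.clock += 1
        if len(self.catalog) >= self.capacity:
            # A full catalog purges the thread with the least recent bump.
            stale = min(self.catalog.values(), key=lambda t: t.last_bump)
            self.archive.append(self.catalog.pop(stale.thread_id))
        self.catalog[thread_id] = Thread(thread_id, last_bump=self.clock)

    def reply(self, thread_id: int, with_image: bool = False) -> None:
        self.clock += 1
        thread = self.catalog[thread_id]
        # Once either limit has been reached, posts no longer bump the thread.
        can_bump = (thread.replies < self.bump_limit
                    and thread.images < self.image_limit)
        thread.replies += 1
        if with_image:
            thread.images += 1
        if can_bump:
            thread.last_bump = self.clock

    def front_page(self) -> list:
        # Threads with the most recent bump appear first.
        ordered = sorted(self.catalog.values(),
                         key=lambda t: t.last_bump, reverse=True)
        return [t.thread_id for t in ordered]


board = Board(capacity=2, bump_limit=2, image_limit=5)
board.new_thread(1)
board.new_thread(2)
for _ in range(3):
    board.reply(1)      # the third reply exceeds the bump limit
board.reply(2)
board.new_thread(3)     # pushes the least recently bumped thread out
print(board.front_page())
print([t.thread_id for t in board.archive])
```

In this toy run, thread 1 receives the most replies but stops being bumped after its second reply, so it is the one purged when thread 3 is created.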
Moderation. 4chan’s moderation policy is generally lax, especially on /pol/. So-called janitors, volunteers periodically recruited from the user base, can prune posts and threads, as well as recommend users to be banned by more “senior” 4chan employees. Generally speaking, although janitors are not well respected by 4chan users and are often mocked for their perceived love for power, they do contribute to 4chan’s continuing operation, by volunteering work on a site that is somewhat struggling to stay solvent [29].
3 Related Work
While 4chan constantly attracts considerable interest in the popular press [5, 16], there is very little scientific work analyzing its ecosystem. To the best of our knowledge, the only measurement of 4chan is the work by [6], who study the “random” board on 4chan (/b/), the original and most active board. Using a dataset of 5.5M posts from almost 500K threads collected over a two-week period, they focus on analyzing the anonymity and ephemerality characteristics of 4chan. They find that over 90% of posts are made by anonymous users, and, similar to our findings, that the “bump” system affects threads’ evolution, as the median lifetime of a /b/ thread is only 3.9 minutes (and 9.1 minutes on average). Our work differs from [6] in several aspects. First, their study is focused on one board (/b/) in a self-contained fashion, while we also measure how /pol/ affects the rest of the Web (e.g., via raids). Second, their content analysis is primarily limited to a typology of thread types. Via manual labeling of a small sample, they determined that 7% of posts on /b/ are a “call for action,” which includes raiding behavior. In contrast, our analysis goes deeper, looking at post contents and raiding in a quantitative manner. Finally, using some of the features unique to /pol/, /int/, and /sp/, we are also able to get a glimpse of 4chan’s user demographics, which is only speculated about in [6].
[23] analyze the influence of anonymity on aggression and obscene lexicon by comparing a few anonymous forums and social networks. They focus on Russian-language platforms, and also include 2M words from 4chan, finding no correlation between anonymity and aggression. In follow-up work [24], 4chan posts are also used to evaluate automatic verbal aggression detection tools.
Other researchers have also analyzed social media platforms, besides 4chan, characterized by (semi-)anonymity and/or ephemerality. [9] study the differences between content posted on anonymous and non-anonymous social media, showing that linguistic differences between Whisper posts (anonymous) and Twitter (non-anonymous) are significant, and they train classifiers to discriminate them (with 73% accuracy). [22] analyze users’ anonymity choices during their activity on Quora, identifying categories of questions for which users are more likely to seek anonymity. They also perform an analysis of Twitter to study the prevalence and behavior of so-called “anonymous” and “identifiable” users, as classified by Amazon Mechanical Turk workers, and find a correlation between content sensitivity and a user’s choice to be anonymous. [15] analyze user behavior on Ask.fm by building an “interaction graph” between 30K profiles. They characterize users in terms of positive/negative behavior and in-degree/out-degree, and analyze the relationships between these factors.
Another line of work focuses on detecting hate speech. [11] propose a word embedding based detection tool for hate speech on Yahoo Finance. [20] also perform hate speech detection on Yahoo Finance and News data, using a supervised classification methodology. [8] characterize anti-social behavior in comments sections of a few popular websites and predict accounts on those sites that will exhibit anti-social behavior. Although we observe some similar behavior from /pol/ users, our work is focused more on understanding the platform and organization of semi-organized campaigns of anti-social behavior, rather than identifying particular users exhibiting such behavior.
4 Datasets
On June 30, 2016, we started crawling 4chan using its JSON API.4 We retrieve /pol/’s thread catalog every 5 minutes and compare the threads that are currently live to those in the previously obtained catalog. For each thread that has been purged, we retrieve a full copy from 4chan’s archive, which allows us to obtain the full/final contents of a thread. For each post in a thread, the API returns, among other things, the post’s number, its author (e.g., “Anonymous”), timestamp, and contents of the post (escaped HTML). Although our crawler does not save images, the API also includes image metadata, e.g., the name the image is uploaded with, dimensions (width and height), file size, and an MD5 hash of the image. On August 6, 2016 we also started crawling /sp/, 4chan’s sports board, and on August 10, 2016 /int/, the international board. Table 1 provides a high level overview of our datasets. We note that for about 6% of the threads, the crawler gets a 404 error: from a manual inspection, it seems that this is due to “janitors” (i.e., volunteer moderators) removing threads for violating rules.
4 https://github.com/4chan/4chan-API
Table 1: Number of threads and posts crawled for each board.
          /pol/       /sp/        /int/       Total
Threads   216,783     14,402      24,873      256,058
Posts     8,284,823   1,189,736   1,418,566   10,893,125
The analysis presented in this paper considers data crawled until September 12, 2016, except for the raids analysis presented later on, where we considered threads and YouTube comments up to Sept. 25. We also use a set of 60,040,275 tweets from Sept. 18 to Oct. 5, 2016 for a brief comparison in hate speech usage. We note that our datasets are available to other researchers upon request.
Ethical considerations. Our study has obtained approval by the designated ethics officer at UCL. We note that 4chan posts are typically anonymous; however, analysis of the activity generated by links on 4chan to other services could be potentially used to de-anonymize users. To this end, we followed standard ethical guidelines [25], encrypting data at rest, and making no attempt to de-anonymize users. We are also aware that content posted on /pol/ is often highly offensive; however, we do not censor content in order to provide a comprehensive analysis of /pol/, but warn readers that the rest of this paper features language likely to be upsetting.
5.1 Posting Activity in /pol/
Our first step is a high-level examination of posting activity. In Figure 2, we plot the average number of new threads created per hour of the week, showing that /pol/ users create one order of magnitude more threads than /int/ and /sp/ users at nearly all hours of the day. Then, Figure 3 reports the number of new threads created per country, normalized by the country’s Internet-using population.5 Although the US dominates in total thread creation (visible by the timing of the diurnal patterns from Figure 2), the top 5 countries in terms of threads per capita are New Zealand, Canada, Ireland, Finland, and Australia. 4chan is primarily an English speaking board, and indeed nearly every post on /pol/ is in English, but we still find that many non-English speaking countries – e.g., France, Germany, Spain, Portugal, and several Eastern European countries – are represented. This suggests that although /pol/ is considered an “ideological backwater,” it is surprisingly diverse in terms of international participation.
5 Obtained from http://www.internetlivestats.com/internet-users/
Figure 2: Average number of new threads per hour of the week. [Plot omitted; x-axis: hour of week; one series per board: /int/, /pol/, /sp/.]
Figure 3: Heat map of the number of new /pol/ threads created per country, normalized by Internet-using population. The darker the country, the more participation in /pol/ it has, relative to its real-world Internet using population. [Map omitted; color scale from 4.64e−08 to 0.000506.]
Next, in Figure 4, we plot the distribution of the number of posts per thread on /pol/, /int/, and /sp/, reporting both the cumulative distribution function (CDF) and the complementary CDF (CCDF). All three boards are skewed to the right, exhibiting quite different means (38.4, 57.1, and 82.9 for /pol/, /int/, and /sp/, respectively) and medians (7.0, 12.0, 12.0) – i.e., there are a few threads with a substantially higher number of posts. One likely explanation for the average length of /sp/ threads being larger is that users on /sp/ make “game threads” where they discuss a professional sports game live, while it is being played. The effects of the bump limit are evident on all three boards. The bump limit is designed to ensure that fresh content is always available, and Figure 4 demonstrates this: extremely popular threads have their lives cut short earlier than the overall distribution would imply and are eventually purged.
Figure 4: Distributions of the number of posts per thread on /pol/, /int/, and /sp/. We plot both the CDF and CCDF to show both typical threads as well as threads that reach the bump limit. Note that the bump limit for /pol/ and /int/ is 300 at the time of this writing, while for /sp/ it is 500. [Plots omitted; x-axis: number of posts per thread.]
We then investigate how much content actually violates the rules of the board. In Figure 5, we plot the CDF of the maximum number of posts per thread observed via the /pol/ catalog, but for which we later receive a 404 error when retrieving the archived version – i.e., threads that have been deleted by a janitor or moved to another board. Surprisingly, there are many “popular” threads that are deleted, as the median number of posts in a deleted /pol/ thread is around 20, as opposed to 7 for the threads that are successfully archived. For /int/, the median number of posts in a deleted thread (5) is appreciably lower than in archived threads (12). This difference is likely due to: 1) /int/ moving much slower than /pol/, so there is enough time to delete threads before they become overly popular, and/or 2) /pol/’s relatively lax moderation policy, which allows borderline threads to generate many posts before they end up “officially” violating the rules of the board.
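The bookkeeping behind this comparison (diffing consecutive catalog snapshots, then splitting purged threads by whether the archive request succeeds or returns a 404) can be sketched in Python as follows. This is a simplified illustration, not the crawler used in the paper: the function names, the snapshot format (thread number mapped to posts seen), and the toy data are hypothetical, and no network requests are made.

```python
from statistics import median


def track_catalog(snapshots):
    """Replay catalog snapshots ({thread number: posts seen}).

    Returns the maximum post count observed per thread and the list of
    threads that disappeared between two consecutive snapshots.
    """
    max_posts = {}
    purged = []
    previous = {}
    for current in snapshots:
        for number, posts in current.items():
            max_posts[number] = max(posts, max_posts.get(number, 0))
        purged.extend(n for n in previous if n not in current)
        previous = current
    return max_posts, purged


def split_by_archive_status(max_posts, purged, archive_status):
    """Split purged threads by the HTTP status of the archive request."""
    archived = [max_posts[n] for n in purged if archive_status.get(n) == 200]
    deleted = [max_posts[n] for n in purged if archive_status.get(n) == 404]
    return archived, deleted


# Toy data: three snapshots taken 5 minutes apart (values are made up).
snapshots = [
    {101: 3, 102: 10, 103: 1},
    {101: 8, 102: 25, 104: 2},
    {104: 5, 105: 1},
]
archive_status = {101: 200, 102: 404, 103: 200}  # hypothetical fetch results

max_posts, purged = track_catalog(snapshots)
archived, deleted = split_by_archive_status(max_posts, purged, archive_status)
print(median(archived), median(deleted))
```

Tracking the maximum post count per thread matters because a deleted thread has no archived final state, so the last catalog observation is the only size estimate available.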
Figure 5: CDF of the number of posts for non-archived threads (i.e., likely deleted). [Plot omitted; x-axis: maximum number of posts per non-archived thread observed; one series per board: /int/, /pol/, /sp/.]
Figure 6: CDF of the number of posts per unique tripcode. [Plot omitted; x-axis: number of posts; one series per board: /int/, /pol/, /sp/.]
Figure 7: CCDF of the number of unique posters per thread. [Plot omitted; x-axis: number of posters per thread.]
tributes – i.e., the use of tripcodes and poster IDs – to provide an overview of both micro-level interactions and individ-
         /pol/                      /int/                      /sp/
Country       Avg. Replies   Country      Avg. Replies   Country       Avg. Replies
China         1.57           Thailand     1.13           Slovenia      0.91
Pakistan      1.42           Algeria      1.12           Japan         0.84
Japan         1.35           Jordan       1.04           Bulgaria      0.81
Egypt         1.33           S. Korea     1.02           Sweden        0.75
Tri. & Tob.   1.28           Ukraine      1.00           Israel        0.74
Israel        1.27           Viet Nam     0.97           Argentina     0.72
S. Korea      1.20           Tunisia      0.97           India         0.72
Turkey        1.18           Israel       0.97           Greece        0.72
UAE           1.20           Hong Kong    0.92           Puerto Rico   0.70
Bangladesh    1.15           Macedonia    0.91           Australia     0.68
[Figure omitted; caption not recovered. Legend: Unranked, Top 1M, Top 100k, Top 1K, Top 10; y-axis from 20000 to 180000.]
Figure 10: CCDF of the number of posts exact duplicate images appeared in on /pol/.
Figure 11: Percentage of posts on /pol/ the top 15 most popular hate words appear in.
Figure 12: World map colored by content analysis based clustering.
Figure 14: Heat map showing the percentage of posts with hate speech per country. [Best viewed in color.] [Map omitted; color scale ticks: 4.15, 9.82, 12.5, 30.7.]
Figure 15: The effects of Operation Google within /pol/. [Plot omitted; x-axis: date (September 2016); y-axis: % of posts; one normalized series per tracked term.]
nomic and immigration crisis, and people from Turkey about the attempted coup in July 2016.
Clustering. To provide more evidence for the conclusion that /pol/ is geo-politically diverse, we perform some basic text classification and evaluate whether or not different parts of the world are talking about “similar” topics. We apply spectral clustering over the vectors using the Eigengap heuristic [19] to automatically identify the number of target clusters. In Figure 12, we present a world map colored according to the 8 clusters generated. Indeed, we see the formation of geo-political “blocks.” Most of Western Europe is clustered together, and so are USA and Canada, while the Balkans are in a cluster with Russia. One possible limitation stemming from our spectral clustering is its sensitivity to the total number of countries we are attempting to cluster. Indeed, we find that, by filtering out fewer countries based on number of posts, the clusters do change. For instance, if we do not filter any country out, France is clustered with former French colonies and territories, Spain with South America, and a few of the Nordic countries flip between the Western Europe and the North American clusters. Additionally, while /pol/ posts are almost exclusively in English, certain phrasings, misspellings, etc. from non-native speakers might also influence the clustering. That said, the overall picture remains consistent: the flags associated with /pol/ posts are meaningful in terms of the topics those posts talk about.
7 Raids Against Other Services
As discussed previously, /pol/ is often used to post links to other sites: some are posted to initiate discussion or provide additional commentary, but others serve to call /pol/ users to certain coordinated actions, including attempts to skew post-debate polls [10] as well as “raids” [1].
Broadly speaking, a raid is an attempt to disrupt another site, not from a network perspective (as in a DDoS attack), but from a content point of view. That is, raids are not an attempt to directly attack a 3rd party service itself, but rather to disrupt the community that calls that service home. Raids on /pol/ are semi-organized: we anecdotally observe a number of calls for action [6] consisting of a link to a target – e.g., a YouTube video or a Twitter hashtag – and the text “you know what to do,” prompting other 4chan users to start harassing the target. The thread itself often becomes an aggregation point with screenshots of the target’s reaction, sharing of sock puppet accounts used to harass, etc.
In this section, we study how raids on YouTube work. We show that synchronization between /pol/ threads and YouTube comments is correlated with an increase in hate speech in the YouTube comments. We further show evidence that the synchronization is correlated with a high degree of overlap in YouTube commenters. First, however, we discuss a case study of a very broad-target raid, attempting to mess with anti-trolling tools by substituting racially charged words with company names, e.g., “googles.”
7.1 Case Study: “Operation Google”
We now present a case study of a very broad-target raid, attempting to mess with anti-trolling tools by substituting racially charged words with company names, e.g., “googles.” On September 22, 2016, a thread on /pol/ called for the execution of so-called “Operation Google,” in response to Google announcing the introduction of anti-trolling machine learning based technology [13] and similar initiatives on Twitter [14]. It was proposed to poison these by using, e.g., “Google” instead of “nigger” and “Skype” for “kike,” calling other users to disrupt social media sites like Twitter, and also recommending using certain hashtags, e.g., #worthlessgoogs and #googlehangout. By examining the impact of Operation Google on both /pol/ and Twitter, we aim to gain useful insight into just how efficient and effective the /pol/ community is in acting in a coordinated manner.
In Figure 15, we plot the normalized usage of the specific replacements called for in the Operation Google post. The effects within /pol/ are quite evident: on Sep 22 we see the word “google” appearing at over 5 times its normal rate, while “Skype” appears at almost double its normal rate. To some extent, this illustrates how quickly /pol/ can execute on a raid, but also how short of an attention span its users have: by Sep 26 the burst in usage of Google and Skype had died down. While we still see elevated usages of “Google” and “Skype,” there is no discernible change in the usage of “nigger” or “kike,”
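The kind of normalized usage reported for Figure 15 can be approximated with the short Python sketch below. The text does not spell out the exact normalization, so this assumes one plausible reading: the daily fraction of posts containing a term, divided by the mean of that fraction over a baseline period before the event. The function names, the tokenization, and the toy posts are hypothetical, and the baseline rate is assumed to be nonzero.

```python
import re
from collections import Counter


def daily_rate(posts, term):
    """Fraction of posts per day whose text contains `term` as a word.

    `posts` is an iterable of (day, text) pairs.
    """
    totals, hits = Counter(), Counter()
    for day, text in posts:
        totals[day] += 1
        if term in re.findall(r"[a-z0-9']+", text.lower()):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in totals}


def normalized_usage(rates, baseline_days):
    """Divide each daily rate by the mean rate over the baseline days."""
    baseline = sum(rates[d] for d in baseline_days) / len(baseline_days)
    return {day: rate / baseline for day, rate in rates.items()}


# Toy posts (made up): four posts per day over three days.
posts = [
    ("09-20", "google maps is down"), ("09-20", "weather thread"),
    ("09-20", "debate tonight"), ("09-20", "new poll numbers"),
    ("09-21", "search it on Google"), ("09-21", "market news"),
    ("09-21", "election map"), ("09-21", "general discussion"),
    ("09-22", "google"), ("09-22", "Google, again"),
    ("09-22", "more google"), ("09-22", "unrelated post"),
]
rates = daily_rate(posts, "google")
usage = normalized_usage(rates, baseline_days=["09-20", "09-21"])
print(usage["09-22"])
```

A value of 1.0 means the term appears at its baseline rate, so a burst such as the one described for Sep 22 shows up as a value well above 1.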
[Figure omitted, panel (a); caption not recovered. x-axis: day (18 Sep 2016 to 5 Oct 2016); y-axis: % of tweets; one series per hashtag: dumbgoogles, worthlessgoogs, googlesgonnagoog, googleriots, googlehangout.]
As discussed in our literature review, we still have limited insight into how trolls operate, and in particular how forces outside the control of targeted services organize and coordinate their actions. To this end, we set out to investigate the connection between /pol/ threads and YouTube comments. We focus on YouTube since 1) it accounts for the majority of media links posted on /pol/, and 2) it is experiencing an increase in hateful comments, prompting Google to announce the (not uncontroversial) YouTube Heroes program [30].
We examine the comments from 19,568 YouTube videos linked to by 10,809 /pol/ threads to look for raiding behavior at scale. Note that finding evidence of raids on YouTube (or any other service) is not an easy task, considering that explicit calls
10 Recall that, since there are no accounts on 4chan, bans are based on session/cookies or IP addresses/ranges, with the latter causing VPN/proxies to be banned often.
in x and y. Since the lifetime of /pol/ threads is quite dynamic, we shift and normalize the time axis for both $t_x^i$ and $t_y^j$, so that t = 0 corresponds to when the video was first linked and t = 1 to the last post in the /pol/ thread:
\[ t \leftarrow \frac{t - t_{yt}}{t_{last} - t_{yt}}. \]
In other words, we normalize to the duration of the /pol/ thread’s lifetime. We consider only /pol/ posts that occur after the YouTube mention, while, for computational complexity reasons, we consider only YouTube comments that occurred
within the (normalized) [−10, +10] period, which accounts for 35% of YouTube comments in our dataset.
From the list of YouTube comment timestamps, we compute the corresponding Probability Density Function (PDF) using the Kernel Density Estimator method [27], and estimate the position of the absolute maximum of the distribution. In Figure 18, we plot the distribution of the distance between the highest peak in YouTube commenting activity and the /pol/ post linking to the video. We observe that 14% of the YouTube videos experience a peak in activity during the period they are discussed on /pol/. In many cases, /pol/ seems to have a strong influence on YouTube activity, suggesting that the YouTube link posted on /pol/ might have a triggering behavior, even though this analysis does not necessarily provide evidence of a raid taking place.
Figure 18: Distribution of the distance (in normalized thread lifetime) of the highest peak of activity in YouTube comments and the /pol/ thread they appear in. t = 0 denotes the time when video was first mentioned, and t = 1 the last related post in the thread. [Plot omitted; x-axis: normalized time; y-axis: %.]
However, if a raid is taking place, then the comments on both /pol/ and YouTube are likely to be “synchronized.” Consider, for instance, the extreme case where some users that see the YouTube link on a /pol/ thread comment on both YouTube and the /pol/ thread simultaneously: the two sets of timestamps would be perfectly synchronized. In practice, we measure the synchronization, in terms of delay between activities, using cross-correlation to estimate the lag between two signals. Cross-correlation slides one signal with respect to the other and calculates the dot product (i.e., the matching) between the two signals for each possible lag. The estimated lag is the one that maximizes the matching between the signals. We represent the sequences as signals (x(t) and y(t)), using Dirac delta distributions δ(·). Specifically, we expand x(t) and y(t) into trains of Dirac delta distributions:
\[ x(t) = \sum_{i=1}^{N_x} \delta\left(t - t_x^i\right); \quad y(t) = \sum_{j=1}^{N_y} \delta\left(t - t_y^j\right) \]
and we calculate c(t), the continuous time cross-correlation between the two series11 as:
\[ c(t) = \int_{-\infty}^{\infty} x(t + \tau)\, y(\tau)\, d\tau = \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \delta\left(t - \left(t_y^j - t_x^i\right)\right) \]
The resulting cross-correlation is also a Dirac delta train, representing the set of all possible inter-arrival times between elements from the two sets.
If y(t) is the version of x(t) shifted by ∆T (or at least contains a shifted version of x(t)), with each sample delayed with a slightly different time lag, c(t) will be characterized by a high concentration of pulses around ∆T. As in the peak activity detection, we can estimate the most likely lag by computing the associated PDF function ĉ(t) by means of the Kernel Density Estimator method [27], and then compute the global maximum:
\[ \hat{c}(t) = \int_{-\infty}^{\infty} c(t + \tau)\, k(\tau)\, d\tau; \quad \widehat{\Delta T} = \arg\max_{t} \hat{c}(t) \]
where k(t) is the kernel smoothing function (typically a zero-mean Gaussian function).12
7.4 Evidence of Raids
Building on the above insights, we provide large-scale evidence of raids. If a raid is taking place, we expect the estimated lag ∆T to be close to zero, and we can validate this by looking at the content of the YouTube comments.
Figure 19 plots the relationship between the number of hateful comments on YouTube that occur within the /pol/ thread lifetime (i.e., containing at least one word from the hatebase dictionary) and the synchronization lag between the /pol/ thread and the YouTube comments. The trend is quite clear: as the rate of hateful comments on YouTube increases, the synchronization lag between /pol/ and YouTube comments decreases. This shows that almost all YouTube videos affected by (detected) hateful comments during the /pol/ thread lifetime are likely related to raids.
Figure 19: Hateful YouTube comments vs synchronization lag between /pol/ threads and corresponding YouTube comments. Each point is a /pol/ thread. The hateful comments count refers to just those within the thread lifetime ([0,+1]). [Plot omitted; x-axis: hate comments per second (·10^−2).]
Figure 20 plots the CDF of the absolute value of the synchronization lag between /pol/ threads and comments on the corresponding YouTube videos. We distinguish between comments with a higher percentage of comments containing hate words during the life of the thread from those with more before
11 Since timestamp resolution is 1s, this is equivalent to a discrete-time cross-correlation with 1s binning, but the closed form solution lets us compute it much more efficiently.
12 ĉ(t) is also the cross-correlation between the PDF functions related to x(t) and y(t).
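The lag estimation can be sketched in pure Python under a few stated assumptions. Because the cross-correlation of two Dirac trains is itself a train of pulses at the pairwise differences between comment and post timestamps, the sketch enumerates those differences, smooths them with a Gaussian kernel evaluated on a regular grid, and returns the grid point with the highest density. The function names, the grid search, the bandwidth value, and the toy timestamps are illustrative choices, not the authors' implementation.

```python
import math


def normalize_times(times, t_link, t_last):
    """Map timestamps so that 0 is the first link to the video and 1 is
    the last post in the thread."""
    span = t_last - t_link
    return [(t - t_link) / span for t in times]


def pairwise_lags(thread_times, comment_times):
    """Pulse positions of the cross-correlation: all differences t_y - t_x."""
    return [ty - tx for tx in thread_times for ty in comment_times]


def estimate_lag(thread_times, comment_times, bandwidth, grid_step):
    """Return the lag that maximizes a Gaussian-smoothed pulse density.

    The density is left unnormalized, since a constant factor does not
    change the position of the maximum.
    """
    lags = pairwise_lags(thread_times, comment_times)
    low, high = min(lags), max(lags)
    steps = int((high - low) / grid_step)
    best_t, best_score = low, float("-inf")
    for k in range(steps + 1):
        t = low + k * grid_step
        score = sum(math.exp(-0.5 * ((t - d) / bandwidth) ** 2) for d in lags)
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Toy data (made up): comments trail thread posts by roughly 30 seconds.
thread_times = [0, 100, 200, 300]
comment_times = [31, 129, 230, 330]
print(estimate_lag(thread_times, comment_times, bandwidth=5.0, grid_step=1.0))
print(normalize_times([10, 15, 20], t_link=10, t_last=20))
```

Enumerating all pairs costs on the order of N_x times N_y operations per grid point, so a real analysis at the scale described here would likely need a faster evaluation (for example, binning the lags first). The estimate is also sensitive to the chosen bandwidth.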
Figure 20: CDF of synchronization lag between /pol/ threads and YouTube comments, distinguishing between threads with YouTube videos containing higher hate comments percentage in the [0 +1] period or [-1 0]. [Plot omitted; x-axis: synchronization lag (s), ·10^5; y-axis: eCDF; series: hate in [0 +1] ≤ hate in [-1 0], and hate in [0 +1] > hate in [-1 0].]
8 Discussion & Conclusion
This paper presented the first large-scale study of /pol/, 4chan’s politically incorrect board, arguably the most controversial one owing to its links to the alt-right movement and its unconventional support for Donald Trump’s 2016 presidential campaign. First, we provided a general characterization, comparing activity on /pol/ to two other boards on 4chan, /sp/ (“sports”) and /int/ (“international”). We showed that each of the boards exhibits different behaviors with respect to thread creation and posts. We looked at the impact of “bump limits” on discourse, finding that it results in fresh content on a consistent basis. We used the country flag feature present on the three boards and found that, while Americans dominate the conversation in terms of absolute numbers, many other countries (both native English speaking and not) are well represented in terms of
analysis of language and posting behavior. Finally, while we showed quantitative evidence that raids are taking place, we do not claim an ability to classify them as there are many layers of subtlety in how raiding behavior might be exhibited. However, we are confident that our findings can serve as a foundation for interesting and valuable future work exploring fringe groups like the alt-right, hate speech, and online harassment campaigns.
Acknowledgments. We wish to thank Andri Ioannou and Despoina Chatzakou for their help and feedback, and Timothy Quinn for providing access to the Hatebase API. This research is supported by the European Union’s H2020-MSCA-RISE grant “ENCASE” (GA No. 691025) and by the EPSRC under grant EP/N008448/1. Jeremiah Onaolapo was supported by the Petroleum Technology Development Fund (PTDF).
References
[1] F. Alfonso. 4chan celebrates Independence Day by spamming popular Tumblr tags. http://www.dailydot.com/news/4chan-tumblr-independence-day/, 2014.
[2] N. Anderson. 4chan tries to change life outside the basement via DDoS attacks. http://arstechnica.com/tech-policy/2010/09/4chan-tries-to-change-life-outside-the-basement-via-ddos-attacks/, 2010.
[3] Anti-Defamation League. Pepe the Frog. http://www.adl.org/combating-hate/hate-on-display/c/pepe-the-frog.html, 2016.
[4] Aspen Institute. How the internet and social media are changing culture. http://www.aspeninstitute.cz/en/article/4-2014-how-the-internet-and-social-media-are-changing-culture/, 2014.
[5] J. Bartlett. 4chan: the role of anonymity in the meme-generating cesspool of the web. http://www.wired.co.uk/article/4chan-happy-birthday, 2016.
[6] M. S. Bernstein, A. Monroy-Hernández, D. Harry, P. André, K. Panovich, and G. Vargas. 4chan and /b/: An Analysis of Anonymity and Ephemerality in a Large Online Community. In ICWSM, 2011.
[7] J. Blackburn and H. Kwak. STFU NOOB! Predicting Crowdsourced Decisions on Toxic Behavior in Online Games. In WWW, 2014.
[8] J. Cheng, C. Danescu-Niculescu-Mizil, and J. Leskovec. Antisocial behavior in online discussion communities. In ICWSM, 2015.
[9] D. Correa, L. A. Silva, M. Mondal, F. Benevenuto, and K. P. Gummadi. The Many Shades of Anonymity: Characterizing Anonymous Social Media Content. In ICWSM, 2015.
[10] A. Couts and A. Powell. 4chan and Reddit bombarded debate polls to declare Trump the winner. http://www.dailydot.com/layer8/trump-clinton-debate-online-polls-4chan-the-donald/, 2016.
[11] N. Djuric, J. Zhou, R. Morris, M. Grbovic, V. Radosavljevic, and N. Bhamidipati. Hate speech detection with comment embeddings. In WWW, 2015.
[12] E. Ferrara, M. JafariAsbagh, O. Varol, V. Qazvinian, F. Menczer, and A. Flammini. Clustering memes in social media. In ASONAM, Aug 2013.
[13] A. Greenberg. Inside Google’s Internet Justice League and Its AI-Powered War on Trolls. https://www.wired.com/2016/09/inside-googles-internet-justice-league-ai-powered-war-trolls/, 2016.
[14] A. Horowitz Satlin. Anti-Semitic Trolls Threaten To Take Twitter Down With Them. http://www.huffingtonpost.com/entry/twitter-bullying-anti-semitism_us_580876d1e4b0b994d4c47e94, 2016.
[15] H. Hosseinmardi, A. Ghasemianlangroodi, R. Han, Q. Lv, and S. Mishra. Analyzing Negative User Behavior in a Semi-anonymous Social Network. In ASONAM, 2014.
[16] M. Ingram. Here’s Why You Shouldn’t Trust Those Online Polls That Say Trump Won. http://for.tn/2dk74pG, 2016.
[17] A. Johnson and P. Helsel. 4chan Murder Suspect David Kalac Surrenders to Police. http://nbcnews.to/2dHNcuO, 2016.
[18] C. McGoogan. Internet trolls replace racist slurs with codewords to avoid censorship. http://www.telegraph.co.uk/technology/2016/10/03/internet-trolls-replace-racist-slurs-with-online-codewords-to-av/, 2016.
[19] A. Y. Ng, M. I. Jordan, Y. Weiss, et al. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 2:849–856, 2002.
[20] C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang. Abusive Language Detection in Online User Content. In WWW, pages 145–153, 2016.
[21] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundations and trends in information retrieval, 2:1–135, 2008.
[22] S. T. Peddinti, A. Korolova, E. Bursztein, and G. Sampemane. Cloak and Swagger: Understanding data sensitivity through the lens of user anonymity. In IEEE Security & Privacy, 2014.
[23] R. Potapova and D. Gordeev. Determination of the Internet Anonymity Influence on the Level of Aggression and Usage of Obscene Lexis. ArXiv e-prints, Oct. 2015.
[24] R. Potapova and D. Gordeev. Detecting state of aggression in sentences using CNN. CoRR, abs/1604.06650, 2016.
[25] C. M. Rivers and B. L. Lewis. Ethical research standards in a world of big data. F1000Research, 2014.
[26] J. Siegel. Dylann Roof, 4chan, and the New Online Racism. http://www.thedailybeast.com/articles/2015/06/29/dylann-roof-4chan-and-the-new-online-racism.html, 2015.
[27] B. W. Silverman. Density estimation for statistics and data analysis, volume 26. CRC press, 1986.
[28] J. Stein. How Trolls Are Ruining the Internet. http://ti.me/2bzZa9y, 2016.
[29] N. Wolf. Future of 4chan uncertain as controversial site faces financial woes. https://www.theguardian.com/technology/2016/oct/04/4chan-website-financial-trouble-martin-shkreli, 2016.
[30] YouTube Official Blog. Growing our Trusted Flagger program into YouTube Heroes. https://youtube.googleblog.com/2016/09/growing-our-trusted-flagger-program.html, 2016.
Appendix
A Rare Pepes
In this Section we display some of our rare Pepe collection.
Figure 22: A somewhat rare, modern Pepe, which much like the Bayeux Tapestry records the historic rise of /pol/.
Figure 26: An ironic Pepe depiction of Hillary Clinton.
Figure 30: A mischievous witch Pepe.
Figure 31: The now “iconic” Trump Pepe.