
Analyzing social media in crisis situations
CCGL 9061
HKU, Common Core
Intended Learning Outcomes of the lecture
1. To describe how social media can be used to define humanitarian interventions

2. To explain the specificities, benefits and limits of social media data

3. To be able to collect and analyse data from social media
An introductory distinction
Of course, humanitarian organizations are using social media today to
communicate about their actions
• And one can study how they do that

Some of these organizations also analyze social media for their interventions
• Try to collect useful information

In this lecture, we will investigate the second approach


Overview

1. The specific nature of social media data

2. Collecting data in crisis situations

3. A hands-on practice
Defining social media
“Social media are interactive computer-mediated technologies that
facilitate the creation and sharing of information, ideas, career interests
and other forms of expression via virtual communities and networks.”

Some common features:


• Interactive Web 2.0 Internet-based applications.
• User-generated content (text posts, photos, videos…) as core component
• Service-specific profiles and identities created by users
• Development of online social networks

Social media with over 100 million registered users: Facebook, Weibo,
Youtube, Instagram, WeChat, Viber, QQ, Telegram…

https://en.wikipedia.org/wiki/Social_media
What social media are you using?
Go to menti.com and enter code 85 05 4

Choose the social media you are using regularly


Massive amounts of users

A view of the Twitter network

1 vertex = 1 Twitter user

2 vertices are connected if one user follows the other

https://dhs.stanford.edu/gephi-workshop/twitter-network-gallery/
Massive amounts of data
E-mail (a social medium?)
• In 2017, 269 billion (2.69 × 10¹¹) emails sent and received… each day

WhatsApp
• Annual global traffic in 2015: around 14.4 trillion (14.4 × 10¹²) text messages

Twitter
• Around 6,000 tweets per second ≈ 350,000 tweets per minute ≈ 500 million tweets per day ≈ 200 billion tweets per year
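The conversion between these rates can be checked with a few lines of Python. This is only a sanity check on the rounded figures quoted above; the 365-day year is an assumption.

```python
# Convert the quoted rate of about 6,000 tweets per second to other time units.
TWEETS_PER_SECOND = 6_000  # figure quoted on the slide

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: a non-leap year

per_minute = TWEETS_PER_SECOND * SECONDS_PER_MINUTE
per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
per_year = per_day * DAYS_PER_YEAR

print(f"per minute: {per_minute:,}")
print(f"per day:    {per_day:,}")
print(f"per year:   {per_year:,}")
```

The exact products differ slightly from the slide's figures, which are rounded, but the orders of magnitude agree.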

https://www.statista.com/statistics/456500/daily-number-of-e-mails-worldwide/
Ovum, 2016, “Application-to-Person Messaging…”, https://www.mmaglobal.com/files/casestudies/mob-mobilecustomerengagement_a2p-wp-77158.pdf
https://www.internetlivestats.com/twitter-statistics/
A diversity of social media
Different social media tailored to different usages and different
audiences
• Facebook: older than others, and thus older users?
• Twitter: the 140-character constraint, the use of hashtags (not planned initially)
• Instagram and the role of pictures
• …

Significant cross-cultural variation


The geography of Twitter
Example: Tweets in Southeast Asia

All exact location coordinates in the Twitter Decahose (10% of all Tweets), 23 October 2012 to 30 November 2012. Zoom on SE Asia.

Leetaru, K. H., Wang, S., Cao, G., Padmanabhan, A. & Shook, E. (2013). Mapping the global Twitter heartbeat: The geography of Twitter. First Monday 18(5-6).
doi:10.5210/fm.v18i5.4366 ; http://firstmonday.org/ojs/index.php/fm/article/view/4366/3654
Using social media for humanitarian interventions
Some properties of social media, and their consequences
The surface
Information on social media does not appear in the way information
is reported by journalists or experts
• Less structure, often shorter / more compact
• A more colloquial language, slang etc.
• Use of many non-standard forms (4ever for forever, 2mr for tomorrow etc.)
• Possibly multimodal

Use of different languages, or of different dialects / vernaculars

This can make it difficult to access what the data mean, i.e. their content
The content (1/3)
What people post on social media about a given theme or event can vary widely

Different speech acts: to label, to repeat, to answer, to request, to protest, to convince etc.
• What we can do with words
• Pointing to other sources of information (esp. in Twitter)

Most if not all of these speech acts may be found in more traditional media,
but their distribution is likely different
• E.g. more aggressiveness on social media, less articulated / more atomized arguments
etc.
The content (2/3)
Humanitarian interventions are usually looking for actionable
information
• Something they can act upon

A lot of messages / pieces of information are useless


• Need to be filtered out

Information might be useless, but worse, it may be fake


• E.g. to mislead the opponent (or even the humanitarian organizations) during a
conflict
• Often not easy to detect
Fake information in numbers
Currently, in New York City, 911 operators receive on average 10,000
false calls

In continental Europe, half of all calls made to emergency numbers are false/hoax calls…

During Hurricane Sandy in 2012, over 10,000 tweets posted fake photographs
Striking a balance
The need to find a balance when selecting information

Classical concepts in statistics: Type-I vs Type-II errors


• You don’t want to filter too much and lose valuable information (type-II errors)
• You don’t want to include too much and possibly treat false information as true (type-I errors)

How can we assess these two risks when it comes to social media?
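One way to put numbers on the two risks is to hand-check a small sample of messages and count how often a filter errs in each direction. The sketch below assumes such a checked sample exists; the eight pairs and the function name `error_rates` are invented for illustration.

```python
# Each pair is (filter_kept_it, actually_useful) for one hand-checked message.
# The eight pairs below are made up for illustration only.
checked = [
    (True, True), (True, False), (False, True), (False, False),
    (True, True), (False, False), (True, False), (False, False),
]

def error_rates(pairs):
    """Return (type_i_rate, type_ii_rate) for a keep/discard filter."""
    # Type I: a useless or false message was kept.
    false_pos = sum(1 for kept, useful in pairs if kept and not useful)
    # Type II: a useful message was filtered out.
    false_neg = sum(1 for kept, useful in pairs if not kept and useful)
    n_useless = sum(1 for _, useful in pairs if not useful)
    n_useful = sum(1 for _, useful in pairs if useful)
    type_i = false_pos / n_useless if n_useless else 0.0
    type_ii = false_neg / n_useful if n_useful else 0.0
    return type_i, type_ii

type_i, type_ii = error_rates(checked)
print(f"type-I rate:  {type_i:.2f}")
print(f"type-II rate: {type_ii:.2f}")
```

Making the filter stricter typically lowers the type-I rate and raises the type-II rate, which is the balance the slide refers to.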
Privacy
While some media like Twitter are open (all messages are public), others are not
• WhatsApp, WeChat, part of Facebook

Private networks are not readily accessible to humanitarian organizations

Should there be special request(s) to access data in urgent situations?


• Issue of privacy, of user’s consent etc.
The reactivity
Social media can quickly react to any event
• Faster than a journalist can be sent to the scene

This is very useful during the onset of an emergency situation

The specific case of Twitter: popularity, public messages, very reactive


Finding the needle in the haystack
Social media data are usually very ‘noisy’
• A lot of ‘useless’ information

The objective is then to effectively sort out what is relevant, and what is not
• The former being much rarer than the latter

Cf. ‘weak signals’

→ A need for specific, tailored approaches


Using social media in a clever way
A source of information in addition to others
• Avoid the “streetlight effect”

Looking for situations before and during a crisis
• What if the number of messages remains the same?
  • Situation more or less under control
• What if the number of messages is drastically increasing?
  • Suggests big problems
• What if the number of messages is drastically decreasing?
  • Maybe no more electricity…
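The three cases above can be sketched as a comparison between the current message count and a pre-crisis baseline. This is a minimal illustration: the function name `volume_signal`, the threshold `factor` and the hourly counts are all assumptions, not values from the lecture.

```python
from statistics import mean

def volume_signal(baseline_counts, current_count, factor=3.0):
    """Classify message volume relative to a pre-crisis baseline.

    `factor` is an arbitrary illustrative threshold.
    """
    baseline = mean(baseline_counts)
    if current_count >= baseline * factor:
        return "sharp increase"   # may suggest big problems
    if current_count <= baseline / factor:
        return "sharp decrease"   # may suggest an outage (e.g. no electricity)
    return "stable"               # situation more or less under control

hourly_baseline = [100, 120, 90, 110]  # made-up hourly message counts

for count in (100, 500, 20):
    print(count, "->", volume_signal(hourly_baseline, count))
```

A fixed ratio is the simplest possible rule; in practice the baseline would likely need to account for time of day and day of week.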

Looking for the ‘contours of a community’s social media footprint’


Overview

1. The specific nature of social media data

2. Collecting data in crisis situations

3. A hands-on practice
Collecting data effectively
A number of tools have been designed to collect information on Twitter

E.g. solutions to analyze Twitter in real time to quickly detect “bad buzz”

Some tools have been more specifically tailored to humanitarian action


https://www.visibrain.com/en/
http://deustotech.deusto.es/soluciones/dante.html
Introducing TweetTracker

http://tweettracker.fulton.asu.edu/
Localized information
Some data can be geo-localized
→ Potentially a very useful feature

However, only a small portion of the messages (on Twitter)
https://ny.spatial.ly/
Matching amounts of damage and volumes of Tweets
Context: Hurricane Sandy in 2012

A correlation found between
• Activity on Twitter about Sandy
• Actual damage

→ Use tweets to quickly estimate actual damage…

https://www.theverge.com/2016/3/11/11200962/twitter-hurricane-sandy-damage-study
https://advances.sciencemag.org/content/2/3/e1500779
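To make the idea of a correlation between Twitter activity and damage concrete, here is a minimal sketch computing a Pearson coefficient over a handful of areas. The numbers are invented, and the Pearson coefficient is a generic choice; it is not necessarily the measure used in the study cited above.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up values for five hypothetical areas.
tweets_per_area = [120, 340, 560, 80, 900]     # storm-related tweets
damage_per_area = [1.0, 2.9, 4.1, 0.5, 7.8]    # damage, arbitrary units

r = pearson(tweets_per_area, damage_per_area)
print(f"r = {r:.3f}")
```

A coefficient close to 1 would mean that areas with more storm-related tweets also tend to report more damage; it says nothing by itself about causation.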
Overview

1. The specific nature of social media data

2. Collecting data in crisis situations

3. A hands-on practice
Source of data
Tweets collected during the 2011 Joplin tornado and labeled by humans into humanitarian categories

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, and Patrick Meier. Extracting Information Nuggets from Disaster-Related Messages in Social Media. In Proceedings of the 10th International Conference on Information Systems for Crisis Response and Management (ISCRAM), May 2013, Baden-Baden, Germany.

https://crisisnlp.qcri.org/
Two Excel files (on Moodle)
Tweets - Joplin Tornado - to annotate.xlsx
Tweets - Joplin Tornado - annotated.xlsx

Start with the first file, do not check the second file until you have worked
on the first file (see instructions in the next slides)
• Otherwise, no point in doing the activity

Two series of Tweets


• The first series should be judged in terms of how informative the tweets are
• The second series should be judged in terms of the themes informative tweets relate to
The stages of categorization
Informativeness
1. Personal Only
2. Informative (Indirect)
3. Informative (Direct)
4. Informative (Direct or Indirect)
5. Other

Themes / Topics
1. Caution and advice
2. Information source
3. Donations of money, goods or services
4. Casualties and damage
5. People missing, found or seen
6. Unknown

These categories have been designed by the authors of the study


Informativeness
Topics
Learning activity: Categorizing Tweets (1/2)
Working as a group of 3 to 5 people

Categorize the tweets using the categories above


• The first datasheet (Tweets – Informativeness) in terms of informativeness
• The second datasheet (Tweets – Themes) in terms of themes

At least 200 or 300 Tweets

Working as a group, how do you share the load? What are possible
strategies, with their benefits and limits?
Learning activity: Categorizing Tweets (2/2)
Given how long it took you to categorize 200 or 300 tweets, how long
would you take to analyze the whole file?

What is your quantitative appreciation of the “needle in the haystack”?

What type of heuristics could be used to quickly separate potentially useful tweets from likely irrelevant ones?
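One very simple heuristic is a keyword match. The sketch below is only an illustration of the idea: the keyword list, the function name `looks_informative` and the three sample tweets are all invented, and a real list would have to be tuned to the event and the language.

```python
import re

# Hypothetical keyword list, chosen for illustration only.
KEYWORDS = {"injured", "missing", "shelter", "donate", "damage", "trapped"}

def looks_informative(tweet):
    """Flag a tweet if it contains at least one keyword (case-insensitive)."""
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    return bool(words & KEYWORDS)

# Invented examples, not taken from the Joplin dataset.
sample = [
    "Praying for everyone in Joplin tonight",
    "Shelter open at the high school, bring blankets",
    "Two people trapped near Main St, need help",
]

flagged = [t for t in sample if looks_informative(t)]
for tweet in flagged:
    print(tweet)
```

Such a filter is fast but crude: it misses useful tweets that use other words (type-II errors) and keeps irrelevant ones that happen to contain a keyword (type-I errors).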
Crowdsourcing
The authors relied on crowdsourcing to analyse the Tweets
• Different unknown workers participated in the task

Crowdsourcing the information, but also crowdsourcing its analysis…

Need to assess the quality of the job done


• Same inputs analysed by several people, with majority judgment
• A gold standard dataset = “a small set of items selected by the authors, whose labels are uncontroversial”. Workers who did not agree substantially with the gold standard were discarded
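The two quality checks above can be sketched in a few lines. This is a minimal illustration, not the authors' actual pipeline: the worker answers, the gold labels, the helper names and the `MIN_AGREEMENT` threshold are all assumptions.

```python
from collections import Counter

def majority_label(labels):
    """Return the most frequent label, or None for an empty list or a tie."""
    ranked = Counter(labels).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]

def gold_agreement(worker_answers, gold):
    """Share of gold items on which the worker matches the reference label."""
    scored = [item for item in gold if item in worker_answers]
    if not scored:
        return 0.0
    hits = sum(worker_answers[item] == gold[item] for item in scored)
    return hits / len(scored)

# Made-up gold labels and worker answers; t4 is the item to be decided.
gold = {"t1": "Informative", "t2": "Personal", "t3": "Other"}
workers = {
    "w1": {"t1": "Informative", "t2": "Personal", "t3": "Other", "t4": "Informative"},
    "w2": {"t1": "Personal", "t2": "Other", "t3": "Other", "t4": "Personal"},
    "w3": {"t1": "Informative", "t2": "Personal", "t3": "Personal", "t4": "Informative"},
}

MIN_AGREEMENT = 0.6  # arbitrary illustrative threshold

trusted = {
    name: answers
    for name, answers in workers.items()
    if gold_agreement(answers, gold) >= MIN_AGREEMENT
}
t4_label = majority_label([answers["t4"] for answers in trusted.values()])
print(sorted(trusted), t4_label)
```

Filtering workers first and voting afterwards keeps unreliable workers from swaying the majority on the items that actually need a label.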
Conclusions and perspectives
Social media data are relatively new and have their own specificities
• A potentially useful source, but not easy to exploit
• A complementary source, with its own benefits and limits

Collecting data is made easier today with a range of tools


• E.g. TweetTracker

Manual analysis faces the problem of the amount of data


• Strongly suggests considering alternative, more automated methods (data analytics)
• With their advantages, and their limits
• See upcoming lectures…
