
Fake News Detection using CNN and LSTM

Aimann Sait, Jatin JH, Saif Ali Khan


BMS College of Engineering
Abstract—This paper gives an insight into how fake news can be
detected using current technologies. We use neural networks to build
a model that can detect fake news, with a dataset obtained from the
internet (Kaggle.com). We propose a new way to detect fake news that
we expect to improve accuracy.
Fake news is misinformation presented as news. It is difficult to
detect fake news based on news content alone.
Since social media became popular, it has become very easy to spread
fake news. This misleads society because it is difficult for people
to differentiate between fake news and real news.
We also survey the different approaches that people have used to
tackle the problem of fake news, and we discuss areas of future
research related to fake news detection.

1. INTRODUCTION
Society witnessed a lot of fake news being spread during the 2016 US
presidential election. This epidemic affects the fields of journalism
and politics. First, fake news disrupts the balance of authenticity
in society. Second, fake news purposely makes people consume false
information. Third, fake news changes the way people respond to real
news.

The media is trying to figure out the adverse effects of fake news on
the election and how it persuaded the public in general.
To put an end to this problem, people started thinking of ways to
detect fake news. The majority of them followed an approach in which
fake news is first detected and then steps are taken to stop it from
spreading. Once fake news spreads virally, the damage is done and its
effects cannot be reversed, so an immediate reaction is very
important.
Fake news detection is defined as the task of categorizing news along
a continuum of veracity with an associated measure of certainty.
To date, people rely on fact-checking websites such as "PolitiFact"
and "Snopes" ("The Onion", sometimes listed with them, is a satire
site rather than a fact checker). These websites are of limited use,
however, because fact-checking websites are confined to particular
domains such as sports, politics and the humanities. This also makes
it difficult to obtain datasets.
Fake news detection is difficult because even human beings may find
it hard to differentiate between real news and fake news.

Figure 1: An illustration of how the story titled “Palestinians
recognizes Texas as part of Mexico” appears on Facebook
[Source: http://www.facebook.com/]

Figure 1 shows US President Donald J. Trump and Palestinian President
Mahmoud Abbas shaking hands in the Oval Office, standing in front of
a map of the USA and Mexico on which Texas looks like a part of
Mexico. In short, the image portrays the USA and Palestine as being
in agreement that Texas is recognized as part of Mexico.

The approach provided in this paper is intended to help people detect
fake news with more precision.
The rest of this paper is organized as follows. Section 2 states the
problem, Section 3 reviews related work, Section 4 describes the
proposed model, Section 5 explains LSTM, Section 6 explains CNN, and
Section 7 provides an insight into future work in the direction of
fake news detection. Our conclusions and references are presented in
Section 8 and Section 9 respectively.

2. PROBLEM STATEMENT

The news that we consume every day changes our view of the world. The
escalation of fake news has put the reliability of journalism and the
media system in jeopardy. Defining, identifying and stopping fake
news from spreading has become a top priority of governments and
corporations. Yet fake news is still prevalent on social media and is
consumed by millions of people every day. One of the initial
challenges we faced while embarking on our journey to solve the
problem is that there is not yet a definite and unified definition of
fake news, or of the markers needed to determine an article's
legitimacy or to flag a news item as fake. Many corporations still
have to resort to flagging articles manually. The task of detecting
fake news can be interpreted as classifying news that was written to
deceive readers on purpose.

3. RELATED WORK

3.1 Media-Rich Fake News Detection:
Unigrams and bigrams are extracted from the text. Term Frequency
Inverse Document Frequency (TF-IDF) is computed to show how important
a word is to a document.
The type of punctuation used can help in identifying fake news. The
number of characters, complex words, long words, syllables, word
types and paragraphs also plays a part. Rhetorical Structure Theory
(RST) shows the relations between pieces of a text, since writers
tend to emphasize certain parts of a text. The Graphical Clustering
Toolkit clusters similar news reports together depending on how
closely they relate; whether a report is fake or credible is decided
by its Euclidean distance to its fake or real counterpart.

3.2 Fighting Fake News Propagation with Blockchains
The authors divide the system into three parts: (I) the authentic
news source, (II) the sharing platform, and (III) users, who can
behave arbitrarily.
When news is generated by an authentic news source, an ID is
generated and forwarded to the blockchain along with the sharing
platform.
If the timestamp of the shared news differs from that of the
authentic news source, the news is treated as fake and the blockchain
notifies the social network.
In this way the users, who can behave arbitrarily, learn whether the
news is truthful.

3.3 Fake News Pattern Recognition using Linguistic Analysis
The authors propose a new framework to help people make better voting
decisions in future elections.
First, "text normalization" is performed to classify the news into
categories, then a set of keywords is extracted to find noticeable
patterns. Evaluation metrics are used to check the success rate of
the framework. The authors used a dataset of 200 tweets, explored
that data, and then preprocessed the explored data in order to
analyze the truthfulness of the news efficiently.

3.4 Fake News Detection on Social Media: A Data mining perspective
Using social media to follow trending news is a double-edged sword.
The authors divide fake news into two categories before detecting it:
fake news on traditional media and fake news on social media, where
the social media may contain malicious accounts.
Detection is further classified into news content and social context.
Once a news item is obtained, it is assigned to the respective side
and analyzed further by extracting linguistic-based and visual-based
features.
The authors considered datasets from trusted news sites, and their
evaluation metrics show the truth discovery.

3.5 FAMOUS: Fake News Detection Model based on Unified Key Sentence
Information
Because communication media has a great influence, from political
power to the public, fake news detection has become very challenging.
Using the unified key sentence information model, sentence matching
between a question and an article can be performed.
The model extracts from the article the key sentences related to the
question and combines them into one unified word vector.
It performs sentence matching by running matching operations on the
key sentences and on the contextual information obtained through a
bidirectional long short-term memory network.

3.6 Fake News: A Survey of Research, Detection Methods, and
Opportunities
Fake news has weakened public trust in governments. This was best
highlighted during the critical months of the 2016 U.S. presidential
election campaign, when the top 20 most frequently discussed false
election stories generated 8,711,000 shares on Facebook.
Dozens of teenagers produced fake news on social media and became
wealthy during the 2016 U.S. presidential election through per-click
advertising on social media. As reported by NBC, earnings reached
nearly $60,000 over six months.
According to this survey, fake news can be identified based on its
style, its propagation or the user who generated it.

3.7 Design Exploration of Fake News: A Transdisciplinary
Methodological Approach
Canadians trust news from traditional sources no matter the platform.
A survey showed that 74% trust TV while 34% trust the newspaper. The
focus here was on the aesthetic appeal of the content: sites that
were more aesthetically pleasing were rated higher in
trustworthiness. Transdisciplinary approaches involve using SUPER-Q
and other factors such as loyalty to a newspaper. Eye tracking or
thinking aloud, combined with conventional methods, gives a far
greater understanding of how readers filter fake news from real news.

3.8 Fake News Detection Using Machine Learning approaches
The approaches covered are linguistic cue approaches with machine
learning, the bag-of-words approach, rhetorical structure and
discourse analysis, network analysis approaches and SVM classifiers.
These models are text based only and offer little or negligible
improvement over existing methods.
The authors of [3] treat every tweet or post as a binary
classification problem. The classification is based purely on the
source of the post or tweet. The authors used data sets collected
manually with the Twitter API and DMOZ.
The following algorithms were used on the data sets: Naïve Bayes,
decision trees, SVM, neural networks, random forests and XGBoost.
However, the ever-changing characteristics and features of fake news
in social media networks pose a challenge for the categorization of
fake news.

3.9 Fake News Detection System using Article Abstraction
This work presents a new Korean fake news detection system that uses
a fact DB, which is built and updated by direct human judgement after
obvious facts are collected. The system receives a proposition and
searches the fact DB for semantically related articles in order to
verify whether the proposition is true by comparing it with those
articles. To achieve this a deep learning model is used:
Bidirectional Multi-Perspective Matching for Natural Language
Sentences (BiMPM), which has demonstrated good performance on the
sentence matching task. However, BiMPM has some limitations: the
longer the input sentence, the lower its performance, and it has
difficulty making an accurate judgement when an unlearned word or
relation between words appears. To overcome these limitations, a new
matching technique is proposed that exploits article abstraction and
an entity matching set in addition to BiMPM.

4. PROPOSED MODEL

We use word embedding. Our dataset is a text file that consists of
news posted by users on media. The goal is to report whether a news
item is real or not. This is done by recognizing a few words that
characterize the behavior of news posts. The dataset is first
prepared by word embedding, stemming and matching a few regular
expressions. We then use a hybrid model of an LSTM recurrent neural
network and a CNN.
Text classification is a classic task in the field of natural
language processing. However, the existing methods for text
classification still need to be improved because of the complex
abstraction of text semantic information and the strong relevance of
context. In this paper, we combine the advantages of two traditional
neural network models, Long Short-Term Memory (LSTM) and
Convolutional Neural Network (CNN). The LSTM can effectively preserve
the characteristics of historical information in long text sequences,
while the structure of the CNN extracts local features of the text.
We propose a hybrid model of LSTM and CNN in which the CNN is
constructed on top of the LSTM, so that the text feature vector
output by the LSTM is further processed by the CNN structure. The
performance of the hybrid model is compared with that of other models
in the experiment. The experimental results show that the hybrid
model can effectively improve the accuracy of text classification.

Figure 2: Proposed System

The Long Short-Term Memory network (LSTM), as a neural network model,
has been found to be more accurate for long sentences. Neural
networks are good for prediction because of their ability to perform
nonlinear adaptive processing.
In our model we combine both CNN and LSTM. As stated earlier, LSTM
overcomes the drawbacks of CNNs. This makes fake news detection more
accurate, as we combine the features of both models. A rough sketch
of the model we are planning is shown in Figure 3.
The Data Flow Diagram will be similar to Figure 5.

Model Description:
• Gathering data
◦ Collect the raw data
• Preparing that data
◦ NLTK processes
◦ Normalization
◦ Split Train and Test
• Choosing a model
◦ Select the algorithm
◦ Build the model
◦ Train the model
• Evaluation
◦ Evaluate the model using the test data
• Prediction
◦ Predict the answer for a new record
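
The data preparation, splitting and evaluation steps listed above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: `crude_stem` is a hypothetical stand-in for a real stemmer (the list mentions NLTK, whose stemmers would normally be used here), the function names and the 25% test ratio are assumptions, and the model itself is left out.

```python
import random
import re

def normalize(text):
    """Lowercase the text, drop links, and keep only letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # remove digits and punctuation
    return text.split()

def crude_stem(token):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def prepare(text):
    """Normalize a news text and stem every token."""
    return [crude_stem(token) for token in normalize(text)]

def train_test_split(records, test_ratio=0.25, seed=0):
    """Shuffle the records reproducibly and hold out a test portion."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def evaluate(y_true, y_pred, positive="fake"):
    """Accuracy, precision and recall, treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

print(prepare("Breaking: Aliens LANDED in Ohio!!! http://example.com/x"))
# ['break', 'alien', 'land', 'in', 'ohio']
```

Reporting precision and recall next to accuracy is a design choice of this sketch: if one class dominates the dataset, accuracy alone can look high even when few fake items are caught.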

Figure 3: Rough sketch of the proposed model

5. Long Short Term Memory

LSTMs were introduced by Hochreiter & Schmidhuber.
LSTMs have the ability to learn long-term dependencies. They can
retain information for a very long time. These networks have 3 gates,
namely the forget gate, the input gate and the output gate.
A. Forget gate:
Information that need not be remembered is removed using the forget
gate. If the output is 0 for a particular cell state, the information
is forgotten, and if the output is 1 the information is retained for
future use.

B. Input gate:
Useful information is added to the cell state using the input gate.

C. Output gate:
Useful information is extracted from the current cell state and
presented as an output by the output gate.

LSTMs help preserve the error that is backpropagated through layers
and time, as in Figure 2. This helps recurrent networks learn over
many steps (over 1000). The architecture of the LSTM is shown in
Figure 4.

Figure 4: Working of LSTM.

6. Convolutional Neural Network (CNN)

A CNN is a type of artificial neural network used mainly in image
recognition and processing, and it is specifically designed to
process pixel data. Convolution is a way to give the network a degree
of translation invariance.

Convolution is needed because it relates 3 signals of interest: the
input signal, the output signal and the impulse response. It is a
formal mathematical operation, just like multiplication, addition and
integration. For humans, recognizing objects is the first skill we
learn right from birth, whereas for a computer recognizing objects is
more complex, as it sees everything as an input and an output, where
the output is a class or a set of classes. The CNN accepts the input
and converts it into vector form for further processing.

Figure 5: Convolutional Neural Network
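
The hybrid arrangement described in Section 4, in which a CNN is built on top of the LSTM outputs, can be illustrated with a small forward pass. The sketch below is an assumption-laden illustration and not the authors' model: weights are random and untrained, biases are omitted, and all sizes and names (`LSTMCell`, `conv1d_relu`, `hybrid_forward`) are hypothetical. It shows the three gates acting on the cell state, a one-dimensional convolution sliding over the sequence of LSTM hidden states, and global max pooling followed by a sigmoid that yields a score between 0 and 1.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

class LSTMCell:
    """One LSTM step with forget, input and output gates (biases omitted)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden
        # One weight matrix per gate (f, i, o) and one for the candidate
        # values (c); each acts on the concatenation [h_prev, x].
        self.W = {g: rand_matrix(n_hidden, n_hidden + n_in, rng) for g in "fioc"}

    def step(self, x, h_prev, c_prev):
        z = h_prev + x  # list concatenation
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # near 0 forgets, near 1 keeps
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # how much new information to add
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # how much of the cell to expose
        g = [math.tanh(a) for a in matvec(self.W["c"], z)]  # candidate values
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

def run_lstm(cell, sequence):
    """Return the hidden state produced at every time step."""
    h = [0.0] * cell.n_hidden
    c = [0.0] * cell.n_hidden
    states = []
    for x in sequence:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states

def conv1d_relu(states, kernels):
    """Slide each kernel along the time axis of the LSTM states, then ReLU.

    Each kernel is a list of k rows, one row of weights per covered time step.
    The sequence must contain at least k states.
    """
    k = len(kernels[0])
    maps = []
    for kernel in kernels:
        fmap = []
        for t in range(len(states) - k + 1):
            total = sum(w * v
                        for row, state in zip(kernel, states[t:t + k])
                        for w, v in zip(row, state))
            fmap.append(max(0.0, total))
        maps.append(fmap)
    return maps

def hybrid_forward(sequence, cell, kernels, out_weights):
    """LSTM states -> convolution -> global max pooling -> sigmoid score."""
    states = run_lstm(cell, sequence)
    pooled = [max(fmap) for fmap in conv1d_relu(states, kernels)]
    return sigmoid(sum(w * p for w, p in zip(out_weights, pooled)))

if __name__ == "__main__":
    rng = random.Random(0)
    cell = LSTMCell(n_in=4, n_hidden=3, rng=rng)
    kernels = [rand_matrix(2, 3, rng) for _ in range(5)]  # 5 filters of width 2
    out_weights = [rng.uniform(-0.5, 0.5) for _ in range(5)]
    # Six time steps of 4-dimensional vectors standing in for word embeddings.
    sequence = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
    print(hybrid_forward(sequence, cell, kernels, out_weights))
```

In practice such a model would be built and trained with a deep learning library; the pure-Python version here is only meant to make the data flow between the two components visible.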
Figure 6: CNN converting the input to a vector

7. SCOPE RELATED TO FUTURE WORK

In this section we present some of the research opportunities
available in fake news detection.
Potential research opportunities for fake news studies
1. Fake News Early Detection.
If fake news is detected early, it is easier to take steps for
mitigation and intervention before it spreads widely. If users have
already read some fake news and believe it to be true, it is
difficult to change their perception.
2. Identifying Check-Worthy Content.
By prioritizing content or topics that are check-worthy, it is easier
to detect fake news and take steps to stop it. Check-worthy content
can be categorized by its newsworthiness and its potential to
influence society.
3. Cross-domain (topic, website, language) Fake News Studies.
Current studies on fake news detection aim to distinguish fake news
from real news in experimental settings that are limited to one
language or one social network. Analyzing fake news across different
domains allows us to develop a deeper understanding of fake news and
can further assist in detecting it early.
4. Deep Learning for Fake News Studies.
Recurrent Neural Networks (RNNs) can be used to represent user
engagements and sequential posts. Convolutional Neural Networks
(CNNs) can be used to capture the features of texts and images.
Generative Adversarial Networks (GANs) are used to assist in early
fake news detection. Deep learning techniques can process text,
images and speech, which are commonly observed in fake news, and deep
learning systems can adapt to a new problem.
5. Fake News Intervention.
Fake news intervention draws insights from the roles that users play
in spreading fake news.
(i) Influential user.
In a social network, blocking the news spread by influential
spreaders first, rather than handling users who do not have much
influence on others, makes intervention more efficient.
(ii) Corrector.
These users attach links that debunk fake news to their posts or
comments.
(iii) Malicious users.
Malicious users must be removed from social media.
(iv) Naive users.
Naive users must be given assistance to improve at distinguishing
fake news.

8. CONCLUSION
In today's world people consume more news from social media than from
traditional news content. But social media is also being used to
spread fake news and is affecting people in a negative way. If fake
news continues to spread rapidly, it poses a serious threat to
society.
Technology can be used to differentiate between facts and fiction.
The spread of fake news has to be monitored on social media in order
to make the platform more reliable for users.
We saw a problem and we are trying to solve the epidemic of fake news
by using technology. Our model has the potential to help much
research in the direction of false information.
Fake news detection is very important in this era, because even the
media shows just one side of the news about politicians and hides
many aspects.
We can conclude from this project that fake news can be identified in
less time. Since we used political data from America, people can
identify the fake news and cast their vote for the right person.
Similar projects can now be carried out with the given dataset by
properly training the system and testing it.

9. REFERENCES
[1] Xinyi Zhou, Reza Zafarani. Fake News: A Survey of Research,
Detection Methods, and Opportunities. 2018 Association for Computing
Machinery, Manuscript submitted to ACM.

[2] Angelika Kirilin and Michael Strube. 2018. In Proceedings of Data
Science, Journalism & Media workshop.

[3] Le Q, Mikolov T (2014) Distributed representations of sentences
and documents. International conference on machine learning.

[4] Wang WY (2017) “liar, liar pants on fire”: a new benchmark
dataset for fake news detection.

[5] Manning, C.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.;
and McClosky, D. 2014. The stanford corenlp natural language
processing toolkit. In Proceedings of 52nd annual meeting of the
association for computational linguistics: system demonstrations.

[6] Victoria L Rubin, Niall J Conroy, and Yimin Chen. Hawaii
International Conference

[7] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of
true and false news online. Science 359, 6380 (2018), 1146–1151.

[8] Kai Shu, Suhang Wang, and Huan Liu. 2017b. Exploiting
tri-relationship for fake news detection. arXiv preprint
arXiv:1712.07709 (2017).

[9] Saif M Mohammad, Parinaz Sobhani, and Svetlana Kiritchenko.
Stance and sentiment in tweets. ACM Transactions on Internet
Technology (TOIT), 17(3):26, 2017.

[10] Xia Hu, Jiliang Tang, and Huan Liu. Online social spammer
detection. In AAAI'14, 2014.

[11] David O Klein and Joshua R Wueller. Fake news: A legal
perspective. 2017.

[12] Xia Hu, Jiliang Tang, Huiji Gao, and Huan Liu. Social spammer
detection with sentiment information. In ICDM'14.

[13] Amitabha Dey, Rafsan Zani Rafi, Shahriar Hasan Parash, Sauvik
Kundu Arko and Amitabha. 2018 Joint 7th International Conference on
Informatics, Electronics & Vision (ICIEV).

[14] Miraj Patel, Detection of Maliciously Authored News Articles,
December 11, 2017.

[15] Namwon Kim, Deokjin Seo and Chang-Sung Jeong, Fake News
Detection Model based on Unified Key Sentence Information, 2018 IEEE
conference.

[16] N. J. Conroy, V. L. Rubin, and Y. Chen, Automatic deception
detection: Methods for finding fake news, Proceedings of the
Association for Information Science and Technology, 2015.

[17] K. Shu, A. Sliva, S. Wang, J. Tang, and H. Liu, Fake news
detection on social media: A data mining perspective, ACM SIGKDD
Explorations Newsletter, 2017.

[18] Shivam B. Parikh and Pradeep K. Atrey, Media-Rich Fake News
Detection: A Survey, 2018 IEEE Conference on Multimedia Information
Processing and Retrieval

[19] Muhammad Saad, Ashar Ahmad, Aziz Mohaisen, Fighting Fake News
Propagation With Blockchains, 2019 IEEE Conference on Communications
and Network Security (CNS).

[20] D. M. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M.
Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D.
Rothschild et al., “The science of fake news,” Science, 2018

[21] S. Huckle and M. White, “Fake news: a technological approach to
proving the origins of content, using blockchains,” Big data, 2017

[22] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean,
"Distributed representations of words and phrases and their
compositionality," in Advances in neural information processing
systems, 2013

[23] M. Tan, B. Xiang, and B. Zhou, "Lstm-based deep learning models
for non-factoid answer selection," 2015.

[24] R. Mihalcea, and C. Strapparava, "The lie detector: Explorations
in the automatic recognition of deceptive language," in Proc.
ACL-IJCNLP Conf. Assoc. Comput. Linguistics, 2009.

[25] M. Fitzi, P. Gazi, A. Kiayias, and A. Russell, “Parallel chains:
Improving throughput and latency of blockchain protocols via parallel
composition,” IACR Cryptology ePrint Archive, 2018.

[26] Jaigris Hodson; Brian Traynor: “Design Exploration of Fake News:
A Transdisciplinary Methodological Approach”, 2018 IEEE International
Professional Communication Conference (ProComm).

[27] Syed Ishfaq Manzoor, Jimmy Singla; Nikita: “Fake News Detection
Using Machine Learning approaches”, 2019 3rd International Conference
on Trends in Electronics and Informatics (ICOEI).

[28] Kyeong-Hwan Kim, Chang-Sung Jeong: “Fake News Detection System
using Article Abstraction”, 2019 16th International Joint Conference
on Computer Science and Software Engineering (JCSSE).
