Project Abstract

ABSTRACT
Short-text messages such as tweets are being created and shared at an unprecedented rate.
Tweets in their raw form, while informative, can also be overwhelming. For both end-users
and data analysts, it is a nightmare to plow through millions of tweets that contain an
enormous amount of noise and redundancy. This paper proposes a novel continuous
summarization framework called Sumblr to alleviate the problem. In contrast to
traditional document summarization methods, which focus on static and small-scale data sets,
Sumblr is designed to deal with dynamic, fast-arriving, and large-scale tweet streams.
The proposed framework consists of three major components. First, an online
tweet stream clustering algorithm is proposed to cluster tweets and maintain distilled
statistics in a data structure called the tweet cluster vector (TCV). Second, a TCV-Rank
summarization technique is developed for generating online summaries and historical
summaries of arbitrary time durations. Third, an effective topic evolution detection
method is designed, which monitors summary-based/volume-based variations to produce
timelines automatically from tweet streams. Experiments on large-scale real tweets
demonstrate the efficiency and effectiveness of the framework.
ACKNOWLEDGEMENT
The satisfaction that accompanies the successful completion of any task would be incomplete
without the mention of the people who made it possible. We would hence like to
acknowledge the very people behind our success in this endeavour.
We would like to profoundly thank the Management of RNS Institute of Technology
for providing such a healthy environment for the successful completion of the project work.
We would like to express our thanks to the Director, Dr. H N Shivashankar, and the
Principal, Dr. M K Venkatesha, for their encouragement, which motivated us toward the
successful completion of the project work.
It gives us immense pleasure to thank Dr. M V Sudhamani, Professor and Head of the
Department of Information Science & Engineering, for her constant support and
encouragement.
We would like to express our deepest sense of gratitude to our project guide, Mr. R
Rajkumar, Assistant Professor, Department of Information Science & Engineering, for his
support, inputs and recommendations throughout our project work. The constant mentorship
and guidance that he has provided have been an immense help to our project work.
We would like to thank our project coordinators, Ms. Kusuma and Mr. Manoj
Kumar, Assistant Professors, Department of Information Science & Engineering, for their
expert guidance and help throughout the project work.
We would also like to thank all other teaching and non-teaching staff of the Department
of Information Science & Engineering who have directly or indirectly helped us in the
completion of the project work. Last but not least, we would like to acknowledge and
thank our parents, who have been a source of inspiration and instrumental in the
successful completion of the project work.
Table of Contents
Certificate
Abstract i
Acknowledgment ii
List of Figures v
1. Introduction 1
2. Literature Survey 8
3. System Requirements 14
3.1 Hardware Requirements 15
3.2 Software Requirements 16
3.3 Software Environment 18
3.3.1 Java Technology 18
3.3.2 The Java Platform 19
3.3.3 ODBC 23
3.3.4 JDBC 24
4. System Design 27
4.1 System Architecture 27
4.2 Components of System Architecture 29
4.2.1 Tweet Stream 29
4.2.2 Tweet Stream Clustering 29
4.2.2.1 Tweet Stream Cluster 29
4.2.2.2 Pyramidal Time Frame 29
4.2.3 High-Level Summarization 30
4.2.3.1 Online Summaries 30
4.2.3.2 Historical Summaries 30
4.2.4 Timeline Generation 30
4.2.4.1 Topic Evolution Detection 30
4.3 System Design 30
4.3.1 Input Design 30
4.3.1.1 Objectives of Input Design 31
4.3.2 Output Design 32
4.3.2.1 Objectives of Output Design 32
4.4 Sequence Diagram 33
4.5 Class Diagram 34
4.6 Activity Diagram 36
4.7 Use Case Diagram 38
5. Implementation 40
5.1 Admin Modules 40
5.1.1 Admin Login 40
5.1.2 Search History 41
5.1.3 Requests and Response 42
5.1.4 Topic Tweet Messages 42
5.1.5 Tweet Based on Timeline 43
5.1.6 Tweet Clustering 43
5.2 User Modules 45
5.2.1 Search Users 46
5.2.2 Tweet 47
6. System Testing 48
6.1 Types of Tests 48
6.1.1 Unit Testing 48
6.1.2 Integration Testing 49
6.1.3 Functional Test 49
6.1.4 System Test 50
7. Results 51
7.1 Tweet Stream Clustering 51
7.2 Timeline Generation 52
7.3 Tweet Ranks 53
7.4 Registration Page 54
8. Conclusion and Future Enhancements 55
8.1 Conclusion 55
8.2 Future Enhancements 55
References
List of Figures
Figure No. Description Page No.
1.1 A timeline example for topic “Apple” 2

3.1 Implementation and working of Java 19
3.2 Example of a Java Program 19
3.3 Programs that run on Java platform 20
3.4 Java 2 SDK 22
4.1 Framework of Sumblr 28
4.2 System Design 31
4.3 Sequence Diagram 33
4.4 Class Diagram 35
4.5 Activity Diagram 37
4.6 UML Diagram 39
5.1 Admin Page 40
5.2 Search Page 41
5.3 Request and Response Page 42
5.4 Tweet Timeline Page 43
5.5 Tweet Clustering Page 44
5.6 User Login Page 45
5.7 User Profile 46
5.8 Search User Page 47
5.9 Tweet Page 47
7.1 Cluster of Tweets 51
7.2 Timeline Generated 52
7.3 Tweet Ranks 53
