
TEXT SUMMARIZATION TOOL

A PROJECT REPORT
Submitted By
NITISH RAJ (E05BCS1098)

NISHANT (E05BCS1095)

PARANTAP DAS (E05BCS1101)

Under the Guidance of

Mrs. Sumathy Eswaran

In partial fulfillment for the award of the degree


Of

BACHELOR OF TECHNOLOGY
In
COMPUTER SCIENCE AND ENGINEERING

Dr. M.G.R
EDUCATIONAL AND RESEARCH INSTITUTE
UNIVERSITY
MADURAVOYAL, CHENNAI. 600 095

MAY 2009
Dr.M.G.R
EDUCATIONAL AND RESEARCH INSTITUTE
UNIVERSITY
CHENNAI - 95

BONAFIDE CERTIFICATE

Certified that this project report “TEXT SUMMARIZATION TOOL” is the
bonafide work of “NITISH RAJ (E05BCS1098), NISHANT (E05BCS1095) &
PARANTAP DAS (E05BCS1101)”, who carried out the project work under my
supervision.

SIGNATURE SIGNATURE

Dr. V. Cyril Raj Mrs. Sumathy Eswaran


HEAD OF DEPARTMENT SUPERVISOR
Department of Computer Science Dept of Computer Science
Dr.M.G.R E & RI UNIVERSITY Dr.M.G.R E & RI UNIVERSITY
Chennai Chennai

Submitted to Examination conducted on .

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

At the outset, we wish to record our gratitude to Mr. A. C. Shanmugam,
our Chancellor, for his encouragement and valuable support.

Special thanks are due to our Head of the Department (Computer Science and
Engineering), Dr. V. CYRIL RAJ, for the valuable help and suggestions
imparted to us.

We express our sincere thanks to Mrs. S. GEETHA, Project Coordinator,
who encouraged us and gave us all kinds of help throughout the course.

We wish to thank Mrs. Sumathy Eswaran, our internal guide, who spent
hours with us in bringing out this project successfully.

We would like to thank all the staff members of the Computer Science and
Engineering Department for their kind co-operation and help in completing
our project.
ABSTRACT

Auto-summarization is a technique used to generate summaries of
electronic documents. There are two categories of summarizers: linguistic and
statistical. Linguistic summarizers use knowledge about the language
(syntax, semantics, usage, etc.) to summarize a document. Statistical
summarizers identify the important sentences using statistical methods (such as
the frequency of a particular word) and normally do not use any linguistic
information. In this project, an auto-summarization tool is developed using
statistical techniques. The techniques involve finding the keywords and related
words, scoring the sentences, and ranking the sentences. The tool operates on a
single document (but can be made to work on multiple documents by choosing
proper algorithms for integration) and provides a summary of that document. The
size of the summary can be specified by the user when invoking the tool.
Pre-processing interfaces handle the following document types: Plain Text,
HTML, and Word Document.
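
The statistical pipeline outlined in the abstract (find keywords, score
sentences, rank sentences, return a summary of a user-specified size) can be
illustrated with a minimal sketch. This is not the project's own code: the
stop-word list, the function names, the naive sentence splitter, and the
scoring rule (average keyword frequency per word) are all assumptions made
for illustration only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the report does not specify one.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "of", "to", "in", "and", "on",
    "it", "by", "for", "with", "this", "that", "be", "as", "can",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def keyword_frequencies(text):
    """Count every non-stop word in the document (the 'keywords')."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def score_sentence(sentence, freqs):
    """Assumed scoring rule: mean document frequency of the sentence's words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0
    return sum(freqs.get(w, 0) for w in words) / len(words)


def summarize(text, num_sentences):
    """Return the top-scoring sentences, restored to document order."""
    sentences = split_sentences(text)
    freqs = keyword_frequencies(text)
    scored = [(score_sentence(s, freqs), i) for i, s in enumerate(sentences)]
    # Rank by score (highest first); ties keep the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    keep = sorted(i for _, i in top)
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    sample = "Cats chase mice. Cats sleep a lot. The weather is nice today."
    print(summarize(sample, 2))
```

Here `num_sentences` stands in for the user-specified summary size. Selected
sentences are emitted in their original order so that the summary stays
readable.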


Department of Computer Science
Dr.M.G.R Educational and Research Institute
Maduravoyal

TABLE OF CONTENTS

NO. TITLE PAGE NO.

ABSTRACT iii
LIST OF FIGURES v
LIST OF SYMBOLS vi

1. INTRODUCTION 8
2. PROJECT FEATURES 14
3. PROPOSED IMPLEMENTATION 15
4. SPECIFICATION 16
4.1 USER REQUIREMENT 16
4.2 SOFTWARE & HARDWARE REQUIREMENTS 18
5. DESIGNS 19
5.1 UML DIAGRAMS 19
5.2 MODULE BLOCK DIAGRAM 24
5.3 DATA FLOW DIAGRAM 25
6. IMPLEMENTATION 26
7. TESTING 40
8. SAMPLE CODING 52
9. CONCLUSION & SUGGESTED ENHANCEMENTS 68
10. APPENDIX 70

List of Figures

Fig No. Figure Page No.

1(a) Use case diagram 20
1(b) Class diagram 21
1(c) Detailed class diagram 22
1(d) Activity diagram 23
2(a) Module diagram 24
2(b) Data flow diagram 25
3(a) Module details 27
3(b) User interface 30
3(c) Re-summary options 31
4(a) Screenshot for converted text 33
4(b) Screenshot for formatted text 34
4(c) Screenshot for scoring module 36
4(d) Screenshot for ranking module 38
5(a) Alert for an error 41
5(b) Different alert for an error 42
6(a) Input from user 46
6(b) Summary of input file 47
6(c) 2 out of 4 of output after 1st summary 48
6(d) 4 out of 4 of output after 1st summary 49
6(e) Re-summary option 50
6(f) Output after Redo option 51

List of Symbols

No. Symbol Meaning

1. Text Input Text provided to the Summarization Tool

2. Tool Summarization Tool

3. msg message

4. i/p Input

5. D/B Database

6. MB Megabyte

7. GB Gigabyte

8. RAM Random access memory
