Găsiți următorul dvs. carte preferat
Python Text Processing with NLTK 2.0 Cookbook: LITE
Descriere
Despre autor
Autori similari
Legat de Python Text Processing with NLTK 2.0 Cookbook
Categorii relevante
Previzualizare carte
Python Text Processing with NLTK 2.0 Cookbook - Jacob Perkins
Table of Contents
Python Text Processing with NLTK 2.0 Cookbook: LITE
Credits
About the Author
About the Reviewers
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Getting ready
How to do it...
How it works...
There's more...
Other languages
See also
Tokenizing sentences into words
How to do it...
How it works...
There's more...
Contractions
PunktWordTokenizer
WordPunctTokenizer
See also
Tokenizing sentences using regular expressions
Getting ready
How to do it...
How it works...
There's more...
Simple whitespace tokenizer
See also
Filtering stopwords in a tokenized sentence
Getting ready
How to do it...
How it works...
There's more...
See also
Looking up synsets for a word in WordNet
Getting ready
How to do it...
How it works...
There's more...
Hypernyms
Part-of-speech (POS)
See also
Looking up lemmas and synonyms in WordNet
How to do it...
How it works...
There's more...
All possible synonyms
Antonyms
See also
Calculating WordNet synset similarity
How to do it...
How it works...
There's more...
Comparing verbs
Path and LCH similarity
See also
Discovering word collocations
Getting ready
How to do it...
How it works...
There's more...
Scoring functions
Scoring ngrams
2. Replacing and Correcting Words
Introduction
Stemming words
How to do it...
How it works...
There's more...
LancasterStemmer
RegexpStemmer
SnowballStemmer
See also
Lemmatizing words with WordNet
Getting ready
How to do it...
How it works...
There's more...
Combining stemming with lemmatization
See also
Translating text with Babelfish
Getting ready
How to do it...
How it works...
There's more...
Available languages
Replacing words matching regular expressions
Getting ready
How to do it...
How it works...
There's more...
Replacement before tokenization
See also
Removing repeating characters
Getting ready
How to do it...
How it works...
There's more...
See also
Spelling correction with Enchant
Getting ready
How to do it...
How it works...
There's more...
en_GB dictionary
Personal word lists
See also
Replacing synonyms
Getting ready
How to do it...
How it works...
There's more...
CSV synonym replacement
YAML synonym replacement
See also
Replacing negations with antonyms
How to do it...
How it works...
There's more...
See also
3. Text Classification
Introduction
Bag of Words feature extraction
How to do it...
How it works...
There's more...
Filtering stopwords
Including significant bigrams
See also
Training a naive Bayes classifier
Getting ready
How to do it...
How it works...
There's more...
Classification probability
Most informative features
Training estimator
Manual training
See also
Training a decision tree classifier
Getting ready
How to do it...
How it works...
There's more...
Entropy cutoff
Depth cutoff
Support cutoff
See also
Training a maximum entropy classifier
Getting ready
How to do it...
How it works...
There's more...
Scipy algorithms
Megam algorithm
See also
Measuring precision and recall of a classifier
How to do it...
How it works...
There's more...
F-measure
See also
Calculating high information words
How to do it...
How it works...
There's more...
MaxentClassifier with high information words
DecisionTreeClassifier with high information words
See also
Combining classifiers with voting
Getting ready
How to do it...
How it works...
See also
Classifying with multiple binary classifiers
Getting ready
How to do it...
How it works...
There's more...
See also
Index
Python Text Processing with NLTK 2.0 Cookbook: LITE
Python Text Processing with NLTK 2.0 Cookbook: LITE
Copyright © 2011 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2011
Production Reference: 1130411
Published by Packt Publishing Ltd. 32 Lincoln Road Olton Birmingham, B27 6PA, UK.
ISBN 978-1-849516-38-9
www.packtpub.com
Cover Image by Sujay Gawand K (<sujay0000@gmail.com>)
Credits
Author
Jacob Perkins
Reviewers
Patrick Chan
Herjend Teny
Acquisition Editor
Steven Wilding
Technical Editors
Hithesh Uchil
Indexer
Hemangini Bari
Production Coordinator
Melwyn D'sa
Cover Work
Melwyn D'sa
About the Author
Jacob Perkins has been an avid user of open source software since high school, when he first built his own computer and didn't want to pay for Windows. At one point he had five operating systems installed, including Red Hat Linux, OpenBSD, and BeOS.
While at Washington University in St. Louis, Jacob took classes in Spanish and poetry writing, and worked on an independent study project that eventually became his Master's project: WUGLE—a GUI for manipulating logical expressions. In his free time, he wrote the Gnome2 version of Seahorse (a GUI for encryption and key management), which has since been translated into over a dozen languages and is included in the default Gnome distribution.
After receiving his MS in Computer Science, Jacob tried to start a web development studio with some friends, but since no one knew anything about web development, it didn't work out as planned. Once he'd actually learned about web development, he went off and co-founded another company called Weotta, which sparked his interest in Machine Learning and Natural Language Processing.
Jacob is currently the CTO/Chief Hacker for Weotta and blogs about what he's learned along the way at http://streamhacker.com/. He is also applying this knowledge to produce text processing APIs and demos at http://text-processing.com/. This book is a synthesis of his knowledge on processing text using Python, NLTK, and more.
Thanks to my parents for all their support, even when they don't understand what I'm doing; Grant for sparking my interest in Natural Language Processing; Les