Python Text Processing with NLTK 2.0 Cookbook: LITE
Python Text Processing with NLTK 2.0 Cookbook - Jacob Perkins
Table of Contents
Python Text Processing with NLTK 2.0 Cookbook: LITE
Credits
About the Author
About the Reviewers
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Tokenizing Text and WordNet Basics
Introduction
Tokenizing text into sentences
Getting ready
How to do it...
How it works...
There's more...
Other languages
See also
Tokenizing sentences into words
How to do it...
How it works...
There's more...
Contractions
PunktWordTokenizer
WordPunctTokenizer
See also
Tokenizing sentences using regular expressions
Getting ready
How to do it...
How it works...
There's more...
Simple whitespace tokenizer
See also
Filtering stopwords in a tokenized sentence
Getting ready
How to do it...
How it works...
There's more...
See also
Looking up synsets for a word in WordNet
Getting ready
How to do it...
How it works...
There's more...
Hypernyms
Part-of-speech (POS)
See also
Looking up lemmas and synonyms in WordNet
How to do it...
How it works...
There's more...
All possible synonyms
Antonyms
See also
Calculating WordNet synset similarity
How to do it...
How it works...
There's more...
Comparing verbs
Path and LCH similarity
See also
Discovering word collocations
Getting ready
How to do it...
How it works...
There's more...
Scoring functions
Scoring ngrams
2. Replacing and Correcting Words
Introduction
Stemming words
How to do it...
How it works...
There's more...
LancasterStemmer
RegexpStemmer
SnowballStemmer
See also
Lemmatizing words with WordNet
Getting ready
How to do it...
How it works...
There's more...
Combining stemming with lemmatization
See also
Translating text with Babelfish
Getting ready
How to do it...
How it works...
There's more...
Available languages
Replacing words matching regular expressions
Getting ready
How to do it...
How it works...
There's more...
Replacement before tokenization
See also
Removing repeating characters
Getting ready
How to do it...
How it works...
There's more...
See also
Spelling correction with Enchant
Getting ready
How to do it...
How it works...
There's more...
en_GB dictionary
Personal word lists
See also
Replacing synonyms
Getting ready
How to do it...
How it works...
There's more...
CSV synonym replacement
YAML synonym replacement
See also
Replacing negations with antonyms
How to do it...
How it works...
There's more...
See also
3. Text Classification
Introduction
Bag of Words feature extraction
How to do it...
How it works...
There's more...
Filtering stopwords
Including significant bigrams
See also
Training a naive Bayes classifier
Getting ready
How to do it...
How it works...
There's more...
Classification probability
Most informative features
Training estimator
Manual training
See also
Training a decision tree classifier
Getting ready
How to do it...
How it works...
There's more...
Entropy cutoff
Depth cutoff
Support cutoff
See also
Training a maximum entropy classifier
Getting ready
How to do it...
How it works...
There's more...
Scipy algorithms
Megam algorithm
See also
Measuring precision and recall of a classifier
How to do it...
How it works...
There's more...
F-measure
See also
Calculating high information words
How to do it...
How it works...
There's more...
MaxentClassifier with high information words
DecisionTreeClassifier with high information words
See also
Combining classifiers with voting
Getting ready
How to do it...
How it works...
See also
Classifying with multiple binary classifiers
Getting ready
How to do it...
How it works...
There's more...
See also
Index
Python Text Processing with NLTK 2.0 Cookbook: LITE
Copyright © 2011 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2011
Production Reference: 1130411
Published by Packt Publishing Ltd., 32 Lincoln Road, Olton, Birmingham, B27 6PA, UK.
ISBN 978-1-849516-38-9
www.packtpub.com
Cover Image by Sujay Gawand K (<sujay0000@gmail.com>)
Credits
Author
Jacob Perkins
Reviewers
Patrick Chan
Herjend Teny
Acquisition Editor
Steven Wilding
Technical Editors
Hithesh Uchil
Indexer
Hemangini Bari
Production Coordinator
Melwyn D'sa
Cover Work
Melwyn D'sa
About the Author
Jacob Perkins has been an avid user of open source software since high school, when he first built his own computer and didn't want to pay for Windows. At one point he had five operating systems installed, including Red Hat Linux, OpenBSD, and BeOS.
While at Washington University in St. Louis, Jacob took classes in Spanish and poetry writing, and worked on an independent study project that eventually became his Master's project: WUGLE, a GUI for manipulating logical expressions. In his free time, he wrote the Gnome2 version of Seahorse (a GUI for encryption and key management), which has since been translated into over a dozen languages and is included in the default Gnome distribution.
After receiving his MS in Computer Science, Jacob tried to start a web development studio with some friends, but since no one knew anything about web development, it didn't work out as planned. Once he'd actually learned about web development, he went off and co-founded another company called Weotta, which sparked his interest in Machine Learning and Natural Language Processing.
Jacob is currently the CTO/Chief Hacker for Weotta and blogs about what he's learned along the way at http://streamhacker.com/. He is also applying this knowledge to produce text processing APIs and demos at http://text-processing.com/. This book is a synthesis of his knowledge on processing text using Python, NLTK, and more.
Thanks to my parents for all their support, even when they don't understand what I'm doing; Grant for sparking my interest in Natural Language Processing; Les