Presentation Transcript

Why Text Mining?:


Welcome to the age of too much information. We can now easily retrieve far more relevant information than is humanly possible to read.
© 2012 by VP Institute, All rights reserved
What does Text Mining give us?:
Since text mining allows us to use computers to read the information, we can digest far more information than we could before.
When does Text Mining Work?:
- When what you seek is a pattern, not a specific document. There is a distinct difference between search and retrieval on the one hand and text mining on the other.
- When your information is available in electronic, machine-readable form. PDF images add a layer of complexity because their text must first be recovered with OCR, which introduces its own errors.
- When your electronic information is accessible via bulk download. If you have to download one or a few records at a time, adding a bot to do the downloads is possible but usually raises additional technical and licensing issues.
What are Patterns in Text?:
Patterns in text are relationships between words or phrases that repeat across many different documents. For example, if one document mentions both "sodium chloride" and "salt", then another document mentions both, and then another and another, you begin to assume that sodium chloride and salt are related.
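The salt example can be made concrete with a short Python sketch. It assumes a small, made-up corpus and uses plain case-insensitive substring matching (so "salt" would also match "basalt"); real text-mining tools tokenize, normalize and handle synonyms first. The helper name `documents_mentioning_both` is hypothetical.

```python
# Hypothetical mini-corpus: each string stands for one document.
documents = [
    "Sodium chloride, commonly known as table salt, dissolves in water.",
    "The recipe calls for a pinch of salt (sodium chloride).",
    "Road crews spread salt; the main ingredient is sodium chloride.",
    "Potassium is an essential mineral.",
]

def documents_mentioning_both(docs, term_a, term_b):
    """Return the documents that mention both terms.

    Uses case-insensitive substring matching, so it is only a rough
    stand-in for proper tokenization and term normalization.
    """
    a, b = term_a.lower(), term_b.lower()
    return [d for d in docs if a in d.lower() and b in d.lower()]

both = documents_mentioning_both(documents, "sodium chloride", "salt")
print(f"{len(both)} of {len(documents)} documents mention both terms")
```

The more documents in which the pair repeats, the more confident we become that the two terms are related; a single shared mention proves little.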
Patterns have Meaning:
The patterns we find represent higher-order abstractions within a large text collection. In our salt example, we can induce that sodium chloride is a salt.
How do we find a pattern?:
Use co-word bibliometrics / co-occurrence statistics to find relationships between pairs of words (Word 1, Word 2):
- Count the number of times the words appear together in a set of documents.
- The higher the co-occurrence, the stronger the potential relationship.
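The counting step on this slide can be sketched in Python. This is a minimal illustration, not the method of any particular bibliometrics tool: the corpus and the stop-word list are made up, and each word is counted at most once per document, so a pair's count is the number of documents in which both words appear.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical corpus: one string per document.
documents = [
    "Sodium chloride is sold as table salt.",
    "Salt, or sodium chloride, preserves food.",
    "Sea water contains salt and other minerals.",
    "Chloride ions are found in sea water.",
]

# Hypothetical stop-word list so that function words do not dominate the counts.
STOP_WORDS = {"is", "as", "or", "and", "in", "are", "other", "the", "a"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def co_occurrence_counts(docs):
    """For every word pair, count the documents in which both words appear."""
    counts = Counter()
    for doc in docs:
        vocabulary = sorted(set(tokenize(doc)))  # each word once per document
        for pair in combinations(vocabulary, 2):  # pairs in alphabetical order
            counts[pair] += 1
    return counts

counts = co_occurrence_counts(documents)
for (w1, w2), n in counts.most_common(3):
    print(f"{w1} + {w2}: {n}")
```

Raw counts favor words that are frequent everywhere, so co-word studies often normalize them (for example with the Jaccard index or association strength) before ranking pairs. Counting within a sliding window of words instead of whole documents is another common variant.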
What kind of questions can we answer with text-mining software?:
Who? Where? What? When?

Commercial

AeroText - a suite of text mining applications for content analysis. Content can be in
multiple languages.

Angoss - Angoss Text Analytics provides entity and theme extraction, topic categorization,
sentiment analysis and document summarization capabilities via the embedded
Lexalytics Salience Engine. The software can merge the output of unstructured,
text-based analysis with structured data to provide additional predictive variables for
improved predictive models and association analysis.

Attensity - hosted, integrated and stand-alone text mining (analytics) software that uses
natural language processing technology to address collective intelligence in social media and
forums; the voice of the customer in surveys and emails; customer relationship management;
e-services; research and e-discovery; risk and compliance; and intelligence analysis.

AUTINDEX - a commercial text mining software package based on sophisticated
linguistics, by IAI (Institute for Applied Information Sciences), Saarbrücken.

Autonomy - text mining, clustering and categorization software.

Averbis provides text analytics, clustering and categorization software, as well as
terminology management and enterprise search.

Basis Technology provides a suite of text analysis modules to identify language, enable
search in more than 20 languages, extract entities, and efficiently search for and translate
entities.

Clarabridge - text analytics (text mining) software, including natural language processing (NLP),
machine learning, clustering and categorization. Provides SaaS, hosted and on-premise text and
sentiment analytics that enable companies to collect, listen to, analyze, and act on the Voice of
the Customer (VOC) from both external (Twitter, Facebook, Yelp!, product forums, etc.) and
internal sources (call center notes, CRM, Enterprise Data Warehouse, BI, surveys, emails, etc.).

Complete Discovery Source - provides software and services for data discovery and data
analytics via Nytrix CIY and other proprietary tools.

Endeca Technologies provides software to analyze and cluster unstructured text.

Expert System S.p.A. - suite of semantic technologies and products for developers and
knowledge managers.

FICO Score - a leading provider of analytics.

General Sentiment - Social Intelligence platform that uses natural language processing to
discover affinities between the fans of brands and the fans of traditional television shows in
social media. Stand-alone text analytics to capture a social knowledge base on billions of topics
stored back to 2004.

IBM LanguageWare - the IBM suite for text analytics (tools and Runtime).

IBM SPSS - provider of Modeler Premium (previously called IBM SPSS Modeler and IBM
SPSS Text Analytics), which contains advanced NLP-based text analysis capabilities (multilingual
sentiment, event and fact extraction) that can be used in conjunction with predictive
modeling. Text Analytics for Surveys provides the ability to categorize survey responses using
NLP-based capabilities for further analysis or reporting.

Inxight - provider of text analytics, search, and unstructured visualization technologies.
(Inxight was bought by Business Objects, which was bought by SAP AG in 2008.)

LanguageWare - text analysis libraries and customization software from IBM.

Language Computer Corporation - text extraction and analysis tools, available in multiple
languages.

Lexalytics - provider of a text analytics engine used in Social Media Monitoring, Voice of
Customer, Survey Analysis, and other applications.

LexisNexis - provider of business intelligence solutions based on an extensive news and
company information content set. LexisNexis acquired DataOps to pursue search.

Luminoso - enterprise feedback and text analytics solutions developed from over a decade
of natural language processing (NLP), machine learning and artificial intelligence research at the
MIT Media Lab. Enables clients to understand, measure and act on large amounts of consumer
feedback across multiple channels.[1][2]

Mathematica provides built-in tools for text alignment, pattern matching, clustering and
semantic analysis.

Medallia - offers one system of record for survey, social, text, written and online feedback.

Megaputer Intelligence - derives actionable knowledge from large volumes of text and
structured data using natural language processing (NLP), machine learning, sentiment
analysis, entity extraction, clustering, and categorization.

NetOwl - suite of multilingual text and entity analytics products, including entity extraction,
link and event extraction, sentiment analysis, geotagging, name translation, name matching, and
identity resolution, among others.

RapidMiner - data and text mining software, with its Text Processing Extension.

SAS - SAS Text Miner and Teragram; commercial text analytics, natural language
processing, and taxonomy software used for Information Management.

Semantria - offers its services via an API and an Excel plugin. It is a spin-off of the
Lexalytics text-analysis software, but differs from it in its delivery (API and Excel plugin), its
larger knowledge base, and its use of deep learning.

Smartlogic - Semaphore, a Content Intelligence platform containing commercial text analytics,
natural language processing, rule-based classification, ontology/taxonomy modelling and
information visualization software used for Information Management.

StatSoft provides STATISTICA Text Miner as an optional extension to STATISTICA Data
Miner, for Predictive Analytics Solutions.

Sysomos - provider of a social media analytics software platform, including text analytics and
sentiment analysis of online consumer conversations.

Textalytics - Meaning as a Service: a set of text analytics APIs that offer vertical, high-level
functionality targeted at specific usage scenarios: Semantic Publishing, Media Analysis, Voice of
the Customer.

WordStat - Content analysis and text mining add-on module of QDA Miner for analyzing
large amounts of text data.

Xpresso - XPRESSO, an engine developed by Abzooba's core technology group, is
focused on the automated distillation of expressions in social media conversations.[3]

Commercial and Research

RxNLP API for Text Mining and NLP - text mining APIs for both research and commercial
use. The APIs include n-gram generation, sentence clustering, opinion summarization, and others.

Open source

Carrot2 - text and search results clustering framework.

GATE - General Architecture for Text Engineering, an open-source toolbox for natural
language processing and language engineering.

Gensim - large-scale topic modelling and extraction of semantic information from
unstructured text (Python).

OpenNLP - natural language processing

Natural Language Toolkit (NLTK) - a suite of libraries and programs for symbolic and
statistical natural language processing (NLP) for the Python programming language.

Text Mechanic - simple, single-task, browser-based text manipulation tools.[4]

The programming language R provides a framework for text mining applications in the
package tm.[5] The CRAN Natural Language Processing task view lists tm and other text mining
packages.[6]

The KNIME Text Processing extension.

KH Coder - For content analysis, text mining or corpus linguistics.

The PLOS Text Mining Collection[7]
