
Watson Knowledge Studio

Need for Watson Knowledge Studio
Unstructured Data is Exploding
Every day we produce 2.5 quintillion bytes of new data
Most of this new data is unstructured data
Books, journals, health records, e-mails, blogs, tweets, etc.

Holds many possibilities, but is vastly underutilized due to challenges in understanding and using the data
Typical organizations only leverage 8% of this data!
Why do we need the capability to analyze unstructured data?
Data growth is driven by unstructured data. We cannot ignore it.
We have all seen it: the Remarks/Comments column in an extract or status sheet often holds more information than the other columns.

How do I extract information from unstructured data?
What are the challenges?

Each domain is different.
Domain entities are different, and the ways those entities relate to each other are different.
How do we take care of this?
Extracting Information from unstructured data

Natural Language Processing (NLP) is the core technology that helps you extract information from unstructured data

Most organizations need to mine unstructured text for specific information that is unique to their industry or business needs
Organizations must have the ability to customize the NLP model in order to
realize the full value/benefit of mining the unstructured text
Helps organizations generate business insights

NLP: what is the toolkit landscape?

NLP toolkits
Alchemy APIs
Open-source Python libraries: NLTK
TextRazor API
Google Cloud Natural Language API
Need for Watson Knowledge Studio

How do I customize a toolkit for the domain I plan to apply it to?

IBM Watson Knowledge Studio can help you build your domain model (entities and relationships) so that the NLP extraction libraries work effectively for your domain.
Introducing IBM Watson Knowledge Studio
Software-as-a-Service (SaaS) offering available exclusively through the
IBM Cloud Marketplace
Intended to accelerate the training and adaptation of Watson with
specific industry and organizational domain knowledge
Leverages state-of-the-art supervised machine learning techniques that
allow you to create machine-learning models that understand the
linguistic nuances, meaning, and relationships specific to your industry
Watson Knowledge Studio

Enables developers and domain experts to collaborate on the creation of custom annotator components that can be used to identify mentions and relations in unstructured text

[Figure: target users (SME, DEV) working in Watson Knowledge Studio, shown alongside Analytics Exchange, Watson Explorer, AlchemyLanguage, and Discovery]

Four key features of WKS

Provide an intuitive way to transfer knowledge from humans to computers for text analytics solutions.
Administer the full lifecycle of annotator component development
within one tool.
Create custom annotator components from scratch that extract entities
and relations from domain-specific unstructured text.
Deploy annotator components from within Watson Knowledge Studio
to IBM Watson Explorer and IBM Watson Developer Cloud.
Watson Knowledge Studio terminology

An Annotator adds annotations (metadata) to text that appears in natural language content. Used by applications to analyze and process text.
A Type System is an inventory of everything we want WKS to
understand about the unstructured text.
Mentions = any span of text relevant to the current domain
Example: airbag, child restraint system, etc.
Entities = group of Mentions that refer to the same thing
Example: CarMake, AccidentLocation
Relation = a binary relationship between two entities
Example: occurredAt defines a relationship between CarMake
and AccidentLocation
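
The terminology above can be made concrete with a small data model. The following is a minimal sketch in plain Python, not the Watson Knowledge Studio API or its export format: the `TypeSystem` and `RelationType` classes and their method names are hypothetical, and only the entity and relation names (CarMake, AccidentLocation, occurredAt) come from the examples above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationType:
    """A binary relation between two entity types (hypothetical structure)."""
    name: str
    source: str  # entity type of the first argument
    target: str  # entity type of the second argument


@dataclass
class TypeSystem:
    """Inventory of the entity and relation types we want to recognize."""
    entity_types: set[str] = field(default_factory=set)
    relation_types: dict[str, RelationType] = field(default_factory=dict)

    def add_entity(self, name: str) -> None:
        self.entity_types.add(name)

    def add_relation(self, name: str, source: str, target: str) -> None:
        # A relation may only connect entity types that are already declared.
        for entity_type in (source, target):
            if entity_type not in self.entity_types:
                raise ValueError(f"unknown entity type: {entity_type}")
        self.relation_types[name] = RelationType(name, source, target)

    def allows(self, relation: str, source: str, target: str) -> bool:
        """Return True if the relation is defined for this ordered type pair."""
        rel = self.relation_types.get(relation)
        return rel is not None and (rel.source, rel.target) == (source, target)


# Type system built from the examples in the terminology list.
ts = TypeSystem()
ts.add_entity("CarMake")
ts.add_entity("AccidentLocation")
ts.add_relation("occurredAt", "CarMake", "AccidentLocation")
```

Keeping the relation ordered (source, then target) means `ts.allows("occurredAt", "AccidentLocation", "CarMake")` is rejected, which mirrors the idea that a relation is defined between two specific entity types.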
Annotation example

Text: John Smith works for IBM. He has been with Big Blue for 20 years.

Entity: PERSON = John Smith
Entity: ORG = IBM Corp
Relation: employedBy (PERSON to ORG), annotated once in each sentence
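
One way to picture what such an annotation contains is as character spans plus typed links between them. The sketch below is illustrative only and does not reflect how Watson Knowledge Studio stores annotations; the `Mention` and `Relation` classes and the `find_mention` helper are hypothetical names, and treating "He" and "Big Blue" as further mentions of the PERSON and ORG entities is an assumption read off the slide.

```python
from dataclasses import dataclass

TEXT = "John Smith works for IBM. He has been with Big Blue for 20 years."


@dataclass(frozen=True)
class Mention:
    """A labeled span of text, stored as character offsets [start, end)."""
    start: int
    end: int
    entity_type: str

    def surface(self, doc: str) -> str:
        return doc[self.start:self.end]


@dataclass(frozen=True)
class Relation:
    """A typed, ordered link between two mentions."""
    relation_type: str
    source: Mention
    target: Mention


def find_mention(doc: str, surface: str, entity_type: str) -> Mention:
    """Label the first occurrence of `surface` in `doc` (raises if absent)."""
    start = doc.index(surface)
    return Mention(start, start + len(surface), entity_type)


# Mentions in the first and second sentence of the example text.
john = find_mention(TEXT, "John Smith", "PERSON")
ibm = find_mention(TEXT, "IBM", "ORG")
he = find_mention(TEXT, "He", "PERSON")          # assumed to refer to John Smith
big_blue = find_mention(TEXT, "Big Blue", "ORG")  # assumed to refer to IBM

# One employedBy relation per sentence, as on the slide.
relations = [
    Relation("employedBy", john, ibm),
    Relation("employedBy", he, big_blue),
]
```

Storing offsets rather than strings keeps each annotation tied to an exact position in the document, so two occurrences of the same word can carry different labels.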


Creating an Annotator

Knowledge curation (performed outside of WKS)
Collect and maintain content relevant to a specific domain
Ground truth generation
Produce a collection of vetted data to train Watson on a specific domain
Annotator component development
Human annotations used to further train Watson
Annotator component evaluation
Determine which documents are promoted to ground truth
Annotator component deployment
Export model into machine-learning runtime environments
Example: Auto manufacturer

Use case: Identify safety defects using traffic incident reports


Solution: Create an NLP model that understands relationships between
manufacturer, make, model, type of incident, and date of incident
1 Create a project

Defines the resources required to create a machine-learning annotator: training documents, type system, dictionaries, human annotations
2 Create a type system

Inventory of everything you want WKS to understand in unstructured text: mentions, entities, relations
3 Add documents

Documents that are representative of your domain content (i.e., the corpus)
Create document sets and assign them to human annotators
4 Pre-annotate using dictionaries

IBM Bluemix Analytics Exchange provides industry-specific dictionaries that can be used to automatically annotate documents before human annotators do
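
Dictionary pre-annotation can be pictured as a simple term lookup over the text. The sketch below is a rough illustration in plain Python, not the mechanism Watson Knowledge Studio uses: the `pre_annotate` function is hypothetical, the terms come from the mention examples earlier in the deck, and the `SafetyEquipment` label is an invented entity type.

```python
import re

# Hypothetical dictionary: surface term -> entity type label.
DICTIONARY = {
    "airbag": "SafetyEquipment",
    "child restraint system": "SafetyEquipment",
}


def pre_annotate(text, dictionary):
    """Return (start, end, surface, label) for every dictionary term found.

    Matching is case-insensitive and on whole words; longer terms are tried
    first so multi-word entries win over any shorter entry they contain.
    """
    if not dictionary:
        return []
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    labels = {term.lower(): label for term, label in dictionary.items()}
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


report = "The Airbag failed and the child restraint system came loose."
suggestions = pre_annotate(report, DICTIONARY)
```

The output is only a set of suggestions; in the workflow described here, human annotators still review and correct them in the next step.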
5 Annotate documents

Human annotators use the Ground Truth Editor to apply type system
labels to unstructured text
Multiple users will perform this task across document sets
6 Analyze results

Inter-Annotator Agreement (IAA) scores can be used to determine whether humans are annotating overlapping documents consistently
Documents with a passing score are promoted to ground truth
7 Create a machine-learning annotator

Select document sets that will be used to train the annotator
Can only train using documents that have been promoted to ground truth
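
The agreement check and the ground-truth restriction described in the last two steps can be sketched as follows. This is an illustration under stated assumptions, not how Watson Knowledge Studio computes its scores: agreement is measured here as a pairwise F1 over exactly matching (start, end, type) spans, the 0.8 passing threshold is a made-up value, and all function names are hypothetical.

```python
PASSING_SCORE = 0.8  # hypothetical threshold, chosen only for illustration


def pairwise_f1(spans_a, spans_b):
    """Agreement between two annotators as F1 over exactly matching spans."""
    if not spans_a and not spans_b:
        return 1.0  # both annotators agree that there is nothing to label
    matched = len(spans_a & spans_b)
    return 2 * matched / (len(spans_a) + len(spans_b))


def promote_to_ground_truth(overlap_docs, threshold=PASSING_SCORE):
    """Return the ids of documents whose agreement score passes the threshold."""
    return {
        doc_id
        for doc_id, (spans_a, spans_b) in overlap_docs.items()
        if pairwise_f1(spans_a, spans_b) >= threshold
    }


def select_training_docs(requested, ground_truth):
    """Only documents already promoted to ground truth may be used to train."""
    return [doc_id for doc_id in requested if doc_id in ground_truth]


# Two overlapping documents, each annotated by two people.
overlap_docs = {
    "doc1": (
        {(0, 10, "PERSON"), (21, 24, "ORG")},
        {(0, 10, "PERSON"), (21, 24, "ORG")},
    ),
    "doc2": (
        {(0, 6, "CarMake"), (20, 30, "AccidentLocation")},
        {(0, 6, "CarMake")},
    ),
}

ground_truth = promote_to_ground_truth(overlap_docs)
training_docs = select_training_docs(["doc1", "doc2"], ground_truth)
```

In this example the annotators agree fully on doc1 and on only one of two spans in doc2 (a score of 2/3), so only doc1 is promoted and only doc1 is available for training.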
End-to-end domain adaptation
Watson Knowledge Studio trial

http://ibm.biz/ibmwatsonknowledgestudio

Free 30-day trial


5 authorized users
10 projects
Leverage artifacts from IBM Analytics Exchange
Deploy models directly to the Watson Developer Cloud