Watson Knowledge Studio

Need for Watson Knowledge Studio

Unstructured Data is Exploding
Every day we produce 2.5 quintillion bytes of new data
Most of this new data is unstructured data
Books, journals, health records, e-mails, blogs, tweets, etc.

Holds many possibilities, but is vastly underutilized due to challenges in understanding and using the data
Typical organizations leverage only 8% of this data!
Why do we need the capability to analyze unstructured data?
Data growth is driven by unstructured data; we cannot ignore it
We have all seen it: the Remarks/Comments column in an extract or status sheet often holds more information than the structured columns
How do I extract information from unstructured data?

What are the challenges?
Each domain is different.

Entities differ from domain to domain, and so do the ways they are related.
How do we handle this?
Extracting Information from unstructured data

Natural Language Processing (NLP) is the core technology that helps you extract information from unstructured data

Most organizations need to mine unstructured text for specific information that is unique to their industry or business needs
Organizations must be able to customize the NLP model to realize the full value of mining unstructured text
Helps organizations generate business insights

NLP: What is the toolkit landscape?

NLP Toolkits

Alchemy APIs
Open Source Python Libraries
TextRazor API
Google Cloud Natural Language
How do I customize this toolkit for the domain I plan to apply it to?
IBM Watson Knowledge Studio can help you build your domain model (entities and relationships) so that NLP extraction libraries work effectively for your domain.
Introducing IBM Watson Knowledge Studio
Software-as-a-Service (SaaS) offering available exclusively through the
IBM Cloud Marketplace
Intended to accelerate the training and adaptation of Watson with
specific industry and organizational domain knowledge
Leverages state-of-the-art supervised machine learning techniques that
allow you to create machine-learning models that understand the
linguistic nuances, meaning, and relationships specific to your industry
Watson Knowledge Studio

Enables developers and domain experts to collaborate on the creation of custom annotator components that can be used to identify mentions and relations in unstructured text


Figure: Watson Knowledge Studio, its target users, and related offerings (Analytics Exchange, Watson Explorer, AlchemyLanguage, Discovery)

Four key features of WKS

Provide an intuitive way to transfer knowledge from humans to computers for text analytics solutions.
Administer the full lifecycle of annotator component development
within one tool.
Create custom annotator components from scratch that extract entities
and relations from domain-specific unstructured text.
Deploy annotator components from within Watson Knowledge Studio
to IBM Watson Explorer and IBM Watson Developer Cloud.
Watson Knowledge Studio terminology

An Annotator adds annotations (metadata) to text that appears in natural language content. Applications use annotators to analyze and process text.
A Type System is an inventory of everything we want WKS to
understand about the unstructured text.
Mentions = any span of text relevant to the current domain
Example: airbag, child restraint system, etc.
Entities = group of Mentions that refer to the same thing
Example: CarMake, AccidentLocation
Relation = a binary relationship between two entities
Example: occurredAt defines a relationship between CarMake
and AccidentLocation
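To make the terminology concrete, here is a minimal Python sketch of a type system held as plain data. It assumes a simplified model in which each relation type declares which entity types it may connect; the class names `TypeSystem` and `RelationType` and the `allows` check are hypothetical illustrations, not the format WKS uses internally or exports.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationType:
    """A binary relation type: a name plus the entity types it may connect."""
    name: str
    source: str  # entity type of the first argument
    target: str  # entity type of the second argument


@dataclass
class TypeSystem:
    """Hypothetical inventory of entity and relation types for one domain."""
    entity_types: set[str] = field(default_factory=set)
    relation_types: dict[str, RelationType] = field(default_factory=dict)

    def add_relation(self, rel: RelationType) -> None:
        # A relation may only reference entity types that already exist.
        missing = {rel.source, rel.target} - self.entity_types
        if missing:
            raise ValueError(f"unknown entity types: {sorted(missing)}")
        self.relation_types[rel.name] = rel

    def allows(self, relation: str, source_type: str, target_type: str) -> bool:
        """Check whether `relation` may link entities of the given types."""
        rel = self.relation_types.get(relation)
        return rel is not None and (rel.source, rel.target) == (source_type, target_type)


# Entity and relation types taken from the terminology slide.
ts = TypeSystem(entity_types={"CarMake", "AccidentLocation"})
ts.add_relation(RelationType("occurredAt", "CarMake", "AccidentLocation"))
print(ts.allows("occurredAt", "CarMake", "AccidentLocation"))  # True
print(ts.allows("occurredAt", "AccidentLocation", "CarMake"))  # False
```

Because relations are binary and typed, direction matters: the same relation name with its arguments swapped is rejected.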
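Annotation example

Text: John Smith works for IBM. He has been with Big Blue for 20 years.

Entity: PERSON (John Smith), mentions "John Smith" and "He"
Entity: ORG (IBM Corp), mentions "IBM" and "Big Blue"
Relation: employedBy, linking "John Smith" to "IBM"
Relation: employedBy, linking "He" to "Big Blue"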
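The annotation example can also be written out as character-offset spans. This is a minimal sketch using a hypothetical in-memory representation, not the format WKS produces: each mention is a (start, end) span with an entity type, mentions that refer to the same thing are grouped into one entity (so "He" joins "John Smith"), and each employedBy relation is assumed to link the person mention to the organization mention in the same sentence.

```python
from dataclasses import dataclass

TEXT = "John Smith works for IBM. He has been with Big Blue for 20 years."


@dataclass(frozen=True)
class Mention:
    start: int        # offset of the first character of the span
    end: int          # offset one past the last character
    entity_type: str  # label from the type system, e.g. PERSON

    def text(self, doc: str) -> str:
        return doc[self.start:self.end]


def mention(doc: str, surface: str, entity_type: str, after: int = 0) -> Mention:
    """Locate the first occurrence of `surface` at or after offset `after`."""
    start = doc.index(surface, after)
    return Mention(start, start + len(surface), entity_type)


john = mention(TEXT, "John Smith", "PERSON")
ibm = mention(TEXT, "IBM", "ORG")
he = mention(TEXT, "He", "PERSON")
big_blue = mention(TEXT, "Big Blue", "ORG")

# Entities: groups of mentions that refer to the same thing.
entities = {
    "John Smith": [john, he],
    "IBM Corp": [ibm, big_blue],
}
# Every mention in an entity should carry the same type.
assert all(len({m.entity_type for m in chain}) == 1 for chain in entities.values())

# Relations link one mention to another.
relations = [("employedBy", john, ibm), ("employedBy", he, big_blue)]

for name, chain in entities.items():
    print(name, [m.text(TEXT) for m in chain])
# John Smith ['John Smith', 'He']
# IBM Corp ['IBM', 'Big Blue']

for label, src, tgt in relations:
    print(f"{label}({src.text(TEXT)}, {tgt.text(TEXT)})")
# employedBy(John Smith, IBM)
# employedBy(He, Big Blue)
```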

Creating an Annotator

Knowledge curation (performed outside of WKS)

Collect and maintain content relevant to a specific domain
Ground truth generation
Produce a collection of vetted data to train Watson on a specific domain
Annotator component development
Human annotations used to further train Watson
Annotator component evaluation
Determine which documents are promoted to ground truth
Annotator component deployment
Export model into machine-learning runtime environments
Example: Auto manufacturer

Use case: Identify safety defects using traffic incident reports

Solution: Create an NLP model that understands relationships between
manufacturer, make, model, type of incident, and date of incident
1 Create a project

Defines the resources required to create a machine-learning annotator: training documents, type system, dictionaries, and human annotations
2 Create a type system

Inventory of everything you want WKS to understand in unstructured text

Mentions, entities, relations
3 Add documents

Documents that are representative of your domain content (i.e., a corpus)

Create document sets and assign them to human annotators
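Inter-Annotator Agreement can only be measured on documents that more than one person annotates, so document sets are typically built to share some overlap. The sketch below shows one way to do that under stated assumptions: a hypothetical `make_document_sets` helper places the same random sample (the `overlap` fraction) in every annotator's set and deals the rest out round-robin. The helper, its parameters, and the file names are illustrative, not WKS's own algorithm.

```python
import random


def make_document_sets(docs, annotators, overlap=0.2, seed=0):
    """Split docs into one set per annotator, sharing an overlap sample.

    Every set contains the same `overlap` fraction of documents, so that
    agreement can be measured; the remaining documents are dealt out
    round-robin, so each of them is annotated by exactly one person.
    """
    rng = random.Random(seed)  # fixed seed gives a reproducible split
    shuffled = list(docs)
    rng.shuffle(shuffled)
    n_shared = round(len(shuffled) * overlap)
    shared, rest = shuffled[:n_shared], shuffled[n_shared:]
    sets = {name: list(shared) for name in annotators}
    for i, doc in enumerate(rest):
        sets[annotators[i % len(annotators)]].append(doc)
    return sets, set(shared)


# Hypothetical corpus of ten incident reports and two annotators.
docs = [f"report_{i:02d}.txt" for i in range(10)]
sets, shared = make_document_sets(docs, ["alice", "bob"], overlap=0.2)
for name, ds in sets.items():
    print(name, len(ds))
# alice 6
# bob 6
```

A larger overlap gives more reliable agreement scores at the cost of more total annotation effort.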
4 Pre-annotate using dictionaries

IBM Bluemix Analytics Exchange provides industry-specific dictionaries that can be used to automatically annotate documents before humans annotate them
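Dictionary pre-annotation amounts to finding known terms in the text and labeling each match with its entity type, so human annotators start from suggestions rather than a blank document. A minimal sketch, assuming a small hand-written dictionary (the terms and type names below are illustrative, not taken from Analytics Exchange) and case-insensitive, whole-word, longest-match-first lookup:

```python
import re

# Hypothetical dictionary: lowercase surface term -> entity type.
DICTIONARY = {
    "airbag": "SafetyComponent",
    "child restraint system": "SafetyComponent",
    "child restraint": "SafetyComponent",
    "rear-end collision": "IncidentType",
}


def pre_annotate(text, dictionary):
    """Return (start, end, surface, type) for each dictionary match.

    Longer terms are tried first, so "child restraint system" wins over
    "child restraint"; matches never overlap one another.
    """
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    results = []
    for m in pattern.finditer(text):
        surface = m.group(0)
        results.append((m.start(), m.end(), surface, dictionary[surface.lower()]))
    return results


report = ("The driver's airbag failed to deploy and the Child Restraint System "
          "came loose after a rear-end collision.")
print([(s, t) for _, _, s, t in pre_annotate(report, DICTIONARY)])
# [('airbag', 'SafetyComponent'), ('Child Restraint System', 'SafetyComponent'), ('rear-end collision', 'IncidentType')]
```

Python's regex alternation tries alternatives in order, which is why sorting the terms by length produces the longest match at each position.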
5 Annotate documents

Human annotators use the Ground Truth Editor to apply type system
labels to unstructured text
Multiple users will perform this task across document sets
6 Analyze results

Inter-Annotator Agreement (IAA) scores can be used to determine whether humans are annotating overlapping documents consistently
Documents with a passing score are promoted to ground truth
7 Create a machine learning annotator

Select document sets that will be used to train the annotator

Only documents that have been promoted to ground truth can be used for training
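The analyze-and-promote logic above can be illustrated with a simple pairwise score. This sketch assumes agreement is measured as an F1 score over exactly matching (start, end, type) mention spans from two annotators, with a hypothetical passing threshold of 0.8. WKS computes its own IAA scores; neither this formula nor the threshold is taken from the product.

```python
def mention_f1(a, b):
    """F1 agreement between two annotators' mention sets.

    Each set holds (start, end, entity_type) tuples; a mention counts as
    agreed only if both annotators marked the same span with the same type.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # both annotators left the document empty
    agreed = len(a & b)
    if agreed == 0:
        return 0.0
    precision = agreed / len(a)
    recall = agreed / len(b)
    return 2 * precision * recall / (precision + recall)


def promote(overlapping, threshold=0.8):
    """Return the IDs of overlapping documents whose agreement passes."""
    return {doc for doc, (a, b) in overlapping.items()
            if mention_f1(a, b) >= threshold}


# Hypothetical annotations of two overlapping documents by two annotators.
overlapping = {
    "report_01": ({(0, 10, "PERSON"), (21, 24, "ORG")},
                  {(0, 10, "PERSON"), (21, 24, "ORG")}),
    "report_02": ({(0, 10, "PERSON"), (21, 24, "ORG")},
                  {(0, 4, "PERSON"), (21, 24, "ORG")}),
}
ground_truth = promote(overlapping)
print(sorted(ground_truth))  # ['report_01']

# Training may only draw on documents already promoted to ground truth.
candidates = ["report_01", "report_02"]
training = [d for d in candidates if d in ground_truth]
```

Here report_02 scores 0.5 because the annotators disagree on where the PERSON span ends, so it stays out of ground truth and out of the training set.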
End-to-end domain adaptation
Watson Knowledge Studio trial

Free 30-day trial

5 authorized users
10 projects
Leverage artifacts from IBM Analytics Exchange
Deploy models directly to the Watson Developer Cloud