And with data warehouses came the corporation’s opportunity to look at information
across the corporation. But building a data base of integrated, historical, granular data
was not enough. As powerful as data can be inside a data warehouse, unless the end user
could unleash the potential of the data warehouse, there was not much value in building
it.
With BI, organizations could create reports, transactions, and very sophisticated analysis
of the data found in the data warehouse. Among other things, graphical displays of
information were popular. The granular data found in the data warehouse provided a very
firm foundation for the analysis and discovery of corporate information.
BI was designed to operate on the data found in the data warehouse. And exactly what
was the essential nature of that data? The data found in the classical data warehouse
was:
- numeric data
- repetitive data
- points of interest data
Such an arrangement of data might be typical for the contents of a classical data
warehouse. Consider how the data might be used for analysis. Some data will be used for
selecting data and discriminating data from other data. Other data – numeric data – will
be used in calculations and comparisons. Given the typical data base that has been
described, the analyst could examine such things as –
- selection, grouping, and discrete reporting (on nonnumeric data)
- addition, subtraction, multiplication, and comparison (on numeric data)
In order to do the analysis and the calculations, a BI tool could be used on top of the data.
For many years data warehouses and BI worked more or less as described. And, even
though organizations were rarely aware of it, BI tools focused on operating on repetitive
numeric data for calculation and comparison, while nonnumeric data served the purpose of
allowing data to be selected and grouped together. BI tools simply expected to find
repetitive data in the data warehouse, where the same type of data was repeated over and
over.
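The division of labor described above can be illustrated with a minimal sketch. The
records, column names, and the helper `total_by` are hypothetical and chosen only for
illustration; they do not come from any particular BI tool.

```python
from collections import defaultdict

# Hypothetical repetitive records: every row has the same shape.
sales = [
    {"region": "East", "product": "widget", "amount": 120.0},
    {"region": "East", "product": "gadget", "amount": 80.0},
    {"region": "West", "product": "widget", "amount": 200.0},
    {"region": "West", "product": "widget", "amount": 50.0},
]


def total_by(rows, group_key, select_key, select_value):
    """Select on one nonnumeric column, group on another, sum the numeric column."""
    totals = defaultdict(float)
    for row in rows:
        if row[select_key] == select_value:          # selection on nonnumeric data
            totals[row[group_key]] += row["amount"]  # calculation on numeric data
    return dict(totals)


# Total widget sales per region.
widget_totals = total_by(sales, "region", "product", "widget")
```

The nonnumeric columns only decide which rows take part and how they are grouped; all of
the arithmetic happens on the single numeric column.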
Now it is possible to build a data warehouse using Textual ETL, such as that provided by
Forest Rim Technology. Unstructured text can be read into Textual ETL and a new type of
data warehouse can be built.
The contents of an unstructured data warehouse are very different from those of a classical
data warehouse. The contents of an unstructured data warehouse are, not surprisingly,
text. However, the text that arrives in the unstructured data warehouse is formatted into a
standard relational data base. For years organizations have been able to place text in a
relational data base in the form of blobs. But once text is placed into a relational data base
in the form of a blob, there is not a lot that can be done with it.
Instead, Textual ETL passes the text through a myriad of algorithms before the text is
placed in the relational data base. (NOTE: most of the important algorithms are patent
pending. See Forest Rim Technology for licensing opportunities.) The net result is a
relational data base that can be used for analytical purposes.
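To make the contrast with a blob concrete, the sketch below stores each word of a
document as its own relational row, together with its byte offset. This is only a naive
tokenizer used as a stand-in: it is not Forest Rim's Textual ETL, whose algorithms are
not described here. The table name `word`, its columns, and the function `load_text` are
all assumptions made for the example.

```python
import re
import sqlite3


def load_text(conn, doc_id, text):
    """Store each word of `text` as one relational row with its byte offset."""
    data = text.encode("utf-8")
    rows = [
        (doc_id, m.start(), m.group().decode("utf-8").lower())
        for m in re.finditer(rb"[A-Za-z]+", data)  # naive: ASCII letters only
    ]
    conn.executemany(
        "INSERT INTO word (doc_id, byte_offset, word) VALUES (?, ?, ?)", rows
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (doc_id TEXT, byte_offset INTEGER, word TEXT)")
load_text(conn, "contract-1", "Seller shall deliver naphtha and helium.")

# Unlike a blob, the individual words can now be queried with ordinary SQL.
naphtha_rows = conn.execute(
    "SELECT byte_offset FROM word WHERE word = 'naphtha'"
).fetchall()
word_count = conn.execute("SELECT COUNT(*) FROM word").fetchone()[0]
```

Once the text is broken into rows like these, selection and counting become plain
relational operations, which a single blob column does not allow.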
But creating new forms of a data warehouse leads to its own challenges (as well as
opportunities). The first thing the organization discovers is that classical BI does not
work very well with an unstructured data warehouse. What is needed is an entirely
different kind of BI. What is needed is Textual BI.
Under normal circumstances data is not repetitive in an unstructured data warehouse. But
there are some circumstances, for some types of data, where there is a certain amount
of repetition in an unstructured data warehouse.
Now consider the transcripts from a call center. Certainly a person participating in the
call center conversation can say whatever he/she wants to say. But most operators
working a call center have been carefully trained to structure the conversation. As a result
there is a certain similarity to the structure of each call.
And there are plenty of other examples of structural repetition in the world of text.
But there are plenty of examples where there is no structural repetition in text. Consider
emails. In emails, a person can say whatever he/she wants to say. The email can be short
or long. The email can be formal or informal. The email can be in any language, and so
forth. There simply is no structural conformity of text when it comes to email.
Text with structural repetition: contracts, call center calls, insurance claims, warranty
claims, log records, real estate filings
Text without structural repetition: email, law, medical records, depositions, doctors’
notes
REPETITIOUS CONTRACTS
Pictured below are three different contracts. There is great similarity between the
contracts but each contract is certainly different from any other contract.
Another way of looking at this concept is that an unstructured data warehouse holding
repetitive text, such as contracts, can have two types of BI used against it, while
completely nonrepetitive text, such as the Dodd-Frank law, can have only textual BI used
against it. The diagram below makes this point.
[Diagram: contracts are served by both classical BI and textual BI; a law is served by
textual BI only.]
TEXTUAL BI – AN EXAMPLE
So what does Textual BI look like? Consider the following example of Textual BI, from
Forest Rim Technology.
In the diagram below, the basic screen is shown. It is seen that there is basic query
management, there is parametric control of the query, there is execution of the query, and
there is the display of the results of the query. In many ways this screen is analogous to
a SQL statement. The difference is that this elaborate query management tool is built
specifically for the management of textual data, not for general-purpose access and
analysis of a relational data base.
The most interesting part of the Textual Business Intelligence query is in the execution.
The screen below shows a simple query where there is a search for all contracts where
there is a mention of “naphtha” and “helium”.
The query is executed and returns six contracts in which “naphtha” and “helium” are
mentioned.
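The logic of such a query can be sketched in a few lines. This is an illustration only,
not the Forest Rim tool: the function `contracts_mentioning_all`, the sample contracts,
and the choice of case-insensitive substring matching are all assumptions.

```python
def contracts_mentioning_all(contracts, terms):
    """Return the ids of contracts whose text contains every term (case-insensitive)."""
    wanted = [term.lower() for term in terms]
    return [
        doc_id
        for doc_id, text in contracts.items()
        if all(term in text.lower() for term in wanted)
    ]


# Hypothetical contract texts keyed by a made-up contract id.
contracts = {
    "contract-1": "Seller shall deliver naphtha and helium monthly.",
    "contract-2": "Buyer shall purchase helium only.",
    "contract-3": "Naphtha is priced separately from helium.",
}

hits = contracts_mentioning_all(contracts, ["naphtha", "helium"])
```

Here only the contracts that mention both terms are returned; a contract that mentions
helium alone is left out.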
Now that the query has been executed, the results are displayed. First the basic
parameters of the query are shown. Note that the results can be displayed in four ways:
- Suppose the analyst merely wants to find the contracts where the references are found.
  The results list those contracts.
- Or suppose the analyst wants to find the exact byte location where the references are
  found. The results give the byte location of each reference.
- Or suppose the analyst wants to see snippets of text where the references are found. The
  results show each snippet.
- Or suppose the analyst wants to see the entire document and at a glance see where the
  references are found in the document. The results show the whole document with the
  references marked.
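The byte-location, snippet, and whole-document views can be sketched as three small
functions over the same matches. The function names, the snippet width, and the `[[ ]]`
highlight markers are assumptions made for illustration and do not reflect how the
Forest Rim screens are implemented.

```python
import re


def locate(text, terms):
    """Return (byte_offset, matched_text) pairs for every occurrence of any term."""
    data = text.encode("utf-8")
    pattern = re.compile(
        b"|".join(re.escape(term.encode("utf-8")) for term in terms), re.IGNORECASE
    )
    return [(m.start(), m.group().decode("utf-8")) for m in pattern.finditer(data)]


def snippets(text, terms, width=10):
    """Return a short window of surrounding text for every occurrence."""
    data = text.encode("utf-8")
    out = []
    for offset, match in locate(text, terms):
        start = max(0, offset - width)
        end = offset + len(match.encode("utf-8")) + width
        out.append(data[start:end].decode("utf-8", errors="ignore"))
    return out


def highlight(text, terms):
    """Return the whole document with each occurrence wrapped in [[ ]]."""
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    return pattern.sub(lambda m: "[[" + m.group() + "]]", text)


doc = "Seller shall deliver naphtha and helium."
terms = ["naphtha", "helium"]

offsets = locate(doc, terms)      # exact byte locations
windows = snippets(doc, terms)    # text around each reference
marked = highlight(doc, terms)    # entire document, references marked
```

All three views are derived from the same set of matches, so the analyst can move from
the coarsest view to the most detailed one without running a different search.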
There are then many different ways to look at and analyze text using Textual Business
Intelligence.
The example that has been shown was selected for its simplicity. Textual Business
Intelligence can look at text in many different ways other than the simple example that
has been shown. The diagram below depicts just some of the many sophisticated ways
that analysis can be done with Textual Business Intelligence.