OLTP: On Line Transaction Processing Describes processing at operational sites OLAP: On Line Analytical Processing Describes processing at warehouse OLTP vs. OLAP Mostly updates Many small transactions Mb-Gb of data Raw data Clerical users Up-to-date data Consistency, recoverability critical Mostly reads Queries long, complex Gb-Tb of data Summarized, consolidated data Decision-makers, analysts as users OLTP OLAP INTRODUCTION Online transaction processing (OLTP) the gathering of input information, processing that information, and updating existing information to reflect the gathered and processed information Databases support OLTP Operational database databases that support OLTP INTRODUCTION Online analytical processing (OLAP) the manipulation of information to support decision making Databases can support some OLAP Data warehouses only support OLAP, not OLTP Data warehouses are special forms of databases that support decision making INTRODUCTION OLTP vs. OLAP On-line Transaction Processing (OLTP) Day-to-day handling of transactions that result from enterprise operation Maintains correspondence between database state and enterprise state On-line Analytic Processing (OLAP) Analysis of information in a database for the purpose of making management decisions OLAP Analyzes historical data (terabytes) using complex queries Due to volume of data and complexity of queries, OLAP often uses a data warehouse Data Warehouse - (offline) repository of historical data generated from OLTP or other sources Data Mining - use of warehouse data to discover relationships that might influence enterprise strategy Examples - Supermarket OLTP Event is 3 cans of soup and 1 box of crackers bought; update database to reflect that event OLAP Last winter in all stores in northeast, how many customers bought soup and crackers together? Data Mining Are there any interesting combinations of foods that customers frequently bought together? OLTP: On Line Transaction Processing Describes processing at operational sites OLAP: On Line Analytical Processing Describes processing at warehouse OLTP vs. OLAP Warehouse is a Specialized DB Standard DB (OLTP) Mostly updates Many small transactions Mb - Gb of data Current snapshot Index/hash on p.k. Raw data Thousands of users (e.g., clerical users) Warehouse (OLAP) Mostly reads Queries are long and complex Gb - Tb of data History Lots of scans Summarized, reconciled data Hundreds of users (e.g., decision-makers, analysts) Decision Support Information technology to help the knowledge worker (executive, manager, analyst) make faster & better decisions What were the sales volumes by region and product category for the last year? How did the share price of comp. manufacturers correlate with quarterly profits over the past 10 years? Which orders should we fill to maximize revenues? On-line analytical processing (OLAP) is an element of decision support systems (DSS) Three-Tier Decision Support Systems Warehouse database server Almost always a relational DBMS, rarely flat files OLAP servers Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations Clients Query and reporting tools Analysis tools Data mining tools The Complete Decision Support System Information Sources Data Warehouse Server (Tier 1) OLAP Servers (Tier 2) Clients (Tier 3) Operational DBs Semistructured Sources extract transform load refresh etc. Data Marts Data Warehouse e.g., MOLAP e.g., ROLAP serve Analysis Query/Reporting Data Mining serve serve Data Warehouse vs. Data Marts Enterprise warehouse: collects all information about subjects (customers,products,sales,assets, personnel) that span the entire organization Requires extensive business modeling (may take years to design and build) Data Marts: Departmental subsets that focus on selected subjects Marketing data mart: customer, product, sales Faster roll out, but complex integration in the long run Virtual warehouse: views over operational dbs Materialize sel. summary views for efficient query processing Easy to build but require excess capability on operat. db servers OLAP for Decision Support OLAP = Online Analytical Processing Support (almost) ad-hoc querying for business analyst Think in terms of spreadsheets View sales data by geography, time, or product Extend spreadsheet analysis model to work with warehouse data Large data sets Semantically enriched to understand business terms Combine interactive queries with reporting functions Multidimensional view of data is the foundation of OLAP Data model, operations, etc. Approaches to OLAP Servers Relational DBMS as Warehouse Servers Two possibilities for OLAP servers (1) Relational OLAP (ROLAP) Relational and specialized relational DBMS to store and manage warehouse data OLAP middleware to support missing pieces (2) Multidimensional OLAP (MOLAP) Array-based storage structures Direct access to array data structures
OLAP Server: Query Engine Requirements Aggregates (maintenance and querying) Decide what to precompute and when Query language to support multidimensional operations Standard SQL falls short Scalable query processing Data intensive and data selective queries Data Warehousing and End-User Access Tools Accompanying growth in data warehouses is increasing demands for more powerful access tools providing advanced analytical capabilities.
Key developments include: Online analytical processing (OLAP). SQL extensions for complex data analysis. Data mining tools.
Introducing OLAP The dynamic synthesis, analysis, and consolidation of large volumes of multi- dimensional data, Codd (1993).
Describes a technology that uses a multi- dimensional view of aggregate data to provide quick access to strategic information for purposes of advanced analysis. Introducing OLAP Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.
Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise. Introducing OLAP Can easily answer who? and what? questions, however, ability to answer what if? and why? type questions distinguishes OLAP from general-purpose query tools.
Types of analysis ranges from basic navigation and browsing (slicing and dicing) to calculations, to more complex analyses such as time series and complex modeling. OLAP Benchmarks OLAP Council published an analytical processing benchmark referred to as the APB-1 (OLAP Council, 1998).
Aim is to measure a servers overall OLAP performance rather than the performance of individual tasks. OLAP Benchmarks APB-1 assesses most common business operations including: bulk loading of data from internal/external data sources; incremental loading of data from operational systems; aggregation of input/level data along hierarchies; calculation of new data based on business models; time series analysis; queries with a high degree of complexity; drill-down through hierarchies; ad hoc queries; multiple online sessions. OLAP Benchmarks OLAP applications are judged on their ability to provide just-in-time (JIT) information, a core requirement of supporting effective decision-making.
Assessing a servers ability to satisfy this requirement is more than measuring processing performance but includes its abilities to model complex business relationships and to respond to changing business requirements. OLAP Benchmarks APB-1 uses a standard benchmark metric called AQM (Analytical Queries per Minute).
AQM represents number of analytical queries processed per minute including data loading and computation time. Thus, AQM incorporates data loading performance, calculation performance, and query performance into a singe metric. OLAP Benchmarks Publication of APB-1 benchmark results must include both the database schema and all code required for executing the benchmark.
An essential requirement of all OLAP applications is ability to provide users with JIT information, to make effective decisions about an organizations strategic directions. OLAP Applications JIT information is computed data that usually reflects complex relationships and is often calculated on the fly. Also, as data relationships may not be known in advance, the data model must be flexible. Examples of OLAP Applications in Various Functional Areas
OLAP Applications Although OLAP applications are found in widely divergent functional areas, all have following key features: multi-dimensional views of data; support for complex calculations; time intelligence. OLAP Applications - Multi- Dimensional Views of Data
Core requirement of building a realistic business model.
Provides basis for analytical processing through flexible access to corporate data.
The underlying database design that provides the multi-dimensional view of data should treat all dimensions equally. OLAP Applications - Support for Complex Calculations Must provide a range of powerful computational methods such as that required by sales forecasting, which uses trend algorithms such as moving averages and percentage growth.
Mechanisms for implementing computational methods should be clear and non-procedural.
OLAP Applications Time Intelligence Key feature of almost any analytical application as performance is almost always judged over time.
Time hierarchy is not always used in same manner as other hierarchies.
Concepts such as year-to-date and period-over-period comparisons should be easily defined.
OLAP Benefits Increased productivity of end-users. Reduced backlog of applications development for IT staff. Retention of organizational control over the integrity of corporate data. Reduced query drag and network traffic on OLTP systems or on the data warehouse. Improved potential revenue and profitability. Representing Multi-Dimensional Data Example of two-dimensional query. What is the total revenue generated by property sales in each city, in each quarter of 1997?
Choice of representation is based on types of queries end-user may ask.
Compare representation - three-field relational table versus two-dimensional matrix.
Multi-Dimensional Data as Three-Field Table versus Two-Dimensional Matrix Representing Multi-Dimensional Data Example of three-dimensional query. What is the total revenue generated by property sales for each type of property (Flat or House) in each city, in each quarter of 1997?
Compare representation - four-field relational table versus three-dimensional cube. Multi-Dimensional Data as Four- Field Table versus Three- Dimensional Cube Representing Multi-Dimensional Data Cube represents data as cells in an array.
Relational table only represents multi- dimensional data in two dimensions. Multi-Dimensional OLAP Servers Use multi-dimensional structures to store data and relationships between data.
Multi-dimensional structures are best visualized as cubes of data, and cubes within cubes of data. Each side of cube is a dimension.
A cube can be expanded to include other dimensions. Multi-Dimensional OLAP Servers A cube supports matrix arithmetic.
Multi-dimensional query response time depends on how many cells have to be added on the fly.
As number of dimensions increases, number of the cubes cells increases exponentially. Multi-Dimensional OLAP Servers However, majority of multi-dimensional queries use summarized, high-level data.
Solution is to pre-aggregate (consolidate) all logical subtotals and totals along all dimensions.
Pre-aggregation is valuable, as typical dimensions are hierarchical in nature. (e.g. Time dimension hierarchy - years, quarters, months, weeks, and days) Multi-Dimensional OLAP Servers Predefined hierarchy allows logical pre- aggregation and, conversely, allows for a logical drill-down.
Supports common analytical operations Consolidation. Drill-down. Slicing and dicing.
Multi-Dimensional OLAP Servers Consolidation - aggregation of data such as simple roll-ups or complex expressions involving inter-related data.
Drill-Down - is reverse of consolidation and involves displaying the detailed data that comprises the consolidated data.
Slicing and Dicing - (also called pivoting) refers to the ability to look at the data from different viewpoints.
Multi-Dimensional OLAP servers Can store data in a compressed form by dynamically selecting physical storage organizations and compression techniques that maximize space utilization.
Dense data (i.e., data that exists for high percentage of cells) can be stored separately from sparse data (i.e., significant percentage of cells are empty).
Multi-Dimensional OLAP Servers Ability to omit empty or repetitive cells can greatly reduce the size of the cube and the amount of processing.
Allows analysis of exceptionally large amounts of data.
Multi-Dimensional OLAP Servers In summary, pre-aggregation, dimensional hierarchy, and sparse data management can significantly reduce the size of the cube and the need to calculate values on- the-fly.
Removes need for multi-table joins and provides quick and direct access to arrays of data, thus significantly speeding up execution of multi-dimensional queries.
Codds Rules for OLAP Systems In 1993, E.F. Codd formulated twelve rules as the basis for selecting OLAP tools. Multi-dimensional conceptual view Transparency Accessibility Consistent reporting performance Client-server architecture Generic dimensionality
Codds Rules for OLAP Dynamic sparse matrix handling Multi-user support Unrestricted cross-dimensional operations Intuitive data manipulation Flexible reporting Unlimited dimensions and aggregation levels. Codds Rules for OLAP Systems There are proposals to re-define or extend the rules. For example, to also include: Comprehensive database management tools. Ability to drill down to detail (source record) level. Incremental database refresh. SQL interface to the existing enterprise environment. Categories of OLAP Tools OLAP tools are categorized according to the architecture of the underlying database.
Three main categories of OLAP tools include Multi-dimensional OLAP (MOLAP or MD-OLAP) Relational OLAP (ROLAP), also called multi-relational OLAP Managed query environment (MQE)
Multi-Dimensional OLAP (MOLAP) Uses specialized data structures and multi- dimensional Database Management Systems (MDDBMSs) to organize, navigate, and analyze data.
Data is typically aggregated and stored according to predicted usage to enhance query performance. Multi-Dimensional OLAP (MOLAP) Use array technology and efficient storage techniques that minimize the disk space requirements through sparse data management.
Provides excellent performance when data is used as designed, and the focus is on data for a specific decision-support application. Multi-Dimensional OLAP (MOLAP) Traditionally, require a tight coupling with the application layer and presentation layer.
Recent trends segregate the OLAP from the data structures through the use of published application programming interfaces (APIs).
Typical Architecture for MOLAP Tools
MOLAP Tools - Development Issues Underlying data structures are limited in their ability to support multiple subject areas and to provide access to detailed data.
Navigation and analysis of data is limited because the data is designed according to previously determined requirements.
MOLAP Tools - Development Issues MOLAP products require a different set of skills and tools to build and maintain the database, thus increasing the cost and complexity of support.
Relational OLAP (ROLAP) Fastest growing style of OLAP technology.
Supports RDBMS products using a metadata layer - avoids need to create a static multi-dimensional data structure - facilitates the creation of multiple multi- dimensional views of the two-dimensional relation. Relational OLAP (ROLAP) To improve performance, some products use SQL engines to support complexity of multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database designs such as the star schema.
Typical Architecture for ROLAP Tools
ROLAP Tools - Development Issues Middleware to facilitate the development of multi-dimensional applications. (Software that converts the two- dimensional relation into a multi- dimensional structure).
Development of an option to create persistent, multi-dimensional structures with facilities to assist in the administration of these structures. Hybrid OLAP (HOLAP) Can use data from either a RDBMS directly or a multi-dimension server.
Managed Query Environment (MQE) Relatively new development.
Provide limited analysis capability, either directly against RDBMS products, or by using an intermediate MOLAP server. Managed Query Environment (MQE) Deliver selected data directly from DBMS or via a MOLAP server to desktop (or local server) in form of a datacube, where it is stored, analyzed, and maintained locally.
Promoted as being relatively simple to install and administer with reduced cost and maintenance.
Typical Architecture for MQE Tools
MQE Tools - Development Issues Architecture results in significant data redundancy and may cause problems for networks that support many users.
Ability of each user to build a custom datacube may cause a lack of data consistency among users.
Only a limited amount of data can be efficiently maintained. OLAP Extensions to SQL SQL promoted as easy to learn, non- procedural, free-format, DBMS- independent, and international standard.
However, major disadvantage has been inability to represent many of the questions most commonly asked by business analysts.
IBM and Oracle jointly proposed OLAP extensions to SQL early in 1999, adopted as an amendment to SQL. OLAP Extensions to SQL Many database vendors including IBM, Oracle, Informix, and Red Brick Systems have already implemented portions of specifications in their DBMSs.
Red Brick Systems was first to implement many essential OLAP functions (as Red Brick Intelligent SQL (RISQL)), albeit in advance of the standard. OLAP Extensions to SQL - RISQL Designed for business analysts.
Set of extensions that augments SQL with a variety of powerful operations appropriate to data analysis and decision- support applications such as ranking, moving averages, comparisons, market share, this year versus last year. Approaches to OLAP Servers Three possibilities for OLAP servers (1) Relational OLAP (ROLAP) Relational and specialized relational DBMS to store and manage warehouse data OLAP middleware to support missing pieces (2) Multidimensional OLAP (MOLAP) Array-based storage structures Direct access to array data structures (3) Hybrid OLAP (HOLAP) Storing detailed data in RDBMS Storing aggregated data in MDBMS User access via MOLAP tools
Points to be noticed about ROLAP Defines complex, multi-dimensional data with simple model Reduces the number of joins a query has to process Allows the data warehouse to evolve with rel. low maintenance Can contain both detailed and summarized data. ROLAP is based on familiar, proven, and already selected technologies. BUT!!! SQL for multi-dimensional manipulation of calculations. MOLAP: Dimensional Modeling Using the Multi Dimensional Model MDDB: a special-purpose data model Facts stored in multi-dimensional arrays Dimensions used to index array Sometimes on top of relational DB Products Pilot, Arbor Essbase, Gentia The MOLAP Cube sale prodId storeId amt p1 s1 12 p2 s1 11 p1 s3 50 p2 s2 8 s1 s2 s3 p1 12 50 p2 11 8 Fact table view: Multi-dimensional cube: dimensions = 2 3-D Cube dimensions = 3 Multi-dimensional cube: Fact table view: sale prodId storeId date amt p1 s1 1 12 p2 s1 1 11 p1 s3 1 50 p2 s2 1 8 p1 s1 2 44 p1 s2 2 4 day 2 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8 day 1 Example P r o d u c t
Time M T W Th F S S Juice Milk Coke Cream Soap Bread NY SF LA 10 34 56 32 12 56 56 units of bread sold in LA on M Dimensions: Time, Product, Store Attributes: Product (upc, price, ) Store
Hierarchies: Product Brand Day Week Quarter Store Region Country roll-up to week roll-up to brand roll-up to region Cube Aggregation: Roll-up day 2 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8 day 1 s1 s2 s3 p1 56 4 50 p2 11 8 s1 s2 s3 sum 67 12 50 sum p1 110 p2 19 129 . . . drill-down rollup Example: computing sums Cube Operators for Roll-up day 2 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8 day 1 s1 s2 s3 p1 56 4 50 p2 11 8 s1 s2 s3 sum 67 12 50 sum p1 110 p2 19 129 . . . sale(s1,*,*) sale(*,*,*) sale(s2,p2,*) s1 s2 s3 * p1 56 4 50 110 p2 11 8 19 * 67 12 50 129 Extended Cube day 2 s1 s2 s3 * p1 44 4 48 p2 * 44 4 48 s1 s2 s3 * p1 12 50 62 p2 11 8 19 * 23 8 50 81 day 1 * sale(*,p2,*) Aggregation Using Hierarchies region A region B p1 56 54 p2 11 8 store region country (store s1 in Region A; stores s2, s3 in Region B) day 2 s1 s2 s3 p1 44 4 p2 s1 s2 s3 p1 12 50 p2 11 8 day 1 Points to be noticed about MOLAP Pre-calculating or pre-consolidating transactional data improves speed. BUT Fully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GB
MDDs are great candidates for the <50GB department data marts.
Rolling up and Drilling down through aggregate data.
With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake. Hybrid OLAP (HOLAP) HOLAP = Hybrid OLAP:
Best of both worlds
Storing detailed data in RDBMS
Storing aggregated data in MDBMS
User access via MOLAP tools Multi- dimensiona l access Multidimensional Viewer Relational Viewer Client MDBMS Server Multi- dimensio naldata SQL- Read RDBMS Server User data Meta data Derived data SQL- Reach Through SQL- Read Data Flow in HOLAP