
Data Warehouse Process

Description

A Data Warehouse is not an individual repository product. Rather, it is an overall strategy, or process, for building decision support systems and a knowledge-based applications architecture and environment that supports both everyday tactical decision making and long-term business strategizing. The Data Warehouse environment positions a business to utilize an enterprise-wide data store to link information from diverse sources and make the information accessible for a variety of user purposes, most notably strategic analysis. Business analysts must be able to use the Warehouse for such strategic purposes as trend identification, forecasting, competitive analysis, and targeted market research. Data Warehouses and Data Warehouse applications are designed primarily to support executives, senior managers, and business analysts in making complex business decisions. Data Warehouse applications provide the business community with access to accurate, consolidated information from various internal and external sources. The primary objective of Data Warehousing is to bring together information from disparate sources and put the information into a format that is conducive to making business decisions. This objective necessitates a set of activities that are far more complex than just collecting data and reporting against it.
Data Warehousing requires both business and technical expertise and involves the following activities:
- Accurately identifying the business information that must be contained in the Warehouse
- Identifying and prioritizing subject areas to be included in the Data Warehouse
- Managing the scope of each subject area, which will be implemented into the Warehouse on an iterative basis
- Developing a scalable architecture to serve as the Warehouse's technical and application foundation, and identifying and selecting the hardware/software/middleware components to implement it
- Extracting, cleansing, aggregating, transforming and validating the data to ensure accuracy and consistency
- Defining the correct level of summarization to support business decision making
- Establishing a refresh program that is consistent with business needs, timing and cycles
- Providing user-friendly, powerful tools at the desktop to access the data in the Warehouse
- Educating the business community about the realm of possibilities that are available to them through Data Warehousing
- Establishing a Data Warehouse Help Desk and training users to effectively utilize the desktop tools
- Establishing processes for maintaining, enhancing, and ensuring the ongoing success and applicability of the Warehouse

Until the advent of Data Warehouses, enterprise databases were expected to serve multiple purposes, including online transaction processing, batch processing, reporting, and analytical processing. In most cases, the primary focus of computing resources was on satisfying operational needs and requirements. Information reporting and analysis needs were secondary considerations. As the use of PCs, relational databases, 4GL technology and end-user computing grew and changed the complexion of information processing, more and more business users demanded that their needs for information be addressed. Data Warehousing has evolved to meet those needs without disrupting operational processing.
In the Data Warehouse model, operational databases are not accessed directly to perform information processing. Rather, they act as the source of data for the Data Warehouse, which is the information repository and point of access for information processing. There are sound reasons for separating operational and informational databases, as described below.
- The users of informational and operational data are different. Users of informational data are generally managers and analysts; users of operational data tend to be clerical, operational and administrative staff.
- Operational data differs from informational data in context and currency. Informational data contains an historical perspective that is not generally used by operational systems.
- The technology used for operational processing frequently differs from the technology required to support informational needs.

- The processing characteristics for the operational environment and the informational environment are fundamentally different.

The Data Warehouse functions as a Decision Support System (DSS) and an Executive Information System (EIS), meaning that it supports informational and analytical needs by providing integrated and transformed enterprise-wide historical data from which to do management analysis. A variety of sophisticated tools are readily available in the marketplace to provide user-friendly access to the information stored in the Data Warehouse. Data Warehouses can be defined as subject-oriented, integrated, time-variant, non-volatile collections of data used to support analytical decision making. The data in the Warehouse comes from the operational environment and external sources. Data Warehouses are physically separated from operational systems, even though the operational systems feed the Warehouse with source data.

Subject Orientation

Data Warehouses are designed around the major subject areas of the enterprise; the operational environment is designed around applications and functions. This difference in orientation (data vs. process) is evident in the content of the database. Data Warehouses do not contain information that will not be used for informational or analytical processing; operational databases contain detailed data that is needed to satisfy processing requirements but which has no relevance to management or analysis.

Integration and Transformation

The data within the Data Warehouse is integrated. This means that there is consistency among naming conventions, measurements of variables, encoding structures, physical attributes, and other salient data characteristics. An example of this integration is the treatment of codes such as gender codes. Within a single corporation, various applications may represent gender codes in different ways: male vs. female, m vs. f, 1 vs. 0, etc.
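As a sketch of how such a load-time transformation might look (the mapping table and record layouts here are hypothetical, not drawn from any particular product):

```python
# Hypothetical source systems encode gender differently; the Warehouse
# load step maps every variant to one canonical representation.
GENDER_MAP = {
    "male": "M", "m": "M", "1": "M",
    "female": "F", "f": "F", "0": "F",
}

def harmonize_gender(raw_value):
    """Return the Warehouse's canonical gender code, or None if unknown."""
    return GENDER_MAP.get(str(raw_value).strip().lower())

# Records arriving from three different (invented) applications:
source_rows = [
    {"system": "HR",      "gender": "Female"},
    {"system": "Billing", "gender": "f"},
    {"system": "Orders",  "gender": 0},
]
for row in source_rows:
    row["gender"] = harmonize_gender(row["gender"])

print([r["gender"] for r in source_rows])  # ['F', 'F', 'F']
```

Whatever the source representation, only the canonical code reaches the Warehouse.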
In the Data Warehouse, gender is always represented in a consistent way, regardless of the many ways by which it may be encoded and stored in the source data. As the data is moved to the Warehouse, it is transformed into a consistent representation as required.

Time Variance

All data in the Data Warehouse is accurate as of some moment in time, providing an historical perspective. This differs from the operational environment, in which data is intended to be accurate as of the moment of access. The data in the Data Warehouse is, in effect, a series of snapshots. Once the data is loaded into the enterprise data store and data marts, it cannot be updated. It is refreshed on a periodic basis, as determined by the business need. The operational data store, if included in the Warehouse architecture, may be updated.

Non-Volatility

Data in the Warehouse is static, not dynamic. The only operations that occur in Data Warehouse applications are the initial loading of data, access of data, and refresh of data. For these reasons, the physical design of a Data Warehouse optimizes the access of data, rather than focusing on the requirements of data update and delete processing.

Data Warehouse Configurations

A Data Warehouse configuration, also known as the logical architecture, includes the following components:
- one Enterprise Data Store (EDS) - a central repository which supplies atomic (detail-level) integrated information to the whole organization
- (optional) one Operational Data Store - a "snapshot" of enterprise-wide data at a moment in time
- (optional) one or more individual Data Marts - summarized subsets of the enterprise's data specific to a functional area or department, geographical region, or time period
- one or more Metadata Stores or Repositories - catalogs of reference information about the primary data. Metadata is divided into two categories: information for technical use, and information for business end-users.
The EDS is the cornerstone of the Data Warehouse. It can be accessed for both immediate informational needs and for analytical processing in support of strategic decision making, and can be used for drill-down
support for the Data Marts which contain only summarized data. It is fed by the existing subject area operational systems and may also contain data from external sources. The EDS in turn feeds individual Data Marts that are accessed by end-user query tools at the user's desktop. It is used to consolidate related data from multiple sources into a single source, while the Data Marts are used to physically distribute the consolidated data into logical categories of data, such as business functional departments or geographical regions. The EDS is a collection of daily "snapshots" of enterprise-wide data taken over an extended time period, and thus retains and makes available for tracking purposes the history of changes to a given data element over time. This creates an optimum environment for strategic analysis. However, access to the EDS can be slow, due to the volume of data it contains, which is a good reason for using Data Marts to filter, condense and summarize information for specific business areas. In the absence of the Data Mart layer, users can access the EDS directly. Metadata is "data about data," a catalog of information about the primary data that defines access to the Warehouse. It is the key to providing users and developers with a road map to the information in the Warehouse. Metadata comes in two different forms: end-user and transformational. End-user metadata serves a business purpose; it translates a cryptic name code that represents a data element into a meaningful description of the data element so that end-users can recognize and use the data. For example, metadata would clarify that the data element "ACCT_CD" represents "Account Code for Small Business." Transformational metadata serves a technical purpose for development and maintenance of the Warehouse. 
It maps the data element from its source system to the Data Warehouse, identifying it by source field name, destination field code, transformation routine, business rules for usage and derivation, format, key, size, index and other relevant transformational and structural information. Each type of metadata is kept in one or more repositories that service the Enterprise Data Store. While an Enterprise Data Store and Metadata Store(s) are always included in a sound Data Warehouse design, the specific number of Data Marts (if any) and the need for an Operational Data Store are judgment calls. Potential Data Warehouse configurations should be evaluated and a logical architecture determined according to business requirements.

The Data Warehouse Process

The james martin + co Data Warehouse Process does not encompass the analysis and identification of organizational value streams, strategic initiatives, and related business goals, but it is a prescription for achieving such goals through a specific architecture. The Process is conducted in an iterative fashion after the initial business requirements and architectural foundations have been developed, with the emphasis on populating the Data Warehouse with "chunks" of functional subject-area information each iteration. The Process guides the development team through identifying the business requirements, developing the business plan and Warehouse solution to business requirements, and implementing the configuration, technical, and application architecture for the overall Data Warehouse. It then specifies the iterative activities for the cyclical planning, design, construction, and deployment of each population project. The following is a description of each stage in the Data Warehouse Process. (Note: The Data Warehouse Process also includes conventional project management, startup, and wrap-up activities which are detailed in the Plan, Activate, Control and End stages, not described here.)
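Returning to the metadata described earlier: a transformational metadata entry (source field, destination, transformation rule, format, key, size) can be sketched as a simple catalog record. This is a minimal illustration in Python; the class and field names are hypothetical, not those of any actual metadata repository product:

```python
from dataclasses import dataclass

@dataclass
class TransformationalMetadata:
    """One hypothetical catalog entry mapping a source field to its
    Warehouse destination. Field names here are illustrative only."""
    source_system: str
    source_field: str
    destination_field: str
    business_name: str          # end-user metadata: the meaningful description
    transformation_rule: str    # how the value is derived during load
    data_type: str
    size: int
    is_key: bool

# The ACCT_CD example from the text, expressed as a catalog entry:
acct_cd = TransformationalMetadata(
    source_system="Legacy Billing",
    source_field="ACCT_CD",
    destination_field="small_business_account_code",
    business_name="Account Code for Small Business",
    transformation_rule="copy; pad to 6 characters",
    data_type="CHAR",
    size=6,
    is_key=True,
)
print(acct_cd.business_name)  # Account Code for Small Business
```

The same record thus serves both audiences: the business_name field is the end-user metadata, while the remaining fields are the transformational metadata used by developers.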
Business Case Development

A variety of kinds of strategic analysis, including Value Stream Assessment, have likely already been done by the customer organization at the point when it is necessary to develop a Business Case. The Business Case Development stage launches the Data Warehouse development in response to previously identified strategic business initiatives and "predator" (key) value streams of the organization. The organization will likely have identified more than one important value stream. In the long term it is possible to implement Data Warehouse solutions that address multiple value streams, but it is the predator value stream or highest-priority strategic initiative that usually becomes the focus of the short-term strategy and the first population projects resulting in a Data Warehouse. At the conclusion of the relevant business reengineering, strategic visioning, and/or value stream assessment activities conducted by the organization, a Business Case can be built to justify the use of the Data Warehouse architecture and implementation approach to solve key business issues directed at the most important goals. The Business Case defines the outlying activities, costs, benefits, and critical success factors for a multi-generation implementation plan that results in a Data Warehouse framework of an information storage/access system. The Warehouse is an iteratively designed, developed, and refined solution to the tactical and strategic business requirements. The Business Case addresses both the short-term and
long-term Warehouse strategies (how multiple data stores will work together to fulfill primary and secondary business goals) and identifies both immediate and extended costs so that the organization is better able to plan its short- and long-term budget appropriation.

Business Question Assessment

Once a Business Case has been developed, the short-term strategy for implementing the Data Warehouse is mapped out by means of the Business Question Assessment (BQA) stage. The purpose of BQA is to:
- Establish the scope of the Warehouse and its intended use
- Define and prioritize the business requirements and the subsequent information (data) needs the Warehouse will address
- Identify the business directions and objectives that may influence the required data and application architectures
- Determine which business subject areas provide the most needed information; prioritize and sequence implementation projects accordingly
- Drive out the logical data model that will direct the physical implementation model
- Measure the quality, availability, and related costs of needed source data at a high level
- Define the iterative population projects based on business needs and data validation

The prioritized predator value stream or most important strategic initiative is analyzed to determine the specific business questions that need to be answered through a Warehouse implementation. Each business question is assessed to determine its overall importance to the organization, and a high-level analysis of the data needed to provide the answers is undertaken. The data is assessed for quality, availability, and cost associated with bringing it into the Data Warehouse. The business questions are then revisited and prioritized based upon their relative importance and the cost and feasibility of acquiring the associated data.
The prioritized list of business questions is used to determine the scope of the first and subsequent iterations of the Data Warehouse, in the form of population projects. Iteration scoping is dependent on source data acquisition issues and is guided by determining how many business questions can be answered in a three- to six-month implementation time frame. A "business question" is a question deemed by the business to provide useful information in determining strategic direction. A business question can be answered through objective analysis of the data that is available.

Architecture Review and Design

The Architecture is the logical and physical foundation on which the Data Warehouse will be built. The Architecture Review and Design stage, as the name implies, is both a requirements analysis and a gap analysis activity. It is important to assess which pieces of the architecture already exist in the organization (and in what form), and which missing pieces are needed to build the complete Data Warehouse architecture. During the Architecture Review and Design stage, the logical Data Warehouse architecture is developed. The logical architecture is a configuration map of the necessary data stores that make up the Warehouse; it includes a central Enterprise Data Store, an optional Operational Data Store, one or more optional individual business-area Data Marts, and one or more Metadata Stores. The metadata store(s) contain two different kinds of metadata that catalog reference information about the primary data. Once the logical configuration is defined, the Data, Application, Technical and Support Architectures are designed to physically implement it. Requirements of these four architectures are carefully analyzed so that the Data Warehouse can be optimized to serve the users.
Gap analysis is conducted to determine which components of each architecture already exist in the organization and can be reused, and which components must be developed (or purchased) and configured for the Data Warehouse. The Data Architecture organizes the sources and stores of business information and defines the quality and management standards for data and metadata. The Application Architecture is the software framework that guides the overall implementation of business functionality within the Warehouse environment; it controls the movement of data from source to user, including the functions of data extraction, data cleansing, data transformation, data loading, data refresh, and data access (reporting, querying). The Technical Architecture provides the underlying computing infrastructure that enables the data and application architectures. It includes platform/server, network, communications and connectivity
hardware/software/middleware, DBMS, client/server 2-tier vs. 3-tier approach, and end-user workstation hardware/software. Technical architecture design must address the requirements of scalability, capacity and volume handling (including sizing and partitioning of tables), performance, availability, stability, chargeback, and security. The Support Architecture includes the software components (e.g., tools and structures for backup/recovery, disaster recovery, performance monitoring, reliability/stability compliance reporting, data archiving, and version control/configuration management) and organizational functions necessary to effectively manage the technology investment. Architecture Review and Design applies to the long-term strategy for development and refinement of the overall Data Warehouse, and is not conducted merely for a single iteration. This stage develops the blueprint of an encompassing data and technical structure, software application configuration, and organizational support structure for the Warehouse. It forms a foundation that drives the iterative Detail Design activities. Design tells you what to do; Architecture Review and Design tells you what pieces you need in order to do it. The Architecture Review and Design stage can be conducted as a separate project that runs mostly in parallel with the Business Question Assessment stage, because the technical, data, application and support infrastructure that enables and supports the storage and access of information is generally independent of the business requirements that determine which data is needed to drive the Warehouse. However, the data architecture is dependent on receiving input from certain BQA activities (data source system identification and data modeling), so the BQA stage must conclude before the Architecture stage can conclude.
The Architecture will be developed based on the organization's long-term Data Warehouse strategy, so that future iterations of the Warehouse will have been provided for and will fit within the overall architecture.

Tool Selection

The purpose of this stage is to identify the candidate tools for developing and implementing the Data Warehouse data and application architectures, and for performing technical and support architecture functions where appropriate. Select the candidate tools that best meet the business and technical requirements as defined by the Data Warehouse architecture, and recommend the selections to the customer organization. Procure the tools upon approval from the organization. It is important to note that the process of selecting tools is often dependent on the existing technical infrastructure of the organization. Many organizations feel strongly, for various reasons, about using tools they already have in their "arsenal" for Data Warehouse applications, and are reluctant to purchase new application packages. It is recommended that a thorough evaluation of existing tools and the feasibility of their reuse be done in the context of all tool evaluation activities. In some cases, existing tools can be adapted to the Data Warehouse; in other cases, the customer organization may need to be convinced that new tools would better serve their needs. This series of activities may even be skipped altogether if the organization insists that particular tools be used (no room for negotiation), or if tools have already been assessed and selected in anticipation of the Data Warehouse project.
Tools may be categorized according to the following data, technical, application, or support functions:
- Source Data Extraction and Transformation
- Data Cleansing
- Data Load
- Data Refresh
- Data Access
- Security Enforcement
- Version Control/Configuration Management
- Backup and Recovery
- Disaster Recovery
- Performance Monitoring
- Database Management
- Platform
- Data Modeling
- Metadata Management

Iteration Project Planning

The Data Warehouse is implemented (populated) one subject area at a time, driven by specific business questions to be answered by each implementation cycle. The first and subsequent implementation cycles of the Data Warehouse are determined during the BQA stage. At this point in the Process, the first (or next, if not the first) subject area implementation project is planned. The business requirements discovered in BQA and, to a lesser extent, the technical requirements of the Architecture Design stage are now refined through user interviews and focus sessions to the subject area level. The results are further analyzed to yield the detail needed to design and implement a single population project, whether initial or follow-on. The Data Warehouse project team is expanded to include the members needed to construct and deploy the Warehouse, and a detailed work plan for the design and implementation of the iteration project is developed and presented to the customer organization for approval.

Detail Design

In the Detail Design stage, the physical Data Warehouse model (database schema) is developed, the metadata is defined, and the source data inventory is updated and expanded to include all of the necessary information needed for the subject area implementation project and is validated with users. Finally, the detailed design of all procedures for the implementation project is completed and documented. Procedures to achieve the following activities are designed:
- Warehouse Capacity Growth
- Data Extraction/Transformation/Cleansing
- Data Load
- Security
- Data Refresh
- Data Access
- Backup and Recovery
- Disaster Recovery
- Data Archiving
- Configuration Management
- Testing
- Transition to Production
- User Training
- Help Desk
- Change Management

Implementation

Once the Planning and Design stages are complete, the project to implement the current Data Warehouse iteration can proceed quickly.
Necessary hardware, software and middleware components are purchased and installed, the development and test environment is established, and the configuration management processes are implemented. Programs are developed to extract, cleanse, transform and load the source data and to periodically refresh the existing data in the Warehouse, and the programs are individually unit tested against a test database with sample source data. Metrics are captured for the load process. The metadata repository is loaded with transformational and business user metadata. Canned production reports are developed and sample ad hoc queries are run against the test database, and the validity of the output is measured. User access to the data in the Warehouse is established. Once the programs have been developed and unit tested and the components are in place, system functionality and user acceptance testing is conducted for the complete integrated Data Warehouse system. System support processes of database security, system backup and recovery, system disaster recovery, and data archiving are implemented and tested as the system is prepared for deployment. The final step is to conduct the Production Readiness Review prior to transitioning the Data Warehouse system into production. During this review, the system is evaluated for acceptance by the customer organization.

Transition to Production

The Transition to Production stage moves the Data Warehouse development project into the production environment. The production database is created, and the extraction/cleanse/transformation routines are run on the operational system source data. The development team works with the Operations staff to perform the
initial load of this data to the Warehouse and execute the first refresh cycle. The Operations staff is trained, and the Data Warehouse programs and processes are moved into the production libraries and catalogs. Rollout presentations and tool demonstrations are given to the entire customer community, and end-user training is scheduled and conducted. The Help Desk is established and put into operation. A Service Level Agreement is developed and approved by the customer organization. Finally, the new system is positioned for ongoing maintenance through the establishment of a Change Management Board and the implementation of change control procedures for future development cycles.
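The extract, cleanse, transform and load programs developed during the Implementation stage can be sketched as a pipeline of small, individually unit-testable steps. This is a minimal illustration only, assuming invented record layouts and a hypothetical cents-to-dollars business rule; a real Warehouse load would run against a DBMS rather than in-memory lists:

```python
def extract(source_rows):
    """Extract: pull raw records from the operational source."""
    return list(source_rows)

def cleanse(rows):
    """Cleanse: drop records that fail a basic validity check."""
    return [r for r in rows if r.get("amount") is not None]

def transform(rows):
    """Transform: apply a hypothetical business rule (cents -> dollars)."""
    return [{**r, "amount": r["amount"] / 100} for r in rows]

def load(rows, warehouse):
    """Load: append transformed rows to the (in-memory) warehouse table."""
    warehouse.extend(rows)
    return len(rows)  # a simple load metric, as the text describes

# Each step can be unit tested against sample source data, then the
# pipeline exercised end to end:
warehouse_table = []
sample = [{"id": 1, "amount": 1250}, {"id": 2, "amount": None}]
loaded = load(transform(cleanse(extract(sample))), warehouse_table)
print(loaded, warehouse_table)  # 1 [{'id': 1, 'amount': 12.5}]
```

Keeping the four stages as separate functions is what makes the individual unit testing described above practical.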

Data Warehousing: Similarities and Differences of Inmon and Kimball


How do the two architectures differ? How great is the chasm? Is there a common ground? This article attempts to draw out the similarities and differences between the Inmon and Kimball approaches to the data warehouse. On the subject of what the data warehouse is and what the data marts are, both Kimball and Inmon have spoken: "The data warehouse is nothing more than the union of all the data marts" (Ralph Kimball, Dec. 29, 1997); "You can catch all the minnows in the ocean and stack them together and they still do not make a whale" (Bill Inmon, Jan. 8, 1998). The Corporate Information Factory (CIF) and the Kimball Data Warehouse Bus (BUS) are considered the two main types of data warehousing architecture. Accordingly, the two architectures have some elements in common. All enterprises require a means to store, analyze and interpret the data they generate and accumulate in order to implement critical decisions that range from continuing to exist to maximizing prosperity. Corporations must develop operating and feedback systems to use the underlying data means (the data warehouse) to achieve their goals. Both the CIF and BUS architectures satisfy these criteria. Another requirement of any data warehouse architecture is that the user can depend on the accuracy and timeliness of the data. The user must also be able to access the data according to his or her particular needs through an easily understandable and straightforward manner of making queries. The data that is extracted in this manner by one user should be compatible with and translatable to other operations and users within the same group or enterprise that rely on the same data. Both Inmon and Kimball share the opinion that stand-alone or independent data marts or data warehouses do not satisfy the needs for accurate and timely data and ease of access for users on an enterprise or corporate scale. In an article for the Business Intelligence Network, Mr.
Inmon writes: Independent data marts may work well when there are only a few data marts. But over time there are never only a few data marts ... Once there are a lot of data marts, the independent data mart approach starts to fall apart. There are many reasons why independent data marts built directly from a legacy/source environment fall apart:
- There is no single source of data for analytical processing;
- There is no easy reconcilability of data values;
- There is no foundation to build on for new data marts;
- An independent data mart is rarely reusable for other purposes;
- There are too many interface programs to be built and maintained;
- There is a massive redundancy of detailed data in each data mart ... because there is no common place where that detailed data is collected and integrated;
- There is no convenient place for historical data;
- There is no low level of granularity guaranteed for all data marts to use;
- Each data mart integrates data from the source systems in a unique way, which does not permit reconcilability or integrity of the data across the enterprise; and
- The window for extracting data from the legacy environment is stretched, with each independent data mart requiring its own window of time for extraction.

In Differences of Opinion (previously cited), Mr. Kimball gives his opinion of independent data marts: "Finally, stand-alone data marts or warehouses are problematic. These independent silos are built to satisfy specific needs, without regard to other existing or planned analytic data. They tend to be departmental in nature, often loosely dimensionally structured. Although often perceived as the path of least resistance because no coordination is required, the independent approach is unsustainable in the long run. Multiple, uncoordinated extracts from the same operational sources are inefficient and wasteful. They generate similar, but different variations with inconsistent naming conventions and business rules. The conflicting results cause confusion, rework and reconciliation. In the end, decision-making based on independent data is often clouded by fear, uncertainty and doubt." It appears from the above that both Inmon and Kimball are of the opinion that independent or stand-alone data marts are of marginal use. However, for the most part, this is where the perception of similarity stops. You may discern later, as I have, that there are more similarities, but each of our data warehouse architects expresses them in a very different way. Inmon believes that Kimball's star-schema-only approach causes inflexibility and therefore leads to a brittle structure. He writes: "this basic lack of flexibility is at the heart of the weakness of the star schema model as the basis of the data warehouse ... When there is an enterprise need for data the star schema is not at all optimal. Taken together, a series of star schemas and multi-dimensional tables are brittle ... [They] cannot change gracefully over time." Mr. Inmon believes his approach, which uses the dependent data mart as the source for star schema usage, solves the problem of enterprise-wide access to the same data, which can change over time.
The relational data warehouse is best served by a relational [3NF] database design running on relational technology. This should be no surprise, since the DBMS technology the data warehouse runs on works best with a relational database design. The Kimball BUS architecture expresses that raw data is transformed into presentable information in the staging area, ever mindful of throughput and quality. "Staging begins with coordinated extracts from the operational source systems. Some staging kitchen activities are centralized, such as maintenance and storage of common reference data, while others may be distributed." (Data Warehouse Dining Experience, Intelligent Enterprise, Jan. 1, 2004.) The above indicates to this author that Kimball has gone beyond the individual star schema approach criticized by Inmon and, in fact, has described his multi-dimensional data warehouse. In this approach, the model contains atomic data and the summarized data, but its construction is based on business measurements, which enable disparate business departments to query the data from a higher level of detail to the lowest level without reprogramming. Although this description appears to indicate that the Kimball staging area is VERY similar to the Inmon data warehouse, the Kimball approach does not recommend a real, physically implemented data warehouse. His data warehouse is still the collection of data marts with their conformed dimensions. In Mastering Data Warehouse Design: Relational and Dimensional Techniques, by Claudia Imhoff, Nicholas Galemmo and Jonathan Geiger (Wiley, 2003), the authors analyze the Kimball approach as relying on star schemas for both atomic and aggregated storage. Summarizing this point of their research, the Data Warehouse Bus Architecture is said to consist of two types of data marts:
- Atomic Data Marts, which hold multi-dimensional data at the lowest level. These can also include aggregated data for improved query performance.
- Aggregated Data Marts, which can store data according to a core business process.

In both the Atomic and Aggregated Data Marts, the data is stored in a star schema design. Their description of the Kimball Bus Architecture seems to indicate that the Kimball approach still does not recognize a need for, nor require, a central data warehouse repository. The next article will highlight the differences between the two models regarding relational vs. multidimensional data.
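The conformed-dimension idea that ties the Bus Architecture together can be sketched as follows. This is a minimal illustration with invented names, not code from either author: one shared date dimension serves both the atomic mart and a quarterly rollup, so both report with identical attribute values.

```python
# Shared (conformed) date dimension: surrogate key -> attributes.
dim_date = {
    1: {"date": "2004-01-01", "quarter": "Q1"},
    2: {"date": "2004-04-01", "quarter": "Q2"},
}

# Atomic data mart: fact rows at the lowest grain (one row per sale).
fact_sales_atomic = [
    {"date_key": 1, "amount": 100.0},
    {"date_key": 1, "amount": 50.0},
    {"date_key": 2, "amount": 75.0},
]

def rollup_by_quarter(fact_rows):
    """Aggregate atomic facts to quarter level via the conformed dimension."""
    totals = {}
    for row in fact_rows:
        quarter = dim_date[row["date_key"]]["quarter"]
        totals[quarter] = totals.get(quarter, 0.0) + row["amount"]
    return totals

# An aggregated data mart is essentially the stored result of such a
# rollup; because both marts join to the same dim_date, their
# "quarter" labels and totals always agree.
fact_sales_aggregated = rollup_by_quarter(fact_sales_atomic)
print(fact_sales_aggregated)  # {'Q1': 150.0, 'Q2': 75.0}
```

This is why conformed dimensions substitute for a central repository in Kimball's view: consistency comes from the shared dimension tables rather than from a single physical warehouse.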

Layers in data warehouse architecture


George Albert

THE Data Warehouse Architecture (DWA) initially consisted of three layers, which met most of an organisation's needs. However, DWAs are now getting more complex and sophisticated to meet the growing need for "intelligence" by decision makers in the organisation. As a decision maker for IT in your organisation, how would you know which components (old and new) of the architecture are required? A close look at what each component does is worthwhile here. But first, let us start at the basics.

A DWA is a way of representing the overall structure of data, communication, processing and presentation that exists for end-user computing within the enterprise. The architecture is made up of a number of inter-connected parts, which include the operational database / external database layer, information access layer, data access layer, data directory (metadata) layer, process management layer, application messaging layer, data warehouse layer and data staging layer. Initially, a data warehouse could be operated with the first three layers. However, information has grown more complex, and the need for metadata, among other things, made more layers in the DWA necessary.

Metadata is data describing other data. It is essentially a tag to describe what is in, say, a column of similar data, such as sales. Metadata can be stored using a COBOL program, but the latest tool is Extensible Markup Language (XML) in an Internet environment. With the explosion of data, the need for information and the desire by top management to write into databases, it is ideal to have all the layers described below in the DWA.

Operational database / external database

Operational systems process data to support critical operational needs by processing a relatively small number of well-defined business transactions. But these generally historic systems have a limited focus and do not allow easy access to data. The data in these databases is also limited.
Hence, organisations are acquiring information on demographic, econometric, competitive and purchasing trends and blending it with the data they already have. The data acquired is stored in an external database layer.

Information access

The information access layer of the data warehouse architecture is the layer that the end-user deals with directly. In particular, it represents the tools that the end-user normally uses, such as Excel, Access, browsers, and the like. This layer also includes the hardware and software involved in displaying and printing reports, spreadsheets, graphs and charts for analysis and presentation.

Data access

The data access layer of the data warehouse architecture allows the information access layer to talk to the operational layer. This is done by interfacing between information access tools and operational databases. The language often used for this interaction is SQL or ASP. One of the keys to a data warehousing strategy is to provide end-users with "universal data access". In theory, universal data access means that end-users, regardless of location or information access tool used, should be able to access any or all of the data in the enterprise that is necessary for them to do their job. The access will also apply to suppliers and retailers in a B2B (business-to-business) scenario.

Metadata

In order to provide for universal data access, it is absolutely necessary to maintain some form of data directory or repository of metadata information. This helps the end-user to access the data without having to know the location and form of the data. For instance, if an end-user types "sales", the system will know what sales refers to and where it is located by referring to the metadata layer, and will display it to the user.

Process management

The process management layer is involved in scheduling the various tasks that must be accomplished to build and maintain the data warehouse and data directory information.

Application messaging

Application messaging is the transport system in the DWA. It involves more than just networking protocols. It can, for instance, be used to collect transactions or messages and deliver them to a certain location at a certain time.

Data warehouse

The core data warehouse is where the actual data used primarily for informational purposes resides. In some cases, one can think of the data warehouse simply as a logical or virtual view of data. In this layer, copies of operational and/or external data are actually stored in a form that is easy to access and is highly flexible.
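The metadata layer's "sales" lookup described above can be sketched as a simple catalogue. All names here (the catalogue entries, `warehouse.fact_sales`, the column and type) are invented for illustration:

```python
# Hypothetical sketch of a metadata (data directory) lookup: the
# end-user asks for "sales" and the catalogue resolves where that
# data lives and what shape it has, without the user knowing either.
metadata_catalog = {
    "sales": {
        "source": "warehouse.fact_sales",   # physical location
        "column": "sale_amount",
        "type": "DECIMAL(12,2)",
        "description": "Gross sales amount per transaction",
    },
}

def resolve(term):
    """Return the storage location and shape of a business term."""
    entry = metadata_catalog.get(term.lower())
    if entry is None:
        raise KeyError(f"No metadata registered for {term!r}")
    return entry

info = resolve("sales")
print(info["source"])  # warehouse.fact_sales
```

A real data directory would of course be far richer (lineage, refresh timestamps, access rules), but the principle is the same: the user names a business concept, and the metadata layer maps it to a physical location.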
Data warehouses can be stored on mainframes, but are increasingly being hosted on client/server platforms in the Internet world.

Data staging

This final layer includes the processes necessary to select, edit, summarise, combine and load data warehouse and information access data from operational and/or external databases. The complex programming involved in this layer has been reduced with the availability of off-the-shelf tools. The layer may also include programs to identify patterns in the data stored or being compiled every day.
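The select/edit/summarise/combine/load steps of the staging layer can be sketched as a tiny pipeline. The source rows, region names and forecast figures below are all invented; this is an outline of the flow, not a production ETL job:

```python
# Two "sources": an operational system and an acquired external feed.
operational_orders = [
    {"region": " east ", "amount": "100"},
    {"region": "WEST", "amount": "40"},
    {"region": "east", "amount": None},      # bad row, cleansed out below
]
external_forecast = [{"region": "east", "forecast": 120.0}]

def stage(rows):
    """Select and edit (cleanse), then summarise per region."""
    clean = [
        {"region": r["region"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
        if r["amount"] is not None          # select: drop unusable rows
    ]
    totals = {}
    for r in clean:                         # summarise: total per region
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals

def load(totals, external):
    """Combine with external data and 'load' the warehouse rows."""
    forecasts = {e["region"]: e["forecast"] for e in external}
    return [
        {"region": region, "actual": actual, "forecast": forecasts.get(region)}
        for region, actual in sorted(totals.items())
    ]

warehouse_rows = load(stage(operational_orders), external_forecast)
print(warehouse_rows)
```

Each function corresponds to one or two of the article's staging activities; off-the-shelf ETL tools generate and schedule exactly this kind of logic so it need not be hand-coded.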

http://www.ibm.com/developerworks/db2/library/techarticle/dm0505cullen/index.html?ca=drs-
