Data warehouse: a large reservoir of detailed and summary data that describes the organization and its activities, organized by the various business dimensions in a way that facilitates easy retrieval of information describing activities.
Data mart: a subset of the data warehouse, tailored to meet the specialized needs of a particular group of users.
Top-down approach to data warehouse development: the data warehouse is created first, and the data marts are then derived from it.
Bottom-up approach to data warehouse development: the data marts are created first and then integrated.
- Keep the warehouse data current.
- Ensure that the warehouse data is accurate.
- Keep the warehouse data secure.
- Make the warehouse data easily available to authorized users.
- Maintain descriptions of the warehouse data so that users and system developers can understand the meaning of each element.
Major task of a traditional relational DBMS (OLTP): day-to-day operations such as purchasing, inventory, banking, manufacturing, payroll, registration, and accounting.
Major task of a data warehouse system (OLAP): data analysis and decision making.
- User and system orientation: customer vs. market
- Data contents: current, detailed vs. historical, consolidated
- Database design: ER + application vs. star + subject
- View: current, local vs. evolutionary, integrated
- Access patterns: update vs. read-only but complex queries
OLTP vs. OLAP:
- Users: clerk, IT professional vs. knowledge worker
- Function: day-to-day operations vs. decision support
- Database design: application-oriented vs. subject-oriented
- Data: current, up-to-date, detailed, flat relational, isolated vs. historical, summarized, multidimensional, integrated, consolidated
- Usage: repetitive vs. ad hoc
- Access: read/write, index/hash on primary key vs. lots of scans
- Unit of work: short, simple transaction vs. complex query
- Number of records accessed: tens vs. millions
- Number of users: thousands vs. hundreds
- Database size: 100 MB to GB vs. 100 GB to TB
- Metric: transaction throughput vs. query throughput, response
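The contrast between the two access patterns can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumed names: the `orders` table, its columns, and the sample rows are hypothetical. The OLTP-style function updates one row through its primary key inside a short transaction, while the OLAP-style function is a read-only query that scans every row and aggregates.

```python
import sqlite3

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory orders table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
        "year INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "East", 2022, 100.0), (2, "West", 2022, 250.0),
         (3, "East", 2023, 175.0), (4, "West", 2023, 300.0)],
    )
    conn.commit()
    return conn

def oltp_update(conn: sqlite3.Connection, order_id: int, new_amount: float) -> None:
    """OLTP pattern: a short read/write transaction that touches one row by primary key."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "UPDATE orders SET amount = ? WHERE order_id = ?", (new_amount, order_id)
        )

def olap_summary(conn: sqlite3.Connection) -> list:
    """OLAP pattern: a read-only query that scans all rows and aggregates them."""
    return conn.execute(
        "SELECT region, year, SUM(amount) FROM orders "
        "GROUP BY region, year ORDER BY region, year"
    ).fetchall()

conn = build_demo_db()
oltp_update(conn, 1, 120.0)
print(olap_summary(conn))
# [('East', 2022, 120.0), ('East', 2023, 175.0), ('West', 2022, 250.0), ('West', 2023, 300.0)]
```

In a real deployment the two workloads would usually run on separate systems; a single in-memory database is used here only to keep the sketch self-contained.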
Staging area: the area where data is prepared to be moved into the warehouse data repository and the metadata repository.
Metadata: data about data, or descriptions of the data in the data warehouse.
Exhibit 4.1: A Data Warehouse System Model (labeled components include the staging area and the metadata repository)
Warehouse data repository: where the warehouse data is stored within the computer system or systems.

Data Types
Data content: the customer picture, a compilation of geographic, demographic, activity, psychographic, and behavioral data.
Data characteristics: the types of data to be processed, including considerations of data granularity, data hierarchies, and data dimensions.
Data Granularity: the degree of detail that is represented by the data; the greater the detail, the finer the granularity.
Data Hierarchies: arise because multiple attributes can describe a single entity. An attribute is a data element that identifies or describes an occurrence of a data entity (e.g., a particular customer is identified by a customer number attribute).
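Granularity can be illustrated with a minimal Python sketch, assuming hypothetical daily sales records. Summing them into (year, month) buckets produces coarser-grained data; the daily detail cannot be recovered from the monthly totals, which is one reason a warehouse may keep the finer grain.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fine-grained records: (sale date, amount), one row per sale.
daily_sales = [
    (date(2024, 1, 3), 120.0),
    (date(2024, 1, 17), 80.0),
    (date(2024, 2, 5), 200.0),
]

def to_monthly(records):
    """Coarsen the granularity: sum daily amounts into (year, month) buckets."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month)] += amount
    return dict(totals)

print(to_monthly(daily_sales))  # {(2024, 1): 200.0, (2024, 2): 200.0}
```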
Data Dimensions: for example, a manager can query the data warehouse for a display of data according to salesperson, customer, product, and time.
Exhibit 4.3: Every Data Record Contains the Time Element
Customer: customer number, customer age, customer gender, customer marital status, customer number of dependents, customer education, customer dwelling type, customer state, customer city, customer zip code
Records and their time elements: customer sales order (sales order date); customer statement (statement date); warehouse shipping order (date shipped)
METADATA REPOSITORY
The metadata repository describes the flow of data from the time that the data is captured until it is archived. For example, the metadata for the customer number attribute would describe its format, editing rules, and so on.

TYPES OF METADATA
Metadata for Users (analysis): identification of the source systems, the time of the last update, the different report formats that are available, and how to find data in the data warehouse.
Metadata for Systems Developers: data that allows developers to maintain, revise, and reengineer the data warehouse system, including the various rules that were employed in creating the warehouse data repository and the rules for extraction, cleansing, transforming, purging, and archiving.
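A metadata-repository entry for the customer number attribute might look like the following Python sketch. The field names and the "six numeric digits" editing rule are assumptions made for illustration, not a format prescribed by the text; the point is that the rule lives in the metadata and can be applied mechanically to incoming values.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeMetadata:
    """One metadata-repository entry describing a warehouse attribute (hypothetical fields)."""
    name: str
    data_format: str    # human-readable description of the format
    edit_rule: str      # regular expression that a valid value must fully match
    source_system: str  # where the value is captured

    def is_valid(self, value: str) -> bool:
        """Apply the editing rule recorded in the metadata to a candidate value."""
        return re.fullmatch(self.edit_rule, value) is not None

# Assumed rule for illustration: customer numbers are exactly six digits.
customer_number_meta = AttributeMetadata(
    name="customer_number",
    data_format="6 numeric digits",
    edit_rule=r"\d{6}",
    source_system="order entry",
)

print(customer_number_meta.is_valid("004217"))  # True
print(customer_number_meta.is_valid("42A17"))   # False
```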
Object diagrams and entity-relationship diagrams; use cases, use case diagrams, and data flow diagrams.
CASE Tools and DBMSs
CASE stands for computer-aided system engineering and is a way to use the computer to develop systems. CASE tools and database management systems (DBMSs) include a data dictionary component, which contains excellent descriptions of the data in the database or data warehouse.
Dimension table: a list of all of the attributes that identify and describe a particular entity.
Exhibit 4.4: A Sample Dimension Table
Fact table: a list of all the facts that relate to some type of the organization's activity.
Exhibit 4.5: A Sample Fact Table
Customer: customer number, customer name, customer phone number, customer e-mail address, customer territory, customer credit code, customer standard industry code, customer city, customer state, customer zip code
Commercial Sales Facts: actual sales units, budgeted sales units, actual sales amount, budgeted sales amount, sales discount amount, net sales amount, sales commission amount, sales bonus amount, sales tax amount
INFORMATION PACKAGES
Information package: a table that is maintained in the data warehouse repository and identifies both the dimensions and the facts that relate to a business activity.
Exhibit 4.6: Information Package Format
Key: a number, such as a customer number, that identifies a particular occurrence of the dimension.
Exhibit 4.7: A Sample Information Package
Facts: actual sales units, budgeted sales units, actual sales amount, budgeted sales amount, sales discount amount, net sales amount, sales commission amount, sales bonus amount, sales tax amount
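An information package can be sketched as a small Python data structure. The class names, the `fact_table_columns` helper, and the sample customer and time dimensions with their hierarchies are hypothetical; only the fact names are taken from the sample above. The helper shows how the dimension keys end up at the top of the fact table, followed by the measurable facts.

```python
from dataclasses import dataclass, field

@dataclass
class Dimension:
    name: str
    key: str              # identifies one occurrence of the dimension
    hierarchy: list[str]  # levels, coarsest to finest (hypothetical)

@dataclass
class InformationPackage:
    business_activity: str
    dimensions: list[Dimension] = field(default_factory=list)
    facts: list[str] = field(default_factory=list)

    def fact_table_columns(self) -> list[str]:
        """Fact table layout: one key per dimension, then the measurable facts."""
        return [d.key for d in self.dimensions] + self.facts

package = InformationPackage(
    business_activity="Commercial sales",
    dimensions=[
        Dimension("Customer", "customer_number", ["state", "city", "zip_code"]),
        Dimension("Time", "date_key", ["year", "quarter", "month", "day"]),
    ],
    facts=["actual_sales_units", "budgeted_sales_units", "net_sales_amount"],
)

print(package.fact_table_columns())
# ['customer_number', 'date_key', 'actual_sales_units', 'budgeted_sales_units', 'net_sales_amount']
```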
STAR SCHEMAS
Star schema: the arrangement of an information package that usually identifies multiple dimension tables for a single fact table and has the appearance of a star, with the fact table in the center and the dimension tables forming the points.
Exhibit 4.8: Star Schema Format
Foreign keys: a means of linking the fact table to the dimension tables through the keys identified at the top of the fact table; these keys identify other, foreign tables as opposed to the fact table itself.
Exhibit 4.9: A Sample Star Schema
Star schema format: each dimension table (Dimension 1 through Dimension n) lists its name, key, and hierarchy; the central fact table lists the business activity name, the keys Dimension 1 key through Dimension n key, and the measurable facts (Measurable fact 1 through Measurable fact n).
Customer payment; Product: product key, product name, product unit price, product quantity
Example star schema for sales:
Sales fact table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
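The sales star schema above can be tried out with Python's `sqlite3` module. The table and column names follow the example; the sample rows are invented, and the time dimension is not modelled (the `time_key` column is kept but not joined). The query shows the typical star join: the fact table is joined directly to each dimension it needs, here to total units sold by brand and city.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_type TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       province_or_state TEXT, country TEXT);
-- Fact table: foreign keys to each dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key INTEGER,  -- would reference a time dimension (not modelled here)
    item_key INTEGER REFERENCES item(item_key),
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER, dollars_sold REAL, avg_sales REAL
);
"""

def build_star(conn: sqlite3.Connection) -> None:
    """Create the star schema and load hypothetical sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)", [
        (1, "Laptop", "Acme", "computer", "wholesale"),
        (2, "Phone", "Zenith", "mobile", "retail"),
    ])
    conn.execute("INSERT INTO branch VALUES (1, 'Downtown', 'store')")
    conn.executemany("INSERT INTO location VALUES (?, ?, ?, ?, ?)", [
        (1, "1 Main St", "Toronto", "Ontario", "Canada"),
        (2, "9 King St", "Vancouver", "British Columbia", "Canada"),
    ])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, 10, 9000.0, 900.0),
        (1, 2, 1, 1, 5, 2500.0, 500.0),
        (2, 1, 1, 2, 4, 3600.0, 900.0),
    ])

def units_by_brand_and_city(conn: sqlite3.Connection) -> list:
    """Star join: the fact table joins directly to each dimension it needs."""
    return conn.execute("""
        SELECT i.brand, l.city, SUM(f.units_sold)
        FROM sales_fact AS f
        JOIN item AS i ON f.item_key = i.item_key
        JOIN location AS l ON f.location_key = l.location_key
        GROUP BY i.brand, l.city
        ORDER BY i.brand, l.city
    """).fetchall()

conn = sqlite3.connect(":memory:")
build_star(conn)
print(units_by_brand_and_city(conn))
# [('Acme', 'Toronto', 10), ('Acme', 'Vancouver', 4), ('Zenith', 'Toronto', 5)]
```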
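Example snowflake schema for sales (the item and location dimensions are normalized into further tables):
Sales fact table: time_key, item_key, branch_key
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: referenced from location through city_key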
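Normalizing a dimension adds a join hop. In this `sqlite3` sketch, `supplier_type` has moved out of `item` into its own `supplier` table, so a query grouped by supplier type must go fact -> item -> supplier. The sample rows are hypothetical, and the `units_sold` measure is an assumption, since the snowflake example lists only the fact table's keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
-- item is normalized: supplier_type lives in the supplier table instead.
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                   type TEXT, supplier_key INTEGER REFERENCES supplier(supplier_key));
-- units_sold is an assumed measure for this sketch.
CREATE TABLE sales_fact (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                         units_sold INTEGER);
INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item VALUES (1, 'Laptop', 'Acme', 'computer', 1),
                        (2, 'Phone', 'Zenith', 'mobile', 2),
                        (3, 'Tablet', 'Acme', 'computer', 1);
INSERT INTO sales_fact VALUES (1, 1, 1, 10), (1, 2, 1, 5), (2, 3, 1, 7);
""")

# Snowflake join: fact -> item -> supplier (one more hop than in a star schema).
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item AS i ON f.item_key = i.item_key
    JOIN supplier AS s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(rows)  # [('retail', 5), ('wholesale', 17)]
```

The trade-off is the usual one: the snowflake form stores each supplier type once, while the star form avoids the extra join at query time.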
Summary information: preprocessed data that provides the user with exactly the content that is needed.
Top-down navigation: the user seeks more detail in an effort to understand the summary information.
Roll-up navigation: the user summarizes data to see the forest rather than the trees, or to prepare summary graphs.
Drill-across navigation: the user moves from one data hierarchy to another, e.g., from information on customer sales to salesperson sales and then to product sales.
Exhibit 4.10: Navigation Paths
Detailed information (net sales for salesperson 3742) → drill through → detailed data (sales units for salesperson 3742)
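The navigation paths can be mimicked on a small set of detail records. This Python sketch assumes hypothetical records of the form (salesperson, product, units, net sales) and hypothetical helper names. Rolling up totals the detail by a chosen key; drilling down returns the rows behind one salesperson's figure; switching the grouping key from salesperson to product loosely corresponds to drilling across to another hierarchy.

```python
from collections import defaultdict

# Hypothetical detail records: (salesperson_id, product, units, net_sales)
sales = [
    (3742, "widget", 10, 500.0),
    (3742, "gadget", 4, 320.0),
    (1188, "widget", 6, 300.0),
]

def roll_up(records, key_index, value_index):
    """Summarize detail rows into totals per key (e.g., per salesperson)."""
    totals = defaultdict(float)
    for row in records:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

def drill_down(records, salesperson_id):
    """Return the detailed rows behind one salesperson's summary figure."""
    return [row for row in records if row[0] == salesperson_id]

print(roll_up(sales, 0, 3))     # {3742: 820.0, 1188: 300.0}
print(drill_down(sales, 3742))  # [(3742, 'widget', 10, 500.0), (3742, 'gadget', 4, 320.0)]
print(roll_up(sales, 1, 3))     # {'widget': 800.0, 'gadget': 320.0}
```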
Information systems security: protection against damage, destruction, theft, and misuse.
Exhibit 4.11: The Security Action Cycle
The Corporate Security Environment
Deterrence: security policies and procedures that are intended to deter security violations, such as guidelines for proper system use and the requirement that users change their passwords periodically.
Prevention: measures aimed at those persons who ignore deterrence, including locks on computer rooms, user passwords, and file permissions.
Detection: proactive actions include system audits, reports of suspicious activity, and virus scanning software; reactive actions take the form of investigations.
Remedies: responses with warnings, reprimands, termination of employment, or legal action.
1. Deterrence
2. Prevention
3. Detection
4. Remedies
(Other labels in Exhibit 4.11: deterrence feedback, prevented abuse, undetected abuse, unpunished abuse.)
Data Warehouse Security Measures
Network security: using procedures such as firewalls to restrict access to the network that houses the servers and the data files, databases, data warehouses, and data marts.
Data security: controls on obtaining access to data once access to the network has been achieved; data files may be located on multiple servers on the network, and the user must provide a second password.
Database or data warehouse security: the security checks of the database management system (DBMS), which may include a third password, verification of the user name, and verification of access to particular data tables, records, and even record fields.
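The three layers can be read as successive checks that must all pass. The following Python sketch is purely illustrative: the allowed network, the password flag, the user name, and the table/field grants are hypothetical, and a real system would delegate each layer to a firewall, server authentication, and the DBMS's own privilege system rather than to application code.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass
class AccessRequest:
    source_ip: str
    user: str
    data_password_ok: bool  # outcome of the second (data-server) password check
    table: str
    field_name: str

# Hypothetical policy: networks allowed to reach the warehouse servers,
# and the (table, field) pairs each user may read.
ALLOWED_NETWORKS = [ip_network("10.0.0.0/8")]
FIELD_GRANTS = {
    "analyst1": {("sales_fact", "units_sold"), ("sales_fact", "dollars_sold")},
}

def network_check(req: AccessRequest) -> bool:
    """Layer 1: firewall-style restriction on where requests may come from."""
    addr = ip_address(req.source_ip)
    return any(addr in net for net in ALLOWED_NETWORKS)

def data_check(req: AccessRequest) -> bool:
    """Layer 2: access to the data files on the network's servers."""
    return req.data_password_ok

def dbms_check(req: AccessRequest) -> bool:
    """Layer 3: the DBMS verifies the user name and access to the table and field."""
    return (req.table, req.field_name) in FIELD_GRANTS.get(req.user, set())

def authorize(req: AccessRequest) -> bool:
    # Every layer must pass; failing any one of them denies access.
    return network_check(req) and data_check(req) and dbms_check(req)

inside = AccessRequest("10.1.2.3", "analyst1", True, "sales_fact", "units_sold")
outside = AccessRequest("203.0.113.5", "analyst1", True, "sales_fact", "units_sold")
print(authorize(inside))   # True
print(authorize(outside))  # False
```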