
BI & DM Manual

1. Hardware and Software Requirements


Hardware
CPU: To keep things moving, you need at least a Pentium III class processor running at a minimum of 1 GHz. For serious work, plan on a Pentium 4 processor offering at least 2 GHz.
Memory: Because sufficient memory is the foundation of any well-performing relational database, make sure 1 GB or more of memory is provided. SQL Server will use as much memory as it needs, but no more.
Disk: Given that relational databases use disk drives as their primary storage mechanism, it is difficult to recommend a fixed value for the right amount of available disk capacity; every site and application is different. However, note that a full installation of SQL Server and related tools consumes more than 2 GB before any of your data arrives.

Software
Operating system: Microsoft gives you a fairly wide choice of operating systems (both 32-bit and 64-bit) that can run SQL Server:
Windows Server 2008 (Standard, Data Center, Enterprise)
Windows Server 2003 (Standard, Data Center, Enterprise)
Windows XP Professional Edition
Windows Vista (Ultimate, Home Premium, Home Basic, Enterprise, Business)
Apply the latest service pack for your operating system; in many cases, SQL Server depends on these patches. SQL Server 2008 ships with Business Intelligence Development Studio (BIDS) and SQL Server Management Studio.


2. Introduction
The Business Intelligence process goes through the following components:
1. Data Integration
2. Data Analysis
3. Reporting
Data Integration is achieved using SQL Server Integration Services (SSIS), provided through the Business Intelligence Development Studio (BIDS) tool. Integration Services is a platform for building enterprise-level data integration and data transformation solutions. It includes a rich set of built-in tasks and transformations, and tools for constructing packages. Integration Services relies on components such as the Integration Services package, data flow, control flow, and connection managers; these components are threaded together to achieve the desired functionality. The data flow consists of three main components:
1. Data Flow Sources
2. Data Flow Transformations
3. Data Flow Destinations

1. Data Flow Sources


Data Flow Sources are designed to bring data from external sources into the Integration Services data flow. A data flow source reads an external data source, such as a flat file or a table in a relational database, and passes the data to a data flow transformation. The Data Flow Sources are shown in Fig. 2.1:

Fig.2.1 Data Flow Sources


ADO NET Source: Extracts data from a relational database by using a .NET provider.

Excel Source: Extracts data from an Excel workbook using Excel Connection Manager.

Flat File Source: Extracts data from flat files (i.e., text files) using Flat File Connection Manager.

OLEDB Source: Extracts data from a relational database using an OLEDB provider.

Raw File Source: Extracts data from a raw file using a direct connection.
XML Source: Reads data from an XML data source by specifying the location of the XML file.

2. Data Flow Transformations


Data flow transformations change the data obtained from the data flow source according to the transformation chosen. They are:
Aggregate: Aggregates and groups values in a dataset. It can perform operations such as average, count, count distinct, group by, maximum, minimum, and sum.
Audit: Adds audit information to rows in a dataset.
Cache Transform: Writes data from the data flow to a cache, typically for use by the Lookup transformation.
Character Map: Applies string operations to character data.
Conditional Split: Evaluates and directs rows in a dataset.
Copy Column: Copies columns.
Data Conversion: Converts columns to different data types and adds the converted columns to the dataset.
Data Mining Query: Performs prediction queries against data mining models.
Derived Column: Updates column values using expressions.
Export Column: Exports column values from rows in a dataset to files.
Fuzzy Grouping: Groups rows in a dataset that contain similar values.
Fuzzy Lookup: Looks up values in a reference dataset by using fuzzy matching.
Import Column: Imports data from files to rows in datasets.


Fig.2.2 Data Flow Transformations
Lookup: Looks up values in a reference dataset by using exact matching.
Merge: Merges two sorted datasets.
Merge Join: Merges two datasets by using a join.
Multicast: Creates copies of a dataset.
OLEDB Command: Executes an SQL command for each row in a dataset.
Percentage Sampling: Creates a sample dataset by extracting a percentage of rows from a dataset.

Pivot: Pivots a dataset to create a less normalized representation of the data.
Row Count: Counts the rows in a dataset.
Row Sampling: Creates a sample dataset by extracting a number of rows from a dataset.

Script Component: Executes a custom script.
Slowly Changing Dimension: Updates a slowly changing dimension.
Sort: Sorts data.
Term Extraction: Extracts terms from data in a column.
Term Lookup: Counts the frequency with which terms in a reference table appear in a dataset.

Union All: Merges multiple datasets.
Unpivot: Creates a more normalized representation of a dataset.
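Many of these transformations have close relational equivalents, which can help in understanding what they do. As a rough illustration (the table and column names below are hypothetical, not part of the manual's exercises), the Aggregate and Sort transformations correspond to T-SQL like the following:

-- Hypothetical Sales table; the names below are illustrative only.
-- Aggregate transformation: group and summarize, as in GROUP BY.
SELECT Region,
       COUNT(*)    AS OrderCount,   -- Count operation
       SUM(Amount) AS TotalAmount,  -- Sum operation
       AVG(Amount) AS AvgAmount     -- Average operation
FROM dbo.Sales
GROUP BY Region;                    -- Group by operation

-- Sort transformation: order the rows of a dataset.
SELECT Region, Amount
FROM dbo.Sales
ORDER BY Amount DESC;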

3. Data Flow Destinations


The data flow destination writes the data that flows to it, after undergoing various transformations, to an external data store or to an in-memory dataset. They are:

Fig.2.3 Data Flow Destinations
ADO NET Destination: Writes to a database using an ADO.NET provider.
Data Mining Model Training: Passes the data through the data mining model algorithms to train data mining models.
Data Reader Destination: Creates and populates an ADO.NET in-memory dataset.

Dimension Processing: Loads and processes a SQL Server Analysis Services dimension.

Excel Destination: Loads data into an Excel workbook.
Flat File Destination: Loads data into a flat file (i.e., a text file).
OLEDB Destination: Loads data into relational databases using an OLEDB provider.
Partition Processing: Loads and processes a SQL Server Analysis Services partition.

Raw File Destination: Outputs data to a raw file.
Recordset Destination: Creates and populates an in-memory ADO recordset.
SQL Server Compact Destination: Loads data into a SQL Server Compact database.
SQL Server Destination: Loads data into a SQL Server database.


3. Data Integration Exercises


1. Move the employee data present in a relational database to a flat file.
Steps involved:
1. Click Start -> All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio (BIDS), as shown in Fig.3.1.

Fig.3.1: Open the Business Intelligence Development Studio
2. The Start Page of BIDS will open, as shown in Fig.3.2.

Fig.3.2: Start Page



3. From the menu bar, click File -> New -> Project. A New Project dialog box will open. This is shown in Fig.3.3 and Fig.3.4.

Fig.3.3: Create a New Project

Fig.3.4: A New Project Window
4. Select the Integration Services Project template from the Visual Studio installed templates. Set the name and location where the project is to be saved, then press the OK button.


5. In the resulting window, four panes can be seen. Select the Data Flow pane. In the Data Flow pane, click the link to add a new Data Flow task. This is shown in Fig.3.5.


Fig.3.5: Selecting the data flow pane and adding a data flow task 6. Go to the toolbox palette at top left of the window to drag the data source and data destination to the designer interface. This is shown by Fig.3.6 and Fig.3.7.

Fig.3.6: Source and Destination


Fig.3.7: Source and Destination 7. Click on the OLEDB Source and then drag the green pointer from the OLEDB Source (relational database) and drop it on the Flat File Destination (Text File). After connecting source with the destination, double click the OLEDB source. It will open a new dialog box. Then Click on New button as shown by the Fig.3.8.


Fig.3.8: Set the OLEDB Connection Manager


8. Click on the New Button to configure the OLEDB Connection Manager. Fig.3.9 illustrates this.


Fig.3.9: OLEDB Connection
9. If the software is installed only on a local machine (i.e., not in client/server mode), type localhost as the Server Name; otherwise, type the IP address/machine name of the server (client/server mode).


Fig.3.10: Set Server Name


10. If the Server Name is localhost, check the Use Windows Authentication radio button. Select the database name and then click the Test Connection button. If everything is correct, a success message will be displayed (as shown in Fig. 3.11); otherwise an error message will be displayed. If an IP address is given as the Server Name, check the Use SQL Server Authentication option instead, specify the username and password, select the database name, and test the connection.


Fig.3.11: Selecting the database from Server or local machine

11. Click the OK button of the success message dialog box. This action will enable the OK button of the OLEDB Connection Manager dialog box (see Fig.3.9). Click the OK button on that dialog box.
12. Click the drop-down menu and select the table containing the employee details. A preview of the table can be seen by clicking the Preview button. Then click the OK button. This is shown in Fig.3.12. Now the OLEDB Source is set to the employee details.




Fig.3.12: Selecting the table and previewing the table
13. Now, double-click the Flat File Destination. It will open the Flat File Destination Editor, as shown in Fig.3.13. Click the New button.


Fig.3.13: Flat File Destination Editor


14. After clicking the New button, it will ask for the format of the file in which the data is to be stored. By default, this is Delimited. If so, proceed by pressing the OK button.

Fig.3.14: Choosing the Flat File Format

15. Set an appropriate connection manager name and description. Click the Browse button to select the text file in which the data from the OLEDB Source is to be stored. If the text file has not been created, create it. Then check the option Column names in the first data row. If this option is checked, then when the data is transferred from the OLEDB Source, the first row in the text file will contain the column names as they appear in the table. For example, if Name, Age, Designation, and Salary are the column names in the table, the same will be the column names in the text file. Click the OK button. See Fig.3.15.
16. Finally, go to the Debug option in the menu bar and click Start Debugging to execute the package. Alternatively, function key F5 can be pressed. The data transfer is successful if green is displayed on the screen, as shown in Fig.3.16. It also shows the number of rows transferred from the OLEDB Source to the Flat File Destination.




Fig. 3.15: Selection of the text file

Fig.3.16: Moving the data from source to destination


2. Move the student details in the flat file to the relational database.
Steps involved:
1. Create a new package in the same project that was created in the first exercise. Go to the Solution Explorer window on the right side, right-click SSIS Packages, and select New SSIS Package. A new workspace will open. Drag a Flat File Source and an OLEDB Destination from the Toolbox and proceed as in the previous exercise. Fig.3.17 and Fig.3.18 demonstrate this step.


Fig. 3.17: Creation of a new package in the existing project


Fig.3.18: Rename the package


3. Move data from a flat file (text file) to an Excel file.
Steps involved:
1. Create a new project, or create a new package in the existing project. Follow steps 1-5 of Exercise 1.
2. Drag the Flat File Source and Excel Destination and drop them on the Data Flow pane.
3. The data types of the text file and the Excel file differ. To overcome this problem, a Data Conversion transformation has to be used between the Flat File Source and the Excel Destination.

Fig. 3.19: Data conversion between the flat file and the Excel file.

4. Make the connections as shown in Fig. 3.19. Configure the Flat File Source and Excel Destination as mentioned in Exercise 1. Double-click the Data Conversion transformation. The Data Conversion Transformation Editor dialog box will open, as shown in Fig. 3.20.
5. Columns in the text file are of the String [DT_STR] data type. While moving the data to the Excel file, all of them have to be converted to Unicode String [DT_WSTR]. Check all four columns: EmpNo, Name, Age, and Gender. Then change the data type from String [DT_STR] to Unicode String [DT_WSTR], as shown in Fig. 3.21. Click the OK button after setting the required data type.
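The DT_STR to DT_WSTR conversion is the SSIS counterpart of casting single-byte strings to Unicode. A rough T-SQL analogy, using the four column names from step 5 (the staging table name is hypothetical):

-- DT_STR corresponds to VARCHAR, DT_WSTR to NVARCHAR (Unicode).
-- dbo.EmployeeTextStaging is a hypothetical name for illustration.
SELECT CAST(EmpNo  AS NVARCHAR(10)) AS EmpNo,
       CAST(Name   AS NVARCHAR(50)) AS Name,
       CAST(Age    AS NVARCHAR(10)) AS Age,
       CAST(Gender AS NVARCHAR(10)) AS Gender
FROM dbo.EmployeeTextStaging;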


Fig.3.20: Data Conversion Transformation Editor

Fig. 3.21: Changing the data type as required


6. Double-click the Excel Destination. The Excel Destination Editor dialog box will open. Click the New button. Fig. 3.22 illustrates this step.


Fig. 3.22: Excel Destination Editor

7. Specify the name of the Excel file where the data will be stored. Check the option First row has column names. Click the OK button.


Fig. 3.23: Excel Connection Manager


8. In the Excel Destination Editor dialog box, click the New button (Fig.3.24). It will open the Create Table dialog box; perform the tasks shown in Fig.3.25. The resulting window will be as shown in Fig.3.26. Click the OK button.


Fig. 3.24: Excel Destination Editor.


Fig. 3.25: Create Table


Fig. 3.26: Table creation after editing

9. Select the name of the Excel sheet and click the Mappings option on the left side of the dialog box. See Fig. 3.27.



Fig.3.27: Selecting the name of the excel sheet


10. Map the input columns with the changed data type to the available output columns, as shown in Fig. 3.28, and then click the OK button.

Fig.3.28: Mapping the Input Columns to Output columns with changed data type.

11. Finally, press the Debug button and execute the package. The data from the flat file will be moved to the Excel file.
Note: In the above exercises, the data sources (flat file, OLEDB, Excel) must already contain data, such as student details, employee details, or sales details, before any task is performed.
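To satisfy this prerequisite for Exercise 1, a small employee table can be created in the source database beforehand. A minimal sketch in T-SQL, using the column names mentioned in step 15 (the table name and sample rows are illustrative, not prescribed by the manual):

-- Run in SQL Server Management Studio against the source database.
CREATE TABLE dbo.Employee (      -- hypothetical table name
    Name        VARCHAR(50),
    Age         INT,
    Designation VARCHAR(50),
    Salary      MONEY
);

INSERT INTO dbo.Employee (Name, Age, Designation, Salary)
VALUES ('Anil',  28, 'Engineer', 35000),
       ('Bina',  34, 'Manager',  52000),
       ('Chand', 41, 'Analyst',  47000);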


4. Some More Transformations


In Exercise 3, the Data Conversion transformation was used to convert the data to the required data type. This section describes some transformations other than Data Conversion that help in manipulating the data during the data integration phase.

Character Map
The Character Map transformation enables us to modify the contents of character-based columns. The modified column can be placed in the data flow in place of the original column, or it can be added to the data flow as a new column. The following character mappings are available:
Lowercase: changes all the characters to lowercase.
Uppercase: changes all the characters to uppercase.
Byte Reversal: reverses the byte order of each character.
Hiragana: maps Katakana characters to Hiragana characters.
Katakana: maps Hiragana characters to Katakana characters.
Half Width: changes double-byte characters to single-byte characters.
Full Width: changes single-byte characters to double-byte characters.
Linguistic Casing: applies linguistic casing rules instead of system casing rules.
Simplified Chinese: maps traditional Chinese characters to simplified Chinese.
Traditional Chinese: maps simplified Chinese characters to traditional Chinese.

Multiple character mappings can be applied to a single column at the same time. However, a number of mappings are mutually exclusive; for example, it doesn't make sense to use both the Lowercase and Uppercase mappings on the same column. The Character Map Transformation Editor dialog box is shown in Fig. 4.1.
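The Lowercase and Uppercase mappings behave like the standard string functions in T-SQL. A rough analogy (the table name is hypothetical):

-- Uppercase and Lowercase mappings expressed as T-SQL functions;
-- dbo.Student is a hypothetical table used only for illustration.
SELECT Name,
       UPPER(Name) AS NameUpper,   -- Uppercase mapping
       LOWER(Name) AS NameLower    -- Lowercase mapping
FROM dbo.Student;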

Conditional Split
The Conditional Split transformation enables us to split the data flow into multiple outputs. In the Conditional Split Transformation Editor dialog box, shown in Fig. 4.2, conditions are defined for each branch of the split. When the package executes, each row in the data flow is compared against the conditions in order. When the row meets a set of conditions, the row is sent to that branch of the split. In Fig. 4.2, if the age is less than 20 years, the row is sent to the less than 20 output; if the age is greater than 20 years, the row is sent to the greater than 20 output.

Fig.4.1: The Character Map Transformation Editor Dialog box

Fig. 4.2: Conditional Split Transformation Editor Dialog box
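The split in Fig. 4.2 is roughly equivalent to routing rows with two complementary filters. A sketch under the same age condition (the table name is hypothetical):

-- Each branch of the Conditional Split corresponds to one filter;
-- dbo.Patient is a hypothetical table for illustration.
SELECT * FROM dbo.Patient WHERE Age < 20;   -- "less than 20" output
SELECT * FROM dbo.Patient WHERE Age >= 20;  -- "greater than 20" output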


Copy Column
The Copy Column transformation is used to create new columns in the data flow that are copies of existing columns. The new columns can then be used later in the data flow for calculations, transformations, or mapping to columns in the data destination. Fig.4.3 shows the Copy Column Transformation Editor dialog box with the Gender column being copied to a new column called GenderStatistics.

Fig.4.3: Copy Column Transformation Editor Dialog Box

Lookup
The Lookup transformation looks for values (provided by a lookup table) in the data source. By default, the Lookup transformation loads the entire source lookup table into a cache for faster processing. If the source lookup table is too large to be loaded into the cache completely, a restriction can be set on the amount of memory used. Fig. 4.5 shows the Lookup Transformation Editor. In addition, if only a portion of the records in the source lookup table is needed to resolve the lookups for a given Lookup transformation, only the required portion of the source lookup table can be loaded into memory. The Configure Error Output dialog box lets us determine whether an unresolved lookup is ignored, sent to the error output, or causes the transformation to fail.

Fig. 4.5: Lookup Transformation Editor
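An exact-match lookup corresponds closely to a join against the reference table. A rough T-SQL analogy (all table and column names are hypothetical):

-- Lookup as a join: each source row is matched against the
-- reference table on an exact key. Names are illustrative only.
SELECT s.OrderID,
       s.ProductCode,
       r.ProductName            -- value retrieved by the lookup
FROM dbo.Orders AS s
LEFT JOIN dbo.ProductRef AS r   -- LEFT JOIN keeps unresolved rows,
    ON r.ProductCode = s.ProductCode;  -- much like "ignore failure"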

Merge
The Merge transformation merges two data flows together. For the Merge transformation to work properly, both input data flows must be sorted using the same sort order. This can be done by using the Sort transformation in each data flow prior to the Merge transformation.

Fig. 4.6: Merge transformation Editor


Fig.4.6 shows two lists of names being merged together. Each input is sorted by Name. When the records from the two inputs are merged, the resulting output will also be in Name order. All of the rows in both input data flows are present in the merged output. For example, if 10 rows are in the first input data flow and 15 rows are in the second, there will be 25 rows in the output data flow.
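Because both inputs are sorted and no rows are dropped, the result matches a UNION ALL reordered by the sort key. A sketch with hypothetical table names:

-- Merge of two sorted name lists; both inputs are kept in full
-- (10 + 15 rows in gives 25 rows out). Table names are illustrative.
SELECT Name FROM dbo.NamesA
UNION ALL
SELECT Name FROM dbo.NamesB
ORDER BY Name;  -- restores the shared sort order of the inputs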

Percentage Sampling
The Percentage Sampling transformation splits the data flow into two separate data flows based on a percentage. This can be useful when a small sample of a larger set of data has to be created for testing or for training a data mining model. Fig. 4.7 shows the Percentage Sampling Transformation Editor dialog box.

Fig. 4.7: Percentage Sampling Transformation Editor Dialog Box.
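T-SQL offers a comparable facility for pulling an approximate percentage of rows from a table, which can help when experimenting outside SSIS. A sketch (the table name is hypothetical):

-- Approximate 10 percent sample of a table, similar in spirit to
-- the Percentage Sampling transformation. TABLESAMPLE works on
-- data pages, so the row count is approximate.
SELECT *
FROM dbo.Sales TABLESAMPLE (10 PERCENT);

-- For a repeatable sample, a seed can be supplied:
SELECT *
FROM dbo.Sales TABLESAMPLE (10 PERCENT) REPEATABLE (42);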


5. Data Analysis
SQL Server Analysis Services (SSAS)
SSAS provides an Online Analytical Processing (OLAP) solution that includes data mining solutions. Specialized algorithms are used to help decision makers identify patterns, trends, and associations in business data. SQL Server Analysis Services uses Online Analytical Processing (OLAP) as opposed to Online Transaction Processing (OLTP).

Key OLAP terms


Key terms to grasp when discussing an OLAP database are cubes, dimensions, measures, and Key Performance Indicators (KPIs).
Cubes: A cube is a de-normalized version of the database, an extension of the two-dimensional tables found in typical OLTP databases.
Dimensions: A dimension in a cube is a method used to compare or analyze the data. As an example, a product dimension within a cube could be created using the following product attributes: product name, product cost, product list price, product category, and product color.
Measures: A measure in a cube is a column that holds quantifiable (usually numeric) data. Measures can be mapped to a specific column in a dimension and can be used to provide aggregations (such as SUM or AVG) on the dimension. For example, a measure can be used to identify how many products sold during a given time period.
Key Performance Indicators (KPIs): Business decision makers define KPIs. By identifying certain thresholds that management might be interested in, they can easily measure the performance of the company, or of certain elements of the company, against these thresholds. For example, if a retailer has several stores and expects each store to exceed $1 million in retail sales each quarter, a KPI could be created with this definition.
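In relational terms, a measure aggregated over a dimension is a GROUP BY over a fact table joined to a dimension table. A rough sketch with hypothetical star-schema names:

-- SUM measure aggregated over a product-category dimension.
-- FactSales and DimProduct are hypothetical star-schema tables.
SELECT p.ProductCategory,
       SUM(f.SalesAmount) AS TotalSales   -- the measure
FROM dbo.FactSales AS f
JOIN dbo.DimProduct AS p
    ON p.ProductKey = f.ProductKey        -- dimension key
GROUP BY p.ProductCategory;               -- slice by dimension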

Steps
The overall steps to be followed to create and use a cube are:
1. Create a project
2. Define a data source

3. Define a data source view
4. Create a cube using measures and dimensions
Note: These exercises use the AdventureWorksDW2008 database, available as a free download from Microsoft's CodePlex site. CodePlex is Microsoft's open source project hosting Web site. The SQL Server database samples are found here:

www.codeplex.com/MSFTDBProdSamples
The following steps are used to create an SSAS project within BIDS. An SSAS project is used to create a SQL Server Analysis Services cube. These steps assume that SQL Server Analysis Services has been installed on your server.
1. Launch the Business Intelligence Development Studio by choosing Start -> All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio. BIDS launches but is blank; a project needs to be either opened or created.
2. Choose File -> New -> Project.
3. On the New Project page, select Analysis Services Project. Enter AdventureWorks as the name and set the location to C:\AdventureWorks, or continue with the default name and location. Fig.5.1 shows the dialog box with the default name and location.

Fig.5.1: Creating a SSAS project in BIDS


4. Click OK. The project is created and displayed in Visual Studio.
5. Leave BIDS open for the next steps.

Creating a data source


The following steps add the AdventureWorksDW2008 database to the project as the data source, then add the tables and views that are used as data source views within the project.
1. If it is not already open, launch BIDS by clicking Start -> All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio. Open the AdventureWorks project (or any other project) created in the previous steps.
2. In Solution Explorer, right-click Data Sources and choose New Data Source. The New Data Source Wizard launches.
3. On the Welcome to the Data Source Wizard page, click Next.
4. On the Select How to Define the Connection page, ensure the Create a Data Source Based on an Existing or New Connection option is selected. Click the New button.
5. On the Connection Manager page, ensure the Provider is set to Native OLE DB/SQL Server Native Client 10.0. In the Server Name text box, enter localhost. In Select or Enter a Database Name, select AdventureWorksDW2008 from the drop-down box. The display looks like Fig.5.2; the Provider, Server Name, and Database Name should be the same on your screen.
6. On the Connection Manager page, click OK.
7. On the Select How to Define the Connection page, click Next.
8. On the Impersonation Information page, select Use the Service Account and then click Next.
9. On the Completing the Wizard page, accept the Data Source name of Adventure Works DW2008 and then click Finish.
10. Leave BIDS open for the next steps.
At this point, the project has a data source (the AdventureWorksDW2008 database), but a data source view and a cube still need to be created.


Fig.5.2: Configuring the Connection Manager of the Data Source

Creating a data source view


The data source view defines the tables and views you use within your cube. The following steps show how to create a data source view. These steps assume that the data source has been created in the previous steps; however, the data source and the data source view can also be created at the same time by launching the Data Source View Wizard.
1. Right-click on the Data Source Views folder in Solution Explorer and select New Data Source View.
2. Read the first page of the Data Source View Wizard and click Next.
3. Select the Adventure Works DW data source and click Next. Note that you could also launch the Data Source Wizard from here by clicking New Data Source.
4. Select the FactFinance(dbo) table in the Available Objects list and click the > button to move it to the Included Objects list. This will be the fact table in the new cube.
5. Click the Add Related Tables button to automatically add all of the tables that are directly related to the dbo.FactFinance table. These will be the dimension tables for the new cube. Fig.5.3 shows the wizard with all of the tables selected.

Fig.5.3: Selecting the tables and views for the data source view

6. Click Next.
7. Name the new view Finance and click Finish. BIDS will automatically display the schema of the new data source view, as shown in Fig.5.4.

Invoking the Cube Wizard


Invoke the Cube Wizard by right-clicking on the Cubes folder in Solution Explorer. The Cube Wizard interactively explores the structure of your data source view to identify the dimensions, levels, and measures in your cube. To create the new cube, follow these steps:
1. Right-click on the Cubes folder in Solution Explorer and select New Cube.
2. Read the first page of the Cube Wizard and click Next.
3. Select the option to Use Existing Tables.
4. Click Next.


5. The Finance data source view should be selected in the drop-down list at the top. Place a checkmark next to the FactFinance table to designate it as a measure group table and click Next.
6. Remove the checkmark for the FinanceKey field, indicating that it is not a measure we wish to summarize, and click Next.
7. Leave all Dim tables selected as dimension tables, and click Next.
8. Name the new cube FinanceCube and click Finish (a relational sketch of the aggregations this cube supports appears after Fig.5.4).

Fig.5.4: Viewing tables and views in your Data Source View.
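For orientation, the rollups FinanceCube will answer correspond to queries like the following against the underlying star schema. This is only a sketch; it assumes the usual AdventureWorksDW2008 key columns such as DateKey and AccountKey.

-- Total Amount by calendar year and account, the kind of rollup
-- the cube will precompute after processing.
SELECT d.CalendarYear,
       a.AccountDescription,
       SUM(f.Amount) AS TotalAmount
FROM dbo.FactFinance AS f
JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
JOIN dbo.DimAccount AS a ON a.AccountKey = f.AccountKey
GROUP BY d.CalendarYear, a.AccountDescription;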


Defining Dimensions
The Cube Wizard defines dimensions based upon your choices, but it doesn't populate the dimensions with attributes. Each dimension needs to be edited, adding any attributes that users will wish to use when querying the cube. To populate the dimensions, follow these steps:
1. In BIDS, double-click on DimDate in the Solution Explorer.
2. Using Table 5.1 below as a guide, drag the listed columns from the right-hand panel (named Data Source View) and drop them in the left-hand panel (named Attributes) to include them in the dimension.

DimDate: CalendarYear, CalendarQuarter, MonthNumberOfYear, DayNumberOfWeek, DayNumberOfMonth, DayNumberOfYear, WeekNumberOfYear, FiscalQuarter, FiscalYear
Table 5.1

3. Using Table 5.2, add the listed columns to the remaining four dimensions.

DimDepartmentGroup: DepartmentGroupName
DimAccount: AccountDescription, AccountType
DimScenario: ScenarioName
DimOrganization: OrganizationName
Table 5.2


Adding Dimensional Intelligence


One of the most common ways data gets summarized in a cube is by time: query sales per month for the last fiscal year, or compare year-to-date production values to last year's year-to-date values. Cubes know a lot about time, but in order for SQL Server Analysis Services to best answer such questions, it needs to know which dimension stores the time information and which fields in the time dimension correspond to which units of time. The Business Intelligence Wizard helps you specify this information in the cube.
Steps:
1. With your FinanceCube open in BIDS, click on the Business Intelligence Wizard button on the toolbar.
2. Read the initial page of the wizard and click Next.
3. Choose Define Dimension Intelligence and click Next.
4. Choose DimDate as the dimension you wish to modify and click Next.
5. Choose Time as the dimension type. In the bottom half of this screen are listed the units of time about which cubes have knowledge. Using Table 5.3 below, place a checkmark next to the listed units of time and then select which field in DimDate contains that type of data.
6. Click Next.

Year: CalendarYear
Quarter: CalendarQuarter
Month: MonthNumberOfYear
Day of Week: DayNumberOfWeek
Day of Month: DayNumberOfMonth
Day of Year: DayNumberOfYear
Week of Year: WeekNumberOfYear
Fiscal Quarter: FiscalQuarter
Fiscal Year: FiscalYear

Table 5.3: Time columns for FinanceCube


Hierarchies
We now need to create hierarchies in the defined dimensions. Hierarchies are defined by a sequence of fields and are often used to determine the rows or columns of a pivot table when querying a cube.
Steps:
1. In BIDS, double-click on DimDate in the Solution Explorer.
2. Create a new hierarchy by dragging the CalendarYear field from the left-hand pane (called Attributes) and dropping it in the middle pane (called Hierarchies).
3. Add a new level by dragging the CalendarQuarter field from the left-hand pane and dropping it on the <new level> spot in the new hierarchy in the middle pane.
4. Add a third level by dragging the MonthNumberOfYear field to the <new level> spot in the hierarchy.
5. Right-click on the hierarchy and rename it to Calendar.
6. In the same manner, create a hierarchy named Fiscal that contains the fields FiscalYear, FiscalQuarter, and MonthNumberOfYear. Fig.5.5 shows the hierarchy panel.

Fig.5.5: DimDate hierarchies.

Deploying and Processing a Cube


At this point, we've defined the structure of the new cube, but there's still more work to be done. We still need to deploy this structure to an Analysis Services server and then process the cube to create the aggregates that make querying fast and easy. To deploy and process your cube, follow these steps:
1. In BIDS, select Project -> AdventureWorksCube1 Properties from the menu system.

2. Choose the Deployment category of properties in the upper left-hand corner of the project properties dialog box.
3. Verify that the Server property lists your server name. If not, enter your server name. Click OK. Fig.5.6 shows the project properties window.
4. From the menu, select Build -> Deploy AdventureWorksCube1. Fig.5.7 shows the cube deployment window after a successful deployment.

Fig 5.6: Project Properties.

Fig.5.7: Deploying a cube


Exploring a Data Cube


BIDS includes a built-in Cube Browser that lets you interactively explore the data in any cube that has been deployed and processed. To open the Cube Browser, right-click on the cube in Solution Explorer and select Browse. Fig.5.8 shows the default state of the Cube Browser just after it has been opened.

Fig.5.8: The Cube Browser in BIDS

To see the data in the cube you just created, follow these steps:
1. Right-click on the cube in Solution Explorer and select Browse.
2. Expand the Measures node in the metadata panel (the area at the left of the user interface).
3. Expand the Fact Finance measure group.
4. Drag the Amount measure and drop it on the Totals/Detail area.
5. Expand the Dim Account node in the metadata panel.
6. Drag the Account Description attribute and drop it on the Row Fields area.
7. Expand the Dim Date node in the metadata panel.
8. Drag the Calendar hierarchy and drop it on the Column Fields area.
9. Click the + sign next to year 2001 and then the + sign next to quarter 3.

10. Expand the Dim Scenario node in the metadata panel.
11. Drag the Scenario Name attribute and drop it on the Filter Fields area.
12. Click the drop-down arrow next to scenario name. Uncheck all of the checkboxes except for the one next to the Budget value. Fig.5.9 shows the result.
The Cube Browser displays month-by-month budgets by account for the third quarter of 2001. Although queries could have been written to extract this information from the original source data, it's much easier to let Analysis Services do the heavy lifting for you.

Fig.5.9: Exploring cube data in the cube browser


6. Reporting: SQL Server Reporting Services (SSRS)


Reporting Services includes two tools for creating reports:
Report Designer: can create reports of any complexity that Reporting Services supports, but requires you to understand the structure of your data and to be able to navigate the Visual Studio user interface.
Report Builder: provides a simpler user interface for creating ad hoc reports, directed primarily at business users rather than developers. Report Builder requires a developer or administrator to set up a data model before end users can create reports.

Using the Report Server Project Wizard


The easiest way to create a report in Report Designer is to use the Report Wizard. Like all wizards, the Report Wizard takes you through the process in step-by-step fashion. The following choices can be made in the wizard:
The data source to use
The query to use to retrieve data
Whether to use a tabular or matrix layout for the report
How to group the retrieved data
What visual style to use
Where to deploy the finished report

To create a simple report using the Report Wizard, follow these steps:
1. Launch Business Intelligence Development Studio.
2. Select File -> New -> Project.
3. Select the Business Intelligence Projects project type.
4. Select the Report Server Project Wizard template.
5. Name the new project ProductReport1 and pick a convenient location to save it in.
6. Click OK.
7. Read the first page of the Report Wizard and click Next.
8. Name the new data source AdventureWorksDS.
9. Click the Edit button.
10. Log on to your test server.

11. Select the AdventureWorks2008 database.
12. Click OK.
13. Click the Credentials button.
14. Select Use Windows Authentication.
15. Click OK.
16. Check the Make This a Shared Data Source checkbox. This will make this particular data source available to other Reporting Services applications in the future.
17. Click Next.
18. Click the Query Builder button.
19. If the full query designer interface does not display by default, click the query designer toolbar button at the far left end of the toolbar. Fig.6.1 shows the full query designer interface.

Fig.6.1: Query Designer

20. Click the Add Table toolbar button.
21. Select the Product table and click Add.
22. Click Close.

23. Check the Name, ProductNumber, Color, and ListPrice columns (the generated query is sketched after Fig. 6.2).
24. Click OK.
25. Click Next.
26. Select the Tabular layout and click Next.
27. Move the Color column to the Group area and the other three columns to the Detail area, as shown in Fig. 6.2.

Fig.6.2: Grouping columns in the report.
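Checking those four columns in step 23 causes the Query Builder to generate a simple SELECT over the Product table. It will look roughly like this against the AdventureWorks2008 Production schema (the wizard may qualify the names slightly differently):

-- Approximate query produced by the Query Builder in step 23.
SELECT Name, ProductNumber, Color, ListPrice
FROM Production.Product;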

28. Click Next.
29. Select the Stepped layout and click Next.
30. Select the Ocean style and click Next.
31. Accept the default deployment location and click Next.
32. Name the report ProductReport1.
33. Check the Preview Report checkbox.
34. Click Finish.
35. Fig.6.3 shows the finished report, open in Report Designer.

Fig.6.3: Report created by the Report Wizard

Using a Report Server Project
In general, the following steps must be followed to create a report:
1. Create a Report project in Business Intelligence Development Studio or open an existing Report project.
2. Add a report to the project.
3. Create one or more datasets for the report.
4. Build the report layout.
Specifically, follow the steps mentioned below:
1. Select File -> New -> Project.
2. Select the Business Intelligence Projects project type.
3. Select the Report Server Project template.
4. Name the new project ProductReport2 and pick a convenient location to save it in.
5. Right-click on the Reports node in Solution Explorer and select Add -> New Item.

6. Select the Report template.
7. Name the new report ProductReport2.rdl and click Add.
8. In the Report Data window, select New -> Data Source.
9. Name the new data source AdventureWorksDS.
10. Select the Embedded Connection option and click on the Edit button.
11. Connect to your test server and choose the AdventureWorks2008 database.
12. Click OK.
13. Click OK again to create the data source.
14. In the Report Data window, select New -> Dataset.
15. Name the dataset dsLocation.
16. Click the Query Designer button.
17. If the full Query Designer does not appear, click on the Edit As Text button.
18. Click the Add Table button.
19. Select the Location table.
20. Click Add.
21. Click Close.
22. Check the boxes for the Name and CostRate columns.
23. Sort the dataset in ascending order by Name and click OK (a sketch of the resulting query follows this list).
24. Click OK again to create the dataset.
25. Open the Toolbox window (View -> Toolbox).
26. Double-click the Table control.
27. Switch back to the Report Data window.
28. Expand the dataset to show the column names.
29. Drag the Name field and drop it in the first column of the table control on the design tab.
30. Drag the CostRate field from the Report Data window and drop it in the second column of the table control.
31. Place the cursor between the column selectors above the Name and CostRate columns to display a double-headed arrow. Hold down the mouse button and drag the cursor to the right to widen the Name column.
32. Fig. 6.4 shows the report in Design view.
33. Select the Preview tab to see the report with data.
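The dsLocation dataset built in steps 18-23 corresponds to a query roughly like the following (the Location table lives in the Production schema in AdventureWorks2008):

-- Approximate query behind the dsLocation dataset.
SELECT Name, CostRate
FROM Production.Location
ORDER BY Name;  -- ascending sort chosen in step 23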

Fig.6.4: Designing a report using Report Server project

Publishing a Report
Creating reports in Business Intelligence Development Studio is good for developers, but it doesn't help users at all. For the reports you build to be available to others, they must be published to the Reporting Services server. To publish a report, use the Build and Deploy menu items in Business Intelligence Development Studio. Before this, the project's configuration needs to be checked to make sure that the appropriate server has been selected for deployment. To publish the first report, follow these steps:
1. Select File -> Recent Projects and choose your ProductReport1 project.

2. Select Project -> ProductReport1 Properties.
3. Click the Configuration Manager button.
4. Fill in the Target Server URL for your Report Server. If you're developing on the same computer where Reporting Services is installed, and you installed in the default configuration, this will be http://localhost/ReportServer. Fig.6.5 shows the completed Property Pages.

Fig.6.5: Setting report project properties

5. Click OK.
6. Select Build -> Deploy ProductReport1. The Output window will track the progress of BIDS in deploying your report, as shown in Fig.6.6. Depending on the speed of your computer, building the report may take some time.

Fig.6.6: Deploying a report



7. Launch a web browser and enter the address http://localhost/reports.
8. It may take several minutes for the web page to display; Reporting Services goes to sleep when it hasn't been used for a while and can take a while to spin up to speed. Fig.6.7 shows the result.

Fig.6.7: The Report Manager.

9. Click the link for the ProductReport1 folder.
10. Click the link for the ProductReport1 report.

Using Report Builder


Report Designer helps you create reports for Reporting Services, but it's not the only way. SQL Server 2008 also includes a tool directed at end users, named Report Builder. Unlike Report Designer, which is aimed at developers, Report Builder presents a simplified view of the report-building process and is intended for business analysts and other end users.


Building a Data Model


Report Builder doesn't let end users explore all of a SQL Server database. Instead, it depends on a data model: a pre-selected group of tables and relationships that a developer has identified as suitable for end-user reporting. To build a data model, you use Business Intelligence Development Studio. Data models contain three things:
Data Sources: connect the data model to actual data.
Data Source Views: draw data from data sources.
Report Models: contain entities that end users can use on reports.

To create a data model, follow these steps:
1. If it's not already open, launch Business Intelligence Development Studio.
2. Select File -> New -> Project.
3. Select the Business Intelligence Projects project type.
4. Select the Report Model Project template.
5. Name the new project AWSales and save it in a convenient location.
6. Click OK.
7. Right-click on Data Sources in Solution Explorer and select Add New Data Source.
8. Read the first page of the Add New Data Source Wizard and click Next.
9. Click New.
10. In the Connection Manager dialog box, connect to the AdventureWorks2008 database on your test server and click OK.
11. Click Next.
12. Name the new data source AdventureWorks and click Finish.
13. Right-click on Data Source Views in Solution Explorer and select Add New Data Source View.
14. Read the first page of the Add New Data Source View Wizard and click Next.
15. Select the AdventureWorks data source and click Next.
16. Select the Product(Production) table and click the > button to move it to the Included Objects listbox.
17. Select the SalesOrderDetail(Sales) table and click the > button to move it to the Included Objects listbox.
18. Click the Add Related Tables button.

19. Click Next.
20. Click Finish.
21. Right-click on Report Models in Solution Explorer and select Add New Report Model.
22. Read the first page of the Report Model Wizard and click Next.
23. Select the Adventure Works2008 data source view and click Next.
24. Keep the default rules selection, as shown in Fig.6.8, and click Next.

Fig.6.8: Creating entities for end-user reporting

25. Choose the Update Statistics option and click Next.
26. Click Run to complete the wizard.
27. Click Finish. If you get a warning that a file was modified outside the source editor, click Yes.
28. Select Build -> Deploy AWSales to deploy the report model to the local Reporting Services server.


Building a Report
Report Builder itself is a ClickOnce Windows Forms application. This means that it's a Windows application that end users launch from their web browser, but it never gets installed on their computer, so they don't need local administrator rights to run it. To get started with Report Builder, browse to your Reporting Services home page. Typically, this will have a URL such as http://ServerName/Reports (or http://localhost/Reports if the browser is running on the same box as SQL Server 2008 itself). Fig.6.9 shows the Reporting Services home page.

Fig.6.9: Reporting Services home page

To run Report Builder, click the Report Builder link in the home page menu bar. Report Builder will automatically load all of the available report models and wait for the user to choose one to build a report from. The steps are as follows:
1. Open a browser window and navigate to http://localhost/Reports (or to the appropriate Report Server URL if you are not working on the report server).
2. Click the Report Builder link.
3. Depending on your operating system, you may have to confirm that you want to run the application.

4. After Report Builder is loaded, select the AdventureWorks2008 report model and the table report layout. Click OK. Fig.6.10 shows the new blank report that Report Builder will create.
5. Select the Product table.
6. Drag the Name field and drop it in the area labeled Drag and Drop Column Fields.
7. Click on Special Offer Products in the Explorer window to show related child tables.
8. Click on Sales Order Details.
9. Drag the Total Order Qty field and drop it to the right of the Name field.
10. Click where it says Click to Add Title and type Product Sales.

Fig.6.10: New report in Report Builder

11. Click the Run Report button to produce the report shown in Fig.6.11.


Fig.6.11: Report in Report Builder

12. Click the Sort and Group toolbar button.
13. Select to sort by Total Order Qty descending.
14. Click OK.
15. Select File -> Save.
16. Name the new report Product Sales.
17. Click Save. This will publish the report back to the Reporting Services server from which you originally downloaded Report Builder.


7. Clementine
Clementine is a data mining workbench that enables you to quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the industry-standard CRISP-DM model, Clementine supports the entire data mining process, from data to better business results.
Clementine Client, Server, and Batch
Clementine uses a client/server architecture to distribute requests for resource-intensive operations to powerful server software, resulting in faster performance on larger datasets. Additional products or updates beyond those listed here may also be available.
Clementine Client: Clementine Client is a functionally complete version of the product that is installed and run on the user's desktop computer. It can be run in local mode as a standalone product, or in distributed mode along with Clementine Server for improved performance on large datasets.
Clementine Server: Clementine Server runs continually in distributed analysis mode together with one or more client installations, providing superior performance on large datasets because memory-intensive operations can be done on the server without downloading data to the client computer. Clementine Server also provides support for SQL optimization, batch-mode processing, and in-database modeling capabilities, delivering further benefits in performance and automation. At least one Clementine Client or Clementine Batch installation must be present to run an analysis.
Clementine Batch: Clementine Batch is a special version of the client that runs in batch mode only, providing support for the complete analytical capabilities of Clementine without access to the regular user interface. This allows long-running or repetitive tasks to be performed without user intervention and without the presence of the user interface on the screen. Unlike Clementine Client, which can be run as a standalone product, Clementine Batch must be licensed and used only in combination with Clementine Server.

Starting Clementine
Follow these steps to start Clementine: From the Windows Start menu choose All Programs -> Clementine -> SPSS Clementine. Fig.7.1 demonstrates this step.


Fig.7.1: Starting Clementine

When you first start Clementine, the workspace opens in the default view. The area in the middle is called the stream canvas; this is the main area in which you will work in Clementine. See Fig.7.2.

Fig.7.2: Clementine Workspace (Default View)

Most of the data and modeling tools in Clementine reside in palettes, the area below the stream canvas. Each tab contains groups of nodes that are a graphical representation of data mining tasks, such as accessing and filtering data, creating graphs, and building models. This is depicted in Fig.7.3.


Fig.7.3: Shows the Stream Canvas and Palette Area

To add nodes to the canvas, double-click icons from the node palettes or drag and drop them onto the canvas. Then connect them to create a stream, representing the flow of data.

Clementine Managers
On the top right side of the window are the outputs and object managers. These tabs are used to view and manage a variety of Clementine objects. This is shown in Fig.7.4.

Fig.7.4: Streams Tab


The Outputs tab contains a variety of files produced by stream operations in Clementine. Users can display, rename, and close the tables, graphs, and reports listed here. See Fig.7.5.

Fig.7.5: Outputs Tab

The Models tab is a powerful tool that contains all generated models (models that have been built in Clementine) for a session. Models can be examined closely, added to the stream, exported, or annotated. See Fig 7.6.

Fig.7.6: Models Tab

Clementine Projects
On the bottom right side of the window is the projects tool, used to create and manage data mining projects. There are two ways to view projects you create in Clementine:
1. CRISP-DM view
2. Classes view

1. The CRISP-DM tab provides a way to organize projects according to the Cross-Industry Standard Process for Data Mining, an industry-proven, nonproprietary methodology. For both experienced and first-time data miners, using the CRISP-DM tool will help you to better organize and communicate your efforts. See Fig.7.7.

Fig.7.7: CRISP-DM view

2. The Classes tab provides a way to organize your work in Clementine categorically--by the types of objects you create. This view is useful when taking inventory of data, streams, models, etc. See Fig.7.8.

Fig.7.8: Classes View


Exercise 1
Problem: Imagine that a medical researcher is compiling data for a study. He has collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications. Now the researcher has been given the job of finding out which drug might be appropriate for a future patient with the same illness.
Solution:
1. Store the data in a text file as shown in Table 7.1. The data fields used are Age, Sex, BP, Cholesterol, blood sodium concentration, blood potassium concentration, and the drug used.

Age: Number
Sex: M or F
BP: HIGH, NORMAL, or LOW
Cholesterol: NORMAL or HIGH
Na: Blood sodium concentration
K: Blood potassium concentration
Drug: Prescription drug to which the patient responded

Table 7.1: Shows the data fields

2. Read in the delimited text data using a Variable File node. Add a Variable File node from the palettes; either click the Sources tab to find the node or use the Favorites tab, which includes this node by default. Next, double-click the newly placed node to open its dialog box. See Fig.7.9.
3. Click the button just to the right of the File box, marked with an ellipsis (...), to browse to the directory and select the file called DRUG1n (this file contains the data fields mentioned in step 1). Select Read field names from file and notice the fields and values that have just been loaded into the dialog box. This step is shown in Fig.7.10.


Fig.7.9: Illustrates step 2.

Fig.7.10: Reading the Text File

4. Click the Data tab to override and change the storage of a field. Note that storage is different from the type, or usage, of the data field. The Data tab is highlighted in yellow in Fig.7.11.


Fig.7.11: Shows the Data tab
5. The Types tab helps you learn more about the types of the fields in your data. You can also choose Read Values to view the actual values for each field, based on the selections that you make in the Values column. This process is known as instantiation and is shown in Fig.7.12.

Fig.7.12: Shows Types Tab

6. Adding a Table: Now that the data has been loaded, glance at the values for some of the records. This can be done by building a stream that includes a Table node. To place a Table node in the stream, either double-click its icon in the palette or drag and drop it onto the canvas. See Fig.7.13 and Fig.7.14.

Fig.7.13: Shows the palette

Fig.7.14: Shows step 6
7. To view the table, click the green arrow button on the toolbar to execute the stream, or right-click on the Table node and select Execute. This is shown in Fig.7.15.

Fig.7.15: View the Table

8. Creating a distribution graph: During data mining, it is often useful to explore the data by creating visual summaries. Clementine offers several different types of graphs to choose from, depending on the kind of data that you want to summarize. For example, to find out what proportion of the patients responded to each drug, use a Distribution node.
9. Add a Distribution node to the stream and connect it to the Source node, then double-click the node to edit its display options. Select Drug as the target field whose distribution you want to show. Then click Execute from the dialog box. See Fig.7.16.

Fig.7.16: Selecting the drug as the target field

10. The resulting graph, shown in Fig.7.17, helps you see the "shape" of the data. It shows that patients responded to drug Y most often and to drugs B and C least often.

Fig.7.17: Graph showing the result
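If the same DRUG1n records were loaded into a SQL table, the distribution shown in Fig.7.17 would correspond to a simple frequency query (the table name below is hypothetical):

-- Frequency of each drug among the patients, the relational
-- counterpart of the Distribution node. dbo.Drug1n is hypothetical.
SELECT Drug,
       COUNT(*) AS Patients
FROM dbo.Drug1n
GROUP BY Drug
ORDER BY Patients DESC;  -- drug Y should appear first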
