
BI & DM Manual

1. Hardware and Software Requirements


Hardware
CPU: To keep things moving, you need at least a Pentium III class processor running at a minimum of 1 GHz. For serious work, plan on a Pentium 4 processor offering at least 2 GHz.
Memory: Because sufficient memory is the foundation of any well-performing relational database, make sure 1 GB or more of memory is provided. SQL Server will use as much memory as it needs, but no more.
Disk: Given that relational databases use disk drives as their primary storage mechanism, it is difficult to recommend a fixed value for the right amount of available disk capacity; every site and application is different. However, note that a full installation of SQL Server and related tools consumes more than 2 GB before any of your data arrives.

Software
Operating system: Microsoft gives you a fairly wide choice of operating systems (both 32-bit and 64-bit) that can run SQL Server:
Windows Server 2008 (Standard, Data Center, Enterprise)
Windows Server 2003 (Standard, Data Center, Enterprise)
Windows XP Professional Edition
Windows Vista (Ultimate, Home Premium, Home Basic, Enterprise, Business)
Apply the latest service pack for your operating system; in many cases, SQL Server depends on these patches. SQL Server 2008 ships with Business Intelligence Development Studio (BIDS) and SQL Server Management Studio.


2. Introduction
The Business Intelligence process goes through the following components:
1. Data Integration
2. Data Analysis
3. Reporting
Data Integration is achieved using SQL Server Integration Services (SSIS), provided through the Business Intelligence Development Studio (BIDS) tool. Integration Services is a platform for building enterprise-level data integration and data transformation solutions. It includes a rich set of built-in tasks and transformations, and tools for constructing packages. Integration Services relies on components such as the Integration Services package, data flow, control flow, and connection managers; these components are threaded together to achieve the desired functionality. The data flow consists of three main components:
1. Data Flow Sources
2. Data Flow Transformations
3. Data Flow Destinations

1. Data Flow Sources


Data Flow Sources are designed to bring data from external sources into the Integration Services data flow. A data flow source reads an external data source, such as a flat file or a table in a relational database, and passes the data to a data flow transformation. The Data Flow Sources are shown in Fig. 2.1:

Fig.2.1 Data Flow Sources


ADO NET Source: Extracts data from a relational database by using a .NET provider.

Excel Source: Extracts data from an Excel workbook using Excel Connection Manager.

Flat File Source: Extracts data from flat files (i.e., text files) using Flat File Connection Manager.

OLEDB Source: Extracts data from a relational database using an OLEDB provider.

Raw File Source: Extracts data from a raw file using a direct connection.
XML Source: Reads data from an XML data source by specifying the location of the XML file.

2. Data Flow Transformations


Data flow transformations change the data obtained from the data flow source according to the transformation chosen. They are:
Aggregate: Aggregates and groups values in a dataset. It can perform operations such as average, count, count distinct, group by, maximum, minimum, and sum.
Audit: Adds audit information to rows in a dataset.
Cache Transform: Writes data from the data flow to a cache, typically for use by the Lookup transformation.
Character Map: Applies string operations to character data.
Conditional Split: Evaluates and directs rows in a dataset.
Copy Column: Copies columns.
Data Conversion: Converts columns to different data types and adds the converted columns to the dataset.
Data Mining Query: Performs prediction queries against data mining models.
Derived Column: Updates column values using expressions.
Export Column: Exports column values from rows in a dataset to files.
Fuzzy Grouping: Groups rows in a dataset that contain similar values.
Fuzzy Lookup: Looks up values in a reference dataset by using fuzzy matching.
Import Column: Imports data from files to rows in datasets.


Fig.2.2 Data Flow Transformations
Lookup: Looks up values in a reference dataset by using exact matching.
Merge: Merges two sorted datasets.
Merge Join: Merges two datasets by using a join.
Multicast: Creates copies of a dataset.
OLEDB Command: Executes an SQL command for each row in a dataset.
Percentage Sampling: Creates a sample dataset by extracting a percentage of rows from a dataset.

Pivot: Pivots a dataset to create a less normalized representation of the data.
Row Count: Counts the rows in a dataset.
Row Sampling: Creates a sample dataset by extracting a number of rows from a dataset.

Script Component: Executes a custom script.
Slowly Changing Dimension: Updates a slowly changing dimension.
Sort: Sorts data.
Term Extraction: Extracts terms from data in a column.
Term Lookup: Counts the frequency with which terms in a reference table appear in a dataset.

Union All: Merges multiple datasets.
Unpivot: Creates a more normalized representation of a dataset.
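Many of these transformations have close relational equivalents, which can help in understanding what they do. As a rough illustration (the table and column names below are hypothetical, not part of the manual's exercises), the Aggregate and Sort transformations correspond to T-SQL like the following:

-- Hypothetical Sales table; the names below are illustrative only.
-- Aggregate transformation: group and summarize, as in GROUP BY.
SELECT Region,
       COUNT(*)    AS OrderCount,   -- Count operation
       SUM(Amount) AS TotalAmount,  -- Sum operation
       AVG(Amount) AS AvgAmount     -- Average operation
FROM dbo.Sales
GROUP BY Region;                    -- Group by operation

-- Sort transformation: order the rows of a dataset.
SELECT Region, Amount
FROM dbo.Sales
ORDER BY Amount DESC;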

3. Data Flow Destinations


The data flow destination writes the data that flows to it, after undergoing various transformations, to an external data store or to an in-memory dataset. They are:

Fig.2.3 Data Flow Destinations
ADO NET Destination: Writes to a database using an ADO.NET provider.
Data Mining Model Training: Passes the data through the data mining model algorithms to train data mining models.
Data Reader Destination: Creates and populates an ADO.NET in-memory dataset.

Dimension Processing: Loads and processes a SQL Server Analysis Services dimension.

Excel Destination: Loads data into an Excel workbook.
Flat File Destination: Loads data into a flat file (i.e., a text file).
OLEDB Destination: Loads data into relational databases using an OLEDB provider.
Partition Processing: Loads and processes a SQL Server Analysis Services partition.

Raw File Destination: Outputs data to a raw file.
Recordset Destination: Creates and populates an in-memory ADO recordset.
SQL Server Compact Destination: Loads data into a SQL Server Compact database.
SQL Server Destination: Loads data into a SQL Server database.


3. Data Integration Exercises


1. Move the employee data present in a relational database to a flat file.
Steps involved:
1. Click Start -> All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio (BIDS), as shown in Fig.3.1.

Fig.3.1: Open the Business Intelligence Development Studio
2. The Start Page of BIDS will open, as shown in Fig.3.2.

Fig.3.2: Start Page



3. From the menu bar, click File -> New -> Project. A New Project dialog box will open. This is shown in Fig.3.3 and Fig.3.4.

Fig.3.3: Create a New Project

Fig.3.4: A New Project Window
4. Select the Integration Services Project template from the Visual Studio installed templates. Set the name and location where the project is to be saved, then press the OK button.


5. In the resulting window, four panes can be seen. Select the Data Flow pane. In the Data Flow pane, click the link to add a new Data Flow task. This is shown in Fig.3.5.


Fig.3.5: Selecting the data flow pane and adding a data flow task 6. Go to the toolbox palette at top left of the window to drag the data source and data destination to the designer interface. This is shown by Fig.3.6 and Fig.3.7.

Fig.3.6: Source and Destination


Fig.3.7: Source and Destination 7. Click on the OLEDB Source and then drag the green pointer from the OLEDB Source (relational database) and drop it on the Flat File Destination (Text File). After connecting source with the destination, double click the OLEDB source. It will open a new dialog box. Then Click on New button as shown by the Fig.3.8.


Fig.3.8: Set the OLEDB Connection Manager


8. Click on the New Button to configure the OLEDB Connection Manager. Fig.3.9 illustrates this.


Fig.3.9: OLEDB Connection
9. If the software is installed only on a local machine (i.e., not in client/server mode), type localhost as the Server Name; otherwise, type the IP address/machine name of the server (client/server mode).


Fig.3.10: Set Server Name


10. If the Server Name is localhost, check the Use Windows Authentication radio button. Select the database name and then click the Test Connection button. If everything is correct, a success message will be displayed (as shown in Fig. 3.11); otherwise an error message will be displayed. If an IP address is given as the Server Name, check the Use SQL Server Authentication option instead, specify the username and password, select the database name, and test the connection.


Fig.3.11: Selecting the database from Server or local machine

11. Click the OK button of the success message dialog box. This action will enable the OK button of the OLEDB Connection Manager dialog box (see Fig.3.9). Click the OK button on that dialog box.
12. Click the drop-down menu and select the table containing the employee details. A preview of the table can be seen by clicking the Preview button. Then click the OK button. This is shown in Fig.3.12. Now the OLEDB Source is set to the employee details.




Fig.3.12: Selecting the table and previewing the table
13. Now, double-click the Flat File Destination. It will open the Flat File Destination Editor, as shown in Fig.3.13. Click the New button.


Fig.3.13: Flat File Destination Editor


14. After clicking the New button, it will ask for the format of the file in which the data is to be stored. By default, this is Delimited. If so, proceed by pressing the OK button.

Fig.3.14: Choosing the Flat File Format

15. Set an appropriate connection manager name and description. Click the Browse button to select the text file in which the data from the OLEDB Source is to be stored. If the text file has not been created, create it. Then check the option Column names in the first data row. If this option is checked, then when the data is transferred from the OLEDB Source, the first row in the text file will contain the column names as they appear in the table. For example, if Name, Age, Designation, and Salary are the column names in the table, the same will be the column names in the text file. Click the OK button. See Fig.3.15.
16. Finally, go to the Debug option in the menu bar and click Start Debugging to execute the package. Alternatively, function key F5 can be pressed. The data transfer is successful if green is displayed on the screen, as shown in Fig.3.16. It also shows the number of rows transferred from the OLEDB Source to the Flat File Destination.




Fig. 3.15: Selection of the text file

Fig.3.16: Moving the data from source to destination


2. Move the student details in the flat file to the relational database.
Steps involved:
1. Create a new package in the same project that was created in the first exercise. Go to the Solution Explorer window on the right side, right-click SSIS Packages, and select New SSIS Package. A new workspace will open. Drag a Flat File Source and an OLEDB Destination from the Toolbox and proceed as in the previous exercise. Fig.3.17 and Fig.3.18 demonstrate this step.


Fig. 3.17: Creation of a new package in the existing project


Fig.3.18: Rename the package


3. Move data from a flat file (text file) to an Excel file.
Steps involved:
1. Create a new project, or create a new package in the existing project. Follow steps 1-5 of Exercise 1.
2. Drag the Flat File Source and Excel Destination and drop them on the Data Flow pane.
3. The data types of the text file and the Excel file differ. To overcome this problem, a Data Conversion transformation has to be used between the Flat File Source and the Excel Destination.

Fig. 3.19: Data conversion between the flat file and the Excel file.

4. Make the connections as shown in Fig. 3.19. Configure the Flat File Source and Excel Destination as mentioned in Exercise 1. Double-click the Data Conversion transformation. The Data Conversion Transformation Editor dialog box will open, as shown in Fig. 3.20.
5. Columns in the text file are of the String [DT_STR] data type. While moving the data to the Excel file, all of them have to be converted to Unicode String [DT_WSTR]. Check all four columns: EmpNo, Name, Age, and Gender. Then change the data type from String [DT_STR] to Unicode String [DT_WSTR], as shown in Fig. 3.21. Click the OK button after setting the required data type.
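The DT_STR to DT_WSTR conversion is the SSIS counterpart of casting single-byte strings to Unicode. A rough T-SQL analogy, using the four column names from step 5 (the staging table name is hypothetical):

-- DT_STR corresponds to VARCHAR, DT_WSTR to NVARCHAR (Unicode).
-- dbo.EmployeeTextStaging is a hypothetical name for illustration.
SELECT CAST(EmpNo  AS NVARCHAR(10)) AS EmpNo,
       CAST(Name   AS NVARCHAR(50)) AS Name,
       CAST(Age    AS NVARCHAR(10)) AS Age,
       CAST(Gender AS NVARCHAR(10)) AS Gender
FROM dbo.EmployeeTextStaging;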


Fig.3.20: Data Conversion Transformation Editor

Fig. 3.21: Changing the data type as required


6. Double-click the Excel Destination. The Excel Destination Editor dialog box will open. Click the New button. Fig. 3.22 illustrates this step.


Fig. 3.22: Excel Destination Editor

7. Specify the name of the Excel file where the data will be stored. Check the option First row has column names. Click the OK button.


Fig. 3.23: Excel Connection Manager


8. In the Excel Destination Editor dialog box, click the New button (Fig.3.24). It will open the Create Table dialog box; perform the tasks shown in Fig.3.25. The resulting window will be as shown in Fig.3.26. Click the OK button.


Fig. 3.24: Excel Destination Editor.


Fig. 3.25: Create Table


Fig. 3.26: Table creation after editing

9. Select the name of the Excel sheet and click the Mappings option on the left side of the dialog box. See Fig. 3.27.



Fig.3.27: Selecting the name of the excel sheet


10. Map the input columns with the changed data type to the available output columns, as shown in Fig. 3.28, and then click the OK button.

Fig.3.28: Mapping the Input Columns to Output columns with changed data type.

11. Finally, press the Debug button and execute the package. The data from the flat file will be moved to the Excel file.
Note: In the above exercises, the data sources (flat file, OLEDB, Excel) must already contain data, such as student details, employee details, or sales details, before any task is performed.
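To satisfy this prerequisite for Exercise 1, a small employee table can be created in the source database beforehand. A minimal sketch in T-SQL, using the column names mentioned in step 15 (the table name and sample rows are illustrative, not prescribed by the manual):

-- Run in SQL Server Management Studio against the source database.
CREATE TABLE dbo.Employee (      -- hypothetical table name
    Name        VARCHAR(50),
    Age         INT,
    Designation VARCHAR(50),
    Salary      MONEY
);

INSERT INTO dbo.Employee (Name, Age, Designation, Salary)
VALUES ('Anil',  28, 'Engineer', 35000),
       ('Bina',  34, 'Manager',  52000),
       ('Chand', 41, 'Analyst',  47000);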


4. Some More Transformations


In Exercise 3, the Data Conversion transformation was used to convert the data to the required data type. This section describes some transformations other than Data Conversion that help in manipulating the data during the data integration phase.

Character Map
The Character Map transformation enables us to modify the contents of character-based columns. The modified column can be placed in the data flow in place of the original column, or it can be added to the data flow as a new column. The following character mappings are available:
Lowercase: changes all the characters to lowercase.
Uppercase: changes all the characters to uppercase.
Byte Reversal: reverses the byte order of each character.
Hiragana: maps Katakana characters to Hiragana characters.
Katakana: maps Hiragana characters to Katakana characters.
Half Width: changes double-byte characters to single-byte characters.
Full Width: changes single-byte characters to double-byte characters.
Linguistic Casing: applies linguistic casing rules instead of system casing rules.
Simplified Chinese: maps traditional Chinese characters to simplified Chinese.
Traditional Chinese: maps simplified Chinese characters to traditional Chinese.

Multiple character mappings can be applied to a single column at the same time. However, a number of mappings are mutually exclusive; for example, it doesn't make sense to use both the Lowercase and Uppercase mappings on the same column. The Character Map Transformation Editor dialog box is shown in Fig. 4.1.
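The Lowercase and Uppercase mappings behave like the standard string functions in T-SQL. A rough analogy (the table name is hypothetical):

-- Uppercase and Lowercase mappings expressed as T-SQL functions;
-- dbo.Student is a hypothetical table used only for illustration.
SELECT Name,
       UPPER(Name) AS NameUpper,   -- Uppercase mapping
       LOWER(Name) AS NameLower    -- Lowercase mapping
FROM dbo.Student;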

Conditional Split
The Conditional Split transformation enables us to split the data flow into multiple outputs. In the Conditional Split Transformation Editor dialog box, shown in Fig. 4.2, conditions are defined for each branch of the split. When the package executes, each row in the data flow is compared against the conditions in order. When the row meets a set of conditions, the row is sent to that branch of the split. In Fig. 4.2, if the age is less than 20 years, the row is sent to the less than 20 output; if the age is greater than 20 years, the row is sent to the greater than 20 output.

Fig.4.1: The Character Map Transformation Editor Dialog box

Fig. 4.2: Conditional Split Transformation Editor Dialog box
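The split in Fig. 4.2 is roughly equivalent to routing rows with two complementary filters. A sketch under the same age condition (the table name is hypothetical):

-- Each branch of the Conditional Split corresponds to one filter;
-- dbo.Patient is a hypothetical table for illustration.
SELECT * FROM dbo.Patient WHERE Age < 20;   -- "less than 20" output
SELECT * FROM dbo.Patient WHERE Age >= 20;  -- "greater than 20" output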


Copy Column
The Copy Column transformation is used to create new columns in the data flow that are copies of existing columns. The new columns can then be used later in the data flow for calculations, transformations, or mapping to columns in the data destination. Fig.4.3 shows the Copy Column Transformation Editor dialog box with the Gender column being copied to a new column called GenderStatistics.

Fig.4.3: Copy Column Transformation Editor Dialog Box

Lookup
The Lookup transformation looks for values (provided by a lookup table) in the data source. By default, the Lookup transformation loads the entire source lookup table into a cache for faster processing. If the source lookup table is too large to be loaded into the cache completely, a restriction can be set on the amount of memory used. Fig. 4.5 shows the Lookup Transformation Editor. In addition, if only a portion of the records in the source lookup table is needed to resolve the lookups for a given Lookup transformation, only the required portion of the source lookup table can be loaded into memory. The Configure Error Output dialog box lets us determine whether an unresolved lookup is ignored, sent to the error output, or causes the transformation to fail.

Fig. 4.5: Lookup Transformation Editor
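An exact-match lookup corresponds closely to a join against the reference table. A rough T-SQL analogy (all table and column names are hypothetical):

-- Lookup as a join: each source row is matched against the
-- reference table on an exact key. Names are illustrative only.
SELECT s.OrderID,
       s.ProductCode,
       r.ProductName            -- value retrieved by the lookup
FROM dbo.Orders AS s
LEFT JOIN dbo.ProductRef AS r   -- LEFT JOIN keeps unresolved rows,
    ON r.ProductCode = s.ProductCode;  -- much like "ignore failure"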

Merge
The Merge transformation merges two data flows together. For the Merge transformation to work properly, both input data flows must be sorted using the same sort order. This can be done by using the Sort transformation in each data flow prior to the Merge transformation.

Fig. 4.6: Merge transformation Editor


Fig.4.6 shows two lists of names being merged together. Each input is sorted by Name. When the records from the two inputs are merged, the resulting output will also be in Name order. All of the rows in both input data flows are present in the merged output. For example, if 10 rows are in the first input data flow and 15 rows are in the second, there will be 25 rows in the output data flow.
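Because both inputs are sorted and no rows are dropped, the result matches a UNION ALL reordered by the sort key. A sketch with hypothetical table names:

-- Merge of two sorted name lists; both inputs are kept in full
-- (10 + 15 rows in gives 25 rows out). Table names are illustrative.
SELECT Name FROM dbo.NamesA
UNION ALL
SELECT Name FROM dbo.NamesB
ORDER BY Name;  -- restores the shared sort order of the inputs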

Percentage Sampling
The Percentage Sampling transformation splits the data flow into two separate data flows based on a percentage. This can be useful when a small sample of a larger set of data has to be created for testing or for training a data mining model. Fig. 4.7 shows the Percentage Sampling Transformation Editor dialog box.

Fig. 4.7: Percentage Sampling Transformation Editor Dialog Box.
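T-SQL offers a comparable facility for pulling an approximate percentage of rows from a table, which can help when experimenting outside SSIS. A sketch (the table name is hypothetical):

-- Approximate 10 percent sample of a table, similar in spirit to
-- the Percentage Sampling transformation. TABLESAMPLE works on
-- data pages, so the row count is approximate.
SELECT *
FROM dbo.Sales TABLESAMPLE (10 PERCENT);

-- For a repeatable sample, a seed can be supplied:
SELECT *
FROM dbo.Sales TABLESAMPLE (10 PERCENT) REPEATABLE (42);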


5. Data Analysis
SQL Server Analysis Services (SSAS)
SSAS provides an Online Analytical Processing (OLAP) solution that includes data mining solutions. Specialized algorithms are used to help decision makers identify patterns, trends, and associations in business data. SQL Server Analysis Services uses Online Analytical Processing (OLAP) as opposed to Online Transaction Processing (OLTP).

Key OLAP terms


Key terms to grasp when discussing an OLAP database are cubes, dimensions, measures, and Key Performance Indicators (KPIs).
Cubes: A cube is a de-normalized version of the database, an extension of the two-dimensional tables found in typical OLTP databases.
Dimensions: A dimension in a cube is a method used to compare or analyze the data. As an example, a product dimension within a cube could be created using the following product attributes: product name, product cost, product list price, product category, and product color.
Measures: A measure in a cube is a column that holds quantifiable (usually numeric) data. Measures can be mapped to a specific column in a dimension and can be used to provide aggregations (such as SUM or AVG) on the dimension. For example, a measure can be used to identify how many products sold during a given time period.
Key Performance Indicators (KPIs): Business decision makers define KPIs. By identifying certain thresholds that management might be interested in, they can easily measure the performance of the company, or of certain elements of the company, against these thresholds. For example, if a retailer has several stores and expects each store to exceed $1 million in retail sales each quarter, a KPI could be created with this definition.
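In relational terms, a measure aggregated over a dimension is a GROUP BY over a fact table joined to a dimension table. A rough sketch with hypothetical star-schema names:

-- SUM measure aggregated over a product-category dimension.
-- FactSales and DimProduct are hypothetical star-schema tables.
SELECT p.ProductCategory,
       SUM(f.SalesAmount) AS TotalSales   -- the measure
FROM dbo.FactSales AS f
JOIN dbo.DimProduct AS p
    ON p.ProductKey = f.ProductKey        -- dimension key
GROUP BY p.ProductCategory;               -- slice by dimension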

Steps
The overall steps to be followed to create and use a cube are:
1. Create a project
2. Define a data source

3. Define a data source view
4. Create a cube using measures and dimensions
Note: These exercises use the AdventureWorksDW2008 database, available as a free download from Microsoft's CodePlex site. CodePlex is Microsoft's open source project hosting Web site. The SQL Server database samples are found here:

www.codeplex.com/MSFTDBProdSamples
The following steps are used to create an SSAS project within BIDS. An SSAS project is used to create a SQL Server Analysis Services cube. These steps assume that SQL Server Analysis Services has been installed on your server.
1. Launch the Business Intelligence Development Studio by choosing Start -> All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio. BIDS launches but is blank; a project needs to be either opened or created.
2. Choose File -> New -> Project.
3. On the New Project page, select Analysis Services Project. Enter AdventureWorks as the name and set the location to C:\AdventureWorks, or continue with the default name and location. Fig.5.1 shows the dialog box with the default name and location.

Fig.5.1: Creating a SSAS project in BIDS


4. Click OK. The project is created and displayed in Visual Studio.
5. Leave BIDS open for the next steps.

Creating a data source


The following steps add the AdventureWorksDW2008 database to the project as the data source, then add the tables and views that are used as data source views within the project.
1. If it is not already open, launch BIDS by clicking Start -> All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio. Open the AdventureWorks project (or any other project) created in the previous steps.
2. In Solution Explorer, right-click Data Sources and choose New Data Source. The New Data Source Wizard launches.
3. On the Welcome to the Data Source Wizard page, click Next.
4. On the Select How to Define the Connection page, ensure the Create a Data Source Based on an Existing or New Connection option is selected. Click the New button.
5. On the Connection Manager page, ensure the Provider is set to Native OLE DB/SQL Server Native Client 10.0. In the Server Name text box, enter localhost. In Select or Enter a Database Name, select AdventureWorksDW2008 from the drop-down box. The display looks like Fig.5.2; the Provider, Server Name, and Database Name should be the same on your screen.
6. On the Connection Manager page, click OK.
7. On the Select How to Define the Connection page, click Next.
8. On the Impersonation Information page, select Use the Service Account and then click Next.
9. On the Completing the Wizard page, accept the Data Source name of Adventure Works DW2008 and then click Finish.
10. Leave BIDS open for the next steps.
At this point, the project has a data source (the AdventureWorksDW2008 database), but a data source view and a cube still need to be created.


Fig.5.2: Configuring the Connection Manager of the Data Source

Creating a data source view


The data source view defines the tables and views you use within your cube. The following steps show how to create a data source view. These steps assume that the data source has been created in the previous steps; however, the data source and the data source view can also be created at the same time by launching the Data Source View Wizard.
1. Right-click on the Data Source Views folder in Solution Explorer and select New Data Source View.
2. Read the first page of the Data Source View Wizard and click Next.
3. Select the Adventure Works DW data source and click Next. Note that you could also launch the Data Source Wizard from here by clicking New Data Source.
4. Select the FactFinance(dbo) table in the Available Objects list and click the > button to move it to the Included Objects list. This will be the fact table in the new cube.
5. Click the Add Related Tables button to automatically add all of the tables that are directly related to the dbo.FactFinance table. These will be the dimension tables for the new cube. Fig.5.3 shows the wizard with all of the tables selected.

Fig.5.3: Selecting the tables and views for the data source view

6. Click Next.
7. Name the new view Finance and click Finish. BIDS will automatically display the schema of the new data source view, as shown in Fig.5.4.

Invoking the Cube Wizard


Invoke the Cube Wizard by right-clicking on the Cubes folder in Solution Explorer. The Cube Wizard interactively explores the structure of your data source view to identify the dimensions, levels, and measures in your cube. To create the new cube, follow these steps:
1. Right-click on the Cubes folder in Solution Explorer and select New Cube.
2. Read the first page of the Cube Wizard and click Next.
3. Select the option to Use Existing Tables.
4. Click Next.


5. The Finance data source view should be selected in the drop-down list at the top. Place a checkmark next to the FactFinance table to designate it as a measure group table and click Next.
6. Remove the checkmark for the FinanceKey field, indicating that it is not a measure we wish to summarize, and click Next.
7. Leave all Dim tables selected as dimension tables, and click Next.
8. Name the new cube FinanceCube and click Finish (a relational sketch of the aggregations this cube supports appears after Fig.5.4).

Fig.5.4: Viewing tables and views in your Data Source View.
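For orientation, the rollups FinanceCube will answer correspond to queries like the following against the underlying star schema. This is only a sketch; it assumes the usual AdventureWorksDW2008 key columns such as DateKey and AccountKey.

-- Total Amount by calendar year and account, the kind of rollup
-- the cube will precompute after processing.
SELECT d.CalendarYear,
       a.AccountDescription,
       SUM(f.Amount) AS TotalAmount
FROM dbo.FactFinance AS f
JOIN dbo.DimDate    AS d ON d.DateKey    = f.DateKey
JOIN dbo.DimAccount AS a ON a.AccountKey = f.AccountKey
GROUP BY d.CalendarYear, a.AccountDescription;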


Defining Dimensions
The Cube Wizard defines dimensions based upon your choices, but it doesn't populate the dimensions with attributes. Each dimension needs to be edited, adding any attributes that users will wish to use when querying the cube. To populate the dimensions, follow these steps:
1. In BIDS, double-click on DimDate in the Solution Explorer.
2. Using Table 5.1 below as a guide, drag the listed columns from the right-hand panel (named Data Source View) and drop them in the left-hand panel (named Attributes) to include them in the dimension.

DimDate: CalendarYear, CalendarQuarter, MonthNumberOfYear, DayNumberOfWeek, DayNumberOfMonth, DayNumberOfYear, WeekNumberOfYear, FiscalQuarter, FiscalYear
Table 5.1

3. Using Table 5.2, add the listed columns to the remaining four dimensions.

DimDepartmentGroup: DepartmentGroupName
DimAccount: AccountDescription, AccountType
DimScenario: ScenarioName
DimOrganization: OrganizationName
Table 5.2


Adding Dimensional Intelligence


One of the most common ways data gets summarized in a cube is by time: query sales per month for the last fiscal year, or compare year-to-date production values to last year's year-to-date values. Cubes know a lot about time, but in order for SQL Server Analysis Services to best answer such questions, it needs to know which dimension stores the time information and which fields in the time dimension correspond to which units of time. The Business Intelligence Wizard helps you specify this information in the cube.
Steps:
1. With your FinanceCube open in BIDS, click on the Business Intelligence Wizard button on the toolbar.
2. Read the initial page of the wizard and click Next.
3. Choose Define Dimension Intelligence and click Next.
4. Choose DimDate as the dimension you wish to modify and click Next.
5. Choose Time as the dimension type. In the bottom half of this screen are listed the units of time about which cubes have knowledge. Using Table 5.3 below, place a checkmark next to the listed units of time and then select which field in DimDate contains that type of data.
6. Click Next.

Year: CalendarYear
Quarter: CalendarQuarter
Month: MonthNumberOfYear
Day of Week: DayNumberOfWeek
Day of Month: DayNumberOfMonth
Day of Year: DayNumberOfYear
Week of Year: WeekNumberOfYear
Fiscal Quarter: FiscalQuarter
Fiscal Year: FiscalYear

Table 5.3: Time columns for FinanceCube


Hierarchies
We now need to create hierarchies in the defined dimensions. Hierarchies are defined by a sequence of fields and are often used to determine the rows or columns of a pivot table when querying a cube.
Steps:
1. In BIDS, double-click on DimDate in the Solution Explorer.
2. Create a new hierarchy by dragging the CalendarYear field from the left-hand pane (called Attributes) and dropping it in the middle pane (called Hierarchies).
3. Add a new level by dragging the CalendarQuarter field from the left-hand pane and dropping it on the <new level> spot in the new hierarchy in the middle pane.
4. Add a third level by dragging the MonthNumberOfYear field to the <new level> spot in the hierarchy.
5. Right-click on the hierarchy and rename it to Calendar.
6. In the same manner, create a hierarchy named Fiscal that contains the fields FiscalYear, FiscalQuarter, and MonthNumberOfYear. Fig.5.5 shows the hierarchy panel.

Fig.5.5: DimDate hierarchies.

Deploying and Processing a Cube


At this point, we've defined the structure of the new cube, but there's still more work to be done. We still need to deploy this structure to an Analysis Services server and then process the cube to create the aggregates that make querying fast and easy. To deploy and process your cube, follow these steps:
1. In BIDS, select Project -> AdventureWorksCube1 Properties from the menu system.

2. Choose the Deployment category of properties in the upper left-hand corner of the project properties dialog box.
3. Verify that the Server property lists your server name. If not, enter your server name. Click OK. Fig.5.6 shows the project properties window.
4. From the menu, select Build -> Deploy AdventureWorksCube1. Fig.5.7 shows the cube deployment window after a successful deployment.

Fig 5.6: Project Properties.

Fig.5.7: Deploying a cube


Exploring a Data Cube


BIDS includes a built-in Cube Browser that lets you interactively explore the data in any cube that has been deployed and processed. To open the Cube Browser, right-click on the cube in Solution Explorer and select Browse. Fig.5.8 shows the default state of the Cube Browser just after it has been opened.

Fig.5.8: The Cube Browser in BIDS

To see the data in the cube you just created, follow these steps:
1. Right-click on the cube in Solution Explorer and select Browse.
2. Expand the Measures node in the metadata panel (the area at the left of the user interface).
3. Expand the Fact Finance measure group.
4. Drag the Amount measure and drop it on the Totals/Detail area.
5. Expand the Dim Account node in the metadata panel.
6. Drag the Account Description attribute and drop it on the Row Fields area.
7. Expand the Dim Date node in the metadata panel.
8. Drag the Calendar hierarchy and drop it on the Column Fields area.
9. Click the + sign next to year 2001 and then the + sign next to quarter 3.

10. Expand the Dim Scenario node in the metadata panel.
11. Drag the Scenario Name attribute and drop it on the Filter Fields area.
12. Click the drop-down arrow next to scenario name. Uncheck all of the checkboxes except for the one next to the Budget value. Fig.5.9 shows the result.
The Cube Browser displays month-by-month budgets by account for the third quarter of 2001. Although queries could have been written to extract this information from the original source data, it's much easier to let Analysis Services do the heavy lifting for you.

Fig.5.9: Exploring cube data in the cube browser


6. Reporting: SQL Server Reporting Services (SSRS)


Reporting Services includes two tools for creating reports:
Report Designer: can create reports of any complexity that Reporting Services supports, but requires you to understand the structure of your data and to be able to navigate the Visual Studio user interface.
Report Builder: provides a simpler user interface for creating ad hoc reports, directed primarily at business users rather than developers. Report Builder requires a developer or administrator to set up a data model before end users can create reports.

Using the Report Server Project Wizard


The easiest way to create a report in Report Designer is to use the Report Wizard. Like all wizards, the Report Wizard takes you through the process in step-by-step fashion. The following choices can be made in the wizard:
The data source to use
The query to use to retrieve data
Whether to use a tabular or matrix layout for the report
How to group the retrieved data
What visual style to use
Where to deploy the finished report

To create a simple report using the Report Wizard, follow these steps:
1. Launch Business Intelligence Development Studio.
2. Select File -> New -> Project.
3. Select the Business Intelligence Projects project type.
4. Select the Report Server Project Wizard template.
5. Name the new project ProductReport1 and pick a convenient location to save it in.
6. Click OK.
7. Read the first page of the Report Wizard and click Next.
8. Name the new data source AdventureWorksDS.
9. Click the Edit button.
10. Log on to your test server.

11. Select the AdventureWorks2008 database.
12. Click OK.
13. Click the Credentials button.
14. Select Use Windows Authentication.
15. Click OK.
16. Check the Make This a Shared Data Source checkbox. This will make this particular data source available to other Reporting Services applications in the future.
17. Click Next.
18. Click the Query Builder button.
19. If the full query designer interface does not display by default, click the query designer toolbar button at the far left end of the toolbar. Fig.6.1 shows the full query designer interface.

Fig.6.1: Query Designer

20. Click the Add Table toolbar button.
21. Select the Product table and click Add.
22. Click Close.

23. Check the Name, ProductNumber, Color, and ListPrice columns (the generated query is sketched after Fig. 6.2).
24. Click OK.
25. Click Next.
26. Select the Tabular layout and click Next.
27. Move the Color column to the Group area and the other three columns to the Detail area, as shown in Fig. 6.2.

Fig.6.2: Grouping columns in the report.
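Checking those four columns in step 23 causes the Query Builder to generate a simple SELECT over the Product table. It will look roughly like this against the AdventureWorks2008 Production schema (the wizard may qualify the names slightly differently):

-- Approximate query produced by the Query Builder in step 23.
SELECT Name, ProductNumber, Color, ListPrice
FROM Production.Product;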

28. Click Next.
29. Select the Stepped layout and click Next.
30. Select the Ocean style and click Next.
31. Accept the default deployment location and click Next.
32. Name the report ProductReport1.
33. Check the Preview Report checkbox.
34. Click Finish.
35. Fig.6.3 shows the finished report, open in Report Designer.

Fig.6.3: Report created by the Report Wizard

Using a Report Server Project
In general, the following steps must be followed to create a report:
1. Create a Report project in Business Intelligence Development Studio or open an existing Report project.
2. Add a report to the project.
3. Create one or more datasets for the report.
4. Build the report layout.
Specifically, follow the steps mentioned below:
1. Select File -> New -> Project.
2. Select the Business Intelligence Projects project type.
3. Select the Report Server Project template.
4. Name the new project ProductReport2 and pick a convenient location to save it in.
5. Right-click on the Reports node in Solution Explorer and select Add -> New Item.

6. Select the Report template.
7. Name the new report ProductReport2.rdl and click Add.
8. In the Report Data window, select New -> Data Source.
9. Name the new data source AdventureWorksDS.
10. Select the Embedded Connection option and click on the Edit button.
11. Connect to your test server and choose the AdventureWorks2008 database.
12. Click OK.
13. Click OK again to create the data source.
14. In the Report Data window, select New -> Dataset.
15. Name the dataset dsLocation.
16. Click the Query Designer button.
17. If the full Query Designer does not appear, click on the Edit As Text button.
18. Click the Add Table button.
19. Select the Location table.
20. Click Add.
21. Click Close.
22. Check the boxes for the Name and CostRate columns.
23. Sort the dataset in ascending order by Name and click OK (a sketch of the resulting query follows this list).
24. Click OK again to create the dataset.
25. Open the Toolbox window (View -> Toolbox).
26. Double-click the Table control.
27. Switch back to the Report Data window.
28. Expand the dataset to show the column names.
29. Drag the Name field and drop it in the first column of the table control on the design tab.
30. Drag the CostRate field from the Report Data window and drop it in the second column of the table control.
31. Place the cursor between the column selectors above the Name and CostRate columns to display a double-headed arrow. Hold down the mouse button and drag the cursor to the right to widen the Name column.
32. Fig. 6.4 shows the report in Design view.
33. Select the Preview tab to see the report with data.
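The dsLocation dataset built in steps 18-23 corresponds to a query roughly like the following (the Location table lives in the Production schema in AdventureWorks2008):

-- Approximate query behind the dsLocation dataset.
SELECT Name, CostRate
FROM Production.Location
ORDER BY Name;  -- ascending sort chosen in step 23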

Fig.6.4: Designing a report using Report Server project

Publishing a Report
Creating reports in Business Intelligence Development Studio is good for developers, but it doesn't help users at all. For the reports you build to be available to others, they must be published to the Reporting Services server. To publish a report, use the Build and Deploy menu items in Business Intelligence Development Studio. Before this, the project's configuration needs to be checked to make sure that the appropriate server has been selected for deployment. To publish the first report, follow these steps:
1. Select File -> Recent Projects and choose your ProductReport1 project.

2. Select Project -> ProductReport1 Properties.
3. Click the Configuration Manager button.
4. Fill in the Target Server URL for your Report Server. If you're developing on the same computer where Reporting Services is installed, and you installed in the default configuration, this will be http://localhost/ReportServer. Fig.6.5 shows the completed Property Pages.

Fig.6.5: Setting report project properties

5. Click OK.
6. Select Build -> Deploy ProductReport1. The Output window will track the progress of BIDS in deploying your report, as shown in Fig.6.6. Depending on the speed of your computer, building the report may take some time.

Fig.6.6: Deploying a report



7. Launch a web browser and enter the address http://localhost/reports.
8. It may take several minutes for the web page to display; Reporting Services goes to sleep when it hasn't been used for a while and can take a while to spin up to speed. Fig.6.7 shows the result.

Fig.6.7: The Report Manager.

9. Click the link for the ProductReport1 folder.
10. Click the link for the ProductReport1 report.

Using Report Builder


Report Designer helps you create reports for Reporting Services, but it's not the only way. SQL Server 2008 also includes a tool directed at end users, named Report Builder. Unlike Report Designer, which is aimed at developers, Report Builder presents a simplified view of the report-building process and is intended for business analysts and other end users.


Building a Data Model


Report Builder doesn't let end users explore all of a SQL Server database. Instead, it depends on a data model: a pre-selected group of tables and relationships that a developer has identified as suitable for end-user reporting. To build a data model, you use Business Intelligence Development Studio. Data models contain three things:
Data Sources: connect the data model to actual data.
Data Source Views: draw data from data sources.
Report Models: contain entities that end users can use on reports.

To create a data model, follow these steps:
1. If it's not already open, launch Business Intelligence Development Studio.
2. Select File -> New -> Project.
3. Select the Business Intelligence Projects project type.
4. Select the Report Model Project template.
5. Name the new project AWSales and save it in a convenient location.
6. Click OK.
7. Right-click on Data Sources in Solution Explorer and select Add New Data Source.
8. Read the first page of the Add New Data Source Wizard and click Next.
9. Click New.
10. In the Connection Manager dialog box, connect to the AdventureWorks2008 database on your test server and click OK.
11. Click Next.
12. Name the new data source AdventureWorks and click Finish.
13. Right-click on Data Source Views in Solution Explorer and select Add New Data Source View.
14. Read the first page of the Add New Data Source View Wizard and click Next.
15. Select the AdventureWorks data source and click Next.
16. Select the Product(Production) table and click the > button to move it to the Included Objects listbox.
17. Select the SalesOrderDetail(Sales) table and click the > button to move it to the Included Objects listbox.
18. Click the Add Related Tables button.

19. Click Next.
20. Click Finish.
21. Right-click on Report Models in Solution Explorer and select Add New Report Model.
22. Read the first page of the Report Model Wizard and click Next.
23. Select the Adventure Works2008 data source view and click Next.
24. Keep the default rules selection, as shown in Fig.6.8, and click Next.

Fig.6.8: Creating entities for end-user reporting

25. Choose the Update Statistics option and click Next.
26. Click Run to complete the wizard.
27. Click Finish. If you get a warning that a file was modified outside the source editor, click Yes.
28. Select Build -> Deploy AWSales to deploy the report model to the local Reporting Services server.


Building a Report
Report Builder itself is a ClickOnce Windows Forms application. This means that it's a Windows application that end users launch from their web browser, but it never gets installed on their computer, so they don't need local administrator rights to run it. To get started with Report Builder, browse to your Reporting Services home page. Typically, this will have a URL such as http://ServerName/Reports (or http://localhost/Reports if the browser is running on the same box as SQL Server 2008 itself). Fig.6.9 shows the Reporting Services home page.

Fig.6.9: Reporting Services home page

To run Report Builder, click the Report Builder link in the home page menu bar. Report Builder will automatically load all of the available report models and wait for the user to choose one to build a report from. The steps are as follows:
1. Open a browser window and navigate to http://localhost/Reports (or to the appropriate Report Server URL if you are not working on the report server).
2. Click the Report Builder link.
3. Depending on your operating system, you may have to confirm that you want to run the application.

4. After Report Builder is loaded, select the AdventureWorks2008 report model and the table report layout. Click OK. Fig.6.10 shows the new blank report that Report Builder will create.
5. Select the Product table.
6. Drag the Name field and drop it in the area labeled Drag and Drop Column Fields.
7. Click on Special Offer Products in the Explorer window to show related child tables.
8. Click on Sales Order Details.
9. Drag the Total Order Qty field and drop it to the right of the Name field.
10. Click where it says Click to Add Title and type Product Sales.

Fig.6.10: New report in Report Builder

11. Click the Run Report button to produce the report shown in Fig.6.11.


Fig.6.11: Report in Report Builder

12. Click the Sort and Group toolbar button.
13. Select to sort by Total Order Qty descending.
14. Click OK.
15. Select File -> Save.
16. Name the new report Product Sales.
17. Click Save. This will publish the report back to the Reporting Services server from which you originally downloaded Report Builder.


7. Clementine
Clementine is a data mining workbench that enables you to quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the industry-standard CRISP-DM model, Clementine supports the entire data mining process, from data to better business results.
Clementine Client, Server, and Batch
Clementine uses a client/server architecture to distribute requests for resource-intensive operations to powerful server software, resulting in faster performance on larger datasets. Additional products or updates beyond those listed here may also be available.
Clementine Client: Clementine Client is a functionally complete version of the product that is installed and run on the user's desktop computer. It can be run in local mode as a standalone product, or in distributed mode along with Clementine Server for improved performance on large datasets.
Clementine Server: Clementine Server runs continually in distributed analysis mode together with one or more client installations, providing superior performance on large datasets because memory-intensive operations can be done on the server without downloading data to the client computer. Clementine Server also provides support for SQL optimization, batch-mode processing, and in-database modeling capabilities, delivering further benefits in performance and automation. At least one Clementine Client or Clementine Batch installation must be present to run an analysis.
Clementine Batch: Clementine Batch is a special version of the client that runs in batch mode only, providing support for the complete analytical capabilities of Clementine without access to the regular user interface. This allows long-running or repetitive tasks to be performed without user intervention and without the presence of the user interface on the screen. Unlike Clementine Client, which can be run as a standalone product, Clementine Batch must be licensed and used only in combination with Clementine Server.

Starting Clementine
Follow these steps to start Clementine: From the Windows Start menu choose All Programs -> Clementine -> SPSS Clementine. Fig.7.1 demonstrates this step.


Fig.7.1: Starting Clementine

When you first start Clementine, the workspace opens in the default view. The area in the middle is called the stream canvas; this is the main area in which you will work in Clementine. See Fig.7.2.

Fig.7.2: Clementine Workspace (Default View)

Most of the data and modeling tools in Clementine reside in palettes, the area below the stream canvas. Each tab contains groups of nodes that are a graphical representation of data mining tasks, such as accessing and filtering data, creating graphs, and building models. This is depicted in Fig.7.3.


Fig.7.3: Shows the Stream Canvas and Palette Area

To add nodes to the canvas, double-click icons from the node palettes or drag and drop them onto the canvas. Then connect them to create a stream, representing the flow of data.

Clementine Managers
On the top right side of the window are the outputs and object managers. These tabs are used to view and manage a variety of Clementine objects. This is shown in Fig.7.4.

Fig.7.4: Streams Tab


The Outputs tab contains a variety of files produced by stream operations in Clementine. Users can display, rename, and close the tables, graphs, and reports listed here. See Fig.7.5.

Fig.7.5: Outputs Tab

The Models tab is a powerful tool that contains all generated models (models that have been built in Clementine) for a session. Models can be examined closely, added to the stream, exported, or annotated. See Fig 7.6.

Fig.7.6: Models Tab

Clementine Projects
On the bottom right side of the window is the projects tool, used to create and manage data mining projects. There are two ways to view projects you create in Clementine:
1. CRISP-DM view
2. Classes view

1. The CRISP-DM tab provides a way to organize projects according to the Cross-Industry Standard Process for Data Mining, an industry-proven, nonproprietary methodology. For both experienced and first-time data miners, using the CRISP-DM tool will help you to better organize and communicate your efforts. See Fig.7.7.

Fig.7.7: CRISP-DM view

2. The Classes tab provides a way to organize your work in Clementine categorically--by the types of objects you create. This view is useful when taking inventory of data, streams, models, etc. See Fig.7.8.

Fig.7.8: Classes View


Exercise 1
Problem: Imagine that a medical researcher is compiling data for a study. He has collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications. Now the researcher has been given the job of finding out which drug might be appropriate for a future patient with the same illness.
Solution:
1. Store the data in a text file as shown in Table 7.1. The data fields used are Age, Sex, BP, Cholesterol, blood sodium concentration, blood potassium concentration, and the drug used.

Age: Number
Sex: M or F
BP: HIGH, NORMAL, or LOW
Cholesterol: NORMAL or HIGH
Na: Blood sodium concentration
K: Blood potassium concentration
Drug: Prescription drug to which the patient responded

Table 7.1: Shows the data fields

2. Read in the delimited text data using a Variable File node. Add a Variable File node from the palettes; either click the Sources tab to find the node or use the Favorites tab, which includes this node by default. Next, double-click the newly placed node to open its dialog box. See Fig.7.9.
3. Click the button just to the right of the File box, marked with an ellipsis (...), to browse to the directory and select the file called DRUG1n (this file contains the data fields mentioned in step 1). Select Read field names from file and notice the fields and values that have just been loaded into the dialog box. This step is shown in Fig.7.10.


Fig.7.9: Illustrates step 2.

Fig.7.10: Reading the Text File

4. Click the Data tab to override and change the storage of a field. Note that storage is different from the type, or usage, of the data field. The Data tab is highlighted in yellow in Fig.7.11.


Fig.7.11: Shows the Data tab
5. The Types tab helps you learn more about the types of the fields in your data. You can also choose Read Values to view the actual values for each field, based on the selections that you make in the Values column. This process is known as instantiation and is shown in Fig.7.12.

Fig.7.12: Shows Types Tab

6. Adding a Table: Now that the data has been loaded, glance at the values for some of the records. This can be done by building a stream that includes a Table node. To place a Table node in the stream, either double-click its icon in the palette or drag and drop it onto the canvas. See Fig.7.13 and Fig.7.14.

Fig.7.13: Shows the palette

Fig.7.14: Shows step 6
7. To view the table, click the green arrow button on the toolbar to execute the stream, or right-click on the Table node and select Execute. This is shown in Fig.7.15.

Fig.7.15: View the Table

8. Creating a distribution graph: During data mining, it is often useful to explore the data by creating visual summaries. Clementine offers several different types of graphs to choose from, depending on the kind of data that you want to summarize. For example, to find out what proportion of the patients responded to each drug, use a Distribution node.
9. Add a Distribution node to the stream and connect it to the Source node, then double-click the node to edit its display options. Select Drug as the target field whose distribution you want to show. Then click Execute from the dialog box. See Fig.7.16.

Fig.7.16: Selecting the drug as the target field

10. The resulting graph, shown in Fig.7.17, helps you see the "shape" of the data. It shows that patients responded to drug Y most often and to drugs B and C least often.

Fig.7.17: Graph showing the result
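If the same DRUG1n records were loaded into a SQL table, the distribution shown in Fig.7.17 would correspond to a simple frequency query (the table name below is hypothetical):

-- Frequency of each drug among the patients, the relational
-- counterpart of the Distribution node. dbo.Drug1n is hypothetical.
SELECT Drug,
       COUNT(*) AS Patients
FROM dbo.Drug1n
GROUP BY Drug
ORDER BY Patients DESC;  -- drug Y should appear first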
