Software
Operating system: Microsoft gives you a fairly wide choice of operating systems (both 32-bit and 64-bit) that can run SQL Server:
1. Windows Server 2008 (Standard, Data Center, Enterprise)
2. Windows Server 2003 (Standard, Data Center, Enterprise)
3. Windows XP Professional Edition
4. Windows Vista (Ultimate, Home Premium, Home Basic, Enterprise, Business)
Apply the latest service pack for your operating system; in many cases, SQL Server depends on these patches. SQL Server 2008 ships with the Business Intelligence Development Studio (BIDS) and SQL Server Management Studio.
Dept of CSE
BI &DM Manual
2. Introduction
The Business Intelligence process goes through the following components:
1. Data Integration
2. Data Analysis
3. Reporting
Data integration is achieved using SQL Server Integration Services (SSIS), provided through the Business Intelligence Development Studio (BIDS) tool. Integration Services is a platform for building enterprise-level data integration and data transformation solutions. It includes a rich set of built-in tasks and transformations, and tools for constructing packages. Integration Services relies on components such as the Integration Services package, data flow, control flow, and connection managers. All these components are threaded together to achieve the desired functionality. A data flow consists of three main components:
1. Data Flow Sources
2. Data Flow Transformations
3. Data Flow Destinations
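The source -> transformation -> destination pattern that an SSIS data flow implements can be sketched in plain Python. This is a hypothetical three-stage pipeline for illustration only; the stage functions and row layout are not part of any SSIS API.

```python
# A minimal sketch of the SSIS data-flow pattern: source -> transformation -> destination.
# The stage functions and the sample rows are illustrative, not part of any SSIS API.

def source():
    """Data flow source: yields rows (stands in for a flat file or OLEDB source)."""
    yield {"name": "Ravi", "age": 19}
    yield {"name": "Lakshmi", "age": 23}

def transformation(rows):
    """Data flow transformation: derives a new column from an existing one."""
    for row in rows:
        row["age_next_year"] = row["age"] + 1
        yield row

def destination(rows):
    """Data flow destination: collects rows (stands in for a file or table load)."""
    return list(rows)

loaded = destination(transformation(source()))
```

Because the stages are chained generators, rows stream through one at a time, much as an SSIS data flow pipelines buffers between components.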
ADO.NET Source: Extracts data from a relational database by using a .NET provider.
Excel Source: Extracts data from an Excel workbook using Excel Connection Manager.
Flat File Source: Extracts data from flat files (i.e., text files) using Flat File Connection Manager.
OLEDB Source: Extracts data from a relational database using an OLEDB provider.
Raw File Source: Extracts data from a raw file using a direct connection.
XML Source: Reads data from an XML data source by specifying the location of the XML file.
Fig.2.2: Data Flow Transformations

Look Up: Looks up values in a reference dataset by using exact matching.
Merge: Merges two sorted datasets.
Merge Join: Merges two datasets by using a join.
Multicast: Distributes copies of a dataset to multiple outputs.
OLEDB Command: Executes an SQL command for each row in a dataset.
Percentage Sampling: Creates a sample dataset by extracting a percentage of rows from a dataset.
Pivot: Pivots a dataset to create a less normalized representation of the data.
Row Count: Counts the rows in a dataset.
Row Sampling: Creates a sample dataset by extracting a number of rows from a dataset.
Script Component: Executes a custom script.
Slowly Changing Dimension: Updates a slowly changing dimension.
Sort: Sorts data.
Term Extraction: Extracts terms from data in a column.
Term Lookup: Counts how frequently terms from a reference table appear in a dataset.
Union All: Merges multiple datasets.
Unpivot: Creates a more normalized representation of a dataset.
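Several of these transformations have direct analogues in ordinary code. The following stdlib Python sketch (the product rows and quarterly quantities are made up) shows Unpivot, Union All, Sort, and Row Count:

```python
# Plain-Python analogues of a few data flow transformations.
# The sample rows (products and quarterly quantities) are made up for illustration.

wide = [{"product": "Bikes", "Q1": 10, "Q2": 12},
        {"product": "Helmets", "Q1": 5, "Q2": 7}]

# Unpivot: turn columns into rows, giving a more normalized representation.
narrow = [{"product": r["product"], "quarter": q, "qty": r[q]}
          for r in wide
          for q in ("Q1", "Q2")]

# Union All: concatenate datasets without requiring sorted inputs.
union_all = narrow + [{"product": "Gloves", "quarter": "Q1", "qty": 3}]

# Sort: order a dataset by a column.
sorted_rows = sorted(union_all, key=lambda r: r["qty"])

# Row Count: count the rows in a dataset.
row_count = len(union_all)
```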
Fig.2.3: Data Flow Destinations

ADO.NET Destination: Writes to a database using an ADO.NET provider.
Data Mining Model Training: Passes data through the data mining model algorithms to train data mining models.
Data Reader Destination: Creates and populates an ADO.NET in-memory dataset.
Dimension Processing: Loads and processes a SQL Server Analysis Services dimension.
Excel Destination: Loads data into an Excel workbook.
Flat File Destination: Loads data into a flat file (i.e., a text file).
OLEDB Destination: Loads data into relational databases using an OLEDB provider.
Partition Processing: Loads and processes a SQL Server Analysis Services partition.
Raw File Destination: Outputs data to a raw file.
Recordset Destination: Creates and populates an in-memory ADO recordset.
SQL Server Compact Destination: Loads data into a SQL Server Compact database.
SQL Server Destination: Loads data into a SQL Server database.
Fig.3.1: Opening the Business Intelligence Development Studio

2. The start page of BIDS will open, as shown in Fig.3.2.
3. From the menu bar, click File -> New -> Project. A New Project dialog box opens, as shown in Fig.3.3 and Fig.3.4.

Fig.3.4: The New Project window

4. Select the Integration Services Project template from the Visual Studio installed templates. Set the name and location where the project is to be saved, then press the OK button.
5. In the resulting window, four panes can be seen. Select the Data Flow pane and click the link in it to add a new Data Flow task, as shown in Fig.3.5.
Fig.3.5: Selecting the Data Flow pane and adding a Data Flow task

6. Go to the toolbox palette at the top left of the window and drag the data source and data destination onto the designer surface, as shown in Fig.3.6 and Fig.3.7.
Fig.3.7: Source and destination

7. Click the OLEDB Source, then drag the green pointer from the OLEDB Source (relational database) and drop it on the Flat File Destination (text file). After connecting the source to the destination, double-click the OLEDB Source. A new dialog box opens; click the New button, as shown in Fig.3.8.
8. Click the New button to configure the OLEDB Connection Manager, as illustrated in Fig.3.9.
Fig.3.9: OLEDB connection

9. If the software is installed only on the local machine (i.e., not in client/server mode), type localhost as the Server Name; otherwise, type the IP address or machine name of the server (client/server mode).
10. If the Server Name is localhost, check the Use Windows Authentication radio button. Select the database name and then click the Test Connection button. If everything is correct, a success message is displayed (as shown in Fig.3.11); otherwise, an error message is displayed. If an IP address is given as the Server Name, check the Use SQL Server Authentication option instead, specify the username and password, select the database name, and test the connection.
11. Click the OK button of the success-message dialog box. This enables the OK button of the OLEDB Connection Manager dialog box (see Fig.3.9); click it.
12. Click the drop-down menu and select the table containing the employee details. A preview of the table can be seen by clicking the Preview button. Then click the OK button, as shown in Fig.3.12. The OLEDB Source is now set to the employee details.
Fig.3.12: Selecting the table and previewing it

13. Now double-click the Flat File Destination. The Flat File Destination Editor opens, as shown in Fig.3.13. Click the New button.
14. After clicking the New button, you will be asked for the format of the file in which the data is to be stored. By default, this is Delimited; if so, proceed by pressing the OK button.
15. Set an appropriate connection manager name and description. Click the Browse button to select the text file in which the data from the OLEDB Source is to be stored; if the text file has not yet been created, create it. Then check the option Column names in the first data row. With this option checked, the first row of the text file will contain the column names as defined in the table: for example, if Name, Age, Designation, and Salary are the column names in the table, the same will be the column names in the text file. Click the OK button (see Fig.3.15).
16. Finally, go to the Debug option in the menu bar and click Start Debugging to execute the package (alternatively, press F5). The data transfer is successful if the components turn green on the screen, as shown in Fig.3.16, which also shows the number of rows transferred from the OLEDB Source to the Flat File Destination.
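The package built above simply copies a table to a delimited file with a header row. The same effect can be sketched in Python, with the stdlib sqlite3 and csv modules standing in for the OLEDB source and flat-file destination (the employee table and its data are made up for illustration):

```python
import csv
import io
import sqlite3

# In-memory database standing in for the OLEDB source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (Name TEXT, Age INTEGER, Designation TEXT, Salary REAL)")
con.execute("INSERT INTO employee VALUES ('Ravi', 30, 'Analyst', 45000.0)")

cur = con.execute("SELECT Name, Age, Designation, Salary FROM employee")

# Flat-file destination: write a header row (column names), then the data rows.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow([d[0] for d in cur.description])  # "Column names in the first data row"
writer.writerows(cur)

flat_file = out.getvalue()
```

Writing the column names first mirrors the Column names in the first data row option of the Flat File Destination.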
2. Move the student details in the flat file to the relational database.

Steps Involved:
1. Create a new package in the same project created in the first exercise. Go to the Solution Explorer window on the right side, right-click SSIS Packages, and select New SSIS Package. A new workspace opens. Drag a Flat File Source and an OLEDB Destination from the toolbox and proceed as in the previous exercise. Fig.3.17 and Fig.3.18 demonstrate step 1.
3. To move data from a flat file (text file) to an Excel file.

Steps Involved:
1. Create a new project, or create a new package in the existing project. Follow steps 1-5 of Exercise 1.
2. Drag the Flat File Source and Excel Destination and drop them on the Data Flow pane.
3. The data types of the text file and the Excel file differ. To overcome this, a Data Conversion transformation has to be used between the Flat File Source and the Excel Destination.
Fig.3.19: Data conversion between the flat file and the Excel file
4. Make the connections as shown in Fig.3.19. Configure the Flat File Source and Excel Destination as mentioned in Exercise 1. Double-click the Data Conversion transformation; the Data Conversion Transformation Editor dialog box opens, as shown in Fig.3.20.
5. Columns in the text file are of the string [DT_STR] data type. While moving the data to the Excel file, these have to be converted to Unicode string [DT_WSTR]. Check all four columns (EmpNo, Name, Age, and Gender), then change the data type from string [DT_STR] to Unicode string [DT_WSTR] as shown in Fig.3.21. Click the OK button after setting the required data types.
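The Data Conversion step is just a per-column cast from a single-byte string type to a Unicode string type. The idea can be sketched in Python, with DT_STR represented as bytes and DT_WSTR as str (the sample rows are made up):

```python
# Rows from the flat-file source, with DT_STR columns modeled as raw bytes.
rows = [{"EmpNo": b"101", "Name": b"Ravi", "Age": b"30", "Gender": b"M"}]

# Data Conversion: cast every checked column to a Unicode string (DT_WSTR).
converted = [
    {col: val.decode("utf-8") for col, val in row.items()}
    for row in rows
]
```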
6. Double-click the Excel Destination. The Excel Destination Editor dialog box opens. Click the New button. Fig.3.22 illustrates this step.
7. Specify the name of the Excel file where the data will be stored. Check the option First row has column names. Click the OK button.
8. In the Excel Destination Editor dialog box, click the New button (Fig.3.24); the Create Table dialog box opens. Perform the tasks shown in Fig.3.25; the resulting window is shown in Fig.3.26. Click the OK button.
9. Select the name of the Excel sheet and click the Mappings option on the left side of the dialog box. See Fig.3.27.
10. Map the Input columns with changed data type to the available output columns as shown in Fig. 3.28 and then click OK button.
Fig.3.28: Mapping the Input Columns to Output columns with changed data type.
11. Finally, press the Debug button to execute the package. The data from the flat file will be moved to the Excel file.

Note: In the above exercises, the data sources (flat file, OLEDB, Excel) must already contain data, such as student details, employee details, or sales details, before performing any task.
Character Map
The Character Map transformation enables us to modify the contents of character-based columns. The modified column can replace the original column in the data flow, or it can be added to the data flow as a new column. The following character mappings are available:
Lowercase: changes all characters to lowercase.
Uppercase: changes all characters to uppercase.
Byte Reversal: reverses the byte order of each character.
Hiragana: maps Katakana characters to Hiragana characters.
Katakana: maps Hiragana characters to Katakana characters.
Half Width: changes double-byte characters to single-byte characters.
Full Width: changes single-byte characters to double-byte characters.
Linguistic Casing: applies linguistic casing rules instead of system casing rules.
Simplified Chinese: maps traditional Chinese characters to simplified Chinese.
Traditional Chinese: maps simplified Chinese characters to traditional Chinese.
Multiple character mappings can be applied to a single column at the same time. However, a number of mappings are mutually exclusive; for example, it doesn't make sense to apply both the Lowercase and Uppercase mappings to the same column. The Character Map Transformation Editor dialog box is shown in Fig.4.1.
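A few of these mappings correspond directly to standard string operations. The sketch below shows the Uppercase, Lowercase, and Half Width mappings in Python; the half-width example uses Unicode NFKC normalization as an approximation of the full-width-to-half-width mapping, and the sample strings are made up:

```python
import unicodedata

name = "Ravi Kumar"

upper = name.upper()   # Uppercase mapping
lower = name.lower()   # Lowercase mapping

# Full width -> half width: NFKC normalization folds full-width Latin
# characters (e.g. "ＡＢＣ") to their single-byte (half-width) forms.
half_width = unicodedata.normalize("NFKC", "ＡＢＣ")
```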
Conditional Split
The Conditional Split transformation enables us to split the data flow into multiple outputs. In the Conditional Split Transformation Editor dialog box, shown in Fig.4.2, conditions have been defined for each branch of the split. When the package executes, each row in the data flow is compared against the conditions in order; when the row meets a set of conditions, it is sent to that branch of the split. In Fig.4.2, if the age is less than 20 years, the row is sent to the less-than-20 output; if the age is greater than 20 years, the row is sent to the greater-than-20 output.
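A conditional split amounts to routing each row to the first branch whose condition it satisfies. A Python sketch using the age cutoff from the example (the sample rows are made up):

```python
rows = [{"name": "Anu", "age": 18}, {"name": "Ravi", "age": 25}]

less_than_20, greater_than_20 = [], []
for row in rows:
    # Conditions are evaluated in order; the row goes to the first branch
    # whose condition it meets, otherwise to the remaining (default) output.
    if row["age"] < 20:
        less_than_20.append(row)
    else:
        greater_than_20.append(row)
```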
Copy Column
The Copy Column transformation is used to create new columns in the data flow that are copies of existing columns. The new columns can then be used later in the data flow for calculations, transformations, or mapping to columns in the data destination. Fig.4.3 shows the Copy Column Transformation Editor dialog box with the Gender column being copied to a new column called GenderStatistics.
Fig.4.3: Copy Column Transformation Editor Dialog Box
Lookup
The Lookup transformation looks for values (provided by the lookup table) in the data source. By default, the Lookup transformation loads the entire source lookup table into cache for faster processing. If the source lookup table is too large to be loaded completely into cache, a restriction can be set on the amount of memory used. Fig.4.5 shows the Lookup Transformation Editor. In addition, if only a portion of the records in the source lookup table is needed to resolve the lookups for a given Lookup transformation, only that portion need be loaded into memory. The Configure Error Output dialog box lets us determine whether an unresolved lookup is ignored, sent to the error output, or causes the transformation to fail.
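A lookup with a fully cached reference table and an error output for unresolved rows can be sketched as follows (the reference table and rows are made up):

```python
# Reference (lookup) table loaded fully into cache, keyed on the join column.
dept_lookup = {10: "Sales", 20: "Engineering"}

rows = [{"emp": "Anu", "dept_id": 10}, {"emp": "Ravi", "dept_id": 99}]

resolved, error_output = [], []
for row in rows:
    if row["dept_id"] in dept_lookup:
        row["dept_name"] = dept_lookup[row["dept_id"]]  # exact match
        resolved.append(row)
    else:
        error_output.append(row)  # unresolved lookup -> error output
```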
Merge
The Merge transformation merges two data flows together. For the Merge transformation to work properly, both input data flows must be sorted using the same sort order. This can be done by using the Sort transformation in each data flow prior to the Merge transformation.
Fig.4.6 shows two lists of names being merged together. Each input is sorted by Name; when the records from the two inputs are merged, the resulting output is also in Name order. All of the rows in both input data flows are present in the merged output: for example, if there are 10 rows in the first input data flow and 15 rows in the second, there will be 25 rows in the output data flow.
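Because both inputs are sorted on the same key, the merge can interleave them in a single pass. Python's heapq.merge does exactly this (the names are made up):

```python
import heapq

first = ["Anu", "Mala", "Ravi"]   # sorted by Name
second = ["Bala", "Zara"]         # sorted by Name

# Merge transformation: single-pass merge of two pre-sorted inputs;
# every row from both inputs appears in the (still sorted) output.
merged = list(heapq.merge(first, second))
```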
Percentage Sampling
The Percentage Sampling transformation splits the data flow into two separate data flows based on a percentage. This is useful when a small sample of a larger set of data has to be created for testing, or for training a data mining model. Fig.4.7 shows the Percentage Sampling Transformation Editor dialog box.
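Percentage sampling amounts to routing each row to one of two outputs with a fixed probability. A sketch (the row set and percentage are made up; the random generator is seeded so the split is reproducible):

```python
import random

rows = list(range(100))
sample_pct = 0.10

rng = random.Random(42)  # fixed seed for a reproducible split
sampled, unsampled = [], []
for row in rows:
    # Each row independently has a sample_pct chance of being sampled.
    (sampled if rng.random() < sample_pct else unsampled).append(row)
```

Every input row lands in exactly one of the two outputs, so together they always partition the original data flow.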
5. Data Analysis
SQL Server Analysis Services (SSAS)
SSAS provides an Online Analytical Processing (OLAP) solution that includes data mining solutions. Specialized algorithms are used to help decision makers identify patterns, trends, and associations in business data. SQL Server Analysis Services uses Online Analytical Processing (OLAP) as opposed to Online Transaction Processing (OLTP).
Steps
The overall steps to be followed to create and use a cube are:
1. Create a project
2. Define a data source
3. Define a data source view
4. Create a cube using measures and dimensions

Note: These exercises use the AdventureWorksDW2008 database, available as a free download from Microsoft's CodePlex site. CodePlex is Microsoft's open-source project hosting Web site. The SQL Server database samples are found here:
www.codeplex.com/MSFTDBProdSamples

The following steps are used to create an SSAS project within BIDS; the SSAS project is used to create a SQL Server Analysis Services cube. These steps assume that SQL Server Analysis Services has been installed on your server.
1. Launch the Business Intelligence Development Studio by choosing Start -> All Programs -> Microsoft SQL Server 2008 -> SQL Server Business Intelligence Development Studio. BIDS launches but is blank; a project needs to be either opened or created.
2. Choose File -> New -> Project.
3. On the New Project page, select Analysis Services Project. Enter AdventureWorks as the name and set the location to C:\AdventureWorks, or continue with the default name and location. Fig.5.1 shows the dialog box with the default name and location.
4. Click OK. The project is created and displayed in Visual Studio.
5. Leave BIDS open for the next steps.
Fig.5.3: Selecting the tables and views for the data source view
6. Click Next.
7. Name the new view Finance and click Finish. BIDS will automatically display the schema of the new data source view, as shown in Fig.5.4.
5. The Finance data source view should be selected in the drop-down list at the top. Place a checkmark next to the FactFinance table to designate it as a measure group table and click Next.
6. Remove the checkmark for the FinanceKey field, indicating that it is not a measure we wish to summarize, and click Next.
7. Leave all Dim tables selected as dimension tables, and click Next.
8. Name the new cube FinanceCube and click Finish.
Defining Dimensions
The Cube Wizard defines dimensions based upon the choices made, but it doesn't populate the dimensions with attributes. Each dimension needs to be edited, adding any attributes that users will wish to use when querying the cube. To populate the dimensions, follow these steps:
1. In BIDS, double-click DimDate in the Solution Explorer.
2. Using Table 5.1 below as a guide, drag the listed columns from the right-hand panel (named Data Source View) and drop them in the left-hand panel (named Attributes) to include them in the dimension.

DimDate: CalendarYear, CalendarQuarter, MonthNumberOfYear, DayNumberOfWeek, DayNumberOfMonth, DayNumberOfYear, WeekNumberOfYear, FiscalQuarter, FiscalYear
Table 5.1
3. Using Table 5.2, add the listed columns to the remaining four dimensions.

DimDepartmentGroup: DepartmentGroupName
DimAccount: AccountDescription, AccountType
DimScenario: ScenarioName
DimOrganization: OrganizationName
Table 5.2
Time Property Name: Time Column
Year: CalendarYear
Quarter: CalendarQuarter
Month: MonthNumberOfYear
Day of Week: DayNumberOfWeek
Day of Month: DayNumberOfMonth
Day of Year: DayNumberOfYear
Week of Year: WeekNumberOfYear
Fiscal Quarter: FiscalQuarter
Fiscal Year: FiscalYear
Hierarchies
There is a need to create hierarchies in the defined dimensions. Hierarchies are defined by a sequence of fields and are often used to determine the rows or columns of a pivot table when querying a cube.

Steps:
1. In BIDS, double-click DimDate in the Solution Explorer.
2. Create a new hierarchy by dragging the CalendarYear field from the left-hand panel (called Attributes) and dropping it in the middle panel (called Hierarchies).
3. Add a new level by dragging the CalendarQuarter field from the left-hand panel and dropping it on the <new level> spot in the new hierarchy in the middle panel.
4. Add a third level by dragging the MonthNumberOfYear field to the <new level> spot in the hierarchy.
5. Right-click the hierarchy and rename it to Calendar.
6. In the same manner, create a hierarchy named Fiscal that contains the fields FiscalYear, FiscalQuarter, and MonthNumberOfYear. Fig.5.5 shows the hierarchy panel.
2. Choose the Deployment category of properties in the upper left-hand corner of the project properties dialog box.
3. Verify that the Server property lists your server name; if not, enter your server name. Click OK. Fig.5.6 shows the project properties window.
4. From the menu, select Build -> Deploy AdventureWorksCube1. Fig.5.7 shows the cube deployment window after a successful deployment.
To see the data in the cube you just created, follow these steps:
1. Right-click on the cube in Solution Explorer and select Browse.
2. Expand the Measures node in the metadata panel (the area at the left of the user interface).
3. Expand the Fact Finance measure group.
4. Drag the Amount measure and drop it on the Totals/Detail area.
5. Expand the Dim Account node in the metadata panel.
6. Drag the Account Description attribute and drop it on the Row Fields area.
7. Expand the Dim Date node in the metadata panel.
8. Drag the Calendar hierarchy and drop it on the Column Fields area.
9. Click the + sign next to year 2001 and then the + sign next to quarter 3.
10. Expand the Dim Scenario node in the metadata panel.
11. Drag the Scenario Name attribute and drop it on the Filter Fields area.
12. Click the drop-down arrow next to Scenario Name. Uncheck all of the checkboxes except for the one next to the Budget value. Fig.5.9 shows the result.
The Cube Browser displays month-by-month budgets by account for the third quarter of 2001. Although queries could have been written to extract this information from the original source data, it's much easier to let Analysis Services do the heavy lifting for you.
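The browser query above is essentially a filtered pivot: sum the Amount measure grouped by account (rows) and calendar period (columns), where the scenario is Budget. The same aggregation can be sketched over made-up fact rows:

```python
from collections import defaultdict

# Made-up fact rows standing in for FactFinance.
facts = [
    {"account": "Travel", "year": 2001, "month": 7, "scenario": "Budget", "amount": 100.0},
    {"account": "Travel", "year": 2001, "month": 8, "scenario": "Budget", "amount": 150.0},
    {"account": "Travel", "year": 2001, "month": 7, "scenario": "Actual", "amount": 90.0},
]

# Filter on the Scenario dimension, then aggregate the Amount measure
# by (row field, column fields) -- the cells of the pivot table.
cells = defaultdict(float)
for f in facts:
    if f["scenario"] == "Budget":
        cells[(f["account"], f["year"], f["month"])] += f["amount"]
```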
To create a simple report using the Report Wizard, follow these steps:
1. Launch Business Intelligence Development Studio.
2. Select File -> New -> Project.
3. Select the Business Intelligence Projects project type.
4. Select the Report Server Project Wizard template.
5. Name the new project ProductReport1 and pick a convenient location to save it in.
6. Click OK.
7. Read the first page of the Report Wizard and click Next.
8. Name the new data source AdventureWorksDS.
9. Click the Edit button.
10. Log on to your test server.
11. Select the AdventureWorks2008 database.
12. Click OK.
13. Click the Credentials button.
14. Select Use Windows Authentication.
15. Click OK.
16. Check the Make This a Shared Data Source checkbox. This will make this particular data source available to other Reporting Services applications in the future.
17. Click Next.
18. Click the Query Builder button.
19. If the full query designer interface does not display by default, click the query designer toolbar button at the far left end of the toolbar. Fig.6.1 shows the full query designer interface.
20. Click the Add Table toolbar button.
21. Select the Product table and click Add.
22. Click Close.
23. Check the Name, ProductNumber, Color, and ListPrice columns.
24. Click OK.
25. Click Next.
26. Select the Tabular layout and click Next.
27. Move the Color column to the Group area, and the other three columns to the Detail area, as shown in Fig.6.2.
28. Click Next.
29. Select the Stepped layout and click Next.
30. Select the Ocean style and click Next.
31. Accept the default deployment location and click Next.
32. Name the report ProductReport1.
33. Check the Preview Report checkbox.
34. Click Finish.
35. Fig.6.3 shows the finished report, open in Report Designer.
Using Report Server Project

In general, the following steps must be followed to create a report:
1. Create a Report project in Business Intelligence Development Studio or open an existing Report project.
2. Add a report to the project.
3. Create one or more datasets for the report.
4. Build the report layout.
Specifically, follow the steps mentioned below:
1. Select File -> New -> Project.
2. Select the Business Intelligence Projects project type.
3. Select the Report Server Project template.
4. Name the new project ProductReport2 and pick a convenient location to save it in.
5. Right-click on the Reports node in Solution Explorer and select Add -> New Item.
6. Select the Report template.
7. Name the new report ProductReport2.rdl and click Add.
8. In the Report Data window, select New -> Data Source.
9. Name the new data source AdventureWorksDS.
10. Select the Embedded Connection option and click on the Edit button.
11. Connect to your test server and choose the AdventureWorks2008 database.
12. Click OK.
13. Click OK again to create the data source.
14. In the Report Data window, select New -> Dataset.
15. Name the dataset dsLocation.
16. Click the Query Designer button.
17. If the full Query Designer does not appear, click on the Edit As Text button.
18. Click the Add Table button.
19. Select the Location table.
20. Click Add.
21. Click Close.
22. Check the boxes for the Name and CostRate columns.
23. Sort the dataset in ascending order by Name and click OK.
24. Click OK again to create the dataset.
25. Open the toolbox window (View -> Toolbox).
26. Double-click the Table control.
27. Switch back to the Report Data window.
28. Expand the dataset to show the column names.
29. Drag the Name field and drop it in the first column of the table control on the design tab.
30. Drag the CostRate field from the Report Data window and drop it in the second column of the table control.
31. Place the cursor between the column selectors above the Name and CostRate columns to display a double-headed arrow. Hold down the mouse button and drag the cursor to the right to widen the Name column.
32. Fig.6.4 shows the report in Design view.
33. Select the Preview tab to see the report with data.
Publishing a Report
Creating reports in Business Intelligence Development Studio is good for developers, but it doesn't help users at all. For the reports to be available to others, they must be published to the Reporting Services server. To publish a report, the Build and Deploy menu items in Business Intelligence Development Studio must be used. Before this, the project's configuration needs to be checked to make sure that the appropriate server has been selected for deployment. To publish the first report, follow these steps:
1. Select File -> Recent Projects and choose your ProductReport1 project.
2. Select Project -> ProductReport1 Properties.
3. Click the Configuration Manager button.
4. Fill in the Target Server URL for your Report Server. If you're developing on the same computer where Reporting Services is installed, and you installed in the default configuration, this will be http://localhost/ReportServer. Fig.6.5 shows the completed Property Pages.
5. Click OK.
6. Select Build -> Deploy ProductReport1. The Output window will track the progress of BIDS in deploying your report, as shown in Fig.6.6. Depending on the speed of your computer, building the report may take some time.
7. Launch a web browser and enter the address http://localhost/reports.
8. It may take several minutes for the web page to display; Reporting Services goes to sleep when it hasn't been used for a while and can take a while to spin up to speed. Fig.6.7 shows the result.
9. Click the link for the ProductReport1 folder. 10. Click the link for the ProductReport1 report.
To create a data model, follow these steps:
1. If it's not already open, launch Business Intelligence Development Studio.
2. Select File -> New -> Project.
3. Select the Business Intelligence Projects project type.
4. Select the Report Model Project template.
5. Name the new project AWSales and save it in a convenient location.
6. Click OK.
7. Right-click on Data Sources in Solution Explorer and select Add New Data Source.
8. Read the first page of the Add New Data Source Wizard and click Next.
9. Click New.
10. In the Connection Manager dialog box, connect to the AdventureWorks2008 database on your test server and click OK.
11. Click Next.
12. Name the new data source AdventureWorks and click Finish.
13. Right-click on Data Source Views in Solution Explorer and select Add New Data Source View.
14. Read the first page of the Add New Data Source View Wizard and click Next.
15. Select the AdventureWorks data source and click Next.
16. Select the Product (Production) table and click the > button to move it to the Included Objects listbox.
17. Select the SalesOrderDetail (Sales) table and click the > button to move it to the Included Objects listbox.
18. Click the Add Related Tables button.
19. Click Next.
20. Click Finish.
21. Right-click on Report Models in Solution Explorer and select Add New Report Model.
22. Read the first page of the Report Model Wizard and click Next.
23. Select the AdventureWorks2008 data source view and click Next.
24. Keep the default rules selection, as shown in Fig.6.8, and click Next.
25. Choose the Update Statistics option and click Next.
26. Click Run to complete the wizard.
27. Click Finish. If you get a warning that a file was modified outside the source editor, click Yes.
28. Select Build -> Deploy AWSales to deploy the report model to the local Reporting Services server.
Building a Report
Report Builder itself is a ClickOnce Windows Forms application. This means that it's a Windows application that end users launch from their web browser, but it never gets installed on their computer, so they don't need local administrator rights on their computer to run it. To get started with Report Builder, browse to your Reporting Services home page. Typically, this has a URL such as http://ServerName/Reports (or http://localhost/Reports if the browser is running on the same box as SQL Server 2008 itself). Fig.6.9 shows the Reporting Services home page.
To run Report Builder, click the Report Builder link in the home page menu bar. Report Builder will automatically load all of the available report models and wait for the user to choose one to build a report from. The steps are as follows:
1. Open a browser window and navigate to http://localhost/Reports (or to the appropriate Report Server URL if you are not working on the report server).
2. Click the Report Builder link.
3. Depending on your operating system, you may have to confirm that you want to run the application.
4. After Report Builder is loaded, select the AdventureWorks2008 report model and the table report layout. Click OK. Fig.6.10 shows the new blank report that Report Builder will create.
5. Select the Product table.
6. Drag the Name field and drop it in the area labeled Drag and Drop Column Fields.
7. Click on Special Offer Products in the Explorer window to show related child tables.
8. Click on Sales Order Details.
9. Drag the Total Order Qty field and drop it to the right of the Name field.
10. Click where it says Click to Add Title and type Product Sales.
11. Click the Run Report button to produce the report shown in Fig.6.11.
12. Click the Sort and Group toolbar button.
13. Select to sort by Total Order Qty descending.
14. Click OK.
15. Select File -> Save.
16. Name the new report Product Sales.
17. Click Save. This will publish the report back to the Reporting Services server that you originally downloaded Report Builder from.
7. Clementine
Clementine is a data mining workbench that enables you to quickly develop predictive models using business expertise and deploy them into business operations to improve decision making. Designed around the industry-standard CRISP-DM model, Clementine supports the entire data mining process, from data to better business results.

Clementine Client, Server, and Batch

Clementine uses a client/server architecture to distribute requests for resource-intensive operations to powerful server software, resulting in faster performance on larger datasets. Additional products or updates beyond those listed here may also be available.

Clementine Client. Clementine Client is a functionally complete version of the product that is installed and run on the user's desktop computer. It can be run in local mode as a standalone product, or in distributed mode along with Clementine Server for improved performance on large datasets.

Clementine Server. Clementine Server runs continually in distributed analysis mode together with one or more client installations, providing superior performance on large datasets because memory-intensive operations can be done on the server without downloading data to the client computer. Clementine Server also provides support for SQL optimization, batch-mode processing, and in-database modeling capabilities, delivering further benefits in performance and automation. At least one Clementine Client or Clementine Batch installation must be present to run an analysis.

Clementine Batch. Clementine Batch is a special version of the client that runs in batch mode only, providing support for the complete analytical capabilities of Clementine without access to the regular user interface. This allows long-running or repetitive tasks to be performed without user intervention and without the presence of the user interface on the screen.
Unlike Clementine Client, which can be run as a standalone product, Clementine Batch must be licensed and used only in combination with Clementine Server.
Starting Clementine
Follow these steps to start Clementine: From the Windows Start menu, choose All Programs -> Clementine -> SPSS.
When you first start Clementine, the workspace opens in the default view. The area in the middle is called the stream canvas; this is the main area in which you work in Clementine. See Fig.7.2.
Most of the data and modeling tools in Clementine reside in palettes, the area below the stream canvas. Each tab contains groups of nodes that are a graphical representation of data mining tasks, such as accessing and filtering data, creating graphs, and building models. This is depicted in the Fig.7.3.
To add nodes to the canvas, double-click icons from the node palettes or drag and drop them onto the canvas. Then connect them to create a stream, representing the flow of data.
Clementine Managers
On the top right side of the window are the outputs and object managers. These tabs are used to view and manage a variety of Clementine objects. This is shown in Fig.7.4.
The Outputs tab contains a variety of files produced by stream operations in Clementine. You can display, rename, and close the tables, graphs, and reports listed here. See Fig.7.5.
The Models tab is a powerful tool that contains all generated models (models that have been built in Clementine) for a session. Models can be examined closely, added to the stream, exported, or annotated. See Fig 7.6.
Clementine Projects
On the bottom right side of the window is the projects tool, used to create and manage data mining projects. There are two ways to view projects you create in Clementine:
1. CRISP-DM view
2. Classes view
1. The CRISP-DM tab provides a way to organize projects according to the Cross-Industry Standard Process for Data Mining, an industry-proven, nonproprietary methodology. For both experienced and first-time data miners, using the CRISP-DM tool will help you better organize and communicate your efforts. See Fig.7.7.
2. The Classes tab provides a way to organize your work in Clementine categorically, by the types of objects you create. This view is useful when taking inventory of data, streams, models, and so on. See Fig.7.8.
Exercise 1
Problem: Imagine that a medical researcher is compiling data for a study. He has collected data about a set of patients, all of whom suffered from the same illness. During their course of treatment, each patient responded to one of five medications. The researcher is now entrusted with finding out which drug might be appropriate for a future patient with the same illness.

Solution:
1. Store the data in a text file as shown in Table 7.1. The data fields used are Age, Sex, BP, Cholesterol, Blood Sodium concentration, Blood Potassium concentration, and Drug used.

Field        Values
Age          Number
Sex          M or F
BP           HIGH, NORMAL, or LOW
Cholesterol  NORMAL or HIGH
Na           Blood Sodium concentration
K            Blood Potassium concentration
Drug         Prescription drug to which the patient responded

Table 7.1: Shows the data fields
2. Read in delimited text data using a Variable File node. Add a Variable File node from the palettes: either click the Sources tab to find the node, or use the Favorites tab, which includes this node by default. Next, double-click the newly placed node to open its dialog box. See Fig.7.9.
3. Click the button just to the right of the File box, marked with an ellipsis (...), to browse to the directory and select the file called DRUG1n (this file contains the data fields mentioned in step 1). Select Read field names from file and notice the fields and values that have just been loaded into the dialog box. This step is shown in Fig.7.10.
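Outside Clementine, the Variable File step amounts to reading a delimited file with its field names taken from the first row. The sketch below uses pandas (an assumption, since the manual works only in the Clementine GUI); the sample rows are hypothetical stand-ins for DRUG1n, whose exact delimiter is not stated in the manual.

```python
# A minimal pandas analogue of the Variable File node.
import io
import pandas as pd

# Hypothetical records with the fields from Table 7.1, standing in
# for the DRUG1n file (comma-delimited here by assumption).
sample = io.StringIO(
    "Age,Sex,BP,Cholesterol,Na,K,Drug\n"
    "23,F,HIGH,HIGH,0.792,0.031,drugY\n"
    "47,M,LOW,HIGH,0.739,0.056,drugC\n"
)

# "Read field names from file" corresponds to taking the first row
# as the header, which is pandas' default behaviour.
df = pd.read_csv(sample)
print(df.columns.tolist())
```

Running this lists the seven field names loaded from the header row, just as the node's dialog box shows them after the file is selected.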
4. Click the Data tab to override and change the storage for a field. Note that storage is different from type, the usage of the data field. The Data tab is highlighted in yellow in Fig.7.11.
Fig.7.11: Shows the Data tab
5. The Types tab helps you learn more about the type of the fields in your data. You can also choose Read Values to view the actual values for each field, based on the selections that you make in the Values column. This process is known as instantiation, which is shown in Fig.7.12.
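Instantiation can be pictured as scanning each column for its storage and its distinct values. A rough pandas analogue (pandas and the sample values are assumptions, not part of the manual):

```python
# Sketch of what "Read Values" gathers: per-field storage and values.
import pandas as pd

# Hypothetical sample in the shape of the DRUG1n fields.
df = pd.DataFrame({
    "Sex": ["F", "M", "M"],
    "BP": ["HIGH", "LOW", "NORMAL"],
    "Na": [0.792, 0.739, 0.697],
})

# Storage (dtype) is what the Data tab shows; the distinct values are
# what instantiation reads in on the Types tab.
summary = {col: (str(df[col].dtype), sorted(df[col].unique()))
           for col in df.columns}
print(summary["Na"])
```

Numeric fields such as Na come back with float storage and their observed range of values, while categorical fields such as BP come back with their set of labels.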
6. Adding a Table: Now that the data is loaded from the file, glance at the values for some of the records. This can be done by building a stream that includes a Table node. To place a Table node in the stream, either double-click its icon in the palette or drag and drop it onto the canvas. See Fig.7.13 and Fig.7.14.
Fig.7.14: Shows step 6
7. To view the table, click the green arrow button on the toolbar to execute the stream, or right-click the Table node and select Execute. This is shown in Fig.7.15.
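Executing the stream up to the Table node simply materialises the current records. The equivalent check outside Clementine (pandas assumed, with a couple of hypothetical rows):

```python
# Print the records, as the Table node's output window would.
import pandas as pd

df = pd.DataFrame(
    [[23, "F", "HIGH", "HIGH", 0.792, 0.031, "drugY"],
     [47, "M", "LOW", "HIGH", 0.739, 0.056, "drugC"]],
    columns=["Age", "Sex", "BP", "Cholesterol", "Na", "K", "Drug"],
)

# Equivalent of clicking Execute on the Table node: dump every row.
print(df.to_string(index=False))
```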
8. Creating a distribution graph: During data mining, it is often useful to explore the data by creating visual summaries. Clementine offers several different types of graphs to choose from, depending on the kind of data that you want to summarize. For example, to find out what proportion of the patients responded to each drug, use a Distribution node.
9. Add a Distribution node to the stream and connect it to the Source node, then double-click the node to edit options for display. Select Drug as the target field whose distribution you want to show. Then, click Execute from the dialog box. See Fig.7.16.
10. The resulting graph, shown in Fig.7.17, helps you see the "shape" of the data. It shows that patients responded to drug Y most often and to drugs B and C least often.
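What the Distribution node plots is just the frequency of each value in the target field. A pandas sketch of the same summary (pandas is assumed, and the drug labels below are hypothetical stand-ins for the DRUG1n data, chosen so that drug Y dominates as in the figure):

```python
# Frequency of each value in the Drug field, as the Distribution
# node would chart it.
import pandas as pd

drugs = pd.Series(
    ["drugY", "drugY", "drugY", "drugA", "drugX", "drugB", "drugC"]
)

# Counts per drug, most frequent first: the "shape" of the data.
dist = drugs.value_counts()
print(dist.idxmax())  # the drug patients responded to most often
```

With these sample values the most frequent response is drugY, mirroring the conclusion read off Fig.7.17.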