
Master of Business Administration (MBA) - Semester 3

MI0036 Business Intelligence Tools - 4 Credits (Book ID: B1219) Assignment Set 1 (60 marks)

Q1

Explain the business development life cycle in detail.

Answer
The business development lifecycle is a methodology adopted for planning, designing, implementing and maintaining a BI system. The phases of this lifecycle are described below.

Project Planning
Developing a project plan involves identifying all the tasks necessary to implement the BI project. The project manager identifies the key team members, assigns the tasks, and develops effort estimates for them. There is considerable interplay between this activity and the activity of defining the business requirements, and aligning the BI/data warehouse system with the business requirements is crucial. You therefore need to understand the business requirements thoroughly before proceeding further.

Project Management
This is the phase in which the actual implementation of the project takes place. The first step is to define the business requirements; the implementation is then carried out along three tracks based on those requirements. The first track (technical architecture design, product selection and installation) deals with technology, the second track (dimensional modeling, physical design, ETL design and development) focuses on data, and the last track (BI application specification, BI application development) deals with the design and development of analytical applications. The steps in these tracks are discussed below.

1. Defining the Business Requirements
Business requirements are the bedrock of the BI system, so the business requirements definition acts as the foundation of the lifecycle methodology. The requirements defined at this stage provide the guidance needed to make later decisions. This process mainly includes the following activities:
- Requirements planning
- Collecting the business requirements
- Post-collection documentation and follow-up

2. Technical Architecture Design
Creation of the technical architecture includes the following steps:
I. Establishing an architecture task force
II. Collecting architecture-related requirements
III. Documenting the architecture requirements
IV. Developing a high-level architectural model
V. Designing and specifying the subsystems
VI. Determining architecture implementation phases
VII. Documenting the technical architecture
VIII. Reviewing and finalising the architecture

3. Selection and Installation of a Product
The selection and installation of a business intelligence product is carried out in the following steps:
I. Understanding the corporate purchasing process
II. Developing a product evaluation matrix
III. Conducting market research
IV. Shortlisting the options and performing detailed evaluations
V. Conducting a prototype evaluation (if necessary)
VI. Selecting a product, installing it on trial, and negotiating the price

4. Dimensional Modeling
A dimensional model packages the data in a symmetric format whose design goals are user understandability, query performance, and resilience to change. In this step, a data-modeling team is formed and design workshops are conducted to create the dimensional model. Once the modeling team is confident of the model, it is demonstrated to and validated with a broader audience and then documented.

5. Physical Design
In this step, the dimensional model created in the previous step is translated into a physical design. The physical model includes details such as the physical database, data types, key declarations, and the permissibility of nulls. A minimal sketch of such a design is given at the end of this answer.

6. ETL Design and Development
ETL stands for Extraction, Transformation, and Loading. ETL tools are used to extract data from the operational data sources and load it into the data warehouse.

7. BI Application Specification
In this step, a set of analytical applications is identified for building the BI system, based on the business requirements definition, the type of data being used, and the proposed warehouse architecture.

8. BI Application Development
This is the step in which a specific application (tool) is selected from the identified applications for the actual implementation of the BI system.

9. Deployment
This is the step in which the technology, data and analytical application tracks converge. The completion of this step can be taken as the completion of the actual building of the BI system.

10. Maintenance and Growth
During this step, the project team provides user support to the end users of the system. The team also provides the technical support required to ensure continuous utilisation of the system. This step may include making minor enhancements to the BI system.

Revising the Project Plan
As the project progresses, the project manager has to revise the project plan to accommodate new business interests and concerns raised by the end users.
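To make steps 4 and 5 above more concrete, the following is a minimal, hypothetical sketch of how a small sales dimensional model might be translated into a physical design. The table names, column names and constraints are invented for illustration and are not part of the original text; Python's built-in sqlite3 module is used only as a convenient way to show the DDL.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);

CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,      -- e.g. 20240131
    full_date TEXT NOT NULL,
    month     TEXT NOT NULL,
    year      INTEGER NOT NULL
);

CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    quantity     INTEGER NOT NULL,      -- measures: nulls not permitted
    sales_amount REAL    NOT NULL
);
""")

conn.commit()
```

The fact table holds the numeric measures, while the dimension tables hold the descriptive attributes used for slicing and grouping; the data types, key declarations and NOT NULL constraints are exactly the kind of detail settled during physical design.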

Q2

Discuss the various components of a data warehouse.

Answer

Data extraction is the act or process of extracting data out of data sources, which are usually unstructured or poorly structured, for further data processing, data storage or data migration. This data can be extracted from the web. Internet pages in HTML, XML and similar formats can be considered unstructured data sources because of the wide variety of coding styles, including exceptions and violations of standard coding practices. The import into an intermediate extraction system is usually followed by data transformation, and possibly by the addition of metadata, before export to the next stage in the data workflow. Typical unstructured data sources include web pages, emails, documents, PDFs, scanned text, mainframe reports and spool files. Extracting data from these unstructured sources has become a considerable technical challenge, whereas historically data extraction mainly had to deal with changes in physical hardware formats. Most current data extraction deals with extracting data from unstructured data sources and from different software formats. The growing practice of data extraction from the web is also known as web scraping.

The process of adding structure to unstructured data can take a number of forms:
- Using text pattern matching, also known as regular expressions, to recognise small- or large-scale structure, for example records in a report and their related data from headers and footers.
- Using a table-based approach to recognise common sections within a limited domain, for example identifying the skills, previous work experience and qualifications in resumes using a standard set of commonly used headings (education, for instance, might be found under "Education", "Qualification" or "Courses").
- Using text analytics to try to understand the text and then link it to other information.

Overview of Extraction, Transformation, and Loading (ETL)

The data warehouse must be loaded regularly so that it can serve its purpose of facilitating business analysis. To perform this operation, data from one or more operational systems must be obtained and copied into the warehouse. The process of obtaining data from the source systems and bringing it into the data warehouse is usually called Extraction, Transformation, and Loading (ETL). The acronym is perhaps too simple, because it omits the transportation phase and implies that the phases are cleanly separated; ETL really refers to the broad end-to-end process rather than three well-defined steps.

The methodology and tasks of ETL have been well known for many years and are not unique to data warehouse environments, where a wide variety of proprietary applications and database systems form the IT backbone of the enterprise. Data has to be shared between applications or systems so that at least two applications have the same picture of the world, and this kind of data sharing was regularly addressed by mechanisms similar to what is now known as ETL. In data warehouse environments, apart from the exchange itself, there is the additional burden of integrating, rearranging and consolidating data from many systems, thus providing a new, combined information base for business intelligence. Furthermore, the data volume in a data warehouse environment tends to be very large.

In the ETL process, during extraction the desired data is identified and extracted from many different sources, including database systems and applications. Often it is not possible to recognise the particular subset of interest, so more data than required has to be extracted and the appropriate data is identified at a later point in time. Depending on the capabilities of the source system, for example the available operating system resources, some transformations can take place during the extraction process. The size of the extracted data can range from hundreds of kilobytes up to gigabytes, depending on the source system and the business situation. The same is true for the time delta between two logically identical extractions: the span can differ from days or hours down to minutes or near real time. Web server log files, for example, can easily grow to hundreds of megabytes in a very short span of time. After extracting the data, it has to be physically moved to the target system, or to an intermediate system, for further processing. Depending on the chosen means of transportation, some transformations can be done during this process; for example, a SQL statement that accesses a remote target directly through a gateway can concatenate two columns in its SELECT statement.
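As a concrete illustration of the extract-transform-load flow described above, here is a minimal, hypothetical Python sketch. The report lines, field names and the discount rule are invented for illustration; it extracts records from semi-structured text with a regular expression, applies a couple of simple transformation rules, and loads the result into a SQLite table.

```python
import re
import sqlite3

# --- Extract: pull fields out of semi-structured report lines with a regex ---
report_lines = [
    "2024-01-15 | SKU-100 | price=25.00 | discount=2.50 | regions=north,west",
    "2024-01-16 | SKU-200 | price=40.00 | discount=0.00 | regions=south",
]
pattern = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) \| (?P<sku>\S+) \| "
    r"price=(?P<price>[\d.]+) \| discount=(?P<discount>[\d.]+) \| regions=(?P<regions>\S+)"
)

def extract(lines):
    for line in lines:
        match = pattern.match(line)
        if match:
            yield match.groupdict()

# --- Transform: derive new values and split multi-valued columns ---
def transform(records):
    for rec in records:
        sale_price = float(rec["price"]) - float(rec["discount"])  # derived value
        for region in rec["regions"].split(","):                   # split a comma-separated column
            yield (rec["date"], rec["sku"], region, sale_price)

# --- Load: write the cleaned rows into the warehouse table ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, sku TEXT, region TEXT, sale_price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", transform(extract(report_lines)))
conn.commit()

print(conn.execute("SELECT * FROM sales").fetchall())
```

A real warehouse would use a dedicated ETL tool and far richer transformation rules, but the three stages remain the same.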
Offline Extract, Transform, and Load (ETL)

Previously, the one common interface between the dissimilar systems in an organisation was magnetic tape. Tapes were standardised, and any system could write tapes that could be read by other systems. The first data warehouses were therefore fed by magnetic tapes prepared by different systems within the organisation, which left the problem of data disparity: there was often little relation between the data written to tape by one system and the data written by another.

The data warehouse's database was designed to support the analytical functions necessary for the business intelligence function. It was a well-structured database with complex indices to support Online Analytical Processing (OLAP); databases configured for OLAP allow complex analytical and ad hoc queries with quick execution times. The data fed to the data warehouse from the enterprise systems has to be transformed into a format understandable to the data warehouse. To overcome the problems of loading the data into the data warehouse initially, keeping it updated, and resolving discrepancies, Extract, Transform and Load (ETL) utilities were developed.

The key to the success of this approach is the transformation function, which applies a series of rules to the extracted data so that it is correctly formatted for loading into the data warehouse. Examples of transformation rules are:
- Selecting the data to load.
- Translating encoded items.
- Encoding and standardising free-form values.
- Deriving new calculated values, for example sale price = price - discount.
- Merging data from multiple sources.
- Summarising or aggregating certain rows and columns.
- Splitting a column into multiple columns, for example a comma-separated list.
- Resolving discrepancies between similar data items.
- Validating the data.
- Ensuring data consistency.

The ETL function allows the integration of multiple data sources into a well-structured database for use in complex analyses. The ETL process has to be executed periodically, such as daily, weekly or monthly, depending on the business needs. This is called offline ETL because the target database is not continuously updated; it is updated periodically, on a batch basis. Though offline ETL serves its purpose well, it has some serious drawbacks:
- The data in the data warehouse can be weeks old. It is therefore helpful for planned functions, but is not particularly adaptable for tactical uses.
- The source database typically has to be inactive during the extract process; otherwise the target database is not in a consistent state following the load. As a result, the source applications must be shut down, often for hours.

In online ETL, the function of ETL is to support real-time business intelligence, so it has to be continuous and non-invasive. In contrast to offline ETL, which gives old but consistent responses to queries, online ETL gives current but varying responses to successive queries, because the data it uses is continuously updated to reflect the present state of the enterprise. Offline ETL technology has served businesses for decades; the intelligence obtained from this data informs long-term, reactive, strategic decision making, while short-term, operational and proactive tactical decision making continues to rely on instinct.

Q3

Discuss the data extraction process. What are the various methods being used for data extraction?

Answer

Q4

Discuss the need for developing OLAP tools in detail.

Answer

Online Analytical Processing (OLAP) is a data warehousing tool used to organise, partition and summarise data in the data warehouse and data marts. OLAP is an approach to responding quickly to multidimensional analytical queries, and it belongs to the category of business intelligence. OLAP finds applications in business reporting for sales, marketing, budgeting and forecasting, management reporting, business process management, financial reporting and similar areas. The output of an OLAP query is displayed as a matrix: the dimensions form the rows and columns, and the measures form the values. OLAP creates a hypercube of information. The cube metadata is usually created from a star schema or snowflake schema (to be discussed later) of tables in a relational database.

Characteristics of OLAP

Data warehouse vendors and researchers have distilled a general definition of OLAP known as FASMI, which stands for Fast, Analysis, Shared, Multidimensional and Information. These characteristics are discussed below.
- Fast: This refers to the ability of OLAP to respond to user requests in less than five seconds, with even complex requests typically answered in around twenty seconds. This speed is achieved using techniques such as specialised data storage, specific hardware components and pre-calculation.
- Analysis: OLAP has the capacity to handle any business or statistical analysis required by users. The most commonly used analysis techniques are slice and dice and drill down.
- Shared: When multiple users have write access, the system must be able to maintain confidentiality and lock simultaneous updates. Recent OLAP products recognise the need for write access and can handle updates from multiple users in a timely order. Shared also refers to the ability of the system to serve multiple users without duplicating files.
- Multidimensional: This is the main feature of OLAP products. Multidimensionality requires organising data along the organisation's actual business dimensions. For example, in a marketing company the data may be organised along dimensions such as clients, products, sales and time. The cells contain the relevant data at the intersection of the dimensions. Some cells are left blank; for example, a client may not buy every product in every time frame. This is called sparsity.
- Information: This refers to all the data and information required by users. The data capacity varies with factors such as the data access methods and the level of data duplication. OLAP must contain the data that users require and must provide efficient data analysis techniques.

Different techniques can be used to attain the FASMI objectives, including client-server architecture, time series analysis, object orientation, parallel processing, optimised data storage and multi-threading.

OLAP Tools

Online Analytical Processing falls under the group of software tools that enable the analysis of data stored in a database, and it is often used in data mining. Using these tools, a user can analyse the various dimensions of multidimensional data; for example, they provide both time analysis and trend analysis views. There are two types of OLAP tools:
- Multidimensional OLAP (MOLAP): In this type of OLAP, a cube is extracted from the relational data warehouse. Once the user generates a report request, the MOLAP tool responds quickly because the data has already been extracted into the cube.
- Relational OLAP (ROLAP): In this type of OLAP, the data is not extracted. The ROLAP engine behaves like a smart SQL generator. The ROLAP tool comes with a designer component in which the data warehouse administrator specifies not only the relationships between the relational tables but also how dimensions, attributes and hierarchies map to the database tables.

At present, ROLAP and MOLAP vendors are moving towards a combination of both approaches: MOLAP vendors find it necessary to reach down to low levels of detail at times, and ROLAP vendors find it important to deliver results to users rapidly, so vendors find it essential to merge the two tools. A minimal sketch of the multidimensional idea behind these tools is given below.
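The following small, hypothetical Python sketch (the dimensions, measures and figures are invented for illustration) builds a tiny cube over two dimensions and performs a simple roll-up and slice, the kind of operation an OLAP engine performs at much larger scale.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales_amount)
facts = [
    ("Laptop", "North", "Q1", 120.0),
    ("Laptop", "South", "Q1",  80.0),
    ("Phone",  "North", "Q1", 200.0),
    ("Phone",  "South", "Q2", 150.0),
    ("Laptop", "North", "Q2",  90.0),
]

# Build a simple cube: total sales by (product, quarter).
cube = defaultdict(float)
for product, region, quarter, amount in facts:
    cube[(product, quarter)] += amount

# Roll-up: total sales per product across all quarters.
by_product = defaultdict(float)
for (product, quarter), total in cube.items():
    by_product[product] += total

# Slice: fix one dimension (quarter = "Q1") and look at the rest.
q1_slice = {key: total for key, total in cube.items() if key[1] == "Q1"}

print("Cube cells:", dict(cube))
print("Roll-up:   ", dict(by_product))
print("Q1 slice:  ", q1_slice)
```

A MOLAP tool would precompute and store cells like these in its cube, while a ROLAP tool would generate the equivalent GROUP BY queries against the relational tables on demand.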

Q5

What do you understand by the term statistical analysis? Discuss the most important statistical techniques.

Answer
Guided transmission media can be classified into twisted pair, coaxial cable and optical fiber.

Twisted Pair
A twisted pair consists of two insulated copper wires, typically about 1 mm thick. The wires are twisted together in a helical form, much like a DNA molecule. Twisting is done because two parallel wires constitute a fine antenna; when the wires are twisted, the waves from different twists cancel out, so the wire radiates less effectively.

Coaxial Cable
A coaxial cable consists of a stiff copper wire as the core, surrounded by an insulating material. The insulator is encased in a cylindrical conductor, often a closely woven braided mesh, and the outer conductor is covered in a protective plastic sheath. The construction and shielding of the coaxial cable give it a good combination of high bandwidth and excellent noise immunity.

Optical Fiber
A fiber-optic cable is made of glass or plastic and transmits signals in the form of light. Fiber-optic cables are similar to coax: at the centre is the glass core through which the light propagates, and the core is surrounded by a glass cladding with a lower index of refraction than the core, to keep all the light in the core.

Comparison of Fiber Optics and Copper Wire
- Bandwidth: Fiber can handle much higher bandwidths than copper; copper wires handle less bandwidth than fiber optics.
- Interference: Fiber is not affected by power surges, electromagnetic interference or power failures; copper wires can pick up electromagnetic interference.
- Security: Fibers do not leak light and are quite difficult to tap, giving them excellent security against potential wire tappers; copper offers less security against wire tapping.
- Cost: Fiber interfaces cost more; copper interfaces cost less.

Q6

What are the methods for determining the executive needs?

Answer
Four different types of satellite orbit can be identified, depending on the shape and diameter of the orbit:
- GEO (Geostationary Earth Orbit)
- LEO (Low Earth Orbit)
- MEO (Medium Earth Orbit), also called ICO (Intermediate Circular Orbit)
- HEO (Highly Elliptical Orbit)

Their differences are summarised below.

GEO (Geostationary Earth Orbit)
Altitude: about 36,000 km above the earth's surface.
Coverage: Suited for continuous, regional coverage using a single satellite; can also be used effectively for global coverage using a minimum of three satellites.
Visibility: Mobile-to-satellite visibility decreases with increasing latitude of the user; visibility is poor in built-up, urban regions.

LEO (Low Earth Orbit)
Altitude: 500 - 1,500 km.
Coverage: Multi-satellite constellations of upwards of 30-50 satellites are required for global, continuous coverage.
Visibility: Satellite diversity, by which more than one satellite is visible at any given time, can be used to optimise the link, either by selecting the optimum link or by combining the reception of two or more links.

MEO (Medium Earth Orbit)
Altitude: 6,000 - 20,000 km.
Coverage: Multi-satellite constellations of between 10 and 20 satellites are required for global coverage.
Visibility: Good to excellent global visibility, augmented by the use of satellite diversity techniques.

HEO (Highly Elliptical Orbit)
Altitude: apogee 40,000 - 50,000 km, perigee 1,000 - 20,000 km.
Coverage: Three or four satellites are needed to provide continuous coverage to a region.
Visibility: Particularly designed to provide a high guaranteed elevation angle to the satellite for Northern and Southern temperate latitudes.
