Business Intelligence
1. Why is Big Data important? What are the Vs that are used to define Big Data?
When big data is effectively and efficiently captured, processed, and analyzed, companies are able to gain a more complete understanding of their business, customers, products, competitors, etc., which can lead to efficiency improvements, increased sales, lower costs, better customer service, and/or improved products and services.
Volume:
Big data implies enormous volumes of data. Data used to be created mainly by employees. Now that data is generated by machines, networks, and human interaction on systems like social media, the volume of data to be analyzed is massive.
Variety:
Variety refers to the many sources and types of data, both structured and unstructured. We used to store data from sources like spreadsheets and databases; now data also arrives in forms such as emails, photos, videos, and audio, much of it unstructured.
Velocity:
Big data velocity deals with the pace at which data flows in from sources like business
processes, machines, networks and human interaction with things like social media sites,
mobile devices, etc.
2. What are the critical success factors for Big Data Analytics? Explain them.
Ensure alignment of the organization and project.
Simply designing and building a big data application will not ensure its success.
Consideration must be given to who is going to provide the infrastructure for the application and, more importantly, who is going to operate it and how. The application will become costly, or even legacy, if no thought is given to who is going to maintain it and how. Most importantly, a big data project is likely to disrupt how the business currently
operates, and so the project needs to consider the business change required to make full
use of the application and how it will transform. This spans process, structural and
cultural change. All parts of the organization involved in the project need to focus on a
common goal to succeed. Sound governance must be put in place to deliver and sustain
the project to realize the desired benefits.
Apply an ethical policy.
Incorporation of new data sources into big data systems coupled with significant
improvements in the capabilities of analytics technology provides organizations with
opportunities to gain far greater and far deeper insight than ever before. For example,
bringing together corporate records on customers with log files on customers' use of
applications, social media data and statistical modeling techniques, allows a rounded, up-to-date view of individuals to be formed. However, this does not mean that any insight
should be derived nor should insight necessarily be acted upon. Consideration should be
given to the original purpose for which the individual gave information about themselves
and whether an organization's intended use of that data is reasonable, and indeed seen to
be reasonable. Moreover, data quality becomes more important with big data because
errors are amplified. Poor quality data may also detract from minimizing false positives
and false negatives. So if resulting actions are wrong, an organization risks reputational
damage or contravention of regulations.
Employ the right skills.
Organizations should utilize their existing business intelligence staff in big data projects:
big data is not something separate, but it augments what these people do already.
However, skills development is needed to be successful with big data. Firstly, big data
systems utilize large scale infrastructure which requires skills to design and operate it
successfully. Secondly, skills in statistics and programming are needed to reflect the
business opportunity in the resulting applications. Taking an approach which only utilizes
data warehousing skills will simply result in today's techniques being applied on big data
technology, thereby not fully exploiting the opportunity. As an aside, organizations
should recognize that Hadoop is not necessarily a replacement for a data warehouse: they
have different design points. What is well suited to one may not be best suited to the
other, and the skills required to build and operate each system differ. Ultimately,
maximizing the business return from a big data system is more than simply choice of
technologies, and one of the factors that must be taken into account is acquisition of the
right infrastructure and analytics skills to succeed.
3. What are the common characteristics of emerging Big Data technologies?
Business intelligence, querying, reporting, and searching, including many implementations of searching, filtering, indexing, speeding up aggregation for reporting and report generation, trend analysis, search optimization, and general information retrieval. (Examples include: Alibaba, University of North Carolina Lineberger Comprehensive Cancer Center, University of Freiburg.)
Improved performance for common data management operations, with the majority
focusing on log storage, data storage and archiving, followed by sorting, running joins,
Extraction/Transformation/Loading (ETL) processing, other types of data conversions, as
well as duplicate analysis and elimination. (Examples: AOL, Brilig, Infochimps.)
Non-Database Applications, such as image processing, text processing in preparation for publishing, genome sequencing, protein sequencing and structure prediction, web crawling, and monitoring workflow processes. (Examples: Benipal Technologies, University of Maryland.)
5. What is Hadoop? How does it work? What are the main Hadoop components?
Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It works by distributing both data and computation across a cluster of commodity machines, so that processing jobs run in parallel on the nodes that hold the data. Its main components are the Hadoop Distributed File System (HDFS), which stores data across the cluster, and MapReduce, the programming model that processes it.
Map Reduce:
A MapReduce job splits work into a map phase, which transforms input records into intermediate key-value pairs, and a reduce phase, which aggregates all values that share the same key.
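The map, shuffle, and reduce phases can be illustrated with a toy word count in plain Python. This is a simulation of the flow, not Hadoop itself; the function names are invented for illustration.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in an input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data needs hadoop", "hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
result = reduce_phase(shuffle(pairs))
print(result)  # {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'processes': 1}
```

In real Hadoop, the mappers and reducers run in parallel on many nodes and the framework performs the shuffle over the network; the logic per record is the same.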
Define Cloud Computing. How does it relate to PaaS, SaaS and IaaS?
Cloud computing is computing in which large groups of remote servers are networked to allow centralized data storage and online access to computer services or resources.
Platform as a service (PaaS):
In the PaaS model, cloud providers deliver a computing platform, typically including an operating system, programming language execution environment, database, and web server. Application developers can develop and run their software solutions on a cloud platform without the cost and complexity of buying and managing the underlying hardware and software layers.
Software as a service (SaaS):
In the SaaS model, cloud providers install and operate application software in the cloud
and cloud users access the software from cloud clients. Cloud users do not manage the cloud
infrastructure and platform where the application runs. This eliminates the need to install and
run the application on the cloud user's own computers, which simplifies maintenance and
support.
Infrastructure as a service (IaaS):
In the most basic cloud-service model, and according to the IETF (Internet Engineering Task Force), providers of IaaS offer computers, physical or (more often) virtual machines, and other resources. Cloud providers typically bill IaaS services on a utility computing basis: cost reflects the amount of resources allocated and consumed.
8. How does Cloud Computing affect Business Intelligence?
When looking into the practicalities of moving BI into the cloud, we should first consider the potential benefits and then examine the risks involved.
Increased Elastic Computing Power
Computing power refers to how fast a machine or software can perform an operation. Hosting BI on the cloud means that the computing power, or processing power, depends on where the software itself is hosted, rather than on on-premises hardware.
Cloud computing has become very popular over the last few years and is hailed as revolutionizing IT, freeing corporations from large IT capital investments and enabling them to plug into extremely powerful computing resources over the network. As the volume of data increases to unprecedented levels and the growing trend of Big Data becomes the norm rather than the exception, more and more businesses are looking for BI solutions that can handle gigabytes (and eventually terabytes) of data.
Potential Cost Savings
Pay-as-you-go computing power for BI tools has the potential to reduce costs. A user on the cloud only has to pay for whatever computing power is needed. Computing needs could vary considerably due to seasonal changes in demand or during high-growth phases; this makes IT expenditure much more efficient.
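The pay-as-you-go point can be illustrated with a small sketch. All the numbers below (hourly rate, server counts, demand pattern) are invented: an on-premises deployment must be sized for peak demand, while cloud billing follows actual usage.

```python
# Hypothetical comparison of fixed on-premises capacity vs. pay-as-you-go billing.
hourly_rate = 0.40       # assumed price per server-hour
on_prem_servers = 10     # on-premises capacity sized for the seasonal peak
hours = 24 * 30          # one month

# Demand varies: 10 servers needed during one peak week, 2 the rest of the month.
demand = [10] * (24 * 7) + [2] * (24 * 23)

# On-premises: you pay for peak capacity around the clock.
on_prem_cost = on_prem_servers * hours * hourly_rate

# Cloud: you pay only for the server-hours actually consumed.
cloud_cost = sum(servers * hourly_rate for servers in demand)

print(on_prem_cost, cloud_cost)
```

Under these assumed numbers the elastic deployment costs well under half of the peak-sized one; the gap grows as demand becomes more seasonal.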
Easy Deployment
The cloud makes it easier for a company to adopt a BI solution and quickly experience its value. Managers see results quickly and gain confidence in the success of the implementation. The development cycle is much shorter, meaning that the adoption of BI does not have to be a drawn-out process, thanks to the elimination of the complicated upgrades to existing processes and IT infrastructure demanded by on-premises BI solutions.
Supportive of Nomadic Computing
Nomadic computing is the information systems support that provides computing and
communication capabilities and services to users, as they move from place to place. As
globalization continues to dominate all industries, nomadic computing services and
solutions will grow in demand. It also allows employees and BI users to travel without
losing access to the tools.
9. Define Business Intelligence. Discuss the framework and the benefits of Business Intelligence.
Business intelligence is the process of analyzing data and transforming raw data into readable, usable information by using various tools. Using tools such as ETL (Extract, Transform, Load) we can transform the raw data. Business intelligence provides an effective view of data, supporting effective decision making and strategic operational insight through functions like online analytical processing (OLAP), reporting, and predictive analytics. Analytical tools ought to help decision makers discover the right data rapidly and empower them to make well-informed decisions.
Business intelligence (BI) is the process of gathering the right data in the right way at the right time, and delivering the right results to the right people for decision-making purposes.
Framework
A business intelligence framework gives the strategy, standards, and best practices needed to guarantee that business intelligence reporting and analysis meet organizational requirements. It comprises:
Data management (governance) standards and best practices;
Project management framework; and
Metrics management.
Benefits:
Higher revenue per employee can be achieved by implementing Business Intelligence in a company.
Time saving: this is the major advantage of BI. By implementing BI in a company, most processes are automated, which saves both time and costs. It ultimately increases the productivity of the organization.
BI supports the right decisions: in order to stay competitive, every company has to make the right decisions, and implementing Business Intelligence helps achieve this.
BI can make data readable and accessible.
10. Describe the basic elements of the Balanced Scorecard (BSC) and Six Sigma Methodologies?
The balanced scorecard (BSC) is both a performance measurement and a management methodology that helps translate an organization's financial, customer, internal process, and learning and growth objectives and targets into a set of actionable initiatives. As a methodology, BSC is intended to overcome the limitations of systems that are financially focused. It does this by translating an organization's vision and strategy into a set of interrelated financial and nonfinancial objectives, measures, targets, and initiatives. The nonfinancial objectives fall into one of three perspectives:
Customers:
This objective defines how the organization should appear to its customers if it is to fulfill its vision.
Internal business processes:
This objective specifies the processes the organization must excel at in order to satisfy its shareholders and customers.
Learning and growth:
This objective indicates how the organization must learn, improve, and innovate in order to achieve its vision.
11. What is the difference between a Dashboard and a Balanced Scorecard?
A dashboard is used for performance measurement or monitoring; as a measurement tool it uses metrics; its measures are not linked to business objectives; it measures performance; it is updated in real time; and it focuses on operational (short-term) goals.
A balanced scorecard is used for performance management; as a measurement tool it uses key performance indicators; its measures are linked to business objectives; it measures progress; it is updated periodically; and it focuses on strategic (long-term) goals.
12. What is Data Warehousing? What are the characteristics of Data Warehousing? Explain the Data Warehousing Framework? What is the future of Data Warehousing?
A data warehouse (DW) is a pool of data delivered to support decision making. It is also a repository of present and historical data of potential interest to managers throughout the organization. Data are usually structured to be available in a form ready for analytical processing activities. A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management's decision-making process.
A data warehouse uses the client/server architecture to provide easy access for end users.
Metadata
A data warehouse contains metadata (data about data) about how the data are organized and how to use them effectively. Other characteristics are that a data warehouse is relational and real-time.
Data warehouse framework
Data sources.
Data are sourced from multiple independent operational "legacy" systems and possibly from external data providers. Data may also come from an online transaction processing (OLTP) or ERP system. Web data in the form of Web logs may also feed a data warehouse.
Data extraction and transformation.
Data are extracted and properly transformed using custom-written or commercial software called ETL.
Data loading.
Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse and/or data marts.
Comprehensive database.
Essentially, this is the EDW that supports all decision analysis by providing the relevant summarized and detailed information originating from many different sources.
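The extraction, transformation, and loading steps can be sketched in miniature. The CSV contents, table name, and cleansing rule below are invented for illustration; real ETL tools do the same work at scale.

```python
import csv
import io
import sqlite3

# Extract: raw data as it might arrive from an operational source system.
raw = "region,sales\nNorth,100\nSouth,not_available\nNorth,50\nSouth,70\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: drop unusable records and cast the sales figures to integers
# (a stand-in for the cleansing done in the staging area).
clean = [(r["region"], int(r["sales"])) for r in rows if r["sales"].isdigit()]

# Load: insert the cleansed records into the warehouse table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", clean)

# A simple BI-style aggregate over the loaded data.
total = {region: amount for region, amount in
         db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")}
print(total)  # {'North': 150, 'South': 70}
```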
13. What is Business Performance Management (BPM)?
BPM has been characterized as the convergence of BI and planning. The processes that BPM encompasses are not new. Virtually every medium and large organization has processes in place. What BPM adds is a framework for integrating these processes, methodologies, metrics, and systems into a unified solution.
A BPM system is strategy driven. It encompasses a closed-loop set of processes that link strategy to execution in order to optimize business performance.
The BPM cycle is a continuous process consisting of five major steps: Plan, Execute, Monitor, Analyze, and Forecast. Each step follows from the prior step.
BPM involves monitoring key performance indicators (KPIs) that measure whether an organization is meeting its objectives and overarching strategy. A KPI in this sense is a measure defined by a business that allows for observation of actual values as they emerge from line-of-business (LOB) applications.
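KPI monitoring amounts to comparing actual values against defined targets. A minimal sketch: the KPI names, targets, and comparison directions below are all hypothetical examples, not a real system.

```python
# Hypothetical KPI definitions: a target value and whether the actual value
# should be above or below it.
kpis = {
    "monthly_revenue":   {"target": 500_000, "direction": "above"},
    "avg_response_days": {"target": 2,       "direction": "below"},
}

# Actual values as they might emerge from line-of-business applications.
actuals = {"monthly_revenue": 480_000, "avg_response_days": 1.5}

def kpi_status(name):
    # Compare the observed value against the KPI's target in the right direction.
    kpi, actual = kpis[name], actuals[name]
    if kpi["direction"] == "above":
        met = actual >= kpi["target"]
    else:
        met = actual <= kpi["target"]
    return "on target" if met else "off target"

for name in kpis:
    print(name, kpi_status(name))
```

A real BPM system adds thresholds, trends, and alerting on top of exactly this comparison.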
14. Define Six Sigma? What is DMAIC Performance Model? What is the payoff from
Six Sigma?
Six Sigma is a disciplined, data-driven approach and methodology for eliminating
defects (driving towards six standard deviations between the mean and the nearest
specification limit) in any process, product, or service.
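The definition can be made concrete: a process at a given sigma level has a defect rate equal to the normal tail beyond the nearest specification limit, and with the conventional 1.5-sigma shift, six sigma corresponds to about 3.4 defects per million opportunities (DPMO). A small sketch using Python's statistics module:

```python
from statistics import NormalDist

def dpmo(sigma_level, shift=1.5):
    # Fraction of output falling beyond the nearest specification limit,
    # applying the conventional 1.5-sigma long-term shift.
    defect_rate = 1 - NormalDist().cdf(sigma_level - shift)
    return defect_rate * 1_000_000

print(round(dpmo(6), 1))   # 3.4 defects per million opportunities
print(int(round(dpmo(3)))) # 66807 at three sigma
```

The steep drop between three and six sigma is why the methodology frames improvement in terms of sigma levels rather than raw percentages.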
DMAIC performance model:
Define.
Define the goals, objectives, and boundaries of the improvement activity. At the top level, the goals are the strategic objectives of the company. At lower levels (office or project levels), the goals are focused on specific operational processes.
Measure.
Measure the existing system. Establish quantitative measures that will yield statistically valid data. The data can be used to monitor progress toward the goals defined in the previous step.
Analyze.
Analyze the system to identify ways to eliminate the gap between the current performance of the system or process and the desired goal.
Improve.
Initiate actions to eliminate the gap by finding ways to do things better, cheaper, or faster. Use project management and other planning tools to implement the new approach.
Control.
Institutionalize the improved system by modifying compensation and incentive systems, policies, procedures, manufacturing resource planning, budgets, operating instructions, or other management systems.
15. What is Business Intelligence Governance?
Business Intelligence (BI) governance provides a customized framework to help senior managers design and implement good BI governance guiding principles, decision-making bodies, decision areas, and oversight mechanisms to fit a company's unique needs and culture.
The objectives of BI Governance are:
Clearly defined authority and accountability, roles and responsibilities.
Program planning, prioritization, and funding processes.
Communicating strategic business opportunities to IT.
16. What is Big Data analytics? What are the sources of Big Data? What are the characteristics of Big Data? What processing techniques are applied to process Big Data?
Big data analytics is the process of examining large data sets containing a variety of data types, i.e., big data, to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.
Business intelligence, querying, reporting, and searching, including many implementations of searching, filtering, indexing, speeding up aggregation for reporting and report generation, trend analysis, search optimization, and general information retrieval. (Examples include: Alibaba, University of North Carolina Lineberger Comprehensive Cancer Center, University of Freiburg.)
Improved performance for common data management operations, with the majority
focusing on log storage, data storage and archiving, followed by sorting, running joins,
Extraction/Transformation/Loading (ETL) processing, other types of data conversions, as
well as duplicate analysis and elimination. (Examples: AOL, Brilig, Infochimps.)
Non-Database Applications, such as image processing, text processing in preparation for
publishing, genome sequencing, protein sequencing and structure prediction, web
crawling, and monitoring workflow processes. (Examples: Benipal Technologies,
University of Maryland.)
Data mining and analytical applications, including social network analysis, facial
recognition, profile matching, other types of text analytics, web mining, machine
learning, information extraction, personalization and recommendation analysis, ad
optimization, and behavior analysis.
17. What are ROLAP, MOLAP and HOLAP? How do they differ from OLAP?
OLAP (Online Analytical Processing): On-Line Analytical Processing (OLAP) is a
category of software technology that enables analysts, managers and executives to gain
insight into data through fast, consistent, interactive access to a wide variety of possible
views of information that has been transformed from raw data to reflect the real
dimensionality of the enterprise as understood by the user. In the OLAP world, there are
mainly two different types: Multidimensional OLAP (MOLAP) and Relational OLAP
(ROLAP). Hybrid OLAP (HOLAP) refers to technologies that combine MOLAP and
ROLAP.
MOLAP: This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a
multidimensional cube. The storage is not in the relational database, but in proprietary
formats.
ROLAP: This methodology relies on manipulating the data stored in the relational database
to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each
action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.
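A minimal sketch of that point, using SQLite as the relational store: each slice or dice of the "cube" is just a WHERE clause over a fact table. The table and column names below are invented for illustration.

```python
import sqlite3

# A tiny relational fact table standing in for the OLAP cube.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (year INT, region TEXT, product TEXT, amount INT)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2014, "East", "laptop", 120),
    (2014, "West", "laptop", 80),
    (2015, "East", "tablet", 60),
    (2015, "East", "laptop", 90),
])

# "Slice" the cube on one dimension (year = 2014)...
slice_2014 = db.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE year = 2014 GROUP BY region ORDER BY region"
).fetchall()
print(slice_2014)  # [('East', 120), ('West', 80)]

# ...and "dice" on two dimensions at once (year and region).
dice = db.execute(
    "SELECT SUM(amount) FROM sales WHERE year = 2015 AND region = 'East'"
).fetchone()[0]
print(dice)  # 150
```

A ROLAP engine generates exactly this kind of SQL behind the scenes, which is why its performance depends on the underlying relational database.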
HOLAP: HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP.
For summary-type information, HOLAP leverages cube technology for faster performance.
Business Intelligence
21
When detail information is needed, HOLAP can "drill through" from the cube into the
underlying relational data.
18. What are the major Data Mining Processes?
Classification: mining patterns that can classify future data into known classes.
Association rule mining: mining any rule of the form X → Y, where X and Y are sets of data items.
Clustering: identifying a set of similarity groups in the data.
Sequential pattern mining: a sequential rule, A → B, says that event A will be immediately followed by event B with a certain confidence.
Deviation detection: discovering the most significant changes in data.
Data visualization: using graphical methods to show patterns in data.
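The confidence of an association rule X → Y can be computed directly: it is the fraction of transactions containing X that also contain Y. The transactions below are made-up market-basket data.

```python
# Made-up market-basket transactions.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
]

def confidence(x, y):
    # confidence(X -> Y) = support(X and Y) / support(X)
    x, y = set(x), set(y)
    containing_x = [t for t in transactions if x <= t]
    containing_both = [t for t in containing_x if y <= t]
    return len(containing_both) / len(containing_x)

# 2 of the 3 transactions containing milk also contain bread.
print(confidence({"milk"}, {"bread"}))
```

Algorithms like Apriori search for all rules whose support and confidence exceed user-chosen thresholds; the measure itself is just this ratio.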
19. Identify at least three of the main data mining methods.
Clustering: identifying a set of similarity groups in the data.
Sequential pattern mining: a sequential rule, A → B, says that event A will be immediately followed by event B with a certain confidence.
Deviation detection: discovering the most significant changes in data.
20. What are some of the methods for cluster analysis?
There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Hierarchical methods:
Agglomerative methods, in which subjects start in their own separate cluster. The two closest (most similar) clusters are then combined, and this is done repeatedly until all subjects are in one cluster. At the end, the optimum number of clusters is chosen out of all the cluster solutions. Divisive methods, in which all subjects start in the same cluster and the above strategy is applied in reverse until every subject is in a separate cluster. Agglomerative methods are used more often than divisive methods.
Non-hierarchical methods (often known as k-means clustering methods).
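A minimal sketch of the non-hierarchical (k-means) approach on one-dimensional data, written in plain Python rather than a statistics package; the data points are invented.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    # Pick k distinct starting centers at random.
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (keeping the old center if a cluster ends up empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two obvious one-dimensional groups, around 1 and around 10.
data = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
centers = kmeans(data, 2)
print(centers)  # approximately [1.0, 10.0]
```

Unlike the agglomerative methods above, k-means requires the number of clusters k up front and iterates between assignment and update until the centers stabilize.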