
UNIT 1: Business Analytics

Analytics is the use of:

 data,
 information technology,
 statistical analysis,
 quantitative methods, and
 mathematical or computer-based models

to help managers gain improved insight about their business operations and make better, fact-based decisions.

COMPONENTS OF BUSINESS ANALYTICS

Business context.

 Business Analytics projects start with the business context.
 The organization's ability to ask the right questions is an important success criterion for analytics projects.
 E.g. Target Corporation's pregnancy prediction.
 E.g. the 'Did you forget?' feature and 'smart basket' used by bigbasket.com.

Technology.

 IT is used for data capture, data storage, data preparation, data analysis, and data sharing.
 Today, most data are unstructured.
 Unstructured data: data not arranged in rows and columns.
 Images, text, voice, video, and clickstream data are a few examples of unstructured data.
 For data analysis, we need to use software such as R, Python, SAS, SPSS, Tableau, etc. (see the sketch below).
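
As a flavour of what such software does, here is a minimal sketch in Python, assuming the pandas library; the file sales.csv and its region and amount columns are hypothetical, not data from these notes.

import pandas as pd

# Data capture: load a structured file into a DataFrame.
df = pd.read_csv("sales.csv")

# Data analysis: summary statistics for the numeric columns.
print(df.describe())

# Aggregate the (hypothetical) amount column by the (hypothetical) region column.
print(df.groupby("region")["amount"].sum())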

Data Science. Consists of:

 Statistical and operations research techniques.
 Machine learning and deep learning algorithms.
 Main objective: to identify the most appropriate statistical model/machine learning algorithm, as sketched below.
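
A minimal sketch of that objective, assuming the scikit-learn library: two candidate models are compared by cross-validated accuracy on a built-in toy dataset (none of these names come from the notes).

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation and keep the best.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
print(scores, "-> best:", max(scores, key=scores.get))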

TYPES OF BA

 Descriptive analytics.

 Predictive analytics.

 Prescriptive analytics.

 Diagnostic (or detective) analytics.

APPLICATIONS OF BUSINESS ANALYTICS

 Management of customer relationships

 Financial and marketing activities

 Supply chain management

 Human resource planning


 Pricing decisions

 Sports and team game strategies

IMPORTANCE OF BA:

There is a strong relationship between BA and:

- ability to make better decisions

- profitability of businesses

- revenue of businesses

- shareholder return

 BA enhances understanding of data

 BA is vital for businesses to remain competitive

 BA enables creation of informative reports

What is a data scientist?


Data scientists are responsible for discovering insights from massive
amounts of structured and unstructured data to help shape or meet specific
business needs and goals.
The data scientist role

A data scientist's main objective is to organize and analyze large amounts of data, often using software specifically designed for the task. The final results of a data scientist's analysis need to be easy enough for all invested stakeholders to understand, especially those working outside of IT.

Data analysts collect, process, and perform statistical analyses of data. Their skills may not be as advanced as those of data scientists (e.g. they may not be able to create new algorithms), but their goals are the same: to discover how data can be used to answer questions and solve problems.
Data Analyst vs. Data Scientist - Differences
 The job role of a data scientist demands strong business acumen and data visualization skills to convert insights into a business story, whereas a data analyst is not expected to possess business acumen or advanced data visualization skills.
 A data scientist explores and examines data from multiple disconnected sources, whereas a data analyst usually looks at data from a single source, such as the CRM system.
 A data analyst will solve the questions given by the business, while a data scientist will formulate questions whose solutions are likely to benefit the business.
 In many scenarios, data analysts are not expected to have hands-on machine learning experience or to build statistical models, but the core responsibility of a data scientist is to build statistical models and be well versed in machine learning.

 Data quality is important because we need:
- accurate and timely information to manage services and accountability;
- good information to manage service effectiveness;
- to prioritise resources and ensure their best use.

Data Analytics Tools:

 R Programming
 Tableau Public
 Python
 SAS
 Apache Spark
 Excel
 RapidMiner
 KNIME

UNIT 2: DBMS

A database management system (DBMS) is system software for creating and managing databases. The DBMS provides users and programmers with a systematic way to create, retrieve, update, and manage data.

The functions of a DBMS include:
1. It organizes data.
2. It integrates data.
3. It separates data.
4. It controls data.
5. It retrieves data.
6. It protects data.

A relational database management system (RDBMS) is a collection of programs and capabilities that enable IT teams and others to create, update, administer, and otherwise interact with a relational database. Most commercial RDBMSes use Structured Query Language (SQL) to access the database, although SQL was invented after the initial development of the relational model and is not necessary for its use.

What is SQL?

SQL stands for Structured Query Language, a special-purpose, domain-specific language for querying data in a relational database management system (RDBMS).

Types of SQL statements

SQL statements are categorized into four different types of statements, illustrated in the sketch below:

1. DML (DATA MANIPULATION LANGUAGE)
2. DDL (DATA DEFINITION LANGUAGE)
3. DCL (DATA CONTROL LANGUAGE)
4. TCL (TRANSACTION CONTROL LANGUAGE)
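
A minimal sketch of the four categories, using Python's built-in sqlite3 module. The table and column names are hypothetical, and since SQLite has no GRANT/REVOKE, the DCL statement is shown only as a comment.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure of a table.
cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")

# DML: manipulate the rows themselves (sqlite3 opens a transaction implicitly).
cur.execute("INSERT INTO employees (id, name) VALUES (1, 'Asha')")
cur.execute("UPDATE employees SET name = 'Asha K' WHERE id = 1")

# TCL: COMMIT ends the transaction and makes the changes permanent.
conn.commit()

# DCL (not supported by SQLite; shown for illustration only):
# GRANT SELECT ON employees TO analyst_role;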


DBMS stands for database management system; in other words, a
system that manages databases. Examples of DBMSes
are Oracle and SQL Server. These are systems that can be used
to manage transactional databases, such as HR systems, banking
systems and so on. These are typically optimized for performing
transactions.

A data dictionary is a collection of descriptions of the data objects or items in a data model, for the benefit of programmers and others who need to refer to them.

UNION combines the results of two or more queries into a single result set that includes all the rows belonging to the queries in the union. With JOINs, you can retrieve data from two or more tables based on logical relationships between the tables, as in the sketch below.
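
A minimal sketch of both operations, using Python's built-in sqlite3 module; the customers and orders tables and their contents are made up for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 250.0), (11, 2, 99.5)])

# JOIN: match rows across two tables on a logical relationship.
for row in cur.execute("""SELECT c.name, o.amount
                          FROM customers c JOIN orders o ON o.customer_id = c.id"""):
    print(row)

# UNION: stack the result sets of two queries into one.
for row in cur.execute("""SELECT name FROM customers WHERE id = 1
                          UNION
                          SELECT name FROM customers WHERE id = 2"""):
    print(row)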
Database implementation or deployment is the process of installing the database software, configuring and customizing it, running and testing it, integrating it with applications, and training the users.

UNIT 3: Data Warehouse and Data Mining

Data warehousing is a technique for collecting and managing data from varied sources to provide meaningful business insights. It is a blend of technologies and components which allows the strategic use of data.

Data mining is the process of sorting through large data sets to identify patterns and establish relationships to solve problems through data analysis. Data mining tools allow enterprises to predict future trends.
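
As one concrete example of pattern identification, here is a minimal clustering sketch, assuming the scikit-learn and NumPy libraries; the two-dimensional points are made-up illustration data.

import numpy as np
from sklearn.cluster import KMeans

# Six points that visibly form two groups.
points = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)           # which cluster each point was assigned to
print(model.cluster_centers_)  # the centres of the discovered groups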


Difference Between OLTP and OLAP



OLTP and OLAP are both online processing systems.

OLTP is a transaction processing system, while OLAP is an analytical processing system. OLTP is a system that manages transaction-oriented applications on the internet, for example ATM transactions. OLAP is an online system that answers multidimensional analytical queries, such as financial reporting and forecasting. The basic difference between OLTP and OLAP is that OLTP is an online database modifying system, whereas OLAP is an online database query answering system. Other differences are summarized in the comparison chart below.


Comparison Chart

Basis for comparison | OLTP | OLAP
Basic | It is an online transactional system and manages database modification. | It is an online data retrieving and data analysis system.
Focus | Insert, update, and delete information from the database. | Extract data for analysis that helps in decision making.
Data | OLTP and its transactions are the original source of data. | The various OLTP databases become the source of data for OLAP.
Transaction | OLTP has short transactions. | OLAP has long transactions.
Time | The processing time of a transaction is comparatively less in OLTP. | The processing time of a transaction is comparatively more in OLAP.
Queries | Simpler queries. | Complex queries.
Normalization | Tables in an OLTP database are normalized (3NF). | Tables in an OLAP database are not normalized.
Integrity | An OLTP database must maintain data integrity constraints. | An OLAP database is not frequently modified, so data integrity is not affected.

Definition of OLTP

OLTP is an Online Transaction Processing system. The main focus of an OLTP system is to record the current updates, insertions, and deletions made during transactions. OLTP queries are simple and short, and hence require less processing time and less space.

Definition of OLAP

OLAP is an Online Analytical Processing system. An OLAP database stores historical data that has been fed in by OLTP systems. It allows a user to view different summaries of multidimensional data. Using OLAP, you can extract information from a large database and analyze it for decision making.

ETL is short for extract, transform, load: three database functions that are combined into one tool to pull data out of one database and place it into another database.

 Extract is the process of reading data from a database. In this stage, the data is collected, often from multiple and different types of sources.

 Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables, or by combining the data with other data.

 Load is the process of writing the data into the target database; the sketch below walks through all three steps.
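
A minimal ETL sketch in plain Python. The source rows and the warehouse list are hypothetical stand-ins for a source database and a warehouse table; only the three-step structure comes from the notes.

source_rows = [
    {"name": " alice ", "amount": "120.50"},
    {"name": "BOB", "amount": "80.00"},
]

def extract():
    # Extract: read raw rows from the source system.
    return list(source_rows)

def transform(rows):
    # Transform: clean the names and convert the amounts to numbers.
    return [{"name": r["name"].strip().title(), "amount": float(r["amount"])}
            for r in rows]

warehouse = []

def load(rows):
    # Load: write the cleaned rows into the target store.
    warehouse.extend(rows)

load(transform(extract()))
print(warehouse)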

How ETL Works

Data from one or more sources is extracted and then copied to the data warehouse. When dealing with large volumes of data and multiple source systems, the data is consolidated. ETL is used to migrate data from one database to another, and is often the specific process required to load data to and from data marts and data warehouses, but it is also used to convert (transform) large databases from one format or type to another.

Limitations or Disadvantages of Data Mining Techniques:

Data mining technology helps people with their decision making, and that decision making is a process in which all the factors of mining are involved. In using these mining systems, one can come across several disadvantages of data mining, as follows.

1. It violates user privacy:

It is a known fact that data mining collects information about people using market-based techniques and information technology, and the process involves a number of factors. In doing so, a data mining system can violate the privacy of its users, which is why it falls short on the safety and security of its users and can eventually create miscommunication between people.

2. Additional irrelevant information:

The main function of a data mining system is to create a relevant space for beneficial information. The main problem with this information collection, however, is that the process can gather so much additional, irrelevant information that it becomes overwhelming. It is therefore essential to set limits on all data mining techniques.

3. Misuse of information:

As explained earlier, safety and security measures in data mining systems are minimal, and that is why some people can misuse the mined information to harm others. Data mining systems therefore need to change how they work so as to reduce the misuse of information obtained through the mining process.

4. Accuracy of data:

Information about particular subjects was once collected directly from clients, but that has changed: mining technology and its methods have made information collection easy. A notable limitation of a data mining system, however, is that the accuracy of the data it provides has its own limits.

Data Warehouse Architectures

There are mainly three types of data warehouse architectures:

Single-tier architecture
The objective of a single layer is to minimize the amount of data stored; the goal is to remove data redundancy. This architecture is not frequently used in practice.

Two-tier architecture
Two-layer architecture separates the physically available sources from the data warehouse. This architecture is not expandable and does not support a large number of end users. It also has connectivity problems because of network limitations.

Three-tier architecture

This is the most widely used architecture. It consists of the top, middle, and bottom tiers.

1. Bottom tier: The database of the data warehouse serves as the bottom tier. It is usually a relational database system. Data is cleansed, transformed, and loaded into this layer using back-end tools.

2. Middle tier: The middle tier in a data warehouse is an OLAP server, implemented using either the ROLAP or MOLAP model. For a user, this application tier presents an abstracted view of the database. This layer also acts as a mediator between the end user and the database.

3. Top tier: The top tier is a front-end client layer. It comprises the tools and APIs used to connect to the data warehouse and get data out of it: query tools, reporting tools, managed query tools, analysis tools, and data mining tools.

Database Schema
A database schema is the skeleton structure that represents the logical view of the entire database. It defines how the data is organized and how the relations among the data are associated. It formulates all the constraints that are to be applied on the data.

A database schema can be divided broadly into two categories:

 Physical Database Schema − This schema pertains to the actual storage of data and its form of storage, like files, indices, etc. It defines how the data will be stored in secondary storage.

 Logical Database Schema − This schema defines all the logical constraints that need to be applied on the data stored. It defines tables, views, and integrity constraints, as in the sketch below.
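
A minimal sketch of a logical schema (tables, a view, and integrity constraints), using Python's built-in sqlite3 module; all names are hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

# Tables with integrity constraints (primary key, NOT NULL, UNIQUE, foreign key).
conn.execute("""
    CREATE TABLE departments (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    )""")
conn.execute("""
    CREATE TABLE employees (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES departments(id)
    )""")

# A view: a derived, logical-level presentation of the stored data.
conn.execute("""
    CREATE VIEW staff_by_dept AS
    SELECT d.name AS department, COUNT(e.id) AS headcount
    FROM departments d LEFT JOIN employees e ON e.dept_id = d.id
    GROUP BY d.name""")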
