Overview of the Microsoft Analytics Platform System (APS)
Matt Usher
Senior Program Manager
@two_under
About.me
Senior Program Manager
9 years at Microsoft
Visual Studio
Office
Windows Server
Analytics Platform System (APS)
[Diagram: the traditional data warehouse and the pressures on it]
Dashboards
Reporting
Data warehouse
ETL
Data sources: OLTP, ERP, CRM, LOB
Increasing data volumes
Real-time data
New data sources and types
Cloud-born data
Self-service
Corporate
Collaboration
Mobile
Predictive
Single query model
Extract, transform, load
Data quality
Master data management
Relational
Non-relational
Analytical
Streaming
INFRASTRUCTURE
Data sources: OLTP, ERP, CRM, LOB
Non-relational data
A building management company wanted to integrate and analyze data from sensors and equipment to improve efficiency and lower energy costs by 20 percent.
A technical university needed on-demand computing in the cloud for DNA sequencing to accelerate access, discovery, and analysis.
Live data feeds
Advanced analytics
Limited scalability and ability to handle new data types
Significant training and data silos
Acquire business intelligence
High acquisition and migration costs
Complex with low adoption
Enterprise-ready Big Data
Relational and non-relational data in a single appliance
Enterprise-ready Hadoop
Integrated querying across Hadoop and PDW using T-SQL
Direct integration with Microsoft BI tools such as Microsoft Excel
Next-generation performance at scale
Near real-time performance with In-Memory Columnstore
Ability to scale out to accommodate growing data
Removal of data warehouse bottlenecks with MPP SQL Server
Concurrency that fuels rapid adoption
Engineered for optimal value
Industry's lowest data warehouse appliance price per terabyte
Value through a single appliance solution
Value with flexible hardware options using commodity hardware
Enterprise-ready Big Data
Next-generation performance at scale
Engineered for optimal value
[Chart: data volume from megabytes to petabytes, plotted against data complexity (variety and velocity)]
Historical analysis
Insight analysis
Predictive analytics
Predictive forecasting
What is Hadoop?
[Diagram: Hadoop stack, grouped into operational services, data services (including load and extract), and core services, running on a Hadoop cluster of compute and storage nodes]
Components shown: Ambari, Flume, Oozie, Sqoop, Falcon, HBase, MapReduce, YARN, NFS, WebHDFS, HDFS, Pig, Hive and HCatalog
SQL Server Parallel Data Warehouse
High performance and tuned within the appliance
End-user authentication with Active Directory
100-percent Apache Hadoop
Managed and monitored using System Center
PolyBase
Microsoft HDInsight
Accessible insights for everyone with Microsoft BI tools
Security
Metering
Servicing
Appliance
Fabric
Hardware
HDInsight workload
Demo
Bringing Hadoop point solutions and the data warehouse together for users and IT
[Diagram: a SELECT issued against SQL Server Parallel Data Warehouse reaches Hadoop data through PolyBase and returns a result set]
Hadoop sources shown: Microsoft Azure HDInsight, Hortonworks for Windows and Linux, Cloudera, Microsoft HDInsight
HDFS bridge
Direct and parallelized HDFS access
Enhancing the Data Movement Service (DMS) of APS to allow direct communication between HDFS data nodes and PDW compute nodes
[Diagram: PolyBase query flow]
Non-relational data: social apps, sensor and RFID, mobile apps, web apps
Relational data: traditional schema-based data warehouse applications
Hadoop
PDW
Regular T-SQL
Results
External table
External data source
External file format
Enhanced PDW query engine
HDFS bridge
[Diagram: APS and the surrounding SQL Server tools]
APS: SQL Server Parallel Data Warehouse, PolyBase, Microsoft HDInsight
SQL Server Data Marts
MapReduce
T-SQL
SQL Server Reporting Services
SQL Server Analysis Services
Dynamic binding
Column filtering
SELECT User, Product, Sentiment
FROM Twitter_Table
WHERE Hour = Current - 1
AND Date = Today
AND Sentiment >= 0
[Table: sample Twitter_Table rows for users Sean, Audie, Suz, Tom, Sanjay, Roger, and Steve, with columns User, Location, Product, Sentiment, Rtwt, Hour, and Date, dated 5-13-14 to 5-15-14]
Row filtering
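The column and row filtering shown in the query above can be sketched in Python. This is only an illustration of the idea: the sample rows, their values, and the helper name filter_rows are hypothetical, and the sketch says nothing about how PolyBase performs filtering internally.

```python
from datetime import date

# Hypothetical rows shaped like Twitter_Table; every value here is made up.
ROWS = [
    {"User": "u1", "Location": "CA", "Product": "xbox", "Sentiment": -1, "Hour": 22, "Date": date(2014, 5, 15)},
    {"User": "u2", "Location": "CO", "Product": "excel", "Sentiment": 1, "Hour": 22, "Date": date(2014, 5, 15)},
    {"User": "u3", "Location": "WA", "Product": "xbox", "Sentiment": 0, "Hour": 21, "Date": date(2014, 5, 15)},
    {"User": "u4", "Location": "TX", "Product": "ssas", "Sentiment": 1, "Hour": 22, "Date": date(2014, 5, 14)},
]


def filter_rows(rows, columns, current_hour, today):
    """Apply the WHERE predicate (row filtering), then keep only the
    requested columns (column filtering)."""
    kept = [
        r for r in rows
        if r["Hour"] == current_hour - 1
        and r["Date"] == today
        and r["Sentiment"] >= 0
    ]
    return [{c: r[c] for c in columns} for r in kept]


result = filter_rows(
    ROWS,
    ["User", "Product", "Sentiment"],
    current_hour=23,
    today=date(2014, 5, 15),
)
print(result)
```

Filtering rows before projecting columns keeps the intermediate list small, which is the same motivation for pushing both filters as close to the data as possible.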
Hadoop
Syntax extensions
Security and permission model
External table source and file format syntax
Microsoft Azure HDInsight
HDInsight on APS
Hortonworks Data Platform 1.3 and 2.0 (Linux/Windows Server)
Azure extensions
Microsoft Azure Storage Blobs
AU1
PolyBase v2
Analytics Platform System (powered by PolyBase)
Takes advantage of high adoption of Excel, Power View, PowerPivot, and SQL Server Analysis Services
Offers Hadoop tools like MapReduce, Hive, and Pig for data scientists
Minimizes IT intervention for discovering data with tools such as Microsoft Excel
Enables DBAs and power users to join relational and Hadoop data with T-SQL
Power users
Data scientist
{WITH (
    TYPE = <data_source>,
    LOCATION = <location>,
    [JOB_TRACKER_LOCATION = <jb_location>]
)};
1. Type of external data source
2. Location of external data source
3. Enabling or disabling of MapReduce job generation
{WITH (
    FORMAT_TYPE = <type>,
    [SERDE_METHOD = <serde_method>,]
    [DATA_COMPRESSION = <compr_method>,]
    [FORMAT_OPTIONS (<format_options>)]
)};
3. Compression method
4. Format options
<Format Options> ::=
    [,FIELD_TERMINATOR = value],
    [,STRING_DELIMITER = value],
    [,DATE_FORMAT = value],
    [USE_TYPE_DEFAULT = value]
FIELD_TERMINATOR: column delimiter
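To make the format options more concrete, here is a small Python sketch of what they control when a delimited text file is read. It is not PolyBase code: the function parse_delimited, the three-column layout, and the sample text are hypothetical, the date format uses Python strptime codes rather than PolyBase's own format strings, and the handling of USE_TYPE_DEFAULT (missing numeric value becomes 0 when true, NULL when false) is an assumption based on the option's documented intent.

```python
import csv
import io
from datetime import datetime


def parse_delimited(text, field_terminator="|", string_delimiter='"',
                    date_format="%m-%d-%y", use_type_default=True):
    """Parse lines of (user, sentiment, date) using options that mirror
    FIELD_TERMINATOR, STRING_DELIMITER, DATE_FORMAT, and USE_TYPE_DEFAULT."""
    reader = csv.reader(io.StringIO(text), delimiter=field_terminator,
                        quotechar=string_delimiter)
    rows = []
    for user, sentiment, day in reader:
        if sentiment == "":
            # Missing value: use the type default (0) or None, standing in for NULL.
            sentiment_value = 0 if use_type_default else None
        else:
            sentiment_value = int(sentiment)
        parsed_day = datetime.strptime(day, date_format).date()
        rows.append((user, sentiment_value, parsed_day))
    return rows


# Hypothetical file content: pipe-delimited, strings in double quotes.
SAMPLE = '"Sean"|-1|05-15-14\n"Audie"||05-15-14\n'

with_defaults = parse_delimited(SAMPLE)
with_nulls = parse_delimited(SAMPLE, use_type_default=False)
print(with_defaults)
print(with_nulls)
```

The second sample line has an empty sentiment field, so the two calls differ only in how that gap is filled.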
Demo
Enterprise-ready Big Data
Next-generation performance at scale
Engineered for optimal value
Rowstore
Querying data by row
[Diagram labeled "Data" and "Forklift": rows R1 to R6, each with columns C1 to C4, stored row by row across Page 1, Page 2, and Page 3]
Scale out
[Diagram: PDW / HDInsight scale units added alongside PDW]
6 petabytes
Blazing-fast performance
[Diagram: columnstore layout with columns C1 to C6, a query, and its results]
Load data into or out of memory for next-generation performance with up to 60% improvement in data loading speed
Up to 100x faster queries
Up to 15x more compression
Saves space: 91% savings
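The contrast between the rowstore and columnstore slides can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how SQL Server implements its columnstore: the table contents are made up, the helper names to_columns and run_length_encode are hypothetical, and run-length encoding is used only as one simple example of why repeated values within a column compress well.

```python
from itertools import groupby

# Hypothetical table: three columns, six rows (all values are made up).
ROWS = [
    ("xbox", "CA", 10), ("xbox", "CA", 20), ("xbox", "WA", 30),
    ("excel", "WA", 40), ("excel", "WA", 50), ("excel", "TX", 60),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a columnstore-style layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into [value, count] pairs."""
    return [[value, len(list(run))] for value, run in groupby(values)]


columns = to_columns(ROWS, ["product", "location", "amount"])

# An aggregate over one column reads that column only, not whole rows.
total = sum(columns["amount"])

# Six product values shrink to two runs.
encoded_product = run_length_encode(columns["product"])
print(total, encoded_product)
```

In the row layout every query drags all columns of a row through memory; in the column layout the aggregate above reads a single list, and the low-cardinality column collapses to a handful of pairs.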
[Diagram: a client sends a user query to the appliance, which contains control, management, and compute nodes]
User query
Create query plan
Compute nodes process query plan operations in parallel
Aggregate query results
Query results
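The flow above (plan, parallel work on compute nodes, aggregation of partial results) follows a scatter-gather pattern, which can be sketched in Python. This is a minimal illustration, not the PDW engine: the data distribution, the function names compute_node_step and control_node_query, and the use of threads to stand in for compute nodes are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical distribution: each inner list is the slice held by one compute node.
NODE_SLICES = [
    [("xbox", 10), ("excel", 5)],
    [("xbox", 7), ("ssas", 3)],
    [("excel", 1), ("xbox", 2)],
]


def compute_node_step(rows):
    """Partial aggregation run on one compute node: sum amounts per product."""
    partial = {}
    for product, amount in rows:
        partial[product] = partial.get(product, 0) + amount
    return partial


def control_node_query(slices):
    """Fan the step out to every node in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(compute_node_step, slices))
    merged = {}
    for partial in partials:
        for product, amount in partial.items():
            merged[product] = merged.get(product, 0) + amount
    return merged


totals = control_node_query(NODE_SLICES)
print(totals)
```

Because each node aggregates its own slice first, only small partial results travel to the merging step, which is the reason the pattern scales as nodes are added.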
[Diagram: data paths around the Analytics Platform System]
CRM, LOB, apps
Analytics Platform System: PDW, PolyBase, HDInsight
Intra-day
Near real-time
Real-time
CRTAS
Link table
Columnstore
ROLAP / MOLAP
DirectQuery
Ad hoc queries
Fast ad hoc
SNAC
BI tools
Enterprise-ready Big Data
Next-generation performance at scale
Engineered for optimal value
[Chart: price per terabyte in thousands of dollars, on an axis from $0 to $25, for Oracle, EMC, IBM, Teradata, and Microsoft]
Lower price per terabyte than the closest competitor
Lower storage costs with Windows Server 2012 Storage Spaces
PDW
Integrated support plan with a single Microsoft contact
Co-engineered with HP, Dell, and Quanta best practices
Preconfigured, built, and tuned software and hardware
Leading performance with commodity hardware
PolyBase
HDInsight
Hardware architecture
[Diagram: Rack #1 and Rack #2, each wired with InfiniBand and Ethernet networking. A PDW region and an HDInsight region hold a control node, a master node, failover nodes, compute nodes, and economical disk storage. HDInsight capacity is built from an HDI extension base unit and HDI active scale units. Host labels: HST-01, HST-02, HSA-01.]
Software details
[Diagram: PDW region with an active unit of compute nodes, a passive unit with a failover node, economical disk storage, and IB and Ethernet networking]
PDW engine
DMS Manager
SQL Server 2012 Enterprise Edition (PDW build)
Failover functionality
[Diagram: base unit with Host 1 to Host 4, the virtual machines CTL, MAD, AD, and VMM, Compute 1 and Compute 2, IB and Ethernet networking, and economical disk storage on direct attached SAS]
Failover capabilities
[Diagram: two base units and a passive unit, with Host 1 to Host 5, the virtual machines CTL, MAD, AD, FAB, and VMM, Compute 1 and Compute 2, IB and Ethernet networking, and economical disk storage; numbered callouts 1 and 2]
Security enhancements
Integrated authentication
Transparent data encryption
Scenarios
User logs in with domain credentials
No trust is required
Minimum configuration (NTLM)
Minimum configuration (Kerberos)
One-way (outgoing) external (non-transitive) trust between corporate DC and PDW Workload AD
One-way forest
Two-way forest
Two-way external trust
3. User creates certificate in master database
4. User creates database encryption key (UserDB)
5. Initiate database encryption for user database
PDW creates certificate on CTL01
PDW creates database encryption key on CTL01
PDW creates different database encryption key (ALL CMP)
PDW creates master key on CTL01
PDW creates separate master key on all compute nodes
PDW encrypts tempdb and pdwtempdb
PDW exports certificate and imports it into all CMP nodes
PDW encrypts user database
Demo
Resources
Learning: www.microsoft.com/learning
Sessions on Demand: http://channel9.msdn.com/Events/TechEd
TechNet, Resources for IT Professionals: http://microsoft.com/technet
MSDN, Resources for Developers: http://microsoft.com/msdn
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.