
Graeme Malcolm | Data Technology Specialist, Content Master

Pete Harris | Learning Product Planner, Microsoft


Graeme Malcolm | @graeme_malcolm
Microsoft Data Platform Specialist
Consultant, trainer, and author since SQL Server 4.2
One of the world's first MCSEs in SQL Server 2012 BI
(Fairly) regular blogger at www.contentmaster.com
Longstanding partner with Microsoft
Lead author for Microsoft Official Curriculum SQL Server 2014 and SQL Server 2012 BI courses
Contributing author of Patterns and Practices Guide to Big Data
Author of numerous training courses and Microsoft Press titles since SQL Server 7.0
Pete Harris | @SQLPete

Learning Product Planner
Various roles at Microsoft since 1995

Course Topics
Implementing Big Data Analysis
01 | Introduction to Big Data
02 | Getting Started with HDInsight
03 | Windows Azure PowerShell
04 | Processing Big Data with Pig
05 | Processing Big Data with Hive
06 | Automating Big Data Processing
07 | Analyzing Big Data with Excel
Setting Expectations
Target Audience
BI professionals and data analysts
Suggested Prerequisites/Supporting Material
Experience using Microsoft Excel and Power BI
Knowledge of enterprise BI technologies

Microsoft Virtual Academy
Free online learning tailored for IT Pros and Developers
Over 1M registered users
Up-to-date, relevant training on a variety of Microsoft products
Earn while you learn!
Get 50 MVA Points for this event!
Visit http://aka.ms/MVA-Voucher
Enter this code: PowerJump1 (expires 8/15/2013)
Join the MVA Community!
01 | Introduction to Big Data
Module Overview
What is Big Data?
Big Data Technologies
Map/Reduce
Microsoft Tools for Big Data
What is Big Data?
Data that is too large or complex for analysis in traditional relational databases
Typified by the 3 Vs:
Volume: Huge amounts of data to process
Variety: A mixture of structured and unstructured data
Velocity: New data generated extremely frequently

Example scenarios:
Web server log reporting
Social media sentiment analysis
Sensor anomaly detection
Big Data Technologies
Hadoop
Open source distributed data processing cluster
Data processed in Hadoop Distributed File System (HDFS)
Related projects
Hive
Pig
HCatalog
Oozie
Sqoop
Others
Hadoop Cluster: HDFS with a Name Node and Data Nodes
Map/Reduce
1. Source data is divided among data nodes
2. Map phase generates key/value pairs
3. Reduce phase aggregates values for each key
Source data:
Lorem ipsum sit amet magma sit elit
Fusce magma sed sit amet magma

Map output (first line):
Key Value
Lorem 1
ipsum 1
sit 1
amet 1
magma 1
sit 1
elit 1

Map output (second line):
Key Value
Fusce 1
magma 1
sed 1
sit 1
amet 1
magma 1

Reduce output:
Key Value
Lorem 1
ipsum 1
sit 3
amet 2
magma 3
elit 1
Fusce 1
sed 1
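The three steps above can be sketched in plain Java, with no Hadoop dependency, to make the data flow concrete. This is only an in-memory illustration: the class name `WordCountSketch` and its method names are hypothetical, the two input lines are typed in so that they agree with the reduce table on the slide, and a real cluster would run the map calls on separate data nodes and sort the pairs by key before reducing.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-process illustration of the map and reduce phases.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every token in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum the values for each key, keeping first-seen key order.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map phase once per line (one "data node" each), then reduces.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Lorem ipsum sit amet magma sit elit",
                "Fusce magma sed sit amet magma");
        // prints {Lorem=1, ipsum=1, sit=3, amet=2, magma=3, elit=1, Fusce=1, sed=1}
        System.out.println(wordCount(lines));
    }
}
```

The printed totals match the reduce table: "sit" and "magma" each appear three times across the two lines, and "amet" twice.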

Map/Reduce Code in Hadoop
Usually written in Java and packaged as a JAR file
Hadoop Streaming enables mappers and reducers written in other languages
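import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The enclosing class name is illustrative; the slide shows only the nested classes.
public class WordCount {
    // Map phase: emit (word, 1) for every token in the input line.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }
    // Reduce phase: sum the counts emitted for each word.
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}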
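The streaming option mentioned above works by passing data through standard input and output: a mapper executable reads raw lines and writes tab-separated key/value lines, the framework sorts those lines by key, and a reducer executable reads the sorted lines and writes the aggregated results. The sketch below shows that contract for word counting. It is written in Java only to stay consistent with the code above (in practice streaming is used for scripts and other executables), the class name `StreamingSketch` and the `map`/`reduce` command-line argument are hypothetical, and it buffers all input in memory where a real job would process it line by line.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.stream.Collectors;

// Hypothetical sketch of the stdin/stdout contract used by streaming jobs.
public class StreamingSketch {

    // Mapper: turn raw text lines into "word<TAB>1" lines.
    static String mapper(String input) {
        StringBuilder out = new StringBuilder();
        for (String line : input.split("\n")) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    out.append(token).append("\t1\n");
                }
            }
        }
        return out.toString();
    }

    // Reducer: input lines arrive sorted by key, so equal keys are adjacent
    // and a running sum can be flushed whenever the key changes.
    static String reducer(String sortedInput) {
        StringBuilder out = new StringBuilder();
        String currentKey = null;
        int sum = 0;
        for (String line : sortedInput.split("\n")) {
            String[] parts = line.split("\t", 2);
            if (parts.length < 2) {
                continue; // skip blank or malformed lines
            }
            if (!parts[0].equals(currentKey)) {
                if (currentKey != null) {
                    out.append(currentKey).append('\t').append(sum).append('\n');
                }
                currentKey = parts[0];
                sum = 0;
            }
            sum += Integer.parseInt(parts[1].trim());
        }
        if (currentKey != null) {
            out.append(currentKey).append('\t').append(sum).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Read all of standard input, then act as mapper or reducer.
        String input = new BufferedReader(new InputStreamReader(System.in))
                .lines().collect(Collectors.joining("\n"));
        boolean reduce = args.length > 0 && args[0].equals("reduce");
        System.out.print(reduce ? reducer(input) : mapper(input));
    }
}
```

Because the reducer relies on equal keys being adjacent, it only produces correct totals when its input has been sorted by key, which is the step the framework performs between the two phases.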
Microsoft Tools for Big Data
SQL Server Parallel Data Warehouse
Enterprise data warehouse appliance
Massively Parallel Processing (MPP), shared-nothing architecture
Windows Azure HDInsight
Cloud-based implementation of Hadoop
Available as a Windows Azure service
PolyBase
Integration technology for SQL Server Parallel Data Warehouse and HDInsight

Module Summary
Big Data is characterized by
Volume
Variety
Velocity
Hadoop is an open source platform for Big Data processing
Map/Reduce is a distributed data processing technique
Microsoft is investing in solutions for Big Data
© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the
U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft
must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after
the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
