Graeme Malcolm | Data Technology Specialist, Content Master
Pete Harris | Learning Product Planner, Microsoft
Graeme Malcolm | @graeme_malcolm
- Microsoft Data Platform Specialist
- Consultant, trainer, and author since SQL Server 4.2
- One of the world's first MCSEs in SQL Server 2012 BI
- (Fairly) regular blogger at www.contentmaster.com
- Longstanding partner with Microsoft
- Lead author for Microsoft Official Curriculum SQL Server 2014 and SQL Server 2012 BI courses
- Contributing author of Patterns and Practices Guide to Big Data
- Author of numerous training courses and Microsoft Press titles since SQL Server 7.0

Pete Harris | @SQLPete
- Learning Product Planner
- Various roles at Microsoft since 1995

Course Topics: Implementing Big Data Analysis
01 | Introduction to Big Data
02 | Getting Started with HDInsight
03 | Windows Azure PowerShell
04 | Processing Big Data with Pig
05 | Processing Big Data with Hive
06 | Automating Big Data Processing
07 | Analyzing Big Data with Excel

Setting Expectations
Target Audience
- BI professionals and data analysts
Suggested Prerequisites/Supporting Material
- Experience using Microsoft Excel and Power BI
- Knowledge of enterprise BI technologies

Microsoft Virtual Academy
- Free online learning tailored for IT Pros and Developers
- Over 1M registered users
- Up-to-date, relevant training on a variety of Microsoft products
- Earn while you learn! Get 50 MVA Points for this event: visit http://aka.ms/MVA-Voucher and enter the code PowerJump1 (expires 8/15/2013)
- Join the MVA Community!

01 | Introduction to Big Data

Module Overview
- What is Big Data?
- Big Data Technologies
- Map/Reduce
- Microsoft Tools for Big Data

What is Big Data?
Data that is too large or complex for analysis in traditional relational databases. Typified by the 3 Vs:
- Volume: huge amounts of data to process
- Variety: a mixture of structured and unstructured data
- Velocity: new data generated extremely frequently
Examples:
- Web server log reporting
- Social media sentiment analysis
- Sensor anomaly detection

Big Data Technologies
Hadoop
- Open source distributed data processing cluster
- Data processed in Hadoop Distributed File System (HDFS)
Related projects
- Hive
- Pig
- HCatalog
- Oozie
- Sqoop
- Others

[Diagram: a Hadoop cluster running HDFS, with a Name Node and Data Nodes]

Map/Reduce
1. Source data is divided among data nodes
2. Map phase generates key/value pairs
3. Reduce phase aggregates values for each key

Source data, divided between two data nodes:
  Node 1: Lorem ipsum sit amet magma sit elit
  Node 2: Fusce magma sed sit amet magma

MAP output, node 1:
  Key     Value
  Lorem   1
  ipsum   1
  sit     1
  amet    1
  magma   1
  sit     1
  elit    1

MAP output, node 2:
  Key     Value
  Fusce   1
  magma   1
  sed     1
  sit     1
  amet    1
  magma   1

REDUCE output:
  Key     Value
  Lorem   1
  ipsum   1
  sit     3
  amet    2
  magma   3
  elit    1
  Fusce   1
  sed     1
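
The three steps above can be sketched in plain Java without a cluster. This is only an illustrative, single-process approximation, not Hadoop code: the class name MapReduceSketch and its methods are hypothetical, and the shuffle that Hadoop performs between the two phases is collapsed into one in-memory loop. It uses the two sample lines from the slide, spelling every occurrence "magma" as the slide's key/value tables do.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical single-process illustration of the map and reduce phases.
// It does not use Hadoop; every name here is invented for this sketch.
final class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every token in one input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(split);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: sum the values emitted for each key.
    // LinkedHashMap keeps keys in first-seen order, as on the slide.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Applies the map step to each split, gathers all pairs, then reduces.
    static Map<String, Integer> wordCount(List<String> splits) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String split : splits) {
            allPairs.addAll(map(split));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        // Each string stands in for the data held by one data node.
        List<String> splits = List.of(
            "Lorem ipsum sit amet magma sit elit",
            "Fusce magma sed sit amet magma");
        // Prints {Lorem=1, ipsum=1, sit=3, amet=2, magma=3, elit=1, Fusce=1, sed=1}
        System.out.println(wordCount(splits));
    }
}
```

On a real cluster the map calls run in parallel on the nodes that hold each split, and the framework groups pairs by key before handing them to the reducers; the sketch only mirrors the data flow.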

Map/Reduce Code in Hadoop
- Usually written in Java and packaged as a JAR
- Streaming enables other languages

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in each input line.
        public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Reducer: sums the counts emitted for each word.
        public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }
    }

Microsoft Tools for Big Data
SQL Server Parallel Data Warehouse
- Enterprise data warehouse appliance
- Massively Parallel Processing (MPP), shared-nothing architecture
Windows Azure HDInsight
- Cloud-based implementation of Hadoop
- Available as a Windows Azure service
PolyBase
- Integration technology for SQL Server Parallel Data Warehouse and HDInsight

Module Summary
- Big Data is characterized by Volume, Variety, and Velocity
- Hadoop is an open source platform for Big Data processing
- Map/Reduce is a distributed data processing technique
- Microsoft is investing in solutions for Big Data

© 2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.