Executive Summary
1 Introduction
1.1 Objectives
1.2 How This Report Was Produced
1.3 Structure of This Report
2 Big Data Definitions and Taxonomies
2.1 Big Data Definitions
2.2 Big Data Taxonomies
2.3 Actors
2.4 Roles and Responsibilities
3 Big Data Elements
3.1 Data Elements
3.2 Dataset at Rest
3.3 Dataset in Motion
3.4 Data Processes
3.5 Data Process Changes
3.6 Data Science
3.7 Big Data Metrics
Executive Summary
<intro to big data>
1 Introduction
1.1 Objectives
Restrict the scope to what is different now that we have big data.
Do not attempt to create a taxonomy of the entire data lifecycle, its processes,
and all data types.
Keep terms independent of any specific tool.
Be mindful that terminology varies with context across domains (e.g., legal).
Big data refers to the inability of traditional data architectures to efficiently handle
the new data sets. The dataset characteristics that force a new architecture in order
to achieve efficiency are volume, variety of data types, and diversity of data from
multiple domains. In addition, the data-in-motion characteristics of velocity (the rate
of flow) and variability (the change in velocity) also lead to different architectures
or to different orderings of the data lifecycle processes in order to achieve greater
efficiency.
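The two data-in-motion characteristics defined above can be illustrated with a minimal sketch. It assumes that velocity is measured as records per second over fixed-length windows and that variability is the difference in velocity between consecutive windows; the function names, the window length, and the sample counts are hypothetical and are not part of this report.

```python
from typing import List


def velocity(record_counts: List[int], window_seconds: float) -> List[float]:
    """Rate of flow: records per second in each fixed-length window."""
    return [count / window_seconds for count in record_counts]


def variability(velocities: List[float]) -> List[float]:
    """Change in velocity between consecutive windows."""
    return [later - earlier for earlier, later in zip(velocities, velocities[1:])]


# Hypothetical record counts observed in three consecutive 60-second windows.
counts = [600, 1200, 300]
rates = velocity(counts, 60.0)
changes = variability(rates)

print(rates)    # [10.0, 20.0, 5.0]
print(changes)  # [10.0, -15.0]
```

Other measures of variability (for example, the variance of the velocity) would fit the definition equally well; the simple difference is used here only because it follows the wording "change in velocity" most directly.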
The new big data paradigm occurs when the scale of the data at rest or in motion
forces the management of the data to be a significant driver in the system
architecture.
2.3 Actors
2.4 Roles and Responsibilities
The list below is from http://www.smartplanet.com/blog/bulletin/7-new-types-of-jobscreated-by-big-data/682, which excerpted an article by Tam Harbert of Computerworld
at http://www.computerworld.com/s/article/9231445/Big_data_big_jobs_? The job roles
are mapped to elements of a Reference Architecture, given in parentheses at the end of
each entry. The list should be augmented with end-user, external data stakeholder, and
Big Data solution provider roles. See roles 8, 9, 10 below.
Here are 7 new types of jobs being created by Big Data:
1 Data scientists: This emerging role is taking the lead in processing raw data and determining what types of analysis would deliver the best results. Typical backgrounds, as cited by Harbert, include math and statistics, as well as artificial intelligence and natural language processing. (Analytics)
2 Data architects: Organizations managing Big Data need professionals who will be able to build a data model, and plan out a roadmap of how and when various data sources and analytical tools will come online, and how they will all fit together. (Design, Develop, Deploy Tools)
3 Data visualizers: These days, a lot of decision makers rely on information that is presented to them in a highly visual format, either on dashboards with colorful alerts and dials, or in quick-to-understand charts and graphs. Organizations need professionals who can harness the data and put it in context, in layman's language, exploring what the data means and how it will impact the company, says Harbert. (Applications)
4 Data change agents: Every forward-thinking organization needs change agents, usually an informal role, who can evangelize and marshal the necessary resources for new innovation and ways of doing business. Harbert predicts that data change agents may be more of a formal job title in the years to come, driving changes in internal operations and processes based on data analytics. They need to be good communicators, and a Six Sigma background, meaning they know how to apply statistics to improve quality on a continuous basis, also helps. (Not applicable to Reference Architecture)
5 Data engineer/operators: These are the people that make the Big Data infrastructure hum on a day-to-day basis. They develop the architecture that helps analyze and supply data in the way the business needs, and make sure systems are performing smoothly, says Harbert. (Data Processing and Data Stores)
6 Data stewards: Not mentioned in Harbert's list, but essential to any analytics-driven organization, is the emerging role of data steward. Every bit and byte of data across the enterprise should be owned by someone, ideally a line of business. Data stewards ensure that data sources are properly accounted for, and may also maintain a centralized repository as part of a Master Data Management approach, in which there is one gold copy of enterprise data to be referenced. (Data Resource Management)
7 Data virtualization/cloud specialists: Databases themselves are no longer as unique as they used to be. What matters now is the ability to build and maintain a virtualized data service layer that can draw data from any source and make it available across organizations in a consistent, easy-to-access manner. Sometimes, this is called Database as a Service. No matter what it's called, organizations need professionals that can also build and support these virtualized layers or clouds. (Infrastructure)
8 Solution Providers: These can be original creators of specialized Big Data components or suppliers who incorporate specialized Big Data components into a complete solution package. (Whole Reference Architecture)
10 External Data Owners/Users: These can be owners/users of external data sources providing input to Big Data infrastructure (e.g., databases, streams) or owners/users of external data sinks receiving outputs from Big Data infrastructure (e.g., batch data processing results, stream processing results). (External Data Stores and Streams)
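The role-to-element mapping in the list above can also be captured as a simple lookup structure. The following is a minimal sketch, assuming a plain dictionary is an adequate representation; the names `ROLE_TO_ELEMENT` and `roles_by_element` are hypothetical, and the element labels are taken from the parenthetical notes on each role.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Role -> Reference Architecture element, as given in parentheses in the list.
# None marks a role noted as not applicable to the Reference Architecture.
ROLE_TO_ELEMENT: Dict[str, Optional[str]] = {
    "Data scientists": "Analytics",
    "Data architects": "Design, Develop, Deploy Tools",
    "Data visualizers": "Applications",
    "Data change agents": None,
    "Data engineer/operators": "Data Processing and Data Stores",
    "Data stewards": "Data Resource Management",
    "Data virtualization/cloud specialists": "Infrastructure",
    "Solution Providers": "Whole Reference Architecture",
    "External Data Owners/Users": "External Data Stores and Streams",
}


def roles_by_element(mapping: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the mapping: element -> roles, skipping roles with no element."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for role, element in mapping.items():
        if element is not None:
            grouped[element].append(role)
    return dict(grouped)


for element, roles in roles_by_element(ROLE_TO_ELEMENT).items():
    print(f"{element}: {', '.join(roles)}")
```

Inverting the mapping makes it easy to see which Reference Architecture elements currently have a role assigned and which roles fall outside the architecture altogether.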