Presented by: Tony Curcio, InfoSphere Product Management
Larger volumes: solutions need to move, transform, cleanse, and otherwise prepare huge data volumes. Big Data requires data scalability.
Simplifies heterogeneity: a common method for diverse data sources.
[Diagram: the information management landscape. Analyze: transactional & collaborative applications, business analytics applications, content. Integrate and Manage: big data (content, streams, data), master data, cubes, data warehouses, streaming information, external information sources. Govern: information governance spanning standards, quality, security & privacy, and lifecycle.]
Big data use cases:
- Big Data Exploration
- Data Warehouse Augmentation
- Enhanced 360° View of the Customer: extend existing customer views by incorporating additional information sources
- Operations Analysis: analyze a variety of machine data for improved business results
Challenges:
- Leveraging structured, unstructured, and streaming data sources for deep analysis
- Low-latency requirements
- Query access to data
- Optimizing the warehouse for big data volumes
- Metadata management to support impact analysis and data lineage
Required capabilities:
- Data Integration Hub Processing: high-speed, massively scalable reads from and writes to big data sources and new data
- Big Data Expert: automatically build MapReduce logic through simple data flow design, and coordinate workflow across traditional and big data platforms
[Diagram: Data Integration Hub Processing, Connectivity Hub.]
InfoSphere DataStage: effectively handle the complexity of enterprise information sources and types with a common design paradigm across a heterogeneous landscape, using a high-speed, scalable solution to speed the delivery of analytics.
2013 IBM Corporation
[Diagram: a data flow (Transform, Enrich, Cleanse into the EDW) scaling from sequential execution on a uniprocessor, to 4-way parallel on an SMP system (shared memory, multiple CPUs and disks), to 64-way parallel on an MPP clustered system.]
- Dynamic: instantly get better performance as hardware resources are added to any topology.
- Extendable: add a new server to scale out through a simple text file edit (or, in a grid configuration, automatically via integration with grid management software).
- Data Partitioned: in true MPP fashion (like Hadoop), data persisted in the data integration platform is stored in parallel to scale out the I/O.
- Hadoop Integrated: push all or parts of the process out to Hadoop to take advantage of its scalability in ELT fashion.
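The data-partitioning idea above, spreading rows across N parallel partitions by key so that I/O and per-key work scale out, can be sketched in Python. This is a conceptual stand-in, not DataStage's actual partitioner; the row layout and key column are illustrative:

```python
from collections import defaultdict

def partition_rows(rows, key_index, n_partitions):
    """Hash-partition rows on a key column so each partition can be
    processed by an independent parallel node."""
    partitions = defaultdict(list)
    for row in rows:
        p = hash(row[key_index]) % n_partitions
        partitions[p].append(row)
    return partitions

rows = [("alice", 10), ("bob", 20), ("alice", 5), ("carol", 7)]
parts = partition_rows(rows, key_index=0, n_partitions=4)
# Rows with the same key always land in the same partition, so
# keyed operations (joins, aggregations) stay node-local.
```

The same-key-same-partition property is what lets a parallel join or aggregation run without cross-node traffic.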
NoSQL
InfoSphere Streams: massive real-time analytics
- Read from an HDFS file in parallel
- Create a new HDFS file, fully parallelized
- Join two HDFS files
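A parallel HDFS read boils down to splitting the file into byte ranges and giving each range to its own reader. The same split-and-read pattern, sketched against a local file with a thread pool as a stand-in for real HDFS block locations (a real reader would also align splits to record boundaries):

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_split(path, offset, length):
    # Read one byte range, as one parallel reader handles one block.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

def parallel_read(path, n_splits=4):
    size = os.path.getsize(path)
    chunk = -(-size // n_splits)  # ceiling division
    splits = [(path, i * chunk, chunk) for i in range(n_splits)]
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        parts = pool.map(lambda s: read_split(*s), splits)
    return b"".join(parts)

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"clickstream,sensor,log\n" * 1000)
data = parallel_read(tmp.name)
```

Reassembling the ranges in order reproduces the file; in the real engine each split would instead flow straight into a downstream parallel stage.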
Open Code
- Writing data to MongoDB
- Writing data to Hive
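With pymongo, writing rows out to MongoDB amounts to converting each flat ETL row into a document and bulk-inserting. A minimal sketch: the database, collection, and field names are made up, and the live insert is guarded since it needs a reachable server:

```python
def rows_to_documents(rows, field_names):
    """Convert flat ETL rows (tuples) into MongoDB documents (dicts)."""
    return [dict(zip(field_names, row)) for row in rows]

docs = rows_to_documents(
    [(1, "alice", 10.5), (2, "bob", 20.0)],
    ["customer_id", "name", "balance"],
)

if __name__ == "__main__":
    try:
        from pymongo import MongoClient  # requires pymongo + a mongod
        client = MongoClient("mongodb://localhost:27017",
                             serverSelectionTimeoutMS=200)
        client.etl.customers.insert_many(docs)
    except Exception:
        pass  # no server available; the row-to-document step still ran
```

The row-to-document step is the part a connector stage automates; the insert itself is a standard `insert_many` bulk write.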
Beta available on InfoSphere DataStage 9.1 FP1
Big Data Expert, InfoSphere DataStage: automatically push transformational processing close to where the data resides, both SQL for the DBMS and MapReduce for Hadoop, leveraging the same simple data flow design process, and coordinate workflow across all platforms.
New in 9.1: leverage the same UI and the same stages to build MapReduce. Drag and drop stages onto the canvas to create a job, rather than having to learn MapReduce programming. Push the processing to Hadoop for patterns where you don't want to transport the data over the network.
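What the generated MapReduce logic amounts to, for a typical aggregation stage, can be sketched as explicit map, shuffle, and reduce phases. This is a conceptual Python stand-in for the code the tool emits, using a made-up clickstream count:

```python
from collections import defaultdict

def map_phase(records):
    # Emit (key, value) pairs, here one count per page hit.
    for page, _user in records:
        yield page, 1

def shuffle(pairs):
    # Group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values, here a simple count.
    return {k: sum(vs) for k, vs in groups.items()}

hits = [("/home", "u1"), ("/buy", "u2"), ("/home", "u3")]
counts = reduce_phase(shuffle(map_phase(hits)))
# counts == {"/home": 2, "/buy": 1}
```

The point of the designer is that you draw the aggregation as stages on a canvas and this three-phase structure is generated for you.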
Build integration jobs with the same data flow tool and stages; the tool automatically creates the MapReduce code.
[Diagram: reference architecture. Sources (clickstream, sensors, transactions, content, all sources) feed Information Server (ETL, replication, lineage, quality) and BigInsights / Hadoop (landing zone, JAQL, Hive, HBase, custom MR), which load the analytics warehouse zone and operational warehouse zone; Guardium and Optim provide security and masking.]
End-to-end Workflows:
- Sequence right alongside other data integration and analytics activities.
- Allow users to have data sourcing, ETL, analytics, and delivery of information all controlled through a single process.
- Monitor all stages through the Operations Console's web-based interface.
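Controlling sourcing, ETL, analytics, and delivery through a single process reduces to running named steps in order and recording a status a console can monitor. A minimal sketch, not the DataStage sequencer; the step names and lambdas are placeholders:

```python
def run_sequence(steps):
    """Run named workflow steps in order, recording per-step status;
    stop at the first failure, as a job sequencer would."""
    status = {}
    for name, fn in steps:
        try:
            fn()
            status[name] = "finished"
        except Exception:
            status[name] = "failed"
            break
    return status

status = run_sequence([
    ("source", lambda: None),     # stand-ins for real activities
    ("etl", lambda: None),
    ("analytics", lambda: None),
    ("deliver", lambda: None),
])
```

A monitoring UI only needs to read the accumulating status map to show which stage a run has reached.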
- Understand how traditional and big data sources are being used.
- Assess the impact of change and mitigate risks.
- Show the impact on downstream applications and BI reports.
- Navigate through impacted areas and drill down.
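Impact analysis over lineage metadata is a reachability question on a directed graph from sources to reports: everything downstream of a changed asset is potentially affected. A minimal sketch; the table and report names are made up:

```python
def downstream_impact(lineage, changed):
    """Walk lineage edges (producer -> consumers) to collect every
    downstream asset affected by a change to `changed`."""
    impacted, frontier = set(), [changed]
    while frontier:
        node = frontier.pop()
        for consumer in lineage.get(node, []):
            if consumer not in impacted:
                impacted.add(consumer)
                frontier.append(consumer)
    return impacted

lineage = {
    "src.orders": ["dw.fact_orders"],
    "dw.fact_orders": ["bi.revenue_report", "bi.ops_dashboard"],
}
hit = downstream_impact(lineage, "src.orders")
# hit == {"dw.fact_orders", "bi.revenue_report", "bi.ops_dashboard"}
```

Running the same walk over reversed edges gives data lineage back to the sources, the other direction the slide mentions.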
Wrap-up
[Diagram: platform capabilities (application development, discovery, accelerators, Hadoop system, stream computing, data warehouse) over data types (data, media, content, machine, social).]
- Activity Monitoring
- Data Masking
- Data Encryption
- On-Demand / In-Place Protection
- In-Line Protection (with ETL, etc.)
- Active Detection & Alerting
Queryable Archive:
- Structured and semi-structured
- Optimized connectors to existing apps
- Hot-restorable on the fly
- Immutable and secure access
- Automated legal hold capability for data freeze
Thanks