Spark jobs perform multiple operations consecutively, in memory, only spilling to disk when required by memory limitations. Spark simplifies the management of these disparate processes, offering
an integrated whole – a data pipeline that is easier to configure, run, and maintain. In use cases such as ETL, these pipelines can become extremely rich and complex, combining large numbers of
inputs and a wide range of processing steps into a unified whole that consistently delivers the desired result.
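The idea of chaining several processing steps in memory, without writing intermediate results to disk between steps, can be illustrated with a minimal plain-Python sketch. This is not Spark code: the function names (`extract`, `transform`, `load`), the record fields, and the sample data are all hypothetical, and Python generators merely stand in for lazily evaluated transformations.

```python
from typing import Dict, Iterable, Iterator


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Parse raw comma-separated lines into records (hypothetical schema)."""
    for line in lines:
        well_id, reading = line.split(",")
        yield {"well_id": well_id.strip(), "reading": float(reading)}


def transform(records: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Keep readings above a threshold; nothing runs until consumed."""
    return (r for r in records if r["reading"] > threshold)


def load(records: Iterable[dict]) -> Dict[str, float]:
    """Sum readings per well; consuming the records drives the pipeline."""
    totals: Dict[str, float] = {}
    for r in records:
        totals[r["well_id"]] = totals.get(r["well_id"], 0.0) + r["reading"]
    return totals


# Illustrative input only; each record flows through all three steps
# in memory, with no intermediate dataset materialized between them.
raw = ["a, 1.5", "b, 0.2", "a, 2.5", "b, 3.0"]
result = load(transform(extract(raw), threshold=1.0))
print(result)
```

Because each step consumes the previous one lazily, adding a further processing stage means composing one more function rather than scheduling a separate job with its own intermediate output.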
In this video, you will see an example from the eBook Getting Started with Apache Spark 2.x.
The MapReduce Java API is not easy to program with, although Pig and Hive make this somewhat
easier.
MapReduce, Pig, and Hive are only for batch ETL, and data sources are limited to Hadoop.
Spark provides a rich functional programming model and comes packaged with higher level libraries
for SQL, machine learning, streaming, and graphs.
Spark's Structured API provides the same API for batch and real-time streaming. Spark's architecture
supports tight integration with a number of leading storage solutions in the Hadoop ecosystem and
beyond, including Apache HDFS, MapR XD Distributed File and Object Store, Apache HBase, MapR
Database JSON, Apache Kafka, and Apache Hive.
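What "the same API for batch and real-time streaming" means in practice can be sketched in plain Python, under the assumption that a stream is processed as a sequence of small batches. This is a conceptual illustration, not Spark's API: `count_by_key`, `run_batch`, and `run_streaming` are hypothetical names, and the event data is made up.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_by_key(events: Iterable[str], state: Counter) -> Counter:
    """Business logic shared by both modes: count events per key."""
    state.update(events)
    return state


def run_batch(dataset: List[str]) -> Counter:
    """Batch mode: the whole dataset is available at once."""
    return count_by_key(dataset, Counter())


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Streaming mode: apply the same logic to each micro-batch,
    carrying running state forward and emitting a snapshot each time."""
    state: Counter = Counter()
    for batch in micro_batches:
        count_by_key(batch, state)
        yield Counter(state)  # copy, so earlier snapshots stay unchanged


events = ["click", "view", "click", "view", "view"]
batch_result = run_batch(events)
stream_snapshots = list(run_streaming([events[:2], events[2:]]))
print(batch_result)
print(stream_snapshots[-1])
```

The point of the sketch is that the counting logic is written once; only the driver that feeds it data differs, so the final streaming snapshot matches the batch result.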
Develop and deploy applications that run 10-100x faster in production environments with in-memory processing of data
Build complex ETL pipelines that can speed up data ingestion and deliver superior performance
Spark SQL's Structured Data API simplifies the complexity of data access, transformation, and storage across distributed file systems, different file formats, streaming data, and NoSQL data stores
Provides a uniform set of high-level machine learning pipeline APIs built on top of DataFrames to make machine learning scalable with the ease of SQL for data manipulation
Integrated distributed machine learning algorithms for classification, regression, collaborative filtering, clustering, dimensionality reduction, and frequent pattern mining
Leverage Spark and deep learning with external libraries including BigDL, Spark Deep Learning Pipelines, TensorFlowOnSpark, dist-keras, H2O Sparkling Water, PyTorch, Caffe, and MXNet
Combine event streams with machine learning to handle the logistics of machine learning in a flexible
way by:
Managing and evaluating multiple models and easily deploying new models
A confluence of several technology shifts has dramatically changed machine learning applications. The combination of distributed computing, streaming analytics, and machine learning
is accelerating the development of next-generation intelligent applications, which take advantage of modern computational paradigms powered by modern computational infrastructure. The MapR
Data Platform integrates global event streaming, real-time database capabilities, and scalable enterprise storage with Hadoop, Spark, Apache Drill, and other ML libraries to power this new
generation of data processing pipelines and intelligent applications. Diverse and open APIs allow all types of analytics workflows to run on the data in place.
The MapR XD Distributed File and Object Store is designed to store data at exabyte scale, support trillions of files, and combine analytics and operations into a single platform. MapR XD supports
industry-standard protocols and APIs, including POSIX, NFS, S3, and HDFS. Unlike Apache HDFS, which follows a write-once, append-only paradigm, the MapR Data Platform delivers a true read-write,
POSIX-compliant file system. Support for the HDFS API enables Spark and Hadoop ecosystem tools, for both batch and streaming, to interact with MapR XD. Support for POSIX enables Spark and
all non-Hadoop libraries to read and write to the distributed data store as if the data were mounted locally, which greatly expands the possible use cases for next-generation applications. Support
for an S3-compatible API means MapR XD can also serve as the foundation for Spark applications that leverage object storage.
The MapR Event Store for Apache Kafka is the first big-data-scale streaming system built into a unified data platform and the only big data streaming system to support global event replication
reliably at IoT scale. Support for the Kafka API enables Spark streaming applications to interact with data in real time in a unified data platform, which minimizes maintenance and data copying.
MapR Database is a high-performance NoSQL database built into the MapR Data Platform. MapR Database is multi-model: wide-column, key-value with the HBase API, or JSON (document) with
the OJAI API. Spark connectors are integrated for both HBase and OJAI APIs, enabling real-time and batch pipelines with MapR Database:
The MapR Database Connector for Apache Spark enables you to use MapR Database as a sink for Spark Structured Streaming or Spark Streaming.
The Spark MapR Database Connector enables users to perform complex SQL queries and updates on top of MapR Database, while applying critical techniques such as projection and filter
pushdown, custom partitioning, and data locality.
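Projection and filter pushdown mean that the data store, rather than the client, discards unneeded rows and columns before anything is transferred. The following plain-Python sketch illustrates that idea only; it does not use the MapR Database connector or Spark. `ToyStore`, its `scan` method, the `fields_shipped` counter, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

Row = Dict[str, object]


class ToyStore:
    """Hypothetical in-memory stand-in for a remote document store."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self.fields_shipped = 0  # number of field values sent to the client

    def scan(self,
             predicate: Optional[Callable[[Row], bool]] = None,
             columns: Optional[Sequence[str]] = None) -> List[Row]:
        """Return rows, applying the filter and projection at the source."""
        out: List[Row] = []
        for row in self._rows:
            if predicate is not None and not predicate(row):
                continue  # filter pushdown: row never leaves the store
            shipped = row if columns is None else {c: row[c] for c in columns}
            self.fields_shipped += len(shipped)
            out.append(dict(shipped))
        return out


def is_tx(row: Row) -> bool:
    return row["state"] == "TX"


# Six illustrative rows with three fields each; odd ids are in "TX".
rows = [{"id": i, "state": "TX" if i % 2 else "CA", "payload": "x" * 10}
        for i in range(6)]

# Without pushdown: ship every field of every row, then filter on the client.
naive_store = ToyStore(rows)
naive = [{"id": r["id"]} for r in naive_store.scan() if is_tx(r)]

# With pushdown: the store applies the filter and projection itself.
pushed_store = ToyStore(rows)
pushed = pushed_store.scan(predicate=is_tx, columns=["id"])

print(naive == pushed)
print(naive_store.fields_shipped, pushed_store.fields_shipped)
```

Both approaches return the same rows, but in this toy setup the pushdown path ships far fewer field values, which is the effect a connector aims for when it pushes predicates and column selections down to the database.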
MapR puts key technologies essential to achieving high scale and high reliability in a fully distributed architecture that spans on-premises, cloud, and multi-cloud deployments, including edge-first
IoT, while dramatically lowering both the hardware and operational costs of your most important applications and data.
“We are very excited about the new features [in MapR], Spark structured streaming allows us to use advanced analytics on real-time oil well data while Drill allows us to explore the same data using SQL. This helps us make operational decisions faster.”
ADDITIONAL RESOURCES
eBook: Getting Started with Apache Spark 2.x from Inception to Production
On-Demand Training: Spark Developer Courses