https://www.gangboard.com/big-data-training/apache-spark-training
Apache Spark is a fast, general-purpose cluster computing technology designed for high-speed computation. It is based on Hadoop MapReduce and extends the MapReduce model so that it can be used efficiently for more types of computation, including interactive queries and stream processing. Spark's key feature is in-memory cluster computing, which increases the processing speed of an application.
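To make the MapReduce model that Spark extends concrete, here is a minimal word count expressed as a map phase followed by a reduce phase. This is a plain-Python illustration of the programming model, not Spark code; the function names and sample data are invented for this sketch.

```python
from collections import defaultdict

def map_phase(lines):
    # Like a MapReduce mapper: emit a (word, 1) pair for every word.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Like a MapReduce reducer: sum the counts for each key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["spark is fast", "spark is in memory"]
counts = reduce_phase(map_phase(lines))
```

Spark generalizes this two-step pattern into a richer set of transformations (map, filter, join, groupBy, and so on) over distributed collections, keeping intermediate results in memory instead of writing them to disk between steps.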
Spark SQL
Spark SQL is a component on top of Spark Core that introduces a data abstraction called SchemaRDD (later renamed DataFrame), which provides support for structured and semi-structured data.
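The idea behind SchemaRDD is pairing a collection of records with a schema so that structured queries can run over semi-structured input. The following plain-Python sketch illustrates that idea; it is not the Spark SQL API, and all names and records here are hypothetical.

```python
# A collection of semi-structured records: "bob" is missing a field.
records = [
    {"name": "alice", "age": 34},
    {"name": "bob"},
    {"name": "carol", "age": 29},
]

# The schema describes the expected columns and their types.
schema = {"name": str, "age": int}

def select_where(rows, column, predicate):
    # A tiny "SELECT * WHERE predicate(column)"; rows lacking the
    # column are skipped, like rows with a null field.
    return [r for r in rows if column in r and predicate(r[column])]

under_30 = select_where(records, "age", lambda a: a < 30)
```

In real Spark SQL the schema additionally lets the engine plan and optimize the query, rather than just validate records as in this sketch.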
Spark Streaming
Spark Streaming leverages Spark Core's fast scheduling capability to perform streaming analytics. It ingests data in mini-batches and performs resilient distributed dataset (RDD) transformations on those mini-batches of data.
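The mini-batch approach can be sketched in a few lines of plain Python: chop an incoming stream into small fixed-size batches, then apply the same transformation to each batch that you would apply to an RDD. This is an illustration of the concept only; the function names are invented and this is not the Spark Streaming API.

```python
def mini_batches(stream, batch_size):
    # Group the incoming stream into fixed-size mini-batches.
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly short, batch
        yield batch

def process(batch):
    # The per-batch transformation (here: square each element).
    return [x * x for x in batch]

results = [process(b) for b in mini_batches(range(7), batch_size=3)]
```

In Spark Streaming the batch boundary is a time interval rather than a count, but the structure is the same: each interval's data becomes an RDD, and the per-batch transformation is an ordinary RDD transformation.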