Description and Overview
Apache Spark is a fast and general engine for large-scale data processing.
How to Use Spark
Because of its high memory and I/O bandwidth requirements, we recommend
you run your Spark jobs on Cori.
Follow the steps below to use Spark; note that the order of the commands
matters. DO NOT load the spark module until you are inside a batch job.
Interactive mode
Submit an interactive batch job with at least 2 nodes:
salloc -N 2 -t 30
You need to use at least 2 nodes because the driver runs on the head node by
itself and the executors run on all the other nodes (if you would like to change
this behavior, see "Running an Executor on the Same Node as the Driver" below).
Wait for the job to start. Once it does you will be on a compute node and you will
need to load the spark module:
module load spark
You can start Spark with this command:
start-all.sh
To connect to the Python Spark Shell, do:
pyspark
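Once the Python shell is up, a quick smoke test is to run a small computation; the shell predefines the SparkContext as sc. A sample session might look like:

```
>>> rdd = sc.parallelize(range(100))
>>> rdd.sum()
4950
```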
To connect to the Scala Spark Shell, do:
spark-shell
To shut down the Spark cluster, do:
stop-all.sh
Batch mode
Below are example batch scripts for Cori and Edison. You can change the number
of nodes/time/queue accordingly (as long as the number of nodes is greater than
1). On Cori you can use the debug queue for short debugging jobs and the
regular queue for long jobs.
Here's an example script for Cori called run.sl:
#!/bin/bash
#SBATCH -p regular
#SBATCH -N 2
#SBATCH -t 00:30:00
#SBATCH -e mysparkjob_%j.err
#SBATCH -o mysparkjob_%j.out

module load spark
start-all.sh
spark-submit $SPARK_EXAMPLES/python/pi.py
stop-all.sh
To submit the job:
sbatch run.sl
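The bundled pi.py example estimates pi by Monte Carlo sampling: it scatters random points over the unit square and counts the fraction that land inside the quarter circle. A plain-Python sketch of that idea (not the Spark script itself; the function name and sample count here are illustrative):

```python
import random

def estimate_pi(num_samples=100000, seed=42):
    """Estimate pi: the probability that a uniform random point in the
    unit square lies inside the quarter circle is pi/4."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples

print(estimate_pi())
```

The Spark version distributes the sampling loop across the executors and sums the per-partition counts, which is why it makes a good first test of a working cluster.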
Running an Executor on the Same Node as the Driver
If you would like one of the executors to run on the same node as the driver
(which lets you use Spark on a single node, or, when using multiple nodes, gives
you as many executors as nodes instead of one fewer), set this variable before
loading the spark module:
export SPARK_CLUSTER_MODE=ON
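With SPARK_CLUSTER_MODE=ON, a single-node batch job becomes possible. A sketch of how the Cori batch script above would change (queue and filenames are the same illustrative choices as before):

```shell
#!/bin/bash
#SBATCH -p regular
#SBATCH -N 1
#SBATCH -t 00:30:00
#SBATCH -e mysparkjob_%j.err
#SBATCH -o mysparkjob_%j.out

# Must be exported before the module load, since the order of commands matters
export SPARK_CLUSTER_MODE=ON
module load spark
start-all.sh
spark-submit $SPARK_EXAMPLES/python/pi.py
stop-all.sh
```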
Monitoring Your Spark Application
Running the History Server
The history server allows you to visualize the information provided by the event
logs in a nice interactive web interface. Here are instructions to run it on Cori
(note the history server is independent of running a Spark job, so there is no
need to start a Spark job to run the server):
Run the following commands on a login node (do not run them on a compute node):
module load spark/hist-server
run_history_server.sh
This command will return a url that will look something like
this: http://120.44.234.30:18080
Go to the address returned in your browser on your local machine.
Alternatively, if you are in an NX session or an X11-forwarded SSH session, you
can enter
firefox
which will open a Firefox browser from the login node. From there, enter
"localhost:18080" as the URL to see the history server.
Initially, the page will display "No completed applications found!" until all logs
are processed. This processing can take anywhere from a minute to ten minutes
depending on how many event logs you have accumulated.
To get a quicker turnaround time, consider using an event logs directory with
fewer event logs (if possible!).
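The history server can only display applications that wrote event logs. The Spark modules are typically configured to do this for you, but if your runs do not show up, event logging is controlled by two standard Spark properties in spark-defaults.conf (the directory below is purely illustrative; point it at the directory your history server reads):

```
spark.eventLog.enabled  true
spark.eventLog.dir      file:///global/cscratch1/sd/<user>/spark/eventlogs
```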
Make sure to stop the server when you are done. To stop the history server:
run_history_server.sh --stop
Note that you must be on the same login node where you started the history
server in order to stop it.
Troubleshooting
Module Load Errors
A successful module load spark has four steps and looks like this:
Creating Directory SPARK_WORKER_DIR
/global/cscratch1/sd/racah/spark/1054167
Creating /global/cscratch1/sd/racah/spark/1054167/slaves file
Determining the master node name...
Master node is nid00092
Module load error outputs look like this:
spark/1.6.0(137):ERROR:102: Tcl command execution failed: if { [ module-info
mode load ] } {
puts stderr "Creating Directory SPARK_WORKER_DIR $env(SPARK_WORKER_DIR)"
puts stderr "Creating $env(SPARK_WORKER_DIR)/slaves file"
puts stderr "Determining the master node name..."
set master [exec $root/myfindmaster.sh]
puts stderr "Master node is $master"
exec /bin/mkdir -p $env(SPARK_WORKER_DIR)
exec $root/myfindslaves.sh $master $env(SPARK_WORKER_DIR)/slaves
setenv SPARKURL spark://$master:7077
setenv SPARKMASTER $master
}
If the module load error comes after
Availability
All modules below belong to the Spark package in the applications/debugging
category.

Cori:

Version          Module                 Install Date  Date Made Default
1.5-instru       spark/1.5-instru       2015-12-04
1.5.1            spark/1.5.1            2016-07-18
1.5.1-mkl        spark/1.5.1-mkl        2016-03-25
1.5.1-sc-instru  spark/1.5.1-sc-instru  2016-03-30
1.6.0            spark/1.6.0            2016-07-18
2.0.0            spark/2.0.0            2016-08-17
hist-server      spark/hist-server      2016-02-01

Edison:

Version        Module               Install Date  Date Made Default
1.0.0          spark/1.0.0          2014-07-10
1.0.2          spark/1.0.2          2014-08-16    2014-08-16
1.1.0          spark/1.1.0          2014-09-21    2014-10-31
1.1.0-shm      spark/1.1.0-shm      2014-10-28
1.2.1          spark/1.2.1          2015-02-17    2015-02-18
1.2.1-scratch  spark/1.2.1-scratch  2015-03-31    2015-04-02
1.3.1          spark/1.3.1          2015-04-27    2015-05-15
1.3.1-scratch  spark/1.3.1-scratch  2015-08-12
1.4.1          spark/1.4.1          2015-08-05    2015-08-11
1.5-rc1-inst   spark/1.5-rc1-inst   2015-09-17
1.5.0          spark/1.5.0          2015-09-25    2015-11-13
hist-server    spark/hist-server    2016-06-28
scratch        spark/scratch        2015-03-30