Period 6
1/3/17
Introduction:
Every year, computer hardware becomes more powerful. Central processing units (CPUs) and discrete graphics processing units (GPUs) contain more and faster cores. Random access memory (RAM) is also becoming faster and more abundant. But all of this extra computing power goes to waste if programs and scripts do not take advantage of it. Many programs and scripts run homogeneously, meaning they only use one processing core (typically in the CPU), and sequentially, meaning they execute one line at a time. They utilize a tiny fraction of a computer's processing power by doing this; even low-range computers have multi-core CPUs. As a result, both the users' and programmers' experiences are degraded. The user suffers longer load times and lower frames per second, while the programmer is forced to limit their code in order to maintain satisfactory performance for their users [2]. So, a programmer would need to parallelize their code, meaning it would execute multiple lines of code (or tasks) concurrently, and make it heterogeneous, meaning it would use multiple cores in the CPU (and/or GPU), in order to leverage the entire computing power of a system. Theoretically, a parallelized, heterogeneous program would be far more efficient and faster than its sequential counterpart. Thus, both the users' and programmers' experiences should improve [4].
And naturally, the question arises: Why aren't all programs and scripts parallel and heterogeneous? The three primary reasons are difficulty, portability, and complexity. Parallelizing code challenges programmers because it is difficult to break down a task into discrete operations that can run concurrently. Often one independent task requires information from another to continue, so one needs to be able to regulate the tasks to wait for one another, which is especially difficult when tasks move at different speeds. If this waiting for dependencies does not occur, the program or script will not work. For example, if two people are building a two-story building, one person making the floors and ceilings and the other making the walls, the person making the walls must wait for the other person to make the floor before starting [4]. On the other hand, portability, the usability of a piece of software in different environments, is a major problem with making code heterogeneous, especially when the code is supposed to run on both the CPU and GPU. The way different cores and different pieces of hardware interact and communicate varies from hardware to hardware. For example, the way one graphics card allocates memory could be extremely different from another. This could cause one's program or script to not function on all machines, or to perform extremely differently from machine to machine. To resolve this, a programmer would be forced to change their code on a machine-to-machine basis [5]. Similar to the problem of difficulty, complexity is another major issue. Parallelized programs sometimes have significantly more steps than their sequential counterparts, leading some to argue that an optimized, sequential program is more practical.
The application programming interfaces (APIs) OpenCL, WebCL, OpenGL, and WebGL seek to resolve the first two issues for coding heterogeneously and in parallel. They provide a series of functions that make parallelizing slightly easier, but most importantly, they enhance code portability: Code that implements one of these APIs should be able to run on all machines, and the environment (aside from the power of the hardware) that the code is run on should not significantly affect efficiency. OpenCL and WebCL are typically implemented for computational code that does not involve graphics, while OpenGL and WebGL are implemented exclusively for rendering graphics, such as those seen in videogames and graphing applications. The Web varieties, designed to be run in internet browsers, are bindings to the Open APIs, which are designed to be run locally, with more security features to stop malicious code. So in short, they're essentially the same, but the Web varieties will run less efficiently due to their built-in security. However, the most obvious benefit of the Web varieties is that they can run on any device without downloading an application [6 & 8]. So will code that implements OpenCL, WebCL, OpenGL, or WebGL be faster than an optimized, serial counterpart because of its heterogeneous, parallel nature? This research paper will focus on answering that question for WebGL in the context of rendering large network graphs.
Other research papers and studies have answered similar questions for OpenCL, WebCL, and OpenGL. For example, one study concluded that OpenCL accelerated the calculations required to render virtual trees [7]. Another study with a broader topic, conducted by McGill University, benchmarked a variety of algorithms: a machine learning algorithm for neural networks, an edge length calculation algorithm for network graphs, a particle physics algorithm, and Google's algorithm to determine the popularity of websites, among others. In almost every algorithm (with the exception of the previously mentioned neural network one and one that creates Markov models), WebCL performed better than JavaScript and OpenCL performed better than C (better meaning the speed-up of the API over the traditional serial language was greater than 1). And as expected, OpenCL's speed-up over C was greater than WebCL's speed-up over JavaScript for nearly every benchmark. This is not unusual considering WebCL, unlike OpenCL, has built-in security features that hinder performance [1]. Along with many other studies, McGill University's study confirmed the performance boosts offered by both APIs.
And while OpenCL, first released in 2009, and WebCL, first released in 2013, are still in their infancy, OpenGL, first released in 1992, has been researched thoroughly, implemented, and improved for the past twenty-five years. It has even become a gaming industry standard, being used in applications from Doom (2016) to Angry Birds. One medical research group even used it to accelerate neuroscience data visualization beyond traditional rendering methods, such as matplotlib [9]. Thus, it is already widely known that it offers significant performance benefits.
Conversely, the Khronos Group introduced WebGL to the industry just three years ago in 2013. WebGL has more performance-limiting security features to prevent malicious online attacks than OpenGL [6]. Despite the possible performance boosts WebGL offers and the API's potential to be used on all devices, few developers have implemented it and few researchers have studied it. Numerous medical research papers discuss implementing WebGL for medical (scan) visualizations, but few actually measure the API's effect on performance. For example, one research study employed WebGL to render large medical scans but measured portability instead of performance acceleration [3]. The field of Big Data modeling, more than others, could especially reap the potential benefits of implementing WebGL. Networks, a model frequently used for Big Data, could be rendered faster on any device and accessed anywhere in the world (so long as one has an internet connection). Additionally, large networks, which characterize Big Data models, tend to be plagued by long render times that frustrate researchers and impede progress. This paper hypothesizes that WebGL will offer performance benefits for large, 2-D network rendering over traditional, serial, JavaScript graphics-rendering functions because of its parallel and heterogeneous nature.
Methodology:
The benchmark used to measure the performance boosts offered by WebGL revolved around SigmaJS [10], a library for rendering interactive network graphs, meaning network graphs where the user can move around, while handling most of the graphics-generating back-end. Written entirely in JavaScript, the library had three primary objects: Node, Edge, and Sigma. The Node object had three critical parameters (the others relate to aesthetics): a node id, an x location, and a y location. The Edge object had two critical parameters: a source node id and a target node id. The final object, Sigma, also had two critical parameters: the graph data (the node and edge arrays) and the container where the graph was displayed. When the Sigma object was instantiated, the network graph was rendered (Pict. 1: The basic code used to instantiate Sigma). Sigma by default had two renderers: one that used traditional serial JavaScript graphics-rendering functions and one that used WebGL.
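As a sketch of this object model, the following builds minimal node and edge arrays. The field names (`id`, `x`, `y`, `source`, `target`) and the `renderer` setting follow SigmaJS's documented graph format, but this is an illustrative sketch rather than the benchmark's actual code; the commented-out instantiation assumes a browser page with SigmaJS loaded and a `graph` container element.

```javascript
// Minimal node and edge data in the shape SigmaJS expects:
// nodes carry an id plus x/y coordinates, edges carry an id
// plus the ids of their source and target nodes.
var nodes = [
  { id: 'n0', x: 0, y: 0 },
  { id: 'n1', x: 1, y: 1 }
];
var edges = [
  { id: 'e0', source: 'n0', target: 'n1' }
];

// In a browser with SigmaJS loaded, instantiating Sigma renders the
// graph; the renderer type chooses between the serial canvas back-end
// and the WebGL one. Commented out so the data sketch stays runnable.
// var s = new sigma({
//   graph: { nodes: nodes, edges: edges },
//   renderer: { container: document.getElementById('graph'), type: 'webgl' }
// });
```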
The benchmark that implemented SigmaJS recorded the amount of time it took to render 2^x nodes and 2^(x+1) edges, with randomly placed nodes and edges with random sources and targets. The value of x started at zero and incremented to a maximum value of sixteen because of memory constraints. In order to obtain accurate results, the benchmark measured and averaged twenty renders for each value of x, because background processes vary constantly, which can irregularly affect results. For instance, with an x value of five, a graph with 2^5 nodes and 2^6 edges was rendered twenty times, a time was recorded for each render, and the result was the average of these times. The graph creation code was relatively simple: each node was given a random x and y location, each edge was given a random source node and a random target node (from the node array), and once all nodes and edges were instantiated, the Sigma object was created with the node array, edge array, and an HTML canvas element where the graph was actually displayed. The benchmark then killed the instance of the graph and either created another graph with the same value of x in order to find an average or created a graph with a new value of x incremented by one.
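The generation-and-averaging loop described above can be sketched as follows. The `renderGraph` function is a hypothetical stand-in for the SigmaJS instantiation that was actually timed, but the 2^x-node, 2^(x+1)-edge construction and the twenty-trial averaging mirror the procedure described.

```javascript
// Build a random graph with 2^x nodes and 2^(x+1) edges, as in the benchmark.
function makeGraph(x) {
  var nodes = [], edges = [], i;
  for (i = 0; i < Math.pow(2, x); i++) {
    // Nodes get random x/y locations.
    nodes.push({ id: 'n' + i, x: Math.random(), y: Math.random() });
  }
  for (i = 0; i < Math.pow(2, x + 1); i++) {
    // Edges get a random source and target drawn from the node array.
    edges.push({
      id: 'e' + i,
      source: nodes[Math.floor(Math.random() * nodes.length)].id,
      target: nodes[Math.floor(Math.random() * nodes.length)].id
    });
  }
  return { nodes: nodes, edges: edges };
}

// Hypothetical placeholder: the real benchmark instantiated Sigma here.
function renderGraph(graph) { /* new sigma({ ... }) in the real benchmark */ }

// Render the graph `trials` times and return the mean render time in ms.
function averageRenderTime(x, trials) {
  var total = 0;
  for (var t = 0; t < trials; t++) {
    var graph = makeGraph(x);
    var start = Date.now();
    renderGraph(graph);
    total += Date.now() - start;
  }
  return total / trials;
}
```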
To accurately reflect a real-world environment where WebGL would be used, the benchmark was uploaded to a local (on the same network, not public) Apache web server. The web server had an Intel Celeron 1007U CPU @ 1.50 GHz and 4 GB of RAM. The web server did not need high processing power because the network rendering occurred client-side. Its purpose was to serve the benchmark to the machines where WebGL would be benchmarked. Two machines ran the benchmark: a desktop computer with a discrete graphics card and a mobile phone. The desktop had an Intel i7-3770K quad-core (but eight-thread) CPU @ 3.9 GHz, 16 GB of RAM, and an Nvidia 780 Ti graphics card with a core clock at .980 GHz. The mobile phone had a Snapdragon 801 quad-core CPU @ 2.3 GHz, 2 GB of RAM, and an Adreno 330 graphics chip. The two machines represented the two major environments WebGL would be used in. The value of x on the mobile device (# of Nodes: 2^x; # of Edges: 2^(x+1)) was restricted to 7 due to the limited amount of RAM on the device. This should not affect results because the portability of WebGL among the devices is not being compared; the focus of the paper is the performance of WebGL compared to traditional rendering on each device individually.
On each machine the benchmark was run twice: with serial JavaScript graphics-rendering functions as the network renderer the first time and WebGL as the network renderer the second time. Each benchmark run outputted an average time for each value of x, and these average times were used to calculate speed-up: Speed-up = Old Time (Average Time for x-1) / New Time (Average Time for x).
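The speed-up calculation just defined (the average time for the previous value of x divided by the average time for the current x, with the first entry defined as 1) can be written directly:

```javascript
// Turn a series of average render times (one entry per value of x)
// into the speed-up series reported in Figures 1 and 2.
// speedUps[0] = 1; speedUps[i] = avgTimes[i - 1] / avgTimes[i],
// so a value below 1 means rendering slowed down at that step.
function speedUps(avgTimes) {
  return avgTimes.map(function (t, i) {
    return i === 0 ? 1 : avgTimes[i - 1] / t;
  });
}

// Times that double at every step yield a constant speed-up of 0.5:
// speedUps([10, 20, 40]) → [1, 0.5, 0.5]
```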
Results:

Fig. 1: Desktop speed-ups

Number of Nodes   Speed-up with WebGL   Speed-up with Serial
2                 1                     1
4                 0.883146              1.263784
8                 1.177249              0.98155
16                0.881119              0.808955
32                0.806088              0.659016
64                0.769297              0.605279
128               0.543014              0.660073
256               1.609602              0.570553
512               0.868825              0.530321
1024              0.756896              0.496751
2048              0.698833              0.471588
4096              0.739534              0.445963
8192              0.505623              0.485027
16384             0.480492              0.457344
32768             0.495042              0.474276
65536             0.496449              0.466158

Fig. 2: Mobile phone speed-ups

Number of Nodes   Speed-up with WebGL   Speed-up with Serial
2                 1                     1
4                 0.833572              0.77661
8                 1.34681               1.492294
16                0.627351              0.473187
32                0.962949              0.952206
64                1.070199              1.206446
128               0.62564               0.466997
256               1.141051              0.979045
[Fig. 3: Speed-up vs. number of nodes on the desktop (WebGL and serial renderer), 0 to 80,000 nodes]

[Fig. 4: Speed-up vs. number of nodes on the desktop (WebGL and serial renderer), domain restricted to 1,000 nodes]

[Fig. 5: Speed-up of mobile phone (WebGL and serial renderer), 0 to 300 nodes]

[Fig. 6: Speed-ups compared on the desktop with discrete graphics, 0 to 80,000 nodes]

[Fig. 8: Speed-ups compared of mobile phone]

[Fig. 7: Speed-ups compared of desktop with discrete graphics (restricted domain to 1,000)]
When looking at Figures 7 and 4, one would notice that at first, with small node values (nodes < 32), the speed-ups of both the serial renderer and WebGL were relatively equal and had extremely negative slopes. Then, around 64 nodes, the slope of the serial renderer's speed-up became about 0 (slightly negative), while the slope of WebGL's continued to remain negative. The slope of WebGL's speed-up pivoted to positive at 128 nodes. WebGL's maximum speed-up of 1.6 came at 256 nodes and was 180% greater than the speed-up of the serial renderer. From 256 nodes to 512 nodes, the WebGL renderer's speed-up plummeted from 1.6 to .86 (when speed-up < 1, the process is slowing down) as shown in Figure 1. This speed-up of .8 (+-.1) for WebGL remained relatively constant until 4096 nodes, when its slope once again became extremely negative, dropping from .74 to .50. From this point to the maximum number of nodes benchmarked, 65536, the speed-up stayed constant at .495 +- .005 [Fig. 4]. On the other hand, the serial renderer's speed-up decreased at a near-constant rate (a second derivative of about zero) from 128 nodes to 2048 nodes, decreasing from .66 to .47 as shown in Figures 1 and 3. After this, the speed-up of the serial renderer became relatively sporadic, jumping from .47 to .45 to .49 to .46 and finally to .47 at 32768 nodes. The speed-up remained constant at about .47 from 32768 nodes to 65536 [Fig. 1 & 3]. Overall, as shown in Figures 7 and 8, WebGL maintained a significantly higher speed-up than the serial renderer starting at 256 nodes and ending at 2048 nodes. After a significant drop in speed-up at 2048 nodes, the speed-up of WebGL was about 6% greater than the serial renderer's throughout the rest of the benchmark [Fig. 7].
The speed-ups of WebGL and the serial renderer on the mobile phone were nearly equivalent until 128 nodes [Fig. 2, 5, & 8]. At about 128 nodes, the speed-up of WebGL surpassed that of the serial renderer. The speed-up of WebGL at the maximum node value of the benchmark, 256 nodes, was 16% greater than the speed-up of the serial renderer [Fig. 8].
Discussion:
Discussion - Desktop
The near-equivalent speed-ups of the two renderers on the desktop machine for node values less than 128 was expected. WebGL's transfer of data to the graphics card and other cores acted as a hindrance at these low node values. There was a delay in this transfer that the processing power of the graphics card and other cores could not compensate for because there was not enough work to saturate the components: their full processing power was not being leveraged. The performance jump of WebGL from 128 nodes to 256 nodes occurred because the graphics card and processor cores were becoming fully saturated with work: their full processing power was starting to be used, and it compensated for the data-transfer delay. Thus, speed-ups were yielded. The decrease in WebGL's speed-up from 4096 nodes to 8192 nodes was most likely caused by a change in the way the SigmaJS rendering algorithm or WebGL stored and mapped variables. Or, the amount of RAM allocated to the browser might have been filled, requiring garbage collection of unused data. The relatively constant speed-up of .495 +- .005 after 8192 nodes suggested all available processing power from the GPU and CPU was being used. Further, it suggested the behavior of SigmaJS's and WebGL's algorithms regarding memory allocation and garbage collection remained constant. Similarly, the nearly constant speed-up (+-.1) of the serial renderer after 128 nodes intimated that the processing power available to it (one CPU core) was saturated and that the algorithms regarding memory allocation and garbage collection were not changing. Once the serial and WebGL renderers' speed-ups became constant, WebGL's speed-up was about 6% greater than the serial renderer's. The results concur with the original hypothesis that WebGL would offer performance benefits for 2-D network rendering over traditional, serial, JavaScript graphics-rendering functions because of its parallel and heterogeneous nature. However, I expected WebGL to provide at least 100% greater speed-ups than serial rendering at node values greater than 1000 because it can use the processing power of the entire GPU and all of the cores of the CPU. One possible reason for the low performance is that SigmaJS's implementation of WebGL had too many data transfers between components (CPU cores and GPU), hindering the performance of the rendering.
Discussion - Mobile
The results of the benchmark on the mobile phone were inconclusive. Because of the limited amount of RAM, only a maximum of 256 nodes could be graphed, and not enough data could be gathered. Neither the speed-up of WebGL nor the speed-up of the serial renderer became constant like in the desktop benchmark. Additionally, a maximum speed-up could not be determined for either WebGL or the serial renderer because at the maximum number of nodes, 256, the speed-up (from what we can see in the data) was still increasing [Fig. 5]. Thus, it was impossible to form an accurate conclusion regarding WebGL on mobile devices. But the lack of data stems from a flaw in the methodology: the browser had too little RAM allocated to it (the phone had available RAM, but the browser limited the amount of RAM a single web page could consume).
In order to accurately assess my hypothesis on mobile devices, I need to gather more data points with a greater maximum node value (>1000). And to gather more data points, the benchmark must have access to more RAM. This could be achieved by using a different browser with a higher per-page memory limit. Another future comparison between WebGL and serial JavaScript would involve measuring the performance (the frames per second) of an animated network (one the user can move around or zoom in and out), a type of network SigmaJS provides support for. It would be especially interesting considering WebGL was designed for animation.
References:
[1] A Comparative Study of Native and Web Technologies. (2014). McGill University.
[2] Barney. (2016). Introduction to Parallel Computing.
[3] Congote, Segura, Kabongo, & Ruiz. (2011). Interactive visualization of volumetric
data with WebGL in real-time.
[4] Introduction to Parallel Computing (2nd ed.). (2003). Addison-Wesley.
[5] Jääskeläinen, de La Lama, Schnetter, Raiskila, Takala, & Berg. (2015). pocl: A
Performance-Portable OpenCL Implementation. International Journal of Parallel
Programming.
[6] [The Khronos Group]. (2016). Retrieved October 12, 2016, from
https://www.khronos.org/
[7] Kohek, Š., & Strnad, D. (2015). Interactive synthesis of self-organizing tree models
on the GPU. Computing, 97(2), 145-169. doi:10.1007/s00607-014-0424-7
[8] OpenGL. (2016). Retrieved October 22, 2016, from https://www.opengl.org/
[9] Rossant, & Harris. (2013). Hardware-accelerated interactive data visualization for
neuroscience in Python.
[10] SigmaJS. (2016). Retrieved October 22, 2016, from http://sigmajs.org/