
Bryce Plunkett

Period 6

1/3/17

The Need for Speed: WebGL and Network Rendering

Introduction:

Every year, computer hardware becomes more powerful. Central processing units (CPUs) and discrete graphics processing units (GPUs) contain more, and faster, cores. Random access memory (RAM) is also becoming faster and more abundant. But all of this extra computing power goes to waste if programs and scripts do not take advantage of it. Many programs and scripts run homogeneously, meaning they use only one processing core (typically in the CPU), and sequentially, meaning they execute one line at a time. By doing this they utilize only a tiny fraction of a computer's processing power; even low-end computers have multi-core CPUs. As a result, both the user's and the programmer's experiences are degraded. The user suffers longer load times and lower frames per second, while the programmer is forced to limit their code in order to maintain satisfactory performance for their users [2]. So, to leverage the entire computing power of a system, a programmer would need to parallelize their code, meaning it would execute multiple lines of code (or tasks) concurrently, and make it heterogeneous, meaning it would use multiple cores in the CPU (and/or GPU). Theoretically, a parallelized, heterogeneous program would be far more efficient and faster than its sequential counterpart. Thus, both the user's and the programmer's experiences should improve [4].
Naturally, the question arises: why aren't all programs and scripts parallel and heterogeneous? The three primary reasons are difficulty, portability, and complexity. Parallelizing code challenges programmers because it is difficult to break a task down into discrete operations that can run concurrently. Often one independent task requires information from another to continue, so one needs to be able to make the tasks wait for one another, which is especially difficult when tasks move at different speeds. If this waiting for dependencies does not occur, the program or script will not work. For example, if two people are building a two-story building, one making the floors and ceilings and the other making the walls, the person making the walls must wait for the other person to finish the floor before starting [4]. Portability, the usability of a piece of software in different environments, is the major obstacle to making code heterogeneous, especially when the code is supposed to run on both the CPU and GPU. The way different cores and different pieces of hardware interact and communicate varies from system to system. For example, the way one graphics card allocates memory can be extremely different from another. This can cause a program or script to fail on some machines or to run with wildly different performance from machine to machine. To resolve this, a programmer would be forced to change their code on a machine-by-machine basis [5]. Similar to the problem of difficulty, complexity is another major issue: parallelized programs sometimes have significantly more steps than their sequential counterparts, leading some to argue that an optimized sequential program is more efficient than a parallelized form of the same program [4].

The application programming interfaces (APIs) OpenCL, WebCL, OpenGL, and WebGL seek to resolve the first two issues with heterogeneous and parallel coding. They provide a series of functions that make parallelizing slightly easier, but most importantly, they enhance code portability: code written against one of these APIs should run on all machines, and the environment it runs in (aside from the power of the hardware) should not significantly affect efficiency. OpenCL and WebCL are typically used for computational code that does not involve graphics, while OpenGL and WebGL are used exclusively for rendering graphics, such as those seen in video games and graphing applications. The Web varieties, designed to run in internet browsers, are bindings to the Open APIs (which are designed to run locally), with additional security features to stop malicious code. In short, they are essentially the same, but the Web varieties run less efficiently because of their built-in security. However, the most obvious benefit of the Web varieties is that they can run on any device without downloading an application [6 & 8]. So will code that implements OpenCL, WebCL, OpenGL, or WebGL be faster than an optimized, serial counterpart because of its heterogeneous, parallel nature? This research paper focuses on answering that question for WebGL in regards to rendering network graphs.

Other research papers and studies have answered similar questions for OpenCL, WebCL, and OpenGL. For example, one study concluded that OpenCL accelerated the calculations required to render virtual trees [7]. Another study with a broader scope, conducted at McGill University, tested WebCL against JavaScript and OpenCL against C using a variety of resource-intensive algorithms: a machine learning algorithm for neural networks, an edge length calculation algorithm for network graphs, a particle physics algorithm, and Google's algorithm for determining the popularity of websites, among others. In almost every algorithm (with the exception of the previously mentioned neural network one and one that creates Markov models), WebCL performed better than JavaScript and OpenCL performed better than C, where "better" means a speed-up of the API over the traditional serial language greater than 1. And as expected, OpenCL's speed-up over C was greater than WebCL's speed-up over JavaScript for nearly every benchmark. This is not unusual considering WebCL, unlike OpenCL, has built-in security that hinders performance [1]. Along with many other studies, McGill University's study confirmed the performance boosts offered by both APIs.

And while OpenCL, first released in 2009, and WebCL, first released in 2013, are still in their infancy, OpenGL, first released in 1992, has been researched thoroughly, implemented, and improved for the past twenty-five years. It has even become a gaming industry standard, used in everything from Doom (2016) to Angry Birds. One medical research group even implemented it to graph large neurological datasets and compared it to traditional network rendering methods, such as matplotlib [9]. Thus, it is already widely known that OpenGL offers significant performance boosts for rendering graphics.

Conversely, the Khronos Group introduced WebGL to the industry far more recently, in 2011. WebGL has more performance-limiting security features to prevent malicious online attacks than OpenGL [6]. Despite the possible performance boosts WebGL offers and the API's potential to be used on all devices, few developers have implemented it and few researchers have studied it. Numerous medical research papers discuss implementing WebGL for visualizing medical scans, but few actually measure the API's effect on performance. For example, one research study employed WebGL to render large medical scans but measured portability instead of performance acceleration [3]. The field of Big Data modeling, more than most others, could reap the potential benefits of implementing WebGL. Networks, a model frequently used for Big Data, could be rendered faster on any device and accessed anywhere in the world (so long as one has an internet connection). Additionally, large networks, which characterize Big Data models, tend to be plagued by long render times that frustrate researchers and impede progress. This paper's hypothesis is that WebGL will offer performance benefits for large, 2-D network rendering over traditional, serial, JavaScript graphics-rendering functions because of its parallel and heterogeneous nature.

Methodology:

The benchmark used to measure the performance boosts offered by WebGL revolved around the JavaScript framework SigmaJS. It allowed developers to generate interactive network graphs, meaning network graphs the user could move around in, while handling most of the graphics-generating back end. Written entirely in JavaScript, the library had three primary objects: Node, Edge, and Sigma. The Node object had three critical parameters (the others relate to aesthetics): a node id, an x location, and a y location. The Edge object had two critical parameters (the others relate to aesthetics): a source node id and a target node id. The final object, Sigma, also had two critical parameters: a canvas element and a graph object containing an array of Nodes and an array of Edges. Once the Sigma object was instantiated, the network graph was rendered (Pict. 1 shows the basic code used to instantiate Sigma). Sigma by default had two renderers: one that used traditional JavaScript graphics-rendering functions and another that used WebGL [10]. Thus, the framework provided an ideal environment to compare WebGL to traditional JavaScript rendering techniques.
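For reference, the following is a minimal sketch of the kind of instantiation code Pict. 1 depicted. The exact snippet from the figure is not preserved, so the object layout and options below follow the publicly documented SigmaJS v1.x API and are assumptions; the sketch presumes sigma.min.js is loaded and the page contains a <div id="graph-container"> element.

// Minimal SigmaJS (v1.x) instantiation sketch; assumes sigma.min.js is loaded
// and the page contains <div id="graph-container"></div>.
var graph = {
  nodes: [
    { id: 'n0', label: 'Node 0', x: 0, y: 0, size: 1 },  // critical parameters: id, x, y
    { id: 'n1', label: 'Node 1', x: 1, y: 1, size: 1 }
  ],
  edges: [
    { id: 'e0', source: 'n0', target: 'n1' }              // critical parameters: source, target
  ]
};

// The renderer type selects between the two built-in renderers:
// 'canvas' uses the traditional JavaScript drawing functions, 'webgl' uses WebGL.
var s = new sigma({
  graph: graph,
  renderer: {
    container: document.getElementById('graph-container'),
    type: 'webgl'  // or 'canvas' for the serial JavaScript renderer
  }
});

Switching the renderer type between 'webgl' and 'canvas' is the only change needed to swap between the two built-in renderers.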

The benchmark that implemented SigmaJS recorded the amount of time it took to render 2^x nodes and 2^(x+1) edges, with the nodes placed randomly and the edges given random sources and targets. The value of x started at zero and was incremented to a maximum of sixteen because of memory constraints. In order to obtain accurate results, the benchmark measured and averaged twenty renders for each value of x, since background processes vary constantly and can irregularly affect results. For instance, with an x value of five, a graph with 2^5 nodes and 2^6 edges was rendered twenty times, a time was recorded for each render, and the average of these times was calculated. The graph-creation code was relatively simple, with three primary components: a for-loop where the node objects were instantiated, a for-loop where the edge objects were instantiated, and an instantiation of a Sigma object (Pict. 2 shows one of the networks created by the benchmark). The node for-loop created a node object, assigned it an id, gave it a random location, and stored it in a node array; it did this 2^x times. The edge for-loop created an edge object, assigned it a random source node and a random target node (from the node array), and stored it in an edge array; it did this 2^(x+1) times. Once the nodes and edges were instantiated, the Sigma object was created with the node array, the edge array, and an HTML canvas element, where the graph was actually displayed. The benchmark then killed the instance of the graph and either created another graph with the same value of x in order to find an average or created a graph with the value of x incremented by one.
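The sketch below reconstructs this graph-creation step for a single value of x. It is not the benchmark's actual source code: the function and variable names, the use of performance.now() for timing, and the unit-square random placement are illustrative assumptions.

// Sketch of the benchmark's graph-creation step for one value of x (names assumed).
function buildAndRenderGraph(x, rendererType) {
  var nodeCount = Math.pow(2, x);      // 2^x nodes
  var edgeCount = Math.pow(2, x + 1);  // 2^(x+1) edges
  var nodes = [];
  var edges = [];

  // Node for-loop: assign an id and a random location, store in the node array.
  for (var i = 0; i < nodeCount; i++) {
    nodes.push({ id: 'n' + i, x: Math.random(), y: Math.random(), size: 1 });
  }

  // Edge for-loop: assign a random source and target node, store in the edge array.
  for (var j = 0; j < edgeCount; j++) {
    edges.push({
      id: 'e' + j,
      source: 'n' + Math.floor(Math.random() * nodeCount),
      target: 'n' + Math.floor(Math.random() * nodeCount)
    });
  }

  // Time the render: instantiate Sigma with the node array, edge array, and container.
  var start = performance.now();
  var s = new sigma({
    graph: { nodes: nodes, edges: edges },
    renderer: {
      container: document.getElementById('graph-container'),
      type: rendererType  // 'webgl' for the WebGL run, 'canvas' for the serial run
    }
  });
  var elapsed = performance.now() - start;

  s.kill();        // kill the graph instance before the next repetition
  return elapsed;  // render time in milliseconds
}

Calling such a function twenty times per value of x, once per renderer type, and averaging the returned times mirrors the measurement procedure described above.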

To accurately reflect a real-world environment where WebGL would be used, the benchmark was uploaded to a local (on the same network, not public) Apache web server. The web server had an Intel Celeron 1007U CPU @ 1.50 GHz and 4 GB of RAM. The web server did not need high processing power because the network rendering occurred client-side; its purpose was to serve the benchmark to the machines where WebGL would be benchmarked. Two machines ran the benchmark: a desktop computer with a discrete graphics card and a mobile phone. The desktop had an Intel i7-3770K quad-core (eight-thread) CPU @ 3.9 GHz, 16 GB of RAM, and an Nvidia GTX 780 Ti graphics card with a core clock of 0.980 GHz. The mobile phone had a Snapdragon 801 quad-core CPU @ 2.3 GHz, 2 GB of RAM, and an Adreno 330 graphics chip. The two machines represented the two major environments WebGL would be used in. The value of x on the mobile device (# of nodes: 2^x; # of edges: 2^(x+1)) was restricted because of the limited amount of RAM on the device, so the mobile benchmark reached a maximum of 256 nodes. This should not affect results, because the portability of WebGL among devices is not being compared; the focus of the paper, the performance of WebGL compared to traditional rendering techniques, does not require comparing WebGL's performance across different machines.

On both machines, the benchmark was run twice: the first time with the serial JavaScript graphics-rendering functions as the network renderer, and the second time with WebGL as the network renderer. The times output by the benchmark (each run produced one average time per value of x) were used to calculate speed-up: the average time for x-1 (the old time) divided by the average time for x (the new time).
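For concreteness, a small sketch of that speed-up calculation, assuming the averaged render times are collected in an array indexed by x (the names are illustrative):

// averageTimes[x] = average render time (ms) over twenty runs for 2^x nodes.
// Speed-up at x = (average time for x-1) / (average time for x); values below 1
// mean the renderer slowed down as the graph doubled in size.
function computeSpeedUps(averageTimes) {
  var speedUps = [1];  // the first data point has no predecessor, so its speed-up is 1
  for (var x = 1; x < averageTimes.length; x++) {
    speedUps.push(averageTimes[x - 1] / averageTimes[x]);
  }
  return speedUps;
}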


Results:

Fig. 1: Speed-ups on the Desktop with Discrete Graphics

Number of Nodes   Speed-up with WebGL   Speed-up with Serial Renderer
2                 1                     1
4                 0.883146              1.263784
8                 1.177249              0.98155
16                0.881119              0.808955
32                0.806088              0.659016
64                0.769297              0.605279
128               0.543014              0.660073
256               1.609602              0.570553
512               0.868825              0.530321
1024              0.756896              0.496751
2048              0.698833              0.471588
4096              0.739534              0.445963
8192              0.505623              0.485027
16384             0.480492              0.457344
32768             0.495042              0.474276
65536             0.496449              0.466158

Fig. 2: Speed-ups on the Mobile Phone

Number of Nodes   Speed-up with WebGL   Speed-up with Serial Renderer
2                 1                     1
4                 0.833572              0.77661
8                 1.34681               1.492294
16                0.627351              0.473187
32                0.962949              0.952206
64                1.070199              1.206446
128               0.62564               0.466997
256               1.141051              0.979045

Fig. 3: Speed-up of Desktop with Discrete Graphics (speed-up vs. number of nodes, 0 to 80,000; series: WebGL, Serial Renderer).

Fig. 4: Speed-up of Desktop with Discrete Graphics, domain restricted to 1000 nodes (speed-up vs. number of nodes, 0 to 1,000; series: WebGL, Serial Renderer).

Fig. 5: Speed-up of Mobile Phone (speed-up vs. number of nodes, 0 to 300; series: WebGL, Serial Renderer).

Fig. 6: Speed-ups Compared of Desktop with Discrete Graphics (speed-up of WebGL divided by speed-up of Serial Renderer vs. number of nodes, 0 to 80,000).

Fig. 7: Speed-ups Compared of Desktop with Discrete Graphics, domain restricted to 1000 nodes (speed-up of WebGL divided by speed-up of Serial Renderer vs. number of nodes, 0 to 1,000).

Fig. 8: Speed-ups Compared of Mobile Phone (speed-up of WebGL divided by speed-up of Serial Renderer vs. number of nodes, 0 to 300).

Results - Desktop with Discrete Graphics

When looking at Figures 4 and 7, one notices that at first, for small node counts (fewer than 32 nodes), the speed-ups of the serial renderer and WebGL were roughly equal and both had steeply negative slopes. Then, around 64 nodes, the slope of the serial renderer's speed-up became approximately zero (slightly negative), while the slope of WebGL's speed-up remained negative. The slope of WebGL's speed-up turned positive at 128 nodes. WebGL's maximum speed-up of 1.6 occurred at 256 nodes and was 180% greater than the speed-up of the serial renderer. From 256 nodes to 512 nodes, the WebGL renderer's speed-up plummeted from 1.6 to 0.86 (a speed-up below 1 means the process is slowing down), as shown in Figure 1. This WebGL speed-up of 0.8 (±0.1) remained relatively constant until 4096 nodes, when its slope once again became steeply negative, dropping from 0.74 to 0.50. From that point to the maximum number of nodes benchmarked, 65536, the speed-up stayed constant at 0.495 ± 0.005 [Fig. 4]. The serial renderer's speed-up, on the other hand, declined at a near-constant rate (its second derivative stayed close to zero) from 128 nodes to 2048 nodes, decreasing from 0.66 to 0.47, as shown in Figures 1 and 3. After this, the speed-up of the serial renderer became relatively erratic, jumping from 0.47 to 0.45 to 0.49 to 0.46 and finally to 0.47 at 32768 nodes; it then remained constant at about 0.47 from 32768 nodes to 65536 nodes [Fig. 1 & 3]. Overall, as shown in Figures 6 and 7, WebGL maintained a significantly higher speed-up than the serial renderer starting at 256 nodes and ending at 2048 nodes. After a significant drop in speed-up at 2048 nodes, WebGL's speed-up was about 6% greater than the serial renderer's throughout the rest of the benchmark [Fig. 7].

Results - Mobile Phone

The speed-ups of WebGL and the serial renderer on the mobile phone were nearly equivalent until 128 nodes [Fig. 2, 5, & 8]. At about 128 nodes, the speed-up of WebGL surpassed that of the serial renderer. At the maximum node value of the benchmark, 256 nodes, the speed-up of WebGL was 16% greater than the speed-up of the serial renderer [Fig. 8].
Discussion:

Discussion - Desktop

The near-equivalent speed-ups of the two renderers on the desktop machine for node values below 128 were expected. WebGL's transfer of data to the graphics card and other cores acted as a hindrance at these low node values: the delay introduced by the transfer could not be compensated for by the processing power of the graphics card and the other cores because there was not enough work to saturate those components; their full processing power was not being leveraged. The performance jump of WebGL from 128 nodes to 256 nodes occurred because the graphics card and processor cores were becoming fully saturated with work (their full processing power was starting to be used), which compensated for the data-transfer delay and yielded speed-ups. The decrease in WebGL's speed-up from 4096 nodes to 8192 nodes was most likely caused by a change in the way the SigmaJS rendering algorithm or WebGL stored and mapped variables. Alternatively, the amount of RAM allocated to the browser might have been filled, requiring garbage collection of unused data. The relatively constant speed-up of 0.495 ± 0.05 after 8192 nodes suggested that all of the available processing power of the GPU and CPU was being used; further, it suggested that the behavior of SigmaJS's and WebGL's algorithms regarding memory allocation and garbage collection remained constant. Similarly, the nearly constant speed-up (±0.1) of the serial renderer after 128 nodes indicated that the processing power available to it (one CPU core) was saturated and that its algorithms regarding memory allocation and garbage collection were not changing. Once the serial and WebGL renderers' speed-ups became constant, WebGL's speed-up was only about 6% greater than the serial renderer's; I was expecting the magnitude of WebGL's speed-up to be at least 100% greater when there were more than 1000 nodes. The results concur with the original hypothesis that WebGL would offer performance benefits for 2-D network rendering over traditional, serial, JavaScript graphics-rendering functions because of its parallel and heterogeneous nature. However, I expected WebGL to provide at least a 100% greater speed-up than serial rendering for node counts above 1000, because it can use the processing power of the entire GPU and all of the cores of the CPU. One possible reason for the lower-than-expected performance is that SigmaJS's implementation of WebGL requires too many data transfers between components (CPU cores and the GPU), hindering the performance of the rendering.

Discussion - Mobile Phone

The results of the benchmark on the mobile phone were inconclusive. Because of the limited amount of RAM, only a maximum of 256 nodes could be graphed, and not enough data could be gathered. Neither the speed-up of WebGL nor the speed-up of the serial renderer became constant as in the desktop benchmark. Additionally, a maximum speed-up could not be determined for either WebGL or the serial renderer because, at the maximum number of nodes, 256, the speed-up (from what can be seen in the data) was still increasing [Fig. 5]. Thus, it was impossible to form an accurate conclusion regarding WebGL on mobile devices. The lack of data stems from a flaw in the methodology: the browser had too little RAM allocated to it (the phone had available RAM, but the browser limited the amount of RAM a single web page could consume).

Discussion - Future Experiments

In order to accurately assess my hypothesis on mobile devices, I need to gather more data points with a greater maximum node value (>1000). To gather more data points, the benchmark must have access to more RAM, which can be achieved by using a different browser that does not limit RAM usage.

Another possible experiment to assess WebGL's performance compared to traditional JavaScript would involve measuring the performance (the frames per second) of an animated network, one the user can move around in or zoom in and out of, a type of network SigmaJS supports. This would be especially interesting considering WebGL was designed for animation.
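As a rough illustration of how frames per second could be measured for such an animated network, the sketch below counts frames with requestAnimationFrame; the one-second sampling window and the logging approach are assumptions, not an established benchmark design.

// Count how many frames the browser draws per second while the network animates.
var frameCount = 0;
var lastSample = performance.now();

function sampleFps(now) {
  frameCount++;
  if (now - lastSample >= 1000) {   // report once per second
    console.log('FPS: ' + frameCount);
    frameCount = 0;
    lastSample = now;
  }
  requestAnimationFrame(sampleFps);
}
requestAnimationFrame(sampleFps);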

References:

[1] A Comparative Study of Native and Web Technologies. (2014). McGill University.
[2] Barney. (2016). Introduction to Parallel Computing.
[3] Congote, Segura, Kabongo, & Ruiz. (2011). Interactive visualization of volumetric data with WebGL in real-time.
[4] Introduction to Parallel Computing (2nd ed.). (2003). Addison-Wesley.
[5] Jääskeläinen, de La Lama, Schnetter, Raiskila, Takala, & Berg. (2015). pocl: A Performance-Portable OpenCL Implementation. International Journal of Parallel Programming.
[6] The Khronos Group. (2016). Retrieved October 12, 2016, from https://www.khronos.org/
[7] Kohek, Š., & Strnad, D. (2015). Interactive synthesis of self-organizing tree models on the GPU. Computing, 97(2), 145-169. doi:10.1007/s00607-014-0424-7
[8] OpenGL. (2016). Retrieved October 22, 2016, from https://www.opengl.org/
[9] Rossant, & Harris. (2013). Hardware-accelerated interactive data visualization for neuroscience in Python.
[10] SigmaJS. (2016). Retrieved October 22, 2016, from http://sigmajs.org/
