
Related Works:

[1] https://arxiv.org/pdf/1501.00092.pdf
[2] https://arxiv.org/pdf/1506.07552v2.pdf
[3] http://www.image-net.org/
[4] http://ivpl.ece.northwestern.edu/sites/default/files/07444187.pdf
[5] https://people.csail.mit.edu/celiu/pdfs/VideoSR.pdf

Overview:
We use example-based super-resolution (SR) techniques to enhance the
resolution of a video. Specifically, we focus on parallelizing the
computation so that the processing time for each frame is at most
~1/60 of a second (i.e., 60 frames per second). Current state-of-the-art
video SR methods have seen varying degrees of success, with processing
times ranging from 0.24 seconds per frame [4] to over a minute [5].
Reducing this time below the frame interval of a video is crucial for
applications that involve live-streaming video.
CNN: In this paper, we implement video SR via convolutional neural
networks (CNNs), a machine learning approach that has proven quite
effective for image enhancement [1]. We treat the CNN as a black box in
that the network parameters are already known. The processing time will be
a function of the complexity of the network (i.e., more parameters mean
longer processing). The CNN operates by learning a mapping from input data
to desired outputs. In our case, the CNN seeks to determine F such that:

F(X) = Y

where X is the low-resolution version of the image and Y is the
high-resolution image.
General CNN Architecture: Let the input low-resolution (LR) image have size
M × N with 3 channels. The first layer of the CNN yields:

F_1(X) = F(W_1 * X + B_1)

where W_1 and B_1 are the weights and biases and * denotes the convolution
operation. W_1 has dimensions f_1 × f_1. The function F is some activation
function (for example, sigmoid, max, softmax, etc.). The second layer of
the CNN follows the same form:

F_2(X) = F(W_2 * F_1 + B_2)
More layers can be added to the network. The output of the n-th layer is
given by:

F_n(X) = F(W_n * F_{n-1} + B_n), where n > 1

The final layer of the network is:

F(X) = W_k * F_{k-1} + B_k
where k is the number of layers in the CNN. Thus, the final layer produces
an image that approximates the ground-truth version of the image. The
parameters we must find are:

{W_1, ..., W_k, B_1, ..., B_k}
These are found by minimizing the function:

E = (1/n) Σ_{i=1..M} |F(X_i) - Y_i|^2 + (λ_1/2) Σ_{i=1..k} W_i^2 + (λ_2/2) Σ_{i=1..k} B_i^2

where the first sum runs over the M training pairs and the remaining two
terms regularize the weights and biases.
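
To make the layer equations concrete, the following is a minimal
NumPy/SciPy sketch of the forward pass and the objective E. The ReLU
activation and the regularization weights lam1 and lam2 are illustrative
assumptions, not values taken from this work:

    import numpy as np
    from scipy.signal import convolve2d

    def activation(z):
        # One possible choice for F; sigmoid, softmax, etc. are alternatives.
        return np.maximum(z, 0.0)

    def forward(X, weights, biases):
        # Compute F(X) for a single-channel frame X. weights[i] is the
        # f_i x f_i filter W_{i+1}; biases[i] is the scalar B_{i+1}.
        F = X
        for W, B in zip(weights[:-1], biases[:-1]):
            F = activation(convolve2d(F, W, mode='valid') + B)
        # The final layer is linear: F(X) = W_k * F_{k-1} + B_k.
        return convolve2d(F, weights[-1], mode='valid') + biases[-1]

    def objective(X_train, Y_train, weights, biases, lam1=1e-4, lam2=1e-4):
        # The regularized loss E over the training pairs (lam1/lam2 assumed).
        err = 0.0
        for X, Y in zip(X_train, Y_train):
            out = forward(X, weights, biases)
            # 'valid' convolutions shrink the output, so crop Y to match.
            dy = (Y.shape[0] - out.shape[0]) // 2
            dx = (Y.shape[1] - out.shape[1]) // 2
            err += np.sum((out - Y[dy:dy + out.shape[0], dx:dx + out.shape[1]]) ** 2)
        reg = (lam1 / 2) * sum(np.sum(W ** 2) for W in weights) \
            + (lam2 / 2) * sum(B ** 2 for B in biases)
        return err / len(X_train) + reg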
Processing a video: Using the weights and biases determined above, we can
process a low-resolution image or video file. We aim for both accuracy and
efficiency and thus process the video in several ways:
(1) Processing the whole video in parallel. Individual frames are processed
separately on separate nodes.
[Figure: the LR video is split into frames; each frame is passed through
the CNN on its own node, and the outputs are combined into the HR video.]
Each frame is processed on a separate node. We use an Amazon EC2 cluster
with Spark to implement the process. Processing time per frame is
calculated as:

Processing Time per Frame = Total Processing Time / Number of Frames
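
As a sketch of how method (1) could be driven and timed with Spark, the
snippet below parallelizes one frame per partition; the stub
enhance_frame and the stand-in frame list are hypothetical placeholders
for the real CNN and decoder:

    import time
    import numpy as np
    from pyspark import SparkContext

    def enhance_frame(frame):
        # Placeholder for the CNN forward pass (see the sketch above);
        # in practice this would apply the pre-trained weights and biases.
        return frame

    frames = [np.zeros((480, 640)) for _ in range(60)]  # stand-in LR frames

    sc = SparkContext(appName="video-sr")
    start = time.time()
    hr_frames = (sc.parallelize(frames, numSlices=len(frames))  # one frame per partition
                   .map(enhance_frame)                          # runs on the worker nodes
                   .collect())                                  # gather HR frames in order
    total_time = time.time() - start
    print("Processing time per frame:", total_time / len(frames))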
The accuracy is determined by the PSNR metric:

MSE = (1/(MN)) |f(X) - Y|^2
PSNR = 10 log_10(255^2 / MSE)

where Y is the ground-truth version of each frame and f(X) is the output
frame from the above process.
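
The metric itself is a few lines of NumPy; the peak value of 255 assumes
8-bit pixels, as in the formula above:

    import numpy as np

    def psnr(output, ground_truth):
        # PSNR between the CNN output f(X) and the ground-truth frame Y,
        # assuming 8-bit pixel values (peak = 255).
        diff = output.astype(np.float64) - ground_truth.astype(np.float64)
        mse = np.mean(diff ** 2)  # (1/(MN)) * sum of squared errors
        return 10.0 * np.log10(255.0 ** 2 / mse)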


(2) Processing one frame at a time. Each frame is split into smaller
sub-frames, and each sub-frame is sent to a separate node for processing.
The processed sub-frames are concatenated together to produce the final
result.

(3) We experiment with overlapping sub-frames from a single frame. This
method is the same as (2), except that the sub-frames overlap to
compensate for the size reduction an image undergoes when passing through
the CNN. A sketch of this tiling follows below.
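
The sketch below shows one way to implement the tiling in methods (2) and
(3); the tile size and overlap are assumed parameters, with the overlap
for method (3) set to the border the CNN consumes:

    import numpy as np

    def split_frame(frame, tile=128, overlap=0):
        # Split a frame into (row, col, sub-frame) triples. overlap=0
        # corresponds to method (2); setting overlap to the border the
        # CNN consumes corresponds to method (3).
        step = tile - overlap
        return [(r, c, frame[r:r + tile, c:c + tile])
                for r in range(0, frame.shape[0], step)
                for c in range(0, frame.shape[1], step)]

    def reassemble(tiles, shape):
        # Concatenate processed sub-frames back into the full frame
        # (method (2); for method (3) each placement would additionally
        # be offset by the border the CNN removes from each tile).
        out = np.zeros(shape)
        for r, c, t in tiles:
            out[r:r + t.shape[0], c:c + t.shape[1]] = t
        return out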
Obtaining LR Video: Let the ground-truth frame have dimensions M × N with
3 channels. The LR video is obtained by first scaling the video down by a
factor of 1/n, where n > 1. Then, bicubic interpolation is performed to
scale the video back up to dimensions M × N. Next, Gaussian blur is
applied to the video. Finally, artificial noise is introduced. The process
can be summarized by the equation:
Y = W * X + ε

where ε is the random noise and W is the filter corresponding to Gaussian
blur multiplied by the filter for bicubic interpolation. X denotes the
down-sampled video and Y is the LR output.

[Figure: Video → Scale Down → Bicubic Interpolation → Gaussian Blur →
Random Noise → LR Video.]
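
As a concrete illustration, a per-frame version of this degradation
pipeline might look as follows with OpenCV; the scale factor n, blur
kernel, and noise level are assumed parameters, not tuned values:

    import cv2
    import numpy as np

    def degrade_frame(frame, n=2, ksize=(5, 5), sigma=1.0, noise_std=5.0):
        # Downscale by 1/n, bicubic-upscale back to M x N, blur, add noise.
        M, N = frame.shape[:2]  # M rows, N columns
        small = cv2.resize(frame, (N // n, M // n), interpolation=cv2.INTER_CUBIC)
        up = cv2.resize(small, (N, M), interpolation=cv2.INTER_CUBIC)
        blurred = cv2.GaussianBlur(up, ksize, sigma)
        noisy = blurred.astype(np.float64) + np.random.normal(0.0, noise_std, blurred.shape)
        return np.clip(noisy, 0, 255).astype(np.uint8)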

Zero-padding the inputs: Because the CNN we are using [1] produces an
output image smaller than the input, we must zero-pad each frame before
enhancing its resolution. The size of the zero-padded border will depend
on the parameters of the neural network. For method (2) we must zero-pad
each sub-frame. For method (3) we can either zero-pad each sub-frame or
pad only along the outer edge of the entire frame. Zero-padding adds extra
computation time that must be accounted for.
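
A small sketch of the padding computation, assuming odd f_i × f_i filters
and 'valid' convolutions (so each layer removes f_i - 1 pixels per
dimension):

    import numpy as np

    def zero_pad(frame, filter_sizes):
        # Pad a border of half the total shrinkage on every side so the
        # CNN output matches the input size (single-channel frame assumed).
        border = sum(f - 1 for f in filter_sizes) // 2
        return np.pad(frame, ((border, border), (border, border)),
                      mode='constant')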
Plan:
1. Set up Amazon EC2 node clusters. Convert a video to LR.
2. Use Spark to enhance the video using the 3 methods listed above.
Record accuracy and processing time for each.
3. Repeat for other videos. Compare results to previous work.
