[1] https://arxiv.org/pdf/1501.00092.pdf
[2] https://arxiv.org/pdf/1506.07552v2.pdf
[3] http://www.image-net.org/
[4] http://ivpl.ece.northwestern.edu/sites/default/files/07444187.pdf
[5] https://people.csail.mit.edu/celiu/pdfs/VideoSR.pdf
Overview:
We use example-based super-resolution (SR) techniques to enhance the
resolution of a video. Specifically, we focus on parallelizing the computation
so that the processing time for each frame is ~1/60 of a second (i.e., 60
frames per second). Current state-of-the-art video SR methods have seen
varying degrees of success, with processing times ranging from 0.24 seconds
per frame [4] to over a minute per frame [5]. Reducing this time below the
frame rate of a video is crucial for applications that involve live-streaming
video.
CNN: In this paper, we implement video SR via convolutional neural
networks (CNNs), a machine learning technique that has proven to be quite
effective in image enhancement [1]. We treat the CNN as a black box in that
the network parameters are already known. The processing time will be a
function of the complexity of the network (i.e., more parameters mean longer
processing). The CNN itself operates by mapping functions of input data to
desired outputs. In our case, the CNN seeks to determine F such that:
F(X) = Y
where X is the low-resolution version of the image and Y is the
high-resolution image.
General CNN Architecture: Let the input low-resolution (LR) image have size
M × N with 3 channels. The first layer of the CNN yields a result:
F1(X) = F(W1 X + B1)
where W1 and B1 are the first layer's weights and biases, and F is
some activation function (for example, sigmoid, max, softmax, etc.). The
second layer of the CNN follows the same form:
F2(X) = F(W2 F1 + B2)
More layers can be added to the network. The output of the n-th layer is
given by:
Fn(X) = F(Wn Fn-1 + Bn)
where n > 1.
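The layer recursion above can be sketched in a few lines of numpy/scipy. This is a minimal single-channel illustration with made-up kernel sizes and a ReLU activation standing in for F; a real SR network such as [1] uses 3-channel inputs and many filters per layer:

```python
import numpy as np
from scipy.signal import convolve2d

def relu(x):
    return np.maximum(0.0, x)

def cnn_forward(X, weights, biases, act=relu):
    """Apply Fn(X) = F(Wn Fn-1 + Bn) layer by layer.

    X       : 2-D array (one channel of the LR frame)
    weights : list of 2-D convolution kernels W1..Wn (illustrative)
    biases  : list of scalar biases B1..Bn
    """
    out = X
    for W, b in zip(weights, biases):
        # 'valid' convolution: each k x k kernel shrinks the output by
        # k - 1 per dimension, which is why frames must be zero-padded.
        out = act(convolve2d(out, W, mode='valid') + b)
    return out
```

With two 3×3 kernels, a 20×20 input shrinks to 16×16, matching the output-smaller-than-input behavior discussed below.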
[Figure: each frame of the LR video is fed through the CNN; the enhanced frames are reassembled into the HR video.]
Accuracy: we measure accuracy with the mean squared error (MSE) between
the network output f(X) and the ground-truth HR frame Y, averaged over all
M × N pixels:
MSE = (1 / MN) Σ (f(X) - Y)²
where f(X) is the enhanced output of the network.
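The accuracy metric can be computed directly with numpy (a minimal sketch; the function name is ours):

```python
import numpy as np

def mse(enhanced, ground_truth):
    """Mean squared error over an M x N frame: (1/MN) * sum((f(X) - Y)^2)."""
    diff = enhanced.astype(np.float64) - ground_truth.astype(np.float64)
    return np.mean(diff ** 2)
```

A perfect reconstruction gives an MSE of 0; larger values indicate a worse enhancement.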
[Figure: LR test videos are generated from the source video by scaling down, followed by bicubic interpolation, Gaussian blur, and random noise.]
Zero-padding the inputs: Because the CNN we are using [1] produces an
output image smaller than the input, we must zero-pad each frame before
enhancing its resolution. The size of the zero-padded border depends on
the parameters of the neural network. For method (2) we must zero-pad
each sub-frame. For method (3) we can either zero-pad each sub-frame or
zero-pad only along the outer edge of the entire frame. Zero-padding adds
extra computation time that must be accounted for.
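A minimal sketch of the padding step, assuming 'valid' convolutions where each k × k kernel shrinks the output by k − 1 in each dimension (the kernel sizes here are illustrative, not those of the actual network):

```python
import numpy as np

def zero_pad_frame(frame, kernel_sizes):
    """Zero-pad a frame so that a stack of 'valid' convolutions with the
    given kernel sizes returns an output the same size as the original.

    Each k x k 'valid' convolution removes (k - 1) rows and columns, so
    the required border is half the total shrinkage on each side.
    """
    shrink = sum(k - 1 for k in kernel_sizes)  # total rows/cols lost
    pad = shrink // 2                          # border width per side
    return np.pad(frame, pad_width=pad, mode='constant', constant_values=0)
```

For example, two 3×3 layers shrink a frame by 4 pixels in each dimension, so a 2-pixel zero border on every side restores the original output size.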
Plan:
1. Set up Amazon EC2 node clusters. Convert a video to LR.
2. Use Spark to enhance the video using the 3 methods listed above.
Record accuracy and processing time for each.
3. Repeat for other videos. Compare results to previous work.
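Step 2's frame-parallel enhancement can be prototyped locally before moving to Spark on EC2. The sketch below is a stand-in, not the actual pipeline: enhance() is a hypothetical placeholder for the per-frame CNN pass, and the thread pool plays the role that a Spark map over an RDD of frames would play on the cluster:

```python
from concurrent.futures import ThreadPoolExecutor

def enhance(frame):
    """Placeholder per-frame SR step; in the real pipeline this would run
    the CNN forward pass on one zero-padded frame."""
    return [[2 * px for px in row] for row in frame]  # stand-in transform

def enhance_video(frames, workers=4):
    """Map enhance() over all frames in parallel, preserving frame order,
    mirroring Spark's frames.map(enhance) on an EC2 cluster."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(enhance, frames))
```

Because pool.map preserves input order, the enhanced frames come back in their original sequence, which is required when reassembling the HR video.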