Pattern Recognition
journal homepage: www.elsevier.com/locate/patcog
Article info

Article history:
Received 31 January 2017
Revised 9 February 2018
Accepted 23 March 2018
Available online 30 March 2018

MSC:
00-01
99-00

Keywords:
Auto-refocusing
Detection-based focusing
Blurriness measure
Light-field photography

Abstract

Computer vision tasks prefer images focused at the relevant objects for better performance, which calls for an Auto-ReFocusing (ARF) function when using light field cameras. However, current ARF schemes are time-consuming in practice, because they commonly need to render an image sequence to find the optimally refocused frame. This paper presents an efficient ARF solution for light-field cameras based on modeling the refocusing point spread function (R-PSF). The R-PSF holds a simple linear relationship between refocusing depth and defocus blurriness. Such a linear relationship makes it possible to determine the two candidates for the optimally refocused frame from only one initial refocused image. Because our method involves only three refocusing renderings to find the optimally refocused frame, it is much more efficient than the current "rendering and selection" solutions, which need to render a large number of refocused images.

© 2018 Elsevier Ltd. All rights reserved.
1. Introduction

Light field photography offers the impressive capability of rendering images refocused at a user-specified object after the light field has been captured [1]. This feature shows promising potential for applying light field cameras to computer vision tasks, e.g. mobile robotics, autonomous driving, biometrics, surveillance etc. In these applications, a basic requirement for using light field cameras is to automatically refocus at interested objects, e.g. marks, signs, vehicles, faces, irises etc. This is essentially similar to Auto-Focus (AF) in conventional cameras, so we name it Auto-ReFocusing (ARF).

In order to gather more light, reduce the exposure period and enhance the Signal-to-Noise Ratio (SNR), cameras for high-quality image acquisition are equipped with large-aperture main lenses. However, the depth of field (DOF) is remarkably narrowed as a side-effect of the large aperture. Such a narrowed DOF exacerbates the difficulty of accurate focusing, since slight unfocusing may lead to unacceptable defocus blur. AF actively or passively senses the depth of interested objects and adjusts the lens to focus accurately on them, which plays a vital role in capturing high-quality images. However, as its counterpart in light field cameras, ARF has not been systematically investigated, to the best of our knowledge.

Valid images for computer vision tasks should be focused at the interested objects related to their applications. For example, biometric scanners are only sensitive to the biometric modality [2,3], i.e. faces or irises; cameras for autonomous driving need to focus at vehicles, pedestrians and traffic signs. Even in the consumer imaging area, the great majority of pictures are of humans and human faces [4], which impels face-detection-based AF to be equipped as a standard feature of most consumer cameras. The significance of detection-based AF in both computer vision and consumer photography encouraged us to research the similar issue for light field cameras: detection-based ARF. Actually, detection-based ARF is equal to ARF, since focusing is meaningful only if it is oriented to focus at the valid interested objects. Thus, detection-based ARF and ARF are not distinguished in the rest of this paper.

A demonstration of ARF is shown in Fig. 1. The faces and a 2D barcode are set as the interested objects in this light field image. The purpose of ARF is to render high-quality images refocused precisely at the faces and the 2D barcode, respectively.

It is meaningless to discuss ARF without considering its efficiency, since an Exhaustive-Search ARF (ES-ARF) scheme can be easily achieved by searching the entire depth of the object space. Unfortunately, the ES-ARF approach is computationally expensive, since the complexity of digital refocusing is O(n⁴). Even though the Fourier slice refocusing algorithm can promise a complexity of O(n² log n), it cannot save time unless the angular resolution is higher than 16 [5].

∗ Corresponding author.
E-mail addresses: chi.zhang@ia.ac.cn (C. Zhang), gqhou@nlpr.ia.ac.cn (G. Hou), zhaoxiang.zhang@ia.ac.cn (Z. Zhang), znsun@nlpr.ia.ac.cn (Z. Sun), tnt@nlpr.ia.ac.cn (T. Tan).
https://doi.org/10.1016/j.patcog.2018.03.020
0031-3203/© 2018 Elsevier Ltd. All rights reserved.

C. Zhang et al. / Pattern Recognition 81 (2018) 176–189

Fig. 1. Demonstration of detection-based auto-refocusing (ARF). The ARF algorithms are requested to automatically render images well-focused at predefined objects.
In this paper, the refocusing operation is considered as an element operation, O(1); the computational complexity of the ES-ARF approach is then O(n), where n is determined by the required density of refocusing slices. The ES-ARF commonly requires too much computing capacity, which hinders its execution on resource-limited devices.

This paper presents an efficient ARF solution for light-field cameras based on modeling a refocusing point spread function (R-PSF). The R-PSF holds a simple linear relationship between the blurriness and the refocusing depth, which can significantly reduce the searching space of ARF from the entire refocusing space to just two optimal-focusing candidates via an absolute blurriness measure (ABM), as shown in Fig. 2.

The main contributions of this paper include: (1) introducing an efficient ARF framework based on accurate estimation of the R-PSF; (2) modeling the R-PSF and finding the linear relationship between refocusing depth and defocus blurriness in refocusing rendering; (3) constructing an absolute blurriness measure; (4) implementing an efficient ARF algorithm and evaluating it on four datasets; (5) applying the proposed ARF algorithm to iris recognition and quantifying its effectiveness and robustness via recognition scores.

This paper extends our previous work [6] by (1) verifying the versatility of the proposed ARF algorithm, which was used for iris imaging; (2) optimizing the ARF algorithms on the CPU+GPU platform; (3) proposing an efficient absolute blurriness measure (ABM) that decreases the executing time by more than an order of magnitude; (4) introducing two novel light-field datasets (a QR-code dataset and a face dataset) and a new refocusing performance index (Right-Refocusing Rate, RRR) to evaluate ARF algorithms; (5) updating the iris recognition scores by using the new ARF algorithm proposed in this paper.

The rest of this paper is organized as follows. Section 2 describes background and related techniques. Section 3 presents the technical details of the proposed ARF scheme and its derivation. Section 4 shows the experimental results on four light field datasets as well as the application to iris recognition. Section 5 concludes this paper.

2. Background

Light-field cameras are capable of recording the positions and directions of rays from scenes, adopting integral photography as the basic principle [7]. Light field photography allows a much freer photography style, and is expected to solve imaging issues such as depth extension, low illumination, accurate focusing, HDR imaging, multi-spectral imaging, depth awareness etc. Thus, it has gained increasing attention [8–16]. It was predicted that most consumer photographic cameras will be light-field cameras in 20 years [17].

Light-field cameras can dramatically extend the DOF [8,18], which benefits many computer vision applications. Raghavendra et al. [10] and Raja et al. [11] captured a face database and an iris database using a Lytro camera, respectively. The extended DOF of the Lytro camera improves the performance of detection and recognition of irises and faces. Zhang et al. [16] developed an iris imaging system with a specially designed light-field camera and verified its superiority for resolving the trade-off between aperture size and DOF. However, all of these [10,11,16] have to render a refocused image sequence and then select the optimal frame from it. Guo et al. [12] achieved a barcode reading system using a Lytro camera. They compute the optimal refocusing depth by measuring the variation of texture along a fixed direction in micro-lens sub-images and render the best refocused frame for barcode reading. However, Guo et al.'s scheme cannot be extended to refocus objects with complex textures, since it heavily depends on the special texture of 1D barcodes.

The schemes used in optical AF cannot be directly applied to ARF, although they are similar problems. In the literature of optics [4,19–21], AF can be achieved by either active [22] or passive sensing [23]. In active sensing, an infrared light or ultrasound signal is actively emitted from the camera to detect the depth of the interested objects. The focal length is then set from a lookup table depending on that depth.
Fig. 2. (a) shows the pipeline of auto-refocusing by exhaustive searching, and (b) shows the pipeline of the proposed auto-refocusing scheme. The proposed scheme reduces the searching space of refocusing from the entire depth space to only three depths, and hence improves efficiency significantly.

The most popular passive AF systems are based on contrast or sharpness assessment, where the sharpness of the Region Of Interest (ROI) is used to iteratively alter the focal length. Passive AF is essentially similar to the ES-ARF algorithm, the time-consuming strategy discussed above. Meanwhile, active sensing suggests that if the depth of the interested objects has been estimated, the computational complexity of ARF can be decreased to O(1).

Actually, both AF and ARF are ultimately problems of depth estimation. Although light-field cameras offer an impressive ability for depth estimation [7–9,24–26], explicit depth estimation is not suitable for ARF: depth estimation is itself a time-consuming procedure. Furthermore, most depth estimation algorithms are based on epipolar geometry, which cannot achieve a robust estimation when the surface of the objects cannot be modeled as a Lambertian surface, e.g. the surface of the human iris. Defocus blur presents a robust cue of depth [27,28], which inspired us to determine the optimal refocusing depth via defocus blur.

There are three key points for achieving ARF in our scheme.

The first point is how to reduce the space for searching the optimal refocused image, which heavily depends on the imaging model of light field cameras. We discuss ARF based on Ng's light field camera model [1] and derive the R-PSF to model the refocusing rendering. The R-PSF holds a simple linear relationship between the refocusing depth and the defocus blurriness, which helps to significantly reduce the searching space.

The second point is how to form an absolute blurriness measure (ABM) across the texture variations of the interested objects. The focus measures used for AF [20,28–31] can be considered relative blurriness measures (RBMs); technically, focus measures are inversely proportional to relative blurriness measures. Although most of these measures robustly output a monotonic assessment as the image becomes sharper or blurrier, and converge to a peak when the image is well focused, they hardly give an absolute blurriness measure independent of the image content. To solve this issue, we turned to the broader area of image quality assessment.

Image quality assessment (IQA) is commonly divided into three categories, i.e. Full-reference (FR), Reduced-reference (RR) and No-reference (NR), based on the amount of information about the undistorted image provided to the algorithm [32–35]. All RBMs can be considered FR-IQA, since images of the same scene serve as mutual references for assessing blurriness. The ABM needs to uniformly assess blurriness without a reference, so we resorted to NR-IQA. Currently, most of the state-of-the-art NR-IQA methods are based on the regularity of natural scene statistics (NSS) [33,36–38]. Mittal et al. [37] propose a Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) that utilizes an NSS model of locally normalized luminance coefficients and operates directly on the spatial pixel data to promote efficiency. It is proved to be more accurate and efficient than other
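The locally normalized luminance coefficients used by BRISQUE (often called MSCN coefficients) can be sketched in a few lines. This is only an illustrative sketch: the 7 × 7 Gaussian window and the constant C = 1 are assumptions following common BRISQUE practice, not values taken from this paper.

```python
import numpy as np

def mscn(image, sigma=7.0 / 6.0, C=1.0):
    """Locally normalized luminance (MSCN) coefficients:
    I_hat = (I - mu) / (sd + C), with Gaussian-weighted local
    mean mu and standard deviation sd (window size assumed)."""
    r = 3
    x = np.arange(-r, r + 1)
    g = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()

    def blur(a):
        # separable Gaussian filtering with edge padding
        def conv(v):
            return np.convolve(np.pad(v, r, mode="edge"), g, mode="valid")
        a = np.apply_along_axis(conv, 0, a)
        return np.apply_along_axis(conv, 1, a)

    mu = blur(image)
    var = np.clip(blur(image ** 2) - mu ** 2, 0.0, None)
    return (image - mu) / (np.sqrt(var) + C)
```

On a constant image the coefficients are zero everywhere, which is the intended behavior: only local structure survives the normalization.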
The optimal refocusing depth β₀ can be used to render images accurately refocused at the interested object.

3.1. The ARF framework

The ARF can in essence be abstracted as an inverse problem: estimating the optimal refocusing depth β₀ from a set of observations g_{β_i}. The β₀ leads σ_h(β) to its minimum, and hence h(σ_h) gets close to a Dirac function. Thus, according to the model of Eq. (1), a set of images refocused at arbitrary depths β_i is rendered to estimate σ_h. The optimal refocusing depth can then be obtained by computing the minimum of σ_h.

The proposed framework firstly calculates the samples' σ_{h_i} via

$$\sigma_{h_i}(\beta_i) = \mathrm{ABM}\left(g_{\beta_i}[x]\right), \quad i = 1, \ldots, n, \tag{3}$$

where ABM is an absolute blurriness measure insensitive to the image content p, σ_{h_i}(β_i) denotes the observed blurriness, and n is determined by the number of indeterminate parameters of σ_h(β). The optimal refocusing depth β₀ can then be estimated by minimizing the objective function

$$\beta_0 = \arg\min_{\beta_0} \; \sigma_h(\beta_0) + \lambda \sum_{i=1}^{n} \left\| \sigma_h(\beta_i) - \sigma_{h_i}(\beta_i) \right\|_2^2. \tag{4}$$

The first term on the right side of Eq. (4) ensures that β₀ is the minimum of σ_h(β), and the second term guarantees the precision of the estimation of σ_h(β); λ is a balance factor.

It can be inferred that the model of the R-PSF and the ABM are the two major issues in the proposed ARF framework.

We derived the R-PSF based on Ng's model of light field cameras [1]. Moreover, the same assumptions as in [42] are adopted, i.e. the main lens is modeled as a thin lens and the lenslet plane as an array of pinholes. Light-field cameras can then be modeled as shown in Fig. 3. Assume that a pointolite is set at S, as shown in Fig. 4, and that its image is focused on the microlens array plane, i.e. the image plane; the image distance is F. L_F^S represents the light field parameterized at the image distance F and illuminated by the pointolite set at S. Thus, L_F^S can be modeled as

$$L_F^S(x, u) = \begin{cases} \dfrac{1}{2\pi\sigma_r^2} \exp\left(-\dfrac{u^T u}{2\sigma_r^2}\right), & \forall x = x_0 \\ 0, & \forall x \neq x_0 \end{cases} \tag{5}$$

where u = (u₁, u₂)ᵀ represents the angular dimension, x = (x₁, x₂)ᵀ represents the spatial dimension, σ_r is a constant once the optical parameters have been ascertained, and x₀ is the position of the image of the pointolite S. Since the pointolite is well focused on the image plane, its disc of confusion at the sensor plane can ideally be modeled as a Gaussian distribution. As is well known, the refocusing integration [1] is given by

$$\mathcal{R}_\alpha[L](\alpha x_\alpha) = \frac{1}{\alpha^2 F^2} \int L\left(u\left(1 - \frac{1}{\alpha}\right) + x_\alpha,\; u\right) du, \tag{6}$$

where α is the relative image distance of the virtual image plane, R_α[L] is the refocusing operator which refocuses the light field L at the relative image distance α, and x_α denotes the coordinate on the virtual image plane at the relative image distance α.

We use the point spread function (PSF) to describe the blurriness generated by refocusing. The PSF is defined as the intensity distribution of the defocused spot caused by a pointolite.
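In discrete form, the refocusing integration of Eq. (6) amounts to a shift-and-add over sub-aperture images. The following is a minimal sketch, under the assumptions that the light field is stored as a (U, V, H, W) array of sub-aperture images and that the shift u(1 − 1/α) is approximated at integer-pixel precision; it is an illustration of the operator, not the authors' implementation.

```python
import numpy as np

def refocus(light_field, alpha):
    """Discrete shift-and-add approximation of the refocusing
    integration, Eq. (6). `light_field` is assumed to be a
    (U, V, H, W) array of sub-aperture images indexed by the
    angular coordinate u; `alpha` is the relative image distance."""
    U, V, H, W = light_field.shape
    shift = 1.0 - 1.0 / alpha
    out = np.zeros((H, W))
    for u1 in range(U):
        for u2 in range(V):
            # integer-pixel approximation of the shift u(1 - 1/alpha)
            d1 = int(round((u1 - U // 2) * shift))
            d2 = int(round((u2 - V // 2) * shift))
            out += np.roll(light_field[u1, u2], (d1, d2), axis=(0, 1))
    return out / (U * V)
```

Setting α = 1 leaves every sub-aperture image unshifted, so the output reduces to the plain average of the angular samples.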
In order to calculate the R-PSF, the following substitution is used:

$$x = u\left(1 - \frac{1}{\alpha}\right) + x_\alpha. \tag{7}$$

It is then possible to calculate R_α[L_F^S](α x_α) as

$$\mathcal{R}_\alpha\left[L_F^S\right](\alpha x_\alpha) = \frac{1}{4\pi \alpha^2 F^2 \left(1 - \frac{1}{\alpha}\right)^2 \sigma_r^2} \exp\left(-\frac{(x_0 - x_\alpha)^T (x_0 - x_\alpha)}{2\left(1 - \frac{1}{\alpha}\right)^2 \sigma_r^2}\right). \tag{8}$$

To eliminate the scale change caused by refocusing, an integration-invariant resize operator is defined as

$$S_\eta[I(x)] = \eta^2 I(\eta x). \tag{9}$$

Then the R-PSF of S can be represented as

$$h_\alpha^S(x) = S_{\alpha^{-1}}\left[\mathcal{R}_\alpha\left[L_F^S\right](\alpha x_\alpha)\right], \tag{10}$$

where

$$x_0' = \frac{f - \alpha_0 F}{f - F}\, x_0, \tag{15}$$

and f is the focal length of the main lens. Since the captured light field is re-parameterized at α₀F, the image refocused at the relative image distance α from the original light field can be achieved by refocusing the re-parameterized light field at α', where α' = α · α₀, via

$$\mathcal{R}_\alpha\left[L_{F_0}^{S_0}\right](\alpha x_\alpha) = \mathcal{R}_{\alpha'}\left[L_{\alpha_0 \cdot F}^{S_0}\right](\alpha' x_\alpha). \tag{16}$$

Thus, the R-PSF of the pointolite S₀ can be calculated from

$$h_\alpha^{S_0}(x) = S_{\alpha'^{-1}}\left[\mathcal{R}_{\alpha'}\left[L_{\alpha_0 \cdot F}^{S_0}\right](\alpha' x_\alpha)\right]. \tag{17}$$

The PSF of the pointolite S₀ refocused at the image distance αF can be derived as

$$h_\alpha^{S_0}(x) = \frac{1}{2\pi F^2 \sigma_\alpha^2} \exp\left(-\frac{(x_0' - x)^T (x_0' - x)}{2\sigma_\alpha^2}\right), \tag{18}$$

where

$$\sigma_\alpha^2 = \left(\frac{1}{\alpha_0} - \frac{1}{\alpha}\right)^2 \cdot \sigma_r^2. \tag{19}$$

As shown in Eq. (19), the PSF of the pointolite S₀ shrinks to a Dirac function when α = α₀, which is the optimal solution of Eq. (4). Let β₀ = α₀⁻¹ and β = α⁻¹; the spread σ_h(β) of the PSF can then be modeled as

$$\sigma_h(\beta) = \Delta\beta \cdot \sigma_r, \qquad \Delta\beta = |\beta - \beta_0|. \tag{20}$$

Note that in Eq. (20) there is a linear relationship between the refocusing depth shift Δβ and the defocus blurriness σ_h(β). Such a simple linear relationship makes it possible to recognize the refocus shift Δβ by rendering only one refocused image. The β₀ can then be obtained by simply comparing the relative sharpness between the images refocused at β + Δβ and β − Δβ. The relative sharpness measure can be any monotonic blurriness measure used in AF [29].

In implementing the ARF algorithm, β can completely replace α, although α has a more intuitive physical meaning. So β is referred to as the "refocusing depth" in the rest of this paper.

$$MD_1(i, j) = \hat{I}(i, j)\,\hat{I}(i+1, j+1), \qquad MD_2(i, j) = \hat{I}(i-1, j-1)\,\hat{I}(i+1, j+1), \tag{23}$$

$$SD_1(i, j) = \hat{I}(i, j)\,\hat{I}(i+1, j-1), \qquad SD_2(i, j) = \hat{I}(i-1, j-1)\,\hat{I}(i+1, j-1), \tag{24}$$

for i ∈ {1, 2, …, M} and j ∈ {1, 2, …, N}. The histograms of the paired products of first and second order neighboring pixels along the horizontal orientation are plotted in Fig. 5. It can be inferred from Fig. 5 that the paired products of second order neighboring pixels contain extra cues for boosting the accuracy of the blurriness measure.

The asymmetric generalized Gaussian distribution (AGGD) model is adopted to fit the distribution of the statistical relationships between neighboring pixels [37]. The AGGD with zero mode is given by:

$$f\left(x; \nu, \sigma_l^2, \sigma_r^2\right) = \begin{cases} \dfrac{\nu}{(\beta_l + \beta_r)\,\Gamma(1/\nu)} \exp\left(-\left(\dfrac{-x}{\beta_l}\right)^{\nu}\right), & x < 0 \\[1ex] \dfrac{\nu}{(\beta_l + \beta_r)\,\Gamma(1/\nu)} \exp\left(-\left(\dfrac{x}{\beta_r}\right)^{\nu}\right), & x \geq 0 \end{cases} \tag{25}$$
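The paired products of Eqs. (23)–(24) can be evaluated for all pixels at once with array slicing. A sketch, assuming Î is the locally normalized image and that boundary pixels are trimmed from the result:

```python
import numpy as np

def diagonal_paired_products(I_hat):
    """Paired products of Eqs. (23)-(24) over all interior pixels.
    I_hat: 2-D array of locally normalized pixel values."""
    c  = I_hat[1:-1, 1:-1]   # I(i, j)
    dr = I_hat[2:,   2:]     # I(i+1, j+1)
    ul = I_hat[:-2, :-2]     # I(i-1, j-1)
    dl = I_hat[2:,  :-2]     # I(i+1, j-1)
    MD1 = c * dr             # Eq. (23), first-order, main diagonal
    MD2 = ul * dr            # Eq. (23), second-order
    SD1 = c * dl             # Eq. (24), first-order, secondary diagonal
    SD2 = ul * dl            # Eq. (24), second-order
    return MD1, MD2, SD1, SD2
```

Each returned map is then fitted with the AGGD of Eq. (25) to produce the per-product feature parameters.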
Fig. 5. Histograms of paired products of first order (a) and second order (b) neighboring pixels along the horizontal orientation.

where

$$\beta_l = \sigma_l \sqrt{\frac{\Gamma(1/\nu)}{\Gamma(3/\nu)}}, \qquad \beta_r = \sigma_r \sqrt{\frac{\Gamma(1/\nu)}{\Gamma(3/\nu)}}. \tag{26}$$

The shape parameter ν controls the shape of the distribution, while σ_l² and σ_r² are scale parameters that control the spread on each side of the mode, respectively. The parameters (η, ν, σ_l², σ_r²) of the best AGGD fit are extracted, where η is given by:

$$\eta = (\beta_r - \beta_l)\,\frac{\Gamma(2/\nu)}{\Gamma(1/\nu)}. \tag{27}$$

Thus, 32 parameters are computed from the paired products. All features discussed above are extracted at two scales, i.e. the original image scale and a down-sampled image scale (low-pass filtered and downsampled by a factor of 2). Increasing the number of scales beyond 2 is observed not to enhance performance much. Thus, there are 68 features extracted for assessing blurriness.

Instead of sending these features into the regression algorithm directly, we prove that a feature weighting scheme can further improve the accuracy and universality of blurriness assessment. In general, feature selection is able to improve the accuracy of regression, reduce the interference of noisy and redundant features, and optimize computing efficiency. However, reducing the 68 features into a much lower feature space, e.g. lower than 30, would damage the accuracy of regression while offering a negligible efficiency improvement, since the features are computed by group, e.g. (η, ν, σ_l², σ_r²) cannot be removed independently.

The weights vector is learned from a training set in which the blurriness of the images is labeled, by solving the lasso regression as follows:

$$\mathbf{w} = \arg\min_{\mathbf{w}} \|A\mathbf{w} - \boldsymbol{\beta}\|_2^2 + \lambda \|\mathbf{w}\|_1, \tag{28}$$

where A is an m × n matrix of training instances, m is the number of training samples, n is the dimension of the features, β is the label vector, and ‖·‖₁ denotes the L1 norm. Such lasso regression is a regularized version of least squares regression which constrains the L1 norm of the vector w. In lasso regression, increasing the penalty factor λ drives more and more of the parameters to zero, so it tends to be applied to feature selection.

We adopted lasso regression for three reasons:

Firstly, BRISQUE is designed for assessing the quality of distorted images across multiple distortion categories, so its features are redundant for assessing Gaussian blurriness. To increase the correlation of these features with blurriness and remove the influence of their redundancy, the features should be modulated with weights indicating the correlation between features and blurriness. The weights can be computed by solving Eq. (28).

Secondly, the L1 norm regularization is known for its generalization and noise-proof ability [43]. To boost generalization, we apply the L1 norm regularization rather than directly solving the least squares version.

Thirdly, the proposed ABM is expected to measure the blurriness of refocused images across larger appearance variations. We prove in Section 4 that BRISQUE can be affected by appearance variance and that its accuracy decreases when simultaneously evaluating the blurriness of different objects, e.g. the faces and 2D barcodes in the experiment. The weights from the lasso regression solution enhance the features related to blurriness and weaken those related to appearance.

In solving the lasso regression, λ is a factor for adjusting the balance between generalization performance and regression accuracy. A larger λ gives a sparser solution of w, which is preferable for high-dimensional data as it selects the most representative features; it also decreases the accuracy of the regression, since it introduces a larger bias from the least squares solution. A smaller λ is also undesirable, as it degrades to the least squares solution and leads to overfitting on the training set. Experiments show that λ = 0.01 is appropriate.

Finally, a regression model for assessing blurriness from the weighted features can be trained by two representative machine learning models: (1) support vector machine regression (SVM-R) and (2) AdaBoosting back-propagation neural network (AB-BPNN). SVM-R is widely used in IQA tasks [33,36,37]; AB-BPNN has recently been proved effective in IQA [38]. We applied the libSVM package to implement the SVM regression algorithm [44]; the radial basis function (RBF) kernel is adopted in this paper. We implemented AB-BPNN based on the OG-IQA package. The parameters of both the SVM regression and the AB-BPNN are estimated using cross-validation on the training set.

3.4. The ARF algorithm

The proposed ARF algorithm is demonstrated in Algorithm 1. The pipeline of processing the light-field image, based on the proposed ARF algorithm, is shown in Fig. 6.

4. Experiments

We did experiments on four datasets to evaluate the performance of the proposed ARF.
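Eq. (28) can be solved with a simple proximal-gradient (ISTA) iteration. The paper does not state which solver it uses, so the following is only an illustrative sketch of the lasso problem itself:

```python
import numpy as np

def soft_threshold(z, t):
    # proximal operator of t * ||.||_1
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_weights(A, beta, lam=0.01, n_iter=1000):
    """Solve  w = argmin ||A w - beta||_2^2 + lam * ||w||_1  (Eq. (28))
    by proximal gradient descent (ISTA)."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient
    w = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ w - beta)
        w = soft_threshold(w - grad / L, lam / L)
    return w
```

With an orthonormal A the iteration reduces to element-wise soft thresholding of the labels, which illustrates how a larger λ drives more weights exactly to zero.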
Fig. 6. Flowchart of processing a light-field face image via the proposed ARF algorithm.

Algorithm 1 An algorithm for auto-refocusing.

Input: L_F(u, x)
1. Render the initial image g_{β_I}[x] at an arbitrary depth β_I.
2. Compute σ_I via σ_I(β_I) = ABM(g_{β_I}[x]) with the trained SVR regression model.
3. Compute Δβ = σ_I · σ_r⁻¹.
4. Render the two candidate optimal images g_{(β_I+Δβ)}[x] and g_{(β_I−Δβ)}[x].
5. Determine the optimal refocusing depth β₀ via β₀ = arg min_{β ∈ {β_I+Δβ, β_I−Δβ}} RBM(g_β).
Output: β₀ and g_{β₀}[x].

4.1. Datasets acquisition

We built four representative datasets for testing the proposed ARF algorithm: a QR-code dataset, a face dataset, an iris dataset and a blended dataset.

4.1.1. QR-code dataset

Barcode scanners are fundamentally low-cost cameras [12], and are limited by the well-known tradeoff between noise and blur. Barcode reading can represent a series of mark-reading tasks, i.e. brands, badges, traffic signs, auto plates, etc., which encouraged us to choose barcodes for evaluation. We captured a light-field QR-code (Quick Response code) dataset for evaluation of the proposed ARF algorithm. Compared to the 1D barcode selected by Guo et al. [12], the QR-code has a much more complex texture and thus cannot be handled by the scheme proposed in [12].

The QR-code dataset is captured by a Lytro Illum camera; the focal length of the main lens is set at 200 mm and the aperture size is fixed at f/2. The QR-code is positioned from 950 mm to 1750 mm from the camera with an incremental step of 100 mm. The camera is adjusted to focus at 1250 mm. The QR-codes are generated by the ZXing API [39]. Each QR-code is rotated by 0°, 90°, 180° and 270° to augment the variance. In total, we captured 540 light field images of QR-codes for this dataset. The QR-code imaging installation and a sub-aperture image are shown in Fig. 7.

4.1.2. Face dataset

Note that the great majority of pictures taken by consumer cameras or cell-phone cameras are of humans or human faces. Conventional camera manufacturers have started introducing a face-priority AF feature which detects faces in the scene and focuses at the face areas. A face-priority ARF is likewise expected for light field cameras. Thus, we constructed a multi-face light field dataset oriented to rendering high quality images of faces positioned at different depths.

We used a Lytro Illum camera, set the focal length of its main lens at 240 mm and the aperture size at f/2. In each image, two faces are arranged at different depths. There are 6 candidate positions ranging from 2.50 m to 7.50 m with an incremental step of 1 m. Thus, every 2 faces can generate 30 images over all permutations. The dataset has 450 light field images of faces in total. The scene for capturing the face dataset is exhibited in Fig. 8.

4.1.3. Iris dataset

Finally, we applied the proposed ARF algorithm to iris recognition, and evaluated the quality of the ARF images via the performance of iris recognition. Indeed, ARF for iris imaging is a quite challenging and convincing task, because iris recognition is known as a texture-sharpness-demanding application.

The dataset in [16] is adopted to verify our ARF algorithm. In this database, 14 subjects participated in the collection of light-field iris images. The distance between the iris and the light-field camera is continuously varied. This database includes over 2000 iris lenselet images. A close-up view of a sample is shown in Fig. 9. The accuracy of iris recognition is used to evaluate the performance of the proposed ARF algorithm.

4.1.4. Blended dataset

ARF is expected to adapt to multiple objects. Thus, it is necessary to evaluate the ABM on multiple objects, especially objects with entirely different appearances. In the experiment, a dataset blending the QR-code dataset and the face dataset is used to evaluate the general applicability of the proposed ABM model.

4.2. Preprocessing

The 2D raw lenselet image should first be decoded to form a 4D light-field representation. We adopted Dansereau et al.'s LFtoolbox0.3 [42] for decoding the light field images captured by the Lytro Illum camera. As mentioned above, the raw images in the QR-code dataset and the face dataset are decoded into 4D light fields with a resolution of 15 × 15 × 434 × 625. We developed tools for light field iris acquisition; the raw light-field images are decoded into 4D light fields with a resolution of 9 × 9 × 403 × 268. As in our previous version [6], the raw 4D light fields are interpolated to increase the spatial resolution by a factor of 2, e.g. the resolution of the iris light-field images becomes 9 × 9 × 806 × 536.

As discussed above, we applied the center-aperture image of a light field to localize the interested objects. The ZXing toolbox [39], the Viola-Jones face detector [40] and He's iris localization scheme [41] are applied to localize the QR-code regions, faces and irises, respectively. The localization results can be shared by all refocused images rendered from the same light field. Notice that the sub-aperture image can be considered as imaging the light through a
Fig. 7. (a) shows the installation for capturing the QR-code dataset, and (b) is a sub-aperture image of a light field.
Fig. 8. (a) shows the installation for capturing the face dataset, and (b) is a sub-aperture image of a light field.
Fig. 10. (a)–(d) compare the cumulative error distribution curves among the related blurriness measures evaluated on the QR-code dataset, the face dataset, the iris dataset
and the blended dataset, respectively.
Method QR-code Faces Iris Blend It is necessary to evaluate quality of refocused images rendered
(1) RBM [20] 0.9272 0.9015 0.9154 0.8075 by the proposed ARF scheme. We compared the images rendered
(2) DIIVINE [33] 0.9484 0.9440 0.9732 0.9471 by proposed-ARF algorithm and the images exhaustively searched
(3) Zhang14 [6] 0.9630 0.9719 0.9733 0.9579 (ES-ARF) from entirely refocusing depth space.
(4) BRISQUE [37] 0.9679 0.9610 0.9659 0.9479 We first evaluated the qualitative results. The images in
(5) OG-IQA [38] 0.9692 0.9702 0.9687 0.9585
Figs. 11–13 show the comparison of the initial images, proposed
(6) WF+SVM-R 0.9785 0.9862 0.9610 0.9802
(7) WF+AB-BPNN 0.9805 0.9859 0.9726 0.9822 ARF images and ES-ARF images. The closeup views are offered for
better observation.
(2) Both schemes using the weighted features of Zhang14, WF+SVM-R and WF+AB-BPNN, generally obtain better performance than the schemes using unweighted features, DIIVINE and BRISQUE. This shows that the weighted features can effectively improve the performance of blurriness assessment.
(3) There is a noticeable degradation in accuracy when BRISQUE is evaluated on the blended dataset. Such degradation indicates that BRISQUE can be interfered with by the appearance variation of the interested objects. Meanwhile, the weights learned by lasso regression can weaken this interference and help WF+SVM-R and WF+AB-BPNN perform better than the other regression schemes on the blended dataset.
(4) WF+AB-BPNN performs slightly better than WF+SVM-R, since it is driven by a more powerful regressor. However, the trained AB-BPNN contains 10 neural networks as its weak regressors and takes over 20 times longer than SVM-R to assess a query image. Since efficiency has a higher priority in implementing an ARF algorithm, WF+SVM-R is chosen as the ABM in the following experiments.

In addition, we evaluated the quantitative performance of ARF. The optimal refocused image selected by Average Opinions (AO) is considered as the ground truth, and the Structural Similarity Index Measurement (SSIM) is applied to evaluate the optimal refocused images rendered by ARF. ES-ARF can robustly select the optimal refocused images, so its performance is considered as the baseline. The cumulative SSIM distribution curves are shown in Fig. 14.

In the experiment, the SSIM of the optimal refocused images selected by ES-ARF, referred to the images selected by AO, distributes from 0.95 to 1.00, as shown in Fig. 14. Moreover, it is hard to discern the difference between two images by human vision if their SSIM is larger than 0.95. Thus, we assume that an image can be considered as rightly refocused if its SSIM compared to the ground truth is larger than 0.95.

In addition, we defined a new index, named the Right-Refocusing Rate (RRR), to quantitatively evaluate the performance of ARF algorithms. The RRR is defined as the percentage of rightly refocused images among the total rendered images. The RRRs computed on the four datasets are shown in Table 2.

Meanwhile, the initial images are rendered over the entire depth range to verify the effectiveness and robustness of the proposed ARF algorithm. The SSIM distribution and RRR of the initial image set are shown in Fig. 14 and Table 2 as a reference for comparing the differences in image quality.

From Fig. 14 and Table 2, it is convincing to conclude that the proposed ARF algorithm can effectively render images refocused at the interested objects, since the lowest RRR of the image sets rendered by ARF is over 0.93 among the four datasets, while the RRR of the initial image sets is below 0.25.

C. Zhang et al. / Pattern Recognition 81 (2018) 176–189 185

Table 2
Refocused image quality assessment.

Method              QR-code   Faces    Iris     Blend
(1) Init            0.2469    0.2375   0.2381   0.2465
(2) Proposed ARF    0.9468    0.9376   0.9720   0.9410
(3) ES-ARF          0.9907    1.0000   0.9964   0.9975

Fig. 14. (a)–(d) are the cumulative SSIM distribution curves of the rendered image sets evaluated on the QR-code dataset, the face dataset, the iris dataset and the blended dataset, respectively. INIT denotes the set of initial images, ARF denotes the set of images rendered by the proposed ARF scheme and ES denotes the set of images rendered by the ES-ARF scheme.
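The right-refocusing decision and the RRR described above reduce to a few lines of code. The following is a minimal sketch (not the authors' implementation), assuming the SSIM scores of the rendered images against the AO-selected ground truth have already been computed:

```python
def right_refocusing_rate(ssim_scores, threshold=0.95):
    """Right-Refocusing Rate: the fraction of rendered images whose SSIM
    against the AO-selected ground truth exceeds the 0.95 threshold,
    i.e. images visually indistinguishable from the optimal refocusing."""
    rightly_refocused = sum(1 for s in ssim_scores if s > threshold)
    return rightly_refocused / len(ssim_scores)

# Hypothetical scores for four rendered images; three exceed 0.95.
print(right_refocusing_rate([0.99, 0.97, 0.91, 0.96]))  # 0.75
```

In practice the SSIM scores themselves would come from a standard implementation such as `skimage.metrics.structural_similarity`.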
Table 3
Comparison of executing time of the ARF algorithms.

Method                             Executing time (ms)   Handler
(1) ES-ARF (806 × 536)             17000+                CPU
(2) ES-ARF (806 × 536)             1503                  CPU+GPU
(3) ARF [6] (806 × 536)            2966                  CPU
(4) ARF [6] (806 × 536)            933                   CPU+GPU
(5) The proposed ARF (806 × 536)   146                   CPU+GPU

Table 4
The performance of iris recognition. The iris image set corresponding to the larger DI and smaller EER contains the sharper iris images.

Method                         DI       EER
(1) IRII                       2.6981   0.0324
(2) ORII-AO                    4.0305   0.0084
(3) Proposed ARF-βI = 1.000    4.0224   0.0081
(4) Proposed ARF-βI = random   4.0635   0.0083
(5) ORII-Raja [11]             4.0140   0.0076

Fig. 15. (a)–(f) illustrate the sample images refocused at β = 1.000 (IRII). (g)–(l) display the images rendered by the proposed ARF algorithm with the initial images of (a)–(f) (ARF-βI = 1.000).
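The DI column in Table 4 is the decidability index, a standard measure of how well the genuine and impostor iris matching score distributions are separated; sharper iris images yield a larger DI. As a hedged illustration of the textbook formula (not code from the paper), with hypothetical score lists:

```python
from math import sqrt
from statistics import fmean, pvariance

def decidability_index(genuine, impostor):
    """d' = |mu_g - mu_i| / sqrt((var_g + var_i) / 2), computed from the
    means and population variances of the two score distributions."""
    return abs(fmean(genuine) - fmean(impostor)) / sqrt(
        (pvariance(genuine) + pvariance(impostor)) / 2
    )

# Hypothetical matching scores: well-separated distributions give a large DI.
print(decidability_index([1.0, 1.0, 3.0, 3.0], [5.0, 5.0, 7.0, 7.0]))  # 4.0
```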
(4) The method (ORII-Raja) has only a negligible performance advantage over the methods (ARF-βI = 1.000) and (ARF-βI = random), at the cost of much more computing time.

5. Conclusions

In this paper, we presented an efficient solution to ARF, which is a basic feature for light field cameras. We introduced an ARF framework based on modeling the R-PSF and found a simple linear relationship in the R-PSF. This linear relationship reduces the computational complexity of ARF and enables an efficient ARF algorithm that estimates the optimal refocusing depth from only one refocused image via the proposed ABM. We tested the proposed ARF algorithm on four datasets and applied it to iris imaging tasks. The experimental results show that the proposed ARF algorithm decreases the executing time by more than an order of magnitude compared to the current "rendering and selection" solutions. Meanwhile, the proposed ARF algorithm achieves comparable performance in accuracy and robustness.

In the future, we would like to implement and evaluate the proposed ARF algorithm on a large-scale dataset with a variety of interested objects, which is unavailable now. At the same time, we will replace the SVM-R model with more powerful machine learning models, e.g. deep learning models, to benefit from the large-scale data.

Acknowledgment

The authors would like to thank the associate editors and the reviewers for their valuable comments and advice. This work is funded by the National Natural Science Foundation of China (Grant Nos. 61602481, 61573360) and the National Natural Science Foundation of China Major Instrument Special Fund (Grant No. 61427811). This work is also supported by the Open Research Fund of the Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences.

References

[1] R. Ng, Digital light field photography, Ph.D. thesis, 2006.
[2] G. Guo, M. Jones, P. Beardsley, A system for automatic iris capturing, Mitsubishi Electric Research Laboratories TR2005-044, 2005.
[3] W. Dong, Z. Sun, T. Tan, A design of iris recognition system at a distance, in: Proceedings of the Chinese Conference on Pattern Recognition, 2009, pp. 1–5, doi:10.1109/CCPR.2009.5344030.
[4] M. Rahman, N. Kehtarnavaz, Real-time face-priority auto focus for digital and cell-phone cameras, IEEE Trans. Consum. Electron. 54 (4) (2008) 1506–1513.
[5] R. Ng, Fourier slice photography, in: Proceedings of the ACM Transactions on Graphics, 24, ACM, 2005, pp. 735–744.
[6] C. Zhang, G. Hou, Z. Sun, T. Tan, Efficient auto-refocusing of iris images for light-field cameras, in: Proceedings of the IEEE International Joint Conference on Biometrics, 2014, pp. 1–7, doi:10.1109/BTAS.2014.6996295.
[7] E.H. Adelson, J.Y. Wang, Single lens stereo with a plenoptic camera, IEEE Trans. Pattern Anal. Mach. Intel. 14 (2) (1992) 99–106.
[8] T.E. Bishop, P. Favaro, The light field camera: extended depth of field, aliasing, and superresolution, IEEE Trans. Pattern Anal. Mach. Intel. 34 (5) (2012) 972–986.
[9] S. Wanner, B. Goldluecke, Variational light field analysis for disparity estimation and super-resolution, IEEE Trans. Pattern Anal. Mach. Intel. 36 (3).
[10] R. Raghavendra, B. Yang, K.B. Raja, C. Busch, A new perspective: face recognition with light-field camera, in: Proceedings of the International Conference on Biometrics, 2013, pp. 1–8.
[11] K. Raja, R. Raghavendra, F. Cheikh, B. Yang, C. Busch, Robust iris recognition using light-field camera, in: Proceedings of the Colour and Visual Computing Symposium, 2013, pp. 1–6.
[12] X. Guo, H. Lin, Z. Yu, S. McCloskey, Barcode imaging using a light field camera, in: L. Agapito, M.M. Bronstein, C. Rother (Eds.), Proceedings of the Computer Vision - ECCV 2014 Workshops, Lecture Notes in Computer Science, 8926, Springer International Publishing, 2015, pp. 519–532, doi:10.1007/978-3-319-16181-5_40.
[13] R. Raghavendra, K.B. Raja, C. Busch, Presentation attack detection for face recognition using light field camera, IEEE Trans. Image Process. 24 (3) (2015) 1060–1075.
[14] A. Ghasemi, M. Vetterli, Detecting planar surface using a light-field camera with application to distinguishing real scenes from printed photos, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 4588–4592, doi:10.1109/ICASSP.2014.6854471.
[15] A. Ghasemi, M. Vetterli, Scale-invariant representation of light field images for object recognition and tracking, in: Proceedings of the IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, 2014, p. 902015.
[16] C. Zhang, G. Hou, Z. Sun, T. Tan, Z. Zhou, Light field photography for iris image acquisition, in: Proceedings of the Chinese Conference on Biometric Recognition, 2013, pp. 345–352.
[17] M. Levoy, Light fields and computational imaging, IEEE Comput. 39 (8) (2006) 46–55.
[18] T. Georgiev, A. Lumsdaine, Depth of field in plenoptic cameras, in: Proceedings of the Eurographics, 2009.
[19] C.M. Chen, C.M. Hong, H.C. Chuang, Efficient auto-focus algorithm utilizing discrete difference equation prediction model for digital still cameras, IEEE Trans. Consum. Electron. 52 (4) (2006) 1135–1143.
[20] M. Subbarao, T.S. Choi, A. Nikzad, Focusing techniques, Opt. Eng. 32 (11) (1993) 2824–2836.
[21] E. Krotkov, Focusing, Int. J. Comput. Vis. 1 (3) (1988) 223–237.
[22] N. Kehtarnavaz, H.J. Oh, Development and real-time implementation of a rule-based auto-focus algorithm, Real Time Imaging 9 (3) (2003) 197–203.
[23] J.H. Lee, K.S. Kim, B.D. Nam, J.C. Lee, Y.M. Kwon, H.G. Kim, Implementation of a passive automatic focusing algorithm for digital still camera, IEEE Trans. Consum. Electron. 41 (3) (1995) 449–454, doi:10.1109/30.468047.
[24] M.W. Tao, S. Hadap, J. Malik, R. Ramamoorthi, Depth from combining defocus and correspondence using light-field cameras, in: Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2013, pp. 673–680.
[25] H.G. Jeon, J. Park, G. Choe, J. Park, Y. Bok, Y.W. Tai, I.S. Kweon, Accurate depth map estimation from a lenslet light field camera, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1547–1555.
[26] H. Sheng, P. Zhao, S. Zhang, J. Zhang, D. Yang, Occlusion-aware depth estimation for light field using multi-orientation EPIs, Pattern Recognit. 74 (2018) 587–599.
[27] P. Favaro, S. Soatto, A geometric approach to shape from defocus, IEEE Trans. Pattern Anal. Mach. Intel. 27 (3) (2005) 406–417.
[28] S.K. Nayar, Y. Nakagawa, Shape from focus, IEEE Trans. Pattern Anal. Mach. Intel. 16 (8) (1994) 824–831.
[29] S. Pertuz, D. Puig, M.A. Garcia, Analysis of focus measure operators for shape-from-focus, Pattern Recognit. 46 (2013) 1415–1432.
[30] C. Zhou, D. Miau, S.K. Nayar, Focal sweep camera for space-time refocusing.
[31] A. Kumar, N. Ahuja, A generative focus measure with application to omnifocus imaging, in: Proceedings of the IEEE International Conference on Computational Photography, 2013, pp. 1–8, doi:10.1109/ICCPhot.2013.6528295.
[32] Z. Wang, A.C. Bovik, Modern image quality assessment, Synth. Lect. Image Video Multimed. Process. 2 (1) (2006) 1–156.
[33] A.K. Moorthy, A.C. Bovik, Blind image quality assessment: from natural scene statistics to perceptual quality, IEEE Trans. Image Process. 20 (12) (2011) 3350–3364.
[34] K.H. Thung, R. Paramesran, C.L. Lim, Content-based image quality metric using similarity measure of moment vectors, Pattern Recognit. 45 (2012) 2193–2204.
[35] W. Sun, F. Zhou, Q. Liao, MDID: a multiply distorted image database for image quality assessment, Pattern Recognit. 61 (2017) 153–168.
[36] M. Saad, A. Bovik, C. Charrier, Blind image quality assessment: a natural scene statistics approach in the DCT domain, IEEE Trans. Image Process. 21 (8) (2012) 3339–3352, doi:10.1109/TIP.2012.2191563.
[37] A. Mittal, A. Moorthy, A. Bovik, No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. 21 (12) (2012) 4695–4708, doi:10.1109/TIP.2012.2214050.
[38] L. Liu, Y. Hua, Q. Zhao, H. Huang, A.C. Bovik, Blind image quality assessment by relative gradient statistics and adaboosting neural network, Signal Process. Image Commun. 40 (2016) 1–15.
[39] ZXing.Net, https://github.com/micjahn/ZXing.Net.
[40] P. Viola, M. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2), doi:10.1023/B:VISI.0000013087.49260.fb.
[41] Z. He, T. Tan, Z. Sun, X. Qiu, Toward accurate and fast iris segmentation for iris biometrics, IEEE Trans. Pattern Anal. Mach. Intel. 31 (9) (2009) 1670–1684.
[42] D.G. Dansereau, O. Pizarro, S.B. Williams, Decoding, calibration and rectification for lenselet-based plenoptic cameras, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2013, pp. 1027–1034.
[43] P. Zhao, B. Yu, On model selection consistency of lasso, J. Mach. Learn. Res. 7 (2006) 2541–2563.
[44] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intel. Syst. Technol. 2 (3) (2011) 27.
[45] Z. Sun, T. Tan, Ordinal measures for iris recognition, IEEE Trans. Pattern Anal. Mach. Intel. 31 (12) (2009) 2211–2226.
Chi Zhang received the B.E. degree in computer science from Southwest Jiaotong University and the Ph.D. degree in computer science from the University of Chinese Academy of Sciences (UCAS) in 2007 and 2016, respectively. He is currently an assistant professor with the Research Center of Brain-inspired Intelligence (RCBI), National Laboratory
of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China. His research interests focus on computer vision and computational
photography.
Guangqi Hou received the Ph.D. degree in Optical Engineering from Beijing Institute of Technology (BIT), in 2011. He is currently an associate professor with the Center for
Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA),
China. His research interests focus on computational optics and computational photography.
Zhaoxiang Zhang received the B.S. degree in electronic science and technology from the University of Science and Technology of China, Hefei, China, in 2004 and the Ph.D.
degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009. He is currently a Professor with
the Research Center of Brain-inspired Intelligence (RCBI), Institute of Automation, Chinese Academy of Sciences (CASIA). His current research interests include computer
vision, pattern recognition, machine learning, and brain-inspired neural network and brain-inspired learning. Dr. Zhang is an Associate Editor or a Guest Editor of some
international journals, such as Neurocomputing, Pattern Recognition Letters, and IEEE Access.
Zhenan Sun received the B.E. degree in industrial automation from Dalian University of Technology, Dalian, China, the M.S. degree in system engineering from Huazhong
University of Science and Technology, Wuhan, China, and the Ph.D. degree in pattern recognition and intelligent systems from CASIA in 1999, 2002, and 2006, respectively.
He is currently a Professor with the Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of
Automation, Chinese Academy of Sciences (CASIA), China. His current research interests include biometrics, pattern recognition, and computer vision. He is a member of the
IEEE and the IEEE Computer Society.
Tieniu Tan received the B.Sc. degree in electronic engineering from Xi’an Jiaotong University, China, in 1984, and the M.Sc. and Ph.D. degrees in electronic engineering from
Imperial College London, U.K., in 1986 and 1989, respectively. He is currently a Professor with the Center for Research on Intelligent Perception and Computing (CRIPAC),
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). His current research interests include biometrics, image
and video understanding, information hiding, and information forensics. He is a Fellow of the IEEE and the IAPR (International Association for Pattern Recognition).