
Pattern Recognition 81 (2018) 176–189


Efficient auto-refocusing for light field camera

Chi Zhang, Guangqi Hou, Zhaoxiang Zhang, Zhenan Sun∗, Tieniu Tan
Research Center of Brain-inspired Intelligence, National Laboratory of Pattern Recognition, Institute of Automation, CAS Center for Excellence in Brain
Science and Intelligence Technology, Chinese Academy of Sciences, Beijing 100190, China

Article history:
Received 31 January 2017
Revised 9 February 2018
Accepted 23 March 2018
Available online 30 March 2018

Keywords:
Detection-based focusing
Blurriness measure
Light-field photography

Abstract: Computer vision tasks prefer images focused on the relevant objects for better performance, which calls for an Auto-ReFocusing (ARF) function when using light field cameras. However, current ARF schemes are time-consuming in practice, because they commonly need to render an image sequence to find the optimally refocused frame. This paper presents an efficient ARF solution for light-field cameras based on modeling the refocusing point spread function (R-PSF). The R-PSF holds a simple linear relationship between refocusing depth and defocus blurriness. Such a linear relationship makes it possible to determine the two candidates for the optimally refocused frame from only one initial refocused image. Because our method involves only three refocusing renderings to find the optimally refocused frame, it is much more efficient than current "rendering and selection" solutions, which need to render a large number of refocused images.

© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Light field photography offers an impressive feature of rendering images refocused at a user-specified object after the light field image has been captured [1]. This feature shows a promising potential for applying light field cameras to computer vision tasks, e.g. mobile robotics, autonomous driving, biometrics, surveillance etc. In these applications, a basic request for using light field cameras is how to automatically refocus on interested objects, e.g. marks, signs, vehicles, faces, irises etc. This is essentially similar to Auto-Focus (AF) in conventional cameras, so it can be named Auto-ReFocusing (ARF).

In order to gather more light, reduce the exposure period and enhance the Signal-to-Noise Ratio (SNR), cameras for high quality image acquisition are equipped with main lenses of large aperture. However, the depth of field (DOF) is remarkably narrowed as a side-effect of a large-aperture main lens. Such a narrowed DOF exacerbates the difficulty of accurate focusing, since even slight unfocusing may lead to unacceptable defocus blur. AF actively or passively senses the depth of the interested objects and adjusts the lens to accurately focus on them, which plays a vital role in capturing high quality images. However, as its counterpart in light field cameras, ARF has not been systematically investigated, to the best of our knowledge.

Valid images for computer vision tasks should be focused at the interested objects related to their applications. For example, biometric scanners are only sensitive to the biometric modality [2,3], i.e. faces or irises; cameras for autonomous driving need to focus on vehicles, pedestrians and traffic signs. Even in the consumer imaging area, the great majority of pictures are of humans and human faces [4], which impels face-detection-based AF to be equipped as a standard feature in most consumer cameras. The significance of detection-based AF in both computer vision and consumer photography encouraged us to research the similar issue for light field cameras, i.e. detection-based ARF. Actually, detection-based ARF is equal to ARF, since focusing is meaningful only if it is oriented to focus on the valid interested objects. Thus, detection-based ARF and ARF are not distinguished in the rest of this paper.

A demonstration of ARF is shown in Fig. 1. The faces and a 2D barcode are set as the interested objects in this light field image. The purpose of ARF is to render high-quality images refocused precisely at the faces and the 2D barcode, respectively.

It is meaningless to discuss ARF without considering its efficiency, since an Exhaustive-Search ARF (ES-ARF) scheme can be easily achieved via searching the entire depth of the object space.

∗ Corresponding author.

Unfortunately, the ES-ARF approach is computationally expensive, since the complexity of digital refocusing is O(n⁴). Even though the Fourier slice refocusing algorithm can promise a complexity

Fig. 1. Demonstration of detection-based auto-refocusing (ARF). The ARF algorithms are requested to automatically render images well-focused at predefined objects.

of O(n² log n), it cannot save time unless the angular resolution is higher than 16 [5].

In this paper, the refocusing operation is considered as an elementary operation, O(1); then the computational complexity of the ES-ARF approach is O(n), where n is determined by the required density of refocusing slices. ES-ARF commonly requests too much computing capacity, which hinders it from being executed on resource-limited devices.

This paper presents an efficient ARF solution for light-field cameras based on modeling a refocusing point spread function (R-PSF). The R-PSF holds a simple linear relationship between the blurriness and the refocusing depth, which can significantly reduce the searching space of ARF from the entire refocusing space to just two optimal-focusing candidates via an absolute blurriness measure (ABM), as shown in Fig. 2.

The main contributions of this paper include: (1) introducing an efficient ARF framework based on accurate estimation of the R-PSF; (2) modelling the R-PSF and finding the linear relationship between refocusing depth and defocus blurriness in refocusing rendering; (3) constructing an absolute blurriness measure; (4) implementing an efficient ARF algorithm and evaluating the algorithm on four datasets; (5) applying the proposed ARF algorithm to iris recognition and quantifying its effectiveness and robustness via recognition scores.

This paper extends our previous work [6] by (1) verifying the versatility of the proposed ARF algorithm that was used for iris imaging; (2) optimizing the ARF algorithms on the CPU+GPU platform; (3) proposing an efficient absolute blurriness measure (ABM) that achieves a significant decrease in execution time by more than an order of magnitude; (4) introducing two novel light-field datasets (QR-Code dataset and Face dataset) and a new refocusing performance index (Right-Refocusing Rate, RRR) to evaluate the ARF algorithms; (5) updating the iris recognition scores by using the new ARF algorithm proposed in this paper.

The rest of this paper is organized as follows. Section 2 describes background and related techniques. Section 3 presents the technical details of the proposed ARF scheme and its derivation. Section 4 shows the experimental results on 4 light field datasets as well as the application to iris recognition. Section 5 concludes this paper.

2. Background

Light-field cameras are capable of recording the positions and directions of rays from scenes, adopting integral photography as the basic principle [7]. Light field photography allows a much freer photography style, and is expected to solve imaging issues such as depth extension, low illumination, accurate focusing, HDR imaging, multi-spectral imaging, depth-awareness etc. Thus, it has gained increasing attention [8–16]. It was predicted that most consumer photographic cameras will be light-field cameras in 20 years [17].

Light-field cameras can dramatically extend the DOF [8,18], which benefits many computer vision applications. Raghavendra et al. [10] and Raja et al. [11] captured a face database and an iris database using a Lytro camera, respectively. The extended DOF of the Lytro camera improves the performance of detection and recognition of irises and faces. Zhang et al. [16] developed an iris imaging system with a specially designed light-field camera and verified its superiority for resolving the trade-off between aperture size and DOF. However, all of these [10,11,16] have to render a refocused image sequence and then select the optimal one from it. Guo et al. [12] achieved a barcode reading system using a Lytro camera. They compute the optimal refocusing depth via measuring the variation of texture along a fixed direction in micro-lens sub-images and render the best refocused frame for barcode reading. However, Guo et al.'s scheme cannot be extended to refocus objects with complex textures, since it heavily depends on the special texture of 1D barcodes.

The schemes used in optical AF cannot be directly applied to ARF, although they are similar problems. In the literature of optics [4,19–21], AF can be achieved by either active [22] or passive sensing [23]. In active sensing, an infrared light or ultrasound signal is actively emitted from the camera to detect the depth of the interested objects. The focal length is then set from a lookup table depending on that depth. The most popular passive

Fig. 2. (a) shows the pipeline of auto-refocusing by exhaustive searching, and (b) is the pipeline of proposed auto-refocusing scheme. The proposed auto-refocusing scheme
can reduce the searching space of refocusing from entire depth space to only three depths, and hence improve efficiency significantly.

AF systems are based on contrast or sharpness assessment, where the sharpness of the Region Of Interest (ROI) is used to iteratively alter the focal length. Passive AF is essentially similar to the ES-ARF algorithm, the time-consuming strategy discussed above. Meanwhile, active sensing enlightens us that, if the depth of the interested objects has been estimated, the computational complexity of ARF can be decreased to O(1).

Actually, both AF and ARF are ultimately problems of depth estimation. Although light-field cameras offer an impressive ability for depth estimation [7–9,24–26], explicit depth estimation is not suitable for ARF. Depth estimation is also a time-consuming procedure. Furthermore, most depth estimation algorithms are based on epipolar geometry, which cannot achieve a robust estimation when the surface of the objects cannot be modeled as a Lambertian surface, e.g. the surface of the human iris. Defocus blur presents a robust cue of depth [27,28], which inspired us to determine the optimal refocusing depth via defocus blur.

There are three key points for achieving ARF in our scheme:

The first point is how to reduce the space for searching for the optimal refocused image, which heavily depends on the imaging model of light field cameras. We discuss ARF based on Ng's light field camera model [1] and derive the R-PSF to model the refocusing rendering. The R-PSF holds a simple linear relationship between the refocusing depth and the defocus blurriness, and it can help to significantly reduce the searching space.

The second point is how to form an absolute blurriness measure (ABM) across the variations of texture of the interested objects. The focus measures used for AF [20,28–31] can be considered relative blurriness measures (RBMs). Technically, focus measures are inversely proportional to relative blurriness measures. Although most of these measures robustly output a monotonic assessment when the image becomes sharper or blurrier and converge to a peak when the image is well focused, they hardly give an absolute blurriness measure independent of the image content. To solve this issue, we turned to the broader area of image quality assessment.

Image quality assessment (IQA) is commonly divided into three categories, i.e. Full-reference (FR), Reduced-reference (RR) and No-reference (NR), based on the amount of information about the undistorted image provided to the algorithm [32–35]. All RBMs can be considered FR–IQA: images of the same scene for assessing blurriness are mutual referents. The ABM needs to uniformly assess blurriness without a reference, so we resorted to NR–IQA. Currently, most of the state-of-the-art NR–IQAs are based on the regularity of natural scene statistics (NSS) [33,36–38]. Mittal et al. [37] propose a Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE) that utilizes an NSS model of locally normalized luminance coefficients and operates directly on the spatial pixel data for promoting efficiency. It is proved to be more accurate and efficient than other

NR–IQAs, which encouraged us to design an efficient ABM based on the BRISQUE.

The third point is how to localize the interested objects in a light field image. Raghavendra et al. [10] and Raja et al. [11] localize the interested objects in refocused images; thus they have to render an image sequence to search for the interested objects, and merge the locations in all images to determine the position. As discussed, this is a time-consuming task. A better solution [6] uses the center sub-aperture image of the light field for localization. Although the center sub-aperture image is aliased with low SNR, it is proved to be satisfactory for localization by ordinary detection algorithms. The detection algorithm plays an important role in ARF for localizing the interested objects. However, we do not discuss the details of the detection algorithms in this paper, because even ordinary algorithms can offer satisfactory accuracy for ARF. The detection algorithms involved in this paper are discussed in [39–41].

Fig. 3. The main lens is modeled as a thin lens, and the lenselets are modeled as an array of pinholes, with a similar idea to [42]. The lenselet array is fixed at the image plane. The distance between the lenselet plane and the image sensor is the focal length of the lenselet.

Fig. 4. The pointolites S and S0 are focused at the optical image plane and the virtual image plane, respectively.

3. The ARF framework and algorithm

In refocused images, blur is due to the deviation of the interested objects from the current refocusing plane, and it can be commonly modeled as

g[x] = (h(σh) ∗ p)[x] + N[x],   (1)

where ∗ denotes a convolution operator. p and g[x] stand for the interested object in the all-in-focus image and in a refocused image, respectively. N is the additive noise. h(σh) is the refocusing point spread function (R-PSF); it is generally modeled as a zero-mean Gaussian distribution and thus can be determined merely by its parameter σh. σh is a function of the refocusing depth,

σh = σh(β).   (2)

The optimal refocusing depth β0 can be used to render images accurately refocused at the interested object.

3.1. The ARF framework

ARF in essence can be abstracted as an inverse problem of estimating the optimal refocusing depth β0 from a set of observations g_βi. The β0 leads σh(β) to its minimum, and hence h(σh) gets close to a Dirac function.

Thus, according to the model of Eq. (1), a set of images refocused at arbitrary depths βi are rendered to estimate σh. Then the optimal refocusing depth can be obtained by computing the minimum of σh.

Thus, the proposed framework firstly calculates the samples' σ̂hi via

σ̂hi(βi) = ABM(g_βi[x]),  i = 1, ..., n,   (3)

where ABM is an absolute blurriness measure insensitive to the image content p, σ̂hi(βi) denotes the observed blurriness, and n is determined by the number of indeterminate parameters of σh(β). Then, the optimal refocusing depth β0 can be estimated via minimizing the objective equation

β0 = arg min [ σh(β0) + λ Σi ‖σ̂hi(βi) − σh(βi)‖² ].   (4)

The first term on the right side of Eq. (4) ensures that β0 is the minimum of σh(β), and the second term guarantees the precision of the estimation of σh(β). λ is a balance factor.

It can be inferred that the model of the R-PSF and the ABM are the two major issues in the proposed ARF framework.

3.2. The refocusing point spread function

We derived the R-PSF based on Ng's model of light field cameras [1]. Moreover, the same assumptions as in [42] are adopted, i.e. the main lens is modeled as a thin lens, and the lenselet plane as an array of pinholes. Light-field cameras can thus be modeled as shown in Fig. 3. Assume that a pointolite is set at S, as shown in Fig. 4; its image is focused on the microlens array plane, i.e. the image plane. The image distance is F. L_F^S represents the light field parameterized at the image distance F and illuminated by the pointolite set at S. Thus, L_F^S can be modeled as

L_F^S(x, u) = { (1/(2πσr²)) exp(−uᵀu/(2σr²)),  ∀x = x0;  0,  ∀x ≠ x0, }   (5)

where u = (u1, u2)ᵀ represents the angular dimension; x = (x1, x2)ᵀ represents the spatial dimension; σr is a constant once the optical parameters have been ascertained; and x0 is the position of the image of the pointolite S. Since the pointolite is well focused on the image plane, it is possible to model its disc of confusion at the sensor plane as an ideal Gaussian distribution. As is well known, the refocusing integration [1] is

E_α[L](α x_α) = (1/(α²F²)) ∫ L( u(1 − 1/α) + x_α, u ) du,   (6)

where α is the relative image distance of the virtual image plane, E_α[L] is the refocusing operator which represents refocusing the light field L at the relative image distance α, and x_α denotes the coordinate on the virtual image plane at the relative image distance α.

We use the point spread function (PSF) to describe the blurriness generated by refocusing. The PSF is defined as the intensity distribution of the defocused spot caused by a pointolite. In order

to calculate the R-PSF, the following substitution is used:

x = u(1 − 1/α) + x_α.   (7)

It is then possible to calculate E_α[L_F^S](α x_α) as:

E_α[L_F^S](α x_α) = (1/(4πα²F²(1 − 1/α)²σr²)) exp( −(x0 − x_α)ᵀ(x0 − x_α) / (2(1 − 1/α)²σr²) ).   (8)

To eliminate the scale change caused by refocusing, an integration-invariant resize operator is defined as follows:

S_η[I(x)] = η²I(ηx).   (9)

Then the R-PSF of S can be represented as

h_α^S(x) = S_{α⁻¹}[ E_α[L_F^S](α x_α) ].   (10)

h_α^S(x) is the R-PSF, which can be derived as

h_α^S(x) = (1/(2πF²σα²)) exp( −(x0 − x)ᵀ(x0 − x) / (2σα²) ),   (11)

where

σα² = (1 − 1/α)² · σr².   (12)

Then, we extend the model to more general cases. Assume that another pointolite is set at S0; its image cannot focus at the optical image plane. According to Ng [1], the original light field can be re-parameterized on a virtual image plane at the image distance α0F via

L_{α0·F}^{S0}(α0 x_{α0}, u) = L_F^{S0}( u(1 − 1/α0) + x_{α0}, u ).   (13)

Thus, the rays emitted from the pointolite at S0 can be assumed to focus at this virtual image plane, as shown in Fig. 4. Then, according to similar triangles and the fundamental equation of optics, the re-parameterized light field generated by S0 can be modeled as

L_{α0·F}^{S0}(x, u) = { (1/(2πα0²σr²)) exp(−uᵀu/(2α0²σr²)),  ∀x = x0′;  0,  ∀x ≠ x0′, }   (14)

x0′ = ((f − α0F)/(f − F)) x0,   (15)

where f is the focal length of the main lens. Since the captured light field is re-parameterized at α0F, the image refocused at the relative image distance α from the original light field can be achieved by refocusing the re-parameterized light field at α′, where α = α′ · α0, via

E_α[L_F^{S0}](α x_α) = E_{α′}[L_{α0·F}^{S0}](α′ x_{α′}).   (16)

Thus, the R-PSF of the pointolite S0 can be calculated from

h_α^{S0}(x) = S_{α⁻¹}[ E_{α′}[L_{α0·F}^{S0}](α′ x_{α′}) ].   (17)

The PSF of the pointolite S0 refocused at image distance αF can be derived as

h_α(x) = (1/(2πF²σα²)) exp( −(x0′ − x)ᵀ(x0′ − x) / (2σα²) ),   (18)

where

σα² = (1/α0 − 1/α)² · σr².   (19)

As shown in Eq. (19), the PSF of the pointolite S0 will shrink to a Dirac function when α = α0, which is the optimal solution of Eq. (4). Let β0 = α0⁻¹ and β = α⁻¹; then the parameter σh(β) of the PSF can be modeled as

σh(β) = Δβ · σr,  Δβ = |β − β0|.   (20)

Note that in Eq. (20) there is a linear relationship between the refocusing depth shift Δβ and the defocus blurriness σh(β). Such a simple linear relationship makes it possible to recognize the refocus shift Δβ by rendering only one refocused image. Then, β0 can be obtained by simply comparing the relative sharpness between the images refocused at β + Δβ and β − Δβ. The relative sharpness measure can be any monotonic blurriness measure used in AF.

In implementing the ARF algorithm, β can completely replace α, although α has an intuitive physical meaning. So β is referred to as the "refocusing depth" in the rest of this paper.

3.3. The blurriness assessment

The proposed ARF framework needs to estimate the value of σ̂hi(β) by measuring the blurriness of g[x] with little influence of p, as shown in Eq. (3). We built an Absolute Blurriness Measure (ABM) based on the BRISQUE [37]. We explored second order neighboring pixels for boosting the accuracy of blurriness assessment. The empirical distributions of pairwise products of neighboring MSCN coefficients are modeled along four orientations: horizontal (H1), vertical (V1), main-diagonal (MD1) and secondary-diagonal (SD1). In addition, we extend this to model the empirical distributions of pairwise products of second order neighboring MSCN coefficients along the four directions, denoted as H2, V2, MD2 and SD2:

H1(i, j) = Î(i, j)Î(i, j + 1),  H2(i, j) = Î(i, j − 1)Î(i, j + 1),   (21)

V1(i, j) = Î(i, j)Î(i + 1, j),  V2(i, j) = Î(i − 1, j)Î(i + 1, j),   (22)

MD1(i, j) = Î(i, j)Î(i + 1, j + 1),  MD2(i, j) = Î(i − 1, j − 1)Î(i + 1, j + 1),   (23)

SD1(i, j) = Î(i, j)Î(i + 1, j − 1),  SD2(i, j) = Î(i − 1, j + 1)Î(i + 1, j − 1),   (24)

for i ∈ {1, 2, ..., M} and j ∈ {1, 2, ..., N}. The histograms of paired products of first and second order neighboring pixels along the horizontal orientation are plotted in Fig. 5. It can be inferred from Fig. 5 that the paired products of second order neighboring pixels contain extra cues for boosting the accuracy of the blurriness measure.

The asymmetric generalized Gaussian distribution (AGGD) model is adopted to fit the distribution of the statistical relationships between neighboring pixels [37]. The AGGD with zero mode is given by:

f(x; ν, σl², σr²) = { (ν/((βl + βr)Γ(1/ν))) exp(−(−x/βl)^ν),  x < 0;  (ν/((βl + βr)Γ(1/ν))) exp(−(x/βr)^ν),  x ≥ 0, }   (25)

Fig. 5. Histograms of paired products of first order (a) and second order (b) neighboring pixels along the horizon orientation.
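The paired products plotted in Fig. 5 reduce to simple array slicing over the MSCN coefficient map. The following NumPy sketch is our own illustration of Eqs. (21)–(24); the function name and the dictionary layout are not from the paper:

```python
import numpy as np

def paired_products(mscn):
    """First- and second-order neighboring products of MSCN coefficients.

    Mirrors Eqs. (21)-(24): e.g. H1(i,j) = I(i,j)*I(i,j+1) and
    H2(i,j) = I(i,j-1)*I(i,j+1). Output arrays are slightly cropped at
    the image borders, which is irrelevant for the histogram fitting.
    """
    I = np.asarray(mscn, dtype=np.float64)
    return {
        "H1":  I[:, :-1] * I[:, 1:],      # horizontal, first order
        "H2":  I[:, :-2] * I[:, 2:],      # horizontal, second order
        "V1":  I[:-1, :] * I[1:, :],      # vertical, first order
        "V2":  I[:-2, :] * I[2:, :],      # vertical, second order
        "MD1": I[:-1, :-1] * I[1:, 1:],   # main diagonal, first order
        "MD2": I[:-2, :-2] * I[2:, 2:],   # main diagonal, second order
        "SD1": I[:-1, 1:] * I[1:, :-1],   # secondary diagonal, first order
        "SD2": I[:-2, 2:] * I[2:, :-2],   # secondary diagonal, second order
    }
```

Each of the eight product maps is then histogrammed and fitted by the AGGD of Eq. (25).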

where

βl = σl √( Γ(1/ν) / Γ(3/ν) ),  βr = σr √( Γ(1/ν) / Γ(3/ν) ).   (26)

The shape parameter ν controls the shape of the distribution, while σl² and σr² are scale parameters that control the spread on each side of the mode, respectively. The parameters (η, ν, σl², σr²) of the best AGGD fit are extracted, where η is given by:

η = (βr − βl) Γ(2/ν) / Γ(1/ν).   (27)

Thus, four parameters are computed for each paired product, i.e. 32 parameters in total. All features discussed above are extracted at two scales, i.e. the original image scale and a down-sampled image scale (low-pass filtered and downsampled by a factor of 2). Increasing the number of scales beyond 2 is observed not to enhance performance much. Thus, there are 68 features extracted for assessing blurriness.

Instead of sending these features into the regression algorithm directly, we prove that a feature weighting scheme can further improve the accuracy and universality of blurriness assessment. In general, feature selection is able to improve the accuracy of regression, reduce the interference of noisy and redundant features, as well as optimize computing efficiency. However, reducing the 68 features into a much lower feature space, e.g. lower than 30, would damage the accuracy of regression and offer a negligible efficiency improvement, since the features are computed by group, e.g. (η, ν, σl², σr²) cannot be removed independently.

The weights vector is learned from a training set in which the blurriness of the images is labeled, via solving the lasso regression as follows:

w = arg min ‖Aw − β‖₂² + λ‖w‖₁,   (28)

where A is an m × n matrix of training instances, m is the number of training samples, and n is the dimension of the features. β is the label vector. ‖·‖₁ denotes the L1-norm. Such lasso regression is a regularized version of least squares regression, which constrains the L1-norm of the vector w. In lasso regression, increasing the penalty factor λ will cause more and more of the parameters to be driven to zero; thus it tends to be applied to feature selection.

We adopted lasso regression for three reasons:

Firstly, the BRISQUE is designed for assessing the quality of distorted images across multiple distortion categories, so that its features are redundant for assessing Gaussian blurriness. To increase the correlation of these features with blurriness and remove the influence of their redundancy, the features should be modulated with weights indicating the correlation between features and blurriness. The weights can be computed via solving Eq. (28).

Secondly, the L1-norm regularization is known for its generalization and noise-proof ability [43]. For boosting generalization, we apply the L1-norm regularization rather than directly solving the least squares version.

Thirdly, the proposed ABM is expected to measure the blurriness of refocused images across larger appearance variations. We prove in Section 4 that the BRISQUE can be affected by appearance variance and its accuracy decreases when simultaneously evaluating the blurriness of different objects, e.g. the face and the 2D barcode in the experiment. The weights from the lasso regression solution enhance the features related to blurriness and weaken those related to appearance.

In solving the lasso regression, λ is a factor for adjusting the balance between generalization performance and the accuracy of regression. A larger λ gives a sparser solution of w, which is preferred in dealing with high dimensional data as it selects the most representative features. It also decreases the accuracy of regression, since there is a larger bias from the least squares solution. A smaller λ is also undesired, as it degrades to the least squares solution and leads to overfitting on the training set. Experiments prove that λ = 0.01 is appropriate.

Finally, a regression model for assessing blurriness from the weighted features can be trained by two representative machine learning models: (1) support vector machine regression (SVM-R) and (2) AdaBoosting back-propagation neural network (AB-BPNN). SVM-R is widely used in IQA tasks [33,36,37]. AB-BPNN is proved effective in IQA recently [38]. We applied the libSVM package to implement the SVM regression algorithm [44]. The radial basis function (RBF) kernel is adopted in this paper. We implemented AB-BPNN based on the OG-IQA package. The parameters of both the SVM regression and the AB-BPNN are estimated using cross-validation on the training set.

3.4. The ARF algorithm

The proposed ARF algorithm is demonstrated in Algorithm 1. The pipeline for processing the light-field image, based on the proposed ARF algorithm, is shown in Fig. 6.

4. Experiments

We did experiments on four datasets to evaluate the performance of the proposed ARF.

Fig. 6. Flowchart of processing a light-field face image via the proposed ARF algorithm.
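In code, the ARF procedure of Algorithm 1 boils down to three renderings and a handful of blurriness measurements. The following Python sketch is our paraphrase, not the authors' implementation: `render`, `abm` and `rbm` are assumed callables standing for the refocusing renderer, the trained absolute blurriness measure and any monotonic relative blurriness measure, and `sigma_r` is the camera-dependent constant of Eq. (20):

```python
def auto_refocus(light_field, render, abm, rbm, sigma_r, beta_init):
    """Sketch of Algorithm 1 (our paraphrase, not the authors' code).

    render(light_field, beta) -> image refocused at depth beta
    abm(image)                -> absolute blurriness (trained regressor)
    rbm(image)                -> any monotonic relative blurriness measure
    sigma_r                   -> camera-dependent constant of Eq. (20)
    """
    # Steps 1-2: one initial rendering and its absolute blurriness.
    g_init = render(light_field, beta_init)
    sigma_i = abm(g_init)
    # Step 3: Eq. (20) gives the magnitude of the depth shift, not its sign.
    delta = sigma_i / sigma_r
    # Step 4: render the two candidate optimally refocused images.
    candidates = {
        beta_init + delta: render(light_field, beta_init + delta),
        beta_init - delta: render(light_field, beta_init - delta),
    }
    # Step 5: keep the candidate with the lower relative blurriness.
    beta_0 = min(candidates, key=lambda b: rbm(candidates[b]))
    return beta_0, candidates[beta_0]
```

Only three renderings are performed in total, versus the n renderings of ES-ARF.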

Algorithm 1 An algorithm for auto-refocusing.

Input: LF(u, x)
1. Render the initial image g_βI[x] at an arbitrary depth βI.
2. Compute σI via σI(βI) = ABM(g_βI[x]) with the trained SVR regression model.
3. Compute Δβ = σI · σr⁻¹.
4. Render the two candidate optimal images g_(βI+Δβ)[x] and g_(βI−Δβ)[x].
5. Determine the optimal depth β0 via β0 = arg min { RBM(g_(βI+Δβ)), RBM(g_(βI−Δβ)) }.
Output: β0 and g_β0[x].

4.1. Datasets acquisition

We built four representative datasets for testing the proposed ARF algorithm. They are a QR-code dataset, a face dataset, an iris dataset and a blended dataset.

4.1.1. QR-code dataset

Barcode scanners are fundamentally low-cost cameras [12], and are limited by the well-known tradeoff between noise and blur. Barcode reading can represent a series of mark-reading tasks, i.e. brands, badges, traffic signs, auto plates, etc., which encouraged us to choose barcodes for evaluation. We captured a light-field QR-code (abbreviated from Quick Response Code) dataset for the evaluation of the proposed ARF algorithm. Compared to the 1D barcode selected by Guo et al. [12], the QR-code has a much more complex texture and thus cannot be handled by the scheme proposed in [12].

This QR-code dataset is captured by a Lytro Illum camera; the focal length of the main lens is set at 200 mm and the aperture size is fixed at f/2. The QR-code is positioned at 950 mm to 1750 mm from the camera with an incremental step of 100 mm. The camera is adjusted to focus at 1250 mm. The QR-codes are generated by the ZXing API [39]. The QR-code is rotated by 0, 90, 180 and 270 degrees for augmenting the variance. Finally, we captured a total of 540 light field images of QR-codes for this dataset. The QR-code imaging installation and a sub-aperture image are shown in Fig. 7.

4.1.2. Face dataset

Note that the great majority of pictures taken by consumer cameras or cell-phone cameras are of humans or human faces. Conventional camera manufacturers have started introducing a face-priority AF feature which detects faces in the scene and focuses on the face areas. A face-priority ARF is likewise expected for light field cameras. Thus, we constructed a multi-face light field dataset oriented to rendering high quality images of faces positioned at different depths.

We used a Lytro Illum camera, and set its main-lens focal length at 240 mm and aperture size at f/2. In each image, two faces are arranged at different depths. There are 6 candidate positions ranging from 2.50 m to 7.50 m with an incremental step of 1 m. Thus, every 2 faces can generate 30 images for all permutations. The dataset has 450 light field images of faces in total. The scene for capturing the face dataset is exhibited in Fig. 8.

4.1.3. Iris dataset

Finally, we applied the proposed ARF algorithm to iris recognition, and we evaluated the quality of the ARF images via the performance of iris recognition. Indeed, ARF for iris imaging is a quite challenging and convincing task, because iris recognition is known as a texture sharpness-demanding application.

The dataset in [16] is adopted to verify our ARF algorithm. In this database, 14 subjects participated in the collection of light-field iris images. The distance between the iris and the light-field camera is continuously varied. This database includes over 2000 iris lenselet images. A close-up view of a sample is shown in Fig. 9. The accuracy of iris recognition is used to evaluate the performance of the proposed ARF algorithm.

4.1.4. Blended dataset

ARF is expected to adapt to multiple objects. Thus, it is necessary to evaluate the ABM on multiple objects, especially objects with entirely different appearances. In the experiment, a dataset blending the QR-code dataset and the face dataset is used to evaluate the general applicability of the proposed ABM model.

4.2. Preprocessing

The 2D raw lenselet image should firstly be decoded to form a 4D light-field representation. We adopted Dansereau et al.'s [42] LFtoolbox0.3 for decoding the light field images captured by the Lytro Illum camera. As mentioned above, the raw images in the QR-code dataset and the face dataset are decoded into 4D light fields with a resolution of 15 × 15 × 434 × 625. We developed tools for light field iris acquisition. The raw light-field images are decoded into 4D light fields with a resolution of 9 × 9 × 403 × 268. As in our previous version [6], the raw 4D light fields are interpolated to increase the spatial resolution by a factor of 2, e.g. the resolution of the iris light-field images becomes 9 × 9 × 806 × 536.

As discussed above, we applied the center sub-aperture image of a light field to localize the interested objects. The ZXing toolbox [39], the Viola-Jones face detector [40] and He's iris localizing scheme [41] are applied to localize the QR-code regions, faces and irises, respectively. The localization results can be shared with all refocused images rendered from the same light field. Notice that the sub-aperture image can be considered as imaging the light rays through a

Fig. 7. (a) The installation for capturing the QR-code dataset; (b) a sub-aperture image of a light field.

Fig. 8. (a) The installation for capturing the faces dataset; (b) a sub-aperture image of a light field.
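The decoding step in Section 4.2 produces a 4D light field indexed by two angular and two spatial coordinates; the center-aperture image used for localization is simply the central angular slice. The sketch below is illustrative only (it is not LFtoolbox code): the (u, v, s, t) array layout is an assumption, and the toy array is a downscaled stand-in for the decoded 15 × 15 × 434 × 625 resolution mentioned in the text.

```python
import numpy as np

def center_aperture_view(light_field):
    """Extract the central sub-aperture image from a 4D light field.

    The light field is assumed to be indexed as (u, v, s, t): the two
    angular dimensions first, then the two spatial dimensions.
    """
    n_u, n_v = light_field.shape[:2]
    return light_field[n_u // 2, n_v // 2]

# Downscaled stand-in for the decoded 15 x 15 x 434 x 625 light field.
lf = np.zeros((15, 15, 43, 62), dtype=np.float32)
print(center_aperture_view(lf).shape)  # -> (43, 62)
```

The same slice can then be fed to the QR-code, face or iris localizer, and the detected regions reused for every refocused rendering of that light field.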

digital reduced aperture [1], and its DOF is equal to the upper limit of the extended DOF of the light field camera. Although the sub-aperture image is a grainy image with low SNR, it proved to be good enough for localizing the interested objects in the experiments.

Fig. 9. A close-up view of the lenslet iris image.

For the QR-code dataset, we rendered 41 images refocused over the range from β = 0.600 to β = 1.400 with an increment step of 0.025. The depth range and the increment step assure that there is at least one image accurately refocused at the QR-code in this image sequence. A similar criterion is applied to the face dataset and the iris dataset: we rendered 41 images refocused over the range from β = 0.600 to β = 1.400 with an increment step of 0.025, and 39 images are rendered from an iris light field over the range from β = 0.500 to β = 1.475 with a constant step of 0.025.

4.3. Depth estimation from blurriness

In ARF, a depth-from-blurriness model is fitted on a rendered image to predict the depth shift from the current rendering depth to the optimal refocusing depth. Such a depth-from-blurriness model plays a vital role in the proposed ARF algorithm. Thus, we evaluated this model at the beginning of the experiments.

Since the refocusing depth shift Δβ is linear in the blurriness σh, an SVM regression model can be trained to project the blurriness features directly to the depth shift, i.e. to predict the depth shift Δβ = |β0 − β| from the current refocusing depth β to the optimal refocusing depth β0.

As mentioned above, the refocused images in the four datasets, i.e. the QR-code, faces, iris and blended datasets, are labeled with the depth shift Δβ: if an image is refocused at depth β and the optimal refocusing depth is β0, then the image is labeled as Δβ = |β0 − β|. We compared the blurriness measures on all four datasets. The compared measures include RBM (variance in a 7 × 7 window [20]), DIIVINE [33], Zhang14 [6], BRISQUE [37], OG-IQA [38], weighted features with SVM-R (WF+SVM-R) and weighted features with AB-BPNN (WF+AB-BPNN).

The Spearman's Rank Ordered Correlation Coefficient (SROCC) between the Δβ predicted by the depth-from-blurriness regression algorithm and the Δβ labeled from the averaged people's opinion (AO) is used to evaluate the performance of these blurriness measures. A value close to 1.000 for SROCC indicates good correlation with the labeled depth. The performance indices of these blurriness measures are tabulated in Table 1. Meanwhile, cumulative error distribution curves are applied to visualize the comparison of regression accuracy, as shown in Fig. 10.

Fig. 10. (a)–(d) compare the cumulative error distribution curves among the related blurriness measures evaluated on the QR-code dataset, the face dataset, the iris dataset and the blended dataset, respectively.

Table 1
Performance of regression. The larger SROCC indicates the better accuracy.

Method            QR-code   Faces     Iris      Blend
(1) RBM [20]      0.9272    0.9015    0.9154    0.8075
(2) DIIVINE [33]  0.9484    0.9440    0.9732    0.9471
(3) Zhang14 [6]   0.9630    0.9719    0.9733    0.9579
(4) BRISQUE [37]  0.9679    0.9610    0.9659    0.9479
(5) OG-IQA [38]   0.9692    0.9702    0.9687    0.9585
(6) WF+SVM-R      0.9785    0.9862    0.9610    0.9802
(7) WF+AB-BPNN    0.9805    0.9859    0.9726    0.9822

There are some observations:

(1) The RBM generally maintains much worse performance than the ABMs, especially on the dataset blended with QR-codes and faces. As discussed above, the RBM is easily influenced by the image content, so objects with large appearance variation would remarkably decrease the accuracy of RBM-based depth regression. Such a defect impedes the use of RBM for ARF, although RBM is the fastest scheme.

(2) The schemes of Zhang14, WF+SVM-R and WF+AB-BPNN using the weighted features generally obtain better performance than the schemes using unweighted features, DIIVINE and BRISQUE. Notice that the weighted features can effectively improve the performance of blurriness assessment.

(3) There is a noticeable degradation of accuracy when BRISQUE is evaluated on the blended dataset. Such degradation proves that BRISQUE can be interfered with by the appearance variation of the interested objects. Meanwhile, the weights learned from lasso regression can weaken the interference of the appearance variation and assist WF+SVM-R and WF+AB-BPNN in performing better than the other regression schemes on the blended dataset.

(4) WF+AB-BPNN performs slightly better than WF+SVM-R, since it is driven by a more powerful regressor. However, the trained AB-BPNN has 10 neural networks as its weak regressors and takes over 20 times longer to assess a query image than SVM-R. Since efficiency has a higher priority in implementing an ARF algorithm, WF+SVM-R is chosen as the ABM in the following experiments.

4.4. ARF evaluation

It is necessary to evaluate the quality of the refocused images rendered by the proposed ARF scheme. We compared the images rendered by the proposed ARF algorithm and the images exhaustively searched (ES-ARF) from the entire refocusing depth space.

We first evaluated the qualitative results. The images in Figs. 11–13 show the comparison of the initial images, the proposed ARF images and the ES-ARF images. Close-up views are offered for better observation.

Fig. 11. Comparison of qualitative results on the QR-code dataset.

Fig. 12. Comparison of qualitative results on the face dataset.

Fig. 13. Comparison of qualitative results on the iris dataset.

In addition, we evaluated the quantitative performance of ARF. The optimal refocused image selected by Average Opinions (AO) is considered as ground truth. The Structural Similarity Index Measurement (SSIM) is applied to evaluate the optimal refocused images rendered by ARF. ES-ARF can robustly select the optimal refocused images, thus its performance is considered as the baseline. The cumulative SSIM distribution curves are shown in Fig. 14.

In the experiment, the SSIM of the optimal refocused images selected by ES-ARF distributes from 0.95 to 1.00 with respect to the images selected by AO, as shown in Fig. 14. Also, it is hard for human vision to discern the difference between two images if their SSIM is larger than 0.95. Thus, we assume that an image can be considered as rightly refocused if its SSIM compared to the ground truth is larger than 0.95.

In addition, we defined a new index to quantitatively evaluate the performance of ARF algorithms, named the Right-Refocusing Rate (RRR). The RRR is defined as the percentage of rightly refocused images among the total rendered images. The RRRs computed on the four datasets are shown in Table 2.

Table 2
Refocused image quality assessment.

Method            QR-code   Faces     Iris      Blend
(1) Init          0.2469    0.2375    0.2381    0.2465
(2) Proposed ARF  0.9468    0.9376    0.9720    0.9410
(3) ES-ARF        0.9907    1.0000    0.9964    0.9975

Meanwhile, the initial images are rendered over the entire depth range to verify the effectiveness and robustness of the proposed ARF algorithm. The SSIM distribution and RRR of the initial image set are shown in Fig. 14 and Table 2 as a reference for comparing the differences in image quality.

From Fig. 14 and Table 2, it is convincing to conclude that the proposed ARF algorithm can effectively render images refocused at the interested objects, since the lowest RRR of the image sets rendered by ARF is over 0.93 among the four datasets, while the RRRs of the initial

Fig. 14. (a)–(d) are the cumulative SSIM distribution curves of the rendered image sets evaluated on the QR-code dataset, the face dataset, the iris dataset and the blended dataset, respectively. INIT denotes the set of initial images, ARF denotes the set of images rendered by the proposed ARF scheme and ES denotes the set of images rendered by the ES-ARF scheme.
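The right-refocusing criterion behind these curves can be sketched in a few lines of Python. The snippet below is a minimal illustration, not the paper's implementation: it computes a simplified single-window SSIM (standard practice applies SSIM over local windows and averages the map) and then the Right-Refocusing Rate as the fraction of rendered images whose SSIM against the ground truth exceeds 0.95. The constants c1 and c2 follow the common SSIM defaults for images in [0, 1], and the toy image data is random.

```python
import numpy as np

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM computed over the whole image as one global window."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def right_refocusing_rate(rendered, ground_truth, threshold=0.95):
    """Fraction of rendered images whose SSIM exceeds the 0.95 threshold."""
    scores = [global_ssim(img, ground_truth) for img in rendered]
    return sum(s > threshold for s in scores) / len(scores)

rng = np.random.default_rng(0)
gt = rng.random((64, 64))                    # stand-in ground-truth image
sharp = gt + 0.001 * rng.random((64, 64))    # nearly identical -> SSIM close to 1
blurred = rng.random((64, 64))               # unrelated -> SSIM far below 0.95
print(right_refocusing_rate([sharp, blurred], gt))  # -> 0.5
```

With this criterion, one correctly refocused frame and one missed frame yield an RRR of 0.5, matching how the per-dataset RRRs in Table 2 are accumulated.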

image set are all lower than 0.25. It is a prominent improvement in image quality when the proposed ARF algorithm is applied. A similar conclusion can be observed from Fig. 14, where the SSIMs of the initial image set are approximately scattered from 0.75 to 1.00, but the SSIMs of most images rendered by ARF are gathered above 0.95.

4.5. ARF efficiency

We analysed the efficiency of the proposed ARF algorithm in two ways. All of the operations are tested on a PC which is equipped with a 3.6 GHz processor and 8 GB RAM and houses a Nvidia GTX660 GPU. Its software environment is Windows 7 and Matlab 2010b. The executing time is listed in Table 3.

Table 3
Comparison of executing time of the ARF algorithms.

Method                            Executing time (ms)   Handler
(1) ES-ARF (806 × 536)            17000+                CPU
(2) ES-ARF (806 × 536)            1503                  CPU+GPU
(3) ARF [6] (806 × 536)           2966                  CPU
(4) ARF [6] (806 × 536)           933                   CPU+GPU
(5) The proposed ARF (806 × 536)  146                   CPU+GPU

Table 4
Performance of iris recognition. The iris image set corresponding to the larger DI and smaller EER contains the sharper iris images.

Method                        DI        EER
(1) IRII                      2.6981    0.0324
(2) ORII-AO                   4.0305    0.0084
(3) Proposed ARF-βI = 1.000   4.0224    0.0081
(4) Proposed ARF-βI = random  4.0635    0.0083
(5) ORII-Raja [11]            4.0140    0.0076

We first compared the computing complexity of the proposed ARF algorithm with that of the ES-ARF algorithm. The refocusing rendering tends to be a time-consuming operation, so the exhaustive searching scheme has to take a large amount of time to render an image sequence refocused over the entire depth space. The computing complexity of the ES-ARF algorithm is O(n), where n is determined by the number of images in this sequence, and n should assure that there is at least one image in the sequence accurately refocused at the interested object. In the experiments, n is set to 41 for the QR-code dataset and the face dataset and to 33 for the iris dataset. Meanwhile, the computing complexity of the proposed ARF algorithm is O(1 + 2m), where m is determined by the number of interested objects in a light field image. For example, the computing complexity is O(3) for the QR-code dataset and the iris dataset and O(5) for the faces dataset. Hence, the proposed ARF scheme is theoretically much more efficient than the ES-ARF scheme in most cases.

Compared to our previous paper [6], this new version concentrates on enhancing the practical efficiency of the proposed ARF on the CPU+GPU platform. The conference version proposed a blurriness assessment based on statistically modeling images in the wavelet domain, which consumes a large amount of time for the wavelet transformation [6]. When we optimize the ARF algorithm on the CPU+GPU platform, the ABM in the wavelet domain consumes so much time that it takes over 80% of the elapsed time of the ARF algorithm [6]. For example, the ABM in the wavelet domain takes 750 ms to determine the optimum refocusing depth [6], while the refocusing algorithm on the GPU only takes approximately 45 ms to render a 806 × 536 refocused image.
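The complexity comparison above can be made concrete with a small counting sketch (illustrative only, not the paper's code): ES-ARF renders one image per sampled depth, while the proposed scheme renders one initial frame plus two candidate frames per interested object.

```python
def es_arf_renders(n_depths):
    """ES-ARF: one refocusing rendering per sampled depth -> O(n)."""
    return n_depths

def proposed_arf_renders(n_objects):
    """Proposed ARF: 1 initial rendering + 2 candidates per object -> O(1 + 2m)."""
    return 1 + 2 * n_objects

# Values used in the experiments: n = 41 (QR-code/face) or 33 (iris); m >= 1.
print(es_arf_renders(41), proposed_arf_renders(1), proposed_arf_renders(2))
# -> 41 3 5
```

Since rendering dominates the run time, the ratio of these two counts (e.g. 41 versus 3) is the source of the order-of-magnitude speed-up reported in Table 3.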

Fig. 15. (a)–(f) illustrate the sample images refocused at β = 1.000 (IRII). (g)–(l) display the images rendered by the proposed ARF algorithm with the initial images of (a)–(f) (ARF-βI = 1.000).

As shown in Table 3, the exhaustive search scheme (ES-ARF) consumes over 17000 ms on the CPU platform, which can be optimized to 1503 ms on the CPU+GPU platform. The ARF scheme in our previous paper [6] offers only a slight improvement (933 ms) on the CPU+GPU platform compared to ES-ARF. In this paper, the novel ABM in the spatial domain consumes only 45 ms to determine the optimum refocusing depth. The novel ABM leads to a significant decrease in the average executing time of the proposed ARF algorithm, to 146 ms on the CPU+GPU platform from the 2966 ms of our previous paper.
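The iris-recognition experiments that follow report the Discriminating Index (DI) and the Equal Error Rate (EER). The snippet below sketches the standard textbook definitions of these two metrics on invented matching scores; it is not the paper's evaluation code. DI is the normalized separation between the genuine and impostor score distributions, and the EER is the error rate at the operating point where the false accept and false reject rates meet.

```python
import statistics as st

def discriminating_index(genuine, impostor):
    """DI (d'): |mean difference| over the square root of the averaged variances."""
    mg, mi = st.mean(genuine), st.mean(impostor)
    vg, vi = st.variance(genuine), st.variance(impostor)
    return abs(mg - mi) / ((vg + vi) / 2) ** 0.5

def equal_error_rate(genuine, impostor):
    """EER for distance-like scores (genuine scores should be small).

    Sweeps the observed thresholds and returns (FAR + FRR) / 2 at the
    threshold where |FAR - FRR| is minimal.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s <= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s > t for s in genuine) / len(genuine)     # genuines rejected
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return (best[0] + best[1]) / 2

# Toy Hamming-distance-like scores: well separated, so the EER is 0.0.
genuine = [0.10, 0.12, 0.15, 0.18, 0.20]
impostor = [0.40, 0.42, 0.45, 0.47, 0.50]
print(discriminating_index(genuine, impostor), equal_error_rate(genuine, impostor))
```

A sharper iris image set shifts the genuine distribution away from the impostor one, which raises DI and lowers EER, exactly the trend Table 4 shows between the blurred IRII set and the ARF sets.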

4.6. Application to iris recognition

The iris dataset in [16] is adopted to verify our ARF algorithm for iris recognition. In this database, 14 subjects participated in the collection of light-field iris images. One iris is captured in each light-field iris image. The distance between the iris and the light-field camera changes continuously. A sample is shown in Fig. 9.

The quality of iris images can be best evaluated by their contribution to iris recognition performance, so the performance of an ARF algorithm can be demonstrated with the accuracy of iris recognition on the ARF iris images. Hence, we organized five image sets for evaluating the proposed ARF algorithm in the iris recognition task. We compare iris recognition performance on these image sets: (1) Initially Refocused Iris Images (IRII) at β = 1.000; (2) Optimally Refocused Iris Images (ORII) selected by the human visual system of averaged opinion (AO) (ORII-AO); (3) auto-refocused iris images with initial refocusing depth βI = 1.000 (ARF-βI = 1.000); (4) auto-refocused iris images with random initial refocusing depth (ARF-βI = random); (5) optimally refocused iris images selected by exhaustive searching from the entire depth space, using the focusing assessment proposed by Raja et al. [11] (ORII-Raja).

The dataset selected for this experiment includes the iris light fields with β0 ≠ 1.000. If we refocus these iris light fields at β = 1.000, the images are blurred. These blurred images are collected as a reference to quantify the degradation of using blurred iris images for recognition, denoted as (IRII).

The set denoted (ARF-βI = 1.000) includes the iris images rendered by the proposed ARF algorithm with initial refocusing depth at β = 1.000. In addition, to verify that the proposed ARF algorithm can robustly converge to the optimal refocusing depth independent of the initial refocusing depth, we rendered another set of refocused iris images with random initial refocusing depths, denoted as (ARF-βI = random).

Finally, we used the iris images selected by AO, i.e. Δβ = 0 or β = β0, as the baseline to compare the performance of each set of iris images, denoted as (ORII-AO). The iris images in the set (ORII-AO) can be considered as rendered at the optimal refocusing depth.

The applied iris recognition algorithm is based on ordinal measures (OM) [45], which are state-of-the-art descriptors of iris texture. The Equal Error Rate (EER), the Discriminating Index (DI) and the Receiver Operating Characteristic (ROC) curve [45] are used to measure the accuracy of iris recognition with the different ARF methods.

Fig. 16. The ROC curves. The better recognition performance corresponds to the ROC curve getting closer to the horizontal axis.

From the results shown in Table 4 and Fig. 16, several observations can be summarized:

(1) Iris recognition is an image sharpness-demanding application, which can be inferred from the remarkably low performance on the set (IRII).

(2) Both sets rendered by the proposed ARF algorithm, i.e. βI = 1.000 and βI = random, can guarantee high performance in iris recognition, because their scores shown in Table 4 have higher DI and lower EER. Meanwhile, as shown in Fig. 16, the ROC curves of (ARF-βI = 1.000) and (ARF-βI = random) are in good agreement with the curve of (ORII-AO), which is considered as rendered at the optimal refocusing depth. The images rendered by the proposed ARF algorithm are shown in Fig. 15.

(3) The proposed ARF algorithm is verified to be robust to the initial refocusing depth, since the performances of the sets βI = 1.000 and βI = random are dramatically consistent, as shown by the ROC curves in Fig. 16.

(4) The method (ORII-Raja) has a negligible performance advantage over the methods (ARF-βI = 1.000) and (ARF-βI = random), at the cost of much more computing time.

5. Conclusions

In this paper, we presented an efficient solution for ARF, which is a basic feature for light field cameras. We introduced an ARF framework based on modeling the R-PSF and found a simple linear relationship in the R-PSF. This linear relationship simplifies the computational complexity of ARF and enables building an efficient ARF algorithm that estimates the optimal refocusing depth from only one refocused image via a proposed ABM. We tested the proposed ARF algorithm on four datasets and applied it to iris imaging tasks. The experimental results show that the proposed ARF algorithm decreases the executing time by more than an order of magnitude compared to the current "rendering and selection" solutions. Meanwhile, the proposed ARF algorithm achieves comparable performance in accuracy and robustness.

In the future, we would like to implement and evaluate the proposed ARF algorithm on a large-scale dataset with a variety of interested objects, which is unavailable now. At the same time, we will replace the SVM-R model with more powerful machine learning models, e.g. deep learning models, to benefit from the large-scale data.

Acknowledgments

The authors would like to thank the associate editors and the reviewers for their valuable comments and advice. This work is funded by the National Natural Science Foundation of China (Grant Nos. 61602481, 61573360) and the National Natural Science Foundation of China Major Instrument Special Fund (Grant No. 61427811). This work is also supported by the Open Research Fund of the Key Laboratory of Spectral Imaging Technology, Chinese Academy of Sciences.

References

[1] R. Ng, Digital Light Field Photography, Ph.D. thesis, 2006.
[2] G. Guo, M. Jones, P. Beardsley, A system for automatic iris capturing, Mitsubishi Electric Research Laboratories TR2005-044, 2005.
[3] W. Dong, Z. Sun, T. Tan, A design of iris recognition system at a distance, in: Proceedings of the Chinese Conference on Pattern Recognition, 2009, pp. 1–5.
[4] M. Rahman, N. Kehtarnavaz, Real-time face-priority auto focus for digital and cell-phone cameras, IEEE Trans. Consum. Electron. 54 (4) (2008) 1506–1513.
[5] R. Ng, Fourier slice photography, ACM Trans. Graph. 24 (2005) 735–744.
[6] C. Zhang, G. Hou, Z. Sun, T. Tan, Efficient auto-refocusing of iris images for light-field cameras, in: Proceedings of the IEEE International Joint Conference on Biometrics, 2014, pp. 1–7, doi:10.1109/BTAS.2014.6996295.
[7] E.H. Adelson, J.Y. Wang, Single lens stereo with a plenoptic camera, IEEE Trans. Pattern Anal. Mach. Intel. 14 (2) (1992) 99–106.
[8] T.E. Bishop, P. Favaro, The light field camera: extended depth of field, aliasing, and superresolution, IEEE Trans. Pattern Anal. Mach. Intel. 34 (5) (2012).
[9] S. Wanner, B. Goldluecke, Variational light field analysis for disparity estimation and super-resolution, IEEE Trans. Pattern Anal. Mach. Intel. 36 (3).
[10] R. Raghavendra, B. Yang, K.B. Raja, C. Busch, A new perspective face recognition with light-field camera, in: Proceedings of the International Conference on Biometrics, 2013, pp. 1–8.
[11] K. Raja, R. Raghavendra, F. Cheikh, B. Yang, C. Busch, Robust iris recognition using light-field camera, in: Proceedings of the Colour and Visual Computing Symposium, 2013, pp. 1–6.
[12] X. Guo, H. Lin, Z. Yu, S. McCloskey, Barcode imaging using a light field camera, in: L. Agapito, M.M. Bronstein, C. Rother (Eds.), Computer Vision - ECCV 2014 Workshops, Lecture Notes in Computer Science, 8926, Springer International Publishing, 2015, pp. 519–532, doi:10.1007/978-3-319-16181-5_40.
[13] R. Raghavendra, K.B. Raja, C. Busch, Presentation attack detection for face recognition using light field camera, IEEE Trans. Image Process. 24 (3) (2015).
[14] A. Ghasemi, M. Vetterli, Detecting planar surface using a light-field camera with application to distinguishing real scenes from printed photos, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014, pp. 4588–4592, doi:10.1109/ICASSP.2014.6854471.
[15] A. Ghasemi, M. Vetterli, Scale-invariant representation of light field images for object recognition and tracking, in: Proceedings of the IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, 2014, p. 902015.
[16] C. Zhang, G. Hou, Z. Sun, T. Tan, Z. Zhou, Light field photography for iris image acquisition, in: Proceedings of the Chinese Conference on Biometric Recognition, 2013, pp. 345–352.
[17] M. Levoy, Light fields and computational imaging, IEEE Comput. 39 (8) (2006).
[18] T. Georgiev, A. Lumsdaine, Depth of field in plenoptic cameras, in: Proceedings of Eurographics, 2009.
[19] C.M. Chen, C.M. Hong, H.C. Chuang, Efficient auto-focus algorithm utilizing discrete difference equation prediction model for digital still cameras, IEEE Trans. Consum. Electron. 52 (4) (2006) 1135–1143.
[20] M. Subbarao, T.S. Choi, A. Nikzad, Focusing techniques, Opt. Eng. 32 (11) (1993) 2824–2836.
[21] E. Krotkov, Focusing, Int. J. Comput. Vis. 1 (3) (1988) 223–237.
[22] N. Kehtarnavaz, H.J. Oh, Development and real-time implementation of a rule-based auto-focus algorithm, Real Time Imaging 9 (3) (2003) 197–203.
[23] J.H. Lee, K.S. Kim, B.D. Nam, J.C. Lee, Y.M. Kwon, H.G. Kim, Implementation of a passive automatic focusing algorithm for digital still camera, IEEE Trans. Consum. Electron. 41 (3) (1995) 449–454, doi:10.1109/30.468047.
[24] M.W. Tao, S. Hadap, J. Malik, R. Ramamoorthi, Depth from combining defocus and correspondence using light-field cameras, in: Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2013, pp. 673–680.
[25] H.G. Jeon, J. Park, G. Choe, J. Park, Y. Bok, Y.W. Tai, I.S. Kweon, Accurate depth map estimation from a lenslet light field camera, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1547–1555.
[26] H. Sheng, P. Zhao, S. Zhang, J. Zhang, D. Yang, Occlusion-aware depth estimation for light field using multi-orientation EPIs, Pattern Recognit. 74 (2018) 587–599.
[27] P. Favaro, S. Soatto, A geometric approach to shape from defocus, IEEE Trans. Pattern Anal. Mach. Intel. 27 (3) (2005) 406–417.
[28] S.K. Nayar, Y. Nakagawa, Shape from focus, IEEE Trans. Pattern Anal. Mach. Intel. 16 (8) (1994) 824–831.
[29] S. Pertuz, D. Puig, M.A. Garcia, Analysis of focus measure operators for shape-from-focus, Pattern Recognit. 46 (2013) 1415–1432.
[30] C. Zhou, D. Miau, S.K. Nayar, Focal sweep camera for space-time refocusing.
[31] A. Kumar, N. Ahuja, A generative focus measure with application to omnifocus imaging, in: Proceedings of the IEEE International Conference on Computational Photography, 2013, pp. 1–8, doi:10.1109/ICCPhot.2013.6528295.
[32] Z. Wang, A.C. Bovik, Modern image quality assessment, Synth. Lect. Image Video Multimed. Process. 2 (1) (2006) 1–156.
[33] A.K. Moorthy, A.C. Bovik, Blind image quality assessment: from natural scene statistics to perceptual quality, IEEE Trans. Image Process. 20 (12) (2011).
[34] K.H. Thung, R. Paramesran, C.L. Lim, Content-based image quality metric using similarity measure of moment vectors, Pattern Recognit. 45 (2012) 2193–2204.
[35] W. Sun, F. Zhou, Q. Liao, MDID: a multiply distorted image database for image quality assessment, Pattern Recognit. 61 (2017) 153–168.
[36] M. Saad, A. Bovik, C. Charrier, Blind image quality assessment: a natural scene statistics approach in the DCT domain, IEEE Trans. Image Process. 21 (8) (2012) 3339–3352, doi:10.1109/TIP.2012.2191563.
[37] A. Mittal, A. Moorthy, A. Bovik, No-reference image quality assessment in the spatial domain, IEEE Trans. Image Process. 21 (12) (2012) 4695–4708.
[38] L. Liu, Y. Hua, Q. Zhao, H. Huang, A.C. Bovik, Blind image quality assessment by relative gradient statistics and adaboosting neural network, Signal Process. Image Commun. 40 (2016) 1–15.
[40] P. Viola, M. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2), doi:10.1023/B:VISI.0000013087.49260.fb.
[41] Z. He, T. Tan, Z. Sun, X. Qiu, Toward accurate and fast iris segmentation for iris biometrics, IEEE Trans. Pattern Anal. Mach. Intel. 31 (9) (2009) 1670–1684.
[42] D.G. Dansereau, O. Pizarro, S.B. Williams, Decoding, calibration and rectification for lenselet-based plenoptic cameras, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE Computer Society, 2013, pp. 1027–1034.
[43] P. Zhao, B. Yu, On model selection consistency of lasso, J. Mach. Learn. Res. 7 (2006) 2541–2563.
[44] C.C. Chang, C.J. Lin, LIBSVM: a library for support vector machines, ACM Trans. Intel. Syst. Technol. 2 (3) (2011) 27.
[45] Z. Sun, T. Tan, Ordinal measures for iris recognition, IEEE Trans. Pattern Anal. Mach. Intel. 31 (12) (2009) 2211–2226.


Chi Zhang received the B.E. degree in computer science from Southwest Jiaotong University and the Ph.D. degree in computer science from the University of Chinese Academy of Sciences (UCAS) in 2007 and 2016, respectively. He is currently an assistant professor with the Research Center of Brain-inspired Intelligence (RCBI), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), China. His research interests focus on computer vision and computational photography.

Guangqi Hou received the Ph.D. degree in Optical Engineering from Beijing Institute of Technology (BIT), in 2011. He is currently an associate professor with the Center for
Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA),
China. His research interests focus on computational optics and computational photography.

Zhaoxiang Zhang received the B.S. degree in electronic science and technology from the University of Science and Technology of China, Hefei, China, in 2004 and the Ph.D.
degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009. He is currently a Professor with
the Research Center of Brain-inspired Intelligence (RCBI), Institute of Automation, Chinese Academy of Sciences (CASIA). His current research interests include computer
vision, pattern recognition, machine learning, and brain-inspired neural networks and learning. Dr. Zhang is an Associate Editor or a Guest Editor of several international journals, such as Neurocomputing, Pattern Recognition Letters, and IEEE Access.

Zhenan Sun received the B.E. degree in industrial automation from Dalian University of Technology, Dalian, China, the M.S. degree in system engineering from Huazhong
University of Science and Technology, Wuhan, China, and the Ph.D. degree in pattern recognition and intelligent systems from CASIA in 1999, 2002, and 2006, respectively.
He is currently a Professor with the Center for Research on Intelligent Perception and Computing (CRIPAC), National Laboratory of Pattern Recognition (NLPR), Institute of
Automation, Chinese Academy of Sciences (CASIA), China. His current research interests include biometrics, pattern recognition, and computer vision. He is a member of the
IEEE and the IEEE Computer Society.

Tieniu Tan received the B.Sc. degree in electronic engineering from Xi’an Jiaotong University, China, in 1984, and the M.Sc. and Ph.D. degrees in electronic engineering from
Imperial College London, U.K., in 1986 and 1989, respectively. He is currently a Professor with the Center for Research on Intelligent Perception and Computing (CRIPAC),
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA). His current research interests include biometrics, image
and video understanding, information hiding, and information forensics. He is a Fellow of IEEE and the IAPR (International Association of Pattern Recognition).