Computer Vision: Principles, Algorithms, Applications, Learning


1,788 pages
Published: Nov 15, 2017


Computer Vision: Principles, Algorithms, Applications, Learning (previously entitled Computer and Machine Vision) clearly and systematically presents the basic methodology of computer vision, covering the essential elements of the theory while emphasizing algorithmic and practical design constraints. This fully revised fifth edition has brought in more of the concepts and applications of computer vision, making it a very comprehensive and up-to-date text suitable for undergraduate and graduate students, researchers and R&D engineers working in this vibrant subject.

See an interview with the author explaining his approach to teaching and learning computer vision - http://scitechconnect.elsevier.com/computer-vision/

  • Three new chapters on Machine Learning emphasise the way the subject has been developing: two cover Basic Classification Concepts and Probabilistic Models, and the third covers the principles of Deep Learning Networks and shows their impact on computer vision, reflected in a new chapter on Face Detection and Recognition.
  • A new chapter on Object Segmentation and Shape Models reflects the methodology of machine learning and gives practical demonstrations of its application.
  • In-depth discussions have been included on geometric transformations, the EM algorithm, boosting, semantic segmentation, face frontalisation, RNNs and other key topics.
  • Examples and applications—including the location of biscuits, foreign bodies, faces, eyes, road lanes, surveillance, vehicles and pedestrians—give the ‘ins and outs’ of developing real-world vision systems, showing the realities of practical implementation.
  • Necessary mathematics and essential theory are made approachable by careful explanations and well-illustrated examples.
  • The ‘recent developments’ sections included in each chapter aim to bring students and practitioners up to date with this fast-moving subject.
  • Tailored programming examples—code, methods, illustrations, tasks, hints and solutions (mainly involving MATLAB and C++).

About the Author

Roy Davies is Emeritus Professor of Machine Vision at Royal Holloway, University of London. He has worked on many aspects of vision, from feature detection to robust, real-time implementations of practical vision tasks. His interests include automated visual inspection, surveillance, vehicle guidance, crime detection and neural networks. He has published more than 200 papers and three books. Machine Vision: Theory, Algorithms, Practicalities (1990) has been widely used internationally for more than 25 years, and is now out in this much enhanced fifth edition. Roy holds a DSc from the University of London, and has been awarded Distinguished Fellow of the British Machine Vision Association and Fellow of the International Association of Pattern Recognition.


Computer Vision

Principles, Algorithms, Applications, Learning

Fifth Edition

E.R. Davies

Royal Holloway, University of London, United Kingdom

Table of Contents

Cover image

Title page



About the Author


Preface to the Fifth Edition

Preface to the First Edition


Topics Covered in Application Case Studies

Influences Impinging Upon Integrated Vision System Design

Glossary of Acronyms and Abbreviations

Chapter 1. Vision, the challenge


1.1 Introduction—Man and His Senses

1.2 The Nature of Vision

1.3 From Automated Visual Inspection to Surveillance

1.4 What This Book Is About

1.5 The Part Played by Machine Learning

1.6 The Following Chapters

1.7 Bibliographical Notes

Part 1: Low-level vision

Chapter 2. Images and imaging operations


2.1 Introduction

2.2 Image Processing Operations

2.3 Convolutions and Point Spread Functions

2.4 Sequential Versus Parallel Operations

2.5 Concluding Remarks

2.6 Bibliographical and Historical Notes

2.7 Problems

Chapter 3. Image filtering and morphology


3.1 Introduction

3.2 Noise Suppression by Gaussian Smoothing

3.3 Median Filters

3.4 Mode Filters

3.5 Rank Order Filters

3.6 Sharp–Unsharp Masking

3.7 Shifts Introduced by Median Filters

3.8 Shifts Introduced by Rank Order Filters

3.9 The Role of Filters in Industrial Applications of Vision

3.10 Color in Image Filtering

3.11 Dilation and Erosion in Binary Images

3.12 Mathematical Morphology

3.13 Morphological Grouping

3.14 Morphology in Grayscale Images

3.15 Concluding Remarks

3.16 Bibliographical and Historical Notes

3.17 Problems

Chapter 4. The role of thresholding


4.1 Introduction

4.2 Region-Growing Methods

4.3 Thresholding

4.4 Adaptive Thresholding

4.5 More Thoroughgoing Approaches to Threshold Selection

4.6 The Global Valley Approach to Thresholding

4.7 Practical Results Obtained Using the Global Valley Method

4.8 Histogram Concavity Analysis

4.9 Concluding Remarks

4.10 Bibliographical and Historical Notes

4.11 Problems

Chapter 5. Edge detection


5.1 Introduction

5.2 Basic Theory of Edge Detection

5.3 The Template Matching Approach

5.4 Theory of 3×3 Template Operators

5.5 The Design of Differential Gradient Operators

5.6 The Concept of a Circular Operator

5.7 Detailed Implementation of Circular Operators

5.8 The Systematic Design of Differential Edge Operators

5.9 Problems With the Above Approach—Some Alternative Schemes

5.10 Hysteresis Thresholding

5.11 The Canny Operator

5.12 The Laplacian Operator

5.13 Concluding Remarks

5.14 Bibliographical and Historical Notes

5.15 Problems

Chapter 6. Corner, interest point, and invariant feature detection


6.1 Introduction

6.2 Template Matching

6.3 Second-Order Derivative Schemes

6.4 A Median Filter–based Corner Detector

6.5 The Harris Interest Point Operator

6.6 Corner Orientation

6.7 Local Invariant Feature Detectors and Descriptors

6.8 Concluding Remarks

6.9 Bibliographical and Historical Notes

6.10 Problems

Chapter 7. Texture analysis


7.1 Introduction

7.2 Some Basic Approaches to Texture Analysis

7.3 Graylevel Co-occurrence Matrices

7.4 Laws’ Texture Energy Approach

7.5 Ade’s Eigenfilter Approach

7.6 Appraisal of the Laws and Ade Approaches

7.7 Concluding Remarks

7.8 Bibliographical and Historical Notes

Part 2: Intermediate-level vision

Chapter 8. Binary shape analysis


8.1 Introduction

8.2 Connectedness in Binary Images

8.3 Object Labeling and Counting

8.4 Size Filtering

8.5 Distance Functions and Their Uses

8.6 Skeletons and Thinning

8.7 Other Measures for Shape Recognition

8.8 Boundary Tracking Procedures

8.9 Concluding Remarks

8.10 Bibliographical and Historical Notes

8.11 Problems

Chapter 9. Boundary pattern analysis


9.1 Introduction

9.2 Boundary Tracking Procedures

9.3 Centroidal Profiles

9.4 Problems With the Centroidal Profile Approach

9.5 The (s,ψ) Plot

9.6 Tackling the Problems of Occlusion

9.7 Accuracy of Boundary Length Measures

9.8 Concluding Remarks

9.9 Bibliographical and Historical Notes

9.10 Problems

Chapter 10. Line, circle, and ellipse detection


10.1 Introduction

10.2 Application of the Hough Transform to Line Detection

10.3 The Foot-of-Normal Method

10.4 Using RANSAC for Straight Line Detection

10.5 Location of Laparoscopic Tools

10.6 Hough-Based Schemes for Circular Object Detection

10.7 The Problem of Unknown Circle Radius

10.8 Overcoming the Speed Problem

10.9 Ellipse Detection

10.10 Human Iris Location

10.11 Concluding Remarks

10.12 Bibliographical and Historical Notes

10.13 Problems

Chapter 11. The generalized Hough transform


11.1 Introduction

11.2 The Generalized Hough Transform

11.3 The Relevance of Spatial Matched Filtering

11.4 Gradient Weighting Versus Uniform Weighting

11.5 Use of the GHT for Ellipse Detection

11.6 Comparing the Various Methods for Ellipse Detection

11.7 A Graph-Theoretic Approach to Object Location

11.8 Possibilities for Saving Computation

11.9 Using the GHT for Feature Collation

11.10 Generalizing the Maximal Clique and Other Approaches

11.11 Search

11.12 Concluding Remarks

11.13 Bibliographical and Historical Notes

11.14 Problems

Chapter 12. Object segmentation and shape models


12.1 Introduction

12.2 Active Contours

12.3 Practical Results Obtained Using Active Contours

12.4 The Level-Set Approach to Object Segmentation

12.5 Shape Models

12.6 Concluding Remarks

12.7 Bibliographical and Historical Notes

Part 3: Machine learning and deep learning networks

Chapter 13. Basic classification concepts


13.1 Introduction

13.2 The Nearest Neighbor Algorithm

13.3 Bayes’ Decision Theory

13.4 Relation of the Nearest Neighbor and Bayes’ Approaches

13.5 The Optimum Number of Features

13.6 Cost Functions and Error–Reject Tradeoff

13.7 Supervised and Unsupervised Learning

13.8 Cluster Analysis

13.9 The Support Vector Machine

13.10 Artificial Neural Networks

13.11 The Back-Propagation Algorithm

13.12 Multilayer Perceptron Architectures

13.13 Overfitting to the Training Data

13.14 Concluding Remarks

13.15 Bibliographical and Historical Notes

13.16 Problems

Chapter 14. Machine learning: Probabilistic methods


14.1 Introduction

14.2 Mixtures of Gaussians and the EM Algorithm

14.3 A More General View of the EM Algorithm

14.4 Some Practical Examples

14.5 Principal Components Analysis

14.6 Multiple Classifiers

14.7 The Boosting Approach

14.8 Modeling AdaBoost

14.9 Loss Functions for Boosting

14.10 The LogitBoost Algorithm

14.11 The Effectiveness of Boosting

14.12 Boosting with Multiple Classes

14.13 The Receiver Operating Characteristic

14.14 Concluding Remarks

14.15 Bibliographical and Historical Notes

14.16 Problems

Chapter 15. Deep-learning networks


15.1 Introduction

15.2 Convolutional Neural Networks

15.3 Parameters for Defining CNN Architectures

15.4 LeCun et al.’s LeNet Architecture

15.5 Krizhevsky et al.’s AlexNet Architecture

15.6 Zeiler and Fergus’s Work on CNN Architectures

15.7 Zeiler and Fergus’s Visualization Experiments

15.8 Simonyan and Zisserman’s VGGNet Architecture

15.9 Noh et al.’s DeconvNet Architecture

15.10 Badrinarayanan et al.’s SegNet Architecture

15.11 Recurrent Neural Networks

15.12 Concluding Remarks

15.13 Bibliographical and Historical Notes

Part 4: 3D vision and motion

Chapter 16. The three-dimensional world


16.1 Introduction

16.2 Three-Dimensional Vision—The Variety of Methods

16.3 Projection Schemes for Three-Dimensional Vision

16.4 Shape from Shading

16.5 Photometric Stereo

16.6 The Assumption of Surface Smoothness

16.7 Shape from Texture

16.8 Use of Structured Lighting

16.9 Three-Dimensional Object Recognition Schemes

16.10 Horaud’s Junction Orientation Technique

16.11 An Important Paradigm—Location of Industrial Parts

16.12 Concluding Remarks

16.13 Bibliographical and Historical Notes

16.14 Problems

Chapter 17. Tackling the perspective n-point problem


17.1 Introduction

17.2 The Phenomenon of Perspective Inversion

17.3 Ambiguity of Pose Under Weak Perspective Projection

17.4 Obtaining Unique Solutions to the Pose Problem

17.5 Concluding Remarks

17.6 Bibliographical and Historical Notes

17.7 Problems

Chapter 18. Invariants and perspective


18.1 Introduction

18.2 Cross Ratios: The Ratio of Ratios Concept

18.3 Invariants for Noncollinear Points

18.4 Invariants for Points on Conics

18.5 Differential and Semidifferential Invariants

18.6 Symmetric Cross-Ratio Functions

18.7 Vanishing Point Detection

18.8 More on Vanishing Points

18.9 Apparent Centers of Circles and Ellipses

18.10 Perspective Effects in Art and Photography

18.11 Concluding Remarks

18.12 Bibliographical and Historical Notes

18.13 Problems

Chapter 19. Image transformations and camera calibration


19.1 Introduction

19.2 Image Transformations

19.3 Camera Calibration

19.4 Intrinsic and Extrinsic Parameters

19.5 Correcting for Radial Distortions

19.6 Multiple View Vision

19.7 Generalized Epipolar Geometry

19.8 The Essential Matrix

19.9 The Fundamental Matrix

19.10 Properties of the Essential and Fundamental Matrices

19.11 Estimating the Fundamental Matrix

19.12 An Update on the Eight-Point Algorithm

19.13 Image Rectification

19.14 3-D Reconstruction

19.15 Concluding Remarks

19.16 Bibliographical and Historical Notes

19.17 Problems

Chapter 20. Motion


20.1 Introduction

20.2 Optical Flow

20.3 Interpretation of Optical Flow Fields

20.4 Using Focus of Expansion to Avoid Collision

20.5 Time-to-Adjacency Analysis

20.6 Basic Difficulties with the Optical Flow Model

20.7 Stereo from Motion

20.8 The Kalman Filter

20.9 Wide Baseline Matching

20.10 Concluding Remarks

20.11 Bibliographical and Historical Notes

20.12 Problem

Part 5: Putting computer vision to work

Chapter 21. Face detection and recognition: The impact of deep learning


21.1 Introduction

21.2 A Simple Approach to Face Detection

21.3 Facial Feature Detection

21.4 The Viola–Jones Approach to Rapid Face Detection

21.5 The Eigenface Approach to Face Recognition

21.6 More on the Difficulties of Face Recognition

21.7 Frontalization

21.8 The Sun et al. DeepID Face Representation System

21.9 Fast Face Detection Revisited

21.10 The Face as Part of a 3-D Object

21.11 Concluding Remarks

21.12 Bibliographical and Historical Notes

Chapter 22. Surveillance


22.1 Introduction

22.2 Surveillance—The Basic Geometry

22.3 Foreground–Background Separation

22.4 Particle Filters

22.5 Use of Color Histograms for Tracking

22.6 Implementation of Particle Filters

22.7 Chamfer Matching, Tracking, and Occlusion

22.8 Combining Views from Multiple Cameras

22.9 Applications to the Monitoring of Traffic Flow

22.10 License Plate Location

22.11 Occlusion Classification for Tracking

22.12 Distinguishing Pedestrians by Their Gait

22.13 Human Gait Analysis

22.14 Model-based Tracking of Animals

22.15 Concluding Remarks

22.16 Bibliographical and Historical Notes

22.17 Problem

Chapter 23. In-vehicle vision systems


23.1 Introduction

23.2 Locating the Roadway

23.3 Location of Road Markings

23.4 Location of Road Signs

23.5 Location of Vehicles

23.6 Information Obtained by Viewing License Plates and Other Structural Features

23.7 Locating Pedestrians

23.8 Guidance and Egomotion

23.9 Vehicle Guidance in Agriculture

23.10 Concluding Remarks

23.11 More Detailed Developments and Bibliographies Relating to Advanced Driver Assistance Systems

23.12 Problem

Chapter 24. Epilogue—Perspectives in vision


24.1 Introduction

24.2 Parameters of Importance in Machine Vision

24.3 Tradeoffs

24.4 Moore’s Law in Action

24.5 Hardware, Algorithms, and Processes

24.6 The Importance of Choice of Representation

24.7 Past, Present, and Future

24.8 The Deep Learning Explosion

24.9 Bibliographical and Historical Notes

Appendix A. Robust statistics

A.1 Introduction

A.2 Preliminary Definitions and Analysis

A.3 The M-Estimator (Influence Function) Approach

A.4 The Least Median of Squares Approach to Regression

A.5 Overview of the Robustness Problem

A.6 The RANSAC Approach

A.7 Concluding Remarks

A.8 Bibliographical and Historical Notes

A.9 Problems

Appendix B. The sampling theorem

B.1 The Sampling Theorem

Appendix C. The representation of color

C.1 Introduction

C.2 Details of the HSI Color Representation

C.3 A Typical Example of the Use of Color

C.4 Bibliographical and Historical Notes

Appendix D. Sampling from distributions

D.1 Introduction

D.2 The Box–Muller and Related Methods

D.3 Bibliographical and Historical Notes




Academic Press is an imprint of Elsevier

125 London Wall, London EC2Y 5AS, United Kingdom

525 B Street, Suite 1800, San Diego, CA 92101-4495, United States

50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2018 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).


Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-809284-2

For Information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Mara Conner

Acquisition Editor: Tim Pitts

Editorial Project Manager: Charlotte Kent

Production Project Manager: Sruthi Satheesh

Cover Designer: Greg Harris

Typeset by MPS Limited, Chennai, India


This book is dedicated to my family.

To my late mother, Mary Davies, to record her never-failing love and devotion.

To my late father, Arthur Granville Davies, who passed on to me his appreciation of the beauties of mathematics and science.

To my wife, Joan, for love, patience, support, and inspiration.

To my children, Elizabeth, Sarah, and Marion, the music in my life.

To my grandchildren, Jasper, Jerome, Eva, and Tara, for constantly reminding me of the carefree joys of youth!

About the Author

Roy Davies is Emeritus Professor of Machine Vision at Royal Holloway, University of London, United Kingdom. He has worked on many aspects of vision, from feature detection and noise suppression to robust pattern matching and real-time implementations of practical vision tasks. His interests include automated visual inspection, surveillance, vehicle guidance, and crime detection. He has published more than 200 papers and three books—Machine Vision: Theory, Algorithms, Practicalities (1990), Electronics, Noise and Signal Recovery (1993), and Image Processing for the Food Industry (2000); the first of these has been widely used internationally for more than 25 years, and is now out in this much enhanced fifth edition. Roy is a fellow of the IoP and the IET, and a senior member of the IEEE. He is on the Editorial Boards of Pattern Recognition Letters, Real-Time Image Processing, Imaging Science, and IET Image Processing. He holds a DSc from the University of London; he was awarded Distinguished Fellow of the British Machine Vision Association in 2005, and Fellow of the International Association of Pattern Recognition in 2008.


Foreword

Mark S. Nixon, University of Southampton, Southampton, United Kingdom

It is an honor to write a foreword for Roy Davies’ new edition of Computer and Machine Vision, now entitled Computer Vision: Principles, Algorithms, Applications, Learning. This is one of the major books in Computer Vision and not just for its longevity, having now reached its Fifth Edition. It is actually a splendid achievement to reach this status and it reflects not only on the tenacity and commitment of its author, but also on the achievements of the book itself.

Computer Vision has shown awesome progress in its short history. This is in part due to technology: computers are much faster and memory is now much cheaper than they were in the early days when Roy started his research. There have been many achievements and many developments. All of this can affect the evolution of a textbook. There have been excellent textbooks in the past, which were neither continued nor maintained. That has been avoided here, as the textbook has continued to mature with the field and its many developments.

We can look forward to a future where automated computer vision systems will make our lives easier while enriching them too. There are already many applications of Computer Vision in the food industry, and robotic cars will be with us very soon. Then there are continuing advancements in medical image analysis, where Computer Vision techniques can be used to aid diagnosis and therapy by automated means. Even accessing a mobile phone is considerably more convenient when using a fingerprint, and access by face recognition continues to improve. These have all come about due to advancements in computers, Computer Vision, and applied artificial intelligence.

Adherents of Computer Vision will know it to be an exciting field indeed. It manages to cover many aspects of technology, from human vision to machine learning, requiring electronic hardware, computer implementations, and a lot of computer software. Roy continues to cover these in excellent detail.

I remember the First Edition when it was first published in 1990 with its unique and pragmatic blend of theory, implementation, and algorithms. I am pleased to see that the Fifth Edition maintains this unique approach, much appreciated by students in previous editions who wanted an accessible introduction to Computer Vision. It has certainly increased in size with age, and that is often the way with books. It is most certainly the way with Computer Vision since many of its researchers continue to improve, refine, and develop new techniques.

A major change here is the inclusion of Deep Learning. Indeed, this has been a major change in the field of Computer Vision and Pattern Recognition. One implication of the increase in computing power and the reduction of memory cost is that techniques can become considerably more complex, and that complexity lends itself to application in the analysis of big data. One cannot ignore the performance of deep learning and convolutional neural networks: one only has to peruse the program of top international conferences to perceive their revolutionary effect on research direction. Naturally, it is early days but it is good to have guidance as we have here. The nature of performance is always in question in any system in artificial intelligence and part of the way to answer those questions is to consider more deeply the architectures and their basis. That again is the function of a textbook for it is the distillation of research and practice in a ratiocinated exposition. It is a brave move to include Deep Learning in this edition, but a necessary one.

And what of Roy Davies himself? Following his DPhil in Solid State Physics at Oxford, he later developed a new sensitive method in Nuclear Resonance called Davies-ENDOR (Electron and Nuclear Double Resonance) which avoided the blind spots of its predecessor Mims-ENDOR. In 1970 he was appointed as a lecturer at Royal Holloway and a long series of publications in pattern recognition and its applications led to the award of his Personal Chair, his DSc and then the Distinguished Fellow of the British Machine Vision Association (BMVA), 2005. He has served the BMVA in many ways, latterly editing its Newsletter. Clearly the level of his work and his many contacts and papers have contributed much to the material that is found herein.

I look forward to having this Fifth Edition sitting proudly on my shelf, replacing the Fourth, which will in turn pass to one of my students' shelves. It will not stop there for long, for it is one of the textbooks I often turn to for the information I need. Unlike the snapshots to be found on the Web, in a textbook I find the information placed in context and in sequence, with extension to other material. That is the function of a textbook, and it will be well served by this Fifth Edition.

July 2017

Preface to the Fifth Edition

Roy Davies, Royal Holloway, University of London, United Kingdom

The first edition of this book came out in 1990, and was welcomed by many researchers and practitioners. However, in the subsequent two decades the subject moved on at a rapidly accelerating rate, and many topics that hardly deserved a mention in the first edition had to be solidly incorporated into subsequent editions. For example, it seemed particularly important to bring in significant amounts of new material on feature detection, mathematical morphology, texture analysis, inspection, artificial neural networks, 3D vision, invariance, motion analysis, object tracking, and robust statistics. And in the fourth edition, cognizance had to be taken of the widening range of applications of the subject: in particular, two chapters had to be added on surveillance and in-vehicle vision systems. Since then, the subject has not stood still. In fact, the past four or five years have seen the onset of an explosive growth in research on deep neural networks, and the practical achievements resulting from this have been little short of staggering. It soon became abundantly clear that the fifth edition would have to reflect this radical departure—both in fundamental explanation and in practical coverage. Indeed, it necessitated a new part in the book—Part 3, Machine Learning and Deep Learning Networks—a heading which affirms that the new content reflects not only Deep Learning (a huge enhancement over the older Artificial Neural Networks) but also an approach to pattern recognition that is based on rigorous probabilistic methodology.

All this is not achieved without presentation problems: for probabilistic methodology can only be managed properly within a rather severe mathematical environment. Too little maths, and the subject could be so watered down as to be virtually content-free: too much maths, and many readers might not be able to follow the explanations. Clearly, one should not protect readers from the (mathematical) reality of the situation. Hence, Chapter 14 had to be written in such a way as to demonstrate in full what type of methodology is involved, while providing paths that would take readers past some of the mathematical complexities—at least, on first encounter. Once past the relatively taxing Chapter 14, Chapters 15 and 21 take the reader through two accounts consisting largely of case studies, the former through a crucial development period (2012–2015) for deep learning networks, and the latter through a similar period (2013–2016) during which deep learning was targeted strongly at face detection and recognition, enabling remarkable advances to be made. It should not go unnoticed that these additions have so influenced the content of the book that the title had to be modified to reflect them. Interestingly, the organization of the book was further modified by collecting three applications chapters into the new Part 5, Putting Computer Vision to Work.

It is worth remarking that, at this point in time, computer vision has attained a level of maturity that has made it substantially more rigorous, reliable, generic, and—in the light of the improved hardware facilities now available for its implementation (in particular, extremely powerful GPUs)—capable of real-time performance. This means that workers are more than ever before using it in serious applications, and with fewer practical difficulties. It is intended that this edition of the book will reflect this radically new and exciting state of affairs at a fundamental level.

A typical final-year undergraduate course on vision for Electronic Engineering and Computer Science students might include much of the work of Chapters 1–13 and Chapter 16, plus a selection of sections from other chapters, according to requirements. For MSc or PhD research students, a suitable lecture course might go on to cover Parts 3 or 4 in depth, and several of the chapters in Part 5, with many practical exercises being undertaken on image analysis systems. (The importance of the appendix on robust statistics should not be underestimated once one gets onto serious work, though this will probably be outside the restrictive environment of an undergraduate syllabus.) Here much will depend on the research programme being undertaken by each individual student. At this stage the text may have to be used more as a handbook for research, and indeed, one of the prime aims of the volume is to act as a handbook for the researcher and practitioner in this important area.

As mentioned in the original Preface, this book leans heavily on experience I have gained from working with postgraduate students: in particular, I would like to express my gratitude to Mark Edmonds, Simon Barker, Daniel Celano, Darrel Greenhill, Derek Charles, Mark Sugrue, and Georgios Mastorakis, all of whom have in their own ways helped to shape my view of the subject. In addition, it is a pleasure to recall very many rewarding discussions with my colleagues Barry Cook, Zahid Hussain, Ian Hannah, Dev Patel, David Mason, Mark Bateman, Tieying Lu, Adrian Johnstone, and Piers Plummer, the last two of whom were particularly prolific in generating hardware systems for implementing my research group’s vision algorithms. Next, I would like to record my thanks to my British Machine Vision Association colleagues for many wide-ranging discussions on the nature of the subject: in particular, I am hugely grateful to Majid Mirmehdi, Adrian Clark, Neil Thacker, and Mark Nixon, who, over time, have strongly influenced the development of the book and left a permanent mark on it. Next, I would like to thank the anonymous reviewers for making insightful comments and what have turned out to be extremely valuable suggestions. Finally, I am indebted to Tim Pitts of Elsevier Science for his help and encouragement, without which this fifth edition might never have been completed.

Supporting materials:

Elsevier’s website for the book contains programming and other resources to help readers and students using this text. Please check the publisher’s website for further information: https://www.elsevier.com/books-and-journals/book-companion/9780128092842.

Preface to the First Edition

Over the past 30 years or so, machine vision has evolved into a mature subject embracing many topics and applications: these range from automatic (robot) assembly to automatic vehicle guidance, from automatic interpretation of documents to verification of signatures, and from analysis of remotely sensed images to checking of fingerprints and human blood cells; currently, automated visual inspection is undergoing very substantial growth, necessary improvements in quality, safety, and cost-effectiveness being the stimulating factors. With so much ongoing activity, it has become a difficult business for the professional to keep up with the subject and with relevant methodologies: in particular, it is difficult for them to distinguish accidental developments from genuine advances. It is the purpose of this book to provide background in this area.

The book was shaped over a period of 10–12 years, through material I have given on undergraduate and postgraduate courses at London University, and contributions to various industrial courses and seminars. At the same time, my own investigations coupled with experience gained while supervising PhD and postdoctoral researchers helped to form the state of mind and knowledge that is now set out here. Certainly it is true to say that if I had had this book 8, 6, 4, or even 2 years ago, it would have been of inestimable value to myself for solving practical problems in machine vision. It is therefore my hope that it will now be of use to others in the same way. Of course, it has tended to follow an emphasis that is my own—and in particular one view of one path towards solving automated visual inspection and other problems associated with the application of vision in industry. At the same time, although there is a specialism here, great care has been taken to bring out general principles—including many applying throughout the field of image analysis. The reader will note the universality of topics such as noise suppression, edge detection, principles of illumination, feature recognition, Bayes’ theory, and (nowadays) Hough transforms. However, the generalities lie deeper than this. The book has aimed to make some general observations and messages about the limitations, constraints, and tradeoffs to which vision algorithms are subject. Thus there are themes about the effects of noise, occlusion, distortion, and the need for built-in forms of robustness (as distinct from less successful ad hoc varieties and those added on as an afterthought); there are also themes about accuracy, systematic design, and the matching of algorithms and architectures. Finally, there are the problems of setting up lighting schemes which must be addressed in complete systems, yet which receive scant attention in most books on image processing and analysis. 
These remarks will indicate that the text is intended to be read at various levels—a factor that should make it of more lasting value than might initially be supposed from a quick perusal of the contents.

Of course, writing a text such as this presents a great difficulty in that it is necessary to be highly selective: space simply does not allow everything in a subject of this nature and maturity to be dealt with adequately between two covers. One solution might be to dash rapidly through the whole area mentioning everything that comes to mind, but leaving the reader unable to understand anything in detail or to achieve anything having read the book. However, in a practical subject of this nature this seemed to me a rather worthless extreme. It is just possible that the emphasis has now veered too much in the opposite direction, by coming down to practicalities (detailed algorithms, details of lighting schemes, and so on): individual readers will have to judge this for themselves. On the other hand, an author has to be true to himself and my view is that it is better for a reader or student to have mastered a coherent series of topics than to have a mishmash of information that he is later unable to recall with any accuracy. This, then, is my justification for presenting this particular material in this particular way and for reluctantly omitting from detailed discussion such important topics as texture analysis, relaxation methods, motion, and optical flow.

As for the organization of the material, I have tried to make the early part of the book lead into the subject gently, giving enough detailed algorithms (especially in Chapter 2: Images and imaging operations and Chapter 6: Corner, interest point, and invariant feature detection) to provide a sound feel for the subject—including especially vital, and in their own way quite intricate, topics such as connectedness in binary images. Hence Part I provides the lead-in, although it is not always trivial material and indeed some of the latest research ideas have been brought in (e.g., on thresholding techniques and edge detection). Part II gives much of the meat of the book. Indeed, the (book) literature of the subject currently has a significant gap in the area of intermediate-level vision; while high-level vision (AI) topics have long caught the researcher’s imagination, intermediate-level vision has its own difficulties which are currently being solved with great success (note that the Hough transform, originally developed in 1962, and by many thought to be a very specialist topic of rather esoteric interest, is arguably only now coming into its own). Part II and the early chapters of Part III aim to make this clear, while Part IV gives reasons why this particular transform has become so useful. As a whole, Part III aims to demonstrate some of the practical applications of the basic work covered earlier in the book, and to discuss some of the principles underlying implementation: it is here that chapters on lighting and hardware systems will be found. As there is a limit to what can be covered in the space available, there is a corresponding emphasis on the theory underpinning practicalities. Probably this is a vital feature, since there are many applications of vision both in industry and elsewhere, yet listing them and their intricacies risks dwelling on interminable detail, which some might find insipid; furthermore, detail has a tendency to date rather rapidly. 
Although the book could not cover 3D vision in full (this topic would easily consume a whole volume in its own right), a careful overview of this complex mathematical and highly important subject seemed vital. It is therefore no accident that Chapter 16, The three-dimensional world, is the longest in the book. Finally, Part IV asks questions about the limitations and constraints of vision algorithms and answers them by drawing on information and experience from earlier chapters. It is tempting to call the last chapter the Conclusion. However, in such a dynamic subject area any such temptation has to be resisted, although it has still been possible to draw a good number of lessons on the nature and current state of the subject. Clearly, this chapter presents a personal view but I hope it is one that readers will find interesting and useful.


The author would like to credit the following sources for permission to reproduce tables, figures, and extracts of text from earlier publications:


For permission to reprint portions of the following papers from Image and Vision Computing as text in Chapter 5; as Tables 5.1–5.5; and as Figs. 3.31, 5.2:

Davies (1984b, 1987b)

For permission to reprint portions of the following paper from Pattern Recognition as text in Chapter 8; and as Fig. 8.11:

Davies and Plummer (1981)

For permission to reprint portions of the following papers from Pattern Recognition Letters as text in Chapters 3, 5, 10, 11, 13; as Tables 3.2, 10.4, 11.1; and as Figs. 3.6, 3.8, 3.10, 5.1, 5.3, 10.1, 10.10, 10.11, 10.12, 10.13, 11.1, 11.3, 11.4, 11.5, 11.6, 11.7, 11.8, 11.9, 11.10, 11.11:

Davies (1986, 1987a,c,d, 1988b,c,e, 1989a)

For permission to reprint portions of the following paper from Signal Processing as text in Chapter 3; and as Figs. 3.15, 3.17, 3.18, 3.19, 3.20:

Davies (1989b)

For permission to reprint portions of the following paper from Advances in Imaging and Electron Physics as text in Chapter 3:

Davies (2003c)

For permission to reprint portions of the following article from Encyclopedia of Physical Science and Technology as Figs. 8.9, 8.12, 9.1, 9.4:

Davies, E.R., 1987. Visual inspection, automatic (robotics). In: Meyers, R.A. (Ed.) Encyclopedia of Physical Science and Technology, vol. 14. Academic Press, San Diego, pp. 360–377.


For permission to reprint portions of the following paper as text in Chapter 3; and as Figs. 3.4, 3.5, 3.7, 3.11:

Davies (1984a)


For permission to reprint portions of the following papers from the IET Proceedings and Colloquium Digests as text in Chapters 3, 4, 6, 13, 21, 22, 23; as Tables 3.3, 4.2; and as Figs. 3.21, 3.28, 3.29, 4.6, 4.7, 4.8, 4.9, 4.10, 6.5, 6.6, 6.7, 6.8, 6.9, 6.12, 11.20, 14.16, 14.17, 22.16, 22.17, 22.18, 23.1, 23.3, 23.4:

Davies (1988a, 1999c, 2000a, 2005, 2008)

Sugrue and Davies (2007)

Mastorakis and Davies (2011)

Davies et al. (1998)

Davies et al. (2003)

IFS Publications Ltd

For permission to reprint portions of the following paper as text in Chapters 12, 20; and as Figs. 10.7, 10.8:

Davies (1984c)

The Royal Photographic Society

For permission to reprint portions of the following papers (see also the Maney website: www.maney.co.uk/journals/ims) as text in Chapter 3; and as Figs. 3.12, 3.13, 3.22, 3.23, 3.24:

Davies (2000c)

Charles and Davies (2004)


For permission to reprint portions of the following papers as text in Chapter 6; and as Figs. 6.2, 6.4:

Davies (1988d), Figs. 1–3

World Scientific

For permission to reprint portions of the following book as text in Chapters 7, 22, 23; and as Figs. 3.25, 3.26, 3.27, 5.4, 22.20, 23.15, 23.16:

Davies, 2000. Image Processing for the Food Industry. World Scientific, Singapore.

The Committee of the Alvey Vision Club

To acknowledge that extracts of text in Chapter 11 and Figs. 11.12, 11.13, 11.17 were first published in the Proceedings of the 4th Alvey Vision Conference:

Davies, E.R., 1988. An alternative to graph matching for locating objects from their salient features. In: Proceedings of 4th Alvey Vision Conference, Manchester, 31 August–2 September, pp. 281–286.

F.H. Sumner

For permission to reprint portions of the following article from State of the Art Report: Supercomputer Systems Technology as text in Chapter 8; and as Fig. 8.4:

Davies, E.R., 1982. Image processing. In: Sumner, F.H. (Ed.), State of the Art Report: Supercomputer Systems Technology. Pergamon Infotech, Maidenhead, pp. 223–244.

Royal Holloway, University of London

For permission to reprint extracts from the following examination questions, originally written by E.R. Davies:

EL385/97/2; EL333/98/2; EL333/99/2, 3, 5, 6; EL333/01/2, 4–6; PH5330/98/3, 5; PH5330/03/1–5; PH4760/04/1–5.

University of London

For permission to reprint extracts from the following examination questions, originally written by E.R. Davies:

PH385/92/2, 3; PH385/93/1–3; PH385/94/1–4; PH385/95/4; PH385/96/3, 6; PH433/94/3, 5; PH433/96/2, 5.

Collectors of publicly available image databases and utilities

To acknowledge use of the following image databases and utilities for generating a number of images presented in Chapters 15 and 21:

The Cambridge semantic segmentation online demo

The images in Fig. 15.14 were processed using the online demo available from the University of Cambridge, UK (see Badrinarayanan et al., 2015) at

http://mi.eng.cam.ac.uk/projects/segnet/ (website accessed 07.10.16).

The CMU image dataset

The newsradio image used to obtain Fig. 21.6 was taken from Test Set C—collected at CMU by Rowley, H.A., Baluja, S., and Kanade, T.—and is described in their paper:

Rowley, H.A., Baluja, S., Kanade, T., 1998. Neural network-based face detection. IEEE Trans. Pattern Anal. Mach. Intell. 20(1), 23–38.

It may be downloaded from the website:

http://vasc.ri.cmu.edu/idb/html/face/frontal_images/ (website accessed 20.04.17).

The Bush LFW dataset

The images of George W. Bush used in Chapter 21 were taken from the set collected at the University of Massachusetts:

Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E., 2007. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. University of Massachusetts, Amherst, Technical Report 07-49, October.

The database may be downloaded from the website:

http://vis-www.cs.umass.edu/lfw/ (website accessed 20.04.17).

Topics Covered in Application Case Studies

Influences Impinging Upon Integrated Vision System Design

Glossary of Acronyms and Abbreviations

1-D one dimension/one-dimensional

2-D two dimensions/two-dimensional

3-D three dimensions/three-dimensional

AAM active appearance model

ACM Association for Computing Machinery (USA)

ADAS advanced driver assistance system

AFW annotated faces in the wild

AI artificial intelligence

ANN artificial neural network

AP average precision

APF auxiliary particle filter

ASCII American Standard Code for Information Interchange

ASIC application specific integrated circuit

ASM active shape model

ATM automated teller machine

AUC area under curve

AVI audio video interleave

BCVM between-class variance method

BDRF bidirectional reflectance distribution function

BetaSAC beta [distribution] sampling consensus

BMVA British Machine Vision Association

BPTT backpropagation through time

CAD computer-aided design

CAM computer-aided manufacture

CCTV closed-circuit television

CDF cumulative distribution function

CLIP cellular logic image processor

CNN convolutional neural network

CPU central processor unit

CRF conditional random field

DCSM distinct class based splitting measure

DET Beaudet determinant operator

DG differential gradient

DN Dreschler–Nagel corner detector

DNN deconvolution network

DoF degree of freedom

DoG difference of Gaussians

DPM deformable parts models

EM expectation maximization

EURASIP European Association for Signal Processing

f.c. fully connected

FAR frontalization for alignment and recognition

FAST features from accelerated segment test

FCN fully convolutional network

FDDB face detection data set and benchmark

FDR face detection and recognition

FFT fast Fourier transform

FN false negative

fnr false negative rate

FoE focus of expansion

FoV field of view

FP false positive

FPGA field programmable gate array

FPP full perspective projection

fpr false positive rate

GHT generalized Hough transform

GLOH gradient location and orientation histogram

GMM Gaussian mixture model

GPS global positioning system

GPU graphics processing unit

GroupSAC group sampling consensus

GVM global valley method

HOG histogram of orientated gradients

HSI hue, saturation, intensity

HT Hough transform

IBR intensity extrema-based region detector

IDD integrated directional derivative

IEE Institution of Electrical Engineers (UK)

IEEE Institute of Electrical and Electronics Engineers (USA)

IET Institution of Engineering and Technology (UK)

ILSVRC ImageNet large-scale visual recognition object challenge

ILW iterated likelihood weighting

IMPSAC importance sampling consensus

IoP Institute of Physics (UK)

IRLFOD image-restricted, label-free outside data

ISODATA iterative self-organizing data analysis

JPEG/JPG Joint Photographic Experts Group

k-NN k-nearest neighbor

KL Kullback–Leibler

KR Kitchen–Rosenfeld corner detector

LED light emitting diode

LFF local-feature-focus method

LFPW labeled face parts in the wild

LFW labeled faces in the wild

LIDAR light detection and ranging

LMedS least median of squares

LoG Laplacian of Gaussian

LRN local response normalization

LS least squares

LSTM long short-term memory

LUT lookup table

MAP maximum a posteriori

MDL minimum description length

ML machine learning

MLP multi-layer perceptron

MoG mixture of Gaussians

MP microprocessor

MSER maximally stable extremal region

NAPSAC n adjacent points sample consensus

NIR near infra-red

NN nearest neighbor

OCR optical character recognition

OVR one versus the rest

PASCAL Network of Excellence on pattern analysis, statistical modeling and computational learning

PC personal computer

PCA principal components analysis

PE processing element

PnP perspective n-point

PPR probabilistic pattern recognition

PR pattern recognition

PROSAC progressive sample consensus

PSF point spread function

R-CNN regions with CNN features

RAM random access memory

RANSAC random sample consensus

RBF radial basis function [classifier]

RELU rectified linear unit

RGB red, green, blue

RHT randomized Hough transform

RKHS reproducible kernel Hilbert space

RMS root mean square

RNN recurrent neural network

ROC receiver–operator characteristic

RoI region of interest

RPS Royal Photographic Society (UK)

s.d. standard deviation

SFC Facebook social face classification

SFOP scale-invariant feature operator

SIFT scale invariant feature transform

SIMD single instruction stream, multiple data stream

SIR sampling importance resampling

SIS sequential importance sampling

SISD single instruction stream, single data stream

SOC sorting optimization curve

SOM self-organizing map

SPIE Society of Photo-optical Instrumentation Engineers

SPR statistical pattern recognition

STA spatiotemporal attention [neural network]

SURF speeded-up robust features

SUSAN smallest univalue segment assimilating nucleus

SVM support vector machine

TM template matching

TMF truncated median filter

TN true negative

tnr true negative rate

TP true positive

tpr true positive rate

TV television

USEF unit step edge function

VGG Visual Geometry Group (Oxford)

VJ Viola–Jones

VLSI very large scale integration

VMF vector median filter

VOC visual object classes

VP vanishing point

WPP weak perspective projection

YOLO you only look once

YTF YouTube faces

ZH Zuniga–Haralick corner detector

Chapter 1

Vision, the challenge


This chapter introduces the subject of computer vision. It shows how recognition may be performed partly by image processing, although abstract pattern recognition methods are usually needed to complete the task. Important in this process is normalization of the image content to reduce variability so that statistical pattern recognizers such as the nearest neighbor algorithm can carry out their task with limited training requirements and low error rates. It extends the discussion by introducing machine learning and the recently prominent deep learning networks. This chapter also discusses the various applications of vision, contrasting automated visual inspection and surveillance.


Keywords: Computer vision; process of recognition; nearest neighbor algorithm; template matching; image preprocessing; need for normalization; machine learning; deep learning networks; automated visual inspection; surveillance

1.1 Introduction—Man and His Senses

Of the five senses—vision, hearing, smell, taste, and touch—vision is undoubtedly the one that man has come to depend upon above all others, and indeed the one that provides most of the data he receives. Not only do the input pathways from the eyes provide megabits of information at each glance but also the data rates for continuous viewing probably exceed 10 Mbps. However, much of this information is redundant and is compressed by the various layers of the visual cortex, so that the higher centers of the brain have to interpret abstractly only a small fraction of the data. Nonetheless, the amount of information the higher centers receive from the eyes must be at least two orders of magnitude greater than all the information they obtain from the other senses.

Another feature of the human visual system is the ease with which interpretation is carried out. We see a scene as it is—trees in a landscape, books on a desk, widgets in a factory. No obvious deductions are needed and no overt effort is required to interpret each scene; in addition, answers are effectively immediate and are normally available within a tenth of a second. Just now and again some doubt arises—e.g., a wire cube might be seen correctly or inside out. This and a host of other optical illusions are well known, although for the most part we can regard them as curiosities—irrelevant freaks of nature. Somewhat surprisingly, illusions are quite important, since they reflect hidden assumptions that the brain is making in its struggle with the huge amounts of complex visual data it is receiving. We have to pass by this story here (although it resurfaces now and again in various parts of this book). However, the important point is that we are for the most part unaware of the complexities of vision. Seeing is not a simple process: it is just that vision has evolved over millions of years, and there was no particular advantage in evolution giving us any indication of the difficulties of the task (if anything, to have done so would have cluttered our minds with irrelevant information and slowed our reaction times).

In the present day and age, man is trying to get machines to do much of his work for him. For simple mechanistic tasks this is not particularly difficult, but for more complex tasks the machine must be given the sense of vision. Efforts have been made to achieve this, sometimes in modest ways, for well over 40 years. At first, schemes were devised for reading, for interpreting chromosome images, and so on; but when such schemes were confronted with rigorous practical tests, the problems often turned out to be more difficult than anticipated. Generally, researchers react to finding that apparent trivia are getting in the way by intensifying their efforts and applying great ingenuity, and this was certainly so with early efforts at vision algorithm design. However, it soon became plain that the task really is a complex one, in which numerous fundamental problems confront the researcher, and the ease with which the eye can interpret scenes turned out to be highly deceptive.

Of course, one of the ways in which the human visual system gains over the machine is that the brain possesses more than 10¹⁰ cells (or neurons), some of which have well over 10,000 contacts (or synapses) with other neurons. If each neuron acts as a type of microprocessor, then we have an immense computer in which all the processing elements can operate concurrently. Taking the largest single man-made computer to contain several hundred million rather modest processing elements, the majority of the visual and mental processing tasks that the eye–brain system can perform in a flash have no chance of being performed by present-day man-made systems. Added to these problems of scale, there is the problem of how to organize such a large processing system and also how to program it. Clearly, the eye–brain system is partly hard-wired by evolution but there is also an interesting capability to program it dynamically by training during active use. This need for a large parallel processing system with the attendant complex control problems shows that computer vision must indeed be one of the most difficult intellectual problems to tackle.

So what are the problems involved in vision that make it apparently so easy for the eye, yet so difficult for the machine? In the next few sections an attempt is made to answer this question.

1.2 The Nature of Vision

1.2.1 The Process of Recognition

This section illustrates the intrinsic difficulties of implementing computer vision, starting with an extremely simple example—that of character recognition. Consider the set of patterns shown in Fig. 1.1A. Each pattern can be considered as a set of 25 bits of information, together with an associated class indicating its interpretation. In each case imagine a computer learning the patterns and their classes by rote. Then any new pattern may be classified (or recognized) by comparing it with this previously learnt training set, and assigning it to the class of the nearest pattern in the training set. Clearly, test pattern (1) (Fig. 1.1B) will be allotted to class U on this basis. Chapter 13, Basic Classification Concepts, shows that this method is a simple form of the nearest neighbor approach to pattern recognition.

Figure 1.1 Some simple 25-bit patterns and their recognition classes used to illustrate some of the basic problems of recognition: (A) training set patterns (for which the known classes are indicated); (B) test patterns.
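The rote-learning scheme just described can be sketched in a few lines of Python. The 5×5 patterns below are illustrative stand-ins for those of Fig. 1.1, not the actual figure data:

```python
# Nearest neighbor classification of small binary patterns by rote learning.
# The training patterns here are hypothetical stand-ins, not those of Fig. 1.1.

def hamming(a, b):
    """Number of bit positions in which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(test, training_set):
    """Assign the test pattern the class of its nearest training pattern."""
    return min(training_set, key=lambda item: hamming(test, item[0]))[1]

# Each pattern is a flattened 5 x 5 grid (25 bits), paired with its class.
training_set = [
    ((0,1,1,1,0,
      1,0,0,0,1,
      1,0,0,0,1,
      1,0,0,0,1,
      0,1,1,1,0), 'O'),
    ((1,0,0,0,1,
      1,0,0,0,1,
      1,0,0,0,1,
      1,0,0,0,1,
      0,1,1,1,0), 'U'),
]

# A noisy 'U' (one bit flipped) is still classified correctly.
test = (1,0,0,0,1,
        1,0,0,0,1,
        1,0,1,0,1,
        1,0,0,0,1,
        0,1,1,1,0)
print(classify(test, training_set))  # -> U
```

Here the noisy test pattern differs from the stored 'U' in a single bit, so the Hamming distance to the correct class is smallest and classification succeeds, much as with test patterns (2) and (3).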

The scheme outlined above seems straightforward and is indeed highly effective, even being able to cope with situations where distortions of the test patterns occur or where noise is present: this is illustrated by test patterns (2) and (3). However, this approach is not always foolproof. First, there are situations where distortions or noise is excessive, so errors of interpretation arise. Second, there are situations where patterns are not badly distorted or subject to obvious noise, yet are misinterpreted: this seems much more serious, since it indicates an unexpected limitation of the technique rather than a reasonable result of noise or distortion. In particular, these problems arise where the test pattern is displaced or misorientated relative to the appropriate training set pattern, as with test pattern (6).

As will be seen in Chapter 13, Basic Classification Concepts, there is a powerful principle that indicates why the unlikely limitation given above can arise: it is simply that there are insufficient training set patterns, and that those that are present are insufficiently representative of what will arise in practical situations. Unfortunately, this presents a major difficulty, since providing enough training set patterns incurs a serious storage problem and an even more serious search problem when patterns are tested. Furthermore, it is easy to see that these problems are exacerbated as patterns become larger and more real (obviously, the examples of Fig. 1.1 are far from having enough resolution even to display normal type-fonts). In fact, a combinatorial explosion takes place: this is normally taken to mean that one or more parameters produce fast-varying (often exponential) effects, which explode as the parameters increase by modest amounts. Forgetting for the moment that the patterns of Fig. 1.1 have familiar shapes, let us temporarily regard them as random bit patterns. Now the number of bits in these N×N patterns is N², so the number of possible patterns is 2^(N²): even in a case where N=20, remembering all these patterns and their interpretations would be impossible on any practical machine, and searching systematically through them would take impracticably long (involving times of the order of the age of the universe). Thus it is not only impracticable to consider such brute force means of solving the recognition problem, but is also effectively impossible theoretically. These considerations show that other means are required to tackle the problem.
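The scale of this combinatorial explosion is easily checked; the sketch below simply evaluates the numbers for N=20 (the age-of-universe figure of roughly 13.8 billion years is an assumption introduced here for comparison, not taken from the text):

```python
# Number of distinct N x N binary patterns for N = 20.
N = 20
patterns = 2 ** (N * N)
print(len(str(patterns)))  # -> 121 (a 121-digit number)

# Microseconds elapsed in ~13.8 billion years, for comparison.
microseconds = int(13.8e9 * 365.25 * 24 * 3600 * 1e6)
print(patterns // microseconds > 10 ** 90)  # -> True: larger by over 90 orders of magnitude
```

Even testing one pattern per microsecond since the big bang would cover only a vanishing fraction of the possibilities, which is why brute-force search is ruled out in principle as well as in practice.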

1.2.2 Tackling the Recognition Problem

An obvious means of tackling the recognition problem is to standardize the images in some way. Clearly, normalizing the position and orientation of any 2D picture object would help considerably: indeed this would reduce the number of degrees of freedom by three. Methods for achieving this involve centralizing the objects—arranging that their centroids are at the center of the normalized image—and making their major axes (e.g., deduced by moment calculations) vertical or horizontal. Next, we can make use of the order that is known to be present in the image—and here it may be noted that very few patterns of real interest are indistinguishable from random dot patterns. This approach can be taken further: if patterns are to be nonrandom, isolated noise points may be eliminated. Ultimately, all these methods help by making the test pattern closer to a restricted set of training set patterns (although care must also be taken to process the training set patterns initially so that they are representative of the processed test patterns).
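The centralization and orientation steps can be sketched with image moments; the formulae below are the standard first- and second-order moment expressions (a hedged illustration of the approach, not code from this book):

```python
import math

def centroid_and_orientation(image):
    """Centroid and principal-axis angle of a binary image, computed
    from its first- and second-order moments (standard formulae)."""
    pts = [(x, y) for y, row in enumerate(image)
                  for x, v in enumerate(row) if v]
    n = len(pts)
    xc = sum(x for x, _ in pts) / n      # first-order moments give the centroid
    yc = sum(y for _, y in pts) / n
    # Central second-order moments
    mu20 = sum((x - xc) ** 2 for x, y in pts)
    mu02 = sum((y - yc) ** 2 for x, y in pts)
    mu11 = sum((x - xc) * (y - yc) for x, y in pts)
    # Principal-axis angle (radians) from the standard moment formula
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    return (xc, yc), theta

# A diagonal bar: centroid at its middle, major axis at 45 degrees.
img = [[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]]
(cx, cy), theta = centroid_and_orientation(img)
print((cx, cy), math.degrees(theta))  # -> (1.0, 1.0) 45.0
```

Translating the object so (cx, cy) sits at the image center and rotating by −theta removes three degrees of freedom, exactly the normalization described above.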

It is useful to consider character recognition further. Here we can make additional use of what is known about the structure of characters—namely, that they consist of limbs of roughly constant width. In that case the width carries no useful information, so the patterns can be thinned to stick figures (called skeletons—see Chapter 8: Binary Shape Analysis); then, hopefully, there is an even greater chance that the test patterns will be similar to appropriate training set patterns (Fig. 1.2). This process can be regarded as another instance of reducing the number of degrees of freedom in the image, and hence of helping to minimize the combinatorial explosion—or, from a practical point of view, to minimize the size of the training set necessary for effective recognition.

Figure 1.2 Use of thinning to regularize character shapes. Here character shapes of different limb widths—or even varying limb widths—are reduced to stick figures or skeletons. Thus irrelevant information is removed and at the same time recognition is facilitated.
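Thinning itself is treated in Chapter 8; purely as an illustrative sketch, the widely used Zhang–Suen algorithm (an assumption here, not necessarily the method the book adopts) reduces a binary shape to a unit-width skeleton:

```python
def zhang_suen_thin(img):
    """Zhang-Suen thinning of a binary image (list of lists of 0/1).
    Repeatedly peels deletable boundary pixels in two subiterations
    until no change occurs, leaving a unit-width skeleton."""
    img = [row[:] for row in img]
    H, W = len(img), len(img[0])

    def neighbours(y, x):
        # P2..P9, clockwise from the pixel above (y-1, x)
        return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
                img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, H - 1):
                for x in range(1, W - 1):
                    if not img[y][x]:
                        continue
                    P = neighbours(y, x)
                    B = sum(P)                                 # nonzero neighbours
                    A = sum(P[i] == 0 and P[(i + 1) % 8] == 1  # 0 -> 1 transitions
                            for i in range(8))
                    if step == 0:
                        cond = P[0]*P[2]*P[4] == 0 and P[2]*P[4]*P[6] == 0
                    else:
                        cond = P[0]*P[2]*P[6] == 0 and P[0]*P[4]*P[6] == 0
                    if 2 <= B <= 6 and A == 1 and cond:
                        to_clear.append((y, x))
            for y, x in to_clear:
                img[y][x] = 0
                changed = True
    return img

# A bar of limb width 3 (padded with background) reduces to a thin line.
bar = [[0] * 7,
       [0, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0] * 7]
skeleton = zhang_suen_thin(bar)
```

The skeleton keeps a subset of the original foreground pixels while discarding limb width, which is exactly the regularization effect described above.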

Next, consider a rather different way of looking at the problem. Recognition is necessarily a problem of discrimination—i.e., of discriminating between patterns of different classes. However, in practice, considering the natural variation of patterns, including the effects of noise and distortions (or even the effects of breakages or occlusions), there is also a problem of generalizing over patterns of the same class. In practical problems there is a tension between the need to discriminate and the need to generalize. Nor is this a fixed situation. Even for the character recognition task, some classes are so close to others (n’s and h’s will be similar) that less generalization is possible than in other cases. On the other hand, extreme forms of generalization arise when, for example, an A is to be recognized as an A whether it is a capital or small letter, or in italic, bold, suffix, or other form of font—even if it is handwritten. The variability is determined largely by the training set initially provided. What we emphasize here, however, is that generalization is as necessary a prerequisite to successful recognition as is discrimination.

At this point it is worth considering more carefully the means whereby generalization was achieved in the examples cited above. First, objects were positioned and orientated appropriately; second, they were cleaned of noise spots; and third, they were thinned to skeleton figures (although the latter process is relevant only for certain tasks such as character recognition). In the last case, we are generalizing over characters drawn with all possible limb widths, width being an irrelevant degree of freedom for this type of recognition task. Note that we could have generalized the characters further by normalizing their size and saving another degree of freedom. The common feature of all these processes is that they aim to give the characters a high level of standardization against known types of variability before finally attempting to recognize them.

The standardization (or generalization) processes outlined above are all realized by image processing, i.e., the conversion of one image into another by suitable means. The result is a two-stage recognition scheme: first, images are converted into more amenable forms containing the same number of bits of data; and second, they are classified with the result that their data content is reduced to very few bits (Fig. 1.3). In fact, recognition is a process of data abstraction, the final data being abstract and totally unlike the original data. Thus we must imagine a letter A starting as an array of perhaps 20×20 bits arranged in the form of an A, and then ending as the 7 bits in an ASCII representation of an A, namely 1000001 (which is essentially a random bit pattern bearing no resemblance to an A).

Figure 1.3 The two-stage recognition paradigm: C, input from camera; G, grab image (digitize and store); P, preprocess; R, recognize (i, image data; a, abstract data). The classical paradigm for object recognition is that of (1) preprocessing (image processing) to suppress noise or other artefacts and to regularize the image data and (2) applying a process of abstract (often statistical) pattern recognition to extract the very few bits required to classify the object.
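The abstract end-point of this process is easy to verify: the 7-bit ASCII code for 'A' is exactly the bit pattern quoted above.

```python
# ASCII code of 'A' rendered as a 7-bit binary string.
code = format(ord('A'), '07b')
print(code)  # -> 1000001
```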

The last paragraph reflects to a large extent the history of image analysis. Early on, a good proportion of the image analysis problems being tackled were envisaged as consisting of an image preprocessing task carried out by image processing techniques, followed by a recognition task undertaken by pure pattern recognition methods (see Chapter 13: Basic Classification Concepts). These two topics—image processing and pattern recognition—consumed much research effort and effectively dominated the subject of image analysis, while intermediate-level approaches such as the Hough transform were, for a time, slower to develop. One of the aims of this book is to ensure that such intermediate-level processing techniques are given due emphasis, and indeed that the best range of techniques is applied to any computer vision task.

1.2.3 Object Location

The problem that was tackled above—that of character recognition—is a highly constrained one. In a great many practical applications it is necessary to search pictures for objects of various types, rather than just interpreting a small area of a picture.

Search is a task that can involve prodigious amounts of computation and is also subject to a combinatorial explosion. Imagine the task of searching for a letter E in a page of text. An obvious way of achieving this is to move a suitable template of size n×n over the whole image, of size N×N, and to find where a match occurs (Fig. 1.4). A match can be defined as a position where there is exact agreement between the template and the local portion of the image but, in keeping with the ideas of Section 1.2.1, it will evidently be more relevant to look for a best local match (i.e., a position where the match is locally better than in adjacent regions) and where the match is also good in some more absolute sense, indicating that an E is present.

Figure 1.4 Template matching, the process of moving a suitable template over an image to determine the precise positions at which a match occurs, hence revealing the presence of objects of a particular type.

One of the most natural ways of checking for a match is to measure the Hamming distance between the template and the local n×n region of the image, i.e., to sum the number of differences between corresponding bits. This is essentially the process described in Section 1.2.1. Then places with a low Hamming distance are places where the match is good. These template-matching ideas can be extended to cases where the corresponding bit positions in the template and the image do not just have binary values but may have intensity values over a range 0–255. In that case the sums obtained are no longer Hamming distances but may be generalized to the form:


D = Σᵢ |Iᵢ − Iₜ|

where Iₜ is the local template value, Iᵢ is the local image value, and the sum is taken over the area of the template. This makes template matching practicable in many situations: the possibilities are examined in more detail in subsequent chapters.
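The absolute-difference matching process just described can be sketched in a few lines of Python with NumPy (a hypothetical illustration, not taken from the book; for binary images the score reduces to the Hamming distance):

```python
import numpy as np

# Sum-of-absolute-differences template matching: for each position (r, c),
# D(r, c) = sum of |image patch - template|; a low D indicates a good match.
def match_template(image, template):
    N1, N2 = image.shape
    n1, n2 = template.shape
    scores = np.empty((N1 - n1 + 1, N2 - n2 + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            patch = image[r:r + n1, c:c + n2]
            # Cast to int so uint8 subtraction cannot wrap around
            scores[r, c] = np.abs(patch.astype(int) - template.astype(int)).sum()
    return scores

# Usage: plant the template inside a blank image and recover its position.
template = np.array([[0, 255, 0],
                     [0, 255, 0],
                     [0, 255, 255]], dtype=np.uint8)
image = np.zeros((10, 10), dtype=np.uint8)
image[4:7, 5:8] = template
scores = match_template(image, template)
r, c = np.unravel_index(scores.argmin(), scores.shape)
print(r, c)  # best match at (4, 5), where the score is exactly 0
```

The doubly nested scan over all positions is exactly what gives rise to the N²n² operation count discussed below.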

We referred above to a combinatorial explosion in this search problem too. The reason this arises is as follows. First, when a 5×5 template is moved over an N×N image in order to look for a match, the number of operations required is of the order of 5²N², totaling some 1 million operations for a 256×256 image. The problem is that when larger objects are being sought in an image, the number of operations increases as the square of the size of the object, the total number of operations being N²n² when an n×n template is used. For a 30×30 template and a 256×256 image, the number of operations required rises to ~60 million. Note that, in general, a template will be larger than the object it is used to search for, because some background will have to be included to help demarcate the object.
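The operation counts quoted above are easily verified (a quick arithmetic check):

```python
# Operations for an n×n template scanned over an N×N image: N² n²
# (ignoring edge effects, which only reduce the count slightly).
def ops(N, n):
    return N * N * n * n

print(ops(256, 5))   # 1,638,400  -- "some 1 million" in round figures
print(ops(256, 30))  # 58,982,400 -- ~60 million
```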

Next, recall that in general, objects may appear in many orientations in an image (E’s on a printed page are exceptional). If we imagine a possible 360 orientations (i.e., one per degree of rotation), then a corresponding number of templates will in principle have to be applied in order to locate the object. This additional degree of freedom pushes the search effort and time to enormous levels, so far away from the possibility of real-time implementation that new approaches must be found for tackling the task. [Real-time is a commonly used phrase meaning that the information has
