

Quantifying the marginal benefit
of exploiting correlations between
adjacent characters and words
° Rich field of research with many applicable domains
° Off-line vs. On-line (includes time-sequence info)
° Handwritten vs. Typed
° Cursive vs. Hand-printed
° Cooperative vs. Random Writers
° Language-specific differences of grammar and dictionary size
° We focus on off-line mixed-modal English data set with mostly
handwritten and some cursive data
° Observation is monochrome bitmap representation of each letter
with segmentation problem already solved for us (but poorly)
° Pre-processing of dataset for noise filtering and normalizations of
scale also assumed done
° Statistical Grammar Rules and Dictionaries
° Feature Extraction of observations
° Global features: Moments and invariants of image (e.g.,
percentage of pixels in certain region, measuring curvature)
° Local features: Group windows around image pixels

° Hidden Markov Models

° Used mostly in cursive domain for easy training and to
avoid segmentation issues
° Most HMMs use very large models with words as states,
combined with above approaches, which is more applicable
to domains of small dictionary size with other restrictions
° Data Collected from 159
subjects with varying styles,
printed and cursive
° First letter of each
word removed to simplify handling of capital letters
° Each character represented
by 16x8 array of bits
° Character meta-data
includes correct labels and
end-of-word boundaries
° Pre-processed into 10
cross-validation folds
° Quantify the impact of correlations between
adjacent letters and words
° Learn an accurate classifier for our data set
° Use an HMM and compare to other algorithms
° 26 states of HMM each represent letter of alphabet
° Supervised learning of model with labeled data
° Prior probabilities and transition matrix learned by frequency of
letters in training
° Learning algorithm for emission probabilities uses Naive Bayes
assumption (i.e., pixels conditionally independent given the letter)
° Viterbi algorithm predicts most probable sequence of states given
the observed character pixel maps
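
The design above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation behind these slides: the function names (`train`, `log_emission`, `viterbi`), the add-one smoothing parameter `alpha`, and the choice to treat each word as its own sequence with a first-letter prior are all assumptions made for the example.

```python
import math

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # one HMM state per letter

def train(words, n_pixels, alpha=1.0):
    """Estimate HMM parameters by counting over labeled words.

    words: list of words, each a list of (letter, pixels) pairs, where
    pixels is a sequence of 0/1 values of length n_pixels.
    alpha: smoothing pseudo-count (an assumption, not from the slides).
    """
    start = {c: alpha for c in ALPHABET}
    trans = {a: {b: alpha for b in ALPHABET} for a in ALPHABET}
    on = {c: [alpha] * n_pixels for c in ALPHABET}  # times pixel i was 1
    total = {c: 2 * alpha for c in ALPHABET}        # smoothed letter count
    for word in words:
        prev = None
        for letter, pixels in word:
            if prev is None:
                start[letter] += 1
            else:
                trans[prev][letter] += 1
            total[letter] += 1
            for i, p in enumerate(pixels):
                on[letter][i] += p
            prev = letter
    n_start = sum(start.values())
    log_start = {c: math.log(start[c] / n_start) for c in ALPHABET}
    log_trans = {}
    for a in ALPHABET:
        row_sum = sum(trans[a].values())
        log_trans[a] = {b: math.log(trans[a][b] / row_sum) for b in ALPHABET}
    p_on = {c: [on[c][i] / total[c] for i in range(n_pixels)] for c in ALPHABET}
    return log_start, log_trans, p_on

def log_emission(p_on_letter, pixels):
    """Naive Bayes emission: pixels are independent given the letter."""
    return sum(math.log(p if x else 1.0 - p) for p, x in zip(p_on_letter, pixels))

def viterbi(observations, log_start, log_trans, p_on):
    """Return the most probable letter sequence for a list of pixel maps."""
    score = {c: log_start[c] + log_emission(p_on[c], observations[0])
             for c in ALPHABET}
    back = []  # back[t][c] = best predecessor of letter c at step t + 1
    for pixels in observations[1:]:
        new_score, pointers = {}, {}
        for c in ALPHABET:
            best_prev = max(ALPHABET, key=lambda a: score[a] + log_trans[a][c])
            pointers[c] = best_prev
            new_score[c] = (score[best_prev] + log_trans[best_prev][c]
                            + log_emission(p_on[c], pixels))
        back.append(pointers)
        score = new_score
    path = [max(ALPHABET, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))
```

Working in log space avoids numerical underflow when 128 per-pixel probabilities are multiplied together, and the smoothing keeps every probability strictly between 0 and 1 so that no logarithm is undefined.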
° Learning algorithms implemented and tested:
° Baseline Algorithm: Naïve Bayes Classifier (no HMM)
° Algorithm 2: NB with maximum probable classification over a set
of shifted observations
° Motivation was to compensate for correlations between adjacent
pixels not included in Naïve Bayes assumption
° Algorithm 3: HMM with NB assumption
° Fix for incomplete data: Examples "hallucinated" prior to training
° Algorithm 4: Optimized HMM with NB assumption
° Ignore effects of inter-word transitions when learning HMM
° Algorithm 5: Dictionary Creation and Lookup with NB assumption
(no HMM)
° Geared toward specific data set with small dictionary size, but less
generalizable to less constrained data sets with larger dictionaries
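
One plausible reading of Algorithm 5 is sketched below: collect the distinct training words, then label a test word with the dictionary entry of the same length whose letters best explain the observed characters. The slides do not describe the actual lookup procedure, so `build_dictionary`, `lookup`, and the `log_likelihood(letter, pixels)` callback (standing in for a trained Naïve Bayes scorer) are hypothetical names introduced for illustration.

```python
def build_dictionary(training_words):
    """Group the distinct training words by length for fast lookup."""
    by_length = {}
    for word in training_words:
        by_length.setdefault(len(word), set()).add(word)
    return by_length

def lookup(observations, by_length, log_likelihood):
    """Pick the same-length dictionary word with the highest total score.

    observations: one entry per character image in the word.
    log_likelihood: callable (letter, observation) -> log P(observation | letter).
    Returns None when no dictionary word has the right length.
    """
    best_word, best_score = None, float("-inf")
    for word in sorted(by_length.get(len(observations), ())):
        score = sum(log_likelihood(letter, obs)
                    for letter, obs in zip(word, observations))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy usage: each "observation" is a dict of made-up per-letter scores.
def toy_scorer(letter, obs):
    return obs.get(letter, -10.0)

dictionary = build_dictionary(["cat", "car", "dog", "cat"])
word_image = [{"c": -1.0, "e": -0.5}, {"a": -1.0}, {"t": -2.0, "r": -1.5}]
print(lookup(word_image, dictionary, toy_scorer))
```

In the toy input, classifying each character independently would pick "e" for the first position, which yields a string that is not a word; restricting the search to dictionary entries is what corrects it.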
° Other variants considered but not implemented:
° Joint Bayes parameter estimation (too many probabilities
to learn, 2^128 vs. 3,328)
° HMM with 2nd-order Markov assumption (exponential in
number of Viterbi paths)
° Training Naïve Bayes over a set of shifted and overlaid
observations (preprocessing to create thicker boundary)
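
The parameter counts quoted for joint estimation follow from the 16x8 = 128 binary pixels per character and the 26 letter classes. The worked comparison below uses the symbols $x_i$ (pixel $i$) and $y$ (letter), which are notation introduced here rather than taken from the slides.

```latex
\begin{align*}
\text{Full joint:}\quad
  & P(x_1, \dots, x_{128} \mid y)
    \text{ has } 2^{128} \approx 3.4 \times 10^{38}
    \text{ pixel configurations per letter} \\
\text{Na\"ive Bayes:}\quad
  & P(x_1, \dots, x_{128} \mid y) = \prod_{i=1}^{128} P(x_i \mid y)
    \text{ needs } 128 \text{ probabilities per letter} \\
  & 26 \times 128 = 3{,}328 \text{ probabilities in total}
\end{align*}
```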
° All experiments run with 10-fold cross-validation
° Results given as averages with standard deviations
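
The evaluation protocol can be sketched as follows. The helper name `cross_validate` and the `train_and_score` callback are hypothetical, and the use of the sample standard deviation is an assumption, since the slides do not say which estimator was reported.

```python
import statistics

def cross_validate(folds, train_and_score):
    """Hold out each fold once; return (mean, sample std dev) of accuracy.

    folds: list of lists of examples (10 lists for 10-fold cross-validation).
    train_and_score: callable (training_examples, held_out_examples) -> accuracy.
    """
    accuracies = []
    for i, held_out in enumerate(folds):
        # Train on every example that is not in the held-out fold.
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        accuracies.append(train_and_score(training, held_out))
    return statistics.mean(accuracies), statistics.stdev(accuracies)
```

Any of the classifiers listed earlier could be plugged in through `train_and_score`, which keeps the comparison between algorithms on identical folds.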












° Naïve Bayes classifier performed well on its own (62.7%
accuracy, over 15x better than a random classifier!)
° Classification on shifted data did worse since we lost data on
° Small dictionary size of dataset affected results:
° Optimized HMM w/ NB achieves 71% accuracy
° Optimizations only marginally significant because of dataset
° Simpler and more flexible approach for achieving impressive results on
other datasets
° Dictionary approach is almost perfect with 99.3% accuracy!
° Demonstrates additional benefit of exploiting domain constraints,
grammatical or syntactic rules
° Not always feasible: dictionary may be unknown, too large, or the
data may not be predictable
