Oral “Nick” Hillary’s defense team seeks a Frye Hearing calling into question the reliability of methods used by prosecutors to obtain DNA results to connect him to the 2011 murder of 12-year-old Garret J. Phillips.
STATE OF NEW YORK
ST. LAWRENCE COUNTY COURT
PEOPLE OF THE STATE OF NEW YORK:
NOTICE OF
— against — MOTION TO PRECLUDE
ORAL NICHOLAS HILLARY, : Indictment No. 2015-15
Defendant. Hon. Felix J. Catena
X
PLEASE TAKE NOTICE, that upon the annexed affirmation of EARL S. WARD and
the accompanying memorandum of law, the undersigned will move the County Court of St.
Lawrence County on the 1st Day of July, 2016, at 9:30 a.m., or as soon thereafter as counsel may
be heard, for an Order granting the following relief:
1. Precluding the prosecution from offering expert testimony as to the use of, or any
results produced by, the forensic software tool STRmix because the use of this software for
probabilistic genotyping is not generally accepted in the relevant scientific and legal
communities as required by Frye v. United States, 293 F. 1013 (D.C. Cir. 1923); or, in the
alternative, granting a pre-trial Frye hearing on the issues; or in the alternative,
2. Precluding the prosecution from calling an expert witness to testify on their direct
case regarding any conclusion reached by the use of the STRmix on the ground that the
prosecution cannot lay a foundation for the introduction of the evidence; specifically that the
application of the STRmix exceeds the limits of validation, rendering it unreliable, see People v.
Wesley, 83 N.Y.2d 417 (1994) and Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 447 (2006), or in
the alternative, directing that a hearing be held on the reliability of any proposed testimony about
the aforementioned. See People v. Wesley, 83 N.Y.2d 417.

DATED: New York, New York
May 31, 2016
Respectfully yours,
Earl S. Ward
Counsel to Mr. Hillary
TO: The Hon. Mary Rain
Office of the District Attorney, St. Lawrence County
Clerk of Court
St. Lawrence County

STATE OF NEW YORK
ST. LAWRENCE COUNTY COURT
PEOPLE OF THE STATE OF NEW YORK,
AFFIRMATION
— against —
ORAL NICHOLAS HILLARY, : Indictment No. 2015-15
Defendant. Hon. Felix J. Catena
x
AFFIRMATION
EARL S. WARD, an attorney admitted to practice in the courts of this State, hereby
affirms, under penalty of perjury, that the following is true, except for those statements made
upon information and belief, which are believed to be true:
1. I am counsel of record for ORAL NICHOLAS HILLARY. This Affirmation is
made upon information and belief, based upon review of the record in this matter and all
prior proceedings to date.
2. This Affirmation is made in support of Mr. Hillary’s motion to preclude the
prosecution from offering expert testimony regarding any DNA results obtained using the
forensic software program STRmix. In the alternative, Mr. Hillary requests a Frye hearing on
the issues raised.
RELEVANT FACTUAL BACKGROUND
A. Background of Case
3. On October 24, 2011, the Potsdam Police Department received a call from
Marissa Vogel, a neighbor of Garrett Phillips, stating that she heard moans and the word “help”
coming from Garrett Phillips’s apartment at 100 Market Street (referred to hereinafter as the Crime
Scene).
4. Officer Mark Wentworth of the Potsdam Police Department arrived at the Crime
Scene at approximately 5:16 p.m.
5. When Officer Wentworth knocked on the door of the apartment, he heard what
sounded like someone quietly walking around.
6. At 5:24 p.m., Officer Wentworth reported that he still heard someone walking
around the apartment.
7. Shortly after 5:37 p.m., Officer Wentworth entered the apartment with the landlord,
Rick Dumas, and found Garrett Phillips unconscious in the bedroom.
8. There was no one else in the apartment.
9. Garrett Phillips was pronounced dead at 7:18 p.m. on the evening of October 24,
2011.
10. The death was ruled a homicide, as there was evidence that Garrett Phillips had
been killed by strangulation.
11. The defendant Oral “Nick” Hillary became a suspect, although the district
attorney at the time would decline to prosecute him. Mr. Hillary had had an intimate relationship
with Tandy Cyrus, the mother of Garrett Phillips. They had briefly lived together, but had broken
up. Ms. Cyrus also had had an intimate relationship with Deputy John Jones, who visited the
crime scene and participated in the initial investigation.
12. In seeking to identify the killer or killers and to rule out Mr. Hillary as a suspect,
law enforcement performed an investigation, including the collection of dozens of DNA samples
taken from multiple areas of the Crime Scene, the body and clothing of Garrett Phillips, the
interior of Mr. Hillary’s car, clothing and other items seized from Mr. Hillary’s home, and a
pseudo-exemplar sample of Mr. Hillary’s own DNA profile.
13. Law enforcement hypothesized that the killer or killers had escaped from the
Crime Scene by exiting a window in the rear of the apartment, dropping to a lower roof, and then
landing in the backyard. DNA samples were collected that tracked this possible avenue of egress.
The New York State Police crime lab processed an enormous number of samples from the Crime
Scene. For example, the lab attempted to extract human DNA from “Material removed from
crack in tile,” “Hair from tile,” “Swabs from window and screen,” “Swab from exterior wood
sill,” “Swab from exterior stone sill,” “Swabs from the window and screen,” “Small black fuzz,”
“Swabs from the rear upper and lower left regions of the shorts,” “Swabs from the rear upper and
lower right regions of the shorts,” “Swabs from the rear left shoulder region of the shirt,” “Swabs
from the outside rear right shoulder region of the shirt,” “Swabs from the left outside rear
back/side region of the shirt,” “Swabs from the right outside rear back/side region of the shirt,”
“Swabs from the left outside rear arm region of the shirt,” “Swabs from the right outside rear arm
region of the shirt,” “Swabs from the inside/outside rear waistband of the underwear,” “Swabs
from the side (-18") of the roof tile,” “Swabs from the side (-21") of the roof tile,” “Swabs from
the inside of the roof tile, (straight edge side)” “Swabs from the inside of the roof tile, (broken
edge side)” “Swabs from the edges of the roof tile, (misc. pieces)” “Swabs from the underneath
of the roof tile, (misc. pieces)” “Swab from the button,” “Swab from the threads,” “Swabs from
the outside bottom region of the sock,” “Swabs from the outside top region of the sock,” “Swab
from the outside ankle elastic region of the sock,” “Swabs from the outside bottom region of the
sock,” “Swabs from the outside top region of the sock,” “Swab from the ankle elastic region ofthe sock,” “Paint scrapings - scuff mark,” “Blood stained cutting from the shorts,” “Blood
stained cutting from the long sleeve shirt,” “Cutting from the fluorescent area of the long sleeve
shirt,” “Swabs from the outside neck region of the long sleeve shirt,” “Swabs from the inside
neck region of the long sleeve shirt,” “Cutting from the fluorescent area of the long sleeve shirt,”
“Blood stained cutting from the long sleeve shirt,” “Cutting from the fluorescent area of the
black long sleeve jersey,” “Cutting from the fluorescent area of the black long sleeve jersey,”
“Swab from the ripped area in the front of the black long sleeve jersey,” “Swabs of the ripped
neck region of the black long sleeve jersey,” “Swabs from the lower front arm region of the
black long sleeve jersey,” “Swabs from lower rear arm region of the black long sleeve jersey,”
“Oral swabs from Garrett J. Phillips,” “Fingernail scrapings from the right hand of Garret J.
Phillips,” “Fingernail scrapings from the left hand of Garret J. Phillips,” “Blood stained swab
from the "nail left thumb" of Garrett J. Phillips,” “Blood stained swab from the "nail left thumb"
of Garrett J. Phillips,” “Blood stained swabs from the "skin anterior neck" of Garrett J. Phillips,”
“Blood stained swabs from the "skin anterior neck" of Garrett J. Phillips,” “Swab from outside
palm (leather) of left glove,” “Swab from outside finger tips of left glove,” “Swab from outside
wrist region of left glove,” “Swab from fluorescent area on fingertip of left glove,” “Swab from
top side of finger tips of left glove,” “Swabs from inside left glove,” “Swab from left front door
handle,” “Swab from left front door arm rest pocket,” “Swab from left front seatbelt clip,” “Swab
from shifting lever,” “Swab from steering wheel,” “Swab from exterior wood sill,” “Swabs from
inside blind,” “Swabs from outside blind,” “Swab from screen range from --0-10 in,” “Swab
from screen range from -10-20 in.,” “Swab from screen range from-20-30 in.,” “Swab from
screen range from -30-40 in.,” “Blood stained Swab from ...-bedroom - upper part of trim
molding,” “Blood stained Swab from master bedroom - middle part of trim molding,” “Bloodstained Swab from... bedroom - lo-part of trim molding,” “Swab from door look knob,” “Swab
from door lock plate,” “Swabs from sleeve of sweatshirt,” “Swabs from sleeve of sweatshirt,”
“Swabs from hood region of sweatshirt,” “Cutting of fluorescent area of sweatshirt,” “Cutting of
fluorescent area of sweatshirt,” “Cutting of fluorescent area of sweatshirt,” “Blood stained
cutting from the envelope,” “Swab from inside of left bra cup,” “Swab from outside of left bra
cup,” “Swab from inside of right bra cup,” “Swab from outside of right bra cup,” “Swab from
inside grey bracelet,” “Swab from outside grey bracelet,” “Swab from inside blue bracelet,”
“Swab from outside blue bracelet,” “Swabs from lower left leg region of Adidas sweatpants,”
“Swabs from lower right leg region of Adidas sweatpants,” “Swabs from outside of left glove
(palm side),” “Swabs from outside of left glove (backside),” “Cutting of fluorescent area on left
glove,” “Cutting of fluorescent area on left glove,” “Swabs from outside of right glove (palm side),”
“Swabs from outside of right glove (backside),” “Cutting of fluorescent area on right glove,”
“Swabs from lower left leg of Admiral sweatpants,” “Swabs from lower right leg of Admiral
sweatpants,” “Cutting of fluorescent area on Admiral sweatpants,” “Cutting of fluorescent area on
Admiral sweatpants,” “Human hair from Admiral sweatpants,” “Swabs from lower left leg of Lotto
sweatpants,” “Swabs from lower right leg of Lotto sweatpants,” “Cutting of fluorescent area on
Lotto sweatpants,” “Blood stained swab from the left inside chin and jaw area of the EMT collar,”
“Blood stained swabs from the inside back of the neck area of the EMT collar,” “Blood stained
cutting from the velcro strap of the EMT collar,” “Blood stained swabs from the right inside chin
and jaw area of the EMT collar,” “Swabs from the inside throat area of the EMT collar,” “Swabs
from the inside back of the neck area of the EMT collar,” “Blood stained swabs from the outside
back of the neck area of the EMT collar,” “Blood stained swabs from the outside chin and jaw
area of the EMT collar,” “Swabs from bottom base of blind,” “Swabs from “inside” blind slats - 1
to 17,” “Swabs from “outside” blind slats - 1 to 17,” “Swabs from screen - lower corner,”
“Swabs from screen - upper lower corner,” “Swabs from screen - upper corner ~ near break,”
“Swabs from screen - opposite side corner,” “Swabs of latent lift from lower left interior window
14. The profiles of Nick Hillary and Garrett Phillips were compared against all
samples where comparisons could be made.
15. A definite trend occurred in the DNA evidence as reported: Mr. Hillary
was excluded from all samples at the Crime Scene where comparisons could be made.
Meanwhile, Garrett Phillips was excluded from samples taken from Mr. Hillary’s vehicle and
from the items seized from Mr. Hillary’s home, in samples where comparisons could be made.
The one exception to this evidentiary trend was a report by the NYSP lab that the DNA mixture
profile from the fingernail scrapings from the left hand of Garrett Phillips was consistent with
DNA from Mr. Phillips as the major contributor, admixed with DNA from at least one additional
donor. The lab further reported that due to insufficient genetic information, Mr. Hillary could
neither be included nor excluded as a possible contributor of DNA to the left hand fingernail
scrapings mixture.
16. Beginning in 2013, the NYSP lab reached out to Dr. Mark Perlin of
Cybergenetics, Inc. in Pittsburgh, PA. Founded by Dr. Perlin, Cybergenetics is the proprietor of
TrueAllele, a probabilistic genotyping software program that has been admitted into more courts
in the United States than any other DNA expert system. TrueAllele is a comprehensive software
system that utilizes the electronic data files from the genetic analyzing instrument from the lab.
The program is based upon the premise that more rather than less data should be used when
assigning a probability in a DNA forensic comparison.
17. TrueAllele attempts to take into account all of the data and all possibilities of
what the data represents. TrueAllele generates likelihood ratios (“LRs”) for each comparison,
called a match statistic, which is similar to the random match probability used in conventional
DNA testing of an individual profile.
18. At the behest of the prosecution, the NYSP requested that Dr. Perlin compare the
left hand fingernail scrapings mixture against the DNA profiles of Garrett Phillips and Mr.
Hillary.
19. Using TrueAllele, Dr. Perlin performed the comparison. The results were a match
statistic with no statistical support. In other words, the computer-generated statistic by
TrueAllele was consistent with the human expert interpretation that there was inconclusive
evidence as to whether Hillary’s profile was included in the mixture from the left hand fingernail
scrapings.
20. After Dr. Perlin informed NYSP in 2013 (and again in 2014) that the TrueAllele
comparison provided no statistical support for a match with Mr. Hillary, the prosecution declined
to order a report from Cybergenetics, or to employ Dr. Perlin in any other work in this case.
21. In 2014, after running a campaign to find the killer of Garrett Phillips, the
Office of District Attorney Mary Rain indicted Mr. Hillary. The indictment was later dismissed
for prosecutorial misconduct in the grand jury. Onondaga County District Attorney William
Fitzpatrick also entered the case as a special prosecutor.
22. The St. Lawrence County District Attorney’s Office re-indicted Mr. Hillary in
2015.
23. ADA Fitzpatrick reached out to Dr. John Buckleton, a forensic scientist from New
Zealand who had helped develop STRmix at the Institute of Environmental Science and
Research (“ESR”).
24. The NYSP subsequently sent ESR electronic raw data files, and a scientist at the
ESR ran the data in STRmix.
25. There is no indication in the materials provided to the defense that the NYSP ever
performed an internal validation study for the use of STRmix on casework samples developed at
the NYSP lab.
26. There have been a series of draft affidavits from Dr. Buckleton on the left hand
fingernail scrapings. The most recent affidavit was dated April 14, 2016.
27. The LR statistics for the left hand fingernail scrapings have ranged from ten million
to ten thousand, depending on how a phenomenon called ‘stutter’ is treated. The most recent LR
statistic was generated after the adoption of a forward stutter model. STRmix now reports the LR
to be roughly 300,000 as to Mr. Hillary.
B. Conventional DNA Analysis
28. Understanding probabilistic genotyping software like TrueAllele and STRmix
requires an understanding of general principles of DNA testing. DNA is a molecule containing
genetic material that codes for the unique physical characteristics of human beings. An
individual inherits half of his DNA from his mother and half from his father. Each person's
DNA is unique, with the exception of identical twins.
29. DNA is composed of four chemicals called nucleotides, or bases: adenine (“A”),
cytosine (“C”), guanine (“G”), and thymine (“T”). These bases pair together in the following
way: A with T; C with G. These pairs repeat in varying lengths and form the rungs on the
double helix that constitutes the DNA molecule. The double helix is wound very tightly into a
chromosome.
30. A “gene” refers to a sequence of base pairs along a given portion of the DNA
double helix which codes for a certain trait. Different genes are located in different places, or
loci, along a chromosome. An allele is one of several alternative forms of a gene that occurs at
the same position on a specific chromosome. In other words, an allele is a variation in the
number of times the base pairs of DNA repeat at a particular locus on a particular chromosome.
This number of repeats varies among humans, who have two alleles at each locus of each
chromosome, inheriting one allele from each parent. Modern forensic analysis therefore focuses
on Short Tandem Repeats (STRs), i.e. the number of times the base pairs repeat at a variety of
loci along a person’s chromosomes. By measuring and comparing the number of repeats at
given loci, an analyst can distinguish one individual from another.
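The locus-by-locus comparison described above can be illustrated with a brief code sketch. The loci named below are real STR marker names, but the repeat counts are invented solely for illustration and are not drawn from this case:

```python
# Simplified sketch of comparing STR repeat counts at several loci.
# A single-source profile: each locus maps to the pair of allele
# repeat counts, one inherited from each parent.
profile_a = {"D3S1358": (15, 17), "vWA": (14, 16), "FGA": (21, 24)}
profile_b = {"D3S1358": (15, 17), "vWA": (14, 18), "FGA": (21, 24)}

def profiles_match(p1, p2):
    """Two single-source profiles match only if the allele pair
    agrees at every locus examined."""
    return all(sorted(p1[locus]) == sorted(p2[locus]) for locus in p1)

print(profiles_match(profile_a, profile_a))  # True
print(profiles_match(profile_a, profile_b))  # False (differs at vWA)
```

A disagreement at even one locus excludes the comparison profile, which is why examining more loci gives greater discriminating power.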
31. Currently, in developing a DNA profile, the NYSP examines fifteen loci, plus the
gender-determining locus, “Amelogenin.” Numbers are used to represent which alleles are
present at each locus. Each person has two alleles at each locus (one from each parent). If a
person inherits the same allele at a locus from each parent, the person is “homozygous” at that
locus; if the inherited alleles are different, the person is “heterozygous” at the locus.
32. Forensic DNA analysis is essentially a six-step process. First, DNA is extracted
from the evidence, e.g., the swabs or scrapings taken from the physical evidence. In the second
step, quantitation, the analyst measures the amount of DNA present in the sample being tested.
The third step, amplification, involves polymerase chain reaction (PCR), a process of heating and
cooling that makes millions of reproductions of DNA so that the DNA sample becomes more
robust and more easily analyzed. In the fourth step, after the DNA is amplified, a process
known as electrophoresis separates the STR fragments by size. The electrophoresis results
appear as a series of peaks on a graph, known as an electropherogram. Once the
electropherogram is generated, the analyst reviews it and draws conclusions about the DNA
sample, with a view toward developing a DNA profile, thus constituting the fifth step. In the
sixth and final step, the analyst compares a forensic DNA profile with a known DNA profile, and
draws additional conclusions.
33. DNA testing and typing can be complicated by the existence of what are called
stochastic effects. These are random fluctuations in testing results that can adversely influence
DNA profile interpretation and usually occur in low level samples. These stochastic effects
include: stutter, which is a peak that is typically one repeat unit less in size than a true allele, but
is not itself a true allele; “drop-in,” which is an allele that is not from a contributor to an evidence
sample, usually due to low levels of contamination; “drop-out,” which is the failure to detect an
allele that is actually present in a sample, due to small amounts of genetic starting material.
Furthermore, in complex mixtures, additional stochastic effects can complicate accurate
interpretation. These are: peak height imbalance (disparities of height between two peaks from
the same contributor); machine noise (background noise captured by the capillary electrophoresis
machine); degradation (random breaks in the DNA molecules); and preferential and locus-specific
amplification (more ready amplification of some DNA types versus others).
34. A DNA profile on a piece of evidence or from a crime scene can be a single
source, meaning coming from a single contributor, or a mixture. DNA mixtures arise when two
or more individuals contribute DNA to a sample. An analyst can tell that a sample contains a
mixture because more than two allele peaks will appear at one or more loci. In standard DNA
analysis, the peak heights of one contributor may stand out, and thus readily distinguish their
alleles from those of the one or more other contributors. Once an analyst determines the number
of contributors to a mixture sample, they can determine the genotype of the contributors by
grouping together the alleles with similar peak heights.
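The peak-height grouping described above can be sketched as follows. The alleles and peak heights (in relative fluorescence units) are hypothetical, and real mixture interpretation involves far more than this simple pairing:

```python
# Toy sketch of resolving a two-person mixture at one locus by peak
# height. Four allele peaks observed at one locus, as
# (repeat count, peak height); values are invented for illustration.
peaks = [(12, 1800), (15, 1750), (17, 420), (19, 390)]

def split_two_contributors(locus_peaks):
    """Pair the two tallest peaks as the major contributor's alleles
    and the two shortest as the minor contributor's."""
    ranked = sorted(locus_peaks, key=lambda p: p[1], reverse=True)
    major = sorted(allele for allele, _ in ranked[:2])
    minor = sorted(allele for allele, _ in ranked[2:])
    return major, minor

major, minor = split_two_contributors(peaks)
print(major)  # [12, 15] -> similar tall peaks: major contributor
print(minor)  # [17, 19] -> similar short peaks: minor contributor
```

When the peak heights of the contributors are not well separated, this kind of grouping breaks down, which is the problem probabilistic genotyping is meant to address.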
C. Uninterpretable Mixtures and Probabilistic Genotyping
35. Low level DNA samples and DNA mixtures of two or more contributors pose a
problem to DNA forensic analysts. In standard analysis, the peak heights of one contributor may
stand out, and thus the analyst can readily distinguish his alleles from those of the one or more
other contributors. But it is often the case, especially with relatively small contributors seen in
high sensitivity analysis, that the sample contains a soup from which each individual’s alleles
cannot be separated out and placed in a profile. In the past analysts dealt with this challenge by
calculating statistics concerning the probability of inclusion. But these statistics were general in
nature and continue to be the subject of much controversy. See William C. Thompson,
Laurence D. Mueller, and Dan E. Krane, Forensic DNA Statistics: Still Controversial in Some
Cases, THE CHAMPION, December 2012, 12-23.
36. Probabilistic genotyping software programs are designed to calculate a statistic as to
contributors of such mixtures when one could not be determined in the past. These programs use
biological modeling, statistical theory, computer algorithms, and probability distributions to
calculate likelihood ratios (LRs). LRs are the statistic calculated by these probabilistic programs,
which reflects the relative probability of a particular finding under alternative theories about its
origin. Id. In forensic DNA analysis, that LR can be stated as: the profile is x times
more likely if the prosecutor’s hypothesis is true than if the defendant’s hypothesis is true. The
prosecutor’s hypothesis typically is that the defendant and a certain number of other unknown,unrelated contributors contributed to the mixture, while the defendant’s hypothesis (which,
disturbingly, is not provided by the defendant, but by the operator of the software) is that the
same total number of unrelated people were the contributors.
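The arithmetic of a likelihood ratio can be shown in a short sketch. The two probabilities below are invented purely to demonstrate the calculation; they are not the statistics reported in this case:

```python
# Sketch of the likelihood-ratio arithmetic described above.
# P(evidence | prosecution hypothesis: defendant plus unknown contributors)
p_given_prosecution = 1e-2
# P(evidence | defense hypothesis: same number of unknown, unrelated people)
p_given_defense = 1e-7

likelihood_ratio = p_given_prosecution / p_given_defense
print(f"LR = {likelihood_ratio:,.0f}")  # LR = 100,000
# Read as: the evidence is 100,000 times more likely under the
# prosecution's hypothesis than under the defense's hypothesis.
```

The LR is only as meaningful as the two conditional probabilities fed into it; the dispute in this motion concerns how those probabilities are computed.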
D. What is STRmix?
37. STRmix, the probabilistic genotyping software at issue in this case, does not
solely rely on the science of DNA typing and interpretation. Instead, it also uses computer
science algorithms to perform “complex” mathematical and statistical calculations. See Jo-Anne
Bright, Duncan Taylor, et al., “Developmental validation of STRmix, expert software for the
interpretation of forensic DNA profiles,” Forensic Science Int’l: Genetics (accepted manuscript),
2016, p. 227 (hereinafter “Developmental validation of STRmix”). Indeed, STRmix is a
software program; it is not used for any of the other steps of DNA analysis described above.
See generally, id. It is only after a DNA analyst in a laboratory performs the regular steps of
developing a DNA profile from a sample that STRmix adds an additional step, performed not in
a lab by a trained scientist, but instead, by a person sitting at a computer screen, running a
complex computer software program. This program is designed to answer the classic question in
forensic DNA interpretation: what are the profiles of the contributors to this mixture? But
STRmix answers this old question in a new, novel, and unique way.
38. The backbone of the STRmix software system is a computing algorithm called the
Markov Chain Monte Carlo (MCMC) method of calculating probable outcomes. Id. at 233. The
implementation of the MCMC algorithm in STRmix utilizes statistical models to simulate
hypothetical true alleles while incorporating stochastic effects. Id. It then assesses those
simulated alleles and makes conclusions about what is true DNA as opposed to artifacts in a
sample. Id. Based on those conclusions, the likelihood ratio is then generated as a further
enormous number of combinations of assumptions and outcomes that arise from any mixed
sample. It would be practically impossible to do such a calculation without a computer running
sophisticated software.
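The MCMC “random walk” idea can be illustrated with a minimal Metropolis-style sampler. The target here is a standard normal distribution, a textbook stand-in; it is not STRmix’s genotype model, and the step and burn-in counts are arbitrary choices for this sketch:

```python
# A minimal Metropolis-style MCMC sampler illustrating the random-walk
# idea: propose random steps, accept or reject them against a target
# distribution, and discard early "burn-in" steps.
import math
import random

def metropolis(n_steps, seed, burn_in=1000):
    """Estimate the mean of a standard normal target via Metropolis MCMC."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for step in range(n_steps):
        proposal = x + rng.gauss(0.0, 1.0)   # propose a random step
        # Log of the acceptance ratio for a standard normal target density.
        log_accept = (x * x - proposal * proposal) / 2.0
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal                      # accept the move
        if step >= burn_in:                   # discard early "burn-in" steps
            samples.append(x)
    return sum(samples) / len(samples)

# Two runs with different random seeds give close but not identical
# estimates of the same quantity (the true mean is 0), reflecting the
# run-to-run variability inherent in any stochastic MCMC system.
print(metropolis(20000, seed=1))
print(metropolis(20000, seed=2))
```

The accuracy of such an estimate depends on how many simulations are run, how much is discarded as burn-in, and how well the underlying model is specified.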
39. The MCMC method has shortcomings, however. First, it does not report results
for all of the calculations performed. Instead, it discards a certain number of calculations made
at the beginning of the analysis. The discarded data is called “burn-ins.” Id. at 234. These
“burn-ins” could contain probative DNA, but are discarded because, theoretically, they are
expected to be less likely to lead to probative results. Id. at 234.
40. The remaining data simulations are all calculated toward the ultimate result. Each
simulation is called a “random walk,” whereby the algorithm running on the software is
programmed to randomly make an assumption about the data which, in turn, leads to the ultimate
conclusions. Id. at 235. In order to generate results, this random simulation is performed
thousands, or even millions, of times for a single mixture sample. This randomness is the key to
the application of the MCMC algorithm to probabilistic genotyping. By randomly simulating a
supposedly sufficient subset of all possible outcomes, MCMC can generate an informed
probability distribution for the genotypes of the individuals whose DNA together composes a
mixed DNA sample. Critics of STRmix have expressed concern, however, that by not
considering all of the data, the correct answer to the likelihood ratio question could be
overlooked or miscalculated. The probability distribution of the MCMC algorithm could
compound this inaccuracy by overly weighting an inaccurate or incorrect answer. If the path that
the “random walk” takes, or the models that guide it, are imprecise or poorly defined, then the
probabilities produced by STRmix could not only be inaccurate themselves, but also, by the
nature of the MCMC algorithm, not reproducible.
41. The developers of STRmix acknowledge in fact that “[t]he results of no two
analyses will be completely the same using a stochastic system like MCMC. This is a
phenomenon that is relatively new to forensic DNA interpretation, which up until this point has
always had the luxury of, at least theoretically, completely reproducible interpretation
results.” “Developmental validation of STRmix,” p. 233.
Disagreement Amongst Forensic Scientists About STRmix and Other Similar
Software Programs
42. There is no agreement within the forensic community about which probabilistic
software programs or methods to employ when analyzing low template DNA or complex
mixture samples. To date, there are eight different probabilistic genotyping software programs in
the country. In New York state alone, courts have considered three different probabilistic
software programs that use different methodologies in their analysis. These programs include:
STRmix, The Forensic Statistical Tool (FST), and TrueAllele. They vary in the manner in which
they collect data, the necessary assumptions they make to perform their statistical calculations,
and the actual underlying mathematical principles used to make these calculations.
43. STRmix has thus far been accepted by only one trial court as passing the Frye
standard in New York state. See People v. Bullard-Daniel, Ind. No. 2015-88 (Niagara Co. March
10, 2016). STRmix was developed by the Institute for Environmental Science and Research
(ESR) in New Zealand. The program relies on analysts to collect the data by reviewing the
electropherograms (epgs) developed in a case and discarding the peaks below the lab’s analytic
threshold, See Duncan Taylor, Jo-Anne Bright, and John Buckleton, The Interpretation of Single
Source and Mixed DNA Profiles, Forensic Sci Int’l: Genetics 7 519 (2013). Artifacts like pull-up
and forward stutter are also removed manually. Id. Most probabilistic software programs assume
that allele “drop-out” occurs at a predictable rate, but differ on how to determine that rate.
“Drop-out” is a stochastic effect that occurs when an allele is not seen in a given DNA profile,
though the analyst would have expected to see the allele. The “drop-out” rate is an important
assumption these programs make to calculate a statistic. STRmix calculates the “drop-out” rate
in a sample based on peak height variances. Peak height variability in the observed profile is
analyzed having regard to the peak height variability experienced in the laboratory generally.
STRmix utilizes MCMC algorithms to perform the required statistical calculation.
44, The Forensic Statistical Tool (FST) is another probabilistic software program
used in New York State that has been the subject of Frye hearings. See e.g. People v. Rodriguez,
N.Y. Co. Ind. No. 5471/2009 (Sup. Ct. N.Y. Co. October 24, 2013) (FST passes the Frye
standard). But see People v. Collins, 49 Misc.3d 595 (Kings Co. Sup. Ct. 2015) (Dwyer, J.) (FST
fails the Frye test). It was developed in-house at the Office of Chief Medical Examiner (OCME)
of the City of New York by Dr. Theresa Caragine and Dr. Adele Mitchell and is used in all
complex mixture cases in New York City. Similar to STRmix, FST relies on analysts to collect
the data used in the calculations and analysis. An analyst reviews the electropherogram and
determines whether alleles are present at each locus by utilizing the lab’s analytic threshold. The
analyst inputs this information, along with a known suspect profile, into the FST software. The
analyst then sets the parameters for running the program, including whether the mixture contains
two or three contributors. The software then outputs a “Forensic Statistic Comparison Report,”
summarizing the data that was input and indicating the resultant likelihood ratio.
45. FST differs from STRmix in how it calculates the LR. Unlike STRmix, FST does
not use MCMC algorithms in making these calculations, instead relying only on Bayesianstatistics. Bayesian statistics describe the probability of an event, based on conditions that might
be related to the event. See John Butler, Forensic DNA Typing: Biology, Technology, and
Genetics of STR Markers 459 (Second Ed., Elsevier Academic Press 2005). FST also calculates
the “drop-out” rate differently than STRmix. FST calculates the allelic “drop-out” based on the
quantitation values of given DNA samples, rather than peak height variation.
46. TrueAllele is the third different program that has been the subject of a Frye
hearing, but even it differs significantly from STRmix and FST in the manner in which it
collects, interprets, and calculates the data. See People v. Wakefield, 47 Misc.3d 850 (Sup. Ct.
Schenectady Co. Feb. 9, 2015). TrueAllele was developed by Cybergenetics of Pittsburgh, PA
under the direction of Dr. Mark Perlin. TrueAllele is a fully continuous probabilistic approach
that analyzes the epgs and considers the genotypes at every locus of each contributor, taking into
consideration the mixture weights of the contributors, the DNA template mass, polymerase chain
reaction (PCR) stutter, relative amplification, DNA degradation, and the uncertainties of all these
variables.
47. Unlike FST and STRmix, TrueAllele does not rely on an analyst's interpretation
of what constitutes a true allele by using analytical thresholds dictated by laboratory protocol in
order to collect its data. TrueAllele instead considers all the data present in the sample, even
those peaks below the lab's analytic threshold. In essence, the calculations made by TrueAllele
are based upon more information than is used by FST and STRmix. Unlike FST, TrueAllele
accounts for "drop-out" rates as a function of peak heights and peak height ratios seen in the
sample rather than based on the quantity of DNA in the sample. Like STRmix, but unlike FST, it
uses MCMC algorithms to calculate a likelihood function that compares genotypes relative to a
population and computes a match LR. See Wakefield, 47 Misc.3d at 859.

E. The Standards of the Computer Science Community Have Been Ignored by STRmix.
48. While new to forensic biology, MCMC algorithm processing has
long been used by computer scientists. Programs using MCMC, like all other commercial
software programs, universally undergo a rigorous validation process within the computer
science community before being accepted by that community for public use. STRmix's creators
have avoided the validation process of the computer science community, however, focusing
exclusively on the forensic biology component of the program.
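For illustration only — STRmix's source code is closed, so the following does not reproduce its implementation — the MCMC technique referenced above can be sketched with a minimal Metropolis sampler of the textbook variety: the program proposes random moves and accepts or rejects them so that, over many iterations, the samples approximate a target probability distribution.

```python
import math
import random

# Minimal Metropolis MCMC sampler for a standard normal target density.
# A generic textbook sketch, not STRmix's (non-public) code.

def target(x: float) -> float:
    """Unnormalized standard normal density."""
    return math.exp(-0.5 * x * x)

def metropolis(n_samples: int, step: float = 1.0, seed: int = 0) -> list:
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.uniform(-step, step)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < target(proposal) / target(x):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis(50_000)
mean = sum(samples) / len(samples)
print(round(mean, 2))  # close to 0, the mean of the target distribution
```

Because the output is a collection of random samples rather than a deterministic number, verifying that such a program faithfully implements its statistical model requires exactly the kind of software testing discussed in the paragraphs that follow.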
49. As set forth in the annexed affidavit of Nathaniel Adams, the failure to
demonstrate that the development of the STRmix software system was in accordance with software
engineering industry standards is devastating to the reliability of its results. Mr. Adams is a
Systems Engineer at Forensic Bioinformatic Services, Inc. in Fairborn, Ohio. His duties include
analyzing electronic data generated during the course of forensic DNA testing; reviewing case
materials; reviewing laboratory protocols; and performing calculations of statistical weights,
including custom simulations for casework and research. See Affirmation of Nathaniel Adams
(annexed hereto) (Adams Aff.), ¶ 1.
50. As Mr. Adams points out, the likelihood ratio calculation done by STRmix has no
"ground truth." In other words, the likelihood ratio is totally dependent upon assumptions made
when modeling stochastic processes, see supra. It provides no benchmark from sample to sample
or test to test. Therefore, Mr. Adams observes, "we must base our confidence in the program on
two factors: (a) The appropriateness of the models used. This factor is generally within the
domain of biologists and statisticians; and (b) The degree of fidelity with which these models
have been translated from theory/concept to source code for execution as a software program.
This factor is generally within the domain of software developers/engineers." Adams Aff., ¶ 6.

51. For a software system to be considered validated by the software engineering
community, it must be tested and validated using relevant industry standards. See Adams Aff. ¶¶
7-8. Standards for software development and validation are published by several internationally
recognized organizations within that community, including the Institute of Electrical and
Electronics Engineers (IEEE), the ISO, and the Association for Computing Machinery. See
Adams Aff. ¶ 7.
52. IEEE has published verification and validation guidelines for new software
coming on the market. See Adams Aff. ¶ 17. These systems are to be tested to make sure they
perform as expected in a general sense, and also to make sure that there are no latent or dormant
"bugs" in the system which could appear in later uses to devastating effect. See Queensland
authorities confirm 'miscode' affects DNA evidence in criminal cases, supra. In addition to
making the software available for independent testing, the maker of new software should also
provide documentation concerning internal testing and validation. For software to pass
validation and verification in accordance with IEEE and similar standards, the creator of the
software must demonstrate that it has gone through this rigorous process. See Adams Aff. ¶¶
12-13, and attached guide to "The Software Development Process."
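The verification step such guidelines contemplate can be sketched with a hypothetical example (the function and its values below are invented for illustration; they come from neither STRmix nor the Adams affidavit): a function is checked against values computed by hand from its specification, so that a "miscode" is caught before the software is ever used on real evidence.

```python
# Hypothetical sketch of software verification: compare a function's
# output against values worked out by hand from the specification.
# A divergence signals a coding error ("miscode") of the kind that
# independent testing is designed to catch.

def combined_probability(p_locus1: float, p_locus2: float) -> float:
    """Spec: loci are treated as independent, so probabilities multiply."""
    return p_locus1 * p_locus2

def verify() -> None:
    # Expected values hand-computed from the specification.
    cases = [
        ((0.1, 0.2), 0.02),
        ((0.5, 0.5), 0.25),
        ((1.0, 0.3), 0.3),
    ]
    for args, expected in cases:
        got = combined_probability(*args)
        assert abs(got - expected) < 1e-12, f"miscode: {args} -> {got}, expected {expected}"

verify()
print("all verification cases passed")
```

A single mistyped operator in the function above would make every case fail loudly; without such tests, the same mistake would silently corrupt every result the program ever reported.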
53. Validation and verification are essential to the reliability of probabilistic
genotyping systems like STRmix. As the geneticists Christopher D. Steele and David J. Balding
point out, although probabilistic genotyping has promise, before it can be used by forensic
biologists, the underlying computer science must be validated:

Laboratory procedures to measure a physical quantity such
as a concentration can be validated by showing that the measured
concentration consistently lies within an acceptable range of error
relative to the true concentration. Such validation is infeasible for
software aimed at computing an LR because it has no underlying
true value (no equivalent to a true concentration exists). The LR
expresses our uncertainty about an unknown event and depends on
modeling assumptions that cannot be precisely verified in the
context of noisy CSP [crime scene profile] data.

Some progress can be made in evaluating the validity and
performance of software. Courts need these kinds of evaluations to
have confidence in the results of software-based forensic analyses.
Open source software is highly desirable in the court environment
because openness to scrutiny by any interested party is an
invaluable source of bug reports and suggestions for improvement.

C.D. Steele and D.J. Balding, "Statistical evaluation of forensic DNA profile evidence,"
Annu. Rev. Stat. Its Appl., vol. 1, pp. 361-384, 2014. See also Adams Aff. ¶ 5.
54. Here, STRmix is neither "open source" nor "open to scrutiny." Although
STRmix has internally validated its procedures through biologists and statisticians, and has
endeavored to follow the forensic DNA guidelines outlined by SWGDAM, it has not been
demonstrated to have undergone any validation process described by IEEE, ISO, or the
Association for Computing Machinery. See Adams Aff. ¶ 11. The result is that source code
errors have called into question STRmix statistics used to obtain criminal convictions. See
Queensland authorities confirm 'miscode' affects DNA evidence in criminal cases, The Courier
Mail, Mar. 21, 2014.*
* Available at
http://www.couriermail.com.au/news/queensland/queensland-authorities-confirm-miscode-affects-dna-evidence-in-criminal-cases/news-story/833c580d3#1c59039efdladefSSaf92b.
MEMORANDUM OF LAW
POINT I

STRMIX IS NOT GENERALLY ACCEPTED;
EVIDENCE RELATED TO RESULTS OBTAINED THROUGH ITS USE
SHOULD BE PRECLUDED UNDER FRYE; IN THE
ALTERNATIVE, THERE SHOULD BE A HEARING
A. THE APPLICABLE LAW
New York courts have adopted the test set forth in Frye for the admission of scientific
evidence. Wesley, 83 N.Y.2d at 422-23 (citing Frye v. United States, 293 F. 1013, 1014 (D.C.
Cir. 1923)). The Frye test poses the elemental question of "whether the accepted techniques,
when properly performed, generate results accepted as reliable within the scientific community
generally." People v. LeGrand, 8 N.Y.3d 449, 457 (2007) (internal quotations omitted). As the
Frye court explained:

Just when a scientific principle or discovery crosses the line between the
experimental and demonstrable stages is difficult to define. Somewhere in this twilight
zone the evidential force of the principle must be recognized, and while courts will go a
long way in admitting expert testimony deduced from a well-recognized scientific
principle or discovery, the thing from which the deduction is made must be sufficiently
established to have gained general acceptance in the particular field in which it belongs.

293 F. at 1013-14.
Unanimous endorsement by the scientific community is not required, but there must be
“general acceptance in the relevant scientific community that a technique or procedure is capable
of being performed reliably.” Wesley, 83 N.Y.2d at 435 (Kaye, C. J., concurring). As Judge
Kaye admonished in her concurring opinion in Wesley:
Its not for a court to take pioneering risks on promising new scientific
techniques, because premature admission both prejudices litigants and short-circuits
debate necessary to determine the accuracy of a technique. Premature acceptance of
“revolutionary” forensic techniques has led to wrongful convictions. Id. at 437, n. 4
2General acceptance of novel scientific evidence may be demonstrated through expert
testimony, judicial opinions, and/or scientific and legal writings. Lahey v. Kelley, 71 N.Y.2d
135, 144 (1987); People v. Middleton, 54 N.Y.2d 42, 49-50 (1981). The determination of
whether a scientific principle or technique is generally accepted in the relevant scientific
community "emphasizes counting scientists' votes, rather than . . . verifying the soundness of a
scientific solution." Wesley, 83 N.Y.2d at 432 (Kaye, C.J., concurring) (emphasis added); see
LeGrand, 8 N.Y.3d at 457 (same). Thus, "Frye is not concerned with the reliability of a certain
expert's conclusions, but instead with whether the experts' deductions are based on principles
that are sufficiently established to have gained general acceptance as reliable." Nonnon v. City of
New York, 32 A.D.3d 91, 103 (1st Dept. 2006) (internal quotations and citations omitted). The
proponent of the disputed evidence shoulders the burden of proving general acceptance in the
relevant scientific community. People v. Rosado, 25 Misc. 3d 380, 384 (Sup. Ct. Bronx Co.
2009) (citing Zito v. Zabarsky, 28 A.D.3d 422 (2d Dept. 2006)).
B. STRmix Does Not Produce Results That Are Generally Accepted As Reliable
Within the Relevant Scientific Community
i. Probabilistic Genotyping Was Not Contemplated by Wesley; It is Not
"Generally Accepted" by the Forensic Biology or DNA Analysis Community
To be sure, traditional DNA testing and "visual comparison" have been admissible under
Frye in New York since 1993. See Wesley, supra. However, the testing here, using novel
probabilistic genotyping software, is far from traditional, and its admissibility is anything but
well-settled. The present posture of probabilistic genotyping in the forensic science field is
nothing like what was presented in Wesley, and, thus, that case does not answer the Frye
question here.
First, when considering whether forensic DNA analysis was admissible, the Wesley court
was presented with only one method of testing. Here, there are numerous methods of
probabilistic genotyping used in different labs, none of which has the backing of a general
consensus of the forensic science community. Although the technique was novel at the time, the
court in Wesley was not presented with alternate methods of creating and comparing single
source profiles. In this case, however, within New York State alone, labs use three different
probabilistic genotyping software programs, which collect, analyze, and calculate the data from
the DNA sample in different ways. There is no agreement within the forensic community as to
which method or combination of methods is best to carry out this type of testing and which
provides the most accurate conclusions. Without such an agreement, it cannot be said that
STRmix meets the Frye standard.
Second, Wesley never contemplated a DNA analysis method as complex and opaque as
STRmix's probabilistic genotyping method. In Wesley, the DNA method of comparison that was
accepted by the court was a side-by-side visual comparison of dark bands on two DNA print
patterns to see if they match. Once a match was found, the frequency with which a specific
allele occurs within a specific population was determined. See Wesley, supra.

STRmix and other probabilistic genotyping software programs go well beyond this.
These programs use complicated statistical models and algorithms to calculate the presence of
alleles that are not seen in a sample and predict the existence of genotypes in a mixture sample
that cannot be determined by looking at an electropherogram. They apply mathematical
principles of modeling that, although they may have value in certain fields, push the boundaries
of acceptance in the field of forensic DNA analysis. See e.g. People v. Collins, 49 Misc.3d 595
(Kings Co. Sup. Ct. 2015) (Dwyer, J.) ("Further, the fact that FST software is not open to the
public, or to defense counsel, is the basis of a more general objection"). Given that the
techniques used by STRmix go beyond the techniques accepted by the court in Wesley, Wesley
provides little support in determining whether STRmix meets the Frye standard.
ii. Since STRmix Combines DNA Analysis with Computer Science Principles, it
Cannot be Considered Generally Accepted As It Has Disregarded Computer
Science Standards
To be admissible under Frye, evidence must be "generally accepted" not just in any
scientific community, but in the "relevant" scientific community. Wesley, 83 N.Y.2d at 423. "The
thing from which the deduction is made must be sufficiently established to have gained general
acceptance in the particular field in which it belongs." Frye, 293 F. at 1014 (emphasis supplied).
In Wesley, the Court found that standard DNA testing passed the Frye test. The Court
described that testing as a multi-step process performed in a scientific laboratory by forensic
biologists. Wesley, 83 N.Y.2d at 423-24. At the conclusion of the testing, the forensic scientist
-- a live person -- made a visual comparison of the DNA on the sample to that of the suspect in
order to determine whether there was a "match." Id. at 425. The Court found, based on the
hearing record developed below, that this manner of DNA testing was generally accepted within
the community of forensic biologists and DNA analysts. Wesley made no mention of probabilistic
genotyping or the use of computer software to make this analysis. That Court did not consider
whether the same community that accepted "visual" DNA comparison also could do so for a
comparison that had nothing to do with DNA analysis and everything to do with computer code.
The only New York State decision applying Wesley in denying a Frye challenge to
STRmix focused solely on the acceptance of STRmix within the "community of forensic DNA
analysis." See People v. Bullard-Daniel, supra, at *15. In that case, the Court held that because a
number of forensic DNA committees and governing bodies considered probabilistic genotyping
software like STRmix reliable, it was "generally accepted" under Frye. Id. Although the
Court also mentioned the use of MCMC and the underlying algorithm of STRmix as being
considered by these DNA scientists, it overlooked the most salient issue: that forensic biologists
and even population geneticists are not computer scientists. While they may be able to expound
on their respective communities' acceptance of a program like STRmix, these scientists cannot
say anything about whether experts in computer algorithms and software engineering would draw
the same conclusion. In fact, given STRmix's complete failure to go through any software
validation as prescribed by IEEE, STRmix would not be accepted within the community for
which it, as a computer program, is best suited: the computer science community.
It is inappropriate for forensic scientists with little or no formal training or experience in
software engineering or development to claim that their software program has been validated
from a software engineering perspective. The importance of validating a computer program, and
having it accepted by computer scientists before use in court, is as obvious as it is essential. It is
obvious because STRmix, by its own admission, is a computer program. Computer program
experts, thus, are the scientists of the "particular field" appropriate to determine whether
STRmix is accepted or rejected. Meanwhile, it is essential that the computer programming be
validated by computer scientists so that any "bugs" in the system can be caught and corrected
before it is used in forensic casework.
The need for validation can be demonstrated with an analogy. Microsoft Word is used to
write, among other things, legal briefs. Lawyers research and write those briefs. The content of
those briefs is reviewed by other lawyers, by adversaries, and by the Court. The content of
those briefs can be considered "legal writing," and all within the legal community likely could
identify the type of law discussed in the brief. The law and its application, thus, would be akin to
being "generally accepted." On a more basic level, the spelling of words and basic grammar are
known to all lawyers and can be accepted as "ground truths" that govern such writings.

But no lawyer (unless she also is a software engineer) could actually say anything
meaningful about the use of Microsoft Word to write those briefs. Lawyers could observe that
Microsoft Word occasionally crashes, bugs emerge, and files of hard work can be corrupted or
destroyed. The lawyer knows when this happens: an error message appears, or a cursor stops
moving. But a lawyer cannot say whether those bugs represent a critical error in Microsoft
Word's code. A lawyer cannot make any evaluation of Microsoft Word's method of processing
typed data. Indeed, a lawyer who is not also a computer scientist could say nothing about how
well Microsoft Word works and whether it works in a way that is like other word processors or
critically different. A lawyer can know only when a basic "ground truth" is violated, as when a
word is misspelled.
Asking a forensic biologist or DNA analyst to accept STRmix as valid software is no
different than asking a lawyer to accept Microsoft Word as such. Neither has any meaningful
way of knowing whether the respective program is actually good and solid engineering, or has
fatally flawed code. In fact, the forensic scientist using STRmix is in an even worse position
than the attorney to evaluate her software, because she wouldn't even have a set of "ground
truths" to rely upon to monitor the reliability of the program in real time or even years down the
road. An attorney would immediately know that Microsoft Word didn't pick up a simple
spelling error; a forensic scientist would never know if the software miscalculated the likelihood
ratio for a given mixture.
Accordingly, because forensic biologists work in a totally different field, with totally
different standards and processes for validation, they are not the "particular" scientific
community needed to demonstrate general acceptance of STRmix. Computer scientists are that
"particular" group, and they have not yet weighed in.
POINT II

THIS COURT SHOULD PRECLUDE THE STRMIX RESULTS
BECAUSE THE METHOD WAS UNRELIABLY APPLIED TO THE
EVIDENCE SAMPLE IN THIS CASE
A. Introduction
Even if this Court finds that STRmix methodology is generally accepted as reliable in the
scientific community, that methodology was not reliably applied in this case. This Court should
preclude the STRmix results.
A forensic method must be validated before it is used in casework. Validation
demonstrates that a method is reliable. It also establishes the limits of the method and the
conditions under which the method will produce reliable results.
The application of STRmix to the fingernail scrapings sample 550C2 exceeded the limits
established in validation. The program was used in this case on a mixture ratio that was much
more extreme than any ratio tested in its validation. STRmix calculated the percentages of the
major and minor contributors to the fingernail scrapings sample to be 99 and 1, respectively,
when the sample was analyzed in the November 24, 2015 and December 3, 2015 analysis runs of
STRmix. See Nov. 24, 2015 STRmix Summary of Input Data Report at 1; Dec. 3, 2015
Summary of Data Report at 1. When STRmix was run on December 23, 2015, and then again on
April 4, 2016, the program calculated the percentages as 100 to 0. See Dec. 23, 2015 STRmix
Summary of Input Data Report at 1; Apr. 4, 2016 STRmix Summary of Input Data Report at 1.
Yet, STRmix has not been validated for use with such extreme mixture ratios.
Although the overall amount of DNA in the 550C2 sample is robust (it was tested in 1,
1.5, and 2 nanogram (ng) amounts2), the portion contributed by the minor contributor is very
small, and would be termed "low-template," "low-level," or trace DNA. Dr. Buckleton
characterizes the minor contributor to 550C2 as "at a very low level." Apr. 14, 2016 Buckleton
Affidavit. Even if STRmix can be reliably applied to samples where the contributors are present
in high template amounts, it is unreliable for low level components at such extreme ratios. DNA
present in such small amounts exhibits stochastic, or random, effects, including peak height
fluctuation, drop out, and drop in. The DNA peaks from such low level contributors may be
indistinguishable from stutter, a fact which complicated the interpretation in this case.
Although STRmix purports to account for these stochastic effects, it does not do so
reliably when applied to a sample with characteristics like 550C2, as illustrated by the
chronology of testing and reporting of results by ESR. ESR's interpretation of this sample has
evolved over the course of months, leading to different results. When one of the STRmix
developers first looked at the data, she observed that "[t]hese are at the very edge of suitability of
any interpretation method." Nov. 3, 2015 email from Jo Bright to John Buckleton, William
Fitzpatrick et al.
Despite this, ESR decided it would run the sample in STRmix. After ESR had already run
and generated one result with STRmix in November 2015, ESR requested data from NYSP to
use in its interpretation—which the NYSP did not have. Nevertheless, ESR ran STRmix again in
December 2015, reporting the results in Dr. Buckleton's December 7, 2015 affidavit. After this
2 For points of reference, one human cell has 6.6 picograms (pg) of DNA in the nucleus. There are a thousand pg in
a nanogram (ng). One definition of low copy number DNA is the testing of DNA in amounts under 100-200 pg.
second set of results was generated, ESR determined it would need to implement a new
model in its calculations, one which accounted for a phenomenon known as forward stutter,
discussed below. ESR tried to do this with data NYSP provided, but resorted instead to reporting
two statistics which differed by three orders of magnitude.
Finally, after STRmix had been run several times and the resulting statistics reported in
Buckleton’s affidavits, ESR applied a new forward stutter model and produced an entirely new
statistic.
This constant revision and admitted failure to adequately model the observed data at the
outset at the very least warrants careful inquiry at a hearing before this Court can be satisfied that
the STRmix results should be admitted.
B. Legal Standard
It is fundamental that a court has the responsibility to exclude irrelevant or unreliable
evidence from the case. "It is incumbent upon the proponent of expert scientific testimony to lay
a proper foundation establishing that the processes and methods employed by the expert in
formulating his or her opinions adhere to the accepted standards of reliability within the field."
People v. Hyatt, 2001 N.Y. Slip Op. 50115(U), citing People v. Wilson, 133 A.D.2d 179 (N.Y.
App. Div. 1987). When dealing with a methodology that is generally accepted, “[t]he focus
moves from the general reliability concerns of Frye to the specific reliability of the procedures
followed to generate the evidence proffered and whether they can establish a foundation for the
reception of the evidence at trial." Parker v. Mobil Oil Corp., 7 N.Y.3d 434, 447 (2006), citing
People v. Wesley, 83 N.Y.2d 417, 429 (1994); People v. Middleton, 54 N.Y.2d at 45; People v.
Wesley, 83 N.Y.2d at 428 (testimony must not only demonstrate the acceptance of forensic DNA
profiling evidence by the relevant scientific community and its reliability, but must also
demonstrate the "admissibility of the specific evidence -- i.e., the trial foundation -- how the
sample was acquired, whether the chain of custody was preserved, and how the tests were
made").
In Parker v. Mobil Oil Corp., supra, a toxic tort case, the Court of Appeals had to
answer the question "as to whether the methodologies employed by [plaintiff's] experts lead to a
reliable result--specifically, whether they provided a reliable causation opinion without using a
dose-response relationship and without quantifying Parker's exposure." Id. at 447. Concluding
that in that case there was not a novel methodology at issue necessitating a Frye inquiry, the
Court described the issue before it "as more akin to whether there is an appropriate foundation
for the experts' opinions, rather than whether the opinions are admissible under Frye." Id. See
also Wesley, supra, Kaye, J., concurring, fn. 2 at 436 ("Our cases have always required a
foundational inquiry before scientific evidence can be admitted (see, e.g., People v. Middleton,
54 N.Y.2d, at 45, 444 N.Y.S.2d 581, 429 N.E.2d 100, supra), even after a particular technique
has passed out of the "twilight zone" of "novel" evidence that is the subject of Frye and is
judicially noticed as reliable (see, People v. Knight, 72 N.Y.2d 481, 487, 534 N.Y.S.2d 353, 530
N.E.2d 1273 [radar speed detection]; People v. Campbell, 73 N.Y.2d 481, 485, 541 N.Y.S.2d
756, 539 N.E.2d 584 [blood alcohol content test]; People v. Mertz, 68 N.Y.2d 136, 148, 506
N.Y.S.2d 290, 497 N.E.2d 657 [same]; People v. Freeland, 68 N.Y.2d 699, 701, 506 N.Y.S.2d
306, 497 N.E.2d 673 [same]; Pereira v. Pereira, 35 N.Y.2d 301, 307, 361 N.Y.S.2d 148, 319
N.E.2d 413 [polygraph test used for investigative purposes])."). See also People v. Seda, 139
Misc.2d 834, 841 (N.Y. Cty. 1988) (Carey, J.) (“Dr. Shaler’s testimony also revealed that
contrary to the requirements of the laboratory manual he had devised for electrophoretic analysis,
he had failed to record any of the parameters of the analysis he performed inasmuch as he acted
as his own ‘quality control’ and, in the event of any irregularities, would have repeated the
analysis"); People v. Castro, 144 Misc.2d 956, 976 (Bronx Cty. 1989) (Sheindlin, J.) (finding
DNA forensic identification technique was generally accepted, but that a hearing was necessary
to determine whether the lab had conducted scientifically acceptable tests, finding after a pretrial
hearing, that the “testing laboratory failed in several major respects to use the generally accepted
scientific techniques and experiments for obtaining reliable results” and ruling the evidence
inadmissible).
If this Court denies Mr. Hillary's Frye challenge, it must then answer this foundational
inquiry. Should this Court find that STRmix satisfies the Frye standard, the defense respectfully
requests that this Court conduct an inquiry into the reliability of the evidence and conduct a
hearing to answer this foundational question. See Wesley, supra, Kaye, J., concurring, fn. 2 at 436
("Frye hearing and foundational inquiry may proceed simultaneously....").
C. STRmix has not been adequately validated for use with mixture ratios as extreme
as those in this case

1. Extreme mixture ratios present interpretational challenges
A mixture ratio is defined as "the relative ratio of the DNA contributions of multiple
individuals to a mixed DNA typing result, as determined by the use of quantitative peak height
information (SWGDAM 2010)." JOHN BUTLER, ADVANCED TOPICS IN FORENSIC
DNA TYPING: INTERPRETATION, p. 136 (Elsevier 2014). As STRmix reported in this case,
the relative contributions may also be expressed as percentages of the overall sample mixture
(e.g., 99 percent and 1 percent). See id.
The amount of DNA each individual contributes to the mixture affects the ability to
detect all of the contributors to the mixture. A "minor component of a mixture is usually not
detectable for mixture ratios below the 5% level or 1:20." Furthermore, when a "minor
component is at a low level it is subject to stochastic effects . . ." which significantly impact
interpretation and statistical weighting. Id. at 137.
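The arithmetic behind mixture proportions and the 5% (1:20) detectability rule quoted above can be sketched as follows. The peak heights are hypothetical values chosen for illustration, not measurements from any sample in this case:

```python
# Sketch of estimating mixture proportions from peak heights (in
# relative fluorescence units). Heights here are hypothetical. Per the
# rule quoted above, a minor component below roughly 5% (1:20) is
# usually not detectable and is subject to stochastic effects.

def mixture_proportions(major_height: float, minor_height: float):
    """Return (major, minor) proportions of the total signal."""
    total = major_height + minor_height
    return major_height / total, minor_height / total

major, minor = mixture_proportions(1980.0, 20.0)
print(f"{major:.0%} / {minor:.0%}")               # 99% / 1%
print("minor likely detectable:", minor >= 0.05)  # False -- below 1:20
```

At a 99:1 split, the minor contributor's share of the signal is well below the 1:20 level at which a minor component is usually detectable at all.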
Critically, it can be difficult in a mixed sample to distinguish between testing artifacts
and a low level peak from a minor contributor. Stutter is an artifact of DNA testing. Stutter
appears on an electropherogram as a small peak to the immediate left (or sometimes immediate
right) of the true allele. When it occurs at one repeat less than the true allele, it is known as "-4
stutter." When a stutter peak appears to the right of a true allele, it is known as "+4 stutter."
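The back (-4) and forward (+4) stutter positions just described — one repeat unit below or above a true allele — can be sketched as a simple position check. The allele designations below are hypothetical, for illustration only:

```python
# Sketch of stutter positions at an STR locus. "-4" (back) stutter sits
# one repeat unit below the true allele; "+4" (forward) stutter sits one
# repeat above it. Allele designations are hypothetical.

def stutter_positions(true_allele):
    return {"back (-4)": true_allele - 1, "forward (+4)": true_allele + 1}

def in_stutter_position(candidate, true_alleles) -> bool:
    """Is a candidate peak one repeat above or below any true allele?"""
    return any(candidate in stutter_positions(a).values() for a in true_alleles)

print(stutter_positions(12))          # {'back (-4)': 11, 'forward (+4)': 13}
print(in_stutter_position(13, [12]))  # True -- forward stutter position
print(in_stutter_position(15, [12]))  # False
```

The check only tells an analyst that a peak *sits* in a stutter position; as the text explains, whether such a peak is an artifact or a true minor-contributor allele cannot be resolved by position alone.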
To illustrate this, consider the images below. Stutter will often appear in a profile like:

The small blip to the left of the larger peak is stutter in a conventionally typed profile.
Compare with two loci from the 550C2 sample in this case:

Stutter complicates interpretation in different ways. Stutter and a real allele may also
overlap. Stutter can appear to be a real allele in a complex mixture with peaks present at a range
of heights. This is particularly true when the minor component alleles are present at a height in
the same range as an expected stutter peak. "When minor alleles have peak heights that are
similar in amount to stutters of major alleles, then these stutter peaks and minor alleles are
indistinguishable and may need to be accounted for in the interpretation of the
profile . . ." [citations omitted]. JOHN BUTLER, ADVANCED TOPICS IN FORENSIC DNA TYPING:
INTERPRETATION, p. 319 (Elsevier 2015).
This is the case for the sample at issue here. Dr. Buckleton, in describing the need for a
forward stutter model, agrees: "When a peak is in a forward stutter position it is sometimes
difficult to ascertain whether it is allelic or an artifact." Apr. 14, 2016 Buckleton Aff. See also
Bruce Budowle, et al., Mixture Interpretation: Defining the Relevant Features for Guidelines for
the Assessment of Mixed DNA Profiles in Forensic Casework, J. Forensic Sci., July 2009, Vol.
54, No. 4. Therefore, the extreme mixture ratio in this case makes it difficult to determine
whether a peak is truly from a minor contributor or is a stutter peak.

In fact, Dr. Buckleton requested data from the New York State Police to model forward
stutter. As Dr. Buckleton stated in his affidavits, six alleles attributed to Mr. Hillary are in the
forward stutter position.
2. The mixture proportions/ratios are extreme and STRmix has not been
adequately validated for use on them
The mixture proportions in this case were determined by STRmix to be 99% and 1%,
respectively, when the sample was analyzed in the November 24, 2015 and December 3, 2015
analysis runs of STRmix. See Nov. 24, 2015 STRmix Summary of Input Data Report at 1; Dec.
3, 2015 Summary of Data Report at 1. When STRmix was run on December 23, 2015, and then
again on April 4, 2016, the program calculated the percentages as 100% to 0%.
Before a method may be used in a forensic laboratory, it must be validated to ensure it is
reliable. See FBI Quality Assurance Standards (QAS) 8.1, 8.2 and 8.3, available at
http://www.fbi.gov/about-us/lab/biometric-analysis/codis/qas_testlabs. Last visited May 26,
2016.
There are two types of validation, developmental and internal. Developmental validation
is defined under the FBI Quality Assurance Guidelines as “the acquisition of test data and
determination of conditions and limitations of a new or novel DNA methodology for use on
forensic samples.” Internal validation is defined as “an accumulation of test data within the
laboratory to demonstrate that established methods and procedures perform as expected in the
laboratory.” FBI QAS, supra.
SWGDAM, a group of American and Canadian forensic scientists representing labs at the
local, state and national level, defines a developmental validation of a probabilistic genotyping
software system as “the acquisition of test data to verify the functionality of the system, the
accuracy of statistical calculations and other results, the appropriateness of analytical and
statistical parameters, and the determination of limitations. . . . Developmental validation should
also demonstrate any known or potential limitations of the system." SWGDAM Guidelines for
the Validation of Probabilistic Genotyping Systems, 2015, available at
http://media.wix.com/ugd/4344b0_22776006b67e4a32a5ffo04fe3b56515.pdf, last visited
5/14/2016.
Developmental validation should include the testing of various mixture proportions in
order to evaluate the system’s sensitivity, which measures how well the method can detect a
known contributor, and the system’s specificity, which measures how well a method excludes a
non-contributor. These studies should be conducted “over a broad variety of evidentiary typing
results (to include mixtures and low-level DNA quantities).” Id. at 5-6.
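As a sketch only (the counts below are invented for illustration and are not drawn from any actual STRmix validation study), the two measures reduce to simple proportions:

```python
# Hypothetical validation tally, for illustration only.
# Sensitivity: how often the method detects a true (known) contributor.
# Specificity: how often the method excludes a true non-contributor.

def sensitivity(true_inclusions, known_contributor_tests):
    """Fraction of known-contributor comparisons correctly included."""
    return true_inclusions / known_contributor_tests

def specificity(true_exclusions, non_contributor_tests):
    """Fraction of non-contributor comparisons correctly excluded."""
    return true_exclusions / non_contributor_tests

# Suppose a validation study ran 200 known-contributor comparisons and
# 500 non-contributor comparisons:
print(f"Sensitivity: {sensitivity(196, 200):.1%}")
print(f"Specificity: {specificity(498, 500):.1%}")
```

A validation study must report both measures across the kinds of samples, mixtures, and ratios the method will actually face in casework.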
In its developmental validation of STRmix, ESR tested a range of template amounts,
number of contributors, and ratios of contributors. This is described in the STRmix V.2.4 User
Manual. For two person mixtures, STRmix was tested at amounts of DNA from 100-500pg with
ratios of 1:1 to 5:1 (when the prosecutor’s hypothesis was true, i.e. a sensitivity study). See
STRmix V.2.4 User Manual, p. 110; Duncan Taylor, Jo-Anne Bright, John Buckleton, The
interpretation of single source and mixed DNA profiles, Forensic Sci. Int'l: Genetics 7 516-528
at 524 (2013).
More extreme ratios than 5:1 were tested several years ago. An experiment was
conducted with an unknown, earlier version of STRmix than that used in this case. The most
extreme proportion STRmix calculated for two-person mixtures in that study is 0.09 to 0.91, or 9
to 91, less than the proportions STRmix assigned in this case. Only three samples, each of a
different template, were used at this ratio, and amplified in triplicate.¹ Duncan Taylor, Using
continuous DNA interpretation methods to revisit likelihood ratio behavior, Forensic Sci. Int’l:
Genetics, 11 (2014), 144-153 at 145. This is insufficient to demonstrate that STRmix can
generate reliable statistics for a sample at the ratio present in this case.
Moreover, the NYSP lab did not conduct an internal validation of STRmix, so the lab did
not test how STRmix would perform on mixture ratios this extreme.
Not every mixture ratio which is conceivable can be tested; that would be impracticable
and unnecessary. What is necessary is to establish bounds or limits. For instance, if a lab tests
20:1 and 10:1 mixtures, it may not be necessary to test 17:1 samples, because that ratio falls
within a range that has been tested.

¹ Amplified in triplicate means that the 3 samples each underwent the amplification stage of DNA testing 3 times, so
that there were a total of 9 tests.
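The bounds reasoning above can be sketched in a few lines (the ratios are illustrative; whether the 99:1 or 100:0 proportions reported in this case fall inside any validated range is precisely the disputed question):

```python
# Illustrative sketch of the "bounds" argument: a ratio is arguably covered
# by validation only if it falls between ratios that were actually tested.

def within_validated_bounds(ratio, tested_ratios):
    """True if `ratio` lies between the smallest and largest tested ratios."""
    return min(tested_ratios) <= ratio <= max(tested_ratios)

tested = [10, 20]  # e.g., a lab validated 10:1 and 20:1 mixtures

print(within_validated_bounds(17, tested))  # True: 17:1 lies inside 10:1-20:1
print(within_validated_bounds(99, tested))  # False: 99:1 lies outside it
```

Interpolating inside a tested range is a far more modest claim than extrapolating beyond it.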
The issue of whether STRmix can be applied to such extreme ratios appears to be one of
first impression for New York courts. No such challenge appears to have been raised in Bullard-
Daniel, supra.
D. Confidence in STRmix’s ability to correctly interpret this type of sample should be
questioned
It is clear that this sample presented a challenge to STRmix and that this Court cannot be
assured that the answer reached by STRmix is accurate or was produced by reliable means.
ESR has already generated three different likelihood ratios, using different models. Modelling
should be done in the developmental stage, not during the application of the method to an actual
case. This is a function of the inability of STRmix to explain the observed data well. This
continued revision also raises the possibility of subjectivity and bias in the interpretation.
Critically, NYSP did not conduct an internal validation study in which NYSP equipment,
protocols, personnel and data were used in the testing of STRmix. First, parameters incorporated
by STRmix on which it relies to generate results did not come from the New York State Police
lab. On December 4, 2015, Dr. Bright emailed Meegan Fitzpatrick, a scientist at NYSP, and
requested drop-in data: that function had been turned off, but during technical review
(review of the results by another scientist at ESR) this was questioned. Drop-in is the detection
of sporadic alleles in a sample, the origin of which cannot be said to come from the crime scene
sample; in other words, drop-in is sporadic contamination. As the STRmix V.2.4 User Manual
states, it is like “alleles snowing from the ceiling.” Id. at 146.
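For illustration (the 1% per-locus rate below is invented; as noted above, drop-in parameters must be estimated from the individual laboratory's own data, which is what NYSP could not supply), a simple drop-in model treats each locus as independently at risk of a spurious allele:

```python
# Illustration only: the drop-in rate is invented, not an NYSP or ESR value.
# If each locus independently has a small probability of a spurious
# ("drop-in") allele, the chance of seeing at least one drop-in event
# across a multi-locus profile grows quickly with the number of loci.

def p_at_least_one_drop_in(rate_per_locus, num_loci):
    """Probability of one or more drop-in events across `num_loci` loci."""
    return 1 - (1 - rate_per_locus) ** num_loci

# With a hypothetical 1% per-locus drop-in rate over a 15-locus profile:
p = p_at_least_one_drop_in(0.01, 15)
print(f"P(at least one drop-in) = {p:.1%}")  # roughly 14%
```

Because the rate parameter drives the result, a drop-in model calibrated on another laboratory's data (or none at all) changes the probabilities the software reports.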
In response to Dr. Bright’s request, Meegan Fitzpatrick stated, “We currently do not have
any data on drop in. We do not have a low copy number protocol or validation in house to
provide any data.” Yet, for STRmix, “[d]rop-in parameters are defined individually for a
specific laboratory and are determined as part of the implementation process for STRmix within
that laboratory.” ESR requested the drop-in data and NYSP could not provide it.
In fact, other data used in modeling parameters incorporated into the STRmix statistical
analysis did not come from the NYSP at all: instead, ESR used either the default settings ESR had
developed or took them from a lab in Toronto using the same genetic analyzer and kit. See Apr.
14, 2016 Buckleton Affidavit (“Our standard operating settings and parameters for Identifiler
3500 data were applied except that values for saturation, allele and stutter variance and locus
specific amplification efficiencies. These were taken from a dataset validated using Identifiler
Plus data analyzed on a 3500 CE instrument undertaken at the Toronto Centre of Forensic
Sciences.”).
Second, certain data that were provided proved difficult to model adequately. After
already running STRmix on this sample, Dr. Buckleton developed a forward stutter model,
critical in a case where six of the eleven peaks corresponding to Mr. Hillary’s alleles are in
forward stutter positions in a mixture with an extreme ratio. To this end, on December 15, 2015,
Dr. Buckleton requested from Julie Pizziketti, Director of Biological Science at the NYSP, “100
single source [samples] with all stutter filters off" because he “hit a bit of a snag at TR for your
case. I need to firm up on forward stutter. Many of your peaks are in forward stutter positions.”
Dr. Buckleton received this data:
In order to inform this we analyzed a set of data provided by the New
York State Police. This analysis has not proven highly successful and this
is because I did not specify the required data well. I have ended up with
136 useful data points of which only 2 show forward stutters...
Jan. 3 Draft Affidavit; Dec. 23 Affidavit. When attempting to develop or evaluate a model
describing forward stutter, a sample size of two is insufficient. Again, NYSP did not conduct an
internal validation where STRmix would be tested in that lab using those protocols, equipment,
personnel, and data.
ESR should have anticipated these issues from the start: as Dr. Jo-Anne Bright, one of the
developers of STRmix and the scientist who ran the analyses in this case, communicated to the
NYSP lab director and her colleagues after reviewing the electropherograms of sample 550C2,
“[t]hese are at the very edge of the capacity of any interpretation method.” Nov. 3, 2014 email
from Jo Bright to John Buckleton, William Fitzpatrick et al. Yet the decision was made to push
the envelope anyway and test it, even without an adequate model in place. After results were
generated, ESR went back and attempted to compensate for the lack of adequate modelling.
More studies would need to be performed to ensure STRmix is reliable for use on
mixtures with such extreme ratios as those present in this case. Therefore, this Court has no basis
on which to conclude that the application of STRmix in this particular case was reliable and must
preclude the evidence.
POINT III
THE STRMIX RESULTS SHOULD ALSO BE PRECLUDED BECAUSE
THEIR PROBATIVE VALUE IS SUBSTANTIALLY OUTWEIGHED BY
THE DANGER THEY WILL PREJUDICE THE DEFENDANT
Even if this Court finds that testimony concerning the STRmix results is admissible under
the Frye/Wesley standard, it should still exclude the evidence because its probative value is
substantially outweighed by the danger it will prejudice the defendant and mislead the jury.
“Evidence is relevant if it has any tendency or reason to prove the existence of any
material fact, i.e., it makes the determination of the action more probable or less probable than it
would be without the evidence.” People v. Scarola, 525 N.E.2d 728, 732 (1988). The general
rule is that evidence which tends to prove a material fact in a case is admissible unless precluded
by an evidentiary rule. People v. Wilder, 93 N.Y.2d 352 (1999); People v. Buie, 86 N.Y.2d 501,
509 (1995). As such, “[n]ot all relevant evidence is admissible as of right. . . . Even where
technically relevant evidence is admissible, it may still be excluded by the trial court in the
exercise of its discretion if its probative value is substantially outweighed by the danger that it
will unfairly prejudice the other side or mislead the jury.” Id.
Here, testimony concerning the STRmix results will confuse and mislead the jury. There
are three STRmix results, and it is reasonable to assume that the prosecution will elicit ESR’s
account of the reasons behind those differences, which will involve the presentation of highly
technical testimony. The risk that the jury will be bewildered by the different modeling behind
the results and how they relate to one another is great.
Alternatively, this Court should hold a hearing to determine whether the probative value
of this evidence is substantially outweighed by the danger that it will unfairly prejudice the
defendant and mislead the jury.
WHEREFORE, this Court should preclude the prosecution from introducing any
evidence about or produced via the STRmix software in this case or, in the alternative, order a
Frye hearing, and grant any other relief as this Court deems just and proper.
TO:

THE HON. MARY RAIN
District Attorney, St. Lawrence County

Clerk of the Court
St. Lawrence County

Earl S. Ward
600 Fifth Avenue
10th Floor
New York, NY 10020
(212) 763-5000

Declaration of Nathaniel Adams
1. I have a Bachelor of Science degree with a major in Computer Science from Wright State
University (Dayton, Ohio). I am enrolled in the Graduate School at Wright State
University, pursuing a Master of Science degree in Computer Science. I am employed as
a Systems Engineer at Forensic Bioinformatic Services, Inc. in Fairborn, Ohio. My duties
include analyzing electronic data generated during the course of forensic DNA testing;
reviewing case materials; reviewing laboratory protocols; and performing calculations of
statistical weights, including custom simulations for casework and research. I actively
use, develop, and maintain a number of software programs to assist with these efforts. I
have been involved in several reviews of probabilistic genotyping analyses in criminal
cases, including STRmix™. In 2014 I attended a week-long workshop on interpreting
forensic DNA mixtures, including a day-long session on STRmix™. In January 2016, I was
retained in a criminal case unrelated to NY v. Hillary and inspected the source code of
STRmix™. Due to a non-disclosure agreement that I signed, I am not allowed to discuss
the findings of my code inspection of STRmix™ outside of that particular case.
2. As an employee of Forensic Bioinformatics, I have had the opportunity to examine the
scientific literature directly relating to the STRmix™ program and the application of
STRmix™ to certain criminal cases internationally.
3. For purposes of this Declaration, I am restricting my comments to any public evidence as
to whether STRmix™ has adhered to specific industry standards and practices
recognized and used in the field of software development and engineering for validation
of software systems.
4. Professor David Balding describes difficulties in assessing likelihood ratio (LR)
calculations for low template DNA [LTDNA] samples in his article “Evaluation of mixed-
source, low-template DNA profiles in forensic science” (D. J. Balding, Proc. Natl. Acad.
Sci. U.S.A., July 2013, 110(30):12241-6, available at:
http://www.pnas.org/content/110/30/12241.full)
There is no “gold standard” test of an LR calculation for LTDNA profiles.
Likelihoods reflect uncertainty, and even when the profiles of the true
contributors are known in an artificial simulation, this does not tell us what is the
appropriate level of uncertainty justified by a given observation affected by
stochastic phenomena. Likelihoods depend on modeling assumptions, and there
can be no “true” statistical model for a phenomenon as complex as an LTDNA
profile.
5. The difficulty in properly validating software like STRmix™ is described in C. D. Steele
and D. J. Balding, “Statistical evaluation of forensic DNA profile evidence,” Annu. Rev.
Stat. Its Appl., vol. 1, 2014, Section 5.1, “Quality of Results”:

Laboratory procedures to measure a physical quantity such as a concentration
can be validated by showing that the measured concentration [is] consistently
within an acceptable range of error relative to the true concentration. Such
validation is infeasible for software aimed at computing an LR [likelihood ratio]
because it has no underlying true value (no equivalent to a true concentration
exists). The LR expresses our uncertainty about an unknown event and depends
on modeling assumptions that cannot be precisely verified in the context of
noisy [crime scene profile] data.
Some progress can be made in evaluating the validity and performance of
software. Courts need these kinds of evaluations to have confidence in the
results of software-based forensic analyses. Open source software is highly
desirable in the court environment because openness to scrutiny by any
interested party is an invaluable source of bug reports and suggestions for
improvement.
6. In comparing a suspect's DNA profile to an evidence sample, STRmix™ generates a
likelihood ratio (LR). Because no “ground truth” LR value exists against which STRmix™
results can be compared for any mixture, we must base our confidence in the program
on two factors:
a. The appropriateness of the models used. This factor is generally within the
domain of biologists and statisticians.
b. The degree of fidelity with which these models have been translated from
theory/concept to source code for execution as a software program. This factor
is generally within the domain of software developers/engineers.
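For orientation only (the probabilities below are invented and far simpler than any continuous genotyping model), the basic form of a likelihood ratio is a comparison of two conditional probabilities:

```python
# Invented probabilities for orientation only; real probabilistic genotyping
# models are vastly more complex, and their inputs are the matter in dispute.
# Hp: the prosecution hypothesis (the suspect is a contributor).
# Hd: the defense hypothesis (an unrelated person is the contributor).

def likelihood_ratio(p_evidence_given_hp, p_evidence_given_hd):
    """LR > 1 favors Hp; LR < 1 favors Hd; LR = 1 is neutral."""
    return p_evidence_given_hp / p_evidence_given_hd

# If the evidence were 0.8 probable under Hp and 0.0001 probable under Hd:
lr = likelihood_ratio(0.8, 0.0001)
print(f"LR = {lr:,.0f}")  # roughly 8,000
```

Because no ground-truth LR exists, there is no measured value against which a reported LR can be checked, which is why confidence must rest on the two factors above.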
7. Industry practices as well as specific standards exist for the development and validation
of software systems in order to determine their fitness for purpose from a software
engineering perspective. For example, the Institute of Electrical and Electronics
Engineers (IEEE), the International Organization for Standardization (ISO), and the
Association for Computing Machinery (ACM) have promulgated standards for software
engineers to utilize during the software development process described in Appendix A.
As of May 26, 2016, IEEE’s collection of “Systems and Software Engineering” standards
(available at: https://standards.ieee.org/cgi-
bin/lp_index?type=standard&coll_name=software_and_systems_engineering&status=a
ctive) contains 132 active standards, including “730-2014 - Software Quality Assurance
Processes,” “982.1-2005 - Standard Dictionary of Measures of the Software Aspects of
Dependability,” “1012-2016 - IEEE Approved Draft Standard for System, Software and
Hardware Verification and Validation,” and “29119-1-2013 - Software and systems
engineering – Software testing – (Parts 1-4)”.

8. I have not seen sufficient documentation demonstrating STRmix™’s fidelity to its
intended use, i.e., that it has been rigorously tested, validated, or verified using current
software engineering practices described in standards such as those above. | have not
seen formal descriptions of its intended use or demonstrations that its actual operations
adhere to its intended use. Examples of materials important for evaluating software
systems in this manner include, but are not limited to:
a. Formal software requirements and specifications documents;
b. The source code, including code utilized for testing purposes;
c. The software test plan describing the testing that is or should be conducted;
d. The software test report describing the results of the testing conducted; and
e. Logs pertaining to maintenance of the code; version changes; user change
requests; error/bug reports; and installation or performance issues.
9. I have observed few formal references to industry standards or even general practices in
relation to the development and use of many probabilistic genotyping software systems,
including STRmix™, such as the standards mentioned above or the process described in
Appendix A.
10. When claiming that a software system has been validated or verified as operating
correctly, such claims should be made in the context of “how, by whom, and by what
standard?” If claims of validation and verification of a software system are made in
accordance with a specific, formal standard, such claims should be demonstrable by way
of a citation of that standard as well as supporting documentation generated during the
course of development, testing, and validation/verification of that software system.
These materials should be readily available to the software developer because they
are important components of the software development process. For the purposes of
independent validation/verification, these materials could be audited by an outside
group of experts, ideally involving software developers, biologists, and statisticians.
11. I am aware of no software engineering standards specific to the field of probabilistic
genotyping. | am not aware of any recommendations made by regulatory or guidance
bodies in the field of forensic DNA that the development or use of probabilistic
genotyping software systems adhere to specific, formal software engineering standards.
12. Advocacy for increased transparency of software in the greater scientific community has
been made repeatedly. Examples include:
Requiring that source code be made available upon publication would also be
expected to yield substantial benefits—including improved code quality, reduced
errors, increased reproducibility, and greater efficiency through code reuse and
sharing. Achieving this would bring disclosure and publication requirements for
computer codes in line with other types of scientific data and materials. (A.
Morin, et al., “Shining Light into Black Boxes.” Science. 2012 April 13; 336(6078):
159-160. Available at:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203337/)
Our view is that we have reached the point that, with some exceptions, anything
less than release of actual source code is an indefensible approach for any
scientific results that depend on computation, because not releasing such code
raises needless, and needlessly confusing, roadblocks to reproducibility. (D. C.
Ince, et al., “The case for open computer programs.” Nature. 2012 February 22;
482(7386): 485-488. Available at:
http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html)
13. While knowledge of a software system’s source code is an important component of an
independent validation and verification, the body of software engineering materials
generated during the course of development is important to contextualize the source
code (the software implementation) in light of the intended use of the program, i.e., the
formal software requirements and specifications.
14. I understand that several developers of probabilistic genotyping software systems,
including the developers of STRmix™, are not interested in open source (publicly
available) distributions of their programs. I have been and continue to be willing to
assist with reviews of source code and associated software engineering materials under
non-disclosure agreements or protective orders.
I declare the above is true and correct under the penalty of perjury under the laws of the State
of New York, executed this 27th day of May, 2016, at Fairborn, Ohio.

Nathaniel Adams
Appendix A: The Software Development Process*
The Software Development Process is composed of a series of stages, which are generally
divided into:
1. Requirements – What the program should do, generally written in English and including
visuals where applicable.
2. Specification – What the program should do, generally written in a combination of
technical English and mathematical notation, with diagrams where applicable. A precise,
technical description of the Requirements.
3. Design – How the program should perform the tasks specified in the Requirements and
Specification documents, generally written in a combination of technical English,
pseudocode**, and diagrams, where applicable. This is the structure of the program with
descriptions of how its components fit together.
4. Implementation – Translation of the design to a programming language.
5. Testing – Verification that the program produced by the Implementation stage adheres
to the tasks described in Requirements and Specifications.
6. Maintenance – Upkeep of the program, including fixes for errors (“bugs”) as well as
additions of new features or revisions of current features.
The Software Development Process is a cyclical process: non-trivial programs invariably
require revisiting earlier stages of the process. There are many good reasons for revisiting
earlier stages of the process: prior assumptions turn out to be invalid or incomplete;
requirements change; technologies change; users request a change; a process can be improved;
or performance can be improved.
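As a toy illustration only (the requirement and function below are invented for this appendix and are not drawn from STRmix™), the relationship between specification, implementation, and testing can be shown in miniature:

```python
# Toy example of the process above: a made-up requirement, its
# implementation, and a verification check against the specification.

# Requirement (plain English): given a list of peak heights, report the
# largest value.
# Specification: max_peak(heights) returns the maximum of `heights`;
# behavior is undefined for an empty list.

def max_peak(heights):
    """Implementation of the specification above."""
    tallest = heights[0]
    for h in heights[1:]:
        if h > tallest:
            tallest = h
    return tallest

# Testing: verify the implementation against the specification.
assert max_peak([120, 430, 85]) == 430
assert max_peak([7]) == 7
print("all verification checks passed")
```

Even for so small a program, the test only has meaning because a specification was written first; without documented requirements and specifications, there is nothing against which to verify the code.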
* Software development process: “the process by which user needs are translated into a software product.
NOTE The process involves translating user needs into software requirements, transforming the software
requirements into design, implementing the design in code, testing the code, and sometimes, installing and
checking out the software for operational use. These activities may overlap or be performed iteratively.”
(ISO/IEC/IEEE, “Systems and software engineering – Vocabulary,” ISO/IEC/IEEE 24765:2010(E), vol. 2010.)
** Pseudocode: “A notation resembling a simplified programming language, used in program design; esp. (a) one
that is translated by a computer into a programming language; (b) one consisting of expressions in natural
language syntactically structured like a programming language, used to represent programs that are later written
by a programmer.” (Oxford English Dictionary; accessed March 2016)

[Figure A1 – The Software Design Process. (N. D. Adams and D. E. Krane, “Black boxes and due
process: Transparency in expert software systems.” Oral presentation, American Academy of
Forensic Sciences annual meeting, 2016, Las Vegas, Nevada, USA.)]