Analysis of Y-Chromosome Polymorphisms in Pakistani Populations

Thesis submitted to the Sindh Institute of Medical Sciences
for the degree of Doctor of Philosophy.
BY
Sadaf Firasat
Centre of Human Genetics and Molecular Medicine
Sindh Institute of Medical Sciences
Sindh Institute of Urology and Transplantation (SIUT)
Karachi, Pakistan
2010
TABLE OF CONTENTS
Title page
Acknowledgements
List of Tables
List of Figures
Summary
Introduction
Literature Review
Materials and Methods
Results
Phylogeography of Pakistani ethnic groups
Comparison between the Pakistani and Greek populations
Discussion
Comparison within Pakistan
Comparison between the Pakistani and Greek populations
Comparison with world populations
Insight into population origins
Conclusions
References
Appendix
ACKNOWLEDGEMENT
I thank Prof. Dr. Syed Qasim Mehdi, H.I., S.I., for his support and encouragement,
and for providing all the facilities for scientific work in his laboratory.
The work presented in this thesis was done under the supervision of Dr.
Qasim Ayub, T.I. It is a great pleasure for me to acknowledge the keen interest, advice,
patient guidance and kindness that I have received from him during the course of this
work.
I would like to thank Dr. Shagufta Khaliq (PoP) for teaching me all the molecular
genetics laboratory techniques, and Dr. Aiysha Abid for her comments on this manuscript
and suggestions for its improvement.
I am also grateful to Mrs. Ambreen Ayub for her help in making the contour
map.
I thank my colleague Ms. Sadia Ajaz for her help and cooperation in
proofreading the thesis.
It has been an honor for me to work at SIUT, and I thank Prof. Dr. Adeeb Rizvi,
H.I., S.I., Director, SIUT, for his constant support and guidance.
Finally, I would like to thank my parents; without their love and support, the
completion of this work would not have been possible.
LIST OF TABLES
Table Title
I. The possible origins and language affinities of Pakistani populations.
II. A list of Y haplogroups, markers, types of polymorphism and genotyping methods
used in this study. Y haplogroups were determined hierarchically, screening initially
with markers that identified deep lineages (bold) and subsequently genotyping markers
that further delineated the tree in the target population. The typing methods were
amplified fragment length polymorphism (AFLP), denaturing high-performance liquid
chromatography (DHPLC), amplification refractory mutation system polymerase chain
reaction (ARMS-PCR) or dideoxy DNA sequencing (Seq).
III. List of SNPs typed by the AFLP method.
IV. Y-STR primer sequences.
V. Frequency of haplogroups B*, C*, E* and F* in ethnic groups from Pakistan.
VI. Numbers and frequencies of haplogroups B-T in each population.
VII. Y lineages found in the three Punjabi castes examined in this study.
VIII. Percentage of variation obtained by AMOVA at three levels of population
hierarchy in ethnic groups from Pakistan.
IX. Population pairwise FST values between Pakistani ethnic groups computed from
Y haplogroup frequencies. FST p values (based upon 110 permutations) are given above
the diagonal, with * indicating significant pairwise differences.
X. Matrix of significant FST p values (significance level = 0.05) based upon 110
permutations among the ethnic groups of Pakistan.
XI. Weighted population pairwise genetic distances (below diagonal) and FST values
(above diagonal) based on STR variation within haplogroups.
XII. Description of world populations.
XIII. Y-STR data of clade B lineages in Pakistani and African populations.
LIST OF FIGURES
Figure Title
I. Map of Pakistan showing its neighbors, administrative regions and the geographical
distribution of the populations included in this study.
II. Phylogenetic tree.
III. Distribution of haplogroups B*, C*, E* and F* in populations from northern and
southern Pakistan.
IV. Y haplogroup frequency distribution in ethnic groups of Pakistan.
V. Frequency distribution of major Y lineages (PK2, M52, M67 and M27) in Pakistan.
VI. Frequency distribution of major Y lineages (M357, M173, M17 and M124) in Pakistan.
VII. Principal component analysis based on Y haplogroup frequencies in Pakistani
populations.
VIII. Median-joining network of lineage L individuals based on Y-STR haplotypes.
IX. A rooted maximum-parsimony tree of Y lineages found in the Greek, Burusho, Kalash,
Pathan and Pakistani populations.
X. A plot of the first two principal coordinates based upon the analysis of Y
haplogroup frequencies in Pakistani and Greek populations.
XI. A plot of the first two principal coordinates based upon the analysis of Y
haplogroup frequencies in Pakistani and Greek samples (1 = this study; 2 = Francalacci
et al., 2003) using comparable biallelic markers.
XII. Neighbor-joining tree, based on genetic distances, showing the relationship
between the Greek and three Pakistani ethnic groups.
XIII. Median-joining network of clade E lineages in Pakistan (open circles) and Greece
(hatched circles). Circles represent haplotypes and have an area proportional to
frequency. The Pathan individuals are shown in black.
XIV. Contour map showing the frequency distribution of the 9-locus Y-STR haplotype in
Eurasia and northern Africa. This haplotype was shared between three Greeks and a
Pathan individual belonging to clade E1b1b1a.
XV. Frequencies of major haplogroups in Asian populations.
XVI. Median-joining network of C lineages.
XVII. Distribution of haplogroup L in the Indo-Pakistani subcontinent.
XVIII. Median-joining network of clade B lineages in Pakistani and African populations.
Circles represent haplotypes and have an area proportional to frequency. The Pakistani
individuals are shown in orange and light blue.
XIX. Geographic distribution of haplogroup O.
XX. Median-joining network of H1-M52 lineages found in the Burusho, Kalash and Pathan,
based on their Y-STR haplotypes.
XXI. Possible origins of (a) the Hazara, (b) the Kalash, (c) the Parsi and (d) the
Makrani Negroid.
SUMMARY
-1-
The data presented in this thesis provide a comprehensive report on Y-chromosomal
diversity among ethnic groups from Pakistan. They provide insights into the genetic
variation in Pakistan in a global context and also shed light on the patrilineal
origins of these populations. The major conclusions are summarized as follows:
1. Genetic relationships in Pakistan are dictated primarily by
geographic proximity rather than linguistics:
The results suggest that within Pakistan male genetic relationships are
dictated primarily by geographic proximity. Ethnic groups speaking Dravidian
(Brahui), Sino-Tibetan (Balti) or the language isolate Burushaski (Burusho) share
genetic affinity with their Indo-European speaking geographic neighbors. Although
the isolation of the Hunza Burusho in the mountains of northern Pakistan has led to
the preservation of their language, it has not made them genetically distinct from
their neighbors in Pakistan.
Based on Y haplogroup frequencies, the majority of the ethnic groups from
Pakistan show evidence of admixture, mostly with Central/South Asian and European
populations. This is illustrated by the fact that major haplogroups such as E*, J*
and R*, which are frequent in West Asians and Europeans, together constitute 65% of
the total. Haplogroups L1 and R2 are shared with populations from India and
constitute 11% of the Pakistani Y chromosomes sampled.
2. The Karakoram Mountains form a formidable barrier to gene flow
from China:
Haplogroups commonly observed in East Asians, such as C3 and O*, are rare or
absent in Pakistani populations and together constitute <1.5% of the total.
Populations living in the mountain valleys of northern Pakistan, such as the Hunza
Burusho, Balti and Kashmiri, are all genetically closer to other ethnic groups in
Pakistan than to East Asians. This low prevalence, or absence, of East Asian
haplogroups in Pakistan indicates that the Karakoram Mountains, which separate
Pakistan and China, form a formidable barrier to gene flow from the north. The Hazara
are the only population with significant East Asian ancestry, but historical records
indicate that they did not cross this geographical barrier; they arrived in the
subcontinent from the west.
3. Genetic signatures of invasions:
The Indo-European contribution to the Y gene pool in Pakistan is substantial and is
probably a reflection of the colonization of the subcontinent by invaders from West
and Central Asia. These invaders probably replaced the indigenous Y haplogroups, which
are now mostly found in South Indians and isolated populations in the Andaman Islands.
Three populations (Burusho, Kalash and Pathan) also claim Greek ancestry
dating from Alexander's invasion of the subcontinent. However, the results presented
here provide strong support only for a minor Greek genetic contribution to the Pathan
gene pool.
The presence of a unique star cluster of Y-STR haplotypes within haplogroup C3
Y chromosomes in the Hazara population has been linked to the male
descendants of Genghis Khan (1162-1227). These Y chromosomes are prevalent in
Mongolia and were observed at a frequency of 60% in a much larger sample of Hazara
males from northern and southern Pakistan analyzed in this study.
Although this haplogroup was also observed in the Burusho (8.2%), these
samples did not share the star-cluster haplotypes, pointing towards separate origins
for the two populations. Historical records also support the genetic relatedness
between East Asians and the Hazara.
4. The Kalash as genetic outliers:
This study also demonstrates that the Kalash have a distinct genetic identity
within Pakistan. Located in the remote valleys of the Hindu Kush Mountains, they
show significant Caucasian ancestry but also have a high proportion of the
population-specific haplogroup L3a, which is not found elsewhere in Pakistan. Their
genetic uniqueness reflects genetic drift in an isolated population struggling to
maintain its distinct cultural and religious identity.
Future Prospects:
This endeavour expands our knowledge about Pakistani populations and
complements data obtained from analyzing autosomal and mitochondrial markers. It
improves our understanding of the influence of geographic, linguistic and religious
factors on population diversity and structure in this region and provides a basis for
future work in this field.
INTRODUCTION
-2-
Where do we come from? What are we? Where are we going? These provocative
questions, framed in the title of a painting by the French artist Paul Gauguin, have
always aroused human curiosity. Using evidence from archaeology, fossils and, more
recently, genetics, scientists have gained insights into humanity's past.
Human evolutionary history begins with the appearance of our genus, Homo, about
2.5-1.5 million years ago (MYA); the earliest evidence of it has been found in
Africa (Klein, 1989). Several species of the genus Homo have since been identified,
including H. ergaster, H. erectus, H. neanderthalensis (Neanderthals) and
H. floresiensis (Brown et al., 2004; Gabunia and Vekua, 1995; Swisher et al., 1994).
All of them are now extinct with the exception of H. sapiens, the most recent species,
which appeared about 100,000 years ago in East Africa (Klein, 1989; Rightmire, 1989).
The demise of these earlier hominins has been attributed to harsh climatic conditions
or to difficulty in finding food and other necessities of life.
There is a consensus in the modern scientific community that modern humans arose
in Africa, and several waves of migration help explain their dispersal out of Africa.
Evidence from fossils and archaeological remains suggests that the expansion of modern
humans became possible when climatic conditions were favorable. The discovery of
125,000-year-old artifacts on Eritrea's Red Sea coast (Walter et al., 2000) suggests
that people from the Horn of Africa crossed the southern end of the Red Sea into the
Arabian Peninsula. They reached southern Asia and traveled further east to Australia
(Stringer, 2000) around 50-60 thousand years ago (KYA). Fossil evidence from Skhul
and Qafzeh, in modern-day Israel, dating to about 100 KYA, points to an early presence
of modern humans in the Levant, and another wave of migration is thought to have
crossed the Red Sea and entered the Levantine region about 47 KYA. From Arabia, people
moved west and east, reaching Western Europe and Siberia about 40 KYA and East Asia
about 39 KYA. These waves of migration resulted in the development of several
populations and races of modern humans, characterized by differences in physical
appearance, culture and language.
Fossil and archaeological evidence in favour of an African origin for modern
humans is also supported by molecular genetic evidence (Batzer et al., 1996;
Bowcock et al., 1991, 1994; Cann et al., 1987; Cavalli-Sforza et al., 1994; Horai et
al., 1995; Jorde et al., 1995; Knight et al., 1996; Lahr and Foley, 1994; Leakey, 1994;
Mountain et al., 1994; Perez-Lezaun et al., 1997; Ruvolo et al., 1993; Scozzari et al.,
1988; Shriver et al., 1997; Stringer and Andrews, 1988; Tattersall, 1997; Tishkoff et al.,
1996). This biological evidence has provided valuable insights and, in association
with paleontology and archaeology, allowed the reconstruction of human history.
Blood groups were the first markers to be analyzed in human populations, soon
after the discovery of the ABO blood groups (Landsteiner, 1901). Variation in these
blood groups was analyzed among First World War soldiers of different nationalities
(Hirszfeld and Hirszfeld, 1919). This was followed by the discovery and analysis of
variation in several other classical markers, such as immunoglobulin allotypes, red
cell enzymes, human leukocyte antigens (HLA) (Dausset, 1954; Grubb and Laurell, 1956;
Payne et al., 1964) and serum proteins (Harris, 1966). Collectively, these markers
contributed to our understanding of human variation and helped chart human origins
and dispersals.
WHAT IS DNA?
In 1953 the celebrated Nobel Prize winners Watson and Crick described the
double-helical structure of DNA (Watson and Crick, 1953) and laid the
foundations for the development of DNA-based genetic markers, which have now
become the hallmark of research into our past. The simple but elegant
structure of DNA that they described has two anti-parallel polynucleotide chains with
a sugar-phosphate backbone. The nucleotide bases in DNA are of only four kinds:
adenine (A), guanine (G), cytosine (C) and thymine (T), which strictly pair through
hydrogen bonds, A with T and G with C. The sequence of these bases along the
polynucleotide chain specifies the structure and function of proteins and, through
them, the morphological and functional characteristics of each cell in the human body.
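Because A always pairs with T and G with C, and the two strands run in opposite directions, one strand fully determines the other. The short Python sketch below is an illustration only; it assumes sequences are plain strings of the letters A, C, G and T, and the function name reverse_complement is a hypothetical helper, not part of any particular software. It derives the partner strand, written in its own 5' to 3' direction:

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the partner strand of `seq`, written 5' to 3'.

    Complementing each base applies the pairing rule; reversing the result
    accounts for the two strands being anti-parallel.
    """
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G and T")
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("GATTACA"))  # TGTAATC
```

Applying the function twice returns the original sequence, which mirrors the fact that each strand of the double helix can serve as a template for the other.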
In humans, DNA is present in the cell nucleus and in the mitochondria, which are
extranuclear organelles. Mitochondrial DNA is small, circular and double
stranded, with a length of 16,569 base pairs (bp) (Anderson et al., 1981; Ruiz-Pesini
et al., 2007). It contains only 37 genes but has been extremely useful in tracing
the maternal origins of human populations because of three important
characteristics:
1.) A maternal mode of inheritance (Giles et al., 1980).
2.) A high mutation rate (Olivo et al., 1983).
3.) A lack of recombination (Brown, 1979).
The human nuclear genome consists of double-stranded DNA packaged into 23 pairs
of chromosomes. Of these, 22 pairs, the autosomes, are the same in males and females.
The remaining pair, the sex chromosomes, differs between the sexes. Females have two X
chromosomes, whereas males have one X chromosome, inherited from the mother, and one
Y chromosome, inherited from the father. The Y chromosome is passed from father to
son and does not recombine with the X chromosome over most of its length. This
feature has been of great value in the study of variation among modern human males.
The completion of the Human Genome Project (International Human Genome
Sequencing Consortium, 2004) has revealed that enormous variation exists in our
genome. Only 2-3% of our genome codes for functional molecules such as proteins
and RNA. The remaining 97-98% of the sequence consists of repetitive sequences,
regulatory sequences, pseudogenes, and intermediate- to large-scale copy number and
sequence variants. Many of these are remnants of our evolutionary past and provide
valuable insights into what makes us human.
The haploid human genome contains about three billion base pairs. The sequence of
nucleotides along the DNA strand carries all the genetic information required for the
survival of an organism. A gene, which codes for a protein product, is located at a
relatively fixed position (locus) on a chromosome and performs specific biological
functions during the development of an individual from a fertilized egg and throughout
life. Recent estimates show that the human nuclear genome contains about
20,000-25,000 genes (The ENCODE Project Consortium, 2007).
Any change that occurs in the DNA sequence is referred to as a mutation; when a
variant is present at an appreciable frequency in a population, it is termed a
polymorphism. Mutations can be categorized by size as either large-scale or
small-scale. Large-scale mutations include abnormalities such as alterations in
chromosome number, as in Down syndrome (trisomy 21), Klinefelter syndrome (XXY) and
Turner syndrome (XO), and chromosomal translocations, as in the Philadelphia
chromosome t(9;22)(q34;q11). These chromosomal abnormalities can be readily detected
by cytogenetic analysis. Small-scale mutations are alterations in the nucleotide
sequence. They include the replacement of one nucleotide with another and the
deletion or insertion of one or more nucleotides, resulting in a new allele of a
particular gene. In some instances these new alleles may cause disease or improve the
fitness of the organism, but in most cases they are neutral and play no beneficial or
detrimental role.
Any mutation in the germ-line DNA sequence is inherited in a stable form and can be
passed from one generation to the next. Mutations can arise during meiosis, when
parental DNA is transmitted to the progeny, or during the mitotic cell divisions that
occur throughout the lifetime of an individual; only those arising in the germ line are
transmitted to offspring. Many are caused by errors in DNA replication during cell
division. Copying DNA requires great accuracy in inserting the correct nucleotide into
the growing polynucleotide strand. The DNA replication enzymes, the DNA polymerases,
have a proofreading activity that reduces the error rate: their 3'-to-5' exonuclease
activity removes incorrectly incorporated nucleotides one at a time from the
3'-hydroxyl terminus until a correctly paired end is restored. Despite these effective
proofreading and repair mechanisms, replication errors occur at a rate of about
10⁻⁹ to 10⁻¹¹ per incorporated nucleotide (Cooper et al., 1995; Cooper et al., 2000).
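To put this error rate in perspective, it can be multiplied by the size of the genome to give the expected number of uncorrected errors each time a genome is copied. The back-of-the-envelope Python sketch below assumes a haploid genome of about 3 × 10⁹ bp (the size quoted above) and treats errors at different positions as independent; the function name expected_errors is illustrative only.

```python
HAPLOID_GENOME_BP = 3_000_000_000  # approximate size of the haploid human genome


def expected_errors(error_rate: float, genome_bp: int = HAPLOID_GENOME_BP) -> float:
    """Expected number of replication errors per copy of a genome of `genome_bp` bases."""
    return error_rate * genome_bp


# Range quoted in the text: 10^-9 to 10^-11 errors per incorporated nucleotide.
for rate in (1e-9, 1e-10, 1e-11):
    haploid = expected_errors(rate)
    diploid = expected_errors(rate, 2 * HAPLOID_GENOME_BP)
    print(f"rate {rate:.0e}: {haploid:.2f} per haploid copy, {diploid:.2f} per diploid copy")
```

At the upper end of the quoted range about three errors would be expected per haploid genome copy, while at the lower end only about 0.03 would be expected, which shows how strongly the outcome depends on the efficiency of proofreading and repair.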
HUMAN GENETIC POLYMORPHISMS
In humans, 99.9% of the genome is identical between individuals, and only 0.1-2.0% of
the DNA sequence shows variation. This variation results in genotypic differences
between individuals as well as phenotypic differences commonly observed in traits such
as height, facial morphology, and skin, eye and hair colour. Much of it is due to
polymorphisms, which are non-pathogenic changes present at appreciable frequencies
(usually >1%) in a given population. To date, many types of polymorphism have been
discovered in both the coding and non-coding regions of the human genome, and they form
the basis of all current genetic markers. They are used not only to unravel our
evolutionary past but also as diagnostic markers and to predict susceptibility to
disease.
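The frequency criterion for a polymorphism can be made concrete with a small calculation. The Python sketch below assumes a biallelic site with genotypes coded as the number of copies (0, 1 or 2) of one allele in each diploid individual; the function names and the default threshold of 0.01 are illustrative assumptions based on the "usually >1%" convention above.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """Frequency of the less common allele at a biallelic site.

    Each genotype is the count (0, 1 or 2) of one allele in a diploid individual.
    """
    if not genotypes:
        raise ValueError("no genotypes supplied")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1.0 - p)


def is_polymorphic(genotypes: Sequence[int], threshold: float = 0.01) -> bool:
    """Call a site polymorphic if its minor allele frequency exceeds `threshold`."""
    return minor_allele_frequency(genotypes) > threshold


if __name__ == "__main__":
    # 100 individuals, 3 of them heterozygous: 3 of 200 chromosomes carry the allele.
    sample = [1] * 3 + [0] * 97
    print(minor_allele_frequency(sample), is_polymorphic(sample))  # 0.015 True
```

For a haploid marker such as the Y chromosome, the same idea applies with one allele per individual instead of two.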
The non-coding DNA sequences that constitute the bulk of the human genome are
dispersed throughout the genome. The exact function of much of this non-coding DNA
remains unknown, and it has been referred to as selfish or junk DNA.
Several recent findings, however, have shown that these regions are dynamic and that
some of them play a major role in gene regulation. So-called junk DNA does not encode
any product used by the cell, and it tends to consist of sequences repeated many times.
In some instances these repeats interfere with the function of other genes or increase
their copy number. A large amount of non-coding DNA consists of short nucleotide
sequences repeated in tandem, arranged in arrays or blocks of bases scattered
throughout the genome.
According to their size, human polymorphisms can be classified as single nucleotide
polymorphisms or as repeat and structural polymorphisms, which include satellite DNA,
mini-satellite DNA, micro-satellite DNA and copy number variants.
SINGLE NUCLEOTIDE POLYMORPHISMS:

The most common type of polymorphism in the human genome is the single
nucleotide polymorphism (SNP). SNPs include single-base substitutions, deletions
and insertions. Base substitutions can be classified into two groups, namely
transitions and transversions. In a transition, a purine is replaced by a purine
(A↔G) or a pyrimidine by a pyrimidine (C↔T). A transversion is the substitution of
a purine by a pyrimidine (A/G→C/T) or vice versa (C/T→A/G). According to
Collins and Jukes (1994), transitions occur more frequently than transversions in the
mammalian genome.
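The distinction reduces to a simple rule: a substitution is a transition when both bases belong to the same chemical class (both purines or both pyrimidines), and a transversion otherwise. The Python sketch below, with hypothetical function names, classifies a single substitution and computes the transition/transversion ratio for a list of observed changes; it is an illustration, not a method used in this study.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a 'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases:
        raise ValueError("ref and alt must each be a single base: A, C, G or T")
    if ref == alt:
        raise ValueError("ref and alt are identical, so this is not a substitution")
    # Transition: both bases are purines, or both are pyrimidines.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    # Otherwise one purine and one pyrimidine are involved.
    return "transversion"


def ts_tv_ratio(changes):
    """Transition/transversion ratio for an iterable of (ref, alt) pairs."""
    kinds = [substitution_type(ref, alt) for ref, alt in changes]
    transversions = kinds.count("transversion")
    if transversions == 0:
        raise ZeroDivisionError("no transversions among the supplied changes")
    return kinds.count("transition") / transversions


if __name__ == "__main__":
    print(substitution_type("A", "G"))  # transition
    print(substitution_type("C", "A"))  # transversion
    print(ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "T")]))  # 2.0
```

Since there are twice as many possible transversions as transitions, a ratio above 0.5 already indicates an excess of transitions, in line with the observation of Collins and Jukes (1994).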
SNPs are dispersed throughout the genome, occurring in promoter regions, coding
sequences, introns and other non-coding regions. According to the Single Nucleotide
Polymorphism Database (dbSNP), the human genome contains more than 55 million SNPs.
More than 6 million SNPs lie within genes (Serre and Hodson, 2006).
SNPs were the first generation of polymorphic genetic markers. Their use was realized
in the late 1970s with the development of restriction fragment length polymorphism
(RFLP) analysis (Roberts and Murray, 1976). An RFLP arises when a mutation causes the
loss or gain of a recognition site for a restriction enzyme. Restriction enzymes were
discovered in 1968 (Meselson and Yuan, 1968) and are classified into three types,
designated Type I, II and III. Of these, Type II restriction enzymes are the most useful
for genotyping, because they recognize specific DNA sequences and cut the DNA within,
or near, the recognition sequence. The first polymorphism in a restriction enzyme site
was observed in the human beta-globin gene region with the restriction enzyme HpaI
(Kan and Dozy, 1978).
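The principle of RFLP typing can be illustrated with an in-silico digest: a SNP that destroys a recognition site merges two fragments into one, which is seen as a change in fragment sizes on a gel. The Python sketch below assumes a linear DNA sequence and uses the HpaI recognition sequence GTTAAC, which the enzyme cuts bluntly between GTT and AAC. The two alleles are made-up sequences, not the globin locus, and the function name digest is hypothetical.

```python
HPAI_SITE = "GTTAAC"  # HpaI recognition sequence (palindromic, so one strand suffices)
HPAI_CUT = 3          # HpaI cuts GTT^AAC, producing blunt ends


def digest(seq: str, site: str = HPAI_SITE, cut_offset: int = HPAI_CUT) -> list[int]:
    """Fragment lengths from cutting a linear DNA sequence at every copy of `site`."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0, *cuts, len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Two made-up alleles differing by a single A->G change inside the site.
    allele_with_site = "A" * 10 + "GTTAAC" + "C" * 14
    allele_without_site = "A" * 10 + "GTTGAC" + "C" * 14
    print(digest(allele_with_site))     # [13, 17]
    print(digest(allele_without_site))  # [30]
```

In a real experiment the fragments would be produced from PCR products or genomic DNA and sized by electrophoresis; the sketch only models which fragment lengths to expect.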
Since then, many SNP genotyping methods, such as heteroduplex analysis
(Lichten and Fox, 1983), single-strand conformational polymorphism (Orita et al.,
1989), enzymatic mutation detection (Youil et al., 1995), microarrays or variant
detector arrays (Dong et al., 2001; Hacia et al., 1999; Hacia and Collins, 1999;
Marshall and Hodgson, 1998; Qi et al., 2001; Ramsay, 1998; Wang et al., 1998;
Yoshino et al., 2001), high-throughput SNP genotyping (Jenkins and Gibson, 2002;
McClay et al., 2002) and molecular beacons (Mhlanga and Malmberg, 2001), have been
developed, enabling the construction of high-density SNP maps. More recently,
massively parallel resequencing has revolutionized the pace of SNP discovery in
individual genomes (Wheeler et al., 2008), and the 1000 Genomes Project aims to
catalogue SNPs occurring at frequencies as low as 1% in several diverse human
populations.
In the present century, SNPs have become the markers of choice for many
applications in forensic science and in medical and evolutionary genetics. The
recent discovery of large numbers of SNPs, and the determination of their allele
frequencies in various populations, provides new approaches to disease detection,
anthropological studies and pharmacogenetic analyses that will benefit the
biomedical sciences. Studies have identified SNP variation as one of the factors
associated with susceptibility to many common diseases, such as heart disease,
high blood pressure (Koschinsky et al., 2001), type II diabetes (Tsunoda et al.,
2001) and asthma (Immervoll et al., 2001).
The discovery of millions of SNPs has greatly aided the fields of pharmacogenetics
and pharmacogenomics, which aim to tailor drugs to a person's genotype. The
relationship between SNPs, disease and medicine is not the same among different
populations, or even among individuals within a population. Because of variation in
target genes or drug-metabolizing enzymes, some patients with the same disease show a
life-threatening adverse reaction to a particular medicine, others show no adverse
reaction, and still others show intermediate responses to the same drug. Genotyping
individuals at SNP markers is expected to allow the design of new and more efficacious
drugs for individual patients.
SNPs have also helped in understanding how modern humans and their genomes
have evolved. In particular, SNPs on the Y chromosome and in mitochondrial DNA have
been used to describe the origins and migrations of our male and female ancestors,
respectively.
COPY NUMBER VARIANTS:
Copy number variations (CNVs) are structural variations in DNA that result from
differences in the number of copies of a particular genomic region. They arise through
duplications or deletions of DNA segments ranging from several kilobases (kb) to
megabases (Mb) in size (Feuk et al., 2006).
CNVs were first uncovered among normal, healthy individuals soon after the
completion of the Human Genome Project, and many studies have shown them to be as
prevalent as SNPs and an important source of genetic variation that contributes to our
uniqueness (Feuk et al., 2005; Hinds et al., 2006; Iafrate et al., 2004; Sebat et al.,
2004; Sharp et al., 2005; Stefansson et al., 2005; Tuzun et al., 2005). It is estimated
that about 12% of the human genome, including thousands of genes, is subject to copy
number variation (Carter, 2007).
CNVs often encompass genes and can lead to dosage imbalances (Buckland,
2003; McCarroll et al., 2006; Repping et al., 2006). They have been shown to
influence phenotypic variation, gene expression and gene dosage, and through these
mechanisms they are associated with several human diseases. Increased copy number of
the EGFR gene has been associated with non-small cell lung cancer (Cappuzzo et al.,
2005). Another study demonstrated that a high copy number of CCL3L1 is associated
with lower susceptibility to HIV infection (Gonzalez et al., 2005). Low copy number of
FCGR3B (the CD16 cell-surface immunoglobulin receptor) can increase susceptibility to
systemic lupus erythematosus and similar inflammatory autoimmune disorders (Aitman et
al., 2006).
The most widely used method for studying CNVs is DNA microarray technology based on
comparative genomic hybridization (CGH) using synthesized oligonucleotides. This
technology has been useful in detecting new CNVs and their associations with normal
and disease phenotypes (Carter, 2007). In the most complete worldwide analysis (Redon
et al., 2006), the first-generation CNV map was constructed using two different
microarray platforms: single-nucleotide polymorphism (SNP) genotyping arrays and
clone-based comparative genomic hybridization. In this survey, a total of 1,447 copy
number variable regions (CNVRs), covering 360 megabases (12% of the genome), were
identified in 270 individuals who had previously been surveyed for SNPs (The
International HapMap Consortium, 2005).
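In array CGH, the hybridization signal of a test genome is compared with that of a reference genome at each probe, and the log2 ratio of the two signals reflects relative copy number: against a diploid reference, a one-copy deletion gives about log2(1/2) = -1 and a one-copy duplication about log2(3/2) ≈ 0.58. The Python sketch below is a simplified illustration; it assumes background-corrected positive intensities and uses illustrative thresholds of ±0.3. Real analyses add normalization and segmentation across neighbouring probes, which are not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log2_ratios(test: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Per-probe log2(test / reference) ratios from positive signal intensities."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same number of probes")
    return [math.log2(t / r) for t, r in zip(test, reference)]


def call_copy_number(ratios: Sequence[float], gain: float = 0.3, loss: float = -0.3) -> List[str]:
    """Label each probe 'gain', 'loss' or 'normal' using simple (assumed) thresholds."""
    calls = []
    for x in ratios:
        if x >= gain:
            calls.append("gain")
        elif x <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls


if __name__ == "__main__":
    reference = [1000.0, 1000.0, 1000.0, 1000.0]
    test = [1010.0, 500.0, 1500.0, 980.0]  # made-up intensities
    ratios = log2_ratios(test, reference)
    print([round(x, 2) for x in ratios])
    print(call_copy_number(ratios))  # ['normal', 'loss', 'gain', 'normal']
```

Working on the log2 scale makes losses and gains symmetric around zero, which is why thresholds are usually set as a pair of values above and below zero.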
SATELLITE DNA:
Satellite DNA is located mainly in the darkly staining regions of chromosomes known as
heterochromatin. Its exact function is unclear (Csink and Henikoff, 1998; Henikoff et
al., 2001), but transcription is limited in these regions and it is thought to play a
role in the structure and function of centromeres (Grimes and Cooke, 1998). It consists
of large blocks of tandem repeats. Although these repeats are not easy to genotype,
satellite DNA has been used in human evolutionary studies (Oakey and Tyler-Smith, 1990).
MINI-SATELLITE DNA:
Mini-satellite DNA, or variable number of tandem repeats (VNTR) (Nakamura et al.,
1987), was first identified in the human myoglobin gene (Jeffreys et al., 1985). It
consists of intermediate-sized arrays of tandem repeats; thousands of such arrays,
ranging from 0.1 to 20 kilobases (kb) in length, are found in the euchromatic regions
of eukaryotic chromosomes (Jeffreys, 1987).
Most mini-satellites are GC-rich and clustered towards the ends of the chromosomes
(i.e., near the telomeres) (Royle et al., 1988). The majority of mini-satellite DNA is
transcriptionally inactive, but some mini-satellites are expressed, for example at the
MUC1 locus (Swallow et al., 1987).
Mini-satellites are highly polymorphic (Wong et al., 1987), with heterozygosity values
between 70% and 90% (Jeffreys et al., 1985), and their mutation rate is also higher
than that of classical genetic markers (Jeffreys et al., 1988). It is estimated that
mutations occur at a frequency of 1-2% per gamete per generation, producing new
variants with different repeat copy numbers in individuals and populations. Baird et
al. (1986) were among the first to analyze two VNTR loci, HRAS-I and D14S1, in various
populations.
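Heterozygosity figures such as the 70-90% quoted above are often reported as expected heterozygosity (gene diversity), the probability that two alleles drawn at random from a population differ. The Python sketch below computes H = 1 - Σpᵢ², where pᵢ is the sample frequency of allele i, optionally multiplied by n/(n - 1) for n sampled alleles (Nei's small-sample correction). The function name and the allele labels in the example are made up for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable


def expected_heterozygosity(alleles: Iterable[Hashable], unbiased: bool = True) -> float:
    """Gene diversity at one locus from a sample of alleles.

    H = 1 - sum(p_i ** 2), where p_i is the sample frequency of allele i.
    With `unbiased=True`, H is multiplied by n / (n - 1) for n sampled alleles
    (Nei's small-sample correction).
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    h = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return h * n / (n - 1) if unbiased else h


if __name__ == "__main__":
    # Made-up sample: 20 alleles spread evenly over 4 repeat-length variants.
    sample = ["a", "b", "c", "d"] * 5
    print(expected_heterozygosity(sample, unbiased=False))  # 0.75
    print(round(expected_heterozygosity(sample), 3))        # 0.789
```

Loci with many alleles at similar frequencies, as is typical of mini-satellites and STRs, approach the upper end of this scale, which is what makes them so informative.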
MICROSATELLITE DNA:
Microsatellites, also referred to as short tandem repeat (STR) polymorphisms or
simple sequence repeats (SSRs), are a special class of tandem repeats first recognized
by Birnboim and Straus (1975) as polypyrimidine stretches. The term microsatellite was
coined by Litt and Luty (1989), and Edwards et al. (1991) coined the term STR.
STRs are composed of repeat units of 1-6 base pairs that follow each other in tandem
(Tautz, 1989). Depending upon the number of bases in the repeat unit, they are
classified as mono-, di-, tri-, tetra-, penta- or hexa-nucleotide repeats. Arrays of
the tetra-nucleotide GATA and of the di-nucleotide TG were among the first STRs
identified, in the human delta- and beta-globin gene region (Miesfeld et al., 1981).
Subsequently, CA repeats were identified in the cardiac muscle actin gene (Hamada and
Kakunaga, 1982), and several other di-nucleotide (GT or CA) repeats were described
(Epplen et al., 1982; Hamada et al., 1982). These repeats are found in the euchromatic
regions of the chromosomes and, unlike mini-satellites, do not generally cluster near
the telomeres.
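An STR allele is usually described by the number of times its repeat unit occurs in tandem. The Python sketch below assumes a perfect (uninterrupted) repeat and an exact motif match; real typing, for example by sizing PCR products, must also cope with interrupted repeats and flanking sequence. The helper names longest_tandem_run and repeat_class are hypothetical.

```python
def longest_tandem_run(seq: str, motif: str) -> tuple[int, int]:
    """Return (start, repeat_count) of the longest uninterrupted run of `motif` in `seq`.

    Returns (-1, 0) if the motif does not occur.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    if k == 0:
        raise ValueError("motif must be non-empty")
    best_start, best_count = -1, 0
    for start in range(len(seq) - k + 1):
        count = 0
        # Extend the run while the next k bases repeat the motif exactly.
        while seq.startswith(motif, start + count * k):
            count += 1
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


def repeat_class(motif: str) -> str:
    """Name an STR by the length of its repeat unit (1-6 bp)."""
    names = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}
    if len(motif) not in names:
        raise ValueError("STR repeat units are 1-6 bp long")
    return names[len(motif)] + "-nucleotide"


if __name__ == "__main__":
    seq = "TTCC" + "GATA" * 11 + "CCTG"  # made-up allele with 11 GATA repeats
    print(longest_tandem_run(seq, "GATA"))  # (4, 11)
    print(repeat_class("GATA"))  # tetra-nucleotide
```

The repeat count returned for each locus is the allele designation that is compared between individuals and populations.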
STRs constitute about 2% of the human genome and are more frequent than
mini-satellites. Estimates place the number of STR loci in the human genome at
approximately 100,000. Both mini-satellites and STRs can be produced by unequal
crossing over and by DNA slippage during replication (Kruglyak et al., 1998; Toth et
al., 2000). New STR alleles are thought to arise mostly through DNA slippage during
replication (Di Rienzo et al., 1994; Jeffreys et al., 1993; Kimmel and Chakraborty,
1996; Shriver et al., 1993; Valdes et al., 1993).
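Replication slippage typically adds or removes a single repeat unit, which is the basis of the stepwise mutation model often used for STRs. The Python simulation sketch below follows one STR allele down a single line of descent. It assumes a fixed per-generation mutation probability mu and equal chances of gaining or losing one repeat; both are simplifying assumptions, since real loci can show length-dependent rates and occasional multi-step changes. The function name is hypothetical, and the rate used in the example is the directly observed Y-STR rate cited later in this section.

```python
import random


def simulate_lineage(start_repeats: int, generations: int, mu: float, seed: int = 1) -> list[int]:
    """Track the repeat count of one STR allele down a single line of descent.

    In each generation the allele mutates with probability `mu`; a mutation
    adds or removes exactly one repeat unit with equal probability (a simple
    stepwise model). The count is not allowed to fall below one repeat.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must be a probability")
    rng = random.Random(seed)
    repeats = start_repeats
    history = [repeats]
    for _ in range(generations):
        if rng.random() < mu:
            repeats = max(1, repeats + rng.choice((-1, 1)))
        history.append(repeats)
    return history


if __name__ == "__main__":
    path = simulate_lineage(start_repeats=12, generations=1000, mu=2.4e-3)
    changes = sum(1 for a, b in zip(path, path[1:]) if a != b)
    print(f"final repeat count: {path[-1]}; changes in repeat count: {changes}")
```

Because each step changes the allele by only one repeat, alleles that share a recent common ancestor tend to have similar repeat counts, which is why STR haplotypes are informative about the relatedness of lineages.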
In humans, di-, tri- and tetra-nucleotide repeats are more frequent than repeats with
longer units. Among all classes of STRs, the most frequent are the di-nucleotide
repeats, which comprise 0.5% of the genome. They are highly polymorphic and tend to
mutate more rapidly than tri- and tetra-nucleotide repeats (Chakraborty et al., 1997;
Webster et al., 2002). CA/TG repeats occur at a frequency of 1 per 36 kb, whereas AT/TA
repeats occur at 1 per 50 kb. The less common AG/CT arrays occur at a frequency of 1
every 125 kb, and the rarest di-nucleotide repeats, CG/GC, occur at 1 per 10 Mb. Among
the tri-nucleotides, the most frequent arrays are ACC repeats, followed by AGC and ACT,
with ACG repeats being the least common.
Genetic variation at STR loci makes them very useful genetic markers that have been
extensively applied to human identification, especially in forensic cases (Budowle et
al., 1998; 2001; Gill et al., 1994), to linkage analysis of disease (Dietrich et al., 1992;
Hearn et al., 1992; Jefferys et al., 1985; Jefferys and Pena, 1993; Queller et al., 1993;
Todd et al., 1991) and as a powerful tool for investigating human history and diversity
(Bowcock et al., 1994). Population geneticists have exploited the multi-allelic
variation at STR loci as a powerful, accurate and informative tool that has aided in
reconstructing the evolutionary history of humans and revealed the relationships
between various world populations and languages (Ayub et al., 2003; Rosenberg et
al., 2002).
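In population studies a multi-locus STR profile (called a haplotype in Y-chromosome work) is often summarised by its haplotype diversity. The sketch below uses Nei's unbiased estimator, h = n/(n − 1) × (1 − Σpᵢ²), where pᵢ is the frequency of the i-th distinct haplotype among n chromosomes. The locus names are real Y-STRs, but the repeat counts are invented for illustration.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(haplotypes)  # identical haplotypes are pooled
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


# Hypothetical repeat counts at (DYS19, DYS390, DYS391) for four men.
sample = [(14, 24, 10), (14, 24, 10), (15, 23, 10), (16, 25, 11)]
print(round(haplotype_diversity(sample), 4))  # 0.8333
```

A value near 1 means almost every chromosome carries a different haplotype; 0 means all are identical.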
A striking feature of STRs is their high mutation rate in comparison with SNPs. The
average mutation rate for tri- and tetra-nucleotide repeats at autosomal loci is
estimated at between 7.0 × 10⁻⁴ and 9.3 × 10⁻⁴ (Zhivotovsky et al., 2000), and for
Y-chromosomal STRs estimates range from 6.9 × 10⁻⁴ to 2.4 × 10⁻³ per locus per
generation, depending upon whether the mutation rate is inferred (Zhivotovsky et al.,
2004) or observed (Kayser et al., 2000).
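The per-locus rates above translate into the chance that a father and son differ at one or more loci. Assuming that loci mutate independently and at the same rate (a simplification, since real rates vary between loci), the probability of at least one mutation across n loci in one generation is 1 − (1 − μ)ⁿ. The panel size of 17 loci used below is hypothetical.

```python
def prob_any_mutation(rate_per_locus: float, n_loci: int) -> float:
    """P(at least one of n independent loci mutates in one generation)."""
    return 1.0 - (1.0 - rate_per_locus) ** n_loci


N_LOCI = 17  # hypothetical panel size, for illustration only

for label, mu in [("observed (Kayser et al., 2000)", 2.4e-3),
                  ("inferred (Zhivotovsky et al., 2004)", 6.9e-4)]:
    print(f"{label}: {prob_any_mutation(mu, N_LOCI):.4f}")
```

Under these assumptions roughly 4% of father-son pairs would differ at one or more of 17 loci at the observed rate, compared with about 1% at the inferred rate.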
Although there is some evidence that STR loci are neutral and not involved in any
biological function, many studies show that some STRs, such as CA repeats, are
involved in the enhancement of gene expression (Hamada et al., 1984). Many of them
have binding sites for specific nuclear proteins (Richards et al., 1993), most of which
promote homologous recombination (Treco and Arnheim, 1986). Tri-nucleotide STR
loci are associated with several genetic diseases. The first such association, of the
tri-nucleotide motif CCG, was reported with fragile X syndrome (Fu et al., 1991;
Kremer et al., 1991; Verker et al., 1991). In normal individuals 6-54 CCG repeats are
located in the 5′ untranslated region of the fragile X mental retardation 1 gene
(FMR1), whereas in affected individuals the repeat number ranges from 52 to 1000.
The meiotic instability of such repeats is associated with over a dozen human
diseases, such as X-linked spinal and bulbar muscular atrophy (SBMA; La Spada et
al., 1991) and myotonic dystrophy (Brook et al., 1992; Fu et al., 1992).
TRANSPOSABLE ELEMENTS:
The other class of repetitive DNA comprises the interspersed repetitive non-coding
DNA that occupies 45% of the human genome (International Human Genome
Sequencing Consortium, 2001; Li et al., 2001). Polymorphisms of this class have
also been linked with certain diseases. These sequences are derived from mobile
DNA sequences, also called transposable elements (Prak and Haig, 2000; Smith,
1999). These elements can move from one region of the human genome and
integrate into another (Prak and Haig, 2000; Smith, 1996). No mechanism is currently
known for the removal of these elements.
Transposable elements can be classified into four groups:
A) Long interspersed nuclear elements (LINEs)
B) Short interspersed nuclear elements (SINEs)
C) Long terminal repeat (LTR) transposons (retrovirus-like elements)
D) DNA transposons.
Depending upon their transposition mechanism, these four groups are broadly
organized into two classes:
1) Retrotransposons (retroposons)
2) DNA transposons.
Retrotransposons are transposable elements that copy themselves through reverse
transcription; they include LINEs, SINEs and LTR elements. Reverse transcriptase
copies the element's RNA into cDNA, which is then integrated into a region of
chromosomal DNA.
In DNA transposons the DNA sequence is excised and integrated directly into another
place in the genome by a cut-and-paste mechanism. DNA transposons account for
3% of the human genome, and virtually all human DNA transposons are
non-functional (Strachan and Read, 2004).
LINEs are the most successful and ancient transposable elements. They first
appeared in eukaryotic genomes about 600 million years ago (Malik et al., 1999) and
collectively comprise about 21% of the human genome. They are subdivided into
three distantly related families: LINE-1, LINE-2 and LINE-3. Of these, only the
LINE-1 family is still actively transposing (International Human Genome Sequencing
Consortium, 2001).
LINE-1 (L1) is an important transposable element about 6.0 kilobases (kb) long.
Recent estimates based on computational methods suggest that about 500,000 L1
fragments reside in the human genome and make up 17% of the genome (Lander et
al., 2001; Smith, 1996). These elements are mostly found in AT-rich regions
(Kongberg and Rykowski, 1988). The LINE-1 element contains two open reading
frames, ORF1 and ORF2. ORF1 encodes a 40 kilodalton (kDa) RNA-binding protein,
while ORF2 encodes a 150 kDa protein with both endonuclease and reverse
transcriptase activity (Feng et al., 1996; Mathias et al., 1991). The LINE-1 transcript
moves from the nucleus to the cytoplasm, where it is translated to yield the ORF
proteins. The LINE-1 RNA assembles with its own encoded proteins and re-enters the
nucleus, where the L1 endonuclease cleaves one strand of DNA, preferably at the
5′-TTTT/A-3′ consensus site (Cost and Boeke, 1998; Feng et al., 1996; Jurka, 1997;
Morrish et al., 2002), and the reverse transcriptase uses the same site to prime
reverse transcription from the 3′ end of the LINE-1 RNA. In most instances, reverse
transcription fails to proceed to the 5′ end at the time of integration, resulting in a
truncated, non-functional copy of the LINE-1 element.
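To make the target-site preference concrete, the sketch below scans one DNA strand for exact matches to the simplified consensus 5′-TTTT/A-3′ and reports where the nick would fall. The function name `l1_nick_positions` is hypothetical; the real L1 endonuclease tolerates degenerate sites, and a complete scan would also examine the reverse complement.

```python
def l1_nick_positions(sequence: str, site: str = "TTTTA", cut_offset: int = 4) -> list[int]:
    """Return 0-based indices of the base immediately 3' of each predicted nick,
    i.e. the A in every exact 5'-TTTT/A-3' match on the given strand."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + cut_offset)  # nick lies between TTTT and A
        start = sequence.find(site, start + 1)
    return positions


print(l1_nick_positions("GCTTTTAGCATTTTTAGG"))  # [6, 15]
```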
About 99.8% of the LINE-1 copies present in the human genome are defective
(Gilbert et al., 2002; Kazazian and Moran, 1998; Myers et al., 2002; Ostertag and
Kazazian, 2001; Sassaman et al., 1997), with an average size of 900 bp (Lander et
al., 2001). It is estimated that approximately 40 elements of the L1 family are still
functional and produce new copies (Sassaman et al., 1997). At least 1 in every 50
humans has a new genomic L1 insertion, arising in a parental germ cell or during
early embryonic development (Goodier et al., 2001; Luningprak et al., 2003; Ostertag
et al., 2002). The functional significance of these insertions is unknown, but the new
copies can be used as genetic markers, such as the L1 insertion in the centromeric
alphoid array of the human Y chromosome designated LY1 (Santos et al., 2000).
Sometimes these insertions can lead to disease, as in the case of hemophilia B
(Brooks et al., 2003; Kazazian et al., 1988).
SINEs comprise 13% of the human genome. These sequences are 100-400 bp long
and include the Alu repeats, which are dispersed throughout the human genome.
Unlike LINEs, they do not encode any protein and use the LINE machinery for their
transposition (Kajikawa and Okada, 2002). All SINE families except one originated
from tRNA. The exception is the Alu family, which originated from the 7SL RNA
component of the signal recognition particle (SRP) (Ullu and Tschudi, 1984).
The Alu elements are about 300 bp long and constitute 10.7% of the human genome.
Alu elements are postulated to have arisen early in primate evolution, about 30-65
million years ago (mya) (Batzer et al., 2002; Deininger et al., 1992; Deininger and
Daniels, 1986; Deininger and Slagel, 1988; Kapitoov, 1996; Labuda et al., 1991; Shen
et al., 1991). A subfamily of these Alu repeats, termed human-specific (HS) repeats
(Batzer et al., 1990), appeared in the human genome within the last 6 million years
(Batzer et al., 1991; Batzer and Deininger, 1991). Approximately 75% of these HS
repeats are present in all human populations, indicating that they were inserted early
in human evolution and had become fixed before humans migrated out of Africa
(Deininger et al., 1999).
Alu repeats have also proven to be extremely useful genetic markers (Myers
et al., 2002; Watkins et al., 2001). About 25% (400 sites) of these recent Alu
insertions are variable among world populations and highly informative in
ascertaining the relationships between human populations. Several Alu insertions
are associated with human diseases such as hypertension (Barley et al., 1996; Duru
et al., 1994; Jeng et al., 1997), myocardial infarction (Ludwing et al., 1995),
ventricular hypertrophy (Schunkert et al., 1994) and cardiomyopathy (Raynolds et al.,
1993).
LTR elements account for 8.5% of the human genome and comprise autonomous and
non-autonomous elements. About 4.7% of the human genome is occupied by the
autonomous endogenous retroviral sequences (ERVs). Human ERVs (HERVs) comprise
many subfamilies and show only a small number of polymorphisms (Turner et al.,
2001). Many of the LTR elements are defective, and transposition has been rare. The
non-autonomous LTR elements consist of the MaLR family, which accounts for about
3.8% of the human genome. This family lacks the pol gene and at times the gag gene.
Over the past decade the genetic variation of these DNA-based markers has been
exploited to unravel paternal and maternal lineages and the relationships among
modern humans (Cavalli-Sforza, 1994; Hammer et al., 1997; Quintana-Murci et al.,
1999a, b, c). The current study was designed to use polymorphic markers to uncover
the genetic history of ethnic groups residing in present-day Pakistan and to provide a
basis for further analyses of these populations in genetic association and disease
susceptibility studies.
THE GENETIC HISTORY OF PAKISTAN
The modern state of Pakistan was established on August 14, 1947, but the
region where it is located, the Indo-Pak subcontinent, has been of importance
throughout human history. The country lies on the postulated southern coastal route
that modern humans took from Africa to Australia.
The earliest evidence indicates that humans were present in this region around
100,000-150,000 years ago, although no fossil record exists. Neolithic sites have
been found in the Peshawar Valley in the north-west and at Mehrgarh, in the
south-east, in the province of Baluchistan (Jarrige, 1991). The evidence found at
Mehrgarh indicates a modern human settlement dating to around 7,000 B.C. This
predates the region's other early civilization, the Indus Valley Civilization, which was
found throughout the subcontinent, with major centres at Harappa and Mohenjo-Daro
in Pakistan. This civilization flourished in the 3rd and 2nd millennia B.C. (2,500-1,500
B.C.).
Owing to its geostrategic importance as the gateway to India, this region was invaded
many times. Around 1,500 B.C. Indo-European-speaking nomadic pastoral tribes, the
so-called Aryans, entered this region through the Hindu Kush Mountains and
established their supremacy, replacing the Dravidian language speakers who are
thought to have been there initially. Their rule lasted from about 1,500 B.C. to 500
B.C., when this region was occupied by the Persian Empire. In 326 B.C. this region
was conquered by Alexander the Great. Subsequently it was conquered by the
Mauryas (305 B.C.), Saka (97 B.C.), Arabs (711 A.D.), Turks (1001 A.D.), Mughals
(16th century) and lastly by the British Empire.
India and Pakistan house many different races and languages and are often referred
to as "a museum of races." Present-day Pakistan has a population of over 170 million
(Pakistan Economic Survey, 2006-2007) and consists of more than 12 ethnic and
linguistic groups, the majority being descendants of invading peoples. Ethnic groups
from the southern part of Pakistan include the Baloch, Brahui, Makrani Baloch,
Makrani Negroid, Parsi and Sindhi. The major northern groups include the Balti,
Burusho, Kalash, Kashmiri, Pathan and Punjabi. Punjabis form the majority
population of the country and include several castes. The languages spoken in
Pakistan include a language isolate and Dravidian, Sino-Tibetan and Indo-European
languages; Indo-European languages are spoken by a majority of the population.
STUDY OBJECTIVE:
The main objective of the study is to shed light on the population histories of the
numerous ethnic groups living in modern-day Pakistan. Earlier studies used only a
limited number of polymorphic Y chromosomal markers (Qamar et al., 1999, 2002),
and since then many more informative Y-SNPs have been discovered (Karafet et al.,
2008) that have not been typed in this population. Another limitation of the earlier
work was the lack of samples from the Punjab, whose people constitute the majority
population of Pakistan; this has been addressed in this study.
The study aims to screen Y chromosomal variation in a large number of Pakistani
males from various ethnic and linguistic backgrounds in order to understand
population origins and substructure and to unravel the influence of Central Asia,
China, Greece and Persia on this population. Statistical analyses and simulation
modeling are used to identify the geographic origins of population groups, episodes
of genetic bottlenecks, demographic expansions and genetic admixture. It is my hope
that these analyses will improve our knowledge of group membership within Pakistan,
with practical applications in DNA-based human forensic analyses and the design of
disease association studies, and implications for rationalizing the use of medicines
tailored to an individual's genetic makeup.
LITERATURE REVIEW
PAKISTAN AND ITS POPULATIONS
Pakistan lies in a region that has seen the passage of many invaders, all of whom
have contributed to the racial and linguistic diversity found in this country. It is
bordered by China in the north, India in the east and Iran and Afghanistan in the west,
while the Indian Ocean lies along its southern coastline. According to the Ministry of
Finance, the Pakistani population is estimated to be 156,770,000 (Pakistan Economic
Survey, 2006-2007), but the World Health Organization estimates the number to be
much higher.
Pakistan consists of four provinces, the Northern Areas and the Federally
Administered Tribal Areas (FATA), which are located on the Afghan frontier. More than
18 ethnic and 60 linguistic groups (Grimes, 1992) reside in this country. Major ethnic
groups include the Baloch, Brahui, Pathans, Punjabis and Sindhis. The majority
Punjabi-speaking population shows extensive and complex admixture of many ethnic
castes and groups (Ibbetson, 1883), such as the Gujar, Jats, Meos, Rajput and
Arians. Other ethnic groups of anthropological interest include the Makrani-Negroid,
Mohanna and Parsi in the south and the Balti, Burusho, Kalash and Kashmiri in the
north. Of particular interest is the Hazara population, which resides in Baluchistan
and the North West Frontier Province (N.W.F.P.). The geographic locations of the
above-mentioned Pakistani populations are shown in Figure I, and their possible
origins and linguistic affiliations are listed in Table I.
Figure I. Map of Pakistan showing its neighbours, administrative regions and
the geographical distribution of the populations that are included in this study.
Table I: The possible origins and language affinities of Pakistani populations.
Numbers in brackets refer to population size.

Location  Population (size)     Language       Suggested origins
North     Balti (300,000)       Sino-Tibetan   Tibet
          Burusho (60,000)      Isolate        Greek; Central Asian
          Hazara                Indo-European  Genghis Khan's soldiers
          Kalash (5,000)        Indo-European  Greece; Syria?
          Kashmiri              Indo-European  Jewish; Indo-Aryans
          Pathan (17,000,000)   Indo-European  Jewish; Greek; Admixture
          Punjabi (63,000,000)  Indo-European  Admixture
South     Baloch (4,000,000)    Indo-European  Aleppo, Syria
          Brahui (1,500,000)    Dravidian      West and Central Asia
          Makrani Baloch        Indo-European  West Asia
          Makrani Negroid       Indo-European  Africa
          Mohanna               Indo-European  Indigenous fishermen
          Parsi (~2,000)        Indo-European  Persia/Iran
          Sindhi (15,300,000)   Indo-European  Admixture
Three major Pakistani populations, the Baloch, Brahui and Makrani, reside in the
province of Baluchistan and constitute the southern group. Historians believe that the
Baloch migrated from West Asia to South Asia. The Baloch claim that they are of
Semitic stock and that in the 2nd and 1st millennia B.C. their homeland was the
ancient region of Nineveh and Babylon in modern-day Iraq. From there they migrated
to Iran, Afghanistan and Pakistan, and many Baloch tribesmen reside in south-east
Iran as well. Some historians also claim that they came from Aleppo in Syria in 682
A.D. (Quddus, 1990), when at least 44 tribes migrated to Iran. Their movement into
Pakistan is considered to be recent. At the beginning of the 10th century they moved
from Iran and occupied Sistan, and as a result of the Seljuq invasion they settled in
Makran. In the fifteenth century they migrated eastward and settled in Kachi. They
now occupy the area of Sibi and the Loralai District of Quetta Division in Pakistan
(Marri, 1985).
The Brahuis are considered to be the descendants of Turko-Iranian tribes that
migrated from west and central Asia and settled in the Sarawan and Jhalawan
regions of Kalat State in Baluchistan (Hughes-Buller, 1991; Quddus, 1990). They are
the only group in Pakistan that speaks a Dravidian language.
The dry, arid south-western Makran coast of Pakistan is home to two distinct
populations of Makranis: the Makrani-Baloch and the Makrani-Negroid. The
Makrani-Baloch express linguistic and ethnic affiliation with the neighboring Baloch
tribes (Grimes, 1992). However, many Makranis have Negroid features and are
referred to as Makrani-Negroids. It has been hypothesized that they originated in
Africa and migrated to Pakistan along the coastal route.
Another population that resides in Baluchistan, mainly in and around the provincial
capital, Quetta, is the Hazara. The name Hazara is derived from the Persian word for
"thousand." This population is also found in the town of Parachinar in the NWFP and
is widespread in Afghanistan. They have typical Mongol features and claim descent
from a detachment of a thousand soldiers left behind by Genghis Khan during his
invasion of India. Historical records show that they settled in Pakistan to escape
persecution in neighboring Afghanistan.
The other populations from southern Pakistan include the Sindhi, Mohanna and Parsi,
all of whom reside in the south-eastern province of Sindh. The Sindh province is
referred to in several ancient texts: as Sindomana by the Greeks and as Sindhudesha
by the ancient Hindus. This region was conquered by the Greeks, Parthians,
Brahmans, Arabs and finally the British. Mohenjo-Daro, the jewel of the Indus Valley
Civilization, is located here. As a result of multiple invasions and migrations, the
Sindhis are considered to be an ethnically mixed population of Indo-European
speakers. The Mohanna are another Indo-European population of fishermen who
have been residing on the banks of the River Indus for centuries. Little is known about
their origins.
The suggested origin of the Parsis is in Persia (Nanavutty, 1997). Followers of the
Iranian prophet Zoroaster, they migrated from Iran to the state of Gujarat in north-west
India in the 7th century A.D., after the collapse of the Sassanian Empire. Many Parsis
eventually settled in Mumbai in India and Karachi in Pakistan, although very few
remain in Pakistan.
Several populations reside in the northern part of Pakistan. The Pathans reside in the
North West Frontier Province (N.W.F.P.) and its adjoining tribal areas. They also
inhabit southern and eastern Afghanistan and the Baluchistan province of Pakistan.
Also known as Pushtuns, Pakhtuns or Afghans, they are an Eastern Iranian
ethno-linguistic group formed by the amalgamation of several tribes that practice a
traditional code of conduct and honor. They claim to be descendants of soldiers who
came with Alexander the Great, and several historical sources suggest that they are
of Semitic stock (Caroe, 1958).
Northern Pakistan is also home to some unique ethno-linguistic populations, among
them the Balti, Burusho and Kalash. The Baltis speak a Sino-Tibetan language, and
their suggested origin is in Tibet (Dani, 1991). They reside in Baltistan, the
north-eastern Himalayan region of Pakistan.
The Burusho, one of the isolated northern populations, also believe that they
are the descendants of Greek generals who came to the subcontinent with Alexander
the Great in 327-323 B.C. (Biddulph, 1977). They reside in the Hunza, Nagar and Yasin
Valleys in the Karakoram Mountains and are the only population in Pakistan who
speak a language isolate.
The Kalash also claim descent from Greek Macedonia, citing Alexander's invasion of
the subcontinent. They reside in the valleys of Bumburet, Rambur and Birir near
Chitral in the Hindu Kush mountain range in the NWFP. They have been extensively
studied by anthropologists for their unique culture and traditions (Lines, 1999).
DEMOGRAPHIC HUMAN HISTORY
Human diversity arises from multiple events during human evolution, migration and
colonization (Lahr and Foley, 1994). Studies reveal that human history can be
deciphered from analyses of the human genome: the genomic variation in human
individuals and populations contains enough information to allow the reconstruction
of human population history, migration patterns and population structure.
At the beginning of the 20th century, data obtained from protein markers led to
insights into human origins, divergence and demographic history (Cavalli-Sforza,
2005). However, in recent years DNA-based markers have proved to be more
efficient tools for addressing questions of human evolution and migration. An
informative DNA marker should be both highly polymorphic and selectively neutral.
DNA markers on the non-recombining portion of the human Y chromosome and in
mitochondrial DNA are polymorphic markers that have been successfully applied to
shed light on human evolutionary history from the male and female perspectives,
respectively.
Y CHROMOSOMAL VARIATIONS
Y-chromosomal DNA polymorphisms were first reported in 1985 (Casanova
et al., 1985; Lucotte and Ngo, 1985). Since then more than 600 binary
polymorphisms, the majority of them being SNPs, and numerous multi-allelic STR
markers have been identified on the human Y-chromosome (Karafet et al., 2008).
Since most of the Y chromosome does not undergo recombination, these biallelic
polymorphisms define unique mutational events and, therefore, unique Y
chromosomal haplogroups. The large number of biallelic polymorphisms allows them
to be organized into a phylogenetic tree that shows the relationships among the
various Y haplogroups. Efforts by the Y Chromosome Consortium (YCC) have led to
the development of a standardized nomenclature system for such a tree. The initial
tree, based upon approximately 200 markers (Jobling and Tyler-Smith, 2003; Y
Chromosome Consortium, 2002), was recently revised to identify 311 distinct Y
haplogroups (Karafet et al., 2008). The phylogenetic tree is rooted using the
ancestral state observed in non-human primate sequences.
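Because each binary marker defines a branch, assigning a chromosome to a haplogroup amounts to walking down the tree and following every branch whose defining marker is in the derived state. The sketch below encodes a drastically simplified tree with one marker per clade, using only markers named in this chapter. It is an illustration under those assumptions, not the YCC tree or its full marker set; some placements are simplified (for example, C is attached directly below the M168 node), and the names `Clade`, `TREE` and `assign_haplogroup` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Clade:
    name: str
    marker: str  # a single defining marker per clade (simplification)
    children: list[Clade] = field(default_factory=list)


# Simplified tree built only from markers named in this chapter.
TREE = Clade("CT", "M168", [
    Clade("C", "M130"),
    Clade("DE", "YAP", [Clade("D", "M174"), Clade("E", "M96")]),
    Clade("F", "M89", [
        Clade("G", "M201"),
        Clade("H", "M69"),
        Clade("I", "M170"),
        Clade("J", "M304"),
        Clade("K", "M9", [
            Clade("NO", "M214", [Clade("N", "M231"), Clade("O", "M175")]),
            Clade("P", "M45", [Clade("Q", "M242"), Clade("R", "M207")]),
        ]),
    ]),
])


def assign_haplogroup(derived: set[str], root: Clade = TREE) -> str | None:
    """Follow derived markers from the root and return the deepest clade
    reached, or None if the root marker itself is not derived."""
    if root.marker not in derived:
        return None
    node = root
    while True:
        # In a consistent tree at most one child marker should be derived.
        nxt = next((c for c in node.children if c.marker in derived), None)
        if nxt is None:
            return node.name
        node = nxt


print(assign_haplogroup({"M168", "M89", "M9", "M45", "M207"}))  # R
```

A sample derived for M168 and M89 but ancestral for every deeper marker is reported here as F, which in YCC notation corresponds to paragroup F*.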
The Y lineages on the phylogenetic tree fall into 20 major haplogroup clades
designated A-T (Figure II). Karafet et al. (2008) refer to these as paragroups in order
to differentiate them from the 311 haplogroups that are identified by terminal
mutations, but earlier studies use these terms interchangeably. Y chromosomes
identified by STRs are designated as haplotypes, and those defined by a combination
of biallelic markers and STRs are called lineages, as proposed by de Knijff (2000). A
brief description of the salient features of the major Y haplogroup clades follows:
HAPLOGROUP A:
Haplogroup or clade A* contains 12 additional haplogroup branches, all restricted to
Africa (Hammer et al., 2001; Underhill et al., 2001). All individuals that fall in this
group carry the ancestral state for M42, M94, M139 and SRY10831.1 and the derived
state for M91 and P97 (Karafet et al., 2008). The M91 lineage is subdivided into three
main haplogroups characterized by derived alleles for the markers P108, M6 and M32.
These haplogroups have been observed mainly in Khoisan and Bantu speakers from
South Africa, Pygmies from Central Africa, the Sudanese and Ethiopian populations of
East Africa and the population of Mali in West Africa (Hammer et al., 2001; Semino et
al., 2002; Underhill et al., 2001; Wood et al., 2005).
HAPLOGROUP B:
Clade B* haplogroups are characterized by the derived allele of the M60 SNP. They
are also derived for the markers M42, M94 and M139. All 17 branches of clade B* are
frequently found in sub-Saharan Africa. The major sub-clades are B1*, defined by
M236, and B2*, defined by M182. Sub-clade B1a, defined by the M146 marker, is
mainly found in Mali. The B2* cluster has several haplogroups, one of which, B2a*, is
derived for M150 and is frequently observed in East Africa (Sudan and Ethiopia).
B2b* Y chromosomes (derived for M112 or M192) are found in Pygmies from central
and southern Africa (Cruciani et al., 2002; Hammer et al., 2001; Jobling and
Tyler-Smith, 2003; Semino et al., 2002; Underhill et al., 2001; Wood et al., 2005).
The distribution and expansion of clades A* and B* suggest that these Y
chromosomes spread very early within the African continent, a view supported by the
palaeo-anthropological record of human population expansions throughout Africa,
north and south of the Sahara Desert, eventually reaching the Levant about
130,000-90,000 years ago (Lahr and Foley, 1998).
HAPLOGROUP C:
A total of 30 mutations and 19 haplogroups are currently reported for this clade. It is
defined by five mutations, the hallmark being the synonymous RPS4Y711 C to T
transition (also referred to as M130) in an exon of the RPS4Y gene, which was among
the earlier Y chromosomal polymorphisms to be identified (Fisher et al., 1990). This
clade has not been found in sub-Saharan Africa, and the mutations defining this
haplogroup probably occurred in Asia after the migration of modern humans out of
Africa. Walter et al. (2000) have suggested that this mutation originated in South Asia
about forty to fifty thousand years ago, with the dispersal of modern humans from the
Horn of Africa via a coastal or interior route towards South Asia. The haplogroup is
frequent in populations from Central and East Asia. It is also found in many
indigenous Australasian and Polynesian populations and in Native American Indian
tribes (Capelli et al., 2001; Hammer et al., 2001; Hudjashov et al., 2007; Karafet et
al., 2001; Kayser et al., 2006; Ke et al., 2001; Kivisild et al., 2003; Scheinfeldt et al.,
2006; Underhill et al., 2001; Zegura et al., 2004).
HAPLOGROUPS D and E:
A Y Alu polymorphism (YAP) defines these haplogroups: all Y chromosomes
belonging to these branches carry an Alu insertion. Clade D* is restricted to Asian
populations, mainly in Japan and Tibet (Su et al., 2000; Karafet et al., 2001). The 15
haplogroups that are part of this clade are all characterized by the M174 T to C
transition (Underhill et al., 2000). These are scattered throughout South-East Asia
and among Andaman Islanders (Hammer et al., 2006; Thangaraj et al., 2003).
Clade E* is more mutationally diverse and widespread, with 56 distinct haplogroups
(Karafet et al., 2008). Y chromosomes belonging to clade E* have been found in
Africa, the Levant, Europe, and Central and South Asia (Hammer et al., 1998;
Underhill et al., 2001). Clade E* haplogroups are derived for several markers
including M96 and SRY4064. The major sub-clades are E1* and E2*, characterized
by derived alleles for P147 and M75, respectively. The topology and nomenclature of
this branch have recently been revised with the discovery of several novel mutations.
Important sub-clades of E* include E1b1*, which is derived for the P2 polymorphism
and accounts for 80% of clade E haplogroups. Haplogroup E1b1a*, derived for M2
(sY81), is present at high frequencies in sub-Saharan Africa, whereas the E1b1b*
haplogroups, defined by the M215 mutation, are frequently observed in north and
east Africa, the Mediterranean basin and Europe (Hammer et al., 1997). It has been
suggested that clade E* haplogroups were spread by Bantu farmers during the
Neolithic period (Passarino et al., 1998; Scozzari et al., 1999). Representatives of
these haplogroups traveled from the Middle East to southern Europe, northern India
and Pakistan (Cruciani et al., 2002; Hammer et al., 1998; Semino et al., 2004; Sims
et al., 2007; Underhill et al., 2001).
HAPLOGROUP F:
Haplogroups that are derived for M168 and carry the derived allele of the M89 C to T
transition are frequent in non-African populations. Besides M89 and M213 (Underhill
et al., 2000), this clade is now also identified by several markers discovered by
Hammer et al. (2001). The haplogroup probably arose in East Africa about 45,000
years ago and dispersed to Eurasia through the Levantine corridor. Underhill et al.
(2001) have suggested that the African ancestors first migrated to the Middle East
around 40,000 years ago and eventually expanded towards the west, east and north,
giving rise to several major clades (G-T) of the Y phylogenetic tree. Paragroup F* is
found mainly on the Indian subcontinent and in Sri Lanka (Kivisild et al., 2003;
Sengupta et al., 2006).
HAPLOGROUP G:
Characterized by the M201 and P257 mutations, this haplogroup is present in
South-East Europe, the Mediterranean region, Anatolia, West and Central Asia
(Behar et al., 2004; Cinnioglu et al., 2004; Jobling and Tyler-Smith, 2003; Regueiro et
al., 2006; Sengupta et al., 2006) and the North Caucasus (Nasidze et al., 2003).
HAPLOGROUP H:
Found almost exclusively in the Indo-Pak subcontinent, these haplogroups are
characterized by M69, a T to C mutation. The 10 currently identified haplogroups
within this clade are separated into two major clusters: H1* and H2*. The H1* clade is
defined by the M52 A-C transversion, whereas the H2* haplogroup is characterized
by the Apt G to A transition. Both have been observed in India, but only H1* has
been reported in populations from Pakistan (Jobling and Tyler-Smith, 2003; Karafet
et al., 2005; Sengupta et al., 2006).
HAPLOGROUP I:
One of the major clades found in European populations, it was initially defined by
the M170 A-C transversion. It is thought that this mutation was acquired during the
early expansion of Levantine populations towards the west. Clade I comprises 16
haplogroups and is found at high frequency in North Europeans (Hammer et al.,
2001; Jobling and Tyler-Smith, 2003; Rootsi et al., 2004).
HAPLOGROUP J:
One of the major clades, it is defined by 12f2a and, more recently, by the M304 and
P209 markers (Karafet et al., 2008). It has two main branches: J1*, derived for M267,
and J2*, derived for M172 (Cinnioglu et al., 2004; Underhill et al., 2000). The J* clade
and its branches probably arose in the Middle East and Anatolia (Turkey), from where
they spread to West Asia and Eurasia (Hammer et al., 2000; Semino et al., 2004). It
is frequent in both India and Pakistan (Mohyuddin et al., 2006).
HAPLOGROUP K:
This heterogeneous haplogroup is characterized by the derived allele of the M9 (C-G
transversion) marker (Underhill et al., 1997). Its low incidence in Africa suggests that
the mutation occurred after the migration out of Africa. A recent survey by Karafet et
al. (2008) demonstrated derived states for an additional three markers (P128, P131
and P132) in this haplogroup. The K1 branch, derived for M147, has been observed
in populations from the Indo-Pak subcontinent (Underhill et al., 2001). The K2 branch
has been re-designated as haplogroup T* (Karafet et al., 2008).
HAPLOGROUP L:
The L* lineage probably arose in West Asia in the pre-Holocene era and was initially
identified in samples from the Indus Basin in Pakistan (Underhill et al., 2000). One
branch, L1 (derived for M27 and M76), probably arose in the Indo-Pak subcontinent.
It is absent in North-East India and found at low frequency in Central India and the
northern regions of India and Pakistan. Its highest frequencies, in South India and
South-West Pakistan, suggest that this lineage originated in the Indian Peninsula
(Sengupta et al., 2006). Other branches of haplogroup L* are present in the Middle
East, Central Asia, Northern Africa and Europe and along the Mediterranean coast
(Cinnioglu et al., 2004; Cruciani et al., 2002; Jobling and Tyler-Smith, 2003; Sengupta
et al., 2006).
HAPLOGROUP M:
Characterized by the P256 SNP, this clade is predominantly found in
Indonesia, Melanesia, Papua and New Guinea (Capelli et al., 2001; Hurles et al.,
2002; Kayser et al., 2006; Scheinfeldt et al., 2006; Su et al., 2000). Currently, 20
mutations characterize the 12 haplogroups within this branch (Karafet et al., 2008).
HAPLOGROUPS N and O:
The A to G transition at M214 identifies the common ancestor of two major
clades, N* and O*. M231 and LLY22g characterize clades N* and N1*, and the M175
deletion characterizes clade O* (Cinnioglu et al., 2004). Haplogroup N* probably
originated in Asia but is now predominantly found in European populations (Karafet
et al., 2001; Rootsi et al., 2007).
Clade O* is found at high frequency in East Asians. A major branch of this
clade, O3*, is defined by the M122 SNP; it predominates in East Asia and is found in
the majority of the Chinese population. Microsatellite diversity within this
sub-haplogroup is highest in southern Chinese populations, indicating that it arose
there before expanding northwards approximately 25,000-30,000 years ago (Shi et
al., 2005).
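The inference above rests on the idea that an older lineage has had more time to accumulate microsatellite (Y-STR) variation. The minimal Python sketch below illustrates that reasoning, assuming diversity is summarized as the mean per-locus variance in repeat number (one common choice; Shi et al. may have used a different statistic). The function name and the repeat counts are hypothetical toy values, not data from this study or from Shi et al. (2005).

```python
from __future__ import annotations

from statistics import pvariance


def mean_str_variance(haplotypes: list[dict[str, int]]) -> float:
    """Average over loci of the population variance in repeat number."""
    loci = list(haplotypes[0])
    per_locus = [pvariance([h[locus] for h in haplotypes]) for locus in loci]
    return sum(per_locus) / len(per_locus)


# Hypothetical toy O3-M122 haplotypes (repeat counts), NOT data from Shi et al.
south = [
    {"DYS19": 15, "DYS390": 23, "DYS391": 10},
    {"DYS19": 16, "DYS390": 24, "DYS391": 11},
    {"DYS19": 14, "DYS390": 22, "DYS391": 10},
    {"DYS19": 17, "DYS390": 25, "DYS391": 9},
]
north = [
    {"DYS19": 15, "DYS390": 23, "DYS391": 10},
    {"DYS19": 15, "DYS390": 24, "DYS391": 10},
    {"DYS19": 16, "DYS390": 23, "DYS391": 10},
    {"DYS19": 15, "DYS390": 23, "DYS391": 11},
]

# Higher variance is read as more time for STR mutations to accumulate.
print("south:", mean_str_variance(south))
print("north:", mean_str_variance(north))
```

With these toy values the "south" sample has the larger variance, which under this reasoning would be interpreted as the older population.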
HAPLOGROUPS P, Q and R:
Clade P* is defined by derived alleles at 92R7, M45, M74 and several other
SNPs; these chromosomes are also derived for M9. This clade includes several major
groups that are prevalent in various world populations.
Haplogroup Q* (derived for the C to T mutation at M242) probably arose in
Central Asia, from where these chromosomes spread throughout the world (Seielstad
et al., 2003). These Y chromosomes are found at high frequency in northern Eurasia
and Siberia (Karafet et al., 2002) and at lower frequencies in Europe, East Asia and
the Middle East. One major branch of this haplogroup (Q1a3a*-M3) is found almost
exclusively in Native Americans (Zegura et al., 2004).
Eight mutations, including the M207 A-G SNP, define clade R. This clade
is further divided into two sub-clusters, R1*-M173 and R2-M124. It is assumed
that around 30,000 years ago the R*-M207 lineage expanded westwards to reach
Europe, the Caucasus, the Middle East, Central Asia, northern India and Pakistan. The
R1* haplogroup is one of the most common in Europe and West Asia and probably
originated in Central Asia. The R1a1*-M17 clade, characterized by the deletion of a
G nucleotide (Underhill et al., 1997), is frequently found in south-west Pakistan
and north India (Jobling and Tyler-Smith, 2003).
HAPLOGROUPS S and T:
A re-examination of the Y phylogenetic tree led to the addition of haplogroups
S* and T*, characterized by markers M230 and M184, respectively (Karafet et al.,
2008). Haplogroup S* chromosomes were previously classified as K-M230, while
those now belonging to clade T* were previously identified as haplogroup K-M70 (Y
Chromosome Consortium, 2002).
Clade S* lineages are also identified by the P202 and P204 markers and are found in
Oceania and Indonesia (Kayser et al., 2006; Scheinfeldt et al., 2006). Clade T*, which
is also characterized by M70, M193 and M272, is further delineated by M320 and P77
and has been observed in the Middle East, Africa and Europe (Underhill et al., 2001;
King et al., 2007).
MATERIALS AND METHODS
-4-
COLLECTION OF SAMPLES:
For this study, blood samples were collected from 1,213 unrelated male
subjects belonging to sixteen ethnic groups of the Pakistani population. Informed
consent was obtained from all participants included in this study. The ethnicity of the
sampled individuals was confirmed prior to collection.
Ten ml of blood was collected from each individual in Vacutainer tubes (Becton
Dickinson, Mountain View, CA). Baloch (n = 66) and Brahui (n = 117) samples were
collected from Quetta and Kalat Division in Baluchistan. Burusho samples (n = 97)
were collected from Hunza and Nagar in the Northern Areas, and Hazara samples
(n = 224) from Parachinar and Quetta. Kalash samples (n = 44) were collected from
Chitral Division. Parsi (n = 90) and Balti (n = 14) samples were collected in Karachi.
Pathan samples (n = 96) were collected from the North-West Frontier Province and
Sindhi samples (n = 138) from Sukkur in Sindh. Meo (n = 16), Rajput (n = 10) and
Gujar (n = 159) samples were collected from rural areas of Punjab Province.
Makrani-Baloch (n = 27), Makrani-Negroid (n = 33) and Mohanna (n = 70) samples
were collected from the interior of Sindh Province. Kashmiri samples (n = 12) were
collected from Muzaffarabad (Kashmir).
The 77 Greek DNA samples were provided by Dr. Myrto Papaioannou (Unit of
Prenatal Diagnosis, Center for Thalassemia, Laiko General Hospital, Athens,
Greece).
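As an arithmetic cross-check, the per-group counts above can be tallied against the totals quoted elsewhere in the thesis (1,213 males in 16 groups, and 185 Punjabi males in the Results). The Python sketch below does this; the northern/southern split follows the grouping described in the Results (Table I), with the Gujar, Meo and Rajput castes counted as Punjabi. The variable names are illustrative only.

```python
# Per-group sample sizes as reported above (the 77 Greek samples are excluded).
SAMPLES = {
    "Baloch": 66, "Brahui": 117, "Burusho": 97, "Hazara": 224,
    "Kalash": 44, "Parsi": 90, "Balti": 14, "Pathan": 96,
    "Sindhi": 138, "Meo": 16, "Rajput": 10, "Gujar": 159,
    "Makrani-Baloch": 27, "Makrani-Negroid": 33, "Mohanna": 70,
    "Kashmiri": 12,
}

PUNJABI_CASTES = ("Gujar", "Meo", "Rajput")
NORTHERN = ("Balti", "Burusho", "Hazara", "Kalash", "Kashmiri", "Pathan") + PUNJABI_CASTES
SOUTHERN = ("Baloch", "Brahui", "Makrani-Baloch", "Makrani-Negroid",
            "Mohanna", "Parsi", "Sindhi")

total = sum(SAMPLES.values())
punjabi = sum(SAMPLES[g] for g in PUNJABI_CASTES)
northern = sum(SAMPLES[g] for g in NORTHERN)
southern = sum(SAMPLES[g] for g in SOUTHERN)

print(len(SAMPLES), total, punjabi, northern, southern)
```

The counts are consistent: 16 groups totalling 1,213 males, of which 185 are Punjabi, 672 northern and 541 southern.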
PREPARATION OF EPSTEIN-BARR VIRUS FROM B95-8 CELLS:
The Epstein-Barr virus (EBV) producing B95-8 marmoset cell line (American
Type Culture Collection, Manassas, VA) was suspended (5 × 10⁶ cells) in 10 ml of
wash medium, consisting of RPMI-1640 (Sigma-Aldrich, St. Louis, MO)
supplemented with 1% fetal calf serum (FCS; Biochrom AG, Berlin, Germany) and
1X GPPS (2 mM L-glutamine, 100 U/ml penicillin, 1 mM sodium pyruvate and
50 µg/ml streptomycin), and centrifuged in an IEC-HN-SII bench-top centrifuge
(International Equipment Company, Needham, MA) at 1000 rpm (300 × g) for 10
minutes. The supernatant was decanted and the pellet was washed twice in 5 ml of
wash medium, with centrifugation at 1000 rpm for 10 minutes. The cells were
transferred into a 25 cm² culture flask (Corning, Corning, NY) containing RPMI-1640
medium supplemented with 1X GPPS and 10% FCS. The flask was incubated at
37 °C in a humidified atmosphere of 93% air and 7% CO2. The culture was gradually
expanded, first into a 75 cm² flask and finally into 125 cm² flasks. When the medium
in the flasks turned yellow, they were incubated at 34 °C without any additional
medium supplementation for 7 days to enhance EBV production. On the 8th day the
cells were removed from the suspension by centrifugation at 1000 rpm for 10
minutes. The EBV-containing supernatant was filtered through a 0.45 µm Millipore
membrane filter (Nilsson, 1976), aliquoted into cryovials (Corning, Corning, NY) and
stored at -70 °C until use. A 1 ml aliquot of this preparation was able to transform
human B lymphocytes.
PREPARATION OF LYMPHOCYTES:
For the isolation of peripheral blood mononuclear cells (PBMC),
approximately 5 ml of venous blood was collected in acid citrate dextrose (ACD)
Vacutainer tubes (Becton Dickinson, San Jose, CA). The blood was layered over 3 ml
Histopaque-1077 (Sigma-Aldrich) in a sterile 15 ml polypropylene conical tube
(Corning, Corning, NY) and centrifuged at 2000 rpm (400 × g) for 20 minutes. The
upper plasma layer was aspirated, and the PBMC were collected from the interface
between the plasma and Histopaque, transferred into another sterile 15 ml tube
containing 10 ml wash medium and centrifuged at 1000 rpm for 10 minutes. The
supernatant was decanted, and the cell pellet was washed twice with 5 ml wash
medium and resuspended in 1 ml of wash medium (Boyum, 1968). Cell viability was
checked by the trypan blue exclusion test.
CELL COUNTING BY TRYPAN BLUE EXCLUSION TEST:
Cell viability was determined by the trypan blue exclusion test as described by
Kruse (1973). Cell suspension (10 µl) was mixed with an equal volume of 0.16%
(w/v) trypan blue solution in physiological saline, and the cells were counted using a
haemocytometer. Unstained live cells and blue-stained dead cells were counted in the
central 1 mm × 1 mm square of the counting chamber. Cell viability was calculated
with the following formula:
Cell viability (%) = (number of live cells ÷ total number of cells) × 100
The total number of live cells per ml was calculated as follows:
Live cells per ml = number of live cells × 2 (dilution factor) × 10⁴
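The two formulas can be written as small helper functions. The minimal Python sketch below assumes a standard haemocytometer chamber depth of 0.1 mm, so that the central 1 mm × 1 mm square holds 0.1 µl (10⁻⁴ ml), which is where the factor of 10⁴ comes from. The chamber depth, function names and example counts are assumptions for illustration.

```python
def viability_percent(live: int, dead: int) -> float:
    """Cell viability (%) = live cells / total cells counted x 100."""
    return live * 100 / (live + dead)


def live_cells_per_ml(live: int, dilution_factor: int = 2) -> float:
    """Live cells per ml from the central 1 mm x 1 mm square.

    Assumes a 0.1 mm chamber depth, so the square holds 0.1 µl (10^-4 ml);
    multiplying by 10^4 converts the count to cells per ml.
    """
    return live * dilution_factor * 1e4


# Hypothetical count: 180 unstained and 20 blue cells in the central square.
print(viability_percent(180, 20))  # 90.0
print(live_cells_per_ml(180))      # 3600000.0
```

A culture counted this way (90% viability, 3.6 × 10⁶ live cells/ml) would also meet the > 90% viability threshold used before cryopreservation only if the live fraction were strictly higher, which is why the viability is computed before deciding to freeze.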
ESTABLISHMENT OF EBV TRANSFORMED LYMPHOBLASTOID CELL LINES:
Human lymphoblastoid cell lines were established to preserve each individual's
DNA and to provide an inexhaustible supply of it. Approximately 4-5 × 10⁶ PBMCs
were transferred to a 25 cm² culture flask containing 3 ml transformation medium
(RPMI-1640, 10% FCS, 1X GPPS, 0.05 mM beta- g/ml
cyclosporin A) and 1 ml of the EBV supernatant prepared earlier. The flask was
incubated at 37 °C in a humidified atmosphere of 93% air and 7% CO2, with the cap
of the flask slightly loose (Walls and Crawford, 1987). The culture was examined
periodically under an inverted microscope. After 5-6 days, when colonies had formed
and the culture medium had become acidic, the culture was fed with feeding medium
(RPMI-1640, 10-15% FCS and 1X GPPS). When the density of transformed cells in a
flask had increased sufficiently, half of the culture was transferred into a 75 cm²
culture flask and expanded for cryopreservation and DNA preparation.
CRYOPRESERVATION OF CELL LINES:
For cryopreservation, cell viability was checked by the trypan blue exclusion
test as described earlier, and only cultures with cell viability > 90% were frozen. A
volume of cell suspension containing 5 × 10⁶ cells was centrifuged at 1000 rpm for
10 minutes. The supernatant was decanted and the cell pellet was resuspended in
1 ml of freezing mix (45% RPMI-1640, 45% FCS and 10% dimethylsulphoxide
(DMSO; BDH, Poole, UK)) and transferred to a 1.2 ml cryogenic vial. The vial was
kept overnight in a polystyrene box at -70 °C so that the temperature decreased
gradually. The following day the vial was transferred to the vapour phase of the liquid
nitrogen cryo-storage system (Jencons, Leighton Buzzard, UK) for long-term storage.
EXTRACTION OF CELLULAR DNA:
For the isolation of total genomic DNA, a modified organic extraction method was
used (Maniatis et al., 1982). Approximately 5 × 10⁷ lymphoblastoid cells established
from each individual were pelleted in a sterile 50 ml polypropylene centrifuge tube.
To the cell pellet, 19 ml of STE buffer (100 mM sodium chloride, 50 mM Tris and
1 mM EDTA; pH 8.0) was added. Next, 1 ml of 10% sodium dodecyl sulphate (SDS)
was added dropwise with gentle vortexing, followed by 20 µl of proteinase K
(20 mg/ml). The samples were incubated overnight in a shaking water bath at 55 °C
and extracted the following day with an equal volume of Tris-equilibrated phenol
(pH 8.0). The samples were mixed for 10 minutes, placed on ice for 10 minutes and
then centrifuged in an MSE 3000i centrifuge (Mistral, UK) at 4 °C for 40 minutes at
3200 rpm. The aqueous layer containing the nucleic acids was collected in a fresh,
labeled 50 ml centrifuge tube. A second extraction was done with an equal volume of
chilled chloroform:isoamyl alcohol (24:1, v/v); the samples were mixed and the
aqueous layer was collected in a fresh 50 ml tube. To precipitate the nucleic acids,
1/10 volume of 10 M ammonium acetate and an equal volume of chilled isopropanol
were added and mixed until a white precipitate formed. The samples were then
stored overnight at -20 °C or for 15-20 minutes at -70 °C, and centrifuged at
3200 rpm for 90 minutes to pellet the nucleic acids. The pellet was washed with 5 ml
of chilled 70% ethanol and vacuum dried for 10 minutes. One ml of Tris-EDTA (TE;
10 mM Tris, 1 mM EDTA; pH 8) was added and the samples were incubated at
37 °C for 1 hour to resuspend the pellets. To digest the RNA, 10 µl of RNase A
(10 mg/ml) was added and the samples were incubated at 37 °C for 2 hours in a
shaking water bath. The RNase was subsequently removed by adding 50 µl of 10%
SDS and 5 µl of proteinase K and incubating at 55 °C for 1 hour in a shaking water
bath. At this point the samples could be stored at 4 °C until further extraction.
Further extraction was carried out by adding 6 ml TE to each sample and extracting
successively with equal volumes of phenol and chloroform:isoamyl alcohol. To
precipitate the DNA, 1/10 volume of 10 M ammonium acetate and an equal volume of
chilled isopropanol were added. The samples were mixed until the DNA became
visible and stored at -20 °C overnight or at -70 °C for 15-20 minutes. The DNA was
pelleted, washed with 5 ml of chilled 70% ethanol, vacuum dried for 10 minutes and
resuspended in 1 ml of 10 mM Tris-HCl (pH 8).
The optical density (OD) of the samples was measured at 260 nm and 280 nm using a
Hitachi U3210 spectrophotometer (Hitachi, Tokyo, Japan); ideally the 260/280 ratio
is 1.8. The DNA concentration was calculated with the following formula:
DNA concentration (µg/ml) = A260 × dilution factor × correction factor
A dilution factor of 50 was usually employed, and the correction factor for double-
stranded DNA is 50. If the OD260/OD280 ratio was 1.7-2.0, the DNA was considered
pure, free of contaminating phenol or protein, and suitable for further analysis. Each
sample was kept in an appropriately labeled microcentrifuge tube and stored at 4 °C
until use.
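The concentration formula and the purity window can be expressed as two small functions. In this minimal Python sketch the constant 50 µg/ml per A260 unit is the double-stranded DNA correction factor given above, and the example readings (A260 = 0.2, A280 = 0.11 on a 1:50 dilution) are hypothetical. For those readings the formula gives 0.2 × 50 × 50 = 500 µg/ml with a 260/280 ratio of about 1.82.

```python
DSDNA_CORRECTION = 50  # µg/ml of double-stranded DNA per A260 unit


def dna_ug_per_ml(a260: float, dilution_factor: float = 50) -> float:
    """DNA concentration (µg/ml) = A260 x dilution factor x correction factor."""
    return a260 * dilution_factor * DSDNA_CORRECTION


def passes_purity(a260: float, a280: float) -> bool:
    """Accept DNA whose A260/A280 ratio lies in the 1.7-2.0 window used here."""
    return 1.7 <= a260 / a280 <= 2.0


# Hypothetical readings from a 1:50 dilution.
a260, a280 = 0.2, 0.11
print(round(dna_ug_per_ml(a260), 1), passes_purity(a260, a280))  # 500.0 True
```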
Some DNA samples were also prepared directly from blood. The procedure was
the same as above, with some minor modifications. The blood was first mixed with
cell lysis buffer (0.15 M ammonium chloride, 0.01 M potassium bicarbonate and
0.1 mM EDTA from a 0.5 M stock; pH 8.0) and kept on ice for 30 minutes. The
samples were centrifuged for 10 minutes at 1200 rpm, and the pellets were washed
again with 10 ml of lysis buffer and centrifuged for 10 minutes at 1200 rpm. To this
pellet, 4.75 ml of STE buffer was added along with 250 µl of 10% SDS (dropwise,
with gentle vortexing), followed by 10 µl of proteinase K. The tube was incubated
overnight in a rotary water bath at 55 °C. The next day, the samples were extracted
with phenol and chloroform:isoamyl alcohol as described earlier. After this first
extraction, 10 µl of RNase A (10 mg/ml) was added and the samples were incubated
at 37 °C for 2 hours. The samples were then treated again with 250 µl of 10% SDS
and proteinase K and incubated at 55 °C for 1 hour. Subsequent extraction and
precipitation were the same as described for lymphoblastoid cell lines.
PHENOL EQUILIBRATION:
Analytical grade phenol (BDH) was redistilled at 160 °C to remove
contaminants that cause breakdown or cross-linking of nucleic acids. Aliquots of
200-500 ml of distilled phenol were stored at -20 °C. Before use, the phenol was
melted at 55-70 °C and 8-hydroxyquinoline was added as an antioxidant and RNase
inhibitor to a final concentration of 0.1% (w/v). The melted phenol was extracted
once with an equal volume of 1.0 M Tris buffer (pH 8.0) and 3 to 4 times with 0.1 M
Tris (pH 8.0). The equilibrated phenol was stored at 4 °C under equilibration buffer
(0.1 M Tris) containing 0.2% (v/v) β-mercaptoethanol. Under these conditions it was
stable for approximately one month (Maniatis et al., 1982).
GENOTYPING OF Y MARKERS BY POLYMERASE CHAIN REACTION
(PCR):
The polymerase chain reaction was first described by Saiki et al. (1985), and it
was used extensively in this study to amplify the desired fragments of the Y
chromosome from genomic DNA. The 93 Y markers genotyped in this study are
shown in Table II, and a brief overview of the methods used to detect them follows:
AMPLIFICATION REFRACTORY MUTATION SYSTEM (ARMS) PCR:
The ARMS PCR technique is a simple method for detecting single-base
mutations. In this allele-specific PCR, genomic DNA is amplified only when a specific
allele is present. Two reactions are run in parallel using three primers, one of which is
common to both reactions. One reaction contains the common primer and a primer
specific for the normal sequence; the other contains the common primer and a primer
specific for the mutant sequence. The principle is that primer extension by DNA
polymerase depends on correct base pairing at the 3′ end of the primer.
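Because males carry a single Y chromosome, a clean ARMS result should show a product in exactly one of the two reactions. The Python sketch below shows one way the paired gel results might be read; the function name, the result labels and the handling of failed or double-positive reactions are assumptions for illustration, not the scoring rules used in this thesis.

```python
def call_arms(normal_band: bool, mutant_band: bool) -> str:
    """Interpret the paired ARMS reactions for one Y-chromosome SNP.

    Males are hemizygous for the Y chromosome, so a clean result shows a
    product in exactly one of the two reactions.
    """
    if normal_band and not mutant_band:
        return "normal (ancestral)"
    if mutant_band and not normal_band:
        return "mutant (derived)"
    if not normal_band and not mutant_band:
        return "no product: repeat"
    return "both products: check for mispriming or contamination"


print(call_arms(normal_band=False, mutant_band=True))
```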
AMPLIFIED FRAGMENT LENGTH POLYMORPHISM (AFLP) PCR:
AFLP PCR is based on the principle that a base change can create or abolish
a restriction site. PCR primers are designed from the sequences flanking the
restriction site to produce a 100-500 base pair product. The amplified product is
subsequently digested with the appropriate restriction enzyme, and the fragments
are analyzed by agarose gel electrophoresis. The SNPs typed by the AFLP method
are listed in Table III.
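The digestion logic can be illustrated in a few lines. The Python sketch below uses the HindIII recognition sequence AAGCTT (HindIII is one of the enzymes in Table III; it cuts between the first A and AGCTT) and two hypothetical 106 bp amplicons that differ at one base, so that the site is present in one allele and abolished in the other. The sequences and function name are illustrative assumptions, not the actual LLY22g or 92R7 amplicons.

```python
from __future__ import annotations

import re

HINDIII = "AAGCTT"  # HindIII recognition sequence; cuts between A and AGCTT


def digest(amplicon: str, site: str = HINDIII, cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear amplicon at every site (top strand)."""
    cuts = [m.start() + cut_offset for m in re.finditer(site, amplicon)]
    bounds = [0, *cuts, len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 106 bp amplicons differing at one base (T in the site vs A).
site_present = "G" * 60 + "AAGCTT" + "C" * 40
site_absent = "G" * 60 + "AAGCTA" + "C" * 40

print(digest(site_present))  # [61, 45]
print(digest(site_absent))   # [106]
```

On an agarose gel the first allele would appear as two smaller bands and the second as a single uncut band, which is how the genotype is read.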
Table II: Y haplogroups, markers, types of polymorphism and genotyping methods used in this study. Y haplogroups were
determined hierarchically, screening initially with markers that identify deep lineages (bold) and subsequently genotyping
markers that further delineate the tree in the target population. Polymorphisms are given as the ancestral allele followed by the
derived allele. The typing methods were amplified fragment length polymorphism (AFLP), denaturing high performance liquid
chromatography (DHPLC), amplification refractory mutation system polymerase chain reaction (ARMS-PCR), dideoxy DNA
sequencing (Seq) or standard PCR (PCR) for the YAP, 12f2 and LINE1 insertion/deletion polymorphisms.
Haplogroup Marker Polymorphism Method | Haplogroup Marker Polymorphism Method | Haplogroup Marker Polymorphism Method
A M91 del T DHPLC | H1b M97 T→G DHPLC | O1b M110 T→C Seq
A1 M31 G→C DHPLC | H2 Apt G→A AFLP | O2 P31 T→C Seq
A2 M6 T→C DHPLC | I M170 A→C ARMS | O2a1 M88 A→G Seq
A2 PK1 C→A AFLP | J 12f2 del PCR | O2a1 M111 del TT Seq
A3a M32 T→C DHPLC | J1 M267 T→G ARMS | O2a1a PK4 A→T DHPLC
B M60 ins T DHPLC | J1a M62 T→C ARMS | O2b SRY+465 C→T AFLP
B2a M150 C→T DHPLC | J2 M172 T→G ARMS | O3 M122 T→C ARMS
B2a1 M109 C→T DHPLC | J2a1b M67 A→T ARMS | O3a3 L1Y LINE1 ins PCR
B2a1 M152 C→T DHPLC | J2a1b1 M92 T→C ARMS | O3a5 M134 del G DHPLC
B2a1 M218 C→T DHPLC | J2b M12 G→T ARMS | O3a5a M117 del ATCT DHPLC
C RPS4Y C→T AFLP | K M9 C→G AFLP | O3a5a M133 del T DHPLC
C1 M8 G→T Seq | K1 M147 ins T Seq | P 92R7 C→T AFLP
C2 M38 T→G Seq | K4 M177 C→T Seq | P M45 G→A DHPLC
C3 M217 A→C Seq | L M20 A→G AFLP | P M74 G→A DHPLC
C3 PK2 T→C ARMS | L M11 A→G AFLP | Q M242 C→T ARMS
C3c M48 A→G ARMS | L M185 C→T DHPLC | Q2 M25 G→C DHPLC
DE YAP Alu ins PCR | L1 M27 C→G ARMS | Q2 M143 G→T DHPLC
E SRY-8299 G→A AFLP | L1 M76 T→G DHPLC | R M207 A→G ARMS
E3a sY81 A→G AFLP | L2 M317 del GA DHPLC | R1 M173 A→C ARMS
E3b1 M35 G→C ARMS | L2 M349 G→T DHPLC | R1a1 M17 del G ARMS
E3b1a M78 C→T ARMS | L3 M357 C→A DHPLC | R1a1 SRY-1532 A→G→A AFLP
E3b1a1 M148 A→G DHPLC | L3a PK3 T→C ARMS | R1a1a M56 A→T ARMS
E3b1c M123 G→A ARMS | NO M214 A→G ARMS | R1a1b M157 A→C DHPLC
E3b1c2 M136 C→T DHPLC | N LLY22g C→A AFLP | R1a1c M87 T→C DHPLC
F M89 C→T ARMS | N M231 G→A DHPLC | R1a1d PK5 C→T AFLP
G M201 G→T ARMS | N3 TAT T→C AFLP | R1b2 M73 del GT DHPLC
G2a P15 C→T DHPLC | O M175 del TTCTC Seq | R1b3F SRY-2627 C→T AFLP
H M69 T→C DHPLC | O1 M119 A→C DHPLC | R1c M343 C→A ARMS
H1 M52 A→C ARMS | O1a M101 C→T DHPLC | R2 M124 C→T ARMS
H1 M82 del AT DHPLC | O1b M50 T→C DHPLC | T M70 A→C ARMS
H1a M36 T→G DHPLC | O1b M103 C→T DHPLC | T M193 ins CAAA DHPLC
Table III: List of SNPs typed by AFLP method.
No. Marker Restriction enzyme
1 Apt Hae III
2 LLY22g HindIII
3 M9 Hinf I
4 M11 Msp I
5 M17 Afl III
6 M20 Ssp I
7 PK1 Psp1406 I
8 PK5 Mnl I
9 RPS4Y Bsl I
10 SRY+465 FnuH I
11 SRY 1532 Dra III
12 SRY2627 Ban I
13 SRY8299 BsrBI
14 sY81 Nla III
15 TAT Mae II
16 92R7 Hind III
PREPARATION OF AGAROSE GEL:
Six grams of agarose (molecular biology grade; Sigma Chem. Co) was mixed in
300 ml of TAE electrophoresis buffer (0.04 M Tris-acetate and 0.01 M EDTA) to make
a 2% (w/v) agarose gel. The agarose was melted in a microwave oven with the cap of
the bottle loose. When the agarose had dissolved completely, 5 µl (0.5 µg/ml)
ethidium bromide (Sigma-Aldrich, St. Louis, USA) was added and mixed thoroughly.
The molten gel was then held in a shaking water bath at 55 °C for 20-25 minutes. A
gel tray was sealed with rubber clamps and placed on a level horizontal surface, and
the required combs were placed at appropriate positions (0.5-1.0 mm above the base
of the gel tray). The gel was poured into the tray and, after it had solidified, the combs
and clamps were removed. The gel was placed in an electrophoresis tank containing
1X TAE electrophoresis buffer.
Orange G loading dye (0.125% orange G, 20% Ficoll, 100 mM EDTA) was added to
each sample and the samples were loaded on the gel, with a 100 bp ladder (Promega)
in the first well. Electrophoresis was carried out for approximately 40 minutes at 150
volts using a power pack (Bio-Rad Laboratories, model 3000). Photographs were
taken under a UV transilluminator using the Syngene system (Bio Imaging Systems,
Cambridge, UK).
MULTIPLEX PCR:
Each sample was PCR amplified in a multiplex reaction consisting of 4 to 5
primer pairs labeled with TET, HEX or FAM (Table IV). The multiplex PCR assay was
performed in a 10 µl final volume, and the reaction mixture was prepared in two steps.
In the first step, a Super Taq™ polymerase/TaqStart™ Antibody premix was prepared:
0.13 U of Super Taq™ enzyme (HT Biotechnology Ltd) was incubated with 2.3 µM
TaqStart™ Antibody (Clontech) in the presence of 0.874 µl per reaction of TaqStart™
dilution buffer for 5-7 minutes at room temperature. In the second step, the PCR
master mix was prepared. It consisted of 1X Super Taq PCR Buffer 1 (10 mM
Tris-HCl pH 9, 1.5 mM MgCl2, 50 mM KCl, 0.01% gelatin and 0.01% Triton X-100),
0.7 mM MgCl2, 200 µM dNTPs, primers (at the concentrations given in Table IV) and
1.225 µl per reaction of the Super Taq™ polymerase/TaqStart™ Antibody premix.
The above mixture was added to tubes containing 20 ng (1 µl) genomic DNA. PCR
was performed using the touchdown protocol described by Ayub et al. (2000), under
the following conditions: 1 cycle of 1 minute at 94 °C; 8 cycles of 1 minute at 94 °C,
1 minute at 60 °C and 1 minute at 72 °C, with the annealing temperature decreased
by 0.5 °C in each cycle; 30 cycles of 1 minute at 94 °C, 1 minute at 56 °C and
1 minute at 72 °C; and 1 cycle of 5 minutes at 72 °C.
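The touchdown schedule can be generated programmatically to check that the annealing steps line up with the 56 °C plateau. The Python sketch below assumes the first touchdown cycle anneals at 60 °C and each later cycle 0.5 °C lower; under that reading the eighth touchdown cycle anneals at 56.5 °C, one step above the plateau. The function name and tuple layout are illustrative assumptions.

```python
def touchdown_program(start_anneal: float = 60.0, step: float = 0.5,
                      touchdown_cycles: int = 8, plateau_anneal: float = 56.0,
                      plateau_cycles: int = 30) -> list:
    """Return one (denature, anneal, extend) tuple in °C per amplification cycle.

    The initial 1 min denaturation at 94 °C and the final 5 min extension at
    72 °C are single steps and are left out of the list.
    """
    program = [(94.0, start_anneal - i * step, 72.0) for i in range(touchdown_cycles)]
    program += [(94.0, plateau_anneal, 72.0)] * plateau_cycles
    return program


cycles = touchdown_program()
print(len(cycles), [c[1] for c in cycles[:9]])
```

Printing the first nine annealing temperatures shows the descent from 60.0 to 56.5 °C followed by the first 56.0 °C plateau cycle, for 38 amplification cycles in total.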
SAMPLE PREPARATION:
Amplified product (0.3 µl) was mixed with 2.7 µl of loading dye (per reaction:
0.342 µl dextran blue, 1.5 µl formamide, 0.478 µl autoclaved deionized water and
0.38 µl TAMRA 300 or 500 internal lane size standard). Samples were denatured at
90 °C for 2 minutes and placed on ice until loading. Samples were run on an ABI 377
DNA sequencer for one and a half hours, and data were collected using the ABI
collection software. Fragment sizes were estimated using GeneScan software (v2.1),
and alleles were called using Genotyper software (v2.0).
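The per-reaction dye components add up to the stated 2.7 µl, and in practice they would be combined as a batch. The Python sketch below scales the recipe for a run of n samples; the 10% pipetting overage, the dictionary keys and the function name are assumptions for illustration, not part of the protocol above.

```python
LOADING_MIX_UL = {  # per-reaction volumes (µl) from the recipe above
    "dextran blue": 0.342,
    "formamide": 1.5,
    "deionized water": 0.478,
    "TAMRA size standard": 0.38,
}


def scale_mix(n_samples: int, overage: float = 0.10) -> dict:
    """Volumes (µl) for a batch of n samples plus an assumed pipetting overage."""
    factor = n_samples * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in LOADING_MIX_UL.items()}


print(round(sum(LOADING_MIX_UL.values()), 3))  # 2.7 µl per reaction
print(scale_mix(48))
```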
4% POLY ACRYLAMIDE GEL PREPARATION:
Urea (5.4 g) was dissolved in 5 ml of autoclaved deionized water with continuous
stirring and heating. Then 1.5 ml of 40% acrylamide solution (19:1 acrylamide:bis-
acrylamide) and 2-3 g of mixed-bed ion-exchange resin were added to the urea and
mixed for 2-3 minutes. The solution was filtered through Whatman No. 1 filter paper
into a 50 ml graduated cylinder already containing 1.5 ml of 10X TBE (70 g Trizma
base (Tris[hydroxymethyl]aminomethane), 55 g boric acid and 9.0 g
ethylenediaminetetraacetic acid (EDTA); pH 8-8.2). The volume was made up to
15 ml and filtered through a 0.2 µm Millipore filter using a Millipore vacuum filtration
assembly. Just before pouring the gel, 5 µl of 10% ammonium persulphate (APS) and
10.5 µl TEMED were added to the filtered solution.
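The recipe's final concentrations follow from simple arithmetic on a 15 ml final volume. The Python sketch below checks them, assuming a urea molecular weight of 60.06 g/mol and applying C1 × V1 = C2 × V2 to the acrylamide and TBE stocks. The function names are illustrative.

```python
UREA_MW = 60.06  # g/mol


def molarity(grams: float, mol_weight: float, volume_ml: float) -> float:
    """Molar concentration of a solute dissolved to a final volume."""
    return grams / mol_weight / (volume_ml / 1000)


def dilute(stock_conc: float, stock_ml: float, final_ml: float) -> float:
    """C1 x V1 = C2 x V2, solved for C2."""
    return stock_conc * stock_ml / final_ml


FINAL_ML = 15
print(round(molarity(5.4, UREA_MW, FINAL_ML), 2))  # urea, M
print(dilute(40, 1.5, FINAL_ML))                   # acrylamide, % (w/v)
print(dilute(10, 1.5, FINAL_ML))                   # TBE, X
```

The results are about 6 M urea, 4% acrylamide and 1X TBE. The same calculation, `molarity(9, UREA_MW, 25)`, reproduces the roughly 6 M urea stated for the sequencing gel.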
The rear and front plates (12 cm) were washed with 1% Alconox detergent and
rinsed first with demineralized water and then with deionized water. When the plates
were dry, the rear plate was placed on the gel casting apparatus (Sequencing Gel
Caster, model SGC-1) with its inner surface facing up. Wet 0.2 mm spacers were
placed on the rear plate, and the front plate was placed halfway down on top of the
rear plate. The 4% acrylamide solution was drawn into a 50 ml syringe and poured
slowly between the two plates. The flat edge of a 0.2 mm comb was inserted
between the plates and the plates were sealed with clamps. The plate assembly was
left for 30-45 minutes for the gel to polymerize. The comb and clamps were then
removed, and the plate assembly was washed with demineralized water and then
deionized water and left for 15-20 minutes. The shark-tooth side of the comb was
inserted so that the teeth just touched the gel. The plates were fixed on the gel
cassette and then onto the sequencer, and the upper and lower buffer reservoirs were
attached. A plate check was carried out to ensure that the gel plate was clean. Both
reservoirs were filled with 1X TBE buffer, and the gel was pre-run for 10 minutes
before the samples were loaded.
Table IV: Y-STR primer sequences.
Primer name Primer sequence Dye label Final conc. (µM)
YSTR1
DYS19-L CTA CTG AGT TTC TGT TAT AGT TET 0.236
DYS19-R ATG GCA TGT AGT GAG GAC A 0.236
DYS388-L GTG AGT TAG CCG TTT AGC GA TET 0.318
DYS388-R CAG ATC GCA ACC ACT GCG 0.318
DYS390-L TAT ATT TTA CAC ATT TTT GGG CC 0.127
DYS390-R TGA CAG TAA AAT GAA CAC ATT GC FAM 0.127
DYS391-L-N CTA TTC ATT CAA TCA TAC ACC CAT AT FAM 0.384
DYS391-R-N ACA TAG CCA AAT ATC TCC TGG G 0.384
DYS392-L-N AAA AGC CAA GAA GGA AAA CAA A 0.155
DYS392-R-N CAG TCA AAG TGG AAA GTA GTC TGG HEX 0.155
DYS393-L GTG GTC TTC TAC TTG TGT CAA TAC 0.18
DYS393-R AAC TCA AGT CCA AAA AAT GAG G HEX 0.088
YSTR2
DYS389I-L CCA ACT CTC ATC TGT ATT ATC TAT TET 0.032
DYS389I-R TCT TAT CTC CAC CCA CCA GA 0.032
DYS389II-L CCA ACT CTC ATC TGT ATT ATC TAT TET 0.032
DYS389II-R TTA TCC CTG AGT AGT AGA AGA AT 0.032
DYS425-L TGG AGA GAA GAA GAG AGA AAT 0.861
DYS425-R AGC TCT ACA AGC CAT TGT GAT CT FAM 0.861
DYS426-L GGT GAC AAG ACG AGA CTT TGT G HEX 0.30
DYS426-R CTC AAA GTA TGA AAG CAT GAC CA 0.25
YSTR3
DYS434-L CAC TCC CTG AGT GCT GGA TT TET 0.2
DYS434-R GGA GAT GAA TGA ATG GAT GGA 0.2
DYS437-L GAC TAT GGG CGT GAG TGC AT HEX 0.1
DYS437-R AGA CCC TGT CAT TCA CAG ATG A 0.1
DYS435-L AGC ATC TCC ACA CAG CAC AC TET 0.05
DYS435-R TTC TCT CTC CCC CTC CTC TC 0.05
DYS438-L TGG GGA ATA GTT GAA CGG TAA HEX 0.2
DYS438-R GTG GCA GAC GCC TAT AAT CC 0.2
DYS436-L CCA GGA GAG CAC ACA CAA AA FAM 0.025
DYS436-R GCA ATC CAA CTT CAG CCA AT 0.025
DYS439-L TCC TGA ATG GTA CTT CCT AGG TTT TET 0.2
DYS439-R GCC TGG CTT GGA ATT CTT TT 0.2
AUTOMATED FLUORESCENT DNA SEQUENCING:
Automated sequencing (dideoxy terminator cycle sequencing) was carried out
using an ABI 377 DNA Sequencer and the dye terminator cycle sequencing ready
reaction kit (version 3.1; Applied Biosystems).
The target DNA was first amplified by PCR in a 50 µl reaction volume containing
1X PCR buffer II, 1.5 mM MgCl2, 100 µM dNTPs, 1 U Taq DNA polymerase, 1.0 µM
of each primer (forward and reverse) and 40 ng DNA template. The following cycling
conditions were used: 1 cycle of 4 minutes at 95 °C; 35 cycles of 1 minute at 95 °C,
1 minute at the annealing temperature (which depends on the primer; see Appendix I)
and 1 minute at 72 °C; and 1 cycle of 10 minutes at 72 °C.
Amplified PCR products were first checked on a 2% agarose gel. The amplified
product was precipitated with 50 µl of 95% ethanol and washed with 200 µl of 70%
ethanol, and the pellet was resuspended in autoclaved deionised water. If required,
PCR products were also purified with the QIAquick PCR product extraction kit
(Qiagen) according to the manufacturer's instructions. The sequencing reaction was
carried out in a 10.0 µl total reaction volume consisting of 2.0 µl sterile deionised
H2O, 4.0 µl Terminator Ready Reaction Mix (which includes labelled dye terminators,
buffer and dNTPs), 1.0 µl forward or reverse sequence-specific primer and 3.0 µl
purified DNA (0.5 µg).
Cycle sequencing was performed on a Thermo Hybaid multi-block system (MBS 0.2S)
or a Thermo Hybaid PxE 0.2 thermal cycler for 25 cycles of 10 seconds at 96 °C,
5 seconds at 50 °C and 4 minutes at 60 °C.
After cycling, the products were precipitated with 50 µl of 95% ethanol, washed
with 200 µl of 70% ethanol and vacuum dried. The pellets were resuspended in 5 µl
of ABI loading buffer diluted with formamide (1:5), denatured at 95 °C for 2 minutes
and placed on ice until loading. Samples were run on an ABI 377 DNA sequencer for
seven hours, and data were collected using the ABI collection software.
PREPARATION OF SEQUENCING GEL:

To prepare the sequencing gel, 9 g of urea (6 M) was dissolved in approximately
10 ml of deionised water on a hot plate with constant stirring. After the urea had
dissolved, 2.5 ml of a 19:1 acrylamide gel solution (SequaGel) and 2.5 ml of 10X
TBE were added, and the volume was made up to 25 ml with sterile deionised water.
The solution was filtered through a 0.2 µm Millipore membrane filter and degassed
using a Millipore vacuum filtration assembly. To the filtered solution, 200 µl of 10%
APS and 5 µl of TEMED were added, and the gel was poured immediately between
the plates. The remaining procedure was the same as that described for the 4%
polyacrylamide gel.
DENATURING HIGH PERFORMANCE LIQUID CHROMATOGRAPHY
(DHPLC):
Denaturing high performance liquid chromatography (DHPLC) was developed
by Oefner and Underhill (1995). In this powerful technique, SNPs are identified by the
presence of heteroduplexes in a mixture of amplified products from a wild-type DNA
(control sample) and the test sample. The DNA fragments are separated on a
specialized DNASep column by ion-pair reversed-phase HPLC carried out under
denaturing conditions. The Transgenomic WAVE™ DNA fragment analysis system
was used for DHPLC work.
PCR was carried out in a 15 µl total reaction volume containing 1X PCR buffer,
1.5 mM MgCl2, 200 µM dNTPs, 1 U BioTaq DNA polymerase, 1.0 µM of each primer
(forward and reverse) and 40 ng DNA template (20 ng/µl). PCR cycling parameters
are described in Appendix I.
The quality of the amplified products was first checked by running 5 µl of each
PCR product on a 2% agarose gel. Equal volumes of the wild-type PCR product and
of each test sample's PCR product were mixed pairwise and denatured at 95 °C for
5 minutes. They were then allowed to reanneal by decreasing the temperature from
95 °C to 25 °C at a rate of 1.5 °C/min.
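Mixing with a wild-type control is the key step: a male test sample carries only one Y chromosome, so on its own it reanneals into homoduplexes even if it carries a variant, and a heteroduplex can form only when it is paired with a control of different sequence. The Python sketch below captures that sequence-level logic and the length of the cooling ramp. It ignores column temperature and gradient, which determine whether the WAVE system actually resolves the heteroduplex; the sequences and function names are hypothetical.

```python
def heteroduplex_expected(control: str, test: str) -> bool:
    """Sequence-level check: can mixing the two amplicons form mismatched duplexes?

    Whether the DHPLC run actually resolves the heteroduplex also depends on
    the column temperature and gradient, which this sketch ignores.
    """
    if len(control) != len(test):
        return True  # an insertion or deletion also gives heteroduplexes
    return any(a != b for a, b in zip(control, test))


def reanneal_minutes(start_c: float = 95.0, end_c: float = 25.0,
                     rate_c_per_min: float = 1.5) -> float:
    """Duration of the linear cooling ramp used for reannealing."""
    return (start_c - end_c) / rate_c_per_min


# Hypothetical control and test amplicons differing at a single base.
control = "ACGTTGCAAGCT"
test = "ACGTTGCGAGCT"
print(heteroduplex_expected(control, test), round(reanneal_minutes(), 1))  # True 46.7
```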
Before setting up the experiment, the instrument was purged with 33% buffer A
(0.1 M triethylammonium acetate (TEAA), pH 7.0), 33% buffer B (0.1 M TEAA
containing 25% acetonitrile, pH 7.0) and 34% buffer C (75% acetonitrile) for 2-5
minutes. After purging, the column was equilibrated for 30 minutes with 50% buffer A
and 50% buffer B at a flow rate of 0.9 ml/min. Five needle and injection port washes
were carried out using buffer D (8% acetonitrile).
The DNA sequence to be screened for polymorphisms was entered into the Wave
Maker software (version 4.1), and the appropriate temperature and gradient method
for that sequence was determined. A sample sheet specifying the tube numbers,
injection volumes, sample IDs and gradient was prepared. The system was initialized
and run according to the manufacturer's instructions.
The optimal melting temperature for any DNA fragment can be predicted by
submitting its sequence to the website
(http://insertion.stanford.edu/melt.html).
RESULTS
-5-
SECTION 1
PHYLOGEOGRAPHY OF PAKISTANI ETHNIC GROUPS:
Y-chromosomal biallelic markers (base substitutions, insertions and
deletions) identify stable Y haplogroups and lineages. More than 600 such markers
in the male-specific region of the human Y chromosome delineate more than 300 Y
haplogroups with a worldwide distribution (Figure II). In this study, 93 of these
biallelic markers were examined in 1,213 unrelated males representing 16 ethnic
groups from Pakistan. The ethnic groups were broadly divided into two categories
(Table I). The northern group was represented by unrelated males from the Balti,
Burusho, Hazara, Kalash, Kashmiri, Pathan and Punjabi ethnic groups. Punjabis
constitute the majority of Pakistan's population, and most reside in Punjab province,
adjoining India. The Punjabi sample comprised 185 unrelated males from the Gujar,
Meo and Rajput castes. The southern group comprised unrelated males from the
Baloch, Brahui, Makrani-Baloch, Makrani-Negroid, Mohanna, Parsi and Sindhi
populations.
Y biallelic polymorphisms were typed using a hierarchical approach. All
samples were initially analyzed for four markers representing clades close to the root
of the Y phylogenetic tree. These included SRY10831.1 (clade B*), RPS4Y711 (clade
C*), YAP (clade E*) and M89 (clade F*). The frequencies of these B*, C*, E* and F*
haplogroups in Pakistan are shown in Table V.
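The hierarchical strategy can be pictured as a walk down the marker tree. A downstream marker is genotyped only after its parent clade has been confirmed. The Python sketch below illustrates this under stated assumptions. It uses a heavily simplified, hypothetical subtree assembled from marker and haplogroup names mentioned in this chapter. Intermediate nodes such as P* are omitted, and most of the 93 markers are left out. `TREE` and `assign` are illustrative names, not part of any published software.

```python
# Hypothetical, simplified subtree: (haplogroup, defining marker, children).
# Marker/haplogroup pairs come from the text; many intermediate nodes
# (e.g. P*) and most of the 93 markers typed in the study are omitted.
TREE = ("root", None, [
    ("B*", "M60", []),
    ("C*", "RPS4Y711", [("C3*", "PK2", [])]),
    ("E*", "YAP", [
        ("E1b1b1*", "M35", [("E1b1b1a*", "M78", []), ("E1b1b1c*", "M123", [])]),
    ]),
    ("F*", "M89", [
        ("J*", "12f2a", []),
        ("K*", "M9", [
            ("L*", "M20", [("L1", "M27", [])]),
            ("R*", "M207", [
                ("R1*", "M173", [("R1a1*", "M17", [])]),
                ("R2", "M124", []),
            ]),
        ]),
    ]),
])


def assign(genotype, node=TREE, typed=None):
    """Return (haplogroup, markers typed) for one sample.

    `genotype` maps marker -> True (derived) or False (ancestral); markers
    absent from the dict are treated as ancestral. A child marker is only
    examined after its parent clade has been confirmed, so markers in
    other branches are never typed.
    """
    if typed is None:
        typed = []
    label, _marker, children = node
    for child in children:
        child_marker = child[1]
        typed.append(child_marker)  # this marker had to be genotyped
        if genotype.get(child_marker, False):
            return assign(genotype, child, typed)
    return label, typed  # no derived child: sample stays in this (para)group


sample = {"M89": True, "M9": True, "M207": True, "M173": True, "M17": True}
hg, typed = assign(sample)
print(hg, len(typed), typed)
```

For this sample the walk stops at R1a1* after typing 10 of the 16 markers in the sketch. It never touches the C3, E1b1b1, L1 or R2 sub-branches, which is the saving the hierarchical design provides.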
Further subtyping with markers within each haplogroup revealed thirty-three
haplogroups in the different ethnic groups of Pakistan. Of the four basal haplogroups
(B*, C*, E* and F*), F* was the most frequent in both northern and southern
populations (Figure III). As expected, the majority (85%) of Y chromosomes from
Pakistan were derived for M89. The M89 derived allele is found in most populations
living outside Africa and defines YCC clades F through T (Figure II). Twenty-five
different haplogroups of F*-M89 chromosomes were found at varying
frequencies in the different ethnic groups of Pakistan (Table VI). The thirty-three
haplogroups are summarized in Figure IV.
Clade A* is restricted to sub-Saharan African populations and was not
observed in any individual from Pakistan. However, haplogroup B*-M60 was observed
at low frequency, in 0.9% of the Brahui and 3% of the Makrani-Negroid samples from
southern Pakistan.
Haplogroup C* was the predominant haplogroup in the Hazara population
(60%). It was also present in the Brahui, Mohanna, Burusho, Meo and Gujar, at
frequencies ranging from 1.6 to 8.2% (Tables VI and VII). Individuals carrying the
derived allele for the RPS4Y711 marker were further sub-typed for five additional
markers that identify clades C1, C2 and C3: M8 (C1*), M38 (C2*), M217, PK2 (C3*)
and M48 (C3a). Of these, only PK2 was detected. PK2 is one of several novel SNPs
and is phylogenetically equivalent to haplogroup C3* (Mohyuddin et al., 2006). All of
the Hazara (60%) and Burusho (8.2%) RPS4Y711 derived Y chromosomes also carried
the derived allele for the PK2 marker.
YAP derived chromosomes, which belong to clade DE*, constitute 3% of the
Pakistani sample and were observed mainly in the southern populations. Except for
the Mohanna, this haplogroup was observed in all southern populations, at
frequencies between 1.5% and 10.6%. The Pathan were the only northern population
in which these chromosomes were observed (2.1%). Several offshoots of the DE*
clade were analyzed, and all Pakistani YAP-positive (YAP+) Y chromosomes
belonged to haplogroup E* and carried the derived allele for SRY-8299. Further
sub-typing of clade E* defined three informative haplogroups: E1b1a*, E1b1b1a* and
E1b1b1c*. The highest frequency of E1b1a* (marker sY81=M2) was observed in the
Makrani-Negroid (9.1%). These chromosomes were also found in the Makrani-Baloch
(3.7%), Brahui (3.4%) and Baloch (1.5%). The remaining YAP+ chromosomes carried the
Table V: Frequency of haplogroups B*, C*, E* and F* in ethnic groups
from Pakistan.
Population n B* C* E* F*
Northern
Balti 14 - - - 1.000
Burusho 97 - 0.082 - 0.918
Hazara 224 - 0.600 - 0.402
Kalash 44 - - - 1.000
Kashmiri 12 - - - 1.000
Pathan 96 - - 0.021 0.976
Punjabi 185 - 0.016 - 0.984
Southern
Baloch 66 - - 0.106 0.894
Brahui 117 0.009 0.017 0.034 0.940
Makrani-Baloch 27 - - 0.074 0.926
Makrani-Negroid 33 0.030 - 0.121 0.848
Mohanna 70 - 0.043 - 0.957
Parsi 90 - - 0.056 0.944
Sindhi 138 - - 0.022 0.978
Total 1213 0.002 0.124 0.022 0.852
Figure III: Distribution of haplogroups B*, C*, E* and F* in populations from
northern and southern Pakistan.
Figure IV: Y haplogroups frequency distribution in ethnic groups of Pakistan.
derived allele for the E1b1b1*-M35 haplogroup. This clade comprises six main
branches, which are widely distributed in Africa, Asia and Europe. Of these,
E1b1b1a*-M78 and E1b1b1c*-M123 derived chromosomes were observed in Pakistan.
Interestingly, only two of the YAP+ populations, the Baloch from the southern group
and the Pathan from the northern group, share the E1b1b1a*-M78 haplogroup, at
frequencies of 6.1% and 2.1%, respectively. Most of the southern populations carry
the derived allele for the M123 marker: the frequency of the E1b1b1c*-M123
haplogroup was 5.6% in the Parsi, 3.7% in the Makrani-Baloch, 2.2% in the Sindhi
and 1.5% in the Baloch.
The derived allele for M89 was observed at very high frequency in
representatives of all population groups of Pakistan except the Hazara. The
following branches of this haplogroup were observed in Pakistan:
Haplogroup G*-M201, which is distributed mainly in Eurasian populations,
accounts for 1.1% of Pakistani Y chromosomes. Its frequency was highest in the
Kalash from northern Pakistan. Haplogroup G* was also observed in all southern
populations except the Baloch and Makrani-Baloch. G* was observed at low
frequency in the Mohanna, Burusho and Gujar. One major sub-clade of this
haplogroup, G2a*, which is derived for the P15 polymorphism, accounts for most of
the variation observed in this haplogroup in Pakistan. Haplogroup G2a* is widely
distributed among the southern populations. Among the northern groups, only the
Kalash and Pathan carry this haplogroup, at frequencies of 18.1% and 1%,
respectively.
The H1*-M52 haplogroup, a sub-clade of H*-M69, has a frequency of 4% in
Pakistan. The highest frequencies were found in the Kalash (20.4%), Punjabi (7.6%),
Balti (7.1%), Makrani-Negroid (6.1%) and Sindhi (5.8%) samples (Table VI and Figure
V). Individuals carrying the derived allele for the H1* clade were further sub-typed for
two markers that identify clades H1a1-M36 and H1a2-M97. Neither H1a1 nor H1a2
was present in Pakistan.
Haplogroup I*, defined by the M170 A-to-C mutation on the Y chromosome, is
thought to have arisen in Europe, and the European Y-chromosome gene pool contains
a high frequency of this haplogroup. In Pakistan, the frequency of the M170
polymorphism was <0.1%, as it was observed in only one individual, from the Hazara
population.
Clade J*, characterized by the 12f2a deletion, was widely distributed across
Pakistan. The majority of these Y chromosomes belonged to the J2a2* (M67-derived)
haplogroup, a major branch of the J2*-M172 haplogroup. The J2a2* haplogroup was
found in all ethnic groups examined and constituted 10% of the sample (Figure V).
One offshoot of J2a2*, the J2a2a* haplogroup characterized by the derived allele for
the biallelic marker M92, was observed in only one southern population, the Brahui
(8.5%). The other main branch of the J lineage, J1*-M267, was also observed in the
Brahui, as well as in the Baloch, Makrani-Baloch and Sindhi from southern Pakistan.
The Pathan were the only northern group carrying the J1* haplogroup, albeit at very
low frequency (1.0%).
Most non-African Y-chromosomal haplogroups are derived for the M9 marker
and fall in clades K* to T*. The derived allele for M9 is widespread in Pakistan and
accounts for 61% of all Y chromosomes, all of which were resolved into sub-clades
L*, NO*, Q*, R* and T*. Lineages K1-K4, which are a component of the Asian
Y-chromosomal gene pool, were not observed in Pakistan.
Sub-clade L*, defined by the A-to-G M20 SNP, constitutes 11% of the
Pakistani sample, with frequencies ranging from 1.1% to 24.2%. Of the three
well-characterized branches of this haplogroup, the most common in Pakistan is L1,
which carries the derived allele for M27. L1 occurs at an average frequency of 5.0%
and is present in all southern populations, at frequencies ranging from 24.2% in the
Baloch to 1.4% in the Parsi. Among the northern populations this haplogroup was
observed only in the Pathan and Punjabi (Tables VI, VII and Figure V).
The L2*-M317 haplogroup, another offshoot of L*, was observed in only two southern
populations, the Parsi and Makrani-Baloch, at frequencies of 13.3% and 3.7%,
respectively. The remaining branch, L3*, had a more widespread distribution, with the
highest frequencies in the northern Burusho and Balti populations (Table VI and
Figure VI). L3a, a branch of L3* characterized by the marker PK3, appears only in the
Kalash population, at a relatively high frequency (23%).
The NO* clade was observed at a very low frequency in Pakistan. The 12
individuals belonging to the various branches of this clade came from only two
northern (Burusho and Pathan) and two southern (Brahui and Mohanna) populations.
N1* (LLY22g-derived) Y chromosomes were present in one Brahui and one Mohanna
individual. The newly discovered haplogroup O2a1a-PK4 was found only in the
Pathan (4.2%), whereas the East Asian haplogroup O3*, derived for M122, was
observed in the Brahui (<1%), Burusho (3.1%) and Pathan (1%) samples. The
LY1-derived haplogroup O3a3a* was present at low frequency, in the Brahui only.
Two major Y haplogroups, Q*-M242 and R*-M207, branch off clade P*, which is
delineated by numerous SNPs including 92R7, M45 and M74. All P* chromosomes
were resolved into haplogroups Q* and R*. Haplogroup Q* occurs at an average
frequency of 1.8% in Pakistan and was observed in four northern (Burusho, Hazara,
Pathan and Punjabi) and four southern (Baloch, Brahui, Makrani-Baloch and Sindhi)
populations.
Haplogroup R*, characterized by the M207 SNP, has a widespread distribution
in Pakistan. It has two major branches, R1* (M173-derived) and R2 (M124-derived),
which have distinct worldwide geographic distributions. R1*, which is common in
Europe and in West and Central Asia, occurs at an average frequency of 4.8% and is
observed in all the Pakistani populations except the Mohanna (Table VI and Figure VI).
One derivative of M173, R1a1-M17, occurs at an average frequency of 35.1% and is
the most common Y haplogroup in Pakistan. This haplogroup was present in all
populations included in this study (Table VI and Figure VI). The highest frequency of
R1a1* was observed in the Mohanna (71.4%) and the lowest in the Parsi (7.8%).
Other populations with appreciable (>50%) frequencies of R1a1* included the
Kashmiri (58.3%), Punjabi castes (56.7%) and Sindhi (51.4%). One newly discovered
haplogroup on the R1a1* background, R1a1e-PK5, was observed only in the Burusho
population (2.1%).
Haplogroup R2, defined by the M124 derived allele, occurs in many Pakistani
populations at an average frequency of 5.8%. Except for the Mohanna, it is observed
in all southern populations. Its distribution in northern Pakistan is patchy: it is found
only in the Burusho, Kashmiri and Punjabi populations (Figure VI).
Haplogroup K2 (Y Chromosome Consortium, 2002) was recently reassigned
to the new haplogroup T* (Karafet et al., 2008). This haplogroup is characterized by
the derived allele for M70 and was found in only a single Pathan individual.
Table VI: Numbers and frequencies (%) of Y haplogroups B to I in ethnic groups from Pakistan.
Defining markers: B (M60), C (RPS4Y711), C3 (PK2), E (SRY-8299), E1b1a (sY81=M2), E1b1b1a (M78), E1b1b1c (M123), F (M89), G* (M201), G2a (P15), H1 (M52), I (M170). Hgs: number of haplogroups observed in the population. Cells give counts with percentages in parentheses.
Population n     Hgs  B         C         C3        E         E1b1a     E1b1b1a   E1b1b1c   F         G*        G2a       H1        I
North
Balti      14    6    0         0         0         0         0         0         0         0         0         0         1(7.1)    0
Burusho    97    15   0         0         8(8.2)    0         0         0         0         1(1.0)    1(1.0)    0         4(4.1)    0
Hazara     224   9    0         0         134(60)   0         0         0         0         13(5.8)   0         0         0         1(0.5)
Kalash     44    8    0         0         0         0         0         0         0         0         0         8(18.1)   9(20.4)   0
Kashmiri   12    5    0         0         0         0         0         0         0         0         0         0         0         0
Pathan     96    16   0         0         0         0         0         2(2.1)    0         2(2.1)    10(10.4)  1(1.0)    4(4.2)    0
Punjabi    185   14   0         3(1.6)    0         0         0         0         0         7(4.0)    1(0.54)   0         14(7.6)   0
South
Baloch     66    13   0         0         0         1(1.5)    1(1.5)    4(6.1)    1(1.5)    1(1.51)   0         0         0         0
Brahui     117   18   1(0.9)    2(2.0)    0         0         4(3.4)    0         0         0         0         9(8.0)    1(1.0)    0
Makrani-B  27    11   0         0         0         0         1(3.7)    0         1(3.7)    0         0         0         0         0
Makrani-N  33    11   1(3.0)    0         0         1(3.0)    3(9.1)    0         0         0         0         1(3.0)    2(6.1)    0
Mohanna    70    9    0         3(4.3)    0         0         0         0         0         0         1(1.4)    3(4.3)    2(2.9)    0
Parsi      90    11   0         0         0         0         0         0         5(5.6)    0         0         1(1.1)    2(2.2)    0
Sindhi     138   13   0         0         0         0         0         0         3(2.2)    2(1.5)    0         2(1.5)    8(5.8)    0
Total      1213  33   2(0.2)    8(0.7)    142(11.7) 2(0.2)    9(0.7)    6(0.5)    10(0.8)   26(2.1)   13(1.1)   25(2.1)   47(4.0)   1(0.08)
Table VI (continued): Numbers and frequencies (%) of Y haplogroups J to L in ethnic groups from Pakistan.
Defining markers: J (12f2a), J1 (M267), J2 (M172), J2a2 (M67), J2a2a (M92), L (M20), L1 (M27), L2 (M317), L3 (M357), L3a (PK3).
Population n     J         J1        J2        J2a2      J2a2a     L         L1        L2        L3        L3a
North
Balti      14    0         0         0         2(14.3)   0         0         0         0         2(14.3)   0
Burusho    97    0         0         1(1.0)    7(7.2)    0         3(3.1)    0         0         14(14.4)  0
Hazara     224   21(9.4)   0         3(1.4)    1(0.5)    0         0         0         0         0         0
Kalash     44    0         0         0         4(9.1)    0         1(2.3)    0         0         0         10(23.0)
Kashmiri   12    1(8.3)    0         0         1(8.3)    0         0         0         0         0         0
Pathan     96    0         1(1.0)    0         5(5.2)    0         0         5(5.2)    0         7(7.3)    0
Punjabi    185   1(0.54)   0         0         18(9.7)   0         2(1.1)    15(8.2)   0         4(2.2)    0
South
Baloch     66    0         2(3.0)    0         6(9.1)    0         0         16(24.2)  0         3(4.5)    0
Brahui     117   5(4.3)    6(5.1)    0         10(8.5)   10(8.5)   0         7(6.0)    0         2(1.7)    0
Makrani-B  27    0         1(3.7)    0         5(18.5)   0         1(3.7)    2(7.4)    1(3.7)    0         0
Makrani-N  33    0         0         0         6(18.1)   0         0         2(6.1)    0         1(3.0)    0
Mohanna    70    0         0         0         3(4.3)    0         1(1.4)    6(8.6)    0         0         0
Parsi      90    0         0         0         35(38.9)  0         3(3.3)    1(1.4)    12(13.3)  0         0
Sindhi     138   2(1.45)   4(3.0)    0         19(14.0)  0         0         6(4.4)    0         4(3.0)    0
Total      1213  30(2.5)   14(1.2)   4(0.3)    122(10.1) 10(0.8)   11(0.9)   60(5.0)   13(1.1)   37(3.0)   10(0.8)
Table VI (continued): Numbers and frequencies (%) of Y haplogroups N to T in ethnic groups from Pakistan.
Defining markers: N1 (LLY22g), O2a1a (PK4), O3 (M122), O3a3a (LY1), Q (M242), R (M207), R1 (M173), R1a1 (M17), R1a1e (PK5), R2 (M124), T (M70).
Population n     N1        O2a1a     O3        O3a3a     Q         R         R1        R1a1      R1a1e     R2        T
North
Balti      14    0         0         0         0         0         2(14.3)   1(7.1)    6(43.0)   0         0         0
Burusho    97    0         0         3(3.1)    0         2(2.1)    11(11.3)  1(1.0)    25(25.8)  2(2.1)    14(14.3)  0
Hazara     224   0         0         0         0         4(2.0)    0         26(11.6)  21(9.4)   0         0         0
Kalash     44    0         0         0         0         0         3(7.0)    1(2.3)    8(18.1)   0         0         0
Kashmiri   12    0         0         0         0         0         0         2(16.6)   7(58.3)   0         1(8.3)    0
Pathan     96    0         4(4.2)    1(1.0)    0         5(5.2)    1(1.0)    4(4.2)    43(44.8)  0         0         1(1.0)
Punjabi    185   0         0         0         0         1(0.55)   2(1.1)    4(2.1)    105(56.7) 0         8(4.3)    0
South
Baloch     66    0         0         0         0         2(3.1)    0         4(6.1)    19(28.8)  0         6(9.1)    0
Brahui     117   1(0.8)    0         1(0.8)    1(1.0)    1(1.0)    0         3(2.6)    45(38.4)  0         8(7.0)    0
Makrani-B  27    0         0         0         0         1(3.7)    0         1(3.7)    9(33.3)   0         4(15)     0
Makrani-N  33    0         0         0         0         0         0         4(12.1)   10(30.3)  0         2(6.1)    0
Mohanna    70    1(1.43)   0         0         0         0         0         0         50(71.4)  0         0         0
Parsi      90    0         0         0         0         0         1(1.1)    4(4.4)    7(7.8)    0         19(21.1)  0
Sindhi     138   0         0         0         0         6(4.3)    0         3(2.2)    71(51.4)  0         8(6.0)    0
Total      1213  2(0.2)    4(0.3)    5(0.4)    1(0.1)    22(1.8)   20(1.6)   58(4.8)   426(35.1) 2(0.2)    70(5.8)   1(0.1)
Table VII: Y lineages found in the three Punjabi castes examined in this study.
Defining markers: C* (RPS4Y711), F* (M89), G* (M201), H1* (M52), J* (12f2a), J2a2* (M67), L* (M20), L1 (M27), L3* (M357), Q* (M242), R* (M207), R1* (M173), R1a1* (M17), R2 (M124). Hgs: number of haplogroups observed. Cells give counts with percentages in parentheses.
Population n    Hgs  C*       F*       G*       H1*      J*       J2a2*    L*       L1       L3*      Q*       R*       R1*      R1a1*    R2
Gujar      159  13   2(1.3)   6(3.8)   1(0.6)   14(8.8)  1(0.6)   17(10.6) 2(1.3)   15(9.4)  4(2.5)   0        1(0.6)   3(1.3)   86(55)   7(4.4)
Meo        16   4    1(6.2)   0        0        0        0        1(6.3)   0        0        0        0        0        1(6.25)  13(81)   0
Rajput     10   5    0        1(10)    0        0        0        0        0        0        0        1(10)    1(10)    0        6(60)    1(10)
Total      185  14   3(1.6)   7(3.8)   1(0.5)   14(7.6)  1(0.5)   18(9.7)  2(1.1)   15(8.1)  4(2.2)   1(0.5)   2(1.1)   4(2.2)   105(57)  8(4.3)
Figure V: Distribution of major Y lineages (PK2, M52, M67 and M27) frequencies
in Pakistan (frequencies are shown in table VI).
Figure VI: Distribution of major Y lineages (M357, M173, M17 and M124)
frequencies in Pakistan (frequencies are shown in table VI).
PHYLOGENETIC ANALYSES
PRINCIPAL COMPONENT ANALYSIS:
Principal component analysis was carried out to examine population
relationships. This analysis is based upon the frequencies of the thirty-three Y
haplogroups in the Pakistani ethnic groups. The first two principal components, PC1
and PC2, account for 72% of the variation in the data (Figure VII). The PC plot shows
that all Pakistani populations group together, with the exception of the Hazara, who
are relatively distinct from the other Pakistani ethnic groups and cluster in the lower
right quadrant of the graph. Interestingly, the Brahui and Balti, which are
linguistically distinct from the other groups, and the isolated Kalash did not stand out
but grouped with the other ethnic groups from Pakistan.
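The mechanics of a principal component analysis on haplogroup frequencies can be sketched as follows. The sketch centres each haplogroup column, forms the covariance matrix, and projects populations onto its leading eigenvectors. It is a minimal illustration under stated assumptions. It uses only the four basal haplogroup frequencies from Table V rather than all thirty-three, so it will not reproduce Figure VII. The software and scaling used in the original analysis are not stated here, and `pca` is a hypothetical helper that finds eigenvectors by power iteration.

```python
import math

# Frequencies of haplogroups B*, C*, E* and F* for ten populations, copied
# from Table V. Only 4 of the 33 haplogroups are used, so the result is
# illustrative and will not reproduce Figure VII.
POPS = ["Balti", "Burusho", "Hazara", "Kalash", "Pathan",
        "Punjabi", "Baloch", "Brahui", "Mohanna", "Sindhi"]
FREQS = [
    [0.000, 0.000, 0.000, 1.000],
    [0.000, 0.082, 0.000, 0.918],
    [0.000, 0.600, 0.000, 0.402],
    [0.000, 0.000, 0.000, 1.000],
    [0.000, 0.000, 0.021, 0.976],
    [0.000, 0.016, 0.000, 0.984],
    [0.000, 0.000, 0.106, 0.894],
    [0.009, 0.017, 0.034, 0.940],
    [0.000, 0.043, 0.000, 0.957],
    [0.000, 0.000, 0.022, 0.978],
]


def pca(rows, n_components=2, iters=1000):
    """Return (scores, explained_fraction) using covariance PCA.

    Eigenvectors are found by power iteration with deflation, which is
    adequate for a small covariance matrix like this one.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]  # centre columns
    cov = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(p)]
           for i in range(p)]
    total = sum(cov[i][i] for i in range(p))  # total variance = trace
    comps, explained = [], []
    for _ in range(n_components):
        v = [1.0 / math.sqrt(p)] * p
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        comps.append(v)
        explained.append(lam / total)
        # Remove this component so the next power iteration finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)]
               for i in range(p)]
    scores = [[sum(x[j] * c[j] for j in range(p)) for c in comps] for x in X]
    return scores, explained


scores, explained = pca(FREQS)
for name, (pc1, pc2) in zip(POPS, scores):
    print(f"{name:8s} PC1={pc1:+.3f} PC2={pc2:+.3f}")
print("variance explained:", [round(e, 3) for e in explained])
```

Even with only these four columns, the Hazara separate from everyone else along PC1 because of their high C* frequency. This mirrors, but does not reproduce, the pattern described above.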
PHYLOGENETIC ANALYSIS:
Analysis of Molecular Variance (AMOVA) was carried out using the Arlequin
software. The populations were grouped on the basis of ethnicity, geographic origin
and linguistic affiliation. This analysis showed that, when grouped by ethnicity, the
populations were significantly different from each other (p value for Va and FCT:
0.0205 ± 0.0050). As expected, the majority of the variation was explained by
variation within Pakistani populations (Table VIII).
The pairwise FST values between Pakistani ethnic groups based on haplogroup
frequencies also corroborate this result. The matrix of p values, based upon 110
permutations among the Pakistani populations with a significance level of 0.05, also
demonstrated that significant variation occurs among the populations (Tables IX and X).
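The values in Table IX were produced with Arlequin. To illustrate the idea only, the Python sketch below computes a simpler Nei-style F_ST, (H_T − H_S)/H_T, from haplogroup frequencies. It then derives a permutation p value by repeatedly shuffling individuals between the two samples, using 110 permutations as quoted above. This estimator is swapped in for clarity and will not reproduce Arlequin's numbers. The function names are hypothetical, and the Kalash and Hazara counts are taken from Table VI.

```python
import random
from collections import Counter


def freqs(sample):
    """Haplogroup frequencies from a list of haplogroup labels."""
    n = len(sample)
    return {hg: c / n for hg, c in Counter(sample).items()}


def fst(pop_a, pop_b):
    """Nei-style F_ST = (H_T - H_S) / H_T from haplogroup frequencies.

    H_S is the mean within-population gene diversity (1 - sum p^2) and H_T
    the diversity of the averaged frequencies. Illustrative stand-in, not
    Arlequin's AMOVA-based estimator.
    """
    fa, fb = freqs(pop_a), freqs(pop_b)
    h_s = ((1 - sum(p * p for p in fa.values()))
           + (1 - sum(p * p for p in fb.values()))) / 2
    mean = [(fa.get(h, 0.0) + fb.get(h, 0.0)) / 2 for h in set(fa) | set(fb)]
    h_t = 1 - sum(p * p for p in mean)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def permutation_test(pop_a, pop_b, n_perm=110, seed=1):
    """Return (observed F_ST, p value) by shuffling individuals between pops."""
    rng = random.Random(seed)
    observed = fst(pop_a, pop_b)
    pooled = list(pop_a) + list(pop_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop_a)], pooled[len(pop_a):]) >= observed:
            hits += 1
    # (hits + 1) / (n_perm + 1) counts the observed arrangement as one of the
    # permutations; Arlequin's exact convention may differ.
    return observed, (hits + 1) / (n_perm + 1)


def expand(counts):
    """Turn {haplogroup: count} into a list of individual labels."""
    return [hg for hg, c in counts.items() for _ in range(c)]


# Counts for the Kalash and Hazara, taken from Table VI.
kalash = expand({"G2a": 8, "H1": 9, "J2a2": 4, "L": 1, "L3a": 10,
                 "R": 3, "R1": 1, "R1a1": 8})
hazara = expand({"C3": 134, "F": 13, "I": 1, "J": 21, "J2": 3, "J2a2": 1,
                 "Q": 4, "R1": 26, "R1a1": 21})
print(permutation_test(kalash, hazara))
```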
Figure VII: Principal component analysis based on Y haplogroup frequencies
in Pakistani populations.
Balti: Blt, Burusho: Bsk, Hazara: Hzr, Kalash: Kal, Kashmiri: Ksr, Pathan: Pkh,
Gujar: Gjr, Meo: Meo, Rajput: Rpt, Baloch: Ball, Brahui: Bru, Makrani-Baloch:
Mak-B, Makrani-Negroid: Mak-N, Mohanna: Mhn, Parsi: Prs, Sindhi: Sdh.
Table VIII: Percentage of variation obtained by AMOVA at three levels of population hierarchy in ethnic groups from Pakistan.
Grouping     Groups  %AG     %APWG   %WP     Va       Vb       FCT      FSC      FST      p (Va, FCT)
None         1       -       15.22   84.78   0.0649   0.3617   -        -        0.1522   -
Ethnicity    13      14.45   0.90    84.65   0.0618   0.0038   0.1445   0.0105   0.1535   0.0205 ± 0.0050
Geographic   2       1.12    14.52   84.36   0.0048   0.0623   0.0112   0.1469   0.1564   0.4076 ± 0.0167
Linguistic   4       -8.99   19.34   89.65   -0.0363  0.0780   -0.0899  0.1774   0.1035   0.9746 ± 0.0047
%AG: percentage of variation among groups; %APWG: among populations within groups; %WP: within populations; Va, Vb: variance components; FCT, FSC, FST: fixation indices; p: p value for Va and FCT (1023 permutations).
Table IX: Population pairwise FSTs between Pakistani ethnic groups computed from Y haplogroup frequencies.
FST p values (based upon 110 permutations) are given above the diagonal, with * indicating significant pairwise
differences.
Population BAL BRU MAKB MAKN MHN PRS SDH BLT BSK HZR KAL KSR PKH MEO GJR RPT
Baloch (BAL) - 0.0000* 0.3153 0.0630 0.0000* 0.0000* 0.0000* 0.1081 0.0000* 0.0000* 0.0000* 0.0360* 0.0000* 0.0000* 0.0000* 0.0360*
Brahui (BRU) 0.0275 - 0.3063 0.1982 0.0000* 0.0000* 0.0180* 0.3243 0.0000* 0.0000* 0.0000* 0.2882 0.0090* 0.0000* 0.0000* 0.1801
0.1982
Makrani Baloch (MAKB) 0.0053 0.0016 - 0.8018 0.0000* 0.0090 0.0720 0.3423 0.0991 0.0000* 0.0000* 0.3063 0.05405* 0.0000* 0.0180*
0.0810
Makrani Negroid (MAKN) 0.0146 0.0088 -0.0146 - 0.0000* 0.0000* 0.0270* 0.5495 0.0090* 0.0000* 0.0000* 0.3063 0.0180* 0.0090 0.0000*
0.2973
Mohanna (MHN) 0.1405 0.0774 0.1280 0.1392 - 0.0000* 0.0000* 0.0180* 0.0000* 0.0000* 0.0000* 0.1711 0.0000* 0.5225 0.0000*
0.0000*
Parsi (PRS) 0.1148 0.1268 0.0539 0.0728 0.3099 - 0.0000* 0.0000* 0.0000* 0.0000* 0.0000* 0.0000* 0.0000* 0.0000* 0.0000*
0.5855
Sindhi (SDH) 0.0549 0.0172 0.0143 0.0284 0.0376 0.1647 - 0.4234 0.0000* 0.0000* 0.0000* 0.6486 0.0270* 0.0720 0.3783
0.5585
Balti (BLT) 0.0339 0.0058 0.0019 -0.0087 0.0899 0.1261 -0.0026 - 0.4324 0.0000* 0.0000* 0.5225 0.4774 0.0810 0.1891
0.0720
Burusho (BSK) 0.0458 0.0389 0.0188 0.0273 0.1585 0.0991 0.0629 -0.0000 - 0.0000* 0.0000* 0.0270* 0.0000* 0.0000* 0.0000*
0.0000*
Hazara (HZR) 0.2653 0.2603 0.2721 0.2580 0.3997 0.3058 0.3072 0.2882 0.2109 - 0.0000* 0.0000* 0.0000* 0.0000* 0.0000*
0.0090
Kalash (KAL) 0.1002 0.0797 0.0799 0.0586 0.2338 0.1374 0.1224 0.0650 0.0759 0.2818 - 0.0000* 0.0000* 0.0000* 0.0000*
0.8918
Kashmiri (KSR) 0.0535 0.0052 0.0117 0.0149 0.0224 0.1798 -0.0144 -0.0124 0.0591 0.3150 0.1299 - 0.3513 0.3243 0.4864
0.3693
Pathan (PKH) 0.0418 0.0193 0.0264 0.0272 0.0580 0.1721 0.0129 -0.0075 0.0467 0.2812 0.1024 0.0023 - 0.0000* 0.0090
0.3693
Meo (MEO) 0.1653 0.0943 0.1408 0.1459 -0.0113 0.3160 0.0470 0.1031 0.1675 0.4194 0.2485 0.0112 0.0720 - 0.0630
0.4864
Gujjar (GJR) 0.0582 0.0279 0.0329 0.0416 0.0255 0.1941 0.0002 0.0062 0.0772 0.3193 0.1354 -0.0074 0.0164 0.0415 -
-
Rajput (RPT) 0.0590 0.0115 0.0216 0.0464 0.0096 0.2071 -0.0135 -0.0106 0.0429 0.3293 0.1292 -0.0389 -0.0047 0.0216 -0.0099
Table X: Matrix of significant FST p values (significance level = 0.0500) based upon 110 permutations among the
ethnic groups of Pakistan.
Population BAL BRU MAKB MAKN MHN PRS SDH BLT BSK HZR KAL KSR PKH MEO GJR RPT
Baloch (BAL) -------- + - - + + + - + + + + + + + +

-
Brahui (BRU) + --------- - - + + + - + + + - + + +
Makrani Baloch (MAKB) - - ----------- - + + - - - + + - - + + -
Makrani Negroid (MAKN) - - - ---------- + + + - + + + - + + + -
Mohanna (MHN) + + + + ---------- + + + + + + - + - + -
Parsi (PRS) + + + + + --------- + + + + + + + + + +
Sindhi (SDH) + + - + + + ---------- - + + + - + - - -
Balti (BLT) - - - - + + - ---------- - + + - - - - -
Burusho (BSK) + + - + + + + - --------- + + + + + + -

Hazara (HZR) + + + + + + + + + --------- + + + + + +
Kalash (KAL) + + + + + + + + + + --------- + + + + +
Kashmiri (KSR) + - - - - + - - + + + ---------- - - - -
Pathan (PKH) + + - + + + + - + + + - --------- + + -
Meo (MEO) + + + + - + - - + + + - + --------- - -
Gujjar (GJR) + + + + + + - - + + + - + - --------- -

----------
Rajput (RPT) + - - - - + - - - + + - - - -
MEDIAN-JOINING NETWORK:
Genetic variation among the Pakistani populations was further investigated
by constructing median-joining networks (Bandelt et al., 1995). Here we present the
network for the L*-M20 lineage (Figure VIII). The L lineage is thought to have arisen
in the Indus valley region during the Indus valley civilization. The network revealed
four clusters, representing four haplogroups. Samples encircled in red represent the
L1-M27 haplogroup, samples carrying the L2*-M317 haplogroup are encircled in
green and L3a-PK3 samples are encircled in yellow. The remaining samples carry the
L3*-M357 haplogroup. The L lineage network reveals considerable variation among
the Pakistani populations; at the same time, it shows a high degree of
population-specific sub-structure. The network shows isolated Parsi-specific clusters
at the upper right end containing 15 of the 16 Parsis. The Kalash fall into two clusters
and the Burusho form a cluster in the middle of the network. Haplotype sharing is
another striking feature of this network. Some haplotypes are shared within specific
populations, for example the Burusho, Kalash and Parsi. In addition, the four Baloch
individuals shared their haplotype with Sindhi and Makrani-Baloch individuals from
nearby southern populations. Similarly, one haplotype was shared between a Brahui
and a Makrani-Negroid individual.
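A median-joining network extends the minimum spanning networks of the observed haplotypes by adding inferred median vectors. That step is handled by network software and is not reproduced here. As a much simpler stand-in, the sketch below builds one minimum spanning tree with Prim's algorithm. It measures distance between Y-STR haplotypes as the total repeat-count difference, a single-step mutation model. The haplotypes and function names are hypothetical.

```python
def step_distance(h1, h2):
    """Number of single-step repeat changes separating two STR haplotypes."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def minimum_spanning_tree(haplotypes):
    """Prim's algorithm over step distances; returns [(i, j, steps), ...]."""
    n = len(haplotypes)
    if n == 0:
        return []
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = step_distance(haplotypes[i], haplotypes[j])
                if best is None or d < best[2]:
                    best = (i, j, d)
        edges.append(best)
        in_tree.append(best[1])
    return edges


# Hypothetical haplotypes at four Y-STR loci (repeat counts).
haps = {
    "hapA": (15, 13, 29, 24),
    "hapB": (15, 13, 30, 24),
    "hapC": (16, 13, 30, 24),
    "hapD": (15, 12, 29, 23),
}
names = list(haps)
for i, j, d in minimum_spanning_tree([haps[k] for k in names]):
    print(f"{names[i]} -- {names[j]}: {d} step(s)")
```

Ties between equally short links are resolved arbitrarily here. A minimum spanning network would keep all equally short alternatives instead of choosing one.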
Figure VIII: Median-joining network of Lineage L individuals based on Y-STR
haplotypes.
SECTION 2
COMPARISONS BETWEEN THE PAKISTANI AND GREEK
POPULATIONS:
The current study also included three ethnic groups from northern Pakistan that
claim Greek ancestry: the Burusho, Kalash and Pathan. These populations were
compared with extant Greek samples from Europe that were genotyped for the same Y
markers. The Y-chromosomal haplogroups and their frequencies in the Greeks,
Burusho, Kalash, Pathan and the rest of the Pakistani populations are shown in
Figure IX.
HAPLOGROUP FREQUENCIES IN PAKISTANI AND GREEK
POPULATIONS:
The combination of biallelic markers identified 13 Y-chromosomal
haplogroups in the Greeks, 16 in the Pathan and 15 in the Burusho populations.
Only eight Y haplogroups were found in the Kalash population. More than 75% of
these samples were represented by haplogroups which are frequent in West Asia,
Europe and the Mediterranean region.
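Figure IX reports haplogroup diversity for each population, but the estimator is not named in this section. A common choice is Nei's unbiased gene diversity, H = n/(n−1)·(1 − Σpᵢ²), where pᵢ is the frequency of haplogroup i among n chromosomes. The sketch below assumes that estimator and applies it to the eight Kalash haplogroups counted in Table VI. `haplogroup_diversity` is a hypothetical helper name.

```python
def haplogroup_diversity(counts):
    """Nei's unbiased diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = sum(counts.values())
    if n < 2:
        return 0.0
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_p2)


# Kalash haplogroup counts (n = 44, eight haplogroups) from Table VI.
kalash = {"G2a": 8, "H1": 9, "J2a2": 4, "L": 1, "L3a": 10,
          "R": 3, "R1": 1, "R1a1": 8}
print(round(haplogroup_diversity(kalash), 4))
```

For these counts the estimate is about 0.846. Whether it matches the value shown in Figure IX cannot be checked from the text.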
A comparison of the three Pakistani ethnic groups with the Greek population
shows that certain haplogroups are shared between these populations. These
include clades E*, F*, I*, J*, R1* and T*. The majority of the Pakistani and Greek Y
chromosomes carry the derived allele for the M207 marker, which encompasses
branches R1* and R1a1* of the Y chromosome phylogenetic tree (Figure IX). R1a1*
was the most common haplogroup found in Pakistan (35.9%) and Greece (15.6%).
Compared with the Greeks, the frequency of haplogroup R1a1* was higher in the
Pathan (44.8%), Burusho (25.8%) and Kalash (18.2%) samples. Clade R1*,
represented by the derived allele for SNP M173, was observed in 11.7% of the Greek
and 5.32% of the Pakistani samples. The Greek population exhibited a higher
frequency of this clade than the Burusho (1.03%), Kalash (2.27%) and Pathan (4.2%).
Haplogroup J* was also found at high frequency in the Greek (17%) and
Pakistani (14.8%) samples. The overwhelming majority of
Greek J* chromosomes belonged to haplogroup J2* which was present at a
comparable frequency in Pakistan. This haplogroup J2* (including all its derivatives)
was present at a frequency of 15.6% in the Greek, 8.2% in the Burusho, 9.09% in the
Kalash and 5.2% in Pathan. The majority of J2* Y chromosomes in Pakistan
belonged to haplogroup J2a2*, being derived for the marker M67. The Greek
samples could not be typed for this SNP due to lack of DNA. The J1* haplogroup
characterized by the derived allele for M267 was absent in the Burusho and Kalash
populations and was found at low (1%) frequency in the Greek and Pathan.
Clade E* chromosomes were more frequent in the Greek (21%) than in the
Pakistani (2.2%) samples. The majority of haplogroup E* chromosomes belonged to
clade E1b1b1* (M35 derived), and all Greek and Pakistani E1b1b1* chromosomes
were resolved into the branches E1b1b1a* (M78 derived) and E1b1b1c* (M123
derived). Among the three Pakistani populations claiming Greek descent, M78
derived Y chromosomes were observed only in the Pathan (2%). This branch
constituted 16.9% of the Greek samples. Clade E1b1b1c* was present at a frequency
of only 2.6% in the Greek and was absent in the Burusho, Kalash and Pathan
populations; its frequency in the remaining Pakistani populations was 1%.
All G*-M201 derived Greek Y chromosomes (9% of total) belonged to the
G2a* haplogroup characterized by the T allele for SNP P15 (Hammer et al., 2000).
This haplogroup was observed in 18.18% of Kalash and 1% of the Pathan samples
and was absent in the Burusho.
Two branches that frequently characterize Y chromosomes found outside Africa are
H* and I*, which distinguish eastern and western populations, respectively (Rootsi et
al., 2004; Underhill et al., 2001). One Greek sample belonged to haplogroup H2*,
which is characterized by the Apt G-to-A transition (Pandya et al., 1998). These Y
chromosomes are not found in Pakistan but have been observed in neighboring India;
this is the first time they have been observed in Greece.
Figure IX: A rooted maximum-parsimony tree of Y lineages found in the Greek, Burusho, Kalash, Pathan and Pakistani
populations. The lineages were defined by binary markers whose designations and population frequencies (percentages) are
given below each branch. Branch lengths are arbitrary and the YCC lineage names (Karafet et al., 2008) are shown below the
frequencies. Haplogroup and haplotype diversities are shown for each population.
Haplogroup I*, characterized by the derived allele for M170, is mainly restricted
to Europe and was observed in 19.5% of the Greek sample. This haplogroup was
not observed in the Burusho, Kalash or Pathan, and its frequency in Pakistan was
<0.2%.
Only a small proportion of Y chromosomes remained unresolved within clade F*:
2% of the Pathan and 1% of the Greek and Burusho samples. It is possible that
distinct haplogroups, as yet unknown, are being classified here into the same
paraphyletic haplogroup.
STATISTICAL AND PHYLOGENETIC ANALYSES
PRINCIPAL COMPONENT ANALYSES:
To examine population relationships, principal component analysis based
upon Y haplogroup frequencies in the Greek and Pakistani ethnic groups was carried
out (Figure X). The first two principal components, PC1 and PC2, account for 79% of
the variation in the haplogroup frequency data and separate the populations
according to their geographic locations. The plot shows the Pathan and Burusho
populations clustering with the remaining Pakistani populations in the upper right
quadrant of the graph.
The Kalash and Greek form two separate and distinct clusters. To ensure that the
Greek individuals included in this study were representative of the Greek population
studied earlier, results of comparable biallelic data (Francalacci et al., 2003) were
incorporated in the principal component analysis (Figure XI). The Greek population
included in this study clustered with the Greek populations studied earlier but the
distinct Kalash population cluster was not apparent.
Figure X: A plot of the first two principal coordinates based upon the analysis
of Y haplogroup frequencies in Pakistani and Greek populations.
Figure XI: A plot of the first two principal coordinates based upon the analysis
of Y haplogroup frequencies in Pakistani and Greek samples (1=this study; 2
= Francalacci et al., 2003) using comparable biallelic markers.
GENETIC DISTANCES AND PHYLOGENETIC ANALYSIS:
The genetic distances between the populations were calculated using
measures that are more sensitive to recent events (Table XI). The Pakistani-Greek
pairwise FST values based on the variation of STRs within haplogroups (Qamar et
al., 2002) ranged from 0.131 to 0.213, with the lowest value between the Pathan and
the Greeks. Pairwise genetic distances (the number of steps between a haplotype in
one population and the closest haplotype in the second population, averaged over all
comparisons; Bandelt et al., 1999) ranged from 4.3 to 8.1, with the lowest value again
between the Pathan and the Greeks.
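The closest-haplotype distance described above can be sketched as follows. The sketch assumes a single-step mutation model, in which one repeat change counts as one step. It reads "averaged over all comparisons" as pooling both directions (population A to B and B to A). The weighting applied in Table XI is not reproduced. The haplotypes and function names are hypothetical.

```python
def step_distance(h1, h2):
    """Single-step-mutation distance between two Y-STR haplotypes."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def closest_haplotype_distance(pop_a, pop_b):
    """Mean, over every haplotype in either population, of the number of
    steps to the closest haplotype in the other population."""
    minima = [min(step_distance(h, g) for g in pop_b) for h in pop_a]
    minima += [min(step_distance(h, g) for g in pop_a) for h in pop_b]
    return sum(minima) / len(minima)


# Hypothetical three-locus haplotypes.
pop_x = [(15, 13, 29)]
pop_y = [(15, 13, 30), (17, 13, 29)]
print(closest_haplotype_distance(pop_x, pop_y))
```

Pooling both directions makes the measure symmetric. A one-directional average would generally give different values for A to B and B to A.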
Phylogenetic analysis using the matrix of genetic distances between
populations with tree validation carried out by bootstrap re-sampling (10,000
replicates) also demonstrated that of the three Pakistani populations, the Pathans
were closest to the Greek (Figure XII).
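A distance-based tree of this kind can be sketched with the standard neighbor-joining algorithm. The sketch below applies a textbook implementation to the below-diagonal distances of Table XI, mirrored into a full matrix. Which distance matrix underlies Figure XII is an assumption, bootstrap re-sampling is not included, and the printed branch lengths are not claimed to match the figure. `neighbor_joining` is a hypothetical helper name.

```python
def neighbor_joining(names, dist):
    """Standard neighbor joining on a symmetric distance matrix.

    Returns (joins, newick): the pairs of nodes joined, in order, and an
    unrooted tree in Newick format. Needs at least three taxa.
    """
    nodes = list(names)
    d = [list(row) for row in dist]
    joins = []
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the joined pair to their new parent node.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joins.append((nodes[i], nodes[j]))
        new_label = f"({nodes[i]}:{li:.3f},{nodes[j]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (i, j)]
        to_new = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[u][v] for v in keep] + [to_new[x]] for x, u in enumerate(keep)]
        d.append(to_new + [0.0])
        nodes = [nodes[k] for k in keep] + [new_label]
    a, b, c = nodes
    # The three remaining nodes meet at one internal node.
    la = (d[0][1] + d[0][2] - d[1][2]) / 2
    lb = d[0][1] - la
    lc = d[0][2] - la
    return joins, f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


NAMES = ["Greek", "Burusho", "Kalash", "Pathan"]
# Below-diagonal genetic distances from Table XI, mirrored to a full matrix.
DIST = [
    [0.000, 5.659, 8.066, 4.277],
    [5.659, 0.000, 3.882, 2.451],
    [8.066, 3.882, 0.000, 3.254],
    [4.277, 2.451, 3.254, 0.000],
]
joins, newick = neighbor_joining(NAMES, DIST)
print(joins)
print(newick)
```

With four populations, neighbor joining resolves a single internal split. For these distances, that split pairs the Greeks with the Pathan (and the Burusho with the Kalash), consistent with the text.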
Together, these results suggest that there might have been a low degree of
recent Pathan-Greek admixture. To investigate this possibility further, individual
lineages were examined with the NETWORK software using Y-STR data.
Table XI: Weighted population pairwise genetic distances (below diagonal)
and FST values (above diagonal) based on STR variation within haplogroups.
Greek Burusho Kalash Pathan
Greek 0.000 0.188 0.213 0.131
Burusho 5.659 0.000 0.214 0.196
Kalash 8.066 3.882 0.000 0.219
Pathan 4.277 2.451 3.254 0.000
Figure XII: Neighbor-joining tree showing the relationship between the Greek
and three Pakistani ethnic groups. The tree is based on genetic distances.
Bootstrap values from 10,000 replicates are shown.
MEDIAN-JOINING NETWORK:
A median-joining network of clade E1b1b1a* Y chromosomes was
constructed in order to examine the genetic relationship between the Greek and
Pathan samples. The trinucleotide repeat DYS425 showed a duplication, with alleles of
10 and 13 repeat units, in the clade E derived Y chromosomes, so this locus was
excluded from the network. The most striking feature of this network was the sharing
of haplotypes between the Pathan and Greek samples (Figure XIII). One Pathan
individual shared the same Y-STR haplotype with three Greek individuals, and the
other Pathan sample was separated from this cluster by a single mutation at the
DYS436 locus. This indicates a very close relationship between the Pathan and
Greek E lineages.
Figure XIII: Median-joining network of clade E* lineages in Pakistan (open
circles) and Greece (hatched circles). Circles represent haplotypes and have
an area proportional to frequency. The Pathan individuals are shown in black.
CONTOUR MAPPING:
The worldwide distribution and frequency of the haplotype shared between
the Greek and Pathan clade E1b1b1a* individuals was checked in the Y-STR
Haplotype Reference Database (YHRD; Roewer et al., 2001). Worldwide data for
the full set of 16 Y-STRs (DYS19, DYS388, DYS389I, DYS389II, DYS390,
DYS391, DYS392, DYS393, DYS425, DYS426, DYS434, DYS435, DYS436,
DYS437, DYS438 and DYS439) were not available in this database. However, part of
this haplotype, based on a subset of nine Y-STRs (DYS19=15; DYS389I=13; DYS389II=29;
DYS390=24; DYS391=10; DYS392=11; DYS393=12; DYS438=9; DYS439=12), was found in
53 individuals in a worldwide population sample of 7,897 haplotypes. This haplotype was
highly specific for the Balkans. The contour map of this haplotype (Figure XIV) shows a major
concentration in the Balkans, around Macedonia and Greece, with a low, scattered frequency in
other European countries and a comparably low frequency in Tunisia, West Africa
and the Pathan. This strongly suggests a European, possibly Greek,
origin for these Pathan Y chromosomes.
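The database search described above amounts to counting the records that carry every allele of the nine-locus haplotype. The sketch below assumes the records have already been exported as Python dictionaries. It does not use any YHRD interface, and `count_matches`, `match_frequency` and the record format are hypothetical. Using the counts reported above, the haplotype frequency is 53/7,897, or about 0.67%.

```python
# Nine-locus haplotype from the text
QUERY = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
         "DYS391": 10, "DYS392": 11, "DYS393": 12, "DYS438": 9, "DYS439": 12}


def count_matches(query, records):
    """Count records carrying every allele of `query`; a record missing a locus does not match."""
    return sum(all(rec.get(locus) == allele for locus, allele in query.items())
               for rec in records)


def match_frequency(query, records):
    """Fraction of records that match `query` exactly at all queried loci."""
    return count_matches(query, records) / len(records)


print(f"{53 / 7897:.4f}")  # reported result: 53 matches among 7,897 haplotypes -> 0.0067
```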
Figure XIV: Contour map showing the 9 Y-STR haplotype frequency
distribution in Eurasia and northern Africa. This haplotype was shared
between three Greeks and a Pathan individual belonging to clade E1b1b1a*.
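The text does not state how the contour surface in Figure XIV was interpolated. Inverse-distance weighting (IDW) is one common choice, and the sketch below assumes it. The frequency at an unsampled grid point is estimated as a weighted mean of the sampled population frequencies, with weights 1/d^p. The sample coordinates and values are hypothetical. For simplicity, distances are measured in planar degrees rather than on the sphere.

```python
import math


def idw(samples, lat, lon, power=2.0):
    """Inverse-distance-weighted frequency estimate at (lat, lon).

    samples: iterable of (lat, lon, frequency) for sampled populations.
    Distances are planar in degrees, a simplification for small maps.
    """
    num = den = 0.0
    for s_lat, s_lon, freq in samples:
        dist = math.hypot(s_lat - lat, s_lon - lon)
        if dist == 0.0:
            return freq  # grid point coincides with a sampled location
        weight = 1.0 / dist ** power
        num += weight * freq
        den += weight
    return num / den


# Hypothetical sampled populations: (latitude, longitude, haplotype frequency)
SAMPLES = [(40.0, 22.0, 0.05), (34.0, 71.0, 0.01), (36.8, 10.2, 0.01)]
# Coarse grid of estimates that a plotting library could then contour
grid = [[idw(SAMPLES, lat, lon) for lon in range(0, 80, 10)] for lat in range(20, 60, 10)]
```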
DISCUSSION
-6-
Our DNA is inherited from our ancestors, so genetic analysis can provide
information about our history. The Y chromosome is particularly useful in
this respect because most of it passes from father to son unchanged,
except for the gradual accumulation of mutations, which appear as DNA
polymorphisms. The present study illustrates the power of a
genealogical approach to Y-chromosome analysis, based on the hierarchical use of
specific markers, in Pakistani populations.
Pakistan lies on the postulated southern coastal route out of Africa, and the
earliest evidence suggests that this region was colonized about 60,000-70,000 years ago.
Pakistan was the site of several ancient cultures, such as Mehrgarh in the province of
Baluchistan, one of the world's earliest known towns (Jarrige, 1991); evidence from
this site indicates that modern humans had settled in the region by the Neolithic
period. Other early civilizations of the region include the Indus Valley civilization at
Harappa and Mohenjo-Daro. Repeated invasions over the millennia have also made the
Indo-Pak subcontinent home to a great variety of population groups, and it is today
one of the most genetically diverse areas in the world.
Present-day Pakistan is bordered by Iran and Afghanistan to the west, India
to the east and China to the north. The Indian Ocean lies along its entire
southern coastline, and the Himalaya and Hindu Kush mountain ranges form a
formidable barrier in the north and northwest.
The diversity of the Y chromosome has been used extensively to study
genetic variation in humans. Human Y chromosomes are classified into distinct
haplogroups, defined by unique event (biallelic) polymorphisms, and into lineages
(haplotypes) within them, defined by Y-STRs. Each haplogroup represents a lineage
descended from the single male ancestor in whom its defining mutation arose at some
time and place in the past. The discovery of new paragroups, together with previously
known lineages, has made it possible to carry out detailed population genetic analyses
based on haplogroup and haplotype frequencies. The Y chromosome is generally assumed
to be free of selection, so that the spread of each haplogroup reflects male migration.
However, haplogroup frequencies in an area may also be influenced by demographic
processes such as founder effects, gene flow and genetic drift.
In the current study, we typed 93 biallelic markers in 1,213 male subjects from 16
Pakistani ethnic groups, and in a Greek population sample, using a variety of PCR
techniques. This extensive analysis of Y diversity allowed us to investigate:
1. The genetic diversity within Pakistani ethnic groups from the male
perspective.
2. The relationship of three Pakistani populations (the Burusho, the Kalash
and the Pathan) with the Greek population. These Pakistani
populations claim descent from the Greek soldiers who were left behind
in this region by Alexander the Great.
3. The genetic differences between Pakistani males and world populations.
4. The origins of Pakistani ethnic groups.
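Within-group Y diversity of the kind listed in point 1 is often summarised with Nei's unbiased haplogroup diversity, h = n/(n - 1) (1 - Σ p_i²). Here p_i is the frequency of haplogroup i among n chromosomes. The sketch below assumes this statistic, since the text does not say which diversity measure was computed. The counts are hypothetical.

```python
def haplogroup_diversity(counts):
    """Nei's unbiased diversity from the haplogroup counts of one population."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two chromosomes are needed")
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts))


# Hypothetical counts of four haplogroups in one ethnic group
print(round(haplogroup_diversity([30, 10, 5, 5]), 3))  # 0.592
```

A value near 0 means that almost all chromosomes share one haplogroup. Values approach 1 as chromosomes spread evenly over many haplogroups.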
PART 1
COMPARISON WITHIN PAKISTAN:
According to their geographic distribution, Pakistani populations were
divided into two categories: a northern group, which also included the Punjabi
populations, and a southern group. The northern populations screened were the
Balti, Burusho, Hazara, Kalash, Kashmiri and Pathan, together with the Punjabi castes
(Gujar, Meo and Rajput). The southern populations were the Baloch, Brahui,
Makrani-Baloch, Makrani-Negroid, Mohanna, Parsi and Sindhi. Together, the
93 biallelic markers identified 33 stable Y-chromosomal haplogroups
in the Pakistani populations (Table VI).
Haplogroups H1*-M52, J2a2*-M67, L1-M27, R1a1*-M17 and R2-M124, which are
frequent in South Asia, Europe and the Mediterranean region, together make up 60%
of Pakistani Y chromosomes. The southern population group was also more
genetically diverse than the northern group. Forty-five percent of the southern
populations carry these 33 Y haplogroups, whereas they are found in 39% and 15% of
the northern and Punjabi populations, respectively. In this study, we also screened the
1,213 Pakistani individuals for five novel Y-SNPs, PK1-PK5 (Mohyuddin et al., 2006).
Three of these SNPs identify population-specific haplogroups within Pakistan:
L3a-PK3 was found only in the Kalash, O2a1a-PK4 was restricted to the Pathan and
R1a1e*-PK5 was confined to the Burusho.
Principal component (PC) analysis based on Y haplogroup frequencies showed
that all the ethnic groups from Pakistan cluster together except the
Hazara (Figure VII). Although the Pakistani sample includes geographically,
culturally and linguistically isolated ethnic groups, such as the Kalash, the Burusho and
the Dravidian-speaking Brahui, these groups do not stand out in the overall
comparison.
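A principal component analysis of haplogroup frequencies, of the kind shown in Figure VII, can be sketched with NumPy's singular value decomposition. This minimal version assumes a population-by-haplogroup frequency matrix centred on its column means, which corresponds to a covariance-based PCA. The thesis does not specify the scaling it used. The four-population matrix is hypothetical, and its last row stands in for an outlying group such as the Hazara.

```python
import numpy as np


def pca_scores(freqs, k=2):
    """PCA of a population x haplogroup frequency matrix.

    Returns the first k component scores for each population and the
    proportion of total variance explained by each of those components.
    """
    x = np.asarray(freqs, dtype=float)
    xc = x - x.mean(axis=0)  # centre each haplogroup column
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    scores = u[:, :k] * s[:k]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:k]


# Hypothetical frequencies of three haplogroups in four populations
FREQS = [[0.50, 0.30, 0.00],
         [0.45, 0.35, 0.02],
         [0.48, 0.30, 0.01],
         [0.10, 0.05, 0.60]]
scores, explained = pca_scores(FREQS)
```

The signs of principal components are arbitrary, so only the relative positions of the populations on each axis are meaningful.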
Haplogroup C* and its offshoot separate the northern and southern regions of
Pakistan. Haplogroup C*-RPS4Y was found in only two southern populations, the
Mohanna (4.3%) and the Brahui (2%); interestingly, the Punjabis from the northern
part also carry this haplogroup (1.6%) (Table VI). In contrast, C3-PK2, one of the
newly identified offshoots of haplogroup C*-RPS4Y, was found in only two northern
ethnic groups (Table VI), with the highest frequency among the Hazara (60%)
followed by the Burusho (8.2%). Haplogroup C*-RPS4Y is fairly common in Central
Asia and Mongolia, and it points to the Mongol origins of the Hazara population,
which are supported historically (Bellew, 1979) and genetically (Qamar et al., 2002;
Zerjal et al., 2003). The origin of the Burusho, however, is not well documented.
Some claim that they are the descendants of Greek soldiers, while others claim that
they are descendants of Dards from Central Asia (Biddulph, 1977). The analyses of
Francalacci et al. (2003) and Rootsi et al. (2004) show that haplogroup C* is not
present in Greece. On the other hand, an earlier study showed that the populations
of Tajikistan clustered with the Hunza Burusho (Wells et al., 2001). Furthermore,
studies with autosomal genetic markers (Ayub et al., 2003; Mansoor et al., 2004) and
Y-chromosome markers (Firasat et al., 2007) suggest that the Burusho are genetically
close to their geographic neighbors. The high frequencies of haplogroup C*
chromosomes in the Hazara, the Burusho and Central Asia suggest that haplogroup
C* arose in Central Asia before the separation of these two Pakistani populations
(Mohyuddin et al., 2006).
The major haplogroups of clade E*, E1b1a*-sY81 and its offshoots, were also
detected at higher frequency in the southern group of Pakistan than in the northern
and Punjabi groups. Haplogroup E*-SRY-8299 has been reported to have a North
African origin and was not found in the northern Pakistani ethnic groups or the
Punjabi group (Qamar et al., 1999); however, this haplogroup is found at low
frequency in the southern group of Pakistan (0.2%). Haplogroup E1b1a*-sY81 (M2)
is sub-Saharan in origin and is found in the Baloch, Brahui, Makrani-Baloch and
Makrani-Negroid (1.5%, 3.4%, 3.7% and 9.1%, respectively) populations of the south
(Table VI). The highest frequency of haplogroup E1b1a*-sY81 is found in the
Makrani-Negroid population (9.1%), who are reported to have a recent African
origin. This high frequency could represent the genetic legacy of the African slaves
who were brought to the Indo-Pak subcontinent by Arab and European invaders.
The other subclade of haplogroup E is E1b1b1*-M35, which originated in East
Africa (Semino et al., 2004). The remaining E1b1b1* Pakistani Y chromosomes were
resolved into two branches, E1b1b1a*-M78 and E1b1b1c*-M123. Haplogroup
E1b1b1a*-M78 was present only in the Pathan (2.1%) from the northern region and
the Baloch (6.1%) from the southern region of Pakistan (Table VI). All the
E1b1b1*-M35 chromosomes from southern Pakistan further resolved into haplogroup
E1b1b1c*-M123. Y chromosomes of haplogroups E1b1b1a*-M78 and E1b1b1c*-M123
are also found in Iran (Regueiro et al., 2006), Turkey (Cinnioglu et al., 2004) and
Greece (Firasat et al., 2007). It is also possible that clade E expanded with the spread
of agriculture (Hammer et al., 1998; Semino et al., 2000).
Haplogroup G*-M201 is present at low frequency in Pakistani ethnic groups,
with the highest frequency observed in the Pathan (10.4%). Towards the south, the
frequency of G*-M201 decreases dramatically, and only 1.4% of the Mohanna carry
this haplogroup (Table VI). Haplogroup G*-M201 occurs at ~30% in Georgia
(Semino et al., 2000) and the north Caucasus (Nasidze et al., 2003), 10.9% in Turkey
(Cinnioglu et al., 2004), 2.2% in Iraq (Al-Zahery et al., 2003) and 1.33% in Iran
(Regueiro et al., 2006). It is also found in southeast Europe and the Mediterranean
region (Semino et al., 2000). In contrast to G*-M201, haplogroup G2a*-P15 is more
widespread in the southern group of Pakistan: except for the Baloch and the
Makrani-Baloch, it is found in all the ethnic groups of southern Pakistan, whereas in
northern Pakistan only the Kalash and the Pathan carry it. Haplogroup G2a*-P15
occurs at 9% in Turkey (Cinnioglu et al., 2004), 5% in Italy and Greece (Di Giacomo
et al., 2003) and 7.33% in Iran, and throughout the Middle East, with a maximum of
19% in the Druze (Hammer et al., 2000).
Haplogroup H1*-M52 was observed in almost all ethnic groups of Pakistan.
The highest frequency of H1*-M52 Y chromosomes was found in the Kalash (20.4%),
followed by the Gujar (7.6%), Balti (7.1%), Makrani-Negroid (6.1%) and Sindhi (5.8%),
among others (Tables VI and VII). Many studies have shown that clade H originated
within the Indo-Pak subcontinent (Gayden et al., 2007; Kivisild et al., 2003; Pandya et
al., 1998; Sengupta et al., 2006). The frequency of this indigenous haplogroup is
higher in southern India (Ramana et al., 2001; Wells et al., 2001) than in the
northwestern Punjab (Kivisild et al., 2003). Outside India and Pakistan, this
haplogroup has been found in the Newar (6.1%) and Kathmandu (11.7%) populations
(Gayden et al., 2007) and in Turkey (0.38%) (Cinnioglu et al., 2004). The other branch
of clade H*, H2*-APT, is also found at higher frequency in India, but none of the
Pakistani Y chromosomes carry it. Interestingly, H2*-APT was observed at low
frequency in Greek Y chromosomes (Firasat et al., 2007).
Haplogroup J* is identified by the 12f2 human endogenous retroviral
polymorphism (Sun et al., 2000; Rosser et al., 2000). Haplogroup J* Y chromosomes
are widely distributed in Eurasia, the Middle East and North Africa (Hammer et al.,
2001; Quintana-Murci et al., 2001), and branches of haplogroup J* were found
across all Pakistani populations. J1*-M267 was detected at low frequency in the
Pakistani populations. This haplogroup is characteristic of African and Arabian
populations, and its frequency decreases towards the north and east. High
frequencies have been reported in Oman (38%) (Luis et al., 2004), Iraq (33%)
(Al-Zahery et al., 2003), Egypt (20%) (Luis et al., 2004), Lebanon (13%) (Semino et al.,
2000), Turkey (9%) (Cinnioglu et al., 2004) and Iran (10.5%) (Regueiro et al., 2006),
compared with India (0.27%), East Asia (0%) (Sengupta et al., 2006) and Pakistan
(1.2%). These frequencies indicate differential influence from East Africa and the
Middle East in southwestern Asia. The other clade of haplogroup J*, J2*, is
distributed mainly in West Asian and Eurasian populations. The demographic
expansion of J2* chromosomes occurred during the dispersal of Neolithic farmers
(King and Underhill, 2002). Haplogroup J2* and its derivatives were found at a
frequency of 23% in Iran (Regueiro et al., 2006), 22.2% in Turkey (Cinnioglu et al.,
2004), 9% in India (Sengupta et al., 2006) and 11.2% in Pakistan. The frequency of this
haplogroup appears to decrease from the southwest to the northeast of Pakistan:
a decrease in the frequency of J2* derivatives can be seen east of the Iranian Plateau,
in southern Pakistan (7.7%), with a dramatic decline in northern Pakistan (2.0%) and
in the Punjabi castes (1.5%) (Table VI). Sengupta et al. (2006) showed that the J2*
clade is nearly absent in East Asia (1.14%). The presence of J2* and its derivative
chromosomes in the Pakistani populations indicates Persian and Mediterranean gene
flow, which is supported by the high frequency of this haplogroup in the Parsis, a
population that arrived in India from Iran (Quintana-Murci et al., 2001).
Haplogroup L* is delineated by the presence of the M20 mutation (Underhill et
al., 1997). The L* haplogroup may be a recent event that arose in the Indus Valley
region during the Indus Valley civilization. Haplogroup L* is found at high frequency
in the Indo-Pak subcontinent, and L* chromosomes are largely restricted to south
Caucasus populations (Weale et al., 2001), the Middle East (Nebel et al., 2001b),
Pakistan (Qamar et al., 2002) and India (Kivisild et al., 2003; Sengupta et al., 2006).
One of its sub-branches, L1-M27, was found at high frequency in Pakistan (5%), India
(6.32%) (Sengupta et al., 2006) and Iran (2.6%) (Regueiro et al., 2006), while no
L1-M27 chromosomes were observed in East Asia (Sengupta et al., 2006) or Turkey
(Cinnioglu et al., 2004). Comparison among the three Pakistani groups (northern,
southern and Punjabi) showed a significant difference in haplogroup distribution,
with considerable diversity in the populations of southern Pakistan.
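The text does not say how the significance of the difference in haplogroup distribution among the northern, southern and Punjabi groups was assessed. One common option is Pearson's chi-square test of homogeneity on a group-by-haplogroup count table, sketched minimally below. When many haplogroups are rare, expected counts become small, and an exact or permutation test (as offered, for example, by Arlequin) is preferable. The counts and function name are hypothetical.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows (groups), each a list of haplogroup counts.
    Haplogroup columns that are empty in every group are dropped first;
    every group must contain at least one chromosome.
    """
    cols = [c for c in zip(*table) if sum(c) > 0]
    rows = [list(r) for r in zip(*cols)]
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in cols]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df


COUNTS = [[120, 40, 30, 0],   # northern group (hypothetical)
          [150, 60, 45, 0],   # southern group (hypothetical)
          [90, 20, 10, 0]]    # Punjabi group (hypothetical)
stat, df = chi_square_homogeneity(COUNTS)
```

The statistic is compared with the chi-square distribution for `df` degrees of freedom. For example, the 5% critical value for 4 degrees of freedom is about 9.49.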
The most frequent haplogroup in Pakistan was haplogroup R* (48%) (Table
VI). This haplogroup is widespread in Europe, the Caucasus, West Asia, Central Asia
and South Asia (Sengupta et al., 2006), but it is absent from African and New World
Y chromosomes. The most frequent subclade of haplogroup R* is R1a1*-M17.
R1a1* chromosomes originated in southern Russia/Ukraine, in the region between
the Black and Caspian Seas, and spread with the expansion of the Kurgan culture
(Passarino et al., 2001; Quintana-Murci et al., 2001; Wells et al., 2001; Sengupta et al.,
2006). Recent studies showed that this haplogroup is distributed from India to
Norway (Kivisild et al., 2003; Passarino et al., 2002; Quintana-Murci et al., 2001) but is
almost absent in East Asia (Sengupta et al., 2006; Su et al., 1999).
In the Indo-Pak subcontinent, it has been postulated that the spread of this
haplogroup coincided with the arrival of Indo-European nomadic pastoral tribes from
West and Central Asia (Quintana-Murci et al., 2001). However, Sengupta et al. (2006)
reported a Holocene expansion of R1a1*-M17 chromosomes that predates the arrival
of Indo-European tribes from the northwest of India.
PART 2
COMPARISON BETWEEN PAKISTANI AND GREEK POPULATIONS:
In the present study, three Pakistani populations that claim descent from Greek
soldiers, the Burusho, Kalash and Pathan, were compared with the extant Greek
population. For this purpose, a combination of 93 biallelic Y-chromosome SNPs
(Table II) and a set of 16 Y-STRs (Table IV) was used. This extensive analysis of Y
diversity within the Greeks and the three Pakistani populations allowed us to
compare Y diversity within these populations and to re-evaluate their suggested
Greek origins.
The genetic relationship between the three Pakistani populations and the
Greeks can now be judged in the light of the phylogenetic analyses and the
corresponding statistical results. The phylogenetic results (Figure IX) showed that
clades H, I and L are the major haplogroups that separate the Pakistani
populations from the Greeks.
Haplogroup H* is Asia-specific (Underhill et al., 2001). The H* sub-branch
H1*-M52 was observed in the Pakistani populations but not in any of the Greek
samples (Figure IX). Conversely, the Indian-specific branch H2*-APT was not present
in any Pakistani ethnic group but was observed at low frequency (1.3%) in the
Greek population (Firasat et al., 2007). This is the first time that the Indian-specific
sub-clade H2*-APT has been observed in any western population, and it could
indicate ancient contacts.
Haplogroup I*-M170, on the other hand, appears to be European-specific
(Rootsi et al., 2004). Our analyses are consistent with this: 19.5% of Greeks carry
I-M170 Y chromosomes (Figure IX), whereas this haplogroup was absent in the
Burusho, Kalash and Pathan and present only at low frequency in the other
Pakistani ethnic groups.
Similarly, clade L* was observed only in the Pakistani populations and was absent in
the Greeks (Figure IX). Like haplogroup H*, L*-M20 and R2-M124 are indigenous to
the Indus Valley and southwest Asia. Clade L* has been suggested to be associated
with the spread of agriculture in the Indus Valley between 7000 and 2000 B.C. (Qamar
et al., 2002). All L*-M20-derived Y chromosomes in the Kalash population were
distinguished by the presence of the novel PK3 polymorphism, which placed them in
sub-clade L3a (Figure IX). Likewise, R2-M124 was absent in the Greeks but present in
14.4% of the Burusho and 5.74% of the other Pakistani populations (Figure IX).
Clade E* Y chromosomes most probably originated in East Africa and spread to
North Africa, the Middle East and Europe (Semino et al., 2004). Clade E* was present
at a much lower frequency in the Pakistani populations than in the Greeks (2.5% and
21%, respectively). The E* subclade E1b1b1a*-M78 also arose in Africa (Cruciani et
al., 2004), and it is the only branch of haplogroup E* that is present at low frequency
in the Pakistani populations (0.41%) and at high frequency in the Greek population
(17%). Among the three Pakistani populations that claim Greek ancestry, the Pathan
were the only population in which clade E1b1b1a*-M78 was present, at low frequency
(2.1%) (Figure IX). Even more compelling evidence of a genetic relationship between
the Pathan and Greek E1b1b1a*-M78 Y chromosomes was provided by the
median-joining network (Figure XIII). One Pathan shared the same Y-STR haplotype,
including the duplicated DYS425 locus with 10 and 13 repeats, with three Greek
individuals, and the other was separated from this cluster by a single mutation. This
enabled us to estimate the time to the most recent common ancestor (TMRCA;
mean ± SD) with the Network software as between 2,000 ± 400 and 5,000 ± 1,200
years before present (YBP), depending on whether the observed (Kayser et al., 2000)
or the inferred mutation rate (Zhivotovsky et al., 2004) was used. This range is
consistent with the period of Alexander's invasion in 327-323 B.C. In addition, this
haplotype was not found in any other E1b1b1a*-derived Pakistani Y chromosome.
However, its nine-locus core was observed in 53 individuals in the Y-STR Haplotype
Reference Database (YHRD; Roewer et al., 2001), where it was highly specific for the
Balkans, with the highest frequency in Macedonia.
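The TMRCA values above were obtained with the Network software. The underlying ρ (rho) statistic is the mean number of mutational steps from the root haplotype to each sampled chromosome, converted to years with a mutation rate. The sketch below makes several assumptions. It treats the network as star-like and approximates the standard deviation of ρ as √(ρ/n), a Poisson approximation rather than Network's own estimator. It uses an assumed rate of 6.9 × 10⁻⁴ per locus per generation, of the order of the evolutionary rate of Zhivotovsky et al. (2004), over 15 loci (the 16 Y-STRs minus DYS425), with 25-year generations. The step counts are a toy example, so the output is not meant to reproduce the published estimates.

```python
import math


def rho_tmrca(steps_from_root, mu_per_locus, n_loci, generation_years=25.0):
    """rho-based age estimate and approximate SD, both in years.

    steps_from_root: mutational steps from the root haplotype for each
    sampled chromosome. The SD uses a star-phylogeny (Poisson) approximation.
    """
    n = len(steps_from_root)
    rho = sum(steps_from_root) / n
    # Years per mutation on a haplotype of n_loci loci
    years_per_step = generation_years / (mu_per_locus * n_loci)
    sd_rho = math.sqrt(rho / n)
    return rho * years_per_step, sd_rho * years_per_step


# Toy network: four chromosomes on the root haplotype, one a single step away
age, sd = rho_tmrca([0, 0, 0, 0, 1], mu_per_locus=6.9e-4, n_loci=15)
print(f"{age:.0f} +/- {sd:.0f} years")
```

A faster, pedigree-based rate reduces `years_per_step` and therefore the age. This is why the observed rate gives the younger of the two estimates quoted above.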
It is worth emphasizing that we may have sampled rare lineages whose frequency was
amplified by drift in a limited portion of the population; this possibility cannot be
discounted, and Cruciani et al. (2006) also recommend caution when using
microsatellite alleles as surrogates for unique event polymorphisms. The genetic data
alone do not tell us when the Balkan chromosomes arrived in Pakistan, so it is
necessary to turn to the historical record. There has been no known Greek admixture
within the last few generations. Besides Alexander's armies, admixture between the
local population and the Greek slaves who were brought to this region by Xerxes
around one hundred and fifty years before Alexander's arrival cannot be discounted
(Firasat et al., 2007); at that time, Afghanistan and present-day Pakistan were part of
the Persian Empire (Wolpert, 2000). Nevertheless, Alexander's army of 25,000-30,000
mercenary foot soldiers from Persia and West Asia and 5,000-7,000 Macedonian
cavalry (Engels, 1981) perhaps provides a more likely explanation because of their
elite status and substantial political impact on the region.
Several studies have shown that clade E* is present at relatively high frequency
in the Greek population (Firasat et al., 2007; Francalacci et al., 2003; Hammer et al.,
2001). Our results show that the high frequencies of clades H1*-M52 and L3a-PK3
(20.45% and 22.7%, respectively) and the absence of clade E* from the gene pool of
the Kalash make the Kalash distinct from the Greeks (Figure IX). Statistical analysis
also gave the Kalash the highest pair-wise genetic distances from the Greeks (0.213
for FST, above the diagonal, and 8.066 below the diagonal; Table XI). Moreover, the
Kalash form a distinct cluster in the principal component analysis (Figure X). On the
basis of these results, we conclude that the true Greek contribution to the Kalash
gene pool remains uncertain.
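Table XI reports FST based on STR variation within haplogroups. A simpler, frequency-based analogue is Nei's G_ST = (H_T - H_S)/H_T. Here H_S is the mean within-population diversity, and H_T is the diversity of the pooled frequencies. It shows how a distinctive haplogroup profile, like that of the Kalash, raises the distance. Identical profiles give 0, and profiles with no haplogroups in common give 1. This is not the statistic reported in Table XI, and the frequencies below are hypothetical.

```python
def nei_gst(freqs_a, freqs_b):
    """Nei's G_ST between two populations from haplogroup frequency dicts.

    Each dict maps haplogroup -> frequency; values should sum to 1.
    """
    def diversity(freqs):
        return 1.0 - sum(p * p for p in freqs.values())

    h_s = (diversity(freqs_a) + diversity(freqs_b)) / 2
    pooled = {h: (freqs_a.get(h, 0.0) + freqs_b.get(h, 0.0)) / 2
              for h in set(freqs_a) | set(freqs_b)}
    h_t = diversity(pooled)
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical profiles: one population carries E, the other H1 and L3a
GROUP_A = {"E1b1b1a": 0.2, "J2": 0.3, "R1a1": 0.5}
GROUP_B = {"H1": 0.2, "L3a": 0.2, "R1a1": 0.6}
print(round(nei_gst(GROUP_A, GROUP_B), 3))
```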
The presence of the unique, population-specific L3a-PK3 haplogroup in the Kalash
sample enabled us to use the BATWING algorithm (Wilson et al., 1998) to estimate
the median TMRCA of the Kalash L3a lineages as 970 YBP (200-3,500 YBP). This
coincides with the arrival of the Kalash from Afghanistan into the Chitral Valley in
northern Pakistan during the tenth and eleventh centuries AD (Lines, 1999).
The pair-wise genetic distances between the Greeks and the Burusho, a
language-isolate population (0.188 for FST, above the diagonal, and 5.659 below the
diagonal; Table XI), reveal no Greek connection. Furthermore, principal component
analysis placed the Burusho apart from the Greeks and closer to their neighbors in
Pakistan (Figure X), suggesting that the linguistic differences arose after the common
Y pattern was established. Alternatively, there may have been sufficient Y gene flow
between populations to eliminate any initial differences that may have been present.
This study as a whole excludes a large Greek contribution to any Pakistani
population, confirming previous observations (Mansoor et al., 2004). However, it
provides evidence supporting Greek origins for a very small proportion of the Pathan,
as demonstrated by the clade E* network (Figure XIII) and the low pair-wise genetic
distances between these two populations (Table XI). The contribution to the Kalash
is unclear, and no contribution to the Burusho could be detected. This conclusion
requires the assumption that extant Greeks are representative of Alexander's armies.
The failure to find a conclusive Y link with the extant Greek population could also be
because, besides the 5,000-7,000-strong Macedonian cavalry, Alexander's army
included 25,000-30,000 mercenary foot soldiers from Persia and West Asia (Engels,
1981), and populations from Pakistan have been shown to be closer to those from
West Asia (Qamar et al., 2002; Quintana-Murci et al., 2001).
PART 3
COMPARISON WITH WORLD POPULATIONS:
In this part, the Pakistani populations are compared with world populations using
published haplogroup frequency data at a similar molecular resolution. Table XII
describes the Asian reference populations used in this analysis.
Four major haplogroups, C*, J*, L* and R*, together account for 85.5% of Pakistani
Y chromosomes (Table VI). The most frequent is haplogroup R*, which, including all
its derivatives, makes up 47.5% of Pakistani Y chromosomes. Worldwide
Y-chromosome data show that haplogroup R* is frequent in the populations of
western and southern countries, which speak languages of several families, such as
Dravidian and Indo-European (including Indo-Iranian). Haplogroup R*, however, is
rare or absent in the populations of eastern countries. According to Figure XV,
adapted from Gayden et al. (2007), more than 50% of Kyrgyz Y chromosomes in
Central Asia belong to haplogroup R*, and the frequency gradually decreases in the
Karakalpak (34%) and Kazak (11%). In West Asia, the highest frequency of
haplogroup R* is observed in northern Iran (27.2%), followed by southern Iran
(25.6%), Syria (25%), Iraq (17.3%) and Lebanon (6%). In South Asian populations,
haplogroup R* is found at a frequency of 62.1% in the Newar, 59% in Punjab, 46.8%
in Kathmandu and 31% in Gujarat.
The second most abundant major clade is haplogroup J*, which occurs at an
average frequency of 15% in Pakistan (Table VI). This haplogroup originated about
30,000 YBP in the Fertile Crescent (a region that today includes Israel, the West
Bank, Jordan, Lebanon, Syria and Iraq; Semino et al., 2004). Its high frequencies
among populations of the Middle East, North Africa and East Africa indicate that
haplogroup J* expanded mainly southwards in these areas (Thomas et al., 1999).
J2*, however, originated in the northern part of the Fertile Crescent, and the
presence of this haplogroup in Europe, India, Pakistan and Nepal shows that J2*
expanded both eastwards and westwards (Al-Zahery et al., 2003). The frequencies
of haplogroups J1*/J2* are 40.6%/15.8% in Jordan (Flores et al., 2005); 37.2%/9.9%
in Oman and 19.7%/12.2% in Egypt (Luis et al., 2004); 9.2%/24.3% in Turkey
(Cinnioglu et al., 2004); 31%/26.6% in Iraq (Al-Zahery et al., 2003); 13.8%/18.9% in
Iran (Nasidze et al., 2004; Underhill et al., 2000; Wells et al., 2001); 16.3%/29.8% in
Lebanon (Hammer et al., 2000; Semino et al., 2004; Wells et al., 2001); 32.4%/22.5%
in Syria (Cruciani et al., 2004; Di Giacomo et al., 2004; Hammer et al., 2000);
38.5%/16.8% in Palestine (Cruciani et al., 2004; Hammer et al., 2000; Nebel et al.,
2001); 2.5%/0.5% in Somalia (Sanchez et al., 2005); and 1.3%/7.2% in Greece
(Firasat et al., 2007).
Haplogroup C* accounts for 12.3% of Pakistani Y chromosomes. This haplogroup is
found at high frequency in Australian Aborigines, Polynesians, Kazaks, Mongolians,
Manchurians and Tuvans, among others, and it has spread in all directions: for
example, C* is found on the Indian subcontinent, in Sri Lanka and in parts of
Southeast Asia. Haplogroup C1* is found at low frequency in Japan, while C2* is
found predominantly in New Guinea, Melanesia and Polynesia. The widespread C3*
haplogroup originated in southeast or central Asia. From central Asia it expanded
towards northern Asia and the Americas, and low frequencies are also found in
eastern and central Europe, where it may reflect the westward expansion of the
Huns in the early Middle Ages. C4* is found among Aboriginal Australians, and C5*
occurs at significant frequency in India.
The Hazara are an ethnic group in Pakistan that claim to be descendants of
Genghis Khan. The highest frequency of haplogroup C3 is found in Mongolia, which
suggests that C3 chromosomes spread widely during the Mongol conquests of Asia
under Genghis Khan. Haplogroup C3* is present in 60% of Hazara males and 8.2% of
Burusho males (Table VI). In the study by Zerjal et al. (2003), a median-joining
network (Bandelt et al., 1999) linked the Hazara population to the male-line
descendants of Genghis Khan (Figure XVI), owing to the presence of the unique
star-cluster Y-STR haplotypes in haplogroup C3 Y chromosomes. However, the
star-cluster haplotype was not observed in the Burusho, indicating separate origins
for these two populations despite their sharing of haplogroup C3*.
Haplogroup L* is another major haplogroup in Pakistan. It arose on the
background of haplogroup M9. A segment of the M9 "Eurasian Clan" migrated south
and reached the rugged, mountainous Pamir Knot region. Haplogroup L* may have
arisen about 30,000 years ago and represents the earliest significant settlement of
humans in the Indian subcontinent; for this reason, haplogroup L* is known as the
"Indian Clan". Today, haplogroup L* is found primarily as sub-group L1 in India and
Sri Lanka, while sub-group L3* is found mostly in Pakistan. Haplogroup L* also
occurs at low frequencies in the Middle East and in Europe along the Mediterranean
coast.
Haplogroup L* is mainly associated with South Asia. Analyses by Sengupta et al.
(2006) and Thamseem et al. (2006), along with Cordaux et al. (2004) and Basu et al.
(2003), reveal that 7-15% of Indian males carry haplogroup L*, while 10.8% of
Pakistani males carry this haplogroup (present study). As shown in Figure XVII and
in the work of Wells et al. (2001), haplogroup L* is much more frequent in South India
and western Pakistan than in southern Pakistan. It occurs at low frequency in
northern India and Pakistan and is absent from eastern India. Low frequencies have
also been found in Oman (0.8%: Luis et al., 2004), Iraq (1%: Al-Zahery et al., 2003),
Lebanon (2%: Hammer et al., 2000; Semino et al., 2004; Wells et al., 2001) and
Greece (1.1%: Di Giacomo et al., 2003; Semino et al., 2004).
Haplogroup B* is one of the oldest Y-chromosome haplogroups and is confined to
African populations (Knight et al., 2003). It appears at low frequency throughout
Africa but reaches its highest frequency in Pygmy populations. An interesting
observation in the current study was the presence of this ancient haplogroup B*
lineage in two Pakistani males from the south: one Brahui, from the
Dravidian-speaking population, and one Makrani-Negroid. A median-joining network
(Bandelt et al., 1995) of the M60-derived Y haplotypes for DYS19, DYS389I, DYS389b,
DYS390 and DYS392 revealed that the Brahui sample (Y-STR haplotype
14_11_18_24_13) differed from three Sukuma individuals (Knight et al., 2003) at the
DYS19 locus only (16_11_18_24_13) (Figure XVIII), whereas the Makrani-Negroid
sample (Y-STR haplotype 15_10_18_21_11) differed from one individual from the
Hadzabe population at the DYS389b, DYS390 and DYS392 loci (15_10_17_20_13)
(Table XIII). The time of separation between the populations, estimated with the
Network software (Bandelt et al., 1995), was approximately 5,000-10,000 years.
These results exclude an ancient migration and suggest that a more recent migratory
event is responsible for this separation. It is possible that these chromosomes share
an origin with the M2-derived chromosomes found in some populations of southern
Pakistan, as described by Qamar et al. (2002). Quintana-Murci et al. (2004), however,
described them as a genetic legacy of the slave trade that existed between the
southern coast of Pakistan and East Africa.
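The locus-by-locus comparison of the haplogroup B* haplotypes can be made explicit by parsing the underscore-separated haplotypes. The parser assumes the locus order given in the text (DYS19, DYS389I, DYS389b, DYS390, DYS392). The function names are hypothetical. The sketch only lists differences and does not estimate separation times.

```python
LOCI = ("DYS19", "DYS389I", "DYS389b", "DYS390", "DYS392")


def parse(haplotype):
    """Turn a string such as '14_11_18_24_13' into a locus -> repeat-count dict."""
    values = [int(v) for v in haplotype.split("_")]
    if len(values) != len(LOCI):
        raise ValueError("expected one value per locus")
    return dict(zip(LOCI, values))


def differing_loci(h1, h2):
    """Loci at which two haplotypes differ, with the repeat difference h2 - h1."""
    a, b = parse(h1), parse(h2)
    return {locus: b[locus] - a[locus] for locus in LOCI if a[locus] != b[locus]}


print(differing_loci("14_11_18_24_13", "16_11_18_24_13"))  # {'DYS19': 2}
print(differing_loci("15_10_18_21_11", "15_10_17_20_13"))  # {'DYS389b': -1, 'DYS390': -1, 'DYS392': 2}
```

Summing the absolute differences gives 2 mutational steps between the Brahui and Sukuma haplotypes and 4 between the Makrani-Negroid and Hadzabe haplotypes under a single-step model.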
Haplogroup O* is common in East and South Asia: 80-90% of all men in East and
Southeast Asia carry this haplogroup, whereas it was observed at low frequency
(0.82%) in Pakistan (Figure XIX).
The comparison with worldwide data suggests that the gene pool of Pakistani
ethnic groups is much closer to those of western populations than to those of East
and Southeast Asia. This is illustrated by frequent Pakistani haplogroups such as J*
and R*, which also contribute to the western Asian and European gene pools but are
not found in China and Japan. Conversely, the low prevalence or absence of the East
Asian haplogroups C3 and O* in Pakistan indicates that the Karakoram Mountains,
which separate Pakistan and China, form a formidable barrier to gene flow from the
north. The Hazara, with 60% C3 Y chromosomes, are the only population that shows
significant East Asian (Mongolian) ancestry, but historical records indicate that they
did not cross this geographical barrier and arrived in the subcontinent from the west.
102
Table XII: Description of world populations.
Geographic region / Population | Abbreviation | Language family | No. of subjects | Reference
Middle East:
Northern Iran | NIR | Indo-European | 33 | Regueiro et al., 2006
Southern Iran | SIR | Indo-European | 117 | Regueiro et al., 2006
Iraq | IRQ | Afro-Asiatic | 139 | Al-Zahery et al., 2003
Lebanon | LEB | Afro-Asiatic | 50 | Wells et al., 2001
Syria | SYR | Afro-Asiatic | 20 | Semino et al., 2000
Central Asia:
Kazak | KAZ | Altaic | 54 | Wells et al., 2001
Kyrgyz | KYR | Altaic | 52 | Wells et al., 2001
Karakalpak | KAR | Altaic | 44 | Wells et al., 2001
Shugnan | SHU | Indo-European | 44 | Wells et al., 2001
Mongolia | MON | Altaic | 24 | Wells et al., 2001
Tibet | TIB | Sino-Tibetan | 156 | Gayden et al., 2007
South Asia:
Adi | ADI | Sino-Tibetan | 55 | Cordaux et al., 2004
Gujarat | GUJ | Indo-European | 29 | Kivisild et al., 2003
Punjab | PUN | Indo-European | 66 | Kivisild et al., 2003
Pakistan | PAK | Indo-European | 1213 | Present study
Tamang | TAM | Sino-Tibetan | 45 | Gayden et al., 2007
Newar | NEW | Sino-Tibetan | 66 | Gayden et al., 2007
Kathmandu | KAT | Indo-European | 77 | Gayden et al., 2007
Northeast Asia:
Korea | KOR | Altaic | 74 | Karafet et al., 2001
Japan | JAP | Altaic | 259 | Hammer et al., 2006
Tuva | TUV | Altaic | 42 | Wells et al., 2001
Buryat | BUR | Altaic | 81 | Karafet et al., 2001
Manchu | MAN | Altaic | 35 | Xue et al., 2006
Southeast Asia:
Philippines | PHI | Austronesian | 48 | Karafet et al., 2005
Malaysia | MAL | Austronesian | 32 | Karafet et al., 2005
Vietnam | VIE | Austronesian | 70 | Karafet et al., 2005
Bali | BAL | Austronesian | 551 | Karafet et al., 2005
Southern Han | SHA | Sino-Tibetan | 166 | Karafet et al., 2005
Figure XV: Frequencies of the major haplogroups in Asian populations. Population abbreviations are given in Table XII.
Figure XVI: Median-joining network of C* lineages. The central star-cluster profile is 10-16-25-10-11-13-14-12-11-11-11-12-8-10-10 for the loci DYS389I-DYS389b-DYS390-DYS391-DYS392-DYS393-DYS388-DYS425-DYS426-DYS434-DYS435-DYS436-DYS437-DYS438-DYS439. Circles represent lineages, their area is proportional to frequency, and color indicates the population of origin. Lines represent microsatellite mutational differences. Adapted from Zerjal et al. 2003.
Figure XVII: Distribution of haplogroup L* in the Indo-Pak subcontinent. Adapted from Sengupta et al. 2006.
Table XIII: Y-STR data of clade B* lineages in Pakistani and African populations.
DYS19_389I_389b_390_392
Hadzabe
Sukuma
Makrani
Lisongo
Negroid
TOTAL
Brahui
Biaka
Mbuti
San
H1 14_11_15_25_14 1 1
H2 14_11_18_24_13 1 1
H3 15_10_14_21_13 2 2
H4 15_10_15_20_13 1 1
H5 15_10_15_22_13 7 7
H6 15_10_17_20_13 1 1
H7 15_10_18_21_11 1 1
H8 15_11_16_23_13 1 1
H9 16_10_15_24_13 2 1 1
H10 16_11_13_24_13 2 2
H11 16_11_14_24_13 1 1
H12 16_11_15_25_13 1 1
H13 16_11_16_20_13 1 1
H14 16_11_16_23_13 1 1
H15 16_11_18_24_13 3 3
H16 16_7_14_24_13 1 1
H17 16_7_15_24_13 1 1
H18 16_7_16_24_13 1 1
H19 17_11_13_24_13 1 1
H20 17_11_14_24_13 1 1
H21 17_11_16_20_13 1 1
H22 17_7_16_24_13 1 1
H23 18_11_16_23_13 1 1
Figure XVIII: Median-joining network of clade B* lineages in Pakistani and African populations. Circles represent haplotypes and have an area proportional to frequency. The Pakistani individuals are shown in orange and light blue.
Figure XIX: Geographic distribution of haplogroup O3. Adapted from Shi et al. 2005.
PART 4
INSIGHTS INTO POPULATION ORIGINS:
Pakistan is geostrategically placed and has witnessed many invasions and migrations from the west over the centuries. Present-day Pakistan is bordered by Iran and Afghanistan to the west, India to the east and China to the north, while the Indian Ocean lies along its entire southern coastline.
The Y haplogroup frequencies used in the statistical analyses allow us to interpret the origins of the Pakistani populations.
BALTI:
The Balti reside in eastern Baltistan in northern Pakistan, and there are approximately 300,000 Balti speakers in Pakistan. Their language, Balti, is a Sino-Tibetan language, and they are thought to have originated in Tibet. However, not all Balti speakers found in Pakistan are of Tibetan stock: over time, many other populations that entered their territory, such as the Shins, Arabs, Persians and Turks, gradually mixed with the Balti people. Although this study analyzed only a few unrelated Balti samples, the Y lineages commonly found in Tibet were not observed; clade D*, which is present at high frequency in the Tibetan population, was absent from the Balti (Table VI). These results are consistent with the earlier study by Qamar et al. (2002).
HAZARA:
The Hazara population, which is ethnically related to its brethren in neighbouring Afghanistan, stands out on the basis of its Y haplogroup frequencies. Hazara individuals have typical Mongolian features, and they claim to be descendants of Genghis Khan's army. Their name is derived from the Persian word hazar, meaning thousand, because troops were left behind in detachments of a thousand (Qamar et al., 2002). An earlier study of a limited number of samples (n = 33) showed them to be closer to populations in Mongolia (Qamar et al., 2002), and the star-cluster Y-STR haplotype (Figure XVI) observed in this population suggested that they were direct descendants of Genghis Khan (Zerjal et al., 2003).
The present study analyzed a much larger population sample (n = 224) from a wider geographical area in Pakistan: the earlier samples were collected from the NWFP, and the additional samples came from Quetta, Baluchistan. Two haplogroups predominated in this population, haplogroup C* (64%) and haplogroup R* (21%) (Table VI). Haplogroup R* is also present at high frequency in the other ethnic groups of Pakistan (53.5% when the Hazara are excluded), whereas haplogroup C* is rare in other Pakistani populations (1.3% when the Hazara are excluded). Haplogroup C* is fairly common in Central Asia and Mongolia, which points towards the Mongol origins of the Hazara population (Figure XXI).
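Frequencies quoted "when the Hazara are excluded" are pooled frequencies over the remaining samples. The carriers are summed across the other groups and divided by their combined sample size. The minimal Python sketch below shows this calculation. The function name, population labels and counts are hypothetical placeholders and are not the Table VI data.

```python
def pooled_frequency(counts, exclude=()):
    """Pooled haplogroup frequency across populations, skipping those in `exclude`.

    counts maps population name -> (carriers_of_haplogroup, sample_size).
    """
    kept = [(carriers, n) for pop, (carriers, n) in counts.items() if pop not in exclude]
    total = sum(n for _, n in kept)
    if total == 0:
        raise ValueError("no samples remain after exclusion")
    return sum(carriers for carriers, _ in kept) / total


# Hypothetical counts for illustration only; they are not the thesis data.
hypothetical_c_star = {
    "Hazara": (144, 225),
    "Population A": (10, 100),
    "Population B": (2, 200),
    "Population C": (1, 300),
}

print(f"{pooled_frequency(hypothetical_c_star):.3f}")              # all groups pooled
print(f"{pooled_frequency(hypothetical_c_star, {'Hazara'}):.3f}")  # Hazara excluded
```

With these made-up numbers, one large group with a high frequency dominates the pooled figure: about 19% overall versus about 2% without it. This is why a value with the Hazara excluded is a more informative summary for the rest of the country.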
BURUSHO:
The Burusho, who speak Burushaski, are of particular genetic, linguistic and anthropological interest. Their language is one of the few remaining language isolates in the world (Dani, 1991; Grimes, 1992). Approximately 60,000 Burusho are estimated to reside in present-day Pakistan. The samples used here were collected from the valleys of the Karakoram Mountains in Hunza, Nagar and Yasin. The origin of the Burusho is not well documented. Some claim that they are descendants of four generals in Alexander's army (Dani, 1989); others believe them to be Dardics from Central Asia, or nomads from the Pamirs who migrated to this area and displaced the original inhabitants (Biddulph, 1977).
Studies with autosomal (Ayub et al., 2003; Mansoor et al., 2004) and Y-chromosomal markers (Firasat et al., 2007) suggest that the Burusho have the same genetic makeup as their geographical neighbours in Pakistan. A preliminary study by Wells et al. (2001) using a limited number of Y markers showed that the Hunza Burusho clustered with populations from Tajikistan, but no such evidence was found when a larger number of markers was used. The high frequencies of Central Asian haplogroup C* chromosomes in the Burusho and Hazara indicate that these chromosomes arose in Central Asia before the separation of these two Pakistani populations (Mohyuddin et al., 2006). There is also no evidence of genetic relatedness with the Greeks: haplogroup C* is absent in Greeks (Francalacci et al., 2003; Rootsi et al., 2004), and haplogroup E*, which is common in Greece, is absent in the Burusho (Figure IX). Although the two populations share haplogroup R1a1*, the R1a1*-derived branch observed in two Burusho individuals points towards a long separation, based on microsatellite variation.
KALASH:
The Kalash have been isolated for centuries in the Hindu Kush mountain ranges of northern Pakistan. Their language, Kalasha, belongs to the Dardic group of Indo-European languages, and they number around 3,000-6,000 in present-day Pakistan. Oral traditions ascribe their origins to a mythical place called Tsiam, which some claim refers to Syria (Decker, 1992), and various scholars have attributed their origins to the remnants of Alexander's army (Robertson, 1896). The absence of clade E* chromosomes, which are present at a relatively high frequency in the Greek population (Francalacci et al., 2003; Hammer et al., 2001), together with the presence of clades H* (20%) and L3a (23%), distinguishes the Kalash from the Greeks (Firasat et al., 2007). However, the high frequency of haplogroup R* (27%) indicates a substantial European component in the Kalash; their possible origin is illustrated in Figure XXI. Studies of maternal (mitochondrial DNA; Schurr et al., 2000), paternal (Y-chromosome SNP and STR; Qamar et al., 2002) and autosomal STR (Mansoor et al., 2004) markers have also demonstrated their greater affinity with European populations. In the principal component analysis based on haplogroup frequencies, the Kalash are distinct from the other ethnic groups of Pakistan (Figure X). The presence of a haplogroup (L3a) observed only in this population suggests genetic drift. The timing of their isolation could be better studied by analyzing populations from Nuristan, Afghanistan, from where they are thought to have migrated to settle in Chitral District in northern Pakistan. The median-joining network of haplogroup H1*-M52 lineages, which are present at appreciable frequencies in the Burusho, Kalash and Pathan, based on 16 Y-STRs (Figure XX), also shows a high degree of Kalash-specific substructure: except for one individual, all the Kalash samples fall in one cluster. The network suggests that H1*-M52 spread to neighbouring northern populations. Taken together, these results suggest that the high frequency of unique, population-specific SNPs and haplogroups in this group is probably due to genetic drift in a population that has been isolated for centuries in the Hindu Kush Mountains.
PATHAN:
The Pathans, the last of the northern populations with claims to Greek origins, occupy vast tracts of land in Pakistan and neighbouring Afghanistan. In Pakistan, the vast majority of Pathans reside in the NWFP and Baluchistan provinces. The provincial capitals of Peshawar (NWFP) and Quetta (Baluchistan) have large Pathan populations and are important Pathan centres in Pakistan. According to the Population Census Organization, Government of Pakistan (retrieved 7 June 2006; http://www.newworldencyclopedia.org/entry/Pashtun_people) and the Census of Afghans in Pakistan (UNHCR Statistical Summary Report; http://www.unhcr.org/cgi-bin/texis/vtx/home/opendoc.pdf), Pathans constitute 15% of the population of present-day Pakistan. Their language, Pashtu, is classified under the Indo-Iranian branch of the Indo-European languages, and linguistically they are classified as an Iranian people (Nicholas and Asmatullah, 2007). Folk legends claim that they are either of Jewish origin (Ahmad, 1952) or descendants of Alexander's army (Bellew, 1998).
Figure XX: Median-joining network of haplogroup H1*-M52 lineages in the Burusho, Kalash and Pathan, based on their Y-STR haplotypes.
In the present study, the presence at low frequency of haplogroup E1b1b1a* chromosomes, which are frequent in Greeks (Figure IX), provides evidence of a small Greek contribution to the Pathan gene pool; further investigation will be required to ascertain its extent (Firasat et al., 2007). However, earlier studies by Quintana-Murci et al. (2004) and Mansoor et al. (2004) using mitochondrial DNA and STR markers demonstrated that the Pathans are mainly related to the Iranians and to their geographic neighbors in northern Pakistan.
PARSI:
The origins of the Parsi are well documented, and only a few thousand Parsis now live in Pakistan. These followers of the Persian prophet Zoroaster (http://www.ozemzil.com.au/~Zarathus/Zor33.html) migrated to India after the collapse of the Sassanian Empire in the 7th century A.D. and settled in the northwest Indian province of Gujarat in 900 A.D., where they were called Parsi, meaning "from Iran". Eventually they moved to Mumbai in India and Karachi in Pakistan, from where the present population was sampled (Figure XXI). They speak an Indo-European language.
The earlier study of their Y chromosomes (Qamar et al., 2002) showed that the Parsis are genetically closer to Iranians than to their neighbors in Pakistan. In this study, 39% of the Parsis sampled belonged to haplogroup J* (Table VI), similar to the frequency of this haplogroup (40%) in the present-day Iranian population (Qamar et al., 2002). Surprisingly, on the basis of their mitochondrial DNA variation, the Parsis were genetically close to the Gujarati population of India (Quintana-Murci et al., 2004) rather than to the Iranians. This indicates a loss of mitochondrial DNA of Iranian origin, mainly due to admixture with the local population in India after their seventh-century migration.
BALOCH:
Balochis are affiliated with the Iranian Baloch tribes across the southwestern border with Iran, and they speak Balochi, an Iranian language (Grimes, 1992). Currently around 8 million Balochis live in Pakistan. Their origins are uncertain: some scholars believe that they came from the northern regions of the Elburz, a mountain range in northern Iran, whereas others claim that they came from Aleppo in Syria or from Mesopotamia.
Y-chromosome data show that Syrians and Iranians are characterized by a low frequency of haplogroup R* (9-26%) and a high frequency of haplogroup J* (35-57%) (Hammer et al., 2000). This is the converse of the distribution of these haplogroups in the Baloch, in whom approximately 29% of Y chromosomes carry haplogroup R* and only 9% carry haplogroup J* (Table VI). These results support the earlier observation of Qamar et al. (2002), which was based on a limited number of Y markers. HLA data support genetic relatedness among the Baloch tribes of Iran and Pakistan (Farjadian et al., 2004). In worldwide surveys of the HGDP-CEPH cell line panel, the Baloch are closely related to their geographic neighbours and share the same branch as populations from the Middle East and West Eurasia (Jakobsson et al., 2008; Li et al., 2008).
BRAHUI:
Brahui people are found in the central region of the Balochistan province of Pakistan. About 1.5 million Brahuis reside in the Sarawan and Jhalawan regions of Kalat state, Baluchistan (Hughes-Buller, 1991). They speak Brahui, a language belonging to the Dravidian language family (Grimes, 1992). Dravidians are found mostly in southern India, Sri Lanka, Bangladesh, Pakistan, Afghanistan and Iran, and are supposedly Indian in origin (Fuller, 2003). However, according to the proto-Elamo-Dravidian hypothesis, they originated in the Iranian province of Elam and were once spread over a much larger area, including Iran, Pakistan, Afghanistan and all of India (McAlpin, 1974, 1981). According to some historical traditions, the Brahuis are descendants of western Asian peoples (McAlpin, 1974, 1981) such as Turko-Iranian tribes and the Scythians (Hughes-Buller, 1991). Some historians also claim that they share the origins of the Baloch (Hughes-Buller, 1991; Quddus, 1990). The Brahuis have been widely suggested to be remnants of a formerly widespread Dravidian population that entered South Asia with the expansion of Dravidian-speaking farmers (Quintana-Murci et al., 2001).
To investigate the origin of the Brahui, 117 Brahui Y chromosomes were analyzed and the results compared with those of neighbouring populations. The presence of haplogroups J* and L* (Table VI) reflects population movements from West Asia to South Asia and from India to Pakistan, respectively.
The highest frequencies of haplogroup J* are found in Iranian populations (30-60%: Quintana-Murci et al., 2001) and in the Fertile Crescent, including Palestinians (51%), Lebanese (46%) and Syrians (57%) (Hammer et al., 2000). These results indicate that haplogroup J* originated in West Asia and spread from there to South Asia, and the high frequency of haplogroup J* in the Brahui (26.5%) supports this. The major movement of populations from West Asia to South Asia is correlated with the expansion of the farming economy from Iran to the Indo-Pak subcontinent, which started between the 6th and 5th millennia B.C. The other major development that followed was the spread of domesticated animals by pastoral nomads. The expansion of haplogroup J* was probably associated with the dispersal of farmers and pastoral nomads (Dravidians) in southern Asia (Cavalli-Sforza, 1988; Renfrew, 1987). However, Sengupta et al. (2006) suggest that the Dravidians originated in India. They drew this conclusion from the presence of the indigenous haplogroup L1-M76 (M-27) in Dravidian speakers (7.5% in India). The 6% of Brahui Y chromosomes that carry haplogroup L1-M76 suggest that the Brahui could have migrated to Baluchistan from India. This is further supported by the mean microsatellite variance, which is higher in India (0.35) than in Pakistan (0.19) (Sengupta et al., 2006).
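In general terms, mean microsatellite variance is the variance of repeat counts at each STR locus, averaged over loci. A higher value is usually read as a sign of a longer accumulation of mutations, and hence an older or longer-resident lineage pool. The Python sketch below computes this quantity for haplotypes written in the underscore format of Table XIII. Several details of the calculation by Sengupta et al. (2006) are not given here: the denominator (n or n-1), and which loci and samples were pooled. The `sample` switch, the function names and the example haplotypes are therefore assumptions for illustration.

```python
from statistics import pvariance, variance


def parse_haplotype(text):
    """Turn an underscore-separated Y-STR haplotype such as '15_10_15_22_13' into ints."""
    return [int(part) for part in text.split("_")]


def mean_microsatellite_variance(haplotypes, sample=True):
    """Average over loci of the variance in repeat number.

    sample=True uses the n-1 denominator (statistics.variance);
    sample=False uses the population form (statistics.pvariance).
    """
    rows = [parse_haplotype(h) for h in haplotypes]
    if len(rows) < 2:
        raise ValueError("need at least two haplotypes")
    n_loci = len(rows[0])
    if any(len(r) != n_loci for r in rows):
        raise ValueError("all haplotypes must be typed at the same loci")
    var = variance if sample else pvariance
    per_locus = [var([r[i] for r in rows]) for i in range(n_loci)]
    return sum(per_locus) / n_loci


# Hypothetical haplotypes, for illustration only.
example = ["15_10_15_22_13", "15_10_15_22_13", "16_11_15_24_13", "16_11_14_24_13"]
print(round(mean_microsatellite_variance(example), 4))
print(round(mean_microsatellite_variance(example, sample=False), 4))
```

For these four hypothetical haplotypes, the sample form gives 0.45 and the population form gives 0.3375. Values compared across regions are only meaningful if they use the same loci and the same convention.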
MAKRANI NEGROID:
The Makrani Negroid, who have African physical traits, reside along the southern Makran coast of Pakistan and speak an Indo-European language. It has been speculated that they represent migrants from Africa (Figure XXI), but the timing of this migration is uncertain (Ansari, 1996). Although they carry sub-Saharan African haplogroups such as E1b1a* at low frequency, they also exhibit a sizeable proportion of haplogroups L*, J* and R*. Haplogroup L* is mostly restricted to the Indo-Pak subcontinent, and haplogroups J* and R* to Eurasia. The contribution of African Y chromosomes to this population was estimated at approximately 12% (Qamar et al., 2002), and mitochondrial DNA data supported these results. These data, together with their history as remnants of the East African slave trade, indicated that they were probably recent settlers (Quintana-Murci et al., 2004).
Figure XXI: Possible origins of a) the Hazara (Y lineage C from East Asia; Mongolia), b) the Kalash (West Eurasian Y lineages), c) the Parsi (Y lineages from West Asia (Iran); Gujarat, Mumbai) and d) the Makrani Negroid (Y lineages from sub-Saharan Africa). Adapted from Mehdi, S.Q. 2007.
CONCLUSIONS:
The molecular analysis of the human genome is providing a better understanding of human ancestry and diversity from both the maternal and the paternal perspective. The evolutionary antiquity of Pakistani populations and the subsequent migrations from West Asia, Europe and, to a lesser extent, East Asia have resulted in a rich tapestry of socio-cultural, linguistic and biological diversity. This study reports on the diversity of Pakistani populations on the basis of haplogroup frequencies. It provides insight into the genetic relationships of Pakistani populations with one another as well as with other world populations. Such studies will serve as a background for epidemiological work in different populations of the world, because the genetic makeup of a population contributes to differences in the incidence and prognosis of various diseases across populations. The study should therefore provide insights wherever a patient's origin is useful in determining predisposition to various diseases. Knowledge of a population's genetic composition will also be helpful in eliminating spurious risk factors for different diseases. Furthermore, apart from inherited diseases, the study will be of immense medical importance in understanding susceptibility and resistance to infectious diseases as well as the efficacy of drug treatment, heralding the era of genomic medicine.
REFERENCES
Ahmad AKN. (1952). Jesus in heaven on earth. The Civil and Military Gazette Ltd, Lahore, Pakistan.
Aitman TJ, Dong R, Vyse TJ, Norsworthy PJ, Johnson MD, Smith J, Mangion J, Roberton-Lowe C, Marshall AJ, Petretto E, Hodges MD, Bhangal G, Patel SG, Sheehan-Rooney K, Duda M, Cook PR, Evans DJ, Domin J, Flint J, Boyle JJ, Pusey CD and Cook HT. (2006). Copy number polymorphism in Fcgr3 predisposes to glomerulonephritis in rats and humans. Nature. 439:851-855.
Al-Zahery N, Semino O, Benuzzi G, Magri C, Passarino G, Torroni A and Santachiara-Benerecetti AS. (2003). Y-chromosome and mtDNA polymorphisms in Iraq, a crossroad of the early human dispersal and of post-Neolithic migrations. Mol Phylogenet Evol. 28:458-472.
Anderson S, Bankier AT, Barrell BG, De Bruijn MHL, Coulson AR, Drouin J, Eperon IC, Nierlich DP, Roe BA, Sanger F, Schreier PH, Smith AJH, Staden R and Young IG. (1981). Sequence and organization of the human mitochondrial genome. Nature. 290:457-465.
Ansari SSA. (1996). The Afghan or Pathans. In: The Musalman races found in Sindh, Baluchistan and Afghanistan. Indus Publications, Karachi. pp 9-16.
Ayub Q, Mansoor A, Ismail M, Khaliq S, Mohyuddin A, Hameed A, Mazhar K, Rehman S, Siddiqi S, Papaioannou M, Piazza A, Cavalli-Sforza LL and Mehdi SQ. (2003). Reconstruction of human evolutionary tree using polymorphic autosomal microsatellites. Am J Phys Anthropol. 122:259-268.
Baird M, Balazs I, Giusti A, Miyazaki L, Nicholas L, Wexler K, Kanter E, Glassberg J, Allen F, Rubinstein P and Sussman L. (1986). Allele frequency distribution of two highly polymorphic DNA sequences in three ethnic groups and its application to the determination of paternity. Am J Hum Genet. 39:489-501.
Bandelt HJ, Forster P, Sykes BC and Richards MB. (1995). Mitochondrial portraits of human populations using median networks. Genetics. 141:743-753.
Bandelt HJ, Forster P and Rohl A. (1999). Median-joining networks for inferring intraspecific phylogenies. Mol Biol Evol. 16:37-48.
Barley J, Blackwood A, Miller M, Markandu ND, Carter ND, Jeffery S, Cappuccio FP, MacGregor GA and Sagnella GA. (1996). Angiotensin converting enzyme gene I/D polymorphism, blood pressure and the renin-angiotensin system in Caucasians and Afro-Caribbean peoples. J Hum Hypertens. 10:31-35.
Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, Dey B, Roy M, Roy B, Bhattacharyya NP, Roychoudhury S and Majumder PP. (2003). Ethnic India: a genomic view, with special reference to peopling and structure. Genome Res. 13:2277-2290.
Batzer MA, Kilroy GE and Richard PE. (1990). Structure and variability of recently inserted Alu family members. Nucleic Acids Res. 18:6793-6798.
Batzer MA and Deininger PL. (1991). A human-specific subfamily of Alu sequences. Genomics. 9:481-487.
Batzer MA, Gudi VA, Mena JC, Foltz DW, Herrera RJ and Deininger PL. (1991). Amplification dynamics of human-specific (HS) Alu family members. Nucleic Acids Res. 19:3619-3623.
Batzer MA, Arcot SS, Phinney JW, Alegria-Hartman M, Kass DH, Milligan SM, Kimpton C, Gill P, Hochmeister M, Ioannou PA, Herrera RJ, Boudreau DA, Scheer WD, Keats BJ, Deininger PL and Stoneking M. (1996). Genetic variation of recent Alu insertions in human populations. J Mol Evol. 42:22-29.
Batzer MA and Deininger PL. (2002). Alu repeats and human genomic diversity. Nat Rev Genet. 3:370-379.
Behar DM, Garrigan D, Kaplan ME, Mobasher Z, Rosengarten D, Karafet TM, Quintana-Murci L, Ostrer H, Skorecki K and Hammer MF. (2004). Contrasting patterns of Y chromosome variation in Ashkenazi Jewish and host non-Jewish European populations. Hum Genet. 114:354-365.
Bellew HW. (1979). The races of Afghanistan. Sang-e-Meel Publications, Lahore, Pakistan.
Bellew HW. (1998). An enquiry into the ethnography of Afghanistan. Vanguard Books, Lahore, Pakistan.
Biddulph J. (1977). Tribes of the Hindoo Koosh. Karachi, Pakistan: Indus Publications.
Birnboim HC and Straus NA. (1975). DNA from eukaryotic cells contains unusually long pyrimidine sequences. Can J Biochem. 53:640-643.
Bowcock AM, Kidd J, Mountain JL, Hebert JM, Carotenuto L, Kidd KK and Cavalli-Sforza LL. (1991). Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc Natl Acad Sci USA. 88:839-843.
Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR and Cavalli-Sforza LL. (1994). High resolution of human evolutionary trees with polymorphic microsatellites. Nature. 368:455-457.
Boyum A. (1968). Separation of lymphocytes and erythrocytes by centrifugation. Scand J Clin Lab Invest. 21(Suppl 97):91.
Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K, Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA, Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper PS, Shaw DJ and Housman DE. (1992). Molecular basis of myotonic dystrophy: expansion of a trinucleotide (CTG) repeat at the 3' end of a transcript encoding a protein kinase family member. Cell. 68:799-808.
Brooks MB, Gu W, Barnas JL, Ray J and Ray K. (2003). A Line 1 insertion in the Factor IX gene segregates with mild hemophilia B in dogs. Mamm Genome. 14:788-795.
Brown P, Sutikna T, Morwood MJ, Soejono RP, Jatmiko, Saptomo EW and Due RA. (2004). A new small-bodied hominin from the Late Pleistocene of Flores, Indonesia. Nature. 431:1055-1061.
Brown WM, George M Jr and Wilson AC. (1979). Rapid evolution of animal mitochondrial DNA. Proc Natl Acad Sci USA. 76:1967-1971.
Budowle B, Moretti TR, Niezgoda SJ and Brown BL. (1998). CODIS and PCR-based short tandem repeat loci: law enforcement tools. In: Second European Symposium on Human Identification 1998, Promega Corporation, Madison, Wisconsin. pp 73-88.
Budowle B and Chakraborty R. (2001). Population variation at the CODIS core short tandem repeat loci in Europeans. Leg Med (Tokyo). 3:29-33.
Cann RL, Stoneking M and Wilson AC. (1987). Mitochondrial DNA and human evolution. Nature. 325:31-36.
Capelli C, Wilson JF, Richards M, Stumpf MP, Gratrix F, Oppenheimer S, Underhill P, Pascali VL, Ko TM and Goldstein DB. (2001). A predominantly indigenous paternal heritage for the Austronesian-speaking peoples of insular Southeast Asia and Oceania. Am J Hum Genet. 68:432-443.
Cappuzzo F, Toschi L, Domenichini I, Bartolini S, Ceresoli GL, Rossi E, Ludovini V, Cancellieri A, Magrini E, Bemis L, Franklin WA, Crino L, Bunn PA Jr, Hirsch FR and Varella-Garcia M. (2005). HER3 genomic gain and sensitivity to gefitinib in advanced non-small-cell lung cancer patients. Br J Cancer. 93:1334-1340.
Caroe O. (1958). The Pathans. Karachi, Pakistan: Oxford University Press.
Carter NP. (2007). Methods and strategies for analyzing copy number variation using DNA microarrays. Nat Genet. 39(Suppl):S16-S21.
Casanova M, Leroy P, Boucekkine C, Weissenbach J, Bishop C, Fellous M, Purrello M, Fiori G and Siniscalco M. (1985). A human Y-linked DNA polymorphism and its potential for estimating genetic and evolutionary distance. Science. 230:1403-1406.
Cavalli-Sforza LL. (1988). The Basque population and ancient migrations in Europe. Munibe. 6:129-137.
Cavalli-Sforza LL, Menozzi P and Piazza A. (1994). The History and Geography of Human Genes. Princeton University Press, Princeton.
Cavalli-Sforza LL. (2005). The Human Genome Diversity Project: past, present and future. Nat Rev Genet. 6:333-340.
Chakraborty R, Kimmel M, Stivers DN, Deka R and Davison LJ. (1997). Relative mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc Natl Acad Sci USA. 94:1041-1046.
Cinnioğlu C, King R, Kivisild T, Kalfoğlu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL and Underhill PA. (2004). Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet. 114:127-148.
Collins DW and Jukes TH. (1994). Rates of transition and transversion in coding sequences since the human-rodent divergence. Genomics. 20:386-396.
Cooper DN and Krawczak M. (1995). An introduction to the structure, function and expression of human genes. In: Human gene mutation. Bios Scientific Publishers Limited, UK. pp 19-48.
Cooper DN, Krawczak M and Antonarakis SE. (2000). The nature and mechanisms of human gene mutation. In: The Metabolic and Molecular Bases of Inherited Disease, Vol. 1, 8th Edn (eds Scriver CR, Beaudet AL, Sly WS, Valle D). McGraw-Hill, New York.
Cordaux R, Weiss G, Saha N and Stoneking M. (2004). The northeast Indian passageway: a barrier or corridor for human migrations? Mol Biol Evol. 21:1525-1533.
Cost GJ and Boeke JD. (1998). Targeting of human retrotransposon integration is directed by the specificity of the L1 endonuclease for regions of unusual DNA structure. Biochemistry. 37:18081-18093.
Cruciani F, Santolamazza P, Shen P, Macaulay V, Moral P, Olckers A, Modiano D, Holmes S, Destro-Bisol G, Coia V, Wallace DC, Oefner PJ, Torroni A, Cavalli-Sforza LL, Scozzari R and Underhill PA. (2002). A back migration from Asia to sub-Saharan Africa is supported by high-resolution analysis of human Y-chromosome haplotypes. Am J Hum Genet. 70:1197-1214.
Cruciani F, La Fratta R, Santolamazza P, Sellitto D, Pascone R, Moral P, Watson E, Guida V, Colomb EB, Zaharova B, Lavinha J, Vona G, Aman R, Calì F, Akar N, Richards M, Torroni A, Novelletto A and Scozzari R. (2004). Phylogeographic analysis of haplogroup E3b (E-M215) Y chromosomes reveals multiple migratory events within and out of Africa. Am J Hum Genet. 74:1014-1022.
Cruciani F, La Fratta R, Torroni A, Underhill PA and Scozzari R. (2006). Molecular dissection of the Y chromosome haplogroup E-M78 (E3b1a): a posteriori evaluation of a microsatellite-network-based approach through six new biallelic markers. Hum Mutat. 27:831-832.
Csink AK and Henikoff S. (1998). Something from nothing: the evolution and utility of satellite repeats. Trends Genet. 14:200-204.
Dani AH. (1989). Early history: the early inhabitants. In: History of Northern Areas of Pakistan. National Institute of Historical and Cultural Research, Islamabad, Pakistan. pp 110-157.
Dani AH. (1991). History of Northern Areas of Pakistan. National Institute of Historical and Cultural Research, Islamabad, Pakistan.
Dausset J. (1954). Leuko-agglutinins IV: Leuko-agglutinins and blood transfusion. Vox Sanguinis. 4:190.
Decker KD. (1992). Sociolinguistic survey of Northern Pakistan. Vol 5, Languages of Chitral. National Institute of Pakistan Studies, Islamabad.
Deininger PL and Daniels GR. (1986). The recent evolution of mammalian repetitive elements. Trends Genet. 2:76-80.
Deininger PL and Slagel VK. (1988). Recently amplified Alu family members share a common parental Alu sequence. Mol Cell Biol. 8:4566-4569.
Deininger PL, Batzer MA, Hutchinson III CA and Edgell MH. (1992). Master genes
in mammalian repetitive DNA amplification. Trend Genet. 8:307-312.
Deininger PL, Sherry ST, Risch G, Donaldson C, Robichaux MB, Soodyall H,

Jenkins T, Sheen F-M, Swergold G, Stoneking M, Batzer MA.(1999). Interspersed
repeat insertion polymorphisms for studies of human molecular anthropology. In:
Genomic Diversity, Application in Human population Genetics. (eds Papiha SS, Deka
R, Chakraborty R). Kluwer Academic / Plenum Publishers. New York, Boston,
Dordrecht, London, Moscow.
de Knijff P.(2000). Message through bottle necks: On the combined use of slow
and fast evolving polymorphic markers on the human Y chromosome. Am J Hum
Genet. 67:1055-1061.
Dietrich W, Katz H, Lincoln SE, Shin H-S, Friedman J, Dracopoli NC and Lander
ES.(1992). A genetic map of the mouse suitable for typing intraspecific crosses. Genetics
131:423-447.
Di Giacomo F, Luca F, Anagnou N, Ciavarella G, Corbo RM, Cresta M, Cucci F, Di
Stasi L, Agostiano V, Giparaki M, Loutradis A, Mammì C, Michalodimitrakis EN,
Papola F, Pedicini G, Plata E, Terrenato L, Tofanelli S, Malaspina P, Novelletto A.
(2003). Clinal patterns of human Y chromosomal diversity in continental Italy and
Greece are dominated by drift and founder effects. Mol Phylogenet Evol. 28:387-395.
Di Giacomo F, Luca F, Popa LO, Akar N, Anagnou N, Banyko J, Brdicka R,
Barbujani G, Papola F, Ciavarella G, Cucci F, Di Stasi L, Gavrila L, Kerimova MG,
Kovatchev D, Kozlov AI, Loutradis A, Mandarino V, Mammì C, Michalodimitrakis EN,
Paoli G, Pappa KI, Pedicini G, Terrenato L, Tofanelli S, Malaspina P, Novelletto
A.(2004). Y chromosomal haplogroup J as a signature of the post-neolithic
colonization of Europe. Hum Genet. 115:357-371.
Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin M and Freimer NB. (1994).
Mutational process of simple-sequence repeat loci in human populations. Proc Natl
Acad. Sci. 91:3166-3170.
Dong SL, Wang E, Hsie L, Cao YX, Chen XG, Gingeras TR.(2001). Flexible use of
high density oligonucleotide arrays for single nucleotide polymorphism discovery and
validation. Genome Res. 11:1418-1424.
Duru K, Farrow S, Wang JM, Lockette W and Kurtz T. (1994). Frequency of a
deletion polymorphism in the gene for angiotensin converting enzyme is increased in
African Americans with hypertension. Am J Hypertens. 7:759-762.
Edwards A, Civitello A, Hammond HA and Caskey CT.(1991). DNA typing and
genetic mapping with trimeric and tetrameric tandem repeats. Am J Hum Genet.
49:746-756.
Engels DW.(1981). Alexander the Great and the logistics of the Macedonian Army.
Berkeley, CA: University of California Press.
Epplen JT, McCarrey JR, Sutou S and Ohno S.(1982). Base sequences of a cloned
snake W-chromosome DNA fragment and identification of a male putative mRNA in
the mouse. Proc Natl Acad. Sci. USA 79:3798-3802.
Farjadian S, Naruse T, Kawata H, Ghaderi A, Bahram S, Inoko H.(2004).
Molecular analysis of HLA allele frequencies and haplotypes in Baloch of Iran
compared with related populations of Pakistan. Tissue Antigens. 6:581-587.
Feng Q, Moran JV, Kazazian HHJr and Boeke JD.(1996). Human L1 retrotransposon
encodes a conserved endonuclease required for retrotransposition. Cell 87:905-916.
Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW.
(2005). Discovery of human inversion polymorphisms by comparative analysis of
human and chimpanzee DNA sequence assemblies. PLoS Genet. 1:489-498.
Feuk L, Carson AR and Scherer SW.(2006). Structural variation in the human
genome. Nature Reviews Genetics 7:85-97.
Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith C, Underhill PA,
Ayub Q.(2007). Y-chromosomal evidence for a limited Greek contribution to the
Pathan population of Pakistan. Eur J Hum Genet.15:121-126.
Fisher EM, Beer-Romero P, Brown LG, Ridley A, McNeil JA, Lawrence JB, Willard
HF, Bieber FR, Page DC.(1990). Homologous ribosomal protein genes on the human
X and Y chromosomes: escape from X inactivation and possible implications for
Turner syndrome. Cell 63:1205-1218.
Flores C, Maca-Meyer N, Larruga JM, Cabrera VM, Karadsheh N, Gonzalez AM.
(2005). Isolates in a corridor of migrations: a high-resolution analysis of Y
chromosome variation in Jordan. J Hum Genet. 50: 435-441.
Francalacci P, Morelli L, Underhill PA, Lillie AS, Passarino G, Useli A, Madeddu R,
Paoli G, Tofanelli S, Calò CM, Ghiani ME, Varesi L, Memmi M, Vona G, Lin AA,
Oefner P, Cavalli-Sforza LL.(2003). Peopling of three Mediterranean islands
(Corsica, Sardinia, and Sicily) inferred by Y-chromosome biallelic variability. Am J
Phys Anthropol. 121:270-9.
Fuller D.(2003). An agricultural perspective on Dravidian historical linguistics:
archaeological crop packages, livestock and Dravidian crop vocabulary. In: Bellwood
P, Renfrew C (eds). Examining the farming/language dispersal hypothesis.
McDonald Institute for Archaeological Research, Cambridge, United Kingdom,
pp191-213.
Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJM,
Holden JH, Fenwick RG, Warren ST, Oostra BA, Nelson DL and Caskey CT. (1991).
Variation of the CGG repeat at the fragile X site results in genetic
instability: resolution of the Sherman paradox. Cell. 67:1047-1058.
Fu Y-H, Pizzuti A, Fenwick RGJr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser
GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, and
Caskey CT.(1992). An unstable triplet repeat in a gene related to myotonic muscular
dystrophy. Science. 255:1256-1258.
Gabunia L and Vekua A.(1995). A Plio-pleistocene hominid from Dmanisi, East
Georgia, Caucasus. Nature. 373:509-512.
Gayden T, Cadenas AM, Regueiro M, Singh NB, Zhivotovsky LA, Underhill PA,
Cavalli-Sforza LL and Herrera RJ.(2007). The Himalayas as a Directional Barrier to
Gene Flow. Am J Hum Genet. 80:884-894.
Gilbert N, Lutz-Prigge S and Moran JV.(2002). Genomic deletions created upon
LINE-1 retrotransposition. Cell 110:315-325.
Giles RE, Blanc H, Cann HM and Wallace DC.(1980). Maternal inheritance of human
mitochondrial DNA. Proc Natl Acad. Sci. USA 77:6715-6719.
Gill P, Ivanov PL, Kimpton C, Piercy R, Benson N, Tully G, Evett I, Hagelberg E and
Sullivan K.(1994). Identification of the remains of the Romanov family by DNA
analysis. Nat Genet. 6:130-135.
Goodier JL, Ostertag EM, Du K and Kazazian HH Jr.(2001). A novel active L1
retrotransposon subfamily in the mouse. Genome Res. 11:1677-1685. Erratum in:
Genome Res 11:1968.
Gonzalez E, Kulkarni H, Bolivar H, Mangano A, Sanchez R, Catano G, Nibbs RJ,
Freedman BI, Quinones MP, Bamshad MJ, Murthy KK, Rovin BH, Bradley W, Clark
RA, Anderson SA, O'Connell RJ, Agan BK, Ahuja SS, Bologna B, Sen L, Dolan MJ
and Ahuja SK.(2005). The Influence of CCL3L1 Gene-Containing Segmental
Duplications on HIV-1/AIDS Susceptibility. Science. 307:1434-1440.
Grimes BF.(1992). Ethnologue: Languages of the World, 12th ed., Summer
Institute of Linguistics, Inc., Dallas, Texas, USA.
Grimes B and Cooke H.(1998). Engineering mammalian chromosomes. Hum Mol
Genet. 7:1635-1640.
Grubb R and Laurell AB.(1956). Hereditary serological human serum groups. Acta
Pathol Microbiol Scand. 39:390-398.
Hacia JG, Fan J-B, Ryder O, Jin L, Edgemon K, Ghandour G, Mayer RA, Sun B,
Hsie L, Robbins CM, Brody LC, Wang D, Lander ES, Lipshutz R, Fodor SPA and
Collins FS.(1999). Determination of ancestral alleles for human single-nucleotide
polymorphisms using high-density oligonucleotide arrays. Nat Genet. 22:164-167.
Hacia JG and Collins FS.(1999). Mutational analysis using oligonucleotide
microarrays. J Med Genet. 36:730-736.
Hamada H and Kakunaga T.(1982). Potential Z-DNA forming sequences are highly
dispersed in the human genome. Nature. 298:396-398.
Hamada H, Petrino MG and Kakunaga T.(1982). A novel repeated element with Z-DNA
forming potential is widely found in evolutionarily diverse eukaryotic genomes.
Proc Natl Acad. Sci. USA 79:6465-6469.
Hamada H, Seidman M, Howard BH and Gorman CM.(1984). Enhanced gene
expression by the poly(dT-dG).poly(dC-dA) sequence. Mol Cell Biol. 4:2622-2630.
Hammer MF, Spurdle AB, Karafet T, Bonner MR, Wood ET, Novelletto A, Malaspina
P, Mitchell RJ, Horai S, Jenkins T and Zegura SL.(1997). The geographic
distribution of human Y chromosome variation. Genetics.145:787-805.
Hammer MF, Karafet TM, Rasanayagam A, Wood ET, Altheide TK, Jenkins T,
Griffiths RC, Templeton AR and Zegura SL.(1998). Out of Africa and back again:
Nested cladistic analysis of human Y chromosome variation. Mol Biol Evol. 15: 427-
441.
Hammer MF, Redd AJ, Wood ET, Bonner MR, Jarjanazi H, Karafet T, Santachiara-
Benerecetti S, Oppenheim A, Jobling MA, Jenkins T, Ostrer H and Bonne-Tamir
B.(2000). Jewish and Middle Eastern non-Jewish populations share a common pool
of Y-chromosome biallelic haplotypes. Proc Natl Acad Sci. 97: 6769-6774.
Hammer MF, Karafet TM, Redd AJ, Jarjanazi H, Santachiara-Benerecetti S,
Soodyall H, and Zegura SL. (2001). Hierarchical patterns of global human Y-
chromosome diversity. Mol Biol Evol. 18:1189-1203.
Hammer MF, Karafet TM, Park H, Omoto K, Harihara S, Stoneking M, and Horai
S.(2006). Dual origins of the Japanese: Common ground for hunter-gatherer and
farmer Y chromosomes. J Hum Genet. 51:47-58.
Harris H. (1966). Enzyme polymorphism in man. Proc R Soc Lond B Biol Sci.
22:298-310.
Hearn CM, Ghosh S and Todd JA.(1992). Microsatellites for linkage analysis of
genetic traits. Trends Genet. 8: 288-294.
Henikoff S, Ahmed K and Malik HS.(2001). The centromere paradox: Stable
inheritance with rapidly evolving DNA. Science. 293:1098-1102.
Hinds DA, Kloek AP, Jen M, Chen X and Frazer KA.(2006). Common deletions and
SNPs are in linkage disequilibrium in the human genome. Nat Genet. 38:82-85.
Hirszfeld L and Hirszfeld H.(1919). Serological differences between the blood of
different races: The results of researches on the Macedonian front. Lancet ii:675-679.
Horai S, Hayasaka K, Kondo R, Tsugane K and Takahata N.(1995). Recent African
origin of modern humans revealed by complete sequences of hominoid mitochondrial
DNAs. Proc Natl Acad. Sci. USA 92: 532-536.
Hudjashov G, Kivisild T, Underhill PA, Endicott P, Sanchez JJ, Lin AA, Shen P,
Oefner P, Renfrew C, Villems R and Forster P.(2007). Revealing the prehistoric
settlement of Australia by Y chromosome and mtDNA analysis. Proc Natl Acad. Sci.
USA 104:8726-8730.
Hughes-Buller R.(1991). Imperial Gazetteer of India, Provincial Series Balochistan,
Sang-e-Meel Publications, Lahore, Pakistan. pp 89-91.
Hurles ME, Nicholson J, Bosch E, Renfrew C, Sykes BC and Jobling MA. (2002). Y
chromosomal evidence for the origins of Oceanic-speaking peoples. Genetics.
160:289-303.
Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW and
Lee C.(2004). Detection of large-scale variation in the human genome. Nat Genet.
36:949-951.
Ibbetson D.(1883). Punjab Castes. Sang-e-Meel Publications, Lahore. pp 9-16.
Immervoll T, Loesgen S, Dütsch G, Gohlke H, Herbon N, Klugbauer S, Dempfle A,
Bickeböller H, Becker-Follmann J, Rüschendorf F, Saar K, Reis A, Wichmann H-E
and Wjst M.(2001). Fine mapping and single nucleotide polymorphism association
results of candidate genes for asthma and related phenotypes. Hum Mutat.
18:327-336.
International HapMap Consortium.(2005). A haplotype map of the human
genome. Nature. 437:1299-1320.
International Human Genome Sequencing Consortium.(2001). Initial sequencing
and analysis of the human genome. Nature 409:860-921.
International Human Genome Sequencing Consortium.(2004). Finishing the
euchromatic sequence of the human genome. Nature. 431:931-945.
Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA,
Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor
BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M,
Cann HM, Hardy JA, Rosenberg NA, Singleton AB.(2008). Genotype, haplotype and
copy-number variation in worldwide human populations. Nature 451:998-1003.
Jarrige J.(1991). Mehrgarh: Its place in the development of ancient cultures in
Pakistan. In: Forgotten Cities on the Indus: Early Civilization in Pakistan from the 8th
to the 2nd Millennium BC. (Eds. M Jansen, M Mulloy and G Urban). Verlag Philipp von
Zabern, Mainz, Germany. pp 34-50.
Jeffreys AJ, Wilson V and Thein SL.(1985). Individual-specific 'fingerprints' of
human DNA. Nature. 316:76-79.
Jeffreys AJ.(1987). Highly variable minisatellites and DNA fingerprints. Biochemical
Society Transactions 15:309-317.
Jeffreys AJ, Royle NJ, Wilson V and Wong Z.(1988). Spontaneous mutation rates to
new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature.
332:278-281.
Jeffreys AJ and Pena SD.(1993). Brief introduction to human DNA fingerprinting.
EXS. 67:1-20.
Jeng JR, Harn HJ, Jeng CY, Yueh KC and Shieh SM.(1997). Angiotensin I
converting enzyme gene polymorphism in Chinese patients with hypertension. Am J
Hypertens. 10: 558-561.
Jenkins S and Gibson N.(2002). High-throughput SNP genotyping. Funct
Genomics. 3:57-66.
Jobling MA and Tyler-Smith C. (2003). The human Y chromosome: An evolutionary
marker comes of age. Nat Rev Genet. 4:598-612.
Jorde LB, Bamshad MJ, Watkins WS, Zenger R, Fraley AE, Krakowiak PA,
Carpenter KD, Soodyall H, Jenkins T and Rogers AR.(1995). Origins and affinities of
modern humans: a comparison of mitochondrial and nuclear genetic data. Am J Hum
Genet. 57: 523-538.
Jurka J.(1997). Sequence patterns indicate an enzymatic involvement in integration
of mammalian retroposons. Proc Natl Acad Sci. USA 94:1872-1877.
Kajikawa M and Okada N.(2002). LINEs mobilize SINEs in the eel through a shared
3' sequence. Cell 111:433-444.
Kan YW and Dozy AM.(1978). Polymorphism of DNA sequence adjacent to human
β-globin structural gene: relationship to sickle mutation. Proc Natl Acad. Sci. USA
75:5631-5635.
Kapitonov V and Jurka J.(1996). The age of Alu subfamilies. J Mol Evol. 42:59-65.
Karafet T, Xu L, Du R, Wang W, Feng S, Wells RS, Redd AJ, Zegura SL and
Hammer MF.(2001). Paternal population history of East Asia: Sources, patterns, and
microevolutionary processes. Am J Hum Genet. 69:615-628.
Karafet TM, Osipova LP, Gubina MA, Posukh OL, Zegura SL, and Hammer MF.
(2002). High levels of Y-chromosome differentiation among native Siberian
populations and the genetic signature of a boreal hunter-gatherer way of life. Hum
Biol. 74: 761-789.
Karafet TM, Lansing JS, Redd AJ, Reznikova S, Watkins JC, Surata SP,
Arthawiguna WA, Mayer L, Bamshad M, Jorde LB, Hammer MF.(2005). Balinese Y-
chromosome perspective on the peopling of Indonesia: Genetic contributions from
pre-Neolithic hunter-gatherers, Austronesian farmers, and Indian traders. Hum Biol.
77: 93-114.
Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer
MF.(2008). New binary polymorphisms reshape and increase resolution of the
human Y chromosomal haplogroup tree. Genome Res. 18:830-838.
Kayser M, Roewer L, Hedman M, Henke L, Henke J, Brauer S, Krüger K, Krawczak
M, Nagy M, Dobosz T, Szibor R, de Knijff P and Sajantila A.(2000). Characteristics
and frequency of germline mutations at microsatellite loci from the human Y
chromosome, as revealed by direct observation in father/son pairs. Am J Hum
Genet. 66:1580-1588.
Kayser M, Brauer S, Cordaux R, Casto A, Lao O, Zhivotovsky LA, Moyse-Faurie C,
Rutledge RB, Schiefenhoevel W, Gil D, Lin AA, Underhill PA, Oefner PJ, Trent RJ
and Stoneking M.(2006). Melanesian and Asian origins of Polynesians: mtDNA and
Y chromosome gradients across the Pacific. Mol Biol Evol. 23: 2234-2244.
Kazazian HH Jr and Moran JV.(1998). The impact of L1 retrotransposons on the
human genome. Nat Genet. 19:19-24.
Kazazian HH Jr, Wong C, Youssoufian H, Scott AF, Phillips DG and Antonarakis
S.(1988). Haemophilia A resulting from de novo insertion of L1 sequences represents
a novel mechanism for mutation in man. Nature (London) 332:164-166.
Ke Y, Su B, Song X, Lu D, Chen L, Li H, Qi C, Marzuki S, Deka R, Underhill P, Xiao
C, Shriver M, Lell J, Wallace D, Wells RS, Seielstad M, Oefner P, Zhu D, Jin J,
Huang W, Chakraborty R, Chen Z and Jin L.(2001). African origin of modern
humans in East Asia: A tale of 12,000 Y chromosomes. Science 292: 1151-1153.
Kimmel M and Chakraborty R.(1996). Measures of variation at DNA repeat loci under
a generalized stepwise mutation model. Theor Pop Biol. 50:345-367.
King R and Underhill PA.(2002). Congruent distribution of Neolithic painted pottery
and ceramic figurines with Y-chromosome lineages. Antiquity 76:707-714.
King TE, Bowden GR, Balaresque PL, Adams SM, Shanks ME and Jobling MA.
(2007). Thomas Jefferson's Y chromosome belongs to a rare European lineage. Am
J Phys Anthropol. 132:583-589.
Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, Parik J, Metspalu E,
Adojaan M, Tolk HV, Stepanov V, Gölge M, Usanga E, Papiha SS, Cinnioğlu C, King
R, Cavalli-Sforza L, Underhill PA, Villems R.(2003). The genetic heritage of the
earliest settlers persists both in Indian tribal and caste populations. Am J Hum Genet.
72:313-332.
Klein RG.(1989). The Human Career: Human Biological and Cultural Origin.
Chicago: University of Chicago Press.
Knight A, Batzer MA, Stoneking M, Tiwari HK, Scheer WD, Herrera RJ, Deininger
PL.(1996). DNA sequences of Alu elements indicate a recent replacement of the
human autosomal genetic complement. Proc Natl Acad. Sci. USA 93: 4360-4364.
Knight A, Underhill PA, Mortensen HM, Zhivotovsky LA, Lin AA, Henn BM, Louis D,
Ruhlen M, Mountain JL.(2003). African Y chromosome and mtDNA divergence
provides insight into the history of click languages. Curr Biol. 13:464-473.
Korenberg JR and Rykowski MC.(1988). Human genome organization: Alu, LINEs,
and the molecular structure of metaphase chromosome bands. Cell 53:391-400.
Koschinsky ML, Boffa MB, Nesheim ME, Zinman B, Hanley AJG, Harris SB, Cao H
and Hegele RA.(2001). Association of a single nucleotide polymorphism in CPB2
encoding the thrombin-activable fibrinolysis inhibitor (TAFI) with blood pressure. Clin
Genet. 60:345-349.
Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST,
Schlessinger D, Sutherland GR, and Richards RI.(1991). Mapping of DNA instability
at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252:1711-1714.
Kruglyak S, Durrett RT, Schug MD, Aquadro CF.(1998). Equilibrium distributions of
microsatellite repeat length resulting from a balance between slippage events and
point mutations. Proc Natl Acad Sci U S A. 95:10774-10778.
Kruse PE Jr, and Patterson MK.(1973). Tissue Culture: Methods and application.
Academic Press, New York. pp 16-17.
Labuda D, Sinnett D, Richer C, Deragon JM and Striker G.(1991). Evolution of
mouse B1 repeats: 7SL RNA folding pattern conserved. J Mol Evol. 32:405-414.
Lahr MM and Foley RA.(1994). Multiple dispersals and modern human origins.
Evolutionary Anthropology. 3: 48-60.
Lahr MM and Foley RA.(1998). Towards a theory of modern human origins:
Geography, demography, and diversity in recent human evolution. Am J Phys
Anthropol. 41:137-176.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K,
Dewar K, Doyle M, Fitzhugh W, Funke R, Gage D, Harris K, Heaford A, Howland J,
Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP,
Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A,
Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers
J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A,
Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D,
Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A,
Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M,
Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra
MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl
MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS,
Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P,
Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher
E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley
KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL,
Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe
C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave
F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Rosenthal A, Platzer M,
Nyakatura G, Taudien S, Rump A, Yang HM, Yu J, Wang J, Huang GY, Gu J, Hood
L, Rowen L, Madan A, Qin SZ, Davis RW, Federspiel NA, Abola AP, Proctor MJ,
Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R,
Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M,
Schultz R, Roe BA, Chen F, Pan HQ, Ramser J, Lehrach H, Reinhardt R, McCombie
WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R,
Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge
CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler
EE, Furey TS, Galagan J, Gilbert JGR, Harmon C, Hayashizaki Y, Haussler D,
Hermjakob H, Hokamp K, Jang WH, Johnson LS, Jones TA, Kasif S, Kasprzyk A,
Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM,
McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G,
Schultz JR, Slater G, Smit AFA, Stupka E, Szustakowski J, Thierry-Mieg D,
Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh
RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A,
Morgan MJ and International Human Genome Sequencing Consortium.(2001). Initial
sequencing and analysis of the human genome. Nature 409:860-921.
Landsteiner K.(1901). Über Agglutinationserscheinungen normalen menschlichen
Blutes. Wien Klin Wochenschr. 14:1132-1134.
La Spada AR, Wilson AM, Lubahn DB, Harding AE and Fischbeck KH.(1991).
Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy.
Nature. 352:77-79.
Leakey R.(1994). The Origin of Humankind. Basic Books, A Division of
HarperCollins, New York.
Lichten MJ, Fox MS. (1983). Detection of non-homology-containing heteroduplex
molecules. Nucleic Acids Res. 11:3959-3971.
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM,
Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM.(2008). Worldwide human
relationships inferred from genome-wide patterns of variation. Science. 319:1100-4.
Li W-H, Gu Z, Wang H and Nekrutenko A.(2001). Evolutionary analyses of the
human genome. Nature. 409:847-849.
Lines M.(1999). The Kalasha people of North-western Pakistan. Peshawar,
Pakistan: Emjay Books International.
Litt M and Luty JA.(1989). A hypervariable microsatellite revealed by in vitro
amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am J
Hum Genet. 44:397-401.
Lucotte G and Ngo NY.(1985). p49f, A highly polymorphic probe, that detects Taq1
RFLPs on the human Y chromosome. Nucleic Acids Res.13:8285.
Ludwig E, Corneli PS, Anderson JL, Marshall HW, Lalouel JM, and Ward RH.
(1995). Angiotensin-converting enzyme gene polymorphism is associated with
myocardial infarction but not with development of coronary stenosis. Circulation
91:2120-2124.
Luis JR, Rowold DJ, Regueiro M, Caeiro B, Cinnioğlu C, Roseman C, Underhill PA,
Cavalli-Sforza LL, Herrera RJ.(2004). The Levant versus the Horn of Africa:
evidence for bidirectional corridors of human migrations. Am J Hum Genet. 74:532-
44.
Luning Prak ET, Dodson AW, Farkash EA, Kazazian HH Jr.(2003). Tracking an
embryonic L1 retrotransposition event. Proc Natl Acad Sci U S A. 100:1832-7.
Malik HS, Burke WD and Eickbush TH. (1999). The age and evolution of non-LTR
transposable elements. Mol Biol Evol. 16:793-805.
Maniatis T, Fritsch EF and Sambrook J.(1982). Molecular cloning: A laboratory
manual. Cold Spring Harbor Laboratory, Cold Spring Harbor. New York.
Mansoor A, Mazhar K, Khaliq S, Hameed A, Rehman S, Siddiqi S, Papaioannou M,
Cavalli-Sforza LL, Mehdi SQ, Ayub Q.(2004). Investigation of the Greek ancestry of
populations from northern Pakistan. Hum Genet.114:484-90.
Marri MKBB.(1985). Search lights on Baloch and Balochistan. 3rd Edition. Nisa
traders, Quetta, Pakistan.
Marshall A and Hodgson J.(1998). DNA chips: An array of possibilities. Nature
Biotechnology 16:27-31.
Mathias SL, Scott AF, Kazazian HH Jr, Boeke JD and Gabriel A.(1991). Reverse
transcriptase encoded by a human transposable element. Science. 254:1808-1810.
McAlpin DW.(1974). Towards proto-Elamo-Dravidian. Language. 50:89-101.
McAlpin DW.(1981). Proto-Elamo-Dravidian: the evidence and its implications.
Trans Am Phil Soc. 71:3-155.
McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S,
Gabriel SB, Lee C, Daly MJ, Altshuler DM and The International HapMap
Consortium.(2006). Common deletion polymorphisms in the human genome. Nat
Genet. 38:86-92.
McClay JL, Sugden K, Koch HG, Higuchi S and Craig IW.(2002). High-throughput
single nucleotide polymorphisms genotyping by fluorescent competitive allele-specific
polymerase chain reaction (SNiPTag). Anal Biochem. 301:200-206.
Mehdi SQ.(2007). "Genetics of Pakistani Populations in an Asian and Global
Context", in Cavalli-Sforza, L.L. and Feldman, M. (eds), Human Population Genetics:
Evolution and Variation, The Biomedical & Life Sciences Collection, Henry Stewart
Talks Ltd, London. (online at http://hstalks.com/bio).
Meselson M and Yuan R. (1968). DNA restriction enzyme from E. coli. Nature
217:1110-1114.
Mhlanga MM and Malmberg L.(2001). Using Molecular Beacons to Detect
Single-Nucleotide Polymorphisms with Real-Time PCR. Methods. 25:463-471.
Miesfeld R, Krystal M and Arnheim N.(1981). A member of a new repeated
sequence family which is conserved throughout eucaryotic evolution is found
between the human δ and β globin genes. Nucl. Acids Res. 9:5931-5948.
Mohyuddin A, Ayub Q, Underhill PA, Tyler-Smith C and Mehdi SQ.(2006).
Detection of novel Y SNPs provides further insights into Y chromosomal variation in
Pakistan. J Hum Genet. 51:375-378.
Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA
and Moran JV.(2002). DNA repair mediated by endonuclease-independent LINE-1
retrotransposition. Nat Genet. 31:159-165.
Mountain JL and Cavalli-Sforza LL.(1994). Inference of human evolution through
cladistic analysis of nuclear DNA restriction polymorphisms. Proc Natl Acad. Sci.
USA 91: 6515-6519.
Myers JS, Vincent BJ, Udall H, Watkins WS, Morrish TA, Kilroy GE, Swergold GD,
Henke J, Henke L, Moran JV, Jorde LB and Batzer MA.(2002). A comprehensive
analysis of recently integrated human Ta L1 elements. Am. J. Hum Genet. 71: 312-
326.
Nakamura Y, Leppert M, O'Connell P, Wolff R, Holm T, Culver M, Martin C, Fujimoto
E, Hoff M, Kumlin E, and White R.(1987). Variable number of tandem repeat (VNTR)
markers for human gene mapping. Science. 235:1616-1622.
Nanavutty P.(1997). The Parsis. National Book Trust, New Delhi, India.
Nasidze I, Sarkisian T, Kerimov A and Stoneking M. (2003). Testing hypotheses of
language replacement in the Caucasus: evidence from the Y-chromosome. Hum
Genet. 112:255-261.
Nasidze I, Ling EYS, Quinque D, Dupanloup I, Cordaux R, Rychkov S, Naumova O,
Zhukova O, Sarraf-Zadegan N, Naderi GA, Asgary S, Sardas S, Farhud DD,
Sarkisian T, Asadov C, Kerimov A, Stoneking M.(2004). Mitochondrial DNA and Y-
chromosome variation in the Caucasus. Ann Hum Genet. 68:205-221.
Nebel A, Filon D, Brinkmann B, Majumder PP, Faerman M, Oppenheim A. (2001).
The Y chromosome pool of Jews as part of the genetic landscape of the Middle East.
Am. J. Hum. Genet. 69:1095-1112.
Awde N and Sarwan A.(2003). Pashto Dictionary & Phrasebook: Pashto-English,
English-Pashto. Hippocrene Books. ISBN 078180972X. Retrieved 10 January 2007.
Oakey R, Tyler-Smith C.(1990). Y chromosome DNA haplotyping suggests that most
European and Asian men are descended from one of two males. Genomics. 7:325-330.
Oefner PJ and Underhill PA.(1995). Comparative DNA sequencing by denaturing high
performance liquid chromatography (DHPLC). Am J Hum Genet. 57:A266.
Olivo PD, Van de Walle MJ, Laipis PJ and Hauswirth WW.(1983). Nucleotide
sequence evidence for rapid genotypic shifts in the bovine mitochondrial DNA D-
loop. Nature. 306:400-402.
Orita M, Iwahana H, Kanazawa H, Hayashi K and Sekiya T.(1989). Detection of
polymorphisms of human DNA by gel electrophoresis as single-strand conformation
polymorphisms. Proc Natl Acad. Sci. USA 86:2766-2770.
Ostertag EM and Kazazian HH Jr. (2001). Twin priming: a proposed mechanism for
the creation of inversions in L1 retrotransposition. Genome Res. 11:2059-2065.
Ostertag EM, DeBerardinis RJ, Goodier JL, Zhang Y, Yang N, Gerton GL and
Kazazian HHJr. (2002). A mouse model of human L1 retrotransposition. Nat Genet.
32:655-660.
Pakistan Economic Survey.(2006-2007). An accountancy publication
www.accountancy.com.pk.
Pandya A, King TE, Santos FR, Taylor PG, Thangaraj K, Singh L, Jobling MA,
Tyler-Smith C.(1998). A polymorphic human Y-chromosomal G to A transition found in
India. Ind J Hum Genet. 4:52-61.
Passarino G, Semino O, Quintana-Murci L, Excoffier L, Hammer M and
Santachiara-Benerecetti AS.(1998). Different genetic components in the Ethiopian population,
identified by mtDNA and Y-chromosome polymorphisms. Am J Hum Genet.62:420-
434.
Passarino G, Semino O, Magri C, Al-Zahery N, Benuzzi G, Quintana-Murci L,
Andellnovic S, Bullc-Jakus F, Liu A, Arslan A, Santachiara-Benerecetti AS.(2001).
The 49a,f haplotype 11 is a new marker of the EU19 lineage that traces migrations
from northern regions of the Black Sea. Hum Immunol 62:922-32. Erratum in: Hum
Immunol 62:1313-14.
Passarino G, Cavalleri GL, Lin AA, Cavalli-Sforza LL, Børresen-Dale AL, Underhill
PA.(2002). Different genetic components in the Norwegian population revealed by
the analysis of mtDNA and Y chromosome polymorphisms. Eur J Hum Genet.
10:521-529.
Payne R, Tripp M, Weigle J, Bodmer W and Bodmer J.(1964). A new leukocyte
isoantigen system in man. Cold Spring Harb Symp Quant Biol. 29:285.
Perez-Lezaun A, Calafell F, Mateu E, Comas D, Ruiz-Pacheco R and Bertranpetit
J.(1997). Microsatellite variation and the differentiation of modern humans. Hum
Genet. 99:1-7.
Prak ET and Kazazian HH Jr. (2000). Mobile elements and the human genome. Nat
Rev Genet. 1:134-144.
Qamar R, Ayub Q, Khaliq S, Mansoor A, Karafet T, Mehdi SQ and Hammer MF.
(1999). African and Levantine origins of Pakistani YAP+ Y chromosomes. Hum Biol.
71:745-755.
Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, Zerjal T,
Tyler-Smith C and Mehdi SQ. (2002). Y-chromosomal DNA variation in Pakistan. Am J
Hum Genet. 70:1107-1124.
Qi XQ, Bakht S, Devos KM, Gale MD and Osbourn A. (2001). L-RCA (ligation-rolling
circle amplification): a general method for genotyping of single nucleotide
polymorphisms (SNPs). Nucleic Acids Res. 29:U68-U74.
Quddus SA.(1990). A Tribal Balochistan. Ferozsons (Pvt.) Ltd., Lahore, Pakistan.
Queller DC, Strassmann JE and Colin RH.(1993). Microsatellites and kinship. Tree
8:285-288.
Quintana-Murci L, Semino O, Minch E, Passarino G, Brega A and
Santachiara-Benerecetti AS.(1999a). Further characteristics of proto-European Y chromosomes.
Eur J Hum Genet. 7:603-8.
Quintana-Murci L, Semino O, Poloni ES, Liu A, Van Gijn M, Passarino G, Brega A,
Nasidze IS, Maccioni L, Cossu G, al-Zahery N, Kidd JR, Kidd KK and Santachiara-
Benerecetti AS.(1999b). Y-chromosome specific YCAII, DYS19 and YAP
polymorphisms in human populations: a comparative study. Ann Hum Genet. 63:153-
166.
Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K and
Santachiara-Benerecetti AS. (1999c). Genetic evidence of an early exit of Homo
sapiens sapiens from Africa through eastern Africa. Nat Genet. 23:437-441.
Quintana-Murci L, Krausz C, Zerjal T, Sayar SH, Hammer MF, Mehdi SQ, Ayub Q,
Qamar R, Mohyuddin A, Radhakrishna U, Jobling MA, Tyler-Smith C and
McElreavey K.(2001). Y-Chromosome Lineages Trace Diffusion of People and
Languages in Southwestern Asia. Am J Hum Genet. 68:537-542.
Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, Scozzari R, Rengo C,
Al-Zahery N, Semino O, Santachiara-Benerecetti AS, Coppa A, Ayub Q, Mohyuddin
A, Tyler-Smith C, Qasim Mehdi S, Torroni A, McElreavey K. (2004). Where west
meets east: the complex mtDNA landscape of the southwest and Central Asian
corridor. Am J Hum Genet. 74:827-45.
Ramana GV, Su B, Jin L, Singh L, Wang N, Underhill PA, Chakraborty R.(2001). Y
chromosome SNP haplotypes suggest evidence of gene flow among caste, tribe, and
the migrant Siddi populations of Andhra Pradesh, South India. Eur J Hum Genet.
9:695-700.
Ramsay G. (1998). DNA chips: state of the art. Nat Biotechnol. 16:40-44.
Raynolds MV, Bristow MR, Bush EW, Abraham WT, Lowes BD, Zisman LS, Taft
CS, and Perryman MB. (1993). Angiotensin-converting enzyme DD genotype in
patients with ischaemic or idiopathic dilated cardiomyopathy. Lancet 342:1073-1075.
Regueiro M, Cadenas AM, Gayden T, Underhill PA and Herrera RJ. (2006). Iran:
Tricontinental nexus for Y-chromosome driven migration. Hum Hered. 61:132-143.
Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero
MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs
M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R,
Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J,
Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad
DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW
and Hurles ME. (2006). Global variation in copy number in the human genome.
Nature. 444:444-454.
Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, Marszalek JD,
Pyntikova T, van der Veen F, Skaletsky H, Page DC and Rozen S. (2006). High
mutation rates have driven extensive structural polymorphism among human Y
chromosomes. Nat Genet. 38:463-467.
Renfrew C. (1987). Archaeology and language: the puzzle of Indo-European origins.
Jonathan Cape, London.
Richards RI, Holman K, Yu S and Sutherland GR. (1993). Fragile X syndrome
unstable element, p(CCG)n, and other simple tandem repeat sequences are binding
sites for specific nuclear proteins. Hum Mol Genet. 2:1429-1435.
Rightmire GP. (1989). Middle Stone Age humans from eastern and southern Africa. In:
P Mellars and CB Stringer (eds): The Human Revolution. Edinburgh: Edinburgh
University Press, pp. 109-122.
Robertson GS. (1896). The Kafirs of the Hindu-Kush. Oxford University Press,
Karachi, Pakistan.
Roberts RJ and Murray K. (1976). Restriction endonucleases. CRC Crit Rev
Biochem. 4:123-164.
Roewer L, Krawczak M, Willuweit S, Nagy M, Alves C, Amorim A, Anslinger K,
Augustin C, Betz A, Bosch E, Cagliá A, Carracedo A, Corach D, Dekairelle AF,
Dobosz T, Dupuy BM, Füredi S, Gehrig C, Gusmão L, Henke J, Henke L, Hidding M,
Hohoff C, Hoste B, Jobling MA, Kärgel HJ, de Knijff P, Lessig R, Liebeherr E, Lorente
M, Martínez-Jarreta B, Nievas P, Nowak M, Parson W, Pascali VL, Penacino G,
Ploski R, Rolf B, Sala A, Schmidt U, Schmitt C, Schneider PM, Szibor R, Teifel-
Greding J, Kayser M. (2001). Online reference database of European Y-
chromosomal short tandem repeat (STR) haplotypes. Forensic Sci Int. 118:106-113.
Rootsi S, Magri C, Kivisild T, Benuzzi G, Help H, Bermisheva M, Kutuev I, Barać L,
Peričić M, Balanovsky O, Pshenichnov A, Dion D, Grobei M, Zhivotovsky LA,
Battaglia V, Achilli A, Al-Zahery N, Parik J, King R, Cinnioğlu C, Khusnutdinova E,
Rudan P, Balanovska E, Scheffrahn W, Simonescu M, Brehm A, Goncalves R, Rosa
A, Moisan JP, Chaventre A, Ferak V, Füredi S, Oefner PJ, Shen P, Beckman L,
Mikerezi I, Terzić R, Primorac D, Cambon-Thomsen A, Krumina A, Torroni A,
Underhill PA, Santachiara-Benerecetti AS, Villems R and Semino O. (2004).
Phylogeography of Y-chromosome haplogroup I reveals distinct domains of
prehistoric gene flow in Europe. Am J Hum Genet. 75:128-137.
Rootsi S, Zhivotovsky LA, Baldovic M, Kayser M, Kutuev IA, Khusainova R,
Bermisheva MA, Gubina M, Fedorova SA, Ilumäe AM, Khusnutdinova EK, Voevoda
MI, Osipova LP, Stoneking M, Lin AA, Ferak V, Parik J, Kivisild T, Underhill PA and
Villems R. (2007). A counter-clockwise northern route of the Y-chromosome
haplogroup N from Southeast Asia towards Europe. Eur J Hum Genet. 15:204-211.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA and
Feldman MW.(2002). Genetic structure of human populations. Science. 298:2381-
2385.
Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos W,
Armenteros M, Arroyo E, Barbujani G, Beckman G, Beckman L, Bertranpetit J, Bosch
E, Bradley DG, Brede G, Cooper G, Côrte-Real HB, de Knijff P, Decorte R, Dubrova
YE, Evgrafov O, Gilissen A, Glisic S, Gölge M, Hill EW, Jeziorowska A, Kalaydjieva
L, Kayser M, Kivisild T, Kravchenko SA, Krumina A, Kucinskas V, Lavinha J, Livshits
LA, Malaspina P, Maria S, McElreavey K, Meitinger TA, Mikelsaar AV, Mitchell RJ,
Nafa K, Nicholson J, Nørby S, Pandya A, Parik J, Patsalis PC, Pereira L, Peterlin B,
Pielberg G, Prata MJ, Previderè C, Roewer L, Rootsi S, Rubinsztein DC, Saillard J,
Santos FR, Stefanescu G, Sykes BC, Tolun A, Villems R, Tyler-Smith C, Jobling
MA. (2000). Y-chromosomal diversity in Europe is clinal and influenced primarily by
geography, rather than by language. Am J Hum Genet. 67:1526-1543.
Royle NJ, Clarkson RE, Wong Z, Jeffreys AJ. (1988). Clustering of hypervariable
minisatellites in the proterminal regions of human autosomes. Genomics. 3:352-360.
Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D, Yi C,
Kreuziger J, Baldi P and Wallace DC. (2007). An enhanced MITOMAP with a global
mtDNA mutational phylogeny. Nucleic Acids Res. 35:D823-D828.
Ruvolo ME, Zehr S, von Dornum M, Pan D, Chang B and Lin J.(1993).
Mitochondrial COII sequences and modern human origins. Mol Biol Evol 10:1115-
1135.
Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA and Arnheim N.
(1985). Enzymatic amplification of beta-globin genomic sequences and restriction
site analysis for diagnosis of sickle cell anemia. Science 230:1350-1354.
Sanchez JJ, Hallenberg C, Borsting C, Hernandez A, Morling N. (2005). High
frequencies of Y chromosome lineages characterized by E3b1, DYS19-11, DYS392-
12 in Somali males. Eur J Hum Genet. 13:856-866.
Santos FR, Pandya A, Kayser M, Mitchell RJ, Liu A, Singh L, Destro-Bisol G,
Novelletto A, Qamar R, Mehdi SQ, Adhikari R, de Knijff P and Tyler-Smith C. (2000).
A polymorphic L1 retroposon insertion in the centromere of the human Y
chromosome. Hum Mol Genet. 9:421-430.
Sassaman DM, Dombroski BA, Moran JV, Kimberland ML, Naas TP, DeBerardinis
RJ, Gabriel A, Swergold GD and Kazazian HH Jr. (1997). Many human L1 elements
are capable of retrotransposition. Nat Genet. 16:37-43.
Scheinfeldt L, Friedlaender F, Friedlaender J, Latham K, Koki G, Karafet T, Hammer
M, and Lorenz J. (2006). Unexpected NRY chromosome variation in Northern Island
Melanesia. Mol Biol Evol. 23:1628-1641.
Schunkert H, Hense HW, Holmer SR, Stender M, Perz S, Keil U, Lorell BH, and
Riegger GA. (1994). Association between a deletion polymorphism of the
angiotensin-converting enzyme gene and left ventricular hypertrophy. N Engl J Med.
330:1634-1638.
Schurr TG, Maggi WR, Fowler K, Wallace DC. (2000). The ethnic origins of an
enigmatic south Asian population, the Kalasha of northern Pakistan, as revealed by
mtDNA variation. Am J Hum Genet. 67:217.
Scozzari R, Torroni A, Semino O, Sirugo G, Brega A and Santachiara-Benerecetti
AS. (1988). Genetic studies on the Senegal population and mitochondrial DNA
polymorphism. Am J Hum Genet. 43:534-544.
Scozzari R, Cruciani F, Santolamazza P, Malaspina P, Torroni A, Sellitto D, Arredi B,
Destro-Bisol G, De Stefano G, Rickards O, Martinez-Labarga C, Modiano D, Biondi
G, Moral P, Olckers A, Wallace DC and Novelletto A. (1999). Combined use of
biallelic and microsatellite Y-chromosome polymorphisms to infer affinities among
African populations. Am J Hum Genet. 65:829-846.
Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H,
Walker M, Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC,
Trask B, Patterson N, Zetterberg A and Wigler M. (2004). Large-scale copy number
polymorphism in the human genome. Science 305:525-528.
Seielstad M, Yuldasheva N, Singh N, Underhill P, Oefner P, Shen P and Wells
RS. (2003). A novel Y-chromosome variant puts an upper limit on the timing of first
entry into the Americas. Am J Hum Genet. 73:700-705.
Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA,
Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder
PP, Underhill PA.(2006). Polarity and temporality of high-resolution Y-chromosome
distributions in India identify both indigenous and exogenous expansions and reveal
minor genetic influence of Central Asian pastoralists. Am J Hum Genet. 78:202-221.
Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, De
Benedictis G, Francalacci P, Kouvatsi A, Limborska S, Marcikiæ M, Mika A, Mika B,
Primorac D, Santachiara-Benerecetti AS, Cavalli-Sforza LL, Underhill PA. (2000).
The genetic legacy of Paleolithic Homo sapiens sapiens in extant Europeans: a Y
chromosome perspective. Science. 290:1155-1159.
Semino O, Santachiara-Benerecetti AS, Falaschi F, Cavalli-Sforza LL, and Underhill
PA. (2002). Ethiopians and Khoisan share the deepest clades of the human Y-
chromosome phylogeny. Am J Hum Genet. 70:265-268.
Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, Battaglia V, Maccioni L,
Triantaphyllidis C, Shen P, Oefner PJ, Zhivotovsky LA, King R, Torroni A, Cavalli-
Sforza LL, Underhill PA and Santachiara-Benerecetti AS. (2004). Origin, diffusion,
and differentiation of Y-chromosome haplogroups E and J: Inferences on the
neolithization of Europe and later migratory events in the Mediterranean area. Am J
Hum Genet. 74:1023-1034.
Serre D and Hudson TJ. (2006). Resources for genetic variation studies. Annu
Rev Genomics Hum Genet. 7:443-457.
Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM,
Clark RA, Schwartz S, Segraves R, Oseroff VV, Albertson DG, Pinkel D and Eichler
EE. (2005). Segmental duplications and copy-number variation in the human genome.
Am J Hum Genet. 77:78-88.
Shen MR, Batzer MA and Deininger PL. (1991). Evolution of the master Alu
gene(s). J Mol Evol. 33:311-320.
Shi H, Dong YL, Wen B, Xiao CJ, Underhill PA, Shen PD, Chakraborty R, Jin L, and
Su B.(2005). Y-chromosome evidence of southern origin of the East Asian-specific
haplogroup O3-M122. Am J Hum Genet 77: 408-419.
Shriver MD, Jin L, Chakraborty R and Boerwinkle E. (1993). VNTR allele-frequency
distributions under the stepwise mutation model: a computer simulation approach.
Genetics. 134:983-993.
Shriver MD, Jin L, Ferrell RE and Deka R. (1997). Microsatellite data support an
early population expansion in Africa. Genome Res. 7:586-591.
Sims LM, Garvey D and Ballantyne J. (2007). Sub-populations within the major
European and African derived haplogroups R1b3 and E3a are differentiated by
previously phylogenetically undefined Y-SNPs. Hum Mutat. 28:97.
Smit AF. (1996). The origin of interspersed repeats in the human genome. Curr
Opin Genet Dev. 6:743-748.
Smit AF. (1999). Interspersed repeats and other mementos of transposable
elements in mammalian genomes. Curr Opin Genet Dev. 9:657-663.
Stefansson H, Helgason A, Thorleifsson G, Steinthorsdottir V, Masson G, Barnard J,
Baker A, Jonasdottir A, Ingason A, Gudnadottir VG, Desnica N, Hicks A, Gylfason A,
Gudbjartsson DF, Jonsdottir GM, Sainz J, Agnarsson K, Birgisdottir B, Ghosh S,
Olafsdottir A, Cazier JB, Kristjansson K, Frigge ML, Thorgeirsson TE, Gulcher JR,
Kong A and Stefansson K. (2005). A common inversion under selection in
Europeans. Nat Genet. 37:129-137.
Strachan T and Read AP.(2004). Human Molecular Genetics, 3rd ed. Garland
Science, London and New York.
Stringer CB and Andrews P. (1988). Genetic and fossil evidence for the origin of
modern humans. Science. 239:1263-1268.
Stringer C. (2000). Palaeoanthropology. Coasting out of Africa. Nature 405:24-27.
Swallow DM, Gendler S, Griffiths B, Corney G, Taylor-Papadimitriou J and
Bramwell ME. (1987). The human tumour-associated epithelial mucins are coded by
an expressed hypervariable gene locus PUM. Nature. 328:82-84.
Swisher CC 3rd, Curtis GH, Jacob T, Getty AG, Suprijo A, Widiasmoro. (1994). Age
of the earliest known hominids in Java, Indonesia. Science 263:1118-1121.
Su B, Xiao J, Underhill P, Deka R, Zhang W, Akey J, Huang W, Shen D, Lu D, Luo J,
Chu J, Tan J, Shen P, Davis R, Cavalli-Sforza L, Chakraborty R, Xiong M, Du R,
Oefner P, Chen Z, Jin L. (1999). Y-chromosome evidence for a northward migration
of modern humans into eastern Asia during the last Ice Age. Am J Hum Genet.
65:1718-1724.
Su B, Jin L, Underhill P, Martinson J, Saha N, McGarvey ST, Shriver MD, Chu J,
Oefner P, Chakraborty R and Deka R. (2000). Polynesian origins: Insights from the
Y chromosome. Proc Natl Acad Sci USA. 97:8225-8228.
Sun C, Skaletsky H, Rozen S, Gromoll J, Nieschlag E, Oates R and Page DC. (2000).
Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by
recombination between HERV15 proviruses. Hum Mol Genet. 9:2291-2296.
Tattersall I. (1997). Out of Africa again... and again? Sci Am. 276:60-67.
Tautz D. (1989). Hypervariability of simple sequences as a general source for
polymorphic DNA markers. Nucleic Acids Res. 17:6463-6471.
Thangaraj K, Singh L, Reddy AG, Rao VR, Sehgal SC, Underhill PA, Pierson M,
Frame IG, and Hagelberg E. (2003). Genetic affinities of the Andaman Islanders, a
vanishing human population. Curr Biol. 13:86-93.
Thanseem I, Thangaraj K, Chaubey G, Singh VK, Bhaskar LV, Reddy BM, Reddy
AG, Singh L. (2006). Genetic affinities among the lower castes and tribal groups of
India: inference from Y chromosome and mitochondrial DNA. BMC Genet. 7:42.
The ENCODE Project Consortium. (2007). Identification and analysis of functional
elements in 1% of the human genome by the ENCODE pilot project. Nature.
447:799-816.
Thomas MG, Bradman N, Flinn HM. (1999). High throughput analysis of 10
microsatellite and 11 diallelic polymorphisms on the human Y-chromosome. Hum
Genet. 105:577-581.
Tishkoff SA, Dietzsch E, Speed W, Pakstis AJ, Kidd JR, Cheung K, Bonné-Tamir B,
Santachiara-Benerecetti AS, Moral P and Krings M. (1996). Global patterns of linkage
disequilibrium at the CD4 locus and modern human origins. Science. 271:1380-
1387.
Todd JA, Aitman TJ, Cornall RJ, Ghosh S, Hall JRS, Hearne CM, Knight AM, Love
JM, McAleer MA, Prins J-B, Rodrigues N, Lathrop M, Pressey A, DeLarato NH,
Peterson LB and Wicker LS. (1991). Genetic analysis of autoimmune type 1
diabetes mellitus in mice. Nature. 351:542-547.
Toth G, Gaspari Z, and Jurka J. (2000). Microsatellites in different eukaryotic
genomes: survey and analysis. Genome Res. 10:967-981.
Treco D and Arnheim N. (1986). The evolutionarily conserved repetitive sequence
d(TG.AC)n promotes reciprocal exchange and generates unusual recombinant
tetrads during yeast meiosis. Mol Cell Biol. 6:3934-3947.
Tsunoda K, Sanke T, Nakagawa T, Furuta H and Nanjo K. (2001). Single nucleotide
polymorphism (D68D, T to C) in the syntaxin 1A gene correlates to age at onset and
insulin requirement in Type II diabetic patients. Diabetologia 44:2092-2097.
Turner G, Barbulescu M, Su M, Jensen-Seaman MI, Kidd KK and Lenz J. (2001).
Insertional polymorphism of full-length endogenous retroviruses in humans. Curr
Biol. 11:1531-1535.
Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H,
Albertson D, Pinkel D, Olson MV and Eichler EE.(2005). Fine-scale structural
variation of the human genome. Nat Genet. 37:727-732.
Ullu E and Tschudi C.(1984). Alu sequences are processed 7SL RNA genes.
Nature 312:171-172.
Underhill PA, Jin L, Lin AA, Mehdi SQ, Jenkins T, Vollrath D, Davis RW, Cavalli-
Sforza LL and Oefner PJ. (1997). Detection of numerous Y chromosome biallelic
polymorphisms by denaturing high-performance liquid chromatography. Genome
Res. 7:996-1005.
Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E, Bonné-
Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR, Mehdi SQ,
Seielstad MT, Wells RS, Piazza A, Davis RW, Feldman MW, Cavalli-Sforza LL and
Oefner PJ. (2000). Y chromosome sequence variation and the history of human
populations. Nat Genet. 26:358-361.
Underhill PA, Passarino G, Lin AA, Shen P, Mirazon Lahr M, Foley RA, Oefner PJ,
and Cavalli-Sforza LL. (2001). The phylogeography of Y chromosome binary
haplotypes and the origins of modern human populations. Ann Hum Genet. 65:43-62.
Valdes AM, Slatkin M and Freimer NB. (1993). Allele frequencies at microsatellite loci:
the stepwise mutation model revisited. Genetics. 133:737-749.
Verkerk AJMH, Pieretti M, Sutcliffe JS, Fu Y-H, Kuhl DPA, Pizzuti A, Reiner O,
Richards S, Victoria MF, Zhang F, Eussen BE, van Ommen G-JB, Blonden LAJ,
Riggins GJ, Chastain JL, Kunst CB, Galjaard H, Caskey CT, Nelson DL, Oostra BA
and Warren ST. (1991). Identification of a gene (FMR-1) containing a CGG repeat
coincident with a breakpoint cluster region exhibiting length variation in fragile X
syndrome. Cell. 65:905-914.
Walls EV and Crawford DH. (1987). Generation of lymphoblastoid cell lines using
Epstein-Barr virus. In: Lymphocytes: A Practical Approach. Ed. Klaus GGB. IRL
Press, Oxford. pp 157.
Walter RC, Buffler RT, Bruggemann JH, Guillaume MM, Berhe SM, Negassi B,
Libsekal Y, Cheng H, Edwards RL, von Cosel R, Néraudeau D and Gagnon
M. (2000). Early human occupation of the Red Sea coast of Eritrea during the last
interglacial. Nature. 405:65-69.
Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G, Perkins
N, Winchester E, Spencer J, Kruglyak L, Stein L, Linda H, Topaloglou T, Hubbell E,
Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C,
Rozen S, Hudson TJ, Lipshutz R, Chee M and Lander ES.(1998). Large-Scale
Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the
Human Genome. Science. 280:1077-1082.
Watkins WS, Ricker CE, Bamshad MJ, Carroll ML, Nguyen SV, Batzer MA,
Harpending HC, Rogers AR, Jorde LB.(2001). Patterns of ancestral human diversity:
an analysis of Alu insertion and restriction-site polymorphisms. Am. J. Hum Genet.
68:738-752.
Watson JD and Crick FHC. (1953). A structure for deoxyribose nucleic acid.
Nature. 171:737-738.
Weale ME, Yepiskoposyan L, Jager RF, Hovhannisyan N, Khudoyan A, Burbage-
Hall O, Bradman N, Thomas MG. (2001). Armenian Y chromosome haplotypes
reveal strong regional structure within a single ethno-national group. Hum
Genet. 109:659-674.
Webster MT, Smith NG, Ellegren H. (2002). Microsatellite evolution inferred from
human-chimpanzee genomic sequence alignments. Proc Natl Acad Sci USA
99:8748-8753.
Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J, Jin L,
Su B, Pitchappan R, Shanmugalakshmi S, Balakrishnan K, Read M, Pearson NM,
Zerjal T, Webster MT, Zholoshvili I, Jamarjashvili E, Gambarov S, Nikbin B, Dostiev
A, Aknazarov O, Zalloua P, Tsoy I, Kitaev M, Mirrakhimov M, Chariev A, Bodmer
WF. (2001). The Eurasian heartland: a continental perspective on Y-chromosome
diversity. Proc Natl Acad Sci USA 98:10244-10249.
Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ,
Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski
JR, Chinault C, Song XZ, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M,
Weinstock GM, Gibbs RA, Rothberg JM. (2008). The complete genome of an
individual by massively parallel DNA sequencing. Nature. 452:872-876.
Wilson IJ, Balding DJ. (1998). Genealogical inference from microsatellite data.
Genetics 150:499-510.
Wolpert S.(2000). A new history of India. Oxford University Press, New York.
Wood ET, Stover DA, Ehret C, Destro-Bisol G, Spedini G, McLeod H, Louie L,
Bamshad M, Strassmann BI, Soodyall H and Hammer MF. (2005). Contrasting
patterns of Y chromosome and mtDNA variation in Africa: Evidence for sex-biased
demographic processes. Eur J Hum Genet 13:867-876.
Wong Z, Wilson V, Patel I, Povey S, Jeffreys AJ. (1987). Characterization of a panel
of highly variable minisatellites cloned from human DNA. Ann Hum Genet. 51(Pt
4):269-288.
Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, Xu J, Du R, Fu S, Li P, Hurles ME, Yang H
and Tyler-Smith C. (2006). Male demography in East Asia: a north-south contrast
in human population expansion times. Genetics. 172:2431-2439.
Y Chromosome Consortium. (2002). A nomenclature system for the tree of human
Y-chromosomal binary haplogroups. Genome Res. 12:339-348.
Yoshino T, Takeyama H and Matsunaga T. (2001). Single nucleotide polymorphism
analysis using a bacterial magnetic particle microarray. Electrochemistry 69:1008-
1012.
Youil R, Kemper BW, and Cotton RG. (1995). Screening for mutations by enzyme
mismatch cleavage with T4 endonuclease VII. Proc Natl Acad Sci USA 92:87-91.
Zegura SL, Karafet TM, Zhivotovsky LA and Hammer MF. (2004). High-resolution
SNPs and microsatellite haplotypes point to a single, recent entry of Native American
Y chromosomes into the Americas. Mol Biol Evol. 21:164-175.
Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, Qamar R, Ayub Q,
Mohyuddin A, Fu S, Li P, Yuldasheva N, Ruzibakiev R, Xu J, Shu Q, Du R, Yang H,
Hurles ME, Robinson E, Gerelsaikhan T, Dashnyam B, Mehdi SQ, Tyler-Smith C.
(2003). The genetic legacy of the Mongols. Am J Hum Genet. 72:717-721.
Zhang F, Su B, Zhang YP and Jin L. (2007). Genetic studies of human diversity in
East Asia. Phil Trans R Soc B 362:987-995.
Zhivotovsky LA, Bennett L, Bowcock AM and Feldman MW. (2000). Human
population expansion and microsatellite variation. Mol Biol Evol. 17:757-767.
Zhivotovsky L, Underhill P, Cinnioğlu C, Kayser M, Morar B, Kivisild T, Scozzari R,
Cruciani F, Destro-Bisol G and Spedini G. (2004). The effective mutation rate at Y
chromosome short tandem repeats, with application to human population-
divergence time. Am J Hum Genet. 74:50-61.
APPENDIX
Appendix I: List of Y-SNPs analyzed along with their primer sequences and PCR amplification conditions used in this study.
SNO. Y-SNP GENOTYPING METHOD PRIMER DESIGNATION PRIMER SEQUENCE ANNEALING TEMPERATURE (°C)
1 Apt AFLP TEK E TGG ATT GCA TTC AAC TTC ACT TAC 65.5
TEK G CTG AGT TCA AAT GCT CGG GTC TC
2 LLY22g AFLP LLY22gF CCA CCC AGT TTT ATG CAT TTG 55
LLY22gR ATA GAT GGC GTC TTC ATG AGT
3 L1Y PCR L1YF GCA CAA TGT GCA CAT GTA CCC TA
L1YR TGA TGT GTG CAT TCA TCT CAT ATA T
4 M6 DHPLC M6 F CAC TAC CAC ATT TCT GGT TGG 63, 56
M6 R CGC TGA GTC CAT TCT TTG AG
5 M8 Sequencing M8 F CCC ACC CAC TTC AGT ATG AA 56
M8 R AGG CTG ACA GAC AAG TCC AC
6 M9 AFLP M9F GCA GCA TAT AAA ACT TTC AGG 55
M9R AAA ACC TAA CTT TGC TCA AGC
7 M11 AFLP M11R TTC ATC ACA AGG AGC ATA AAC AA 55
M11F CCC TCC CTC TCT CCT TGT ATT CTA CC
8 M12 ARMS PCR M12 F ACT AAA ACA CCA TTA GAA ACA AAG G 57
M12Nor R AGC AAC ATA GTG ACC CCC AAC
M12Mut R GCA ACA TAG TGA CCC CCA AA
9A M17 AFLP M17F GTG GTT GCT GGT TGT TAC GT 60
M17R AGC TGA CCA CAA ACT GAT GTA GA
9B M17 ARMS M17FN TTG CTG GTT GTT ACG GGG 60
M17FM GTTG CTG GTT GTT ACG GGT
M17R GCT ATT CTT GTT TCT CCA GGC
10 M20 AFLP M20F GAT TGG GTG TCT TCA GTG CT 60
M20R CAC ACA ACA AGG CAC CAT C 58
11 M25 DHPLC M25 F AAA GCG AGA GAT TCA ATC CAG 63, 56
M25R TTT TAG CAA GTT AAG TCA CCA GC
12 M27 ARMS-PCR M27 F CGG AAG TCA AAG TTA TAG TTA CTG G 65
M27RNL TAT AGG AAT CGA GGT TCA GGT CAG
M27 RMT TAT AGG AAT CGA GGT TCA GGT CAC
13 M31 DHPLC M31 F GAA CC AGA CAA TAC GAA ATA GAA G 63, 56
M31 R TTT AGC GGC TTA TCT CAT TAC C
14 M32 DHPLC M32 F TTG AAA AAA TAC AGT GGA AC 63, 56
M32 R CAA GTG TTT AAG GAT ACA GA
15 M35 ARMS-PCR M35 FN ATT TTC CTT TGG GAC ACT AG 58
M35 FM ATT TTC CTT TGG GAC ACT AC
M35 R AGA GGG AGC AAT GAG GAC A
16 M36 DHPLC M36 F AGA TCA TCC CAA AAC AAT CAT AA 63, 56
M36 R AAG GCT GAA ATC AAT CCA ATC TG
17 M38 Sequencing M38 F CAG TTT TTA GAG AAT AAT GTC CT 63, 56
M38 R TTA AAG AAA AGA AAA GCA GAT G
18 M45 DHPLC M45F GCT GGC AAG ACA CTT CTG AG 63, 56
M45R AAT ATG TTC CTG ACA CCT TCC
19 M48 ARMS-PCR M48 FN TGA CAA TTA GGA TTA AGA ATA TTA TA
M48 FM TGA CAA TTA GGA TTA AGA ATA TTA TG
M48R AAA ATT CCA AGT TTC AGT GTC ACA TA
20 M50 DHPLC M50 F CGG CAA CAG TGA GGA CAG T 63, 56
M50 R TGC TTC AGG AGA TAG AGG CTC
21 M52 ARMS-PCR M52FC TAT CGG CCT CCT GAG TAC CTG 60
M52RG CAA GAA ACC TAT CAA ACA TCC G
M52FM CAA GAA ACC TAT CAA ACA TCC TC
22 M56 ARMS PCR M56R TCT CAT TGC TGC CTC TCT TTA 55
M56FNL GCA ATG GGA GGA TTA CGA CA
M56FMT GCA ATG GGA GGA TTA CGA CT
23 M60 DHPLC M60 F GCA CTG GCG TTC ATC ATC T 63, 56
M60 R ATG TTC ATT ATG GTT CAG GAG G
24 M62 ARMS-PCR M62 FNL GGA ATT AAT TAT TTC TCT TTC TCA T 54
M62 FMT GGA ATT AAT TAT TTC TCT TTC TCA C
M62 R TGG TGG CAT GTG CCT GTG TT
25 M67 ARMS-PCR M67 F CCA TAT TCT TTA TAC TTT CTA CCT 55
M67 RNL TCG TGG ACC CCT CTA TAC A
M67 RMT TCG TGG ACC CCT CTA TAC T
26 M69 DHPLC M69 F GGT TAT CAT AGC CCA CTA TAC TTT G 63, 56
M69 R ATC TTT ATT CCC TTT GTC TTG CT
27 M70 ARMS-PCR M70 FNL GGA CTC ATG TCT CCA TGA GTA 58
M70 FMT GGA CTC ATG TCT CCA TGA GTC
M70 R ATC TTT ATT CCC TTT GTC TTG CT
28 M73 DHPLC M73 F CAG AAT AAT AGG AGA ATT TTT GGT 63, 56
M73 R ATT TTC CTT ATT TTC TAA GCA GC
29 M74 DHPLC M174 F ATG CTA TAA TAA CTA GGT GTT GAA G 63, 56
M174 R AAT TCA GCT TTT ACC ACT TCT GAA
30 M76 DHPLC M76 F TAG AAG TAG CAG ATT GGG AGA GG 63, 56
M76 R CCT GAT AAA ATG AAA AAA ATG GTC
31 M78 ARMS-PCR M78 F TGG TTC TCC ACT ACA GGA GA 61
M78 RN ATT TTG AAA TAT TTG GAA GGG TG
M78RM TAT TTT GAA ATA TTT GGA AGG GTA
32 M82 DHPLC M82 F CTG TAC TCC TGG GTA GCC TGT 63, 56
M82 R AAG AAC GAT TGA ACA CAC TAA CTC
33 M87 DHPLC M87 F TCC CAT TAT TTG CTA TAT TTG CT 55
M87 RNL AAC AAG CTG GCA TCA GAA TAT AA
M87RMT CAA GCT GGC ATC AGA ATA TAG
34 M88 Sequencing M88 F ATT CTA GGG TCA GGC AAC TAG G 63, 56
M88 R TGT TTG TTC TAT TCT ATG GTC TTC C
35 M89 ARMS-PCR M89 F AGA AGC AGA TTG ATG TCC CAC T 62
M89 RNL AAC TCA GGC AAA GTG AGA GAA G
M89 RMT AAC TCA GGC AAA GTG AGA GAA A
36 M91 DHPLC M91F GAG CTT GGA CTT TAG GAC GG 63, 56
M91R AAA CTT TAA GGC ACT TCT GGC
37 M92 ARMS-PCR M92 F GGC CTT ATA AGA TTG GCA TAC 62
M92 RNL CTA AAT ACT GTT GGA GCC TAT A
M92 RMT CTA AAT ACT GTT GGA GCC TAT G
38 M97 DHPLC M97 F GTT GCC CTC TCA CAG AGC AC 63, 56
M97R AAG GTC ACT GGA AGG ATT GC
39 M101 DHPLC M101 F TCA CAG CAG CTT CAG CAA A 63, 56
M101 R ATA AAA ATT AGA CTC TGT GTT ACT AGC
40 M103 DHPLC M103 F CAG TAA GTG AAC TCA CAC ATA ATT CC 63, 56
M103 R CCA GTT TTA TTT CAG TTT CAC AGC
41 M109 DHPLC M109 F GGG TAT CAA AAT GTC TTC AAC CT 63, 56
M109 R GGG AAT TTC CTG CTA CTT GC
42 M110 Sequencing M110F CAG GGA AGG ACC GTA AAA GG 63, 56
M110 R ATG TTT ATC ATG TGC AGT AAA GGT T
43 M111 Sequencing M111 F AAT CTT CTG CAA AGG GTT CC 63, 56
M111 R CAG CTA CAA AAC AAA ATA CTG GAC
44 M117 DHPLC M117 F AAG TAT GAC TTA TGA AGT ACG AAG AAA 63, 56
M117 R ATT CAG TTA GAT TTT ACA ATG AGC A
45 M119 DHPLC M119 F GAA TGC TTA TGA ATT TCC CAG A 63, 56
M119 R TTC ACA CAA TAT ACA AGA TGT ATT CTT
46 M122 ARMS-PCR M122FN AAT TGA GAT ACT AAT TCA T 50
M122FM AAT TGA GAT ACT AAT TCA C
M122R AAA ACT TTA TCA TAT TGA G
47 M123 ARMS-PCR M123 F CAG CGA ATT AGA TTT TCT TGC 58
M123RN GTA TCT GAA CTA GCA TAT CTG
M123RM AGT ATC TGA ACT AGC ATA TCT A
48 M124 ARMS-PCR M124 F TGC CTT TTG GAA ATG AAT AAA TC 60
M124 N ACA AAC TCA GTA TTA TTA AAC CG
M124 R ACA AAC TCA GTA TTA TTA AAC CA
49 M133 DHPLC M133 F TGA AAT GGA AAT CAA TAA ACT CAG T 63, 56
M133 R CCT TTT CTT TTT CTT TAA CCC TTC
50 M134 DHPLC M134 F AGA ATC ATC AAA CCC AGA AGG 63, 56
M134 R TCT TTG GCT TCT CTT TGA ACA G
51 M136 DHPLC M136 F ATG TGA AGA CAA CAC TGT GTG G 63, 56
M136 R TTG TGG TAG TCT TAG TTC TCA TGG
52 M143 DHPLC M143 F ATG CTA TAA TAA CTA GGT GTT GAA G 63, 56
M143 R AAT TCA GCT TTT ACC ACT TCT GAA
53 M147 Sequencing M147 F GTA TTC TGG GGC AAT TTT AGG 94-63-56-72 94-56-72
M147 R TTG ATA CAA GAG GTT ATT TTA AGC A 0.5 °C decrease/cycle
54 M148 DHPLC M148 F AAC AGA ATT ATC AGG AAA AGG TTT 63, 56
M148 R TTT TAC TTG TTC GTG TAC TTT CAA
55 M150 DHPLC M150 F GCA GTG GAG ATG AAG TGA GAC 63, 56
M150 R CCT ACT TTC CCC CTC TTC TG
56 M152 DHPLC M152 F AAG CTA TTT TGG TTT CTT TCA 63, 56
M152 R GCC TTG TGT GGG TAT GAT TG
57 M157 DHPLC M157F GCT GGC AAG ACA CTT CTG A 55
M157RNL ACC AAA GGT CAT TTG TGG AT
M157RMT CCA AAG GTC ATT TGT GGA G
58 M170 ARMS-PCR M170 N TAT TTA CTT AAA AAT CAT TGT TCA 56
M170FCmutant TAT TTA CTT AAA AAT CAT TGT TCC
M170 Rnormal CTT TTT TCA GTT CTT CAT CAG TTA
59 M172 ARMS-PCR M172 FNL CCC AAA CCC ATT TTG ATG CTA T 61
M172 FMT CCC AAA CCC ATT TTG ATG CTA G
M172 R TCA CAG TGG ATC CAT CTT CAC T
60 M173 ARMS-PCR M173 N AAT TCA AGG GCA TTT AGA ACA
M173 FC AAT TCA AGG GCA TTT AGA ACC 56
M173R TAT CTG GCA TCC GTT AGA AAA G 55
61 M175 Sequencing M175 F TTG AGC AAG AAA AAT AGT ACC CA 94-63-56-72 94-56-72
M175 R CTC CAT TCT TAA CTA TCT CAG GGA 0.5 °C decrease/cycle
62 M177 Sequencing M177 F TTT AAC ATT GAC AGG ACC AG 94-63-56-72 94-56-72
M177 R GTG TTG GTT CTC CTG TAA AG 0.5 °C decrease/cycle
63 M185 DHPLC M185 F GGA GTA CCT ATC ACT GAA TGT GC 63, 56
M185 R GTC ATT CAT TTC TGC TTG GAA C
64 M193 DHPLC M193 F GCC TGG ATG AGG AAG TGA G 63, 56
M193 R GCC TTC TCC ATT TTT GAC CT
65 M201 ARMS PCR M201 FN AAT AAT CCA GTA TCA ACT GAG AG 56
M201 FM TAA TAA TCC AGT ATC AAC TGA GAT
M201 R GTT CTG AAT GAA AGT TCA AAC GT
66 M207 ARMS-PCR M207 FN TAA GTC AAG CAA GAA ATT TTA 56
M207 FD TAA GTC AAG CAA GAA ATT TTG 52
M207 R CAA AAT TCA CCA AGA ATC CTT G
67 M214 ARMS-PCR M214 F CAA GCG TAG AGG TAT TAC TAC AA 66
M214RNL TGA GAC ACT GTC TGA AAA CAA TA
M214 RMT TGA GAC ACT GTC TGA AAA CAA TG
68 M217 Sequencing M217 F GCT TAT TTT TAG TCT CTC TTC CAT 63, 56
M217 R ACC TGT TGA ATG TTA CAT TTC TTT
69 M218 DHPLC M218 F TTG TGA GTT TTT TTC CAT CAA TC 63, 56
M218 R TTT ATT GAC GAT GGT ATT AGA AGA G
70 M231 DHPLC M231F CCT ATT ATC CTG GAA AAT GTG G 63, 56
M231R ATT CCG ATT CCT AGT CAC TTG G
71 M242 ARMS-PCR M242 F AAC TCT TGA TAA ACC GTG CTG 61
M242 RNL CAC GTT AAG ACC AAT GCC ATG
M242 RMT CAC GTT AAG ACC AAT GCC ATA
72 M267 ARMS-PCR M267 F TTA TCC TGA GCC GTT GTC C
M267 RNL CCA CAC AAA ATA CTG AAC GAT 62
M267 RMT CCA CAC AAA ATA CTG AAC GAC 58
73 M317 DHPLC M317 F TGG TTC TAC AGT TGG GAT TTT G 63, 56
M317 R CCT TAA TAA CCG AGG CAC AA
74 M343 ARMS M343 F TTT AAC CTC CTC CAG CTC TG
M343RNL CCA CAT ATC TCC AGG TCT AG
M343RMT CCA CAT ATC TCC AGG TCT AT
75 M349 ARMS M349 F TGG GAT TAA AGG TGC TCA TG 58
M349RN CCT AAG GTC AGA AAG TTT TAA C
M349 RM CCT AAG GTC AGA AAG TTT TAA A
76 M357 DHPLC M357 F CCC CGT TTT TTC CTC TCT GCC 63, 56
M357 R CAC GTA ACC TGG GAT GGT CAT A
77 P15 DHPLC P15F AGA GAG TTT TCT AAC AGG GCG 63, 56
P15R TGG GAA TCA CTT TTG CAA CT
78 P31 Sequencing P31 F TAA GGC TGC GTG TTC CCT AT 63, 56
P31 R GCA CTG TCA CTG TGG ATG TT
79 PK1 AFLP PK1 F TCA ACT TTC TTA AAT GAT TGT ACG TT
PK1 R TCT GTT CAG GAG AAC CTC TAT GG
80 PK2 ARMS-PCR PK2 F TGT GTC CTG GTG TCT TTT GG 67
PK2 RN GGT GTA CAA AAT AGT TTT TGT TTT TGA TCT AA
PK2 RM GGT GTA CAA AAT AGT TTT TGT TTTT GAT CTC G
81 PK3 ARMS-PCR PK3 F TGT GTC CTG GTG TCT TTT GG 68
PK3 N AAA GCC ACC ATC TCA AGA TGG TGT ACT A
PK3 M AAA GCC ACC ATC TCA AGA TGG TGT ACT G
82 PK4 DHPLC PK4 F CCA TCC TCC CAT GGC TAG T 63, 56
PK4 R GCT TCC AAG GTG CCC TTT AT
83 PK5 AFLP PK5 F TTC CAA ACA CAT GCT TCT GC 58.5
PK5 R TAA AAA GGA GGA GGG ACT GC
84 RPS4Y AFLP RPS4Y L CCA CAG AGA TGG TGT GGG TA 61
RPS4Y R GAG TGG GAG GGA CTG TGA GA
85 SRY+465 AFLP SRY13 GCC GAA GAA TTG CAG TTT 58
SRY14 GTT GAT GGG CGG TAA GTG GC
86 SRY1532 AFLP SRY1 TCC TTA GCA ACC ATT AAT CTG G 60
SRY2 AAA TAG CAA AAA ATG ACA CAA GGC
87 SRY2627 AFLP SRY-2627 F CGC GGC TTT GAA TTT CAA GCT CTG 63
SRY-2627 R TAA GAG TCC CTC GGG GCC CTG G
88 SRY8299 AFLP SRY8299 R ACA GCA CAT TAG CTG GTA TGA C
SRY8299 F TCT CTT TAT GGC AAG ACT TAC G
89 sY81 AFLP SY810.1 AGG CAC TGG TCA GAA TGA AG 56
SY810.2 AAT GGA AAA TAC AGC TCC CC
90 TAT AFLP TAT 1 GAC TCT GAG TGT AGA CTT GTG A 60
TAT 3 GAA GGT GCC GTA AAA GTG TGA A
91 YAP PCR YAP 1 CAG GGG AAG ATA AAG AAA TA 59
YAP 2 ACT GCT AAA AGG GGA TGG AT
92 12f2 PCR 12F2 F TCT TCT AGA ATT TCT TCA CAG AAT TG 59
12F2 D CTG ACT GAT CAA AAT GCT TAC AGA TC
93 92R7 AFLP 92R7 L GCC TAT CTA CTT CAG TGA TTT CT 62
92R7 L (R) GAC CCG CTG TAG ACC TGA CT
92R7 A TGC ATG AAC ACA AAA GAC GTA 65
92R7 B GCA TTG TTA AAT ATG ACC AGC
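For the ARMS-PCR entries above, each allele-specific primer pair is intended to differ at its 3'-terminal base, so that only the primer matching a given allele is extended. The following is a minimal Python sketch, not part of the original study's workflow, for checking that property against sequences copied from this table. The helper names `three_prime_mismatches` and `differs_only_at_3prime_end` are hypothetical. Pairs are aligned at their 3' ends, which accommodates primers such as the M12 pair whose listed lengths differ by one 5' base.

```python
"""Sketch: confirm that ARMS allele-specific primers differ only at the 3' end.

Sequences are copied from Appendix I; helper names are illustrative only.
"""


def three_prime_mismatches(primer_a: str, primer_b: str) -> list:
    """Align two primers at their 3' ends and list mismatching positions.

    Each entry is (position, base_in_a, base_in_b), where position 1 is the
    3'-terminal base. Only the overlapping length is compared, so a primer
    carrying an extra 5' base still aligns correctly.
    """
    a = primer_a.replace(" ", "").upper()
    b = primer_b.replace(" ", "").upper()
    overlap = min(len(a), len(b))
    return [
        (pos, a[-pos], b[-pos])
        for pos in range(1, overlap + 1)
        if a[-pos] != b[-pos]
    ]


def differs_only_at_3prime_end(primer_a: str, primer_b: str) -> bool:
    """True when the 3'-terminal base is the single mismatch (ARMS design)."""
    positions = [pos for pos, _, _ in three_prime_mismatches(primer_a, primer_b)]
    return positions == [1]


# Normal and mutant allele-specific primers as listed in Appendix I.
ARMS_PAIRS = {
    "M12": ("AGC AAC ATA GTG ACC CCC AAC", "GCA ACA TAG TGA CCC CCA AA"),
    "M35": ("ATT TTC CTT TGG GAC ACT AG", "ATT TTC CTT TGG GAC ACT AC"),
    "M62": ("GGA ATT AAT TAT TTC TCT TTC TCA T",
            "GGA ATT AAT TAT TTC TCT TTC TCA C"),
}

if __name__ == "__main__":
    for snp, (normal, mutant) in ARMS_PAIRS.items():
        print(snp, three_prime_mismatches(normal, mutant),
              differs_only_at_3prime_end(normal, mutant))
```

Any additional mismatch the check reports for a pair would point either to a deliberate destabilising mismatch in the primer design or to a transcription error in the table. The sequences themselves would have to be checked before drawing either conclusion.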
[Figure: Y-chromosome binary haplogroup tree with the markers defining each branch. This portion covers haplogroups T through I2b2 and is annotated with the geographic regions Oceania & Indonesia, Eurasia, America, Australasia, Europe, New Guinea, Indus Valley, Asia, and Mediterranean & Levant.]
M284 I2b1
NORTH EUROPE
VI
P215 I2b*
M161
M26 I2a2a
P41.2=M359 I2a2*
P37.2
I2a1
I2a*
P259
I1d
P109
P19,M170, P38, M258, P212, U179 I1c
M253, M307, P30, P40, M450 M72
M227 I1b1
I1b*
M21
I1a
I1*
I*
I*
P266 H2b
APT P80 H2a
M370 H2*
H1b
M39,M138 H1a3
M97
M69 M52 M82 H1a2
M36, M197
INDIA
H1a1
H1a*
H1*
H
M283
G3
M377
G2c
G2 c
M287
G2b
G2 b
M286
G2a2
P17, P18 G2a1a
P287 P15 P16
1*
G2a1*
G2a
G2a*
G2*
G2 *
EURASIA
M201,P257 P76
G1b
P20
M285, M342 G1a
G1*
G*
M427, M428
F2
F 2
M282
F1
F1
F
P9, M168, M294 P258
E2b1a2
M200 P45
E2b1a1
M85 E2b1a*
M54, M90, M98
E2b1*
M75, P68 E 2b*
E2b*
M41 2a
E2a
E
2*
E2*
E
P75
E1b2
M329
E1b1c
P72
E1b1b1f
V6
E1b1b1e
M281
E1b1b1d
M290
E1b1b1c1b
M34 M84, M136
E1b1b1c1a
M123
E1b1b1c1*
E1b1b1c*
M165, M183 E1b1b1b2
M81 M107
E1b1b1b1
E1b1b1b*
V65
E1b1b1a4
V19
M35 E1b1b1a3b
V22 M148 E1b1b1a3a
E1b1b1a3*
P65 E1b1b1a2b
V13, V36 V27
E1b1b1a2a
E1b1b1a2*
M78 V32 E1b1b1a1b
V12 M224 E1b1b1a1a
E1b1b1a1*
M215
E1b1b1a*
III
E1b1b1*
E1b1b*
P268, P269
E1b1a9
P59
E1b1a8a12
U181
AFRICA & MIDDLE EAST

U209, P277, P278 U290 E1b1a8a1a
E1b1a8a1*
U175
E1b1a8a*
E1b1a8*
P113 E1b1a7a3a
P116
E1b1a7a3*
M2, P1, M180=P88, P46, P182 P115
U174 E1b1a7a2
P189, P211, P293 P9.2
E1b1a7a1
M191, U186, U247
E1b1a7a*
P177
E1b1a7*
M10, M66, M156, M195 E1b1a6
SRY4064, M96, P29, P150, P152, P54, P155, P156, P162, P168, P169, P170, P171, P172, P173, P174, P175, P176, (SRY-8289=M40)
P147 M155 E1b1a5
M154 E1b1a4
SRY10831.1, M42, M94, M139
M149 E1b1a3
M116.2 E1b1a2
P2, P179, P180, P181 M58
E1b1a1
DYS391P E1b1a*
E1b1*
YAP (M1) M145 M203 P110 E1a2
M33, M132 M44 E1a1
E1a*
E*
E*
P99 P47 D3a*
D3*
P120
D2a3
M151 D2a2
P53.2 D2a1b1
M116.1 (022457)
D2a1b*
M174 M125 P12 D2a1a1
P42
IV
JAPAN
D2a1a*
(021355)
M55, M57, M64.1, M179, P37.1, P41.1, P190, 12f2b D2a1*
D2a*
D2*
N2
N1 D1a1
M15 D1a*
D1*
D*
M401
P55
C6
P92 C5a
M356
C5*
M210 C4a
M347
C4*
P62
C3e
P53.1 C3d
M48, M77, M86 C3c
M39 C3b
V
M407
M217, Pk2, P44 C3a2
M93
ASIA & AMERICA

RPS4Y711 (M130), M216, P184, P255, P260 M255, M325 C3a1
C3a*
C3*
P54
C2a2
M208 P33 C2a1
M38
C2a*
C2*
P121 C1a
M8, M105, M131, P122
C1*
C*
P112 B2c
MSY2.1, M211 B2b4b
P7 P8, P70 B2b4a
B2b4*
M108.2 B2b3a
M30, M129
M112, M192, 50f2(P) B2b3*
M115, M169 B2b2
P6
II
M182 b1
B2b1
B2
b*
B2b*
B2
M108.1 P111, M43 B2a2
B2a2aa
B2a2*
M150 M109, M152, P32, P50 B2a1a
M218
M60, M181, P85, P90
B 2a1*
B2a*
B 2*
M146 B1a
M236, M288
AFRICA
B1*
M118
A3b2b
M13, M63, M127, M202, M219,M305 M171
A3b2a
A3b2*
M144, M190, M220, P289 P71, P102
A3b1a
M51, P100, P291
A3b1
M32
A3b*
M28, M59
A3a
P262 A2c
I
P28
M6, M14, M23, M49, M71, M135, M141, M196, M206, M212, MEH1, P3, P4, P5, P36.1, Pk1, P247, P248 A2b
M114
M91 P97 A2a
A2*
P114
A1b
P108 M31, P82
A1a
A1*
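The haplogroup labels in the tree appear to follow the Y Chromosome Consortium naming scheme. Each name extends its parent's name with alternating numbers and lowercase letters (R, R1, R1a, R1a1). A trailing asterisk marks a paragroup: chromosomes that carry the parent clade's defining marker but none of the markers of its named subclades. The following is a minimal Python sketch that recovers the chain of ancestral clades from a label. The function names `lineage` and `is_paragroup` are hypothetical. The validation pattern is a simplifying assumption and does not handle marker equivalences such as M173=P241.

```python
import re

# Simplifying assumption: a label is one capital letter followed by alternating
# numbers and single lowercase letters (e.g. R1a1, J2a13), optionally ending in "*".
_LABEL = re.compile(r"[A-Z](?:\d+[a-z])*\d*")
_TOKEN = re.compile(r"[A-Z]|\d+|[a-z]")


def lineage(label: str) -> list[str]:
    """Return the clade names from the root letter down to `label` itself."""
    core = label.rstrip("*")
    if not _LABEL.fullmatch(core):
        raise ValueError(f"not a YCC-style haplogroup label: {label!r}")
    tokens = _TOKEN.findall(core)
    # Each prefix of the token list names an ancestral clade.
    return ["".join(tokens[: i + 1]) for i in range(len(tokens))]


def is_paragroup(label: str) -> bool:
    """A trailing '*' marks chromosomes not assigned to any named subclade."""
    return label.endswith("*")


if __name__ == "__main__":
    print(lineage("R1a1"))  # ['R', 'R1', 'R1a', 'R1a1']
    print(lineage("E1b1b1a2*"))
    print(is_paragroup("E1b1b1a2*"))  # True
```

Multi-digit branch numbers such as J2a13 parse as one step (J, J2, J2a, J2a13). This mirrors how the tree places J2a13 as a direct subclade of J2a.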