
ANALYSIS OF Y-CHROMOSOME POLYMORPHISMS IN

PAKISTANI POPULATIONS

Thesis submitted to the Sindh Institute of Medical Sciences

for the degree of Doctor of Philosophy.

BY

Sadaf Firasat

Centre of Human Genetics and Molecular Medicine

Sindh Institute of Medical Sciences

Sindh Institute of Urology and Transplantation (SIUT)

Karachi, Pakistan

2010
TABLE OF CONTENTS

Title page

Acknowledgements

List of Tables

List of Figures

Summary

Introduction

Literature Review

Materials and Methods

Results

Phylogeography of Pakistani ethnic groups

Comparison between the Pakistani and Greek populations

Discussion

Comparison within Pakistan

Comparison between the Pakistani and Greek populations

Comparison with world populations

Insights into population origins

Conclusions

References

Appendix

ACKNOWLEDGEMENT

I thank Prof. Dr. Syed Qasim Mehdi H.I. S.I., for his support, encouragement
and for providing all the facilities for doing scientific work in his laboratory.
The work presented in this thesis was done under the supervision of Dr.
Qasim Ayub T.I. It is a great pleasure for me to acknowledge the keen interest, advice,
patient guidance and kindness that I have received from him during the course of this
work.
I would like to thank Dr. Shagufta Khaliq (PoP) for teaching me the molecular
genetics laboratory techniques, and Dr. Aiysha Abid for her comments on this manuscript
and suggestions for its improvement.
I am also grateful to Mrs. Ambreen Ayub for her help in making the contour
map.
I thank my colleague Ms. Sadia Ajaz for her help and cooperation in proof
reading the thesis.
It has been an honor for me to work at SIUT and I thank Prof. Dr Adeeb Rizvi
H.I. S.I., Director, SIUT, for his constant support and guidance.
Finally, I would like to thank my parents; without their love and support the
completion of this work would not have been possible.

LIST OF TABLES

Table Title

I. The possible origins and language affinities of Pakistani populations.

II. A list of Y haplogroups, markers, type of polymorphism and
genotyping methods used in this study. Y haplogroups were
determined in a hierarchical manner, screening initially with markers
that identified deep lineages (bold) and subsequently genotyping
markers that further delineated the tree in the target population. The
typing methods were amplified fragment length polymorphism (AFLP),
denaturing high performance liquid chromatography (DHPLC),
amplification refractory mutation system polymerase chain reaction
(ARMS-PCR) or dideoxy DNA sequencing (Seq).

III. List of SNPs typed by the AFLP method.

IV. Y-STR primer sequences.

V. Frequency of haplogroups B*, C*, E* and F* in ethnic groups from
Pakistan.

VI. Number and frequencies of populations that fall in haplogroup B-T.

VII. Y lineages found in the three Punjabi castes examined in this study.

VIII. Percentage of variation obtained by AMOVA at three levels of
population hierarchy in ethnic groups from Pakistan.

IX. Population pairwise FSTs between Pakistani ethnic groups computed
from Y haplogroup frequencies. FST p values (based upon 110
permutations) are given above the diagonal, with * indicating significant
pairwise differences.

X. Matrix of significant FST p values (significance level = 0.0500) based
upon 110 permutations among the ethnic groups of Pakistan.

XI. Weighted population pairwise genetic distances (below diagonal)
and FST values (above diagonal) based on STR variation within
haplogroups.

XII. Description of world populations.

XIII. Y-STR data of clade B lineages in Pakistani and African populations.
LIST OF FIGURES

Figure Title

I. Map of Pakistan showing its neighbors, administrative regions
and the geographical distribution of the populations that are
included in this study.

II. Phylogenetic tree.

III. Distribution of haplogroups B*, C*, E* and F* in populations
from northern and southern Pakistan.

IV. Y haplogroup frequency distribution in ethnic groups of
Pakistan.

V. Frequency distribution of major Y lineages (PK2, M52, M67, M27)
in Pakistan.

VI. Frequency distribution of major Y lineages (M357, M173, M17 and M124)
in Pakistan.

VII. Principal component analysis based on Y haplogroup
frequencies in Pakistani populations.

VIII. Median-joining network of lineage L individuals based on Y
STR haplotypes.

IX. A rooted maximum-parsimony tree of Y lineages found in the
Greek, Burusho, Kalash, Pathan and Pakistani populations.

X. A plot of the first two principal coordinates based upon the
analysis of Y haplogroup frequencies in Pakistani and Greek
populations.

XI. A plot of the first two principal coordinates based upon the
analysis of Y haplogroup frequencies in Pakistani and Greek
samples (1 = this study; 2 = Francalacci et al., 2003) using
comparable biallelic markers.

XII. Neighbor-joining tree showing the relationship between the
Greek and three Pakistani ethnic groups. The tree is based
on genetic distances.

XIII. Median-joining network of clade E lineages in Pakistan (open
circles) and Greece (hatched circles). Circles represent
haplotypes and have an area proportional to frequency. The
Pathan individuals are shown in black.

XIV. Contour map showing the frequency distribution of a 9 Y-STR
haplotype in Eurasia and northern Africa. This haplotype
was shared between three Greeks and a Pathan individual
belonging to clade E1b1b1a.

XV. The frequencies of major haplogroups in Asian populations.

XVI. Median-joining network of the C lineage.

XVII. Distribution of haplogroup L in the Indo-Pak subcontinent.

XVIII. Median-joining network of clade B lineages in Pakistani and
African populations. Circles represent haplotypes and have an
area proportional to frequency. The Pakistani individuals are
shown in orange and light blue.

XIX. Geographic distribution of haplogroup O.

XX. Median-joining network of H1-M52 lineages found in the Burusho,
Kalash and Pathan, based on their Y-STR haplotypes.

XXI. Possible origins of the a) Hazara b) Kalash c) Parsi d) Makrani
Negroid.

SUMMARY

The data presented in this thesis provide a comprehensive report on Y

chromosomal diversity among different ethnic groups from Pakistan. It provides

insights into the genetic variation in Pakistan in a global context and also sheds light

on the patrilineal origins of these populations. The major conclusions are

summarized as follows:

1. Genetic relationships in Pakistan are dictated primarily by

geographic proximity rather than linguistics:

The results suggest that within Pakistan male genetic relationships are

dictated primarily by geographic proximity. Ethnic groups speaking Dravidian

(Brahui), Sino-Tibetan (Balti) or the language isolate Burushaski (Burusho) share

genetic affinity with their Indo-European speaking geographic neighbors. Although

the isolation of the Hunza Burusho in the mountains of northern Pakistan has led to

the preservation of their language it has not made them genetically distinct in

comparison with their neighbors in Pakistan.

Based on Y haplogroup frequencies, the majority of the ethnic groups from

Pakistan show evidence of admixture mostly with Central/South Asian and European

populations. This is illustrated by the fact that the major haplogroups such as E*, J*

and R*, which are frequent in West Asians and Europeans, together constitute 65% of

the total. Haplogroups L1 and R2 are shared with populations from India and

constitute 11% of the Pakistani population.

2. The Karakoram Mountains form a formidable barrier to gene flow

from China:

Haplogroups such as C3 and O*, which are commonly observed in

East Asians, are rare or absent in the Pakistani populations and constitute < 1.5% of

the total. Populations living in these mountain valleys such as the Hunza Burusho,

Balti and Kashmiri are all genetically closer to other ethnic groups in Pakistan. This
low prevalence, or absence, of East Asian haplotypes in Pakistan indicates that the

Karakoram Mountains, which separate Pakistan and China, form a formidable barrier

to gene flow from the north. The Hazara are the only population with significant East

Asian ancestry but historical records indicate that they did not cross this geographical

boundary and arrived in the sub-continent from the West.

3. Genetic signatures of invasions:

The Indo-European contribution to the Y gene pool in Pakistan is substantial and is

probably a reflection of the colonization of the subcontinent by invaders from West

and Central Asia. These probably replaced the indigenous Y haplogroups which are

now mostly found in South Indians and isolated populations in the Andaman Islands.

Three populations (Burusho, Kalash and Pathan) also claim Greek ancestry

following Alexander's invasion of the subcontinent. However, the results shown here

provide strong support only for a minor Greek genetic contribution to the Pathan

gene pool.

The presence of a unique star cluster based on Y-STR haplotypes in

haplogroup C3 Y chromosomes in the Hazara population has been linked to the male

descendants of Genghis Khan (1162-1227). These Y chromosomes are prevalent in

Mongolia and are observed at a frequency of 60% in a much larger sample of Hazara

males from northern and southern Pakistan that were analyzed in this study.

Although this haplogroup was also observed in the Burusho (8.2%), these

samples did not share the star haplotype, pointing towards separate origins for these

populations. Historical records also support the genetic relatedness between East

Asians and the Hazara.

4. The Kalash as genetic outliers:

This study also demonstrates that the Kalash have a distinct genetic identity

within Pakistan. Located in the remote valleys of the Hindu Kush Mountains they

show significant Caucasian ancestry but also have a high proportion of the population-specific

haplogroup L3a, which is not found elsewhere in Pakistan. Their genetic

uniqueness is a reflection of genetic drift in an isolated population struggling to

maintain their distinct cultural and religious identity.

Future Prospects:

This endeavour expands our knowledge about Pakistani populations and

complements data obtained from analyzing autosomal and mitochondrial markers. It

improves our understanding of the effects of geographic, linguistic and religious factors on

population diversity and structure in this region and provides a basis for future work

in this field.

INTRODUCTION

Where do we come from? What are we? Where are we going? These provocative

questions, as framed in the title of the French artist Paul Gauguin's painting, have

always aroused human curiosity. Using evidence from archaeology, fossils and

lately genetics, scientists have gained insights into humanity's past.

Human evolutionary history begins with the appearance of our genus about

2.5-1.5 million years ago (MYA), the earliest evidence of which has been found in

Africa (Klien, 1989). Over time, various species of the genus Homo

have been identified, including H. ergaster, H. erectus, the Neanderthals and H.

floresiensis (Brown et al., 2004; Gabunia and Vekua, 1995; Swisher et al., 1994), all

of which are now extinct with the exception of modern H. sapiens, the most recent

species, which appeared about 100,000 years ago in East Africa (Klien,

1989; Righmire, 1989). The demise of our early ancestors has been attributed to

harsh weather conditions or the difficulty of finding food and other necessities of life.

There is consensus among the modern scientific community that modern

humans arose in Africa and several waves of migrations help explain their passage

out of Africa. Evidence from fossils and archaeological remains suggest that

expansion of modern humans became possible when weather conditions were

favorable. The discovery of 125,000-year-old artifacts on Eritrea's Red Sea coast (Walter

et al., 2000) suggests that people from the Horn of Africa moved across the southern

part of the Red Sea to the Arabian Peninsula. They reached southern Asia,

traveling further east to Australia (Stringer, 2000) around 50-60 thousand years ago

(KYA). The evidence from Skhul and Qafzeh, in modern-day Israel, dating to 100

KYA suggests that in another wave of migration humans crossed the Red Sea and

entered the Levantine region 47 KYA. From Arabia, people moved west and

east and reached Western Europe and Siberia about 40 KYA and East Asia about 39

KYA. These waves of migration resulted in the development of several populations and

races of modern humans that are characterized by differences in their physical

appearance, culture and language.

Fossil and archaeological evidence in favour of an African origin for modern

humans is also supported by molecular genetic evidence (Batzer et al., 1996;

Bowcock et al., 1991, 1994; Cann et al., 1987; Cavalli-Sforza et al., 1994; Horai et

al., 1995; Jorde et al., 1995; Knight et al., 1996; Lahr and Foley, 1994; Leakey 1994;

Mountain et al., 1994; Perez-Lezaun et al., 1997; Ruvolo et al., 1993; Scozzari et al.,

1988; Shiver et al., 1997; Stringer and Andrew, 1988; Tattersall, 1997; Tishkoff et al.,

1996). This biological evidence has provided valuable insights and, in association

with paleontology and archaeology, allowed the reconstruction of human history.

The blood groups were the first markers to be analyzed in human populations, soon

after the discovery of the ABO blood groups (Landsteiner, 1901). Variations in these

blood groups were analyzed among First World War soldiers and the slaves from

different nations (Hirszfeld and Hirszfeld, 1919). This was followed by the discovery

and analysis of variation in several classical serological markers such as the

immunoglobulin allotypes, red cell enzymes, human leukocyte antigens (HLA)

(Dausset, 1954; Grubb and Laurell, 1956; Payne et al., 1964) and serum proteins

(Harris, 1966). All these markers collectively contributed to our understanding of

human variation and helped chart human origins and dispersals.

WHAT IS DNA?
In 1953 the celebrated Nobel Prize winners Watson and Crick described the

double helical chemical structure of DNA (Watson and Crick, 1953) and laid the

foundations for the development of DNA based genetic markers that have now

become the hallmark of research into our past history. The simple but elegant

structure of DNA that they described has two anti-parallel polynucleotide chains with

a sugar-phosphate backbone. The nucleotide bases in DNA are of only four kinds:

adenine (A), guanine (G), cytosine (C) and thymine (T), which pair strictly through

hydrogen bonds: A with T and G with C. The sequences of these bases in the

polynucleotide chain dictate the structure and function of proteins and every

morphological and functional characteristic of each cell in the human body.
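
The pairing rule and the anti-parallel arrangement described above can be illustrated with a minimal Python sketch. The function name and the example sequence are hypothetical and serve only as an illustration; they are not taken from this study.

```python
# Watson-Crick pairing: A pairs with T and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5' to 3'.

    The two strands are anti-parallel, so the partner is obtained by
    complementing every base and reversing the reading direction.
    """
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


example = "GATTACA"  # hypothetical sequence, for illustration only
partner = reverse_complement(example)
print(example, "pairs with", partner)
```

Applying the function twice returns the original strand, which reflects the fact that either strand fully specifies the other.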

In humans DNA is present inside the cellular nucleus and the mitochondria,

an extra nuclear organelle. In the mitochondria the DNA is small, circular and double

stranded with a length of 16,569 base pairs (bp) (Anderson et al., 1981; Ruiz-Pesini

et al., 2007). It consists of only 37 genes but has been extremely useful in tracing

back the maternal origin of the human populations because it has three important

characteristics:

1.) A maternal mode of inheritance (Giles et al., 1980).

2.) A high mutation rate (Olivio et al., 1983).

3.) A lack of recombination (Brown, 1979).

The human nuclear genome consists of double-stranded DNA that

is packed into 23 pairs of chromosomes. Of these, twenty-two pairs, the autosomes, are

the same in males and females. One pair, the sex chromosome pair, differs between

the sexes. Females have two X chromosomes whereas males have one X-

chromosome which they inherit from the mother and one Y chromosome which is

paternally inherited. This Y-chromosome is passed from a father to his son and does

not undergo inter-chromosomal recombination for most of its length. This feature has

been of great value in the study of variation in modern human males.

The completion of the Human Genome Project (International Human Genome

Sequencing Consortium, 2004) has revealed that enormous variation exists in our

genome. Only 2-3% of our genome codes for functional molecules such as proteins

and RNA. The intergenic regions, which constitute 97-98% of the sequence,

consist of repetitive sequences, regulatory sequences, pseudogenes, and intermediate

to large scale DNA copy number and sequence variants. All are remnants of our

evolutionary past and provide valuable insights about what makes us human.

The human genome contains three billion pairs of nucleotides. The sequence

of the nucleotides that constitutes the DNA strand carries all the genetic information

required for the survival of an organism. A gene, which codes for a protein product,

is located at a relatively fixed position on a chromosome and performs specific

biological functions during the development of an individual from a fertilized egg and

throughout life. Recent estimates show that the human nuclear genome contains

about 20,000-25,000 genes (The ENCODE Project Consortium, 2007).

Any change that occurs in the DNA sequence is referred to as a mutation or

polymorphism. It can be categorized on the basis of its size as either a large or small

scale mutation. Large scale mutations include abnormalities such as an

alteration in chromosomal number, as occurs in Down's syndrome (trisomy 21),

Klinefelter's syndrome (XXY) and Turner syndrome (XO), or chromosomal

translocations, as observed in the Philadelphia chromosome t(9;22)(q34;q11). These

chromosomal abnormalities can be easily detected by cytogenetic analysis. Small

scale mutations refer to the alteration in the sequence of the nucleotides. This

includes the replacement of one nucleotide with another, or the deletion, or insertion,

of any of the four nucleotides resulting in a new allele for a particular gene. In some

instances these new alleles may result in disease or improve the fitness of the

organism. In most cases they are neutral changes and do not play any beneficial or

detrimental role.

Any mutation in the germ line DNA sequence is inherited in a stable form and

has the ability to pass from one generation to the next. Mutations can occur either at

the time of recombination during meiosis, when the parental DNA is transmitted to

their progeny or during mitotic cell division that occurs throughout the life time of an

individual. They occur due to errors in DNA replication during cell division. Copying

DNA requires great accuracy for the insertion of the correct nucleotide to the growing

polynucleotide strand. The DNA replication enzymes, the DNA polymerases, have proofreading

activity that reduces the error rate. The 3'-5' exonuclease activity of these

enzymes removes one incorrect nucleotide at a time from the 3' hydroxyl terminus

until the correct nucleotide appears. Despite these effective DNA proofreading and

repair mechanisms, replication errors occur at a rate of about 10⁻⁹ to 10⁻¹¹ per incorporated

nucleotide (Cooper et al., 1995; Cooper et al., 2000).
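
To give a sense of scale, the quoted error rates can be combined with the genome size of three billion nucleotide pairs given earlier in this chapter. The Python sketch below is a back-of-the-envelope calculation only; it assumes a uniform error rate and one copy of the genome per replication, and the function name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # genome size quoted in the text (three billion pairs)


def expected_errors(error_rate: float, genome_bp: int = GENOME_BP) -> float:
    """Expected number of replication errors when one genome copy is made.

    Assumes, for illustration, the same error rate at every position.
    """
    return error_rate * genome_bp


# The two ends of the range quoted from Cooper et al.
for rate in (1e-9, 1e-11):
    errors = expected_errors(rate)
    print(f"rate {rate:.0e} per nucleotide: about {errors:.2f} errors per copy")
```

Under these assumptions the upper end of the range corresponds to roughly three errors per genome copy and the lower end to roughly three errors per hundred copies.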

HUMAN GENETIC POLYMORPHISMS

In humans about 99.9% of the genome is identical between individuals and only 0.1-2.0% of the DNA

sequence shows variation. These variations result in genotypic differences between

individuals as well as phenotypic differences commonly observed in traits such as

height, facial morphology, skin, eye and hair colour. These variations occur due to

polymorphisms which are non-pathogenic changes that exist at significant

frequencies (usually > 1%) in any given population. To date many types of

polymorphism have been discovered in the coding regions as well as in the non-

coding regions of the human genome and they form the basis of all current genetic

markers. They are used not only to unravel our evolutionary past but to genetically

predict our biological future and as diagnostic markers.

The non-coding DNA sequences that constitute the bulk of the human

genome are dispersed throughout it. The exact function of these non-coding

regions remains unknown and this non-genic DNA is also known as selfish or

junk DNA.

Several recent findings have shown the dynamic nature of these regions, which

play a major role in gene regulation. Junk DNA does not encode any product

used by the cell. Its sequences tend to be repeated many times. In some

instances this interferes with the function of genes or increases their copy

number. A large amount of non-coding DNA consists of short tandem repeats of

nucleotides, in the form of an array or a block of bases, scattered throughout the

genome.

According to their size, the human polymorphisms can be classified as single

nucleotide polymorphisms, and repeat polymorphisms that include satellite DNA,

mini-satellite DNA, micro-satellite DNA and copy number variants.

SINGLE NUCLEOTIDE POLYMORPHISMS:


The most common polymorphism in the human genome is the single

nucleotide polymorphism (SNP). SNPs include single base substitutions, deletions

or insertions. Base substitutions can be classified into two groups, namely

transitions and transversions. In a transition a purine is replaced by a purine

(A ↔ G) or a pyrimidine by a pyrimidine (C ↔ T). A transversion is the substitution of

a purine by a pyrimidine (A/G → C/T) or vice versa (C/T → A/G). According to

Collins and Jukes (1994), transitions occur more frequently in the mammalian

genome than transversions.
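
The distinction between the two classes of substitution can be expressed as a short Python sketch. The function name is hypothetical and the code is an illustration of the definition above, not a method used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: both bases are purines or both are pyrimidines.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


# Enumerate all twelve possible substitutions and count each class.
bases = "ACGT"
counts = {"transition": 0, "transversion": 0}
for ref in bases:
    for alt in bases:
        if ref != alt:
            counts[classify_substitution(ref, alt)] += 1
print(counts)
```

Of the twelve possible substitutions only four are transitions, so the higher observed frequency of transitions is not simply a consequence of there being more ways to produce them.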

SNPs are dispersed throughout the genome such as in the promoter region,

coding sequences, intronic sequences and non-coding regions. According to the

single nucleotide polymorphism database the human genome contains more than 55

million SNPs. More than 6 million SNPs lie within genes (Serre and Hodson, 2006).

SNPs were the first generation of polymorphic genetic markers. Their use

was realized in the late 1970s with the development of restriction fragment length

polymorphism (RFLP) analysis (Roberts and Murray, 1976). An RFLP occurs when a mutation

causes the loss or gain of a recognition site for a restriction enzyme. Restriction

enzymes were discovered in 1968 (Meselson and Yucan, 1968) and are of three

types, designated Type I, II and III. Among them, Type II restriction enzymes are

most useful for genotyping. These restriction endonucleases recognize specific DNA

sequences and cut the DNA within, or near, the recognition sequence. The first

polymorphism in a restriction enzyme site was observed for the human β-globin

structural gene with the restriction enzyme HpaI (Kan and Dozy, 1978).
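
The principle of an RFLP can be sketched in a few lines of Python. The example uses the HpaI recognition sequence GTTAAC, which the enzyme cuts in the middle to leave blunt ends. The two allele sequences are hypothetical and were made up for illustration; they are not the β-globin sequences studied by Kan and Dozy.

```python
HPAI_SITE = "GTTAAC"  # HpaI recognition sequence
CUT_OFFSET = 3        # HpaI cuts between GTT and AAC (blunt ends)


def fragment_lengths(seq: str, site: str = HPAI_SITE,
                     cut_offset: int = CUT_OFFSET) -> list[int]:
    """Return the fragment lengths produced by digesting a linear sequence."""
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by one base inside the recognition site.
allele_with_site = "ACGTACGTAC" + "GTTAAC" + "TTGCATGCAT"
allele_without_site = allele_with_site.replace("GTTAAC", "GTTAGC")

print(fragment_lengths(allele_with_site))     # site present: two fragments
print(fragment_lengths(allele_without_site))  # site lost: one uncut fragment
```

A single base change inside the recognition site is enough to change the fragment pattern, which is what makes the polymorphism detectable by gel electrophoresis.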

Since then many SNP genotyping methods such as heteroduplex analysis

(Lichten and Fox, 1983), single-strand conformational polymorphism (Orita et al.,

1989), enzymatic mutation detection (Youil et al., 1995), microarray or variant

detector arrays (Dong et al., 2001; Hacia et al., 1999; Hacia and Collins, 1999;

Marshall and Hodgson, 1998; Qi et al., 2001; Ramsay, 1998; Wang et al., 1998;

Yoshino et al., 2001), high-throughput SNP genotyping (Jenkins and Gibson, 2002,

McClay et al., 2002), and molecular beacon methods (Mhlanga and Malmberg, 2001)

have been developed to construct high-density SNP maps. More recently massively

parallel resequencing has revolutionized the pace of discovery of SNPs in individual

genomes and the 1000 Genomes Project aims to catalogue SNPs occurring at

frequencies of <1% in several diverse human populations (Wheeler et al., 2008).

In the present century SNPs have become the markers of choice for many

applications in the forensic sciences and medical and evolutionary genetics. The

recent discovery of large numbers of SNPs and the determination of their allelic

frequencies in various populations provides a new approach to disease detection,

anthropological studies and pharmaco-genetic analyses which will benefit the

biomedical sciences. Studies have identified genetic variation due to SNPs as one of

the factors associated with susceptibility to many common diseases such as heart

disorders, blood pressure (Koschinsky et al., 2001), Type II diabetes (Tsunoda et al.,

2001), and asthma (Immervoll et al., 2001).

The discovery of millions of SNPs has greatly aided the fields of pharmaco-genetics

and pharmacogenomics, which aim to tailor drugs to a person's

genotype. The relationship between SNPs, disease and medicine is not the

same among various populations or even among the individuals within a population.

Due to the presence of variations in the target genes or drug metabolizing enzymes,

some patients suffering from the same disease exhibit a life-threatening adverse

reaction to a particular medicine while others fail to show any adverse reaction.

Some show intermediate responses for the same drug. The genotype of an

individual based upon SNP markers will soon allow the design of different new and

more efficacious drugs for individual patients.

SNPs have also helped in understanding how modern humans and their

genomes have evolved. In particular, SNPs found on the Y chromosome and

mitochondrial DNA have been used to describe the origins and migrations of our

male and female ancestors, respectively.

COPY NUMBER VARIANTS:

Copy number variations (CNVs) are structural variations in DNA sequence

that occur due to differences in the number of copies of a particular genomic region.

They arise through the duplication or deletion of DNA segments ranging from several

kilobases (kb) to megabases in size (Feuk et al., 2006).

CNVs were first uncovered among normal, healthy human individuals

soon after the completion of the human genome project and many studies have

shown them to be as prevalent as SNPs and an important source of genetic

variation, contributing to our uniqueness (Feuk et al., 2005; Hinds et al., 2006; Iafrate

et al., 2004; Sebat et al., 2004; Sharp et al., 2005; Stefansson et al., 2005; Tuzun et

al., 2005). It is estimated that about 12% of the human genome and thousands of

genes differ with respect to copy number variation (Carter, 2007).

CNVs often encompass genes, and lead to dosage imbalances (Buckland,

2003; McCarroll et al., 2006; Repping et al., 2006). They have been shown to

influence phenotypic variation, gene expression and gene dosage and are

associated with several human diseases through these mechanisms. An increase in

the copy number of the EGFR gene increases the risk of non-small cell lung cancer

(Cappuzzo et al., 2005). Another study has demonstrated that a high copy number

of CCL3L1 is associated with lower susceptibility to HIV infection (Gonzalez

et al., 2005). Low copy number of FCGR3B (CD 16 cell surface immunoglobulin

receptor) can increase susceptibility to systemic lupus erythematosus and similar

inflammatory immune system disorders (Aitman et al., 2006).

The most widely used method for studying CNVs is DNA microarray

technology based on comparative genomic hybridization (CGH) using synthesized

oligonucleotides. This technology has been useful in the detection of new CNVs and

their association with normal and disease phenotypes (Carter, 2007). In the most

complete worldwide analysis (Redon et al., 2006) the first-generation CNV map was

constructed using two different platforms of microarrays: single-nucleotide

polymorphism (SNP) genotyping arrays, and clone-based comparative genomic

hybridization. In this survey a total of 1,447 copy number variable regions (CNVRs),

covering 360 megabases (12% of the genome) were identified in 270 individuals that

had been previously surveyed for SNPs (The International HapMap Consortium,

2005).

SATELLITE DNA:
Satellite DNA is located mainly in the darkly stained region of chromosomes referred to as

heterochromatin. Its exact function is unclear (Csink and Henikoff, 1998; Henikoff et

al., 2001) but transcription is limited in this region and it is thought to play a role in the

structure and function of centromeres (Grimes and Cooke, 1998). It consists of large

blocks of short tandem repeats. Although genotyping these repeats is not easy, they

have been used in human evolutionary studies (Oakey and Tyler-Smith, 1990).

MINI-SATELLITE DNA:
The mini-satellite DNA or the variable number of tandem repeats (VNTR)

(Nakamura et al., 1987) was first identified in the human myoglobin gene (Jeffreys et

al., 1985). It consists of intermediate-size arrays of short tandem repeats, and

thousands of arrays ranging from 0.1-20 kilobases (kb) are found in the euchromatic

regions of eukaryotic chromosomes (Jeffreys, 1987).

Most mini-satellites are rich in GC content and clustered towards the ends of

the chromosomes (i.e. telomeres) (Royle et al., 1988). The majority of mini-satellite

DNA is transcriptionally inactive, but in some cases it is expressed, for example at the

MUC1 locus (Swallow et al., 1987).

Mini-satellites are highly polymorphic (Wong et al., 1987) with

heterozygosity values between 70 and 90% (Jeffreys et al., 1985) and their mutation rate

is also higher than that of the classical genetic markers (Jeffreys et al., 1988). It

is estimated that mutations occur at a frequency of 1-2% per gamete per generation,

resulting in new variants with a different repeat copy number in individuals and

populations. Baird et al. (1986) were among the first to analyze two VNTR loci,

HRAS-I and D14S1 in various populations.

MICROSATELLITE DNA:
The microsatellites, also referred to as short tandem repeat (STR)

polymorphisms or simple sequence repeats (SSRs), are a special class of tandem

repeats first recognized by Birnboim and Straus (1975) as polypyrimidine

stretches. The term microsatellite was coined by Litt and Luty (1989) and Edward et

al. (1991) coined the term STR.

STRs are composed of 1-6 base pair repeat units that follow each other in

tandem (Tautz, 1989). Depending upon the number of bases in the repeat unit they

are classified as mono-, di-, tri-, tetra-, penta-, or hexa-nucleotide repeats. A tetra-nucleotide

repeat (GATA) and an array of di-nucleotide TG repeats were the first

STRs identified, in the human delta and beta globin genes (Miesfield et al., 1981).

Subsequently CA repeats were identified in the cardiac muscle actin gene

(Hamada and Kakunaga, 1982) and several other di-nucleotide repeats (GT or CA)

were described by Epplen et al. (1982) and Hamada et al. (1982),

respectively. These repeats are found in the euchromatic regions of the chromosomes

and do not generally cluster near the telomeric regions.
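
The definition of an STR as a unit of 1-6 bases repeated in tandem can be illustrated with a minimal Python sketch based on a regular expression. The function name, the threshold of four repeats and the example sequence are all assumptions made for illustration; real genotyping relies on locus-specific primers rather than a scan of this kind.

```python
import re


def find_strs(seq: str, min_repeats: int = 4) -> list[tuple[int, str, int]]:
    """Find tandem repeats with a unit of 1-6 bases.

    Returns (start position, repeat unit, number of repeats) for each hit.
    The lazy quantifier makes the search prefer the shortest repeat unit.
    """
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical sequence containing a tetra- and a di-nucleotide repeat.
example = "TTC" + "GATA" * 5 + "CC" + "TG" * 6 + "A"
for start, unit, copies in find_strs(example):
    print(f"position {start}: ({unit}){copies}")
```

The number of copies of the unit is the quantity that varies between alleles at an STR locus.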

STRs constitute about 2% of the human genome and are more frequent than

the mini-satellites. Estimates place the number of STR loci at approximately

100,000 in the human genome. Both mini-satellites and STRs can be produced by

unequal crossing over and by DNA slippage during replication (Kruglyak et al.,

1998; Toth et al., 2000). New STR alleles are thought to arise mostly by DNA

slippage during replication (Di Rienzo et al., 1994; Jeffrey et al., 1993; Kimmel and

Chakraborty, 1996; Shriver et al., 1993; Valdes et al., 1993).

In humans the di-, tri- and tetra-nucleotide repeats are more frequent than

repeats with larger units. Among all classes of STRs the most

frequent are the di-nucleotide repeats, which comprise 0.5% of the genome. They are

highly polymorphic and tend to mutate more rapidly than the tri- and tetra-nucleotide

repeats (Chakraborty et al., 1997; Webster et al., 2002). The CA/TG

motifs are present at a frequency of 1 per 36 kb whereas the AT/TA motifs are

present at 1 per 50 kb. The less common AG/CT arrays are present at a frequency

of 1 every 125 kb. The rarest di-nucleotide repeats are the CG/GC repeats, which are

present at 1 per 10 Mb. Among the tri-nucleotides the most frequently found arrays

are the ACC repeats followed by AGC, ACT and the less common ACG.

Genetic variation at STR loci makes them very useful genetic markers that

have been extensively applied to human identification, especially in forensic

cases (Budowle et al., 1998; 2001; Gill et al., 1994), linkage analysis of disease

(Dietrich et al., 1992; Hearn et al., 1992; Jeffreys et al., 1985; Jeffreys and Pena,

1993; Queller et al., 1993; Todd et al., 1991) and as a powerful tool for the

investigation of the human past and human diversity (Bowcock et al., 1994). The multi-allelic

variation at STR loci has been exploited by population geneticists to create a

powerful, accurate and informative tool that has aided in reconstructing the

evolutionary history of man and exposed the relationship between various world

populations and languages (Ayub et al., 2003; Rosenberg et al., 2002).

A striking feature of STRs is their high mutation rate in comparison with

SNPs. The average mutation rate for tri- and tetra-nucleotide repeats at autosomal

loci is estimated between 7.0 × 10⁻⁴ and 9.3 × 10⁻⁴ (Zhivotovsky et al., 2000) and for

Y-chromosomal STRs estimates range between 2.4 × 10⁻³ and 6.9 × 10⁻⁴ per locus,

per generation depending upon whether the mutation rate is observed (Kayser et al.,

2000) or inferred (Zhivotovsky et al., 2004).
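
To give a sense of scale for these rates, the sketch below computes the expected number of mutations and the probability of at least one mutation across a set of Y-STR loci over a number of generations. It assumes, purely for illustration, that loci mutate independently at a constant per-locus, per-generation rate. The function names and the choice of 17 loci over 100 generations are hypothetical and are not taken from the cited studies.

```python
def expected_mutations(rate, loci, generations):
    """Expected number of mutation events over all loci and generations."""
    return rate * loci * generations


def prob_at_least_one_mutation(rate, loci, generations):
    """Probability that at least one locus mutates in the given interval,
    assuming independent transmissions with a constant mutation rate."""
    transmissions = loci * generations
    return 1.0 - (1.0 - rate) ** transmissions


if __name__ == "__main__":
    # The two ends of the Y-STR range quoted in the text.
    for label, rate in (("upper estimate", 2.4e-3), ("lower estimate", 6.9e-4)):
        mean = expected_mutations(rate, loci=17, generations=100)
        prob = prob_at_least_one_mutation(rate, loci=17, generations=100)
        print(f"{label}: expected mutations = {mean:.2f}, P(>=1) = {prob:.3f}")
```

The roughly three-fold gap between the two estimates carries directly into any date derived from accumulated STR variation, so the choice of rate matters for the interpretation.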

Although there is some evidence that the STR loci are neutral in nature and

not involved in any biological function, many studies show that some STRs, such

as CA repeats, are involved in the enhancement of gene expression (Hamada et al.,

1984). Many of them have binding sites for specific nuclear proteins (Richards et al.,

1993), most of which promote homologous recombination (Treco and Arnheim,

1986). The tri-nucleotide STR loci are associated with several genetic diseases.

The first such association of the tri-nucleotide motif CCG was reported with fragile X

syndrome (Fu et al., 1991; Kremer et al., 1991; Verkerk et al., 1991). In normal

individuals 6-54 CCG repeats are located in the 5′ untranslated region of the fragile X

mental retardation 1 gene (FMR1). In affected individuals the number ranges from

52 to 1000 repeats. The meiotic instability of such repeats is associated with over

a dozen human diseases, such as X-linked spinal and bulbar muscular atrophy

(SBMA) (La Spada et al., 1991) and myotonic dystrophy (Brook et al., 1992; Fu et al.,

1992).

TRANSPOSABLE ELEMENTS:

The other class of repetitive DNA includes the interspersed repetitive non-

coding DNA that occupies 45% of the human genome (International Human Genome

Sequencing Consortium, 2001; Li et al., 2001). Polymorphisms of this class have

also been linked with certain diseases. These are derived from mobile DNA

sequences, also called transposable elements (Prak and Haig, 2000; Smith, 1999).

These elements have an ability to migrate from one region of the human genome and

integrate into another region (Prak and Haig, 2000; Smith, 1996). To date, no

mechanism for the removal of these elements is known.

The transposable elements can be classified into four groups:

A) Long interspersed nuclear elements (LINEs)

B) Short interspersed nuclear elements (SINEs)

C) Long terminal repeat (LTR) transposons (retrovirus-like elements)

D) DNA transposons.

Depending upon the transposition mechanism these four groups are broadly

organized into two groups:

1) Retrotransposons or retroposons

2) DNA transposons.

Retrotransposons are transposable elements that make copies of themselves through

reverse transcriptase and include LINEs, SINEs and LTRs. Cellular reverse-

transcriptases transcribe mRNA into neutral cDNA which is then integrated in any

region of chromosomal DNA.

In DNA transposons the DNA sequences are excised and directly integrated

into another place of the genome by a cut and paste mechanism. DNA transposons

account for 3% of the human genome and virtually all human DNA transposons are

non-functional (Strachan and Read, 2004).

The most successful and ancient transposable elements are the LINEs.

These elements first appeared in eukaryotic genomes about 600 million years

ago (Malik et al., 1999) and collectively comprise about 21% of the human genome.

They are subdivided into three distantly related families: LINE 1, LINE 2

and LINE 3. Of these, LINE 1 is the only family that is

still being actively transposed (International

Human Genome Sequencing Consortium, 2001).

LINE 1 is an important transposable element about 6.0 kilo-bases (kb) long.

Recent estimates based on computational methods suggest that about 500,000 L1

fragments reside in the human genome and make up 17% of the genome (Lander et

al., 2001; Smith, 1996). These elements are mostly found in AT-rich regions

(Korenberg and Rykowski, 1988). The LINE 1 element consists of two open reading

frames ORF1 and ORF2. ORF1 encodes a 40 kilo Dalton (kDa) RNA-binding protein

while ORF2 encodes a 150 kDa protein, which has both endonuclease and reverse

transcriptase activity (Feng et al., 1996; Mathias et al., 1991). The LINE 1 transcript

moves from the nucleus to the cytoplasm where it is translated to yield ORF proteins.

The LINE1 RNA assembles with its own encoded proteins and re-enters the nucleus,

where the L1 endonuclease cleaves one strand of DNA, preferably at the 5′-TTTT/A-3′

consensus site (Cost and Boeke, 1998; Feng et al., 1996; Jurka, 1997; Morrish et

al., 2002) and the reverse transcriptase uses the same site to prime reverse

transcription from the 3′ end of the LINE RNA. At the time of integration, in most

instances, the reverse transcription fails to proceed to the 5′ end, resulting in a

truncated, non-functional copy of LINE 1 element.

In the human genome about 99.8% of the LINE 1 copies present are

defective (Gilbert et al., 2002; Kazazian and Moran,1998; Myers et al., 2002;

Ostertag and Kazazian, 2001; Sassaman et al., 1997) with an average size of 900 bp

(Lander et al., 2001). It is estimated that approximately 40 elements of L1 family are

still functional and produce new copies (Sassaman et al., 1997). At least 1 in every

50 humans has a new genomic L1 insertion. These occur in the parental germ cell or

during early embryonic development (Goodier et al., 2001; Luningprak et al., 2003;

Ostertag et al., 2002). The functional significance of this occurrence is unknown but

these new copies can be used as genetic markers such as the L1 insertion in the

centromeric alphoid array of human Y chromosome designated as LY1 (Santos et al.,

2000). Sometimes these insertions can lead to disease, as in the case of hemophilia

B (Brooks et al., 2003; Kazazian et al., 1988).

SINEs comprise 13% of the human genome. These sequences are 100-

400 bp long and include the Alu repeats, which are dispersed throughout the human

genome. Unlike LINE elements they do not encode any protein and use the LINE

machinery for their transposition (Kajikawa and Okada, 2002). All, except one, of the

families of SINE elements originated from tRNA. The only exception is the Alu family

which originated from signal recognition particle component (SRP 7SL) RNA (Ullu

and Tschudi, 1984).

The Alu elements are about 300 bp long and they constitute 10.7 % of the

human genome. The Alu insertion has been postulated to have occurred early in

primate evolution, about 30-65 million years ago (mya) (Batzer et al., 2002; Deininger

et al., 1992; Deininger and Daniels, 1986; Deininger and Slagel, 1988; Kapitoov,

1996; Labuda et al., 1991; Shen et al., 1991). A subfamily of these Alu repeats

termed as human specific (HS) repeats (Batzer et al., 1990) appeared in the human

genome record within the last 6 million years (Batzer et al., 1991; Batzer and

Deininger, 1991). Approximately 75% of these HS repeats are present in all human

populations indicating that they were inserted early in human evolution and were

completely fixed before the migration of humans from Africa (Deininger et al., 1999).

Alu repeats have also proven to be extremely useful genetic markers (Myers

et al., 2002; Watkins et al., 2001). About 25% (400 sites) of these recent Alu

insertions are variable among world populations and highly informative in

ascertaining the relationships between human populations. Several Alu insertions

are associated with human diseases such as hypertension (Barley et al., 1996; Duru

et al., 1994; Jeng et al., 1997), myocardial infarction (Ludwing et al., 1995),

ventricular hypertrophy (Schunkert et al., 1994) and cardiomyopathy (Raynolds et al.,

1993).

In the human genome 8.5% of repetitive DNA belongs to LTR elements, which

comprise autonomous and non-autonomous elements. About 4.7% of the human

genome is occupied by the autonomous endogenous retroviral sequences (ERV).

The human ERV (HERV) class contains many sub-families and shows a small number of

polymorphisms (Turner et al., 2001). Many of the LTRs are defective and

transposition has been rare. The non-autonomous LTR elements consist of the

MaLR family, which accounts for about 3.8% of the human genome. This family lacks the pol gene

and at times the gag gene.

Over the past decade the genetic variation of these DNA based markers has

been exploited to unravel the paternal and maternal lineages and the relationship

among modern humans (Cavalli-Sforza, 1994; Hammer et al., 1997; Quintana-Murci

et al., 1999 a, b and c). The current study was designed to use polymorphic markers

to uncover the genetic history of ethnic groups residing in present day Pakistan and

provide a basis for further analyses of these populations in genetic association and

disease susceptibility studies.

THE GENETIC HISTORY OF PAKISTAN

The modern state of Pakistan was established on August 14, 1947, but the

region where it is located, the Indo-Pak subcontinent, has been of importance

throughout human history. The country lies on the postulated southern coastal route

that modern humans took from Africa to Australia.

The earliest evidence indicates that humans were present in this region

around 100,000-150,000 years ago but the fossil record is non-existent. Neolithic

sites have been found in the Peshawar Valley in the north-west and at Mehrgarh, in

the south-west in the province of Baluchistan (Jarrige, 1991). The evidence found at

Mehrgarh indicates a modern human settlement dating to around 7,000 B.C. This

predates the region's other early civilization, the Indus Valley Civilization, found

throughout the sub-continent with major centres at Harappa and Mohenjo-Daro in

Pakistan. This civilization flourished in the 3rd and 2nd millennia B.C. (2,500-1,500

B.C.).

Due to its geostrategic importance as the gateway to India this region was

invaded many times. Around 1,500 B.C. the Indo-European speaking nomadic

pastoral tribes, the so-called Aryans, entered this region through the Hindu Kush

Mountains and established their supremacy replacing the Dravidian language

speakers who were thought to be there initially. Their rule lasted from about 1,500

B.C. to 500 B.C., when this region was occupied by the Persian Empire. In 326 B.C.

this region was conquered by Alexander the Great. Subsequently it was conquered

by the Mauryas (305 B.C.), Saka (97 B.C.), Arabs (711 A.D.), Turks (1001), Mughals

(16th century) and lastly by the British Empire.

India and Pakistan house many different races and languages and are often

referred to as "a museum of races." Present day Pakistan has a population of over

170 million (Pakistan Economic Survey, 2006-2007) and consists of more than 12

ethnic and linguistic groups, the majority being descendants of the invader stocks.

Ethnic groups from the southern part of Pakistan include Baloch, Brahui, Makrani

Baloch, Makrani Negroid, Parsi and Sindhi. Major populations represented by the

northern groups include Balti, Burusho, Kalash, Kashmiri, Pathan and Punjabis. The

latter form the majority population of this country and include several castes.

Linguistic groups found in Pakistan include a language isolate, Dravidians, Sino-

Tibetans and Indo-Europeans. The latter is spoken by a majority of the population.

STUDY OBJECTIVE:

The main objective of the study is to shed light on the population histories of

numerous ethnic groups living in modern day Pakistan. Earlier studies used only a

limited number of polymorphic Y chromosomal markers (Qamar et al., 1999, 2002)

and since then many more informative Y-SNPs have been discovered (Karafet et al.,

2008) which have not been typed in this population. Another limitation of the earlier

work was the lack of samples from the Punjab which constitutes the majority

population of Pakistan and this has been addressed in this study.

The study aims to screen Y chromosomal variation in a large number of

Pakistani males from various ethnic and linguistic backgrounds in order to

understand population origins and substructure and unravel the influence of Central

Asia, China, Greece and Persia on this population. Statistical analyses and

simulation modeling are used to identify geographic origins of population groups,

episodes of genetic bottlenecks, demographic expansions and genetic admixture. It

is my hope that these analyses will improve our knowledge of group membership

within Pakistan that will have practical applications in DNA based human forensic

analyses and the design of disease association studies, and implications for

rationalizing the use of medicines tailored to an individual's genetic make-up.

LITERATURE REVIEW

3
PAKISTAN AND ITS POPULATIONS

Pakistan lies in a region that has seen the passage of many invaders and all

have contributed to the racial and linguistic diversity found in this country. It is

bordered by China in the north, India in the east, Iran and Afghanistan in the west

and the Indian Ocean along its southern coastline. The Pakistani population

according to the Ministry of Finance is estimated to be 156,770,000 (Pakistan

Economic Survey, 2006-2007) but the World Health Organization estimates the

number to be much higher.

Pakistan consists of four provinces, the northern areas and the Federally

Administered Tribal Areas (FATA) which are located on the Afghan frontier. More

than 18 ethnic and 60 linguistic groups (Grimes, 1992) reside in this country. Major

ethnic groups include Baloch, Brahui, Pathans, Punjabis and Sindhis. The majority

Punjabi speaking populations show a great and complex admixture of many ethnic

castes and groups (Ibbetson, 1883) such as the Gujar, Jats, Meos, Rajput and Arians

etc. Other ethnic groups that are of anthropological interest include the Makrani-

Negroid, Mohanna and Parsi in the south and Balti, Burusho, Kalash and Kashmiri in

the north. Of particular interest is the Hazara population, which resides in

Baluchistan and the North West Frontier Province (N.W.F.P.). The geographic

locations of the above mentioned Pakistani population are shown in Figure I and their

possible origins and linguistic affiliations are listed in Table I.

Figure I. Map of Pakistan showing its neighbours, administrative regions and

the geographical distribution of the populations that are included in this study.

Table I: The possible origins and language affinities of Pakistani populations.

The numbers in brackets refer to the population size.

Location  Population            Language       Suggested Origins

North     Balti (300,000)       Sino-Tibetan   Tibet
          Burusho (60,000)      Isolate        Greek; Central Asian
          Hazara                Indo-European  Genghis Khan's soldiers
          Kalash (5,000)        Indo-European  Greece; Syria?
          Kashmiri              Indo-European  Jewish, Indo-Aryans
          Pathan (17,000,000)   Indo-European  Jewish; Greek; Admixture
          Punjabi (63,000,000)  Indo-European  Admixture

South     Baloch (4,000,000)    Indo-European  Aleppo, Syria
          Brahui (1,500,000)    Dravidian      West and Central Asia
          Makrani Baloch        Indo-European  West Asia
          Makrani Negroid       Indo-European  Africa
          Mohanna               Indo-European  Indigenous fishermen
          Parsi (~2000)         Indo-European  Persia/Iran
          Sindhi (15,300,000)   Indo-European  Admixture

Three major Pakistani populations: the Baloch, Brahui and Makrani reside in

the province of Baluchistan and constitute the southern group. Historians believe

that the Baloch migrated from West Asia to South Asia. They claim that they are of

Semitic stock and that between the 1st and 2nd millennia B.C. their homeland was the

ancient region of Nineveh and Babylon in modern day Iraq. From there they

migrated to Iran, Afghanistan and Pakistan. Many Baloch tribesmen reside in south-

east Iran as well. Some historians also claim that they came from Aleppo in Syria in

682 A.D. (Quddus, 1990) when at least 44 tribes migrated to Iran. Their movement

into Pakistan is considered to be recent. At the beginning of the 10th century they

moved from Iran and occupied Sistan and as a result of the Seljuq invasion they settled

in the land of Makran. In the fifteenth century they migrated eastward and settled in

Kachi. Now they occupy the area of Sibi and the Loralai District of Quetta Division in

Pakistan (Marri, 1985).

The Brahuis are considered to be the descendants of Turko-Iranian tribes that

migrated from west and central Asia and settled in the Sarawan and Jhalawan

regions of Kalat State in Baluchistan (Hughes-Buller, 1991; Quddus, 1990). They are

the only group in Pakistan that speaks a Dravidian language.

The southwestern dry and arid Makran coast of Pakistan is home to two

distinct populations of Makranis: the Makrani-Baloch and Makrani-Negroid. The

Makrani-Baloch express linguistic and ethnic affiliation with the neighboring Baloch

tribes (Grimes, 1992). However, many Makrani have Negroid features and are

referred to as Makrani-Negroids. It has been hypothesized that they originated in

Africa and migrated to Pakistan along the coastal route.

Another population that resides in Baluchistan, mainly in and around the

provincial capital, Quetta, is the Hazara. The name Hazara is derived from the

Persian word meaning "thousand". This population is also found in the town of

Parachinar in the NWFP and is widespread in Afghanistan. They have typical Mongol

features and claim descent from a detachment of a thousand soldiers left behind by

Genghis Khan during his invasion of India. Historical records show that they settled

in Pakistan to escape persecution in neighboring Afghanistan.

The other populations from southern Pakistan include the Sindhi, Mohanna

and Parsi, all of whom reside in the south-eastern province of Sindh. The Sindh

province is referred to in several ancient texts, as Sindomana by the Greeks and

Sindhudesha by ancient Hindus. This region was conquered by the Greeks,

Parthians, Brahmans, Arabs, and finally by the British. Mohenjo-Daro, the jewel

of the Indus Valley Civilization, is located here. As a result of multiple invasions and

migration the Sindhis are considered to be an ethnically mixed population of Indo-

European speakers. The Mohanna are another Indo-European population of

fishermen who have been residing on the banks of the River Indus for centuries.

Little is known about their origins.

The suggested origin of the Parsis is in Persia (Nanavutty, 1997). They are

the followers of the Iranian prophet Zoroaster and migrated from Iran to the state of

Gujarat in northwest India in the 7th century A.D. after the collapse of the Sassanian

Empire. Many Parsis eventually settled in Mumbai in India and Karachi in Pakistan,

although very few remain in Pakistan.

Several populations reside in the northern part of Pakistan. The Pathans

reside in the North West Frontier Province (N.W.F.P) and its adjoining tribal areas.

They also inhabit the southern and eastern part of Afghanistan and Baluchistan

province of Pakistan. They are also known as Pushtuns, Pakhtuns or Afghans and

are an Eastern Iranian ethno-linguistic group formed by the amalgamation of several

tribes practicing a traditional code of conduct and honor. They claim to be

descendants of soldiers who came with Alexander the Great and several historical

sources suggest that they are of Semitic stock (Caroe, 1958).

Northern Pakistan is also home to some unique ethno-linguistic populations.

Among them are the Balti, Burusho and Kalash. Baltis speak a Sino-Tibetan language

and their suggested origin is in Tibet (Dani, 1991). They reside in Baltistan, the north

eastern Himalayan region of Pakistan.

The Burusho, one of the isolated northern populations, also believe that they

are the descendants of Greek generals who came to the subcontinent with Alexander

the Great in 327-323 B.C. (Biddulph, 1977). They reside in Hunza, Nagar and Yasin

Valleys in the Karakorum Mountains and are the only population in Pakistan who

speak a language isolate.

The Kalash also claim descent from Greek Macedonia, citing Alexander's

invasion of the subcontinent. They reside in the valleys of Bumburet, Rambur, and

Birir near Chitral in the Hindu Kush Mountain ranges in the NWFP. They have been

extensively studied by anthropologists for their unique culture and traditions (Lines,

1999).

DEMOGRAPHIC HUMAN HISTORY

Human diversity occurs as a result of multiple events during human evolution,

migration, and colonization (Lahr and Foley, 1994). Studies reveal that human

history can be deciphered from the analyses of the human genome. The genomic

variation in human individuals and populations contains enough information to allow

the reconstruction of human population history, migration patterns and population

structure.

At the beginning of the 20th century data obtained from protein markers led to

insights into human origins, divergence and demographic history (Cavalli-Sforza,

2005). However, in recent years DNA based markers have proved to be more

efficient tools for elucidating questions of human evolution and migration. An

informative DNA marker should be both highly polymorphic and selectively neutral.

DNA markers on the non-recombining portion of the human Y chromosome and the

mitochondrial DNA are polymorphic markers that have been successfully applied to

shed light on human evolutionary history from the male and female perspective,

respectively.

Y CHROMOSOMAL VARIATIONS

Y-chromosomal DNA polymorphisms were first reported in 1985 (Casanova

et al., 1985; Lucotte and Ngo, 1985). Since then more than 600 binary

polymorphisms, the majority of them being SNPs, and numerous multi-allelic STR

markers have been identified on the human Y-chromosome (Karafet et al., 2008).

Since most of the Y chromosome does not undergo recombination these

biallelic polymorphisms define unique mutational events and therefore, unique Y

chromosomal haplogroups. The presence of numerous biallelic polymorphisms

allows their organization in the form of a phylogenetic tree that shows relationships

among the various Y haplogroups. Efforts by the Y Chromosome Consortium (YCC)

have led to the development of a standardized nomenclature system for such a tree.

The initial tree based upon approximately 200 markers (Jobling and Tyler-Smith,

2003; Y Chromosome Consortium, 2002) was recently revised to identify 311 distinct

Y haplogroups (Karafet et al., 2008). The phylogenetic tree is rooted with respect to

the ancestral state of the non-human primate sequence.
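
The logic of assigning a Y chromosome to a clade from its binary markers can be sketched in Python as follows. This is a deliberately simplified illustration and not the typing strategy used in this study: it covers only a small subset of clades, each represented by a single defining marker named later in this chapter, and the nesting of K within F and of N and O beneath the M214 node is assumed here for the purpose of the example. The names `TOY_TREE`, `lineage` and `assign_clade` are hypothetical.

```python
# Toy subset of the Y tree: clade -> (parent clade, one defining marker).
# A real tree has many more branches and several markers per branch.
TOY_TREE = {
    "F": (None, "M89"),
    "G": ("F", "M201"),
    "H": ("F", "M69"),
    "I": ("F", "M170"),
    "J": ("F", "M304"),
    "K": ("F", "M9"),
    "NO": ("K", "M214"),
    "N": ("NO", "M231"),
    "O": ("NO", "M175"),
}


def lineage(clade):
    """Return the clades from the given clade back to the root of the toy tree."""
    path = []
    while clade is not None:
        path.append(clade)
        clade = TOY_TREE[clade][0]
    return path


def assign_clade(derived_markers):
    """Return the deepest clade whose entire path of markers is derived.

    Because the Y chromosome does not recombine, a sample that is derived
    for a marker must also be derived for every marker above it in the tree.
    Returns None when no clade in the toy tree fits.
    """
    derived = set(derived_markers)
    best = None
    for clade in TOY_TREE:
        path = lineage(clade)
        if all(TOY_TREE[c][1] in derived for c in path):
            if best is None or len(path) > len(lineage(best)):
                best = clade
    return best


if __name__ == "__main__":
    print(assign_clade({"M89", "M9", "M214", "M175"}))
    print(assign_clade({"M89", "M69"}))
    print(assign_clade({"M9"}))
```

A sample derived for M9 but not for M89 is left unassigned because it contradicts the tree, which in practice would prompt re-typing of the sample.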

The Y lineages on the phylogenetic tree contain 20 major haplogroup

clades designated A-T (Figure II). Karafet et al., (2008), refer to these as paragroups

in order to differentiate them from the 311 haplogroups that are identified by terminal

mutations, but earlier studies use these terms interchangeably. Y chromosomes

identified by STRs are designated as haplotypes, and those that are defined by the

combination of biallelic markers and STRs are called lineages as proposed by de

Knijff (2000). A brief description of the salient features of major Y haplogroup clades

follows:

HAPLOGROUP A:

Haplogroup or clade A* contains 12 additional haplogroup branches, all

restricted to Africa (Hammer et al., 2001; Underhill et al., 2001). All individuals that

fall in this group carry the ancestral state for M42, M94, M139, and SRY10831.1 and

derived state for M91 and P97 (Karafet et al., 2008). The M91 lineage is subdivided

into three main haplogroups characterized by derived alleles for the markers P108, M6

and M32. These haplogroups have been mainly observed in the Khoisan and Bantu

speakers from South Africa, Pygmies from Central Africa and in the Sudanese,

Ethiopian and Mali populations of East Africa (Hammer et al., 2001; Semino et al.,

2002; Underhill et al., 2001; Wood et al., 2005).

HAPLOGROUP B:

Clade B* haplogroups are characterized by derived alleles for the M60

SNP. They are also derived for the markers M42, M94 and M139. All 17 branches

of clade B* are frequently found in sub-Saharan Africa. The major sub-clades are

B1*, defined by M236, and B2*, defined by M182. Sub-clade B1a, defined

by the M146 marker, is mainly found in Mali. The B2* cluster has several

haplogroups, one of which, B2a*, is derived for the marker M150 and is frequently

observed in East Africa (Sudan and Ethiopia). The B2b* (M112 or M192 derived Y

chromosomes) are found in Pygmies from central and southern Africa (Cruciani et

al., 2002; Hammer et al., 2001; Jobling and Tyler-Smith, 2003; Semino et al., 2002;

Underhill et al., 2001; Wood et al., 2005).

The distribution and expansion of clades A* and B* suggest that these Y

chromosomes spread very early within the African continent, a view supported by the

palaeo-anthropological record of human population expansions throughout Africa,

north and south of the Sahara Desert, eventually reaching the Levant about 130,000-

90,000 years ago (Lahr and Foley, 1998).

HAPLOGROUP C:

A total of 30 mutations and 19 haplogroups are currently reported for this

clade. It is defined by five mutations, the hallmark being the synonymous RPS4Y711

C to T transition (also referred to as M130) in the exon of the RPS4Y gene, which was

among the earliest Y chromosomal polymorphisms identified (Fisher

et al., 1990). This clade has not been found in sub-Saharan Africa and the mutations

defining this haplogroup probably occurred in Asia after the migration of modern

humans out of Africa. Walter et al., (2000) have suggested that this mutation

originated in south Asia about forty to fifty thousand years ago with the dispersal of

modern humans from the Horn of Africa via a coastal or interior route towards south

Asia. The haplogroup is frequent in populations from Central and East Asia. It is

also found in many indigenous Australasian and Polynesian populations and the

Native American Indian tribes (Capelli et al., 2001; Hammer et al., 2001; Hudjashov

et al., 2007, Karafet et al., 2001; Kayser et al., 2006; Ke et al., 2001; Kivisild et al.,

2003; Scheinfeldt et al., 2006; Underhill et al., 2001; Zegura et al., 2004).

HAPLOGROUPS D and E:

A Y Alu polymorphism (YAP) defines these haplogroups. All Y chromosomes

belonging to these branches have an Alu insertion. Clade D* is restricted to Asian

populations mainly in Japan and Tibet (Su et al., 2000; Karafet et al., 2001). The 15

haplogroups that are part of this clade are all characterized by the presence of M174

T to C transition (Underhill et al., 2000). These are scattered throughout south East

Asia and among Andaman Islanders (Hammer et al., 2006; Thangaraj et al., 2003).

Clade E* is more mutationally diverse and widespread with 56 distinct

haplogroups (Karafet et al., 2008). Y chromosomes belonging to clade E* have been

found in Africa, Levant, Europe, Central and South Asia (Hammer et al., 1998;

Underhill et al., 2001). Clade E* haplogroups are derived for several markers

including M96 and SRY4064. The major sub-clades are E1* and E2* that are

characterized by derived alleles for P147 and M75. The topology and nomenclature

of this branch has been recently revised with the discovery of several novel

mutations. Important sub-clades of E* include E1b1* that is derived for the P2

polymorphism and accounts for 80% of clade E haplogroups. M2 or sY81 derived

haplogroups (E1b1a*) are present at high frequencies in sub-Saharan Africa, whereas

the E1b1b* haplogroups defined by the M215 mutation are frequently observed in

north and east Africa, the Mediterranean basin and Europe (Hammer et al.,

1997). It has been suggested that clade E* haplogroups were spread by the Bantu

farmers during the Neolithic period (Passarino et al., 1998; Scozzari et al., 1999).

The representatives of these haplogroups traveled from the Middle East to southern

Europe and northern India and Pakistan (Cruciani et al., 2002; Hammer et al., 1998;

Semino et al., 2004; Sims et al., 2007; Underhill et al., 2001).

HAPLOGROUP F:

M168 derived haplogroups that carry the derived allele for the M89 C to T

transition are frequent in non-African populations. Besides M89 and M213 (Underhill

et al., 2000) this clade is now also identified by several markers discovered by

Hammer et al., (2001). The haplogroup probably arose in East Africa about 45,000

years ago and dispersed to Eurasia through the Levantine corridor. Underhill et al.,

(2001) have suggested that the African ancestors first migrated to the Middle East

around 40,000 years ago and eventually expanded towards the west, east and north

giving rise to several major clades (G-T) of the Y phylogenetic tree. Paragroup F* is

found mainly on the Indian subcontinent and in Sri Lanka (Kivisild et al., 2003;

Sengupta et al., 2006).

HAPLOGROUP G:

Characterized by the M201 and P257 mutations this haplogroup is present in

South East Europe, the Mediterranean region, Anatolia, West and Central Asia

(Behar et al., 2004; Cinnioglu et al., 2004; Jobling and Tyler-Smith, 2003; Regueiro et

al., 2006; Sengupta et al., 2006) and North Caucasus (Nasidze et al., 2003).

HAPLOGROUP H:

Found almost exclusively in the Indo-Pak subcontinent these haplogroups are

characterized by M69, a T to C mutation. The 10 currently identified haplogroups

within this clade are separated into two major clusters: H1* and H2*. H1* clade is

defined by the M52 A-C transversion whereas the H2* haplogroup is characterized

by the Apt G to A transition. Both have been observed in India but only H1* has

been reported in populations from Pakistan (Jobling and Tyler-Smith, 2003; Karafet

et al., 2005; Sengupta et al., 2006).

HAPLOGROUP I:

It is one of the major clades found in European populations and defined

initially by the M170 A-C transversion. It is thought that this mutation was acquired

during the early expansion of Levantine populations towards the west. Clade I

comprises 16 haplogroups. It is found at high frequency in North Europeans

(Hammer et al., 2001; Jobling and Tyler-Smith, 2003; Rootsi et al., 2004).

HAPLOGROUP J:

This is one of the major clades and is defined by the 12f2a and, more recently, the

M304 deletion and P209 marker (Karafet et al., 2008). It has two main branches J1*

which is M267 derived and J2* which is derived for M172 (Cinnioglu et al., 2004;

Underhill et al., 2000). The J* clade and its branches probably arose in the Middle

East and Anatolia (Turkey) from where they spread to west Asia and Eurasia

(Hammer et al., 2000; Semino et al., 2004). It is frequent in both India and Pakistan

(Mohyuddin et al., 2006).

HAPLOGROUP K:

This haplogroup is a heterogeneous clade characterized by derived alleles for the M9 (C-G

transversion) marker (Underhill et al., 1997). Its low incidence in Africa suggests

that the mutation occurred after the migration out of Africa. A recent survey by

Karafet et al., (2008) demonstrated derived states for an additional three markers

(P128, P131 and P132) for this haplogroup. The K1 branch derived for M147 has

been observed in populations from the Indo-Pak subcontinent (Underhill et al., 2001).

The K2 branch has been re-designated as haplogroup T* (Karafet et al., 2008).

HAPLOGROUP L:

The L* lineage probably arose in West Asia in a pre-Holocene era and was

initially identified in samples from the Indus Basin in Pakistan (Underhill et al., 2000).

One branch L1 (derived for M27 and M76) probably arose in the Indo-Pak

subcontinent. It is absent in North-East India and found at a low frequency in Central

India and the Northern region of India and Pakistan. Its highest frequency, in South

India and South-West Pakistan, suggests that this lineage originated in the Indian

Peninsula (Sengupta et al., 2006). Other branches of haplogroup L* are present in

the Middle East, Central Asia, Northern Africa, and Europe and along the

Mediterranean coast (Cinnioglu et al., 2004; Cruciani et al., 2002; Jobling and Tyler-

Smith 2003; Sengupta et al., 2006).

HAPLOGROUP M:

Characterized by the P256 SNP this clade is predominantly found in

Indonesia, Melanesia, Papua and New Guinea (Capelli et al., 2001; Hurles et al.,

2002; Kayser et al., 2006; Scheinfeldt et al., 2006; Su et al., 2000). Currently 20

mutations characterize the 12 haplogroups found within this branch (Karafet et al.,

2008).

HAPLOGROUPS N and O:

The A to G transition of M214 identifies the ancestor of two major

clades, N* and O*. M231 and LLY22g characterize clades N* and N1*,

and the M175 deletion characterizes clade O* (Cinnioglu et al., 2004). Haplogroup N* probably

originated in Asia but is now predominantly found in European populations (Karafet

et al., 2001; Rootsi et al., 2007).

Clade O* is found at high frequency in East Asians. A major branch of this

clade is characterized by the Y-SNP O3*-M122 and it predominates in East Asia and

is found in a majority of the Chinese population. The microsatellite diversity in this

sub-haplogroup is highest in the South Chinese population, indicating that it appeared there

before expanding northwards approximately 25,000-30,000 years ago (Shi et al.,

2005).

HAPLOGROUPS P, Q and R:

Clade P* is defined by the presence of 92R7, M45, M74 and several other

SNPs, and its members are derived for the M9 mutation as well. This clade includes several major

groups that are prevalent in various world populations.

Haplogroup Q* (derived for the C to T M242 mutation) probably arose in

Central Asia from where these chromosomes spread throughout the world (Seielstad

et al., 2003). These Y chromosomes are found at high frequency in North Eurasia

and Siberia (Karafet et al., 2002) and at lower frequencies in Europe, East Asia and

the Middle East. One major branch of this haplogroup (Q1a3a*-M3) is almost

exclusively restricted to the Native Americans (Zegura et al., 2004).

Eight mutations, including the M207 A-G SNP, represent clade R. This clade

is further divided into two sub-clusters, R1*-M173 and R2-M124. It is assumed

that around 30,000 years ago chromosomes carrying the R*-M207 mutation expanded westwards to reach

Europe, Caucasus, Middle East, Central Asia, northern India and Pakistan. The R1*

haplogroup is one of the most common in Europe and west Asia and probably

originated in central Asia. The R1a1*-M17 clade that is characterized by deletion of

the G nucleotide (Underhill et al., 1997) is frequently found in south-west Pakistan

and north India (Jobling and Tyler-Smith, 2003).

HAPLOGROUPS S and T:

A re-examination of the Y phylogenetic tree led to the addition of haplogroups

S* and T*, characterized by markers M230 and M184, respectively (Karafet et al.,

2008). Haplogroup S* chromosomes were previously characterized as K-M230 while

those now belonging to clade T* were previously identified as haplogroup K-M70 (Y

Chromosome Consortium, 2002).

Clade S* lineages are also identified by P202 and P204 markers and are found in

Oceania and Indonesia (Kayser et al., 2006; Scheinfeldt et al., 2006). Clade T* that

is also characterized by M70, M193 and M272 is further delineated by M320 and P77

and has been observed in the Middle East, Africa, and Europe (Underhill et al., 2001;

King et al., 2007).

MATERIALS AND METHODS

-4-
COLLECTION OF SAMPLES:

For this study, blood samples were collected from 1,213 unrelated male

subjects belonging to sixteen ethnic groups of the Pakistani population. Informed

consent was obtained from all participants included in this study. Ethnicity of the

sampled individuals was confirmed prior to collection.

From each individual, 10 ml of blood was collected in Vacutainer tubes (Becton

Dickinson, Mountain View, CA). The 66 Baloch and 117 Brahui samples were

collected from Quetta and Kalat Division in Baluchistan. The 97 Burusho samples

were collected from Hunza and Nagar in the Northern Areas. The 224 Hazara samples

were collected from the Parachinar and Quetta areas, and the 44 Kalash samples

from Chitral Division. The 90 Parsi and 14 Balti samples were collected from

Karachi. The 96 Pathan samples were collected from the North-West Frontier

Province, and the 138 Sindhi samples from Sukkur in Sindh. The 16 Meo, 10 Rajput

and 159 Gujar samples were collected from rural areas of Punjab Province. The 27

Makrani-Baloch, 33 Makrani-Negroid and 70 Mohanna samples were collected from

the interior of Sindh Province. The 12 Kashmiri samples were collected from

Muzaffarabad (Kashmir).

The 77 Greek DNA samples were provided by Dr. Myrto Papaioannou (Unit of

Prenatal Diagnosis, Center for Thalassemia, Laiko General Hospital, Athens,

Greece).
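
The Pakistani sample counts given in this section can be cross-checked with a short tally. The sketch below is illustrative only: the dictionary layout and function names are assumptions introduced here, while the counts are those stated in the text. It confirms that the sixteen groups sum to 1,213 samples and that the three Punjabi castes (Gujar, Meo and Rajput) together contribute 185.

```python
# Sample counts per ethnic group, as stated in the sample collection section.
SAMPLES = {
    "Baloch": 66, "Brahui": 117, "Burusho": 97, "Hazara": 224,
    "Kalash": 44, "Parsi": 90, "Balti": 14, "Pathan": 96,
    "Sindhi": 138, "Meo": 16, "Rajput": 10, "Gujar": 159,
    "Makrani-Baloch": 27, "Makrani-Negroid": 33, "Mohanna": 70,
    "Kashmiri": 12,
}

# Castes that are pooled as "Punjabi" in the Results chapter.
PUNJABI_CASTES = ("Gujar", "Meo", "Rajput")


def total_samples(counts):
    """Total number of Pakistani samples across all groups."""
    return sum(counts.values())


def punjabi_total(counts):
    """Number of samples contributed by the pooled Punjabi castes."""
    return sum(counts[caste] for caste in PUNJABI_CASTES)


if __name__ == "__main__":
    print(len(SAMPLES), total_samples(SAMPLES), punjabi_total(SAMPLES))
```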

PREPARATION OF EPSTEIN-BARR VIRUS FROM B95-8 CELLS:

The Epstein-Barr Virus (EBV) producing B95-8 marmoset cell line (American

Type Culture Collection, Manassas, VA) was suspended (5 × 10⁶ cells) in 10 ml of

wash medium, which consisted of RPMI-1640 (Sigma-Aldrich, St. Louis, MO)

supplemented with 1% fetal calf serum (FCS; Biochrom AG, Berlin, Germany) and

1X GPPS (2 mM L-glutamine, 100 U/ml penicillin, 1 mM sodium pyruvate and 50

µg/ml streptomycin), and centrifuged in an IEC-HN-SII bench top centrifuge

(International Equipment Company, Needham, MA) at 1000 rpm (300 × g) for 10

minutes. The supernatant was decanted and the pellet was washed twice in 5 ml of

wash medium followed by centrifugation at 1000 rpm for 10 min. The cells were

transferred into a 25 cm² culture flask (Corning, Corning, NY) containing RPMI-1640

medium supplemented with 1X GPPS and 10% FCS. The flask was incubated at

37 °C in a humidified atmosphere of 93% air and 7% CO2. The culture was gradually

expanded and split first into 75 cm² and finally into 125 cm² flasks. When the

medium in the culture flasks became yellow, the flasks were incubated at 34 °C without any

additional medium supplementation for 7 days to enhance EBV production. On the

8th day the cell pellet was removed from the suspension by centrifugation at 1000

rpm for 10 minutes. The supernatant containing EBV was filtered through a 0.45 µm

Millipore membrane filter (Nilsson, 1976). The EBV supernatant was aliquoted into

cryovials (Corning, Corning, NY) and stored at -70 °C until use. A 1 ml aliquot of this

preparation was able to transform human B lymphocytes.

PREPARATION OF LYMPHOCYTES:

For the isolation of peripheral blood mononuclear cells (PBMC),

approximately 5 ml venous blood was collected in acid citrate dextrose (ACD)

Vacutainer tubes (Becton Dickinson, San Jose, CA). The blood was layered over 3 ml

Histopaque-1077 (Sigma-Aldrich) in a sterile 15 ml polypropylene conical tube

(Corning, Corning, NY). Each sample was centrifuged at 2000 rpm (400 × g) for 20

minutes. The upper plasma layer was aspirated and PBMC were collected from the

interface between the plasma and Histopaque and transferred into another sterile 15

ml tube containing 10 ml wash medium and centrifuged at 1000 rpm for 10 minutes.

The supernatant was decanted and the cell pellet washed twice with 5 ml wash

medium and resuspended in 1 ml of wash medium (Boyum, 1968). Cell viability was

checked by the trypan blue exclusion test.

CELL COUNTING BY TRYPAN BLUE EXCLUSION TEST:

Cell viability was calculated by the trypan blue exclusion test as described by

Kruse (1973). An equal volume (10 µl) of cell suspension was mixed with 0.16%

(w/v) trypan blue solution in physiological saline. Cells were counted using a

haemocytometer. Unstained live and blue stained dead cells were counted in the

central 1 mm square of the counting chamber. The cell viability was calculated by the

following formula:

Cell viability (%) = (number of live cells / total number of cells) × 100.

The total number of live cells per ml was calculated as follows:

Live cells per ml = number of live cells × 2 (dilution factor) × 10⁴.
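
The two formulas above can be written as a minimal sketch. The function names and the example counts are assumptions introduced for illustration; the dilution factor of 2 and the chamber factor of 10⁴ are those given in the text.

```python
def viability_percent(live_cells, dead_cells):
    """Percentage of unstained (live) cells among all counted cells."""
    total = live_cells + dead_cells
    if total == 0:
        raise ValueError("no cells were counted")
    return 100.0 * live_cells / total


def live_cells_per_ml(live_cells, dilution_factor=2, chamber_factor=10_000):
    """Live cells per ml from the count in the central 1 mm square."""
    return live_cells * dilution_factor * chamber_factor


if __name__ == "__main__":
    # Hypothetical count: 180 unstained and 20 blue-stained cells.
    print(viability_percent(180, 20))   # percentage of live cells
    print(live_cells_per_ml(180))       # live cells per ml
```

With these hypothetical counts the culture would pass the >90% viability threshold only marginally (it sits exactly at 90%), which shows why the count is worth recording alongside the percentage.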

ESTABLISHMENT OF EBV TRANSFORMED LYMPHOBLASTOID CELL LINES:

In order to preserve and obtain an inexhaustible supply of each individual's DNA,

human lymphoblastoid cell lines were established. Approximately 4-5 × 10⁶ PBMCs

were transferred to a 25 cm² culture flask containing 3 ml transformation medium

(RPMI-1640, 10% FCS, 1X GPPS, 0.05 mM beta- g/ml

cyclosporin A) and 1 ml EBV supernatant prepared earlier. The flask was incubated

at 37 °C in a humidified atmosphere of 93% air and 7% CO2, keeping the cap of the flask

slightly loose (Walls and Crawford, 1987). The culture was visualized periodically

under an inverted microscope. After 5-6 days when colonies formed and the culture

medium became acidic, the culture was fed with feeding medium (RPMI-1640, 10-

15% FCS and 1X GPPS). When the transformed cell density in a culture flask had

suitably increased, half of the culture was transferred into a 75 cm² culture flask and

expanded for cryogenic preservation and DNA preparation.

CRYOPRESERVATION OF CELL LINES:

For cryogenic preservation, cell viability was checked by the trypan blue

exclusion test as described earlier. Only cultures with cell viability > 90% were

frozen. The volume of cell suspension containing 5 × 10⁶ cells was centrifuged at

1000 rpm for 10 minutes. The supernatant was decanted and the cell pellet was

resuspended in 1 ml of freezing mix (45% RPMI-1640, 45% FCS and 10%

dimethylsulphoxide (DMSO; BDH, Poole, UK)) and transferred to a 1.2 ml cryogenic

vial. The vial was kept in a polystyrene box at -70 °C overnight so that the

temperature decreased gradually. The following day the vial was transferred to the

vapour phase of the liquid nitrogen cryo-storage system (Jencons, Leighton Buzzard,

UK) for long term storage.

EXTRACTION OF CELLULAR DNA:

For the isolation of total genomic DNA a modified organic method was used

(Maniatis et al., 1982). Approximately 5 × 10⁷ lymphoblastoid cells established from

each individual were pelleted into a sterile 50 ml polypropylene centrifuge tube. To

the cell pellet 19 ml STE buffer (100 mM sodium chloride, 50 mM Tris and 1 mM

EDTA; pH 8.0) was added. Next 1 ml of 10% sodium dodecyl sulphate (SDS) was

added dropwise with gentle vortexing, followed by 20 µl of proteinase K (20 mg/ml).

The samples were incubated overnight in a shaking water bath at 55 °C and extracted

the following day with an equal volume of Tris-equilibrated phenol (pH 8.0). The

samples were mixed for 10 minutes, placed on ice for 10 minutes and then

centrifuged in an MSE 3000i (Mistral, UK) at 4 °C for 40 minutes at 3200 rpm. The

aqueous layer containing the nucleic acid was collected in a fresh, labeled 50 ml

centrifuge tube. The next extraction was done by adding an equal volume of chilled

24:1 (v/v) chloroform:isoamyl alcohol. The samples were mixed and the aqueous

layer was collected in a fresh 50 ml tube. For precipitation of nucleic acids, 1/10

volume of 10 M ammonium acetate and an equal volume of chilled isopropanol were

added and mixed until white precipitates formed. These samples were stored

overnight at -20 °C or at -70 °C for 15-20 minutes. Samples were then centrifuged at 3200

rpm for 90 minutes to pellet the nucleic acid and the pellet was washed with 5 ml of

chilled 70% ethanol. The pellets were vacuum dried for 10 minutes. To the pellets,

1 ml Tris-EDTA (TE; 10 mM Tris, 1 mM EDTA; pH 8) was added and the samples

were incubated at 37 °C for 1 hour to resuspend the pellets. To digest the RNA, 10 µl

of RNase A (10 mg/ml) was added to the samples and they were incubated at 37 °C

for 2 hours in a shaking water bath. The RNase was subsequently removed by

adding 50 µl of 10% SDS and 5 µl of proteinase K and incubating at 55 °C for 1 hour

in a shaking water bath. At this point the samples could be stored at 4 °C until further

extraction. Subsequent extraction was carried out by adding 6 ml TE to each sample

before extracting successively with an equal volume of phenol and of

chloroform:isoamyl alcohol. For precipitation of DNA, 1/10 volume of 10 M ammonium acetate

and an equal volume of chilled isopropanol were added. The samples were mixed

until the DNA was seen and stored at -20 °C overnight or at -70 °C for 15-20 minutes.

DNA was pelleted and washed with 5 ml of 70% chilled ethanol. The pellet was

vacuum dried for 10 minutes and the DNA was resuspended in 1 ml of 10 mM Tris-

HCl (pH 8).

The optical density (OD) of the samples was measured at 260 nm and 280 nm (ideally

260/280 ratio = 1.8) using a Hitachi U3210 spectrophotometer (Hitachi, Tokyo,

Japan). The quantity of DNA was calculated by the following formula:

DNA concentration (µg/ml) = Abs 260 × dilution factor × correction factor.

A dilution factor of 50 was usually employed and the correction factor for double

stranded DNA is 50. If the OD260/OD280 ratio was 1.7-2.0, the DNA was considered pure,

free of contaminating phenol or proteins and suitable for further analysis. Each sample

was kept in an appropriately labeled microcentrifuge tube and stored at 4 °C until use.
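
As a worked illustration of the quantification step, the sketch below applies the formula and the purity window stated above. The function names and the example absorbance readings are hypothetical; the default dilution factor (50), the correction factor for double stranded DNA (50) and the accepted ratio range (1.7-2.0) come from the text.

```python
def dna_concentration_ug_per_ml(abs_260, dilution_factor=50, correction_factor=50):
    """DNA concentration (µg/ml) = Abs260 x dilution factor x correction factor."""
    return abs_260 * dilution_factor * correction_factor


def is_acceptably_pure(abs_260, abs_280, low=1.7, high=2.0):
    """True when the OD260/OD280 ratio lies within the accepted window."""
    if abs_280 <= 0:
        raise ValueError("Abs280 must be positive")
    return low <= abs_260 / abs_280 <= high


if __name__ == "__main__":
    # Hypothetical readings for one diluted sample.
    print(dna_concentration_ug_per_ml(0.2))   # concentration in µg/ml
    print(is_acceptably_pure(0.2, 0.11))      # ratio of about 1.82
```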

Some DNA samples were also directly prepared from the blood sample. The

procedure for the extraction of the DNA from blood was the same as above with
some minor modifications. Initially the blood was mixed with the cell lysis buffer

(0.15 M ammonium chloride, 0.01 M potassium bicarbonate and 0.1mM of 0.5M

EDTA; pH 8.0) and kept on ice for 30 minutes. The samples were centrifuged for 10

minutes at 1200 rpm. The pellets were again washed with 10 ml of lysis buffer and

centrifuged for 10 minutes at 1200 rpm. To this pellet 4.75 ml of STE buffer was

added along with 250 µl of 10% SDS (dropwise with gentle vortexing) followed by 10

µl of proteinase K. The tube was incubated overnight in a rotary water bath at 55 °C.

The next day, the samples were extracted using phenol and chloroform:isoamyl

alcohol as described earlier. After this first extraction, 10 µl of RNase A (10 mg/ml)

was added and the samples were incubated at 37 °C for 2 hours. After 2 hours the

samples were again treated with 250 µl of 10% SDS and proteinase K and incubated

at 55 °C for 1 hour. Subsequent extraction and precipitation were the same as

described for lymphoblastoid cell lines.

PHENOL EQUILIBRATION:

Analytical grade phenol (BDH) was redistilled at 160 °C to remove

contaminants that cause breakdown or cross linking of nucleic acids. Aliquots of

200-500 ml distilled phenol were stored at -20 °C. Before use, the phenol was melted

at 55-70 °C and 8-hydroxyquinoline was added as an antioxidant and RNase inhibitor at a

final concentration of 0.1% (w/v). The melted phenol was extracted once with an

equal volume of 1.0 M Tris buffer (pH 8.0) and 3 to 4 times with 0.1 M Tris (pH 8.0).

This equilibrated phenol was stored at 4 °C in equilibration buffer (0.1 M Tris) to which

0.2% β-mercaptoethanol (v/v) was added. Under these conditions it was stable for

approximately one month (Maniatis et al., 1982).

GENOTYPING OF Y MARKERS BY POLYMERASE CHAIN REACTION (PCR):

Polymerase chain reaction was first described in 1985 (Saiki et al., 1985) and

the method was extensively employed in this study to amplify the desired fragments of

the Y chromosome from genomic DNA. The 93 Y markers that were genotyped in this

study are shown in Table II and a brief overview of the various methods used to

detect them follows:

AMPLIFICATION REFRACTORY MUTATION SYSTEM (ARMS) PCR:

The ARMS PCR technique is a simple method for the detection of single base

mutations. In this allele specific PCR the genomic DNA is only amplified when a

specific allele is present. Two sets of reactions are run in parallel using three types

of primers, one of which is common in both reactions. One set consists of the

common primer and a primer that is specific for the normal sequence. The other

contains the common primer and another that is specific for the mutant sequence.

The principle is that the extension of a primer by DNA polymerase is dependent upon

correct base pairing at the 3' end.
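
The read-out of the two parallel ARMS reactions can be summarized as a small decision table. The sketch below is illustrative only: the function name and the labels returned for the "both bands" and "no band" outcomes are assumptions introduced here, not part of the protocol described above. Because the Y chromosome is haploid, only one of the two allele-specific reactions would be expected to yield a product for a given sample.

```python
def call_arms_genotype(normal_band, mutant_band):
    """Interpret the presence of a product in the two allele-specific reactions.

    normal_band: product seen with the common + normal-specific primer pair.
    mutant_band: product seen with the common + mutant-specific primer pair.
    """
    if normal_band and not mutant_band:
        return "normal"
    if mutant_band and not normal_band:
        return "mutant"
    if normal_band and mutant_band:
        # Not expected for a single-copy Y marker (assumed handling):
        # flag the sample for retyping.
        return "ambiguous"
    # No product in either reaction: treated here as a failed amplification.
    return "failed"


if __name__ == "__main__":
    for bands in [(True, False), (False, True), (True, True), (False, False)]:
        print(bands, call_arms_genotype(*bands))
```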

AMPLIFICATION FRAGMENT LENGTH POLYMORPHISM (AFLP) PCR:

The AFLP PCR is based on the principle that a base change results in the

creation or abolition of a restriction site. PCR primers are designed from sequences

flanking the restriction site to produce a 100-500 base pair product. The amplified

product is subsequently digested with the appropriate restriction enzyme and

fragments are analyzed by agarose gel electrophoresis. The SNPs typed by the AFLP

method are listed in Table III.
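
The principle can be illustrated with a small in-silico digest. In the sketch below the 50 bp amplicon is a made-up sequence, not a real marker, and the function name is an assumption; Hind III (recognition site AAGCTT, cutting after the first A) is used because it appears in Table III. Only the top strand is scanned, which is sufficient for palindromic sites such as this one.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of `site`.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    sequence, site = sequence.upper(), site.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 50 bp amplicons differing by a single T to C change
# that abolishes the Hind III site.
WITH_SITE = "GGCC" * 5 + "AAGCTT" + "ATAT" * 6
WITHOUT_SITE = "GGCC" * 5 + "AAGCTC" + "ATAT" * 6

if __name__ == "__main__":
    print(digest(WITH_SITE, "AAGCTT", 1))      # two fragments
    print(digest(WITHOUT_SITE, "AAGCTT", 1))   # uncut product
```

On a gel, the allele that retains the site would appear as two shorter bands and the allele that has lost it as a single full-length band.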

Table II: A list of Y haplogroups, markers, type of polymorphism and genotyping methods used in this study. Y haplogroups were

determined in a hierarchical manner, screening initially with markers that identified deep lineages (bold) and subsequently genotyping

markers that further delineated the tree in the target population. The typing methods were amplified fragment length polymorphism

(AFLP), denaturing high performance liquid chromatography (DHPLC), amplification refractory mutation system polymerase chain reaction

(ARMS-PCR) or dideoxy DNA sequencing (Seq).

Haplogroup Marker   Polymorphism Method  Haplogroup Marker   Polymorphism Method  Haplogroup Marker   Polymorphism Method
A          M91      del T        DHPLC   H1b        M97      T→G          DHPLC   O1b        M110     T→C          Seq
A1         M31      G→C          DHPLC   H2         Apt      G→A          AFLP    O2         P31      T→C          Seq
A2         M6       T→C          DHPLC   I          M170     A→C          ARMS    O2a1       M88      A→G          Seq
A2         PK1      C→A          AFLP    J          12f2     del          PCR     O2a1       M111     del TT       Seq
A3a        M32      T→C          DHPLC   J1         M267     T→G          ARMS    O2a1a      PK4      A→T          DHPLC
B          M60      ins T        DHPLC   J1a        M62      T→C          ARMS    O2b        SRY+465  C→T          AFLP
B2a        M150     C→T          DHPLC   J2         M172     T→G          ARMS    O3         M122     T→C          ARMS
B2a1       M109     C→T          DHPLC   J2a1b      M67      A→T          ARMS    O3a3       L1Y      LINE1 ins    PCR
B2a1       M152     C→T          DHPLC   J2a1b1     M92      T→C          ARMS    O3a5       M134     del G        DHPLC
B2a1       M218     C→T          DHPLC   J2b        M12      G→T          ARMS    O3a5a      M117     del ATCT     DHPLC
C          RPS4Y    C→T          AFLP    K          M9       C→G          AFLP    O3a5a      M133     del T        DHPLC
C1         M8       G→T          Seq     K1         M147     ins T        Seq     P          92R7     C→T          AFLP
C2         M38      T→G          Seq     K4         M177     C→T          Seq     P          M45      G→A          DHPLC
C3         M217     A→C          Seq     L          M20      A→G          AFLP    P          M74      G→A          DHPLC
C3         PK2      T→C          ARMS    L          M11      A→G          AFLP    Q          M242     C→T          ARMS
C3C        M48      A→G          ARMS    L          M185     C→T          DHPLC   Q2         M25      G→C          DHPLC
DE         YAP      Alu ins      PCR     L1         M27      C→G          ARMS    Q2         M143     G→T          DHPLC
E          SRY-8299 G→A          AFLP    L1         M76      T→G          DHPLC   R          M207     A→G          ARMS
E3a        sY81     A→G          AFLP    L2         M317     del GA       DHPLC   R1         M173     A→C          ARMS
E3b1       M35      G→C          ARMS    L2         M349     G→T          DHPLC   R1a1       M17      del G        ARMS
E3b1a      M78      C→T          ARMS    L3         M357     C→A          DHPLC   R1a1       SRY-1532 A→G→A        AFLP
E3b1a1     M148     A→G          DHPLC   L3a        PK3      T→C          ARMS    R1a1a      M56      A→T          ARMS
E3b1c      M123     G→A          ARMS    NO         M214     A→G          ARMS    R1a1b      M157     A→C          DHPLC
E3b1c2     M136     C→T          DHPLC   N          LLY22g   C→A          AFLP    R1a1c      M87      T→C          DHPLC
F          M89      C→T          ARMS    N          M231     G→A          DHPLC   R1a1d      PK5      C→T          AFLP
G          M201     G→T          ARMS    N3         TAT      T→C          AFLP    R1b2       M73      del GT       DHPLC
G2a        P15      C→T          DHPLC   O          M175     del TTCTC    Seq     R1b3F      SRY-2627 C→T          AFLP
H          M69      T→C          DHPLC   O1         M119     A→C          DHPLC   R1c        M343     C→A          ARMS
H1         M52      A→C          ARMS    O1a        M101     C→T          DHPLC   R2         M124     C→T          ARMS
H1         M82      del AT       DHPLC   O1b        M50      T→C          DHPLC   T          M70      A→C          ARMS
H1a        M36      T→G          DHPLC   O1b        M103     C→T          DHPLC   T          M193     ins CAAA     DHPLC

Table III: List of SNPs typed by AFLP method.

SNO Markers Restriction Enzyme

1 Apt Hae III

2 Lly22g HindIII

3 M9 Hinf I

4 M11 Msp I

5 M17 Afl III

6 M20 Ssp I

7 PK1 Psp1406 I

8 PK5 Mnl I

9 RPS4Y Bsl I

10 SRY+465 FnuH I

11 SRY 1532 Dra III

12 SRY2627 Ban I

13 SRY8299 BsrBI

14 Sy81 Nla III

15 TAT Mae II

16 92R7 Hind III

PREPARATION OF AGAROSE GEL:

6 g of agarose (molecular biology grade; Sigma Chem. Co)

was mixed in 300 ml of TAE electrophoresis buffer (0.04 M Tris-acetate and 0.01 M

EDTA / liter) to make a 2% (w/v) agarose gel. The agarose was melted in a

microwave oven keeping the cap of the bottle loose. When the agarose was dissolved

completely, 5 µl (0.5 µg/ml) ethidium bromide (Sigma-Aldrich, St. Louis, USA) was

added and mixed thoroughly. The gel was placed in a shaking water bath at 55 °C for

20-25 minutes. A gel tray was sealed with rubber clamps and placed on a level

horizontal surface. The required combs were placed at appropriate positions (0.5-

1.0mm above the base of the gel). The gel was poured into the gel tray. After the gel

solidified, the combs and clamps were removed from the gel tray. The gel was placed

in an electrophoresis tank containing appropriate 1X TAE electrophoresis buffer.

Orange G loading dye (0.125% orange G, 20% Ficoll, 100mM EDTA) was

added to each sample and the samples were loaded on the gel. A 100 bp ladder

(Promega) was loaded in the first well. Electrophoresis was carried out for

approximately 40 minutes at 150 volts using a power pack (3000; Bio-Rad

Laboratories). Photographs were taken on a UV transilluminator using the Syngene

system (Bio imaging system, Cambridge, UK).

MULTIPLEX PCR:

Each sample was PCR amplified in a multiplex reaction consisting of 4 to 5

primer pairs which were labeled either with TET, HEX or FAM (Table IV). The multiplex

PCR assay was performed in a 10 µl final volume. The reaction mixture was prepared

in two steps. In the first step, a Super Taq polymerase / TaqStart™ Antibody premix was

prepared. Briefly, 0.13 U Super Taq enzyme (HT

Biotechnology Ltd) was incubated with 2.3 µM TaqStart™ Antibody (Clontech) in the

presence of 0.874 µl per reaction of TaqStart™ Dilution buffer for 5-7 minutes at room

temperature. In the second step, the PCR master mix was prepared. Briefly, the reaction

consisted of the following: 1X Super Taq PCR Buffer 1 (10 mM Tris-HCl pH 9, 1.5 mM

MgCl2, 50 mM KCl, 0.01% gelatin and 0.01% Triton X-100), 0.7 mM MgCl2, 200 µM

dNTPs, primers (at the concentrations given in Table IV) and 1.225 µl per reaction of Super Taq

polymerase / TaqStart™ Antibody premix.

The above mixture was added to tubes containing 20 ng (1 µl) of genomic

DNA. PCR was performed using the touchdown protocol described in Ayub et al.

(2000), under the following conditions: 1 cycle of 1 minute at

94 °C; 8 cycles of 1 minute at 94 °C, 1 minute at 60 °C and 1 minute at 72 °C (the annealing

temperature was decreased by 0.5 °C in each cycle); 30 cycles of 1 minute at 94 °C, 1

minute at 56 °C and 1 minute at 72 °C; and 1 cycle of 5 minutes at 72 °C.
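
The touchdown profile can be expanded into a per-cycle annealing schedule, as in the sketch below. It assumes (this is an interpretation, not stated explicitly above) that the first of the 8 touchdown cycles anneals at 60 °C and that each later touchdown cycle is 0.5 °C lower. Function and parameter names are hypothetical, and ramping time between steps is ignored.

```python
def touchdown_annealing_temps(start_c=60.0, step_c=0.5, touchdown_cycles=8,
                              final_c=56.0, final_cycles=30):
    """Annealing temperature (°C) for every cycle of the touchdown program."""
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * final_cycles


def programmed_hold_minutes(n_cycles, initial_min=1, steps_per_cycle=3,
                            step_min=1, final_min=5):
    """Sum of programmed hold times only (ramping between steps excluded)."""
    return initial_min + n_cycles * steps_per_cycle * step_min + final_min


if __name__ == "__main__":
    temps = touchdown_annealing_temps()
    print(temps[:9])                            # touchdown phase, then first 56 °C cycle
    print(len(temps), programmed_hold_minutes(len(temps)))
```

Under this reading the last touchdown cycle anneals at 56.5 °C, so the switch to the fixed 56 °C phase continues the same 0.5 °C progression.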

SAMPLE PREPARATION:

0.3 µl of amplified product was mixed with 2.7 µl of dye (0.342 µl Dextran blue,

1.5 µl formamide, 0.478 µl autoclaved deionized water and 0.38 µl TAMRA 300 or 500

internal lane size standard per reaction). Samples were denatured at 90 °C for 2 minutes

and placed on ice until loading. Samples were run on an ABI 377 DNA sequencer for

one and a half hours. The data were collected using ABI collection software. The

fragment sizes were estimated using GeneScan software (v2.1). The alleles were

called using Genotyper software (v2.0).

4% POLYACRYLAMIDE GEL PREPARATION:

5.4 g of urea was dissolved in 5 ml of autoclaved deionized water by continuous

stirring and heating. 1.5 ml of 40% acrylamide solution (19:1 acrylamide:bis-acrylamide)

and 2-3 g of mixed bed ion-exchange resin were added to the urea and mixed for 2-3

minutes. The solution was filtered through a Whatman No. 1 filter paper into a 50 ml

graduated cylinder already containing 1.5 ml of 10X TBE (Trizma base; Tris

[hydroxymethyl] aminomethane 70 g, 55 g boric acid and 9.0 g ethylene diamine tetra

acetic acid (EDTA), pH 8-8.2). The volume was made up to 15 ml and filtered through a

0.2 µm Millipore filter paper using a Millipore vacuum filtration assembly. To the filtered

solution 5 µl of 10% ammonium persulphate (APS) and 10.5 µl TEMED were added

just before pouring the gel.
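
The final concentrations in the 15 ml gel mix follow from simple dilution arithmetic, sketched below. The molecular weight of urea (60.06 g/mol) is a standard value not given in the text, and the function names are hypothetical; the masses and volumes are those of the recipe above.

```python
UREA_MW_G_PER_MOL = 60.06  # standard molecular weight of urea (assumed here)


def molarity(mass_g, mw_g_per_mol, final_volume_ml):
    """Molar concentration of a solute dissolved to a final volume."""
    return mass_g / mw_g_per_mol / (final_volume_ml / 1000.0)


def diluted_strength(stock_strength, stock_volume_ml, final_volume_ml):
    """Strength after dilution (same unit as stock: %, X, M, ...)."""
    return stock_strength * stock_volume_ml / final_volume_ml


if __name__ == "__main__":
    print(round(molarity(5.4, UREA_MW_G_PER_MOL, 15), 2))  # urea, mol/l
    print(diluted_strength(40, 1.5, 15))                   # acrylamide, %
    print(diluted_strength(10, 1.5, 15))                   # TBE, X
```

The recipe therefore gives roughly 6 M urea, 4% acrylamide and 1X TBE, consistent with the gel's name and with the 1X TBE running buffer.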

The rear and the front plates (12 cm) were washed with 1% Alconox detergent and rinsed

first with demineralized water and then with deionized water. When the plates were dry,

the rear plate was placed on the gel casting apparatus (Sequencing Gel Caster: model

SGC-1) with the inside of the plate facing up. Wet 0.2 mm spacers were placed on the

rear plate. The front plate was placed half way down on top of the rear plate. The 4 %

acrylamide solution was drawn into a 50 ml syringe and poured slowly between the two

plates. The flat edge of a 0.2 mm comb was inserted in between the plates and plates

were sealed with clamps. The plate assembly was left for 30-45 minutes for the gel to

polymerize. The comb and clamps were removed. The plate assembly was washed

with demineralized water then deionized water and left for 15- 20 minutes. The shark

tooth side of the comb was inserted so that the teeth of the comb just touched the gel.

The plates were fixed on the gel cassette then on to the sequencer. The upper and

lower buffer reservoirs were attached. Plate check was carried out to ensure that the

gel plate was clean. 1X TBE buffer was filled in upper and lower buffer reservoirs.

Before loading the samples, the gel was electrophoresed for 10 minutes.

Table IV: YSTR primer sequences.

Multiplex Primer name Primer sequence                     Dye label Final conc. (µM)
YSTR1     DYS19-L     CTA CTG AGT TTC TGT TAT AGT         TET       0.236
YSTR1     DYS19-R     ATG GCA TGT AGT GAG GAC A           -         0.236
YSTR1     DYS388-L    GTG AGT TAG CCG TTT AGC GA          TET       0.318
YSTR1     DYS388-R    CAG ATC GCA ACC ACT GCG             -         0.318
YSTR1     DYS390-L    TAT ATT TTA CAC ATT TTT GGG CC      -         0.127
YSTR1     DYS390-R    TGA CAG TAA AAT GAA CAC ATT GC      FAM       0.127
YSTR1     DYS391-L-N  CTA TTC ATT CAA TCA TAC ACC CAT AT  FAM       0.384
YSTR1     DYS391-R-N  ACA TAG CCA AAT ATC TCC TGG G       -         0.384
YSTR1     DYS392-L-N  AAA AGC CAA GAA GGA AAA CAA A       -         0.155
YSTR1     DYS392-R-N  CAG TCA AAG TGG AAA GTA GTC TGG     HEX       0.155
YSTR1     DYS393-L    GTG GTC TTC TAC TTG TGT CAA TAC     -         0.18
YSTR1     DYS393-R    AAC TCA AGT CCA AAA AAT GAG G       HEX       0.088
YSTR2     DYS389I-L   CCA ACT CTC ATC TGT ATT ATC TAT     TET       0.032
YSTR2     DYS389I-R   TCT TAT CTC CAC CCA CCA GA          -         0.032
YSTR2     DYS389II-L  CCA ACT CTC ATC TGT ATT ATC TAT     TET       0.032
YSTR2     DYS389II-R  TTA TCC CTG AGT AGT AGA AGA AT      -         0.032
YSTR2     DYS425-L    TGG AGA GAA GAA GAG AGA AAT         -         0.861
YSTR2     DYS425-R    AGC TCT ACA AGC CAT TGT GAT CT      FAM       0.861
YSTR2     DYS426-L    GGT GAC AAG ACG AGA CTT TGT G       HEX       0.30
YSTR2     DYS426-R    CTC AAA GTA TGA AAG CAT GAC CA      -         0.25
YSTR3     DYS434-L    CAC TCC CTG AGT GCT GGA TT          TET       0.2
YSTR3     DYS434-R    GGA GAT GAA TGA ATG GAT GGA         -         0.2
YSTR3     DYS437-L    GAC TAT GGG CGT GAG TGC AT          HEX       0.1
YSTR3     DYS437-R    AGA CCC TGT CAT TCA CAG ATG A       -         0.1
YSTR3     DYS435-L    AGC ATC TCC ACA CAG CAC AC          TET       0.05
YSTR3     DYS435-R    TTC TCT CTC CCC CTC CTC TC          -         0.05
YSTR3     DYS438-L    TGG GGA ATA GTT GAA CGG TAA         HEX       0.2
YSTR3     DYS438-R    GTG GCA GAC GCC TAT AAT CC          -         0.2
YSTR3     DYS436-L    CCA GGA GAG CAC ACA CAA AA          FAM       0.025
YSTR3     DYS436-R    GCA ATC CAA CTT CAG CCA AT          -         0.025
YSTR3     DYS439-L    TCC TGA ATG GTA CTT CCT AGG TTT     TET       0.2
YSTR3     DYS439-R    GCC TGG CTT GGA ATT CTT TT          -         0.2

AUTOMATED FLUORESCENT DNA SEQUENCING:

Automated sequencing (di-deoxy terminator cycle sequencing) was carried out

using an ABI 377 DNA Sequencer and the dye terminator cycle sequencing ready

reaction kit (version 3.1; Applied Biosystems).

DNA was amplified by polymerase chain reaction in a 50 µl reaction volume.

The reaction mixture contained 1X PCR buffer II, 1.5 mM MgCl2, 100 µM dNTPs, 1 U

Taq DNA polymerase, 1.0 µM of each primer (forward and reverse) and 40 ng DNA

template. The following PCR cycling conditions were used for the amplification: 1 cycle

of 4 minutes at 95 °C; 35 cycles of 1 minute at 95 °C, 1 minute at the annealing temperature (which depended on

the primer and is given in Appendix I) and 1 minute at 72 °C; and 1 cycle of 10 minutes at 72 °C.

Amplified PCR products were first checked on 2% agarose gel. The amplified

product was precipitated with 50 µl of 95% ethanol. The sample was then washed with

200 µl of 70% ethanol and the pellets were resuspended in autoclaved deionised

water. If required, PCR products were also purified with the QIAquick PCR product

extraction kit (Qiagen) according to the manufacturer's instructions. The sequencing

reaction was carried out in a 10.0 µl total reaction volume consisting of the following: 2.0

µl sterile deionised H2O, 4.0 µl Terminator ready reaction mix (which includes labelled dye

terminators, buffer and dNTPs), 1.0 µl forward or reverse sequence specific primer

and 3.0 µl purified DNA (0.5 µg).

PCR was performed using a Thermo Hybaid multi-block system (MBS 0.2S), or

Thermo Hybaid PxE 0.2 thermal cycler for 25 cycles as follows: 10 seconds at 96 °C, 5

seconds at 50 °C and 4 minutes at 60 °C.

After amplification, the products were precipitated with 50 µl of 95% ethanol,

washed with 200 µl of 70% ethanol and vacuum dried. The pellets were resuspended

in 5 µl of ABI loading buffer diluted with formamide (1:5). Samples were denatured at

95 °C for 2 minutes and placed on ice until loading. Samples were run on an ABI 377

DNA sequencer for seven hours. The data were collected using ABI collection

software.

PREPARATION OF SEQUENCING GEL:


To prepare the sequencing gel, 9 g of urea (6 M) was dissolved in approximately

10 ml of deionised water on a hot plate with constant stirring. After dissolving

the urea, 2.5 ml of a 19:1 acrylamide gel solution (Sequa gel) and 2.5 ml of 10X TBE

were added and the volume was made up to 25 ml with sterile deionised water. The solution was filtered through a

0.2 µm Millipore membrane filter and degassed using a Millipore vacuum filtration

assembly. To the filtered solution, 200 µl of 10% APS and 5 µl of TEMED were added

and the solution was immediately poured into the gel plates. The remaining procedure was the same as

described for the 4% polyacrylamide gel preparation.

DENATURING HIGH PERFORMANCE LIQUID CHROMATOGRAPHY (DHPLC):

Denaturing high performance liquid chromatography (DHPLC)

was initially developed by Oefner and Underhill (1995). This is a powerful technique in

which SNPs are identified by the presence of heteroduplexes in a mixture of amplified

products from a wild type DNA (control sample) and the test sample. The DNA

fragments are separated on a specialized DNASep column based upon the principle

of ion-pair reversed phase HPLC carried out under denaturing conditions. The

Transgenomic WAVE™ DNA fragment analysis system was used for DHPLC work.

PCR was carried out in a 15 µl total reaction volume. The reagent concentrations for the

PCR reaction were: 1X PCR Buffer, 1.5 mM MgCl2, 200 µM dNTPs, 1 U BioTaq DNA

polymerase, 1.0 µM of each primer (forward and reverse) and 40 ng DNA template (20 ng/µl).

PCR cycling parameters are described in Appendix I.

The quality of amplified product was first checked on a 2% agarose gel by

taking 5 µl of each PCR product. Equal volumes of the PCR products of a wild type

and each test sample were separately mixed and denatured at 95 °C for 5 minutes.

They were then allowed to reanneal by decreasing the temperature at the rate of

1.5 °C/min from 95 °C to 25 °C.
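
Two aspects of this step lend themselves to a short sketch: the time taken by the cooling ramp, and the duplex species expected once the reference and test products have reannealed. Function names are hypothetical and the model is deliberately simplified (a single-copy Y fragment with one variable site); it illustrates the principle rather than the instrument's behaviour.

```python
def reannealing_minutes(start_c=95.0, end_c=25.0, rate_c_per_min=1.5):
    """Duration of the cooling ramp used to reanneal the mixed products."""
    if rate_c_per_min <= 0:
        raise ValueError("cooling rate must be positive")
    return (start_c - end_c) / rate_c_per_min


def expected_duplexes(reference_allele, test_allele):
    """Duplex species expected after denaturing and reannealing an equal mix."""
    if reference_allele == test_allele:
        return {"homoduplex"}
    # Strands from the two samples can pair with each other, producing
    # mismatched heteroduplexes alongside the original homoduplexes.
    return {"homoduplex", "heteroduplex"}


if __name__ == "__main__":
    print(round(reannealing_minutes(), 1))   # minutes for the 95 to 25 °C ramp
    print(expected_duplexes("C", "C"))
    print(expected_duplexes("C", "T"))
```

The ramp alone takes a little under 47 minutes, so each mixing step adds nearly an hour before injection.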

Before setting up the experiment, the instrument was initially allowed to run

(purged) with 33% of buffer A (0.1 M triethylammonium acetate (TEAA) solution, pH 7.0),

33% of buffer B (0.1M TEAA solution containing 25% acetonitrile, pH 7.0) and 34% of

buffer C (75% acetonitrile solution) for 2-5 minutes. After purging, the column was

equilibrated for 30 minutes with 50% of buffer A and 50% buffer B at a flow rate of

0.9 ml/min. Five needle and injection port washes were carried out using buffer D (8%

acetonitrile).

The DNA sequence to be screened for polymorphisms was copied to the Wave

Maker (version 4.1) software and the appropriate temperature and gradient method for

that particular sequence was determined. A sample sheet specifying the tube

numbers, injection volumes, sample IDs and gradient was prepared. The system was

initialized and run according to the manufacturer's instructions.

The optimal melting temperature for any DNA fragment can be determined by

electronic submission of sequence to the web site

(http://insertion.stanford.edu/melt.html).

RESULTS

-5-
SECTION 1

PHYLOGEOGRAPHY OF PAKISTANI ETHNIC GROUPS:

The Y chromosomal biallelic markers (base substitutions, insertions and

deletions) identify stable Y haplogroups and lineages. More than 600 such markers

on the male specific region of the human Y chromosome delineate >300 Y

haplogroups with a worldwide distribution (Figure II). In this study 93 of these Y

chromosomal biallelic markers were examined in 1,213 unrelated male individuals

representing 16 ethnic groups from Pakistan. The ethnic groups were categorized

broadly into two groups (Table I). The northern group was represented by unrelated

males from the Balti, Burusho, Hazara, Kalash, Kashmiri, Pathan and Punjabi ethnic

groups. Punjabis constitute the majority of Pakistan's population and most reside in

the Punjab province adjoining India. The Punjabi samples analyzed were 185

unrelated male samples of the Gujar, Meo and Rajput castes. The southern group

comprised unrelated males from the Baloch, Brahui, Makrani-Baloch, Makrani-Negroid,

Mohanna, Parsi and Sindhi populations.

Y biallelic polymorphisms were typed using a hierarchical approach. All

samples were initially analyzed for four markers representing clades close to the root

of the Y phylogenetic tree. These included SRY10831.1 (clade B*), RPS4Y711 (clade

C*), YAP (clade E*) and M89 (clade F*). The frequencies of these B*, C*, E* and F*

haplogroups in Pakistan are shown in Table V.
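The first tier of the hierarchical typing strategy can be sketched as a simple lookup. The marker-to-clade assignments follow the text above; the data layout, the function name and the order in which the markers are checked are assumptions made for this illustration (the more specific markers are checked first, on the assumption that a chromosome derived at a nested marker is also derived at markers closer to the root).

```python
# Sketch of the first tier of hierarchical typing. Marker-to-clade
# assignments follow the text; the data layout and the order of the
# checks are assumptions made for this example.

FIRST_TIER = [
    # (marker, clade assigned when the derived allele is present)
    ("YAP", "E*"),
    ("RPS4Y711", "C*"),
    ("M89", "F*"),
    ("SRY10831.1", "B*"),
]

# Second-tier markers for clade C*, as listed later in the text.
NEXT_TIER = {"C*": ["M8", "M38", "M217", "PK2", "M48"]}

def first_tier_clade(genotype):
    """genotype maps marker name -> True if the derived allele was seen."""
    for marker, clade in FIRST_TIER:
        if genotype.get(marker, False):
            return clade
    return "unassigned"

sample = {"SRY10831.1": True, "RPS4Y711": True}
clade = first_tier_clade(sample)
print(clade, "-> type next:", NEXT_TIER.get(clade, []))
```

Only samples assigned to a clade are then typed for the markers within that clade, which keeps the number of genotyping reactions per sample low.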

Further subtyping of markers within each haplogroup revealed thirty-three

haplogroups in different ethnic groups of Pakistan. Among these four (B*, C*, E* and

F*) haplogroups, F* was the most frequent in both northern and southern populations

(Figure III). As expected, the majority (85%) of Y chromosomes from Pakistan were

derived from M89. The M89 derived alleles are frequently found in most world

populations residing outside Africa, and represent YCC clades F through T (Figure

II). Twenty-five different haplogroups of F*-M89 chromosomes were found at varying

frequencies in the different ethnic groups of Pakistan (Table VI). The thirty-three

haplogroups are summarized in Figure IV.

Clade A* is restricted to sub-Saharan African populations and was not

observed in any individual from Pakistan. However, the B*-M60 haplogroup was

observed at low frequency, in 0.9% of the Brahui and 3% of the Makrani-Negroid

samples from southern Pakistan.

Haplogroup C* was the predominant haplogroup in the Hazara population

(60%). It was also present in the Brahui, Mohanna, Burusho, Meo and Gujar with a

frequency that ranges from 1.6 to 8.2% (Tables VI and VII). Individuals carrying the

derived allele for RPS4Y711 marker were further sub-typed for five additional markers

that identify clades C1, C2 and C3. These included the markers M8 (C1*), M38

(C2*), M217, PK2 (C3*), and M48 (C3a). Of these, only PK2 was detected. The PK2

marker is one of the several novel SNPs and it is phylogenetically equivalent to

haplogroup C3* (Mohyuddin et al., 2006). All of the Hazara (60%) and Burusho

(8.2%) RPS4Y711 derived Y chromosomes also had the derived allele for the PK2

marker.
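The percentages quoted in this section are simple proportions of the sample size. As a check, the sketch below recomputes the PK2 (C3*) frequencies from the counts reported in Table VI (134 of 224 Hazara and 8 of 97 Burusho); the function and variable names are illustrative only.

```python
# Sketch: haplogroup frequency (percent) from counts, using the PK2
# (C3*) counts reported in Table VI.

def percent(count, n):
    return 100.0 * count / n

# population -> (number of PK2-derived chromosomes, sample size)
pk2_counts = {"Hazara": (134, 224), "Burusho": (8, 97)}
pk2_percent = {pop: round(percent(c, n), 1) for pop, (c, n) in pk2_counts.items()}
print(pk2_percent)
```

The Hazara value of 59.8% corresponds to the rounded figure of 60% given in the text.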

YAP derived chromosomes, which belong to clade DE*, constitute 3% of the Pakistani

population and were observed mainly in the southern populations. Except for the

Mohanna, this haplogroup was observed in all southern populations with frequencies

between 1.5% and 10.6%. The Pathans were the only northern population in which

these chromosomes were observed (2.1%). Several off-shoots of DE* clade were

analyzed and all Pakistani YAP positive (YAP+) Y chromosomes belonged to

haplogroup E* and carried the derived allele for SRY-8299. Further sub-typing of clade

E* defined three informative haplogroups: E1b1a*, E1b1b1a*, and E1b1b1c*. The

highest frequency of E1b1a* (marker sY81=M2) was observed in the Makrani-

Negroid (9.1%). These chromosomes were also found in the Makrani-Baloch (3.7%),

Brahui (3.4%) and in Baloch (1.5%). The remaining YAP+ chromosomes carried the

Table V: Frequency of haplogroups B*, C*, E* and F* in ethnic groups from Pakistan.

Population         n      B*      C*      E*      F*
Northern
Balti              14     -       -       -       1.000
Burusho            97     -       0.082   -       0.918
Hazara             224    -       0.600   -       0.402
Kalash             44     -       -       -       1.000
Kashmiri           12     -       -       -       1.000
Pathan             96     -       -       0.021   0.976
Punjabi            185    -       0.016   -       0.984
Southern
Baloch             66     -       -       0.106   0.894
Brahui             117    0.009   0.017   0.034   0.940
Makrani-Baloch     27     -       -       0.074   0.926
Makrani-Negroid    33     0.030   -       0.121   0.848
Mohanna            70     -       0.043   -       0.957
Parsi              90     -       -       0.056   0.944
Sindhi             138    -       -       0.022   0.978
Total              1213   0.002   0.124   0.022   0.852

Figure III: Distribution of haplogroups B*, C*, E* and F* in populations from

northern and southern Pakistan.

Figure IV: Y haplogroups frequency distribution in ethnic groups of Pakistan.

derived allele for E1b1b1*-M35 haplogroup. This clade comprises six main branches

which have a wide distribution in Africa, Asia and Europe. Of these, the E1b1b1a*-M78

and E1b1b1c*-M123 derived chromosomes were observed in Pakistan. Interestingly,

only two YAP+ populations, the Baloch from the southern group and the Pathan from

the northern group, share the E1b1b1a*-M78 haplogroup, at frequencies of

6.1% and 2.1%, respectively. The majority of the southern populations carry the

derived allele for the M123 marker. The frequency of E1b1b1c*-M123 haplogroup

was 5.6% in the Parsi, 3.7% in the Makrani-Baloch, 2.2% in the Sindhi and 1.5% in

the Baloch.

The derived allele for M89 was observed at very high frequency in

representatives from all population groups of Pakistan except for the Hazara. The

following branches of this haplogroup were observed in Pakistan:

Haplogroup G*-M201, which is distributed mainly in Eurasian populations,

comprises 1.1% of Pakistani Y chromosomes. The frequency of this haplogroup

was highest in the Kalash from northern Pakistan. Haplogroup G* was also observed

in all southern populations except for the Baloch and Makrani-Baloch tribes. A low

frequency of G* was observed in the Mohanna, Burusho, and Gujar Y chromosomes.

One major sub-clade of this haplogroup, G2a*, which is derived for the P15

polymorphism, accounts for a major proportion of the variation observed in this

haplogroup in Pakistan. Haplogroup G2a* is widely distributed among the southern

populations. Among the northern group only Kalash and Pathan Y chromosomes

carry this haplogroup, at frequencies of 18.1% and 1%, respectively.

The H1*-M52 haplogroup, which is a sub-clade of H*-M69 Y chromosomes,

exhibits a frequency of 4% in Pakistan. The highest frequencies were found in the Balti

(7.1%), Kalash (20.4%), Punjabi (7.6%), Makrani-Negroid (6.1%) and Sindhi (5.8%)

samples (Table VI and Figure V). Individuals carrying the derived allele for the H1* clade

were further sub-typed for two markers that identify clades H1a1-M36 and H1a2-M97.

Neither the H1a1 nor the H1a2 haplogroup was present in Pakistan.

Haplogroup I*-M170, defined by an A to C mutation on the Y chromosome, is thought to have

arisen in Europe. The European Y-chromosome gene pool contains a high

frequency of this haplogroup. In Pakistan, the frequency of the M170 polymorphism was

<0.1%, as it was observed in only one individual, who belonged to the Hazara population.

Clade J*, characterized by the 12f2a deletion, was widely distributed across

Pakistan. The majority of these Y chromosomes were represented by the J2a2* (M67

derived) haplogroup, which is a major branch of the J2*-M172 haplogroup. The

J2a2* haplogroup was found in all ethnic groups examined and constituted 10% of

the population (Figure V). One offshoot of the J2a2* haplogroup, the J2a2a*

haplogroup characterized by the derived allele for the biallelic marker M92, was

observed in only one southern population, the Brahui (8.5%). The other main branch of

the J lineage, J1*-M267, was also observed in this population in addition to the

Baloch, Makrani-Baloch and Sindhi from southern Pakistan. The Pathan was the

only northern group that carried the J1* haplogroup, albeit at very low frequency

(1.0%).

A majority of non-African Y chromosomal haplogroups are derived for the M9

marker and fall in clades K*-T*. The derived allele for M9 is widespread in Pakistan

and accounts for 61% of all Y chromosomes, all of which were resolved into sub-clades

L*, NO*, Q*, R* and T*. Lineages K1-K4, which are a component of the Asian

Y-chromosomal gene pool, were not observed in Pakistan.

Sub-clade L*, defined by the A to G M20 SNP, constitutes 11% of the

Pakistani population with frequencies ranging from 1.1% to 24.2%. Of the three well

characterized branches in this haplogroup the most dominant off-shoot present in

Pakistan is L1 that has the derived allele for M27. L1 occurs at an average

frequency of 5.0% and is present in all southern populations with a frequency of

24.2% in the Baloch and 1.4% in the Parsi. Among the northern populations this

haplogroup is observed only in the Pathan and Punjabi (Tables VI, VII and Figure V).
The L2*-M317 haplogroup, another offshoot of L*, was observed in only two southern

populations, the Parsi and Makrani-Baloch, at frequencies of 13.3% and 3.7%,

respectively. The remaining branch L3* had a more widespread distribution and the

highest frequency was observed in the northern Burusho and Balti populations (Table

VI and Figure VI). L3a, a branch of L3* characterized by the marker PK3, appears

only in the Kalash population, at a relatively high frequency (23%).

An extremely low frequency of the NO* clade was observed in Pakistan. The

12 individuals belonging to various branches of this clade were observed in two

northern (Burusho and Pathan) and two southern (Brahui and Mohanna) populations

only. The N1* (LLY22g derived) Y chromosomes were present in one Brahui and one

Mohanna individual. The newly discovered haplogroup O2a1a-PK4 was found only

in the Pathan (4.2%) but the East Asian O3* M122 derived haplogroup was observed

in the Brahui (<1%), Burusho (3.1%) and Pathan (1%) samples. LY1 derived

haplogroup O3a3a* was present at low frequency in the Brahui only.

Two major Y haplogroups Q*-M242 and R*-M207 branch off clade P* that is

delineated by numerous SNPs including 92R7, M45 and M74. All P* chromosomes

were resolved into Q* and R* haplogroups. Haplogroup Q* occurs at an average

frequency of 1.8% in Pakistan and is observed in four northern (Burusho, Hazara,

Pathan and Punjabi) and four southern (Baloch, Brahui, Makrani-Baloch and Sindhi)

populations.

Haplogroup R* characterized by the M207 SNP has a widespread distribution

in Pakistan. It has two major branches R1* (M173 derived) and R2 (M124 derived)

which have distinct worldwide geographic distributions. R1*, which is common in

Europe, West and Central Asia, occurs at an average frequency of 4.8% and is

observed in all the Pakistani populations (Table VI and Figure VI). One derivative of

M173, R1a1-M17, which occurs at an average frequency of 35.1% in the population,

is the most common Y haplogroup in Pakistan. This particular haplogroup was

present in all populations included in this study (Table VI and Figure VI). The highest
frequency of R1a1* was observed in the Mohanna (71.4%) and lowest in the Parsi

(7.8%). Other populations with appreciable (>50%) frequency of R1a1* included the

Kashmiri (58.3%), Punjabi castes (56.7%), and Sindhi (51.4%). One of the newly

discovered haplogroups, R1a1e-PK5, was observed on the background of the R1a1*

haplogroup; however, it was restricted to the Burusho population (2.1%).

Haplogroup R2 that has the M124 derived allele occurs in many Pakistani

populations and has an average frequency of 5.8%. Except for the Mohanna it is

observed in all southern populations. Its distribution is patchy in the north of Pakistan

and it is found only in the Burusho, Kashmiri and Punjabi populations (Figure VI).

Haplogroup K2 (Y Chromosome Consortium, 2002) was recently reassigned

to the new haplogroup T* (Karafet et al., 2008). This haplogroup is characterized by the

derived allele for M70 and was only found in a single Pathan individual.

Table VI: Number and frequency (%) of Y chromosomes from each population falling in haplogroups B-I.
(No. = number of haplogroups observed in the population.)

Population n     No.  B          C          C3         E          E1b1a      E1b1b1a    E1b1b1c    F          G*         G2a        H1         I
                      (M60)      (RPS4Y711) (PK2)      (SRY-8299) (sY81=M2)  (M78)      (M123)     (M89)      (M201)     (P15)      (M52)      (M170)
North
Balti      14    6    0          0          0          0          0          0          0          0          0          0          1(7.1)     0
Burusho    97    15   0          0          8(8.2)     0          0          0          0          1(1.0)     1(1.0)     0          4(4.1)     0
Hazara     224   9    0          0          134(60)    0          0          0          0          13(5.8)    0          0          0          1(0.5)
Kalash     44    8    0          0          0          0          0          0          0          0          0          8(18.1)    9(20.4)    0
Kashmiri   12    5    0          0          0          0          0          0          0          0          0          0          0          0
Pathan     96    16   0          0          0          0          0          2(2.1)     0          2(2.1)     10(10.4)   1(1.0)     4(4.2)     0
Punjabi    185   14   0          3(1.6)     0          0          0          0          0          7(4.0)     1(0.54)    0          14(7.6)    0
South
Baloch     66    13   0          0          0          1(1.5)     1(1.5)     4(6.1)     1(1.5)     1(1.51)    0          0          0          0
Brahui     117   18   1(0.9)     2(2.0)     0          0          4(3.4)     0          0          0          0          9(8.0)     1(1.0)     0
Makrani-B  27    11   0          0          0          0          1(3.7)     0          1(3.7)     0          0          0          0          0
Makrani-N  33    11   1(3.0)     0          0          1(3.0)     3(9.1)     0          0          0          0          1(3.0)     2(6.1)     0
Mohanna    70    9    0          3(4.3)     0          0          0          0          0          0          1(1.4)     3(4.3)     2(2.9)     0
Parsi      90    11   0          0          0          0          0          0          5(5.6)     0          0          1(1.1)     2(2.2)     0
Sindhi     138   13   0          0          0          0          0          0          3(2.2)     2(1.5)     0          2(1.5)     8(5.8)     0
Total      1213  33   2(0.2)     8(0.7)     142(11.7)  2(0.2)     9(0.7)     6(0.5)     10(0.8)    26(2.1)    13(1.1)    25(2.1)    47(4.0)    1(0.08)

Cont.

Table VI: Number and frequency (%) of Y chromosomes from each population falling in haplogroups J-L.

Population n     J         J1        J2        J2a2      J2a2a     L         L1        L2        L3        L3a
                 (12f2a)   (M267)    (M172)    (M67)     (M92)     (M20)     (M27)     (M317)    (M357)    (PK3)
North
Balti      14    0         0         0         2(14.3)   0         0         0         0         2(14.3)   0
Burusho    97    0         0         1(1.0)    7(7.2)    0         3(3.1)    0         0         14(14.4)  0
Hazara     224   21(9.4)   0         3(1.4)    1(0.5)    0         0         0         0         0         0
Kalash     44    0         0         0         4(9.1)    0         1(2.3)    0         0         0         10(23.0)
Kashmiri   12    1(8.3)    0         0         1(8.3)    0         0         0         0         0         0
Pathan     96    0         1(1.0)    0         5(5.2)    0         0         5(5.2)    0         7(7.3)    0
Punjabi    185   1(0.54)   0         0         18(9.7)   0         2(1.1)    15(8.2)   0         4(2.2)    0
South
Baloch     66    0         2(3.0)    0         6(9.1)    0         0         16(24.2)  0         3(4.5)    0
Brahui     117   5(4.3)    6(5.1)    0         10(8.5)   10(8.5)   0         7(6.0)    0         2(1.7)    0
Makrani-B  27    0         1(3.7)    0         5(18.5)   0         1(3.7)    2(7.4)    1(3.7)    0         0
Makrani-N  33    0         0         0         6(18.1)   0         0         2(6.1)    0         1(3.0)    0
Mohanna    70    0         0         0         3(4.3)    0         1(1.4)    6(8.6)    0         0         0
Parsi      90    0         0         0         35(38.9)  0         3(3.3)    1(1.4)    12(13.3)  0         0
Sindhi     138   2(1.45)   4(3.0)    0         19(14.0)  0         0         6(4.4)    0         4(3.0)    0
Total      1213  30(2.5)   14(1.2)   4(0.3)    122(10.1) 10(0.8)   11(0.9)   60(5.0)   13(1.1)   37(3.0)   10(0.8)

Cont.

Table VI: Number and frequency (%) of Y chromosomes from each population falling in haplogroups N-T.

Population n     N1        O2a1a     O3        O3a3a     Q         R         R1        R1a1      R1a1e     R2        T
                 (LLY22g)  (PK4)     (M122)    (L1Y)     (M242)    (M207)    (M173)    (M17)     (PK5)     (M124)    (M70)
North
Balti      14    0         0         0         0         0         2(14.3)   1(7.1)    6(43.0)   0         0         0
Burusho    97    0         0         3(3.1)    0         2(2.1)    11(11.3)  1(1.0)    25(25.8)  2(2.1)    14(14.3)  0
Hazara     224   0         0         0         0         4(2.0)    0         26(11.6)  21(9.4)   0         0         0
Kalash     44    0         0         0         0         0         3(7.0)    1(2.3)    8(18.1)   0         0         0
Kashmiri   12    0         0         0         0         0         0         2(16.6)   7(58.3)   0         1(8.3)    0
Pathan     96    0         4(4.2)    1(1.0)    0         5(5.2)    1(1.0)    4(4.2)    43(44.8)  0         0         1(1.0)
Punjabi    185   0         0         0         0         1(0.55)   2(1.1)    4(2.1)    105(56.7) 0         8(4.3)    0
South
Baloch     66    0         0         0         0         2(3.1)    0         4(6.1)    19(28.8)  0         6(9.1)    0
Brahui     117   1(0.8)    0         1(0.8)    1(1.0)    1(1.0)    0         3(2.6)    45(38.4)  0         8(7.0)    0
Makrani-B  27    0         0         0         0         1(3.7)    0         1(3.7)    9(33.3)   0         4(15)     0
Makrani-N  33    0         0         0         0         0         0         4(12.1)   10(30.3)  0         2(6.1)    0
Mohanna    70    1(1.43)   0         0         0         0         0         0         50(71.4)  0         0         0
Parsi      90    0         0         0         0         0         1(1.1)    4(4.4)    7(7.8)    0         19(21.1)  0
Sindhi     138   0         0         0         0         6(4.3)    0         3(2.2)    71(51.4)  0         8(6.0)    0
Total      1213  2(0.2)    4(0.3)    5(0.4)    1(0.1)    22(1.8)   20(1.6)   58(4.8)   426(35.1) 2(0.2)    70(5.8)   1(0.1)

Table VII: Y lineages found in the three Punjabi castes examined in this study. Cells give the number of chromosomes with the percentage in parentheses.
(No. = number of haplogroups observed in the caste.)

Population n    No.  C*         F*         G*         H1*        J*         J2a2*      L*         L1         L3*        Q*         R*         R1*        R1a1*      R2
                     (RPS4Y711) (M89)      (M201)     (M52)      (12f2a)    (M67)      (M20)      (M27)      (M357)     (M242)     (M207)     (M173)     (M17)      (M124)
Gujar      159  13   2(1.3)     6(3.8)     1(0.6)     14(8.8)    1(0.6)     17(10.6)   2(1.3)     15(9.4)    4(2.5)     0          1(0.6)     3(1.3)     86(55)     7(4.4)
Meo        16   4    1(6.2)     0          0          0          0          1(6.3)     0          0          0          0          0          1(6.25)    13(81)     0
Rajput     10   5    0          1(10)      0          0          0          0          0          0          0          1(10)      1(10)      0          6(60)      1(10)
Total      185  14   3(1.6)     7(3.8)     1(0.5)     14(7.6)    1(0.5)     18(9.7)    2(1.1)     15(8.1)    4(2.2)     1(0.5)     2(1.1)     4(2.2)     105(57)    8(4.3)

Figure V: Distribution of major Y lineages (PK2, M52, M67 and M27) frequencies

in Pakistan (frequencies are shown in table VI).

Figure VI: Distribution of major Y lineages (M357, M173, M17 and M124)

frequencies in Pakistan (frequencies are shown in table VI).

PHYLOGENETIC ANALYSES

PRINCIPAL COMPONENT ANALYSIS:

The Principal Component Analysis was carried out in order to examine

population relationships. This analysis is based upon the frequencies of thirty three

Y haplogroups in Pakistani ethnic groups. The first two principal components, PC1 and PC2,

account for 72% of the variation in the population (Figure VII). The PC analysis

shows that all Pakistani populations group together, with the exception of the

Hazara, who are relatively distinct from other Pakistani ethnic groups and are

clustered in the lower right quadrant of the graph. Interestingly, other populations

such as the Brahui and Balti, which are linguistically different from the others, and the

Kalash, who are isolated, did not stand out and grouped with the other ethnic groups from

Pakistan.
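To illustrate what "variation accounted for" by a principal component means, the sketch below computes the share of variance captured by the first component of a small population-by-haplogroup frequency matrix. It is a toy example only: it uses four populations and the four root haplogroup frequencies from Table V rather than the thirty-three haplogroups analysed here, and it uses a plain power iteration written for this example, not the software used in this study.

```python
# Sketch: share of variance captured by the first principal component of
# a population-by-haplogroup frequency matrix (pure Python, power iteration).
# Rows: four populations from Table V; columns: B*, C*, E*, F*.

freqs = {
    "Balti":  [0.000, 0.000, 0.000, 1.000],
    "Hazara": [0.000, 0.600, 0.000, 0.402],
    "Baloch": [0.000, 0.000, 0.106, 0.894],
    "Brahui": [0.009, 0.017, 0.034, 0.940],
}

def covariance(rows):
    """Covariance matrix of the columns (haplogroups) across populations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric matrix by power iteration."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d))  # Rayleigh quotient

cov = covariance(list(freqs.values()))
total_variance = sum(cov[i][i] for i in range(len(cov)))
pc1_share = top_eigenvalue(cov) / total_variance
print(f"PC1 explains {pc1_share:.0%} of the variance in this toy subset")
```

In this subset almost all of the variance lies along a single axis that contrasts the Hazara (high C*, low F*) with the other populations, which mirrors the separation of the Hazara in Figure VII.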

PHYLOGENETIC ANALYSIS:

Analysis of Molecular Variance (AMOVA) was carried out using the Arlequin

software. The populations were grouped on the basis of ethnicity, geographic origin

and linguistic affiliation. This analysis indicated that the populations were

significantly different from each other when grouped by ethnicity (p value, Va vs FCT:

0.0205 ± 0.0050). As expected, the majority of the variation was explained by variation

within Pakistani populations (Table VIII).
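The fixation indices in Table VIII are tied to the percentages of variation by the standard hierarchical definitions. As a worked check, the sketch below recomputes them for the geographic grouping from the three percentages reported for that row (1.12, 14.52 and 84.36); the function name is illustrative, and small differences in the last decimal place are expected because the published percentages are rounded.

```python
# Sketch: fixation indices from the percentages of variation in an AMOVA
# table, using the standard hierarchical definitions.
# Input values: the "Geographic" row of Table VIII.

def fixation_indices(pct_among_groups, pct_among_pops, pct_within_pops):
    total = pct_among_groups + pct_among_pops + pct_within_pops
    fct = pct_among_groups / total                      # groups vs total
    fsc = pct_among_pops / (pct_among_pops + pct_within_pops)  # pops within groups
    fst = (pct_among_groups + pct_among_pops) / total   # pops vs total
    return fct, fsc, fst

fct, fsc, fst = fixation_indices(1.12, 14.52, 84.36)
print(f"FCT={fct:.4f} FSC={fsc:.4f} FST={fst:.4f}")
```

The same relationship explains why, with no grouping, FST (0.1522) equals the proportion of variation found among populations (15.22%).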

The pair-wise FST values between Pakistani ethnic groups based on the

haplogroup frequencies also corroborate this result. The matrix of P-value

significance, based upon 110 permutations among the Pakistani populations with a

significance level of 0.05, also demonstrated that significant variation occurs among

the populations (Tables IX and X).
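The logic of a permutation test for pairwise differentiation can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the Arlequin procedure: the statistic is a simple heterozygosity-based (GST-style) FST computed from haplogroup labels, the p value uses one common convention, (hits + 1) / (permutations + 1), and the sample data are hypothetical.

```python
# Sketch: permutation p-value for differentiation between two populations.
# Statistic: a GST-style FST from haplogroup frequencies (a simplification;
# not necessarily the estimator used by Arlequin). Data are hypothetical.
import random
from collections import Counter

def heterozygosity(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fst(pop1, pop2):
    h_total = heterozygosity(pop1 + pop2)
    if h_total == 0.0:
        return 0.0
    w1 = len(pop1) / (len(pop1) + len(pop2))
    h_within = w1 * heterozygosity(pop1) + (1 - w1) * heterozygosity(pop2)
    return (h_total - h_within) / h_total

def permutation_p(pop1, pop2, n_perm=110, seed=1):
    """Share of random re-assignments of individuals giving FST >= observed."""
    rng = random.Random(seed)
    observed = fst(pop1, pop2)
    pooled = pop1 + pop2
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # one common convention (assumption)

distinct_p = permutation_p(["R1a1"] * 20, ["C3"] * 20)
print(f"p for two completely distinct samples: {distinct_p:.4f}")
```

With only 110 permutations the p values can take only a limited set of discrete values, so very small p values cannot be resolved more finely than the step size.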

Figure VII: Principal component analysis based on Y haplogroup frequencies

in Pakistani populations.

Balti: Blt, Burusho: Bsk, Hazara: Hzr, Kalash: Kal, Kashmiri: Ksr, Pathan: Pkh,

Gujar: Gjr, Meo: Meo, Rajput: Rpt, Baloch: Ball, Brahui: Bru, Makrani-Baloch:

Mak-B, Makrani-Negroid: Mak-N, Mohanna: Mhn, Parsi: Prs, Sindhi: Sdh.

Table VIII: Percentage of variation obtained by AMOVA at three levels of population hierarchy in ethnic groups from Pakistan.

Basis for    Number     Percentage of variation                      Variance components   Fixation indices              p value, Va vs FCT
grouping     of groups  Among     Among populations  Within          Va         Vb         FCT       FSC       FST       (1023 permutations)
                        groups    within groups      populations
None         1          -         15.22              84.78           0.0649     0.3617     -         -         0.1522    -
Ethnicity    13         14.45     0.90               84.65           0.0618     0.0038     0.0105    0.1445    0.1535    0.0205 ± 0.0050
Geographic   2          1.12      14.52              84.36           0.0048     0.0623     0.0112    0.1469    0.1564    0.4076 ± 0.0167
Linguistic   4          -8.99     19.34              89.65           -0.0363    0.0780     -0.0899   0.1774    0.1035    0.9746 ± 0.0047

Table IX: Population pairwise FSTs between Pakistani ethnic groups computed from Y haplogroup frequencies (below the diagonal).
FST p values (based upon 110 permutations) are given above the diagonal, with * indicating significant pairwise differences.

Population               BAL       BRU       MAKB      MAKN      MHN       PRS       SDH       BLT       BSK       HZR       KAL       KSR       PKH       MEO       GJR       RPT
Baloch (BAL)             -         0.0000*   0.3153    0.0630    0.0000*   0.0000*   0.0000*   0.1081    0.0000*   0.0000*   0.0000*   0.0360*   0.0000*   0.0000*   0.0000*   0.0360*
Brahui (BRU)             0.0275    -         0.3063    0.1982    0.0000*   0.0000*   0.0180*   0.3243    0.0000*   0.0000*   0.0000*   0.2882    0.0090*   0.0000*   0.0000*   0.1801
Makrani Baloch (MAKB)    0.0053    0.0016    -         0.8018    0.0000*   0.0090    0.0720    0.3423    0.0991    0.0000*   0.0000*   0.3063    0.05405*  0.0000*   0.0180*   0.1982
Makrani Negroid (MAKN)   0.0146    0.0088    -0.0146   -         0.0000*   0.0000*   0.0270*   0.5495    0.0090*   0.0000*   0.0000*   0.3063    0.0180*   0.0090    0.0000*   0.0810
Mohanna (MHN)            0.1405    0.0774    0.1280    0.1392    -         0.0000*   0.0000*   0.0180*   0.0000*   0.0000*   0.0000*   0.1711    0.0000*   0.5225    0.0000*   0.2973
Parsi (PRS)              0.1148    0.1268    0.0539    0.0728    0.3099    -         0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*
Sindhi (SDH)             0.0549    0.0172    0.0143    0.0284    0.0376    0.1647    -         0.4234    0.0000*   0.0000*   0.0000*   0.6486    0.0270*   0.0720    0.3783    0.5855
Balti (BLT)              0.0339    0.0058    0.0019    -0.0087   0.0899    0.1261    -0.0026   -         0.4324    0.0000*   0.0000*   0.5225    0.4774    0.0810    0.1891    0.5585
Burusho (BSK)            0.0458    0.0389    0.0188    0.0273    0.1585    0.0991    0.0629    -0.0000   -         0.0000*   0.0000*   0.0270*   0.0000*   0.0000*   0.0000*   0.0720
Hazara (HZR)             0.2653    0.2603    0.2721    0.2580    0.3997    0.3058    0.3072    0.2882    0.2109    -         0.0000*   0.0000*   0.0000*   0.0000*   0.0000*   0.0000*
Kalash (KAL)             0.1002    0.0797    0.0799    0.0586    0.2338    0.1374    0.1224    0.0650    0.0759    0.2818    -         0.0000*   0.0000*   0.0000*   0.0000*   0.0090
Kashmiri (KSR)           0.0535    0.0052    0.0117    0.0149    0.0224    0.1798    -0.0144   -0.0124   0.0591    0.3150    0.1299    -         0.3513    0.3243    0.4864    0.8918
Pathan (PKH)             0.0418    0.0193    0.0264    0.0272    0.0580    0.1721    0.0129    -0.0075   0.0467    0.2812    0.1024    0.0023    -         0.0000*   0.0090    0.3693
Meo (MEO)                0.1653    0.0943    0.1408    0.1459    -0.0113   0.3160    0.0470    0.1031    0.1675    0.4194    0.2485    0.0112    0.0720    -         0.0630    0.3693
Gujjar (GJR)             0.0582    0.0279    0.0329    0.0416    0.0255    0.1941    0.0002    0.0062    0.0772    0.3193    0.1354    -0.0074   0.0164    0.0415    -         0.4864
Rajput (RPT)             0.0590    0.0115    0.0216    0.0464    0.0096    0.2071    -0.0135   -0.0106   0.0429    0.3293    0.1292    -0.0389   -0.0047   0.0216    -0.0099   -

Table X: Matrix of significant FST p values (significance level = 0.0500) based upon 110 permutations among the
ethnic groups of Pakistan (+ = significant; - = not significant; ---- = diagonal).

Population               BAL   BRU   MAKB  MAKN  MHN   PRS   SDH   BLT   BSK   HZR   KAL   KSR   PKH   MEO   GJR   RPT
Baloch (BAL)             ----  +     -     -     +     +     +     -     +     +     +     +     +     +     +     +
Brahui (BRU)             +     ----  -     -     +     +     +     -     +     +     +     -     +     +     +     -
Makrani Baloch (MAKB)    -     -     ----  -     +     +     -     -     -     +     +     -     -     +     +     -
Makrani Negroid (MAKN)   -     -     -     ----  +     +     +     -     +     +     +     -     +     +     +     -
Mohanna (MHN)            +     +     +     +     ----  +     +     +     +     +     +     -     +     -     +     -
Parsi (PRS)              +     +     +     +     +     ----  +     +     +     +     +     +     +     +     +     +
Sindhi (SDH)             +     +     -     +     +     +     ----  -     +     +     +     -     +     -     -     -
Balti (BLT)              -     -     -     -     +     +     -     ----  -     +     +     -     -     -     -     -
Burusho (BSK)            +     +     -     +     +     +     +     -     ----  +     +     +     +     +     +     -
Hazara (HZR)             +     +     +     +     +     +     +     +     +     ----  +     +     +     +     +     +
Kalash (KAL)             +     +     +     +     +     +     +     +     +     +     ----  +     +     +     +     +
Kashmiri (KSR)           +     -     -     -     -     +     -     -     +     +     +     ----  -     -     -     -
Pathan (PKH)             +     +     -     +     +     +     +     -     +     +     +     -     ----  +     +     -
Meo (MEO)                +     +     +     +     -     +     -     -     +     +     +     -     +     ----  -     -
Gujjar (GJR)             +     +     +     +     +     +     -     -     +     +     +     -     +     -     ----  -
Rajput (RPT)             +     -     -     -     -     +     -     -     -     +     +     -     -     -     -     ----

MEDIAN-JOINING NETWORK:

Genetic variation among the Pakistani populations was further investigated

by constructing a median-joining network (Bandelt et al., 1995). Here we present the L*-M20

lineage network (Figure VIII). The L lineage is considered to have arisen in the Indus Valley

region during the Indus Valley civilization. The network revealed four clusters,

representing four haplogroups. Samples encircled in red represent the L1-M27

haplogroup, samples carrying the L2*-M317 haplogroup are encircled in green and

L3a-PK3 samples are encircled in yellow. The remaining samples carry the L3*-M357

haplogroup. The network of the L lineage reveals considerable variation among the

Pakistani populations; it also shows a high degree of population-specific

sub-structure. The network shows isolated Parsi-specific clusters at the

upper right end containing 15 of the 16 Parsis. The Kalash fall into two clusters and the

Burusho form a cluster in the middle of the network. Haplotype sharing is the other

striking feature of this network. Some haplotypes are shared within a specific

population, for example within the Burusho, Kalash and Parsi. However, the four Baloch

individuals shared their haplotype with Sindhi and Makrani-Baloch individuals from

nearby southern population. Similarly, one haplotype was shared between a Brahui

and a Makrani-Negroid individual.

Figure VIII: Median-joining network of Lineage L individuals based on YSTR

haplotypes.

SECTION 2

COMPARISONS BETWEEN THE PAKISTANI AND GREEK

POPULATIONS:

The current study also included three ethnic groups from northern Pakistan,

the Burusho, Kalash and Pathan, that claim Greek ancestry. These populations

were compared with extant Greek samples from Europe that were genotyped for the

same Y markers. The Y-chromosomal haplogroups and their frequencies in the

Greeks, Burusho, Kalash, Pathan and the rest of the Pakistani populations are

shown in Figure IX.

HAPLOGROUP FREQUENCIES IN PAKISTANI AND GREEK

POPULATIONS:

The combination of biallelic markers identified 13 Y-chromosomal

haplogroups in the Greeks, 16 in the Pathan and 15 in the Burusho populations.

Only eight Y haplogroups were found in the Kalash population. More than 75% of

these samples were represented by haplogroups which are frequent in West Asia,

Europe and the Mediterranean region.

A comparison of the three Pakistani ethnic groups with the Greek populations

shows that certain haplogroups are shared between these populations. These

include clades E*, F*, I*, J*, R1* and T*. The majority of the Pakistani and Greek Y

chromosomes have the derived allele for the M207 marker that encompasses

branches R1* and R1a1* of the Y chromosome phylogenetic tree (Figure IX). R1a1*

was the most common haplogroup found in Pakistan (35.9%) and Greece (15.6%).

Compared with the Greeks, the frequency of haplogroup R1a1* was relatively higher in

the Pathan (44.8%), Burusho (25.8%) and Kalash (18.2%) samples. Clade R1*

represented by the derived allele for SNP M173 was observed in 11.7% of the Greek

and 5.32% of the Pakistani samples. The Greek population exhibited a higher

frequency of this clade in comparison with the Burusho (1.03%), Kalash (2.27%) and

Pathan (4.2%).

Haplogroup J* was the other haplogroup that was found at a high frequency

in the Greek (17%) and Pakistani (14.8%) samples. The overwhelming majority of

Greek J* chromosomes belonged to haplogroup J2* which was present at a

comparable frequency in Pakistan. This haplogroup J2* (including all its derivatives)

was present at a frequency of 15.6% in the Greek, 8.2% in the Burusho, 9.09% in the

Kalash and 5.2% in Pathan. The majority of J2* Y chromosomes in Pakistan

belonged to haplogroup J2a2*, being derived for the marker M67. The Greek

samples could not be typed for this SNP due to lack of DNA. The J1* haplogroup

characterized by the derived allele for M267 was absent in the Burusho and Kalash

populations and was found at low (1%) frequency in the Greek and Pathan.

Clade E* haplogroups were more frequent in the Greeks (21%) than in

Pakistan (2.2%). The majority of haplogroup E* chromosomes belonged to clade

E1b1b1* (M35 derived) and all Greek and Pakistani samples were resolved into the

branches E1b1b1a* (M78 derived) and E1b1b1c* (M123 derived). Among the three

Pakistani populations claiming Greek descent the M78 derived Y chromosomes were

observed only in the Pathan (2%). This branch constituted 16.9% of the Greek

samples. Clade E1b1b1c* was present at a frequency of only 2.6% in the Greek and

was absent in the Burusho, Kalash and Pathan populations. Its frequency in the

remaining Pakistani populations was 1%.

All G*-M201 derived Greek Y chromosomes (9% of total) belonged to the

G2a* haplogroup characterized by the T allele for SNP P15 (Hammer et al., 2000).

This haplogroup was observed in 18.18% of Kalash and 1% of the Pathan samples

and was absent in the Burusho.

Two branches that frequently characterize Y chromosomes found outside Africa are

H* and I*, which distinguish eastern and western populations, respectively


Figure IX: A rooted maximum-parsimony tree of Y lineages found in the Greek, Burusho, Kalash, Pathan and Pakistani
populations. The lineages were defined by binary markers whose designations and population frequencies (percentages) are
given below each branch. Branch lengths are arbitrary and the YCC lineage names (Karafet et al., 2008) are shown below the
frequencies. Haplogroup and haplotypes diversity are shown for each population.

(Rootsi et al., 2004; Underhill et al., 2001). One Greek sample belonged to

haplogroup H2*, which is characterized by the Apt G to A transition (Pandya et al.,

1998). These Y chromosomes are not found in Pakistan but have been observed in

neighboring India and this is the first time they have been observed in Greece.

Haplogroup I* characterized by the derived allele for M170 is mainly restricted

to Europe and was observed in 19.5% of the Greek sample. This haplogroup was

not observed in the Burusho, Kalash or Pathan and its frequency in Pakistan was <

0.2%.

Only a small proportion of Y chromosomes remained unresolved in clade F*; these

represented 2% of the Pathan and 1% of the Greek and Burusho samples. It

is possible that in this case distinct haplogroups, as yet unknown, are being classified

into the same paraphyletic haplogroup.

STATISTICAL AND PHYLOGENETIC ANALYSES

PRINCIPAL COMPONENT ANALYSES:

In order to examine population relationships principal component analysis

based upon Y haplogroup frequencies in the Greek and Pakistani ethnic groups was

carried out (Figure X). The first two principal components, PC1 and PC2, account for

79% of the variation in the haplogroup frequency data and separate the populations according to their

geographic locations. The plot shows the Pathan and Burusho populations clustering

with the remaining Pakistani populations in the upper right quadrant of the graph.

The Kalash and Greek form two separate and distinct clusters. To ensure that the

Greek individuals included in this study were representative of the Greek population

studied earlier, results of comparable biallelic data (Francalacci et al., 2003) were

incorporated in the principal component analysis (Figure XI). The Greek population

included in this study clustered with the Greek populations studied earlier but the

distinct Kalash population cluster was not apparent.

Figure X: A plot of the first two principal coordinates based upon the analysis

of Y haplogroup frequencies in Pakistani and Greek populations.

Figure XI: A plot of the first two principal coordinates based upon the analysis

of Y haplogroup frequencies in Pakistani and Greek samples (1=this study; 2

= Francalacci et al., 2003) using comparable biallelic markers.

GENETIC DISTANCES AND PHYLOGENETIC ANALYSIS:

The genetic distances between the populations were calculated using

measures that are more sensitive to recent events (Table XI). The Pakistani-Greek

population pair wise FST values based on the variation of STRs within haplogroups

(Qamar et al., 2002) ranged from 0.131 to 0.213, with the lowest value between the

Pathan and the Greeks. Pairwise genetic distances (the number of steps between

a haplotype in one population and the closest haplotype in the second population,

averaged over all comparisons) (Bandelt et al., 1999) ranged from 4.3 to 8.1, with the

lowest value again between the Pathan and the Greeks.
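The haplotype-based distance described above (steps from each haplotype to the closest haplotype in the other population, averaged over all comparisons) can be sketched as follows. The haplotypes are hypothetical repeat counts at four Y-STR loci, a step is taken to be a difference of one repeat unit at one locus, and averaging the two directions to obtain a symmetric value is an assumption of this example; the weighting by haplogroup used for Table XI is not modelled.

```python
# Sketch: average number of repeat-unit steps from each haplotype in one
# population to its closest haplotype in another. Haplotypes are tuples
# of Y-STR repeat counts; the values below are HYPOTHETICAL.

def steps(h1, h2):
    """Sum of absolute repeat-count differences across loci."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def mean_nearest_steps(pop_a, pop_b):
    """Mean distance from each haplotype in pop_a to its nearest in pop_b."""
    return sum(min(steps(a, b) for b in pop_b) for a in pop_a) / len(pop_a)

def pairwise_distance(pop_a, pop_b):
    # Assumption: average both directions so that the measure is symmetric.
    return 0.5 * (mean_nearest_steps(pop_a, pop_b)
                  + mean_nearest_steps(pop_b, pop_a))

pop_x = [(15, 13, 29, 24), (14, 13, 30, 24)]
pop_y = [(15, 13, 29, 24), (16, 12, 29, 25), (14, 14, 30, 23)]
print(round(pairwise_distance(pop_x, pop_y), 3))
```

Because only the closest haplotype in the other population counts, a single shared or near-identical haplotype lowers the distance sharply, which is why this measure is sensitive to recent shared ancestry or admixture.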

Phylogenetic analysis using the matrix of genetic distances between

populations with tree validation carried out by bootstrap re-sampling (10,000

replicates) also demonstrated that of the three Pakistani populations, the Pathans

were closest to the Greek (Figure XII).

Taken together, these results suggest that there might have been a low

degree of recent Pathan-Greek admixture. Examination of individual lineages by the

NETWORK software using Y-STR frequencies was carried out to investigate this

possibility further.

Table XI: Weighted population pairwise genetic distances (below diagonal)
and FST values (above diagonal) based on STR variation within haplogroups.

           Greek    Burusho  Kalash   Pathan
Greek      0.000    0.188    0.213    0.131
Burusho    5.659    0.000    0.214    0.196
Kalash     8.066    3.882    0.000    0.219
Pathan     4.277    2.451    3.254    0.000

Figure XII: Neighbor-joining tree showing the relationship between the Greek

and three Pakistani ethnic groups. The tree is based on genetic distances.

Bootstrap values from 10,000 replicates are shown.

MEDIAN-JOINING NETWORK:

A median-joining network of clade E1b1b1a* Y chromosomes was

constructed in order to examine the genetic relationship between the Greek and

Pathan samples. A duplication of 10 and 13 repeat units was observed in the clade

E-derived Y chromosomes for the tri-nucleotide repeat DYS425 and this locus was

subsequently excluded from the network. The most striking feature of this network

was the sharing of haplotypes between the Pathan and Greek samples (Figure XIII).

One Pathan individual shared the same Y-STR haplotype with three Greek

individuals, and the other Pathan sample was separated from this cluster by a single

mutation at the DYS436 locus. This demonstrates a very close relationship between

the Pathan and Greek E lineages.

Figure XIII: Median-joining network of clade E* lineages in Pakistan (open

circles) and Greece (hatched circles). Circles represent haplotypes and have

an area proportional to frequency. The Pathan individuals are shown in black.

CONTOUR MAPPING:

The worldwide distribution and frequency of the haplotype shared between

the Greek and Pathan clade E1b1b1a* individuals was checked in the Y-STR

Haplotype Reference Database (YHRD; Roewer et al., 2001). Worldwide data for

the subset of 16 Y-STRs including DYS19, DYS388, DYS389I, DYS389II, DYS390,

DYS391, DYS392, DYS393, DYS425, DYS426, DYS434, DYS437, DYS435,

DYS438, DYS436, DYS439 were not available in this database. However, part of

this haplotype based upon a subset of nine Y-STRs (DYS19=15; 389I=13; 389II=29;

390=24; 391=10; 392=11; 393=12; 438=9; 439=12) was found in 53 individuals in a

worldwide population sample of 7,897 haplotypes. This haplotype was highly specific

for the Balkans. The contour map of this haplotype (Figure XIV) shows a major

concentration in the Balkans, around Macedonia and Greece, with a low scattering in

other European countries and a comparably low frequency in Tunisia, West Africa

and the Pathan. This gives a strong indication of a European, possibly Greek,

origin of these Pathan Y chromosomes.
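
The database search described above amounts to matching a nine-locus core haplotype against longer Y-STR profiles and reporting the proportion of matches. The sketch below is illustrative only: the records are hypothetical, the function names are not part of any YHRD interface, and only the allele values and the 53 of 7,897 count are taken from the text.

```python
# Nine-locus core haplotype quoted in the text.
CORE_HAPLOTYPE = {
    "DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24, "DYS391": 10,
    "DYS392": 11, "DYS393": 12, "DYS438": 9, "DYS439": 12,
}


def matches_core(profile, core=CORE_HAPLOTYPE):
    """True if the profile carries the core allele at every core locus.
    Loci outside the core are ignored; a missing core locus is a non-match."""
    return all(profile.get(locus) == allele for locus, allele in core.items())


def haplotype_frequency(matches, total):
    """Proportion of database haplotypes that match the core haplotype."""
    if total <= 0:
        raise ValueError("total must be positive")
    return matches / total


# Hypothetical database records, for illustration only.
records = [
    {**CORE_HAPLOTYPE, "DYS388": 12},   # full match; the extra locus is ignored
    {**CORE_HAPLOTYPE, "DYS390": 23},   # one-step mismatch at DYS390
    {"DYS19": 15, "DYS390": 24},        # incomplete profile
]
hits = sum(matches_core(r) for r in records)
print(hits)

# Frequency reported in the text: 53 matches among 7,897 haplotypes.
print(f"{haplotype_frequency(53, 7897):.2%}")
```

At roughly 0.67% worldwide, the haplotype is rare overall, so a strong regional concentration is informative about its likely origin.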

Figure XIV: Contour map showing the 9 Y-STR haplotype frequency

distribution in Eurasia and northern Africa. This haplotype was shared

between three Greeks and a Pathan individual belonging to clade E1b1b1a*.

DISCUSSION

-6-
Our DNA is inherited from our ancestors, so genetic analysis can be used to

provide information regarding our history. The Y chromosome is particularly useful in

this respect because most of it is passed down from father to son without change,

except for the gradual accumulation of mutations which appear as DNA

polymorphisms. The present study provides an example of the power of a

genealogical approach to Y-chromosome analysis based on a hierarchical use of

specific markers in the Pakistani population.

Pakistan lies on the postulated southern coastal route out of Africa. The

earliest evidence suggests this region was colonized about 60,000-70,000 years ago.

Pakistan was the site of several ancient cultures such as Mehrgarh, one of the

world's earliest known towns, present in the southern Pakistani province of

Baluchistan (Jarrige, 1991) and evidence from this region indicates that modern

humans were settled in this region during the Neolithic period. The region's other

early civilization was the Indus Valley civilization, centred on Harappa and Mohenjo-Daro.

Moreover, the Indo-Pak subcontinent has become home to a great variety of

racial groups owing to repeated invasions of the region throughout the millennia. Thus, it is

one of the most genetically diverse areas in the world today.

Present day Pakistan is bordered by Iran and Afghanistan on the west, India

towards the east and China in the north. The Indian Ocean borders its entire

southern coastline. The Himalayan and Hindu Kush mountains form a formidable

presence in the north and north-west.

The diversity of the Y chromosome has been used extensively to study the

genetic variation in humans. Human Y chromosomes are delineated into distinct

haplogroups and lineages, defined by a combination of unique event or biallelic

polymorphisms and Y-STRs. Each haplogroup represents a unique chromosome

lineage that originated from a single male ancestor somewhere in the world in the

past. The discovery of new paragroups, together with previously known lineages, has

made it possible to carry out detailed population genetic analysis based on


haplogroup and haplotype frequencies. The spread of each haplogroup is assumed

to be unaffected by both selection and male migration. However, the haplogroup

frequencies in an area may be influenced by demographic factors such as founder

effects, gene flow and genetic drift.

In the current study we examined 93 biallelic markers in 1,213 male subjects

from 16 ethnic groups of Pakistan and a Greek population by a variety of PCR

techniques. The extensive analyses of Y diversity allowed us to investigate:

1. The genetic diversity within Pakistani ethnic groups from the male

perspective.

2. Comparison of three Pakistani populations (the Burusho, the Kalash

and the Pathan) with the Greek population. These Pakistani

populations claim descent from the Greek

soldiers who were left behind in this region by Alexander the Great.

3. Genetic differences between male individuals from Pakistan in

comparison to world populations.

4. The origins of Pakistani ethnic groups.

PART 1

COMPARISON WITHIN PAKISTAN:

According to their geographic distribution, Pakistani populations were

categorized into two groups: a northern group, which incorporated the Punjabi

populations, and a southern group. The northern populations that were screened

included the Balti, Burusho, Hazara, Kalash, Kashmiri, Pathan and the Punjabi castes (Gujar,

Meo and Rajput). The populations from the south of Pakistan included

Baloch, Brahui, Makrani-Baloch, Makrani-Negroid, Mohanna, Parsi and Sindhi. The

combination of 93 biallelic markers identified 33 stable Y chromosomal haplogroups

in the Pakistani populations (Table VI).

Haplogroups H1*-M52, J2a2*-M67, L1-M27, R1a1*-M17 and R2-M124, which are

frequent in South Asia, Europe and the Mediterranean region, together make up 60%

of the Pakistani populations. It was also observed that the southern population group

is more genetically diverse than the northern group. Forty-five percent

(45%) of southern populations carry these 33 Y haplogroups, whereas they are found

in 39% and 15% of northern and Punjabi populations respectively. In this study, we

also screened 1,213 Pakistani individuals for five novel Y-SNPs PK1-PK5

(Mohyuddin et al., 2006). Three SNPs identify population specific haplogroups within

Pakistan. L3a-PK3 was found solely in the Kalash population, O2a1a-PK4 was

restricted to the Pathan population, while R1a1e*-PK5 was confined to the Burusho.
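
Statements about one group being more diverse than another can be made concrete with a haplogroup diversity statistic. The sketch below uses Nei's unbiased gene diversity, which is an assumption made for illustration (the measure behind the comparison above is not stated in this passage), and the haplogroup assignments are hypothetical.

```python
from collections import Counter


def haplogroup_frequencies(assignments):
    """Relative frequency of each haplogroup in a list of assignments."""
    counts = Counter(assignments)
    n = sum(counts.values())
    return {hg: c / n for hg, c in counts.items()}


def gene_diversity(assignments):
    """Nei's unbiased diversity: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(assignments)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    freqs = haplogroup_frequencies(assignments)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


# Hypothetical samples of ten chromosomes each, for illustration only.
group_a = ["R1a1*"] * 6 + ["L1"] * 2 + ["H1*"] * 2
group_b = ["R1a1*"] * 3 + ["L1"] * 2 + ["H1*"] * 2 + ["J2a2*"] * 2 + ["G2a*"]

print(round(gene_diversity(group_a), 3))
print(round(gene_diversity(group_b), 3))
```

A group with more haplogroups at more even frequencies, like the hypothetical group_b, scores higher on this measure.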

In the principal component (PC) analysis based upon Y haplogroup frequencies,

all the ethnic groups from Pakistan cluster together except the

Hazara (Figure VII). Although the Pakistani population includes geographically,

culturally and linguistically isolated ethnic groups such as the Kalash, the Burusho and

the Dravidian-speaking Brahui, they do not stand out in the overall

comparison.

Haplogroup C* and its offshoot separate the northern and

southern regions within Pakistan. The C*-RPS4Y haplogroup was found in only two

southern populations, the Mohanna (4.3%) and the Brahui (2%). Interestingly, the

Punjabis from the northern part carry this haplogroup (1.6%) as well (Table VI).

However, C3-PK2, one of the newly identified off-shoots of C*-RPS4Y haplogroup

was found only in two northern ethnic groups (Table VI). This haplogroup was

highest among the Hazara (60%) followed by the Burusho (8.2%). The C*-RPS4Y

haplogroup is fairly common in Central Asia and Mongolia and it points towards the

Mongol origins of the Hazara population which is supported historically (Bellew,

1979) and genetically (Qamar et al., 2002; Zerjal et al., 2003). However, the origin of

Burusho is not well documented. Some claim that they are the descendants of

Greek soldiers while some others claim that they are descendants of Dards from

Central Asia (Biddulph, 1977). The analyses of Francalacci et al. and Rootsi et al. show that

haplogroup C* chromosomes are not present in Greece (Francalacci et al., 2003;

Rootsi et al., 2004). On the other hand, one of the earlier studies shows that the

populations belonging to Tajikistan clustered with Hunza Burusho (Wells et al.,

2001). Furthermore, the studies with the autosomal genetic markers (Ayub et al.,

2003; Mansoor et al., 2004) and markers of Y chromosome (Firasat et al., 2007)

suggest that the Burusho are genetically close to their geographic neighbors. The

high frequencies of haplogroup C* chromosomes in the Hazara, the Burusho and in Central

Asia suggest that the C* chromosome arose in Central Asia before the separation of

these two Pakistani populations (Mohyuddin et al., 2006).

Major haplogroups of clade E*, E1b1a*-sY81 and off-shoots of E1b1a*-sY81

were also detected with higher frequency in the southern group of Pakistan as

compared to northern and the Punjabi group. Haplogroup E*-SRY-8299 has been

reported to have a North African origin and is not found in northern Pakistani ethnic

groups and the Punjabi group (Qamar et al., 1999). However, a low frequency of this

haplogroup is found in the southern group of Pakistan (0.2%). Haplogroup

E1b1a*-sY81 (M2) is sub-Saharan in origin and is found in the Baloch, Brahui, Makrani-

Baloch and Makrani-Negroid (1.5%, 3.4%, 3.7% and 9.1% respectively) populations of

the south (Table VI). The highest frequency of haplogroup E1b1a*-sY81 is found in

the Makrani-Negroid population (9.1%), who are reported to have a recent African

origin. The high frequency of E1b1a*-sY81 in the Makrani-Negroid could represent

the genetic legacy of the African slaves that were brought to the Indo-Pakistan

subcontinent by the Arabs and European invaders.

The other sub clade of E-haplogroup is E1b1b1*-M35 that originated in East

Africa (Semino et al., 2004). The remaining E1b1b1* Pakistani Y chromosomes were

resolved into two branches E1b1b1a*-M78 and E1b1b1c*-M123. The E1b1b1a*-M78

haplogroup was present only in the Pathan (2.1%) from the north and the Baloch (6.1%)

from the south of Pakistan (Table VI). All the E1b1b1*-M35 chromosomes from

southern Pakistan were further resolved into the E1b1b1c*-M123 haplogroup. Y

chromosomes of the E1b1b1a*-M78 and E1b1b1c*-M123 haplogroups are also found in

Iran (Regueiro et al., 2006), Turkey (Cinnioglu et al., 2004) and Greece (Firasat et

al., 2007). It is also possible that the clade E haplogroup expanded with the spread of

agriculture (Hammer et al., 1998; Semino et al., 2000).

The G*-M201 haplogroup is present with a low frequency in Pakistani ethnic

groups. The highest frequency of the G*-M201 haplogroup is observed in the Pathan

(10.4%). Towards the south the frequency of G*-M201 decreases dramatically and

only 1.4% of the Mohanna carry this haplogroup (Table VI). Haplogroup G*-M201 occurs at

~30% in Georgia (Semino et al., 2000) and the north Caucasus (Nasidze et al.,

2003), 10.9% in Turkey (Cinnioglu et al., 2004), 2.2% in Iraq (Al-Zahery et al., 2003)

and 1.33% in Iran (Regueiro et al., 2006). This haplogroup is also found in southeast

Europe and in the Mediterranean regions (Semino et al., 2000). In contrast to the

haplogroup G*-M201, the G2a*-P15 haplogroup is most frequently found

in the southern group of Pakistan. Except for the Baloch and the Makrani-

Baloch, this haplogroup is found in all other ethnic groups belonging to southern

Pakistan. However, from northern Pakistan only the Kalash and the Pathan carry this

haplogroup. The G2a*-P15 haplogroup occurs at 9% in Turkey (Cinnioglu et al., 2004),

5% in Italy and Greece (Di Giacomo et al., 2003) and 7.33% in Iran and throughout

the Middle East, with a maximum of 19% in the Druze (Hammer et al., 2000).

Haplogroup H1*-M52 was observed in almost all ethnic groups of Pakistan.

The highest frequency of H1*-M52 Y chromosomes was found in the Kalash (20.4%)

followed by the Gujar (7.6%), Balti (7.1%), Makrani-Negroid (6.1%), Sindhi (5.8%)

etc. (Tables VI and VII). Many studies have shown that clade H originated

within the Indo-Pak subcontinent (Gayden et al., 2007; Kivisild et al., 2003; Pandya et

al., 1998; Sengupta et al., 2006). The frequency of this indigenous haplogroup was

found to be higher in southern India (Ramana et al., 2001; Wells et al., 2001) as compared

to the northwest Punjab (Kivisild et al., 2003). Other than India and Pakistan this

haplogroup was found in Newar (6.1%), Kathmandu (11.7%) (Gayden et al., 2007)

and in Turkey (0.38%) (Cinnioglu et al., 2004). The other branch of clade H*, H2*-APT,

is also found at a higher frequency in India, but none of the Pakistani Y

chromosomes carry this haplogroup. It is also interesting that Greek Y

chromosomes carry the H2*-APT haplogroup at low frequency (Firasat et al., 2007).

Haplogroup J* is identified by the 12f2 human endogenous retroviral

polymorphism (Sun et al., 2000; Rosser et al., 2000). Haplogroup J* Y chromosomes

are widely distributed in Eurasia, the Middle East and North Africa (Hammer et al.,

2001; Quintana-Murci et al., 2001). Haplogroup J* branches were distributed across

all Pakistani populations. A low frequency of J1*-M267 was detected in Pakistani

populations. This haplogroup characterizes African and Arabian populations, and the

frequency of J1*-M267 chromosomes decreases towards the north and east.

High frequencies of this haplogroup were found in Oman (38%) (Luis et al.,

2004); Iraq (33%) (Al-Zahery et al., 2003); Egypt (20%) (Luis et al., 2004); Lebanon

(13%) (Semino et al., 2000); Turkey (9%) (Cinnioglu et al., 2004); Iran (10.5%)

(Regueiro et al., 2006); India (0.27%) and East Asia (0%) (Sengupta et al., 2006);

and in Pakistan (1.2%). The frequencies of this haplogroup indicate the differential

influence of East Africa and the Middle East on southwestern Asia. However, the other

branch of haplogroup J*, the J2* haplogroup, is distributed mainly in west Asian and

Eurasian populations. The demographic expansion of J2* chromosomes occurred

during the dispersal of Neolithic farmers (King and Underhill, 2002). Haplogroup J2*

and its derivatives were found at a frequency of 23% in Iran (Regueiro et al., 2006),

22.2% in Turkey (Cinnioglu et al., 2004), 9% in India (Sengupta et al., 2006) and

11.2% in Pakistan. There appears to be a decrease in the frequency of this

haplogroup as one moves from the south west to the north east of Pakistan. A

decrease in the frequency of J2* derivatives can be seen east of Iranian Plateau in

South Pakistan (7.7%), with a dramatic decline in north Pakistan (2.0%) and in

Punjabi caste (1.5%) (Table VI). Sengupta et al. (2006) show that the J2* clade is

nearly absent in East Asia (1.14%). The presence of J2* and its derivative

chromosomes in the Pakistani populations indicates Persian and Mediterranean

gene flow and is supported by the high frequency of this haplogroup in the Parsis.

This population arrived in India from Iran (Quintana-Murci et al., 2001).

Haplogroup L* is delineated by the presence of the M20 mutation (Underhill et

al., 1997). The L* haplogroup could be of recent origin, having arisen in the Indus Valley

region during the Indus Valley civilization. A high frequency of the L* haplogroup is

found in the Indo-Pak subcontinent. The L* chromosome is largely restricted to

south Caucasus populations (Weale et al., 2001), the Middle East (Nebel et al., 2001b),

Pakistan (Qamar et al., 2002) and India (Kivisild et al., 2003; Sengupta et al., 2006).

However one of its sub branches L1-M27 was found with high frequency in Pakistan

(5%), India (6.32%) (Sengupta et al., 2006) and Iran (2.6%) (Regueiro et al., 2006)

while no L1-M27 chromosome was observed in East Asia (Sengupta et al., 2006)

and in Turkey (Cinnioglu et al., 2004). Comparison among the three Pakistani

groups (northern, southern and Punjabi group) displays a significant difference in

haplogroup distribution. Considerable diversity was noticed in populations

belonging to southern Pakistan.

The most frequent haplogroup in Pakistan was haplogroup R* (48%) (Table

VI). This haplogroup is widespread in Europe, the Caucasus, West Asia, Central Asia

and in South Asia (Sengupta et al., 2006); however, it is absent from African and New

World chromosomes. The most frequently found sub-clade of haplogroup R* is

R1a1*-M17. The haplogroup R1a1* chromosomes originated in Southern

Russia/Ukraine in the region between the Black and Caspian Seas. This R1a1*

chromosome spread with the expansion of Kurgan culture (Passarino et al., 2001;

Quintana-Murci et al., 2001; Wells et al., 2001; Sengupta et al., 2006). Recent

studies showed that this chromosome covers the area ranging from India to Norway

(Kivisild et al., 2003; Passarino et al., 2002; Quintana-Murci et al., 2001) but it is

almost absent in East Asia (Sengupta et al., 2006; Su et al., 1999).

In the Indo-Pak subcontinent it has been postulated that the spread of this haplogroup

coincided with the arrival of Indo-European nomadic pastoral tribes from West and

Central Asia (Quintana-Murci et al., 2001). However, the study by Sengupta et al.

(2006) revealed the Holocene expansion of this R1a1*-M17 chromosome before the

arrival of Indo-European tribes from the north western side of India.

PART 2

COMPARISON BETWEEN PAKISTANI AND GREEK POPULATIONS:

In the present study three Pakistani populations

(the Burusho, Kalash and Pathan) who claim descent from Greek soldiers were

compared with the extant Greek population. For this purpose a combination of

ninety-three (93) biallelic Y chromosome SNPs (Table II) and a set of 16 Y-STRs was used

(Table IV). This extensive analysis of Y diversity within Greeks and three Pakistani

populations allowed us to compare Y diversity within these populations and re-

evaluate their suggested Greek origins.

The genetic relationship between the three Pakistani populations and the

Greeks can now be judged in the light of phylogenetic analyses and corresponding

statistical results. The phylogenetic results (Figure IX) showed that clade H, clade I

and the clade L haplogroups are the major haplogroups that separate Pakistani

populations from the Greeks.

The H* haplogroup is an Asia specific haplogroup (Underhill et al., 2001).

A sub-branch of haplogroup H*, H1*-M52, was observed in Pakistani populations, but

not in any of the Greek samples (Figure IX). However, the Indian-specific branch

H2*-APT was not present in any Pakistani ethnic group but a low frequency (1.3%)

was observed in the Greek population (Firasat et al., 2007). The presence of the Indian-specific

sub-clade H2*-APT haplogroup in the Greeks is the first time that this

haplogroup has been observed in any western European population and could

indicate ancient contacts.

On the other hand, haplogroup I*-M170 appears to be a European-specific

haplogroup (Rootsi et al., 2004). Our analyses were consistent with this:

19.5% of Greeks carried I*-M170 Y chromosomes (Figure IX). This

haplogroup was absent in the Burusho, Kalash and Pathan. A low contribution of this

haplogroup was seen in the rest of the Pakistani ethnic groups.

Similarly, clade L* was observed only in Pakistani populations and was absent in the

Greeks (Figure IX). Like haplogroup H*, L*-M20 and R2-M124 are indigenous to

the Indus Valley and south west Asia. Clade L* has been suggested to be associated

with the spread of agriculture in the Indus Valley between 7000-2000 B.C. (Qamar et

al., 2002). All L*-M20 derived Y chromosomes in the Kalash population were

distinguished by the presence of a novel PK3 polymorphism which placed them in

the sub-clade L3a (Figure IX). In the same way, R2-M124 was absent in the Greeks

but was found in 14.4% of the Burusho and 5.74% of the rest of the Pakistani populations (Figure IX).

Clade E* Y chromosomes most probably originated in east Africa and spread

in North Africa, Middle East, and European countries (Semino et al., 2004). In the

Pakistani populations, a low frequency of E* haplogroup was present as compared to

the Greeks (2.5% and 21% respectively). Sub clade of E* haplogroup, E1b1b1a*-

M78, also arose in Africa (Cruciani et al., 2004). E1b1b1a*-M78 of haplogroup E* is

the only branch that is present with low frequency in Pakistani populations (0.41%)

and high frequency in Greek population (17%). Among the three Pakistani

populations that claim Greek ancestry the Pathan were the only population in which a

low frequency of clade E1b1b1a*-M78 was present (2.1%) (Figure IX). Even more

compelling evidence in support of the genetic relationship between the Pathan and

Greek E1b1b1a*-M78 Y chromosomes was provided by the median-joining network

(Figure XIII). One Pathan shared the same Y-STR haplotype, which included a

duplication of 10 and 13 repeats at the DYS425 locus, with three Greek individuals,

and the other was separated from this cluster by a single mutation. This enabled us

to estimate the time to the most recent common ancestor (TMRCA; mean ± SD),

using the Network software, as between 2000 ± 400 and 5000 ± 1200 years before

present (YBP), depending upon the observed (Kayser et al., 2000) or inferred mutation

rates (Zhivotovsky et al., 2004). This coincides with the period of Alexander's

invasion during 327-323 B.C. In addition, this haplotype was not found in any other

E1b1b1a*-derived Pakistani Y chromosome. However, this haplotype was observed

in 53 individuals in the Y-STR Haplotype Reference Database (YHRD; Kayser et al.,

2000) and was highly specific for the Balkans, the highest frequency being in

Macedonia.
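
The dependence of the TMRCA estimate on the assumed mutation rate can be illustrated with the rho statistic, the average number of mutational steps between a founder haplotype and the sampled chromosomes. The sketch below is not the calculation performed by the Network software for this study. The step counts are hypothetical, the two rates are placeholders of the order of published pedigree-based and evolutionary estimates, the 25-year generation time is an assumption, and the 15 loci assume the 16 Y-STRs with DYS425 excluded.

```python
def rho(steps_from_founder):
    """Mean number of mutational steps from the founder haplotype."""
    if not steps_from_founder:
        raise ValueError("need at least one chromosome")
    return sum(steps_from_founder) / len(steps_from_founder)


def tmrca_years(rho_value, rate_per_locus_per_generation, n_loci,
                years_per_generation):
    """Age in years = rho / (expected mutations per lineage per year)."""
    mutations_per_year = (rate_per_locus_per_generation * n_loci
                          / years_per_generation)
    return rho_value / mutations_per_year


# Hypothetical cluster: steps from the founder for seven chromosomes.
steps = [0, 0, 0, 0, 1, 1, 2]

N_LOCI = 15                 # assumption: 16 Y-STRs minus DYS425
YEARS_PER_GENERATION = 25   # assumption
FAST_RATE = 2.8e-3          # placeholder, pedigree-type rate per locus per generation
SLOW_RATE = 6.9e-4          # placeholder, evolutionary-type rate per locus per generation

r = rho(steps)
age_fast = tmrca_years(r, FAST_RATE, N_LOCI, YEARS_PER_GENERATION)
age_slow = tmrca_years(r, SLOW_RATE, N_LOCI, YEARS_PER_GENERATION)
print(f"rho = {r:.3f}; age with fast rate = {age_fast:.0f} y; "
      f"age with slow rate = {age_slow:.0f} y")
```

Because the age is inversely proportional to the rate, a slower inferred rate always gives an older estimate from the same network, which is why the text reports a range rather than a single date.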

It is worth emphasizing here that the chance of picking up rare events largely

amplified by drift affecting a limited portion of the population cannot be discounted,

and Cruciani et al. (2006) also recommend caution when using microsatellite alleles

as surrogates of unique event polymorphisms. The genetic data alone do not tell us

when the Balkan chromosomes arrived in Pakistan; therefore, it is necessary to turn

to the historical record for this. There has been no known Greek admixture within the

last few generations, but in addition to Alexander's armies, the possibility of

admixture between the Greek slaves who were brought to this region by Xerxes

around one hundred and fifty years before Alexander's arrival, and the local

population, cannot be discounted (Firasat et al., 2007). At that time Afghanistan and

present day Pakistan were part of the Persian Empire (Wolpert, 2000). Nevertheless,

Alexander's army of 25,000-30,000 mercenary foot soldiers from Persia and West

Asia and 5000-7000 Macedonian cavalry (Engels, 1981) perhaps provides a more

likely explanation because of their elite status and substantial political impact on the

region.

Several studies have shown that Clade E* is present at a relatively high

frequency in the Greek population (Firasat et al., 2007; Francalacci et al., 2003;

Hammer et al., 2001). Our results have shown that the high frequency of clades H1*-M52

and L3a-PK3 (20.45% and 22.7% respectively) and the lack of clade E* in the

gene pool of the Kalash make the Kalash distinct from the Greeks (Figure IX).

The statistical analysis also showed that the Kalash had the highest pairwise FST (0.213)

and genetic distance (8.066) values from the Greeks (Table XI).

Moreover, the Kalash form a distinct cluster in the principal component analysis

(Figure X). On the basis of these results it is thus concluded that the true Greek

contribution to the Kalash gene pool remains uncertain.

The presence of a unique population-specific L3a-PK3 haplogroup in the Kalash

sample enabled us to use the BATWING algorithm (Wilson et al., 1998) to estimate

the median TMRCA for the Kalash L3a lineages as 970 YBP (200-3500 YBP). This

coincides with the arrival of the Kalash from Afghanistan into the Chitral Valley in

northern Pakistan during the tenth and eleventh centuries AD (Lines, 1999).

The pairwise FST (0.188) and genetic distance (5.659) values reveal no

Greek connection for the Burusho, a linguistically isolated population. Furthermore,

principal component analysis placed Burusho as being distinct from the Greek and

closer to their neighbors in Pakistan (Figure X), suggesting that the linguistic

differences arose after the common Y pattern was established. Alternatively, there

may have been sufficient Y gene flow between populations to eliminate any initial

differences that may have been present.

This study as a whole excludes a large Greek contribution to any Pakistani

population confirming previous observations (Mansoor et al., 2004). However, it

provides evidence in support of the Greek origins for a very small proportion of

Pathan as demonstrated by clade E* network (Figure XIII) and low pair-wise genetic

distances between these two populations (Table XI). The contribution to the Kalash

is unclear and no contribution to the Burusho could be detected. This conclusion

requires the assumption that extant Greeks are representative of Alexander's armies.

The failure to find a conclusive Y link with the extant Greek population could also be

attributed to the fact that, besides the 5000-7000-strong Macedonian cavalry,

Alexander's army also consisted of 25,000-30,000 mercenary foot soldiers from

Persia and West Asia (Engels, 1981) and populations from Pakistan have been

shown to be closer to those from West Asia (Qamar et al., 2002; Quintana-Murci et

al., 2001).

PART 3

COMPARISON WITH WORLD POPULATIONS:

In this part Pakistani populations were compared with world populations using

published haplogroup frequency data at similar molecular resolution. Table XII

provides information about the Asian reference populations that were used in this

analysis.

The Pakistani Y chromosomes contain four major haplogroups, i.e.

haplogroup C*, haplogroup J*, haplogroup L*, and haplogroup R*, which together

account for 85.5% of the total Y chromosomes of the Pakistani population (Table VI). The

most frequently observed haplogroup in Pakistan is haplogroup R*, which makes up

47.5% (including all its derivatives) of the total Pakistani population. Worldwide

Y chromosome data show that the R* haplogroup is present at high frequency

among populations belonging to western and southern countries. Among these

populations this haplogroup is found across a variety of language groups such as

Dravidian, Indo-Iranian and Indo-European. However, haplogroup R* is rare

(present at low frequency) or absent in populations of eastern countries. According to

Figure XV, adapted from Gayden et al. (2007), more than 50% of Kyrgyz Y chromosomes in

central Asia belong to haplogroup R*. The frequency gradually

decreases in the Karakalpak (34%) and Kazak (11%). In west Asia the highest

frequency of haplogroup R* is observed in northern Iran (27.2%), followed by southern Iran

(25.6%), Syria (25%), Iraq (17.3%) and Lebanon (6%). Haplogroup R* is found in the

southern Asian populations with a frequency of 62.1% in Newar, 59% in Punjab,

46.8% in Kathmandu and 31% in Gujarat.

The second most abundant major clade is haplogroup J*, which occurs with

an average frequency of 15% in Pakistan (Table VI). This haplogroup originated


about 30,000 YBP in the Fertile Crescent (a region that today includes Israel, the West

Bank, Jordan, Lebanon, Syria and Iraq: Semino et al., 2004). The high frequencies

among populations of the Middle East, North Africa and East Africa provide evidence

that haplogroup J* expanded mainly in a southerly direction into these areas (Thomas et

al., 1999). However, J2* originated in the northern part of the Fertile Crescent. The

presence of this haplogroup in Europe and in India, Pakistan and Nepal reveals

that haplogroup J2* expanded in both east and west directions (Al-Zahery et al.,

2003). Haplogroups J1*/J2* occur at frequencies of 40.6%/15.8% in Jordan

(Flores et al., 2005), 37.2%/9.9% in Oman, 19.7%/12.2% in Egypt (Luis et al., 2004),

9.2%/24.3% in Turkey (Cinnioglu et al., 2004), 31%/26.6% in Iraq (Al-Zahery et al.,

2003), 13.8%/18.9% in Iran (Nasidze et al., 2004; Underhill et al., 2000; Wells et al.,

2001), 16.3%/29.8% in Lebanon (Hammer et al., 2000; Semino et al., 2004; Wells et

al., 2001), 32.4%/22.5% in Syria (Cruciani et al., 2004; Di Giacomo et al., 2004;

Hammer et al., 2000), 38.5%/16.8% in Palestine (Cruciani et al., 2004; Hammer et

al., 2000; Nebel et al., 2001), 2.5%/0.5% in Somalia (Sanchez et al., 2005) and

1.3%/7.2% in Greece (Firasat et al., 2007).
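
The contrast between the J1* and J2* distributions is easier to see when the frequencies quoted above are tabulated. The sketch below simply re-uses the percentages from this paragraph and separates populations in which J1* predominates from those in which J2* predominates; the variable names are illustrative.

```python
# (J1* %, J2* %) as quoted in the paragraph above.
J_FREQUENCIES = {
    "Jordan": (40.6, 15.8),
    "Oman": (37.2, 9.9),
    "Egypt": (19.7, 12.2),
    "Turkey": (9.2, 24.3),
    "Iraq": (31.0, 26.6),
    "Iran": (13.8, 18.9),
    "Lebanon": (16.3, 29.8),
    "Syria": (32.4, 22.5),
    "Palestine": (38.5, 16.8),
    "Somalia": (2.5, 0.5),
    "Greece": (1.3, 7.2),
}

# Split populations by which branch is more frequent.
j1_dominant = sorted(p for p, (j1, j2) in J_FREQUENCIES.items() if j1 > j2)
j2_dominant = sorted(p for p, (j1, j2) in J_FREQUENCIES.items() if j2 > j1)

print("J1* predominant:", ", ".join(j1_dominant))
print("J2* predominant:", ", ".join(j2_dominant))
```

On these figures J2* exceeds J1* only in Greece, Iran, Lebanon and Turkey, which fits a more northerly origin and spread of J2*.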

Haplogroup C* is carried by 12.3% of Pakistani Y chromosomes. This haplogroup is

found at high frequency in Australian aborigines, Polynesians, Kazaks, Mongolians,

Manchurians, Tuvans etc. Haplogroup C* has spread in all directions. For example, C* is

found on the Indian subcontinent, in Sri Lanka and in parts of Southeast Asia. The C1*

haplogroup is found at low frequency in Japan, while C2* is found predominantly in New

Guinea, Melanesia and Polynesia. The successful C3* haplogroup originated in

southeast or central Asia. From central Asia this haplogroup expanded towards

northern Asia and the Americas, and low concentrations are also found in eastern

and central Europe, where it may represent evidence of the westward expansion of

the Huns in the early middle ages. C4* is found among aboriginal Australians and a

significant occurrence of C5* is found in India.

The Hazara are an ethnic group in Pakistan that claim to be

descendants of Genghis Khan. The highest frequency of the C3 haplogroup in Mongolia

suggested that C3 chromosomes spread widely during the time when Genghis Khan

(Mongol) conquered Asia. Haplogroup C3* is present in 60% of males belonging to

the Hazara and 8.2% of Burushos (Table VI). In a study conducted by Zerjal et al. (2003)

the median-joining network (Bandelt et al., 1999) links the Hazara population to the

male descendants of Genghis Khan (Figure XVI). This is due to the presence of the

unique star cluster Y-STR haplotypes in haplogroup C3 Y chromosomes. However,

the star haplotype was not observed in the Burusho population, indicating separate origins

of these two populations despite some sharing of haplogroup C3*.

The L* haplogroup is another main haplogroup in Pakistan. This haplogroup

occurs on the background of the M9 haplogroup. A segment of the M9 Eurasian clan

migrated south and reached the rugged, mountainous Pamir Knot region. The L*

haplogroup may have arisen about 30,000 years ago and represents the earliest

significant settlement of humans in the Indian subcontinent. Therefore, haplogroup L* is

known as the Indian Clan. Today, the L* haplogroup is found primarily as sub-group

L1 in India and Sri Lanka. Sub-group L3* is found mostly in Pakistan. Haplogroup L*

can also be found in low frequencies in the Middle East and in Europe along the

Mediterranean coast.

Haplogroup L* is mainly associated with south Asia. The analyses of

Sengupta et al. (2006) and Thamseem et al. (2006), along with Cordaux et al. (2004) and

Basu et al. (2003), reveal that 7-15% of Indian males have the L* haplogroup, while 10.8% of

Pakistani males carry this haplogroup (present study). As shown in Figure XVII and

in the work of Wells et al. (2001), a higher frequency of haplogroup L* was

present in south India and western Pakistan than in south Pakistan. However, a low

frequency of haplogroup L* was observed in northern India and Pakistan, while

haplogroup L* was absent in east India. A low frequency was found in Oman (0.8%: Luis

et al., 2004), Iraq (1%: Al-Zahery et al., 2003), Lebanon (2%: Hammer et al., 2000;

Semino et al., 2004; Wells et al., 2001), and Greece (1.1%: Di Giacomo et al., 2003;

Semino et al., 2004).

Haplogroup B* is one of the oldest Y-chromosome haplogroups and is largely confined
to African populations (Knight et al., 2003). This haplogroup appears at low frequency
throughout Africa, but is at its highest frequency in Pygmy populations. In the current study,
an interesting observation was the presence of this ancient haplogroup B* lineage in
two Pakistani males: one from the Dravidian-speaking Brahui population
and one from the Makrani Negroid population of the south.
The median-joining network (Bandelt et al., 1999) for the M60-derived Y haplotypes at
DYS19, 389I, 389b, 390 and 392 revealed that the Brahui sample (Y-STR haplotype
14_11_18_24_13) differed from three Sukuma individuals (Knight et al., 2003) at the
DYS19 locus only (16_11_18_24_13) (Figure XVIII). However, the Makrani Negroid sample
(Y-STR haplotype 15_10_18_21_11) differed from one individual belonging to the
Hadzabe population at the 389b, 390 and 392 loci (15_10_17_20_13) (Table XIII).
The time of separation between the populations, estimated with the software Network
(Bandelt et al., 1995), was approximately 5,000-10,000 years. These results exclude
an ancient migration and suggest that a more recent migratory event is responsible
for this separation. It is possible that these chromosomes have the same origin as the
M2-derived chromosomes found in some populations of southern Pakistan, as described
by Qamar et al. (2002). Quintana-Murci et al. (2004), however, described such lineages as a
genetic legacy of the slave trade that existed between the southern coast of Pakistan and
East Africa.
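
The locus-by-locus comparison of the B* haplotypes described above can be sketched in a few lines of code. This is only an illustration of the comparison, not the procedure used by the Network software; the function names `parse_haplotype` and `differing_loci` are hypothetical, and the four haplotypes are those quoted in the text for the Brahui, Sukuma, Makrani Negroid and Hadzabe samples.

```python
# Loci in the order used for the haplotype strings in Table XIII.
LOCI = ("DYS19", "DYS389I", "DYS389b", "DYS390", "DYS392")


def parse_haplotype(text):
    """Turn a string such as '14_11_18_24_13' into a tuple of repeat counts."""
    alleles = tuple(int(a) for a in text.split("_"))
    if len(alleles) != len(LOCI):
        raise ValueError(f"expected {len(LOCI)} alleles, got {len(alleles)}")
    return alleles


def differing_loci(hap_a, hap_b):
    """Return {locus: (allele_a, allele_b)} for the loci where two haplotypes differ."""
    a, b = parse_haplotype(hap_a), parse_haplotype(hap_b)
    return {locus: (x, y) for locus, x, y in zip(LOCI, a, b) if x != y}


# Haplotypes quoted in the text (Brahui vs Sukuma, Makrani Negroid vs Hadzabe).
brahui_vs_sukuma = differing_loci("14_11_18_24_13", "16_11_18_24_13")
makrani_vs_hadzabe = differing_loci("15_10_18_21_11", "15_10_17_20_13")

print(brahui_vs_sukuma)    # differences at DYS19 only
print(makrani_vs_hadzabe)  # differences at DYS389b, DYS390 and DYS392
```

The sketch only reports which loci differ and by how many repeat units; converting such differences into a separation time would additionally require a mutation-rate assumption, which is not made here.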

Haplogroup O* is common in East and Southeast Asia, where 80-90% of all
men carry this haplogroup; however, only a low frequency
(0.82%) of this haplogroup was observed in Pakistan (Figure XIX).

Comparison with worldwide data suggests that the gene pool of
Pakistani ethnic groups is much closer to that of western populations than to
that of the populations of East and Southeast Asia. This is illustrated by
haplogroups such as J* and R*, which are frequent in Pakistan and also contribute to
the western Asian and European gene pools but are not found in China and Japan. In addition,
the low prevalence, or absence, of the East Asian haplogroups C3 and O* in Pakistan
indicates that the Karakoram Mountains, which separate Pakistan and China, form a
formidable barrier to gene flow from the north. The Hazara, 60% of whom carry C3
Y chromosomes, are the only population that shows significant East Asian (Mongolian) ancestry,
but historical records indicate that they did not cross this geographical boundary and
arrived in the subcontinent from the west.

Table XII: Description of world populations.

Geographic region    Abbreviation   Language family   No. of subjects   Reference
and population

Middle East:
Northern Iran        NIR            Indo-European     33                Regueiro et al., 2006
Southern Iran        SIR            Indo-European     117               Regueiro et al., 2006
Iraq                 IRQ            Afro-Asiatic      139               Al-Zahery et al., 2003
Lebanon              LEB            Afro-Asiatic      50                Wells et al., 2001
Syria                SYR            Afro-Asiatic      20                Semino et al., 2000

Central Asia:
Kazak                KAZ            Altaic            54                Wells et al., 2001
Kyrgyz               KYR            Altaic            52                Wells et al., 2001
Karakalpak           KAR            Altaic            44                Wells et al., 2001
Shugnan              SHU            Indo-European     44                Wells et al., 2001
Mongolia             MON            Altaic            24                Wells et al., 2001
Tibet                TIB            Sino-Tibetan      156               Gayden et al., 2007

South Asia:
Adi                  ADI            Sino-Tibetan      55                Cordaux et al., 2004
Gujarat              GUJ            Indo-European     29                Kivisild et al., 2003
Punjab               PUN            Indo-European     66                Kivisild et al., 2003
Pakistan             PAK            Indo-European     1213              Present study
Tamang               TAM            Sino-Tibetan      45                Gayden et al., 2007
Newar                NEW            Sino-Tibetan      66                Gayden et al., 2007
Kathmandu            KAT            Indo-European     77                Gayden et al., 2007

Northeast Asia:
Korea                KOR            Altaic            74                Karafet et al., 2001
Japan                JAP            Altaic            259               Hammer et al., 2006
Tuva                 TUV            Altaic            42                Wells et al., 2001
Buryat               BUR            Altaic            81                Karafet et al., 2001
Manchu               MAN            Altaic            35                Xue et al., 2006

Southeast Asia:
Philippines          PHI            Austronesian      48                Karafet et al., 2005
Malaysia             MAL            Austronesian      32                Karafet et al., 2005
Vietnam              VIE            Austronesian      70                Karafet et al., 2005
Bali                 BAL            Austronesian      551               Karafet et al., 2005
Southern Han         SHA            Sino-Tibetan      166               Karafet et al., 2005

Figure XV: Frequencies of the major haplogroups in Asian populations. The
population abbreviations are given in Table XII.

Figure XVI: Median-joining network of C* lineages. The central star-cluster
profile is 10-16-25-10-11-13-14-12-11-11-11-12-8-10-10 for the loci
DYS389I-DYS389b-DYS390-DYS391-DYS392-DYS393-DYS388-DYS425-DYS426-DYS434-DYS435-DYS436-DYS437-DYS438-DYS439.
Circles represent lineages, area is proportional to frequency, and color indicates
population of origin. Lines represent microsatellite mutational differences.

Adapted from Zerjal et al., 2003.

Figure XVII: Distribution of haplogroup L* in the Indo-Pak subcontinent.

Adapted from Sengupta et al., 2006.

Table XIII: Y-STR data of clade B* lineages in Pakistani and African
populations.

Haplotypes are given as DYS19_389I_389b_390_392.
Populations: Hadzabe, Sukuma, Makrani Negroid, Lisongo, Brahui, Biaka, Mbuti, San; TOTAL.

ID     Haplotype         Counts
H1     14_11_15_25_14    1 1
H2     14_11_18_24_13    1 1
H3     15_10_14_21_13    2 2
H4     15_10_15_20_13    1 1
H5     15_10_15_22_13    7 7
H6     15_10_17_20_13    1 1
H7     15_10_18_21_11    1 1
H8     15_11_16_23_13    1 1
H9     16_10_15_24_13    2 1 1
H10    16_11_13_24_13    2 2
H11    16_11_14_24_13    1 1
H12    16_11_15_25_13    1 1
H13    16_11_16_20_13    1 1
H14    16_11_16_23_13    1 1
H15    16_11_18_24_13    3 3
H16    16_7_14_24_13     1 1
H17    16_7_15_24_13     1 1
H18    16_7_16_24_13     1 1
H19    17_11_13_24_13    1 1
H20    17_11_14_24_13    1 1
H21    17_11_16_20_13    1 1
H22    17_7_16_24_13     1 1
H23    18_11_16_23_13    1 1

Figure XVIII: Median-joining network of clade B* lineages in Pakistani and
African populations. Circles represent haplotypes and have an area
proportional to frequency. The Pakistani individuals are shown in orange and
light blue.

Figure XIX: Geographic distribution of haplogroup O3.

Adapted from Shi et al., 2005.

PART 4

INSIGHTS INTO POPULATION ORIGINS:

Pakistan is geo-strategically placed and has witnessed many invasions and
migrations from the west over the centuries. Present-day Pakistan is bordered by
Iran and Afghanistan on the west, India on the east and China in the north. The
Indian Ocean lies along its entire southern coastline.

The Y haplogroup frequencies, which were used to perform the statistical
analyses, allow us to interpret the origins of the Pakistani populations.

BALTI:

The Balti reside in eastern Baltistan in northern Pakistan, and there are
approximately 300,000 Balti speakers in Pakistan. Their language (Balti) is a
Sino-Tibetan language and they are thought to have originated in Tibet. However, not all
Balti speakers found in Pakistan are of Tibetan stock. With the passage
of time many other populations that entered their territory, such as the Shins, Arabs,
Persians and Turks, gradually mixed with the Balti people. Although this study
analyzed only a few unrelated Balti samples, Y lineages commonly found in Tibet
were not observed among them. Clade D*, which is present at high frequency in the Tibetan
population, was not observed in the Balti (Table VI). These results are consistent with
the earlier study carried out by Qamar et al. (2002).

HAZARA:

The Hazara population, which is ethnically related to its brethren in
neighbouring Afghanistan, stands out on the basis of its Y haplogroup frequencies.
Hazara individuals have typical Mongolian features and they claim to be descendants
of Genghis Khan's army. Their name is derived from the Persian word 'hazar',
meaning "thousand", because troops were left behind in detachments of a thousand
(Qamar et al., 2002). An earlier study done on a limited number of samples (n = 33)
showed them to be closer to populations in Mongolia (Qamar et al., 2002), and the
star Y-STR haplotype (Figure XVI) observed in this population suggested that they
were direct descendants of Genghis Khan (Zerjal et al., 2003).

The present study analyzed a much larger population sample (n = 224) from a

wider geographical area in Pakistan. The earlier samples were collected from NWFP

and the additional samples were from Quetta, Baluchistan. Two haplogroups

predominated in this population, i.e. haplogroup R* (21%) and haplogroup C* (64%)

(Table VI). Haplogroup R* is also present at high frequency in other ethnic groups of

Pakistan (53.5%, when the Hazara are excluded). However, haplogroup C* is rare in

other Pakistani populations. It is present at a frequency of 1.3%, when the Hazaras

are excluded. This haplogroup is fairly common in Central Asia and Mongolia and

points towards the Mongol origins of the Hazara population (Figure XXI).

BURUSHO:

The Burusho, who speak Burushaski, are of particular genetic, linguistic and

anthropological interest. Their language is one of the few remaining language

isolates in the world (Dani, 1991; Grimes, 1992). Approximately 60,000 Burusho are

estimated to reside in present day Pakistan. The samples used here were collected

from the valleys of the Karakoram Mountains in Hunza, Nagar and Yasin. The origin

of the Burusho is not well documented. Some claim they are descendants of four

generals in Alexander's army (Dani, 1989). Others believe them to be Dardics from

Central Asia, or nomads from Pamir, who migrated to this area, and displaced the

original inhabitants (Biddulph, 1977).

Studies with autosomal (Ayub et al., 2003; Mansoor et al., 2004) and Y
chromosomal markers (Firasat et al., 2007) suggest that the Burusho have the same
genetic makeup as their geographical neighbours in Pakistan. A preliminary study
using a limited number of Y markers showed that the Hunza Burusho
clustered with populations from Tajikistan (Wells et al., 2001), but no such
evidence was found using a larger number of markers. The high frequencies of Central Asian
haplogroup C* chromosomes in the Burusho and Hazara indicate that these arose in
Central Asia before the separation of these two Pakistani populations (Mohyuddin et
al., 2006). There is also no evidence of genetic relatedness with the Greeks.
Haplogroup C* is absent in Greeks (Francalacci et al., 2003; Rootsi et al., 2004), and
haplogroup E*, which is common in Greece, is absent in the Burusho (Figure IX).
Although the two populations share haplogroup R1a1*, the branch derived from R1a1* that
was observed in two Burusho individuals points towards a long separation, based on
microsatellite variation.

KALASH:

The Kalash have been isolated for centuries in the Hindu Kush mountain
ranges of northern Pakistan. Their language, Kalasha, belongs to the Dardic group of
Indo-European languages. They number around 3,000-6,000 in present-day Pakistan.
Oral traditions ascribe their origins to a mythical place called Tsiam, which some
claim refers to Syria (Decker, 1992). Various scholars have attributed their origins to
the remnants of Alexander's army (Robertson, 1896). The lack of clade E*
chromosomes, which are present at a relatively high frequency in the Greek
population (Francalacci et al., 2003; Hammer et al., 2001), and the presence of clades
H* (20%) and L3a (23%) make the Kalash distinct from the Greeks (Firasat et al.,
2007). However, the high frequency of haplogroup R* (27%) indicates
that they have a predominantly European component; their possible origin is
illustrated in Figure XXI. Studies of maternal (mitochondrial) (Schurr et al., 2000),
paternal (Y chromosome SNP and STR) (Qamar et al., 2002) and autosomal STR
(Mansoor et al., 2004) markers have also demonstrated their greater affinity with European
populations. In the principal component analyses based on haplogroup frequencies,
the Kalash are distinct from the other ethnic groups of Pakistan (Figure X). The
presence of a unique Y haplogroup (L3a), observed only in this population, suggests
genetic drift. The timing of their isolation could be studied better
by analyzing populations from Nuristan, Afghanistan, from where they are
thought to have migrated to settle in Chitral District in northern Pakistan. The
median-joining network for H1*-M52 (Figure XX), a lineage present at appreciable
frequencies in the Burusho, Kalash and Pathan, is based on 16 Y-STRs and also shows
a high degree of Kalash-specific substructure. Except for one individual, all the
Kalash samples fall in one cluster. From the network it appears that H1*-M52 spread
to neighboring northern populations. Taken together, these results suggest that the
high frequency of unique population-specific SNPs and haplogroups in this group is
probably due to genetic drift in a population that has been isolated for centuries in the
Hindu Kush Mountains.

PATHAN:

The last of the northern populations with claims to Greek origins, the Pathans, occupy
vast tracts of land in Pakistan and neighbouring Afghanistan. In Pakistan the vast
majority of Pathans reside in the NWFP and Baluchistan provinces. The
provincial metropolises of Peshawar (NWFP) and Quetta (Baluchistan) have large
Pathan populations and are the important Pathan centres in Pakistan. According
to the Population Census Organization, Government of Pakistan (retrieved 7 June
2006; http://www.newworldencyclopedia.org/entry/Pashtun_people) and the Census of
Afghans in Pakistan (UNHCR Statistical Summary Report;
http://www.unhcr.org/cgi-bin/texis/vtx/home/opendoc.pdf ), the Pathan population constitutes 15% of the
population of present-day Pakistan. Their language, Pashtu, is classified under the
Indo-Iranian branch of the Indo-European languages, and linguistically they are
classified as an Iranian people (Nicholas and Asmatullah, 2007). Folklore legends
claim that they are either of Jewish origin (Ahmad, 1952) or descendants of
Alexander's army (Bellew, 1998).

Figure XX: Median-joining network of H1*-M52 lineages found in the Burusho, Kalash
and Pathan, based on their Y-STR haplotypes.

In the present study, the presence at low frequency of haplogroup E1b1b1a*
chromosomes, which are frequent in Greeks (Figure IX), provides
evidence of a small Greek contribution to the Pathan gene pool that will likely require
further investigation in order to ascertain its pervasiveness (Firasat et al., 2007).
However, earlier studies carried out by Quintana-Murci et al. (2004) and Mansoor et al. (2004)
using mitochondrial DNA and STR markers demonstrated that the Pathans are
mainly related to the Iranians and to their geographic neighbors in northern Pakistan.

PARSI:

The origins of the Parsi are well documented, and there are only a few
thousand Parsi inhabitants in Pakistan now. These followers of the Persian Prophet
Zoroaster (http://www.ozemzil.com.au/~Zarathus/Zor33.html) migrated to India after
the collapse of the Sassanian Empire in the 7th century A.D. and settled in the
northwest Indian province of Gujarat in 900 A.D., where they were called the 'Parsi',
meaning "from Iran". Eventually they moved to Mumbai in India and Karachi in
Pakistan, from where the present population was sampled (Figure XXI). They speak
an Indo-European language.

The earlier study of their Y chromosomes (Qamar et al., 2002) showed that
the Parsis are genetically closer to Iranians than to their neighbors in Pakistan. In
this study, 39% of the Parsis sampled belonged to haplogroup J* (Table VI). This is
similar to the frequency of this haplogroup (40%) in the present-day Iranian
population (Qamar et al., 2002). Surprisingly, on the basis of their mitochondrial DNA
variation the Parsis were genetically close to the Gujarati population of India
(Quintana-Murci et al., 2004) rather than to the Iranians, indicating a loss of mitochondrial DNA
of Iranian origin, mainly due to admixture with the local population in India after
their seventh-century migration.

BALOCH:

The Balochis are affiliated with the Iranian Baloch tribes across the south-western
border with Iran, and they speak Balochi, an Indo-Iranian
language (Grimes, 1992). Currently around 8 million Balochis live in Pakistan.
Researchers are unsure of their origins. Some scholars believe that they came from
the northern regions of Elburz, a mountain range in northern Iran, whereas others claim
they came from Aleppo in Syria or from Mesopotamia.

Analysis of Y data demonstrates that Syrian and Iranian populations are
characterized by a low frequency of haplogroup R* (9-26%) and a high
frequency of haplogroup J* (35-57%) (Hammer et al., 2000), which is the converse of the
frequency distribution of these haplogroups in the Baloch. Approximately 29% of
the Baloch Y chromosomes carry haplogroup R* and only 9% carry haplogroup
J* (Table VI). These results support the earlier observation (Qamar et al., 2002), which
was based on a limited number of Y markers. HLA data support genetic relatedness among
the Baloch tribes of Iran and Pakistan (Farjadian et al., 2004). In worldwide surveys
of the HGDP-CEPH cell line panel, the Baloch are closely related to their
geographic neighbours and share the same branch as populations from the Middle
East and West Eurasia (Jakobsson et al., 2008; Li et al., 2008).

BRAHUI:

The Brahui people are found in the central region of the Baluchistan province of
Pakistan. About 1.5 million Brahuis reside in the Sarawan and Jhalawan regions of Kalat
state, Baluchistan (Hughes-Buller, 1991). They speak Brahui, a language that belongs
to the Dravidian language family (Grimes, 1992). Dravidians are found mostly in
southern India, Sri Lanka, Bangladesh, Pakistan, Afghanistan and Iran. Dravidians
are supposedly Indian in origin (Fuller, 2003). However, according to the
proto-Elamo-Dravidian hypothesis, they originated in the Iranian province of Elam and were once
spread over a much larger area, including Iran, Pakistan, Afghanistan and all of India
(McAlpin, 1974, 1981). According to some historical traditions, the Brahuis are
descendants of western Asian peoples (McAlpin, 1974, 1981), such as Turko-Iranian
tribes and Scythians (Hughes-Buller, 1991). Some historians also claim that they
have the same origins as the Baloch (Hughes-Buller, 1991; Quddus, 1990).
The Brahuis are widely suggested to be remnants of a formerly widespread Dravidian
population that entered South Asia with the expansion of Dravidian-speaking farmers
(Quintana-Murci et al., 2001).

To investigate their origin, Y chromosomes from 117 Brahui males were
analyzed and the results were compared with those of neighboring populations.
The presence of two Y chromosomal haplogroups, J* and L*
(Table VI), reveals the movement of populations from West Asia to South Asia and from
India to Pakistan, respectively.

The highest frequency of haplogroup J* is found in Iranian populations
(30-60%: Quintana-Murci et al., 2001) and in the Fertile Crescent region, which includes
Palestinians (51%), Lebanese (46%) and Syrians (57%) (Hammer et al., 2000). These
results indicate that haplogroup J* originated in West Asia and spread from there
to South Asia. The high frequency of haplogroup J* in the Brahui
(26.5%) is consistent with these observations. The major movement of populations from
West Asia to South Asia is correlated with the expansion of the farming economy, which
spread from Iran to the Indo-Pak subcontinent between the 6th and 5th millennia B.C. The
other major development after this was the spread of domesticated animals by
pastoral nomads. The expansion of haplogroup J* has probably been associated with
the dispersal of (Dravidian) farmers and pastoral nomads in southern Asia
(Cavalli-Sforza, 1988; Renfrew, 1987). However, Sengupta et al. (2006) suggest that the
Dravidians originated in India, a conclusion they deduced from the presence of the
indigenous haplogroup L1-M76 (M-27) in Dravidian speakers (7.5% in India). The finding
that 6% of Brahui Y chromosomes carry haplogroup L1-M76 suggests that the Brahui could
have migrated to Baluchistan from India. This is also supported by the mean
microsatellite variance, which is higher in
India (0.35) than in Pakistan (0.19) (Sengupta et al., 2006).
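
The comparison of mean microsatellite variance referred to above can be illustrated with a short sketch. It assumes that the variance of repeat number is computed separately at each Y-STR locus (here as the sample variance) and then averaged across loci; this may differ in detail from the calculation of Sengupta et al. (2006). The function name `mean_str_variance` and both sets of haplotypes are hypothetical and are not data from this study.

```python
from statistics import mean, variance


def mean_str_variance(haplotypes):
    """Average, across loci, of the sample variance in repeat number.

    `haplotypes` is a list of equal-length tuples of repeat counts,
    one tuple per chromosome and one position per Y-STR locus.
    """
    per_locus = zip(*haplotypes)  # transpose: one tuple of alleles per locus
    return mean(variance(alleles) for alleles in per_locus)


# Hypothetical three-locus haplotypes for two populations (illustration only).
population_a = [(14, 11, 18), (15, 11, 18), (16, 12, 17), (14, 11, 19)]
population_b = [(15, 11, 18), (15, 11, 18), (15, 11, 18), (16, 11, 18)]

var_a = mean_str_variance(population_a)
var_b = mean_str_variance(population_b)
print(var_a > var_b)  # population_a is the more diverse of the two
```

A higher mean variance is usually read as greater accumulated STR diversity on the haplogroup background, which is the reasoning behind treating the region with the higher value as the more likely source.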

MAKRANI NEGROID:

The Negroid Makrani have African physical traits, reside along the southern
Makran coastal region of Pakistan and speak an Indo-European language. It has
been speculated that they represent migrants from Africa (Figure XXI), but the timing
of this migration is uncertain (Ansari, 1996). Although they have a low frequency of
sub-Saharan African haplogroups such as E1b1a*, they also exhibit a sizeable
proportion of L*, J* and R*. Haplogroup L* is mostly restricted to the Indo-Pak
subcontinent, and haplogroups J* and R* to Eurasia. The contribution of African Y
chromosomes to this population was estimated to be approximately 12% (Qamar et
al., 2002), and mitochondrial DNA data supported these results. These data, along with
their history as remnants of the East African slave trade, indicate that they are
probably recent settlers (Quintana-Murci et al., 2004).

Figure XXI: Possible origins of the a) Hazara, b) Kalash, c) Parsi and d) Makrani
Negroid populations.

Panels: a) Origins of the Hazara: Y lineage C from East Asia (Mongolia).
b) Origins of the Kalash: West Eurasian Y lineages.
c) Origins of the Parsi: Y lineages from West Asia (Iran); Gujarat, Mumbai.
d) Origins of the Makrani Negroid: Y lineages from sub-Saharan Africa.

Adapted from Mehdi, S.Q. 2007

CONCLUSIONS:

The molecular analysis of the human genome is providing a better
understanding of human ancestry and diversity from both the maternal and paternal
perspectives. The evolutionary antiquity of Pakistani populations and the subsequent
migrations from West Asia, Europe and, to a lesser extent, East Asia have resulted in
a rich tapestry of socio-cultural, linguistic and biological diversity. This study provides
a report on the diversity of Pakistani populations on the basis of haplogroup
frequencies. It provides insight into the genetic relationships of the Pakistani
populations with each other as well as with other world populations. These
studies will serve as a background for epidemiological work in different populations of
the world. The genetic makeup of a population contributes to the differences in
incidence and prognosis of various diseases across populations. The study
will provide insights in cases where a patient's origin is useful in determining the
predisposition to various diseases. Knowledge of a population's genetic
composition will also be helpful in eliminating spurious risk factors for different
diseases. Furthermore, apart from inherited diseases, the study will be of
medical importance in understanding susceptibility and resistance to
infectious diseases as well as the efficacy of drug treatment, heralding the era of
genomic medicine.

REFERENCES

Ahmad AKN. (1952). Jesus in heaven on earth. The Civil and Military Gazette Ltd,
Lahore, Pakistan.

Aitman TJ, Dong R, Vyse TJ, Norsworthy PJ, Johnson MD, Smith J, Mangion J,
Roberton-Lowe C, Marshall AJ, Petretto E, Hodges MD, Bhangal G, Patel SG,
Sheehan-Rooney K, Duda M, Cook PR, Evans DJ, Domin J, Flint J, Boyle JJ, Pusey
CD and Cook HT.(2006). Copy number polymorphism in Fcgr3 predisposes to
glomerulonephritis in rats and humans. Nature. 439:851-855.

Al-Zahery N, Semino O, Benuzzi G, Magri C, Passarino G, Torroni A,
Santachiara-Benerecetti AS. (2003). Y-chromosome and mtDNA polymorphisms in Iraq, a
crossroad of the early human dispersal and of post-Neolithic migrations. Mol
Phylogenet Evol. 28:458-472.

Anderson S, Bankier AT, Barrell BG, De Bruijn MHL, Coulson AR, Drouin J, Eperon
IC, Nierlich DP, Roe B A, Sanger F, Schreier PH, Smith AJH, Staden R and Young
IG.(1981). Sequence and organization of the human mitochondrial genome. Nature.
290: 457-465.

Ansari SSA. (1996). The Afghan or Pathans. In: The Musalman races found in
Sindh, Baluchistan and Afghanistan. Indus Publications, Karachi. pp 9-16.

Ayub Q, Mansoor A, Ismail M, Khaliq S, Mohyuddin A, Hameed A, Mazhar K,
Rehman S, Siddiqi S, Papaioannou M, Piazza A, Cavalli-Sforza LL and Mehdi SQ.
(2003). Reconstruction of human evolutionary tree using polymorphic autosomal
microsatellites. Am J Phys Anthropol. 122:259-268.

Baird M, Balazs I, Giusti A, Miyazaki L, Nicholas L, Wexler K, Kanter E, Glassberg J,
Allen F, Rubinstein P and Sussman L. (1986). Allele frequency distribution of two
highly polymorphic DNA sequences in three ethnic groups and its application to the
determination of paternity. Am J Hum Genet. 39:489-501.

Bandelt HJ, Forster P, Sykes BC and Richards MB. (1995). Mitochondrial portraits
of human populations using median networks. Genetics. 141:743-753.

Bandelt HJ, Forster P and Röhl A. (1999). Median-joining networks for inferring
intraspecific phylogenies. Mol Biol Evol. 16:37-48.

Barley J, Blackwood A, Miller M, Markandu ND, Carter ND, Jeffery S, Cappuccio FP,
MacGregor GA and Sagnella GA. (1996). Angiotensin converting enzyme gene I/D
polymorphism, blood pressure and the renin-angiotensin system in Caucasians and
Afro-Caribbean peoples. J Hum Hypertens. 10:31-35.

Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, Dey B, Roy
M, Roy B, Bhattacharyya NP, Roychoudhury S, Majumder PP. (2003). Ethnic India: a
genomic view, with special reference to peopling and structure. Genome Res.
13:2277-2290.

Batzer MA, Kilroy GE and Richard PE.(1990). Structure and variability of recent
inserted Alu family members. Nucleic acids Res. 18:6793-6798.

Batzer MA and Deininger PL. (1991). A human-specific subfamily of Alu sequences.
Genomics. 9:481-487.

Batzer MA, Gudi VA, Mena JC, Foltz DW, Herrera RJ and Deininger PL.(1991).
Amplification dynamics of Human-specific (HS) Alu family members. Nucleic Acids
Res.19:3619-3623.

Batzer MA, Arcot SS, Phinney JW, Alegria-Hartman M, Kass DH, Milligan SM,
Kimpton C, Gill P, Hochmeister M, Ioannou PA, Herrera RJ, Boudreau DA, Scheer
WD, Keats BJ, Deininger PL, Stoneking M. (1996). Genetic variation of recent Alu
insertion in human populations. J Mol Evol. 42:22-29.

Batzer MA and Deininger PL.(2002). Alu repeats and human genomic diversity. Nat
Rev Genet. 3:370-379.

Behar DM, Garrigan D, Kaplan ME, Mobasher Z, Rosengarten D, Karafet TM,
Quintana-Murci L, Ostrer H, Skorecki K and Hammer MF. (2004). Contrasting
patterns of Y chromosome variation in Ashkenazi Jewish and host non-Jewish
European populations. Hum Genet. 114:354-365.

Bellew HW. (1979). The races of Afghanistan. Sang-e-Meel Publications, Lahore,
Pakistan.

Bellew HW. (1998). An enquiry into the ethnography of Afghanistan. Vanguard
Books, Lahore, Pakistan.

Biddulph J. (1977). Tribes of the Hindoo Koosh. Indus Publications, Karachi,
Pakistan.

Birnboim HC and Straus NA.(1975). DNA from Eukaryotic cells contain unusually
long pyrimidine sequences. Can J Biochem. 53:640-643.

Bowcock AM, Kidd J, Mountain JL, Hebert JM, Carotenuto L, Kidd KK and
Cavalli-Sforza LL. (1991). Drift, admixture, and selection in human evolution: a study with
DNA polymorphisms. Proc Natl Acad Sci USA. 88:839-843.

Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR and Cavalli-Sforza
LL. (1994). High resolution of human evolutionary trees with polymorphic
microsatellites. Nature. 368:455-457.

Boyum A. (1968). Separation of lymphocytes and erythrocytes by centrifugation.
Scand J Clin Lab Invest. 21 (Supplement 97), pp 91.

Brook JD, McCurrach ME, Harley HG, Buckler AJ, Church D, Aburatani H, Hunter K,
Stanton VP, Thirion JP, Hudson T, Sohn R, Zemelman B, Snell RG, Rundle SA,
Crow S, Davies J, Shelbourne P, Buxton J, Jones C, Juvonen V, Johnson K, Harper
PS, Shaw DJ and Housman DE. (1992). Molecular basis of myotonic dystrophy:
expansion of trinucleotide (CTG) repeat at the 3' end of the transcript encoding a
protein kinase family member. Cell. 68:799-808.

Brooks MB, Gu W, Barnas JL, Ray J and Ray KA.(2003). Line 1 insertion in the
Factor IX gene segregates with mild hemophilia B in dogs. Mamm Genome. 14:788-
795.

Brown P, Sutikna T, Morwood MJ, Soejono RP, Jatmiko, Saptomo EW, Due RA.
(2004). A new small-bodied hominin from the Late Pleistocene of Flores, Indonesia.
Nature. 431:1055-1061.

Brown WM, George M Jr and Wilson AC. (1979). Rapid evolution of animal
mitochondrial DNA. Proc Natl Acad Sci USA. 76:1967-1971.

Budowle B, Moretti TR, Niezgoda SJ and Brown BL. (1998). CODIS and PCR-
based short tandem repeat loci: Law enforcement tools. In: Second European
Symposium on Human Identification 1998, Promega Corporation, Madison,
Wisconsin pp 73-88.

Budowle B and Chakraborty R.( 2001). Population variation at the CODIS core
short tandem repeat loci in Europeans. Leg Med (Tokyo) 3:29-33.

Cann RL, Stoneking M and Wilson AC.(1987). Mitochondrial DNA and human
evolution. Nature. 325: 31-36.

Capelli C, Wilson JF, Richards M, Stumpf MP, Gratrix F, Oppenheimer S, Underhill
P, Pascali VL, Ko TM and Goldstein DB. (2001). A predominantly indigenous
paternal heritage for the Austronesian-speaking peoples of insular Southeast Asia
and Oceania. Am J Hum Genet. 68:432-443.

Cappuzzo F, Toschi L, Domenichini I, Bartolini S, Ceresoli GL, Rossi E, Ludovini V,
Cancellieri A, Magrini E, Bemis L, Franklin WA, Crino L, Bunn PA Jr, Hirsch FR,
Varella-Garcia M. (2005). HER3 genomic gain and sensitivity to gefitinib in advanced
non-small-cell lung cancer patients. Br J Cancer. 93:1334-40.

Caroe O.(1958). The Pathans. Karachi, Pakistan: Oxford University Press.

Carter NP.(2007). Methods and strategies for analyzing copy number variation using
DNA microarrays. Nat. Genet. 39: Suppl: S16-S21.

Casanova M, Leroy P, Boucekkine C, Weissenbach J, Bishop C, Fellous M, Purrello
M, Fiori G and Siniscalco M. (1985). A human Y-linked DNA polymorphism and its
potential for estimating genetic and evolutionary distance. Science. 230:1403-1406.

Cavalli-Sforza LL. (1988). The Basque population and ancient migrations in Europe.
Munibe. 6:129-137.

Cavalli-Sforza LL, Menozzi P and Piazza A. (1994). The History and Geography of
Human Genes. Princeton University Press, Princeton.

Cavalli-Sforza LL.(2005). The Human Genome Diversity Project: past, present and
future. Nat Rev Genet. 6:333-40.

Chakraborty R, Kimmel M, Stivers DN, Deka R and Davison LJ. (1997). Relative
mutation rates at di-, tri-, and tetranucleotide microsatellite loci. Proc Natl Acad Sci
USA. 94:1041-1046.

Cinnioğlu C, King R, Kivisild T, Kalfoğlu E, Atasoy S, Cavalleri GL, Lillie AS,
Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL
and Underhill PA. (2004). Excavating Y-chromosome haplotype strata in Anatolia.
Hum Genet. 114:127-148.

Collins DW and Jukes TH. (1994). Rates of transition and transversion in coding
sequences since the human-rodent divergence. Genomics. 20:386-396.

Cooper DN and Krawczak M.(1995). An introduction to the structure, function and
expression of human genes. In: Human gene mutation. Bios Scientific Publishers
Limited. UK. pp 19-48.

Cooper DN, Krawczak M, Antonorakis SE.(2000). The nature and mechanisms of


human gene mutation. In: The Metabolic and Molecular Bases of Inherited Disease,
Vol. 1, 8th Edn (eds Scriver CR, Beaudet AL, Sly WS, Valle D). Mc Graw-Hill, New
York.

Cordaux R, Weiss G, Saha N, Stoneking M.(2004). The northeast Indian


passageway: a barrier or corridor for human migrations? Mol Biol Evol .21:1525-
1533.

Cost GJ and Boake JD.(1998). Targeting of human retrotransposons integration is


directed by the specificity of the L1 endonuclease for regions of unusual DNA
structure. Biochemistry 37:18081-18093.

Cruciani F, Santolamazza P, Shen P, Macaulay V, Moral P, Olckers A, Modiano D,
Holmes S, Destro-Bisol G, Coia V, Wallace DC, Oefner PJ, Torroni A, Cavalli-Sforza
LL, Scozzari R and Underhill PA.(2002). A back migration from Asia to sub-Saharan
Africa is supported by high-resolution analysis of human Y-chromosome haplotypes.
Am J Hum Genet. 70:1197-1214.

Cruciani F, La Fratta R, Santolamazza P, Sellitto D, Pascone R, Moral P, Watson E,
Guida V, Colomb EB, Zaharova B, Lavinha J, Vona G, Aman R, Calì F, Akar N,
Richards M, Torroni A, Novelletto A, Scozzari R.(2004). Phylogeographic analysis of
haplogroup E3b (E-M215) Y chromosomes reveals multiple migratory events within
and out of Africa. Am J Hum Genet. 74:1014-1022.

Cruciani F, La Fratta R, Torroni A, Underhill PA, Scozzari R.(2006). Molecular
dissection of the Y chromosome haplogroup E-M78 (E3b1a): a posteriori evaluation
of a microsatellite-network-based approach through six new biallelic markers. Hum
Mutat. 27:831-832.

Csink AK and Henikoff S.(1998). Something from nothing: the evolution and utility
of satellite repeats. Trends Genet. 14:200-204.

Dani AH.(1989). Early history the early inhabitants. In: History of Northern Areas of
Pakistan. National Institute of Historical and Culture Research, Islamabad, Pakistan.
pp 110-157.

Dani AH.(1991). History of Northern Areas of Pakistan. National Institute of
Historical and Culture Research, Islamabad, Pakistan.

Dausset J.(1954). Leuko-agglutinins IV: Leuko agglutinins and blood transfusion.
Vox Sanguinis 4:190.

Decker KD.(1992). Sociolinguistic survey of Northern Pakistan. Vol 5, Languages of
Chitral. National Institute of Pakistan Studies, Islamabad.

Deininger PL and Daniels GR.(1986). The recent evolution of mammalian repetitive
elements. Trends Genet. 2:76-80.

Deininger PL and Slagel VK.(1988). Recently amplified Alu family members share
a common parental Alu sequence. Mol Cell Biol. 8:4566-4569.

Deininger PL, Batzer MA, Hutchinson III CA and Edgell MH.(1992). Master genes
in mammalian repetitive DNA amplification. Trends Genet. 8:307-312.

Deininger PL, Sherry ST, Risch G, Donaldson C, Robichaux MB, Soodyall H,
Jenkins T, Sheen F-M, Swergold G, Stoneking M, Batzer MA.(1999). Interspersed
repeat insertion polymorphisms for studies of human molecular anthropology. In:
Genomic Diversity, Application in Human population Genetics. (eds Papiha SS, Deka
R, Chakraborty R). Kluwer Academic / Plenum Publishers. New York, Boston,
Dordrecht, London, Moscow.

de Knijff P.(2000). Messages through bottlenecks: On the combined use of slow
and fast evolving polymorphic markers on the human Y chromosome. Am J Hum
Genet. 67:1055-1061.

Dietrich W, Katz H, Lincoln SE, Shin H-S, Friedman J, Dracopoli NC and Lander
ES.(1992). A genetic map of the mouse suitable for typing intraspecific crosses. Genetics
131:423-447.

Di Giacomo F, Luca F, Anagnou N, Ciavarella G, Corbo RM, Cresta M, Cucci F, Di
Stasi L, Agostiano V, Giparaki M, Loutradis A, Mammì C, Michalodimitrakis EN,
Papola F, Pedicini G, Plata E, Terrenato L, Tofanelli S, Malaspina P, Novelletto A.
(2003). Clinal patterns of human Y chromosomal diversity in continental Italy and
Greece are dominated by drift and founder effects. Mol Phylogenet Evol. 28:387-395.

Di Giacomo F, Luca F, Popa LO, Akar N, Anagnou N, Banyko J, Brdicka R,
Barbujani G, Papola F, Ciavarella G, Cucci F, Di Stasi L, Gavrila L, Kerimova MG,
Kovatchev D, Kozlov AI, Loutradis A, Mandarino V, Mammì C, Michalodimitrakis EN,
Paoli G, Pappa KI, Pedicini G, Terrenato L, Tofanelli S, Malaspina P, Novelletto
A.(2004). Y chromosomal haplogroup J as a signature of the post-neolithic
colonization of Europe. Hum Genet. 115:357-371.

Di Rienzo A, Peterson AC, Garza JC, Valdes AM, Slatkin M and Freimer NB.(1994).
Mutational process of simple-sequence repeat loci in human populations. Proc Natl
Acad. Sci. USA 91:3166-3170.

Dong SL, Wang E, Hsie L, Cao YX, Chen XG, Gingeras TR.(2001). Flexible use of
high density oligonucleotide arrays for single nucleotide polymorphism discovery and
validation. Genome Res. 11:1418-1424.

Duru K, Farrow S, Wang JM, Lockette W and Kurtz T.(1994). Frequency of a
deletion polymorphism in the gene for angiotensin converting enzyme is increased in
African Americans with hypertension. Am J Hypertens. 7:759-762.

Edwards A, Civitello A, Hammond HA and Caskey CT.(1991). DNA typing and
genetic mapping with trimeric and tetrameric tandem repeats. Am J Hum Genet.
49:746-756.

Engels DW.(1981). Alexander the Great and the logistics of the Macedonian Army.
Berkeley, CA: University of California Press.

Epplen JT, McCarrey JR, Sutou S and Ohno S.(1982). Base sequences of a cloned
snake W-chromosome DNA fragment and identification of a male putative mRNA in
the mouse. Proc Natl Acad. Sci. USA 79:3798-3802.

Farjadian S, Naruse T, Kawata H, Ghaderi A, Bahram S, Inoko H.(2004).
Molecular analysis of HLA allele frequencies and haplotypes in Baloch of Iran
compared with related populations of Pakistan. Tissue Antigens. 6:581-587.

Feng Q, Moran JV, Kazazian HH Jr and Boeke JD.(1996). Human L1 retrotransposon
encodes a conserved endonuclease required for retrotransposition. Cell 87:905-916.

Feuk L, MacDonald JR, Tang T, Carson AR, Li M, Rao G, Khaja R, Scherer SW.
(2005). Discovery of human inversion polymorphisms by comparative analysis of
human and chimpanzee DNA sequence assemblies. PLoS Genet. 1:489-498.

Feuk L, Carson AR and Scherer SW.(2006). Structural variation in the human
genome. Nature Reviews Genetics 7:85-97.

Firasat S, Khaliq S, Mohyuddin A, Papaioannou M, Tyler-Smith C, Underhill PA,
Ayub Q.(2007). Y-chromosomal evidence for a limited Greek contribution to the
Pathan population of Pakistan. Eur J Hum Genet. 15:121-126.

Fisher EM, Beer-Romero P, Brown LG, Ridley A, McNeil JA, Lawrence JB, Willard
HF, Bieber FR, Page DC.(1990). Homologous ribosomal protein genes on the human
X and Y chromosomes: escape from X inactivation and possible implications for
Turner syndrome. Cell 63:1205-1218.

Flores C, Maca-Meyer N, Larruga JM, Cabrera VM, Karadsheh N, Gonzalez AM.
(2005). Isolates in a corridor of migrations: a high-resolution analysis of Y
chromosome variation in Jordan. J Hum Genet. 50:435-441.

Francalacci P, Morelli L, Underhill PA, Lillie AS, Passarino G, Useli A, Madeddu R,
Paoli G, Tofanelli S, Calò CM, Ghiani ME, Varesi L, Memmi M, Vona G, Lin AA,
Oefner P, Cavalli-Sforza LL.(2003). Peopling of three Mediterranean islands
(Corsica, Sardinia, and Sicily) inferred by Y-chromosome biallelic variability. Am J
Phys Anthropol. 121:270-279.

Fuller D.(2003). An agricultural perspective on Dravidian historical linguistics:
archaeological crop packages, livestock and Dravidian crop vocabulary. In: Bellwood
P, Renfrew C (eds). Examining the farming/language dispersal hypothesis.
McDonald Institute for Archaeological Research, Cambridge, United Kingdom,
pp 191-213.

Fu Y-H, Kuhl DPA, Pizzuti A, Pieretti M, Sutcliffe JS, Richards S, Verkerk AJM,
Holden JH, Fenwick RG, Warren ST, Oostra BA, Nelson DL and Caskey CT.(1991).
Variation of the CGG repeats at the fragile X site results in the genetic
instability: resolution of the Sherman paradox. Cell. 67:1047-1058.

Fu Y-H, Pizzuti A, Fenwick RGJr, King J, Rajnarayan S, Dunne PW, Dubel J, Nasser
GA, Ashizawa T, de Jong P, Wieringa B, Korneluk R, Perryman MB, Epstein HF, and
Caskey CT.(1992). An unstable triplet repeat in a gene related to myotonic muscular
dystrophy. Science. 255:1256-1258.

Gabunia L and Vekua A.(1995). A Plio-pleistocene hominid from Dmanisi, East
Georgia, Caucasus. Nature. 373:509-512.

Gayden T, Cadenas AM, Regueiro M, Singh NB, Zhivotovsky LA, Underhill PA,
Cavalli-Sforza LL and Herrera RJ.(2007). The Himalayas as a Directional Barrier to
Gene Flow. Am J Hum Genet. 80:884-894.

Gilbert N, Lutz-Prigge S and Moran JV.(2002). Genomic deletions created upon
LINE-1 retrotransposition. Cell 110:315-325.

Giles RE, Blanc H, Cann HM and Wallace DC.(1980). Maternal inheritance of human
mitochondrial DNA. Proc Natl Acad. Sci. USA 77:6715-6719.

Gill P, Ivanov PL, Kimpton C, Piercy R, Benson N, Tully G, Evett I, Hagelberg E and
Sullivan K.(1994). Identification of the remains of the Romanov family by DNA
analysis. Nat Genet. 6:130-135.

Goodier JL, Ostertag EM, Du K and Kazazian HH Jr.(2001). A novel active L1
retrotransposon subfamily in the mouse. Genome Res. 11:1677-1685. Erratum in:
Genome Res 11:1968.

Gonzalez E, Kulkarni H, Bolivar H, Mangano A, Sanchez R, Catano G, Nibbs RJ,
Freedman BI, Quinones MP, Bamshad MJ, Murthy KK, Rovin BH, Bradley W, Clark
RA, Anderson SA, O'Connell RJ, Agan BK, Ahuja SS, Bologna B, Sen L, Dolan MJ
and Ahuja SK.(2005). The Influence of CCL3L1 Gene-Containing Segmental
Duplications on HIV-1/AIDS Susceptibility. Science. 307:1434-1440.

Grimes BF.(1992). Ethnologue: Languages of the World, 12th ed., Summer
Institute of Linguistics, Inc., Dallas, Texas, USA.

Grimes B and Cooke H.(1998). Engineering mammalian chromosomes. Hum Mol
Genet. 7:1635-1640.

Grubb R and Laurell AB.(1956). Hereditary serological human serum groups. Acta
Pathol Microbiol Scand. 39:390-398.

Hacia JG, Fan J-B, Ryder O, Jin L, Edgemon K, Ghandour G, Mayer RA, Sun B,
Hsie L, Robbins CM, Brody LC, Wang D, Lander ES, Lipshutz R, Fodor SPA and
Collins FS.(1999). Determination of ancestral alleles for human single-nucleotide
polymorphisms using high-density oligonucleotide arrays. Nat Genet. 22:164-167.

Hacia JG and Collins FS.(1999). Mutational analysis using oligonucleotide
microarrays. J Med Genet. 36:730-736.

Hamada H and Kakunaga T.(1982). Potential Z-DNA forming sequences are highly
dispersed in the human genome. Nature. 298:396-398.

Hamada H, Petrino MG and Kakunaga T.(1982). A novel repeated element with Z-DNA
forming potential is widely found in evolutionarily diverse eukaryotic genomes.
Proc Natl Acad. Sci. USA 79:6465-6469.

Hamada H, Seidman M, Howard BH and Gorman CM.(1984). Enhanced gene
expression by the poly(dT-dG).poly(dC-dA) sequence. Mol Cell Biol. 4:2622-2630.

Hammer MF, Spurdle AB, Karafet T, Bonner MR, Wood ET, Novelletto A, Malaspina
P, Mitchell RJ, Horai S, Jenkins T and Zegura SL.(1997). The geographic
distribution of human Y chromosome variation. Genetics. 145:787-805.

Hammer MF, Karafet TM, Rasanayagam A, Wood ET, Altheide TK, Jenkins T,
Griffiths RC, Templeton AR and Zegura SL.(1998). Out of Africa and back again:
Nested cladistic analysis of human Y chromosome variation. Mol Biol Evol. 15:427-441.

Hammer MF, Redd AJ, Wood ET, Bonner MR, Jarjanazi H, Karafet T, Santachiara-Benerecetti
S, Oppenheim A, Jobling MA, Jenkins T, Ostrer H and Bonne-Tamir
B.(2000). Jewish and Middle Eastern non-Jewish populations share a common pool
of Y-chromosome biallelic haplotypes. Proc Natl Acad Sci. USA 97:6769-6774.

Hammer MF, Karafet TM, Redd AJ, Jarjanazi H, Santachiara-Benerecetti S,
Soodyall H and Zegura SL.(2001). Hierarchical patterns of global human
Y-chromosome diversity. Mol Biol Evol. 18:1189-1203.

Hammer MF, Karafet TM, Park H, Omoto K, Harihara S, Stoneking M, and Horai
S.(2006). Dual origins of the Japanese: Common ground for hunter-gatherer and
farmer Y chromosomes. J Hum Genet. 51:47-58.

Harris H. (1966). Enzyme polymorphism in man. Proc R Soc Lond B Biol Sci.
22:298-310.

Hearne CM, Ghosh S and Todd JA.(1992). Microsatellites for linkage analysis of
genetic traits. Trends Genet. 8:288-294.

Henikoff S, Ahmed K and Malik HS.(2001). The centromere paradox: Stable
inheritance with rapidly evolving DNA. Science. 293:1098-1102.

Hinds DA, Kloek AP, Jen M, Chen X and Frazer KA.(2006). Common deletions and
SNPs are in linkage disequilibrium in the human genome. Nat Genet. 38:82-85.

Hirszfeld L and Hirszfeld H.(1919). Serological differences between the blood of
different races: The results of researches on the Macedonian front. Lancet ii:675-679.

Horai S, Hayasaka K, Kondo R, Tsugane K and Takahata N.(1995). Recent African
origin of modern humans revealed by complete sequences of hominoid mitochondrial
DNAs. Proc Natl Acad. Sci. USA 92:532-536.

Hudjashov G, Kivisild T, Underhill PA, Endicott P, Sanchez JJ, Lin AA, Shen P,
Oefner P, Renfrew C, Villems R and Forster P.(2007). Revealing the prehistoric
settlement of Australia by Y chromosome and mtDNA analysis. Proc Natl Acad. Sci.
USA 104:8726-8730.

Hughes-Buller R.(1991). Imperial Gazetteer of India, Provincial Series Balochistan,
Sang-e-Meel Publications, Lahore, Pakistan. pp 89-91.

Hurles ME, Nicholson J, Bosch E, Renfrew C, Sykes BC and Jobling MA.(2002). Y
chromosomal evidence for the origins of Oceanic-speaking peoples. Genetics.
160:289-303.

Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW and
Lee C.(2004). Detection of large-scale variation in the human genome. Nat Genet.
36:949-951.

Ibbetson D.(1883). Punjab Caste. Sang-e-Meel publications, Lahore. pp 9-16.

Immervoll T, Loesgen S, Dütsch G, Gohlke H, Herbon N, Klugbauer S, Dempfle A,
Bickeböller B, Becker-Follmann J, Rüschendorf F, Saar K, Reis A, Wichmann H-E
and Wjst M.(2001). Fine mapping and single nucleotide polymorphism association
results of candidate genes for asthma and related phenotypes. Hum Mutat. 18:327-336.

International HapMap Consortium.(2005). A haplotype map of the human
genome. Nature. 437:1299-1320.

International Human Genome Sequencing Consortium.(2001). Initial sequencing
and analysis of the human genome. Nature 409:860-921.

International Human Genome Sequencing Consortium.(2004). Finishing the
euchromatic sequence of the human genome. Nature. 431:931-945.

Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA,
Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor
BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M,
Cann HM, Hardy JA, Rosenberg NA, Singleton AB.(2008). Genotype, haplotype and
copy-number variation in worldwide human populations. Nature 451:998-1003.

Jarrige J.(1991). Mehrgarh: Its place in the development of ancient cultures in
Pakistan. In: Forgotten Cities on the Indus: Early Civilization in Pakistan from the 8th to the 2nd
Millennium BC. (eds Jansen M, Mulloy M and Urban G). Verlag Philipp von
Zabern, Mainz, Germany. pp 34-50.

Jeffreys AJ, Wilson V and Thein SL.(1985). Individual-specific fingerprints of
human DNA. Nature. 316:76-79.

Jeffreys AJ.(1987). Highly variable minisatellites and DNA fingerprints. Biochemical
Society Transactions 15:309-317.

Jeffreys AJ, Royle NJ, Wilson V and Wong Z.(1988). Spontaneous mutation rates to
new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature.
332:278-281.

Jeffreys AJ and Pena SD.(1993). Brief introduction to human DNA fingerprinting.
EXS. 67:1-20.

Jeng JR, Harn HJ, Jeng CY, Yueh KC and Shieh SM.(1997). Angiotensin I
converting enzyme gene polymorphism in Chinese patients with hypertension. Am J
Hypertens. 10: 558-561.

Jenkins S and Gibson N.(2002). High-throughput SNP genotyping. Funct
Genomics. 3:57-66.

Jobling MA and Tyler-Smith C.(2003). The human Y chromosome: An evolutionary
marker comes of age. Nat Rev Genet. 4:598-612.

Jorde LB, Bamshad MJ, Watkins WS, Zenger R, Fraley AE, Krakowiak PA,
Carpenter KD, Soodyall H, Jenkins T and Rogers AR.(1995). Origins and affinities of
modern humans: a comparison of mitochondrial and nuclear genetic data. Am J Hum
Genet. 57:523-538.

Jurka J.(1997). Sequence patterns indicate an enzymatic involvement in integration
of mammalian retroposons. Proc Natl Acad Sci. USA 94:1872-1877.

Kajikawa M and Okada N.(2002). LINEs mobilize SINEs in the eel through a shared
3' sequence. Cell 111:433-444.

Kan YW and Dozy AM.(1978). Polymorphism of DNA sequence adjacent to human
globin structural gene: relationship to sickle mutation. Proc Natl Acad. Sci. USA
75:5631-5635.

Kapitonov V and Jurka J.(1996). The age of Alu subfamilies. J Mol Evol. 42:59-65.

Karafet T, Xu L, Du R, Wang W, Feng S, Wells RS, Redd AJ, Zegura SL and
Hammer MF.(2001). Paternal population history of East Asia: Sources, patterns, and
microevolutionary processes. Am J Hum Genet. 69:615-628.

Karafet TM, Osipova LP, Gubina MA, Posukh OL, Zegura SL, and Hammer MF.
(2002). High levels of Y-chromosome differentiation among native Siberian
populations and the genetic signature of a boreal hunter-gatherer way of life. Hum
Biol. 74: 761-789.

Karafet TM, Lansing JS, Redd AJ, Reznikova S, Watkins JC, Surata SP,
Arthawiguna WA, Mayer L, Bamshad M, Jorde LB, Hammer MF.(2005). Balinese Y-
chromosome perspective on the peopling of Indonesia: Genetic contributions from
pre-Neolithic hunter-gatherers, Austronesian farmers, and Indian traders. Hum Biol.
77: 93-114.

Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL and Hammer
MF.(2008). New binary polymorphisms reshape and increase resolution of the
human Y chromosomal haplogroup tree. Genome Res. 18:830-838.

Kayser M, Roewer L, Hedman M, Henke L, Henke J, Brauer S, Krüger K, Krawczak
M, Nagy M, Dobosz T, Szibor R, de Knijff P and Sajantila A.(2000). Characteristics
and frequency of germline mutations at microsatellite loci from the human Y
chromosome, as revealed by direct observation in father/son pairs. Am J Hum
Genet. 66:1580-1588.

Kayser M, Brauer S, Cordaux R, Casto A, Lao O, Zhivotovsky LA, Moyse-Faurie C,
Rutledge RB, Schiefenhoevel W, Gil D, Lin AA, Underhill PA, Oefner PJ, Trent RJ
and Stoneking M.(2006). Melanesian and Asian origins of Polynesians: mtDNA and
Y chromosome gradients across the Pacific. Mol Biol Evol. 23:2234-2244.

Kazazian HH Jr and Moran JV.(1998). The impact of L1 retrotransposons on the
human genome. Nat Genet. 19:19-24.

Kazazian HH Jr, Wong C, Youssoufian H, Scott AF, Phillips DG and Antonarakis
S.(1988). Haemophilia A resulting from de novo insertion of L1 sequences represents
a novel mechanism for mutation in man. Nature (London) 332:164-166.

Ke Y, Su B, Song X, Lu D, Chen L, Li H, Qi C, Marzuki S, Deka R, Underhill P, Xiao
C, Shriver M, Lell J, Wallace D, Wells RS, Seielstad M, Oefner P, Zhu D, Jin J,
Huang W, Chakraborty R, Chen Z and Jin L.(2001). African origin of modern
humans in East Asia: A tale of 12,000 Y chromosomes. Science 292:1151-1153.

Kimmel M and Chakraborty R.(1996). Measure of variation at DNA repeat loci under
a generalized stepwise mutation model. Theor Pop Biol. 50:345-367.

King R and Underhill PA.(2002). Congruent distribution of Neolithic painted pottery
and ceramic figurines with Y-chromosome lineages. Antiquity 76:707-714.

King TE, Bowden GR, Balaresque PL, Adams SM, Shanks ME and Jobling MA.
(2007). Thomas Jefferson's Y chromosome belongs to a rare European lineage. Am
J Phys Anthropol. 132:583-589.

Kivisild T, Rootsi S, Metspalu M, Mastana S, Kaldma K, Parik J, Metspalu E,
Adojaan M, Tolk HV, Stepanov V, Gölge M, Usanga E, Papiha SS, Cinnioğlu C, King
R, Cavalli-Sforza L, Underhill PA, Villems R.(2003). The genetic heritage of the
earliest settlers persists both in Indian tribal and caste populations. Am J Hum Genet.
72:313-332.

Klein RG.(1989). The Human Career: Human Biological and Cultural Origin.
Chicago: Chicago University Press.

Knight A, Batzer MA, Stoneking M, Tiwari HK, Scheer WD, Herrera RJ, Deininger
PL.(1996). DNA sequences of Alu elements indicate a recent replacement of the
human autosomal genetic complement. Proc Natl Acad. Sci. USA 93:4360-4364.

Knight A, Underhill PA, Mortensen HM, Zhivotovsky LA, Lin AA, Henn BM, Louis D,
Ruhlen M, Mountain JL.(2003). African Y chromosome and mtDNA divergence
provides insight into the history of click languages. Curr Biol. 13:464-473.

Korenberg JR and Rykowski MC.(1988). Human genome organization: Alu, lines,
and the molecular structure of metaphase chromosome bands. Cell 53:391-400.

Koschinsky ML, Boffa MB, Nesheim ME, Zinman B, Hanley AJG, Harris SB, Cao H
and Hegele RA.(2001). Association of a single nucleotide polymorphism in CPB2
encoding the thrombin-activable fibrinolysis inhibitor (TAFI) with blood pressure. Clin
Genet. 60:345-349.

Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST,
Schlessinger D, Sutherland GR and Richards RI.(1991). Mapping of DNA instability
at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252:1711-1714.

Kruglyak S, Durrett RT, Schug MD, Aquadro CF.(1998). Equilibrium distributions of
microsatellite repeat length resulting from a balance between slippage events and
point mutations. Proc Natl Acad Sci. USA 95:10774-10778.

Kruse PE Jr and Patterson MK.(1973). Tissue Culture: Methods and application.
Academic Press, New York. pp 16-17.

Labuda D, Sinnett D, Richer C, Deragon JM and Striker G.(1991). Evolution of
mouse B1 repeats: 7SL RNA folding pattern conserved. J Mol Evol. 32:405-414.

Lahr MM and Foley RA.(1994). Multiple dispersals and modern human origins.
Evolutionary Anthropology. 3: 48-60.

Lahr MM and Foley RA.(1998). Towards a theory of modern human origins:
Geography, demography, and diversity in recent human evolution. Am J Phys
Anthropol. 41:137-176.

Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K,
Dewar K, Doyle M, Fitzhugh W, Funke R, Gage D, Harris K, Heaford A, Howland J,
Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP,
Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A,
Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers
J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A,
Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D,
Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A,
Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M,
Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra
MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl
MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS,
Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P,
Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher
E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley
KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL,
Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe
C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave
F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Rosenthal A, Platzer M,
Nyakatura G, Taudien S, Rump A, Yang HM, Yu J, Wang J, Huang GY, Gu J, Hood
L, Rowen L, Madan A, Qin SZ, Davis RW, Federspiel NA, Abola AP, Proctor MJ,
Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R,
Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M,
Schultz R, Roe BA, Chen F, Pan HQ, Ramser J, Lehrach H, Reinhardt R, McCombie
WR, de la Bastide M, Dedhia N, Blocker H, Hornischer K, Nordsiek G, Agarwala R,
Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge
CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler
EE, Furey TS, Galagan J, Gilbert JGR, Harmon C, Hayashizaki Y, Haussler D,
Hermjakob H, Hokamp K, Jang WH, Johnson LS, Jones TA, Kasif S, Kaspryzk A,
Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM,
McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G,
Schultz JR, Slater G, Smit AFA, Stupka E, Szustakowki J, Thierry-Mieg D, Thierry-
Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh
RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A,
Morgan MJ and International Human Genome Sequencing Consortium.(2001). Initial sequencing
and analysis of the human genome. Nature 409:860-921.

Landsteiner K.(1901). Über Agglutinationserscheinungen normalen menschlichen
Blutes. Wien Klin Wschr. 14:1132-1134.

La Spada AR, Wilson AM, Lubahn DB, Harding AE and Fischbeck KH.(1991).
Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy.
Nature. 352:77-79.

Leakey R.(1994). The Origin of Humankind. Basic Books, A Division of
HarperCollins, New York.

Lichten MJ, Fox MS.(1983). Detection of non-homology containing heteroduplex
molecule. Nucleic Acids Res. 11:3959-3971.

Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM,
Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM.(2008). Worldwide human
relationships inferred from genome-wide patterns of variation. Science. 319:1100-4.

Li W-H, Gu Z, Wang H and Nekrutenko A.(2001). Evolutionary analyses of the
human genome. Nature. 409:847-849.

Lines M.(1999). The Kalasha people of North-western Pakistan. Peshawar,
Pakistan: Emjay Books International.

Litt M and Luty JA.(1989). A hypervariable microsatellite revealed by in vitro
amplification of a dinucleotide repeat within the cardiac muscle actin gene. Am J
Hum Genet. 44:397-401.

Lucotte G and Ngo NY.(1985). p49f, A highly polymorphic probe, that detects Taq1
RFLPs on the human Y chromosome. Nucleic Acids Res.13:8285.

Ludwig E, Corneli PS, Anderson JL, Marshall HW, Lalouel JM and Ward RH.
(1995). Angiotensin-converting enzyme gene polymorphism is associated with
myocardial infarction but not with development of coronary stenosis. Circulation
91:2120-2124.

Luis JR, Rowold DJ, Regueiro M, Caeiro B, Cinnioğlu C, Roseman C, Underhill PA,
Cavalli-Sforza LL, Herrera RJ.(2004). The Levant versus the Horn of Africa:
evidence for bidirectional corridors of human migrations. Am J Hum Genet. 74:532-544.

Luning Prak ET, Dodson AW, Farkash EA, Kazazian HH Jr.(2003). Tracking an
embryonic L1 retrotransposition event. Proc Natl Acad Sci. USA 100:1832-1837.

Malik HS, Burke WD and Eickbush TH.(1999). The age and evolution of non-LTR
transposable elements. Mol Biol Evol. 16:793-805.

Maniatis T, Fritsch EF and Sambrook J.(1982). Molecular cloning: A laboratory
manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.

Mansoor A, Mazhar K, Khaliq S, Hameed A, Rehman S, Siddiqi S, Papaioannou M,
Cavalli-Sforza LL, Mehdi SQ, Ayub Q.(2004). Investigation of the Greek ancestry of
populations from northern Pakistan. Hum Genet. 114:484-490.

Marri MKBB.(1985). Search lights on Baloch and Balochistan. 3rd Edition. Nisa
traders, Quetta, Pakistan.

Marshall A and Hodgson J.(1998). DNA chips: An array of possibilities. Nature
Biotechnology 16:27-31.

Mathias SL, Scott AF, Kazazian HH Jr, Boeke JD and Gabriel A.(1991). Reverse
transcriptase encoded by a human transposable element. Science. 254:1808-1810.

McAlpin DW.(1974). Towards proto-Elamo-Dravidian. Language. 50:89-101.

McAlpin DW.(1981). Proto-Elamo-Dravidian: the evidence and its implications.
Trans Am Phil Soc. 71:3-155.

McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, Dallaire S,
Gabriel SB, Lee C, Daly MJ, Altshuler DM and The International HapMap
Consortium.(2006). Common deletion polymorphisms in the human genome. Nat
Genet. 38:86-92.

McClay JL, Sugden K, Koch HG, Higuchi S and Craig IW.(2002). High-throughput
single nucleotide polymorphism genotyping by fluorescent competitive allele-specific
polymerase chain reaction (SNiPTag). Anal Biochem. 301:200-206.

Mehdi SQ.(2007). Genetics of Pakistani Populations in an Asian and Global
Context. In: Human Population Genetics: Evolution and Variation. (eds Cavalli-Sforza LL
and Feldman M). The Biomedical & Life Sciences Collection, Henry Stewart
Talks Ltd, London. (online at http://hstalks.com/bio).

Meselson M and Yuan R.(1968). DNA restriction enzyme from E. coli. Nature
217:1110-1114.

Mhlanga MM and Malmberg L.(2001). Using Molecular Beacons to Detect
Single-Nucleotide Polymorphisms with Real-Time PCR. Methods. 25:463-471.

Miesfeld R, Krystal M and Arnheim N.(1981). A member of a new repeated
sequence family which is conserved throughout eucaryotic evolution is found
between the human δ and β globin genes. Nucl. Acids Res. 9:5931-5948.

Mohyuddin A, Ayub Q, Underhill PA, Tyler-Smith C and Mehdi SQ.(2006).
Detection of novel Y SNPs provides further insights into Y chromosomal variation in
Pakistan. J Hum Genet. 51:375-378.

Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA
and Moran JV.(2002). DNA repair mediated by endonuclease-independent LINE-1
retrotransposition. Nat Genet. 31:159-165.

Mountain JL and Cavalli-Sforza LL.(1994). Inference of human evolution through
cladistic analysis of nuclear DNA restriction polymorphisms. Proc Natl Acad. Sci.
USA 91:6515-6519.

Myers JS, Vincent BJ, Udall H, Watkins WS, Morrish TA, Kilroy GE, Swergold GD,
Henke J, Henke L, Moran JV, Jorde LB and Batzer MA.(2002). A comprehensive
analysis of recently integrated human Ta L1 elements. Am J Hum Genet. 71:312-326.

Nakamura Y, Leppert M, O'Connell P, Wolff R, Holm T, Culver M, Martin C, Fujimoto
E, Hoff M, Kumlin E and White R.(1987). Variable number of tandem repeat (VNTR)
markers for human gene mapping. Science. 235:1616-1622.

Nanavutty P.(1997). The Parsis. National Book Trust, New Delhi, India.

Nasidze I, Sarkisian T, Kerimov A and Stoneking M.(2003). Testing hypotheses of
language replacement in the Caucasus: evidence from the Y-chromosome. Hum
Genet. 112:255-261.

Nasidze I, Ling EYS, Quinque D, Dupanloup I, Cordaux R, Rychkov S, Naumova O,
Zhukova O, Sarraf-Zadegan N, Naderi GA, Asgary S, Sardas S, Farhud DD,
Sarkisian T, Asadov C, Kerimov A, Stoneking M.(2004). Mitochondrial DNA and
Y-chromosome variation in the Caucasus. Ann Hum Genet. 68:205-221.

Nebel A, Filon D, Brinkmann B, Majumder PP, Faerman M, Oppenheim A.(2001).
The Y chromosome pool of Jews as part of the genetic landscape of the Middle East.
Am J Hum Genet. 69:1095-1112.

Nicholas Awde and Asmatullah Sarwan. Pashto Dictionary & Phrasebook: Pashto-
English, English-Pashto. (Hippocrene Books, 2003, ISBN 078180972X) retrieved 10
January 2007.

Oakey R, Tyler-Smith C.(1990). Y chromosome DNA haplotyping suggests that most
European and Asian men are descended from one of two males. Genomics. 7:325-330.

Oefner PJ and Underhill PA.(1995). Comparative DNA sequence by denaturing high
performance liquid chromatography (DHPLC). Am J Hum Genet. 57:A266.

Olivo PD, Van de Walle MJ, Laipis PJ and Hauswirth WW.(1983). Nucleotide
sequence evidence for rapid genotypic shifts in the bovine mitochondrial DNA
D-loop. Nature. 306:400-402.

Orita M, Iwahana H, Kanazawa H, Hayashi K and Sekiya T.(1989). Detection of
polymorphisms of human DNA by gel electrophoresis as single-strand conformation
polymorphisms. Proc Natl Acad. Sci. USA 86:2766-2770.

Ostertag EM and Kazazian HH Jr.(2001). Twin priming: a proposed mechanism for
the creation of inversions in L1 retrotransposition. Genome Res. 11:2059-2065.

Ostertag EM, DeBerardinis RJ, Goodier JL, Zhang Y, Yang N, Gerton GL and
Kazazian HH Jr.(2002). A mouse model of human L1 retrotransposition. Nat Genet.
32:655-660.

Pakistan Economic Survey.(2006-2007). An accountancy publication
www.accountancy.com.pk.

Pandya A, King TE, Santos FR, Taylor PG, Thangaraj K, Singh L, Jobling MA,
Tyler-Smith C.(1998). A polymorphic human Y-chromosomal G to A transition found in
India. Ind J Hum Genet. 4:52-61.

Passarino G, Semino O, Quintana-Murci L, Excoffier L, Hammer M and
Santachiara-Benerecetti AS.(1998). Different genetic components in the Ethiopian population,
identified by mtDNA and Y-chromosome polymorphisms. Am J Hum Genet. 62:420-434.

Passarino G, Semino O, Magri C, Al-Zahery N, Benuzzi G, Quintana-Murci L,
Andellnovic S, Bullc-Jakus F, Liu A, Arslan A, Santachiara-Benerecetti AS.(2001).
The 49a,f haplotype 11 is a new marker of the EU19 lineage that traces migrations
from northern regions of the Black Sea. Hum Immunol. 62:922-932. Erratum in: Hum
Immunol 62:1313-1314.

Passarino G, Cavalleri GL, Lin AA, Cavalli-Sforza LL, Børresen-Dale AL, Underhill
PA.(2002). Different genetic components in the Norwegian population revealed by
the analysis of mtDNA and Y chromosome polymorphisms. Eur J Hum Genet.
10:521-529.

Payne R, Tripp M, Weigle J, Bodmer W and Bodmer J. (1964). A new leukocyte
iso-antigen system in man. Cold Spring Harb Symp Quant Biol. 29:285-295.

Perez-Lezaun A, Calafell F, Mateu E, Comas D, Ruiz-Pacheco R and Bertranpetit
J. (1997). Microsatellite variation and the differentiation of modern humans. Hum
Genet. 99:1-7.

Prak EL and Kazazian HH Jr. (2000). Mobile elements and the human genome. Nat
Rev Genet. 1:134-144.

Qamar R, Ayub Q, Khaliq S, Mansoor A, Karafet T, Mehdi SQ and Hammer MF.
(1999). African and Levantine origins of Pakistani YAP+ Y chromosomes. Hum Biol.
71:745-755.

Qamar R, Ayub Q, Mohyuddin A, Helgason A, Mazhar K, Mansoor A, Zerjal T,
Tyler-Smith C and Mehdi SQ. (2002). Y-chromosomal DNA variation in Pakistan. Am J
Hum Genet. 70:1107-1124.

Qi XQ, Bakht S, Devos KM, Gale MD and Osbourn A. (2001). L-RCA (Ligation
rolling circle amplification): a general method for genotyping of single nucleotide
polymorphism (SNPs). Nucleic Acids Res. 29: U68-U74.

Quddus SA. (1990). A Tribal Balochistan. Ferozsons (Pvt.) Ltd., Lahore, Pakistan.

Queller DC, Strassmann JE and Hughes CR. (1993). Microsatellites and kinship.
Trends Ecol Evol. 8:285-288.

Quintana-Murci L, Semino O, Minch E, Passarino G, Brega A and
Santachiara-Benerecetti AS. (1999a). Further characteristics of proto-European Y
chromosomes. Eur J Hum Genet. 7:603-8.

Quintana-Murci L, Semino O, Poloni ES, Liu A, Van Gijn M, Passarino G, Brega A,
Nasidze IS, Maccioni L, Cossu G, al-Zahery N, Kidd JR, Kidd KK and
Santachiara-Benerecetti AS. (1999b). Y-chromosome specific YCAII, DYS19 and YAP
polymorphisms in human populations: a comparative study. Ann Hum Genet.
63:153-166.

Quintana-Murci L, Semino O, Bandelt HJ, Passarino G, McElreavey K and
Santachiara-Benerecetti AS. (1999c). Genetic evidence of an early exit of Homo
sapiens sapiens from Africa through eastern Africa. Nat Genet. 23:437-441.

Quintana-Murci L, Krausz C, Zerjal T, Sayar SH, Hammer MF, Mehdi SQ, Ayub Q,
Qamar R, Mohyuddin A, Radhakrishna U, Jobling MA, Tyler-Smith C and
McElreavey K.(2001). Y-Chromosome Lineages Trace Diffusion of People and
Languages in Southwestern Asia. Am J Hum Genet. 68:537-542.

Quintana-Murci L, Chaix R, Wells RS, Behar DM, Sayar H, Scozzari R, Rengo C,
Al-Zahery N, Semino O, Santachiara-Benerecetti AS, Coppa A, Ayub Q, Mohyuddin
A, Tyler-Smith C, Mehdi SQ, Torroni A and McElreavey K. (2004). Where west
meets east: the complex mtDNA landscape of the southwest and Central Asian
corridor. Am J Hum Genet. 74:827-45.

Ramana GV, Su B, Jin L, Singh L, Wang N, Underhill PA and Chakraborty R. (2001).
Y chromosome SNP haplotypes suggest evidence of gene flow among caste, tribe, and
the migrant Siddi populations of Andhra Pradesh, South India. Eur J Hum Genet.
9:695-700.

Ramsay G. (1998). DNA chips: state of the art. Nat Biotechnol. 16:40-44.

Raynolds MV, Bristow MR, Bush EW, Abraham WT, Lowes BD, Zisman LS, Taft
CS and Perryman MB. (1993). Angiotensin-converting enzyme DD genotype in
patients with ischaemic or idiopathic dilated cardiomyopathy. Lancet. 342:1073-1075.

Regueiro M, Cadenas AM, Gayden T, Underhill PA and Herrera RJ. (2006). Iran:
Tricontinental nexus for Y-chromosome driven migration. Hum Hered. 61:132-143.

Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, Fiegler H, Shapero
MH, Carson AR, Chen W, Cho EK, Dallaire S, Freeman JL, González JR, Gratacòs
M, Huang J, Kalaitzopoulos D, Komura D, MacDonald JR, Marshall CR, Mei R,
Montgomery L, Nishimura K, Okamura K, Shen F, Somerville MJ, Tchinda J,
Valsesia A, Woodwark C, Yang F, Zhang J, Zerjal T, Zhang J, Armengol L, Conrad
DF, Estivill X, Tyler-Smith C, Carter NP, Aburatani H, Lee C, Jones KW, Scherer SW
and Hurles ME. (2006). Global variation in copy number in the human genome.
Nature. 444:444-454.

Repping S, van Daalen SK, Brown LG, Korver CM, Lange J, Marszalek JD,
Pyntikova T, van der Veen F, Skaletsky H, Page DC and Rozen S. (2006). High
mutation rates have driven extensive structural polymorphism among human Y
chromosomes. Nat Genet. 38:463-467.

Renfrew C. (1987). Archaeology and language: the puzzle of Indo-European origins.
Jonathan Cape, London.

Richards RI, Holman K, Yu S and Sutherland GR. (1993). Fragile X syndrome
unstable element, p(CCG)n, and other simple tandem repeat sequences are binding
sites for specific nuclear proteins. Hum Mol Genet. 2:1429-1435.

Rightmire GP. (1989). Middle Stone Age humans from eastern and southern Africa. In:
P Mellars and CB Stringer (eds): The Human Revolution. Edinburgh: Edinburgh
University Press, pp. 109-122.

Robertson GS. (1896). The Kafirs of the Hindu-Kush. Oxford University Press,
Karachi, Pakistan.

Roberts RJ and Murray K. (1976). Restriction Endonucleases. CRC Crit Rev
Biochem. 4:123-164.

Roewer L, Krawczak M, Willuweit S, Nagy M, Alves C, Amorim A, Anslinger K,
Augustin C, Betz A, Bosch E, Cagliá A, Carracedo A, Corach D, Dekairelle AF,
Dobosz T, Dupuy BM, Füredi S, Gehrig C, Gusmão L, Henke J, Henke L, Hidding M,
Hohoff C, Hoste B, Jobling MA, Kärgel HJ, de Knijff P, Lessig R, Liebeherr E, Lorente
M, Martínez-Jarreta B, Nievas P, Nowak M, Parson W, Pascali VL, Penacino G,
Ploski R, Rolf B, Sala A, Schmidt U, Schmitt C, Schneider PM, Szibor R,
Teifel-Greding J and Kayser M. (2001). Online reference database of European
Y-chromosomal short tandem repeat (STR) haplotypes. Forensic Sci Int. 118:106-113.

Rootsi S, Magri C, Kivisild T, Benuzzi G, Help H, Bermisheva M, Kutuev I, Barać L,
Peričić M, Balanovsky O, Pshenichnov A, Dion D, Grobei M, Zhivotovsky LA,
Battaglia V, Achilli A, Al-Zahery N, Parik J, King R, Cinnioğlu C, Khusnutdinova E,
Rudan P, Balanovska E, Scheffrahn W, Simonescu M, Brehm A, Goncalves R, Rosa
A, Moisan JP, Chaventre A, Ferak V, Füredi S, Oefner PJ, Shen P, Beckman L,
Mikerezi I, Terzić R, Primorac D, Cambon-Thomsen A, Krumina A, Torroni A,
Underhill PA, Santachiara-Benerecetti AS, Villems R and Semino O. (2004).
Phylogeography of Y-chromosome haplogroup I reveals distinct domains of
prehistoric gene flow in Europe. Am J Hum Genet. 75:128-137.

Rootsi S, Zhivotovsky LA, Baldovic M, Kayser M, Kutuev IA, Khusainova R,
Bermisheva MA, Gubina M, Fedorova SA, Ilumäe AM, Khusnutdinova EK, Voevoda
MI, Osipova LP, Stoneking M, Lin AA, Ferak V, Parik J, Kivisild T, Underhill PA and
Villems R. (2007). A counter-clockwise northern route of the Y-chromosome
haplogroup N from Southeast Asia towards Europe. Eur J Hum Genet. 15:204-211.

Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA and
Feldman MW. (2002). Genetic structure of human populations. Science.
298:2381-2385.

Rosser ZH, Zerjal T, Hurles ME, Adojaan M, Alavantic D, Amorim A, Amos W,
Armenteros M, Arroyo E, Barbujani G, Beckman G, Beckman L, Bertranpetit J, Bosch
E, Bradley DG, Brede G, Cooper G, Côrte-Real HB, de Knijff P, Decorte R, Dubrova
YE, Evgrafov O, Gilissen A, Glisic S, Gölge M, Hill EW, Jeziorowska A, Kalaydjieva
L, Kayser M, Kivisild T, Kravchenko SA, Krumina A, Kucinskas V, Lavinha J, Livshits
LA, Malaspina P, Maria S, McElreavey K, Meitinger TA, Mikelsaar AV, Mitchell RJ,
Nafa K, Nicholson J, Nørby S, Pandya A, Parik J, Patsalis PC, Pereira L, Peterlin B,
Pielberg G, Prata MJ, Previderè C, Roewer L, Rootsi S, Rubinsztein DC, Saillard J,
Santos FR, Stefanescu G, Sykes BC, Tolun A, Villems R, Tyler-Smith C and Jobling
MA. (2000). Y-chromosomal diversity in Europe is clinal and influenced primarily by
geography, rather than by language. Am J Hum Genet. 67:1526-1543.

Royle NJ, Clarkson RE, Wong Z and Jeffreys AJ. (1988). Clustering of hypervariable
minisatellites in the proterminal regions of human autosomes. Genomics. 3:352-360.

Ruiz-Pesini E, Lott MT, Procaccio V, Poole JC, Brandon MC, Mishmar D, Yi C,
Kreuziger J, Baldi P and Wallace DC. (2007). An enhanced MITOMAP with a global
mtDNA mutational phylogeny. Nucleic Acids Res. 35:D823-D828.

Ruvolo ME, Zehr S, von Dornum M, Pan D, Chang B and Lin J. (1993).
Mitochondrial COII sequences and modern human origins. Mol Biol Evol.
10:1115-1135.

Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA and Arnheim N.
(1985). Enzymatic amplification of beta-globin genomic sequences and restriction
site analysis for diagnosis of sickle cell anemia. Science 230:1350-1354.

Sanchez JJ, Hallenberg C, Børsting C, Hernandez A and Morling N. (2005). High
frequencies of Y chromosome lineages characterized by E3b1, DYS19-11,
DYS392-12 in Somali males. Eur J Hum Genet. 13:856-866.

Santos FR, Pandya A, Kayser M, Mitchell RJ, Liu A, Singh L, Destro-Bisol G,
Novelletto A, Qamar R, Mehdi SQ, Adhikari R, de Knijff P and Tyler-Smith C. (2000).
A polymorphic L1 retroposon insertion in the centromere of the human Y
chromosome. Hum Mol Genet. 9:421-430.

Sassaman DM, Dombroski BA, Moran JV, Kimberland ML, Naas TP, DeBerardinis
RJ, Gabriel A, Swergold GD and Kazazian HH Jr. (1997). Many human L1 elements
are capable of retrotransposition. Nat Genet. 16:37-43.

Scheinfeldt L, Friedlaender F, Friedlaender J, Latham K, Koki G, Karafet T, Hammer
M and Lorenz J. (2006). Unexpected NRY chromosome variation in Northern Island
Melanesia. Mol Biol Evol. 23:1628-1641.

Schunkert H, Hense HW, Holmer SR, Stender M, Perz S, Keil U, Lorell BH and
Riegger GA. (1994). Association between a deletion polymorphism of the
angiotensin-converting enzyme gene and left ventricular hypertrophy. N Engl J Med.
330:1634-1638.

Schurr TG, Maggi WR, Fowler K and Wallace DC. (2000). The ethnic origins of an
enigmatic south Asian population, the Kalasha of northern Pakistan, as revealed by
mtDNA variation. Am J Hum Genet. 67:217.

Scozzari R, Torroni A, Semino O, Sirugo G, Brega A and Santachiara-Benerecetti
AS. (1988). Genetic studies on the Senegal population and mitochondrial DNA
polymorphism. Am J Hum Genet. 43:534-544.

Scozzari R, Cruciani F, Santolamazza P, Malaspina P, Torroni A, Sellitto D, Arredi B,
Destro-Bisol G, De Stefano G, Rickards O, Martinez-Labarga C, Modiano D, Biondi
G, Moral P, Olckers A, Wallace DC and Novelletto A. (1999). Combined use of
biallelic and microsatellite Y-chromosome polymorphisms to infer affinities among
African populations. Am J Hum Genet. 65:829-46.

Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, Maner S, Massa H,
Walker M, Chi M, Navin N, Lucito R, Healy J, Hicks J, Ye K, Reiner A, Gilliam TC,
Trask B, Patterson N, Zetterberg A and Wigler M. (2004). Large-scale copy number
polymorphism in the human genome. Science. 305:525-528.

Seielstad M, Yuldasheva N, Singh N, Underhill P, Oefner P, Shen P and Wells
RS. (2003). A novel Y-chromosome variant puts an upper limit on the timing of first
entry into the Americas. Am J Hum Genet. 73:700-705.

Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA,
Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder
PP, Underhill PA.(2006). Polarity and temporality of high-resolution Y-chromosome
distributions in India identify both indigenous and exogenous expansions and reveal
minor genetic influence of Central Asian pastoralists. Am J Hum Genet. 78:202-221.

Semino O, Passarino G, Oefner PJ, Lin AA, Arbuzova S, Beckman LE, De
Benedictis G, Francalacci P, Kouvatsi A, Limborska S, Marcikiae M, Mika A, Mika B,
Primorac D, Santachiara-Benerecetti AS, Cavalli-Sforza LL and Underhill PA. (2000).
The genetic legacy of Palaeolithic Homo sapiens sapiens in extant Europeans: a
Y-chromosome perspective. Science. 290:1155-1159.

Semino O, Santachiara-Benerecetti AS, Falaschi F, Cavalli-Sforza LL and Underhill
PA. (2002). Ethiopians and Khoisan share the deepest clades of the human
Y-chromosome phylogeny. Am J Hum Genet. 70:265-268.

Semino O, Magri C, Benuzzi G, Lin AA, Al-Zahery N, Battaglia V, Maccioni L,
Triantaphyllidis C, Shen P, Oefner PJ, Zhivotovsky LA, King R, Torroni A,
Cavalli-Sforza LL, Underhill PA and Santachiara-Benerecetti AS. (2004). Origin,
diffusion, and differentiation of Y-chromosome haplogroups E and J: Inferences on
the neolithization of Europe and later migratory events in the Mediterranean area.
Am J Hum Genet. 74:1023-1034.

Serre D and Hudson TJ. (2006). Resources for Genetic Variation Studies. Annu
Rev Genomics Hum Genet. 7:443-457.

Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM,
Clark RA, Schwartz S, Segraves R, Oseroff VV, Albertson DG, Pinkel D and Eichler
EE. (2005). Segmental duplications and copy-number variation in the human genome.
Am J Hum Genet. 77:78-88.

Shen MR, Batzer MA and Deininger PL. (1991). Evolution of the master Alu
gene(s). J Mol Evol. 33:311-320.

Shi H, Dong YL, Wen B, Xiao CJ, Underhill PA, Shen PD, Chakraborty R, Jin L, and
Su B.(2005). Y-chromosome evidence of southern origin of the East Asian-specific
haplogroup O3-M122. Am J Hum Genet 77: 408-419.

Shriver MD, Jin L, Chakraborty R and Boerwinkle E. (1993). VNTR allele-frequency
distribution under the stepwise mutation model: a computer simulation approach.
Genetics. 134:983-993.

Shriver MD, Jin L, Ferrell RE and Deka R. (1997). Microsatellite data support an
early population expansion in Africa. Genome Res. 7:586-591.

Sims LM, Garvey D and Ballantyne J. (2007). Sub-populations within the major
European and African derived haplogroups R1b3 and E3a are differentiated by
previously phylogenetically undefined Y-SNPs. Hum Mutat. 28:97.

Smit AF. (1996). The origin of interspersed repeats in the human genome. Curr
Opin Genet Dev. 6:743-748.

Smit AF. (1999). Interspersed repeats and other mementos of transposable
elements in mammalian genomes. Curr Opin Genet Dev. 9:657-663.

Stefansson H, Helgason A, Thorleifsson G, Steinthorsdottir V, Masson G, Barnard J,
Baker A, Jonasdottir A, Ingason A, Gudnadottir VG, Desnica N, Hicks A, Gylfason A,
Gudbjartsson DF, Jonsdottir GM, Sainz J, Agnarsson K, Birgisdottir B, Ghosh S,
Olafsdottir A, Cazier JB, Kristjansson K, Frigge ML, Thorgeirsson TE, Gulcher JR,
Kong A and Stefansson K. (2005). A common inversion under selection in
Europeans. Nat Genet. 37:129-137.

Strachan T and Read AP.(2004). Human Molecular Genetics, 3rd ed. Garland
Science, London and New York.

Stringer CB and Andrews P. (1988). Genetic and fossil evidence for the origin of
modern humans. Science. 239:1263-1268.

Stringer C. (2000). Palaeoanthropology. Coasting out of Africa. Nature 405:24-27.

Swallow DM, Gendler S, Griffiths B, Corney G, Taylor-Papadimitriou J and
Bramwell ME. (1987). The human tumour-associated epithelial mucins are coded by
an expressed hypervariable gene locus PUM. Nature. 328:82-84.

Swisher CC 3rd, Curtis GH, Jacob T, Getty AG, Suprijo A and Widiasmoro. (1994). Age
of the earliest known hominids in Java, Indonesia. Science. 263:1118-1121.

Su B, Xiao J, Underhill P, Deka R, Zhang W, Akey J, Huang W, Shen D, Lu D, Luo J,
Chu J, Tan J, Shen P, Davis R, Cavalli-Sforza L, Chakraborty R, Xiong M, Du R,
Oefner P, Chen Z and Jin L. (1999). Y-chromosome evidence for a northward migration
of modern humans into eastern Asia during the last Ice Age. Am J Hum Genet.
65:1718-1724.

Su B, Jin L, Underhill P, Martinson J, Saha N, McGarvey ST, Shriver MD, Chu J,
Oefner P, Chakraborty R and Deka R. (2000). Polynesian origins: Insights from the
Y chromosome. Proc Natl Acad Sci USA. 97:8225-8228.

Sun C, Skaletsky H, Rozen S, Gromoll J, Nieschlag E, Oates R and Page DC. (2000).
Deletion of azoospermia factor a (AZFa) region of human Y chromosome caused by
recombination between HERV15 proviruses. Hum Mol Genet. 9:2291-2296.

Tattersall I. (1997). Out of Africa again ... and again? Sci Am. 276:60-67.

Tautz D. (1989). Hypervariability of simple sequences as a general source for
polymorphic DNA markers. Nucleic Acids Res. 17:6463-6471.

Thangaraj K, Singh L, Reddy AG, Rao VR, Sehgal SC, Underhill PA, Pierson M,
Frame IG, and Hagelberg E. (2003). Genetic affinities of the Andaman Islanders, a
vanishing human population. Curr Biol. 13:86-93.

Thanseem I, Thangaraj K, Chaubey G, Singh VK, Bhaskar LV, Reddy BM, Reddy
AG, Singh L. (2006). Genetic affinities among the lower castes and tribal groups of
India: inference from Y chromosome and mitochondrial DNA. BMC Genet. 7:42.

The ENCODE Project Consortium. (2007). Identification and analysis of functional
elements in 1% of the human genome by the ENCODE pilot project. Nature.
447:799-816.

Thomas MG, Bradman N and Flinn HM. (1999). High throughput analysis of 10
microsatellite and 11 diallelic polymorphisms on the human Y-chromosome. Hum
Genet. 105:577-581.

Tishkoff SA, Dietzsch E, Speed W, Pakstis AJ, Kidd JR, Cheung K, Bonné-Tamir B,
Santachiara-Benerecetti AS, Moral P and Krings M. (1996). Global patterns of linkage
disequilibrium at the CD4 locus and modern human origins. Science.
271:1380-1387.

Todd JA, Aitman TJ, Cornall RJ, Ghosh S, Hall JRS, Hearne CM, Knight AM, Love
JM, McAleer MA, Prins J-B, Rodrigues N, Lathrop M, Pressey A, Delarato NH,
Peterson LB and Wicker LS. (1991). Genetic analysis of autoimmune type 1
diabetes mellitus in mice. Nature. 351:542-547.

Toth G, Gaspari Z and Jurka J. (2000). Microsatellites in different eukaryotic
genomes: survey and analysis. Genome Res. 10:967-981.

Treco D and Arnheim N. (1986). The evolutionarily conserved repetitive sequence
d(TG.AC)n promotes reciprocal exchange and generates unusual recombinant
tetrads during yeast meiosis. Mol Cell Biol. 6:3934-3947.

Tsunoda K, Sanke T, Nakagawa T, Furuta H and Nanjo K. (2001). Single nucleotide
polymorphism (D68D, T to C) in the syntaxin 1A gene correlates to age at onset and
insulin requirement in Type II diabetic patients. Diabetologia. 44:2092-2097.

Turner G, Barbulescu M, Su M, Jensen-Seaman MI, Kidd KK and Lenz J. (2001).
Insertional polymorphism of full-length endogenous retroviruses in humans. Curr
Biol. 11:1531-1535.

Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, Haugen E, Hayden H,
Albertson D, Pinkel D, Olson MV and Eichler EE.(2005). Fine-scale structural
variation of the human genome. Nat Genet. 37:727-732.

Ullu E and Tschudi C.(1984). Alu sequences are processed 7SL RNA genes.
Nature 312:171-172.

Underhill PA, Jin L, Lin AA, Mehdi SQ, Jenkins T, Vollrath D, Davis RW,
Cavalli-Sforza LL and Oefner PJ. (1997). Detection of numerous Y chromosome biallelic
polymorphisms by denaturing high-performance liquid chromatography. Genome
Res. 7:996-1005.

Underhill PA, Shen P, Lin AA, Jin L, Passarino G, Yang WH, Kauffman E,
Bonné-Tamir B, Bertranpetit J, Francalacci P, Ibrahim M, Jenkins T, Kidd JR, Mehdi SQ,
Seielstad MT, Wells RS, Piazza A, Davis RW, Feldman MW, Cavalli-Sforza LL and
Oefner PJ. (2000). Y chromosome sequence variation and the history of human
populations. Nat Genet. 26:358-61.

Underhill PA, Passarino G, Lin AA, Shen P, Mirazon Lahr M, Foley RA, Oefner PJ
and Cavalli-Sforza LL. (2001). The phylogeography of Y chromosome binary
haplotypes and the origins of modern human populations. Ann Hum Genet.
65:43-62.

Valdes AM, Slatkin M and Freimer NB. (1993). Allele frequencies at microsatellite loci:
the stepwise mutation model revisited. Genetics. 133:737-749.

Verkerk AJMH, Pieretti M, Sutcliffe JS, Fu Y-H, Kuhl DPA, Pizzuti A, Reiner O,
Richards S, Victoria MF, Zhang F, Eussen BE, van Ommen G-JB, Blonden LAJ,
Riggins GJ, Chastain JL, Kunst CB, Galjaard H, Caskey CT, Nelson DL, Oostra BA
and Warren S. (1991). Identification of the gene (FMR-1) containing a CGG repeat
coincident with a breakpoint cluster region exhibiting length variation in fragile X
syndrome. Cell. 65:905-914.

Walls EV and Crawford DH. (1987). Generation of lymphoblastoid cell lines using
Epstein-Barr virus. In: Lymphocytes, A Practical Approach. Ed. Klaus G.G.B. IRL
Press, Oxford. pp 157.

Walter RC, Buffler RT, Bruggemann JH, Guillaume MM, Berhe SM, Negassi B,
Libsekal Y, Cheng H, Edwards RL, von Cosel R, Néraudeau D and Gagnon
M. (2000). Early human occupation of the Red Sea coast of Eritrea during the last
interglacial. Nature. 405:65-69.

Wang DG, Fan J-B, Siao C-J, Berno A, Young P, Sapolsky R, Ghandour G, Perkins
N, Winchester E, Spencer J, Kruglyak L, Stein L, Linda H, Topaloglou T, Hubbell E,
Robinson E, Mittmann M, Morris MS, Shen N, Kilburn D, Rioux J, Nusbaum C,
Rozen S, Hudson TJ, Lipshutz R, Chee M and Lander ES.(1998). Large-Scale
Identification, Mapping, and Genotyping of Single-Nucleotide Polymorphisms in the
Human Genome. Science. 280:1077-1082.

Watkins WS, Ricker CE, Bamshad MJ, Carroll ML, Nguyen SV, Batzer MA,
Harpending HC, Rogers AR, Jorde LB.(2001). Patterns of ancestral human diversity:
an analysis of Alu insertion and restriction-site polymorphisms. Am. J. Hum Genet.
68:738-752.

Watson JD and Crick FHC. (1953). A Structure for Deoxyribose Nucleic Acid.
Nature. 171:737-738.

Weale ME, Yepiskoposyan L, Jager RF, Hovhannisyan N, Khudoyan A,
Burbage-Hall O, Bradman N and Thomas MG. (2001). Armenian Y chromosome
haplotypes reveal strong regional structure within a single ethno-national group.
Hum Genet. 109:659-674.

Webster MT, Smith NG, Ellegren H. (2002). Microsatellite evolution inferred from
human-chimpanzee genomic sequence alignments. Proc Natl Acad Sci USA
99:8748-8753.

Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J, Jin L,
Su B, Pitchappan R, Shanmugalakshmi S, Balakrishnan K, Read M, Pearson NM,
Zerjal T, Webster MT, Zholoshvili I, Jamarjashvili E, Gambarov S, Nikbin B, Dostiev
A, Aknazarov O, Zalloua P, Tsoy I, Kitaev M, Mirrakhimov M, Chariev A and Bodmer
WF. (2001). The Eurasian heartland: a continental perspective on Y-chromosome
diversity. Proc Natl Acad Sci USA. 98:10244-10249.

Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, He W, Chen YJ,
Makhijani V, Roth GT, Gomes X, Tartaro K, Niazi F, Turcotte CL, Irzyk GP, Lupski
JR, Chinault C, Song XZ, Liu Y, Yuan Y, Nazareth L, Qin X, Muzny DM, Margulies M,
Weinstock GM, Gibbs RA and Rothberg JM. (2008). The complete genome of an
individual by massively parallel DNA sequencing. Nature. 452:872-876.

Wilson IJ and Balding DJ. (1998). Genealogical inference from microsatellite data.
Genetics. 150:499-510.

Wolpert S.(2000). A new history of India. Oxford University Press, New York.

Wood ET, Stover DA, Ehret C, Destro-Bisol G, Spedini G, McLeod H, Louie L,
Bamshad M, Strassmann BI, Soodyall H and Hammer MF. (2005). Contrasting
patterns of Y chromosome and mtDNA variation in Africa: Evidence for sex-biased
demographic processes. Eur J Hum Genet. 13:867-876.

Wong Z, Wilson V, Patel I, Povey S and Jeffreys AJ. (1987). Characterization of a panel
of highly variable minisatellites cloned from human DNA. Ann Hum Genet.
51(Pt 4):269-288.

Xue Y, Zerjal T, Bao W, Zhu S, Shu Q, Xu J, Du R, Fu S, Li P, Hurles ME, Yang H
and Tyler-Smith C. (2006). Male demography in East Asia: a north-south contrast
in human population expansion times. Genetics. 172:2431-2439.

Y Chromosome Consortium. (2002). A nomenclature system for the tree of human
Y-chromosomal binary haplogroups. Genome Res. 12:339-348.

Yoshino T, Takeyama H and Matsunaga T. (2001). Single nucleotide polymorphism
analysis using a bacterial magnetic particle microarray. Electrochemistry.
69:1008-1012.

Youil R, Kemper BW and Cotton RG. (1995). Screening for mutations by enzyme
mismatch cleavage with T4 endonuclease VII. Proc Natl Acad Sci USA. 92:87-91.

Zegura SL, Karafet TM, Zhivotovsky LA and Hammer MF. (2004). High-resolution
SNPs and microsatellite haplotypes point to a single, recent entry of Native American
Y chromosomes into the Americas. Mol Biol Evol. 21:164-175.

Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, Qamar R, Ayub Q,
Mohyuddin A, Fu S, Li P, Yuldasheva N, Ruzibakiev R, Xu J, Shu Q, Du R, Yang H,
Hurles ME, Robinson E, Gerelsaikhan T, Dashnyam B, Mehdi SQ and Tyler-Smith C.
(2003). The genetic legacy of the Mongols. Am J Hum Genet. 72:717-21.

Zhang F, Su B, Zhang YP and Jin L. (2007). Genetic studies of human diversity in
East Asia. Phil Trans R Soc B. 362:987-995.

Zhivotovsky LA, Bennett L, Bowcock AM and Feldman MW. (2000). Human
population expansion and microsatellite variation. Mol Biol Evol. 17:757-767.

Zhivotovsky L, Underhill P, Cinnioğlu C, Kayser M, Morar B, Kivisild T, Scozzari R,
Cruciani F, Destro-Bisol G and Spedini G. (2004). The Effective Mutation Rate at Y
Chromosome Short Tandem Repeats, with Application to Human
Population-Divergence Time. Am J Hum Genet. 74:50-61.

APPENDIX

Appendix I: List of Y-SNPs analyzed along with their primer sequences and PCR amplification conditions used in this study.

SNO. Y-SNP GENOTYPING METHOD PRIMER DESIGNATION PRIMER SEQUENCE ANNEALING TEMPERATURE (°C)

1 Apt AFLP TEK E TGG ATT GCA TTC AAC TTC ACT TAC 65.5
TEK G CTG AGT TCA AAT GCT CGG GTC TC
2 LLY22g AFLP LLY22gF CCA CCCAGT TTT ATG CAT TTG 55
LLY22gR ATA GAT GGC GTC TTC ATG AGT
3 L1Y PCR L1YF GCA CAA TGT GCA CAT GTA CCC TA
L1YR TGA TGT GTG CAT TCA TCT CAT ATA T
4 M6 DHPLC M6 F CAC TAC CAC ATT TCT GGT TGG 63, 56
M6 R CGC TGA GTC CAT TCT TTG AG
5 M8 Sequencing M8 F CCC ACC CAC TTC AGT ATG AA 56
M8 R AGG CTG ACA GAC AAG TCC AC
6 M9 AFLP M9F GCA GCA TAT AAA ACT TTC AGG 55
M9R AAA ACC TAA CTT TGC TCA AGC
7 M11 AFLP M11R TTC ATC ACA AGG AGC ATA AAC AA 55
M11F CCC TCC CTC TCT CCT TGT ATT CTA CC
8 M12 ARMS PCR M12 F ACT AAA ACA CCA TTA GAA ACA AAG G 57
M12Nor R AGC AAC ATA GTG ACC CCC AAC
M12Mut R GCA ACA TAG TGA CCC CCA AA
9A M17 AFLP M17F GTG GTT GCT GGT TGT TAC GT 60
M17R AGC TGA CCA CAA ACT GAT GTA GA
9B M17 ARMS M17FN TTG CTG GTT GTT ACG GGG 60
M17FM GTTG CTG GTT GTT ACG GGT
M17R GCT ATT CTT GTT TCT CCA GGC
10 M20 AFLP M20F GAT TGG GTG TCT TCA GTG CT 60
M20R CAC ACA ACA AGG CAC CAT C 58
11 M25 DHPLC M25 F AAA GCG AGA GAT TCA ATC CAG 63, 56
M25R TTT TAG CAA GTT AAG TCA CCA GC
12 M27 ARMS-PCR M27 F CGG AAG TCA AAG TTA TAG TTA CTG G 65
M27RNL TAT AGG AAT CGA GGT TCA GGT CAG
M27 RMT TAT AGG AAT CGA GGT TCA GGT CAC

13 M31 DHPLC M31 F GAA CC AGA CAA TAC GAA ATA GAA G 63, 56
M31 R TTT AGC GGC TTA TCT CAT TAC C
14 M32 DHPLC M32 F TTG AAA AAA TAC AGT GGA AC 63, 56
M32 R CAA GTG TTT AAG GAT ACA GA
15 M35 ARMS-PCR M35 FN ATT TTC CTT TGG GAC ACT AG 58
M35 FM ATT TTC CTT TGG GAC ACT AC
M35 R AGA GGG AGC AAT GAG GAC A
16 M36 DHPLC M36 F AGA TCA TCC CAA AAC AAT CAT AA 63, 56
M36 R AAG GCT GAA ATC AAT CCA ATC TG
17 M38 Sequencing M38 F CAG TTT TTA GAG AAT AAT GTC CT 63, 56
M38 R TTA AAG AAA AGA AAA GCA GAT G
18 M45 DHPLC M45F GCT GGC AAG ACA CTT CTG AG 63, 56
M45R AAT ATG TTC CTG ACA CCT TCC
19 M48 ARMS-PCR M48 FN TGA CAA TTA GGA TTA AGA ATA TTA TA
M48 FM TGA CAA TTA GGA TTA AGA ATA TTA TG
M48R AAA ATT CCA AGT TTC AGT GTC ACA TA
20 M50 DHPLC M50 F CGG CAA CAG TGA GGA CAG T 63, 56
M50 R TGC TTC AGG AGA TAG AGG CTC
21 M52 ARMS-PCR M52FC TAT CGG CCT CCT GAG TAC CTG 60
M52RG CAA GAA ACC TAT CAA ACA TCC G
M52FM CAA GAA ACC TAT CAA ACA TCC TC
22 M56 ARMS PCR M56R TCT CAT TGC TGC CTC TCT TTA 55
M56FNL GCA ATG GGA GGA TTA CGA CA
M56FMT GCA ATG GGA GGA TTA CGA CT
23 M60 DHPLC M60 F GCA CTG GCG TTC ATC ATC T 63, 56
M60 R ATG TTC ATT ATG GTT CAG GAG G
24 M62 ARMS-PCR M62 FNL GGA ATT AAT TAT TTC TCT TTC TCA T 54
M62 FMT GGA ATT AAT TAT TTC TCT TTC TCA C
M62 R TGG TGG CAT GTG CCT GTG TT
25 M67 ARMS-PCR M67 F CCA TAT TCT TTA TAC TTT CTA CCT 55
M67 RNL TCG TGG ACC CCT CTA TAC A
M67 RMT TCG TGG ACC CCT CTA TAC T
26 M69 DHPLC M69 F GGT TAT CAT AGC CCA CTA TAC TTT G 63, 56
M69 R ATC TTT ATT CCC TTT GTC TTG CT
27 M70 ARMS-PCR M70 FNL GGA CTC ATG TCT CCA TGA GTA 58
M70 FMT GGA CTC ATG TCT CCA TGA GTC
M70 R ATC TTT ATT CCC TTT GTC TTG CT
28 M73 DHPLC M73 F CAG AAT AAT AGG AGA ATT TTT GGT 63, 56
M73 R ATT TTC CTT ATT TTC TAA GCA GC
29 M74 DHPLC M174 F ATG CTA TAA TAA CTA GGT GTT GAA G 63, 56
M174 R AAT TCA GCT TTT ACC ACT TCT GAA
30 M76 DHPLC M76 F TAG AAG TAG CAG ATT GGG AGA GG 63, 56
M76 R CCT GAT AAA ATG AAA AAA ATG GTC
31 M78 ARMS-PCR M78 F TGG TTC TCC ACT ACA GGA GA 61
M78 RN ATT TTG AAA TAT TTG GAA GGG TG
M78RM TAT TTT GAA ATA TTT GGA AGG GTA
32 M82 DHPLC M82 F CTG TAC TCC TGG GTA GCC TGT 63, 56
M82 R AAG AAC GAT TGA ACA CAC TAA CTC
33 M87 DHPLC M87 F TCC CAT TAT TTG CTA TAT TTG CT 55
M87 RNL AAC AAG CTG GCA TCA GAA TAT AA
M87RMT CAA GCT GGC ATC AGA ATA TAG
34 M88 Sequencing M88 F ATT CTA GGG TCA GGC AAC TAG G 63, 56
M88 R TGT TTG TTC TAT TCT ATG GTC TTC C
35 M89 ARMS-PCR M89 F AGA AGC AGA TTG ATG TCC CAC T 62
M89 RNL AAC TCA GGC AAA GTG AGA GAA G
M89 RMT AAC TCA GGC AAA GTG AGA GAA A
36 M91 DHPLC M91F GAG CTT GGA CTT TAG GAC GG 63, 56
M91R AAA CTT TAA GGC ACT TCT GGC
37 M92 ARMS-PCR M92 F GGC CTT ATA AGA TTG GCA TAC 62
M92 RNL CTA AAT ACT GTT GGA GCC TAT A
M92 RMT CTA AAT ACT GTT GGA GCC TAT G
38 M97 DHPLC M97 F GTT GCC CTC TCA CAG AGC AC 63, 56
M97R AAG GTC ACT GGA AGG ATT GC
39 M101 DHPLC M101 F TCA CAG CAG CTT CAG CAA A 63, 56
M101 R ATA AAA ATT AGA CTC TGT GTT ACT AGC
40 M103 DHPLC M103 F CAG TAA GTG AAC TCA CAC ATA ATT CC 63, 56
M103 R CCA GTT TTA TTT CAG TTT CAC AGC
41 M109 DHPLC M109 F GGG TAT CAA AAT GTC TTC AAC CT 63, 56
M109 R GGG AAT TTC CTG CTA CTT GC
42 M110 Sequencing M110F CAG GGA AGG ACC GTA AAA GG 63, 56
M110 R ATG TTT ATC ATG TGC AGT AAA GGT T
43 M111 Sequencing M111 F AAT CTT CTG CAA AGG GTT CC 63, 56
M111 R CAG CTA CAA AAC AAA ATA CTG GAC
44 M117 DHPLC M117 F AAG TAT GAC TTA TGA AGT ACG AAG AAA 63, 56
M117 R ATT CAG TTA GAT TTT ACA ATG AGC A
45 M119 DHPLC M119 F GAA TGC TTA TGA ATT TCC CAG A 63, 56
M119 R TTC ACA CAA TAT ACA AGA TGT ATT CTT
46 M122 ARMS-PCR M122FN AAT TGA GAT ACT AAT TCA T 50
M122FM AAT TGA GAT ACT AAT TCA C
M122R AAA ACT TTA TCA TAT TGA G
47 M123 ARMS-PCR M123 F CAG CGA ATT AGA TTT TCT TGC 58
M123RN GTA TCT GAA CTA GCA TAT CTG
M123RM AGT ATC TGA ACT AGC ATA TCT A
48 M124 ARMS-PCR M124 F TGC CTT TTG GAA ATG AAT AAA TC 60
M124 N ACA AAC TCA GTA TTA TTA AAC CG
M124 R ACA AAC TCA GTA TTA TTA AAC CA
49 M133 DHPLC M133 F TGA AAT GGA AAT CAA TAA ACT CAG T 63, 56
M133 R CCT TTT CTT TTT CTT TAA CCC TTC
50 M134 DHPLC M134 F AGA ATC ATC AAA CCC AGA AGG 63, 56
M134 R TCT TTG GCT TCT CTT TGA ACA G
51 M136 DHPLC M136 F ATG TGA AGA CAA CAC TGT GTG G 63, 56
M136 R TTG TGG TAG TCT TAG TTC TCA TGG
52 M143 DHPLC M143 F ATG CTA TAA TAA CTA GGT GTT GAA G 63, 56
M143 R AAT TCA GCT TTT ACC ACT TCT GAA
53 M147 Sequencing M147 F GTA TTC TGG GGC AAT TTT AGG 94-63-56-72 94-56-72
M147 R TTG ATA CAA GAG GTT ATT TTA AGC A 0.5Cdec/cycle
54 M148 DHPLC M148 F AAC AGA ATT ATC AGG AAA AGG TTT 63, 56
M148 R TTT TAC TTG TTC GTG TAC TTT CAA
55 M150 DHPLC M150 F GCA GTG GAG ATG AAG TGAG AC 63, 56
M150 R CCT ACT TTC CCC CTC TTC TG
56 M152 DHPLC M152 F AAG CTA TTT TGG TTT CTT TCA 63, 56
M152 R GCC TTG TGT GGG TAT GAT TG
57 M157 DHPLC M157F GCT GGC AAG ACA CTT CTG A 55
M157RNL ACC AAA GGT CAT TTG TGG AT
M157RMT CCA AAG GTC ATT TGT GGA G
58 M170 ARMS-PCR M170 N TAT TTA CTT AAA AAT CAT TGT TCA 56
M170FCmutant TAT TTA CTT AAA AAT CAT TGT TCC
M170 Rnormal CTT TTT TCA GTT CTT CAT CAG TTA
59 M172 ARMS-PCR M172 FNL CCC AAA CCC ATT TTG ATG CTA T 61
M172 FMT CCC AAA CCC ATT TTG ATG CTA G
M172 R TCA CAG TGG ATC CAT CTT CAC T
60 M173 ARMS-PCR M173 N AAT TCA AGG GCA TTT AGA ACA
M173 FC AAT TCA AGG GCA TTT AGA ACC 56
M173R TAT CTG GCA TCC GTT AGA AAA G 55
61 M175 Sequencing M175 F TTG AGC AAG AAA AAT AGT ACC CA 94-63-56-72 94-56-72
M175 R CTC CAT TCT TAA CTA TCT CAG GGA 0.5Cdec/cycle
62 M177 Sequencing M177 F TTT AAC ATT GAC AGG ACC AG 94-63-56-72 94-56-72
M177 R GTG TTG GTT CTC CTG TAA AG 0.5Cdec/cycle
63 M185 DHPLC M185 F GGA GTA CCT ATC ACT GAA TGT GC 63, 56
M185 R GTC ATT CAT TTC TGC TTG GAA C
64 M193 DHPLC M193 F GCC TGG ATG AGG AAG TGA G 63, 56
M193 R GCC TTC TCC ATT TTT GAC CT
65 M201 ARMS PCR M201 FN AAT AAT CCA GTA TCA ACT GAG AG 56
M201 FM TAA TAA TCC AGT ATC AAC TGA GAT
M201 R GTT CTG AAT GAA AGT TCA AAC GT
66 M207 ARMS-PCR M207 FN TAA GTC AAG CAA GAA ATT TTA 56
M207 FD TAA GTC AAG CAA GAA ATT TTG 52
M207 R CAA AAT TCA CCA AGA ATC CTT G
67 M214 ARMS-PCR M214 F CAA GCG TAG AGG TAT TAC TAC AA 66
M214RNL TGA GAC ACT GTC TGA AAA CAA TA
M214 RMT TGA GAC ACT GTC TGA AAA CAA TG
68 M217 Sequencing M217 F GCT TAT TTT TAG TCT CTC TTC CAT 63, 56
M217 R ACC TGT TGA ATG TTA CAT TTC TTT
69 M218 DHPLC M218 F TTG TGA GTT TTT TTC CAT CAA TC 63, 56
M218 R TTT ATT GAC GAT GGT ATT AGA AGA G
70 M231 DHPLC M231F CCT ATT ATC CTG GAA AAT GTG G 63, 56
M231R ATT CCG ATT CCT AGT CAC TTG G
71 M242 ARMS-PCR M242 F AAC TCT TGA TAA ACC GTG CTG 61
M242 RNL CAC GTT AAG ACC AAT GCC ATG
M242 RMT CAC GTT AAG ACC AAT GCC ATA
72 M267 ARMS-PCR M267 F TTA TCC TGA GCC GTT GTC C
M267 RNL CCA CAC AAA ATA CTG AAC GAT 62
M267 RMT CCA CAC AAA ATA CTG AAC GAC 58
73 M317 DHPLC M317 F TGG TTC TAC AGT TGG GAT TTT G 63, 56
M317 R CCT TAA TAA CCG AGG CAC AA
74 M343 ARMS-PCR M343 F TTT AAC CTC CTC CAG CTC TG
M343RNL CCA CAT ATC TCC AGG TCT AG
M343RMT CCA CAT ATC TCC AGG TCT AT
75 M349 ARMS-PCR M349 F TGG GAT TAA AGG TGC TCA TG 58
M349RN CCT AAG GTC AGA AAG TTT TAA C
M349 RM CCT AAG GTC AGA AAG TTT TAA A
76 M357 DHPLC M357 F CCC CGT TTT TTC CTC TCT GCC 63, 56
M357 R CAC GTA ACC TGG GAT GGT CAT A
77 P15 DHPLC P15F AGA GAG TTT TCT AAC AGG GCG 63, 56
P15R TGG GAA TCA CTT TTG CAA CT
78 P31 Sequencing P31 F TAA GGC TGC GTG TTC CCT AT 63, 56
P31 R GCA CTG TCA CTG TGG ATG TT
79 PK1 AFLP PK1 F TCA ACT TTC TTA AAT GAT TGT ACG TT
PK1 R TCT GTT CAG GAG AAC CTC TAT GG
80 PK2 ARMS-PCR PK2 F TGT GTC CTG GTG TCT TTT GG 67
PK2 RN GGT GTA CAA AAT AGT TTT TGT TTT TGA TCT AA
PK2 RM GGT GTA CAA AAT AGT TTT TGT TTT TGA TCT CG
81 PK3 ARMS-PCR PK3 F TGT GTC CTG GTG TCT TTT GG 68
PK3 N AAA GCC ACC ATC TCA AGA TGG TGT ACT A
PK3 M AAA GCC ACC ATC TCA AGA TGG TGT ACT G
82 PK4 DHPLC PK4 F CCA TCC TCC CAT GGC TAG T 63, 56
PK4 R GCT TCC AAG GTG CCC TTT AT
83 PK5 AFLP PK5 F TTC CAA ACA CAT GCT TCT GC 58.5
PK5 R TAA AAA GGA GGA GGG ACT GC
84 RPS4Y AFLP RPS4Y L CCA CAG AGA TGG TGT GGG TA 61
RPS4Y R GAG TGG GAG GGA CTG TGA GA
85 SRY+465 AFLP SRY13 GCC GAA GAA TTG CAG TTT 58
SRY14 GTT GAT GGG CGG TAA GTG GC
86 SRY1532 AFLP SRY1 TCC TTA GCA ACC ATT AAT CTG G 60
SRY2 AAA TAG CAA AAA ATG ACA CAA GGC
87 SRY2627 AFLP SRY-2627 F CGC GGC TTT GAA TTT CAA GCT CTG 63
SRY-2627 R TAA GAG TCC CTC GGG GCC CTG G
88 SRY8299 AFLP SRY8299 R ACA GCA CAT TAG CTG GTA TGA C
SRY8299 F TCT CTT TAT GGC AAG ACT TAC G
89 sY81 AFLP SY810.1 AGG CAC TGG TCA GAA TGA AG 56
SY810.2 AAT GGA AAA TAC AGC TCC CC
90 TAT AFLP TAT 1 GAC TCT GAG TGT AGA CTT GTG A 60
TAT 3 GAA GGT GCC GTA AAA GTG TGA A
91 YAP PCR YAP 1 CAG GGG AAG ATA AAG AAA TA 59
YAP 2 ACT GCT AAA AGG GGA TGG AT
92 12f2 PCR 12F2 F TCT TCT AGA ATT TCT TCA CAG AAT TG 59
12F2 D CTG ACT GAT CAA AAT GCT TAC AGA TC
93 92R7 AFLP 92R7 L GCC TAT CTA CTT CAG TGA TTT CT 62
92R7 L (R) GAC CCG CTG TAG ACC TGA CT
92R7 A TGC ATG AAC ACA AAA GAC GTA 65
92R7 B GCA TTG TTA AAT ATG ACC AGC

[Figure: Y-chromosome haplogroup tree (haplogroups A to T) showing the defining markers of each subclade, with lineages arranged in groups I to X and labelled by region: Africa; Africa & Middle East; Japan; Asia & America; Eurasia; India; North Europe; Mediterranean & Levant; Asia; Indus Valley; New Guinea; Europe; Australasia; America; Oceania & Indonesia.]