ISSN 2229-5518
ABSTRACT: A cancerous breast tumor is a mass of breast tissue that grows in an abnormal, uncontrolled way.
Early detection of breast cancer is key to survival because it widens the available treatment options.
Mammography screening and MRI are among the existing breast cancer detection methods. MRI tends to
produce a large number of false positives. Mammography is expensive and painful, gives false positives for patients
with dense breast tissue, and detects a tumor only once it is larger than 5 mm. Hence there is a need for a more
convenient and accurate method. In the proposed approach, we analyze gene expression patterns in blood cells to
detect breast cancer at an early stage. BRCA is a tumor suppressor gene that all people have. BRCA DNA
sequences from patients are generated by the PCR method and used as input to a local sequence alignment program
that implements the Smith-Waterman algorithm. The program compares the patient's gene sequence with the reference
BRCA gene sequence to determine the cancer risk at a very early stage.
KEYWORDS: Breast cancer, early detection, tumor suppressor genes, BRCA, blood sample, PCR method, DNA
sequencing, gene sequence, local sequence alignment algorithm, Smith-Waterman.
—————————— ——————————
IJSER © 2010
http://www.ijser.org
International Journal of Scientific & Engineering Research, Volume 1, Issue 3, December-2010 2
breasts, compared with 78% sensitivity for the entire sample of women in the study. So there is a need to
develop a more accurate, convenient and objective detection method [12]. Comparing the patient's BRCA gene
with the original gene is the method identified in the proposed approach. A gene is a stretch of DNA, so DNA
sequences are compared to identify cancer risk. The sequence comparison is executed using the dynamic
programming algorithm for local alignment between two DNA sequences proposed by Smith and Waterman,
known as the Smith-Waterman algorithm, which is a very well known and versatile algorithm [16].

3. EXPERIMENTAL STUDY OF BRCA GENES

The official names of the BRCA1 and BRCA2 genes are breast cancer susceptibility gene 1 and breast cancer
susceptibility gene 2, respectively. The BRCA genes belong to a class of genes known as tumor suppressor
genes [10]. Like many other tumor suppressors, the protein produced from the BRCA genes helps prevent cells
from growing and dividing too rapidly or in an uncontrolled way. There is no strong homology between BRCA1
and BRCA2, although both genes have a large exon 11 which seems to be crucial for function. However, the
function of the two genes seems to be similar [14, 20]. The BRCA genes provide instructions for making a
protein that is directly involved in repairing damaged DNA. By helping repair DNA, BRCA plays a role in
maintaining the stability of a cell's genetic information [13].

It has been identified that more than 1,000 mutations in the BRCA1 gene and 800 mutations in the BRCA2 gene
are possible, many of which are associated with an increased risk of breast cancer. Most of these mutations
lead to the production of an abnormally short version of the BRCA1 protein, or prevent any protein from being
made from one copy of the gene. Other BRCA1 mutations change single protein building blocks (amino acids) in
the protein or delete large segments of DNA from the BRCA1 gene. Many BRCA2 mutations insert or delete a
small number of DNA building blocks (nucleotides) in the gene. Researchers believe that a defective or missing
BRCA1 protein is unable to help repair damaged DNA or fix mutations that occur in other genes. As these
defects accumulate, they can allow cells to grow and divide uncontrollably and form a tumor [8, 9].

4. SEQUENCE COMPARISON

Sequence comparison can be defined as the problem of finding which parts of the sequences are similar and
which parts are different. Generally, a measure of how similar they are is also desirable. A typical approach
to solve this problem is to find a good and plausible alignment between the two sequences. Then, given an
appropriate scoring scheme, their similarity can be computed. Generally, sequence comparisons involve
aligning sections of the two sequences in a way that exposes the
similarities between them [7]. The idea of aligning two sequences (of possibly different sizes) is to write
one on top of the other, and break them into smaller pieces by inserting spaces in one or the other so that
identical subsequences are eventually aligned in a one-to-one correspondence. Naturally, spaces are not
inserted in both sequences at the same position. The objective of sequence alignment is to match identical
subsequences as far as possible. However, if the sequences are not identical, mismatches are likely to occur
as different letters are aligned together. The insertion of spaces produces gaps in the sequences. Gaps are
important to allow a good alignment between the characters of the sequences. A gap in the first sequence is
considered an insertion of a character from the second sequence into the first one, whereas a gap in the
second sequence is considered a deletion of a character of the first sequence.

Once the alignment is produced, a score can be assigned to each pair of aligned letters, called an aligned
pair, according to a chosen scoring scheme. The similarity of two sequences can be defined as the best score
among all possible alignments between them. Sequence comparison is actually a well-known problem in computer
science. Computational approaches to sequence alignment generally fall into two categories: global
alignments and local alignments. Pairwise sequence alignment methods are used to find the best-matching
piecewise (local) or global alignments of two query sequences. Pairwise alignments can only be used between
two sequences at a time, but they are efficient to calculate and are often used for methods that do not
require extreme precision (such as searching a database for sequences with high similarity to a query).
The three primary methods of producing pairwise alignments are dot-matrix methods, dynamic programming,
and word methods.

Global alignment is achieved using the Needleman-Wunsch algorithm. The algorithm tries to take all of one
sequence and align it with all of a second sequence. Short and highly similar subsequences may be missed in
the alignment because they are outweighed by the rest of the sequence. Hence, one would like to create a
locally optimal alignment [18]. Local alignments are more useful for dissimilar sequences that are suspected
to contain regions of similarity or similar sequence motifs within their larger sequence context. The
Smith-Waterman algorithm is a general local alignment method also based on dynamic programming. The dynamic
programming approach to pairwise sequence alignment is guaranteed to provide the optimal global or local
pairwise alignment and score given a particular scoring scheme [1]. In the Smith-Waterman algorithm,
1. All symbols (residues) in the two aligned subsequences have to be in the alignment, and in the same
   order in which they appear in the sequences
2. We can align one symbol from one sequence with one from another
3. A symbol can be aligned with a blank ('-')
4. Two blanks cannot be aligned [6, 15, 17]
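
The local alignment procedure described above can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the program used in this work: the function name smith_waterman,
the scoring values (match = 2, mismatch = -1, gap = -2) and the short example sequences are all hypothetical
choices made for readability, and real BRCA sequences would be far longer.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for a local alignment of a and b.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair_score(x, y):
        return match if x == y else mismatch

    # Fill the dynamic programming matrix.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1])
            up = H[i - 1][j] + gap      # a[i-1] aligned with a blank
            left = H[i][j - 1] + gap    # b[j-1] aligned with a blank
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Hypothetical toy sequences: "reference" and "patient" are placeholders.
    reference = "ACGTACGT"
    patient = "ACGACGT"
    score, top, bottom = smith_waterman(reference, patient)
    print(score)
    print(top)
    print(bottom)
```

With these assumed scores, aligning "ACGTACGT" against "ACGACGT" gives a score of 12 and the alignment
ACGTACGT over ACG-ACGT, where the blank marks the single nucleotide missing from the second sequence.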
5. PROPOSED SYSTEM
H (i , j) =
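
For orientation, the Smith-Waterman recurrence can be written in its standard textbook form with a linear
gap penalty. The symbols a, b, s and d are assumptions introduced here for illustration, and the scoring
scheme actually used in this work may differ: a_i is the i-th symbol of the first sequence, b_j is the j-th
symbol of the second sequence, s(a_i, b_j) is the score of the aligned pair, and d > 0 is the penalty for
aligning a symbol with a blank.

```latex
H(i,0) = H(0,j) = 0, \qquad
H(i,j) = \max
\begin{cases}
0 \\
H(i-1,\,j-1) + s(a_i, b_j) \\
H(i-1,\,j) - d \\
H(i,\,j-1) - d
\end{cases}
```

Under this form, the score of the best local alignment is the largest value of H(i, j) over all i and j, and
the alignment itself is recovered by tracing back from that cell until a cell with value 0 is reached.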
7. CONCLUSION
Consequently, this early detection method using DNA sequencing is significantly more advantageous than other
methods, since cancer risks can be identified at an early stage, even before the symptoms are clearly
observable. Moreover, this method is beneficial because it is easy to use, economical with respect to
laboratory usage, and reliable, as genes are used for detection. The proposed approach has 95% efficiency in
detecting breast cancer at an early stage. This project offers a more effective and accurate method aimed at
breast cancer detection at an early stage.

8. FUTURE SCOPE

The current evaluation system has the potential to reveal the cancer risk of a patient. The Smith-Waterman
algorithm is effective for text string matching, but an assessment is required to determine the proportional
benefits of the algorithm relative to the traditional techniques and other sequencing algorithms. Though the
Smith-Waterman algorithm is very sensitive and accurate, it has high time complexity and needs a large memory
space. As biological sequencing data are rapidly expanding, the memory requirement has become a critical
problem in the existing Smith-Waterman algorithm. Future work can target the use of an upgraded
Smith-Waterman algorithm that has computational complexity reduced to (N*(M+1)/2) and lower space
complexity. Moreover, the risk level of cancer can also be identified in further computational analysis.

ABBREVIATIONS
BRCA1   Breast cancer 1, early onset
BRCA2   Breast cancer 2, early onset
PCR     Polymerase Chain Reaction
DNA     Deoxyribonucleic acid

REFERENCES
[1]. E. C. Rouchka, "Aligning DNA sequencing using Dynamic Programming", ACM, 2006.
[2]. American Cancer Society, "Breast Cancer Facts and Figures 2009-2010", American Cancer Society, 2009.
[3]. American Cancer Society, "What is Breast Cancer", American Cancer Society, Sep. 18, 2009.
[4]. Baylor College of Medicine HGSC, "Smith Waterman algorithm", Baylor College of Medicine HGSC,
Aug. 1, 2002.
[5]. Breast Cancer, "Stages of Breast Cancer", Breast Cancer, Jan. 21, 2010.
[6]. David W. Mount, Bioinformatics: Sequence and Genome Analysis, 2nd ed., NY: Cold Spring Harbor
Laboratory Press, 2000.
[7]. Eugene W. Myers, "An Overview of Sequence Comparison Algorithms in Molecular Biology", Department of
Computer Science, The University of Arizona, Arizona, Tech. Rep. 91-29, December 20, 1991.
[8]. Genetic Home Reference, "BRCA1", Genetic Home Reference, Aug. 2007.
[9]. Genetic Home Reference, "BRCA2", Genetic Home Reference, Aug. 2007.
[10]. National Cancer Institute, "BRCA1 and BRCA2: Cancer Risk and Genetic Testing", National Cancer
Institute, May 29, 2009.
[11]. "Overview of steps in DNA Sequencing". [Online]. Apr. 6, 2010.
[12]. P. Sharma et al., "Early detection of breast cancer based on gene-expression patterns in peripheral
blood cells", Breast Cancer Research, p. 634+, Jun. 2005.
[13]. Ralph Scully, "Role of BRCA gene dysfunction in breast and ovarian cancer