I. INTRODUCTION
Figure 1. Ontology Learning Layer-Cake [1] (layers, bottom to top: terms, synonyms, concepts, concept hierarchies, relations, rules)
II. RELATED WORK
Frantzi et al. [8] presented an approach called C-value/NC-value that uses multiple POS tag combinations to extract complex terms. They propose the following three linguistic filters:
1) Noun+ Noun
2) (Adj|Noun)+ Noun
3) ((Adj|Noun)+ | ((Adj|Noun)*(NounPrep)?)(Adj|Noun)*) Noun
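To make the filters concrete, the sketch below applies the second filter, (Adj|Noun)+ Noun, to POS-tagged text. The simplified tag set and the (token, tag) input format are assumptions for illustration, not tied to any particular tagger:

```python
import re

# Map simplified POS tags to single letters so the linguistic
# filters become ordinary regular expressions over a tag string.
TAG_LETTER = {"Adj": "A", "Noun": "N", "Prep": "P"}

# Second filter: (Adj|Noun)+ Noun -- a run of adjectives/nouns
# ending in a noun.
FILTER2 = re.compile(r"[AN]+N")

def candidate_terms(tagged):
    """Return token spans whose tag sequence matches the filter.

    `tagged` is a list of (token, tag) pairs; any tag outside the
    simplified set is mapped to 'X' and breaks a candidate span.
    """
    tags = "".join(TAG_LETTER.get(tag, "X") for _, tag in tagged)
    terms = []
    for m in re.finditer(FILTER2, tags):
        terms.append(" ".join(tok for tok, _ in tagged[m.start():m.end()]))
    return terms

sentence = [("floating", "Adj"), ("point", "Noun"), ("arithmetic", "Noun"),
            ("is", "Verb"), ("hard", "Adj")]
print(candidate_terms(sentence))  # ['floating point arithmetic']
```

Encoding the tag sequence as a string lets all three filters be expressed with the same regular-expression machinery; only the pattern changes.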
C. Corpus Selection
Existing approaches in ontology learning select corpora based on word count [3], [4]. However, Sinclair [10] observed that term frequency in a corpus follows Zipf's law: approximately half of the word forms occur only once, a quarter only twice, and so on. For example, the Brown corpus (the first million-word corpus of general written American English) contains 69,002 distinct word forms, of which 35,065 occur only once. At the other end of the frequency scale, the commonest word occurs 69,970 times, almost twice as often as the next one at 36,410. This implies that word count alone is not enough to select a good corpus.
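The skew Sinclair describes is easy to observe on any text: count word-form frequencies, then group the vocabulary by occurrence count. A minimal sketch (the sample sentence is illustrative only):

```python
from collections import Counter

def frequency_profile(text):
    """Count word-form frequencies and report how many forms occur
    exactly once, twice, etc. (the Zipf-like frequency spectrum)."""
    words = text.lower().split()
    freqs = Counter(words)
    # Group word forms by how often each occurs.
    by_count = Counter(freqs.values())
    vocab = len(freqs)
    return vocab, by_count

text = ("the cat sat on the mat the dog sat on the log "
        "a cat and a dog met on the mat")
vocab, by_count = frequency_profile(text)
print(vocab)        # 10 distinct word forms
print(by_count[1])  # 3 forms occur exactly once (hapax legomena)
```

Even in this tiny sample, a large share of the vocabulary occurs only once, which is exactly why raw word count says little about a corpus's coverage of a domain.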
III.
[Figure: processing pipeline — organizing corpora, applying linguistic rules to extract candidate terms, and calculating term distribution over the annotated corpora]
Existing corpora:
- Computer Science
  o Mikalai Krapivin Corpus (MK) [11]
  o NUS Keyphrase Corpus (NUS) [12]
- Biomedical
  o GENIA corpus (G) [13]
- Agriculture
  o FAO Food and Agriculture corpus (FAO) [11]
Built corpora:
- Cricket domain
  o built from Cricinfo RSS feeds (C)
- Business domain
  o built from Reuters and ABC RSS feeds (R)
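For the built corpora, documents can be collected by reading the RSS feeds and keeping each item's title and description as one document. A minimal standard-library sketch (the embedded feed snippet is illustrative, not real Cricinfo or Reuters data):

```python
import xml.etree.ElementTree as ET

def items_from_rss(rss_xml):
    """Extract <title> and <description> text from RSS <item>
    elements; each item becomes one document in the corpus."""
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        desc = item.findtext("description", default="")
        docs.append((title + " " + desc).strip())
    return docs

# Illustrative RSS 2.0 snippet, not data from an actual feed.
FEED = """<rss version="2.0"><channel>
  <item><title>Match report</title>
        <description>The batsman scored a century.</description></item>
  <item><title>Series preview</title>
        <description>Two teams meet in the final test.</description></item>
</channel></rss>"""

print(items_from_rss(FEED))
```

In practice the feed would be fetched over HTTP and deduplicated across days, but the item-to-document mapping stays the same.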
[Table: per-corpus term-distribution statistics for the six corpora; surviving column headers include NUS and FAO, and the Total row reads 2.11, 3.31, 3.63, 3.66, 2.48, 33.50]
IV. EVALUATION

V. CONCLUSION