
Automatic Genre-Specific Text Classification

KEY TERMS

False Syllabus: A page that does not describe a course.

Feature Selection: A method of reducing the high dimensionality of the feature space by selecting the features that are more representative than others. In text classification, the feature space usually consists of the unique terms occurring in the documents.
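
The definition does not fix a scoring criterion for "more representative." The sketch below assumes a binary syllabus/non-syllabus task and documents that are already tokenized. It scores each term with the chi-square statistic of term presence versus class, which is one common choice. The function names, the toy corpus, and the use of chi-square itself are illustrative assumptions, not necessarily the method used in this chapter.

```python
from collections import Counter

def chi_square_scores(docs, labels, positive):
    """Chi-square score of term presence vs. membership in the `positive` class.

    docs: list of token lists; labels: one label per document.
    Hypothetical helper; chi-square is an assumed selection criterion.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count each term at most once per document (document frequency).
        (df_pos if y == positive else df_neg).update(set(tokens))
    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # positive documents containing the term
        b = df_neg[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(docs, labels, positive, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels, positive)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

# Toy corpus (hypothetical): two syllabus pages and two other pages.
docs = [
    ["course", "syllabus", "grading", "office", "hours"],
    ["syllabus", "exam", "grading", "lecture"],
    ["news", "election", "report"],
    ["research", "article", "abstract", "lecture"],
]
labels = ["syllabus", "syllabus", "other", "other"]
print(select_top_k(docs, labels, "syllabus", 2))  # ['grading', 'syllabus']
```

In this toy corpus, "lecture" appears equally often in both classes and scores 0. Terms that occur only in syllabus pages score highest.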

Genre: Information presented in a specific format, often with certain fields and subfields closely associated with the genre, e.g., syllabi, news reports, and academic articles.

Model Testing: A procedure performed after model training that applies the trained model to a different data set and evaluates the trained model's performance.
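
The evaluation step can be made concrete with a short sketch. It scores a trained model's predictions on a held-out set using accuracy, plus precision, recall, and F1 for one positive class. The `evaluate` function, the label strings, and the choice of metrics are assumptions for illustration. The entry itself does not name a performance measure.

```python
def evaluate(y_true, y_pred, positive):
    """Compare gold labels of a held-out set with a trained model's predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical gold labels and predictions for five held-out pages.
y_true = ["syllabus", "syllabus", "other", "other", "syllabus"]
y_pred = ["syllabus", "other", "other", "syllabus", "syllabus"]
print(evaluate(y_true, y_pred, "syllabus"))
```

Here, two syllabi are found, one is missed, and one non-syllabus is wrongly flagged. Accuracy is therefore 0.6, and precision, recall, and F1 all come out to 2/3.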

Model Training: A procedure in supervised learning that generates a function mapping inputs to desired outputs. In text classification, the generated function maps a document, represented by its features, to one of a set of known classes.

Naïve Bayes (NB) Classifiers: Classifiers modeled as a Bayesian network in which the feature attributes are assumed to be conditionally independent of one another given the class attribute.
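
Because the features are assumed to be independent given the class, P(c | d) is proportional to P(c) times the product of P(t | c) over the terms t in document d. The sketch below is a minimal multinomial naive Bayes text classifier with add-one smoothing. Its `fit` method is also an instance of the Model Training entry. Several details are assumptions: the multinomial event model, the smoothing, the class name `SimpleMultinomialNB` (hypothetical, not a library class), and the toy data.

```python
import math
from collections import Counter, defaultdict

class SimpleMultinomialNB:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        class_counts = Counter(labels)
        # log P(c): fraction of training documents labeled c.
        self.log_prior = {c: math.log(class_counts[c] / len(labels))
                          for c in self.classes}
        term_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            term_counts[y].update(doc)
        v = len(self.vocab)
        # log P(t | c) with add-one smoothing over the training vocabulary.
        self.log_likelihood = {}
        for c in self.classes:
            total = sum(term_counts[c].values())
            self.log_likelihood[c] = {
                t: math.log((term_counts[c][t] + 1) / (total + v)) for t in self.vocab
            }
        return self

    def predict(self, doc):
        # Conditional independence turns the likelihood into a sum of
        # per-term log probabilities; terms unseen in training are skipped.
        def joint_log_prob(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in doc if t in self.vocab
            )
        return max(self.classes, key=joint_log_prob)

# Hypothetical training data: token lists with page-type labels.
train_docs = [
    ["course", "syllabus", "grading", "office", "hours"],
    ["syllabus", "exam", "grading", "lecture"],
    ["news", "election", "report"],
    ["research", "article", "abstract", "lecture"],
]
train_labels = ["syllabus", "syllabus", "other", "other"]
model = SimpleMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["syllabus", "grading", "exam"]))  # syllabus
print(model.predict(["news", "report"]))               # other
```

Working in log space avoids numeric underflow when many small per-term probabilities are multiplied.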

Support Vector Machines (SVM): A supervised machine learning approach, used for classification and regression, that finds the hyperplane maximizing the minimum distance between the hyperplane and the training data points.
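
For classification, the maximum-margin hyperplane is commonly approximated by minimizing the soft-margin objective (lam/2)*||w||^2 + mean(max(0, 1 - y_i * w.x_i)). For separable data and a small lam, this is close to the hard maximum-margin solution. The sketch below minimizes that objective with Pegasos-style stochastic subgradient steps. The solver, the function names, the hyperparameters `lam` and `epochs`, and the toy data are all assumptions for illustration. This is not a production SVM solver.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear soft-margin SVM.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the last weight acts as a bias;
    that bias is regularized too, a common simplification.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # standard Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            # Gradient step on the regularizer shrinks every weight.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active: push x toward the correct side with margin.
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Sign of the decision function w.x + b (b is the last weight)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical 2-D feature vectors for two linearly separable classes.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

In text classification, each vector would be a document's term-weight representation, often after feature selection, rather than a 2-D point.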

Syllabus Component: One of the following pieces of information: course code, title, class time, class location, offering institute, teaching staff, course description,


