TEXT CATEGORIZATION USING ONLY FRAGMENTS OF DOCUMENTS

In this paper we presented a series of experiments that examine how particular parts of documents contribute to the performance of a classifier. We evaluated text classifiers on two very different text corpora. We conclude that some parts of the text matter more than others for classification performance, and that giving higher weights to the more important parts can increase the performance of the classifier. Which parts are more or less important depends on the nature of the documents in the corpus. Some tasks remain to be done:
− More text corpora should be investigated.
− In Section 6.4 we optimized the number of features to be kept independently of the section. However, it could be optimized for each section separately.
− Documents could be split into parts of 50 words, to examine what happens when the parts are of equal size not only within a document but also across documents.
− When splitting documents into k equal parts, the classifiers resulting from different values of k could be combined.
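
The fragment-based setup described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, documents are assumed to be already tokenized into word lists, and the weights in the usage note are purely illustrative. It shows splitting a document into k near-equal parts, splitting it into fixed 50-word parts, and building term counts in which each part contributes according to its weight.

```python
from collections import Counter


def split_into_k_parts(tokens, k):
    """Split a token list into k contiguous parts of near-equal length."""
    n = len(tokens)
    # Part i covers tokens[i*n//k : (i+1)*n//k], so part sizes differ by at most one.
    return [tokens[i * n // k:(i + 1) * n // k] for i in range(k)]


def split_into_fixed_chunks(tokens, size=50):
    """Split a token list into consecutive chunks of `size` words.

    The last chunk may be shorter than `size`.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def weighted_term_counts(parts, weights):
    """Sum term counts over all parts, scaling each part's counts by its weight.

    The weights are an assumption of this sketch; how to choose them is
    exactly the question the experiments address.
    """
    if len(parts) != len(weights):
        raise ValueError("need exactly one weight per part")
    totals = Counter()
    for part, weight in zip(parts, weights):
        for term in part:
            totals[term] += weight
    return totals
```

For example, `weighted_term_counts(split_into_k_parts(tokens, 3), [2.0, 1.0, 1.0])` would count every term in the first third of a document twice as heavily as terms in the rest. The resulting counts could then be passed to any bag-of-words classifier.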


Issue Date:
2007
Publication Type:
Journal Article
DOI and Other Identifiers:
ISSN 0046-5518 (Other)
PURL Identifier:
http://purl.umn.edu/58927
Published in:
GAZDÁLKODÁS: Scientific Journal on Agricultural Economics, Volume 51, Special Issue Number 19
Page range:
214-221
Total Pages:
8
Series Statement:
51.
19. Special Issue




 Record created 2017-04-01, last modified 2018-01-22

