Performance Evaluation of Classification Methods in Layout Prediction of Web Pages

dc.authorid: 0000-0003-4351-2244
dc.authorid: 0000-0002-3971-2676
dc.authorscopusid: 57194265151
dc.authorscopusid: 54783608800
dc.authorwosid: Uzun, Erdinç/AAG-5529-2019
dc.authorwosid: OZHAN, Erkan/N-8743-2016
dc.contributor.author: Özhan, Erkan
dc.contributor.author: Uzun, Erdinç
dc.date.accessioned: 2022-05-11T14:15:54Z
dc.date.available: 2022-05-11T14:15:54Z
dc.date.issued: 2018
dc.department: Faculties, Çorlu Faculty of Engineering, Department of Computer Engineering
dc.description: International Conference on Artificial Intelligence and Data Processing (IDAP) -- SEP 28-30, 2018 -- Inonu Univ, Malatya, TURKEY
dc.description.abstract: The Web is an invaluable source of data stored on web pages. These data are contained in the HTML layout elements of a web page, and extracting them automatically is a crucial problem. In this study, we use a dataset annotated with seven layout classes: main content, headline, summary, other necessary layouts, menu, link, and other unnecessary layouts. We compute 49 features from these layouts and then compare different classification methods on layout prediction. The experiments show that the Random Forest classifier achieves a high accuracy of 98.46%. With this classifier, the link layout is predicted best (an F-measure of approximately 0.988), while the summary layout is predicted worst (an F-measure of about 0.882).
dc.description.sponsorship: Inonu Univ, Comp Sci Dept, IEEE Turkey Sect, Anatolian Sci
dc.description.sponsorship: Namik Kemal University Research Fund, Namik Kemal University
dc.description.sponsorship: The authors acknowledge the support received from the Namik Kemal University Research Fund.
dc.identifier.isbn: 978-1-5386-6878-8
dc.identifier.scopus: 2-s2.0-85062506995
dc.identifier.uri: https://hdl.handle.net/20.500.11776/6112
dc.identifier.wos: WOS:000458717400170
dc.identifier.wosquality: N/A
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Özhan, Erkan
dc.institutionauthor: Uzun, Erdinç
dc.language.iso: en
dc.publisher: IEEE
dc.relation.ispartof: 2018 International Conference on Artificial Intelligence and Data Processing (IDAP)
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: web data extraction
dc.subject: classification methods
dc.subject: layout detection
dc.subject: imbalanced dataset
dc.title: Performance Evaluation of Classification Methods in Layout Prediction of Web Pages
dc.type: Conference Object

Files

Original bundle
Name: 6112.pdf
Size: 1.2 MB
Format: Adobe Portable Document Format
Description: Full Text