Performance Evaluation of Classification Methods in Layout Prediction of Web Pages
dc.authorid | 0000-0003-4351-2244 | |
dc.authorid | 0000-0002-3971-2676 | |
dc.authorscopusid | 57194265151 | |
dc.authorscopusid | 54783608800 | |
dc.authorwosid | Uzun, Erdinç/AAG-5529-2019 | |
dc.authorwosid | OZHAN, Erkan/N-8743-2016 | |
dc.contributor.author | Özhan, Erkan | |
dc.contributor.author | Uzun, Erdinç | |
dc.date.accessioned | 2022-05-11T14:15:54Z | |
dc.date.available | 2022-05-11T14:15:54Z | |
dc.date.issued | 2018 | |
dc.department | Faculties, Çorlu Faculty of Engineering, Department of Computer Engineering | |
dc.description | International Conference on Artificial Intelligence and Data Processing (IDAP) -- SEP 28-30, 2018 -- Inonu Univ, Malatya, TURKEY | |
dc.description.abstract | The Web is an invaluable source of data, and this data is contained in the HTML layout elements of web pages. Extracting it automatically is therefore a crucial problem. In this study, we use a dataset annotated with seven layout types: main content, headline, summary, other necessary layouts, menu, link, and other unnecessary layouts. We compute 49 features from these layouts and then compare several classification methods to evaluate their performance in layout prediction. The experiments show that the Random Forest classifier achieves a high accuracy of 98.46%. With this classifier, the link layout is predicted best of all layouts (an F-measure of approximately 0.988), while the summary layout is predicted worst (an F-measure of approximately 0.882). | |
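The abstract reports two kinds of score: overall accuracy and a per-layout F-measure. As a hedged illustration only (not the authors' evaluation code), the sketch below computes both from paired lists of true and predicted layout labels, scoring each layout one-vs-rest. The short label keys such as `main_content` and the toy label lists are hypothetical; the paper's 49 features and its Random Forest model are not reproduced here.

```python
from collections import Counter

# Hypothetical short keys for the seven layout types named in the abstract.
LAYOUTS = ["main_content", "headline", "summary", "other_necessary",
           "menu", "link", "other_unnecessary"]


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted layout equals the true layout."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_f_measure(y_true, y_pred, labels):
    """Return {label: F-measure}, treating each layout one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but it was not p
            fn[t] += 1  # missed the true label t
    scores = {}
    for label in labels:
        prec_den = tp[label] + fp[label]
        rec_den = tp[label] + fn[label]
        precision = tp[label] / prec_den if prec_den else 0.0
        recall = tp[label] / rec_den if rec_den else 0.0
        total = precision + recall
        # F-measure is the harmonic mean of precision and recall.
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = ["link", "link", "summary", "menu", "summary", "main_content"]
    y_pred = ["link", "link", "main_content", "menu", "summary", "main_content"]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    for layout, f in per_class_f_measure(y_true, y_pred, LAYOUTS).items():
        print(f"{layout:>18}: F = {f:.3f}")
```

Per-class scores matter on an imbalanced dataset because overall accuracy can stay high while a less frequent layout, such as the summary layout in the toy data above, scores a noticeably lower F-measure.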
dc.description.sponsorship | Inonu Univ, Comp Sci Dept, IEEE Turkey Sect, Anatolian Sci | |
dc.description.sponsorship | Namik Kemal University Research Fund; Namik Kemal University | |
dc.description.sponsorship | The authors acknowledge the support received from the Namik Kemal University Research Fund. | |
dc.identifier.isbn | 978-1-5386-6878-8 | |
dc.identifier.scopus | 2-s2.0-85062506995 | |
dc.identifier.uri | https://hdl.handle.net/20.500.11776/6112 | |
dc.identifier.wos | WOS:000458717400170 | |
dc.identifier.wosquality | N/A | |
dc.indekslendigikaynak | Web of Science | |
dc.indekslendigikaynak | Scopus | |
dc.institutionauthor | Özhan, Erkan | |
dc.institutionauthor | Uzun, Erdinç | |
dc.language.iso | en | |
dc.publisher | IEEE | |
dc.relation.ispartof | 2018 International Conference on Artificial Intelligence and Data Processing (IDAP) | |
dc.relation.publicationcategory | Conference Item - International - Institutional Faculty Member | en_US |
dc.rights | info:eu-repo/semantics/closedAccess | |
dc.subject | web data extraction | |
dc.subject | classification methods | |
dc.subject | layout detection | |
dc.subject | imbalanced dataset | |
dc.title | Performance Evaluation of Classification Methods in Layout Prediction of Web Pages | |
dc.type | Conference Object |
Files
Original bundle
- Name: 6112.pdf
- Size: 1.2 MB
- Format: Adobe Portable Document Format
- Description: Full Text