A novel algorithm for extracting the user reviews from web pages

dc.authorid: 0000-0003-4351-2244
dc.authorid: 0000-0003-4842-2635
dc.authorwosid: Tufekci, Pinar/ABA-5121-2020
dc.authorwosid: Uzun, Erdinç/AAG-5529-2019
dc.contributor.author: Uçar, Erdem
dc.contributor.author: Uzun, Erdinç
dc.contributor.author: Tufekçi, Pınar
dc.date.accessioned: 2022-05-11T14:15:50Z
dc.date.available: 2022-05-11T14:15:50Z
dc.date.issued: 2017
dc.department: Faculties, Çorlu Faculty of Engineering, Department of Computer Engineering
dc.description.abstract: Extracting user reviews from websites such as forums, blogs, newspapers, commerce and travel sites is crucial for text processing applications (e.g. sentiment analysis, trend detection/monitoring and recommendation systems) that need structured data. Traditional algorithms consist of three processes: Document Object Model (DOM) tree creation, extraction of features from this tree, and machine learning. However, these algorithms increase the time complexity of the extraction process. This study proposes a novel algorithm with two complementary stages. The first stage determines which HTML tags correspond to the review layout for a web domain, using the DOM tree, its features and decision tree learning. The second stage extracts the review layout from web pages in that web domain using the tags found in the first stage. The second stage is more time-efficient, being approximately 21 times faster than the first stage. Moreover, it achieves a relatively high accuracy of 96.67% in our experiments on review block extraction.
dc.identifier.doi: 10.1177/0165551516666446
dc.identifier.endpage: 712
dc.identifier.issn: 0165-5515
dc.identifier.issn: 1741-6485
dc.identifier.issue: 5 [en_US]
dc.identifier.startpage: 696
dc.identifier.uri: https://doi.org/10.1177/0165551516666446
dc.identifier.uri: https://hdl.handle.net/20.500.11776/6089
dc.identifier.volume: 43
dc.identifier.wos: WOS:000415348100008
dc.identifier.wosquality: Q2
dc.indekslendigikaynak: Web of Science
dc.institutionauthor: Uzun, Erdinç
dc.institutionauthor: Tufekçi, Pınar
dc.language.iso: en
dc.publisher: Sage Publications Ltd
dc.relation.ispartof: Journal of Information Science
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Efficient extraction
dc.subject: web data extraction
dc.subject: web user reviews
dc.title: A novel algorithm for extracting the user reviews from web pages
dc.type: Article
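
The second stage described in the abstract, extracting review blocks once the relevant tags for a web domain are known, can be illustrated with a minimal sketch. This is not the authors' implementation, which this record does not include. It assumes that the first stage has already produced a tag and class pair (here the hypothetical `div` / `review`), and it uses Python's standard `html.parser` to stream through the page instead of building a DOM tree. The names `ReviewBlockExtractor`, `extract_reviews` and `SAMPLE` are invented for illustration.

```python
from html.parser import HTMLParser


class ReviewBlockExtractor(HTMLParser):
    """Collect the text of every element matching a known (tag, class) pair.

    The pair is assumed to come from an earlier learning stage; it is a
    hypothetical input here, not something this sketch discovers.
    """

    # Void elements have no closing tag, so they must not change the depth.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # nesting depth inside the current review block
        self.parts = []     # text fragments of the current review block
        self.reviews = []   # finished review texts

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            # Already inside a review block: track nested elements.
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Block closed: join fragments and normalise whitespace.
            self.reviews.append(" ".join(" ".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_reviews(html, tag, css_class):
    """Return the text of each block matching the given tag and class."""
    parser = ReviewBlockExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.reviews


# Hypothetical page fragment; real review markup varies by web domain.
SAMPLE = """
<html><body>
<div class="nav">Home</div>
<div class="review"><p>Great phone.</p><p>Battery lasts long.</p></div>
<div class="review">Too <br> expensive.</div>
</body></html>
"""

for review in extract_reviews(SAMPLE, "div", "review"):
    print(review)
```

The sketch relies on well-formed nesting inside a review block. A production extractor would need to tolerate unclosed tags and would take its tag list from the learned model, not from a hard-coded pair.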

Files

Original bundle
Name: 6089.pdf
Size: 2.56 MB
Format: Adobe Portable Document Format
Description: Full Text