dc.contributor.author: Uzun, Erdinç
dc.contributor.author: Agun, Hayri Volkan
dc.contributor.author: Yerlikaya, Tarık
dc.date.accessioned: 2022-05-11T14:15:47Z
dc.date.available: 2022-05-11T14:15:47Z
dc.date.issued: 2012
dc.identifier.isbn: 9781467300568
dc.identifier.uri: https://doi.org/10.1109/SIU.2012.6204476
dc.identifier.uri: https://hdl.handle.net/20.500.11776/6068
dc.description: 2012 20th Signal Processing and Communications Applications Conference, SIU 2012 -- 18 April 2012 through 20 April 2012 -- Fethiye, Mugla -- 90786 [en_US]
dc.description.abstract: Information extraction techniques allow web pages to serve as a source of datasets for studies in areas such as natural language processing and data mining. However, uninformative sections such as advertisements, menus, and links are increasingly common. Cleaning web pages of uninformative sections and extracting the informative content has therefore become an important problem. In this study, we present a decision tree learning approach over DOM-based features that aims to clean the uninformative sections and extract informative content in three classes: title, main content, and additional information. Unlike previous studies, the learning model for extracting the main content is constructed on DIV and TD tags. The proposed method achieved 95.58% accuracy in cleaning uninformative sections and extracting the informative content. For extraction of the main block in particular, an F-measure of 0.96 was obtained. © 2012 IEEE. [en_US]
dc.language.iso: tur [en_US]
dc.identifier.doi: 10.1109/SIU.2012.6204476
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Data sets [en_US]
dc.subject: Decision tree learning [en_US]
dc.subject: F-measure [en_US]
dc.subject: Information extraction techniques [en_US]
dc.subject: Learning models [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: Web content [en_US]
dc.subject: Computational linguistics [en_US]
dc.subject: Data mining [en_US]
dc.subject: Decision trees [en_US]
dc.subject: Natural language processing systems [en_US]
dc.subject: Signal processing [en_US]
dc.subject: Websites [en_US]
dc.subject: Information retrieval systems [en_US]
dc.title: Web content extraction by using decision tree learning [en_US]
dc.title.alternative: Karar ağacı öğrenmesini kullanarak web içerik çıkarımı [en_US]
dc.type: conferencePaper [en_US]
dc.relation.ispartof: 2012 20th Signal Processing and Communications Applications Conference, SIU 2012, Proceedings [en_US]
dc.department: Faculties, Çorlu Faculty of Engineering, Department of Computer Engineering [en_US]
dc.institutionauthor: Uzun, Erdinç
dc.relation.publicationcategory: Conference Item - International - Institutional Faculty Member [en_US]
dc.authorscopusid: 54783608800
dc.authorscopusid: 55293388500
dc.authorscopusid: 16232085100
dc.identifier.scopus: 2-s2.0-84863462457 [en_US]

