Total records: 5, showing 1-5

A Novel Web Scraping Approach Using the Additional Information Obtained from Web Pages

Uzun, Erdinç (Institute of Electrical and Electronics Engineers Inc., 2020)

Web scraping is the process of extracting valuable and interesting text information from web pages. Most current studies targeting this task focus on automated web data extraction. In the extraction process, ...

A regular expression generator based on CSS selectors for efficient extraction from HTML pages

Uzun, Erdinç (Turkiye Klinikleri, 2020)

Cascading style sheets (CSS) selectors are patterns used to select HTML elements. They are often preferred in web data extraction because they are easy to prepare and have short expressions. In order to be able to extract ...

An effective and efficient web content extractor for optimizing the crawling process

Uzun, Erdinç; Güner, Edip Serdar; Kılıçaslan, Yılmaz; Yerlikaya, Tarık; Agun, Hayri Volkan (John Wiley and Sons Ltd, 2014)

Classical Web crawlers make use of only hyperlink information in the crawling process. However, focused crawlers are intended to download only Web pages that are relevant to a given topic by utilizing word information ...

Web content extraction by using decision tree learning

Uzun, Erdinç; Agun, Hayri Volkan; Yerlikaya, Tarık (2012)

With information extraction techniques, web pages can be used to generate datasets for various studies, such as natural language processing and data mining. However, nowadays the uninformative sections like advertisement, ...

An efficient regular expression inference approach for relevant image extraction

Agün, H.V.; Uzun, Erdinç (Elsevier Ltd, 2023)

Traditional approaches for extracting relevant images automatically from web pages are error-prone and time-consuming. To improve this task, operations such as preparing a larger dataset and finding new features are used ...