An efficient regular expression inference approach for relevant image extraction

Agün, H.V.; Uzun, Erdinç

dc.authorscopusid	55293388500
dc.authorscopusid	54783608800
dc.contributor.author	Agün, H.V.
dc.contributor.author	Uzun, Erdinç
dc.date.accessioned	2023-05-06T17:19:37Z
dc.date.available	2023-05-06T17:19:37Z
dc.date.issued	2023
dc.department	Faculties, Çorlu Faculty of Engineering, Department of Computer Engineering
dc.description.abstract	Traditional approaches for automatically extracting relevant images from web pages are error-prone and time-consuming. Web data extraction approaches try to improve this task with operations such as preparing a larger dataset and finding new features, but these operations are difficult and laborious. In this study, we propose a fully automated approach based on the alignment of regular expressions to extract the relevant images from web pages. Automatically constructed regular expressions have been applied to a classification task for the first time. In this respect, a multi-stage inference approach is developed for generating regular expressions from the attribute values of relevant and irrelevant image elements in web pages. The proposed approach reduces the complexity of aligning two regular expressions by applying a constraint on a version of the Levenshtein distance algorithm. The classification accuracy of the regular expression approaches is compared with that of the naive Bayes, logistic regression, J48, and multilayer perceptron classifiers on a balanced relevant image retrieval dataset consisting of 360 image element samples from 10 shopping websites. According to the cross-validation results, the regular expression inference-based classification achieved an F-measure of 0.98 with only 5 frequent n-grams and outperformed the other classifiers on the same set of features. The classification efficiency of the proposed approach, measured at 0.108 ms, is very competitive with that of the other classifiers. © 2023
dc.identifier.doi	10.1016/j.asoc.2023.110030
dc.identifier.issn	1568-4946
dc.identifier.scopus	2-s2.0-85149807859
dc.identifier.scopusquality	Q1
dc.identifier.uri	https://doi.org/10.1016/j.asoc.2023.110030
dc.identifier.uri	https://hdl.handle.net/20.500.11776/11889
dc.identifier.volume	135
dc.identifier.wos	WOS:000967879100001
dc.identifier.wosquality	Q1
dc.indekslendigikaynak	Web of Science
dc.indekslendigikaynak	Scopus
dc.institutionauthor	Uzun, Erdinç
dc.language.iso	en
dc.publisher	Elsevier Ltd
dc.relation.ispartof	Applied Soft Computing
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	Feature extraction
dc.subject	Regular expression inference
dc.subject	Text classification
dc.subject	Web image extraction
dc.subject	Classification (of information)
dc.subject	Extraction
dc.subject	Image classification
dc.subject	Pattern matching
dc.subject	Support vector machines
dc.subject	Text processing
dc.subject	Features extraction
dc.subject	Image elements
dc.subject	Image extraction
dc.subject	Regular expressions
dc.subject	Traditional approaches
dc.subject	Web images
dc.subject	Web-page
dc.subject	Websites
dc.title	An efficient regular expression inference approach for relevant image extraction
dc.type	Article
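
The abstract mentions reducing alignment cost by applying a constraint on a version of the Levenshtein distance algorithm, but this record does not say what the constraint is. As a rough illustration only, the sketch below assumes one common kind of constraint, a diagonal band that limits how far the two positions may drift apart. The function name banded_levenshtein, the band parameter, and the sample attribute values are all hypothetical and are not taken from the paper.

```python
from typing import Optional


def banded_levenshtein(a: str, b: str, band: int) -> Optional[int]:
    """Edit distance between a and b, filling only cells with |i - j| <= band.

    Assumption: the band is an illustrative constraint, not the paper's.
    Returns None when the true distance is larger than the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None  # the final cell lies outside the band

    inf = n + m + 1  # larger than any possible edit distance
    # Row 0: turning the empty prefix of a into b[:j] costs j insertions.
    prev = [j if j <= band else inf for j in range(m + 1)]

    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo = max(0, i - band)
        hi = min(m, i + band)
        if lo == 0:
            cur[0] = i  # deleting the first i characters of a
        for j in range(max(1, lo), hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = cur

    distance = prev[m]
    return distance if distance <= band else None


# Hypothetical attribute values of two image elements.
print(banded_levenshtein("product/img_001.jpg", "product/img_002.jpg", 2))
print(banded_levenshtein("kitten", "sitting", 2))
```

With a band of width k, each row touches at most 2k + 1 cells, so the work drops from O(nm) to roughly O(nk). The result is exact whenever the true distance is at most k, because an optimal alignment of that cost never leaves the band.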

Files

Original bundle

Name: 11889.pdf
Size: 805 KB
Format: Adobe Portable Document Format
Description: Full Text

Collections

Scopus Indexed Publications Collection
WoS Indexed Publications Collection
Çorlu Faculty of Engineering Collection