dc.contributor.author: Agün, H.V.
dc.contributor.author: Uzun, Erdinç
dc.date.accessioned: 2023-05-06T17:19:37Z
dc.date.available: 2023-05-06T17:19:37Z
dc.date.issued: 2023
dc.identifier.issn: 1568-4946
dc.identifier.uri: https://doi.org/10.1016/j.asoc.2023.110030
dc.identifier.uri: https://hdl.handle.net/20.500.11776/11889
dc.description.abstract: Traditional approaches for extracting relevant images automatically from web pages are error-prone and time-consuming. To improve this task, operations such as preparing a larger dataset and finding new features are used in web data extraction approaches. However, these operations are difficult and laborious. In this study, we propose a fully automated approach based on the alignment of regular expressions to automatically extract the relevant images from web pages. Automatically constructed regular expressions have been applied to a classification task for the first time. In this respect, a multi-stage inference approach is developed for generating regular expressions from the attribute values of relevant and irrelevant image elements in web pages. The proposed approach reduces the complexity of the alignment of two regular expressions by applying a constraint on a version of the Levenshtein distance algorithm. The classification accuracy of regular expression approaches is compared with the naive Bayes, logistic regression, J48, and multilayer perceptron classifiers on a balanced relevant image retrieval dataset consisting of 360 image element samples for 10 shopping websites. According to the cross-validation results, the regular expression inference-based classification achieved a 0.98 F-measure with only 5 frequent n-grams, and it outperformed other classifiers on the same set of features. The classification efficiency of the proposed approach is measured at 0.108 ms, which is very competitive with other classifiers. © 2023 [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Elsevier Ltd [en_US]
dc.identifier.doi: 10.1016/j.asoc.2023.110030
dc.rights: info:eu-repo/semantics/closedAccess [en_US]
dc.subject: Feature extraction [en_US]
dc.subject: Regular expression inference [en_US]
dc.subject: Text classification [en_US]
dc.subject: Web image extraction [en_US]
dc.subject: Classification (of information) [en_US]
dc.subject: Extraction [en_US]
dc.subject: Image classification [en_US]
dc.subject: Pattern matching [en_US]
dc.subject: Support vector machines [en_US]
dc.subject: Text processing [en_US]
dc.subject: Features extraction [en_US]
dc.subject: Image elements [en_US]
dc.subject: Image extraction [en_US]
dc.subject: Regular expression inference [en_US]
dc.subject: Regular expressions [en_US]
dc.subject: Text classification [en_US]
dc.subject: Traditional approaches [en_US]
dc.subject: Web image extraction [en_US]
dc.subject: Web images [en_US]
dc.subject: Web-page [en_US]
dc.subject: Websites [en_US]
dc.title: An efficient regular expression inference approach for relevant image extraction [en_US]
dc.type: article [en_US]
dc.relation.ispartof: Applied Soft Computing [en_US]
dc.department: Faculties, Çorlu Faculty of Engineering, Department of Computer Engineering [en_US]
dc.identifier.volume: 135 [en_US]
dc.institutionauthor: Uzun, Erdinç
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member [en_US]
dc.authorscopusid: 55293388500
dc.authorscopusid: 54783608800
dc.identifier.scopus: 2-s2.0-85149807859 [en_US]

