A new word-based compression model allowing compressed pattern matching

dc.authorid: 0000-0002-1477-3093
dc.authorscopusid: 56020743600
dc.authorscopusid: 36130183800
dc.authorscopusid: 36130374500
dc.authorwosid: Mesut, Altan/AAE-8734-2019
dc.authorwosid: Buluş, Halil Nusret/ABA-8815-2020
dc.contributor.author: Buluş, Halil Nusret
dc.contributor.author: Carus, Aydın
dc.contributor.author: Mesut, Altan
dc.date.accessioned: 2022-05-11T14:15:50Z
dc.date.available: 2022-05-11T14:15:50Z
dc.date.issued: 2017
dc.department: Faculties, Çorlu Faculty of Engineering, Department of Computer Engineering
dc.description.abstract: This study introduces a new semistatic data compression model that has a fast coding process and allows compressed pattern matching (CPM). The proposed model is named the tagged word-based compression algorithm (TWBCA) because both its coding and its compressed matching are word based. The model has two phases. In the first phase, a dictionary is constructed by adding phrases while respecting word boundaries. In the second phase, compression is done by using the codewords of the phrases in this dictionary. The first byte of a codeword determines whether the word is compressed or not. Because of this rule, the CPM process can be conducted word by word. In addition, the proposed method makes it possible to search for a group of consecutively compressed words. Any existing pattern matching algorithm can be used as a black box for compressed pattern matching. The CPM process always takes less time than the same process on texts coded by the Gzip tool. For longer patterns, compressed pattern matching is slower on texts coded by TWBCA than on texts coded by compress and end-tagged dense code (ETDC). However, searching for shorter patterns takes less time on texts coded by TWBCA than on texts compressed with compress. The compression ratio of TWBCA is better than that of ETDC only on a file written in Turkish. The compression performance of TWBCA is stable and varies by no more than 6% across different text files.
dc.identifier.doi: 10.3906/elk-1601-92
dc.identifier.endpage: 3622
dc.identifier.issn: 1300-0632
dc.identifier.issn: 1303-6203
dc.identifier.issue: 5 (en_US)
dc.identifier.scopus: 2-s2.0-85053832283
dc.identifier.scopusquality: Q3
dc.identifier.startpage: 3607
dc.identifier.uri: https://doi.org/10.3906/elk-1601-92
dc.identifier.uri: https://hdl.handle.net/20.500.11776/6088
dc.identifier.volume: 25
dc.identifier.wos: WOS:000412571400010
dc.identifier.wosquality: Q4
dc.indekslendigikaynak: Web of Science
dc.indekslendigikaynak: Scopus
dc.institutionauthor: Buluş, Halil Nusret
dc.institutionauthor: Carus, Aydın
dc.institutionauthor: Mesut, Altan
dc.language.iso: en
dc.publisher: Tubitak Scientific & Technical Research Council Turkey
dc.relation.ispartof: Turkish Journal of Electrical Engineering and Computer Sciences
dc.relation.publicationcategory: Article - International Refereed Journal - Institutional Faculty Member (en_US)
dc.rights: info:eu-repo/semantics/openAccess
dc.subject: Compression
dc.subject: pattern matching
dc.subject: compressed pattern matching
dc.subject: semistatic model
dc.subject: Algorithm
dc.title: A new word-based compression model allowing compressed pattern matching
dc.type: Article
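
The abstract describes a two-phase, word-based scheme in which the first byte of a codeword tells whether a word is compressed, and in which runs of consecutively compressed words can be searched with any ordinary matcher. The Python sketch below is a toy illustration of that general idea only. It is not the TWBCA codeword layout from the paper, which this record does not specify. Everything in it is an assumption made for illustration: the one-byte codewords, the `TAG` high-bit convention, the length-prefixed literals, the 128-entry dictionary limit, the restriction to ASCII literals, and the function names.

```python
from collections import Counter

TAG = 0x80  # assumed convention: high bit set in the first byte marks a dictionary codeword


def build_dictionary(text, max_entries=128):
    """Phase 1 (semistatic): collect the most frequent space-separated words."""
    if not 0 < max_entries <= 128:
        raise ValueError("a one-byte codeword can index at most 128 entries")
    counts = Counter(text.split(" "))
    return [word for word, _ in counts.most_common(max_entries)]


def compress(text, dictionary):
    """Phase 2: emit a tagged codeword per dictionary word, a literal otherwise."""
    index = {word: i for i, word in enumerate(dictionary)}
    out = bytearray()
    for word in text.split(" "):
        if word in index:
            out.append(TAG | index[word])   # compressed word: a single tagged byte
        else:
            raw = word.encode("ascii")      # toy restriction: literals must be ASCII
            if len(raw) >= TAG:
                raise ValueError("literal too long for a one-byte length prefix")
            out.append(len(raw))            # untagged first byte: literal length
            out += raw
    return bytes(out)


def decompress(data, dictionary):
    """Invert compress() by inspecting the first byte of every token."""
    words = []
    i = 0
    while i < len(data):
        first = data[i]
        if first & TAG:
            words.append(dictionary[first & 0x7F])
            i += 1
        else:
            words.append(data[i + 1:i + 1 + first].decode("ascii"))
            i += 1 + first
    return " ".join(words)


def find_phrase(data, phrase, dictionary):
    """Search for a run of consecutively compressed words.

    Raises KeyError if any word of the phrase is not in the dictionary.
    Tagged bytes (>= 0x80) never occur inside ASCII literals or length
    prefixes, so a plain byte search (bytes.find, used here as the
    black-box matcher) cannot match in the middle of a literal.
    Returns the byte offset in the compressed data, or -1.
    """
    index = {word: i for i, word in enumerate(dictionary)}
    pattern = bytes(TAG | index[word] for word in phrase.split(" "))
    return data.find(pattern)


text = "the cat and the dog and the bird"
dictionary = build_dictionary(text, max_entries=2)
packed = compress(text, dictionary)
restored = decompress(packed, dictionary)
offset = find_phrase(packed, "and the", dictionary)
```

In this toy layout the search runs directly on the compressed bytes and never decodes them, which is the property the abstract attributes to compressed pattern matching. The choice of `bytes.find` is arbitrary; any exact string matching routine could stand in for it.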

Files

Original bundle
Name: 6068.pdf
Size: 242.33 KB
Format: Adobe Portable Document Format
Description: Full Text