
Universität Bonn > Institut für Informatik > Abteilung V  
CSReports 1995  Copyright 1995 Universität Bonn, Institut für Informatik, Abt. V  
85132

The Fully Compressed String Matching for LempelZiv Encoding
Marek Karpinski, Wojciech Plandowski, Wojciech Rytter [Download PostScript] [Download PDF] The growing importance of massively stored information requires new approaches to efficient algorithms on texts represented in a compressed form. We consider here the {\em stringmatching} problem in the compressed setting. This problem has been already investigated in [2], [3], [1]. A rather theoretical type of compression was considered in [8]. In this paper we consider a practically important compression algorithm of Lempel and Ziv ({\em LZ algorithm}, in short). Denote by $LZ(w)$ the compressed version of a given string $w$ using the LZ algorithm. The {\em \bf Fully Compressed Matching Problem} is that of deciding if the pattern $P$ occurs in a text $T$, given only $LZ(P)$ and $LZ(T)$, without decompressing the pattern and the text. The first occurrence is reported, if there is any. Let $m$ and $n$ denote the sizes of $LZ(P)$ and $LZ(T)$, and $M$, $N$ be the sizes of uncompressed strings $P$ and $T$, respectively. In this paper we design the first polynomial time (with respect to $n$ and $m$) algorithm for the {\em Fully Compressed Matching Problem}. Note that in generall $N$, $M$ are exponential with respect to $n$ and $m$, and any algorithm which explicitely decompresses the pattern $P$ or the text $T$ would work in exponential time! In particular the algorithm given in [5] works in this situation in exponential time with respect to $m$ (in this algorithm the uncompressed pattern is a part of the input). The situations when both objects participating in the {\em stringmatching} are compressed (we deal with compressed patterns) are also practically important, for example in genetics and molecular biology (where uncompressed patterns are extremely long) or when we search for one compressed file in another compressed file. We introduce a new technique of succinct representations for long string periods and overlaps. 

Last Change:
08/23/04 at 07:44:40
English 
Universität Bonn > Institut für Informatik > Abteilung V 