Abstract for HONS 08/01 - Computer Science and Software Engineering - University of Canterbury - New Zealand
HONS 08/01

Compressed-Domain Pattern Matching with the Burrows-Wheeler

Matt Powell
Department of Computer Science
University of Canterbury

Abstract


This report investigates two approaches for online pattern-matching in files compressed with the Burrows-Wheeler transform (Burrows & Wheeler 1994). The first is based on the Boyer-Moore pattern matching algorithm (Boyer & Moore 1977), and the second is based on binary search. The new methods use the special structure of the Burrows-Wheeler transform to achieve efficient, robust pattern matching algorithms that can be used on files that have been only partly decompressed. Experimental results show that both new methods perform considerably faster than a decompress-and-search approach for most applications, with binary search being faster than Boyer-Moore at the expense of increased memory usage. The binary search in particular is strongly related to efficient indexing strategies such as binary trees, and suggests a number of new applications of the Burrows-Wheeler transform in data storage and retrieval.

  • Phone: +64 3 369 2777
    Fax: +64 3 364 2569
    CSSEadministration@canterbury.ac.nz
  • Computer Science and Software Engineering
    University of Canterbury
    Private Bag 4800, Christchurch
    New Zealand
  • Follow us
    FacebookYoutubetwitterLinked In