Re-Pair Compression of Inverted Lists (2009)
Claude, Francisco, Farina, Antonio, Navarro, Gonzalo
Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions...
Kimmo Fredriksson, Gonzalo Navarro
Communicated by Editor’s name Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique...
Improving the Space Cost of k-NN Search in Metric Spaces by Using Distance Estimators (2009)
Benjamin Bustos, Gonzalo Navarro
Similarity searching in metric spaces has a vast number of applications in several fields like multimedia databases, text retrieval, computational biology, and pattern recognition. In this context,...
Bit-Parallel Computation of Local Similarity Score Matrices with Unitary Weights (2009)
Abstract. Local similarity computation between two sequences permits detecting all the relevant alignments present between subsequences thereof. A well-known dynamic programming algorithm works in...
Heikki Hyyr Ö, Gonzalo Navarro
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage,...
Kimmo Fredriksson, Gonzalo Navarro
Communicated by Editor’s name Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique...
Compressed representations of permutations, and applications (2009)
Jérémy Barbay, Gonzalo Navarro
We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse...
Lightweight natural language text compression (2009)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
Variants of Huffman codes where words are taken as the source symbols are currently the most attractive choices to compress natural language text databases. In particular, Tagged Huffman Code by...
Improved dynamic rank-select entropy-bound structures (2009)
Rodrigo González, Gonzalo Navarro
Abstract. Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures to manage text collections, structured text, binary...
A Fast and Compact Web Graph Representation ⋆ (2009)
Francisco Claude, Gonzalo Navarro
Abstract. Compressed graphs representation has become an attractive research topic because of its applications in the manipulation of huge Web graphs in main memory. By far the best current result is...
Sequential and Indexed Two-Dimensional Combinatorial Template Matching Allowing Rotations (2009)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
We present new and faster algorithms to search for a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. This has applications such as image databases and...
Locally Compressed Suffix Arrays (2009)
Rodrigo González, Gonzalo Navarro
Compressed text (self-)indexes have matured up to a point where they can replace a text by a data structure that requires less space and, in addition to giving access to arbitrary text passages,...
On the Least Cost For Proximity Searching in Metric Spaces ⋆ (2009)
Karina Figueroa, Edgar Chávez, Gonzalo Navarro, Rodrigo Paredes
Abstract. Proximity searching consists in retrieving from a database those elements that are similar to a query. As the distance is usually expensive to compute, the goal is to use as few distance...
Abstract. We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst case time O(mn / log k s), where s is the amount of...
Combining Structural and Textual Contexts for Compressing Semistructured Databases ∗ (2009)
Joaquín Adiego, Gonzalo Navarro
We describe a compression technique for semistructured documents, called SCMPPM, which combines the Prediction by Partial Matching technique with Structural Contexts Model (SCM) technique. SCMPPM...
A Simple Alphabet-Independent FM-Index (2009)
Szymon Grabowski, Gonzalo Navarro, Rafal Przywarski, Alejandro Salinger, Veli Mäkinen
We design a succinct full-text index based on the idea of Huffman-compressing the text and then applying the Burrows-Wheeler transform over it. The resulting structure can be searched as an FM-index,...
Edgar Chávez, Escuela Ciencias, Físico Matemáticas, Gonzalo Navarro
The advent of the digital age creates an increasing interest to search for information in large unstructured repositories containing textual and multimedia data. Retrieval from those repositories...
SESTL: An Event-Oriented Spatio-Temporal Access Method ∗ (2009)
Gilberto A. Gutiérrez, Gonzalo Navarro, Andrea Rodríguez, Santiago Chile
This work presents a new access method (SESTL) that handles discrete change events over objects ’ spatial attributes. The access method combines snapshots and events in log structures associated...
O(mn log σ) Time Transposition Invariant LCS Computation (2009)
Szymon Grabowski, Gonzalo Navarro
Abstract. Given strings A and B of lengths m and n over a finite alphabet Σ ⊂ Z of size O(σ), the length of the longest common transposition invariant subsequence is LCTS(A, B) = maxt∈Z{LCS(A...
Compressed Text Indexes: From Theory to Practice (2009)
Paolo Ferragina, Rodrigo González, Gonzalo Navarro, Rossano Venturini
A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This represents a significant advancement over the (full-)text indexing techniques of...
Approximate String Matching with Lempel-Ziv Compressed Indexes (2009)
Gonzalo Navarro, Arlindo L. Oliveira
Abstract. A compressed full-text self-index for a text T is a data structure requiring reduced space and able of searching for patterns P in T. Furthermore, the structure can reproduce any substring...
Dynamic entropy-compressed sequences and full-text indexes (2009)
We give new solutions to the Searchable Partial Sums with Indels problem. Given a sequence of n k-bit numbers, we present a structure taking kn+o(kn) bits of space, able of performing operations sum,...
Fast and Compact Web Graph Representations ∗ (2009)
Francisco Claude, Gonzalo Navarro
Compressed graph representations, in particular for Web graphs, have become an attractive research topic because of their applications in the manipulation of huge graphs in main memory. By far the...
The LZ-index is a theoretical proposal of a lightweight data structure for text indexing, based on the Ziv-Lempel trie. If a text of u characters over an alphabet of size σ is compressible to n...
A Compressed Text Index on Secondary Memory (2009)
Rodrigo González, Gonzalo Navarro
Abstract. We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides very good I/O times for searching,...
Quickheaps: Simple, Efficient, and Cache-Oblivious ⋆ (2009)
Gonzalo Navarro, Rodrigo Paredes
Abstract. We present the Quickheap, a simple and efficient data structure for implementing priority queues in main and secondary memory. Quickheaps are comparable with classical binary heaps in...
Approximate String Matching with Ziv-Lempel Compressed Indexes (2009)
Gonzalo Navarro, Arlindo L. Oliveira
Abstract. A compressed full-text self-index for a text T is a data structure requiring reduced space and able of searching for patterns P in T. Furthermore, the structure can reproduce any substring...
Stronger Lempel-Ziv Based Compressed Text Indexing ⋆ (2009)
Diego Arroyuelo, Gonzalo Navarro, Kunihiko Sadakane
Abstract. Given a text T[1..u] over an alphabet of size σ, the full-text search problem consists in finding the occ occurrences of a given pattern P[1..m] in T. In indexed text searching we build an...
A text is any sequence of symbols (or characters) drawn from an alphabet. A large portion of the information available worldwide in electronic form is actually in text form (other popular forms are...
Re-Pair Achieves High-Order Entropy ∗ (2009)
Re-Pair is a dictionary-based compression method invented in 1999 by Larsson and Moffat. Although its practical performance has been established through experiments, the method has resisted all...
Nordic Journal of Computing SUCCINCT SUFFIX ARRAYS BASED ON RUN-LENGTH ENCODING ∗ (2009)
Abstract. A succinct full-text self-index is a data structure built on a text T = t1t2...tn, which takes little space (ideally close to that of the compressed text), permits efficient search for the...
A Fast and Compact Web Graph Representation ⋆ (2009)
Francisco Claude, Gonzalo Navarro
Abstract. Compressed graphs representation has become an attractive research topic because of its applications to the manipulation of huge Web graphs in main memory. By far the best current result is...
Dynamic Dictionaries in Constant Worst-Case Time (2009)
We introduce a technique to maintain a set of n elements from a universe of size u with membership and indel operations, so that elements are associated r-bit satellite data. We achieve constant...
Average complexity of exact and approximate multiple string matching (2009)
Gonzalo Navarro, Kimmo Fredriksson
We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size σ cannot be less than Ω(n log...
Diego Arroyuelo, Gonzalo Navarro
Given a text T[1..u] over an alphabet of size σ, the full-text search problem consists in finding the occ occurrences of a given pattern P[1..m] in T. The current trend in indexed text searching is...
Rank/Select on Dynamic Compressed Sequences and Applications ⋆ (2009)
Rodrigo González, Gonzalo Navarro
Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures managing text collections, structured text, binary relations,...
Heikki Hyyr Ö, Gonzalo Navarro
Communicated by Editor’s name Local similarity computation between two sequences permits detecting all the relevant alignments present between subsequences thereof. A well-known dynamic programming...
Indexed Hierarchical Approximate String Matching (2009)
Gonzalo Navarro, Arlindo L. Oliveira
Abstract. We present a new search procedure for approximate string matching over suffix trees. We show that hierarchical verification, which is a well-established technique for on-line searching, can...
First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index (2009)
Szymon Grabowski, Veli Mäkinen, Gonzalo Navarro
Abstract. We design a succinct full-text index based on the idea of Huffman-compressing the text and then applying the Burrows-Wheeler transform over it. The resulting structure can be searched as an...
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough ℓ-grams from text windows so as to prove that no occurrence can contain the part of the...
A Compressed Text Index on Secondary Memory ∗ (2009)
Rodrigo González, Gonzalo Navarro
We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides good I/O times for searching, which in...
Using Structural Contexts to Compress Semistructured Text Collections ∗† (2009)
Joaquín Adiego, Gonzalo Navarro, Pablo Fuente
We describe a compression model for semistructured documents, called Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the structure of the text....
Re-Pair Compression of Inverted Lists (2009)
Francisco Claude, Antonio Fariña, Gonzalo Navarro
Compression of inverted lists with methods that support fast intersection operations is an active research topic. Most compression schemes rely on encoding differences between consecutive positions...
Self-Indexing Natural Language ⋆ (2009)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, Angeles S. Places, Eduardo Rodríguez
Abstract. Self-indexing is a concept developed for indexing arbitrary strings. It has been enormously successful to reduce the size of the large indexes typically used on strings, namely suffix trees...
Simple, Fast, and Efficient Natural Language Adaptive Compression ⋆ (2009)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
Abstract. One of the most successful natural language compression methods is word-based Huffman. However, such a two-pass semi-static compressor is not well suited to many interesting real-time...
A Compressed Text Index on Secondary Memory (2009)
Rodrigo González, Gonzalo Navarro
Abstract. We introduce a practical disk-based compressed text index that, when the text is compressible, takes little more than the plain text size (and replaces it). It provides very good I/O times...
Storage and Retrieval of Individual Genomes (2009)
Veli Mäkinen, Gonzalo Navarro, Jouni Sirén, Niko Välimäki
Abstract. A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such...
First Huffman, then Burrows-Wheeler: A Simple Alphabet-Independent FM-Index (2009)
Szymon Grabowski, Veli Mäkinen, Gonzalo Navarro
Main Results. The basic string matching problem is to determine the occurrences of a short pattern P = p1p2...pm in a large text T = t1t2...tn, over an alphabet of size σ. Indexes are structures...
A.: Fully-compressed suffix trees (2009)
Gonzalo Navarro, Arlindo L. Oliveira
Abstract. Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics and information retrieval. Classical representations of...
Compressed Text Indexes with Fast Locate (2009)
Rodrigo González, Gonzalo Navarro
Abstract. Compressed text (self-)indexes have matured up to a point where they can replace a text by a data structure that requires less space and, in addition to giving access to arbitrary text...
Practical Construction of k-Nearest Neighbor Graphs in Metric Spaces ⋆ (2009)
Rodrigo Paredes, Edgar Chávez, Karina Figueroa, Gonzalo Navarro
Abstract. Let U be a set of elements and d a distance function defined among them. Let NNk(u) be the k elements in U−{u} having the smallest distance to u. The k-nearest neighbor graph (knng) is a...
Improved dynamic rank-select entropy-bound structures (2009)
Rodrigo González, Gonzalo Navarro
Abstract. Operations rank and select over a sequence of symbols have many applications to the design of succinct and compressed data structures to manage text collections, structured text, binary...
Fast and Compact Prefix Codes (2009)
Gagie, Travis, Navarro, Gonzalo, Nekrich, Yakov
It is well-known that, given a probability distribution over $n$ characters, in the worst case it takes (\Theta (n \log n)) bits to store a prefix code with minimum expected codeword length. However,...
Fully-Functional Static and Dynamic Succinct Trees (2009)
Sadakane, Kunihiko, Navarro, Gonzalo
We propose new succinct representations of ordinal trees, which have been studied extensively. It is known that any $n$-node static tree can be represented in $2n + \order(n)$ bits and various...
Compressed Representations of Permutations, and Applications (2009)
Barbay, Jérémy, Navarro, Gonzalo
We explore various techniques to compress a permutation $\pi$ over n integers, taking advantage of ordered subsequences in $\pi$, while supporting its application $\pi$(i) and the application of its...
Optimal Binary Search Trees with Costs Depending on the Access Paths (2009)
Jayme L Szwarcfiter, Gonzalo Navarro, Jo'isa S. Oliveira, Ricardo Baeza-yates, Walter Cunto
We describe algorithms for constructing optimal binary search trees, in which the access cost to each key depends on the k preceeding keys which were reached in the path to the desired key. Two kinds...
Compressed Representations of Permutations, and Applications (2009)
Barbay, Jérémy, Navarro, Gonzalo
We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of...
Compressed Representations of Permutations, and Applications (2009)
Barbay, Jérémy, Navarro, Gonzalo
We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of...
Compressed Representations of Permutations, and Applications (2009)
Barbay, Jeremy, Navarro, Gonzalo
We explore various techniques to compress a permutation $pi$ over $n$ integers, taking advantage of ordered subsequences in $pi$, while supporting its application $pi(i)$ and the application of its...
Compressed representations of permutations, and applications (2009)
Jérémy Barbay, Gonzalo Navarro
by the Departamento de Ciencias de la Computación (DCC), Universidad de Chile, Santiago, Chile.
Approximate String Matching with Compressed Indexes (2009)
Gonzalo Navarro, Arlindo L. Oliveira, Pedro Morales
A compressed full-text self-index for a text T is a data structure requiring reduced space and able to search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T....
Parallel Generation of Inverted Lists on a Network of Workstations (2008)
Berthier Ribeiro-neto, Jo~ao Paulo Kitajima, Gonzalo Navarro, Nivio Ziviani, Gonzalo Navarro
on a Network of Workstations Berthier A. Ribeiro-Neto y z Jo~ao Paulo Kitajima \Lambda x
Average-optimal single and multiple approximate string matching (2008)
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough `-grams from text windows so as to prove that no occurrence can contain the part of the...
Gonzalo Navarro, Kimmo Fredriksson
We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size σ cannot be less than Ω(n log...
Compressed Text Indexes:From Theory to Practice! (2008)
Paolo Ferragina, Rodrigo González, Gonzalo Navarro, Rossano Venturini
A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the...
An(other) Entropy-Bounded Compressed Suffix Tree (2008)
Fischer, Johannes, Mäkinen, Veli, Navarro, Gonzalo
GN was partially funded by Millennium Institute for Cell Dynamics and Biotechnology, Grant ICM P05-001-F, Mideplan, Chile. VM was funded by the Academy of Finland under grant 119815.
Run-Length Compressed Indexes for Repetitive Sequence Collections (2008)
Mäkinen, Veli, Navarro, Gonzalo, Sirén, Jouni, Välimäki, Niko
Computing Reviews (1998) Categories and Subject Descriptors: E.4 Coding and Information Theory — data compaction and compression F.2.2 Analysis of Algorithms and Problem Complexity: Nonnumerical...
Simple, Fast, and Efficient Natural Language Adaptive Compression ⋆ (2008)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
Abstract. One of the most successful natural language compression methods is word-based Huffman. However, such a two-pass semi-static compressor is not well suited to many interesting real-time...
Approximate String Matching with Lempel-Ziv Compressed Indexes (2008)
Gonzalo Navarro, Arlindo L. Oliveira
Abstract. A compressed full-text self-index for a text T is a data structure requiring reduced space and able of searching for patterns P in T. Furthermore, the structure can reproduce any substring...
Practical Construction of k-Nearest Neighbor Graphs in Metric Spaces (2008)
Rodrigo Paredes, Edgar Chávez, Karina Figueroa, Gonzalo Navarro
Abstract. Let U be a set of elements and d a distance function defined among them. Let NNk(u) be the k elements in U − {u} which have the smallest distance towards u. The k-nearest neighbor graph...
Average-optimal single and multiple approximate string matching (2008)
Kimmo Fredriksson, Gonzalo Navarro
Abstract. We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough ℓ-grams from text windows so as to prove that no occurrence can contain the...
Edgar Chávez, Gonzalo Navarro, Ricardo Baeza-yates, José Luis Marroquín
The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from...
Improving semistatic compression via pair-based coding (2008)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
Abstract. In the last years, new semistatic word-based byte-oriented compressors, such as Plain and Tagged Huffman and the Dense Codes, have been used to improve the efficiency of text retrieval...
Kimmo Fredriksson, Veli M Äkinen, Gonzalo Navarro
Communicated by Editor’s name Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique...
Departamento de Computación New Compression Codes for Text Databases (2008)
Tese Doutoral, Directores Nieves, R. Brisaboa, Gonzalo Navarro, A Coruña, ...
Aos meus pais e irmáns Aos meus sobriños Acknowledgements I consider that it is fair to thank my PhD supervisors: Nieves and Gonzalo. Without any kind of doubt, the research I have been doing...
Fully-Compressed Suffix Trees (2008)
Gonzalo Navarro, Arlindo L. Oliveira
Abstract. Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics and information retrieval. Classical representations of...
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage,...
The Pre-history and Future of the Block-Sorting Compression Algorithm (2008)
Mike Burrows, David Wheeler, Lee Butterman, Nasir Memon, Paolo Ferragina, Raffaele Giancarlo, ...
2
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough ℓ-grams from text windows so as to prove that no occurrence can contain the part of the...
AFaster Algorithm for Approximate String Matching Extended Abstract (2008)
Ricardo Baeza-yates, Gonzalo Navarro
We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a non-deterministic nite automaton built from the pattern and using the text as input....
Compressed Text Indexes:From Theory to Practice! (2008)
Paolo Ferragina, Rodrigo González, Gonzalo Navarro, Rossano Venturini
A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the...
Universidade Federal de Minas Gerais (2008)
Marco Ant^onio Cristo, Elaine Spinola Silva, Moura Barbosa, Jo~ao Paulo Kitajima, Berthier Ribeiro, Gonzalo Navarro, ...
Abstract. This paper presents experiments performed with an implementation of a quicksort-based parallel indexing algorithm. Besides the expected reduction in execution time, it was observed that the...
Fully Dynamic and Memory-Adaptative Spatial Approximation Trees (2008)
Diego Arroyuelo, Gonzalo Navarro, Nora Reyes, Depto De Informática
Hybrid dynamic spatial approximation trees are recently proposed data structures for searching in metric spaces, based on combining the concepts of spatial approximation and pivot based algorithms....
On Self-indexing Images - Image Compression with Added Value (2008)
Mäkinen, Veli, Navarro, Gonzalo
First author funded by the Academy of Finland under grant 119815. Second author funded by Yahoo! Research grant “Compact Data Structures”.
Lucian Ilie, Gonzalo Navarro, Sheng Yu
Abstract. We give faster algorithms for two methods of reducing the number of states in nondeterministic finite automata. The first uses equivalences and the second uses preorders. We develop...
Nordic Journal of Computing SUCCINCT SUFFIX ARRAYS BASED ON RUN-LENGTH ENCODING ∗ (2008)
Abstract. A succinct full-text self-index is a data structure built on a text T = t1t2...tn, which takes little space (ideally close to that of the compressed text), permits efficient search for the...
Practical Construction of k-Nearest Neighbor Graphs in Metric Spaces? (2008)
Rodrigo Paredes, Karina Figueroa, Gonzalo Navarro
Abstract. Let U be a set of elements and d a distance function defined among them. Let NNk(u) be the k elements in U- {u} having the smallest distance to u. The k-nearest neighbor graph (knng) is a...
Using Structural Contexts to Compress Semistructured Text Collections ∗† (2008)
Joaquín Adiego, Gonzalo Navarro, Pablo Fuente
We describe a compression model for semistructured documents, called Structural Contexts Model (SCM), which takes advantage of the context information usually implicit in the structure of the text....
Compressed Compact SuOEx Arrays (2008)
1 Introduction and Related Work The classical problem in string matching is to determine the occ occurrences of a short pattern P = p1p2: : : pm in a large text T = t1t2: : : tn. Text and pattern are...
Paolo Ferragina, Giovanni Manzini, Gonzalo Navarro
Abstract. Given a sequence S = s1s2...sn of integers smaller than r = O(polylog(n)), we show how S can be represented using nH0(S) + o(n) bits, so that we can know any sq, as well as answer rank and...
Masaru Kitsuregawa, Betty Salzberg, Gonzalo Navarro, Ricardo Baeza-yates, Erkki Sutinen, Jorma Tarhio, ...
IntegratingDiverseInformationManagementSystems:ABriefSurvey..................................
Approximate String Matching with Lempel-Ziv Compressed Indexes (2008)
Gonzalo Navarro, Arlindo L. Oliveira
1 Introduction and Related Work Approximate string matching (ASM) is an important problem that arises inapplications related to text searching, pattern recognition, signal processing, and...
A Compressed Text Index on Secondary Memory (2008)
Abstract. We introduce a practical disk-based compressed text index that, when the text is compressible, takes much less space than the suffix array. It provides very good I/O times for searching,...
Average-optimal single and multiple approximate string matching (2008)
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough `-grams from text windows so as to prove that no occurrence can contain the part of the...
Combining Structural and Textual Contexts for Compressing Semistructured Databases ∗ (2008)
Joaquín Adiego, Gonzalo Navarro
We describe a compression technique for semistructured documents, called SCMPPM, which combines the Prediction by Partial Matching technique with Structural Contexts Model (SCM) technique. SCMPPM...
An Index Data Structure for Searching in Metric Space Databases (2008)
Roberto Uribe, Gonzalo Navarro, Ricardo J. Barrientos
Abstract. This paper presents the Evolutionary Geometric Near-neighbor Access Tree (EGNAT) which is a new data structure devised for searching in metric space databases. The EGNAT is fully dynamic,...
Universidad del Bío-Bío (2008)
Gilberto Gutiérrez R, Andrea Rodríguez T, Gonzalo Navarro, José Orellana V, Gonzalo Navarro, Alejandro González O
This paper describes a new spatio-temporal access method (SEST-Index) that combines two approaches for modeling spatio-temporal information: snapshots and events. This method makes it possible to not...
Bit-Parallel Computation of Local Similarity Score Matrices with Unitary Weights (2008)
Local similarity computation between two sequences permits detecting all the relevant alignments present between subsequences thereof. A well-known dynamic programming algorithm works in time O(mn),...
A More Precise Solution to Two Problems on Tries (2008)
Gonzalo Navarro, Patricio V. Poblete
We use the Binomial Transform to address the problem of determining the average size and leaf depth of a trie. These problems lead to the need to find the poles of the function 1 1\Gammap s+1 \Gammaq...
Bit-parallel (δ,γ)-Matching and Suffix Automata (2008)
Maxime Crochemore, Costas S. Iliopoulos, Gonzalo Navarro, Yoan J. Pinzon, Alejandro Salinger
Matching is a string matching problem with applications to music retrieval. The goal is, given a pattern P 1...m and a text T 1...n on an alphabet of integers, find the occurrences P # of the pattern...
Improved Single and Multiple Approximate String Matching (2008)
Kimmo Fredriksson, Gonzalo Navarro
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough `-grams from text windows so as to prove that no occurrence can contain the part of the...
An(other) entropy-bounded compressed suffix tree (2008)
Johannes Fischer, Gonzalo Navarro
Abstract. Suffix trees are among the most important data structures in stringology, with myriads of applications. Their main problem is space usage, which has triggered much research striving for...
Storage and Retrieval of Individual Genomes (2008)
Mäkinen, Veli, Navarro, Gonzalo, Sirén, Jouni, Välimäki, Niko
A repetitive sequence collection is one where portions of a emph{base sequence} of length $n$ are repeated many times with small variations, forming a collection of total length $N$. Examples of such...
An In-Memory XQuery/XPath Engine over a Compressed Structured Text Representation (2008)
Bonifati, Angela, Leighton, Gregory, Mäkinen, Veli, Maneth, Sebastian, Navarro, Gonzalo, Pugliese, Andrea
We describe the architecture and main algorithmic design decisions for an XQuery/XPath processing engine over XML collections which will be represented using a self-indexing approach, that is, a...
Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections (2008)
Siren, Jouni, Välimäki, Niko, Mäkinen, Veli, Navarro, Gonzalo
A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such...
Run-length compressed indexes are superior for highly repetitive sequence collections (2008)
Jouni Sirén, Niko Välimäki, Veli Mäkinen, Gonzalo Navarro
Abstract. A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N. Examples of such...
Gilberto A. Gutiérrez, Avenida La, Gonzalo Navarro, M. Andrea Rodríguez
This work presents a new access method (LES-tree) for spatio-temporal databases that handles discrete change events over objects ’ spatial attributes. The main characteristic of this structure is...
Indexing Straight-Line Programs (2008)
Francisco Claude, Gonzalo Navarro
Straight-line programs offer powerful text compression by representing a text T[1, u] in terms of a context-free grammar of n rules, so that T can be recovered in O(u) time. However, the problem of...
Practical rank/select queries over arbitrary sequences (2008)
Francisco Claude, Gonzalo Navarro
Abstract. We present a practical study on the compact representation of sequences supporting rank, select, and access queries. While there are several theoretical solutions to the problem, only a few...
A.: Fully-Compressed Suffix Trees (2008)
Gonzalo Navarro, Arlindo L. Oliveira
Abstract. Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics and information retrieval. Classical representations of...
Compressed Text Indexes:From Theory to Practice! (2007)
Ferragina, Paolo, Gonzalez, Rodrigo, Navarro, Gonzalo, Venturini, Rossano
A compressed full-text self-index represents a text in a compressed form and still answers queries efficiently. This technology represents a breakthrough over the text indexing techniques of the...
MEDIACORE: A MULTIMEDIA INTERFACE COMPOSITION TOOLKIT (2007)
In this paper, the problem of complex interface development when adding multimedia features is analyzed. A model is proposed which focuses mainly in managing the complexity of control, a problem...
Autran Macedo, Marco Antonio Cristo, Elaine Spinola Silva, Moura Barbosa, Jo~ao Paulo Kitajima, Berthier Ribeiro, ...
Departamento de Ciencia da Computac~ao
matching, with application to protein (2007)
and simple character classes and bounded gaps pattern
Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-Yates
We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential...
A Maple Package for Semidefinite Programming (2007)
Fatos Xhafa Gonzalo, Gonzalo Navarro
Introduction We present a package in Maple for solving Semidefinite Programming. Semidefinite Programming (SP) is the problem of optimizing a linear function of a symmetric matrix subject to linear...
A Maple Package for Semidefinite Programming (2007)
Introduction We present a package in Maple for solving Semidefinite Programming. Semidefinite Programming (SP) is the problem of optimizing a linear function of a symmetric matrix subject to linear...
Optimized Binary Search and Text Retrieval (2007)
Eduardo Fernandes, Eduardo Fernandes Barbosa, Gonzalo Navarro, Ricardo Baeza-yates, Chris Perleberg, Nivio Ziviani
We present an algorithm that minimizes the expected cost of indirect binary search for data with nonconstant access costs, such as disk data. Indirect binary search means that sorted access to the...
A Maple Package for Semidefinite Programming (2007)
Introduction We present a package in Maple for solving Semidefinite Programming. Semidefinite Programming (SP) is the problem of optimizing a linear function of a symmetric matrix subject to linear...
Diego Arroyuelo, Gonzalo Navarro, Nora Reyes
Hybrid dynamic spatial approximation trees are recently proposed data structures for searching in metric spaces, based on combining the concepts of spatial approximation and pivot based algorithms....
Exact and Approximate Two Dimensional Pattern Matching allowing Rotations (2007)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
We give fast filtering algorithms for searching a 2--dimensional pattern in a 2--dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching under...
Probabilistic proximity search: Fighting the curse of dimensionality in metric spaces (2007)
Proximity searches become very dicult on \high dimensional " metric spaces, that is, those whose histogram of distances has a large mean and/or a small variance. This so-called \curse of...
The date of receipt and acceptance will be inserted by the editor Abstract We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a...
Gonzalo Navarro, Ludovic M E, Laurent Heye
Abstract. We present a pattern matching approach to the problem of misuse detection in a computer system, which is formalized as the problem of multiple approximate pattern matching. This permits...
Gonzalo Navarro, Nivio Ziviani, Belo Horizonte Brasil, Belo Horizonte Brasil, Ricardo Baeza-yates
We present a fast compression and decompression technique for natural language texts. The novelty is that the exact search can be done on the compressed text directly, using any known sequential...
Optimal Binary Search Trees with Costs Depending on the Access Paths (2007)
Gonzalo Navarro, Ricardo Baeza-yates, Walter Cunto
We describe algorithms for constructing optimal binary search trees, in which the access cost of a key depends on the k preceding keys which were reached in the path to it. This problem has...
Gonzalo Navarro, Rodrigo Paredes
Abstract. Let G(V; A) be a connected graph with a nonnegative cost function d: A! R
The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data structures in...
Improved Antidictionary Based Compression (2007)
Maxime Crochemore, Gonzalo Navarro
The compression of binary texts using antidictionaries is a novel technique based on the fact that some substrings (called "antifactors") never appear in the text. Let sb be an...
FASTER THAN FFT: ROTATION INVARIANT COMBINATORIAL TEMPLATE MATCHING (2007)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
In this article we consider the rotation invariant template matching problem from the combinatorial point of view. The problem is to nd the places and orientations in an image where a pattern can be...
Compressing Semistructured Text Databases (2007)
Joaqun Adiego, Gonzalo Navarro
Abstract We describe a compression model for semistructured documents, called Structural Contexts Model, which takes advantage of the context information usually implicit in the structure of the...
Nieves R. Brisaboa, Eva L. Iglesias, Gonzalo Navarro
Abstract. We present a new compression format for natural language texts, allowing both exact and approximate search without decompression. This new code {called End-Tagged Dense Code { has some...
Josué Kuri, Gonzalo Navarro, Ludovic Mé, Laurent Heye
• Misuse detection as a pattern matching problem
Running title: Template matching (2007)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
Work supported by the Academy of Finland In this article we consider the rotation invariant template matching problem from the combinatorial point of view. The problem is to nd the places and...
We present the rst nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family. Given a text of length u compressed into length n, and a...
Abstract. We present a new algorithm for string matching. The algorithm, called BNDM, is the bit-parallel simulation of a known (but recent) algorithm called BDM. BDM skips characters using a...
Fast and Flexible Word Searching on Compressed Text (2007)
Edleno Silva, De Moura, Gonzalo Navarro
We present a fast compression and decompression technique for natural language texts. The novelties are that (i) decompression of arbitrary portions of the text can be done very efficiently, (ii)...
Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-yates
www.dcc.uchile.cl/~rbaeza Abstract We present a fast compression and decompression scheme for natural language texts that allows efficient and flexible string matching by searching the compressed...
Gonzalo Navarro, Jani Tanninen, Jorma Tarhio
Abstract. We present a new index for approximate string matching. The index collects text q-samples, i.e. disjoint text substrings of length q, at fixed intervals and stores their positions. At...
Abstract. We present a new technique to encode a deterministic finite automaton (DFA). Based on the specific properties of Glushkov's nondeterministic finite automaton (NFA) construction...
Joao Paulo Kitajima, Gonzalo Navarro, Nivio Ziviani
We present a scalable algorithm for the parallel computation of inverted files for large text collections. The algorithm takes into account an environment of a high bandwidth network of workstations...
Optimized Indirect Binary Search and Text Retrieval (Preliminary Version) (2007)
Gonzalo Navarro, Eduardo Fern, Ez Barbosa, Chris Perleberg, Ricardo Baeza-yates, Nivio Ziviani
We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs (e.g. disk data). The term "indirect " indicates that...
Abstract. We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search a pattern in a text without uncompressing it. This is a highly relevant issue to keep...
Fast and flexible string matching by combining bit-parallelism and suffix automata (2007)
The most important features of a string matching algorithm are its efficiency and its flexibility. Efficiency has traditionally received more attention, while flexibility in the search pattern is...
, and Jos'e L. Marroqu'in (2007)
Gonzalo Navarro, Ricardo Baeza-yates
Abstract. The indexing algorithms and data structures for similarity searching in metric spaces seem to emerge from a great diversity, and different approaches have been proposed and analyzed...
Depto. de Ciencias de la Computaci'on (2007)
Ricardo Baeza-yates, Gonzalo Navarro
We consider the recently proposed XQL language, which is designed to query XML documents by content and structure. We show that an already existing model, namely "Proximal Nodes",...
Fixed Queries Array: A Fast and Economical Data Structure for Proximity Searching (2007)
Abstract. Pivot-based algorithms are effective tools for proximity searching in metric spaces. They allow trading space overhead for number of distance evaluations performed at query time. With...
Gonzalo Navarro, Marden Neubert, Nivio Ziviani, Ricardo Baeza-yates
Abstract. Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval....
Optimized Binary Search and Text Retrieval 1 (2007)
Eduardo Fernandes Barbosa, Gonzalo Navarro, Ricardo Baeza-yates, Chris Perleberg, Nivio Ziviani
Abstract. We present an algorithm that minimizes the expected cost of indirect binary search for data with non-constant access costs, such as disk data. Indirect binary search means that sorted...
Searching in Metric Spaces (2007)
Gonzalo Navarro, Ricardo Baeza-yates
The problem of searching the elements of a set which are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from...
Large Text Searching Allowing Errors M'arcio Drumond Ara'ujo 1 (2007)
Gonzalo Navarro, Nivio Ziviani
ritos/cyted and Chilean Fondecyt grants 1960881 and 1950622.
Abstract. We present a new algorithm to search regular expressions, which is able to skip text characters. The idea is to determine the minimum length ` of a string matching the regular expression,...
Linear Time Sorting of Skewed Distributions (2007)
Gonzalo Navarro, Nivio Ziviani
www.dcc.ufmg.br/~nivio This work presents an efficient linear average time algorithm to sort lists of integers that follow skewed distributions. It also studies a particular case where the list...
Edgar Ch'avez Escuela de Ciencias F'isico-Matem'aticas (2007)
Benjamin Bustos, Gonzalo Navarro, Mich M'exico
With a few exceptions, proximity search algorithms in metric spaces based on the use of pivots select them at random among the elements of the metric space. However, it is well known that the way in...
Karina Figueroa, Gonzalo Navarro
Given a database of sites or elements of a metric space, the all-k-nearest-neighbor problem consists in nding, for each element, its k nearest neighbors. Using the brute force approach the problem is...
An Effective Clustering Algorithm to Index High Dimensional Metric Spaces (2007)
A metric space consists of a collection of objects and a distance function defined among them, which satisfies the triangular inequality. The goal is to preprocess the set so that, given a set of...
Abstract. We present a new technique to encode a deterministic finite automaton (DFA). Based on the specific properties of Glushkov's nondeterministic finite automaton (NFA) construction...
Gonzalo Navarro, Mathieu Raffinot
matching, with application to protein searching
Fast and flexible string matching by combining bit-parallelism and suffix automata (2007)
The most important features of a string matching algorithm are its efficiency and its flexibility. Efficiency has traditionally received more attention, while flexibility in the search pattern is...
Practical Algorithms for Transposition-Invariant String-Matching (2007)
Kjell Lemström, Gonzalo Navarro, Yoan Pinzon
We consider the problems of (1) longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (that is, transposed) to match the second, and (2)...
Edgar Chavez Gonzalo, Gonzalo Navarro
not yet in a mature state. The end users do not have the equivalent of a full DBMS for multimedia indexing. There is, however, at least one rather robust, open source implementation of an index for...
Text Searching: Theory and Practice (2007)
We present the state of the art of the main component of text retrieval systems: the search engine. We outline the main lines of research and issues involved. We survey the relevant techniques in use...
A Practical Index for Genome Searching (2007)
Current search tools for computational biology trade e- ciency for precision, losing many relevant matches. We push in the direction of obtaining maximum eciency from an indexing scheme that does not...
(S,C)-Dense Coding: An Optimized Compression Code for Natural Language Text Databases (2007)
Nieves R. Brisaboa, Antonio Farina, Gonzalo Navarro, María F. Esteller
This work presents (s, c)-Dense Code, a new method for compressing natural language texts. This technique is a generalization of a previous compression technique called End-Tagged Dense Code that...
A Bit-parallel Suffix Automaton Approach for (δ, γ)-Matching in Music Retrieval (2007)
Maxime Crochemore, Costas S. Iliopoulos, Gonzalo Navarro, Yoan J. Pinzon
(delta,gamma)-Matching is a string matching problem with applications to music retrieval. The goal is, given a pattern P1...m and a text T1...n on an alphabet of integers, find the occurrences P of...
Average-optimal single and multiple approximate string matching (2007)
Kimmo Fredriksson, Gonzalo Navarro
Abstract. We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough #-grams from text windows so as to prove that no occurrence can contain the part...
Joaqun Adiego, Gonzalo Navarro
We describe a compression technique for semistructured documents, called SCMPPM, which combines the Prediction by Partial Matching (PPM) technique with the Structural Contexts Model (SCM) idea, which...
Abstract. The compact sux array (CSA) is a space-ecient full-text index, which is fast in practice to search for patterns in a static text. Compared to other compressed sux arrays (Grossi and Vitter,...
Applying the Contexts Model in Semistructured Text Databases (2007)
Joaqun Adiego, Gonzalo Navarro
Abstract We describe a compression model for semistructured documents, called Structural Contexts Model, which takes advantage of the context information usually implicit in the structure of the...
Su#x Arrays in Parallel (2007)
Mauricio Marn, Gonzalo Navarro
Abstract. Su#x arrays are powerful data structures for text indexing. In this paper we present parallel algorithms devised to increase throughput of su#x arrays on a multiple-query setting....
Compressed Full-Text Indexes (2007)
Navarro, Gonzalo, Mäkinen, Veli
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes...
Lempel-Ziv compression of highly structured documents (2007)
Adiego, Joaquin, Navarro, Gonzalo, De La Fuente, Pablo
Publicación ISI
Rotation and lighting invariant template matching (2007)
Fredriksson, Kirnmo, Makinen, Veli, Navarro, Gonzalo
Publicación ISI
Lempel-Ziv compression of highly structured documents (2007)
Adiego, Joaquin, Navarro, Gonzalo, Fuente, Pablo De La
Publicación ISI
Rotation and lighting invariant template matching (2007)
Fredriksson, Kimmo, Mäkinen, Veli, Navarro, Gonzalo
Publicación ISI
Lightweight natural language text compression. Information Retrieval (2007)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
Variants of Huffman codes where words are taken as the source symbols are currently the most attractive choices to compress natural language text databases. In particular, Tagged Huffman Code by...
Compressed representations of sequences and full-text indexes (2007)
Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, Gonzalo Navarro
Abstract. Given a sequence S = s1s2... sn of integers smaller than r = O(polylog(n)), we show how S can be represented using nH0(S) + o(n) bits, so that we can know any sq, as well as answer rank and...
Lightweight natural language text compression. Information Retrieval (2007)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, José R. Paramá
Variants of Huffman codes where words are taken as the source symbols are currently the most attractive choices to compress natural language text databases. In particular, Tagged Huffman Code by...
Compressed text indexes with fast locate (2007)
Abstract. Compressed text (self-)indexes have matured up to a point where they can replace a text by a data structure that requires less space and, in addition to giving access to arbitrary text...
A Lempel-Ziv text index on secondary storage (2007)
Diego Arroyuelo, Gonzalo Navarro
Full-text searching consists in locating the occurrences of a given pattern P [1..m] in a text T [1..u], both sequences over an alphabet of size σ. In this paper we define a new index for full-text...
Compressed representations of sequences and full-text indexes (2007)
Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, Gonzalo Navarro
Abstract. Given a sequence S = s1s2... sn of integers smaller than r = O(polylog(n)), we show how S can be represented using nH0(S) + o(n) bits, so that we can know any sq, as well as answer rank and...
G.: A Lempel-Ziv text index on secondary storage (2007)
Diego Arroyuelo, Gonzalo Navarro
Abstract. Full-text searching consists in locating the occurrences of a given pattern P[1..m] in a text T[1..u], both sequences over an alphabet of size σ. In this paper we define a new index for...
V.: Compressed full-text indexes (2007)
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes...
Compressed representations of sequences and full-text indexes (2007)
Paolo Ferragina, Giovanni Manzini, Veli M Äkinen, Gonzalo Navarro
A subset of the results presented in this paper appeared in the Proceedings of the 11th Symposium on String Processing and Information Retrieval (SPIRE), pages 150–160, 2004. The work was partially...
Compressing Web graphs like texts (2007)
The need to run different kinds of algorithms over large Web graphs motivates the research for compressed graph representations that permit accessing without decompressing them. At this point there...
Implicit compression boosting with applications to self-indexing (2007)
Abstract. Compression boosting (Ferragina & Manzini, SODA 2004) is a new technique to enhance zeroth order entropy compressors ’ performance to k-th order entropy. It works by constructing the...
Smaller and Faster Lempel-Ziv Indices (2007)
Diego Arroyuelo, Gonzalo Navarro
Given a text T[1..u] over an alphabet of size σ = O(polylog(u)) and with k-th order empirical entropy Hk(T), we propose a new compressed full-text self-index based on the Lempel-Ziv (LZ) compression...
A simple alphabet-independent FM-index (2006)
Grabowski, Szymon, Mäkinen, Veli, Navarro, Gonzalo, Salinger, Alejandro
We design a succinct full-text index based on the idea of Huffman-compressing the text and then applying the Burrows-Wheeler transform over it. The resulting structure can be searched as an FM-index,...
A metric index for approximate string matching (2006)
Chávez, Edgar, Navarro, Gonzalo
We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between strings. We...
Reducing the space requirement of LZ-index (2006)
Arroyuelo, Diego, Navarro, Gonzalo, Sadakane, Kunihiko
The LZ-index is a compressed full-text self-index able to represent a text T-1...u, over an alphabet of size sigma = O(polylog(u)) and with k-th order empirical entropy H-k(T), using 4uH(k)(T) + o(u...
A simple alphabet-independent FM-index (2006)
Szymon Grabowski, Gonzalo Navarro, Ro Salinger
A classic example of a full-text index is the suffix tree [20] reaching O(m + occ) time complexity for counting and reporting queries. Unfortunately, it takes O(n log n) bits,1 and also the constant...
Dynamic entropy-compressed sequences and full-text indexes (2006)
ACM Journal Name, Vol. V, No. N, Month 20YY, Pages 1-0??.
K.: Reducing the space requirement of LZ-Index (2006)
Diego Arroyuelo, Gonzalo Navarro, Kunihiko Sadakane
Abstract. The LZ-index is a compressed full-text self-index able to represent atext T 1...u, over an alphabet of size s = O(polylog(u)) and with k-th order em-pirical entropy H k(T), using 4uHk(T) +...
Position-restricted substring searching (2006)
Gonzalo Navarro, Ag Genominformatik
Our solutions can be seen as a generalization of rank and select dictionaries, which allow computing how many times a given character c appears in a prefix T [1, i] and also locate the i-th...
Position-restricted substring searching (2006)
1 Introduction and Related Work The indexed string matching problem is that of, given a long text T [1, n] overan alphabet
Dynamic entropy-compressed sequences and full-text indexes (2006)
Abstract. Given a sequence of n bits with binary zero-order entropy H0, we present a dynamic data structure that requires nH0 + o(n) bits of space, which is able of performing rank and select, as...
Compressed full-text indexes (2006)
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes...
On the least cost for proximity searching in metric spaces (2006)
Figueroa, Karina, Chávez, Edgar, Navarro, Gonzalo, Paredes, Rodrigo
Proximity searching consists in retrieving from a database those elements that are similar to a query. As the distance is usually expensive to compute, the goal is to use as few distance computations...
Statistical encoding of succinct data structures (2006)
González, Rodrigo, Navarro, Gonzalo
In recent work, Sadakane and Grossi [SODA 2006] introduced a scheme to represent any sequence S = s(1)s(2)...s(n), over an alphabet of size sigma, using nH(k)(S) + O(log(sigma)n/n (k log sigma + log...
Dynamic entropy-compressed sequences and full-text indexes (2006)
Mäkinen, Veli, Navarro, Gonzalo
Given a sequence of n bits with binary zero-order entropy H-o, we present a dynamic data structure that requires nH(o) + o(n) bits of space, which is able of performing rank and select, as well as...
Position-Restricted Substring Searching (2006)
Mäkinen, Veli, Navarro, Gonzalo
A full-text index is a data structure built over a text string T[1, n]. The most basic functionality provided is (a) counting how many times a pattern string P[1,m] appears in T and (b) locating all...
Practical construction of k-nearest neighbor graphs in metric spaces (2006)
Paredes, Rodrigo, Chávez, Edgar, Figueroa, Karina, Navarro, Gonzalo
Let U be a set of elements and d a distance function defined among them. Let NNk (u) be the k elements in U - {u} having the smallest distance to u. The k-nearest neighbor graph (kNNG) is a weighted...
Statistical encoding of succinct data structures (2006)
Rodrigo González, Gonzalo Navarro
Abstract. In recent work, Sadakane and Grossi [SODA 2006] introduced a scheme to represent any (k log σ + log log n)) bits sequence S = s1s2... sn, over an alphabet of size σ, using nHk(S) + O ( n...
An index data structure for searching in metric space databases (2006)
Roberto Uribe, Gonzalo Navarro, Ricardo J. Barrientos, Mauricio Marín
Abstract. This paper presents the Evolutionary Geometric Near-neighbor Access Tree (EGNAT) which is a new data structure devised for searching in metric space databases. The EGNAT is fully dynamic,...
Position-restricted substring searching (2006)
Veli Mäkinen, Gonzalo Navarro, Ag Genominformatik
Abstract. A full-text index is a data structure built over a text string T[1, n]. The most basic functionality provided is (a) counting how many times a pattern string P[1, m] appears in T and (b)...
Compressed full-text indexes (2006)
Full-text indexes provide fast search for patterns over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes...
Rank and select revisited and extended (2006)
The deep connection between the Burrows-Wheeler transform (BWT) and the socalled rank and select data structures for symbol sequences is the basis of most successful approaches to compressed text...
New bounds on D-ary optimal codes (2005)
Navarro, Gonzalo, Brisaboa, Nieves
We propose a simple method that, given a symbol distribution, yields upper and lower bounds on the average code length of a D-ary optimal code over that distribution. Thanks to its simplicity, the...
Sequential and indexed two-dimensional combinatorial template matching allowing rotations (2005)
Fredriksson, Kimmo, Navarro, Gonzalo, Ukkonen, Esko
We present new and faster algorithms to search for a two-dimensional pattern in a two-dimensional text allowing any rotation of the pattern. This has applications such as image databases and...
Transposition invariant string matching (2005)
Mäkinen, Veli, Navarro, Gonzalo, Ukkonen, Esko
Given strings A = a(1)a(2)...a(m) and B=b(1)b(2)...b(n) over an alphabet Sigma subset of U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B)...
A compact space decomposition for effective metric indexing (2005)
Chávez, Edgar, Navarro, Gonzalo
he metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that...
Bit-parallel witnesses and their applications to approximate string matching (2005)
Hyyrö, Heikki, Navarro, Gonzalo
We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one, BPM (Myers, 1999), searches for a pattern of length m in a text of length...
Space-efficient construction of LZ-index (2005)
Arroyuelo, Diego, Navarro, Gonzalo
A compressed full-text self-index is a data structure that replaces a text and in addition gives indexed access to it, while taking space proportional to the compressed text size. The LZ-index, in...
Bit-parallel witnesses and their applications to approximate string matching (2005)
We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one, BPM [Myers, J. of the ACM, 1999], searches for a pattern of length m in a...
Flexible music retrieval in sublinear time (2005)
Kimmo Fredriksson, Veli Mäkinen, Gonzalo Navarro
Abstract. Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique to music retrieval. Up...
A compact space decomposition for effective metric indexing (2005)
Abstract The metric space model abstracts many proximity search problems, from nearest-neighborclassifiers to textual and multimedia information retrieval. In this context, an index is a data...
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
z Abstract We present new and faster algorithms to search for a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. This has applications such as image databases and...
LZgrep: A Boyer-Moore String Matching Tool for Ziv-Lempel Compressed Text (2005)
We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The idea is to search the text directly in compressed form instead of decompressing and then searching it. We...
A compact space decomposition for effective metric indexing (2005)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that...
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
We present new and faster algorithms to search for a 2-dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. This has applications such as image databases and...
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
Abstract We present new and faster algorithms to search for a 2-dimensional pattern in a 2-dimensionaltext allowing any rotation of the pattern. This has applications such as image databases and...
Flexible music retrieval in sublinear time (2005)
Kimmo Fredriksson, Gonzalo Navarro
Abstract. Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique to music retrieval. Up...
Succinct suffix arrays based on run-length encoding (2005)
Gonzalo Navarro, Ag Genominformatik
1 Introduction The classical problem in string matching is to determine the occ occurrences ofa short pattern P = p1p2... pm in a large text T = t1t2... tn. Text and patternare sequences of...
LZgrep: A Boyer-Moore String Matching Tool for Ziv-Lempel Compressed Text (2005)
We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The idea is to search the text directly in compressed form instead of decompressing and then searching it. We...
Space-efficient construction of LZ-index (2005)
Diego Arroyuelo, Gonzalo Navarro
Abstract. A compressed full-text self-index is a data structure that replaces a text and in addition gives indexed access to it, while taking space proportional to the compressed text size. The...
Bit-parallel witnesses and their applications to approximate string matching (2005)
Abstract. We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one, BPM (Myers, 1999), searches for a pattern of length m in a text...
Space-efficient construction of LZ-index (2005)
Diego Arroyuelo, Gonzalo Navarro
Abstract. A compressed full-text self-index is a data structure that replaces a text and in addition gives indexed access to it, while taking space proportional to the compressed text size. The...
Succinct suffix arrays based on run-length encoding (2005)
A succinct full-text self-index is a data structure built on a text T = t1t2...tn, which takes little space (ideally close to that of the compressed text), permits efficient search for the...
Flexible music retrieval in sublinear time (2005)
Kimmo Fredriksson, Veli Mäkinen, Gonzalo Navarro
Abstract. Music sequences can be treated as texts in order to perform music retrieval tasks on them. However, the text search problems that result from this modeling are unique to music retrieval. Up...
Space-efficient construction of LZ-index (2005)
Diego Arroyuelo, Gonzalo Navarro
Abstract. A compressed full-text self-index is a data structure that replaces a text and in addition gives indexed access to it, while taking space proportional to the compressed text size. The...
A compact space decomposition for effective metric indexing (2005)
The metric space model abstracts many proximity search problems, from nearest-neighbor classifiers to textual and multimedia information retrieval. In this context, an index is a data structure that...
Practical construction of k nearest neighbor graphs in metric spaces (2005)
Rodrigo Paredes, Gonzalo Navarro
Abstract. Let U be a set of elements and d a distance function defined among them. Let NNk(u)d be the k elements in U − {u} which have the smallest distance to u. The k-nearest neighbors graph...
We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The idea is to search the text directly in compressed form instead of decompressing and then searching it. We...
Practical implementation of rank and select queries (2005)
Rodrigo González, Szymon Grabowski, Veli Mäkinen, Gonzalo Navarro
Research on succinct data structures has made significant progress in recent years. An essential building block of many of those techniques is a data structure to perform rank and select operations...
A Simple Alphabet-Independent FM-Index (2005)
Szymon Grabowski, Veli Mäkinen, Gonzalo Navarro, Alejandro Salinger
We design a succinct full-text index based on the idea of Huffman-compressing the text and then applying the Burrows-Wheeler transform over it. The resulting structure can be searched as an FM-index,...
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In this...
Average complexity of exact and approximate multiple string matching (2004)
Navarro, Gonzalo, Fredriksson, Kimmo
We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size a cannot be less than Omega(n...
An alphabet-friendly FM-index (2004)
Ferragina, Paolo, Manzini, Giovanni, Mäkinen, Veli, Navarro, Gonzalo
We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the...
Improved single and multiple approximate string matching (2004)
Fredriksson, Kimmo, Navarro, Gonzalo
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough l-grams from text windows so as to prove that no occurrence can contain the part of the...
Rotation and lighting invariant template matching (2004)
Fredriksson, Kimmo, Mäkinen, Veli, Navarro, Gonzalo
We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and brighter or darker than its...
Increased bit-parallelism for approximate string matching (2004)
Hyyrö, Heikki, Fredriksson, Kimmo, Navarro, Gonzalo
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate occurrences of...
Probabilistic Proximity Search Algorithms Based on Compact Partitions (2004)
Bustos, Benjamin, Navarro, Gonzalo
The main bottleneck of the research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically difficult, whatever...
An alphabet-friendly FM-index (2004)
Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, Gonzalo Navarro
Abstract. We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size...
Increased Bit-Parallelism for Approximate String Matching (2004)
Heikki Hyyrö, Kimmo Fredriksson, Gonzalo Navarro
Abstract. Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate...
Lempel-Ziv compression of structured text (2004)
Joaqun Adiego, Gonzalo Navarro
We describe a novel Lempel-Ziv approach suitable for compressing structured documents, called LZCS, which takes advantage of redundant information that can appear in the structure. The main idea is...
Average complexity of exact and approximate multiple string matching (2004)
Gonzalo Navarro, Kimmo Fredriksson
We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size # cannot be less than# n log #...
Average complexity of exact and approximate multiple string matching (2004)
Gonzalo Navarro, Kimmo Fredriksson
We show that the average number of characters examined to search for r random patterns of length m in a text of length n over a uniformly distributed alphabet of size # cannot be less than# n log #...
Practical and Flexible Pattern Matching over Ziv-Lempel Compressed Text (2004)
Gonzalo Navarro, Mathieu Raffinot
We address the problem of string matching on Ziv-Lempel compressed text. The goal is to search a pattern in a text without uncompressing it. This is a highly relevant issue to keep compressed text...
Increased Bit-Parallelism for Approximate String Matching (2004)
Heikki Hyyrö, Kimmo Fredriksson, Gonzalo Navarro
Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word.
Rotation and Lighting Invariant Template Matching (2004)
Kimmo Fredriksson, Veli Veli Mäkinen, Gonzalo Navarro
We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and brighter or darker than its...
New Search Algorithms and Time/Space Tradeoffs for Succinct Suffix Arrays (2004)
Abstract This paper is about compressed full-text indexes. That is, our goal is to represent full-text indexes in as small space as possible and, at the same time, retain the functionality of the...
Carlos Castillo, Dr. Alistair Moffat, Dr. Gonzalo Navarro
The key factors for the success of the World Wide Web are its large size and the lack of a cen-tralized control over its contents. Both issues are also the most important source of problems for...
Carlos Castillo, Dr. Alistair Moffat, Dr. Gonzalo Navarro
The key factors for the success of the World Wide Web are its large size and the lack of a centralized control over its contents. Both issues are also the most important source of problems for...
The FM-index is a succinct text index needing only O(Hkn) bits of space, where n is the text size and Hk is the kth order entropy of the text. FM-index assumes constant alphabet; it uses exponential...
First Huffman, then Burrows-Wheeler: an alphabet-independent FM-index (2004)
Szymon Grabowski, Gonzalo Navarro
In this paper we present a simple variant of the FM-index, which removes its alphabet dependence. We achieve this by, essentially (but not exactly), Huffman-compressing the text and FM-indexing the...
An alphabet-friendly FM-index (2004)
Paolo Ferragina, Giovanni Manzini, Veli Mäkinen, Gonzalo Navarro
Abstract. We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size...
Veli Mäkinen, Gonzalo Navarro, Kunihiko Sadakane
Abstract. One of the most relevant succinct suffix array proposals in the literature is the Compressed Suffix Array (CSA) of Sadakane [ISAAC 2000]. The CSA needs n(H0 + O(log log σ)) bits of space,...
Increased Bit-Parallelism for Approximate String Matching (2004)
Heikki Hyyrö, Kimmo Fredriksson, Gonzalo Navarro
Abstract. Bit-parallelism permits executing several operations simultaneously over a set of bits or numbers stored in a single computer word. This technique permits searching for the approximate...
Bit-parallel branch and bound algorithm for transposition invariant LCS (2004)
Kjell Lemström, Gonzalo Navarro, Yoan Pinzon
Main Results. We consider the problem of longest common subsequence (LCS) of two given strings in the case where the first may be shifted by some constant (i.e. transposed) to match the second. For...
A bit-parallel suffix automaton approach for (delta, gamma)-matching in music retrieval (2003)
Crochemore, Maxime, Iliopoulos, Costas, Navarro, Gonzalo, Pinzon, Yoan, Salinger, Alejandro
delta, gamma)-Matching is a string matching problem with applications to music retrieval. The goal is, given a pattern P-1...m and a text T-1...n on an alphabet of integers, find the occurrences P'...
Pivot Selection Techniques for Proximity Searching in Metric Spaces (2003)
Bustos, Benjamin, Navarro, Gonzalo, Chávez, Edgar
With few exceptions, proximity search algorithms in metric spaces based on the use of pivots select them at random among the objects of the metric space. However, it is well known that the way in...
Regular expression searching on compressed text (2003)
We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specically the LZ78 and LZW variants. Given a text of length u...
Matchsimile: A Flexible Approximate Matching Tool for Searching Proper Names (2003)
Gonzalo Navarro, Ricardo Baeza-yates, Azevedo Arcoverde
In this paper we present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for human and company names searches against a large...
Rotation and lighting invariant template matching (2003)
Kimmo Fredriksson, Veli Mäkinen, Gonzalo Navarro
We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and it is brighter or darker than...
Rotation and lighting invariant template matching (2003)
Kimmo Fredriksson, Veli Mäkinen, Gonzalo Navarro
Abstract. We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and brighter or darker...
Average-optimal multiple approximate string matching (2003)
Kimmo Fredriksson, Gonzalo Navarro
Abstract. We present a new algorithm for multiple approximate string matching, based on an extension of the optimal (on average) singlepattern approximate string matching algorithm of Chang and Marr....
Rotation and lighting invariant template matching (2003)
Kimmo Fredriksson, Gonzalo Navarro
Abstract We address the problem of searching for a two-dimensional pattern in a two-dimensionaltext (or image), such that the pattern can be found even if it appears rotated and it is brighter or...
Suffix Arrays in Parallel (2003)
Sux arrays are powerful data structures for text indexing. Unlike the popular inverted les, sux arrays do not depend on the existence of words in the text and are able of solving much more complex...
Average-optimal multiple approximate string matching (2003)
Kimmo Fredriksson, Gonzalo Navarro
Abstract. We present a new algorithm for multiple approximate string matching, based on an extension of the optimal (on average) singlepattern approximate string matching algorithm of Chang and Marr....
Distributed Query Processing using Suffix Arrays (2003)
Mauricio Marin, Gonzalo Navarro
Abstract. Suffix arrays are more efficient than inverted files for solving complex queries in a number of applications related to text databases. Examples arise when dealing with biological or...
Matchsimile: A Flexible Approximate Matching Tool for Searching Proper Names (2003)
Gonzalo Navarro, Ricardo Baeza-yates, Azevedo Arcoverde
In this paper we present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for human and company names searches against a large...
Improved Deletions in Dynamic Spatial Approximation Trees (2003)
Gonzalo Navarro Nora, Gonzalo Navarro, Nora Reyes
The Dynamic Spatial Approximation Tree (dsa--tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data...
SCM: Structural Contexts Model for Improving Compression in Semistructured Text Databases (2003)
Joaquín Adiego, Gonzalo Navarro
We describe a compression model for semistructured documents, called Structural Contexts Model, which takes advantage of the context information usually implicit in the structure of the text. The...
A Practical Index for Genome Searching (2003)
Current search tools for computational biology trade e#- ciency for precision, losing many relevant matches. We push in the direction of obtaining maximum e#ciency from an indexing scheme that does...
Rotation and lighting invariant template matching (2003)
Kimmo Fredriksson, Gonzalo Navarro
Abstract We address the problem of searching for a two-dimensional pattern in a two-dimensionaltext (or image), such that the pattern can be found even if it appears rotated and it is brighter or...
Rotation and lighting invariant template matching (2003)
Kimmo Fredriksson, Veli Mäkinen, Gonzalo Navarro
Abstract. We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and brighter or darker...
Algorithms for Transposition Invariant String Matching (Extended Abstract) (2003)
Veli Mäkinen, Gonzalo Navarro, Esko Ukkonen
Given strings A and B over an alphabet \Sigma...
Rotation and lighting invariant template matching (2003)
Kimmo Fredriksson, Veli Mäkinen, Gonzalo Navarro
Abstract. We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and brighter or darker...
Kjell Lemström, Gonzalo Navarro
Recent research in music retrieval has shown that a combinatorial approach to the problem could be fruitful. Three distinguishing requirements of this particular problem are (a) approximate searching...
A Practical Index for Genome Searching (2003)
Current search tools for computational biology trade e#- ciency for precision, losing many relevant matches. We push in the direction of obtaining maximum e#ciency from an indexing scheme that does...
Distributed query processing using suffix arrays (2003)
Mauricio Marín, Gonzalo Navarro
Abstract. Suffix arrays are more efficient than inverted files for solving complex queries in a number of applications related to text databases. Examples arise when dealing with biological or...
Improved deletions in dynamic spatial approximation trees (2003)
The Dynamic Spatial Approximation Tree (dsa–tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data...
Rotation and lighting invariant template matching (2003)
Kimmo Fredriksson, Veli Mäkinen, Gonzalo Navarro
We address the problem of searching for a two-dimensional pattern in a two-dimensional text (or image), such that the pattern can be found even if it appears rotated and it is brighter or darker than...
Distributed query processing using suffix arrays (2003)
Mauricio Marín, Gonzalo Navarro
Abstract. Suffix arrays are more efficient than inverted files for solving complex queries in a number of applications related to text databases. Examples arise when dealing with biological or...
S,C)-Dense Coding: An optimized compression code for natural language text databases (2003)
Nieves R. Brisaboa, Antonio Fariña, Gonzalo Navarro, María F. Esteller
Abstract. This work presents (s, c)-Dense Code, a new method for compressing natural language texts. This technique is a generalization of a previous compression technique called End-Tagged Dense...
Probabilistic Proximity Searching Algorithms Based on Compact Partitions (2002)
Bustos, Benjamin, Navarro, Gonzalo
The main bottleneck of the research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically difficult, whatever...
Probabilistic proximity searching algorithms based on compact partitions (2002)
Benjamin Bustos, Gonzalo Navarro
Abstract. The main bottleneck of the research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically dicult, whatever...
A Metric Index for Approximate String Matching (2002)
Abstract. We present a radically new indexing approach for approximate string matching. The scheme uses the metric properties of the edit distance and can be applied to any other metric between...
Faster bit-parallel approximate string matching (2002)
Abstract. We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one [Myers, J. of the ACM, 1999], searches for a pattern of length m...
Indexing text using the ziv-lempel trie (2002)
Partially supported by Fondecyt Grant 1-020831. Abstract. Let a text of u characters over an alphabet of size be compressible to n symbols by the LZ78 or LZW algorithm. We show that it is possible to...
Probabilistic proximity searching algorithms based on compact partitions (2002)
Benjamin Bustos, Gonzalo Navarro
Abstract. The main bottleneck of the research in metric space searching is the so-called curse of dimensionality, which makes the task of searching some metric spaces intrinsically difficult,...
t-spanners as a data structure for metric space searching (2002)
Gonzalo Navarro, Rodrigo Paredes
Abstract. A t-spanner, a subgraph that approximates graph distances within a precision factor t, is a well known concept in graph theory. In this paper we use it in a novel way, namely as a data...
Fully dynamic spatial approximation trees (2002)
Abstract. The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data...
Optimal exact and fast approximate two dimensional pattern matching allowing rotations (2002)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
Abstract. We give fast filtering algorithms to search for a 2- dimensional pattern in a 2-dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate...
Faster bit-parallel approximate string matching (2002)
Abstract. We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The rst one [Myers, J. of the ACM, 1999], searches for a pattern of length m in...
Optimal exact and fast approximate two dimensional pattern matching allowing rotations (2002)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
Abstract. We give fast ltering algorithms to search for a 2{ dimensional pattern in a 2{dimensional text allowing any rotation of the pattern. We consider the cases of exact and approximate matching...
Fully dynamic spatial approximation trees (2002)
The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data structures in...
A Metric Index for Approximate String Matching (2002)
We present a totally new indexing approach for approximate string matching. The scheme uses the triangular inequality property satisfied by the edit distance and can be applied to any other distance...
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence.
Approximate Regular Expression Searching with Arbitrary Integer Weights (2002)
We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst case time O(mn= log k s), where s is the amount of main memory...
Faster Bit-parallel Approximate String Matching (2002)
We present a new bit-parallel technique for approximate string matching. We build on two previous techniques. The first one [Myers, J. of the ACM, 1999], searches for a pattern of length m in a text...
Indexing Text using the Ziv-Lempel Trie (2002)
Let a text of u characters over an alphabet of size be compressible to n symbols by the LZ78 or LZW algorithm. We show how to build a data structure, called the LZ-index, based on the Ziv-Lempel trie...
Approximate Regular Expression Searching with Arbitrary Integer Weights (2002)
We present a bit-parallel technique to search a text of length n for a regular expression of m symbols permitting k differences in worst case time O(mn/log_k s), where s is the amount of main memory...
Indexing Text using the Ziv-Lempel Trie (2002)
Let a text of u characters over an alphabet of size oe be compressible to n symbols by the LZ78 or LZW algorithm. We show that it is possible to build a data structure based on the Ziv-Lempel trie...
Algorithms for Transposition Invariant String Matching (2002)
Veli Mäkinen, Gonzalo Navarro, Esko Ukkonen
Given strings A and B over an alphabet Sigma \subset U, where U is some numerical universe closed under addition and subtraction, and a distance function d(A, B) that gives the score of the best...
Fully dynamic spatial approximation trees (2002)
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex...
Flexible pattern matching (2002)
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In this...
Flexible pattern matching (2002)
Abstract An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In...
Fully dynamic spatial approximation trees (2002)
Gonzalo Navarro, Nora Reyes, Depto De Informática, San Luis
Abstract. The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data...
Fully dynamic spatial approximation trees (2002)
Gonzalo Navarro, Nora Reyes, San Luis
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex...
Fully dynamic spatial approximation trees (2002)
Gonzalo Navarro, Nora Reyes, San Luis
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex...
Flexible pattern matching (2002)
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In this...
G.: Improved antidictionary based compression (2002)
Maxime Crochemore, Gonzalo Navarro
The compression of binary texts using antidictionaries is a novel technique based on the fact that some substrings (called “antifactors”) never appear in the text. Let�be an antifactor,...
NR-grep: a fast and flexible pattern matching tool (2001)
We present nrgrep ("nondeterministic reverse grep"), a new pattern matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as...
Indexing Methods for Approximate String Matching (2001)
Gonzalo Navarro, Ricardo Baeza-yates, Erkki Sutinen, Jorma Tarhio
Indexing for approximate text searching is a novel problem receiving much attention because of its applications in signal processing, computational biology and text retrieval, to name a few. We...
Gonzalo Navarro, Mathieu Ranot
The problem of fast exact and approximate searching for a pattern that contains classes of characters and bounded size gaps (CBG) in a text has a wide range of applications, among which a very...
Regular expression searching over Ziv-Lempel compressed text (2001)
We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length...
Approximate Matching of Run-Length Compressed Strings (2001)
Veli Mkinen, Gonzalo Navarro, Esko Ukkonen
Abstract. We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest...
Faster approximate string matching over compressed text (2001)
Gonzalo Navarro, Takuya Kida, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa
Approximate string matching on compressed text was a problem open during almost a decade. The two existing solutions are very recent. Despite that they represent important complexity breakthroughs,...
A guided tour to approximate string matching (2001)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval...
Towards measuring the searching complexity of metric sapces (2001)
Abstract. In this paper we introduce a new measure of the intrinsic searching complexity of a general metric space. This measure re ects the expected behavior of the search algorithms on the metric...
Searching in metric spaces (2001)
Gonzalo Navarro, Ricardo Baeza-yates
The problem of searching the elements of a set which are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from...
Approximate Matching of Run-Length Compressed Strings (2001)
Veli Mkinen, Gonzalo Navarro, Esko Ukkonen
Abstract. We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest...
Towards measuring the searching complexity of metric sapces (2001)
In recent times searching in metric spaces for proximity queries has deserved attention in the algorithmic community due to applications linked to Web searching, data mining and multimedia retrieval....
A probabilistic spell for the curse of dimensionality (2001)
Abstract. Range searches in metric spaces can be very difficult if the space is "high dimensional", i.e. when the histogram of distances has a large mean and/or a small variance....
A guided tour to approximate string matching (2001)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval...
Regular Expression Searching over Ziv-Lempel Compressed Text (2001)
We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length...
A guided tour to approximate string matching (2001)
We survey the current techniques to cope with the problem of string matching that allows errors. This is becoming a more and more relevant issue for many fast growing areas such as information...
Dynamic spatial approximation trees (2001)
Abstract. The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data...
Navarro, Gonzalo, Fuentes, Alfredo
El estudio fitosociológico en un área de las tierras bajas del este en Santa Cruz-Bolivia permite describir las asociaciones boscosas regionales de la Llanura Chaqueña, del Escudo Precámbrico...
New models and algorithms for multidimensional approximate pattern matching (2000)
Ricardo Baeza-yates, Gonzalo Navarro
ABSTRACT: We focus on how to compute the edit distance (or similarity) between two images and the problem of approximate string matching in two dimensions, that is, to find a pattern of size mm in a...
Josu Kuri, Gonzalo Navarro, Ludovic M, Laurent Heye
Abstract. We present a pattern matching approach to the problem of misuse detection in a computer system, which is formalized as the problem of multiple approximate pattern matching. This permits...
An index for two dimensional string matching allowing rotations (2000)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
Abstract. We present an index to search a two-dimensional pattern of size m \Theta m in a twodimensional text of size n \Theta n, even when the pattern appears rotated in the text. The index is based...
Approximate string matching over Ziv-Lempel compressed text (2000)
Juha Karkkainen, Gonzalo Navarro, Esko Ukkonen
Abstract. We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants....
Approximate string matching over Ziv-Lempel compressed text (2000)
Abstract. We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The key idea is that, despite that we cannot exactly choose which text characters to inspect, we can...
Compression: A key for next-generation text retrieval systems (2000)
Nivio Ziviani, Edleno Silva Moura, Gonzalo Navarro, Ricardo Baeza-yates
In this article we discuss recent methods for compressing the text and the index of text retrieval systems. By compressing both the complete text and the index, the total amount of space is less than...
Binary searching with non-uniform costs and its application to text retrieval (2000)
Gonzalo Navarro, Eduardo Fernandes Barbosa, Ricardo Baeza-yates, Walter Cunto, Nivio Ziviani
We study the problem of minimizing the expected cost of binary searching for data where the access cost is not fixed and depends on the last accessed element, such as data stored in magnetic or...
A hybrid indexing method for approximate string matching (2000)
Gonzalo Navarro, Ricardo Baeza-yates
ABSTRACT: We present a new indexing method for the approximate string matching problem. The method is based on a suffix array combined with a partitioning of the pattern. We analyze the resulting...
An index for two dimensional string matching allowing rotations (2000)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
Abstract. We present an index to search a two-dimensional pattern of size m \Theta m in a two-dimensional text of size n \Theta n, even when the pattern appears rotated in the text. The index is...
Ricardo Baeza-yates, Gonzalo Navarro
Despite that several models to structure text documents and to query on this structure have been proposed in the past, a standard has emerged only relatively recently with the introduction of XML and...
Adding Compression to Block Addressing Inverted Indexes (2000)
Gonzalo Navarro, Marden Neubert, Nivio Ziviani, Ricardo Baeza-yates
Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval. Modern text...
Fast Filters for Two Dimensional String Matching Allowing Rotations (2000)
Kimmo Fredriksson, Gonzalo Navarro, Esko Ukkonen
We give faster algorithms for searching a 2-dimensional pattern in a 2-dimensional text allowing rotations, mismatches and/or insertion/deletion errors.
Indexing Text with Approximate q-grams (2000)
Gonzalo Navarro, Erkki Sutinen, Jorma Tarhio
We present a new index for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q, at fixed intervals and stores their positions. At search...
Josué Kuri, Gonzalo Navarro, Ludovic Mé, Laurent Heye
Abstract. We present a pattern matching approach to the problem of misuse detection in a computer system, which is formalized as the problem of multiple approximate pattern matching. This permits...
Approximate matching of run-length compressed strings (1999)
We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common...
Bounding the expected length of longest common subsequences and forests (1999)
Abstract. We present two techniques to find lower and upper bounds for the expected length of longest common subsequences and forests of two random sequences of the same length, over a fixed size,...
Fast Multipattern Search Algorithms for Intrusion Detection (1999)
We present new search algorithms to detect the occurrences of any pattern from a given pattern set in a text, allowing in the occurrences a limited number of spurious text characters among those of...
Fast Multipattern Search Algorithms for Intrusion Detection (1999)
We present new search algorithms to detect the occurrences of any pattern from a given pattern set in a text, allowing in the occurrences a limited number of spurious text characters among those of...
Searching in metric spaces by spatial approximation (1999)
Abstract. We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them which satisfies the triangle...
Faster approximate string matching (1999)
Ricardo Baeza-yates, Gonzalo Navarro
We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a non-deterministic finite automaton built from the pattern and using the text as...
Bounding the expected length of longest common subsequences and forests (1999)
Gonzalo Navarro, Rodrigo Scheihing, Blanco Encalada
We present improvements to two techniques to find lower and upper bounds for the expected length of longest common subsequences and forests of two random sequences of the same length, over a fixed...
A new indexing method for approximate string matching (1999)
Gonzalo Navarro, Ricardo Baeza-yates
Abstract. We present a new indexing method for the approximate string matching problem. The method is based on a suffix tree combined with a partitioning of the pattern. We analyze the resulting...
Fast Multipattern Search Algorithms for Intrusion Detection (1999)
We present new search algorithms to detect the occurrences of any pattern from a given pattern set in a text, allowing in the occurrences a limited number of spurious text characters among those of...
Fast Regular Expression Search (1999)
Gonzalo Navarro And, Gonzalo Navarro
. We present a new algorithm to search regular expressions, which is able to skip text characters. The idea is to determine the minimum length ` of a string matching the regular expression,...
Bounding the Expected Length of Longest Common Subsequences and Forests (1999)
. We present two techniques to find lower and upper bounds for the expected length of longest common subsequences and forests of two random sequences of the same length, over a fixed size, uniformly...
Faster Approximate String Matching (1999)
We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a non-deterministic finite automaton built from the pattern and using the text as...
Very fast and simple approximate string matching (1999)
Gonzalo Navarro, Ricardo Baeza-yates
Abstract We improve the fastest known algorithm for approximate string matching. This algorithm can only be used for low error levels. By using a new algorithm to verify potential matches and a new...
Searching in Metric Spaces by Spatial Approximation (1999)
We propose a new data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them, which satisfies the triangular inequality....
A uni model for similarity searching (1999)
Gonzalo Navarro, Ricardo Baeza-yates, Gonzalo Navarro, Ricardo Baeza-yates
Abstract. The indexing algorithms and data structures for similarity seaching in metric spaces seem to emerge from a great diversity, and different approaches have been proposed and analyzed...
Ricardo Baeza-yates, Gonzalo Navarro
We present three new algorithms for on-line multiple string matching allowing errors. These are extensions of previous algorithms that search for a single pattern. The average running time achieved...
Improved approximate pattern matching on hypertext (1998)
Abstract. The problem of approximate pattern matching on hypertext is defined and solved by Amir et al. in O(m(n log m + e)) time, where m is the length of the pattern, n is the total text size and e...
Fast multi-dimensional approximate pattern matching (1998)
Gonzalo Navarro, Ricardo Baeza-yates
Abstract. We address the problem of approximate string matching in d dimensions, that is, to find a pattern of size m
A practical q-gram index for text retrieval allowing errors (1998)
Gonzalo Navarro, Ricardo Baeza-yates
We propose an indexing technique for approximate text searching, which is practical and powerful, and especially optimized for natural language text. Unlike other indices of this kind, it is able to...
Fast approximate string matching in a dictionary (1998)
Ricardo Baeza-yates, Gonzalo Navarro
A successful technique to search large textual databases allowing errors relies on an online search in the vocabulary of the text. To reduce the time of that online search, we index the vocabulary as...
Improved approximate pattern matching on hypertext (1998)
The problem of approximate pattern matching on hypertext is defined and solved by Amir et al. in O(m(n log m+ e)) time, where m is the length of the pattern, n is the total text size and e is the...
Improving an Algorithm for Approximate Pattern Matching (1998)
Gonzalo Navarro, Ricardo Baeza-yates
We study a recent algorithm for fast on-line approximate string matching. This is the problem of searching a pattern in a text allowing errors in the pattern or in the text. The algorithm is based on...
A Model and a Visual Query Language for Structured Text (1998)
Gonzalo Navarro, Depto De Inform'atica
We present a new model to query document databases by content and structure. The main merits of the model are: it allows rich structure in the documents; the query algebra is intuitive (moreover,...
A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text (1998)
Gonzalo Navarro, Mathieu Raffinot
We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search a pattern in a text without uncompressing. This is a highly relevant issue, since it is...
Fast and Flexible String Matching by Combining Bit-parallelism and Suffix Automata (1998)
Gonzalo Navarro, Mathieu Raffinot
Several string matching algorithms exist, the most famous are Knuth-Morris-Pratt (KMP), BoyerMoore (BM), and some variations on BM, like Hoorspool and Sunday. Most of these algorithms rely on...
Experimental Analysis of a Parallel Quicksort-Based Algorithm for Suffix Array Generation (1998)
Autran Macedo, Marco Antonio Cristo, Elaine Spinola Silva, Denilson Moura Barbosa, Joao Paulo Kitajima, Moura Barbosa, ...
. This paper presents experiments performed with an implementation of a quicksort-based parallel indexing algorithm. Besides the expected reduction in execution time, it was observed that the word...
Fast Searching on Compressed Text Allowing Errors (1998)
Gonzalo Navarro, Nivio Ziviani, Ricardo Baeza-yates
We present a fast compression and decompression scheme for natural language texts that allows efficient and flexible string matching by searching the compressed text directly. The compression scheme...
A Model and a Visual Query Language for Structured Text (1998)
Ricardo Baeza-Yates, Gonzalo Navarro, Jesús Vegas, Depto De Inform'atica
We present a new model to query document databases by content and structure. The main merits of the model are: it allows rich structure in the documents; the query algebra is intuitive (moreover,...
A practical index for text retrieval allowing errors (1997)
Ricardo Baeza-yates, Gonzalo Navarro
We propose a text indexing technique for approximate pattern matching, which is practical and especially aimed at Information Retrieval (IR). Unlike other indices of this kind, it is able to retrieve...
Block-addressing indices for approximate text retrieval (1997)
Ricardo Baeza-yates, Gonzalo Navarro
Although the issue of approximate text retrieval is gaining importance in the last years, it is currently addressed by only a few indexing schemes. To reduce space requirements, the indices may point...
Proximal Nodes: A Model to Query Document Databases by Content and Structure (1997)
Gonzalo Navarro, Ricardo Baeza-yates
A model to query document databases by both their content and structure is presented. The
Multiple approximate string matching (1997)
Ricardo Baeza-yates, Gonzalo Navarro
Abstract. We present two new algorithms for on-line multiple approximate string matching. These are extensions of previous algorithms that search for a single pattern. The single-pattern version of...
Indexing compressed text (1997)
Gonzalo Navarro, Nivio Ziviani
Abstract. We present a technique to build an index based on suffix arrays for compressed texts. We also propose a compression scheme for textual databases based on words that generates a compression...
Atmospheric extinction 0.65 (1997)
Jo~ao Paulo Kitajima, Gonzalo Navarro, Nivio Ziviani
Abstract. An algorithm for the distributed computation of suffix arrays for large texts is presented. The parallelism model is that of a set of sequential tasks which execute in parallel and exchange...
Distributed generation of suffix arrays (1997)
Gonzalo Navarro, Jo~ao Paulo Kitajima, Nivio Ziviani
Abstract. An algorithm for the distributed computation of suffix arrays for large texts is presented. The parallelism model is that of a set of sequential tasks which execute in parallel and exchange...
Block-addressing indices for approximate text retrieval (1997)
Ricardo Baeza-yates, Gonzalo Navarro
We focus on indices for approximate word retrieval. This is an issue which is gaining importance in the last years. At the present time only a few indexing schemes, mostly based on inverted lists,...
Block Addressing Indices for Approximate Text Retrieval (1997)
Ricardo Baeza-yates, Gonzalo Navarro
The issue of reducing the space overhead when indexing large text databases is becoming more and more important, as the text collections grow in size. Another subject, which is gaining importance as...
Proximal Nodes: A Model to Query Document Databases by Content and Structure (1997)
Gonzalo Navarro, Ricardo Baeza-yates
this paper is to present a model to structure and query document databases, following the new trend of mixing content and structure in queries. The model is shown to be expressive and efficiently...
Indexing Methods for Approximate Text Retrieval (Extended Abstract) (1997)
Ricardo Baeza-yates, Gonzalo Navarro, Erkki Sutinen, Jorma Tarhio
While the problem of on-line approximate string matching is well studied, only recently the first off-line indexing techniques have emerged. We study the different indexing mechanisms for this...
A Partial Deterministic Automaton for Approximate String Matching (1997)
. One of the simplest approaches to approximate string matching is to consider the associated non-deterministic finite automaton and make it deterministic. Besides automaton generation, the search...
Multiple Approximate String Matching by Counting (1997)
. We present a very simple and efficient algorithm for online multiple approximate string matching. It uses a previously known counting-based filter [9] that searches for a single pattern by quickly...
Distributed generation of suffix arrays (1997)
Gonzalo Navarro, Kitajima Berthier, A. Ribeiro-neto, Nivio Ziviani Y
Catálogo ecológico preliminar de las cactáceas de Bolivia (1996)
Navarro, G. Catálogo ecológico preliminar de las cactáceas de Bolivia. Lazaroa 17:33-84 (1996). Se presenta una aproximación al catálogo de las cactáceas bolivianas ordenado alfabéticamente,...
A class of linear algorithms to process sets of segments (1996)
Gonzalo Navarro, Ricardo Baeza-yates
We address the problem of efficiently performing operations on sets of segments. While current solutions to operate segments focus on single operations (e.g. insertion or searching), we are...
An optimal index for Pat arrays (1996)
Abstract. We study the problem of keeping in main memory an index for a large PAT array stored in secondary storage. This index is used to minimize the number of accesses to secondary memory,...
A Fast Heuristic for Approximate String Matching (1996)
Ricardo Baeza-yates, Gonzalo Navarro
This work has been supported in part by FONDECYT grants 1950622 and 1960881. Abstract. We study a fast algorithm for on-line approximate string matching. It is based on a non-deterministic finite...
A Faster Algorithm for Approximate String Matching (1996)
Ricardo Baeza-yates, Gonzalo Navarro
Abstract. We present a new algorithm for on-line approximate string matching. The algorithm is based on the simulation of a non-deterministic finite automaton built from the pattern and using the...
A Faster Algorithm for Approximate String Matching (Extended Abstract) (1996)
Ricardo Baeza-yates, Gonzalo Navarro
Ricardo Baeza-Yates Gonzalo Navarro Department of Computer Science University of Chile Blanco Encalada 2120 - Santiago - Chile frbaeza,gnavarrog@dcc.uchile.cl Abstract We present a new algorithm for...
Expressive power of a new model for structured text databases (1995)
Gonzalo Navarro, Ricardo Baeza-yates
This paper studies the expressivity of a new model for structuring and querying textual databases by both the structure and contents of the text. The key idea of the model is a set-oriented query...
Los enebrales rastreros oromediterráneos del Sector Ibérico soriano (1985)
Rivas Martínez, Salvador, Navarro, Gonzalo, Mendiola Ubillos, María Angeles, Tarazona Lafarga, Teresa
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In this...