Indexing graphs for path queries with applications in genome research

J Sirén, N Välimäki, V Mäkinen - IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2014 - ieeexplore.ieee.org
We propose a generic approach to replace the canonical sequence representation of genomes with graph representations, and study several applications of such extensions. We extend the Burrows-Wheeler transform (BWT) of strings to acyclic directed labeled graphs, to support path queries as an extension to substring searching. We develop, apply, and tailor this technique to a) read alignment on an extended BWT index of a graph representing a pan-genome, i.e., a reference genome and known variants of it; and b) split-read alignment on an extended BWT index of a splicing graph. Other possible applications include probe/primer design, alignments to assembly graphs, and alignments to a phylogenetic tree of partial-order graphs. We report several experiments on the feasibility and applicability of the approach. Especially on highly polymorphic genome regions, our pan-genome index significantly improves alignment accuracy.
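
To make the notion of a path query concrete, the following is a minimal sketch, not the paper's BWT-based index. It answers the same question naively on a small labeled DAG: which nodes start a path whose labels spell a given pattern? The function name `path_query`, the dictionary-based graph encoding, and the toy graph (a short reference with one single-base variant) are all assumptions made for illustration. The pattern is processed from its last character backward, loosely mirroring how backward search works on a BWT of a string.

```python
from typing import Dict, List


def path_query(labels: Dict[int, str],
               edges: Dict[int, List[int]],
               pattern: str) -> List[int]:
    """Return the sorted start nodes of all paths whose labels spell `pattern`.

    Naive illustration only: it scans predecessors explicitly instead of
    using a compressed index.
    """
    if not pattern:
        return sorted(labels)

    # Build predecessor lists so the pattern can be extended backward.
    preds: Dict[int, List[int]] = {v: [] for v in labels}
    for u, outs in edges.items():
        for v in outs:
            preds[v].append(u)

    # Start with every node matching the last pattern character.
    current = {v for v, c in labels.items() if c == pattern[-1]}

    # Step to predecessors that match the preceding character, one at a time.
    for ch in reversed(pattern[:-1]):
        current = {u for v in current for u in preds[v] if labels[u] == ch}
        if not current:
            break
    return sorted(current)


# Hypothetical toy pan-genome: reference G-A-T-T-A-C-A (nodes 0..6) plus a
# variant node 7 labeled G that is an alternative to node 3 (T).
LABELS = {0: "G", 1: "A", 2: "T", 3: "T", 4: "A", 5: "C", 6: "A", 7: "G"}
EDGES = {0: [1], 1: [2], 2: [3, 7], 3: [4], 7: [4], 4: [5], 5: [6]}

if __name__ == "__main__":
    print(path_query(LABELS, EDGES, "TTA"))  # path through the reference base
    print(path_query(LABELS, EDGES, "TGA"))  # path through the variant base
```

In this toy graph, both the reference read `TTA` and the variant read `TGA` match a path starting at node 2, which is the kind of match a linear reference alone would miss for the variant.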