CRISPResso2 provides accurate and rapid genome editing sequence analysis

K Clement, H Rees, MC Canver, JM Gehrke, R Farouni, JY Hsu, MA Cole, DR Liu, JK Joung…
Nature Biotechnology, 2019
To the Editor: The field of genome editing is advancing rapidly1, most recently exemplified by the advent of base editors that enable changing single nucleotides in a predictable manner2–4. For the validation and characterization of genome editing experiments, targeted amplicon sequencing has become the gold standard5. Here we present a substantially updated version of our CRISPResso tool6 to facilitate the analysis of data that would be difficult to handle with existing tools6–9. CRISPResso2 introduces five key innovations: first, comprehensive analysis of sequencing data from base editors; second, a batch mode for analyzing and comparing multiple editing experiments; third, allele-specific quantification of heterozygous or polymorphic references; fourth, a biologically informed alignment algorithm; and fifth, ultrafast processing. We discuss each of these in turn below.
Our updated software allows users to readily quantify and visualize amplicon sequencing data from base-editing experiments. It takes raw FASTQ sequencing files as input and outputs reports describing the frequencies and efficiencies of base-editing activity, plots showing base substitutions across the entire amplicon region (Fig. 1a), and nucleotide substitution frequencies for a region specified by the user (Fig. 1b). Users can also specify the nucleotide substitution (for example, C→T or A→G) that is relevant for the base editor used, and the software produces publication-quality plots for nucleotides of interest, with heat maps showing conversion efficiency.
We also improved the processing time and memory usage of CRISPResso2 so that users can analyze, visualize and compare results from hundreds of genome editing experiments using batch functionality. This is particularly useful when many input FASTQ files must be aligned to the same amplicon or share the same guides, because the genome editing efficiencies and outcomes can then be visualized together.
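As an illustration of the kind of per-position quantity described above, the following minimal Python sketch computes the fraction of reads showing a chosen conversion (C→T by default) at every reference position carrying the source base, and repeats this over several samples in the spirit of a batch comparison. It is not CRISPResso2's implementation: the function names `conversion_frequencies` and `batch_conversion`, the toy amplicon and the sample names are all hypothetical, and the sketch assumes the reads have already been aligned to the amplicon without gaps, so that read position and reference position coincide.

```python
from collections import Counter


def conversion_frequencies(amplicon, aligned_reads, ref_base="C", alt_base="T"):
    """Map each amplicon position holding ref_base to the fraction of reads with alt_base.

    Assumption: every read is gap-free and has the same length as the amplicon.
    """
    freqs = {}
    for pos, base in enumerate(amplicon):
        if base != ref_base:
            continue  # only positions where the reference carries the source base
        counts = Counter(read[pos] for read in aligned_reads)
        total = sum(counts.values())
        freqs[pos] = counts[alt_base] / total if total else 0.0
    return freqs


def batch_conversion(amplicon, samples, ref_base="C", alt_base="T"):
    """Apply conversion_frequencies to each sample in a {name: reads} mapping."""
    return {
        name: conversion_frequencies(amplicon, reads, ref_base, alt_base)
        for name, reads in samples.items()
    }


# Toy data (hypothetical): one amplicon shared by two samples.
amplicon = "ACGTCA"
samples = {
    "treated": ["ACGTCA", "ATGTCA", "ATGTTA", "ACGTCA"],
    "control": ["ACGTCA", "ACGTCA"],
}

table = batch_conversion(amplicon, samples)
for name, freqs in table.items():
    print(name, freqs)
```

Keeping the per-sample function separate from the batch wrapper mirrors the situation in the text, where many FASTQ files share one amplicon and the results are compared side by side.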
In addition, CRISPResso2 generates intuitive plots to show the nucleotide frequencies and indel rates at each position in each sample. This allows users to easily visualize the results and extent of editing in their experiments for different enzymes (Fig. 1c).
In cases where the genome editing target contains more than one allele (for example, when heterozygous single nucleotide polymorphisms (SNPs) are present), genome editing on each allele must be quantified separately, even though reads from both alleles are amplified and mixed in the same input FASTQ file. Current strategies are not capable of analyzing multiple reference alleles and may lead to incorrect quantification. CRISPResso2 enables allele-specific quantification by aligning individual reads to each allelic variant and assigning each read to the most closely aligned allele. Downstream processing is performed separately for each allele so that insertions, deletions or substitutions that distinguish each allele are not confounded with genome editing. To demonstrate the utility of our approach, we reanalyzed amplicon sequencing data from a mouse with a heterozygous SNP at the Rho gene in which an engineered SaCas9-KKH nuclease was directed to the P23H mutant allele10. CRISPResso2 deconvoluted reads, quantified insertions and deletions from each allele, and produced intuitive visualizations of experimental outcomes (Fig. 1d).
Existing amplicon sequencing analysis toolkits ignore the biological understanding of genome editing and instead optimize the alignment on the basis of sequence identity only. However, this can lead to incorrect quantification of indel events, especially in sequences with short repetitive subsequences where the location of indels may …
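The allele-specific assignment step described above (each read goes to the most closely aligned allele) can be sketched in Python as follows. This is a simplified stand-in, not the tool's algorithm: it replaces true alignment with a Hamming distance, which assumes gap-free reads of the same length as the references, and it sets aside reads that tie between alleles. The names `hamming` and `assign_reads`, the allele labels and the toy sequences are hypothetical and are not the Rho sequences from the reanalysis.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_reads(alleles, reads):
    """Assign each read to the allele with the fewest mismatches.

    alleles: {allele_name: reference_sequence}
    Returns (assigned, ambiguous), where assigned maps allele name to its reads
    and ambiguous collects reads that are equally close to more than one allele.
    """
    assigned = {name: [] for name in alleles}
    ambiguous = []
    for read in reads:
        scores = {name: hamming(seq, read) for name, seq in alleles.items()}
        best = min(scores.values())
        winners = [name for name, score in scores.items() if score == best]
        if len(winners) == 1:
            assigned[winners[0]].append(read)
        else:
            ambiguous.append(read)
    return assigned, ambiguous


# Toy references (hypothetical) differing by a single SNP at position 4.
alleles = {"wild_type": "ACGTCCGA", "mutant": "ACGTACGA"}
reads = ["ACGTCCGA", "ACGTACGA", "ACGTACGT", "ACGTGCGA"]

assigned, ambiguous = assign_reads(alleles, reads)
print(assigned)
print(ambiguous)
```

Once reads are partitioned this way, any downstream quantification can be run per allele, so the SNP that distinguishes the alleles is not counted as an editing event.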