Abstract
Programmable C•G-to-G•C base editors (CGBEs) have broad scientific and therapeutic potential, but their editing outcomes have proved difficult to predict and their editing efficiency and product purity are often low. We describe a suite of engineered CGBEs paired with machine learning models to enable efficient, high-purity C•G-to-G•C base editing. We performed a CRISPR interference (CRISPRi) screen targeting DNA repair genes to identify factors that affect C•G-to-G•C editing outcomes and used these insights to develop CGBEs with diverse editing profiles. We characterized ten promising CGBEs on a library of 10,638 genomically integrated target sites in mammalian cells and trained machine learning models that accurately predict the purity and yield of editing outcomes (R = 0.90) using these data. These CGBEs enable correction to the wild-type coding sequence of 546 disease-related transversion single-nucleotide variants (SNVs) with >90% precision (mean 96%) and up to 70% efficiency (mean 14%). Computational prediction of optimal CGBE–single-guide RNA pairs enables high-purity transversion base editing at over fourfold more target sites than achieved using any single CGBE variant.
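As a rough illustration of what computational selection of a CGBE–single-guide RNA pair might look like, the Python sketch below filters candidate pairs by predicted purity and then picks the one with the highest predicted yield. The names `Prediction` and `best_pair`, the editor and sgRNA labels, the selection rule and every number are assumptions made for illustration. This is not the BE-Hive API, and it is not necessarily the procedure used in the study.

```python
from typing import NamedTuple, Optional, Sequence


class Prediction(NamedTuple):
    editor: str        # hypothetical CGBE variant label
    sgrna: str         # hypothetical sgRNA identifier
    purity: float      # predicted purity of the desired outcome (0..1), placeholder value
    yield_frac: float  # predicted yield of the desired outcome (0..1), placeholder value


def best_pair(predictions: Sequence[Prediction],
              min_purity: float = 0.9) -> Optional[Prediction]:
    """Return the highest-yield pair whose predicted purity exceeds min_purity.

    Assumed selection rule: require purity above a threshold, then maximize yield.
    Returns None when no candidate clears the threshold.
    """
    eligible = [p for p in predictions if p.purity > min_purity]
    if not eligible:
        return None
    return max(eligible, key=lambda p: p.yield_frac)


# Made-up predictions for one target site (not data from the paper).
candidates = [
    Prediction("editor_A", "sg1", purity=0.95, yield_frac=0.10),
    Prediction("editor_B", "sg1", purity=0.92, yield_frac=0.25),
    Prediction("editor_C", "sg2", purity=0.60, yield_frac=0.50),
]

choice = best_pair(candidates)
print(choice)
```

Applying the purity filter before ranking by yield mirrors the abstract's emphasis on high-purity editing. A different objective, such as a weighted combination of the two predictions, would be an equally plausible design.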
Data availability
The target library sequencing data generated during this study are available at the NCBI Sequence Read Archive database under PRJNA631290. Data from the Repair-seq screens are available under PRJNA721212. Processed target library data used for training machine learning models have been deposited under the following DOIs: https://doi.org/10.6084/m9.figshare.12275645 and https://doi.org/10.6084/m9.figshare.12275654.
Code availability
Code used for analysis of CRISPRi screens is available at https://github.com/jeffhussmann/repair-seq. Code used for target library data processing and analysis is available at https://github.com/maxwshen/lib-dataprocessing and https://github.com/maxwshen/lib-analysis, respectively. The machine learning models for CGBEs trained on target library data are available as part of the BE-Hive interactive web application at https://crisprbehive.design and the BE-Hive Python package at https://github.com/maxwshen/be_predict_efficiency and https://github.com/maxwshen/be_predict_bystander.
Change history
18 October 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41587-023-02028-8
Acknowledgements
This work was supported by US NIH (nos. U01AI142756, UG3AI150551, RM1HG009490, R35GM118062, R35GM138167 and P30CA072720), HHMI and Princeton University. B.A. acknowledges a Searle Scholars award. The authors acknowledge NSF Graduate Research Fellowships to L.W.K., M.W.S. and T.A.S.; a NWO Rubicon Fellowship to M.A.; a Jane Coffin Childs postdoctoral fellowship to A.V.A.; fellowship support from the NSF and Hertz Foundation to J.L.D.; a Helen Hay Whitney postdoctoral fellowship to G.A.N.; a Damon Runyon Postdoctoral Fellowship to D.Y.; a Singapore A*STAR NSS fellowship to B.M.; and NIH Ruth L. Kirschstein National Research Service Award no. F31NS115380 to J.M.R. J.A.H. was the Rebecca Ridley Kry Fellow of the Damon Runyon Cancer Research Foundation.
Contributions
L.W.K., M.A., M.W.S., J.A.H., A.V.A., J.S.W., B.A. and D.R.L. designed the research. L.W.K., M.A., M.W.S., J.A.H., A.V.A., J.L.D., G.A.N., D.Y., B.M., J.M.R., A.X., T.A.S. and B.A. performed experiments. J.S.W., B.A. and D.R.L. supervised the project. L.W.K. and D.R.L. wrote the manuscript with input from all authors.
Ethics declarations
Competing interests
J.A.H. is a consultant for Tessera Therapeutics. J.M.R. is a consultant for Maze Therapeutics. J.S.W. is a consultant for, and holds equity in, Maze Therapeutics, Chroma Medicine and KSQ Therapeutics. B.A. was a member of a ThinkLab Advisory Board for, and holds equity in, Celsius Therapeutics. D.R.L. is a consultant for, and holds equity in, Beam Therapeutics, Prime Medicine, Pairwise Plants and Chroma Medicine. The remaining authors declare no competing interests.
Additional information
Peer review information: Nature Biotechnology thanks Jia Chen, Leopold Parts and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Supplementary information
Supplementary Information
Supplementary Figs. 1–15, Discussion 1–6, Sequences and References.
Supplementary Tables 1–4 (41587_2021_938_MOESM3_ESM.xlsx)
Supplementary Table 1. CRISPRi sgRNA library. Supplementary Table 2. Changes in base editing outcomes for all genes in CRISPRi screens. Supplementary Table 3. Base editing outcomes in a library of disease-related alleles correctable by editing C•G to G•C or to A•T. Supplementary Table 4. CGBE targets, amplicons and oligos used for this study.
Supplementary Data 1
All C•G-to-G•C editing yield, purity and indel outcomes for all experiments in this manuscript. T-tests can be generated for any pairwise comparison in this file.
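For readers who want to run such a pairwise comparison themselves, the sketch below computes a two-sample t statistic in Python using only the standard library. The source does not say which t-test variant is intended, so Welch's unequal-variance test is used here as an assumption. The function name `welch_t` and the replicate values are hypothetical placeholders, not numbers from Supplementary Data 1.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Each sample needs at least two values and non-zero overall variance.
    """
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Placeholder replicate measurements (percent of reads) for two conditions.
condition_1 = [12.1, 13.4, 11.8]
condition_2 = [18.9, 20.2, 19.5]

t_stat, dof = welch_t(condition_1, condition_2)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Converting the statistic to a P value requires the t distribution's cumulative distribution function, which the standard library does not provide. A statistics package can supply it.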
About this article
Cite this article
Koblan, L.W., Arbab, M., Shen, M.W. et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat Biotechnol 39, 1414–1425 (2021). https://doi.org/10.1038/s41587-021-00938-z
DOI: https://doi.org/10.1038/s41587-021-00938-z
This article is cited by
- Whole-brain in vivo base editing reverses behavioral changes in Mef2c-mutant mice. Nature Neuroscience (2024)
- Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Nature Biotechnology (2024)
- CRISPR technologies for genome, epigenome and transcriptome editing. Nature Reviews Molecular Cell Biology (2024)
- Glycosylase-based base editors for efficient T-to-G and C-to-G editing in mammalian cells. Nature Biotechnology (2024)
- Adenine transversion editors enable precise, efficient A•T-to-C•G base editing in mammalian cells and embryos. Nature Biotechnology (2024)