159 Refining TCR clonotype identification with long-read sequencing technique

Background

Beyond conventional short-read next-generation sequencing (NGS), long-read sequencing technologies such as Oxford Nanopore Technologies (ONT)1 enhance the coverage and resolution of genomic fragments, which is crucial for detecting T cell receptors (TCRs), known for their variability arising from VDJ recombination.2 Short-read NGS sequences from the 3’ or 5’ end of the cDNA template; in 3’ short-read sequencing, adapters and barcodes near the constant region hinder capture of the variable region. Long-read sequencing captures the entire TCR region, providing better resolution of the complete CDR3.3 However, long reads have higher error rates,4 requiring a robust bioinformatic pipeline.5 6 We aim to improve the accuracy and reliability of TCR repertoire reconstruction, enhancing clonotype identification for T-cell vaccine development.

Methods

Human PBMCs were processed into full-length TCR libraries using Chromium Next GEM Single Cell 5’ v2 (10x Genomics) and sequenced on an Illumina HiSeq X, producing the short-read single-cell TCR (scTCR) dataset (figure 1A). A separate aliquot of the full-length TCR cDNA was sequenced using the ONT Ligation Sequencing Kit V14 and a PromethION Flow Cell (figure 1A), with Dorado duplex basecalling, producing the long-read scTCR dataset.

In the long-read scTCR dataset, cell barcodes were identified by their position following the 5’ adaptor (figure 2A). In the scTCR dataset, where adaptors had already been removed, barcodes were located using the TSO sequence. Extracted barcodes were validated against the known 10x barcode whitelist,7 and the overlaps were visualized using the VennDiagram R package. Reads were then segregated by unique barcode to identify cell-specific TCR alpha/beta chains. Hierarchical clustering was performed on pairwise Mash distances8 using Ward’s linkage. TCR alpha/beta clusters were verified using MiXCR-align.9 Sequencing errors were assessed by aligning reads to the MiXCR reference with minimap2,10 and visualized in IGV.
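The position-based barcode step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used here: the adaptor sequence, the 16-nt barcode length, and the function names `extract_barcode` and `split_by_whitelist` are hypothetical choices, and the adaptor is located by exact match, which would miss adaptors containing sequencing errors.

```python
from typing import Iterable, Optional, Set, Tuple

# Assumed values for illustration only; they are not stated in the abstract.
READ1_ADAPTER = "CTACACGACGCTCTTCCGATCT"
BARCODE_LEN = 16


def extract_barcode(read: str,
                    adapter: str = READ1_ADAPTER,
                    barcode_len: int = BARCODE_LEN) -> Optional[str]:
    """Return the bases that directly follow the adaptor, or None."""
    pos = read.find(adapter)  # exact match; error-tolerant search is omitted
    if pos == -1:
        return None
    start = pos + len(adapter)
    barcode = read[start:start + barcode_len]
    # Reject reads that end before a full-length barcode is available.
    return barcode if len(barcode) == barcode_len else None


def split_by_whitelist(barcodes: Iterable[str],
                       whitelist: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split unique barcodes into whitelist matches and non-matches."""
    unique = set(barcodes)
    matched = unique & whitelist
    return matched, unique - matched
```

The two sets returned by `split_by_whitelist` correspond to the matched and unmatched regions that a Venn diagram of barcodes versus whitelist would display.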

Results

We introduce a position-based cell barcode identification approach (figure 2A) to aid cell-specific TCR clonotype determination. Only a small fraction of cell barcodes from the long-read scTCR dataset matched the 10x whitelist, while the majority did not (figure 2B). Over half of the scTCR dataset’s barcodes also failed to match, suggesting that barcode correction is needed. Using reads from a selected cell, clustering revealed two main sequence clusters (figure 3A), one enriched in alpha chains and the other in beta chains (figure 3B, top; VDJ genes are detailed at the bottom). IGV visualization confirmed consistent basecalls (figure 3C).
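The clustering relies on pairwise Mash distances, defined in the Mash paper8 as D = -(1/k) ln(2j / (1 + j)), where j is the Jaccard index of the two sequences' k-mer sets. The sketch below computes this distance from exact k-mer sets rather than from MinHash sketches as Mash itself does; the k-mer size, the clamping to [0, 1], and the function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(a: str, b: str, k: int = 11) -> float:
    """Mash distance from the exact Jaccard index of two k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    j = len(ka & kb) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: treated as maximal distance
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))  # clamped to [0, 1] in this sketch


def distance_matrix(seqs: Sequence[str], k: int = 11) -> List[List[float]]:
    """Symmetric matrix of pairwise Mash distances."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = mash_distance(seqs[i], seqs[j], k)
    return m
```

A matrix of this form is what a hierarchical clustering routine with Ward's linkage would take as input; reads from the same chain are expected to share many k-mers and therefore sit at small distances from one another.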

Conclusions

We present a workflow that refines scTCR analysis by improving cell barcode validity and enabling precise reconstruction of full-length TCRs through long-read sequencing. Further refinement with barcode and UMI correction could enhance TCR repertoire analysis. This workflow is adaptable to other single-cell/spatial TCR technologies.

Acknowledgements

This work was supported by the Bioinformatics Institute (BII), Singapore Immunology Network (SIgN), and Agency for Science, Technology and Research (A*STAR). This work was funded by H22J1a0043 and MOH-OFYIRG23jan-0021.

References

  • Lin B, Hui J, Mao H. Nanopore technology and its applications in gene sequencing. Biosensors 2021;11(7):214. https://doi.org/10.3390/bios11070214

  • Singh M, Al-Eryani G, Carswell S, Ferguson JM, Blackburn J, Barton K, Roden D, Luciani F, Giang Phan T, Junankar S, Jackson K, Goodnow CC, Smith MA, Swarbrick A. High-throughput targeted long-read single cell sequencing reveals the clonal and transcriptional landscape of lymphocytes. Nature Communications 2019;10(1):3120. https://doi.org/10.1038/s41467-019-11049-4

  • Mika J, Candéias SM, Badie C, Polanska J. Can we detect T cell receptors from long-read RNA-Seq data? In: Rojas I, Valenzuela O, Rojas F, Herrera LJ, Ortuño F (eds). Bioinformatics and Biomedical Engineering. IWBBIO 2022. Lecture Notes in Computer Science, vol 13347. Springer, Cham, 2022. https://doi.org/10.1007/978-3-031-07802-6_38

  • Senol Cali D, Kim JS, Ghose S, Alkan C, Mutlu O. Nanopore sequencing technology and tools for genome assembly: computational analysis of the current state, bottlenecks and future directions. Briefings in Bioinformatics 2019;20(4):1542–1559. https://doi.org/10.1093/bib/bby017

  • Oehler JB, Wright H, Stark Z, Mallett AJ, Schmitz U. The application of long-read sequencing in clinical settings. Human Genomics 2023;17(1):73. https://doi.org/10.1186/s40246-023-00522-3

  • Gupta S, Witas R, Voigt A, Semenova T, Nguyen CQ. Single-cell sequencing of T cell receptors: a perspective on the technological development and translational application. Advances in Experimental Medicine and Biology 2020;1255:29–50. https://doi.org/10.1007/978-981-15-4494-1_3

  • 10x Genomics. What is a barcode whitelist? Available at: https://kb.10xgenomics.com/hc/en-us/articles/115004506263-What-is-a-barcode-whitelist (Oct. 2023)

  • Ondov BD, Treangen TJ, Melsted P, et al. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol 2016;17:132. https://doi.org/10.1186/s13059-016-0997-x

  • Bolotin DA, Poslavsky S, Mitrophanov I, Shugay M, Mamedov IZ, Putintseva EV, Chudakov DM. MiXCR: software for comprehensive adaptive immunity profiling. Nat Methods 2015;12(5):380–381. https://doi.org/10.1038/nmeth.3364. PMID: 25924071.

  • Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018;34(18):3094–3100. https://doi.org/10.1093/bioinformatics/bty191

    Abstract 159 Figure 1

    Workflow of single-cell TCR analysis using short-read and long-read sequencing. (A) Peripheral blood mononuclear cells (PBMCs) were processed to generate full-length TCR cDNA libraries. Sequencing was performed using Illumina for short-read data and PromethION P2 for long-read data.

    Abstract 159 Figure 2

    Cell barcode identification and validation. (A) The relative nucleotide positions of Read1 adaptors, cell barcodes, and template switch oligos (TSO) in representative reads from the long-read scTCR data. (B) A Venn diagram illustrating the overlap and differences in cell barcodes derived from the long-read scTCR and scTCR datasets when compared against the known barcode whitelist. 14.1% of long-read scTCR barcodes matched the scTCR dataset and the 10x barcode whitelist, whereas 78.4% differed. More than 50% of the scTCR dataset’s barcodes were unmatched against the known whitelist.

    Abstract 159 Figure 3

    Segregation of sequences from a single cell into alpha and beta chains, VDJ gene identification, and nucleotide consistency assessment. (A) Using reads from a randomly selected cell, hierarchical clustering based on the pairwise Mash distance matrix clearly distinguishes two main groups. (B) MiXCR analysis: the top table indicates that the left cluster (orange dendrogram branch) predominantly consists of alpha chains (98.2%), while the right cluster (green branch) is primarily composed of beta chains (97.53%); the bottom table lists the identified VDJ gene combinations for each cluster. This indicates that our clustering approach can effectively segregate the alpha and beta chains. (C) IGV visualization of representative sequences from the beta group aligned to the TRBV2 gene, showing consistent basecalls at individual loci.
