Abstract
High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation (refs. 1–3). However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
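As a rough illustration of what a switch in the dominant isoform means, the sketch below scans a time series of per-isoform abundances for one gene and reports the time points at which the most abundant isoform changes. It is not the statistical procedure used by Cufflinks or Cuffdiff, and the abstract does not state the criteria for a "complete switch". The function names, isoform labels and abundance values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def dominant_isoform(abundances: Dict[str, float]) -> str:
    """Return the isoform with the highest abundance at one time point."""
    return max(abundances, key=abundances.get)


def dominant_switches(series: List[Dict[str, float]]) -> List[Tuple[int, str, str]]:
    """Return (time_index, old_isoform, new_isoform) for every change in the
    dominant isoform between consecutive time points."""
    switches = []
    previous = dominant_isoform(series[0])
    for i, abundances in enumerate(series[1:], start=1):
        current = dominant_isoform(abundances)
        if current != previous:
            switches.append((i, previous, current))
        previous = current
    return switches


# Made-up abundances for one gene at three time points (illustrative only,
# not data from the study).
example = [
    {"isoform_A": 40.0, "isoform_B": 5.0},
    {"isoform_A": 22.0, "isoform_B": 18.0},
    {"isoform_A": 3.0, "isoform_B": 35.0},
]

print(dominant_switches(example))  # [(2, 'isoform_A', 'isoform_B')]
```

A real analysis would also need to account for uncertainty in the abundance estimates before calling a switch; this sketch compares point values only.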
Accession codes
References
Cloonan, N. et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 5, 613–619 (2008).
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
Nagalakshmi, U., Wang, Z., Waern, K., Shou, C. & Raha, D. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
Wang, E. et al. Alternative isoform regulation in human tissue transcriptomes. Nature 456, 470–476 (2008).
Denoeud, F. et al. Annotating genomes with massive-scale RNA sequencing. Genome Biol. 9, R175 (2008).
Maher, C. et al. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101 (2009).
Marioni, J., Mason, C., Mane, S., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).
Hiller, D., Jiang, H., Xu, W. & Wong, W. Identifiability of isoform deconvolution from junction arrays and RNA-Seq. Bioinformatics 25, 3056–3059 (2009).
Jiang, H. & Wong, W.H. Statistical inferences for isoform expression in RNA-Seq. Bioinformatics 25, 1026–1032 (2009).
Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A. & Dewey, C.N. RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics 26, 493–500 (2010).
Mortazavi, A., Williams, B., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).
Pepke, S., Wold, B. & Mortazavi, A. Computation for ChIP-Seq and RNA-Seq studies. Nat. Methods 6, S22–S32 (2009).
Yaffe, D. & Saxel, O. A myogenic cell line with altered serum requirements for differentiation. Differentiation 7, 159–166 (1977).
Yun, K. & Wold, B. Skeletal muscle determination and differentiation: story of a core regulatory network and its context. Curr. Opin. Cell Biol. 8, 877–889 (1996).
Tapscott, S.J. The circuitry of a master switch: Myod and the regulation of skeletal muscle gene transcription. Development 132, 2685–2695 (2005).
Trapnell, C., Pachter, L. & Salzberg, S. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).
Haas, B.J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Dilworth, R. A decomposition theorem for partially ordered sets. Ann. Math. 51, 161–166 (1950).
Eriksson, N. et al. Viral population estimation using pyrosequencing. PLOS Comput. Biol. 4, e1000074 (2008).
Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009).
Cordes, K.R. et al. miR-145 and miR-143 regulate smooth muscle cell fate and plasticity. Nature 460, 705–710 (2009).
Lareau, L.F., Inada, M., Green, R.E., Wengrod, J.C. & Brenner, S.E. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446, 926–929 (2007).
Bullard, J., Purdom, E., Hansen, K., Durinck, S. & Dudoit, S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94 (2010).
Endo, T. & Nadal-Ginard, B. Transcriptional and posttranscriptional control of c-myc during myogenesis: its mRNA remains inducible in differentiated cells and does not suppress the differentiated phenotype. Mol. Cell. Biol. 6, 1412–1421 (1986).
Fuglede, B. & Topsøe, F. in Proceedings of the IEEE International Symposium on Information Theory, 31 (2004).
Cottle, D.L., McGrath, M.J., Cowling, B.S. & Coghill, I.D. FHL3 binds MyoD and negatively regulates myotube formation. J. Cell Sci. 120, 1423–1435 (2007).
Sammeth, M., Lacroix, V., Ribeca, P. & Guigó, R. The FLUX Simulator. <http://flux.sammeth.net>.
Johnson, D., Mortazavi, A., Myers, R. & Wold, B. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497–1502 (2007).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Acknowledgements
This work was supported in part by the US National Institutes of Health (NIH) grants R01-LM006845 and ENCODE U54-HG004576, as well as the Beckman Foundation, the Bren Foundation, the Moore Foundation (Cell Center Program) and the Miller Research Institute. We thank I. Antosechken and L. Schaeffer of the Caltech Jacobs Genome Center for DNA sequencing, and D. Trout, B. King and H. Amrhein for data pipeline and database design, operation and display. We are grateful to R. K. Bradley, K. Datchev, I. Hallgrímsdóttir, J. Landolin, B. Langmead, A. Roberts, M. Schatz and D. Sturgill for helpful discussions.
Author information
Authors and Affiliations
Contributions
C.T. and L.P. developed the mathematics and statistics and designed the algorithms; B.A.W. and G.K. performed the RNA-Seq and B.A.W. designed and executed experimental validations; C.T. implemented Cufflinks and Cuffdiff; G.P. implemented Cuffcompare; M.J.v.B. and A.M. tested the software; C.T., G.P. and A.M. performed the analysis; L.P., A.M. and B.J.W. conceived the project; C.T., L.P., A.M., B.J.W. and S.L.S. wrote the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Tables 1–3, Supplementary Figs. 1–11 and Supplementary Methods (PDF 2058 kb)
Supplementary Table 4
Genes with complex isoform expression dynamics in C2C12 myogenesis (XLS 80 kb)
Rights and permissions
About this article
Cite this article
Trapnell, C., Williams, B., Pertea, G. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511–515 (2010). https://doi.org/10.1038/nbt.1621
DOI: https://doi.org/10.1038/nbt.1621