Abstract
The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate that Harmony outperforms previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.
Data availability
All data analyzed in this article are publicly available through online sources. We included links to all data sources in Supplementary Table 8.
Code availability
Harmony and LISI are available as R packages on https://github.com/immunogenomics/harmony and https://github.com/immunogenomics/lisi. Scripts to reproduce results of the primary analyses will be made available on https://github.com/immunogenomics/harmony2019. Additionally, vignettes are included as Supplementary Notes. Supplementary Note 1 provides a detailed walkthrough of Harmony, connecting theoretical algorithm components to their code implementations. Supplementary Note 2 demonstrates the LISI metric and how to evaluate its statistical significance. Supplementary Note 3 uses Harmony with simulated datasets.
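For readers unfamiliar with the quantity behind LISI, the following is a minimal sketch of an inverse Simpson's index computed over each cell's local neighborhood. It is not the LISI package's implementation: the function names `inverse_simpson` and `local_inverse_simpson` are hypothetical, and the sketch assumes uniform weights over the k nearest neighbors in place of whatever neighborhood weighting the package applies. A score near 1 means the neighborhood contains a single dataset label; a score near the number of labels means the labels are evenly mixed.

```python
import math
from collections import Counter


def inverse_simpson(labels):
    """Inverse Simpson's index: 1 / sum_b p_b^2, where p_b is the share of label b."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 1.0 / sum((c / total) ** 2 for c in counts.values())


def local_inverse_simpson(points, labels, k):
    """Score each point by the label diversity of its k nearest neighbors.

    Assumption for illustration: neighbors are weighted uniformly and the
    point itself is excluded from its own neighborhood.
    """
    scores = []
    for i, p in enumerate(points):
        # Euclidean distance from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbor_labels = [labels[j] for _, j in dists[:k]]
        scores.append(inverse_simpson(neighbor_labels))
    return scores


# Toy one-dimensional embeddings with two dataset labels, "A" and "B".
separated_points = [(0.0,), (1.0,), (2.0,), (3.0,),
                    (100.0,), (101.0,), (102.0,), (103.0,)]
separated_labels = ["A"] * 4 + ["B"] * 4
separated = local_inverse_simpson(separated_points, separated_labels, k=3)

mixed_points = [(float(i),) for i in range(8)]
mixed_labels = ["A", "B"] * 4
mixed = local_inverse_simpson(mixed_points, mixed_labels, k=4)
```

In the separated toy embedding every neighborhood holds one label, so each score is 1. In the interleaved embedding, interior points see both labels in equal proportion and score 2, the number of labels.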
Change history
26 August 2020
In the supplementary information originally posted for this article, the Supplementary Results and Supplementary Notes 1–3 were missing. The error has been corrected online.
Acknowledgements
This work was supported in part by funding from the National Institutes of Health (grant nos. UH2AR067677, U19AI111224 and 1R01AR063759 (to S.R.) and T32 AR007530-31 (to I.K.)). We thank members of the Raychaudhuri and Brenner labs for comments and discussion. I.K. and K.W. were funded as part of a collaborative research agreement with F. Hoffmann-La Roche Ltd (Basel, Switzerland), to S.R. and M.B.B.
Author information
Contributions
S.R. and I.K. conceived the research. I.K. led computational work under the guidance of S.R., assisted by N.M., P.L., J.F. and K.S. All authors participated in interpretation and writing the manuscript.
Ethics declarations
Competing interests
I.K. does paid bioinformatics consulting through Brilyant LLC.
Additional information
Peer review information Nicole Rusk was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–19, Supplementary Results and Supplementary Notes 1–3.
Supplementary Software 1
Harmony R package. Software to perform Harmony integration analysis.
Supplementary Software 2
LISI R package. Software to compute the Local Inverse Simpson’s Index.
Supplementary Tables 1–8
Jurkat LISI, Time benchmark, Memory Benchmark, HCA LISI, PBMC LISI, Inhibitory, Excitatory, Data Sources.
Cite this article
Korsunsky, I., Millard, N., Fan, J. et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16, 1289–1296 (2019). https://doi.org/10.1038/s41592-019-0619-0