Pending updates for multiome #1287

brianraymor · 2025-03-06T20:03:54Z

Context

@jahilton noted that this information from #1013 needs to be captured:

One key decision is to accept unpaired scATAC data. This is based on many users finding them valuable, especially because 10x multiome data can be of poor quality.
Unpaired scATAC-seq Datasets will contain the gene activity matrix (not a peak matrix). Paired scATAC-seq (e.g. 10x multiome) Datasets will contain the gene expression matrix (RNA data).
Matrix Layers table - the Accessibility (e.g. ATAC-seq, mC-seq) row can be narrowed to unpaired Accessibility (e.g. ATAC-seq, mC-seq).
This distinction will need to be communicated clearly to the user outside the schema.

…and for clarity, the scRNA-seq (UMI, e.g. 10x v3, Slide-seqV2) can have 10x multiome added to the list

Design (@brianraymor)

@jahilton - I could move the definitions for paired and unpaired to the X (Matrix Layers) section. Another approach is to inline the gene activity matrix requirement in the table row with unpaired accessibility?

`X` (Matrix Layers)

...

Definitions for scATAC-seq assays

paired assay. obs['assay_ontology_term_id'] is a descendant of both "EFO:0010891" for scATAC-seq and "EFO:0008913" for single-cell RNA sequencing. A gene expression matrix (RNA data) is required.

unpaired assay. obs['assay_ontology_term_id'] is "EFO:0010891" for scATAC-seq or a descendant and is not a descendant of "EFO:0008913" for single-cell RNA sequencing. A gene activity matrix and not a peak matrix is required.
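
As a rough illustration of the two definitions above, the check could be sketched as follows. This is a minimal sketch and not the validator's implementation: the function name, the `REQUIRED_MATRIX` mapping, and the `ancestors` argument (a set of ontology terms that the assay term descends from, which would have to come from an EFO lookup) are all assumptions made for the example.

```python
SCATAC_SEQ = "EFO:0010891"  # scATAC-seq
SC_RNA_SEQ = "EFO:0008913"  # single-cell RNA sequencing

# Matrix required for each kind of scATAC-seq assay, per the definitions above.
REQUIRED_MATRIX = {
    "paired": "gene expression matrix (RNA data)",
    "unpaired": "gene activity matrix (not a peak matrix)",
}


def classify_scatac_assay(term_id, ancestors):
    """Return "paired", "unpaired", or None for a non-scATAC-seq assay.

    `ancestors` is the set of ontology terms that `term_id` descends from,
    excluding `term_id` itself (hypothetical input for this sketch).
    """
    descends_from_atac = SCATAC_SEQ in ancestors
    descends_from_rna = SC_RNA_SEQ in ancestors
    if descends_from_atac and descends_from_rna:
        return "paired"
    if (term_id == SCATAC_SEQ or descends_from_atac) and not descends_from_rna:
        return "unpaired"
    return None
```

For example, `classify_scatac_assay("EFO:0010891", set())` yields `"unpaired"`, so `REQUIRED_MATRIX` points to the gene activity matrix for that Dataset.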

The following table describes the matrix data and layers requirements that are assay-specific. If an entry in the table is empty, the schema does not have any other requirements on data in those layers beyond the ones listed above.

| Assay | "raw" required? | "raw" location | "normalized" required? | "normalized" location |
| --- | --- | --- | --- | --- |
| scRNA-seq (UMI, e.g. 10x multiome, 10x v3, Slide-seqV2) | REQUIRED. Values MUST be de-duplicated molecule counts. Each cell MUST contain at least one non-zero value. All non-zero values MUST be positive integers stored as `numpy.float32`. | `AnnData.raw.X` unless no "normalized" is provided, then `AnnData.X` | STRONGLY RECOMMENDED | `AnnData.X` |
| Visium Spatial (e.g. V1, CytAssist) | REQUIRED. Values MUST be de-duplicated molecule counts. All non-zero values MUST be positive integers stored as `numpy.float32`. If `uns['spatial']['is_single']` is `False` then each cell MUST contain at least one non-zero value. If `uns['spatial']['is_single']` is `True` then the unfiltered feature-barcode matrix (`raw_feature_bc_matrix`) MUST be used. See Space Ranger Feature-Barcode Matrices. If `assay_ontology_term_id` is `"EFO:0022860"` for Visium CytAssist Spatial Gene Expression, 11mm, this matrix MUST contain 14336 rows; otherwise, this matrix MUST contain 4992 rows. If the `obs['in_tissue']` value is `1`, then the cell MUST contain at least one non-zero value. If any `obs['in_tissue']` values are `0`, then at least one cell corresponding to a `obs['in_tissue']` with a value of `0` MUST contain a non-zero value. | `AnnData.raw.X` unless no "normalized" is provided, then `AnnData.X` | STRONGLY RECOMMENDED | `AnnData.X` |
| scRNA-seq (non-UMI, e.g. SS2) | REQUIRED. Values MUST be one of read counts (e.g. FeatureCounts) or estimated fragments (e.g. output of RSEM). Each cell MUST contain at least one non-zero value. All non-zero values MUST be positive integers stored as `numpy.float32`. | `AnnData.raw.X` unless no "normalized" is provided, then `AnnData.X` | STRONGLY RECOMMENDED | `AnnData.X` |
| unpaired Accessibility (e.g. ATAC-seq, mCT-seq) | NOT REQUIRED | | REQUIRED | `AnnData.X` |
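
To make the "raw" rules in the scRNA-seq rows concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the schema validator: both function names are hypothetical, the matrix is modeled as a list of per-cell value lists rather than an `AnnData` object, and the `numpy.float32` storage requirement and the Visium-specific rules are not checked.

```python
def expected_raw_location(normalized_provided):
    """Where the table expects the "raw" matrix to be stored."""
    return "AnnData.raw.X" if normalized_provided else "AnnData.X"


def raw_count_violations(cells, require_nonzero_cell=True):
    """List violations of the raw-count rules for a dense stand-in matrix.

    `cells` is an iterable of per-cell value sequences (hypothetical input).
    Checks that every non-zero value is a positive integer and, optionally,
    that each cell contains at least one non-zero value.
    """
    problems = []
    for index, values in enumerate(cells):
        nonzero = [v for v in values if v != 0]
        if require_nonzero_cell and not nonzero:
            problems.append(f"cell {index}: all values are zero")
        for v in nonzero:
            # Negative, fractional, NaN, and infinite values all fail here.
            if v < 0 or not float(v).is_integer():
                problems.append(f"cell {index}: {v!r} is not a positive integer")
                break
    return problems
```

A Dataset that provides no "normalized" matrix would be expected to store its counts at `expected_raw_location(False)`, that is, `AnnData.X`.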


jahilton · 2025-03-12T21:09:18Z

LGTM

brianraymor added the 5.3 (Next minor CELLxGENE schema version after 5.2) and schema (CELLxGENE Discover dataset schema) labels Mar 6, 2025

brianraymor self-assigned this Mar 6, 2025

brianraymor mentioned this issue Mar 12, 2025

editorial updates for multiome #1293

Merged

brianraymor closed this as completed in #1293 Mar 12, 2025
