One key decision is to accept unpaired scATAC data. Many users find these datasets valuable, especially because 10x multiome data can be of poor quality.
Unpaired scATAC-seq Datasets will contain the gene activity matrix (not a peak matrix). Paired scATAC-seq (e.g. 10x multiome) Datasets will contain the gene expression matrix (RNA data).
Matrix Layers table: the row "Accessibility (e.g. ATAC-seq, mC-seq)" can be made more specific as "unpaired Accessibility (e.g. ATAC-seq, mC-seq)".
This distinction will need to be communicated clearly to the user outside the schema.
For clarity, the row "scRNA-seq (UMI, e.g. 10x v3, Slide-seqV2)" can have 10x multiome added to its list of examples.
@jahilton - I could move the definitions for paired and unpaired to the X (Matrix Layers) section. Another approach is to inline the gene activity matrix requirement in the table row with unpaired accessibility?
X (Matrix Layers)
...
Definitions for scATAC-seq assays
paired assay. obs['assay_ontology_term_id'] is a descendant of both "EFO:0010891" for scATAC-seq and "EFO:0008913" for single-cell RNA sequencing. A gene expression matrix (RNA data) is required.
unpaired assay. obs['assay_ontology_term_id'] is "EFO:0010891" for scATAC-seq or a descendant and is not a descendant of "EFO:0008913" for single-cell RNA sequencing. A gene activity matrix and not a peak matrix is required.
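The two definitions above can be sketched as a small classifier. This is only an illustration: the `TOY:` term IDs and the `TOY_ANCESTORS` table are hypothetical stand-ins for a real EFO ancestor lookup, and `classify_scatac_assay` is not part of any existing validator.

```python
SCATAC = "EFO:0010891"  # scATAC-seq
SCRNA = "EFO:0008913"   # single-cell RNA sequencing

# Hypothetical ancestor table for illustration only. A real check would
# query the EFO ontology for the ancestors of each term.
TOY_ANCESTORS = {
    "TOY:multiome": {SCATAC, SCRNA},   # made-up term descending from both
    "TOY:atac_variant": {SCATAC},      # made-up scATAC-seq descendant
    "TOY:rna_only": {SCRNA},           # made-up scRNA-seq descendant
}


def classify_scatac_assay(term_id, ancestors=TOY_ANCESTORS):
    """Return "paired", "unpaired", or None for a non-scATAC-seq assay."""
    lineage = set(ancestors.get(term_id, set()))
    is_rna_descendant = SCRNA in lineage
    # Paired: a descendant of both scATAC-seq and single-cell RNA sequencing.
    if SCATAC in lineage and is_rna_descendant:
        return "paired"
    # Unpaired: scATAC-seq itself or a descendant, and not an RNA descendant.
    if (term_id == SCATAC or SCATAC in lineage) and not is_rna_descendant:
        return "unpaired"
    return None
```

Under these assumptions, a "paired" result would mean a gene expression matrix is required, and an "unpaired" result would mean a gene activity matrix is required.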
The following table describes the matrix data and layers requirements that are assay-specific. If an entry in the table is empty, the schema does not have any other requirements on data in those layers beyond the ones listed above.
| Assay | "raw" required? | "raw" location | "normalized" required? | "normalized" location |
|---|---|---|---|---|
| scRNA-seq (UMI, e.g. 10x multiome, 10x v3, Slide-seqV2) | REQUIRED. Values MUST be de-duplicated molecule counts. Each cell MUST contain at least one non-zero value. All non-zero values MUST be positive integers stored as numpy.float32. | AnnData.raw.X unless no "normalized" is provided, then AnnData.X | STRONGLY RECOMMENDED | AnnData.X |
| Visium Spatial (e.g. V1, CytAssist) | REQUIRED. Values MUST be de-duplicated molecule counts. All non-zero values MUST be positive integers stored as numpy.float32.<br><br>If uns['spatial']['is_single'] is False then each cell MUST contain at least one non-zero value.<br><br>If uns['spatial']['is_single'] is True then the unfiltered feature-barcode matrix (raw_feature_bc_matrix) MUST be used. See Space Ranger Feature-Barcode Matrices.<br><br>If assay_ontology_term_id is "EFO:0022860" for Visium CytAssist Spatial Gene Expression, 11mm, this matrix MUST contain 14336 rows; otherwise, this matrix MUST contain 4992 rows.<br><br>If the obs['in_tissue'] value is 1, then the cell MUST contain at least one non-zero value. If any obs['in_tissue'] values are 0, then at least one cell corresponding to an obs['in_tissue'] with a value of 0 MUST contain a non-zero value. | AnnData.raw.X unless no "normalized" is provided, then AnnData.X | STRONGLY RECOMMENDED | AnnData.X |
| scRNA-seq (non-UMI, e.g. SS2) | REQUIRED. Values MUST be one of read counts (e.g. FeatureCounts) or estimated fragments (e.g. output of RSEM). Each cell MUST contain at least one non-zero value. All non-zero values MUST be positive integers stored as numpy.float32. | AnnData.raw.X unless no "normalized" is provided, then AnnData.X | STRONGLY RECOMMENDED | AnnData.X |
| unpaired Accessibility (e.g. ATAC-seq, mCT-seq) | NOT REQUIRED | | REQUIRED | AnnData.X |
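The Visium row-count and in_tissue rules in the table can be sketched in plain Python. This is a minimal illustration, assuming those rules apply when uns['spatial']['is_single'] is True. The helper names and the `cell_totals` input (the per-row sum of raw counts) are hypothetical. Because raw counts are non-negative, a row total greater than zero is treated as equivalent to "contains at least one non-zero value".

```python
CYTASSIST_11MM = "EFO:0022860"  # Visium CytAssist Spatial Gene Expression, 11mm


def expected_visium_rows(assay_ontology_term_id):
    """Row count required for the unfiltered feature-barcode matrix."""
    return 14336 if assay_ontology_term_id == CYTASSIST_11MM else 4992


def check_visium_single(assay_ontology_term_id, cell_totals, in_tissue):
    """Hypothetical helper: return a list of problems found.

    cell_totals: per-row sum of raw counts (assumed non-negative).
    in_tissue:   per-row 0/1 flags, as in obs['in_tissue'].
    """
    problems = []
    expected = expected_visium_rows(assay_ontology_term_id)
    if len(cell_totals) != expected:
        problems.append(f"expected {expected} rows, found {len(cell_totals)}")
    if len(in_tissue) != len(cell_totals):
        problems.append("in_tissue and matrix row counts differ")
        return problems
    # Every in-tissue row must contain at least one non-zero value.
    for index, (total, flag) in enumerate(zip(cell_totals, in_tissue)):
        if flag == 1 and total == 0:
            problems.append(f"row {index} is in tissue but is all zeros")
    # If any rows are off tissue, at least one of them must be non-zero.
    off_tissue = [t for t, f in zip(cell_totals, in_tissue) if f == 0]
    if off_tissue and not any(t > 0 for t in off_tissue):
        problems.append("no off-tissue row contains a non-zero value")
    return problems
```

An empty list means none of the sketched checks failed. The numpy.float32 storage and positive-integer requirements are not covered here.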
Context: @jahilton noted that the information above from #1013 (the decision to accept unpaired scATAC data through the 10x multiome note) needs to be captured.
Design (@brianraymor): the proposal above, from the note to @jahilton through the Matrix Layers table.