Multi-modal

Here, we’ll showcase how to curate and register ECCITE-seq data from Papalexi21 in the form of MuData objects.

ECCITE-seq is designed to enable interrogation of single-cell transcriptomes together with surface protein markers in the context of CRISPR screens.

MuData objects build on top of AnnData objects to store multimodal data.

# !pip install 'lamindb[jupyter,bionty]'
!lamin init --storage ./test-multimodal --modules bionty

import lamindb as ln
import bionty as bt

ln.track()

Creating MuData Artifacts

lamindb provides a from_mudata() method to create an Artifact from a MuData object.

mdata = ln.core.datasets.mudata_papalexi21_subset()
mdata

mdata_af = ln.Artifact.from_mudata(mdata, key="papalexi.h5mu")
mdata_af

# MuData Artifacts have the corresponding otype
mdata_af.otype

# MuData Artifacts can easily be loaded back into memory
papalexi_in_memory = mdata_af.load()
papalexi_in_memory

Schema

# define labels
perturbation = ln.ULabel(name="Perturbation", is_type=True).save()
ln.ULabel(name="Perturbed", type=perturbation).save()
ln.ULabel(name="NT", type=perturbation).save()

replicate = ln.ULabel(name="Replicate", is_type=True).save()
ln.ULabel(name="rep1", type=replicate).save()
ln.ULabel(name="rep2", type=replicate).save()
ln.ULabel(name="rep3", type=replicate).save()

# define obs schema
obs_schema = ln.Schema(
    name="mudata_papalexi21_subset_obs_schema",
    features=[
        ln.Feature(name="perturbation", dtype="cat[ULabel[Perturbation]]").save(),
        ln.Feature(name="replicate", dtype="cat[ULabel[Replicate]]").save(),
    ],
).save()

obs_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_obs_schema",
    features=[
        ln.Feature(name="nCount_RNA", dtype=int).save(),
        ln.Feature(name="nFeature_RNA", dtype=int).save(),
        ln.Feature(name="percent.mito", dtype=float).save(),
    ],
    coerce_dtype=True,
).save()

obs_schema_hto = ln.Schema(
    name="mudata_papalexi21_subset_hto_obs_schema",
    features=[
        ln.Feature(name="nCount_HTO", dtype=float).save(),
        ln.Feature(name="nFeature_HTO", dtype=int).save(),
        ln.Feature(name="technique", dtype=bt.ExperimentalFactor).save(),
    ],
    coerce_dtype=True,
).save()

var_schema_rna = ln.Schema(
    name="mudata_papalexi21_subset_rna_var_schema",
    itype=bt.Gene.symbol,
    dtype=float,
).save()

# define composite schema
mudata_schema = ln.Schema(
    name="mudata_papalexi21_subset_mudata_schema",
    otype="MuData",
    slots={
        "obs": obs_schema,
        "rna:obs": obs_schema_rna,
        "hto:obs": obs_schema_hto,
        "rna:var": var_schema_rna,
    },
).save()

mudata_schema

Schema(uid='cZ6g257KL5NDmQo3', name='mudata_papalexi21_subset_mudata_schema', n=-1, is_type=False, itype='Composite', otype='MuData', dtype='num', hash='n4XOfGGtBXwwGkmIujFhQg', minimal_set=True, ordered_set=False, maximal_set=False, branch_id=1, space_id=1, created_by_id=1, run_id=1, created_at=2025-07-14 06:44:59 UTC)
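
The slot keys of the composite schema follow a simple naming pattern: a bare key such as obs appears to address the table of the MuData object itself, while a prefixed key such as rna:obs addresses a table inside one modality. The sketch below illustrates that pattern in plain Python. Note that split_slot_key is a hypothetical helper written for this illustration and is not part of lamindb.

```python
from typing import Optional, Tuple


def split_slot_key(slot_key: str) -> Tuple[Optional[str], str]:
    """Split a slot key into (modality, table name).

    Hypothetical helper for illustration only; not a lamindb function.
    """
    modality, sep, table = slot_key.rpartition(":")
    # Without a ":" separator, the key refers to the top-level table.
    return (modality if sep else None, table)


# The slot keys used in the composite schema above.
slot_keys = ["obs", "rna:obs", "hto:obs", "rna:var"]
parsed = {key: split_slot_key(key) for key in slot_keys}

for key, (modality, table) in parsed.items():
    target = f"modality '{modality}'" if modality else "the MuData object itself"
    print(f"{key!r}: table '{table}' of {target}")
```

Keeping one schema per slot means each modality can be validated against its own set of features, while annotations shared by all cells live in the top-level obs schema.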

Validate MuData annotations

curator = ln.curators.MuDataCurator(mdata, mudata_schema)

! auto-transposed `var` for backward compat, please indicate transposition in the schema definition by calling out `.T`: slots={'var.T': itype=bt.Gene.ensembl_gene_id}

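# the first validation pass is expected to fail on this dataset;
# the warnings it logs list the terms that still need curation
try:
    curator.validate()
except ln.errors.ValidationError:
    pass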

! 37 terms not validated in feature 'columns': 'adt:G2M.Score', 'adt:HTO_classification', 'adt:MULTI_ID', 'adt:NT', 'adt:Phase', 'adt:S.Score', 'adt:gene_target', 'adt:guide_ID', 'adt:orig.ident', 'adt:percent.mito', 'adt:perturbation', 'adt:replicate', 'hto:G2M.Score', 'hto:HTO_classification', 'hto:MULTI_ID', 'hto:NT', 'hto:Phase', 'hto:S.Score', 'hto:gene_target', 'hto:guide_ID', ...
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('columns')

! using default organism = human

! using default organism = human

! using default organism = human

! 96 terms not validated in feature 'columns': 'RP5-827C21.6', 'XX-CR54.1', 'RP11-379B18.5', 'RP11-778D9.12', 'RP11-703G6.1', 'AC005150.1', 'RP11-717H13.1', 'CTC-498J12.1', 'CTC-467M3.1', 'HIST1H4K', 'RP11-524H19.2', 'AC006042.7', 'AC002066.1', 'AC073934.6', 'RP11-268G12.1', 'U52111.14', 'RP11-235C23.5', 'RP11-12J10.3', 'CASC1', 'RP11-324E6.9', ...
    12 synonyms found: "CTC-467M3.1" → "MEF2C-AS2", "HIST1H4K" → "H4C12", "CASC1" → "DNAI7", "LARGE" → "LARGE1", "NBPF16" → "NBPF15", "C1orf65" → "CCDC185", "IBA57-AS1" → "IBA57-DT", "KIAA1239" → "NWD2", "TMEM75" → "LINC02912", "AP003419.16" → "RPS6KB2-AS1", "FAM65C" → "RIPOR3", "C14orf177" → "LINC02914"
    → curate synonyms via: .standardize("columns")
    for remaining terms:
    → fix typos, remove non-existent values, or save terms via: curator.cat.add_new_from('columns')

# map outdated gene symbols in the rna:var slot to their current names
curator.slots["rna:var"].cat.standardize("columns")

# save the remaining non-validated gene symbols as new terms
curator.slots["rna:var"].cat.add_new_from("columns")

curator.validate()
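
Conceptually, the two curation steps above first replace outdated gene symbols with their current names and then treat whatever is still unknown as new terms to save. The sketch below mimics this in plain Python, using three synonym pairs and two unmatched symbols copied from the validation log. The known_symbols set and the standardize function are assumptions made for illustration; they do not reflect how lamindb or bionty implement these steps.

```python
# Synonym pairs copied from the validation log (old symbol -> current symbol).
synonyms = {
    "CTC-467M3.1": "MEF2C-AS2",
    "HIST1H4K": "H4C12",
    "CASC1": "DNAI7",
}

# Assumed stand-in for a gene registry; a real registry is far larger.
known_symbols = {"MEF2C-AS2", "H4C12", "DNAI7", "GABRA1"}


def standardize(symbols, synonym_map):
    """Replace each symbol that has a known synonym; keep the rest as-is."""
    return [synonym_map.get(symbol, symbol) for symbol in symbols]


var_names = ["GABRA1", "HIST1H4K", "CASC1", "RP5-827C21.6", "XX-CR54.1"]

standardized = standardize(var_names, synonyms)
# Symbols that are still unknown after standardization would have to be
# fixed, removed, or saved as new terms before validation can pass.
not_validated = [s for s in standardized if s not in known_symbols]

print(standardized)
print(not_validated)
```

Standardizing before adding new terms keeps the registry free of outdated names: only symbols without a known current equivalent end up as new records.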

Register curated Artifact

artifact = curator.save_artifact(key="mudata_papalexi21_subset.h5mu")

artifact.describe()

Artifact .h5mu · MuData · dataset
├── General
│   ├── uid: Hwh0yKf0KiRg8Xtm0000          hash: DVP0v1Ox99dbtx__KLuYPg
│   ├── size: 537.1 KB                     n_observations: 200
│   ├── space: all                         branch: main
│   ├── created_at: 2025-07-14 06:45:04    created_by: testuser1 (Test User1)
│   ├── key: mudata_papalexi21_subset.h5mu
│   ├── storage location / path: 
│   │   /home/runner/work/lamin-usecases/lamin-usecases/docs/test-multimodal/.lamindb/Hwh0yKf0KiRg8Xtm0000.h5mu
│   └── transform: multimodal.ipynb
├── Dataset features
│   ├── obs • 2                         [Feature]                                                                  
│   │   perturbation                    cat[ULabel[Perturbation]]          NT, Perturbed                           
│   │   replicate                       cat[ULabel[Replicate]]             rep1, rep2, rep3                        
│   ├── rna:obs • 3                     [Feature]                                                                  
│   │   nCount_RNA                      int                                                                        
│   │   nFeature_RNA                    int                                                                        
│   │   percent.mito                    float                                                                      
│   ├── hto:obs • 3                     [Feature]                                                                  
│   │   technique                       cat[bionty.ExperimentalFactor]     cell hashing                            
│   │   nCount_HTO                      float                                                                      
│   │   nFeature_HTO                    int                                                                        
│   └── rna:var • 184                   [bionty.Gene.symbol]                                                       
│       SH2D6                           num                                                                        
│       MEF2C-AS2                       num                                                                        
│       ARHGAP26-AS1                    num                                                                        
│       GABRA1                          num                                                                        
│       H4C12                           num                                                                        
│       HLA-DQB1-AS1                    num                                                                        
│       HLA-DQB1-AS1                    num                                                                        
│       HLA-DQB1-AS1                    num                                                                        
│       HLA-DQB1-AS1                    num                                                                        
│       HLA-DQB1-AS1                    num                                                                        
│       HLA-DQB1-AS1                    num                                                                        
│       HLA-DQB1-AS1                    num                                                                        
│       SPACA1                          num                                                                        
│       VNN1                            num                                                                        
│       CTAGE15                         num                                                                        
│       CTAGE15                         num                                                                        
│       PFKFB1                          num                                                                        
│       TRPC5                           num                                                                        
│       RBPMS-AS1                       num                                                                        
│       CA8                             num                                                                        
└── Labels
    └── .experimental_factors           bionty.ExperimentalFactor          cell hashing                            
        .ulabels                        ULabel                             Perturbed, NT, rep1, rep2, rep3

ln.finish()

# clean up test instance
!rm -r test-multimodal
!lamin delete --force test-multimodal