CellMarker

lamindb provides access to the following public cell marker ontologies through bionty:

CellMarker

Here we show how to access and search cell marker ontologies to standardize new data.

import bionty as bt
import pandas as pd

PublicOntology objects

Let us create a public ontology accessor with public(), which chooses a default public ontology source from Source. It’s a PublicOntology object, which you can think of as a public registry:

public = bt.CellMarker.public(organism="human")
public

→ connected lamindb: testuser1/test-public-ontologies

PublicOntology
Entity: CellMarker
Organism: human
Source: cellmarker, 2.0
#terms: 15466

As with registries, you can export the ontology as a DataFrame:

df = public.df()
df.head()

	name	gene_symbol	ncbi_gene_id	uniprotkb_id
0	A1BG	A1BG	1	P04217
1	A2M	A2M	3494	None
2	A2ML1	A2ML1	144568	A8K2U0
3	A4GALT	A4GALT	53947	A0A0S2Z5J1
4	AADAC	AADAC	13	P22760

Unlike registries, you can also export it as a Pronto object via public.ontology.

Look up terms

As with registries, terms can be looked up with auto-complete:

lookup = public.lookup()

The . accessor provides normalized terms (lowercase, containing only alphanumeric characters and underscores):

lookup.immp1l

CellMarker(name='IMMP1L', synonyms='', gene_symbol='IMMP1L', ncbi_gene_id='196294', uniprotkb_id='Q96LU5')
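The normalization described above can be illustrated with a small standalone sketch. The helper `normalize_key` is hypothetical and is not bionty's implementation; it only shows what "lowercase, alphanumeric characters and underscores" means for a few terms that appear on this page.

```python
import re


def normalize_key(term: str) -> str:
    """Hypothetical helper: turn a term into a lower-case, identifier-like key."""
    # Collapse each run of characters other than letters, digits and
    # underscores into a single underscore, then lower-case the result.
    return re.sub(r"[^0-9A-Za-z_]+", "_", term).lower()


for term in ["IMMP1L", "CD45RA", "Invalid-1"]:
    print(term, "->", normalize_key(term))
```

Under this assumed rule, `IMMP1L` becomes `immp1l`, which matches the attribute used in `lookup.immp1l`.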

To look up the exact original strings, convert the lookup object to a dict and use the [] accessor:

lookup_dict = lookup.dict()
lookup_dict["IMMP1L"]

CellMarker(name='IMMP1L', synonyms='', gene_symbol='IMMP1L', ncbi_gene_id='196294', uniprotkb_id='Q96LU5')

Search terms

Search behaves in the same way as it does for registries:

public.search("CD4").head(5)

	name	gene_symbol	ncbi_gene_id	uniprotkb_id
1900	Cd4	CD4	920	B4DT49
1901	CD40	CD40	958	A0A0S2Z3C7
1905	CD40LG	CD40LG	959	P29965
1908	CD46	CD46	4179	P15529
1907	Cd44	CD44	960	P16070

Search another field (default is .name):

public.search("CD4", field=public.gene_symbol).head(1)

	name	synonyms	gene_symbol	ncbi_gene_id	uniprotkb_id
1900	Cd4		CD4	920	B4DT49
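To make the role of the `field` argument concrete, here is a deliberately simplified sketch in plain Python. The `search` function below is hypothetical: it does a case-insensitive substring match and does not reproduce the ranking used by `public.search`. The records are a handful of name and gene symbol pairs taken from outputs on this page.

```python
# A few (name, gene_symbol) pairs copied from outputs shown on this page.
records = [
    {"name": "Cd4", "gene_symbol": "CD4"},
    {"name": "CD40", "gene_symbol": "CD40"},
    {"name": "CD127", "gene_symbol": "IL7R"},
    {"name": "IMMP1L", "gene_symbol": "IMMP1L"},
]


def search(rows, query, field="name"):
    """Hypothetical search: case-insensitive substring match on one field."""
    q = query.lower()
    hits = [row for row in rows if q in row[field].lower()]
    # Put exact matches first, then prefer shorter values.
    return sorted(hits, key=lambda row: (row[field].lower() != q, len(row[field])))


print([row["name"] for row in search(records, "CD4")])
print([row["name"] for row in search(records, "IL7R", field="gene_symbol")])
```

Searching the default field finds `Cd4` and `CD40` by name, while searching `gene_symbol` for `IL7R` finds the record named `CD127`, which a name search for the same string would miss.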

Standardize cell marker identifiers

Let us generate a DataFrame that stores a number of cell marker identifiers, some of which are corrupted:

markers = pd.DataFrame(
    index=[
        "KI67",
        "CCR7",
        "CD14",
        "CD8",
        "CD45RA",
        "CD4",
        "CD3",
        "CD127a",
        "PD1",
        "Invalid-1",
        "Invalid-2",
        "CD66b",
        "Siglec8",
        "Time",
    ]
)

Now let’s check which cell markers can be found in the reference:

public.inspect(markers.index, public.name);

! 8 unique terms (57.10%) are not validated for name: 'KI67', 'CCR7', 'CD14', 'CD4', 'CD127a', 'Invalid-1', 'Invalid-2', 'Time'

   detected 4 unique terms with inconsistent casing/synonyms: KI67, CCR7, CD14, CD4

→  standardize terms via .standardize()

The log suggests mapping synonyms:

synonyms_mapper = public.standardize(markers.index, return_mapper=True)
synonyms_mapper

{'KI67': 'Ki67', 'CCR7': 'Ccr7', 'CD14': 'Cd14', 'CD4': 'Cd4'}
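One way to picture how such a mapper comes about is a case-insensitive match against the reference names. The sketch below is an assumption for illustration only: `build_case_mapper` is a hypothetical helper, the reference list is a small subset of names that appear on this page, and the real `standardize()` also takes synonyms into account.

```python
# Input identifiers from the example above.
markers = [
    "KI67", "CCR7", "CD14", "CD8", "CD45RA", "CD4", "CD3",
    "CD127a", "PD1", "Invalid-1", "Invalid-2", "CD66b", "Siglec8", "Time",
]

# Small subset of reference names, as spelled in the outputs on this page.
reference_names = [
    "Ki67", "Ccr7", "Cd14", "CD8", "CD45RA", "Cd4",
    "CD3", "CD127", "PD1", "CD66b", "Siglec8",
]


def build_case_mapper(values, reference):
    """Hypothetical helper: map values that differ from a reference name only by case."""
    exact = set(reference)
    by_lower = {name.lower(): name for name in reference}
    mapper = {}
    for value in values:
        if value in exact:
            continue  # already a valid name, nothing to map
        match = by_lower.get(value.lower())
        if match is not None:
            mapper[value] = match
    return mapper


print(build_case_mapper(markers, reference_names))
```

With this subset, the sketch maps the same four terms as the mapper shown above and leaves `CD127a`, `Invalid-1`, `Invalid-2` and `Time` untouched, because they do not match any reference name even when case is ignored.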

Let’s replace the synonyms with standardized names in the DataFrame:

markers.rename(index=synonyms_mapper, inplace=True)

Time, Invalid-1 and Invalid-2 are non-marker channels, so they will not be curated as cell markers:

public.inspect(markers.index, public.name);

! 4 unique terms (28.60%) are not validated for name: 'CD127a', 'Invalid-1', 'Invalid-2', 'Time'
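The figure in this log is the share of unique input terms that are missing from the reference. A minimal sketch of that bookkeeping follows, assuming a hypothetical `inspect_terms` helper and a small reference subset built from names on this page; it is not how `inspect()` is implemented.

```python
# Identifiers after the synonym mapping has been applied.
standardized = [
    "Ki67", "Ccr7", "Cd14", "CD8", "CD45RA", "Cd4", "CD3",
    "CD127a", "PD1", "Invalid-1", "Invalid-2", "CD66b", "Siglec8", "Time",
]

# Small subset of reference names, as spelled in the outputs on this page.
reference_names = [
    "Ki67", "Ccr7", "Cd14", "CD8", "CD45RA", "Cd4",
    "CD3", "CD127", "PD1", "CD66b", "Siglec8",
]


def inspect_terms(values, reference):
    """Hypothetical helper: list unique non-validated terms and their share in percent."""
    unique = list(dict.fromkeys(values))  # drop duplicates, keep order
    known = set(reference)
    non_validated = [value for value in unique if value not in known]
    share = 100 * len(non_validated) / len(unique)
    return non_validated, share


non_validated, share = inspect_terms(standardized, reference_names)
print(non_validated, f"{share:.1f}%")
```

Four of the 14 unique terms are not found, which is roughly 28.6% and corresponds to the terms listed in the log above.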

We don’t find CD127a, so let’s check the lookup with auto-completion:

lookup = public.lookup()
lookup.cd127

CellMarker(name='CD127', synonyms='', gene_symbol='IL7R', ncbi_gene_id='3575', uniprotkb_id='P16871', _5='cd127')

It should be CD127; CD127a was a typo:

curated_df = markers.rename(index={"CD127a": lookup.cd127.name})

Optionally, search for the term. Here, searching for the misspelled CD127a returns an empty result:

public.search("CD127a").head()

	name	synonyms	gene_symbol	ncbi_gene_id	uniprotkb_id	__agg__

Now we see that all cell marker candidates validate; only the three non-marker channels remain unvalidated:

public.validate(curated_df.index, public.name);

! 3 unique terms (21.40%) are not validated: 'Invalid-1', 'Invalid-2', 'Time'

Ontology source versions

For any given entity, we can choose from a number of versions:

bt.Source.filter(entity="bionty.CellMarker").df()

	uid	entity	organism	name	in_db	currently_used	description	url	md5	source_website	space_id	dataframe_artifact_id	version	run_id	created_at	created_by_id	_aux	branch_id
id
12	3kDh8qAX	bionty.CellMarker	human	cellmarker	False	True	CellMarker	s3://bionty-assets/human_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
13	7bV5uJo3	bionty.CellMarker	mouse	cellmarker	False	True	CellMarker	s3://bionty-assets/mouse_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1

# only lists the sources that are currently used
bt.Source.filter(entity="bionty.CellMarker", currently_used=True).df()

	uid	entity	organism	name	in_db	currently_used	description	url	md5	source_website	space_id	dataframe_artifact_id	version	run_id	created_at	created_by_id	_aux	branch_id
id
12	3kDh8qAX	bionty.CellMarker	human	cellmarker	False	True	CellMarker	s3://bionty-assets/human_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
13	7bV5uJo3	bionty.CellMarker	mouse	cellmarker	False	True	CellMarker	s3://bionty-assets/mouse_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1

When instantiating a Bionty object, we can choose a source or version:

source = bt.Source.get(name="cellmarker", version="2.0", organism="human")
public = bt.CellMarker.public(source=source)
public

PublicOntology
Entity: CellMarker
Organism: human
Source: cellmarker, 2.0
#terms: 15466
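The difference between relying on the default source and pinning one explicitly can be sketched without a database. Everything below is an illustrative assumption: `SourceRecord` and `pick_source` are hypothetical names, and the rows are a few entries copied from the source tables on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical, reduced stand-in for a row of the Source registry."""
    entity: str
    organism: str
    name: str
    version: str
    currently_used: bool


# A few rows copied from the tables on this page.
SOURCES = [
    SourceRecord("bionty.CellMarker", "human", "cellmarker", "2.0", True),
    SourceRecord("bionty.CellMarker", "mouse", "cellmarker", "2.0", True),
    SourceRecord("bionty.Gene", "human", "ensembl", "release-112", True),
]


def pick_source(entity, organism, name=None, version=None):
    """Return the pinned source if name/version are given, else the currently used one."""
    candidates = [s for s in SOURCES if s.entity == entity and s.organism == organism]
    if name is None and version is None:
        candidates = [s for s in candidates if s.currently_used]
    else:
        candidates = [
            s for s in candidates
            if (name is None or s.name == name) and (version is None or s.version == version)
        ]
    if len(candidates) != 1:
        raise LookupError(f"expected exactly one source, found {len(candidates)}")
    return candidates[0]


print(pick_source("bionty.CellMarker", "human"))
print(pick_source("bionty.CellMarker", "human", name="cellmarker", version="2.0"))
```

In this sketch both calls resolve to the same record because only one CellMarker version is listed for human; with several versions registered, pinning `name` and `version` is what keeps an analysis reproducible.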

The currently used ontologies can be displayed using:

bt.Source.filter(currently_used=True).df()

	uid	entity	organism	name	in_db	currently_used	description	url	md5	source_website	space_id	dataframe_artifact_id	version	run_id	created_at	created_by_id	_aux	branch_id
id
1	33TUF039	bionty.Organism	vertebrates	ensembl	False	True	Ensembl	https://ftp.ensembl.org/pub/release-112/specie...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
2	6bbVUTCS	bionty.Organism	bacteria	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/bacte...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
3	6s9nV6xh	bionty.Organism	fungi	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/fungi...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
4	2PmTrc8x	bionty.Organism	metazoa	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/metaz...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
5	7GPHh16S	bionty.Organism	plants	ensembl	False	True	Ensembl	https://ftp.ensemblgenomes.ebi.ac.uk/pub/plant...	None	https://www.ensembl.org	1	None	release-57	None	2025-07-14 06:41:44.843000+00:00	1	None	1
6	4tsksCMX	bionty.Organism	all	ncbitaxon	False	True	NCBItaxon Ontology	http://purl.obolibrary.org/obo/ncbitaxon/2023-...	None	https://github.com/obophenotype/ncbitaxon	1	None	2023-06-20	None	2025-07-14 06:41:44.843000+00:00	1	None	1
7	4UGNz3fr	bionty.Gene	human	ensembl	False	True	Ensembl	s3://bionty-assets/df_human__ensembl__release-...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
8	4r4fvV0S	bionty.Gene	mouse	ensembl	False	True	Ensembl	s3://bionty-assets/df_mouse__ensembl__release-...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
9	4RPA3Re0	bionty.Gene	saccharomyces cerevisiae	ensembl	False	True	Ensembl	s3://bionty-assets/df_saccharomyces cerevisiae...	None	https://www.ensembl.org	1	None	release-112	None	2025-07-14 06:41:44.843000+00:00	1	None	1
10	3EYyGRYN	bionty.Protein	human	uniprot	False	True	Uniprot	s3://bionty-assets/df_human__uniprot__2024-03_...	None	https://www.uniprot.org	1	None	2024-03	None	2025-07-14 06:41:44.843000+00:00	1	None	1
11	01RWXN2V	bionty.Protein	mouse	uniprot	False	True	Uniprot	s3://bionty-assets/df_mouse__uniprot__2024-03_...	None	https://www.uniprot.org	1	None	2024-03	None	2025-07-14 06:41:44.843000+00:00	1	None	1
12	3kDh8qAX	bionty.CellMarker	human	cellmarker	False	True	CellMarker	s3://bionty-assets/human_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
13	7bV5uJo3	bionty.CellMarker	mouse	cellmarker	False	True	CellMarker	s3://bionty-assets/mouse_cellmarker_2.0_CellMa...	None	http://bio-bigdata.hrbmu.edu.cn/CellMarker	1	None	2.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
14	6LyRtvz8	bionty.CellLine	all	clo	False	True	Cell Line Ontology	s3://bionty-assets/df_all__clo__2022-03-21__Ce...	None	https://bioportal.bioontology.org/ontologies/CLO	1	None	2022-03-21	None	2025-07-14 06:41:44.843000+00:00	1	None	1
16	3Uw2Va7a	bionty.CellType	all	cl	False	True	Cell Ontology	http://purl.obolibrary.org/obo/cl/releases/202...	None	https://obophenotype.github.io/cell-ontology	1	None	2024-08-16	None	2025-07-14 06:41:44.843000+00:00	1	None	1
17	MUtAGdL4	bionty.Tissue	all	uberon	False	True	Uberon multi-species anatomy ontology	http://purl.obolibrary.org/obo/uberon/releases...	None	http://obophenotype.github.io/uberon	1	None	2024-08-07	None	2025-07-14 06:41:44.843000+00:00	1	None	1
18	IGIkseWQ	bionty.Disease	all	mondo	False	True	Mondo Disease Ontology	http://purl.obolibrary.org/obo/mondo/releases/...	None	https://mondo.monarchinitiative.org	1	None	2025-06-03	None	2025-07-14 06:41:44.843000+00:00	1	None	1
19	4kswnHVF	bionty.Disease	human	doid	False	True	Human Disease Ontology	http://purl.obolibrary.org/obo/doid/releases/2...	None	https://disease-ontology.org	1	None	2024-05-29	None	2025-07-14 06:41:44.843000+00:00	1	None	1
21	2a1HvjdB	bionty.ExperimentalFactor	all	efo	False	True	The Experimental Factor Ontology	http://www.ebi.ac.uk/efo/releases/v3.70.0/efo.owl	None	https://bioportal.bioontology.org/ontologies/EFO	1	None	3.70.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
22	6S4qkDx1	bionty.Phenotype	all	pato	False	True	Phenotype And Trait Ontology	http://purl.obolibrary.org/obo/pato/releases/2...	None	https://github.com/pato-ontology/pato	1	None	2024-03-28	None	2025-07-14 06:41:44.843000+00:00	1	None	1
23	48fBFLmn	bionty.Phenotype	human	hp	False	True	Human Phenotype Ontology	https://github.com/obophenotype/human-phenotyp...	None	https://hpo.jax.org	1	None	2024-04-26	None	2025-07-14 06:41:44.843000+00:00	1	None	1
25	7Ent3V2y	bionty.Pathway	all	go	False	True	Gene Ontology	http://purl.obolibrary.org/obo/go/releases/202...	None	http://geneontology.org	1	None	2024-06-17	None	2025-07-14 06:41:44.843000+00:00	1	None	1
27	3rm9aOzL	BFXPipeline	all	lamin	False	True	Bioinformatics Pipeline	s3://bionty-assets/df_all__lamin__1.0.0__BFXpi...	None	https://lamin.ai	1	None	1.0.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
28	ugaIoIlj	Drug	all	dron	False	True	Drug Ontology	http://purl.obolibrary.org/obo/dron/releases/2...	None	https://bioportal.bioontology.org/ontologies/DRON	1	None	2024-08-05	None	2025-07-14 06:41:44.843000+00:00	1	None	1
30	1GbFkOdz	bionty.DevelopmentalStage	human	hsapdv	False	True	Human Developmental Stages	https://github.com/obophenotype/developmental-...	None	https://github.com/obophenotype/developmental-...	1	None	2024-05-28	None	2025-07-14 06:41:44.843000+00:00	1	None	1
31	10va5JSt	bionty.DevelopmentalStage	mouse	mmusdv	False	True	Mouse Developmental Stages	https://github.com/obophenotype/developmental-...	None	https://github.com/obophenotype/developmental-...	1	None	2024-05-28	None	2025-07-14 06:41:44.843000+00:00	1	None	1
32	MJRqduf9	bionty.Ethnicity	human	hancestro	False	True	Human Ancestry Ontology	http://purl.obolibrary.org/obo/hancestro/relea...	None	https://github.com/EBISPOT/hancestro	1	None	3.0	None	2025-07-14 06:41:44.843000+00:00	1	None	1
33	5JnVODh4	BioSample	all	ncbi	False	True	NCBI BioSample attributes	s3://bionty-assets/df_all__ncbi__2023-09__BioS...	None	https://www.ncbi.nlm.nih.gov/biosample/docs/at...	1	None	2023-09	None	2025-07-14 06:41:44.843000+00:00	1	None	1