lamindb.Transform¶
- class lamindb.Transform(name: str, key: str | None = None, type: TransformType | None = None, revises: Transform | None = None)¶
Bases: SQLRecord, IsVersioned
Data transformations such as scripts, notebooks, functions, or pipelines.
A “transform” can refer to a Python function, a script, a notebook, or a pipeline. If you execute a transform, you generate a run (Run). A run has inputs and outputs.
A pipeline is typically created with a workflow tool (Nextflow, Snakemake, Prefect, Flyte, MetaFlow, redun, Airflow, …) and stored in a versioned repository.
Transforms are versioned so that a given transform version maps to a given source code version.
Can I sync transforms to git?
If you switch on sync_git_repo, a script-like transform is synced to its hashed state in a git repository upon calling ln.track().
>>> ln.settings.sync_git_repo = "https://github.com/laminlabs/lamindb"
>>> ln.track()
The definition of transforms and runs is consistent with the OpenLineage specification, where a Transform record would be called a “job” and a Run record a “run”.
- Parameters:
name (str) – A name or title.
key (str | None, default: None) – A short name or path-like semantic key.
type (TransformType | None, default: "pipeline") – See TransformType.
revises (Transform | None, default: None) – An old version of the transform.
Examples
Create a transform for a pipeline:
>>> transform = ln.Transform(key="Cell Ranger", version="7.2.0", type="pipeline").save()
Create a transform from a notebook:
>>> ln.track()
View predecessors of a transform:
>>> transform.view_lineage()
Attributes¶
- property name: str¶
Name of the transform.
Splits key on “/” and returns the last element.
- property stem_uid: str¶
Universal id characterizing the version family.
The full uid of a record is obtained via concatenating the stem uid and version information:
stem_uid = random_base62(n_char)  # a random base62 sequence of length 12 (transform) or 16 (artifact, collection)
version_uid = "0000"  # an auto-incrementing 4-digit base62 number
uid = f"{stem_uid}{version_uid}"  # concatenate the stem_uid & version_uid
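The composition above can be made concrete with a small standalone sketch. It is not lamindb's implementation: the alphabet ordering (digits, uppercase, lowercase) and the helper increment_base62 are assumptions introduced only to show how a fixed-width base62 version suffix could be appended to a stem uid and incremented.

```python
import secrets
import string

# Base62 alphabet; the ordering (digits, uppercase, lowercase) is an assumption.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def random_base62(n_char: int) -> str:
    """Return a random base62 string of length n_char."""
    return "".join(secrets.choice(BASE62) for _ in range(n_char))


def increment_base62(value: str) -> str:
    """Increment a fixed-width base62 number (hypothetical helper)."""
    number = 0
    for char in value:
        number = number * 62 + BASE62.index(char)
    number += 1
    digits = []
    for _ in range(len(value)):
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    if number:
        raise OverflowError("version counter exceeds its fixed width")
    return "".join(reversed(digits))


stem_uid = random_base62(12)  # 12 characters for a transform
version_uid = "0000"  # first version in the family
uid = f"{stem_uid}{version_uid}"  # 16 characters in total

# A later version in the same family keeps the stem and bumps the suffix.
next_uid = f"{stem_uid}{increment_base62(version_uid)}"
```

All versions in a family share the first 12 characters, so the family can be recovered from any uid by dropping the last four characters.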
Simple fields¶
- uid: str¶
Universal id.
- key: str | None¶
A name or “/”-separated path-like string.
All transforms with the same key are part of the same version family.
- description: str | None¶
A description.
- type: TransformType¶
TransformType (default "pipeline").
- source_code: str | None¶
Source code of the transform.
- hash: str | None¶
Hash of the source code.
- reference: str | None¶
Reference for the transform, e.g., a URL.
- reference_type: str | None¶
Reference type of the transform, e.g., ‘url’.
- created_at: datetime¶
Time of creation of record.
- updated_at: datetime¶
Time of last update to record.
- version: str | None¶
Version (default None).
Defines the version of a family of records characterized by the same stem_uid.
Consider using semantic versioning with Python versioning.
- is_latest: bool¶
Boolean flag that indicates whether a record is the latest in its version family.
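The interplay of key, hash, and is_latest can be pictured with a toy model. The sketch below is an illustration under stated assumptions, not lamindb's code: ToyTransform, save_version, the source_hash property, the integer version_number, and the choice of SHA-256 are all hypothetical. It shows one way a version family could grow when the source code changes and how the is_latest flag would move to the newest record.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ToyTransform:
    """Hypothetical stand-in for a transform record; not lamindb's model."""

    key: str
    source_code: str
    version_number: int = 1  # integer counter is an assumption
    is_latest: bool = True

    @property
    def name(self) -> str:
        # Last element of the "/"-separated key.
        return self.key.split("/")[-1]

    @property
    def source_hash(self) -> str:
        # SHA-256 is chosen for illustration only.
        return hashlib.sha256(self.source_code.encode()).hexdigest()


def save_version(family: list, key: str, source_code: str) -> ToyTransform:
    """Append a new version unless the latest one has identical source code."""
    new = ToyTransform(key=key, source_code=source_code)
    if family:
        latest = family[-1]
        if latest.source_hash == new.source_hash:
            return latest  # unchanged source code: reuse the existing version
        latest.is_latest = False
        new.version_number = latest.version_number + 1
    family.append(new)
    return new


family = []
v1 = save_version(family, "pipelines/qc.py", "print('v1')")
v1_again = save_version(family, "pipelines/qc.py", "print('v1')")
v2 = save_version(family, "pipelines/qc.py", "print('v2')")
```

In this toy model, exactly one record per family carries is_latest = True, and re-saving unchanged source code does not create a new version.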
Relational fields¶
- branch: Branch¶
Whether record is on a branch or in another “special state”.
- predecessors: Transform¶
Preceding transforms.
Allows you to manually define predecessors. This is typically not necessary because data lineage is tracked automatically via runs whenever an artifact or collection serves as an input for a run.
- successors: Transform¶
Subsequent transforms.
See predecessors.
Class methods¶
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
- Returns:
A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my label").save()
>>> ln.ULabel.filter(name__startswith="my").df()
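For readers unfamiliar with Django query expressions, the keyword name__startswith="my" combines a field name and a lookup, separated by a double underscore. The following standalone sketch mimics that convention on plain dictionaries. The function matches is hypothetical, supports only three lookups, and ignores relation traversal such as created_by__name; the real evaluation happens in Django's ORM against the database.

```python
def matches(record: dict, **expressions) -> bool:
    """Check a record against Django-style `field__lookup=value` expressions."""
    for expression, expected in expressions.items():
        # Split "name__startswith" into field "name" and lookup "startswith".
        field, _, lookup = expression.partition("__")
        value = record[field]
        if lookup in ("", "exact"):
            ok = value == expected
        elif lookup == "startswith":
            ok = value.startswith(expected)
        elif lookup == "icontains":
            ok = expected.lower() in value.lower()
        else:
            raise ValueError(f"unsupported lookup: {lookup}")
        if not ok:
            return False
    return True


records = [
    {"name": "my label"},
    {"name": "other label"},
    {"name": "My second label"},
]
selected = [r for r in records if matches(r, name__startswith="my")]
```

Note that startswith is case sensitive in this sketch, so "My second label" is not selected.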
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Raises:
lamindb.errors.DoesNotExist – In case no matching record is found.
- Return type:
See also
Guide: Query & search registries
Django documentation: Queries
Examples
ulabel = ln.ULabel.get("FvtpPJLJ")
ulabel = ln.ULabel.get(name="my-label")
- classmethod df(include=None, features=False, limit=100)¶
Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at.
Use arguments include or features to include other data.
- Parameters:
include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "ulabels__name", "cell_types__name", etc. or a list of such strings.
features (bool | list[str], default: False) – If a list of feature names, filters Feature down to these features. If True, prints all features with dtypes in the core schema module. If "queryset", infers the features used within the set of artifacts or records. Only available for Artifact and Record.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
- Return type:
DataFrame
Examples
Include the name of the creator in the DataFrame:
>>> ln.ULabel.df(include="created_by__name")
Include display of features for Artifact:
>>> df = ln.Artifact.df(features=True)
>>> ln.view(df)  # visualize with type annotations
Only include select features:
>>> df = ln.Artifact.df(features=["cell_type_by_expert", "cell_type_by_model"])
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum amount of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
keep – When multiple records are found for a lookup, how to return the records.
- "first": return the first record.
- "last": return the last record.
- False: return all records.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
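The attribute access in the example (lookup.adgb_dt for the symbol "ADGB-DT") relies on field values being turned into valid Python identifiers. The sketch below illustrates that idea with the standard library only. The helpers to_identifier and build_lookup are hypothetical, and the normalization rule (lowercase, non-word characters replaced by underscores) is an assumption inferred from the example, not a statement about lamindb's internals. Values are assumed to start with a letter.

```python
import re
from collections import namedtuple


def to_identifier(value: str) -> str:
    """Lowercase a value and replace runs of non-word characters with "_"."""
    return re.sub(r"\W+", "_", value).strip("_").lower()


def build_lookup(values):
    """Build a NamedTuple whose attributes map back to the original values."""
    names = {to_identifier(value): value for value in values}
    Lookup = namedtuple("Lookup", list(names))
    return Lookup(**names)


lookup = build_lookup(["ADGB-DT", "TP53", "BRCA1"])
symbol = lookup.adgb_dt  # attribute access enables tab completion
as_dict = lookup._asdict()  # dictionary view of the same information
```

Because the result is a NamedTuple, an interactive shell can offer the normalized names for auto-completion.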
- classmethod using(instance)¶
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
Methods¶
- delete()¶
Delete.
- Return type:
None
- view_lineage(with_successors=False, distance=5)¶
View lineage of transforms.
Note that this only accounts for manually defined predecessors and successors.
Auto-generated lineage through inputs and outputs of runs is not included.
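To picture what walking manually defined predecessors up to a given distance could look like, here is a standalone breadth-first sketch. It is only an illustration: collect_predecessors and the dictionary layout are hypothetical, and interpreting distance as a maximum number of hops is an assumption about the parameter, not something the documentation above states.

```python
from collections import deque


def collect_predecessors(predecessors: dict, start: str, distance: int = 5) -> dict:
    """Breadth-first walk over predecessor links, up to `distance` hops."""
    found = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == distance:
            continue  # do not expand beyond the requested distance
        for parent in predecessors.get(node, []):
            if parent not in found and parent != start:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found


# Manually defined links: each transform maps to its direct predecessors.
predecessors = {
    "report.ipynb": ["cluster.py"],
    "cluster.py": ["normalize.py"],
    "normalize.py": ["ingest.py"],
}
nearby = collect_predecessors(predecessors, "report.ipynb", distance=2)
```

With distance=2, the walk stops before reaching ingest.py, which sits three hops away from report.ipynb.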
- save(*args, **kwargs)¶
Save.
Always saves to the default database.
- Return type:
TypeVar(T, bound= SQLRecord)