Curate MNIST
# !pip install 'lamindb[jupyter]' torch torchvision lightning
!lamin init --storage ./lamin-mlops
import lamindb as ln
from pathlib import Path
ln.track()
Download the MNIST dataset and save it in LaminDB to track the training data associated with our model.
from torchvision.datasets import MNIST
from torchvision.transforms import ToTensor
dataset = MNIST(Path.cwd() / "download_mnist", download=True, transform=ToTensor())
# no need for the zipped files
!rm -r download_mnist/MNIST/raw/*.gz
!ls -r download_mnist/MNIST/raw
train-labels-idx1-ubyte t10k-labels-idx1-ubyte
train-images-idx3-ubyte t10k-images-idx3-ubyte
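Before registering the folder, it can be worth confirming that the extracted files are intact. The sketch below is not part of the original notebook: it reads only the header of each IDX file (a big-endian magic number followed by the dimension sizes) and reports the element type and shape. The helper name `read_idx_header` is hypothetical, and the directory path is assumed to match the download location used above. For the MNIST training images the header should report `uint8` with shape `(60000, 28, 28)`.

```python
import struct
from pathlib import Path

# Element type codes defined by the IDX format (third byte of the magic number).
IDX_DTYPES = {
    0x08: "uint8",
    0x09: "int8",
    0x0B: "int16",
    0x0C: "int32",
    0x0D: "float32",
    0x0E: "float64",
}


def read_idx_header(path):
    """Return (dtype_name, shape) parsed from the header of an IDX file.

    Hypothetical helper for illustration; it does not load the data itself.
    """
    with open(path, "rb") as f:
        # Magic number: two zero bytes, one type-code byte, one dimension-count byte.
        zero, dtype_code, ndim = struct.unpack(">HBB", f.read(4))
        if zero != 0 or dtype_code not in IDX_DTYPES:
            raise ValueError(f"{path} does not look like an IDX file")
        # Each dimension size is a 4-byte big-endian unsigned integer.
        shape = struct.unpack(f">{ndim}I", f.read(4 * ndim))
    return IDX_DTYPES[dtype_code], shape


if __name__ == "__main__":
    # Assumed location: the folder written by the MNIST download above.
    raw_dir = Path("download_mnist/MNIST/raw")
    if raw_dir.is_dir():
        for idx_file in sorted(raw_dir.glob("*-ubyte")):
            print(idx_file.name, *read_idx_header(idx_file))
```

Reading only the header keeps the check cheap, and a truncated or wrongly extracted file is likely to surface as an unexpected shape or a `ValueError`.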
training_data_artifact = ln.Artifact(
    "download_mnist/",
    key="testdata/mnist",
    type="dataset",
).save()
training_data_artifact
Once the MNIST training dataset is saved in LaminDB, it shows up in LaminHub.

ln.finish()