Generating Handwritten Digits Using Diffusion

January 09, 2026

Lately I’ve been trying to build intuition for diffusion models by implementing things from scratch, without worrying too much about scale, benchmarks, or clever tricks. The goal here was the same as with a lot of my recent side projects: make the math feel concrete by watching it actually run.

This notebook is a minimal(ish) implementation of a class-conditional diffusion model trained on MNIST, using a small UNet and DDIM sampling. It’s not meant to be optimal or fast, and it definitely isn’t production-ready. I mostly wanted to understand:

  • how the forward noising process behaves as a function of time
  • what the model is really learning when it predicts noise
  • how classifier-free guidance changes samples in practice
  • how much you can get away with using a very simple architecture
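The first bullet is easy to make concrete in a few lines of numpy. Below is a minimal sketch of the forward noising process, assuming the standard DDPM parameterization with a linear beta schedule (the notebook's actual hyperparameters may differ, and the function name `forward_noise` is mine):

```python
import numpy as np

# Hypothetical linear beta schedule over T steps; the notebook may use different values.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product: how much signal survives to step t

def forward_noise(x0, t, rng=None):
    """Sample x_t ~ q(x_t | x_0): a weighted mix of the clean image and fresh Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng(0)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return xt, eps

# The signal coefficient sqrt(alpha_bar_t) decays toward 0 as t grows,
# so late timesteps are almost pure noise.
for t in [0, 250, 500, 999]:
    print(t, float(np.sqrt(alpha_bars[t])))
```

Watching `sqrt(alpha_bars[t])` shrink across `t` is exactly the "how does the forward process behave as a function of time" question made tangible: early steps barely perturb the digit, late steps destroy it entirely.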

The model is trained to predict the added noise at random diffusion steps, and sampling is done with a deterministic DDIM update for speed. At the end, I use the conditional model to generate individual digits, and even stitch several digits together into full multi-digit numbers.
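The two sampling-time ingredients mentioned above, the deterministic DDIM update and classifier-free guidance, can be sketched like this. This is the eta = 0 (fully deterministic) DDIM variant; the function names are mine and the notebook's implementation may differ in details:

```python
import numpy as np

def cfg_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the class-conditional one by a guidance weight w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def ddim_step(xt, eps_pred, ab_t, ab_prev):
    """One deterministic DDIM update (eta = 0).

    ab_t / ab_prev are alpha_bar at the current and previous timesteps.
    First recover an estimate of the clean image x0 from the noise
    prediction, then re-project it to the previous noise level.
    """
    x0_hat = (xt - np.sqrt(1.0 - ab_t) * eps_pred) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_hat + np.sqrt(1.0 - ab_prev) * eps_pred
```

A nice sanity check: if the model's noise prediction were exact, `ddim_step` would land precisely on the noised image at the previous timestep, which is why DDIM can take large strides through the schedule without adding fresh randomness.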

Nothing fancy, but very satisfying to see working end-to-end.

Below is the full notebook.

Enjoy!