Introduction

Dumoulin and Visin [1] give a thorough overview of convolution arithmetic. Animations of the convolutions are available in the authors' GitHub repository.

PyTorch

>>> import torch
>>> inp = torch.randint(low=0, high=256, size=(16, 3, 7, 7)) / 255 # batch of 16 random 3-channel 7x7 images
>>> w = torch.rand(6, 3, 3, 3) # 3x3 kernel, 3 in-channels, 6 out-channels
>>> res = torch.nn.functional.unfold(inp, (3, 3))
>>> res.shape
torch.Size([16, 27, 25])
>>> # (16, 25, 27) @ (27, 6) -> (16, 25, 6), then transpose back to (16, 6, 25)
>>> res = res.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2)
>>> res.shape
torch.Size([16, 6, 25])
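>>> # output size per spatial dimension: 7 - 3 + 1 = 5
>>> # fold with a 1x1 kernel only reshapes the 25 positions into a 5x5 grid
>>> res = torch.nn.functional.fold(res, (5, 5), (1, 1))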
>>> res.shape
torch.Size([16, 6, 5, 5])
>>> torch.allclose(torch.nn.functional.conv2d(inp, w), res)
True
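
The session above uses stride 1 and no padding. As a rough sketch of how the same unfold-then-matmul idea might extend to other strides and paddings, the snippet below wraps the steps in two helper functions. The names `conv_out_size` and `conv2d_via_unfold` are hypothetical and not part of PyTorch, dilation and groups are assumed to be left at their defaults, and the final reshape stands in for the `fold` call with a 1x1 kernel.

```python
import torch
import torch.nn.functional as F


def conv_out_size(size, kernel, stride=1, padding=0):
    """Output size along one spatial dimension (hypothetical helper, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_via_unfold(inp, weight, stride=1, padding=0):
    """Convolution as unfold followed by a matrix multiplication (hypothetical helper)."""
    n, _, h, w = inp.shape
    out_ch, _, kh, kw = weight.shape
    # (N, C*kh*kw, L): one column per sliding-window position
    cols = F.unfold(inp, (kh, kw), stride=stride, padding=padding)
    # (out_ch, C*kh*kw) @ (N, C*kh*kw, L) -> (N, out_ch, L)
    out = weight.view(out_ch, -1) @ cols
    out_h = conv_out_size(h, kh, stride, padding)
    out_w = conv_out_size(w, kw, stride, padding)
    # plain reshape of the L positions into an (out_h, out_w) grid
    return out.view(n, out_ch, out_h, out_w)


torch.manual_seed(0)
inp = torch.rand(16, 3, 7, 7)
w = torch.rand(6, 3, 3, 3)
res = conv2d_via_unfold(inp, w, stride=2, padding=1)
ref = F.conv2d(inp, w, stride=2, padding=1)
print(res.shape, torch.allclose(res, ref, atol=1e-5))
```

With stride 1 and no padding, `conv_out_size(7, 3)` reproduces the 7 - 3 + 1 = 5 arithmetic from the session. Multiplying with the weight matrix on the left avoids the two transposes used there.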

References

[1] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” arXiv preprint arXiv:1603.07285, 2016.