This is AlexNet, the model that launched the third wave of AI
that we find ourselves in today.
AlexNet consists of eight very similar compute blocks.
The first compute block takes in an image
and processes it
by sliding a kernel of learned weight values over the image,
and at each location
computes the dot product between the kernel and the patch of the image beneath it.
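As a rough illustration, here is a minimal NumPy sketch of that sliding dot product, assuming a single-channel image and a toy 3×3 edge-detecting kernel; the name slide_kernel and the sizes are illustrative, not AlexNet's actual first layer, which works on 3-channel color images with a stride of 4.

```python
import numpy as np

def slide_kernel(image, kernel, stride=1):
    """Slide one kernel over a 2D image, taking a dot product at each location.

    Toy single-channel sketch of the operation described above; AlexNet's real
    first layer uses 3-channel kernels and a stride of 4.
    """
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    activation_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride : i * stride + kh,
                          j * stride : j * stride + kw]
            # Dot product between the kernel and the image patch beneath it
            activation_map[i, j] = np.sum(patch * kernel)
    return activation_map

# A simple vertical-edge-detecting kernel applied to a random "image"
image = np.random.rand(32, 32)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
print(slide_kernel(image, kernel).shape)  # (30, 30)
```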
AlexNet uses 96 different kernels to process the input image,
resulting in 96 new images called activation maps.
These maps pick up various low-level features like edges.
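To make the kernel count concrete, here is a hedged PyTorch sketch of that first block using nn.Conv2d, with the kernel count and size reported in the AlexNet paper (96 kernels of 11×11×3, stride 4); the pooling and normalization that follow in the real network are left out.

```python
import torch
import torch.nn as nn

# First sliding-kernel (convolutional) block: 96 learned 11x11x3 kernels,
# moved across the image 4 pixels at a time (stride 4), as in the AlexNet paper.
first_block = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4)

image = torch.randn(1, 3, 227, 227)           # one RGB image (batch of 1)
activation_maps = torch.relu(first_block(image))
print(activation_maps.shape)                  # torch.Size([1, 96, 55, 55]) -> 96 maps
```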
From here, these 96 activation maps are stacked together
and the same sliding kernel operation is performed again,
producing a new set of activation maps.
This whole process is repeated over and over
using different learned weight values each time.
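Continuing the same sketch, the 96 activation maps can be treated as a 96-channel input and convolved again with a fresh set of learned kernels, and then again; the channel counts below follow the AlexNet paper, but pooling, normalization, and the later blocks are omitted, so this is only an outline of the repetition, not the full model.

```python
import torch
import torch.nn as nn

# The 96 activation maps are stacked into a 96-channel "image" and convolved
# again with a new set of learned kernels, then the result is convolved again.
# Channel counts follow the AlexNet paper; pooling, normalization, and the
# final fully connected blocks are omitted here.
repeated_blocks = nn.Sequential(
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),   # second block
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),  # third block
)

stacked_maps = torch.randn(1, 96, 55, 55)     # output of the first block
deeper_maps = repeated_blocks(stacked_maps)
print(deeper_maps.shape)                      # torch.Size([1, 384, 55, 55])
```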
Incredibly, after training on a large data set,
later layers of AlexNet are able to recognize complex concepts
like faces, without anyone ever telling AlexNet what a face is.
Today, ChatGPT works in a very similar way.