Lecture 5: Convolutional Neural Networks
Fei-Fei Li & Justin Johnson & Serena Yeung
April 18, 2017
Next: Convolutional Neural Networks
Illustration of LeCun et al. 1998 from CS231n 2017 Lecture 1

A bit of history...
Rumelhart et al., 1986: first time back-propagation became popular
recognizable math
Illustration of Rumelhart et al., 1986 by Lane McIntosh, copyright CS231n 2017
A bit of history... [Hinton and Salakhutdinov 2006]
Reinvigorated research in Deep Learning
Illustration of Hinton and Salakhutdinov 2006 by Lane McIntosh, copyright CS231n 2017
A bit of history: Hubel & Wiesel
1959: RECEPTIVE FIELDS OF SINGLE NEURONES IN THE CAT'S STRIATE CORTEX
1962: RECEPTIVE FIELDS, BINOCULAR INTERACTION AND FUNCTIONAL ARCHITECTURE IN THE CAT'S VISUAL CORTEX
1968...
Cat image by CNX OpenStax is licensed under CC BY 4.0; changes made
A bit of history: Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]
LeNet-5
A bit of history: ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky, Sutskever, Hinton, 2012]
"AlexNet"
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Fast-forward to today: ConvNets are everywhere
Classification
Retrieval
Figures copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Image Captioning [Vinyals et al., 2015] [Karpathy and Fei-Fei, 2015]
A white teddy bear sitting in the grass
A man riding a wave on top of a surfboard
A man in a baseball uniform throwing a ball
A cat sitting on a suitcase on the floor
A woman is holding a cat in her hand
A woman standing on a beach holding a surfboard
All images are CC0 public domain:
https://pixabay.com/en/luggage-antique-cat-1643010/
https://pixabay.com/en/teddy-plush-bears-cute-teddy-bear-1623436/
https://pixabay.com/en/surf-wave-summer-sport-litoral-1668716/
https://pixabay.com/en/woman-female-model-portrait-adult-983967/
https://pixabay.com/en/handstand-lake-meditation-496008/
https://pixabay.com/en/baseball-player-shortstop-infield-1045263/
Captions generated by Justin Johnson using Neuraltalk2
No errors
Minor errors
Somewhat related
Convolutional Neural Networks
(First without the brain stuff)
Fully Connected Layer
32x32x3 image -> stretch to 3072 x 1
input: 3072 x 1
weights: W is 10 x 3072
activation: 10 x 1
1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)
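To make the shapes concrete, here is a minimal NumPy sketch of this fully connected layer (the random image and weight values are illustrative placeholders, not from the lecture):

    import numpy as np

    x = np.random.randn(32, 32, 3)   # input image
    x = x.reshape(3072)              # stretch to 3072 x 1
    W = np.random.randn(10, 3072)    # 10 x 3072 weights
    b = np.random.randn(10)          # one bias per output
    activation = W @ x + b           # each entry is a 3072-dim dot product
    print(activation.shape)          # (10,)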
Convolution Layer
32x32x3 image -> preserve spatial structure
32 height x 32 width x 3 depth
Convolution Layer
32x32x3 image, 5x5x3 filter
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products"
Filters always extend the full depth of the input volume.
Convolution Layer
32x32x3 image, 5x5x3 filter
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
Convolution Layer
32x32x3 image, 5x5x3 filter
convolve (slide) over all spatial locations -> a 28x28x1 activation map

Convolution Layer
consider a second, green filter: another 5x5x3 filter, convolved over all spatial locations, gives a second 28x28x1 activation map
For example, if we had 6 5x5 filters, we'll get 6 separate activation maps.
We stack these up to get a "new image" of size 28x28x6!
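As a rough illustration of "slide over the image spatially, computing dot products", here is a naive NumPy sketch of the forward pass (the function name and random weights are assumptions for illustration, not the lecture's code):

    import numpy as np

    def conv_forward_naive(x, w, b, stride=1):
        """Naive convolution: x is HxWxC, w is (K, F, F, C), b is (K,)."""
        H, W_, C = x.shape
        K, F, _, _ = w.shape
        H_out = (H - F) // stride + 1
        W_out = (W_ - F) // stride + 1
        out = np.zeros((H_out, W_out, K))
        for k in range(K):               # one filter => one activation map
            for i in range(H_out):
                for j in range(W_out):
                    chunk = x[i*stride:i*stride+F, j*stride:j*stride+F, :]
                    out[i, j, k] = np.sum(chunk * w[k]) + b[k]  # 75-dim dot product + bias
        return out

    x = np.random.randn(32, 32, 3)
    w = np.random.randn(6, 5, 5, 3)      # 6 filters of 5x5x3
    b = np.zeros(6)
    print(conv_forward_naive(x, w, b).shape)   # (28, 28, 6)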
Preview: a ConvNet is a sequence of Convolution Layers, interspersed with activation functions
32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> CONV, ReLU -> ...

Preview [Zeiler and Fergus 2013]
Visualization of VGG-16 by Lane McIntosh. VGG-16 architecture from [Simonyan and Zisserman 2014].
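A sketch of such a sequence (conv_relu is a hypothetical helper: stride 1, no padding, random weights, bias omitted for brevity):

    import numpy as np

    def conv_relu(x, num_filters, F=5):
        """One CONV (stride 1, no padding) followed by ReLU."""
        H, W, C = x.shape
        w = np.random.randn(num_filters, F, F, C) * 0.01
        out = np.zeros((H - F + 1, W - F + 1, num_filters))
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                chunk = x[i:i+F, j:j+F, :]
                out[i, j] = np.tensordot(w, chunk, axes=([1, 2, 3], [0, 1, 2]))
        return np.maximum(out, 0)        # ReLU

    x = np.random.randn(32, 32, 3)
    h1 = conv_relu(x, 6)                 # (28, 28, 6)
    h2 = conv_relu(h1, 10)               # (24, 24, 10)
    print(h1.shape, h2.shape)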
one filter => one activation map
example 5x5 filters (32 total)
We call the layer convolutional because it is related to convolution of two signals: elementwise multiplication and sum of a filter and the signal (image).
Figure copyright Andrej Karpathy.
A closer look at spatial dimensions:
32x32x3 image, 5x5x3 filter
convolve (slide) over all spatial locations -> 28x28x1 activation map
A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter
=> 5x5 output
A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter applied with stride 2
=> 3x3 output!
A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter applied with stride 3?
doesn't fit! cannot apply 3x3 filter on 7x7 input with stride 3.

Output size: (N - F) / stride + 1  (N = input size, F = filter size)
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\
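A small helper expressing this formula (a hypothetical function mirroring the three cases on the slide):

    def conv_output_size(N, F, stride):
        """Spatial output size of a convolution: (N - F) / stride + 1."""
        if (N - F) % stride != 0:
            raise ValueError(f"{F}x{F} filter with stride {stride} "
                             f"does not fit a {N}x{N} input")
        return (N - F) // stride + 1

    print(conv_output_size(7, 3, 1))   # 5
    print(conv_output_size(7, 3, 2))   # 3
    # conv_output_size(7, 3, 3) -> ValueError: doesn't fit (would be 2.33)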
In practice: Common to zero pad the border
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output?
(recall: (N - F) / stride + 1)
7x7 output!
In general, common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2. (will preserve size spatially)
e.g. F = 3 => zero pad with 1; F = 5 => zero pad with 2; F = 7 => zero pad with 3
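With padding P, the output size generalizes to (N + 2P - F)/stride + 1. A quick sketch extending the helper above (again a hypothetical function, not the lecture's code):

    def conv_output_size_padded(N, F, stride, pad):
        """Spatial output size with zero padding: (N + 2*pad - F) / stride + 1."""
        return (N + 2 * pad - F) // stride + 1

    print(conv_output_size_padded(7, 3, 1, 1))   # 7: padding (F-1)/2 preserves size
    for F in (3, 5, 7):
        pad = (F - 1) // 2
        print(F, pad, conv_output_size_padded(32, F, 1, pad))   # always 32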
Remember back to…
E.g. 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially! (32 -> 28 -> 24 ...). Shrinking too fast is not good; it doesn't work well.
32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> ...
Examples time:
Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2

Output volume size: (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

Number of parameters in this layer?
Each filter has 5*5*3 + 1 = 76 params (+1 for the bias) => 76*10 = 760
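A quick sanity check of both answers in Python:

    N, F, S, P, K, C = 32, 5, 1, 2, 10, 3

    out = (N + 2 * P - F) // S + 1
    print(out, out, K)                 # 32 32 10 -> output volume 32x32x10

    params = (F * F * C + 1) * K       # +1 per filter for the bias
    print(params)                      # 760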
Common settings:
K = (powers of 2, e.g. 32, 64, 128, 512)
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0
(btw, 1x1 convolution layers make perfect sense)
56x56x64 input -> 1x1 CONV with 32 filters -> 56x56x32
(each filter has size 1x1x64, and performs a 64-dimensional dot product)
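In NumPy terms (a sketch with random placeholder values), a 1x1 convolution is just a per-position matrix multiply across depth:

    import numpy as np

    x = np.random.randn(56, 56, 64)    # input volume
    w = np.random.randn(32, 64)        # 32 filters, each 1x1x64
    b = np.zeros(32)

    # At every spatial position, each filter does a 64-dim dot product:
    out = np.einsum('hwc,kc->hwk', x, w) + b
    print(out.shape)                   # (56, 56, 32)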
Example: CONV layer in Torch
Torch is licensed under BSD 3-clause.
Example: CONV layer in Caffe
Caffe is licensed under BSD 2-Clause.
The brain/neuron view of CONV Layer
32x32x3 image, 5x5x3 filter
1 number: the result of taking a dot product between the filter and this part of the image (i.e. 5*5*3 = 75-dimensional dot product)
The brain/neuron view of CONV Layer
It's just a neuron with local connectivity...
The brain/neuron view of CONV Layer
An activation map is a 28x28 sheet of neuron outputs:
1. Each is connected to a small region in the input
2. All of them share parameters
"5x5 filter" -> "5x5 receptive field for each neuron"
The brain/neuron view of CONV Layer
E.g. with 5 filters, the CONV layer consists of neurons arranged in a 3D grid (28x28x5).
There will be 5 different neurons all looking at the same region in the input volume.
Reminder: Fully Connected Layer
Each neuron looks at the full input volume.
32x32x3 image -> stretch to 3072 x 1
input: 3072 x 1; weights: 10 x 3072; activation: 10 x 1
1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)

two more layers to go: POOL/FC
Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently
MAX POOLING
Single depth slice (x, y are the spatial axes):

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

max pool with 2x2 filters and stride 2 ->

6 8
3 4
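A minimal NumPy sketch reproducing the slide's example (assumes the input divides evenly into 2x2 windows):

    import numpy as np

    def max_pool_2x2(x):
        """2x2 max pooling with stride 2 on a single depth slice."""
        H, W = x.shape
        return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

    x = np.array([[1, 1, 2, 4],
                  [5, 6, 7, 8],
                  [3, 2, 1, 0],
                  [1, 2, 3, 4]])
    print(max_pool_2x2(x))
    # [[6 8]
    #  [3 4]]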
Common settings:
F = 2, S = 2
F = 3, S = 2
Fully Connected Layer (FC layer)
- Contains neurons that connect to the entire input volume, as in ordinary Neural Networks
[ConvNetJS demo: training on CIFAR-10]
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
Summary
- ConvNets stack CONV, POOL, FC layers
- Trend towards smaller filters and deeper architectures
- Trend towards getting rid of POOL/FC layers (just CONV)
- Typical architectures look like [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX where N is usually up to ~5, M is large, 0 <= K <= 2.
- but recent advances such as ResNet/GoogLeNet challenge this paradigm
>>> import math
>>> z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
>>> z_exp = [math.exp(i) for i in z]
>>> print([round(i, 2) for i in z_exp])
[2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
>>> sum_z_exp = sum(z_exp)
>>> print(round(sum_z_exp, 2))
114.98
>>> softmax = [round(i / sum_z_exp, 3) for i in z_exp]
>>> print(softmax)
[0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]