Lecture 5: Convolutional Neural Networks
Fei-Fei Li & Justin Johnson & Serena Yeung
April 18, 2017
Next: Convolutional Neural Networks
Illustration of LeCun et al. 1998 from CS231n 2017 Lecture 1

A bit of history...
Rumelhart et al., 1986: first time back-propagation became popular
recognizable math
Illustration of Rumelhart et al., 1986 by Lane McIntosh, copyright CS231n 2017
A bit of history... [Hinton and Salakhutdinov 2006]
Reinvigorated research in Deep Learning
Illustration of Hinton and Salakhutdinov 2006 by Lane McIntosh, copyright CS231n 2017
A bit of history: Hubel & Wiesel
1959: RECEPTIVE FIELDS OF SINGLE NEURONES IN THE CAT'S STRIATE CORTEX
1962: RECEPTIVE FIELDS, BINOCULAR INTERACTION AND FUNCTIONAL ARCHITECTURE IN THE CAT'S VISUAL CORTEX
1968...
Cat image by CNX OpenStax is licensed under CC BY 4.0; changes made
A bit of history: Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998]
LeNet-5
A bit of history: ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky, Sutskever, Hinton, 2012]
"AlexNet"
Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Fast-forward to today: ConvNets are everywhere
Classification
Retrieval
Figures copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012. Reproduced with permission.
Image Captioning [Vinyals et al., 2015] [Karpathy and Fei-Fei, 2015]
A white teddy bear sitting in the grass
A man riding a wave on top of a surfboard
A man in a baseball uniform throwing a ball
A cat sitting on a suitcase on the floor
A woman is holding a cat in her hand
A woman standing on a beach holding a surfboard
All images are CC0 public domain:
https://pixabay.com/en/luggage-antique-cat-1643010/
https://pixabay.com/en/teddy-plush-bears-cute-teddy-bear-1623436/
https://pixabay.com/en/surf-wave-summer-sport-litoral-1668716/
https://pixabay.com/en/woman-female-model-portrait-adult-983967/
https://pixabay.com/en/handstand-lake-meditation-496008/
https://pixabay.com/en/baseball-player-shortstop-infield-1045263/
Captions generated by Justin Johnson using Neuraltalk2
No errors
Minor errors
Somewhat related
Convolutional Neural Networks
(First without the brain stuff)
Fully Connected Layer
32x32x3 image -> stretch to 3072 x 1
input: 3072 x 1
weights: W is 10 x 3072
activation: 10 x 1
1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)
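To make the shapes concrete, here is a minimal NumPy sketch of this fully connected layer (the random image and weight values are illustrative placeholders, not from the lecture):

    import numpy as np

    x = np.random.randn(32, 32, 3)   # input image
    x = x.reshape(3072)              # stretch to 3072 x 1
    W = np.random.randn(10, 3072)    # 10 x 3072 weights
    b = np.random.randn(10)          # one bias per output
    activation = W @ x + b           # each entry is a 3072-dim dot product
    print(activation.shape)          # (10,)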
Convolution Layer
32x32x3 image -> preserve spatial structure
32 height x 32 width x 3 depth
Convolution Layer
32x32x3 image, 5x5x3 filter
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products"
Filters always extend the full depth of the input volume.
Convolution Layer
32x32x3 image, 5x5x3 filter
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
Convolution Layer
32x32x3 image, 5x5x3 filter
convolve (slide) over all spatial locations -> a 28x28x1 activation map

Convolution Layer
consider a second, green filter: another 5x5x3 filter, convolved over all spatial locations, gives a second 28x28x1 activation map
For example, if we had 6 5x5 filters, we'll get 6 separate activation maps.
We stack these up to get a "new image" of size 28x28x6!
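As a rough illustration of "slide over the image spatially, computing dot products", here is a naive NumPy sketch of the forward pass (the function name and random weights are assumptions for illustration, not the lecture's code):

    import numpy as np

    def conv_forward_naive(x, w, b, stride=1):
        """Naive convolution: x is HxWxC, w is (K, F, F, C), b is (K,)."""
        H, W_, C = x.shape
        K, F, _, _ = w.shape
        H_out = (H - F) // stride + 1
        W_out = (W_ - F) // stride + 1
        out = np.zeros((H_out, W_out, K))
        for k in range(K):               # one filter => one activation map
            for i in range(H_out):
                for j in range(W_out):
                    chunk = x[i*stride:i*stride+F, j*stride:j*stride+F, :]
                    out[i, j, k] = np.sum(chunk * w[k]) + b[k]  # 75-dim dot product + bias
        return out

    x = np.random.randn(32, 32, 3)
    w = np.random.randn(6, 5, 5, 3)      # 6 filters of 5x5x3
    b = np.zeros(6)
    print(conv_forward_naive(x, w, b).shape)   # (28, 28, 6)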
Preview: a ConvNet is a sequence of Convolution Layers, interspersed with activation functions
32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> CONV, ReLU -> ...

Preview [Zeiler and Fergus 2013]
Visualization of VGG-16 by Lane McIntosh. VGG-16 architecture from [Simonyan and Zisserman 2014].
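A sketch of such a sequence (conv_relu is a hypothetical helper: stride 1, no padding, random weights, bias omitted for brevity):

    import numpy as np

    def conv_relu(x, num_filters, F=5):
        """One CONV (stride 1, no padding) followed by ReLU."""
        H, W, C = x.shape
        w = np.random.randn(num_filters, F, F, C) * 0.01
        out = np.zeros((H - F + 1, W - F + 1, num_filters))
        for i in range(H - F + 1):
            for j in range(W - F + 1):
                chunk = x[i:i+F, j:j+F, :]
                out[i, j] = np.tensordot(w, chunk, axes=([1, 2, 3], [0, 1, 2]))
        return np.maximum(out, 0)        # ReLU

    x = np.random.randn(32, 32, 3)
    h1 = conv_relu(x, 6)                 # (28, 28, 6)
    h2 = conv_relu(h1, 10)               # (24, 24, 10)
    print(h1.shape, h2.shape)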
one filter => one activation map
example 5x5 filters (32 total)
We call the layer convolutional because it is related to convolution of two signals: elementwise multiplication and sum of a filter and the signal (image).
Figure copyright Andrej Karpathy.
A closer look at spatial dimensions:
32x32x3 image, 5x5x3 filter
convolve (slide) over all spatial locations -> 28x28x1 activation map
A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter
=> 5x5 output
A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter applied with stride 2
=> 3x3 output!
A closer look at spatial dimensions:
7x7 input (spatially), assume 3x3 filter applied with stride 3?
doesn't fit! cannot apply 3x3 filter on 7x7 input with stride 3.

Output size: (N - F) / stride + 1  (N = input size, F = filter size)
e.g. N = 7, F = 3:
stride 1 => (7 - 3)/1 + 1 = 5
stride 2 => (7 - 3)/2 + 1 = 3
stride 3 => (7 - 3)/3 + 1 = 2.33 :\
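A small helper expressing this formula (a hypothetical function mirroring the three cases on the slide):

    def conv_output_size(N, F, stride):
        """Spatial output size of a convolution: (N - F) / stride + 1."""
        if (N - F) % stride != 0:
            raise ValueError(f"{F}x{F} filter with stride {stride} "
                             f"does not fit a {N}x{N} input")
        return (N - F) // stride + 1

    print(conv_output_size(7, 3, 1))   # 5
    print(conv_output_size(7, 3, 2))   # 3
    # conv_output_size(7, 3, 3) -> ValueError: doesn't fit (would be 2.33)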
In practice: Common to zero pad the border
e.g. input 7x7, 3x3 filter applied with stride 1, pad with 1 pixel border => what is the output?
(recall: (N - F) / stride + 1)
7x7 output!
In general, common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2. (will preserve size spatially)
e.g. F = 3 => zero pad with 1; F = 5 => zero pad with 2; F = 7 => zero pad with 3
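With padding P, the output size generalizes to (N + 2P - F)/stride + 1. A quick sketch extending the helper above (again a hypothetical function, not the lecture's code):

    def conv_output_size_padded(N, F, stride, pad):
        """Spatial output size with zero padding: (N + 2*pad - F) / stride + 1."""
        return (N + 2 * pad - F) // stride + 1

    print(conv_output_size_padded(7, 3, 1, 1))   # 7: padding (F-1)/2 preserves size
    for F in (3, 5, 7):
        pad = (F - 1) // 2
        print(F, pad, conv_output_size_padded(32, F, 1, pad))   # always 32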
Remember back to…
E.g. 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially! (32 -> 28 -> 24 ...). Shrinking too fast is not good; it doesn't work well.
32x32x3 -> CONV, ReLU (e.g. 6 5x5x3 filters) -> 28x28x6 -> CONV, ReLU (e.g. 10 5x5x6 filters) -> 24x24x10 -> ...
Examples time:
Input volume: 32x32x3
10 5x5 filters with stride 1, pad 2

Output volume size: (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

Number of parameters in this layer?
Each filter has 5*5*3 + 1 = 76 params (+1 for the bias) => 76*10 = 760
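A quick sanity check of both answers in Python:

    N, F, S, P, K, C = 32, 5, 1, 2, 10, 3

    out = (N + 2 * P - F) // S + 1
    print(out, out, K)                 # 32 32 10 -> output volume 32x32x10

    params = (F * F * C + 1) * K       # +1 per filter for the bias
    print(params)                      # 760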
Common settings:
K = (powers of 2, e.g. 32, 64, 128, 512)
- F = 3, S = 1, P = 1
- F = 5, S = 1, P = 2
- F = 5, S = 2, P = ? (whatever fits)
- F = 1, S = 1, P = 0
(btw, 1x1 convolution layers make perfect sense)
56x56x64 input -> 1x1 CONV with 32 filters -> 56x56x32
(each filter has size 1x1x64, and performs a 64-dimensional dot product)
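In NumPy terms (a sketch with random placeholder values), a 1x1 convolution is just a per-position matrix multiply across depth:

    import numpy as np

    x = np.random.randn(56, 56, 64)    # input volume
    w = np.random.randn(32, 64)        # 32 filters, each 1x1x64
    b = np.zeros(32)

    # At every spatial position, each filter does a 64-dim dot product:
    out = np.einsum('hwc,kc->hwk', x, w) + b
    print(out.shape)                   # (56, 56, 32)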
Example: CONV layer in Torch
Torch is licensed under BSD 3-clause.
Example: CONV layer in Caffe
Caffe is licensed under BSD 2-Clause.
The brain/neuron view of CONV Layer
32x32x3 image, 5x5x3 filter
1 number: the result of taking a dot product between the filter and this part of the image (i.e. 5*5*3 = 75-dimensional dot product)
The brain/neuron view of CONV Layer
It's just a neuron with local connectivity...
The brain/neuron view of CONV Layer
An activation map is a 28x28 sheet of neuron outputs:
1. Each is connected to a small region in the input
2. All of them share parameters
"5x5 filter" -> "5x5 receptive field for each neuron"
The brain/neuron view of CONV Layer
E.g. with 5 filters, the CONV layer consists of neurons arranged in a 3D grid (28x28x5).
There will be 5 different neurons all looking at the same region in the input volume.
Reminder: Fully Connected Layer
Each neuron looks at the full input volume.
32x32x3 image -> stretch to 3072 x 1
input: 3072 x 1; weights: 10 x 3072; activation: 10 x 1
1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)

two more layers to go: POOL/FC
Pooling layer
- makes the representations smaller and more manageable
- operates over each activation map independently
MAX POOLING
Single depth slice (x, y are the spatial axes):

1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

max pool with 2x2 filters and stride 2 ->

6 8
3 4
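A minimal NumPy sketch reproducing the slide's example (assumes the input divides evenly into 2x2 windows):

    import numpy as np

    def max_pool_2x2(x):
        """2x2 max pooling with stride 2 on a single depth slice."""
        H, W = x.shape
        return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

    x = np.array([[1, 1, 2, 4],
                  [5, 6, 7, 8],
                  [3, 2, 1, 0],
                  [1, 2, 3, 4]])
    print(max_pool_2x2(x))
    # [[6 8]
    #  [3 4]]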
Common settings:
F = 2, S = 2
F = 3, S = 2
Fully Connected Layer (FC layer)
- Contains neurons that connect to the entire input volume, as in ordinary Neural Networks
[ConvNetJS demo: training on CIFAR-10]
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
Summary
- ConvNets stack CONV, POOL, FC layers
- Trend towards smaller filters and deeper architectures
- Trend towards getting rid of POOL/FC layers (just CONV)
- Typical architectures look like [(CONV-RELU)*N-POOL?]*M-(FC-RELU)*K,SOFTMAX where N is usually up to ~5, M is large, 0 <= K <= 2.
- but recent advances such as ResNet/GoogLeNet challenge this paradigm
>>> import math
>>> z = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
>>> z_exp = [math.exp(i) for i in z]
>>> print([round(i, 2) for i in z_exp])
[2.72, 7.39, 20.09, 54.6, 2.72, 7.39, 20.09]
>>> sum_z_exp = sum(z_exp)
>>> print(round(sum_z_exp, 2))
114.98
>>> softmax = [round(i / sum_z_exp, 3) for i in z_exp]
>>> print(softmax)
[0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]