Adaline/Madaline
Dr. Bernard Widrow* Professor of Electrical Engineering, Stanford University
Dr. Bernard Widrow is Professor of Electrical Engineering at Stanford University. His fields of teaching and research are signal processing, neural networks, acoustics, and control systems. Before coming to Stanford in 1959, he taught at MIT, where he received the Doctor of Science degree in 1956. Dr. Widrow is the author of two books, “Adaptive Signal Processing” and “Adaptive Inverse Control,” both published by Prentice-Hall. Each is the first of its kind, establishing new fields of research and engineering that are being pursued worldwide by students, faculty, and practicing engineers. Dr. Widrow is the inventor or co-inventor of 17 patents. One of his inventions, an adaptive filter based on the LMS (least mean square) algorithm, is used in almost all the computer modems in the world, making high-speed digital communications (such as the internet) possible. He is co-inventor of a directional hearing aid that will enable many people with severe to profound hearing loss to regain speech recognition and communication ability. Dr. Widrow has started Cardinal Sound Labs to develop and commercialize the technology. He has been honored many times for his research. The Institute of Electrical and Electronics Engineers (IEEE) elected him a Fellow in 1976. In 1984, he received the IEEE Alexander Graham Bell Medal. He was inducted into the National Academy of Engineering in 1995. Dr. Widrow is currently supervising ten doctoral students at Stanford. Over the years, more than sixty students have completed their Ph.D.’s under his supervision. Many of his former students have become founders and top scientists in Silicon Valley companies. About ten have become university professors, four have gone on to medical school and become MDs, and two have become Admirals in the U.S. Navy. *http://www.svec.org/hof/1999.html#widrow
Adaline
• Name comes from Adaptive Linear neuron
– A tribute to its resemblance to a single biological nerve cell
• Invented by Bernard Widrow in 1959
• Like the perceptron, it uses a threshold logic device that performs a linear summation of inputs (classifies linearly separable patterns)
– Its weight parameters are adapted over time.
Judith Dayhoff, Neural Network Architectures: An Introduction, Van Nostrand Reinhold
Adaline Structure
Neural Computing: NeuralWorks, NeuralWare, Inc
Adaline Learning Algorithm
• A learning control mechanism samples the inputs, the output, and the desired output and uses these to adjust the weights.
• There are several variants of the adaline learning algorithm
– We use B. Widrow and F. W. Smith, “Pattern-Recognizing Control Systems,” Computer and Information Sciences Symposium Proceedings, Spartan Books, Washington, DC, 1963.
$$W_i(t+1) = W_i(t) + \eta\left[d(t) - \sum_{j=0}^{n} W_j(t)\,X_j(t)\right]X_i(t)$$

where 0 ≤ i ≤ n, and η is the learning rate, usually a small number ranging from 0 to 1 (typically η ≤ 1/n).

Neural Computing: NeuralWorks, NeuralWare, Inc
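To make the update concrete, here is a minimal Python sketch of a single LMS update, assuming bipolar inputs and a fixed bias input x0 = 1; the name adaline_update and the example values are illustrative, not part of the original material.

def adaline_update(weights, x, d, eta):
    # One Widrow-Hoff (LMS) update. x[0] is the fixed bias input (always 1).
    excitation = sum(w * xi for w, xi in zip(weights, x))   # linear sum of all inputs
    error = d - excitation
    # Every weight moves in proportion to the error and to its own input value.
    return [w + eta * error * xi for w, xi in zip(weights, x)]

# Example: two inputs plus bias, target d = +1, learning rate eta = 0.25
w = [0.0, 0.1, -0.2]                      # [W0 (bias), W1, W2]
w = adaline_update(w, [1, -1, 1], d=+1, eta=0.25)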
Adaline Learning Algorithm
• Computes the error signal for each iteration and adjusts the weights to eliminate the error using the delta rule, also known as the Widrow-Hoff learning rule
– This algorithm has been shown to guarantee that the set of weights exists and, at the very least, that the set of weights will minimize the error in a least-mean-square (LMS) sense
Neural Computing: NeuralWorks, NeuralWare, Inc
Least Mean Square Error
• The delta rule for adjusting the ith weight for each training pattern is

$$\Delta W_i(t+1) = \eta\left[d(t) - \sum_{j=0}^{n} W_j(t)\,X_j(t)\right]X_i(t)$$

• The squared error for a particular training pattern is

$$E = \left[d(t) - \sum_{j=0}^{n} W_j(t)\,X_j(t)\right]^2$$

L. Fausett, Fundamentals of Neural Networks, Prentice Hall
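As a quick numeric illustration (the weights, inputs, and target below are arbitrary values chosen for this example, not taken from the slides):

$$W = (W_0, W_1, W_2) = (0.1,\ 0.5,\ -0.25), \qquad X = (1,\ 1,\ 1), \qquad d = 1$$
$$E = \bigl[1 - (0.1\cdot 1 + 0.5\cdot 1 - 0.25\cdot 1)\bigr]^2 = (1 - 0.35)^2 = 0.4225$$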
Least Mean Square Error (Cont.)
• The error can be reduced by adjusting the weight $W_i$ in the direction of the negative gradient $-\partial E / \partial W_i$

$$E = \left[d(t) - \sum_{j=0}^{n} W_j(t)\,X_j(t)\right]^2 = d^2(t) - 2\,d(t)\sum_{j=0}^{n} W_j(t)\,X_j(t) + \left[\sum_{j=0}^{n} W_j(t)\,X_j(t)\right]^2$$

and

$$\frac{\partial E}{\partial W_i} = -2\left[d(t) - \sum_{j=0}^{n} W_j(t)\,X_j(t)\right]X_i(t)$$

The local error will be reduced most rapidly (for a given learning rate) by adjusting the weights according to the delta rule:

$$\Delta W_i(t+1) = \eta\left[d(t) - \sum_{j=0}^{n} W_j(t)\,X_j(t)\right]X_i(t)$$

L. Fausett, Fundamentals of Neural Networks, Prentice Hall
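To see that the delta rule really follows the negative gradient, a small numerical check in plain Python can compare the analytic derivative above against a finite-difference estimate; all values and names below are illustrative.

def squared_error(w, x, d):
    s = sum(wj * xj for wj, xj in zip(w, x))   # linear excitation
    return (d - s) ** 2

w = [0.1, 0.5, -0.25]    # W0 (bias), W1, W2 -- arbitrary values
x = [1.0, 1.0, -1.0]     # X0 = 1 is the bias input
d = 1.0
eps = 1e-6

for i in range(len(w)):
    w_plus = list(w); w_plus[i] += eps
    w_minus = list(w); w_minus[i] -= eps
    numeric = (squared_error(w_plus, x, d) - squared_error(w_minus, x, d)) / (2 * eps)
    s = sum(wj * xj for wj, xj in zip(w, x))
    analytic = -2 * (d - s) * x[i]                    # the slide's dE/dW_i
    print(i, round(numeric, 6), round(analytic, 6))   # the two columns should agree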
Adaline: Storage Capacity*

N/(n+1)    Probability of storing N patterns    Valid for
1.0        1.0                                  n > 5
2.0        0.5                                  n > 5
3.0        0.0                                  n > 5
1.5        1.0                                  n > 50
2.0        0.5                                  n > 50
2.5        0.0                                  n > 50

* Estimates of the storage capacity for an adaline have been made and experimentally verified.
N = number of patterns to be trained, n = number of weights (number of input weights + 1)

Neural Computing: NeuralWorks, NeuralWare, Inc
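A rough empirical check of this kind of capacity estimate can be run in Python; the sketch below is not the original NeuralWorks experiment, numpy is assumed, and every parameter (trials, epochs, eta) is an arbitrary illustrative choice.

import numpy as np

def trainable_fraction(N, n, trials=50, epochs=200, eta=0.05, seed=0):
    """Fraction of trials in which an adaline learns N random bipolar patterns with n inputs."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        X = np.hstack([np.ones((N, 1)), rng.choice([-1.0, 1.0], size=(N, n))])  # bias column
        d = rng.choice([-1.0, 1.0], size=N)             # random desired outputs
        w = rng.uniform(-0.1, 0.1, size=n + 1)
        for _ in range(epochs):
            for x, t in zip(X, d):
                w += eta * (t - x @ w) * x              # LMS update
        if np.all(np.where(X @ w > 0, 1.0, -1.0) == d): # all patterns on the correct side?
            successes += 1
    return successes / trials

# Compare a light load (N = n+1) with a heavy load (N = 3(n+1)) for n = 10 inputs
print(trainable_fraction(N=11, n=10), trainable_fraction(N=33, n=10))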
Adaline: Learning Procedure
Step 1: Initialize Weights (W1..Wn) and Threshold (W0)
• Set all weights and the threshold to small bipolar random values (±).

Step 2: Present New Input and Desired Output
• Present input vector x1, x2, ..., xn along with the desired output d(t).
Note: x0 is a fixed bias and is always set equal to 1; d(t) takes the value ±1.
Adaline: Learning Procedure
Step 3: Calculate Actual Output [y(t)]

$$y(t) = F_h\!\left[\sum_{i=0}^{n} w_i(t)\,x_i(t)\right]$$

where F_h(e) = +1 when e > 0, and -1 when e ≤ 0.

Step 4: Adapt Weights

$$W_i(t+1) = W_i(t) + \eta\left[d(t) - \sum_{j=0}^{n} W_j(t)\,X_j(t)\right]X_i(t), \qquad 0 \le i \le n$$

where η is the learning rate and usually is a small number ranging from 0 to 1.
Note: w_i(t+1) = w_i(t) if d(t) = y(t).
Adaline: Learning Procedure
Step 5: Repeat Steps 2 to 4
• Repeat until the desired outputs and the actual network outputs are equal for all the input vectors of the training set.
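Putting Steps 1–5 together, a minimal Python sketch of the whole training loop might look like the following; the names train_adaline and f_h, the initialization range, and the stopping logic are illustrative assumptions rather than the NeuralWorks implementation.

import random

def f_h(e):
    """Hard-limiting threshold: +1 if e > 0, else -1."""
    return 1 if e > 0 else -1

def train_adaline(patterns, targets, eta=0.1, max_epochs=1000, seed=0):
    """patterns: input vectors (without bias); targets: desired outputs of +/-1."""
    random.seed(seed)
    n = len(patterns[0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n + 1)]       # Step 1: w[0] is W0
    for _ in range(max_epochs):
        all_correct = True
        for p, d in zip(patterns, targets):
            x = [1] + list(p)                                   # Step 2: bias x0 = 1
            excitation = sum(wi * xi for wi, xi in zip(w, x))
            y = f_h(excitation)                                 # Step 3: actual output
            if y != d:                                          # Step 4: adapt on error only
                all_correct = False
                w = [wi + eta * (d - excitation) * xi for wi, xi in zip(w, x)]
        if all_correct:                                         # Step 5: stop when all agree
            break
    return w

# Example: a linearly separable task (logical AND on bipolar inputs)
weights = train_adaline([(-1, -1), (-1, 1), (1, -1), (1, 1)], [-1, -1, -1, 1])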
Thoughts on Adaline
• Similar basic neural structure as the Perceptron
• A single adaline can only classify linearly separable patterns
• The Widrow and Hoff update rule guarantees that the set of weights will minimize the error in a least-mean-square sense, and thus the local error will be reduced most rapidly
– Experimental results indicate that an adaline will typically converge to a stable solution in five times as many learning trials as there are weights
Neural Computing: NeuralWorks, NeuralWare, Inc
Madaline
• Consists of Many Adaptive Linear Neurons arranged in a multilayer net.
• Employs a majority-vote rule on the outputs of the adaline layer
– If more than half of the adalines output +1, then the madaline unit outputs +1 (and similarly for -1)
• Able to classify nonlinear functions, similar to a multi-layer Perceptron
• The original learning rule uses the Widrow and Hoff rule
Judith Dayhoff, Neural Network Architectures: An Introduction, Van Nostrand Reinhold
Madaline Structure
[Figure: an input layer feeding an adaline layer through adjustable weights; the adaline outputs feed a majority function that produces the madaline output.]
Neural Computing: NeuralWorks, NeuralWare, Inc
Madaline: Learning Procedure
Step 1: Initialize Weights (Wk1..Wkn) and Threshold (Wk0)
• Set all weights and thresholds to small bipolar random values (±).
– k represents adaline unit k, and
– n represents the number of inputs to each adaline unit

Step 2: Present New Input and Desired Output
• Present input vector x1, x2, ..., xn along with the desired output d(t).
• Note: x0 is a fixed bias and is always set equal to 1; d(t) takes the value ±1.
Madaline: Learning Procedure
Step 3: Calculate Actual Adaline Outputs [yk(t)]

$$y_k(t) = F_h\!\left[\sum_{i=0}^{n} w_{ki}(t)\,x_{ki}(t)\right]$$

where F_h(e) = +1 when e > 0, and -1 when e ≤ 0; y_k(t) is the output from adaline unit k.

Step 4: Determine Actual Madaline Output [M(t)]

M(t) = Majority(y_k(t))
Madaline: Learning Procedure
Step 5: Determine Error and Update Weights
If M(t) = desired output, there is no need to update the weights. Otherwise: in a madaline network, the processing elements in the adaline layer “compete.” The winner is the neuron whose excitation (weighted sum) is nearest to zero but whose output is wrong. Only this neuron is adapted:

$$W_{ci}(t+1) = W_{ci}(t) + \eta\left[d(t) - \sum_{j=0}^{n} W_{cj}(t)\,X_j(t)\right]X_i(t)$$

where c is the winning adaline unit, 0 ≤ i ≤ n, and η is the learning rate (typically η ≤ 1/n).
Madaline: Learning Procedure
Step 6: Repeat Steps 2 to 5
Repeat until the desired outputs and the actual network outputs are equal for all the input vectors of the training set.
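A compact Python sketch of Steps 1–6 is given below; it is an illustrative implementation of this minimal-disturbance idea (not NeuralWare's code), assuming three adalines and bipolar inputs with a fixed bias x0 = 1.

import random

def f_h(e):
    """Hard-limiting threshold: +1 if e > 0, else -1."""
    return 1 if e > 0 else -1

def train_madaline(patterns, targets, n_adalines=3, eta=0.1, max_epochs=1000, seed=0):
    random.seed(seed)
    n = len(patterns[0])
    # Step 1: one weight vector [Wk0 (bias), Wk1..Wkn] per adaline unit k.
    W = [[random.uniform(-0.5, 0.5) for _ in range(n + 1)] for _ in range(n_adalines)]
    for _ in range(max_epochs):
        all_correct = True
        for p, d in zip(patterns, targets):
            x = [1] + list(p)                                           # Step 2: bias x0 = 1
            excitations = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
            outputs = [f_h(e) for e in excitations]                     # Step 3
            M = 1 if sum(outputs) > 0 else -1                           # Step 4: majority vote
            if M != d:                                                  # Step 5: adapt one unit
                all_correct = False
                wrong = [k for k, y in enumerate(outputs) if y != d]
                c = min(wrong, key=lambda k: abs(excitations[k]))       # nearest to zero, wrong output
                W[c] = [wi + eta * (d - excitations[c]) * xi for wi, xi in zip(W[c], x)]
        if all_correct:                                                 # Step 6
            break
    return W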
Madaline: Example
• Train a Madaline to recognize the following:

X1    X2    Desired O/P
-1    -1    -1
-1    +1    +1
+1    -1    +1
+1    +1    -1

[Plot: the four training patterns in the x1-x2 plane]
Madaline: Example (Cont.)
[Figure: the madaline for this example — inputs x1 and x2, plus a fixed bias input “1”, feed three adalines #1, #2, #3 through adjustable weights wk0, wk1, wk2 (k = 1, 2, 3); the three adaline outputs feed a majority (Maj) unit that produces the output.]
Madaline: Example (Cont.)
Step 1: Set all weights and thresholds to small bipolar random values:
w10 = 0.0037, w11 = 0.3566, w12 = -0.43
w20 = -0.2779, w21 = 0.0232, w22 = 0.1117
w30 = -0.3823, w31 = 0.2843, w32 = 0.455
Madaline: Example (Cont.)
• Excitation (Line Equation): each adaline k defines the decision line wk0 + wk1·x1 + wk2·x2 = 0
[Plot: the three initial decision lines in the x1-x2 plane]
Madaline: Example (cont.)
Step 2: Present new input and desired output
– Let's apply input (-1, -1) with desired output = -1

Step 3: Calculate actual outputs [yk(t)]
y1(t) = F(0.0037 + (0.3566 * -1) + (-0.43 * -1)) = +1
y2(t) = F(-0.2779 + (0.0232 * -1) + (0.1117 * -1)) = -1
y3(t) = F(-0.3823 + (0.2843 * -1) + (0.455 * -1)) = -1

Step 4: Determine actual madaline output [M(t)]
M(t) = Majority(+1, -1, -1) = -1
Madaline: Example (cont.)
Step 5: Determine error and update weights
Since M(t) = desired output, no weight updates are needed.

Step 6: Repeat steps 2 to 5
Repeat until the desired outputs and the actual network outputs are equal for all the input vectors of the training set.
Madaline: Example (cont.)
Repeat:
Step 2: Present new input and desired output
– Let's apply input (-1, 1) with desired output = +1

Step 3: Calculate actual outputs [yk(t)]
y1(t) = F(0.0037 + (0.3566 * -1) + (-0.43 * 1)) = -1
y2(t) = F(-0.2779 + (0.0232 * -1) + (0.1117 * 1)) = -1
y3(t) = F(-0.3823 + (0.2843 * -1) + (0.455 * 1)) = -1

Step 4: Determine actual madaline output [M(t)]
M(t) = Majority(-1, -1, -1) = -1
Madaline: Example (cont.)
Step 5: Determine error and update weights
M(t) does not equal the desired output, and adaline #2 is the winner neuron (excitation -0.19 vs. -0.78 for #1 and -0.21 for #3).
Update only adaline #2.
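These three excitation values can be checked quickly in plain Python (illustrative code, using the initial weights from Step 1):

# Recompute the three excitations for input (x1, x2) = (-1, 1), with bias x0 = 1.
W = [[0.0037, 0.3566, -0.43],
     [-0.2779, 0.0232, 0.1117],
     [-0.3823, 0.2843, 0.455]]
x = [1, -1, 1]
excitations = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
print([round(e, 4) for e in excitations])   # [-0.7829, -0.1894, -0.2116]
# Adaline #2 has the excitation nearest to zero, so it is the one adapted.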
Madaline: Example (Cont.)
• Excitation (Line Equation):
[Plot: the three decision lines after updating adaline #2]
Madaline: Example (Cont.)
Step 6: Repeat steps 2 to 5
Repeat until the desired outputs and the actual network outputs are equal for all the input vectors of the training set.
Madaline: Example (Cont.)
After 3 epochs, the solution converges.
[Final weights omitted]
Madaline: Example (Cont.)
• Excitation (Line Equation):
[Plot: the final decision lines after convergence]
Madaline: Example (Cont.)
Same problem with a new set of random weights
[Plots: the decision lines over successive training steps]
Madaline: Example (Cont.)
Solution converged after 3 epochs.
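To reproduce an experiment of this kind, the train_madaline sketch given after the learning procedure can be run on the example's pattern set (the XOR function); the call below is illustrative, and how many epochs are needed, or whether a given run converges at all, depends on the random initial weights.

# Train on the example's pattern set (XOR on bipolar inputs) and print the response
# to each pattern. Uses the train_madaline / f_h sketch defined earlier.
patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
targets = [-1, +1, +1, -1]
W = train_madaline(patterns, targets)
for p, d in zip(patterns, targets):
    x = [1] + list(p)
    outputs = [f_h(sum(wi * xi for wi, xi in zip(w, x))) for w in W]
    M = 1 if sum(outputs) > 0 else -1
    print(p, "desired:", d, "madaline:", M)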