A MATLAB BASED APPROACH FOR TESTING LINEAR CLASSIFICATION OF A NEURAL NETWORK AND SOLVING THE NONLINEARITY PROBLEM (XOR PROBLEM) OF A NEURAL NETWORK
AL-AMIN SHOHAG, Dr. UZZAL KUMAR ACHARJA
CSE Department, Jagannath University, Dhaka
ABSTRACT
The neural network is the first and foremost step in machine learning. It provides the entire basis for a machine to act like a human, and it is a prerequisite for a machine to take in different categories of data from the analog world. But most analog-world data is non-linear, and this non-linearity raises a problem for a neural network: a basic neural network classifies a dataset linearly, that is, it can only handle problems which are linearly separable. Thus, a neural network needs a way to handle non-linearity. In this piece of work, we test the linearity characteristic of a neural network using the OR and AND operation datasets, which are linear. We then discuss the non-linearity problem for a neural network using the XOR dataset. At the end, we solve this non-linearity problem and demonstrate the solution using MATLAB.
KEYWORDS
Neural Network, Linearity, Perceptron, Back-propagation algorithm, XOR, MATLAB
1. Introduction
A neural network is an artificial network which tries to mimic the neural network of the human brain. The neural network of the human brain consists of many neurons. Similarly, an artificial neural network consists of many artificial neurons. Thus, it produces results similar to those of the neural network of the human brain.
1.1 Artificial neural network
A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. In its most general form, a neural network is a machine that is designed to model the way in which the brain performs a particular task or function of interest. The network is usually implemented using electronic components or simulated in software on a digital computer. In most cases the interest is confined largely to an important class of neural networks that perform useful computations through a process of learning. It resembles the brain in two respects:
1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths, known as synaptic weights, are used to store the knowledge.
Similar to the brain, it offers:
1. A massively parallel distributed structure
2. Generalization
An artificial neural network consists of many neurons. The basic model of a neuron consists of a number of synaptic inputs with associated synaptic weights, a summing junction that produces the sum of products of synaptic weights and inputs, and an activation function that limits the output of the neuron. A basic model of a neuron is shown below.
Fig: An artificial neuron
1.2 Linearity
A neural network takes a problem and tries to generalize the problem into classes. This generalization into classes is the linearity approach of a neural network: it tries to draw one or more straight lines that split the dataset into classes based on similar features of the problem dataset. For example, the AND and OR operations have the input/output datasets below.

A  B  AND
0  0   0
0  1   0
1  0   0
1  1   1
Fig: AND operation

A  B  OR
0  0   0
0  1   1
1  0   1
1  1   1
Fig: OR operation

With these datasets a neural network will try to classify the outputs into two classes by producing a linear boundary line. One side of the boundary will contain all the zeros for AND, or all the ones for OR; the other side will contain only the single 1 for AND, or only the single 0 for OR. For the classification of these datasets, a single-layer perceptron is enough. But in the case of XOR, where the output is non-linear, a single perceptron cannot produce a linear classification. The dataset for XOR is shown below.

A  B  XOR
0  0   0
0  1   1
1  0   1
1  1   0
Fig: XOR dataset

In this case, a multilayer perceptron is needed. We will see how a multilayer perceptron solves this problem in later sections.
2. Perceptron
A perceptron is the simplest form of a neural network, used for the classification of a special type of dataset said to be linearly separable. A perceptron is shown below.
Fig: Perceptron
In the case of an elementary perceptron, there are two decision regions separated by a hyperplane defined by the equation

Σ_{i=1}^{l} w_i x_i − θ = 0

where the w_i are the synaptic weights, the x_i are the inputs, and θ is the threshold value. For example, a single-layer perceptron can classify the OR and AND datasets linearly, because these datasets are linearly separable.
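To make the decision rule concrete, here is a minimal MATLAB sketch that evaluates a fixed perceptron on the OR dataset; the weights w = [1; 1] and threshold θ = 0.5 are illustrative values chosen for this sketch, not taken from the text.

% A fixed perceptron evaluated on the OR dataset (illustrative weights).
X = [0 0; 0 1; 1 0; 1 1];   % the four input patterns, one per row
w = [1; 1];                 % synaptic weights (assumed values)
theta = 0.5;                % threshold (assumed value)
% Decision rule: output 1 when sum_i w_i*x_i - theta >= 0, otherwise 0.
y = double(X*w - theta >= 0);
disp([X y])                 % last column prints the OR outputs 0 1 1 1

Any weight vector and threshold whose line separates (0,0) from the other three points would work equally well, which is exactly what linear separability means.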
Fig: OR dataset in the (x1, x2) plane, with a linear decision boundary separating (0,0) from the other three points

Fig: AND dataset in the (x1, x2) plane, with a linear decision boundary separating (1,1) from the other three points
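The figures above can be reproduced with a few lines of MATLAB; the separating line x1 + x2 = 0.5 below is one arbitrary choice of boundary for the OR dataset, assumed for this sketch.

% Plot the OR dataset and one possible linear decision boundary.
X = [0 0; 0 1; 1 0; 1 1];
t = [0 1 1 1];                       % OR targets
figure; hold on;
plot(X(t==0,1), X(t==0,2), 'ro');    % class-0 point
plot(X(t==1,1), X(t==1,2), 'b+');    % class-1 points
xs = -0.5:0.1:1.5;
plot(xs, 0.5 - xs, 'k-');            % the line x1 + x2 = 0.5 (assumed boundary)
xlabel('x1'); ylabel('x2'); title('OR dataset with a separating line');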
But it cannot classify problems which are not linearly separable, such as the XOR dataset.

Fig: XOR dataset in the (x1, x2) plane; no single straight line can separate the two classes

As we can see, the XOR dataset is not linearly separable. To solve this problem we need a multilayer perceptron. In the next section we discuss the multilayer perceptron and how it solves this problem using the back-propagation algorithm.
3. Multilayer perceptron
A multilayer perceptron has one input layer, one or more hidden layers, and one output layer. A multilayer perceptron is shown below.

Fig: A multilayer perceptron with one hidden layer

A multilayer perceptron can classify a non-linear dataset using the back-propagation algorithm.

Fig: XOR outputs in the (x1, x2) plane; the class-1 points (0,1) and (1,0) lie in two regions that no single line can separate

3.1 Back-Propagation Algorithm
Multilayer perceptrons have been applied successfully to solve some difficult and diverse problems by training them in a supervised manner with a highly popular algorithm known as the error back-propagation algorithm. This algorithm is based on the error-correction learning rule. Error back-propagation learning consists of two passes through the different layers of the network: a forward pass and a backward pass. In the forward pass, an activity pattern is applied to the sensory nodes of the network, and its effect propagates through the network layer by layer. Finally, a set of outputs is produced as the actual response of the network. During the forward pass, the synaptic weights of the network are all fixed. During the backward pass, on the other hand, the synaptic weights are all adjusted in accordance with an error-correction rule. Specifically, the actual response of the network is subtracted from a desired response to produce an error signal. This error signal is then propagated backward through the network, against the direction of the synaptic connections; hence the name error back-propagation. The synaptic weights are adjusted to make the actual response of the network move closer to the desired response in a statistical sense. The error signal at the output of neuron j at iteration n is defined by
e_j(n) = d_j(n) − y_j(n),   neuron j is an output node
We define the instantaneous value of the error energy for neuron j as (1/2) e_j²(n). Correspondingly, the instantaneous value E(n) of the total error energy is obtained by summing (1/2) e_j²(n) over all neurons in the output layer; these are the only visible neurons for which error signals can be calculated directly. We may thus write

E(n) = (1/2) Σ_{j∈C} e_j²(n)

where the set C includes all the neurons in the output layer.
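For instance (a made-up numerical check, not from the text): if a network has two output neurons with error signals e_1(n) = 0.5 and e_2(n) = −0.25, then E(n) = (1/2)(0.5² + 0.25²) = (1/2)(0.3125) = 0.15625.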
The instantaneous error energy E(n), and therefore the average error energy E_av, is a function of all the free parameters of the network. For a given training set, E_av represents the cost function as a measure of learning performance; the objective of the learning process is to adjust the free parameters of the network to minimize E_av. For this we consider a simple method of training in which the weights are updated on a pattern-by-pattern basis until one epoch, that is, one complete presentation of the entire training set, has been dealt with. The adjustments to the weights are made in accordance with the respective errors computed for each pattern presented to the network.

The induced local field v_j(n) produced at the input of the activation function associated with neuron j is

v_j(n) = Σ_{i=0}^{m} w_ji(n) y_i(n)
where m is the total number of inputs applied to neuron j. The synaptic weight w_j0 equals the bias b_j applied to neuron j. Hence the function signal appearing at the output of neuron j at iteration n is

y_j(n) = φ_j(v_j(n))

The back-propagation algorithm applies a correction Δw_ji(n) to the synaptic weight w_ji(n) which is proportional to the partial derivative ∂E(n)/∂w_ji(n). According to the chain rule of calculus, we may express this gradient as

∂E(n)/∂w_ji(n) = [∂E(n)/∂e_j(n)] [∂e_j(n)/∂y_j(n)] [∂y_j(n)/∂v_j(n)] [∂v_j(n)/∂w_ji(n)]

The partial derivative ∂E(n)/∂w_ji(n) represents a sensitivity factor, determining the direction of search in weight space for the synaptic weight w_ji(n).
Now, let us calculate the factors of the partial derivative ∂E(n)/∂w_ji(n) in the equation above:

∂E(n)/∂e_j(n) = e_j(n)

∂e_j(n)/∂y_j(n) = −1

∂y_j(n)/∂v_j(n) = φ′_j(v_j(n))

and for the last factor we have

∂v_j(n)/∂w_ji(n) = y_i(n)

Thus, the partial derivative ∂E(n)/∂w_ji(n) becomes

∂E(n)/∂w_ji(n) = −e_j(n) φ′_j(v_j(n)) y_i(n)
The correction Δw_ji(n) applied to w_ji(n) is defined by the delta rule:

Δw_ji(n) = −η ∂E(n)/∂w_ji(n)

where η is the learning-rate parameter of the back-propagation algorithm. Accordingly,

Δw_ji(n) = η δ_j(n) y_i(n)

where the local gradient δ_j(n) is defined by

δ_j(n) = −∂E(n)/∂v_j(n) = e_j(n) φ′_j(v_j(n))
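As a quick numerical illustration (values made up for this example, assuming a logistic sigmoid φ(v) = 1/(1 + e^{−v}), for which φ′(v) = y(1 − y)): suppose y_j(n) = 0.8, d_j(n) = 1, y_i(n) = 0.5, and η = 0.1. Then e_j(n) = 0.2, φ′_j(v_j(n)) = 0.8 × 0.2 = 0.16, δ_j(n) = 0.2 × 0.16 = 0.032, and the weight update is Δw_ji(n) = 0.1 × 0.032 × 0.5 = 0.0016.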
The local gradient points to the required changes in the synaptic weights. The back-propagation algorithm can be summarized as follows:
1. Initialization: Assuming that no prior information is available, pick the synaptic weights and thresholds from a uniform distribution whose mean is zero and whose variance is chosen to make the standard deviation of the induced local fields of the neurons lie at the transition between the linear and saturated parts of the sigmoid activation function.
2. Presentation of training examples: Present the network with an epoch of training examples; for each example in the set, ordered in some fashion, perform the sequence of forward and backward computations described under points 3 and 4, respectively.
3. Forward computation: Let a training example in the epoch be denoted by (x(n), d(n)), with the input vector x(n) applied to the input layer of sensory nodes and the desired response vector d(n) presented to the output layer of computation nodes. Compute the induced local fields and function signals of the network by proceeding forward through the network, layer by layer. The induced local field v_j^l(n) for neuron j in layer l is

v_j^l(n) = Σ_{i=0}^{m} w_ji^l(n) y_i^{l−1}(n)

where y_i^{l−1}(n) is the output signal of neuron i in the previous layer (l−1) at iteration n, and w_ji^l(n) is the synaptic weight of neuron j in layer l that is fed from neuron i in layer l−1. For i = 0, we have y_0^{l−1}(n) = +1, and w_j0^l(n) = b_j^l(n) is the bias applied to neuron j in layer l. Assuming the use of a sigmoid function, the output signal of neuron j in layer l is

y_j^l(n) = φ_j(v_j^l(n))

If neuron j is in the first hidden layer, set y_j^0(n) = x_j(n), where x_j(n) is the j-th element of the input vector x(n). If neuron j is in the output layer (layer L), set y_j^L(n) = o_j(n). Compute the error signal

e_j(n) = d_j(n) − o_j(n)

where d_j(n) is the j-th element of the desired response vector d(n).
4. Backward computation: Compute the local gradients δ of the network, defined by

δ_j^L(n) = e_j^L(n) φ′_j(v_j^L(n))   for neuron j in output layer L
δ_j^l(n) = φ′_j(v_j^l(n)) Σ_k δ_k^{l+1}(n) w_kj^{l+1}(n)   for neuron j in hidden layer l

Adjust the synaptic weights of the network in layer l according to the generalized delta rule:

w_ji^l(n+1) = w_ji^l(n) + α Δw_ji^l(n−1) + η δ_j^l(n) y_i^{l−1}(n)

where η is the learning-rate parameter and α is the momentum constant.
5. Iteration: Iterate the forward and backward computations under points 3 and 4 by presenting new epochs of training examples to the network until the stopping criterion is met.
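To make the five steps above concrete, here is a minimal from-scratch MATLAB sketch that trains a 2-2-1 logistic-sigmoid network on the XOR dataset with pattern-by-pattern updates; the architecture, learning rate, epoch count, and initialization are assumptions made for this illustration (momentum is omitted for brevity), not values taken from the text.

% Minimal pattern-by-pattern back-propagation for XOR (2-2-1 network, assumed setup).
X = [0 0; 0 1; 1 0; 1 1];          % input patterns, one per row
d = [0; 1; 1; 0];                  % XOR targets
eta = 0.5;                         % learning-rate parameter (assumed)
rng(1);                            % fixed seed for reproducibility
W1 = rand(2,3) - 0.5;              % hidden layer: 2 neurons, each with 2 inputs + bias
W2 = rand(1,3) - 0.5;              % output layer: 1 neuron with 2 hidden inputs + bias
sig = @(v) 1./(1 + exp(-v));       % logistic sigmoid; note phi'(v) = y*(1-y)

for epoch = 1:20000
    for n = 1:4
        % Forward pass (step 3): compute local fields and function signals.
        x  = [1; X(n,:)'];                 % input vector with bias entry y0 = +1
        h  = sig(W1*x);                    % hidden-layer outputs
        hb = [1; h];                       % hidden outputs with bias entry
        y  = sig(W2*hb);                   % network output
        % Backward pass (step 4): local gradients via the delta rule.
        e      = d(n) - y;                              % error signal
        delta2 = e * y * (1 - y);                       % output-layer local gradient
        delta1 = (W2(:,2:end)' * delta2) .* h .* (1-h); % hidden-layer local gradients
        % Weight updates (delta rule without the momentum term).
        W2 = W2 + eta * delta2 * hb';
        W1 = W1 + eta * delta1 * x';
    end
end
% After training, the rounded outputs should reproduce XOR: 0 1 1 0
% (convergence is typical but not guaranteed for every random seed).
H = sig(W1 * [ones(1,4); X']);
disp(round(sig(W2 * [ones(1,4); H])))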
3.2 XOR Problem Solution
We may solve the XOR problem by using a single hidden layer with two neurons. The signal-flow graph of the network is shown below. The following assumptions are made here:
1. Each neuron is represented by a McCulloch-Pitts model, which uses a threshold function as its activation function.
2. Bits 0 and 1 are represented by the levels 0 and +1, respectively.

Fig: Signal-flow graph of the network for solving the XOR problem

The top neuron, labeled 1 in the hidden layer, is characterized as

w11 = w12 = +1  and  b1 = −3/2

The bottom neuron, labeled 2 in the hidden layer, is characterized as

w21 = w22 = +1  and  b2 = −1/2

Since the two input weights of each hidden neuron are equal, the slope of each decision boundary is −1. The output neuron, labeled 3, is characterized as

w31 = −2, w32 = +1  and  b3 = −1/2
The function of each part of the network over the four corner points (0,0), (0,1), (1,0), and (1,1) is shown below.

Fig (a): Decision boundary constructed by hidden neuron 1
Fig (b): Decision boundary constructed by hidden neuron 2
Fig (c): Decision boundaries constructed by the complete network, separating (0,1) and (1,0) from (0,0) and (1,1)
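As a check on these hand-picked weights (all values above are from the text; only the code itself is new), the following MATLAB sketch evaluates the McCulloch-Pitts network on all four input patterns:

% Verify the hand-crafted XOR network with threshold (McCulloch-Pitts) neurons.
step = @(v) double(v >= 0);           % threshold activation
X = [0 0; 0 1; 1 0; 1 1];             % all four input patterns
for n = 1:4
    x1 = X(n,1); x2 = X(n,2);
    y1 = step(1*x1 + 1*x2 - 3/2);     % hidden neuron 1: w11 = w12 = +1, b1 = -3/2
    y2 = step(1*x1 + 1*x2 - 1/2);     % hidden neuron 2: w21 = w22 = +1, b2 = -1/2
    y3 = step(-2*y1 + 1*y2 - 1/2);    % output neuron: w31 = -2, w32 = +1, b3 = -1/2
    fprintf('%d XOR %d = %d\n', x1, x2, y3);  % prints 0, 1, 1, 0
end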
4. MATLAB Demonstration
In the MATLAB demonstration we test linearity for the AND as well as the OR dataset with a perceptron. We also test non-linearity for the XOR dataset with a perceptron. Later, we see how a multilayer perceptron can solve this non-linearity problem for the XOR dataset. We use confusion plots for all of these purposes.
4.1 OR dataset test for a single perceptron with no hidden layer
MATLAB code for the OR dataset is given below.

clc;
close all;
x = [0 0; 0 1; 1 1; 1 0];      % input patterns, one per row
p = x';                        % the perceptron expects one pattern per column
t = [0 1 1 1];                 % OR targets
net = perceptron;
view(net);
net = train(net, p, t);
y = net(p);
plotconfusion(t, y);
The confusion plot is shown below.
Fig: Confusion plot for the OR dataset
4.2 AND dataset test for a single perceptron with no hidden layer
MATLAB code for the AND dataset is given below.

clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];      % input patterns, one per row
p = x';                        % one pattern per column
t = [0 1 0 0];                 % AND targets (only the (1,1) pattern maps to 1)
net = perceptron;
view(net);
net = train(net, p, t);
y = net(p);
plotconfusion(t, y);
The confusion plot is shown below.
Fig: Confusion plot for the AND dataset
4.3 XOR dataset test for a single perceptron with no hidden layer
MATLAB code for the XOR dataset is given below.

clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];      % input patterns, one per row
p = x';                        % one pattern per column
t = [0 0 1 1];                 % XOR targets
net = perceptron;
view(net);
net = train(net, p, t);        % training cannot converge: XOR is not linearly separable
y = net(p);
plotconfusion(t, y);
The confusion plot is shown below.
Fig: Confusion plot for the XOR dataset
As we can see from the confusion plot, the XOR dataset is not classified correctly for all of the targeted outputs. So a single perceptron with no hidden layer cannot solve the XOR problem. Now let us see if a perceptron with one hidden layer can solve it.
4.4 XOR dataset test for a perceptron with a hidden layer and back-propagation training algorithm
MATLAB code for the XOR dataset is given below. Matching the two-hidden-neuron solution of section 3.2, the hidden layer is given two neurons.

clc;
close all;
x = [0 0; 1 1; 0 1; 1 0];             % input patterns, one per row
p = x';                               % one pattern per column
t = [0 0 1 1];                        % XOR targets
net = feedforwardnet(2, 'trainrp');   % one hidden layer with two neurons, resilient back-propagation
net.divideFcn = 'dividetrain';        % use all four patterns for training (dataset is tiny)
view(net);
net = train(net, p, t);
y = net(p);
plotregression(t, y);                 % regression plot of outputs against targets
plotconfusion(t, y);
The confusion plot is shown below.
As we can see, we now get a correct classification for the XOR dataset, which solves the problem that a perceptron with no hidden layer could not.
5. Conclusion
We have shown that a single perceptron with no hidden layer cannot classify the XOR dataset linearly. We have also shown that this problem can be solved by using a perceptron with a single hidden layer trained with the back-propagation algorithm.
6. References
[1] Simon Haykin, "Neural Networks: A Comprehensive Foundation," McMaster University, Hamilton, Ontario, Canada.
[2] S. N. Nawaz, M. Sarfaraz, A. Zidouri and W. G. Al-Khatib, "An approach to offline Arabic character recognition using neural networks."
[3] Vedat Tavsanoglu, "Neural Networks."
[4] MATLAB, MathWorks.
[5] YouTube.
[6] Wikipedia.