NATURAL LANGUAGE PROCESSING (An Implementation in Querying Database)
ABSTRACT
Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.
Natural Language Processing is the artificial intelligence concept in which machines understand natural languages like English, Korean, French, Telugu, Hindi, etc. We are going to develop a tool that will take database queries in the form of natural language, process them, and give the result. This includes many subcomponents like the Language Analyzer, Query Builder and Viewer. The system will first parse the query in natural language and find the major parts in the string. It will first look for the table name, then parse the string for the where clause, and then for the order by clause. After parsing, it will construct the query string based on the data available. The generated SQL query is posted to the database to fetch the results.
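The parsing order described above (table name first, then the where clause, then the order by clause) can be sketched in Java as follows. This is only an illustrative sketch under stated assumptions, not the project's actual code: the class name NaturalQuerySketch, the KNOWN_TABLES list, and the fixed phrasings "where <column> is <value>" and "sorted by <column>" are all hypothetical choices made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only: maps a restricted English request to a SQL string. */
class NaturalQuerySketch {

    // Assumption: the tool knows in advance which tables exist.
    static final List<String> KNOWN_TABLES = Arrays.asList("employees", "departments");

    // Assumed phrasing for a filter: "where <column> is <value>".
    static final Pattern WHERE = Pattern.compile("\\bwhere (\\w+) is (\\w+)");

    // Assumed phrasing for sorting: "sorted by <column>" or "ordered by <column>".
    static final Pattern ORDER = Pattern.compile("\\b(?:sorted|ordered) by (\\w+)");

    static String toSql(String sentence) {
        // Lowercasing simplifies matching but also lowercases the filter value.
        String text = sentence.toLowerCase(Locale.ROOT);

        // Step 1: look for the table name.
        String table = null;
        for (String word : text.split("\\W+")) {
            if (KNOWN_TABLES.contains(word)) {
                table = word;
                break;
            }
        }
        if (table == null) {
            throw new IllegalArgumentException("No known table name in: " + sentence);
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);

        // Step 2: look for the where clause.
        Matcher where = WHERE.matcher(text);
        if (where.find()) {
            sql.append(" WHERE ").append(where.group(1))
               .append(" = '").append(where.group(2)).append("'");
        }

        // Step 3: look for the order by clause.
        Matcher order = ORDER.matcher(text);
        if (order.find()) {
            sql.append(" ORDER BY ").append(order.group(1));
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSql("Show all employees where city is london sorted by name"));
    }
}
```

For the sample sentence in main, the sketch produces SELECT * FROM employees WHERE city = 'london' ORDER BY name. The value is concatenated into the string only to keep the example short; a real query builder would more safely pass values as bind parameters through java.sql.PreparedStatement.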
INTRODUCTION
Human understanding of language requires background or common sense knowledge of the world. Human consciousness is tightly coupled with both language and our internal models of the outer world. Indeed, many argue that it is our consciousness that creates our own world (i.e., we create the worlds that we live in). It makes little sense to assume that the real world is static and is not affected by conscious entities living in that world. So, in trying to understand life and consciousness, it is important to understand the context of experiences in the world. Children playing often make up new words spontaneously that, for the children involved, have real meaning in the context of their lives. There are two basic approaches, depending on whether we want to write an effective "natural language front end" to a software system or whether we are motivated to do fundamental research on minds and consciousness by building a system that acquires structure and intelligence through its interaction with its environment.
Two parsing techniques are relevant here: finite state machines that recognize word sequences as syntactically valid sentences, and Conceptual Dependency parsers that stress semantics rather than syntax. The system uses an ATN-based parser with the WordNet lexicon.
ATN parsers are finite state machines that recognize word sequences as specific words, noun phrases, verb phrases, etc. Context-free approaches to NLP face the following problems:
• Difficulty in dealing with different sentence structures that have the same meaning.
• Handling number agreement between subjects and verbs.
• Determining the deep structure of input texts.
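To illustrate how a finite state machine can recognize a word sequence as a noun phrase, here is a minimal Java sketch. It covers only the plain finite-state core; a full ATN adds registers and recursive sub-networks, which are not shown. The class name NounPhraseRecognizer, the state names, the three small word sets, and the accepted pattern (optional determiner, any number of adjectives, then one noun) are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only: a finite state recognizer for simple noun phrases. */
class NounPhraseRecognizer {

    // Tiny hand-written word lists, assumed for the example.
    static final Set<String> DETERMINERS = new HashSet<>(Arrays.asList("a", "an", "the"));
    static final Set<String> ADJECTIVES = new HashSet<>(Arrays.asList("big", "small", "purple"));
    static final Set<String> NOUNS = new HashSet<>(Arrays.asList("cat", "dog", "boy"));

    // START: nothing read yet. MODIFIERS: a determiner or adjective was read.
    // DONE: the noun was read (accepting state). REJECT: dead state.
    enum State { START, MODIFIERS, DONE, REJECT }

    static boolean isNounPhrase(String phrase) {
        State state = State.START;
        for (String word : phrase.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            switch (state) {
                case START:
                    if (DETERMINERS.contains(word) || ADJECTIVES.contains(word)) {
                        state = State.MODIFIERS;
                    } else if (NOUNS.contains(word)) {
                        state = State.DONE;
                    } else {
                        state = State.REJECT;
                    }
                    break;
                case MODIFIERS:
                    if (ADJECTIVES.contains(word)) {
                        state = State.MODIFIERS;
                    } else if (NOUNS.contains(word)) {
                        state = State.DONE;
                    } else {
                        state = State.REJECT;
                    }
                    break;
                default:
                    // A word after the noun, or any word once rejected, is a failure.
                    state = State.REJECT;
            }
        }
        return state == State.DONE;
    }

    public static void main(String[] args) {
        System.out.println(isNounPhrase("the big purple cat"));
        System.out.println(isNounPhrase("cat the"));
    }
}
```

With these word lists, "the big purple cat" is accepted and "cat the" is rejected, because no transition leaves the accepting state.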
The term morphological tags refers to labeling of words with part-of-speech tags. Some examples are as follows.
• Noun - cat, dog, boy, etc.
• Pronouns - he, she, it
    o Relative pronouns - which, who, that
• Verb - run, throw, see, etc.
• Determiners
    o Articles - a, an, the
    o Possessives - my, your, theirs, etc.
    o Demonstratives - this, that, these, those
    o Numbers
• Adjectives - big, small, purple, etc.
• Adverbs
    o Describe how something is done - fast, well, etc.
    o Time - after, soon, etc.
    o Questioning - how, why, when, where
    o Place - down, up, here, etc.
In general, accurately assigning correct morphological tags to input text is a difficult problem. Hidden Markov Model and Bayesian techniques are used for assigning word types. English grammar is complex. The important steps in building NLP technology into your own programs are:
• Reduce the domain of discourse to a minimum.
• Create a set of "use cases" to focus your effort in designing and writing ATNs, and to use for testing your NLP system during development.
• When possible, capture text input from real users of your system, and incrementally build up a set of use cases that your system can handle correctly.
• Map identified words / parts of speech to actions that the system should perform.
Lexicon data is used to indicate many of the word types. We will use the WordNet lexical database to build a lexicon.
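As a rough illustration of how lexicon data can indicate word types, the following Java sketch tags each word by dictionary lookup. It does not read WordNet: the small hand-written LEXICON map stands in for a lexicon built from it, and the class name LexiconTaggerSketch, the tag names, and the UNKNOWN fallback are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative sketch only: tags words by looking them up in a small lexicon. */
class LexiconTaggerSketch {

    // Hand-written stand-in for a lexicon built from WordNet (assumption).
    static final Map<String, String> LEXICON = new HashMap<>();
    static {
        LEXICON.put("cat", "NOUN");
        LEXICON.put("dog", "NOUN");
        LEXICON.put("boy", "NOUN");
        LEXICON.put("he", "PRONOUN");
        LEXICON.put("she", "PRONOUN");
        LEXICON.put("it", "PRONOUN");
        LEXICON.put("run", "VERB");
        LEXICON.put("throw", "VERB");
        LEXICON.put("see", "VERB");
        LEXICON.put("a", "DETERMINER");
        LEXICON.put("an", "DETERMINER");
        LEXICON.put("the", "DETERMINER");
        LEXICON.put("big", "ADJECTIVE");
        LEXICON.put("small", "ADJECTIVE");
        LEXICON.put("purple", "ADJECTIVE");
        LEXICON.put("soon", "ADVERB");
        LEXICON.put("here", "ADVERB");
    }

    /** Returns "word/TAG" for each word; words missing from the lexicon get UNKNOWN. */
    static List<String> tag(String sentence) {
        List<String> tagged = new ArrayList<>();
        for (String word : sentence.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() can yield an empty first token
            }
            tagged.add(word + "/" + LEXICON.getOrDefault(word, "UNKNOWN"));
        }
        return tagged;
    }

    public static void main(String[] args) {
        System.out.println(tag("She can see the purple cat"));
    }
}
```

A map like this can hold only one tag per word, so it cannot handle a word such as "that", which may be a relative pronoun or a demonstrative. Resolving such cases from context is where the Hidden Markov Model and Bayesian techniques mentioned above come in.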
REQUIREMENTS ANALYSIS DOCUMENT
Introduction
a) Purpose of the System
The main purpose of the system is to design and develop a system that can understand natural languages like English and convert natural-language input into database queries. The queries are executed in the DBMS and the response will be in natural language.
b) Scope of the System
The scope of the system includes developing a natural language processor that understands natural language using artificial intelligence concepts.
c) Objective and Success Criteria of the Project
The main objective of the system is to design and implement an ATN Parser in
d) Definitions, acronyms and abbreviations
Current System
In the current system the queries are in high-level languages like SQL. The person who is using the system must learn SQL and write the queries in that high-level language.
Proposed System
Overview:
The proposed system is an intelligent system which will understand natural language and convert the natural language query into a SQL query. The system will use the English parts of speech to divide the input and identify the nouns, verbs and conjunctions. The SQL query is executed in the Oracle database. The results are again shown in natural language.
Functional Requirements:
The major functional requirements of the system are as follows.
1. To create a natural language processor.
2. To create an interface to connect to the database.
3. To implement a Natural Language Engine which consists of search techniques for the words.
Non Functional Requirements:
The major non-functional requirements of the system are as follows.
1. The queries from the client.
2. The data in the database.
1. Usability
The system is designed with a completely automated process, hence there is little or no user intervention.
2. Reliability
The system is more reliable because of the qualities that are inherited from the chosen platform, Java. Code built using Java is more reliable.
3. Performance
The system exhibits high performance because it is well optimized. It uses the automatic garbage collection from Java.
4. Supportability
The system is designed to be cross-platform. It is supported on a wide range of hardware and on any software platform which has a JVM built into the system.
5. Implementation
The system is implemented in the platform-independent, lightweight,
6. Interface
The user interface is completely based on the Swing components.
7. Packaging
The entire application is packaged into a single package named nlp.
8. Legal
The code in this project is licensed to users under the GPL (General Public License).
Hardware & Software Mapping
Hardware Requirements
CPU      : Intel Pentium 4 Processor
RAM      : 512 MB
HDD      : 80 GB
Network  : NIC Card Required
Software Requirements
Programming Language  :
Database Backend      : Oracle 10g Release 2
Technologies          : Servlets, JSP
Scripting Language    :
Operating System      : Windows XP Professional With Service