CS6703 GRID AND CLOUD COMPUTING
UNIT I INTRODUCTION
Evolution of Distributed computing: Scalable computing over the Internet – Technologies for network based systems – Clusters of cooperative computers – Grid computing Infrastructures – Cloud computing – Service oriented architecture – Introduction to Grid Architecture and standards – Elements of Grid – Overview of Grid Architecture.

1.1 Evolution of Distributed computing: Scalable computing over the Internet – Technologies for network based systems – Clusters of cooperative computers

Certainly one can go back a long way to trace the history of distributed computing. Types of distributed computing existed in the 1960s. Many people were interested in connecting computers together for high performance computing in the 1970s, and in particular in forming multicomputer or multiprocessor systems. Connecting processors and computers together locally began in earnest in the 1960s and 1970s. Distributed computing now extends to connecting computers that are geographically distant.

The distributed computing technologies that underpin Grid computing were developed concurrently and rely upon each other. There are three concurrent, interrelated paths:
• Networks
• Computing platforms
• Software techniques

Networks: Grid computing relies on high performance computer networks. The history of such networks began in the 1960s with the development of packet-switched networks. The most important and ground-breaking geographically distributed packet-switched network was the DoD-funded ARPANET, which had a design speed of 50 Kbits/sec. ARPANET became operational with four nodes (University of California at Los Angeles, Stanford Research Institute, University of California at Santa Barbara and
University of Utah) in 1969. TCP (Transmission Control Protocol) was conceived in 1974 and became TCP/IP (Transmission Control Protocol/Internet Protocol) in 1978. TCP/IP became universally adopted. TCP provided a protocol for reliable communication, while IP provided for network routing. Important concepts include IP addresses, which identify hosts on the Internet, and ports, which identify end points (processes) for communication purposes. Ethernet was also developed in the early 1970s and became the principal way of interconnecting computers on local networks. At first it allowed multiple computers to share a single Ethernet cable, and it handled communication collisions with a retry protocol. Nowadays this collision detection is usually not needed, because each computer has its own Ethernet cable and Ethernet switches make the connections. Each Ethernet interface has a unique physical address that identifies it for communication purposes, and this address is mapped to the host IP address. The Internet began to be formed in the early 1980s using the TCP/IP protocol. During the 1980s the Internet grew at a phenomenal rate. Networks continued to improve and became more pervasive throughout the world. In the 1990s the Internet developed into the World-Wide Web, and the browser and the HTML markup language were introduced. The global network enables computers to be interconnected virtually anywhere in the world.

Computing Platforms: Computing systems began as single processor systems. It was soon recognized that increased speed could potentially be obtained by having more than one processor
inside a single computer system, and the term parallel computer was coined to describe such systems. Parallel computers were limited to applications that required the highest computational speed. It was also recognized that a collection of individual computer systems could quite easily be connected together to form a multicomputer system for higher performance. There were many projects in the 1970s and 1980s with this goal, especially after low cost microprocessors appeared. In the 1990s it was recognized that commodity computers (PCs) provided the ideal cost-effective solution for constructing multicomputers, and the term cluster computing emerged. In cluster computing, a group of computers is connected through a network switch, as illustrated in the figure below. Specialized high-speed interconnections were developed for cluster computing. However, many chose commodity Ethernet as a cost-effective solution, even though Ethernet was not developed for cluster computing applications and has a higher latency. The term Beowulf cluster was coined to describe a cluster built from off-the-shelf computers and other commodity components and software. The name comes from the Beowulf project, started at the NASA Goddard Space Flight Center in 1993 (Sterling 2002).

Figure: Typical cluster computing configuration

The original Beowulf project used Intel 486 processors, the free Linux operating system and dual 10 Mbits/sec Ethernet connections. As clusters were being constructed, work was done on how to program them. The dominant programming paradigm for cluster computing was, and still is, message passing. In message passing, processes running on the computers pass information to each other in the form of messages. The programmer specifies these messages using message-passing routines. The most notable library of message-passing routines was PVM (Parallel Virtual Machine) (Sunderam 1990). PVM was started in the late 1980s and became the de facto standard in the early-to-mid 1990s. PVM included the implementation of the message-passing routines.
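The message-passing pattern described above can be illustrated with a small Python sketch. This is not PVM or MPI code. The send and recv helpers and the rank numbering are hypothetical. Two threads exchanging messages through per-rank queues stand in for processes running on separate computers, so only the explicit, programmer-specified send/receive style carries over.

```python
import queue
import threading

# Hypothetical stand-in for message passing: each "rank" has its own mailbox.
# Real PVM/MPI programs run separate processes, often on different computers;
# threads and queues are used here only to show the send/receive pattern.
mailboxes = {rank: queue.Queue() for rank in range(2)}


def send(dest, tag, data):
    """Put a (tag, data) message into the mailbox of rank `dest`."""
    mailboxes[dest].put((tag, data))


def recv(rank):
    """Block until a message arrives for `rank`, then return (tag, data)."""
    return mailboxes[rank].get()


def worker(rank):
    # Rank 1 waits for a list of numbers, sums it and sends the result to rank 0.
    _tag, numbers = recv(rank)
    send(0, "result", sum(numbers))


def master():
    # Rank 0 starts the worker, sends it work, then waits for the reply.
    t = threading.Thread(target=worker, args=(1,))
    t.start()
    send(1, "work", [1, 2, 3, 4])
    reply = recv(0)
    t.join()
    return reply


if __name__ == "__main__":
    print(master())  # ('result', 10)
```

In a real cluster program, each rank would be a separate process. The two helpers would be replaced by the message-passing library's own send and receive routines.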
Subsequently, a standard definition for message-passing libraries called MPI (Message Passing Interface) was established (Snir et al. 1998). MPI laid down what the routines do and how they are invoked, but not how they are implemented. Several implementations were developed. Both PVM and MPI routines could be called from C/C++ or Fortran programs for message passing and related activities.

Several projects began in the 1980s and 1990s to use networked laboratory computers for high performance computing. A very important project in relation to Grid computing is Condor, which started in the mid-1980s with the goal of harnessing "unused" cycles of networked computers for high performance computing. In the Condor project, a collection of computers could be given over to remote access automatically when they were not being used locally. This collection of computers, called a Condor pool, then formed a high-performance multicomputer. Multiple users could use such physically distributed computer systems. Condor employed some very important ideas. One was automatically matching a job with the available resources, using a description of the job and a description of the available resources. A
job workflow could also be described, in which the output of one job is automatically fed into another job. Condor has become mature and is widely used as a job scheduler for clusters, in addition to its original purpose of using laboratory computers collectively. In Condor, the distributed computers need only be networked and could be geographically distributed. Condor can be used to share campus-wide computing resources.

Software Techniques: Apart from the development of distributed computing platforms, software techniques were being developed to harness truly distributed systems. The remote procedure call (RPC) was conceived in the mid-1980s as a way of invoking a procedure on a remote computer, as an extension of executing procedures locally. In the 1990s the remote procedure call was developed into object-oriented versions. One was CORBA (Common Object Request Broker Architecture) and another was Java Remote Method Invocation (RMI). The remote procedure call introduced the important concept of a service registry to locate remote services. In a Grid computing environment, service registries used for discovering services also provide the mechanism for discovering how those services are invoked. During the early development of the World-Wide Web, HTML was conceived to provide a way of displaying Web pages and connecting to other pages through the now very familiar hypertext links. Soon a Web page did more than simply display information. It became an interactive tool in which information could be entered and processed at either the client side or the server side. The programming language JavaScript was introduced in 1995, mostly for running code that causes actions to take place at the client, whereas other technologies
were being developed for causing actions to take place at the server, such as ASP, first released in 1996. In the 2000s, a very significant concept for distributed Internet-based computing called a Web service was introduced. Web services have their roots in remote procedure calls and provide remote actions, but they are invoked through standard protocols and Internet addressing. They also use XML (eXtensible Markup Language), which the W3C had standardized in 1998. The Web service interface is defined in a language-neutral manner by WSDL (Web Services Description Language), which is itself an XML language. Web services were adopted into Grid computing soon after their introduction as a flexible, interoperable way of implementing the Grid infrastructure, and they were potentially useful for Grid applications.
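The sketch below makes the idea of a language-neutral, XML-encoded remote call concrete using Python's standard xmlrpc module. XML-RPC is a simpler relative of the SOAP/WSDL Web services described above, not the same technology. The network transport is omitted: the "server" is just a local function, and the procedure name add is a hypothetical example. What the sketch shows is that both the request and the reply travel as plain XML text, which a program in any language could parse.

```python
import xmlrpc.client

# Procedures the "server" side is willing to run, looked up by name.
# The procedure name "add" is a hypothetical example.
PROCEDURES = {"add": lambda a, b: a + b}


def client_request(method, *args):
    """Client side: encode a call as an XML-RPC <methodCall> document."""
    return xmlrpc.client.dumps(args, methodname=method)


def server_handle(request_xml):
    """Server side: decode the request, run the procedure, encode the reply."""
    params, method = xmlrpc.client.loads(request_xml)
    result = PROCEDURES[method](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)


def client_read(response_xml):
    """Client side: decode the <methodResponse> and return its single value."""
    (result,), _ = xmlrpc.client.loads(response_xml)
    return result


if __name__ == "__main__":
    request = client_request("add", 2, 3)
    print(request)  # plain XML naming the method and its two parameters
    print(client_read(server_handle(request)))  # 5
```

If the two sides are to run on different machines, Python's xmlrpc.server module provides a matching HTTP server.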
Grid Computing: The first large-scale Grid computing demonstration that involved geographically distributed computers, and the start of Grid computing proper, was the Information Wide-Area Year (I-WAY) demonstration at the Supercomputing 1995 Conference (SC'95). Seventeen supercomputer sites were involved, including five DOE supercomputer centers, four NSF supercomputer centers, three NASA supercomputer sites and other large computing sites. Ten existing ATM networks were interconnected with the assistance of several major network service providers. Over 60 applications were demonstrated in areas including astronomy and astrophysics, atmospheric science, biochemistry, molecular biology and structural biology, biological and medical imaging, chemistry, distributed computing, earth science, education, engineering, geometric modeling, material science, mathematics, microphysics and macrophysics, neuroscience, performance analysis, plasma physics, tele-operations/telepresence, and visualization (DeFanti 1996). One focus was on virtual reality environments, and the virtual reality components included an immersive 3D environment. Separate papers in the 1996 special issue of the International Journal of Supercomputer Applications described nine of the I-Way applications. I-Way was perhaps the largest collection of networked computing resources ever assembled for such a significant demonstration purpose at that
time. It explored many of the aspects now regarded as central to Grid computing, such as security, job submission and distributed resource scheduling. It came face-to-face with the "political and technical constraints" that made it infeasible to provide a single scheduler (DeFanti 1996). Each site had its own job scheduler, and these schedulers had to be married together. The I-Way project also marked the start of the Globus project (GlobusProject), which developed de facto software for Grid computing. The Globus Project is led by Ian Foster, a co-developer of the I-Way demonstration and a founder of the Grid computing concept. The Globus Project developed a toolkit of middleware software components for Grid computing infrastructure, including basic job submission, security and resource management. Globus has gone through several implementation versions as standards have evolved. Its basic structural components have nevertheless remained essentially the same: security, data management, execution management, information services and run time environment. We will describe Globus in a little more detail later. The Globus software has been widely adopted and is the basis of the coursework described in this book, but there are other software infrastructure projects. The Legion project also envisioned a distributed Grid computing environment. Legion was conceived in 1993, although work on the Legion software did not begin until 1996 (Legion WorldWide Virtual Computer). Legion used an object-based approach to Grid computing, in which users could create objects in distant locations. The first public release of Legion was at the Supercomputing '97 conference in November 1997. The work led to the Grid computing company and software called Avaki in 1999. The company was
subsequently taken over by Sybase Inc. In the same period, a European Grid computing project called UNICORE (UNiform Interface to COmputing REsources) began. It was initially funded by the German Ministry for Education and Research (BMBF) and continued with other European funding. UNICORE is the basis of several European Grid computing efforts, and of efforts elsewhere, including in Japan. It has many similarities to Globus, for example in its security model and its service-based OGSA standard, but it is a more complete solution than Globus and
includes a graphical interface. An example project using UNICORE is EUROGRID, a Grid computing testbed developed in the period 2000-2004. A EUROGRID application project is OpenMolGRID (Open Computing GRID for Molecular Science and Engineering). It was developed during the period 2002-2005 "to speed up, automatize and standardize the drug-design using Grid technology" (OpenMolGRID). The term e-Science was coined in 1999 by John Taylor, the Director General of the United Kingdom's Office of Science and Technology. It describes conducting scientific research using the distributed networks and resources of a Grid computing infrastructure. Another, more recent, European term is e-Infrastructure, which refers to creating a Grid-like research infrastructure.

With the development of Grid computing tools such as Globus and UNICORE, a growing number of Grid projects began to develop applications. Originally these focused on computational applications. They can be categorized as:
• Computationally intensive
• Data intensive
• Experimental collaborative projects
The computationally intensive category is traditional high performance computing addressing large problems. Sometimes there is not one big problem but a problem that has to be solved repeatedly with different parameters to reach the solution (parameter sweep problems). The data intensive category includes computational problems, but with the emphasis on large amounts of data to store and process. Experimental collaborative projects often require collecting data from experimental apparatus and studying very large amounts of data. The potential of Grid computing was soon
recognized by the business community for so-called e-Business applications. These included improving business models and practices, sharing corporate computing resources and databases, and commercializing the technology for business applications. For e-Business applications the driving motive was reduction of costs, whereas for e-Science applications the driving motive was obtaining research results. That is not to say cost was not a factor in e-Science Grid computing. Large-scale research has very high costs, and Grid computing offers distributed efforts and cost sharing of resources. Some projects are concerned with accounting, such as GridBus, mentioned earlier.

The figure below shows the time lines for the computing platforms, underlying software techniques and networks discussed. Some see Grid computing as an extension of cluster computing. It is true that, in the development of high performance computing, Grid computing has followed on from cluster computing in connecting computers together to form a multicomputer platform, but Grid computing offers much more. The term cluster computing is limited to using computers that are interconnected locally to form a computing resource, and programming is done mostly using explicit message passing. Grid computing involves geographically distributed sites and uses some different techniques. There is certainly a fine line in the continuum of interconnected computers. The continuum runs from locally interconnected computers in a small room, to interconnected systems in a large computer room, then in multiple rooms and in different departments within a company, and finally to computers interconnected on the Internet in one area, in one country and across the world.
The early hype of Grid computing and the marketing ploys of the late 1990s and early 2000s led some to call configurations Grid computing when they were just large computational clusters, or laboratory computers whose idle cycles were being used. One classification that embodies the collaborative feature of Grid computing is:
• Enterprise Grids – Grids formed within an organization for collaboration.
• Partner Grids – Grids set up between collaborating organizations or institutions.
An enterprise Grid may still cross the administrative domains of departments, and it requires departments to share their resources. Some of the key features that indicate Grid computing are:
• Shared multi-owner computing resources.
• Use of Grid computing software such as Globus, with security and cross-management mechanisms in place.
Grid computing software such as Globus provides the tools for individuals and teams to use, collectively, geographically distributed computers owned by others.

Figure: Key concepts in the history of Grid computing.

Foster's Check List: Ian Foster is credited with the development of Grid computing and is sometimes called the father of Grid computing. He proposed a simple checklist of aspects that are common to most true Grids (Foster 2002):
• No centralized control
• Standard open protocols
• Non-trivial quality of service (QoS)

Grid Computing versus Cluster Computing: It is important not to think of Grid computing simply as a large cluster, because the potential and challenges are different. Courses on Grid
computing and on cluster computing are quite different. In cluster computing, one learns about message-passing programming using tools such as MPI. Shared memory programming using threads and OpenMP is also covered, because most computers in a cluster today are also multicore shared memory systems. In cluster computing, network security is usually not a big issue that concerns the user directly. Usually an ssh connection to the front-end node of the cluster is sufficient, and the internal compute nodes are reached from there. Clusters are usually Linux clusters, and these often have an NFS (Network File System) shared file system installed across the compute resources. Accounts need to be present on all systems in the cluster. NIS (Network Information Service) may be used to provide consistent configuration information on all systems, although it is not required. NIS can increase the local network traffic and slow the start of applications. In Grid computing, one looks at how to manage and use the geographically distributed sites (distributed resources). Users need accounts on all resources, but generally there is no shared file system. Each site is typically a high performance cluster. Because the environment is distributed, one looks at distributed computing techniques such as Web services, Internet protocols and network security, as well as at how to actually take advantage of the distributed resources. Security is very important, because the project may use confidential information and the distributed nature of the environment makes a security breach much more likely. Grid computing and cluster computing also have things in common. Both involve using multiple compute resources collectively, and both require job schedulers to place jobs onto the best platform. In cluster computing, a single job scheduler allocates jobs onto the local compute resources. In Grid computing, a Grid computing scheduler has to manage the
geographically distributed resources owned by others, and it typically interacts with the local job schedulers found on local clusters.

Grid Computing versus Cloud Computing: Commercialization of Grid computing is driven by a business model that will make profits. The first widely publicized attempt was on-demand and utility computing in the early 2000s. It attempted to sell computer time on a Grid platform constructed using Grid technologies such as Globus. More recently, cloud computing has emerged as a business model in which services are provided on servers that can be accessed through the Internet. The common thread between Grid computing and cloud computing is the use of the Internet to access the resources. Cloud computing is driven by the widespread access that the Internet and Internet technologies provide. However, cloud computing is quite distinct from the original purpose of Grid computing. Grid computing focuses on collaborative and distributed shared resources, whereas cloud computing concentrates on placing resources for paying users to access and share. The technology for cloud computing emphasizes the use of services (software as a service, SaaS) and possibly the use of virtualization. A number of companies entered the cloud computing space in the mid-to-late 2000s. IBM was an early promoter of on-demand Grid computing in the early 2000s and moved into cloud computing in a significant way. It opened a cloud computing center in Ireland (Dublin) in March 2008, and subsequently centers in the Netherlands (Amsterdam), China (Beijing) and South Africa (Johannesburg) in June 2008.
Figure: Cloud computing using virtualized resources.

Other major cloud computing players include Amazon and Google, who use their massive numbers of servers. Amazon offers the Amazon Elastic Compute Cloud (Amazon EC2), through which users buy time and resources via Web services and virtualization. The cloud computing business model goes one step further than hosting companies that simply rent out servers at their own location. That hosting model became popular in the early-to-mid 2000s with many start-up companies and continues to date.

1.2 Grid Computing Infrastructures
Grid computing is based on the concept of information and electricity sharing, which allows us to access other types of heterogeneous and geographically separated resources. A Grid enables the sharing of:
1. Storage elements
2. Computational resources
3. Equipment
4. Specific applications
5. Other resources
Thus a Grid is based on:
• Internet protocols.
• Ideas of parallel and distributed computing.
A Grid is a system that:
1. Coordinates resources that are not subject to centralized control,
2. Uses standard, open, general-purpose protocols and interfaces,
3. Delivers nontrivial qualities of service.
A Grid provides flexible, secure, coordinated resource sharing among individuals and institutions. It enables communities (virtual organizations) to share geographically distributed resources in order to achieve a common goal. This is useful in applications that cannot be solved with the resources of a single institution, or where the results can be achieved faster and/or more cheaply.

1.3 Cloud Computing
Cloud Computing is used for manipulating, accessing and configuring hardware and software resources remotely. It provides online data storage, infrastructure and applications.
Figure: Cloud computing.
Cloud computing supports platform independence, because the software does not need to be installed locally on the PC. Hence cloud computing is making our business applications mobile and collaborative.

Characteristics of Cloud Computing
There are five key characteristics of cloud computing. They are shown in the following diagram:
On-Demand Self Service
Cloud Computing allows users to use web services and resources on demand. One can log on to a website at any time and use them.

Broad Network Access
Since cloud computing is completely web based, it can be accessed from anywhere and at any time.

Resource Pooling
Cloud computing allows multiple tenants to share a pool of resources. One can share a single physical instance of hardware, database and basic infrastructure.

Rapid Elasticity
It is very easy to scale the resources vertically (larger machines) or horizontally (more machines) at any time. Scaling of resources means the ability of resources to deal with increasing or decreasing demand. The resources being used by customers at any given point in time are automatically monitored.

Measured Service
In this service, the cloud provider controls and monitors all aspects of the cloud service. Resource optimization, billing, capacity planning, etc. depend on it.

Benefits
1. One can access applications as utilities over the Internet.
2. One can manipulate and configure the applications online at any time.
3. No software needs to be installed to access or manipulate a cloud application.
4. Cloud Computing offers online development and deployment tools and a programming runtime environment through the PaaS model.
5. Cloud resources are available over the network in a manner that provides platform-independent access to any type of client.
6. Cloud Computing offers on-demand self-service, so the resources can be used without interacting with the cloud service provider.

Disadvantages of cloud computing
• Requires a high-speed internet connection.
• Security and reliability of data.
• Execution of HPC applications in cloud computing is not yet solved.
• Interoperability between cloud-based systems.

1.5 Service Oriented Architecture
Service-Oriented Architecture (SOA) allows applications to be used as services by other applications, regardless of the vendor, product or technology. As a result, applications from different vendors can exchange data without additional programming or changes to the services. The cloud computing service oriented architecture is shown in the diagram below.
Distributed computing such as Grid computing relies on causing actions to occur on remote computers. The value of using remote computers was recognized many years ago, well before Grid computing. One of the underlying concepts is the client-server model, as shown in the figure below. The client in this context is a software component on one computer that accesses the server for a particular operation.
Figure: Client-server model.
The server responds accordingly. The request and the response are transmitted through the network between the client and the server. An early form of client-server arrangement was the remote procedure call (RPC), introduced in the 1980s. This mechanism allows a local program to execute a procedure on a remote computer and get back the results from that procedure. It is now the basis of certain network facilities, such as mounting remote files in a shared file system. For the remote procedure call to work, the client needs to:
• Identify the location of the required procedure.
• Know how to communicate with the procedure to get it to provide the required actions.
The remote procedure call introduced the concept of a service registry to provide a means of locating the service (procedure). Using a service registry is now part of what is called a service-oriented architecture (SOA), as illustrated in the figure below. The sequence of events is as follows:
• First, the server (service provider) publishes its services in a service registry.
• Then the client (service requestor) asks the service registry to locate a service.
• Finally, the client (service requestor) binds with the service provider to invoke the service.
Figure: Service-oriented architecture.
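The publish, find and bind steps can be sketched with an in-memory registry. Everything here is hypothetical: the ServiceRegistry class, the service name "square" and the provider function are only illustrations. In a real SOA the registry would return a network address plus an interface description, not a local Python function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory registry mapping service names to providers."""

    entries: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def publish(self, name: str, provider: Callable[..., object]) -> None:
        # Step 1: the service provider publishes its service.
        self.entries[name] = provider

    def find(self, name: str) -> Callable[..., object]:
        # Step 2: the service requestor asks the registry to locate a service.
        if name not in self.entries:
            raise LookupError(f"no service registered under {name!r}")
        return self.entries[name]


def square_service(x: int) -> int:
    """Hypothetical provider; in practice it would run on a remote host."""
    return x * x


registry = ServiceRegistry()
registry.publish("square", square_service)  # publish
service = registry.find("square")           # find
print(service(7))                           # bind and invoke: prints 49
```

Because the requestor only knows the service name, a provider can be moved or replaced by re-publishing under the same name, without changing the client.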
Later forms of remote procedure calls in the 1990s introduced distributed objects, most notably CORBA (Common Object Request Broker Architecture) and Java RMI (Remote Method Invocation). A fundamental disadvantage of the remote procedure calls described so far is that the calling programs must know implementation-dependent details of the remote procedure call. A procedure call has a list of parameters with specific meanings and types, and the return value(s) also have a specific meaning and type. The calling program needs to know all these details, and remote procedures written by different programmers could use different and incompatible arrangements. This led to improvements, including interface definition (or description) languages (IDLs). IDLs allow an interface to be described in a language-independent manner, so that clients and servers can interact in different languages (e.g. between C and Java). However, even with IDLs these systems were not always completely platform/language independent. Aspects of a better system include:
• Universally agreed-upon standardized interfaces.
• Inter-operability between different systems and languages.
• Flexibility to enable different programming models and message patterns.
.!greed network protocols 6Internet standards# =eb =e b services with an H*8 interface definition language offer the solution# 1. Into#u$tion to Gi# A$*it"$tu" )n# st)n#)#s
Basic pillars
• Data management
• Resource management
• Security
• Information services
Need for security
• No centralized control
• Distributed resources
• Different resource providers
• Each resource provider uses different security policies
Resource Management
The huge number and heterogeneous nature of Grid Computing resources make resource management a major challenge in Grid Computing environments. Resource management activities include resource discovery, resource inventories, fault isolation, resource provisioning, resource monitoring, a variety of autonomic capabilities, and service-level management. The most interesting aspect of resource management is selecting the correct resource from the grid resource pool based on the service-level requirements, and then provisioning it efficiently to meet user needs.
Information Services
Information services fundamentally concentrate on providing valuable information about the Grid Computing infrastructure resources. These services leverage, and depend entirely on, information providers that report data such as resource availability, capacity and utilization, to name just a few. This information is valuable and mandatory feedback for the resource managers. These information services enable service providers to allocate resources most efficiently for the variety of very specific tasks related to the Grid Computing infrastructure solution.
Data Management
Data forms the single most important asset in a Grid Computing system. This data may be the input to a resource or the results produced by a resource when executing a specific task. If the infrastructure is not designed properly, data movement in a geographically distributed system can quickly cause scalability problems; data should therefore be kept near the computation that uses it. Data movement in any Grid Computing environment requires absolutely secure transfers both to and from the respective resources. Current advances in data management focus strongly on virtualized data storage mechanisms, such as storage area networks (SAN), network file systems, dedicated storage servers and virtual databases. These virtualization mechanisms in data storage solutions, together with common access mechanisms (e.g., relational SQL, Web services, etc.), help developers and providers design data management concepts into the Grid Computing infrastructure with much more flexibility than traditional approaches.
Standards for GRID environment
• OGSA
• OGSI
• OGSA-DAI
• GridFTP
• WSRF, etc.
OGSA
The Global Grid Forum has published the Open Grid Services Architecture (OGSA). Addressing the requirements of grid computing in an open and standard way requires a framework for distributed systems that supports integration, virtualization and management. Such a framework requires a core set of interfaces, expected behaviors, resource models and bindings. OGSA defines requirements for these core capabilities and thus provides a general reference architecture for grid computing environments. It identifies the components and functions that are useful, if not required, for a grid environment.
OGSI
As grid computing has evolved, it has become clear that a service-oriented architecture could provide many benefits in the implementation of a grid infrastructure. The Global Grid Forum extended the concepts defined in OGSA to define specific interfaces to various services that would implement the functions defined by OGSA. More specifically, the Open Grid Services Infrastructure (OGSI) defines mechanisms for creating, managing and exchanging information among Grid services. A Grid service is a Web service that conforms to a set of interfaces and behaviors that define how a client interacts with a Grid service. These interfaces and behaviors, along with other OGSI mechanisms associated with Grid service creation and discovery, provide the basis for a robust grid environment. OGSI provides the Web Services Description Language (WSDL) definitions for these key interfaces.
OGSA-DAI
The OGSA-DAI (Data Access and Integration) project is concerned with constructing middleware to assist with access to, and integration of, data from separate data sources via the grid. The project was conceived by the UK Database Task Force and is working closely with the Global Grid Forum DAIS-WG and the Globus team.
GridFTP
GridFTP is a secure and reliable data transfer protocol that provides high performance and is optimized for wide-area networks with high bandwidth. As one might guess from its name, it is based upon the Internet FTP protocol and includes extensions that make it a desirable tool in a grid environment. The GridFTP protocol specification is a proposed recommendation document in the Global Grid Forum (GFD-R-P.020). GridFTP uses basic Grid security on both the control (command) and data channels. Features include multiple data channels for parallel transfers, partial file transfers, third-party transfers and more.
WSRF
Web Services Resource Framework (WSRF). Basically, WSRF defines a set of specifications for defining the relationship between Web services (which are normally stateless) and stateful resources.
Web services related standards
Because Grid services are so closely related to Web services, the plethora of standards associated with Web services also applies to Grid services. We do not describe all of these standards in this document, but rather recommend that the reader become familiar with standards commonly associated with Web services, such as:
• XML
• WSDL
• SOAP
• UDDI
1..1 Elements of Grid - Overview of Grid Architecture
General Description
The Computing Element (CE) is a set of gLite services that provide access for Grid jobs to a local resource management system (LRMS, batch system) running on a computer farm, or possibly to computing resources local to the CE host. Typically, the CE provides access to a set of job queues within the LRMS.
Utilization Period
Booking Conditions
No particular booking is required to use this service. However, the user MUST have a valid Grid certificate from an accepted Certificate Authority and MUST be a member of a valid Virtual Organization (VO). The service is initiated by the respective commands, which can be submitted from any gLite User Interface, either interactively or through batch submission. To run a job on the cluster, the user must either install their own gLite User Interface or at least have access to one. Certificates can be requested, for example, at the German Grid Certificate Authority.
Deregistration
No particular deregistration is required for this service. A user with an expired Grid certificate or VO membership is automatically blocked from accessing the CE.
IT-Security
The database and log files of the CEs contain information on the status and results of the jobs and the certificate that was used to initiate the task. The required data files themselves are stored on the worker nodes or in the Grid Storage Elements (SEs). No other personal data is stored.
Technical Requirements
To run a job at the Grid cluster of the Steinbuch Centre for Computing (SCC), the user needs:
1. A valid Grid user certificate.
2. Membership in a Virtual Organization (VO).
3. Their own User Interface, or at least access to one.
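The access conditions in this service description (a valid certificate from an accepted Certificate Authority, a valid VO membership, and automatic blocking once either expires) amount to a simple authorization check. The sketch below models that check under stated assumptions: `GridUser`, `may_access_ce`, the CA name `Example-CA` and the VO name `example-vo` are all hypothetical, and real certificate validation (signature chains, revocation) and the CE's actual authorization components are not modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GridUser:
    """Hypothetical summary of a user's Grid credentials (not a gLite data type)."""
    subject: str                 # certificate subject (distinguished name)
    issuer_ca: str               # Certificate Authority that issued the certificate
    cert_expires: datetime       # end of the certificate's validity period
    # VO name -> end of membership; hypothetical representation
    vo_memberships: dict[str, datetime] = field(default_factory=dict)


def may_access_ce(user: GridUser, accepted_cas: set[str],
                  supported_vos: set[str], now: datetime) -> bool:
    """Return True if `user` meets the CE access conditions at time `now`."""
    # Condition 1: a certificate from an accepted CA that has not yet expired.
    if user.issuer_ca not in accepted_cas or user.cert_expires <= now:
        return False
    # Condition 2: an unexpired membership in at least one VO supported by the CE.
    # An expired certificate or VO membership therefore blocks access automatically.
    return any(vo in supported_vos and until > now
               for vo, until in user.vo_memberships.items())


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
alice = GridUser(
    subject="/O=Example/CN=Alice",
    issuer_ca="Example-CA",
    cert_expires=datetime(2024, 6, 30, tzinfo=timezone.utc),
    vo_memberships={"example-vo": datetime(2024, 12, 31, tzinfo=timezone.utc)},
)
print(may_access_ce(alice, {"Example-CA"}, {"example-vo"}, now))  # True
print(may_access_ce(alice, {"Example-CA"}, {"example-vo"},
                    datetime(2024, 7, 1, tzinfo=timezone.utc)))  # False: certificate expired
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and makes the automatic blocking on expiry easy to test.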
Overview of Grid Architecture