INTELLIGENT GAIT CONTROL OF A MULTILEGGED ROBOT USED IN RESCUE OPERATIONS
A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF THE MIDDLE EAST TECHNICAL UNIVERSITY
BY
EMRE KARALARLI
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN THE DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING
DECEMBER 2003
Approval of the Graduate School of Natural and Applied Sciences
——————————————–
Prof. Dr. Canan Özgen
Director

I certify that this thesis satisfies all the requirements as a thesis for the degree of Master of Science.

——————————————–
Prof. Dr. Mübeccel Demirekler
Head of Department

This is to certify that we have read this thesis and that in our opinion it is fully adequate, in scope and quality, as a thesis for the degree of Master of Science.

——————————————–
Prof. Dr. İsmet Erkmen
Co-Supervisor

——————————————–
Assoc. Prof. Dr. Aydan Erkmen
Supervisor
Examining Committee Members

Prof. Dr. Erol Kocaoğlan
———————————–
Prof. Dr. Aydın Ersak
———————————–
Prof. Dr. İsmet Erkmen
———————————–
Assoc. Prof. Dr. Aydan Erkmen
———————————–
Asst. Prof. Dr. İlhan Konukseven
———————————–
ABSTRACT
INTELLIGENT GAIT CONTROL OF A MULTILEGGED ROBOT USED IN RESCUE OPERATIONS
Karalarlı, Emre
M.S., Department of Electrical and Electronics Engineering
Supervisor: Assoc. Prof. Dr. Aydan Erkmen
Co-Supervisor: Prof. Dr. İsmet Erkmen
December 2003, 97 pages

In this thesis, an intelligent controller based on a gait synthesizer is developed for a hexapod robot used in rescue operations. The gait synthesizer draws its decisions from insect-inspired gait patterns and adapts them to the changing needs of the terrain and of the rescue task. It is composed of three modules responsible for selecting a new gait, evaluating the current gait, and modifying the recommended gait according to the internal reinforcements of past time steps. A Fuzzy Logic Controller is implemented for selecting the new gaits.

Keywords: Hexapod Walking Rescue Robots, Insect-inspired Gaits, Gait Synthesizer, GARIC.
ÖZ

INTELLIGENT GAIT CONTROL OF MULTILEGGED RESCUE ROBOTS

Karalarlı, Emre
M.S., Department of Electrical and Electronics Engineering
Supervisor: Assoc. Prof. Dr. Aydan Erkmen
Co-Supervisor: Prof. Dr. İsmet Erkmen

December 2003, 97 pages

In this thesis, a gait synthesizer is developed for the intelligent gait control of rescue robots. To respond to changing terrain properties and to different rescue tasks, the synthesizer makes its decisions according to insect-inspired gait patterns. It consists of three modules: a gait selector, an evaluator, and a modifier. The selector is a fuzzy logic controller.

Keywords: Six-Legged Rescue Robots, Insect-Inspired Gaits, Gait Synthesizer, GARIC.
ACKNOWLEDGMENTS
I would like to express my gratitude to my supervisor Assoc. Prof. Dr. Aydan Erkmen and co-supervisor Prof. Dr. İsmet Erkmen for their motivation, guidance, patience, and encouragement throughout the preparation of this thesis. I also thank all my friends, especially Engür and Aslı Pişirici, Mehmetçik and Semra Pamuk, Bora Sağdıçoğlu, and Sedat Ilgaz, for their invaluable comments and suggestions throughout the study. Finally, I express my gratitude to my family for their endless support.
TABLE OF CONTENTS

ABSTRACT ... iii
ÖZ ... iv
ACKNOWLEDGMENTS ... v
TABLE OF CONTENTS ... vi
LIST OF FIGURES ... viii

CHAPTER
1. INTRODUCTION ... 1
2. SURVEY ... 7
   2.1 Search and Rescue Robotics ... 7
   2.2 Legged Locomotion ... 8
       2.2.1 Walking Mechanisms in Animals ... 8
       2.2.2 Control of Legged Robots ... 13
       2.2.3 Gait Analysis ... 15
       2.2.4 Gait Control ... 19
   2.3 Mathematical Background ... 24
       2.3.1 Neural-Fuzzy Controllers ... 24
       2.3.2 GARIC Architecture ... 25
       2.3.3 Fuzzy Sets and Fuzzy Logic Controllers ... 31
       2.3.4 Reinforcement Learning ... 34
3. LEGGED ROBOT ... 37
   3.1 Dynamics and Coordinated Control of Legged Robots ... 37
       3.1.1 Motion Dynamics of Legged Robots ... 38
       3.1.2 Coordinated Control of Legged Robots ... 45
   3.2 Gait Controller ... 50
       3.2.1 Encoding the Gaits for a Multilegged Robot ... 50
       3.2.2 Gait Selection Module (GSM) ... 57
       3.2.3 Gait Evaluation Module (GEM) ... 59
       3.2.4 Gait Modifier Module (GMM) ... 60
       3.2.5 The Complete Control Cycle ... 62
4. HEXAPOD ROBOT SIMULATION ... 64
   4.1 Hexapod Model ... 64
   4.2 Sensor System ... 66
   4.3 Kinematics of the Hexapod Robot ... 68
   4.4 Uneven Terrain ... 69
5. SIMULATION RESULTS ... 71
   5.1 Exploration and Exploitation Dilemma in Reinforcement Learning ... 72
   5.2 Smooth Terrain Tests ... 74
   5.3 Performance on Rough Terrain ... 78
   5.4 Task Shapability: A Must for SAR Operations ... 79
6. CONCLUSION ... 87
   6.1 General ... 87
   6.2 Future Work ... 89
REFERENCES ... 91
APPENDICES
A. SIMULATION PROGRAM CD ... 97
LIST OF FIGURES

FIGURES
2.1 Tripod (A) and tetrapod (B) support patterns (or support polygons) formed by contact points of the supporting legs ... 9
2.2 Wave gait patterns. Bold lines represent swing phase. L1 signifies the left front leg and R3 indicates the right hind leg ... 16
2.3 Hexapod model. Dashed legs are in swing phase ... 17
2.4 Summary of coordination mechanisms in the stick insect. The pattern of coordinating influences among the step generators for the six legs is shown at the left; the arrows indicate the direction of the influence. The mechanisms are described briefly at the right ... 21
2.5 The GARIC architecture ... 26
2.6 The action evaluation network ... 27
2.7 The action selection network ... 29
2.8 General model of a fuzzy logic controller ... 33
3.1 Coordinate frames defined for the legged robot. The coordinate frame C_ci is assigned such that the unit vector ẑ is normal to the contact surface at the point of contact ... 40
3.2 Architecture of Gait Synthesizer ... 51
3.3 Summary of terminology used in gait analysis ... 52
3.4 Wave gait patterns. Bold lines represent swing phase. L1 signifies the left front leg and R3 indicates the right hind leg ... 53
3.5 Antecedent labels: fuzzification of individual leg position ... 54
3.6 Consequent labels: task share based on operation modes ... 56
3.7 Gait Selection Module ... 58
3.8 Complete control cycle ... 63
4.1 The hexapod robot used in simulation ... 65
4.2 Hexapod model ... 66
4.3 Each leg is identical and composed of three links. Pink legs are in swing phase whereas blue ones are in stance ... 67
4.4 Two different postures of the robot. Body level of the robot in B is lowered in order to increase the reachable space of the legs ... 68
4.5 The modelled uneven terrain. Different surface segments can be seen in the figure. The holes on uneven terrain are modelled by surface segments which are deeper than the legs can reach. Notice that the pink leg (swinging) falls into such a segment ... 70
5.1 Body speed versus time graphs for different scale factor and threshold values ... 74
5.2 Comparison of resultant gaits when training is done according to two different reinforcements, for speed (first row) and critical margin (second row). The first column gives the resultant gaits, the second body speed versus time, and the last column shows critical margin in the direction of motion versus time ... 75
5.3 Internal reinforcement versus time ... 76
5.4 Critical margin, Cm(t), versus time ... 77
5.5 Leg tip positions in the x direction versus time. In order to increase the critical margin, the gait synthesizer applies smaller step sizes ... 78
5.6 Leg tip trajectories of the hexapod on the x-z plane with a fixed tripod gait on the defined terrain ... 81
5.7 Leg tip trajectories of the hexapod on the x-z plane with the gait synthesizer on the defined terrain ... 82
5.8 Gait of the hexapod robot on uneven terrain. The robot recovers the tripod gait pattern some time after reaching the smooth terrain ... 83
5.9 Gait of the hexapod robot on uneven terrain. The robot recovers the tripod gait faster than in the previous case ... 84
5.10 Critical margin versus time ... 84
5.11 Leg tip positions in the x direction versus time ... 85
5.12 Gait generated by the gait synthesizer when leg R1 is missing ... 85
5.13 Gait generated by the gait synthesizer in sudden lack of leg R1 ... 86
CHAPTER 1 INTRODUCTION
Recent experiences of natural disasters (earthquakes, tornadoes, floods) and man-made catastrophes (e.g. urban terrorism) have drawn attention to the area of search and rescue (SAR) and emergency management. Horrible devastations and losses have dramatically illustrated the damage that can be inflicted on today's modern industrialized countries despite the technological progress in construction techniques [1]. Moreover, these experiences have shown that the preparedness and emergency response of governments are inadequate to deal with such devastation. The lives lost due to lack of immediate response have inevitably forced us to seek better solutions for search and rescue. The utilization of autonomous intelligent robots in search and rescue (SAR) is a new and challenging field of robotics, dealing with tasks in extremely hazardous and complex disaster environments [2]. Autonomy, high mobility, robustness, and reconfigurability are critical design issues of rescue robotics, requiring dexterous devices equipped with the ability to learn from prior rescue experience, adaptable to various types of usage with a wide enough functionality under different sensing modules, and compliant to environmental and victim conditions. Intelligent, biologically inspired mobile robots, and in particular hexapod robots, have turned out to be widely used robot types beside serpentine mechanisms [3], providing effective, immediate, and reliable responses in many SAR operations. Aiming at enhancing the
quality of rescue and of life after rescue, the field of rescue robotics is seeking shape-changing and, moreover, task-shapable intelligent dexterous devices. The objective of this thesis is to design a gait synthesizer for six-legged walking robots with shape-shifting gaits that provide the flexibility and adaptability needed in the difficult workspaces of rescue missions. The gait synthesizer is responsible for the locomotion of the robot, providing a compromise between mobility and speed while allowing task shapability, i.e. the use of some legs as manipulators when the need arises during rescue. Legged robots are chosen due to their advantage over their wheeled mobile counterparts on rough terrain [4], [5]. Wheeled locomotion is well suited for fast transportation: wheels change their point of support continuously and use friction to move forward efficiently. For that very reason, however, they require a continuous path, and hence a preconstructed terrain, which restricts mobility. Legged locomotion, on the other hand, offers a significant potential for mobility over natural rough terrains in comparison to wheeled or tracked locomotion. Because legs can choose footholds to improve traction, minimize lurching, and step over obstacles, they can cope with the softness and unevenness of the terrain [4]. Legs also provide the capability of maneuvering within confined areas. Unlike wheels, legs change their point of support all at once, so they do not need a continuous path. Moreover, as seen in nature, legs are not used only for walking: beside their main function, they take part in almost every external process of animals (as tactile sensors, as manipulators, etc.).
However, legged locomotion brings additional complexity in the coordinated control of the legs [6]. The control of a legged robot is a sophisticated job due to the high number of degrees of freedom offered by the articulated legs. In the design of a control structure for a legged robot on difficult rough terrain, many aspects have to be dealt with simultaneously, and they also interfere with each other. For example, the movements of the legs must be carefully coordinated to advance the body without causing the feet to slip; at each step an appropriate foothold has to be found; the body attitude must be set according to the terrain profile; stability must be maintained; and a navigation task must be accomplished. A body movement made for terrain adaptation may change the operation space of a leg so that the leg can no longer reach a chosen foothold that was within range beforehand; conversely, a decision to modify the gait may solve a stability problem. While coordinating the movements of body and legs, the control structure of the legged robot must therefore also handle such interferences. In this thesis we focus on gait control and leg coordination, and emphasize the potential of leg redundancy both for handling terrain irregularity and for using legs as manipulators. In walking robots, coordinating the movements of the individual legs in order to maintain a stable gait is one of the main control tasks. Observations of insect gaits (cockroaches, stick insects) show that insects produce sequential movements starting with hind-leg protraction and followed by the middle and front legs; this is called the metachronal wave or wave gait [7]. Among the numerous periodic gaits, the class of wave gaits is the most important because they provide good stability [8]. The tripod gait,
which is a member of the wave gait family, involves an alternation between right-sided and left-sided metachronal waves and is the fastest such gait. Gaits arise from the interaction of individual leg oscillators (step pattern generators), which govern the stepping of each leg by exchanging influences among the legs [9]. The information transmitted by a step pattern generator depends upon the leg's state (swing or stance, position, and velocity); the position information plays a particularly central role in coordination. Several researchers have implemented insect-like controllers for leg coordination ([10], [11]), most of which are oriented toward preserving the regularity of a fixed gait pattern against perturbations. In this thesis, we work on biologically inspired wave gait patterns. Gait patterns are patterns of leg coordination that represent the relative phases (swing or stance) of the legs in statically stable locomotion. These gaits have different properties from the mobility and speed points of view. In our method, we encode gait pattern cycles from the relative positions of the legs and determine each leg's task within those gait patterns. The method enables exploring among many different gait patterns and selecting gaits according to different needs, adapting online to terrain conditions. This is where the features of intelligent control are required. The Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture [12] is one realization of the fusion of fuzzy and neural technologies guided by feedback from the environment. It presents a method for learning and tuning fuzzy logic controllers (FLCs) through reinforcement signals. The basic idea behind a Fuzzy Logic Controller (FLC) is to incorporate
the "expert experience" of a human operator into the design of a controller for a process whose input-output relationship is described by a collection of fuzzy control rules (IF-THEN rules) involving linguistic variables, rather than by a complicated dynamic model [13]. Our gait synthesizer adapts the GARIC architecture to our objective. The gait synthesizer that we developed for serpentine locomotion [3], and here for hexapod walking, consists of three modules. The Gait Evaluation Module (GEM) acts as a critic and provides advice to the main controller, based on a multilayer artificial neural network. The Gait Selection Module (GSM) offers a new gait to be taken by the robot, according to a fuzzy controller with rules for different gait patterns in its knowledge base. The Gait Modifier Module (GMM) changes the gait recommended by the GSM based on internal reinforcement. This change in the recommended gait is more significant for a state that does not receive high internal reinforcements (i.e., whose probability of failure is high). On the other hand, if a state receives high reinforcements, the GMM administers only small changes to the action selected by the fuzzy controller embedded in the GSM: the action is performing well, so the GMM dictates no or only minor changes to that gait. The basic contribution of this thesis is the development of an intelligent, task-shapable control, based on a gait synthesizer, for a hexapod robot traversing unstructured workspaces in rescue missions within disaster areas. The gait synthesizer draws its decisions from insect-inspired gait patterns and adapts them to the changing needs of the terrain and of the rescue task. The method provides exploration among different gait patterns using the redundancy in multilegged structures. The thesis is organized as follows: Chapter 2 covers a survey on legged locomotion and gait analysis, and gives information about the basic notions needed throughout the thesis. Chapter 3 includes the dynamics and control of legged robots, and the detailed description of the gait synthesizer. Chapter 4 introduces the simulation, and Chapter 5 presents and discusses the simulation results. Chapter 6 covers the conclusion.
CHAPTER 2 SURVEY
2.1 Search and Rescue Robotics

The contribution of robotics technology to today's sophisticated tasks is an inevitable progress, leading to a gradual minimization of the human share, mostly due to saturation in the improvement of human abilities or to the complementation of human activities. Education and training are insufficient for dealing with such complex and exhausting tasks [1]. Thus, from the robotics point of view, the trend is to provide an intelligent, versatile tool that is a complete substitute for humans in risky operations and a complement to human operations when auxiliary intelligent dynamics are required for extra dexterity. As part of this progress, search and rescue (SAR) is one of the most crucial fields that need a robotics contribution. Search and rescue (SAR) robotics can be defined as the utilization of robotics technology for human assistance in any phase of SAR operations [2]. Robotic SAR devices have to work in extremely unstructured and technically challenging areas shaped by natural forces. One of the major requirements of rescue robot design is the flexibility of the design for different rescue usages in disaster areas of varying properties. Rescue robotic devices should be adaptable, robust, and predictive in control when facing different and changing needs. Intelligent, biologically inspired mobile robots, and in particular hexapod walking robots, have turned out to be widely used robot types beside serpentine mechanisms, providing effective, immediate, and reliable responses in many SAR operations.
2.2 Legged Locomotion

Legged locomotion offers a significant potential for mobility over highly irregular, unpredictable natural rough terrains cut with ditches, in comparison to wheeled or tracked locomotion [4], [5]. Legs provide the capabilities of stepping over obstacles or ditches and of maneuvering within confined areas. They can cope with the softness and unevenness of the terrain. Beside their main function in locomotion, legs take part in almost every external process of animals: the articulated structures of legs serve as manipulators to pull, push, or hold, or as tactile sensors to explore the environment.
2.2.1 Walking Mechanisms in Animals

Millions of years of evolution have resulted in a large number of locomotory designs for efficient, rapid, adjustable, and reliable movement of animals [15]. The major variations are observed in the number of legs (from two in humans to about two hundred in a millipede), the length and shape (some spiders possess extremely long and slender legs, whereas hedgehogs have comparatively short legs), the positioning of the legs (insects carry their body between the legs, whereas mammals tuck their legs underneath), and the type of skeleton (arthropods use an exoskeleton made of chitin-protein cuticle, whereas vertebrates use an endoskeleton composed of bone). Despite this diversity, legged locomotion in animals shows some basic similarities in its mechanics and control.
At the most fundamental level, legs work in a cyclic manner to produce locomotion. The step cycle for an individual leg consists of two basic phases: the swing phase, when the foot is off the ground and moving forward, and the stance phase, when the foot is on the ground and the leg is moving backward with respect to the body. The propulsive force for progression is developed during the stance phase. A common feature of the step cycle in most animals (including man) is that the duration of the swing phase remains comparatively constant as walking speed varies. Accordingly, changes in the speed of progression are produced primarily by changes in the time it takes for the legs to be retracted during the stance phase [21].
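The timing relationship described in this paragraph can be made concrete with a small numeric sketch; the stride length, speed, and swing duration below are illustrative values, not measurements.

```python
def cycle_period(stride_length, body_speed, swing_duration=0.1):
    """Step-cycle timing under the common animal pattern: the swing
    duration stays roughly constant, so changes in speed come from the
    stance phase. stride_length [m], body_speed [m/s], swing_duration [s]
    (all values illustrative)."""
    # Time the foot spends on the ground supporting and propelling the body.
    stance_duration = stride_length / body_speed
    return swing_duration + stance_duration

# Doubling the speed shortens only the stance part of the cycle.
slow = cycle_period(stride_length=0.06, body_speed=0.1)  # 0.1 + 0.6 = 0.7 s
fast = cycle_period(stride_length=0.06, body_speed=0.2)  # 0.1 + 0.3 = 0.4 s
assert slow > fast
```

Note that the swing term (0.1 s) is identical in both cases; the entire change in cycle period comes from the stance term, mirroring the behavior reported for animals.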
Figure 2.1: Tripod (A) and tetrapod (B) support patterns (or support polygons) formed by contact points of the supporting legs.
Animal locomotion can be classified into two categories according to the gait used [23]. The first type is the one exhibited by insects. Insects are arthropods and have a hard exoskeletal system with jointed limbs. They use their legs as struts and levers, and during walking the legs must always support the body in addition to providing propulsion. In other words, the sequential pattern of steps must ensure static stability: the vertical projection of the center of gravity must always lie within the support pattern, the two-dimensional convex polygon formed by the contact points (Fig. 2.1). This kind of locomotion has been described as crawling, and the legs have to provide at least a tripod of support at all times. Another kind of locomotion may be observed in humans, horses, dogs, cheetahs, and kangaroos, which have a more flexible structure. These animals rely on dynamic balance, which is a less stringent restriction on the posture and gait of the animal. The animal need not be in static equilibrium; on the contrary, there may be periods of time when none of the support legs are on the ground, as observed in trotting horses, running humans, and hopping kangaroos. The mechanism by which the nervous system generates the cyclic movements of the legs during walking is basically the same across animals [23], [21]. The first significant efforts to analyze the nervous system came at the beginning of the 1900s with the work of two British physiologists, C. S. Sherrington and T. Graham Brown [21]. Sherrington first showed that rhythmic movements could be elicited from the hind legs of cats and dogs some weeks after their spinal cord had been severed. Since the operation had isolated the nervous center that controls the movement of the hind legs from the rest of the nervous system, he showed that the higher levels of the nervous system are not necessary for the organization of stepping movements. He explained the generation of rhythmic leg movements by a series of "chain reflexes" (a reflex being a stereotyped movement elicited by the stimulation of a specific group of sensory receptors). Thus he conceived that the sensory input generated during any part of the step cycle elicits the
next part of the cycle by a reflex action, producing in turn another sensory signal that elicits the next part of the cycle, and so on. Graham Brown, in contrast, demonstrated that rhythmic contractions of the leg muscles, similar to those that occur during walking, could be induced immediately following transection of the spinal cord, even in animals in which all input from sensory nerves in the legs had been eliminated. He thus claimed that mechanisms located entirely within the spinal cord are responsible for generating the basic rhythm of stepping in each leg. These two concepts are not compatible, yet neither provides a complete explanation by itself [21]. Further experiments in a number of laboratories have yielded results that strongly support a dual view of the nervous mechanisms involved in walking. Both approaches have attractive features as models for understanding how neural systems produce behavior. If walking is the consequence of complete motions (a central pattern), then it is much easier to see how phase coordination of multiple legs is possible; on the other hand, it is more difficult to see how adaptation to the details of the terrain is possible when walking is composed of complete motions. This state of affairs is reversed when the model is based on reflexes. The consensus that evolved was that aspects of both models are important to the control of locomotion and that neither is completely correct by itself [21]. Thus, our gait synthesizer joins both: sensory effects, in the form of environmental task performance fed back as reinforcement, and a simple neural structure for the phase coordination of the multiple legs of our robot. Moreover, the generated system is reflexive enough to adapt to the sudden unevenness of the terrain in rescue operations.
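Phase coordination of the kind discussed above can be illustrated with a minimal phase-offset generator. This is a deliberate simplification, not the coordination mechanism of real insects or of the thesis controller: the leg labels, period, and offsets are assumptions chosen so that a duty factor of 0.5 reproduces the alternating tripod gait.

```python
def wave_gait_pattern(t, duty_factor=0.5, period=1.0):
    """Phase-coordination sketch for a hexapod (illustrative). Legs are
    L1..L3 (left front to hind) and R1..R3 (right). Each leg follows the
    same cycle shifted by a fixed phase offset; a leg is in stance while
    its phase is below the duty factor, and in swing otherwise."""
    # Offsets in fractions of a cycle; this choice yields a tripod gait
    # (two alternating groups of three legs) at duty_factor = 0.5.
    offsets = {"L1": 0.0, "L2": 0.5, "L3": 0.0,
               "R1": 0.5, "R2": 0.0, "R3": 0.5}
    pattern = {}
    for leg, offset in offsets.items():
        phase = ((t / period) + offset) % 1.0
        pattern[leg] = "stance" if phase < duty_factor else "swing"
    return pattern

# With duty_factor 0.5, legs {L1, L3, R2} and {L2, R1, R3} alternate as tripods.
p = wave_gait_pattern(t=0.25)
assert p["L1"] == p["L3"] == p["R2"] == "stance"
assert p["L2"] == p["R1"] == p["R3"] == "swing"
```

Slower wave gaits would use a higher duty factor and stagger the offsets along each side from hind to front, producing the metachronal wave described in Section 2.2.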
The process that gives rise to locomotion is a complicated control system [16]. Motor output is constantly modified by both neural and mechanical feedback. Specialized circuits within the nervous system, called central pattern generators (CPGs), produce the rhythmic oscillations that drive the motor neurons of limb and body muscles in animals as diverse as leeches, slugs, lampreys, turtles, insects, birds, cats, and rats. Although CPGs may not require sensory feedback for their basic oscillatory behavior, such feedback is essential in structuring motor patterns as animals move. This influence may be so strong that certain sensory neurones should be viewed not as modulators but as integral members of a distributed pattern-generating network comprising both central and peripheral neurones. This is the main motivation behind our gait synthesizer: one part learns to select gait patterns, while the other parts learn to evaluate performance based on sensory data and to modify those patterns when necessary. More specifically, the Gait Selection Module (GSM, Section 3.2.2) in our architecture acts as the CPG does in real animals. From studies on animal locomotion, a few themes emerge. First, the dynamics of locomotion, though complicated, rests on a few common principles, including common mechanisms of energy exchange and the use of force for propulsion, stability, and maneuverability. Second, the locomotory performance of animals in natural habitats reflects trade-offs between different ecologically important aspects of behavior and is affected by the physical properties of the environment. Third, the control of locomotion is not a linear cascade but a distributed organization requiring both feedforward motor patterns and neural and mechanical feedback. Fourth, muscles perform many
different functions in locomotion, a view expanded by the integration of muscle physiology with whole-animal mechanics (muscles can act as motors, brakes, springs, and struts). Because machines face the same physical laws and environmental constraints that biological systems face when they perform similar tasks, the solutions they use may embrace similar principles. Legged machines thus have much to learn from nature. But the evolutionary pressures that dictate the morphology and physiology of animals do not always give results suitable for our tasks. For example, 40% of the body mass of a shrimp is devoted to the large, tasty abdominal muscles that produce a powerful tail flick during rare, but critical, escape behaviors [16]. The imitation of such a body design would surely result in an inefficient machine. The consequence is that the information taken from nature must be processed and the fundamental principles must be extracted. That is why we concentrated on redundant legged robots, and more specifically six-legged ones.
2.2.2
Control of Legged Robots

The main challenge for legged robots is the control system. A system
that controls such a robot accomplishes several tasks [5]. First, it regulates the robot's gait, that is, the sequence and way in which the legs share the task of locomotion. For example, six-legged robots work with gaits that elevate a single leg at a time or two or three legs simultaneously. A gait that elevates several legs at once generally makes it possible to travel faster but offers less stability than a gait that keeps more legs on the ground.
A second task is to keep the robot from tipping over. For vehicles using static stability, if the center of gravity of the robot moves beyond the base of support provided by the legs, the robot will tip. The location of the center of gravity with respect to the placement of the feet must therefore be continuously monitored by the robot. In our control structure, static stability is provided by ensuring safety margins from physical limits, such as the distance of the center of gravity from the support polygon and the distance of the legs from their reaching limits during the support phase.

Since many legs share the support of the body, a third task is to distribute the support load and the lateral forces among the legs. Smoothness of the ride and minimal disturbance of the ground are the main objectives of this task. In this thesis work, the smoothness of the legged robot is provided by applying the periodic wave gait patterns of insects. The perturbations of the ground to the robot are compensated by choosing proper gaits during locomotion.

A fourth task is to make sure the legs are not driven past their limits during their travel. The geometry of the legs may make it possible for one leg to bump into another. The control system must take into account the limits of each leg's motion and the expected motion of the robot during that leg's stance period. In our robot the legs' operation areas are restricted such that they do not overlap.

A fifth task is to choose places for stepping that will give adequate support. For this task, a sensor system that scans the ground ahead of the robot is required. This system builds an internal digital model of the terrain and processes it to find suitable footholds. Here, softness of the terrain may cause problems. In the gait synthesizer we developed, a task-oriented internal model is learned during the learning process of gait evaluation.

We perform these five locomotion-related tasks on a hexapod robot by focusing on gait control. In other words, our solutions to the problems in the overall control of the hexapod robot are based on gait control. For the rest of the tasks, which depend on the application, we just show the potential of the gait synthesizer. Specifically, we will show that the gait synthesizer is capable of adapting to rescue operations, where one leg of the hexapod is used as a manipulator while the rest provide mobility. The key challenge in legged robots, however, is to control the individual components (legs) for cooperative manipulation while obtaining their cooperation for walking as an integrated whole. This is the motivation behind this thesis work.
2.2.3
Gait Analysis

In this thesis we focus on the gaits of legged robots. A gait is a sequence
of leg motions coordinated with a sequence of body motions for the purpose of transporting the body of the legged system from one place to another [8]. Gait analysis is one of the fundamental areas in the study of walking robots. It is important because it is the major factor that affects the geometric and control design of a walking robot [30]. In general, there are two types of gaits: periodic and non-periodic gaits [8].
Figure 2.2: Wave gait patterns. Bold lines represent swing phase. L1 signifies the left front leg and R3 indicates the right hind leg [7].
Periodic gaits are those in which a specific pattern of leg movement is imposed. Observations on insect gaits (cockroaches, stick insects) show that insects produce sequential movements starting with hind-leg protraction and followed by the middle and front legs, called a metachronal wave or wave gait [7]. The slowest gait involves an alternation between right-sided and left-sided metachronal waves (Fig. 2.2A). As these waves overlap (Fig. 2.2B to 2.2E), tetrapod gaits (Fig. 2.2C, 2.2D) and the typical tripod gait (Fig. 2.2E) are generated. The tripod gait, observed in hexapod insects such as cockroaches, is the fastest statically stable gait that a six-legged mechanism can use. In the tripod gait, three legs that enclose the center of gravity support the body while the other three legs simultaneously lift and recover. Periodic gaits offer good mobility over smooth terrain since they possess optimum stability. However, the terrain irregularities that these gaits can deal with are relatively limited. If the terrain irregularity is severe, as in natural disaster areas, periodic gaits become ineffective, and special gaits need to be developed. These are the non-periodic gaits. Work in this area comprises studies on free gaits [32] and large-obstacle gaits [30]. Free gaits are gaits in which any leg is permitted to move at any time [31]. In the free gait approach, a finite set of gait states is defined and control follows a rule-based principle, resulting in simple motions lacking smoothness. Our gait control approach takes advantage of these gaits in order to achieve smooth and adaptive locomotion over unpredictable terrain roughness.
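As an illustration of how such periodic patterns can be generated, the sketch below (our own simplified model, not from the thesis; the leg names follow Fig. 2.2 and the phase values are assumptions) derives each leg's support state from a duty factor and a per-leg phase offset:

```python
# Sketch: generating periodic wave-gait support patterns from a duty
# factor and per-leg phase offsets (L = left, R = right, 1 = front, 3 = hind).

def in_stance(t, phase, duty_factor):
    """A leg is in stance while its local cycle fraction is below the duty factor."""
    return ((t - phase) % 1.0) < duty_factor

# Hypothetical phase assignment for the tripod gait (Fig. 2.2E):
# two alternating tripods, half a cycle apart.
TRIPOD_PHASES = {"L1": 0.0, "R2": 0.0, "L3": 0.0,
                 "R1": 0.5, "L2": 0.5, "R3": 0.5}

def stance_legs(t, phases, duty_factor):
    """Sorted names of all legs in support phase at (normalized) time t."""
    return sorted(leg for leg, ph in phases.items()
                  if in_stance(t, ph, duty_factor))

# With duty factor 0.5, exactly one tripod supports the body at any time.
for t in (0.0, 0.25, 0.6, 0.9):
    assert len(stance_legs(t, TRIPOD_PHASES, 0.5)) == 3
```

Slower wave gaits follow from the same sketch by raising the duty factor and spreading the phase offsets metachronally along each side.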
Figure 2.3: Hexapod model. Dashed legs are in swing phase.
Fig. 2.3 shows a hexapod model. The leg order as labelled in Fig. 2.3 is adopted throughout our thesis work. Below are some terms used in gait analysis [8], [10], [30]:

1. Protraction: The leg moves towards the front of the body.
2. Retraction: The leg moves towards the rear of the body.

3. Stance phase: The leg is on the ground, where it supports and propels the body. In forward walking, the leg retracts during this phase. Also called the power stroke or support phase.

4. Swing phase: The leg lifts and swings to the starting position of the next stance phase. In forward walking, the leg protracts during this phase. Also called the return stroke or recovery phase.

5. Cycle time: The time for a complete cycle of leg locomotion of a periodic gait.

6. Duty factor of a leg: The fraction of the cycle time in which the leg is in the support phase.

7. Phase of a leg: The fraction of a cycle period by which the placement of the leg lags behind the placement of a reference leg.

8. Support Polygon: The two-dimensional point set in a horizontal plane consisting of the convex hull of the vertical projections of all foot points in the support phase (Fig. 2.3).

9. Stability Margin (Sm): The shortest distance from the vertical projection of the center of gravity to the boundaries of the support polygon in the horizontal plane.

10. Front and Rear Boundary: The boundaries of the support polygon, respectively ahead of and behind the projection of the center of gravity in forward walking, that intersect the longitudinal body axis.
11. Front and Rear Stability Margin (Front and Rear Sm): The distances from the vertical projection of the center of gravity to the front and rear boundaries of the support polygon, respectively, in forward walking.

12. Kinematic Margin (Km): The distance from the current foothold of a stance leg to the border of its reachable area in the direction opposite to the body motion (Fig. 2.3).

13. Anterior Extreme Position (AEP): In forward walking, the target position of the advance degree of freedom during the recovery phase. It is the foremost position a leg reaches during a cycle.

14. Posterior Extreme Position (PEP): In forward walking, the target position of the swing degree of freedom during the support phase. It is the backmost position a leg reaches during a cycle.

15. Stroke distance (Sd): The distance between the Anterior Extreme Position (AEP) and the Posterior Extreme Position (PEP).
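The stability margin of definition 9 can be computed directly from the stance-foot positions. The following is an illustrative sketch (helper names are ours, not the thesis implementation; it assumes the projected center of gravity lies inside a convex support polygon given in vertex order):

```python
# Sketch: stability margin Sm as the shortest distance from the vertical
# projection of the center of gravity to the support-polygon boundary.

import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all 2-D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                       # degenerate segment
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def stability_margin(cog, support_polygon):
    """support_polygon: vertices of the convex hull of the stance feet, in order."""
    n = len(support_polygon)
    return min(point_segment_distance(cog, support_polygon[i],
                                      support_polygon[(i + 1) % n])
               for i in range(n))

# Four stance feet at the corners of a square; CoG projected at the origin.
feet = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
print(stability_margin((0.0, 0.0), feet))  # 1.0: every edge is one unit away
```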
2.2.4
Gait Control

One of the aspects related to the control of legged robots is the generation of stable gaits [31]. The task of the gait generation mechanism can be defined as selecting an appropriate coordination sequence of leg and body movements so that the robot advances with a desired speed and direction. Gait generation for robots with six or more legs has been addressed by several research efforts, which we overview here.
The principle established by experimental studies of walking in insects is that gaits arise from the interaction of individual leg oscillators (step pattern generators), which govern the stepping of each leg and exchange coordinating influences between the legs [9]. The information transmitted from a step pattern generator depends upon the leg's state (swing or stance, position, and velocity). Here, position information plays a particularly central role in coordination. We also take this into consideration in our work. Several versions of this interleg coordination principle have been investigated and implemented in insect-inspired walking robots. Pearson [21] proposed that modification of the walking coordination may occur through load sensors (the campaniform sensillae) and position information from the leg's chordotonal organ. This model formed the basis of Beer's simulation of cockroach behaviors [18], where the effect of load and position sensors was simulated by forward and backward angle sensor "neurons" as well as ground contact and stance and swing "neurons" within a distributed neural network control architecture. This basic model was then implemented on a walking robot with two degrees of freedom per leg [19]. A more complex interleg coordination model is proposed in [20] and [24]. Together they identify at least six mechanisms that work between the legs of a stick insect. A summary of the coordination mechanisms in the stick insect is shown in Fig. 2.4. The arrows indicate the direction of the influences which establish the coordination of the legs, providing stability. In [24], [25] most of these mechanisms are simulated, and some of them have also been implemented on a robot with two dof per leg [26] and two robots with three dof per leg [27]. In
Figure 2.4: Summary of coordination mechanisms in the stick insect. The pattern of coordinating influences among the step generators for the six legs is shown at the left; the arrows indicate the direction of the influence. The mechanisms are described briefly at the right [24].
the implementations, interleg coordination mechanisms operate by modifying the PEP of a receiving leg depending upon the state of a sending leg (AEP and PEP are applied as switching points between swing and stance phases, and AEP is set to a constant value). In [10], Ferrell compares different insect-inspired gait controllers. The most important feature of these implementations is that they are highly distributed. However, much is still unknown about the general dynamical behavior of the models and the dependence of this behavior on parameters [34], so the parameters associated with a model must be tuned heuristically to achieve a desired behavior. One of the major requirements of rescue robot design, however, is the flexibility of the design for different rescue uses in disaster areas of varying properties [2]. Our work on gait control offers such flexibility by
adapting insect-inspired gait patterns to the changing needs of the terrain and of the rescue task. Some of the complete walking robot designs in the literature do not offer remarkable approaches to gait control [37], [11]; they usually apply fixed gait patterns (especially the tripod). But some research still focuses on the subject. In [30], Choi and Song deal with obstacle-crossing gaits. Their study presents fully automated gaits that can be used to cross four types of simplified obstacles: grade, ditch, step, and isolated wall. After the type and dimensions of an obstacle are entered, the system generates a series of preprogrammed movements that enables a hexapod to cross over the obstacle in a fully automated mode. Our approach provides obstacle crossing by trying different gaits rather than imposing pre-programmed movements. In [36] a gait state definition is presented as a function of the last steps executed. The authors identify several classes of gait states and the transitions between them. They show that, independently of the robot's initial posture, executing a sequence of gait states brings the robot into one of four situations, classified by the number of legs in contact, from which the tripod gait can be obtained. Yang and Kim focus on robustness to leg damage in walking machines and deal with fault-tolerant gaits [28]. These are gaits maintaining stability in static walking against a fault event that prevents a leg from reaching the support state. In [29], they successfully implement a fault-tolerant gait over uneven terrain. In our gait control approach we do not distinguish the
gaits according to their fault tolerance, but we enable the controller to search for the gait that will solve the problem. In [6], Celaya and Porta present a complete control structure for the locomotion of a legged robot on uneven terrain. In their gait generation they use two rules by which different gaits, including the complete family of wave gaits, can be obtained from a proper initial state. The first rule, 'never have two neighboring legs raised from the ground at the same time', guarantees static stability. The second rule, 'a leg should perform a step when this is allowed by the first rule and its neighboring legs have stepped more recently than it has', forces the alternation of the steps of any pair of neighboring legs. These two rules are local, so no central synchronization is required. In [33], a modified version of the Q-learning approach is used for the decentralized control of the robot Kafka. Each leg, which can be in one of a finite number of states, has its own look-up table and can communicate with the others. Based on the states of each leg and of the legs to which it is coupled, actions are chosen according to these look-up tables. The modified Q-learning approach is employed to search for a set of actions resulting in successful walking gaits. Parker et al. utilize a Cyclic Genetic Algorithm (CGA) to produce gaits for a hexapod robot [40]. Their approach to generating a gait is to develop a model capable of representing all states of the robot and to use a cyclic genetic algorithm to train this model to walk forward. The CGA is developed as a modification of the standard Genetic Algorithm. The CGA incorporates time into
the chromosome structure by assigning each gene a task to be accomplished in a set amount of time. Some portions of the chromosome (tasks) are also repeated, creating a cycle. This allows the chromosome to represent a program that has a start section and an iterative section. In [40] it is shown that with only minimal a priori knowledge the optimal tripod gait for a hexapod robot can be produced. In [11], a survey of different approaches to gait generation can be found. Among the methods in the literature, our gait synthesizer has a hybrid structure. The interaction of legs (mutual inhibitions and excitations) in biological systems results in the observed gait patterns. Without implementing the described interleg mechanisms, we work directly on these patterns so that we still make use of the biological background. At the same time, our system retains the flexibility of non-periodic gaits by allowing any leg to move out of the pattern when needed. In our approach both the terrain conditions and the performance criteria determine the gait to be applied.
2.3 Mathematical Background

2.3.1 Neural-Fuzzy Controllers

Neural Fuzzy Controllers (NFCs), based on a fusion of ideas from fuzzy
control and neural networks, possess the advantages of both neural networks (e.g., learning abilities, optimization abilities, and connectionist structures) and fuzzy control systems (e.g., humanlike IF-THEN rule thinking and ease of incorporating expert knowledge) [13]. Fuzzy systems and neural networks
share the common ability to improve the intelligence of systems working in an uncertain, imprecise, and noisy environment. The main purpose of a neural fuzzy control system is to apply neural learning techniques to find and tune the parameters and/or structure of the neuro-fuzzy control system. Some of the works in this area are Generalized Approximate Reasoning based Intelligent Control (GARIC) [12], Fuzzy Adaptive Learning Control Network (FALCON) [52], Adaptive Neuro Fuzzy Inference System (ANFIS) [53], and Neuro-Fuzzy Control (NEFCON) [54]. In our work we adopted the GARIC architecture to develop the gait synthesizer for our multilegged robot.
2.3.2
GARIC Architecture

Generalized approximate reasoning-based intelligent control (GARIC),
introduced by Berenji and Khedkar [12], is a neural fuzzy control system with reinforcement learning capability. GARIC presents a method for learning and tuning fuzzy logic controllers (FLC) through reinforcement signals. It consists of three modules (Fig. 2.5): an action evaluation network (AEN) that maps a state vector and a failure signal into a scalar score (internal reinforcement) indicating the goodness of the state; an action selection network (ASN) that maps a state vector into a recommended action using fuzzy inference; and a stochastic action modifier (SAM) that produces the actual action based on the internal reinforcement. Learning occurs by fine-tuning the free parameters in the two networks: in the AEN, the weights are adjusted; in the ASN, the parameters describing the fuzzy membership functions are changed.
Figure 2.5: The GARIC architecture [12].
Action Evaluation Network

The AEN constantly predicts reinforcements associated with different input states. It is a two-layer feedforward network with direct interconnections from the input nodes to the output node (Fig. 2.6). The input to the AEN is the state of the plant, and the output is an evaluation of the state (or, equivalently, a prediction of the external reinforcement signal), denoted by v(t). The output of each node in the AEN is calculated by the following equations:

y_i(t) = g\left( \sum_{j=1}^{n} a_{ij}(t)\, x_j(t) \right)   (2.1)

v(t) = \sum_{i=1}^{n} b_i(t)\, x_i(t) + \sum_{i=1}^{n} c_i(t)\, y_i(t)   (2.2)

g(s) = \frac{1}{1 + e^{-s}}   (2.3)

where g is the sigmoid function, v is the prediction of the reinforcement signal, and a_{ij}, b_i, and c_i are the corresponding link weights, shown as A, B, and C in Fig. 2.6.
Figure 2.6: The action evaluation network.
This network evaluates the action recommended by the action network as a function of the failure signal and the change in state evaluation based on the state of the system at time t:

\hat{r}(t) = \begin{cases} 0 & \text{start state} \\ r(t) - v(t-1) & \text{failure state} \\ r(t) + \gamma v(t) - v(t-1) & \text{otherwise} \end{cases}   (2.4)

where 0 ≤ γ ≤ 1 is the discount rate. In other words, the change in the value of v plus the value of the external reinforcement constitutes the heuristic or internal reinforcement, r̂, where the future values of v are discounted more the further they are from the current state of the system.

Learning in the AEN is based on the internal reinforcement r̂(t). If r̂ is positive, the weights are altered so as to increase the output v for positive input, and vice versa. Therefore, the equations for updating the weights are as follows:

b_i(t) = b_i(t-1) + \beta\, \hat{r}(t)\, x_i(t-1)   (2.5)

c_i(t) = c_i(t-1) + \beta\, \hat{r}(t)\, y_i(t-1)   (2.6)

a_{ij}(t) = a_{ij}(t-1) + \beta_h\, \hat{r}(t)\, y_i(t-1)\,(1 - y_i(t-1))\, \operatorname{sgn}(c_i(t-1))\, x_j(t-1)   (2.7)

where β > 0 and β_h > 0 are constant learning rates.
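A minimal Python sketch of the AEN computations (Eqs. 2.1-2.7); the dimensions, learning rates, and function names below are our assumptions, not the thesis implementation:

```python
import math

def sigmoid(s):
    """Eq. 2.3."""
    return 1.0 / (1.0 + math.exp(-s))

def aen_forward(x, A, B, C):
    """Eqs. 2.1-2.2: hidden outputs y_i and state evaluation v."""
    y = [sigmoid(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) for row in A]
    v = sum(b * xi for b, xi in zip(B, x)) + sum(c * yi for c, yi in zip(C, y))
    return y, v

def aen_update(x_prev, y_prev, A, B, C, r_hat, beta=0.1, beta_h=0.1):
    """Eqs. 2.5-2.7, applied in place; sgn uses c_i before its own update."""
    for i in range(len(B)):
        B[i] += beta * r_hat * x_prev[i]                       # Eq. 2.5
    for i in range(len(C)):
        sgn = 1.0 if C[i] >= 0.0 else -1.0                     # sgn(c_i(t-1))
        C[i] += beta * r_hat * y_prev[i]                       # Eq. 2.6
        for j in range(len(x_prev)):
            A[i][j] += (beta_h * r_hat * y_prev[i] * (1.0 - y_prev[i])
                        * sgn * x_prev[j])                     # Eq. 2.7

# Zero weights evaluate any state to v = 0; a positive r_hat then
# pushes the weights toward rating that state more highly.
A, B, C = [[0.0, 0.0]], [0.0, 0.0], [0.0]
y, v = aen_forward([1.0, 0.0], A, B, C)
assert y == [0.5] and v == 0.0
aen_update([1.0, 0.0], y, A, B, C, r_hat=1.0)
assert aen_forward([1.0, 0.0], A, B, C)[1] > 0.0
```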
Action Selection Network

As shown in Fig. 2.7, the ASN is a five-layer network, with each layer performing one stage of the fuzzy inference process. The functions of each layer are briefly described here.
• Layer 1: An input layer that just passes input data to the next layer.

• Layer 2: Each node in this layer functions as an input membership function. Here triangular membership functions are used:
\mu_V(x) = \begin{cases} 1 - |x - c|/s_L & x \in [c - s_L, c] \\ 1 - |x - c|/s_R & x \in [c, c + s_R] \\ 0 & \text{otherwise} \end{cases}   (2.8)

where V = (c, s_L, s_R) indicates an input linguistic value, and c, s_L, s_R correspond to the center, left spread, and right spread of the triangular membership function µ_V, respectively.
Figure 2.7: The action selection network.
• Layer 3: Each node in this layer represents a fuzzy rule and implements the conjunction of all the preconditions in the rule. Its output w_r, indicating the firing strength of this rule, is calculated by the following continuous, differentiable softmin operation:

w_r = \frac{\sum_i \mu_i\, e^{-k\mu_i}}{\sum_i e^{-k\mu_i}}   (2.9)

where µ_i is the output of a layer 2 node, which is the degree of matching between a fuzzy label occurring as one of the preconditions of rule r and the corresponding input variable. The parameter k controls the hardness of the softmin operation, and as k → ∞ we recover the usual min operator. However, for finite k, we get a differentiable function of the inputs, which makes it convenient to calculate gradients during the learning process. The choice of k is not critical.
• Layer 4: Each node in this layer corresponds to a consequent label. For each of the w_r supplied to it, this node computes the corresponding output action as suggested by rule r. This mapping is written as \mu^{-1}(w_r), where the inverse is taken to mean a suitable defuzzification procedure applicable to an individual rule. For triangular membership functions,

\mu_Y^{-1}(w_r) = c + 0.5\,(s_R - s_L)(1 - w_r)   (2.10)

where Y = (c, s_L, s_R) indicates a consequent linguistic value.
• Layer 5: Each node in this layer is an output node that combines the recommendations from all the fuzzy control rules using the following weighted sum:

F = \frac{\sum_r w_r\, \mu_r^{-1}(w_r)}{\sum_r w_r}   (2.11)

In the ASN, adjustable weights are present only on the input links of layers 2 and 4. The other weights are fixed at unity.
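The five-layer pipeline of Eqs. 2.8-2.11 can be sketched as follows (illustrative Python; the function names and the example rule base are our assumptions, not the thesis code):

```python
import math

def triangular_mu(x, c, s_left, s_right):
    """Input membership function of Eq. 2.8, with V = (c, sL, sR)."""
    if c - s_left <= x <= c:
        return 1.0 - abs(x - c) / s_left
    if c < x <= c + s_right:
        return 1.0 - abs(x - c) / s_right
    return 0.0

def softmin(mus, k=10.0):
    """Rule firing strength of Eq. 2.9; approaches min(mus) as k grows."""
    weights = [math.exp(-k * mu) for mu in mus]
    return sum(mu * w for mu, w in zip(mus, weights)) / sum(weights)

def rule_output(w_r, c, s_left, s_right):
    """Local defuzzification of Eq. 2.10 for consequent Y = (c, sL, sR)."""
    return c + 0.5 * (s_right - s_left) * (1.0 - w_r)

def asn_output(rules):
    """Weighted combination of Eq. 2.11; rules = [(w_r, (c, sL, sR)), ...]."""
    num = sum(w * rule_output(w, *y) for w, y in rules)
    den = sum(w for w, _ in rules)
    return num / den

# Eq. 2.9 approaches the hard min for large k:
assert abs(softmin([0.2, 0.8, 0.5], k=100.0) - 0.2) < 1e-6
# Symmetric consequents (sL == sR) defuzzify to their centers (Eq. 2.10),
# so two fired rules reduce to a weighted mean of their centers:
assert abs(asn_output([(0.8, (1.0, 0.5, 0.5)), (0.2, (-1.0, 0.5, 0.5))]) - 0.6) < 1e-9
```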
Stochastic Action Modifier

In the GARIC architecture, the output of the ASN is not applied to the environment directly. The Stochastic Action Modifier (SAM) uses the value of r̂ from the previous time step and the action F recommended by the ASN to stochastically generate an action F', a Gaussian random variable with mean F and standard deviation σ(r̂(t-1)). Here σ(·) is some nonnegative, monotonically decreasing function, e.g. exp(-r̂). The action F' is what is actually applied to the plant. The stochastic perturbation of the suggested action leads to a better exploration of the state space and better generalization ability. When r̂(t-1) is low, meaning the last action performed was bad, the magnitude of the deviation |F' - F| is large, whereas the controller remains consistent with the fuzzy control rules when r̂(t-1) is high. The actual form of the function σ(·), especially its scale and rate of decrease, should take the units and range of variation of the output variable into account.

In GARIC, the goal of calculating F values in the ASN is to maximize the evaluation of the gait, v, determined by the AEN. The gradient information ∂v/∂p (p is the vector of all adjustable weights in the ASN) is estimated by stochastic exploration in the SAM. The modification implemented at t-1 by the SAM is judged by r̂(t). If r̂ > 0, meaning the modified F(t-1) performed better than expected, then F(t-1) is moved closer to the modified value, and vice versa.
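A sketch of the SAM's exploration mechanism, assuming the exp(-r̂) spread mentioned above (function names are ours):

```python
import math
import random

def sigma(r_hat_prev):
    """An assumed spread function: nonnegative and decreasing, e.g. exp(-r_hat)."""
    return math.exp(-r_hat_prev)

def sam(F, r_hat_prev, rng=random):
    """Draw the applied action F' ~ N(F, sigma(r_hat_prev))."""
    return rng.gauss(F, sigma(r_hat_prev))

# Good recent performance (high r_hat) keeps the action near F (exploitation);
# poor performance widens the search around F (exploration).
assert sigma(2.0) < sigma(0.0) < sigma(-1.0)
```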
2.3.3
Fuzzy Sets and Fuzzy Logic Controllers

Fuzzy sets, introduced by Zadeh in 1965 as a mathematical way to represent vagueness in linguistics, can be considered a generalization of classical set theory [47]. In a classical set, the membership of an element is crisp; it is either yes (in the set) or no (not in the set). A crisp set can be defined by the so-called characteristic function (or membership function). The characteristic function \mu_A(x) of a crisp set A is given as

\mu_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}
Fuzzy set theory extends this concept by defining partial memberships, which can take values ranging from 0 to 1:

\mu_A(x) : U \to [0, 1]

where U refers to the universal set defined in a specific problem. Fuzzy logic was one of the major developments of fuzzy set theory and was primarily designed to represent and reason with knowledge that cannot be expressed by quantitative measures. The main idea of algorithms based on fuzzy logic is to imitate the human reasoning process to control ill-defined or hard-to-model plants. Fuzzy inference systems model the qualitative aspects of human knowledge through linguistic if-then rules. Every rule has two parts: an antecedent part (premise), expressed by if..., which is a description of the state of the system, and a consequent part, expressed by then..., which is the action that the operator who controls the system must take. We can use fuzzy sets to represent linguistic variables. Linguistic variables represent the process states and control variables in a fuzzy controller. Their values are defined in linguistic terms and can be words or sentences in a natural or artificial language. The most important operators in classical set theory with crisp sets are complement, intersection, and union. These operations are defined in fuzzy logic via membership functions. The membership values in the complement set \bar{A} are

\mu_{\bar{A}}(x) = 1 - \mu_A(x)
which corresponds to the same operation in classical set theory. For the intersection of two fuzzy sets various operators have been proposed (min operator, algebraic product, bounded product, ...). The min operator for two fuzzy sets A and B is given as

\mu_A(x) \text{ AND } \mu_B(x) = \min\{\mu_A(x), \mu_B(x)\}

For the union of two fuzzy sets, there is a class of operators named t-conorms or s-norms. One of the most used in the literature is the max operator:

\mu_A(x) \text{ OR } \mu_B(x) = \max\{\mu_A(x), \mu_B(x)\}
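These pointwise operators can be stated directly in code (a trivial sketch; the function names are ours):

```python
def f_complement(mu_a):
    """Complement: mu_notA(x) = 1 - mu_A(x)."""
    return 1.0 - mu_a

def f_intersect(mu_a, mu_b):
    """Intersection via the min operator."""
    return min(mu_a, mu_b)

def f_union(mu_a, mu_b):
    """Union via the max operator (a t-conorm / s-norm)."""
    return max(mu_a, mu_b)

print(f_complement(0.25), f_intersect(0.25, 0.75), f_union(0.25, 0.75))
# 0.75 0.25 0.75
```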
Figure 2.8: General model of a fuzzy logic controller.
A typical architecture of an FLC comprises four principal components (Fig. 2.8): a fuzzifier, a fuzzy rule base, an inference engine, and a defuzzifier. The fuzzifier performs the fuzzification that converts the data from the sensor measurements into proper linguistic values of fuzzy sets through predefined input membership functions. In the fuzzy rule base, fuzzy control rules are characterized by a collection of fuzzy IF-THEN rules in which the preconditions and consequents involve linguistic variables. This collection of fuzzy control rules
(or fuzzy control statements) characterizes the simple input-output relation of the system. The inference engine matches the output of the fuzzifier with the fuzzy logic rules and performs fuzzy implication and approximate reasoning to decide a fuzzy control action. Finally, the defuzzifier performs the function of defuzzification to yield a nonfuzzy (crisp) control action from an inferred fuzzy control action through predefined output membership functions. The principal elements of designing an FLC include defining the input and output variables; deciding on the fuzzy partition of the input and output spaces and choosing the membership functions for the input and output linguistic variables; deciding on the types and derivation of the fuzzy control rules; designing the inference mechanism; and choosing a defuzzification operator [13].
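A toy end-to-end pass through the four FLC components, with two hypothetical rules of our own choosing (a didactic sketch, not a controller from the thesis):

```python
def tri(x, left, center, right):
    """A triangular input membership function (the fuzzifier's building block)."""
    if left < x <= center:
        return (x - left) / (center - left)
    if center < x < right:
        return (right - x) / (right - center)
    return 1.0 if x == center else 0.0

def flc(error):
    """Two hypothetical rules: IF error is Negative THEN -1; IF Positive THEN +1."""
    w_neg = tri(error, -2.0, -1.0, 0.0)      # fuzzification + rule matching
    w_pos = tri(error, 0.0, 1.0, 2.0)
    if w_neg + w_pos == 0.0:
        return 0.0                            # no rule fires
    # inference + weighted-average defuzzification -> crisp control action
    return (w_neg * -1.0 + w_pos * 1.0) / (w_neg + w_pos)

print(flc(-1.0), flc(1.0))  # -1.0 1.0
```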
2.3.4 Reinforcement Learning

Reinforcement learning is an approach to artificial intelligence that emphasizes learning by the individual from its interaction with its environment [13]. The environment supplies a time-varying vector of input to the system, receives its time-varying vector of output or action, and then provides a time-varying scalar reinforcement signal. Here, the reinforcement signal r(t) can take one of the following forms: a two-valued number r(t) ∈ {-1, 1} or {-1, 0}, such that r(t) = 1 (0) means "success" and r(t) = -1 means "failure"; a multi-valued discrete number in the range [-1, 1] or [-1, 0], for example r(t) ∈ {-1, -0.5, 0, 0.5, 1}; or a real number r(t) ∈ [-1, 1] or [-1, 0], which represents a more detailed and continuous degree of failure or success. We also assume that r(t) is the reinforcement signal available at time step t and is caused by the inputs and actions at time step (t-1), or even affected by earlier inputs and actions.

A challenging problem in reinforcement learning is that there may be a long time delay between a reinforcement signal and the actions that caused it. In such cases a temporal credit assignment problem results, because we need to assign credit or blame for an eventual success or failure to each step individually in a long sequence. An approach to solving this problem is based on temporal difference (TD) methods [41]. TD methods consist of a class of incremental learning procedures specialized for prediction problems. They assign credit based on the difference between temporally successive predictions; their main characteristic is that they do not need to wait until the actual outcome is known. The object of learning is to construct an action selection policy that optimizes the system's performance. A natural measure of performance is the discounted cumulative reinforcement (utility [38])

V_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k}   (2.12)
where V_t is the discounted cumulative reinforcement starting from time t throughout the future, r_t is the reinforcement received after the transition from time t-1 to t, and 0 ≤ γ ≤ 1 is a discount factor, which adjusts the importance of the long-term consequences of actions. In the approach to solving the temporal credit assignment problem, the aim is to learn an evaluation function that predicts the discounted cumulative reinforcement. The evaluation function (V_x^π) is the expected discounted cumulative reinforcement that will be received starting from state x, or simply the utility of state x. The evaluation function is represented using connectionist networks (the evaluation network, or critic) and learned using a combination of temporal difference methods and the error backpropagation algorithm. TD methods compute an error, called the TD error, between temporally successive predictions, and the backpropagation algorithm minimizes this error by modifying the weights of the networks. Let p_t be the output of the evaluation network, which denotes the estimate at time step t of the evaluation function V_x^π given the state x_t, and let r_t be the actual cost incurred between time steps t-1 and t. Then p_{t-1} predicts

\sum_{k=0}^{\infty} \gamma^k r_{t+k} = r_t + \gamma p_t   (2.13)

In this case the prediction error (TD error), which is the difference between the estimated evaluation and the actual evaluation, would be

(r_t + \gamma p_t) - p_{t-1}   (2.14)
This method is used for prediction problems in which exact success or failure may never become completely known.
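As an illustration, the prediction target of equation 2.13 and the TD error of equation 2.14 can be turned into a minimal TD(0) update rule. The sketch below is ours, not from the thesis; the learning rate and the toy three-state chain are illustrative assumptions.

```python
# Minimal TD(0) prediction sketch built from equations 2.13-2.14: the
# estimate for the previous state is nudged toward the bootstrapped target
# r + gamma * p(next). The chain a -> b -> c and all constants are made up.
GAMMA = 0.9   # discount factor, 0 <= gamma <= 1
ALPHA = 0.1   # learning rate (not named in the text; standard in TD methods)

def td_error(r, p_next, p_prev, gamma=GAMMA):
    """TD error of eq. 2.14: (r_t + gamma * p_t) - p_{t-1}."""
    return (r + gamma * p_next) - p_prev

def td_update(values, x_prev, x_next, r, alpha=ALPHA, gamma=GAMMA):
    """Move the prediction for the previous state toward the target."""
    delta = td_error(r, values[x_next], values[x_prev], gamma)
    values[x_prev] += alpha * delta
    return values

# A 3-state chain a -> b -> c with reinforcement 1.0 on the final transition.
values = {"a": 0.0, "b": 0.0, "c": 0.0}
for _ in range(200):
    td_update(values, "a", "b", 0.0)
    td_update(values, "b", "c", 1.0)

# values["b"] approaches 1.0 and values["a"] approaches gamma * 1.0 = 0.9,
# i.e. the discounted cumulative reinforcement of eq. 2.12.
```

Note that no waiting for the final outcome is needed: each update uses only two temporally successive predictions, which is the defining property of TD methods.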
CHAPTER 3 LEGGED ROBOT
3.1 Dynamics and Coordinated Control of Legged Robots

The dynamics of a robotic system play a central role in both its control and simulation. When studying the control of robots, the primary problem that must be solved is known as Inverse Dynamics. Solution of the inverse dynamics problem requires the calculation of the actuator torques and/or forces that will produce a prescribed motion in the robotic system. In the area of simulation, on the other hand, the fundamental problem to be solved is called Forward or Direct Dynamics. Solution of this problem requires the determination of the joint motion that results from a given set of applied joint torques and/or forces. The overall mechanism of a legged robot is a closed chain comprising a body with supporting legs. The kinematic relations between the leg joint motion and the body motion are complicated. The additional complexity arises because the chains (legs) of the system are coupled through the body. In the approach presented here the resemblance between the control of legged robots and the manipulation of objects by multi-fingered robot hands is considered. The dynamics and control of grasping are developed in various prior works [48], [51]. We adapt these concepts here to legged robots. The basics of the mathematical background given in this section can be found in [42], [44], [50]. Note that these analyses are valid for legged robots using static balance, where the body is continuously supported by at least three legs constituting a support polygon. The dynamics and control algorithm presented here must be considered within a complete control system for a legged robot including navigation, terrain adaptation, etc. Because these concepts are out of the scope of this thesis, we only give the algorithm; simulations for the gait synthesizer will be implemented with the simpler model described in chapter 4.
3.1.1 Motion Dynamics of Legged Robots

We first derive equations concerning moving coordinate frames. Let C_1 and C_2 be two coordinate frames. We denote by $p_{12} \in \mathbb{R}^3$ and $R_{12} \in SO(3)$ (a 3 × 3 orthogonal matrix, $R^{-1} = R^T$) the position and orientation of C_2 relative to C_1. Besides, we denote by $v_{12} = \dot{p}_{12}$ and $w_{12} = S^{-1}(\dot{R}_{12} R_{12}^T)$ (or $\dot{R}_{12} = S(w_{12}) R_{12}$) the translational and rotational velocity of C_2 relative to C_1, where S is the operator defined by

\[
w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}, \qquad
S(w) = \begin{bmatrix} 0 & -w_3 & w_2 \\ w_3 & 0 & -w_1 \\ -w_2 & w_1 & 0 \end{bmatrix}
\]

which clearly satisfies

\[
S(w) f = w \times f \qquad \text{and} \qquad A\, S(w)\, A^T = S(A w)
\]

for all $A \in SO(3)$ and $w, f \in \mathbb{R}^3$.
Now consider three coordinate frames C_1, C_2, and C_3. The position and orientation of C_3 relative to C_1 are given by [50]

\[
p_{13} = p_{12} + R_{12} p_{23}
\tag{3.1}
\]

\[
R_{13} = R_{12} R_{23}
\tag{3.2}
\]
Then the translational velocity of C_3 relative to C_1 is obtained by

\[
v_{13} = \dot{p}_{13} = \dot{p}_{12} + \dot{R}_{12} p_{23} + R_{12} \dot{p}_{23}
\tag{3.3}
\]

which is

\[
v_{13} = v_{12} - S(R_{12} p_{23})\, w_{12} + R_{12} v_{23}
\tag{3.4}
\]

To see this, we observe that

\[
\begin{aligned}
\dot{R}_{12} p_{23} &= S(w_{12}) R_{12} p_{23} \\
&= (R_{12} R_{12}^T)\, S(w_{12}) R_{12} p_{23} \\
&= R_{12}\, S(R_{12}^T w_{12})\, p_{23} \\
&= R_{12} \left( (R_{12}^T w_{12}) \times p_{23} \right) \\
&= R_{12} \left( (-p_{23}) \times (R_{12}^T w_{12}) \right) \\
&= -R_{12}\, S(p_{23})\, R_{12}^T w_{12} \\
&= -S(R_{12} p_{23})\, w_{12}
\end{aligned}
\]

By differentiating both sides of equation 3.2, we also obtain the rotational velocity of C_3 relative to C_1:

\[
\dot{R}_{13} = \dot{R}_{12} R_{23} + R_{12} \dot{R}_{23}
\tag{3.5}
\]

\[
S(w_{13}) R_{13} = S(w_{12}) R_{12} R_{23} + R_{12} S(w_{23}) R_{23}
\tag{3.6}
\]

\[
S(w_{13}) R_{13} = S(w_{12}) R_{13} + S(R_{12} w_{23}) R_{13}
\tag{3.7}
\]

\[
w_{13} = w_{12} + R_{12} w_{23}
\tag{3.8}
\]

by the transformation

\[
R_{12} S(w_{23}) R_{23} = R_{12} S(w_{23}) (R_{12}^T R_{12}) R_{23} = S(R_{12} w_{23}) R_{13}
\]

Then the generalized velocity of C_3 relative to C_1 is given in matrix form by

\[
\begin{bmatrix} v_{13} \\ w_{13} \end{bmatrix}
= \begin{bmatrix} I & -S(R_{12} p_{23}) \\ 0 & I \end{bmatrix}
\begin{bmatrix} v_{12} \\ w_{12} \end{bmatrix}
+ \begin{bmatrix} R_{12} & 0 \\ 0 & R_{12} \end{bmatrix}
\begin{bmatrix} v_{23} \\ w_{23} \end{bmatrix}
\tag{3.9}
\]
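The operator S and the composition rule of equation 3.9 can be checked numerically. The following sketch is ours, not from the thesis; the frames, velocities, and helper-function names are arbitrary illustrative choices, written with plain Python lists to stay self-contained.

```python
# Numerical sketch of the skew operator S(w) and the generalized-velocity
# composition of eq. 3.9. All numeric values are illustrative.
import math

def S(w):
    """Skew-symmetric matrix satisfying S(w) f = w x f."""
    w1, w2, w3 = w
    return [[0.0, -w3, w2], [w3, 0.0, -w1], [-w2, w1, 0.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def compose(v12, w12, R12, p23, v23, w23):
    """Eq. 3.9: v13 = v12 - S(R12 p23) w12 + R12 v23,  w13 = w12 + R12 w23."""
    Rp = matvec(R12, p23)
    v13 = add(add(v12, [-x for x in matvec(S(Rp), w12)]), matvec(R12, v23))
    w13 = add(w12, matvec(R12, w23))
    return v13, w13

w, f = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
assert matvec(S(w), f) == cross(w, f)   # S(w) f = w x f

# C2 rotated 90 degrees about z relative to C1, translating with unit speed
# along x while rotating about z; C3 rigidly attached to C2 at offset p23.
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
R12 = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
v13, w13 = compose([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], R12,
                   [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])
```

In this example the offset point's rotational sweep cancels the translation, so v13 is (numerically) zero while w13 remains the common angular velocity.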
Figure 3.1: Coordinate frames defined for the legged robot. The coordinate frame C_ci is assigned such that the unit vector ẑ is normal to the contact surface at the point of contact.
In Fig. 3.1 the coordinate frames C_w, C_B, C_bi, C_ti, and C_ci denote, respectively, the inertial base frame, the body coordinate frame attached to the center of mass of the body, the leg base frame of leg i, the leg tip frame of leg i, and the local frame at the contact point of leg i. For the relations among these coordinate frames we know that p_tc = 0, and that C_c and C_b are fixed with respect to C_w and C_B, respectively (v_wc = w_wc = v_Bb = w_Bb = 0). Besides, according to equation 3.9 the following relations exist:
\[
\begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
= \begin{bmatrix} I & -S(R_{bt} p_{tc}) \\ 0 & I \end{bmatrix}
\begin{bmatrix} v_{bt} \\ w_{bt} \end{bmatrix}
+ \begin{bmatrix} R_{bt} & 0 \\ 0 & R_{bt} \end{bmatrix}
\begin{bmatrix} v_{tc} \\ w_{tc} \end{bmatrix}
= \begin{bmatrix} v_{bt} \\ w_{bt} \end{bmatrix}
+ \begin{bmatrix} R_{bt} & 0 \\ 0 & R_{bt} \end{bmatrix}
\begin{bmatrix} v_{tc} \\ w_{tc} \end{bmatrix}
\tag{3.10}
\]

\[
\begin{bmatrix} v_{Bc} \\ w_{Bc} \end{bmatrix}
= \begin{bmatrix} R_{Bb} & 0 \\ 0 & R_{Bb} \end{bmatrix}
\begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
\tag{3.11}
\]

\[
\begin{bmatrix} v_{wc} \\ w_{wc} \end{bmatrix}
= \begin{bmatrix} I & -S(R_{wB} p_{Bc}) \\ 0 & I \end{bmatrix}
\begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
+ \begin{bmatrix} R_{wB} & 0 \\ 0 & R_{wB} \end{bmatrix}
\begin{bmatrix} v_{Bc} \\ w_{Bc} \end{bmatrix}
= 0
\tag{3.12}
\]

Combining equations 3.11 and 3.12 and solving for the contact velocity, with $R_{wb} = R_{wB} R_{Bb}$:

\[
\begin{bmatrix} R_{wb} & 0 \\ 0 & R_{wb} \end{bmatrix}
\begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
= - \begin{bmatrix} I & -S(R_{wB} p_{Bc}) \\ 0 & I \end{bmatrix}
\begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
\tag{3.13}
\]

\[
\begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
= - \begin{bmatrix} R_{wb}^T & -R_{wb}^T S(R_{wB} p_{Bc}) \\ 0 & R_{wb}^T \end{bmatrix}
\begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
\tag{3.14}
\]

\[
\begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
= -T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
\tag{3.15}
\]

Moreover, the velocity of the leg tip frame, C_t, is related to the velocity of the leg joints, $\dot{q}$, by the leg Jacobian,

\[
\begin{bmatrix} v_{bt} \\ w_{bt} \end{bmatrix}
= J(q)\, \dot{q}
\tag{3.16}
\]
In this analysis, we consider the following contact models for the leg tip-terrain interactions: a) a point contact without friction, b) a point contact with friction, c) a soft contact, d) a rigid contact. These contact models give rise to contact constraints specified by
• v_zi = 0, for a point contact without friction.
• v_xi = v_yi = v_zi = 0, for a point contact with friction.
• v_xi = v_yi = v_zi = 0 and w_zi = 0, for a soft contact.
• v_xi = v_yi = v_zi = 0 and w_xi = w_yi = w_zi = 0, for a rigid contact.

For each of the contact models, substituting the above contact constraints and equation 3.16 into equation 3.10 we have
\[
B^T \begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
= B^T J(q)\, \dot{q}
\tag{3.17}
\]

where $B^T$ is the basis matrix defined in [49] representing the contact constraints of the model. For example, for a point contact with friction

\[
B^T = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}
\tag{3.18}
\]

Substituting equation 3.17 into equation 3.15 we have

\[
-B^T T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
= B^T J(q)\, \dot{q}
\tag{3.19}
\]

\[
-G^T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
= J_{leg}(q)\, \dot{q}
\tag{3.20}
\]

with $G^T = B^T T$ and $J_{leg} = B^T J$. Dual to the generalized velocity, a generalized force (or wrench) can be written as

\[
F_{13} = \begin{bmatrix} f_{13} \\ \tau_{13} \end{bmatrix}
\tag{3.21}
\]
where $\tau_{13} \in \mathbb{R}^3$ and $f_{13} \in \mathbb{R}^3$ are the torque and the linear force about the origin of C_3 relative to coordinate frame C_1, respectively. A generalized force can be defined by examining the work produced by a virtual displacement. A virtual displacement is an instantaneous infinitesimal displacement du. The work produced by a virtual displacement, the virtual work, is denoted by δW, where δW = F · du. We use the principle of virtual work to find generalized force relations. The work performed, which has units of energy, must be the same regardless of the coordinate system within which it is measured or expressed [45]. The virtual work done by an infinitesimal displacement of the body with respect to C_w is

\[
\delta W = \begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix} \cdot \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
= \begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix}^T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
\]

where we have represented the dot product in the virtual work equation using the transpose operation. Alternatively, the virtual work done by the corresponding infinitesimal displacement of C_c with respect to C_b is

\[
\delta W = \begin{bmatrix} f_{bc} \\ \tau_{bc} \end{bmatrix} \cdot \begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
= \begin{bmatrix} f_{bc} \\ \tau_{bc} \end{bmatrix}^T \begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
\]

By the principle of virtual work, these two formulations of the work performed are equal:

\[
\begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix}^T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
= \begin{bmatrix} f_{bc} \\ \tau_{bc} \end{bmatrix}^T \begin{bmatrix} v_{bc} \\ w_{bc} \end{bmatrix}
\tag{3.22}
\]

and substituting equation 3.15 into 3.22 we have

\[
\begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix}^T
= \begin{bmatrix} f_{bc} \\ \tau_{bc} \end{bmatrix}^T (-T)
\tag{3.23}
\]
\[
\begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix}
= (-T)^T \begin{bmatrix} f_{bc} \\ \tau_{bc} \end{bmatrix}
\tag{3.24}
\]

For a given contact model, let n_i denote the total number of independent contact wrenches that leg i can apply to the terrain. For example, n_i = 1 for a point contact without friction (i.e., a force in the normal direction), and n_i = 3 for a point contact with friction (i.e., a force in the normal direction plus two components of frictional forces). Note that n_i is just the number of contact constraints corresponding to the contact model. According to equation 3.24 the resulting generalized force from the applied contact force of leg i can be expressed as
\[
\begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix} = -T^T B\, x_i
\tag{3.25}
\]

\[
\begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix} = -G\, x_i
\tag{3.26}
\]

where $x_i \in \mathbb{R}^{n_i}$ is the magnitude vector of the applied (generalized) contact forces along the basis directions of B. Equations 3.20 and 3.26 provide valid relations if the leg remains in contact with the surface and there is no slipping. A common way to guarantee no slipping is to ensure that the contact forces lie within the friction cone at the point of contact, that is, the tangential component of the contact force is less than or equal to the coefficient of friction µ times the normal component of the contact force. Finally, for n supporting legs (i = 1, …, n) we define

\[
Q = \begin{bmatrix} q_1 \\ q_2 \\ \vdots \\ q_n \end{bmatrix}, \qquad
F_T = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

\[
J_T = \begin{bmatrix} J_{leg1} & & \\ & \ddots & \\ & & J_{legn} \end{bmatrix}, \qquad
G_T = \begin{bmatrix} G_1 & G_2 & \cdots & G_n \end{bmatrix}
\]

Then equations 3.20 and 3.26 can be concatenated for i = 1, …, n to give

\[
J_T(Q)\, \dot{Q} = -G_T^T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
\tag{3.27}
\]

\[
\begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix} = -G_T F_T
\tag{3.28}
\]

We have thus derived the force, torque, and velocity relations from legs to leg tips and from leg tips to body.
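As a sketch of how the contact models translate into constraint bases, the selection-matrix form of B^T (cf. equation 3.18) can be generated from the constraint lists of section 3.1.1. The dictionary keys and function names below are our own illustrative choices, not notation from the thesis.

```python
# Sketch of the selection (basis) matrices B^T implied by the four contact
# models: each row of B^T picks one constrained component of the generalized
# velocity, ordered here as (vx, vy, vz, wx, wy, wz).
COMPONENTS = ["vx", "vy", "vz", "wx", "wy", "wz"]

CONSTRAINED = {
    "point_without_friction": ["vz"],
    "point_with_friction":    ["vx", "vy", "vz"],
    "soft":                   ["vx", "vy", "vz", "wz"],
    "rigid":                  ["vx", "vy", "vz", "wx", "wy", "wz"],
}

def basis_T(model):
    """Rows of B^T: a 1 in the column of each constrained component."""
    rows = []
    for comp in CONSTRAINED[model]:
        row = [0] * 6
        row[COMPONENTS.index(comp)] = 1
        rows.append(row)
    return rows

# n_i, the number of independent contact wrenches of leg i, equals the
# number of contact constraints, i.e. the number of rows of B^T.
assert len(basis_T("point_without_friction")) == 1
assert len(basis_T("point_with_friction")) == 3
```

For a point contact with friction this reproduces the 3 × 6 matrix of equation 3.18.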
3.1.2 Coordinated Control of Legged Robots

In this section, we develop the control algorithm for the coordinated control of the robot legs. The goal of the control scheme is to specify a set of
control inputs for the leg motors so that the body undergoes a desired motion. The control scheme we develop in this section is based on the computed torque methodology. By differentiating equation 3.27 we have
\[
J_T(Q)\, \ddot{Q} + \dot{J}_T(Q)\, \dot{Q}
= -\dot{G}_T^T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
- G_T^T \begin{bmatrix} \dot{v}_{wB} \\ \dot{w}_{wB} \end{bmatrix}
\tag{3.29}
\]

\[
\ddot{Q} = J_T^{+}(Q) \left(
-\dot{G}_T^T \begin{bmatrix} v_{wB} \\ w_{wB} \end{bmatrix}
- G_T^T \begin{bmatrix} \dot{v}_{wB} \\ \dot{w}_{wB} \end{bmatrix}
- \dot{J}_T(Q)\, \dot{Q} \right) + \ddot{Q}_o
\tag{3.30}
\]

Here $J_T^{+}(Q)$ is the pseudoinverse satisfying $J^{+} = J^T (J J^T)^{-1}$, and $\ddot{Q}_o \in N(J_T)$
is the internal motion of the redundant joints not affecting the body motion. The dynamics of the body expressed in the inertial base frame C_w are given by the Newton-Euler equation as [51]

\[
\begin{bmatrix} m_B I & 0 \\ 0 & I_w \end{bmatrix}
\begin{bmatrix} \dot{v}_{wB} \\ \dot{w}_{wB} \end{bmatrix}
+ \begin{bmatrix} 0 \\ w_{wB} \times I_w w_{wB} \end{bmatrix}
= \begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix}
\tag{3.31}
\]

Here

\[
m_B I = \begin{bmatrix} m_B & 0 & 0 \\ 0 & m_B & 0 \\ 0 & 0 & m_B \end{bmatrix}
\]

where m_B is the body mass, $I_w = R_{wB} I_o R_{wB}^T$ is the body inertia matrix expressed in C_w, and I_o is the body inertia matrix expressed in C_B. Also, from equation 3.28 we have
\[
F_T = -G_T^{+} \begin{bmatrix} f_{wB} \\ \tau_{wB} \end{bmatrix} + F_{To}
\tag{3.32}
\]

where $G_T^{+}$ is the pseudoinverse of $G_T$ and $F_{To}$ is the internal leg force not affecting the body motion. Combining equation 3.31 and equation 3.32 yields

\[
F_T = -G_T^{+} \left(
\begin{bmatrix} m_B I & 0 \\ 0 & I_w \end{bmatrix}
\begin{bmatrix} \dot{v}_{wB} \\ \dot{w}_{wB} \end{bmatrix}
+ \begin{bmatrix} 0 \\ w_{wB} \times I_w w_{wB} \end{bmatrix}
\right) + F_{To}
\tag{3.33}
\]
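The right pseudoinverse $J^{+} = J^T (J J^T)^{-1}$ appearing in equations 3.30 and 3.32 can be sketched for a small full-row-rank example; the numeric Jacobian and helper names below are arbitrary illustrative choices.

```python
# Sketch of the right pseudoinverse J+ = J^T (J J^T)^(-1) for a 2x3 matrix
# of full row rank, using plain lists; the numeric J is an arbitrary example.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_right(J):
    """J+ = J^T (J J^T)^(-1); valid when J has full row rank (2 rows here)."""
    Jt = transpose(J)
    return matmul(Jt, inv2(matmul(J, Jt)))

J = [[1.0, 0.0, 1.0],
     [0.0, 2.0, 0.0]]
Jp = pinv_right(J)
JJp = matmul(J, Jp)   # J J+ should be the 2x2 identity
```

Because J J+ = I, any null-space component (the internal motion Q̈_o or the internal force F_To) added on top of the pseudoinverse solution leaves the body-level equation unchanged.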
In order to specify an orientation trajectory in terms of the rotation matrix RwB (t) we parameterize SO(3) so that RwB = RwB (Υ) where Υ ∈ R3 is taken as yaw α(t), pitch β (t), and roll γ (t) coordinates of the body. Given this parametrization, there exists a linear transformation p(Υ) such that [42]:
\[
w_{wB} = \begin{bmatrix} w_x \\ w_y \\ w_z \end{bmatrix}
= \begin{bmatrix}
c\gamma\, c\beta & -s\gamma & 0 \\
s\gamma\, c\beta & c\gamma & 0 \\
-s\beta & 0 & 1
\end{bmatrix}
\begin{bmatrix} \dot{\alpha} \\ \dot{\beta} \\ \dot{\gamma} \end{bmatrix}
= p(\Upsilon)\, \dot{\Upsilon}
\tag{3.34}
\]

where

\[
\Upsilon = \begin{bmatrix} \alpha \\ \beta \\ \gamma \end{bmatrix}
\]
So the acceleration of the body is given as
\[
\begin{bmatrix} \dot{v}_{wB} \\ \dot{w}_{wB} \end{bmatrix}
= P \begin{bmatrix} \ddot{p}_{wB} \\ \ddot{\Upsilon}_{wB} \end{bmatrix}
+ \dot{P} \begin{bmatrix} \dot{p}_{wB} \\ \dot{\Upsilon}_{wB} \end{bmatrix}
\tag{3.35}
\]

where

\[
P = \begin{bmatrix} I & 0 \\ 0 & p(\Upsilon) \end{bmatrix}
\]
We define the position error $e_p \in \mathbb{R}^6$ of the body along a given desired trajectory as

\[
e_p = \begin{bmatrix} p_{wB}^d - p_{wB} \\ \Upsilon_{wB}^d - \Upsilon_{wB} \end{bmatrix}
\]

where $(p_{wB}^d(t), \Upsilon_{wB}^d(t))$ is the desired body trajectory.
In order to reduce the position error, we apply joint torques at the legs to make the acceleration of the body satisfy the equation

\[
\begin{bmatrix} \dot{v}_{wB} \\ \dot{w}_{wB} \end{bmatrix}
= P \left(
\begin{bmatrix} \ddot{p}_{wB}^d \\ \ddot{\Upsilon}_{wB}^d \end{bmatrix}
+ k_v \dot{e}_p + k_p e_p \right)
+ \dot{P} \begin{bmatrix} \dot{p}_{wB} \\ \dot{\Upsilon}_{wB} \end{bmatrix}
\tag{3.36}
\]
where $k_v$ and $k_p$ are scalars chosen such that the characteristic roots of $\ddot{e}_p + k_v \dot{e}_p + k_p e_p = 0$ have negative real parts. The dynamics of the ith leg manipulator with l links are given by

\[
H_i(q)\, \ddot{q} + C_i(q, \dot{q})\, \dot{q} = \tau_i - J^T(q)\, B\, x_i
\]

where

τ = l × 1 vector of joint torques,
q, q̇, q̈ = l × 1 vectors of joint positions, velocities, and accelerations,
H(q) = l × l joint space inertia matrix, symmetric and positive definite,
C(q, q̇) = l × l matrix of Coriolis and centripetal force terms.

We define

\[
H = \begin{bmatrix} H_1 & & \\ & \ddots & \\ & & H_n \end{bmatrix}, \qquad
C = \begin{bmatrix} C_1 & & \\ & \ddots & \\ & & C_n \end{bmatrix}, \qquad
\tau = \begin{bmatrix} \tau_1 \\ \tau_2 \\ \vdots \\ \tau_n \end{bmatrix}
\]

Then the leg dynamics can be grouped for i = 1, …, n to yield

\[
H(Q)\, \ddot{Q} + C(Q, \dot{Q})\, \dot{Q} = \tau - J_T^T(Q)\, F_T
\tag{3.37}
\]
Thus the resultant control law is specified by substituting equations 3.30, 3.33, and 3.36 into 3.37:

\[
\tau = D P \left(
\begin{bmatrix} \ddot{p}_{wB}^d \\ \ddot{\Upsilon}_{wB}^d \end{bmatrix}
+ k_v \dot{e}_p + k_p e_p \right)
+ D \dot{P} \begin{bmatrix} \dot{p}_{wB} \\ \dot{\Upsilon}_{wB} \end{bmatrix}
+ E
\tag{3.38}
\]

where

\[
D = -H J_T^{+} G_T^T - J_T^T G_T^{+}
\begin{bmatrix} m_B I & 0 \\ 0 & I_w \end{bmatrix}
\]

and

\[
E = -H J_T^{+} \dot{G}_T^T P
\begin{bmatrix} \dot{p}_{wB} \\ \dot{\Upsilon}_{wB} \end{bmatrix}
- H J_T^{+} \dot{J}_T \dot{Q} + C \dot{Q}
- J_T^T G_T^{+}
\begin{bmatrix} 0 \\ w_{wB} \times I_w w_{wB} \end{bmatrix}
\]

All the terms in equation 3.38 are functions of the state variables Q, Q̇, p_wB, Υ_wB, v_wB, and Υ̇_wB.
3.2 Gait Controller

Our gait controller is based on a gait synthesizer adapted to our objective from the Generalized Approximate Reasoning-based Intelligent Control (GARIC) architecture [12]. GARIC presents a method for learning and tuning fuzzy logic controllers (FLC) through reinforcement signals. The gait synthesizer (Fig. 3.2) consists of three modules. The Gait Evaluation Module (GEM) acts as a critic and provides advice to the main controller, based on a multilayer artificial neural network. The Gait Selection Module (GSM) decides on a new gait to be undertaken by the robot according to an ANN representation of a fuzzy controller with as many hidden units as there are rules in the knowledge base. The Gait Modifier Module (GMM) changes the gait recommended by the GSM based on internal reinforcement. This change in the recommended gait is more significant for a state if that state does not receive high internal reinforcements (i.e., the probability of failure is high). On the other hand, if a state receives high reinforcements, the GMM administers small changes to the action selected by the fuzzy controller embedded in the GSM. This reveals that the action is performing well, so the GMM recommendation dictates no or only minor changes to that gait. The actions of the gait synthesizer are the gaits recommending an operation mode (defined in section 3.3.1) for each leg.
3.2.1 Encoding the Gaits for a Multilegged Robot

Our gait synthesizer works on gait patterns that need to be coded. Gait patterns are patterns of leg coordination which represent the relative phases
Figure 3.2: Architecture of Gait Synthesizer.
(swing phase or stance phase) of the legs. For legged robots using static balance, the typical feature of these gait patterns is that in any phase of the pattern the robot ensures static stability. In the gait synthesizer we work on wave gait patterns, which are observed in insect walking. As stated in chapter 2, these gaits consist of metachronal waves on both sides of the robot and differ from each other by the amount of overlap, so different wave gait patterns can be derived by changing this amount. Among the numerous gait patterns we choose the ones including groups of legs which are in phase. For instance, the tripod gait, which is special among these patterns (an alternation between right-sided and left-sided metachronal waves), naturally has two groups of three legs in phase. In the encoding of the gaits our goal is to find a modelling method for
Figure 3.3: Summary of terminology used in gait analysis.
all gait patterns from which a leg task can be obtained. In other words, for a given state (which at least includes the phase, position, and velocity of each leg for the proprioceptive level of control) we want to find both the gait pattern to which the current state belongs and the phase of that pattern in which it lies. We make use of the position information of the legs to recognize the gait patterns. In the encoding process we divide the stroke distance (Fig. 3.3) of a leg into overlapping grids for both swing and stance phases, as in Fig. 3.5. Here the linguistic values {A, B, …, L, M} are "author-defined" fuzzy partitions of the stroke distance with triangular membership functions.
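A triangular membership function of the kind used for these partitions, parameterized by a center and left/right spreads as in the GSM description, can be sketched as follows. The partition values below are made up for illustration and are not the thesis' actual labels.

```python
# Sketch of a triangular fuzzy membership function with center c, left
# spread sL, and right spread sR, on a normalized stroke distance [0, 1].
def tri_membership(x, c, sL, sR):
    """Degree of membership of position x in a triangular fuzzy label."""
    if c - sL < x < c:
        return (x - (c - sL)) / sL
    if x == c:
        return 1.0
    if c < x < c + sR:
        return ((c + sR) - x) / sR
    return 0.0

# Two overlapping illustrative labels:
A = dict(c=0.2, sL=0.2, sR=0.2)
B = dict(c=0.4, sL=0.2, sR=0.2)

# A leg position between the two centers belongs partially to both labels,
# which is what makes the overlapping-grid encoding fuzzy.
x = 0.3
mu_A, mu_B = tri_membership(x, **A), tri_membership(x, **B)
```

With this overlap, a single leg position generally activates two adjacent labels, so several gait rules can fire simultaneously with different strengths.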
The tripod gait of Fig. 3.4E can now be coded with the sequence: (F,A,F,A,F,A) → (G,B,G,B,G,B) → … → (E,J,E,J,E,J) → (F,A,F,A,F,A) → …, and the gait pattern in Fig. 3.4D with the sequence: {(M,E,C,E,C,M) or (A,K,C,K,C,A)} → (L,D,B,D,B,L) → (B,L,D,L,D,B) → … → (D,B,L,B,L,D) → {(E,C,M,C,M,E) or (K,C,A,C,A,K)} → (L,D,B,D,B,L) → …

Figure 3.4: Wave gait patterns. Bold lines represent swing phase. L1 signifies the left front leg and R3 indicates the right hind leg [7].

In all gait-sequence encoding, the fraction of cycle periods for stance and swing must be incorporated in the model. As in Fig. 3.4, in the tripod gait a stance phase is half of a whole leg cycle, whereas in the tetrapod gaits it is two thirds of the leg cycle (the so-called duty factor described in chapter 2). Leg sequences defining gait patterns also have to be modelled by leg cycles. For a portion of a cycle, a leg is either in stance, in swing, or in transition (end of swing or end of stance). Thus we construct rules such as: if leg R3 is in E, R2 in C, R1 in M, L3 in C, L2 in M, and L1 in E, then R3 is in transition, R2 in stance, R1 in transition, L3 in stance, L2 in transition, and L1 in transition. Here being in A, for example, means that the leg is in stance in the current state and has a partial belonging to the fuzzy linguistic value A, whereas the consequent (or "then" part) of the rule prescribes the legs' next "state". With the given partitioning, 10 rules cover a tripod gait pattern and 9 cover a tetrapod gait pattern. The significance of this fuzzy modelling is that
Figure 3.5: Antecedent Labels, fuzzification of individual leg position.
individual leg phases are found from a gait pattern cycle which is determined from the relative positions of the legs. For uneven terrain conditions, we define four "operation modes" of a leg: 1. First mode, labelled -2: The leg is responsible for supporting the body. 2. Second mode, labelled -1: The leg switches to the third mode provided that the legs in the first mode alone provide static stability; otherwise the leg participates in the supporting legs. These legs are candidates for the swing phase among the stance legs.
3. Third mode, labelled 2: The leg is responsible for full recovery, such that if it encounters an obstacle it will try to handle it. 4. Fourth mode, labelled 1: The leg tip descends until it touches the terrain, and then switches to the first mode. In both modes 2 and 1 (modes will be referred to by their labels from now on), the leg continues its recovery as long as it is within the limits of its operation space. These four modes constitute the leg states, from the control point of view, that we need to distinguish for a leg within the cooperative action of walking. At the Anterior Extreme Point (AEP), mode 2 automatically switches to mode 1. Furthermore, the binary data from the static stability check for mode -2 legs and from tip contact (a protracting leg switches to retraction when it finds a foothold on which it can safely support the body) clearly determine the switching from mode -1 to mode 2 and from mode 1 to mode -2, respectively. Besides the leg/leg coordination, leg/body coordination is required for a regular gait. The movement of each leg can be characterized by a position p ∈ R and a velocity ṗ ∈ {v_stance (v_st), v_swing (v_sw)} according to the direction of body motion in leg-centered coordinates. When a leg is in protraction, it is lifted from the ground and swings forward relative to the body with a constant velocity v_sw > 0. When a leg is in retraction, it is on the ground, providing support and swinging backward relative to the body with a velocity v_st < 0 (for straight line walking this velocity is equal to minus the body velocity with respect to the ground, v_B). As in many walking animals, v_sw is relatively constant while v_st varies according to the walking speed. In other words, considering Fig. 3.4, the body (or retraction) velocity is a fraction of the protraction velocity, and the fraction is directly proportional to the number of support legs over the number of swing legs. For instance, in the tripod gait this ratio is one and the velocities are equal. So in our controller the body velocity for a time step is taken as

\[
v_B(t) = \frac{v_{st}(t-1) \cdot \Delta t}{nost}
\tag{3.39}
\]

where nost is the number of stance legs. There are two parameters to be considered concerning velocity: the static stability margin and the kinematic margin of the stance legs (Fig. 3.3). The minimum of these margins (let us call it the critical margin, Cm) determines the distance that the robot can travel without violating a physical constraint. So, additionally, v_B is set to zero when Cm is zero in speed control.
Figure 3.6: Consequent Labels: task share based on operation modes.
For the gait synthesizer, a gait is the "task sharing" of the legs in accomplishing a coordinated body movement. For instance, if a leg is at the AEP, it is clear that it can only be used for stance (no share for swing), such that if it is presently in the swing phase it must make a transition to stance. However, in uneven terrain conditions, where there is no fixed leg cycle for the individual legs, it is difficult to assign a leg share deterministically within the limits.

In our controller, we introduce a linguistic variable, the task share M_leg(t), taking the linguistic values {Stance (St), Swing (Sw), Transition (Tr)} with the triangular membership functions shown in Fig. 3.6. The values (−2, −1, 0, 1, 2) are chosen according to the labels of the operation modes, which consider the cyclic behavior of the legs. By changing the overlapping areas and the phase difference of the left- and right-sided metachronal waves we form 9 tetrapod gaits. According to the method mentioned above, we construct 91 (9 × 9 + 10) rules for all gaits belonging to the wave gait class. With the membership functions in Figs. 3.5 and 3.6 we constitute the fuzzy rules for the rule base of the GSM of the gait synthesizer, where triggered rules recommend a value for the task share of each leg.
3.2.2 Gait Selection Module (GSM)

The GSM determines the recommended task share for each leg, M_leg(t), in a fuzzy decision process where the inferencing is done based on the fuzzy rule base. The M_leg values define a measure to distinguish the two switching points, between modes -2 and -1 and between modes 2 and 1, during walking. Two thresholds, T_{2,1} and T_{−2,−1}, determine the mode of the legs. For the legs in stance, legs with M_leg(t) < T_{−2,−1} are determined as mode -2 legs and legs with M_leg(t) > T_{−2,−1} as mode -1. Likewise, for the legs in swing, legs with M_leg(t) > T_{2,1} are determined as mode 2 legs and those with M_leg(t) < T_{2,1} as mode 1. The effect of these threshold values on the decision process is analyzed in simulation. As shown in Fig. 3.7, the GSM is a fuzzy logic controller represented as a five-layer feedforward network, with each layer performing one stage of the fuzzy inference process. The GSM takes the current legs' positions and phases (swing
Figure 3.7: Gait Selection Module
or stance) as input. The nodes in the second layer correspond individually to the possible values of each linguistic variable of the inputs (Fig. 3.5), with triangular membership functions µ_V(x), where the input linguistic value V = (c, s_L, s_R) is represented by c, s_L, s_R corresponding respectively to the center, left spread, and right spread of the triangular membership function µ_V. Each node in this layer feeds the rules using that linguistic value in their antecedent ("if") parts. The conjunction of all the antecedent conditions in a rule is calculated in the third layer. The output of this layer is the firing strength of the rules, which is calculated by the softmin operation described in section 2.5.2. The nodes in the fourth layer correspond to the consequent labels (Fig. 3.6). Their inputs come from all the rules which use that particular consequent label. For each input supplied by a rule, the nodes compute the corresponding output suggested by that rule by the defuzzification procedure $\mu_{Y_{leg}}^{-1}(w_r) = c + 0.5 (s_R - s_L)(1 - w_r)$, where Y_leg = (c, s_L, s_R) indicates a consequent linguistic value of a leg. In the last layer there are six output nodes, one for each leg, which compute M_leg(t) by combining the recommendations from all the fuzzy control rules in the rule base, using a weighted sum in which the weights are the rule strengths:

\[
M_{leg} = \frac{\sum_r w_r\, \mu^{-1}(w_r)}{\sum_r w_r}
\tag{3.40}
\]
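The defuzzification step and the weighted sum of equation 3.40 can be sketched as follows; the example labels and firing strengths are made up, and the function names are ours.

```python
# Sketch of the GSM output stage: each triggered rule defuzzifies its
# consequent label via mu^{-1}(w_r) = c + 0.5 (sR - sL)(1 - w_r), and the
# leg's task share is the rule-strength-weighted average of eq. 3.40.
def mu_inverse(w_r, c, sL, sR):
    """Inverse of a triangular consequent label, as given in the text."""
    return c + 0.5 * (sR - sL) * (1.0 - w_r)

def task_share(rules):
    """rules: list of (firing_strength, (c, sL, sR)) pairs -> M_leg."""
    num = sum(w * mu_inverse(w, *label) for w, label in rules)
    den = sum(w for w, _ in rules)
    return num / den

# Two rules firing on symmetric labels: when sL == sR, mu_inverse reduces
# to the center c, so M_leg is simply the weighted mean of the centers.
rules = [(0.8, (1.0, 0.5, 0.5)), (0.2, (-1.0, 0.5, 0.5))]
M_leg = task_share(rules)
```

For asymmetric spreads the (1 − w_r) term shifts the defuzzified value away from the center, so weakly fired rules pull the output toward the fat side of their label.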
The goal of calculating the M_leg values in the GSM is to maximize the evaluation of the gait, v, determined by the GEM, where within its learning process the vector of all parameters of Y_leg (centers and spreads) is adjusted; that is,

\[
\Delta p_Y \propto \frac{\partial v}{\partial p_Y}
\tag{3.41}
\]

where p_Y is the parameter vector of Y_leg = (c, s_L, s_R). But there is no explicit gradient information provided by the reinforcement signal, and the gradient ∂v/∂p_Y can only be estimated. To estimate the gradient information in reinforcement learning, there needs to be some randomness in how output gaits are chosen by the GSM, so that the range of possible outputs can be explored to find a correct value. This is provided by the stochastic exploration in the Gait Modifier Module (GMM).
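The mode-assignment thresholds described at the beginning of this section can be sketched as a small function. The threshold values below are illustrative assumptions, since the thesis analyzes their effect in simulation rather than fixing them.

```python
# Sketch of the GSM threshold logic: the task-share value M_leg is mapped
# to an operation-mode label depending on the leg's current phase.
T_21 = 0.5     # illustrative threshold between modes 2 and 1 (swing side)
T_m2m1 = -0.5  # illustrative threshold between modes -2 and -1 (stance side)

def leg_mode(m_leg, in_stance):
    """Return the operation-mode label (-2, -1, 1, or 2) for one leg."""
    if in_stance:
        return -2 if m_leg < T_m2m1 else -1
    return 2 if m_leg > T_21 else 1

# Four legs with different task shares and phases:
modes = [leg_mode(-0.8, True), leg_mode(-0.2, True),
         leg_mode(0.9, False), leg_mode(0.1, False)]
```

Moving the thresholds toward the extremes makes mode changes rarer, which is why their values matter for the exploration behavior analyzed later.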
3.2.3 Gait Evaluation Module (GEM)

The GEM is a standard two-layer feedforward neural network which takes the state of the system as input. The state data include the leg-tip positions and velocities in the leg-centered coordinate systems and the legs' operation modes (-2, -1, 1, 2). To assign credit to the individual actions of the action sequence preceding a reinforcement signal, an evaluation function of the states is learned.
The output is an evaluation of the state, denoted by v. Changes in v due to state transitions are further combined with a reinforcement signal to produce an internal reinforcement r̂:

\[
\hat{r}(t) = r(t) + \gamma v(t) - v(t-1)
\tag{3.42}
\]

where 0 ≤ γ ≤ 1 is the discount rate. The internal reinforcement plays the role of an error measure in the learning of the GEM. If r̂ is positive, the weights of the network are altered through the backpropagation algorithm so as to increase the output v for the given input, and vice versa. The main reinforcement signal is obtained from the critical margin (Cm) and v_B. If Cm = 0 or v_B = 0, a reinforcement signal r(t) = −1 is returned. Otherwise a value is returned according to the design goal. This value can simply be r(t) = 0, or it can be a real number representing a more detailed and continuous degree of success. Different reinforcement signals are tested in simulations in order to optimize speed and mobility.
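Equation 3.42 combined with the main reinforcement rule described above can be sketched as follows; the function names and numeric values are our illustrative assumptions.

```python
# Sketch of the GEM signals: the main reinforcement rule from the text
# (r = -1 when the critical margin or the body velocity is zero, r = 0
# otherwise) and the internal reinforcement of eq. 3.42.
GAMMA = 0.9  # discount rate, 0 <= gamma <= 1 (illustrative value)

def main_reinforcement(critical_margin, v_body):
    return -1.0 if critical_margin == 0 or v_body == 0 else 0.0

def internal_reinforcement(r, v_now, v_prev, gamma=GAMMA):
    """r_hat(t) = r(t) + gamma * v(t) - v(t-1)."""
    return r + gamma * v_now - v_prev

# A failing step (zero critical margin) while the state evaluation drops:
r = main_reinforcement(critical_margin=0, v_body=0.1)
r_hat = internal_reinforcement(r, v_now=0.2, v_prev=0.5)
```

The internal reinforcement is exactly a TD error in the sense of section 2 (compare equation 2.14), with the GEM output playing the role of the prediction.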
3.2.4 Gait Modifier Module (GMM)

One of the features of the gait synthesizer architecture is to modify the output of the GSM according to the internal reinforcement from previous time steps. The GMM creates a Gaussian random distribution whose mean is set to the recommended M_leg value and whose standard deviation is α exp(−r̂(t−1)), a non-negative, monotonically decreasing function with a scale factor α, where α ∈ R+. When r̂(t−1) is low, meaning the last action performed was bad, the deviation is large, whereas the controller remains consistent with the fuzzy control rules when r̂(t−1) is high. This deviation provides adaptation to current conditions or a means of solving a sudden problem such as leg entrapment. The exploration of the state space also increases the system's experience, which is captured by the learning in the GEM and GSM. The gradient information ∂v/∂p_Y needed within the GSM is estimated by the stochastic exploration in the GMM. The modification implemented at t−1 by the GMM is judged by r̂(t). If r̂ > 0, meaning the modified M(t−1) is better than expected, then M(t−1) is moved closer to the modified one, and vice versa. That is,

\[
\frac{\partial v}{\partial p_Y} \approx \hat{r}(t)\,
\frac{M_{mod}(t-1) - M_{rec}(t-1)}{\alpha \exp(-\hat{r}(t-1))}
\tag{3.43}
\]

where M_rec denotes the M value recommended by the GSM, and M_mod denotes the M value modified by the stochastic perturbation in the GMM. Due to the change in the M_leg values, four different transitions may occur: If a leg's state is -2 (-1) and M_leg(t) > T_{−2,−1} (M_leg(t) < T_{−2,−1}), then the leg becomes -1 (-2). If a leg's state is 2 (1) and M_leg(t) < T_{2,1} (M_leg(t) > T_{2,1}), then the leg becomes 1 (2). So stochastic exploration on M_leg values which does not result in a modified transition has no contribution to learning. We can define the minimum deviation, Δd_m, as the minimum perturbation added by the GMM required to change the state of a leg. The Δd_m(t) values can be given as

\[
\Delta d_m(t) =
\begin{cases}
\left| M_{rec}(t) - T_{2,1} \right| & \text{if the leg is in state 2 or 1} \\
\left| M_{rec}(t) - T_{-2,-1} \right| & \text{if the leg is in state -2 or -1}
\end{cases}
\]

So the modification of an M(t) depends on the deviation function and on Δd_m(t). The effect of the values α, T_{2,1}, and T_{−2,−1} will be analyzed in simulations.
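The stochastic perturbation described above can be sketched as follows; the value of α, the random seed, and the example internal reinforcements are illustrative assumptions.

```python
# Sketch of the GMM perturbation: the recommended task share is perturbed
# with a Gaussian whose standard deviation alpha * exp(-r_hat(t-1)) shrinks
# as the internal reinforcement improves.
import math
import random

ALPHA = 0.2  # scale factor alpha > 0 (illustrative value)

def modified_share(m_rec, r_hat_prev, alpha=ALPHA, rng=random):
    """Draw M_mod ~ N(M_rec, alpha * exp(-r_hat(t-1))); return (M_mod, sigma)."""
    sigma = alpha * math.exp(-r_hat_prev)
    return rng.gauss(m_rec, sigma), sigma

rng = random.Random(0)
m_good, sig_good = modified_share(0.4, r_hat_prev=1.0, rng=rng)   # small deviation
m_bad, sig_bad = modified_share(0.4, r_hat_prev=-1.0, rng=rng)    # large deviation
```

After a well-rated step the draw stays close to the fuzzy recommendation, while after a poorly rated step the larger deviation makes a mode-changing perturbation (one exceeding Δd_m) far more likely.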
3.2.5 The Complete Control Cycle

The control cycle in Fig. 3.8 is executed at each time step. First, the reinforcement signal and the legs' states are taken by the gait synthesizer, and the legs are clustered into suitable operation modes according to their calculated M values. Then, further modifications depending on physical checks are applied, and the resultant operation modes, which remain valid for the rest of the control layers, are obtained until the next cycle. In the figure we only consider the velocity controller, but different control modules (such as navigation and terrain adaptation) can be implemented. Lastly, the desired velocities of the legs and the body are calculated and applied by the robot.
Figure 3.8: Complete Control Cycle
CHAPTER 4 HEXAPOD ROBOT SIMULATION
We develop the hexapod robot shown in Fig. 4.1 to be used in our simulations. Our simulation program consists of two subprograms. The first one constitutes the main body (main program), which includes the controller architecture and the hexapod model. All simulation tests and training sessions are implemented in this subprogram, which is written in Matlab 6.5. The second subprogram is responsible for the visualization (rendering) of the simulation results. The main program saves the state data of the hexapod for each time cycle in a file named simvars.bsd. The state data are fed as input to the rendering program. The rendering program is written in Borland C Builder with OpenGL as the graphics tool. The reason for using two separate programs in the simulation is to decrease the computation time spent in the tests of the hexapod. The source code of the programs and the simulation results can be found on the CD attached to this thesis as an appendix.
4.1 Hexapod Model

The simulations are implemented with a kinematic model. Such kinematic models are commonly employed in gait analysis [24], [34] and gait control [30], [28] for simulation purposes. A simplified model of the hexapod robot considered in this thesis is shown in Fig. 4.2. Each leg is identical and composed of three rigid links (Fig. 4.3). All the links are connected to each other via revolute joints. Hence the foot point, or leg tip, has three degrees of
Figure 4.1: The hexapod robot used in simulation.
freedom with respect to the body. The legs are represented by the labels R1, R2, R3 and L1, L2, L3. Here, for example, L1 signifies the left front leg and R3 indicates the right hind leg. The body coordinate frame (C_B) is attached to the hexapod body with its origin at the center of gravity, while the leg base coordinate frames (C_b) are attached at the bases of the legs (Fig. 4.2). C_w is the inertial base frame. Dashed rectangles represent the working spaces of the legs (p_btip,x ∈ [−Sd/2, Sd/2] and p_btip,z ∈ [−Rz/2, Rz/2]). The joint angles are calculated by inverse kinematics [45] given a desired position and orientation. The dimensions of the links and the body level above the ground are assigned such that the leg tips can reach all points in their working spaces (existence of a solution of the inverse kinematics) and there exists only one joint angle vector (uniqueness of the solution of the inverse kinematics). Fig. 4.4 shows two postures of the hexapod model. As can be seen, the hexapod body in Fig. 4.4B is lower than the one in Fig. 4.4A
Figure 4.2: Hexapod model
in order to increase the reachable space of the legs. The hexapod in Fig. 4.4B is especially used in the uneven terrain simulations, where some legs fall into holes in the terrain. Also notice that the reachable spaces of the legs do not overlap (Fig. 4.2).
4.2 Sensor System

As indicated, the joint angles of the legs are calculated by inverse kinematics from the given leg tip trajectories. In real robots these angles are measured by joint angle sensors [42]: potentiometers that measure the joint angle for each DOF of the leg. In our simulation these angles are used in the rendering
Figure 4.3: Each leg is identical and composed of three links. Pink legs are in swing phase whereas blue ones are in stance.
program. In the gait synthesizer (and thus in the main program), leg tip coordinates and velocities in their own coordinate systems are used. The leg tip-terrain interactions are determined by modelling ground contact sensors. In real robots, these are linear potentiometers on the tips of all legs that measure the deflection of the foot as it presses against the ground. In our experiments, this is an on-off sensor with an output of '1' when contact occurs and '0' otherwise. Real robots use several additional sensors, such as an inclinometer, which senses the body orientation with respect to the direction of gravity. In our simulations we implement straight-line walking in the x-direction (Fig. 4.2), so the body orientation does not change. We also did not need to model sensors for terrain sensing (such as optical sensors), because the gait synthesizer is capable of making its decisions without explicitly needing such data: it gradually develops an internal model of the environment for gait adaptation.
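The on-off ground contact sensor can be modelled as a simple height comparison. This is a sketch of one plausible implementation (comparing against a local terrain height is our assumption), not the thesis code:

```python
def ground_contact(tip_z_world, terrain_height):
    """On-off ground contact sensor: returns 1 when the leg tip is at
    or below the local terrain surface, 0 otherwise."""
    return 1 if tip_z_world <= terrain_height else 0

assert ground_contact(0.0, 0.0) == 1   # touching the surface
assert ground_contact(0.05, 0.0) == 0  # in the air
```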
Figure 4.4: Two different postures of the robot. Body level of the robot in B is lowered in order to increase the reachable space of the legs.
4.3 Kinematics of the Hexapod Robot

The following assumptions on the kinematics and dynamics of the hexapod, adapted from [28], are made for simplicity of the analysis.

1. The contact between a foot and the ground is a point.

2. There is no slipping between a foot and the ground.

3. All the mass of the six legs is lumped into the body, and the center of gravity is assumed to be at the centroid of the body.

4. There is no displacement in the y-direction, and the body level (p^w_{B,z}) and orientation are constant with respect to the inertial base frame.

5. The body speed in the x-direction with respect to the inertial frame is equal to minus the leg tip speed in the x-direction of the stance legs with respect to C_b (i.e., v^w_{B,x} = −v^b_{tip_st,x}).
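Under assumptions 4 and 5, one simulation step only moves the body forward and sweeps the stance-leg tips backward by the same amount. A minimal sketch (illustrative, with hypothetical names; not the thesis implementation):

```python
def step(body_x, stance_tips_x, v_body_x, dt):
    """Advance the body in the world frame while the stance-leg tips
    retract in the leg base frames (assumptions 4 and 5):
    v^b_tip_st,x = -v^w_B,x."""
    body_x += v_body_x * dt
    stance_tips_x = [p - v_body_x * dt for p in stance_tips_x]
    return body_x, stance_tips_x

# One 0.1 s step at 0.1 m/s: body advances 0.01 m, stance tips retract 0.01 m.
bx, tips = step(0.0, [0.0, 0.0, 0.0], v_body_x=0.1, dt=0.1)
assert abs(bx - 0.01) < 1e-12
assert all(abs(p + 0.01) < 1e-12 for p in tips)
```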
In our simulations v^b_{tip_sw,x} is set to a constant positive value ρ, from which v^w_{B,x} (and hence v^b_{tip_st,x}) is calculated. Also |v^b_{tip_sw,z}| = ρ for the swinging legs. For the different states (operation modes) the velocities of the legs are calculated as follows:

v^b_{tip,x} = ρ, if the leg is in state 2 or 1 and p^b_{tip,x}(t − 1) < Sd/2 (AEP);
v^b_{tip,x} = 0, if the leg is in state 2 or 1 and p^b_{tip,x}(t − 1) > Sd/2 (AEP);
v^b_{tip,x} = ν, if the leg is in state −2 or −1,    (4.1)

where ν = −(Σ_sw v^b_{tip_sw,x}(t − 1) ∆t)/(n_ost ∆t), the sum is taken over the swinging legs, and n_ost is the number of stance legs. And

v^b_{tip,z} = ρ, if the leg is in state 2 and p^b_{tip,z}(t − 1) < Rz/2;
v^b_{tip,z} = −ρ, if the leg is in state 1 and p^b_{tip,z}(t − 1) > −Rz/2;
v^b_{tip,z} = 0, otherwise.    (4.2)

4.4
Uneven Terrain

The main challenge for the gait synthesizer is uneven-terrain locomotion. The test path for uneven terrain is modelled such that a smooth surface succeeds a part with randomly placed hills and holes, some of which are deeper than the legs can reach (Fig. 4.5). A function in the main program named TTerrainmaker.m (refer to the CD in the appendix) creates terrains by randomly placing 7 different surface segments whose dimensions are such that only the leg tips can collide with them. In other words, the other parts of the hexapod robot (links or body) do not collide with the terrain. Tests conducted on uneven terrain in which a leg hits an obstacle (probably with the link part of the leg) and cannot continue swinging are modelled by temporarily disabling the corresponding leg. From the gait synthesizer's point of view, the effect of disabling is the same as an obstacle collision (temporarily the leg does not participate in the gait of the hexapod). Such simulations can also be found on the CD and are discussed in detail in Chapter 5 under simulation results.
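The terrain generation described above can be sketched as follows. This is a toy analogue of TTerrainmaker.m, not the original: the 7-segment count comes from the text, while the hole probability, depths, and segment length are assumptions for illustration.

```python
import random

def make_terrain(n_segments=7, max_height=0.05, hole_depth=0.3, seed=0):
    """Toy terrain generator: one height per surface segment.
    Occasional 'holes' are deeper than a leg can reach (deeper
    than hole_depth); the rest are shallow hills and dips."""
    rng = random.Random(seed)
    heights = []
    for _ in range(n_segments):
        if rng.random() < 0.2:  # occasional deep hole (probability assumed)
            heights.append(-hole_depth * rng.uniform(1.0, 1.5))
        else:                   # shallow hill or dip
            heights.append(rng.uniform(-max_height, max_height))
    return heights

terrain = make_terrain()
assert len(terrain) == 7
```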
Figure 4.5: The modelled uneven terrain. Different surface segments can be seen in the figure. The holes on uneven terrain are modelled by surface segments which are deeper than the legs can reach. Notice that the pink leg (swinging) falls into such a segment.
CHAPTER 5 SIMULATION RESULTS
The hexapod robot simulation developed in Chapter 4 is used to generate simulation results that clearly demonstrate the capabilities of our gait synthesizer. The simulations are implemented on the kinematic model (Chapter 4) rather than the dynamic model described in Section 3.1. A control system for such a dynamic model has to include many control modules besides a gait controller, such as algorithms for navigation, speed, and body level (terrain adaptation), which affect only the low-level execution of a gait rather than the gait formulation level. Consequently the simulation omits these effects and analyzes only the gait synthesizer in the gait control of a hexapod robot based on its kinematic model. In the first two sections of the simulations we analyze the control parameters and the different choices of reinforcement signal that are significant for the performance of the gait synthesizer. These tests are implemented on smooth terrain in order to focus the comparisons under similar environmental effects. In the rest of the simulations we show the capabilities of the gait synthesizer for search and rescue (SAR) by testing its performance on modelled uneven terrains expected in SAR operations and when a leg is used as a manipulator. Before these tests were run, the gait synthesizer was trained with different initial conditions and, for the uneven-terrain tests, with different terrains. The results presented here are chosen among those that most clearly demonstrate the advantages
of the gait synthesizer and the potential it offers for SAR. All the results are included on a CD attached to the thesis as an appendix. The reader is referred to this CD, on which the results discussed here can be examined visually.
5.1 Exploration and Exploitation Dilemma in Reinforcement Learning

As indicated in the Gait Modifier Module (GMM), the deviation function α·exp(−r̂(t − 1)) is scaled by α, and two threshold values, T_{−2,−1} and T_{2,1}, must be properly selected for the controller. The effects of these values are tested first on a simple learning problem. The legs begin from random initial positions (all the legs in state −2) such that this initial configuration does not belong to any gait pattern in the fuzzy rule base. Since the reinforcement signals aim at optimizing the speed of the hexapod with maximized static stability, we expect that from this random initialization the gait will converge to the optimum one in terms of speed, which is the tripod gait. In order to test the sensitivity of the gait synthesizer to changes in α and the thresholds, we test the GSM in the same manner for each set of α and threshold values. Within each training session, repeated at most 10 times, the gait synthesizer is trained for 2000 time steps for a given parameter set (α, T_{2,1}, and T_{−2,−1}). We initialize the learning weights, change the parameters, and apply the training again for each new parameter set. Fig. 5.1 shows the resultant speed versus time graphs of the hexapod. In the first test the parameters are chosen as α = 0.5, T_{2,1} = 0.5, and T_{−2,−1} = −0.5. If the magnitude of the scale factor (α) is
high, we find that the exploration of different gaits is also high. In other words, the gait synthesizer tries plenty of gaits for different states, causing very slow learning. Fig. 5.1A shows the resultant speed versus time graph at the 10th training session. The synthesizer is found not to be able to converge to a periodic movement or capture a gait pattern. On the other hand, when the scale factor is too small, as in a second test taking α = 0.01, T_{2,1} = 0.3, and T_{−2,−1} = −0.3, exploration is low, learning is slow, and moreover there is a chance of getting stuck. Fig. 5.1B (the second row) shows the resultant speed versus time graph at the second training session. The legs' state vector at the end of this training is observed to be [−1, −1, −2, −1, −2, −1]. Here, because no static stability is provided by the legs in state −2, no swinging leg exists and the body stands still. In such states (the most severe being the case [−1, −1, −1, −1, −1, −1]) the synthesizer has to try different combinations of leg states in order to continue its movement. But a low scale factor tightens the deviation from the recommended M values, so recovery from the present state is slow and limited. In the third and fourth tests (Fig. 5.1C, 5.1D) we set the scale factor to 0.15 and consider two threshold pairs: T_{2,1} = 0.5, T_{−2,−1} = −0.1 (Fig. 5.1C), and T_{2,1} = 0.1, T_{−2,−1} = −0.5 (Fig. 5.1D). These speed versus time graphs are obtained in the 10th training session. When T_{2,1} is high the legs cannot stay in state 2 for a long time and change into state 1, which creates very small step sizes. Whereas, when T_{−2,−1} is too small, a similar problem as in the second test arises, where too many legs fall into state −1 and the hexapod robot gets stuck in a still position without the gait synthesizer being able to
Figure 5.1: Body speed versus time graphs for different scale factor and threshold values.
restart its motion, although the gait synthesizer tries many new gaits in order to escape from such states. The robot loses time: notice the long delays with zero speed, such as between times 1300 and 1400. The last row represents results for the parameters α = 0.15, T_{2,1} = 0.1, and T_{−2,−1} = −0.1. This speed versus time graph shows a tripod gait and is obtained in the third training session, giving rise to values that can be considered near optimum.
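The interplay of the scale factor and the internal reinforcement can be sketched as follows. The deviation function α·exp(−r̂(t − 1)) is from the GMM description above; the uniform perturbation of the recommended M value is our simplification for illustration.

```python
import math
import random

def deviation(alpha, r_hat_prev):
    """Exploration magnitude alpha * exp(-r_hat(t-1)): large after
    poor internal reinforcement, small after good reinforcement."""
    return alpha * math.exp(-r_hat_prev)

def perturb(m_recommended, alpha, r_hat_prev, rng=random):
    # Deviate the recommended M value by a random amount bounded
    # by the current exploration magnitude (illustrative scheme).
    return m_recommended + rng.uniform(-1.0, 1.0) * deviation(alpha, r_hat_prev)

# Poor recent performance (r_hat = -1) explores more than good (r_hat = +1):
assert deviation(0.15, -1.0) > deviation(0.15, 1.0)
```

A large α (as in the first test) keeps this deviation high even after decent performance, which is why the synthesizer never settles into a periodic gait.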
5.2 Smooth Terrain Tests

In this section simulations demonstrate the learning capability of the gait synthesizer on smooth flat terrain. Learning aims at increasing the
Figure 5.2: Comparison of resultant gaits when training is done according to two different reinforcement signals: for speed (first row) and for critical margin (second row). The first column gives the resultant gaits, the second the body speed versus time, and the last column the critical margin in the direction of motion versus time.
static stability margin while maximizing speed. As indicated in Section 3.2.3, a reinforcement signal r(t) = −1 is returned when the critical margin, Cm, or the body speed v^w_{B,x} is zero, except for states in which there exists a swinging leg at the AEP. Otherwise, the controller is rewarded towards its optimization of speed and critical margin. Reinforcement signals leading to such rewards are of the form

r(t) = v^w_{B,x}/ρ    (5.1)

and r(t) = Cm(t)/Cm_max, respectively. Here Cm_max is the maximum critical margin, which is the stroke
distance (Sd), and ρ is the maximum speed of the body according to the speed policy, which is obtained in the tripod gait. The first row of Fig. 5.2 shows the results of the speed-optimized gait. The first column gives the resultant gait, the second v^w_{B,x} versus time, and the last column shows the critical margin in the direction of motion versus time. As expected, a tripod gait is obtained, because it is the fastest gait in the rule base of the gait synthesizer and this is naturally where the gait decision converged. The maximum speed in the second column corresponds to ρ. The results in the second row correspond to the gait synthesizer trained to optimize Cm. As can be seen, a tetrapod gait is obtained, which generates steps that prevent the critical margin from getting smaller (graph in the third column of the second row). The drawback is in speed, as seen in the second column.
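The two reinforcement choices compared above can be sketched as one function. The signature and the `swing_leg_at_aep` flag are our naming; the logic follows the summary of Section 3.2.3 given here.

```python
def reinforcement(v_body_x, cm, rho, cm_max, goal="speed", swing_leg_at_aep=False):
    """Sketch of the reinforcement signals: punish a stalled robot
    (zero speed or zero critical margin) unless a swing leg has just
    reached its AEP; otherwise reward normalized speed or normalized
    critical margin depending on the training goal."""
    if (v_body_x == 0 or cm == 0) and not swing_leg_at_aep:
        return -1.0
    if goal == "speed":
        return v_body_x / rho      # r(t) = v^w_Bx / rho    (5.1)
    return cm / cm_max             # r(t) = Cm(t) / Cm_max

assert reinforcement(0.0, 0.1, rho=0.1, cm_max=0.3) == -1.0   # stalled
assert reinforcement(0.1, 0.1, rho=0.1, cm_max=0.3) == 1.0    # at max speed
```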
Figure 5.3: Internal reinforcement versus time.
Another example demonstrates a compromise the gait synthesizer makes in its performance, resulting in a tripod gait with small step sizes. The
Figure 5.4: Critical margin, Cm(t), versus time.
robot is trained for speed with an additional reinforcement signal r(t) = −1 when the critical margin, Cm, which is the minimum of the stability margin and the kinematic margin, is below a positive value. When the robot starts with a tripod gait it is punished several times due to this reinforcement signal, and the internal reinforcement decreases, as seen in Fig. 5.3. Fig. 5.4 shows the Cm versus time graph of this simulation. The decrease in the internal reinforcement, signalling a performance problem, causes the gait synthesizer to decide on new gaits. As can be seen, the gait synthesizer adapts the gait after a certain amount of time to increase the internal reinforcement without losing periodicity. Fig. 5.5 shows the leg tip positions in the x-direction, where one can observe that the leg step sizes have decreased. This simulation clearly shows that adaptation of the gait synthesizer is achieved for both speed and mobility (in terms of critical margin) by an appropriate choice of reinforcement signals.
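The additional punishment used in this experiment can be sketched as follows; the threshold value in the example is an assumption (the text only says "a positive value").

```python
def critical_margin(stability_margin, kinematic_margin):
    """Critical margin as used in this test: the minimum of the
    stability margin and the kinematic margin."""
    return min(stability_margin, kinematic_margin)

def extra_penalty(cm, cm_threshold):
    # Additional reinforcement r(t) = -1 whenever Cm drops below
    # a positive threshold; 0 means no extra punishment.
    return -1.0 if cm < cm_threshold else 0.0

assert critical_margin(0.12, 0.05) == 0.05
assert extra_penalty(0.02, 0.05) == -1.0   # too close to instability
```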
Figure 5.5: Leg tip positions in the x-direction versus time. In order to increase the critical margin, the gait synthesizer applies smaller step sizes.
5.3 Performance on Rough Terrain

Next, the robot is tested on uneven terrain, modelled such that a smooth surface succeeds a part with randomly placed hills and holes, some of which are deeper than the legs can reach. We conduct a comparative analysis of the performance of the hexapod robot with the gait synthesizer and with fixed-gait approaches on the defined terrain. Fig. 5.6 shows the tip trajectories of the legs in a classical fixed tripod gait. The legs swing in their operation space, and the Anterior Extreme Point is taken to be the fixed switching point from mode 2 to 1. When the left front leg (L1) falls in a hole, the robot is stuck and can no longer move. There are mainly two reasons for such a failure. Firstly, the gait pattern is defined for six legs and cannot be implemented if any one is missing. Secondly, as shown in Fig. 5.2, the critical margin for the tripod gait approaches zero when the swinging legs are descending. This is because the stance legs reach
the Posterior Extreme Point (PEP), so there exists no margin for body movement to handle the hole. Fig. 5.7 shows the tip trajectories and Fig. 5.8 the resultant gait when the gait synthesizer is used on the same terrain. The gait synthesizer successfully handles the terrain irregularities. When the robot first enters the uneven portion of the terrain, the evaluation of the gait gives lower reinforcements (due to unexpectedly bad performance in the robot state) and new gaits are recommended by the gait synthesizer. When a leg falls in a hole, the synthesizer generates very small steps, visible as ripples in the trajectories. These hesitations are actually trials of new gaits by the Gait Modifier Module and are also seen in the trajectories of the leg tips while they are swinging. One can argue that a different fixed gait (for instance a tetrapod gait) could tackle such terrain. This is true from the mobility point of view. However, for search and rescue tasks, speed (or response time) is as important as mobility, and a fixed tetrapod gait has a slower performance, quite inadequate for a time-pressing SAR operation. A compromise is needed between the two. Fig. 5.8 also shows that after some time the robot reaches the smooth terrain, where it recovers a tripod gait. Gait trials for a better evaluation of the gait can be seen in these results where the recoveries occur. The results of another example on a similar terrain are given in Figs. 5.9, 5.10, and 5.11, where a faster recovery of the tripod gait is achieved.
5.4 Task Shapability: A Must for SAR Operations

In search and rescue (SAR) operations, a leg of the hexapod can be required for tasks such as carrying debris or other equipment while
the robot is in motion, so that it cannot participate in the gait of the hexapod. Such task shapability may be vital in the hazardous environments of SAR. Figs. 5.12 and 5.13 represent such a situation, where leg R1 is involved in a manipulation task and is eliminated from the gait pattern. The leg involved in the manipulation task is shown here fixed in a swing-phase position, as if it were holding something. Although the gait synthesizer cannot immediately find a periodic gait, it provides mobility despite the sudden loss of a leg by using the redundancy in multi-legged locomotion. The simulations clearly indicate the advantageous characteristics of the gait synthesizer for the mobility and robustness required in search and rescue (refer to the CD in the appendix).
Figure 5.6: Leg tip trajectories of the hexapod on the x-z plane with a fixed tripod gait on the defined terrain.
Figure 5.7: Leg tip trajectories of the hexapod on the x-z plane with the gait synthesizer on the defined terrain.
Figure 5.8: Gait of the hexapod robot on uneven terrain. The robot recovers the tripod gait pattern some time after reaching the smooth terrain.
Figure 5.9: Gait of the hexapod robot on uneven terrain. The robot recovers the tripod gait faster than in the previous example.
Figure 5.10: Critical margin versus time.
Figure 5.11: Leg tip positions in the x-direction versus time.
Figure 5.12: Gait generated by the gait synthesizer when leg R1 is missing.
Figure 5.13: Gait generated by the gait synthesizer upon the sudden loss of leg R1.
CHAPTER 6 CONCLUSION
6.1 General

In this thesis work we developed an intelligent, task-shapable control, based on a gait synthesizer, for a hexapod robot traversing unstructured workspaces in rescue missions within disaster areas. The gait synthesizer draws decisions from insect-inspired gait patterns to meet the changing needs of the terrain and of the rescue tasks. It is composed of three modules responsible for selecting a new gait, evaluating the current gait, and modifying the recommended gait according to the internal reinforcements of previous execution performances. Simulation results show the potential of the gait synthesizer for search and rescue operations: it adapts to uneven terrain by shaping gaits, recovers from the entrapment of some legs, and modifies gaits when some legs are used as manipulators in tasks very different in nature from locomotion. The contribution of this thesis work can be analyzed from several points of view. Towards gait analysis, we introduce a modelling method for insect-inspired gait patterns. We form fuzzy rules for the different phases in gait pattern cycles from the relative positions of the legs. The fuzzy rules provide a method to distinguish the tasks of individual legs in the coordinated movement of hexapod robots. This modelling and fuzzification process is valid for all legged robots using static stability.
For legged robots, there are two parts to gait generation: the cyclic action of the individual legs and the coordination of all the legs to make effective use of their cycles. Periodic gaits offer this coordination within a pattern with fast execution, while exhibiting differing degrees of weakness to irregularities of the environment. By utilizing a control structure, namely the novel gait synthesizer architecture, that exhibits intelligent control features such as learning and adaptability in unstructured environments, we provide exploration among such periodic gait patterns in order to be mobile and rapid at the same time on uneven terrain. Besides, the control architecture generates gaits to get out of the entrapment of legs, owing to the modifier module, one of the three main modules of the gait synthesizer. The dynamics of legged robots is complicated because of the coupling of the individual legs in the dynamics of the body. We established its similarity to the grasping and manipulation of objects by multi-fingered robot hands, and we considered locomotion as grasping onto an infinitely big, rough, arbitrarily textured terrain. We make use of the multitude of existing works on grasping models with multi-fingered robot hands to generate the locomotion dynamical equations for legged robots. Again, the derived equations are general enough to be applied to all legged robots under static stability. Finally, this thesis contributes to the literature on the feasibility of autonomous intelligent robots for search and rescue (SAR). We have developed a coordination control of legs based on gait patterns for the fast and secure mobility of legged robots needed in SAR environments. Fast mobility is ensured by an optimization of speed. Secure mobility is achieved by an optimization of the static stability margin and also by the gait synthesizer modifying its gait to extricate the robot from any motion entrapment. This deadlock-free locomotion under terrain entrapment or leg failure is due to the ability of our gait synthesizer to successfully exploit the redundancy in multi-legged robots.
6.2 Future Work

In this thesis we restrict the subject to gait control. In a complete control structure of a legged robot, the gait synthesizer should undertake more responsibilities than those mentioned in this thesis work. The gait synthesizer can be further expanded in several ways. By a different choice of reinforcement signals, the synthesizer can be trained and adapted to different tasks. We give just one example of such an adaptation, in terms of speed and mobility, which are the main concerns of locomotion in our case. Moreover, for terrain irregularities that are routinely faced in a search and rescue (SAR) operation (such as specific obstacle types), control modules can be added to the Gait Modifier Module. Encountering such situations, the gait synthesizer lets these modules take control of the modifications held in the GMM while still recommending gaits for locomotion. Also, new rule bases can be added to the system for five-legged locomotion, so that upon the permanent loss of a leg the corresponding rule base can be set in action. Although we showed that such situations can still be handled by the gait synthesizer with rules for six legs, the addition of such rules will provide more
functionality to the gait synthesizer at the cost of additional memory usage. The analyses in this thesis are made for a two-dimensional model of hexapod robot locomotion, i.e., straight-line walking. The gait synthesizer can be adapted to a real robot by adding rules for the lateral positions of the leg tips, so that locomotion can take place over a planar x-y terrain. The working spaces of the legs must also be adapted when the orientation of the body changes. These foreseen changes would not affect the performance of the gait synthesizer, because the main concept, drawing decisions from gait patterns to meet the needs of the locomotion, would not change with these modifications. An important property of the gait synthesizer is the generated M values. These values carry information about the relative functionality of the legs. Although we use them only to distinguish the operation modes of the legs, other control modules can also make use of them. For instance, in navigation control the M vector of the legs can be taken as an input indicating the feasibility of a manoeuvre. For such uses the learning algorithm needs to be changed, because then not only the comparison with the threshold values but also the value of M itself will be meaningful.
REFERENCES
[1] Jennifer Casper, Mark Micire, Robin R. Murphy, Issues in Intelligent Robots for Search and Rescue.

[2] İsmet Erkmen, Aydan M. Erkmen, Fumitoshi Matsuno, Ranajit Chatterjee, Tetsushi Kamegawa, Snake Robots to the Rescue: Serpentine Search Robots in Rescue Operations, IEEE Robotics and Automation Magazine, September 2002.

[3] G. Meltem Kulalı, Mustafa Gevher, Aydan M. Erkmen, İsmet Erkmen, Intelligent Gait Synthesizer for Serpentine Robots, Proceedings of the 2002 IEEE International Conference on Robotics and Automation, Washington, DC, May 2002.

[4] M. H. Raibert, Legged Robots that Balance, MIT Press, Cambridge, MA, 1986.

[5] S. M. Song, K. J. Waldron, Machines That Walk: The Adaptive Suspension Vehicle, MIT Press, Cambridge, MA, 1988.

[6] Celaya, E., Porta, J. M., A Control Structure for the Locomotion of a Legged Robot on Difficult Terrain, IEEE Robotics and Automation Magazine, Vol. 5, No. 2, June 1998, pp. 43-51.

[7] M. J. Randall, A. G. Pipe, A Novel Soft Computing Architecture for the Control of Autonomous Walking Robots, Soft Computing 4 (2000) 165-185.
[8] Shin-Min Song, Kenneth J. Waldron, An Analytical Approach for Gait Study and Its Applications on Wave Gaits, The International Journal of Robotics Research, Vol. 6, No. 2, Summer 1987.

[9] J. Dean, A Model of Leg Coordination in the Stick Insect, Carausius morosus, I. A Geometrical Consideration of Contralateral and Ipsilateral Coordination Mechanisms Between Two Adjacent Legs, Biol. Cybern. 64, 393-402, 1991.

[10] Cynthia Ferrell, A Comparison of Three Insect-Inspired Locomotion Controllers, Robotics and Autonomous Systems 16 (1995) 135-159.

[11] David Wettergreen, Chuck Thorpe, Gait Generation for Legged Robots, Proceedings of the IEEE International Conference on Intelligent Robots and Systems, July 1992.

[12] Hamid R. Berenji, Pratap Khedkar, Learning and Tuning Fuzzy Logic Controllers Through Reinforcements, IEEE Transactions on Neural Networks, Vol. 3, No. 5, September 1992.

[13] Chin-Teng Lin, C. S. George Lee, Neural Fuzzy Systems, Prentice Hall Inc., 1996.

[14] Jennifer Casper, Mark Micire, Robin R. Murphy, Jeff Hyams, Brian Menten, Mobility and Sensing Demands in USAR.

[15] R. Blickhan, R. J. Full, Similarity in Multilegged Locomotion: Bouncing Like a Monopode, Journal of Comparative Physiology A (1993) 173:509-517.
[16] Michael H. Dickinson, Claire T. Farley, Robert J. Full, M. A. R. Koehl, Rodger Kram, Steven Lehman, How Animals Move: An Integrative View, Science, Vol. 288, 7 April 2000.

[17] Chiel, H., Beer, R., Sterling, L., Heterogeneous Neural Networks for Adaptive Behaviour in Dynamic Environments, Advances in Neural Information Processing Systems, 1989, pp. 577-585.

[18] Beer, R. D., Intelligence as Adaptive Behaviour, Academic Press, 1990.

[19] Beer, R. D., Chiel, H. J., Quinn, R. D., Espenschied, K. S., Larsson, P., A Distributed Neural Network Architecture for Hexapod Robot Locomotion, Neural Computation, 4, pp. 356-365, 1992.

[20] Cruse, H., What Mechanisms Coordinate Leg Movement in Walking Arthropods?, Trends in Neurosciences, 13, pp. 15-21, 1990.

[21] K. Pearson, The Control of Walking, Scientific American 235:72-86, 1976.

[22] Full, R. J., Blickhan, R., Ting, L. H., Leg Design in Hexapedal Runners, J. Exp. Biol. 158, 369-390.

[23] V. R. Kumar, K. J. Waldron, A Review of Research on Walking Vehicles, The Robotics Review 1, pp. 243-266, MIT Press, Cambridge, MA, 1989.

[24] J. Dean, A Model of Leg Coordination in the Stick Insect, Carausius morosus, II. Description of the Kinematic Model and Simulation of Normal Step Patterns, Biol. Cybern. 64, 403-411, 1991.
[25] J. Dean, A Model of Leg Coordination in the Stick Insect, Carausius morosus, III. Responses to Perturbations of Normal Coordination, Biol. Cybern. 66, 335-343, 1992.

[26] Espenschied, K. S., Chiel, H. J., Quinn, R. D., Beer, R. D., Leg Coordination Mechanisms in the Stick Insect Applied to Hexapod Robot Locomotion, Adaptive Behaviour 1 (4), pp. 455-468, 1992.

[27] Espenschied, K. S., Chiel, H. J., Quinn, R. D., Beer, R. D., Biologically-Inspired Hexapod Robot Control, Proc. 5th Int. Symp. on Robotics and Manufacturing (ISRAM), 5, pp. 89-94, 1994.

[28] Jung-Min Yang and Jong-Hwan Kim, Optimal Fault Tolerant Gait Sequence of the Hexapod Robot with Overlapping Reachable Areas and Crab Walking, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, Vol. 29, No. 2, March 1999.

[29] Jung-Min Yang and Jong-Hwan Kim, A Fault Tolerant Gait for a Hexapod Robot over Uneven Terrain, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 30, No. 1, February 2000.

[30] Byoung S. Choi, Shin Min Song, Fully Automated Obstacle-Crossing Gaits for Walking Machines, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 18, No. 6, November/December 1988.

[31] David Wettergreen, Chuck Thorpe, Gait Generation for Legged Robots, Proceedings of the IEEE International Conference on Intelligent Robots and Systems, July 1992.
[32] Robert B. McGhee, Geoffrey I. Iswandhi, Adaptive Locomotion of a Multilegged Robot over Rough Terrain, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-9, No. 4, April 1979.

[33] T. D. Barfoot, E. J. P. Earon, G. M. T. D'Eleuterio, A Step in the Right Direction: Learning Hexapod Gaits Through Reinforcement, International Symposium on Robotics, Montreal, Canada, 14-17 May 2000.

[34] Alan Calvitti, Randall D. Beer, Analysis of a Distributed Model of Leg Coordination, I. Individual Coordination Mechanisms, Biol. Cybern. 82, 197-206 (2000).

[35] Porta, J. M., Celaya, E., Gait Analysis for Six-Legged Robots, Technical Report IRI-DT-9805, Institut de Robòtica i Informàtica Industrial, Barcelona, March 1998.

[36] E. Celaya, J. M. Porta, V. Ruiz de Angulo, Reactive Gait Generation for Varying Speed and Direction, First International Symposium on Climbing and Walking Robots, 1998.

[37] U. Saranlı, M. Buehler, D. E. Koditschek, Design, Modeling and Preliminary Control of a Compliant Hexapod Robot, IEEE Int. Conf. on Robotics and Automation, San Francisco, CA, April 2000.

[38] Long-Ji Lin, Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching, Machine Learning, 8, 293-321, 1992.

[39] Andrew G. Barto, Richard S. Sutton, Christopher J. C. H. Watkins, Learning and Sequential Decision Making, in Learning and Computational Neuroscience, MIT Press, 1990.
[40] George J. Klir, Tina A. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice Hall.

[41] Sutton, R. S., Learning to Predict by the Methods of Temporal Differences, Machine Learning, 3:9-44, 1988.

[42] K. S. Fu, R. C. Gonzalez, C. S. G. Lee, Robotics: Control, Sensing, Vision, and Intelligence, McGraw-Hill.

[43] John J. Craig, Introduction to Robotics: Mechanics and Control, Addison-Wesley Publishing Company.

[44] Kathryn W. Lilly, Efficient Dynamic Simulation of Robotic Mechanisms, Kluwer Academic Publishers, 1993.

[45] Robert J. Schilling, Fundamentals of Robotics: Analysis and Control, Prentice Hall, 1990.

[46] Michael McKenna, David Zeltzer, Dynamic Simulation of Autonomous Legged Locomotion, in Computer Graphics (SIGGRAPH Proceedings), Volume 24, ACM, August 1990.

[47] Jelena Godjevac, Comparative Study of Fuzzy Control, Neural Network Control and Neuro-Fuzzy Control, Technical Report No. 103/95, February 1995.

[48] Zexiang Li, Ping Hsu, Shankar Sastry, Grasping and Coordinated Manipulation by a Multifingered Robot Hand, International Journal of Robotics Research, Vol. 8, No. 4, August 1989.