Obstacle detection using stereo vision for self-driving cars
Abstract
Perceiving the surroundings accurately and quickly is one of the most essential and challenging tasks for autonomous systems such as self-driving cars. The most common sensing systems used for perception on self-driving cars today, such as RADAR and LIDAR, give the car a full 360° view, making it more informed about the environment than a human driver. This article presents a methodology that employs two 360° cameras to perceive obstacles all around the autonomous vehicle using stereo vision. Using vertical, rather than horizontal, camera displacement allows the computation of depth information in all viewing directions, except the zenith and nadir, which carry the least useful information about obstacles. The key idea for obstacle detection is to classify points in the 3D space based on height, width and traversable slope relative to the neighboring points. The detected obstacle points can be mapped onto convenient projection planes for motion planning.
CHAPTER-1 INTRODUCTION
Computer vision is one of the toughest problems in Artificial Intelligence (AI) and has been challenging researchers and engineers for decades. This problem of extracting useful information from images and videos finds application in a variety of fields such as robotics, remote sensing, virtual reality and industrial automation. The concept of making cars drive by themselves has gained immense popularity in the AI world, mainly motivated by the number of accidents that occur due to driver error or negligence. Autonomous vehicles are designed to sense their surroundings with techniques such as RADAR, LIDAR, GPS and computer vision. This array of sensors, working coherently to observe and record the surroundings, constitutes the perception module of the vehicle. The next stage in the pipeline is the localization step, which stitches together the incomplete and disconnected information obtained from the sensors to identify the position, velocity and other states of the vehicle and of the obstacles (including dynamic obstacles). The final module is the planning stage, where the vehicle has to decide what to do given the situation it is in. Present-day research prototypes built by major players in industry and academia have LIDAR and RADAR as their primary perception systems. These generally provide a very accurate, full 360° view to the vehicle, making it more informed about the environment than a normal human driver. The downside to these systems is the cost involved in deploying them. An alternative is to use cameras and computer vision techniques as a substitute. It has been shown in several cases that stereoscopic vision can be applied to extract useful information about the surroundings to assist the navigation of mobile robots [1], [2], [3], but in most cases the field of view is limited to just the front of the robot. It is critical that the vision system we are targeting does not compromise on the 360° view provided by LIDAR/RADAR systems. This can be achieved by employing special cameras that capture a 360° by 180° image of a scene.

RELATED WORK

It is essential for an autonomous vehicle to accurately and reliably perceive and discriminate obstacles in the environment. To this end, many approaches have been presented for different application areas and scenarios in past years using stereo vision or 2D/3D sensor technologies. Each obstacle detection system is focused on a specific tessellation or clustering strategy; hence they have been categorized into 4 main models [4]: (i) probabilistic occupancy maps, (ii) digital elevation maps, (iii) scene flow segmentation and (iv) geometry-based clusters.
In probabilistic occupancy maps, the world is represented as a rigid grid of cells containing a random variable whose outcome can be free, occupied, or undefined (not mapped) [5], [6]. The goal here is to compute the associated joint distribution depending on a set of measurements carried out at a certain discrete set of time instants. The digital elevation map (DEM) is one of the approaches that try to detect obstacles relying on the fact that they protrude from a dominant ground surface. The obstacle detection algorithm proposed in [7] marks DEM cells as road or obstacle using the density of 3D points as a criterion, and also involves fitting a surface model to the road surface. Scene flow segmentation, otherwise called optical flow, utilizes temporal correlation between different frames of a scene captured by stereo cameras to classify obstacles that are in motion [8], [9], [10]; this method thus naturally handles tracking dynamic obstacles. Finally, geometry-based clustering involves classification based on the geometric structure of point clouds in the 3D space. The obstacle detection algorithm that best fits this category is [11], which is based on a search method that clusters points using a double cone model; this algorithm went on to become the basis of a practical obstacle detection module. The approach we present in this report is inspired by the method in [11]. Given our objective of obtaining a 360° view, we present a few modifications to this algorithm that use a 360° stereo pair to obtain obstacle information. The remainder of this report is structured as follows. Section 3 discusses the equipment setup that would best suit this application. Section 4 describes the disparity and depth estimation technique used. Section 5 presents the pixel-level implementation details of the obstacle detection algorithm. Section 6 discusses the experimental results on a few test scenes. Section 7 outlines possible next steps for this project.
STEREO VISION FOR SELF-DRIVING CARS

Reliably and accurately detecting obstacles is one of the core problems that need to be solved to enable autonomous navigation for robots and vehicles. For many use cases, such as micro aerial vehicles (MAVs) or self-driving cars, obstacle detection approaches need to run in (near) real-time so that evasive actions can be performed. At the same time, solutions to the obstacle detection problem are often restricted by the type of vehicle and the available resources. For example, an MAV has restricted computational capabilities and can carry only a certain payload,
while car manufacturers are interested in using sensors already built into series vehicles in order to keep self-driving cars affordable. There essentially exist two approaches to obstacle detection. Active methods use sensors such as laser scanners, time-of-flight, structured light or ultrasound to search for obstacles. In contrast, passive methods try to detect obstacles based on passive measurements of the scene, e.g., in camera images. They have the advantage that they work over a wide range of weather and lighting conditions, offer a high resolution, and that cameras are cheap. At the same time, a wide field of view can be covered using, for example, fisheye cameras. In this paper, we therefore present an obstacle detection system for static objects based on camera images. We use stereo vision techniques [1], [2], [3] to obtain a 3D model of the scene. Our main motivation is to enable self-driving cars to detect static objects such as parked cars and signposts, determine the amount of free space around them, and measure the distance between obstacles, e.g., to determine the size of an empty parking spot. We therefore detect obstacles as objects protruding from the ground [4]. Most existing stereo vision-based techniques rely on classical forward-facing binocular stereo cameras with a relatively narrow field of view and visual (inertial) odometry (VIO) systems to provide accurate vehicle poses. These systems are mainly targeted at detecting objects in front of the car and are therefore used in standard on-road forward driving situations. For many maneuvers, such as parking into a parking spot or navigating in a narrow parking garage, a full surround view is very important. We show that for such situations accurate obstacle detections can be obtained from a system that uses only monocular fisheye cameras and the less accurate poses provided by the wheel odometry of the car, if the noisy individual detections are properly fused over time. The resulting system does not require complex VIO systems, but simply exploits information already available, while running in real-time on our test vehicle; we thus avoid any unnecessary delay a VIO system might introduce. This paper makes the following contributions: We describe the overall obstacle detection system and explain each part in detail, highlighting the rationale behind our design decisions. We demonstrate experimentally that highly precise vehicle poses are not required for accurate obstacle detection and show that a proper fusion of individual measurements can compensate for pose errors. Self-driving cars are currently a very active field of research and we believe that the proposed system and our results will be of interest to a significant part of the researchers working in this field. To our knowledge, ours is the first system that uses monocular fisheye cameras and relies only on the wheel odometry. The paper is structured as follows: The remainder of this section discusses related work. Sec. II provides an overview of both our vehicle setup and our obstacle detection system. Sec. III explains the computation of the depth maps. Obstacle extraction is described in Sec. IV, while Sec. V details how to fuse detections from multiple depth maps. Sec. VI experimentally evaluates the proposed method.
Fig. 1. Overview of the obstacle detection system proposed in this paper.

Common scene representations include a height map [4], [9], [13] or a volumetric data structure [14] into which the depth measurements are fused. Occupancy grids are probably the most popular scene representation, as they not only provide information about the positions of the occupied space but also about free space [15]. In the case of ground-bound vehicles, e.g., cars moving on planar surfaces, obstacles correspond to objects protruding from the ground: [9] compute a height profile from stereo measurements and subsequently estimate for each 2D occupancy grid cell whether it corresponds to free space or occupied space. While [9] use a probabilistic model, [16] employ dynamic programming on the grid cells to determine free space. Given the free space, stixels can be used to compactly represent obstacles as boxes standing on the ground [4]. Overhanging structures and archways can be handled by allowing multiple height map layers [13]. In this paper, we follow a similar approach and model obstacles as objects extruding from the ground. While existing work requires high-precision sensors [16] or visual odometry methods to obtain precise vehicle poses before fusing the depth data [9], [17], we show that using the less accurate wheel odometry readily available in every car is sufficient.
The disadvantage of most stereo cameras is their limited field of view (FoV). Thus, car manufacturers are beginning to additionally integrate wide-FoV [7] or fisheye cameras [18] to cover the entire scene surrounding the vehicle. The extremely wide FoV of fisheye cameras enables them to also observe objects close to the vehicle, i.e., they are well-suited for obstacle detection. In this paper, we thus use such cameras and show experimentally that we are able to accurately detect obstacles through motion stereo. [19] use motion stereo, based on very accurate poses from GPS/INS measurements, to generate large-scale urban reconstructions by estimating the 3D geometry of the scene. In contrast, our approach determines which parts of the environment are occupied by obstacles and which are free space. As illustrated in Fig. 1, our obstacle detection framework consists of three main stages. First, we extract a depth map for each camera mounted on the car using multi-view stereo matching on a sequence of camera frames recorded by the moving car. The camera poses required for this step are directly obtained from the wheel odometry and the extrinsic calibration of the cameras. No visual odometry system is employed to refine the poses. The depth maps provide a 3D reconstruction of the surroundings of the car but do not offer any information about which structures are obstacles that need to be avoided and which parts correspond to free space that the car can move through. In the second stage, we thus detect and extract both obstacles and free space from each individual depth map. Since obstacles are objects protruding from the ground, obstacle detection is performed in 2D. One of the shortcomings of using wheel odometry is that the rotation of the wheels is only discretely sampled, which leads to a slight oscillation around the true traveled distance. This is not a problem in terms of determining the position of the car, but it leads to uncertainty in the estimated depth maps due to a slightly inaccurate baseline in the stereo matching. Therefore, the third stage fuses the obstacle detections over several camera frames to obtain a more accurate estimation of the occupied space around the car. Depending on the use case, the fusion can be done independently per camera or as a single fusion that combines data from multiple cameras. We integrated our obstacle detection system into two VW Golf VI cars that are equipped with a system of four fisheye cameras mounted on the car with minimally overlapping FoVs. The cameras are installed in the side mirrors as well as in the front and back of the car and each have a 185° FoV, jointly covering the whole field of view around the car. They record at a frame rate
of 12.5 Hz and are triggered synchronously. Besides the cameras, we also utilize the wheel odometry of the car, which provides a 3-degrees-of-freedom pose of the car on the ground plane by measuring the rotation of the wheels. The cameras are calibrated with respect to the odometry frame using a simultaneous localization and mapping-based calibration pipeline [20]. The cars are the test platforms of the V-Charge project [18], whose aims are fully automated navigation and parking. As such, they are equipped with other environment perception sensors, which are not used for our system. However, since the output of our method is simply a set of occupied and free space estimations, our results can easily be fused into a combined obstacle map together with similar data from other sensors. All sensors used are either already built into today's cars or close-to-market. For the processing of the sensor data and to perform the navigational tasks required for automated driving, the cars have a cluster of 6 PCs installed in the trunk.
Fig. 2. Results of our depth map computation procedure. (Left) image of the reference view; (middle) the depth map computed without the ground direction plane-sweep; (right) the complete depth map computation procedure including the ground directions. The depth maps are coloured with respect to height above ground instead of depth (color pattern repeating every 0.5 meters), to illustrate the quality of the ground. Without the ground direction plane-sweep, the ground is reconstructed rather bumpily (cf. middle). In contrast, our complete procedure obtains a much smoother reconstruction.

Regions close to the car can additionally suffer from motion blur due to their close proximity. To reduce noise in the areas where the ground plane is visible in the images, we introduce a two-stage approach which first checks if the ground plane matches sufficiently well in the images and only reconstructs other structures if a matching patch is unlikely to belong to the ground plane. Standard binocular stereo matching takes two stereo-rectified images as input and computes a depth map for one of the two images. In our case, images are recorded at a continuous frame rate. This means we have the option to match more than one image to a single reference image to increase the quality of the computed depth maps by increasing the baseline.
For general camera configurations, it is not possible to stereo-rectify a set of more than two images. To avoid this problem, plane-sweeping stereo [1] is usually used, since it does not require any rectification. The main idea of plane-sweeping is to match a set of images to a reference image by projecting them onto a plane hypothesis and then back into the reference image. The images warped through this procedure and the reference image are then compared using an image dissimilarity measure, which is evaluated over a small matching window. If the tested plane hypothesis is close to the true depth of a pixel in the reference image, the corresponding dissimilarity value will be low. Testing many plane hypotheses and taking the depth induced by the best matching plane for each pixel then produces a depth map for the reference image. Originally, plane-sweeping was proposed for pinhole images, where the images are warped into the reference view through a planar homography. In order to cover a wider field of view with each camera, and thus enable obstacle detection around the car, our setup uses fisheye cameras. While this increases the complexity of the warping process slightly, depth maps can still be computed in real-time on a graphics processing unit (GPU) [3]. The plane-sweeping algorithm locally approximates the reconstructed 3D structure as a plane. If the normal direction of the plane hypothesis is not well aligned with the actual surface direction, the warped image will be locally distorted with respect to the reference image, which increases the dissimilarity score even for matching patches. This can be overcome by aligning the sweeping plane directions with the predominant directions in the scene [2]. We therefore sweep planes in two directions: since we are interested in detecting obstacles on the ground, one sweep direction uses planes parallel to the ground plane, while the others are fronto-parallel to the camera. The extrinsic camera calibration, which gives us the height of the cameras with respect to the surface the car drives on, provides a good initial estimate of the ground plane. Hence, we only need to test very few planes for the ground direction; we used 10 in our experiments. For the fronto-parallel sweep, we space the planes inversely proportional to the depth, resulting in an even sampling in disparity space. For each pixel in the reference image and each plane, we obtain one dissimilarity score from each image that is matched against the reference image. To obtain a single cost per plane for each pixel in the reference image, we simply average these dissimilarity scores. Notice that this means we do not utilize any occlusion handling (cf. [3]). Occlusion handling could slightly improve the quality of the depth maps around occlusion boundaries, but this small gain comes with a set of disadvantages. Firstly, occlusion handling
slows down the depth map computation process. Secondly, doing occlusion handling the way it was initially proposed in [21] would mean that we would compute the depth map for at least one frame before the newest one available. This would lead to an unnecessary delay of the data. As we could not observe any significant gains with it, and since it comes with the aforementioned problems, we refrained from using occlusion handling in our experiments. The depth map is finally extracted as the depth of the plane with minimal cost, using a winner-takes-all strategy. The last step of our depth map computation procedure is to combine the potentially incomplete depth maps computed for each sweeping direction. For this, we need to define the output of the plane-sweeping algorithm a bit more formally. For the sweeps in the ground and fronto-parallel directions, we obtain depth maps Z_g(x, y) and Z_f(x, y), respectively. The computation of the depth maps also provides, for each pixel, the matching cost C(x, y) of the best plane and the uniqueness of this match. Let C2(x, y) be the cost of the second-best matching plane for pixel (x, y). The uniqueness ratio is then defined as U(x, y) = C(x, y) / C2(x, y). Both of these additional values can be used to filter the depth maps: a low matching cost implies that the image patches match well, while a large uniqueness ratio indicates ambiguities, e.g., uniformly textured regions or repeating patterns will result in a ratio close to 1. We use two sets of thresholds for the two sweeping directions, (T_C^g, T_U^g) and (T_C^f, T_U^f), respectively. Let C_g(x, y), U_g(x, y) and C_f(x, y), U_f(x, y) denote the dissimilarity cost and the uniqueness ratio for the ground-parallel and fronto-parallel sweeps, respectively. The two depth maps are merged as

    Z(x, y) = Z_g(x, y)   if C_g(x, y) ≤ T_C^g and U_g(x, y) ≤ T_U^g,
    Z(x, y) = Z_f(x, y)   else if C_f(x, y) ≤ T_C^f and U_f(x, y) ≤ T_U^f,
    Z(x, y) = invalid     otherwise,

i.e., the merged map preferentially uses matches on the ground plane. In our experiments, we found this behavior to be desirable, as it leads to a smoother reconstruction of the ground, which is important to avoid false positive obstacle detections.
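A minimal MATLAB sketch of this merging rule, assuming the per-pixel depth, cost and uniqueness maps (Zg, Zf, Cg, Ug, Cf, Uf) and the four thresholds are already available; the variable names are ours, not from the paper:

```matlab
useG = (Cg <= TCg) & (Ug <= TUg);          % pixels where the ground sweep is trusted
useF = ~useG & (Cf <= TCf) & (Uf <= TUf);  % else fall back to the fronto-parallel sweep
Z = nan(size(Zg));                         % NaN marks invalid (filtered-out) pixels
Z(useG) = Zg(useG);
Z(useF) = Zf(useF);
```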
For our experiments, we use zero-mean normalized cross-correlation (ZNCC) scores over a 9 × 9 pixel window for measuring the image similarity. The matching costs C(x, y) are computed as the negative ZNCC score normalized to the interval [0, 1], i.e., a matching cost of 1 corresponds to a ZNCC score of −1 and a matching cost of 0 corresponds to a ZNCC score of 1. There are 10 planes used for sweeps parallel to the ground and 50 planes for the fronto-parallel direction. The filtering thresholds are set to T_U^g
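As a concrete illustration of this matching cost, the following minimal MATLAB sketch computes the normalized ZNCC cost for one pair of patches (the function name and the per-patch formulation are ours; the paper evaluates this over a 9 × 9 window at every pixel):

```matlab
function c = znccCost(refPatch, warpPatch)
    % Zero-mean normalized cross-correlation, mapped to a cost in [0, 1]:
    % ZNCC =  1 (perfect match) -> cost 0;  ZNCC = -1 (worst match) -> cost 1.
    r = double(refPatch(:))  - mean(double(refPatch(:)));
    s = double(warpPatch(:)) - mean(double(warpPatch(:)));
    zncc = (r' * s) / (norm(r) * norm(s) + eps);
    c = (1 - zncc) / 2;
end
```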
Fig. 3. Illustration of the coordinate systems used in the paper. The x-, y-, and z-axes are colored in red, green, and blue, respectively.
The grid is placed in the ground plane, i.e., it coincides with the x-y plane of the vehicle frame. The cameras are mounted such that their x-axes are parallel to the ground. Thus, the x- and y-axes of the occupancy grid are chosen to be parallel to the projections of the x- and z-axes of the camera onto the ground plane. As illustrated in Fig. 3, the origin of the occupancy grid is the projection of the camera center c_odo, expressed in the vehicle frame, onto the x-y plane. We express coordinates in the occupancy map by an (angle, disparity) pair (d, φ), where the disparity d = 1/y_grid corresponds to the inverse depth of the measurement. We use disparities instead of the original y-coordinates, since the depth maps are already computed based on disparities by spacing the planes inversely proportional to the depth. As a consequence, the occupancy grid has a higher resolution very close to the camera, which would lead to an unnecessarily high grid resolution and thus memory and time consumption. In order to guarantee a minimal cell size in the depth direction, we virtually shift the grid by y_shift. The definition of the final occupancy grid coordinates (d_grid, φ_grid) can now be stated in terms of the Cartesian coordinates (x_grid, y_grid) in the occupancy grid frame as

    d_grid = 1 / (y_grid + y_shift),    φ_grid = arctan(x_grid / y_grid).
The occupancy grid coordinates (d_grid, φ_grid) are only defined for Cartesian coordinates with y_grid > 0. Notice that the resulting grid has the shape of an isosceles trapezoid, similar to the polar grid in [16]. Let ^odo R_cam ∈ R^(3×3) and ^odo t_cam ∈ R^3 be the rotation and translation that transform a depth measurement x_cam from the local camera coordinate system into the vehicle frame, i.e., x_odo = ^odo R_cam · x_cam + ^odo t_cam. As shown in Fig. 3, the corresponding grid cell is obtained by projecting x_odo − c_odo onto the x-y plane of the grid frame, computing the angle and disparity, and performing the shift by y_shift. For each grid cell (d, φ), we store the number F(d, φ) of points from the depth map that vote for free space in this specific cell, the number O(d, φ) of points that vote for occupied space, and additionally the average disparity d̄_grid of the points voting for occupied space within the cell. If the vehicle-frame position x_odo of a depth measurement is close to (or below, in case of errors in the depth map) the ground, it votes for free space in its corresponding cell. If x_odo lies above the ground plane, up to a maximum height, it votes for occupied space in its cell and the corresponding average disparity value is updated.
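A minimal MATLAB sketch of this voting step; the helper gridIndex (mapping a (d, φ) pair to a cell index), the ground tolerance hMin, and all variable names are our assumptions:

```matlab
function grid = voteMeasurement(grid, x_odo, c_odo, yShift, hMin, hMax)
    p   = x_odo - c_odo;               % measurement relative to the grid origin
    phi = atan2(p(1), p(2));           % angle coordinate
    d   = 1 / (p(2) + yShift);         % (shifted) disparity coordinate
    [r, c] = gridIndex(grid, d, phi);  % assumed cell lookup helper
    if p(3) <= hMin                    % close to or below the ground: free space
        grid.F(r, c) = grid.F(r, c) + 1;
    elseif p(3) <= hMax                % protrudes from the ground: occupied
        grid.O(r, c) = grid.O(r, c) + 1;
        % running mean of the disparity of occupied votes in this cell
        grid.dAvg(r, c) = grid.dAvg(r, c) + (d - grid.dAvg(r, c)) / grid.O(r, c);
    end
end
```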
Fig. 4. Results of our obstacle detection approach: (Top row) Two input images. (Bottom row) Obstacles detected in three different frames, projected onto the ground plane. The red dots denote the obstacle positions and the cyan lines indicate the uncertainty area along the ray. The last image contains all obstacles extracted in an interval of 1 second around the corresponding frame and shows that simply adding individual detections produces significant noise.

For each grid cell, we aim to determine whether it corresponds to free space or is occupied, i.e., contains an obstacle. In robotics, this is often achieved in terms of an occupancy grid using an inverse sensor model [10]. We follow a different approach, which originates from the fusion of laser point clouds [24] and has since been used successfully in many computer vision systems [25], [26], [27]. Its main idea is to follow each ray originating from a camera center until it hits the first obstacle. For all grid cells traversed by the ray, a negative weight is added to denote free space. At the measurement, the ray enters the object, and hence the space behind the measurement should be occupied, which is denoted by adding a positive weight. Since we do not know the thickness of each object, we only enter positive weights for a small region behind the obstacle. After adding the measurements for all cameras that should be considered, a cell with a negative accumulated weight is considered free space and a cell with a positive weight occupied space. Cells with a weight of exactly zero are considered unobserved. For real-time processing, we need to limit the size of the grid. We achieve this by only updating a 10 × 10 meter area which is placed such that it covers the area observed in the current camera frame. Which part of the fusion grid needs to be updated is determined based on the wheel odometry reading. For each cell inside the viewing area of the current camera frame, we determine the obstacle which lies on the ray from the camera center to the grid cell. The weight
added to the current grid cell is defined by comparing the distance l_obst of the obstacle to the camera with the distance l_cell between the cell and the camera. As explained in Sec. IV-B, there are two types of obstacles: conventional obstacles and the "free space" obstacles placed at the end of the definitively visible space along a ray. Consequently, two different types of weights are used. The free-space weight w_f is used for both types, while the obstacle weight w_o is only employed for the standard obstacles. For both types of obstacles, an
Fig. 6. Fusion results for an indoor sequence. Reflections on the ground cause erroneous detections while backlight complicates precise depth estimation. Still, our system is able to outline the positions of the obstacles.

uncertainty interval (l_obst − u1, l_obst + u2) along the ray is given, and we distribute the weight of the obstacle along this interval. Thus, obstacles that are measured with low accuracy will only contribute little weight to each grid cell. The free-space weight for the cells along the ray is of the form

    w_f(l_cell) = −(1 + k·[l_cell < l_obst − u1]),

where the indicator [·] is 1 when the condition holds, so that the constant k > 0 is added only to cells in front of the obstacle. The obstacle weight w_o should only be non-zero in the uncertainty region and is thus of the form

    w_o(l_cell) = 1 / (u1 + u2)   for l_obst − u1 ≤ l_cell ≤ l_obst + u2,  and 0 otherwise,

so that the total obstacle weight is spread evenly over the uncertainty interval.
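A minimal MATLAB sketch of one ray update under these assumed weight forms; cellsIdx are the linear indices of the grid cells the ray traverses and lCell their distances to the camera (all names are ours):

```matlab
function W = updateRay(W, cellsIdx, lCell, lObst, u1, u2, k, isFreeSpaceObst)
    % Accumulate fusion weights for the cells along one camera ray.
    for i = 1:numel(cellsIdx)
        l = lCell(i);
        if l < lObst - u1                           % definitively visible: free space
            W(cellsIdx(i)) = W(cellsIdx(i)) - (1 + k);
        elseif l <= lObst + u2 && ~isFreeSpaceObst  % uncertainty region: spread w_o
            W(cellsIdx(i)) = W(cellsIdx(i)) + 1 / (u1 + u2);
        end
    end
end
```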
Additionally, we impose a minimal uncertainty area to make sure that the weight is spread over at least a few cells in the grid. We also discard measurements which have a very large uncertainty interval of more than 4 meters. All grid cells are initialized with weight zero and are updated whenever new obstacle detections are available, by adding the respective weights to the grid cells. The update is constantly performed in a separate compute thread, concurrent to depth map generation and obstacle detection. The area of the grid that is updated is set to 400 × 400 cells, resulting in a 2.5 cm resolution for the 10 m × 10 m area. The constant for free space is set to k = 4. Figures 5 and 6 depict results of the fusion approach, where free space is colored green, occupied space red, and unobserved space black.

Car accidents are the key problem of today's traffic world. This is particularly strongly felt in urban areas, where car density is largest. According to Lithuanian car accident statistics for 2011, the most common type of car accident is the collision, representing 44% of all car accidents. Accidents involving pedestrians and obstacles can also be assigned to a similar accident type. Summing everything up, we get 82% of all accidents [1]. Most car accidents happen due to the human factor: somnolence, inattention, absent-mindedness and slow reaction. All these factors are critical for driving safety. Using a stereo vision system is one way to reduce the number of car accidents. Stereo vision based traffic observation systems are in an active development period. Scientists and many well-known car manufacturers are developing durable, accurate and real-time driver assistance systems. Many works research and develop vision systems for lane tracking [2], pedestrian or bicyclist detection [3], car detection [4] or driver monitoring [5].
Fig. 1. Typical structure of stereo vision [6].
The main purpose of using a stereo vision system is its capability to transform a 2D view into 3D information, usually called a depth map. A typical visual system consists of two digital video cameras displaced horizontally from one another to obtain different views of a scene, in a manner similar to human binocular vision (Fig. 1). The 3D information is derived by finding corresponding points across the camera images. Correspondences lie on the same epipolar line. To simplify the matching process, camera images are often rectified, with the result that each epipolar line corresponds to an image scan line. Before image rectification, the system has to meet these requirements: the camera image planes are parallel, and the focal points' heights and the focal lengths are the same. To get a depth map, a disparity value must be calculated for every image pixel. The disparity value for a target point p is calculated using the standard expression

    d(p) = x_L(p) − x_R(p),

the horizontal offset between the point's image coordinates in the left and right views; the depth then follows as Z = f·b/d for focal length f and baseline b.
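For illustration, a minimal MATLAB sketch of this computation using the toolbox block-matching function, which stands in for whatever matcher the authors used (the text does not specify one); the file names, focal length and baseline are placeholders:

```matlab
imgL = rgb2gray(imread('left.png'));   % hypothetical rectified image pair
imgR = rgb2gray(imread('right.png'));
d = disparityBM(imgL, imgR);           % disparity d(p) in pixels
f = 700; b = 0.12;                     % assumed focal length (px) and baseline (m)
Z = f * b ./ max(d, eps);              % depth from Z = f*b/d
```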
The high level is the last processing stage, where depth and image data are used for object extraction and recognition. The resulting object list is then used for front vehicle detection.
Fig. 2. The stereo vision system architecture consists of three data processing levels.

The suggested approach is based on dense edge map calculation and an ROI filter. Fig. 3 illustrates a block diagram of the vision system's data processing. In total, a sequence of eight steps extracts and marks cars from the 2D image set. The car extraction process starts with capturing the view images. Next, the captured 2D images are processed in the middle data processing level. Here the 2D data is transformed into 3D information, well known as a depth map. The ROI filter eliminates redundant 3D information. The remaining data is processed by the dense edge map estimation algorithm.
Fig. 3. Vision system's data processing block diagram.

The Region of Interest (ROI) filter is used to eliminate redundant 3D information. The ROI can be represented as a frustum in the 3D environment (Fig. 4). Since the system is designed to observe vehicles in front, the frustum is projected onto the roadway and covers one or more drive lanes. All data outside this space is eliminated, leaving only the significant information. The main benefit of applying the ROI filter is that it saves computational resources in the subsequent object extraction processes.
Fig. 4. Illustration of region of interest filtering.
Dense edge map estimation is basically a transformation of the depth map from the front view to the top view perspective. The value of every new map element is calculated by counting, for each image column, the points that fall at the same depth; the expression has the form

    B(d, j) = c · Σ_{i=0..h−1} [A(i, j) = d],

where A is the 3D data array (the depth map); B is the dense edge data array; i ∈ {0, 1, …, h−1}, with h the image height; j ∈ {0, 1, …, w−1}, with w the image width; d indexes the depth values; [·] is 1 when the condition holds; and c is a calculated scaling factor.
This transformation is modified following the idea of analyzing not whole objects but only their edges. The result of this transformation is an edge map in the top-view perspective. Fig. 5 depicts the transformation principle in a 3D environment. The highest density of points is located on objects' edges. Edges are highlighted by summing points in the vertical direction: the higher and sharper an object's edge, the more points will be counted in the vertical direction. Flat and low-altitude objects produce a small number of points; therefore, objects like the road can easily be eliminated using a simple threshold.
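A minimal MATLAB sketch of this accumulation, assuming depths are binned into numBins top-view rows (the binning scheme and names are our assumptions):

```matlab
function B = denseEdgeMap(A, numBins, maxDepth)
    % A: h-by-w depth map; B: numBins-by-w top-view dense edge map.
    [h, w] = size(A);
    B = zeros(numBins, w);
    bin = min(max(ceil(A / maxDepth * numBins), 1), numBins);
    for j = 1:w
        for i = 1:h
            B(bin(i, j), j) = B(bin(i, j), j) + 1;  % sum points vertically
        end
    end
    % Flat, low objects (e.g., the road) accumulate few counts per cell
    % and can be removed with a simple threshold on B.
end
```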
Fig. 5. Illustration of the 3D object dense edge estimation principle.

A virtual 3D computer reality was chosen as the very first test environment for the vision system and the proposed car extraction algorithm. Specially designed software allows simulating the stereo vision system's behavior in a virtual 3D reality. The software is based on the OpenGL and OpenCV programming libraries and is written in the C language. The software runs on a Core 2 Duo 2.5 GHz CPU with 4 GB of RAM. A similar kind of test method has been used by other authors [9], [10] to test stereo vision systems for related applications. So that the tested situations would be more realistic, a segment of street with all the accompanying infrastructure (cars, road signs, buildings, pedestrians) was designed and rendered in the virtual 3D reality. In this test stage the depth map was generated using OpenGL graphics API methods, which gives the best depth map quality; the depth map can also be estimated using any kind of stereo matching algorithm. The simulation software allows imitating the stereo camera view and capturing the left and right camera images. The estimated depth map can be represented as a point cloud by converting the 3D data to real-world coordinates. Different traffic situations and vision system configurations can be emulated and repeated many times. This is the main advantage of the described simulation software. The opposite situation is presented in situation no. 2. The detected car is marked with a red rectangle; red indicates that the car is at an unsafe distance and the driver must pay attention. Close objects appear as big blobs located near the bottom of the dense edge map. The remaining two situations (no. 3 and no. 4) show the system's ability to detect not only cars ahead but also other kinds of obstacles, for instance pedestrians crossing the street. Another aspect of the vision system is its ability to detect multiple obstacles.
Fig. 6. Four traffic situation examples simulated using virtual 3D reality. Left image: camera view; right image: dense edge map.
CHAPTER-2 PROPOSED METHOD
A pair of cameras is required to implement stereo vision. Since the application of this work is to replace existing LIDAR-based perception systems, the system developed here needs to detect obstacles all around the vehicle. Two Ricoh Theta cameras were used to capture 360° by 180° spherical panoramas. The Ricoh Theta camera comprises two opposite-facing 185° fisheye lenses and sensors. The two images are stitched by the Ricoh application to form a 360° by 180° spherical image. The two cameras were displaced vertically, since this results in a loss of information only directly above and below the cameras, which are areas not of interest to us. For the purposes of this work, the camera was mounted on a tripod at one position to capture the first image. A spirit level was used to ensure that the camera was horizontal. The camera was then vertically displaced through a known height on the tripod to the second position to capture the second image.
Fig. 1: Vertically displaced Ricoh Theta cameras mounted on a vehicle.
DEPTH MAP ESTIMATION

Image pair capture: The spherical images captured by the Ricoh cameras are in equirectangular format, i.e. they are uniformly sampled in azimuth and altitude angles. The azimuth varies from −π to π along the x-axis of the image and the altitude varies from −π/2 to π/2 along the y-axis of the image. The images are of resolution 2048 × 1024 pixels.
The disparity for the spherical image pair is the change in the altitude angle between the two images, since the cameras are displaced vertically. There are several techniques to estimate disparity values, such as those outlined in [14], [15], [16]. The procedure adopted in this work involves estimating the optical flow of each point. The optical flow estimation is done using the optical flow software developed by [17]. As the cameras are vertically displaced, the optical flow is in the vertical direction. The optical flow at every pixel gives us the disparity value, in number of pixels, at that pixel. To speed up the disparity estimation step, the original image is down-sampled to 1022 × 512 pixels. To further speed up the computation, the top and bottom portions of the image, which do not contain important information for the vehicle, are cropped out: an altitude range of 30° to −30° is considered for the process. The resolution of the image so obtained is 1022 × 171 pixels. The image is further down-sampled to 767 × 129 pixels, on which the rest of the algorithm is run. The black-and-white obstacle map obtained as the output of the obstacle detection algorithm is then up-sampled back to 1022 × 171 pixels.
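A minimal MATLAB sketch of this step; the report uses the optical flow software of [17], for which MATLAB's Farneback implementation (Computer Vision Toolbox) stands in here, and the file names are placeholders:

```matlab
upper = rgb2gray(imresize(imread('pano_upper.jpg'), [512 1022]));
lower = rgb2gray(imresize(imread('pano_lower.jpg'), [512 1022]));

% Keep only the +/-30 deg altitude band (the middle third of the 180 deg height).
band  = round(512/3) + 1 : round(2*512/3);
upper = upper(band, :);
lower = lower(band, :);

of   = opticalFlowFarneback;
estimateFlow(of, upper);          % first call initializes the estimator
flow = estimateFlow(of, lower);   % flow from the upper to the lower view
disparityPx = flow.Vy;            % vertical flow = altitude disparity (pixels)
```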
Fig. 3: Cropped and down-sampled original lower image. The image is of resolution 767 by 129 pixels with an altitude range of 30° to −30°.
Fig. 4: Disparity map generated from the optical flow estimation. Brighter pixels correspond to greater disparity and hence smaller depth. We observe a uniform gradient of disparity along the ground plane, in the bottom half of the image.
Depth estimation: The next step is to determine the depth map. This is achieved through some basic geometry and trigonometric transformations applied to the disparity map.
Consider the triangle formed by the two camera centers and the point P. With the vertical baseline, the angle at P equals the disparity, and the law of sines gives

    PA = l · cos(θ1 + α) / sin(α),

where l is the baseline distance, α is the disparity value, θ1 is the altitude angle of the point with respect to the upper camera center, and PA is the distance from the point to the upper camera center. The depth is then calculated as

    depth = PA · cos(θ1).

Using the above formulae, the depth value for every pixel can be calculated from the corresponding disparity value.
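A minimal MATLAB sketch of this conversion; the mapping from image rows to altitude angles and all names are our assumptions:

```matlab
function depth = disparityToDepth(disparityPx, Hfull, baseline, altTop)
    % disparityPx: cropped disparity map (pixels); Hfull: full panorama height
    % in pixels (spanning 180 deg); altTop: altitude of the top row (rad).
    [h, w]   = size(disparityPx);
    radPerPx = pi / Hfull;                          % equirectangular scaling
    theta1   = repmat(altTop - ((1:h)' - 0.5) * radPerPx, 1, w);
    alpha    = disparityPx * radPerPx;              % disparity in radians
    PA       = baseline .* cos(theta1 + alpha) ./ max(sin(alpha), eps);
    depth    = PA .* cos(theta1);                   % horizontal distance to P
end
```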
Fig. 5: Depth map generated from the disparity map. Darker pixels correspond to smaller depth.
OBSTACLE DETECTION

A geometry-based clustering technique is proposed to detect obstacles in the scene. Before we dive into the obstacle detection algorithm, we need to come up with a definition of an obstacle. It is important to understand and visualize obstacles in the 3D space. Let us consider the most general case of obstacles found above a ground plane and focus our analysis on this case; we later discuss some of the other kinds of obstacles or spaces in a scene that need to be avoided by a vehicle. Obstacles, then, are points or areas in the scene which are at a height above the dominant ground plane. Mathematically, we define obstacles in terms of two distinct points in space in the following way: two points P1 and P2 belong to the same obstacle and are said to be compatible if

    H_T ≤ |P2z − P1z| ≤ H_max,

i.e., the difference between the elevations of the two points is within a range defined by H_T and H_max, and
    (P2 − P1) · (P3 − P1) / (‖P2 − P1‖ ‖P3 − P1‖) > cos θ_T,

where the point P3 is obtained by displacing P1 through H_max in the z direction. This condition enforces that the angle between the line P1P2 and the z (elevation) direction is less than a threshold value. In the above definition, H_T is the minimum height of an object for it to be considered an obstacle and H_max is the maximum height of an obstacle. The value of θ_T can be set based on the accepted traversable slope, so as to classify obstacles appropriately. The definition is illustrated in Figure 6. We construct an upward cone from P1 in the 3D space based on the values chosen for the three parameters H_T, H_max and θ_T. If any points in the scene lie in the frustum as shown, those points are classified as obstacle points.
Fig. 6: Cone formed in the 3D space with P1 at the center. The point P2, if it lies within the frustum formed by H_T, H_max and θ_T, is classified as an obstacle point. (Source: [11])
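The two conditions translate directly into a small MATLAB predicate (a sketch; P1 and P2 are 1×3 [x y z] vectors):

```matlab
function tf = isCompatible(P1, P2, HT, Hmax, thetaT)
    dz = abs(P2(3) - P1(3));    % elevation difference between the points
    P3 = P1 + [0 0 Hmax];       % P1 displaced by Hmax along z
    c  = dot(P2 - P1, P3 - P1) / (norm(P2 - P1) * norm(P3 - P1));
    tf = (dz >= HT) && (dz <= Hmax) && (c > cos(thetaT));
end
```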
From the depth map generated before, we have the depth of every pixel. All the points (pixels) are now 3D points in the azimuth-altitude-depth space. The points are transformed from the azimuth-altitude-depth space to the X-Y-Z space. A naive algorithm would examine all point pairs, which would have a complexity of O(N²). The algorithm proposed here is more efficient and compares a point only with a limited set of points: the points contained within the trapezium formed by projecting the frustum (or cone) formed above the point (referred to as the base point from here on) onto the image plane. The projected trapezium is scaled according to the depth of the base point. The parameters of the trapezium are

    h_T = (H/π) · (H_T / depth),

where H is the height of the image and depth is the depth of the base point; the first term is a scaling factor that converts a height to an altitude angle and thus to a number of pixels. Similarly,

    h_max = (H/π) · (H_max / depth).
Here, h_T is the height of the closer parallel side of the trapezium from the base point and h_max is the height of the farther parallel side from the base point. The upper left angle of the trapezium is the same as θ_T, the threshold angle chosen for the compatibility definition earlier. We loop through all the points in the trapezium and form a point pair of each point with the base point. If the pair of points satisfies the definition of obstacles, i.e. is compatible, then we classify that point in the trapezium as an obstacle point. This algorithm has a better time complexity than the naive algorithm: if K denotes the average number of points in the trapezium, the complexity is O(KN).

Algorithm:
• Classify all points as non-obstacles.
• Scan through all the pixels, P, in the image.
• Determine the set of pixels, T_P, in the projected trapezium of P on the 2D image plane.
• Examine all points in T_P and determine the set O_P of points P_i in T_P compatible with P.
• If O_P is not empty, classify all points of O_P as obstacle points.
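A minimal MATLAB sketch of this scan; backproject() (image pixel plus depth to an X-Y-Z point) and isCompatible() (above) are assumed helpers, and the trapezium is approximated by its bounding rectangle for brevity:

```matlab
[h, w] = size(depthMap);
obstacle = false(h, w);
for r = 1:h
    for c = 1:w
        d  = depthMap(r, c);
        P1 = backproject(r, c, d);               % base point in X-Y-Z
        hT = max(1, round((h/pi) * HT   / d));   % near side, in pixels
        hM = round((h/pi) * Hmax / d);           % far side, in pixels
        dW = round(hM * tan(thetaT));            % half-width at the far side
        for dr = hT : min(hM, r - 1)             % rows above the base point
            for dc = max(-dW, 1 - c) : min(dW, w - c)
                P2 = backproject(r - dr, c + dc, depthMap(r - dr, c + dc));
                if isCompatible(P1, P2, HT, Hmax, thetaT)
                    obstacle(r - dr, c + dc) = true;   % member of O_P
                end
            end
        end
    end
end
```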
Fig. 7: Black-and-white image of the obstacle map. The white pixels correspond to obstacle points.

Post-processing: Due to the lack of texture on the ground, we get fewer feature points in those regions and so the estimated disparity values there are inaccurate. The disparity gradient is discontinuous in those regions of the ground, resulting in noise in the output. Median filtering and a morphological closing operation are applied to the output of the obstacle detection algorithm to close the small holes.
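In MATLAB, this post-processing amounts to two library calls (the filter and structuring-element sizes here are our assumptions, not values from the report):

```matlab
clean = medfilt2(obstacle, [5 5]);          % suppress isolated noisy detections
clean = imclose(clean, strel('disk', 3));   % close small holes in obstacle blobs
```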
Fig. 8: Black-and-white image of the obstacle map after post-processing. The small holes in Figure 7 are filled.
Fig. 9: Original lower image with obstacles overlaid on it. The pixels in yellow correspond to obstacles.
Polar map: The primary application of this work is to detect obstacles so as to plan the motion of the vehicle. It is therefore important to represent the obstacle points in a manner useful for motion planning, and a polar map representation of the obstacle points is chosen. The polar map is generated by projecting the 3D obstacle points onto the ground plane. In Figure 10, the vehicle is at the center of the circle, facing 0°. The obstacle points are depicted in blue, with the radius indicating the depth from the vehicle. The small patch close to the center corresponds to the white sign board in the scene in Figure 3, and the other nearby patch corresponds to the black platform. The buildings and trees can be seen further away in the map. From the polar map, we can plan a path for the vehicle that is not obstructed by obstacles.
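A minimal MATLAB sketch of building this representation from the obstacle mask and depth map of the previous steps (the equirectangular azimuth axis is our assumption):

```matlab
[h, w] = size(depthMap);
azPerCol = linspace(-pi, pi, w);               % azimuth of each image column
[rows, cols] = find(obstacle);                 % detected obstacle pixels
theta = azPerCol(cols);                        % polar angle of each point
rho   = depthMap(sub2ind([h w], rows, cols));  % ground-plane range
polarplot(theta, rho, 'b.');                   % vehicle at center, facing 0 deg
```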
Fig. 10: The agent is at the center of the map, facing 0°. The blue points correspond to the polar positions of the obstacle points around the agent.
CHAPTER-3 RESULTS
Figure 11 depicts a scene with a good mix of obstacles at short range and long range. The algorithm detects all the obstacles in the scene. The process was tested in various outdoor settings and gave positive results in all cases. In Figure 13, we can observe that the polar map representation accurately captures the cars parked next to each other at angles 15° to 35°. Also in Figure 13, we notice a small erroneous detection at 90° on the ground. This is due to the lack of texture on the ground, which results in inaccurate disparity estimation; the obstacle detection algorithm therefore classifies it as an obstacle point. The accuracy of the obstacle detection is limited by the accuracy of the disparity estimation techniques. The programs were developed in MATLAB. The average runtime of disparity estimation is 125 s and that of obstacle detection is 240 s.
It was mentioned earlier that only obstacles above the ground were being considered in this work, but the process developed also works for potholes in the ground. When a point inside the hole is picked, the cone/frustum above it in the 3D space will contain points on the ground around the pothole. These points will be classified as obstacles, which implies that they will be avoided in the path planning process. Potholes are areas that need to be avoided, and the algorithm does exactly that.
CHAPTER-4 MATLAB
INTRODUCTION TO MATLAB: What Is MATLAB?
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include math and computation; algorithm development; data acquisition; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface building. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar non-interactive language such as C or FORTRAN. The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis.
MATLAB features a family of add-on application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.

The MATLAB System:
The MATLAB system consists of five main parts:

Development Environment:
This is the set of tools and facilities that help you use MATLAB functions and files. Many of these tools are graphical user interfaces. It includes the MATLAB desktop and Command Window, a command history, an editor and debugger, and browsers for viewing help, the workspace, files, and the search path.

The MATLAB Mathematical Function Library:
This is a vast collection of computational algorithms ranging from elementary functions like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.
The MATLAB Language:
This is a high-level matrix/array language with control flow statements, functions, data structures, input/output, and object-oriented programming features. It allows both "programming in the small", to rapidly create quick throw-away programs, and "programming in the large", to create complete, large and complex application programs.

Graphics:
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well as annotating and printing these graphs. It includes high-level functions for two-dimensional and three-dimensional data visualization, image processing, animation, and presentation graphics. It also includes low-level functions that allow you to fully customize the appearance of graphics as well as to build complete graphical user interfaces for your MATLAB applications.

The MATLAB Application Program Interface (API):
This is a library that allows you to write C and FORTRAN programs that interact with MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling MATLAB as a computational engine, and for reading and writing MAT-files.
MATLAB WORKING ENVIRONMENT: MATLAB DESKTOP
The MATLAB Desktop is the main MATLAB application window. The desktop contains five sub-windows: the Command Window, the Workspace Browser, the Current Directory window, the Command History window, and one or more Figure windows, which are shown only when the user displays a graphic. The Command Window is where the user types MATLAB commands and expressions at the prompt (>>) and where the output of those commands is displayed. MATLAB defines the workspace as the set of variables that the user creates in a work session. The Workspace Browser shows these variables and some information about them. Double-clicking on a variable in the Workspace Browser launches the Array Editor, which can be used to obtain information about, and in some instances edit, certain properties of the variable.

The Current Directory tab above the Workspace tab shows the contents of the current directory, whose path is shown in the Current Directory window. For example, in the Windows operating system the path might be as follows: C:\MATLAB\work, indicating that the directory "work" is a subdirectory of the main directory "MATLAB", which is installed in drive C. Clicking on the arrow in the Current Directory window shows a list of recently used paths, and clicking on the button to the right of the window allows the user to change the current directory. MATLAB uses a search path to find M-files and other MATLAB-related files, which are organized in directories in the computer file system. Any file run in MATLAB must reside in the current directory or in a directory that is on the search path. By default, the files supplied with MATLAB and MathWorks toolboxes are included in the search path. The easiest way to see which directories are on the search path, or to add or modify the search path, is to select Set Path from the File menu on the desktop, and then use the Set Path dialog box. It is good practice to add any commonly used directories to the search path to avoid repeatedly having to change the current directory. The Command History window contains a record of the commands a user has entered in the Command Window, including both current and previous MATLAB sessions. Previously entered MATLAB commands can be selected and re-executed from the Command History window by right-clicking on a command or sequence of commands; this launches a menu from which various options, in addition to executing the commands, can be selected. This is a useful feature when experimenting with various commands in a work session.
Using the MATLAB Editor to Create M-Files:

The MATLAB editor is both a text editor specialized for creating M-files and a graphical MATLAB debugger. The editor can appear in a window by itself, or it can be a sub-window in the desktop. M-files are denoted by the extension .m, as in pixeldup.m. The MATLAB editor window has numerous pull-down menus for tasks such as saving, viewing, and debugging files. Because it performs some simple checks and also uses color to differentiate between various elements of code, this text editor is recommended as the tool of choice for writing and editing M-functions. To open the editor, type edit at the prompt; typing edit filename opens the M-file filename.m in an editor window, ready for editing. As noted earlier, the file must be in the current directory or in a directory on the search path.

Getting Help:
The principal way to get help online is to use the MATLAB Help Browser, opened as a separate window either by clicking on the question mark symbol (?) on the desktop toolbar, or by typing helpbrowser at the prompt in the Command Window. The Help Browser is a web browser integrated into the MATLAB desktop that displays HyperText Markup Language (HTML) documents. The Help Browser consists of two panes: the Help Navigator pane, used to find information, and the display pane, used to view the information. Self-explanatory tabs in the navigator pane are used to perform a search.
CHAPTER-5 DIGITAL IMAGE PROCESSING
BACKGROUND:
Digital image processing is an area characterized by the need for extensive experimental work to establish the viability of proposed solutions to a given problem. An important characteristic underlying the design of image processing systems is the significant level of testing and experimentation that normally is required before arriving at an acceptable solution. This characteristic implies that the ability to formulate approaches and quickly prototype candidate solutions generally plays a major role in reducing the cost and time required to arrive at a viable system implementation.
What is DIP?
An image may be defined as a two-dimensional function f(x, y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y and the amplitude values of f are all finite discrete quantities, we call the image a digital image. The field of DIP refers to processing digital images by means of a digital computer. A digital image is composed of a finite number of elements, each of which has a particular location and value; these elements are called pixels. Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, unlike humans, who are limited to the visual band of the electromagnetic (EM) spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma rays to radio waves. They can also operate on images generated by sources that humans are not accustomed to associating with images. There is no general agreement among authors regarding where image processing stops and other related areas, such as image analysis and computer vision, start. Sometimes a distinction is made by defining image processing as a discipline in which both the input and output of a process are images. This is a limiting and somewhat artificial boundary. The area of image analysis (image understanding) is in between image processing and computer vision.

There are no clear-cut boundaries in the continuum from image processing at one end to complete vision at the other. However, one useful paradigm is to consider three types of computerized processes in this continuum: low-, mid-, and high-level processes. A low-level process involves primitive operations such as image preprocessing to reduce noise, contrast enhancement and image sharpening. A low-level process is characterized by the fact that both its inputs and outputs are images.

A mid-level process on images involves tasks such as segmentation, description of objects to reduce them to a form suitable for computer processing, and classification of individual objects. A mid-level process is characterized by the fact that its inputs generally are images but its outputs are attributes extracted from those images. Finally, higher-level processing involves "making sense" of an ensemble of recognized objects, as in image analysis, and, at the far end of the continuum, performing the cognitive functions normally associated with human vision. Digital image processing, as already defined, is used successfully in a broad range of areas of exceptional social and economic value.
What is an image?
An image is represented as a two-dimensional function f(x, y), where x and y are spatial coordinates and the amplitude of f at any pair of coordinates (x, y) is called the intensity of the image at that point.
Grayscale image:
A grayscale image is a function I(x, y) of the two spatial coordinates of the image plane, where I(x, y) is the intensity of the image at the point (x, y). I(x, y) takes non-negative values; assuming the image is bounded by a rectangle [0, a] × [0, b], we have I : [0, a] × [0, b] → [0, ∞).
Color image:
A color image can be represented by three functions: R(x, y) for red, G(x, y) for green and B(x, y) for blue.
An image may be continuous with respect to the x and y coordinates and also in amplitude. Converting such an image to digital form requires that the coordinates as well as the amplitude be digitized. Digitizing the coordinate values is called sampling. Digitizing the amplitude values is called quantization.
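To make quantization concrete, here is a minimal sketch, assuming f is an already-sampled image of class double with values in [0, 1]:

>> f8 = uint8(round(255 * f));   % quantize the amplitudes to 256 discrete levels
>> f8 = im2uint8(f);             % toolbox function performing the same conversion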
Coordinate convention:
The result of sampling and quantization is a matrix of real numbers. We use two principal ways to represent digital images. Assume that an image f(x, y) is sampled so that the resulting image has M rows and N columns. We say that the image is of size M × N. The values of the coordinates (x, y) are discrete quantities. For notational clarity and convenience, we use integer values for these discrete coordinates. In many image processing books, the image origin is defined to be at (x, y) = (0, 0). The next coordinate values along the first row of the image are (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1) is used to signify the second sample along the first row. It does not mean that these are the actual values of physical coordinates when the image was sampled. Note that x ranges from 0 to M-1 and y from 0 to N-1 in integer increments.

The coordinate convention used in the toolbox to denote arrays differs from the preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox uses the notation (r, c) to indicate rows and columns. Note, however, that the order of coordinates is the same as the order discussed in the previous paragraph, in the sense that the first element of a coordinate tuple, (a, b), refers to a row and the second to a column. The other difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M and c from 1 to N in integer increments. IPT documentation refers to these as pixel coordinates. Less frequently, the toolbox also employs another coordinate convention, called spatial coordinates, which uses x to refer to columns and y to refer to rows. This is the opposite of our use of the variables x and y.
Image as Matrices:
The preceding discussion leads to the following representation for a digitized image function:

            [ f(0,0)     f(0,1)     ...   f(0,N-1)   ]
            [ f(1,0)     f(1,1)     ...   f(1,N-1)   ]
f(x, y) =   [    .          .                 .      ]
            [    .          .                 .      ]
            [ f(M-1,0)   f(M-1,1)   ...   f(M-1,N-1) ]

The right side of this equation is a digital image by definition. Each element of this array is called an image element, picture element, pixel or pel. The terms image and pixel are used throughout the rest of our discussion to denote a digital image and its elements. A digital image can be represented naturally as a MATLAB matrix:

        [ f(1,1)   f(1,2)   ...   f(1,N) ]
        [ f(2,1)   f(2,2)   ...   f(2,N) ]
f =     [    .        .              .   ]
        [    .        .              .   ]
        [ f(M,1)   f(M,2)   ...   f(M,N) ]

where f(1,1) = f(0,0) (note the use of a monospace font to denote MATLAB quantities). Clearly, the two representations are identical, except for the shift in origin. The notation f(p, q) denotes the element located in row p and column q. For example, f(6,2) is the element in the sixth row and second column of the matrix f. Typically we use the letters M and N, respectively, to denote the number of rows and columns in a matrix. A 1×N matrix is called a row vector, whereas an M×1 matrix is called a column vector. A 1×1 matrix is a scalar.

Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array and so on. Variables must begin with a letter and contain only letters, numerals and underscores. As noted in the previous paragraph, all MATLAB quantities are written using monospace characters. We use conventional Roman italic notation, such as f(x, y), for mathematical expressions.
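A tiny worked example of this correspondence, using an assumed 2×3 matrix in place of a real photograph:

>> f = [10 20 30; 40 50 60];   % M = 2 rows, N = 3 columns
>> f(1,1)                      % top-left element, i.e. f(0,0) in (x, y) notation
>> [M, N] = size(f)            % returns the number of rows and columns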
Reading Images:
Images are read into the MATLAB environment using function imread, whose syntax is

imread('filename')

The image/graphics formats supported by imread include:

Format name   Description                          Recognized extensions
TIFF          Tagged Image File Format             .tif, .tiff
JPEG          Joint Photographic Experts Group     .jpg, .jpeg
GIF           Graphics Interchange Format          .gif
BMP           Windows Bitmap                       .bmp
PNG           Portable Network Graphics            .png
XWD           X Window Dump                        .xwd
Here, filename is a string containing the complete name of the image file (including any applicable extension). For example, the command line

>> f = imread('chestxray.jpg');

reads the JPEG image chestxray (see the table above) into image array f. Note the use of single quotes (') to delimit the string filename. The semicolon at the end of a command line is used by MATLAB for suppressing output. If a semicolon is not included, MATLAB displays the results of the operation(s) specified in that line. The prompt symbol (>>) designates the beginning of a command line, as it appears in the MATLAB command window.
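Continuing with the same assumed file, two commands that are commonly issued right after reading an image:

>> size(f)   % number of rows and columns (and color channels, if any)
>> whos f    % size, number of bytes and data class of f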
Data Classes:
Although we work with integer coordinates, the values of the pixels themselves are not restricted to be integers in MATLAB. The table below lists the various data classes supported by MATLAB and IPT for representing pixel values. The first eight entries in the table are referred to as numeric data classes. The ninth entry is the char class and, as shown, the last entry is referred to as the logical data class.

All numeric computations in MATLAB are done using double quantities, so this is also a frequent data class encountered in image processing applications. Class uint8 also is encountered frequently, especially when reading data from storage devices, as 8-bit images are the most common representation found in practice. These two data classes, class logical, and, to a lesser degree, class uint16 constitute the primary data classes on which we focus. Many IPT functions, however, support all the data classes listed in the table. Data class double requires 8 bytes to represent a number, uint8 and int8 require one byte each, uint16 and int16 require 2 bytes each, and uint32, int32 and single require 4 bytes each.
Name      Description
double    Double-precision, floating-point numbers in the approximate range ±10^308 (8 bytes per element)
uint8     Unsigned 8-bit integers in the range [0, 255] (1 byte per element)
uint16    Unsigned 16-bit integers in the range [0, 65535] (2 bytes per element)
uint32    Unsigned 32-bit integers in the range [0, 4294967295] (4 bytes per element)
int8      Signed 8-bit integers in the range [-128, 127] (1 byte per element)
int16     Signed 16-bit integers in the range [-32768, 32767] (2 bytes per element)
int32     Signed 32-bit integers in the range [-2147483648, 2147483647] (4 bytes per element)
single    Single-precision floating-point numbers with values in the approximate range ±10^38 (4 bytes per element)
char      Characters (2 bytes per element)
logical   Values are 0 or 1 (1 byte per element)
The char data class holds characters in Unicode representation. A character string is merely a 1×n array of characters. A logical array contains only the values 0 and 1, with each element being stored in memory using one byte. Logical arrays are created by using the function logical or by using relational operators.
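A minimal sketch of moving between the common data classes, assuming f is a uint8 image such as one returned by imread:

>> class(f)            % reports the data class, e.g. 'uint8'
>> g = im2double(f);   % converts to class double, rescaling to the range [0, 1]
>> h = im2uint8(g);    % converts back to uint8 in the range [0, 255]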
Image Types:
The toolbox supports four types of images: 1. Intensity images; 2. Binary images; 3. Indexed images; 4. RGB images. Most monochrome image processing operations are carried out using binary or intensity images, so our initial focus is on these two image types. Indexed and RGB colour images are discussed briefly afterwards.
Intensity Images:
An intensity image is a data matrix whose values have been scaled to represent intensities. When the elements of an intensity image are of class uint8 or class uint16, they have integer values in the range [0, 255] and [0, 65535], respectively. If the image is of class double, the values are floating-point numbers. Values of scaled, class double intensity images are in the range [0, 1] by convention.
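A minimal sketch of producing a scaled double intensity image from an arbitrary data matrix (the input used here is just an example matrix):

>> g = magic(4);           % any real-valued data matrix
>> I = mat2gray(g);        % rescales so that min(g) maps to 0 and max(g) to 1
>> [min(I(:)) max(I(:))]   % confirms the [0, 1] range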
Binary Images:
Binary images have a very specific meaning in MATLAB. A binary image is a logical array of 0s and 1s. Thus, an array of 0s and 1s whose values are of data class, say, uint8, is not considered a binary image in MATLAB. A numeric array is converted to binary using function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a logical array B using the statement

B = logical(A)

If A contains elements other than 0s and 1s, use of the logical function converts all nonzero quantities to logical 1s and all entries with value 0 to logical 0s. Using relational and logical operators also creates logical arrays. To test whether an array is logical, we use the function islogical:

islogical(C)

If C is a logical array, this function returns a 1; otherwise it returns a 0. Logical arrays can be converted to numeric arrays using the data class conversion functions.
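A short worked example of these conversions:

>> A = [0 2 0; 1 0 3];   % numeric array: not a binary image, despite the zeros
>> B = logical(A)        % nonzero entries become logical 1s, zeros stay 0s
>> islogical(B)          % returns 1
>> C = double(B);        % converts the logical array back to a numeric array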
Indexed Images:
An indexed image has two components: a data matrix of integers, X, and a colormap matrix, map. Matrix map is an m×3 array of class double containing floating-point values in the range [0, 1]. The length m of the map is equal to the number of colors it defines. Each row of map specifies the red, green and blue components of a single color. An indexed image uses "direct mapping" of pixel intensity values to colormap values. The color of each pixel is determined by using the corresponding value of the integer matrix X as a pointer into map. If X is of class double, then all of its components with values less than or equal to 1 point to the first row in map, all components with value 2 point to the second row, and so on. If X is of class uint8 or uint16, then all components with value 0 point to the first row in map, all components with value 1 point to the second row, and so on.
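A minimal sketch of reading and displaying an indexed image; 'trees.tif' is assumed here because versions of the toolbox have shipped such an indexed sample image.

>> [X, map] = imread('trees.tif');   % data matrix X and its colormap map
>> size(map)                         % m-by-3: one RGB triplet per defined color
>> imshow(X, map)                    % displays X through the colormap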
RGB Images:
An RGB color image is an M×N×3 array of color pixels, where each color pixel is a triplet corresponding to the red, green and blue components of an RGB image at a specific spatial location. An RGB image may be viewed as a "stack" of three grayscale images that, when fed into the red, green and blue inputs of a color monitor, produce a color image on the screen. By convention, the three images forming an RGB color image are referred to as the red, green and blue component images. The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1]. Similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class uint8 or uint16, respectively. The number of bits used to represent the pixel values of the component images determines the bit depth of an RGB image. For example, if each component image is an 8-bit image, the corresponding RGB image is said to be 24 bits deep. Generally, the number of bits in all component images is the same. In this case the number of possible colors in an RGB image is (2^b)^3, where b is the number of bits in each component image. For the 8-bit case the number is 16,777,216 colors.
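A minimal sketch of working with the component images, assuming f is a uint8 RGB array such as one returned by imread for a color JPEG:

>> fR = f(:, :, 1);           % red component image
>> fG = f(:, :, 2);           % green component image
>> fB = f(:, :, 3);           % blue component image
>> f2 = cat(3, fR, fG, fB);   % stacks the components back into an RGB image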
CHAPTER-8 CONCLUSION
We have shown the working of an algorithm to detect obstacles. But the challenge always lies in running it in real time on the vehicle with a quick update rate so as to react quickly to changes in the environment. The first among our future steps should be optimizing the run time of this algorithm and possibly exploring efficient platforms and parallel architectures if required to run it online. There are other quick depth-estimation algorithms [18], [16] in the literature that are more suitable for real-time applications, at the cost of reduced accuracy. So a sensible trade-off has to be made on the choice of the depth-estimation algorithm. The obstacle detection algorithm is found to be decently robust in detecting obstacles sticking out of the ground. But it does not particularly consider holes or cliffs. Given the depth map and 3D location of points in view, it is easy to build a few more features into the pipeline to seamlessly handle these kinds of obstacles as well. For example, we could classify the road based on the fact that it has a steady gradient in depth value and plan a path for the vehicle only along the definite road, if one exists, to avoid falling off cliffs. And of course, besides just obstacles, the vision system should also detect lane markings, sign boards, bicyclists' hand signals, etc., to complete the whole perception package of the vehicle.

Fig. 1: Scene: (a) lower image; (b) depth map; (c) black and white image of obstacle map; (d) lower image with obstacles overlaid; (e) polar map representation of obstacles
CHAPTER-9 RE"ERENCE
[1] Don Murray and James J. Little. Using real-time stereo vision for mobile robot navigation. Autonomous Robots, 8(2):161-171, 2000.
[2] Hans P. Moravec. Rover visual obstacle avoidance. In IJCAI, pages 785-790, 1981.
[3] Don Murray and Cullen Jennings. Stereo vision based mapping and navigation for mobile robots. In Robotics and Automation, 1997. Proceedings., 1997 IEEE International Conference on, volume 2, pages 1694-1699. IEEE, 1997.
[4] Nicola Bernini, Massimo Bertozzi, Luca Castangia, Marco Patander, and Mario Sabbatelli. Real-time obstacle detection using stereo vision for autonomous ground vehicles: A survey. In Intelligent Transportation Systems (ITSC), 2014 IEEE 17th International Conference on, pages 873-878. IEEE, 2014.
[5] Alberto Elfes. Using occupancy grids for mobile robot perception and navigation. Computer, 22(6):46-57, 1989.
[6] Hernán Badino, Uwe Franke, and Rudolf Mester. Free space computation using stochastic occupancy grids and dynamic programming. In Workshop on Dynamical Vision, ICCV, Rio de Janeiro, Brazil, volume 20, 2007.
[7] Florin Oniga and Sergiu Nedevschi. Processing dense stereo data using elevation maps: Road surface, traffic isle, and obstacle detection. Vehicular Technology, IEEE Transactions on, 59(3):1172-1182, 2010.
[8] Andreas Wedel, Annemarie Meißner, Clemens Rabe, Uwe Franke, and Daniel Cremers. Detection and segmentation of independently moving objects from dense scene flow. In Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 14-27. Springer, 2009.
[9] Uwe Franke, Clemens Rabe, Hernán Badino, and Stefan Gehrig. 6d-vision: Fusion of stereo and motion for robust environment perception. In Pattern Recognition, pages 216-223. Springer, 2005.
[10] Philip Lenz, Julius Ziegler, Andreas Geiger, and Martin Roser. Sparse scene flow segmentation for moving object detection in urban environments. In Intelligent Vehicles Symposium (IV), 2011 IEEE, pages 926-932. IEEE, 2011.
[11] A. Talukder, R. Manduchi, A. Rankin, and L. Matthies. Fast and reliable obstacle detection and segmentation for cross-country navigation. In Intelligent Vehicle Symposium, 2002. IEEE, volume 2, pages 610-618. IEEE, 2002.
[12] Massimo Bertozzi, Luca Bombini, Alberto Broggi, Michele Buzzoni, Elena Cardarelli, Stefano Cattani, Pietro Cerri, Alessandro Coati, Stefano Debattisti, Andrea Falzoni, et al. VIAC: An out of ordinary experiment. In Intelligent Vehicles Symposium (IV), 2011 IEEE, pages 175-180. IEEE, 2011.
[13] Alberto Broggi, Michele Buzzoni, Mirko Felisa, and Paolo Zani. Stereo obstacle detection in challenging environments: the VIAC experience. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages 1599-1604. IEEE, 2011.
[14] Kyung-Hoon Bae, Dong-Sik Yi, Seung Cheol Kim, and Eun-Soo Kim. A bi-directional stereo matching algorithm based on adaptive matching window. In Optics & Photonics 2005, pages 590929-590929. International Society for Optics and Photonics, 2005.