HCIE-R&S
Huawei Certification
HCIE-R&S Huawei Certified Internetwork Expert-Routing and Switching
Huawei Technologies Co.,Ltd
HUAWEI TECHNOLOGIES
HCIE
Copyright © Huawei Technologies Co., Ltd. 2010. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior written consent of Huawei Technologies Co., Ltd. Trademarks and Permissions
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd. All other trademarks and trade names mentioned in this document are the property of their respective holders. Notice
The information in this document is subject to change without notice. Every effort has been made in the preparation of this document to ensure accuracy of the contents, but all statements, information, and recommendations in this document do not constitute the warranty of any kind, expressed or implied.
Huawei Certification System
Relying on its strong technical strength and professional training system, Huawei certification is committed to providing customers with authentic, professional certification. Based on the characteristics of ICT technologies and customers' needs at different levels, Huawei certification provides customers with a certification system of four levels.
HCDA (Huawei Certification Datacom Associate) is intended primarily for IP network maintenance engineers and anyone else who wants to build an understanding of IP networks. HCDA certification covers TCP/IP basics, routing, switching, and other common foundational knowledge of IP networks, together with Huawei communications products and the characteristics and basic maintenance of the Versatile Routing Platform (VRP).
HCDP-Enterprise (Huawei Certification Datacom Professional-Enterprise) is aimed at enterprise-class network maintenance engineers, network design engineers, and anyone else who wants to grasp routing, switching, and network adjustment and optimization technologies in depth. HCDP-Enterprise consists of IESN (Implement Enterprise Switch Network), IERN (Implement Enterprise Routing Network), and IENP (Improving Enterprise Network Performance), which cover advanced IPv4 routing and switching technology principles, network security, high availability, and QoS, as well as the configuration of Huawei products.
HCIE-Enterprise (Huawei Certified Internetwork Expert-Enterprise) is designed to endow engineers with mastery of a variety of IP technologies and proficiency in the maintenance, diagnostics, and troubleshooting of Huawei products, equipping them with competence in the planning, design, and optimization of large-scale IP networks.
Reference icons
Router
L3 Switch
L2 Switch
Firewall
Serial line
Ethernet line
Net cloud
CONTENTS
RIP
IS-IS
OSPF
BGP Basics
BGP Advanced and Internet Design
Route Import and Control
VLAN
LAN Layer 2 Technologies
WAN Layer 2 Technologies
STP
Multicast
IPv6
MPLS VPN
Other Technologies
RIPv1 packet format
A RIP packet consists of two parts: a header and route entries. The header includes the Command and Version fields. A packet carries at most 25 route entries, and each route entry contains the Address Family Identifier field, the IP Address of the target network, and the Metric field. The fields in a RIP packet have the following meanings:
Command: indicates whether the packet is a request or a response. The value 1 indicates a request, and the value 2 indicates a response.
Version: specifies the RIP version. The value 1 indicates a RIPv1 packet, and the value 2 indicates a RIPv2 packet.
Address Family Identifier: specifies the address family. The value is 2 for IPv4. If the packet is a request for the entire routing table, the value is 0.
IP Address: specifies the destination address of the route entry. The value can be a network address or a host address.
Metric: indicates how many hops away the destination is. Although the 32-bit field can encode values from 0 to 2^32 − 1, valid RIP metrics range from 1 to 16, with 16 meaning unreachable.
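As a concrete illustration of the layout above, the following Python sketch packs a RIPv1 full-table request: a 4-byte header with Command=1 plus one 20-byte entry with AFI=0 and metric 16. The constant and function names are illustrative, not part of the courseware.

```python
import struct

RIP_REQUEST, RIP_RESPONSE = 1, 2   # Command field values
AF_UNSPEC, AF_INET = 0, 2          # Address Family Identifier values
INFINITY = 16                      # "unreachable" metric

def build_ripv1_full_table_request():
    # Header: Command=1 (request), Version=1, then 2 must-be-zero bytes
    header = struct.pack("!BBH", RIP_REQUEST, 1, 0)
    # One 20-byte entry: AFI(2) + unused(2) + IP(4) + unused(8) + Metric(4);
    # AFI=0 with metric 16 requests the entire routing table
    entry = struct.pack("!HH4s4s4sI", AF_UNSPEC, 0,
                        bytes(4), bytes(4), bytes(4), INFINITY)
    return header + entry

pkt = build_ripv1_full_table_request()
```

Note how the 4-byte header plus one 20-byte entry gives a 24-byte RIP message, matching the sizes discussed in the characteristics section.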
RIPv1 characteristics
RIP is a UDP-based routing protocol. A RIP packet, excluding the IP header, is at most 512 bytes: an 8-byte UDP header plus a RIP message of at most 504 bytes, which consists of a 4-byte RIP header and up to 25 route entries of 20 bytes each (4 + 25 × 20 = 504). A RIPv1 packet does not carry mask information. RIPv1 sends and receives routes based on the classful (main class) network mask and the interface address mask. Therefore, RIPv1 supports neither route summarization nor discontiguous subnets. RIPv1 packets do not carry an authentication field, so RIPv1 does not support authentication.
RIPv2 packet format
A RIPv2 packet has the same format as a RIPv1 packet, except that RIPv2 uses some fields that were reserved (unused) in RIPv1 to provide extended functions. The new fields have the following meanings:
Route Tag: marks external routes, that is, routes learned from other protocols or imported into RIPv2.
Subnet Mask: identifies the subnet mask of the IPv4 address.
Next Hop: indicates a next-hop address that is better than the advertising router's address. The value 0.0.0.0 indicates that the advertising router's address is the optimal next hop.
When authentication is configured in RIPv2, RIPv2 modifies the first route entry as follows: the Address Family Identifier field is set to 0xFFFF; the Route Tag field becomes the Authentication Type field; the IP Address, Subnet Mask, Next Hop, and Metric fields become the Password field.
Compared with RIPv1, RIPv2 has the following advantages:
Supports route tags. Route tags are used in routing policies to flexibly control routes, and can also be used when RIP processes import routes from each other.
Supports subnet masks, route summarization, and CIDR.
Supports a specified next hop, so that the optimal next-hop address can be selected on a broadcast network.
Multicasts route updates, so that only RIPv2-running devices receive protocol packets, reducing resource consumption.
Supports packet authentication to enhance security.
On a broadcast network with more than two devices, the Next Hop field allows routes to point at the optimal forwarding path.
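The RIPv2 entry layout can be made concrete with a short Python sketch that unpacks one 20-byte route entry into the fields described above; the function name and sample values are illustrative, not from the courseware.

```python
import struct
from ipaddress import IPv4Address

def parse_ripv2_entry(raw):
    """Parse one 20-byte RIPv2 route entry into its fields."""
    afi, tag, ip, mask, nh, metric = struct.unpack("!HH4s4s4sI", raw)
    return {
        "afi": afi,                              # 2 for IPv4, 0xFFFF for auth
        "route_tag": tag,                        # marks external routes
        "ip_address": str(IPv4Address(ip)),
        "subnet_mask": str(IPv4Address(mask)),   # new in RIPv2
        "next_hop": str(IPv4Address(nh)),        # 0.0.0.0 = use advertiser
        "metric": metric,
    }

# Build a sample entry for 10.0.0.0/8 with metric 1
entry = struct.pack("!HH4s4s4sI", 2, 0,
                    IPv4Address("10.0.0.0").packed,
                    IPv4Address("255.0.0.0").packed,
                    IPv4Address("0.0.0.0").packed, 1)
parsed = parse_ripv2_entry(entry)
```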
In MD5 authentication, a router computes an MD5 digest over the route entries and the shared key, and then sends the digest together with the route entries to its neighbor. The shared key itself is never transmitted.
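A minimal sketch of this idea in Python, assuming the digest is simply MD5 over the message concatenated with the key; the exact field layout and key-padding rules of the RIP-2 MD5 specification are omitted here.

```python
import hashlib

def ripv2_md5_digest(packet_body: bytes, shared_key: bytes) -> bytes:
    """Hedged sketch: the sender hashes the RIP message together with
    the shared key; the 16-byte digest (not the key) is carried in the
    trailing authentication entry. The receiver recomputes the digest
    with its own copy of the key and compares."""
    return hashlib.md5(packet_body + shared_key).digest()

# Sender computes the digest over a (toy) RIP message
digest = ripv2_md5_digest(b"\x02\x02\x00\x00", b"huawei")
```

A receiver with the same key reproduces the same digest; a receiver with a different key does not, so tampered or unauthorized packets are rejected.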
RIP mainly uses three timers:
Update timer: defines the interval between two route updates. It periodically triggers the transmission of route updates, at a default interval of 30 seconds.
Aging timer: specifies the aging time of routes. If a RIP device does not receive an update of a route from its neighbor within the aging time (180 seconds by default), the device considers the route unreachable. When the aging timer expires, the device sets the metric of the route to 16.
Garbage-collect timer: specifies the interval between a route being marked as unreachable and the route being deleted from the routing table. The default interval is four times the update interval, namely 120 seconds. If the device does not receive an update of the unreachable route from the same neighbor within the garbage-collect time, it deletes the route from the routing table.
Relationship between the three timers: RIP route update advertisement is controlled by the update timer; a route update is sent at a default interval of 30 seconds. Each routing entry has two timers: the aging timer and the garbage-collect timer. When a route is learned and added to the routing table, the aging timer starts. If the device does not receive an update of the route from a neighbor before the aging timer expires, it sets the metric of the route to 16 (indicating an unreachable route) and starts the garbage-collect timer. If the device still does not receive an update of the route before the garbage-collect timer expires, it deletes the route from the routing table.
Precautions If a RIP device does not have the triggered update function, it deletes an unreachable route from the routing table after a maximum of 300 seconds (aging time plus garbage-collect time). If a RIP device has the triggered update function, it deletes an unreachable route from the routing table after a maximum of 120 seconds (the garbage-collect time).
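The timer relationship can be sketched as a small state function, assuming the default values of 30/180/120 seconds and that no further update ever arrives for the route:

```python
UPDATE, AGE, GARBAGE = 30, 180, 120   # default RIP timer values (seconds)

def route_state(seconds_since_last_update):
    """Sketch of a RIP route's lifecycle when no update arrives:
    active while the aging timer runs, unreachable (metric 16) while
    the garbage-collect timer runs, then removed from the table."""
    if seconds_since_last_update < AGE:
        return "active"                       # aging timer still running
    if seconds_since_last_update < AGE + GARBAGE:
        return "unreachable (metric 16)"      # garbage-collect timer running
    return "deleted"                          # removed from routing table
```

This makes the 300-second worst case above (aging time plus garbage-collect time) directly visible.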
Split horizon RIP uses split horizon to reduce bandwidth consumption and prevent routing loops. Implementation R1 sends R2 a route to network 10.0.0.0/8. If split horizon is not configured, R2 sends the route learned from R1 back to R1. In this manner, R1 can learn two routes to network 10.0.0.0/8: one direct route with zero hops and the other route with two hops and R2 as the next hop. However, only the direct route is active in the RIP routing table of R1. When the route from R1 to network 10.0.0.0/8 becomes unreachable and R2 does not receive route unreachable information, R2 continues sending route information indicating that network 10.0.0.0/8 is reachable to R1. Subsequently, R1 receives incorrect route information and considers that it can reach network 10.0.0.0/8 through R2; R2 still considers that it can reach network 10.0.0.0/8 through R1. As a result, a routing loop occurs. After split horizon is configured, R2 does not send the route to network 10.0.0.0/8 back to R1, preventing a routing loop. Precautions Split horizon is disabled on NBMA networks by default.
Poison reverse function Poison reverse helps delete useless routes from the peer's routing table. Implementation If poison reverse is configured, after receiving the route 10.0.0.0/8 from R1, R2 advertises it back to R1 with the metric set to 16, indicating that the route is unreachable. R1 therefore never uses the route 10.0.0.0/8 learned from R2, preventing a routing loop. Precautions Poison reverse is disabled by default. Generally, split horizon is enabled on Huawei devices (except on NBMA networks) and poison reverse is disabled. Comparison between split horizon and poison reverse Both split horizon and poison reverse can prevent routing loops in RIP. The difference is as follows: split horizon avoids advertising a route back to neighbors along the path it was learned from, while poison reverse advertises the route back along that path but marks it as unreachable.
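The difference between the two mechanisms can be sketched in a few lines of Python; the route representation and function name are illustrative, not a VRP implementation.

```python
def routes_to_advertise(routes, out_iface, poison_reverse=False):
    """Sketch: `routes` is a list of dicts with 'prefix', 'metric' and
    'learned_from' (the interface the route was learned on).
    Split horizon silently skips routes going back out the interface
    they came from; poison reverse instead sends them with metric 16."""
    adverts = []
    for r in routes:
        if r["learned_from"] == out_iface:
            if poison_reverse:
                adverts.append({**r, "metric": 16})  # poisoned route
            # plain split horizon: do not advertise back at all
        else:
            adverts.append(dict(r))
    return adverts

routes = [{"prefix": "10.0.0.0/8", "metric": 1, "learned_from": "GE0/0/0"},
          {"prefix": "172.16.0.0/16", "metric": 2, "learned_from": "GE0/0/1"}]
```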
Triggered update Triggered update can shorten the network convergence time. When a routing entry changes, a RIP device broadcasts the change to other devices immediately without waiting for periodic update. If triggered update is not configured, by default, invalid routes are retained in the routing table for a maximum of 300 seconds (aging time plus garbage-collect time). Update is not triggered when the next-hop address becomes unreachable. Implementation After R1 detects a network fault, it sends a route update to R2 immediately without waiting for the expiry of the update timer. Subsequently, the routing table of R2 is updated in a timely manner.
Route summarization RIPv2 supports route summarization. Because RIPv2 packets carry the mask, RIPv2 supports subnetting. Route summarization can improve scalability and efficiency of large networks and reduce the routing table size. RIPv2 process-based classful summarization can implement automatic summarization. Interface-based summarization can implement manual summarization. If the routes to be summarized carry tags, the tags are deleted after these routes are summarized into one summary route. Case Two routes: route 10.1.0.0/16 (metric=10) and route 10.2.0.0/16 (metric=2) are summarized into one natural network segment route 10.0.0.0/8 (metric=3).
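Classful (automatic) summarization collapses a subnet to its natural class A/B/C network. The following hedged sketch, using the standard ipaddress module, shows the prefix part of this operation for the 10.1.0.0/16 → 10.0.0.0/8 case above; metric handling is left out.

```python
from ipaddress import ip_network

def classful_summary(prefix):
    """Sketch of RIPv2 classful (automatic) summarization: collapse a
    subnet to its natural class A/B/C network based on the first octet."""
    net = ip_network(prefix)
    first_octet = int(str(net.network_address).split(".")[0])
    if first_octet < 128:
        natural_len = 8      # class A: 0.0.0.0 - 127.255.255.255
    elif first_octet < 192:
        natural_len = 16     # class B: 128.0.0.0 - 191.255.255.255
    else:
        natural_len = 24     # class C: 192.0.0.0 - 223.255.255.255
    return str(net.supernet(new_prefix=natural_len))
```

Both 10.1.0.0/16 and 10.2.0.0/16 collapse to the same natural network 10.0.0.0/8, which is why only one summary route is advertised.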
Working process analysis:
Initial state: A router starts a RIP process, associates an interface with the RIP process, and sends and receives RIP packets on the interface.
Build the routing table: The router builds its routing entries according to received RIP packets.
Maintain the routing table: The router sends and receives route updates at an interval of 30 seconds to maintain its routing entries.
Age routing entries: The router starts a 180-second aging timer for each routing entry. If the router receives a route update within 180 seconds, it resets the update timer and the aging timer.
Garbage-collect entries: If the router does not receive an update of a route within 180 seconds, it starts the 120-second garbage-collect timer and sets the metric of the route to 16.
Delete routing entries: If the router still does not receive an update of the route after another 120 seconds, it deletes the route from the routing table.
Case description In this case, R1, R2, and R3 reside on network 192.168.1.0/24; R3, R4, and R5 reside on network 192.168.2.0/24. All the routers run RIPv2 and advertise the IP addresses of connected interfaces. To control route selection on R3, modify the metric of routes. Remarks In the IP routing table, only the related routing entries are displayed. In the Flags field of a route, R indicates an iterated route, and D indicates that the route has been delivered to the FIB table. The route iteration process is as follows: when a route to a destination has a next hop but no directly usable outbound interface, the device looks up the next-hop address in the routing table again, repeating the lookup until the correct outbound interface for forwarding is found. The FIB table is the forwarding table generated from the routing table. You can run the display fib command to view the forwarding table.
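The iteration process can be sketched as a repeated longest-match lookup; the table format and function below are illustrative, not a VRP implementation.

```python
from ipaddress import ip_address, ip_network

def iterate_route(dest, table, max_depth=8):
    """Sketch of route iteration (recursive next-hop resolution).
    `table` maps prefix -> (next_hop, interface); interface is None
    when the entry is not directly resolvable. Look `dest` up; if the
    best match has no interface, look the next hop up again."""
    target = ip_address(dest)
    for _ in range(max_depth):          # bound the recursion
        match = max((p for p in table if target in ip_network(p)),
                    key=lambda p: ip_network(p).prefixlen, default=None)
        if match is None:
            return None                 # no route
        next_hop, interface = table[match]
        if interface is not None:
            return (next_hop, interface)
        target = ip_address(next_hop)   # iterate on the next hop
    return None

# Toy table: the 172.16.0.0/24 route must be iterated through 10.1.1.0/24
table = {"172.16.0.0/24": ("10.1.1.2", None),
         "10.1.1.0/24": ("10.1.1.2", "GE0/0/1")}
```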
Command usage The rip metricin command increases the metric of a received route. The metric is increased before the route is added to the routing table, so the routing table stores the changed metric. Running this command affects route selection of the local device and of other devices. The rip metricout command increases the metric of an advertised route; the metric of the route remains unchanged in the local routing table. Running this command does not affect route selection of the local device, but affects route selection of other devices. View Interface view Parameters rip metricout { value | { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } value1 }: sets the additional metric to be added to an advertised route. value: increases the metric of an advertised route. The value ranges from 1 to 15 and defaults to 1. acl-number: specifies a basic ACL number. The value ranges from 2000 to 2999. acl-name acl-name: specifies an ACL name. The value is case-sensitive. ip-prefix ip-prefix-name: specifies an IP prefix list name, which must be unique.
value1: increases the metric of a route that passes the filtering of the ACL or IP prefix list. Precautions You can specify value1 to increase the metric of an advertised RIP route that passes the filtering of an ACL or IP prefix list. If a RIP route does not pass the filtering, its metric is increased by 1. Running the rip metricin/metricout commands affects route selection of other devices.
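The asymmetry between metricin and metricout can be sketched as follows. This is a simplification of our own: the normal per-hop increment and the ACL/prefix-list filtering are left out, and the metric is simply capped at 16.

```python
INFINITY = 16

def install_route(received_metric, metricin=0):
    """rip metricin: the extra metric is added as the route is received,
    so the increased value is what the routing table stores (and what is
    later advertised onward) -- affecting local and downstream selection."""
    return min(received_metric + metricin, INFINITY)

def advertise_route(table_metric, metricout=0):
    """rip metricout: the extra metric is added only to the advertised
    copy; the local routing table keeps its original value, so only
    other devices' route selection is affected."""
    return min(table_metric + metricout, INFINITY)

table_metric = install_route(2, metricin=3)               # table stores 5
sent_metric = advertise_route(table_metric, metricout=2)  # neighbor sees 7
```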
Case description The topology in this case is the same as that in the previous case. To prevent interfaces from sending or receiving route updates, suppress the interfaces or run the undo rip input/output commands.
Command usage The silent-interface command suppresses an interface so that it receives but does not send RIP packets. Even if an interface is suppressed, direct routes of the network segment where the interface resides can still be advertised through other interfaces. This command can be used together with the peer (RIP) command to advertise routes to a specified device. The undo rip output/input command prohibits an interface from sending/receiving RIP packets. View silent-interface: RIP view undo rip output/input: interface view Parameters silent-interface { all | interface-type interface-number } all: suppresses all the interfaces. Precautions The silent-interface all command has the highest priority: after it is run, no individual interface can be reactivated. In this case, all the interfaces of R4 are suppressed, so no interface of R4 can be activated.
Configuration verification The display ip routing-table command output shows that: R3 can receive the update of route 172.16.0.0/24 from R5 but not R4 and can receive the update of route 10.0.0.0/24 from R1 but not R2.
Case description The topology in this case is the same as that in the previous case. To prevent a device from receiving routes from a specified neighbor, run the filter-policy gateway command.
Command usage The filter-policy { acl-number | acl-name acl-name } import command filters received routes based on an ACL. The filter-policy gateway ip-prefix-name import command filters routes based on the advertising gateway. View filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } import: RIP view filter-policy gateway ip-prefix-name import: RIP view Parameters filter-policy { acl-number | acl-name acl-name } import acl-number: specifies the number of a basic ACL used to filter routes by destination address. acl-name acl-name: specifies the name of an ACL. The name is case-sensitive and must start with a letter. ip-prefix: filters routes based on an IP prefix list. ip-prefix-name: specifies the name of an IP prefix list used to filter routes by destination address. filter-policy gateway ip-prefix-name import gateway: filters routes based on the advertising gateway. ip-prefix-name: specifies the name of the IP prefix list of the advertising gateway.
Configuration verification Run the filter-policy gateway command to filter routes from a specified neighbor. In this case, routes from R4 are filtered on R3.
Case description To reduce routing entries, Company A decides to summarize routes. RIPv2 summarization includes automatic summarization based on the main class network and manual summarization. You can perform automatic summarization on R1 and manual summarization on R3 and R4.
Command usage summary [ always ]: enables RIPv2 automatic (classful) summarization, under which summary routes are advertised at the natural network boundary. Automatic summarization is enabled for RIPv2 by default, but it becomes invalid if split horizon or poison reverse is configured. When the always parameter is configured, RIPv2 automatic summarization remains enabled regardless of the split horizon or poison reverse configuration. rip summary-address ip-address mask [ avoid-feedback ]: configures a RIP router to advertise a local summary IP address. If the avoid-feedback keyword is configured, the local interface does not learn the summary route to the advertised summary IP address. This configuration prevents routing loops.
View summary [ always ]: RIP view rip summary-address ip-address mask [ avoid-feedback ]: interface view Parameters summary [ always ] always: If the always parameter is not configured, classful summarization becomes ineffective when split horizon or poison reverse is configured.
Therefore, to advertise summary routes at the natural network boundary without always, split horizon or poison reverse must be disabled in the corresponding views. rip summary-address ip-address mask [ avoid-feedback ] ip-address: specifies a summary IP address. mask: specifies a network mask. avoid-feedback: prevents the interface from learning the summary route to the advertised summary IP address.
Case description In this case, R1 and R2 connect over network 192.168.1.0/24. R1 connects to network 10.0.0.0/24, and R2 connects to network 172.16.0.0/24. Devices on the network run RIPv2 and import the routes to networks where the devices reside. Only the display command output of R1 is provided and only information about this case is displayed.
Command usage timers rip update age garbage-collect: adjusts the timers. rip authentication-mode md5 nonstandard password-key key-id: configures the MD5 authentication mode; nonstandard indicates that MD5 authentication packets use the nonstandard packet format (the IETF-standard format). rip replay-protect [ window-range ]: enables the replay-protect function. window-range specifies the receive or transmit buffer size for connections. The default value is 50. View timers rip update age garbage-collect: RIP view rip authentication-mode md5 nonstandard password-key key-id: interface view rip replay-protect [ window-range ]: interface view Parameters timers rip update age garbage-collect update: specifies the interval for transmitting route updates. age: specifies the route aging time. garbage-collect: specifies the interval after which an unreachable route is deleted from the routing table, namely, the garbage-collect time defined in standards.
Precautions If the three timers are configured incorrectly, routes become unstable. The update time must be shorter than the aging time; if the update time were longer than the aging time, a RIP router could not notify neighbors of route updates within the aging time. In practice, the timeout period of the garbage-collect timer is not fixed. When the update timer is set to 30 seconds, the garbage-collect timer may range from 90 to 120 seconds. The reason is as follows: before the RIP router deletes an unreachable route from the routing table, it sends Update packets four times to advertise the route with the metric set to 16, so that all the neighbors learn that the route is unreachable. Because a route may become unreachable at any point within an update period, the actual garbage-collect time is 3 to 4 times the update interval. Assume that the Identification field (a field in the IP header) of the last RIP packet sent before a RIP interface goes Down is X. After the interface comes Up again, the Identification field of the next RIP packet restarts from 0, and subsequent RIP packets are discarded until a packet with the Identification field X+1 is received. This causes RIP routing information between the two ends to become unsynchronized or lost. To address this issue, run the rip replay-protect command to enable the RIP interface to record the Identification field of the last RIP packet sent before the interface went Down and continue from that value plus 1 in subsequent RIP packets.
1. Check whether ARP is working properly. 2. Check whether related interfaces are Up. 3. Check whether RIP is enabled on the interfaces. Run the display current-configuration configuration rip command to view information about the RIP-enabled network segment. Check whether the interfaces reside on the network segment. The network address specified in the network command must be a natural network address. 4. Check whether versions of the RIP packets sent by the peer end and received by the local end match. By default, an interface sends only RIPv1 packets but can receive RIPv1 and RIPv2 packets. When an inbound interface receives RIP packets of a different version, RIP routes may fail to be correctly received. 5. Check whether a routing policy is configured to filter received RIP routes. If so, modify the routing policy. 6. Check whether UDP port 520 is disabled. 7. Check whether the undo rip input/output commands are configured on the interfaces or whether a high metric is configured using the rip metricin command. 8. Check whether the interfaces are suppressed. 9. Check whether the route metric is larger than 16. 10. Check whether the interface authentication modes on two ends match. If packet authentication fails, correctly configure interface authentication modes.
1. Check whether RIP is enabled on the interfaces. Run the display current-configuration configuration rip command to view information about the RIP-enabled network segment. Check whether the interfaces reside on the network segment. The network address specified in the network command must be a natural network address. 2. Check whether versions of the RIP packets sent by the peer end and received by the local end match. By default, an interface sends only RIPv1 packets but can receive RIPv1 and RIPv2 packets. When an inbound interface receives RIP packets of a different version, RIP routes may fail to be correctly received. 3. Check whether a routing policy is configured to filter received RIP routes. If so, modify the routing policy. 4. Check whether UDP port 520 is disabled. 5. Check whether the undo rip input/output commands are configured on the interfaces or whether a high metric is configured using the rip metricin command. 6. Check whether the interfaces are suppressed. 7. Check whether the route metric is larger than 16. 8. Check whether the interface authentication modes on two ends match. If packet authentication fails, correctly configure interface authentication modes.
Case description In this case, R1 connects to R2 through a frame relay network. R1 connects to network 10.X.X.0/24, and R2 connects to network 172.16.X.0/24.
Analysis process In the pre-configurations of R1 and R2, the frame relay configuration supports multicast. R1 sends version 2 Update packets to R2 in multicast. R1 and R2 can learn routes to each other.
Results By default, the peer command makes a router send packets in unicast but does not suppress multicast packets. Therefore, it is recommended that the related interfaces be configured as silent interfaces when this command is used, so that multicast packets are suppressed and only unicast packets are sent.
Results The display rip route command displays the RIP routes learned from other routers and values of timers for routes. The Tag field indicates whether a RIP route is an internal or external route. The default value is 0. The Flags field indicates whether a RIP route is active or inactive. The value RA indicates an active RIP route, and the value RG indicates an inactive RIP route and that the garbage-collect timer has been started.
Results After the avoid-feedback keyword is specified, the local interface does not learn the summary route to the advertised summary IP address, preventing routing loops. The filter-policy export command configures a filtering policy for the routes to be advertised: only the routes that pass the filtering are advertised through Update packets.
Case description In this topology, R1, R2, and R3 connect to the same broadcast domain. R3 connects to network 172.16.X.0/24 and advertises routes to RIP.
Analysis process In requirements 1 and 3, R1 is taken as an example. The command output shows that R1 sends multicast packets and does not start authentication. Before meeting requirement 2, R1 can receive all routes to 172.16.X.0/24.
Results The RIP authentication command can be configured only on an interface. Huawei devices support standard MD5 authentication and a Huawei proprietary authentication mode. You can run the display rip process-id interface interface-type verbose command to view the authentication mode. Parameters rip authentication-mode { simple password | md5 { nonstandard { password-key1 key-id | keychain keychain-name } | usual password-key2 } } simple: indicates plain-text authentication. password: specifies the plain-text authentication password. md5: indicates MD5 cipher-text authentication. nonstandard: indicates that MD5 cipher-text authentication packets use the nonstandard packet format (the IETF-standard format). password-key1: specifies the authentication password in cipher text. key-id: specifies the key ID in MD5 cipher-text authentication. keychain keychain-name: specifies a keychain name. usual: indicates that MD5 cipher-text authentication packets use the universal packet format (namely, the private standard).
password-key2: specifies the cipher-text authentication password. Precautions Only one authentication password is used for each authentication. If multiple authentication passwords are configured, only the latest one takes effect. The authentication password cannot contain spaces.
Results Only an ACL can be used here; an IP prefix list cannot. When defining the ACLs, take care with the wildcard mask: in this case, the bits of the wildcard mask that are 0 must match exactly, and the bits that are 1 are ignored.
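Wildcard-mask matching can be sketched in Python to make the 0-must-match / 1-don't-care rule concrete; the function is illustrative, not VRP code.

```python
from ipaddress import IPv4Address

def wildcard_match(addr, rule_addr, wildcard):
    """ACL wildcard-mask matching: wildcard bits that are 0 must match
    the rule address exactly; bits that are 1 are "don't care"."""
    a = int(IPv4Address(addr))
    r = int(IPv4Address(rule_addr))
    care = ~int(IPv4Address(wildcard)) & 0xFFFFFFFF   # bits we must match
    return (a & care) == (r & care)
```

For example, rule 192.168.1.0 with wildcard 0.0.0.255 matches any host in 192.168.1.0/24 but nothing in 192.168.2.0/24.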
Results RIPv2 multicasts Update packets by default. You can run the rip version 2 broadcast command in the interface view to configure RIPv2 to broadcast Update packets.
IS-IS Overview IS-IS is a dynamic routing protocol designed by the International Organization for Standardization (ISO) for its Connectionless Network Protocol (CLNP). The Internet Engineering Task Force (IETF) extended and modified IS-IS so that it can be applied to both TCP/IP and OSI environments. This version of IS-IS is called Integrated IS-IS. IS-IS Terms Connectionless network service (CLNS): CLNS consists of the following three protocols: CLNP: is similar to the Internet Protocol (IP) of TCP/IP. IS-IS: is a routing protocol between intermediate systems, that is, a protocol between routers. ES-IS (End System to Intermediate System): is similar to the Address Resolution Protocol (ARP) and Internet Control Message Protocol (ICMP) of IP. NSAP: Open Systems Interconnection (OSI) uses the NSAP (Network Service Access Point) to locate services at the transport layer on OSI networks. An NSAP is similar to an IP address. Note for Integrated IS-IS Integrated IS-IS applies to both TCP/IP and OSI environments. Unless otherwise specified, the IS-IS protocol in this material refers to Integrated IS-IS.
Overall IS-IS Topology To support large-scale routing networks, IS-IS adopts a two-level hierarchy consisting of a backbone area and non-backbone areas in an autonomous system (AS). Generally, Level-1 routers are deployed in non-backbone areas, whereas Level-2 and Level-1-2 routers are deployed in the backbone area. Each non-backbone area connects to the backbone area through a Level-1-2 router. Topology Introduction The figure shows a network that runs IS-IS. The network topology is similar to the multi-area topology of an OSPF network. The backbone area contains all routers in area 47.0001 and the Level-1-2 routers in other areas. In addition, Level-2 routers can be in different areas. The topology differences between IS-IS and OSPF are as follows: In OSPF, a link can belong to only one area; in IS-IS, a link can belong to different areas. In IS-IS, no area is physically defined as the backbone or a non-backbone area, whereas in OSPF, Area 0 is defined as the backbone area. In IS-IS, Level-1 and Level-2 routers each use the shortest path first (SPF) algorithm to generate their own shortest path trees (SPTs). In OSPF, the SPF algorithm is used only within an area, and inter-area routes are forwarded by the backbone area.
Level-1 Router A Level-1 router manages intra-area routing. It establishes neighbor relationships only with Level-1 and Level-1-2 routers in the same area. A Level-1 router maintains a Level-1 link state database (LSDB), which contains routes in the local area. A Level-1 router forwards packets destined for other areas to the nearest Level-1-2 router; that is, it connects to other areas through a Level-1-2 router. Level-2 Router A Level-2 router manages inter-area routing. It can establish neighbor relationships with Level-2 routers in the same area or in other areas, as well as with Level-1-2 routers. A Level-2 router maintains a Level-2 LSDB, which contains all routes in the IS-IS network. All Level-2 routers form the backbone network of the routing domain. They establish Level-2 neighbor relationships and are responsible for inter-area communication. Level-2 routers in the routing domain must be physically contiguous to ensure the continuity of the backbone network. Level-1-2 Router A router that belongs to both a Level-1 area and the Level-2 backbone is called a Level-1-2 router. It can establish Level-1 neighbor relationships with Level-1 and Level-1-2 routers in the same area.
It can also establish Level-2 neighbor relationships with Level-2 and Level-1-2 routers in the same area or in other areas. A Level-1 router connects to other areas through a Level-1-2 router. A Level-1-2 router maintains a Level-1 LSDB for intra-area routing and a Level-2 LSDB for inter-area routing.
Network Types Supported by IS-IS For a non-broadcast multiple access (NBMA) network such as a frame relay (FR) network, you need to configure subinterfaces and set the subinterface type to point-to-point (P2P). IS-IS cannot run on point-to-multipoint (P2MP) links. DIS In a broadcast network, IS-IS needs to elect a designated intermediate system (DIS) from all the routers. The Level-1 DIS and Level-2 DIS are elected separately, and you can set different DIS priorities for the elections of different levels. The router with the highest DIS priority is elected as the DIS. If multiple routers share the highest DIS priority, the router with the largest MAC address is elected as the DIS. A router whose DIS priority is 0 also participates in DIS election, and the election is preemptive. All routers (including non-DIS routers) of the same level on the same network segment establish adjacencies, but LSDB synchronization is ensured by the DIS. The DIS creates and updates pseudonodes and generates the link state protocol data units (LSPs) of pseudonodes. LSPs are used to describe network devices on the network.
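The election rule described above (highest priority wins, ties broken by the largest MAC address, priority 0 still eligible, election preemptive) can be sketched as follows. This is a minimal illustrative model, not protocol code; the router names, priorities, and MAC addresses are invented for the example.

```python
# Minimal sketch of IS-IS DIS election on one broadcast segment.
# Router records here are hypothetical, purely for illustration.
from dataclasses import dataclass

@dataclass
class Router:
    name: str
    priority: int   # DIS priority; unlike an OSPF DR, 0 still participates
    mac: str        # interface MAC address, used as the tiebreaker

def elect_dis(routers):
    """Highest DIS priority wins; ties are broken by the largest MAC.
    There is no backup DIS, and a better router preempts at any time."""
    return max(routers, key=lambda r: (r.priority, r.mac))

segment = [
    Router("R1", 64, "0050-5600-0003"),
    Router("R2", 64, "0050-5600-0009"),   # same priority, larger MAC
    Router("R3", 0,  "0050-5600-0100"),   # priority 0 is still eligible
]
print(elect_dis(segment).name)  # R2
```

Because election is preemptive, adding a router with priority 65 to `segment` would immediately make it the DIS, which is exactly the behavior contrasted with OSPF later in this material.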
Pseudonode A pseudonode simulates a virtual node on the broadcast network; it is not a real router. In IS-IS, a pseudonode is identified by the system ID of the DIS and a 1-byte, non-zero circuit ID. The use of pseudonodes simplifies the network topology: when the network changes, fewer LSPs are generated and the SPF calculation consumes fewer resources. Differences Between the DIS in IS-IS and the Designated Router (DR)/Backup Designated Router (BDR) in OSPF In an IS-IS broadcast network, a router whose priority is 0 also takes part in DIS election. In an OSPF network, a router whose priority is 0 does not take part in DR election. In an IS-IS broadcast network, when a new router that meets the requirements of being the DIS connects to the network, it is elected as the new DIS and the previous pseudonode is deleted, which causes a new flooding of LSPs. In an OSPF network, when a new router connects to the network, it is not immediately elected as the DR even if it has the highest DR priority. In an IS-IS broadcast network, all routers (including non-DIS routers) of the same level on the same network segment establish adjacencies.
NSAP An NSAP consists of the initial domain part (IDP) and the domain specific part (DSP). The lengths of the IDP and DSP are variable; the maximum length of an NSAP is 20 bytes and the minimum length is 8 bytes. The IDP is similar to the network ID in an IP address. It is defined by the ISO and consists of the authority and format identifier (AFI) and the initial domain identifier (IDI). The AFI indicates the address allocation authority and address format, and the IDI identifies a domain. The DSP is similar to the subnet ID and host address in an IP address. It consists of the high-order DSP (HODSP), the system ID, and the NSAP selector (SEL). The HODSP is used to divide areas, the system ID identifies a host, and the SEL indicates the service type. The area address (area ID) consists of the IDP and the HODSP of the DSP. It identifies a routing domain and the areas in that routing domain, similar to an area number in OSPF. Routers in the same Level-1 area must have the same area address, while routers in the Level-2 area can have different area addresses. A system ID uniquely identifies a host or router in an area. On a device, the system ID has a fixed length of 48 bits (6 bytes). Generally, the device's router ID is converted into the system ID. The SEL provides a function similar to that of the protocol identifier in IP: each transport protocol matches an SEL. The SEL is always 00 in IP.
NET A NET indicates the network layer information of a device and can be regarded as a special NSAP whose SEL is 00. The NET length is the same as the NSAP length: a maximum of 20 bytes and a minimum of 8 bytes. When configuring IS-IS on a router, you only need to specify a NET, not an NSAP. A maximum of three NETs can be configured during IS-IS configuration. When configuring multiple NETs, ensure that their system IDs are the same.
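Given the structure just described (variable-length area address, fixed 6-byte system ID, 1-byte SEL of 00), a NET written in dotted notation can be split mechanically. The sketch below assumes the common dotted-hex notation used in this material; it is illustrative only, not a full NSAP parser.

```python
# Hedged sketch: split a dotted-notation NET into its three parts.
# Assumption: the last dotted group is the 1-byte SEL, the three
# groups before it form the 6-byte system ID, and everything earlier
# is the area address.
def parse_net(net: str):
    parts = net.split(".")
    sel = parts[-1]                       # must be "00" for a NET
    system_id = ".".join(parts[-4:-1])    # 6 bytes = three 2-byte groups
    area = ".".join(parts[:-4])           # variable-length area address
    return area, system_id, sel

area, sysid, sel = parse_net("47.0001.0000.0000.0001.00")
print(area, sysid, sel)  # 47.0001 0000.0000.0001 00
```

For the example NET 47.0001.0000.0000.0001.00, the area address is 47.0001, the system ID is 0000.0000.0001, and the SEL is 00, matching the field definitions above.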
Hello PDU (IIH) Level-1 LAN IIHs apply to Level-1 routers on broadcast networks. Level-2 LAN IIHs apply to Level-2 routers on broadcast networks. P2P IIHs apply to non-broadcast networks. Compared with a LAN IIH, a P2P IIH does not have the Priority and LAN ID fields, but has a Local Circuit ID field. The Priority field indicates the DIS priority in a broadcast network, the LAN ID field indicates the system ID of the DIS and pseudonode, and the Local Circuit ID field indicates the local link ID. IIHs are also used by two neighbors to negotiate the MTU by padding the packets to the maximum size. LSP LSPs are similar to link-state advertisements (LSAs) in OSPF. Level-1 routers transmit Level-1 LSPs, Level-2 routers transmit Level-2 LSPs, and Level-1-2 routers transmit both Level-1 and Level-2 LSPs. The ATT, OL, and IS-Type fields are the major fields in an LSP. The ATT field indicates that the originating router is connected to one or more other areas. The OL field identifies the overload state. The IS-Type field indicates whether the router that generates the LSP is a Level-1 or Level-2 router (the value 01 indicates Level-1 and the value 11 indicates Level-2). The LSP update (refresh) interval is 15 minutes and the aging time is 20 minutes. However, an expired LSP is kept in the database for an additional 60 seconds (known as ZeroAgeLifetime) before it is cleared. The LSP retransmission interval is 5 seconds.
Sequence number PDU (SNP)
An SNP contains summary information of the LSDB and is used to maintain LSDB integrity and synchronization. Complete SNPs (CSNPs) carry summaries of all LSPs in the LSDB, ensuring LSDB synchronization between neighboring routers. In a broadcast network, the DIS periodically sends CSNPs. The default interval for sending CSNPs is 10 seconds. On a P2P link, CSNPs are sent only when the neighbor relationship is established for the first time. Partial SNPs (PSNPs) carry summaries of only some LSPs in the LSDB, and are used to request and acknowledge LSPs.
Initial Packet Structure of an IS-IS PDU
Intra-domain routing protocol discriminator
• It has a fixed value of 0x83 in all IS-IS PDUs.
PDU header length indicator
• It identifies the length of the fixed header.
Version/protocol ID extension
• It has a fixed value of 1.
System ID length
• It indicates the system ID length and has a fixed value of 6 bytes.
PDU type
• It identifies the PDU type.
Version
• It has a fixed value of 1.
Reserved
• It is set to all zeros.
Max areas
• It indicates the maximum number of areas supported by the intermediate system (IS). If the value is 3, the IS supports a maximum of three areas.
IIHs on a P2P link
Circuit type
• It indicates the level of the router that sends the PDU. If this field is set to 0, the PDU will be ignored.
System ID
• It indicates the system ID of the originating router that sends the IIH.
Holding time
• It indicates how long the peer router waits for the originating router to send the next IIH.
PDU length
• It indicates the PDU length.
Local circuit ID
• It is allocated to the local circuit by the originating router when the router sends IIHs. This ID is unique on the router interface. On the other end of the P2P link, the circuit ID contained in IIHs may be the same as or different from the local circuit ID.
Area address TLV
• It indicates the area address of the originating router.
IP interface address TLV
• It indicates the interface address or IP address of the router that sends the PDU.
Protocol supported TLV
• It indicates the protocol types supported by the originating router, such as IP, CLNP, and IPv6.
Restart option TLV
• It is used for graceful restart.
Point-to-point adjacency state TLV
• It indicates that the three-way handshake is supported.
Multi-topology TLV
• It indicates that multi-topology is supported.
Padding TLV
• It indicates that IIH padding is supported.
LSP
PDU length
• It indicates the PDU length.
Remaining lifetime
• It indicates the time before the LSP expires.
LSP ID
• It consists of the system ID, pseudonode ID, and LSP number.
• The value 0000.0000.0001.00-00 indicates a common LSP.
• The value 0000.0000.0001.01-00 indicates a pseudonode LSP.
• The value 0000.0000.0001.00-01 indicates a fragment of a common LSP.
Sequence number
• It indicates the sequence number of the LSP. The value starts from 1 and increases by 1 each time the LSP is regenerated. The maximum value is 2^32-1.
Checksum
• It is calculated over the fields that follow the Remaining Lifetime field, through the end of the LSP.
P bit
• It is used to repair partitioned areas and is similar to the OSPF virtual link. Most vendors do not support this feature.
ATT bit
• It indicates that the originating router is connected to one or multiple areas.
OL bit
• It identifies the overload state.
IS type
• It indicates the router type.
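The checksum field above is a Fletcher checksum. The sketch below shows the basic Fletcher-16 sum over a byte string; note as a caveat that the actual IS-IS/ISO variant additionally computes where to place the check bytes inside the PDU, which is omitted here for brevity.

```python
# Basic Fletcher checksum (mod 255), the algorithm family used for
# the IS-IS LSP checksum. This sketch computes the plain sum only;
# the ISO variant also derives check-byte placement, not shown here.
def fletcher16(data: bytes) -> int:
    c0 = c1 = 0
    for byte in data:
        c0 = (c0 + byte) % 255   # running sum of bytes
        c1 = (c1 + c0) % 255     # running sum of sums
    return (c1 << 8) | c0

print(hex(fletcher16(b"abcde")))  # 0xc8f0
```

Because `c1` weights earlier bytes more heavily, Fletcher detects byte reordering that a simple additive checksum would miss, at far lower cost than a CRC.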
Protocol supported TLV
• It indicates the protocol types supported by the originating router, such as IP, CLNP, and IPv6.
Area address TLV
• It indicates the area address of the originating router.
IS reachability TLV
• It lists the neighbors of the originating router.
IP interface address TLV
• It indicates the interface address or IP address of the router that sends the PDU.
IP internal reachability TLV
• It indicates that the IP address is internally reachable.
• It advertises the IP addresses and masks of the networks directly connected to the router that sends the LSP. A pseudonode LSP does not contain this TLV.
CSNP and PSNP
PDU length
• It indicates the PDU length.
Source ID
• It indicates the system ID of the originating router.
Start LSP-ID and End LSP-ID
• They identify the range of LSPs described, from 0000.0000.0000.00-00 to ffff.ffff.ffff.ff-ff in a complete CSNP.
LSP entries
• They carry LSP summary information.
Routers of different levels cannot establish neighbor relationships: Level-2 routers cannot establish neighbor relationships with Level-1 routers. However, Level-1-2 routers can establish Level-1 neighbor relationships with Level-1 routers in the same area, and Level-2 neighbor relationships with Level-2 routers in the same area or in different areas. Level-1 routers can establish only Level-1 neighbor relationships with Level-1 or Level-1-2 routers in the same area. The IP addresses of the IS-IS interfaces on both ends of a link must be on the same network segment. Strictly speaking, the establishment of IS-IS neighbor relationships is independent of IP addresses, so routers that establish neighbor relationships may be on different network segments. To avoid this problem, Huawei devices check the network segment to ensure that IS-IS neighbor relationships are correctly established. On a P2P network that does not require this check, you can configure interfaces not to check IP addresses. In a broadcast network, you need to simulate Ethernet interfaces as P2P interfaces before configuring the interfaces not to check IP addresses.
Two routers running IS-IS need to establish a neighbor relationship before exchanging protocol packets to implement routing. On different networks, the modes for establishing IS-IS neighbor relationships are different.
In a broadcast network, routers exchange LAN IIHs to establish neighbor relationships. LAN IIHs are classified into Level-1 LAN IIHs (with the multicast MAC address 01-80-C2-00-00-14) and Level-2 LAN IIHs (with the multicast MAC address 01-80-C2-00-00-15). Level-1 routers exchange Level-1 LAN IIHs to establish neighbor relationships. Level-2 routers exchange Level-2 LAN IIHs to establish neighbor relationships. Level-1-2 routers exchange Level-1 LAN IIHs and Level-2 LAN IIHs to establish neighbor relationships. In this example, two Level-2 routers establish a neighbor relationship on a broadcast link. R1 multicasts a Level-2 LAN IIH (with the multicast MAC address 01-80-C2-00-00-15) with no neighbor ID specified. R2 receives the packet and sets the status of the neighbor relationship with R1 to Initial. R2 then responds to R1 with a Level-2 LAN IIH, indicating that R1 is a neighbor of R2. R1 receives the packet and sets the status of the neighbor relationship with R2 to Up. R1 then responds to R2 with a Level-2 LAN IIH, indicating that R2 is a neighbor of R1. R2 receives the packet and sets the status of the neighbor relationship with R1 to Up. R1 and R2 successfully establish a neighbor relationship.
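The exchange just described is, in effect, a small state machine: a neighbor goes to Initial when its hello is first heard, and to Up once the peer's hello lists the local system ID as a seen neighbor. The sketch below is an assumed simplification for illustration; real IIH processing also checks levels, areas, and hold timers.

```python
# Simplified model of the LAN IIH neighbor state machine described
# above. States and transition rules are a deliberate simplification.
def next_state(current, iih_neighbors, my_id):
    """Advance R's view of a neighbor after receiving that neighbor's
    IIH. iih_neighbors is the list of system IDs the IIH reports."""
    if current == "Down":
        return "Up" if my_id in iih_neighbors else "Initial"
    if current == "Initial" and my_id in iih_neighbors:
        return "Up"
    return current

# Replay the exchange from R2's point of view:
state = "Down"
state = next_state(state, iih_neighbors=[], my_id="R2")      # R1's first IIH
assert state == "Initial"
state = next_state(state, iih_neighbors=["R2"], my_id="R2")  # R1 now lists R2
assert state == "Up"
print(state)
```

Seeing one's own system ID echoed back is what makes the handshake three-way: it proves the link carries traffic in both directions before the relationship is declared Up.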
The network is a broadcast network, so a DIS needs to be elected. After the neighbor relationship is established, routers wait for two Hello intervals before electing the DIS. The Hello PDUs exchanged by the routers contain the Priority field. The router with the highest priority is elected as the DIS; if the routers have the same priority, the router with the largest interface MAC address is elected. In an IS-IS network, the DIS sends Hello PDUs at an interval of 10/3 seconds, and non-DIS routers send Hello PDUs at an interval of 10 seconds. Differences between IS-IS Adjacencies and OSPF Adjacencies In IS-IS, two neighbor routers establish an adjacency once they exchange Hello PDUs. In OSPF, two routers establish a neighbor relationship when they reach the 2-Way state, and establish an adjacency when they reach the Full state. In IS-IS, a router whose priority is 0 can participate in DIS election. In OSPF, a router whose priority is 0 does not take part in DR election. In IS-IS, DIS election is preemptive. In OSPF, a router cannot preempt the DR or BDR role once the DR or BDR has been elected.
Unlike the establishment of a neighbor relationship on a broadcast network, the establishment of a neighbor relationship on a P2P network is classified into two modes: two-way mode and three-way mode.
Two-Way Mode Upon receiving a P2P IIH from a peer router, a router considers the peer router Up and establishes a neighbor relationship with it. Unidirectional communication may occur in this mode. Three-Way Mode A neighbor relationship is established after P2P IIHs are exchanged three times, similar to the process on a broadcast network.
The process of synchronizing LSDBs between a newly added router and the DIS on a broadcast link is as follows: Assume that the newly added router R3 has established neighbor relationships with R2 (the DIS) and R1. R3 sends an LSP to a multicast address (01-80-C2-00-00-14 in a Level-1 area and 01-80-C2-00-00-15 in a Level-2 area). All neighbors on the network can receive the LSP. The DIS on the network segment adds the received LSP to its LSDB. After the CSNP timer expires, the DIS sends CSNPs at an interval of 10 seconds to synchronize the LSDBs on the network. R3 receives the CSNPs from the DIS, checks its LSDB, and sends a PSNP to the DIS to request the LSPs it does not have. The DIS receives the PSNP and sends the required LSPs to R3 for LSDB synchronization. The process of updating the LSDB of the DIS is as follows: The DIS receives an LSP and searches for a matching record in the LSDB. If no matching record exists, the DIS adds the LSP to the LSDB and multicasts the new LSP. If the sequence number of the received LSP is larger than that of the corresponding LSP in the LSDB, the DIS replaces the local LSP with the received LSP and multicasts the new LSP. If the sequence number of the received LSP is smaller than that of the LSP in the LSDB, the DIS sends the local LSP to the inbound interface.
If the sequence number of the received LSP is the same as that of the corresponding LSP in the LSDB, the DIS compares the remaining lifetime of the two LSPs. If the remaining lifetime of the received LSP is smaller than that of the LSP in the LSDB, the DIS replaces the local LSP with the received LSP and floods the new LSP. If the remaining lifetime of the received LSP is larger than that of the LSP in the LSDB, the DIS sends the local LSP to the inbound interface. If the sequence number and the remaining lifetime of the received LSP and those of the corresponding LSP in the LSDB are the same, the DIS compares the checksums of the two LSPs. If the checksum of the received LSP is larger than that of the LSP in the LSDB, the DIS replaces the local LSP with the received LSP and floods the new LSP. If the checksum of the received LSP is smaller than that of the LSP in the LSDB, the DIS sends the local LSP to the inbound interface. If the sequence number, remaining lifetime, and checksum of the received LSP and those of the corresponding LSP in the LSDB are the same, the LSP is not forwarded.
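The three-step freshness comparison above (sequence number, then remaining lifetime, then checksum) can be condensed into a single decision function. This is a sketch of the decision logic only; the dictionary keys `seq`, `lifetime`, and `checksum` are names chosen for this example.

```python
# Sketch of the newer-LSP decision described above. Returns which copy
# wins: "received" (install and flood it), "local" (send our copy back
# to the inbound interface), or "same" (identical, do not forward).
def compare_lsp(recv, local):
    if recv["seq"] != local["seq"]:
        return "received" if recv["seq"] > local["seq"] else "local"
    # Equal sequence numbers: the smaller remaining lifetime is newer
    if recv["lifetime"] != local["lifetime"]:
        return "received" if recv["lifetime"] < local["lifetime"] else "local"
    # Equal lifetimes: the larger checksum wins
    if recv["checksum"] != local["checksum"]:
        return "received" if recv["checksum"] > local["checksum"] else "local"
    return "same"

print(compare_lsp({"seq": 9, "lifetime": 800, "checksum": 0x1A2B},
                  {"seq": 8, "lifetime": 900, "checksum": 0x1A2B}))  # received
```

Note the asymmetry in the lifetime rule: a smaller remaining lifetime wins, so an LSP being aged out (lifetime approaching 0) propagates and purges stale copies network-wide.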
The process of synchronizing LSDBs on a P2P network is as follows: After establishing a neighbor relationship, R1 and R2 send a CSNP to each other. If the received CSNP shows that the local LSDB is not synchronized with the neighbor's, the router sends a PSNP to request the missing LSPs. Assume that R2 requests the required LSP from R1. R1 sends the required LSP to R2, starts the LSP retransmission timer, and waits for a PSNP from R2 as an acknowledgement of the LSP. If R1 does not receive a PSNP from R2 before the LSP retransmission timer expires, R1 resends the LSP until it receives a PSNP from R2. The process of updating LSDBs on a P2P link is as follows: If the sequence number of the received LSP is smaller than that of the corresponding LSP in the LSDB, the router directly sends the local LSP to the neighbor and waits for a PSNP from the neighbor. If the sequence number of the received LSP is larger than that of the corresponding LSP in the LSDB, the router adds the received LSP to its LSDB, sends a PSNP to acknowledge the received LSP, and then sends the received LSP to all its neighbors except the neighbor that sent the LSP. If the sequence number of the received LSP is the same as that of the corresponding LSP in the LSDB, the router compares the remaining lifetime of the two LSPs.
If the remaining lifetime of the received LSP is smaller than that of the LSP in the LSDB, the router replaces the local LSP with the received LSP, sends a PSNP to acknowledge the received LSP, and sends the received LSP to all neighbors except the neighbor that sends the LSP. If the remaining lifetime of the received LSP is larger than that of the LSP in the LSDB, the router sends the local LSP to the neighbor and waits for a PSNP. If the sequence number and remaining lifetime of the received LSP are the same as those of the corresponding LSP in the LSDB, the router compares the checksums of the two LSPs. If the checksum of the received LSP is larger than that of the LSP in the LSDB, the router replaces the local LSP with the received LSP, sends a PSNP to acknowledge the received LSP, and sends the received LSP to all neighbors except the neighbor that sends the LSP. If the checksum of the received LSP is smaller than that of the LSP in the LSDB, the router sends the local LSP to the neighbor and waits for a PSNP. If the sequence number, remaining lifetime, and checksum of the received LSP and those of the corresponding LSP in the LSDB are the same, the LSP is not forwarded.
On a P2P network, a PSNP has the following functions: It is used to acknowledge a received LSP. It is used to request a required LSP.
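The retransmit-until-acknowledged behavior on the P2P link can be sketched as a simple loop: an LSP stays on the retransmission list until a PSNP acknowledging it arrives. The function, the callback, and the try limit below are all invented for illustration; in the protocol the wait between attempts is the 5-second retransmission interval and retries continue while the adjacency is up.

```python
# Assumed sketch of the P2P retransmission loop: keep resending an LSP
# until the neighbor's PSNP acknowledges it (here capped at max_tries
# purely so the example terminates).
def retransmit(lsp_id, acked_by_psnp, max_tries=5):
    """Return the number of sends needed, or None if never acked."""
    for attempt in range(1, max_tries + 1):
        # send the LSP, then wait the retransmission interval (5 s)
        if acked_by_psnp(lsp_id, attempt):
            return attempt
    return None

# Toy run: the neighbor's PSNP arrives on the third attempt.
print(retransmit("0000.0000.0002.00-00", lambda lsp, n: n >= 3))  # 3
```

This explicit acknowledgement is why P2P links need no periodic CSNPs after the initial exchange: every LSP delivery is individually confirmed by a PSNP, unlike the broadcast case where the DIS's periodic CSNPs provide implicit acknowledgement.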
Assume that R1 sends packets to R6. The default situation is as follows:
As a Level-1 router, R1 does not know routes outside its area, so it sends packets to other areas through the default route generated by the nearest Level-1-2 router (R3). Therefore, R1 selects the route R1->R3->R5->R6, which is not the optimal route, to forward the packets. To solve this problem, IS-IS provides route leaking: you can configure access control lists (ACLs), routing policies, and route tags on Level-1-2 routers to select eligible routes, and a Level-1-2 router can then advertise routing information of other Level-1 areas and the backbone area into its own Level-1 area. If route leaking is enabled on the Level-1-2 routers (R3 and R4), Level-1 routers in area 47.0001 can learn routes outside area 47.0001 that pass through the two Level-1-2 routers. After route calculation, the forwarding path becomes R1->R2->R4->R5->R6, which is the optimal route from R1 to R6.
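Why leaking changes R1's choice comes down to longest-prefix match: without leaking, R1's only match for the destination is the 0/0 default toward R3; once a specific prefix is leaked, it wins. The sketch below models just that lookup; the prefixes and next hops are invented for the example.

```python
# Toy sketch of R1's route lookup before and after route leaking.
# Longest-prefix match prefers the leaked specific route over the
# default route generated by the nearest Level-1-2 router.
import ipaddress

def best_route(table, dest):
    dest = ipaddress.ip_address(dest)
    matches = [r for r in table
               if dest in ipaddress.ip_network(r["prefix"])]
    return max(matches,
               key=lambda r: ipaddress.ip_network(r["prefix"]).prefixlen)

without_leaking = [{"prefix": "0.0.0.0/0", "next_hop": "R3"}]
with_leaking = without_leaking + [
    {"prefix": "10.6.0.0/24", "next_hop": "R2"},  # hypothetical leaked route
]

print(best_route(without_leaking, "10.6.0.1")["next_hop"])  # R3 (default)
print(best_route(with_leaking, "10.6.0.1")["next_hop"])     # R2 (leaked)
```

With only the default route, traffic follows the nearest Level-1-2 router regardless of total path cost; the leaked /24 carries real topology information, so SPF can steer R1 onto the cheaper R2 path.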
Principles LSPs with the overload bit are still flooded on the network, but the LSPs are not used when routes that pass through a router configured with the overload bit are calculated. That is, after the overload bit is set on a router, other routers ignore this router when performing SPF calculation and calculate only the direct routes of the router.
Topology
R2 forwards the packets from R1 to R3. If the overload bit on R2 is set to 1, R1 considers the LSDB of R2 incomplete and sends packets to R3 through R4 and R5. This process does not affect packets sent to the directly connected address of R2.
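The overload rule above amounts to a constraint on SPF: an overloaded node may terminate a path (its own addresses stay reachable) but may not extend one (no transit). The Dijkstra sketch below, with an invented five-router graph matching the description, is an illustrative model rather than a real IS-IS SPF implementation.

```python
# Sketch: shortest path that refuses to transit a node whose overload
# bit is set, while that node remains reachable as a destination.
import heapq

def spf(graph, src, dst, overloaded):
    """graph: {node: {neighbor: cost}}; returns (cost, path) or None."""
    pq, seen = [(0, src, [src])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path            # reaching the node itself is fine
        if node in seen:
            continue
        seen.add(node)
        if node in overloaded and node != src:
            continue                     # do not extend paths through it
        for nbr, c in graph.get(node, {}).items():
            heapq.heappush(pq, (cost + c, nbr, path + [nbr]))
    return None

g = {"R1": {"R2": 10, "R4": 10}, "R2": {"R3": 10},
     "R4": {"R5": 10}, "R5": {"R3": 10}}
print(spf(g, "R1", "R3", overloaded=set()))     # direct path via R2
print(spf(g, "R1", "R3", overloaded={"R2"}))    # detour via R4 and R5
```

With R2 overloaded, R1 reaches R3 over the longer R4-R5 path, exactly as described, while a lookup with R2 itself as the destination would still succeed.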
A device enters the overload state in the following situations: A device automatically enters the overload state due to exceptions. You can manually configure a device to enter the overload state. Results of entering the overload state
If the system enters the overload state due to exceptions, the system deletes all the imported or leaked routes. If the system is configured to enter the overload state, the system determines whether to delete all the imported or leaked routes based on the configuration.
Fast Convergence Incremental SPF (I-SPF): recalculates only the routes of the changed nodes rather than all nodes when the network topology changes (except for the first calculation, which involves all nodes), thereby speeding up route calculation. I-SPF improves the SPF algorithm: the shortest path tree (SPT) it generates is the same as that generated by the full SPF algorithm, but with lower CPU usage and faster network convergence. Partial route calculation (PRC): calculates only the changed routes when the network topology changes. Similar to I-SPF, PRC calculates only the changed routes, but it does not calculate the shortest path; instead, it updates routes based on the SPT calculated by I-SPF. In route calculation, a leaf represents a route and a node represents a router. If the SPT changes after I-SPF calculation, PRC processes all the leaves only on the changed nodes. If the SPT remains unchanged, PRC processes only the changed leaves. For example, if IS-IS is enabled on an interface of a node, the SPT calculated by I-SPF remains unchanged, and PRC updates only the routes of this interface, consuming less CPU.
Intelligent Timer LSP generation intelligent timer: There is a minimum interval restriction on LSP generation to prevent frequent LSP flapping from affecting the network. The same LSP cannot be generated repeatedly within the minimum interval, which is 5 seconds by default. This restriction significantly affects the route convergence speed. In IS-IS, if local routing information changes, a router generates a new LSP to advertise the change. When local routing information changes frequently, the newly generated LSPs consume a lot of system resources; however, if the delay in generating an LSP is too long, the router cannot advertise changed routing information to neighbors in time, reducing the network convergence speed. The delay in generating an LSP for the first time is determined by init-interval, and the delay in generating an LSP for the second time is determined by incr-interval. From the third time on, the delay doubles each time until it reaches the value specified by max-interval. After the delay remains at max-interval three times, or the IS-IS process is restarted, the delay decreases back to init-interval. When only max-interval is specified, the intelligent timer functions as an ordinary one-time triggering timer. SPF calculation intelligent timer: In IS-IS, routes are calculated when the LSDB changes. However, frequent route calculations consume a lot of system resources and decrease system performance. Delaying SPF calculation improves route calculation efficiency, but if the delay is too long, the route convergence speed is reduced. The delay in SPF calculation for the first time is determined by init-interval, and the delay for the second time is determined by incr-interval. From the third time on, the delay doubles each time until it reaches the value specified by max-interval.
After the delay remains at max-interval three times, or the IS-IS process is restarted, the delay decreases back to init-interval. If incr-interval is not specified, the delay in SPF calculation for the first time is determined by init-interval; from the second time on, the delay is determined by max-interval. After the delay remains at max-interval three times, or the IS-IS process is restarted, the delay decreases back to init-interval. When only max-interval is specified, the intelligent timer functions as an ordinary one-time triggering timer.
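The backoff schedule described above (init-interval for the first event, incr-interval for the second, then doubling up to max-interval) can be sketched as a small generator. The parameter names are taken from the text; the numeric values below are invented, and the reset-after-three-quiet-periods behavior is omitted for brevity.

```python
# Sketch of the intelligent timer's backoff schedule: first delay is
# init, second is incr, then the delay doubles each time, capped at
# the configured maximum. (Reset behavior is not modeled.)
def delays(init, incr, maximum, events):
    """Delays applied to the 1st, 2nd, ... consecutive triggers."""
    out = []
    for n in range(1, events + 1):
        if n == 1:
            d = init
        elif n == 2:
            d = incr
        else:
            d = min(out[-1] * 2, maximum)  # double until capped
        out.append(d)
    return out

print(delays(init=50, incr=200, maximum=5000, events=6))
# [50, 200, 400, 800, 1600, 3200]
```

The shape is the point: the first change is reacted to almost immediately, while a sustained flap quickly settles at the capped interval, bounding both convergence delay and CPU churn.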
LSP fast flooding: Because the number of LSPs can be huge, IS-IS periodically floods LSPs in batches to reduce the impact of LSP flooding on network devices. By default, the minimum interval for sending LSPs on an interface is 50 milliseconds and the maximum number of LSPs sent at a time is 10. After the flash-flood function is enabled, when LSP changes trigger SPF recalculation, IS-IS immediately floods the LSPs that caused the recalculation instead of sending them periodically. When the network topology changes, the LSDBs of the devices on the network are temporarily inconsistent; this function effectively reduces the time during which LSDBs are inconsistent and speeds up network convergence. When a network fault occurs, only a small number of LSPs change even though a large number of LSPs exist, so IS-IS only needs to flood the changed LSPs and consumes few system resources. Priority-based Convergence You can use an IP prefix list to filter routes and configure different convergence priorities for different routes so that important routes converge first, improving network reliability. The convergence priorities of IS-IS routes are critical, high, medium, and low, in decreasing order.
In area authentication and routing domain authentication, you can configure a router to authenticate LSPs and SNPs separately in the following ways:
• The router sends LSPs and SNPs carrying the authentication TLV and verifies the authentication information of received LSPs and SNPs.
• The router sends LSPs carrying the authentication TLV and verifies received LSPs; it sends SNPs carrying the authentication TLV but does not verify received SNPs.
• The router sends LSPs carrying the authentication TLV and verifies received LSPs; it sends SNPs without the authentication TLV and does not verify received SNPs.
• The router sends LSPs and SNPs carrying the authentication TLV but does not verify the authentication information of received LSPs and SNPs.
Concepts Originating system: a router running the IS-IS protocol. After LSP fragment extension is enabled, you can configure virtual systems for the router; the originating system refers to the IS-IS process itself. System ID: the system ID of the originating system. Additional system ID: configured for a virtual system after IS-IS LSP fragment extension is enabled. A maximum of 256 extended LSP fragments can be generated for each additional system ID. Like a normal system ID, an additional system ID must be unique in the routing domain. Virtual system: a system identified by an additional system ID. It is used to generate extended LSP fragments. Principles IS-IS floods LSPs to advertise link state information. Because one LSP carries a limited amount of information, IS-IS fragments LSPs. Each LSP fragment is uniquely identified by its LSP ID, which consists of the system ID of the node or pseudonode that generates the LSP, the pseudonode ID (0 for a common LSP and a non-zero value for a pseudonode LSP), and the LSP number (the fragment number). The LSP number is 1 byte long, so an IS-IS router can generate a maximum of 256 LSP fragments, restricting the amount of link information the router can advertise.
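The LSP ID composition and the fragment arithmetic above can be made concrete with a short sketch. The formatting helper is invented for illustration; the notation matches the LSP ID examples given earlier in this material.

```python
# Sketch of LSP ID composition: system ID, 1-byte pseudonode ID, and
# 1-byte fragment number, in the system.pp-ff notation used above.
def lsp_id(system_id, pseudonode=0, fragment=0):
    return f"{system_id}.{pseudonode:02x}-{fragment:02x}"

print(lsp_id("0000.0000.0001"))                # common LSP, fragment 0
print(lsp_id("0000.0000.0001", pseudonode=1))  # pseudonode LSP
print(lsp_id("0000.0000.0001", fragment=1))    # fragment of a common LSP

# The 1-byte fragment number allows 256 fragments per system ID.
# With 50 virtual systems, the router totals (1 + 50) * 256 fragments:
print((1 + 50) * 256)  # 13056
```

This arithmetic is where the 13,056 figure in the next paragraph comes from: one real system ID plus 50 additional system IDs, each good for 256 fragments.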
The LSP fragment extension feature enables an IS-IS router to generate more LSP fragments. You can configure up to 50 virtual systems for a router, and each virtual system can generate a maximum of 256 extended LSP fragments, so an IS-IS router can generate a maximum of 13,056 LSP fragments in total. An IS-IS router can run the LSP fragment extension feature in two modes.
Mode-1
• It is used when some routers on the network do not support LSP fragment extension.
• Virtual systems participate in SPF calculation. The originating system advertises LSPs containing information about links to each virtual system, and each virtual system advertises LSPs containing information about links to the originating system. Virtual systems therefore look like physical routers connected to the originating system.
• An LSP sent by a virtual system contains the same area address and overload bit as a common LSP. If the LSPs sent by a virtual system contain TLVs specified by other features, these TLVs must also be the same as those in common LSPs.
• A virtual system carries neighbor information indicating that its neighbor is the originating system, with the metric equal to the maximum value (64 for the narrow metric) minus 1. The originating system carries neighbor information indicating that its neighbor is the virtual system, with the metric 0. This ensures that the virtual system is the downstream node of the originating system when other routers calculate routes.
• As shown in the topology, R2 does not support LSP fragment extension, and R1 is configured to support LSP fragment extension in mode-1. R1-1 and R1-2 are virtual systems of R1 and send LSPs carrying some of R1's routing information. After receiving LSPs from R1, R1-1, and R1-2, R2 considers that there are three individual routers at the remote end and calculates routes accordingly.
Because the cost of the route from R1 to R1-1 and the cost of the route from R1 to R1-2 are both 0, the cost of the route from R2 to R1 is the same as the cost of the route from R2 to R1-1.
• The LSPs generated by virtual systems contain only the originating system as the neighbor (the neighbor type is P2P), and virtual systems are considered only as leaves.
Mode-2
• It is used when all the routers on the network support LSP fragment extension. In this mode, virtual systems do not participate in SPF calculation.
All the routers on the network know that the LSPs generated by virtual systems actually belong to the originating system.
• R2 supports LSP fragment extension, and R1 is configured to support LSP fragment extension in mode-2. R1-1 and R1-2 are virtual systems of R1 and send LSPs carrying some routing information of R1. When receiving LSPs from R1-1 and R1-2, R2 obtains the IS Alias ID TLV and learns that the originating system of R1-1 and R1-2 is R1. R2 then considers that the information advertised by R1-1 and R1-2 belongs to R1.
Precautions
After LSP fragment extension is configured, the system prompts you to restart the IS-IS process if information has been lost because LSPs overflowed. After the restart, the originating system loads as much routing information as possible into its own LSPs and places the overflowed information into the LSPs of the virtual systems for transmission.
If there are devices of other vendors on the network, LSP fragment extension must be set to mode-1; otherwise, the devices of other vendors cannot identify the LSPs.
It is recommended that you configure LSP fragment extension and virtual systems before establishing IS-IS neighbor relationships or importing routes. If neighbor relationships are established or routes are imported first, IS-IS may already carry more information than 256 fragments can hold; you must then configure LSP fragment extension and virtual systems, and the configuration takes effect only after the IS-IS process is restarted. Therefore, exercise caution when you establish IS-IS neighbor relationships or import routes first.
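As a configuration sketch of the feature described above (process ID and virtual system IDs are illustrative), mode-1 LSP fragment extension with two virtual systems could be enabled as follows:

```
# Sketch: enable LSP fragment extension in mode-1 on R1
# (system IDs 0000.0000.1001/1002 are illustrative)
[R1] isis 1
[R1-isis-1] lsp-fragments-extend mode-1 level-2
[R1-isis-1] virtual-system 0000.0000.1001
[R1-isis-1] virtual-system 0000.0000.1002
# If LSPs have already overflowed, restart the IS-IS process
# for the configuration to take effect.
```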
IS-IS Administrative Tag
Administrative tags control the advertisement of IP prefixes in an IS-IS routing domain to simplify route management. You can use administrative tags to control the import of routes of different levels and different areas and to control IS-IS multi-instances running on the same router.
Topology
Assume that R1 needs to receive only the Level-1 routing information advertised by R2, R3, and R4. To meet this requirement, configure the same administrative tag on the IS-IS interfaces of R2, R3, and R4. Then configure the Level-1-2 router in area 47.0003 to leak only the routes matching the configured administrative tag from the Level-2 area to the Level-1 area. This configuration allows R1 to receive only the Level-1 routing information from R2, R3, and R4.
Precautions
To use administrative tags, you must enable the IS-IS wide metric attribute.
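A sketch of the tag-based approach described above (process ID, tag value 100, and policy name are illustrative): the tag is set on the IS-IS interfaces of R2, R3, and R4, and the Level-1-2 router leaks only the matching routes.

```
# On the IS-IS interfaces of R2/R3/R4 (tag value is illustrative):
[R2-GigabitEthernet0/0/1] isis tag-value 100
# On the Level-1-2 router in area 47.0003:
[R5] route-policy TAG100 permit node 10
[R5-route-policy] if-match tag 100
[R5-route-policy] quit
[R5] isis 1
[R5-isis-1] cost-style wide        # wide metric is required for tags
[R5-isis-1] import-route isis level-2 into level-1 filter-policy route-policy TAG100
```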
Case Description
In this case, the addresses for interconnecting devices are as follows:
• If RX interconnects with RY, their interconnection addresses are XY.1.1.X and XY.1.1.Y respectively, with a 24-bit network mask.
Remarks
R4 and R5 are Level-1-2 routers. They participate in both Level-1 and Level-2 route calculation and maintain both the Level-1 and Level-2 LSDBs.
Command Usage
The is-level command sets the level of an IS-IS router. By default, the level of an IS-IS router is Level-1-2.
The isis circuit-level command sets the link type of an interface.
View
is-level: IS-IS view
isis circuit-level: interface view
Parameters
is-level { level-1 | level-1-2 | level-2 }
level-1: sets a router as a Level-1 router, which calculates only intra-area routes and maintains a Level-1 LSDB.
level-1-2: sets a router as a Level-1-2 router, which calculates Level-1 and Level-2 routes and maintains a Level-1 LSDB and a Level-2 LSDB.
level-2: sets a router as a Level-2 router, which exchanges only Level-2 LSPs, calculates only Level-2 routes, and maintains a Level-2 LSDB.
isis circuit-level [ level-1 | level-1-2 | level-2 ]
level-1: specifies the Level-1 link type. That is, only Level-1 neighbor relationships can be established on the interface.
level-1-2: specifies the Level-1-2 link type. That is, both Level-1 and Level-2 neighbor relationships can be established on the interface.
level-2: specifies the Level-2 link type. That is, only Level-2 neighbor relationships can be established on the interface.
Precautions
If a router is a Level-1-2 router and needs to establish a neighbor relationship at a specified level (Level-1 or Level-2) with a peer router, you can run the isis circuit-level command so that the local interface sends and receives only Hello packets of the specified level on the P2P link. This configuration prevents the router from processing unnecessary Hello packets and saves bandwidth.
The configuration of the isis circuit-level command takes effect on an interface only when the IS-IS system type is Level-1-2; otherwise, the level configured using the is-level command is used as the link type.
On a P2P network, the Circuit ID uniquely identifies a local interface. On a broadcast network, the Circuit ID consists of the system ID and the pseudonode ID.
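The two commands can be combined as in the following sketch (process and interface numbers are illustrative): R4 stays Level-1-2 globally but establishes only a Level-2 adjacency on one interface.

```
[R4] isis 1
[R4-isis-1] is-level level-1-2        # the default level, shown for clarity
[R4-isis-1] quit
[R4] interface GigabitEthernet 0/0/1
[R4-GigabitEthernet0/0/1] isis circuit-level level-2   # only Level-2 Hellos on this link
```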
Case Description
The topology in this case is the same as that in the previous case. It is required that no DIS be elected between R4 and R6 or between R5 and R6. That is, the links between R4 and R6 and between R5 and R6 cannot behave as broadcast links. Note that even the smallest priority, 0, still allows an IS-IS router to participate in DIS election, so lowering the priority alone cannot prevent a DIS from being elected.
Command Usage
The isis dis-priority command sets the priority of an interface that is a candidate for the DIS at a specified level.
The isis circuit-type command simulates the network type of an interface as P2P.
View
isis dis-priority: interface view
isis circuit-type: interface view
Parameters
isis dis-priority priority [ level-1 | level-2 ]
priority: specifies the priority for electing the DIS. The value ranges from 0 to 127, and the default value is 64. The greater the value, the higher the priority.
level-1: indicates the priority for electing the Level-1 DIS.
level-2: indicates the priority for electing the Level-2 DIS.
isis circuit-type p2p
Sets the network type of the interface to P2P.
Precautions
The isis dis-priority command takes effect only on a broadcast link. The isis circuit-type command takes effect only on a broadcast interface. The network types of the IS-IS interfaces on both ends of a link must be the same; otherwise, the two interfaces cannot establish a neighbor relationship.
Configuration Verification
Run the display isis interface process-id command and view the DIS field in the command output.
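For the case requirement that no DIS be elected, a sketch of the two commands discussed above (interface numbers are illustrative):

```
# Changing the DIS priority does NOT prevent DIS election in IS-IS:
[R4-GigabitEthernet0/0/2] isis dis-priority 0 level-2
# Simulating a P2P link prevents any DIS from being elected
# (must be configured on both ends of the link):
[R4-GigabitEthernet0/0/2] isis circuit-type p2p
```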
Case Description
The topology in this case is the same as that in the previous case. Company A requires route control. When configuring tags, you must also enable the IS-IS wide metric on all devices in the network so that the tags can be transmitted across the entire network. In addition, Level-2 routes are not automatically leaked to Level-1 areas; route leaking must be configured manually.
Command Usage
The import-route command configures IS-IS to import routes from other routing protocols.
The import-route isis level-2 into level-1 command controls route leaking from Level-2 areas to Level-1 areas. The command needs to be configured on the Level-1-2 routers that are connected to external areas.
The cost-style command sets the cost style of routes sent and received by an IS-IS router.
View
import-route: IS-IS view
import-route isis level-2 into level-1: IS-IS view
cost-style: IS-IS view
Parameters
import-route isis level-2 into level-1 [ filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name | route-policy route-policy-name } | tag tag ]
filter-policy: indicates the route filtering policy.
acl-number: specifies the number of a basic ACL.
acl-name acl-name: specifies the name of a named ACL.
ip-prefix ip-prefix-name: specifies the name of an IP prefix list. Only the routes that match the IP prefix list can be leaked.
route-policy route-policy-name: specifies the name of a route-policy.
tag tag: assigns administrative tags to the leaked routes.
cost-style { narrow | wide | wide-compatible }
narrow: indicates that the device can receive and send routes with the narrow cost style.
wide: indicates that the device can receive and send routes with the wide cost style.
wide-compatible: indicates that the device can receive routes with the narrow or wide cost style but sends only routes with the wide cost style.
Precautions
To transmit tags across the entire network, run the cost-style wide command on all devices in the network.
Configuration Verification
Run the display isis route command to view tag information.
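A sketch combining the commands above (process ID and tag value are illustrative): enable the wide cost style and assign a tag while leaking Level-2 routes into the Level-1 area.

```
[R4] isis 1
[R4-isis-1] cost-style wide
[R4-isis-1] import-route isis level-2 into level-1 tag 100
# Verify the tag on the leaked routes:
[R4] display isis route
```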
Case Description The topology in this case is the same as that in the previous case. Company A reconstructs its network. IS-IS uses ACLs, IP prefix lists, and tags to control routes.
Command Usage
The filter-policy import command allows IS-IS to filter the received routes to be added to the IP routing table.
View
filter-policy import: IS-IS view
Parameters
filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name | route-policy route-policy-name } import
acl-number: specifies the number of a basic ACL.
acl-name acl-name: specifies the name of a named ACL.
ip-prefix ip-prefix-name: specifies the name of an IP prefix list.
route-policy route-policy-name: specifies the name of a route-policy that filters routes based on tags and other protocol parameters.
Precautions
The filter-policy import command controls which IS-IS routes are added to the local IP routing table. LSP flooding is not affected, so the LSDBs of neighbors remain complete.
The filter-policy export command takes effect only on the routes imported using the import-route command.
In the display isis route output, +IP-Extended indicates that the wide metric is supported, and the symbol * indicates that the route is learned through route leaking.
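As a sketch of received-route filtering (the prefix-list name and addresses are illustrative), the following prevents routes to 10.0.0.0/24 from entering the local IP routing table while leaving the LSDB untouched:

```
[R1] ip ip-prefix NO-10 index 10 deny 10.0.0.0 24
[R1] ip ip-prefix NO-10 index 20 permit 0.0.0.0 0 less-equal 32
[R1] isis 1
[R1-isis-1] filter-policy ip-prefix NO-10 import
```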
Case Description
IS-IS authentication is classified into area authentication, routing domain authentication, and interface authentication.
Command Usage
The area-authentication-mode command configures an IS-IS area to authenticate received Level-1 packets (LSPs and SNPs) using the specified authentication mode and password, and to add authentication information to Level-1 packets to be sent.
The isis authentication-mode command configures an IS-IS interface to authenticate Hello packets using the specified mode and password.
View
area-authentication-mode: IS-IS view
isis authentication-mode: interface view
Parameters
isis authentication-mode { simple password | md5 password-key | keychain keychain-name } [ level-1 | level-2 ] [ ip | osi ] [ send-only ]
simple password: indicates that the password is transmitted in plain text.
md5 password-key: indicates that the password to be transmitted is encrypted using MD5.
keychain keychain-name: specifies a keychain that changes with time.
level-1: sets Level-1 authentication.
level-2: sets Level-2 authentication.
ip: indicates the IP authentication password. This parameter cannot be configured in the keychain authentication mode.
osi: indicates the OSI authentication password. This parameter cannot be configured in the keychain authentication mode.
send-only: indicates that the router encapsulates sent Hello packets with authentication information but does not authenticate received Hello packets.
area-authentication-mode { simple password | md5 password-key | keychain keychain-name } [ ip | osi ] [ snp-packet { authentication-avoid | send-only } | all-send-only ]
simple password: indicates that the password is transmitted in plain text.
md5 password-key: indicates that the password to be transmitted is encrypted using MD5.
keychain keychain-name: specifies a keychain that changes with time.
ip: indicates the IP authentication password. This parameter cannot be configured in the keychain authentication mode.
osi: indicates the OSI authentication password. This parameter cannot be configured in the keychain authentication mode.
snp-packet: authenticates SNPs.
snp-packet send-only: indicates that the router encapsulates sent SNPs with authentication information but does not authenticate received SNPs. Generated LSPs still carry authentication information, and received LSPs are authenticated.
snp-packet authentication-avoid: indicates that the router neither encapsulates generated SNPs with authentication information nor authenticates received SNPs. The router encapsulates generated LSPs with authentication information and authenticates received LSPs.
all-send-only: indicates that the router encapsulates generated LSPs and SNPs with authentication information but does not authenticate received LSPs and SNPs.
Precautions
The area-authentication-mode command takes effect only on Level-1 and Level-1-2 routers.
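The two authentication commands can be sketched as follows (passwords and interface numbers are illustrative); both ends of a link, and all routers in the area, must use the same mode and password:

```
# Area authentication (Level-1 LSPs and SNPs):
[R1] isis 1
[R1-isis-1] area-authentication-mode md5 Huawei-123
# Interface (Hello) authentication:
[R1] interface GigabitEthernet 0/0/1
[R1-GigabitEthernet0/0/1] isis authentication-mode md5 Huawei-123 level-1
```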
Case Description
In this case, the addresses for interconnecting devices are as follows:
• If RX interconnects with RY, their interconnection addresses are XY.1.1.X and XY.1.1.Y respectively, with a 24-bit network mask.
R2 connects to R3 and R1 through serial interfaces. R1 and R3 connect through Ethernet interfaces. R1 connects to network 10.0.0.0/24 through G0/0/1.
Results You can run the display isis peer command to check whether neighbor relationships are established successfully.
Results You can run the display isis interface command to view IS-IS interface information.
Results You can run the display ip routing-table command to view the routing table.
Case Description
In this case, the network runs IS-IS.
Requirement Analysis
The function of logging IS-IS neighbor state changes (the log prompt function) is disabled by default.
Command Usage
The nexthop command sets the preferences of equal-cost routes. After IS-IS calculates equal-cost routes using the SPF algorithm, the next hop is chosen from these equal-cost routes based on the value of weight. The smaller the value, the higher the preference.
Parameters
nexthop ip-address weight value
ip-address: indicates the next hop address.
weight value: indicates the next hop weight. The value is an integer that ranges from 1 to 254. By default, the weight is 255 (that is, no weight is configured).
Command Usage
The summary ip-address mask [ avoid-feedback | generate_null0_route ] command configures IS-IS route summarization. The avoid-feedback keyword prevents the router from learning the summary route back through route calculation, and generate_null0_route generates a route to the Null0 interface to prevent routing loops. Logging of neighbor state changes must be enabled manually.
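A sketch of the summarization command described above (the summary address is illustrative):

```
[R2] isis 1
[R2-isis-1] summary 172.16.0.0 255.255.0.0 avoid-feedback level-2
# Alternatively, install a guard route to Null0 to prevent loops:
[R2-isis-1] summary 172.16.0.0 255.255.0.0 generate_null0_route level-2
```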
OSPF topology: OSPF divides an Autonomous System (AS) into one or multiple logical areas. All non-backbone areas must connect to Area 0, which is the backbone area.
Router types:
Internal router: All interfaces on an internal router belong to the same OSPF area.
Area Border Router (ABR): An ABR belongs to two or more areas, one of which must be the backbone area. An ABR connects the backbone area and non-backbone areas, and it can be physically or logically connected to the backbone area.
Backbone router: At least one interface on a backbone router belongs to the backbone area. All internal routers in Area 0 and all ABRs are backbone routers.
AS Boundary Router (ASBR): An ASBR exchanges routing information with other ASs. An ASBR does not necessarily reside on the border of an AS; it can be an internal router or an ABR. An OSPF device that has imported external routing information becomes an ASBR.
Differences between OSPF and IS-IS in the topology: In OSPF, a link belongs to only one area. In IS-IS, a link can belong to different areas because the area boundary lies on a link rather than inside a router.
In IS-IS, no area is physically defined as the backbone or non-backbone area. In OSPF, Area 0 is defined as the backbone area. In IS-IS, Level-1 and Level-2 routers use the shortest path first (SPF) algorithm to generate shortest path trees (SPTs) respectively. In OSPF, the SPF algorithm is used only in the same area, and inter-area routes are forwarded by the backbone area.
OSPF supports the following network types:
P2P: A network where the link layer protocol is PPP or HDLC is a P2P network by default. On a P2P network, protocol packets such as Hello, DD, LSR, LSU, and LSAck packets are sent in multicast mode using the multicast address 224.0.0.5.
P2MP: No network is a P2MP network by default, regardless of the link layer protocol used on the network; a network must be changed to the P2MP type. The common practice is to change a non-fully meshed NBMA network to a P2MP network. On a P2MP network, Hello packets are sent in multicast mode using the multicast address 224.0.0.5, and other protocol packets, such as DD, LSR, LSU, and LSAck packets, are sent in unicast mode.
NBMA: A network where the link layer protocol is ATM or FR is an NBMA network by default. On an NBMA network, protocol packets such as Hello, DD, LSR, LSU, and LSAck packets are sent in unicast mode.
Broadcast: A network where the link layer protocol is Ethernet or FDDI is a broadcast network by default. On a broadcast network, Hello, LSU, and LSAck packets are usually sent in multicast mode. The multicast address 224.0.0.5 is used by all OSPF routers, and the multicast address 224.0.0.6 is reserved for the OSPF designated router (DR) and backup designated router (BDR). DD and LSR packets are transmitted in unicast mode.
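As a sketch (the interface number is illustrative), a non-fully meshed NBMA interface can be changed to the P2MP type as described above:

```
[R1] interface Serial 1/0/0
[R1-Serial1/0/0] ospf network-type p2mp
```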
DR/BDR functions
Reduce the number of adjacencies and thus the number of times that link-state information and routing information are exchanged. A DROther establishes a full adjacency only with the DR and BDR, and the DR and BDR establish a full adjacency with each other.
The DR generates a Network-LSA to describe the NBMA or broadcast network segment.
DR/BDR election rules
When Hello packets are used for DR/BDR election, the DR/BDR is elected based on the Router Priority of interfaces. If Router Priority is set to 0, the router cannot be elected as the DR or BDR. A larger Router Priority value indicates a higher priority. If two interfaces have the same Router Priority, the interface with the larger Router ID wins the election.
The DR/BDR election is non-preemptive. If the DR fails, the BDR automatically becomes the new DR, and a new BDR is elected on the network. If the BDR fails, the DR does not change, and a new BDR is elected.
Differences between the IS-IS DIS and OSPF DR/BDR:
On an IS-IS broadcast network, routers with priority 0 still participate in DIS election. On an OSPF network, routers with priority 0 do not participate in DR election.
On an IS-IS broadcast network, when a new router meeting DIS conditions joins the network, the router is elected as the new DIS and the original pseudonode is deleted, which causes LSP flooding. On an OSPF network, a new router does not immediately become the DR on the network segment even if it has the highest DR priority.
On an IS-IS broadcast network, routers of the same level on the same network segment form adjacencies with each other, including all non-DIS routers.
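A sketch of influencing the election on a broadcast interface (the interface number is illustrative); note that, unlike IS-IS, priority 0 excludes an OSPF router from the election entirely:

```
[R3] interface GigabitEthernet 0/0/0
[R3-GigabitEthernet0/0/0] ospf dr-priority 0   # never DR/BDR on this segment
# Verify with: display ospf peer
```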
Overview of OSPF packets
OSPF packets are transmitted at the network layer, with protocol number 89. There are five types of OSPF packets, all of which share the same packet header format. All OSPF packets except the Hello packet carry LSA or LSA header information.
OSPF packet header information
All OSPF packets have the same OSPF packet header.
Version: specifies the OSPF version number. For OSPFv2, this field is set to 2.
Type: specifies the OSPF packet type. There are five types of OSPF packets.
Packet length: specifies the total length of the OSPF packet, including the packet header, in bytes.
Router ID: specifies the router ID of the router generating the packet.
Area ID: specifies the area to which the packet is to be advertised.
Checksum: specifies the standard IP checksum of the entire packet (including the packet header).
AuType: specifies the authentication mode.
Authentication: specifies information for authenticating packets, such as the password.
Hello packet
Network Mask: specifies the network mask of the interface sending Hello packets.
HelloInterval: specifies the interval for sending Hello packets, in seconds.
Options: specifies the optional capabilities supported by the OSPF router sending the Hello packet. Detailed functions are not covered in this course.
Rtr Pri: specifies the router priority of the interface sending Hello packets. This field is used for electing the DR and BDR.
RouterDeadInterval: specifies the time, in seconds, after which a neighbor on the network segment is declared down if no Hello packet is received from it. In most cases, the value of this field is four times HelloInterval.
Designated Router: specifies the interface IP address of the DR elected by the routers sending Hello packets. The value 0.0.0.0 indicates that no DR has been elected.
Backup Designated Router: specifies the interface IP address of the BDR elected by the routers sending Hello packets. The value 0.0.0.0 indicates that no BDR has been elected.
Neighbor: lists the router IDs of the neighbors from which the router has received valid Hello packets.
DD packet
Interface MTU: specifies the maximum IP packet size that the interface on the originating router can send without fragmentation. The value of this field is 0x0000 on a virtual link.
Options: is the same as that in the Hello packet.
I-bit: is set to 1 in the first DD packet of a series of DD packets, and to 0 in subsequent DD packets.
M-bit: is set to 1 when the sent DD packet is not the last one, and to 0 in the last DD packet.
MS-bit: is set to 1 when the originating router claims to be the master router.
DD Sequence Number: specifies the sequence number of the DD packet. The body of a DD packet carries LSA header information.
LSR packet
Link State Advertisement Type: specifies the LSA type, such as router-LSA or network-LSA.
Link State ID: varies depending on the LSA type.
Advertising Router: specifies the router ID of the router that originated the LSA.
LSU packet
Number of LSAs: specifies the number of LSAs carried in the LSU packet.
LSA: specifies the detailed LSA information.
LSAck packet
Header of LSA: specifies the LSA header information used to acknowledge received LSAs.
LSA header information (contained in all OSPF packets except Hello packets)
LS age: specifies the age of the LSA, in seconds.
Options: specifies the optional capabilities supported in the area where the LSA is flooded.
LS type: identifies the format and function of the LSA. There are five commonly used LSA types.
Link State ID: varies with the LSA type.
Advertising Router: specifies the router ID of the router that originated the LSA.
Sequence Number: increases as new instances of the LSA are generated. This field allows other routers to identify the latest LSA instance.
Checksum: indicates the checksum of all information in the LSA except the LS age field, which is excluded because it changes as the LSA ages.
Length: specifies the length of the LSA, including the LSA header.
Router-LSA (describes all interfaces or links on the originating router)
Link State ID: specifies the router ID of the originating router.
V: is set to 1 when the originating router is an endpoint of one or more fully adjacent virtual links.
E: is set to 1 when the originating router is an ASBR.
B: is set to 1 when the originating router is an ABR.
Number of links: specifies the number of router links described in the LSA.
Link Type: indicates the link type. The value of this field can be:
1: point-to-point connection to another router
2: connection to a transit network, such as a broadcast or NBMA network
3: connection to a stub network, such as a loopback interface
4: virtual link
Link ID: specifies the link ID. Its meaning depends on Link Type:
1: neighbor router ID
2: IP address of the DR interface
3: IP network or subnet address
4: neighbor router ID
Link Data: provides additional information about the link. This field specifies the IP address of the interface on the originating router connected to the network when the value of Link Type is 1 or 2, and specifies the subnet mask of the network when the value of Link Type is 3.
ToS: is not supported.
Metric: specifies the metric of the link or interface.
Network-LSA
Link State ID: specifies the IP address of the DR interface.
Network Mask: specifies the subnet mask used on the network.
Attached router: lists the router IDs of the DR and of all routers that have established full adjacencies with the DR on the broadcast or NBMA network.
Network-summary-LSA and ASBR-summary-LSA
Link State ID: specifies the IP address of the network or subnet in a Type 3 LSA. In a Type 4 LSA, this field specifies the router ID of the ASBR.
Network Mask: specifies the subnet mask of the network in a Type 3 LSA. In a Type 4 LSA, this field has no meaning and is set to 0.0.0.0.
Metric: specifies the metric of the route to the destination.
AS-external-LSA
Link State ID: specifies the IP address of the advertised network or subnet.
Network Mask: specifies the subnet mask of the advertised destination network.
E: specifies the type of the external route. The value 1 indicates the E2 metric, and the value 0 indicates the E1 metric.
Metric: specifies the metric of the route and is set by the ASBR.
Forwarding Address: specifies the address to which packets destined for the advertised destination should be forwarded. When this field is set to 0.0.0.0, packets are forwarded to the ASBR that originated the LSA.
External Route Tag: identifies the external route.
NSSA LSA
Forwarding Address: When an internal route is advertised between an NSSA ASBR and the neighboring AS, this field is set to the next-hop address on the local network. Otherwise, this field is set to an interface address on a stub network of the originating router (such as a loopback interface); if the router has multiple stub networks, the largest IP address is chosen.
Options field:
DN: prevents loops on an MPLS VPN network. When a Type 3, 5, or 7 LSA is sent from a PE to a CE, the DN bit MUST be set. When the PE receives, from a CE router, a Type 3, 5, or 7 LSA with the DN bit set, the information from that LSA MUST NOT be used during the OSPF route calculation.
O: indicates that the originating router supports Opaque LSAs (Type 9, 10, and 11 LSAs).
DC-bit: indicates that the originating router supports OSPF over on-demand links.
EA: indicates that the originating router can receive and forward External-Attributes-LSAs (Type 8 LSAs).
N-bit: exists only in Hello packets. The value 1 indicates that the router supports Type 7 LSAs. The value 0 indicates that the router does not receive or send NSSA LSAs.
P-bit: exists only in NSSA LSAs. This field instructs an NSSA ABR to translate the Type 7 LSA into a Type 5 LSA.
MC-bit: is set when the originating router supports multicast (MOSPF).
E-bit: indicates that the originating router can receive AS-external LSAs. This field is set to 1 in all Type 5 LSAs and in LSAs sent from the backbone and normal areas, and to 0 in LSAs sent from stub and NSSA areas. In a Hello packet, this field indicates that the interface can receive and send Type 5 LSAs.
MT-bit: indicates that the originating router supports multi-topology routing.
Neighbor states:
Down: the initial state of a neighbor session. In this state, the router has received no Hello packet from the neighbor.
Init: the router has received Hello packets from the neighbor, but its own router ID is not listed in the Neighbor field of those Hello packets, so bidirectional communication has not yet been established.
2-Way: bidirectional communication has been established (the router sees its own router ID in the Neighbor field of the received Hello packets), but the adjacency has not been established. This is the highest state before an adjacency is established. On a broadcast or NBMA network, routers in this state elect the DR/BDR.
When the neighbor relationship is established, routers check the parameters carried in Hello packets:
If the network type of the interface receiving Hello packets is broadcast, P2MP, or NBMA, the Network Mask field in the Hello packets must be the same as the network mask of the receiving interface. If the network type of the interface is P2P or virtual link, the Network Mask field is not checked.
The HelloInterval and RouterDeadInterval fields in a Hello packet must be the same as those configured on the receiving interface.
The Authentication field in a Hello packet must match the authentication configured on the receiving interface.
The E-bit in the Options field of a Hello packet must match the area configuration (stub or not) of the receiving interface.
The Area ID field in a Hello packet must be the same as the area configured on the receiving interface.
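A sketch of aligning the checked Hello timers on an interface (the values shown are the common defaults for broadcast networks; the interface number is illustrative):

```
[R1] interface GigabitEthernet 0/0/1
[R1-GigabitEthernet0/0/1] ospf timer hello 10
[R1-GigabitEthernet0/0/1] ospf timer dead 40   # commonly 4 x HelloInterval
```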
Neighbor relationship setup:
When the neighbor state machine on R1 is ExStart, R1 sends its first DD packet to R2. Assume that the fields in this DD packet are set as follows: DD Sequence Number is set to 552A; the I-bit is set to 1, indicating that this is the first DD packet; the M-bit is set to 1, indicating that more DD packets are to follow; and the MS-bit is set to 1, indicating that R1 claims to be the master router.
When the neighbor state machine on R2 is ExStart, R2 sends its first DD packet, in which DD Sequence Number is set to 5528, to R1. The router ID of R2 is larger than that of R1, so R2 becomes the master router. After the router ID comparison is complete, R1 generates a NegotiationDone event and changes its neighbor state machine from ExStart to Exchange.
When the neighbor state machine on R1 is Exchange, R1 sends a new DD packet describing its local LSDB. In this DD packet, DD Sequence Number is set to the sequence number of the DD packet sent by R2, the M-bit is set to 0, indicating that no more DD packets are needed to describe the local LSDB, and the MS-bit is set to 0, indicating that R1 acts as the slave router. After receiving this DD packet, R2 generates a NegotiationDone event and changes its neighbor state machine to Exchange.
When the neighbor state machine on R2 is Exchange, R2 sends a new DD packet describing its local LSDB. In this DD packet, DD Sequence Number is increased by 1 (5528 + 1 = 5529).
As the slave router, R1 must acknowledge each DD packet from R2 even though R1 does not need to update its LSDB using the new DD packets, so R1 sends an empty DD packet with DD Sequence Number 5529.
When the neighbor state machine on R1 is Loading, R1 sends a Link State Request (LSR) packet to request the link state information that was learned from DD packets during the Exchange state but is not contained in the local LSDB. After receiving the LSR packet, R2 sends a Link State Update (LSU) packet containing the detailed link state information to R1. After receiving the LSU packet, R1 changes its neighbor state machine from Loading to Full. R1 then sends a Link State Acknowledgement (LSAck) packet to R2 to ensure reliable transmission. LSAck packets are used to acknowledge received LSAs.
OSPF can define stub and totally stub areas. A stub area is a special area into which the ABR does not flood the received AS external routes. Routers in a stub area therefore maintain fewer routing entries and transmit less routing information. The stub area is an optional configuration, but not all areas can be configured as stub areas. Generally, a stub area is a non-backbone area with only one ABR and is located at the AS boundary. To ensure the reachability of AS external routes, the ABR in a stub area generates a Type 3 LSA carrying a default route and advertises it within the entire stub area.
Stub area
The backbone area cannot be configured as a stub area.
If an area needs to be configured as a stub area, all the routers in this area must be configured with the stub attribute.
An ASBR cannot exist in a stub area. That is, AS external routes are not flooded in the stub area.
A virtual link cannot pass through a stub area.
Type 5 LSAs cannot be advertised within a stub area. A router in the stub area must reach AS external networks through the ABR. The ABR automatically generates a Type 3 LSA carrying a default route and advertises it within the entire stub area.
Totally stub area
Neither Type 3 LSAs (except the one carrying the default route) nor Type 5 LSAs can be advertised within a totally stub area.
A router in the totally stub area must reach AS external and inter-area networks through the ABR. The ABR automatically generates a Type 3 LSA carrying a default route and advertises it within the entire totally stub area.
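A configuration sketch of the stub variants described above (the area ID is illustrative); the stub attribute must be set on every router in the area, while no-summary is meaningful only on the ABR:

```
# On every router in area 1:
[R2] ospf 1
[R2-ospf-1] area 1
[R2-ospf-1-area-0.0.0.1] stub
# On the ABR only, to make area 1 totally stub:
[ABR-ospf-1-area-0.0.0.1] stub no-summary
```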
To prevent a large number of external routes from consuming the bandwidth and storage resources of routers in a stub area, OSPF defines that stub areas cannot import external routes. However, stub areas cannot meet the requirements of scenarios that require the import of external routes while still limiting resource consumption. Therefore, NSSAs (not-so-stubby areas) are introduced.
Type 7 LSA
Type 7 LSAs are defined in an NSSA to describe AS external routes. Type 7 LSAs are generated by the ASBR in an NSSA and advertised only within that NSSA. When receiving Type 7 LSAs, an ABR in the NSSA selectively translates the Type 7 LSAs into Type 5 LSAs so that the external routes can be advertised to other areas of the OSPF network. Type 7 LSAs can also carry default route information to guide traffic to other ASs.
To advertise the external routes imported by an NSSA area to other areas, an ABR in the NSSA area needs to translate Type 7 LSAs to Type 5 LSAs so that the external routes can be advertised on the entire OSPF network. The P-bit informs routers whether a Type 7 LSA needs to be translated. By default, the ABR with the largest router ID in an NSSA area translates Type 7 LSAs to Type 5 LSAs. A Type 7 LSA can be translated to a Type 5 LSA only when the P-bit is set and the Forwarding Address is not 0. The Forwarding Address identifies the address inside the OSPF domain to which packets destined for the external route should be forwarded.
Default Type 7 LSAs that meet the preceding conditions can also be translated. However, the P-bit is not set in the Type 7 LSAs generated by ABRs, so these LSAs are never translated.
Precautions Multiple ABRs may be deployed in an NSSA area. To prevent routing loops, ABRs do not calculate the default routes advertised by each other.
NSSA and totally NSSA
A small number of AS external routes learned from the ASBR in an NSSA area can be imported to the NSSA area. Type 5 LSAs cannot be advertised within the NSSA area, but routers can still learn the AS external routes from the ASBR. Neither Type 3 nor Type 5 LSAs can be advertised within a totally NSSA area.
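As with stub areas, the NSSA variants above correspond to a short VRP configuration sketch (the process ID and area number are illustrative):

```
# On every router in the NSSA area:
ospf 1
 area 1
  nssa
#
# On the ABR only, to make the area a totally NSSA area
# (Type 3 LSAs other than the default route are also blocked):
ospf 1
 area 1
  nssa no-summary
```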
Fast convergence
I-SPF improves the SPF algorithm. Except when calculation is performed for the first time, only changed nodes, rather than all nodes, are involved in calculation. The SPT ultimately generated is the same as that generated by the full algorithm. This decreases CPU usage and speeds up network convergence.
Similar to I-SPF, PRC calculates only the changed routes. PRC, however, does not calculate the shortest path; it updates routes based on the SPT calculated by I-SPF. In route calculation, a leaf represents a route, and a node represents a router. A change in the SPT or in a leaf causes a change in routing information, but SPT changes and leaf changes are independent of each other. PRC processes routing information based on the SPT or leaf changes: • When the SPT is changed, PRC processes the routing information on all leaves of the changed nodes. • When the SPT is not changed, PRC does not process routing information on nodes. • When a leaf is changed, PRC processes the routing information on the changed leaf. • When a leaf is not changed, PRC does not process routing information on the leaf.
The OSPF intelligent timer controls route calculation, LSA generation, and LSA receiving to speed up network convergence. The OSPF intelligent timer speeds up network convergence in the following modes:
• On a network where routes are frequently calculated, the OSPF intelligent timer dynamically adjusts the interval for calculating routes based on the user configuration and the exponential backoff technology. In this manner, route calculation and CPU resource consumption are decreased, and routes are calculated after the network topology becomes stable.
• On an unstable network, if a router generates or receives LSAs due to frequent topology changes, the OSPF intelligent timer can dynamically adjust the interval for generating and handling LSAs. No LSA is generated or handled within an interval, which prevents invalid LSAs from being generated and advertised on the entire network.
The OSPF intelligent timer helps calculate routes as follows: • Based on the local LSDB, a router that runs OSPF calculates the SPT with itself as the root using the SPF algorithm, and determines the next hop to the destination network according to the SPT. Changing the interval for SPF calculation can prevent the bandwidth and resource consumption caused by frequent LSDB changes. • On a network that requires short route convergence time, specify the interval for route calculation in milliseconds to increase the route calculation frequency and speed up route convergence. • When the OSPF LSDB changes, the shortest path needs to be recalculated. If a network changes frequently and the shortest path is calculated continually, a large number of system resources will be consumed, affecting router performance. You can configure an intelligent timer and set a proper interval for SPF calculation to prevent memory and bandwidth resources from being consumed. • After the OSPF intelligent timer is used: • The initial interval for SPF calculation is specified by the parameter start-interval. • The interval for SPF calculation for the nth (n ≥ 2) time is hold-interval × 2^(n – 1).
• When hold-interval × 2^(n – 1) reaches the maximum interval specified by max-interval, OSPF performs SPF calculation at the maximum interval for three consecutive times, and then returns to performing SPF calculation at the initial interval specified by start-interval.
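On Huawei VRP, the intelligent timer described above is configured in the OSPF view. The following sketch assumes the spf-schedule-interval intelligent-timer form of the command (the exact keywords may vary by VRP version), with intervals in milliseconds:

```
ospf 1
 # max-interval 10000 ms, start-interval 500 ms, hold-interval 1000 ms
 spf-schedule-interval intelligent-timer 10000 500 1000
```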
Priority-based convergence Filter routes based on the IP prefix list. Set different priorities for the routes so that routes with the highest priority are preferentially converged, improving network reliability.
Setting the maximum number of non-default external routes on a router can prevent an OSPF database overflow. You must set the same maximum number of non-default external routes for all routers on an OSPF network. If the number of external routes on a router reaches the configured maximum number, the router enters the overflow state and starts the overflow timer. The router automatically leaves the overflow state after the overflow timer expires. The default timeout period is 5 seconds. The OSPF database overflow process is as follows: When entering the overflow state, a router deletes all non-default external routes that are generated by itself. When staying in the overflow state, the router does not generate non-default external routes, discards newly received non-default external routes, and does not reply with LSAck packets. When the overflow timer expires, the router checks whether the number of external routes still exceeds the maximum value. If so, the router restarts the timer; if not, the router leaves the overflow state. When leaving the overflow state, the router deletes the overflow timer, generates non-default external routes, accepts newly received non-default external routes, replies with LSAck packets, and gets ready to enter the overflow state again.
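A minimal VRP sketch of the overflow limit described above (the limit value is illustrative; the command is assumed to be lsdb-overflow-limit in the OSPF view):

```
ospf 1
 # Enter the overflow state when more than 1000 non-default
 # external routes exist in the LSDB:
 lsdb-overflow-limit 1000
```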
During OSPF deployment, all non-backbone areas must be connected to the backbone area to ensure that all areas are reachable.
Two ABRs use a virtual link to directly transmit OSPF packets. The routers between the two ABRs only forward packets. Because the destination of OSPF packets is not these routers, the routers transparently forward the OSPF packets as common IP packets. If a virtual link is not properly deployed, a loop may occur.
When both area-based and interface-based authentication are configured, interface-based authentication takes precedence.
The OSPF default route is generally applied to the following scenarios:
• An ABR in an area advertises Type 3 LSAs carrying the default route within the area. Routers in the area use the received default route to forward inter-area packets.
• An ASBR advertises Type 5 or Type 7 LSAs carrying the default route within the AS. Routers in the AS use the received default route to forward AS external packets.
Precautions
When no exactly matched route is discovered, a router can forward packets through the default route. Due to the hierarchical management of OSPF routes, the priority of default Type 3 routes is higher than that of default Type 5 or Type 7 routes.
If an OSPF router has advertised LSAs carrying a default route, the router does not learn LSAs of the same type carrying a default route that are advertised by other routers. That is, the router uses only the LSAs advertised by itself to calculate routes; the LSAs advertised by others are still saved in the LSDB.
If a router needs to advertise LSAs carrying an external default route based on another route, that route cannot be a route learned by the local OSPF process. This is because a router uses the default external route to forward packets outside the AS, whereas routes learned by the local OSPF process have next hops pointing to devices within the AS.
Principles for advertising default routes in different areas
Common area
By default, OSPF routers in a common OSPF area do not automatically generate default routes, even if default routes exist in the area.
NSSA area
• To advertise AS external routes using the ASBR in an NSSA area and advertise other external routes through other areas, configure a default Type 7 LSA on the ABR and advertise this LSA in the entire NSSA area. In this way, a small number of AS external routes can be learned from the ASBR in the NSSA area, and other inter-area routes can be learned from the ABR in the NSSA area.
• To advertise all the external routes using the ASBR in the NSSA area, configure a default Type 7 LSA on the ASBR and advertise this LSA in the entire NSSA area. In this way, all the external routes are advertised using the ASBR in the NSSA area.
• The preceding configurations are performed using the same command in different views. The difference between these two configurations is as follows: An ABR will generate a default Type 7 LSA regardless of whether the routing table contains the default route 0.0.0.0. An ASBR will generate a default Type 7 LSA only when the routing table contains the default route 0.0.0.0.
• An ABR does not translate Type 7 LSAs carrying a default route into Type 5 LSAs carrying a default route or flood them to the entire AS.
Totally NSSA area
• All routers in the totally NSSA area must learn AS external routes from the ABR.
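For a common area, the Type 5 default route has to be injected explicitly on an ASBR. A VRP sketch (the process ID is illustrative):

```
ospf 1
 # Advertise a Type 5 LSA carrying 0.0.0.0/0 into the OSPF domain.
 # Without "always", the LSA is generated only if a default route
 # exists in the local routing table.
 default-route-advertise always
```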
Route filtering LSAs are not filtered during route learning. Route filtering can only determine whether calculated routes are added to the routing table. The learned LSAs are complete.
Precautions Stub areas and database overflow can also implement the LSA filtering function.
This figure shows the process of establishing the neighbor relationship and the neighbor state changes. Down: This is the initial state of a neighbor session. In this state, the router has received no Hello packet from its neighbor. On an NBMA network, the router can still send Hello packets to statically configured neighbors at PollInterval, whose value is usually the same as the value of RouterDeadInterval. Attempt: This state exists only on NBMA networks and indicates that the router has received no recent message from the neighbor. In this state, the router periodically sends Hello packets to the neighbor at HelloInterval. If the router receives no Hello packet from the neighbor within RouterDeadInterval, the state changes back to Down. Init: The router has received a Hello packet from its neighbor, but the router itself is not listed in the neighbor list of that Hello packet, so bidirectional communication has not yet been established. In this state, the router lists the neighbor in its own Hello packets. 2-WayReceived: The router knows that bidirectional communication with the neighbor has been established, that is, the router appears in the neighbor list of Hello packets received from the neighbor. If the router needs to establish an adjacency with the neighbor, it enters the ExStart state and starts database synchronization. If the router does not need to establish an adjacency with the neighbor, it enters the 2-Way state.
2-Way: In this state, bidirectional communication has been established but the router has not established the adjacency relationship with the neighbor. This is the highest state before the adjacency relationship is established. 1-WayReceived: The router knows that it is not in the neighbor list of Hello packets received from the neighbor. This is caused by the restart of the neighbor.
The state machines in the figure are described as follows: ExStart: This is the first step for establishing the adjacency relationship. In this state, the router starts to send DD packets to the neighbor. The two neighbors start to negotiate the master/slave status and determine the sequence numbers of DD packets. DD packets transmitted in this state do not contain the local LSDB. Exchange: The router exchanges DD packets containing the local LSDB with its neighbor. Loading: The router exchanges LSR packets with the neighbor for requesting LSAs and exchanges LSU packets for advertising LSAs. Full: The local LSDBs on the two routers have been synchronized.
OSPF supports P2P, P2MP, NBMA, and broadcast networks. IS-IS supports only P2P and broadcast networks. OSPF runs directly over IP at the network layer, with protocol number 89.
When an OSPF neighbor relationship is established, the two routers check the mask, authentication mode, Hello/dead intervals, and area ID in Hello packets. The conditions for establishing an IS-IS neighbor relationship are relatively loose. Establishing a neighbor relationship over an OSPF P2P link requires a three-way handshake. Establishing an IS-IS neighbor relationship does not require a three-way handshake; however, Huawei devices enable the three-way handshake function on IS-IS P2P networks by default, which ensures reliability when establishing the neighbor relationship. IS-IS neighbor relationships are classified into Level-1 and Level-2. The election of an OSPF DR/BDR is based on the priority and IP address, and the elected DR/BDR cannot be preempted. On an OSPF network, all DRothers establish full adjacency relationships with the DR/BDR, and establish 2-way neighbor relationships with each other. When the priority of a router on the OSPF network is 0, the router does not participate in the DR/BDR election. The election of an IS-IS DIS is based on the priority and MAC address, and the elected DIS can be preempted. On an IS-IS network, all routers establish adjacency relationships with each other. If the priority of a router on the IS-IS network is 0, the router can still participate in the DIS election; it just has a lower priority.
IS-IS supports only a few types of LSPs but provides good extension capabilities through the TLV fields contained in LSPs.
OSPF costs are calculated based on bandwidth. IS-IS defines four cost types: default, delay, expense, and error. In practice, only the default cost is implemented.
Case Description The NBMA network topology is displayed in this case. Other devices are connected based on the following rules: • If RX is interconnected with RY, their interconnection addresses are XY.1.1.X and XY.1.1.Y respectively, and the network mask is 24 bits.
Command Usage The peer command sets the IP address and DR priority of the neighboring router on an NBMA network. On an NBMA network, a router cannot discover neighboring routers by broadcasting Hello packets. You must manually specify IP addresses and DR priorities of neighboring routers. View OSPF view Parameters peer ip-address [ dr-priority priority ] ip-address: specifies the IP address for a neighboring router. dr-priority priority: specifies the priority for the neighbor to select a DR.
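A sketch of the peer command in context (the neighbor addresses and priorities are illustrative, not taken from this case):

```
ospf 1
 # Manually specify NBMA neighbors; a priority of 0 prevents a
 # neighbor from being elected as DR/BDR.
 peer 10.1.1.2 dr-priority 0
 peer 10.1.1.3 dr-priority 0
```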
Precautions In the routing table on R3, a routing entry mapping the IP address 12.1.1.2/32 exists. This is caused by the PPP echo function. When this function is disabled, the routing entry mapping this 32-bit IP address does not exist.
Case Description The network topology in this case is the same as the previous topology. Area 3 is not directly connected to Area 0, and therefore cannot communicate with other areas.
Command Usage The vlink-peer command creates and configures a virtual link. View OSPF area view
Parameters vlink-peer router-id router-id: specifies the router ID of the virtual link neighbor. Configuration Verification Run the display ospf vlink command to view information about the OSPF virtual link. Remarks A virtual link needs to be configured for R4.
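A sketch of the vlink-peer configuration, assuming the two ABRs have router IDs 2.2.2.2 and 4.4.4.4 (illustrative values) and share transit Area 2:

```
# On the ABR connected to Area 0:
ospf 1
 area 2
  vlink-peer 4.4.4.4
#
# On R4, the ABR of the remote area:
ospf 1
 area 2
  vlink-peer 2.2.2.2
```

The virtual link is configured in the view of the transit area on both ABRs, each pointing at the other's router ID.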
Case Description The network topology in this case is the same as the previous topology. Company A requires control on the DR. To meet this requirement, change the DR priorities of routers. The DR/BDR cannot be preempted.
Command Usage The ospf dr-priority command sets the priority of an interface that participates in the DR election. View Interface view Parameters ospf dr-priority priority priority: specifies the priority of an interface that participates in the DR/BDR election. A larger value indicates a higher priority. Precautions If the DR priority of an interface on a router is 0, the router cannot be elected as a DR or a BDR. In OSPF, the DR priority cannot be configured for null interfaces. Note that the DR/BDR cannot be preempted even if the DR priority is changed.
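An interface-view sketch of the command (the interface name and priority value are illustrative):

```
interface GigabitEthernet0/0/0
 # A higher priority makes this router preferred in the next election;
 # an existing DR is not preempted until the election is rerun.
 ospf dr-priority 100
```

Because the DR/BDR cannot be preempted, a changed priority takes effect only after the neighbor relationships on the segment are re-established, for example by restarting the OSPF process or the interfaces.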
Configuration Verification Run the display ospf peer command to view information about neighbors in OSPF areas.
Case Description The network topology in this case is the same as the previous topology. This is the network extension requirement. On an OSPF FR network, the default interval for sending Hello packets is 30 seconds, and the default poll interval is 120 seconds. After the neighbor relationship becomes invalid, Hello packets are sent at the poll interval of 120 seconds.
Command Usage The ospf timer hello command sets the interval for sending Hello packets on an interface. The ospf timer poll command sets the poll interval for sending Hello packets on an NBMA network. View ospf timer hello: interface view ospf timer poll: interface view Parameters ospf timer hello interval interval: specifies the interval for sending Hello packets on an interface. ospf timer poll interval interval: specifies the poll interval for sending Hello packets. Precautions By default, the interval for sending Hello packets is 10 seconds on P2P and broadcast interfaces and 30 seconds on P2MP and NBMA interfaces. Ensure that the parameters are set to the same values on the local interface and on the interface of the neighboring router.
Remarks
On an NBMA network, after the neighbor relationship becomes invalid, the router sends Hello packets periodically at the interval specified using the ospf timer poll command. The poll interval must be at least four times the interval for sending Hello packets.
Perform the same interface configuration on R4 as that on R2 and R3.
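The timer adjustments above can be sketched as follows (the interface name is illustrative; the poll interval is kept at four times the Hello interval, satisfying the rule above):

```
interface Serial0/0/0
 ospf timer hello 30
 ospf timer poll 120
```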
Case Description This case is an extension to the original case. Perform configurations on the basis of the original case. Imported routes are advertised in E2 mode by default, and the default cost value is 1.
Command Usage The import-route command imports routes learned by other routing protocols. The ospf cost command sets the cost of a route on an OSPF-enabled interface. View import-route: OSPF view ospf cost: interface view Parameters import-route [ cost cost | type type ] cost cost: specifies the cost of a route. type type: specifies the cost type. ospf cost cost cost: specifies the cost of an OSPF-enabled interface. Precautions On a non-PE device, only EBGP routes are imported after the import-route bgp command is configured. IBGP routes are also imported after the import-route bgp permit-ibgp command is configured. If IBGP routes are imported, routing loops may occur. In this case, run the preference (OSPF) and preference (BGP) commands to set the priority of OSPF ASE routes to be lower than that of IBGP routes.
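A sketch combining the two commands (the imported protocol, cost values, and interface name are illustrative):

```
ospf 1
 # Import static routes as Type 1 external routes with cost 10
 # (imported routes default to E2 with cost 1):
 import-route static cost 10 type 1
#
interface GigabitEthernet0/0/1
 # Override the interface cost used in SPF calculation:
 ospf cost 50
```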
Case Description This case is an extension to the original case. Perform configuration on the basis of the original case. If R6 does not want to receive routes from network 172.16.X.0/24, filter Type 3 LSAs on R5.
Command Usage The filter-policy export command configures a filtering policy to filter the imported routes when these routes are advertised in Type 5 LSAs within the AS. This command can be configured only on an ASBR to filter Type 5 LSAs. The filter-policy import command configures a filtering policy to filter intra-area, inter-area, and AS external routes received by OSPF. On routers within an area, this command can be used to filter only routes; on an ABR, this command can be used to filter Type 3 LSAs. View filter-policy export: OSPF view filter-policy import: OSPF view Parameters filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } export [ protocol [ process-id ] ] acl-number: specifies the basic ACL number. acl-name acl-name: specifies the ACL name. ip-prefix ip-prefix-name: specifies the name of an IP prefix list. protocol: specifies the protocol for advertising routing information. process-id: specifies the process ID when RIP, IS-IS, or OSPF is used for advertising routing information.
filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } import acl-number: specifies the basic ACL number. acl-name acl-name: specifies the ACL name. ip-prefix ip-prefix-name: specifies the name of an IP prefix list. Precautions Type 5 LSAs are generated on an ASBR to describe AS external routes and advertised to all areas (excluding stub and NSSA areas). The filter-policy command needs to be configured on an ASBR. To advertise only routing information meeting specific conditions, run the filter-policy command to set filtering conditions.
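A sketch of the import-direction filtering described above, assuming a hypothetical basic ACL 2000 that blocks 172.16.1.0/24 (the addresses are illustrative):

```
acl number 2000
 rule 5 deny source 172.16.1.0 0.0.0.255
 rule 10 permit
#
ospf 1
 # Routes matching the deny rule are not added to the routing table;
 # the LSAs themselves are still received and kept in the LSDB.
 filter-policy 2000 import
```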
Case Description This case is an extension to the original case. Perform configuration on the basis of the original case. Configure Area 1 as an NSSA area.
Command Usage The nssa command configures an OSPF area as an NSSA area. View OSPF area view Parameters nssa [ default-route-advertise | flush-waiting-timer interval-value | no-import-route | no-summary | set-n-bit | suppress-forwarding-address | translator-always | translator-interval interval-value | zero-address-forwarding ] * default-route-advertise: generates default Type 7 LSAs on an ABR or ASBR and then advertises them to the NSSA area. flush-waiting-timer interval-value: specifies the interval for an ASBR to send aged Type 5 LSAs. This parameter takes effect only once. no-import-route: indicates that no external route is imported to the NSSA area. no-summary: indicates that an ABR is prohibited from sending Type 3 LSAs to the NSSA area. set-n-bit: sets the N-bit in DD packets. suppress-forwarding-address: sets the FA of the Type 5 LSAs translated from Type 7 LSAs by the NSSA ABR to 0.0.0.0.
translator-always: specifies an ABR in an NSSA area as a permanent translator. Multiple ABRs in an NSSA area can be configured as translators. translator-interval interval-value: specifies the timeout period of a translator. zero-address-forwarding: sets the FA of the generated NSSA LSAs to 0.0.0.0 when external routes are imported by an ABR in an NSSA area.
Precautions The parameter default-route-advertise is configured to advertise Type 7 LSAs carrying the default route. On an ABR, Type 7 LSAs carrying the default route will be generated regardless of whether the route 0.0.0.0 exists in the routing table. On an ASBR, however, Type 7 LSAs carrying the default route will be generated only when the route 0.0.0.0 exists in the routing table. When the area to which the ASBR belongs is configured as an NSSA area, invalid Type 5 LSAs previously flooded by the ASBR remain on other routers in the areas where the LSAs were flooded. These LSAs are deleted only when the aging time reaches 3600 seconds, and keeping a large number of such LSAs consumes memory resources and affects router performance. The parameter flush-waiting-timer is configured to generate Type 5 LSAs with the aging time of 3600 seconds, so that invalid Type 5 LSAs on other routers are cleared in a timely manner. The parameter flush-waiting-timer does not take effect when the ASBR also functions as an ABR, so that Type 5 LSAs in non-NSSA areas are not deleted.
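A sketch of injecting an NSSA default route (the process ID and area number are illustrative; as noted above, the command behaves differently on an ABR and on an ASBR):

```
ospf 1
 area 1
  # Generate a default Type 7 LSA and advertise it in the NSSA area:
  nssa default-route-advertise
```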
Case Description This case is an extension to the original case. Perform configuration on the basis of the original case. Note that the virtual link belongs to Area 0.
Command Usage The authentication-mode command sets the authentication mode and password for an OSPF area. After this command is executed, interfaces on all routers in the OSPF area use the same authentication mode and password. View OSPF area view Parameters authentication-mode { md5 | hmac-md5 } [ key-id { plain plaintext | [ cipher ] ciphertext } ] md5: indicates MD5 authentication using a ciphertext password. hmac-md5: indicates HMAC-MD5 authentication using a ciphertext password. key-id: specifies an authentication ID, which must be the same on the two ends. authentication-mode keychain keychain-name keychain: indicates keychain authentication. keychain-name: specifies the keychain name. authentication-mode simple [ [ plain ] plaintext | cipher ciphertext ] simple: indicates simple authentication. plain: indicates authentication using the plaintext password. If this parameter is specified, the device allows you to set only a plaintext key, and the key is displayed in plaintext mode in the configuration file.
plaintext: specifies a plaintext password. cipher: specifies a ciphertext password. If this parameter is specified, the device allows you to set only a ciphertext key, and the key is displayed in ciphertext mode in the configuration file. ciphertext: specifies a ciphertext password. Precautions The authentication modes and passwords of all the devices must be the same in an area, but can be different in different areas. The authentication-mode command used in the interface view takes precedence over the authentication-mode command used in the OSPF area view.
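An area-authentication sketch matching the command above (the key ID and password are illustrative placeholders):

```
ospf 1
 area 0
  # All interfaces of all routers in Area 0 must share this mode and key:
  authentication-mode md5 1 cipher Huawei@123
```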
Case Description If RX is interconnected with RY, their interconnection addresses are XY.1.1.X/24 and XY.1.1.Y/24 respectively.
Configuration Verification Run the display ospf peer brief command to check whether the neighbor relationship is established.
Configuration Verification Run the tracert command to trace traffic on R3. The command output shows that traffic on R3 reaches S0/0/0 on R1 through the Ethernet link.
Configuration Verification Run the display ip routing-table command to view the routing table. During the route summarization, original tags are removed. Therefore, tags need to be added in the next route summarization.
Case Description The network runs OSPF.
Analysis To make R1 select the path through Area 2 to reach the networks in Area 1, the path through Area 2 must appear to pass through Area 0. A virtual link meets this need. After the virtual link is established, R1 compares the costs of the two paths and selects the path with the lower cost as the best path.
Configuration Verification Only the external LSA (10.0.0.0) exists in the LSDB on R2.
Configuration Verification All neighbor relationships on R3 are correct, indicating successful authentication.
BGP is a dynamic routing protocol used between ASs. BGP-1 (defined in RFC 1105), BGP-2 (defined in RFC 1163), and BGP-3 (defined in RFC 1267) are three earlier BGP versions. BGP exchanges reachable inter-AS routes, establishes inter-AS paths, avoids routing loops, and applies routing policies between ASs. The current BGP version is BGP-4, defined in RFC 4271. As an external routing protocol on the Internet, BGP is widely used among Internet Service Providers (ISPs). BGP has the following characteristics: BGP is an EGP. Different from Interior Gateway Protocols (IGPs) such as Open Shortest Path First (OSPF) and Routing Information Protocol (RIP), BGP controls route advertisement and selects optimal routes between ASs rather than discovering or calculating routes. BGP uses the Transmission Control Protocol (TCP), with listening port 179, as the transport layer protocol. TCP enhances BGP reliability without requiring a dedicated mechanism to ensure connectivity. • BGP needs to select inter-AS routes, which requires high protocol stability. TCP, with its high reliability, is therefore used to enhance BGP stability. • BGP peers must be logically connected and establish TCP connections. The destination port number is 179, and the local port number is random.
When routes are updated, BGP transmits only the updated routes. This greatly reduces the bandwidth occupied by BGP route advertisements. Therefore, BGP applies to the transmission of a large number of routes on the Internet. BGP is designed to avoid loops. • Inter-AS: BGP routes carry information about the ASs along the path. The routes that carry the local AS number are discarded to avoid inter-AS loops. • Intra-AS: BGP does not advertise the routes learned in an AS to BGP peers in the AS. In this manner, intra-AS loops are avoided. BGP provides rich routing policies to flexibly filter and select routes. BGP provides a route flapping prevention mechanism, which effectively improves Internet stability. BGP is easy to extend and adapts to network development. It is mainly extended using TLVs.
An AS is a group of routers that are managed by a single technical administration and use the same routing policy. Each AS has a unique AS number, which is assigned by the Internet Assigned Numbers Authority (IANA). Each AS on a BGP network is assigned a unique AS number to identify the AS. Currently, 2-byte and 4-byte AS numbers are available. A 2-byte AS number ranges from 1 to 65535: values 1 to 64511 are registered Internet numbers, while values 64512 to 65535 are private AS numbers. A 4-byte AS number ranges from 1 to 4294967295. Devices supporting 4-byte AS numbers are compatible with devices supporting 2-byte AS numbers.
EBGP and IBGP IBGP: runs within an AS. To prevent routing loops within an AS, a BGP device does not advertise the routes learned from an IBGP peer to other IBGP peers, and establishes full-mesh connections with all the IBGP peers. EBGP: runs between ASs. To prevent routing loops between ASs, a BGP device discards routes containing the local AS number when receiving routes from EBGP peers. Device roles in BGP message exchange Speaker: The device that sends BGP messages is called a BGP speaker. The speaker receives and generates new routes, and advertises the routes to other BGP speakers. Peer: The speakers that exchange messages with each other are called BGP peers. A group of peers sharing the same policies can form a peer group.
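The IBGP/EBGP peer roles above can be sketched in VRP configuration (the AS numbers, router ID, and peer addresses are illustrative):

```
bgp 65001
 router-id 1.1.1.1
 # IBGP peer: the peer AS number equals the local AS number.
 peer 10.1.1.2 as-number 65001
 # EBGP peer: the peer AS number differs from the local AS number.
 peer 192.168.1.2 as-number 65002
```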
BGP peers exchange five types of messages: Open, Update, Keepalive, Notification, and Route-Refresh messages. Open message: is used to establish BGP peer relationships. It is the first message sent after a TCP connection is set up. After a BGP peer receives an Open message and the peer negotiation succeeds, the BGP peer sends a Keepalive message to confirm and maintain the peer relationship. Subsequently, BGP peers can exchange Update, Notification, Keepalive, and Route-refresh messages. Update message: is used to exchange routes between BGP peers. Update messages can be used to advertise multiple reachable routes with the same attributes or to withdraw multiple unreachable routes. • An Update message can be used to advertise multiple reachable routes with the same attributes. These routes can share a group of route attributes. The route attributes in an Update message apply to all the destination addresses (expressed by IP prefixes) in the Network Layer Reachability Information (NLRI) field of the Update message. • An Update message can be used to withdraw multiple unreachable routes. Each route is identified by its destination address (expressed by an IP prefix), which identifies the routes previously advertised between BGP speakers.
• An Update message can be used only to withdraw routes. In this case, it does not need to carry route attributes or NLRI. Similarly, an Update message can be used only to advertise reachable routes, in which case it does not need to carry information about withdrawn routes.
Keepalive message: is periodically sent to the BGP peer to maintain the peer relationship. Notification message: is sent to the BGP peer when an error is detected. The BGP connection is then terminated immediately. Route-Refresh message: is used to request that the BGP peer resend routes when the BGP inbound routing policy changes. If all BGP routers have the Route-Refresh capability, the local BGP router sends a Route-Refresh message to BGP peers when the BGP inbound routing policy changes. After receiving the Route-Refresh message, the BGP peers resend their routing information to the local BGP router. In this manner, the BGP routing table can be dynamically updated, and the new routing policy can be used without terminating BGP connections. A BGP peer notifies its peer of its Route-Refresh capability by sending an Open message.
BGP message applications BGP uses TCP port 179 to set up a connection. BGP connection setup requires a series of dialogues and handshakes. BGP advertises parameters such as the BGP version, BGP connection holdtime, local router ID, and authorization information in an Open message during handshake negotiation. After a BGP connection is set up, a BGP router sends the BGP peer an Update message that carries the attributes of a route to be advertised. This helps the BGP peer select the optimal route. When local BGP routes change, a BGP router sends an Update message to notify the BGP peer of the changes. After two BGP peers exchange routes for a period of time, they do not have new routes to advertise and periodically send Keepalive messages to maintain the validity of the BGP connection.
If the local BGP router does not receive any BGP message from the peer within the holdtime, it considers the BGP connection terminated, tears down the connection, and deletes all the BGP routes learned from that peer. When the local BGP router detects an error during operation, for example, it does not support the peer's BGP version or receives an invalid Update message, it sends the peer a Notification message to report the error. Before terminating a BGP connection with the peer, the local BGP router also sends a Notification message.
BGP message header
Marker: A 16-byte field in which every bit is set to 1.
Length: A 2-byte unsigned integer that indicates the total length of a message, including the header. Type: A 1-byte field that specifies the type of a message: • Open • Update • Keepalive • Notification • Route-Refresh
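The header layout above can be sketched in a few lines of Python. This is an illustrative parser built from the field definitions in the text, not Huawei's implementation:

```python
import struct

# BGP message header (RFC 4271): 16-byte Marker with every bit set to 1,
# 2-byte Length (total message length, header included), 1-byte Type.
HEADER_FMT = "!16sHB"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 19 bytes

TYPE_NAMES = {1: "Open", 2: "Update", 3: "Notification",
              4: "Keepalive", 5: "Route-Refresh"}

def parse_header(data):
    """Parse and validate the 19-byte BGP message header."""
    marker, length, msg_type = struct.unpack(HEADER_FMT, data[:HEADER_LEN])
    if marker != b"\xff" * 16:
        raise ValueError("Marker must be all 1s")
    if not 19 <= length <= 4096:
        raise ValueError("invalid Length field")
    return length, TYPE_NAMES.get(msg_type, "Unknown")

# A Keepalive message is the header alone: Length = 19, Type = 4.
keepalive = b"\xff" * 16 + struct.pack("!HB", 19, 4)
print(parse_header(keepalive))  # -> (19, 'Keepalive')
```

The 19-byte Keepalive built at the end also illustrates the statement below that a Keepalive message consists of only the message header.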
Open message format
Version: Indicates the BGP version number. For BGP-4, the value is 4.
My Autonomous System: Indicates the local AS number. By comparing the AS numbers on both ends, you can determine whether a BGP connection is an IBGP or an EBGP connection.
Hold Time: Indicates the time during which two BGP peers maintain the BGP connection between them. During peer relationship setup, the two peers negotiate the holdtime so that it is consistent on both ends. If different holdtime values are configured, the shorter one is used. If the local BGP router does not receive a Keepalive message from the peer within the holdtime, it considers the BGP connection terminated. If the holdtime is 0, no Keepalive messages are sent.
BGP Identifier: Indicates the router ID of a BGP router, expressed as an IP address that identifies the router.
Opt Parm Len (Optional Parameters Length): Indicates the length of the optional parameters. The value 0 indicates that no optional parameters are present.
Optional Parameters: Used for BGP authentication or Multiprotocol Extensions. Each parameter is a 3-tuple (Parameter Type, Parameter Length, Parameter Value).
Update message format
Withdrawn Routes Length: A 2-byte unsigned integer that indicates the total length of the Withdrawn Routes field. The value 0 indicates that the Withdrawn Routes field is not present in this Update message.
Withdrawn Routes: A variable-length field that contains a list of IP address prefixes for the routes to be withdrawn. Each IP address prefix is in <length, prefix> format. For example, <19, 198.18.160.0> indicates the network 198.18.160.0 255.255.224.0.
Path Attribute Length: A 2-byte unsigned integer that indicates the total length of the Path Attribute field. The value 0 indicates that the Path Attribute field is not present in this Update message.
Network Layer Reachability Information: Contains a list of IP address prefixes.
This variable-length field is in the same format as the Withdrawn Routes field: <length, prefix>.
Keepalive message format
A Keepalive message consists of only the message header. By default, the interval for sending Keepalive messages is 60 seconds, and the holdtime is 180 seconds. Each time a BGP router receives a Keepalive message from its peer, it resets the hold timer. If the hold timer expires, the router considers the peer to be down.
Notification message format
Error code: A 1-byte field that uniquely identifies an error. Each error code may have one or more error subcodes. If no error subcode is defined for an error code, the Error Subcode field is all 0s.
Error subcode: Indicates an error subcode.
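The hold-timer behaviour described above (each received Keepalive resets the timer; expiry marks the peer down) can be sketched as follows. The class name and interface are illustrative:

```python
import time

class HoldTimer:
    """Minimal sketch of the BGP hold timer: a received Keepalive
    resets it; no message within the holdtime means the peer is down."""
    def __init__(self, holdtime=180):        # default holdtime: 180 s
        self.holdtime = holdtime
        self.last_heard = time.monotonic()

    def on_keepalive(self, now=None):
        self.last_heard = now if now is not None else time.monotonic()

    def peer_down(self, now=None):
        now = now if now is not None else time.monotonic()
        return (now - self.last_heard) > self.holdtime

t = HoldTimer(holdtime=180)
t.on_keepalive(now=0)
print(t.peer_down(now=120))   # Keepalives every 60 s keep the peer up -> False
print(t.peer_down(now=181))   # no message within the holdtime -> True
```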
A BGP finite state machine (FSM) has six states: Idle, Connect, Active, OpenSent, OpenConfirm, and Established.
The Idle state is the initial BGP state. In Idle state, a BGP device refuses all connection requests from neighbors. The BGP device initiates a TCP connection with its BGP peer and changes to the Connect state only after receiving a start event from the system.
• A start event occurs when an operator configures a BGP process or resets an existing BGP process, or when the router software resets a BGP process.
• If an error occurs in any FSM state, for example, the BGP device receives a Notification message or a TCP connection termination notification, the BGP device returns to the Idle state.
In the Connect state, the BGP device starts the ConnectRetry timer and waits to establish a TCP connection. The ConnectRetry timer defaults to 32 seconds.
• If a TCP connection is established, the BGP device sends an Open message to the peer and changes to the OpenSent state.
• If the TCP connection fails to be established, the BGP device moves to the Active state.
• If the BGP device does not receive a response from the peer before the ConnectRetry timer expires, it attempts to establish a TCP connection with another peer and stays in the Connect state.
• If another event (initiated by the system or operator) occurs, the BGP device returns to the Idle state.
In the Active state, the BGP device keeps trying to establish a TCP connection with the peer.
• If a TCP connection is established, the BGP device sends an Open message to the peer, stops the ConnectRetry timer, and changes to the OpenSent state.
• If the TCP connection fails to be established, the BGP device stays in the Active state.
• If the BGP device does not receive a response from the peer before the ConnectRetry timer expires, it returns to the Connect state.
In the OpenSent state, the BGP device waits for an Open message from the peer and then checks the validity of the received Open message, including the AS number, version, and authentication password.
• If the received Open message is valid, the BGP device sends a Keepalive message and changes to the OpenConfirm state.
• If the received Open message is invalid, the BGP device sends a Notification message to the peer and returns to the Idle state.
In the OpenConfirm state, the BGP device waits for a Keepalive or Notification message from the peer. If it receives a Keepalive message, it transitions to the Established state. If it receives a Notification message, it returns to the Idle state.
In the Established state, the BGP device exchanges Update, Keepalive, Route-Refresh, and Notification messages with the peer.
• If the BGP device receives a valid Update or Keepalive message, it considers that the peer is working properly and maintains the BGP connection.
• If the BGP device receives an invalid Update or Keepalive message, it sends a Notification message to the peer and returns to the Idle state.
• If the BGP device receives a Route-Refresh message, it does not change its state.
• If the BGP device receives a Notification message, it returns to the Idle state.
• If the BGP device receives a TCP connection termination notification, it terminates the TCP connection with the peer and returns to the Idle state.
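The six states and the transitions named above can be condensed into a small table-driven sketch. Event names are illustrative, and the fallback of returning to Idle on any unlisted event is a simplification:

```python
# Simplified BGP FSM: (state, event) -> next state, from the text above.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Connect", "connect_retry_expired"): "Connect",
    ("Active", "tcp_established"): "OpenSent",
    ("Active", "connect_retry_expired"): "Connect",
    ("OpenSent", "valid_open"): "OpenConfirm",
    ("OpenSent", "invalid_open"): "Idle",
    ("OpenConfirm", "keepalive"): "Established",
    ("OpenConfirm", "notification"): "Idle",
    ("Established", "notification"): "Idle",
}

def run(events, state="Idle"):
    for ev in events:
        # Any error or unlisted event returns the FSM to Idle.
        state = TRANSITIONS.get((state, ev), "Idle")
    return state

print(run(["start", "tcp_established", "valid_open", "keepalive"]))
# -> Established
```

A failed TCP attempt takes the detour through Active: `run(["start", "tcp_failed", "tcp_established", "valid_open", "keepalive"])` also ends in Established.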
A BGP device adds optimal routes to the BGP routing table to generate BGP routes. After establishing a BGP peer relationship with a neighbor, the BGP device applies the following rules when exchanging routes with the peer: Advertises the BGP routes received from IBGP peers only to its EBGP peers.
Advertises the BGP routes received from EBGP peers to all its EBGP peers and IBGP peers.
Advertises the optimal route to its peers when there are multiple valid routes to the same destination.
Sends only updated BGP routes when BGP routes change.
BGP routing information processing
When receiving Update messages from a peer, a BGP router saves them in the routing information base (RIB), specifically in the Adj-RIB-In associated with that peer. After these Update messages are filtered by the inbound policy engine, the BGP router determines the optimal route for each prefix according to the route selection algorithm. The optimal routes are saved in the local BGP RIB (Loc-RIB) and then submitted to the local IP routing table (IP-RIB). In addition to the optimal routes received from peers, the Loc-RIB also contains the locally originated BGP prefixes that the router itself injects and selects as optimal routes. Before the routes in the Loc-RIB are advertised to other peers, they must be filtered by the outbound policy engine. Only the routes that pass this filtering are installed in the Adj-RIB-Out.
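The Adj-RIB-In → Loc-RIB → Adj-RIB-Out pipeline above can be sketched as a single function. The policy engines and the best-path choice are passed in as stand-in callables; everything here is illustrative, not VRP's actual implementation:

```python
def process(adj_rib_in, import_policy, best_path, export_policy):
    """Adj-RIB-In -> inbound policy -> best path -> Loc-RIB -> outbound
    policy -> Adj-RIB-Out, as described in the text."""
    loc_rib = {}
    for peer, routes in adj_rib_in.items():
        for prefix, route in routes.items():
            if not import_policy(peer, prefix, route):
                continue  # filtered by the inbound policy engine
            best = loc_rib.get(prefix)
            loc_rib[prefix] = route if best is None else best_path(route, best)
    # Only routes passing the outbound policy engine reach the Adj-RIB-Out.
    return {p: r for p, r in loc_rib.items() if export_policy(p, r)}

# Demo: two IBGP peers advertise the same prefix; the higher Local_Pref wins.
adj_rib_in = {
    "R2": {"10.0.0.0/24": {"local_pref": 300}},
    "R3": {"10.0.0.0/24": {"local_pref": 200}},
}
accept_all = lambda peer, prefix, route: True
prefer_higher_lp = lambda a, b: a if a["local_pref"] >= b["local_pref"] else b
out = process(adj_rib_in, accept_all, prefer_higher_lp, lambda p, r: True)
print(out["10.0.0.0/24"]["local_pref"])  # -> 300
```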
Synchronization between IBGP and IGP prevents routers in other ASs from being misled.
Topology description (when synchronization is enabled)
R4 learns the route to 10.0.0.0/24 advertised by R1 through BGP and checks whether the local IGP routing table contains the route. If so, R4 advertises the route to R5. If not, R4 does not advertise the route to R5.
Precautions: On the VRP platform, synchronization is disabled by default, and this cannot be changed. Disabling synchronization is safe only under either of the following conditions: the local AS is not a transit AS, or all the routers within the local AS set up full-mesh IBGP connections.
BGP route attributes are a set of parameters that further describe BGP routes. Using BGP route attributes, BGP can filter and select routes. Common attributes are as follows: Origin: A well-known mandatory attribute. AS_Path: A well-known mandatory attribute. Next_Hop: A well-known mandatory attribute. Local_Pref: A well-known discretionary attribute. Community: An optional transitive attribute. MED: An optional non-transitive attribute. Originator_ID: An optional non-transitive attribute. Cluster_List: An optional non-transitive attribute.
The Origin attribute defines the origin of a route, that is, how a prefix became a BGP route. The Origin attribute is classified into the following types:
IGP: A route with the Origin attribute IGP is an IGP route and has the highest priority. For example, the Origin attribute of the routes injected into the BGP routing table using the network command is IGP.
EGP: A route with the Origin attribute EGP is an EGP route and has the second highest priority.
Incomplete: A route with the Origin attribute Incomplete is learned by other means and has the lowest priority. For example, the Origin attribute of the routes imported by BGP using the import-route command is Incomplete.
The AS_Path attribute records all the ASs that a route passes through from a source to a destination in the distance-vector order. To prevent inter-AS routing loops, a BGP device does not accept the EBGP routes of which the AS_Path list contains the local AS number. Assume that a BGP speaker advertises a local route: When advertising the route to other ASs, the BGP speaker adds the local AS number to the AS_Path list, and then advertises it to neighboring routers in Update messages. When advertising the route to the local AS, the BGP speaker creates an empty AS_Path list in an Update message. Assume that a BGP speaker advertises a route learned in the Update message sent by another BGP speaker: When advertising the route to other ASs, the BGP speaker adds the local AS number to the leftmost of the AS_Path list. According to the AS_Path attribute, the BGP router that receives the route can determine the ASs through which the route has passed to the destination. The number of the AS that is nearest to the local AS is placed on the leftmost of the list, and the other AS numbers are listed according to the sequence in which the route passes through ASs. When advertising the route to the local AS, the BGP speaker does not change the AS_Path attribute of the route.
Topology description When R4 advertises route 10.0.0.0/24 to AS 400 and AS 100, it adds the local AS number to the AS_Path list. When R5 advertises the route to AS 100, it also adds the local AS number to the AS_Path list. When R1 and R3 in AS 100 advertise the route to R2 in the same AS, they keep the AS_Path attribute of the route unchanged. R2 selects the route with the shortest AS_Path when other BGP routing rules are the same. That is, R2 reaches 10.0.0.0/24 through R3.
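The prepend-on-EBGP-advertisement and loop-prevention behaviour described above can be sketched in a few lines. The AS numbers are illustrative:

```python
LOCAL_AS = 100

def accept_ebgp(as_path, local_as=LOCAL_AS):
    # Loop prevention: reject an EBGP route whose AS_Path already
    # contains the local AS number.
    return local_as not in as_path

def advertise(as_path, local_as=LOCAL_AS, to_ebgp=True):
    # Prepend the local AS on EBGP advertisement (leftmost position);
    # advertisement within the local AS leaves the AS_Path unchanged.
    return [local_as] + as_path if to_ebgp else list(as_path)

print(accept_ebgp([200, 100]))               # -> False (routing loop)
print(advertise([200, 300]))                 # -> [100, 200, 300]
print(advertise([200, 300], to_ebgp=False))  # -> [200, 300]
```

The leftmost AS number is always the one nearest to the receiver, matching the ordering the text describes.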
The Next_Hop attribute records the next hop that a route passes through. The Next_Hop attribute of BGP is different from that of an IGP because it may not be the neighbor IP address. A BGP speaker processes the Next_Hop attribute based on the following rules: When advertising a locally originated route to an IBGP peer, the BGP speaker sets the Next_Hop attribute of the route to be the IP address of the local interface through which the BGP peer relationship is established. When advertising a route to an EBGP peer, the BGP speaker sets the Next_Hop attribute of the route to be the IP address of the local interface through which the BGP peer relationship is established. When advertising a route learned from an EBGP peer to an IBGP peer, the BGP speaker does not change the Next_Hop attribute of the route.
Local_Pref attribute
This attribute indicates the BGP preference of a route on a router. It is exchanged only between IBGP peers and is not advertised to other ASs. This attribute helps determine the optimal route when traffic leaves an AS. When a BGP router obtains multiple routes to the same destination address but with different next hops from IBGP peers, it prefers the route with the highest Local_Pref.
Topology description
R1, R2, and R3 are IBGP peers of one another in AS 100. R2 establishes an EBGP peer relationship with AS 200, and R3 establishes an EBGP peer relationship with AS 300. R2 and R3 therefore both learn route 10.0.0.0/24 through EBGP, and R1 learns two routes to 10.0.0.0/24 from its two IBGP peers (R2 and R3) in the local AS. To make AS 100 prefer R2 when forwarding traffic to 10.0.0.0/24, configure the Local_Pref on R2 and R3: value 300 on R2 and value 200 on R3. R1 then prefers the route learned from R2.
The MED attribute helps determine the optimal route when traffic enters an AS. When a BGP router obtains multiple routes to the same destination address but with different next hops from EBGP peers, the router selects the route with the smallest MED value as the optimal route if the other attributes of the routes are the same. The MED attribute is exchanged only between two neighboring ASs. The AS that receives this attribute does not advertise the attribute to any other AS. This attribute can be manually configured. If the MED attribute is not configured for a route, the MED attribute of the route uses the default value 0. Topology description R1 and R2 advertise routes 10.0.0.0/24 to their respective EBGP peers R3 and R4. When other routing rules are the same, R3 and R4 prefer the route with a smaller MED value. That is, R3 and R4 access network 10.0.0.0/24 through R1.
The Community attribute is a set of destination addresses with the same characteristics. It is expressed as a 4-byte list and in the aa:nn or community number format. aa:nn: The value of aa or nn ranges from 0 to 65535. The administrator can set a specific value as required. Generally, aa indicates the AS number and nn indicates the community identifier defined by the administrator. For example, if a route is from AS 100 and its community identifier defined by the administrator is 1, the Community attribute is 100:1. Community number: An integer that ranges from 0 to 4294967295. As defined in RFC 1997, numbers from 0 (0x00000000) to 65535 (0x0000FFFF) and from 4294901760 (0xFFFF0000) to 4294967295 (0xFFFFFFFF) are reserved. The Community attribute helps simplify application, maintenance, and management of routing policies. With the community, a group of BGP routers in multiple ASs can share the same routing policy. This attribute is a route attribute and is transmitted between BGP peers without being restricted by ASs. Before advertising a route with the Community attribute to peers, a BGP router can change the original Community attribute of this route. Well-known community attributes Internet: All routes belong to the Internet community by default. A route with this attribute can be advertised to all BGP peers.
No_Advertise: A device does not advertise a received route with the No_Advertise attribute to any peer.
No_Export: A BGP device does not advertise a received route with the No_Export attribute to devices outside the local AS. If a confederation is defined, such a route cannot be advertised to ASs outside the confederation but can be advertised to the other sub-ASs in the confederation.
No_Export_Subconfed: A BGP device does not advertise a received route with the No_Export_Subconfed attribute to devices outside the local AS or, in a confederation, to devices outside the local sub-AS.
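The aa:nn notation above is just a rendering of the 4-byte community value: the high 16 bits are aa and the low 16 bits are nn. A small sketch of the conversion:

```python
def community_to_str(value):
    """Render a 4-byte community number in aa:nn form (RFC 1997)."""
    return f"{value >> 16}:{value & 0xFFFF}"

def community_from_str(text):
    """Parse aa:nn back into the 4-byte community number."""
    aa, nn = (int(x) for x in text.split(":"))
    return (aa << 16) | nn

NO_EXPORT = 0xFFFFFF01  # one of the reserved well-known communities
print(community_from_str("100:1"))   # -> 6553601
print(community_to_str(6553601))     # -> 100:1
print(community_to_str(NO_EXPORT))   # -> 65535:65281
```

This also shows why the reserved ranges in RFC 1997 are exactly aa = 0 and aa = 65535.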
BGP routing rules
The next-hop addresses of routes must be reachable.
The PrefVal attribute is a Huawei proprietary attribute and is valid only on the device where it is configured.
If a route does not have the Local_Pref attribute, it uses the default value 100. You can use the default local-preference command to change the default local preference of BGP routes.
Locally generated routes include the routes imported using the network or import-route command, manually summarized routes, and automatically summarized routes.
• Summarized routes have a higher priority than non-summarized routes.
• Manually summarized routes generated using the aggregate command have a higher priority than automatically summarized routes generated using the summary automatic command.
• Routes imported using the network command have a higher priority than routes imported using the import-route command.
Prefers the route with the shortest AS_Path.
• The AS_Path length does not include AS_CONFED_SEQUENCE and AS_CONFED_SET.
• An AS_SET counts as 1 no matter how many AS numbers it contains.
• BGP does not compare the AS_Path attributes of routes after the bestroute as-path-ignore command is executed.
Prefers the route with the lowest MED.
• BGP compares only the MED values of routes received from the same AS (excluding confederation sub-ASs). That is, BGP compares the MED values of two routes only when the first AS numbers in the AS_SEQUENCE attributes (excluding AS_CONFED_SEQUENCE) of the two routes are the same.
• If a route does not have the MED attribute, BGP uses the default value 0 as its MED. After the bestroute med-none-as-maximum command is executed, BGP instead treats the MED of such a route as the maximum value 4294967295.
• After the compare-different-as-med command is executed, BGP compares the MEDs of routes received from peers in different ASs. Do not use this command unless the different ASs use the same IGP and route selection mode; otherwise, routing loops may occur.
• After the bestroute med-confederation command is executed, BGP compares the MED values of routes only when the AS_Path contains no external AS numbers (AS numbers that do not belong to the confederation) and the first AS number in AS_CONFED_SEQUENCE is the same.
• After the deterministic-med command is executed, routes are no longer selected in the order in which they were received.
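The ordering of the rules above can be mimicked with a sort key. This is a deliberately reduced sketch: it applies only PrefVal, Local_Pref, AS_Path length, Origin, and MED, ignores the same-neighbor-AS restriction on MED comparison, and uses illustrative field names:

```python
# Lower Origin rank is better: IGP < EGP < Incomplete.
ORIGIN_RANK = {"igp": 0, "egp": 1, "incomplete": 2}

def selection_key(route):
    """Tuple key: Python compares element by element, which matches
    applying the route-selection rules in order."""
    return (
        -route.get("pref_val", 0),        # higher PrefVal preferred
        -route.get("local_pref", 100),    # higher Local_Pref; default 100
        len(route["as_path"]),            # shorter AS_Path preferred
        ORIGIN_RANK[route["origin"]],     # IGP over EGP over Incomplete
        route.get("med", 0),              # lower MED preferred; default 0
    )

routes = [
    {"as_path": [200, 400], "origin": "igp", "med": 0},
    {"as_path": [300], "origin": "igp", "med": 50},
]
best = min(routes, key=selection_key)
print(best["as_path"])  # -> [300]  (the shorter AS_Path wins before MED)
```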
Load balancing
When there are multiple equal-cost routes to the same destination, you can perform load balancing among them to share traffic. Equal-cost BGP routes can be generated for load balancing only when all the route-selection rules preceding "Prefers the route with the lowest IGP metric" produce the same result.
BGP security
MD5: BGP uses TCP as the transport layer protocol. To ensure BGP security, you can perform MD5 authentication during TCP connection setup. MD5 authentication does not authenticate BGP messages; instead, it sets the MD5 authentication password for the TCP connection, and the authentication is performed by TCP. If the authentication fails, no TCP connection is set up.
GTSM: After GTSM is enabled for BGP, an interface board checks the TTL values of all BGP messages. Depending on the configuration, packets whose TTL values are not within the specified range are either allowed to pass or discarded. To make GTSM discard such packets by default, set a correct TTL range according to the network topology; messages whose TTL values fall outside the range are then discarded. This function protects against attacks using bogus BGP messages and is mutually exclusive with multi-hop EBGP.
The number of routes received from peers can be limited to prevent resource exhaustion attacks.
The AS_Path lengths on the inbound and outbound interfaces can be limited. Packets that exceed the AS_Path length limit are discarded.
Route dampening helps solve the problem of route instability. In most cases, BGP is used on complex networks where route flapping occurs frequently. To prevent frequent route flapping, BGP uses route dampening to suppress unstable routes.
Route dampening measures the stability of a route using a penalty value. A larger penalty value indicates a less stable route. Each time route flapping occurs, BGP increases the penalty of a route by a value of 1000. During route flapping, a route changes from active to inactive. When the penalty value of the route exceeds the suppression threshold, BGP suppresses this route and does not add it to the IP routing table or advertise any Update message to BGP peers. After a route is suppressed for a period of time (half life), the penalty value is reduced by half. When the penalty value of a route decreases to the reuse threshold, the route becomes reusable and is added to the routing table. At the same time, BGP advertises an Update message to peers. The penalty value, suppression threshold, and half life can be manually configured.
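The penalty mechanics above amount to exponential decay with a half-life. The thresholds below are illustrative (only the per-flap increment of 1000 comes from the text):

```python
import math

def penalty_after(initial_penalty, elapsed, half_life=900):
    """Decay the dampening penalty: it halves every half_life seconds
    (900 s, i.e. 15 minutes, is used here as an assumed default)."""
    return initial_penalty * math.exp(-math.log(2) * elapsed / half_life)

SUPPRESS, REUSE = 2000, 750        # assumed suppression/reuse thresholds
p = 3 * 1000                       # three flaps, +1000 penalty each
print(p > SUPPRESS)                # -> True: the route is suppressed
p = penalty_after(p, 2 * 900)      # two half-lives later: 3000 -> ~750
print(round(p) <= REUSE)           # -> True: the route becomes reusable
```

As the text says, the penalty increment trigger, suppression threshold, half-life, and reuse threshold are all configurable; the decay curve itself is what this sketch shows.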
Route dampening applies only to EBGP routes, not to IBGP routes. IBGP routes often include routes from the local AS, and the forwarding tables of devices within an AS must remain consistent; fast IGP convergence likewise works to keep this information synchronized.
If IBGP routes were dampened, forwarding tables on devices would be inconsistent when these devices have different dampening parameters. Route dampening therefore does not apply to IBGP routes.
Case description
IP addresses used to interconnect devices are designed as follows:
• If RTX connects to RTY, the interconnection addresses are XY.1.1.X and XY.1.1.Y, with a 24-bit network mask.
• Loopback interface addresses of R1, R2, R3, R6, and R7 are shown in the figure.
Case analysis
To establish stable IBGP peer relationships, use loopback interface addresses and static routes within an AS. To establish EBGP peer relationships, use physical interface addresses.
Command usage
The peer as-number command sets the AS number of a specified peer (or peer group).
The peer connect-interface command specifies the source interface for sending BGP messages and the source address used to initiate the connection.
The peer next-hop-local command configures a BGP device to set its own IP address as the next hop of routes when advertising them to an IBGP peer or peer group.
View
BGP view
Parameters
peer ipv4-address as-number as-number
ipv4-address: specifies the IPv4 address of a peer.
as-number: specifies the AS number of the peer.
peer ipv4-address connect-interface interface-type interface-number [ ipv4-source-address ]
ipv4-address: specifies the IPv4 address of a peer.
interface-type interface-number: specifies the interface type and number.
ipv4-source-address: specifies the IPv4 source address used to set up the connection.
peer ipv4-address next-hop-local
ipv4-address: specifies the IPv4 address of a peer.
Precautions
When using a loopback interface to send BGP messages:
• Ensure that the loopback interface address of the BGP peer is reachable.
• For an EBGP connection, run the peer ebgp-max-hop command to allow the EBGP peer relationship to be established over an indirect connection.
The peer next-hop-local and peer next-hop-invariable commands are mutually exclusive.
The PrefRcv field in the display bgp peer command output indicates the number of route prefixes received from the peer.
Case description The topology in this case is the same as that in the previous case. Perform the configuration based on the configuration in the previous case. R1 prefers routes to 10.0.X.0/24 with next hop R2 because BGP prefers the route advertised by the router with the smallest router ID.
Command usage The peer route-policy command specifies a route-policy to control routes received from, or to be advertised to a peer or peer group.
View
BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies the IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies the route-policy to routes imported from the peer or peer group.
export: applies the route-policy to routes advertised to the peer or peer group.
Configuration verification
Run the display bgp routing-table command to view the BGP routing table.
Case description The topology in this case is the same as that in the previous case. Company A requires that R1 access network 10.0.1.0/24 through R7. To meet this requirement, you can enable R4 to access network 10.0.1.0/24 through R7 using the MED attribute.
Command usage The peer route-policy command specifies a route-policy to control routes received from, or to be advertised to a peer or peer group.
View
BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies the IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies the route-policy to routes imported from the peer or peer group.
export: applies the route-policy to routes advertised to the peer or peer group.
Configuration verification
Run the display bgp routing-table command to view the BGP routing table.
Case description The topology in this case is the same as that in the previous case. To meet the requirement, use the Community attribute.
Command usage The peer route-policy command specifies a route-policy to control routes received from, or to be advertised to a peer or peer group.
View
BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies the IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies the route-policy to routes imported from the peer or peer group.
export: applies the route-policy to routes advertised to the peer or peer group.
Configuration verification
Run the display bgp routing-table community command to view the Community attributes in the BGP routing table.
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case.
Command usage
The peer route-policy command specifies a route-policy to control the routes received from, or to be advertised to, a peer or peer group.
The peer default-route-advertise command configures a BGP device to advertise a default route to its peer or peer group.
View
peer route-policy: BGP view
peer default-route-advertise: BGP view
Parameters
peer ipv4-address route-policy route-policy-name { import | export }
ipv4-address: specifies the IPv4 address of a peer.
route-policy-name: specifies a route-policy name.
import: applies the route-policy to routes imported from the peer or peer group.
export: applies the route-policy to routes advertised to the peer or peer group.
peer { group-name | ipv4-address } default-route-advertise [ route-policy route-policy-name ] [ conditional-route-match-all { ipv4-address1 { mask1 | mask-length1 } } &<1-4> | conditional-route-match-any { ipv4-address2 { mask2 | mask-length2 } } &<1-4> ]
ipv4-address: specifies the IPv4 address of a peer.
route-policy route-policy-name: specifies a route-policy name.
conditional-route-match-all ipv4-address1 { mask1 | mask-length1 }: specifies the IPv4 address and mask (or mask length) of conditional routes. The default route is sent to the peer or peer group only when all conditional routes are matched.
conditional-route-match-any ipv4-address2 { mask2 | mask-length2 }: specifies the IPv4 address and mask (or mask length) of conditional routes. The default route is sent to the peer or peer group only when any conditional route is matched.
Configuration verification
Run the display ip routing-table command to view IP routing table information.
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case.
Command usage The maximum load-balancing command configures the maximum number of equal-cost routes. View
BGP view Parameters maximum load-balancing [ ebgp | ibgp ] number ebgp: implements load balancing among EBGP routes. ibgp: implements load balancing among IBGP routes. number: specifies the maximum number of equal-cost routes in the BGP routing table. Precautions The maximum load-balancing number command cannot be used together with the maximum load-balancing ebgp number or maximum load-balancing ibgp number command. If the maximum load-balancing ebgp number or maximum load-balancing ibgp number command is executed, the maximum load-balancing number command does not take effect.
Configuration verification Run the display ip routing-table protocol bgp command to view the load-balanced routes learned by BGP.
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case. After GTSM is enabled between R6 and R8, the hop count should be 1.
Command usage The peer valid-ttl-hops command applies the GTSM function on the peer or peer group. The gtsm default-action command configures the default action to be taken on the packets that do not match the GTSM policy. The gtsm log drop-packet command enables the log function on a board to log information about the packets discarded by GTSM on the board. View peer valid-ttl-hops: BGP view gtsm default-action: system view gtsm log drop-packet: system view Parameters peer ipv4-address valid-ttl-hops [ hops ] ipv4-address: specifies the IPv4 address of a peer. hops: specifies the number of TTL hops to be checked. The value is an integer that ranges from 1 to 255. The default value is 255. If the value is configured as hops, the valid TTL range of the detected packet is [255 - hops + 1, 255]. gtsm default-action { drop | pass }
drop: discards the packets that do not match the GTSM policy. pass: allows the packets that do not match the GTSM policy to pass through.
Precautions GTSM and EBGP-MAX-HOP affect the TTL values of sent BGP packets. The two functions are mutually exclusive. If the default action is configured but the GTSM policy is not configured, GTSM does not take effect.
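The TTL check described in the parameters above, valid range [255 - hops + 1, 255] with a configurable default action for non-matching packets, can be sketched as:

```python
def gtsm_accept(ttl, hops, default_action="drop"):
    """GTSM sketch: a BGP packet is valid when its TTL falls in
    [255 - hops + 1, 255]; otherwise the default action decides."""
    if 255 - hops + 1 <= ttl <= 255:
        return True
    return default_action == "pass"

# Directly connected EBGP peer (hops = 1): only TTL 255 is valid,
# because a genuine neighbor's packet is decremented at most once.
print(gtsm_accept(255, hops=1))  # -> True
print(gtsm_accept(254, hops=1))  # -> False (dropped with default action)
```

This also makes the mutual exclusion with multi-hop EBGP intuitive: a multi-hop session legitimately delivers packets with lower TTLs than the GTSM window allows.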
Case description In the topology, among the IP addresses that are not marked, Rx and Ry connect using IP addresses XY.1.1.X/24 and XY.1.1.Y/24.
Results Run the display vlan command to view the results.
Results Run the display bgp peer command to view the BGP peer relationship.
Results Run the display bgp routing-table command to view the BGP routing table. The command output shows that 2.2.2.2/32 and 3.3.3.3/32 have been advertised.
Results The loop is the result of inconsistency between IGP route selection and BGP route selection.
Case description In the topology, among the IP addresses that are not marked, Rx and Ry connect using IP addresses XY.1.1.X/24 and XY.1.1.Y/24.
Analysis process Run the display bgp routing-table community command to view the attributes.
Results You will notice that the Community attribute of route 10.0.0.0/24 is labeled as <400:1>, no-export on R2.
Results You can prepend AS numbers to the AS_Path attribute to change the route selection of R3.
To ensure connectivity between IBGP peers, you need to establish full-mesh connections between them. If there are n routers in an AS, n(n-1)/2 IBGP connections are required. When there are a large number of IBGP peers, many network and CPU resources are consumed. A route reflector (RR) can be used between IBGP peers to solve this problem. In an AS, one router functions as an RR and the other routers function as clients. The RR and its clients establish IBGP connections and form a cluster. The RR reflects routes to the clients, removing the need to establish BGP connections between clients.
RR concepts
RR: a BGP device that reflects the routes learned from an IBGP peer to other IBGP peers.
Client: an IBGP device whose routes are reflected by an RR to other IBGP devices. In an AS, clients only need to connect directly to the RR.
Non-client: an IBGP device that is neither an RR nor a client. In an AS, a non-client must establish full-mesh connections with the RR and all other non-clients.
Originator: the device that originates a route in an AS. The Originator_ID attribute helps eliminate routing loops within a cluster.
Cluster: the set of an RR and its clients. The Cluster_List attribute helps eliminate routing loops between clusters.
An RR advertises learned routes to IBGP peers based on the following rules: The RR advertises the routes learned from an EBGP peer to all the clients and non-clients. The RR advertises the routes learned from a non-client IBGP peer to all the clients. The RR advertises the routes learned from a client to all the other clients and all the non-clients. An RR is easy to configure because it needs to be configured only on the device that functions as a reflector, and clients do not need to know that they are clients. In some networks, if clients of an RR establish full-mesh connections among themselves, they can directly exchange routing information. In this case, route reflection between clients is unnecessary and wastes bandwidth. You can run the undo reflect between-clients command on the VRP Platform to prohibit an RR from reflecting the routes received from a client to other clients.
The originator ID identifies the originator of a route and is generated by an RR to prevent routing loops in a cluster. When an RR reflects a route for the first time, the RR adds the Originator_ID attribute to this route. The Originator_ID attribute identifies the originator of the route. If the route already contains the Originator_ID attribute, the RR retains this Originator_ID attribute. When a device receives a route, the device compares the originator ID of the route with the local router ID. If they are the same, the device discards the route. An RR and its clients form a cluster, which is identified by a unique cluster ID in an AS. To prevent routing loops between clusters, an RR uses the Cluster_List attribute to record the cluster IDs of all the clusters that a route passes through. When an RR reflects a route between clients, or between clients and non-clients, the RR adds the local cluster ID to the top of the cluster list. If there is no cluster list, the RR creates a Cluster_List attribute. When receiving an updated route, the RR checks the cluster list of the route. If the cluster list contains the local cluster ID, the RR discards the route. If the cluster list does not contain the local cluster ID, the RR adds the local cluster ID to the cluster list and then reflects the route.
Backup RR A backup RR prevents single points of failure. On the VRP, you need to run the reflector cluster-id command to set the same cluster ID for all the RRs in the same cluster. When redundant RRs exist, a client receives multiple routes to the same destination from different RRs and then selects the optimal route according to BGP route selection policies. The Cluster_List attribute prevents routing loops between different RRs in the same AS. Topology description When Client1 receives an updated route 10.0.0.0/24 from an external peer, it advertises the route to RR1 and RR2 through IBGP. After RR1 receives the updated route, it reflects the route to the other clients (Client2 and Client3) and adds the local cluster ID to the top of the cluster list. After RR2 receives the updated route, it checks the cluster list and finds that its own cluster ID is already contained in the cluster list. It therefore discards the route without reflecting it to its clients.
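A minimal sketch of the redundant-RR configuration, assuming cluster ID 1 and an illustrative client address:

```
# On both RR1 and RR2 (redundant RRs serving the same clients):
bgp 100
 reflector cluster-id 1             # identical cluster ID on both RRs
 peer 10.1.1.1 reflect-client       # Client1
```

Because both RRs share cluster ID 1, a route reflected by RR1 carries that ID in its Cluster_List, so RR2 discards it instead of re-reflecting it.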
A backbone network is divided into multiple clusters. RRs of the clusters are non-clients and establish full-mesh connections with one another. Although each client only establishes an IBGP connection with its RR, all the BGP routers in the AS can receive reflected routing information.
A level-1 RR (RR1) is deployed in Cluster1, while RRs (RR2 and RR3) in Cluster2 and Cluster3 function as clients of RR1.
Confederation A confederation divides an AS into multiple sub-ASs. Full-mesh IBGP connections are established within each sub-AS, while EBGP-style connections are established between sub-ASs. ASs outside the confederation still consider the confederation to be a single AS. After an AS is divided into sub-ASs, a confederation ID (the original AS number) is assigned to each router within the AS. The original IBGP attributes, including the Local_Pref, MED, and Next_Hop attributes, are retained when routes are exchanged between sub-ASs. Confederation-related attributes are automatically deleted when routes are advertised outside the confederation. The administrator therefore does not need to configure rules for filtering information such as sub-AS numbers at the egress of the confederation.
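As a sketch, a router in one sub-AS of a confederation might be configured as follows. The sub-AS numbers (65001-65003), confederation ID 100, and peer address are assumptions:

```
# Router in sub-AS 65001, part of confederation 100:
bgp 65001
 confederation id 100               # AS number seen by peers outside the confederation
 confederation peer-as 65002 65003  # the other sub-ASs in the confederation
 peer 192.168.12.2 as-number 65002  # EBGP-style session to a router in sub-AS 65002
```

External peers only ever see AS 100; the sub-AS numbers are stripped at the confederation boundary.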
The AS_Path attribute is a well-known mandatory attribute. It consists of a list of AS numbers and has the following types: AS_SET: comprises a series of ASs in a disorderly manner and is carried in an Update message. When routes are summarized, AS_SET can be used to prevent the loss of path information. AS_SEQUENCE: comprises a series of ASs in sequence and is carried in an Update message. Generally, the AS_Path type is AS_SEQUENCE. AS_CONFED_SEQUENCE: comprises a series of member ASs in a confederation in sequence and is carried in an Update message. AS_CONFED_SEQUENCE can be transmitted only within the local confederation. AS_CONFED_SET: comprises a series of member ASs in a confederation in a disorderly manner and is carried in an Update message. Like AS_CONFED_SEQUENCE, AS_CONFED_SET can be transmitted only within the local confederation. Member AS numbers within a confederation are invisible to ASs outside the confederation. Therefore, when routes are advertised to ASs outside the confederation, member AS numbers are removed.
Comparison between a route reflector and a confederation Topology: an RR does not change the existing network topology, while a confederation requires the AS to be divided into sub-ASs, changing the topology a lot. Configuration: only the RR itself needs to be configured, and clients do not need to be configured, while a confederation must be configured on all the devices. Connections: RRs and non-clients must still establish full-mesh IBGP connections with each other, while a confederation requires full-mesh IBGP connections within each sub-AS. Usage: route reflectors are widely used, while confederations are seldom used.
The BGP routing table of each device on a large network is large. This burdens devices, increases the route flapping probability, and affects network stability. Route summarization is a mechanism that combines multiple routes into one route. This mechanism allows a BGP device to advertise only the summarized route, not all the specific routes, to peers. It reduces the BGP routing table size, and flapping of the specific routes no longer affects the network, therefore improving network stability. Route summarization uses the Aggregator attribute, an optional transitive attribute that identifies the node where route summarization occurs and carries the router ID and AS number of that node.
Precautions The summary automatic command summarizes the routes imported by BGP using the import-route command, including direct routes, static routes, RIP routes, OSPF routes, and IS-IS routes. After summarization is configured, BGP summarizes routes according to the natural network segment and suppresses the specific routes in the BGP routing table. This command is invalid for the routes advertised using the network command. BGP advertises only summarized routes to peers. BGP does not enable automatic summarization by default.
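A minimal sketch of automatic summarization on the VRP; the AS number and RIP process ID are assumptions:

```
bgp 100
 import-route rip 1                 # imported IGP routes are eligible for auto-summarization
 summary automatic                  # summarize to natural (classful) network segments
```

Routes injected with the network command are not affected by summary automatic.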
Manual summarization Summarized routes do not carry the AS_Path attributes of the specific routes. Using the AS_SET attribute to carry those AS numbers can prevent routing loops. Differences between AS_SET and AS_SEQUENCE are as follows: In AS_SET, AS numbers are added to the AS list in a disorderly manner; AS_SET is often used during route summarization. In AS_SEQUENCE, AS numbers are added to the AS list in the sequence in which a route passes through the ASs. Adding the AS_SET attribute to summarized routes may cause route flapping.
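A sketch of manual summarization with AS_SET on the VRP; the summarized prefix is an illustrative assumption:

```
bgp 100
 aggregate 10.0.0.0 255.255.252.0 as-set detail-suppressed
 # as-set: carry the AS numbers of the specific routes to prevent loops
 # detail-suppressed: advertise only the summarized route, not the specifics
```

Note that if the set of specific routes changes frequently, the AS_SET contents change too, which is why AS_SET can cause route flapping.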
RFC 5291 and RFC 5292 define the prefix-based BGP outbound route filtering (ORF) capability, which enables a device to advertise only required BGP routes. BGP ORF allows a device to send prefix-based inbound policies in a Route-Refresh message to its BGP peers. The peers then construct outbound policies based on these inbound policies to filter routes before sending them. This capability has the following advantages: Prevents the local device from receiving a large number of unnecessary routes. Reduces CPU usage of the local device. Simplifies the configuration of BGP peers. Improves link bandwidth efficiency. Case description Between directly connected EBGP peers: after negotiating the prefix-based ORF capability with R1, Client1 adds its local prefix-based inbound policies to a Route-Refresh message and sends the message to R1. R1 then constructs outbound policies based on the received Route-Refresh message and sends only the required routes to Client1. Client1 receives only the required routes, and R1 does not need to maintain routing policies. In this manner, the configuration workload is reduced. Between an RR and its clients: Client1 and Client2 are clients of the RR. Client1, Client2, and the RR negotiate the prefix-based ORF capability. Client1 and Client2 then add their local prefix-based inbound policies to Route-Refresh messages and send the messages to the RR.
The RR constructs outbound policies based on the received inbound policies and reflects required routes in Route-Refresh messages to Client1 and Client2. Client1 and Client2 receive only the required routes, and the RR does not need to maintain routing policies. The configuration workload is thereby reduced.
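The ORF exchange described above can be sketched on the VRP as follows. The prefix-list name, peer addresses, and AS number are assumptions:

```
# On the device that wants only selected routes:
ip ip-prefix ORF index 10 permit 10.0.0.0 24
bgp 100
 peer 10.1.1.2 ip-prefix ORF import                     # inbound policy, carried to the peer via ORF
 peer 10.1.1.2 capability-advertise orf ip-prefix both
# On the advertising device (the RR or EBGP neighbor):
bgp 100
 peer 10.1.1.1 capability-advertise orf ip-prefix both
```

The advertising side builds its outbound filter from the received inbound policy, so no outbound route-policy needs to be maintained there.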
Active-Route-Advertise By default, once a route is selected as optimal by BGP, it can be advertised to peers. When Active-Route-Advertise is configured, only the routes that are both selected by BGP and active at the route management layer are advertised to peers. Active-Route-Advertise and the bgp-rib-only command are mutually exclusive. The bgp-rib-only command prevents BGP routes from being delivered to the IP routing table.
BGP dynamic update peer-groups By default, BGP groups and sends routes per peer, even when the peers share the same outbound policies. After this feature is enabled, BGP groups each route only once and then sends it to all the peers in the update-group, improving grouping efficiency in proportion to the number of peers in the group. Topology description RR1 has three clients and needs to reflect 100,000 routes to these clients. If RR1 groups the routes per peer, the total number of grouping operations is 300,000 (100,000 x 3). After the dynamic update peer-groups feature is used, the total number of grouping operations drops to 100,000 (100,000 x 1), improving grouping performance by a factor of 3.
Roles defined for 4-byte AS numbers New speaker: a peer that supports 4-byte AS numbers. Old speaker: a peer that does not support 4-byte AS numbers. New session: a BGP connection between new speakers. Old session: a BGP connection between a new speaker and an old speaker, or between old speakers. Protocol extension Two new optional transitive attributes, AS4_Path with attribute code 0x11 and AS4_Aggregator with attribute code 0x12, are defined to transmit 4-byte AS numbers over old sessions. If a BGP connection is set up between a new speaker and an old speaker, the reserved AS number AS_TRANS, with value 23456, is used for interoperability between 4-byte and 2-byte AS numbers. New AS numbers have three formats: • asplain: represents an AS number as a decimal integer. • asdot+: represents an AS number as two integers joined by a period, in the form <high 16 bits>.<low 16 bits>. For example, the 2-byte AS number 123 is represented as 0.123, and AS number 65536 is represented as 1.0. The largest value is 65535.65535.
• asdot: represents a 2-byte AS number in the asplain format and a 4-byte AS number in the asdot+ format (1 to 65535; 1.0 to 65535.65535). Huawei supports the asdot format. Topology description R2 receives a route with the 4-byte AS number 10.1 from R1. R2 establishes a peer relationship with R3 and needs R3 to consider the AS number of R2 to be AS_TRANS. When advertising the route to R3, R2 records AS_TRANS in the AS_Path attribute of the route and records 10.1 and its own AS number 20.1 in the AS4_Path attribute in the sequence required by BGP. R3 retains the unrecognized AS4_Path attribute, advertises the route to R4 according to BGP rules, and considers the AS number of R2 to be AS_TRANS. When receiving the route from R3, R4 replaces AS_TRANS with the AS numbers recorded in the AS4_Path attribute and reconstructs the AS_Path as 30 20.1 10.1.
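As a sketch, a new speaker using asdot-format 4-byte AS numbers might be configured as follows; the AS numbers and peer addresses are assumptions, not taken from the case topology:

```
bgp 20.1                            # local 4-byte AS number in asdot format
 peer 10.1.1.1 as-number 10.1       # new session to another 4-byte AS peer
 peer 10.1.2.1 as-number 65001      # old session: this 2-byte peer sees our AS as AS_TRANS (23456)
```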
Next-hop iteration based on routing policy BGP needs to iterate indirect next hops. If indirect next hops are not iterated according to a routing policy, routes may be iterated to incorrect forwarding paths. Next hops should therefore be iterated according to certain conditions to control the iterated routes. If a route cannot pass the routing policy, the route is ignored and route iteration fails. Topology description IBGP peer relationships are established between R1 and R2, and between R1 and R3, through loopback interfaces. R1 receives a BGP route with prefix 10.0.0.0/24 from R2 and R3. The original next hop of the BGP route received from R2 is 2.2.2.2. The IP address of Ethernet0/0/0 of R1 is 2.2.2.100/24. When R2 is running normally, the BGP route with prefix 10.0.0.0/24 is iterated to the IGP route 2.2.2.2/32. When the IGP on R2 becomes faulty, the IGP route 2.2.2.2/32 is withdrawn, triggering route iteration again. On R1, a route is searched for in the IP routing table based on the original next hop 2.2.2.2, and the BGP route is consequently iterated to the direct route 2.2.2.0/24 of Ethernet0/0/0. The user expects that when the route with next hop 2.2.2.2 becomes unreachable, the route with next hop 3.3.3.3 is preferred. Actually, before BGP reconverges, the incorrect iteration results in a transient routing black hole.
With the next-hop iteration policy, you can control the mask length of the route through which the original next hop can be iterated. After the next-hop iteration policy is configured, the route with the original next hop 2.2.2.2 depends on only the IGP route 2.2.2.2/32.
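A sketch of such a next-hop iteration policy, assuming the VRP's nexthop recursive-lookup route-policy command and illustrative policy names:

```
# Match only host routes (/32), so a next hop cannot be iterated via 2.2.2.0/24:
ip ip-prefix HOST32 index 10 permit 0.0.0.0 0 greater-equal 32 less-equal 32
route-policy ITERATE permit node 10
 if-match ip-prefix HOST32
bgp 100
 nexthop recursive-lookup route-policy ITERATE   # iterate BGP next hops only via /32 routes
```

With this policy, when 2.2.2.2/32 is withdrawn, iteration via the original next hop fails immediately and the route via 3.3.3.3 is preferred instead.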
Session setup between peers A session can be set up between BGP speakers through directly connected or loopback interfaces. Generally, IBGP neighbors establish peer relationships through loopback interfaces, while EBGP neighbors establish peer relationships through directly connected physical interfaces. You can configure authentication to ensure security for sessions between peers. Logical full-mesh connections must be set up between IBGP peers (when no RR or confederation is used). You can disable synchronization to reduce the IGP load. Route update origin Routes can be imported into BGP using the import-route or network command. Routing policy optimization You can optimize BGP routes using inbound policies, outbound policies, and ORF. Route filtering and attribute control You can filter the routes to be advertised or received. You can control BGP route attributes to affect BGP route propagation. Route summarization Route summarization can optimize BGP routing entries and reduce the routing table size.
Redundancy Path redundancy ensures that a backup path is available when a network fault occurs. Traffic symmetry Scientific network design and policy application can ensure consistent paths for incoming and outgoing traffic. Load balancing When multiple paths to the same destination exist, traffic can be load balanced through policies to fully utilize bandwidth.
Interaction between non-BGP routes and BGP routes Generally, non-BGP routes can be imported into the BGP routing table using the import-route or network command. Control of default routes Default routes can be advertised or received according to conditions of routing policies. Policy-based routing Traffic paths can be optimized through PBR.
Dynamic update peer-groups: greatly improves router performance. Route reflector and confederation: reduces the number of IBGP sessions and optimizes large BGP networks.
Reduce unstable routes Use stable IGPs, improve router performance, reduce manual errors, and expand link bandwidth. Improve BGP stability Use BGP soft reset when applying new BGP policies. Dampen unstable routes appropriately to reduce their impact on BGP.
Case description IP addresses used to interconnect devices are as follows: • If RTX connects to RTY, the interconnected addresses are XY.1.1.X and XY.1.1.Y, with a network mask of 24. OSPF runs normally, and the interconnected addresses and loopback interface addresses have been advertised into OSPF. However, 10.0.X.0/24, 172.15.X.0/24, and 172.16.X.0/24 are not advertised into OSPF. Case analysis EBGP peer relationships are established using loopback interfaces.
Command usage The peer as-number command sets an AS number for a specified peer or peer group. The peer connect-interface command specifies the source interface for sending BGP messages and the source address used to initiate a connection. The peer next-hop-local command configures a BGP device to set its own IP address as the next hop of routes when advertising them to an IBGP peer or peer group. The group command creates a peer group. View BGP view Parameters peer ipv4-address as-number as-number ipv4-address: specifies the IPv4 address of a peer. as-number: specifies the AS number of the peer. peer ipv4-address connect-interface interface-type interface-number [ ipv4-source-address ] ipv4-address: specifies the IPv4 address of a peer. interface-type interface-number: specifies the interface type and number. ipv4-source-address: specifies the IPv4 source address used to set up a connection.
peer ipv4-address next-hop-local ipv4-address: specifies the IPv4 address of a peer. group group-name [ external | internal ] group-name: specifies the name of a peer group. external: creates an EBGP peer group. internal: creates an IBGP peer group. Precautions When configuring a device to use a loopback interface as the source interface of BGP messages, note the following points: • The loopback interface of the device's BGP peer must be reachable. • In the case of an EBGP connection, the peer ebgp-max-hop command must be executed to enable the two devices to establish an indirect peer relationship. The peer next-hop-local and peer next-hop-invariable commands are mutually exclusive. The Rec field in the display bgp peer command output indicates the number of route prefixes received from the peer.
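The precautions above can be sketched for an EBGP session between loopbacks; the AS numbers and loopback address 2.2.2.2 are illustrative assumptions:

```
# R1 (AS 100), peering with R2's loopback 2.2.2.2 (AS 200):
bgp 100
 peer 2.2.2.2 as-number 200
 peer 2.2.2.2 connect-interface LoopBack0   # source BGP messages from LoopBack0
 peer 2.2.2.2 ebgp-max-hop 2                # required: the EBGP peer is not directly connected
```

A route to 2.2.2.2 (static or via an IGP) must also exist, or the TCP session cannot be established.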
Case description The topology in this case is the same as that in the previous case. Perform the configurations based on the configuration in the previous case. If all the clients of the RR have established logically full-mesh connections, the clients can transmit routes to each other without requiring the RR to reflect routes to them. In this situation, prohibit the RR from reflecting routes to clients so as to reduce the RR load.
Command usage The undo reflect between-clients command prohibits an RR from reflecting routes between clients. This command is executed on an RR. After this command is executed, clients exchange BGP routes directly, and R2 no longer reflects routes between these clients. However, R2 still reflects the routes that are advertised by non-clients. View BGP view Configuration verification Run the display bgp peer command to view detailed BGP peer information. To reduce the RR load, you can also prevent BGP routes from being added to the IP routing table so that the RR does not forward packets. In the full-mesh client scenario, however, disabling route reflection between clients better meets the requirement.
Case description The topology in this case is the same as that in the previous case. To meet the first requirement, use a route-policy to advertise interface routing information. To meet the second requirement, use an IP prefix list to filter routes.
Command usage The peer ip-prefix command configures a route filtering policy based on an IP prefix list for a peer or peer group. View
BGP view Parameters peer { group-name | ipv4-address } ip-prefix ip-prefixname { import | export } group-name: specifies the name of a peer group. ipv4-address: specifies the IPv4 address of a peer. ip-prefix-name: specifies the name of an IP prefix list. import: applies a filtering policy to the routes received from a peer or peer group. export: applies a filtering policy to the routes sent to a peer or peer group. Configuration verification Run the display bgp routing-table command to view the BGP routing table. For the same node in a route-policy, the relationship between if-match clauses is AND. A route needs to meet all the matching rules before the actions defined by apply clauses are performed.
The relationship between the if-match clauses in the if-match routetype and if-match interface commands is "OR", but the relationship between the if-match clauses in the two commands and other commands is "AND".
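The if-match behavior described above can be sketched in a route-policy that advertises only interface (direct) routes into BGP; the policy name and interface numbers are assumptions:

```
route-policy DIRECT permit node 10
 if-match interface GigabitEthernet0/0/0   # OR with the next if-match interface clause
 if-match interface LoopBack0
bgp 100
 import-route direct route-policy DIRECT   # only direct routes of the matched interfaces are imported
```

If the same node also contained, say, an if-match ip-prefix clause, a route would have to satisfy that clause AND one of the interface clauses.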
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case. In requirement 2, the delivery of a default route depends on route 172.16.0.0/16. If route 172.16.0.0/16 disappears, the default route also disappears.
Command usage The peer route-policy command specifies a route-policy to control routes received from or to be advertised to a peer or peer group. The peer default-route-advertise command configures a BGP device to advertise a default route to its peer or peer group. View peer route-policy: BGP view peer default-route-advertise: BGP view Parameters peer ipv4-address route-policy route-policy-name { import | export } ipv4-address: specifies the IPv4 address of a peer. route-policy-name: specifies a route-policy name. import: applies a route-policy to routes imported from a peer or peer group. export: applies a route-policy to routes advertised to a peer or peer group. peer { group-name | ipv4-address } default-route-advertise [ route-policy route-policy-name ] [ conditional-route-match-all { ipv4-address1 { mask1 | mask-length1 } } &<1-4> | conditional-route-match-any { ipv4-address2 { mask2 | mask-length2 } } &<1-4> ]
ipv4-address: specifies the IPv4 address of a peer. route-policy route-policy-name: specifies a route-policy name. conditional-route-match-all ipv4-address1 { mask1 | mask-length1 }: specifies the IPv4 addresses and masks/mask lengths of conditional routes. The default route is sent to the peer or peer group only when all conditional routes are matched. conditional-route-match-any ipv4-address2 { mask2 | mask-length2 }: specifies the IPv4 addresses and masks/mask lengths of conditional routes. The default route is sent to the peer or peer group only when any conditional route is matched. Configuration verification Run the display ip routing-table command to view information about the IP routing table.
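A sketch of the conditional default-route advertisement described above, with an assumed peer address and AS number; the dependency route 172.16.0.0/16 is from the case:

```
bgp 100
 peer 10.1.1.2 default-route-advertise conditional-route-match-any 172.16.0.0 16
 # The default route is advertised only while 172.16.0.0/16 exists;
 # if 172.16.0.0/16 disappears, the default route is withdrawn as well.
```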
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case.
Command usage The aggregate command creates an aggregated route in the BGP routing table. View
BGP view Parameters aggregate ipv4-address { mask | mask-length } [ as-set | attribute-policy route-policy-name1 | detail-suppressed | origin-policy route-policy-name2 | suppress-policy route-policy-name3 ] * ipv4-address: specifies the IPv4 address of an aggregated route. mask: specifies the network mask of an aggregated route. mask-length: specifies the network mask length of an aggregated route. as-set: generates a route with the AS_SET attribute. attribute-policy route-policy-name1: specifies the name of an attribute policy for aggregated routes. detail-suppressed: advertises only the aggregated route. origin-policy route-policy-name2: specifies the name of a policy that allows route aggregation.
suppress-policy route-policy-name3: specifies the name of a policy for suppressing the advertisement of specified routes. Precautions During manual or automatic summarization, routes pointing to NULL0 are generated locally. Configuration verification Run the display ip routing-table protocol bgp command to view the routes learned by BGP.
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case. BGP on-demand route advertisement requires ORF to be enabled on R4, R5, and R6.
Command usage The peer capability-advertise orf command enables prefixbased ORF for a peer or peer group. View
BGP view Parameters peer { group-name | ipv4-address } capability-advertise orf [ cisco-compatible ] ip-prefix { both | receive | send } group-name: specifies the name of a peer group. ipv4-address: specifies the IPv4 address of a peer. cisco-compatible: is compatible with Cisco devices. both: allows the device to send and receive ORF packets. receive: allows the device to receive only ORF packets. send: allows the device to send only ORF packets. Precautions BGP ORF has three modes: send, receive, and both. In send mode, a BGP device can send ORF information. In receive mode, a BGP device can receive ORF information. In both mode, a BGP device can send and receive ORF information.
To enable a BGP device that advertises routes to receive ORF IP-prefix information, configure this device to work in receive or both mode and the peer device to work in send or both mode. Configuration verification Run the display bgp peer 1.1.1.1 orf ip-prefix command to view prefix-based BGP ORF information received from a specified peer.
Case description IP addresses used to interconnect devices are as follows: • If RTX connects to RTY, the interconnected addresses are XY.1.1.X and XY.1.1.Y, with a network mask of 24.
Results The configuration is the basic OSPF configuration.
Results Run the display bgp peer command to view the BGP peer status. Run the display bfd session all command to view the BFD session. In the command output, D_IP_IF indicates that a BFD session is dynamically created and bound to an interface.
Results Run the display bgp routing-table command to view BGP routing entries. The command output shows that R3 learns two routes 10.0.0.0/24 from R2 and R4. According to BGP routing rules, R3 prefers the route 10.0.0.0/24 learned from R2.
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case.
Analysis process You can use peer groups to reduce the RR load.
Results Run the display bgp routing-table community command to view the Community attribute.
Results Run the display bgp routing-table community command to view the Community attribute. The Community attribute is no-export. That is, the route is not advertised to EBGP peers.
ACL
An Access Control List (ACL) is a series of sequential rules composed of permit and deny clauses. These rules match packet information to classify packets. Based on the ACL rules applied to it, a device permits or denies packets. IP prefix list An IP prefix list filters routes in a defined matching mode to meet requirements. An IP prefix list filters only routing information, not packets. AS_Path filter Each BGP route carries the AS_Path attribute. An AS_Path filter specifies matching rules regarding the AS_Path attribute. AS_Path filters are exclusively used in BGP.
Community filter Community filters are exclusively used in BGP. Each BGP route can carry a Community attribute to identify a community. Community filters specify matching rules regarding the Community attribute.
ACL rule management An ACL can contain multiple rules. A rule is identified by a rule ID, which can be set by a user or automatically generated based on the ACL step. All the rules in an ACL are arranged in ascending order of rule IDs. There is a step between rule IDs. If no rule ID is specified, the rule ID is determined by the ACL step. You can insert new rules into an ACL based on the rule ID. ACL rule matching When a packet reaches a device, the search engine extracts information from the packet to constitute a key value and matches the key value against the rules in an ACL. When a matching rule is found, the system stops the matching, and the packet matches that rule. If no matching rule is found, the packet does not match any rule. By default, a Huawei ACL permits the packets that match no rule.
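Rule IDs and the step can be sketched as follows; the ACL number and addresses are illustrative assumptions:

```
acl 2000                                    # basic ACL
 rule 5 permit source 10.0.1.0 0.0.0.255   # rule ID 5 set explicitly
 rule deny source 10.0.0.0 0.0.255.255     # no ID given: auto-assigned from the ACL step
```

Leaving gaps between rule IDs (the step) makes it possible to insert new rules between existing ones later.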
Interface-based ACL Define rules based on the inbound interfaces of packets. You can run the traffic-filter command to reference an interface-based ACL.
Basic ACL Define rules based on the source IP address, VPN instance, fragment flag, and time range of packets. Advanced ACL Define rules based on the source IP address, destination IP address, IP precedence, ToS, DSCP, IP protocol type, ICMP type, and TCP/UDP source and destination port numbers of packets. An advanced ACL can define more accurate, abundant, and flexible rules than a basic ACL. Layer 2 ACL Define rules based on Ethernet frame header information in a packet, including the source MAC address, destination MAC address, and Ethernet frame protocol type.
ACL matching order An ACL is composed of a list of rules. Each rule contains a deny or permit clause. These rules may overlap or conflict. One rule can contain another rule, but the two rules must be different. Devices support two types of matching order: configuration order and automatic order. The matching order determines the priorities of the rules in an ACL. Rule priorities resolve conflicts between overlapping rules. Automatic order The automatic order follows the depth-first principle. ACL rules are arranged in sequence based on rule precision. For an ACL rule (where a protocol type, a source IP address range, or a destination IP address range is specified), the stricter the rule, the more precise it is considered. For example, an ACL rule can be configured based on the wildcard of an IP address. The smaller the wildcard, the smaller the specified host range and the stricter the ACL rule. If rules have the same depth-first order, they are matched in ascending order of rule IDs.
Packet fragmentation supported by ACLs In traditional packet filtering, only the first fragment of a packet is matched against rules; subsequent fragments are allowed to pass through regardless. Network attackers may therefore construct subsequent fragments to launch attacks. In an ACL rule, the fragment parameter indicates that the rule is valid for all fragmented packets. The none-first-fragment parameter indicates that the rule is valid only for non-first fragments, not for non-fragmented packets or the first fragment. Rules that contain neither the fragment nor the none-first-fragment parameter are valid for all packets (including fragmented packets). ACL time range You can make ACL rules valid only at a specified time or within a specified time range.
IP prefix list An IP prefix list can contain multiple entries, each identified by an index. The system matches a route against the entries in ascending order of index. Once the route matches an entry, the system stops matching it against the remaining entries. If the route matches no entry, the route is filtered out. An IP prefix list can implement exact matching, or matching within a specified mask length range. You can configure greater-equal and less-equal to specify the prefix mask length range. If neither keyword is configured, the IP prefix list performs exact matching; that is, only routes with the same mask length as that specified in the IP prefix list are matched. If only greater-equal is configured, the mask length range is [greater-equal-value, 32]. If only less-equal is configured, the mask length range is [specified mask length, less-equal-value]. The mask length range must satisfy mask-length <= greater-equal-value <= less-equal-value <= 32. Characteristics of an IP prefix list If a route matches no entry in an IP prefix list, the route is denied by default. If a referenced IP prefix list does not exist, the default matching mode is permit.
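The exact-match and range-match modes above can be sketched as follows; the list names and prefix are illustrative assumptions:

```
# Exact match: only the route 10.0.0.0/16 itself matches:
ip ip-prefix P1 index 10 permit 10.0.0.0 16
# Range match: any prefix within 10.0.0.0/16 whose mask length is 24 to 32:
ip ip-prefix P2 index 10 permit 10.0.0.0 16 greater-equal 24 less-equal 32
```

For example, 10.0.1.0/24 matches P2 but not P1, while 10.0.0.0/16 matches P1 but not P2.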
An AS_Path filter is used to filter BGP routes to be advertised or received based on the AS_Path attribute carried in the routes. Because the number of the AS that a route most recently passed through is added to the leftmost end of the AS_Path list, configure an AS_Path filter with caution. For example, if a route originating from AS 100 passes through AS 300, AS 200, and AS 500, and then reaches AS 600, the AS_Path attribute of the route is (500 200 300 100).
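As a hedged sketch (the filter numbers and regular expressions are illustrative), AS_Path filters matching against the example path above might look like this:

```
# Match routes that originated in AS 100 (the path ends with 100).
ip as-path-filter 1 permit _100$
# Match routes that traversed AS 500 anywhere in the path.
ip as-path-filter 2 permit _500_
```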
A community filter is used to filter BGP routes to be advertised or received based on the Community attribute carried in the routes. Community attributes are classified into basic and extended community attributes. Self-defined community values and well-known communities are basic community attributes. The RT and SoO attributes used in MPLS VPN are extended community attributes.
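A minimal community filter sketch (the filter numbers and community values are examples):

```
# Basic community filter: match routes carrying community 100:1.
ip community-filter 1 permit 100:1
# Match routes carrying the well-known community no-export.
ip community-filter 2 permit no-export
```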
A route policy is used to filter routes and set attributes for routes. By changing route attributes (including reachability), a route policy changes the path along which network traffic travels. A route policy is often used in the following scenarios:
Control route importing.
• Using a route policy, you can prevent sub-optimal routes and routing loops during route import.
Control route receiving and advertising.
• Using a route policy, you can receive or advertise only specified routes according to network requirements.
Set attributes for routes.
• Using a route policy, you can modify route attributes to optimize a network.
Route policy principles
A route policy consists of multiple nodes. The system checks routes against the nodes of a route policy in ascending order of node IDs. A node contains multiple if-match and apply clauses. The if-match clauses define the matching conditions of a node, while the apply clauses define the actions to be performed on routes that match the if-match clauses. The relationship between the if-match clauses of a node is AND; that is, a route matches a node only when it matches all the if-match clauses of the node. The relationship between the nodes of a route policy is OR.
That is, a route matches a route policy as long as it matches one node of the route policy. If a route matches no node, it fails to match the route policy. The relationship between the if-match clauses of a node in a route policy is AND. The actions defined by the apply clauses are performed on a route only when the route meets all the matching conditions defined by the if-match clauses. The relationship between the if-match clauses in the if-match route-type and if-match interface commands is OR, but the relationship between these two commands and other if-match commands is AND.
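The AND/OR relationships above can be sketched with a two-node route policy (the policy name, prefix list name, and values are hypothetical):

```
# Node 10: a route must match BOTH if-match clauses (AND) to get cost 100.
route-policy RP permit node 10
 if-match ip-prefix CUST
 if-match tag 300
 apply cost 100
# Node 20: routes that fail node 10 are tried here (OR between nodes)
# and are permitted unchanged.
route-policy RP permit node 20
```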
In the topology, dual-node bidirectional route advertisement is implemented. R1 imports route 10.0.0.0/24 into OSPF, R3 imports OSPF routes into IS-IS, and R2 learns route 10.0.0.0/24 through IS-IS. R2 therefore learns two routes to 10.0.0.0/24, one through OSPF and one through IS-IS. R2 prefers the route learned through IS-IS because that route has a higher priority (a smaller preference value) than the external route learned through OSPF. As a result, R2 reaches 10.0.0.0/24 along the path R4→R3→R1. To optimize the path, use a route policy to modify the OSPF ASE priority so that it is higher than the IS-IS priority. This modification prevents R2 from using a sub-optimal route. When the interface that connects R1 to network 10.0.0.0/24 goes down, R2 imports route 10.0.0.0/24 into OSPF because it still has the route learned through IS-IS, even though the external LSA has aged out of the OSPF area. R1 and R3 then learn route 10.0.0.0/24 again. When R2 accesses network 10.0.0.0/24, traffic travels R4→R3→R1→R2, causing a routing loop. In this scenario, use a tag to prevent routing loops.
Control route receiving and advertising Only necessary and valid routes are received, which limits the routing table size and improves network security. Topology description R4 imports routes 10.0.X.0/24 into OSPF. According to service requirements, R1 can only receive routes 10.0.0.0/24 and 10.0.1.0/24, while R2 can only receive routes 10.0.2.0/24 and 10.0.3.0/24. You can use a filter policy to meet this requirement.
Generally, only routing information is filtered; link state information is not. In OSPF, incoming and outgoing Type 3, Type 5, and Type 7 LSAs can be filtered, but link-state routing protocols such as OSPF and IS-IS can filter only incoming routes, not the LSAs that carry them. That is, OSPF and IS-IS do not add the filtered routes to their local routing tables, but the LSAs describing those routes are still flooded in the OSPF or IS-IS area. Routes imported from other protocols can also be filtered. For example, you can use the filter-policy export command to filter routes imported from RIP before they are advertised. Only the external routes that pass the filtering are converted into AS-external LSAs and advertised; other neighbors then do not receive the filtered routes imported from RIP. This filtering can be performed only in the outbound direction.
Topology description
You can modify the Local_Pref attribute of a route using a route policy to change the path of traffic. R2 learns route 10.0.0.0/24 from an EBGP peer and sets the Local_Pref value to 300; R3 learns route 10.0.0.0/24 from an EBGP peer and sets the Local_Pref value to 200. R1, R2, and R3 exchange these routes through IBGP. Because a larger Local_Pref value is preferred, AS 100 ultimately prefers the path through R2 to reach 10.0.0.0/24.
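The configuration on R2 might be sketched as follows (the policy name and peer address are hypothetical):

```
# Route policy that sets Local_Pref 300 on received routes.
route-policy SET-LP permit node 10
 apply local-preference 300
#
bgp 100
 # Apply the policy to routes received from the EBGP peer.
 peer 10.1.1.1 route-policy SET-LP import
```

R3 would use the same structure with apply local-preference 200.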
PBR is a mechanism that selects routes based on user-defined policies. It includes local PBR, interface PBR, and SPR. This course discusses only local PBR. IP unicast PBR has the following advantages: Allows you to define policies for route selection according to service requirements, which improves route selection flexibility and controllability. Sends different data flows through different links, which improves link efficiency. Uses low-cost links to transmit service data without affecting service quality, which reduces the cost of enterprise data services.
Matching process
If a device finds a matching local PBR node, the device processes packets as follows:
Step 1 Checks whether the priority of packets has been set.
• If so, the device applies the configured priority to the packets and performs step 2.
• If not, the device performs step 2.
Step 2 Checks whether an outbound interface has been configured for the local PBR.
• If so, the device sends packets from the outbound interface.
• If not, the device performs step 3.
Step 3 Checks whether next hops have been configured for the local PBR. You can configure two next hops to implement load balancing.
• If so, the device sends packets to the next hops.
• If not, the device searches the routing table for a route based on the destination addresses of packets. If no route is available, the device performs step 4.
Step 4 Checks whether a default outbound interface has been configured for the local PBR.
• If so, the device sends the packets from the default outbound interface.
• If not, the device performs step 5.
Step 5 Checks whether the default next hop has been configured for the local PBR. • If so, the device sends the packets to the default next hop. • If not, the device performs step 6. Step 6 Discards the packets and generates ICMP_UNREACH messages. If the device does not find a matching local PBR node, it searches the routing table for a route based on the destination addresses of the packets and then sends the packets.
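The matching process above can be sketched with a minimal local PBR configuration (the policy name, ACL number, and next-hop address are hypothetical):

```
acl number 3001
 rule 5 permit ip source 192.168.1.0 0.0.0.255
#
policy-based-route PBR1 permit node 10
 if-match acl 3001
 # Matching packets are sent to this next hop instead of
 # following the destination-based routing table lookup.
 apply ip-address next-hop 10.1.1.2
#
# Apply the policy to packets originated by the device itself.
ip local policy-based-route PBR1
```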
Case description IP addresses used to interconnect devices are as follows: • If RTX connects to RTY, interconnected addresses are XY.1.1.X and XY.1.1.Y. Network mask is 24.
Command usage The route-policy command creates a route policy and enters the route-policy view. View
System view
Parameters
route-policy route-policy-name { permit | deny } node node
route-policy-name: specifies the name of a route policy.
permit: specifies the matching mode of the route policy node as permit. In permit mode, if a route matches all the if-match clauses of a node, the route matches the route policy, and the actions defined by the apply clauses of the node are performed on the route; otherwise, the route continues to be matched against the next node.
deny: specifies the matching mode of the route policy node as deny. In deny mode, if a route matches all the if-match clauses of the node, the route is denied by the route policy and is not matched against the next node.
node node: specifies the index of a node in the route policy.
Precautions
A route policy is used to filter routes and set attributes for the routes that match it. A route policy consists of multiple nodes.
One node contains multiple if-match and apply clauses. The if-match clauses define matching conditions for this node, and the apply clauses define the actions to be performed on the routes that meet the matching conditions. The relationship between if-match clauses is AND. That is, a route must match all the if-match clauses of a node. The relationship between the nodes of a route policy is OR. That is, if a route matches a node, the route matches the route policy. If a route does not match any node, the route does not match the route policy.
Case description The topology in this case is the same as that in the previous case. Perform the configuration based on the configuration in the previous case. In requirement 2, use the least number of commands to implement the optimal configuration.
Command usage The filter-policy export command filters imported routes to be advertised according to the policy. View
System view
Parameters
filter-policy { acl-number | acl-name acl-name | ip-prefix ip-prefix-name } export [ protocol [ process-id ] ]
acl-number: specifies the number of a basic ACL.
acl-name acl-name: specifies the name of an ACL.
ip-prefix ip-prefix-name: specifies the name of an IP prefix list.
protocol: specifies the protocol that advertises routing information.
process-id: specifies a process ID when the protocol that advertises routing information is RIP, IS-IS, or OSPF.
Precautions
After external routes are imported into OSPF using the import-route command, you can run the filter-policy export command to filter the imported routes to be advertised.
This configuration allows only the external routes that meet the matching conditions to be translated into Type 5 LSAs (AS-external LSAs) and advertised, which helps prevent routing loops. You can specify protocol or process-id to filter the routes of a specified protocol or process. If neither protocol nor process-id is specified, OSPF filters all of the imported routes.
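The command usage above can be sketched as follows (the ACL number, prefix, and process IDs are examples):

```
acl number 2000
 rule 5 permit source 172.16.1.0 0.0.0.255
#
ospf 1
 # Import RIP routes, then advertise only those permitted by ACL 2000
 # as AS-external LSAs.
 import-route rip 1
 filter-policy 2000 export rip 1
```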
Case description The topology in this case is the same as that in the previous case. After meeting the requirements, check whether suboptimal routes and routing loops exist.
Results
After the routing protocols import routes from each other, R4 reaches 172.16.X.0/24 through a sub-optimal route. R4 learns route 172.16.X.0/24 through both OSPF (as an external route) and RIP. The optimal path is the OSPF route; however, the preference of OSPF external routes is 150 while the preference of RIP is 100, so R4 selects the RIP route and reaches 172.16.X.0/24 along a sub-optimal path.
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case. To meet requirement 1, ensure that R4 reaches 172.16.X.0/24 through the optimal route instead of a sub-optimal route. To meet requirement 2, use tags to control dual-node bidirectional route importing so as to prevent routing loops.
Results
If routes are not filtered during bidirectional route import, routing loops occur when the network environment changes. To avoid loops, ensure that each routing domain imports only routes that originated in the other domain, not routes that were previously imported from itself. Compared with the previous configuration, the advantage of using tags is that specific routes do not need to be enumerated: when routes in a domain change, the filtering still works without manual intervention, which provides good scalability. Although the previous configuration prevents routing loops, the sub-optimal route still exists.
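The tag-based loop prevention described above might be sketched as follows on R3 (the tag value, policy names, and process IDs are illustrative; RIPv2 is required for RIP to carry tags, and a mirror configuration is needed on R4):

```
# Tag routes imported from OSPF into RIP...
route-policy O2R permit node 10
 apply tag 100
# ...and refuse to import back into OSPF any route carrying that tag.
route-policy R2O deny node 10
 if-match tag 100
route-policy R2O permit node 20
#
rip 1
 import-route ospf 1 route-policy O2R
ospf 1
 import-route rip 1 route-policy R2O
```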
The sub-optimal route occurs because, during dual-node bidirectional route import, one of R3 and R4 learns network 172.16.X.0/24 from both OSPF and RIP. Because the preference value of OSPF external routes (150) is greater than that of RIP (100), that router reaches 172.16.X.0/24 through a sub-optimal path. To solve this, modify the preference of OSPF external routes to be smaller than that of RIP. Note that setting the preference value of OSPF external routes smaller than that of OSPF internal routes would be unreasonable.
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case.
Results
When only route summarization is performed, two problems exist: R5 learns the summary route, and a routing loop occurs between R3 and R4 when R2 pings a nonexistent IP address. The first problem occurs because, after R3 and R4 learn the summary routes generated by each other, they import the summary routes into the RIP area again. The second problem occurs because, after R3 and R4 learn the summary routes generated by each other, they add the summary routes to their routing tables. To address the two problems, prevent R3 and R4 from learning the summary routes generated by each other and from importing those routes into the RIP area. That is, filter the summary route learned from each other on R3 and R4.
Configure a filter policy on R3 and R4 to prevent them from receiving the specified OSPF summary routes, ensuring that these routes are not imported into the RIP domain and avoiding routing loops.
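One way to sketch this filtering (the prefix list name and summary prefix are assumptions, not taken from the case):

```
# Deny the peer's summary route; permit everything else.
ip ip-prefix NO-SUM index 10 deny 172.16.0.0 16
ip ip-prefix NO-SUM index 20 permit 0.0.0.0 0 less-equal 32
#
ospf 1
 # Do not install the filtered summary route into the local routing table.
 filter-policy ip-prefix NO-SUM import
```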
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case.
Command usage The policy-based-route command creates or modifies a PBR. The ip local policy-based-route command enables local PBR. View
policy-based-route: system view
ip local policy-based-route: system view
Parameters
policy-based-route policy-name { permit | deny } node node-id
policy-name: specifies the PBR name.
permit: performs PBR on the packets that meet the matching conditions.
deny: does not perform PBR on the packets that meet the matching conditions.
node-id: specifies the ID of a node.
ip local policy-based-route policy-name
policy-name: specifies a PBR name.
Precautions When deploying PBR, do not configure a broadcast interface such as an Ethernet interface as the outbound interface of packets.
Configuration verification Run the display bgp peer 1.1.1.1 orf ip-prefix command to view prefix-based BGP ORF information received from a specified peer.
Case description IP addresses used to interconnect devices are designed as follows: • If RTX connects to RTY, interconnected addresses are XY.1.1.X and XY.1.1.Y. Network mask is 24.
Results When R5 imports routes, accurate matching must be performed.
Results When you tracert a nonexistent IP address that belongs to 10.0.0.0/16, a routing loop occurs. This is because no route pointing to Null0 is generated when OSPF generates a summary route.
Results You can configure static routes pointing to Null0 on R5 using a command to prevent routing loops.
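The static blackhole route mentioned above can be sketched as follows (the summary prefix is an example):

```
# Discard packets destined for the summary range that match no longer
# prefix, preventing the loop caused by the summarized OSPF route.
ip route-static 10.0.0.0 16 NULL0
```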
Case description This case is an extension to the previous case. Perform the configuration based on the configuration in the previous case. IP addresses used to interconnect devices are designed as follows: • If RTX connects to RTY, interconnected addresses are XY.1.1.X and XY.1.1.Y. Network mask is 24. • The IP address of R1 S0/0/0 is 12.1.1.1/24, and the IP address of R2 S0/0/0 is 12.1.1.2/24. The IP address of R1 S0/0/1 is 21.1.1.2/24, and the IP address of R2 S0/0/1 is 21.1.1.1/24.
Results
Use an ACL together with the route-policy command to import only the two network segments into IS-IS; normally, the filter-policy XXX export command is used to filter the imported routes to be advertised.
Results
When using tags to prevent routing loops, note that IS-IS supports tags only when the cost type is wide; otherwise, IS-IS routes cannot be tagged. To prevent the sub-optimal route, modify the preference of OSPF external route 10.0.0.0/16 to be smaller than that of IS-IS routes.
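Enabling wide metrics so that IS-IS routes can carry tags might be sketched as follows (the process ID, policy name, and tag value are illustrative):

```
isis 1
 # Wide metric style is required for IS-IS routes to carry tags.
 cost-style wide
#
route-policy TAG-ISIS permit node 10
 apply tag 100
```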
Results
The configuration in this case avoids sub-optimal routes on R3 and R4. Because of differences in import timing, one of R3 and R4 learns 10.0.0.0/16 from both IS-IS and OSPF. For example, if R3 imports routes first, R4 learns 10.0.0.0/16 from both IS-IS and OSPF and compares their preferences: the preference of OSPF external routes is 150 and that of IS-IS is 15, so R4 prefers IS-IS to reach network 10.0.0.0/16, which is a sub-optimal route. Modifying the preference of 10.0.0.0/16 on R4 to be smaller than the IS-IS preference value eliminates the sub-optimal route. Note that setting the preference value of OSPF external routes smaller than that of OSPF internal routes would be unreasonable.
Results Use local PBR to meet this requirement.
VLAN technology brings the following benefits:
Limits broadcast domains. A broadcast domain is confined to a VLAN. This saves bandwidth and improves network processing capabilities.
Enhances network security. Packets from different VLANs are transmitted separately. Hosts in one VLAN cannot directly communicate with hosts in another VLAN.
Improves network robustness. A fault in one VLAN does not affect hosts in other VLANs.
Flexibly sets up virtual groups. With VLAN technology, hosts in different geographical areas can be grouped together, which facilitates network construction and maintenance.
Topology description
S1 and S2 are located in different positions. Each switch connects to two computers, and the computers belong to two different VLANs. The dashed box indicates a VLAN. By default, PCs in VLAN 2 cannot communicate with PCs in VLAN 3. That is, broadcast packets are limited to a VLAN.
IEEE 802.1Q IEEE 802.1Q is an Ethernet networking standard for a specified Ethernet frame format. It adds the 4-byte 802.1Q Tag field between the Source address and the Length/Type fields of the original frame. Subfields in the 802.1q Tag field: TPID: is short for Tag Protocol Identifier and indicates the frame type, which has 2 bytes. The value 0x8100 indicates an 802.1Q-tagged frame. An 802.1Q-incapable device discards the received 802.1Q frame. PRI: is short for priority and indicates the frame priority, which has 3 bits. The value ranges from 0 to 7. The greater the value, the higher the priority. When QoS is deployed on a switch, the switch first sends data frames with higher priority. CFI: is short for Canonical Format Indicator and indicates whether the MAC address is in canonical format. The value 0 indicates the MAC address in canonical format and the value 1 indicates the MAC address in non-canonical format. CFI is used to differentiate Ethernet frames, Fiber Distributed Digital Interface (FDDI) frames, and token ring network frames. The value is 0 on the Ethernet. VID: is short for VLAN ID and indicates the VLAN to which a frame belongs, which has 12 bits.
Each frame sent by an 802.1Q-capable switch can carry a VLAN ID. In a VLAN, Ethernet frames are classified into the following types:
Tagged frame: a frame with the 4-byte 802.1Q tag
Untagged frame: a frame without the 4-byte 802.1Q tag
The following link types are available: Access link: Usually connects a host to a switch. Generally, a host does not need to know which VLAN it belongs to, and host hardware cannot distinguish frames with VLAN tags. Hosts therefore send and receive only untagged frames along access links. Trunk link: Usually connects a switch to another switch or a router. Data of different VLANs is transmitted along a trunk link. The two ends of a trunk link must be able to distinguish frames using VLAN tags, and so only tagged frames are transmitted along trunk links. Topology description A host does not need to know the VLAN to which it belongs. It sends only untagged frames. After receiving an untagged frame from a host, a switching device determines the VLAN to which the frame belongs based on the configured VLAN assignment method such as interface information. The switching device then processes the frame accordingly. If a frame needs to be forwarded to another switching device, the frame must be transparently transmitted along a trunk link. Frames transmitted along trunk links must carry VLAN tags to allow other switching devices to properly forward the frame based on the VLAN information.
After a switching device determines the outbound interface of a frame and before the switching device sends the frame to the destination host, the switching device connected to the destination host removes the VLAN tag from the frame to ensure that the host receives an untagged frame.
Interface types An access interface on a switch connects to an interface on a host. It can only connect to access links. • The access interface allows only the VLAN whose ID is the same as the Port Default VLAN ID (PVID). • If the access interface receives untagged frames from the remote device, the switch adds the PVID to the untagged frames. • Ethernet frames sent by the access interface are always untagged frames. A trunk interface on a switch connects to another switch. It can only connect to trunk links. • The trunk interface allows frames from multiple VLANs to pass through. • If the tag in the frame sent by the trunk interface is the same as the PVID, the switch removes the tag from the frame. The trunk interface sends untagged frames in this situation only. • If the tag in the frame sent by the trunk interface is different from the PVID, the switch directly sends the frame. A hybrid interface on a switch can connect to either a host or another switch. It can connect to either access or trunk links. • The hybrid interface allows frames from multiple VLANs to pass through and removes tags from frames on the outbound interface.
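The three interface types above can be sketched with a minimal configuration (interface numbers and VLAN IDs are examples):

```
# Access interface: untagged toward the host, PVID 2.
interface GigabitEthernet0/0/1
 port link-type access
 port default vlan 2
# Trunk interface: carries tagged frames of VLANs 2 and 3.
interface GigabitEthernet0/0/2
 port link-type trunk
 port trunk allow-pass vlan 2 3
# Hybrid interface: VLAN 2 sent untagged, VLAN 3 sent tagged.
interface GigabitEthernet0/0/3
 port link-type hybrid
 port hybrid untagged vlan 2
 port hybrid tagged vlan 3
```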
Interface-based VLAN assignment VLANs are assigned based on interface numbers. The network administrator configures a PVID for each switch interface, that is, an interface belongs to a VLAN by default. • When an untagged data frame reaches a switch interface that has the PVID configured, the PVID is added to the frame. • When a data frame carries a VLAN tag, the switch does not add a VLAN tag to the data frame even if the interface is configured with a PVID. Different types of interfaces process VLAN frames in different manners. MAC address-based VLAN assignment VLANs are assigned based on MAC addresses. The network administrator needs to configure the mappings between MAC addresses and VLAN IDs. When the switch receives an untagged frame, it searches for the VLAN entry matching the source MAC address of the frame and adds the VLAN ID to the frame. IP subnet-based VLAN assignment When receiving an untagged frame, the switch adds a VLAN tag to the packet based on the source IP address of the packet.
Protocol-based VLAN assignment
VLAN IDs are allocated to packets received on an interface according to the protocol (suite) type and encapsulation format of the packets. The network administrator needs to configure the mappings between protocol types and VLAN IDs. When the switch receives an untagged frame, it searches the protocol-VLAN mapping table for the VLAN tag mapping the protocol of the frame and adds the tag to the frame. The protocols supporting VLAN assignment include IPv4, IPv6, IPX, and AppleTalk (AT); the encapsulation formats include Ethernet II, 802.3 raw, 802.2 LLC, and 802.2 SNAP.
Policy-based VLAN assignment
Terminals' MAC addresses and IP addresses need to be configured and associated with VLANs on the switch. Only terminals matching the conditions can be added to a specified VLAN. After such terminals are added to the VLAN, changes to their IP addresses or MAC addresses may cause them to be removed from the VLAN.
Topology description
To implement intra-VLAN communication in VLAN 2 and VLAN 3 through the trunk link between S1 and S2, add Port 2 on S1 and Port 1 on S2 to VLAN 2 and VLAN 3. PC1 sends a frame to PC2 as follows:
• The frame is first sent to Port 4 on S1.
• Port 4 adds a tag to the frame. The VID field of the tag is 2, that is, the ID of the VLAN to which Port 4 belongs.
• S1 sends the frame to all interfaces in VLAN 2 except Port 4 (assuming the MAC address table is empty).
• Port 2 forwards the frame to S2.
• After receiving the frame, S2 determines that the frame belongs to VLAN 2 based on the tag. S2 sends the frame to all interfaces in VLAN 2 except Port 1.
• Port 3 sends the frame to PC2.
Topology description R1 is a Layer 3 switch supporting sub-interfaces, and S1 is a Layer 2 switching device. LANs are connected using the switched Ethernet interface on S1 and the routed Ethernet interface on R1. To implement inter-VLAN communication, perform the following operations: • Create two sub-interfaces on the Ethernet interfaces connecting R1 and S1, and configure 802.1Q encapsulation on sub-interfaces corresponding to VLAN 2 and VLAN 3. • Configure IP addresses for sub-interfaces to ensure the two sub-interfaces have reachable routes. • Configure Ethernet interfaces connecting S1 and R1 as trunk or hybrid interfaces and configure them to allow frames from VLAN 2 and VLAN 3 to pass through. • Configure the default gateway address as the IP address of the sub-interface mapping the VLAN to which the host belongs. PC1 communicates with PC2 as follows: • PC1 checks the IP address of PC2 and determines that PC2 is in another VLAN. • PC1 sends an ARP Request packet to R1 to request R1's MAC address.
• After receiving the ARP Request packet, R1 returns an ARP Reply packet in which the source MAC address is the MAC address of the sub-interface mapping VLAN 2.
• PC1 obtains R1's MAC address.
• PC1 sends a packet to R1 in which the destination MAC address is the MAC address of the sub-interface and the destination IP address is PC2's IP address.
• After receiving the packet, R1 forwards it and detects that the route to PC2 is a direct route, so the packet is forwarded by the sub-interface mapping VLAN 3.
• R1, as the gateway in VLAN 3, broadcasts an ARP Request packet requesting PC2's MAC address.
• After receiving the ARP Request packet, PC2 returns an ARP Reply packet.
• After receiving the ARP Reply packet, R1 sends the packet from PC1 to PC2. All packets sent from PC1 to PC2 are sent to R1 first for Layer 3 forwarding.
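The sub-interface setup described above can be sketched on R1 as follows (interface numbers and addresses are hypothetical):

```
interface GigabitEthernet0/0/1.2
 # Terminate frames tagged with VLAN 2 and act as the VLAN 2 gateway.
 dot1q termination vid 2
 ip address 192.168.2.1 255.255.255.0
 arp broadcast enable
interface GigabitEthernet0/0/1.3
 dot1q termination vid 3
 ip address 192.168.3.1 255.255.255.0
 arp broadcast enable
```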
A routing table must have correct routing entries so that new data flows can be correctly forwarded. You can deploy VLANIF interfaces and routing protocols on Layer 3 switches to implement Layer 3 connectivity.
Topology description VLAN 2 and VLAN 3 are assigned. To implement interVLAN communication, perform the following operations: • Create two VLANIF interfaces on S1 and configure IP addresses for them to ensure the two VLANIF interfaces have reachable routes. • Configure the default gateway address as the IP address of the VLANIF interface mapping the VLAN to which the user host belongs. PC1 communicates with PC2 as follows: • PC1 checks the IP address of PC2 and determines that PC2 is in another VLAN. • PC1 sends an ARP Request packet to S1 to request S1's MAC address. • After receiving the ARP Request packet, S1 returns an ARP Reply packet in which the source MAC address is the MAC address of VLANIF 2. • PC1 obtains S1's MAC address.
• PC1 sends a packet to S1 in which the destination MAC address is the MAC address of the VLANIF interface and the destination IP address is PC2's IP address.
• After receiving the packet, S1 forwards it and detects that the route to PC2 is a direct route, so the packet is forwarded by VLANIF 3.
• S1, as the gateway in VLAN 3, broadcasts an ARP Request packet requesting PC2's MAC address.
• After receiving the ARP Request packet, PC2 returns an ARP Reply packet.
• After receiving the ARP Reply packet, S1 sends the packet from PC1 to PC2. All packets sent from PC1 to PC2 are sent to S1 first for Layer 3 forwarding.
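The VLANIF-based inter-VLAN routing described above can be sketched on S1 as follows (VLAN IDs and addresses are examples):

```
vlan batch 2 3
# Each VLANIF interface acts as the gateway for its VLAN.
interface Vlanif 2
 ip address 192.168.2.1 255.255.255.0
interface Vlanif 3
 ip address 192.168.3.1 255.255.255.0
```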
VLAN aggregation, also known as super-VLAN, partitions a broadcast domain using multiple VLANs on a physical network so that different VLANs can belong to the same subnet.
Super-VLAN: a set of multiple sub-VLANs. In a super-VLAN, only a Layer 3 interface is created, and no physical interface exists.
Sub-VLAN: used to isolate broadcast domains. In a sub-VLAN, only physical interfaces exist, and a Layer 3 VLAN interface cannot be created.
The super-VLAN is used to implement Layer 3 switching. A super-VLAN can contain one or more sub-VLANs. IP addresses of hosts in the sub-VLANs of a super-VLAN belong to the subnet of the super-VLAN.
Topology description
The super-VLAN (VLAN 10) contains the sub-VLANs (VLAN 2 and VLAN 3). Proxy ARP between sub-VLANs is enabled on S1. The communication process is as follows:
• After comparing PC2's IP address (1.1.1.20) with its own IP address, PC1 finds that both IP addresses are on the same network segment. The ARP table of PC1, however, has no entry for PC2.
• PC1 broadcasts an ARP Request packet to request PC2's MAC address.
• PC2 is not in VLAN 2, so PC2 cannot receive the ARP Request packet.
• The gateway is enabled with proxy ARP between sub-VLANs. After receiving the ARP Request packet from PC1, the gateway finds that PC2's IP address (1.1.1.20) is on the network segment of a directly connected interface. The gateway then broadcasts an ARP Request packet to all the other sub-VLAN interfaces to request PC2's MAC address.
• After receiving the ARP Request packet, PC2 sends an ARP Reply packet.
• After receiving the ARP Reply packet from PC2, the gateway replies to PC1 with its own MAC address.
• The ARP tables of both S1 and PC1 then have corresponding entries for PC2.
To send packets to PC2, PC1 first sends packets to the gateway, and then the gateway performs Layer 3 forwarding.
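The super-VLAN deployment described above can be sketched on S1 as follows (the VLAN IDs and IP address are examples):

```
vlan batch 2 3
vlan 10
 # VLAN 10 is the super-VLAN; VLANs 2 and 3 are its sub-VLANs.
 aggregate-vlan
 access-vlan 2 to 3
interface Vlanif 10
 ip address 1.1.1.1 255.255.255.0
 # Allow hosts in different sub-VLANs to communicate at Layer 3.
 arp-proxy inter-sub-vlan-proxy enable
```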
Topology description The frame that enters S1 through Port 1 on PC1 is tagged with the ID of VLAN 2. The VLAN ID, however, is not changed to the ID of VLAN 10 on S1 even if VLAN 2 is the sub-VLAN of VLAN 10. After passing through Port 3, which is a trunk interface, this frame still carries the ID of VLAN 2. S1 discards the frames of VLAN 10 that are sent to S1 by other devices because S1 has no physical interface corresponding to VLAN 10. A super-VLAN has no physical interface: If you configure a super-VLAN and then a trunk interface, the frames of a super-VLAN are filtered automatically according to the VLAN range configured on the trunk interface. If you first configure a trunk interface and configure the trunk interface to allow all VLANs to pass through, you cannot configure the super-VLAN on the device. The root cause is that any VLAN with physical interfaces cannot be configured as the super-VLAN. The trunk interface allows frames from all VLANs to pass through, so no VLAN can be configured as a super-VLAN. On S1, only VLAN 2 and VLAN 3 are valid, and all frames are forwarded in these VLANs.
Topology description
S2 is configured with super-VLAN 4, sub-VLAN 2, sub-VLAN 3, and common VLAN 10. S1 is configured with two common VLANs, namely, VLAN 10 and VLAN 20. S2 is configured with the route to the network segment 1.1.3.0/24, and S1 is configured with the route to the network segment 1.1.1.0/24. PC1 in sub-VLAN 2 of super-VLAN 4 needs to communicate with PC3, which is connected to S1.
• After comparing PC3's IP address (1.1.3.2) with its own IP address, PC1 finds that the two IP addresses are on different network segments.
• PC1 broadcasts an ARP Request packet to its gateway (S2) to request S2's MAC address.
• After receiving the ARP Request packet, S2 checks the mapping between the sub-VLAN and the super-VLAN, and sends an ARP Reply packet to PC1 through sub-VLAN 2. The source MAC address in the ARP Reply packet is the MAC address of VLANIF 4 corresponding to super-VLAN 4.
• PC1 learns S2's MAC address.
• PC1 sends a data packet to S2. The packet carries the destination MAC address of VLANIF 4 corresponding to super-VLAN 4 and the destination IP address of 1.1.3.2.
• After receiving the packet, S2 performs Layer 3 forwarding and sends the packet to S1, with the next hop address of 1.1.2.2 and the outbound interface of VLANIF 10.
• After receiving the packet, S1 performs Layer 3 forwarding and sends the packet to PC3 through the directly connected interface VLANIF 20.
• The reply packet from PC3 reaches S2 after Layer 3 forwarding on S1.
• After receiving the reply packet, S2 performs Layer 3 forwarding and sends the packet to PC1 through the super-VLAN.
The MUX VLAN is classified into the principal VLAN and the subordinate VLAN. The subordinate VLAN is further classified into the separate VLAN and the group VLAN. Principal VLAN: A principal interface can communicate with all interfaces in the MUX VLAN. Subordinate VLAN • Separate VLAN: A separate interface can communicate only with a principal interface and is isolated from all other types of interfaces. A separate VLAN must be bound to a principal VLAN. • Group VLAN: A group interface can communicate with a principal interface and the other interfaces in the same group VLAN, but cannot communicate with interfaces in other group VLANs or with separate interfaces. A group VLAN must be bound to a principal VLAN.
Topology description The principal interface connects to the enterprise server; separate interfaces connect to enterprise customers; group interfaces connect to enterprise employees. In this manner, enterprise customers and enterprise employees can access the enterprise server, enterprise employees can communicate with each other, enterprise customers cannot communicate with each other, and enterprise customers and enterprise employees cannot communicate with each other.
Case description To meet requirement 2, configure VLAN 2 and VLAN 3 to be permitted by the trunk link.
Command usage The port link-type command sets the link type of an interface. The port trunk allow-pass vlan command adds a trunk interface to VLANs. The port hybrid untagged vlan command adds a hybrid interface to VLANs. Frames of these VLANs then pass through the hybrid interface in untagged mode. View Interface view Parameters port link-type { access | dot1q-tunnel | hybrid | trunk } access: configures the link type of an interface as access. dot1q-tunnel: configures the link type of an interface as QinQ. hybrid: configures the link type of an interface as hybrid. trunk: configures the link type of an interface as trunk. Precautions Before changing the link type of an interface, you need to delete the VLAN configuration of the interface. That is, the interface can belong only to VLAN 1. If a specified VLAN does not exist, the port trunk allow-pass vlan command does not take effect. The port trunk allow-pass vlan command cannot be used on a member interface of an Eth-Trunk.
A hybrid interface can connect to either a user host or a switch. When a hybrid interface is connected to a user host, it must be added to VLANs in untagged mode because user hosts cannot process tagged frames. The port hybrid untagged vlan command is invalid on a member interface of an Eth-Trunk. A super-VLAN cannot be specified in the port hybrid untagged vlan command.
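The trunk and hybrid commands above can be combined as follows. This is a minimal sketch: the interface numbers and VLAN IDs are illustrative, and the port hybrid pvid vlan command (not described above) is assumed here to map untagged frames from the host into VLAN 2.

```
# Trunk link between switches: frames of VLANs 2 and 3 pass tagged
[S1] interface GigabitEthernet0/0/3
[S1-GigabitEthernet0/0/3] port link-type trunk
[S1-GigabitEthernet0/0/3] port trunk allow-pass vlan 2 3
[S1-GigabitEthernet0/0/3] quit
# Hybrid interface toward a user host: frames of VLAN 2 leave untagged
[S1] interface GigabitEthernet0/0/1
[S1-GigabitEthernet0/0/1] port link-type hybrid
[S1-GigabitEthernet0/0/1] port hybrid pvid vlan 2
[S1-GigabitEthernet0/0/1] port hybrid untagged vlan 2
```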
Case description The topology is similar to that in slide 22. The difference is that the MAC addresses of the PCs are identified. Assign VLANs based on these MAC addresses to meet the requirement. Before configuring MAC address-based VLAN assignment, ensure that the link type of the Layer 2 interface is hybrid.
Command usage The mac-vlan mac-address command associates a MAC address with a VLAN. The mac-vlan enable command enables MAC address-based VLAN assignment on an interface. Precautions After a MAC address is associated with a VLAN, it cannot be associated with other VLANs. If MAC address-based VLAN assignment is enabled on an interface: • When receiving an untagged packet, the interface searches for the VLAN entry matching the source MAC address of the packet. If a matching entry is found, the interface forwards the packet based on the VLAN ID in the entry. If no matching entry is found, the interface uses other matching rules to forward the packet. • When receiving a tagged packet, the interface forwards the packet based on the interface-based VLAN assignment configuration. MAC address-based VLAN assignment can be configured only on hybrid interfaces.
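As a sketch, the two commands work together like this (the MAC address, VLAN ID, and interface number are illustrative):

```
[S1] vlan 10
[S1-vlan10] mac-vlan mac-address 0022-0033-0044
[S1-vlan10] quit
[S1] interface GigabitEthernet0/0/1
[S1-GigabitEthernet0/0/1] port link-type hybrid
[S1-GigabitEthernet0/0/1] port hybrid untagged vlan 10
[S1-GigabitEthernet0/0/1] mac-vlan enable
```

An untagged frame with source MAC address 0022-0033-0044 arriving on this interface is then assigned to VLAN 10.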
Case description The topology is similar to that in slide 22. Before configuring IP subnet-based VLAN assignment, ensure that the link type of the Layer 2 interface is hybrid.
Command usage The ip-subnet-vlan command associates an IP subnet with a VLAN. The ip-subnet-vlan enable command enables IP subnet-based VLAN assignment on an interface. Precautions The IP address or network segment that the ip-subnet-vlan command associates with a VLAN cannot be a multicast address or multicast network segment. IP subnet-based VLAN assignment can be configured only on hybrid interfaces.
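A minimal sketch, assuming the subnet 192.168.1.0/24 maps to VLAN 10 (all values are illustrative):

```
[S1] vlan 10
[S1-vlan10] ip-subnet-vlan 1 ip 192.168.1.0 24
[S1-vlan10] quit
[S1] interface GigabitEthernet0/0/1
[S1-GigabitEthernet0/0/1] port link-type hybrid
[S1-GigabitEthernet0/0/1] port hybrid untagged vlan 10
[S1-GigabitEthernet0/0/1] ip-subnet-vlan enable
```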
Case description Protocol-based assignment can be configured only on hybrid interfaces.
Command usage The protocol-vlan command associates a protocol with a VLAN. The protocol-vlan vlan command associates an interface with a protocol-based VLAN. Precautions Protocol-based VLAN assignment can be configured only on hybrid interfaces. When protocol-based VLAN assignment is used on an interface, the switch needs to parse the protocol type in each received packet and map the protocol type to a VLAN.
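A minimal sketch, assuming IPv4 packets are to be assigned to VLAN 10; the exact parameters of the interface-view command vary with the software version, so this is illustrative only:

```
[S1] vlan 10
[S1-vlan10] protocol-vlan ipv4
[S1-vlan10] quit
[S1] interface GigabitEthernet0/0/1
[S1-GigabitEthernet0/0/1] port link-type hybrid
[S1-GigabitEthernet0/0/1] port hybrid untagged vlan 10
[S1-GigabitEthernet0/0/1] protocol-vlan vlan 10
```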
Case description You can use the VLANIF interface or sub-interface to implement communication between VLANs.
Command usage The interface vlanif command creates a VLANIF interface and displays the VLANIF interface view. The dot1q termination vid command configures the single VLAN ID of dot1q encapsulation on a sub-interface. The arp broadcast enable command enables ARP broadcast on a sub-interface. Precautions Before running the interface vlanif command, you must run the vlan command to create a VLAN specified by vlan-id.
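The two approaches can be sketched as follows (addresses and interface numbers are illustrative):

```
# Option 1: inter-VLAN routing with VLANIF interfaces on a switch
[S1] vlan batch 2 3
[S1] interface Vlanif 2
[S1-Vlanif2] ip address 10.1.2.1 24
[S1-Vlanif2] quit
[S1] interface Vlanif 3
[S1-Vlanif3] ip address 10.1.3.1 24

# Option 2: dot1q termination sub-interface on a router
[R1] interface GigabitEthernet0/0/1.2
[R1-GigabitEthernet0/0/1.2] dot1q termination vid 2
[R1-GigabitEthernet0/0/1.2] ip address 10.1.2.1 24
[R1-GigabitEthernet0/0/1.2] arp broadcast enable
```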
Case description Configure VLAN aggregation to meet the requirements.
Command usage The aggregate-vlan command configures a VLAN as a super-VLAN. The access-vlan command adds one or more sub-VLANs to a super-VLAN. Precautions VLAN 1 cannot be configured as a super-VLAN. The super-VLAN must be different from all its sub-VLANs. A VLAN can be added to only one super-VLAN.
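A minimal VLAN aggregation sketch, assuming super-VLAN 4 with sub-VLANs 2 and 3 (the IP address is illustrative):

```
[S1] vlan batch 2 3 4
[S1] vlan 4
[S1-vlan4] aggregate-vlan
[S1-vlan4] access-vlan 2 to 3
[S1-vlan4] quit
# The VLANIF interface of the super-VLAN serves as the hosts' gateway
[S1] interface Vlanif 4
[S1-Vlanif4] ip address 10.1.1.1 24
```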
Case description Configure the MUX VLAN to meet the requirements.
Command usage The mux-vlan command configures a VLAN as a principal VLAN. The subordinate group command configures subordinate group VLANs for a principal VLAN. The subordinate separate command configures a subordinate separate VLAN for a principal VLAN. Precautions for the principal VLAN A super-VLAN, sub-VLAN, or subordinate VLAN cannot be configured as a principal VLAN. A VLAN where a VLANIF interface has been created cannot be configured as a principal VLAN. Precautions for the subordinate group VLAN Before configuring a subordinate group VLAN, you must configure a principal VLAN and enter the principal VLAN view. The VLAN to be configured as a subordinate group VLAN must have been created. The VLAN to be configured as a subordinate group VLAN cannot have a VLANIF interface configured or be configured as a super-VLAN. Before running the undo subordinate group command to delete a subordinate group VLAN to which interfaces have been added, delete the interfaces from the subordinate group VLAN. A subordinate group VLAN must be different from the principal VLAN.
A subordinate group VLAN must be different from a subordinate separate VLAN. Precautions for the subordinate separate VLAN Before configuring a subordinate separate VLAN, you must configure a principal VLAN and enter the principal VLAN view. The VLAN to be configured as a subordinate separate VLAN must have been created. The VLAN to be configured as a subordinate separate VLAN cannot have a VLANIF interface configured or be configured as a super-VLAN. Before running the undo subordinate separate command to delete a subordinate separate VLAN to which interfaces have been added, delete the interfaces from the subordinate separate VLAN. A subordinate separate VLAN must be different from the principal VLAN. A subordinate separate VLAN must be different from a subordinate group VLAN.
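A minimal MUX VLAN sketch, assuming principal VLAN 2, separate VLAN 3, and group VLAN 4 (all illustrative); the port mux-vlan enable command, an assumption not described above, activates the MUX VLAN function on each access interface:

```
[S1] vlan batch 2 3 4
[S1] vlan 2
[S1-vlan2] mux-vlan
[S1-vlan2] subordinate separate 3
[S1-vlan2] subordinate group 4
[S1-vlan2] quit
# Repeat for each access interface
[S1] interface GigabitEthernet0/0/1
[S1-GigabitEthernet0/0/1] port link-type access
[S1-GigabitEthernet0/0/1] port default vlan 2
[S1-GigabitEthernet0/0/1] port mux-vlan enable
```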
Check whether MAC address entries on the switch are correct. Run the display mac-address command on the switch to check whether the MAC addresses, interfaces, and VLANs in the learned MAC address entries are correct. If the learned MAC address entries are incorrect, run the undo mac-address mac-address vlan vlan-id command on the interface to delete the existing entries so that the switch can learn MAC address entries again.
Case description To implement communication between VLANs through RIPv2, configure at least two VLANIF interfaces on the switch.
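A sketch of the idea, with two VLANIF interfaces advertised into RIPv2 (addresses are illustrative):

```
[S1] interface Vlanif 2
[S1-Vlanif2] ip address 10.1.2.1 24
[S1-Vlanif2] quit
[S1] interface Vlanif 3
[S1-Vlanif3] ip address 10.1.3.1 24
[S1-Vlanif3] quit
[S1] rip 1
[S1-rip-1] version 2
[S1-rip-1] network 10.0.0.0
```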
Result Perform the ping operation. PCs in VLAN 2 and VLAN 3 can communicate with each other.
Proxy ARP Routed proxy ARP: Routed proxy ARP enables network devices on the same network segment but on different physical networks to communicate. Intra-VLAN proxy ARP: If two hosts belong to the same VLAN where user isolation is configured, enable intra-VLAN proxy ARP on an interface associated with the VLAN to allow the hosts to communicate. Inter-VLAN proxy ARP: If two hosts belong to different VLANs, enable inter-VLAN proxy ARP on interfaces associated with the VLANs to implement Layer 3 communication between the two hosts. Topology Description Routed proxy ARP • The IP addresses of PC1 and PC2 are on the same network segment. When PC1 needs to communicate with PC2, PC1 broadcasts an ARP Request packet, requesting the MAC address of PC2. However, PC1 and PC2 are on different physical networks (in different broadcast domains). PC2 therefore cannot receive the ARP Request packet sent from PC1 and does not respond with an ARP Reply packet. To solve this problem, enable proxy ARP on S1.
After receiving the ARP Request packet, S1 searches for a routing entry corresponding to PC2. If the routing entry corresponding to PC2 exists, S1 responds to the ARP Request packet with its own MAC address. PC1 forwards data based on the MAC address of S1. S1 functions as the proxy of PC2. Intra-VLAN proxy ARP • PC1 cannot communicate with PC2 in the same VLAN because interface isolation is configured on the interface of S1 connected to PC1 and PC2. To solve this problem, enable intra-VLAN proxy ARP on the interfaces of S1. After S1's interface connected to PC1 receives an ARP Request packet destined for another address, S1 does not discard the packet but searches for the ARP entry corresponding to PC2. If the ARP entry corresponding to PC2 exists, S1 sends its MAC address to PC1 and forwards packets sent from PC1 to PC2. S1 functions as the proxy of PC2. Inter-VLAN proxy ARP • This function is used in VLAN aggregation. Refer to the VLAN documentation.
Gratuitous ARP provides the following functions: Checks for duplicate IP addresses: Normally, a host does not receive an ARP Reply packet after sending an ARP Request packet with the destination address as its own IP address. If the host receives an ARP Reply packet, another host has the same IP address. Advertises a new MAC address: If the MAC address of a host changes because its network adapter is replaced, the host sends a gratuitous ARP packet to notify all hosts of the change before the ARP entries age out. Notifies of an active/standby switchover in a VRRP group: After an active/standby switchover is performed, the new master sends a gratuitous ARP packet in the VRRP group to notify other devices of the switchover.
After the system is reset or an interface card is hot swapped or reset, dynamic entries are lost, but static and blackhole entries are not.
Secure MAC addresses are classified into the following types: • Secure dynamic MAC address: is learned on an interface where port security is enabled but the sticky MAC function is disabled. After port security is enabled on an interface, dynamic MAC address entries that have been learned on the interface are deleted and MAC address entries learned subsequently turn into secure dynamic MAC address entries. Secure dynamic MAC addresses will not be aged out by default. After the switch restarts, secure dynamic MAC addresses are lost and need to be learned again. • Sticky MAC address: is learned on an interface where both port security and the sticky MAC function are enabled. Sticky MAC addresses will not be aged out. After you save the configuration and restart the switch, sticky MAC addresses still exist.
MAC address anti-flapping Increasing the MAC address learning priority of an interface: When the same MAC address entry is learned by interfaces with different priorities, the entry learned by the interface with the highest priority overwrites the entries learned by other interfaces. Preventing MAC address overwriting on interfaces with the same priority: If the priority of the interface connected to a bogus device is the same as that of the interface connected to the authorized device, the MAC address of the bogus device learned later does not overwrite the correct MAC address. However, if the authorized device powers off, the MAC address of the bogus device is learned; after the authorized device powers on again, the switch cannot learn the correct MAC address. Topology description You can set a high MAC address learning priority on Port1 to prevent PC3 from using the MAC address of PC1 to attack the switch.
Topology description No loop prevention protocol is used on the switching network. If S2 and S4 are incorrectly connected with a network cable, a loop occurs among S2, S3, and S4. When a broadcast packet is sent, the packet is forwarded to S3 and received by Port1 on S1. When MAC address flapping detection is configured on Port1, S1 detects that the source MAC address of the broadcast packet flaps between interfaces. If the MAC address flaps between interfaces frequently, S1 considers that MAC address flapping has occurred. The interface on S1 can then enter the error-down state or be removed from the VLAN. MAC address flapping detection The action of removing an interface from the VLAN where MAC address flapping occurs cannot be used together with other dynamic VLAN technologies.
Link aggregation has the following advantages: Increased bandwidth: The bandwidth of the link aggregation interface is the sum of the bandwidth of its member interfaces. Higher reliability: When the physical link of a member interface fails, traffic is switched to another available member link, improving reliability of the link aggregation interface. Load balancing: In a Link Aggregation Group (LAG), traffic is load balanced among active member interfaces. Basic concepts of Ethernet link aggregation Eth-Trunk: A LAG bundles multiple Ethernet links into one logical link, which is called an Eth-Trunk. Member interfaces and member links: The interfaces that constitute an Eth-Trunk are member interfaces. The link corresponding to a member interface is a member link. Active and inactive interfaces and links: • Member interfaces are classified into active interfaces that forward data and inactive interfaces that do not forward data. • Links connected to active interfaces are called active links, and links connected to inactive interfaces are called inactive links.
Upper threshold for the number of active interfaces: This setting guarantees higher network reliability. When the number of active member interfaces reaches the upper threshold, additional member interfaces are set to Down and used as backup links. Lower threshold for the number of active interfaces: This setting ensures the minimum bandwidth of an Eth-Trunk. When the number of active interfaces falls below this threshold, the Eth-Trunk goes Down.
Forwarding principle An Eth-Trunk interface is assumed to be a physical interface at the MAC sub-layer. Therefore, frames transmitted at the MAC sub-layer only need to be delivered to the Eth-Trunk module.
Eth-Trunk forwarding entries: HASH-KEY value: is calculated through the hash algorithm on the MAC address or IP address in the packet. Interface number: Eth-Trunk forwarding entries are relevant to the number of member interfaces in an Eth-Trunk. Different HASH-KEY values are mapped to different outbound interfaces. Figure description For example, if three physical interfaces, 1, 2, and 3, are bundled into an Eth-Trunk, the Eth-Trunk forwarding table contains eight entries, as shown in the preceding figure. In the Eth-Trunk forwarding table, the HASH-KEY values are 0, 1, 2, 3, 4, 5, 6, and 7, and the corresponding interface numbers are 1, 2, 3, 1, 2, 3, 1, and 2.
Forwarding process The Eth-Trunk module receives a frame from the MAC sublayer, and then extracts its source MAC address/IP address or destination MAC address/IP address according to the load balancing mode. The Eth-Trunk module calculates the HASH-KEY value using the hash algorithm. Based on the HASH-KEY value, the Eth-Trunk module searches the Eth-Trunk forwarding table for the interface number, and then sends the frame from the corresponding interface.
Mis-sequencing in common load balancing mode Because an Eth-Trunk contains multiple physical links between two devices, the first data frame of a data flow may be transmitted on one physical link and the second data frame on another physical link. In this case, the second data frame may arrive at the peer device earlier than the first, causing packet mis-sequencing. Eth-Trunk load balancing The Eth-Trunk uses a load balancing mechanism. This mechanism applies the hash algorithm to the addresses in a data frame to generate a HASH-KEY value, and the system then searches the Eth-Trunk forwarding table for the outbound interface based on this value. Each MAC or IP address corresponds to a HASH-KEY value, so the system uses different outbound interfaces to forward different flows. This mechanism ensures that frames of the same data flow are forwarded on the same physical link, implementing flow-based load balancing. Flow-based load balancing preserves the sequence of data transmission, but cannot guarantee full use of the link bandwidth.
Manual load balancing mode If an active link fails, the other active links load balance the traffic evenly. If a high link bandwidth between two directly connected devices is required but the device does not support the LACP protocol, you can use the manual load balancing mode. LACP mode LACP uses a standard negotiation mechanism for switching devices. LACP enables switching devices to automatically create and enable aggregated links based on their configurations. After aggregated links are created, LACP maintains the link status. If an aggregated link's status changes, LACP automatically adjusts or disables the link aggregation.
LACP concepts LACP system priority: The LACP system priority (default value of 32768) is used to differentiate priorities of devices at both ends of an Eth-Trunk. In LACP mode, active interfaces selected by both devices must be consistent; otherwise, the LAG cannot be established. To keep active interfaces consistent at both ends, set a higher priority for one end.
In this manner, the other end selects active member interfaces based on the selection of the peer. The smaller the LACP system priority value, the higher the LACP system priority. When the LACP system priorities are the same, the device with the smaller MAC address functions as the Actor. LACP interface priority: The LACP interface priority (default value of 32768) is used to determine whether a member interface can be selected as an active interface. The smaller the LACP interface priority value, the higher the LACP interface priority. In LACP mode, LACP determines the active and inactive links in an LAG. This mode is also called the M:N mode, where M refers to the number of active links and N refers to the number of backup links. This mode guarantees high reliability and allows load balancing to be carried out across the M active links.
LACP implementation After member interfaces are added to an Eth-Trunk in LACP mode, each end sends LACPDUs to inform its peer of its system priority, MAC address, interface priority, interface number, and keys. After being informed, the peer compares this information with that saved on itself, and selects the interfaces to aggregate. Both ends then determine the active interfaces and links. Negotiation process Devices at both ends send LACPDUs to each other. • Create an Eth-Trunk in LACP mode on S1 and S2 and add member interfaces to the Eth-Trunk. The member interfaces are then enabled with LACP, and devices at both ends send LACPDUs to each other. Determine the Actor and active links. • When S2 receives LACPDUs from S1, S2 checks and records information about S1 and compares system priorities. If the system priority of S1 is higher than that of S2, S1 acts as the Actor. • After devices at both ends select the Actor, they select active interfaces according to the priorities of the Actor's interfaces.
LACP preemption • E1 becomes faulty and then recovers. When E1 fails, E3 replaces E1 to transmit services. After E1 recovers, if LACP preemption is not enabled on the Eth-Trunk, E1 remains in the backup state. If LACP preemption is enabled on the Eth-Trunk, E1 becomes the active interface and E3 becomes the backup interface because E1 has a higher priority than E3. LACP preemption delay • When LACP preemption occurs, the backup link waits for a given period of time before switching to the active state.
GVRP GVRP is based on GARP and is used to maintain VLAN attributes dynamically on devices. Through GVRP, VLAN attributes of one device can be propagated throughout the entire switching network. GVRP enables network devices to dynamically deliver, register, and propagate VLAN attributes, reducing the workload of network administrators and ensuring correct configuration. GVRP applies to only trunk links. GVRP uses the multicast MAC address of 01-80-C2-00-00-21. Participant On a device running GVRP, each GVRP-enabled port is considered as a GVRP participant. VLAN registration and deregistration GVRP implements automatic registration and deregistration of VLAN attributes. • VLAN registration: adds an interface to a VLAN. • VLAN deregistration: removes an interface from a VLAN. GVRP registers and deregisters VLAN attributes through attribute declarations and reclaim declarations: • When an interface receives a VLAN attribute declaration, it registers the VLAN specified in the declaration.
That is, the interface is added to the VLAN.
• When an interface receives a VLAN attribute reclaim declaration, it deregisters the VLAN specified in the declaration. That is, the interface is removed from the VLAN.
GARP participants exchange attribute information by sending messages. GVRP messages fall into Join, Leave, and LeaveAll messages. Join message: When a GARP participant requires that other devices register its attributes, receives Join messages from other GARP participants, or has attributes configured statically, it sends Join messages. Leave message: A GARP participant sends Leave messages to have its attributes deregistered from other devices. The GARP participant also sends Leave messages when receiving Leave messages from other GARP participants or when attributes are manually deregistered. LeaveAll message: A GARP participant sends LeaveAll messages to deregister all its attributes from all the other GARP participants. LeaveAll messages are used to periodically delete garbage attributes. For example, a garbage attribute is created when a device that has removed an attribute suddenly loses power and fails to send the Leave message that would notify other devices to deregister the attribute.
Join timer To ensure that a Join message is reliably transmitted to other GARP participants, a GARP participant may send the Join message twice. When sending the first Join message, the GARP participant starts the Join timer. If a Join message is received before the Join timer expires, the GARP participant does not send the second Join message. If not, the GARP participant re-sends the Join message. The Join timer is configured on a per-port basis. Hold timer When you configure an attribute on a participant or when the participant receives a request message, the participant does not propagate the message to the other devices immediately. Instead, it collects the request messages received within a period of time and sends them in one GARP PDU. This period of time is specified by the Hold timer. By making full use of the data portion of GARP PDUs to carry multiple messages in one packet, this mechanism reduces the number of transmitted packets and contributes to network stability. The Hold timer value must be no greater than half of the Join timer value.
Leave timer Upon receiving a Leave or LeaveAll message, a GARP participant starts its Leave timer. If it receives no Join message containing the attribute carried in the Leave or LeaveAll message when the Leave timer expires, it deregisters the attribute. The Leave timer value is twice that of the Join timer value. LeaveAll timer Upon startup, a GARP participant starts the LeaveAll timer. When the LeaveAll timer expires, the GARP participant sends out a LeaveAll message, and then restarts the LeaveAll timer to start another cycle. When receiving a LeaveAll message, a GARP participant restarts all timers, including the LeaveAll timer. If LeaveAll timers of multiple devices expire at the same time, multiple LeaveAll messages will be sent at the same time, creating unnecessary traffic. To avoid this problem, the actual LeaveAll timer value of a participant is a random value between the LeaveAll timer value and the LeaveAll timer value multiplied by 1.5. A LeaveAll event is equivalent to deregistering all attributes network wide by sending Leave messages. The LeaveAll timer value must be at least larger than the Leave timer value.
One-way registration of VLAN attributes Manually create static VLAN 2 on S1. In response to this action, GVRP automatically assigns the GVRP-enabled ports on S2 and S3 to VLAN 2 through one-way registration. The process is as follows: • After VLAN 2 is created on S1, E1 on S1 starts the Join timer and Hold timer. When the Hold timer expires, S1 sends the first JoinEmpty message to S2. When the Join timer expires, E1 restarts the Hold timer. When the Hold timer expires again, E1 sends the second JoinEmpty message. • After E2 on S2 receives the first JoinEmpty message, S2 creates dynamic VLAN 2 and adds E2 to VLAN 2. In addition, S2 requests E3 to start the Join timer and Hold timer. When the Hold timer expires, E3 sends the first JoinEmpty message to S3. When the Join timer expires, E3 restarts the Hold timer. When the Hold timer expires again, E3 sends the second JoinEmpty message. After E2 receives the second JoinEmpty message, S2 does not take any action because E2 has been added to VLAN 2.
• After E4 on S3 receives the first JoinEmpty message, S3 creates dynamic VLAN 2 and adds E4 to VLAN 2. After E4 receives the second JoinEmpty message, S3 does not take any action because E4 has been added to VLAN 2.
• Every time the LeaveAll timer expires or a LeaveAll message is received, each device restarts the LeaveAll timer, Join timer, Hold timer, and Leave timer. E1 then repeats step 1 to send JoinEmpty messages. E3 on S2 sends JoinEmpty messages to S3 in the same way.
Two-way registration of VLAN attributes After one-way registration is complete, E1, E2, and E4 are added to VLAN 2 but E3 is not added to VLAN 2 because only interfaces receiving a JoinEmpty or JoinIn message can be added to dynamic VLANs. To transmit traffic of VLAN 2 in both directions, VLAN registration from S3 to S1 is required. The process is as follows: • After one-way registration is complete, static VLAN 2 is created on S3 (the dynamic VLAN is replaced by the static VLAN). E4 on S3 starts the Join timer and Hold timer. When the Hold timer expires, E4 on S3 sends the first JoinIn message (because it has registered VLAN 2) to S2. When the Join timer expires, E4 restarts the Hold timer. When the Hold timer expires, E4 sends the second JoinIn message. • After E3 on S2 receives the first JoinIn message, S2 adds E3 to VLAN 2 and requests E2 to start the Join timer and Hold timer. When the Hold timer expires, E2 sends the first JoinIn message to S1. When the Join timer expires, E2 restarts the Hold timer. When the Hold timer expires again, E2 sends the second JoinIn message. After E3 receives the second JoinIn message, S2 does not take any action because E3 has been added to VLAN 2. • When S1 receives the JoinIn message, it stops sending JoinEmpty messages to S2. Every time the LeaveAll timer expires or a LeaveAll message is received, each device restarts the LeaveAll timer, Join timer, Hold timer, and Leave timer. E1 on S1 sends a JoinIn message to S2 when the Hold timer expires. • S2 sends a JoinIn message to S3. • After receiving the JoinIn message, S3 does not create dynamic VLAN 2 because static VLAN 2 has been created.
One-way deregistration of VLAN attributes When VLAN 2 is not required on devices, the devices can deregister VLAN 2. The process is as follows: • After static VLAN 2 is manually deleted from S1, E1 on S1 starts the Hold timer. When the Hold timer expires, E1 sends a LeaveEmpty message to S2. E1 needs to send only one LeaveEmpty message. • After E2 on S2 receives the LeaveEmpty message, it starts the Leave timer. When the Leave timer expires, E2 deregisters VLAN 2. Then E2 is deleted from VLAN 2, but VLAN 2 is not deleted from S2 because E3 is still in VLAN 2. At this time, S2 requests E3 to start the Hold timer and Leave timer. When the Hold timer expires, E3 sends a LeaveIn message to S3. Static VLAN 2 is not deleted from S3, so E3 can receive the JoinIn message sent from E4 after the Leave timer expires. In this case, S1 and S2 can still learn dynamic VLAN 2. • After S3 receives the LeaveIn message, E4 is not deleted from VLAN 2 because VLAN 2 is a static VLAN on S3. Two-way deregistration of VLAN attributes To delete VLAN 2 from all devices, two-way deregistration is required. The process is as follows:
• After static VLAN 2 is manually deleted from S3, E4 on S3 starts the Hold timer. When the Hold timer expires, E4 sends a LeaveEmpty message to S2.
• After E3 on S2 receives the LeaveEmpty message, it starts the Leave timer. When the Leave timer expires, E3 deregisters VLAN 2. Then E3 is deleted from dynamic VLAN 2, and dynamic VLAN 2 is deleted from S2. At this time, S2 requests E2 to start the Hold timer. When the Hold timer expires, E2 sends a LeaveEmpty message to S1.
• After E1 on S1 receives the LeaveEmpty message, it starts the Leave timer. When the Leave timer expires, E1 deregisters VLAN 2. Then E1 is deleted from dynamic VLAN 2, and dynamic VLAN 2 is deleted from S1.
Manually configured VLANs are called static VLANs and VLANs created using GVRP are called dynamic VLANs.
Case description To enable PC1 and PC2 whose interfaces are isolated in VLAN 2 to communicate with each other, enable intra-VLAN proxy ARP on S1.
Command usage The port-isolate enable command enables port isolation. The arp-proxy inner-sub-vlan-proxy enable command enables intra-VLAN proxy ARP.
View Interface view Parameters port-isolate enable [ group group-id ] group-id: specifies the ID of a port isolation group. The default value is 1. Precautions You can use the display port-isolate command to view the port isolation group configuration.
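Pieced together, the case looks like this (interface numbers and the IP address are illustrative):

```
# Isolate the two access interfaces in VLAN 2
[S1] interface GigabitEthernet0/0/1
[S1-GigabitEthernet0/0/1] port-isolate enable group 1
[S1-GigabitEthernet0/0/1] quit
[S1] interface GigabitEthernet0/0/2
[S1-GigabitEthernet0/0/2] port-isolate enable group 1
[S1-GigabitEthernet0/0/2] quit
# Enable intra-VLAN proxy ARP on the VLANIF interface of VLAN 2
[S1] interface Vlanif 2
[S1-Vlanif2] ip address 10.1.2.1 24
[S1-Vlanif2] arp-proxy inner-sub-vlan-proxy enable
```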
Case description Preemption needs to be enabled to meet requirement 3.
Command usage The mode command configures the working mode of an Eth-Trunk. The eth-trunk command adds an interface to an Eth-Trunk. The load-balance command sets the load balancing mode of an Eth-Trunk. The max active-linknumber command sets the upper threshold for the number of active member links on an Eth-Trunk. The lacp priority command sets the LACP system or interface priority. The lacp preempt enable command enables priority preemption in static LACP mode. Precautions When adding an interface to an Eth-Trunk, pay attention to the following points: • An Eth-Trunk contains a maximum of 8 member interfaces. • A member interface cannot be configured with any service or static MAC address. • The link type of a member interface added to the Eth-Trunk must be hybrid.
• An Eth-Trunk cannot be nested; that is, its member interface cannot be an Eth-Trunk.
• An Ethernet interface can be added to only one Eth-Trunk. To add the Ethernet interface to another Eth-Trunk, delete it from the original Eth-Trunk first.
• Member interfaces of an Eth-Trunk must be of the same type; that is, FE and GE interfaces cannot join the same Eth-Trunk.
• Ethernet interfaces on different LPUs can join the same Eth-Trunk.
• The remote interface directly connected to a local Eth-Trunk member interface must also be bundled into an Eth-Trunk; otherwise, the two ends cannot communicate.
• When member interfaces use different rates, congestion may occur on the low-rate interface, causing packet loss.
• After interfaces are added to an Eth-Trunk, MAC addresses are learned on the Eth-Trunk rather than on the member interfaces.
• When all member interfaces of an Eth-Trunk work in half-duplex mode, the Eth-Trunk cannot negotiate an Up state.
Case description Deploy GVRP to meet requirement 2.
Command usage The gvrp command enables GVRP globally or on an interface. Precautions Before enabling GVRP on an interface, you must set the link type of the interface to trunk. The display gvrp vlan-operation command displays the dynamic VLANs to which an interface is added.
PPP includes three protocols: Link Control Protocol (LCP): is used to establish, monitor, and tear down PPP data links. LCP can automatically detect the link environment, for example, check whether there are loops. It also negotiates link parameters such as the maximum packet length and authentication protocol to be used. Compared with other data link layer protocols, PPP has an important feature: it can provide authentication. The two ends of a link can negotiate the authentication protocol to be used and implement authentication. The ends can be connected only when the authentication succeeds. Due to this feature, PPP is appropriate for carriers to provide access to distributed users. Network Control Protocol (NCP): is used to negotiate the format and type of packets transmitted on data links. For example, IP Control Protocol (IPCP) and Internetwork Packet Exchange Control Protocol (IPXCP) are used to control parameter negotiation of IP and IPX packets respectively. PPP extensions: provide extended functions for PPP. For example, PPP extensions provide the Password Authentication Protocol (PAP) and Challenge Handshake Authentication Protocol (CHAP) to ensure network security.
PPP packet format
Flag field
• The Flag field identifies the start and end of a physical frame and is always 0x7E.
Address field
• The Address field identifies a peer. Two communicating devices connected using PPP do not need to know each other's data link layer address because PPP is used on P2P links. This field must be filled with a broadcast address of all 1s and is of no significance to PPP.
Control field
• The Control field value defaults to 0x03, indicating an unsequenced frame. By default, PPP does not use sequence numbers or acknowledgement mechanisms to ensure transmission reliability.
• The Address and Control fields together identify a PPP packet, so a PPP packet header begins with FF03.
Protocol field
• The Protocol field identifies the datagram encapsulated in the Information field of a PPP data packet.
LCP packet format
Code field
• The Code field is 1 byte long and identifies the LCP packet type.
Identifier field
• The Identifier field is 1 byte long and is used to match request and response packets. If a device receives a packet with an invalid Identifier field, the device discards the packet.
• The sequence number of a Configure-Request packet usually begins with 0x01 and increases by 1 each time a Configure-Request packet is sent. After a receiver receives a Configure-Request packet, it must send a response packet with the same sequence number as that of the received Configure-Request packet.
Length field
• The Length field specifies the total length of the LCP packet in bytes, including the Code, Identifier, Length, and Data fields.
• The Length field value cannot exceed the maximum receive unit (MRU) of the link. Bytes outside the range of the Length field are treated as padding and are ignored after they are received.
Data field
• The Type field specifies the negotiation option type.
• The Length field specifies the total length of the Data field, including the Type, Length, and Data portions.
• The Data field contains the contents of the negotiation option.
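The fixed header described above can be parsed with a few lines of Python. This is a minimal sketch; the sample packet bytes are hypothetical, representing a Configure-Request that carries a single MRU option of 1500:

```python
import struct

def parse_lcp(payload):
    """Parse the fixed LCP header: Code (1 byte), Identifier (1 byte),
    Length (2 bytes); Length covers Code, Identifier, Length, and Data."""
    code, identifier, length = struct.unpack("!BBH", payload[:4])
    data = payload[4:length]   # bytes beyond Length would be padding
    return code, identifier, data

# Hypothetical Configure-Request (Code 0x01, Identifier 0x01, Length 8)
# carrying one option: MRU (Type 1, Length 4, value 0x05DC = 1500)
pkt = bytes([0x01, 0x01, 0x00, 0x08, 0x01, 0x04, 0x05, 0xDC])
code, ident, data = parse_lcp(pkt)
```

Note how the option inside the Data field itself uses the Type-Length-Value layout described above.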
The PPP link establishment process is as follows: Dead: PPP starts and ends with the Dead phase. After the physical status of two communicating devices becomes Up (marked as UP in the figure), PPP enters the Establish phase. Establish: The two devices negotiate link layer parameters in the Establish phase. If negotiation of link layer parameters fails (marked as FAIL in the figure), a PPP connection cannot be established and PPP returns to the Dead phase. If negotiation of link layer parameters succeeds (marked as OPENED in the figure), PPP enters the Authenticate phase. Authenticate: In the Authenticate phase, the authenticating party authenticates the authenticated party. If authentication fails (marked as FAIL in the figure), PPP enters the Terminate phase. If authentication succeeds (marked as SUCCESS in the figure) or no authentication is configured, PPP enters the Network phase. Network: In the Network phase, the two devices use NCP to negotiate network-layer parameters. If negotiation succeeds, a PPP connection can be established and data packets can be transmitted over the PPP connection. When the upper-layer protocol determines that the PPP connection (for example, an on-demand circuit) should be disconnected, or an administrator manually disconnects the PPP connection, PPP enters the Terminate phase. Terminate: In the Terminate phase, the two devices use LCP to disconnect the PPP connection. After the PPP connection is disconnected (marked as Down in the figure), PPP enters the Dead phase.
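The phase transitions above can be sketched as a table-driven state machine. This is an illustration only: the event names UP, OPENED, SUCCESS, FAIL, and DOWN follow the markers in the figure, while CLOSING is an assumed name for the event that triggers disconnection from the Network phase:

```python
# Transition table for the five PPP phases described above
TRANSITIONS = {
    ("Dead", "UP"): "Establish",
    ("Establish", "FAIL"): "Dead",
    ("Establish", "OPENED"): "Authenticate",
    ("Authenticate", "FAIL"): "Terminate",
    ("Authenticate", "SUCCESS"): "Network",
    ("Network", "CLOSING"): "Terminate",
    ("Terminate", "DOWN"): "Dead",
}

def next_phase(phase, event):
    # Unknown events leave the phase unchanged
    return TRANSITIONS.get((phase, event), phase)

# Walk one full lifetime of a PPP link: it starts and ends in Dead
phase = "Dead"
for event in ["UP", "OPENED", "SUCCESS", "CLOSING", "DOWN"]:
    phase = next_phase(phase, event)
```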
Note: The working phases of PPP listed in this slide are not the status of the PPP protocol because PPP is a protocol suite that does not have a protocol status. Only specified protocols such as LCP and NCP can have a protocol status that can change from one state to another.
LCP packets are classified into three types:
1. Link configuration packets, used to establish and configure links: Configure-Request, Configure-Ack, Configure-Nak, and Configure-Reject.
2. Link termination packets, used to tear down links: Terminate-Request and Terminate-Ack.
3. Link maintenance packets, used to manage and debug links: Code-Reject, Protocol-Reject, Echo-Request, Echo-Reply, and Discard-Request.
LCP is used to negotiate the following parameters: MRU is used on the Versatile Routing Platform (VRP) to indicate the maximum transmission unit configured on an interface. The PPP authentication protocols include PAP and CHAP. Two ends of a PPP link can use different protocols to authenticate the peer. However, the authenticated party must support the authentication protocol used by the authenticating party and have authentication information such as the user name and password correctly configured. LCP uses the magic number to detect link loops and other exceptions. A magic number is a randomly generated number. It should be ensured that the two ends do not generate the same magic number. After a device receives a Configure-Request packet, it compares the magic number in the received Configure-Request packet with the locally generated magic number. If they are different, no link loop exists and the device sends a Configure-Ack packet (if other parameters are successfully negotiated) to indicate that negotiation of the magic number succeeds. If subsequent packets contain the Magic-Number field, the value of the field is set to the successfully negotiated magic number and LCP does not generate a new magic number. If the magic number in the received Configure-Request packet is the same as the locally generated magic number, the receiver sends a Configure-Nak packet to the sender, carrying a new magic number. The sender then sends a new Configure-Request packet carrying a new magic number, regardless of whether the magic number in the received Configure-Nak packet is the same as the locally generated magic number. If a link loop exists, this process persists. If no link loop exists, packet exchange will soon be restored.
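The magic-number comparison on the receiving side can be sketched as follows. This is a minimal illustration, not VRP's implementation; the function and variable names are assumptions:

```python
import secrets

def check_magic(local_magic, received_magic):
    """Compare the magic number in a received Configure-Request with the
    locally generated one, following the loop-detection logic above."""
    if received_magic != local_magic:
        # Numbers differ: no loop on this link; Ack the magic number
        # (assuming the other parameters also negotiate successfully)
        return "Configure-Ack"
    # Same number: the packet may be our own looped back; answer with
    # a Configure-Nak carrying a new magic number
    return "Configure-Nak"

local = secrets.randbits(32)
# On a looped link, the device receives its own magic number back
looped_result = check_magic(local, local)
```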
Link negotiation success: As shown in the figure, R1 and R2 are connected in series and run PPP. When the physical status of the link becomes Up, R1 and R2 use LCP to negotiate link layer parameters. In this example, R1 sends an LCP packet first. R1 sends a Configure-Request packet to R2, carrying the link-layer parameters configured on the sender (R1). The link-layer parameters use the Type, Length, Value structure. After receiving the Configure-Request packet, R2 sends a Configure-Ack packet to R1 if it can identify all the link-layer parameters in the packet and determines that the value of each parameter is acceptable. If R1 does not receive a Configure-Ack packet, it retransmits the Configure-Request packet once every 3 seconds. If R1 still receives no Configure-Ack packet after the Configure-Request packet has been retransmitted 10 consecutive times, it determines that the peer is unavailable and stops sending Configure-Request packets. Note: After this process is complete, R2 has determined that the link-layer parameters configured on R1 are acceptable. R2 also needs to send Configure-Request packets to R1 so that R1 can determine whether the link-layer parameters configured on R2 are acceptable.
Link negotiation failure: After R2 receives a Configure-Request packet from R1, R2 sends a Configure-Nak packet to R1 if R2 can identify all the link-layer parameters in the packet, but determines that all or some of the parameter values are unacceptable, indicating that parameter negotiation fails. The Configure-Nak packet contains only the parameters whose values are unacceptable, and the value of each parameter is changed to a value or value range that is acceptable on R2. After receiving the Configure-Nak packet, R1 changes the parameter values used locally based on the values in the Configure-Nak packet, and then sends a Configure-Request packet. If negotiation still fails after the Configure-Request packet is sent for five consecutive times, the parameters are disabled and parameter negotiation stops.
Link negotiation parameters cannot be identified: After receiving a Configure-Request packet from R1, R2 sends a Configure-Reject packet to R1 if R2 cannot identify all or some link-layer parameters in the packet. The Configure-Reject packet contains only the parameters that cannot be identified. After receiving the Configure-Reject packet, R1 sends a Configure-Request packet to R2, carrying only the parameters that R2 can identify.
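The three receiver behaviors described above (Ack, Nak, Reject) can be condensed into one decision function. This is a sketch only; the option names and the local capability table are hypothetical:

```python
# Hypothetical local capabilities of the receiver (R2): which options it
# understands, and which values it accepts for each
SUPPORTED = {
    "mru": range(576, 1501),      # acceptable MRU values
    "auth": {"pap", "chap"},      # acceptable authentication protocols
}

def respond(options):
    """Decide the response to a Configure-Request, following the three
    cases above: Reject unknown options, Nak unacceptable values, Ack."""
    unknown = {k: v for k, v in options.items() if k not in SUPPORTED}
    if unknown:
        return "Configure-Reject", unknown      # carries only unknown options
    bad = {k: v for k, v in options.items() if v not in SUPPORTED[k]}
    if bad:
        return "Configure-Nak", bad             # would carry acceptable values
    return "Configure-Ack", options
```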
The link state detection process is as follows: After a connection is set up using LCP, Echo-Request and Echo-Reply packets can be used to detect the link status. If a device replies with an Echo-Reply packet each time it receives an Echo-Request packet, the link status is normal. By default, the VRP platform sends an Echo-Request packet once every 10 seconds.
The process of tearing down a connection is as follows: LCP can tear down an existing connection if authentication fails or an administrator manually shuts down the connection. LCP uses Terminate-Request and Terminate-Ack packets to disconnect a connection. The Terminate-Request packet requests the peer to disconnect the connection. After receiving a Terminate-Request packet, the device replies with a Terminate-Ack packet to confirm that the connection is to be disconnected. If a device fails to receive a Terminate-Ack packet, it retransmits the Terminate-Request packet once every 3 seconds. If the device still does not receive a Terminate-Ack packet after sending the Terminate-Request packet twice consecutively, it determines that the peer is unavailable and disconnects the connection.
A PAP packet is encapsulated in the PPP packet directly.
The PAP authentication process is as follows: The authenticated party sends an Authenticate-Request packet carrying the user name and password in plaintext to the authenticating party. In this example, the user name and password are huawei and hello. After receiving the user name and password from the authenticated party, the authenticating party compares the user name and password with those configured locally to check whether they are correct. If the user name and password are correct, the authenticating party returns an Authenticate-Ack packet, indicating that the authentication succeeds. If the user name and password are incorrect, the authenticating party returns an Authenticate-Nak packet, indicating that the authentication fails.
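The authenticating party's check reduces to a plaintext comparison, which is why PAP is considered weak. A sketch of that check (the local user database here is hypothetical; real PAP carries these fields inside an Authenticate-Request packet):

```python
# Hypothetical local user database on the authenticating party,
# matching the example user name and password above
LOCAL_USERS = {"huawei": "hello"}

def pap_authenticate(username, password):
    """PAP sends the user name and password in plaintext; the
    authenticating party simply compares them with its local config."""
    if LOCAL_USERS.get(username) == password:
        return "Authenticate-Ack"    # authentication succeeds
    return "Authenticate-Nak"        # authentication fails
```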
The authenticated party uses the Message Digest 5 (MD5) algorithm to calculate a 16-byte string over the concatenation Identifier + password + challenge. It adds the calculated 16-byte string to the Data field of the Response packet and sends the packet to the authenticating party.
CHAP is a three-way handshake authentication protocol. The Request packet and Response packet exchanged between two communicating devices during one CHAP process contain the same Identifier. Unidirectional CHAP authentication is applicable to two scenarios: the authenticating party is configured with a user name, and the authenticating party is not configured with a user name. It is recommended that the authenticating party be configured with a user name. When the authenticating party is configured with a user name (that is, the ppp chap user username command is configured on the interface): • The authenticating party initiates an authentication request by sending a Challenge packet that carries the local user name to the authenticated party. • After receiving the Challenge packet on an interface, the authenticated party checks whether the ppp chap password command is used on the interface. If this command is used, the authenticated party uses MD5 to calculate the concatenation of Identifier, password generated by the ppp chap password command, and a random number. The authenticated party then sends a Response packet carrying the calculated ciphertext password and local user name to the authenticating party. If the ppp chap password command is not configured, the authenticated party searches the local user table for the password matching the user name of the authenticating party in the received Challenge packet, and encrypts the matching password by using MD5 in a similar way. The authenticated party sends a Response packet carrying the calculated ciphertext password and local user name to the authenticating party.
The authenticating party encrypts the locally saved password of the authenticated party by using MD5. The authenticating party then compares the generated ciphertext password with that carried in the received Response packet, and returns a response based on the check result. When the authenticating party is not configured with a user name (that is, the ppp chap user username command is not configured on the interface): • The authenticating party initiates an authentication request by sending a Challenge packet. • After receiving the Challenge packet, the authenticated party uses MD5 to calculate the concatenation of Identifier, password generated by the ppp chap password command, and a random number. It then sends a Response packet carrying the ciphertext password and local user name to the authenticating party. • The authenticating party encrypts the locally saved password of the authenticated party by using MD5. The authenticating party then compares the generated ciphertext password with that carried in the received Response packet, and returns a response based on the check result.
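The digest calculation that both sides perform is MD5 over Identifier + password + challenge, and can be verified directly. This is a sketch; the password and challenge values are examples:

```python
import hashlib
import os

def chap_response(identifier, password, challenge):
    """Compute the 16-byte CHAP response:
    MD5(Identifier + password + challenge)."""
    return hashlib.md5(bytes([identifier]) + password + challenge).digest()

# The authenticating party sends a random challenge, then repeats the
# calculation with its locally stored password and compares digests.
challenge = os.urandom(16)
resp = chap_response(0x01, b"hello", challenge)
match = resp == chap_response(0x01, b"hello", challenge)   # succeeds
mismatch = resp == chap_response(0x01, b"wrong", challenge)  # fails
```

Because only the digest crosses the link, the password itself is never transmitted, which is the key difference from PAP.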
IPCP negotiates the IP addresses of two devices to transmit IP packets over PPP links. IPCP has the same negotiation mechanism, packet types, and working process as LCP. Topology Configure the IP addresses 12.1.1.1/24 and 12.1.1.2/24 for the two ends. (IPCP can negotiate IP addresses even if they are not on the same network segment.) The static IP address negotiation process is as follows: • R1 and R2 send a Configure-Request packet carrying the local IP address to each other. • After receiving the Configure-Request packet from the peer, R1 and R2 check the IP address in the packet. If the IP address is a valid unicast IP address and is different from the locally configured IP address, R1/R2 determines that the peer can use this address and returns a Configure-Ack packet. • IPCP uses Configure-Request and Configure-Ack packets to allow the two ends of a PPP link to discover each other's 32-bit IP address.
As shown in the figure, R1 requests the peer to allocate an IP address to it, and R2 is configured with the static IP address 12.1.1.2/24. R2 is configured to allocate the IP address 12.1.1.1 to R1. The dynamic IP address negotiation process is as follows: R1 sends a Configure-Request packet carrying the IP address 0.0.0.0 to R2, requesting R2 to allocate an IP address to it. After receiving the Configure-Request packet, R2 determines that the IP address 0.0.0.0 is invalid and returns a Configure-Nak packet carrying a new IP address 12.1.1.1 to R1. After receiving the Configure-Nak packet, R1 updates the local IP address and then sends a Configure-Request packet carrying the new IP address 12.1.1.1 to R2. After receiving the Configure-Request packet, R2 determines that the IP address 12.1.1.1 is valid and returns a Configure-Ack packet to R1. In addition, R2 sends a Configure-Request packet carrying the IP address 12.1.1.2 to R1. R1 determines that the IP address 12.1.1.2 is valid and returns a Configure-Ack packet to R2.
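R2's handling of the IP-Address option can be sketched as follows. This is an illustration only; SERVER_POOL_ADDR stands in for whatever address R2 is configured to hand out:

```python
import ipaddress

SERVER_POOL_ADDR = "12.1.1.1"   # assumed address R2 is configured to allocate

def ipcp_respond(requested):
    """R2's handling of the IP-Address option in a Configure-Request:
    Nak the invalid address 0.0.0.0 and suggest one; Ack a valid one."""
    ip = ipaddress.ip_address(requested)
    if ip.is_unspecified:
        # 0.0.0.0 means "please allocate"; reply with Configure-Nak
        # carrying the address the client should use instead
        return "Configure-Nak", SERVER_POOL_ADDR
    return "Configure-Ack", requested
```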
Multilink PPP fragments a packet and sends the fragments to the same destination over multiple PPP links.
PPPoE overview PPPoE allows a large number of hosts on an Ethernet to connect to the Internet using a remote access device and controls each host using PPP. PPPoE features a large application scale, high security, and convenient accounting.
Topology
A PPPoE session is set up between each PC and the router on the carrier network. Each PC functions as a PPPoE client and has a unique account, which facilitates user accounting and control by the carrier. The PPPoE client software must be installed on the PCs.
The PPPoE session establishment process includes three stages: Discovery, Session, and Terminate. Discovery stage: A PPPoE client broadcasts a PPPoE Active Discovery Initiation (PADI) packet that contains the service information required by the client. After receiving the PADI packet, all PPPoE servers compare the requested service with the services they can provide. The PPPoE servers that can provide the requested service unicast PPPoE Active Discovery Offer (PADO) packets to the PPPoE client. Based on the network topology, the PPPoE client may receive PADO packets from more than one PPPoE server. The PPPoE client selects the PPPoE server from which the first PADO packet is received and unicasts a PPPoE Active Discovery Request (PADR) packet to that server. The PPPoE server generates a unique session ID to identify the PPPoE session with the client and sends a PPPoE Active Discovery Session-confirmation (PADS) packet containing this session ID to the client. When the PPPoE session is established, the PPPoE server and PPPoE client enter the Session stage, share the unique PPPoE session ID, and learn each other's Ethernet address.
Session stage: PPP negotiation at the PPPoE Session stage is the same as common PPP negotiation. When PPP negotiation succeeds, PPP data packets can be forwarded. At the PPPoE Session stage, the PPPoE server and client send all Ethernet data packets in unicast mode. Terminate stage: After a PPPoE session is established, the PPPoE client or the PPPoE server can unicast a PADT packet to terminate the PPPoE session at any time. When a PADT packet is received, no further PPP traffic can be sent using this session.
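The Discovery stage above can be sketched from the client's point of view. This is a simplified illustration: the offers list stands in for received PADO packets, and the session ID is generated locally here although a real PPPoE server generates it:

```python
import secrets

def discovery(offers):
    """Client-side walk through the Discovery stage: broadcast a PADI,
    pick the server behind the first PADO received, unicast a PADR, and
    obtain the session ID from the PADS reply."""
    if not offers:
        raise RuntimeError("no PPPoE server answered the PADI broadcast")
    server = offers[0]                          # first PADO received wins
    session_id = secrets.randbelow(0xFFFF) + 1  # nonzero, server-generated
    return server, session_id

# Two hypothetical servers answered the PADI with PADO packets
server, sid = discovery(["server-A", "server-B"])
```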
Four types of FR interfaces are available: A user's device is called a DTE, and the corresponding interface type is DTE. A network device that provides access services for DTE devices is called a DCE, and the corresponding interface type is DCE or NNI. A UNI interface interconnects the DTE and DCE. An NNI interface interconnects two FR switches. A Virtual Circuit (VC) is a logical circuit established between two network devices on the same network. Based on establishment mode, VCs are classified into two types: • PVC: refers to the manually created VC. • SVC: refers to the VC that can be created or deleted automatically through negotiation. The PVC status of the DTE is determined by the DCE. The PVC status of the DCE is determined by the network. VCs are identified by the DLCI and a DLCI takes effect only on a local interface and its directly connected interface. On an FR network, a DLCI can identify multiple VCs established on different physical interfaces.
LMI: Local Management Interface, used to monitor the PVC status. The system supports three LMI protocols: ITU-T Q.933 Annex A, ANSI T1.617 Annex D, and a non-standard compatible protocol. The non-standard compatible protocol is used for interconnection with devices from vendors other than Huawei. The PVC status of the DTE is determined by the DCE. The PVC status of the DCE is determined by the network. When two network devices are directly connected, the PVC status of the DCE is set by the device administrator. The LMI negotiation process is as follows: The DTE periodically sends Status Enquiry messages. After receiving a Status Enquiry message, the DCE replies with a Status message. The DTE parses the received Status message to obtain the link status and PVC status. When the DTE and DCE can normally send and receive LMI negotiation messages, the link protocol status changes to Up and the PVC status changes to Active. The FR LMI negotiation succeeds.
After the FR LMI negotiation succeeds and the PVC status changes to Active, two devices on a PVC start the InARP negotiation process: If a protocol address is configured on the local interface, the local device (for example, R1) sends an Inverse ARP Request packet to the peer device (for example, R2) over the VC. The Inverse ARP Request packet carries the protocol address of R1. After receiving the Inverse ARP Request packet, R2 obtains the protocol address of R1, generates an address mapping, and sends an Inverse ARP Response packet to R1. After receiving the Inverse ARP Response packet, R1 parses the address of R2 in the packet and generates an address mapping. R1 generates the address mapping 12.1.1.2 to 100, while R2 generates the address mapping 12.1.1.1 to 100. If a static mapping is configured manually or a dynamic mapping is created, the local device does not send an InARP Request packet to the remote device over the VC regardless of whether the remote address in the address mapping is correct. The local device sends an InARP Request packet to the remote device only when no mapping exists.
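The mapping tables each router builds from the exchange above can be sketched with plain dictionaries. This is a simplified illustration; real devices also age out and refresh these mappings:

```python
# Address maps built by InARP on the PVC with DLCI 100, per the text above:
# peer protocol address -> local DLCI
address_map_r1 = {}
address_map_r2 = {}

def inarp(local_addr, peer_addr, dlci, local_map, peer_map):
    """One InARP exchange: the requester advertises its address over the
    VC; the peer records a mapping and answers with its own address."""
    if peer_addr in local_map:
        return   # a mapping already exists, so no InARP Request is sent
    peer_map[local_addr] = dlci    # peer learns the requester's address
    local_map[peer_addr] = dlci    # requester learns the peer's address

# R1 (12.1.1.1) initiates InARP toward R2 (12.1.1.2) over DLCI 100
inarp("12.1.1.1", "12.1.1.2", 100, address_map_r1, address_map_r2)
```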
Sub-interfaces can solve the problem caused by split horizon on an FR network. One physical interface can contain multiple logical sub-interfaces. Each sub-interface can connect to a remote router over one or multiple DLCIs. The routers are connected over the FR network. You can define logical sub-interfaces on the serial line. Every sub-interface uses one or multiple DLCIs to connect to the remote router. After a DLCI is configured on a sub-interface, the mapping between the destination protocol address and this DLCI needs to be created. As shown in the figure, R4 has only one physical serial interface S0; however, DLCIs are defined on S0 to connect the sub-interfaces S0.1, S0.2, and S0.3 to R1, R2, and R3 respectively. Two types of sub-interfaces are available: P2P sub-interface: used to connect to a single remote device. Each P2P sub-interface can be configured with only one PVC. In this case, the remote device can be determined uniquely without a static address mapping. Therefore, when the PVC is configured for the sub-interface, the peer address is identified.
P2MP sub-interface: used to connect to multiple remote devices. Each sub-interface can be configured with multiple PVCs. Each PVC maps the protocol address of its connected remote device. In this way, different PVCs can reach different remote devices. You can manually configure the address mapping, or use InARP to dynamically create the address mapping.
Case description NCP can be used to allocate an IP address to the peer. You need to configure the ppp chap user Huawei command on R1's interface to enable R1 to send a Challenge packet carrying the user name Huawei to R2.
Command usage ppp authentication-mode: Configures the PPP authentication mode in which the local device authenticates the remote device. ppp chap user: Configures a user name for CHAP authentication. ppp chap password: Configures a password for CHAP authentication. ip address ppp-negotiate: Configures IP address negotiation on an interface to allow the interface to obtain an IP address from the remote device. remote address: Configures the local device to assign an IP address or specify an IP address pool for the remote device. Usage scenario Interface view Parameters ppp authentication-mode { chap | pap } chap: Indicates the CHAP authentication mode. pap: Indicates the PAP authentication mode. ppp chap user username username: Specifies a user name for CHAP authentication. ppp chap password { cipher | simple } password cipher: Indicates a ciphertext password. simple: Indicates a plaintext password. password: Specifies the password for CHAP authentication. remote address { ip-address | pool pool-name } ip-address: Specifies an IP address to be allocated to the remote device. pool pool-name: Specifies the name of the IP address pool from which an IP address is allocated to the remote device.
Precautions In CHAP authentication, the authenticated party does not send the password to the authenticating party. The local device can use IPCP to learn the 32-bit host address from the remote device.
Command usage interface mp-group: Creates an MP-Group interface and enters the MP-Group interface view. ppp mp mp-group: Binds an interface to the MP-Group interface so that the interface works in MP mode. restart: Restarts the current interface. Precautions Data frames will be lost after you disable the interface. Exercise caution when you use the restart command.
Case description You need to get familiar with the configurations of the PPPoE server and PPPoE client in this case.
Command usage virtual-template: Creates a VT interface and enters the VT interface view. pppoe-server bind virtual-template: Binds a specified VT interface to an Ethernet interface and enables PPPoE on the Ethernet interface. remote address: Configures the local device to assign an IP address or specifies an IP address pool for the remote device. dialer-rule: Enters the dialer rule view. dialer-rule: Specifies a dialer ACL for a dialer access group and defines conditions to initiate calls. interface dialer: Creates a dialer interface and enters the dialer interface view. dialer user: Enables the resource-shared DCC and specifies the remote user name of the dialer interface. dialer-group: Adds an interface to a dialer access group. That is, the number of the dialer rule is specified. dialer bundle: Specifies a dialer bundle for a dialer interface in the resource-shared DCC. pppoe-client dial-bundle-number: Specifies a dialer bundle for a PPPoE session. Parameters remote address { ip-address | pool pool-name } ip-address: Specifies an IP address to be allocated to the remote device. pool pool-name: Specifies the name of the IP address pool, from which an IP address is allocated to the remote device.
dialer-rule dialer-rule-number { acl { acl-number | name acl-name } | ip { deny | permit } | ipv6 { deny | permit } } dialer-rule-number: Specifies the number of a dialer access group. The number is the same as the value of group-number in the dialer-group command. acl { acl-number | name acl-name }: Indicates the number or name of the dialer ACL. ip { deny | permit }: Indicates whether the dialer ACL allows or forbids IPv4 packets. Precautions To configure the local device to allocate an IP address to the remote device, run the ppp ipcp remote-address forced command in the interface view.
Case description On an FR network, you do not need to manually configure the address mapping for a P2P sub-interface.
Precautions For a P2P sub-interface, you do not need to manually configure the address mapping, regardless of whether InARP is disabled on the sub-interface.
Topology Description Broadcast storm • Assume that STP is not enabled on the switching devices. If PC1 broadcasts a request, the request is received by port1 and forwarded by port2 on both S1 and S2. Port2 on each switch then receives the request forwarded by the other switch, and port1 forwards it again. This transmission repeats until resources on the entire network are exhausted, causing the network to break down. MAC address table flapping • Port2 on S1 can learn the MAC address of PC2. Because S2 forwards data frames sent by PC2 to its other ports, S1 may also learn the MAC address of PC2 on port1. S1 continuously modifies its MAC address table, causing MAC address table flapping.
STP
STP can eliminate network loops. STP is used to build a loop-free network (tree) to ensure a unique data transmission path and prevent infinite looping of packets. STP works at the data link layer of the OSI model. STP-capable switches exchange BPDUs and perform distributed calculation to determine which ports need to be blocked to prevent loops.
Root bridge
The root bridge is the bridge with the smallest BID, which consists of the priority and MAC address.
Root port
The root port is the port with the smallest root path cost to the root bridge and is responsible for forwarding data toward the root bridge. The root port is determined based on the path cost: among all STP-capable ports on a bridge, the port with the smallest root path cost is the root port. A non-root bridge has exactly one root port; the root bridge has none.
Designated port and designated bridge
On each network segment, the bridge closest to the root bridge is the designated bridge, and the port through which the designated bridge connects to the segment is the designated port. The designated port is responsible for forwarding traffic, and the designated bridge is responsible for forwarding configuration BPDUs to the segment.
After the root bridge, root ports, and designated ports are elected, the tree topology is complete. When the topology is stable, only root ports and designated ports forward traffic. All other ports are in Blocking state; they receive STP BPDUs but do not forward user traffic.
A configuration BPDU is generated in one of the following three scenarios: When ports are enabled with STP, the designated ports send configuration BPDUs at intervals specified by the Hello timer. When a root port receives configuration BPDUs, the device where the root port resides sends a copy of the configuration BPDUs to its designated port. When receiving a configuration BPDU with a lower priority, the designated port immediately sends its own configuration BPDU to the downstream device. Root identifier The root identifier consists of the priority and MAC address of the root bridge. The default priority is 32768. Root path cost The cumulative cost of all links to the root bridge. Bridge Identifier (BID) The BID of the device sending the configuration BPDU. On a LAN, the BID is the ID of the designated bridge. Port Identifier (PID) The PID of the port sending the configuration BPDU. The PID consists of the port priority and port number. On a LAN, the PID is the ID of the designated port.
Hello Time
The Hello timer specifies the interval at which an STP-capable device sends configuration BPDUs to detect link faults. When the network topology is stable, a change to this interval takes effect only after a new root bridge takes over. After the topology changes, TCN BPDUs are sent; their transmission is not governed by this interval. The default value is 2 seconds.
Max Age
After a non-root bridge running STP receives a configuration BPDU, it compares the Message Age value in the received configuration BPDU with the Max Age value.
• If the Message Age value is smaller than or equal to the Max Age value, the non-root bridge forwards the configuration BPDU.
• If the Message Age value is larger than the Max Age value, the configuration BPDU has aged and the non-root bridge discards it. In this case, the network is considered too large and the non-root bridge disconnects from the root bridge.
In practice, the Message Age value increases by 1 each time a configuration BPDU passes through a bridge. The default Max Age value is 20.
Forward Delay
The Forward Delay timer specifies the delay for port status transition. The default value is 15 seconds.
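The Max Age check described above amounts to a single comparison; a minimal sketch (the function name and test values are ours):

```python
MAX_AGE = 20  # default Max Age value

def accept_bpdu(message_age, max_age=MAX_AGE):
    """A non-root bridge forwards a BPDU if Message Age <= Max Age and
    discards it (the BPDU has aged) otherwise."""
    return message_age <= max_age

print(accept_bpdu(7))   # True: BPDU is still valid and is forwarded
print(accept_bpdu(21))  # False: BPDU is discarded as aged
```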
STP Topology Calculation
After all devices on a network are enabled with STP, each device considers itself the root bridge. Each device only transmits and receives BPDUs but does not forward user traffic, and all ports are in the Listening state. After exchanging configuration BPDUs, all devices participate in the election of the root bridge, root ports, and designated ports. During network initialization, every device considers itself the root bridge and sets the root bridge ID to its own device ID. Devices exchange configuration BPDUs and compare root bridge IDs; the device with the smallest BID is elected as the root bridge. The switch priority is configurable, ranges from 0 to 65535, and defaults to 32768.
Topology Description
Assume that the priorities of S1 and S2 are 0 and 1 respectively, and that Port A on S1 connects to Port B on S2. S1 sends the configuration BPDU {0, 0, 0, Port A} and S2 sends the configuration BPDU {1, 0, 1, Port B}, where the fields are {root bridge ID, root path cost, bridge ID, port ID}. After the two switches compare the configuration BPDUs, S1 is deemed to have a higher priority than S2, so S1 becomes the root bridge.
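The BPDU comparison used throughout this example evaluates the four fields {root bridge ID, root path cost, bridge ID, port ID} in order, and the smaller value wins at the first field that differs. Representing BPDUs as tuples makes this a one-line comparison; a sketch under that assumption:

```python
# Sketch: configuration BPDUs as tuples
# (root BID, root path cost, sender BID, sender PID).
# Python's field-by-field tuple comparison matches STP's "smaller wins"
# rule applied to the fields in order.

def better_bpdu(a, b):
    """Return the higher-priority (i.e. smaller) of two BPDU vectors."""
    return a if a <= b else b

s1 = (0, 0, 0, "Port A")  # sent by S1
s2 = (1, 0, 1, "Port B")  # sent by S2
print(better_bpdu(s1, s2))  # -> (0, 0, 0, 'Port A'): S1 wins the election
```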
Topology Description
The priorities of S1, S2, and S3 are 0, 1, and 2, and the path costs between S1 and S2, between S1 and S3, and between S2 and S3 are 5, 10, and 4 respectively.
Initial configuration BPDUs on the ports of S1, S2, and S3:
S1: {0, 0, 0, Port A1} on Port A1 and {0, 0, 0, Port A2} on Port A2
S2: {1, 0, 1, Port B1} on Port B1 and {1, 0, 1, Port B2} on Port B2
S3: {2, 0, 2, Port C1} on Port C1 and {2, 0, 2, Port C2} on Port C2
First exchange of configuration BPDUs
Ports on S1, S2, and S3 send their configuration BPDUs. At this point each network bridge considers itself the root bridge, so the root path cost (RPC) in every BPDU is 0.
Comparison for the first exchange of configuration BPDUs
S1
• Port A1 receives the configuration BPDU {1, 0, 1, Port B1} from Port B1 and finds that its own configuration BPDU {0, 0, 0, Port A1} has a higher priority, so Port A1 discards the received configuration BPDU {1, 0, 1, Port B1}.
• Port A2 receives the configuration BPDU {2, 0, 2, Port C1} from Port C1 and finds that its own configuration BPDU {0, 0, 0, Port A2} has a higher priority, so Port A2 discards the received configuration BPDU {2, 0, 2, Port C1}.
• After finding that it is both the root bridge and the designated bridge in the configuration BPDU on each of its ports, S1 considers itself the root bridge. S1 then periodically sends configuration BPDUs from each port without modifying them.
• The configuration BPDU {0, 0, 0, Port A1} on Port A1 and the configuration BPDU {0, 0, 0, Port A2} on Port A2 are optimal.
• Because S1 is the root bridge, all ports on S1 are designated ports.
S2
• Port B1 receives the configuration BPDU {0, 0, 0, Port A1} from Port A1 and finds that the received configuration BPDU has a higher priority than its own configuration BPDU {1, 0, 1, Port B1}, so Port B1 updates its configuration BPDU.
• Port B2 receives the configuration BPDU {2, 0, 2, Port C2} from Port C2 and finds that its own configuration BPDU {1, 0, 1, Port B2} has a higher priority, so Port B2 discards the received configuration BPDU {2, 0, 2, Port C2}.
• The configuration BPDU {0, 0, 0, Port A1} on Port B1 and the configuration BPDU {1, 0, 1, Port B2} on Port B2 are optimal.
Comparison of configuration BPDUs on ports:
• S2 compares the configuration BPDU on each port and finds that the configuration BPDU on Port B1 has the highest priority, so Port B1 becomes the root port and its configuration BPDU remains unchanged.
• S2 calculates the configuration BPDU {0, 5, 1, Port B2} for Port B2 based on the configuration BPDU and path cost of the root port, and compares it with Port B2's current configuration BPDU {1, 0, 1, Port B2}. The calculated configuration BPDU has a higher priority, so Port B2 becomes the designated port; its configuration BPDU is replaced by the calculated one, which is then sent periodically.
S3
• Port C1 receives the configuration BPDU {0, 0, 0, Port A2} from Port A2 and finds that it has a higher priority than its own configuration BPDU {2, 0, 2, Port C1}, so Port C1 updates its configuration BPDU.
• Port C2 receives the configuration BPDU {1, 0, 1, Port B2} from Port B2 and finds that it has a higher priority than its own configuration BPDU {2, 0, 2, Port C2}, so Port C2 updates its configuration BPDU.
• The configuration BPDU {0, 0, 0, Port A2} on Port C1 and the configuration BPDU {1, 0, 1, Port B2} on Port C2 are optimal.
Comparison of configuration BPDUs on ports:
• S3 compares the configuration BPDU on each port and finds that the configuration BPDU on Port C1 has the highest priority, so Port C1 becomes the root port and its configuration BPDU remains unchanged.
• S3 calculates the configuration BPDU {0, 10, 2, Port C2} for Port C2 based on the configuration BPDU and path cost of the root port, and compares it with Port C2's current configuration BPDU {1, 0, 1, Port B2}. The calculated configuration BPDU has a higher priority, so Port C2 becomes the designated port and its configuration BPDU is replaced by the calculated one.
Second exchange of configuration BPDUs S1 is the root bridge. Configuration BPDUs sent by S1 • The configuration BPDU sent by Port A1 is {0, 0, 0, Port A1}. • The configuration BPDU sent by Port A2 is {0, 0, 0, Port A2}. Configuration BPDUs sent by S2 • S1 is the root bridge, so S2 does not send configuration BPDUs to S1. • The configuration BPDU sent by Port B2 is {0, 5, 1, Port B2}. Configuration BPDUs sent by S3 • S1 is the root bridge, so S3 does not send configuration BPDUs to S1. • The configuration BPDU sent by Port C2 is {0, 10, 2, Port C2}.
Comparison for the second exchange of configuration BPDUs
S2
• Port B1 receives the configuration BPDU {0, 0, 0, Port A1} from Port A1 and finds that the received configuration BPDU is the same as its own, so Port B1 discards the received one.
• Port B2 receives the configuration BPDU {0, 10, 2, Port C2} from Port C2 and finds that its own configuration BPDU {0, 5, 1, Port B2} has a higher priority, so Port B2 discards the received one.
• After comparison, the optimal configuration BPDUs on Port B1 and Port B2 are {0, 0, 0, Port A1} and {0, 5, 1, Port B2} respectively.
• Because the optimal configuration BPDU on each port remains unchanged, the port roles do not change.
S3
• Port C1 receives the configuration BPDU {0, 0, 0, Port A2} from S1 and finds that the received configuration BPDU is the same as its own, so Port C1 discards the received one.
• Port C2 receives the configuration BPDU {0, 5, 1, Port B2} from S2 and compares it with its own configuration BPDU {0, 10, 2, Port C2}.
• Because the root bridge IDs are the same, the root path costs are compared. Port C2 finds that the received configuration BPDU has a higher priority (root path cost 5 is smaller than 10), so Port C2 updates its configuration BPDU to {0, 5, 1, Port B2}.
• After comparison, the optimal configuration BPDUs on Port C1 and Port C2 are {0, 0, 0, Port A2} and {0, 5, 1, Port B2} respectively.
Comparison of configuration BPDUs on each port:
• S3 compares the root path cost of Port C1 (root path cost of 0 in the received configuration BPDU + path cost 10 of the link = 10) with that of Port C2 (root path cost of 5 in the received configuration BPDU + path cost 4 of the link = 9). The root path cost of Port C2 is smaller, so the configuration BPDU of Port C2 is preferred: Port C2 becomes the root port and its configuration BPDU remains unchanged.
• S3 calculates the configuration BPDU {0, 9, 2, Port C1} for Port C1 based on the configuration BPDU and path cost of the root port, and compares it with the optimal configuration BPDU {0, 0, 0, Port A2} on Port C1. The received configuration BPDU has a higher priority, so Port C1 is blocked and its configuration BPDU remains unchanged. In this case, Port C1 does not forward data. Spanning tree recalculation may be triggered later, for example, when the link between S2 and S3 goes down.
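S3's root-port choice above can be checked numerically with the costs from this example:

```python
# Root port selection on S3, using the figures from this example:
# via Port C1: root path cost advertised by S1 (0) + link cost 10 = 10
# via Port C2: root path cost advertised by S2 (5) + link cost 4  = 9

candidates = {
    "Port C1": 0 + 10,
    "Port C2": 5 + 4,
}
root_port = min(candidates, key=candidates.get)
print(root_port, candidates[root_port])  # -> Port C2 9
```

Port C2 offers the smaller cumulative cost to the root, so it becomes the root port and Port C1 is blocked.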
Topology on the Left Side
According to the STP root bridge election rules, S1 is the root bridge. The root port, designated port, and alternate port are then determined. E0 and E1 on S2 receive BPDUs {0, 0, 0, E0} and {0, 0, 0, E1} from S1; the two BPDUs differ only in the sender's port ID. The BPDU carrying the smaller sender PID has a higher priority, so E0 on S2 becomes the root port and E1 becomes the alternate port.
Topology on the Right Side
According to the STP root bridge election rules, S1 is the root bridge. The root port, designated port, and alternate port are then determined. E0 and E1 on S2 receive BPDUs of the same priority from S1, so S2 compares the PIDs of its own receiving ports. E0 has the smaller PID, so E0 becomes the root port and E1 becomes the alternate port.
Generally, only the root bridge generates configuration BPDUs periodically. Non-root bridges forward the configuration BPDU received on the root port through their designated ports. In addition, a designated port on a non-root bridge immediately sends its own optimal BPDU when it receives a BPDU with a lower priority.
Topology description: After S2 receives a BPDU with a lower priority from S4, S2 immediately responds with its own configuration BPDU. This is possible because each network bridge stores the optimal configuration BPDU on each port.
Topology Description
The figure on the left side shows the initial topology. The path costs are the same. S1, S2, and S3 are connected, S1 is the root bridge, and the interconnected ports are in the Forwarding state. In the figure on the right side, a link between S1 and S2 is added. After S2 receives BPDUs from S1 and S3, S2 considers the port connected to S1 the new root port and the port connected to S3 a designated port. All ports are root ports or designated ports in the Forwarding state, so a loop occurs. The loop can be eliminated only after configuration BPDUs are transmitted to every network bridge and S2 blocks the port connected to S3 through calculation. A port (for example, port E on S2) is therefore delayed when changing from a non-forwarding state to the Forwarding state, so that ports that need to enter a non-forwarding state have time to complete the spanning tree calculation.
Forward Delay
The default interval for port status transition is 15 seconds. The Forward Delay, Hello, and Max Age timers are related by specific formulas; the default values are derived assuming a network diameter of 7.
Port Status Description
After a port is enabled, it enters the Listening state and starts the spanning tree calculation. If the calculation makes the port an alternate port, the port enters the Blocking state. If the calculation makes the port a root port or designated port, the port moves from the Listening state to the Learning state after one Forward Delay period, and then from the Learning state to the Forwarding state after another Forward Delay period. A port in the Forwarding state can forward data frames.
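For reference, IEEE 802.1D recommends that the three timers satisfy 2 × (Hello Time + 1 s) ≤ Max Age ≤ 2 × (Forward Delay − 1 s). The defaults (2 s, 20 s, 15 s), derived for a network diameter of 7, meet both bounds:

```python
hello, max_age, forward_delay = 2, 20, 15  # default values, in seconds

# 802.1D-recommended constraints relating the three timers:
lower_ok = 2 * (hello + 1) <= max_age          # 6 <= 20
upper_ok = max_age <= 2 * (forward_delay - 1)  # 20 <= 28
print(lower_ok and upper_ok)  # -> True
```

Shrinking the timers without respecting these bounds risks transient loops, which is why the defaults should only be changed with care.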
Huawei switch port status
Huawei datacom devices use MSTP by default. After a device transitions from the MSTP mode to the STP mode, its STP-capable ports support the same port states as MSTP-capable ports: Forwarding, Learning, and Discarding.
Port status transition ① The port is initialized or enabled. ② The port is blocked or the link fails. ③ The port is selected as the root port or designated port. ④ The port is no longer the root port or designated port. ⑤ The Forward Delay timer expires.
TCN BPDU processing: After the network topology changes, the downstream device whose port has transitioned to the Forwarding state continuously sends TCN BPDUs to its upstream device. After the upstream device receives a TCN BPDU from the downstream device, only the designated port processes it; other ports may receive the TCN BPDU but do not process it. The upstream device sets the TCA bit of the Flags field in its configuration BPDU to 1 and returns the configuration BPDU, instructing the downstream device to stop sending TCN BPDUs. The upstream device also sends a copy of the TCN BPDU toward the root bridge. These steps repeat until the root bridge receives the TCN BPDU. After receiving the TCN BPDU, the root bridge sets the TCA bit in its subsequent configuration BPDU for acknowledgment and sets the TC bit of the Flags field to 1 to notify all network bridges of the topology change. The root bridge keeps the TC bit set in its configuration BPDUs for a period of Max Age plus Forward Delay, after which it clears the TC bit. A network bridge that receives a BPDU with the TC bit set reduces the aging time of its MAC address entries to the Forward Delay period.
Topology Description: Through STP calculation, S1 is the root bridge and port E1 on S4 is blocked. When the link of port E1 on S3 fails, STP recalculates: port E1 on S4 becomes a designated port and enters the Forwarding state, and S4 immediately sends a TCN BPDU upstream. After S2 receives the TCN BPDU from S4, S2 sets the TCA bit in its subsequent configuration BPDU and sends it to S4 from port E3. S2 also sends a TCN BPDU toward the root from its root port E1. After S1 receives the TCN BPDU from S2, S1 sets the TCA and TC bits in its subsequent configuration BPDU and sends it to S2 from the designated port E1. For 35 seconds (Max Age 20 seconds + Forward Delay 15 seconds), S1 keeps the TC bit set in its configuration BPDUs. After receiving a configuration BPDU with the TC bit set, each network bridge reduces the aging time of its MAC address entries to 15 seconds. When the topology changes, stale MAC address entries therefore age out quickly and the MAC address table is rebuilt soon, which avoids wasting bandwidth on frames forwarded along outdated paths.
Root bridge failure: When S1 becomes faulty, S2 and S3 cannot receive BPDUs from the root bridge. S2 and S3 detect the root bridge failure only after a Max Age period. S2 and S3 then determine the new root bridge, root port, and designated port. The topology convergence period is 50 seconds (the Max Age period plus twice the Forward Delay period). Link failure: When the link between S3 and S1 fails, S3 detects this event immediately. The blocked port on S3 immediately enters the Listening state and sends a configuration BPDU with itself as the root. After S2 receives this BPDU with lower priority from S3, S2 sends a configuration BPDU with S1 as the root. The port on S2 connected to S3 therefore becomes the root port, and the port on S3 connected to S2 becomes the designated port. The S3 port takes 30 seconds to transition through the Listening and Learning states to the Forwarding state, so when a link fails or is added, convergence takes about 30 seconds.
STP Limitations: Port statuses and port roles are not distinguished in a fine-grained manner. For example, ports in the Listening and Blocking states equally neither forward user traffic nor learn MAC addresses. The STP algorithm determines topology changes only after the relevant timer expires, which slows down network convergence. The STP algorithm also requires a stable network topology: after the root bridge sends configuration BPDUs, other devices process and relay them until they are advertised across the entire network.
RSTP provides all functions of STP, and RSTP-capable and STP-capable network bridges can work together.
RSTP defines four port roles: root port, designated port, alternate port, and backup port. The functions of the root port and designated port are the same as those defined in STP. The alternate port and backup port are described as follows. From the perspective of configuration BPDU transmission: • An alternate port is blocked after receiving a configuration BPDU with a higher priority from another bridge. • A backup port is blocked after receiving a configuration BPDU with a higher priority sent by another port on the same bridge. From the perspective of user traffic: • An alternate port backs up the root port and provides an alternate path from the designated bridge to the root bridge. • A backup port backs up the designated port and provides an alternate path from the root bridge to a network segment. After all RSTP-capable ports are assigned roles, topology convergence is completed.
Port statuses are simplified from five types to three types. Based on whether a port forwards user traffic and learns MAC addresses, a port is in one of the following states: If a port neither forwards user traffic nor learns MAC addresses, it is in the Discarding state. If a port does not forward user traffic but learns MAC addresses, it is in the Learning state. If a port forwards user traffic and learns MAC addresses, it is in the Forwarding state. RSTP Calculation After the roles of ports in the Discarding state are determined: • The root port and designated port enter the Learning state after a Forward Delay period. A port in the Learning state learns MAC addresses and enters the Forwarding state after another Forward Delay period. RSTP accelerates this process using other mechanisms. • An alternate port remains in the Discarding state.
Configuration BPDUs in RSTP are defined differently: port roles are carried in the Flags field, whose middle 6 bits were reserved in STP. Compared with STP, RSTP slightly redefines the format of configuration BPDUs. The value of the Type field is 2 instead of 0, so an STP-capable device always discards the configuration BPDUs sent by an RSTP-capable device. Such a configuration BPDU is called an RST BPDU. Flags field in an RST BPDU: Bit 0 indicates the TC bit, which is the same as that in STP. Bit 1 indicates the Proposal flag bit, indicating that the BPDU is the Proposal packet in the fast convergence mechanism. Bits 2 and 3 indicate the port role: the value 00 indicates an unknown port, 01 the root port, 10 an alternate or backup port, and 11 the designated port. Bit 4 indicates that the port is in the Learning state. Bit 5 indicates that the port is in the Forwarding state. Bit 6 indicates the Agreement packet in the fast convergence mechanism. Bit 7 indicates the TCA bit, which is the same as that in STP.
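The bit layout above maps directly to bit masks; a sketch of decoding the Flags byte (the function and dictionary names are ours):

```python
# Decode the Flags byte of an RST BPDU per the bit layout above.
ROLES = {0b00: "unknown", 0b01: "root",
         0b10: "alternate/backup", 0b11: "designated"}

def parse_rst_flags(flags):
    return {
        "tc":         bool(flags & 0x01),       # bit 0: TC
        "proposal":   bool(flags & 0x02),       # bit 1: Proposal
        "role":       ROLES[(flags >> 2) & 3],  # bits 2-3: port role
        "learning":   bool(flags & 0x10),       # bit 4: Learning
        "forwarding": bool(flags & 0x20),       # bit 5: Forwarding
        "agreement":  bool(flags & 0x40),       # bit 6: Agreement
        "tca":        bool(flags & 0x80),       # bit 7: TCA
    }

# Example (hypothetical value): 0x64 = Agreement bit + Forwarding bit +
# role 01, i.e. an Agreement sent by a forwarding root port.
print(parse_rst_flags(0x64))
```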
Configuration BPDUs are processed in a different manner. Transmission of configuration BPDUs after the topology becomes stable • In STP, after the topology becomes stable, the root bridge sends configuration BPDUs at an interval set by the Hello timer. A non-root bridge does not send configuration BPDUs until it receives configuration BPDUs from the upstream device. This renders the STP calculation complicated and time-consuming. In RSTP, after the topology becomes stable, a non-root bridge sends configuration BPDUs at an interval set by the Hello timer, regardless of whether it has received configuration BPDUs from the root bridge. Such operations are implemented on each device independently. Shorter timeout interval of BPDUs • In STP, a device has to wait for the Max Age period before determining a negotiation failure. In RSTP, if a port does not receive configuration BPDUs from the upstream device for three consecutive intervals set by the Hello timer, the negotiation between the local device and its peer fails.
Processing of RST BPDUs with lower priority • In RSTP, when a port receives an RST BPDU from the upstream designated bridge, the port compares the received RST BPDU with its own RST BPDU. If its own RST BPDU has higher priority than the received one, the port discards the received RST BPDU and immediately responds to the upstream device with its own RST BPDU. After receiving the RST BPDU, the upstream device updates its own RST BPDU based on the corresponding fields in the received RST BPDU. In this manner, RSTP processes BPDUs with lower priority more rapidly, independent of any timer that is used in STP.
STP convergence To eliminate loops, STP uses timers to complete convergence. The default period from the time the port is enabled to the time the port is in Forwarding state is 30 seconds. Shortening the values of timers may cause the network to become unstable. RSTP fast convergence Edge port • In RSTP, a designated port on the network edge is called an edge port. An edge port directly connects to a terminal and does not connect to any other switching devices. An edge port does not receive configuration BPDUs, so it does not participate in the RSTP calculation. It can directly change from the Disabled state to the Forwarding state without any delay, just like an STP-incapable port. If an edge port receives bogus configuration BPDUs from attackers, it becomes a common STP port. The STP recalculation is performed, causing network flapping. Fast switching of the root port • If the root port fails, the optimal alternate port on the network becomes the root port and enters the Forwarding state. This is because there must be a path from the root bridge to a designated port on the network segment connecting to the alternate port.
Proposal/Agreement mechanism • When a port is selected as a designated port, in STP, the port does not enter the Forwarding state until a Forward Delay period expires; in RSTP, the port enters the Discarding state, and then the Proposal/Agreement mechanism allows the port to immediately enter the Forwarding state. The Proposal/Agreement mechanism must be applied on the P2P links in full-duplex mode. • The P/A mechanism is short for the Proposal/Agreement mechanism
Edge port An edge port directly connects to a terminal. When the network topology changes, loops do not occur on the edge port. The edge port therefore can directly enter the Forwarding state without waiting for two Forward Delay periods. An edge port does not receive configuration BPDUs, so it does not participate in the RSTP calculation. It can directly change from the Disabled state to the Forwarding state without any delay, just like an STP-incapable port. If an edge port receives bogus configuration BPDUs from attackers, it becomes a common STP port. The STP recalculation is performed, causing network flapping.
Fast switching of the root port
In RSTP, an alternate port is the backup of the root port. When the root port of a network bridge fails and enters the Discarding state, the optimal alternate port becomes the new root port and enters the Forwarding state. This is safe because the network segment connected to the alternate port must have a designated port with a path to the root bridge.
P/A mechanism The Proposal/Agreement (P/A) mechanism enables a designated port to rapidly enter the Forwarding state. The P/A mechanism requires that the link between two switching devices should be P2P and work in full-duplex mode. When P/A negotiation fails, the designated port is selected after two Forward Delay periods. The negotiation process is the same as that in STP. After a new link is established, the negotiation process of the P/A mechanism is as follows: • p0 and p1 become designated ports and send RST BPDUs. • After receiving an RST BPDU with higher priority, p1 on S2 determines that it will become a root port but not a designated port. p1 then stops sending RST BPDUs. • p0 on S1 enters the Discarding state and sends RST BPDUs with the Proposal field of 1. • After receiving an RST BPDU with the Proposal field of 1, S2 sets the sync variable to 1 for all its ports. • As p2 has been blocked, its status remains unchanged; p4 is an edge port and does not participate in calculation. Only the non-edge designated port p3 therefore needs to be blocked.
• After p2, p3, and p4 enter the Discarding state, their synced variables are set to 1. The synced variable of the root port p1 is then set to 1, and p1 sends an RST BPDU with the Agreement field of 1 to S1. Except for the Agreement field being set to 1 and the Proposal field being set to 0, this RST BPDU is the same as the one received.
• After receiving this RST BPDU, S1 identifies it as a response to the Proposal packet that it just sent, and p0 immediately enters the Forwarding state.
The P/A negotiation with downstream devices proceeds as follows. When a link between S1 and S2 is added: S1 sends an RST BPDU with the Proposal field of 1 to S2. After receiving the RST BPDU, S2 determines that E2 is the root port. S2 blocks the designated ports E1 and E3, sets the root port to the Forwarding state, and sends an Agreement packet to S1. After S1 receives the Agreement packet, its designated port E1 immediately enters the Forwarding state. The non-edge designated ports E1 and E3 on S2 then send Proposal packets. After S3 receives the Proposal packet from S2, S3 determines that E1 is the root port and starts synchronization. Because the downstream port of S3 is an edge port, S3 directly sends an Agreement packet. After S2 receives the Agreement packet from S3, its port E1 immediately enters the Forwarding state. The process on S4 is similar. After S2 receives the Agreement packet from S4, its port E3 immediately enters the Forwarding state, and the P/A process is completed.
In RSTP, if a non-edge port changes to the Forwarding state, the topology changes. After a switching device detects the topology change (TC), it performs the following operations: It starts a TC While timer for every non-edge port; the TC While timer value is twice the Hello timer value. Before the timer expires, all MAC address entries learned on the ports whose status has changed are cleared, and these ports send RST BPDUs with the TC field set to 1. Once the TC While timer expires, the ports stop sending these RST BPDUs. After another switching device receives such an RST BPDU, it clears the MAC addresses learned on all ports except the one that received the RST BPDU, then starts a TC While timer for all its non-edge ports and its root port, and repeats the process. In this manner, RST BPDUs flood the network.
When a port switches from RSTP to STP, the port loses RSTP features such as fast convergence. On a network where both STP-capable and RSTP-capable devices are deployed, STP-capable devices ignore RST BPDUs; if a port on an RSTP-capable device receives a configuration BPDU from an STP-capable device, the port switches to the STP mode after two intervals specified by the Hello timer and starts to send configuration BPDUs. In this manner, RSTP and STP are interoperable. After the STP-capable devices are removed, Huawei RSTP-capable datacom devices can switch back to the RSTP mode.
RSTP, an enhancement to STP, implements fast convergence of the network topology. However, both RSTP and STP share a defect: all VLANs on a LAN use one spanning tree, so VLAN-based load balancing cannot be performed. Once a link is blocked, it carries no traffic, which wastes bandwidth and can prevent packets of certain VLANs from being forwarded. Topology Description STP or RSTP is deployed on the LAN. The broken line shows the spanning tree; S6 is the root switching device; the links between S1 and S4 and between S2 and S5 are blocked. VLAN packets are transmitted only over the links marked "VLAN2" or "VLAN3." PC2 and PC3 belong to VLAN 2 but cannot communicate with each other, because the link between S2 and S5 is blocked and the link between S3 and S6 rejects packets from VLAN 2. MSTP addresses this issue: it implements fast convergence and provides multiple paths to load-balance VLAN traffic.
A Multiple Spanning Tree (MST) region contains multiple switching devices and the network segments between them. The switching devices of one MST region share the following identical characteristics: MSTP enabled, region name, VLAN-MSTI mappings, and MSTP revision level. An instance is a collection of VLANs. Binding multiple VLANs to one instance saves communication costs and reduces resource usage. The topology of each MSTI is calculated independently of the others, and traffic can be balanced among MSTIs. Multiple VLANs that share the same topology can be mapped to one instance. A port's forwarding status for those VLANs is determined by the port's status in the corresponding MSTI.
The Common and Internal Spanning Tree (CIST), calculated using STP or RSTP, connects all switching devices on a switching network. The CIST root is the network bridge with the highest priority on the entire network, that is, the root bridge of the CIST. In the preceding topology, the lines in red within MSTIs and the lines in blue between MST regions form the CIST. The root bridge of the CIST is S1 in MST region 1. A Common Spanning Tree (CST) connects all the MST regions on a switching network. The CST is calculated by all nodes using STP or RSTP. In the preceding topology, the lines in blue form the CST, and its root is MST region 1. An Internal Spanning Tree (IST) resides within an MST region. Each spanning tree in an MST region has an MSTI ID. An IST is a special MSTI with the MSTI ID of 0, called MSTI 0. The VLANs that do not map to other MSTIs map to MSTI 0. An IST is the segment of the CIST within an MST region. In the preceding topology, the lines in red form an IST. The master bridge is the IST master, which is the switching device closest to the CIST root in a region. If the CIST root is in an MST region, the CIST root is the master bridge of that region.
In the preceding topology, S1, S4, and S7 are master bridges. A Single Spanning Tree (SST) is formed in either of the following situations: A switching device running STP or RSTP belongs to only one spanning tree. An MST region has only one switching device. There is no SST in the preceding topology.
MSTI An MST region can contain multiple spanning trees, each called an MSTI. An MSTI regional root is the root of the MSTI. Each MSTI has its own regional root, and MSTIs are independent of each other. An MSTI can map to one or more VLANs, but one VLAN can map to only one MSTI. Each MSTI has an MSTI ID; the MSTI ID starts from 1, which distinguishes MSTIs from the IST (MSTI 0). In the preceding topology, VLAN 2 maps to MSTI 2 and VLAN 4 to MSTI 4. MSTI regional root The MSTI regional root is the network bridge with the highest priority in each MSTI. You can specify different roots in different MSTIs. In the preceding topology, assuming that S9 has the highest priority in MSTI 2, S9 is the regional root in MSTI 2. Assuming that S8 has the highest priority in MSTI 4, S8 is the regional root in MSTI 4.
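The mapping rule (each VLAN maps to exactly one MSTI, and unmapped VLANs fall back to the IST) can be modeled as a simple lookup table, using this example's VLAN 2 → MSTI 2 and VLAN 4 → MSTI 4 mappings:

```python
# Sketch of a VLAN-to-MSTI mapping table. Every VLAN maps to exactly one
# MSTI; VLANs not explicitly mapped fall back to the IST (MSTI 0).
vlan_to_msti = {2: 2, 4: 4}

def msti_for_vlan(vlan):
    return vlan_to_msti.get(vlan, 0)  # default: MSTI 0 (the IST)

print(msti_for_vlan(2))    # -> 2
print(msti_for_vlan(100))  # -> 0 (unmapped VLAN goes to the IST)
```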
When compared with RSTP, MSTP has two additional port types. MSTP ports include the root port, designated port, alternate port, backup port, edge port, master port, and regional edge port. Master port • A master port is on the shortest path connecting an MST region to the CIST root. • BPDUs of an MST region are sent to the CIST root through the master port. • Master ports are special regional edge ports, functioning as root ports in the CIST and master ports in instances. • In the preceding topology, the port on S7 connected to MST region 1 is the master port. Regional edge port • A port connecting a network bridge in an MST region to another MST region or to an STP- or RSTP-enabled network bridge is a regional edge port. • In the preceding topology, the port on S8 connected to MST region 2 is a regional edge port.
Network bridges may have different roles in different MSTIs, so ports other than the master port on a network bridge may also have different roles in different MSTIs. The master port retains its role in all MSTIs.
Currently, there are two MST BPDU formats: dot1s: BPDU format defined in IEEE 802.1s legacy: private BPDU format Using the stp compliance command, you can configure a port on a Huawei datacom device to automatically adjust the MST BPDU format. Except for the MSTP-specific fields, the fields in an intra-region or inter-region MST BPDU are the same as those in an RST BPDU. The Root ID field in an RST BPDU corresponds to the CIST root ID in an MST BPDU. The EPC field in an MST BPDU indicates the total path cost from the MST region where the network bridge sending the BPDU resides to the MST region where the CIST root resides. The Bridge ID field in an MST BPDU indicates the regional root ID in the CIST. The Port ID field in an MST BPDU indicates the ID of the designated port in the CIST. MSTP-specific fields: • Version 3 Length: indicates the BPDUv3 length, which is used to check received MST BPDUs. • MST Configuration Identifier: indicates the MST configuration identifier, which has four fields.
This field identifies the MST region where a network bridge is located. Neighboring switches are in the same MST region only when the following fields on the switches are the same:
• Format Selector: indicates the 802.1s-defined protocol selector. It has a fixed value of 0.
• Name: indicates the configuration name, that is, the MST region name of a switch. The value has 32 bytes. Each switch has an MST region name configured; the default value is the switch's MAC address.
• Config Digest: indicates the configuration digest, which has 16 bytes. Switches in an MST region must maintain the same mapping between VLANs and MSTIs. However, the MST configuration table is too large (8192 bytes) to be easily transmitted between switches, so this field carries the digest calculated from the MST configuration table using the MD5 algorithm.
• Revision Level: indicates the revision level of an MST region, which has two bytes. The default value is all 0s. Because the Config Digest field carries only a digest of the MST configuration table, there is a low probability that different MST configuration tables produce the same digest. In that case, switches in different MST regions may be incorrectly considered to be in the same MST region. It is therefore recommended that different MST regions use different revision levels to prevent this problem.
CIST Internal Root Path Cost: indicates the total path cost from the local port to the IST master. This value is calculated based on link bandwidth.
CIST Bridge Identifier: indicates the ID of the designated switching device on the CIST.
CIST Remaining Hops: indicates the remaining hops of a BPDU in the CIST. This field is used to limit the scale of an MST region. A BPDU has the maximum hop count on the CIST regional root. The hop count decreases by 1 every time the BPDU passes a network bridge, and a network bridge discards a BPDU with a hop count of 0.
MSTI Configuration Messages (may be absent): indicates an MSTI configuration message.
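To make the Config Digest idea concrete, here is a minimal Python sketch. It is not the actual 802.1s algorithm (which computes an HMAC-MD5 over a fixed-layout configuration table with a well-known key); it only illustrates the principle that identical VLAN-to-MSTI mappings yield identical digests, so switches can recognize each other as members of the same MST region without exchanging the full table.

```python
import hashlib

def config_digest(vlan_to_msti):
    """Illustrative digest of a VLAN-to-MSTI mapping table.

    Simplified sketch: the real IEEE 802.1s digest is an HMAC-MD5
    over a fixed-layout table. The idea shown here is that equal
    mappings produce equal digests."""
    table = bytearray()
    # Build the full table; unmapped VLANs fall into MSTI 0 (the IST).
    for vlan in range(1, 4095):
        msti = vlan_to_msti.get(vlan, 0)
        table += msti.to_bytes(2, "big")
    return hashlib.md5(bytes(table)).hexdigest()

a = config_digest({2: 2, 4: 4})
b = config_digest({2: 2, 4: 4})
c = config_digest({2: 2, 3: 3})
print(a == b, a == c)  # identical mappings match; different ones differ
```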
• MSTI Flag: has eight bits. Bits 1 to 7 are the same as those in RSTP. Bit 8 indicates whether the network bridge is the master bridge, and replaces the TCA bit in RSTP.
• MSTI Regional Root ID: indicates the regional root ID of the MSTI.
• MSTI IRPC: indicates the path cost from the network bridge sending the BPDU to the MSTI regional root.
• MSTI Bridge Priority: indicates the priority of the network bridge that sends the BPDU.
• MSTI Port Priority: indicates the priority of the port that sends the BPDU.
• MSTI Remaining Hops: indicates the remaining number of hops of a BPDU in an MSTI.
MSTP Topology Calculation In MSTP, the entire Layer 2 network is divided into multiple MST regions, which are interconnected by a single CST. In an MST region, multiple spanning trees are calculated, each of which is called an MSTI. Among these MSTIs, MSTI 0 is also known as the internal spanning tree (IST). Like STP, MSTP uses configuration BPDUs to calculate spanning trees, but the configuration BPDUs are MSTP-specific. Vectors:
• Root switching device ID: identifies the root switching device in the CIST. The root switching device ID consists of the priority value (16 bits) and MAC address (48 bits). The priority value is the priority of MSTI 0.
• External root path cost (ERPC): indicates the path cost from the CIST regional root to the CIST root. The ERPCs saved on all switching devices in an MST region are the same. If the CIST root is in an MST region, the ERPCs saved on all switching devices in that region are 0.
• Regional root ID: identifies the MSTI regional root. The regional root ID consists of the priority value (16 bits) and MAC address (48 bits).
• Internal root path cost (IRPC): indicates the path cost from the local bridge to the regional root.
• Designated switching device ID: indicates the network bridge that sends the BPDU.
• Designated port ID: identifies the port on the designated switching device connected to the root port on the local device. The port ID consists of the priority value (4 bits) and port number (12 bits). The priority value must be a multiple of 16.
• Receiving port ID: identifies the port that receives the BPDU. The port ID consists of the priority value (4 bits) and port number (12 bits). The priority value must be a multiple of 16.
If the priority of the vector carried in the configuration message of a BPDU received by a port is higher than the priority of the vector in the configuration message saved on the port, the port replaces the saved configuration message with the received one and updates the global configuration message saved on the device. If the priority of the received vector is equal to or lower than that of the saved vector, the port discards the BPDU.
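The replace-or-discard decision above is a priority-vector comparison in which lower values win, position by position. A hedged sketch with bridge and port IDs simplified to plain integers (real vectors carry full 64-bit bridge IDs):

```python
# CIST priority vector comparison: lower values are superior at every
# position, compared left to right in the order described above.
def better(v1, v2):
    """Return True if vector v1 is superior (lower) to v2."""
    return v1 < v2  # Python compares tuples element by element

# (root ID, ERPC, regional root ID, IRPC, designated bridge ID,
#  designated port ID) -- IDs simplified to integers for illustration.
saved    = (32768, 20, 32768, 10, 32769, 128)
received = (32768, 20, 32768,  5, 32770, 128)

if better(received, saved):
    saved = received  # replace the stored configuration message
print(saved)
```

The comparison stops at the first differing field (here the IRPC, 5 vs. 10), which is exactly the tie-breaking order the protocol prescribes.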
CST Calculation CST and IST calculation is similar to the calculation in RSTP. During CST calculation, an MST region is considered as a network bridge and the ID of the network bridge is the IST regional root ID. CIST uses the following vectors: {root switching device ID, ERPC, regional root ID, IRPC, designated switching device ID, designated port ID, receiving port ID}. CST uses the following vectors: {CIST root, ERPC, regional root ID, designated port ID, receiving port ID}. Topology description: • Assume that S1, S4, and S7 are regional roots in Region1, Region2, and Region3 respectively. S1 has the highest priority, S4 has the lowest priority, and the cost of each path is the same. • Each MST region is considered as a network bridge, and the ID of the network bridge is the regional root ID. Each MST region sends a BPDU with itself as the CIST root and external cost of 0 to other MST regions. • Through RSTP calculation, S1 is the CIST root. • Through ERPC comparison, the port of each regional root connected to Region1 is the master port. • Through comparison of priorities in regional root IDs, the regional edge port is determined.
IST Calculation CST and IST calculation is similar to the calculation in RSTP. MSTP calculates an IST for each MST region, and computes a CST to interconnect MST regions. The CST and ISTs constitute a CIST for the entire network. CIST uses the following vectors: {root switching device ID, ERPC, regional root ID, IRPC, designated switching device ID, designated port ID, receiving port ID}. IST uses the following vectors: {CIST root, IRPC, designated bridge ID, designated port ID, receiving port ID}. Topology description: • After CST calculation is complete, S1, S4, and S7 are regional roots in Region1, Region2, and Region3 respectively. In this situation, the regional root is the network bridge closest to the CIST root but not the network bridge with the highest priority. • The role of a port on each network bridge is determined based on the regional root as the root bridge and IRPC, and then the IST is obtained. • Network bridges in an MST region compare IRPCs to determine the IST root port. • Port roles in the IST are determined based on priorities in BPDUs.
Region1 Calculation In an MST region, MSTP calculates an MSTI for each VLAN based on mappings between VLANs and MSTIs. Each MSTI is calculated independently. The calculation process is similar to the process for STP to calculate a spanning tree. Topology description: • In Region1, VLAN 2 maps to MSTI 2, VLAN 4 to MSTI 4, and other VLANs to MSTI 0. • Different priorities are specified for network bridges in different MSTIs. Assume that S2 is the root bridge in MSTI 2 and S3 is the root bridge in MSTI 4. • In MSTI 2, S2, S1, and S3 are in descending order of priority. Through calculation, the port on S3 connected to S1 is blocked. • In MSTI 4, S3, S1, and S2 are in descending order of priority. Through calculation, the port on S2 connected to S1 is blocked. MSTIs have the following characteristics: The spanning tree is calculated independently for each MSTI, and spanning trees of MSTIs are independent of each other. MSTP calculates the spanning tree for an MSTI in a manner similar to STP. Spanning trees of MSTIs can have different roots and topologies.
Each MSTI sends BPDUs in its spanning tree. The topology of each MSTI is configured by using commands. A port can be configured with different parameters for different MSTIs. A port can play different roles or have different statuses in different MSTIs.
Region2 Calculation Topology description: • In Region2, VLAN 2 maps to MSTI 2, VLAN 3 to MSTI 3, and other VLANs to MSTI 0. • Different priorities are specified for network bridges in different MSTIs. Assume that S5 is the root bridge in MSTI 2 and S6 is the root bridge in MSTI 3. • In MSTI 2, S5, S4, and S6 are in descending order of priority. Through calculation, the port on S6 connected to S4 is blocked. • In MSTI 3, S6, S4, and S5 are in descending order of priority. Through calculation, the port on S5 connected to S4 is blocked.
Region3 Calculation Topology description: • In Region3, VLAN 2 maps to MSTI 2, VLAN 4 to MSTI 4, and other VLANs to MSTI 0. • Different priorities are specified for network bridges in different MSTIs. Assume that S9 is the root bridge in MSTI 2 and S8 is the root bridge in MSTI 4. • In MSTI 2, S9, S10, S8, and S7 are in descending order of priority. Through calculation, the port on S7 connected to S8 and the port on S8 connected to S10 are blocked. • In MSTI 4, S8, S7, S10, and S9 are in descending order of priority. Through calculation, the port on S9 connected to S7 and the port on S10 connected to S7 are blocked.
MSTI Calculation After CIST and MSTI calculations are complete, the mapping between VLANs and MSTIs in each MST region is independent. On an MSTP-aware network, a VLAN packet is forwarded along the following paths: • MSTI including the IST in an MST region • CST among MST regions
Interoperability between MSTP and RSTP An RSTP or STP-enabled network bridge considers an MST region as the RSTP-enabled bridge with the bridge ID as the regional root ID. When an RSTP or STP-enabled network bridge receives an MST BPDU, it obtains the CIST root, ERPC, regional root ID, and designated port ID in the MST BPDU as the RID, RPC, BID, and PID. When an MSTP-enabled network bridge receives an STP or RST BPDU, it obtains the RID, RPC, BID, and PID as the CIST root, ERPC, regional root ID, and designated port ID. The BID is used as the regional root ID and designated switch ID, and the IRPC is 0.
In MSTP, the P/A mechanism works as follows: The upstream device sends a Proposal packet to the downstream device, requesting fast switching. After receiving the Proposal packet, the downstream device sets its port connecting to the upstream device as the root port and blocks all non-edge ports. The upstream device then sends an Agreement packet. After receiving the Agreement packet, the root port enters the Forwarding state. The downstream device replies with an Agreement packet. After receiving this Agreement packet, the upstream device sets its port connecting to the downstream device as the designated port, and the port enters the Forwarding state. By default, Huawei datacom devices use the enhanced P/A mechanism. To enable a Huawei datacom device to communicate with third-party devices that use the ordinary P/A mechanism, run the stp no-agreement-check command to configure the ordinary P/A mechanism on the Huawei datacom device.
Case description S1, S2, and S3 must be in descending order of priority to meet requirements 2 and 3.
Command usage The stp mode command sets the working mode of a spanning tree protocol on a switching device. The stp root command configures a switching device as the root bridge or secondary root bridge of a spanning tree. The stp priority command sets the priority of the switching device in a spanning tree. The stp cost command sets the path cost of a port in a spanning tree. Parameters stp mode { mstp | rstp | stp } mstp: indicates the MSTP mode. rstp: indicates the RSTP mode. stp: indicates the STP mode. stp [ instance instance-id ] root { primary | secondary } instance instance-id: specifies the ID of a spanning tree instance. It needs to be specified in MSTP. primary: indicates that the switching device functions as the primary root bridge of a spanning tree. secondary: indicates that the switching device functions as the secondary root bridge of a spanning tree.
stp [ instance instance-id ] priority priority priority: specifies the priority of the switching device in a spanning tree. The priority ranges from 0 to 61440, in multiples of 4096 (0, 4096, 8192, and so on). The default value is 32768. stp [ instance instance-id ] cost cost cost: specifies the path cost of a port. When the path cost of a port changes, spanning tree recalculation is performed. Precautions On an STP/RSTP/MSTP network, each spanning tree has only one root bridge, which is responsible for sending BPDUs and connecting devices on the entire network. Because the root bridge is important on a network, a high-performance switching device at an appropriate layer of the network hierarchy should be selected as the root bridge. Such a device may not have the highest priority, so you can run the stp root command to configure a switching device as the root bridge of a spanning tree. A switching device cannot function as both the primary and secondary root bridge of the same spanning tree. After the stp root command is run to configure a switching device as the primary root bridge, the priority value of the switching device is 0 in the spanning tree and cannot be modified. After the stp root command is run to configure a switching device as the secondary root bridge, the priority value of the switching device is 4096 in the spanning tree and cannot be modified.
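The constraint on the priority parameter can be captured in a short sketch (the helper name is illustrative, not a Huawei API):

```python
def valid_bridge_priority(p):
    """An STP/RSTP/MSTP bridge priority must be in 0..61440
    and a multiple of 4096 (illustrative check)."""
    return 0 <= p <= 61440 and p % 4096 == 0

# 0 is what `stp root primary` sets; 4096 is `stp root secondary`.
print([p for p in (0, 4096, 32768, 61440) if valid_bridge_priority(p)])
print(valid_bridge_priority(1000))  # not a multiple of 4096 -> False
```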
Case description In the preceding topology: • Requirement 1 involves interoperability between RSTP and STP. • Requirement 2 involves the stp root command usage. • Requirement 3 involves the edge port, BPDU filtering, and BPDU protection.
Command usage The stp mcheck command configures a port to automatically switch from the STP mode back to the RSTP/MSTP mode. The stp edged-port default command configures all ports on a switching device as edge ports. The stp bpdu-filter default command configures all ports on a switching device as BPDU-filter ports. The stp bpdu-protection command enables BPDU protection on a switching device. The stp root-protection command enables root protection on a port. Precautions After the stp bpdu-filter default and stp edged-port default commands are run in the system view, none of the ports on the device will send any BPDUs or negotiate with the directly connected port on the remote device, and all the ports are in the Forwarding state. This may lead to a loop and cause a broadcast storm. Exercise caution when using the stp bpdu-filter default and stp edged-port default commands in the system view. After BPDU protection is enabled on a switching device, the switching device sets an edge port to the error-down state if the edge port receives a BPDU, but retains the port's edge-port attribute.
The role of a designated port enabled with root protection cannot be changed. When a designated port enabled with root protection receives a BPDU with a higher priority, the port enters the Discarding state and does not forward packets. If the port does not receive any BPDUs with higher priority after a given period of time (generally two Forward Delay periods), the port automatically enters the Forwarding state.
Case description To meet requirement 3, S1 must be configured as the root bridge in MSTI 2 and S3 as the root bridge in MSTI 3, producing the alternate ports shown in the figure above. Therefore, in MSTI 2, S1 is configured as the root bridge and S2, S3, and S4 must be in descending order of priority; in MSTI 3, S3 is configured as the root bridge and S1, S4, and S2 must be in descending order of priority.
Command usage The region-name command configures the MST region name of a switching device. The instance command maps a VLAN to an MSTI. The revision-level command configures the revision level of an MST region of a switching device. The default value is 0. The active region-configuration command activates the configuration of an MST region. The stp loop-protection command enables loop protection on a port. Precautions Two switching devices belong to the same MST region only when they have the following identical configurations: • MST region name • Mappings between MSTIs and VLANs • MST region revision level Loop protection • On a network running a spanning tree protocol, a switching device maintains the status of the root port and blocked port by continuously receiving BPDUs from the upstream switching device.
• If ports cannot receive BPDUs from the upstream switching device due to link congestion or a unidirectional link failure, the switching device reselects a root port. The original root port then becomes a designated port and the original blocked port enters the Forwarding state. As a result, loops may occur on the network. Loop protection can be deployed to prevent this problem.
• If the root port or an alternate port cannot receive BPDUs from the upstream device for a long period after loop protection is enabled, the port sends a notification message to the NMS. The root port enters the Discarding state, and the alternate port remains in the Blocking state and does not forward packets. This prevents loops on the network. The root port or alternate port returns to the Forwarding state after receiving BPDUs again.
If the topology of an MSTI changes, the forwarding paths of the VLANs mapped to this MSTI change, so the ARP entries relevant to these VLANs need to be updated. Based on the method for processing ARP entries, the convergence modes of a spanning tree protocol are classified into fast and normal: In fast mode, the switch directly deletes the ARP entries that need to be updated in the ARP table. In normal mode, the switch ages the ARP entries that need to be updated in the ARP table. If the number of ARP probes for aging ARP entries is greater than 0, the switch probes these ARP entries before aging them. In fast mode, frequent ARP entry deletion affects services and may even cause 100% CPU usage. As a result, packet processing will time out, causing network flapping.
Unicast In unicast mode, the amount of data transmitted on a network is proportional to the number of users that require the data. If a large number of users require the same data, the source must send a separate copy of the data to each of them, consuming high bandwidth on the source and the network. Therefore, the unicast mode is not suitable for batch data transmission and is applicable only to networks with a small number of users. Broadcast In broadcast mode, data is sent to all hosts on a network segment regardless of whether they require the data. This threatens information security and can cause broadcast storms on the network segment. Therefore, the broadcast mode is not suitable for data transmission from a source to specified destinations. In addition, the broadcast mode wastes network bandwidth. Multicast has the following advantages over unicast and broadcast: Compared with the unicast mode, the multicast mode copies data and distributes the copies at network nodes as far from the source as possible, that is, as close to the receivers as possible. Therefore, the amount of data and the level of network resource consumption do not increase greatly as the number of receivers increases.
Compared with the broadcast mode, the multicast mode transmits data only to receivers that require the data. This saves network resources and enhances data transmission security.
Multicast basic concepts Multicast group: A group of receivers identified by an IP multicast address. User hosts (or other receiver devices) that have joined a multicast group become members of the group and can identify and receive the IP packets destined for the multicast group address. Multicast source: A sender of multicast data. The server in the topology is a multicast source. A multicast source can simultaneously send data to multiple multicast groups. Multiple multicast sources can simultaneously send data to the same multicast group. A multicast source does not need to join any multicast groups. Multicast group member: A host that has joined a multicast group. PC1 and PC2 in the following topology are multicast group members. Memberships in a multicast group change dynamically. Hosts can join or leave a multicast group anytime. Members of a multicast group are located anywhere on a network. Multicast router: A router or Layer 3 switch that supports IP multicast. The routers in the following topology are multicast routers. In addition to multicast routing functions, multicast routers connected to user network segments provide multicast membership management.
Multicast service models are classified for receiver hosts and do not affect multicast sources. All multicast data packets sent from a multicast source use the IP address of the multicast source as the source IP address and use a multicast group address as the destination address. Depending on whether receiver hosts can select multicast sources, two multicast models are defined: any-source multicast (ASM) model and source-specific multicast (SSM) model. The two models use multicast group addresses in different ranges. ASM model: Receiver hosts can only specify the group they want to join and cannot select multicast sources. SSM model: Receiver hosts can specify the multicast sources from which they want to receive multicast data when they join a group. After joining the group, the hosts receive only the data sent from the specified sources.
Multicast addresses IP addresses 224.0.0.0 to 224.0.0.255 are reserved as permanent group addresses by the Internet Assigned Numbers Authority (IANA). In this address range, 224.0.0.0 is not allocated, and the other addresses are used by routing protocols for topology discovery and maintenance. These addresses are locally valid. Packets with these addresses will not be forwarded by routers regardless of the time-to-live (TTL) values in the packets. Addresses in the range of 224.0.1.0 to 231.255.255.255 and 233.0.0.0 to 238.255.255.255 are ASM group addresses and are globally valid. Addresses 232.0.0.0 to 232.255.255.255 are SSM group addresses available to users and are globally valid. Addresses 239.0.0.0 to 239.255.255.255 are local administrative multicast addresses and are valid only in the local administrative domain. Local administrative group addresses are private addresses. A local administrative group address can be used in different administrative domains.
Mapping from IPv4 multicast addresses to MAC addresses The first four bits of an IPv4 multicast address are 1110, and the leftmost 25 bits of a multicast MAC address are fixed (01-00-5e followed by a 0 bit). Only the lower 23 bits of the IP address's remaining 28 bits are mapped into the MAC address, which means that 5 bits of the IP address are lost. As a result, 32 multicast IP addresses are mapped to the same MAC address. For example, IP multicast addresses 224.0.1.1, 224.128.1.1, 225.0.1.1, and 239.128.1.1 are all mapped to MAC multicast address 01-00-5e-00-01-01. Address conflicts must be considered during address assignment.
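The mapping rule above can be reproduced in a few lines of Python (the function name is illustrative):

```python
def multicast_mac(ip):
    """Map an IPv4 multicast address to its Ethernet MAC address.

    The fixed 25-bit prefix is 01-00-5e plus a 0 bit; only the low
    23 bits of the IP address survive, so 2^5 = 32 IP addresses
    share each MAC address."""
    octets = [int(o) for o in ip.split(".")]
    assert 224 <= octets[0] <= 239, "not an IPv4 multicast address"
    # Drop the first octet entirely and the top bit of the second.
    low23 = ((octets[1] & 0x7F) << 16) | (octets[2] << 8) | octets[3]
    return "01-00-5e-%02x-%02x-%02x" % (
        (low23 >> 16) & 0xFF, (low23 >> 8) & 0xFF, low23 & 0xFF)

for ip in ("224.0.1.1", "224.128.1.1", "225.0.1.1", "239.128.1.1"):
    print(ip, "->", multicast_mac(ip))  # all map to 01-00-5e-00-01-01
```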
IGMP IGMP is deployed between multicast routers and user hosts. On a multicast router, IGMP is configured on interfaces connected to hosts. On hosts, IGMP allows group members to dynamically join and leave multicast groups. On routers, IGMP manages and maintains group memberships and exchanges information with upper-layer multicast routing protocols. PIM PIM has two modes: PIM-DM and PIM-SM. It must be enabled on all interfaces of all multicast routers. It provides multicast routing and forwarding, and maintains the multicast routing table based on network topology changes. IGMP snooping IGMP snooping is deployed in VLANs on Layer 2 switches between multicast routers and hosts. It listens on IGMP messages exchanged between routers and hosts to create and maintain a Layer 2 multicast forwarding table. In this manner, multicast data can be forwarded on a Layer 2 network.
IGMP IGMP is an IPv4 group membership management protocol in the TCP/IP protocol suite. IP hosts use IGMP to report their group memberships to any immediately-neighboring multicast routers. IGMP is deployed between multicast routers and hosts. On a multicast router, IGMP is configured on interfaces connected to hosts. On hosts, IGMP allows group members to dynamically join and leave multicast groups. On routers, IGMP manages and maintains group memberships and exchanges information with upper-layer multicast routing protocols. The IGMP versions are backward compatible. Therefore, a multicast router running a later IGMP version can identify Membership Report messages sent from hosts running an earlier IGMP version, although the IGMP messages in different versions use different formats. All of the IGMP versions support the any-source multicast (ASM) model. IGMPv3 can be independently used in the source-specific multicast (SSM) model, whereas IGMPv1 and IGMPv2 must be used with SSM mapping.
IGMP messages are encapsulated in IP packets. IGMPv1 defines the following types of messages: General Query: Sent by a querier to all hosts and routers on the shared network segment to discover which multicast groups have members on the network segment. Report: Sent by a host to request to join a multicast group or respond to a General Query message. How IGMPv1 works IGMPv1 uses a query-report mechanism to manage multicast groups. When multiple multicast routers exist on a network segment, one router is elected as the IGMP querier to send Query messages. In IGMPv1 implementation, a unique Assert winner or designated router (DR) is elected by Protocol Independent Multicast (PIM) to work as the querier. (The election mechanism will be described later). The querier is the only device that sends Membership Query messages on the local network segment. General query and report In the multicast network, R1 and R2 connect to a user network segment with three receivers: PC1, PC2, and PC3. R1 is the querier on the network segment. PC1 and PC2 want to receive data sent to group G1, and PC3 wants to receive data sent to group G2. The general query and report process is as follows:
• The IGMP querier (R1) sends a General Query message with the destination address 224.0.0.1 (indicating all hosts and routers on the same network segment). The IGMP querier sends General Query messages at intervals. The interval can be configured using a command, and the default interval is 60 seconds.
• All hosts on the network segment receive the General Query message. PC1 and PC2 then start a timer for G1 (Timer-G1), and PC3 starts a timer for G2 (Timer-G2). The timer length is a random value between 0 and 10 seconds.
• The host whose timer expires first sends a Report message for the multicast group. In this example, Timer-G1 on PC1 expires first, and PC1 sends a Report message with the destination address G1. When PC2 detects the Report message sent by PC1, PC2 stops Timer-G1 and does not send any Report message for G1. This mechanism reduces the number of Report messages transmitted on the network segment, lowering the load on multicast routers.
• When Timer-G2 on PC3 expires, PC3 sends a Report message with the destination address G2 to the network segment.
• After the routers receive the Report messages, they know that multicast groups G1 and G2 have members on the local network segment. The routers use the multicast routing protocol to create (*, G1) and (*, G2) entries, in which * stands for any multicast source. Once the routers receive data sent to G1 and G2, they forward the data to this network segment.
A member joins a group A new host, PC4, connects to the network segment. PC4 wants to join multicast group G3 but detects no multicast data for G3. In this case, PC4 immediately sends a Report message for G3 without waiting for a General Query message. After receiving the Report message, the routers know that a member of G3 has connected to the network segment, and they create a (*, G3) entry. When the routers receive data sent to G3, they forward the data to this network segment. A member leaves a group IGMPv1 does not define a Leave message.
After a host leaves a multicast group, it no longer responds to General Query messages. Assume that PC4 has left group G3. It does not send Report messages for G3 when receiving General Query messages.
Because there is no other member of G3, routers no longer receive Report message for G3. After a period of time (130 seconds, Membership timeout interval = IGMP general query interval x Robustness variable + Maximum response time), the routers delete the multicast forwarding entry of G3.
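The 130-second figure follows directly from the default timer values (60-second general query interval, robustness variable of 2, 10-second maximum response time):

```python
# IGMPv1/v2 membership timeout arithmetic with the default values
# stated above (these defaults are taken from the text).
query_interval = 60   # IGMP general query interval, seconds
robustness = 2        # robustness variable
max_response = 10     # maximum response time, seconds

# Membership timeout = query interval x robustness + max response time
membership_timeout = query_interval * robustness + max_response
print(membership_timeout)  # 130 seconds before the (*, G3) entry is removed
```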
IGMPv2 defines two types of new messages in addition to General Query and Report messages: Group-Specific Query: sent by a querier to a specified group on the local network segment to check whether the group has members. Leave: sent by a host to notify routers on the local network segment that it has left a group. IGMPv2 modifies the General Query message format by adding the Max Response Time field in the message. The field value controls the response speed of group members and is configurable. Querier election IGMPv2 defines an independent querier election mechanism. When multiple multicast routers are available on a shared network segment, the router with the smallest IP address is elected as the querier. IGMPv1 depends on upper-layer multicast protocols such as PIM for querier election. Topology description • Each IGMPv2 router considers itself as a querier when it starts and sends a General Query message to all hosts and routers on the local network segment. • When other routers receive the General Query message, they compare the source IP address of the message with their own interface IP addresses.
The router with the smallest IP address becomes the querier, and the other routers are non-queriers. In this network, R1 has a smaller interface IP address than R2, so R1 becomes the querier. • All non-querier routers start a timer (Other Querier Present Timer). Timer length = Robustness variable x IGMP general query interval + (1/2) x Maximum response time. If the robustness variable, IGMP general query interval, and maximum response time all use default values, the Other Querier Present Timer length is 125 seconds. If non-querier routers receive a Query message from the querier before the timer expires, they reset the timer. If they receive no Query message from the querier before the timer expires, they trigger the election of a new querier. Leave mechanism In IGMPv2, the following process occurs when PC3 wants to leave multicast group G2 (assuming PC3 is the last member of G2 to have responded to queries): • PC3 sends a Leave message for G2 to all multicast routers on the local network segment. The destination address of the Leave message is 224.0.0.2. • When the querier receives the Leave message, it sends Group-Specific Query messages for G2 at intervals to check whether G2 has other members on the network segment. The sending interval and the number of Group-Specific Query messages sent by the querier are configurable. By default, the querier sends a total of two Group-Specific Query messages at an interval of 1 second. In addition, the querier starts the membership timer (Timer-Membership). Timer length = Interval for sending Group-Specific Query messages x Number of messages sent. • If G2 has no other members on the network segment, the routers receive no Report message for G2. After Timer-Membership expires, the routers delete the downstream interface connected to the network segment from the (*, G2) entry. The routers then no longer forward data of G2 to the network segment.
• If G2 has other members on the network segment, the members send a Report message for G2 within the maximum response time. The routers continue maintaining membership of G2.
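The timer arithmetic and the lowest-address querier election described above can be sketched in Python. This is a simplified model, assuming the defaults of robustness variable 2, general query interval 60 seconds, and maximum response time 10 seconds, which yield the 125-second figure:

```python
import ipaddress

def other_querier_present_timer(robustness=2, query_interval=60, max_response=10):
    # Other Querier Present Timer = robustness variable x general query
    # interval + 1/2 x maximum response time (defaults give 125 seconds).
    return robustness * query_interval + max_response / 2

def elect_querier(addresses):
    # IGMPv2 elects the router with the smallest interface IP address.
    return min(addresses, key=ipaddress.IPv4Address)

print(other_querier_present_timer())            # 125.0
print(elect_querier(["10.1.1.2", "10.1.1.1"]))  # 10.1.1.1
```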
IGMPv3 was developed to support the source-specific multicast (SSM) model. IGMPv3 messages can contain multicast source information so that hosts can receive data sent from a specific source to a specific group. IGMPv3 also defines two types of messages: Query and Report. Compared with IGMPv2, IGMPv3 has the following changes: In addition to General Query and Group-Specific Query messages, IGMPv3 defines a new Query message type: Group-and-Source-Specific Query. A querier sends a Group-and-Source-Specific Query message to members of a specific group on the shared network segment, to check whether the group members want data from specific sources. A Group-and-Source-Specific Query message carries one or more multicast source addresses. A host can send a Report message to notify a multicast router that it wants to join a multicast group and receive data from specified multicast sources. IGMPv3 supports source filtering and defines two filter modes: INCLUDE and EXCLUDE. Group-source mappings are represented as (G, INCLUDE, (S1, S2...)) or (G, EXCLUDE, (S1, S2...)). The (G, INCLUDE, (S1, S2...)) entry indicates that a host wants to receive only data sent from the listed multicast sources to group G. The (G, EXCLUDE, (S1, S2...)) entry indicates that a host wants to receive data sent to group G from all multicast sources except the listed ones.
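As a quick illustration of the two filter modes, the following Python sketch (a hypothetical helper, not from any real IGMP stack) decides whether a host's (G, filter mode, source list) state accepts data from a given source:

```python
def wants_data(filter_mode, source_list, source):
    # INCLUDE: accept only the listed sources.
    if filter_mode == "INCLUDE":
        return source in source_list
    # EXCLUDE: accept everything except the listed sources.
    if filter_mode == "EXCLUDE":
        return source not in source_list
    raise ValueError("unknown filter mode")

# (G, INCLUDE, (S1, S2)): only the listed sources are accepted.
print(wants_data("INCLUDE", {"S1", "S2"}, "S1"))  # True
print(wants_data("INCLUDE", {"S1", "S2"}, "S3"))  # False
# (G, EXCLUDE, (S1, S2)): all sources except the listed ones.
print(wants_data("EXCLUDE", {"S1", "S2"}, "S3"))  # True
```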
Group Record types in IGMPv3 Report messages IS_IN • Indicates that the source filter mode is INCLUDE for a multicast group. That is, members of the group want to receive only data sent from the specified sources. IS_EX • Indicates that the source filter mode is EXCLUDE for a multicast group. That is, members of the group want to receive data sent from all multicast sources except the specified sources. TO_IN • Indicates that the source filter mode for a multicast group has changed from EXCLUDE to INCLUDE. If the source list is empty, the members have left the multicast group. TO_EX • Indicates that the source filter mode for a multicast group has changed from INCLUDE to EXCLUDE. ALLOW • Indicates that members of a multicast group want to receive data from the specified multicast sources in addition to the current sources. If the source filter mode for the multicast group is INCLUDE, the specified sources are added to the source list. If the source filter mode is EXCLUDE, the specified sources are deleted from the source list.
BLOCK • Indicates that members of a multicast group no longer want to receive data from the specified multicast sources. If the source filter mode for the multicast group is INCLUDE, the specified sources are deleted from the source list. If the source filter mode is EXCLUDE, the specified sources are added to the source list. An IGMPv3 Report message can carry multiple groups, whereas an IGMPv1 or IGMPv2 Report message can carry only one group. IGMPv3 therefore greatly reduces the number of messages transmitted on a network. Unlike IGMPv2, IGMPv3 does not define a Leave message. Group members send Report messages of a specific type to notify multicast routers that they have left a group. For example, if a member of group 225.1.1.1 wants to leave the group, it sends a Report message carrying (225.1.1.1, TO_IN, (0)), that is, a TO_IN record with an empty source list.
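The effect of these record types on a (filter mode, source list) pair can be modeled with a short Python sketch. This is a simplified illustration that ignores the per-source timers a real router maintains:

```python
def apply_record(filter_mode, sources, record_type, record_sources):
    # Update a (filter mode, source set) state for one IGMPv3 group record.
    sources = set(sources)
    record_sources = set(record_sources)
    if record_type in ("TO_IN", "IS_IN"):
        return ("INCLUDE", record_sources)
    if record_type in ("TO_EX", "IS_EX"):
        return ("EXCLUDE", record_sources)
    if record_type == "ALLOW":
        # INCLUDE: add the sources; EXCLUDE: remove them from the excluded set.
        if filter_mode == "INCLUDE":
            return (filter_mode, sources | record_sources)
        return (filter_mode, sources - record_sources)
    if record_type == "BLOCK":
        # INCLUDE: delete the sources; EXCLUDE: add them to the excluded set.
        if filter_mode == "INCLUDE":
            return (filter_mode, sources - record_sources)
        return (filter_mode, sources | record_sources)
    raise ValueError(record_type)

print(sorted(apply_record("INCLUDE", {"S1"}, "ALLOW", {"S2"})[1]))  # ['S1', 'S2']
```

A TO_IN record with an empty source list, as in the leave example above, collapses the state to ("INCLUDE", empty set), meaning the member wants no sources at all.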
If IGMPv1 or IGMPv2 is running between a host and its upstream router, the host cannot select multicast sources when it joins group G. The host receives data from both S1 and S2, regardless of whether it requires the data. If IGMPv3 is running between the host and its upstream router, the host can choose to receive only data from S1 using either of the following methods: Method 1: Send an IGMPv3 Report (G, IS_IN, (S1)), requesting to receive only the data sent from S1 to G. Method 2: Send an IGMPv3 Report (G, IS_EX, (S2)), notifying the upstream router that it does not want to receive data from S2. Only data sent from S1 is then forwarded to the host.
Compatibility with IGMPv1 routers When IGMPv2 hosts discover an IGMPv1 router, they must send IGMP Report messages to the router and must not send Leave messages. If there are both IGMPv1 and IGMPv2 routers on a network segment, the querier must send IGMPv1 messages. Compatibility with IGMPv1 hosts IGMPv2 hosts must allow their Report messages to be suppressed by IGMPv1 Report messages. Otherwise, the querier will not be aware of the IGMPv1 hosts on the shared network segment. If an IGMPv2 querier processed a Leave message for a group that still contains IGMPv1 hosts, those hosts would stop receiving traffic for the group. Therefore, if an IGMPv2 router detects IGMPv1 hosts on the local network segment, the router ignores any subsequent Leave messages received.
SSM mapping is implemented based on static SSM mapping entries. A multicast router converts (*, G) information in IGMPv1 and IGMPv2 Report messages to (S, G) information according to static SSM mapping entries, so as to provide the SSM service for IGMPv1 and IGMPv2 hosts. By default, SSM group addresses range from 232.0.0.0 to 232.255.255.255. IGMP SSM mapping does not apply to IGMPv3 Report messages. To enable hosts running any IGMP version on a network segment to obtain the SSM service, IGMPv3 must run on interfaces of multicast routers on the network segment. With SSM mapping entries configured, a router checks the group address G in each IGMPv1 or IGMPv2 Report message received, and processes the message based on the check result: If G is in the range of any-source multicast (ASM) group addresses, the router provides the ASM service for the host. If G is in the range of SSM group addresses: • When the router has no SSM mapping entry matching G, it does not provide the SSM service and drops the Report message. • If the router has an SSM mapping entry matching G, it converts (*, G) information in the Report message into (S, G) information and provides the SSM service for the host. Topology description
On an SSM network, PC1 runs IGMPv3, PC2 runs IGMPv2, and PC3 runs IGMPv1. PC2 and PC3 cannot run IGMPv3. To provide the SSM service for all the hosts on the network segment, IGMP SSM mapping must be configured on R1. Before SSM mapping is enabled, the group-source mappings configured on R1 are as follows: • Group 232.0.0.0/8 mapped to source 10.10.1.1 • Group 232.1.0.0/16 mapped to source 10.10.2.2 • Group 232.1.1.0/24 mapped to source 10.10.3.3 After SSM mapping is enabled on R1, R1 checks the group addresses of received Report messages to see whether they are in the SSM group address range. If they are, R1 generates the following multicast entries according to the configured SSM mapping entries. If a group address matches multiple mapping entries, R1 generates multiple (S, G) entries. The following entries are generated according to the information in Report messages sent from PC2 and PC3: • (10.10.1.1, 232.1.2.2) • (10.10.2.2, 232.1.2.2) • (10.10.1.1, 232.1.3.3) • (10.10.2.2, 232.1.3.3)
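The conversion above can be sketched in Python. The mapping table mirrors the example entries on R1; the function name map_report is illustrative only:

```python
import ipaddress

# Static SSM mapping table mirroring the example on R1:
# group prefix -> mapped source(s).
SSM_MAPPINGS = {
    ipaddress.ip_network("232.0.0.0/8"): ["10.10.1.1"],
    ipaddress.ip_network("232.1.0.0/16"): ["10.10.2.2"],
    ipaddress.ip_network("232.1.1.0/24"): ["10.10.3.3"],
}

def map_report(group):
    # Convert a (*, G) Report into (S, G) entries: as in the example,
    # every matching mapping entry produces one (S, G) entry.
    g = ipaddress.ip_address(group)
    entries = []
    for prefix, sources in SSM_MAPPINGS.items():
        if g in prefix:
            entries += [(s, group) for s in sources]
    return entries

print(map_report("232.1.2.2"))
# [('10.10.1.1', '232.1.2.2'), ('10.10.2.2', '232.1.2.2')]
```

Group 232.1.2.2 matches the /8 and /16 prefixes but not 232.1.1.0/24, which is why R1 generates exactly two (S, G) entries for it.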
A receiver host sends an IGMP Report message to the upstream device. The upstream device can send multicast packets to the host after receiving the Report message. IGMP messages are encapsulated in IP packets (Layer 3 packets). Layer 2 devices between hosts and multicast routers, however, cannot process the Layer 3 information carried in IP packets. In addition, Layer 2 devices cannot learn multicast MAC addresses, because the source MAC addresses of link layer data frames are never multicast MAC addresses. When a Layer 2 device receives a data frame with a multicast destination MAC address, the device cannot find a matching entry in its MAC address table. Consequently, the device broadcasts the multicast packet. This wastes bandwidth resources and poses threats to network security.
Concepts
A router port is a Layer 2 device's port towards a multicast router. The Layer 2 multicast device receives multicast packets through the router port. Router ports are classified into two types: • Dynamic router port: A port that receives IGMP Query messages or PIM Hello messages whose source addresses are not 0.0.0.0. Dynamic router ports are dynamically maintained based on protocol packets exchanged between multicast devices and hosts. Each dynamic router port has a timer. When the timer expires, the router port ages out. • Static router port: Manually specified using a command. Static router ports do not age out. A group member port is a port towards user hosts. A Layer 2 multicast device sends multicast packets to receiver hosts through group member ports. Group member ports are classified into two types: • Dynamic member port: A port that receives IGMP Report messages. Dynamic member ports are dynamically maintained based on protocol packets exchanged between multicast devices and hosts.
Each dynamic member port has a timer. When the timer expires, the member port ages out. • Static member port: Manually specified using a command. Static member ports do not age out. The outbound port list, which includes both router ports and member ports, is key information for Layer 2 multicast forwarding. Working mechanisms When a router port on an Ethernet switch receives an IGMP General Query message, the switch resets the aging timer of the router port. If the port that receives the General Query message is not a router port, the switch starts the aging timer for the port. (The aging time is 180 seconds, or the Holdtime value carried in PIM Hello messages received by the switch. The default Holdtime value is 105 seconds.) When an Ethernet switch receives an IGMP Report message, it checks whether there is a MAC multicast group matching the IP multicast group that the user wants to join. • If the MAC multicast group does not exist, the switch creates the MAC multicast group, adds the port that receives the Report message to the MAC multicast group, and starts the aging timer on the port (timer length = robustness variable x general query interval + maximum response time). In addition, the switch adds all router ports in the same VLAN as the member port to the MAC multicast forwarding entry. It then creates an IP multicast group and adds the port that receives the Report message to the IP multicast group. • If the MAC multicast group exists but the port that receives the IGMP Report message is not in the group, the switch adds the port to the MAC multicast group and starts the aging timer on the port. The switch then checks whether the IP multicast group exists. If the IP multicast group does not exist, the switch creates the IP multicast group and adds the port to it. If the IP multicast group exists, the switch adds the port to the group directly.
• If the MAC multicast group exists and the port that receives the IGMP Report message is already in the group, the switch resets the aging timer on the port.
IGMP Leave message: When an Ethernet switch receives an IGMP Leave message for a group on a port, it sends an IGMP Group-Specific Query message to the port to check whether the group has other members on the port. At the same time, the switch starts the query response timer (Timer length = Group-specific query interval x Robustness variable). If the switch does not receive any IGMP Report message for the group when the query response timer expires, it deletes the port from the matching MAC multicast group. If the MAC multicast group has no member port, the switch requests the upstream multicast router to delete this branch from the multicast tree.
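The Report and Leave handling described above can be condensed into a minimal Python model of a snooping switch; aging timers and the group-specific query exchange are omitted for brevity:

```python
class IgmpSnoopingSwitch:
    # Minimal model: a group table mapping group address -> member ports,
    # plus a set of router ports.

    def __init__(self, router_ports):
        self.router_ports = set(router_ports)
        self.groups = {}  # group address -> set of member ports

    def on_report(self, group, port):
        # Create the group on the first Report, then add the member port.
        self.groups.setdefault(group, set()).add(port)

    def on_leave(self, group, port):
        # Modeled as: the group-specific queries got no Report, so the
        # port is removed; the group is deleted when no member remains.
        members = self.groups.get(group, set())
        members.discard(port)
        if not members:
            self.groups.pop(group, None)

    def outbound_ports(self, group):
        # Forward to member ports plus all router ports in the VLAN.
        return self.groups.get(group, set()) | self.router_ports

sw = IgmpSnoopingSwitch(router_ports={"p1"})
sw.on_report("225.1.1.1", "p2")
print(sorted(sw.outbound_ports("225.1.1.1")))  # ['p1', 'p2']
```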
Layer 2 multicast If users in different VLANs require the same multicast data, the upstream router still has to send multiple copies of identical multicast data to different VLANs. Users in VLAN 2 and VLAN 3 need to receive the same multicast data flow. Multicast router R1 replicates the multicast data in each VLAN and sends two copies of data to downstream switch S1. This wastes bandwidth between the router and Layer 2 device and increases loads on the router. Multicast VLAN The multicast VLAN feature allows Layer 2 network devices to replicate multicast data across VLANs. After the multicast VLAN function is configured on S1, R1 replicates multicast data in the multicast VLAN (VLAN 4) and sends only one copy to S1. As the router does not need to replicate multicast data in VLAN 2 and VLAN 3, network bandwidth is conserved and loads on the router are reduced. Concepts Multicast VLAN: VLAN to which a network-side interface belongs. A multicast VLAN is used to aggregate multicast data flows. One multicast VLAN can be bound to multiple user VLANs. User VLAN: VLAN to which a user-side interface belongs. A user VLAN is used to receive multicast data flows from the multicast VLAN. A user VLAN can be bound only to one multicast VLAN.
We have learned about the Internet Group Management Protocol (IGMP). IGMP runs between receiver hosts and multicast routers, whereas a multicast routing protocol runs between routers. A multicast routing protocol is used to create and maintain multicast routes, and to forward multicast data packets correctly and efficiently. Multicast routes construct a unidirectional, loop-free data transmission path from a data source to multiple receivers. This transmission path is a multicast distribution tree. Multicast routing protocols can be intra-domain or inter-domain protocols. This course introduces PIM, a typical intra-domain multicast routing protocol.
PIM router Routers with PIM enabled on interfaces are called PIM routers. A multicast distribution tree contains the following types of PIM routers: • Leaf router: The PIM router directly connected to user hosts, which may or may not be multicast group members. • First-hop router: The PIM router directly connected to a multicast source on the multicast forwarding path, responsible for forwarding multicast data from the multicast source. • Last-hop router: The PIM router directly connected to a multicast group member on the multicast forwarding path, responsible for forwarding multicast data to the member.
Multicast distribution tree On a PIM network, a point-to-multipoint multicast forwarding path is set up for each multicast group on routers. The multicast forwarding path is in a tree topology, so it is also called a multicast distribution tree. There are two types of multicast distribution trees: source trees and shared trees. Source tree A source tree is rooted at a multicast source and combines the shortest paths from the source to the receivers. Therefore, a source tree is also called a shortest path tree (SPT). For a multicast group, routers need to establish an SPT from each multicast source that sends packets to the group. In this example, there are two multicast sources (S1 and S2) and two receivers (PC1 and PC2). Therefore, two source trees are established on the network. PIM routing entry PIM routing entries are created by PIM to guide multicast forwarding. An (S, G) entry contains a known multicast source for a group, and is used to establish an SPT on PIM routers. (S, G) entries apply to both PIM-DM and PIM-SM networks. If an (S, G) entry exists on a PIM router, the router forwards multicast packets according to the (S, G) entry.
Multicast distribution tree On a PIM network, a point-to-multipoint multicast forwarding path is set up for each multicast group on routers. The multicast forwarding path is in a tree topology, so it is also called a multicast distribution tree. There are two types of multicast distribution trees: source trees and shared trees. Shared tree A shared tree is rooted at a rendezvous point (RP) and combines the shortest paths from the RP to all receivers. It is therefore also called a rendezvous point tree (RPT). Each multicast group has only one shared tree. All multicast sources and receivers of a group send and receive multicast data packets along the shared tree. A multicast source first sends data packets to the RP, which then forwards the packets to all receivers. In this example, multicast sources S1 and S2 share one RPT. PIM routing entry PIM routing entries are created by PIM to guide multicast forwarding. A (*, G) entry contains a known multicast group, with the multicast source unknown. It is used to establish an RPT on PIM routers. (*, G) entries apply only to PIM-SM networks. If no (S, G) entry is available and only a (*, G) entry exists on a router, the router creates an (S, G) entry based on the (*, G) entry, and then forwards multicast packets according to the (S, G) entry.
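The entry-selection rule in the last paragraph can be sketched as follows (an illustrative model, with routing entries represented as simple tuples):

```python
def select_entry(entries, source, group):
    # Prefer a matching (S, G) entry. If only a (*, G) entry exists,
    # derive a new (S, G) entry from it, as PIM-SM routers do.
    if (source, group) in entries:
        return (source, group)
    if ("*", group) in entries:
        derived = (source, group)
        entries.add(derived)  # create (S, G) based on the (*, G) entry
        return derived
    return None  # no matching entry: the packet cannot be forwarded

table = {("*", "225.1.1.1")}
print(select_entry(table, "10.1.1.1", "225.1.1.1"))  # ('10.1.1.1', '225.1.1.1')
```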
PIM-DM overview PIM-DM uses the push mode to forward multicast packets and is often used on small-scale networks with densely distributed multicast group members. PIM-DM assumes that each network segment has multicast group members. When a multicast source sends multicast packets, PIM-DM floods the multicast packets to all PIM routers on the network and then prunes the branches with no members. PIM-DM establishes and maintains a unidirectional, loop-free SPT (source-specific shortest path tree) through periodic flood-and-prune processes. If a new group member connects to a leaf router on a pruned branch, the router can initiate a grafting process to restore multicast forwarding before the next flood-and-prune process. PIM-DM uses the following mechanisms: neighbor discovery, flooding, pruning, grafting, assert, and state refresh. The flooding, pruning, and grafting mechanisms are used to establish an SPT.
PIM routers send Hello messages through all PIM-enabled interfaces. The multicast packet encapsulating a Hello message has a destination IP address of 224.0.0.13 (indicating all PIM routers on a network segment), and the source IP address is the IP address of the interface sending the packet. The TTL value of the multicast packet is 1. Hello messages are used to discover PIM neighbors, adjust PIM protocol parameters, and maintain neighbor relationships. Discovering PIM neighbors • PIM routers on the same network segment must receive multicast packets with the destination address 224.0.0.13. By exchanging Hello messages, directly connected PIM routers learn neighbor information and establish neighbor relationships. • A PIM router can receive other PIM messages to create multicast routing entries only after it establishes neighbor relationships with other PIM routers. Adjusting PIM protocol parameters • A Hello message carries the following PIM protocol parameters to control PIM message exchange between PIM neighbors: • DR_Priority: indicates the priority used by an interface in DR election. The interface with the highest priority becomes the DR. This parameter is used for DR election only on PIM-SM networks.
• Holdtime: indicates the timeout interval of a neighbor relationship. A PIM router considers its neighbor reachable within the Holdtime interval. • LAN_Delay: indicates the delay in transmitting Prune messages on a shared network segment. • Neighbor-Tracking: indicates the neighbor tracking function. • Override-Interval: indicates the interval for overriding a pruning operation. Maintaining neighbor relationships • PIM routers periodically send Hello messages to each other. If a PIM router does not receive any Hello message from a PIM neighbor within the Holdtime interval, the router considers the neighbor unreachable and deletes the neighbor from the neighbor list. • Changes of PIM neighbors lead to changes in the multicast network topology. If an upstream or downstream neighbor in the multicast distribution tree is unreachable, multicast routes need to re-converge, and the multicast distribution tree will change. IGMPv1 querier election Routers on a PIM-DM network compare the priorities and IP addresses carried in Hello messages to elect a DR for each network segment. The DR functions as the IGMPv1 querier on the network segment. If the DR fails, neighboring routers trigger a new DR election when the Hello timeout timer expires. Hello timers The default Hello interval is 30 seconds. The default Hello timeout interval is 105 seconds.
On a PIM-DM network, multicast packets sent from a multicast source are flooded throughout the entire network. When a PIM router receives a multicast packet, the router performs an RPF check on the packet against the unicast routing table. If the packet passes the RPF check, the router creates an (S, G) entry, in which the downstream interface list contains all the interfaces connected to downstream PIM neighbors. The router then forwards subsequent multicast packets through each downstream interface. When multicast packets reach a leaf router, the leaf router processes the packets as follows: If the network segment connected to the leaf router has group members, the leaf router adds its interface connected to the network segment to the downstream interface list of the (S, G) entry, and forwards subsequent multicast packets to the group members. If the network segment connected to the leaf router has no group member and the leaf router does not need to forward multicast packets to downstream PIM neighbors, the leaf router initiates a pruning process. Topology description Multicast source S sends a multicast packet to multicast group G.
When R1 receives the multicast packet, it performs an RPF check on the packet against the unicast routing table. After the packet passes the RPF check, R1 creates an (S, G) entry, in which the downstream interface list contains the interfaces connected to R2 and R5. R1 then forwards subsequent packets to R2 and R5. R2 receives the multicast packet from R1. After the packet passes the RPF check, R2 creates an (S, G) entry, in which the downstream interface list contains the interfaces connected to R3 and R4. R2 then forwards subsequent packets to R3 and R4. R5 receives the multicast packet from R1. Because the downstream network segment has no group members or PIM neighbors, R5 triggers a pruning process. R3 receives the multicast packet from R2. After the packet passes the RPF check, R3 creates an (S, G) entry, in which the downstream interface list contains the interface connected to PC1. R3 then forwards subsequent packets to PC1. R4 receives the multicast packet from R2. Because the downstream network segment has no group members or PIM neighbors, R4 triggers a pruning process.
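The RPF check each router performs in this walkthrough can be sketched in Python. The routing table and interface names are hypothetical; the check passes only if the packet arrived on the interface the unicast routing table uses to reach the source:

```python
import ipaddress

# Hypothetical unicast routing table: destination prefix -> outbound interface.
UNICAST_ROUTES = {
    ipaddress.ip_network("10.0.0.0/8"): "GE0/0/2",
    ipaddress.ip_network("10.1.1.0/24"): "GE0/0/1",
}

def rpf_check(source, in_interface):
    # Find the longest-prefix match for the source, as in unicast
    # forwarding, then compare its interface with the arrival interface.
    src = ipaddress.ip_address(source)
    best = None
    for prefix, iface in UNICAST_ROUTES.items():
        if src in prefix and (best is None or prefix.prefixlen > best[0].prefixlen):
            best = (prefix, iface)
    return best is not None and best[1] == in_interface

print(rpf_check("10.1.1.100", "GE0/0/1"))  # True
print(rpf_check("10.1.1.100", "GE0/0/2"))  # False
```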
When a PIM router receives a multicast packet, it performs an RPF check on the packet. If the packet passes the RPF check but the downstream network segment has no group member, the PIM router sends a Prune message to the upstream router. After receiving the Prune message on the downstream interface, the upstream router deletes the downstream interface from the downstream interface list of the (S, G) entry. Multicast packets are no longer forwarded to this downstream interface. A pruning operation is initiated by a leaf router. The Prune message is sent upstream hop by hop, and PIM routers receiving the Prune message delete the corresponding downstream interface from the (S, G) entry. Finally, the multicast distribution tree contains only branches with group members. A PIM router starts a prune timer (210 seconds by default) for the pruned downstream interface and resumes multicast forwarding on the interface after the timer expires. Multicast packets are then flooded on the entire network, so new group members can receive multicast packets. Subsequently, leaf routers without group members attached trigger pruning processes again. PIM-DM updates the SPT through these periodic flood-and-prune processes.
After a downstream interface of a leaf router is pruned: If new members join the multicast group on the interface and want to receive multicast packets before the next flood-and-prune process, the leaf router initiates a grafting process. If no member joins the multicast group and multicast forwarding still needs to be suppressed on the interface, the leaf router initiates a state refresh process. Topology description R5 sends a Prune message to R1 to notify R1 that the downstream network segment no longer needs to receive multicast data. After receiving the Prune message, R1 stops forwarding data through its downstream interface connecting to R5, and deletes this downstream interface from the (S, G) entry. R1 has another downstream interface in the forwarding state, so the pruning process ends. Subsequent multicast packets are forwarded only to R2. R4 sends a Prune message to R2 to notify R2 that the downstream network segment no longer needs to receive multicast data. After receiving the Prune message, R2 waits for 3 seconds (LAN-delay + override-interval). R3 also receives the Prune message sent by R4. Because R3 connects to a downstream receiver, R3 sends a Join message to override the Prune message. After R2 receives the Join message, it ignores the Prune message sent from R4 and continues forwarding multicast traffic through the downstream interface. The LAN-delay and override-interval parameters are explained as follows: Hello messages carry the LAN-delay and override-interval parameters. The LAN-delay parameter specifies the packet transmission delay (500 milliseconds by default), and the override-interval specifies the interval during which downstream routers can override a pruning operation (2500 milliseconds by default). If a router sends a Prune message upstream but other routers on the same network segment still need to receive multicast data, they must send a Join message to override the pruning operation within the override-interval.
If routers on a link have different override-interval values, the maximum override-interval value used among the routers is used on the link.
The sum of the LAN-delay and override-interval values is the prune-pending timer (PPT). After a router receives a Prune message on a downstream interface, it waits until the PPT expires and then prunes the downstream interface. If the router receives a Join message on the downstream interface before the PPT expires, it cancels the pruning operation.
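With the default values given earlier (LAN-delay 500 ms, override-interval 2500 ms), the PPT works out as follows:

```python
def prune_pending_timer(lan_delay_ms=500, override_interval_ms=2500):
    # PPT = LAN-delay + override-interval; the defaults give 3000 ms,
    # the 3-second wait seen in the earlier example.
    return lan_delay_ms + override_interval_ms

print(prune_pending_timer())  # 3000
```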
Multicast routers prune branches without group members to establish a new SPT according to received Prune messages. Although routers no longer forward multicast packets to pruned branches, the corresponding (S, G) entry still exists on each router. Once new members join the group on the pruned branches, the downstream interfaces can be quickly added to the entry to resume multicast forwarding.
PIM-DM uses the grafting mechanism to enable new group members on a pruned network segment to rapidly obtain multicast data. A leaf router can determine that a multicast group G has new members on a network segment according to IGMP messages. The leaf router then sends a Graft message to notify the upstream router that the downstream network segment needs multicast data. After receiving the Graft message, the upstream router adds the downstream interface to the downstream interface list of the (S, G) entry. A grafting process is initiated by a leaf router and ends on the router that can receive multicast packets. Topology description Pruned downstream nodes can resume multicast forwarding when the prune timer expires, but they must wait for 210 seconds before the prune timer expires. This is quite a long time for new group members. To reduce the waiting time, a pruned downstream router can send a Graft message to notify the upstream router. When the network segment connected to R5 has a new group member, R5 sends a Graft message towards the multicast source S. When R1 receives the Graft message, it replies with a Graft ACK message. After that, multicast data can be forwarded to the previously pruned branch.
To prevent pruned interfaces from resuming multicast forwarding after the prune timer expires, the first-hop router nearest to the multicast source periodically sends a State-Refresh message throughout the entire PIM-DM network. Other PIM routers reset the prune timer after receiving the State-Refresh message. In this way, pruned downstream interfaces remain suppressed if leaf routers connected to the interfaces have no new group members attached. Topology description R1 sends a State-Refresh message to R2 and R5 to initiate a state refresh process. R5 has a pruned interface and resets the prune timer on the interface. If R5 still has no group member on the connected network segment when the next flood-and-prune process starts, the pruned interface is still suppressed.
If multiple PIM routers forward multicast packets to the same network segment after the multicast packets pass the RPF check, only one PIM router is selected through the assert mechanism to forward multicast packets to the network segment. When a PIM router receives a multicast packet that is the same as a multicast packet it has sent to other neighbors, the PIM router sends an Assert message with the destination address 224.0.0.13 to all other PIM routers on the same network segment. When the other PIM routers receive the Assert message, they compare local parameters with those carried in the Assert message for assert election. The assert election is performed according to the following rules: The router whose unicast route to the source has the highest protocol priority wins. If the routers have the same priority, the router with the smallest route cost to the multicast source wins. If the routers have the same priority and the same route cost to the multicast source, the router with the largest downstream interface IP address wins. The PIM routers perform the following operations based on the assert election results: The downstream interface of the router that wins the election is the assert winner and forwards multicast packets to the shared network segment.
The downstream interfaces of the PIM routers that lose the election are assert losers and no longer forward multicast packets to the shared network segment. These PIM routers delete the downstream interfaces from the downstream interface lists of their (S, G) entries. After the assert election is complete, only one downstream interface is active on the network segment, so only one copy of each multicast packet is transmitted to the network segment. Assert losers resume multicast packet forwarding after a specified interval (180 seconds by default), triggering periodic assert elections. Topology description In this example, R2 has a smaller cost to the multicast source than R3. R2 and R3 receive a multicast packet from each other through their downstream interfaces; both packets fail the RPF check and are dropped. R2 and R3 then each send an Assert message to the network segment. R2 compares its routing information with that carried in the Assert message sent by R3 and finds that its own route cost to the multicast source is smaller. Therefore, R2 wins the election and continues forwarding multicast packets to the network segment, whereas R3 drops subsequent multicast packets because these packets fail the RPF check. R3 compares its routing information with that carried in the Assert message sent by R2 and finds that its own route cost to the multicast source is larger. Therefore, R3 loses the election, blocks multicast forwarding on its downstream interface, and deletes the interface from the downstream interface list of the (S, G) entry.
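The assert election rules can be sketched as a tuple comparison in Python. Here a smaller route preference value is treated as a higher protocol priority, following Huawei's route preference convention; the router names and addresses are illustrative:

```python
import ipaddress

def assert_winner(candidates):
    # Pick the winner from (router, preference, metric, ip) tuples:
    # best (numerically smallest) route preference first, then smallest
    # route cost, then the largest downstream interface IP address.
    return min(candidates,
               key=lambda c: (c[1], c[2], -int(ipaddress.IPv4Address(c[3]))))[0]

# R2 and R3 use the same protocol, but R2 has a smaller cost to the source.
print(assert_winner([("R2", 10, 5, "192.168.1.2"),
                     ("R3", 10, 8, "192.168.1.3")]))  # R2
```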
PIM-SM applies to the any-source multicast (ASM) and source-specific multicast (SSM) models. In the ASM model, PIM-SM uses the pull mode to forward multicast packets. This mode suits large networks with sparsely distributed group members. PIM-SM is implemented as follows: A PIM router works as the rendezvous point (RP) to serve group members and multicast sources that appear on the network. All PIM routers on the network know the RP's location. When a new group member appears on the network (a host sends an IGMP message to request to join a multicast group G), the last-hop router sends a Join message to the RP. The Join message is transmitted hop by hop, and all the routers receiving the message create a (*, G) entry. Finally, an RPT rooted at the RP is set up. When an active multicast source appears on the network (the multicast source sends the first multicast packet to a multicast group G), the first-hop router encapsulates the multicast data in a Register message and sends the Register message to the RP in unicast mode. The RP then creates an (S, G) entry, and the multicast source is registered with the RP. PIM-SM uses the following mechanisms in the ASM model: neighbor discovery, DR election, RP discovery, RPT setup, multicast source registration, SPT switchover, pruning, and assert. You can also configure a bootstrap router (BSR) to implement fine-grained management in a PIM-SM domain.
The network segment of a multicast source or receivers may connect to multiple PIM routers. The PIM routers exchange Hello messages to set up PIM neighbor relationships. The Hello message sent by a router carries the DR priority of the router and the IP address of the interface connected to the network segment. Each PIM router compares its own information with the information carried in the Hello messages received from its neighbors. The DR elected among the PIM routers is responsible for forwarding multicast packets for the multicast source or receivers. The DR is elected according to the following rules: The PIM router with the highest DR priority wins (provided that all routers on the network segment support the DR priority). If PIM routers have the same DR priority or at least one PIM router does not support the DR priority field in Hello messages, the PIM router with the largest IP address wins. If the current DR fails, other PIM routers trigger a new DR election when the PIM neighbor timeout timer expires (105 seconds by default). In the ASM model, the DR provides the following functions: The DR on the shared network segment connected to a multicast source sends Register messages to the RP. This DR is called the source DR. The DR connected to the shared network segment of group members sends Join messages to the RP. This DR is called the receiver DR.
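The DR election rules above can be sketched as follows. The function name and the `(dr_priority, interface_ip)` tuples are illustrative assumptions, not VRP internals.

```python
import ipaddress

def elect_dr(routers, all_support_priority=True):
    """Elect the PIM DR on a shared segment (sketch).

    routers: list of (dr_priority, interface_ip) tuples.
    Highest DR priority wins; on a tie, or if any router does not
    support the DR priority field, the largest IP address wins.
    """
    if all_support_priority:
        key = lambda r: (r[0], int(ipaddress.IPv4Address(r[1])))
    else:
        key = lambda r: int(ipaddress.IPv4Address(r[1]))
    return max(routers, key=key)

# Equal priorities: the router with the larger address becomes DR.
print(elect_dr([(1, "192.168.1.1"), (1, "192.168.1.2")]))
```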
On a PIM-SM network, an RPT is a multicast distribution tree with the RP as the root and PIM routers that have group memberships as leaves. In the topology shown in the figure, when a group member appears on the network (a user sends an IGMP message to join a multicast group G), the receiver DR sends a Join message to the RP. The Join message is transmitted hop by hop, and routers receiving the message create a (*, G) entry. Finally, an RPT rooted at the RP is set up.
On a PIM-SM network, any new multicast source must register on the RP so that the RP can forward multicast data from the multicast source to group members. The multicast source registration process is as follows: A multicast source sends a multicast packet to the source DR (R1). After receiving the multicast packet, the source DR encapsulates the multicast packet into a Register message and sends the Register message to the RP (R2). The RP decapsulates the received Register message, creates an (S, G) entry, and forwards the multicast packet to group members along the RPT. The RP no longer needs any Register message sent from R1, so it sends a Register-Stop message to R1. R1 then stops sending Register messages to the RP.
On a PIM-SM network, each multicast group can have only one RP and one RPT. Before an SPT switchover, all multicast packets destined for a multicast group must be encapsulated in Register messages and then sent to the RP. The RP decapsulates Register messages and forwards multicast packets along the RPT. All multicast packets pass through the RP. As the rate of multicast packets increases, the RP faces heavy loads. To resolve this problem, PIM-SM allows the RP or the receiver DR to trigger an SPT switchover. SPT switchover conditions When the multicast traffic rate exceeds the specified threshold, PIM-SM triggers an RPT-to-SPT switchover. According to the default configuration of the VRP, routers connected to receivers join the SPT immediately after receiving the first multicast data packet from a multicast source.
The receiver DR periodically checks the rate of multicast packets for an (S, G) and triggers an SPT switchover when the rate exceeds the specified threshold. The receiver DR sends a Join message to the source DR. The Join message is transmitted hop by hop, and routers receiving the message create an (S, G) entry. Finally, an SPT is set up from the source DR to the receiver DR.
After the SPT is set up, the receiver DR sends a Prune message to the RP. The Prune message is transmitted hop by hop along the RPT, and routers receiving the message delete their downstream interfaces from the (S, G) entry. After the pruning process is complete, the RP no longer forwards multicast packets along the RPT. If the SPT does not pass through the RP, the RP continues to send a Prune message to the source DR, so that routers along the path between the RP and source DR delete their downstream interfaces from the (S, G) entry. After the pruning process is complete, the source DR no longer forwards multicast packets along the SPT to the RP.
On a PIM-SM network, the root of a shared tree is an RP. An RP provides the following functions: Forwards all multicast packets transmitted in the shared tree to receivers. Forwards multicast data of several or all multicast groups. A network can have one or multiple RPs. You can configure an RP to serve multicast groups in a specified range. An RP can serve multiple multicast groups, but each multicast group can have only one RP. Multicast packets sent from a multicast source to all receivers of a group are aggregated on the RP. RP discovery: Static RP: A static RP address is specified on all PIM routers in the PIM domain using the static-rp rp-address command. Dynamic RP: Several PIM routers in a PIM domain are configured as candidate-RPs (C-RPs), among which an RP is elected. Candidate bootstrap routers (C-BSRs) also need to be configured. A BSR is elected among the C-BSRs. An RP is the core router in a PIM-SM domain. If a small and simple network needs to transmit light multicast traffic and one RP is enough, you can specify the RP address statically on all routers in the PIM-SM domain. In most cases, PIM-SM networks have a large scale and need to transmit heavy multicast traffic. To reduce loads on each RP and optimize shared tree topology, different multicast groups should have different RPs. Dynamic RP election is required in this condition, and a BSR is required for RP election.
During a BSR election, each C-BSR considers itself as the BSR and sends a Bootstrap message to the entire network. The Bootstrap message carries the C-BSR address and priority. Each PIM router receives Bootstrap messages from all C-BSRs and compares C-BSR information to elect a BSR. The BSR is elected according to the following rules: The C-BSR with the highest priority wins (larger priority value, higher priority). If C-BSRs have the same priority, the C-BSR with the largest IP address wins. An RP election process is as follows: Each C-RP sends an Advertisement message to the BSR. An Advertisement message carries the C-RP address, the range of multicast groups the C-RP serves, and the C-RP priority. The BSR summarizes the C-RP information in an RP-Set, encapsulates the RP-Set in a Bootstrap message, and advertises the message to all PIM-SM routers on the network. PIM routers follow the same rules to compare RP information in the RP-Set and elect an RP from multiple C-RPs for the same group. The RP election rules are as follows: • The C-RP interface with the longest address mask wins. • The C-RP with the highest priority wins (larger priority value, lower priority).
• If C-RPs have the same priority, routers use a hash algorithm, and the C-RP with the largest hash value wins. • If all the preceding parameters are the same, the C-RP with the largest IP address wins. All PIM routers use the same RP-Set and election rules, so they obtain the same mappings between RPs and multicast groups. The PIM routers save the mappings for subsequent multicast forwarding.
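The four RP election rules can be sketched as one ordered comparison. The hash shown is the C-RP hash function from RFC 2362; the function names, the `(served_prefix_len, priority, address)` tuples, and the hash mask length default are illustrative assumptions for the example.

```python
import ipaddress

def rp_hash(group, mask_len, c_rp):
    """RFC 2362 hash used to break ties among C-RPs (sketch)."""
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    g = int(ipaddress.IPv4Address(group)) & mask
    c = int(ipaddress.IPv4Address(c_rp))
    return (1103515245 * ((1103515245 * g + 12345) ^ c) + 12345) % (1 << 31)

def elect_rp(group, c_rps, hash_mask_len=30):
    """Elect the RP for a group from the RP-Set (sketch).

    c_rps: list of (served_prefix_len, priority, address).
    Longest served-group mask wins; then the lowest priority value
    (larger value means lower priority); then the largest hash value;
    then the largest IP address.
    """
    return max(
        c_rps,
        key=lambda c: (c[0], -c[1],
                       rp_hash(group, hash_mask_len, c[2]),
                       int(ipaddress.IPv4Address(c[2]))),
    )

# A C-RP serving 239.0.0.0/8 beats one serving 224.0.0.0/4 for 239.1.1.1.
print(elect_rp("239.1.1.1", [(4, 0, "1.1.1.1"), (8, 0, "2.2.2.2")]))
```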
The SSM model is implemented based on PIM-SM and IGMPv3/MLDv2. In this model, an SPT can be established from a multicast source to group members without the need to maintain an RP, establish an RPT, or register the multicast source. In the SSM model, hosts can determine the location of the multicast sources. Therefore, they can specify the multicast sources from which they want to receive multicast data when joining a multicast group. After the receiver DR receives the request from a host, it sends a Join message to the source DR. The Join message is then transmitted upstream hop by hop. An SPT is then set up from the multicast source to the host. In the SSM model, PIM-SM uses the following mechanisms: neighbor discovery, DR election, and SPT setup. An SPT setup process is as follows: R3 and R5 learn that hosts in the same multicast group request data from different multicast sources through IGMPv3. Therefore, R3 and R5 send Join messages toward the sources. PIM routers that receive the Join messages create (S1, G) and (S2, G) entries according to the Join messages. In this way, they set up an SPT from S1 to PC1 and an SPT from S2 to PC2. Multicast packets from the two multicast sources are then forwarded to the respective receivers along the SPTs.
RPF check When a router receives a multicast packet, it searches the unicast routing table for the route to the source address of the packet. After finding the route, the router checks whether the outbound interface of the route is the same as the inbound interface of the multicast packet. If they are the same, the router considers that the multicast packet is received from a correct interface. This process is called an RPF check, which ensures correct forwarding paths for multicast packets. If multiple equal-cost routes are available, the route with the largest next-hop address is used as the RPF route. RPF checks can be performed based on unicast routes, Multiprotocol Border Gateway Protocol (MBGP) routes, or static multicast routes. The priority order of these routes is static multicast routes > MBGP routes > unicast routes. Topology description A multicast stream sent from the source 152.10.2.2 arrives at interface S1 of the router. The router checks the routing table and finds that the multicast stream from this source should arrive at interface S0. Therefore, the RPF check fails and the multicast stream is dropped by the router. A multicast stream sent from the source 152.10.2.2 arrives at interface S0 of the router. The router checks the routing table and finds that the RPF interface is also S0. The RPF check succeeds, and the multicast stream is correctly forwarded.
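The RPF check described above can be sketched with a toy routing table. The table contents, interface names, and function name are assumptions mirroring the topology description; a real router would also apply the static-multicast > MBGP > unicast route priority, which this sketch omits.

```python
import ipaddress

# Hypothetical unicast routing table: prefix -> interface the route
# points back toward (the RPF interface for sources in that prefix).
ROUTES = {
    ipaddress.ip_network("152.10.0.0/16"): "S0",
}

def rpf_check(source, inbound_iface):
    """Longest-prefix lookup of the packet's source address; the packet
    passes only if it arrived on the interface the route points to."""
    matches = [n for n in ROUTES if ipaddress.ip_address(source) in n]
    if not matches:
        return False
    best = max(matches, key=lambda n: n.prefixlen)
    return ROUTES[best] == inbound_iface

print(rpf_check("152.10.2.2", "S1"))  # False: wrong interface, dropped
print(rpf_check("152.10.2.2", "S0"))  # True: check succeeds, forwarded
```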
Static multicast routing For R3, the RPF neighbor towards the multicast source (Source) is R1. Therefore, multicast packets sent from Source are forwarded along the path Source -> R1 -> R3. If you configure a multicast static route on R3 and specify R2 as the RPF neighbor, the transmission path of multicast packets sent from Source changes to Source-> R1-> R2-> R3. The multicast path then diverges from the unicast path.
Case description In this case, interconnection IP addresses are configured according to the following rule: • If RTX connects to RTY, their interface IP addresses used to connect to each other are XY.1.1.X and XY.1.1.Y, and the network mask is 24 bits.
Command usage The multicast routing-enable command enables the multicast routing function. The pim dm command enables PIM-DM on an interface. The pim hello-option dr-priority command sets the DR priority for a PIM interface. The igmp enable command enables IGMP on an interface. The igmp version command specifies the IGMP version running on an interface. Precautions In this network topology, R2 is the IGMP querier, and R3 forwards multicast packets to downstream receivers because R3 is the assert winner. The display pim routing-table command displays entries in the PIM routing table. The display pim routing-table fsm command displays detailed information about the finite state machine (FSM) in the PIM routing table.
Case description The network topology is the same as that in PIM-DM configuration. The network runs PIM-SM, and the transmission scope of Bootstrap messages needs to be limited.
Command usage The pim sm command enables PIM-SM on an interface. The c-rp command configures a router to notify the BSR that it is a C-RP. The c-bsr command configures a C-BSR. The pim bsr-boundary command configures the BSR boundary of the PIM-SM domain on an interface. Precautions In this network topology, R2 is the IGMP querier, and R3 forwards multicast packets to downstream receivers because R3 is the assert winner. The display pim routing-table command displays entries in the PIM routing table. The display pim routing-table fsm command displays detailed information about the FSM in the PIM routing table.
The method for checking the SPT in a PIM-SM network is similar to the method for checking the RPT.
Case description In this case, interconnection IP addresses are configured according to the following rules: • If RTX connects to RTY, their interface IP addresses used to connect to each other are XY.1.1.X and XY.1.1.Y, and the network mask is 24 bits. • The loopback interface address of RTX is X.X.X.X/32.
Pre-configuration This page provides the basic OSPF configuration. In this case, R1 is the DR in the FR network.
Results: A Bootstrap message is transmitted from R1 to R2 and fails the RPF check on R2, so R2 drops the message. To enable Bootstrap messages to be forwarded by R2, configure a static multicast route on R2 to change the RPF path.
Results: The ACL restricts the multicast address range.
IPv6 characteristics are as follows: Address space: An IPv6 address is 128 bits long. A 128-bit address structure allows for 2^128 (about 4.3 billion x 4.3 billion x 4.3 billion x 4.3 billion) possible addresses. The biggest advantage of IPv6 is its almost infinite address space. Packet format: IPv6 uses a new protocol header format rather than increasing the bits in the address field of an IPv4 packet to 128 bits. The IPv6 data packets carry new packet headers. An IPv6 packet header includes an IPv6 basic header and extension headers. Some optional fields are moved to the extension headers following the basic header. This enables intermediate routers on the network to process IPv6 packet headers more efficiently. Autoconfiguration and readdressing: IPv6 provides address autoconfiguration, which allows hosts to automatically discover networks and obtain IPv6 addresses. This significantly improves network manageability. Hierarchical network structure: A huge address space allows for hierarchical network design in IPv6. The hierarchical network design facilitates route summarization and improves forwarding efficiency. End-to-end security support: IPv6 supports IP Security (IPSec) authentication and encryption at the network layer, so it provides end-to-end security.
Quality of Service (QoS) support: IPv6 defines the Flow Label field in the packet header. This field enables network routers to differentiate data flows and provide special processing for the identified data flows. With this field, the routers can identify data flows without checking the inner data packets being transmitted. In this way, QoS can be implemented even if the valid payloads of data packets are encrypted. Mobility: With the support for Router header and Destination option header, IPv6 provides built-in mobility.
It should be noted that an IPv6 address can contain only one double colon (::). Otherwise, a computer cannot determine the number of zeros in each group when restoring the compressed address to the original 128-bit address.
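The compression rule can be demonstrated with Python's standard `ipaddress` module: a single run of zero groups collapses to one `::`, and an address with two `::` is rejected as ambiguous.

```python
import ipaddress

# One '::' replaces the longest run of all-zero groups.
addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(str(addr))      # 2001:db8::1
print(addr.exploded)  # 2001:0db8:0000:0000:0000:0000:0000:0001

# Two '::' are ambiguous: there is no way to know how many zero
# groups each one stands for, so parsing fails.
try:
    ipaddress.IPv6Address("2001::25de::cade")
except ValueError:
    print("invalid: more than one '::'")
```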
If the first 3 bits of an IPv6 unicast address are not 000, the interface ID must be 64 bits long. If the first 3 bits are 000, there is no such limitation. IEEE EUI-64 standards The length of an interface ID is 64 bits. IEEE EUI-64 defines a method to convert a 48-bit MAC address into a 64-bit IPv6 interface ID. In the MAC address, the c bits indicate the vendor ID, the d bits indicate the vendor-assigned extension ID, the u bit is the universal/local bit, and the g bit specifies whether the address identifies a single host or a host group. The specific conversion algorithm is as follows: invert the universal/local (u) bit from 0 to 1 and insert two bytes (FFFE) between the c bits and d bits. The method for converting MAC addresses into IPv6 interface IDs reduces the configuration workload. When stateless address autoconfiguration (described in the following pages) is used, you only need an IPv6 network prefix to obtain an IPv6 address. The defect of this method is that an IPv6 address can be easily calculated based on a MAC address.
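The conversion algorithm can be sketched as follows. The function name and the hyphen-separated MAC format are assumptions; the algorithm itself (flip the U/L bit, insert FFFE in the middle) follows the description above.

```python
def mac_to_eui64_interface_id(mac):
    """Convert a 48-bit MAC address into a modified EUI-64 interface ID:
    insert FFFE between the upper and lower 24 bits, and invert the
    universal/local (U/L) bit, which is bit 2 of the first octet."""
    octets = [int(b, 16) for b in mac.split("-")]
    octets[0] ^= 0x02                       # flip the U/L bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]
    groups = ["%02x%02x" % (eui[i], eui[i + 1]) for i in range(0, 8, 2)]
    return ":".join(groups)

print(mac_to_eui64_interface_id("00-1E-10-2B-3C-4D"))  # 021e:10ff:fe2b:3c4d
```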
IPv4 addresses are classified into unicast, multicast, and broadcast addresses. Compared to IPv4, IPv6 has no broadcast address and introduces a new address type: anycast address. IPv6 addresses are classified into unicast, multicast, and anycast addresses. An IPv6 unicast address identifies an interface. Packets sent to an IPv6 unicast address are delivered to the interface identified by the unicast address. An IPv6 multicast address identifies a group of interfaces. Packets sent to an IPv6 multicast address are delivered to all the interfaces identified by the multicast address. An IPv6 anycast address identifies multiple interfaces. Packets sent to an anycast address are delivered to the nearest interface that is identified by the anycast address, depending on the routing protocols. In fact, anycast addresses and unicast addresses use the same address space. The router determines whether to send a packet in unicast mode or anycast mode.
Global unicast address An IPv6 global unicast address is an IPv6 address with a global unicast prefix, which is similar to an IPv4 public address. IPv6 global unicast addresses support route prefix summarization, helping limit the number of global routing entries. A global unicast address consists of a global routing prefix, subnet ID, and interface ID. • Global routing prefix: is assigned by a service provider to an organization. A global routing prefix is of at least 48 bits. Currently, the first 3 bits of all the assigned global routing prefixes are 001. • Subnet ID: is used by organizations to construct a local network (site). There are a maximum of 64 bits for both the global routing prefix and subnet ID. It is similar to an IPv4 subnet number. • Interface ID: refers to the interface identifier. It can be used to identify a device (host).
Link-local address Link-local addresses have a limited application scope. An IPv6 link-local address can be used only for communication between nodes on the same link. A link-local address uses a link-local prefix FE80::/10 as the first 10 bits (1111111010 in binary) and an interface ID as the last 64 bits. When IPv6 runs on a node, each interface of the node is automatically assigned a link-local address that consists of a fixed prefix and an interface ID in EUI-64 format. This mechanism enables two IPv6 nodes on the same link to communicate without any additional configuration. Therefore, link-local addresses are widely used in neighbor discovery and stateless address autoconfiguration. Routing devices do not forward IPv6 packets with the link-local address as a source or destination address to devices on nonlocal links.
Unique local address Unique local addresses are used only within a site. Site-local addresses are deprecated in RFC 3879 and replaced by unique local addresses in RFC 4193. Unique local addresses are similar to IPv4 private addresses. Any organization that does not obtain a global unicast address from a service provider can use a unique local address. Unique local addresses are routable only within a local network but not on the Internet. Fields in a unique local address can be described as follows: • Prefix: is fixed as FC00::/7. • L: is set to 1 if the address is valid within a local network. The value 0 is reserved for future expansion. • Global ID: indicates a globally unique prefix, which is pseudo-randomly allocated (for details, see RFC 4193). • Subnet ID: identifies a subnet within the site. • Interface ID: identifies an interface. A unique local address has the following characteristics: • Has a globally unique prefix. The prefix is pseudorandomly allocated and has a high probability of uniqueness. • Allows private connections between sites without creating address conflicts.
• Has a well-known prefix (FC00::/7) that allows for easy route filtering by edge routers. • Does not conflict with any other addresses or cause Internet route conflicts if it is leaked outside of the site through routing. • Functions as a global unicast address to upper-layer applications. • Is independent of the Internet Service Provider (ISP).
Unspecified address An IPv6 unspecified address is 0:0:0:0:0:0:0:0/128 or ::/128, indicating that an interface or a node does not have an IP address. It can be used as the source IP address of some packets, such as Neighbor Solicitation (NS) message in duplicate address detection. Devices do not forward the packets with the source IP address as an unspecified address. Loopback address An IPv6 loopback address is 0:0:0:0:0:0:0:1/128 or ::1/128. Similar to IPv4 loopback address 127.0.0.1, the IPv6 loopback address is used when a node needs to send IPv6 packets to itself. This IPv6 loopback address is usually used as the IP address of a virtual interface (a loopback interface for example). The loopback address cannot be used as the source or destination IP address of packets that need to be forwarded.
IPv6 multicast address Like an IPv4 multicast address, an IPv6 multicast address identifies a group of interfaces, which usually belong to different nodes. A node may belong to any number of multicast groups. Packets sent to an IPv6 multicast address are delivered to all the interfaces identified by the multicast address. An IPv6 multicast address is composed of a prefix, flag, scope, and group ID (global ID): • Prefix: is fixed as FF00::/8 (1111 1111). • Flag: is 4 bits long. Currently, only the last bit is used. The high-order 3 bits are reserved and must be set to 0s. The last bit 0 indicates a permanently-assigned multicast address allocated by the Internet Assigned Numbers Authority (IANA). The last bit 1 indicates a non-permanently-assigned (transient) multicast address. • Scope: is 4 bits long. It limits the scope where multicast data flows are sent on the network. • Group ID (global ID): is 112 bits long. It identifies a multicast group. RFC 2373 does not define all the 112 bits as a group ID but recommends using the low-order 32 bits as the group ID and setting all the remaining 80 bits to 0s.
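The prefix, flag, scope, and group ID fields can be extracted with simple bit operations. The function name is an assumption for this sketch; the bit layout follows the description above.

```python
import ipaddress

def parse_ipv6_multicast(addr):
    """Split an IPv6 multicast address into its flag, scope, and
    group ID fields: 8-bit FF prefix, 4-bit flag, 4-bit scope,
    112-bit group ID."""
    v = int(ipaddress.IPv6Address(addr))
    if (v >> 120) != 0xFF:
        raise ValueError("not an IPv6 multicast address")
    flag = (v >> 116) & 0xF
    scope = (v >> 112) & 0xF
    group_id = v & ((1 << 112) - 1)
    return flag, scope, group_id

# FF02::1 (all nodes): permanently assigned (flag 0), link-local scope (2).
print(parse_ipv6_multicast("FF02::1"))  # (0, 2, 1)
```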
IPv6 anycast address Anycast addresses are exclusive to IPv6. An anycast address identifies a group of interfaces, and this group of interfaces often belongs to different nodes. Packets sent to an anycast address are delivered to the nearest interface that is identified by the anycast address, depending on the routing protocols. IPv6 anycast addresses can be used in one-to-one-of-many communications. The receiver can be any one interface of a group. For example, a mobile subscriber needs to connect to the nearest receive station. Using anycast addresses, the mobile subscriber is not limited by physical locations. Anycast addresses are allocated from the unicast address space, using any of the unicast address formats. Thus, anycast addresses are syntactically indistinguishable from unicast addresses. The nodes to which an anycast address is assigned must be explicitly configured to know that it is an anycast address. Currently, anycast addresses are used only as destination addresses, and are assigned only to routers. A subnet-router anycast address is predefined in RFC 3513. The interface ID of a subnet-router anycast address is all 0s. Packets addressed to a subnet-router anycast address are delivered to a certain router (the nearest router that is identified by the address) in the subnet specified by the prefix of the address. The nearest router is defined as being closest in terms of routing distance.
An IPv6 packet has three parts: an IPv6 basic header, one or more IPv6 extension headers, and an upper-layer protocol data unit (PDU). IPv6 basic header • Each IPv6 packet must have an IPv6 basic header, which is fixed as 40 bytes long. • The IPv6 basic header provides basic packet forwarding information and will be parsed by all routers on the forwarding path. Extension headers • An IPv6 extension header is an optional header that may follow the IPv6 basic header. An IPv6 packet may carry zero, one, or more extension headers. The extension headers may be different in lengths. The IPv6 header and IPv6 extension header replace the IPv4 header and its options. The IPv6 extension header enhances IPv6 functions and has great extensibility. Unlike the Options of an IPv4 header, the maximum length of an IPv6 extension header is not limited. Therefore, an IPv6 extension header can contain all the extension data required by IPv6 communications. • The extension information about packet forwarding in an IPv6 extension header is not parsed by all the routers on the path, and is generally parsed by only the destination router.
Upper-layer protocol data unit • An upper-layer PDU is composed of the upper-layer protocol header and its payload such as an ICMPv6 packet, a TCP packet, or a UDP packet.
Fields in an IPv6 packet header are described as follows: Version: is 4 bits long. In IPv6, the Version field value is 6. Traffic Class: is 8 bits long. It indicates the class or priority of an IPv6 packet. The Traffic Class field is similar to the TOS field in an IPv4 packet and is mainly used in QoS control. Flow Label: is 20 bits long. This field is added in IPv6 to differentiate traffic. A flow label and source IP address identify a data flow. Intermediate network devices can effectively differentiate data flows based on this field. Payload Length: is 16 bits long, which indicates the length of the IPv6 payload. The payload is the rest of the IPv6 packet following the basic header, including the extension header and upper-layer PDU. This field indicates only the payload with the maximum length of 65535 bytes. If the payload length exceeds 65535 bytes, the field is set to 0. The payload length is expressed by the Jumbo Payload option in the Hop-by-Hop Options header. Next Header: is 8 bits long. This field identifies the type of the first extension header that follows the IPv6 basic header or the protocol type in the upper-layer PDU. Hop Limit: is 8 bits long. This field is similar to the Time to Live field in an IPv4 packet, defining the maximum number of hops that an IP packet can pass through. The field value is decremented by 1 by each router that forwards the IP packet. When the field value becomes 0, the packet is discarded. Source Address: is 128 bits long, which indicates the address of the packet originator. Destination Address: is 128 bits long, which indicates the address of the packet recipient.
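The fixed 40-byte basic header layout described above can be unpacked with a short sketch. The function name and dictionary keys are assumptions; the field widths and byte order follow the field descriptions.

```python
import struct

def parse_ipv6_basic_header(data):
    """Unpack the fixed 40-byte IPv6 basic header (network byte order):
    4-bit Version, 8-bit Traffic Class, 20-bit Flow Label, then
    Payload Length, Next Header, Hop Limit, and two 128-bit addresses."""
    vtc_fl, payload_len, next_header, hop_limit = struct.unpack(
        "!IHBB", data[:8])
    return {
        "version": vtc_fl >> 28,
        "traffic_class": (vtc_fl >> 20) & 0xFF,
        "flow_label": vtc_fl & 0xFFFFF,
        "payload_length": payload_len,
        "next_header": next_header,   # e.g. 58 = ICMPv6
        "hop_limit": hop_limit,
        "src": data[8:24],
        "dst": data[24:40],
    }

# A hypothetical header: version 6, 8-byte payload, ICMPv6, hop limit 64.
hdr = bytes([0x60, 0, 0, 0, 0, 8, 58, 64]) + bytes(32)
print(parse_ipv6_basic_header(hdr)["version"])  # 6
```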
IPv6 extension header An IPv4 packet header has an optional field (Options), which includes security, timestamp, and record route options. The variable length of the Options field makes the IPv4 packet header length range from 20 bytes to 60 bytes. When routers forward IPv4 packets with the Options field, many resources need to be used. Therefore, these IPv4 packets are rarely used in practice. IPv6 uses extension headers to replace the Options field in the IPv4 header. Extension headers are placed between the IPv6 basic header and upper-layer PDU. An IPv6 packet may carry zero, one, or more extension headers. The sender of a packet adds one or more extension headers to the packet only when the sender requests other routers or the destination device to perform special handling. Unlike IPv4, IPv6 has variable-length extension headers, which are not limited to 40 bytes. This facilitates further extension. To improve extension header processing efficiency and transport protocol performance, IPv6 requires that the extension header length be an integer multiple of 8 bytes. When multiple extension headers are used, the Next Header field of an extension header indicates the type of the next header following this extension header.
An IPv6 extension header contains the following fields: Next Header: is 8 bits long. It is similar to the Next Header field in the IPv6 basic header, indicating the type of the next extension header (if existing) or the upper-layer protocol type. Extension Header Len: is 8 bits long, which indicates the extension header length excluding the Next Header field. Extension Head Data: is of variable lengths. It includes a series of options and the padding field.
Each extension header can only occur once in an IPv6 packet, except for the Destination Options header. The Destination Options header may occur at most twice (once before a Routing header and once before the upper-layer header).
The Internet Control Message Protocol version 6 (ICMPv6) is one of the basic IPv6 protocols. In IPv4, ICMP reports IP packet forwarding information and errors to the source node. ICMP defines certain messages such as Destination Unreachable, Packet Too Big, Time Exceeded, and Echo Request or Echo Reply to facilitate fault diagnosis and information management. In addition to the common functions provided by ICMPv4, ICMPv6 provides mechanisms such as Neighbor Discovery (ND), stateless address autoconfiguration including duplicate address detection, and Path Maximum Transmission Unit (PMTU) discovery. The protocol number of ICMPv6, namely, the value of the Next Header field in an IPv6 packet, is 58. Some fields in the packet are described as follows: • Type: specifies the message type. Values 0 to 127 indicate the error message type, and values 128 to 255 indicate the informational message type. • Code: indicates a specific message type. • Checksum: indicates the checksum of an ICMPv6 packet.
Destination Unreachable message: When a data packet fails to be sent to the destination node or the upper-layer protocol, the router or destination node sends an ICMPv6 Destination Unreachable message to the source node. In an ICMPv6 Destination Unreachable message, the value of the Type field is 1. The value of the Code field can be 0, 1, 2, 3, or 4. Each value has a specific meaning (defined in RFC 2463): • Code=0: No route to the destination device. • Code=1: Communication with the destination device is administratively prohibited. • Code=2: Not assigned. • Code=3: Destination IP address is unreachable. • Code=4: Destination port is unreachable. Packet Too Big message If a data packet cannot be sent to the destination node because the size of the packet exceeds the link MTU of the outbound interface, the router sends an ICMPv6 Packet Too Big message to the source node. The link MTU of the outbound interface is carried in the message. PMTU discovery is implemented based on Packet Too Big messages. In a Packet Too Big message, the value of the Type field is 2 and the value of the Code field is 0.
Time Exceeded message If a router receives a packet with the hop limit being 0, it discards the data packet and sends an ICMPv6 Time Exceeded message to the source node. In a Time Exceeded message, the value of the Type field is 3. The value of the Code field can be 0 or 1. • Code=0: Hop limit exceeded in packet transmission • Code=1: Fragment reassembly timeout Parameter Problem message If an IPv6 node detects an error in the IPv6 packet header or extension header, the IPv6 node discards the data packet and sends an ICMPv6 Parameter Problem message to the source node, specifying the location and type of the error. In a Parameter Problem message, the value of the Type field is 4. The value of the Code field can be 0, 1, or 2. The 32-bit Pointer field indicates the location of the error. The Code field is defined as follows: • Code=0: A field in the IPv6 basic header or extension header is incorrect. • Code=1: The Next Header field in the IPv6 basic header or extension header cannot be identified. • Code=2: Unknown options exist in the extension header.
Echo Request message: The source node sends Echo Request messages to the destination node. In an Echo Request message, the value of the Type field is 128 and the value of the Code field is 0. The Identifier and Sequence Number fields are set by the source host so that Echo Reply messages can be matched with the Echo Request messages that triggered them.
Echo Reply message: After receiving an Echo Request message, the destination node responds with an Echo Reply message. In an Echo Reply message, the value of the Type field is 129 and the value of the Code field is 0. The Identifier and Sequence Number fields in the Echo Reply message carry the same values as those in the corresponding Echo Request message.
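The fixed layout of an Echo Request (Type 128, Code 0, then Identifier and Sequence Number) can be sketched with Python's `struct` module. This is an illustrative encoding only: the checksum is left as zero, whereas a real ICMPv6 checksum covers an IPv6 pseudo-header.

```python
import struct

def build_echo_request(identifier, seq, payload=b""):
    # ICMPv6 Echo Request header: Type=128, Code=0, Checksum (left 0 here),
    # then the 16-bit Identifier and Sequence Number fields.
    return struct.pack("!BBHHH", 128, 0, 0, identifier, seq) + payload

msg = build_echo_request(0x1234, 1, b"ping")
```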
IPv6 address resolution is completed at Layer 3, which brings the following advantages:
• Devices on different Layer 2 media can use the same address resolution protocol.
• Layer 3 security mechanisms, such as IPsec, can be used to prevent address resolution attacks.
• Request packets are sent in multicast mode, reducing performance pressure on Layer 2 networks.
Neighbor Solicitation (NS) packets and Neighbor Advertisement (NA) packets are used during address resolution. In an NS packet, the value of the Type field is 135 and the value of the Code field is 0; an NS packet is similar to an ARP Request packet in IPv4. In an NA packet, the value of the Type field is 136 and the value of the Code field is 0; an NA packet is similar to an ARP Reply packet in IPv4. The address resolution process is as follows: PC1 needs to resolve the link-layer address of PC2 before sending packets to PC2, so PC1 sends an NS message on the network.
In the NS message, the source IP address is the IPv6 address of PC1, and the destination IP address is the solicited-node multicast address of PC2 (composed of the prefix FF02::1:FF00:0/104 and the last 24 bits of the corresponding unicast address). The target address to be resolved is the IPv6 address of PC2, indicating that PC1 wants to learn the link-layer address of PC2. The Options field in the NS message carries the link-layer address of PC1. After receiving the NS message, PC2 replies with an NA message. In the NA message, the source address is the IPv6 address of PC2, and the destination address is the IPv6 address of PC1 (the NA message is sent to PC1 in unicast mode using the link-layer address of PC1 learned from the NS message). The Options field carries the link-layer address of PC2. This completes the address resolution process.
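The construction of a solicited-node multicast address (prefix FF02::1:FF00:0/104 combined with the last 24 bits of the unicast address) can be sketched as follows; the helper name is hypothetical:

```python
import ipaddress

def solicited_node(addr):
    """Derive the solicited-node multicast address FF02::1:FFxx:xxxx
    from the last 24 bits of a unicast IPv6 address."""
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)
```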
An IPv6 unicast address that has been assigned to an interface but not yet verified by DAD is called a tentative address. An interface cannot use a tentative address for unicast communication, but it joins two multicast groups: the all-nodes multicast group and the solicited-node multicast group of the tentative address. IPv6 DAD is similar to IPv4 gratuitous ARP. A node sends, to the solicited-node multicast group, an NS message whose target address is the tentative address. If the node receives an NA message in reply, the tentative address is already in use by another node, and the node will not use it for communication.
DAD process: An IPv6 address 2000::1 is assigned to PC1 as a tentative address. To check the validity of 2000::1, PC1 sends an NS message to the solicited-node multicast group to which 2000::1 belongs. The NS message carries the target address 2000::1. Because 2000::1 is still tentative, the source address of the NS message is the unspecified address (::). After receiving the NS message, PC2 processes it in one of the following ways:
• If 2000::1 is also a tentative address of PC2, PC2 will not use this address as an interface address and does not send an NA message.
• If 2000::1 is already in use on PC2, PC2 sends an NA message carrying IP address 2000::1 to the all-nodes multicast group. After receiving the message, PC1 finds that the tentative address is duplicate and does not use it.
IPv6 supports stateless address autoconfiguration: hosts obtain IPv6 prefixes and automatically generate interface IDs. Router discovery is the basis of IPv6 address autoconfiguration and is implemented through the following two messages:
Router Advertisement (RA) message: Each router periodically sends multicast RA messages that carry network prefixes and identifiers to declare its existence to the hosts and routers on the Layer 2 network. An RA message has a value of 134 in the Type field.
Router Solicitation (RS) message: After being connected to the network, a host immediately sends an RS message to obtain network prefixes. Routers on the network reply with an RA message. An RS message has a value of 133 in the Type field.
Address autoconfiguration: The process of IPv6 stateless autoconfiguration is as follows:
• The host automatically configures a link-local address based on its interface ID.
• The host sends an NS message for duplicate address detection.
• If an address conflict occurs, the host stops address autoconfiguration, and the address must then be configured manually.
• If addresses do not conflict, the link-local address takes effect. The host is connected to the network and can communicate with nodes on the local link.
• The host sends an RS message or receives the RA messages that routers periodically send.
• The host obtains an IPv6 address based on the prefix carried in the RA message and the interface ID generated in EUI-64 format.
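The EUI-64 interface ID mentioned in the last step is derived from the interface MAC address by inserting FFFE between its two halves and flipping the universal/local (U/L) bit. A minimal sketch, with a hypothetical helper name:

```python
def eui64_interface_id(mac):
    """Build a modified EUI-64 interface ID from a MAC address:
    insert FFFE in the middle and flip the universal/local (U/L) bit
    (bit 1 of the first byte)."""
    b = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    eui = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:]
    # Render as the four 16-bit groups of the interface ID.
    return ":".join(f"{eui[i] << 8 | eui[i + 1]:x}" for i in range(0, 8, 2))
```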
To help a host choose the optimal gateway router, a gateway router sends a Redirection message to notify the sender that packets should be sent through another gateway router. A Redirection message is carried in an ICMPv6 message, has the value of 137 in the Type field, and carries a better next-hop address and the destination address of the packets that need to be redirected. The redirection process is as follows: PC1 needs to communicate with PC2. By default, packets sent from PC1 to PC2 go through R1. After receiving packets from PC1, R1 finds that forwarding them through R2 is better, so R1 sends a Redirection message to notify PC1 that R2 is a better next hop. The destination address of PC2 is carried in the ICMPv6 Redirection message. After receiving the Redirection message, PC1 adds a host route to its routing table, and subsequent packets destined for PC2 are sent directly to R2. A router sends a Redirection message only when all the following conditions are met:
• The destination address of the packet is not a multicast address.
• The packet is not sent to the router itself.
• After route calculation, the outbound interface toward the next hop is the interface that received the packet.
• The router finds that the better next-hop IP address of the packet is on the same network segment as the source IP address of the packet.
• After checking the source address of the packet, the router finds a neighbor entry in which this address is used as the global unicast address or the link-local address of a neighboring device.
In IPv6, packets are fragmented only on the source node, reducing the burden on transit devices. PMTU discovery is implemented through ICMPv6 Packet Too Big messages. A source node first uses the MTU of its outbound interface as the PMTU and sends a probe packet. If a smaller MTU exists on the transmission path, the transit device sends a Packet Too Big message to the source node. The message contains the MTU value of the outbound interface on the transit device. After receiving the message, the source node sets the PMTU to the received MTU value and sends packets based on the new PMTU. This process repeats until the packets reach the destination, at which point the source node has obtained the PMTU of the path to the destination.
Process of PMTU discovery: Packets are transmitted through four links with MTU values of 1500, 1500, 1400, and 1300 bytes respectively. The source node first sends a packet based on PMTU 1500. When the packet reaches the outbound interface with MTU 1400, the router returns a Packet Too Big message that carries MTU 1400. After receiving the message, the source node fragments the packet based on MTU 1400 and sends the fragments again.
When the packet is sent to the outbound interface with MTU 1300, the router returns another Packet Too Big message that carries MTU 1300. The source node receives the message and fragments the packet based on MTU 1300. In this way, the source node sends the packet to the destination address and discovers the PMTU of the transmission path.
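The probe-and-shrink loop above can be simulated in a few lines; `discover_pmtu` is an illustrative helper in which each list element stands for the MTU of one link on the path:

```python
def discover_pmtu(link_mtus):
    """Simulate PMTU discovery: the source starts with its own interface
    MTU and lowers the PMTU each time a hop returns Packet Too Big,
    until the packet fits every link on the path."""
    pmtu = link_mtus[0]            # MTU of the source's outbound interface
    while True:
        for mtu in link_mtus:
            if pmtu > mtu:         # transit device returns Packet Too Big
                pmtu = mtu
                break              # source retries with the smaller PMTU
        else:
            return pmtu            # packet reached the destination
```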
RIPng makes the following modifications to RIP:
• RIPng uses UDP port 521 (RIP uses UDP port 520) to send and receive routing information.
• RIPng uses destination addresses with 128-bit prefixes (mask length).
• RIPng uses 128-bit IPv6 addresses as next-hop addresses.
• RIPng uses a link-local address (from FE80::/10) as the source address of RIPng Update packets.
• RIPng periodically sends routing information in multicast mode, using FF02::9 as the multicast address.
A RIPng packet consists of a header and multiple route table entries (RTEs). The maximum number of RTEs in a RIPng packet depends on the interface MTU.
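Assuming a 40-byte IPv6 header, an 8-byte UDP header, a 4-byte RIPng header, and 20-byte RTEs (the sizes defined in RFC 2080), the maximum number of RTEs per packet can be computed as a sketch:

```python
def max_rtes(interface_mtu):
    """Each RTE is 20 bytes; subtract the IPv6 header (40 bytes), the
    UDP header (8 bytes), and the 4-byte RIPng header from the MTU."""
    return (interface_mtu - 40 - 8 - 4) // 20
```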
OSPFv3 is based on links rather than network segments. OSPFv3 runs over IPv6 and operates per link rather than per network segment. Therefore, interfaces running OSPFv3 do not need to be in the same network segment; they only need to be on the same link. In addition, interfaces can set up OSPFv3 sessions without IPv6 global addresses. OSPFv3 does not depend on IP addresses. This separates topology calculation from IP addressing: OSPFv3 can calculate the topology without knowing IPv6 global addresses, which are required only on virtual-link interfaces for packet forwarding. OSPFv3 packet and LSA formats change. OSPFv3 packets do not contain IP addresses. OSPFv3 router LSAs and network LSAs do not contain IP addresses; prefixes are advertised in Link LSAs and Intra-Area-Prefix LSAs. In OSPFv3, router IDs, area IDs, and LSA link state IDs no longer indicate IP addresses, although the IPv4 address format is retained. On broadcast, NBMA, and P2MP networks, neighbors are identified by router IDs instead of IP addresses. Information about the flooding scope is added in OSPFv3 LSAs.
Information about the flooding scope is carried in the LSA Type field of OSPFv3 LSAs, so OSPFv3 routers can process LSAs of unrecognized types, which makes processing more flexible.
• OSPFv3 can store or flood LSAs of unrecognized types, whereas OSPFv2 simply discards them.
• The U flag bit in the LSA Type field controls how an unrecognized LSA is handled: the router either stores and floods it or treats it as having link-local flooding scope.
OSPFv3 supports multiple processes on a link. Only one OSPFv2 process can run on an OSPFv2 physical interface. In OSPFv3, one physical interface can be configured with multiple processes, which are identified by different instance IDs.
OSPFv3 uses IPv6 link-local addresses. As a routing protocol running over IPv6, OSPFv3 uses link-local addresses to maintain neighbor relationships and update LSDBs. Except for virtual-link interfaces, all OSPFv3 interfaces use link-local addresses as the source address and next-hop address of OSPFv3 packets. The advantages are as follows:
• OSPFv3 can calculate the topology without knowing global IPv6 addresses, so topology calculation is independent of IP addresses.
• Packets flooded on a link are not transmitted to other links, which prevents unnecessary flooding and saves bandwidth.
OSPFv3 packets do not contain authentication fields. OSPFv3 directly uses IPv6 authentication and security mechanisms, so OSPFv3 itself does not need to perform authentication and focuses only on packet processing.
OSPFv3 supports two new LSA types:
Link LSA: A router floods a link LSA on the link where it resides to advertise its link-local address and the configured global IPv6 prefixes.
Intra-Area-Prefix LSA: A router advertises an intra-area-prefix LSA within the local OSPF area to inform the other routers in the area of its IPv6 prefixes, or of the prefixes of a broadcast or NBMA network.
OSPFv3 identifies neighbors based on router IDs only. On broadcast, NBMA, and P2MP networks, OSPFv2 identifies neighbors based on IPv4 addresses of interfaces.
OSPFv3 identifies neighbors based on router IDs only. Thus, even if global IPv6 addresses are not configured or they are configured in different network segments, OSPFv3 can still establish and maintain neighbor relationships so that topology calculation is not based on IP addresses.
Extended IS-IS for IPv6 is defined in the draft-ietf-isis-ipv6-05 of the IETF. To process and calculate IPv6 routes, IS-IS uses two new TLVs and one network layer protocol identifier (NLPID). The two TLVs are as follows: TLV 236 (IPv6 Reachability): describes network reachability by defining the route prefix and metric. TLV 232 (IPv6 Interface Address): is similar to the IP Interface Address TLV of IPv4, except that it changes a 32-bit IPv4 address to a 128-bit IPv6 address. The NLPID is an 8-bit field that identifies the protocol packets of the network layer. The NLPID of IPv6 is 142 (0x8E). If IS-IS supports IPv6, it advertises routing information through the NLPID value.
To support multiple network layer protocols, BGP requires NLRI and Next_Hop attributes to carry information about network layer protocols. Therefore, MP-BGP uses the following new optional non-transitive attributes: MP_REACH_NLRI: indicates the multiprotocol reachable NLRI. It is used to advertise reachable routes and next hop information. MP_UNREACH_NLRI: indicates the multiprotocol unreachable NLRI. It is used to withdraw unreachable routes.
Multicast Listener Discovery (MLD) is a protocol that manages IPv6 multicast group members and has principles and functions similar to those of IGMP. MLD enables an IPv6 router to discover its directly connected multicast listeners (nodes that want to receive multicast data) and learn which multicast addresses its neighbor nodes are interested in. MLD then delivers the learned information to the multicast routing protocol running on the router, ensuring that multicast data can be sent to all links where receivers reside.
Querier election mechanism The working mechanism is similar to IGMPv2: • Each MLD router considers itself as a querier when it starts and sends a General Query message with destination address FF02::1 to all hosts and routers on the local network segment. • When the routers receive a General Query message, they compare the source IPv6 address of the message with their own interface IPv6 address. The router with the smallest IPv6 address becomes the querier, and the other routers are considered non-queriers. • All non-queriers start a timer (Other Querier Present Timer). If non-queriers receive a Query message from the querier before the timer expires, they reset the timer. If non-queriers receive no Query message from the querier when the timer expires, they trigger election of a new querier. Member join mechanism PC2 and PC3 need to receive IPv6 multicast data destined for IPv6 multicast group G1, and PC1 needs to receive IPv6 multicast data destined for IPv6 multicast group G2. The hosts need to join their respective multicast groups, and then the MLD querier (R1) needs to maintain IPv6 group memberships.
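The querier election rule described above (the router with the smallest source IPv6 address wins) can be sketched as follows; the election compares the numeric value of the addresses, and the helper name is ours:

```python
import ipaddress

def elect_querier(router_addrs):
    """MLD querier election sketch: the router with the numerically
    smallest IPv6 address on the segment becomes the querier."""
    return min(router_addrs, key=lambda a: int(ipaddress.IPv6Address(a)))
```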
The query and report process is as follows: • Hosts send Multicast Listener Report messages to the IPv6 multicast groups that they want to join without waiting to receive a Query message from the MLD querier. • The MLD querier (R1) periodically multicasts General Query messages with destination address FF02::1 to all hosts and routers on the local network segment. • After PC2 and PC3 receive the Query message, the host whose delay timer expires first sends a Report message to G1. If the delay timer of PC2 expires first, PC2 multicasts a Report message to G1, declaring that it belongs to G1. All hosts on the local network segment can receive the Report message sent from PC2 to G1. When PC3 receives this Report message, it does not send the same Report message to G1 because MLD routers (R1 and R2) have known that G1 has members on the local network segment. This mechanism suppresses duplicate Report messages, reducing information traffic on the local network segment. • PC1 still needs to multicast a Report message to G2, declaring that it belongs to G2. • After receiving the Report messages, MLD routers know that multicast groups G1 and G2 have members on the local network segment. Then the routers use IPv6 multicast routing protocols (such as IPv6 PIM) to create (*, G1) and (*, G2) entries for multicast data forwarding, in which * stands for any multicast source. • When IPv6 multicast data sent from an IPv6 multicast source reaches the MLD routers through multicast routes, the MLD routers forward the received multicast data to the local network segment because they have (*, G1) and (*, G2) entries. Subsequently, receiver hosts can receive the IPv6 multicast data. Member Leave Mechanism The host sends a Done message with destination address FF02::2 to all IPv6 multicast routers on the local network segment.
When the MLD querier receives the Done message, it sends a Multicast-Address-Specific Query message to the IPv6 multicast group that the host wants to leave. The destination address and group address of the Query message are the address of this IPv6 multicast group. If the IPv6 multicast group has other members on the network segment, the members send a Report message within the maximum response time. If the querier receives the Report messages from other members within the maximum response time, the querier continues to maintain memberships of the IPv6 multicast group. Otherwise, the querier considers that the IPv6 multicast group has no member on the local network segment and stops maintaining memberships of the IPv6 multicast group.
IPv6 multicast source filtering: MLDv2 supports IPv6 multicast source filtering and defines two filter modes: INCLUDE and EXCLUDE. When a host joins an IPv6 multicast group G, the host can choose to accept or reject IPv6 multicast data from a specific source S:
• If the host only needs to receive data sent from sources S1, S2, and so on, the host sends a Report message with an INCLUDE Sources (S1, S2, …) record.
• If the host wants to reject data sent from sources S1, S2, and so on, the host sends a Report message with an EXCLUDE Sources (S1, S2, …) record.
IPv6 multicast group status tracking: Multicast routers running MLDv2 maintain IPv6 multicast group state per multicast address, per attached link. The state includes:
• Filter mode: The MLD querier tracks whether the state is INCLUDE or EXCLUDE.
• Source list: The MLD querier tracks the sources that are added or deleted.
• Timers: a filter timer, after whose expiry the querier switches the multicast address back to INCLUDE mode, and source timers for the source records.
Receiver Host Status Listening Multicast routers running MLDv2 listen to the receiver host status to record and maintain information about hosts that join IPv6 multicast groups on the local network segment.
IPv4/IPv6 dual stack is an efficient technology for IPv4-to-IPv6 transition. In IPv4/IPv6 dual stack, network devices support both the IPv4 and IPv6 protocol stacks. The source device selects a protocol stack according to the IP address of the destination device, and network devices between the source and destination select a protocol stack to process and forward packets according to the protocol type of each packet. IPv4/IPv6 dual stack can be implemented on a single device or on a dual-stack backbone network. On a dual-stack backbone network, all devices must support IPv4/IPv6 dual stack, and interfaces connected to the dual-stack network must be configured with both IPv4 and IPv6 addresses. The topology is described as follows: The host sends a DNS request to the DNS server for the IP address of domain name www.huawei.com. The DNS server replies with the requested IP address, which may be 10.1.1.1 or 3ffe:yyyy::1. If the host sends an A query, the DNS server replies with the IPv4 address of the domain name; if the host sends an AAAA query, the DNS server replies with the IPv6 address of the domain name.
R1 in the figure supports IPv4/IPv6 dual stack. If the host needs to access the network server at IPv4 address 10.1.1.1, it does so through the IPv4 protocol stack of R1. If the host needs to access the network server at IPv6 address 3ffe:yyyy::1, it does so through the IPv6 protocol stack of R1.
During early transition, IPv4 networks are widely deployed, while IPv6 networks are isolated islands. IPv6 over IPv4 tunneling allows IPv6 packets to be transmitted on an IPv4 network, interconnecting all IPv6 islands.
Principles are as follows: IPv4/IPv6 dual stack is enabled and an IPv6 over IPv4 tunnel is deployed on edge routing devices. After an edge routing device receives a packet from the IPv6 network, the device appends an IPv4 header to the IPv6 packet to encapsulate the IPv6 packet as an IPv4 packet if the destination address of the packet is not the device and the outbound interface of the packet is a tunnel interface. On the IPv4 network, the encapsulated packet is transmitted to the remote edge routing device. The remote edge routing device decapsulates the packet, removes the IPv4 header, and then sends the decapsulated IPv6 packet to the connected IPv6 network. The IPv4 address of the source end of an IPv6 over IPv4 tunnel must be manually configured, but the IPv4 address of the destination end can be manually configured or automatically obtained. An IPv6 over IPv4 tunnel can be a manual or an automatic tunnel depending on how the destination end of the tunnel obtains its IPv4 address.
Manual tunnel: The edge routing device cannot automatically obtain the IPv4 address of the destination end, which must be manually configured so that the packets can be correctly forwarded to the tunnel end. Automatic tunnel: The edge routing device can automatically obtain the IPv4 address of the destination end and does not require you to manually configure an IPv4 address for the destination end. In most cases, two interfaces on both ends of an automatic tunnel use IPv6 addresses that contain embedded IPv4 addresses so that the destination IPv4 address can be extracted from the destination IPv6 address of IPv6 packets.
If an edge routing device needs to set up manual tunnels with multiple devices, multiple tunnels must be configured on it, which makes configuration complex. Therefore, a manual tunnel is usually set up between two edge routing devices to connect two IPv6 networks. A manual tunnel has the following advantage and disadvantage:
Advantage: applies to any environment in which IPv6 traffic must traverse an IPv4 network.
Disadvantage: must be manually configured.
Packets are transmitted over an IPv6 over IPv4 manual tunnel as follows: When an edge device of the tunnel receives an IPv6 packet from an IPv6 network, the device searches the IPv6 routing table according to the destination address of the packet. If the packet is to be forwarded through the virtual tunnel interface, the device encapsulates the packet using the source and destination IPv4 addresses configured on the tunnel interface. The encapsulated packet becomes an IPv4 packet, which is then processed by the IPv4 protocol stack and forwarded to the destination end of the tunnel over the IPv4 network. After the destination end of the tunnel receives the IPv4 packet, it decapsulates the packet and sends the decapsulated IPv6 packet to the IPv6 protocol stack.
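The encapsulation step can be illustrated by prepending a minimal IPv4 header carrying protocol number 41 (IPv6-in-IPv4) to the IPv6 payload. This is a sketch only: the header checksum is left as zero and no IPv4 options are supported.

```python
import struct

def encapsulate_6in4(src_v4, dst_v4, ipv6_packet):
    """Prepend a minimal 20-byte IPv4 header (protocol 41 = IPv6-in-IPv4)
    to an IPv6 packet, as a tunnel endpoint would."""
    total_len = 20 + len(ipv6_packet)
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total_len,                   # version/IHL, TOS, total length
        0, 0,                                 # identification, flags/fragment
        64, 41, 0,                            # TTL, protocol 41, checksum (0)
        bytes(map(int, src_v4.split("."))),   # tunnel source IPv4 address
        bytes(map(int, dst_v4.split("."))))   # tunnel destination IPv4 address
    return header + ipv6_packet
```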
An IPv6 over IPv4 GRE tunnel uses standard GRE tunneling technology to provide a point-to-point connection and requires tunnel endpoint addresses to be manually configured. GRE tunnels have no limitations on the encapsulation protocol and transport protocol, which can be any protocol such as IPv4, IPv6, OSI, or Multiprotocol Label Switching (MPLS). Packet forwarding on an IPv6 over IPv4 GRE tunnel is similar to that on an IPv6 over IPv4 manual tunnel.
The destination address of IPv6 packets transmitted over an automatic IPv4-compatible IPv6 tunnel is an IPv4-compatible IPv6 address (the special address used by this type of automatic tunnel). An IPv4-compatible IPv6 address is an IPv6 unicast address that has zeros in the high-order 96 bits and an IPv4 address in the low-order 32 bits. Disadvantages of an automatic IPv4-compatible IPv6 tunnel: each host on both ends must have a valid IPv4 address and support IPv4/IPv6 dual stack and automatic IPv4-compatible IPv6 tunnels, so such tunnels cannot be deployed on a large scale. Automatic IPv4-compatible IPv6 tunnels have now been replaced by automatic 6to4 tunnels. The packet forwarding process is as follows: After R1 receives an IPv6 packet destined for R2, R1 searches for an IPv6 route according to destination address ::2.1.1.1 and finds that the outbound interface is a tunnel interface. The tunnel configured on R1 is an automatic IPv4-compatible IPv6 tunnel, so R1 encapsulates the IPv6 packet into an IPv4 packet. In the IPv4 packet, the source address is the tunnel source address 1.1.1.1, and the destination address is the low-order 32 bits of the IPv4-compatible IPv6 address ::2.1.1.1, namely 2.1.1.1. The IPv4 packet is forwarded by the tunnel interface on R1 over the IPv4 network to R2 at 2.1.1.1.
After R2 receives the IPv4 packet, it decapsulates the IPv4 packet to obtain the IPv6 packet and sends the IPv6 packet to the IPv6 protocol stack for processing. An IPv6 packet is sent from R2 to R1 following a similar process.
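Extracting the embedded IPv4 address from an IPv4-compatible IPv6 address such as ::2.1.1.1 is a simple bit operation; the helper name below is ours:

```python
import ipaddress

def embedded_ipv4(compat_addr):
    """Extract the IPv4 address embedded in the low-order 32 bits of an
    IPv4-compatible IPv6 address (the high-order 96 bits are zero)."""
    v6 = int(ipaddress.IPv6Address(compat_addr))
    assert v6 >> 32 == 0, "not an IPv4-compatible address"
    return str(ipaddress.IPv4Address(v6 & 0xFFFFFFFF))
```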
An automatic 6to4 tunnel is also a kind of automatic tunnel and is set up using the IPv4 address embedded in an IPv6 address. Unlike an automatic IPv4-compatible IPv6 tunnel, a 6to4 automatic tunnel can be set up from router to router, from host to router, from router to host, and from host to host. The address format is as follows:
FP: the format prefix of aggregatable global unicast addresses, fixed at 001.
TLA: top-level aggregator, fixed at 0x0002.
SLA: site-level aggregator.
A 6to4 address starts with the prefix 2002::/16 and has the format 2002:IPv4-address::/48. A 6to4 address has a 64-bit network prefix, in which the first 48 bits (2002:a.b.c.d) are determined by the IPv4 address assigned to a router interface and cannot be changed, while the last 16 bits (the SLA ID) can be configured by the user.
An IPv4 address can only be used as the source address of one 6to4 tunnel. If one edge router connects to multiple 6to4 networks and uses the same IPv4 address as the tunnel source address, SLA IDs in 6to4 addresses are used to differentiate the 6to4 networks. These 6to4 networks, however, share the same 6to4 tunnel.
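Deriving the 6to4 /64 prefix from an interface IPv4 address and a user-chosen SLA ID can be sketched as follows (the helper name is hypothetical):

```python
import ipaddress

def sixto4_prefix(ipv4, sla=0):
    """Build a 6to4 /64 prefix: the fixed 16-bit 2002 prefix, the 32-bit
    IPv4 address of the router interface, then a 16-bit SLA ID."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    prefix = (0x2002 << 112) | (v4 << 80) | (sla << 64)
    return ipaddress.IPv6Network((prefix, 64))
```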
Common IPv6 networks need to communicate with 6to4 networks over IPv4 networks, which is achieved through 6to4 relays. A 6to4 relay is a next-hop device that forwards IPv6 packets whose destination address is not a 6to4 address but whose next-hop address is a 6to4 address. The tunnel destination IPv4 address is extracted from the next-hop 6to4 address. If a host on 6to4 network 2 needs to communicate with devices on the common IPv6 network, a route to the IPv6 network must be configured on the edge router, with the next-hop address set to the 6to4 address of the 6to4 relay. The 6to4 address of the relay corresponds to the source address of the 6to4 tunnel. Packets sent from 6to4 network 2 to the IPv6 network are first sent to the 6to4 relay according to the next hop specified in the routing table, and the 6to4 relay then forwards the packets to the IPv6 network. When a packet needs to be sent from the IPv6 network to 6to4 network 2, the 6to4 relay encapsulates the packet as an IPv4 packet according to the destination address (a 6to4 address) of the packet so that the packet can be delivered to 6to4 network 2.
Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) is another automatic tunneling mechanism. An ISATAP tunnel also uses an IPv6 address with an embedded IPv4 address, but an ISATAP address uses the IPv4 address as the interface identifier, whereas a 6to4 address uses the IPv4 address as part of the network prefix. The address is described as follows: If the embedded IPv4 address is globally unique, the u bit is 1; otherwise, the u bit is 0. The g bit is the individual/group bit. An ISATAP address can be a global unicast address, link-local address, unique local address, or multicast address. The first 64 bits of an ISATAP address are obtained through a request sent to an ISATAP router and can therefore be configured automatically. The Neighbor Discovery (ND) protocol can run between the devices on both ends of an ISATAP tunnel. An ISATAP tunnel treats the IPv4 network as a non-broadcast multi-access (NBMA) network. The forwarding process is described as follows: The IPv4 network has two dual-stack hosts, PC2 and PC3, each of which has a private IPv4 address. To implement the ISATAP function, perform the following operations:
• Configure ISATAP tunnel interfaces. The hosts generate ISATAP interface IDs according to their IPv4 addresses.
• The hosts then generate link-local IPv6 addresses based on the ISATAP interface IDs. The two hosts then have IPv6 communication capabilities on the local link.
• The hosts perform address autoconfiguration and obtain IPv6 global unicast addresses and ULA addresses.
• A host extracts the destination IPv4 address from the next-hop IPv6 address and forwards packets through the tunnel interface to communicate with another IPv6 host. If the destination host is within the local site, the next hop is the destination host itself; if the destination host is in a different site, the next hop is the ISATAP router.
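Forming an ISATAP link-local address from an embedded IPv4 address, including the u-bit handling described earlier, can be sketched as follows (helper name hypothetical):

```python
import ipaddress

def isatap_link_local(ipv4, globally_unique=False):
    """Form an ISATAP link-local address FE80::[0200:]5EFE:a.b.c.d.
    The u bit of the interface ID is set (0200:5EFE) when the embedded
    IPv4 address is globally unique."""
    iid = (0x02005EFE if globally_unique else 0x00005EFE) << 32
    iid |= int(ipaddress.IPv4Address(ipv4))
    return ipaddress.IPv6Address((0xFE80 << 112) | iid)
```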
During a later stage of IPv4-to-IPv6 transition, IPv6 networks will be widely deployed, while IPv4 networks will be isolated islands around the world. You can create a tunnel on an IPv6 network to connect isolated IPv4 sites so that they can access other IPv4 networks through the IPv6 public network. The forwarding process is described as follows: IPv4/IPv6 dual stack is enabled and an IPv4 over IPv6 tunnel is deployed on edge routing devices. After an edge routing device receives a packet from the connected IPv4 network, it adds an IPv6 header to the IPv4 packet to encapsulate it as an IPv6 packet if the destination address of the packet is not the routing device itself. On the IPv6 network, the encapsulated packet is transmitted to the remote edge routing device, which decapsulates the packet, removes the IPv6 header, and sends the decapsulated IPv4 packet to the connected IPv4 network.
Example description: The device addresses are determined as follows: • If RTX connects to RTY, the addresses of the two devices are 2001:XY::X/64 and 2001:XY::Y/64 respectively.
The commands and their functions are as follows: ripng: creates a RIPng process. ripng enable: enables RIPng on an interface. ripng metricout: sets the metric added to RIPng routes advertised by an interface. import-route: configures RIPng to import routes from other routing protocols. You can use the route-policy parameter to filter the routes to be imported and set route attributes. Precautions: Policy usage is similar to that in IPv4.
Example description: The device addresses are determined as follows: • If RTX connects to RTY, the addresses of the two devices are 2001:XY::X/64 and 2001:XY::Y/64 respectively.
The commands and their functions are as follows: router-id: configures the ID of the router running OSPFv3. ospfv3 area: enables an OSPFv3 process on an interface and specifies the area to which the interface belongs. nssa: configures an OSPFv3 area as an NSSA. undo ipv6 nd ra halt: enables the system to send RA packets. ipv6 address auto global: enables a device to automatically generate a global IPv6 address through stateless autoconfiguration. Precautions: OSPFv3 has similar features to OSPFv2.
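A minimal configuration sketch using the commands above. The router ID, process ID, area number, and interface names are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
ospfv3 1
 router-id 1.1.1.1                       // OSPFv3 requires an explicit 32-bit router ID
 area 1
  nssa                                   // configure area 1 as an NSSA
#
interface GigabitEthernet0/0/1
 ipv6 enable
 ospfv3 1 area 1                         // run OSPFv3 process 1 on the interface in area 1
 undo ipv6 nd ra halt                    // allow the interface to send RA packets
#
interface GigabitEthernet0/0/2
 ipv6 enable
 ipv6 address auto global                // stateless autoconfiguration of a global address
```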
Example description: The device addresses are determined as follows: • If RTX connects to RTY, the addresses of the two devices are 2001:XY::X/64 and 2001:XY::Y/64 respectively.
The commands and their functions are as follows: ipv6 enable: enables the IPv6 capability of an IS-IS process. ipv6 nd ra prefix: configures the prefix carried in RA packets. isis ipv6 enable: enables the IS-IS IPv6 capability on an interface and specifies the ID of the IS-IS process to be associated with the interface. ipv6 import-route isis level-2 into level-1: imports IPv6 routes from the Level-2 area into Level-1 areas. Precautions: IS-IS IPv6 has similar features to IS-IS IPv4.
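The commands above can be sketched as follows. The NET, process ID, addresses, and interface names are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
isis 1
 is-level level-1-2
 network-entity 10.0000.0000.0001.00
 ipv6 enable                             // enable IPv6 for IS-IS process 1
 ipv6 import-route isis level-2 into level-1   // leak Level-2 IPv6 routes into Level-1
#
interface GigabitEthernet0/0/1
 ipv6 enable
 ipv6 address 2001:12::1/64
 isis ipv6 enable 1                      // associate the interface with process 1 for IPv6
```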
Example description: The device addresses are determined as follows: • If RTX connects to RTY, the addresses of the two devices are 2001:XY::X/64 and 2001:XY::Y/64 respectively.
The commands and their functions are as follows: peer { ipv6-address | group-name } as-number as-number: creates a peer or configures an AS number for a specified peer or peer group. ipv6-family: enters the IPv6 address family view of BGP. peer enable: enables a BGP device to exchange routes with a specified peer or peer group in the address family view. peer connect-interface: specifies the source interface from which BGP packets are sent and the source address used for initiating a connection. peer password: enables a BGP device to perform MD5 authentication on BGP messages exchanged during the establishment of a TCP connection with a peer. Precautions: BGP4+ has similar features to BGP.
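A minimal BGP4+ sketch combining the commands above. The AS numbers, peer address, and authentication key are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
bgp 65001
 router-id 1.1.1.1
 peer 2001:12::2 as-number 65001                 // IBGP peer reached over IPv6
 peer 2001:12::2 connect-interface LoopBack0     // source the TCP session from LoopBack0
 peer 2001:12::2 password cipher Huawei@123      // MD5 authentication; the key is illustrative
 #
 ipv6-family unicast
  peer 2001:12::2 enable                         // activate the peer in the IPv6 address family
```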
Example description: IPv6 and IPv4 addresses have been specified.
The commands and their functions are as follows: interface tunnel: creates a tunnel interface and displays the tunnel interface view. tunnel-protocol ipv6-ipv4: sets the tunnel mode to IPv6 over IPv4 manual tunnel. source { ipv4-address | interface-type interface-number }: specifies the source address or source interface of a tunnel. destination { ipv4-address }: specifies the destination address of a tunnel. ipv6 address { ipv6-address prefix-length }: configures IPv6 addresses for tunnel interfaces.
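A minimal manual-tunnel sketch based on the commands above. The tunnel number and addresses are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
interface Tunnel0/0/1
 tunnel-protocol ipv6-ipv4               // manual IPv6 over IPv4 tunnel
 source 12.1.1.1                         // local IPv4 tunnel endpoint
 destination 12.1.1.2                    // remote IPv4 tunnel endpoint
 ipv6 enable
 ipv6 address 2001:12::1/64              // IPv6 address used across the tunnel
```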
Example description: IPv6 and IPv4 addresses have been specified.
The commands and their functions are as follows: interface tunnel: creates a tunnel interface and displays the tunnel interface view. tunnel-protocol gre: sets the tunnel mode to IPv6 over IPv4 GRE tunnel. source { ipv4-address | interface-type interface-number }: specifies the source address or source interface of the tunnel. destination { ipv4-address }: specifies the destination address of a tunnel. ipv6 address { ipv6-address prefix-length }: configures IPv6 addresses for tunnel interfaces.
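The GRE variant differs from the manual tunnel only in the tunnel protocol; the following sketch uses an interface as the tunnel source to show the second form of the source command. Names and addresses are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
interface Tunnel0/0/1
 tunnel-protocol gre                     // IPv6 over IPv4 GRE tunnel
 source GigabitEthernet0/0/1             // the source may be an address or an interface
 destination 12.1.1.2
 ipv6 enable
 ipv6 address 2001:12::1/64
```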
MPLS VPN overview A BGP/MPLS IP VPN is a Layer 3 Virtual Private Network (L3VPN). It uses the Border Gateway Protocol (BGP) to advertise VPN routes and uses Multiprotocol Label Switching (MPLS) to forward VPN packets on the backbone network of the Service Provider (SP). This technology is called IP VPN because IP packets are transmitted on VPNs. The BGP/MPLS IP VPN model consists of the following entities: • Customer Edge (CE): a device that is deployed at the edge of a customer network and has interfaces directly connected to the SP network. A CE device can be a router, switch, or host. Generally, CE devices are not aware of VPNs and do not need to support MPLS. • Provider Edge (PE): a device that is deployed at the edge of an SP network and directly connected to a CE device. On an MPLS network, PE devices process all VPN services and must have high performance. • Provider (P): a backbone device that is deployed on an SP network and is not directly connected to CE devices. P devices only need to provide basic MPLS forwarding capabilities and do not maintain VPN information. PE and P devices are managed by SPs. CE devices are managed by customers unless customers authorize SPs to manage their CE devices.
A PE device can connect to multiple CE devices. A CE device can connect to multiple PE devices of the same SP or different SPs.
Site
A site is a group of IP systems with IP connectivity, which can be achieved independently of ISP networks. Sites are classified based on the topological relationships between devices, not their geographic locations, although the devices in a site are geographically adjacent to each other in most situations. The devices in a site may belong to multiple VPNs; that is, a site may belong to more than one VPN.
Different VPN sites can use overlapping address spaces.
A PE device establishes and maintains a VPN instance for each directly connected site. A VPN instance contains VPN member interfaces and routes of the corresponding site. Specifically, information in a VPN instance includes the IP routing table, label forwarding table, interface bound to the VPN instance, and VPN instance management information. VPN instance management information includes the route distinguisher (RD), route filtering policy, and member interface list of the VPN instance. A public routing and forwarding table and a VRF differ in the following aspects: A public routing table contains IPv4 routes of all the PE and P devices. The routes are static routes or dynamic routes generated by routing protocols on the backbone network. A VPN routing table contains routes of all sites that belong to a VPN instance. The routes are obtained through the exchange of VPN routing information between PE devices or between CE and PE devices. Information in a public forwarding table is extracted from the public routing table according to route management policies, whereas information in a VPN forwarding table is extracted from the corresponding VPN routing table.
VPN instances on a PE device are independent of each other and maintain a VRF independent of the public routing and forwarding table. Each VPN instance can be considered as a virtual device, which maintains an independent address space and connects to VPNs through interfaces.
The PE devices use Multiprotocol Extensions for BGP-4 (MP-BGP) to advertise VPN routes and use the VPN-IPv4 address family to solve the problem that BGP cannot distinguish VPN routes with the same IP address prefix. RDs distinguish IPv4 prefixes that use the same address space. The RD format enables SPs to allocate RDs independently. When CE devices are dual-homed to PE devices, however, RDs must be globally unique to ensure correct routing.
A VPN target, also called the route target (RT), is a 32-bit BGP extended community attribute. BGP/MPLS IP VPN uses VPN targets to control the advertisement of VPN routes. A VPN instance is associated with one or more VPN target attributes. VPN target attributes are classified into the following types: Export target: After a PE device learns IPv4 routes from directly connected sites, it converts the routes to VPN-IPv4 routes and sets the export target attribute for those routes. The export target attribute is advertised with the routes as a BGP extended community attribute. Import target: After a PE device receives VPN-IPv4 routes from other PE devices, it checks their export target attribute. If the export target is the same as the import target of a VPN instance on the local PE device, the local PE device adds the route to the VPN routing table of that instance. A VPN target therefore defines which sites can receive a VPN route and the routes of which sites a PE device can receive. The reasons for using the VPN target instead of the RD as the extended community attribute are as follows:
A VPN-IPv4 route has only one RD but can be associated with multiple VPN targets. With multiple extended community attributes, BGP greatly improves the flexibility and scalability of a network. VPN targets can also control route advertisement between different VPNs on a PE device: with properly configured VPN targets, different VPN instances on a PE device can import routes from each other.
Traditional BGP-4 defined in RFC 1771 can manage only the IPv4 routes but cannot process VPN routes that have overlapping address spaces. To correctly process VPN routes, VPNs use MP-BGP defined in RFC 2858 (Multiprotocol Extensions for BGP-4). MP-BGP supports multiple network layer protocols. Network layer protocol information is contained in the Network Layer Reachability Information (NLRI) field and the Next Hop field of an MP-BGP Update message. MP-BGP uses the address family to differentiate network layer protocols. An address family can be a traditional IPv4 address family or any other address family, such as a VPN-IPv4 address family or an IPv6 address family. For the values of address families, see RFC 1700 (Assigned Numbers).
The PE and CE devices exchange routing information through standard BGP, OSPF, IS-IS, RIP or static routes. During the process, the PE device needs to store routes received from the CE devices to different VRFs. Other operations are the same as those for common route exchange. You can configure the same routing protocol for all the CE devices. However, you must configure different instances for each VRF of a PE device. The instances do not interfere with each other.
After PE1 receives an IPv4 route from CE1, PE1 adds the manually configured RD of the VRF to the route to turn the IPv4 route into a VPNv4 route. PE1 then changes the Next_Hop attribute in the Update message to its own loopback address and adds a VPN label (allocated by the PE and advertised through MP-IBGP) to the route. After that, PE1 adds the export route target attribute to the route and sends the route to all its PE neighbors. In VRP5.3, after MPLS is enabled on PE1, PE1 uses MP-BGP to allocate VPN labels to private network routes. PE devices can then correctly exchange VPN routes. When multiple CE devices in a VPN site connect to different PE devices, VPN routes advertised from the CE devices to the PE devices may be sent back to the VPN site after the routes traverse the backbone network. This may cause routing loops in the VPN site. The Site of Origin (SoO) attribute identifies the source site and prevents such routing loops.
After PE2 receives a VPNv4 route advertised by PE1, PE2 converts the VPNv4 route into an IPv4 route and adds the IPv4 route to the corresponding VRF based on the import target attribute of the route. The VPN label of the route is retained for packet forwarding. PE2 forwards the IPv4 route to the corresponding CE device through the routing protocol between the PE and CE devices. The next hop in the route is the IP address of PE2's interface.
Data exchanged between VPN sites needs to be forwarded across the MPLS backbone network based on MPLS labels. The process for allocating public network labels (outer labels) is as follows: The PE and P routers learn BGP next-hop IP addresses using an IGP, assign outer labels using LDP, and establish LSPs. A label stack is used for packet forwarding. The outer label directs packets to the BGP next hop. The inner label indicates the outbound interface for the packet or the VPN instance to which the packet belongs. MPLS forwarding is based only on outer labels and is irrelevant to the inner labels.
CE2 sends an IP packet destined for CE1. After receiving the packet, PE2 encapsulates an inner label 15362 and then an outer label 1024 to the packet and forwards the packet to the P device. After receiving the packet, the penultimate hop P pops out the outer label, retains the inner label, and forwards the packet to PE1 based on the outer label. PE1 determines the VPN site to which the packet belongs based on the inner label, removes the inner label, and forwards the packet to CE1.
Case description In this case, the addresses for interconnecting devices are as follows: • If RTX interconnects with RTY, the addresses are XY.1.1.X and XY.1.1.Y, network mask is 24. • Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage ip binding vpn-instance: binds the current AC interface to a specified VPN instance. ipv4-family: enters the IPv4 address family view of BGP.
Precautions After a VPN instance is bound to or unbound from an interface, Layer 3 features such as the IP address and routing protocol configuration are deleted from the interface. If such features are required, you need to reconfigure them.
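The VPN instance creation, interface binding, and address family commands can be combined into a minimal sketch. The instance name, RD, RT, and addresses are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
ip vpn-instance VPNA                     // VPNA is a hypothetical instance name
 route-distinguisher 100:1
 vpn-target 100:1 both                   // use 100:1 as both export and import target
#
interface GigabitEthernet0/0/1
 ip binding vpn-instance VPNA            // binding clears the interface's L3 configuration
 ip address 10.1.1.2 255.255.255.0       // so the IP address must be reconfigured
#
bgp 100
 ipv4-family vpn-instance VPNA           // enter the VPN instance IPv4 address family view
  import-route direct
```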
Case description In this case, the addresses for interconnecting devices are as follows: • If RTX interconnects with RTY, the addresses are XY.1.1.X and XY.1.1.Y, network mask is 24. • Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage ip binding vpn-instance: binds the current AC interface to a specified VPN instance. ipv4-family: enters the IPv4 address family view of BGP.
Precautions Specify a VPN instance for each RIP process on the PE device.
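On the PE, the RIP process is bound to the VPN instance and its routes are redistributed into MP-BGP, as in this sketch. Process IDs, the instance name, and the network are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
rip 100 vpn-instance VPNA                // RIP process 100 bound to VPN instance VPNA
 version 2
 network 10.0.0.0
#
bgp 100
 ipv4-family vpn-instance VPNA
  import-route rip 100                   // redistribute CE routes learned via RIP into MP-BGP
```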
Case description In this case, the addresses for interconnecting devices are as follows: • If RTX interconnects with RTY, the addresses are XY.1.1.X and XY.1.1.Y, network mask is 24. • Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage ip binding vpn-instance: binds the current AC interface to a specified VPN instance. ipv4-family: enters the IPv4 address family view of BGP.
Precautions Specify a VPN instance for each IS-IS process on the PE device. Deleting a VPN instance or disabling a VPN instance IPv4 address family will delete all the IS-IS processes bound to the VPN instance or the VPN instance IPv4 address family on the PE.
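The IS-IS PE-CE case follows the same pattern: bind the IS-IS process to the VPN instance and redistribute. The NET, process ID, instance name, and interface are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
isis 100 vpn-instance VPNA               // IS-IS process 100 bound to VPN instance VPNA
 network-entity 10.0000.0000.0002.00
#
interface GigabitEthernet0/0/1
 isis enable 100                         // run the bound IS-IS process on the AC interface
#
bgp 100
 ipv4-family vpn-instance VPNA
  import-route isis 100                  // redistribute CE routes learned via IS-IS into MP-BGP
```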
Case description In this case, the addresses for interconnecting devices are as follows: • If RTX interconnects with RTY, the addresses are XY.1.1.X and XY.1.1.Y, network mask is 24. • Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage ip binding vpn-instance: binds the current AC interface to a specified VPN instance. ipv4-family: enters the IPv4 address family view of BGP. Precautions Specify a VPN instance for each OSPF process on the PE device. Deleting a VPN instance or disabling a VPN instance IPv4 address family will delete all the OSPF processes bound to the VPN instance or the VPN instance IPv4 address family on the PE.
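The OSPF PE-CE case can be sketched in the same way. The process ID, instance name, and network are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
ospf 100 vpn-instance VPNA               // OSPF process 100 bound to VPN instance VPNA
 area 0
  network 10.1.1.0 0.0.0.255
#
bgp 100
 ipv4-family vpn-instance VPNA
  import-route ospf 100                  // redistribute CE routes learned via OSPF into MP-BGP
```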
Case description In this case, the addresses for interconnecting devices are as follows: • If RTX interconnects with RTY, the addresses are XY.1.1.X and XY.1.1.Y, network mask is 24. • Assume that PE1 is RT1, PE2 is RT2, P is RT3.
Command usage ip binding vpn-instance: binds the current AC interface to a specified VPN instance. peer substitute-as: replaces the AS number of the specified peer in the AS_Path attribute with the local AS number. Precautions VPN sites in the same AS or with different private AS numbers can communicate over the BGP/MPLS IP VPN backbone network. When sites in the same VPN use the same AS number and a local CE device establishes an EBGP neighbor relationship with a PE device, you need to run the peer substitute-as command to enable AS number substitution on the PE device. If AS number substitution is disabled, the local CE device discards VPN routes carrying its own AS number. As a result, VPN users cannot communicate with each other.
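The BGP PE-CE case with AS number substitution can be sketched as follows. The AS numbers, peer address, and instance name are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
bgp 100
 ipv4-family vpn-instance VPNA
  peer 10.1.1.1 as-number 65001          // EBGP session with the CE in VPN instance VPNA
  peer 10.1.1.1 substitute-as            // replace the CE's AS number in the AS_Path
```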
To improve the high availability (HA) of a device, increase the mean time between failures (MTBF) and reduce the mean time to repair (MTTR).
Concepts Two network devices establish a BFD session to detect the bidirectional forwarding path between them and serve upper-layer applications. BFD does not provide a neighbor discovery mechanism. Instead, BFD obtains neighbor information from the upper-layer applications it serves. After the BFD session is established, the local device periodically sends BFD packets. If the local device does not receive a response from the peer device within the detection time, it considers the forwarding path faulty. BFD then notifies the upper-layer application for processing. BFD control packets are encapsulated in UDP packets. The destination port number is 3784, and the source port number is a random value from 49152 to 65535. BFD session establishment process: OSPF discovers neighbors using the hello mechanism and sets up connections to them. After setting up a neighbor relationship, OSPF sends the neighbor information (including destination and source addresses) to BFD. BFD sets up a session using the received neighbor information. After the BFD session is set up, BFD starts to detect link faults and responds rapidly to any fault. Link fault handling process: A link fault occurs on the link between the neighbors.
BFD detects the link fault and changes the BFD session status to Down. BFD notifies the local OSPF device that the BFD peer is unreachable. Local OSPF process tears down the connection with the OSPF neighbor.
A BFD session has four states: Down, Init, Up, and AdminDown. Down: indicates that a BFD session is down or has just been set up. Init: indicates that the local system can communicate with the peer system and the local system expects the session to go Up. Up: indicates that the session has been established successfully. AdminDown: indicates that the session is administratively down. BFD session status transition: R1 and R2 start their BFD state machines; the initial state of each state machine is Down. R1 and R2 send BFD control packets with the State field set to Down. After receiving a BFD packet with the State field set to Down from R1, R2 switches the session status to Init and sends a BFD packet with the State field set to Init. After the local BFD session status of R2 changes to Init, R2 no longer processes received BFD packets with the State field set to Down. The BFD session status change on R1 is the same as that on R2. After receiving a BFD packet with the State field set to Init, R2 changes the local BFD session status to Up. The BFD session status change on R1 is the same as that on R2.
Common Commands Single-hop detection and multi-hop detection • Single-hop or multi-hop detection: • The bfd command enables the global BFD and displays the BFD view. • The bfd bind peer-ip command creates a BFD binding and establishes a BFD session. • The discriminator command sets the local and remote discriminators for the current BFD session. • The commit command submits the configurations of a BFD session. Association between BFD and interface status • The bfd command enables the global BFD and displays the BFD view. • The bfd bind peer-ip default-ip command binds the physical status of a physical link to the BFD session. • The discriminator command sets the local and remote discriminators for the current BFD session. • The process-interface-status command associates the status of the current BFD session with the status of the interface to which the session is bound. The configuration is similar to the configuration of BFD and route association, and is omitted here.
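A minimal single-hop BFD session sketch using the commands above. The session name, peer address, interface, and discriminator values are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
bfd                                      // enable BFD globally and enter the BFD view
#
bfd atob bind peer-ip 10.1.1.2 interface GigabitEthernet0/0/1
 discriminator local 10                  // must match the peer's remote discriminator
 discriminator remote 20                 // must match the peer's local discriminator
 commit                                  // submit the BFD session configuration
```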
When a router fails, its neighbors at the routing protocol layer detect that the neighbor relationships are Down and then Up again after a period of time. This is called neighbor relationship flapping. Such flapping causes route flapping, which leads to black-hole routes on the restarted router or causes data services from the neighbors to be transmitted bypassing the restarted router. This decreases network reliability. NSF was introduced to address the route flapping issue. The following requirements must be met: Hardware: Dual control boards must be configured to provide a redundant RP; one is the active board and the other is the standby board. If the active board restarts, the standby board becomes the active one. The distributed architecture is used; that is, data forwarding and control are separated, and LPUs are responsible for data forwarding. System software: While the active control board is running, it synchronizes configuration and interface state information to the standby control board. When an active/standby switchover occurs, the LPUs do not reset or withdraw forwarding entries, and the interfaces remain Up. Protocols: Graceful restart (GR) must be supported by the related network protocols, such as the routing protocols OSPF, IS-IS, and BGP, and other protocols such as the Label Distribution Protocol (LDP) and the Resource Reservation Protocol (RSVP).
Graceful Restart (GR) is a mechanism that ensures nonstop service data forwarding during an active/standby switchover or a protocol restart. When a device is performing a protocol restart, it notifies the neighboring devices of the restart so that the neighbor relationships and routes remain stable for a certain period. After the protocol restart is complete, the neighboring devices synchronize information (including the topologies, routes, and sessions maintained by the GR-related protocols) to the GR Restarter, and the state on the GR Restarter is quickly restored. During the protocol restart, route flapping does not occur and the packet forwarding path does not change, so the entire system works continuously. OSPF GR terms: GR Restarter: the GR-capable device on which the protocol restart occurs. GR Helper: a device that neighbors the GR Restarter and helps complete the GR process. GR Session: the GR capability negotiation performed during OSPF neighbor relationship establishment. The negotiated content includes whether the two parties have the GR capability. If the GR capability negotiation succeeds, the GR process starts when a protocol restart occurs. Assume that R1 and R2 have a stable OSPF neighbor relationship and the GR capability is enabled on both. When R1 restarts, the GR process is as follows:
After R1 restarts, it sends a Grace LSA to R2. When R2 receives the Grace LSA from R1, it maintains the neighbor relationship with R1. R1 and R2 exchange hello and DD packets and synchronize their LSDBs. LSAs are not generated during GR; therefore, if R1 receives its own LSAs from R2 during LSDB synchronization, it stores them and adds the Stable tag. After LSDB synchronization is complete, R1 sends a Grace LSA to notify R2 that the GR is finished. R1 starts the OSPF process and regenerates its LSAs, and then deletes the LSAs that are tagged Stable but not regenerated. After restoring all routing entries, R1 starts to recalculate routes and updates the FIB table. OSPF GR commands: The opaque-capability enable command enables the Opaque-LSA capability. After the Opaque-LSA capability is enabled, an OSPF process can generate Opaque LSAs and receive Opaque LSAs from neighboring devices. The graceful-restart command enables OSPF GR.
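The two OSPF GR commands are applied in the OSPF process view, as in this sketch. The process ID is an illustrative assumption, and the // annotations are explanatory only, not VRP CLI syntax:

```
ospf 1
 opaque-capability enable                // allow Grace (Opaque) LSAs to be sent and received
 graceful-restart                        // enable OSPF GR on this process
```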
IS-IS GR also uses the concepts of GR Restarter, GR Helper, and GR Session, which are the same as those used in OSPF GR. To support the GR feature, IS-IS adds the Restart TLV to hello packets and defines three timers. The T1 timer is similar to the IIH timer used by the IS-IS protocol. When a device restarts, it creates a T1 timer on each interface and periodically sends hello packets. The T1 timer on an interface is deleted only when the interface has received all hello ACK packets and CSNP packets. The T2 timer defines the timeout period of LSDB synchronization after a device restarts. The T2 timer of a level is deleted only when the LSDB of that level completes synchronization. If LSDB synchronization is not complete when the T2 timer expires, the T2 timer is deleted and GR fails. The T3 timer defines the maximum time during which the GR Restarter performs GR. If LSDB synchronization is not complete when the T3 timer expires, the T3 timer is deleted and GR fails. Assume that R1 and R2 have a stable IS-IS neighbor relationship and the GR capability is enabled on both. When R1 restarts, the GR process is as follows: The T2 and T3 timers start when the IS-IS protocol on R1 is globally enabled again. When an interface of R1 goes Up again and enables the IS-IS protocol, the T1 timer starts on the interface and the interface sends a hello packet.
When R2 receives the hello packet from R1, it maintains the neighbor relationship with R1 and sends a hello packet. Then R2 sends a CSNP packet and an LSP packet to R1 to help LSDB synchronization. When the interface of R1 receives the hello packet and all CSNP packets, R1 deletes the T1 timer; otherwise, R1 periodically sends hello packets until it receives all hello packets and CSNP packets. If the number of times the T1 timer expires reaches the maximum value, the T1 timer is also deleted. When the LSDB synchronization is complete, R1 deletes the T2 timer. After all T2 timers are deleted, R1 starts to delete T3 timers. When the GR is complete, R1 starts the IS-IS process. IIH timer is started on all interfaces, and then R1 can periodically send hello packets. After restoring all routing entries, R1 starts to recalculate routes and updates the FIB table. IS-IS GR command: The graceful-restart command enables IS-IS GR.
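The IS-IS GR command is applied in the IS-IS process view, as in this sketch. The process ID is an illustrative assumption, and the // annotation is explanatory only, not VRP CLI syntax:

```
isis 1
 graceful-restart                        // enable IS-IS GR on this process
```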
LAND attack Exploiting a vulnerability in the TCP three-way handshake, a LAND attacker sends SYN packets whose source address and port are the same as the destination address and port, both set to those of the target device. After receiving such a SYN packet, the target host creates an empty TCP connection with both the source and destination addresses set to its own address, and the connection is kept until it expires. The target host thus creates many empty TCP connections, which wastes resources or even causes the device to break down. After defense against malformed packet attacks is enabled, the device checks the source and destination addresses in TCP SYN packets to prevent LAND attacks. The device treats TCP SYN packets with identical source and destination addresses as malformed packets and discards them. Commands for configuring defense against malformed packet attacks The anti-attack abnormal enable command configures defense against malformed packets. After the command is executed, the device discards malformed packets.
TCP SYN attack The TCP SYN attack exploits a vulnerability in the TCP three-way handshake. During the three-way handshake, when the server receives the initial SYN packet from the client, it sends back a SYN+ACK packet. While the server is waiting for the final ACK packet from the client, the connection stays in the half-open state. If the server fails to receive the ACK packet, it resends the SYN+ACK packet to the client. If the server still receives no ACK packet, it closes the connection and updates the session status in memory. The interval from the sending of the initial SYN+ACK packet to connection closing is about 30 seconds. During this interval, an attacker may send more than 100,000 SYN packets to the open ports and never respond to the SYN+ACK packets from the server. The server's memory is then exhausted and the server cannot accept new connection requests, so it closes all active connections. After defense against TCP SYN flood attacks is enabled, the device limits the rate of TCP SYN packets so that system resources are not exhausted by such attacks.
Commands for configuring defense against TCP SYN flood attacks The anti-attack tcp-syn enable command enables TCP SYN flood attack defense. The anti-attack tcp-syn car command configures the rate limit for TCP SYN packets. If the rate of received TCP SYN packets exceeds the limit, the device discards the excess packets to ensure that the CPU works normally.
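The malformed-packet and SYN flood defense commands are applied in the system view, as in this sketch. The CIR value is an illustrative assumption, and the // annotations are explanatory only, not VRP CLI syntax:

```
anti-attack abnormal enable             // discard malformed packets, e.g. LAND attack packets
anti-attack tcp-syn enable              // enable TCP SYN flood attack defense
anti-attack tcp-syn car cir 8000        // rate-limit TCP SYN packets; the CIR value is illustrative
```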
Two modes of URPF: Strict mode • In this mode, a packet passes the check only when the forwarding table contains a route to the packet's source address and the outbound interface of that route matches the inbound interface of the packet. • If route symmetry is ensured, you are advised to use the URPF strict check. For example, if there is only one path between two network edge devices, the URPF strict check can be used to ensure network security. Loose mode • In this mode, a packet passes the check as long as its source IP address matches an entry in the routing table. • If route symmetry is not ensured, you are advised to use the URPF loose check. For example, if there are multiple paths between two network edge devices, the URPF loose check can be used to ensure network security. Topology description A bogus packet with source IP address 2.1.1.1 is sent by the attacker to S1. After receiving the bogus packet, S1 sends a response packet to the device at 2.1.1.1. In this situation, both S1 and PC1 are attacked by the bogus packets. If URPF is enabled on S1, when S1 receives the bogus packet with source IP address 2.1.1.1, URPF discards the packet because the interface corresponding to the source address of the packet does not match the interface receiving the packet.
URPF command The urpf command enables URPF on an interface and sets the URPF mode.
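The two modes are selected per interface, as in this sketch. The interface names are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
interface GigabitEthernet0/0/1
 urpf strict                            // strict check: matching route and inbound interface required
#
interface GigabitEthernet0/0/2
 urpf loose                             // loose check: any route to the source address suffices
```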
IPSG principles IPSG matches IP packets against a static or dynamic DHCP binding table. Before a network device forwards an IP packet, it compares the source IP address, source MAC address, interface, and VLAN information in the packet with the entries in the binding table. If a matching entry is found, the device considers the packet valid and forwards it. Otherwise, the device considers the packet an attack packet and discards it. Working process After IPSG is configured on S1, S1 checks incoming IP packets against the binding table. When the packet information matches the binding table, the packets are forwarded; otherwise, the packets are discarded. IPSG commands The binding table can be generated through DHCP or manually configured with static IP addresses (the user-bind static command configures static entries). The ip source check user-bind enable command enables the IPSG function on an interface to check received IP packets. The ip source check user-bind check-item command configures VLAN- or interface-based IP packet check items. This command is valid only for the dynamic binding table.
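A minimal IPSG sketch combining a static binding entry with the interface-level check. The IP address, MAC address, interface, and VLAN are illustrative assumptions, and the // annotations are explanatory only, not VRP CLI syntax:

```
user-bind static ip-address 10.1.1.10 mac-address 0001-0001-0001 interface GigabitEthernet0/0/1 vlan 10
#
interface GigabitEthernet0/0/1
 ip source check user-bind enable       // check incoming IP packets against the binding table
 ip source check user-bind check-item ip-address mac-address   // check items; applies to dynamic entries
```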
Topology description The figure shows a man-in-the-middle (MITM) attack scenario. The attacker sends a bogus ARP packet using PC3's address as the source address to PC1. PC1 then records an incorrect address mapping for PC3 in its ARP table. The attacker thus obtains the data sent by PC1 to PC3 and by PC3 to PC1, so information exchanged between PC1 and PC3 leaks. To prevent MITM attacks, configure DAI on S1. When an attacker connects to S1 and attempts to send bogus ARP packets through S1, S1 detects the attack according to the DHCP snooping binding table and discards the ARP packets. If the ARP discarding alarm is enabled on S1 and the number of discarded ARP packets exceeds the alarm threshold, S1 sends an alarm to notify the administrator. DAI uses the DHCP snooping binding table to defend against MITM attacks. Before a device forwards an ARP packet, it compares the source IP address, source MAC address, interface, and VLAN information in the ARP packet with the entries in the binding table. If an entry matches, the device considers the packet valid and forwards it; otherwise, the device considers the packet an attack packet and discards it. DAI command The arp anti-attack check user-bind enable command enables DAI on an interface or in a VLAN. That is, the device checks ARP packets against the binding table.
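A minimal DAI sketch on the interface connecting to the untrusted host. The interface name and the alarm command are illustrative assumptions based on the ARP discarding alarm described above, and the // annotations are explanatory only, not VRP CLI syntax:

```
interface GigabitEthernet0/0/1
 arp anti-attack check user-bind enable         // check ARP packets against the DHCP snooping binding table
 arp anti-attack check user-bind alarm enable   // alarm when discarded ARP packets exceed the threshold
```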
QoS provides differentiated service quality for different applications, for example, dedicated bandwidth, a decreased packet loss ratio, short packet transmission delay, and decreased delay and jitter. Best-Effort service model Routers and switches are packet switching devices. They select a transmission path for each packet based on TCP/IP and use statistical multiplexing rather than the dedicated connections used by TDM. Traditionally, IP provides only one service model: Best-Effort. In this model, all packets transmitted on a network have the same priority. Best-Effort means that the IP network tries its best to deliver all packets to the correct destinations completely, without discarding, corrupting, duplicating, or reordering them. However, the Best-Effort model does not guarantee any transmission indicators, such as delay and jitter. Strictly speaking, Best-Effort is not a QoS technology, but it is the dominant service model on today's Internet, so it is worth understanding. With the Best-Effort model, the Internet has made great achievements. However, as the Internet develops, the Best-Effort model cannot meet the increasing requirements of emerging applications. Therefore, SPs have to provide more types of service on top of the Best-Effort model to meet the requirements of each application.
IntServ model The IntServ model, developed by the IETF in 1993, supports various types of service on IP networks. It provides both real-time service and best-effort service on IP networks. The IntServ model reserves resources for each information flow. The source and destination hosts exchange RSVP messages to establish packet classification and forwarding state on each node along the transmission path. Because the model maintains forwarding state for each flow, it scales poorly: there are millions of flows on the Internet, which would consume a large amount of device resources. Therefore, this model is not widely used. In recent years, the IETF has modified the RSVP protocol and defined how RSVP can be used together with the DiffServ model, especially in the MPLS VPN field, giving RSVP new relevance. However, this model is still not widely used. The DiffServ model addresses the problems of the IntServ model, so the DiffServ model is the widely used QoS technology. DiffServ model IntServ scales poorly. After 1995, SPs and research organizations developed a new mechanism that supports various services with high scalability. In 1997, the IETF recognized that the service model in use was not suitable for network operation and that there should be a way to classify information flows and provide differentiated services for users and applications. Therefore, the IETF developed the DiffServ model, which classifies flows on the Internet and provides differentiated services for them. The DiffServ model supports various applications and is applicable to many business models.
Precedence field The 8-bit Type of Service (ToS) field in the IP packet header contains a 3-bit IP precedence field. Bits 0 to 2 constitute the Precedence field, representing precedence values 7, 6, 5, 4, 3, 2, 1, and 0 in descending order of priority. The highest priorities (values 7 and 6) are reserved for routing and network control communication. User-level applications can use only priority values 0 to 5. Bits 6 and 7 of the ToS field are reserved. Apart from the Precedence field, the ToS field also contains the D, T, and R sub-fields: • Bit D indicates the delay. The value 0 represents a normal delay and the value 1 represents a short delay. • Bit T indicates the throughput. The value 0 represents normal throughput and the value 1 represents high throughput. • Bit R indicates the reliability. The value 0 represents normal reliability and the value 1 represents high reliability.
DSCP field RFC 2474 redefines the ToS field as the DS field. The left-most 6 bits (the DSCP) identify the service type, and the right-most 2 bits are reserved. DSCP can classify traffic into 64 categories.
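The two readings of the same byte, 3-bit precedence versus 6-bit DSCP, come down to a bit shift (a quick sketch):

```python
# Extract IP Precedence (top 3 bits) and DSCP (top 6 bits) from a ToS/DS byte.
def ip_precedence(tos):
    return tos >> 5          # precedence, 0..7

def dscp(tos):
    return tos >> 2          # DSCP, 0..63

tos = 0xB8                   # DS byte commonly carried by EF traffic
print(ip_precedence(tos))    # 5
print(dscp(tos))             # 46 (EF)
```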
Each DSCP value matches a Behavior Aggregate (BA) and each BA matches a PHB (such as forward and discard), and then the PHB is implemented using some QoS mechanisms (such as traffic policing and queuing technologies). DiffServ network defines four types of PHB: Expedited Forwarding (EF), Assured Forwarding (AF), Class Selector (CS), and Default PHB (BE PHB). EF PHB is applicable to the services that have high requirements on delay, packet loss, jitter, and bandwidth. AF PHBs are classified into four categories and each AF PHB category has three discard priorities to specifically classify services. The performance of AF PHB is lower than the performance of EF PHB. CS PHBs originate from IP TOS, and are classified into 8 categories. BE PHB is a special type in CS PHB, and does not provide any guarantee. Traffic on IP networks belongs to this category by default. Priority mapping configuration Configure the trusted packet priorities: Run the trust command to specify the packet priority to be mapped. Configure the priority mapping table: Run the qos map-table command to enter the 802.1p or DSCP mapping table view, and run the input command to set the priority mappings.
Token bucket A token bucket with a certain capacity stores tokens. The system places tokens into a token bucket at the configured rate. When the token bucket is full, excess tokens overflow and no token is added. A token bucket forwards packets according to the number of tokens in the token bucket. If there are sufficient tokens in the token bucket for forwarding packets, the traffic rate is within the rate limit. Otherwise, the traffic rate is not within the rate limit. Single-rate-single-bucket A token bucket is called bucket C. Tc indicates the number of tokens in the bucket. Single-rate-single-bucket has two parameters: • Committed Information Rate (CIR): indicates the rate of putting tokens into bucket C, that is, the average traffic rate permitted by bucket C. • Committed Burst Size (CBS): indicates the capacity of bucket C, that is, the maximum volume of burst traffic allowed by bucket C each time. The system places tokens into the bucket at the CIR. If Tc is smaller than the CBS, Tc increases; otherwise, Tc does not increase. B indicates the size of an arriving packet: • If B is smaller than or equal to Tc, the packet is colored green, and Tc decreases by B. • If B is greater than Tc, the packet is colored red, and Tc remains unchanged.
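The single-rate-single-bucket coloring rules above can be sketched as follows (a simplified model with the CIR in bytes per second; the class and parameter names are illustrative, not vendor code):

```python
# Simplified single-rate, single-bucket marker: tokens arrive at CIR and are
# capped at CBS; a packet of size B is green if B <= Tc (and consumes B
# tokens), otherwise red (and Tc is unchanged).
class SingleBucket:
    def __init__(self, cir, cbs):
        self.cir, self.cbs = cir, cbs
        self.tc = cbs            # bucket starts full
        self.last = 0.0          # time of the previous update

    def color(self, now, b):
        # Add tokens for the elapsed time, without exceeding CBS.
        self.tc = min(self.cbs, self.tc + self.cir * (now - self.last))
        self.last = now
        if b <= self.tc:
            self.tc -= b
            return "green"
        return "red"

tb = SingleBucket(cir=1000, cbs=1500)   # 1000 B/s, 1500-byte bucket
print(tb.color(0.0, 1000))  # green: the bucket starts with 1500 tokens
print(tb.color(0.0, 1000))  # red: only 500 tokens remain
print(tb.color(2.0, 1000))  # green: 2 s of refill at CIR tops the bucket up
```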
Single-Rate-Double-Bucket Two token buckets are available: bucket C and bucket E. Tc and Te indicate the number of tokens in the bucket. Single-rate-double-bucket has three parameters: • Committed Information Rate (CIR): indicates the rate of putting tokens into bucket C, that is, the average traffic rate permitted by bucket C. • Committed Burst Size (CBS): indicates the capacity of bucket C, that is, the maximum volume of burst traffic allowed by bucket C each time. • Excess Burst Size (EBS): indicates the capacity of bucket E, that is, the maximum volume of excess burst traffic allowed by bucket E each time. The system places tokens into the buckets at the CIR: • If Tc is smaller than the CBS, Tc increases. • If Tc is equal to the CBS and Te is smaller than the EBS, Te increases. • If Tc is equal to the CBS and Te is equal to the EBS, Tc and Te do not increase. B indicates the size of an arriving packet: • If B is smaller than or equal to Tc, the packet is colored green, and Tc decreases by B. • If B is greater than Tc and smaller than or equal to Te, the packet is colored yellow and Te decreases by B. • If B is greater than Te, the packet is colored red, and Tc and Te remain unchanged.
Double-Rate-Double-Bucket Two token buckets are available: bucket P and bucket C. Tp and Tc indicate the number of tokens in the bucket. Double-rate-double-bucket has four parameters: • Peak information rate (PIR): indicates the rate at which tokens are put into bucket P, that is, the maximum traffic rate permitted by bucket P. The PIR must be greater than the CIR. • Committed Information Rate (CIR): indicates the rate of putting tokens into bucket C, that is, the average traffic rate permitted by bucket C. • Peak Burst Size (PBS): indicates the capacity of bucket P, that is, the maximum volume of burst traffic allowed by bucket P each time. PBS is greater than CBS. • Committed Burst Size (CBS): indicates the capacity of bucket C, that is, the maximum volume of burst traffic allowed by bucket C each time. The system places tokens into bucket P at the rate of PIR and places tokens into bucket C at the rate of CIR: • If Tp is smaller than the PBS, Tp increases. If Tp is greater than or equal to the PBS, Tp remains unchanged. • If Tc is smaller than the CBS, Tc increases. If Tc is greater than or equal to the CBS, Tc remains unchanged.
B indicates the size of an arriving packet: • If B is greater than Tp, the packet is colored red. • If B is greater than Tc and smaller than or equal to Tp, the packet is colored yellow and Tp decreases by B. • If B is smaller than or equal to Tc, the packet is colored green, and Tp and Tc decrease by B.
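The double-rate-double-bucket rules (refill bucket P at the PIR and bucket C at the CIR, then the three-way coloring above) can be sketched similarly (a simplified model; names are illustrative):

```python
# Simplified two-rate (PIR/CIR) double-bucket marker following the rules above:
# red if B > Tp; yellow if Tc < B <= Tp (only Tp decreases); green if B <= Tc
# (both Tp and Tc decrease). Refill of both buckets is folded into color().
class TwoRateMarker:
    def __init__(self, pir, pbs, cir, cbs):
        self.pir, self.pbs, self.cir, self.cbs = pir, pbs, cir, cbs
        self.tp, self.tc = pbs, cbs      # both buckets start full
        self.last = 0.0

    def color(self, now, b):
        dt = now - self.last
        self.last = now
        self.tp = min(self.pbs, self.tp + self.pir * dt)
        self.tc = min(self.cbs, self.tc + self.cir * dt)
        if b > self.tp:
            return "red"
        if b > self.tc:
            self.tp -= b
            return "yellow"
        self.tp -= b
        self.tc -= b
        return "green"

m = TwoRateMarker(pir=2000, pbs=3000, cir=1000, cbs=1500)
print(m.color(0.0, 1000))   # green: fits in bucket C
print(m.color(0.0, 1000))   # yellow: bucket C has 500 left, bucket P has 2000
print(m.color(0.0, 1500))   # red: bucket P has only 1000 left
```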
Traffic policing discards excess traffic to limit traffic within a proper range and to protect network resources and enterprises' interests. Traffic policing consists of: Meter: measures the network traffic using the token bucket mechanism and sends the measurement result to the marker. Marker: colors packets in green, yellow, or red based on the measurement result received from the meter. Action: takes actions based on packet coloring results (packets in green or yellow are forwarded and packets in red are discarded by default) received from the marker. The following actions are defined: • Pass: forwards the packets that meet network requirements. • Remark + pass: changes the local priorities of packets and forwards them. • Discard: discards the packets that do not meet network requirements.
If the rate of a type of traffic exceeds the threshold, the device lowers the packet priority and then forwards the packets, or directly discards them. By default, these packets are discarded. Traffic policing commands: Configure interface-based traffic policing: Run the qos car command to create a QoS CAR profile and configure QoS CAR parameters. The parameters in the command differ between a WAN interface and a LAN interface. Configure rate limiting on a WAN interface: Run the qos lr command to limit the rate at which a physical interface sends packets, expressed as a percentage of the total interface bandwidth.
Traffic shaping buffers excess traffic to limit the outgoing rate to a proper range, smoothing bursts instead of discarding them outright. Traffic shaping process: When packets arrive, the device classifies them into different types and places them into different queues. If the queue that packets enter is not configured with traffic shaping, the packets are sent immediately. Packets requiring queuing proceed to the next step. The system places tokens into the bucket at the specified rate (CIR): • If there are sufficient tokens in the bucket, the device forwards the packets and the number of tokens decreases. • If there are insufficient tokens in the bucket, the device places the packets into the buffer queue. When the buffer queue is full, packets are discarded. When there are packets in the buffer queue, the system periodically extracts packets from the queue and sends them. Each time the system sends a packet, it compares the number of packets with the number of tokens, until the tokens are insufficient to send packets or all the packets have been sent. Traffic shaping commands: Configure interface-based traffic shaping: Run the qos gts command to configure traffic shaping on the interface.
Configure queue-based traffic shaping. • Run the qos queue-profile queue-profile-name command to create a queue profile and enter the queue profile view. • Run the queue { start-queue-index [ to end-queue-index ] } &<1-10> length { bytes bytes-value | packets packets-value } command to set the length of each queue. • Run the queue { start-queue-index [ to end-queue-index ] } &<1-10> gts cir cir-value [ cbs cbs-value ] command to configure queue-based traffic shaping. By default, traffic shaping is not performed for queues. • Run the qos queue-profile queue-profile-name command to apply the queue profile to an interface.
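The shaping process described above, refill at the CIR, send while tokens last, buffer the rest, can be approximated with a fixed time step (a conceptual sketch; the function and parameter names are made up):

```python
from collections import deque

# Conceptual GTS sketch: tokens accrue at CIR each tick; queued packets are
# sent while tokens suffice, and arrivals that find no tokens are buffered
# (or tail-dropped once the buffer is full) instead of being discarded.
def shape(arrivals, cir, cbs, queue_len):
    tokens, queue, sent, dropped = cbs, deque(), [], 0
    for tick, pkts in enumerate(arrivals):        # pkts = packet sizes this tick
        tokens = min(cbs, tokens + cir)           # refill once per tick
        for size in pkts:
            if len(queue) < queue_len:
                queue.append(size)
            else:
                dropped += 1                      # buffer full: tail drop
        while queue and queue[0] <= tokens:       # drain while tokens last
            tokens -= queue[0]
            sent.append((tick, queue.popleft()))
        # leftover packets wait in the buffer for the next tick's tokens
    return sent, dropped

sent, dropped = shape([[500, 500, 500], [], []], cir=600, cbs=600, queue_len=10)
print(sent)     # the burst is smoothed: one 500-byte packet per tick
print(dropped)  # 0
```

A three-packet burst arriving in one tick leaves the shaper spread over three ticks, which is exactly the smoothing effect shaping trades delay for.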
If the rate of incoming packets on an interface is higher than the rate of outgoing packets, the interface is congested. If there is insufficient space for storing the packets, some packets are discarded. When packets are discarded, hosts or routers retransmit them, leading to a vicious circle. When congestion occurs, multiple packets compete for resources, and the packets that cannot obtain resources are discarded. The bandwidth, delay, and jitter of key services cannot be ensured. The core of congestion management is the resource scheduling policy that decides the packet forwarding sequence. Generally, devices use queuing technology to cope with congestion. Queuing technology involves queue creation, traffic classification, and queue scheduling. Initially, there was only one queue scheduling policy, First In First Out (FIFO). To meet different service requirements, more scheduling policies were developed. Queue scheduling mechanisms include hardware queue scheduling and software queue scheduling. The hardware queue is also called the transmit queue (TxQ); the interface driver uses this queue when transmitting packets one by one, and it is a FIFO queue. The software queue schedules data packets into the hardware queue according to QoS requirements and can use multiple scheduling methods. Data packets enter the software queue only when the hardware queue is full.
The hardware queue length depends on the bandwidth setting on the interface. If the interface bandwidth is high, transmission delay is short, so the queue can be long. An appropriate hardware queue length is important. If the hardware queue is too long, the policy execution of the software queue degrades, because the hardware queue uses FIFO scheduling. If the hardware queue is too short, scheduling efficiency and link utilization are low, and CPU usage is high. LAN ports support PQ, DRR, and WRR scheduling. WAN ports support PQ and WFQ scheduling. Configuration commands: Run the qos queue-profile queue-profile-name command to create a queue profile and enter the queue profile view. On the WAN-side interface, run the schedule { pq start-queue-index [ to end-queue-index ] | wfq start-queue-index [ to end-queue-index ] } command to set a scheduling mode for each queue on the WAN-side interface. On the LAN-side interface, run the schedule { pq start-queue-index [ to end-queue-index ] | drr start-queue-index [ to end-queue-index ] | wrr start-queue-index [ to end-queue-index ] } command to set a scheduling mode for each queue on the LAN-side interface. Run the qos queue-profile queue-profile-name command to apply the queue profile to an interface.
FIFO characteristics: Advantages: • Simple Disadvantages: • Unfair, with no separation between flows. A large flow can occupy the bandwidth of other flows and prolong their delay. • When congestion occurs, FIFO discards some packets. When TCP detects packet loss, it lowers its transmission speed to avoid congestion; UDP, being connectionless, does not. As a result, TCP and UDP packets in a FIFO queue are not treated equally, and the TCP packet rate becomes too low. • A flow may occupy all the buffer space and block other types of traffic.
RR
Advantages: • Different flows are separated, and bandwidth is equally allocated to queues. • When a queue is empty, its bandwidth is equally allocated to the other queues. Disadvantages: • Weights cannot be configured for the queues. • When queues have different packet lengths, scheduling is inaccurate. • When the scheduling rate is low, delay and jitter deteriorate. For example, when a packet arrives at an empty queue that has just been scheduled, the packet can be processed only after all the other queues are scheduled; in this situation, jitter is serious. However, if the scheduling rate is high, the delay is short. The RR mode is widely used on high-speed routers.
Compared with RR, WRR can set weights for queues. During WRR scheduling, the scheduling chance a queue obtains is in direct proportion to its weight. Empty queues are skipped, so when a queue carries little traffic, its remaining bandwidth is used by the other queues in proportion to their weights. Advantages: • Bandwidth is allocated based on weights, and the remaining bandwidth of a queue is allocated to the other queues. Low-priority queues are also scheduled in a timely manner. • Easy to implement. • Applicable to DiffServ ports. Disadvantages: • Like RR, WRR is inaccurate when queues have different packet lengths. • When the scheduling rate is low, packet delay is unstable and the delay and jitter indicators cannot be lowered to the expected values.
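One WRR round, scheduling chances proportional to weights and empty queues skipped, can be sketched as:

```python
from collections import deque

# Minimal WRR sketch: in each round, queue i may send up to weight[i] packets;
# empty queues are skipped, so their share goes to the remaining queues.
def wrr_round(queues, weights):
    sent = []
    for q, w in zip(queues, weights):
        for _ in range(w):
            if not q:
                break            # empty queue: skip, bandwidth goes to others
            sent.append(q.popleft())
    return sent

q0 = deque(["a1", "a2", "a3"])   # weight 2
q1 = deque(["b1"])               # weight 1
print(wrr_round([q0, q1], [2, 1]))   # ['a1', 'a2', 'b1']
print(wrr_round([q0, q1], [2, 1]))   # ['a3'] - q1 is now empty and is skipped
```

Note the sketch counts packets, not bytes, which is exactly why WRR becomes inaccurate when queues carry different packet lengths.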
PQ
PQ has four priority levels: Top, Middle, Normal, and Bottom (most devices actually support eight queues). Packets in a lower-priority queue can be scheduled only after all packets in higher-priority queues have been scheduled. Therefore, PQ has obvious advantages and disadvantages: it ensures that packets in high-priority queues obtain high bandwidth and low delay and jitter, but packets in low-priority queues may not be scheduled in a timely manner, or not at all, so the lower-priority queues can starve. PQ has the following characteristics: • Uses ACLs to classify packets into different types and add them to the corresponding queues. • Packets are discarded only by the Tail Drop mechanism. • When the queue length is set to 0, the queue length is infinite; packets entering this queue are not tail-dropped unless the memory space is exhausted. • FIFO logic is used within each queue. • Packets in low-priority queues are scheduled only after all packets in high-priority queues are scheduled. PQ ensures high quality for the specified service traffic but ignores the quality of other services.
Advantages: • Precisely controls the delay of high-priority queues. • Easy to implement, and differentiates services. Disadvantages: • Cannot allocate bandwidth as required. When high-priority queues have many packets, the packets in low-priority queues cannot be scheduled. • It shortens the delay of high-priority queues by compromising the service quality of low-priority queues. • If a high-priority queue transmits TCP packets and a low-priority queue transmits UDP packets, the TCP packets are transmitted at a high speed, while the UDP packets cannot obtain sufficient bandwidth.
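The strict-priority rule, a lower queue is served only when every higher queue is empty, is a short loop:

```python
from collections import deque

# Strict priority: always dequeue from the highest-priority non-empty queue.
# queues[0] is Top, queues[-1] is Bottom.
def pq_dequeue(queues):
    for q in queues:
        if q:
            return q.popleft()
    return None                  # all queues empty

top, bottom = deque(["voice1"]), deque(["bulk1", "bulk2"])
order = []
while (pkt := pq_dequeue([top, bottom])) is not None:
    order.append(pkt)
print(order)   # ['voice1', 'bulk1', 'bulk2'] - Bottom is served only once Top empties
```

If traffic kept arriving in `top`, `bottom` would never be reached, which is the starvation problem described above.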
CQ
The number of bytes to be scheduled in each round must be specified for each queue. In each round, packets are dispatched from a queue until the specified byte count is used up. If the configured byte count is small, bandwidth allocation is inaccurate. For example, if 500 bytes is specified for a queue while most packets in that queue exceed 1000 bytes, a packet is still sent in each round, so the bandwidth actually allocated is higher than expected. If the configured byte count is large, delay is difficult to control. CQ can schedule multiple packets each time; the number of packets scheduled is as many as fit within the bytes scheduled each round. Advantages: • Allocates bandwidth according to configured percentages. When the traffic volume of a queue is small, other queues can occupy its bandwidth. • Easy to implement Disadvantages: • When the specified number of bytes is small, bandwidth allocation is inaccurate. When the specified number of bytes is large, delay and jitter are serious.
WFQ
Weighted Fair Queuing (WFQ) classifies packets by flow. On an IP network, packets with the same source IP address, destination IP address, protocol number, and IP precedence belong to the same flow. On an MPLS network, packets with the same label and EXP field belong to the same flow. WFQ assigns each flow to a queue and tries to assign different flows to different queues. When packets leave the queues, WFQ allocates the bandwidth of the outbound interface to each flow according to its weight. The smaller the weight of a flow, the less bandwidth it obtains; the greater the weight, the more bandwidth it obtains. In this manner, services of the same priority are treated equally, and services of different priorities are allocated different weights. For example, there are eight flows on an interface, with weights 1, 2, 3, 4, 5, 6, 7, and 8. The total bandwidth quota is the sum of the weights: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 = 36. The bandwidth share of each flow is its weight divided by the total quota, so the flows obtain 1/36, 2/36, 3/36, 4/36, 5/36, 6/36, 7/36, and 8/36 of the bandwidth. Thus, WFQ assigns different scheduling weights to services of different priorities while ensuring fairness between services of the same priority. Advantages:
• The queues are scheduled fairly at the granularity of bytes. • Differentiates services and allocates weights. • Properly controls delay and reduces jitter. Disadvantages: • Difficult to implement.
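The bandwidth arithmetic from the eight-flow example above can be checked directly:

```python
from fractions import Fraction

# WFQ bandwidth shares for the eight-flow example: each flow obtains
# weight / sum(weights) of the interface bandwidth.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)                      # 36
shares = [Fraction(w, total) for w in weights]
print(total)                              # 36
print(shares[0], shares[7])               # 1/36 2/9 (i.e. 8/36)
```

The shares necessarily sum to 1, so no bandwidth is left unassigned while every flow's fraction is proportional to its weight.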
Congestion Avoidance Tail drop is the traditional congestion avoidance method: when the queue length reaches its maximum, all subsequent packets are discarded. If too many TCP packets are dropped, TCP times out, triggering TCP slow start and the congestion avoidance mechanism so that the sender slows its transmission of TCP packets. When a queue drops packets of several TCP connections at the same time, those connections enter congestion avoidance and slow start simultaneously; this is referred to as global TCP synchronization. The TCP connections then all send fewer packets to the queue at the same time, so that the rate of incoming packets falls below the rate of outgoing packets, reducing bandwidth usage. Moreover, the volume of traffic sent to the queue varies greatly over time, so the traffic on the link fluctuates between trough and peak, and the delay and jitter of certain traffic are affected. The traditional packet loss policy uses the tail drop method: when the queue length reaches the upper limit, the excess packets (at the queue tail) are discarded. To prevent global TCP synchronization, Random Early Detection (RED) is used. RED randomly discards packets to prevent the transmission speeds of multiple TCP connections from being reduced simultaneously, keeping the TCP rate and network traffic volume stable.
The device provides Weighted Random Early Detection (WRED) based on the RED technology. WRED discards packets in queues based on the DSCP field or IP precedence. An upper drop threshold, lower drop threshold, and drop probability can be set for each priority. When the number of packets of a priority reaches the lower drop threshold, the device starts to discard packets; between the two thresholds, the drop probability increases as the queue length grows, up to the configured maximum drop probability. When the number of packets reaches the upper drop threshold, the device discards all the packets. By discarding packets according to this drop probability, WRED relieves congestion. WRED configuration: • Configure a drop profile. • Run the drop-profile drop-profile-name command to create a drop profile and enter the drop profile view. • Run the dscp { dscp-value1 [ to dscp-value2 ] } &<1-10> low-limit low-limit-percentage high-limit high-limit-percentage discard-percentage discard-percentage command to set DSCP-based WRED parameters. • Run the ip-precedence { ip-precedence-value1 [ to ip-precedence-value2 ] } &<1-10> low-limit low-limit-percentage high-limit high-limit-percentage discard-percentage discard-percentage command to set IP precedence-based WRED parameters. • Apply the drop profile. • Run the qos queue-profile queue-profile-name command to enter the queue profile view. • Run the schedule wfq start-queue-index [ to end-queue-index ] command to set the scheduling mode of a queue to WFQ. • Run the queue { start-queue-index [ to end-queue-index ] } &<1-10> drop-profile drop-profile-name command to bind a drop profile to a queue in a queue profile. • Run the qos queue-profile queue-profile-name command to apply the queue profile to an interface.
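The per-priority drop curve that WRED applies, nothing below the lower threshold, a linear ramp between the thresholds, everything above the upper threshold, can be sketched as a piecewise-linear function (percentages as in the drop-profile parameters; a conceptual model, not device behavior):

```python
# WRED drop probability as a function of queue fill (all values in percent):
# below low-limit nothing is dropped; between the limits the drop probability
# rises linearly toward discard-percentage; at or above high-limit all packets
# are dropped.
def wred_drop_prob(fill, low, high, max_drop):
    if fill < low:
        return 0.0
    if fill >= high:
        return 100.0
    return max_drop * (fill - low) / (high - low)

print(wred_drop_prob(20, low=30, high=80, max_drop=40))   # 0.0
print(wred_drop_prob(55, low=30, high=80, max_drop=40))   # 20.0 (halfway up the ramp)
print(wred_drop_prob(90, low=30, high=80, max_drop=40))   # 100.0
```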
Traffic classification identifies packets with certain characteristics according to rules, and is the prerequisite and basis for differentiated services. You can define rules to classify packets and specify the relationships between rules: AND: Packets match a traffic classifier only when they match all the rules. If a traffic classifier contains ACL rules, packets match the classifier only when they match one ACL rule and all the non-ACL rules. If a traffic classifier contains no ACL rules, packets match the classifier only when they match all the non-ACL rules. OR: Packets match a traffic classifier as long as they match any one rule. A traffic behavior is an action taken on matched packets. The purpose of traffic classification is to provide differentiated services, so a traffic classifier takes effect only when it is associated with a traffic control action or a resource allocation action. A traffic policy is configured by binding traffic classifiers to traffic behaviors. After a traffic policy is applied to an interface, globally, to a board, or to a VLAN, differentiated services are provided. Traffic policy configuration commands
Configure a traffic classifier. • Run the traffic classifier classifier-name [ operator { and | or } ] command to create a traffic classifier and enter the traffic classifier view. Configure a traffic behavior. • Run the traffic behavior behavior-name command to create a traffic behavior and enter the traffic behavior view. Configure a traffic policy. • Run the traffic policy policy-name command to create a traffic policy and enter the traffic policy view. • Run the classifier behavior command to bind a traffic classifier to a traffic behavior in the traffic policy. Run the traffic-policy policy-name { inbound | outbound } command to apply the traffic policy to an interface or subinterface in the inbound or outbound direction.
SNMP model NMS station is the manager in a network management system. It uses the SNMP protocol to manage and monitor the network. The NMS software runs on an NMS server. Agent is a process on the managed device. The agent maintains data on the managed device, receives and processes the request packets from the NMS, and then sends the response packets to the NMS. Management object is the object to be managed. A device may have multiple management objects, including a hardware component (such as an interface board) and parameters (such as a routing protocol) configured for the hardware or software. MIB is a database specifying variables that are maintained by the managed device and can be queried or set by the agent. MIB defines attributes of the managed device, including the name, status, access rights, and data type of objects.
Operations of SNMPv1 and SNMPv2c Get: reads one or several parameter values from the MIB of the agent process. GetNext: reads the next parameter value from the MIB of the agent process. Set: sets one or several parameter values in the MIB of the agent process. Response: returns one or more queried values. The agent performs this operation in response to the GetRequest, GetNextRequest, SetRequest, and GetBulkRequest operations. Upon receiving a Get or Set request, the agent performs the query or modify operation on the MIB tables and then sends the response to the NMS. Trap: sent by an agent process to notify the NMS of a fault or event on the managed device. New operation types in SNMPv2c GetBulk: the NMS queries managed devices in batches. It is implemented based on the GetNext operation: a GetBulk operation is equivalent to a series of GetNext operations, and you can specify how many times the GetNext operation is executed on the managed device during one GetBulk interaction. InformRequest: sent by a managed device to notify the NMS of an alarm. After the managed device sends an InformRequest, the NMS must return an InformResponse packet to the managed device.
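The relationship between GetNext and GetBulk can be modeled over a toy MIB (the OIDs and values below are illustrative; this is a conceptual sketch, not an SNMP implementation):

```python
# Toy MIB as an OID->value map; GetNext returns the first OID that sorts after
# the requested one, and GetBulk is simply N consecutive GetNext operations.
mib = {
    (1, 3, 6, 1, 2, 1, 1, 1, 0): "sysDescr",
    (1, 3, 6, 1, 2, 1, 1, 3, 0): "sysUpTime",
    (1, 3, 6, 1, 2, 1, 1, 5, 0): "sysName",
}

def get_next(oid):
    for candidate in sorted(mib):
        if candidate > oid:
            return candidate, mib[candidate]
    return None                          # end of MIB

def get_bulk(oid, max_repetitions):
    results = []
    for _ in range(max_repetitions):
        nxt = get_next(oid)
        if nxt is None:
            break
        results.append(nxt)
        oid = nxt[0]                     # continue from the returned OID
    return results

print(get_next((1, 3, 6, 1, 2, 1, 1, 1, 0)))          # next object: sysUpTime
print(len(get_bulk((1, 3, 6, 1, 2, 1, 1, 1, 0), 5)))  # 2 - the walk hits end of MIB
```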
Operations related to SNMPv3: The NMS sends a Get request without security parameters to the agent. The agent responds and returns the requested parameters to the NMS. The NMS sends a Get request carrying security parameters to the agent. The agent encrypts the response packet and returns the requested parameters to the NMS.
NQA Principles Creating a test instance • NQA requires two test ends: an NQA client and an NQA server (also called the source and destination). The NQA client (the source) initiates the NQA test. You can configure test instances through the command line or the NMS; NQA then places the test instances into test queues for scheduling. Starting the test instance • When starting an NQA test instance, you can choose to start it immediately, at a specified time, or after a delay. A test packet is generated based on the type of the test instance when the timer expires. If the generated test packet is smaller than the minimum size of the protocol packet, it is padded to the minimum size before being sent. Processing a test instance • After a test instance starts, the protocol-related running status is collected from response packets. The client adds a timestamp to a test packet based on its local system time before sending the packet to the server. After receiving the test packet, the server sends a response packet to the client. The client then adds a timestamp to the received response packet based on the current local system time. This lets the client calculate the round-trip time (RTT) of the test packet from the two timestamps.
An NQA ICMP test instance checks whether a route from the NQA client to the destination is reachable. The ICMP test has a similar function as the ping command, while the ICMP test provides more output information: By default, the command output shows the results of the latest five tests. The output includes the average delay, the packet loss ratio, and the time the last packet is correctly received.
Test Procedure Source (R1) sends an ICMP echo request packet to the destination (R2). After receiving the ICMP echo request packet, the destination (R2) responds to the source (R1) with an ICMP echo reply packet. The source (R1) then can calculate the time of communication between the source (R1) and the destination (R2) by subtracting the time the source sends the ICMP echo request packet from the time the source receives the ICMP echo reply packet. The calculated data can reflect the network performance and operating status.
NTP synchronization process R1 sends an NTP packet to R2. The packet carries a timestamp, 10:00:00 am (T1), indicating the time it leaves R1. When the NTP packet reaches R2, R2 adds a receive timestamp, 11:00:01 am (T2), indicating the time R2 receives the packet. When the NTP packet leaves R2, R2 adds a transmit timestamp, 11:00:02 am (T3), indicating the time it leaves R2. When R1 receives this response packet, it adds a new receive timestamp, 10:00:03 am (T4). R1 uses this information to calculate two important parameters: • Round-trip delay of the NTP packet: Delay = (T4 - T1) - (T3 - T2) • Clock offset of R1 by taking R2 as a reference: Offset = ((T2 - T1) + (T3 - T4))/2 After the calculation, R1 knows that the round-trip delay is 2 seconds and its clock offset is 1 hour. R1 sets its own clock based on these two parameters to synchronize its clock with that of R2.
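Plugging the four timestamps from the example into the two formulas (times converted to seconds since midnight):

```python
# NTP delay/offset calculation using the timestamps from the example above.
def hms(h, m, s):
    return h * 3600 + m * 60 + s

T1 = hms(10, 0, 0)   # request leaves R1
T2 = hms(11, 0, 1)   # request arrives at R2
T3 = hms(11, 0, 2)   # response leaves R2
T4 = hms(10, 0, 3)   # response arrives back at R1 (R1's clock is 1 h behind)

delay  = (T4 - T1) - (T3 - T2)
offset = ((T2 - T1) + (T3 - T4)) / 2

print(delay)    # 2 (seconds)
print(offset)   # 3600.0 (seconds, i.e. R1 is 1 hour behind R2)
```

Note that the 1-hour clock error cancels out of the delay formula (it appears with opposite signs in T4 - T1 and T3 - T2), which is why the delay comes out as the true 2 seconds even though the raw timestamps disagree wildly.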