HP ExpertOne
Building HP FlexFabric Data Centers eBook (Exam HP2-Z34)
Hppress.com
Building HP FlexFabric Data Centers eBook (Exam HP2-Z34)
© 2014 Hewlett-Packard Development Company, L.P.

Published by:
HP Press
660 4th Street, #802
San Francisco, CA 94107

All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the publisher, except for the inclusion of brief quotations in a review.

ISBN: 978-1-937826-90-1

WARNING AND DISCLAIMER
This book provides information about the topics covered in the Building HP FlexFabric Data Centers (HP2-Z34) certification exam. Every effort has been made to make this book as complete and as accurate as possible, but no warranty or fitness is implied. The information is provided on an "as is" basis. The author, HP Press, and Hewlett-Packard Development Company, L.P., shall have neither liability nor responsibility to any person or entity with respect to any loss or damages arising from the information contained in this book or from the use of the discs or programs that may accompany it. The opinions expressed in this book belong to the author and are not necessarily those of Hewlett-Packard Development Company, L.P.

TRADEMARK ACKNOWLEDGEMENTS
All terms mentioned in this book that are known to be trademarks or service marks have been appropriately capitalized. HP Press or Hewlett-Packard Inc. cannot attest to the accuracy of this information. Use of a term in this book should not be regarded as affecting the validity of any trademark or service mark.

GOVERNMENT AND EDUCATION SALES
This publisher offers discounts on this book when ordered in quantity for bulk purchases, which may include electronic versions. For more information, please contact U.S. Government and Education Sales at 1-855-4HPBOOK (1-855-447-2665)
or email [email protected].

Feedback Information
At HP Press, our goal is to create in-depth reference books of the best quality and value. Each book is crafted with care and precision, undergoing rigorous development that involves the expertise of members from the professional technical community. Readers' feedback is a continuation of the process. If you have any comments regarding how we could improve the quality of this book, or otherwise alter it to better suit your needs, you can contact us through email at [email protected]. Please make sure to include the book title and ISBN in your message. We appreciate your feedback.

Publisher: HP Press
Contributors and Reviewers: Olaf Borowski, Gerhard Roets, Vincent Gilles, Olivier Vallois
HP Press Program Manager: Michael Bishop
HP Headquarters
Hewlett-Packard Company
3000 Hanover Street
Palo Alto, CA 94304-1185
USA
Phone: (+1) 650-857-1501
Fax: (+1) 650-857-5518
HP, COMPAQ and any other product or service name or slogan or logo contained in the HP Press publications or web site are trademarks of HP and its suppliers or licensors and may not be copied, imitated, or used, in whole or in part, without the prior written permission of HP or the applicable trademark holder. Ownership of all such trademarks and the goodwill associated therewith remains with HP or the applicable trademark holder. Without limiting the generality of the foregoing: a. Microsoft, Windows and Windows Vista are either US registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries; and b. Celeron, Celeron Inside, Centrino, Centrino Inside, Core Inside, Intel, Intel Logo, Intel Atom, Intel Atom Inside, Intel Core, Intel Core Inside, Intel Inside Logo, Intel Viiv, Intel vPro, Itanium, Itanium Inside, Pentium, Pentium Inside, ViiV Inside, vPro Inside, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
Special Acknowledgments

This book is based on the Building HP FlexFabric Data Centers course (Course ID: 00908176). HP Press would like to thank the courseware developers, Peter Debruyne, David Bombal, and Steve Sowell. Thanks to Debi Pearson and Miriam Allred for their help preparing this eBook for publication.
Introduction

This study guide helps you prepare for the Building HP FlexFabric Data Centers exam (HP2-Z34). The HP2-Z34 elective exam is for candidates who want to acquire the HP ASE-FlexNetwork Architect V2 certification, or the HP ASE-FlexNetwork Integrator V1 certification. The exam tests you on specific Data Center topics and technologies such as Multitenant Device Context (MDC), Data Center Bridging (DCB), Multiprotocol Label Switching (MPLS), Fibre Channel over Ethernet (FCoE), Ethernet Virtual Interconnect (EVI), and Multi-Customer Edge (MCE). The exam will also cover topics on high availability and redundancy such as Transparent Interconnection of Lots of Links (TRILL) and Shortest Path Bridging Mac-in-Mac mode (SPBM).
HP ExpertOne Certification

HP ExpertOne is the first end-to-end learning and expertise program that combines comprehensive knowledge and hands-on real-world experience to help you attain the critical skills needed to architect, design, and integrate multivendor and multiservice converged infrastructure and cloud solutions. HP, the largest IT company in the world and the market leader in IT training, is committed to helping you stay relevant and keep pace with the demands of a dynamic, fast-moving industry. The ExpertOne program takes into account your current certifications and experience, providing the relevant courses and study materials you need to pass the certification exams. As an ExpertOne certified member, your skills, knowledge, and real-world experience are recognized and valued in the marketplace. To continue your professional and career growth, you have access to a large ExpertOne community of IT professionals and decision-makers, including the world's largest community of cloud experts. Share ideas, best practices, business insights, and challenges as you gain professional connections globally.
To learn more about HP ExpertOne certifications, including storage, servers, networking, converged infrastructure, cloud, and more, please visit hp.com/go/ExpertOne.
Audience

This study guide is designed for networking professionals who want to demonstrate their expertise in implementing HP FlexNetwork solutions by passing the HP2-Z34 certification exam. It is specifically targeted at networking professionals who want to extend their knowledge of how to design and implement HP FlexFabric solutions for the data center.
Assumed Knowledge

To understand the technologies and protocols covered in this study guide, networking professionals should have "on the job" experience. The associated training course, which includes numerous hands-on lab activities, provides a good foundation for the exam, but learners are also expected to have real-world experience.
Relevant Certifications

After you pass these exams, your achievement may be applicable toward more than one certification. To determine which certifications can be credited with this achievement, log in to The Learning Center and view the certifications listed on the exam's More Details tab. You might be on your way to achieving additional HP certifications.
Preparing for Exam HP2-Z34

This self-study guide does not guarantee that you will have all the knowledge you need to pass the exam. It is expected that you will also draw on real-world experience and would benefit from completing the hands-on lab activities provided in the instructor-led training.
Recommended HP Training

Recommended training to prepare for each exam is accessible from the exam's page in The Learning Center. See the exam attachment, "Supporting courses," to view and register for the courses.
Obtain Hands-on Experience

You are not required to take the recommended, supported courses, and completion of training does not guarantee that you will pass the exams. HP strongly recommends a combination of training, thorough review of courseware and additional study references, and sufficient on-the-job experience prior to taking an exam.
Exam Registration

To register for an exam, go to hp.com/certification/learn_more_about_exams.html.
1 Datacenter Products and Technologies Overview

EXAM OBJECTIVES

In this chapter, you learn to:

✓ Understand the components of the HP FlexFabric network architecture.
✓ Describe common datacenter networking requirements.
✓ Position the HP FlexFabric products.
✓ Describe the HP IMC VAN Modules.
INTRODUCTION

This chapter introduces HP's FlexFabric portfolio, and describes how these products can be used to deploy simple, scalable, automated data center networking solutions. Specific data center technologies are also introduced. These include multi-tenant solutions such as MDC, MCE, and SPBM, along with hypervisor integration protocols like EVB and VEPA. Other connectivity solutions include MPLS L2VPN, VPLS, EVI, SPBM, and TRILL.
ASSUMED KNOWLEDGE

Because this course introduces the HP FlexFabric portfolio and datacenter technologies, learners are not expected to have prior knowledge about this topic. It is helpful, however, to be familiar with the requirements and growing trends of modern datacenters.
HP FlexFabric Overview
This chapter provides an overview of the components that are involved in the FlexFabric network architecture. It describes common data center networking requirements, positions HP FlexFabric products, and describes the HP data center technologies.
The World is Moving to a New Style of IT

Many IT functions and systems are continuing to change at a relatively brisk pace. As shown in Figure 1-1, new paradigms arise, such as cloud computing and networking, big data, BYOD, and new security mechanisms, to name a few. With these new paradigms come new challenges and new requirements, influencing how we build networks going forward.
Figure 1-1: The World is Moving to a New Style of IT
■ Cloud: We must understand how to build an agile, flexible, and secure network edge, especially with regard to multi-tenancy.
■ Security: We have to rebuild the perimeter of the network wherever a device connects, without degrading the quality of the business experience.
■ Big Data: We have to enable the network to respond dynamically to real-time data analytics and to deal with the volume of traffic involved.
■ Mobility: We need to simplify the policy model in the campus by unifying wired and wireless networks. In the data center, we need to increase the agility and performance of mobile VMs.
A converged infrastructure can meet these needs by providing several key features, including:
■ A resilient fabric for less downtime and faster VM mobility
■ Network virtualization for faster data center provisioning
■ Software Defined Networking (SDN) to simplify deployment and security, creating business agility and network alignment to business priorities
Apps Are Changing - Networks Must Change

Applications are changing, and the network infrastructure must be capable of handling these new application requirements. One significant trend is a massive increase in virtualization. Almost any service will be offered as a virtualized service, hosted inside a data center. These virtualized services can be in private clouds, a customer's local data center, or public clouds. They might even be offered as a type of hybrid cloud service, which is a mix of private and public clouds.

Inside the data center, the bulk of data traffic is now server-to-server. This is mainly due to the change in application behavior, since (as shown in Figure 1-2) we see much more use of federated applications as opposed to the monolithic application models of the past.
Figure 1-2: Apps Are Changing - Networks Must Change
Previously, companies may have used a single email server that provided multiple functions. In today's environment, companies may instead leverage a front-end server, a business logic server, and a back-end database system. In such a deployment, each client request towards the data center is handled by multiple services inside the data center. This results in similar client-server interactions as in the past, but with increased server-to-server traffic to fulfill those client requests. Also, many storage services and protocols are now being supported by a converged network that handles both traditional client-server traffic and disk storage-related traffic.
Multi-tier Legacy Architecture in the Data Center (DC)

Federated applications and virtualization have changed the way traffic flows through the infrastructure. As packets must be passed between more and more servers, increased latency can impact performance and end-user productivity. Networks must be designed to mitigate these risks, while ensuring a stable, loop-free environment (see Figure 1-3). Network loops in a large data center environment can have severe impacts on the business, so the ability to maintain loop-free paths is of particular importance.
Figure 1-3: Multi-tier Legacy Architecture in the Data Center (DC)
HPN FlexFabric Value Proposition

HP's FlexFabric approach focuses on three customer benefits: the network should be simple, scalable, and automated.

Simple – reducing operational complexity by up to 75%
■ Unified virtual/physical and LAN/SAN fabrics
■ OS/feature consistency; no licensing complexity or cost

Scalable – double the fabric scaling, with up to 500% improved service delivery
■ Non-blocking reliable fabric for 100-10,000 hosts
■ Spine and leaf fabric optimized for Cloud, SDN

Automated – cutting network provisioning time from months to minutes
■ 300% faster time to service delivery, Software-Defined Network Fabric
■ Open, standards-based programmability, SDN App Store and SDK
HP FlexFabric Product Overview

This product overview section begins with a discussion of core and aggregation switches. This is followed by an overview of access switches and the IMC network management systems.
HP FlexFabric Core Switches

Figure 1-4 introduces the current portfolio of HP FlexFabric core switches. This includes the HP FlexFabric 12900, 12500, 11900, and 7904 Switch Series.
Figure 1-4: HP FlexFabric Core Switches
HP FlexFabric 12900 Switch Series

The HP FlexFabric 12900 Switch Series, shown in Figure 1-5, is an exceedingly capable core data center switch. The switch includes support for OpenFlow 1.3, laying a foundation for SDN and investment protection.
Figure 1-5: HP FlexFabric 12900 Switch Series
It provides 36Tbps of throughput in a non-blocking fabric, and supports up to 768 10Gbps ports and up to 256 40Gbps ports. The 12900 series supports Fibre Channel over Ethernet (FCoE) and Data Center Bridging (DCB). The switch allows for In Service Software Upgrades (ISSU) to minimize downtime. Additionally, protocols like TRILL and SPB can be used to provide scalable connectivity between data center sites. All of these functions can be used in conjunction with IRF to offer a redundant, flexible platform.
HP 12500E Switch Series

The HP 12500E Switch Series, shown in Figure 1-6, allows for up to 24Tbps of switching capacity. It is available in 8-slot and 18-slot chassis. It supports very large Layer 2 and Layer 3 address and routing tables, and large data buffers. It allows for up to four units in an IRF system.
Figure 1-6: HP 12500E Switch Series
The HP 12500 Switch Series has been updated, and now supports high-density 10Gbps, 40Gbps, or 100Gbps Ethernet modules, up to 400Gbps per slot. It supports traditional Layer 2 and Layer 3 functions for IPv4 and IPv6. These devices also feature support for more modern protocols, such as MPLS, VPLS, MDC, EVI, and more. Wire-speed services provide a high-performance backbone, while the energy-efficient design lowers operational costs.
HP 12500 Switch Series Overview

Figure 1-7 compares the features and capabilities of the 12500C and 12500E platforms. The 12500C is based on Comware5, while the 12500E is based on Comware7. The use of Comware7 results in enhanced MPU performance.
Figure 1-7: HP 12500 Switch Series Overview
HP FlexFabric 11908 Switch Series

The HP FlexFabric 11900 Switch Series, shown in Figure 1-8, supports up to 7.7Tbps of throughput in a non-blocking fabric. This switch can be a good choice as a data center aggregation switch.
Figure 1-8: HP FlexFabric 11908 Switch Series
HP FlexFabric 7900 Switch Series
The HP FlexFabric 7900 Switch Series, shown in Figure 1-9, is the next-generation compact modular data center core switch. It is based on the same architecture and Comware7 code as the larger chassis-based switches.
Figure 1-9: HP FlexFabric 7900 Switch Series
The feature set includes full support for IRF, TRILL, DCB, EVI, MDC, OpenFlow and VXLAN.
HP FlexFabric Access Switches

The HP 5900 Switch Series, shown in Figure 1-10, can serve as traditional top-of-rack access switches.
Figure 1-10: HP FlexFabric Access Switches
The HP 5900AF Switch Series is available in various models, including versions with 48 1Gbps ports and versions with 48 10Gbps ports, each with 4 x 40Gbps uplink ports. It is also available with 48 1/10Gbps ports and 4 x 40Gbps uplink connections. The 1/10Gbps port version is especially convenient for data centers that are migrating servers from 1Gbps to 10Gbps interfaces.

The 5930 is a Top-of-Rack (ToR) switch with 32 40Gbps ports. This switch could be used to terminate 40Gbps connections from blade server enclosures, or it could be deployed as a distribution or aggregation layer device to concentrate a set of HP 5900 Switch Series switches. Each of the 40Gbps ports can be split out as four 10Gbps ports with a special cable. This means that the 32 40Gbps ports could become 128 10Gbps ports, available in a 1U device.

The "CP" in the 5900CP model stands for Converged Ports. As the name implies, both Fibre Channel over Ethernet (FCoE) and native Fibre Channel (FC) are supported in a single, converged ToR access switch. All of the 5900 Switch Series switches shown here support FCoE, but only the 5900CP also supports native FC connectivity. The module installed in each port determines whether that port functions as a 10Gbps FCoE port or as an 8Gbps FC port. The 5900CP supports FCoE-to-FC gateway functionality.

The HP FlexFabric 5900v is a virtual switch that can be installed as a replacement for the VMware switch on a hypervisor. The 5900v is based on the VEPA protocol, which means that it does not switch inter-VM traffic locally. Instead, inter-VM traffic is sent to an external ToR switch to be serviced. This is why the 5900v must be deployed in combination with a physical switch that also supports the VEPA protocol. All Comware7-based 5900-series switches support VEPA.

HP blade enclosures can have interconnects installed. These interconnects must match the physical form factor of the blade enclosure. The HP 6125 XLG can provide this blade server interconnectivity.
This switch belongs to the HP 5900 Switch Series family of switches, as it provides 10Gbps access ports for blade servers, along with 4 x 40Gbps uplink ports. As a Comware7-based product, the 6125 XLG can be configured with the same protocols and features as traditional HP 5900 Switch Series switches. For example, features like FCoE and IRF are supported. This means that multiple 6125 XLG switches in the same blade enclosure can be grouped together as a single virtual IRF system. It also supports VEPA, and so can work with the 5900v switch running on a hypervisor.
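The 40Gbps-to-4x10Gbps breakout described above is configured per port. As an illustrative Comware7-style sketch (the interface numbers are hypothetical, and the exact syntax and any reboot requirement vary by model and software release):

```
system-view
interface FortyGigE 1/0/1
 using tengige
```

After the split, the single 40Gbps interface is replaced by four 10Gbps breakout interfaces (for example, Ten-GigabitEthernet1/0/1:1 through Ten-GigabitEthernet1/0/1:4), each configured like any other 10Gbps port.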
HP FlexFabric 5930 Switch Series
The HP FlexFabric 5930 Switch Series, shown in Figure 1-11, is built on the latest generation of ASICs, and so includes hardware support for VXLAN and NVGRE. VXLAN is an overlay virtualization technology that is largely promoted by VMware. NVGRE is an overlay technology that is largely promoted by Microsoft and used in their Hyper-V product.
Figure 1-11: HP FlexFabric 5930 Switch Series
Since the HP FlexFabric 5930 Switch Series has hardware support for both technologies, both types of overlay networks can be interconnected with traditional VLANs, with support for OpenFlow and SDN. With 32 40Gbps ports, it is suitable as a component in large-scale spine or leaf networks that can leverage IRF and TRILL.
HP FlexFabric 5900CP Converged Switch

The HP FlexFabric 5900CP supports 48 x 10Gbps converged ports. As shown in Figure 1-12, support for 4/8Gbps FC or 1/10Gbps Ethernet is available on all ports. It supports HP's universal converged optic transceivers. The optic installed in each port determines whether that port will function as a native FC port or as an Ethernet port. The converged optic is a single device that can be configured to operate as either of the two. This means that the network administrator can easily change the operational mode of the physical interface via CLI configuration, eliminating the need to unplug and reconnect transceivers.
Figure 1-12: HP FlexFabric 5900CP Converged Switch
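As a hedged sketch of the CLI-based mode change mentioned above (the port number is hypothetical, and command availability depends on the 5900CP software release), a converged port could be switched from Ethernet to native FC operation in interface view:

```
system-view
interface Ten-GigabitEthernet 1/0/10
 port-mode fc
```

Once converted, the port is managed as an FC interface and can be given FC attributes (such as an operating mode and VSAN membership), without any transceiver being unplugged.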
FlexFabric 5700 Datacenter ToR Switch

The HP FlexFabric 5700 Top-of-Rack switch is available in various combinations of 1Gbps and 10Gbps port configurations with 10Gbps or 40Gbps uplinks, as shown in Figure 1-13. This relatively new addition to the FlexFabric family offers Layer 2 and lite Layer 3 support, and IRF support for up to nine switches to simplify management operations.
Figure 1-13: FlexFabric 5700 Datacenter ToR Switch
The 5700 switch series delivers 960Gbps switching capacity and is SDN-ready.
HP HSR6800 Router Series
The HP HSR6800 Router Series, shown in Figure 1-14, provides comprehensive routing, firewall, and VPN functions. It uses a 2Tbps backplane to support 420Mpps of routing throughput. This is a high-density WAN router that can support up to 31 10Gbps Ethernet ports and is 40/100Gbps ready.
Figure 1-14: HP HSR6800 Router Series
Two of these carrier-class devices can be grouped into an IRF team to operate as a single, logical router entity. This eases configuration and change management, and eliminates the need for other redundancy protocols like VRRP.
Virtual Services Router

The Virtual Services Router (VSR) can be seen as a network function virtualization (NFV) technology. The VSR is easy to deploy on any branch, data center, or cloud infrastructure (see Figure 1-15 for more information). It is based on Comware7 and can be installed on a hypervisor, such as VMware ESXi or Linux KVM.
Figure 1-15: Virtual Services Router
The VSR makes it very easy and convenient to support a multi-tenant data center. New router instances can be quickly deployed inside the hosted environment to provide routed functionality for a specific customer solution. VSR comes in multiple versions, with various licensing options to provide more advanced capabilities.
IMC VAN Fabric Manager

Basic data center management of devices is handled by IMC. The VAN Fabric Manager (VFM) is a software module that can be added to IMC. This module adds advanced traffic management capabilities for many data center protocols, such as SPB, TRILL, and IRF. Storage protocols such as DCB and FCoE are also supported (see Figure 1-16).
Figure 1-16: IMC VAN Fabric Manager
It also manages data center interconnect protocols such as EVI, and provides zoning services for converged storage management. You can easily view and manage information about VM migrations. VM migration records include the VM name, the source and destination server, the start and end times for the migration, and the name of the EVI service to which the VM belongs. You can also perform a migration replay, which allows you to play back the migration process, viewing the source, destination, and route of a migration in a video.
HP FlexFabric Cloud: Virtualized DC Use Case

Figure 1-17 shows an example of an HP FlexFabric deployment. At the access layer, 5900v's are deployed inside a blade server hypervisor environment, in conjunction with 5900-series switches with VEPA support.
Figure 1-17: HP FlexFabric Cloud: Virtualized DC Use Case
With a deployment of HP blade systems, the 6125 XLGs can be used for interconnectivity. In this scenario the access layer is directly connected to the core, which could be comprised of 12900 or 11900-series devices. Connectivity to remote locations can be provided by the HSR 6800 router, and the entire system can be managed from a single pane-of-glass with HP’s IMC. Additional insight and management for data center specific technologies can be provided by the addition of the VFM module for IMC.
Data Center Technologies Overview

The data center may provide support for multiple tenants, with multiple infrastructures co-existing in an independent way. The data center should also have support for Ethernet fabric technologies to provide interconnect between all the switches, as well as converged FC/FCoE support. This fabric should integrate with hypervisor environments. Also, data center interconnect and network overlay technologies are used to connect several multi-tenant data centers together in a scalable, seamless way.
Overview of DC Technologies

Figure 1-18 provides an overview of data center technologies and generalizes where these technologies are deployed.
Figure 1-18: Overview of DC Technologies
■ Multi-tenant support is provided by technologies such as MDC, MCE, and SPBM.
■ Hypervisor integration is provided by the EVB and VEPA protocols, along with the 5900v switch product.
■ Overlay networking solutions are provided by VXLAN and SDN.
■ Data center interconnect technologies include MPLS L2VPN, VPLS, EVI, and SPBM.
■ OpenFlow technology can be used to understand, define, and control network behavior.
■ Large-scale Layer 2 Ethernet fabrics can be deployed using traditional link aggregation along with TRILL or SPBM.
■ IRF or Enhanced IRF can be used to improve manageability and redundancy in the Ethernet fabric.
■ Storage and Ethernet technologies can be converged with switches that support DCB, FCoE, and native FC.
Multi-tenant Support

Multi-tenancy support involves the ability to support multiple business units, customers, and services over a common infrastructure. This data center infrastructure must provide techniques to isolate multiple customers from each other.
Multi-tenant Isolation

Several isolation techniques are available, in two general categories. Physical isolation is one solution. However, this solution is less scalable due to the cost of purchasing separate hardware for each client, as well as the space, power, and cooling concerns. With logical isolation, isolated services and customers share a common hardware infrastructure. This reduces initial capital expenditures and improves return on investment.
Multi-tenant Isolation with MDC and MCE

One isolation technique is Multi-tenant Device Context (MDC). This technology creates a virtual device inside a physical device. This ensures customer isolation at the hardware layer, since ASICs or line cards are dedicated to each customer. Since each MDC has its own configuration file, with separate administrative logins, isolation at the management layer is also achieved. There is also isolation of control planes, since each MDC runs its own path selection protocols, such as TRILL, SPB, OSPF, or STP. Isolation at the data plane is achieved through separate routing tables and Layer 2 MAC address tables.

Another technology, which provides Layer 3 routing isolation, is Multi-Customer Edge (MCE). This is also known in the market as Virtual Routing and Forwarding (VRF). With VRF, separate virtual routing instances can be defined in a single physical router. This technology maintains separate routing functionality and routing tables for each customer. However, the platform's hardware limitations still apply. For example, ten MCEs might be configured on a device that has a hardware limit of 128,000 IPv4 routes. In this scenario, all ten customer MCE routing tables must share that 128,000-entry maximum.

Unlike MDC, which allows for different management planes per customer, MCE features a single management plane for all customers. In other words, a single administrator configures and manages all customer MCE instances.
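To make the contrast concrete, the following Comware7-style sketch shows both techniques side by side. The tenant name, slot, interfaces, and addresses are hypothetical, and exact commands vary by platform and release:

```
# MDC: a hardware-isolated virtual device with its own management plane
mdc CustomerA
 location slot 2
 allocate interface Ten-GigabitEthernet 2/0/1 to Ten-GigabitEthernet 2/0/8
 mdc start
#
# MCE/VRF: a per-tenant routing table on a shared, single-management device
ip vpn-instance CustomerA
 route-distinguisher 100:1
#
interface Vlan-interface 10
 ip binding vpn-instance CustomerA
 ip address 10.1.1.1 24
```

The MDC receives dedicated interfaces and is then administered as if it were a separate switch, while the MCE instance only separates the routing table; one administrator still manages all tenants.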
Multi-tenant Isolation for Layer 2

VLANs are the traditional method used to isolate Layer 2 networks, and this remains a prominent technology in data centers. However, the 4094 VLAN maximum can be a limiting factor for large, multi-tenant deployments. Another difficulty is preventing
each client from using the same set of VLANs.

QinQ technology alleviates some of these concerns. Each customer has their own set of 4094 VLANs, using a typical 802.1Q tag. An outer 802.1Q tag is added, which is unique to each client. The data center uses this unique outer tag to move frames between customer devices. Before the frame is handed off to the client, the outer tag is removed. A limitation of this technique involves the MAC address table. All customer VLANs traverse the provider network with a common outer 802.1Q tag, and therefore all client VLANs share the same MAC address table. This can increase the odds of MAC address collision – multiple devices that use the same address.

Another option is Shortest Path Bridging using MAC-in-MAC mode (SPBM). SPBM can also isolate customers, similar to QinQ. Unlike QinQ, SPBM creates a new encapsulation, with the original customer frame as the payload of the new frame. This new outer frame includes a unique customer service identifier, providing a highly scalable solution. SPBM supports up to 16 million service identifiers, and each of those 16 million customers can have their own set of 4094 VLANs. A common outer VLAN identifier tag can be used for all client VLANs, as with QinQ. Alternatively, different customer VLANs can use different identifiers. Compared to QinQ, SPBM provides increased scalability while limiting the issue of MAC address collision.

Virtual eXtensible LAN (VXLAN) is another technology that provides a virtualized VLAN for hypervisor environments. A Virtual Machine (VM) can be assigned to a VXLAN, and use it to communicate with other VMs in the same VXLAN. This technology requires some integration with traditional VLANs via a hardware gateway device; this functionality can be provided by the HP Comware-based 5930 switch. VXLAN supports up to 16 million VXLAN IDs, so it is quite scalable. However, VXLAN provides a single VXLAN ID space.
While SPBM could be used to encapsulate 4094 traditional VLANs into a single customer service identifier, with VXLAN, a customer with 100 VLANs would use 100 VXLAN IDs. For this reason, some planning is required to ensure that each client uses a unique range of VXLAN IDs.
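A minimal QinQ sketch in Comware style may help illustrate the outer-tag behavior described above (the VLAN and port numbers are hypothetical, and syntax varies by platform and release):

```
# Service VLAN 100 carries all of this customer's traffic across the provider core
vlan 100
#
interface Ten-GigabitEthernet 1/0/1
 port access vlan 100
 qinq enable
```

Tagged customer frames entering this port keep their inner 802.1Q tag and gain outer tag 100; the outer tag is stripped again before frames are handed back to the customer.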
Network Overlay Functions

Network overlay functions provide a virtual network for a specific, typically VM-based, service. Software Defined Networking (SDN) can be considered a network overlay function, since it can centralize the control of traffic flows between devices, virtual or otherwise. VXLAN is an SDN technology that can provide overlay networks for VMs. Each VM can be assigned to a unique VXLAN ID, as opposed to a physical, traditional VLAN ID. HP is developing solutions to integrate SDN and VXLAN. This will enable interconnectivity between VXLAN-assigned virtual services and physical hosts.
SDN: Powering Your Network Today and Tomorrow
SDN can be used to control the network behavior inside the data center. As shown in Figure 1-19, the SDN architecture consists of the infrastructure, control, and application layers.
Figure 1-19: SDN: Powering Your Network Today and Tomorrow
The infrastructure layer consists of overlay technologies such as VXLAN or NVGRE, or it can consist of devices that support OpenFlow. The control plane is to be delivered by the HP Virtual Application Network (VAN) SDN controller. This controller will be able to interact with VXLAN and OpenFlow-enabled devices. It will have the ability to be directly configured, or to be controlled by an external application, such as automation, cloud management, or security tools. The HP SDN App Store will provide centralized availability for SDN-capable applications. Load balancing will also be provided.
Data Center Ethernet Fabric Technologies
This section will focus on Ethernet fabric technologies for the data center. An Ethernet fabric should provide a high-speed Layer 2 interconnect with efficient path selection. It should also provide scalability to enable ample bandwidth and link utilization.
Data Center Ethernet Fabric Technologies 2
IRF combines two or more devices into a single, logical device. IRF systems can be deployed at each layer in the data center. For example, they are often deployed at the core layer of a data center, and could also be used to aggregate access layer switches. Servers could also be connected to IRF systems at the access layer. These layers can be interconnected by traditional multi-chassis link aggregations, which provide an active-active redundancy solution. Each IRF system is managed as an independent entity. If a customer has 200 physical access switches, they could be grouped into 100 IRF systems, each containing two physical switches. If a new VLAN must be defined, it must be defined on each of the 100 IRF systems.
Enhanced IRF (EIRF) is the next generation of IRF technology, allowing 100 or more devices to be grouped into a single logical device. Enhanced IRF can combine multiple layers into a single logical system. For instance, several aggregation and access layer switches can be combined into a single logical device. Like traditional IRF, this provides a relatively easy active-active deployment model. However, with Enhanced IRF a large set of physical devices is perceived as a single, very large switch with many line cards. If 100 physical switches are combined into a single EIRF system, they are all managed as a single entity. If a new VLAN must be defined, it only needs to be defined one time, as opposed to multiple times with traditional IRF. Also, EIRF eliminates the need to configure multi-chassis link aggregations as inter-switch links.
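A two-member IRF system of the kind described above is typically built with a few commands per member. The sketch below is hedged: the member number, priority, and port name are illustrative, and the exact procedure (including reboots and member renumbering) varies by platform:

```
# Member 1 of a two-switch IRF fabric (ports illustrative):
system-view
irf member 1 priority 32          # higher priority -> preferred master
irf-port 1/1
 port group interface Ten-GigabitEthernet1/0/49   # physical IRF link
 quit
irf-port-configuration active     # activate the IRF port bindings
```

The second switch is configured symmetrically (as member 2 with irf-port 2/2, for example); once the IRF links come up, the two chassis merge into one logical switch with a single configuration.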
Data Center Ethernet Fabric Technologies 3
IRF and EIRF offer a compelling, HP Comware-based solution for building an Ethernet fabric. TRILL and SPBM offer other, standards-based technologies for data center connectivity. HP Comware IRF or EIRF technology can provide switch and link redundancy while connecting to a standards-based TRILL or SPBM fabric.
TRILL ensures that the shortest path for Layer 2 traffic is selected, while allowing maximum, simultaneous utilization of all available links. For example, two server access switches could connect to multiple aggregation switches and also be directly connected to each other. Traffic between servers on the two switches can use the direct connection between the two switches, while other traffic uses the access-to-aggregation switch links. This is an advantage over traditional STP-based path selection, which would require one of the links (likely the access-access connection) to be disabled for loop prevention.
TRILL can also take advantage of this active-active, multi-path connectivity for cases when switches have, say, four uplinks between them. The traffic will be load-balanced over all equal-cost links. This load balancing can be based on source/destination MAC address pairs, or on source/destination IP addressing. A limitation of TRILL is the fact that it supports a single VLAN space only. While TRILL provides for very efficient traffic delivery, it remains limited by the 4094-VLAN maximum.
SPBM is similar to TRILL in its ability to leverage routing-like functionality for efficient Layer 2 path selection. Compared to TRILL, SPBM offers a more deterministic method of providing load sharing over multiple equal-cost paths. This allows the administrator to engineer specific paths for specific customer traffic. SPBM also offers the potential for greater scalability than TRILL. This is because SPBM supports multiple VLAN spaces, since each customer’s traffic is uniquely tagged with a service identifier in the SPBM header. RFC 7172 (TRILL fine-grained labeling) is a relatively recent standard that gives TRILL a comparable capability, allowing the use of a 24-bit identifier, as opposed to the current 12-bit VLAN ID. This will allow greater scalability for multiple tenants.
This feature is not currently supported on HP Comware switches.
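Basic TRILL operation of the kind described above is enabled globally and then per fabric-facing interface. A hedged Comware-style sketch (interface name illustrative; verify the exact commands against your platform's TRILL configuration guide):

```
# Enable TRILL globally, then on the fabric-facing ports:
system-view
trill                               # enable the TRILL process
quit
interface Ten-GigabitEthernet1/0/1
 trill enable                       # make this an RBridge port
```

Once enabled, the switches (RBridges) compute shortest paths with a link-state protocol and load-balance across equal-cost links automatically.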
Server Access Layer – Hypervisor Networking
Hypervisor networking is supported at the access layer of a data center deployment, in the form of VEPA and EVB. These technologies enable integration between virtual and physical environments. For example, the HP Comware-based 5900v provides a replacement option for the hypervisor’s own built-in software vSwitch. The 5900v sends inter-VM traffic to an external, physical switch for processing. This external switch must support VEPA technology to be used for this purpose.
Typically, most inter-VM traffic is handled by a physical switch anyway, since there are typically multiple ESX hosts. Traffic between VMs hosted by different ESX platforms is handled by an external physical switch. Only inter-VM traffic on the same ESX host is handled by that host’s internal vSwitch. The VEPA/EVB model ensures a more consistent traffic flow, since all inter-VM traffic passes through an external switch. This results in greater visibility and insight into inter-VM traffic flow. Traditional network analysis tools and port mirroring tools are thus capable of detailed traffic inspection and analysis.
Server Access Layer – Converged Storage
Storage convergence means that a single infrastructure supports native Fibre Channel (FC), Fibre Channel over Ethernet (FCoE), and iSCSI. With Fibre Channel technology, a physical Host Bus Adapter (HBA) is installed in each server to provide access to storage devices. To ensure lossless delivery of storage frames, FC uses a buffer-to-buffer credit system for flow control. A separate Ethernet interface is installed in the server to perform traditional Ethernet data communications.
FCoE is a technology that provides traditional FC and 10Gbps Ethernet support over a single Converged Network Adapter (CNA). The server’s application layer continues to perceive a separate adapter for each of these functions. Therefore, the CNA must accept traditional FC frames, encapsulate them in Ethernet, and send them over the converged network fabric. A suite of Data Center Bridging (DCB) protocols enhances the Ethernet standard. This ensures the lossless frame delivery that is required by FC.
iSCSI encapsulates traditional SCSI protocol communications inside a TCP/IP packet, which is then encapsulated in an Ethernet frame. The iSCSI protocol does not require that Ethernet be enhanced by DCB or any other special protocol suite. Instead, capabilities inherent to the TCP/IP protocol stack mitigate packet loss issues. However, enterprise-class iSCSI deployments should have robust QoS capabilities and hardware switches with enhanced buffer capabilities. This will help to ensure that iSCSI frame delivery is reliable, with minimal retransmissions.
Although DCB was originally developed to ensure lossless delivery for FCoE, it can also be used for iSCSI deployments. This minimizes frame drop and retransmission issues.
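The lossless behavior DCB provides is built largely on Priority-based Flow Control (PFC). A hedged Comware-style sketch follows; the interface and the dot1p value are illustrative (priority 3 is the common FCoE default, but verify against your design):

```
# Treat traffic in 802.1p priority 3 as lossless on a converged port:
system-view
interface Ten-GigabitEthernet1/0/5
 qos trust dot1p                        # classify on the incoming 802.1p value
 priority-flow-control auto             # negotiate PFC with the peer via DCBX
 priority-flow-control no-drop dot1p 3  # pause, rather than drop, priority 3
```

With this in place, congestion in the storage priority generates per-priority pause frames instead of drops, while ordinary data traffic in other priorities is unaffected.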
Server Access Layer – FC/FCoE
The 5900CP provides native FC fabric services. Since it provides both FCoE and native FC connections, it can act as a gateway between native FC and FCoE environments. In addition to this FC-FCoE gateway service, other deployment scenarios are supported by the HP 5900CP. It can be used to interconnect a collection of traditional FC storage and server devices, or to connect a collection of FCoE-based systems. Multiple Fibre Channel device roles are supported. The 5900CP can fill the FCF role to support full fabric services. It can also act as an NPV node to support N_Port ID Virtualization.
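The FCF role described above can be sketched in Comware-style CLI. This is a hedged outline only; the VSAN number and interface names are illustrative, and the full procedure (zoning, FIP settings, and so on) is omitted:

```
# Run the switch as a full FCoE Forwarder (FCF) and bind a virtual
# FC interface to a converged Ethernet port:
system-view
fcoe-mode fcf                # alternatives include npv, per the design
vsan 10
quit
interface vfc 1
 bind interface Ten-GigabitEthernet1/0/10   # CNA-facing port
 port trunk vsan 10                         # carry VSAN 10 on this VFC
```

In NPV mode the same switch would instead proxy fabric logins upstream to an external FCF, which keeps domain-ID consumption low in large fabrics.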
Data Center Interconnect Technologies
Data center interconnect technologies allow customer services to be interconnected across multiple data center sites. Two data center locations could be deployed, or multiple data centers could be spread over multiple locations for additional scalability and redundancy. These technologies typically require options for path redundancy and scalable Layer 2 connectivity between the data centers. This ensures that all customer requirements can be met, such as the ability to move VMs to different physical hosts via technologies such as VMware’s vMotion.
Data Center Interconnect Technologies 2
Data centers can be connected using a traditional Layer 2 connection. This could be dark fiber connectivity between two sites, or some other connectivity available from a service provider. Once these physical connections are established, traditional VLAN trunk links and link aggregation can be configured to connect core devices at each site.
MPLS L2VPN is typically offered and deployed by a service provider, although some larger enterprises may operate their own internal MPLS infrastructure. Either way, L2VPN tunnels can be established to connect sites over the MPLS fabric.
In this way, MPLS L2VPN provides a kind of “pseudo wire” between sites. It is important to note that this connection lacks the intelligence to perform MAC learning or other Layer 2 services. It is simply a “dumb” connection between sites.
Data Center Interconnect Technologies 3
MPLS Virtual Private LAN Service (VPLS) is another option that is typically deployed by a service provider. Some enterprises may have their own MPLS infrastructure, over which they may wish to deploy a VPLS solution. Unlike MPLS L2VPN, VPLS has the intelligence to perform traditional Layer 2 functions, such as MAC learning for each connected site. Therefore, when a device at one location sends a unicast frame into the fabric, it can be efficiently forwarded to the correct site. This is more efficient than having to flood the frame to all sites.
Ethernet Virtual Interconnect (EVI) is an HP proprietary technology to interconnect data centers with Layer 2 functionality. This technology enables L2VPN- and VPLS-style transport without the need for an underlying MPLS infrastructure. Any typical IP routed connection between the data centers can be used to interconnect up to eight remote sites. The advantage of EVI is that it is very easy to configure compared to MPLS, which requires expertise with several technologies, including IP backbone technologies, label switching, and routing. EVI also makes it easy to optimize the Ethernet flooding behavior.
Summary
In this chapter, you learned that HP’s FlexFabric provides a simple, scalable, automated approach to data center networking solutions. You also learned that HP’s FlexFabric product portfolio includes core switches like the 12900, 12500, 11900, and 7904. It also includes 5900AF, 5930, 5900CP, 5900v, and 6125XLG access switches. For routing, the HSR 6800 and VSR are available. Improved visibility and management functions for TRILL/SPB and FCoE/FC are available with the IMC VAN fabric manager product. You also learned that:
■ Technologies that support multi-tenant solutions include MDC, MCE, and SPBM. Hypervisor integration is provided by EVB and VEPA.
■ Overlay solutions include VXLAN and SDN, while data center interconnect technologies include MPLS L2VPN, VPLS, EVI, and SPBM.
■ Large-scale Layer 2 fabrics can be deployed using TRILL or SPBM, with IRF and EIRF providing improved manageability and redundancy. ■ The HP data center portfolio can create converged network support with DCB, FCoE, and native FC.
Learning Check
Answer each of the questions below.
1. HP’s FlexFabric includes which of the following components (choose all that apply)?
a. Core switches
b. Aggregation switches
c. MSM 4x0-series access points
d. Access switches
e. The 5900CP converged switch
f. Both physical and virtual services routers
g. HP’s IMC management platform
2. The IMC VAN fabric manager provides which three capabilities (choose three)?
a. Unified SPB, TRILL, and IRF fabric management
b. VPN connectivity and performance management
c. VXLAN system management
d. Unified DCB, FCoE, and FC SAN management
e. EVI protocol management for data center interconnects
f. Switch and router ACL configuration management
3. Which two statements are true about multi-tenant isolation for Layer 2?
a. VLANs provide a traditional method to isolate Layer 2 networks that is limited to 4094 VLANs
b. With QinQ technology, up to 256 customers can each have their own set of 4094 isolated VLANs
c. DCB is an overlay technology that allows a converged infrastructure
d. Shortest Path Bridging MAC-in-MAC mode can support 16 million isolated customers through the use of an I-SID
4. Which technology can extend a Layer 2 VLAN across multiple data centers using a Layer 3 technology?
a. DCB
b. EIRF
c. SDN
d. TRILL
e. VXLAN
Learning Check Answers
1. a, b, d, e, f, g
2. a, d, e
3. a, d
4. e
2 Multitenant Device Context
EXAM OBJECTIVES
In this chapter, you learn to:
✓ Describe MDC features.
✓ Explain MDC use cases.
✓ Describe MDC architecture and operation.
✓ Describe support for MDC on various hardware platforms.
✓ Understand firmware updates and ISSU with MDC.
✓ Describe supported IRF configurations with MDC.
INTRODUCTION Multitenant Device Context (MDC) is a technology that can partition a physical device or an IRF fabric into multiple logical switches called "MDCs." Each MDC uses its own hardware and software resources, runs independently of other MDCs, and provides services for its own customer. Creating, starting, rebooting, or deleting an MDC does not affect any other MDC. From the user's perspective, an MDC is a standalone device.
MDC Overview
Multitenant Device Context (MDC) can partition either a single physical device or an IRF fabric into multiple logical switches called "MDCs." With MDC, physical networking platforms, such as HP 11900, 12500, and 12900 switches, can be virtualized to support multitenant networks. In other words, MDC provides customers with 1:N device virtualization capability to virtualize one physical switch into multiple logical switches, as shown in Figure 2-1.
Figure 2-1: Feature overview
Other benefits of MDC include:
■ Complete separation of control planes, data planes, and forwarding capabilities.
■ No additional software license required to enable MDC.
■ Reduced power, cooling, and space requirements within the data center.
■ Up to 75% reduction of devices and cost when compared to deployments without 1:N device virtualization.
■ Modification of interface allocations without stopping MDCs.
IRF versus MDC
What is the difference between MDC and technologies like IRF? The main difference is that with IRF (N:1 virtualization), you are combining multiple physical devices into one logical device. With MDC, on the other hand (1:N virtualization), you are splitting either a single device or a logical IRF device into separate, discrete logical units.
The reason for doing this is to provide network features such as VLANs, routing, and IRF to different entities (customers, development networks) while still using the same hardware. Customers can also be given different feature sets inside the same logical "big box" device. Each of the MDCs operates as a totally independent device inside the same physical device (or IRF fabric). Instead of buying additional core switches for different customers or business units, a single core switch or IRF fabric can be used to provide the same hardware feature set to multiple customers or business units.
MDC Features
Each MDC uses its own hardware and software resources, runs independently of other MDCs, and provides services for its own environment. Creating, starting, rebooting, or deleting an MDC does not affect the configuration or service of any other MDC. From the user's perspective, an MDC is a standalone device.
Each MDC is isolated from the other MDCs on the same physical device and cannot communicate with them via the switch fabric. To allow two MDCs on the same physical device to communicate with each other, you must physically connect a port allocated to one MDC to a port allocated to the other MDC using an external cable. It is not possible to make a connection between MDCs over the backplane of the switch.
Each MDC has its own management, control, and data planes, with resource tables the same size as those of the physical device. For example, if the device has a 64-KB space for ARP entries, each MDC created on the device gets a separate 64-KB space for its own ARP entries. Management of MDCs on the same physical device is done via the default MDC (admin MDC), or via management protocols such as SSH or Telnet.
MDC Applications
MDC can be used for applications such as the following:
■ Device renting
■ Service hosting
■ Staging of a new network on production equipment
■ Testing features such as SPB and routing that cannot be configured on a single device
■ Student labs
Instead of purchasing new devices, you can configure more MDCs on existing network devices to expand the network. As an initial example, in Figure 2-2 a service provider provides access services to three companies, but only deploys a single physical device (or IRF stack). The provider configures an MDC for each company on the same hardware device to logically create three separate devices.
Figure 2-2: MDC application example
The administrators of each of the three companies can log into their allocated MDC to maintain their own network without affecting any other MDC. The result is the same as deploying a separate gateway for each company. Additional use cases will be discussed later in this chapter.
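The three-company scenario above can be sketched in Comware 7 CLI. This is a hedged outline: the MDC name, slot number, and interface range are illustrative, and the exact interface-allocation rules (for example, whole port groups at a time) depend on the platform:

```
# On the default (Admin) MDC: create a user MDC, authorize a line
# card, assign ports, and start it.
system-view
mdc CompanyA
 location slot 2                 # authorize the MDC to use the LPU in slot 2
 allocate interface Ten-GigabitEthernet2/0/1 to Ten-GigabitEthernet2/0/8
 mdc start                       # boot the new MDC
 quit
switchto mdc CompanyA            # enter the new MDC's own CLI
```

From `switchto` onward, Company A's administrator sees what looks like a freshly initialized standalone switch and can configure VLANs, routing, and management access without touching any other MDC.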
MDC Benefits Overview
Higher utilization of existing network resources and fewer hardware upgrade costs: Instead of purchasing new devices, you can configure more MDCs on existing network devices to expand the network. For example, when there are more user groups, you can configure more MDCs and assign them to the user groups. When there are more users in a group, you can assign more interfaces and other resources to the group.
Lower management and maintenance cost: Management and maintenance of multiple MDCs occur on a single physical device.
Independence and high security: Each MDC operates like a standalone physical device. It is isolated from other MDCs on the same physical device and cannot directly communicate with them. To allow two MDCs on the same physical device to communicate, you must physically connect a cable from a port allocated to one MDC to another port allocated to the other MDC.
MDC Features
An MDC can be considered a standalone device. Creating, running, rebooting, or deleting an MDC does not affect the configuration or service of any other MDC. This is because of Comware v7's container-based, OS-level virtualization technology, as shown in Figure 2-3.
Figure 2-3: Feature overview
Each MDC is a new logical device defined on the existing physical device. The physical device could either be a single switch or an IRF fabric. A traditional switching device has its own control, management and data planes. When you define a new MDC, the same features and restrictions of the physical
device will apply to the new MDC, and the new MDC will have separate control and management planes. Each MDC has a separate telnet server process, separate SNMP process, separate LACP process, separate OSPF process, and so on. In addition, each MDC will also have an isolated data plane. This means that the VLANs defined in one MDC are totally independent of the VLANs defined in a different MDC. As an example, MDC1 can have VLANs 10, 20 and 30 configured. MDC2 can also have VLANs 10, 20 and 30 configured, but there is no communication between VLAN 10 on MDC1 and VLAN 10 on MDC2.
Each MDC also has its own hardware limits. This is because resources are assigned to MDCs down to the ASIC level. A switch configured without multiple MDCs has a limit of 4094 VLANs in the overall chassis. However, once a new MDC is created, ASICs and line cards within the physical device are assigned to the new MDC and can be programmed by the new management and control plane. Each MDC is a new logical device inside the physical device and has a separate limit of 4094 VLANs. Other features, such as the number of VRFs supported, are also set per MDC, and what is configured in one MDC does not affect other MDCs' limits. In other words, if you have 4 MDCs on a chassis, the total chassis will support 4 times the hardware and software limits of the same chassis with a single MDC or a traditional chassis. As an example, rather than supporting only 4094 VLANs, 4 x 4094 VLANs are supported, for a total of 16,376 VLANs (4094 per MDC across 4 MDCs).
MDCs share and compete for CPU resources. If an MDC needs a lot of CPU resources while the other MDCs are relatively idle, that MDC can access more CPU resources. If all MDCs need a lot of CPU resources, the physical device assigns CPU resources to MDCs according to their CPU weights. Use the limit-resource cpu weight command to assign CPU weights to user MDCs.
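The CPU weight mechanism can be sketched as follows. This is a hedged example; the MDC names and weight values are illustrative, and weights only influence scheduling when MDCs actually contend for CPU:

```
# Give DevTest half the scheduling weight of Production during
# CPU contention (values illustrative):
system-view
mdc Production
 limit-resource cpu weight 10
 quit
mdc DevTest
 limit-resource cpu weight 5
```

With these settings, an idle system still lets DevTest burst to whatever CPU it needs; only under contention does Production receive roughly twice DevTest's share.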
Supported Platforms
MDC is supported on chassis-based platforms running the HP Comware 7 operating system. MDC is not supported by the HP Comware 5 operating system. As an example, the 12500 series switches require main processing units (MPUs) running HP Comware 7 and not HP Comware 5. This also applies to the HP 10500 series switches. See Figure 2-4 for supported platforms.
Figure 2-4: Supported platforms
MDC is only available on chassis-based switches and not fixed-port switches. This is due to the processing and memory requirements of running separate virtual switches within the same physical switch. If you configured three MDCs, that would require 3 x LACP processes, 3 x BGP processes, 3 x OSPF processes, 3 x telnet processes, and so on. Fixed-port switches do not have enough memory to run multiple MDCs and create separate instances of all processes. In contrast, chassis-based switches have the HP Comware operating system installed on the Main Processing Unit (MPU) and may also have the HP Comware operating system running on the line cards or Line Processing Units (LPUs) with their own memory. Chassis-based switches have more memory and can therefore run multiple MDCs.
All MDC-capable devices have a "default MDC" or "admin MDC." The default MDC can access and manage all hardware resources. User MDCs can be created, managed, or deleted via the default MDC. The default MDC is system predefined and cannot be created or deleted. The default MDC always uses the name "Admin" and the ID 1.
The number of MDCs available depends on the Main Processing Unit (MPU) capabilities and switch generation. The supported number of MDCs ranges from four to nine:
■ The 11900 and 12500 switch series support four MDCs.
■ The HP FlexFabric 12900 switch series supports nine MDCs. This is because the switch has enhanced memory capabilities.
Note
When you configure MDCs, follow these restrictions and guidelines:
■ Only MPUs with 4-GB or 8-GB memory space support configuring MDCs.
■ The MDC feature and the enhanced IRF feature (4-chassis IRF) are mutually exclusive. When using MDC, the IRF fabric is currently limited to 2 nodes.
■ The number of MDCs supported per LPU differs depending on LPU memory. Refer to Table 2-1 through Table 2-5 below for a summary and for SKUs with LPU memory.
Note
The product details shown below are for reference only.
Table 2-1: MDCs support per device and LPU
Table 2-2: LPUs with 512MB Memory

SKU     Description
JC068A  HP 12500 8-port 10-GbE XFP LEC Module
JC065A  HP 12500 48-port Gig-T LEC Module
JC476A  HP 12500 32-port 10-GbE SFP+ REC Module
JC069A  HP 12500 48-port GbE SFP LEC Module
JC075A  HP 12500 48-port GbE SFP LEB Module
JC073A  HP 12500 8-port 10-GbE XFP LEB Module
JC074A  HP 12500 48-port Gig-T LEB Module
JC064A  HP 12500 32-port 10-GbE SFP+ REB Module
JC070A  HP 12500 4-port 10-GbE XFP LEC Module

Table 2-3: LPUs with 1G Memory

SKU     Description
JC068B  HP 12500 8-port 10GbE XFP LEC Module
JC069B  HP 12500 48-port GbE SFP LEC Module
JC073B  HP 12500 8-port 10GbE XFP LEB Module
JC074B  HP 12500 48-port Gig-T LEB Module
JC075B  HP 12500 48-port GbE SFP LEB Module
JC064B  HP 12500 32-port 10GbE SFP+ REB Module
JC065B  HP 12500 48-port Gig-T LEC Module
JC476B  HP 12500 32-port 10-GbE SFP+ REC Module
JC659A  HP 12500 8-port 10GbE SFP+ LEF Module
JC660A  HP 12500 48-port GbE SFP LEF Module
JC780A  HP 12500 8-port 10GbE SFP+ LEB Module
JC781A  HP 12500 8-port 10GbE SFP+ LEC Module
JC782A  HP 12500 16-port 10-GbE SFP+ LEB Module
JC809A  HP 12500 48-port Gig-T LEC TAA Module
JC810A  HP 12500 8-port 10-GbE XFP LEC TAA Mod
JC811A  HP 12500 48-port GbE SFP LEC TAA Module
JC812A  HP 12500 32p 10-GbE SFP+ REC TAA Module
JC813A  HP 12500 8-port 10-GbE SFP+ LEC TAA Mod
JC814A  HP 12500 16p 10-GbE SFP+ LEC TAA Module
JC818A  HP 12500 48-port GbE SFP LEF TAA Module

Table 2-4: LPUs with 4G Memory

SKU     Description
JG792A  HP FF 12500 40p 1/10GbE SFP+ FD Mod
JG794A  HP FF 12500 40p 1/10GbE SFP+ FG Mod
JG796A  HP FF 12500 48p 1/10GbE SFP+ FD Mod
JG790A  HP FF 12500 16p 40GbE QSFP+ FD Mod
JG786A  HP FF 12500 4p 100GbE CFP FD Mod
JG788A  HP FF 12500 4p 100GbE CFP FG Mod

Refer to device release notes to determine support.
Table 2-5: Example of HP 12500-CMW710-R7328P01 support of Ethernet interface cards for ISSU and MDC
Use Case 1: Datacenter Change Management Overview
A number of use cases are discussed in this chapter. In this first use case, MDC is used to better handle change management procedures in a data center. Separate MDCs are created for a production network, a quality assurance (QA) network, and a development network. This is in line with procedures followed by Enterprise Resource Planning (ERP) applications, which tend to have three separate installations.
Development Network
A separate development MDC allows for testing to be performed on a separate logical network, but still using the same physical switches as are used in the production network. As an example, a customer may want to test a new load balancer for two to three weeks. The test can be performed on a temporary basis using the development network rather than the production network. However, as mentioned both networks use the same physical switches. Rather than introducing the additional risk of a new untested device in the production network, comprehensive tests can be performed using the development network. Features of the new device can be tested, issues resolved and updated network configuration verified without affecting the current running network. The additional benefit of MDC is that the test will be relevant and consistent with the production network as the tests are being performed on the same hardware as the production network.
Quality Assurance (QA) Network
A Quality Assurance network is an identical logical copy of the production network. When a major change is required on the production network, the change can be validated on the QA network. Changes such as the addition of new VLANs, new routing protocols, or new access control lists (ACLs) can be tested and validated in advance on the QA network before deploying the change on the production network.
The advantage of using MDC in this scenario is that all the MDCs are running on the same physical hardware. Thus the tests and configuration are validated as if they were running on the production network. This is a much better approach than using smaller test switches instead of actual production core switches to try to validate changes. Using different switches does not make the QA tests 100% valid, as there could be differences in firmware or hardware capabilities between the QA network and the production network when tested on different hardware.
Note
The QA process will validate feature configurations, but cannot be used to test or validate firmware updates. All MDCs in a physical device or IRF fabric run the same firmware version, and all MDCs will be upgraded together during a firmware update.
Use Case 2: Customer Isolation
This second use case uses MDC for customer isolation. In a data center, multiple customers could use the same core network infrastructure, but be isolated using traditional network isolation technologies such as VLANs and VRFs. A customer may, however, want further isolation in addition to traditional network isolation technologies. They may want isolation of their configurations, memory, and CPU resources from other customers. MDC provides this functionality, whereas traditional technologies such as VLANs don't provide this level of isolation.
This use case is limited by the number of supported MDCs on the physical switches. As an example, when using a 12500 series switch with a 4GB MPU, this use case will only allow for isolation of two to three customers, as shown in Figure 2-5. This is because one MDC is used for the Admin MDC and the switch supports a maximum of four MDCs.
Figure 2-5: Use Case 2: Customer isolation
An additional use case for MDC isolation is where different levels of security are required within a single customer network. A customer may have a lower security level network and a higher security level network and may want to keep these separate from each other. These networks would be separated entirely by using multiple MDCs. This use case is however also restricted by the number of MDCs a switch can support. Note
MDCs are different from VRFs, as VRFs separate only the data plane and not the management plane of a device. In the example use case of different security level networks, multiple network administrators are involved. A lower-level security zone administrator cannot configure or view the configuration of a higher-level security zone. When configuring VRFs, however, the entire configuration would be visible to all network administrators.
Use Case 3: Infrastructure and Customer Isolation
The third MDC use case splits a switch logically into two separate devices. One MDC is used for core infrastructure and another MDC is used for customers, as shown in Figure 2-6. The benefit here is that the core data center infrastructure network is isolated from all customer networks. There are separate VLANs (4094), separate QinQ tags, and separate VRFs per MDC.
Figure 2-6: Use Case 3: Infrastructure and Customer Isolation
The data center core network is logically running a totally separate management network independent of all customer data networks. Both management and customer networks still use the same physical equipment.
Use Case 4: Hardware Limitation Workaround In this fourth use case, MDC provides a workaround for hardware limitations on switches. As an example, a data center may use Shortest Path Bridging MAC mode (SPBM) or Transparent Interconnection of Lots of Links (TRILL). The current switch ASICs cannot provide the core SPBM service and layer 3 routing services at the same time.
SPB is essentially a replacement for Spanning Tree. One caveat of SPB is that core devices simply switch encapsulated packets and do not read the packet contents. This is similar to the behavior of P devices in an MPLS environment. A core SPBM device would therefore not be able to route packets between VLANs. An SPB edge device is typically required for the routing: SPB-encapsulated packets are decapsulated so that the device can view the IP frames and perform inter-VLAN routing. If IP routing is required on the same physical core as the device configured for SPB, two MDCs would be configured, as shown in Figure 2-7. One MDC would be configured with SPB and be part of the SPB network. Another MDC would then be configured that is not running SPB to provide layer 3 functionality. A physical cable would be used to connect the two MDCs on the same chassis switch. The SPB MDC is thus connected to the layer 3 routing core MDC via a physical back-to-back cable.
Figure 2-7: Use Case 4: Hardware limitation workaround
This scenario would apply for both SPB and TRILL.
MDC Numbering and Naming MDC 1 is created by default with HP Comware 7 and is named “Admin” in the default configuration. Non-default MDCs are allocated IDs 2 and above. Names are assigned to these MDCs as desired, such as “DevTest”, “DMZ” and “Internal” as shown in Figure 2-8.
Figure 2-8: MDC numbering and naming
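Using the mdc command covered later in this chapter, the numbering and naming scheme from Figure 2-8 might be configured along these lines (a rough sketch; the names DevTest, DMZ, and Internal are the illustrative examples from the figure):

```
<Switch> system-view
# MDC 1 (Admin) exists by default and cannot be created, deleted, or renamed.
[Switch] mdc DevTest id 2
[Switch] mdc DMZ id 3
[Switch] mdc Internal id 4
```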
Architecture It is important to realize that even though MDCs look like two, three, or even four logical devices running on a physical device, there is still only one MPU with only one CPU. Only one kernel is booted. On top of this kernel, multiple MDC contexts are started, and each MDC context has its own processes and allocated resources. But there is still only one kernel. This also explains why all MDCs need to run the same firmware version. A device supporting MDCs is an MDC itself, called the "Admin" MDC. The default MDC always uses the name Admin and the ID 1. You cannot delete it or change its name or ID. By default, the single kernel starts one MDC and one MDC only (the Admin MDC). The Admin MDC is used to manage any other MDCs. The moment a new MDC is defined, all the control plane protocols of the new MDC run in that MDC's process group. This process group is isolated from other process groups, and they cannot interact with each other. Processes that form part of the process group can be allocated a CPU weight to
provide more processing to specific MDCs. CPU, disk usage, and memory usage of process groups can also be restricted for any new MDC. Resource allocation will be covered later in this chapter. This restriction does not apply to the Admin MDC. The Admin MDC always has 100% access to the system. If necessary, it can take all CPU resources, use all memory, or use the entire flash file system. The Admin MDC can also access the files of the other MDCs, since these files are stored in a subfolder per MDC on the main flash. It is important to remember that there is still a physical MPU dependency. If the physical MPU goes down, all of the MDCs running on top of it will also go down. That is why it is worth considering the use of an IRF fabric for high availability. As an example, two core physical chassis switches are configured as an IRF fabric, and three MDCs are configured. If the first physical switch is powered off, all MDCs (three in this example) experience an IRF master failure and activate the slave (the second chassis) as the new master.
Architecture, Control Plane When a new MDC is defined, the MDC can be started. A new control plane is configured for the MDC. However, the MDC only has access to the Main Processing Unit (MPU). No line cards or interfaces are available to the MDC until an administrator has assigned them to the MDC. This is similar to booting a chassis with only the MPU and no Line Processing Units (LPUs) / line cards inserted in the chassis. Using the display interface brief command, for example, would show no interfaces.
Architecture, ASICs How do you assign line card interfaces to an MDC? Because of the hardware restrictions on devices, the interfaces on some interface cards are grouped. Interfaces therefore need to be allocated to the MDC per ASIC (port group).
It is important to understand how ASICs are used within a chassis based switch. In a chassis, each of the line cards has one or more local ASICs. This affects the data plane of the switch as the data plane packet processing is done by the ASIC. When packets are received by the switch, functions such as VLAN lookups, MAC address lookups and so on are performed by ASICs. These ASICs also hold the VLAN table or the IP routing table. One ASIC can be used by multiple physical interfaces. As an example, one ASIC on the line card can be used by 24 Gigabit Ethernet ports. Depending on the line card models there may be up to 4 ASICs on a physical line card. Another example is a 48 Gigabit Ethernet port line card which could have only two ASICs.
Architecture, ASIC Control Why is this important to understand? Because each of these ASICs has its own hardware resources and limits. For each ASIC, as an example, there is a limit of 4094 VLANs. The moment you define a new VLAN at the global chassis level, that VLAN is programmed by the control plane into each of the ASICs on the chassis. If there are six different ASICs on a line card, each ASIC is programmed with all globally configured VLANs. In a normal chassis all the ASICs are used by the MPU, so they are programmed by the single control plane. Each ASIC can only have one control plane or ASIC programming process. The ASIC can have only one master and cannot be configured by other control planes. When creating a new MDC, a new control plane is created. Two control planes cannot modify the same ASIC. By default, all ASICs and line cards are controlled by the Admin MDC. When creating a new MDC, the control of an ASIC can be changed from the default Admin MDC to that new MDC. This results in all physical interfaces that are bound to the ASIC also being moved to the new MDC. Individual interfaces cannot be assigned to an MDC; they are assigned indirectly when the ASIC they use is assigned to the MDC. All interfaces which are managed by one ASIC must be assigned to the same MDC. For example, 10500/11900/12900 series switches only support one MDC per LPU. In the configuration, this is enforced by the CLI through port groups. As shown in Figure 2-9, all interfaces which are bound to the same ASIC must be assigned as a port group to an MDC. An example of the 12500/12500E LPU MDC port group implementation is given in Table 2-6.
Figure 2-9: Architecture, ASIC control
Table 2-6: Example 12500/12500E LPU MDC Port Group Implementation
HP Comware 7 will notify you which ports belong to a port group. The following sample configuration shows 11900 MDC port group allocation:

[DC1-SPINE-1-mdc-2-mdc2]allocate interface FortyGigE 1/1/0/1
Configuration of the interfaces will be lost. Continue? [Y/N]:y
Group error: all interfaces of one group must be allocated to the same mdc.
FortyGigE1/1/0/1
Port list of group 5:
FortyGigE1/1/0/1
FortyGigE1/1/0/2
FortyGigE1/1/0/3
FortyGigE1/1/0/4
FortyGigE1/1/0/5
FortyGigE1/1/0/6
FortyGigE1/1/0/7
FortyGigE1/1/0/8
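The all-or-nothing port-group rule behind the error message above can be modeled as a simple validation check. This is illustrative Python, not an HP tool; the group layout is a hypothetical example mirroring the 11900 output:

```python
# Sketch: validate that an MDC interface allocation respects ASIC port groups.
# A request is valid only if, for every port group it touches, it includes
# every interface of that group (hypothetical data modeled on group 5 above).

PORT_GROUPS = {
    5: ["FortyGigE1/1/0/%d" % i for i in range(1, 9)],  # group 5: ports 1-8
}

def validate_allocation(requested):
    """Return (ok, message). Fails like the CLI 'Group error' when a group
    is only partially included in the allocation request."""
    requested = set(requested)
    for group_id, members in PORT_GROUPS.items():
        touched = requested & set(members)
        if touched and touched != set(members):
            return False, ("Group error: all interfaces of group %d must be "
                           "allocated to the same MDC" % group_id)
    return True, "ok"

# Allocating a single member of group 5 fails, the full group succeeds:
partial_ok, msg = validate_allocation(["FortyGigE1/1/0/1"])
full_ok, _ = validate_allocation(["FortyGigE1/1/0/%d" % i for i in range(1, 9)])
```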
Architecture, Hardware Limits In addition to a new control plane being created, hardware limits change with the creation of a new MDC. As an example, if 1000 VLANs were created using the Admin MDC, these VLANs would be programmed on each ASIC that is associated with the Admin MDC. However, ASICs associated with another MDC, such as the Development MDC, will not have the 1000 VLANs programmed. They only have the VLANs configured by an administrator of the Development MDC. The control plane of the Admin MDC does not control, and therefore cannot program, the ASICs associated with the Development MDC. If VLAN 10 was configured on the Admin MDC, that VLAN is not programmed onto the ASICs of the Development MDC. VLAN 10 would only be programmed on those ASICs if VLAN 10 was configured on the Development MDC. However, VLAN 10 on the Admin MDC is different and totally independent from VLAN 10 on the Development MDC. MAC addresses learned in the Development MDC are different from the MAC addresses learned in the Admin MDC. There is no control plane synchronization between the ASICs of different MDCs. By default there is only one MDC and all ASICs have the same VLAN information. However, as soon as multiple MDCs are created, each ASIC in a different MDC is in effect part of a different switch, controlled and programmed separately. This principle applies to all resources and features such as access lists, VRFs, VPN instances, routing table sizes, and so on. This also means that if any MDC runs out of hardware resources at the ASIC level, the resource shortage will not impact any of the other MDCs. This is ideal for heavy-load environments. Customers could stress test a network with many VRFs, access lists, or quality of service (QoS) rules without affecting other MDCs. A development MDC could run out of resources without affecting the production MDC, for example.
However, while there is isolation of the data plane by isolating the ASICs, this is not the case for a number of other components. Switch hardware resources such as CPU, physical memory and the flash file systems are shared between MDCs.
Architecture, File System
Each MDC has its own configuration files on a dedicated part of the disk. An MDC administrator can therefore only modify or restart their own MDC. Access to the switch CPU and physical memory by MDCs can also be restricted, so there is good isolation and separation of MDC access to these resources. For the file system, however, there is only one file system available on the flash card. The Admin MDC (which is the original MDC) has root access to the file system. This MDC has total control of the flash and has the privileges to perform operations such as formatting the file system. Any file system operations such as formatting the flash or using fixdisk are only available from the Admin MDC. Configurations saved from the Admin MDC are typically saved to the root of the file system. Other MDCs only have access to a subset of the file structure, based on the MDC identifier. When a new MDC is defined, a folder is created on flash with the MDC identifier. MDC 2, for example, has a folder "2" created for it on flash. All files saved by MDC 2 are stored in this subfolder. Additionally, any file operations such as listing directories and files on the flash using the dir command will only show files within this subfolder. From within the MDC, it appears that root access is provided, but in effect only a subfolder is made available to the MDC. The Admin MDC can view all the configuration files of other MDCs as they are subfolders in the root of the file system. This is something to consider in specific use cases. Within the other MDCs, only the local MDC files are visible. MDC 2 would not be able to view the files of the Admin MDC or other MDCs (such as MDC 3). The Admin MDC can also be used to monitor and restrict the file space made available to other MDCs. The Admin MDC has full access and unlimited control over the file system, but other MDCs can be restricted from the Admin MDC if required.
Architecture, Console Ports Other components which are shared between the MDCs are the console port and the Management-Ethernet ports. The console port and AUX port of the physical chassis always belong to the Admin MDC (default MDC). Other MDCs do not have access to the physical console or AUX ports.
To access the console of the other MDCs, first access the admin MDC console and then use the switchto mdc command to switch to the console of a specific MDC. This is similar to the Open Application Platform (OAP) connect functionality used to connect to the console of subslots on other devices like the unified wireless controllers.
Management-Ethernet ports The management interfaces of all MDCs share the same physical out-of-band (OOB) management Ethernet port. Unlike the console port, you cannot use the switchto command for this interface. The management Ethernet interface is shared between all MDCs. When a new MDC is created, the system automatically makes the Management-Ethernet interface of the MPU available inside the MDC. You must assign different IP addresses to the Management-Ethernet interfaces so MDC administrators can access and manage their respective MDCs. The IP addresses for the management Ethernet interfaces do not need to belong to the same network segment. The interface can be configured from within each MDC, as the interface is shared between all the MDCs. This means that the physical interface will accept configurations from all the MDCs. Network administrators or operators of the MDCs will need to agree on the configuration of the Management-Ethernet port.
Design Considerations ASIC Restrictions When designing an MDC solution, remember that ASIC binding determines the interface grouping that will need to be allocated to an MDC. Interfaces have to be assigned per ASIC. Some line cards only have a single ASIC, which means that all the interfaces on the line card need to be assigned to or removed from an MDC at the same time. Other line cards may have two or more ASICs, allowing a smaller number of interfaces to be assigned to an MDC at a time. The second consideration is that the number of MDCs will depend on the MPU generation and memory size.
The interfaces in a group must be assigned to or removed from the same MDC at the same time. You can see how the interfaces are grouped by viewing the output of the allocate interface or undo allocate interface command: ■ If the interfaces you specified for the command belong to the same group or groups and you have specified all interfaces in the group or groups, the command outputs no error information. ■ Otherwise, the command displays the interfaces that failed to be assigned and the interfaces in the same group or groups. Assigning or reclaiming a physical interface restores the settings of the interface to the defaults. For example, if the MDC administrator configures an interface and the interface is later assigned to a different MDC, the interface configuration settings are lost. To assign all physical interfaces on an LPU to a non-default MDC, you must first reclaim the LPU from the default MDC by using the undo location and undo allocate commands. If you do not do so, some resources might still be occupied by the default MDC.
Platforms The number of MDCs supported by a platform also needs to be considered. This depends on the MPU platform as well as the MPU generation. You can create MDCs only on MPUs with a memory space that is equal to or greater than 4 GB. The maximum number of non-default MDCs depends on the MPU model. Refer to earlier in this chapter for more details.
Basic Configuration Steps Overview The configuration steps for creating and enabling an MDC will be discussed. Basic MDC configuration is discussed first and then advanced configuration options such as setting resource limits will be covered. Step 1: Define the new MDC with the new ID and a new name. Step 2: Authorize the MDC to use specific line cards. ASICs are not assigned at this point. Authorization is given so the next step can be used to assign interfaces.
Step 3: Allocate interfaces to the MDC. Remember to allocate per ASIC group. Step 4: Start the MDC. This starts the new MDC control plane. Step 5: Access the MDC console by using the switchto command.
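The five steps above can be sketched end to end as a single Comware CLI session (a rough sketch; the MDC name, slot number, and interface range are illustrative, and the detailed syntax for each step follows in the next sections):

```
<Switch> system-view
[Switch] mdc Dev id 2                             # Step 1: define the new MDC
[Switch-mdc-2-Dev] location slot 2                # Step 2: authorize the line card in slot 2
[Switch-mdc-2-Dev] allocate interface GigabitEthernet 2/0/1 to GigabitEthernet 2/0/48
                                                  # Step 3: allocate interfaces per ASIC group
[Switch-mdc-2-Dev] mdc start                      # Step 4: start the MDC control plane
[Switch-mdc-2-Dev] quit
[Switch] switchto mdc Dev                         # Step 5: access the MDC console
```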
Configuration Step 1: Define a New MDC Step 1: Define the new MDC with the new ID and a new name. This command needs to be entered from within the default Admin MDC. You cannot type this command from any non-default MDCs. From the default MDC enter system view. Next, define a new MDC by specifying a name of your choice and ID of the MDC. This ID is used for the subfolder on the flash file system. See Figure 2-10 for an example.
Figure 2-10: Configuration step 1: Define a new MDC
Once the MDC is configured, a new process group is defined. The process group is not started at this point, as the MDC needs to be manually started in step 4. To create an MDC, see Table 2-7.

Table 2-7: Creating an MDC

Step 1: Enter system view.
Command: system-view

Step 2: Create an MDC and enter MDC view.
Command: mdc mdc-name [ id mdc-id ]
Remarks: By default, there is a default MDC with the name Admin and the ID 1. The default MDC is system predefined; you do not need to create it, and you cannot delete it. The MDC starts to work after you execute the mdc start command. This command is mutually exclusive with the irf mode enhanced command.
Configuration Step 2: Authorize MDC for a Line Card When you create an MDC, the system automatically assigns CPU, storage space, and memory space resources to the MDC to ensure its operation. You can adjust the resource allocations as required (this is discussed in more detail later in this chapter). An MDC needs interfaces to forward packets. However, the system does not automatically assign interfaces to MDCs and you must assign them manually. By default, a non-default MDC can access only the resources on the MPUs. All LPUs of the device belong to the default MDC and a non-default MDC cannot access any LPUs or resources on the LPUs. To assign physical interfaces to an MDC, you must first authorize the MDC to use the interface cards to which the physical interfaces belong. Step 2 is to authorize the MDC to access interfaces of a specific line card. This command is entered from the non-default MDC context. In Figure 2-11, MDC 2 with the name Dev is authorized to allocate interfaces on the line card in slot 2.
Figure 2-11: Configuration step 2: Authorize MDC for a line card
This command does not assign any of the interfaces to the MDC at this point. It only authorizes the assignment of the interfaces on that line card. Interfaces will be assigned to the MDC in step 3, as outlined in Table 2-8. Multiple MDCs can be authorized to use the same interface card.

Table 2-8: To authorize an MDC to use an interface card

Step 1: Enter system view.
Command: system-view

Step 2: Enter MDC view.
Command: mdc mdc-name [ id mdc-id ]

Step 3: Authorize the MDC to use an interface card.
Command (standalone mode): location slot slot-number
Command (IRF mode): location chassis chassis-number slot slot-number
Remarks: By default, all interface cards of the device belong to the default MDC, and a non-default MDC cannot use any interface card. You can authorize multiple MDCs to use the same interface card.
Configuration Step 3: Allocate Interfaces per ASIC By default, all physical interfaces belong to the default MDC, and a non-default MDC has no physical interfaces to use for packet forwarding. To enable a non-default MDC to forward packets, you must assign it interfaces. The console port and AUX port of the device always belong to the default MDC and cannot be assigned to a non-default MDC. Important When you assign physical interfaces to MDCs on an IRF member device, make sure the default MDC always has at least one physical IRF port in the up state. Assigning the default MDC's last physical IRF port in the up state to a non-default MDC splits the IRF fabric. This restriction does not apply to 12900 series switches. Only a physical interface that belongs to the default MDC can be assigned to a non-default MDC. The default MDC can use only the physical interfaces that are not assigned to a non-default MDC. One physical interface can belong to only one MDC. To assign a physical interface that belongs to a non-default MDC to another non-default MDC, you must first remove the existing assignment by using the undo allocate interface command. Assigning a physical interface to or reclaiming a physical interface from an MDC restores the settings of the interface to the defaults. Remember that because of hardware restrictions, the interfaces on some interface
cards are grouped. The interfaces that form part of the ASIC group may vary depending on the line card and the interfaces in a group must be assigned to the same MDC at the same time. When interfaces are allocated to the new MDC, they are removed from the default MDC and moved to the specified non-default MDC. All current interface configuration is reset on the interfaces when moved to the new MDC. These interfaces appear as new interfaces in the MDC. They will thus be assigned by default to VLAN 1. In Figure 2-12, interfaces Gigabit Ethernet 2/0/1 to 2/0/48 have been allocated to MDC 2, named Dev. To configure parameters for a physical interface assigned to an MDC, you must log in to the MDC.
Figure 2-12: Configuration step 3: Allocate interfaces per ASIC
In IRF mode on 12500 series switches, you must assign non-default MDCs physical interfaces for establishing IRF connections. A non-default MDC needs to use the physical IRF ports to forward packets between member devices. This is discussed in more detail later in this chapter. After you change the configuration of a physical IRF port, you must use the save command to save the running configuration. Otherwise, after a reboot, the master and subordinate devices in the IRF fabric have different physical IRF port configurations, and you must use the undo allocate interface command and the undo port group interface command to restore the defaults and reconfigure the physical IRF port. Table 2-9 outlines the configuration procedure.

Table 2-9: Configuration Procedure

Step 1: Enter system view.
Command: system-view

Step 2: Enter MDC view.
Command: mdc mdc-name [ id mdc-id ]

Step 3: Assign physical interfaces to the MDC. Use either or both approaches.
Approach 1 (assign individual interfaces): allocate interface { interface-type interface-number }&<1-24>
Approach 2 (assign a range of interfaces): allocate interface interface-type interface-number1 to interface-type interface-number2
Remarks: By default, all physical interfaces belong to the default MDC, and a non-default MDC has no physical interfaces to use. You can assign multiple physical interfaces to the same MDC.
Configuration Step 4: Start MDC Once interfaces are assigned to the MDC, the MDC can be started. The start command starts the control plane and management plane of the MDC, as shown in Figure 2-13. The data plane will be active for any interfaces which have been allocated to this MDC at the moment the MDC is started.
Figure 2-13: Configuration step 4: Start MDC
At this point you may notice that the total memory utilization of the switch increases. This is because multiple additional processes for the MDC are being started. To start an MDC, see Table 2-10. Important If you access the BootWare menus and select the Skip Current System Configuration option while the device starts up, all MDCs will start up without loading any configuration file.

Table 2-10: Starting an MDC

Step 1: Enter system view.
Command: system-view

Step 2: Enter MDC view.
Command: mdc mdc-name [ id mdc-id ]

Step 3: Start the MDC.
Command: mdc start
Configuration Step 5: Access the MDC A non-default MDC operates as if it were a standalone device. From the system view of the default MDC, you can log in to a non-default MDC and enter MDC system view. In Figure 2-14, the console is switched to the Dev MDC from the Admin MDC. The prompt will display as if you are accessing a new console session. Within the Dev MDC, you will need to enter the system-view again to configure the switch. In this example the host name is changed to Dev for the Dev MDC.
Figure 2-14: Configuration step 5: Access the MDC
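The console interaction described here might look roughly as follows (a sketch of the session from this example; prompts are illustrative):

```
[switch] switchto mdc Dev       # console switches to the Dev MDC
<Switch> system-view            # enter system view again inside the MDC
[Switch] sysname Dev            # hostname changed to Dev for the Dev MDC
[Dev] switchback                # return to the Admin MDC
[switch]                        # back at the original switch prompt
```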
In MDC system view, you can assign an IP address to the Management-Ethernet interface, or create a VLAN interface on the MDC and assign an IP address to that interface. This allows administrators of the MDC to log in to the MDC by using Telnet or SSH. To return from a user MDC to the default MDC, use the switchback or quit command. In this example the switchback command is used to return to the Admin MDC, and the output shows the switch name as switch. Table 2-11 outlines how to log in to a non-default MDC from the system view of the default MDC.

Table 2-11: To log in to a non-default MDC from the system view of the default MDC

Step 1: Enter system view.
Command: system-view

Step 2: Log in to an MDC.
Command: switchto mdc mdc-name
Remarks: You can use this command to log in only to an MDC that is in active state.
MDC Advanced Configuration Topics Once basic configuration has been completed, multiple advanced options can be configured. Options such as restricting MDC access to CPU, memory, and file system resources will be discussed in this chapter. Configuration of the Management-Ethernet interface and firmware updates will also be discussed. Resource allocation to MDCs is explained in Table 2-12; the values may be modified if required.

Table 2-12: The default values shown will fit most customer deployments

CPU weight
• Used to assign MPU and LPU CPU resources to each MDC according to their CPU weight.
• When MDCs need more CPU resources, the device assigns CPU resources according to their CPU weights.
• Default: 10 (maximum 100). By default, the default MDC has a CPU weight of 10 (unchangeable) on each MPU and each interface card, and each non-default MDC has a CPU weight of 10 on each MPU and each interface card that it is authorized to use.
• Specify CPU weights for MDCs using the limit-resource cpu weight command.

Disk space
• Used to limit the amount of disk space each MDC can use for configuration and log files.
• Default: 100% (maximum 100%). By default, all MDCs share the disk space in the system, and an MDC can use all free disk space in the system.
• Specify disk space percentages for MDCs using the limit-resource disk command.

Memory space
• Used to limit the amount of memory space each MDC can use.
• Default: 100% (maximum 100%). By default, all MDCs share the memory space in the system, and an MDC can use all free memory space in the system.
• Specify memory space percentages for MDCs using the limit-resource memory command.
Although fabric modules are shared by MDCs, traffic between MDCs is isolated, as source and destination packet processors within the chassis are isolated.
Restricting MDC Resources: Limit CPU By default, all MDCs are authorized to use the same share of CPU resources. If one MDC takes too many CPU resources, the other MDCs might not be able to operate. To ensure correct operation of all MDCs, specify a CPU weight for each MDC. The amount of CPU resources an MDC can use depends on the percentage of its CPU weight among the CPU weights of all MDCs that share the same CPU. For example, in Figure 2-15, three MDCs share the same CPU; setting their weights to 10, 10, and 5 is equivalent to setting their weights to 2, 2, and 1: ■ The two MDCs with the same weight can use the CPU for approximately the same period of time. ■ The third MDC can use the CPU for about half of the time available to each of the other two MDCs.
Figure 2-15: Restricting MDC resources: Limit CPU
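The weight arithmetic in this example can be checked with a few lines of plain Python, just to illustrate the proportions (this is not an HP tool, only the ratio calculation):

```python
def cpu_share(weights):
    """Approximate CPU share of each MDC as its weight divided by the sum
    of the weights of all MDCs sharing the same CPU."""
    total = sum(weights)
    return [w / total for w in weights]

# Weights 10, 10, 5 give the same proportions as weights 2, 2, 1:
shares_a = cpu_share([10, 10, 5])   # -> [0.4, 0.4, 0.2]
shares_b = cpu_share([2, 2, 1])     # -> [0.4, 0.4, 0.2]
```

The third MDC's share (0.2) is half of each of the others (0.4), matching the two bullets above.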
The CPU weight specified for an MDC takes effect on all MPUs and all LPUs that the MDC is authorized to use. Table 2-13 outlines how to specify a CPU weight for an MDC. The resource limits are only applied when required. If an MDC does not require any of its CPU resources, other MDCs can use all the available CPU. In other words, there is no hard limit on CPU usage when CPU resources are available.

Table 2-13: How to specify a CPU weight for an MDC

Step 1: Enter system view.
Command: system-view

Step 2: Enter MDC view.
Command: mdc mdc-name [ id mdc-id ]

Step 3: Specify a CPU weight for the MDC.
Command: limit-resource cpu weight weight-value
Remarks: By default, the default MDC has a CPU weight of 10 (unchangeable) on each MPU and each interface card, and each non-default MDC has a CPU weight of 10 on each MPU and each interface card that it is authorized to use.
Restricting MDC Resources: Limit Memory
By default, MDCs on a device share and compete for the system memory space. All MDCs share the memory space in the system, and an MDC can use all free memory space in the system. If an MDC takes too much memory space, other MDCs may not be able to operate normally. To ensure correct operation of all MDCs, specify a memory space percentage for each MDC to limit the amount of memory space each MDC can use. Table 2-14 outlines how to specify a memory space percentage for an MDC. The memory space to be assigned to an MDC must be greater than the memory space that the MDC is using. Before you specify a memory space percentage for an MDC, use the mdc start command to start the MDC and use the display mdc resource command to view the amount of memory space that the MDC is using. Note An MDC cannot use more memory than the allocated value specified by the limit-resource memory command. This is in contrast to the CPU resource limit, which is a weighted value.

Table 2-14: How to specify a memory space percentage for an MDC

Step 1: Enter system view.
Command: system-view

Step 2: Enter MDC view.
Command: mdc mdc-name [ id mdc-id ]

Step 3: Specify a memory space percentage for the MDC.
Command (standalone mode): limit-resource memory slot slot-number ratio limit-ratio
Command (IRF mode): limit-resource memory chassis chassis-number slot slot-number ratio limit-ratio
Remarks: By default, all MDCs share the memory space in the system, and an MDC can use all free memory space in the system.
Restricting MDC Resources: Limit Storage By default, MDCs on a device share and compete for the disk space of the device's storage media, such as the Flash and CF cards. An MDC can use all free disk space
in the system. If an MDC occupies too much disk space, the other MDCs might not be able to save information such as configuration files and system logs. To prevent this, specify a disk space percentage for each MDC to limit the amount of disk space each MDC can use for configuration and log files. Table 2-15 outlines how to specify a disk space percentage for an MDC. Before you specify a disk space percentage for an MDC, use the display mdc resource command to view the amount of disk space the MDC is using. The amount of disk space indicated by the percentage must be greater than the amount the MDC is using. Otherwise, the MDC cannot apply for more disk space, and no more folders or files can be created or saved for the MDC. If the device has more than one storage medium, the disk space percentage specified for an MDC takes effect on all the media.

Table 2-15: To specify a disk space percentage for an MDC

Step 1: Enter system view.
Command: system-view

Step 2: Enter MDC view.
Command: mdc mdc-name [ id mdc-id ]

Step 3: Specify a disk space percentage for the MDC.
Command (standalone mode): limit-resource disk slot slot-number ratio limit-ratio
Command (IRF mode): limit-resource disk chassis chassis-number slot slot-number ratio limit-ratio
Remarks: By default, all MDCs share the disk space in the system, and an MDC can use all free disk space in the system.
Management Ethernet

When a non-default MDC is created, the system automatically provides access to the Management-Ethernet interface of the MPU. The Management-Ethernet interfaces of all non-default MDCs use the same interface type and number and the same physical port and link as the default MDC's physical Management-Ethernet interface. However, you must assign a different IP address to the Management-Ethernet
interface so MDC administrators can access and manage their respective MDCs; see Figure 2-16 for an example. The IP addresses of the Management-Ethernet interfaces do not need to belong to the same network segment.
Figure 2-16: Management Ethernet
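As a hedged sketch of the per-MDC addressing described above (the management interface name varies by platform, and the addresses are illustrative):

```
# In the default (Admin) MDC
system-view
 interface M-GigabitEthernet 0/0/0
  ip address 192.168.1.10 24
# Switch to a non-default MDC and assign a different IP address
switchto mdc mdcB
system-view
 interface M-GigabitEthernet 0/0/0
  ip address 192.168.1.11 24       # may also be in a different network segment
```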
Device Firmware Updates

To run Comware 7, MPUs must be fitted with 4 GB of SDRAM and also have a CF card of at least 1 GB in size. 4 GB of SDRAM is fitted as standard in the JC072B and the JG497A, but the JC072A must be upgraded from 1 GB to 4 GB of SDRAM by using two memory upgrade kits (2 x JC609A). If required, 1 GB CF cards (JC684A) are available for purchase. If an upgraded JC072A needs to be returned for repair, be sure to retain the upgrade parts for use in the replacement unit.

As shown in Figure 2-17, due to physical memory limits, interface cards with 512 MB of memory do not support ISSU, and the interfaces on each of these cards can be assigned to only one MDC. Except for these ISSU and MDC limitations, these cards provide full support for all other features.
Figure 2-17: Device firmware updates
Refer to earlier in this chapter for more detail.
Network Virtualization Types
In this section, MDC and IRF interoperability will be discussed.
IRF

Refer to the left-hand side of Figure 2-18. The network virtualization shown there is the combination of multiple physical switches configured as a single logical fabric using IRF. Distributed link aggregation can then be used to connect multiple physical cables to the separate physical switches as a single logical link connected to a single logical device. Multi-Chassis Link Aggregation (MLAG) can be used for link aggregation between the IRF fabric and other switches. IRF supports both 2- and 4-chassis configurations.
Figure 2-18: Network Virtualization Types
MDC

The middle of Figure 2-18 shows MDC on a single physical switch. This has been discussed at length previously in this chapter. We have discussed how the MDC technology provides multi-tenant device contexts, where multiple virtual or logical devices are created on a single physical chassis. Each of these logical contexts provides unique VLAN and VRF resources and also provides hardware isolation inside the same physical chassis.
MDC and IRF

Although MDC can be deployed on a single chassis with redundant power supplies, redundant management modules (MPUs), and redundant line cards (LPUs), most
customers deploy MDC together with HP Intelligent Resilient Framework (IRF). IRF N:1 device virtualization together with MDC 1:N virtualization achieves a combined N:1 + 1:N device virtualization solution, as shown on the right-hand side of Figure 2-18. This achieves higher port densities together with chassis redundancy. Currently, only 2-chassis IRF with MDC is supported. The right-hand side of Figure 2-18 shows MDC and IRF combined to provide a single virtual device with multiple device contexts. In this example, two physical switches are virtualized using IRF to create a single logical switch. The IRF fabric is then carved up into multiple MDCs to provide IRF resiliency for each of the MDCs defined in the IRF fabric. This provides a common control plane, data plane, and management plane for each MDC across two physical systems.
IRF-Based MDCs

When you configure MDCs, follow these guidelines (see Figure 2-19):

■ To configure both IRF and MDCs on a device, configure IRF first. Otherwise, when the device joins an IRF fabric as a subordinate member, it reboots and loads the master's configuration rather than its own, and none of its settings except the IRF port settings take effect.

■ Before assigning a physical IRF port to an MDC or reclaiming a physical IRF port from an MDC, you must use the undo port group interface command to restore the default. After assigning or reclaiming a physical IRF port, you must use the save command to save the running configuration.
Figure 2-19: IRF-Based MDCs
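The reclaim-and-save workflow from the second guideline might be sketched as follows (the MDC name and interface number are illustrative):

```
system-view
 mdc mdcB
  undo port group interface Ten-GigabitEthernet 1/3/0/1   # restore the default before assigning or reclaiming
  quit
 save                                                     # save the running configuration afterwards
```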
By default, when a new IRF fabric is created, only the default Admin MDC is created on the IRF fabric. All line cards are assigned to the Admin MDC by default. Line
cards and interfaces then need to be manually assigned to other MDCs as required. It is important to note that, at the time of this writing, only 2-chassis IRF fabrics are supported in conjunction with the MDC feature. A 4-chassis IRF fabric, which provides greater IRF scalability, is not currently supported with MDC.
IRF-Based MDCs

As discussed previously, any new MDC needs to be authorized to use line cards before interfaces can be allocated to the MDC. Once authorized, port groups are used to allocate interfaces to the MDC. What kinds of combinations are possible with IRF and MDCs? Figure 2-20 shows various MDC and IRF scenarios.
Figure 2-20: IRF-Based MDCs
The first scenario is the most typical. Each MDC is allowed to allocate resources on both chassis 1 and chassis 2. This will provide redundancy for each of the configured MDCs. This is not a required configuration. An MDC can be created without redundancy (as shown in the second scenario). In this example, only specific line cards on chassis 1 in the IRF fabric have been allocated to MDC 4. MDC4 does not have any IRF redundancy on chassis 2. The other MDCs have redundancy and have line cards allocated on both chassis 1 and chassis 2 in the IRF fabric.
In the third scenario, both MDC 3 and MDC 4 have line cards allocated only on chassis 1, while MDC 1 and MDC 2 have line cards allocated from both chassis 1 and 2. MDC 1 and 2 have redundancy in case of a chassis failure, but MDC 3 and 4 do not have any redundancy if chassis 1 fails. In the same way, as seen in the fourth scenario, MDC 1 is only configured on chassis 1, while MDCs 2, 3, and 4 are only configured on chassis 2. This is also a supported configuration. Scenarios 5 and 6 show other supported variations of how MDCs can be configured within an IRF fabric. As can be seen, various combinations are possible, and the administrator can decide where MDCs operate. There is no limitation on where the MDCs need to be configured on the chassis devices in the IRF fabric.
MDCs and IRF Types Overview

There are two ways to configure IRF in combination with MDC, depending on the switch generation. As shown in Figure 2-21, the method used by the 12500 and 12500E Series Switches has separate IRF links per MDC. The alternate method, used on the 10500, 11900, and 12900 Series Switches, uses a shared IRF link for all MDCs.
Figure 2-21: MDCs and IRF types
12500/12500E

When configuring IRF on the 12500/12500E Series Switches, a dedicated IRF link per MDC is required. For MDC 2 on chassis 1 to communicate with MDC 2 on chassis 2, a dedicated IRF port needs to be configured on both chassis using ports that are physically part of that MDC. For example, if line card 2 is assigned to MDC 2, then you would need to assign a physical port on line card 2 as an IRF port for MDC 2. If line card 3 is assigned to MDC 3, then a physical port on line card 3 would need to be configured as an IRF port for MDC 3. This is configured for each MDC. This configuration also results in all data packets for an MDC using the dedicated IRF port between the two chassis. As an example, if data is sent between MDC 1 on chassis 1 and MDC 1 on chassis 2, the data traverses the dedicated IRF port connecting the two MDCs and not the other IRF links. This isolates the data plane, as the IRF link of MDC 1 will not receive
traffic from MDC 2 or other MDCs. This also applies to other MDCs.
10500/11900/12900

The version of IRF and MDC interoperability used on the 10500, 11900, and 12900 Series Switches uses a single shared IRF link for all MDCs rather than a dedicated IRF link per MDC. This changes the packet flow between physical switches and MDCs. On a 12500 switch, a packet sent from one MDC to another uses the dedicated link for that MDC. There is no explicit specification of the source MDC when traffic traverses the IRF link. It is therefore important that the IRF link be correctly connected to the appropriate MDCs on both chassis. If an administrator accidentally cabled MDC 2 on chassis 1 to MDC 3 on chassis 2 on 12500 switches, traffic would flow between the two MDCs using that IRF physical link. VLAN 10 traffic in MDC 2 would end up as VLAN 10 traffic in MDC 3, for example. This breaks the original design principles of MDCs, as the switch fabric is now extended from one MDC to another, whereas MDCs should be separate logical switches. Each MDC should have a separate VLAN space, but in this example VLANs are shared.

IRF and MDC on the 10500, 11900, and 12900 switches no longer require dedicated links per MDC. A shared IRF link is used, and MDC traffic is differentiated using an additional tag. Using the same example, if VLAN 10 traffic is sent from MDC 2 on chassis 1 to MDC 2 on chassis 2, an additional tag is added to the traffic across the IRF link. This allows chassis 2 to differentiate between the VLAN 10 traffic of MDC 2 and the VLAN 10 traffic of MDC 3. The IRF port is part of the Admin MDC, and direct MDC connections are no longer supported. IRF commands are not available in non-default MDCs. Proper bandwidth provisioning is required, however, as the IRF port now carries traffic for multiple MDCs.
Configuration Examples

12500/12500E

Differences in IRF approaches are reflected in the configuration commands. When configuring IRF on 12500/12500E switches, the MDC is specified in the port group command.
Even though the IRF configuration is completed using the Admin MDC, the IRF configuration associates specific IRF interfaces with specific MDCs. In Figure 2-22, the configuration of IRF port 1/1 is shown. Interface Gigabit Ethernet 1/3/0/1 is added to the IRF port, but is associated with MDC 2. The physical interface 1/3/0/1 must be assigned to MDC 2. Gigabit Ethernet 1/3/0/24, for example, could not be used with MDC 2 because it has already been associated with MDC 3 using the allocate interface command. In this example, the interface is correctly associated with MDC 3.

Note
MDC allows IRF fabrics to use 1 Gigabit Ethernet ports rather than only 10 Gigabit Ethernet ports.
Figure 2-22: Configuration examples
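Based on Figure 2-22, a hedged sketch of the 12500-style IRF configuration follows (interface and MDC numbers are illustrative, and exact keyword placement may vary by software release):

```
# Admin MDC of chassis 1
system-view
 mdc mdc2
  allocate interface GigabitEthernet 1/3/0/1          # the interface must belong to MDC 2
  quit
 irf-port 1/1
  port group interface GigabitEthernet 1/3/0/1 mdc 2  # dedicated IRF link for MDC 2
```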
10500/11900/12900

The 10500, 11900, and 12900 Series Switches no longer use the MDC keyword when IRF is configured. The interfaces are simply bound to the IRF port (1/1 in this example). The main difference with these switches is that all the IRF interfaces are part of the Admin MDC. It is not possible to bind interfaces associated with non-default MDCs to the IRF port.
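By contrast, a hedged sketch of the 10500/11900/12900 style, where no MDC keyword exists and the IRF interfaces remain in the Admin MDC:

```
# Admin MDC - a single shared IRF link carries traffic for all MDCs
system-view
 irf-port 1/1
  port group interface Ten-GigabitEthernet 1/0/0/5
```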
More MDC and IRF Configuration Information

Because of port groups and ASIC limitations, it may not be possible to assign individual interfaces to IRF ports. Multiple physical interfaces may need to be associated with the IRF port at the same time. Groups of four interfaces are often associated, as in the example shown in Figure 2-23.
Figure 2-23: More MDC and IRF configuration information
This is similar to the behavior on 5900 switches, which also require that a group of four interfaces be configured for IRF. This doesn't mean that you have to use all four ports for IRF to function. You could, for example, physically cable only two of the ports. However, once the group is used for IRF, you cannot use any of the four ports in the group for any other function. In Figure 2-23, port TenGigabitEthernet 1/0/0/5 is added to IRF. However, an error is displayed indicating that ports 1/0/0/5 to 1/0/0/8 need to be shut down. As the interfaces are part of a port group, they need to be allocated for IRF use as a group rather than individually. Once allocated, one of the interfaces can be used for the actual IRF connection, but the entire group needs to be activated for IRF use (this is true for certain platforms such as the 5900 Series Switches, but may differ on other platforms).
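An illustrative transcript of this group-of-four behavior (the prompts and exact error text are assumptions and vary by platform and release):

```
[Switch] irf-port 1/1
[Switch-irf-port1/1] port group interface Ten-GigabitEthernet 1/0/0/5
# The switch may reject this until the whole group is shut down, for example:
# "Interfaces Ten-GigabitEthernet1/0/0/5 to Ten-GigabitEthernet1/0/0/8 must be shut down first."
[Switch-irf-port1/1] quit
[Switch] interface range Ten-GigabitEthernet 1/0/0/5 to Ten-GigabitEthernet 1/0/0/8
[Switch-if-range] shutdown
```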
10500/11900/12900 Link Failure Scenario
As per IRF best practices, multiple physical interfaces should form part of the IRF link between switches. If one of the physical interfaces goes down, IRF continues to use the remaining links. As long as at least one link is active between the switches, IRF will remain active. There will be reduced bandwidth between the IRF devices, but IRF functionality is not affected (no split brain). However, as shown in Figure 2-24, when all physical links between the switches go down, an IRF split will occur.
Figure 2-24: 10500/11900/12900 link failure scenario
Since the Admin MDC is used for IRF port configuration, this is also the MDC where IRF MAD needs to be configured. There is no MAD configuration in other MDCs. This also implies that the IRF MAD ports have to belong to the Admin MDC.
12500/12500E Link Failure Scenario

On the 12500/12500E switches, the IRF configuration is more complicated. There is a base IRF protocol running at the chassis level and, in addition, MDCs use the IRF physical interfaces to exchange data. The data sent by an MDC is for that particular MDC only. As an example, an IRF link configured in MDC 1 will only transport data between MDC 1 contexts. The link between MDC 2 contexts will only transport data for MDC 2. The links do not carry data for other MDC contexts, but are used by the base IRF protocol.
Refer to the first scenario in Figure 2-25. If the link between MDC 1 on chassis 1 and MDC 1 on chassis 2 fails, the base IRF protocol remains online because there are still three active links between the chassis that can be used by the base IRF protocol.
Figure 2-25: 12500/12500E link failure scenario
However, the data plane connection for MDC 1 is down, which results in a split for MDC 1. In a traditional IRF system, that would result in a chassis split brain. In this example, by contrast, the base IRF protocol can determine that both chassis are still online and still connected, because the three remaining links are still active. The base IRF protocol running at the chassis level triggers MDC 1 to shut down all external ports on the standby chassis, but the core IRF protocol and other MDCs continue to operate normally. This is in effect a split brain scenario for MDC 1, but it is automatically resolved by the base IRF protocol because the remaining links are still active and can be used to detect the failure of the single MDC. MDC 1 is lost on the standby chassis, but MDC 2, 3, and 4 continue to operate normally.

In the second scenario, the IRF link that is part of MDC 2 is lost. As in the previous example, the base IRF protocol continues to function normally, because three out of four links are still up for the base IRF protocol. The data connection for MDC 2 is down, which results in a split brain for MDC 2. The IRF protocol shuts down the external-facing interfaces of MDC 2 on the standby chassis. All other MDCs continue to operate normally, and so does the base IRF
protocol. Another advantage of this setup is that if the IRF link for a given MDC is restored, the MDC is not rebooted, and the ports on the slave device are restored automatically. There is no reboot of the slave device as long as there is an IRF connection between the switches.

A similar situation occurs in the third scenario. In this example, both MDC 1 and MDC 2 have the external interfaces of the standby chassis shut down because of the split brain on those MDCs. The base IRF protocol continues to operate as normal, as there are still two remaining links up between the chassis. MDCs 3 and 4 also continue to operate normally.

In the last example, all links between the chassis are lost. This means that there is no communication between the chassis IRF ports, resulting in a split brain scenario for the base IRF protocol and all MDCs. This scenario requires an external multi-active detection method, such as MAD BFD, to resolve the split brain.
IRF-based MDC: IRF Fabric Split

An IRF fabric is split when no physical IRF ports connecting the chassis are active. As shown in Figure 2-26, this results in both chassis becoming active at the same time with the same IP address and same MAC address. This causes multiple network issues and requires a split brain protocol such as Multi Active Detection (MAD) to resolve. One of the systems in the IRF fabric should shut down all external ports.
Figure 2-26: IRF-based MDC: IRF Fabric Split
Previously in this chapter, we discussed the scenario of a split in a single MDC where the standby MDC is automatically shut down. When the link recovers, the MDC is restarted and not the entire chassis. The base kernel and other MDCs will
continue to operate normally. However, when the entire IRF connection is lost, as in this example, the situation is different. When the link recovers, the standby system needs to be rebooted when it rejoins the fabric. This is similar to a traditional IRF system.
Multi Active Detection (MAD)

When all physical IRF ports between chassis go down, an additional mechanism is required to resolve multiple active devices. To ensure that the split brain is detected and resolved, configure traditional MAD BFD or MAD LACP. MAD BFD may be the preferred MAD method, as there is no dependency on any devices outside of the IRF fabric, and MAD BFD is very fast at detecting the split. As shown in Figure 2-27, MAD BFD is configured at the base IRF level and is thus configured using the Admin MDC. In addition, all MAD BFD links need to be assigned to the Admin MDC.
Figure 2-27: Multi Active Detection (MAD)
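A hedged sketch of MAD BFD configured in the Admin MDC (the interface and addresses are illustrative):

```
# Admin MDC - the MAD BFD link must belong to the Admin MDC
system-view
 interface Vlan-interface 100
  mad bfd enable
  mad ip address 192.168.100.1 255.255.255.0 member 1   # MAD address for IRF member 1
  mad ip address 192.168.100.2 255.255.255.0 member 2   # MAD address for IRF member 2
```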
Summary

In this chapter, you learned about Multitenant Device Context (MDC). This is a technology that can partition a physical device or an IRF fabric into multiple logical switches called "MDCs." MDC features and use cases were discussed in this chapter, including using a single physical switch for multiple customers, which provides separation but also leverages a single device.
The MDC architecture, supported devices, and operation were discussed, along with upgrade restrictions and options. Lastly, support for MDC with IRF was discussed, including the differences between first- and second-generation switches such as the 12500 and 12900. The way IRF ports are configured and the results of link failures, including split brain scenarios, were also discussed.
Learning Check

Answer each of the questions below.

1. An administrator has configured two customer MDCs (MDC 2 and MDC 3) on a core 12500 switch. What should an administrator configure to allow traffic between the two MDCs?

a. Create routed ports in each MDC and configure inter-VLAN routing between the MDCs.
b. Configure VRFs in each MDC and enable route leaking between the VRFs.
c. Connect a physical cable from a port in MDC 2 to a port in MDC 3 and then configure the ports to be in the same VLAN on each MDC.
d. Configure routing between MDC 1 and the customer MDCs. Traffic between customer MDCs must be sent via the Admin MDC.

2. A network administrator has taken delivery of a new HP 12900 switch. How many MDCs exist when the switch is booted?

a. Zero
b. One
c. Two
d. Four
e. Nine

3. How are interfaces allocated to MDCs?

a. By individual interface
b. By interface group
c. By interface port
d. By MDC number

4. Which device requires separate IRF ports per MDC?
a. 10500
b. 12900
c. 11900
d. 12500

5. A 12500 switch is configured with 4 IRF ports, each of which is in a different MDC: Port 1 = MDC 1, Port 2 = MDC 2, Port 3 = MDC 3, Port 4 = MDC 4. IRF Port 1 goes down. What is the result?

a. All MDCs go offline.
b. An IRF split occurs and MAD is required to resolve the split brain.
c. The core IRF protocol goes offline, but IRF within the MDCs continues as normal.
d. MDC 1 goes offline, but other MDCs continue as normal. The core IRF protocol requires MAD to resolve the split brain.
e. MDC 1 goes offline, but other MDCs continue as normal. The core IRF protocol continues as normal.
Learning Check Answers

1. c
2. b
3. b
4. d
5. e
3 Multi-CE (MCE)
EXAM OBJECTIVES

In this chapter, you learn to:

✓ Describe MCE features.
✓ Describe MCE use cases.
✓ Configure MCE.
✓ Describe and configure route leaking.
✓ Configure isolated management access.
INTRODUCTION

Multi-VPN-Instance CE (MCE) enables a switch to function as a Customer Edge (CE) device for multiple VPN instances in a BGP/MPLS VPN network, thus reducing network equipment investment. In the remainder of this module, we will use Multi-CE or MCE when talking about Multi-VPN-Instance CE.
MPLS L3VPN Overview

MPLS L3VPN is an L3VPN technology used to interconnect geographically dispersed VPN sites, as shown in Figure 3-1. MPLS L3VPN uses BGP to advertise VPN routes and uses MPLS to forward VPN packets over a service provider backbone.
Figure 3-1: MPLS L3VPN overview
MPLS L3VPN provides flexible networking modes, excellent scalability, and convenient support for MPLS QoS and MPLS TE.

Note
MPLS basics are discussed in chapter 3 and MPLS VPNs in other study guides. This study guide only covers the MCE feature, without a detailed discussion of MPLS L3VPNs.
Basic MPLS L3VPN Architecture

A basic MPLS L3VPN architecture has the following types of devices:

■ Customer edge device (CE device or CE) - A CE device resides on a customer network and has one or more interfaces directly connected to a service provider network. It does not support VPN or MPLS.

■ Provider edge device (PE device or PE) - A PE device resides at the edge of a service provider network and connects to one or more CEs. All MPLS VPN services are processed on PEs.

■ Provider device (P device or P) - A P device is a core device on a service provider network. It is not directly connected to any CE. A P device has only basic MPLS forwarding capability and does not handle VPN routing information.

CEs and PEs mark the boundary between the service providers and the customers. A
CE is usually a router. After a CE establishes adjacency with a directly connected PE, it redistributes its VPN routes to the PE and learns remote VPN routes from the PE. CEs and PEs use BGP/IGP to exchange routing information. You can also configure static routes between them. After a PE learns the VPN routing information of a CE, it uses BGP to exchange VPN routing information with other PEs. A PE maintains routing information about only VPNs that are directly connected, rather than all VPN routing information on the provider network. A P router maintains only routes to PEs. It does not need to know anything about VPN routing information. When VPN traffic is transmitted over the MPLS backbone, the ingress PE functions as the ingress LSR, the egress PE functions as the egress LSR, while P routers function as the transit LSRs.
Site

A site has the following features:

■ A site is a group of IP systems with IP connectivity that does not rely on any service provider network.

■ The classification of a site depends on the topological relationship of the devices, rather than their geographical relationships, though the devices at a site are, in most cases, geographically adjacent to each other.

■ A device at a site can belong to multiple VPNs, which means that a site can belong to multiple VPNs.

■ A site is connected to a provider network through one or more CEs. A site can contain multiple CEs, but a CE can belong to only one site.

Sites connected to the same provider network can be classified into different sets by policies. Only the sites in the same set can access each other through the provider network. Such a set is called a VPN.
Terminology

VRF / VPN Instance

VPN instances, also called virtual routing and forwarding (VRF) instances, implement route isolation, data independence, and data security for VPNs.
A VPN instance has the following components:

■ A separate Label Forwarding Information Base (LFIB).
■ A separate routing table.
■ Interfaces bound to the VPN instance.
■ VPN instance administration information, including route distinguishers (RDs), route targets (RTs), and route filtering policies.

To associate a site with a VPN instance, bind the VPN instance to the PE's interface connected to the site. A site can be associated with only one VPN instance, and different sites can associate with the same VPN instance. A VPN instance contains the VPN membership and routing rules of associated sites.

With MPLS VPNs, routes of different VPNs are identified by VPN instances. A PE creates and maintains a separate VPN instance for each directly connected site. Each VPN instance contains the VPN membership and routing rules of the corresponding site. If a user at a site belongs to multiple VPNs, the VPN instance of the site contains information about all of those VPNs. For independence and security of VPN data, each VPN instance on a PE has a separate routing table and a separate LFIB. The administration information of a VPN instance includes the route distinguisher (RD), route filtering policy, and member interface list.
VPN-IPv4 Address

Each VPN independently manages its own address space, so the address spaces of VPNs might overlap. For example, if both VPN 1 and VPN 2 use addresses on subnet 10.110.10.0/24, address space overlapping occurs.

BGP cannot process overlapping VPN address spaces. For example, if both VPN 1 and VPN 2 use the subnet 10.110.10.0/24 and each advertises a route destined for that subnet, BGP selects only one of them, resulting in the loss of the other route. Multiprotocol BGP (MP-BGP) solves this problem by advertising VPN-IPv4 addresses (also called VPNv4 addresses).

As shown in Figure 3-2, a VPN-IPv4 address consists of 12 bytes. The first eight
bytes represent the RD, followed by a four-byte IPv4 prefix. The RD and the IPv4 prefix form a unique VPN-IPv4 prefix.
Figure 3-2: VPN-IPv4 address
An RD can be in one of the following formats:

■ When the Type field is 0, the Administrator subfield occupies two bytes, the Assigned number subfield occupies four bytes, and the RD format is 16-bit AS number:32-bit user-defined number. For example, 100:1.

■ When the Type field is 1, the Administrator subfield occupies four bytes, the Assigned number subfield occupies two bytes, and the RD format is 32-bit IPv4 address:16-bit user-defined number. For example, 172.1.1.1:1.

■ When the Type field is 2, the Administrator subfield occupies four bytes, the Assigned number subfield occupies two bytes, and the RD format is 32-bit AS number:16-bit user-defined number, where the minimum value of the AS number is 65536. For example, 65536:1.

To guarantee global uniqueness of a VPN-IPv4 address, do not set the Administrator subfield to any private AS number or private IP address.
Route Target Attribute

MPLS L3VPN uses route target community attributes to control the advertisement of VPN routing information. A VPN instance on a PE supports the following types of route target attributes:

■ Export target attribute - A PE sets the export target attribute for VPN-IPv4 routes learned from directly connected sites before advertising them to other PEs.

■ Import target attribute - A PE checks the export target attribute of VPN-IPv4 routes received from other PEs. If the export target attribute matches the import target attribute of a VPN instance, the PE adds the routes to the routing table of that VPN instance.

Route target attributes define which sites can receive VPN-IPv4 routes, and from which sites a PE can receive routes.
Like RDs, route target attributes can be in one of the following formats:

■ 16-bit AS number:32-bit user-defined number. For example, 100:1.
■ 32-bit IPv4 address:16-bit user-defined number. For example, 172.1.1.1:1.
■ 32-bit AS number:16-bit user-defined number, where the minimum value of the AS number is 65536. For example, 65536:1.
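The RD and route target attributes described above are configured per VPN instance. A minimal Comware sketch (the instance name and values are illustrative):

```
system-view
 ip vpn-instance VPN-A
  route-distinguisher 100:1                # 16-bit AS number : 32-bit user-defined number
  vpn-target 100:1 export-extcommunity     # attached to routes advertised from this instance
  vpn-target 100:1 import-extcommunity     # accept routes whose export target matches
```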
MCE / VRF-Lite

Multi-CE, or VRF-Lite, supports multiple VPN instances on customer edge devices. This feature provides separate routing tables (VPNs) without MPLS L3VPN and supports overlapping IP addresses.
MCE Overview

BGP/MPLS VPN transmits private network data through MPLS tunnels over the public network. However, the traditional MPLS L3VPN architecture requires that each VPN instance use an exclusive CE to connect to a PE, as shown in Figure 3-3.
Figure 3-3: MCE overview
A private network is usually divided into multiple VPNs to isolate services. To meet these requirements, you can configure a CE for each VPN, which increases equipment and maintenance costs. Or, you can configure multiple VPNs to use the same CE and the same routing table, which sacrifices data security. The Multi-VPN-Instance CE (MCE) function addresses these problems in multi-VPN networks.
MCE allows you to bind each VPN to a VLAN interface. The MCE creates and maintains a separate routing table for each VPN. This separates the forwarding paths of packets of different VPNs and, in conjunction with the PE, correctly advertises the routes of each VPN to the peer PE, ensuring the normal transmission of VPN packets over the public network.

As shown in Figure 3-3, the MCE device creates a routing table for each VPN. VLAN interface 2 binds to VPN 1 and VLAN interface 3 binds to VPN 2. When receiving a route, the MCE device determines the source of the routing information according to the number of the receiving interface, and then adds it to the corresponding routing table. The MCE connects to PE 1 through a trunk link that permits packets tagged with VLAN 2 or VLAN 3. PE 1 determines the VPN that a received packet belongs to according to the VLAN tag of the packet, and sends the packet through the corresponding tunnel.

You can configure static routes, RIP, OSPF, IS-IS, EBGP, or IBGP between an MCE and a VPN site and between an MCE and a PE.

Note
To implement dynamic IP assignment for DHCP clients in private networks, you can configure a DHCP server or DHCP relay agent on the MCE. When the MCE functions as the DHCP server, the IP addresses assigned to different private networks cannot overlap.
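The per-VPN binding described above might be sketched as follows (instance names, VLANs, and addresses are illustrative):

```
system-view
 ip vpn-instance VPN1
  route-distinguisher 100:1
 ip vpn-instance VPN2
  route-distinguisher 100:2
 interface Vlan-interface 2
  ip binding vpn-instance VPN1    # binding an interface removes its existing IP address
  ip address 10.110.10.1 24
 interface Vlan-interface 3
  ip binding vpn-instance VPN2
  ip address 10.110.10.1 24       # overlapping addresses are allowed across instances
```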
Feature Overview

MCE Features

MCE supports the configuration of additional routing tables within a single routing device. As an analogy, this can be compared to VLANs configured on Layer 2 switches. Each VLAN is a separate, isolated Layer 2 network, and each VPN instance is a separate, isolated Layer 3 network. Each VPN instance, or VRF, is a separate routing table that runs independently of other routing tables on the device.

In Layer 2, an access port belongs to a single VLAN. In the same way, with VPN instances, each Layer 3 routed interface belongs to a single VPN instance. Examples of interfaces that belong to a single VPN instance include:

■ The Layer 3 interface of a VLAN. Example: interface vlan 10
■ Routed ports. Example: Gigabit Ethernet 1/0/2
■ Routed subinterfaces. Example: Gigabit Ethernet 1/0/2.10
■ Loopback interfaces. Example: interface loopback 1
In Figure 3-4, various interfaces have been defined in separate VPN instances. As an example, Gigabit Ethernet 1/0 and Gigabit Ethernet 2/0.10 are configured in the RED VPN instance, Gigabit Ethernet 2/0.20 is configured in the GREEN VPN instance, and loopback 10 and interface VLAN 10 are configured in the BLUE VPN instance.
Figure 3-4: Feature overview
Each VPN instance configured by a network administrator has separate interfaces and separate routing tables.
Supported Platforms
MCE is available on almost all Comware routing devices (switches and routers). Comware 5 fixed-port switches include the 3600v2, 5500, 5800, and 5820 switches. Comware 7 fixed-port switches include the 5900, 5920, and 5930 switches. Chassis-based switches running either Comware 5 or Comware 7 include the 7500 (Comware 5), 10500, 11900, 12500, and 12900 switches. Routers that support MCE include the MSR, HSR, and SR series routers.
Design Considerations
The number of VPN instances supported is hardware dependent, as shown in Figure 3-5. For software-based routers, the limit is typically imposed by available memory.
Figure 3-5: Design considerations
For switches, this is typically restricted by the ASICs used in the switches.
Use Case 1: Multi-Tenant Datacenter
A number of use cases for MCE will now be discussed. The first use case is a multi-tenant data center: a data center infrastructure provided by a hosting provider offering various services to customers. A requirement in this environment is that each customer have a separate routing infrastructure, isolated from other customers. Access control lists (ACLs) could be used to separate customers, but ACLs must be individually configured, are often complex, and are prone to error. Customers would still be running within the same routing table instance, and a misconfigured ACL would allow access between customer networks. By default, traffic would be permitted between customers, and only with careful ACL configuration are customers blocked. MCE, in contrast, creates separate routing tables and thus separates customer resources by design. No access is permitted between VPN instances by default; only with explicit additional configuration (route leaking) is traffic permitted between the separate VPN instances. The MCE feature is also much simpler to configure and maintain than traditional ACLs.
Typically, to ensure that all of these customers can access a common Internet gateway connection, MCE is combined with a virtual firewall per customer. The firewall used would also be VPN instance aware to ensure separation. In Figure 3-6, the RED and GREEN customers are configured in separate VPN instances and cannot communicate with each other, even though they use a shared network infrastructure. Both customers can also access the Internet via the common Internet firewall.
Figure 3-6: Use Case 1: Multi-tenant datacenter
Use Case 2: Campus with Independent Business Units
The second use case is a campus with independent business units, teams, or applications. In some cases, external teams may be working at a customer site on a specific project, but may be located throughout the campus. The owner of the infrastructure
may want to isolate the external team from the rest of the network, but allow them to communicate across different parts of the core infrastructure. This would create a separate, isolated virtual network using the same equipment. A second example is external application monitoring: an internal ERP application may be monitored by an external supplier or partner. MCE could be used to tightly control which networks are available to the external party; only certain internal routes would be advertised and available. A third example of service isolation is a managed voice over IP (VoIP) infrastructure. In this example, the entire VoIP infrastructure is managed and configured by an external partner. The internal VoIP addressing is isolated from the normal corporate infrastructure, providing better security and separation. The external VoIP partner can manage the VoIP network, but has no access to the rest of the network. A fourth example is a guest network. A network may consist of multiple locations connected via routed links. Each location may need to provide guest connectivity, but also use a centralized Internet connection. A remote site may be connected via a routed WAN link to the central site, and in this case, configuring separate VPN instances may be beneficial to provide guest network isolation across routed networks.
Use Case 3: Overlapping IP Segments In this third use case example, support for overlapping IP networks is required. This may occur when companies merge and the same IP address space is used by multiple parts of the business. In this case each business or department is separated by VPN instances to isolate the networks and their addressing. If connectivity between the instances is required, a VPN instance aware firewall could be used at the Layer 3 border between instances. This device would perform network address translation (NAT) between the VPN instances as well as provide firewall functionality.
Use Case 4: Isolated Management Network
A fourth use case of VPN instances is an isolated management network for network devices. This would not be required for Layer 2 switches, as these devices do not have IP
addresses in the customer network. The management subnet of a Layer 2 device is by default isolated from the customer or user portion of the network. This is because a Layer 2 switch has only one Layer 3 IP address, which is used exclusively for device management and is configured in a separate management VLAN. On inter-VLAN routing devices or Layer 3 devices, however, the IP interfaces of the device are accessible by user or customer devices by design. Separation in this case would be required: a dedicated VPN instance would be created for the management interface of the device. Protocols such as SNMP, Telnet, SSH, and other traditional network management protocols would operate inside the dedicated VPN instance and would not be accessible from the customer VPN instances.
Note
Several HP ProVision switches have OOB management ports. The ProVision OOB management ports operate by default in their own IP routing space; there is no requirement to define a new routing table for management purposes. This is in contrast with HP Comware devices, which require administrators to define a management routing table (VPN instance) for the OOB management port.
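As a sketch of this use case on a Comware device, a dedicated management VPN instance can be created and bound to the out-of-band management port. The instance name, RD, port name, and subnet below are assumptions for illustration; the OOB port name varies by model.

```
# Isolate device management in its own VRF (names/addresses assumed).
system-view
ip vpn-instance mgmt
 route-distinguisher 65000:999
quit
interface M-GigabitEthernet 0/0/0   # OOB management port; name is model dependent
 ip binding vpn-instance mgmt
 ip address 10.255.0.10 24
quit
```

SNMP, SSH, and Telnet sessions then terminate inside the mgmt instance and are unreachable from the customer VPN instances.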
Use Case 5: Shared Services in Data Center This last use case discussed is a shared services VPN instance in a data center. In the first use case discussed, VPN instances were used to separate customer networks. In this example, VPN instances are extended to provide shared services. The type of shared services that a service provider may offer a customer includes central firewall facilities, backup facilities, network monitoring, hypervisor management and security services. All services could be provided either within a single VPN instance or by using multiple VPN instances. Customers could continue using their own routing protocols such as OSPF within their customer VPN instances. The shared services instances may even use different routing protocols. Each VPN instance is still isolated and only specific routes are permitted between the VPN instances by using route leaking.
Basic Configuration Steps
The following is an overview of the basic configuration steps:
1. Define a new VPN instance. This creates a new routing table, or virtual routing and forwarding instance (VRF).
2. Assign a route distinguisher (RD) to the VPN instance. The RD is an eight-byte value used to uniquely identify routes in Multiprotocol BGP (MP-BGP). Even though MP-BGP is not used by MCE, the RD must still be specified.
3. Assign Layer 3 interfaces to the VPN instance.
4. Reconfigure the interface addressing. Binding an interface in step 3 removes all existing interface configuration, so any IP address or other settings must be reapplied.
5. Optionally, configure dynamic or static routing.
Configuration Step 1: Define VPN-Instance A VPN instance is a collection of the VPN membership and routing rules of its associated site. See Figure 3-7 and Table 3-1 for the first configuration steps to create a VPN instance.
Figure 3-7: Configuration step 1: Define VPN-Instance
Table 3-1: The first configuration step is to create a VPN instance
1. Enter system view.
   Command: system-view
2. Create a VPN instance and enter VPN instance view.
   Command: ip vpn-instance vpn-instance-name
   Remarks: By default, no VPN instance is created.
Once the VPN instance has been defined, a list of VPN instances can be displayed and the routing table of the VPN instance can be displayed.
By default, no interfaces will be bound to the VPN instance apart from internal loopback interfaces in the 127.0.0.0 range. The display ip routing-table vpn-instance command will display this, as shown in Figure 3-8.
Figure 3-8: Step 1: Define VPN-Instance (continued)
Configuration Step 2: Route Distinguisher The second step is to configure the route-distinguisher (RD) of the VPN instance, as shown in Figure 3-9.
Figure 3-9: Configuration step 2: Route Distinguisher
BGP cannot process overlapping VPN address spaces. For example, if both VPN 1 and VPN 2 use the subnet 10.110.10.0/24 and each advertise a route destined for the subnet, BGP selects only one of them, resulting in the loss of the other route. Multiprotocol BGP (MP-BGP) can solve this problem by advertising VPN-IPv4 prefixes. MCE does not require MP-BGP, but a unique RD is still required. Use Table 3-2 to configure a Route Distinguisher and optional descriptions. Table 3-2: How to configure an RD and optional descriptions
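A minimal sketch of this step follows; the instance name, RD value, and description are assumptions.

```
system-view
ip vpn-instance vpn1
 route-distinguisher 100:1      # must be unique per VPN instance
 description Customer 1 VRF     # optional
quit
display ip vpn-instance         # verify the instance and its RD
```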
1. Enter system view.
   Command: system-view
2. Create a VPN instance and enter VPN instance view.
   Command: ip vpn-instance vpn-instance-name
   Remarks: By default, no VPN instance is created.
3. Configure an RD for the VPN instance.
   Command: route-distinguisher route-distinguisher
   Remarks: By default, no RD is specified for a VPN instance.
4. (Optional.) Configure a description for the VPN instance.
   Command: description description
   Remarks: By default, no description is configured for a VPN instance.
5. (Optional.) Configure a VPN ID for the VPN instance.
   Command: vpn-id vpn-id
   Remarks: By default, no VPN ID is configured for a VPN instance.
The command display ip vpn-instance [ instance-name vpn-instance-name ] displays information about a specified VPN instance or about all VPN instances.
Syntax
display ip vpn-instance [ instance-name vpn-instance-name ]
instance-name vpn-instance-name: Displays information about the specified VPN instance. The vpn-instance-name is a case-sensitive string of 1 to 31 characters. If no VPN instance is specified, the command displays brief information about all VPN instances.
Example Display brief information about all VPN instances, as shown in Figure 3-10.
Figure 3-10: Step 2: Route Distinguisher (continued)
Command output is shown in Table 3-3.
Table 3-3: Display VPN-instance route distinguisher command output
VPN-Instance Name
  Name of the VPN instance.
RD
  RD of the VPN instance.
Create Time
  Time when the VPN instance was created.
Configuration Step 3: Define L3 Interface
Optionally, Layer 3 routed interfaces can be defined in the VPN instance. This typically applies to switches, as most switches have only a single routed interface by default - interface VLAN 1. Additional Layer 3 interfaces can be created as routed ports, Layer 3 VLAN interfaces, routed subinterfaces, or loopback interfaces. Use display interface brief to display brief Ethernet interface information. In the output in Figure 3-11, multiple interface types are shown, including a routed port, a routed subinterface, a loopback interface, and a VLAN interface.
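Each interface type listed above could be created as in the following sketch; the interface numbers and VLAN IDs are assumptions, and routed-port support depends on the switch model.

```
interface GigabitEthernet 1/0/2
 port link-mode route                # convert the switch port to a routed port
quit
interface GigabitEthernet 1/0/2.10  # routed subinterface
 vlan-type dot1q vid 10             # terminate frames tagged with VLAN 10
quit
interface vlan-interface 10         # Layer 3 VLAN interface
quit
interface loopback 1                # loopback interface
quit
```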
Figure 3-11: Step 3: Define L3 Interface (continued)
Syntax
display interface [ interface-type [ interface-number | interface-number.subnumber ] ] brief [ description ]
interface-type: Specifies an interface type.
interface-number: Specifies an interface number.
interface-number.subnumber: Specifies a subinterface number, where interface-number is a main interface (which must be a Layer 3 Ethernet interface) number, and subnumber is the number of a subinterface created under the interface. The value range for the subnumber argument is 1 to 4094.
description: Displays the full description of the specified interface. If the keyword is not specified, the command displays at most the first 27 characters of the interface description. If the keyword is specified, the command displays all characters of the interface description.
Usage Guidelines If no interface type is specified, this command displays information about all interfaces.
If an interface type is specified but no interface number or subinterface number is specified, this command displays information about all interfaces of that type. If both the interface type and interface number are specified, this command displays information about the specified interface.
Examples Display brief information about all interfaces.
Command output is shown in Table 3-4.
Table 3-4: Display brief information about all interfaces command output
The brief information of interface(s) under route mode:
  Brief information about Layer 3 interfaces.
Link: ADM - administratively down; Stby - standby
  ADM: the interface has been shut down by the network administrator. To recover its physical layer state, run the undo shutdown command. Stby: the interface is a standby interface.
Protocol: (s) - spoofing
  If the network layer protocol of an interface is UP, but its link is an on-demand link or not present at all, this field displays UP (s), where s represents the spoofing flag. This attribute is typical of interface Null 0 and loopback interfaces.
Interface
  Interface name.
Link
  Physical link state of the interface: UP, the link is up; DOWN, the link is physically down; ADM, the link has been administratively shut down (to recover its physical state, run the undo shutdown command); Stby, the interface is a standby interface.
Description
  Interface description configured by using the description command. If the description keyword is not specified in the display interface brief command, this field displays at most 27 characters; if the keyword is specified, the field displays the full interface description.
The brief information of interface(s) under bridge mode:
  Brief information about Layer 2 interfaces.
Speed or Duplex: (a)/A - auto; H - half; F - full
  If the speed of an interface is automatically negotiated, its speed attribute includes the auto-negotiation flag, indicated by the letter a in parentheses. If the duplex mode of an interface is automatically negotiated, its duplex mode attribute includes the following options: (a)/A, auto negotiation; H, half duplex; F, full duplex.
Type: A - access; T - trunk; H - hybrid
  Link type options for Ethernet interfaces.
Speed
  Interface rate, in bps.
Duplex
  Duplex mode of the interface: A, auto negotiation; F, full duplex; F(a), auto-negotiated full duplex; H, half duplex; H(a), auto-negotiated half duplex.
Type
  Link type of the interface: A, access; H, hybrid; T, trunk.
PVID
  Port VLAN ID.
Cause
  Causes for the physical state of an interface to be DOWN. Not connected: no physical connection exists (possibly because the network cable is disconnected or faulty). Administratively DOWN: the port was shut down with the shutdown command; to restore the physical state of the interface, use the undo shutdown command.
Configuration Step 4: Bind L3 Interface
By default, all Layer 3 interfaces on a device are associated with the default VPN instance (the public VPN instance). After creating and configuring a VPN instance, associate the VPN instance with the MCE's interface connected to the site and the interface connected to the PE. Any IP address configuration on the interface is lost and will need to be reconfigured; see Figure 3-12.
Figure 3-12: Configuration Step 4: Bind L3 Interface
Use Table 3-5 to associate a VPN instance with an interface.
Table 3-5: How to associate a VPN instance with an interface
1. Enter system view.
   Command: system-view
2. Enter interface view.
   Command: interface interface-type interface-number
3. Associate a VPN instance with the interface.
   Command: ip binding vpn-instance vpn-instance-name
   Remarks: By default, no VPN instance is associated with an interface; the interface is part of the public (default) instance. The ip binding vpn-instance command deletes the IP address of the current interface. You must reconfigure an IP address for the interface after configuring the command.
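A short sketch of the binding and its verification, with assumed instance and interface names:

```
interface vlan-interface 2
 ip binding vpn-instance vpn1       # deletes the interface's current IP address
quit
display ip vpn-instance instance-name vpn1   # the interface now appears under vpn1
```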
Display detailed information about a specified VPN instance.
display ip vpn-instance instance-name vpn1
  VPN-Instance Name and ID : vpn1, 1
  Create time : 2000/04/26 13:29:37
  Up time : 0 days, 16 hours, 45 minutes and 21 seconds
  Route Distinguisher : 10:1
  Export VPN Targets : 10:1
  Import VPN Targets : 10:1
  Description : this is vpn1
  Maximum Routes Limit : 200
  Interfaces : Vlan-interface2, LoopBack0
Command output is shown in Table 3-6.
Table 3-6: Display detailed VPN instance information command output
VPN-Instance Name and ID
  Name and ID of the VPN instance.
Create Time
  Time when the VPN instance was created.
Up Time
  Duration the VPN instance has been up.
Route Distinguisher
  RD of the VPN instance.
Export VPN Targets
  Export target attribute of the VPN instance.
Import VPN Targets
  Import target attribute of the VPN instance.
Import Route Policy
  Import routing policy of the VPN instance.
Description
  Description of the VPN instance.
Maximum Routes Limit
  Maximum number of routes of the VPN instance.
Interfaces
  Interfaces bound to the VPN instance.
Configuration Step 5: Configure IP on L3 address
Overview
Once the Layer 3 interface has been associated with the VPN instance, an IP address is required. Configure the IP address on the interface in the VPN instance. The display interface brief command does not indicate VPN instance membership. To view the VPN instance membership, use the display ip vpn-instance or display ip routing-table commands. In the example in Figure 3-13, an IP address is configured on Gigabit Ethernet 2/0, and this is shown in the output of the display ip vpn-instance instance-name vpn1 command.
Figure 3-13: Configuration Step 5: Configure IP on L3 address
IP Address
Use the ip address command to assign an IPv4 address to the management Ethernet port. Use the undo ip address command to restore the default.
Syntax ip address ip-address { mask-length | mask } undo ip address
ip-address: Specifies an IPv4 address in dotted decimal notation. mask-length: Specifies the length of the subnet mask, in the range of 0 to 32. mask: Specifies the subnet mask in dotted decimal notation. Default: No IPv4 address is configured.
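Applied to the example in Figure 3-13, the sequence might look as follows; the interface, instance name, and address are assumptions.

```
interface GigabitEthernet 2/0
 ip binding vpn-instance vpn1
 ip address 10.1.1.1 255.255.255.0   # equivalently: ip address 10.1.1.1 24
quit
display ip routing-table vpn-instance vpn1   # the connected route appears here
```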
Display IP Routing-Table VPN-Instance Use the display ip routing-table vpn-instance command to display the routing information of a VPN instance / VRF.
Syntax
display ip routing-table vpn-instance vpn-instance-name [ verbose ]
vpn-instance-name: Name of the VPN instance, a string of 1 to 31 characters.
verbose: Displays detailed information.
Example Display the routing information of VPN instance vpn2.
Command output is shown in Table 3-7.
Table 3-7: Display IP routing-table VPN-instance command output
Destinations
  Number of destination addresses.
Routes
  Number of routes.
Destination/Mask
  Destination address/mask length.
Proto
  Protocol discovering the route.
Pre
  Preference of the route.
Cost
  Cost of the route.
NextHop
  Address of the next hop along the route.
Interface
  Outbound interface for forwarding packets to the destination segment.
Configuration Step 6: Configure Routing (1 of 3)
Overview
You can configure static routing, OSPF, EBGP, or IBGP between an MCE and a VPN site.
Static Routes
An MCE can reach a VPN site through a static route; see Figure 3-14 for an example static route inside a VPN instance. Static routing on a traditional CE is globally effective and does not support address overlapping among VPNs. An MCE supports binding a static route to a VPN instance, so that the static routes of different VPN instances are isolated from each other.
Figure 3-14: Configuration step 6: Configure Routing (1 of 3)
Use Table 3-8 to configure a static route to a VPN site.
Table 3-8: How to configure a static route to a VPN site
1. Enter system view.
   Command: system-view
2. Configure a static route for a VPN instance.
   Command: ip route-static vpn-instance s-vpn-instance-name dest-address { mask-length | mask } { interface-type interface-number [ next-hop-address ] | next-hop-address [ public ] | vpn-instance d-vpn-instance-name next-hop-address } [ permanent ] [ preference preference-value ] [ tag tag-value ] [ description description-text ]
   Remarks: By default, no static route is configured. Perform this configuration on the MCE. On the VPN site, configure a common static route.
3. (Optional.) Configure the default preference for static routes.
   Command: ip route-static default-preference default-preference-value
   Remarks: The default preference is 60.
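For example, a static route toward a site subnet inside an assumed instance vpn1 might be configured as:

```
# Destination 10.20.0.0/16 via next hop 10.1.1.254, inside VRF vpn1 (values assumed).
ip route-static vpn-instance vpn1 10.20.0.0 16 10.1.1.254
display ip routing-table vpn-instance vpn1   # confirm the static route
```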
Configuration Step 6: Configure routing (2 of 3)
Once the static routes have been defined, the routing tables for the VPN instance can be reviewed. As shown in Figure 3-15, network connectivity can also be tested using tools such as ping and tracert. These commands require that the -vpn-instance option be specified to indicate the specific VPN instance; otherwise, traffic is sent in the public instance.
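For instance, with an assumed instance name vpn1 and destination 10.20.1.1:

```
ping -vpn-instance vpn1 10.20.1.1      # ICMP test inside the VRF
tracert -vpn-instance vpn1 10.20.1.1   # hop-by-hop path inside the VRF
ping 10.20.1.1                         # without the option: public instance
```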
Figure 3-15: Configuration step 6: Configure routing (2 of 3)
This also applies to other commands such as viewing the ARP cache.
ping Use ping to verify whether the destination IP address is reachable, and display related statistics. To use the name of the destination host to perform the ping operation, you must first
configure the DNS on the device. Otherwise, the ping operation will fail. To abort the ping operation during the execution of the command, press Ctrl+C.
Syntax
ping [ ip ] [ -a source-ip | -c count | -f | -h ttl | -i interface-type interface-number | -m interval | -n | -p pad | -q | -r | -s packet-size | -t timeout | -tos tos | -v | -vpn-instance vpn-instance-name ] * host
ip: Supports the IPv4 protocol. If this keyword is not specified, IPv4 is also supported.
-a source-ip: Specifies the source IP address of an ICMP echo request. It must be an IP address configured on the device. If this option is not specified, the source IP address of an ICMP echo request is the primary IP address of the outbound interface of the request.
-c count: Specifies the number of times that an ICMP echo request is sent. The count argument is in the range of 1 to 4294967295. The default value is 5.
-f: Discards packets larger than the MTU of an outbound interface, which means the ICMP echo request is not allowed to be fragmented.
-h ttl: Specifies the TTL value for an ICMP echo request. The ttl argument is in the range of 1 to 255. The default value is 255.
-i interface-type interface-number: Specifies the ICMP echo request sending interface by its type and number. If this option is not provided, the ICMP echo request sending interface is determined by searching the routing table or forwarding table according to the destination IP address.
-m interval: Specifies the interval (in milliseconds) to send an ICMP echo request. The interval argument is in the range of 1 to 65535. The default value is 200.
-n: Disables domain name resolution for the host argument. If the host argument represents the host name for the destination, and this keyword is not specified, the device translates host into an address.
-p pad: Specifies the value of the pad field in an ICMP echo request, in hexadecimal format. No more than 8 "pad" hexadecimal characters can be used. The pad argument is in the range of 0 to ffffffff. If the specified value is less than 8 characters, 0s are added in front of the value to extend it to 8 characters. For example, if pad is configured as 0x2f, then the packets are padded with 0x0000002f to make the total length of the packet meet the requirements of the device. By default, the padded value starts from 0x01 up to 0xff, where another round starts again if necessary, like 0x010203…feff01….
-q: Displays only statistics. If this keyword is not specified, the system displays all information.
-r: Records routing information. If this keyword is not specified, routes are not recorded.
-s packet-size: Specifies the length (in bytes) of an ICMP echo request (not including the IP packet header and the ICMP packet header). The packet-size argument is in the range of 20 to 8100. The default value is 56.
-t timeout: Specifies the timeout time (in milliseconds) of an ICMP echo reply. If the source does not receive an ICMP echo reply within the timeout, it considers the ICMP echo reply timed out. The timeout argument is in the range of 0 to 65535. The default value is 2000.
-tos tos: Specifies the ToS value of an ICMP echo request. The tos argument is in the range of 0 to 255. The default value is 0.
-v: Displays non-ICMP echo replies received. If this keyword is not specified, the system does not display non-ICMP echo replies.
-vpn-instance vpn-instance-name: Specifies the MPLS L3VPN to which the destination belongs, where the vpn-instance-name argument is a case-sensitive string of 1 to 31 characters. If the destination is on the public network, do not specify this option.
host: IP address or host name (a string of 1 to 20 characters) for the destination.
Examples Test whether the device with an IP address of 1.1.2.2 is reachable.
Test whether the device with an IP address of 1.1.2.2 in VPN 1 is reachable.
Test whether the device with an IP address of 1.1.2.2 is reachable. Only results are displayed.
Test whether the device with an IP address of 1.1.2.2 is reachable. The route information is displayed.
The output shows that: ■ The destination is reachable. ■ The route is 1.1.1.1 <-> {1.1.1.2; 1.1.2.1} <-> 1.1.2.2.
Table 3-9: Test reachable destinations command output
PING 1.1.2.2 (1.1.2.2): 56 data bytes, press CTRL_C to break
  Test whether the device with IP address 1.1.2.2 is reachable. There are 56 data bytes in each ICMP echo request. Press Ctrl+C to abort the ping operation.
56 bytes from 1.1.2.2: icmp_seq=0 ttl=254 time=4.685 ms
  Received an ICMP echo reply from the device whose IP address is 1.1.2.2. If no echo reply is received during the timeout period, no information is displayed. bytes: number of data bytes in the ICMP reply. icmp_seq: packet sequence, used to determine whether a segment is lost, disordered, or repeated. ttl: TTL value in the ICMP reply. time: response time.
RR:
  Routers through which the ICMP echo request passed. They are displayed in reverse order, which means the router with a smaller distance to the destination is displayed first.
--- 1.1.2.2 ping statistics ---
  Statistics on data received and sent in the ping operation.
5 packet(s) transmitted
  Number of ICMP echo requests sent.
5 packet(s) received
  Number of ICMP echo replies received.
0.0% packet loss
  Percentage of unresponded packets among the total packets sent.
round-trip min/avg/max/std-dev = 4.685/4.761/4.834/0.058 ms
  Minimum/average/maximum/standard deviation response time, in milliseconds.
Configuration step 6: Configure routing (3 of 3) Use the display arp vpn-instance command to display the ARP entries for a specific VPN. As shown in Figure 3-16, the command shows information about ARP entries including the IP address, MAC address, VLAN ID, output interface, entry type, and aging timer.
Figure 3-16: Configuration step 6: Configure routing (3 of 3)
Syntax
display arp vpn-instance vpn-instance-name [ count ]
vpn-instance-name: Specifies the name of an MPLS L3VPN, a case-sensitive string of 1 to 31 characters.
count: Displays the number of ARP entries.
Example Display ARP entries for the VPN instance named test.
VPN-Instance dynamic routing—OSPF example
Overview
A separate OSPF process is required for every VPN instance. In Figure 3-17, OSPF process 1001 is configured for VPN instance customerA. When configuring the OSPF process, specify a unique process number for that OSPF process and the VPN instance that the OSPF process is associated with.
Figure 3-17: VPN-Instance dynamic routing--OSPF example
Each OSPF process configured on a device will have its own link state database and requires its own router-id which must exist in the VPN instance. The OSPF configuration process is very similar to traditional OSPF configuration. A loopback address is configured in the VPN instance before configuring OSPF. If no routed interfaces are available within the VPN instance, the OSPF process will not start because no router-id can be allocated to the process. In Figure 3-17 area 0 is configured within the OSPF process and OSPF is enabled on all interfaces configured with IPv4 addresses in the VPN instance.
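The OSPF configuration described above might be sketched as follows; the loopback address, process number, and network range are assumptions aligned with the customerA example.

```
system-view
interface loopback 0
 ip binding vpn-instance customerA
 ip address 10.0.0.1 32               # supplies a router ID inside the VRF
quit
ospf 1001 router-id 10.0.0.1 vpn-instance customerA
 area 0
  network 10.0.0.0 0.255.255.255     # enable OSPF on matching VRF interfaces
quit
```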
Loopback configuration
By default, all Layer 3 interfaces are associated with the default VPN instance. After creating and configuring a VPN instance, associate the VPN instance with the MCE's interface connected to the site and the interface connected to the PE. Any IP address configuration on the interface is lost and will need to be reconfigured. To associate a VPN instance with an interface, see Table 3-10.
Table 3-10: How to associate a VPN instance with an interface
1. Enter system view.
   Command: system-view
2. Enter interface view.
   Command: interface interface-type interface-number
3. Associate a VPN instance with the interface.
   Command: ip binding vpn-instance vpn-instance-name
   Remarks: By default, no VPN instance is associated with an interface. The ip binding vpn-instance command deletes the IP address of the current interface. You must reconfigure an IP address for the interface after configuring the command.
OSPF

An OSPF process belongs to the public network or to a single VPN instance. If you create an OSPF process without binding it to a VPN instance, the process belongs to the public network. Binding OSPF processes to VPN instances ensures that learned routes populate the correct VPN instance. To configure OSPF between an MCE and a VPN site, see Table 3-11.

Table 3-11: How to configure OSPF between an MCE and a VPN site

Step 1. Enter system view.
Command: system-view

Step 2. Create an OSPF process for a VPN instance and enter OSPF view.
Command: ospf [ process-id | router-id router-id | vpn-instance vpn-instance-name ] *
Remarks: Perform this configuration on the MCE. On a VPN site, create a common OSPF process. An OSPF process bound to a VPN instance does not use the public network router ID configured in system view. Therefore, configure a router ID for the OSPF process. An OSPF process can belong to only one VPN instance, but one VPN instance can use multiple OSPF processes to advertise VPN routes. The default domain ID is 0.

Step 3. (Optional.) Configure the OSPF domain ID.
Command: domain-id domain-id [ secondary ]
Remarks: Perform this configuration on the MCE. All OSPF processes of the same VPN instance must be configured with the same OSPF domain ID to ensure correct route advertisement.

Step 4. (Optional.) Configure the type codes of OSPF extended community attributes.
Command: ext-community-type { domain-id type-code1 | router-id type-code2 | route-type type-code3 }
Remarks: The defaults are as follows:
■ 0x0005 for Domain ID.
■ 0x0107 for Router ID.
■ 0x0306 for Route Type.

Step 5. (Optional.) Configure the external route tag for imported VPN routes.
Command: route-tag tag-value

Step 6. Redistribute remote site routes advertised by the PE into OSPF.
Command: import-route protocol [ process-id | all-processes | allow-ibgp ] [ allow-direct | cost cost | route-policy route-policy-name | tag tag | type type ] *
Remarks: By default, no routes are redistributed into OSPF.

Step 7. (Optional.) Configure OSPF to redistribute the default route.
Command: default-route-advertise summary cost cost
Remarks: By default, OSPF does not redistribute the default route. This command redistributes the default route in a Type-3 LSA. The MCE advertises the default route to the site.

Step 8. Create an OSPF area and enter OSPF area view.
Command: area area-id
Remarks: By default, no OSPF area is created.

Step 9. Enable OSPF on interfaces that are configured with subnets in the range specified by the network command.
Command: network ip-address wildcard-mask
Remarks: By default, an interface neither belongs to any area nor runs OSPF.
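On the MCE, the steps in Table 3-11 could come together as in the following sketch (the process ID, router ID, domain ID, area, and subnet are illustrative; the import-route source depends on how routes from the PE are learned, BGP in this example):

```
system-view
[MCE] ospf 100 router-id 10.0.0.1 vpn-instance customerA
[MCE-ospf-100] domain-id 10
[MCE-ospf-100] import-route bgp
[MCE-ospf-100] default-route-advertise summary cost 10
[MCE-ospf-100] area 0
[MCE-ospf-100-area-0.0.0.0] network 10.2.1.0 0.0.0.255
```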
VPN-instance dynamic routing—OSPF example

Overview

To view information for a specific OSPF process or VPN instance, specify the OSPF process number in commands. The vpn-instance keyword is not required, because an OSPF process number is associated with an individual VPN instance. In Figure 3-18, the link-state database of OSPF process 1001 is displayed, as well as the OSPF peers for the process.
Figure 3-18: VPN-instance dynamic routing--OSPF example
display ospf lsdb

Use the display ospf lsdb command to display OSPF LSDB information. If no OSPF process is specified, this command displays LSDB information for all OSPF processes.

Syntax
display ospf [ process-id ] lsdb [ brief | [ { asbr | ase | network | nssa | opaque-area | opaque-as | opaque-link | router | summary } [ link-state-id ] ] [ originate-router advertising-router-id | self-originate ] ]
process-id
Specifies an OSPF process by its ID in the range of 1 to 65535.
brief
Displays brief LSDB information.
asbr
Displays Type-4 LSA (ASBR Summary LSA) information in the LSDB.
ase
Displays Type-5 LSA (AS External LSA) information in the LSDB.
network
Displays Type-2 LSA (Network LSA) information in the LSDB.
nssa
Displays Type-7 LSA (NSSA External LSA) information in the LSDB.
opaque-area
Displays Type-10 LSA (Opaque-area LSA) information in the LSDB.
opaque-as
Displays Type-11 LSA (Opaque-AS LSA) information in the LSDB.
opaque-link
Displays Type-9 LSA (Opaque-link LSA) information in the LSDB.
router
Displays Type-1 LSA (Router LSA) information in the LSDB.
summary
Displays Type-3 LSA (Network Summary LSA) information in the LSDB.
link-state-id
Specifies a link state ID, in the IP address format.
originate-router advertising-router-id
Displays information about LSAs originated by the specified router.
self-originate
Displays information about self-originated LSAs.
Example
Display OSPF LSDB information.

Command output is shown in Table 3-12.

Table 3-12: Display OSPF LSDB command output

Area: LSDB information of the area.
Type: LSA type.
LinkState ID: Link state ID.
AdvRouter: Advertising router.
Age: Age of LSA.
Len: Length of LSA.
Sequence: Sequence number of the LSA.
Metric: Cost of the LSA.
*Opq-Link: Opaque LSA generated by a virtual link.

Display Type-2 LSA (Network LSA) information in the LSDB.
Command output is shown in Table 3-13.

Table 3-13: Display Type-2 LSA (Network LSA) information in the LSDB command output

Type: LSA type.
LS ID: DR IP address.
Adv Rtr: Router that advertised the LSA.
LS Age: LSA age time.
Len: Length of LSA.
Options: LSA options:
■ O-Opaque LSA advertisement capability.
■ E-AS External LSA reception capability.
■ EA-External extended LSA reception capability.
■ DC-On-demand link support.
■ N-NSSA external LSA support.
■ P-Capability of an NSSA ABR to translate Type-7 LSAs into Type-5 LSAs.
Seq#: LSA sequence number.
Checksum: LSA checksum.
Net Mask: Network mask.
Attached Router: ID of the router that established adjacency with the DR, and ID of the DR itself.
display ospf peer

Use the display ospf peer command to display information about OSPF neighbors.

If no OSPF process is specified, this command displays OSPF neighbor information for all OSPF processes. If the verbose keyword is not specified, this command displays brief OSPF neighbor information. If no interface is specified, this command displays the neighbor information for all interfaces. If no neighbor ID is specified, this command displays all neighbor information.

Syntax
display ospf [ process-id ] peer [ verbose ] [ interface-type interface-number ] [ neighbor-id ]

process-id
Specifies an OSPF process by ID in the range of 1 to 65535.
verbose
Displays detailed neighbor information.
interface-type interface-number
Specifies an interface by its type and number.
neighbor-id
Specifies a neighbor router ID.
Example
Display detailed OSPF neighbor information.

Command output is shown in Table 3-14.

Table 3-14: Display detailed OSPF neighbor information command output

Area areaID interface IPAddress (InterfaceName)'s neighbors: Neighbor information of the interface in the specified area:
■ areaID-Area to which the neighbor belongs.
■ IPAddress-Interface IP address.
■ InterfaceName-Interface name.
Router ID: Neighbor router ID.
Address: Neighbor router address.
GR State: GR state.
State: Neighbor state:
■ Down-Initial state of a neighbor conversation.
■ Init-The router has seen a Hello packet from the neighbor, but has not yet established bidirectional communication with it (the router itself did not appear in the neighbor's Hello packet).
■ Attempt-Available only on an NBMA network. In this state, the OSPF router has not received any information from the neighbor for a period, but can send Hello packets at a longer interval to maintain the neighbor relationship.
■ 2-Way-Communication between the two routers is bidirectional. The router itself appears in the neighbor's Hello packet.
■ Exstart-The goal of this state is to decide which router is the master, and to decide upon the initial Database Description (DD) sequence number.
■ Exchange-The router is sending DD packets to the neighbor, describing its entire link-state database.
■ Loading-The router sends LSR packets to the neighbor, requesting more recent LSAs.
■ Full-The neighboring routers are fully adjacent.
Mode: Neighbor mode for LSDB synchronization.
Priority: Neighboring router priority.
DR: DR on the interface's network segment.
BDR: BDR on the interface's network segment.
MTU: Neighboring router interface MTU.
Options: LSA options:
■ O-Opaque LSA advertisement capability.
■ E-AS External LSA reception capability.
■ EA-External extended LSA reception capability.
■ DC-On-demand link support.
■ N-NSSA external LSA support.
■ P-Capability of an NSSA ABR to translate Type-7 LSAs into Type-5 LSAs.
Dead timer due in 33 sec: This dead timer will expire in 33 seconds.
Neighbor is up for 02:03:35: The neighbor has been up for 02:03:35.
Authentication Sequence: Authentication sequence number.
Neighbor state change count: Count of neighbor state changes.
VPN-instance dynamic routing—OSPF example

Overview

Use the display ip routing-table vpn-instance command to display the routing information of a VPN instance (VRF). In Figure 3-19, the output of the routing table for VPN instance customerA is shown on R1.
Figure 3-19: VPN-instance dynamic routing--OSPF example
Syntax
display ip routing-table vpn-instance vpn-instance-name [ verbose ]

vpn-instance-name
Name of the VPN instance, a string of 1 to 31 characters.
verbose
Displays detailed information.

Example
Display the routing information of VPN instance vpn2.

Command output is shown in Table 3-15.

Table 3-15: Display the routing information of VPN instance vpn2 command output

Destinations: Number of destination addresses.
Routes: Number of routes.
Destination/Mask: Destination address/mask length.
Proto: Protocol discovering the route.
Pre: Preference of the route.
Cost: Cost of the route.
NextHop: Address of the next hop along the route.
Interface: Outbound interface for forwarding packets to the destination segment.
MCE: Advanced configuration

In this section, advanced MCE configuration topics are discussed, including the following:

■ Routing table limits are used to ensure that VPN-instance routing tables do not consume all the hardware resources of the underlying platform. This is done by limiting the number of routes permitted in a VPN instance.

■ Route leaking is a VPN configuration option which allows routing between VPN instances. VPN instances are by design isolated from each other. However, in certain cases, routing is required between VPN instances, and routes can therefore be "leaked" between isolated routing tables. Routes of one VPN instance can also be advertised into other VPN instances to provide dynamic routing exchange between routing protocols in different VPN instances.

■ Management access VPN instances are popular in data center environments and are typically configured on core and distribution switches. These switches perform IP routing and IP forwarding roles for customer VPNs. To isolate the management function of these devices from customer networks, management protocols and management functionality are configured within a dedicated management VPN instance.
VPN instance routing limits

Overview

VPN instance routing table limits allow a network operator to restrict the number of active routes allowed in a VPN instance. It is recommended that VPN instance routing limits be configured on all customer VPNs to ensure resource protection. If this is not done, a single VPN instance could potentially consume all hardware resources. As an example, if OSPF is configured within a customer VPN instance, the OSPF process within the VPN instance may learn hundreds or thousands of routes from an external OSPF router. However, the same underlying hardware or ASICs are used for all OSPF processing in all VPN instances on that device. A customer VPN instance may therefore consume a disproportionate amount of resources or, in the worst case, all resources on core devices. Underlying ASIC routing table limits apply to all the VPN instances that are defined. If a switch supports a maximum of 64 thousand routes, one VPN instance could consume all 64 thousand routes, leaving no hardware resources available for other VPN instances. This affects not only that single VPN instance, but all VPN instances, and therefore potentially all customers. Setting limits on the number of routes permitted in a VPN instance ensures that sufficient free resources are available on core devices. This protects both backbone routing and routing for other VPN instances. By default, the number of active routes allowed for a VPN instance is not limited. Setting the maximum number of active routes for a VPN instance can prevent a device from learning too many routes. Two types of limits are configurable:

■ Limit
■ Warning threshold

The routing table limit sets the maximum number of routes accepted by the routing table. In Figure 3-20, this value is set to 20, which limits the VPN instance to a maximum of 20 routes. In Figure 3-20 the warning threshold is also set to 80 percent. When the number of routes in the VPN instance reaches 16 (80% of 20), SNMP traps are generated to warn network operators that the number of routes in the routing table is approaching the maximum. This is a type of high-water-mark alert that notifies network operators before additional routes are denied entry to the VPN instance routing table.
Figure 3-20: VPN instance routing limits
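The limit and warning threshold shown in Figure 3-20 could be configured as in the following sketch (the device and VPN-instance names are illustrative):

```
system-view
[Core] ip vpn-instance customerA
[Core-vpn-instance-customerA] routing-table limit 20 80
```

With this configuration, a warning is logged when the VPN instance reaches 16 routes (80% of 20), and no further routes are accepted beyond 20.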
Warning message examples .%May 13 11:56:13:847 2014 HP RM/4/RM_ACRT_REACH_THRESVALUE: Threshold value 80% of max active IPv4 routes reached in URT of customerA .%May 13 11:56:33:426 2014 HP RM/4/RM_ROUTE_REACH_LIMIT: Max active IPv4 routes 20 reached the limit in URT of customerA
Syntax
routing-table limit number { warn-threshold | simply-alert }
undo routing-table limit

number
Specifies the maximum number of routes. The value range depends on the system operating mode.
warn-threshold
Specifies a warning threshold as a percentage, in the range of 1 to 100. When the ratio of existing routes to the maximum number of routes exceeds the specified threshold, the system gives an alarm message but still allows new routes. If routes in the VPN instance reach the maximum, no more routes are added.
simply-alert
Specifies that when routes exceed the maximum number, the system still accepts routes but generates a system log message.
Usage guidelines

A limit configured in VPN instance view applies to both the IPv4 VPN and the IPv6 VPN. A limit configured in IPv4 VPN view or IPv6 VPN view applies only to the IPv4 VPN or the IPv6 VPN, respectively. An IPv4/IPv6 VPN prefers the limit configured in IPv4/IPv6 VPN view over the limit configured in VPN instance view.

Examples

Specify that VPN instance vpn1 supports up to 1000 routes, and that when routes exceed the upper limit, the system still accepts new routes but generates a system log message.
system-view
[Sysname] ip vpn-instance vpn1
[Sysname-vpn-instance-vpn1] route-distinguisher 100:1
[Sysname-vpn-instance-vpn1] routing-table limit 1000 simply-alert

Specify that the IPv4 VPN vpn2 supports up to 1000 routes, and that when routes exceed the upper limit, the system still accepts new routes but generates a system log message.
system-view
[Sysname] ip vpn-instance vpn2
[Sysname-vpn-instance-vpn2] route-distinguisher 100:2
[Sysname-vpn-instance-vpn2] ipv4-family
[Sysname-vpn-ipv4-vpn2] routing-table limit 1000 simply-alert

Specify that the IPv6 VPN vpn3 supports up to 1000 routes, and that when routes exceed the upper limit, the system still accepts new routes but generates a system log message.
system-view
[Sysname] ip vpn-instance vpn3
[Sysname-vpn-instance-vpn3] route-distinguisher 100:3
[Sysname-vpn-instance-vpn3] ipv6-family
[Sysname-vpn-ipv6-vpn3] routing-table limit 1000 simply-alert
Route leaking

Route leaking allows for tightly controlled routed communication between VPN instances. While the original purpose of VPN instances was to isolate communication between instances, there are scenarios where some routed communication between VPN instances is required. One of the advantages of using VPN instances is that only a limited number of routes need to be created for leaking. If the routes are not manually defined, no communication is possible between VPN instances. Access control is therefore easier to implement than using access control lists within a single routing table. One use case for route leaking is to specify that certain subnets in VPN instance A are reachable from certain subnets in VPN instance B. As an example, 10.1.1.0/24 in VPN instance A is reachable from 10.2.2.0/24 in VPN instance B. Note that 10.2.2.0/24 needs to be reachable from VPN instance A to allow for bidirectional communication. Another scenario is a shared service within a data center configured on a subnet such as 10.254.0.0/16. The shared subnet could provide backup services or monitoring services to customer VPN instances. Route leaking between a VPN instance and the public routing table is also possible for scenarios such as a central firewall with Internet access. The firewall could be in a dedicated VPN instance, as shown in Figure 3-21, or in the public routing table.

Note
When configuring route leaking, ensure that bidirectional communication is enabled by leaking the necessary routes into both VPN instances.
Figure 3-21: Route leaking
Network Address Translation (NAT) is not supported on some of the Layer 3 switches discussed in this study guide. Therefore, overlapping IP addresses cannot be used if only those devices are used. Overlapping subnets in VPN instances can be used, but require inter-VPN-instance NAT, which is only supported on routers.
Route leaking—Static route example

The scenario in Figure 3-22 shows two VPN instances which require communication between them. A shared firewall is configured in the Shared-Internet (shared) VPN instance. The Core routing device has multiple VPN instances configured, with one interface in the CustomerA VPN instance and the other in the Shared-Internet (shared) VPN instance. CA-R1 is unaware of any configured VPN instances and is configured as a traditional router or switch. The firewall is also unaware of VPN instances in this example. Some firewalls can be configured to be VPN-instance aware, but in this example, the firewall is configured as a traditional firewall with only IPv4 addresses on the internal and external interfaces.
Figure 3-22: Route leaking--Static route example
The CustomerA VPN instance is configured with subnets in the 10.2.0.0/16 range, and the Shared-Internet (shared) VPN instance with subnets in the 10.3.0.0/16 range. An Internet-facing router is performing NAT (not shown in the diagram). To enable connectivity, static routes need to be configured on the Core, CA-R1 and Firewall devices. The first static route command in Figure 3-22 is configured on the Core device and enables connectivity to networks in the range 10.2.0.0/16 via the next hop 10.2.1.2 (CA-R1). The static route is added to the CustomerA VPN instance routing table on the Core device. The second static route adds a default route to the Shared-Internet (shared) VPN instance with the next hop set to the Firewall. CA-R1 has a default route configured with the Core device as the next hop. The Firewall needs to be configured with a route to subnet 10.2.0.0/16 to allow for bidirectional traffic. The next hop on the Firewall for 10.2.0.0/16 is set to the Core device. To enable route leaking, additional static routes are then added to each VPN instance on the Core device, but in this case the next hop is set to an IP address in a different VPN instance. The first route leaking command adds a default route (0.0.0.0) to the CustomerA VPN instance, but sets the next hop to 10.3.1.3 in the shared VPN instance. The second route leaking command adds network 10.2.0.0/16 to the shared VPN instance with a next hop of 10.2.1.2 in the CustomerA VPN instance.
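An illustrative reconstruction of the Core configuration described above (the VPN-instance names follow the text; treat this as a sketch rather than the exact configuration shown in Figure 3-22):

```
# Standard static routes, next hop inside the same VPN instance
[Core] ip route-static vpn-instance CustomerA 10.2.0.0 16 10.2.1.2
[Core] ip route-static vpn-instance shared 0.0.0.0 0 10.3.1.3
# Route leaking: next hop resolved in the other VPN instance
[Core] ip route-static vpn-instance CustomerA 0.0.0.0 0 vpn-instance shared 10.3.1.3
[Core] ip route-static vpn-instance shared 10.2.0.0 16 vpn-instance CustomerA 10.2.1.2
```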
Syntax
ip route-static vpn-instance s-vpn-instance-name dest-address { mask | mask-length } { next-hop-address [ public ] [ bfd control-packet bfd-source ip-address | permanent | track track-entry-number ] | interface-type interface-number [ next-hop-address ] [ backup-interface interface-type interface-number [ backup-nexthop backup-nexthop-address ] [ permanent ] | bfd { control-packet | echo-packet } | permanent ] | vpn-instance d-vpn-instance-name next-hop-address [ bfd control-packet bfd-source ip-address | permanent | track track-entry-number ] } [ preference preference-value ] [ tag tag-value ] [ description description-text ]
undo ip route-static vpn-instance s-vpn-instance-name dest-address { mask | mask-length } [ next-hop-address [ public ] | interface-type interface-number [ next-hop-address ] | vpn-instance d-vpn-instance-name next-hop-address ] [ preference preference-value ]

vpn-instance s-vpn-instance-name
Specifies a source MPLS L3VPN by its name, a case-sensitive string of 1 to 31 characters. Each VPN has its own routing table, and the configured static route is installed in the routing tables of the specified VPNs.
dest-address
Specifies the destination IP address of the static route, in dotted decimal notation.
mask
Specifies the mask of the IP address, in dotted decimal notation.
mask-length
Specifies the mask length in the range of 0 to 32.
vpn-instance d-vpn-instance-name
Specifies a destination MPLS L3VPN by its name, a case-sensitive string of 1 to 31 characters. If a destination VPN is specified, packets will search for the output interface in the destination VPN based on the configured next hop address.
next-hop-address
Specifies the IP address of the next hop in the destination VPN instance, in dotted decimal notation.
backup-interface interface-type interface-number
Specifies a backup output interface by its type and number. If the backup output interface is an NBMA interface or broadcast interface (such as an Ethernet interface, a virtual template interface, or a VLAN interface), rather than a P2P interface, you must specify the backup next hop address.
backup-nexthop backup-nexthop-address
Specifies a backup next hop address.
bfd
Enables BFD to detect reachability of the static route's next hop. When the next hop is unreachable, the system immediately switches to the backup route.
control-packet
Specifies the BFD control mode.
bfd-source ip-address
Specifies the source IP address of BFD packets. H3C recommends that you specify the loopback interface address.
permanent
Specifies the route as a permanent static route. If the output interface is down, the permanent static route is still active.
track track-entry-number
Associates the static route with a track entry specified by its number in the range of 1 to 1024. For more information about track, see High Availability Configuration Guide.
echo-packet
Specifies the BFD echo mode.
public
Indicates that the specified next hop address is on the public network.
interface-type interface-number
Specifies an output interface by its type and number. If the output interface is an NBMA interface or broadcast interface (such as an Ethernet interface, a virtual template interface, or a VLAN interface), rather than a P2P interface, the next hop address must be specified.
preference preference-value
Specifies a preference for the static route, in the range of 1 to 255. The default is 60.
tag tag-value
Sets a tag value for marking the static route, in the range of 1 to 4294967295. The default is 0. Tags of routes are used for route control in routing policies. For more information about routing policies, see Layer 3—IP Routing Configuration Guide.
description description-text
Configures a description for the static route, which comprises 1 to 60 characters, including special characters like the space, but excluding the question mark (?).

The routing tables of the devices are updated with the new routes. In Figure 3-23, the routing tables of VPN instance CustomerA and Shared-Internet (shared) are shown on the Core device.
Figure 3-23: Route leaking: Static route example
Both VPN instances CustomerA and Shared-Internet (shared) display the two additional static routes previously configured. The CustomerA VPN instance contains the following static routes:

■ 10.2.0.0/16 with a next hop of 10.2.1.2 (CA-R1) in the CustomerA VPN instance. The NextHop (CA-R1) is also in the CustomerA VPN instance.
■ 0.0.0.0/0 with a next hop of 10.3.1.3 (Firewall) in the shared VPN instance. The NextHop (Firewall) is in a different VPN instance.

The Shared-Internet (shared) VPN instance contains the following static routes:

■ 10.2.0.0/16 with a next hop of 10.2.1.2 (CA-R1) in the CustomerA VPN instance. The NextHop (CA-R1) is in a different VPN instance.
■ 0.0.0.0/0 with a next hop of 10.3.1.3 (Firewall) in the shared VPN instance. The NextHop (Firewall) is also in the shared VPN instance.

Connectivity between the CustomerA VPN instance and the Shared-Internet (shared) VPN instance can be verified by using ping, for example, as shown in Figure 3-24.
Figure 3-24: Route leaking: Static route example (continued)
In this case the Firewall is able to ping a server with IP address 10.2.0.2 in the CustomerA VPN instance. Tracert shows that the path from the Firewall in the shared VPN instance traverses the Core device (both VPN instances) to reach the server in the CustomerA VPN instance.
Route leaking—Static route restrictions

There are restrictions on static route leaking. Static routes can only be configured for remote IP subnets and not for directly connected subnets. This is because a next hop IP address must be configured as part of the static route command, and that address cannot be the local device where the static route is applied.
Multiprotocol BGP (MBGP) is required for routing of directly connected subnets between VPN instances. In other words, the device with interfaces in different VPN instances needs to run MBGP to advertise directly connected subnets between the VPN instances. In the sample network in Figure 3-22, the Core device would need to run MBGP to route traffic from CustomerA to subnet 10.3.1.0/24, or to route from the Shared-Internet VPN instance to subnet 10.2.1.0/24. MBGP configuration is out of the scope of this study guide.
Management access VPN instance

In this section, isolated management access using a separate VPN instance is discussed. Most data center switches have dedicated management Ethernet ports. On chassis-based devices, the management port is located on the management processing unit (MPU). On fixed-port devices, the out-of-band management port is located either on the front or back of the device, as shown in Figure 3-25.
Figure 3-25: Management access VPN instance
A management Ethernet interface uses an RJ-45 connector. It can be used to connect a PC for software loading and system debugging, or to connect to a remote device, for example, a remote network management station, for remote system management. It has the attributes of a common Ethernet interface, but because it is located on the main board, it provides a much faster connection speed than a common Ethernet interface when used for operations such as software loading and network management. The display interface brief command displays this interface as an M-Ethernet interface. The management Ethernet interface is defined as a routed port in the configuration; therefore these ports cannot be used for switching operations, but only for routed operations. To configure a management Ethernet interface, see Table 3-16.

Table 3-16: How to configure a management Ethernet interface

Step 1. Enter system view.
Command: system-view

Step 2. Enter management Ethernet interface view.
Command: interface interface-type interface-number

Step 3. Set the description string.
Command: description text
Remarks: Optional. By default, the description is M-GigabitEthernet0/0/0 Interface.
As the management Ethernet port is a routed port, an IP address can be configured directly on the interface. Routed ports, including the management Ethernet port, can also be bound to VPN instances. Once the management interface is bound to a specific VPN instance, management subnets are only available within that VPN instance and are no longer part of the public routing table. Any routing configuration, such as a default gateway or routing protocols, needs to be configured for that VPN instance. Once IP connectivity is established in the management VPN instance, management protocols need to be configured to use the specified VPN instance. All management protocols use the public routing table by default. For example, suppose a switch is configured to use RADIUS authentication for network operators, with a server at IP address 10.1.2.100. The switch will attempt to connect to the server using the public routing table by default. Even though the RADIUS server IP address may be reachable via the management Ethernet port, RADIUS authentication would fail: the switch will not have 10.1.2.100 in the public routing table and will thus not be able to reach the RADIUS server. Management protocols, including RADIUS, need to be configured to use the correct VPN instance instead of the public routing table. This needs to be configured on a per-protocol basis (Telnet, SSH, RADIUS, and so on).
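As a sketch, binding the management port and a RADIUS scheme to a management VPN instance might look like the following (the instance name mgmt, the interface name, the scheme name, and all addresses are assumptions for illustration):

```
system-view
[Sysname] ip vpn-instance mgmt
[Sysname-vpn-instance-mgmt] quit
[Sysname] interface M-GigabitEthernet 0/0/0
[Sysname-M-GigabitEthernet0/0/0] ip binding vpn-instance mgmt
[Sysname-M-GigabitEthernet0/0/0] ip address 10.1.2.1 24
[Sysname-M-GigabitEthernet0/0/0] quit
# Direct RADIUS traffic into the mgmt VPN instance
[Sysname] radius scheme operators
[Sysname-radius-operators] primary authentication 10.1.2.100
[Sysname-radius-operators] vpn-instance mgmt
```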
Management access VPN-Instance (1/2)

Overview

Figure 3-26 shows examples of SNMP, syslog, and NTP configured to use the mgmt VPN instance instead of the public routing table.
Figure 3-26: Management access VPN-Instance (1/2)
The first command is an example of SNMP trap host configuration. Any warning or informational messages or other events displayed on the switch can be copied to an SNMP server (NMS management system) using an SNMP trap. Host 10.0.1.100 could be an IMC server or another NMS configured to receive SNMP traps. The snmp-agent target-host command specifies options such as the trap type, UDP domain, and host IP address. In addition, in this example, the command specifies the VPN instance to use when sending traps. If the vpn-instance mgmt option is not specified, the switch will attempt to contact the host using the public routing table. The second example shows the configuration for a syslog server. Once again, the vpn-instance mgmt option is used to specify that syslog messages are sent based on the VPN instance routing table rather than the public routing table. The third example configures NTP to use the correct VPN instance to reach the NTP server.
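The three commands described above might look like the following sketch (server addresses, the security name, and the instance name mgmt are illustrative; exact keyword order can vary by software version):

```
# SNMP trap host reachable through the mgmt VPN instance
[Sysname] snmp-agent target-host trap address udp-domain 10.0.1.100 vpn-instance mgmt params securityname admin v3
# Syslog server in the mgmt VPN instance
[Sysname] info-center loghost vpn-instance mgmt 10.0.1.101
# NTP server in the mgmt VPN instance
[Sysname] ntp-service unicast-server 10.0.1.102 vpn-instance mgmt
```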
SNMP Agent
The SNMP agent sends notifications (traps and informs) to inform the NMS of significant events, such as link state changes and user logins or logouts. Unless otherwise stated, the trap keyword in the command line includes both traps and informs. Enable an SNMP notification only if necessary. SNMP notifications are memory-intensive and may affect device performance. To generate linkUp or linkDown notifications when the link state of an interface changes, you must enable linkUp or linkDown notification globally by using the snmp-agent trap enable standard [ linkdown | linkup ] * command and on the interface by using the enable snmp trap updown command. After you enable a notification for a module, whether the module generates notifications also depends on the configuration of the module. For more information, see the configuration guide for each module. To enable SNMP traps, see Table 3-17.

Table 3-17: How to enable SNMP traps
Step 1. Enter system view.
Command: system-view

Step 2. Enable notifications globally.
Command: snmp-agent trap enable [ bgp | configuration | ospf [ authentication-failure | bad-packet | config-error | grhelper-status-change | grrestarter-status-change | if-state-change | lsa-maxage | lsa-originate | lsdb-approaching-overflow | lsdb-overflow | neighbor-state-change | nssa-translator-status-change | retransmit | virt-authentication-failure | virt-bad-packet | virt-config-error | virt-retransmit | virt-grhelper-status-change | virt-if-state-change | virt-neighbor-state-change ] * | standard [ authentication | coldstart | linkdown | linkup | warmstart ] * | system ]
Remarks: By default, all the traps are enabled globally.
You can configure the SNMP agent to send notifications as traps or informs to a host, typically an NMS, for analysis and management. Traps are less reliable and use fewer resources than informs, because an NMS does not send an acknowledgement when it receives a trap.
When network congestion occurs or the destination is not reachable, the SNMP agent buffers notifications in a queue. You can configure the queue size and the notification lifetime (the maximum time that a notification can stay in the queue). A notification is deleted when its lifetime expires. When the notification queue is full, the oldest notifications are automatically deleted.
You can extend standard linkUp/linkDown notifications to include the interface description and interface type, but you must make sure that the NMS supports the extended SNMP messages.
To send informs, make sure:
■ The SNMP agent and the NMS use SNMPv3.
■ Configure the SNMP engine ID of the NMS when you configure SNMPv3 basic settings. Also, specify the IP address of the SNMP engine when you create the SNMPv3 user.
Configuration prerequisites
■ Configure the SNMP agent with the same basic SNMP settings as the NMS. You must configure an SNMPv3 user, a MIB view, and a remote SNMP engine ID associated with the SNMPv3 user for notifications.
■ The SNMP agent and the NMS can reach each other.
To configure the SNMP agent to send notifications to a host, see Table 3-18.

Table 3-18: How to configure the SNMP agent to send notifications to a host

Step 1. Enter system view.
Command: system-view

Step 2. Configure a target host.
Command (Approach 1) Send traps to the target host: snmp-agent target-host trap address udp-domain { ip-address | ipv6 ipv6-address } [ udp-port port-number ] [ vpn-instance vpn-instance-name ] params securityname security-string [ v1 | v2c | v3 [ authentication | privacy ] ]
Command (Approach 2) Send informs to the target host: snmp-agent target-host inform address udp-domain { ip-address | ipv6 ipv6-address } [ udp-port port-number ] [ vpn-instance vpn-instance-name ] params securityname security-string { v2c | v3 [ authentication | privacy ] }
Remarks: Use either approach. By default, no target host is configured. The current software version does not support SNMPv1 and SNMPv2c; the v1 and v2c keywords are reserved at the CLI only for future support.
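A hedged sketch of an inform target-host configuration, following the Approach 2 syntax above (the host address and VPN instance name follow the chapter's mgmt example; the SNMPv3 username v3user is an illustrative assumption):

```
system-view
[Sysname] snmp-agent target-host inform address udp-domain 10.0.1.100 vpn-instance mgmt params securityname v3user v3 privacy
```

As noted in the prerequisites, the SNMPv3 user and the remote SNMP engine ID of the NMS must already be configured for informs to work.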
Syslog
The info-center loghost command takes effect only after the information center is enabled with the info-center enable command. The device supports up to four log hosts.
Use info-center loghost to specify a log host and to configure output parameters.
Use undo info-center loghost to restore the default.
Syntax
info-center loghost [ vpn-instance vpn-instance-name ] { ipv4-address | ipv6 ipv6-address } [ port port-number ] [ facility local-number ]
undo info-center loghost [ vpn-instance vpn-instance-name ] { ipv4-address | ipv6 ipv6-address }

vpn-instance vpn-instance-name: Specifies an MPLS L3VPN by its name, a case-sensitive string of 1 to 31 characters. If the log host is on the public network, do not specify this option.
ipv4-address: Specifies the IPv4 address of a log host within the VPN instance.
ipv6 ipv6-address: Specifies the IPv6 address of a log host within the VPN instance.
port port-number: Specifies the port number of the log host, in the range of 1 to 65535. The default is 514. It must be the same as the value configured on the log host; otherwise, the log host cannot receive system information.
facility local-number: Specifies a logging facility from local0 to local7 for the log host. The default value is local7. Logging facilities are used to mark different logging sources, and to query and filter logs.
Examples
Output logs to the log host 1.1.1.1:
system-view
[Sysname] info-center loghost 1.1.1.1
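For a log host reachable only through a management VPN instance, a sketch based on the syntax above (the address and instance name follow the chapter's earlier mgmt example; the facility value is an illustrative assumption):

```
system-view
[Sysname] info-center enable
[Sysname] info-center loghost vpn-instance mgmt 10.0.1.100 facility local5
```

Remember that info-center loghost takes effect only after the information center is enabled.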
NTP
When you specify an NTP server for the device, the device is synchronized to the NTP server, but the NTP server is not synchronized to the device. To synchronize the PE to a PE or CE in a VPN, provide vpn-instance vpn-instance-name in your command. If you include the vpn-instance vpn-instance-name option in the undo ntp-service unicast-server command, the command removes the NTP server with the IP address of ip-address in the specified VPN. If you do not include the vpn-instance vpn-instance-name option in this command, the command removes the NTP server with the IP address of ip-address in the public network.
Use ntp-service unicast-server to specify an NTP server for the device.
Use undo ntp-service unicast-server to remove an NTP server specified for the device.
Syntax
ntp-service unicast-server { ip-address | server-name } [ vpn-instance vpn-instance-name ] [ authentication-keyid keyid | priority | source interface-type interface-number | version number ] *
undo ntp-service unicast-server { ip-address | server-name } [ vpn-instance vpn-instance-name ]

ip-address: Specifies an IP address of the NTP server. It must be a unicast address, rather than a broadcast address, a multicast address, or the IP address of the local clock.
server-name: Specifies a host name of the NTP server, a case-insensitive string of 1 to 255 characters.
vpn-instance vpn-instance-name: Specifies the MPLS L3VPN to which the NTP server belongs, where vpn-instance-name is a case-sensitive string of 1 to 31 characters. If the NTP server is on the public network, do not specify this option.
authentication-keyid keyid: Specifies the key ID to be used for sending NTP messages to the NTP server, where keyid is in the range of 1 to 4294967295. If this option is not specified, the local device and the NTP server do not authenticate each other.
priority: Specifies this NTP server as the first choice under the same condition.
source interface-type interface-number: Specifies the source interface for NTP messages. For an NTP message the local device sends to the NTP server, the source IP address is the primary IP address of this interface. The interface-type interface-number argument represents the interface type and number.
version number: Specifies the NTP version, where number is in the range of 1 to 4. The default value is 4.
Examples
Specify NTP server 10.1.1.1 for the device, and configure the device to run NTP version 4:
system-view
[Sysname] ntp-service unicast-server 10.1.1.1 version 4
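To reach an NTP server that sits inside a VPN instance, a sketch based on the syntax above (the address and VPN instance name follow the chapter's mgmt example):

```
system-view
[Sysname] ntp-service unicast-server 10.0.1.100 vpn-instance mgmt version 4
```

Without the vpn-instance keyword, the device would try to reach the NTP server via the public routing table.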
Management access VPN-Instance (2/2)

RADIUS
A RADIUS scheme specifies the RADIUS servers that the device can communicate with. It also defines a set of parameters that the device uses to exchange information with the RADIUS servers, including the IP addresses of the servers, UDP port numbers, shared keys, and server types.
Switches support the definition of multiple RADIUS or TACACS schemes. A switch could, for example, be configured with one RADIUS scheme for 802.1X authentication and a different RADIUS scheme for management authentication. The RADIUS servers referenced in each VPN instance could also be different. The vpn-instance command is used within the RADIUS scheme to specify which VPN instance and RADIUS server are used for a particular scheme.
Customers could have their own RADIUS servers which they may want to use for 802.1X authentication. By configuring the relevant vpn-instance on each customer RADIUS scheme, RADIUS packets are sent within that VPN instance to the relevant customer RADIUS server rather than via the public routing table. A separate RADIUS scheme could also be configured for management authentication within a VPN instance.
Figure 3-27 shows an example of RADIUS server configuration within the management VPN instance. This ensures that RADIUS authentication uses the mgmt VPN instance rather than the public routing table or a customer VPN.
Figure 3-27: Management access VPN-Instance (2/2)
Create a RADIUS scheme before performing any other RADIUS configurations. You can configure up to 16 RADIUS schemes. A RADIUS scheme can be referenced by multiple ISP domains. To create a RADIUS scheme, see Table 3-19. Table 3-19: How to create a RADIUS scheme
Step 1. Enter system view.
Command: system-view

Step 2. Enter RADIUS scheme view.
Command: radius scheme radius-scheme-name

Step 3. Specify RADIUS authentication servers.
Command: Specify the primary RADIUS authentication server: primary authentication { ipv4-address | ipv6 ipv6-address } [ port-number | key { cipher | simple } string | vpn-instance vpn-instance-name ] *
Or specify a secondary RADIUS authentication server: secondary authentication { ipv4-address | ipv6 ipv6-address } [ port-number | key { cipher | simple } string | vpn-instance vpn-instance-name ] *
Remarks: Configure at least one command. By default, no authentication server is specified. Two authentication servers in a scheme, primary or secondary, cannot have the same combination of IP address, port number, and VPN.
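Combining these steps, a hedged sketch of a scheme whose authentication server lives in the mgmt VPN instance (the scheme name, shared key, and server address are illustrative assumptions):

```
system-view
[Sysname] radius scheme mgmt-radius
[Sysname-radius-mgmt-radius] primary authentication 10.0.1.100 1812 key simple radiuskey vpn-instance mgmt
```

The scheme would then be referenced from an ISP domain used for management login authentication.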
The VPN specified for a RADIUS scheme applies to all authentication and accounting servers in that scheme. If a VPN is also configured for an individual RADIUS server, the VPN specified for the RADIUS scheme does not take effect on that server. To specify a VPN for a scheme, see Table 3-20.

Table 3-20: How to specify a VPN for a scheme

Step 1. Enter system view.
Command: system-view

Step 2. Enter RADIUS scheme view.
Command: radius scheme radius-scheme-name

Step 3. Specify a VPN for the RADIUS scheme.
Command: vpn-instance vpn-instance-name
Remarks: By default, a RADIUS scheme belongs to the public network.
sFlow/Netflow
The second command in Figure 3-27 shows an example of sFlow configured to use the mgmt VPN instance when communicating with sFlow collector 10.0.1.100. To configure the sFlow agent and sFlow collector information, see Table 3-21.

Table 3-21: How to configure the sFlow agent and sFlow collector information

Step 1. Enter system view.
Command: system-view

Step 2. (Optional.) Configure an IP address for the sFlow agent.
Command: sflow agent { ip ip-address | ipv6 ipv6-address }
Remarks: By default, no IP address is configured for the sFlow agent. The device periodically checks whether the sFlow agent has an IP address. If not, the device automatically selects an IPv4 address for the sFlow agent but does not save the IPv4 address in the configuration file. It is recommended that you manually configure an IP address for the sFlow agent. Only one IP address can be configured for the sFlow agent on the device, and a newly configured IP address overwrites the existing one.

Step 3. Configure the sFlow collector information.
Command: sflow collector collector-id [ vpn-instance vpn-instance-name ] { ip ip-address | ipv6 ipv6-address } [ port port-number | datagram-size size | time-out seconds | description text ] *
Remarks: By default, no sFlow collector information is configured.

Step 4. (Optional.) Specify the source IP address of sFlow packets.
Command: sflow source { ip ip-address | ipv6 ipv6-address } *
Remarks: By default, the source IP address is determined by routing.
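A hedged sketch combining these steps for a collector in the mgmt VPN instance (the agent address, collector ID, and description are illustrative assumptions; the collector address follows the chapter's example):

```
system-view
[Sysname] sflow agent ip 10.0.1.1
[Sysname] sflow collector 1 vpn-instance mgmt ip 10.0.1.100 description IMC
```

The vpn-instance keyword on the collector line ensures sFlow datagrams are routed via the management VPN rather than the public routing table.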
OpenFlow
The third command in Figure 3-27 shows an example of OpenFlow configured to communicate with an OpenFlow controller (10.0.1.100) using the mgmt VPN instance. The number of controllers supported by an OpenFlow switch is switch dependent. The OpenFlow channel between the OpenFlow switch and each controller can have only one main connection, and the connection must use TCP or SSL. The main connection must be reliable and processes control messages to complete tasks such as deploying entries, obtaining data, and sending information. To specify a controller for an OpenFlow switch and configure the main connection to the controller, see Table 3-22.

Table 3-22: How to specify a controller for an OpenFlow switch and configure the main connection to the controller

Step 1. Enter system view.
Command: system-view

Step 2. Enter OpenFlow instance view.
Command: openflow instance instance-id

Step 3. Specify a controller and configure the main connection to the controller.
Command: controller controller-id address { ip ip-address | ipv6 ipv6-address } [ port port-number ] [ ssl ssl-policy-name ] [ vrf vrf-name ]
Remarks: By default, an OpenFlow instance is not configured with any main connection.
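A hedged sketch of the steps in Table 3-22 for a controller in the mgmt VPN instance (the instance and controller IDs are illustrative; other OpenFlow instance settings, such as VLAN classification and activating the instance, are omitted for brevity):

```
system-view
[Sysname] openflow instance 1
[Sysname-of-inst-1] controller 1 address ip 10.0.1.100 vrf mgmt
```

Note that this command uses the vrf keyword, rather than vpn-instance, to reference the VPN instance.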
IMC Management Access using VPN-Instance
IMC can be used for network management in conjunction with VPN instances. No additional configuration is required within IMC to support basic device management, status reporting, and SNMP polling of devices. IMC simply tries to reach the device via the configured IP address, and the device responds from that IP address. IMC is unaware that it has been configured within a VPN instance. However, when using IMC device discovery, the option "Automatically register to receive SNMP traps from supported devices" needs to be unchecked (turned off).
This IMC option is not VPN instance aware and will configure the devices to send SNMP traps to the IMC server using the public routing table instead of the correct VPN instance. Figure 3-28 shows an example of the configured target-host command as configured by IMC. No VPN instance has been configured and traps will not reach the IMC server at IP address 10.0.1.100 because the public routing table is used instead of the VPN instance.
Figure 3-28: IMC Management Access using VPN-Instance
Ensure that the IMC option is unchecked and that the correct command is manually configured on the device with the correct VPN instance.
A feature of IMC that must be configured when working with VPN instances is the Intelligent Configuration Center. The Intelligent Configuration Center is part of the basic IMC platform and provides automated deployment of configurations as well as backup and restore configuration options. Backups of device configurations require additional setup when used with VPN instances. Failing to configure this will result in backups failing, as can be seen in Figure 3-29.
Figure 3-29: Failed backups of device configurations
The reason for the failure is that IMC is unaware of the VPN instance by default. IMC instructs the network device to back up the configuration to the IMC server running a local TFTP server. IMC uses SNMP set commands to initiate the TFTP backup from the device. Included in the SNMP set messages are the backup filename to be used and the TFTP server IP address. However, IMC does not specify any VPN instance by default. Since IMC has only included the TFTP server address and filename in the SNMP set messages, when the device initiates the TFTP backup, it uses the public routing table and the backup fails (the IMC server is not reachable from the public routing table).
As discussed previously, the TFTP upload from the network device needs to use the management VPN instance rather than the public routing table. IMC can be configured to include the required VPN instance name when instructing a device to back up its configuration. The SNMP set instructions sent to the device will thus include the VPN instance in addition to the filename and TFTP server IP address. This option is configured by selecting the following options (see Figure 3-30):
1. Configuration Center menu
2. Options menu
3. VPN instance tab
4. For each device, selecting the VPN Instance Name to use for that device.
Figure 3-30: IMC Management Access using VPN-Instance
When IMC instructs the device to initiate a backup, the SNMP set instruction will include the specified VPN instance name. Note The specified VPN instance must be defined on the network device. The output of a successful backup is shown in Figure 3-31.
Figure 3-31: Output of a successful backup
A core network device may have multiple interfaces configured with IP addresses in multiple VPN instances. Any one of these IP addresses could be used for the management of the device, and this includes IP addresses configured within customer VPN instances. That means that a customer may attempt to Telnet to a core device or use SNMP to configure the device. For security reasons, it is undesirable to permit any management access to core network devices from customer VPN instances.
Access control lists (ACLs) can be configured to only allow management access from specific VPN instances, such as the management VPN instance. The vpn-instance option is available on Comware ACLs to only allow access from specified VPN instances. In Figure 3-32, access list 2001 is configured with an entry that only permits the IMC host (IP address 10.0.1.100) configured in the mgmt VPN instance. The access list is then bound to various management protocols such as Telnet, SSH, HTTP, and others. The management protocols are therefore restricted to only allow access from host 10.0.1.100 in the mgmt VPN instance.
Figure 3-32: IMC Management Access using VPN-Instance
The ACL should be applied to all management protocols on the device to ensure customers are not able to connect to the device.
Note
Comware device ACLs have an implicit permit by default when used as packet filters. However, in this example, the ACL is used to limit management protocols, and in this case the default action is deny. This is the opposite of the behavior of packet filters that filter traffic passing through the device.
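A hedged sketch of the approach described above (addresses follow the chapter's example; the exact option ordering and the set of protocols to which the ACL can be bound vary by platform and release):

```
system-view
[Sysname] acl number 2001
[Sysname-acl-basic-2001] rule permit source 10.0.1.100 0 vpn-instance mgmt
[Sysname-acl-basic-2001] quit
[Sysname] telnet server acl 2001
[Sysname] ssh server acl 2001
```

Because the default action for a management-access ACL is deny, any host or VPN instance not explicitly permitted is blocked.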
ACLs
An access control list (ACL) is a set of rules (permit or deny statements) for identifying traffic based on criteria such as source IP address, destination IP address, and port number. Table 3-23 lists the ACL categories.
Table 3-23: ACL categories
Each ACL category has a unique range of ACL numbers. When creating an ACL, you must assign it a number. In addition, you can assign the ACL a name for ease of identification. After creating an ACL with a name, you cannot rename it or delete its name. For an IPv4 basic or advanced ACL, its ACL number and name must be unique in IPv4. For an IPv6 basic or advanced ACL, its ACL number and name must be unique in IPv6.
The rules in an ACL are sorted in a specific order. When a packet matches a rule, the device stops the match process and performs the action defined in the rule. If an ACL contains overlapping or conflicting rules, the matching result and action to take depend on the rule order. The following ACL match orders are available:
■ config—Sorts ACL rules in ascending order of rule ID. A rule with a lower ID is matched before a rule with a higher ID. If you use this approach, carefully check the rules and their order.
■ auto—Sorts ACL rules in depth-first order. Depth-first ordering makes sure any subset of a rule is always matched before the rule. Table 3-24 lists the sequence of tie breakers that depth-first ordering uses to sort rules for each type of ACL.
The match order of user-defined ACLs can only be config.

Table 3-24: Sort ACL rules in depth-first order

IPv4 basic ACL—sequence of tie breakers:
1. VPN instance
2. More 0s in the source IP address wildcard (more 0s means a narrower IP address range)
3. Rule configured earlier

IPv4 advanced ACL—sequence of tie breakers:
1. VPN instance
2. Specific protocol type rather than IP (IP represents any protocol over IP)
3. More 0s in the source IP address wildcard mask
4. More 0s in the destination IP address wildcard
5. Narrower TCP/UDP service port number range
6. Rule configured earlier

A wildcard mask, also called an inverse mask, is a 32-bit binary number represented in dotted decimal notation. In contrast to a network mask, the 0 bits in a wildcard mask represent "do care" bits, and the 1 bits represent "don't care" bits. If the "do care" bits in an IP address are identical to the "do care" bits in an IP address criterion, the IP address matches the criterion. All "don't care" bits are ignored. The 0s and 1s in a wildcard mask can be noncontiguous. For example, 0.255.0.255 is a valid wildcard mask.
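To illustrate the noncontiguous wildcard mask mentioned above, a hedged sketch (the ACL number and source network are illustrative assumptions):

```
system-view
[Sysname] acl number 3001
[Sysname-acl-adv-3001] rule permit ip source 10.0.0.0 0.255.0.255
```

With wildcard 0.255.0.255, only the first and third octets are "do care" bits, so the rule matches any address whose first octet is 10 and whose third octet is 0, such as 10.1.0.5 or 10.200.0.9, regardless of the second and fourth octets.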
Telnet server acl
Use telnet server acl to apply an ACL to filter Telnet logins.
Use undo telnet server acl to restore the default.
Only one ACL can be used to filter Telnet logins, and only users permitted by the ACL can Telnet to the device. This command does not take effect on existing Telnet connections. You can specify an ACL that has not been created yet in this command. The command takes effect after the ACL is created.
Syntax
telnet server acl acl-number
undo telnet server acl acl-number

acl-number: Specifies an ACL by its number:
■ Basic ACL—2000 to 2999.
■ Advanced ACL—3000 to 3999.
■ Ethernet frame header ACL—4000 to 4999.
Examples
Permit only the user at 1.1.1.1 to Telnet to the device:
system-view
[Sysname] acl number 2001
[Sysname-acl-basic-2001] rule permit source 1.1.1.1 0
[Sysname-acl-basic-2001] quit
[Sysname] telnet server acl 2001
Summary
In this chapter, you learned about Multi-CE (MCE). MCE enables a switch to function as a Customer Edge (CE) device of multiple VPN instances in a BGP/MPLS VPN network, thus reducing network equipment investment. You learned about MCE features and supported platforms. MCE use cases such as multi-tenant datacenters, overlapping IP subnets, isolated management networks, and others were discussed.
You learned the basic configuration steps for configuring MCE, including:
i. Define a new VPN instance.
ii. Set the Route-Distinguisher.
iii. Bind an L3 interface to the VPN instance.
iv. Configure the L3 interface IP address.
v. Optionally, configure L3 dynamic or static routing.
Advanced MCE configuration options were also discussed:
■ Routing table limits
■ Route leaking (both static and dynamic)
■ Management access VPN instances
Learning Check
Answer each of the questions below.

1. Is MBGP required to implement MCE on CE devices?
a. Yes, as route leaking requires MBGP.
b. Yes, otherwise routes are not advertised to PE devices.
c. No, MCE does not require MBGP except when routing for directly connected subnets in different VPN instances.
d. No, MCE only uses static routing to route between subnets in different VPN instances.

2. Which components are part of a VPN instance (Choose four)?
a. Separate LFIB
b. Global routing table
c. VPNv4 routes
d. Public routing table
e. Separate routing table
f. Interfaces bound to the VPN instance
g. RD

3. Which interface type cannot be allocated to a VPN instance?
a. Layer 3 VLAN interfaces
b. Routed ports
c. Loopback interfaces
d. Layer 2 VLAN interfaces
e. Routed subinterfaces

4. An administrator has configured IMC in a VPN instance with the name "management". IMC is not receiving SNMP trap messages. How does the administrator resolve this?
a. Move the IMC server out of the VPN instance as this is an unsupported setup.
b. Ensure that the "Automatically register to receive SNMP traps from supported devices" option is checked within IMC.
c. Configure and select the VPN instance within the IMC GUI interface.
d. Manually configure the SNMP target host for traps on the network device.
Learning Check Answers 1. c 2. a, e, f, g 3. d
4. d
4 DCB Datacenter Bridging
EXAM OBJECTIVES In this chapter, you learn to: ✓ Describe the DCB Protocols. ✓ Understand the DCBX protocol. ✓ Understand and configure PFC. ✓ Understand and configure ETS. ✓ Understand and configure APP. ✓ Understand Congestion Notification. ✓ Describe datacenter use cases for CEE.
INTRODUCTION Using separate, single-purpose networks for data and storage can increase complexity and cost, as compared to a converged network solution. Datacenter Bridging (DCB) is a technology that enables the consolidation of IP-based LAN traffic and block-based storage traffic onto a single converged Ethernet network. This can help to eliminate the need to build separate infrastructures for LAN systems that carry typical end-user data traffic, and SAN systems that carry storage-specific communications. You will learn about the individual standards-based protocols that enable DCB, and how they enable communication between devices, provide for lossless Ethernet transmissions, handle flow control, and support Quality of Service (QoS).
ASSUMED KNOWLEDGE You should have a basic understanding of Data Center Bridging (DCB) protocols and configuration parameters and be familiar with the features of Fibre Channel Protocol (FCP), InfiniBand (IB), and iSCSI.
DCB Topics
This chapter will introduce the concepts related to DCB and review DCB configuration parameters. Priority-based Flow Control (PFC) will be explored, along with PFC configuration. The operation and configuration of the Application TLV (APP) will be discussed, along with an overview of ETS. You will also learn how to configure ETS.
Datacenter Bridging—Introduction Data Center Bridging consists of a collection of standards that extend the functionality of Ethernet. Various vendors have used different acronyms, such as CEE, when discussing or promoting their own DCB-based solutions. However, the IEEE standards group uses the term DCB to describe the suite of technologies that enable FCoE to send Fibre Channel communications over Ethernet systems. The motivation for DCB is to reduce the cost and complexity of running separate, single-purpose networks for SANs and LANs. The consolidation of data center infrastructure reduces the number of physical components, along with the associated costs of rack space, device power, and cooling costs. DCB offers advantages over previous technologies such as Fibre Channel Protocol (FCP), InfiniBand (IB), and iSCSI, as described below and shown in Figure 4-1.
Figure 4-1: DCB vs Previous Technologies
DCB vs Previous Technologies
Fibre Channel Protocol (FCP) is a lightweight mapping of SCSI to the Fibre Channel transport protocol. Fibre Channel can carry FCP and IP traffic to create a converged network. However, the cost of FC prevented widespread use, except for large data center SANs.
InfiniBand (IB) provides for a converged network using SCSI Remote Direct Memory Access Protocol (SRP) or iSCSI Extensions for RDMA (iSER). Widespread deployment was also limited due to cost, and the complex gateways and routers needed to translate from IB to native FC storage devices.
Internet SCSI (iSCSI) provides a direct SCSI to TCP/IP mapping layer. Due to its lower cost, iSCSI can appeal to small-to-medium sized deployments. However, scaling the systems requires more complexity and cost in the form of iSCSI to FC gateways, and so this solution is often avoided by larger enterprises.
FC over IP (FCIP) and FC Protocol (FCP) can map FC characteristics to LANs and WANs. Again, these protocols were not widely adopted due to complexity, lack of scalability, and cost. Now that 10GbE is becoming more widespread, Fibre Channel over Ethernet (FCoE) is the next attempt to converge block storage protocols onto Ethernet. FCoE embeds FC frames within Ethernet frames, and relies on the Ethernet infrastructure that has been enhanced by implementing IEEE Data Center Bridging (DCB) standards. The individual protocols and components that enable FCoE traffic to be supported over Ethernet are described below.
DCB Components The standards-based protocols and components of DCB are shown in Figure 4-2 and introduced below.
Figure 4-2: DCB Components
■ DCBX: The Data Center Bridging eXchange protocol is used to communicate key parameters between DCB-capable devices. The information exchanged is largely centered on PFC, APP, and ETS functionality.
■ PFC: Priority-based Flow Control helps to ensure that Ethernet can provide the lossless frame delivery that FCoE requires.
■ APP: Provides instructions to a CNA about application-to-CoS mapping.
■ ETS: Enhanced Transmission Selection enables control over how much bandwidth LAN, SAN, and other traffic types can use over a converged Ethernet link.
■ CNA: A Converged Network Adapter can support both Fibre Channel and traditional LAN communications on a single interface.
■ CN: Congestion Notification supports end-to-end flow control in an attempt to localize the effects of congestion to the device that is causing it.
DCB Feature Overview
DCBX enables devices to discover peers, detect configuration parameters, and configure peer CNAs. It is an extension to LLDP, adding new type-length-values (TLVs) that enable the exchange of PFC, APP, and ETS information. In Figure 4-3, the server access switch sends DCBX frames to automatically configure the server's CNA. The APP TLV is used to inform the CNA that FCoE frames are to be marked with an 802.1p value of 3.
Figure 4-3: DCB Feature Overview
ETS controls bandwidth utilization. In this example, the FCoE traffic (802.1p = 3) shall be mapped to ETS queue 1, and have 60% of the bandwidth reserved (during times of congestion). All other traffic (802.1p = 0-2, 4-7) shall be mapped to ETS queue 0, and have access to 40% of the bandwidth. PFC is an enhancement to the Ethernet Pause feature, which uses Pause frames to pause all traffic on a link. It is as if PFC is logically dividing a single physical link into multiple virtual links, reserving one such link for FCoE. Thus, the pause mechanism can stop all traffic other than that specified as no-drop, ensuring that FCoE frames are not dropped due to a short-lived burst of LAN traffic. As shown in Figure 4-3, this PFC information is also passed (via DCBX) between the server access switch and the storage switch. This ensures that the lossless frame requirement for FCoE is enforced all the way between the Server CNA and target SAN.
DCB—Supported Products
The features introduced thus far are available on all HP datacenter switches running Comware 7, including both fixed configuration access switches and chassis-based core switches.
Access switches
At the access layer, the 5900 Switch Series supports DCB-compliant features, as do the HP 5920 and 5930 switches.
Core switches Chassis-based switches suitable for deployment at the datacenter core also support DCB. This includes the 11900/12500/12900 Switch Series.
Full HP Supported configuration limited to select products HP has gone to great lengths to fully ensure that various product combinations support the features you need. HP has created the Single Point of Connectivity Knowledge (SPOCK) as the primary portal for detailed information about HP storage products. It is highly recommended that you consult SPOCK to ensure that you are deploying systems that have been fully tested by HP. The current URL for SPOCK is http://h20272.www2.hp.com/.
Design Considerations
Migration from traditional storage to FCoE-based systems can be gradual. Deploy FCoE first at the server-to-network edge. Then migrate further into aggregation/core layers and storage devices over time. Transitioning the server-to-network edge first to accommodate FCoE/DCB will maintain the existing network architecture, management roles, and the existing SAN and LAN topologies. This approach offers the greatest benefit and simplification without disrupting the data center architecture.
You should also consider implementing FCoE only with those servers requiring access to FC SAN targets. Most data center assets only need a LAN connection, as opposed to both LAN and SAN connections. You should use CNAs only with the servers that actually benefit from them. Don't needlessly change the entire infrastructure.
ProLiant c-Class BladeSystem G7 and later blade servers come with HP FlexFabric adapters (HP CNAs) as the standard LAN-on-Motherboard (LOM) devices. This provides a very cost-effective adoption of FCoE technology. FlexFabric modules eliminate up to 95% of network sprawl at the server edge. One device converges traffic inside enclosures and directly connects to LANs and SANs.
During design and implementation, remember that the standard Ethernet maximum frame size is 1518 bytes, while the maximum frame size for FCoE is 2240 bytes. This so-called baby jumbo frame size must be supported on all devices between FCoE-capable servers and storage systems. Also, FCoE uses specific MAC addresses, as listed in Figure 4-4. You must ensure that these MACs are not blocked.
Figure 4-4: Design Considerations
DCBX—Data Center Bridging eXchange
DCBX is an extension to LLDP that facilitates connectivity between DCB-enabled devices. As defined in the IEEE 802.1Qaz standard, DCBX accomplishes this by adding TLVs to LLDP. This can be compared to LLDP-MED, which defines extensions to LLDP to facilitate connectivity between Voice-over-IP (VoIP) endpoints and switches. Version 1.00 was the initial public version of DCBX. The main goal for version 1.00 was to enable automatic, priority-based flow control mapping. This included limited support for L2 marking, allowing the switch to inform the CNA that FCoE frames should be placed into a specific queue. Version 1.01 enhances this marking capability. In addition to L2-based classification, L4-based classification is also supported. Thus, TCP and UDP ports can be used as a basis for classification. This is critical for iSCSI, which is a TCP-based storage protocol.
After v1.01, more refinements were made, and the IEEE 802.1Qaz standard was ratified. A key enhancement concerns the CNA's operational settings. In previous versions, a switch announced information toward the CNA, and the CNA only announced its original settings back to the switch. This could make troubleshooting difficult, since there is no definitive method to ensure that the settings were actually accepted by the CNA. With the 802.1Qaz standard version, the LLDP output reveals both the recommended settings, as announced by the switch, and the operational settings actually in use on the CNA. IEEE 802.1Qaz is the default version enabled in HP datacenter switches. While you can manually configure the version in use, it is a best practice to allow the switch to automatically detect which version to use. The switch will detect whether an attached CNA only supports v1.00 or v1.01, and will adjust accordingly.
Configuration Steps for DCBX
Following are the steps to configure DCBX on HP Datacenter switches running Comware 7:
1. Enable global LLDP
2. Enable the DCBX TLVs for LLDP on the interface
3. Verify
These steps are detailed below.
DCBX Step 1: Enable Global LLDP
The first step in enabling DCBX is to ensure that LLDP is enabled globally on the device. On many Comware-based devices, LLDP is globally enabled by default, but some Comware-based devices require the command shown. The easiest approach is simply to issue the command, as shown in Figure 4-5, and verify that the feature is enabled with the “display lldp status” command.
Figure 4-5: DCBX Step 1: Enable Global LLDP
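Since the figures are not reproduced in this eBook text, the command sequence Figure 4-5 most likely shows can be sketched as follows (assuming Comware 7 syntax; exact keywords can vary by platform and software release):

```
<Switch> system-view
[Switch] lldp global enable       # enable LLDP globally (some platforms enable it by default)
[Switch] display lldp status      # confirm the global LLDP status shows as enabled
```

On Comware 5 platforms the global command is simply "lldp enable"; consult the command reference for your specific switch model.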
DCBX Step 2: Enable Interface LLDP DCBX TLVs
The next step involves enabling the DCBX-specific TLVs on the interface to which a CNA is attached. Assuming LLDP has been enabled globally, LLDP is enabled by default at the interface level. However, the TLVs for DCBX must be manually enabled, as shown in Figure 4-6.
Figure 4-6: DCBX Step 2: Enable Interface LLDP DCBX TLVs
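A sketch of the interface-level configuration that Figure 4-6 likely illustrates (Comware 7 syntax assumed; the interface number is only an example):

```
[Switch] interface Ten-GigabitEthernet 1/0/49
[Switch-Ten-GigabitEthernet1/0/49] lldp tlv-enable dot1-tlv dcbx   # advertise the DCBX TLVs on this CNA-facing port
```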
DCBX Step 3: Verify
Figure 4-7 shows the syntax to verify that LLDP has been configured on the Comware switch to support the DCBX TLVs. Notice that the DEFAULT column indicates that DCBX TLVs are not enabled by default, but that YES appears in the STATUS column for the DCBX TLVs.
Figure 4-7: DCBX Step 3: Verify
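The verification command in Figure 4-7 is most likely the following (a sketch, assuming Comware 7 syntax):

```
[Switch] display lldp tlv-config interface Ten-GigabitEthernet 1/0/49
# Look for the DCBX TLV row: DEFAULT should read NO, STATUS should read YES
```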
Ethernet Flow Control
Priority-based Flow Control (PFC) is defined in the IEEE 802.1Qbb standard. 802.1Qbb allows the network to provide link-level flow control for different classes of traffic. The goal is to provide lossless Ethernet, which is a strict requirement for FCoE. This is because Fibre Channel assumes that frames will not be dropped under normal circumstances.
Ethernet, however, assumes that frames can be dropped, because higher-level protocols, such as TCP, deal with this issue. As shown in Figure 4-8, Ethernet does include a “pause” feature, which can be used for flow control. As switch buffers begin to fill and frame drops become imminent, the switch can send a pause message on the link. This causes the upstream device to stop sending frames.
Figure 4-8: Ethernet Flow Control
The problem with this mechanism is that there is no way to pause only certain types of traffic. Either all frames are paused, or none are paused. This is actually fine for FCoE traffic. It is best to wait until the switch indicates that it can once again accept frames, instead of risking dropped frames, which is unacceptable. However, during this time, all TCP/IP traffic on a converged network would also be paused. TCP/IP protocol stacks are designed to handle packet loss at upper layers, such as TCP, and for some application-layer protocols that use UDP. Therefore, pausing TCP/IP frames creates unnecessary delays in transmission, reducing overall performance for LAN traffic. The solution is to have a priority-based flow control mechanism, as described below.
PFC—Enhancing Ethernet Flow Control
Priority-based Flow Control, as the name implies, enhances the functionality of the original Ethernet flow control mechanism. Specifically, 802.1Qbb pauses only frames that are tagged with a certain 802.1p CoS value, such as FCoE traffic. Meanwhile, LAN traffic, marked with different CoS values, is unaffected. Beyond the lossless requirement of FCoE, consider the typical network shown in Figure 4-9. It is possible that the downstream storage network may experience high utilization, causing buffers to fill up. The PFC mechanism issues a PAUSE frame for storage traffic, and so the CNA stops transmitting.
Figure 4-9: PFC – Enhancing Ethernet Flow Control
Since the downstream data network is not experiencing congestion, there is no reason to pause this traffic. As mentioned, TCP/IP stacks can handle frame drops at upper layers, and in this scenario there is little danger of drops in the data network anyway. In this manner, PFC provides a lossless Ethernet medium for FCoE traffic without negatively affecting LAN traffic.
PFC—Configuration Modes
There are two available configuration modes for PFC. Manual mode is used for switch-to-switch links, since switches do not use DCBX to configure each other. Because no negotiation frames are sent between switches, each switch must be locally configured with compatible PFC parameters. Automatic mode is used for links connecting switches to endpoint devices, such as servers and storage systems. For these links, only the switch needs to be configured for PFC. This is because switches use DCBX to exchange configuration information with the endpoint CNA, which can adopt the switch's proposed configuration.
Configuration Steps for PFC Manual Mode
Figure 4-10 introduces the steps required to configure PFC for switch-to-switch links. This includes enabling PFC on the interface in manual mode, and specifying the 802.1p value to be used for lossless traffic.
Figure 4-10: Configuration Steps for PFC Manual Mode
PFC Manual Step 1: Enable Interface PFC Mode
Step 1, as shown in Figure 4-11, involves enabling PFC on the interfaces that connect to other switches. This configuration would of course be repeated at the switch on the other side of the link.
Figure 4-11: PFC Manual Step 1: Enable Interface PFC Mode
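A sketch of the likely contents of Figure 4-11 (Comware 7 syntax assumed; the inter-switch interface number is an example):

```
[Switch] interface Ten-GigabitEthernet 1/0/1
[Switch-Ten-GigabitEthernet1/0/1] priority-flow-control enable   # PFC in manual mode on a switch-to-switch link
```

The same configuration would be applied on the peer switch's interface.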
PFC Manual Step 2: Enable Lossless for Dot1p
The second step is also configured at the interface, informing the switch which 802.1p value is to be used for lossless Ethernet. In the example in Figure 4-12, an 802.1p CoS value of 3 is specified.
Figure 4-12: PFC Manual Step 2: Enable Lossless for Dot1p
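The interface command in Figure 4-12 is most likely the following (a Comware 7 sketch):

```
[Switch-Ten-GigabitEthernet1/0/1] priority-flow-control no-drop dot1p 3   # lossless service for frames marked 802.1p 3
```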
Although the 802.1Qbb standard supports multiple lossless values, most hardware only supports a single lossless queue. For this reason, only one 802.1p value may be specified for lossless, no-drop service. Since FCoE-based traffic is the reason for using PFC, this does not create a practical limitation.
Configuration Steps for PFC Auto Mode
Before configuring PFC in auto mode, DCBX must first be enabled on the interface, as discussed in the previous section of this chapter. The steps to configure PFC for switch-to-endpoint links include enabling PFC on the
interface in auto mode, specifying the 802.1p value to be used for lossless traffic, and then verifying that the configuration has been successful.
PFC Auto Step 1: Enable Interface PFC Mode
The configuration for PFC in auto mode is similar to that for configuring manual mode. The difference is in the use of the keyword “auto”, as shown in Figure 4-13.
Figure 4-13: PFC Auto Step 1: Enable Interface PFC Mode
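A sketch of what Figure 4-13 likely shows for a CNA-facing interface (Comware 7 syntax assumed):

```
[Switch] interface Ten-GigabitEthernet 1/0/49
[Switch-Ten-GigabitEthernet1/0/49] priority-flow-control auto   # negotiate PFC with the endpoint CNA via DCBX
```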
PFC Auto Step 2: Enable Lossless for Dot1p
The second step for PFC auto mode is identical to that for PFC manual mode. The 802.1p value to be used for lossless Ethernet is specified at the interface configuration level. In the example in Figure 4-14, an 802.1p CoS value of 3 is specified.
Figure 4-14: PFC Auto Step 2: Enable Lossless for Dot1p
PFC Auto Step 3: Verify
The final step involves validating the configuration. LLDP local information is displayed for the specific interface configured, to verify which 802.1p value is enabled for lossless Ethernet. In Figure 4-15, much of the initial output is not displayed, in order to focus on the pertinent DCBX PFC information. In bold, you can see that PFC lossless Ethernet is enabled for the 802.1p value of 3, as indicated by a 1. It is off for the other 802.1p values, as indicated by a 0.
Figure 4-15: PFC Auto Step 3: Verify
In the second example in Figure 4-15, LLDP neighbor information is displayed for the specific interface configured. Notice that the “verbose” keyword is necessary to see detailed information, such as which PFC values have been accepted and are currently configured on the endpoint CNA.
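The two verification commands referenced here can be sketched as follows (Comware 7 syntax assumed):

```
[Switch] display lldp local-information interface Ten-GigabitEthernet 1/0/49
# shows the PFC settings the switch is announcing (recommended settings)

[Switch] display lldp neighbor-information interface Ten-GigabitEthernet 1/0/49 verbose
# the "verbose" keyword reveals the operational PFC values accepted by the CNA
```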
APP—Application TLV
APP is defined by the DCBX standard, and allows the switch to program application-layer QoS rules on the CNA. For a typical switch configuration, the administrator must define access rules to be used as match conditions for a QoS policy. The policy is then applied to a specific interface, so that when data exits the interface, it is checked against the classifier and some action is taken. For example, the action could be to elevate or decrease the traffic's priority, place it in a different queue, or drop the packet. This operation happens in an ASIC on the switch. The CNA has a similar ASIC, and so also has the capability of performing traffic selection and queuing operations. However, some server administrators are not fluent with these types of rules. The APP TLV allows the network administrator to configure the switch, which then proposes the QoS policy to the CNA. In this way, the CNA dynamically learns QoS rules, and uses them when it transmits frames to the network. To implement this functionality, traditional QoS mechanisms must be defined, and then applied to the APP TLV feature. This process starts by defining a classifier. With APP, traffic can be classified based on the Layer 2 Ethertype field, or on the Layer 4
TCP or UDP destination port number. For example, FCoE uses Ethertype 0x8906, and all iSCSI traffic uses TCP destination port 3260. Traffic is classified by using advanced ACLs, which can permit or deny traffic based on several different criteria, including protocol, source and destination port, source and destination IP address, and more. However, when ACLs are used to select traffic for APP, we lack the wealth of rules available to a traditional switch. The hardware on both the switch and the CNA can understand this advanced functionality, but the APP TLV is a fairly simple, lightweight system that is limited in its capability to deliver information. The APP TLV only accommodates the exchange of a Layer 2 protocol Ethertype, or a Layer 4 TCP/UDP destination port number. This means that when you configure an ACL in order to classify traffic for APP, all fields are ignored except the Ethertype and the destination port. To configure QoS on an HP switch running Comware, an ACL is defined, and then bound to a traffic classifier. The classifier is an object that describes the traffic that is to receive certain behaviors, or be treated in a specific way for queuing services. When configuring QoS specifically for DCBX, only two types of ACLs may be used: an Ethernet ACL or an advanced ACL. Classifiers can have multiple conditions, and Boolean logic can be used to control these match criteria. Should multiple criteria be specified as match conditions for a classifier, it uses a logical AND operator by default. Therefore, all criteria would have to match in order for the traffic to be considered in the class. For the APP TLV, a Boolean OR operator must always be used. For example, you may create an Ethernet ACL to specify FCoE traffic, and an advanced ACL to specify iSCSI traffic, and then apply both of these ACLs as match conditions in a classifier object. If you used a logical AND in this case, the condition would never be met, since packets cannot be both iSCSI AND FCoE.
In that case, the classifier will be ignored. You define a classifier to select appropriate traffic, and then you define a behavior to specify how that traffic is processed. A behavior defines what actions are to be taken when the condition is matched. In this example, we want to ensure that a certain class is marked with an 802.1p CoS value. This remarked value is sent to the server CNA via the APP TLV, which the CNA accepts and conforms to. A QoS policy consists of a set of classifiers, which are bound to certain behaviors.
QoS policy classifiers can be defined for traditional usage, to locally modify the switch's own QoS mechanisms. Classifiers may also be defined for the APP TLV, to modify the QoS mechanism on an attached server's CNA. You must inform the switch of this by using the “mode dcbx” syntax. Only the rules that include this keyword will be sent to the CNA. This configuration model is quite flexible. You can decide which rules are locally significant, to modify traffic classification and behavior for the switch itself, and which rules are to be sent toward the CNA to modify its behavior. Once classifiers and behaviors are defined in a policy, that policy must be applied in order to take effect. This QoS policy must be applied in the outbound direction, either to an interface, or at the global configuration level.
Configuration Steps for APP
Figure 4-16 shows the steps to configure the APP TLV feature. DCBX configuration is a prerequisite to configuring the APP TLV feature. Then ACLs are configured and applied to a classifier object. Behavior objects are then defined, and the two are tied together into a QoS policy. Finally, the QoS policy is activated, and the configuration is verified.
Figure 4-16: Configuration Steps for APP
APP Step 1: Configure Traffic ACLs for Layer 2
The first step is to configure ACLs for Layer 2 traffic classes. Recall that Ethernet ACLs are used to describe FCoE frames, with Ethertype 0x8906. Figure 4-17 shows an example ACL, configured to specify Ethertype 0x8906 for FCoE. The number 4000 is used in this example, since Ethernet ACLs are numbered 4000-4999. Optionally, you can also create named ACLs.
Figure 4-17: APP Step 1: Configure Traffic ACLs for Layer 2
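A sketch of the Layer 2 ACL that Figure 4-17 likely shows (Comware syntax assumed; on some Comware 7 releases the view is entered with "acl mac 4000" instead):

```
[Switch] acl number 4000
[Switch-acl-ethernetframe-4000] description Match FCoE frames for DCBX APP
[Switch-acl-ethernetframe-4000] rule permit type 8906 ffff   # exact match on Ethertype 0x8906 using an all-ones mask
```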
Note that you must specify an exact match on this Ethertype by using an “all ones” mask of 0xFFFF. This is not an inverse or wildcard mask, so 0xFFFF specifies that the entire specified pattern of 8906 must match exactly. The example also shows how to add comments to an ACL for documentation purposes.
Remember that the ACL simply provides a description of the traffic for classification purposes, not for security purposes. This helps to explain why the use of permit or deny has no effect in this use case. Whether you use permit or deny, the traffic indicated will be considered a match.
APP Step 2: Configure Traffic ACLs for Layer 4
As in the previous step, this step involves defining an ACL to be used for traffic classification. Instead of an Ethertype ACL for Layer 2 traffic, an advanced ACL is used, typically to select iSCSI traffic. As before, the permit or deny keyword is not relevant. Also, for the DCBX APP TLV feature, only the destination port is analyzed. The source port, source IP address, and destination IP address fields are ignored. Figure 4-18 shows a typical example, used to specify iSCSI traffic at TCP port 3260. The ACL number used is 3000, since advanced ACLs are in the range 3000-3999.
Figure 4-18: APP Step 2: Configure Traffic ACLs for Layer 4
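The advanced ACL in Figure 4-18 is most likely similar to the following sketch (Comware syntax assumed):

```
[Switch] acl number 3000
[Switch-acl-adv-3000] rule permit tcp destination-port eq 3260   # iSCSI; only the destination port is used by the APP TLV
```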
APP Step 3: Configure QOS Traffic Classifier
Once an ACL is created, it must be bound to a traffic classifier. QoS traffic classifiers can group one or more ACLs. You must be careful to use the OR operator, since the default operator is AND. Because it is impossible for a packet to be both FCoE and iSCSI, no traffic would otherwise match your classifier. The top example in Figure 4-19 reveals how to create a single classifier with two criteria. You might do this if you wanted to specify a lossless Ethernet service for both FCoE and iSCSI traffic.
Figure 4-19: APP Step 3: Configure QOS Traffic Classifier
If you are only interested in specifying FCoE traffic, the bottom example in Figure 4-19 could be used. In this example, there is only one match criterion. However, the OR operator must still be used. Even with a single match criterion, the DCBX module will ignore the classifier unless the OR operator is used.
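A sketch of the classifier configuration that Figure 4-19 likely shows (Comware syntax assumed; the classifier name STORAGE is only an example):

```
[Switch] traffic classifier STORAGE operator or   # OR is mandatory for DCBX, even with one criterion
[Switch-classifier-STORAGE] if-match acl 4000     # FCoE (Ethernet ACL)
[Switch-classifier-STORAGE] if-match acl 3000     # iSCSI (advanced ACL)
```

For an FCoE-only classifier, only the "if-match acl 4000" line would be present, but "operator or" is still required.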
APP Step 4: Configure QOS Traffic Behavior
Now that classifiers are configured, traffic behaviors are defined. A behavior describes the action to be taken on a class. The DCBX module will only parse the dot1p behavior. While the CNA ASIC may be capable of more advanced behaviors, the APP TLV is limited to communicating this single behavior. You may recall from the previous section that with PFC, traffic marked with a specific dot1p value receives no-drop, or lossless, Ethernet service. The purpose of configuring the APP feature is to ensure that appropriate traffic is actually marked with this 802.1p value, so PFC can do its job. Since we configured PFC to provide lossless Ethernet for anything marked with an 802.1p value of 3, we configure APP to mark appropriate packets with that value. Figure 4-20 shows an example behavior for storage.
Figure 4-20: APP Step 4: Configure QOS Traffic Behavior
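The behavior in Figure 4-20 can be sketched as follows (Comware syntax assumed; the behavior name matches the example classifier):

```
[Switch] traffic behavior STORAGE
[Switch-behavior-STORAGE] remark dot1p 3   # the only behavior the DCBX APP TLV can convey to the CNA
```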
APP Step 5: Configure QOS Policy
In this step, shown in Figure 4-21, the classifier and behavior are bound together in a QoS policy. QoS policies are processed much like many ACLs are processed, in a top-down fashion. This means that the order of rules is critical.
Figure 4-21: APP Step 5: Configure QOS Policy
There are no rule numbers that can be used to change the order, should you accidentally enter rules in the wrong order. You must remove the policy rules and reapply them in the proper order. The critical element here is the integration of the QoS policy with DCBX. Remember, only QoS policy rules that include the “mode dcbx” option will be handled by DCBX, and communicated from the switch to the CNA.
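A sketch of the policy configuration Figure 4-21 likely shows (Comware syntax assumed; names are examples):

```
[Switch] qos policy DCBX
[Switch-qospolicy-DCBX] classifier STORAGE behavior STORAGE mode dcbx
# "mode dcbx" marks this rule for transmission to the CNA via the APP TLV;
# rules without it remain locally significant to the switch
```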
APP Step 6: Activate the QoS Policy
The final configuration step is to activate the QoS policy. This can be done at the global or interface level. In either case, the policy must be applied in the outbound direction for DCBX. In the example in Figure 4-22, interface Ten-GigabitEthernet 1/0/49 connects to a server CNA, so the policy is applied outbound on this interface.
Figure 4-22: APP Step 6: Activate the QoS Policy
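The activation step in Figure 4-22 is most likely the following (Comware syntax assumed):

```
[Switch] interface Ten-GigabitEthernet 1/0/49
[Switch-Ten-GigabitEthernet1/0/49] qos apply policy DCBX outbound   # outbound is required for DCBX
```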
APP Step 7: Verify
Figure 4-23 reveals how to validate your configuration efforts. The top example shows how to verify which settings were proposed by the switch. The output has been truncated to focus on pertinent information. You can see that frames with an Ethertype of 0x8906 are assigned to a QoS map value of 0x8. As we'll see below, this translates to a CoS value of 3.
Figure 4-23: APP Step 7: Verify
The bottom example of Figure 4-23 reveals what information the neighboring CNA announces back to the switch. This validates whether the CNA has accepted the proposed values. Recall from our earlier discussion of DCBX that devices running the pre-standard versions v1.00 and v1.01 will only announce their originally configured values, with no indication of whether or not they have accepted the proposed values. In that case, you would only be able to infer that the values have been accepted by noting the successful operation of your system. The IEEE standard version of DCBX will display the accepted, operational values, as shown in the example. Figure 4-24 shows the 802.1p-to-CoS map values used for Comware 7 devices. This explains why a value of 0x8 was displayed in the previous LLDP output. Comware switches map a CoS hex value of 0x8 to an 802.1p value of 3.
Figure 4-24: APP Step 7: Verify
APP—Other examples
Figure 4-25 shows some other pertinent examples related to APP TLV functionality. In the top example, both iSCSI and FCoE are proposed to the CNA as being marked with an 802.1p value of 3 (Comware CoS 0x8).
Figure 4-25: APP - Other examples
The second example in Figure 4-25 hints at additional capabilities. As before, FCoE Ethertype 0x8906 is assigned to CoS map 0x8 (802.1p = 3). Also, ports 8000 and 6600 are assigned to CoS map 0x4 (802.1p = 2). Port 6600 is the port used for VMware vMotion, and port 8000 is used for some other application of interest to the network administrator. Now that these applications will be marked as indicated, a QoS policy can be implemented to control this traffic. For example, an administrator can reserve 1Gbps for vMotion traffic, 2Gbps for the data application, and 4Gbps for storage.
ETS—Enhanced Transmission Selection
While the APP TLV allows us to assign specific traffic classes to specific dot1p values, it does not allow us to specify the bandwidth and queuing mechanisms used by the CNA. So it is possible that all dot1p values would be assigned to the same queue on the CNA. Marking traffic types with a specific 802.1p value will have no effect if all the 802.1p values are processed by the same queue. ETS, as defined by the 802.1Qaz standard, allows the switch to specify which 802.1p value should be processed by which queue on the CNA, and how much bandwidth should be available for each queue. Marking protocols, such as 802.1p at Layer 2 and DSCP at Layer 3, have been standardized for years. However, actual queuing and scheduling mechanisms are
vendor defined. So, while we can mark packets in a standard way, how those packets are actually processed can be unique to each vendor. It is true that most vendors base their queue service on common mechanisms, such as weighted fair queuing, weighted round-robin queuing, or strict priority queuing. Still, the specifics of these mechanisms are not standardized. There was no standard for specifying how to service packets with a specific marking. 802.1Qaz defines such a standard. It describes ASIC queue scheduling and bandwidth allocation. The standard not only describes how scheduling should be done, it also defines how the CNA can be programmed from the network switch for conformance. Thus, the switch can control how the CNA processes frames outbound, back toward the switch. The switch controls the number of queues (maximum 8) and the CoS-to-queue mapping on the CNA. So the switch can dictate which 802.1p values map to which queues. For example, the switch can indicate that packets marked with CoS value 3 shall be placed into queue number 2. The scheduling algorithm can also be controlled. Weights, which essentially translate to bandwidth, can be assigned to queues. ETS can control the number of queues used on the CNA. While the adapter may initially be configured to utilize two queues (one for data and one for storage traffic), it can be configured to leverage more queues. ETS gives us the option to assign particular dot1p values to specific queues. By default, a single dot1p class is assigned to a single queue. Most physical switches support 8 queues per interface, so each of the eight 802.1p values gets its own queue. This mapping can be customized by modifying the switch's dot1p-to-LP QoS map. There is one such map for the entire switch, so you cannot have unique mappings per interface. If this mapping is changed, it applies both to how the local switch processes frames and to how the CNA will be instructed to process frames.
The default maps each 802.1p value to its own queue. In Comware terminology, this means that each of the dot1p values is assigned to a unique local-precedence map. This local-precedence map controls how the Comware switch processes frames. If the local precedence value is 0, then the Comware switch places frames in queue 0. In the example in Figure 4-26, the default one-to-one mapping of 802.1p value to local precedence has been changed. In this configuration, only the 802.1p value of 3 is assigned to its own local-precedence value, and therefore its own queue. All other
802.1p values share queue 0. Essentially, this sets up a scenario where all the data shares a single queue, and the storage traffic gets a queue of its own.
Figure 4-26: ETS – Enhanced Transmission Selection
Several queuing options are provided by the ETS standard, as described here. Strict priority can be a good option for voice traffic. When congestion occurs, traffic in higher strict priority queues is serviced first. Lower priority queues are not serviced unless the higher priority queues are empty. The risk of this mechanism is that the strict priority queue can starve other queues. For this reason, a credit-based mechanism has been introduced. The intention is to provide strict priority queues while mitigating the risk of queue starvation by enforcing a rate limit. This is a very good mechanism to use when there is a mixture of traffic types. For example, VoIP traffic requires low delay and minimal variations in delay (jitter). The strict priority mechanism ensures that the VoIP packets, placed in the strict queue, will be serviced preferentially. However, the credit-based rate limit prevents these packets from starving other queues. This mechanism has not been implemented yet. For this reason, most implementations focus on the ETS queuing mechanism. Enhanced Transmission Selection can be seen as both a standard to exchange information and a specific scheduling mechanism. This mechanism allows each
traffic class to have its own minimum bandwidth or service level. But if a class isn't utilizing its bandwidth, that bandwidth is available to other classes. The generic nature of this ETS mechanism frees vendors to implement it on their unique hardware platforms. For Comware, this definition matches the Weighted Round Robin (WRR) scheme, and is implemented on an interface. Bandwidth percentages are calculated based on this scheme, and those percentages are sent to the CNA. It is up to the CNA to receive these values, and to configure its ASIC in such a way that it respects them.
Configuration Steps for ETS
As with PFC and the APP TLV, ETS configuration information is transported using DCBX. Therefore, the configuration of DCBX is a prerequisite to the configuration of ETS. Figure 4-27 shows the three steps involved in configuring ETS. These include configuring the CoS-to-Queue mapping, setting interface scheduling and weight parameters, and then verifying the configuration. These steps are detailed in the following sections.
Figure 4-27: Configuration Steps for ETS
ETS Step 1: QoS Map dot1p-lp
Modifying the QoS queue map is an optional step, since every switch already has a mapping by default. Figure 4-28 indicates how to modify the default configuration on a Comware switch for a two-queue configuration. You can see that an 802.1p value of 3 is mapped to queue 1, while all other 802.1p values are mapped to queue 0.
Figure 4-28: ETS Step 1: QoS Map dot1p-lp
So, based on this mapping, only two queues will be used to process all traffic.
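The two-queue mapping that Figure 4-28 likely shows can be sketched as follows (Comware 7 syntax assumed):

```
[Switch] qos map-table dot1p-lp
[Switch-maptbl-dot1p-lp] import 0 export 0   # 802.1p 0 -> local precedence (queue) 0
[Switch-maptbl-dot1p-lp] import 1 export 0
[Switch-maptbl-dot1p-lp] import 2 export 0
[Switch-maptbl-dot1p-lp] import 3 export 1   # storage traffic (802.1p 3) gets its own queue
[Switch-maptbl-dot1p-lp] import 4 export 0
[Switch-maptbl-dot1p-lp] import 5 export 0
[Switch-maptbl-dot1p-lp] import 6 export 0
[Switch-maptbl-dot1p-lp] import 7 export 0
```

Recall that this map is switch-wide; it cannot be configured per interface.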
ETS Step 2: Interface Scheduling and Weights
In the scenario in Figure 4-29, queues 0 and 1 are used to process all traffic. Based on this, the ETS application will look at how those queues are actually configured on the physical interface. This interface configuration will be translated to ETS values, to be proposed to the server's CNA. With this in mind, the first step is to configure the type of queuing. Since queue starvation is not desirable, WRR is to be configured.
Figure 4-29: ETS Step 2: Interface scheduling and weights
For WRR, there are two types of weights that can be assigned: byte-count and weight value. For ETS applications, specifying weights using byte-count is best, since it is a more accurate way to specify bandwidth utilization. If a weight value is used instead, packets are counted. Since packets are of variable length, you have less granular control over bandwidth. Ten 500-byte packets represent a much different bandwidth utilization than ten 1500-byte packets. In the example in Figure 4-29, queue 0 is assigned a byte-count of 4, and queue 1 is assigned a byte-count of 6. If weight had been specified, six packets of 200 bytes
each means that 1200 bytes would be transmitted, while 4 packets at 1500 bytes results in 6000 bytes transmitted. So the weight values may not accurately reflect bandwidth. To have more accurate control, therefore, you should use byte-count. On the physical interface in this example, we are using byte-count weight values of four and six, respectively. The configurable range is between 1 and 15. Since 4+6 = 10, queue 0 gets 4 out of every 10 bytes (40%), while queue 1 gets 6 out of 10 (60%). This is a simple calculation when there are only two queues. The configured values in the example will be sent to the CNA. It is up to the CNA to program its ASIC to actually support these values.
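The interface configuration in Figure 4-29 is most likely similar to the following sketch (Comware 7 syntax assumed; keywords vary slightly between switch families):

```
[Switch] interface Ten-GigabitEthernet 1/0/49
[Switch-Ten-GigabitEthernet1/0/49] qos wrr byte-count              # WRR scheduling based on bytes, not packets
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 0 group 1 byte-count 4  # queue 0: 4 of 10 bytes (40%)
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 1 group 1 byte-count 6  # queue 1: 6 of 10 bytes (60%)
```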
ETS Step 2 Continued: A Weight Problem
The caveat for the example in Figure 4-30 is that the switch will actually calculate the percentages that it announces using all of the queues assigned to WRR. Since all the queues are enabled for WRR, and they all have a default weight, we don't see the expected percentages in the LLDP output. We see that each queue gets 11%, except for queue 3, which gets 17%. This is actually fairly close to our intended targets.
Figure 4-30: ETS Step 2 Continued: A Weight Problem
As shown in Figure 4-30, 11+17 = 28, 11/28 = 39%, and 17/28 = 61%. However, when we look at the output, it is not intuitive or obvious that we have achieved our goal. Further, we intend to use 2 queues, but we see that all eight queues are in play. To rectify this issue, we must ensure that only our intended queues are in use.
ETS Step 2 Continued: Assign Queues to SP
To ensure that only queues 0 and 1 are used, we can configure the other queues to use the Strict Priority (SP) queuing mechanism. It is then vital that these other queues are never actually used for anything. Otherwise, they could starve out the other two queues on the interface. This is enforced with the local 802.1p-to-local-precedence map. If this map doesn't assign any traffic to the other queues, they will remain idle. The example in Figure 4-31 indicates how to assign these idle queues to SP.
Figure 4-31: ETS Step 2 Continued: Assign Queues to SP
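The configuration Figure 4-31 likely shows moves the unused queues out of the WRR group (a Comware 7 sketch):

```
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 2 group sp   # remove queue 2 from WRR by assigning it to SP
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 3 group sp
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 4 group sp
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 5 group sp
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 6 group sp
[Switch-Ten-GigabitEthernet1/0/49] qos wrr 7 group sp
```

Because the dot1p-to-lp map sends no traffic to queues 2-7, these SP queues stay idle and cannot starve queues 0 and 1.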
ETS Step 3: Verify Local Configuration Now that the non-essential queues are assigned to use SP, only queues 0 and 1 are used for WRR. The LLDP output in Figure 4-32 shows that queue 0 receives 40% of the bandwidth, and queue 1 receives 60% of the available bandwidth.
Figure 4-32: ETS Step 3: Verify Local Configuration
Summary In this chapter, you learned about the simplicity, cost, and feature benefits of Data Center Bridging. You also learned how DCB compares to previous attempts at converging data and storage networks. You learned about the specific protocols that define DCB, starting with DCBX. This is the communication protocol used between converged network switches and storage server CNA adapters. You learned that PFC helps to ensure that a lossless Ethernet service is provided for FCoE traffic. It does this by enhancing the standard Ethernet Pause mechanism, enabling it to pause only frames marked with a specific 802.1p value. The APP TLV ensures that both the switch and its attached server CNA are properly marking frames. This ensures that both PFC and ETS can function properly. While APP is responsible for marking frames, ETS controls how to treat frames marked with a specific value. ETS standardizes how frames are queued for transmission, how much bandwidth each queue receives, and the queuing mechanism used for each queue. Lastly, the CN protocol was discussed. You learned how CN differs from PFC in that it is an end-to-end protocol, as opposed to a link-local protocol.
Learning Check
Answer each of the questions below.
1. Which of the following are DCB protocol components? (Choose four)
a. DCBX: the Data Center Bridging eXchange protocol
b. PFC: Priority-based Flow Control
c. ETS: Enhanced Transmission Selection
d. EVI: Ethernet Virtual Interconnect
e. CN: Congestion Notification
2. DCBX is an extension to LLDP that facilitates connectivity between DCB-enabled devices.
a. True
b. False
3. What is the name for the feature that provides a PAUSE mechanism per 802.1p priority value to help ensure that storage traffic receives lossless service without negatively impacting data traffic?
a. ETS
b. Congestion Notification
c. Standard Ethernet Flow Control
d. DCBX
e. PFC
4. Which of the statements below accurately describe the Application TLV, or APP? (Choose three)
a. The APP TLV allows the network administrator to configure a switch, which will then automatically propose QoS policy to the CNA.
b. To implement APP, traditional QoS mechanisms must be defined, and then applied to the APP TLV feature.
c. A special set of QoS mechanisms are provided to deploy the APP TLV feature.
d. The APP TLV only accommodates the exchange of a Layer 2 Ethertype value, or a Layer 4 TCP/UDP destination port number.
e. The APP TLV can accommodate all of the classification mechanisms supported by a typical switch.
5. ETS allows the switch to specify which 802.1p value should be processed by which queue on the CNA.
a. True
b. False
Learning Check Answers
1. a, b, c, e
2. a
3. e
4. a, b, d
5. a
5 Fibre Channel over Ethernet
EXAM OBJECTIVES
In this chapter, you learn to:
✓ Describe Fibre Channel basic operations.
✓ Understand the roles and ports in an FC Fabric.
✓ Configure a 5900CP for native FC connectivity.
✓ Configure FCoE functionality for Server access.
✓ Configure Fabric Extension.
✓ Understand and configure Storage Area Networking (SAN) Zoning.
✓ Describe and configure NPV mode.
INTRODUCTION In this chapter, you will learn about the fundamental concepts surrounding native Fibre Channel (FC) and Fibre Channel over Ethernet (FCoE) based SAN fabrics. This includes a discussion of fabric components, connectivity, and operation. Specific topics related to fabric addressing, security, reliability, and redundancy will be covered, as well as how to perform initial configuration functions. Additional concepts and configurations involve FCoE host access, fabric expansion, zoning for security, and N_Port Virtualization (NPV).
ASSUMED KNOWLEDGE You should be familiar with FCoE mechanisms from Chapter 4.
What is a SAN? A Storage Area Network (SAN), as shown in Figure 5-1, is a separate infrastructure used for storage components. It is a network designed specifically for storage access. Because of the critical nature of data storage, and the requirement for absolute fidelity, a SAN design must be very resilient and redundant.
Figure 5-1: What is a SAN?
Historically, the SAN was segregated from LAN traffic. This was largely due to the limited bandwidth of 100Mbps and 1Gbps Ethernet, along with a lack of any standardized means to converge the two networks. Most SANs leverage the Fibre Channel protocol to transmit data between servers and storage systems, with additional capabilities for long-haul links in case inter-site replication services are needed.
SAN Components The components that make up a SAN infrastructure are introduced in Figure 5-2 and below.
Figure 5-2: SAN Components
■ Switches: Create the fabric that interconnects SAN devices, in a similar way that Ethernet switches enable connectivity for LAN devices. A "switch fabric" is simply a group of switches that are interconnected to provide a scalable solution. While scalability is often desired, SANs require a very low-latency system. For this reason, as few switches as possible should be deployed to meet an organization's SAN requirements.
■ Routers, bridges, and gateways: Devices typically used to extend the SAN over long distances. SAN Fibre Channel systems use flow-control mechanisms that were designed for low-latency, short-haul networks. Specially designed routers, bridges, and/or gateways can extend the reach of SAN technology over long distances, while satisfying the requirement for quick responses to SAN signaling frames. These devices can have more advanced features, such as integrating multi-protocol systems, improving fault isolation, and more.
■ Storage devices: The disk subsystems used to actually store data, available with a wide variety of capacities and capabilities. Often, storage systems are deployed as a Redundant Array of Independent Disks (RAID), or in a "Just A Bunch of Drives" (JBOD) configuration. Various virtualization technologies can be leveraged with storage systems.
■ Servers: The devices that connect to a SAN with either a Host Bus Adapter (HBA) or Converged Network Adapter (CNA).
■ Cabling and connectors: The medium over which digital signaling is transmitted and received. As with LANs, both fibre optic and copper solutions are available.
HP Disk Storage Systems Portfolio Figure 5-3 reveals some of the many solutions that HP provides, as it relates to SAN systems. This includes systems for the SMB market, such as the StoreVirtual 400. For the midrange market, HP offers the 3PAR StoreServ 7000 and P600 EVA. Enterprise class storage solutions, such as the 3PAR StoreServ 7450, 10000, and XP P9500 are also available.
Figure 5-3: HP Disk Storage Systems Portfolio
This list of available products and features is rapidly evolving. It is recommended that you consult HP's Storage Single Point Of Connectivity Knowledge (SPOCK) for detailed information about solutions, compatibility, and capability. The current URL for HP Storage SPOCK is http://h20272.www2.hp.com/Index.aspx?lang=en&cc=us&hpappid=hppcf
Converged Networking - Cookbooks HP storage and server groups work in a very strict configuration mode with regard to storage systems. The goal is to minimize all possible risks with regard to platform interoperability, firmware upgrades, version capabilities, and more. This is why the storage and server group creates validated configurations by building a complete system of switches, storage arrays, and servers with Converged Network Adapters (CNAs) and Host Bus Adapters (HBAs), using specific firmware versions. Various combinations are fully tested and validated to ensure the smoothest possible deployment experience. As mentioned before, platforms, versions, and firmware upgrades are quickly evolving. As such, this study guide does not focus on specific product combinations for deployment. The focus is instead placed on understanding how these systems work, and how to configure them.
Host (Initiator)—(Originator) Consider a legacy, stand-alone server that is not connected to, nor using, a SAN, instead using internally installed disk systems. Communication is initiated by the server – it needs to either store or retrieve data from the disks, which passively wait to receive and respond to these requests. This aspect of data storage does not change after migration to a SAN-based solution. Modern SAN systems simply move the storage out of the server's physical enclosure, such that it is connected via some infrastructure, instead of by a local cable inside the server's enclosure. At a basic level, servers still request data to be stored or retrieved from storage as before. This is why the host is referred to as the initiator, or originator, of SAN service requests. Note There are some specialized replication conditions in which the storage system initiates communication with another storage system. Most of the time, the host server is the initiator, as described above. The storage system's independence from the servers is based on Logical Unit Numbers (LUNs). A LUN is how each logical disk or volume is identified by the SAN. Inside the storage system target, many terabytes of raw data may be available. The storage administrator can logically separate this physical storage array into unique
volumes, or LUNs. The administrator can now decide which LUNs are available to be presented to specific hosts. Figure 5-4 shows a host connected to a SAN with two Host Bus Adapters (HBAs). This provides for redundant, multi-fabric connectivity. SAN-A and SAN-B are completely isolated from each other, providing two separate paths between the host initiator and the storage target. The SAN system could select a path to use, or it might be configured to load-balance between the two paths.
Figure 5-4: Host (Initiator)—(Originator)
In this scenario, SAN-A and SAN-B are physically isolated. This means that the SANs have completely isolated control, management, and data planes. It also means that firmware updates to the SAN-A fabric will have no effect on the other fabric, SAN-B. In order to take advantage of these two separate SAN fabrics, hosts must connect to the SAN with adapters that are configured with Multi-Path I/O (MPIO) functionality. Without MPIO, the server would not recognize the two paths to the same disk system; thinking they were two unique disks, it would send different read/write commands to what is actually one system. Of course, this would create serious data corruption.
Disk Array (Target) – (Responder) In a SAN solution, a Disk Array is referred to as the target, or responder. It is the target of host read/write requests, and responds to those requests. Typically the Disk Array, as shown in Figure 5-5, will have multiple interfaces and controllers for
******ebook converter DEMO Watermarks*******
increased throughput, availability and redundancy.
Figure 5-5: Disk Array (Target) – (Responder)
Like the host, the target system may have an interface connected to SAN-A and another connected to SAN-B. The storage system will most likely have two separate internal controllers, with each one responsible for communicating with its separate SAN fabric. Disk arrays are typically protected with a controller cache memory system with battery backup. In case of a power outage, this "write-back" caching mechanism helps to ensure data integrity. Management software is typically deployed to perform replication functions, often over multiple locations, so the disk array can replicate or make backups to remote locations for disaster recovery purposes.
Nodes, Ports, and Links Specific terminology is used to refer to the nodes, ports, and links that interconnect SAN subsystems. The initiator and target interfaces to the fabric are called "N_Ports," or Node Ports. As shown in Figure 5-6, these N_Ports connect to a SAN switch's "F_Port," or Fabric Port. F_Ports are therefore at the edge of the SAN
fabric.
Figure 5-6: Nodes, Ports, and Links
If your SAN consists of a single switch, you would only have F_Ports on the switch, connected to host and target N_Ports. If you extend your SAN fabric to include multiple switches, E_Ports, or Expansion Ports, will be used to connect them. While other port types are available, they are not relevant to the discussion here. These port types will be reviewed in later chapters.
FC Frame and Addressing As previously described, hosts initiate read/write requests to storage targets over a SAN fabric. These request messages are carried inside a Fibre Channel (FC) frame. This section will discuss FC frames, header format, packet flow, FC World Wide Name (WWN), and the FC_ID.
Fibre Channel Frame The FC frame begins with a Start-of-Frame (SOF) delimiter, and concludes with an End-of-Frame (EOF) delimiter. The frame header contains addressing and other information, as discussed in the next section. There are optional headers that could be used to assist in things like encryption, for example. The data payload field contains the actual data to be transmitted. Notice that the payload field can be up to 2112 bytes, which is larger than the standard 1500-
byte maximum payload of an Ethernet frame. On a converged network, FCoE is used to carry FC frames inside Ethernet frames. For this to be successful, jumbo frames must be enabled on the Ethernet infrastructure. The FC frame also includes a CRC to validate data integrity between host and target. Figure 5-7 shows a data frame. There are also link control frames that are used to acknowledge frame receipt, and for link responses (Busy or Reject).
Figure 5-7: Fibre Channel Frame
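The jumbo-frame requirement above is simple arithmetic: a maximum-size FC frame cannot fit in a standard 1500-byte Ethernet payload. The 24-byte FC header and 2112-byte maximum data field appear in the text; the FCoE encapsulation overhead used here is an assumption chosen only to illustrate the calculation.

```python
# Back-of-the-envelope check of why FCoE needs jumbo frames.
FC_HEADER = 24              # FC frame header size (from the text)
FC_MAX_PAYLOAD = 2112       # maximum FC data field (from the text)
FCOE_OVERHEAD = 18          # assumed FCoE header + SOF/EOF + padding
STANDARD_MTU = 1500

def min_ethernet_payload():
    """Ethernet payload needed to carry a maximum-size FC frame."""
    return FC_HEADER + FC_MAX_PAYLOAD + FCOE_OVERHEAD

needed = min_ethernet_payload()
print(needed, "bytes needed;",
      "jumbo frames required" if needed > STANDARD_MTU else "fits")
```

Whatever the exact encapsulation overhead, the total comfortably exceeds 1500 bytes, which is why the Ethernet infrastructure must support jumbo frames end to end.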
Fibre Channel Frame Header Figure 5-8 shows the individual fields of the FC frame header, as described below.
Figure 5-8: Fibre Channel Frame Header
■ R_CTL: Indicates frame type (data, ACK, or Link Response) and data type.
■ D_ID: The destination identifier indicates the destination of the frame. An initiator must determine the D_ID of a target so it can originate a request. There are several methods of determining the D_ID, as will be discussed below.
■ CS_CTL: The Class Specific Control field is used for QoS.
■ S_ID: The source identifier indicates the originator of the frame. It can either be assigned by the fabric controller or administratively set.
■ TYPE: Indicates the upper-layer protocol being carried. In other words, it indicates what is carried in the Data Payload.
■ F_CTL: Frame Control indicates various options, such as sequence information.
■ SEQ_ID: The sequence identifier is assigned by the sequence initiator, and is unique within a given exchange.
■ DF_CTL: Indicates the presence and size of optional header information.
■ SEQ_CNT: The Sequence Count is a 16-bit number that is incremented on each frame in a sequence. Storage data must be fragmented into pieces for transmission and reassembled in the proper order upon arrival at the destination. SEQ_ID and SEQ_CNT facilitate this process. A file being stored may be broken up into several sequences, each with a unique SEQ_ID. That sequence is further fragmented into 2112-byte pieces to fit into an FC frame. With these numbers, the destination can determine that it has received frame n of sequence SEQ_ID.
■ OX_ID: The Originator Exchange ID is filled in by the originator. This is used to group related transmission sequences.
■ RX_ID: This value is set to 0xFFFF by the originator. Along with the OX_ID, these values constitute a kind of nickname for any given exchange.
■ Parameter: The parameter field has multiple purposes. One of the most common is to be used like the IP header's offset field to indicate a relative offset location for data, or for link control information.
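The field widths above can be made concrete by unpacking a 24-byte header. This is an illustration built from the descriptions in the text (3-byte D_ID/S_ID, 16-bit SEQ_CNT/OX_ID/RX_ID, 4-byte Parameter), not a production FC stack; the sample frame bytes are invented.

```python
import struct

def parse_fc_header(hdr: bytes):
    """Unpack the 24-byte FC frame header described above."""
    if len(hdr) != 24:
        raise ValueError("FC frame header is 24 bytes")
    r_ctl = hdr[0]
    d_id = int.from_bytes(hdr[1:4], "big")       # 3-byte destination ID
    cs_ctl = hdr[4]
    s_id = int.from_bytes(hdr[5:8], "big")       # 3-byte source ID
    ftype = hdr[8]
    f_ctl = int.from_bytes(hdr[9:12], "big")
    seq_id, df_ctl = hdr[12], hdr[13]
    seq_cnt, ox_id, rx_id = struct.unpack(">HHH", hdr[14:20])
    parameter = int.from_bytes(hdr[20:24], "big")
    return {"R_CTL": r_ctl, "D_ID": d_id, "CS_CTL": cs_ctl, "S_ID": s_id,
            "TYPE": ftype, "F_CTL": f_ctl, "SEQ_ID": seq_id,
            "DF_CTL": df_ctl, "SEQ_CNT": seq_cnt,
            "OX_ID": ox_id, "RX_ID": rx_id, "Parameter": parameter}

# Hypothetical frame: D_ID 0x010500, S_ID 0x020001, RX_ID still 0xFFFF
hdr = bytes([0x06, 0x01, 0x05, 0x00, 0x00, 0x02, 0x00, 0x01,
             0x08, 0x00, 0x00, 0x00, 0x01, 0x00,
             0x00, 0x00, 0x12, 0x34, 0xFF, 0xFF,
             0x00, 0x00, 0x00, 0x00])
fields = parse_fc_header(hdr)
print(hex(fields["D_ID"]), hex(fields["RX_ID"]))
```

Note how RX_ID comes back as 0xFFFF, the placeholder value the originator sets before the responder fills it in.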
Fibre Channel Terminology Figure 5-9 highlights the characteristics of a frame, a sequence, and an exchange.
Figure 5-9: Fibre Channel Terminology
An exchange can be compared to a high-level SCSI read or write operation. Servers need to read information from, or write information to, a disk storage system. The server sends information to the disk, and the disk responds back. It is possible that the information sent to the disk subsystem is a very small request, such as "read the 10MB file named MyDoc.pdf," for example. Of course, the response is that the entire 10MB file is then transmitted from the disk subsystem to the server, which requires several frames. The complete group of frames that belong to a single request is called an exchange. So, an exchange is a bidirectional communication which can be compared to a traditional SCSI read or write operation. It is the complete process of a server sending a read request to a disk, and the resultant frames to deliver that file to the server. The server confirms successful receipt of the requested data. This is also true for a write operation. The host sends a simple write request, the disk confirms that it is ready to perform this operation, and then the server transfers the file. An exchange consists of a number of sequences. A sequence is a communication of one or more frames in a single direction. So, a simple read request would be a single frame, and the response could be, say, 50 frames. The 50 frames sent from the disk to the server are a sequence. At the lowest level is a single frame. This is the description of what should be read from the disk, or the actual payload that has been read from the disk and is now being transmitted to the server. The frame carries Upper-Layer Protocol (ULP) data. For
storage traffic, this is the SCSI protocol, which is encapsulated in a Fibre Channel frame.
SCSI (FCP) write operation Figure 5-10 illustrates the relationship between SCSI operations and the Fibre Channel Protocol (FCP) implementation (at the upper layer protocol level). Also described is how those operations translate to Fibre Channel and how the FC layer packages them for transmission.
Figure 5-10: SCSI (FCP) write operation
For a write operation, a server initiator starts with a write command. This is a sequence that contains only a single frame. A transfer-ready response confirms that the disk is ready to fulfill this operation. Then the server transmits five frames in a row, all part of sequence number 3. When all data has been sent, the target system confirms the reception of all data by sending a response frame with the sequence field set to a value of 4. This indicates that all of sequence 3 was received successfully, and so sequence 4 is expected next. This scenario used five frames as an example. It is possible that fifty, one hundred, or more frames could be transmitted in a sequence before an acknowledgement frame is sent. The important thing is that all frames are received in order for the transmission to be successful. Otherwise, the data would be corrupted.
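The write operation just described can be sketched as a toy event model. The sequence numbering (command, transfer-ready, data, response) follows the example above; the event names are invented, and real SCSI/FCP details are simplified away.

```python
# A toy model of the write exchange described above: each sequence is one
# or more frames in a single direction, and the exchange groups them all.
def write_exchange(data_frames=5):
    events = [("initiator", 1, "FCP_CMND write")]            # single-frame sequence
    events.append(("target", 2, "XFER_RDY"))                 # disk is ready
    for n in range(data_frames):
        events.append(("initiator", 3, f"data frame {n}"))   # one multi-frame sequence
    events.append(("target", 4, "FCP_RSP good status"))      # confirms sequence 3
    return events

for who, seq, what in write_exchange():
    print(f"seq {seq} {who}: {what}")
```

Running the model shows five frames sharing sequence 3, with the target's sequence-4 response implicitly acknowledging all of them.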
This behavior explains why FC has very strict expectations for a lossless network. The protocol has no elegant mechanism to recover from frame loss. In this scheme, if a frame is lost, there is no selective retransmission; the entire sequence must be retransmitted. For example, if frame 5 of 50 was lost, the 45 frames that followed it would have to be retransmitted.
FC World Wide Name (WWN) The Fibre Channel WWN is a unique identifier for each device in the fabric. Each HBA, CNA, and switch port must have a unique WWN. This is akin to the BIA of an Ethernet adapter. However, the WWN is not used for addressing in FC, or for frame delivery. Recall that the FC_ID is used for this. The FC_ID is a 24-bit address used to define source/destination pairs inside an FC frame. The WWN is used to unambiguously identify systems independent of the FC_ID. The FC_ID is assigned dynamically by the fabric when the host connects, using a kind of "first come, first served" method. This means that when the host is rebooted, or loses connectivity to the fabric for some time, another system could come online and acquire the FC_ID that was originally assigned to the disconnected host. For this reason, it is important to use some identifier that will remain constant, and the WWN serves this purpose. The WWN proves useful at two levels. At the fabric level, the WWN can be used for zoning. Figure 5-11 shows two servers connected to a SAN fabric, along with two storage systems. Zoning can key on the assigned WWNs to control which servers can see which storage system. In this way, zoning acts as a type of access filter, limiting storage access to only appropriate, trusted hosts. As a best practice, there should typically be one initiator per zone.
Figure 5-11: FC World Wide Name (WWN)
The second use case for the WWN is known as LUN masking. This is a feature that enhances the security of the storage system itself, as opposed to the FC fabric. Figure 5-11 shows three logical disks (LUNs) defined inside the storage system. The system decides which LUN is visible to which initiator. LUN A could be visible to the server at the top of Figure 5-11, with the WWN ending in b6. Meanwhile, LUN B may be visible only to the bottom server, with a WWN ending in b7.
To summarize, zoning controls which targets are visible, while on that target, LUN masking controls which LUNs are visible to a specific WWN. Zoning can also contain Registered State Change Notification (RSCN) messages, which are generated when nodes or switches join or leave the fabric, or when a switch name is changed.
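The two visibility layers summarized above can be sketched as a filter: fabric zoning limits which targets an initiator WWN can reach, and LUN masking on the array then limits which LUNs that WWN may see. All WWN labels and LUN names below are invented for illustration.

```python
# Sketch of zoning (fabric level) plus LUN masking (array level).
def visible_luns(initiator_wwn, zones, lun_masks):
    """zones: list of sets of WWNs; lun_masks: target WWN -> {lun: allowed WWNs}."""
    luns = []
    for zone in zones:
        if initiator_wwn not in zone:
            continue                      # zoning: target not even visible
        for target in zone - {initiator_wwn}:
            for lun, allowed in lun_masks.get(target, {}).items():
                if initiator_wwn in allowed:
                    luns.append((target, lun))  # masking: LUN presented
    return sorted(luns)

# One initiator per zone, per the best practice noted above
zones = [{"...b6", "array-1"}, {"...b7", "array-1"}]
lun_masks = {"array-1": {"LUN-A": {"...b6"}, "LUN-B": {"...b7"}}}
print(visible_luns("...b6", zones, lun_masks))  # [('array-1', 'LUN-A')]
```

The server ending in b6 sees only LUN A, even though both LUNs live on the same reachable array: zoning admitted the target, and masking filtered the LUNs.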
FC WWN Structure The WWN is a 64-bit address defined by the IEEE (see Figure 5-12). A portion of this address contains vendor-specific information, with another portion used as a type of serial number to distinguish between ports from the same vendor. The WWN is used only inside the fabric to which the adapter is connected. This means that the WWN does not need to be globally unique. It must only be unique on the connected fabric.
Figure 5-12: FC WWN Structure
The IEEE has defined two formats for the WWN:
■ Original format: Addresses are assigned to manufacturers by the IEEE standards committee, and are built into the device at build time, similar to an Ethernet MAC address. The first 2 bytes are either hex 10:00 or 2x:xx (where the x's are vendor-specified), followed by the 3-byte vendor identifier and 3 bytes for a vendor-specified serial number.
■ New addressing schema: The most significant four bits are either 0x5 or 0x6, followed by a 3-byte vendor identifier and four-and-a-half bytes for a vendor-specified serial number.
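A short sketch can classify a WWN against the two formats above. The bit positions follow the descriptions just given; the sample WWN values are made up for illustration.

```python
# Classify a 64-bit WWN per the two IEEE formats described above.
def parse_wwn(wwn_hex: str):
    raw = int(wwn_hex.replace(":", ""), 16)
    if raw.bit_length() > 64:
        raise ValueError("WWN is 64 bits")
    if (raw >> 60) in (5, 6):
        # New schema: 4-bit prefix, 24-bit vendor ID, 36-bit vendor serial
        oui = (raw >> 36) & 0xFFFFFF
        serial = raw & 0xFFFFFFFFF
        return ("new", oui, serial)
    # Original format: 2-byte prefix, 3-byte vendor ID, 3-byte serial
    oui = (raw >> 24) & 0xFFFFFF
    serial = raw & 0xFFFFFF
    return ("original", oui, serial)

print(parse_wwn("50:01:43:80:01:02:03:04")[0])  # new
print(parse_wwn("10:00:00:05:1E:AA:BB:CC")[0])  # original
```

The top nibble is enough to tell the formats apart, which is why storage tools can decode either style without configuration.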
Fibre Channel ID Addressing (1 of 3)
As previously stated, the WWN is not used for actual frame delivery, since the 24-bit FC_ID serves this purpose. This FC_ID is assigned by the fabric when the node connects, via a formal registration process. Parallels can be drawn between FC_ID and IP addressing. Like IP addresses, FC_ID is an end-to-end address, and does not change as the packet traverses routed systems. It is also hierarchical. Each switch in the fabric has a unique domain ID. A switch’s domain ID serves as the first octet of the FC_ID assigned to any connected node. The switch can also assign an area ID, and a vendor specific portion of the address. Forwarding is based on this hierarchical address. As with IP, routing tables can be summarized with masks, placing /8 or /16 prefix-based entries in the routing table. Unlike IP, FC_IDs do not have an underlying Layer 2 address. With an Ethernet/IP infrastructure, the MAC address is dynamically learned as needed, and mapped to the IP address in an ARP cache. With FC, a node must go through a formal registration process called fabric login or FLOGI. This process handles host registration, and ensures that the switch fabric has correct addressing information. Figure 5-13 shows an example of the original Brocade address structure for the FC_ID. The first octet contains the Domain ID. This is a number from 1 to 239, which is the maximum number of domain IDs possible in a single FC fabric.
Figure 5-13: Fibre Channel ID Addressing (1 of 3)
The next octet contains the area, which typically represents the switch's port number. The final octet is vendor-specific. It could be 0 for the first host on an interface, and increment from there. As an example of this schema, if a switch was assigned domain ID 1, and a host connected to port 5 on that switch, then the FC_ID could be 0x010500.
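The domain/area/vendor composition above can be sketched in a few lines. This is an illustration of the bit layout only; the validation limits follow the 1-239 domain range given in the text.

```python
# Composing a 24-bit FC_ID from the Brocade-style fields described above:
# one byte each for domain, area (port), and a vendor-specific value.
def fc_id(domain, area, vendor=0):
    for name, v in (("domain", domain), ("area", area), ("vendor", vendor)):
        if not 0 <= v <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
    if not 1 <= domain <= 239:
        raise ValueError("domain IDs run from 1 to 239")
    return (domain << 16) | (area << 8) | vendor

# Domain 1, port 5, first host -> 0x010500, as in the example above
print(f"{fc_id(1, 5):06x}")  # 010500
```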
Fibre Channel ID Addressing (2 of 3) The Domain ID is a term that refers to the actual switch and all of the N_Ports that it groups together. The domain ID needn't be globally unique; it must only be unique within a fabric. For Comware devices, there is a flat FC_ID assignment schema. The area and ports are grouped together as Port IDs, as shown in Figure 5-14. Therefore,
16 bits are available for ports, which improves scalability for a Comware-based solution.
Figure 5-14: Fibre Channel ID Addressing (2 of 3)
The FC_IDs are logically assigned, and are not based on connectivity to a particular switch port. The first host that comes online, connected to a switch using domain ID 1, will be assigned an FC_ID of 0x010001. The second host to come online will be assigned an FC_ID of 0x010002, and so on, in a first come, first served fashion. This grouping based on a switch's domain ID can improve and simplify FC routing functions. A switch with a domain ID of 01 can create an FC route for 0x010000/8, grouping all FC_IDs of domain 01 in one class. Routing concepts will be explored later in this chapter.
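The first come, first served assignment above can be sketched as a simple allocator. This models the flat schema conceptually; it is not Comware code, and the class name is invented.

```python
# First-come, first-served FC_ID assignment in the flat (Comware-style)
# schema described above: the high byte is the switch's domain ID and the
# low 16 bits act as a simple Port ID counter.
class FlatFcIdAllocator:
    def __init__(self, domain_id):
        if not 1 <= domain_id <= 239:
            raise ValueError("domain IDs run from 1 to 239")
        self.domain_id = domain_id
        self.next_port_id = 1

    def assign(self):
        fcid = (self.domain_id << 16) | self.next_port_id
        self.next_port_id += 1
        return fcid

switch = FlatFcIdAllocator(domain_id=1)
print(f"{switch.assign():06x}")  # 010001 - first host to log in
print(f"{switch.assign():06x}")  # 010002 - second host
```

Because every FC_ID from this switch shares the 0x01 high byte, a single prefix route covers the whole domain, which is the routing simplification the text describes.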
Fibre Channel ID Addressing (3 of 3) In the example in Figure 5-15, there are two switches, each with a unique domain ID. Switch 02 on the left has three servers connected. The switch on the right has been assigned to domain 01, with two storage systems attached. All end ports (initiators or targets) get a unique FC_ID, assigned by the switch. So the servers are all 0x02xxxx, and the storage systems are all 0x01xxxx.
Figure 5-15: Fibre Channel ID Addressing (3 of 3) for FC Auto Mode
Fabric Domain IDs Each switch can be statically configured with a domain ID, or it can be configured to support a dynamically assigned domain ID. To avoid unpredictable results, you should configure all switches in a fabric to use the same method, see Figure 5-16.
Figure 5-16: Fabric Domain IDs
When you use static domain IDs, you must configure a unique domain ID per switch. Assigning static domain IDs is currently recommended as a best practice. With dynamically assigned domain IDs, a switch is assigned an ID by an existing switch as it joins the fabric. One switch in the fabric, called the principal switch, carries this responsibility. The principal switch assigns the next unused ID out of the 239 available numbers.
Principal Switch Election Principal switch election takes place at startup, within the first 10 seconds after connections are enabled. The election criteria are simply based on a priority value that you can configure. As shown in Figure 5-17, the switch with the highest priority
wins the election, and becomes the principal switch. If there is a tie in priority values, the switch with the lowest WWN wins the election.
Figure 5-17: Principal Switch Election
The principal switch is responsible for assigning a local domain ID to all other switches. During this process, a concept called a "desired ID" is supported. This means that all FC switches can request a preferred ID. If available, the principal switch assigns that value. If there is a conflict (perhaps because static configuration was applied to some switch), then the other FC switch will shut down the link. This ensures that new switches will not disrupt an existing fabric.
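The election criteria above (highest priority wins, lowest WWN breaks a tie) can be expressed as a one-line comparator. The switch names, priorities, and WWNs below are invented for illustration.

```python
# Principal switch election as described above: highest priority wins,
# and the lowest WWN breaks a priority tie.
def elect_principal(switches):
    """switches: list of (name, priority, wwn). Returns the winner's name."""
    # Sort key: highest priority first, then lowest WWN among equals.
    return min(switches, key=lambda s: (-s[1], s[2]))[0]

fabric = [("sw-a", 2, 0x100000051EAA0001),
          ("sw-b", 2, 0x100000051EAA0002),
          ("sw-c", 1, 0x100000051EAA0000)]
print(elect_principal(fabric))  # sw-a: tied on priority 2, lower WWN than sw-b
```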
FC Interswitch Forwarding This section is focused on the forwarding of Fibre Channel traffic between switches. Native FC flow control mechanisms are explored, along with bandwidth aggregation capabilities, the FC routing table, and other available fabric services.
FC Flow Control Overview Fibre Channel provides a lossless network, as required by the SCSI protocol. This is necessary due to the SCSI protocol's lack of a good recovery mechanism, as previously discussed. Native Fibre Channel uses a so-called buffer-to-buffer (B2B) credit mechanism. This means that during initial peer connection, the peer grants credits, which control how many frames may be transmitted. Credits deplete as frames are transmitted, and when all credits are depleted, transmission must cease. The receiving peer normally sends continuous credit updates during a communication session, as long as buffers are available. This is a very safe mechanism, since transmitters may not send data unless the receiving peer indicates that it is capable of processing inbound frames. Compare this with the discussion of FCoE mechanisms in the previous chapter. FCoE uses the DCBX protocol with PFC. PFC allows transmission until the peer sends a PAUSE frame. FCoE assumes there is ample bandwidth, using PAUSE frames to stop transmissions during the occasional overload. When the pause frame expires, traffic may be sent again. So, PAUSE frames are normally not sent. To summarize, native FC B2B assumes it should not transmit unless it receives credits from the receiver. The FCoE PFC mechanism assumes it is free to transmit unless it receives PAUSE frames.
FC Classes and Flow control Native FC defines three classes of service to differentiate traffic. Start_of_Frame Connect Class 1 (SOFc1) provides a dedicated, guaranteed connection, which is best for sustained, high-throughput sessions. Class 2 provides a connectionless service, appropriate when very short messages are sent. Class 3 is similar to Class 2, but with a variation in flow control. This is appropriate for real-time traffic broadcasts. Figure 5-18 summarizes the characteristics of BB and EE flow control. The different
classes leverage these mechanisms in different ways. Class 1 frames use EE flow control, (with one exception). Class 2 uses both BB and EE, while Class 3-based sessions use B2B flow control exclusively.
Figure 5-18: FC Classes and Flow control
The BB flow control mechanism is used between an N_Port and an F_Port, and between N_Ports that are directly connected in a point-to-point topology. Since BB flow control lacks the overhead of sending acknowledgement frames, it is well-suited for time-sensitive transmissions. A special type of SOFc1 "initial connect" frame also uses BB flow control. The End-to-End (EE) flow control mechanism provides another option for ensuring reliable communications between N_Ports. Class 1 and 2 FC traffic uses EE, in which an ACK frame from the receiver assures the transmitter that the previous frame has been successfully received. Therefore, the next frame can be sent. If there are insufficient buffers to receive additional frames, a busy message is sent to the transmitter. A corrupted or otherwise malformed frame will cause a Fabric frame Reject (F_RJT) message to be sent to the transmitter.
FC Class 2 Flow Control Scenario

Figure 5-19 depicts an FC transmission session for Class 2 traffic, which uses both B2B flow control, based on buffer credits, and EE flow control, based on ACK frames.
Figure 5-19: FC Class 2 Flow Control Scenario
The server's N_Port sends a data frame to the target, and decrements its F_Port credit count by one. The switch's F_Port receives this frame, and sends an R_RDY frame to the server's N_Port, thus incrementing its credit. The switch then transmits this frame out its target-connected F_Port, decrementing its credit count for that target by one. The disk subsystem's N_Port receives this data frame, and sends an R_RDY to the switch's transmitting F_Port to increment its buffer count back up by one. This disk responder also sends an EE ACK frame through the fabric and on to the initiator, confirming successful receipt of the frame. For B2B, the receiver continuously updates the transmitter by sending additional credits, as long as it has buffers available. So the server may send only as much as its credit allows. The storage system sends additional credits if it can handle the load. This occurs at wire speed, so it is a very fast mechanism. The combination of both credit counts between each F_Port-to-N_Port connection, and the EE ACK mechanism, provides a very robust, fail-safe transmission.
ISL Bandwidth Aggregation

To increase the available bandwidth between devices, physical inter-switch links can be bundled into a single, logical medium, as shown in Figure 5-20. Different vendors use their own terminology for this feature. Brocade refers to this as a trunked link, Cisco calls it a port channel, while Comware calls this a SAN aggregation link.
Figure 5-20: ISL Bandwidth Aggregation
Comware supports Layer 2 bridge aggregation and Layer 3 route aggregation; in the same way, multiple FC ports can be bundled into a single high-bandwidth logical link.
FC Forwarding

Each FC switch has an FC routing table. Each entry in this table contains the destination FC_ID, a mask, and an outgoing port. Each directly connected node has a host, or node, route entry in the table. Each node has a unique 24-bit FC_ID, and so a /24 table entry indicates a route to a single device. This can be likened to a /32 route in an IP route table. Figure 5-21 shows a switch with domain ID 01, with two storage systems attached and online. The first target host to connect has been assigned the FC_ID 0x010001, and has a /24 full-match entry in the FC routing table. This is a directly connected route, reachable via port FC1/0/1. The entry for the device with FC_ID 0x010002 is similarly recorded in the table.
Figure 5-21: FC Forwarding
During the initial registration of this device to the fabric, the FC_ID was assigned, and the route was entered into the routing table. If the host goes offline, the associated route table entry becomes unavailable. This mechanism ensures that a switch knows how to reach local hosts. If a fabric consists of a single switch, there is no need to update the routing table. All hosts can find each other, being directly connected to the same switch, and host routes are entered automatically, as targets and initiators connect. When multiple switches are in use, their unique domain IDs describe connectivity to remotely attached devices. The first octet of any node's FC_ID is set to its attached switch's domain ID. All nodes attached to the switch with domain ID 0x02 have an FC_ID that begins with 0x02, and so on. The switch with domain ID 0x01 needn't have an entry for every host connected to switch 0x02. It simply needs a single entry of 0x020000 /8, denoting the outgoing interface attached to switch 0x02. In Figure 5-22, this is interface FC1/0/41.
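The lookup just described behaves like a longest-prefix match over 24-bit FC_IDs. The sketch below mirrors the table from Figures 5-21 and 5-22; the ports FC1/0/1 and FC1/0/41 come from the text, while FC1/0/2 is an assumed example entry:

```python
# Illustrative sketch of an FC forwarding-table lookup.
# FC_IDs are 24 bits: a /24 entry matches one node, while a /8 entry
# matches every node behind a remote switch's domain ID.

routes = [
    (0x010001, 24, "FC1/0/1"),   # local host route (from Figure 5-21)
    (0x010002, 24, "FC1/0/2"),   # local host route (assumed example port)
    (0x020000, 8,  "FC1/0/41"),  # domain route toward switch 0x02
]

def lookup(fc_id):
    best = None
    for prefix, length, port in routes:
        mask = (0xFFFFFF << (24 - length)) & 0xFFFFFF
        if fc_id & mask == prefix & mask:
            if best is None or length > best[0]:
                best = (length, port)   # prefer the most specific match
    return best[1] if best else None

print(lookup(0x010002))  # FC1/0/2 - directly connected node
print(lookup(0x020005))  # FC1/0/41 - any node on domain 0x02
```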
Figure 5-22: FC Forwarding
Similar to IP, both static and dynamic routing can be utilized. Static routes are manually configured on each switch by the administrator. Dynamic routing uses the Fabric Shortest Path First (FSPF) protocol. FSPF is a link-state routing protocol, like OSPF or IS-IS. This protocol is on by default, so when two FC switches connect, they automatically exchange routes. FSPF can support link costs to accommodate complex routing scenarios. Also, the graceful restart feature is available to support In-Service Software Upgrade (ISSU). This allows Comware switch firmware to be upgraded without downtime.
Fabric Services

This section will discuss Fabric login, Simple Name Service, state change notification, and zoning.
Fabric Login (FLOGI)

When a node powers on, it activates its attached switch port link. Before communicating, the node must log in to the fabric. Unlike complex security login mechanisms, which are based on usernames and passwords, this is a relatively simple, yet formal registration process. Both initiator and target nodes must register, enabling the fabric to learn about the nodes. Thus, the fabric has no need to dynamically learn Layer 2 addresses, the way TCP/IP systems must learn MAC addresses. This information is gleaned as each device connects, through an explicit control-plane process called FLOGI. During FLOGI, the Fibre Channel fabric assigns an FC_ID to the node. In the example in Figure 5-23, the server is attached to a switch with domain ID 02, and it is the first device to activate a port on this switch. Therefore, it gets an FC_ID of 0x020001.
Figure 5-23: Fabric Login (FLOGI)
The switch associates this address with the outgoing port that connects to the server.
Simple Name Service Database

As previously discussed, FC_IDs are assigned on a first-come, first-served basis and can change if devices lose connectivity. This makes the FC_ID less suitable for use with Fibre Channel security services. This drives the need for a more stable identifier, called the WWN (World Wide Name). This is a number that remains constant, regardless of reboots and outages. Since FC_IDs are still used for actual data transmission, the fabric must maintain a map of each WWN and its associated FC_ID. This WWN-to-FC_ID mapping service is provided by the Simple Name Service database. This database is exchanged between switches, so every switch in the fabric has a copy. The database shown in Figure 5-24 lists each WWN, along with its associated FC_ID and node type. The node types are listed as either Target or Initiator. When an initiator sends a query to this database, it asks to see all possible target storage systems.
Figure 5-24: Simple Name Service Database
The Name Service can respond with the list of all FC_IDs. The server can then send a login request to each storage system to ask if any LUNs are available. However, queries can also be filtered based on which device initiated a request. This allows the fabric to show a different set of available targets to different initiator hosts.
In the example in Figure 5-24, suppose the server at the top (WWN ending in 0xb6) is requesting all possible targets. The fabric can filter its response for this initiator, perhaps only revealing targets with a WWN that ends with 0x10 (only the target with FC_ID 0x010002, in this example). This is called soft zoning, since it does not actually enforce hardware security rules; it merely filters responses to certain initiator queries. It may be technically possible for a host to send a request to a known FC_ID, bypassing the soft-zoning capability of the name service, and reach other storage systems.
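This response filtering can be modeled in a short sketch. The WWN strings and the policy table below are invented placeholders (real WWNs are 64-bit identifiers); the FC_IDs follow the Figure 5-24 example:

```python
# Hypothetical sketch of Simple Name Service soft zoning: the name service
# filters its query response per initiator, rather than enforcing access
# in hardware. WWN strings and the policy table are invented placeholders.

name_db = [
    {"wwn": "t-wwn-10", "fc_id": 0x010002, "type": "Target"},
    {"wwn": "t-wwn-22", "fc_id": 0x010001, "type": "Target"},
    {"wwn": "i-wwn-b6", "fc_id": 0x020001, "type": "Initiator"},
]

# Soft-zone policy: which target WWNs each initiator is allowed to "see".
policy = {"i-wwn-b6": {"t-wwn-10"}}

def query_targets(initiator_wwn):
    allowed = policy.get(initiator_wwn)  # None means no filtering applies
    return [e["fc_id"] for e in name_db
            if e["type"] == "Target" and (allowed is None or e["wwn"] in allowed)]

print([hex(i) for i in query_targets("i-wwn-b6")])  # only FC_ID 0x010002 is revealed
```

Note that nothing here prevents the initiator from addressing 0x010001 directly, which is exactly the soft-zoning limitation described above.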
VSANs—Virtual SAN/Fabrics

VSANs, also known as virtual fabrics, provide the ability to implement multiple fabrics on a single physical infrastructure. This is typically used when isolation is required between fabrics. This isolation can be at different levels. Data isolation ensures that no unintentional data transfer can occur between VSANs. The physical links are shared among VSANs, but kept logically separate through the use of VSAN tagging. Control isolation provides independent instantiations of the fabric for each VSAN. All Fibre Channel services are isolated for each VSAN. Each VSAN has its own name service and zone service. It is therefore impossible to see information from one fabric in another. If an administrator relied solely on the soft zoning feature previously described, a simple error in zone configuration could reveal classified targets to unauthorized initiators. The creation of VSANs eliminates this issue, since all aspects of a VSAN are isolated. An initiating host can only access targets in the same VSAN. Fault isolation is also achieved in the data, control, and management planes, since misconfigurations in one VSAN should not impact other VSANs.
VSAN vs Physical SAN

Figure 5-25 compares separate physical SANs on the left to a VSAN solution on the right. On the left, each of the three departments has its own storage infrastructure, deployed as separate physical fabrics, denoted as red, blue, and green. Deploying separate physical infrastructure for each department creates additional cost, and increases rack space and power utilization due to the larger number of devices to manage.
Figure 5-25: VSAN vs Physical SAN
Instead, these systems could share a single physical infrastructure, as shown on the right. This infrastructure can then be logically separated into VSANs. A common storage pool can be shared among these VSANs. This reduces the number of switches to be managed, thereby lowering costs for initial deployment, rack space, power, and cooling. Another benefit is that unused ports can be easily moved to an appropriate VSAN without disrupting a production environment.
VSANs—Virtual SAN/Fabrics on Comware

VSANs have historically been used for service isolation, but not necessarily for redundancy. To accommodate redundancy, two physically separate fabrics (SAN A and SAN B) are deployed. In the previous example, only SAN A is depicted, with the three departments virtually separated via VSANs. To add redundancy, a separate physical SAN B fabric could be deployed, with identical VSAN configuration. Each host can then be connected to both infrastructures for redundancy. Comware 7 switches can improve this scenario. With the Fibre Channel switch functionality on Comware 7 devices, FC frames are moved through the internal switch architecture using FCoE. This is because internally, Comware switches always use Ethernet technology. This is true even for HP switches like the 5900CP, which provides native FC ports. Since native FC traffic is internally switched via FCoE, each Fibre Channel VSAN requires an associated Ethernet transport VLAN. For a typical SAN A/SAN B concept, a Comware 5900CP-A and a Comware 5900CP-B are deployed, each with a single VSAN. Figure 5-26 shows two 5900CP switches, Core1 and Core2. Core1 hosts VSAN 11, while Core2 hosts VSAN 21. These switches are not part of an IRF group, nor is there any other physical interconnection between these two switches. Thus, the fabric separation is maintained.
Figure 5-26: VSANs—Virtual SAN/Fabrics on Comware
However, the 5900AF top-of-rack switches need to use IRF for redundancy. To maintain a logical fabric separation, VSAN 11 is configured, and the network administrator ensures that only physical ports on IRF unit 1 are assigned to it.
Similarly, VSAN 21 is defined, and is only associated with physical interfaces from unit 2 of the IRF. This ensures that neither VSAN 11 nor VSAN 21 traffic will cross the IRF links, and the concept of physically separated fabrics is maintained. Although they are managed and configured on the same IRF system, each VSAN is separately processed by individual IRF members. Along with the dedicated FC uplinks, the top-of-rack switches also have a bridge aggregation group configured, with physical uplinks to traditional data core switches. These are the HP Comware 12900-series switches indicated in Figure 5-26.
VSANs—Tagging

As shown in Figure 5-27, VSAN tagging for an FC fabric is quite similar to VLAN tagging for Ethernet switches. Some Ethernet switch ports can be configured as a member of a single, untagged VLAN for endpoint connectivity, while others may be configured to use 802.1q tagging to support multiple VLANs for switch-to-switch links. Fibre Channel also supports two types of tagging – native FC tagging and FCoE tagging.
Figure 5-27: VSANs—Tagging
With native FC communications, Ethernet is not involved. In this case, FC frames carry a special VSAN tag inside the FC frame itself. The native FC port can be an access port, connected to a node, and using native FC frames. Connections to switches may need to support multiple VSANs. This is where a trunk link is used. The VSAN ID is tagged inside the FC frame. FCoE uses a transport VLAN, which has an 802.1q tag. Even when sending normal FC frames, FCoE uses an 802.1q tag, so for FCoE there is no access port type. All FCoE ports always carry an 802.1q tag, which implicitly identifies both the VLAN and its bound VSAN. This is why FCoE frames are always tagged, even if only a single VSAN is permitted on the port.
Basic Configuration Steps

Figure 5-28 introduces the steps to configure FC infrastructure.
Figure 5-28: Basic Configuration Steps
This starts with configuring the switch working mode and FCoE operating mode. Then VSANs can be defined, along with a transport VLAN for FCoE, bound to the VSAN. Support for native Fibre Channel can be configured on the physical interface. The FC port type is set, and the FC interface is assigned to a VSAN. Initially, a simple default zone can be configured that permits everything.
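These steps can be summarized as a single Comware-style session. This is a hedged sketch only: most commands are taken from the figure descriptions in the following sections, but the interface names and the FC mode keyword are assumptions, and exact syntax should be verified against the switch's software release.

```
system-view
 system-working-mode advance       # Step 1: advanced mode (save, then reboot)
 fcoe-mode fcf                     # Step 2: operate as a Fibre Channel Forwarder
 vsan 10                           # Step 3: define the virtual fabric
 vlan 10
  fcoe enable vsan 10              # Step 4: dedicate VLAN 10 as FCoE transport for VSAN 10
 interface Ten-GigabitEthernet1/0/1
  port-type fc                     # Step 5: converged port becomes a native FC port
 interface Fc1/0/1
  fc mode f                        # Step 6: F_Port, for a node's N_Port (keyword assumed)
  port access vsan 10              # Step 7: move the port out of default VSAN 1
 zone default-zone permit vsan 10  # Step 8: permit-all default zone
```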
Configuration Step 1: System-working-mode

The switch must be operating in advanced mode to access all of the FCoE configuration commands. This setting requires a system reboot to take effect. As depicted in Figure 5-29, once the system working mode is set to advanced, the configuration must be saved, and then a reboot command is issued.
Figure 5-29: Configuration Step 1: System-working-mode
After the reboot, the working mode is verified via the "display system-working-mode" command.
Configuration Step 2: Define FCoE Operating Mode

The FCoE operating mode depends on how the system is deployed. The operating mode can be configured as a Fibre Channel Forwarder (FCF), as N_Port Virtualization (NPV), or as a transit mode switch. Only one mode is supported per switch or IRF. Also, remember that this command is only available after the switch has been configured to operate in advanced mode, as previously described in Step 1. For this discussion, an FCF is to be configured. NPV will be covered in a later section. Once set, as shown in Figure 5-30, the FCoE operating mode can be verified with the "display fcoe-mode" command, as shown in Figure 5-31.
Figure 5-30: Configuration Step 2: Define FCoE Operating Mode
Figure 5-31: Configuration Step 2: Define FCoE Operating Mode cont.
Configuration Step 3: Define VSAN

The definition of a VSAN creates a virtual fabric. This virtual fabric can provide complete isolation of services for different VSANs sharing the same physical infrastructure, providing a logical fabric separation for IRF top-of-rack systems. VSAN 1 is defined in the system by default, so any new Virtual FC or FC interfaces are assigned to VSAN 1 by default. In Figure 5-32, VSAN 10 is created from global configuration mode on the switch.
Figure 5-32: Configuration Step 3: Define VSAN
Configuration Step 4: Transport VLAN and Bind VSAN

Now that a VSAN is defined, a VLAN must be created and dedicated for the purpose of FCoE transport. No other hosts or Layer 2 functions are permitted for this VLAN, which has a one-to-one relationship with the VSAN. You cannot have one VLAN that services multiple VSANs, nor can you have one VSAN that is serviced by multiple VLANs. If the intended design requires multiple VSANs, then a VLAN must be defined for each one. In Figure 5-33, VLAN 10 is defined from global configuration mode. This VLAN is dedicated to servicing VSAN 10 with the "fcoe enable vsan 10" command.
Figure 5-33: Configuration Step 4: Transport VLAN and Bind VSAN
In the example in Figure 5-33, the VLAN and VSAN numbers match. Although this can be a good idea to minimize confusion and ease documentation, it is not a technical requirement. Any VLAN number can be configured to support any VSAN.
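The one-to-one constraint can be expressed as a small validation sketch. The `bind` helper below is hypothetical, not a Comware feature; it simply rejects a second VLAN for an already-served VSAN, or a second VSAN for an already-bound VLAN:

```python
# Hypothetical sketch enforcing the one-to-one VLAN/VSAN binding rule:
# each transport VLAN serves exactly one VSAN, and vice versa.

bindings = {}     # vlan -> vsan
vsan_owner = {}   # vsan -> vlan

def bind(vlan, vsan):
    if vlan in bindings and bindings[vlan] != vsan:
        raise ValueError(f"VLAN {vlan} already bound to VSAN {bindings[vlan]}")
    if vsan in vsan_owner and vsan_owner[vsan] != vlan:
        raise ValueError(f"VSAN {vsan} already served by VLAN {vsan_owner[vsan]}")
    bindings[vlan] = vsan
    vsan_owner[vsan] = vlan

bind(10, 10)      # numbers may match, but need not
bind(21, 11)      # any VLAN number may serve any VSAN
try:
    bind(22, 11)  # a second VLAN for VSAN 11 is rejected
except ValueError as e:
    print(e)      # VSAN 11 already served by VLAN 21
```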
Configuration Step 5: Configure FC Interface

Fibre Channel interface functionality must be configured, since the HP 5900CP switch supports converged ports. Each port can be a 1Gbps or 10Gbps Ethernet port, depending on whether an SFP or SFP+ device is installed. Alternatively, the port could be a 4Gbps or 8Gbps native Fibre Channel port. Again, this depends on the adapter installed in a particular port. In the configuration, the port can be configured as either an Ethernet or an FC port. However, it is important to remember that the optic interface installed for that port must match the intended configuration. For example, if an SFP+ 10Gbps Ethernet interface has been inserted, and that interface is configured for FC, then the port will remain inoperative, in a down state. Ethernet configurations require physical adapters with Ethernet optics, and FC configurations require FC optic-based adapters.

Note

16Gbps FC interface optics can be installed in an HP 5900CP. However, this switch only supports a maximum of 8Gbps for FC, and so the port will only operate at 8Gbps. HP has released converged optics that can support both 8Gbps FC and 10Gbps Ethernet. If those are deployed, the administrative configuration determines the operational status of the interface.

The example in Figure 5-34 shows an interface initially operating as a 10Gbps Ethernet interface. When the "port-type fc" command is issued, this interface becomes an FC port.
Figure 5-34: Configuration Step 5: Configure FC Interface
The interface's operational status can be verified with the "display interface brief" command. In the example in Figure 5-35, the interface is operating as a native FC port.
Figure 5-35: Configuration Step 5: Configure FC Interface cont.
Interface FC1/0/1 is a member of VSAN 1 by default, and is currently in a non-operational, or DOWN, state.
Configuration Step 6: FC Interface Port Type (1 of 2)

Now that the interface has been configured to operate as an FC port, its FC port type can be configured. This includes being configured as one of the following:
■ E_Port: Expansion port – connects to another switch's E_Port
■ F_Port: Fabric port – connects to a node's N_Port
■ NP_Port: NPV (virtual) enabled port
In the example in Figure 5-36, interface FC1/0/1 is configured as an F_Port, since this port is to be connected to a server or storage system's N_Port.
Figure 5-36: Configuration Step 6: FC Interface Port Type (1 of 2)
Configuration Step 6: FC Interface Port Type (2 of 2)

The command "display interface brief" can be used to validate the configuration. In the example in Figure 5-37, interface FC1/0/1 is still in VSAN 1, but the mode column shows that it is configured to operate as an F_Port.
Figure 5-37: Configuration Step 6: FC Interface Port Type (2 of 2)
Configuration Step 7: Assign FC Interface to VSAN

In the scenario in Figure 5-38, the port should be a member of VSAN 10. To configure this port to no longer be a member of the default VSAN 1, the "port access vsan 10" command is used.
Figure 5-38: Configuration Step 7: Assign FC Interface to VSAN
Only native FC interfaces can be configured as an access port. FCoE Virtual Fibre Channel (VFC) interfaces use a transport VLAN, which serves as a VSAN trunk. Again, the "display interface brief" command validates that interface FC1/0/1 has been configured as a member of VSAN 10.
Configuration Step 8: Set Default Zone Permit

The default zoning configuration on a Comware switch denies everything. To change this, a simple "zone default-zone permit" command is issued. This command is analogous to the "permit any" statement in an access list. The topic of zones and zoning will be covered in a later section of this chapter. For verification, the "display zone status vsan 10" command reveals that the default zone has been configured to permit all access, as shown in Figure 5-39.
Figure 5-39: Configuration Step 8: Set Default Zone Permit
Configuration Step 9: Status Review

Upon completion of a basic configuration, several commands are available to validate Fibre Channel operation and configuration.
■ Display interface brief
■ Display interface FC packet pause (or drops)
■ Display transceiver info
■ Display vsan port-member
■ Display vsan login
■ Display vsan name service
Optional Debugging

Optionally, you may use the following debug commands.
■ Debug FC interface
■ Debug FLOGI
■ Debug FDISC
FCoE Overview

Topics to be covered in this section include the following:
■ Consolidation
■ Terminology
■ CNA
■ FCoE Stack compared to OSI/FC Stack
■ FCoE Frame Format
■ FIP: Fibre Channel Initialization Protocol
■ FPMA: Fabric Provided MAC-Address
FCoE I/O Consolidation

The main goal of FCoE is to achieve network consolidation. A typical deployment could fill a rack with a large number of devices, cables, and adapters, due to the separate infrastructure for LAN and SAN. Figure 5-40 highlights the savings in equipment:
■ 50% fewer switches in each server rack—Only two CN Top-of-Rack switches, compared with four (two LAN and two FC) switches per rack with separate Ethernet and FC switches
■ 50% fewer adapters per server
■ 75% fewer cable connections
Figure 5-40: FCoE I/O Consolidation
FCoE Goals

FCoE is tasked with maintaining the latency, security, and traffic management attributes of Fibre Channel, while integrating with and preserving investment in existing FC environments. The protocol must not be disruptive to any standard Fibre Channel functions and capabilities. FC must continue to function as always, using the same FC_IDs, WWNs, zoning, FSPF routing, and so on. Interoperability between native Fibre Channel and FCoE-based systems should be very easy. This is because there is no device required to "convert" between native FC and some other protocol. The native FC functionality is simply encapsulated in an Ethernet frame, and then decapsulated prior to transmission to a native FC device. This ability to integrate Ethernet and native FC without need for a separate protocol simplifies the deployment of storage environments. As explained above, all the capabilities of native FC technology are extended over Ethernet systems through the use of FCoE. Toward this end, it is vital to ensure that Ethernet provides the lossless transmission that Fibre Channel requires. This capability was previously described in the chapter about DCBX and PFC.
FCoE Terminology

The following introduces the terminology surrounding FCoE concepts and configuration.
■ FCoE: Fibre Channel over Ethernet carries native FC frames inside a standard Ethernet frame.
■ CEE: Converged Enhanced Ethernet, also known as Data Center Bridging (DCB), describes a suite of protocols and capabilities necessary to support FC technology over Ethernet infrastructure.
■ VFC: Virtual Fibre Channel interfaces provide an FC abstraction layer over a traditional Ethernet connection. This enables all of the traditional port types supported on native Fibre Channel, including:
• VN_Port: This provides the virtual equivalent of an FC N_Port, for end node connectivity.
• VF_Port: Provides the virtual equivalent of an FC F_Port, for switch fabric connectivity.
• VE_Port: This is the virtual equivalent of an FC E_Port, for switch-to-switch links.
■ ENode: This is an FCoE device that supports FCoE VN_Ports. This includes both server initiators and storage system targets.
■ FCF: An FC Forwarder is a device that provides FC fabric services with Ethernet connectivity.
■ FIP: The Fibre Channel Initialization Protocol acts as a type of helper protocol for initial link setup.
Converged Network Adapters (CNA)

Each FCoE-capable server must be equipped with a Converged Network Adapter (CNA). As the name implies, this adapter supports both Ethernet services for standard LAN connectivity, and Fibre Channel services for SAN fabric connections (see Figure 5-41).
Figure 5-41: Converged Network Adapters (CNA)
Traditional Host Bus Adapters (HBAs) only support native FC, while Network Interface Cards (NICs) only support Ethernet. The CNA converges these two functions into a single device. This adapter presents itself to the server OS as two separate devices – an Ethernet NIC and an HBA. Therefore, the OS is not aware that convergence is taking place, continuing to perceive two separate fabrics for LAN and SAN. This aspect of CNAs makes it easy to migrate from separate legacy systems to a converged solution.
HP CNA Products

Figure 5-42 shows some of the CNA products that HP supports. New products and capabilities are being added by HP on a regular basis. Please check current documentation.
Figure 5-42: HP CNA Products
FCoE Server Access

A hardware-based CNA provides FCoE capability that is integrated with traditional Ethernet services. Although both services are provided by a single device, the server OS perceives a separate HBA and NIC. This not only makes network convergence transparent to the server OS, but also to server administrators, who can continue to configure these "separate" adapters as always. The adapter will leverage FIP and the DCB suite (DCBX, PFC, and ETS) to facilitate SAN fabric connectivity. These protocol suites run independently of each other. If you configure FCoE to use VLAN 10, it is the network administrator's responsibility to ensure that VLAN 10 is assigned the correct 802.1p mapping, and that PFC and ETS are properly deployed to provide lossless service for VLAN 10. In Figure 5-43, the server has two CNAs installed. For Fibre Channel, CNA-Port1 is connected, performs FLOGI, and acquires an FC_ID in VSAN 11, while CNA-Port2 performs FLOGI and gets a unique FC_ID in VSAN 21.
Figure 5-43: FCoE Server Access
Meanwhile, the Ethernet functionality of the two CNAs can be aggregated in a traditional NIC teaming configuration to enhance bandwidth utilization and redundancy for LAN communications. The network administrator may choose how and whether to team these NICs, just as they did with separate HBAs and NICs.
FCoE Stack Overview

Figure 5-44 compares the classic OSI model's protocol concepts with FCoE and native Fibre Channel. Notice that Upper Layer Protocol (ULP) services are identical between FCoE and native FC, as are FC layers 2 through 4.
Figure 5-44: FCoE Stack Overview
It is only the physical and data link layer protocols of FC that have been replaced by Ethernet. An FCoE mapping layer presents itself to FC-2 as a native FC interface stack. It encapsulates FC frames in Ethernet for transmission, and decapsulates them before passing received traffic up through the stack.
FCoE Encapsulation

The native FC frame is encapsulated in a typical Ethernet frame. In Figure 5-45, you can see the standard Ethernet source and destination MAC addresses, the Ether Type field, the IEEE 802.1Q tag, and the 4-bit version field. FCoE data frames have an Ether Type of 0x8906.
Figure 5-45: FCoE Encapsulation
The standard Ethernet FCS, or Frame Check Sequence, serves as a frame trailer, and aids in detection of corrupted frames. Contained inside this Ethernet frame is a native, unmodified FC frame.
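The framing just described can be illustrated with a short classifier sketch. This is a simplified model, not a full FC-BB-5 parser; it only reads the EtherType (skipping an 802.1Q tag if present) to distinguish FCoE data frames (0x8906) from FIP frames (0x8916):

```python
# Illustrative sketch: classifying Ethernet frames by EtherType.
# Simplified model only; real FCoE frames also carry version, SOF/EOF
# markers, the encapsulated FC frame, and the Ethernet FCS.
import struct

FCOE_ETYPE = 0x8906   # FCoE data frames
FIP_ETYPE  = 0x8916   # FCoE Initialization Protocol frames

def classify(frame: bytes) -> str:
    # dst MAC (6 bytes) + src MAC (6 bytes), then either an 802.1Q tag
    # (0x8100 + TCI) followed by the real EtherType, or the EtherType itself.
    etype, = struct.unpack_from("!H", frame, 12)
    if etype == 0x8100:                       # VLAN-tagged frame
        etype, = struct.unpack_from("!H", frame, 16)
    return {FCOE_ETYPE: "FCoE", FIP_ETYPE: "FIP"}.get(etype, "other")

# A minimal fabricated frame: zeroed MACs + 802.1Q tag (VLAN 10) + FCoE EtherType.
frame = bytes(12) + struct.pack("!HH", 0x8100, 10) + struct.pack("!H", FCOE_ETYPE)
print(classify(frame))  # FCoE
```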
FIP: FC Initialization Protocol

With native FC connections there is a direct physical link between HBA and fabric, so when the physical link is down, the FC link is of course down. FCoE uses virtual links. While the Ethernet link may be up, a logical FC connection must be established and maintained between the CNA and the FCF switch (see Figure 5-46).
Figure 5-46: FIP: FC Initialization Protocol
For example, the Ethernet link and all associated physical connections may be up, but the Virtual FC interface could be manually shut down. FIP notifies the peer of this condition, ensuring that it understands the lack of connectivity. FIP provides a mechanism to accurately reflect the logical status of the FCoE connectivity. Other functions provided by FIP include:
■ FCoE VLAN Discovery: Ensures that the CNA learns from the FCF which 802.1q VLAN tag it should use.
■ FCF Discovery: Enables the CNA to find its attached FCF.
■ FLOGI: Fabric Login must occur for any FC device to acquire an FC_ID and communicate over the fabric. Since this is FCoE, a fabric MAC address will also be allocated, called the FPMA.
■ FPMA: The Fabric Provided MAC-Address enables FCoE transmissions.
■ Link Keep-alive: With the above functionality complete, FCoE communications are now possible. The status of the link is continuously validated with link keep-alive messages.
FIP: VLAN and FCF Discovery

FCoE data frames are denoted by Ethertype 0x8906, while FIP frames use an Ethertype value of 0x8916. Initial FIP frames are sent using the BIA (burned-in address) MAC from the Ethernet portion of the CNA. This is the same as any typical Ethernet frame would be sent. The first step of the FIP protocol is to perform VLAN discovery. Since the VLAN has yet to be discovered, the appropriate 802.1q tag is unknown. Therefore, these discovery frames are sent as untagged, native Ethernet frames. The FCF recognizes VLAN discovery messages, and responds with the FCoE VLAN ID, as configured for that interface. VLAN discoveries are the only FIP frames that are sent untagged. All other frames are tagged per the FCF VLAN discovery response. The next step is to perform FCF discovery, in which the node sends a Discovery Solicitation message. The FCF responds with a Discovery Advertisement, which contains an FCF Priority value. If multiple forwarders exist on the same VLAN, they would all respond to the solicitation message. The node selects the FCF with the highest priority, which is the lowest numerical value.
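The selection rule can be shown in a one-line sketch. The advertisement list, MAC addresses, and priority values below are all invented for illustration:

```python
# Sketch of the FCF selection rule described above: among Discovery
# Advertisements received on the FCoE VLAN, the node picks the FCF whose
# priority field has the LOWEST numerical value (i.e., highest priority).
# All values below are invented examples.

advertisements = [
    {"fcf_mac": "0e:fc:00:aa:00:01", "priority": 128},
    {"fcf_mac": "0e:fc:00:aa:00:02", "priority": 2},    # preferred
    {"fcf_mac": "0e:fc:00:aa:00:03", "priority": 200},
]

best = min(advertisements, key=lambda a: a["priority"])
print(best["fcf_mac"])  # 0e:fc:00:aa:00:02
```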
FIP: FLOGI and FPMA

Once the FCF is selected, the host must login to that system, using FLOGI. You may recall from a previous section that FLOGI results in the assignment of an FC_ID. For FCoE, an FPMA is also assigned. The node's CNA will use the FPMA as its source MAC address for all FCoE frames (Ethertype 0x8906). Prior to FLOGI, the CNA's BIA is used. The FPMA is constructed of two 24-bit pieces – the Fibre Channel MAC Address Prefix (FC-MAP) and the FC_ID. The default FC-MAP is 0x0EFC00, but it can be manually configured. The second portion of the FPMA is equal to the assigned FC_ID. Since this is unique per VLAN, there is typically little motivation to modify the FC-MAP. The FPMA need only be unique within the VLAN, and the unique FC_ID ensures this is the case. This is because there is a one-to-one relationship between VLAN and VSAN.
The example in Figure 5-47 shows an FPMA of 0x0EFC00010004. This was constructed from the default FC-MAP, and an assigned FC_ID of 0x010004.
Figure 5-47: FIP: FLOGI and FPMA
FCoE Design Considerations FCoE was designed to move data within a single data center environment, not for long-distance WAN communications. This is primarily due to the timers involved in the PFC PAUSE mechanism negotiated by DCBX, and the associated buffer calculations.
Configuration Steps for FCoE Host Access As shown in Figure 5-48, the prerequisites are similar to previous configurations. The server and storage nodes must be configured to support appropriate DCBX functionality. Switches must support FCF mode and have a VSAN defined, along with unique Domain ID assignment and a default zone permit, as a minimum.
Figure 5-48: Configuration Steps for FCoE Host Access
Configuration Steps for FCoE Host Access Figure 5-49 introduces the steps to configure FCoE host access. These steps are detailed in the following sections.
Figure 5-49: Configuration Steps for FCoE Host Access
Configuration Step 1: Create Virtual FC Interface A new virtual FC interface is created in the top portion of the example in Figure 5-50. The second example in Figure 5-50 reveals how to verify this configuration. You can see that VFC 2 has been created.
Figure 5-50: Configuration Step 1: Create Virtual FC Interface
Configuration Step 2: VFC FC Port Type The switch port in this scenario is intended to connect to an end host, so it must be configured as an F_Port. The examples in Figure 5-51 reveal the syntax to configure and verify this requirement.
Figure 5-51: Configuration Step 2: VFC FC Port Type
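Steps 1 and 2 can be sketched in Comware style, following the scenario above (VFC 2, server-facing F_Port); exact syntax may vary by software release:

```
system-view
 interface vfc 2              # step 1: create the virtual FC interface
  fc mode f                   # step 2: server-facing port, so F_Port
  quit
 display interface vfc brief  # verify that VFC 2 exists
```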
Configuration Step 3: Bind VFC to Interface (1 of 2) To function, the previously configured virtual interface must be associated with a physical interface. This can be either a single physical interface or a logical link-aggregation interface. When a single VFC is bound to an Ethernet link aggregation, the FCoE traffic is distributed over the link-aggregation member ports using the traditional hash mechanisms. Since an FCoE frame does not contain an IP header, the hashing algorithm uses the source/destination Ethernet MAC addresses for the calculation. Because all communication between FCoE devices uses a stable MAC address, the communication between any two FCoE devices is guaranteed to use a single link of the aggregation. This ensures that link aggregation does not introduce problems such as out-of-order delivery. Out-of-order delivery is not an issue for traditional IP networks: multiple packets of a single flow can be sent over different physical links or paths with different latency, and sequencing information in the headers allows packets to be reassembled in the proper order, regardless of the order in which they arrived.
Binding a virtual Fibre Channel interface to a logical aggregation interface is not applicable for a server-facing port group. Servers typically have two physical CNA adapters, and the reason for having two Fibre Channel connections from a server is to connect each one to a separate fabric, so they cannot be aggregated. Bridge aggregation is appropriate for inter-switch links, where it provides ample bandwidth for the multiple host communications traversing those links. In Figure 5-52, VFC 2 is bound to physical interface Ten-GigabitEthernet 1/0/2.
Figure 5-52: Configuration Step 3: Bind VFC to Interface (1 of 2)
Configuration Step 3: Bind VFC to Interface (2 of 2) The binding can be verified with the “display interface vfc brief” command, as shown in Figure 5-53. The example reveals that VFC2 is bound to interface XGE1/0/2.
Figure 5-53: Configuration Step 3: Bind VFC to Interface (2 of 2)
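The binding and its verification can be sketched as follows, using the interface numbers from this scenario (syntax hedged):

```
system-view
 interface vfc 2
  bind interface ten-gigabitethernet 1/0/2   # tie VFC 2 to the physical port
  quit
 display interface vfc brief                  # shows VFC2 bound to XGE1/0/2
```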
Configuration Step 4: Assign VFC Interface to VSAN As with native FC interfaces, the virtual interface must be assigned to a VSAN. You have learned that VSAN traffic is transported over a VLAN, and that FCoE uses 802.1Q VLAN tagging. Since the tagging allows for multiple VLANs, multiple VSANs are implicitly supported. To achieve this functionality, the FCoE VFC interface must be configured as a VSAN trunk port, as in the example in Figure 5-54.
Figure 5-54: Configuration Step 4: Assign VFC Interface to VSAN
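A minimal sketch of the VSAN trunk assignment (VSAN 10 per the scenario; exact syntax may vary by release):

```
system-view
 interface vfc 2
  port trunk vsan 10    # carry VSAN 10 on this virtual interface
```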
Configuration Step 5: Physical Interface VLAN Assignment The virtual interface has been configured to use VSAN 10, which is using VLAN 10 as a transport. The virtual interface has been bound to a physical interface. This physical interface must therefore be configured to support VLAN 10. The example shown in Figure 5-55 completes this scenario by configuring the physical interface to be a trunk port that allows VLAN 10.
Figure 5-55: Configuration Step 5: Physical Interface VLAN Assignment
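The physical-side counterpart can be sketched as a standard trunk carrying the FCoE VLAN (syntax hedged):

```
system-view
 interface ten-gigabitethernet 1/0/2
  port link-type trunk          # VLAN trunking on the FCoE-facing port
  port trunk permit vlan 10     # allow the FCoE VLAN (transport for VSAN 10)
```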
Fabric Expansion In previous sections you learned about FCoE connectivity between Top-of-Rack switches and server hosts. The example scenarios have revealed how to connect to isolated FC switches. In these scenarios, one switch is an FCF which connects to the server CNA, using FCoE, and another switch is a 5900CP that connects to storage systems via native Fibre Channel. The focus now shifts to interconnecting the FCoE Top-of-Rack and native FC switches using a fabric expansion. This involves understanding and configuring E_Ports, the FSPF routing protocol, and validating the resultant Fibre Channel routing table.
Fabric Expansion: E_Port For FC switch-to-switch connections, each side of the link is configured as an expansion port, called an E_Port. Once configured, the switches discover each other, and fabric services are extended across multiple switches. The E_Port on an HP 5900 or other Comware device must face another Comware E_Port. Extended services include fabric link services such as the FSPF routing protocol, which populates the FC routing table by exchanging domain information. Name service database exchange also occurs, providing a consistent, fabric-wide name service: all targets and initiators can be aware of all devices, regardless of physical switch connections. For security purposes, zone database information is exchanged between switches, so all switches have access to all zone information. VSAN tagging support is also configured consistently across all switches in the fabric. For FCoE, the E_Port is a virtual construct, and so is referred to as a VE_Port. This has the same functionality as a native Fibre Channel E_Port. Unlike most FCoE-based functions, DCBX protocol functionality is not required for VE_Ports; instead, manual configuration of simple PFC commands is sufficient. For server CNAs, FIP fulfills all of the initial connection requirements. However, FIP does not serve this purpose on switch-to-switch links; it is simply assumed that the network administrator properly configures these connections. The FIP keep-alive mechanism is used to determine ongoing link status.
Fabric Expansion: Routing Table Exchange Once the switches become aware that they are part of an expanded fabric, they exchange routing table information. These routing tables can be constructed using static routes or via the Fibre Channel Shortest Path First (FSPF) protocol. On Comware switches, FSPF is enabled by default to ensure that /8 switch Domain ID routes are automatically exchanged.
Configuration Steps for Fabric Expansion with FCoE
Prior to configuring fabric expansion, you must manually configure the physical interface to support PFC. Then you can create a new VFC interface, set its port type to be an E_Port, and verify the configuration.
Configuration Step 1: Create New VFC Interface The first step is to prepare a new VFC interface. In this scenario, interface Ten1/0/4 is to be connected to another switch, thereby serving an E_Port role. The example in Figure 5-56 shows this interface being configured as a trunk port, with VLAN 10 enabled to traverse the link.
Figure 5-56: Configuration Step 1: Create New VFC Interface
Next, VFC 4 is created, bound to the interface, and made a member of VSAN 10.
Configuration Step 2: Set FC Port Type to E_Port The virtual port is configured as an E_Port, and then the configuration is validated. The example in Figure 5-57 shows that VFC 4 is created, defined as an E_Port, and bound to interface XGE1/0/4.
Figure 5-57: Configuration Step 2: Set FC Port Type to E_Port
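The expansion-side configuration from Figures 5-56 and 5-57 can be sketched as follows (interface Ten1/0/4, VFC 4, VSAN 10, and VLAN 10 per the scenario; syntax hedged):

```
system-view
 interface ten-gigabitethernet 1/0/4
  port link-type trunk
  port trunk permit vlan 10         # FCoE VLAN toward the peer switch
  quit
 interface vfc 4
  fc mode e                         # switch-to-switch link, so (V)E_Port
  bind interface ten-gigabitethernet 1/0/4
  port trunk vsan 10
```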
Configuration Step 3: Verify Status As shown in Figure 5-58, several display commands are available to verify successful FIP peering, FSPF route peering and routing table information, as well as the name service database.
Figure 5-58: Configuration Step 3: Verify Status
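Hedged examples of the verification commands summarized in Figure 5-58; the exact keywords may differ by software release:

```
display fcoe                        # FIP/FCoE peering status
display fspf neighbor vsan 10       # FSPF route peering
display fc routing-table vsan 10    # resulting FC routing table
display fc name-service database    # fabric-wide name service entries
```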
Multi-path - Concepts Multi-path deployments connect a host system to both SAN A and SAN B for redundancy. Storage systems that support path redundancy could have dual adapters to connect to both SANs. Figure 5-59 depicts a storage system with CTRL-1 and CTRL-2, and each controller is connected to both SAN A and SAN B.
Figure 5-59: Multi-path - Concepts
If these controllers were configured in an Active-Active mode, the server would see LUN-A four times. Using HBA-P1, connected to SAN A, the server would see the target FC_IDs for CTRL-1 and CTRL-2, and each would show LUN-A. The server would have a similar view via HBA-P2, through SAN B. If the server is not aware that it is seeing the same disk four times, it could write different data to the same disk, leading to file system corruption. To prevent this issue, a Multi-Path I/O (MPIO) driver is required for host HBAs. The MPIO feature ensures that each of the four paths is identified and recognized as a separate connection to the same LUN. If one path fails, MPIO automatically switches to a different path, enabling continuous service in the face of hardware or connection failures.
MPIO also makes load-sharing options available. Various algorithms could be used to split the load among different paths, some based on connections, some based on perceived load. Load balancing may require special load-balancing software installations on the server, and must be configured by the server administrator. The fabric has no control over these load-sharing functions.
Multi-path—Automatic Failover MPIO facilitates automatic failover functionality. In Figure 5-60, the active link between HBA-P1 and the SAN A switch has failed. The MPIO driver will immediately detect this failure, and use HBA-P2 for continued service.
Figure 5-60: Multi-path—Automatic Failover
This failover feature is transparent to the server, and no special fabric configuration is required to support it. The network administrator must simply ensure that both fabrics support the same services and connections. If a certain storage target is connected only to SAN A, there is obviously no failover capability available for that target via SAN B. A similar requirement relates to fabric zone configuration, which must be configured identically on both fabric A and fabric B. If fabric B's zone configuration filters server visibility to a target, failover functionality is broken.
Fabric Zoning Fabric zoning provides access restrictions inside a VSAN, and so is configured separately for each VSAN. This zoning configuration controls which nodes may communicate with each other. The objective is to ensure that host initiators can only discover intended targets. This control is implemented as a set of permit and deny statements, similar to a TCP/IP-based ACL. Figure 5-61 shows two storage systems. One is a Tier-1 production system for ESX hosts, named “3Par”. The other is a Tier-2 system named “MSA”. The server named ESX-1 is placed in a zone named ESX, along with the 3Par storage system. The archive server is placed into the archive zone with the MSA storage system.
Figure 5-61: Fabric Zoning
Zones can be configured such that only devices in the same zone may discover each other. Although all systems share the same fabric, their access scope is limited by zoning. It is quite easy for the network administrator to modify this behavior at will. Existing servers can be granted additional access, or have stricter filtering controls applied, and new servers and zones can be added or modified.
Fabric Zoning Concepts A zone member simply refers to a node, either by FC_ID or the port WWN (pWWN). For reasons soon to be described, it is recommended that the actual FC_ID or pWWN be abstracted in the zone configuration through the use of Zone aliases.
Zones are defined in order to group members together. Traffic between all members of the same zone is allowed. It is often best to avoid having a large number of members in a single zone, as this increases the number of access rules to be created. Instead, consider creating small zones for point-to-point connections. It is also recommended to have only one initiator per zone. For example, a zone may be created to allow host ESX-1 to access the 3Par storage target. A second zone could be created to allow host ESX-2 to access 3Par, and so on. This is often preferable to creating a single zone with several server and storage system members. With two hosts and a target in the same zone, the switch needs a rule to permit ESX-1 to ESX-2, and to 3Par, and rules in the other direction, from 3Par to ESX-1, along with rules from ESX-2 to ESX-1 and 3Par, and back. The rule count grows quadratically, since every member must be allowed to see every other member: a zone with n members requires n × (n − 1) rules, so 10 members need 90 rules. Creating zones with only two members is recommended, since it preserves hardware resources and reduces configuration effort. As shown in Figure 5-62, defined zones can be grouped into a zone set. The zone database supports multiple zone sets, but only one zone set can be active at any time. The zone database is distributed to all FC switches, and can be configured to share all zone sets, or only the active zone set.
Figure 5-62: Fabric Zoning Concepts
Zone Members As shown in Figure 5-63, zone members can be identified based on FC_ID or WWN.
Remember that an FC_ID is dynamically assigned, and can change over time. This lack of permanence makes the FC_ID a less reliable identifier for security purposes. However, switches also support static FC_ID assignment, thereby eliminating this concern.
Figure 5-63: Zone Members
FC_ID-based zoning is often referred to as “hard” zoning, since it is enforced at the hardware level. This makes it a more secure method of zoning, especially in fabrics that contain untrustworthy nodes, or nodes not under your direct administrative control. WWN-based zoning is often called soft zoning, since it is enforced by the name server. When servers query the name service, zoning can filter the response, so that initiators only learn about authorized targets. Since an FC_ID can change, WWN-based zoning is considered a more stable method of access control. For example, if a VM is configured with a virtual HBA (vHBA), giving it direct Fibre Channel access, the VM has its own FC_ID and WWN. If this VM is moved to a different host, its FC_ID would change, but the WWN is maintained, and the VM would therefore have consistent SAN access. Zone aliases are logical names that the administrator can assign, and to which a host's FC_ID or WWN can be bound. If a CNA or HBA must be replaced, only the zone alias configuration need be updated. The rest of your zone configuration remains valid, since it only references the alias. Zone aliases also ease the process of copying zone configuration from SAN A to SAN B. The WWNs differ between SAN A and SAN B, but since your zone configuration only references aliases, it can simply be copied from SAN A to SAN B.
Zone Enforcement Hard zoning is the default zoning method on Comware devices. Permitted source and destination address information is programmed at the ASIC level, creating a hardware-enforced ACL that permits and denies traffic based on information in the transmitted frames. Since ASICs have a limited number of resources, an overly large zone set may force the switch to use soft zoning. This is especially true for zones that have been configured with many members. The switch from hard to soft zoning occurs automatically when hardware resource limits have been reached, as shown in Figure 5-64.
Figure 5-64: Zone Enforcement
The switch to soft zoning means that filtering is no longer enforced at the packet level. Instead, filtering occurs when the switch responds to name service requests. For example, when the archive server queries the name service for targets, its response only includes the MSA target. Since the 3Par storage target is not included, the archive server is unaware of that target. Access to that target is technically possible, but would require a relatively skilled hacker to determine the FC_ID for 3Par, and reprogram the HBA to transmit frames directly to this FC_ID, without use of standard discovery mechanisms.
Zoning Configuration Prerequisites
Prior to zone configuration, an operational VSAN must be configured, and the port WWNs for hosts must be documented.
Configuration Steps for Zoning Figure 5-65 introduces the steps to configure zoning, as detailed in the following pages.
Figure 5-65: Configuration Steps for Zoning
Configuration Step 1: Prepare Zone Member Alias Zones can directly reference FC_ID or pWWNs. However, zone member aliases can provide ongoing administrative advantages. While multiple members can be configured with the same alias, it is a best practice to configure a unique alias per member to support more granular security controls in the future. In the example shown in Figure 5-66, zone aliases are analogous to objects used in an ACL. Two arbitrary but administratively meaningful zone alias names are configured, and the associated pWWNs are assigned to them.
Figure 5-66: Configuration Step 1: Prepare Zone Member Alias
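The alias definitions from Figure 5-66 can be sketched like this; the alias names and pWWNs here are illustrative placeholders, and the exact syntax may vary by release:

```
system-view
 vsan 10
  zone-alias name esx1                       # alias for the ESX-1 host port
   member pwwn 10:00:00:00:c9:01:01:01      # illustrative pWWN
   quit
  zone-alias name 3par1                      # alias for the 3Par target port
   member pwwn 20:00:00:02:ac:02:02:02      # illustrative pWWN
```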
All zone configuration is VSAN-specific; you must deploy a separate zone configuration for each one. However, with consistent zone aliases, you can simply copy and paste the rest of your zone configuration among VSANs. In other words, the configuration indicated in Figure 5-66 is the only portion of the zoning deployment that will be unique between VSANs. All of the zoning configuration described in the following pages can be configured once, on VSAN 10, and then copied and pasted to your other VSAN.
Configuration Step 2: Define Zones Zone definitions are analogous to individual lines (ACEs) in an access list. Zone members are allowed to communicate with each other. In the example in Figure 5-67, a zone named esx1-3par1 is created and members are specified, based on the aliases created previously. Since aliases were used, this and all remaining zone configuration syntax can simply be copied and pasted into the other VSAN. Also, this example follows the previously mentioned best-practice of creating small, point-to-point zones, to ensure ASIC resources are not unduly taxed.
Figure 5-67: Configuration Step 2: Define Zones
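A sketch of the zone definition, referencing the aliases rather than raw addresses (syntax hedged):

```
system-view
 vsan 10
  zone name esx1-3par1        # small point-to-point zone: one initiator, one target
   member zone-alias esx1
   member zone-alias 3par1
```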
Another example is also shown in Figure 5-67, to reveal the syntax used to base zone membership directly on FC_IDs and pWWNs. This also points out that mixing WWN- and FC_ID-based members is not considered a best practice, and should be avoided.
Configuration Step 3: Define a Zone Set Now that zones have been defined, they can be grouped together into a zone set. Similar to how an ACL groups individual ACEs into an applicable entity, a zone set groups zones together into a single entity. The example in Figure 5-68 shows a set named Zoneset1 being created, with the previously defined zones specified as members.
Figure 5-68: Configuration Step 3: Define a Zone Set
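A sketch of the zone set definition (syntax hedged):

```
system-view
 vsan 10
  zoneset name Zoneset1       # group the zones into one distributable entity
   member esx1-3par1          # previously defined zone
```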
Configuration Step 4: Distribute and Activate Zone Set Continuing with the access list analogy, an ACL groups individual ACEs into an entity that can then be applied to an interface. The defined zone set groups zones together into an entity that can be distributed and activated in the fabric. This entity is distributed to all switches in the fabric, including both native Fibre Channel and FCoE-based switches. Recall that while multiple zone sets can exist in the database, only one can be active in the fabric. As the network administrator, you can configure the distribution to include all zone sets, or only the active zone set. Figure 5-69 shows how to configure the distribution for all zone sets, by using the “full” option, and then how to configure which zone set is to be activated.
Figure 5-69: Configuration Step 4: Distribute and Activate Zone Set
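The distribution and activation from Figure 5-69, sketched with hedged syntax:

```
system-view
 vsan 10
  zoneset distribute full         # share all zone sets, not just the active one
  zoneset activate name Zoneset1  # only one zone set can be active per VSAN
```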
Configuration Step 5: Verify Validation commands include the following: ■ display zoneset vsan ■ display zone name esx1-3par1 ■ display zone-alias ■ display zone member fcid 010001 ■ display zoneset active vsan 10
NPV – NPIV Overview This section provides insight into N_Port Virtualization (NPV) and N_Port ID Virtualization (NPIV). Terms will be defined and related concepts explored. You will then learn how these features can improve multi-vendor interoperability, and how to configure the N_Port Virtualization role on a Fibre Channel device.
Server Virtualization with NPIV The goal of N_Port ID virtualization is to enable Hypervisors such as VMware ESX or Microsoft Hyper-V to extend physical HBA functionality to Virtual Machines (VMs).
Figure 5-70 shows VMware ESX server 1 with a physical HBA (pHBA). This physical HBA performs a traditional FLOGI to the fabric, and so the physical ESX server gains access to LUNs.
Figure 5-70: Server Virtualization with NPIV
For a VM deployment, a virtual HBA is added to each VM’s device list. On the storage system, a LUN is defined for this virtual HBA to access. For example, the VM might be a Microsoft Exchange Mail server, able to directly access the SAN fabric through the virtual HBA, and use the defined LUN to store and retrieve data. Multiple VMs run on one physical host, and each VM requires SAN access. This means that each VM must perform an FLOGI to the fabric, and be assigned a unique FC_ID and WWN. Since a single physical server is hosting multiple VMs, the FC fabric perceives a single physical port performing multiple logins, with multiple addresses assigned.
The FC switch fabric must have the ability to support this scenario, in the form of a feature called N_Port ID Virtualization (NPIV). Support for NPIV is enabled by default on Comware Fibre Channel switches. Both the ESX host and the physical HBA must also support NPIV, since VMs are not aware of this concept. The VM's virtual HBA simply performs a traditional N_Port role, using standard FLOGI communications. Inside the ESX host, a virtual port is created toward each VM. The example in Figure 5-70 depicts two VMs deployed in a single ESX host, so virtual ports 1 and 2 are created. Each virtual port operates as an F_Port. As in physical fabrics, the VM's virtual N_Port connects to this virtual F_Port. However, ESX hosts are not actually switches, and so are not capable of processing the virtual HBA's FLOGI request. This request is forwarded by the physical host to its upstream switch connection. The VM perceives that it is communicating with the virtual F_Port for FLOGI, while the physical server actually forwards this on to the physical switch. With NPIV, the host server's physical HBA performs FLOGI and receives the first available FC_ID. The physical HBA then proxies the VM's virtual HBA login toward the upstream FC switch, and the VM's virtual HBA receives the next available FC_ID. For data forwarding, it is important to understand that all storage traffic must leave the physical HBA. For example, if VM 1 is running FC target software, it can operate as a disk storage system and accept incoming connections. VM 2 is configured as a typical server, and so could use VM 1 as a target. However, this traffic cannot stay inside the ESX environment, because the ESX host is not a Fibre Channel switch: it has no knowledge of FC routing and zoning. This is why the physical ESX host must always forward traffic upstream to an actual switch, which can route the traffic to an appropriate destination. In this example, that could be back over the same physical interface.
This is a very unlikely scenario because most VMs are used as initiating hosts. Still, the possibility does exist, and the scenario serves to highlight the relationship between virtual and physical components.
FC Switch with NPV Mode A Fibre Channel switch can be configured to operate in NPV mode, to take advantage of the NPIV concept. Essentially, this moves the functionality of the ESX-internal virtual ports (described previously) out of the virtual realm, toward the ESX physical server and its directly attached physical switch ports. The NPV mode switch has an uplink to another physical switch, assigned a Domain ID of 0x01 in this example. This uplink is configured as an N_Port Virtualization port (NP_Port), which indicates that it will proxy FLOGI requests to the upstream switch. The downlink ports connected to hosts are configured as F_Ports, since the attached ESX hosts connect with traditional HBA N_Ports. Since no fabric services are provided by this switch, all fabric service requests received from the ESX hosts are proxied to the upstream Fibre Channel switch at Domain ID 0x01. All FLOGI sessions are proxied to this upstream switch, which assigns FC_IDs. In Figure 5-71, the NPV switch logs into switch 0x01, receiving the first available FC_ID of 0x010001. Assuming the ESX servers are the next to log in, they are assigned FC_IDs of 0x010002 and 0x010003.
Figure 5-71: FC Switch with NPV Mode
When one of the physical servers performs a name service lookup, this is also proxied through the NPV switch, on to FC switch 0x01. The same data forwarding rules apply as in the previous example: all traffic must exit the NPV switch. If host ESX 1 requires a storage connection to ESX 2, this request must be forwarded upstream to switch 0x01, which then sends the request out the same physical interface, back through the NPV switch, and on to target 0x010003. As before, this is an unlikely scenario, since hosts are typically organized behind the NPV switch, while storage systems would be connected to the Fibre Channel switch.
FC Switch NPV Mode - Considerations Figure 5-72 compares advantages and disadvantages of using a Fibre Channel switch configured to operate as an N_Port Virtualization.
Figure 5-72: FC Switch NPV Mode - Considerations
NPV can simplify fabric services, because there is no need to distribute zone information to NPV switches. Switches operating in NPV mode do not take part in zone enforcement; all of this communication flows through the NPV mode switch to be processed and enforced by the real Fibre Channel switch. Similarly, there is a reduced number of name service and routing updates, as both of these services are also maintained solely by the real Fibre Channel switches. This means that name service databases are smaller, while routing tables and topologies are simplified.
Meanwhile, redundancy capabilities remain, since NPV mode switches are capable of link aggregation to the native FC switch fabric. The concept of redundant SAN A and SAN B also remains available, by simply configuring two NPV mode devices for the two different fabrics. An advantage for larger deployments is that NPV mode reduces the number of Domain IDs in use, which is limited to 239 per fabric. Since NPV mode switches do not consume a Domain ID, greater scalability is available. Another advantage relates to greater vendor interoperability. There is no real standardized interoperability for the fabric services. The concepts are well described, but most vendors have different features and methods of implementing them; practically speaking, there is little to no actual interoperability between vendors. NPV switches do not take part in actual fabric services. They simply emulate a traditional node (or multiple nodes), and this is a very standardized mechanism that works fine with other vendors. For example, a server could be connected to a 5900 NPV switch, which in turn is connected to a Brocade FC fabric. This effectively integrates a Comware 5900 switch into an existing Brocade fabric. Another example involves using a Virtual Connect FlexFabric module in NPV mode: blade servers connect to the Virtual Connect FlexFabric device, and the Virtual Connect FlexFabric device, configured with an NP_Port, connects to a full Comware-based 5900 Fibre Channel fabric. One perceived disadvantage is that traffic must leave the NPV switch and travel to an actual FC switch to be forwarded to a target. Practically speaking, this is not an issue, since most designs place initiators behind the NPV switch and targets behind the FC fabric switches; traffic must traverse this path anyway. Another possible disadvantage relates to link oversubscription.
Since multiple servers may be connected through a reduced number of uplinks, you must ensure that sufficient bandwidth is available on the uplinks.
Prerequisites to Configure NPV Mode Before configuring a switch to operate in NPV mode, the system working mode must be set to advanced, which requires a system reboot. Also, you should verify that no existing FCoE mode configurations have been applied.
Configuration Steps for NPV Mode The steps to configure NPV mode are shown in Figure 5-73.
Figure 5-73: Configuration Steps for NPV Mode
The steps involve globally enabling this mode on the switch, and then configuring the VFC or FC interfaces. Uplink interfaces must be configured as NP_Ports, and downlink interfaces must be configured as F_Ports. Finally, the configuration should be verified. Notice that step 2 involves configuring the port as either a virtual or native FC port. This means that NPV mode can act as a convenient migration and interoperability mechanism between native Fibre Channel and FCoE systems: a 5900CP could use FCoE over virtual FC interfaces to connect to the servers, while using native FC interfaces to connect to a traditional Cisco, HP, or Brocade fabric. If there are legacy servers with native Fibre Channel, they can be connected to the downstream native FC interfaces of the NPV switch. This switch can connect upstream to a native Fibre Channel storage system, while simultaneously connecting via FCoE to other storage systems.
Configuration Step 1: Configure global NPV mode
The first step is to enable global NPV mode. This is done with the “fcoe-mode npv” command, as shown in Figure 5-74.
Figure 5-74: Configuration Step 1: Configure global NPV mode
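A sketch of the global mode change (the switch must already be in advanced system working mode, per the prerequisites above):

```
system-view
 fcoe-mode npv    # switch-wide: NPV rather than FCF mode for this device
```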
The fcoe-mode command supports a single configuration option only; a single switch cannot function in both NPV mode and Fibre Channel Forwarder (FCF) mode at the same time.
Configuration Step 2: Configure FC or VFC Interfaces The second step is to configure FC or virtual FC interfaces, similar to previous configurations. You can configure a native Fibre Channel interface, ensuring that the correct optics have been installed. The top example in Figure 5-75 shows an interface that started under the assumption that it would operate as 10Gbps Ethernet. The command “port-type fc” was issued, converting this port to a native Fibre Channel port.
Figure 5-75: Configuration Step 2: Configure FC or VFC Interfaces
The bottom example in Figure 5-75 illustrates the configuration of a virtual fibre channel interface for FCoE, assuming that DCB is already configured. Interface tengigabit 1/0/4 is configured as a trunk link, and configured to allow VLAN 10 to traverse this link. Next, interface VFC 4 is created, bound to the interface ten1/0/4, and made a member of VSAN 10.
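The two configurations described above might look like the sketch below. The VFC example follows the interfaces named in the text (Ten-GigabitEthernet 1/0/4, VFC 4, VLAN 10, VSAN 10); the interface number in the native FC example, and the exact VSAN membership syntax, are assumptions patterned on typical Comware 7 configuration.

```
# Top example: convert an Ethernet port to a native FC port
[5900CP] interface Ten-GigabitEthernet 1/0/1
[5900CP-Ten-GigabitEthernet1/0/1] port-type fc

# Bottom example: a virtual FC interface for FCoE (DCB already configured)
[5900CP] interface Ten-GigabitEthernet 1/0/4
[5900CP-Ten-GigabitEthernet1/0/4] port link-type trunk
[5900CP-Ten-GigabitEthernet1/0/4] port trunk permit vlan 10
[5900CP-Ten-GigabitEthernet1/0/4] quit
[5900CP] interface Vfc 4
[5900CP-Vfc4] bind interface Ten-GigabitEthernet 1/0/4
[5900CP-Vfc4] port trunk permit vsan 10
```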
Configuration Step 3: Uplink Interface NP_Port

The uplink interface should be configured as the NP_Port (N_Port Virtualization port). This is the interface that will connect to an available port on a Fibre Channel switch. This would typically be a Fibre Channel Forwarder (i.e., a native Fibre Channel switch). However, it could be another NPV switch’s F_Port. For example, a blade server could be connected to a Virtual Connect FlexFabric module, as shown in Figure 5-76. The Virtual Connect FlexFabric module, operating in NPV mode, connects to a 5900 CP, also operating in NPV mode, which is connected to a Brocade SAN switch.
Figure 5-76: Configuration Step 3: Uplink Interface NP_Port
In this scenario, all the fabric logins will be handled by the Brocade SAN switch with a single domain ID. As long as this is a typical deployment, and storage system targets are not running on the blade server, the system will work fine.
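A minimal sketch of the uplink configuration is shown below. The interface number is illustrative, and the “fc mode np” form is an assumption patterned on the “fc mode f” command that the text shows for step 4; verify it against the command reference for your release.

```
[5900CP] interface Vfc 100
[5900CP-Vfc100] fc mode np
```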
Configuration Step 4: Downlink Interfaces F_Port

The downlink interfaces will connect to the N_Port of the server’s HBA or CNA. The hosts will see an F_Port and initiate FC link setup. This host FLOGI session will be proxied by the NPV switch to the upstream Fibre Channel switch.
As shown in Figure 5-77, this is accomplished with the “fc mode f” command.
Figure 5-77: Configuration Step 4: Downlink Interfaces F_Port
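Continuing the sketch, the downlink VFC interface can be set to F_Port mode with the command from the text; the interface number follows the earlier VFC example.

```
[5900CP] interface Vfc 4
[5900CP-Vfc4] fc mode f
```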
Configuration Step 5: Verify Status

The NPV switch status should be verified. Using the “display npv login” command, you can see the actual logins. Even though they are not processed by the NPV mode switch, it still keeps track of logins, enabling you to see which FC devices are logged in to which port. This is important. When the NPV mode switch receives data for a specific FC_ID, it must know which downstream interface should receive this traffic. Use the “display npv status” command to validate operational status. With the “display npv traffic-map” command, you can see which downstream ports are currently using which upstream ports. If multiple upstream ports are available, the NPV switch can perform a kind of load distribution. It does this by assigning some downstream ports to uplink 1 and other downstream ports to uplink 2, for example. This can be seen by displaying the traffic map.
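The three verification commands named above can be issued from user view; output formats vary by software release, so none is shown here.

```
<5900CP> display npv login
<5900CP> display npv status
<5900CP> display npv traffic-map
```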
Summary

In this chapter, you learned how various infrastructure components create an integrated SAN fabric. This included discussions about HBAs, CNAs, native Fibre Channel switches, and FCoE switches. You learned that FC fabrics deploy various numbering schemas, such as FC_ID addresses for data transmissions, static and FSPF-based routing, and WWNs with zoning to control the targets that specific server initiators are allowed to use.
Key services provided by the SAN fabric include the Fabric Login service, which formalizes the connection of hosts to the fabric, and the Simple Name Service used to map FC_IDs to WWNs. VSANs enable separate logical storage fabrics to share a common physical infrastructure, which can lower costs and improve security. You also learned that MPIO is required to leverage the improved reliability and performance provided by multi-path redundancy and load-sharing. Finally, NPIV and NPV were discussed as methods of enabling hypervisors such as VMware’s ESX or Microsoft’s Hyper-V to make HBA or CNA functionality available to internally hosted Virtual Machines.
Learning Check

Answer each of the questions below.

1. Which statement below accurately describes an FCoE deployment consideration?
a. FCoE protocols and systems are easily deployed as a multi-vendor solution.
b. Nearly any HP switch and storage system combination can be used to deploy an FCoE solution.
c. There is no need to deploy a specific version of firmware to the switches in an FCoE deployment.
d. HP has devised a set of converged networking cookbooks to ensure you are deploying a validated set of storage arrays, servers, CNAs, and switches, using specific firmware versions.

2. Which are correctly described components of a typical FCoE deployment (Choose three)?
a. A host is a server system that initiates disk read or write requests.
b. A disk array is a target device that responds to disk read or write requests from a host.
c. N_Ports are used to connect host nodes to the fabric, while T_Ports connect target disk end systems to the fabric.
d. F_Ports are fabric ports that connect to either host initiators or target disk arrays.
e. E_Ports are used to expand the fabric to multiple switches.

3. Which statements below accurately describe FCoE naming and forwarding conventions (Choose four)?
a. A WWN provides a unique identifier for each FCoE device to enable frame delivery.
b. The WWN is somewhat like a BIA for a Layer 2 network interface that can identify systems independently from the FC_ID.
c. The FC_ID is a dynamically assigned address that is used as the source and destination address of a frame.
d. With Comware devices, the FC_ID uses 16 bits to identify ports, increasing addressing scalability.
e. Each switch is assigned a unique domain ID. This ID must be manually assigned.
f. The domain ID can be dynamically assigned or manually assigned.

4. Which statements below accurately describe FC classes and flow control (Choose two)?
a. BE flow control uses an R_RDY message for flow control.
b. BB flow control uses a frame credit mechanism to help provide lossless frame transmission.
c. All flow control mechanisms can be used by all FC classes.
d. EE flow control uses an ACK frame to let the transmitter know that the previous frame was successfully received.
e. BB flow control is not well suited for time-sensitive applications.

5. What are two prerequisites to configuring NPV mode on a switch (Choose two)?
a. System-working-mode must be set to advanced, which immediately takes effect.
b. No existing fcoe-mode configurations should be in place.
c. System-working-mode must always be left to its default value.
d. System-working-mode must be set to advanced, which takes effect after a reboot.
e. The correct fcoe-mode must be configured before NPV mode is activated.
Learning Check Answers

1. d
2. a, b, d, e
3. b, c, d, f
4. b, d
5. b, d
6 Transparent Interconnection of Lots of Links (TRILL)
EXAM OBJECTIVES

In this chapter, you learn to:
✓ Describe the goal of TRILL.
✓ Describe the use cases of TRILL.
✓ Understand the operation of TRILL.
✓ Configure TRILL.
INTRODUCTION

This chapter is focused on the TRILL protocol. You will learn the motivation for its development, and how it can provide very large-scale Layer 2 infrastructure for data centers. You will also explore the details of TRILL operation, and understand how TRILL leverages Layer 3 technologies to provide Layer 2 services. TRILL design and deployment considerations will be discussed, as well as how to configure TRILL on Comware devices.
TRILL Introduction

TRILL is an IETF standard whose name stands for Transparent Interconnection of Lots of Links. The goal of the TRILL protocol is to provide large-scale Layer 2 fabric services. The intent is to maintain the simplicity of traditional Layer 2 systems while adding the scalability and convergence of a Layer 3 routed network.
From the perspective of an endpoint device, a standard frame continues to transport data from source to destination MAC address. However, traditional Spanning Tree Protocol (STP) between switches is replaced with the routing-like functionality of TRILL. STP uses a single path in the network, which may not actually be the best path for a specific source-to-destination traffic flow. With TRILL, Layer 2 forwarding is based on best path selection, very much like that of OSPF or IS-IS. This provides actual best path selection while supporting a redundant, active-active topology.
TRILL Standards

The TRILL protocol is documented in a set of IETF RFCs, and was developed by Radia Perlman, developer of the original IEEE STP. Figure 6-1 shows the RFCs that describe the actual operation of the TRILL protocol.
Figure 6-1: TRILL Standards
TRILL frames use a hop count or Time-To-Live (TTL) mechanism, which must be processed by switch hardware. For this reason, TRILL will typically only be supported on newer-generation data center switches which have been designed with ASICs that support this TTL processing.
TRILL Concepts 1

A switch that runs TRILL is called a Routing Bridge, or RBridge. This is because it is a Layer 2 bridging device that uses routing functionality to determine optimal data flow. Figure 6-2 shows four RBridges that form the TRILL network. This network supports connectivity for standard endpoints and Classic Ethernet switches. Two such switches are indicated in the figure, as CE-Switch1 and CE-Switch2.
Figure 6-2: TRILL Concepts 1
Each RBridge is identified by a unique system ID, which is automatically generated by the TRILL protocol. The ID is based on the device MAC address by default. A system ID can be manually configured, but automatic generation works well. The system ID is not used to forward frames. It is used to uniquely identify each RBridge inside the Link State Database (LSDB) of the TRILL network. This can be compared to how each OSPF router is identified by its router ID in OSPF’s LSDB, often derived from the IP address of a loopback interface.

An RBridge forwards frames through a TRILL network based on source and destination nicknames. Each RBridge in the figure has a unique hexadecimal value for this purpose. RB1 has been assigned a nickname of 0x0001, RB2 has 0x0002, and so on. When RB1 sends a frame to RB2, TRILL adds source and destination nicknames to the frame. In this scenario, RB1’s nickname of 0x0001 is the source, and RB2’s nickname of 0x0002 is the destination. Each RBridge in the path will process these frames based on the destination nickname. Again, this can be compared with how a routing protocol like OSPF routes packets based on destination IP address.

Nicknames can be automatically generated by the system. Unlike IP addressing, nicknames do not have a hierarchical structure. There is no network and host portion for nicknames, and neither is there any sort of mask. Nicknames create a simple, flat address space. Nicknames are 16-bit values, so the theoretical number of available addresses is 65,536. They can be randomly chosen from this space. However, an administrator can manually assign nicknames, making network documentation and diagnostics more intuitive. Instead of random hexadecimal numbers, a schema can be used to distinguish between distribution and access switches, or perhaps to indicate rack locations inside a data center.

TRILL is based on the IS-IS link-state routing protocol. IS-IS operates between all RBridges, exchanging link state information, and building an LSDB. The SPF algorithm is run on this database to determine optimal paths through the TRILL network. With TRILL terminology, CE stands for Classic Ethernet, and is used to indicate a traditional Ethernet switch that does not run the TRILL protocol.
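On Comware switches, enabling TRILL and manually assigning a nickname might look like the sketch below. The exact command forms and the TRILL view prompt are assumptions based on typical Comware 7 syntax, not taken from this text; consult the TRILL configuration guide for your software release. The annotations are explanatory, not part of the CLI.

```
[RB1] trill                      # enable TRILL globally and enter TRILL view
[RB1-trill] nickname 0x0001      # optional: manually assign this RBridge's nickname
```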
TRILL Concepts 2

Like all link-state routing protocols, TRILL uses the concept of a Designated Router, called the Designated RBridge (DRB). DRBs improve update efficiency on multi-access networks by acting as a single point of contact. Thus, every device on a multi-access broadcast domain needn’t communicate with every other device. Each device need only communicate new information to the DRB, which will inform others. Each DRB generates link state packets (LSPs) on behalf of its multi-access network.

Another use for the DRB is concerned with access links, which are used to connect the TRILL network to endpoints or CE switches. The DRB is responsible for selecting the Appointed VLAN Forwarder (AVF). CE-Switch2 in Figure 6-3 has a traditional Ethernet connection to RB1 and RB2. The TRILL protocol can detect that RB1 and RB2 are interconnected via this CE device. Each RBridge sends HELLO packets. RB1’s HELLO packet is transported by CE-Switch2, and so arrives inbound at RB2. The same thing happens when RB2 sends a HELLO packet, and so the two RBridges discover each other on this access link.
Figure 6-3: TRILL Concepts 2
Once RB1 and RB2 discover each other, they elect a single DRB for the link. In this example RB2 has won the election for the link connected to CE-Switch2. The DRB can then select an AVF, which ensures that each access network VLAN is only allowed a single connection to the TRILL network. This avoids loops. It also prevents the TRILL network from perceiving a single MAC address as being sourced by multiple RBridges. This feature also enhances scalability when multiple access VLANs connect to a TRILL network.

TRILL is VLAN-aware, and so the links from CE-Switch2 to RB1 and RB2 can be 802.1q trunks. In this scenario, a thousand VLANs are passing over these trunk links. To split the load, the DRB might select itself as the AVF for VLANs 1-500 and it could appoint RB1 to forward traffic for VLANs 501-1000. In this way, the DRB controls which device forwards traffic for a particular VLAN. The actual port which is doing the forwarding is called the Appointed Port (AP).
This is similar to how Multiple Spanning-Tree Instances (MST) works. While this delegation of traffic load is defined in the TRILL standard, the HP Comware implementation does not support such distribution. Instead, the DRB is always the AVF, with no distribution mechanism available. However, this is not a practical limitation. TRILL can be combined with IRF topologies, allowing an alternative mechanism for this redundancy.
TRILL Concepts 3

RBridges peer with each other using the IS-IS link state protocol. They exchange link state information, build an LSDB, and calculate the best path to each destination. These destinations are identified based on system IDs, which act as the LSP identifier. The LSP also includes the nickname of the destination RBridge. The ingress RBridge is the TRILL device receiving an Ethernet frame from an endpoint or a CE switch. This ingress RBridge encapsulates the original Ethernet frame in a new TRILL frame. This new frame contains source and destination nicknames in the TRILL network. The ingress RBridge sets its own nickname as the source, and determines an appropriate destination nickname to apply. This new frame is routed through the TRILL network via the shortest path, based on the destination nickname. In Figure 6-4, suppose RB1 must transmit data to RB2. A traditional STP topology might choose the path from RB1 > RB6 > RB3 > RB7 > RB2. TRILL uses IS-IS, and so will determine a more elegant path from RB1 to any one of the switches RB6-9, and then straight down to RB2.
Figure 6-4: TRILL Concepts 3
In this scenario multiple paths exist between RBridges. If these paths are of equal cost, then typical Layer 3 load-balancing principles can be applied. Endpoint unicast traffic entering RB1 can be load-shared over the four different paths to destination RB2. This is a significant scalability improvement over the original Spanning Tree Protocol. The RBridge connected to the destination end point is called the egress RBridge. This device must remove the TRILL header, and transmit a standard Ethernet frame to the endpoint. The operation of TRILL is not affected whether frames enter as native, untagged Ethernet frames, or whether they include an 802.1q tag. TRILL simply processes frames based on the source and destination MAC address. Any 802.1q VLAN tag will be maintained over the TRILL network.
TRILL Frame Format

Figure 6-5 shows the three sections of header information that precede the payload. This includes an outer header, the TRILL header, and an inner header.
Figure 6-5: TRILL Frame Format
The outer header is a typical Ethernet frame. It is built with destination and source MAC addresses to traverse a single link between two RBridges, in a hop-by-hop fashion. The receiving RBridge strips off the outer header, determines the best outbound interface, creates a new outer header, and sends the frame. This frame arrives at the next-hop RBridge, and the process continues. It is possible to use a VLAN tag in this outer header, but know that this 802.1q tag has local significance only. It is like a routed sub-interface used to connect two routers.

The TRILL header operates similarly to an IP header in a Layer 3 routed network. This end-to-end header is maintained across the entire TRILL network. This header contains the nickname of the ingress RBridge as the source, and the egress RBridge nickname as the destination. Each RBridge receiving a TRILL frame discards the outer header, analyzes the TRILL header’s destination nickname, and selects the outbound interface along the best path. It then builds a new outer header, and transmits the frame to the next-hop RBridge. Each RBridge also decrements the Hop Count field in the TRILL frame. This is similar to how the Time-To-Live (TTL) field is utilized in a Layer 3 routing protocol. It mitigates any loops inside the TRILL network.

Inside the TRILL frame is the original Ethernet frame created by an end system. The egress RBridge strips off all other headers, leaving only this original Ethernet frame, before transmitting to the intended endpoint.
TRILL Frame: Outer Header 1

As a best practice, RBridges should be directly connected. In Figure 6-6, this is not the case, as it is possible to insert an intermediate CE switch between two RBridges. This CE switch can perform traditional L2 forwarding, since the outer header has a valid source and destination MAC address. Specifically, this would be the source and destination MAC address of the RBridges on the local link. Since these are typical MAC addresses, the CE switch can learn the addresses and forward frames.
Figure 6-6: TRILL Frame: Outer Header 1
The scenario depicted can serve to clarify why any VLAN tag on the outer header is only locally significant. It is only used to traverse the connection between two directly-connected RBridges. Of course, there is an intermediate CE switch between the RBridges in this case. This CE switch must be configured to support any VLANs and tagging used between RBridges. Stated another way, TRILL should not attempt to use VLANs not supported by intermediate CE switches. In reality this will be automatically handled by TRILL. TRILL exchanges IS-IS HELLO packets over multiple VLANs. These HELLO packets will be tagged with multiple VLAN IDs, enabling the RBridges to detect which VLANs can be successfully passed between each other and which ones cannot.
TRILL Frame: Outer Header 2

Any VLAN tag used by the outer header is referred to as the designated VLAN. It acts as a transport VLAN tag used to reach the RBridge on the other side of a single link. If this link supports 802.1q tagging, then multiple VLANs could be available for the RBridges to use. In this case, the lowest VLAN ID is selected as the designated VLAN. If enabled on the interface, this would be VLAN 1. Often, VLAN 1 is configured as the PVID, which is untagged. If this is the case, then the outer header requires no VLAN tag. The frames will be transmitted by RBridges as untagged.

If an intermediate Layer 2 switch is deployed, you must ensure that it is properly configured for VLAN tagging. This CE switch may support multiple VLANs, some of which are used for other purposes in the traditional Ethernet environment. In this case, you can manually configure a designated VLAN. For example, suppose that VLANs 10 through 20 are supported between RB1 and RB2, but VLANs 10 through 19 are used for other purposes by the CE switch. The administrator can manually configure the designated VLAN as VLAN 20. The designated VLAN is configured per interface. If RB1 has a different interface connected to RB4, that interface could be configured to use VLAN 1 as the designated VLAN. Again, this is an optional configuration. RBridges should be directly connected, and so would normally use untagged VLAN 1 for TRILL.
TRILL Frame: TRILL Header

The TRILL header functions very much like a typical Layer 3 IP header. The ingress RBridge receives user traffic. To build the TRILL header, it sets its own nickname as the source. It then determines the best path to reach the destination, and sets the appropriate egress RBridge nickname as the destination. It starts this process by comparing the destination MAC address in the original header to its own MAC address table. If the destination address is unknown, then the packet will be forwarded to the destination nickname All-RBridges. Frames sent to the All-RBridges nickname follow a multicast distribution tree and are processed by all RBridges.

The MAC address is known if there is a MAC table entry for that address. This MAC table doesn’t bind MAC addresses to interfaces like a classic Ethernet switch. Instead, the table binds MAC addresses to nicknames. The switch can thus choose the appropriate egress RBridge, and place that nickname into the destination address field of the TRILL header. Actual traffic forwarding inside the TRILL network is based on this destination nickname.
TRILL Frame Flow Overview

Figure 6-7 introduces the general concept of how a frame travels through a simple TRILL fabric. Only the header fields that are most important to understanding general concepts are shown.
Figure 6-7: TRILL Frame Flow Overview
1. The client named CL1, with MAC address C1, creates a frame destined for the client named CL3, with a MAC address of C3. In the example, an 802.1Q tag of 10 happens to be added, which could be the case if the end node were a VM-based server. If it were an actual end client, an 802.1Q tag might not be included. The point is that ultimately, the original frame created by the source node will be successfully delivered to the destination node.
2. The TRILL Routing Bridge RB1 receives this frame. In this case, RB1 has learned that CL3 is located at the site behind RB3. The process of how this is learned will be discussed in the next section of this chapter. RB1 adds a TRILL header, placing its nickname in the Ingress RB field, and RB3’s nickname as the ultimate destination.
a. The hop count field is set to a number high enough to reach the ultimate destination. The Multi-destination, or “M” bit is set to zero, indicating that the frame is a unicast frame.
b. RB1 adds an outer frame, with itself as the source, and RB2 as the destination. In this example, VLAN 20 is used to move frames through the TRILL fabric.
3. RB2 receives the frame, strips off the outer header, and adds a new one, with itself as the source and RB3 as the destination. With the exception of the hop count field being decremented, the TRILL header is not modified. The example uses Ethernet as the only Layer 2 protocol. However, the connection between Routing Bridges could be PPP, or nearly any other Layer 2 protocol. The frame is transmitted to RB3.
4. RB3 receives the frame, and strips off the outer header. The TRILL header indicates that RB3 is the egress Routing Bridge, and so the TRILL header is also removed.
a. Now the original, inner header can be analyzed. RB3 sees that the destination MAC address is for client CL3, and forwards the frame accordingly.
TRILL Operation

This section is focused on TRILL operation. This includes both unicast and multi-destination forwarding, along with a discussion of how TRILL forms adjacencies. TRILL link types, trees, and DRB operation will also be covered, along with some design considerations.
TRILL Forwarding: Multi-Destination 1

Multi-destination traffic applies to broadcast, multicast, and unknown unicast traffic. To handle this traffic, TRILL creates a distribution tree to ensure loop-free delivery. This is very similar to a classic Spanning Tree topology. However, while STP uses this topology for all traffic, TRILL only uses it for broadcast, multicast, and unknown destination unicast frames. It uses the shortest path for all other traffic. The basic principle is that there are some tree roots inside the TRILL network. All other RBridges calculate the shortest path to the tree root. These paths are indicated by the thick lines in Figure 6-8. Multi-destination traffic will be forwarded based on that tree topology.
Figure 6-8: TRILL Forwarding: Multi-Destination 1
TRILL Forwarding: Multi-Destination 2

In the example scenario in Figure 6-9, Server2 and Server4 need to communicate. To begin this communication, Server2 sends an ARP broadcast to discover Server4’s MAC address.
Figure 6-9: TRILL Forwarding: Multi-Destination 2
RB2 is the ingress RBridge for this frame. It receives the frame and determines that the destination address is a broadcast. It creates a TRILL header by adding its own nickname as the source and by adding the tree name as the destination. Usage of a unique tree name enables multiple distribution trees within a single TRILL topology. The ability to use multiple distribution trees enables load balancing across the TRILL fabric. It is the ingress RBridge that chooses which path shall be used.
TRILL Forwarding: Multi-Destination 3

RB2 forwards this frame into the tree. Core RBridge RB1 receives the frame and forwards it based on the tree name. Downstream access RBridges RB3 and RB4 receive this frame and analyze the endpoint user’s source MAC address, along with the source nickname, as shown in Figure 6-10. Thus the sender’s MAC address is now learned. It is added to the TRILL switch’s MAC address table and bound to the associated source nickname, as gleaned from the TRILL frame.
Figure 6-10: TRILL Forwarding: Multi-Destination 3
Compare this to traditional Ethernet switches. A CE switch looks at the source MAC addresses of frames to automatically bind MAC addresses to outgoing ports, and stores this information in a MAC address table. Similarly, a TRILL switch looks at source MAC addresses to automatically bind them to source nicknames, storing this information in a TRILL MAC address table. Each TRILL access switch that received this frame will remove the TRILL header and send a traditional Ethernet broadcast frame out its access ports. Both Server3 and Server 4 receive the ARP request from Server 2.
TRILL Forwarding: Unicast 1

As described in Figure 6-11, endpoint MAC addresses are learned, and bound to a nickname. Once this occurs, the TRILL devices can perform unicast forwarding.
Figure 6-11: TRILL Forwarding: Unicast 1
TRILL Forwarding: Unicast 2

Continuing with the scenario in Figure 6-12, Server4 has received the ARP request from Server2. It recognizes its address in the request, and so must send an ARP unicast reply back to Server2.
Figure 6-12: TRILL Forwarding: Unicast 2
RB4 receives this unicast frame, and looks up the destination MAC address in its table. It finds a match, and sees that the MAC address for Server 2 is bound to the nickname for RB2. RB4 creates a TRILL frame, using its nickname as the source, and RB2’s nickname as the destination.
TRILL Forwarding: Unicast 3

RB4 finds a match for the destination MAC address, and so determines the destination nickname is RB2’s. Next, RB4 looks up the destination interface for this nickname. As TRILL devices, these RBridges have been running IS-IS, and have already determined the shortest path to RB2. RB4 thus knows which physical interface should be used to forward data to RB2. RB4 forwards the TRILL frame toward RB2 over this interface.
Figure 6-13: TRILL Forwarding: Unicast 3
TRILL RBridge Peering and Adjacencies

As discussed, TRILL uses the IS-IS link state routing protocol, using the SPF algorithm. This IS-IS protocol runs at Layer 2 on the switches, using its own hello frame for peer discovery. The TRILL IS-IS protocol operates as a result of basic TRILL configuration, and does not require specific configuration of its own. Links and link costs are described as TLVs inside LSPs. As shown in Figure 6-14, the information from these LSPs is stored in the LSDB. The LSDB includes every possible path to each destination. Then the SPF algorithm is performed on this database to determine the best paths. This is based on path costs, as advertised in LSPs.
Figure 6-14: TRILL RBridge Peering and Adjacencies
Just like with OSPF running on Layer 3 routers, the LSDB is a list of all possible paths, and the routing table is a list of the best paths. This routing table lists every destination nickname, along with the outgoing interface used to get there.
TRILL Port Types

TRILL uses three port types. These are TRILL access ports, TRILL hybrid ports, and TRILL trunk ports. These port types should not be confused with traditional interface port types. They are separate and distinct from VLAN access, hybrid, and trunk ports. Figure 6-15 represents TRILL access ports with a solid line. TRILL access ports only forward user frames. They do not accept TRILL-encapsulated frames. TRILL hello frames may be exchanged on these interfaces, in case a designated RBridge election is required.
Figure 6-15: TRILL Port Types
If you are sure that a port will be the only TRILL access port for a given segment, you can use the access “alone” option. This effectively makes the interface a silent interface, since no hello frames will be exchanged. If two access alone switches are connected to a common CE switch, they will both be in the forwarding state. This causes MAC address flapping inside the TRILL network. For this reason, you should be mindful when using this option. In Figure 6-15, Server4 has a single link to RB4, making it safe to use the access alone option on that interface.

TRILL is fully supported in combination with IRF. RB4 could be an IRF system of two physical switches, with Server4 connected to two physical interfaces in a bonded or teamed configuration. This logical interface would be perceived by TRILL as a single interface. In this way it is possible to gain redundancy for endpoints connected to access alone ports.

Figure 6-15 also shows CE-Switch3 connected to both RB3 and RB4 for redundancy. In this scenario, it is vital that all the traffic entering the TRILL network from CE-Switch3 enters via a single port. To that end, the RB3 and RB4 connections to this switch are configured as access ports. Therefore they do not support TRILL backbone traffic, but will exchange TRILL hello frames. One of them will be elected as the designated RBridge and will actually forward CE-Switch3 frames into the TRILL network. The other interface will be in the blocking state.
Like an access port, a TRILL hybrid port can forward user frames. It can also be used to transmit TRILL frames and operate as a transit link. Figure 6-15 shows that the RB1 and RB2 connections to CE-Switch2 are configured as hybrid ports. This is not a recommended configuration, but it does provide some additional redundancy. If the links connecting RB1 and RB3 on the left to RB2 and RB4 on the right were to fail, there would still be a backbone-capable path via CE-Switch2. Most of the VLANs in a CE switch will be user VLANs. In this hybrid scenario it can make sense to manually specify the designated VLAN, which will be the actual transport VLAN used by TRILL. TRILL trunk ports can only be used to transmit TRILL frames. They do not support the transmission or receipt of traditional Ethernet frames. Only connections with other RBridges can be made. Effectively, only a single VLAN is needed on a TRILL trunk port.
TRILL Multi-Destination Tree Calculation 1 You previously learned how TRILL multi-destination traffic is forwarded very much like a classic STP scenario, over a spanning-tree topology with one bridge acting as the tree’s root. Like classic STP, root bridge election is based on a configurable priority value that defaults to 32768. The difference is that with STP, the lowest priority value wins the election, while with TRILL the highest priority wins. If there is a tie, the highest nickname wins. In Figure 6-16, RB1 is configured with priority 65000, so it is the tree root. RB2 has the second-highest priority of 64000, and so would become the new root if RB1 were to fail.
Figure 6-16: TRILL Multi-Destination Tree Calculation 1
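The root election rule described above (highest priority wins, highest nickname breaks ties — the inverse of STP's lowest-wins rule) can be sketched in Python. The priorities are taken from the text; the nicknames are illustrative values:

```python
def elect_tree_root(rbridges):
    """Pick the TRILL multi-destination tree root.

    Unlike STP, the HIGHEST priority wins the election; ties are
    broken by the highest nickname.
    """
    return max(rbridges, key=lambda rb: (rb["priority"], rb["nickname"]))

# Priorities modeled on Figure 6-16; nicknames are made up for the sketch.
rbridges = [
    {"name": "RB1", "priority": 65000, "nickname": 0x0001},
    {"name": "RB2", "priority": 64000, "nickname": 0x0002},
    {"name": "RB3", "priority": 32768, "nickname": 0x0003},  # default priority
]
root = elect_tree_root(rbridges)                                   # RB1
backup = elect_tree_root([rb for rb in rbridges if rb is not root])  # RB2
```

If RB1 fails, rerunning the election over the remaining RBridges yields RB2, matching the failover behavior described above.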
TRILL Multi-Destination Tree Calculation 2 A Spanning Tree can have only one Root Bridge. However, that Root Bridge can request that other trees be created in the same physical topology, each with its own Root Bridge. This can make sense if multiple equal cost paths are available towards the tree root. If multiple equal-cost paths are not available the different trees will simply follow the same topology. In Figure 6-17, RB1 is the tree root, which has been configured to request two tree calculations.
Figure 6-17: TRILL Multi-Destination Tree Calculation 2
RB3 does not have multiple equal-cost paths to RB1. Therefore, the trees for both VLAN 11 and VLAN 12 use the same link to RB1. This is also the case for RB2. RB4 has two equal-cost paths to RB1. This means that the tree for VLAN 11 can get to the RB1 root via RB2, while the tree for VLAN 12 can use RB3. In this way, traffic load can be shared among equal-cost paths. The ingress RBridge maps traffic to a multicast distribution tree per VLAN. This mapping is not a configurable part of the TRILL definition. In the example, VLAN 11 may use the topology indicated by the dashed lines, and VLAN 12 may use the topology indicated by the solid lines. Or the opposite may be the case. As long as the load sharing occurs, network performance is optimized.
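The per-VLAN spreading of traffic across trees can be modeled as below. The modulo mapping is an assumption for illustration only — the actual hash is internal to TRILL and, as noted above, not configurable:

```python
def tree_for_vlan(vlan_id, num_trees):
    """Map a VLAN to one of the calculated distribution trees.

    A simple modulo spread (illustrative, not the real algorithm)
    shows how adjacent VLANs can land on different trees when the
    root has requested multiple tree calculations.
    """
    return vlan_id % num_trees

# With two trees, as in Figure 6-17, VLAN 11 and VLAN 12 take
# different trees, sharing load over the equal-cost paths.
assert tree_for_vlan(11, 2) != tree_for_vlan(12, 2)
```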
TRILL Designated Routing Bridge A DRB is selected for each link, based on the highest priority assigned to RBridge interfaces on that link. The default priority is 64. In case of a tie, the highest MAC
address will win. Figure 6-18 shows that RB1 and RB2 are connected in a TRILL backbone. RB1 is the DRB, which means that either someone configured it with a higher priority, or the priorities on RB1 and RB2 were the same, and RB1 had a higher MAC address.
Figure 6-18: TRILL Designated Routing Bridge
DRB responsibilities depend on link type. On backbone links, the DRB must advertise a pseudo node, used to represent that link. This is standard IS-IS
terminology, and similar to how an OSPF Designated Router (DR) operates on any multi-access network, like Ethernet. This responsibility has to do with the internal operation of the link-state protocol. For access links, the DRB must assign the AVF role, which controls actual data flow. As previously mentioned, with Comware implementations, the DRB always assigns itself as the AVF. Figure 6-18 shows RB1 and RB2 connected via CE-Switch2. Since RB1 is a Comware switch that won the DRB election, it also fills the role of AVF. All data from CE-Switch2 enters the TRILL network via RB1. RB2 is neither a DRB nor AVF. If the connections from both RB1 and RB2 support the same VLAN or set of VLANs, then RB2’s connection to CE-Switch2 remains in a blocking state. This prevents a loop. However, if RB1’s connection to CE-Switch2 only supports VLAN 100, while RB2’s connection only supports VLAN 200, then there is no need for RB2’s connection to enter a blocking state. The RBridges are aware of VLANs through the use of Hello frames.
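The DRB election and the per-VLAN blocking decision described above can be sketched as follows. The priorities and MAC values are illustrative:

```python
def elect_drb(ports):
    """Per-link DRB election: highest interface priority (default 64)
    wins; ties are broken by the highest MAC address."""
    return max(ports, key=lambda p: (p["priority"], p["mac"]))

def port_blocks_vlan(port, drb, vlan):
    """A non-DRB port must block a VLAN only if the DRB's port also
    carries that VLAN; disjoint VLAN sets need no blocking (the
    RB1/RB2 and CE-Switch2 scenario above)."""
    if port is drb:
        return False
    return vlan in drb["vlans"]

# Illustrative ports: equal priority, so RB1's higher MAC wins the DRB role.
rb1 = {"name": "RB1", "priority": 64, "mac": 0x00AA, "vlans": {100}}
rb2 = {"name": "RB2", "priority": 64, "mac": 0x0099, "vlans": {200}}
drb = elect_drb([rb1, rb2])
# RB2's port carries only VLAN 200, which the DRB does not serve,
# so it can stay forwarding for that VLAN.
blocked = port_blocks_vlan(rb2, drb, 200)
```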
Unicast ECMP ECMP allows load sharing across multiple equal-cost paths. This can be controlled in part by device-level maximum ECMP settings. This option functions at the ASIC level; it controls the maximum number of paths that can be calculated for all ECMP-capable protocols, including OSPF, BGP, TRILL, and so on. All of these protocols must respect the hardware’s configured maximum capability. ECMP is also controlled by the TRILL maximum unicast ECMP path setting, which defaults to 8 paths. The 5900 series supports up to 32 paths at the hardware level. This configuration can never exceed the device-level ASIC’s maximum ECMP path setting. It is the ingress RBridge that performs traffic distribution. In the example in Figure 6-19, Server4 transmits data to Servers A1 and A2. The MAC addresses for A1 and A2 are both learned on RB1. Since RB4 has multiple equal-cost paths to RB1, it can distribute the packet load over those paths.
Figure 6-19: Unicast ECMP
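The interaction between the TRILL setting and the ASIC-level limit reduces to taking the smaller of the two bounds, as this sketch shows (the 32-path cap is the 5900-series hardware limit from the text):

```python
def effective_ecmp_paths(trill_max=8, asic_max=32, hardware_cap=32):
    """The usable TRILL unicast ECMP width is bounded by the TRILL
    setting (default 8), the device-level ASIC maximum, and the
    hardware cap (32 paths on the 5900 series)."""
    return min(trill_max, asic_max, hardware_cap)

assert effective_ecmp_paths() == 8                         # defaults
assert effective_ecmp_paths(trill_max=16, asic_max=12) == 12  # ASIC limit wins
```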
Design Considerations 1 The configuration of Layer 3 routing can impact TRILL operation. This is because a device cannot perform both TRILL Layer 2 functions and IP Layer 3 routing on the same interface. Therefore, the TRILL devices must be pure Layer 2 devices, interconnected to a Layer 3-capable router. For dedicated TRILL ports, VRRP can be used on these Layer 3 gateways to ensure redundancy.
Design Considerations 2 TRILL does not forward Spanning Tree BPDUs and does not participate in Spanning Tree calculation. Effectively, TRILL creates Spanning Tree Islands. This also means
that Spanning Tree should be disabled on the TRILL interfaces. The TRILL network is loop free due to IS-IS SPF calculations and reverse-path checks for multicast traffic. This loop-free topology is enforced in the data plane by leveraging the hop count field in the TRILL header. However, while loops are mitigated inside the TRILL network, the rest of the network still requires sound design and implementation. CE switches must still be configured with the same care as before.
Design Considerations 3 TRILL is fully supported in combination with IRF. This means that link aggregation can be used between IRF systems for backbone links, as well as for access links toward CE switches and endpoints. Figure 6-20 shows a TRILL network with two stand-alone switches at the core. Surrounding this core are four IRF systems. Each IRF system consists of two physical switches. Each IRF system has multiple connections to the TRILL core.
Figure 6-20: Design Considerations 3
There is a direct connection between IRF3 and IRF4, enabling traffic to pass directly between them. When data is transmitted to other devices in the TRILL network, it must travel through the TRILL core switches. Between CoreA and IRF3, a link aggregation has been created. This aggregated link can be configured as a TRILL trunk. This provides simultaneous support for both TRILL and IRF inside the backbone. At the access layer, the CE-Layer2 switch is also connected to IRF3 with two physical links, configured for standard link aggregation. From the IRF3 perspective,
this is simply a bridge aggregation, configured as a TRILL access port. In this particular case, it could be a TRILL access alone port, but this is optional. There are also possibilities for redundant design when a TRILL network connects to Layer 3 devices. One such method is for two physical routers to each be connected to a TRILL access port. VRRP would provide a traditional failover mechanism. Since a VRRP MAC address can only be learned on one port of an RBridge, an active-standby model must be used. Another method is indicated in Figure 6-20. In this case the TRILL network’s IRF4 is connected to the IRF CE-Layer3 system via link aggregation. Since IRF presents the two physical routers as a single, logical device, an active-active model is deployed. Many data center engineers prefer the superior bandwidth utilization provided by IRF’s active-active design. Others may be attracted to the individual control planes provided by separate physical routers running VRRP.
Graceful Restart for TRILL IS-IS TRILL’s IS-IS process includes support for graceful restart. This feature is helpful in situations where there is an MPU failover on a chassis switch, or a master switch failover in an IRF system. In these cases, the TRILL control plane processes must be restarted. Although the hardware and firmware quickly recover from these failovers, routing protocols lose their peer relationships. The time required to reestablish these lost peer relationships creates additional downtime. Most routing protocols like OSPF and BGP have a graceful restart mechanism that reduces this downtime, and TRILL is no exception. During this re-peering, the device that failed over takes on the role of a GR Restarter. The connected neighbors support this by becoming GR Helpers. Both devices must be configured to support this functionality.
TRILL Configuration Figure 6-21 introduces the basic and optional steps involved in TRILL configuration, as detailed in this section.
Figure 6-21: TRILL Configuration
Step 1: Enable TRILL Globally Figure 6-22 shows how to enable TRILL globally on the device. When you enable TRILL, a nickname is automatically generated. However, manual assignment eases troubleshooting. During diagnostics, it is often helpful to analyze the LSDB, which lists the links available for each TRILL nickname. If randomly generated nicknames are used, it can be difficult to know which nickname is associated with a particular physical device.
Figure 6-22: Step 1: Enable TRILL Globally
You should be careful and maintain good documentation when assigning nicknames. If two devices are configured with the same nickname, the highest configured priority value prevails. If priorities were not configured, the device with the highest system ID keeps its nickname. The other unit could either be functionally disabled or auto-assigned a new nickname. Either way, you will have lost the advantage of knowing which device has a particular nickname.
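The conflict-resolution rule above can be sketched as follows; the priority and system ID values are illustrative:

```python
def resolve_nickname_conflict(a, b):
    """When two RBridges claim the same nickname, the higher configured
    priority keeps it; with equal priorities, the higher system ID wins.
    Returns the device that keeps the nickname."""
    return max((a, b), key=lambda d: (d["priority"], d["system_id"]))

dev1 = {"name": "sw1", "priority": 192, "system_id": 0x0001}
dev2 = {"name": "sw2", "priority": 128, "system_id": 0x0002}
keeper = resolve_nickname_conflict(dev1, dev2)  # sw1 wins on priority
```

Note that the text leaves the loser's fate implementation-dependent: it may be functionally disabled or auto-assigned a new nickname, which this sketch does not model.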
Step 2: Configure Uplink Interfaces All TRILL backbone links are configured as TRILL uplinks. They will use any available VLAN on the interface for the outer encapsulation, defaulting to VLAN 1 if no other VLANs have been configured. If multiple VLANs are configured, TRILL automatically selects the lowest VLAN ID. In the example in Figure 6-23, interface ten1/0/2 is enabled for TRILL, and the TRILL link type is configured as a trunk. Thus, only other TRILL devices can be discovered on this link. It cannot process standard Ethernet data frames from endpoints.
Figure 6-23: Step 2: Configure Uplink Interfaces
Since no VLANs are configured on this link, TRILL traffic will use VLAN 1. The outer header of the frame will be sent without an 802.1Q header and associated VLAN tag.
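The VLAN selection just described can be sketched as a small function. The untagged treatment of non-default VLANs is an assumption inferred from the VLAN 1 case above:

```python
def designated_vlan(configured_vlans):
    """TRILL uses the lowest VLAN configured on the uplink for the
    outer encapsulation, falling back to VLAN 1 when none is
    configured. VLAN 1 frames are sent without an 802.1Q tag;
    tagging for other VLANs is assumed here for illustration."""
    vlan = min(configured_vlans) if configured_vlans else 1
    tagged = vlan != 1
    return vlan, tagged

assert designated_vlan([]) == (1, False)          # default: VLAN 1, untagged
assert designated_vlan([30, 20, 40]) == (20, True)  # lowest VLAN ID wins
```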
Step 3: Configure Access Interfaces
TRILL ports that connect to endpoints or CE switches are configured as access ports. TRILL port types are separate and distinct from VLAN port types. TRILL access port frames can be tagged or untagged. Also, remember that user VLAN tags are part of the INNER header of the TRILL frame. The inner header is merely considered part of the data field by the outer header, and so is not relevant to TRILL transit devices. The example in Figure 6-24 shows an interface enabled for TRILL, with a link type of access. Since this is the default link type, the command is not actually required. It is shown here to convey proper syntax, or should a trunk link need to be reverted to an access link.
Figure 6-24: Step 3: Configure Access Interfaces
Optionally you can use the access alone option on the interface. This makes it a silent interface, which is appropriate for single-homed devices that have no other TRILL connections.
Step 4: Configure Multicast Root TRILL devices use an STP-like tree for broadcast, multicast, and unknown unicast traffic. Like STP, this spanning tree emanates from a root bridge. The primary criterion for electing the root bridge is a configured priority value. The root can request multiple tree calculations for ECMP forwarding of multicast traffic. Figure 6-25 shows a TRILL configuration. The priority is set to 65535, the highest value available. Also configured is the number of trees that should be calculated. You should configure this value based on the number of actual paths available between other RBridges and the root bridge.
Figure 6-25: Step 4: Configure Multicast Root
Step 5: Verify Several display commands are available to validate your configuration. These commands are shown in Figure 6-26.
Figure 6-26: Step 5: Verify
Optional Step 6: Interface DRB Priority The first optional step involves configuring the DRB priority on an interface, as shown in Figure 6-27. This enables you to control which RBridge wins the election to become the DRB for a particular Ethernet link. The highest priority wins, and if there is a tie the highest MAC address wins. The default priority is 64.
Figure 6-27: Optional Step 6: Interface DRB Priority
Optional Step 7: Interface Designated VLAN You can optionally configure the transport VLAN used by the outer header of a
TRILL frame, per interface. By default the lowest available VLAN ID is used. This is negotiated between peers on the link by exchanging Hello frames. The VLAN specified in Figure 6-28 is a proposed VLAN. If this VLAN is not a viable option, the switches will negotiate another option.
Figure 6-28: Optional Step 7: Interface Designated VLAN
Optional Step 8: Interface Link Cost As shown in Figure 6-29, configuring interface link cost allows manual control of path cost calculation. This is applicable per interface. By default, auto-cost calculation is enabled. Auto-cost divides the number 20,000,000,000,000 by the interface link speed in bps. A 10Gbps link will have a cost of 2000, while a 1Gbps link carries a cost of 20,000.
Figure 6-29: Optional Step 8: Interface Link Cost
Manual configuration of link cost can be useful if the physical interface does not reflect the actual bandwidth available. For instance, a 10 gigabit interface may be used for an engineered Layer 2 connection like MPLS. This MPLS link may be limited to 2 gigabits per second of bandwidth toward a remote site. It would of course be appropriate to manually configure the TRILL cost in these situations. Other than situations like this, it is best to use the default auto-cost calculation.
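The auto-cost arithmetic above, including the 2Gbps MPLS example, can be checked with a one-line function:

```python
def trill_auto_cost(link_speed_bps):
    """Comware TRILL auto-cost: 20,000,000,000,000 divided by the
    interface link speed in bits per second."""
    return 20_000_000_000_000 // link_speed_bps

assert trill_auto_cost(10 * 10**9) == 2_000    # 10 Gbps interface
assert trill_auto_cost(1 * 10**9) == 20_000    # 1 Gbps interface
# For the bandwidth-limited MPLS case, manually configuring the cost a
# 2 Gbps link would have yields:
assert trill_auto_cost(2 * 10**9) == 10_000
```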
Summary In this chapter, you learned that TRILL is an IETF standard that stands for Transparent Interconnection of Lots of Links. The goal of the TRILL protocol is to provide large-scale Layer 2 fabric services. The intent is to maintain the simplicity of a traditional Layer 2 network while adding the scalability and convergence of a Layer 3 routed network. TRILL-capable devices are called Routing Bridges (RBridges) because they use the
IS-IS routing protocol to build a routing table, optimizing data flow for Layer 2 traffic. These RBridges are identified by a System ID and a nickname. TRILL devices elect DRBs and AVFs to help optimize TRILL operation and avoid loops. Multi-destination traffic is forwarded through the TRILL network using a classic STP-like topology, while unicast traffic is forwarded along best paths, as determined by TRILL’s IS-IS routing protocol. TRILL is enabled globally on a Comware device, and also on the appropriate interfaces. Interfaces that interconnect TRILL core devices are configured as trunk ports, while connections to CE switches and endpoints are configured as access interfaces.
Learning Check Answer each of the questions below.
1. Which of the statements below accurately describe TRILL? (Choose all that apply.)
a. An RBridge is a switch that runs TRILL. It uses routing functionality to determine optimal Layer 2 paths.
b. Each RBridge is uniquely identified by a System ID. This ID is based on the device’s MAC address by default.
c. The nickname is used like an IP address to forward frames, along with the system ID.
d. TRILL uses a link-state database (LSDB) to determine optimal Layer 2 paths.
e. TRILL devices can be connected to a classic Ethernet switch.
f. A TRILL deployment specifies DRBs, AVFs, and appointed ports to help move frames along an optimal path.
2. Which three are part of a TRILL-encapsulated frame? (Choose three.)
a. The mezzanine header.
b. Sheep header.
c. TRILL header.
d. Outer header.
e. Inner header.
3. TRILL multi-destination forwarding includes which three frame types? (Choose three.)
a. All unicast frames.
b. Broadcast frames.
c. All multicast frames.
d. Unicast frames with an unknown destination.
e. Multicast frames with an unknown destination.
4. Which protocol does TRILL use to determine optimal paths?
a. A special version of OSPF that runs at Layer 2.
b. Standard OSPF.
c. A special version of NLSP.
d. A special version of IS-IS that runs at Layer 2.
Learning Check Answers
1. a, b, d, e, f
2. c, d, e
3. b, c, d
4. d
7 Shortest Path Bridging Mac-in-Mac Mode (SPBM)
EXAM OBJECTIVES In this chapter, you learn to: ✓ Describe the goal of SPBM. ✓ Describe the use cases of SPBM. ✓ Understand the operation of SPBM. ✓ Configure SPBM.
INTRODUCTION Shortest Path Bridging Mac-in-Mac mode (SPBM) provides Layer 2 connectivity between data center sites. This chapter defines SPBM, and explains SPBM operation and configuration.
SPBM Introduction Shortest Path Bridging Mac-in-Mac mode (SPBM) enables large Layer 2 deployments. The goal is to maintain the simplicity of a Layer 2 fabric while leveraging the scalability and convergence of Layer 3 routed services. Best-path traffic forwarding is based on a link-state routing protocol. With the traditional Layer 2 Spanning-Tree Protocol, some links are placed in a blocking state to avoid loops. Since SPBM uses routing mechanisms, all links in the network can be used for actual traffic forwarding. SPBM is very similar to the TRILL protocol. TRILL was proposed by the IETF,
while SPBM was developed by the IEEE. Each protocol has unique features that make it a compelling option for large data center deployments.
SPBM Standards There are two standards related to SPBM. One is IEEE 802.1ah, which defines Provider Backbone Bridging (PBB). The other is Shortest Path Bridging (SPB), as defined by IEEE 802.1aq. Each of these standards is discussed in this chapter, starting with PBB.
PBB Overview This section provides an overview of PBB, before delving into device roles and MAC-in-MAC encapsulation. Frame format and terminology is described, as is PBB operation with SPB.
PBB Introduction PBB is a Layer 2 VPN technology based on Ethernet standards, and is quite similar to an MPLS VPLS solution. It provides bridge functions toward customer networks and maintains isolation between customer networks. Compared with MPLS VPLS solutions, SPBM is relatively simple to deploy, since it is based on Ethernet standards. MPLS VPLS infrastructures require expertise with IP routed infrastructures, label switching, and VPN overlay technologies. Engineers must be familiar with how all of these layers interact to successfully deploy and troubleshoot these networks. With PBB, customer MAC frames are simply encapsulated in a service provider MAC frame. The customer frame is transported as payload inside a provider Ethernet frame, with new source and destination MAC addresses. The frame header includes a service ID field to uniquely identify client networks for multi-tenant support. This is a unique differentiator compared to TRILL, which currently supports 4094 VLANs, with no explicit support for any kind of client network differentiator inside the TRILL fabric.
IEEE 802.1ah PBB Encapsulation Comparison In the early 1980s, the DIX (Digital, Intel, Xerox) consortium released the Ethernet II
specification, as depicted in Figure 7-1. This was soon followed by the IEEE 802.3 standard Ethernet frame format. Most protocol stacks, including TCP/IP, continue to use the Ethernet II frame format. This frame format includes destination and source MAC addresses, an Ethertype definition, and the actual data or payload.
Figure 7-1: IEEE 802.1ah PBB Encapsulation Comparison
In 1998, the 802.1Q standard was released. This provided a tagging mechanism to allow traffic for multiple VLANs to traverse the same physical connection. This tag was carried in an additional header, inserted between the Source Address and Ethertype fields of the standard Ethernet header. This tag has been referred to as a VLAN tag or “C-tag” (Customer tag). The customer VLAN tag must be in the range 1-4094. In 2005 the 802.1ad standard was released, in an initial attempt at providing multi-tenant customer isolation in a common fabric. This standard is also known as QinQ because 802.1Q customer “C-tags” are maintained inside of another 802.1Q VLAN tag called the Service tag or “S-tag”. Using this technology, 4000 VLANs for one customer could have an additional outer 802.1Q tag of 11. These frames would then traverse the service provider’s infrastructure. The same range of 4000 VLANs for another client could traverse the same provider network with an outer 802.1Q tag of 12.
This provides a good solution to allow a single infrastructure to support multiple clients using the same VLAN numbers. However, the limitations of this solution should be considered. One disadvantage relates to scalability. A provider may wish to support several clients, each with 10,000 or more MAC addresses. All of the provider’s backbone switches must learn every MAC address from every client, and maintain a MAC address table. As new clients are added, memory and CPU utilization increases proportionally. Another limitation is that the original outer Ethernet header from each client is maintained, with its original source and destination MAC addresses. Any MAC address conflicts between clients will create problems for the provider network. MAC address conflicts may be relatively rare, but the risk remains for one client’s network to impact another client’s, which is not acceptable. Also, router redundancy protocols like VRRP greatly increase the likelihood of conflicts. A customer might deploy VRRP with Router ID 1, in VLAN 10. In this scenario, VRRP uses the virtual MAC address 0000-5e00-0101. Any other client with the same configuration would use the same MAC address. If both clients shared the same provider network, a conflict would occur. This causes a MAC flapping issue, where the provider switches learn that source address as emanating from Client A, then from Client B, then Client A again, and so on. PBB alleviates these concerns by encapsulating the entire client frame in a new Ethernet frame. The original frame is now the payload inside this outer header. The provider backbone need only process a few MAC addresses per client, regardless of whether the client has 10 or 10,000 MAC addresses. The PBB header also includes additional tags to provide unique services to each client. For example, the “I-tag” carries unique client QoS information. These features enable PBB to provide a scalable, multi-tenant environment.
In 2011, the 802.1ah standard was incorporated into the new 802.1Q-2011 specification.
PBB Device Roles PBB supports two device roles: the Backbone Edge Bridge (BEB) and the Backbone Core Bridge (BCB). This is analogous to an MPLS deployment, which uses the Provider Edge (PE) and Provider (P) roles.
The BEB receives the original customer frame and encapsulates it into a new MAC-in-MAC frame. The source address of this frame is the BEB’s own local MAC address. The target BEB’s MAC address is used as the destination in this new frame. In Figure 7-2, BEB1 receives and encapsulates a frame from the customer. BEB1’s MAC address is the source MAC address of the outer frame, and BEB2’s is the destination. Once encapsulated, the frame is forwarded over the uplink port, into the backbone.
Figure 7-2: PBB Device Roles
The BCB receives this frame from BEB1, and forwards it based on the outer destination MAC address. In this scenario, the BCB is using traditional Layer 2 switching to forward the frame. It does not require knowledge of customer MAC addresses, since it is only using the outer frame’s MAC addresses for learning and forwarding purposes. BEB2 receives the PBB frame and decapsulates it. It parses the frame’s backbone service instance identifier, or I-SID, which uniquely identifies each customer. This enables BEB2 to be connected to several clients and deliver frames to the correct one. In a way, the I-SID acts as a kind of VLAN tag to differentiate customers inside the PBB network. There may be 200 customers supported between BEB1 and BEB2, each with a unique I-SID assigned. Over the provider network, the source and destination MAC addresses for all 200 customers are the same: those of BEB1 and BEB2. Therefore, the BCB only needs to learn and maintain a MAC address table with two MAC addresses.
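The scaling benefit above — 200 customers contributing only two backbone MAC entries — can be modeled with a toy learning table (an illustrative model, not switch code):

```python
def bcb_mac_table(pbb_frames):
    """A BCB learns only the OUTER source MAC addresses of PBB frames,
    so the size of its table depends on the number of BEBs, not on the
    number of customers or customer MAC addresses."""
    return {f["b_sa"] for f in pbb_frames}

# 200 customers (distinguished only by I-SID), each sending in both
# directions between BEB1 and BEB2.
frames = [{"b_sa": "beb1", "i_sid": 1000 + i} for i in range(200)]
frames += [{"b_sa": "beb2", "i_sid": 1000 + i} for i in range(200)]
table = bcb_mac_table(frames)  # just two entries: beb1 and beb2
```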
PBB MAC-in-MAC Encapsulation PBB is purely an encapsulation technology, using the MAC-in-MAC format to ensure customer isolation over a common fabric. It does not provide any Layer 2 path calculation. This functionality is the responsibility of other protocols, such as link aggregation, STP, or SPB. Figure 7-3 indicates a very basic PBB example. Since there is only one possible path, there is no need for path calculation. The BCB can simply be a traditional Layer 2 switch with no special configuration required.
Figure 7-3: PBB MAC-in-MAC Encapsulation
The example network is simple, but lacks redundancy. Link Aggregation could be configured between the BEBs and BCB to provide link redundancy and additional bandwidth. Data forwarding is still based on traditional MAC learning and forwarding, since link aggregation does not change how the BCB learns and forwards frames. For device redundancy, two BCBs could be deployed. They could run traditional STP to determine a loop-free path, which would automatically be recalculated in case of a device failure. The basic examples described here serve to illustrate the point that PBB is only an encapsulation protocol, and is not responsible for path forwarding and redundancy.
PBB Frame Format The PBB frame includes the typical preamble, and then starts with a Backbone
Destination Address (B-DA) and Source Address (B-SA), as shown in Figure 7-4. This is a traditional Ethernet frame, with a standard Ethernet Type field, or TPID. The Backbone VLAN tag (B-Tag) is simply a traditional 802.1Q VLAN tag, which is used by the BCB fabric to properly forward frames. The Canonical Format Indicator (CFI) indicates that MAC addresses are in standard or “canonical” format.
Figure 7-4: PBB Frame Format
The I-Tag contains additional PBB information, including the unique I-SID value assigned to each customer. The I-Tag also includes the backbone Instance Priority Code Point (I-PCP). This value is used for QoS marking, similar to the 802.1p priority value used in VLAN tagging. This provides a mechanism for the service provider to set its own priority values without modifying the customer’s original 802.1p marking. The Drop Eligibility Indicator (I-DEI) mirrors a feature that has been available in IP networks for years as part of the DSCP values in an IP header: two bits of the DSCP value are used to indicate levels of drop probability for QoS purposes. Likewise, PBB’s I-DEI value is used to mark certain traffic with higher or lower levels of drop probability, or eligibility. A network administrator can vary drop probability for clients based on bandwidth utilization. For example, as long as clients use less than 100Mbps, their frames are marked with a low eligibility value, and are unlikely to be discarded. When client traffic bursts between 100 and 150Mbps, frames can be remarked with a higher I-DEI value, and so are more likely to be discarded. Above 150Mbps, frames are remarked with yet another value, and are very likely to be discarded. When multiple client frames arrive at a congested switch, the ones with the highest eligibility value
are dropped first. The I-Tag’s UCA bit indicates that a different addressing format is carried in the I-Tag, while the RES bits are reserved for future use. The C-DA and C-SA are the destination and source MAC addresses of the original client Ethernet frame. These are learned only by client-facing BEBs. The Ethernet payload could be an untagged frame, a frame with a single VLAN tag, or a QinQ frame with a double tag. In the QinQ case, the S-Tag would be the outer tag and the client’s C-tag would be the inner VLAN tag.
PBB Concepts: B-TAG and I-SID It is important to understand the relationship between the B-Tag and the I-SID. The B-Tag includes the B-VLAN and priority values, enabling multiple VLAN usage in the PBB backbone. A single B-VLAN can transport multiple customers’ traffic, since each client can be uniquely identified by its I-SID. It is the I-SID that enables multi-tenant support over a single backbone. The I-SID is administratively configured, using a number in the range of 255 to 16,777,215.
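The MAC-in-MAC layout and the I-SID range constraint can be sketched as a simplified frame model. The field layout is reduced for illustration and the MAC address strings are placeholders:

```python
from dataclasses import dataclass

# Administratively assignable I-SID range, per the text above.
I_SID_MIN, I_SID_MAX = 255, 16_777_215

@dataclass
class PBBFrame:
    """Simplified sketch of a PBB MAC-in-MAC frame: the customer frame
    rides as payload behind backbone addresses, a B-Tag (backbone
    VLAN) and an I-Tag carrying the per-customer I-SID."""
    b_da: str              # backbone destination MAC (egress BEB)
    b_sa: str              # backbone source MAC (ingress BEB)
    b_vlan: int            # B-Tag: backbone VLAN
    i_sid: int             # I-Tag: customer service instance identifier
    customer_frame: bytes  # original frame, addresses and C-tag untouched

    def __post_init__(self):
        if not I_SID_MIN <= self.i_sid <= I_SID_MAX:
            raise ValueError("I-SID out of range")

# BEB1 encapsulates a customer frame toward BEB2. One B-VLAN can carry
# many customers, told apart only by their I-SIDs.
frame = PBBFrame("beb2-mac", "beb1-mac", b_vlan=100, i_sid=5000,
                 customer_frame=b"original client frame bytes")
```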
PBB With or Without SPB For a PBB-only configuration, the backbone requires some Layer 2 path services between BEBs. Since the backbone typically requires device redundancy, some protocol must provide optimal paths while avoiding loops. This redundancy could be provided by using IRF with multi-chassis link aggregations. One consideration for this solution is that it requires a homogeneous, single-vendor deployment. Another option is to leverage the Spanning Tree Protocol; however, it is an unlikely choice for a large backbone infrastructure because of limitations related to failover time and scalability. An additional factor relates to the active-standby nature of STP redundancy, due to the single tree that is calculated. With multi-vendor deployments, or when a highly redundant and scalable solution is required, SPB is an attractive option. While PBB provides the encapsulation service, SPB’s multi-path, active-active deployment model provides distinct advantages over STP. SPB uses IS-IS link calculations to determine best paths through the backbone. This allows all active paths to be used for data forwarding. The combination of SPB and
PBB results in SPBM, Shortest Path Bridging with MAC-in-MAC mode.
SPB Introduction There are two flavors of SPB available. One is Shortest Path Bridging VLAN Mode (SPBV) and the other is Shortest Path Bridging MAC-in-MAC Mode (SPBM). SPBV was originally intended as a successor to STP. SPBV does not encapsulate packets; it is only responsible for discerning best paths using an active-active model. This protocol will not be discussed further, as it is not widely implemented, and HP devices do not support it. Like SPBV, SPBM provides multiple active paths in a Layer 2 Ethernet network, based on shortest path calculations performed by IS-IS.
SPBM Device Roles Since SPBM is based on PBB, the same BCB and BEB device roles are used. Figure 7-5 shows four BCBs and four BEBs. As with a pure PBB-only scenario, BEBs are customer-facing and are responsible for learning customer MAC addresses. PBB still encapsulates customer frames as before, so BCBs need not learn customer MAC addresses. BCBs only need to learn the MAC addresses of the four BEBs.
Figure 7-5: SPBM Device Roles
Unlike PBB, with SPBM, the backbone is not just a simple Layer 2 switch fabric. Instead, switches run SPB’s IS-IS protocol for best-path topology calculations.
SPBM Path Calculation BCBs need only learn BEB MAC addresses, and so SPBM is responsible for calculating best paths to reach these BEB MAC addresses. The IS-IS link-state protocol is used to calculate these paths, and both BEBs and BCBs must participate in this process, as shown in Figure 7-6.
Figure 7-6: SPBM Path Calculation
Frame forwarding for SPBM differs from TRILL's hop-by-hop encapsulation method. With TRILL, each routing bridge that receives a frame strips away the outer header, and creates a new Layer 2 header to transmit the frame to the next-hop RBridge. This method requires specific hardware capabilities built into the device's ASIC circuitry. With SPBM, the BEB receives a client frame and creates a PBB encapsulation around it. That BEB's MAC address is the source, and the ultimate destination BEB MAC address is the destination. This single encapsulation traverses the entire SPBM backbone. The link-state protocol finds the best path to that destination BEB, and BCBs all forward the frame accordingly.
SPBM: Multiple Path Selection 1/2 In a traditional IP network, traffic can be load-balanced over multiple equal-cost paths. Some IP routers might use per-packet load-sharing, where each packet is sent over a different equal-cost path, in a round-robin fashion. However, most Layer 3 switches load-share based on a hash calculation that is performed on the source and destination IP addresses. This means that all traffic with the same source/destination IP address pair will traverse the same path. A different source/destination pair will use another one of the equal-cost paths. SPB does not use the mechanisms described above. In fact, load-sharing can be very simple to administratively control by configuring B-VLANs. For example, you could set up B-VLAN 100 for a certain set of tenants to be supported between BCB1-BCB3-BCB4, as shown in Figure 7-7. Similarly, B-VLAN 200 could be for a different set of tenants, supported between BCB1-BCB2-BCB4. Although there are multiple physical paths, a single path has been configured for Silver tenants, and a different path for Blue tenants. This is an easy way for an administrator to split the load across different paths.
Figure 7-7: SPBM: Multiple Path Selection 1/2
Alternatively, you could decide to enable both B-VLANs 100 and 200 on both paths, which are of equal cost in this scenario. Therefore, there are two equal-cost paths for both Silver and Blue tenants. In this case, one of sixteen predefined Equal Cost Tree (ECT) algorithms can be applied to each B-VLAN. These algorithms provide a deterministic path mechanism; in other words, each uses a specific, determined path from a source to a particular destination. The same path will be used for return traffic, since the same algorithm is used. For example, one of the sixteen available algorithms chooses the path based on the highest bridge ID. Another bases it on the lowest bridge ID. The network administrator can assign which customer's traffic uses which algorithm. Blue customer traffic uses B-VLAN 100, which you have configured to use ECT algorithm 1. When this algorithm detects equal-cost paths, the path with the highest
Bridge ID is selected. Assuming BCB3 has a higher Bridge ID than BCB2, Blue tenant traffic between BCB1 and BCB4 traverses BCB3. Meanwhile, Silver tenants use B-VLAN 200, which is configured to use ECT algorithm 2. This traffic would use BCB2 as its preferred path, since this algorithm chooses the bridge with the lowest Bridge ID. Since the return traffic uses the same algorithm, customer traffic originating from BEB4 and destined for BEB1 will use the same path as described above. Blue tenants configured to use ECT algorithm 1 will reach BEB1 via BCB3. Silver tenants assigned to use algorithm 2 would reach BEB1 via BCB2. The ECT algorithms described above base their path choices on Bridge ID, and administrators can manipulate these ID values to engineer traffic flows. An administrator could simply assign a higher Bridge ID value to BCB2, and change traffic flow patterns. In this way, a network administrator can pre-determine or pre-engineer the paths along which certain client traffic will pass. By default all backbone VLANs use ECT algorithm 1, but you can select which customer B-VLANs use which ECT algorithms. The deterministic nature of the ECT mechanism is different from that used by TRILL. TRILL simply uses a hash-based mechanism for equal-cost load-sharing. This load balancing is effective, but there is no precise, deterministic control over which specific path a particular customer's traffic will use.
SPBM: Multiple Path Selection 2/2 The administrator assigns each backbone VLAN to a specific ECT algorithm, thus creating a deterministic path for each one. For example, in Figure 7-8, backbone VLAN 100 might be used for tenants with a “Blue plan”, while VLAN 200 might be used for tenants that have paid more to have a “Silver plan”.
Figure 7-8: SPBM: Multiple Path Selection 2/2
Traffic assignment is controlled by the ingress BEBs. These BEBs are configured with a Virtual Switch Instance (VSI) to accept a particular client’s traffic, similar to an MPLS VPLS scenario. From within the VSI configuration, the network administrator defines the I-SID, which uniquely identifies the customer. The backbone VLAN tag can also be defined. This backbone VLAN tag can be mapped to a specific ECT algorithm to control path selection, as described above. For example, a BEB may have an interface connected to two Silver tenants and different interfaces connected to some Blue tenants. On this switch, a VSI is defined for each silver customer, with a unique I-SID defined. All of the VSIs for these customers are configured to use B-VLAN 200. More VSIs are defined for the Blue clients, each with a unique I-SID, and all configured to use VLAN 100. This VLAN
100 can be configured to use a different ECT algorithm, and therefore a different path in the backbone.
SPBM VSI The VSI is defined on BEBs. This definition includes a unique I-SID to identify the tenant, as well as a backbone VLAN. There are also two types of interfaces that can be defined: tenant-facing physical interfaces, and uplink interfaces, which are PBB interfaces. The I-SID number ranges from 255 to 16,777,215. I-SID number 255 is reserved for the SPB fast-channel feature, and should not be assigned by network administrators for tenant usage. The fast-channel feature is an enhancement in how IS-IS LSPs are delivered between BCBs. In a traditional IS-IS network, an LSP is generated by a device and then forwarded to peers. Peers process this LSP into their link-state database, and then forward it on to their peers. In a large network, LSPs may need to be processed by several switches in the path. Each switch must submit the LSP to its control plane and process the update, delaying receipt of the LSP at the remote end of the network. This hop-by-hop processing must occur on traditional Layer 3 IS-IS networks because each routed link is in a different Layer 2 broadcast domain. SPB's fast-channel feature is possible because all BCBs have Layer 2 connectivity, provided by PBB. The configuration can include reserved I-SID 255 to be used as a kind of internal VLAN. When a topology change occurs, LSPs are generated as usual. However, not only are they forwarded to IS-IS peers, they are also forwarded as a data frame over I-SID 255. All BCBs therefore receive LSPs at about the same time, and can all process them in parallel. The fast-channel feature is automatically enabled when you configure a VSI with I-SID 255.
SPBM S-VID to I-SID to B-VLAN Mappings The BEBs and BCBs in an SPBM fabric must be aware of which traffic belongs to which tenant. On tenant-facing interfaces, the BEB selects specific traffic as belonging to tenant 1, 2, or 3. This selection can be done at the interface or VLAN level. For VLAN-level selection, the outer VLAN is called the S-VLAN ID; see Figure 7-9 for an example. There are over 4000 S-VIDs available, each of which can be mapped to a specific I-SID number.
Figure 7-9: SPBM S-VID to I-SID to B-VLAN Mappings
I-SIDs are assigned to tenants. Each tenant should have a unique I-SID, or unique set of I-SIDs, different from those assigned to other tenants. Therefore, I-SIDs represent a tenant's traffic in the SPBM fabric. Also, a single I-SID defines or represents a single MAC address table. If the tenant requires connectivity for ten VLANs to a remote location, you have a few choices: use a single I-SID for all of that tenant's traffic, map each tenant VLAN to its own I-SID, or use a hybrid of the two. If a single I-SID is used for all ten of the tenant's VLANs, then a single, flat MAC address table is common to all VLANs. The simplicity of this option is attractive, but there must be no MAC address conflicts across all VLANs for that tenant. If you use a separate I-SID for each tenant VLAN, then each VLAN has its own MAC address table. This option adds a bit of complexity, but eliminates the possibility of inter-VLAN address conflicts. Note Many modern switches maintain separate MAC address tables per VLAN. In that case, the concerns about MAC address conflicts described above would not be an issue.
SPBM Forwarding Flows This section is focused on how various types of traffic are forwarded through an SPBM fabric. The section begins with a discussion of the MAC learning and
forwarding process for unicast traffic. Then the three variations of flooded traffic are discussed: broadcast, multicast, and unknown unicast.
SPBM Forwarding Process Unicast The BEB's VSI for a client receives the frames that a client transmits, and learns MAC addresses by checking the source address field in these received frames. This is identical to the operation of traditional Ethernet switches. Incoming tenant frames destined for a remote site over the SPBM fabric will be encapsulated inside a PBB frame. This outer frame is marked with an I-SID and tagged with the appropriate B-VLAN ID. The BEB places its MAC address as the source address in the new header. The switch must also determine how to forward this frame to the appropriate destination. This process is shown in Figure 7-10.
Figure 7-10: SPBM Forwarding Process Unicast
The switch analyzes the destination MAC address in the received tenant frame, comparing it to its VSI MAC address table. If this MAC address was previously learned from a remote BEB, it will be in this VSI MAC address table. However, it will not be mapped to an outgoing interface, as with a traditional MAC address table. Instead, it is mapped to the MAC address of the remote BEB from which it was learned. The local BEB would therefore know which remote BEB to forward the frame to. That BEB’s MAC address would be used as the destination
MAC address in the PBB frame header. Of course, the fabric BCBs forward frames based on this destination MAC address. If the tenant-generated frame's destination MAC address has not yet been learned, or if it is a multicast or broadcast, then the frame is flooded inside the backbone. In this way, every pertinent BEB will receive the frame. This flooding is based on a Layer 2 multicast process. BCBs do not use the Ethernet broadcast MAC address for this purpose.
SPBM Multicast Forwarding SPBM supports two multicast methods to deliver frames inside the backbone. One method is called head-end replication, and the other is called tandem replication. With head-end replication, tenant multicast traffic is transported as unicast traffic to each remote BEB. In an SPBM network with six BEB switches, the ingress BEB that receives a client frame transmits five unicast frames into the backbone – one for each of the other five BEBs. One advantage to head-end replication is simplicity, as there is no requirement for multicast support inside the backbone. Another advantage is optimized path selection, since each unicast frame can be forwarded along the shortest path to the destination BEB. Head-end replication need only send unicast frames to those BEBs that participate in that tenant’s I-SID. This limits the number of destination BEBs that need to receive and process the frame. Although an SPBM fabric may be comprised of fifty BEBs, perhaps only three of them are used to support a particular tenant. Only these three switches would be configured with that tenant’s VSI. The ingress BEB, therefore, only needs to create two replicas of the frame. Tandem replication uses multicasting as its transport mechanism. The ingress BEB is therefore relieved of the burden of replicating and transmitting an inbound frame multiple times. It need only transmit a single multicast frame into the backbone. However, the backbone devices must now be capable of processing and forwarding multicast frames.
SPBM Multicast Head-End Replication With head-end replication, an ingress BEB receives tenant multicast frames and replicates them as unicast frames into the backbone. A unicast frame will be generated for each BEB that participates in that tenant’s I-SID. Overhead is a concern
when a single multicast frame must be replicated and transmitted several times as a unicast frame. This overhead increases with the size of the client's deployment. If the SPBM fabric supports a client with 20 sites, each multicast must be replicated nineteen times. For this reason, head-end replication is most suitable for deployments with limited multicast traffic. This is the default mode on HP Comware devices configured for SPBM. Figure 7-11 shows an example of BEB1 receiving an incoming multicast data frame. BEB1 must replicate this frame twice, once for each of the other BEBs participating in the tenant deployment. Each replica must be encapsulated. One must then be transmitted to BEB2, and the other must be transmitted to BEB3.
Figure 7-11: SPBM Multicast Head-End Replication
SPBM Multicast Tandem Replication Tandem replication uses a special multicast address, based on a unique ingress BEB identifier and the tenant’s I-SID. The ingress BEB uses this as the destination MAC address of the new frame, and sends it into the backbone. Based on the shortest path
topology, certain BCBs will be forks in the path toward destination BEBs. These BCBs must replicate the frame toward each BEB that is relevant to the tenant. In Figure 7-12, BEB1 sends a multicast frame into the backbone. The address was fashioned as described above, based on the ID for BEB1 and the client's I-SID. When the BCB receives this multicast traffic it knows which interfaces to use for frame forwarding. The multicast address is client-specific, and the BCB knows which peers are relevant to that client. Using Reverse-Path Forwarding (RPF), it knows that it received the frame on the best path back to the source, and knows that it should duplicate these frames away from the source, toward the appropriate destination BEBs.
Figure 7-12: SPBM Multicast Tandem Replication
As previously stated, the default method used to forward client multicast traffic is head-end replication. The mode can be configured for each tenant’s VSI. It is important that the same method is configured in the VSI of every BEB for a particular client.
SPBM Backbone Multicast VLAN
The backbone multicast VLAN feature is an option to isolate multicast traffic inside the backbone, see Figure 7-13. Since head-end replication uses unicasting, the backbone multicast option is only applicable for the tandem multicast method. The feature is configured with the “multicast-bvlan enable” command and should be configured with care to ensure a functional deployment.
Figure 7-13: SPBM Backbone Multicast VLAN
The B-VLAN configured by the administrator for this feature should always be an odd-numbered VLAN ID, such as 11, 21, and 33 for example. These odd-numbered VLANs will be used for unicast delivery. The device with the multicast backbone VLAN feature enabled will automatically use the next even VLAN for multicast delivery. For example, suppose that a tenant sends multicast data toward BEB1. BEB1's VSI for this tenant is bound to backbone VLAN 99. Unicast traffic from that client will traverse the backbone with that tag. Since the BEB is configured with the backbone multicast feature, any multicast frames from that client are forwarded to the BCB tagged with B-VLAN 100. Remember that since each B-VLAN can be configured with a specific ECT algorithm, VLANs can use unique paths through the backbone.
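Putting the quoted command together with the example above, the feature might be enabled as follows. Only the "multicast-bvlan enable" command itself appears in the text; its placement in the SPBM view and the surrounding lines are assumptions, so check your device's configuration guide:

```
system-view
spbm
 # Odd B-VLANs (such as 99) carry unicast traffic; the device
 # automatically uses the next even VLAN (100) for multicast.
 # Only meaningful with tandem replication mode.
 multicast-bvlan enable
```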
SPBM Configuration Requirements If IRF systems participate in SPBM, the IRF MAC address must always be configured. This is configured by default on chassis-based devices, but should be manually configured on traditional top-of-rack devices. Failure to configure this can create problems after an IRF fail-over event. By default, when a different physical switch becomes the new IRF master, it begins to use a new MAC address after a few minutes. This would result in a different system ID, and an IS-IS topology change in the SPBM fabric. Since a BEB’s MAC address is used as the source or destination for frame transport, a change in that address breaks the PBB delivery mechanism. For these reasons, stable MAC addresses are important. Another important fact to remember is that the version of IS-IS used by SPBM is exclusively for its use. This protocol is separate and distinct from the IS-IS protocol used for IP routing. The two protocols do not interact in any way. We know that each BEB’s MAC address must be unique, since these addresses serve as the source and destination address for frame transmission across the fabric. Of course, you should avoid statically configuring a MAC address on these devices that could conflict with another address. The backbone VLAN for the SPBM should be dedicated for SPBM. No other feature should be configured on this VLAN. Also, SPBM requires that the Spanning-Tree mode of operation be set to MST.
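The IRF MAC persistence requirement described above can be met with the standard Comware IRF command shown below; this is a sketch, and keyword availability may vary by model and software release:

```
system-view
# Retain the IRF bridge MAC address permanently, even after the
# original master fails over, so the SPBM system ID stays stable
irf mac-address persistent always
```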
SPBM Support for Graceful Restart The SPBM implementation on HP devices includes full support for graceful restart. This feature minimizes outages during a fail-over event. Without this feature, chassis blades or IRF systems may fail-over quickly, but routing protocols must re-establish peer relationships, lengthening the effective downtime. As with the traditional graceful restart feature used with OSPF, SPBM's IS-IS defines a graceful restarter function on the device that failed over, and a graceful restart helper function on its device peers. The configuration must be enabled on both the restarter and helper devices. The configuration of this feature is a best practice for high-availability deployments.
Design Considerations 1
By default, LSPs are only propagated by the control plane, just like OSPF's control-plane process distributes LSAs. This means that a device must receive LSPs, process them at the control plane, and then propagate them to other devices. Fast channel is an additional, optional mechanism that propagates LSPs throughout the SPBM domain using a multicast that goes out to all SPBM devices. This messaging is processed by hardware, while the data plane simultaneously propagates the information to other devices. The feature is automatically enabled when a VSI is created with I-SID 255. This VSI should not have tenant-facing interfaces, nor should this I-SID value be assigned to an actual tenant. Even if the fast-channel feature is not desired for initial deployment, I-SID 255 should be kept in reserve.
Design Considerations 2 As we have seen with TRILL, devices that do Layer 2 shortest-path bridging cannot also perform Layer 3 routing functions. Routing must be performed by some device that is not configured for SPBM. Layer 3 routers must connect to the SPBM VSI as an endpoint. Chassis-based SPBM switches can use a separate Multi-Tenant Device Context (MDC) for this purpose. In that scenario, a cable can be connected from an SPBM-participant line card to an IP-routing MDC. Effectively, SPB and Layer 3 routing functions are being performed by two separate devices, since the MDCs are functionally isolated entities inside the physical switch.
Design Considerations 3 SPBM is fully supported in combination with IRF, and SPBM backbone links can use link aggregation. You can also use service instances on link aggregation interfaces for customer access. This is a convenient way to provide redundant access into the SPBM service. You can also take advantage of the link-aggregation hashing algorithm for load-sharing between SPBM backbone devices. Multiple physical links using link aggregation are perceived as a single logical connection by the BCBs. As shown in Figure 7-14, when Layer 3 routing and IRF are combined with Layer 2 SPBM services, you can leverage IRF's active-active model for redundancy. This active-active Layer 3 role is connected to an SPBM VSI by a link-aggregation group.
Figure 7-14: Design Considerations 3
Alternatively, you can deploy an active-standby model that uses separately-managed switches, configured to use VRRP. Some network engineers may prefer the convenience of managing multiple physical switches as a single IRF entity. Others may be attracted to the flexibility of having separately managed control planes in separate physical switches.
Configuration Steps for SPBM Figure 7-15 outlines the configuration steps for SPBM.
Figure 7-15: Configuration Steps for SPBM
To begin the configuration of SPBM features, the L2VPN and SPBM settings are globally configured, as are MSTP regions. The B-VLAN is configured, along with customer-facing VSI and I-SID. The final required steps are to create and bind a service instance for the customer and verify your efforts. Optionally, multicast replication and B-VLAN to ECT mapping can be configured. Note The 12500 cannot be configured for SPBM if it is operating in standard mode. Prior to configuring SPBM on a 12500 model, it should be set to any other mode but standard, such as route, bridge, grand, or advanced.
Prerequisite Step for Some Models The 12500 cannot be configured for SPBM if it is operating in standard mode. It must
be configured in some other mode, such as route, bridge, grand, or advanced. While most HP Comware switch models do not require this step, future models may be introduced that do. You should check the configuration manual for your specific device to determine whether the system working mode needs to be configured. Figure 7-16 shows a typical configuration for a 12500. This setting requires a system reboot in order to take effect. The command "display system-working-mode" can be used to validate the setting.
Figure 7-16: Prerequisite Step for Some Models
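Since Figure 7-16 is not reproduced here, the prerequisite might be sketched as follows. The mode name "advanced" is taken from the list in the text, and the exact keyword spelling of the mode is an assumption; only "display system-working-mode" is quoted verbatim above:

```
system-view
# Standard mode does not support SPBM on the 12500;
# set any other supported mode (a reboot is required afterwards)
system-working-mode advanced
# After the reboot, validate the setting:
display system-working-mode
```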
Step 1: Configure Global L2VPN and Global SPBM Figure 7-17 shows the syntax to globally enable L2VPN and SPBM.
Figure 7-17: Step 1: Configure Global L2VPN and Global SPBM
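A minimal sketch of the global enablement in Figure 7-17; the bare `spbm` view command is an assumption patterned on Comware conventions:

```
system-view
# Enable the Layer 2 VPN feature set, required for VSIs
l2vpn enable
# Enable SPBM globally (enters the SPBM view)
spbm
```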
Step 2: Configure MSTP Region Settings for SPBM The next step is to configure an MSTP region, as required by SPBM. Backbone VLANs are all allocated to special MSTP instance 4092. With this special MST instance number, there is no need to run actual Multiple Instance Spanning Tree, but the region configuration must be active.
In Figure 7-18, B-VLAN 4021 is used. The region name “backbone” is configured. This can be any name, but should match on all devices. Instance 4092 is defined, and B-VLAN 4021 is allocated to it. If other B-VLANs were to be used, they would also need to be mapped to this instance. Finally, the MSTP instance is activated in the last command shown.
Figure 7-18: Step 2: Configure MSTP Region Settings for SPBM
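The region settings described for Figure 7-18 correspond to standard Comware MST-region commands, roughly:

```
system-view
stp region-configuration
 # The region name is arbitrary but should match on all SPBM devices
 region-name backbone
 # Map the B-VLAN(s) to the special SPBM instance 4092
 instance 4092 vlan 4021
 # Activate the region configuration
 active region-configuration
```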
Step 3: Configure B-VLAN The B-VLANs must be defined and enabled on all backbone links. These uplinks should be configured to only permit B-VLANs and no others. In the example scenario shown in Figure 7-19, VLAN 4021 is the B-VLAN. In the figure, the first command creates the VLAN. Then a backbone interface is configured as a trunk port. Default VLAN 1 is removed, and VLAN 4021 is assigned to the port.
Figure 7-19: Step 3: Configure B-VLAN
The last command in Figure 7-19 enables the SPBM feature on the interface. This command performs two functions. It enables PBB on the interface, so all frames in and out of this interface will be encapsulated using PBB. Any frames received on this interface that are not PBB-encapsulated are ignored. The command also enables SPB using MAC-in-MAC mode. SPBM's IS-IS protocol is therefore active on this interface. It sends IS-IS Hello packets, and forms adjacencies with other IS-IS peers.
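The sequence described for Figure 7-19 might look as follows; the interface name is a placeholder, since the figure is not reproduced here:

```
system-view
# Create the backbone VLAN
vlan 4021
quit
interface Ten-GigabitEthernet1/0/2    # example backbone uplink
 port link-type trunk
 # Permit only the B-VLAN on backbone links
 undo port trunk permit vlan 1
 port trunk permit vlan 4021
 # Enable PBB encapsulation and SPBM IS-IS on this interface
 spbm enable
```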
Step 4: Create SPBM VSI and Configure I-SID BEBs have some interfaces connected to clients, and others connected to the SPBM backbone's fabric of BCBs. A VSI is configured for each client. The VSI will have two types of active interfaces. Customer-facing interfaces will be configured with a service instance. The other interfaces are backbone links; these will be learned through SPBM. There will be one logical link for each remote BEB that belongs to the same I-SID. Figure 7-20 shows a VSI named "customerA" being configured. This name is locally significant only. BEB2 could be configured with "customerA1", as long as the I-SID matches. Of course, configuration consistency is a good idea.
Figure 7-20: Step 4: Create SPBM VSI and Configure I-SID
In this scenario, an I-SID of 10001 is assigned. This uniquely defines the customer network inside the backbone. B-VLAN 4021 is associated to this I-SID. This is the VLAN that will be used to transport customer A’s traffic over the SPBM fabric. It was assigned to a physical, backbone-facing interface in the previous step. Optionally, this VLAN can be configured to use specific ECT load-sharing algorithms.
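A sketch of the VSI creation described for Figure 7-20; only the VSI name, I-SID 10001, and B-VLAN 4021 are given in the text, so the exact binding syntax shown here is an assumption:

```
system-view
# VSI name is locally significant; the I-SID must match fabric-wide
vsi customerA
 # Bind tenant I-SID 10001 (valid range 255-16777215)
 # to backbone VLAN 4021 for transport across the fabric
 i-sid 10001 bvlan 4021
```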
Step 5: Create and Bind Service Instance Customer-facing interfaces must be configured. The service instance is created with some arbitrary ID number. The service instance ID is locally significant only. The service instance defines which customer is connected to the interface, and which customer VLANs can traverse the SPBM fabric. In Figure 7-21, service instance 10 is defined on interface ten1/0/1. The encapsulation command defines that incoming traffic tagged with VLAN 101 shall be eligible for SPBM fabric services. Alternatively, the encapsulation default option could be specified. This would allow all incoming traffic to use SPBM, tagged or untagged. Whatever is specified, all of that client traffic is associated with a single VSI, as defined in the previous step.
Figure 7-21: Step 5: Create and Bind Service Instance
By defining additional service instances, multiple VLANs can have separate MAC address tables. For example, the BEB could be configured with service instance 11. The encapsulation command for that instance could then specify VLAN 102, which could be cross-connected to another VSI. You may recall that each VSI defines a separate MAC table. One VSI should be considered as one shared MAC address database. The VSI can also be used to perform local switching operations. For example, you could configure interfaces ten1/0/1, 1/0/5, and 1/0/7 for service instance 10, with encapsulation VLAN ID 101, and cross-connected to the VSI for CustomerA. This results in three interfaces inside the virtual switch instance, and this VSI can perform local switching between them. Interfaces are configured for Layer 2 switch port operation by default with the command "port link-mode bridge". The interface must be operating as a Layer 2 port to deploy the configuration shown, but you typically would not need to enter the command.
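The service-instance binding described for Figure 7-21 might be sketched as follows; the values match the scenario above, though exact keyword forms may vary by release:

```
system-view
interface Ten-GigabitEthernet1/0/1
 # Layer 2 operation is the default; shown for completeness
 port link-mode bridge
 # Service instance ID 10 is locally significant
 service-instance 10
  # Only frames tagged with customer VLAN 101 enter the fabric;
  # "encapsulation default" would match all traffic instead
  encapsulation s-vid 101
  # Bind the matched traffic to customer A's VSI
  xconnect vsi customerA
```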
Step 6: Verify Several display commands are available to verify SPBM operation. These commands are shown in Figure 7-22.
Figure 7-22: Step 6: Verify
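Figure 7-22 is not reproduced here; typical verification commands would resemble the following sketch. The command names are assumptions patterned on Comware display syntax, so consult the command reference for your device:

```
# SPBM IS-IS adjacencies with other BEBs/BCBs
display spbm peer
# Unicast forwarding entries (remote BEB MAC reachability)
display spbm unicast-fdb
# VSI state, I-SID, and bound service instances
display l2vpn vsi verbose
```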
Configuration Review: BCB Configuration Figure 7-23 collects the SPBM syntax into a single BCB configuration example. The BCB must participate in the SPBM IS-IS topology, and so each backbone interface is configured with the "spbm enable" command. Since BCBs are not customer-facing, they do not require the VSI instance configuration that a BEB needs.
Figure 7-23: Configuration Review: BCB Configuration
Globally, L2VPN and SPBM are enabled. The MST region is configured with the special instance number 4092, the B-VLAN 4021 is assigned, and the instance is activated. VLAN 4021 is actually created from global configuration mode. Next, the BCB's physical interfaces are configured as trunk ports, and VLAN 1 is removed to ensure that only the B-VLAN is supported on the interface, as required. The VLAN is permitted on the interface, and SPBM is enabled.
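Consolidating the steps described above, a BCB sketch might look as follows; interface names are placeholders, and exact syntax may vary by release:

```
# BCB: no VSI or service instance needed, only global enablement,
# the MST region, the B-VLAN, and SPBM on backbone interfaces
system-view
l2vpn enable
spbm
stp region-configuration
 region-name backbone
 instance 4092 vlan 4021
 active region-configuration
vlan 4021
quit
interface Ten-GigabitEthernet1/0/1
 port link-type trunk
 undo port trunk permit vlan 1
 port trunk permit vlan 4021
 spbm enable
```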
Optional Step: Multicast Replication You have previously learned that head-end replication is the default method of handling multicast, broadcast, and unknown unicast frames. If tandem replication mode is desired, then it should be configured on all BEB nodes participating in the same customer I-SID. In the example in Figure 7-24, the VSI for customer A has been assigned I-SID 10001. From within that context, multicast replication mode is set to tandem. This means that each VSI can be configured with its own multicast replication mode. Some VSIs can be configured for head-end replication, while others can be configured for tandem replication.
Figure 7-24: Optional Step: Multicast Replication
It is recommended that the default head-end replication be used for I-SIDs with only two participating BEBs. It doesn't make sense to use the tandem multicast mechanism when only a single unicast frame needs to be sent anyway.
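The per-VSI mode change described for Figure 7-24 might be sketched like this; the `multicast replicate-mode` keyword is an assumption, since only the VSI context and the tandem value are given in the text:

```
system-view
vsi customerA
 # Change this tenant (I-SID 10001) from the default head-end
 # mode to tandem; must match on every participating BEB
 multicast replicate-mode tandem
```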
Optional Step: B-VLAN to ECT mapping Another optional configuration task is to specify the ECT algorithm to be used for a specific B-VLAN. The ECT algorithm controls path determination through the backbone. Sixteen ECT algorithms are available. The actual numbers assigned to them are the hexadecimal values 0x0080C201 through 0x0080C210 (0x10 = 16 in decimal). By default, all B-VLANs use algorithm 1. This is actually configured as a global SPBM setting. It is not configured at the I-SID level. The B-VLAN to ECT algorithm mapping configuration should be consistent across all BCBs in the fabric. In the example shown in Figure 7-25, BEB1 is configured by first entering the SPBM configuration context. From here, B-VLAN 4021 is configured to use ECT algorithm 2. When multiple paths are available, this algorithm will choose the path with the lowest bridge ID.
Figure 7-25: Optional Step: B-VLAN to ECT mapping
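A hedged sketch of the mapping in Figure 7-25 follows; the keywords are approximations. Remember that this is a global SPBM setting and must match on all backbone devices.

```
# Map B-VLAN 4021 to ECT algorithm 2 from the SPBM configuration context
# (keywords approximate)
spbm
 b-vlan 4021 ect 2
quit
```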
HP devices support an ECT migration feature, allowing customers to be easily migrated to a different B-VLAN without impact. To do this, the network administrator begins by creating a new B-VLAN and assigning the desired ECT algorithm. This configuration must be completed on all the backbone devices.
Next, each BEB participating in the customer VSI is configured to support the new B-VLAN. This new B-VLAN assignment is automatically exchanged throughout the backbone by IS-IS. Even though the administrator executes the command to use the new B-VLAN, a device will not actually start using it until it has learned via IS-IS that the other BEBs for that I-SID are also using it. In this way, traffic continues to be forwarded on the original B-VLAN until the administrator has configured every pertinent BEB and IS-IS has announced the new B-VLAN. Once that occurs, all devices seamlessly switch to the new B-VLAN, which can be configured with a different ECT algorithm.
Summary In this chapter you learned that Shortest Path Bridging MAC-in-MAC mode (SPBM) is defined by IEEE 802.1ah (PBB) and 802.1aq (SPB). The goal is to maintain the simplicity of a Layer 2 fabric while leveraging the scalability and convergence of Layer 3 routed services. Best-path traffic forwarding is based on a link-state routing protocol. SPBM is relatively simple to deploy, since it is based on Ethernet standards, and provides multi-tenant capabilities.
Two SPBM device roles are defined. BEBs receive customer frames, wrap them in a PBB encapsulation, and send them into the SPBM backbone. BCBs form the backbone fabric, simply forwarding frames based on the outer PBB MAC header.
SPBM uses the SPB protocol to determine best paths. SPB is based on the IS-IS routing protocol. When multiple equal-cost paths exist, one of sixteen available ECT algorithms is used to provide deterministic path selection. Administrators can control which algorithm is used for a particular client, and manipulate the selection metrics used by that algorithm to engineer SPBM fabric paths.
A VSI is defined on BEBs to service customer traffic. A customer's VSI definition includes a Backbone Service Instance Identifier, called an I-SID, and a B-VLAN. This definition must be consistent across all BEBs that participate in that customer's connectivity. SPBM's processing of unicast frames is very similar to that of classic Ethernet. Multicast forwarding can use a head-end or tandem replication method.
SPBM configuration includes configuring MSTP regions, B-VLANs, VSIs, and I-SIDs. Multicast replication and B-VLAN-to-ECT mapping configurations are optional.
Learning Check
Answer each of the questions below.
1. Which of the statements below accurately describe the goals of SPBM? (Choose three)
a. Support a large-scale Layer 3 fabric.
b. Maintain the simplicity of Layer 2.
c. Provide the scalability and convergence of Layer 3.
d. Forward based on best path selection.
e. Forward based on advanced STP.
2. Which statements are true about PBB device roles? (Choose three)
a. The BEB adds a MAC-in-MAC encapsulation to customer frames.
b. The BEB-added frame encapsulation uses its local BEB MAC address as the source, and the target BEB's MAC address as the destination.
c. BCBs forward frames based on destination IP address.
d. BCBs forward frames based on the outer MAC address.
e. BCBs forward frames based on the customer's MAC address.
3. PBB supports over 16 million customers, uniquely identified by an I-SID, while the B-TAG enables multiple VLANs to be used inside the backbone.
a. True.
b. False.
4. How does SPBM load-share across multiple equal-cost paths?
a. SPBM uses a hashing algorithm based on source/destination IP addresses.
b. SPBM uses a hashing algorithm based on source/destination MAC addresses.
c. Per-packet load sharing uses a round-robin scheme to split the load among multiple equal-cost paths.
d. A set of 16 predefined ECT algorithms is used to provide deterministic path selection.
5. When SPBM handles multicast traffic, which two types of replication are used? (Choose two)
a. Head-end replication uses a BCB to send unicast frames throughout the backbone.
b. Unicast replication uses a BEB to send unicast frames into the backbone.
c. Tandem replication uses a special multicast address to forward frames to appropriate customer sites.
d. Head-end replication uses the ingress BEB to replicate a multicast frame as several unicast frames into the backbone.
e. I-SID replication creates a special address based on the customer I-SID and a unique ingress BEB identifier.
Learning Check Answers
1. b, c, d
2. a, b, d
3. a
4. d
5. c, d
8 Ethernet Virtual Interconnect
EXAM OBJECTIVES In this chapter, you learn to: ✓ Describe EVI features. ✓ Understand EVI basic operations. ✓ Understand EVI redundancy options. ✓ Configure EVI.
INTRODUCTION This chapter is focused on the Ethernet Virtual Interconnect (EVI) protocol. The motivation for developing EVI is explained, as are the features and functionality that enable EVI to extend Layer 2 broadcast domains over routed IP transport networks.
EVI Introduction Data center administrators need the flexibility to move any VM to any physical server, at any data center site. This is only possible if each VM's VLAN can be extended across physical data centers. EVI was developed by HP to extend Layer 2 networks across data centers. This can also be accomplished with traditional fiber connections. However, direct fiber connections are not always available or feasible. For example, a customer might have each data center connected via a routed service. In that scenario, EVI can use these routed connections to transport Layer 2 traffic. A key advantage over other options is that there is no need for MPLS support between data center connections; MPLS is a strict requirement for client-managed services like VPLS or L2 VPN.
EVI protocols support the grouping of up to eight data centers in the same Layer 2 topology. Several optimization and tuning techniques are available to improve performance and resiliency. One such feature relates to local and remote MAC learning. You will see how local MAC learning is based on traditional data plane processes, while remote MAC learning uses a control plane process. This provides efficiency gains over other options. Another feature is ARP suppression. EVI-capable devices can snoop ARP information from the remote site, and provide proxy responses to local ARP broadcast requests. This can suppress ARP traffic from traversing WAN connections. Also, EVI links block the flooding of multicast traffic. This affects the relationship between redundant VRRP routers, ensuring that each data center has a local VRRP master. This optimizes data flow between endpoints and their default gateways. Selective flooding is also supported. This enables the network administrator to control which multicast and unicast traffic may traverse the WAN connection.
Supported Products Currently, the Comware 12500 supports EVI. The 12900 is supported with the latest release, while 11900 support is planned. The MSR/VSR is supported as of late 2014, with HSR support planned for early 2015. Note You cannot mix a 12500 and 12900 in the same EVI network, since they use different EVI encapsulations. HP routers can interoperate with either the 12500 or 12900.
EVI Operation This section begins with discussions of EVI terminology and concepts, followed by a review of how EVI networks operate, including neighbor discovery, the traffic forwarding process, tuning, and redundancy. EVI is an IP-based Layer 2 overlay technology for interconnecting data centers. With EVI, Layer 2 frames are encapsulated in IP packets using a format called MAC-in-GRE. This encapsulated traffic can traverse routed networks, using GRE tunnels that
are established between each data center's EVI nodes. Since EVI uses standard GRE and IP encapsulation, any existing IP routing infrastructure can be used to extend Layer 2 networks between data centers. Traditional Ethernet switches glean MAC addresses from the source address field of normal endpoint data frames. These MAC addresses are mapped to the interface on which they were received. EVI learns the MAC addresses for local devices in much the same way. By comparison, EVI uses a control plane-based MAC learning process for remote devices. When a local MAC address is learned, the EVI nodes in that data center use IS-IS to announce it to remote sites. Remote sites map these MAC addresses to the link on which they were learned. Since remote sites proactively learn MAC addresses via this control plane process, they are free from having to inspect each incoming frame's source address for that information.
EVI Terminology Figure 8-1 introduces terms related to EVI. These terms are described below.
Figure 8-1: EVI Terminology
An Edge Device is the switch or router in a data center that provides EVI services. It performs local MAC learning, forms IS-IS peerings with remote edge devices, and announces local MAC addresses to those remote peers. The EVI network ID is a unique identifier shared between edge devices. Each edge device may be assigned multiple EVI network IDs to isolate multiple tenants. Each EVI network ID defines an additional IS-IS process. This ensures that learned MAC information for one tenant remains separate and private, so a single infrastructure can securely support multiple tenants. The LAN side of an edge device connects to traditional VLANs, with the traditional 4094-VLAN limit. You must map selected VLANs to specific EVI network IDs, thus creating extended VLANs. For example, a data center site may have 1000 traditional, local VLANs. The network administrator wants to extend 100 of those VLANs to a remote data center.
Only these selected VLANs will become extended VLANs, by being mapped to a specific EVI network ID. Each VLAN can only be mapped to a single EVI network ID; it is not possible to map a VLAN to multiple EVI network IDs. The EVI Neighbor Discovery Protocol (ENDP) provides an IP address registration service. This is useful when several sites must be interconnected via EVI's GRE tunnels. Each side of a GRE tunnel must be configured with a source and destination IP address. When only two sites are connected, this is easy: each side merely specifies its own local address as the GRE tunnel source, and the other side's IP address as the destination. However, with three data centers, each site must be configured with two separate tunnels, so that each destination can be configured. As additional sites are added, yet another tunnel must be configured at every site. If a data center is decommissioned, all other data centers must be reconfigured to remove the associated tunnel. ENDP defines server and client roles to alleviate this administrative overhead. Each data center's ENDP client registers itself with the ENDP server, and can query the server for a list of currently active peer addresses. This enables the dynamic setup and teardown of GRE tunnels based on the list of registered IP addresses, greatly simplifying the addition and removal of data centers. EVI IS-IS is based on standard IS-IS mechanisms. Specific extensions were added to announce Layer 2 reachability information for EVI. This reachability information includes a list of VLANs and the MAC addresses that have been learned on those VLANs.
EVI Concepts 1/2 Figure 8-2 shows an EVI-based scenario. Three data centers are connected by a public transport IP network, which may not be a part of the customer-managed network. It could be a service provider's network, or a service provider's MPLS L3 VPN service, for example. Each data center has a single edge device that connects to the site-local VLANs and provides connectivity to remote sites.
Figure 8-2: EVI Concepts 1/2
The edge device could be filling the role of a traditional core switch, with hundreds of local interfaces connected to traditional switches inside the data center. Classic MAC learning is performed in this local network. This means that each frame entering a local port is examined by the edge device. Its source address is added to the MAC address table, mapped to the interface on which it was learned. Using ENDP, GRE transport tunnels are dynamically formed between EVI peers. The switch perceives that logical Layer 2 EVI links are available over these Layer 3 GRE tunnels. This allows remote site MAC addresses to be learned via EVI’s IS-IS protocol.
EVI Concepts 2/2
The EVI network ID is a logical entity that can be used to isolate multiple tenants. It allows unique logical topologies to be formed for each client, each with its own separate MAC address table. Figure 8-3 shows a topology that supports two separate client networks, EVI network ID 11 and an EVI network ID 12, represented in the figure as logical switching objects.
Figure 8-3: EVI Concepts 2/2
EVI network ID 11 is configured to operate between data centers 1, 2, and 3, while EVI network ID 12 operates only between data centers 2 and 3. Each logical network has its own ENDP server, IS-IS instance, and MAC address table. Complete tenant isolation is achieved. Multiple VLANs can be assigned to each logical ID 11 and 12, but each VLAN can be mapped to one and only one ID. For example, VLANs 1 through 99 could be mapped to EVI ID 11, while VLANs 200 through 250 could be mapped to EVI ID 12.
By using QinQ technology, EVI edge devices can even provide each tenant with its own set of 4094 VLANs. This is conveyed in Figure 8-3. In DC-3, Edge3 is connected to a local switch that is configured with QinQ. Any VLANs from the local switch will have an additional, outer 802.1Q tag of 20 added before transmission to Edge3. Edge3 maps VLAN 20 to EVI ID 11. For another tenant, Edge3 maps VLAN 30 to EVI ID 12. This QinQ solution carries a risk related to duplicate MAC addresses. EVI's internal MAC database perceives all MAC addresses as associated with VLAN 20, since it only sees the outer tag. If there were duplicate MAC addresses among the client's multiple VLANs, MAC flapping could occur in the EVI network. It is quite rare for duplicate MAC addresses to be an issue; however, VRRP can increase the odds, since VRRP uses a specific MAC range. If VLANs 101 and 102 both used VRRP virtual router ID 1, they would end up with the same virtual MAC address and cause a conflict.
EVI Network Each EVI network configuration is defined with a unique network ID, in the range of 1 - 16777215. The EVI network also requires an EVI tunnel interface, defined by a unique tunnel interface ID. Multiple tunnel interfaces will use the same physical source IP address. Each EVI network has an EVI IS-IS process, which is automatically created when a new EVI tunnel interface is configured. EVI's IS-IS process is responsible for link-state calculation and remote-site MAC address exchange. The EVI IS-IS process ID matches the EVI tunnel interface ID; if an administrator configures interface tunnel 26, there will also be an EVI IS-IS process 26. The neighbor discovery protocol must also be configured. This is unique for each network ID to ensure that unique topologies can be formed for each network tunnel. Finally, local VLANs are mapped to specific EVI IDs to create a set of extended VLANs. Traffic tuning and optimization, such as ARP suppression and selective flooding, are also configured per network ID. In this way, each EVI network has a completely isolated configuration set.
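Pulling these elements together, a per-network configuration might be sketched as follows. The interface mode and command names are approximations of Comware syntax and should be checked against the EVI configuration guide.

```
# EVI tunnel interface 26 -- creating it also creates EVI IS-IS process 26
# (keywords approximate)
interface Tunnel 26 mode evi
 evi network-id 11                 # unique EVI network ID (1 - 16777215)
 source vlan-interface 4001        # shared physical transport source
 evi extend-vlan 100 to 199        # map local VLANs to this EVI network
quit
```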
EVI Process The EVI network process can be described in three phases - neighbor discovery, MAC address advertisement, and actual data forwarding.
ENDP neighbor discovery occurs on edge devices. Each edge device registers its IP address with the ENDP server. Each ENDP client also queries the server for other edge devices using the same network ID. GRE tunnels are automatically formed with these edge devices. So, these Layer 3 GRE tunnels are automatically established, with an overlay of associated Layer 2 EVI links. The EVI IS-IS routing protocol will form adjacencies over these EVI links. These adjacencies enable IS-IS to send locally learned MAC address information to remote peers. Once the network is fully established, endpoint data forwarding can occur. Local traffic is received, and the destination MAC address is found in the EVI MAC table. This table associates this destination MAC address to an EVI link. The frame is encapsulated and sent over the appropriate tunnel. The remote edge device receives this traffic and forwards it toward the appropriate local destination, based on the local MAC address table.
EVI Neighbor Discovery—Introduction ENDP eases the process of connecting multiple sites. The number of sites supported depends on the hardware models in use. ENDP provides an IP address registration service that enables sites to be dynamically added and removed. Each ENDP Client (ENDC) is configured with the IP address of the ENDP Server (ENDS). The client registers its own local transport IP address with the ENDS. It then queries the server to retrieve the list of active remote IP addresses. The client refreshes its transport IP address information at regular intervals; when a client goes offline, its IP address is removed from the ENDS. Once the client has retrieved all active remote IP addresses, it can set up EVI tunnels to each of them. Each EVI network ID maintains a separate ENDP configuration. This enables each EVI ID to maintain a unique topology.
EVI Neighbor Discovery—Configuration The ENDP server can be enabled on any EVI edge device. For registration redundancy purposes, up to two ENDP servers can be configured for each network ID. An edge device configured as an ENDS is automatically its own ENDC; it registers its own transport IP address in the local database. Two ENDP servers are often configured to provide redundancy. This redundancy is achieved since each client
registers with both ENDP servers. However, consider a scenario with eight interconnected data centers, with an ENDS configured at DC-1 and DC-2. Loss of connectivity to both DC-1 and DC-2 breaks the EVI connectivity between all the other data centers; there must be at least one functional, accessible ENDS. The ENDC is manually configured with the transport IP address of up to two ENDP servers. If an edge device has already been configured as an ENDS, it is already a client of itself, and so can only be configured with one more ENDS. ENDP authentication can be configured to enhance security. This takes the form of a password, which must match between the client and the server.
EVI Neighbor Discovery Example Figure 8-4 shows five interconnected data centers. The edge device at each data center has a routed interface connected to the transport IP network. Edge1 has an IP address of 10.1.0.1/24, Edge2 has 10.2.0.1/24, and so on. These addresses will be used to establish the tunnels. Each data center requires four GRE tunnels, one to each remote site.
Figure 8-4: EVI Neighbor Discovery Example
The two top-most data centers' edge devices are to be configured as ENDP servers. As discussed, they will automatically register to themselves as ENDCs. Each is also manually configured as an ENDC of the other edge device.
All of the other sites are configured as pure ENDP clients of both ENDP servers. The ENDS member database lists all active transport IP addresses and system IDs. The address listed is the public, provider-facing IP address. Each edge's system ID is also listed, based on the device's MAC address. This system ID is used by IS-IS in the same way OSPF uses a Router ID: it uniquely identifies each node in the network.
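The ENDP roles in Figure 8-4 might be configured as sketched below. The neighbor-discovery keywords are assumptions based on typical Comware syntax; consult the command reference for the exact form.

```
# Edge1 (ENDP server; automatically also a client of itself)
interface Tunnel 1 mode evi
 evi neighbor-discovery server enable
 evi neighbor-discovery client enable 10.2.0.1    # the second ENDS (Edge2)
quit

# Edge3 (pure ENDP client, pointed at both servers)
interface Tunnel 1 mode evi
 evi neighbor-discovery client enable 10.1.0.1
 evi neighbor-discovery client enable 10.2.0.1
quit
```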
EVI MAC Learning MAC addresses local to each data center are learned from local interfaces. This is classic data plane-based learning. Source MAC addresses are gleaned from normal endpoint data frames. These locally learned MAC addresses can then be announced by EVI IS-IS to all remote edge devices. The remote site edge devices receive these MAC addresses and add them to their EVI MAC address table associated with the EVI link on which it was received and learned. EVI IS-IS transmits this information in Link State Packets (LSPs). These LSPs include the actual MAC address, and the VLAN on which it exists. For example, if DC-1 has 10,000 local MAC addresses, Edge1 sends an LSP with 10,000 records. This is similar to how OSPF might use LSPs to advertise 10,000 available routes. Remote edge devices receive this LSP, and add the 10,000 MAC addresses to their database.
EVI Traffic Forwarding—Unicast Figure 8-5 highlights the process used by EVI to forward unicast frames. When an edge device receives a unicast frame, it learns the source MAC and looks up the destination MAC address. If the destination is local, it performs traditional local interface switching to the correct local interface. This is classic Ethernet switching.
Figure 8-5: EVI Traffic Forwarding - Unicast
If the MAC address is for a device at the remote site, the edge switch encapsulates the frame and sends it over the EVI tunnel to the other site. The remote site edge device receives the inbound unicast traffic on the EVI link. It does not learn the source MAC address from it, since remote MAC addresses are learned via the IS-IS control-plane process. It looks up the destination MAC address of the incoming frame. If the MAC address is for a local device, then classic, local interface switching is performed. Typically, there will be a match, since Edge Device1 learned that MAC address from Edge Device2 in the first place. However, if there is no match, Device2 sends an EVI IS-IS instruction to Device1 to purge that address.
EVI Traffic Forwarding—Multicast 1/2 The multicast delivery process applies to broadcast, multicast, and unknown unicast traffic. By default, the EVI protocol avoids multicast flooding by dropping multicast packets. To enable multicast traffic over an EVI network, Protocol Independent Multicast (PIM) is configured, along with IGMP snooping for IPv4 and MLD snooping for IPv6. Edge devices should flood multicast control protocol traffic, such as IGMP, MLD, and PIM packets, to remote sites. This is important because a multicast receiver and transmitter must be aware of each other. When an additional listener comes online, it sends an IGMP message on the network. That IGMP message must reach the remote
site in order to trigger the delivery of the multicast stream. Since IGMP and PIM protocol packets use a multicast destination, those packets would be dropped by default. It is therefore important to configure EVI with the selective flooding feature. Note that IGMP, MLD, and PIM protocol packets arriving on EVI links from remote sites are flooded out local interfaces by default. It is locally generated multicast frames that are not forwarded across the EVI links by default.
EVI Traffic Forwarding—Multicast 2/2 Figure 8-6 highlights the process that EVI uses to forward multicast traffic, as Edge Device1 receives an inbound multicast packet on a local interface. The switch checks to see if any interfaces have registered to receive multicast traffic. Hosts register to receive a multicast stream by sending an IGMP join message.
Figure 8-6: EVI Traffic Forwarding – Multicast 2/2
If local endpoints have registered, the switch performs traditional multicast delivery. If a remote device has joined the multicast group across an EVI link, then frames are encapsulated and sent over the EVI tunnel to the remote site. If several clients from several remote sites have registered, then Edge Device1 uses a technique called “hidden replication”. It sends a separate packet across each EVI link, toward each remote site. Remote site Device2 receives the multicast traffic on the EVI link. It checks to see if any of its local interfaces are registered. This would have happened as described
above, when a local endpoint sends an IGMP join request. Of course, this should be the case, otherwise Device1 wouldn’t have received the request over the EVI link. Device2 performs traditional multicast delivery to get these multicast frames to appropriate endpoints. However, it will not forward multicast traffic over any EVI links. This split-horizon mechanism prevents loops inside the EVI network topology.
EVI Traffic Forwarding—Flooding EVI switches flood broadcast traffic to all local interfaces and all EVI links; EVI does not break the traditional broadcast mechanism. Broadcast traffic like ARP is flooded to all remote sites by default. The ARP suppression feature can be configured to minimize ARP broadcasts over EVI links. Unknown unicast frames are flooded out all local interfaces, but not over EVI links. This flooding is unnecessary, since MAC addresses for remote devices are learned by the EVI IS-IS protocol. This is a good approach for most normal traffic, but can create challenges for specific protocols that rely on classic unknown unicast mechanisms. Exceptions can be configured to ensure that these special applications continue to operate properly. This will be covered later in this chapter. Incoming multicast traffic is flooded out all local interfaces in a traditional fashion. It will not be flooded out EVI links, unless IGMP, MLD, or PIM messages have been received over that link, indicating that the traffic is required. However, exception rules can be configured to enable selective MAC flooding. An example is when VRRP is configured. VRRP uses a multicast address to send VRRP Hello packets, but never uses IGMP to join that multicast group. Since EVI will never receive an IGMP join for VRRP, it will not send VRRP Hello packets across EVI links. The network administrator can thus configure a specific rule to permit the appropriate multicast MAC address to traverse EVI links.
EVI Traffic Optimization: ARP Suppression 1/2 ARP suppression applies to broadcast traffic. You have learned that EVI floods broadcast traffic to all sites by default. The ARP suppression feature reduces the number of ARP broadcasts between sites, since the local edge device can proxy ARP responses on behalf of remote clients. A caching mechanism is activated on the edge device to cache MAC-to-IP information.
EVI Traffic Optimization: ARP Suppression 2/2 Figure 8-7 highlights the operational steps used with ARP suppression. At DC-1, Server1 sends an ARP broadcast request that arrives at Edge1. Edge1 does not yet have an ARP record for the requested IP address, and floods the request to all remote sites. At remote site DC-2, Server2 receives the request and sends an ARP reply, which is forwarded by Edge2 across the EVI link.
Figure 8-7: EVI Traffic Optimization: ARP Suppression 2/2
Edge1 snoops ARP replies on the EVI link, learns Server2's MAC address, and adds it to an EVI ARP cache. Therefore, when another device at DC-1, Server3, broadcasts an ARP request for Server2, the broadcast need not be forwarded over EVI links. Edge1 has the entry, and proxies a response back to its local client.
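On Comware switches this behavior is typically switched on with a single command under the EVI tunnel interface. The keyword below is a hypothetical approximation, so check the command reference before use.

```
# Enable the ARP suppression cache for this EVI network
# (keyword is an assumption)
interface Tunnel 1 mode evi
 evi arp-suppression enable
quit
```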
EVI Traffic Optimization: Selective Flooding 1/2 Some protocols or applications require multicast packets, but do not perform
IGMP/MLD registration. A Microsoft Network Load Balancing (NLB) cluster is one example that may or may not use IGMP, depending on how it is configured. Another example is VRRP. EVI blocks unregistered multicast from traversing EVI links by default, which may render the protocol or application non-functional. As previously discussed, EVI blocks VRRP Hellos by default. As a result, VRRP hosts in DC-1 and DC-2 will not hear hellos from each other. Therefore, they will both assume the master role and actively function as the default gateway for local clients. This ensures that local hosts always forward frames to a local VRRP forwarder, which is preferable to using a default gateway on the other side of a WAN link, since data paths are optimized.
EVI Traffic Optimization: Selective Flooding 2/2 Each EVI network ID has its own selective flooding configuration. On each edge device, the specific multicast MAC addresses that should be flooded can be configured per VLAN. This means it is possible to keep the default configuration that blocks VRRP on VLAN 20, while allowing VRRP Hellos to traverse EVI links for VLAN 30. Another example is noted in Figure 8-8, with Microsoft Network Load Balancing (NLB) general multicast. This protocol uses the 03-BF MAC address range as a destination multicast address. This address range can be configured as an exception for VLAN 10.
Figure 8-8: EVI Traffic Optimization: Selective Flooding 2/2
Once configured, this multicast MAC address would selectively be allowed to be flooded over EVI links, while all other unregistered multicast traffic will conform to the default blocking action.
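The exception in Figure 8-8 might be sketched as below. The MAC address shown (an NLB-style 03-BF address) and the selective-flooding keyword are illustrative assumptions, not exact syntax.

```
# Allow the NLB multicast MAC to be flooded over EVI links for VLAN 10 only;
# all other unregistered multicast keeps the default blocking action
interface Tunnel 1 mode evi
 evi selective-flooding mac-address 03bf-0a01-010a vlan 10
quit
```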
EVI Redundancy 1/2
EVI allows for up to two logical devices at each local site. Since EVI is supported by IRF systems, site redundancy can be handled at this level. Two EVI devices can be configured as a single, logical IRF system, so other devices perceive a single logical EVI edge device. EVI IS-IS also supports graceful restart. This ensures quick, nearly lossless recovery from an IRF master failover scenario, or from an MPU failover to a different IRF chassis. Another option is to deploy two distinct EVI edge devices in a local site, without the use of IRF. With this solution, both edge devices connect to local site switches, and so announce the same set of MAC addresses to remote edge devices. This could cause duplicate MAC learning at remote sites. To avoid this issue, an active/standby model is used. The active edge device participates in EVI IS-IS, announcing locally learned MAC addresses. Therefore, remote sites learn MAC addresses only from this single, active edge device. The standby edge maintains an LSDB, and so is ready to transmit all appropriate MAC address information in an LSP. However, it remains quiet unless the active device fails. In that event, the standby unit activates, forming IS-IS adjacencies to remote sites. System functionality is thus maintained with very minimal downtime.
EVI Redundancy 2/2 EVI must be able to detect that a site is using two edge devices. The network administrator will physically interconnect EVI edge devices to other local switches for redundancy, but EVI will not be aware of this. This is because EVI IS-IS only operates on the GRE tunnels formed over the transport IP network. It does not operate on the local site interconnections. The solution involves EVI Site IDs, which are configured by the administrator. Each data center is assigned a unique ID, which is configured on both edge devices at the site. The edge devices announce this ID on their local interfaces, and so see each other’s announcement. The device with the lowest MAC address is isolated as a standby device. The device with the highest MAC address is the active EVI IS-IS participant. The standby edge device is not in the data path, and ignores all data frames received from EVI links. It still exchanges IS-IS Hello packets, but does not form adjacencies nor exchange MAC information in any LSPs. It does accept LSPs inbound from
remote sites to build and maintain a current LSDB. Therefore, it is ready to take over should the currently active device fail. The default site ID is 0. Site ID 0 is not blocked from forming adjacencies. So if all sites are left to their default configuration, they will still form IS-IS adjacencies.
EVI IS-IS Maximum MAC Address Announcement EVI IS-IS uses LSPs to announce local MAC addresses per VLAN. Thousands of endpoints may exist at a large data center, so a single LSP might contain a large amount of MAC address information. An update to the local MAC address table results in an updated LSP, announced to all remote edge devices. With EVI, a single LSP can contain a maximum of 56,320 MAC addresses, advertised by each active edge device along with its EVI system ID. This defines the upper limit for each site in an EVI deployment. For EVI to handle more than this default maximum, virtual system IDs can be defined. A virtual ID must be defined for each additional block of 56,320 MAC addresses. The network administrator must ensure that each virtual system ID is unique throughout the entire EVI network.
Configuration Steps for EVI Figure 8-9 introduces the steps to configure EVI on HP Comware switches. These steps are detailed next.
Figure 8-9: Configuration Steps for EVI
Optional Step 1: Configure the EVI Site-ID If two edge devices are deployed at a single site, you should configure an EVI Site ID. Each site must be assigned a unique site ID, which is configured identically on both Edge devices at the site. In Figure 8-10, the switches at site 1 are named Site1-12500-1, and Site1-12500-2. They are both assigned site ID 1. The device at another site is named Site2-12500-1, and is assigned an EVI site ID of 2.
Figure 8-10: Optional Step 1: Configure the EVI Site-ID
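Since the figure itself is an image, the following is a rough sketch of the commands it describes. The site IDs come from the text above; the exact `evi site-id` keyword should be verified against the Comware EVI command reference.

```
# On both edge devices at site 1 (Site1-12500-1 and Site1-12500-2):
system-view
 evi site-id 1

# On the edge device at site 2 (Site2-12500-1):
system-view
 evi site-id 2
```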
Step 2: Configure the Transport Interface The transport interface must be configured on the edge device. This is a routed link that is connected directly to the IP transport network. To do this, a VLAN is defined, and the port is made a member of the VLAN. Then a Layer 3 VLAN interface is created and an IP address is assigned to it. Finally, EVI is enabled on the physical interface. Note The Layer 3 VLAN interface would also be configured for routing toward the transport network, using static routes or a dynamic routing protocol such as OSPF. This configuration is not shown. In Figure 8-11, VLAN 4001 is created and assigned to interface G3/0/1. Next, Layer 3 interface VLAN 4001 is defined with an IP address of 10.1.0.2/24. The configuration step is completed by enabling EVI on the physical interface. This step must be done on the physical interface, and not on logical interface VLAN 4001.
Figure 8-11: Step 2: Configure the Transport Interface
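The transport-interface steps described above might look as follows in Comware CLI. Command names are approximate sketches of what the figure shows and should be checked against the switch's command reference.

```
system-view
 vlan 4001
 quit
 interface GigabitEthernet 3/0/1
  port access vlan 4001            # port facing the IP transport network
 quit
 interface Vlan-interface 4001
  ip address 10.1.0.2 24           # routed transport address
 quit
 interface GigabitEthernet 3/0/1
  evi enable                       # enabled on the physical port, not on
                                   # logical interface VLAN 4001
```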
Step 3: Configure the EVI Tunnel Interface
Each EVI network requires a unique tunnel interface. This logical interface is configured with its tunnel source IP address. This is the address assigned to the interface connected to the IP transport network, as configured in the previous step. Manual configuration of the tunnel destination is not necessary, since it will be learned dynamically via ENDP. The tunnel interface ID also serves as the EVI IS-IS process ID. In the example in Figure 8-12, interface tunnel 1 was created with a mode setting of EVI. Since tunnel interface 1 was created, EVI IS-IS process 1 will be created.
Figure 8-12: Step 3: Configure the EVI Tunnel Interface
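A hedged sketch of the tunnel configuration in Figure 8-12 (syntax approximate; consult the EVI command reference):

```
system-view
 interface Tunnel 1 mode evi      # tunnel ID 1 also becomes EVI IS-IS process 1
  source 10.1.0.2                 # transport IP from the previous step;
                                  # no destination is set - ENDP learns it
```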
Step 4: Configure the EVI Network ID The EVI tunnel must be mapped to an EVI Network ID. The valid range for this number is between 1 and 16777215. In the example in Figure 8-13, tunnel 1 is configured with EVI network ID 101.
Figure 8-13: Step 4: Configure the EVI Network ID
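The mapping in Figure 8-13 might be entered on the tunnel interface like this (keyword approximate):

```
interface Tunnel 1
 evi network-id 101               # valid range: 1 to 16777215
```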
Step 5: Configure the Extended VLANs When you map site-local VLANs to EVI, you are creating extended VLANs. These are the VLANs that EVI will extend across data centers. As shown in Figure 8-14, multiple VLANs can be mapped to a single EVI network ID, but each VLAN can be assigned to one and only one EVI network ID.
Figure 8-14: Step 5: Configure the Extended VLANs
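A sketch of the extended-VLAN mapping; the VLAN range below is hypothetical (the figure's actual values are not reproduced here), and the `evi extend-vlan` keyword should be verified:

```
interface Tunnel 1
 evi extend-vlan 100 to 200       # hypothetical VLAN range; each VLAN may
                                  # map to only one EVI network ID
```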
All the MAC addresses learned in all VLANs for a particular network ID are announced by EVI IS-IS. These MAC addresses are announced to remote-site peers along with their VLAN ID. This is how EVI maintains separate MAC tables per VLAN. Note In this example VLAN 4001 serves as the uplink transport. This VLAN must never be configured as an extended VLAN.
Step 6: Configure ENDS A previous step enabled the GRE tunnel by specifying the tunnel source. Tunnel destinations are automatically discovered by ENDP. Figure 8-15 reveals the configuration of an ENDP Server, or ENDS.
Figure 8-15: Step 6: Configure ENDS
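The ENDS role is enabled on the tunnel interface, roughly as follows (command name approximate):

```
interface Tunnel 1
 evi neighbor-discovery server enable   # this device becomes an ENDS and
                                        # registers with itself as a client
```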
Up to two servers can be configured. An EVI edge device configured as an ENDS automatically registers with itself as a client. This configuration is added on the EVI tunnel interface. Since one tunnel interface is configured for each network ID, a separate neighbor discovery topology exists for each EVI network ID.
Step 7: Configure EVI Neighbor Discovery Client EVI neighbor discovery is enabled by configuring ENDP clients (ENDC) with the IP address of the server, as configured in the previous step. A client can be configured with up to two servers. A server can be configured with only one other server, since it is already acting as one of the two possible servers. In Figure 8-16, the top example could be for a two-site deployment. Edge device Site2-125001-Tunnel1 is configured as an ENDC. This continues the example from Figure 8-15, in which Site1-125001-Tunnel1 was configured as an ENDS.
Figure 8-16: Step 7: Configure EVI Neighbor Discovery Client
The next example in Figure 8-16 is for a three-site deployment. Site 1 and Site 2 edge devices are each configured as servers, and then configured as a client of each other. Site 3’s edge device is configured purely as a client, and so both servers are specified.
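Both deployments described above might be configured roughly as follows. The second server address is a hypothetical example, and the command syntax is approximate:

```
# Two-site deployment: site 2 is a client of the site 1 ENDS
interface Tunnel 1
 evi neighbor-discovery client enable 10.1.0.2

# Three-site deployment: site 3 is purely a client of both servers
interface Tunnel 1
 evi neighbor-discovery client enable 10.1.0.2
 evi neighbor-discovery client enable 10.3.0.2   # hypothetical second ENDS
```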
Step 8: Verify As shown in Figure 8-17, there are several verification commands available to validate your configuration efforts.
Figure 8-17: Step 8: Verify
The neighbor discovery summary and member commands reveal current EVI status and registered addresses. The client summary options summarize client configuration and provide an overview of all the remote IP addresses that have been learned through the neighbor discovery protocol. The “display evi link interface tunnel” command shows the status of the EVI links. The “display interface” command shows information about logical Layer 2 EVI links. The control plane protocol can be examined with “display evi isis brief”, allowing you to see remote and local MAC addresses.
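Pulled together, the verification commands mentioned above might be run as follows (exact option keywords may differ by release):

```
display evi neighbor-discovery server summary   # ENDS status
display evi neighbor-discovery server member    # registered member addresses
display evi neighbor-discovery client summary   # remote IPs learned via ENDP
display evi link interface tunnel 1             # EVI link status
display evi isis brief                          # control plane: local and
                                                # remote MAC addresses
```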
Advanced Step 9: ARP Suppression ARP flooding suppression minimizes ARP broadcast traffic over EVI links. This must be configured separately on each edge device, since each edge device performs local ARP caching.
As in the example in Figure 8-18, this configuration is performed under the tunnel interface, so each tenant’s EVI network ID can function separately.
Figure 8-18: Advanced Step 9: ARP Suppression
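Since the figure is an image, here is an approximate sketch; the suppression keyword in particular is an assumption and must be checked against the EVI command reference:

```
interface Tunnel 1
 evi arp-suppression enable       # keyword approximate; applied per tunnel,
                                  # so each EVI network ID is handled separately
```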
Advanced Step 10: Selective Flooding Selective flooding provides administrative control over traffic flows. By default, broadcasts are always flooded, while unknown unicast and unregistered multicast frames are flooded only on local interfaces. The network administrator can enable traditional flooding for selected MAC addresses on one or more VLANs. In Figure 8-19, the top example is for a Microsoft NLB MAC address. The MAC address to be flooded is configured on the tunnel interface. In the example, whenever local packets in VLAN 10 are received with the specified destination MAC address, they will be flooded to all remote sites. This is independent of the receipt of any IGMP join or PIM register messages.
Figure 8-19: Advanced Step 10: Selective Flooding
For typical multicast traffic, it is important to ensure that IGMP and PIM protocol messages are flooded between data centers. These protocols use Layer 3 multicast destination addresses 224.0.0.1 and 224.0.0.13, which translate to the Layer 2 MAC addresses 0100-5e00-0001 and 0100-5e00-000d. In the case of Figure 8-19, this selective flooding is enabled for VLAN 10.
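A sketch of both cases; the NLB cluster MAC below is a placeholder (the figure's actual value is not reproduced here) and the command syntax is approximate:

```
interface Tunnel 1
 # Microsoft NLB cluster MAC (placeholder value), flooded in VLAN 10
 evi selective-flooding mac-address 0200-0000-0001 vlan 10
 # IGMP and PIM control traffic for VLAN 10
 evi selective-flooding mac-address 0100-5e00-0001 vlan 10
 evi selective-flooding mac-address 0100-5e00-000d vlan 10
```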
Advanced Step 11: Virtual-system IDs You learned earlier that a single EVI LSP can handle 56,320 MAC addresses, and that this defines the upper limit of MAC addresses per data center site. Virtual system IDs can be defined to accommodate more than this default maximum.
Virtual system IDs are defined from within the EVI IS-IS process. The EVI IS-IS process ID is unique for each EVI network ID, and is based on the EVI tunnel interface number. Since Tunnel 1 was defined, the EVI IS-IS process is also 1. Remember that the default limit of 56,320 is per data center, not per deployment. If you have four data centers, each with 20,000 MAC addresses, there are 80,000 MAC addresses in the deployment. The scenario in Figure 8-20 does not require virtual system IDs. Each data center site announces its own LSP of 20,000 addresses, which is well below the default maximum. If an individual site expanded over time to over 56,320 MAC addresses, then virtual system IDs would be required.
Figure 8-20: Advanced Step 11: Virtual-system IDs
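If a site did grow past the limit, the configuration might look roughly like this; the `virtual-system` keyword and the ID value are assumptions to be verified against the EVI IS-IS command reference:

```
evi isis 1                        # process ID matches tunnel interface 1
 virtual-system 0000.0000.0011    # one virtual system ID per additional
                                  # block of 56,320 MAC addresses
```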
Summary In this chapter you learned: ■ EVI was developed by HP to extend Layer 2 networks across Layer 3 networks to remote data centers. EVI is based on the IS-IS link-state routing protocol, and uses MAC-in-GRE tunnels to form peerings over any existing routed IP infrastructure. ■ With support for up to eight interconnected data centers, several optimization techniques are available, including ARP suppression, VRRP Hello blocking, and selective flooding. ■ An EVI edge device provides EVI services by forming IS-IS adjacencies through GRE tunnels. A single EVI deployment can accommodate multiple tenants, since each tenant’s GRE tunnel set is defined by a unique EVI network ID. ■ Tunnels are dynamically created as new sites are brought up and down through use of the EVI Neighbor Discovery Protocol (ENDP). Once the tunnels are formed, EVI IS-IS advertises MAC reachability information to remote sites. This enables efficient data forwarding for endpoints. ■ Local MAC learning is handled in the data plane, in the same way that traditional Ethernet switches glean addresses from the source address field of an Ethernet frame. Learning MAC addresses for remote devices is handled at the control plane, by the EVI IS-IS protocol.
■ Default EVI forwarding mechanisms efficiently handle most unicast, multicast, and broadcast traffic. However, these defaults can be modified to handle special cases, including ARP broadcast suppression and VRRP Hello frames. ■ EVI configuration includes setting up transport and tunnel interfaces, network IDs, extended VLANs, and ENDP. ARP suppression, selective flooding, virtual system IDs, and site IDs are optional.
Learning Check Answer each of the questions below. 1. What are three goals of EVI (Choose three)? a. Support connectivity between three or more data centers. b. Extend Layer 2 networks across data centers. c. Use Layer 3 transport mechanisms without requiring MPLS connectivity. d. Provide several techniques to optimize and tune Layer 3 connectivity. e. Enable VMs to be easily moved to any data center. 2. Choose three correctly described components of a typical EVI deployment (Choose three). a. The device that provides EVI services is called the Edge device. b. Each participating VM host is assigned a VNI. c. The ENDP provides public transport IP address registration. d. EVI IS-IS is used as a control protocol for EVI. e. EVI uses an ENDC server to register MAC addresses. 3. Internal EVI interfaces perform classic MAC learning, while EVI-facing links leverage EVI IS-IS for MAC learning at the control plane. a. True. b. False. 4. What are three steps for EVI neighbor discovery (Choose three)? a. The ENDP Client registers the transport IP with the ENDP Server. b. The ENDC queries the ENDS to retrieve active remote IP addresses. c. The ENDS devices update each other’s ARP cache. d. Edge devices can set up EVI tunnels to active IP addresses.
e. IPSec tunnels respond to access list permit statements. 5. What are two advanced configuration options for EVI (Choose two)? a. Multicast routing protocol configuration b. Selective flooding c. EVI Site-ID configuration d. ARP suppression
Learning Check Answers 1. b, c, e 2. a, c, d 3. a 4. a, b, d 5. b, d
9 MPLS Basics
EXAM OBJECTIVES In this chapter, you learn to: ✓ Describe Multiprotocol Label Switching (MPLS). ✓ Understand the basic operation of MPLS. ✓ Describe the MPLS encapsulation. ✓ Clarify several MPLS misconceptions. ✓ Describe the behavior of a Label Switching Router (LSR). ✓ Describe a Label Switched Path (LSP). ✓ Describe a Forwarding Equivalence Class (FEC).
INTRODUCTION Multiprotocol Label Switching (MPLS) provides connection-oriented label switching over connectionless IP backbone networks. It integrates both the flexibility of IP routing and the simplicity of Layer 2 switching.
MPLS Advantages Multiprotocol Label Switching (MPLS) delivers the following advantages: ■ High speed and efficiency—MPLS uses short- and fixed-length labels to forward packets, avoiding complicated routing table lookups. ■ Multiprotocol support—MPLS resides between the link layer and the network layer. It can work over various link layer protocols (for example, PPP, ATM, frame relay, and Ethernet) to provide connection-oriented services for various network layer protocols (for example, IPv4 and IPv6).
■ Good scalability—The connection-oriented switching and multi-layer label stack features enable MPLS to deliver various extended services, such as VPN, traffic engineering, and QoS.
MPLS History Multiprotocol Label Switching (MPLS) was initially proposed as a solution to improve the forwarding speed of routers. In the past, routers performed software-based routing using a central CPU rather than using ASICs on line cards. Every packet that arrived on an interface required a routing table lookup, which was both time consuming and processor intensive. This was exacerbated when routers had very large routing tables. Switches in the past provided Layer 2 hardware forwarding, but typically could not route packets or had limited Layer 3 capabilities. Most switches also only supported Ethernet connections and not Layer 2 encapsulations such as ATM or Frame Relay. One of the original objectives of MPLS was to provide routing information base (RIB) lookups close to the forwarding performance of Layer 2 switching. To enable this, it was envisioned that MPLS core devices would use labels rather than routing tables, performing simple label swaps rather than complex routing table lookups. The premise was to program the simple forwarding table used by MPLS via a higher level protocol. The MPLS forwarding table, or label forwarding information base (LFIB), would be derived from information contained in the routing table. This original MPLS goal of increased speed through label switching is less relevant today because of hardware-based routing. In today's networks, devices perform Layer 3 forwarding in hardware using ASICs and can therefore route at wire speed. The past performance issue of slower Layer 3 forwarding is negated by the use of hardware-based forwarding information bases (FIBs) built from software routing information bases (RIBs). MPLS is still valid today because it abstracts the backbone network from the user network. Customer services can be transported transparently across the MPLS backbone network with greater ease and scalability.
MPLS benefits include improved performance (not as relevant today) and reduced total cost of ownership because of flexible network deployments. Both Layer 2 and Layer 3 customer connections are supported across an MPLS network. Large scale networks are supported providing connections from multiple customers or business units. MPLS can also utilize existing underlying network infrastructures including Frame Relay, ATM and Ethernet.
MPLS provides better security and isolation of customer networks. It is far simpler and more scalable to configure MPLS L3VPNs than to try to separate customer traffic using access control lists. MPLS supports dynamic reconvergence when a link or device in the network fails. Traditional protocols such as OSPF and BGP are used in addition to MPLS-specific protocols to provide automatic failover and path recalculation. This provides redundancy to any services utilizing the core MPLS infrastructure, including L2VPNs or L3VPNs. MPLS supports advanced traffic engineering, allowing network operators to select other path selection algorithms in addition to the default path selection algorithms of bandwidth or hop count used by traditional routing protocols.
MPLS Application Uses MPLS provides many advantages to service providers with multiple MPLS application use cases including L2VPNs, L3VPNs and VPLS. One major advantage of MPLS is the abstraction of the core network from customer devices. The customer network devices are unaware of the underlying MPLS core network. The core MPLS network devices are also protected from spurious or malicious routing updates. Management access to the core network devices is also restricted because of customer isolation. Another advantage is customer separation across a shared infrastructure – multiple customers can use the same core MPLS network, but be separated in a logical way, similar to how VLANs separate customers in traditional Ethernet environments. MPLS supports customer networks that use their own IP addressing scheme which may overlap with other customers. MPLS supports both Layer 3 IP and Layer 2 Ethernet VPN services between customer sites across a shared backbone network. A customer Layer 2 network can cross multiple routed core network devices which are invisible to the customer. Customers can also implement their own MPLS networks transporting Layer 2 or Layer 3 business unit traffic across an existing Layer 3 infrastructure. Customers can therefore easily create isolated networks similar to VLANs, but with the added advantage that MPLS networks can span routed links and disparate sites. A second important use case for MPLS is advanced traffic engineering (TE). The details of MPLS TE are out of scope of this study guide, but a brief overview is
provided here for completeness. Network congestion can degrade backbone network performance. Congestion can occur when network resources are inadequate or when load distribution is unbalanced. Traffic engineering (TE) is intended to avoid the latter situation, where partial congestion might occur because of improper resource allocation. In traditional IP routing, traffic is typically forwarded based on a single best path to a destination (or a limited number of paths with equal cost multipath selection). Route selection is based on bandwidth and not on other dynamic factors such as traffic load. MPLS TE can load share traffic across multiple unequal paths based on real time network conditions. As an example, consider a network with two WAN connections between two remote sites. One WAN connection is a high speed, expensive connection while the other is a low speed, inexpensive link. Configuring policy based IP routing for load sharing of certain traffic types across one link and other traffic types across the second link is possible, but is labor intensive and error prone. Manual policy based routing configuration is also not scalable or dynamic. This is not a flexible or practical method in global service provider networks. MPLS TE provides an easier alternative for traffic path selection using scalable, easier to configure policies that can also dynamically adjust to network conditions such as load or link failure. TE can make the best use of network resources and avoid uneven load distribution through real-time traffic monitoring and dynamic tuning of traffic management attributes, routing parameters, and resource constraints. MPLS TE combines the MPLS technology and traffic engineering. It reserves resources by establishing LSP tunnels along the specified paths, allowing traffic to bypass congested nodes to achieve appropriate load distribution. MPLS TE features simplicity and good scalability.
With MPLS TE, a service provider can deploy traffic engineering on the existing MPLS backbone to provide various services and optimize network resources management.
Supported Products Both Comware routers and switches include support for MPLS (model dependent).
Switches All Comware chassis-based switches support MPLS, with the exception of certain models when used in combination with basic line processing units (LPUs). Please refer to the datasheets of individual switches for more information. MPLS support on HP Comware fixed-port switches is limited to the high-end and data center models. As an example, a 5500HI switch includes full MPLS support, whereas a 5500 switch running either the standard or enhanced image does not support MPLS. 5800 series switches support MPLS, but be aware that the 5820 series switches do not. Comware 7 devices such as the 5900 series switches also support MPLS. Switches that support MPLS include the following functionality: basic MPLS, L3VPN, L2VPN (for Ethernet only), and VPLS.
Routers All chassis-based routers support MPLS. In addition, all MSR routers support MPLS except for the MSR900 series routers. Routers that support MPLS include the following functionality: basic MPLS, L3VPN, and L2VPN (for Ethernet and other media types such as ATM). Router support for VPLS (a point-to-multipoint VPN feature) is limited to high-end routers. HP Virtual Services Routers (VSRs) also support MPLS, including MPLS VPNs and MPLS Traffic Engineering (MPLS TE).
MPLS Terminology Table 9-1 briefly describes various MPLS terms. Some of these are discussed in more detail later in the chapter.
Table 9-1: MPLS terms
■ Customer edge (CE): A CE device is a customer network device directly connected to the service provider network. It can be a network device (such as a router or a switch) or a host. It is unaware of the existence of any VPN, nor does it need to support MPLS.
■ Forwarding equivalence class (FEC): As a forwarding technology based on classification, MPLS groups packets to be forwarded in the same manner into a class called the forwarding equivalence class (FEC). That is, packets of the same FEC are handled in the same way.
■ Label: A label uniquely identifies a FEC and has local significance.
■ Label switch router (LSR): A router that performs MPLS forwarding is a label switching router (LSR).
■ Label switched path (LSP): An LSP is the path along which packets of a FEC travel through an MPLS network. An LSP is a unidirectional packet forwarding path. Two neighboring LSRs are called the "upstream LSR" and "downstream LSR" along the direction of an LSP.
■ Label Forwarding Information Base (LFIB): The LFIB on an MPLS network functions like the Forwarding Information Base (FIB) on an IP network. When an LSR receives a labeled packet, it searches the LFIB to obtain information for forwarding the packet, such as the label operation type, the outgoing label value, and the next hop.
■ Label block: A label block is a set of labels. It includes the following parameters. Label base (LB): the initial label value of the label block; a PE automatically selects an LB value that cannot be manually modified. Label range (LR): the number of labels that the label block contains. The LB and LR determine the labels contained in the label block. For example, if the LB is 1000 and the LR is 5, the label block contains labels 1000 through 1004. Label-block offset (LO): the offset of a label block. If the existing label block becomes insufficient as the VPN sites increase, you can add a new label block to enlarge the label range. A PE uses an LO to identify the position of the new label block. The LO value of a label block is the sum of the LRs of all previously assigned label blocks. For example, if the LR and LO of the first label block are 10 and 0, the LO of the second label block is 10. If the LR of the second label block is 20, the LO of the third label block is 30. A label block whose LB, LO, and LR are 1000, 10, and 5 is represented as 1000/10/5. Assume that a VPN has 10 sites, and a PE assigns the first label block LB1/0/10 to the VPN. When another 15 sites are added, the PE keeps the first label block and assigns the second label block LB2/10/15 to extend the network. LB1 and LB2 are the initial label values that are randomly selected by the PE.
■ Pop label: The top of stack label is removed. The packet will be forwarded based on the remaining label stack (if labels remain in the stack) or the Layer 3 header (if no labels remain).
■ Swap label: The top label in the label stack is removed and replaced (changed or swapped) with a new label.
■ Insert / Impose / Push label: Various terms used for the addition of a new label to a non-MPLS packet, or of a label added to the top of stack of an MPLS packet.
■ MPLS Traffic Engineering (MPLS-TE): MPLS-TE focuses on the optimization of overall network performance. It is intended to conveniently provide highly efficient and reliable network services. The performance objectives associated with TE are either traffic oriented to enhance quality of service (QoS) or resources oriented to optimize resource (especially bandwidth) utilization. TE helps optimize network resource use to reduce network administrative cost, and dynamically tunes traffic when congestion or flapping occurs. In addition, it allows ISPs to provide added value services.
■ Provider device (P): P devices do not directly connect to CEs. They only need to forward user packets between PEs using label swapping.
■ Provider edge (PE): A PE device is a service provider network device connected to one or more CEs. It provides VPN access by mapping and forwarding packets from user networks to public network tunnels, and from public network tunnels to user networks.
■ Route distinguisher (RD): An RD is added before a site ID to distinguish the sites that have the same site ID but reside in different VPNs. An RD and a site ID uniquely identify a VPN site.
■ Route target (RT): PEs use the BGP route target attribute (also called the "VPN target" attribute) to manage BGP L2VPN information advertisement. PEs support the following types of route target attributes. Export target attribute: when a PE sends L2VPN information (such as site ID, RD, and label block) to the peer PE in a BGP update message, it sets the route target attribute in the update message to the export target. Import target attribute: when a PE receives an update message from the peer PE, it checks the route target attribute in the update message. If the route target value matches an import target, the PE accepts the L2VPN information in the update message. Route target attributes determine which PEs can receive L2VPN information, and from which PEs a PE can receive L2VPN information.
MPLS Forwarding Equivalence Class (FEC) The first MPLS term described in more detail is the MPLS Forwarding Equivalence Class (FEC). A FEC is a group of data packets with similar or identical parameters which use the same MPLS labels and are forwarded in the same way through an MPLS network. As an analogy, a FEC can be compared to an IP prefix such as 10.1.1.0/24. In a production network, it is unlikely that multiple packets will have exactly the same header values or packet content. Typically, the destination subnet for multiple packets will be the same while other values (TCP port, ToS, source IP address, and so on) differ. As an example, when using traditional IP routing, ping or telnet traffic to the same destination subnet will use the same routing entry in the IPv4 routing table even though the Layer 4 protocol is different. Traffic from multiple sources to the same destination may also use the same entry. For example, sources 10.2.2.1 and 10.2.2.2 may both send traffic to 10.1.1.1, and traffic from either source host will match the same routing table entry to reach host 10.1.1.1 (destination subnet 10.1.1.0/24). In an IP routing table, any IP packets that match the destination IP prefix will follow the same path or paths through the network. In the same way, multiple flows from different sources may use the same FEC to reach a destination network. A FEC typically groups packets by destination IP address rather than grouping packets that have identical headers. This raises the question: what is the difference between a FEC and an IP prefix? IPv4 unicast routing is based on the destination IPv4 address in a packet. With MPLS, in contrast, the assignment of the data path is very flexible and does not have to be based solely on the destination IPv4 address. FEC selection can be based on several options, including source IP address, Layer 2 characteristics, or source interface.
As an example, you could assign all data (IPv4, IPv6, ARP) arriving on an interface to a specific FEC and then forward those packets in the same way.
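The longest-prefix idea behind destination-based FEC grouping can be sketched in Python. The FEC names and prefixes below are illustrative, not from the text; the point is that flows differing in source address or Layer 4 protocol still classify to the same FEC.

```python
import ipaddress

# Hypothetical FEC table: destination prefix -> FEC name (illustrative only)
fecs = {
    ipaddress.ip_network("10.1.1.0/24"): "FEC-A",
    ipaddress.ip_network("0.0.0.0/0"):   "FEC-DEFAULT",
}

def classify(dst_ip):
    """Return the FEC whose prefix matches dst_ip most specifically."""
    matches = [(net, fec) for net, fec in fecs.items()
               if ipaddress.ip_address(dst_ip) in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

# Ping from 10.2.2.1 and telnet from 10.2.2.2 toward 10.1.1.0/24 differ in
# source and Layer 4 protocol, yet share one FEC:
print(classify("10.1.1.1"))   # FEC-A
print(classify("10.1.1.99"))  # FEC-A
print(classify("192.0.2.5"))  # FEC-DEFAULT
```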
MPLS Label An MPLS label uniquely identifies an FEC and has local significance.
A label is inserted between the Layer 2 header and Layer 3 header of a packet by a label switch router (LSR), as illustrated in Figure 9-1. The MPLS header is often referred to as a shim or Layer 2 ½ header in reference to the OSI model.
Figure 9-1: MPLS Label
The MPLS header is neither the Layer 2 nor the Layer 3 header of the OSI model. The Layer 2 header could be Ethernet, PPP, Token Ring, or another Layer 2 encapsulation. The Layer 3 header could be IPv4 or IPv6. The MPLS header is inserted between an Ethernet header and IPv4 header. The MPLS header is four bytes (32 bits) long and consists of the following fields: ■ Label—20-bit label value. ■ TC—3-bit traffic class, used for QoS. It is also referred to as the experimental bits (Exp). ■ S—1-bit bottom of stack flag. A label stack can comprise multiple labels. The label nearest to the Layer 2 header is called the "top label," and the label nearest to the Layer 3 header is called the "bottom label." The S field is set to 1 if the label is the bottom label and set to 0 if not. This is used with MPLS L2VPNs and L3VPNs. ■ TTL—8-bit time to live field used for loop prevention. This is similar to the way TTL works in IPv4. MPLS labels are locally significant. That means that the label is only meaningful to the next hop LSR. No single end-to-end label value is assigned to a FEC. The label typically changes on a per hop basis and is different between each router hop. It is possible that a next hop LSR allocates the same label randomly to the same FEC. This occurrence of both ingress and egress label being the same is a random event and should not be expected. Labels typically change on a per hop, per FEC basis. The MPLS label field is 20 bits in length. Label numbers are therefore in the range 0
to 1,048,575. Labels can be assigned manually or allocated by MPLS control protocols. For scalability reasons, labels are typically created and allocated automatically by label distribution protocols such as the Label Distribution Protocol (LDP), rather than being assigned manually by administrators. Label distribution protocols include LDP, Multiprotocol BGP (MBGP), constraint-based OSPF, and RSVP. This study guide focuses on LDP. MPLS supports label stacking - rather than a single label being inserted, multiple labels can be inserted in the MPLS header using a label stack. MPLS implementations such as L2VPNs and L3VPNs require multiple MPLS labels. Typically, the outer label identifies the peer PE device (next-hop device in BGP) and the inner label identifies the VPN or circuit. The S-Bit or S field indicates bottom of stack when set to 1. This means that the current label is the last label in the stack. If the S-Bit is set to 0, it indicates that there are more labels in the stack. There are several reserved label values, as explained in RFC 3032: ■ A value of 0 represents the "IPv4 Explicit NULL Label". This label value is only legal at the bottom of the label stack. It indicates that the label stack must be popped, and the forwarding of the packet must then be based on the IPv4 header. This can be used for Penultimate Hop Popping (PHP), which is explained later in the chapter. ■ A value of 1 represents the "Router Alert Label". This label value is legal anywhere in the label stack except at the bottom. When a received packet contains this label value at the top of the label stack, it is delivered to a local software module for processing. The actual forwarding of the packet is determined by the label beneath it in the stack. However, if the packet is forwarded further, the Router Alert Label should be pushed back onto the label stack before forwarding. The use of this label is analogous to the use of the "Router Alert Option" in IP packets.
Since this label cannot occur at the bottom of the stack, it is not associated with a particular network layer protocol. ■ A value of 2 represents the "IPv6 Explicit NULL Label". This label value is only legal at the bottom of the label stack. It indicates that the label stack must be popped, and the forwarding of the packet must then be based on the IPv6 header. ■ A value of 3 represents the "Implicit NULL Label". This is a label that an LSR may assign and distribute, but which never actually appears in the encapsulation.
When an LSR would otherwise replace the label at the top of the stack with a new label, but the new label is "Implicit NULL", the LSR will pop the stack instead of doing the replacement. Although this value may never appear in the encapsulation, it needs to be specified in the Label Distribution Protocol, so a value is reserved. ■ Values 4-15 are reserved. Note The label range for Circuit Cross Connect (CCC) and Static Virtual Circuit (SVC) is from 16 to 1023. These labels are reserved for static LSPs.
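The 32-bit layout of a label stack entry described above (20-bit label value, 3-bit TC/Exp field, 1-bit S field, 8-bit TTL) can be sketched with a small illustrative helper. This is purely a teaching aid, not part of any HP product:

```python
# Illustrative model of one 32-bit MPLS label stack entry:
# 20-bit label | 3-bit TC/Exp | 1-bit S (bottom of stack) | 8-bit TTL

def encode_label_entry(label, tc, s, ttl):
    """Pack one label stack entry into a 32-bit integer."""
    assert 0 <= label <= 1048575      # 20-bit label space (0 to 1,048,575)
    return (label << 12) | (tc << 9) | (s << 8) | ttl

def decode_label_entry(entry):
    """Unpack a 32-bit label stack entry into its fields."""
    return {
        "label": entry >> 12,
        "tc": (entry >> 9) & 0x7,
        "s": (entry >> 8) & 0x1,
        "ttl": entry & 0xFF,
    }

# A two-level stack as used by L2VPNs/L3VPNs: outer label 1000 (S=0
# because more labels follow), inner VPN label 2001 (S=1, bottom).
outer = encode_label_entry(1000, 0, 0, 255)
inner = encode_label_entry(2001, 0, 1, 255)
```

Decoding `outer` and `inner` recovers the original fields, and the S-bit distinguishes the bottom entry from the rest of the stack.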
MPLS Label Switch Router (LSR)

The MPLS Label Switch Router (LSR) is the device that actually performs label switching. LSRs forward packets using label switching and run the MPLS control plane protocols required to set up the label switched path (LSP). An MPLS network comprises the following types of LSRs:

■ Ingress LSR—The LSR where packets enter the MPLS network. It labels packets entering the MPLS network.
■ Transit LSR—An intermediate LSR in the MPLS network. The transit LSRs on an LSP forward packets to the egress LSR according to labels.
■ Egress LSR—The LSR where packets exit the MPLS network. It removes labels from packets and forwards the packets to their destination networks.

There are also two LSR roles dependent on location in a core MPLS network:

■ Provider device (P)—P devices do not directly connect to CEs, but are backbone core devices. They only need to forward user packets between PEs. These devices typically swap labels (transit LSR).
■ Provider edge (PE)—A PE device is a service provider network device connected to one or more CEs. It provides VPN access by mapping and forwarding packets from user networks to public network tunnels and from public network tunnels to user networks. These devices typically insert and pop labels (ingress or egress LSR). The PE device provides most MPLS features such as L2VPNs and L3VPNs and has the most complex configuration. Incoming traffic is selected and labels are inserted or removed based on the FEC.
P devices are unaware of the additional labels used for MPLS implementations such as L2VPNs and simply swap labels. These devices tend to have simple configurations.

A third device type is the Customer Edge (CE) device. The CE does not have an LSR role, but is rather a customer device at a customer site that is unaware of the MPLS network. This device runs traditional routing and switching and connects to the PE device. In an unmanaged MPLS CE environment, the CE device is configured and managed by the customer and the PE device is managed by the service provider. In a fully managed environment, the service provider manages both PE and CE devices.

LSRs may perform the following actions on labels:

■ Insert / Impose / Push label: A new label is added to a non-MPLS packet or pushed onto the top of the label stack of an MPLS packet. Typically performed by a PE device when packets enter the MPLS network. The PE looks up which FEC the packet is assigned to and then, based on the FEC, inserts a label into the packet. The packet is then transmitted to the core MPLS network (typically to a P router).
■ Swap label: The top-of-stack label is removed and replaced (changed or swapped) with a new label. Typically performed by a P device when packets move from one interface to another in the core MPLS network. Labels are swapped on a hop-by-hop basis by each LSR in the LSP.
■ Pop / remove label: The top-of-stack label is removed. The packet is forwarded based on the remaining label stack (if labels remain in the stack) or the Layer 3 header (if no labels remain). Typically performed by a PE device when packets leave the MPLS network. Packets that exit the MPLS network are typically unchanged from when they entered the MPLS network.
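The three label actions can be sketched as simple stack operations. This is an illustrative model only; the label values are arbitrary examples:

```python
# Toy model of the three label actions an LSR can perform.
# The top of the stack is the last list element here.

def push(stack, label):
    """Ingress PE: impose a new top-of-stack label."""
    return stack + [label]

def swap(stack, new_label):
    """Transit P: replace the top-of-stack label with a new one."""
    return stack[:-1] + [new_label]

def pop(stack):
    """Egress PE: remove the top-of-stack label."""
    return stack[:-1]

stack = push([], 1000)      # PE inserts a label based on the packet's FEC
stack = swap(stack, 1001)   # each P LSR swaps the label hop by hop
stack = pop(stack)          # the egress LSR removes the label
```

After the pop, no labels remain, so forwarding would fall back to the Layer 3 header, matching the behavior described above.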
MPLS Label Switch Path (LSP)

A label switched path (LSP) is the path along which packets of a FEC travel through an MPLS network. It is often referred to as an MPLS tunnel. As shown in Figure 9-2, the LSP is a unidirectional packet forwarding path. Two neighboring LSRs are called the "upstream LSR" and "downstream LSR" along the direction of an LSP.
Figure 9-2: MPLS Label Switch Path (LSP)
As an analogy, view the LSP as a chain of labels bound to each other, forming a path through the MPLS network. In Figure 9-2, PE1 on the left has an LSP to PE2 on the right. When PE1 pings PE2, it does not send IPv4 packets (EtherType 0x0800) to P1 in the MPLS network, but rather inserts a label before transmission to P1. The packet sent to P1 by PE1 is therefore an MPLS packet (EtherType 0x8847). The label inserted is based on advertisements from LDP and a predetermined path calculation made by the IGP or other mechanisms.

Each P LSR will receive the MPLS packet (EtherType 0x8847) and will label switch the packet between interfaces. Each P LSR will also swap labels based on the predetermined path and advertised labels. This will continue until the packet arrives at PE2. PE2 will pop the label and route the packet to the appropriate interface based on the IP prefix. The core routers are not aware of the contents of the packet, but simply swap labels.
This is why the LSP is often referred to as a tunnel. Packets are encapsulated in MPLS for transmission across the MPLS network. Any packet that is labeled by PE1 with the same label will follow the same predetermined LSP. The packets will arrive at PE2 from PE1 without any of the core routers being aware of the Layer 3 packet headers or packet content. LSPs are unidirectional. Therefore a separate LSP is created for traffic in the reverse direction (not shown in the figure). Different labels are used and a separate LSP is calculated for traffic from PE2 back to PE1. Two LSPs would therefore be required for bidirectional communication.
MPLS LSP Path Selection

How is the LSP path calculated? First, path selection can be based on a traditional interior gateway protocol (IGP) such as OSPF. The routing protocol calculates the best path based on its own metric, such as a bandwidth-derived cost (OSPF) or hop count (RIP). The LSP in turn follows the best path selected by the IGP. This provides an easy and convenient LSP path selection process.

A second option is to manually select the LSP using explicitly routed path selection. An administrator configures the LSP on a hop-by-hop basis by specifying the label to be used and the outgoing interface. This method is very flexible, but is also very labor intensive. In addition, there is no dynamic failover mechanism, and thus explicitly routed path selection is not used regularly in production environments.

In more complex MPLS networks, constraint-based routing is often used. To establish a Constraint-based Routed Label Switched Path (CRLSP), an administrator configures a routing protocol such as OSPF, but in addition specifies constraints, such as explicit paths or path restrictions. Links or routers can be included or excluded based on specified constraint criteria. An example of this type of traffic engineering is the forwarding of customer traffic across a high-speed, more expensive link and an additional low-speed, less expensive link. The high-speed link could be “marked” with a different “color” than the low-speed link. Based on a specified algorithm, traffic from only certain customers is transmitted across the high-speed link while other traffic uses the low-speed link. The mechanics of path selection are out of the scope of this study guide, but constraint-based routing is mentioned here for completeness.
MPLS Label Information Base (LIB)

An MPLS label information base (LIB) is the software table of all the possible forward equivalence classes (FECs) and the labels allocated to them; see Figure 9-3. This is similar in concept to the IPv4 routing information base (RIB), which is the software routing table of an IPv4 router.
Figure 9-3: MPLS Label Information Base (LIB)
In a traditional RIB, information such as the destination network, outgoing interface, and next-hop IP address is stored. In a LIB, similar information is stored, including the FEC (destination), outgoing interface, and associated label. The LIB is populated via routing protocols such as OSPF, ISIS, and RIP. Labels associated with FECs are added by MPLS control plane protocols such as LDP.

The LIB contains FEC-to-label mappings and label-to-label mappings. The FEC-to-label mappings associate ingress packets with MPLS labels. For example, ingress IPv4 traffic destined to IP prefix 10.1.1.0/24 may have label 1000 inserted. However, in the core MPLS network, labels are swapped. Therefore, a label of 1000 could be mapped to label 1001. MPLS traffic received by a P LSR with label 1000 on an ingress interface has the label swapped with label 1001 on egress from the P LSR.
The relationship between the LIB and LFIB mirrors that between the RIB and FIB: just as the RIB contains all known routes while the FIB contains only the best routes, the LIB contains all known FECs and labels, while the LFIB contains only the best paths.
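This contrast can be illustrated with a toy model. The peer names and labels below are hypothetical values, and `program_lfib` is an invented helper, not a real API:

```python
# The LIB (software) retains every FEC-to-label mapping learnt from
# any LDP peer; the LFIB (hardware) is programmed with only the
# mapping on the best path per FEC.

lib = {
    "10.1.2.0/24": [                      # all advertisements retained
        {"peer": "PE-2", "label": 2001},
        {"peer": "P-2",  "label": 3002},
    ],
}

def program_lfib(lib, best_peer_per_fec):
    """Keep only the mapping from the IGP-selected next-hop peer."""
    lfib = {}
    for fec, mappings in lib.items():
        best = best_peer_per_fec[fec]
        lfib[fec] = next(m for m in mappings if m["peer"] == best)
    return lfib

# Assume the IGP selected PE-2 as the next hop for this FEC.
lfib = program_lfib(lib, {"10.1.2.0/24": "PE-2"})
```

The LIB keeps both mappings, while the LFIB ends up with the single best one.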
MPLS Label Forwarding Information Base (LFIB)

The Label Forwarding Information Base (LFIB) is the hardware table as programmed in the ASICs of the MPLS label switch router; see Figure 9-4.
Figure 9-4: MPLS Label Forward Information Base (LFIB)
The LFIB in an MPLS network functions like the Forwarding Information Base (FIB) in an IP network. When an LSR receives a labeled packet, it searches the LFIB to obtain information for forwarding the packet, such as the label operation type, the outgoing label value, and the next hop. A FIB (hardware) is populated with information from the RIB (software). In the same way, an LFIB (hardware) is populated with information from the LIB (software). Information in the LIB is used to program the LFIB. While the LIB contains all known possible routes, only the “best path” for a FEC is programmed into the LFIB. The LFIB contains a FEC-to-label mapping and egress interface for traffic entering the MPLS network. In other words, when traffic is received by a PE from a CE, that
traffic may be IPv4 traffic. Before the PE transmits the traffic to a P LSR, the PE will insert a label. An IP prefix to label mapping is therefore stored in the PE LFIB. Traffic received by a P device from a PE device will be label swapped. Thus, a label to label mapping and egress interface is stored in the LFIB of the P device.
MPLS Forwarding Information Base (FIB)

As shown in Figure 9-5, an LSR consists of two components:

■ Control plane: Implements label distribution and routing, assigns labels, distributes FEC-label mappings to neighbor LSRs, creates the LFIB, and establishes and removes LSPs.
■ Forwarding plane: Forwards packets according to the LFIB.
Figure 9-5: MPLS Forwarding Information Base (FIB)
Tables:
■ Routing Information Base (RIB): Software-based version of the routing table.
■ Label Information Base (LIB): Software table of all the possible forward equivalence classes (FECs) and the labels allocated to them.
■ Forwarding Information Base (FIB): Hardware forwarding table based on the software routing information base (RIB).
■ Label Forwarding Information Base (LFIB): Hardware label forwarding table based on information in the software LIB.

An edge LSR forwards both labeled packets and IP packets via the forwarding plane and therefore uses either the LFIB (labeled packets) or the FIB (IP packets). An ordinary LSR only needs to forward MPLS labeled packets and therefore uses only the LFIB.
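An edge LSR's choice of table can be sketched as a dispatch on the EtherType. This is a simplified model; real hardware makes this decision in the ASIC:

```python
# Simplified forwarding-plane decision: the EtherType of the incoming
# frame determines which hardware table handles the packet.

ETHERTYPE_IPV4 = 0x0800   # plain IPv4 packet
ETHERTYPE_MPLS = 0x8847   # MPLS unicast labeled packet

def select_table(ethertype):
    """Return the table an edge LSR consults for this frame."""
    if ethertype == ETHERTYPE_MPLS:
        return "LFIB"     # labeled packets are label switched
    if ethertype == ETHERTYPE_IPV4:
        return "FIB"      # IP packets are routed
    raise ValueError("unsupported EtherType")
```

An ordinary (transit) LSR in this model would only ever see the `0x8847` case.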
MPLS Label Processing

MPLS label processing can be compared to IPv4 routing. In IPv4, a destination prefix needs to be learned via a dynamic routing protocol or configured manually using static routes. Information about a single IP prefix could be learnt in multiple ways. For example, both RIP and OSPF may have learnt about network 10.1.1.0/24 and would want to add that route to the routing table. Based on criteria such as the administrative distance, the best route is added to the routing table (RIB). The selected RIB entry information is then stored in the hardware routing table (FIB). Once the RIB and FIB are populated, traffic destined for the destination prefix can be processed in hardware without referring to the software RIB.

In a similar way, destination prefixes or FECs are learnt via routing protocols such as OSPF and label distribution protocols such as LDP, or manually configured in the LIB. The best path is programmed in the LFIB (in hardware). Once the labels have been programmed into the LFIB, traffic destined to the FEC can be processed in hardware. Later in this chapter, we will discuss how destination FECs are announced and processed, and then how traffic destined to a FEC is processed.
MPLS Label Distribution Protocol (LDP)

Destination FECs are announced by label distribution protocols such as LDP. LDP distributes destination-network-to-label mappings based on entries in the IPv4 routing table. The router's local IPv4 routing table contains networks learnt via routing protocols, and each IPv4 entry is a possible MPLS FEC. For each IPv4 prefix found in the local IPv4 routing table, a local label is assigned. Administrators can limit label assignment to only certain IPv4 prefixes by configuring the LSP trigger command appropriately.

LDP exchanges FEC-to-label mappings with LDP neighbors. A local router informs its LDP neighbors (peers) which label to use when sending traffic to the local router for a specific FEC. As an example, LSR1 may select label 1000 for network 10.1.1.0/24. LSR1 will then advertise label 1000 to peer LSRs using LDP. Those neighbors should then use label 1000 when sending traffic to LSR1 destined for network 10.1.1.0/24. All LDP neighbors learn which labels to use for all possible FECs advertised by the local LSR. When traffic is received from peer LSRs, the local LSR knows which FEC the traffic belongs to, as the LSR allocated those labels locally and advertised them to the peers.
MPLS LDP

The network in Figure 9-6 will be used as a starting point to explain label distribution. The network state is as follows:

■ IPv4 addresses configured.
■ OSPF configured and IPv4 routes exchanged.
Figure 9-6: MPLS LDP
When LDP is configured for label distribution, label advertisement may take place as follows:

1. Assume that PE-2 in Figure 9-6 has learned about subnet 10.1.2.0/24 from CE-2. This advertisement is a traditional IPv4 advertisement via routing protocols such as OSPF or RIP. MPLS and LDP are not used, as neither interface Gigabit Ethernet 1/0/0 on PE-2 nor the CE is configured for MPLS.
2. Assuming that the interface is currently up, subnet 10.1.2.0/24 will be shown as available in PE-2’s routing table. As OSPF is configured on PE-2 and is advertising routes, network 10.1.2.0/24 will also be advertised to P-1.
3. The OSPF process on P-1 will learn about network 10.1.2.0/24, and the route will be added to the IPv4 routing table with a next hop of PE-2 and a local outgoing interface of Gigabit Ethernet 1/0/0.
4. PE-2 in this example is configured to generate labels for all prefixes in the IPv4 routing table. PE-2 will allocate a locally significant label (2001 in this example) to subnet 10.1.2.0/24. This information is stored in both the LIB and LFIB tables.
5. PE-2 will advertise the label to P-1 using LDP. In other words, PE-2 is advertising to P-1 that subnet 10.1.2.0/24 is available using label 2001.
6. P-1 adds the update to its local LIB.
7. P-1 then associates label 2001 with subnet 10.1.2.0/24 in the FIB.
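Steps 4 through 7 above can be modeled in miniature. The class and method names are invented for illustration; the label value follows the figure:

```python
# Toy model of LDP label allocation and advertisement: PE-2 allocates
# a locally significant label for a FEC and advertises the binding to
# its LDP peer P-1, which records it in its LIB.

class Lsr:
    def __init__(self, name, next_label):
        self.name = name
        self.next_label = next_label   # next free local label value
        self.lib = {}                  # FEC -> {advertising LSR: label}

    def allocate_local_label(self, fec):
        """Allocate a locally significant label for a routing-table prefix."""
        label = self.next_label
        self.next_label += 1
        self.lib.setdefault(fec, {})[self.name] = label
        return label

    def receive_mapping(self, peer, fec, label):
        """Record a FEC-to-label mapping advertised by an LDP peer."""
        self.lib.setdefault(fec, {})[peer.name] = label

pe2 = Lsr("PE-2", next_label=2001)
p1 = Lsr("P-1", next_label=3001)

label = pe2.allocate_local_label("10.1.2.0/24")   # step 4
p1.receive_mapping(pe2, "10.1.2.0/24", label)     # steps 5 and 6
```

After the exchange, P-1's LIB records that traffic for 10.1.2.0/24 sent toward PE-2 must carry label 2001.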
MPLS Data Forwarding

As shown in Figure 9-7, assume that P-1 pings a host (10.1.2.10) on subnet
10.1.2.0/24. Traffic flows as follows:
Figure 9-7: MPLS Data Forwarding
1. The incoming traffic for data transmission is IPv4. This is because the local IPv4 stack is used when P-1 pings the IPv4 address 10.1.2.10. The ping application uses the ICMP protocol, which is carried over the IPv4 protocol (Layer 3). Because this is IPv4 traffic, the FIB table is used for hardware forwarding of the traffic.
2. In addition to the LIB containing a label associated with subnet 10.1.2.0/24, the label was also previously associated with the FIB table by P-1. In this example, label 2001 has been associated with subnet 10.1.2.0/24. When the packet is transmitted to PE-2, the MPLS label is inserted and the packet is transmitted as an MPLS labeled packet (EtherType 0x8847) to PE-2.
3. On receipt of the packet, PE-2 knows that the LFIB table should be used rather than the FIB table because of the packet header EtherType 0x8847. The entry in the LFIB table is POP, as the packet is going to an IPv4 interface that does not have MPLS enabled on it.
4. Once the label has been removed, the IPv4 header can be processed according to the FIB table. The FIB table has an entry for subnet 10.1.2.0/24 with an outgoing interface of G1/0/0.
5. The packet is then transmitted to the CE device as an IPv4 packet (EtherType 0x0800). The CE device is unaware that the packet was encapsulated using MPLS in the MPLS core.

Figure 9-8 shows an extended topology with the addition of PE-1.
Figure 9-8: MPLS LDP
■ P-1 will independently allocate a local label to the subnet 10.1.2.0/24. In this example, label 3001 was allocated.
■ P-1 will announce the subnet and label to PE-1 using LDP.
■ PE-1 will update the LIB table to indicate that it can get to subnet 10.1.2.0/24 using label 3001.
■ PE-1 will also update the FIB table to indicate that label 3001 should be inserted when traffic is transmitted to subnet 10.1.2.0/24.
MPLS Packet Forwarding—Part 1

Figure 9-9 shows packet processing for traffic through the extended MPLS network:
Figure 9-9: MPLS Packet Forwarding - Part 1
1. Assume that CE-1 pings 10.1.2.10, which is a host on subnet 10.1.2.0/24. Also assume that CE-1 has PE-1 configured as its default gateway.
2. When PE-1 receives the IPv4 traffic, PE-1 will check the FIB table as the incoming traffic is IPv4 traffic (EtherType 0x0800). The FIB table has an entry for the destination subnet (10.1.2.0/24) and outgoing interface of G1/0/0.
3. In addition, the label 3001 is associated with this subnet in the FIB. In this example, label 3001 has been associated with subnet 10.1.2.0/24. When PE-1 transmits the packet to P-1, the MPLS label is inserted and the packet is transmitted as an MPLS labeled packet (EtherType 0x8847).
4. When P-1 receives the MPLS labeled traffic, P-1 will check the LFIB table because the incoming traffic is MPLS traffic (EtherType 0x8847). The LFIB table has an entry for the label 3001, which should be swapped with label 2001 and transmitted out of interface G1/0/0.
5. P-1 swaps label 3001 with label 2001 and transmits the packet as an MPLS labeled packet (EtherType 0x8847) to PE-2. It is important to note that P-1 did not check the FIB or RIB to process the traffic. The router is routing traffic between interfaces without the use of the IPv4 routing table.
MPLS Packet Forwarding—Part 2

Figure 9-10 shows the second part of the packet processing for traffic through the MPLS network.
Figure 9-10: MPLS Packet Forwarding - Part 2
1. PE-2 receives the MPLS labeled traffic from P-1.
2. PE-2 will check the LFIB table as the incoming traffic is MPLS traffic (EtherType 0x8847). The LFIB table has an entry for the label 2001, which should be removed (popped) as the packet is going to an IPv4 interface that does not have MPLS enabled on it.
3. Once the label has been removed, the IPv4 header can be processed according to the FIB table. The FIB table has an entry for subnet 10.1.2.0/24 with an outgoing interface of G1/0/0.
4. The packet is then transmitted to the CE device as an IPv4 packet (EtherType 0x0800). Both CE devices are unaware that the packet was encapsulated using MPLS in the MPLS core.
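The whole forwarding path of Figures 9-9 and 9-10 can be condensed into a toy simulation. The function names are invented for illustration; labels 3001 and 2001 follow the figures:

```python
# End-to-end sketch: PE-1 pushes label 3001, P-1 swaps it for 2001,
# and PE-2 pops the label before handing plain IPv4 to CE-2.

def pe1_ingress(packet):
    """Ingress PE: FIB lookup on 10.1.2.0/24 says push label 3001."""
    packet["labels"].append(3001)
    packet["ethertype"] = 0x8847          # now an MPLS labeled packet
    return packet

def p1_transit(packet):
    """Transit P: LFIB entry says swap incoming 3001 for outgoing 2001."""
    assert packet["ethertype"] == 0x8847  # labeled traffic uses the LFIB
    assert packet["labels"][-1] == 3001
    packet["labels"][-1] = 2001
    return packet

def pe2_egress(packet):
    """Egress PE: LFIB entry says pop; remaining IPv4 header is routed."""
    assert packet["labels"].pop() == 2001
    if not packet["labels"]:
        packet["ethertype"] = 0x0800      # back to plain IPv4
    return packet

pkt = {"dst": "10.1.2.10", "labels": [], "ethertype": 0x0800}
pkt = pe2_egress(p1_transit(pe1_ingress(pkt)))
```

The packet leaves the simulated MPLS core exactly as it entered: an unlabeled IPv4 packet, with neither CE aware of the MPLS encapsulation.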
MPLS LDP - Part 1

In Figure 9-11, the core MPLS network is now expanded to include two P routers and additional links between devices.
Figure 9-11: MPLS LDP - Part 1
In MPLS environments, there are two label retention modes. The label retention mode specifies whether an LSR may maintain a label mapping for a FEC learned from a neighbor that is not its next hop. The two modes are:

■ Liberal label retention—Retains a received label mapping for a FEC regardless of whether the advertising LSR is the next hop of the FEC. This mechanism allows for quicker adaptation to topology changes, but it wastes system resources because LDP has to keep unused labels. Most MPLS routers support liberal label retention only.
■ Conservative label retention—Retains a received label mapping for a FEC only when the advertising LSR is the next hop of the FEC. This mechanism saves label resources, but it cannot quickly adapt to topology changes.

In the example in Figure 9-11, we are assuming that liberal label retention mode is used, as this is the most common implementation.

1. In Figure 9-11, PE-2 announces label 2001 to both P-1 and P-2.
2. P-1 and P-2 will update their LIB tables to include label 2001. In Figure 9-11, P-1 and P-2 both independently allocate a local label for subnet 10.1.2.0/24. P-1 allocates label 3001 and P-2 allocates label 3002.
3. Both P-1 and P-2 advertise their allocated labels to PE-1, which in turn updates its local LIB. In this case, PE-1 has learnt about two paths to subnet 10.1.2.0/24. PE-1 will also allocate its own local label to the subnet, in this case 4001.
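The difference between the two retention modes can be sketched as a filter over received mappings. This is an illustrative model; the peer names and labels follow the figure:

```python
# Liberal vs. conservative label retention: given the same received
# mappings, conservative retention discards any mapping whose
# advertiser is not the IGP next hop for the FEC.

received = [                           # mappings PE-1 received for the FEC
    {"peer": "P-1", "label": 3001},
    {"peer": "P-2", "label": 3002},
]

def retain(mappings, igp_next_hop, mode):
    """Apply the chosen label retention mode to received mappings."""
    if mode == "liberal":
        return list(mappings)          # keep every received mapping
    return [m for m in mappings if m["peer"] == igp_next_hop]

# Assume the IGP next hop for the FEC is P-1.
liberal = retain(received, "P-1", "liberal")
conservative = retain(received, "P-1", "conservative")
```

With liberal retention, the non-best mapping from P-2 is already in the LIB, which is what makes failover fast later in this chapter.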
MPLS LDP - Part 2

In addition to labels being advertised to upstream LSRs, labels are also advertised back along the paths on which they were received (downstream). LDP advertises all possible paths. In Figure 9-12:

1. PE-1 advertises its local label back to both P-1 and P-2.
2. P-1 and P-2 advertise their local labels to each other. In addition, they update their LIB tables with all labels received.
Figure 9-12: MPLS LDP - Part 2
In Figure 9-12, P-1 contains the following labels for subnet 10.1.2.0/24:

■ 2001 - label advertised by PE-2
■ 3002 - label advertised by P-2
■ 4001 - label advertised by PE-1

P-1 can therefore reach the FEC (10.1.2.0/24) via any of the three routers. LDP does not select the best path. Another mechanism is required to determine the best path to the destination. The same situation applies to PE-1 and P-2. They have multiple paths to the FEC (10.1.2.0/24). As mentioned previously, the routers are using liberal label retention. The advantage of this mechanism is that routers have already learned about multiple paths
to the same FEC and can react more quickly to topology changes. The disadvantage is overhead - more label information must be maintained by the routers.
MPLS LDP - Best Path Selection

MPLS control plane protocols do not calculate the best path. By default, the IGP is used for path selection (OSPF, ISIS, static routes). An IGP such as OSPF uses a bandwidth-based cost calculation to determine the best path to a destination network, and this in turn determines which LSP and labels are used for MPLS traffic. The LSRs will use the labels associated with the OSPF path selection next hop and outgoing interface.

The best path LDP peer is determined from the IGP next-hop IP address. The LSR determines the label and next hop by comparing the IGP next-hop IP address in the routing table with LDP peers. Once a match is made between the IGP next hop and an LDP peer IP address, the LSR can determine which label was advertised by that LDP peer and then use that label for label insertion or swapping. This information is then programmed into the LFIB, in addition to the outgoing interface.

In the case of a link failure, the IGP is also used to determine the new best path. When a link fails, OSPF, for example, will recalculate the best path to the destination prefix based on the bandwidth of the remaining links. A new next-hop IP address will then be associated with the route in the IP routing table. This in turn determines the new LDP peer to use as well as the new label to use. The new label and new outgoing interface are then programmed into the LFIB hardware table.
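The next-hop-to-peer matching logic can be sketched as follows. The next-hop address and interface name are hypothetical examples, and `build_lfib_entry` is an invented helper:

```python
# Sketch of LFIB programming: match the route's IGP next hop against
# known LDP peers, then program that peer's advertised label together
# with the outgoing interface.

route = {"fec": "10.1.2.0/24", "next_hop": "10.0.12.2", "iface": "G1/0/0"}

ldp_peers = {                         # keyed by peer IP address
    "10.0.12.2": {"name": "PE-2", "advertised_label": 2001},
}

def build_lfib_entry(route, ldp_peers):
    """Combine the IGP best path with the matching LDP peer's label."""
    peer = ldp_peers.get(route["next_hop"])
    if peer is None:
        return None                   # no LDP peer on the best path
    return {"fec": route["fec"],
            "out_label": peer["advertised_label"],
            "out_iface": route["iface"]}

entry = build_lfib_entry(route, ldp_peers)
```

On a link failure, the IGP supplies a new next hop; rerunning the same match against the retained LIB mappings yields the replacement LFIB entry.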
MPLS LDP Best Path

In Figure 9-13, it is assumed that OSPF is the IGP used by the LSRs and all interfaces are of equal cost.
Figure 9-13: MPLS LDP Best Path
OSPF routing table path selections for destination subnet 10.1.2.0/24 are shown in Figure 9-13. Both P-1 and P-2 have determined that the best path to prefix 10.1.2.0/24 is via PE-2. In the case of PE-1, the best path is via P-1.

LSRs compare the next-hop IP addresses in the IP routing tables against LDP peers. Once a matching LDP peer is found, the label advertised by that peer and the outgoing interface to that LDP peer are programmed into the LFIB table of each LSR. In the case of P-1, peer PE-2 and label 2001 are selected. This best route is then added to the LFIB of P-1. This is an important distinction - the LIB (software table) contains all possible paths, as seen in Figure 9-13. However, the LFIB (hardware table) only contains the best route (not shown in Figure 9-13).
MPLS LDP Best Path After Link Failure

In Figure 9-14, it is assumed that the link between P-1 and PE-2 and the link between P-1 and P-2 fail.
Figure 9-14: MPLS LDP Best Path After Link Failure
The only path available for P-1 to get to subnet 10.1.2.0/24 is via PE-1, P-2, and PE-2. OSPF will determine this and update the IPv4 routing tables accordingly. The IP routing table of P-1 will reflect the new next hop of PE-1 and outgoing interface G1/0/1 (link to PE-1). PE-1 will also update its routing table with a new next hop of P-2 and outgoing interface of G1/0/2 (link to P-2). The P-2 router does not need to change its routing entry as the best path is still active.

New LDP peers and labels are selected by both P-1 and PE-1. In the case of P-1, the new LDP peer is PE-1 and the advertised label of 4001 is now used for forwarding traffic to subnet 10.1.2.0/24. PE-1 will select a new LDP peer of P-2 and label of 3002. Because of liberal label retention, failover is very quick, as the labels were previously learnt and retained. After link failure, LFIBs are simply updated with the updated label. In the case of conservative label retention, the new labels would have to be discovered, which would slow down convergence.
MPLS Data Plane Label Processing Example

An LSR can receive various types of traffic from CE devices (IPv4, IPv6, IPX, Layer 2, Layer 2.5 labeled). In this study guide, only incoming IPv4 and incoming labeled traffic are discussed. An edge LSR (PE) typically receives IP traffic from a CE device. When the PE receives IP traffic destined to a subnet across an MPLS core, it will look up the destination IP prefix in the FIB. A corresponding forwarding equivalence class
(FEC) and label may be associated with the route. This label would have previously been announced by a LDP peer. The PE will insert the label and forward the traffic to the neighbor (typically a P LSR) based on the IGP selection process. The neighbor LSR will determine that the received packet is a labeled packet because of the EtherType. The LSR (typically a P router) will check the incoming label against the LFIB. If there is a match, the LSR will swap the label in hardware with a new label (LFIB). The new label is determined by the IGP next hop and in turn the neighbor LDP LSR.
MPLS Label Processing

In summary, refer to Figure 9-15 for MPLS label processing. In this topology, it is assumed that label announcement via LDP has been completed.
Figure 9-15: MPLS Label Processing
In this example, the Ingress PE device receives an IPv4 packet with a destination address of 10.1.2.10. This packet will match the forward equivalence class 10.1.2.0/24, and the Ingress LSR will find the corresponding label for that forward equivalence class. In this example, OSPF has selected a path via P2 and P3 to the Egress PE LSR. P2 has advertised the label 1035 to the Ingress PE to use for this FEC. The Ingress LSR will therefore insert a label of 1035 and send an MPLS labeled packet (EtherType 0x8847) to P2.

P2 will find a matching entry in its LFIB and then swap the label with a new label of 1096. The label of 1096 was previously advertised to P2 by P3 for this FEC. P2 will
forward the packet with the new label of 1096 to P3 (EtherType 0x8847). P3 will follow a similar process and find a matching entry in its LFIB and then swap the label with the label advertised previously by the Egress PE. P3 will forward the packet with the new label to the Egress LSR (EtherType 0x8847). The Egress LSR will find a matching entry in its LFIB table. This will include the instruction to pop the label. The packet’s IPv4 destination address will then be checked against the FIB table which indicates that the packet should be forwarded as a normal IPv4 packet out of the MPLS network (EtherType 0x0800).
PHP

An egress node must perform two forwarding table lookups to forward a packet: two LFIB lookups (if the packet has more than one label), or one LFIB lookup and one FIB lookup (if the packet has only one label). The penultimate hop popping (PHP) feature can pop the label at the penultimate node, so the egress node only performs one table lookup.

A PHP-capable egress node sends the penultimate node an implicit null label of 3. This label never appears in the label stack of packets. If an incoming packet matches an LFIB entry comprising the implicit null label, the penultimate node pops the label stack of the packet and forwards the packet to the egress LSR. The egress LSR then forwards the packet.

Sometimes, the egress node must use the TC field in the label to perform QoS. To keep the TC information, you can configure the egress node to send the penultimate node an explicit null label of 0. If an incoming packet matches an LFIB entry comprising the explicit null label, the penultimate hop replaces the value of the top label with value 0, and forwards the packet to the egress node. The egress node gets the TC information, pops the label of the packet, and forwards the packet.
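The penultimate node's behavior for the three cases (ordinary swap, implicit null, explicit null) can be sketched as follows. This is an illustrative model only; the label 2001 is an arbitrary example:

```python
# Toy model of the penultimate node's label handling. The egress node
# advertises which label the penultimate node should use: implicit
# null (3) requests PHP, explicit null (0) preserves the TC bits.

IMPLICIT_NULL = 3
EXPLICIT_NULL = 0

def penultimate_forward(labels, advertised_label):
    """Apply the egress-advertised label at the penultimate node."""
    labels = list(labels)
    if advertised_label == IMPLICIT_NULL:
        labels.pop()                     # PHP: egress does one FIB lookup
    elif advertised_label == EXPLICIT_NULL:
        labels[-1] = EXPLICIT_NULL       # keep an entry so TC survives
    else:
        labels[-1] = advertised_label    # ordinary label swap
    return labels

php = penultimate_forward([2001], IMPLICIT_NULL)   # label removed early
qos = penultimate_forward([2001], EXPLICIT_NULL)   # label 0 carries TC
```

Note that implicit null (3) is only ever signaled in LDP; it never appears on the wire, whereas explicit null (0) does.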
Basic MPLS Configuration Steps

Basic MPLS configuration will now be discussed. These configuration steps, as shown in Figure 9-16, are performed on P and PE devices, not on CE devices.
Figure 9-16: Basic MPLS Configuration Steps
In the first step, IP addresses and Interior Gateway Protocols (IGPs) are configured. In this study guide, the IGP used is OSPF, but another protocol such as ISIS could also be used. It is important that a loopback address be configured on both P and PE devices with a /32 mask. The loopback IP address must also be advertised by the IGP. In the second step, an MPLS LSR-ID is configured on each LSR device. HP recommends that this be set to a loopback IP address of the LSR. Third, LDP is enabled globally on the LSR, and the IP prefixes that trigger label announcements are specified. Fourth, MPLS and MPLS LDP are enabled on each backbone interface. Lastly, the configuration is verified.

Note

The configuration steps shown in this study guide are Comware 7 specific. Comware 5 MPLS configuration is similar, but be aware that there are minor differences. Please refer to the relevant device configuration documentation.
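As a sketch, steps 2 through 4 might look as follows on a Comware 7 device. The loopback address and interface name below are hypothetical examples, not values from the figures, and the exact commands should be verified against the device documentation:

```
system-view

# Step 2: set the MPLS LSR-ID to the loopback IP address (assumed 1.1.1.1)
mpls lsr-id 1.1.1.1

# Step 3: enable LDP globally (the LSP trigger command, not shown, can
# restrict which IP prefixes are assigned labels)
mpls ldp
 quit

# Step 4: enable MPLS and LDP on each backbone interface
interface GigabitEthernet 1/0/1
 mpls enable
 mpls ldp enable
```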
Step 1: Configure IP and IGP
In this first step, configure IP and OSPF. The configuration of IP addresses and basic OSPF is not explained here, as these fundamental topics are covered in other study guides. It is good practice to configure interfaces for optimal OSPF performance, including the following:
■ Configure interfaces as routed ports whenever possible.
■ Ensure minimal or no Layer 2 protocol impact on OSPF by disabling or tuning spanning tree.
■ Configure at least one loopback address with a /32 mask. Configure OSPF to use the loopback IP address as the router ID and advertise the loopback address via OSPF.
■ Ensure that all backbone interfaces are configured with relevant IP addresses and advertise the networks via OSPF.
■ Enhance OSPF performance by using OSPF network type P2P on links where there are only two OSPF routers. This ensures that routers do not need to wait for the designated router election process to complete before forming peer relationships and exchanging routing information.
■ Adjust OSPF interface timers by reducing the OSPF hello interval to 1 second from the default of 10 seconds and the dead interval to 4 seconds from the default of 40 seconds. This helps detect peer device failure more quickly and provides quicker adjacency setup.
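The practices above can be sketched as a single Comware 7 configuration for one backbone node; the interface name and IP addresses are illustrative only:

```
system-view
# Loopback with /32 mask, used as the OSPF router ID (and later as the LSR ID)
[Sysname] interface LoopBack 0
[Sysname-LoopBack0] ip address 10.0.0.1 32
[Sysname-LoopBack0] quit
[Sysname] ospf 1 router-id 10.0.0.1
[Sysname-ospf-1] area 0
[Sysname-ospf-1-area-0.0.0.0] network 10.0.0.1 0.0.0.0
[Sysname-ospf-1-area-0.0.0.0] network 10.0.1.0 0.0.0.3
[Sysname-ospf-1-area-0.0.0.0] quit
[Sysname-ospf-1] quit
# Backbone routed port: P2P network type and fast hello/dead timers
[Sysname] interface FortyGigE 1/0/51
[Sysname-FortyGigE1/0/51] ip address 10.0.1.1 30
[Sysname-FortyGigE1/0/51] ospf network-type p2p
[Sysname-FortyGigE1/0/51] ospf timer hello 1
[Sysname-FortyGigE1/0/51] ospf timer dead 4
```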
Command reference: Some of the following commands may be used for initial setup:
ospf
Use the ospf command to enable OSPF and enter OSPF view. You can enable multiple OSPF processes on a router and specify different router IDs for them. Enable an OSPF process before performing other tasks.
Syntax
ospf [ process-id | router-id router-id | vpn-instance vpn-instance-name ] *
undo ospf [ process-id ]

process-id
Specifies an OSPF process by its ID in the range of 1 to 65535.
router-id router-id
Specifies an OSPF router ID in dotted decimal notation.
vpn-instance vpn-instance-name
Specifies an MPLS L3VPN instance by its name, a case-sensitive string of 1 to 31 characters. If no VPN is specified, the OSPF process runs on the public network.
Examples
Enable OSPF process 100 and specify router ID 10.10.10.1.
system-view
[Sysname] ospf 100 router-id 10.10.10.1
[Sysname-ospf-100]
area (OSPF view)
Use the area command to create an area and enter area view. Use the undo area command to remove an area. By default, no OSPF area is created.
Syntax
area area-id
undo area area-id

area-id
Specifies an area by its ID, an IP address or a decimal integer in the range of 0 to 4294967295 that is translated into the IP address format by the system.
Examples
Create area 0 and enter area 0 view.
system-view
[Sysname] ospf 100
[Sysname-ospf-100] area 0
[Sysname-ospf-100-area-0.0.0.0]
network (OSPF area view)
Use the network command to enable OSPF on the interface attached to the specified network in the area. Use the undo network command to disable OSPF for the interface attached to the specified network in the area. By default, OSPF is not enabled on any interface.
This command enables OSPF on the interface attached to the specified network. The interface's primary IP address must be in the specified network. If only the interface's secondary IP address is in the network, the interface cannot run OSPF.
Syntax
network ip-address wildcard-mask
undo network ip-address wildcard-mask

ip-address
Specifies the IP address of a network.
wildcard-mask
Specifies the wildcard mask of the IP address. For example, the wildcard mask of mask 255.0.0.0 is 0.255.255.255.
Examples
Specify the interface whose primary IP address is on network 131.108.20.0/24 to run OSPF in Area 2.
system-view
[Sysname] ospf 100
[Sysname-ospf-100] area 2
[Sysname-ospf-100-area-0.0.0.2] network 131.108.20.0 0.0.0.255
ospf network-type
Use the ospf network-type command to set the network type for an interface. Use the undo ospf network-type command to restore the default network type for an interface.
By default, the network type of an interface depends on its link layer protocol:
■ For Ethernet and FDDI, the network type is broadcast.
■ For ATM, FR, and X.25, the network type is NBMA.
■ For PPP, LAPB, HDLC, and POS, the network type is P2P.
If a router on a broadcast network does not support multicast, configure the network type for the connected interfaces as NBMA.
If any two routers on an NBMA network are directly connected through a virtual link, the network is fully meshed, and you can configure the network type for the connected interfaces as NBMA. If two routers are not directly connected, configure the P2MP network type so that the two routers can exchange routing information through another router.
When the network type of an interface is NBMA or P2MP unicast, you must use the peer command to specify the neighbor. If only two routers run OSPF on a network, you can configure the network type for the connected interfaces as P2P. When the network type of an interface is P2MP unicast, all OSPF packets are unicast by the interface.
Syntax
ospf network-type { broadcast | nbma | p2mp [ unicast ] | p2p [ peer-address-check ] }

broadcast
Specifies the network type as broadcast.
nbma
Specifies the network type as NBMA.
p2mp
Specifies the network type as P2MP.
unicast
Specifies the P2MP interface to unicast OSPF packets. By default, a P2MP interface multicasts OSPF packets.
p2p
Specifies the network type as P2P.
peer-address-check
Checks whether the peer interface and the local interface are on the same network segment. Two P2P interfaces can establish a neighbor relationship only when they are on the same network segment.
Examples
Configure the OSPF network type for VLAN-interface 10 as NBMA.
system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ospf network-type nbma
ospf timer hello
Use the ospf timer hello command to set the hello interval on an interface. Use the undo ospf timer hello command to restore the default.
By default, the hello interval is 10 seconds for P2P and broadcast interfaces, and is 30 seconds for P2MP and NBMA interfaces. The shorter the hello interval, the faster the topology converges, and the more resources are consumed. Make sure the hello interval on two neighboring interfaces is the same.
Syntax
ospf timer hello seconds
undo ospf timer hello

seconds
Specifies the hello interval in the range of 1 to 65535 seconds.
Examples
Configure the hello interval on VLAN-interface 10 as 20 seconds.
system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ospf timer hello 20
ospf timer dead
Use the ospf timer dead command to set the neighbor dead interval. Use the undo ospf timer dead command to restore the default.
By default, the dead interval is 40 seconds for broadcast and P2P interfaces, and 120 seconds for P2MP and NBMA interfaces. If an interface receives no hello packet from a neighbor within the dead interval, the interface considers the neighbor down. The dead interval on an interface is at least four times the hello interval. Routers attached to the same segment must have the same dead interval.
Syntax
ospf timer dead seconds
undo ospf timer dead

seconds
Specifies the dead interval in the range of 1 to 2147483647 seconds.
Examples
Configure the dead interval for VLAN-interface 10 as 60 seconds.
system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ospf timer dead 60
Step 2: Configure MPLS LSR-ID
Overview
In this second step, the MPLS label switching router (LSR) ID is configured. The LSR ID must be configured on each LSR device; typically, the loopback IP address is used as the LSR ID. In Figure 9-17, the LSR ID is configured as 10.0.0.1. The display mpls summary command is used to show MPLS settings, including the LSR ID.
Figure 9-17: Step 2: Configure MPLS LSR-ID
mpls lsr-id
Use the mpls lsr-id command to configure an LSR ID for the local LSR. Use the undo mpls lsr-id command to delete the LSR ID of the local LSR. By default, an LSR has no LSR ID. HP recommends that you use the address of a loopback interface on the LSR as the LSR ID.
Syntax
mpls lsr-id lsr-id
undo mpls lsr-id

lsr-id
Specifies an ID for identifying the LSR, in dotted decimal notation.
Examples
Configure the LSR ID as 10.0.0.1 for the local node.
system-view
[Sysname] mpls lsr-id 10.0.0.1
display mpls summary
Use the display mpls summary command to display MPLS summary information.
Syntax display mpls summary
Examples # Display MPLS summary information.
See Table 9-2 for the output description.
Table 9-2: Display MPLS summary output description

Egress Label Type
Label type that the egress assigns to the penultimate hop:
• Implicit-null.
• Explicit-null.
• Non-null.
Labels
Label information.
Range
Label range.
Idle
Number of idle labels in the label range.
Protocols
Running label distribution protocols and the related information.
Type
Protocol type: LDP, BGP, RSVP, Static, Static CRLSP, TE, or L2VPN.
State
Label distribution protocol running status:
• Normal.
• Recover—The protocol is in the Graceful Restart (GR) process.
Step 3: LDP and Prefixes which Trigger LSP
Overview
In the next step, LDP is enabled globally on the device and prefixes are specified that can trigger LSP label announcements. Labels can be announced for all IP prefixes found in the LSR IP routing table, or they can be limited to certain IP prefixes only. The network administrator could configure some background traffic to use pure IP routing without label switching. This is useful where access lists or traditional IP filters are used to filter IP traffic on the backbone network. Additionally, selected label triggers may be used to ensure that only specific traffic is label switched. In an L2VPN or VPLS scenario, it is possible to generate labels for LSR loopback interfaces only and not other core interfaces. In this case, VPN traffic would be label switched on the backbone while other internal traffic would use traditional IP and be Layer 3 routed. This is discussed in more detail in the L2VPN and VPLS chapters. Within the MPLS LDP context, you specify which IP prefixes trigger label announcements. In Figure 9-18, label announcements are triggered for all IP prefixes, but this could be limited to certain IP prefixes by using access lists.
Figure 9-18: Step 3: LDP and Prefixes which Trigger LSP
mpls ldp
Use the mpls ldp command to enable LDP globally and enter LDP view.
Use the undo mpls ldp command to disable LDP globally for an LSR and delete all LDP-VPN instances. By default, LDP is globally disabled. You must enable LDP globally for an LSR to run LDP. The GR commands, the session protection command, and the targeted-peer command are available only in LDP view. All other commands available in LDP view are also available in LDP-VPN instance view. Commands executed in LDP view take effect only for the public network. Commands executed in LDP-VPN instance view take effect only for the specified VPN instance. The GR commands are global commands and take effect for all VPN instances and the public network.
Syntax
mpls ldp
undo mpls ldp
Examples
Enable LDP globally and enter LDP view.
system-view
[Sysname] mpls ldp
[Sysname-ldp]
lsp-trigger
Use the lsp-trigger command to configure an LSP generation policy. Use the undo lsp-trigger command to restore the default. By default, LDP can only use host routes with a 32-bit mask to generate LSPs.
The default LSP generation policy depends on the label distribution control mode. ■ In Ordered mode, LDP can only use the Loopback interface address routes with a 32-bit mask and the routes with a 32-bit mask that match the FECs of label mappings received from downstream LSRs to generate LSPs. ■ In Independent mode, LDP can use all routes with a 32-bit mask to generate LSPs. After you configure an LSP generation policy, LDP uses all routes or the routes permitted by the IP prefix list to generate LSPs, regardless of the label distribution control mode. HP recommends using the default LSP generation policy.
Syntax
lsp-trigger { all | prefix-list prefix-list-name }
undo lsp-trigger

all
Enables LDP to use all routes to generate LSPs.
prefix-list prefix-list-name
Specifies an IP prefix list by its name, a case-sensitive string of 1 to 63 characters. LDP can only use the routes permitted by the IP prefix list to generate LSPs.
Examples
Configure an LSP generation policy to use only routes 10.10.1.0/24 and 10.20.1.0/24 to establish LSPs for the public network.
system-view
[Switch] ip prefix-list egress-fec-list index 1 permit 10.10.1.0 24
[Switch] ip prefix-list egress-fec-list index 2 permit 10.20.1.0 24
[Switch] mpls ldp
[Switch-ldp] lsp-trigger prefix-list egress-fec-list
Step 4: Enable MPLS and LDP on interfaces
In step 4, MPLS and LDP are enabled on each backbone interface. This applies to physical routed interfaces or routed VLAN interfaces. Do not configure MPLS on customer-facing interfaces.
In Figure 9-19, it is assumed that the Forty Gigabit Ethernet 1/0/51 interface is a routed port and therefore MPLS and LDP are configured directly on the interface. The mpls enable command is used to enable label switching and the mpls ldp enable command configures the device to attempt to form an LDP session with a peer device and then exchange labels and FEC information.
Figure 9-19: Step 4: Enable MPLS and LDP on interfaces
mpls enable
Use the mpls enable command to enable MPLS on an interface. Execute this command on all interfaces that need to perform MPLS forwarding. Use the undo mpls enable command to disable MPLS on an interface.
By default, MPLS is disabled on an interface.

Syntax
mpls enable
undo mpls enable
Examples
Enable MPLS on VLAN-interface 2.
system-view
[Sysname] interface vlan-interface 2
[Sysname-Vlan-interface2] mpls enable
mpls ldp enable
Use the mpls ldp enable command to enable LDP for an interface. Use the undo mpls ldp enable command to disable LDP for an interface.
By default, LDP is disabled on an interface.
Before you enable LDP for an interface, use the mpls ldp command in system view to enable LDP globally.
Disabling LDP on an interface terminates all LDP sessions on the interface, and removes all LSPs established through the sessions.
If the interface is bound with a VPN instance, you must also use the vpn-instance command in LDP view to enable LDP for the VPN instance.
An up interface enabled with LDP and MPLS sends Link Hellos for neighbor discovery. An up MPLS TE tunnel interface enabled with LDP sends Targeted Hellos to the tunnel destination and establishes a session with the tunnel peer.
Syntax
mpls ldp enable
undo mpls ldp enable
Examples
Enable LDP for VLAN-interface 2.
system-view
[Sysname] mpls ldp
[Sysname-ldp] quit
[Sysname] interface vlan-interface 2
[Sysname-Vlan-interface2] mpls ldp enable
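Putting steps 2 through 4 together, a minimal sketch for one LSR with a single routed backbone port would look as follows; the interface name and address are illustrative only:

```
system-view
# Step 2: LSR ID set to the loopback address
[Sysname] mpls lsr-id 10.0.0.1
# Step 3: enable LDP globally (the default LSP generation policy is kept)
[Sysname] mpls ldp
[Sysname-ldp] quit
# Step 4: enable MPLS and LDP on the backbone interface only
[Sysname] interface FortyGigE 1/0/51
[Sysname-FortyGigE1/0/51] mpls enable
[Sysname-FortyGigE1/0/51] mpls ldp enable
```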
Step 5: Verify
The last step is verification. Several display commands are available to verify the MPLS global settings, MPLS interfaces, LDP global settings, LDP interfaces, and MPLS LDP LSP exchange information.
Step 5.1: Verify MPLS Global Setting
The display mpls summary command can be used to view the LSR ID, the label ranges that will be used, and the number of available labels in each range. In Figure 9-20, various label ranges that have been allocated by the Comware device are shown, as well as the number of labels that are available. The two label distribution protocols available on this device are LDP and Static. Protocols could include Multiprotocol BGP, RSVP, TE, and others.
Figure 9-20: Step 5.1: Verify MPLS Global Setting
Guidelines
Use the display mpls summary command to display MPLS summary information.
Syntax display mpls summary
Examples
Display MPLS summary information.
display mpls summary
MPLS LSR ID      : 2.2.2.2
Egress Label Type: Implicit-null
Labels:
  Range          Idle
  16-1023        1008
  1024-9215      8192
  65536-73727    8192
  131072-139263  8192
Protocols:
  Type     State
  BGP      Normal
  Static   Normal
See Table 9-3 for the output description.
Table 9-3: Display MPLS summary output description

Egress Label Type
Label type that the egress assigns to the penultimate hop:
• Implicit-null.
• Explicit-null.
• Non-null.
Labels
Label information.
Range
Label range.
Idle
Number of idle labels in the label range.
Protocols
Running label distribution protocols and the related information.
Type
Protocol type: LDP, BGP, RSVP, Static, Static CRLSP, TE, or L2VPN.
State
Label distribution protocol running status:
• Normal.
• Recover—The protocol is in the GR process.
Step 5.2: Verify MPLS Interfaces
A network administrator can verify which interfaces have MPLS enabled by using the display mpls interface command, as shown in Figure 9-21. This should list all the backbone-facing interfaces.
Figure 9-21: Step 5.2: Verify MPLS Interfaces
Guidelines
Use the display mpls interface command to display MPLS interface information, including the interface name, interface status, and interface MPLS MTU.
Syntax
display mpls interface [ interface-type interface-number ]

interface-type interface-number
Specifies an interface by the interface type and number. If you do not specify an interface, the command displays MPLS information for all MPLS-enabled interfaces.
Examples
Display all MPLS interfaces.
display mpls interface
Interface    Status    MPLS MTU
Vlan2        Up        1500
Vlan20       Up        1500
The MPLS MTU of an interface is in bytes.
Step 5.3: Verify MPLS global parameters
LDP is a label distribution protocol. As shown in Figure 9-22, the display mpls ldp parameter command shows LDP global configuration settings. The output includes nonstop routing and graceful restart information (both disabled by default).
Figure 9-22: Step 5.3: Verify MPLS global parameters
A network administrator would need to enable support for nonstop routing or graceful restart where dual management modules are used or in IRF-based systems. This ensures seamless failover to another IRF member or management module if the master fails. The output also displays the label switch router ID.
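As a sketch, LDP graceful restart can typically be enabled in LDP view on Comware 7; verify the command and NSR support for your platform and software release:

```
system-view
[Sysname] mpls ldp
# Allow peers to retain MPLS forwarding state while the local
# control plane restarts (for example, during an IRF master failover)
[Sysname-ldp] graceful-restart
```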
Guidelines
Use the display mpls ldp parameter command to display LDP running parameters.
Syntax
display mpls ldp parameter [ vpn-instance vpn-instance-name ]

vpn-instance vpn-instance-name
Specifies an MPLS L3VPN instance by its name, a case-sensitive string of 1 to 31 characters. The command displays the LDP running parameters for the specified VPN. If you do not specify a VPN instance, the command displays the LDP running parameters for the public network.
Examples
Display LDP running parameters for the public network.
display mpls ldp parameter
Global Parameters:
  Protocol Version          : V1
  Nonstop Routing           : Off
  Graceful Restart          : Off
  Reconnect Time            : 120 sec
  Forwarding State Hold Time: 360 sec
Instance Parameters:
  Instance ID               : 0
  Instance State            : Active
  LSR ID                    : 0.0.0.0
  Loop Detection            : Off
  Hop Count Limit           : 32
  Path Vector Limit         : 32
  Label Retention Mode      : Liberal
  Label Distribution Control Mode: Ordered
  IGP Sync Delay            : 0 sec
  IGP Sync Delay on Restart : -
See Table 9-4 for the output description.
Table 9-4: Display MPLS LDP running parameter output description

Global Parameters
Global parameters for all LDP-enabled networks.
Protocol Version
LDP protocol version.
Nonstop Routing
Whether the nonstop routing function is enabled. This field is not supported in the current software version and is reserved for future support.
Graceful Restart
Whether the GR function is enabled:
• On—Enabled.
• Off—Disabled.
Reconnect Time
Value of the Reconnect timer, in seconds.
Forwarding State Hold Time
Value of the MPLS Forwarding State Holding timer, in seconds.
Instance Parameters
Running parameters for a specific VPN instance or the public network.
Instance ID
VPN instance ID. For the public network, this field displays 0.
Instance State
LDP status in the VPN instance, Active or Inactive.
LSR ID
LSR ID of the local device.
Loop Detection
Whether loop detection is enabled:
• On—Enabled.
• Off—Disabled.
Hop Count Limit
Hop count limit specified for loop detection.
Path Vector Limit
Path vector length limit specified for loop detection.
Label Retention Mode
The device supports only the Liberal mode.
IGP Sync Delay
Delay time (in seconds) that LDP must wait before it notifies IGP of an LDP session-up event. This field is not supported in the current software version and is reserved for future support.
IGP Sync Delay on Restart
Delay time (in seconds) that LDP must wait before it notifies IGP of an LDP session-up event in case of LDP restart. This field is not supported in the current software version and is reserved for future support.
Step 5.4: Verify MPLS LDP Enabled Interfaces and Peers
As shown in Figure 9-23, interfaces enabled for MPLS LDP can be verified using the display mpls ldp interface command.
Figure 9-23: Step 5.4: Verify MPLS LDP Enabled Interfaces and Peers
An administrator can also verify that an LDP peer relationship has been established with a peer LDP device using the display mpls ldp peer command. Information such as support for graceful restart is shown in the output.
display mpls ldp interface
Use the display mpls ldp interface command to display LDP interface information.
Syntax
display mpls ldp interface [ interface-type interface-number ]

interface-type interface-number
Specifies an interface by its type and number. If you do not specify an interface, this command displays information about all LDP interfaces.
Examples
Display information about all LDP interfaces.
display mpls ldp interface
Interface    MPLS       LDP          Auto-config
Vlan17       Enabled    Configured   -
Vlan20       Enabled    Configured   -
******ebook converter DEMO Watermarks*******
See Table 9-5 for the output description.
Table 9-5: Display MPLS LDP interface output description

Interface
Interface enabled with LDP.
MPLS
Whether the interface is enabled with MPLS.
LDP
Whether the interface is configured with the mpls ldp enable command.
Auto-config
LDP automatic configuration information. This field is not supported in the current software version and is reserved for future support.
display mpls ldp peer
Use the display mpls ldp peer command to display the LDP peer and session information.
Syntax
display mpls ldp peer [ vpn-instance vpn-instance-name ] [ peer-lsr-id ] [ verbose ]

vpn-instance vpn-instance-name
Specifies an MPLS L3VPN instance by its name, a case-sensitive string of 1 to 31 characters. The command displays LDP peer and session information for the specified VPN. If you do not specify a VPN instance, the command displays the LDP peer and session information for the public network.
peer-lsr-id
Specifies an LDP peer by its LSR ID. If you do not specify this option, the command displays all LDP peers and related session information.
verbose
Displays detailed LDP peer and session information. If you do not specify this keyword, the command displays brief LDP peer and session information.
Examples
Display brief information about all LDP peers and LDP sessions for the public network.
display mpls ldp peer
Total number of peers: 1
Peer LDP ID      State         Role      GR    MD5   KA Sent/Rcvd
2.2.2.9:0        Operational   Passive   Off   Off   39/39
See Table 9-6 for the output description.
Table 9-6: Display MPLS LDP peer output description

Peer LDP ID
LDP identifier of the peer.
State
State of the LDP session between the local LSR and the peer:
• Non Existent—No TCP connection is established.
• Initialized—A TCP connection has been established.
• OpenRecv—LDP has received an acceptable initialization message.
• OpenSent—LDP has sent an initialization message.
• Operational—An LDP session has been established.
Role
Role of the local LSR in the session, Active or Passive. In a session, the LSR with a higher IP address takes the Active role. The Active LSR initiates a TCP connection to the Passive LSR.
GR
Whether GR is enabled on the peer:
• On—Enabled.
• Off—Disabled.
MD5
Whether MD5 authentication is enabled for the LDP session on the local device:
• On—Enabled.
• Off—Disabled.
KA Sent/Rcvd
Number of keepalive messages sent/received.
Step 5.5: Verify MPLS LDP LSP
Overview
Once LDP has exchanged forwarding equivalence class (FEC) and label information, an administrator can view the LSP table with the display mpls ldp lsp command. Figure 9-24 shows networks 10.0.0.1/32, 10.0.0.2/32 and 10.0.1.0/24. These are IP prefixes listed in the IPv4 routing table. Each prefix has an associated label which is announced to peer devices.
Figure 9-24: Step 5.5: Verify MPLS LDP LSP Overview
The local router will receive traffic for each FEC either unlabelled (displayed with a hyphen "-") or labelled (the label is displayed). Outgoing labels are shown in the same way: the label value is displayed, or a hyphen "-" indicates unlabelled traffic. An MPLS router may receive either labelled or unlabelled traffic. For labelled traffic, the LSR will typically either swap or pop the label. Unlabelled traffic will typically have a label imposed when the egress interface is an MPLS-enabled interface.
Guidelines
Use the display mpls ldp lsp command to display information about LSPs generated by LDP.
Syntax
display mpls ldp lsp [ vpn-instance vpn-instance-name ] [ destination-address mask-length ]

vpn-instance vpn-instance-name
Specifies an MPLS L3VPN instance by its name, a case-sensitive string of 1 to 31 characters. The command displays LDP LSP information for the specified VPN. If you do not specify a VPN instance, the command displays LDP LSP information for the public network.
destination-address mask-length
Specifies an FEC by an IP address and a mask length in the range of 0 to 32. If you do not specify a FEC, the command displays information about LDP LSPs for all FECs.
Examples Display LDP LSP information for the public network.
See Table 9-7 for the output description.
Table 9-7: Display MPLS LDP LSP output description

Status Flags
LSP status:
• *—Stale, indicating the LSP is under a GR process.
• L—Liberal, indicating the LSP is not available.
LSP statistics:
• FECs—Total number of FECs.
• Ingress LSPs—Number of LSPs that take the local device as the ingress node.
• Transit LSPs—Number of LSPs that take the local device as a transit node.
• Egress LSPs—Number of LSPs that take the local device as the egress node.
FEC
Forwarding equivalence class identified by an IP prefix.
In/Out Label
Incoming/outgoing label.
Nexthop
Next hop address for the FEC.
OutInterface
Outgoing interface for the FEC.
Step 5.6: Verify Using Tracert LSP
Use the tracert lsp ipv4 command to locate MPLS LSP errors on the LSP for a FEC, as shown in Figure 9-25. The command sends MPLS echo requests along the LSP to be inspected, with the TTL increasing from 1 to the specified value.
Figure 9-25: Step 5.6: Verify Using Tracert LSP
Each hop along the LSP will return an MPLS echo reply to the ingress device because of the TTL timeout (similar to IPv4 tracert). The ingress device can collect information about each hop along the LSP, including an LSP failure. For example, MPLS may not be enabled on an interface in the path, which results in a failed LSP. You can also use the tracert lsp command to collect information about each hop in the LSP, including the label allocated.
Syntax
tracert lsp [ -a source-ip | -exp exp-value | -h ttl-value | -r reply-mode | -t time-out ] * ipv4 dest-addr mask-length [ destination-ip-addr-header ]

-a source-ip
Specifies the source IP address for the echo request messages.
-exp exp-value
Specifies the EXP value for the echo request messages. The exp-value argument ranges from 0 to 7 and defaults to 0.
-h ttl-value
Specifies the TTL value for the echo request messages. The ttl-value argument ranges from 1 to 255 and defaults to 30.
-r reply-mode
Specifies the reply mode of the receiver in response to the echo request messages. The reply-mode argument can be 1 or 2, where 1 means "Do not respond" and 2 means "Respond using a UDP packet." The default is 2.
-t time-out
Specifies the timeout interval for the response to an echo request message. The time-out argument ranges from 0 to 65535 milliseconds and defaults to 2000 milliseconds.
ipv4 dest-addr mask-length
Specifies a FEC by an IPv4 destination address and the mask length of the destination address. The mask-length argument ranges from 0 to 32.
destination-ip-addr-header
Specifies the destination address in the IP header of the MPLS echo request messages. It can be any address on segment 127.0.0.0/8—any local loopback address.
Example
Locate errors along the LSP for FEC 3.3.3.9.
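Following the syntax above, the invocation for this example would look like the sketch below (the mask length of 32 is assumed for the host FEC; the per-hop reply output is device-specific and is shown in the original figure):

```
<Sysname> tracert lsp ipv4 3.3.3.9 32
```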
Tracert MPLS Use the tracert mpls ipv4 command to trace MPLS LSPs from the ingress node to the egress node for an IPv4 prefix. tracert mpls [ -a source-ip | -exp exp-value | -h ttl-value | -r reply-mode | -rtos tos-value | -t time-out | -v | fec-check ] * ipv4 dest-addr mask-length [ destination start-address [ endaddress [ address-increment ] ] ] -a source-ip
Specifies the source address for MPLS echo request packets. If you do not specify this option, the command uses the primary IP address of the outgoing interface as the source address for MPLS echo requests. -exp exp-value
Specifies the EXP value for MPLS echo request packets, in the range of 0 to 7. The default is 0.

-h ttl-value
Specifies the maximum TTL value for MPLS echo request packets, namely, the maximum number of hops to be inspected. The value range for the ttl-value argument is 1 to 255, and the default is 30.

-r reply-mode
Specifies the reply mode of the receiver in response to MPLS echo request packets. The reply-mode argument can be 1, 2, or 3: 1 means "Do not reply," 2 means "Reply by using a UDP packet," and 3 means "Reply by using a UDP packet that carries the Router Alert option." The default is 2.

-rtos tos-value
Specifies the ToS value in the IP header of an MPLS echo reply packet. The value range is 0 to 7, and the default value is 6.

-t time-out
Specifies the timeout interval for the reply to an MPLS echo request. The value range is 0 to 65535 milliseconds, and the default is 2000 milliseconds.

-v
Displays detailed reply information. If you do not specify this keyword, the command displays brief reply information.

fec-check
Checks the FEC stack at transit nodes.

dest-addr mask-length
Specifies an FEC by an IPv4 destination address and a mask length. The value range for the mask-length argument is 0 to 32.

destination
Specifies the destination address in the IP header of MPLS echo requests. The default is 127.0.0.1.

start-address
Specifies the destination address or the start destination address. This address must be an address on subnet 127.0.0.0/8, a local loopback address. If the start-address argument is specified without the end-address argument, the start-address is the destination address in the IP header. If you specify both the start-address and end-address arguments, you specify a range of destination addresses: the destination addresses increase in turn by the address-increment, starting from the start-address up to the end-address. The command performs a traceroute for each of the destination addresses.

end-address
Specifies the end destination address. This address must be an address on subnet 127.0.0.0/8, a local loopback address.

address-increment
Specifies the increment value by which the destination address in the IP header increases in turn. The value range is 1 to 16777215, and the default value is 1.
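The way start-address, end-address, and address-increment interact can be sketched as follows. This is an illustrative model only, not HP device code; the function name is hypothetical.

```python
import ipaddress

def tracert_destinations(start, end=None, increment=1):
    """Yield the 127/8 destination addresses a traceroute sweep would use.

    Mirrors the documented behavior: the destination address increases by
    `increment`, starting from start-address up to (and including) end-address.
    """
    start_ip = ipaddress.IPv4Address(start)
    if start_ip not in ipaddress.IPv4Network("127.0.0.0/8"):
        raise ValueError("destination must be on subnet 127.0.0.0/8")
    if end is None:                 # start-address alone: single destination
        yield str(start_ip)
        return
    end_ip = ipaddress.IPv4Address(end)
    addr = start_ip
    while addr <= end_ip:
        yield str(addr)
        addr += increment           # IPv4Address supports integer addition

# The example from the text: range 127.1.1.1 to 127.1.1.2, increment 1.
print(list(tracert_destinations("127.1.1.1", "127.1.1.2", 1)))
# ['127.1.1.1', '127.1.1.2']
```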
Examples

Trace the path that the LSP (for FEC 5.5.5.9/32) traverses from the ingress node to the egress node. Specify the IP header destination address range as 127.1.1.1 to 127.1.1.2 and set the address increment value to 1. With these settings, the device performs a traceroute for 127.1.1.1 and 127.1.1.2, respectively.
Trace the path that the LSP (for FEC 5.5.5.9/32) traverses from the ingress node to the egress node. Display detailed reply information, specify the IP header destination address range as 127.1.1.1 to 127.1.1.2, and set the address increment value to 1. With these settings, the device performs a traceroute for 127.1.1.1 and 127.1.1.2, respectively.
See Table 9-8 for the output description:

Table 9-8: Tracert MPLS output description

LSP trace route FEC: Traces the LSPs for the specified FEC.
Destination address: Destination IP address in the IP header.
TTL: Number of hops.
Replier: Address of the LSR that replied to the request.
Time: Time used to receive the reply, in milliseconds.
Type: LSR type: Ingress, Transit, or Egress.
Downstream: Address of the downstream LSR and the label assigned by the downstream LSR.
ReturnCode: Return code. The number in parentheses represents a return subcode.
Summary

In this chapter, you learned Multiprotocol Label Switching (MPLS) basics, including how labels are allocated and advertised using label distribution protocols such as LDP. You learned many MPLS terms, including LSR, FEC, LSP, label, label stack, FIB, RIB, LFIB, LIB, and PHP, amongst others. You learned how MPLS inserts a 32-bit header into packets, which includes a 20-bit label. The behavior of LSRs was explained, including how labels are inserted, swapped, or popped. The tables used by LSRs, including the RIB, LIB, FIB, and LFIB, were explained, as were both the advertisement of FECs and the forwarding of data to a FEC. The configuration and verification of basic MPLS was then detailed.
Learning Check

Answer each of the questions below.
1. An HP Comware MPLS P LSR receives an MPLS-labeled packet. Which table will be used for reading the packet and determining the hop behavior?
a. RIB
b. LIB
c. FIB
d. LFIB

2. Which protocol determines the path LSRs use?
a. EGP
b. LDP
c. IGP
d. TDP

3. An HP Comware ingress LSR receives a packet from a CE. Which table will be used for packet forwarding?
a. RIB
b. FIB
c. LIB
d. LFIB

4. Which MPLS feature removes a label on a P router rather than the PE router when traffic is destined to a network directly connected to the PE router?
a. PHP
b. PBB
c. L2VPN
d. LDP
Learning Check Answers

1. d
2. c
3. b
4. a
10 MPLS Layer 2 VPN (MPLS L2VPN)
EXAM OBJECTIVES

In this chapter, you learn to:

✓ Describe MPLS L2VPN features.
✓ Understand the L2VPN architecture.
✓ Describe L2VPN implementation methods: Martini, Kompella, CCC, SVC.
✓ Configure MPLS L2VPN.
✓ Verify MPLS L2VPN.
INTRODUCTION

This chapter discusses MPLS L2VPN technologies, which provide point-to-point Layer 2 connections across an MPLS backbone network. MPLS VPLS connections, which provide point-to-multipoint connections, are discussed in Chapter 11.
ASSUMED KNOWLEDGE

You should have a basic understanding of Multiprotocol Label Switching (MPLS), including basic operations, the behavior of a Label Switching Router (LSR), and Label Switched Paths (LSPs). You should also be familiar with MPLS application use cases, including L2VPNs and VPLS.
MPLS L2VPN
Traditional VPNs based on Asynchronous Transfer Mode (ATM) or Frame Relay (FR) were popular in the past. They shared the network infrastructure of carriers, but had some inherent disadvantages:

■ Dependence on dedicated media: To provide both ATM-based and FR-based VPN services, carriers had to establish two separate infrastructures across the whole service scope, one ATM infrastructure and one FR infrastructure. The cost was very high and the infrastructures were not utilized efficiently.

■ Complicated deployment: To add a site to an existing VPN, you had to modify the configurations of all edge nodes connected with the VPN site.

MPLS L2VPN, shown in Figure 10-1, was developed as a solution to address the above disadvantages.
Figure 10-1: MPLS L2VPN
MPLS L2VPN provides Layer 2 VPN services over an MPLS or IP backbone. It allows carriers to establish L2VPNs on different data link layer protocols, including ATM, FR, VLAN, Ethernet, and PPP. MPLS L2VPN transfers user data transparently; from a user's perspective, the MPLS network is a Layer 2 switched network that can be used to establish Layer 2 connections between nodes. For example, when two Ethernet networks are connected through MPLS L2VPN over an MPLS backbone, Ethernet users are unaware of the MPLS backbone. The user experience is the same as if they were connected directly through an Ethernet connection. MPLS L2VPN is an implementation of Pseudo Wire Emulation Edge-to-Edge (PWE3).
MPLS L2VPN is an example of an application that can make use of the MPLS core network. L2VPNs provide point-to-point connectivity in contrast to VPLS which provides point-to-multipoint connectivity. If a customer requires point-to-multipoint connections to interconnect three datacenters for example, they would need to use VPLS or other technologies. VPLS is an extension to MPLS L2VPNs and is discussed in Chapter 11.
Comparison with MPLS L3VPN

MPLS L3VPN is not discussed in this study guide, but the question often arises as to what the difference is between L3VPN and L2VPN. Compared with MPLS L3VPN, MPLS L2VPN has the following advantages:

■ High scalability: MPLS L2VPN establishes only Layer 2 connections. It does not involve the routing information of users. This greatly reduces the load of the PEs and even the load of the whole service provider network, enabling carriers to support more VPNs and to service more users.

■ Guaranteed reliability and private routing information security: As no routing information of users is involved, MPLS L2VPN neither tries to obtain nor processes the routing information of users, guaranteeing the security of the user VPN routing information.

■ Support for multiple network layer protocols, such as IP, IPX, and SNA.

Please refer to the HP website for more information about L3VPN.
MTU

An MPLS label stack is inserted between the link layer header and network layer header of a packet. With the addition of the MPLS header, an MPLS packet may exceed the maximum transmission unit (MTU) of an interface and therefore be dropped. To address the issue, you can configure the MPLS MTU on an interface of an LSR. The LSR will then compare the length of an MPLS packet against the configured MPLS MTU on the interface. When the packet is larger than the MPLS MTU:

■ If fragmentation is allowed, the LSR removes the label stack from the packet, fragments the IP packet (the length of a fragment is the MPLS MTU minus the length of the label stack), adds the label stack back onto each fragment, and then forwards the fragments.

■ If fragmentation is not allowed, the LSR drops the packet directly.
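The fragment-size arithmetic described above can be sketched as follows. This is an illustrative model only (label stack entries are 4 bytes each; real IP fragmentation also aligns fragment payloads to 8-byte boundaries, which is ignored here for simplicity).

```python
MPLS_LABEL_BYTES = 4  # each MPLS label stack entry is 32 bits

def fragment_payloads(ip_packet_len, mpls_mtu, label_stack_depth):
    """Return the IP fragment sizes an LSR would produce, per the rule:
    fragment length = MPLS MTU minus the length of the label stack."""
    stack_len = label_stack_depth * MPLS_LABEL_BYTES
    max_frag = mpls_mtu - stack_len
    if ip_packet_len <= max_frag:
        return [ip_packet_len]      # packet fits, no fragmentation needed
    frags = []
    remaining = ip_packet_len
    while remaining > 0:
        frags.append(min(max_frag, remaining))
        remaining -= frags[-1]
    return frags

# A 3000-byte IP packet, MPLS MTU 1500, two labels (e.g., tunnel + VC):
print(fragment_payloads(3000, 1500, 2))  # [1492, 1492, 16]
```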
To configure the MPLS MTU of an interface, see Table 10-1.

Table 10-1: Configure the MPLS MTU of an interface

1. Enter system view.
Command: system-view

2. Enter interface view.
Command: interface interface-type interface-number

3. Configure the MPLS MTU of the interface.
Command: mpls mtu value
Remarks: By default, the MPLS MTU of an interface is not configured.

MPLS packets carrying L2VPN or IPv6 packets are always successfully forwarded, even if they are larger than the MPLS MTU. If the MPLS MTU of an interface is greater than the MTU of the interface, data forwarding may fail on the interface. If you do not configure the MPLS MTU of an interface, fragmentation of MPLS packets will be based on the MTU of the interface, and the length of fragments does not take the MPLS labels into account. Thus, the length of an MPLS fragment may be larger than the interface's MTU.
MPLS L2VPN Label Stack

In MPLS L2VPNs, the concepts and principles of CE, PE, and P are the same as those in other MPLS technologies, such as basic MPLS or MPLS L3VPNs:

■ Customer edge device (CE): A CE resides on a customer network and has one or more interfaces directly connected with service provider networks. It can be a router, a switch, or a host. It is unaware of the existence of the VPN and does not need to support MPLS.

■ Provider edge router (PE): A PE resides on a service provider network and connects one or more CEs to the network. On an MPLS network, all VPN processing occurs on the PEs.

■ Provider (P) router: A P router is a backbone router on a service provider network. It is not directly connected with any CE. It only needs to be configured with basic MPLS forwarding capability.

MPLS L2VPN uses label stacks, as shown in Figure 10-2, to implement the transparent transmission of user packets in the MPLS network. Layer 2 packets received by the PE from a CE are encapsulated using a VC label and a tunnel label.
Figure 10-2: MPLS L2VPN Label Stack
■ An outer label, also called a tunnel label, is used to transfer packets from one PE to another.

■ An inner label, also called a VC label, is used to identify different connections between VPNs.

■ Upon receiving packets, a PE determines to which CE the packets are to be forwarded based on the VC labels.

Label stacking is discussed in more detail later in this chapter.
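The two-level stack can be sketched as simple push and pop operations. This is a hypothetical model for illustration, not device code; the function names and label values are invented.

```python
def pe_ingress(l2_frame, vc_label, tunnel_label):
    """Ingress PE: push the VC (inner) label, then the tunnel (outer) label."""
    return [tunnel_label, vc_label, l2_frame]   # outer label is first on the wire

def pe_egress(mpls_packet, vc_to_ce):
    """Egress PE: pop both labels; the VC label selects the destination CE."""
    _tunnel_label, vc_label, l2_frame = mpls_packet
    return vc_to_ce[vc_label], l2_frame

pkt = pe_ingress("ethernet-frame-from-CE1", vc_label=65679, tunnel_label=1024)
ce, frame = pe_egress(pkt, {65679: "CE2"})
print(ce, frame)  # CE2 ethernet-frame-from-CE1
```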
MPLS L2VPN Terminology Some basic MPLS L2VPN terms you need to understand include VLL, VC and PW. A traditional leased line is used to physically connect two remote sites. From a customer point of view, a virtual leased line (VLL) behaves in the same way as a traditional leased line. When traffic is sent from one CE to another through an MPLS backbone, Layer 2 traffic is sent to the service provider at one end by a CE device and arrives at the other end CE device unchanged. A VLL interconnects two customer CE devices logically across the MPLS backbone. A virtual circuit (VC) is also called a pseudo wire (PW). It is a virtual bidirectional connection that connects the attachment circuits (ACs) on two PEs. An MPLS VC includes a pair of label switch paths (LSPs) in opposite directions. In other words,
each unidirectional path of the virtual circuit uses a different LSP and therefore different labels for forwarding. Traffic from CE A to CE B will use different labels than traffic from CE B to CE A. Multiple virtual circuits can share the same core MPLS network in conjunction with other MPLS technologies such as MPLS traffic engineering (MPLS-TE) and MPLS L3VPNs. See Table 10-2 for terminology definitions.

Table 10-2: Terminology definitions

Attachment circuit (AC): An AC is a link between a CE and a PE, such as an FR DLCI, ATM VPI/VCI, Ethernet interface, VLAN, or PPP connection.

Cross-connect: A cross-connect concatenates two physical or virtual circuits such as ACs and PWs. It switches packets between the two physical or virtual circuits. Cross-connects include AC to AC cross-connect, AC to PW cross-connect, and PW to PW cross-connect.

Customer edge (CE): A CE device is a customer network device directly connected to the service provider network. It can be a network device (such as a router or a switch) or a host. It is unaware of the existence of any VPN, nor does it need to support MPLS.

Forwarding equivalence class (FEC): As a forwarding technology based on classification, MPLS groups packets to be forwarded in the same manner into a class called the forwarding equivalence class (FEC). That is, packets of the same FEC are handled in the same way.

Label block: A label block is a set of labels. It includes the following parameters:
• Label base: The LB specifies the initial label value of the label block. A PE automatically selects an LB value that cannot be manually modified.
• Label range: The LR specifies the number of labels that the label block contains. The LB and LR determine the labels contained in the label block. For example, if the LB is 1000 and the LR is 5, the label block contains labels 1000 through 1004.
• Label-block offset: The LO specifies the offset of a label block. If the existing label block becomes insufficient as the VPN sites increase, you can add a new label block to enlarge the label range. A PE uses an LO to identify the position of the new label block. The LO value of a label block is the sum of the LRs of all previously assigned label blocks. For example, if the LR and LO of the first label block are 10 and 0, the LO of the second label block is 10. If the LR of the second label block is 20, the LO of the third label block is 30.
A label block with LB, LO, and LR values of 1000, 10, and 5 is represented as 1000/10/5. Assume that a VPN has 10 sites, and a PE assigns the first label block LB1/0/10 to the VPN. When another 15 sites are added, the PE keeps the first label block and assigns the second label block LB2/10/15 to extend the network. LB1 and LB2 are the initial label values that are randomly selected by the PE.

MPLS Traffic Engineering (MPLS-TE): MPLS-TE focuses on the optimization of overall network performance. It is intended to conveniently provide highly efficient and reliable network services. The performance objectives associated with TE are either traffic oriented, to enhance quality of service (QoS), or resource oriented, to optimize resource (especially bandwidth) utilization. TE helps optimize network resource use to reduce network administrative cost, and dynamically tune traffic when congestion or flapping occurs. In addition, it allows ISPs to provide value-added services.

Provider device (P): P devices do not directly connect to CEs. They only need to forward user packets between PEs.

Provider edge (PE): A PE device is a service provider network device connected to one or more CEs. It provides VPN access by mapping and forwarding packets from user networks to public network tunnels and from public network tunnels to user networks.

Pseudo wire (PW): A virtual bidirectional connection between two PEs. An MPLS PW comprises a pair of LSPs in opposite directions.

Route distinguisher (RD): An RD is added before a site ID to distinguish the sites that have the same site ID but reside in different VPNs. An RD and a site ID uniquely identify a VPN site.

Route target (RT): PEs use the BGP route target attribute (also called the "VPN target" attribute) to manage BGP L2VPN information advertisement. PEs support the following types of route target attributes:
• Export target attribute: When a PE sends L2VPN information (such as site ID, RD, and label block) to the peer PE in a BGP update message, it sets the route target attribute in the update message to the export target.
• Import target attribute: When a PE receives an update message from the peer PE, it checks the route target attribute in the update message. If the route target value matches an import target, the PE accepts the L2VPN information in the update message.
Route target attributes determine which PEs can receive L2VPN information, and from which PEs a PE can receive L2VPN information.

Site ID: A site ID uniquely identifies a site in a VPN. Sites in different VPNs can have the same site ID.

Tunnel: A tunnel (or public tunnel) is a connection that carries one or more PWs across the MPLS or IP backbone. It can be an LSP tunnel, an MPLS TE tunnel, or a GRE tunnel.

Virtual circuit (VC): A VC is also called a pseudo wire (PW). It is a virtual bidirectional connection that connects the ACs on two PEs. An MPLS VC includes a pair of LSPs in opposite directions.

Virtual leased line (VLL): A point-to-point L2VPN service provided in the public network. It enables two sites to be connected as if they were connected via a leased line. It cannot provide switching among multiple points of the service provider.
MPLS L2VPN Configuration Methods

Overview

Multiple L2VPN implementation methods are available. Some options require manual configuration and others use dynamic signaling protocols to advertise the labels used by the L2VPN virtual circuit. The Provider-Provisioned Virtual Private Network (PPVPN) working group of the IETF has drafted several framework protocols. Two of the most important ones are the Martini draft and the Kompella draft:

■ draft-martini-l2circuit-trans-mpls
■ draft-kompella-ppvpn-l2vpn

The Martini draft defines a method for establishing point-to-point links to implement MPLS L2VPN. It uses the Label Distribution Protocol (LDP) as a signaling protocol for VC label transfer.

The Kompella draft defines a CE-to-CE mode for implementing MPLS L2VPN on the MPLS network. It uses extended BGP as the signaling protocol to advertise Layer 2 reachability information and VC labels. Kompella uses similar protocols (MBGP/MPLS) to those used in the L3VPNs defined in RFC 2547.

L2VPNs are extended to support point-to-multipoint connections using VPLS in RFC 4761 and RFC 4762. These two implementation methods use the Label Distribution Protocol (LDP) and the Border Gateway Protocol (BGP) to carry VC labels and establish point-to-multipoint links. RFC 4762 is titled Virtual Private LAN Service (VPLS) Using Label Distribution Protocol (LDP) Signaling, and RFC 4761 is titled Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling.

In addition, MPLS L2VPN can also be implemented by configuring VC labels statically. Circuit Cross Connect (CCC) and Static Virtual Circuit (SVC) are two such static implementation methods.

Note
The focus of this study guide is Martini, as it is the most widely used implementation. Other implementation methods are discussed here for completeness and comparison purposes.
Comparison

See Table 10-3 for a comparison of MPLS L2VPN implementation modes.

Table 10-3: Comparison of MPLS L2VPN implementation modes
Martini MPLS L2VPN

Martini MPLS L2VPN employs two levels of labels to transfer user packets. LDP is used to advertise a destination network to create an LSP between PE devices. This is used as the outer label in L2VPNs. Remote LDP is used as the signaling protocol to distribute the inner VC label. Previously, LDP was used to advertise an IPv4 prefix. In this case, remote LDP advertises a unique Layer 2 interface to a remote PE device. PE1 in Figure 10-3 allocates a label to the interface connected to CE1 and advertises that information to PE2.
Figure 10-3: Label distribution in Martini mode
To allow the exchange of VC labels between PEs, the Martini method extended LDP by adding the forwarding equivalence class (FEC) type of VC FEC. Moreover, as the two PEs exchanging VC labels may not be connected directly, a remote LDP session must be set up to transfer the VC FEC and VC labels.

With Martini MPLS L2VPN, only PEs need to maintain a small number of VC label and LSP mappings, and no P device contains Layer 2 VPN information. Therefore, it has high scalability. In addition, to add a new VC, you only need to configure a one-way VC on each of the PEs. Your configuration will not affect the operation of the network. The Martini method applies to scenarios with sparse Layer 2 connections, such as a scenario with a star topology.

The VC FEC contains the following information:

■ VC type: Encapsulation type of the VC, such as PPP, HDLC, FR, Ethernet, and ATM.

■ VC ID: Identifier of a VC on a PE. The VC type and the VC ID uniquely identify a VC. On a PE, the VC ID uniquely identifies a VC among the VCs of the same type.

As shown in Figure 10-3, the PEs send a VC FEC and VC label mapping to each other. After the VC labels are distributed, a VC is set up between the PEs.

The key of the Martini method is to set up VCs between CEs. Martini MPLS L2VPNs employ the VC type and VC ID to identify a VC. The VC type, as discussed, indicates the encapsulation type of the VC, which can be ATM, VLAN, or PPP, for example. The VC ID uniquely identifies the VC among the VCs of the same VC type on a PE. The PEs connecting the two CEs of a VC exchange VC labels through LDP, and bind their respective CEs by the VC ID. Once LDP establishes an LSP between the two PEs, and the label exchange and the binding to the CEs are finished, a VC is set up and ready to transfer Layer 2 data.
Kompella MPLS L2VPN

Kompella MPLS L2VPN employs two levels of labels to transfer user packets, and uses BGP as the signaling protocol to distribute the inner VC label. Different from other MPLS L2VPN modes, Kompella introduces the concept of a VPN. It allows CEs in the same VPN to establish a connection; CEs in different VPNs cannot establish a connection. Kompella MPLS L2VPN has the following basic concepts:

■ CE ID: Kompella numbers CEs inside a VPN. A CE ID uniquely identifies a CE in a VPN. CEs in different VPNs can have the same CE ID.

■ Route distinguisher: To distinguish CEs with the same CE ID in different VPNs, Kompella adds an RD before a CE ID. An RD and a CE ID uniquely identify a CE.

■ Route target: Kompella uses the BGP route target attribute (also called the "VPN target" attribute) to identify VPNs, to make sure CEs in the same VPN can establish a connection and CEs in different VPNs cannot. A PE supports the following types of route target attributes:

■ Export target attribute: When a PE sends L2VPN information (such as CE ID and RD) to the peer PE through a BGP update message, it sets the route target attribute carried in the update message to the export target.

■ Import target attribute: When a PE receives an update message from the peer PE, it checks the route target attribute in the update message. If the route target value matches an import target on the PE, the PE accepts the L2VPN information in the update message.
In brief, route target attributes define which PEs can receive L2VPN information, and from which PEs a PE can receive L2VPN information.

Different from the Martini mode, the Kompella mode does not distribute the VC label assigned by the local PE directly to the peer PE through the signaling protocol. Instead, it uses label blocks to assign labels to multiple connections at a time. A PE advertises label blocks to all PEs in the same VPN. Each PE calculates the VC labels according to the label blocks from other PEs. A label block includes the following parameters:

■ Label base: Initial label value of the label block. A PE automatically selects the LB value, which cannot be manually modified.

■ Label range: Number of labels that the label block contains. LB and LR determine the labels contained in the label block. For example, if the LB is 1000 and the LR is 5, the label block contains labels 1000 through 1004.

■ Label-block offset: Offset of the label block. When CEs increase in a VPN and the existing label block size is not enough, you do not need to withdraw the label block on the PEs. Instead, you can assign a new label block in addition to the existing label block to enlarge the label range. A PE uses the LO to identify a label block among all label blocks, and to determine from which label block it assigns labels. The LO value of a label block is the sum of the LRs of all previously assigned label blocks. For example, if the LR and LO of the first label block are 10 and 0, the LO of the second label block is 10. If the LR of the second label block is 20, the LO of the third label block is 30.

The following describes a label block in the format LB/LO/LR. For example, a label block whose LB, LO, and LR are 1000, 10, and 5 is represented as 1000/10/5. With label blocks, you can reserve some labels for the VPN for future use. This wastes some label resources in the short term, but can reduce the VPN deployment and configuration workload in the case of expansion.
Assume that an enterprise VPN contains 10 CEs and the number of CEs might increase to 20 in future. In this case, set the LR to 20. When you add a CE to the VPN, you only need to modify the configurations of the PE to which the new CE is connected. No change is required for the other PEs, which simplifies VPN expansion.
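The LB/LO/LR arithmetic above can be sketched as follows. This is an illustrative model only; the function names are hypothetical, and the LB values are invented stand-ins for the PE's randomly selected ones.

```python
def next_offset(existing_blocks):
    """LO of a new block = sum of the LRs of all previously assigned blocks.
    Each block is a (LB, LO, LR) tuple."""
    return sum(lr for _lb, _lo, lr in existing_blocks)

def labels_in_block(lb, lr):
    """A block with label base LB and range LR contains labels LB .. LB+LR-1."""
    return list(range(lb, lb + lr))

# First block: LR 10, LO 0. Second block: LR 20 -> LO 10. Third block -> LO 30.
blocks = [(1000, 0, 10)]
blocks.append((2000, next_offset(blocks), 20))
blocks.append((3000, next_offset(blocks), 5))
print([lo for _lb, lo, _lr in blocks])   # [0, 10, 30]

# The 1000/10/5 example from the text: LB 1000, LR 5 -> labels 1000 through 1004.
print(labels_in_block(1000, 5))          # [1000, 1001, 1002, 1003, 1004]
```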
CCC MPLS L2VPN

The CCC mode sets up a CCC connection by establishing two static LSPs in opposite directions and binding the static LSPs to ACs.
Unlike other MPLS L2VPN implementations, Circuit Cross Connect (CCC) employs just one level of label to transfer user data. Therefore, it uses label switched paths (LSPs) exclusively. That is, a CCC LSP can be used to transfer only the data of the CCC connection; it can neither be used for other MPLS L2VPN connections, nor for MPLS L3VPN or common IP packets.

The most significant advantage of this method is that no label signaling is required for transferring Layer 2 VPN information. As long as MPLS forwarding is supported and the service provider networks are interconnected, this method works. In addition, since the LSPs are dedicated, this method supports QoS services.

There are two types of CCC connections:

■ Local connection: A local connection is established between two local CEs that are connected to the same PE. The PE functions like a Layer 2 switch and can directly switch packets between the CEs without any static LSP.

■ Remote connection: A remote connection is established between a local CE and a remote CE, which are connected to different PEs. In this case, a static LSP is required to transport packets from one PE to another.

Note
For each remote CCC connection, you must configure two LSPs, one inbound and one outbound, on all P devices on the path.
SVC MPLS L2VPN

Static Virtual Circuit (SVC) also implements MPLS L2VPN by static configuration. It transfers L2VPN information without using any signaling protocol. The SVC method resembles the Martini method closely and is in fact a static implementation of the Martini method. The difference is that it does not use LDP to transfer Layer 2 VC and link information. You only need to configure the VC label information.

Note
The labels for CCC and SVC range from 16 to 1023, which are reserved for static LSPs.
MPLS L2VPN Architecture - Martini

Overview

The Martini L2VPN configuration method is now discussed in more detail, as it is the focus of this study guide. Martini is configured on the premise that a working core MPLS infrastructure is in place providing PE-to-PE communication. The L2VPN is configured between two PE devices and requires that the loopback addresses of the PE devices be advertised via a unique label. To implement this, the loopback IP addresses must be configured using a /32 mask, advertised with a routing protocol such as OSPF, and advertised via a label distribution protocol such as LDP. For L2VPNs to function, active LSPs between PE loopbacks are required across the MPLS core.

As an example, in Figure 10-4, PE1 is configured with a loopback address of 10.0.0.1/32. This address is advertised using a routing protocol like OSPF to PE2. At the same time, LDP advertises this destination prefix with an auto-generated label to PE2.
Figure 10-4: MPLS L2VPN Architecture - Martini
PE2 is thus able to reach 10.0.0.1/32 using a label rather than an IPv4 prefix (assume penultimate hop popping is turned off in this example). Traffic from PE2 to the loopback of PE1 will use the MPLS tunnel or LSP. In other words, traffic sent from PE2 to PE1 uses an MPLS-tagged packet with the label represented as "T" (tunnel label) in Figure 10-4. This forms a virtual tunnel between PE1 and PE2.

In this example, the label discussed was applied to a destination IPv4 prefix (10.0.0.1/32). However, labels can also be applied to Layer 2 interfaces. This is the feature that L2VPNs make use of to create Layer 2 connections between sites. The PE devices in the figure do not have IP addresses configured on the interfaces connecting them to the CE devices (AC interfaces). PE1 generates an additional label for the interface to CE1 (the AC). This label is advertised using remote LDP to PE2 and is represented in Figure 10-4 as "V" (VC label).

MPLS L2VPNs use label stacks to implement the transparent transmission of user packets in the MPLS network. Layer 2 packets received by the PE from a CE are encapsulated using a VC label and a tunnel label.

■ An outer label, also called a tunnel label, is used to transfer packets from one PE to another.

■ An inner label, also called a VC (virtual circuit) label, identifies a Layer 2 connection to a CE. PE devices use this label to identify a specific local L2 port. This is required because a PE may have multiple ports using the same tunnel label for communication with another PE.

■ Upon receiving packets, a PE determines to which CE the packets are to be forwarded according to the VC labels.

As shown in Figure 10-4, MPLS L2VPN forwards packets in the following steps:

■ After PE 1 receives a Layer 2 packet from CE 1, it adds a VC label to the packet according to the VC bound to the AC, searches for the public tunnel, adds a tunnel label to the packet, and then forwards the packet to the core MPLS network. Any P devices will forward the packet to PE 2 according to the tunnel label. In this example, however, PE1 forwards the traffic directly to PE2, as no P devices are shown in the figure.

■ After PE 2 receives the packet from the public tunnel, it identifies the VC to which the packet belongs according to the VC label of the packet, deletes the tunnel label and the VC label from the packet, and then forwards the resulting packet to CE 2 through the AC bound to the VC.
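The forwarding steps above, including the behavior of any intermediate P devices, can be sketched as follows. This is a hypothetical model for illustration only; function names and label values are invented.

```python
def p_forward(pkt, swap_table):
    """A P router swaps only the outer tunnel label; it never inspects the VC label."""
    tunnel_label, vc_label, frame = pkt
    return [swap_table[tunnel_label], vc_label, frame]

def pe2_receive(pkt, vc_to_ac):
    """The egress PE pops both labels and forwards the frame out the AC bound to the VC."""
    _tunnel_label, vc_label, frame = pkt
    return vc_to_ac[vc_label], frame

# PE1 sends [tunnel label, VC label, Layer 2 payload] into the core:
pkt = [1024, 65679, "frame-from-CE1"]
pkt = p_forward(pkt, {1024: 2048})           # a P device swaps 1024 -> 2048
ac, frame = pe2_receive(pkt, {65679: "AC-to-CE2"})
print(ac, frame)  # AC-to-CE2 frame-from-CE1
```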
MPLS L2VPN Architecture - Martini (continued)

To reiterate, one of the advantages of L2VPNs over L3VPNs is that multiple types of traffic, not only IPv4, can be forwarded from CE to CE. Ethernet or PPP frames can be transported transparently across the MPLS core.

Another advantage of using MPLS virtual leased lines (VLLs) rather than physical leased lines is link recovery. If a link failed in an MPLS core with multiple paths between PE devices, OSPF could reroute the traffic using an alternate path through the core. A new tunnel label would be used, but the same VC label could be used for the VLL. The VLL would therefore still be available, whereas a physical leased line would be down.

PE devices may have multiple customers connected to them. Traffic received from a customer by one PE should not inadvertently be transmitted to a different customer L2VPN on another PE device. On receipt of packets from a remote PE device, the local PE requires a label to identify which interface to forward the packets out of. The PE will receive packets from the tunnel and use the virtual circuit (VC) label to determine the local egress interface. The PE will remove the tunnel and VC labels from the packet, and then forward the packet to the CE device. This ensures that packets are transmitted out of the correct interface and that the customer is unaware of any labels or VPNs.

An L2VPN typically requires bidirectional communication. However, MPLS label switched paths (LSPs) are unidirectional, and therefore two separate LSPs are used for an L2VPN (transmit/receive). LSPs are created independently of L2VPNs, and PE devices require a mechanism to ensure that the transmitted and received packets of an L2VPN are processed correctly. To ensure that both transmit and receive traffic of an L2VPN is processed as part of that L2VPN and not injected into another L2VPN, configure both PE devices with the same pseudo wire ID.

A unique pseudo wire ID will be assigned to each L2VPN connection. This is configured by each PE device forming the point-to-point connection. The pseudo wire ID has to be set to the same value on both sides of the L2VPN and has to be unique on each PE device. Multiple L2VPNs could be configured on a single PE device, and therefore each L2VPN must be uniquely identified on that PE with a unique PW ID. Remote LDP advertises labels between PE devices and ensures that transmit and receive LSPs are bound to the correct L2VPN.
A PE device will receive an LDP advertisement from another PE device indicating which label is associated with a particular PW. Pseudo wire IDs function as possible destination resources in the same way that IPV4 prefixes in the routing table function as possible destination resources. Basic LDP generates a label for IP prefixes found in the routing table and advertises that
label to peers. In this case however, remote LDP generates labels for the PW IDs configured locally on the PE device and advertises those PW IDs and labels to peer PE devices (which may be remote). The remote LDP announcement contains information about the PW ID and label which allows the remote PE to determine which label to use for a particular PW ID. As an example, PE2 is advertising label 65679 to PE1 via remote LDP for pseudo wire ID 3. When PE1 receives traffic from a CE device that is matched to the L2VPN (PW 3), PE1 will encapsulate the traffic from the CE with inner label 65679 (VC label) and an outer label (LSP tunnel label) learnt via LDP. The packet will then be sent to the core MPLS network as an MPLS packet. The combination of LDP (outer label advertisements) and remote LDP (inner label advertisements) ensures that the correct LSP is used for both the transmit and receive paths of the L2VPN. PE devices know which unidirectional LSP to use for the bidirectional pseudo wire.
MPLS L2VPN Configuration Steps
The following configuration steps apply to Comware 7 devices; Comware 5 steps differ. An overview of the MPLS L2VPN configuration steps is as follows:
1. Before L2VPNs can be configured, the prerequisite steps of configuring the core MPLS network with OSPF, MPLS, and LDP must be completed.
2. The PE devices are configured globally for MPLS L2VPNs.
3. A service instance is configured on the customer facing interfaces.
4. In the last step, a cross connect group is defined and used to bind, or cross connect, the customer facing interface and the target PE device.
Once configured, verify that the configuration is working as expected.
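As a compact illustration of steps 2 through 4, the following Comware 7 sketch shows the commands on a single PE. The device name, interface number, service instance ID, peer loopback address, and pseudo wire ID are illustrative; each command is explained in the sections that follow.

system-view
[PE1] l2vpn enable
[PE1] interface ten-gigabitethernet 1/0/1
[PE1-Ten-GigabitEthernet1/0/1] service-instance 10
[PE1-Ten-GigabitEthernet1/0/1-srv10] encapsulation s-vid 10
[PE1-Ten-GigabitEthernet1/0/1-srv10] quit
[PE1-Ten-GigabitEthernet1/0/1] quit
[PE1] xconnect-group l2vpn1
[PE1-xcg-l2vpn1] connection ldp
[PE1-xcg-l2vpn1-ldp] ac interface ten-gigabitethernet 1/0/1 service-instance 10
[PE1-xcg-l2vpn1-ldp] peer 192.3.3.3 pw-id 3

The remote PE would mirror this configuration with the same pw-id and the local PE's loopback address as the peer.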
Step 1: Configure Basic MPLS and LDP
The first step is to configure basic MPLS and LDP. The configuration was explained in Chapter 9 and is thus not repeated here. Before the L2VPN is configured, it is assumed that the backbone infrastructure has been configured:
■ IP routing is configured with a routing protocol such as OSPF
■ Loopback addresses are configured and are being advertised
■ Basic MPLS and LDP are configured on all backbone facing interfaces
■ Any target PE loopback IP addresses are reachable via an LSP. Ensure, for example, that PE1 is able to reach PE2 via a unique LSP and that PE2 can reach PE1 using a different LSP label.
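A minimal backbone preparation on one PE might look like the following sketch; the loopback address, OSPF process number, network statements, and backbone interface are assumptions for illustration only (see Chapter 9 for the full explanation):

system-view
[PE1] interface loopback 0
[PE1-LoopBack0] ip address 192.2.2.2 32
[PE1-LoopBack0] quit
[PE1] ospf 1
[PE1-ospf-1] area 0
[PE1-ospf-1-area-0.0.0.0] network 192.2.2.2 0.0.0.0
[PE1-ospf-1-area-0.0.0.0] network 10.0.12.0 0.0.0.255
[PE1-ospf-1-area-0.0.0.0] quit
[PE1-ospf-1] quit
[PE1] mpls lsr-id 192.2.2.2
[PE1] mpls ldp
[PE1-ldp] quit
[PE1] interface ten-gigabitethernet 1/0/2
[PE1-Ten-GigabitEthernet1/0/2] mpls enable
[PE1-Ten-GigabitEthernet1/0/2] mpls ldp enable

Reachability of the remote PE loopback via an LSP can then be checked with a command such as display mpls ldp lsp.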
Step 2: Configure Global L2VPN
Once basic MPLS has been configured and tested, L2VPNs can be configured. Ensure, for example, that the following has been completed:
■ The LSR ID for the PE has been configured using the mpls lsr-id command
■ MPLS has been enabled on the backbone MPLS interface using the mpls enable command
The l2vpn enable command, as shown in Figure 10-5, is not used exclusively by L2VPNs, but is required for any kind of advanced L2 connectivity on a Comware 7 device, such as SPB or VPLS.
Figure 10-5: Step 2: Configure Global L2VPN
You must enable L2VPN before configuring other L2VPN settings. Use the l2vpn enable command to enable L2VPN. Use the undo l2vpn enable command to disable L2VPN. L2VPNs are disabled by default.
Syntax
l2vpn enable
undo l2vpn enable

Follow the steps in Table 10-4 to enable L2VPN.
Table 10-4: Steps to enable L2VPN
Step                     Command         Remarks
1. Enter system view.    system-view     N/A
2. Enable L2VPN.         l2vpn enable    By default, L2VPN is disabled.
Step 3: Configure a Service Instance
Overview
Once L2VPN has been enabled globally on a device, a service instance is required for L2VPNs. A service instance is once again a generic term and is also used in various L2 protocols such as VPLS and SPB. The service instance is a locally significant number. It is configured on the PE device and applied to the customer facing interface. In Figure 10-6, service-instance 10 is applied to interface Ten-GigabitEthernet 1/0/1.
Instead of selecting all arriving traffic on the interface for transmission through the L2VPN, a network administrator can specify that only certain customer traffic types are transported via the L2VPN. In the simplest scenario, all arriving traffic is selected for transmission. Other options include specifying an individual VLAN or tagged traffic only. The encapsulation command is used to specify matched traffic. In Figure 10-6, VLAN 10 traffic is selected for transmission.
Figure 10-6: Step 3: Configure a Service Instance
802.1Q supports up to 4094 VLANs and therefore 4094 service instances could be created on a single interface to match individual VLANs. Each service instance in turn is mapped to a unique L2VPN. As an example, VLAN 10 traffic arriving on an interface at SiteA could be matched and forwarded to one remote PE (SiteB) while VLAN 11 traffic arriving on the same interface is forwarded to a different remote PE (SiteC). The service VLAN (s-vid) option applies to the outer VLAN tag of an incoming packet. If 802.1Q frames are received from a customer network, the service VLAN ID matches the 802.1Q VLAN ID.
However, QinQ frames could be transmitted to the PE. An intermediate device between the PE and CE could be adding an additional tag to the original 802.1Q frames. That would allow the service provider to assign all VLANs that belong to a single customer to a single outer VLAN tag. Therefore, the PE will receive frames with two tags, the outer tag being the service VLAN ID and the inner tag, the customer VLAN ID. The PE will transport all VLANs that belong to that customer based on the single outer VLAN ID and send that traffic to a specific site. Other customers would be encapsulated with a different service VLAN ID, which would allow the service provider to scale beyond 4094 customer VLANs.
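For example, if an intermediate device tags all of one customer's VLANs with outer (service) VLAN 100, the PE could match on that outer tag alone. The interface, service instance, and VLAN numbers in this sketch are illustrative:

system-view
[PE1] interface ten-gigabitethernet 1/0/1
[PE1-Ten-GigabitEthernet1/0/1] service-instance 20
[PE1-Ten-GigabitEthernet1/0/1-srv20] encapsulation s-vid 100

All of the customer's inner 802.1Q VLANs would then be carried through the same L2VPN based on the single outer tag.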
service-instance
Use the service-instance command to create a service instance and enter service instance view. Use the undo service-instance command to delete an existing service instance. By default, no service instance is created. The service instances created on different Layer 2 Ethernet interfaces can have the same service instance ID.

Syntax
service-instance service-instance-id
undo service-instance service-instance-id

service-instance-id
Specifies the ID of the service instance, in the range of 1 to 4096.
Example
Create service instance 1 on the Layer 2 Ethernet interface Ten-GigabitEthernet 1/0/1 and enter service instance 1 view.

system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 1
[Sysname-Ten-GigabitEthernet1/0/1-srv1]
encapsulation (service instance view)
Use the encapsulation command to configure a packet matching rule for the current service instance.
Use the undo encapsulation command to remove the packet matching rule of the current service instance. By default, no packet matching rule is configured for a service instance. You can choose only one of the following match criteria for a service instance:
■ Match all incoming packets.
■ Match incoming packets with any VLAN ID or no VLAN ID.
■ Match incoming packets with a specific VLAN ID.
The match criteria for different service instances configured on an interface must be different. You can create multiple service instances on a Layer 2 Ethernet interface, but only one service instance can use the default match criteria (encapsulation default) to match packets that do not match any other service instance. If only one service instance is configured on an interface and the service instance uses the default match criteria, all packets received on the interface match the default match criteria. This command cannot be executed multiple times for a service instance. Removing the match criteria for a service instance also removes the association between the service instance and the VSI.
Syntax
encapsulation default
encapsulation { tagged | untagged }
encapsulation s-vid vlan-id [ only-tagged ]
undo encapsulation

default

Specifies the default match criteria.

s-vid vlan-id
Matches packets with a specific outer VLAN ID. The vlan-id argument specifies a VLAN ID in the range of 1 to 4094. only-tagged
Matches only tagged packets. If this keyword is not specified when the
matching VLAN is the default VLAN, packets with the default VLAN ID or without any VLAN ID are all matched. If this keyword is specified when the matching VLAN is the default VLAN, only packets with the default VLAN ID are matched.

tagged
Matches tagged packets. untagged
Matches untagged packets.
Example
Configure service instance 1 on Ten-GigabitEthernet 1/0/1 to match packets that have an outer VLAN ID of 111.

system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 1
[Sysname-Ten-GigabitEthernet1/0/1-srv1] encapsulation s-vid 111
Step 4: Configure Cross Connect Group
Overview
In this step a cross connect group is created to bind the customer interface to the target PE. The customer interface was configured in the previous step with a service instance. The service instance configured previously selects traffic to transmit via the L2VPN, and in this step that traffic is bound to the remote PE. The IP address specified by the peer command is the remote PE's loopback IP address, which is reachable using a unidirectional LSP through the core MPLS network. The outer label (tunnel label) is learned via LDP and the inner label (VC label) is configured with the pw-id parameter. The pseudo wire (pw-id) parameter needs to be configured the same on both PE devices. This allows both remote PE devices to determine through remote LDP advertisements that this connection is the same L2VPN. Received and transmitted traffic will therefore be matched to the same L2VPN. In Figure 10-7, a cross connect group with the name l2vpn1 is configured. This is a locally significant value. Within the cross connect group, a connection with the name ldp is created that binds the service instance configured on Ten-GigabitEthernet 1/0/1 to
PE 192.3.3.3 using pseudo wire 3. Each pseudo wire ID identifies an L2VPN and thus needs to be unique on each PE, but needs to be set to the same value on both sides of the pseudo wire.
Figure 10-7: Step 4: Configure Cross Connect Group
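The binding on the remote PE mirrors the local configuration. As a sketch (the loopback address 192.2.2.2 for PE1 is taken from the chapter's examples; the group name, connection name, interface, and service instance ID are illustrative):

[PE2] xconnect-group l2vpn1
[PE2-xcg-l2vpn1] connection ldp
[PE2-xcg-l2vpn1-ldp] ac interface ten-gigabitethernet 1/0/1 service-instance 10
[PE2-xcg-l2vpn1-ldp] peer 192.2.2.2 pw-id 3

Note that pw-id 3 is identical on both PEs, while each side points at the other PE's loopback address.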
xconnect-group
Use the xconnect-group command to create a cross-connect group and enter cross-connect group view. If the specified group has been created, the prompt changes to the cross-connect group view. Use undo xconnect-group to delete a cross-connect group. L2VPNs can create multiple LDP, BGP, and static PWs for a cross-connect group.
Syntax
xconnect-group group-name
undo xconnect-group group-name

group-name
Specifies the name of the cross-connect group, a case-sensitive string of 1 to 31 characters excluding hyphens.
Example
Create a cross-connect group named vpn1 and enter cross-connect group view.

system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1]
connection
Use connection to create a cross-connect and enter cross-connect view. If the specified cross-connect has been created, the command opens cross-connect view.
Use undo connection to remove a cross-connect.
A cross-connect is a point-to-point connection. You can perform the following operations in cross-connect view:
■ Execute ac interface and peer to connect an AC to a PW, so the PE can forward packets between the AC and the PW.
■ Execute peer twice to connect two PWs to form a multi-segment PW.
■ Execute ac interface and ccc to connect an AC to a remote CCC connection, so the PE can forward packets between the AC and the remote CCC connection.
No cross-connect is created by default.
Syntax
connection connection-name
undo connection connection-name

connection-name

Specifies the name of the cross-connect, a case-sensitive string of 1 to 20 characters excluding hyphens.
Example
Create cross-connect ac2pw for cross-connect group vpn1 and enter cross-connect view.

system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection ac2pw
[Sysname-xcg-vpn1-ac2pw]
ac interface
Use ac interface to bind an AC to a cross-connect. Use undo ac interface to remove the binding. An AC can be a Layer 3 interface or a service instance on a Layer 2 Ethernet interface. After you bind a Layer 3 interface or a service instance on a Layer 2 interface to a
cross-connect, the cross-connect forwards packets received from the Layer 3 interface or packets that match the service instance on the Layer 2 interface to the bound PW or another AC.
The access mode determines how the PE treats the VLAN tag in Ethernet frames received from the AC. It also determines how the PE forwards Ethernet frames to the AC.
■ VLAN access mode—Ethernet frames received from the AC must carry a VLAN tag in the Ethernet header. The VLAN tag is called a P-tag, assigned by the service provider. Ethernet frames sent to the AC must also carry the P-tag.
■ Ethernet access mode—If Ethernet frames from the AC have a VLAN tag in the header, the VLAN tag is called a U-tag, and the PE ignores it. Ethernet frames sent to the AC do not carry the P-tag.
The service instance specified in this command must have match criteria configured by encapsulation. No AC is bound to a cross-connect by default.
Syntax
ac interface interface-type interface-number [ service-instance instance-id ] [ access-mode { ethernet | vlan } ]
undo ac interface interface-type interface-number [ service-instance instance-id ]

interface-type interface-number
Specifies an interface. service-instance instance-id
Specifies a service instance by its ID in the range of 1 to 4096. access-mode
Specifies the access mode. By default, the access mode is VLAN. ethernet
Specifies the Ethernet access mode. vlan
Specifies the VLAN access mode.
Examples
Configure service instance 200 on the Layer 2 interface Ten-GigabitEthernet 1/0/1 to match packets with an outer VLAN tag of 200, and bind the service instance to the cross-connect actopw in the cross-connect group vpn1.

system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] encapsulation s-vid 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] quit
[Sysname-Ten-GigabitEthernet1/0/1] quit
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection actopw
[Sysname-xcg-vpn1-actopw] ac interface ten-gigabitethernet 1/0/1 service-instance 200
Configure service instance 200 on Ten-GigabitEthernet 1/0/1 to match packets with an outer VLAN tag of 200, and bind the service instance to the auto-discovery cross-connect in the cross-connect group vpwsbgp.

system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] encapsulation s-vid 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] quit
[Sysname-Ten-GigabitEthernet1/0/1] quit
[Sysname] xconnect-group vpwsbgp
[Sysname-xcg-vpwsbgp] auto-discovery bgp
[Sysname-xcg-vpwsbgp-auto] site 1 range 10 default-offset 0
[Sysname-xcg-vpwsbgp-auto-1] connection remote-site-id 2
[Sysname-xcg-vpwsbgp-auto-1-2] ac interface ten-gigabitethernet 1/0/1 service-instance 200
peer
Use peer to configure a PW for a cross-connect and enter cross-connect PW view. If the specified PW has been created, the command opens cross-connect PW view. Use undo peer to delete a PW.
To create a static PW, you must specify the incoming and outgoing labels. To enter the view of an existing static PW, you do not need to specify the incoming and outgoing labels. If you do not specify the incoming and outgoing labels when you create a new PW, LDP is used to create the PW. The PW ID for a PW must be the same on the PEs at the ends of the PW. The LSR ID of the peer PE and the PW ID uniquely identify a PW, and must not both be the same as those of any VPLS PW or PW bound to a cross-connect. PW redundancy is mutually exclusive with multi-segment PW function. If you have configured two PWs by using the peer command in cross-connect view, you cannot configure a backup PW by using the backup-peer command in cross-connect PW view, and vice versa. No PW is configured for a cross-connect by default.
Syntax
peer ip-address pw-id pw-id [ in-label label-value out-label label-value ] [ pw-class class-name | tunnel-policy tunnel-policy-name ]
undo peer ip-address pw-id pw-id

ip-address
Specifies the LSR ID of the peer PE. pw-id pw-id
Specifies a PW ID for the PW, in the range of 1 to 4294967295. in-label label-value
Specifies the incoming label of the PW, in the range of 16 to 1023. out-label label-value
Specifies the outgoing label of the PW, in the range of 16 to 1023. pw-class class-name
Specifies a PW class by its name, a case-sensitive string of 1 to 19 characters. You can configure the PW type and control word by specifying a PW class. If no PW class is specified, the PW type is
determined by the interface type. The control word function is not supported for PW types that do not require using control word. tunnel-policy tunnel-policy-name
Specifies a tunnel policy by its name, a case-sensitive string of 1 to 19 characters. If no tunnel policy is specified, the default tunnel policy is used.
Examples
Configure an LDP PW destined to 4.4.4.4 for the cross-connect pw2pw in the cross-connect group vpn1 and enter cross-connect PW view. The PW ID is 200.

system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection pw2pw
[Sysname-xcg-vpn1-pw2pw] peer 4.4.4.4 pw-id 200
[Sysname-xcg-vpn1-pw2pw-4.4.4.4-200]
Configure a static PW destined to 5.5.5.5 for the cross-connect pw2pw in the cross-connect group vpn1 and enter cross-connect PW view. The static PW has an ID of 200, an incoming label of 100, and an outgoing label of 200.

system-view
[Sysname] xconnect-group vpn1
[Sysname-xcg-vpn1] connection pw2pw
[Sysname-xcg-vpn1-pw2pw] peer 5.5.5.5 pw-id 200 in-label 100 out-label 200
[Sysname-xcg-vpn1-pw2pw-5.5.5.5-200]
Step 5: Verify
Overview
The configured L2VPN can be verified and its status reviewed. Figure 10-8 shows output for both PE1 and PE2. The output shows that remote LDP is advertising the labels to use (In/Out) for PW ID (pseudo wire) 3. An xconnect group named l2vpn1 is configured on both PE devices. PE1 has a peer of 192.3.3.3 (PE2) and PE2 has a peer of 192.2.2.2 (PE1).
Figure 10-8: Step 5: Verify
PE2 is advertising label 65679 (in label) to PE1 via remote LDP. When PE1 receives traffic from a CE device that is matched to the L2VPN, PE1 will encapsulate the traffic from the CE with inner label 65679 (VC label) and an outer label (LSP tunnel label) learnt via LDP (not shown in output). The packet will then be sent to the core MPLS network as an MPLS packet with a two-label stack. In the same way, when PE2 sends packets via the L2VPN to PE1, the inner label (VC label) is set to 65681 as learned via remote LDP and the outer label (LSP tunnel label) is learnt locally using LDP, typically from a P device. Once label exchange has taken place, the status of the L2VPN is Up.
Guidelines
Use the display l2vpn pw command to display L2VPN PW information.
Syntax
display l2vpn pw [ xconnect-group group-name ] [ protocol { bgp | ldp | static } ] [ verbose ]

xconnect-group group-name
Displays L2VPN PW information for the cross-connect group specified by its name, a case-sensitive string of 1 to 31 characters. If no group is specified, the command displays L2VPN PW information for all cross-connect groups.
Displays L2VPN PW information established by a specific protocol. If
no protocol is specified, the command displays L2VPN PW information established by all protocols. bgp
Displays BGP PW information. ldp
Displays LDP PW information. static
Displays static PW information, including remote CCC connections. verbose
Displays detailed information. Without this keyword, the command displays brief information.
display l2vpn pw
Display brief information about all L2VPN PWs.

display l2vpn pw
Flags: M - main, B - backup, H - hub link, S - spoke link, N - no split horizon
Total number of PWs: 2, 2 up, 0 blocked, 0 down, 0 defect

Xconnect-group Name: ldp
Peer            PW ID/Rmt Site   In/Out Label   Proto   Flag   Link ID   State
192.3.3.3       500              65699/65699    LDP     M      0         Up

Xconnect-group Name: vpnb
Peer            PW ID/Rmt Site   In/Out Label   Proto   Flag   Link ID   State
192.3.3.3       2                65636/65663    BGP     M      1         Up
See Table 10-5 for the output description.
Table 10-5: Display l2vpn pw output description

Field            Description
Flag             PW flag: M—Primary PW. B—Backup PW.
PW ID/Rmt Site   Displays the PW ID for a static or LDP PW, and the remote site ID for a BGP PW.
Proto            Protocol used to establish the PW: LDP, Static, or BGP.
Link ID          Link ID of the PW.
State            PW state: Up, Down, Blocked, or BFD Defect. Blocked indicates that the PW is a backup PW. Defect indicates BFD has detected a defect on the PW.
display l2vpn pw verbose
Display detailed information about all PWs.

display l2vpn pw verbose
Xconnect-group Name: ldp
 Connection Name: ldp
 Peer: 192.3.3.3          PW ID: 500
 Signaling Protocol : LDP
 Link ID : 0              PW State : Up
 In Label : 65699         Out Label: 65699
 MTU : 1500
 PW Attributes : Main
 VCCV CC :
 VCCV BFD :
 Tunnel Group ID : 0x1800000160000000
 Tunnel NHLFE IDs : 136
Xconnect-group Name: vpnb
 Connection of auto-discovery: Site 1
 Peer: 192.3.3.3          Remote Site: 2
 Signaling Protocol : BGP
 Link ID : 1              PW State : Up
 In Label : 65636         Out Label: 65663
 MTU : 1500
 PW Attributes : Main
 VCCV CC :
 VCCV BFD :
 Tunnel Group ID : 0x1800000160000000
 Tunnel NHLFE IDs : 136
See Table 10-6 for the output description.
Table 10-6: Display l2vpn pw verbose output description

Field                          Description
Xconnect-group Name            Cross-connect group name.
Connection Name                Cross-connect name, which is displayed for LDP and static PWs.
Peer                           IP address of the peer PE of the PW.
PW State                       PW state: Up, Down, Blocked, or Defect. Blocked indicates that the PW is a backup PW. Defect indicates BFD has detected a defect on the PW.
Wait to Restore Time           Wait time to switch traffic from the backup PW to the primary PW when the primary PW recovers, in seconds. If the switchover is disabled, this field displays Infinite. This field is available when both primary and backup PWs exist, and is displayed only for the primary PW.
Remaining Time                 Remaining wait time for traffic switchover, in seconds.
PW Attributes                  PW attribute: Main—The PW is the primary PW. Backup—The PW is the backup PW.
VCCV CC                        VCCV CC type: Control-Word—Control word. Router-Alert—MPLS router alert label. TTL—TTL timeout.
VCCV BFD                       VCCV BFD type: Fault Detection with BFD—BFD packets use IP/UDP encapsulation (with IP/UDP headers). Fault Detection with Raw-BFD—BFD packets use PW-ACH encapsulation (without IP/UDP headers).
Tunnel Group ID                ID of the tunnel group for the PW.
Tunnel NHLFE IDs               NHLFE ID of the public tunnel that carries the PW. If equal-cost tunnels are available, this field displays multiple NHLFE IDs. If no tunnel is available, this field displays None.
Connection of auto-discovery   The PW is a BGP PW.
Site                           Local site ID.
Remote Site                    Remote site ID.
Summary
In this chapter, you learned about MPLS L2VPNs, which provide Layer 2 point-to-point VPN services over an MPLS or IP backbone. MPLS L2VPNs transfer user data transparently. The MPLS network acts as a Layer 2 switched network that can be used to establish Layer 2 connections between CE nodes. Various implementation methods were discussed, including Martini, Kompella, CCC, and SVC. Martini, which uses LDP to exchange VC labels, was discussed at length. The configuration and verification of Martini implementations of L2VPNs was then covered.
Learning Check
Answer each of the questions below.
1. Which L2VPN implementation uses MBGP?
a. SVC
b. Martini
c. Kompella
d. CCC
2. Which L2VPN implementation uses one level of label?
a. SVC
b. Martini
c. Kompella
d. CCC
3. An administrator wants to configure an L2VPN. Which parameter must be the same on both PE devices?
a. pw-id
b. xconnect-group
c. connection
d. service-instance
4. Which L2VPN implementation uses LDP?
a. SVC
b. Martini
c. Kompella
d. CCC
5. Which protocol advertises the outer label in a Kompella L2VPN?
a. BGP
b. LDP
c. MBGP
d. Remote LDP
Learning Check Answers 1. c 2. d 3. a 4. b 5. b
11 Virtual Private LAN Service (VPLS)
EXAM OBJECTIVES
In this chapter, you learn to:
✓ Describe MPLS VPLS Features.
✓ Understand VPLS architecture.
✓ Describe VPLS Loop Prevention.
✓ Configure MPLS VPLS.
INTRODUCTION
Virtual private LAN service (VPLS), also called transparent LAN service (TLS) or virtual private switched network service, can deliver a point-to-multipoint L2VPN service over public networks. With VPLS, geographically dispersed sites can interconnect and communicate over a metropolitan area network (MAN) or wide area network (WAN) as if they were on the same local area network (LAN).
ASSUMED KNOWLEDGE
You should have a basic knowledge of Label Distribution Protocol (LDP) and of the prefixes that trigger a label switch path (LSP).
MPLS VPLS Overview
Virtual Private LAN Service (VPLS), also called transparent LAN service (TLS) or virtual private switched network service, delivers a point-to-multipoint L2VPN
service over an MPLS or IP backbone, as shown in Figure 11-1. The provider backbone emulates a switch to connect all geographically dispersed sites of each customer network. The backbone is transparent to the customer sites, which can communicate with each other as if they were on the same LAN.
Figure 11.1: MPLS VPLS
VPLS provides Layer 2 VPN services for CE devices. However, it supports multipoint services, rather than the point-to-point services that traditional L2VPNs support.
While L2VPN PE devices simply forward any packet received from the CE devices on the service instance, VPLS PE devices participate in customer MAC address learning. PE devices learn and maintain a MAC address table in a similar way to a traditional Ethernet switch. The VPLS virtual switch instance has both local physical interfaces and virtual Ethernet interfaces. Any source MAC addresses in frames received from either a local CE device (local physical interface) or from remote PE devices (virtual Ethernet interfaces) will be learned and maintained in a virtual switch MAC address table. Each PE creates and maintains a virtual switch instance (VSI), which provides transparent Layer 2 forwarding for customer sites.

Note: Even though VPLS can be enabled over IP backbone networks, this study guide focuses on MPLS-based networks.

As shown in Figure 11-2, a VPLS network can be regarded as a large, multisite, Layer 2 switch from the point of view of users. The VPLS network can transparently transmit all L2/L3 packets sent by CE devices, and it appears as if the VPLS network is a simple L2 switch with no protocols enabled. However, VPLS is actually a multipoint VPN technology that offers a service equivalent to an Ethernet switch over an MPLS-based core network.
Figure 11.2: CE devices view VPLS network as one large L2 switch
How are packets processed in a VPLS environment? In Figure 11-3, a core MPLS
network and connected CE sites are shown. For the CE devices, a VPLS network acts like a L2 switch with no protocols enabled, and transparently transmits all user packets (user PDU).
Figure 11.3: Packet forwarding.
The PE devices encapsulate CE traffic from local sites destined to remote sites using a virtual circuit (VC) label and MPLS tunnel label. Based on the destination MAC address in the frame received from the CE device, the traffic will be forwarded to either a single remote PE via a pseudo wire or multiple PE devices via multiple pseudo wires. The encapsulated MPLS packet will be label switched across the MPLS backbone by P devices. Upon receipt of the MPLS encapsulated packet, the remote PE device will use the VC label to select the correct VPN to which the user packet belongs. The PE will then, based on the destination MAC address, select the correct egress physical interface. The PE will lastly remove the VC label and forward the original user packet to the local CE device.

Once again, from the customer point of view, the VPLS is viewed as a multisite, Layer 2 switch. Multiple customer sites can be connected to the VPLS network and, because the PE devices learn MAC addresses, traffic from site 1 to site 2 can be
transmitted directly between those two sites via a pseudo wire (PW). Unicast traffic from site 1 to site 3 will also be transmitted directly via a PW for any discovered MAC addresses and not flooded to all sites.

As shown in Figure 11-1, a customer may have a stretched data center network consisting of a Layer 2 network hosting virtual machines across four sites (data centers). Virtual machines across the four sites could be configured in the same subnet. A virtual machine in site 1 will be able to communicate with a virtual machine in site 2 without any inter-VLAN routing. Unicast communication between a virtual machine in site 1 and a virtual machine in site 2 would also only traverse PE1 and PE2 and any intermediate core MPLS P devices. The unicast traffic between the two sites is not sent to site 3 or site 4. This is because of MAC address learning by the PE devices and the direct PW between PE1 and PE2. As an additional example, this also applies for communication from site 1 to site 3. Unicast traffic transmitted between virtual machines in sites 1 and 3 is contained to those sites and is not visible in site 2 or site 4.

This behavior is consistent with the behavior of a transparent, Layer 2 Ethernet switch as represented in Figure 11-2. Logically, each site is connected to a large Ethernet switch which performs MAC address learning, forwarding of unicast traffic to specific ports only, and flooding of broadcast, multicast, and unknown unicast traffic.

VPLS technology is implemented using two drafts (Martini and Kompella), but is limited to Ethernet interfaces. Currently, Martini mode is the most widely used version.
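VPLS configuration on Comware 7 is covered later in this chapter; as a preview, a Martini VSI bound to a service instance might be sketched as follows. The VSI name, peer loopback addresses, pw-id, interface, and VLAN numbers are illustrative assumptions, and exact syntax can vary between software releases:

system-view
[PE1] l2vpn enable
[PE1] vsi vpna
[PE1-vsi-vpna] pwsignaling ldp
[PE1-vsi-vpna-ldp] peer 192.3.3.3 pw-id 100
[PE1-vsi-vpna-ldp] peer 192.4.4.4 pw-id 100
[PE1-vsi-vpna-ldp] quit
[PE1-vsi-vpna] quit
[PE1] interface ten-gigabitethernet 1/0/1
[PE1-Ten-GigabitEthernet1/0/1] service-instance 10
[PE1-Ten-GigabitEthernet1/0/1-srv10] encapsulation s-vid 10
[PE1-Ten-GigabitEthernet1/0/1-srv10] xconnect vsi vpna

MAC addresses learned by the VSI could then be inspected with a command such as display l2vpn mac-address.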
Use Case: Interconnect for Multiple Data Centers One of the use cases for VPLS is using a MPLS backbone to interconnect multiple data centers at Layer 2, as shown in Figure 11-4.
Figure 11.4: Use Case: Interconnect for Multiple Data Centers
L2VPNs could be used in cases where only two data centers are connected, because L2VPNs provide point-to-point connections rather than point-to-multipoint connections. If three or more data centers need to be connected in the same Layer 2 VLAN, VPLS would be required. Technologies like VMware vMotion are only supported when the hypervisors are connected in the same Layer 2 VLAN (this changed in ESXi version 6). In this example, virtual machines can be moved between any of the three remote data centers across the MPLS backbone. The core MPLS backbone may be using Layer 3 connections, but from the point of view of the hypervisors, they are on the same VLAN. OSPF or other protocols used in the core determine the shortest path between data centers to provide an optimized path.
MPLS VPLS Terminology As shown in Figure 11-5, VPLS terminology is similar to L2VPNs, but with virtual switch extensions.
Figure 11.5: MPLS VPLS Terminology
Pseudo wire (PW): A pseudo wire is a bidirectional virtual connection between two PEs. An MPLS PW consists of two unidirectional MPLS LSPs in opposite directions. In a VPLS solution, multiple pseudo wires are configured to provide point-to-multipoint functionality. Virtual switch instance (VSI): A virtual switch instance is a virtual switch within a PE device operating like a traditional layer 2 switch. The VSI has physical local interfaces on the PE device as well as virtual Ethernet interfaces (PW connections to remote PEs). The VSI can dynamically learn MAC addresses from both the physical and virtual interfaces. There is no central control plane MAC learning or MAC
address synchronization between PE devices. Each PE device performs local MAC learning and acts independently of other PE devices. Broadcast, multicast, and unicast traffic are also processed independently by the local virtual switch.

VPLS instance: A VPLS instance is a grouping of local virtual switch instances (VSIs) on multiple PE devices into one logical switch. The VPLS instance is created per customer. The virtual switch backplane is not a crossbar or multi-bus backplane as in traditional physical switches, but is rather the MPLS core network. From a customer point of view, the VPLS solution mimics a large layer 2 switch, but in reality, a VPLS solution consisting of 4 PE devices actually consists of 4 virtual switch instances working together. Each VSI performs MAC learning independently of the other PE VSIs, but together they appear to be a single switch. Once again, the logical grouping of all the VSIs is what we call a VPLS instance. See Table 11-1 for VPLS terminology.

Table 11.1: VPLS Terminology

AC: Attachment circuit that connects the CE to the PE. It can use physical interfaces or virtual interfaces. Usually, all user packets on an AC, including Layer 2 and Layer 3 protocol messages, must be forwarded to the peer site without being changed.

CE: A customer edge device that is directly connected with the service provider network.

Encapsulation: Packets transmitted over a PW use the standard PW encapsulation formats and technologies: raw and tagged.

Forwarders: A forwarder functions as the VPLS forwarding table. Once a PE receives a packet from an AC, the forwarder selects a PW for forwarding the packet.

NPE: Network provider edge device that functions as the network core PE. An NPE resides at the edge of a VPLS network core domain and provides transparent VPLS transport services between core networks.

PE: A provider edge device connects one or more CEs to the service provider network. A PE implements VPN access by mapping and forwarding packets between private networks and public network tunnels. A PE can be a UPE or NPE in a hierarchical VPLS.

PW: A pseudo wire is a bidirectional virtual connection between two PEs. An MPLS PW consists of two unidirectional MPLS LSPs in opposite directions.

PW signaling: The PW signaling protocol is fundamental to VPLS. It is used for creating and maintaining PWs and for automatically discovering the VSI peer PE. Currently, there are two PW signaling protocols: LDP and BGP.

QinQ: 802.1Q in 802.1Q, a tunneling protocol based on 802.1Q. It offers a point-to-multipoint L2VPN service mechanism. With QinQ, the private network VLAN tags of packets are encapsulated into the public network VLAN tags, allowing packets to be transmitted with two layers of tags across the service provider network. This provides a simpler Layer 2 VPN tunneling service.

QoS: Quality of service (QoS) is implemented by mapping the preference information in the packet header to the QoS preference information transferred on the public network.

Route distinguisher (RD): An RD is added before a site ID to distinguish the sites that have the same site ID but reside in different VPNs. An RD and a site ID uniquely identify a VPN site.

Route target (RT): PEs use the BGP route target attribute (also called the "VPN target" attribute) to manage BGP L2VPN information advertisement. PEs support the following types of route target attributes:
• Export target attribute—When a PE sends L2VPN information (such as site ID, RD, and label block) to the peer PE in a BGP update message, it sets the route target attribute in the update message to the export target.
• Import target attribute—When a PE receives an update message from the peer PE, it checks the route target attribute in the update message. If the route target value matches an import target, the PE accepts the L2VPN information in the update message.
Route target attributes determine which PEs can receive L2VPN information, and from which PEs a PE can receive L2VPN information.

Tunnel: A tunnel can be an LSP tunnel or an MPLS TE tunnel. It carries one or more PWs over an IP/MPLS backbone. If a PW is carried on an LSP or MPLS TE tunnel, each packet on the PW carries an inner PW label, which ensures the packet is forwarded to the correct VSI. The outer label is the public LSP or MPLS TE tunnel label, which makes sure the packet is correctly forwarded to the remote PE.

UPE: User-facing provider edge device that functions as the user access convergence device.

VSI: A virtual switch instance provides Layer 2 switching services for a VPLS instance on a PE. A VSI acts as a virtual switch that has all the functions of a conventional Ethernet switch, including source MAC address learning, MAC address aging, and flooding. VPLS uses VSIs to forward Layer 2 packets in VPLS instances.

VPLS instance: A customer network might include multiple geographically dispersed sites (such as site 1 and site 3 in Figure 11-1). The service provider uses VPLS to connect all the sites to create a single Layer 2 VPN, which is referred to as a "VPLS instance." Sites in different VPLS instances cannot communicate with each other at Layer 2.
MPLS VPLS Control Protocols Overview A pseudo wire is a bidirectional virtual connection between two PEs. PEs use PWs to forward packets among VPN sites. PWs include static PWs, LDP PWs, BGP PWs, and BGP auto-discovery LDP PWs. The two dynamic VPLS signaling protocols are LDP and MP-BGP. LDP signaling is used for transmitting VC information and conforms to RFC 4762. In LDP signaling mode, PE peers need to be manually specified. MP-BGP signaling conforms to RFC 4761. MP-BGP can also be used as the signaling protocol for transmitting VC information, and it supports automatic topology discovery. PWs can be established on an MPLS tunnel (a common LSP) or a GRE tunnel. For a PW to be established, you need to complete the following: ■ Establish an MPLS tunnel between the local end and the remote peer PE. ■ Determine the address of the peer PE. If the peer PE is in the same VSI as the local PE, you can specify the address of the peer PE manually, or let the signaling protocol find the peer PE automatically. ■ Use either the LDP or BGP signaling protocol to assign multiplex distinguishing flags (that is, VC labels), advertise the assigned VC labels to the peer PE, establish unidirectional VCs, and further establish a PW. If a PW is established on an MPLS tunnel, a packet transported over the PW will contain two levels of labels. The inner label, called a VC label, identifies the VC to which the packet belongs so that the packet is forwarded to the correct CE, while the outer label, called the public network MPLS tunnel label, guarantees the correct transmission of the packet on the MPLS tunnel.
This study guide focuses on the LDP implementation method. The LDP implementation of VPLS uses extended LDP (remote LDP sessions) as the PW signaling protocol and is called Martini VPLS. This method is easy to implement when compared to MP-BGP. However, as LDP does not provide an automatic VPLS member discovery mechanism, each peer PE must be manually configured, and every PE needs to be reconfigured whenever a new PE joins.
Figure 11.6: MPLS VPLS Control Protocols
As shown in Figure 11-6, a PW is established between two PE using LDP as follows: 1. After being associated with a VSI, each PE uses LDP in downstream unsolicited (DU) mode to send a label mapping message to its peer PE (without solicitation). The message contains the PW ID FEC, the VC label bound with the PW ID FEC, and interface settings such as maximum transmission unit (MTU). 2. Upon receiving the LDP message, a PE determines whether it is associated with
the PW ID. If the association exists, the PE accepts the label mapping message and responds with its own label mapping message. 3. After a unidirectional VC is established in each direction, the PW is formed. A PW can be viewed as a virtual Ethernet interface of a VSI.
Implementation Methods For reference purposes, a summary of the various PW implementation methods is provided.
Static PW To create a static PW, specify the address of the remote PE, the incoming label, and the outgoing label.
LDP PW (Martini) To create an LDP PW, specify the address of the remote PE, and use LDP to advertise the PW-label binding to the remote PE. After the two PEs receive the PW-label binding from each other, they establish an LDP PW. The FEC type in the LDP message is PW ID FEC Element that includes the PW ID field (FEC 128). The PW ID identifies the PW bound to the PW label. Notes: ■ Two PEs establish a neighborhood with each other via the extended LDP. They directly send LDP messages over TCP connections, maintain a remote LDP session, and exchange VPN control information via the LDP session, including PW label allocation (the PW label is equivalent to a private network label in a L3 VPN). ■ A PE and a P still need to establish a common LDP neighborhood with each other so as to allow for public network MPLS label allocation. ■ A PE establishes a Virtual Switch Instance (VSI) for each VPN. Each VSI has an ID. ■ A pair of bi-directional Pseudo Wires (PWs) is established for each VPN between two PEs. A label is allocated to each PW via the extended LDP. This label is encapsulated in the transmitted packet so as to distinguish VPNs.
BGP PW To create a BGP PW, BGP advertises label block information to the remote PE. After
the two PEs receive label block information from each other, they use the label block information to calculate the incoming and outgoing labels and create the BGP PW. A PE also uses the received label block information to automatically find the remote PE. Notes: ■ Two PEs establish a peering relationship with each other via MP-BGP. They are configured with a VPLS family and exchange VC signaling via BGP. VPLS in this case is also called Kompella VPLS. ■ A PE establishes a VSI for each VPN. ■ Kompella VPLS is somewhat similar to a common MPLS L3 VPN. An RT and an RD also need to be configured for each VSI.
BGP Auto-discovery LDP PW To create a BGP auto-discovery LDP PW, a PE uses BGP to automatically find the remote PE, and uses LDP to advertise the PW-label binding to the remote PE. After the two PEs receive the PW-label binding from each other, they establish a BGP auto-discovery LDP PW. The information advertised by BGP includes the ID (for example, LSR ID) and VPLS ID of the advertising PE. The receiving PE compares the received VPLS ID with its own VPLS ID. If the two VPLS IDs are identical, the two PEs use LDP to establish a PW. If not, the PEs do not establish a PW. The FEC type in the LDP message is Generalized PW ID FEC Element (FEC 129), which contains the VPLS ID, Source Attachment Individual Identifier (SAII), and Target Attachment Individual Identifier (TAII). The SAII is the LSR ID of the advertising PE. The TAII identifies the remote PE and is advertised by the remote PE. VPLS ID+SAII+TAII uniquely identifies a PW in a VPLS instance.
MPLS VPLS Martini Architecture VPLS uses the same principles as L2VPNs, but supports point-to-multipoint connections using multiple L2VPN connections. When using the Martini implementation method, manual configuration of remote PE peers is required. VC labels are exchanged between peers using extended LDP. For label exchange and user packet forwarding to take place correctly, all PE members of the VPLS instance need to be configured in a full mesh. Each L2VPN configured to a remote PE peer acts as a virtual Ethernet interface in the
virtual switch instance on the local PE. In Figure 11-7, four PE devices are configured to provide VPLS functionality to four customer sites.
Figure 11.7: MPLS VPLS Martini Architecture
On each PE, the following exists: ■ 1 x virtual switch instance (VSI) which learns MAC addresses from connected interfaces.
■ 1 x local physical interface to CE devices (AC). This is added to the VSI using a service instance. ■ 3 x virtual Ethernet interfaces which are the pseudo wires (PWs) connecting the local PE to the 3 remote PE devices. These are L2VPN connections which are terminated on the virtual switch instance rather than on the physical interface to the CE device. This is in contrast to L2VPNs where the L2VPN was terminated on the physical interface. The result is that the VSI is a 4 port layer 2 Ethernet switch. MAC addresses need to be learnt in a similar way to a traditional Ethernet switch. Broadcast, multicast and unknown unicast traffic arriving from either the physical interface or virtual interfaces will also need to be processed and forwarded out of appropriate interfaces. These mechanisms are discussed next.
MPLS VPLS Martini Architecture - MAC Learning MAC Address Learning, Aging, and Withdrawal VPLS provides connectivity through source MAC learning. A PE device maintains a MAC address table for each VSI. As shown in Figure 11-8, a PE learns source MAC addresses in the following ways: ■ Learning the source MAC addresses of directly connected sites: If the source MAC address of a packet from a CE does not exist in the MAC address table, the PE learns the source MAC address on the AC connected to the CE. ■ Learning the source MAC addresses of remote sites connected through PWs: A VSI regards a PW as a logical Ethernet interface. If the source MAC address of a packet received from a PW does not exist in the MAC address table, the PE learns the source MAC address on the PW of the VSI.
Figure 11.8: MPLS VPLS Martini Architecture - MAC Learning
If no packet is received from a MAC address before the aging timer expires, VPLS deletes the MAC address to save MAC address table resources. When an AC or a PW goes down, the PE deletes MAC addresses on the AC or PW and sends an LDP address withdrawal message to notify all other PEs in the VPLS instance to delete those MAC addresses.
Unicast Traffic Forwarding and Flooding After a PE receives a unicast packet from an AC, the PE searches the MAC address table of the VSI bound to the AC to determine how to forward this packet. ■ If a match is found, the PE forwards the packet according to the matching entry. If the outgoing interface in the entry is a PW, the PE inserts the PW label to the packet, adds the public tunnel header to the packet, and then forwards the packet to the remote PE over the PW. If the outgoing interface in the entry is a local interface, the PE directly forwards the packet to the local interface.
■ If no match is found, the PE floods the packet to all other ACs and PWs in the VSI. After a PE receives a unicast packet from a PW, the PE searches the MAC address table of the VSI bound to the PW to determine how to forward this packet. ■ If a match is found, the PE forwards the packet through the egress interface in the matching entry. ■ If no match is found, the PE floods the packet to all ACs in the VSI.
Multicast and Broadcast Traffic Forwarding and Flooding After a PE receives a multicast or broadcast packet from an AC, the PE floods the packet to all other ACs and the PWs in the VSI bound to the AC. After a PE receives a multicast or broadcast packet from a PW, the PE floods the packet to all ACs in the VSI bound to the PW.
Example and Warning: In Figure 11-8, if PE1 receives broadcast traffic from the locally connected CE device, the virtual switch instance floods the broadcast out of both virtual interfaces to PE2 and PE3. A single packet is replicated twice because the VSI floods traffic in the same way a traditional layer 2 switch does. If the topology were extended to a scenario where 20 sites connect using VPLS, a single broadcast packet received by PE1 would be replicated 19 times. If a multicast video were being streamed by a CE at 1 Mbps, that would result in 19 Mbps on the uplink interface of PE1 to the core MPLS network. This principle applies to multicast, broadcast, and unknown unicast.
MPLS VPLS Martini Architecture - Loop Prevention Overview Even though the customer views the VPLS network as one large switch, internally multiple layer 2 switches are connected using layer 2 interfaces. Each PE is configured as an L2 switch and is connected in a full mesh to the other PE L2 switches. Therefore, loops need to be considered and prevented.
In the topology in Figure 11-9, four VSIs are configured with a full mesh of pseudo wires. Each VSI acts as a layer 2 switch. In a traditional layer 2 switched network, this would cause loops because a broadcast received by a traditional L2 switch is flooded out of all interfaces except the interface on which it arrived. In a looped topology, this may result in a broadcast storm. In traditional switched networks, protocols like spanning tree (STP) are used to prevent loops. However, enabling STP on a service provider network is not feasible because of scalability and slow convergence issues. Therefore, VPLS uses the following methods to prevent loops: ■ Full mesh - PEs and PWs are logically fully meshed. Each PE must create, for each VPLS forwarding instance, a tree to all the other PEs of the instance. ■ Split horizon - Each PE must support split horizon to avoid loops; that is, a PE cannot forward packets between PWs of the same VSI instance. In other words, a PE does not forward packets received from a PW to any other PW in the same VSI but only forwards those packets to ACs. Note If a network administrator does not configure a full mesh between PE devices, some CE sites will not be able to communicate with each other.
Figure 11.9: MPLS VPLS Martini Architecture - Loop Prevention
MPLS VPLS Design Considerations A few design considerations to keep in mind when configuring MPLS VPLS:
MPLS Backbone
VPLS may be a candidate solution for scenarios where multiple data centers need to be interconnected. However, VPLS requires that the links between the data centers be MPLS enabled, as VPLS uses a virtual circuit label to identify the VPN and an LSP to reach the destination PE device. If the customer has routed connections with a service provider MPLS network, VPLS also cannot be used, because VPLS requires L2 connections on the PE interfaces. Note Technically, it is possible to create GRE tunnels on certain CE devices and then enable MPLS on the GRE tunnels. This, however, complicates the configuration.
Layer 3 Routing Another consideration is that VPLS PE devices can transport frames received from the CE device, but the PE devices do not interact with the CE devices. For example, it is not possible to configure IP addresses on the PE customer-facing interfaces. The virtual switch configured on the PEs provides L2 switching only to remote sites and no L3 IP services or IP gateway functionality. If IP gateway functionality or VRRP is required inside the VPLS instance, another device has to provide this L3 gateway functionality. This is achieved either by adding a separate physical device to the PE, or by using a local back-to-back connection to another interface on the PE configured with the required IP addresses and routing. Even though the same device is used for the VSI and routing functionality, logically the VSI interfaces are configured for L2 only and see the L3 interface as an external device.
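As a sketch of the back-to-back option (interface numbers, the VLAN ID, and the IP address are illustrative assumptions, not taken from the original), the routed side of the loopback cable could look as follows. One PE port is bound to the VSI via a service instance, and a second port, cabled back-to-back to it, is placed in a VLAN whose VLAN interface carries the gateway address:

```
# Hypothetical sketch: the port cabled back-to-back to the VSI-attached
# port is added to VLAN 10, and Vlan-interface10 provides the L3 gateway.
system-view
[PE1] vlan 10
[PE1-vlan10] port ten-gigabitethernet 1/0/6
[PE1-vlan10] quit
[PE1] interface vlan-interface 10
[PE1-Vlan-interface10] ip address 192.168.10.1 24
```

From the VSI's point of view, Vlan-interface10 behaves exactly like an external router attached to one of its ports.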
MPLS VPLS Martini Configuration Steps VPLS relies on a working MPLS backbone. Therefore, before configuring VPLS, ensure that a core MPLS network is operational. The following is an overview of basic VPLS configuration steps (completed on PE devices):
1. Configure basic MPLS and LDP on the backbone.
2. Enable L2VPNs globally. This is the same configuration as used for L2VPNs.
3. Configure the virtual switch instances (create the virtual switch).
4. Configure remote LDP peers. An L2VPN is created for each LDP peer, which operates as a virtual interface on the VSI.
5. Configure the local physical interface via a service instance.
6. Bind the service instance to the VSI. At this point, the virtual switch has both virtual interfaces (step 4) and physical interfaces (step 5) associated with it.
7. Verify the configuration.
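The verification step might be approached with display commands such as the following. This is a hedged sketch: exact command names and output vary by Comware version, and `display vpls connection` in particular is an assumption based on Comware 5-style syntax.

```
# Check that LDP sessions to the remote PE peers are operational:
[PE1] display mpls ldp session
# Check the PW status of the VPLS instance (command name assumed):
[PE1] display vpls connection
```

If the PWs show as up, MAC learning and forwarding across the VSIs can then be tested end to end from the CE devices.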
Step 1: Configure Basic MPLS and LDP The first step is to configure basic MPLS and LDP. The configuration was explained in Chapter 9 and is thus not repeated here. Before VPLS is configured, it is assumed that the backbone infrastructure has been configured: ■ IP routing is configured with an IGP such as OSPF. ■ Loopback addresses are configured and are being advertised. ■ Basic MPLS and LDP are configured on all backbone-facing interfaces. ■ Any target PE loopback IP addresses are reachable via an LSP. Ensure, for example, that PE1 is able to reach PE2 via a unique LSP and that PE2 can reach PE1 using a different LSP.
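As a hedged recap of these prerequisites, a minimal backbone-side sketch might look as follows. The device name, interface numbers, and loopback address are illustrative assumptions; the command set follows the Comware style used elsewhere in this guide.

```
# Minimal MPLS/LDP prerequisite sketch on PE1 (addresses assumed):
system-view
[PE1] interface loopback 0
[PE1-LoopBack0] ip address 10.0.0.1 32
[PE1-LoopBack0] quit
[PE1] mpls lsr-id 10.0.0.1
[PE1] mpls ldp
[PE1-ldp] quit
# Enable MPLS and LDP on the backbone-facing interface:
[PE1] interface ten-gigabitethernet 1/0/2
[PE1-Ten-GigabitEthernet1/0/2] mpls enable
[PE1-Ten-GigabitEthernet1/0/2] mpls ldp enable
```

The loopback address would also be advertised by the IGP (for example OSPF) so that LSPs to it can be established.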
Step 2: Configure Global L2VPN Before configuring L2VPNs, ensure that the following has been completed: ■ The LSR ID for the PE has been configured using the mpls lsr-id command. ■ MPLS has been enabled on the backbone MPLS interfaces using the mpls enable command. Once basic MPLS has been configured and tested, configure L2VPNs. You must enable L2VPN before configuring other L2VPN settings. This command applies to MPLS L2VPNs, VPLS, and SPB. As shown in Figure 11-10, use the l2vpn enable command to enable L2VPN.
Use the undo l2vpn enable command to disable L2VPN. L2VPNs are disabled by default.
Figure 11.10: Step 2: Configure Global L2VPN
Syntax
l2vpn enable
undo l2vpn enable
To enable L2VPN, follow the steps in Table 11-2.
Table 11.2: Steps to enable L2VPN
Step 1. Enter system view. Command: system-view.
Step 2. Enable L2VPN. Command: l2vpn enable. Remarks: By default, L2VPN is disabled.
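Following the example convention used for the other commands in this chapter, the two steps above amount to:

```
system-view
[Sysname] l2vpn enable
```

This global setting is a prerequisite for all of the VSI and service instance configuration in the steps that follow.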
Step 3: Configure the Virtual Switch Instance Overview The next step is to create a virtual switch instance (VSI). Local physical ports and remote virtual ports will then be bound to the VSI to create the virtual switch. The local physical ports are bound to the VSI using a service instance. The virtual ports are represented by pseudo wires (PWs) to remote PE devices, which are configured on the same VPLS instance. The VSI is a local configuration object and has no member ports by default. Multiple control protocols are available for use with the VSI. This study guide focuses on the LDP Martini method. In Figure 11-11, a virtual switch instance is created for a data center interconnect (named dcic). The VSI name is locally significant.
Figure 11.11: Step 3: Configure the Virtual Switch Instance
Guidelines
Use the vsi command to create a VSI and enter VSI view. If the specified VSI already exists, you enter the VSI view directly. Use the undo vsi command to remove a VSI.
You can create multiple LDP, BGP, and static PWs for a VSI.
Syntax
vsi vsi-name
undo vsi vsi-name
vsi-name: Name of the VSI instance, a case-insensitive string of 1 to 31 characters. Hyphens (-) are not allowed.
Examples Create a VSI named vpls1 and enter VSI view. system-view [Sysname] vsi vpls1 [Sysname-vsi-vpls1]
Step 4: Configure the VSI Remote LDP Peers Overview In this step, the virtual interfaces are created by specifying the remote LDP PE peers and associated pseudo wire IDs. As mentioned previously, this study guide focuses on the LDP configuration method and hence the pwsignal is set to use LDP. Other valid methods to configure the pseudo wires include Kompella which uses MBGP and static configuration. The switch virtual ports are added to the virtual switch instance by specifying PW
IDs. The PW IDs could be the same for all peer connections within a VSI or use unique values. The PW ID values must match on both PE devices on either end of the L2VPN (peer command). The IP addresses specified in the peer commands are the loopback IP addresses of the remote PE devices and must be reachable via an MPLS LSP. In Figure 11-12, the peer command specifies the loopback IP address of PE2 and a pseudo wire ID of 1001. On PE2, a peer command would be configured with the IP address of PE1 and the same PW ID of 1001.
Figure 11.12: Step 4: Configure the VSI Remote LDP Peers
In this example, the same PW ID is used for the L2VPN to 10.0.0.3, but a different number could have been specified. On PE3, a peer command would also be configured with the IP address of PE1 and a matching PW ID of 1001.
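Putting the peer configuration together, the mirrored entries on PE1 and PE2 can be sketched as follows. The VSI name dcic and PW ID 1001 follow the figures; the loopback addresses 10.0.0.1 and 10.0.0.2 are assumptions, since only 10.0.0.3 appears in the text.

```
# On PE1 (loopback assumed 10.0.0.1):
[PE1] vsi dcic
[PE1-vsi-dcic] pwsignal ldp
[PE1-vsi-dcic-ldp] peer 10.0.0.2 pw-id 1001
[PE1-vsi-dcic-ldp] peer 10.0.0.3 pw-id 1001
# On PE2 (loopback assumed 10.0.0.2), the mirrored entry back to PE1:
[PE2] vsi dcic
[PE2-vsi-dcic] pwsignal ldp
[PE2-vsi-dcic-ldp] peer 10.0.0.1 pw-id 1001
```

The PW ID must match on both ends of each PW, while the VSI name only needs to be locally meaningful.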
pwsignal command
Use the pwsignal command to specify the PW signaling protocol for VPLS to use, and enter VSI LDP view (Martini mode) or VSI BGP view (Kompella mode).
pwsignal { bgp | ldp }
bgp: Specifies BGP signaling (Kompella mode).
ldp: Specifies LDP signaling (Martini mode).
Examples Specify that VPLS instance aaa uses the connection mode of Martini and enter VSI LDP view.
system-view [Sysname] vsi aaa [Sysname-vsi-aaa] pwsignal ldp [Sysname-vsi-aaa-ldp]
peer command (VSI LDP view)
Use the peer command to create a peer PE for a VPLS instance. Use the undo peer command to remove a peer PE. With the hub-spoke feature for a VPLS instance, you can specify the connection mode of the peer PE as hub or spoke.
peer ip-address [ { hub | spoke } | pw-class class-name | [ pw-id pw-id ] [ upe | backup-peer ip-address [ backup-pw-id pw-id ] ] ]
undo peer ip-address
ip-address: IP address of the remote VPLS peer PE.
hub: Specifies the peer PE as the hub.
spoke: Specifies the peer PE as a spoke. This is the default when the hub-spoke feature is enabled for the instance.
pw-class class-name: References a PW class template. class-name represents the template name, a case-insensitive string of 1 to 19 characters.
pw-id pw-id: ID of the PW to the VPLS peer PE, in the range of 1 to 4294967295.
upe: Specifies that the peer PE is a UPE in the H-VPLS model.
backup-peer ip-address: Specifies the IP address of the backup NPE. If you specify this parameter, you create a primary NPE and a backup NPE on the UPE.
backup-pw-id pw-id: Specifies the ID of the PW to the backup NPE. The pw-id argument is in the range of 1 to 4294967295, and the default is the VSI ID.
Examples Create a peer PE, which is of the UPE type, with the IP address of 4.4.4.4 and the PW ID of 200. system-view [Sysname] vsi aaa [Sysname-vsi-aaa] pwsignal ldp [Sysname-vsi-aaa-ldp] peer 4.4.4.4 pw-id 200 upe
Create a primary peer PE 1.1.1.1 and a backup peer PE 2.2.2.2, and set the PW ID to the primary peer to 300 and that to the backup peer to 400.
system-view
[Sysname] vsi aaa
[Sysname-vsi-aaa] pwsignal ldp
[Sysname-vsi-aaa-ldp] peer 1.1.1.1 pw-id 300 backup-peer 2.2.2.2 backup-pw-id 400
Step 5: Configure a Service Instance Overview In this step, traffic arriving on the local physical interface is matched to a service instance. The service instance is a locally significant number. This is configured on the PE device and applied to the customer-facing interface. In Figure 11-13, service-instance 10 is applied to interface Ten-GigabitEthernet 1/0/1. Instead of selecting all arriving traffic on the interface for transmission to other sites, a network administrator can specify that only certain customer traffic types are selected for transmission via VPLS. As with L2VPNs, in the simplest scenario, all arriving traffic is selected for transmission. Other options include specifying an individual VLAN only or tagged traffic only. The encapsulation command is used to specify matched traffic. In Figure 11-13, VLAN 10 tagged traffic is selected for transmission.
Figure 11.13: Step 5: Configure a Service Instance
The service instance ID is a locally significant number and can be different at each VPLS site. The instance ID also does not need to match the VLAN ID. Traffic from either physical interfaces or bridge aggregation interfaces can be matched to a VSI. If the PE device is configured as part of an IRF system, a bridge aggregation group can be configured on the customer-facing interfaces (the customer device could also be an IRF system). Traffic selection can then be based on the bridge aggregation rather than physical interfaces. This provides additional redundancy at the PE level.
Service-instance command
Use the service-instance command to create a service instance and enter service instance view. Use the undo service-instance command to delete an existing service instance.
By default, no service instance is created. The service instances created on different Layer 2 Ethernet interfaces can have the same service instance ID.
service-instance service-instance-id
undo service-instance service-instance-id
service-instance-id: Specifies the ID of the service instance, in the range of 1 to 4096.
Example On Layer 2 interface Gigabit Ethernet 3/0/3, create service instance 100 and enter its view. system-view [Sysname] interface GigabitEthernet 3/0/3 [Sysname-GigabitEthernet3/0/3] service-instance 100
[Sysname-GigabitEthernet3/0/3-srv100]
Encapsulation command (service instance view) Use the encapsulation command to configure a packet matching rule for the current service instance. Use the undo encapsulation command to remove the packet matching rule of the current service instance. By default, no packet matching rule is configured for a service instance. You can choose only one of the following match criteria for a service instance: ■ Match all incoming packets. ■ Match incoming packets with any VLAN ID or no VLAN ID. ■ Match incoming packets with a specific VLAN ID. The match criteria for different service instances configured on an interface must be different. You can create multiple service instances on a Layer 2 Ethernet interface, but only one service instance can use the default match criteria (encapsulation default) to match packets that do not match any other service instance. If only one service instance is configured on an interface and the service instance uses the default match criteria, all packets received on the interface match the default match criteria. This command cannot be executed multiple times for a service instance. Removing the match criteria for a service instance also removes the association between the service instance and the VSI.
Syntax
encapsulation default
encapsulation { tagged | untagged }
encapsulation s-vid vlan-id [ only-tagged ]
undo encapsulation
default: Specifies the default match criteria.
s-vid vlan-id: Matches packets with a specific outer VLAN ID. The vlan-id argument specifies a VLAN ID in the range of 1 to 4094.
only-tagged: Matches only tagged packets. If this keyword is not specified when the matching VLAN is the default VLAN, packets with the default VLAN ID or without any VLAN ID are all matched. If this keyword is specified when the matching VLAN is the default VLAN, only packets with the default VLAN ID are matched.
tagged: Matches tagged packets.
untagged: Matches untagged packets.
Example
Configure service instance 1 on Layer 2 Ethernet interface Ten-GigabitEthernet 1/0/1 to match packets that have an outer VLAN ID of 111.
system-view
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 1
[Sysname-Ten-GigabitEthernet1/0/1-srv1] encapsulation s-vid 111
Step 6: Bind the Service Instance to the VSI
Overview
Once a service instance has been defined, it can be bound to a virtual switch instance (VSI). The cross-connect object is not a global object within VPLS, but is configured within the service instance. In Figure 11-14, any traffic arriving from VLAN 10 will be cross-connected to the VSI dcic (data center interconnect). VLAN 10 traffic was previously associated with service instance 10 in Figure 11-13. This is how a new physical port is added to the virtual switch. The VSI will now learn MAC addresses as they arrive from VLAN 10 on interface Ten-GigabitEthernet 1/0/1.
Figure 11.14: Step 6: Bind the Service Instance to the VSI
A network administrator could configure another service instance with a different number, such as 11, on the same interface. The new service instance could be configured to match VLAN 11 traffic arriving on the same interface, but cross connect that traffic to a different VSI. In other words, all traffic arriving with service VLAN ID 11 would be bound to a different virtual switch instance than the VLAN 10 traffic.
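This scenario can be sketched with the same commands shown in the earlier examples. The second VSI name (dcic2) used here is hypothetical:

system-view
[Sysname] vsi dcic2
[Sysname-vsi-dcic2] quit
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 11
[Sysname-Ten-GigabitEthernet1/0/1-srv11] encapsulation s-vid 11
[Sysname-Ten-GigabitEthernet1/0/1-srv11] xconnect vsi dcic2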
Guidelines
Use the xconnect vsi command to bind a Layer 3 interface or a service instance to a VSI. Use the undo xconnect vsi command to remove the binding.
By default, a service instance is not bound to any VSI. After you bind a Layer 3 interface to a VSI, packets received from the interface are forwarded according to the MAC address table of the VSI. After you bind a service instance on a Layer 2 interface to a VSI, packets received from the interface and matching the service instance are forwarded according to the MAC address table of the VSI.
The access mode determines how the PE considers the VLAN tag in Ethernet frames received from the AC and how the PE forwards Ethernet frames to the AC.
■ VLAN access mode—Ethernet frames received from the AC must carry a VLAN tag in the Ethernet header. The PE considers the VLAN tag as a P-tag assigned by the service provider. Ethernet frames sent to the AC must also carry the P-tag.
■ Ethernet access mode—If Ethernet frames from the AC have a VLAN tag in the header, the PE considers it as a U-tag and ignores it. Ethernet frames sent to the AC do not carry the P-tag.
Before you configure this command for a service instance, make sure you have configured match criteria for the service instance by using the encapsulation command. The xconnect vsi command is available for service instances with the ID in the range of 1 to 4094.
Syntax
xconnect vsi vsi-name [ access-mode { ethernet | vlan } | { hub | spoke } ]
undo xconnect vsi vsi-name
vsi-name
Name of a VPLS instance, a case-sensitive string of 1 to 31 characters.
access-mode
Specifies the AC access mode. By default, the access mode is VLAN.
ethernet
Specifies the access mode as Ethernet.
vlan
Specifies the access mode as VLAN.
Example
Configure service instance 200 on Layer 2 interface Ten-GigabitEthernet 1/0/1 to match packets with an outer VLAN tag of 200, and bind the service instance to the VSI vpn1.
system-view
[Sysname] vsi vpn1
[Sysname-vsi-vpn1] quit
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] service-instance 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] encapsulation s-vid 200
[Sysname-Ten-GigabitEthernet1/0/1-srv200] xconnect vsi vpn1
Step 7: Verify
Overview
Display commands can be used to verify the VPLS configuration. These commands are used on the PE devices, not on the P or CE devices. The display l2vpn pw verbose command is used to verify the L2VPN status of the virtual switch instance (VSI). In Figure 11-15, the VSI used is dcic (data center interconnect), and two L2VPN connections are configured: one to PE 10.0.0.2 and another to PE 10.0.0.3.
Figure 11.15: Step 7: Verify
Both Link ID 8 (Peer 10.0.0.2) and Link ID 9 (Peer 10.0.0.3) currently have a status of up (State: Up). LDP detects the loss of a connection to a remote PE peer by using LDP keepalives. The state of the PW changes to Down for a lost connection. All MAC addresses that have been learned from that remote peer are flushed, because the virtual interface is down.
Syntax
Use the display l2vpn pw command to display L2VPN PW information. If neither ldp nor static is specified, this command displays both LDP PW and static PW information.
display l2vpn pw [ vsi vsi-name ] [ protocol { bgp | ldp | static } ] [ verbose ]
vsi vsi-name
Displays L2VPN PW information for the VSI specified by its name, a case-sensitive string of 1 to 31 characters. If no VSI is specified, the command displays L2VPN PW information for all VSIs.
protocol
Specifies a signaling protocol. If no protocol is specified, this command displays PWs created by all protocols.
bgp
Displays BGP PW information.
ldp
Displays LDP PW information, including PWs for FEC 128 (LDP PWs) and FEC 129 (BGP auto-discovery LDP PWs).
static
Displays static PW information.
verbose
Displays detailed information. Without this keyword, the command displays brief information.
display l2vpn pw
Display brief information about all PWs.
See Table 11-3 for the output description.
Table 11.3: display l2vpn pw output description
Field
Description
PW ID/Rmt Site
This field displays:
• The PW ID for an LDP PW (FEC 128) or a static PW
• "-" for a BGP auto-discovery LDP PW (FEC 129)
• The remote site ID for a BGP PW
Proto
Protocol used to establish the PW: LDP, Static, or BGP.
Flag
PW flag:
• M—Primary PW.
• B—Backup PW.
• H—The PW is the hub link in the VPLS hub-spoke network. This value is not supported in the current software version and is reserved for future support.
• S—The PW is a spoke link in the VPLS hub-spoke network. This value is not supported in the current software version and is reserved for future support.
• N—Split horizon forwarding is disabled.
Link ID
Link ID of the PW in the VSI.
State
PW state: Up, Down, Blocked, or BFD Defect. Blocked indicates that the PW is blocked. BFD Defect indicates BFD has detected a defect on the PW.
display l2vpn pw verbose
Display detailed information about all L2VPN PWs.
display l2vpn pw verbose
VSI Name: aaa
  Peer: 2.2.2.9          Remote Site: 2
    Signaling Protocol : BGP
    Link ID            : 9          PW State : Up
    In Label           : 131120     Out Label: 131119
    MTU                : 1500
    PW Attributes      : Main
    VCCV CC            : -
    VCCV BFD           : -
    Tunnel Group ID    : 0x1800000960000000
    Tunnel NHLFE IDs   : 138
  Peer: 3.3.3.9          Remote Site: 3
    Signaling Protocol : BGP
    Link ID            : 10         PW State : Up
    In Label           : 131121     Out Label: 131181
    MTU                : 1500
    PW Attributes      : Main
    VCCV CC            : -
    VCCV BFD           : -
    Tunnel Group ID    : 0x1800000160000001
    Tunnel NHLFE IDs   : 130
VSI Name: bbb
  Peer: 2.2.2.9          VPLS ID: 100:100
    Signaling Protocol : LDP
    Link ID            : 8          PW State : Up
    In Label           : 131153     Out Label: 131153
    MTU                : 1500
    PW Attributes      : Main
    VCCV CC            : -
    VCCV BFD           : -
    Tunnel Group ID    : 0x1800000960000000
    Tunnel NHLFE IDs   : 138
See Table 11-4 for the output description.
Table 11.4: display l2vpn pw verbose output description
Field
Description
Peer
IP address of the peer PE to which the PW is destined.
Link ID
Link ID of the PW in the VSI.
PW State
PW state: Up, Down, Blocked, or BFD Defect. Blocked indicates that the PW is blocked. BFD Defect indicates BFD has detected a defect on the PW.
Wait to Restore Time
Wait time to switch traffic from the backup PW to the primary PW when the primary PW recovers, in seconds. If the switchover is disabled, this field displays Infinite. This field is available when both primary and backup PWs exist, and is displayed only for the primary PW.
Remaining Time
Remaining wait time for traffic switchover, in seconds. This field is displayed after the switchover wait timer is started.
MTU
Negotiated MTU of the PW.
PW Attributes
PW attribute:
• Main—The PW is the primary PW.
• Backup—The PW is the backup PW.
• Hub link—The PW is the hub link in the VPLS hub-spoke network. This value is not supported in the current software version and is reserved for future support.
• Spoke link—The PW is a spoke link in the VPLS hub-spoke network. This value is not supported in the current software version and is reserved for future support.
• No-split-horizon—Split horizon forwarding is disabled.
VCCV CC
VCCV CC type:
• Control-Word—Control word.
• Router-Alert—MPLS router alert label.
• TTL—TTL timeout.
VCCV BFD
VCCV BFD type:
• Fault Detection with BFD—BFD packets use IP/UDP encapsulation (with IP/UDP headers).
• Fault Detection with Raw-BFD—BFD packets use PW-ACH encapsulation (without IP/UDP headers).
Tunnel Group ID
ID of the tunnel group for the PW.
Tunnel NHLFE IDs
NHLFE ID of the public tunnel that carries the PW. If equal-cost tunnels are available, this field displays multiple NHLFE IDs. If no tunnel is available, this field displays None.
VPLS ID
ID of the VPLS instance.
Remote Site
ID of the remote site.
Step 7: Verify (continued)
Use display l2vpn mac-address to display MAC address table information for VSIs, as shown in Figure 11-16. This displays MAC addresses that the PE has learned from local interfaces and from remote sites via virtual interfaces.
Figure 11.16: Step 7: Verify (continued)
Syntax
display l2vpn mac-address [ vsi vsi-name ] [ dynamic ] [ count ]
vsi vsi-name
Displays MAC address table information for the VSI specified by its name, a case-sensitive string of 1 to 31 characters. If no VSI is specified, the command displays MAC address table information for all VSIs.
dynamic
Displays dynamically generated MAC address entries. If this keyword is not specified, the command displays all types of MAC address entries. Currently, the device supports only dynamic MAC address entries.
count
Displays the number of the MAC address entries. If you do not specify this keyword, the command displays detailed information about the MAC address entries.
Example
Display MAC address table information for all VSIs.
display l2vpn mac-address
MAC Address      State     VSI Name     Link ID   Aging Time
0000-0000-000a   dynamic   vpn1         1         Aging
0000-0000-0009   dynamic   vpn1         2         Aging
--- 2 MAC address(es) found ---
Display the total number of MAC address entries of all VSIs.
display l2vpn mac-address count
2 MAC address(es) found
See Table 11-5 for the output description.
Table 11.5: display l2vpn mac-address output description
Field
Description
State
MAC address type. Currently, the MAC address type can only be dynamic, which indicates that the MAC address is dynamically learned.
Link ID
Outgoing link ID of the MAC address entry. It is the link ID of the AC or PW in the VSI.
Aging Time
Indicates whether the MAC address entry will be aged.
XX MAC address(es) found
Total number of MAC address entries of the VSI.
Summary
In this chapter you learned about VPLS, which is an extension of L2VPNs. VPLS supports point-to-multipoint connections, whereas L2VPNs only support point-to-point connections. A VPLS network is perceived as a large Layer 2 switch by CE devices. Multiple implementation methods were discussed, including static configuration and the use of dynamic protocols such as LDP and MBGP. The chapter focused on the Martini method using extended LDP. You learned how MAC addresses are learned, aged, and withdrawn by the virtual switch instances (VSIs). Flooding and forwarding of unicast, broadcast, and multicast traffic was discussed. You learned about VPLS loop prevention mechanisms using a full mesh of PWs and split horizon. The configuration and verification of VPLS were also discussed.
Learning Check
Answer each of the questions below.
1. Which mechanism prevents loops in a VPLS core network?
a. STP
b. Split horizon
c. TTL
d. Partial mesh
e. OSPF
2. Which of the following is a bidirectional virtual connection between two PEs?
a. LSP
b. VSI
c. PW
d. NPE
e. VPLS instance
3. Which of the following is not permitted in VPLS?
a. Broadcast packet received on PW 1 transmitted out of AC.
b. Broadcast packet received on AC transmitted out of PW 1.
c. Broadcast packet received on AC transmitted out of PW 1 and PW 2.
d. Broadcast packet received on PW 1 transmitted out of PW 2.
Learning Check Answers
1. b
2. c
3. d
12 Data Center Network Design
EXAM OBJECTIVES
In this chapter, you learn to:
✓ Describe requirements for a data center network design.
✓ Describe different data center deployment models.
✓ Understand various data center technologies and their impact on a design.
✓ Describe the options for data center layers.
✓ Understand the HP FlexFabric portfolio.
INTRODUCTION
This chapter provides an overview of data center design considerations. This includes relating various design philosophies and objectives to specific technologies for Layer 2 connectivity inside a data center, data center interconnects, Layer 3 services, storage protocols, and overlay technologies.
Key Drivers for a New Data Center Infrastructure
When looking at data center topologies, there are several key business and technology drivers that affect data center design choices. Large-scale data center consolidation is one such driver. As hosted solutions gain popularity, data centers continue to grow in size, demanding new levels of performance and scalability. Several smaller data centers are being consolidated into larger facilities to improve economies of scale.
Multiple organizations are hosting business-critical applications in these large data centers, and the expectation is for extremely reliable, continuously operating services. Due to high service volumes, and pressures to minimize space, power, and cooling requirements, many data centers will rely on blade servers to optimize server deployments. Server virtualization is a key element in all data center deployments. The ability to host multiple virtual machines inside a single physical blade server further reduces power, space, and cooling costs. These technologies can drastically increase the flexibility and ease of initial deployments. Migrating virtual machines to new physical servers is becoming much easier with tools like VMware's vMotion and Microsoft's Live Migration.
New application deployment and delivery models are also driving data center design. Where formerly one server may have performed all functions for an application, there may now be a separate front-end server, a business logic server, and a back-end database server. In the single-server model, a client made a request, and a single server performed all functions to service that request and then generate responses. Now, a similar client request might be serviced by three servers, which must all communicate amongst themselves before responding to that client request. These new models create bandwidth-intensive traffic flows, and require high-performance server-to-server communications. Meanwhile, Virtual Desktop Infrastructures (VDI) can concentrate many client environments into a few hosted systems. This further increases the need for scalable, reliable, high-performance data center infrastructure.
Data Center Deployment Models
Figure 12-1 provides an overview of four different data center deployment models. Each model serves unique business requirements, with different characteristics and objectives. This will lead to different technical priorities, which can ultimately be met with a unique set of protocols, design choices, and implementation methods.
Figure 12-1: Data Center Deployment Models
These methods enable new cloud-based delivery models that drive a whole new set of technology requirements across servers, storage, and networking domains. These increasingly popular models let enterprises provision applications more flexibly within traditional internal infrastructures, and enable hosted application and service providers to build entire businesses based on delivering services via a public cloud model. Given the range of use cases and options, customers often deploy a combination of architectures to address varied requirements and to optimize operations.
HP FlexFabric Use Cases in the Data Center
Figure 12-2 provides a high-level overview of how HP FlexFabric offerings can be used in modern data center deployments.
Figure 12-2: HP FlexFabric Use Cases in the Data Center
Traditional 2-tier networks use the classic, hierarchical tree paradigm used for several decades. In this deployment, access switches like the 59xx-series are single- or dual-homed to one or more core switches, such as the FlexFabric 12x00-series.
In a leaf/spine deployment, so-called leaf switches, such as the HP 59xx-series, form the access layer. This access layer is fully meshed to a fabric of spine switches, such as the FlexFabric 5930-series. Nearly everything is one hop away from everything else. More advanced protocols replace STP in this design, so none of the links are placed in a blocking state.
Overlay technologies create a virtual network that is built on top of another network. The physical network could be a traditional 2-tier design, or a newer leaf/spine design. HP is developing Software Defined Networking (SDN), which combines many separately managed, physical network components into a single control plane.
VXLAN is an overlay technology that allows multiple clients to share a single data center infrastructure. Layer 2 frames are encapsulated into an IP datagram with a 24-bit VXLAN ID, and routed over any Layer 3 infrastructure. Each of over 16 million clients can have up to 4094 VLANs.
HP's advanced, 1-tier blade systems greatly simplify a virtualized network deployment by incorporating traditional Top-of-Rack (ToR) switches inside the blade enclosure. This could be in the form of a Virtual Connect module, or a 6125XLG, which can be connected directly to the core, or to End-of-Row (EoR) chassis devices.
These models are based on the HP Comware operating system, which provides zero-cost licensing, IRF for redundancy, IMC management integration, and the capability to be integrated into a software-defined fabric.
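As a rough sketch of how the VXLAN overlay described above might be configured on a Comware 7 switch, a VSI is created, a VXLAN ID is assigned to it, and a VXLAN-mode tunnel is associated with the VXLAN. The names, addresses, and VXLAN ID here are hypothetical, and exact commands vary by platform and software version:

system-view
[Sysname] interface tunnel 1 mode vxlan
[Sysname-Tunnel1] source 10.0.0.1
[Sysname-Tunnel1] destination 10.0.0.2
[Sysname-Tunnel1] quit
[Sysname] vsi tenant-a
[Sysname-vsi-tenant-a] vxlan 10010
[Sysname-vsi-tenant-a-vxlan-10010] tunnel 1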
Requirements Overview
Typical data center design requirements include the following:
■ Virtualization improves efficiency, agility, and resiliency of data center network operations.
■ Multi-tenant support may not be a concern for private clouds, but it is the main reason many data centers exist: the core business is to host the infrastructure for multiple organizations.
■ Multiple data centers are often required, either to add additional capacity or for resiliency purposes. If a natural disaster strikes a location on one continent, a redundant data center on another continent may be unfazed.
■ WAN connections are required to connect each corporate site or tenant to data center facilities.
■ Storage services are often centrally located for efficient server access and streamlined communications and manageability.
■ Compliance with stringent, generally accepted security best practices and procedures is vital for any type of networking environment. This is especially true for multi-tenant data centers.
Impact of Requirements
Each requirement impacts the design and deployment choices you make. These are summarized below:
■ Virtualization increases device density, which drives an increased need for bandwidth. These virtual machines should be able to be hosted on any physical device. This means that the associated Layer 2 domain or VLAN must be available to any physical server. These VLANs may need to span physically separate data centers, especially to enable certain disaster recovery scenarios.
■ Multi-tenant support requires a single infrastructure to support and isolate multiple tenants. Hardware devices, software, and protocols must be deployed that can meet this requirement.
■ When multiple data centers are required for scalability and redundancy, they need to be interconnected with links that support Layer 2 connectivity, while localizing the impact of faults. For example, broadcast storms and STP loop errors at one site should not affect operations at other sites.
■ WAN connections can be deployed using various technologies. The options used can depend on what services are required, and whether the customer or provider will manage the link. Perhaps one data center tenant may prefer a provider to provision and manage some type of L2VPN service, while another tenant may prefer to use some other technology.
■ Storage services are often deployed in data centers using some converged technology. Both iSCSI and FCoE require specific service handling over Ethernet systems. iSCSI requires special QoS configurations on the Ethernet fabric, while FCoE requires specific lossless Ethernet services to be deployed. Of course, the devices you deploy should be part of a validated design, and support the required QoS or lossless Ethernet features.
■ Security at the network edge is challenged by virtualization. Specific technologies may be required to maintain insight into virtual machine communication. This is important for compliance monitoring and reporting.
Overview
This section will discuss data center technologies, including the following:
■ Customer service models
■ Layer 2
■ Layer 3
■ Multi-tenant
■ Data center interconnect
■ Data center WAN connectivity
■ Storage
■ Security
Data Center Customer Service Models
Figure 12-3 provides a basic overview of three customer service models. A single-tenant model can be implemented as a traditional enterprise solution: a single data center, using a single set of 4094 VLANs. A Layer 2 data center interconnect may be required to access a single backup data center. A single Layer 3 routing service is normally sufficient for a single enterprise.
Figure 12-3: Data Center Customer Service Models
When a data center need only support a limited number of tenants, it might be possible to use a single data center VLAN space, allocating unique VLANs to each tenant. Another alternative is to allocate a separate set of 4094 VLANs to each of the tenants. In this scenario, each tenant's Layer 2 DC interconnect should be isolated. A separate Layer 3 routing service should be provided per tenant, either by the data center's own core devices, or by separate data center routers.
For large multi-tenant data centers, a single VLAN space might suffice, if each tenant only needs a few VLANs. A more flexible and scalable alternative is to provide each tenant with their own VLAN space of 4094 isolated VLANs. We will also need isolated Layer 2 DC interconnects and Layer 3 services for each tenant. This could be provided by a physical DC routing solution, or by deploying some type of network function virtualization.
Datacenter Customer Service Solutions
Figure 12-4 shows specific technologies that can be used for certain deployment models. These technologies will be reviewed in the sections that follow.
Figure 12-4: Datacenter Customer Service Solutions
Intra Datacenter Layer 2 Services Overview
There are many options to provide Layer 2 connectivity inside the data center. These include the following:
■ VLANs
■ VLANs with MDC
■ QinQ
■ VLANs with TRILL
■ TRILL with MDC
■ SPBM
■ SPBM with MDC
The deployment model to be implemented is a key factor in selecting which Layer 2 option to use. This chapter will review these technologies and relate their suitability to different deployment models.
Layer 2: VLANs
Classic VLANs use a traditional Layer 2 isolation model, where there is a separate MAC address table per VLAN. All HP data center switches provide support for the standard 4094 concurrent VLANs. This can be sufficient for small to medium sized data centers, especially if they are privately owned, for a single tenant. The technology is simple and well understood by the typical network administrator. IRF is often deployed with traditional VLANs to improve redundancy at the access and core layers. Link-aggregation is used between IRF systems for improved bandwidth utilization and resiliency.
For improved scalability, VLANs can be deployed along with Multi-tenant Device Context (MDC), as shown in Figure 12-5. This is suitable for a limited number of tenants, since current-generation devices support a maximum of nine MDCs. Since one of these is used for management, a maximum of eight tenants can currently be supported. Since each MDC has exclusive access to its own set of advanced ASICs in hardware, each tenant has a dedicated set of 4094 VLANs.
Figure 12-5: Layer 2: VLANs
Again, this solution is typically deployed in conjunction with IRF and link-aggregation for improved scalability and redundancy.
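As a simplified sketch of this combination on a Comware 7 switch, a two-member IRF fabric is formed and a dynamic link-aggregation group is configured toward a neighboring switch. The member IDs, priority, and port numbers are hypothetical, and steps such as reboots and the second member's configuration are omitted:

system-view
[Sysname] irf member 1 priority 32
[Sysname] irf-port 1/1
[Sysname-irf-port1/1] port group interface ten-gigabitethernet 1/0/49
[Sysname-irf-port1/1] quit
[Sysname] irf-port-configuration active
[Sysname] interface bridge-aggregation 1
[Sysname-Bridge-Aggregation1] link-aggregation mode dynamic
[Sysname-Bridge-Aggregation1] quit
[Sysname] interface ten-gigabitethernet 1/0/1
[Sysname-Ten-GigabitEthernet1/0/1] port link-aggregation group 1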
Layer 2: QinQ
QinQ provides a more scalable technology for multi-tenant support. Each customer has their own set of 4094 VLANs, using standard 802.1Q tagging. An additional, outer 802.1Q tag is added to the original tenant frame, which is used to move frames through a common data center infrastructure. This outer tag is a standard 802.1Q tag, and so also supports 4094 unique backbone Service VLANs (S-VLANs), one for each tenant. Each S-VLAN can host an isolated set of 4094 customer VLANs (C-VLANs). For a data center that intends to host fewer than 4094 tenants, QinQ is a viable option that can be implemented with relative ease. All HP data center switches support QinQ with 4094 concurrent VLANs.
Typically, QinQ is deployed in combination with IRF at the server access and core or aggregation layers. Link-aggregation is used between the IRFs to further improve scalability and redundancy.
Although QinQ provides complete isolation between customers, shared data center services can still be offered through the use of selective QinQ. This allows some provider VLANs to be visible in several tenants' VLAN spaces. For example, C-VLAN 4001 can be assigned a unique, outer S-VLAN tag. This S-VLAN tag could be associated with a tenant in order to provide some type of backup service that is deployed on the provider's VLAN 4001. All HP data center switches support selective QinQ.
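A minimal sketch of port-based QinQ on a Comware switch might look like the following, where everything arriving on a tenant-facing port is nested inside S-VLAN 100. The interface and S-VLAN values are hypothetical, and selective QinQ syntax varies by platform and software version:

system-view
[Sysname] vlan 100
[Sysname-vlan100] quit
[Sysname] interface gigabitethernet 1/0/10
[Sysname-GigabitEthernet1/0/10] port access vlan 100
[Sysname-GigabitEthernet1/0/10] qinq enable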
Layer 2: TRILL
TRILL currently supports a single set of 4094 VLANs, and so does not improve scalability over traditional VLANs in this regard. However, TRILL does provide significant improvements in frame delivery methods. IRF and link-aggregation provide a redundant active/active topology, but the use of STP means that most traffic traverses the core devices. This is not optimal, especially for communications between two access-layer switches. TRILL uses a shortest path unicast delivery method that optimizes frame paths.
TRILL is also more efficient in its method of handling traffic over multiple equal-cost paths. This multi-device, multi-path load-balancing is based on an IP hashing algorithm. For example, one hundred servers near one edge of the data center are in VLAN 10. They communicate with one hundred servers at the other edge of the data center, also in VLAN 10. This Layer 2 traffic can traverse multiple paths inside the TRILL fabric, based on source/destination IP address pairs.
TRILL is often best when combined with the MDC feature, especially considering that TRILL devices only provide Layer 2 services. Any Layer 3 service must be connected through a TRILL access port. With MDC, one context can be used to host TRILL fabric Layer 2 services, while another can be used for the Layer 3 IP service. The alternative is to use separate devices for each service function.
Layer 2: SPBM
SPBM uses an I-SID number to uniquely identify up to 16 million tenants, each with its own set of 4094 VLANs. Like TRILL, SPBM uses an efficient, shortest path unicast delivery method. Unlike TRILL's IP hash-based load sharing method, SPBM uses a deterministic, configuration-based multi-path technique.
SPBM only provides Layer 2 services. Layer 3 services must be connected through an SPBM VSI service instance. One option is to use one MDC to host the SPBM fabric, while another MDC is used for Layer 3 services. A back-to-back cable can be used to connect the line cards of the SPBM fabric to the line cards of the Layer 3 MDC.
Data Center Layer 3 Services Overview
Several technologies are available to provide Layer 3 IP services, including the following:
■ IRF
■ VRRP
■ MCE
■ MDC
■ QinQ sub-interfaces
■ NFV
■ Path optimization
Layer 3: IRF
IRF, as shown in Figure 12-6, combines multiple physical devices into a single virtual device, managed as a single unit. IRF can leverage the data plane of both devices simultaneously to provide an active/active redundancy model. This means there is no slave fail-over time. The only downtime is due to link failure. The actual service restore time will vary between 10 and 100 milliseconds, depending on IRF hardware.
Figure 12-6: Layer 3: IRF
IRF provides an active/standby control plane, so local and static routes have a hitless failover. This is because these routes need not be relearned by the standby unit. For dynamically learned routes, a graceful restart feature maintains peer relationships during fail-over. This feature is supported for most routing protocols, such as OSPF, BGP, and IS-IS. It is also supported for TRILL and SPBM.
Some of the routing protocols support non-stop routing, in which full control-plane synchronization is maintained. With graceful restart, peer relationships are maintained while the new master builds a link-state database. Non-stop routing keeps the link-state database synchronized in real time. Therefore, the graceful restart feature is not required.
IRF is the recommended Layer 3 model inside the data center.
Layer 3: VRRP
VRRP is used between two or more independent Layer 3 switches or routers. Unlike IRF, the devices have independent control planes, and so continue to be individually configured as separate devices. VRRP uses an active/standby model. The master device acts as the active forwarder, while the backup device lies dormant, waiting to become an active forwarder should the master fail. By default, this failover takes about 3 seconds, as opposed to 100 ms or less for IRF.
VRRP is the recommended Layer 3 redundancy model to connect data centers, and is available on all HP data center routers and switches. The isolated control planes used by VRRP can be an advantage when connecting data centers.
VRRP can be tuned to improve overall performance. One such method is to implement Bi-Directional Forwarding Detection (BFD). This allows the standby unit to monitor the status of its BFD session with the master, and immediately transition to a forwarding state. With BFD tuning, the 3-second failover delay can be reduced to somewhere between 100 ms and 2 seconds, depending on hardware. When VRRP is configured to use BFD, typical failover time is less than 500 ms.
Another tuning feature is VRRP hello blocking. When hello packets are blocked between VRRP devices, both units will assume the master role, resulting in two masters. This creates a more desirable active/active router topology. This would not be an acceptable condition inside a data center, since both routers would use the same VRRP virtual MAC address. This would create a conflict and cause a MAC-flapping condition for attached data center switches. However, for connectivity between data centers, this is not an issue. This is because we can ensure that the conflicting MAC address will not be learned on the link to the remote data center. Data center interconnect technologies such as HP EVI will block them automatically. If some other interconnect is used, then Ethernet ACLs can be deployed.
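A minimal VRRP sketch on a Comware switch might look like the following. The VLAN, addresses, and priority are hypothetical, and the BFD tracking commands mentioned above (which vary by platform and release) are omitted:

system-view
[Sysname] interface vlan-interface 10
[Sysname-Vlan-interface10] ip address 10.1.10.2 24
[Sysname-Vlan-interface10] vrrp vrid 1 virtual-ip 10.1.10.1
[Sysname-Vlan-interface10] vrrp vrid 1 priority 120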
Layer 3: Multi-Customer CE (MCE) In single-tenant environments, traditional Layer 3 routing might be sufficient. In such a deployment, a single routing table is used. All Layer 3 VLAN interfaces are serviced by this single routing table. This could possibly be sufficient for multi-tenant environments, since communication between different customers' Layer 3 interfaces can be secured with ACLs. However, great diligence is required with this deployment. A simple ACL configuration error could expose one tenant's traffic to others. Also, with dynamic routing exchange, it is possible for one customer to advertise a route that conflicts with another customer's. For these reasons, isolated routing tables are preferred in a multi-tenant environment. The MCE feature, also known as VRF-Lite, can be applied to Comware devices to create VPN instances, as shown in Figure 12-7. This provides isolated routing tables for tenants, perhaps populated by different routing protocols.
Figure 12-7: Layer 3: Multi-Customer CE (MCE)
Data center administrators can maintain control over all tenant routing functions. Each MCE routing instance can have routing limits applied, and all MCE instances can be managed by a single administrator or team. The MCE feature is supported by most HP Data Center switches.
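A minimal Comware sketch of an MCE (VRF-Lite) tenant instance follows. The tenant name, route distinguisher, VLAN, and addressing are all hypothetical:

```text
# Create an isolated routing table (VPN instance) for a tenant
ip vpn-instance TenantA
 route-distinguisher 65000:1
quit

# Bind the tenant's Layer 3 VLAN interface to the instance.
# Note: binding removes any existing IP address, so bind first.
interface Vlan-interface100
 ip binding vpn-instance TenantA
 ip address 192.168.100.1 255.255.255.0
```

Routes learned on this interface, statically or via a routing protocol running in the instance, populate only TenantA's table, keeping tenants isolated without ACLs.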
Layer 3: Multi-tenant Device Context (MDC) MDC is a technology that can partition a physical device or IRF fabric into multiple logical switches called MDCs. For a traditional customer deployment, a single admin MDC can host multiple Layer 3 MCE instances. Hundreds of customers can be supported, each with their own IP routing table. All these IP VPN instances share the same underlying hardware ASICs, with a single management interface. If the design requires more isolation, multiple MDCs can be used. Each MDC has its own set of Layer 3 MCE VPN instances, bound to its own set of dedicated hardware resources. Each MDC can be separately managed by a different administrative group, as shown in Figure 12-8. This requires dedicated line cards or interfaces on the core devices.
Figure 12-8: Layer 3: Multi-tenant Device Context (MDC)
Up to eight customer MDCs can be created, depending on the chassis-based switch model deployed.
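An illustrative Comware sketch of creating a customer MDC from the default (admin) context is shown below. The context name, ID, and port range are hypothetical, and which ports can be allocated depends on the line card:

```text
# From the default MDC, define a customer context
mdc TenantA id 2
 # Dedicate physical ports to this context (hypothetical slot/ports)
 allocate interface Ten-GigabitEthernet1/0/1 to Ten-GigabitEthernet1/0/8
 mdc start
quit

# Attach to the new context; it is then configured like a separate switch
switchto mdc TenantA
```

Once started, the MDC has its own configuration file, its own administrators, and its own set of MCE VPN instances, isolated from the other contexts.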
Layer 3: Traditional Sub-interfaces With a traditional, single-tenant routing design, a physical routed interface can be created on a switch by disabling the port's Layer 2 bridge function. The IP address is configured directly on the interface, so there is no need to create a virtual VLAN interface on the switch. Additionally, this physical interface can be logically divided into sub-interfaces, with an IP address assigned to each one. In this scenario, ingress packets are serviced by the appropriate Layer 3 sub-interface, based on traditional 802.1q VLAN tagging. These VLAN tags are locally significant. For example, the sub-interface configured to service frames with an 802.1q tag for VLAN 10 will route this traffic. It will not be bridged at Layer 2, since that functionality is disabled on the interface. Sub-interfaces (like the one servicing VLAN 10) simply perform normal Layer 3 routing. As such, the Layer 2 frame, including the original tag, is removed. The destination IP address is compared to the route table, and the best-path egress interface is selected. This could be another sub-interface, perhaps servicing VLAN 20. A new frame is built, with the appropriate 802.1q tag, and the frame is transmitted. In this way, a single physical interface can be used to route traffic between multiple VLANs. In a multi-tenant environment, one physical interface might route between the ten VLANs in use by a single customer, with a sub-interface for each VLAN. Each of these sub-interfaces can be assigned to that customer's IP VPN instance, and so becomes part of that client's isolated routing table. In this scenario, physical interface functionality can be extended by using QinQ sub-interfaces.
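The flow above can be sketched in Comware as follows (hypothetical port and addressing):

```text
# Disable Layer 2 bridging so the port operates as a routed interface
interface Ten-GigabitEthernet1/0/1
 port link-mode route

# Sub-interface servicing frames tagged with VLAN 10
interface Ten-GigabitEthernet1/0/1.10
 vlan-type dot1q vid 10
 ip address 10.1.10.1 255.255.255.0

# Sub-interface servicing frames tagged with VLAN 20
interface Ten-GigabitEthernet1/0/1.20
 vlan-type dot1q vid 20
 ip address 10.1.20.1 255.255.255.0
```

A frame arriving tagged for VLAN 10 is routed between the two sub-interfaces and leaves re-tagged for VLAN 20, all over the one physical port.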
Layer 3: QinQ Sub-interfaces In a multi-tenant, QinQ-based environment, each customer has their own set of 4094 VLANs that use an inner 802.1q C-VLAN tag. Each customer is assigned a unique S-VLAN tag, which is added as an outer tag to maintain tenant separation over a common fabric. A traditional sub-interface that receives a QinQ frame would only consider the outer S-VLAN tag when servicing inbound frames. For this reason, a separate physical interface would be required for each customer. However, a routed QinQ sub-interface is designed to interpret both the inner and outer tags. It can parse the S-VLAN tag to maintain tenant separation, and interpret inner C-VLAN tags to route for each customer's VLAN space. This means that a router with a single 10Gbps interface can provide inter-VLAN routing functions for multiple customers. Beyond this additional functionality, QinQ sub-interfaces are very similar to traditional sub-interfaces. A single physical interface has its Layer 2 bridging function disabled, enabling it to operate as a Layer 3 routed interface. This interface is then logically divided into sub-interfaces, each with unique IP addressing assigned.
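A hedged sketch of routed QinQ termination follows. The double-tag matching syntax differs noticeably between Comware platforms and releases, so treat the commands below as illustrative of the shape only, and verify against your platform's configuration guide. Tags and addressing are hypothetical:

```text
# Routed QinQ sub-interface terminating a double-tagged frame
interface Ten-GigabitEthernet1/0/1.1
 # Match outer S-VLAN 100 (tenant) and inner C-VLAN 10 (customer VLAN);
 # exact matching command varies by platform/release
 vlan-type dot1q vid 100 second-dot1q 10
 ip address 10.100.10.1 255.255.255.0
 # The sub-interface can also be bound to that tenant's VPN instance
 # (ip binding vpn-instance ...) to join its isolated routing table
```

With one such sub-interface per (S-VLAN, C-VLAN) pair, a single physical port routes for many customers while the S-VLAN keeps their traffic separated.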
Layer 3: Network Function Virtualization (NFV) Network Function Virtualization (NFV) provides each tenant with a dedicated Layer 3 routing service from within a hypervisor environment, as shown in Figure 12-9. For example, the HP Virtual Services Router (VSR) runs on VMware ESXi and KVM. Support for other hypervisor platforms is expected in future releases.
Figure 12-9: Layer 3: Network Function Virtualization (NFV)
The HP VSR provides a dedicated Layer 3 service that can be managed by the tenant, or by the data center provider. It supports a broad Layer 3 feature set, including VRRP, OSPF, MCE, and MPLS. Different performance levels are available, since it can be deployed in a version that uses a single vCPU, or 4 vCPUs. Since it runs as a VM, two or more vNICs can be assigned. These can be connected to a traditional customer VLAN or to a customer VXLAN, which would be terminated by the ESX host. In this way, the HP VSR can be used to route between VXLAN services and traditional VLANs.
Layer 3: Remote Site IP Path to Data Centers
Now that various Layer 2 and Layer 3 technologies have been reviewed, it is appropriate to discuss connectivity between remote sites. Solution redundancy and administrative flexibility are enhanced by extending a VLAN across two data centers. This is because any VM on that VLAN can be located and easily moved to either of the two sites. Meanwhile, user sites will typically have a Layer 3 IP routed path to each data center. This scenario can lead to a condition known as traffic Tromboning.
Tromboning In the scenario in Figure 12-10, the user site is serviced by the router at the top of the figure, at IP address 10.3.1.1. Two redundant data centers are used, called DC1 and DC2. A Layer 2 service connects the two data centers. The DC routers are configured to use VRRP. The VRRP Primary router is at DC1, while the backup is at DC2.
Figure 12-10: Tromboning
User traffic for a server VM arrives at DC1, via the IP cloud. DC1 is aware that the target server is actually at DC2. The traffic is therefore sent from DC1 to DC2 via the Layer 2 interconnect circuit. The server receives the user request and replies. This reply goes via its default gateway, which is across the Layer 2 circuit at DC1. Since the target VM is hosted at DC2, while the active VRRP router is at DC1, there is a sub-optimal routing condition. The traffic from the client does not go directly to DC2. Instead, the traffic initially arrives at DC1, and is then extended (like a trombone slide) out over to DC2. The return traffic must also follow this same sub-optimal path back to the client. This not only affects remote communication but also local communication. Any routed communication between two servers in DC2 must traverse the DC interconnect to be routed by the default gateway at DC1.
Layer 3: Path Optimization using VRRP This sub-optimal path issue can be improved by filtering VRRP hello packets, either automatically by using EVI, or by manually configuring an Ethernet ACL. The two VRRP devices can no longer see each other, so they both assume the VRRP master role. Each data center now has a local routing service, so intra-DC traffic is always routed locally. Although the request from end-user to VM server may still take a sub-optimal path, the return traffic is forwarded directly by DC2's local router. Although this is an improvement, the issue of asymmetric routing remains. It is not a best practice for return traffic to use a path different from that of the request traffic.
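When EVI is not in the path, the manual Ethernet ACL approach might be sketched as follows. VRRP advertisements are sent to IP multicast 224.0.0.18, which maps to destination MAC 0100-5e00-0012; the ACL number and interface below are hypothetical, and the exact Layer 2 ACL rule syntax should be checked against your platform's guide:

```text
# Layer 2 ACL matching the VRRP advertisement multicast MAC
acl number 4000
 rule deny dest-mac 0100-5e00-0012 ffff-ffff-ffff
quit

# Apply on the DCI-facing port so hellos never cross the interconnect
interface Ten-GigabitEthernet1/0/49
 packet-filter 4000 outbound
```

With hellos blocked on the interconnect only, each site's router becomes master locally while the shared virtual MAC is never learned across the DCI link.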
Layer 3: VRRP Path Optimization Scenario Figure 12-11 reveals how VRRP hello filtering can be used to improve packet flow. There is still a VRRP primary at DC1, and the client may still use DC1 as its path to reach a server VM that is actually located at DC2.
Figure 12-11: Layer 3: VRRP Path Optimization Scenario
However, when you filter VRRP hello packets, the router at DC2 also becomes a VRRP primary device. The return traffic is therefore optimized, as is traffic among VMs at DC2. Again, the asymmetric routing issue remains: as you can see, the user traffic and server reply traffic use different paths.
Layer 3: Path Optimization with a Load Balancer In the scenario described in Figure 12-11, the target server’s IP subnet is announced by some routing protocol, such as OSPF, by both DC1 and DC2. The client site sees a lower cost to reach this subnet via DC1, and so that is the path used for all traffic toward that subnet. As you have seen, this lowest-cost model can lead to sub-optimal paths when a VLAN is extended across data centers. This behavior can be optimized using a WAN load balancer such as the F5 BIG-IP product.
Layer 3: Load Balancer Path Optimization Scenario In the scenario in Figure 12-12, traffic from clients is sent to a BIG-IP load balancer.
The load balancer determines the target's location. The load balancers adjust client perception and routing to ensure that optimal paths are achieved. Clients connect based on DNS names, and the BIG-IP devices at each site dynamically update information to maintain optimal, symmetric paths.
Figure 12-12: Layer 3: Load Balancer Path Optimization Scenario
Data Center Interconnect (DCI) Overview Data center designs can leverage any of several technologies for interconnections, as shown in Figure 12-13. This includes Multi-site IRF, IRF with Link Aggregation, EVI, L2VPN/VPLS and SPBM.
Figure 12-13: Data Center Interconnect (DCI) Overview
DCI: Multi-Site IRF Multi-site IRF (or Geo-IRF) creates a single logical entity using physical devices at different sites. Dedicated fiber links are required for this solution; you cannot use any type of engineered link between IRF members. One important design consideration is that each data center should be an isolated failure domain. With multi-site IRF, an issue inside the IRF control plane would impact both data center locations. Also, complex split-brain scenarios can result from this deployment. This scenario can also complicate the firmware update process, since an update impacts both data centers at once. Limiting that impact could require local devices to have connectivity to remote IRF members. You can see this in Figure 12-14, where each local IRF is dual-homed to its local Geo-IRF member and to the IRF member at the remote data center. For these reasons, it is generally best to avoid using a single IRF system over multiple sites.
Figure 12-14: DCI: Multi-Site IRF
DCI: Multiple IRF Systems with Link Aggregation You can deploy multiple IRF systems with link aggregation. Each data center core comprises an IRF system, and these IRF systems can be interconnected using traditional link aggregation, with at least two links connecting the data centers. Since each site operates an independent IRF group, these can be traditional 1Gbps connections, 10Gbps connections, direct fiber links, some type of engineered circuit, or a service provider's L2VPN solution. All the links are bundled using link aggregation, so there are multiple active/active paths between the data centers. LACP can be used to negotiate and control the aggregated connections. HP Data Center switches also support Ethernet Operations, Administration, and Maintenance (OAM). This can be used as a heartbeat for the aggregated links. OAM operates at Layer 2, in a similar way to BFD at Layer 3. If there is a failure inside an MPLS cloud between data centers, LACP could take up to 90 seconds to detect it; OAM can detect this condition in less than 500 milliseconds. A deployment that uses IRF and link aggregation is very easy to maintain and operate. Both switch and link failures are reliably and easily managed. This method is especially appropriate for a design that requires two data centers. If more than two data centers are required, some Layer 2 topology protocol must be used that can prevent loops and perform topology calculation between sites. IRF and link aggregation is available on all HP Data Center switches.
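A Comware sketch of the DCI aggregation with OAM as a link heartbeat might look like this (hypothetical aggregation group and member ports):

```text
# Dynamic (LACP) aggregation across the DCI links
interface Bridge-Aggregation1
 link-aggregation mode dynamic
quit

# Member links, each running Ethernet OAM as a fast Layer 2 heartbeat
interface Ten-GigabitEthernet1/0/49
 port link-aggregation group 1
 oam enable
interface Ten-GigabitEthernet1/0/50
 port link-aggregation group 1
 oam enable
```

If OAM declares a member link down, that link is removed from the bundle far faster than LACP timeouts alone would allow, and traffic continues over the surviving members.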
DCI: IRF with Link-Aggregation Figure 12-15 shows an example of two data centers using IRF with link aggregation. Each data center has an IRF group. The IRF systems at each site are interconnected using multiple links. These links might be dedicated fiber optic cables, or a Dense Wavelength-Division Multiplexing (DWDM) circuit. Whatever the connecting technology, all of the links are bundled into a single, logical connection using link aggregation.
Figure 12-15: DCI: IRF with Link-Aggregation
DCI: Ethernet Virtual Interconnect (EVI) With Ethernet Virtual Interconnect (EVI), each data center has a local IRF system, and these systems can be interconnected by a Layer 3 network. The EVI protocol transports local data center VLANs and interconnects up to eight data centers. Multiple tenants can be supported with multiple EVI networks. Network IDs can be used to isolate traffic over the EVI system. If a single VLAN space is used to host multiple tenants, you can allocate VLANs 100-199 to Network ID 1, VLANs 200-299 to Network ID 2, and so on. In this way, you allocate specific VLANs to each tenant, with separation maintained over the EVI network. You could also use a QinQ device in front of the EVI device to ensure full VLAN space isolation. Thus, all 4094 tenant C-VLANs are encapsulated in a single S-VLAN by the QinQ device. These S-VLANs are then transported and managed by EVI. EVI provides multicast isolation, so VRRP hello blocking is automatic. EVI is currently available on 12500 and 12900 model HP Data Center switches. Do not combine the 12500 and 12900 in a single EVI deployment; these products use different EVI encapsulations and are therefore not compatible in the same EVI network.
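The general shape of an EVI tunnel configuration is sketched below. The commands are assumptions drawn from the EVI feature documentation and vary by platform and release; the network ID, VLAN range, and source interface are hypothetical, so verify every line against the EVI configuration guide for your switch:

```text
# EVI tunnel riding the Layer 3 DCI network
interface Tunnel1 mode evi
 # Local end of the tunnel (an interface with IP reachability to peers)
 source Ten-GigabitEthernet1/0/1
 # Isolate this tenant's traffic under its own network ID
 evi network-id 1
 # VLANs extended across the interconnect for this network
 evi extend-vlan 100 to 199
```

A second tunnel with a different network ID and VLAN range would carry another tenant over the same Layer 3 underlay, with separation maintained end to end.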
DCI: L2VPN / VPLS L2VPN and VPLS provide Layer 2 bridging services. You can use L2VPN to connect two data centers, or VPLS if more than two data centers must be connected. These technologies could be offered by a service provider, or self-deployed and managed by data center staff. When a service provider delivers these services, a simple Ethernet connection is provisioned at the local data center. Data center core devices can provide Layer 3 services over this connection. Alternatively, the data center team might choose to deploy their own service. This team must be aware that the WAN Edge devices are strictly Layer 2 devices. They only provide the L2VPN or VPLS service to core switches, and cannot provide any Layer 3 functions. The core switches must provide the routing service, just as with a provider-delivered service.
DCI: SPBM SPBM can be used to connect two or more data center sites with full multi-tenant support. However it does require a Layer 2 Ethernet connection between sites. This connection can be a direct fiber link or an L2VPN service.
One advantage to this approach is the availability of Layer 2 multi-path services. Layer 2 Ethernet traffic will take the shortest path to any of the several sites that may be connected. This pathing is purely a Layer 2 function. Some data center device must be configured to provide any required Layer 3 functionality. This could be in the form of a dedicated core device, a separate MDC, or an HP VSR, possibly in conjunction with third-party load balancers.
Storage (1) Several protocols should be considered in a data center design. These include NFS, iSCSI, FC, and FCoE. Network File System (NFS) binds separate storage structures into a single virtual drive, to provide file-level access. There are no significant design requirements for the deployment of NFS. You can use a dedicated VLAN and server NIC if you need guaranteed service for NFS. Another option is to let NFS share a NIC with other traffic, perhaps using QoS to mitigate delivery issues, if needed. iSCSI provides block-level access to storage systems. It is highly recommended to have a dedicated VLAN for the iSCSI infrastructure. It is also best to use a dedicated switch that was designed for storage traffic. For connectivity, both data and iSCSI traffic will share the top-of-rack switch, which will have dedicated uplinks to the Ethernet storage switch.
Storage (2) For FC and FCoE deployments, it is vital to consult HP's validated design documentation. This ensures that you are using the right components, and that all components will interoperate. A common option for Fibre Channel is HP's 5900CP. This switch provides full FC fabric services to interconnect server native FC HBAs, native FC storage systems, and HP Virtual Connect FC solutions. It can also provide Fibre Channel N-Port Virtualization (NPV) gateways to existing HP ISS SAN switches, such as the H/B/C-series SAN switches. The 5900CP can easily be added to service most existing native FC environments. Another Fibre Channel option is to use the HP FlexFabric blade system. This system provides FCoE internally to the blade server CNA, as well as an NPV gateway to traditional SAN switches.
For FCoE, the 5900CP can be used as the gateway between FCoE and the native Fibre Channel systems.
Data Center WAN Data center WANs require high-performance connectivity, often at speeds of 10Gbps. Fine-grained QoS services are needed to prioritize and control traffic over the links. Dynamic VPN technologies ease deployments and improve scalability. For example, DVPN is an HP proprietary technology that offers easy multi-site IPsec VPN tunnel setup. Equipment should be able to perform high-speed encryption and hashing functions in support of these IPsec tunnels. These features are all provided by HP's router portfolio.
Overlay Technologies (1) For data center designs that require an overlay technology, VXLAN should be used. VXLAN provisions virtual Layer 2 networks for hypervisor VMs. The VXLAN protocol is developed and promoted by VMware and other vendors. Initially, VXLAN exists only between VMware hypervisors as a virtual construct. Additional components are required for external Layer 3 VXLAN routing. For single-customer deployments, the hardware VXLAN gateway function of the 5930 can be used for routing. You must bind the VXLAN to a VLAN on the physical interface with a service instance. Multi-tenant infrastructures can also use the 5930's hardware VXLAN gateway, using a transport VLAN to reach the tenant's Layer 3 gateway. Another option is to deploy the HP VSR. You can configure multiple vNICs as routed interfaces for the VSR. Some of these routed ports are bound to a VXLAN to reach other VMs on the same VXLAN, while other ports are bound to traditional VLANs. This provides connectivity between VXLAN and the classic IP network.
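The VLAN-to-VXLAN binding described above might be sketched in Comware as follows. The VSI name, VXLAN ID, service-instance number, and VLAN are hypothetical:

```text
# Map VXLAN 10 to a virtual switch instance (VSI)
vsi vpna
 vxlan 10
quit

# Bind VLAN 100 on a physical port into that VSI via a service instance
interface Ten-GigabitEthernet1/0/1
 service-instance 1000
  encapsulation s-vid 100
  xconnect vsi vpna
```

Frames arriving on the port tagged with VLAN 100 are switched into the VSI and carried to remote VTEPs inside VXLAN 10, giving the physical network a foothold in the overlay.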
Overlay Technologies (2) The hardware-based 5930 VXLAN gateway requires VTEP tunnel setup to remote VTEPs. These remote VTEPs are ESXi servers with online VMs in the VXLAN. This tunnel setup is automatically and dynamically orchestrated by the HP VAN SDN Controller, which interacts with the VMware NSX controller.
Server Access Options Considerations for server access options include storage access requirements. Server-to-storage communications can be handled with a dedicated FC network, a converged FCoE system, or via iSCSI. Optimized hypervisor support is another consideration. This can be achieved by supporting multiple paths for the hypervisor, management, vMotion, VM networks, and storage networks. This way, each role can be isolated to its own network path. For RAID environments, this can be provided by the Flex10 network cards. Another consideration is the need for deep integration with the hypervisor network. This can be achieved using EVB.
Overview of Data Center Layers This section provides an overview of data center access layers, as listed below:
■ Access Layer
■ Blade Server One-Tier design
■ 2-Tier design
■ 3-Tier design
■ Layer 2 Fabrics
Access Layer Switching: ToR Design Access layer switching is done with Top-of-Rack (ToR) switches. In a ToR design, servers connect to access switches via copper or fiber within the rack, while the access switch connects to backbone switches within the data center via multi-mode fiber (MMF). Often, the data center backbone consists of high-capacity aggregation or distribution switches with Layer 2/3 capabilities. Ethernet copper is generally restricted to relatively short runs inside the rack, while connections outside the rack can be made with smaller form factor multi-mode fiber. This reduces overhead cabling weight, and the fiber can be connected to interfaces of different capacities to support the bandwidth requirements of the rack. Figure 12-16 lists advantages and disadvantages of this solution.
Figure 12-16: Access Layer Switching: ToR Design
Access Layer Switching: EoR Design Another option for deploying access layer switches is to use an End-of-Row (EoR) model. With this design, a rack containing the switching equipment is typically placed at either end of a row of cabinets or racks. Bundles of cabling provide the connectivity from each server rack to the switching equipment rack. Servers are usually connected to a patch panel inside each server rack. The copper or fiber bundles are run to another patch panel in the rack containing the access switches. The switches are then connected to the patch panel via patch cables. EoR switches are typically connected back to the core with a series of fiber patch cables. EoR does not imply that the network racks have to be placed at the end of the row. There are designs where network switch racks are placed together in the middle of cabinet/rack rows. Placing the switch rack in the middle of a row limits the length of cables required to connect the farthest server racks to the nearest network rack. Unlike the ToR model, where each rack is treated as an independent module, in the EoR placement model, each row is treated as an independent module. Figure 12-17 lists advantages and disadvantages of this design.
Figure 12-17: Access Layer Switching: EoR Design
Blade Server One-Tier Design In a Blade server one-tier design, the blade enclosures include internal blade switches. As shown in Figure 12-18, these serve as ToR access switches that connect directly to the core, as would any ToR switch.
Figure 12-18: Blade Server One-Tier Design
This deployment signifies the current pinnacle of network virtualization. Server blades allow for substantial compute density per rack, row, and data center. HP has optimized the BladeSystem server portfolio to maximize virtualization capabilities. This solution simplifies high performance networking, while providing flexible VM networking and converged I/O options.
Blade Server One-Tier Design The network edge physically starts with the FlexConnect fabric, but really starts internally in the VMs and configured virtual switches. HP 12500 switches are able to communicate natively with VMware virtual switches, allowing updates of ARP tables between physical and virtual switches, as shown in Figure 12-19. The VMs and virtual switches can provision VLANs, which in turn interoperate with the IRF fabric, allowing seamless VM movement with vMotion and high-performance frame forwarding.
Figure 12-19: Blade Server One-Tier Design
Combining LACP and IRF in this design provides high-speed link aggregation, with re-convergence times under 50 ms in the event of a link failure. It also allows links from the converged network adapters to be aggregated across all switches and utilized for higher bandwidth. This design also supports the dynamic storage of World Wide Names. This feature allows a FlexConnect module to be configured with IP and VLAN information once; if a device fails, the replacement FlexConnect device will gracefully provision itself with the original configuration.
This design supports long-range vMotion connectivity, enabling clustering or synchronizing VMs between sites. The switches allow VPLS connectivity between data center locations. Be aware that long-range fiber segments using MPLS can add latency that limits the ability for VMs to use converged I/O resources between data center locations. Long distance WAN networks generally address normal server communications and disaster recovery efforts.
Simplified Two-Tier Design Objectives The simplified two-tier design is very similar to the one-tier blade design. The difference is that instead of FlexFabric blade servers, traditional rack servers are used, as shown in Figure 12-20. They are connected to Top-of-Rack switches. This solution is best suited to designs that must use a mixture of rack and blade servers. This flatter, two-tier design is simpler, and introduces less latency than three-tier legacy models.
Figure 12-20: Simplified Two-Tier Design Objectives
Three-Tier Design Objectives A three-tier design introduces an aggregation layer between access and core devices. With this design, many ToR switches connect to a few aggregation layer switches. These aggregation layer switches are then dual homed to a redundant core, as shown in Figure 12-21.
Figure 12-21: Three-Tier Design Objectives
This type of design is suited to data center networks where added bandwidth, 10GbE port capacity, and simplified management are paramount. It also helps ensure the interoperability of legacy EoR and ToR switches. Although the depicted design focuses on HP switches, IRF permits the addition of third-party switches at any level, and will interoperate using standards-based networking.
Data Center Layer 2 Fabric The data center Layer 2 fabric needs to provide a standards-based solution for large-scale deployments. For a 2-tier design, traditional IRF will typically be used, along with link aggregation. 3-tier designs would often use TRILL or SPBM. The TRILL solution provides a standards-based, shortest path algorithm for Layer 2 forwarding, with ECMP load-sharing. The limitation of a maximum of 4094 VLANs should be considered. SPBM also uses a standards-based, shortest path algorithm. Its equal-cost load-sharing algorithm is more deterministic, and each tenant's traffic can be configured to use a single best path. SPBM is far more scalable, with support for over 16 million tenants, each with its own unique I-SID assignment.
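For the TRILL option, the basic enablement on a Comware fabric switch is brief; a hedged sketch follows (hypothetical port, and per-platform options such as RBridge priorities are omitted):

```text
# Enable TRILL globally on the switch
trill
quit

# Enable TRILL on each fabric-facing port
interface Ten-GigabitEthernet1/0/49
 trill enable
```

Once enabled on all fabric links, the switches compute shortest paths between RBridges automatically, with ECMP load-sharing across equal-cost links; the SPBM alternative involves a comparable per-port enablement plus I-SID service mappings.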
Example of an HP Standards-Based DC Fabric Figure 12-22 shows an example of an HP standards-based DC fabric. The ToR switches can be individual devices, or they could be deployed as an IRF system, interconnected to the spine switches. The access switches provide Layer 2 functionality, with Layer 3 services provided at the spine.
Figure 12-22: Example of an HP Standards-Based DC Fabric
Variations on this theme are possible. These include:
■ Scale out with more spine switches for increased bandwidth: i.e., with 4 spine switches, 4 x 10GE is available instead of the 2 x 10GE provided by 2 spine switches
■ A single IRF spine with multiple members for chassis HA: i.e., a 4-chassis IRF spine
■ Servers multi-homed to multiple leaf switches
■ Layer 3 devices connected to leaf switches instead of the spine
HP FlexFabric Core Switches Figure 12-23 reviews HP’s FlexFabric core switch portfolio.
Figure 12-23: HP FlexFabric Core Switches
HP FlexFabric Access Switches Figure 12-24 reviews HP’s FlexFabric access switch portfolio.
Figure 12-24: HP FlexFabric Access Switches
HP FlexFabric Routers Figure 12-25 reviews HP’s HSR router series.
Figure 12-25: HP FlexFabric Routers
IMC VAN Fabric Manager IMC VAN Fabric Manager simplifies the management of data center and SAN fabrics. It provides a unified view of all of the network and storage devices in the data center fabric, alongside the fabric health, to enable quick troubleshooting and proactive management. It helps eliminate manual provisioning and allows you to easily configure Ethernet Virtual Interconnect (EVI), Shortest Path Bridging (SPB), or Transparent Interconnection of Lots of Links (TRILL) through the same graphical user interface used to automate, monitor, and manage your entire network, as shown in Figure 12-26.
Figure 12-26: IMC VAN Fabric Manager
Summary In this chapter you learned:
■ Key drivers for new data center designs include large-scale consolidation, optimized blade server deployments, server virtualization technologies, and new application and delivery models.
■ Data center deployment models include traditional enterprise, traditional multi-tenant, and cloud computing designs.
■ Layer 2 data center solutions include traditional VLANs, VLANs with MDC, QinQ, VLANs with TRILL, TRILL and MDC, SPBM, and SPBM with MDC.
■ Data center interconnect solutions include IRF with link-aggregation, EVI, L2VPN/VPLS, and SPBM.
■ Layer 3 data center technologies include IRF, VRRP, MCE, MDC, QinQ sub-interfaces, and the HP VSR.
■ Storage protocols include NFS, iSCSI, FC, and FCoE.
■ VXLAN can be used for data center designs that require an overlay technology to automatically provision Layer 2 networks for hypervisor VMs.
■ Data center designs include ToR access solutions, EoR access solutions, a blade server one-tier design, 2-tier, 3-tier, and Layer 2 fabric designs.
Learning Check
Answer each of the questions below.
1. Name three use cases for HP FlexFabric in the data center. (Choose three.)
a. IEEE-compliant route/switch
b. Leaf/spine
c. SDN overlay
d. Traditional 3-tier
e. 1-tier blade systems
2. Which three solutions are appropriate for large, multi-tenant data center solutions? (Choose three.)
a. IRF
b. Traditional VLANs
c. EVI + QinQ
d. MCE
e. SPB
3. TRILL, QinQ, and SPBM with MDC all provide possible Layer 3 services for intra-data center connectivity.
a. True
b. False
4. What are three possible Layer 3 solutions for data centers? (Choose three.)
a. QinQ
b. VRRP
c. NFV
d. SPBM
e. MCE
5. What are three advantages of a ToR design? (Choose three.)
a. Issue isolation
b. Traffic isolation
c. Effect of physical disasters limited to ToR, instead of entire row
d. Fewer switches to manage as separate entities
e. Fewer rack-to-rack hops
Learning Check Answers
1. b, c, e
2. c, d, e
3. b
4. b, c, e
5. a, b, c
13 Practice Test

INTRODUCTION
This exam tests your skills and knowledge of deploying and implementing HP FlexFabric Data Center solutions. You will be tested on specific data center topics and technologies such as Multitenant Device Context (MDC), Data Center Bridging (DCB), Multiprotocol Label Switching (MPLS), Fibre Channel over Ethernet (FCoE), Ethernet Virtual Interconnect (EVI), and Multi-Customer Edge (MCE). The exam also covers high availability and redundancy topics such as Transparent Interconnection of Lots of Links (TRILL) and Shortest Path Bridging Mac-in-Mac mode (SPBM). This certification exam is designed for candidates with on-the-job experience. The associated training course, which includes numerous hands-on lab activities, provides a foundation, but you are expected to have real-world experience as well.
Exam Details
The following are details about the exam:
■ Exam ID: HP2-Z34
■ Number of items: 60
■ Item types: Multiple choice (single response)
■ Exam time: 105 minutes
■ Passing score: 70%
HP2-Z34 Testing Objectives
■ 5% Fundamental HP FlexFabric Data Center architectures and technologies
• Describe common data center networking requirements and options for data center architectures.
■ 5% HP FlexFabric Data Center solutions, products, and warranty/service offerings
• Explain how the HP FlexFabric portfolio, including switches, routers, and IMC modules, meets common data center needs.
■ 15% HP FlexFabric Data Center solution planning and design
• Plan how to use data center technologies (such as MDC, MPLS, MPLS Layer 2 VPNs, VPLS, MCE, TRILL, SPBM, DCB, FCoE, and EVI) for common data center use cases.
• Explain the impact of various data center technologies on the network design.
■ 55% HP FlexFabric Data Center solution implementation (install, configure, set up)
• Implement various forms of virtualization on HP solutions, including HP MDC and MCE (VRF Lite).
• Configure HP solutions to extend Layer 2 connectivity between and within sites (using appropriate technologies, such as MPLS Layer 2 VPNs, VPLS, SPBM, and HP EVI).
• Configure HP solutions to support LAN/SAN convergence using technologies such as DCB and FCoE.
■ 15% HP FlexFabric Data Center solution enhancement (performance-tune, optimize, upgrade)
• Provide resiliency, efficiency, and load-balancing for data center infrastructure solutions.
• Enhance QoS for storage traffic carried in the data center LAN.
■ 5% HP FlexFabric Data Center solution troubleshooting, repair, and replacement
• Verify and troubleshoot the implementation of various data center technologies.
Test Preparation Questions and Answers
The following questions will help you measure your understanding of the material presented in this study guide. Read all the choices carefully, as there may be more than one correct answer. Choose all correct answers for each question.
Questions
1. Refer to the exhibits.
Figure 13-1: Exhibit 1 for question 1
Figure 13-2: Exhibit 2 for question 1
Switch 1 is configured as shown in the second exhibit. When traffic destined to servers on other switches arrives on interfaces Ten1/0/1 to Ten1/0/48, Switch 1 should send the traffic over the TRILL region. (Similarly, it should egress traffic from the TRILL region on those interfaces.) How should the administrator configure interfaces Ten1/0/1 to Ten1/0/48?
a. Create a service instance on these interfaces; the service instance references a TRILL virtual switch instance (VSI).
b. Configure these interfaces as TRILL trunk ports.
c. Configure these interfaces as TRILL access ports.
d. Add VLAN 100 as a permitted VLAN on these interfaces.

2. A network administrator is configuring SPBM on an HP Comware switch. The administrator has enabled SPBM globally. What other feature must be enabled globally?
a. Layer 2 VPNs
b. LLDP
c. LDP
d. MPLS

3. Refer to the exhibit.
Figure 13-3: Exhibit for question 3
An administrator has configured Switch 2 to support Fibre Channel over Ethernet (FCoE) in Fibre Channel Forwarder (FCF) mode. Switch 1 is configured in N-Port Virtualization (NPV) mode. Switch 2 connects to Switch 1 on interface Ten-GigabitEthernet1/0/1, which carries Ethernet and FCoE traffic. What is the correct configuration for this interface?
a. FC port in F mode
b. FC port in NP mode
c. bound to a VFC interface, which is in E mode
d. bound to a VFC interface, which is in F mode
4. A switch port is using Application TLVs to communicate application priority information to a server Converged Network Adapter (CNA). What is one valid criterion for selecting the application?
a. IP protocol
b. UDP source port
c. TCP destination port
d. source or destination IP address

5. A network administrator has created three Multi-tenant Device Contexts (MDCs) on an HP Comware switch. How do VLANs, ACLs, and routes on one MDC affect other MDCs?
a. MDCs are isolated and have their own hardware resources. The resources used by VLANs, ACLs, and routes on one MDC do not affect other MDCs.
b. MDCs are isolated and have their own control plane, but they share resources. The resources used by VLANs, ACLs, and routes on one MDC decrease the resources available for other MDCs.
c. MDCs are isolated at Layer 3, but they share a control plane. In addition to the resources used by one MDC affecting other MDCs, VLAN IDs cannot overlap.
d. MDCs are isolated for management purposes, but they share a control plane. In addition to the resources used by one MDC affecting other MDCs, VLAN IDs, ACLs, and routes cannot overlap.

6. Refer to the exhibit.
Figure 13-4: Exhibit for question 6
The network devices shown in the exhibit are correctly set up to implement a VPLS Martini solution. The network administrator now wants to configure PE1, which is an HP switch running Comware 7, to connect to Site 1. Interface G1/0/1 supports several VLANs with different VLAN tags and does not have any service instances configured on it yet. All traffic that arrives on interface G1/0/1 should be forwarded in the same VSI for this VPLS solution. What is a valid setup on interface G1/0/1?
a. The interface has one service instance. The encapsulation type is default, and the cross-connect group uses VLAN access mode.
b. The interface has one service instance. The encapsulation type is untagged, and the cross-connect group uses Ethernet access mode.
c. The interface has one service instance for each VLAN. The encapsulation type is S-VLAN ID, and the cross-connect group uses VLAN access mode.
d. The interface has one service instance for each VLAN. The encapsulation type is S-VLAN ID, and the cross-connect group uses Ethernet access mode.
7. Which type of interfaces can be assigned to a VPN instance?
a. VLANs and physical interfaces operating in bridged mode
b. physical interfaces operating in routed mode or bridged mode
c. Layer 3 interfaces, including VLAN interfaces, physical interfaces, and loopback interfaces
d. only VLAN interfaces

8. A network administrator is configuring a VFC interface that carries Fibre Channel over Ethernet (FCoE). What are the requirements for configuring the VSAN in access or trunk mode on this interface?
a. The VSAN must always be set in access mode.
b. The VSAN must be set in trunk mode only when the VFC interface is assigned to more than one VSAN.
c. The VSAN must be set in trunk mode only when the VSAN is associated with more than one VLAN.
d. The VSAN must always be set in trunk mode.
9. Refer to the exhibits.
Figure 13-5: Exhibit 1 for question 9
Figure 13-6: Exhibit 2 for question 9
Switch 1 receives a broadcast on Ten1/0/1. On which TRILL trunk interface or interfaces does it send the broadcast?
a. Forty1/0/49 only
b. Forty1/0/49 and Forty1/0/51
c. Forty1/0/49 and Forty1/0/50
d. Forty1/0/51 only

10. A network administrator has set up LDP on an HP Comware switch. The administrator has not entered an lsp-trigger command. The switch is operating in Independent mode. What is the switch behavior for generating LSPs to advertise FECs?
a. It does not generate any LSPs.
b. It generates LSPs for any routes in its routing table that have a /32 prefix length.
c. It generates LSPs for any directly connected routes in its routing table.
d. It generates LSPs for all routes in its routing table.

11. Which of these HP switch series is designed for the data center core and supports technologies such as Ethernet Virtual Interconnect (EVI) and Multi-tenant Device Context (MDC)?
a. HP 5700
b. HP 7500
c. HP 10500
d. HP 12900

12. A switch interface has been configured to advertise Data Center Bridging Extensions (DCBX) Type Length Values (TLVs) and to implement priority flow control (PFC) in auto mode. What other step is required for PFC to work correctly on this interface?
a. A QoS policy with a “dcbx” statement is applied as an inbound policy to the interface.
b. LLDP MED TLVs are enabled on the interface.
c. The priority flow no-drop queue is set to the 802.1p value associated with storage traffic.
d. The QoS scheduling mechanism is set to strict priority (SP) queuing.

13. Refer to the exhibit.
Figure 13-7: Exhibit for question 13
Several HP switches are successfully implementing an SPBM solution in a multi-tenant data center. A network administrator is setting up an I-SID for a tenant VLAN. How does the administrator configure the SPBM switches to send this traffic over the backbone in B-VLAN 101?
a. Specify 101 as the B-VLAN ID in the I-SID for this tenant VLAN.
b. Set S-VLAN ID 101 for the encapsulation in the service instances for this I-SID.
c. Set VLAN 101 as the PVID on the interfaces that receive the tenant traffic.
d. Place VLAN 101 and the VLAN used by the tenant in the same MSTP instance.

14. An administrator has created Multi-tenant Device Context (MDC) 1 on an HP Comware switch. The administrator then authorized MDC 1 to use interface module 1. What is the effect?
a. All interfaces on interface module 1 are now assigned to MDC 1 and to no other MDC.
b. MDC 1 can now send and receive traffic on all interfaces on the module, but other MDCs authorized for the module can as well.
c. The administrator can now assign port groups on interface module 1 to MDC 1.
d. MDC 1 can now exchange traffic with other MDCs that are authorized to use this interface module.

15. Refer to the exhibit.
Figure 13-8: Exhibit for question 15
The network administrator needs to implement a default router redundancy solution for the two data centers, which support the same VLANs and subnets. The administrator wants to reduce the amount of routed traffic that is sent between the data centers. Which solution best meets these needs?
a. Switch 1 and Switch 2 implement VRRP. Switch 1 is the master, and Switch 2 is standby.
b. Switch 1 and Switch 2 implement VRRP, and both switches act as masters.
c. Switch 1 and Switch 2 are part of the same HP IRF virtual device.
d. Switch 1 and Switch 2 implement graceful IS-IS restart.

16. Refer to the exhibits.
Figure 13-9: Exhibit 1 for question 16
Figure 13-10: Exhibit 2 for question 16
Switch 1, Switch 2, and Switch 3 are successfully implementing an Ethernet Virtual Interconnect (EVI) network. The network administrator wants to add Switch 4 and data center 4 to the EVI network. Which step helps Switch 4 establish EVI links to the other switches?
a. specifying 10.0.4.4 for an evi neighbor-discovery client command on Switch 1 and Switch 2
b. specifying 10.0.1.1 and 10.0.2.2 for evi neighbor-discovery client commands on Switch 4
c. adding 10.0.1.1, 10.0.2.2, and 10.0.3.3 as tunnel source addresses on Switch 4
d. enabling selective MAC flooding for the MAC address used by the EVI discovery protocol
17. Refer to the exhibits.
Figure 13-11: Exhibit 1 for question 17
Figure 13-12: Exhibit 2 for question 17
PE-1 and PE-3 cannot establish a remote LDP session. Which action could help fix the problem?
a. Enable Layer 2 VPNs on PE-1.
b. Change the LDP mode to Independent on PE-1.
c. Enable LDP on the loopback interface on PE-3.
d. Configure PE-3 to advertise its loopback interface address in OSPF.

18. A switch is part of an MPLS Layer 2 VPN Martini mode solution. When forwarding traffic over the VPN, it sends traffic with an inner MPLS label and an outer MPLS label. Which correctly describes the label ID in each label?
a. The inner label ID identifies the remote peer, and the outer label ID identifies the LSP for reaching the remote peer.
b. The inner label ID identifies the LSP for reaching the remote peer, and the outer label ID identifies the remote peer.
c. The inner label ID identifies the VC or pseudo wire, and the outer label ID identifies the LSP for reaching the remote peer.
d. The inner label ID identifies the LSP for reaching the remote peer, and the outer label ID identifies the VC or pseudo wire.

19. Refer to the exhibit.
Figure 13-13: Exhibit for question 19
Switch 1 is actually an IRF virtual switch with two members. The network administrator wants to ensure that the master can fail without causing disruption to the EVI solution. Which setting on Switch 1 helps to support this requirement?
a. graceful restart for EVI IS-IS
b. VRRP hello blocking
c. BFD MAD for the IRF link
d. ARP suppression for EVI

20. A switch is configured as part of a VPLS solution. How does the switch maintain MAC forwarding tables for this solution?
a. The switch maintains one MAC forwarding table for each VSI.
b. The switch maintains one MAC forwarding table for each VLAN that is associated with a VSI in a service instance.
c. The switch maintains one MAC forwarding table for each remote LDP peer and each VSI associated with that peer.
d. The switch maintains one MAC forwarding table for each remote LDP peer, which might be part of several VSIs.
Answers
1. C is correct. If you want a TRILL switch port to ingress and egress non-TRILL traffic in and out of the TRILL domain, configure that port as a TRILL access port.
A, B, and D are incorrect. A is incorrect because TRILL does not use service instances. B is incorrect because TRILL trunk ports are used to transmit TRILL traffic, not to encapsulate non-TRILL traffic for transmission. D is incorrect; a TRILL access port does not typically support the same VLAN as the TRILL trunk ports.

2. A is correct. The Layer 2 VPNs feature lets the switch support virtual switch instances (VSIs), which SPBM uses for I-SIDs.
B, C, and D are incorrect. LLDP, LDP, and MPLS are not required for SPBM to work.

3. D is correct. An Ethernet interface that carries FCoE traffic must be bound to a VFC interface. Switch 1 operates in NPV mode, so it forwards FC logins to Switch 2 and acts like a node. Therefore, the correct mode for the Switch 2 VFC interface is F mode, which is a port that connects to a node.
A, B, and C are incorrect. A and B are incorrect because the Ethernet interface needs to carry both Ethernet and FCoE traffic. Even if the transceiver supports FC-only mode, it should remain in Ethernet mode and be bound to a VFC interface. C is incorrect because the mode should not be E. (This would be the correct mode if Switch 1 operated in FCF mode.)

4. C is correct. The traffic classifier used in an App TLV QoS policy can select traffic by several criteria, including TCP destination port (as well as UDP destination port and Ethernet type).
A, B, and D are incorrect. If you specify any of these criteria in the traffic classifier, they do not take effect.

5. A is correct. MDCs are isolated and have their own hardware resources. The number of VLANs, ACLs, and routes on one MDC does not affect the number allowed on another MDC.
B, C, and D are incorrect. B is incorrect because the MDCs do not share hardware resources (they have dedicated ASICs). C and D are incorrect because each MDC does have its own control plane.

6. A is correct. When you want traffic to be part of the same VSI, you use a single service instance for all of that traffic. The default encapsulation mode selects all traffic not selected by another instance. Because this interface has no service instances, it selects all traffic, as the scenario indicates that it should. Traffic has a VLAN tag, so VLAN access mode works.
B, C, and D are incorrect. B is incorrect because it uses the untagged mode, which only selects untagged traffic. C and D are incorrect because they use multiple service instances. Using a different service instance for each VLAN ID is valid, but it places each instance in a different VSI. The scenario indicated that the traffic should be placed in the same VSI.

7. C is correct. Any Layer 3 interface can be assigned to a VPN instance.
A, B, and D are incorrect. A and B are incorrect because VLANs (Layer 2) and bridge-mode (Layer 2) physical interfaces cannot be assigned to a VPN instance. D is incorrect. VLAN interfaces can be assigned to VPN instances, but other Layer 3 interfaces can be assigned to VPN instances as well.

8. D is correct. The VSAN must always be set in trunk mode for a VFC interface. The VFC interface sends FCoE traffic, which uses a VLAN tag for the VLAN associated with the VSAN, and the VSAN must be configured as tagged as well.
A, B, and C are incorrect. They indicate that the VSAN can sometimes be set in access mode on this type of interface, which is not correct.
C also indicates that a VSAN can be associated with more than one VLAN, which is not correct.

9. A is correct. Switch 2 will be elected the root bridge because it has the highest value for the tree-root priority. One multi-destination tree is being used (the default setting), and Switch 2 is the root of that tree. Switch 1 connects to Switch 2 on interface Forty1/0/49, so it sends the broadcasts on that interface.
B, C, and D are incorrect. These interfaces are not included in the multi-destination tree.

10. B is correct. When an lsp-trigger command has not been configured, the switch only generates LSPs for routes with /32 prefix lengths. In Independent mode, it generates LSPs for all /32 routes in its routing table. (In Ordered mode, it only generates LSPs for its loopback interface routes and /32 routes for which it has an FEC.)
A, C, and D are incorrect. A is incorrect because the switch does generate some LSPs, and C and D are incorrect because they indicate that the switch generates more LSPs than it does.

11. D is correct. The HP 12900 Switch Series supports MDC and EVI.
A, B, and C are incorrect. These switch series do not support these features.

12. C is correct. You must specify the 802.1p value for the no-drop queue, which configures the switch to apply flow control to the proper queue. The correct 802.1p value is the value applied to storage traffic because storage traffic requires the flow control.
A, B, and D are incorrect. A is incorrect. Although not strictly required for flow control to work on the switch side, a QoS policy with a “dcbx” statement should inform the server CNA which 802.1p value to use for storage traffic. However, the policy should be applied as an outbound policy. B is incorrect because LLDP MED is not related to the PFC feature. D is incorrect. PFC works regardless of the queue scheduling mechanism, and, in any case, weighted round robin (WRR) is typically recommended over SP for this scenario.

13. A is correct. You control the B-VLAN used to forward tenant traffic across the backbone by mapping the I-SID to the B-VLAN ID.
B, C, and D are incorrect. B is incorrect; if you are using S-VLAN ID for a service instance’s encapsulation mode, you must specify the ID used by the tenant, not the B-VLAN ID. C is incorrect. You should not extend B-VLANs to interfaces outside of the SPBM region. D is incorrect. B-VLANs belong in MSTP instance 4092.
Tenant VLANs belong in other instances if you are using MSTP outside of the SPBM region.

14. C is correct. Authorizing the MDC for the interface module simply enables administrators to assign port groups on that module to the MDC. It does not actually assign any port groups to the MDC yet.
A, B, and D are incorrect. A and B are incorrect because MDC 1 cannot use any interfaces on the module until an administrator actually assigns port groups to it. (B also incorrectly states that two MDCs can use the same interfaces.) D is also incorrect. Authorizing an MDC to use an interface module does not directly affect whether an MDC can exchange traffic with another MDC. To exchange traffic, an interface assigned to each MDC must be connected together (just as if the MDCs actually belonged to different switches).

15. B is correct. VRRP is a default router redundancy protocol, which you can implement on the two switches. Making both switches act as master (which you achieve by blocking VRRP hellos between the switches) lets each switch route traffic for its own data center. This design helps to reduce the traffic sent between the data centers.
A, C, and D are incorrect. A is incorrect because this design requires all traffic that needs to be routed in data center 2 to be sent to data center 1. C is incorrect because IRF does not work for this multi-data center scenario. First, Switch 1 and Switch 2 would need physical links between each other to establish the IRF virtual device. Also, it is typically recommended that each device connected to an IRF virtual device with two members connects to both members, which is not feasible for this multi-data center scenario. D is incorrect because IS-IS graceful restart is not a default router redundancy protocol. Instead it helps the IS-IS feature restart more seamlessly when a standby management module or an IRF member takes over during a failover situation.

16. B is correct. Switch 1 and Switch 2 are already set up as EVI Discovery Protocol Servers (ENDSs). Switch 4 needs to be set as a client to their IP addresses. It can then register itself to the ENDS and learn about the servers and other clients.
A, C, and D are incorrect. A is incorrect.
The client command is for setting the IP address of an ENDS, and Switch 4 is not an ENDS. C is incorrect. Only Switch 4’s IP address should be the source address. D is incorrect. ENDP does not use multicasts. Selective MAC flooding lets the EVI switches send multicasts across EVI links even if the multicast address has not been discovered by IGMP.

17. D is correct. PE-1 must know a route to PE-3’s loopback interface address in order to add an LSP for it and to establish a remote LDP session with it. The exhibit shows that PE-1 has learned other MPLS switch loopback addresses with OSPF but not PE-3’s loopback address. Therefore, the administrator should check that PE-3 is correctly configured to advertise this address with OSPF.
A, B, and C are incorrect. A is incorrect; Layer 2 VPNs might be part of the overall solution, but enabling this feature does not help PE-1 to establish the remote LDP session. B is incorrect. Changing the LDP mode on PE-1 can affect which LSPs this switch generates but not which LSPs it accepts. C is incorrect. LDP does not need to be enabled on the PE-3 loopback interface; PE-3 can still generate an LSP for the loopback address. The problem in this scenario is that the MPLS switches are not adding the LSP to 10.255.3.3 to their LFIBs because they do not know a route to the address.

18. C is correct. The inner label ID identifies the pseudo wire, and the outer label ID identifies the LSP for reaching the remote peer.
A, B, and D are incorrect. These answers do not correctly define the labels.

19. A is correct. EVI uses IS-IS to forward information about MAC addresses and maintain the links. Graceful restart for EVI IS-IS enables an IRF member to take over when the IRF master fails without disrupting these functions.
B, C, and D are incorrect. B is incorrect because it does not relate to the IRF functionality. You would take advantage of the hello blocking feature to let a device in each data center act as the active VRRP master. C is incorrect because BFD MAD helps to prevent issues if an IRF link fails, not if the master fails, which is the requirement in this scenario. D is incorrect. ARP suppression helps to make the solution more efficient, but does not affect the failover when an IRF master fails.

20. A is correct. The switch maintains one MAC forwarding table for each VSI.
B, C, and D are incorrect. B is incorrect. Depending on the setup, multiple VLAN IDs on the customer side might be associated with a VSI. But the switch maintains just one MAC forwarding table for the VSI. (If you wanted to isolate the VLANs, you would need to associate them with different VSIs.) C is incorrect. The switch does not maintain separate MAC forwarding tables for different remote LDP peers. (It needs to use the table to track the correct remote LDP peer to which to send traffic.) D is incorrect for the same reason as C. In addition, the same remote peer might be specified in different VSIs, but each VSI has its own MAC forwarding table.
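Several of the correct answers above correspond to short device configurations. The snippet below is a hedged sketch only: it follows general Comware 7 CLI conventions, but exact keywords vary by platform and software release, and the interface numbers and VSI name are hypothetical.

```
# Answer 1 (sketch): carry non-TRILL traffic into the TRILL region on an access port.
interface Ten-GigabitEthernet1/0/1
 trill enable
 trill link-type access
#
# Answer 6 (sketch): one service instance with default encapsulation maps
# all traffic on the interface into a single VSI.
interface GigabitEthernet1/0/1
 service-instance 1
  encapsulation default
  xconnect vsi VPLS-A
#
# Answer 16 (sketch): the new switch registers as an ENDP client with the
# IP addresses of the existing neighbor-discovery servers.
evi neighbor-discovery client enable 10.0.1.1
evi neighbor-discovery client enable 10.0.2.2
```

Treat these fragments as study aids for the answer rationale, not as a verified configuration; consult the configuration guide for your specific switch series before use.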
Appendix 1 Enhanced IRF

EXAM OBJECTIVES
In this appendix, you learn to:
✓ Describe the Enhanced IRF feature.
✓ Describe use cases for Enhanced IRF.
✓ Describe the Enhanced IRF operation and components.
✓ Configure Enhanced IRF.
INTRODUCTION
This appendix explores Enhanced IRF (EIRF). You will learn about the advantages, operational aspects, and connectivity requirements of EIRF. Unicast and multicast traffic handling mechanisms are then explored, followed by a discussion of EIRF configuration tasks. You will not be tested on the material in this appendix. The information is provided for your reference.
Enhanced IRF Overview
This appendix includes an introduction to EIRF, along with a discussion on EIRF operation and concepts. Configuration is also reviewed.
Network Virtualization Types
In the context of this appendix, virtualization refers to representing multiple physical network devices as a single unit. See Figure A1-1 for the network virtualization types.
Figure A1-1: Network Virtualization Types
One method of achieving this objective is to configure multiple chassis as members of an IRF group. IRF enables two or more physical chassis to be administratively managed and configured as one device. This IRF group is perceived as one unit by other devices through the use of multi-chassis link aggregation with LACP. Another method to achieve this functionality is 1:N virtualization, in which multiple logical entities are hosted on one device. Examples include the definition of multiple VLANs in a single switch, multiple VRFs defined inside a single router, and MDC, in which multiple virtual switches are defined inside a single network entity. Both methods can be used together to provide complete network virtualization. IRF can combine two physical switches into one logical entity. This IRF entity can then be configured to support several MDCs, in which multiple VLANs can be defined. To route between them, a VRF can be deployed in each of the MDCs. In this way, N:1 and 1:N device virtualization types can be combined.
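The N:1 plus 1:N layering described above can be sketched in Comware-style CLI. This is a hedged illustration: the MDC name, tenant name, interface ranges, and addresses are hypothetical, and exact syntax differs by switch series and Comware release.

```
# 1:N (sketch): create an MDC on the (already IRF-virtualized) system and
# hand it a range of physical interfaces.
mdc TenantA
 allocate interface Ten-GigabitEthernet1/0/1 to Ten-GigabitEthernet1/0/8
 mdc start
#
# From inside the TenantA MDC (sketch): define VLANs and a VRF to route
# between them, completing the N:1 + 1:N combination.
ip vpn-instance TENANT-A
 route-distinguisher 65000:1
vlan 10
vlan 20
interface Vlan-interface10
 ip binding vpn-instance TENANT-A
 ip address 10.1.10.1 24
```

The point of the sketch is the nesting order: the IRF fabric (N:1) hosts the MDCs (1:N), and each MDC can in turn host its own VLANs and VPN instances.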
Evolution of N:1 Virtualization
Traditional network design could involve the use of two core switches with separate redundancy mechanisms for Layers 2 and 3. The Layer 2 redundancy was provided through STP. While each access switch might be dual-homed to upstream cores, one of those links would be placed in a blocking mode to prevent loops. The dual connections provided active-standby redundancy: the redundant links did not help to carry traffic load.
For Layer 3 redundancy, VRRP could be used. With VRRP, one core switch would assume the master role, actively routing traffic. Another device would assume a standby role to provide redundancy. This standby device can detect an outage and assume the master role if needed. This is another active-standby redundancy mechanism.

IRF provided a significant evolution from previous techniques by grouping multiple switches into a single entity. With IRF, multiple core switches can be bundled together, as can access switches. Separate core and access IRF groups can be interconnected using LACP, which bundles multiple physical links into a single logical link. This greatly simplifies the network design, while providing active-active data and control plane redundancy, as opposed to the active-standby mechanisms provided by STP and VRRP.

IRF also improved the management plane. If four physical chassis are grouped into a single IRF group, then those four devices are all managed as a single unit. If nine members are bundled into an IRF group, there is then only one-ninth of the administrative configuration and management overhead. However, each IRF system must still be individually managed. For example, if 200 switches are configured into multiple four-member IRF groups, then 50 logical IRF units must be individually managed.

While IRF provides significant advantages over traditional networking, room for improvement remains. The next generation of virtualization is Enhanced IRF (EIRF). This technology provides the same active-active redundancy for data and control planes as IRF. The evolution is that switches of multiple design layers can now be grouped into a single, logical entity. IRF provides a horizontal grouping of identical devices. This means that IRF can group access switches into a logical entity, and it can group core switches into a separate logical entity. EIRF can provide vertical grouping of both access and core switches into a single entity.
With EIRF, a complete virtualized network can now be managed from a single IP address, with complete active-active device and path redundancy. The number of nodes to be managed is reduced to 1/30th, or possibly 1/100th, depending on the number of devices converged. VLAN configuration is greatly simplified. In the IRF scenario described above, 200 physical switches were combined into 50 IRF groups. This means that a VLAN addition or modification requires 50 configuration sessions: one for each IRF group.
With EIRF, the same task can be accomplished with a single configuration, since the entire network is perceived as a single device. Advantages are also realized with physical inter-switch cabling. With IRF, cables are required between the different members of the same IRF system. Access switches require multiple uplink connections to the core, as well as 10Gbps IRF connections between members. With EIRF, these horizontal, intra-member connections are not required at the access layer.

Also, EIRF greatly reduces the need for LACP configurations. LACP is not required to interconnect the members of an EIRF group. It is only used to connect LACP-capable endpoint devices to access layer switches. EIRF simplifies and improves the efficiency of MAC address learning operations, and also eases the migration to SDN. These advantages are the result of EIRF’s ability to collapse the entire network into a much smaller number of logical entities. Figure A1-2 displays the evolution of N:1 virtualization.
Figure A1-2: Evolution of N:1 Virtualization
EIRF

EIRF is based on IRF, which is a mature, proven technology. Traditional IRF can only group identical switch-series models and types. EIRF provides a complete N:1 virtualization technology to collapse different device types at different deployment layers into a single logical entity. With EIRF, various models of core and access
switches can each be perceived as the individual line cards of a single, centrally managed system. In the example at the top of Figure A1-3, IRF was used to group two identical chassis-based core switches into a single entity. Two identical fixed-port access switches were also paired into an IRF group, and another pair was also configured to act as a single IRF entity.
Figure A1-3: EIRF
In the bottom example of Figure A1-3, the same devices are now all integrated into a single EIRF. They are centrally managed by the chassis switches as if they were remote line cards. The entire system is perceived as one massive chassis device with multiple line cards.
Endpoint devices can be connected to two separate access switches to maximize redundancy. This is the same as connecting endpoints to different line cards of a single physical device. The endpoint’s dual interfaces can simply be grouped together using traditional link aggregation. The connections between access switches are no longer needed, and so are removed. All communication between access switches now passes through the chassis switches.
Benefits of EIRF

The single logical system created by EIRF greatly simplifies the network topology. As shown in Figure A1-4, there is now only one entity to manage and configure. This reduces repetitive configuration tasks and reduces configuration complexity.
Figure A1-4: EIRF
For example, VLANs need only be configured once, as opposed to once for each of 200 physical switches, or 50 IRF systems. Redundancy protocol configurations such as VRRP and STP are no longer necessary, further simplifying the configuration. Firmware management is also streamlined. All access switch members are automatically provisioned with the correct firmware by the main core switches.
EIRF in the Data Center

Figure A1-5 shows a deployment of three logical EIRF fabrics. The main, chassis-based core switches in each fabric act as a Controlling Bridge (CB) device. These distribution-layer CBs terminate all the top-of-rack switch connections.
Figure A1-5: EIRF in the Data Center
Each EIRF fabric operates as a single logical device that is dual-homed to the Layer 3 core network. This routing core could be individual routing devices, or an IRF system. They could be running the routing protocol of choice, such as OSPF, BGP, or IS-IS. The Top-of-Rack access switches provide the EIRF Port Extender (PE) function, since they extend the number of available ports in the EIRF fabric. Typically, servers will be dual-homed to two PEs to ensure high availability.
EIRF in the Campus

Although the focus of this study guide is on data centers, it should be noted that EIRF can also be deployed in a campus setting. In the scenario in Figure A1-6, the entire campus network becomes a single logical device. The CBs consist of an IRF fabric at the core layer. These core devices provide Layer 3 forwarding, and act as the default gateway for endpoints.
Figure A1-6: EIRF in the Campus
The access switches act as PEs, providing High Availability groups for endpoint connectivity. Although not shown in Figure A1-6, it is also possible to connect other switches to the PEs using link aggregation. This would provide an active-active redundancy mechanism for these switch connections.
EIRF Operation Overview

This section describes EIRF operation, beginning with a discussion of terminology and components. This will lead to a more detailed look at Port Extender (PEX) ports and CB-PE connectivity. This is followed by an overview of CB chassis-based switches and Top-of-Rack models. You will also learn about various PE forwarding modes, port numbering systems, and the EIRF split-brain condition.
Terminology and Components

The CB is the highest layer of an EIRF fabric, and must be a chassis-based or high-performance fixed-port device. Chassis that support EIRF include the 11900 and 12900. Fixed-port devices include the 5900/5930 models. The EIRF fabric perceives one logical Controlling Bridge. For redundancy, two physical chassis can be grouped into an IRF system.

The PE is the lowest layer in the EIRF model, deployed as a fixed-port device. Chassis-based switch models cannot be used as PE devices. Various models are available to meet different performance and scalability requirements.

PEs are connected to CBs with a Port Extender (PEX) port. This is a logical connection, similar to a traditional IRF system’s logical IRF port. This logical PEX port can be formed with multiple physical interfaces. To maximize redundancy, sets of physical interfaces can be connected to different CBs.
In Figure A1-7, the CBs are depicted as high-performance, chassis-based devices. Each PE has multiple physical connections to the CBs. This set of connections is configured as a single logical PEX port.
Figure A1-7: Terminology and Components
EIRF Implementation: PEX Port

Multiple physical interfaces will be assigned and automatically aggregated into a logical PEX port. As shown in Figure A1-8, the cabling for PEX interfaces can be fiber-optic connections or dedicated direct-attached cables. These will be 10Gbps or 40Gbps connections.
Figure A1-8: EIRF Implementation: PEX Port
No direct links between the PE devices are allowed. PEs can only have physical connections with CBs. Multi-chassis link aggregation is not required for this connectivity, but is recommended to maximize redundancy. As each PE comes online, it appears as an interface line card on the CB. When the PE goes offline, it appears as if the line card was removed. For this reason, PEX port connections can be thought of as similar to the chassis backplane connection in a traditional chassis-based switch.
EIRF Implementation: CB-PE Registration Flow
Figure A1-9 shows the CB-PE registration flow used to establish the EIRF fabric. In the first step, the PE device is manually converted to operate in PEX mode via a BOOTROM menu option, and then reboots. Meanwhile, the CB periodically sends EIRF hello packets out all PEX ports. When the PE device comes online, it receives these packets and requests a Slot ID. The CB receives the request and responds with a Slot ID assignment.
Figure A1-9: EIRF Implementation: CB-PE Registration Flow
In the next step, the PE determines if it is running the correct firmware. If not, then boot and system images are downloaded from the CB, and saved locally on the PE device. The next time the PE boots it will already have the correct images. Once booted, with images validated, the PE can register itself, and the CB pushes the configuration down to it, as if configuring a chassis-based line card.
EIRF Implementation: CB-PE Connection

A logical PEX port is manually created on the CB to manage each PE. One logical PEX port is defined for each physical PE device. Each PEX port is assigned a virtual Slot ID and physical interfaces are bound to it. Optionally, a description can be configured for the PEX port. This would typically describe the connected PE device. As shown in Figure A1-10, multiple physical CB-to-PE connections can be assigned to one logical PEX port. At least two connections should be provisioned for redundancy, with additional connections added as needed to meet bandwidth requirements. A hash-based load balancing algorithm is used to distribute traffic over
the physical PEX port connections.
Figure A1-10: EIRF Implementation: CB-PE Connection
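The hash-based distribution described above can be sketched as follows. This is an illustrative model only: the fields hashed and the algorithm used by the actual switch ASICs are not documented here, so hashing a CRC32 over the source/destination MAC pair is an assumption.

```python
import zlib

def select_pex_link(src_mac: str, dst_mac: str, links: list) -> str:
    """Deterministically map a flow onto one physical link of a PEX port."""
    key = zlib.crc32((src_mac + dst_mac).encode())  # flow identifier
    return links[key % len(links)]                  # same flow -> same link

# Two physical CB-PE connections bound to one logical PEX port:
links = ["Ten1/0/1", "Ten2/0/1"]
chosen = select_pex_link("0000-1111-2222", "0000-3333-4444", links)
assert chosen in links
# Frames of the same flow always take the same physical link:
assert chosen == select_pex_link("0000-1111-2222", "0000-3333-4444", links)
```

The point of the hash is that each flow stays on one member link (preserving frame order) while different flows spread across all the links of the PEX port.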
A given PE’s uplinks can only be connected to physical CB interfaces that have been assigned to the same logical PEX port. If a CB has an interface assigned to PEX port1 and another interface assigned to PEX port2, then a PE should not be connected to both of these interfaces. Instead, assign two or more physical ports to be in PEX port1, and connect a PE’s interfaces to that. This is very similar to how traditional IRF physical ports should be mapped to the same IRF logical port. If a cabling mistake breaks this rule, it will be automatically detected and the interface will be placed in a blocked state by the EIRF protocol.
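The PEX-port membership rule can be modeled as a simple validation check. This is a toy sketch, not HP's implementation; which interface ends up blocked when the rule is violated is an illustrative assumption.

```python
def check_pe_cabling(cb_port_to_pex: dict, pe_uplinks: list) -> dict:
    """Return per-uplink state: all uplinks from one PE must land on CB
    interfaces bound to the same logical PEX port, else some are blocked."""
    pex_ports = {cb_port_to_pex[u] for u in pe_uplinks}
    if len(pex_ports) == 1:
        return {u: "forwarding" for u in pe_uplinks}
    # Mixed PEX-port membership: keep the first uplink's PEX port and
    # block the rest (this tie-break is an illustrative assumption).
    keep = cb_port_to_pex[pe_uplinks[0]]
    return {u: ("forwarding" if cb_port_to_pex[u] == keep else "blocked")
            for u in pe_uplinks}

# CB interface -> logical PEX port bindings:
mapping = {"Ten1/0/1": "pex1", "Ten2/0/1": "pex1", "Ten1/0/2": "pex2"}
# Correct cabling: both uplinks land on pex1 members.
assert check_pe_cabling(mapping, ["Ten1/0/1", "Ten2/0/1"])["Ten2/0/1"] == "forwarding"
# Cabling mistake: one uplink lands on a pex2 member and is blocked.
assert check_pe_cabling(mapping, ["Ten1/0/1", "Ten1/0/2"])["Ten1/0/2"] == "blocked"
```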
EIRF Implementation: Physical Port States

The three physical port states for an EIRF implementation are forwarding, down, and blocked.
■ Forwarding: The interface is up, properly connected, and data can be sent over the link.
■ Down: The physical link is disconnected, and traffic cannot be forwarded.
■ Blocked: The physical link is in an error condition. This can be due to misconfiguration or a cabling error. This is likely because the PE interface is connected to a physical CB port that is a member of a different logical PEX port than its other interfaces.
EIRF Implementation: PE Identification – New PE

New PE identification can be compared to traditional IRF, which uses a member ID concept to identify devices. With traditional IRF, this member ID is used as a part of the port numbering scheme for the IRF fabric. IRF member ID changes require a reboot to take effect.

With EIRF, the virtual slot ID is used to identify the PE. This ID must be unique within each EIRF fabric. The virtual slot ID replaces the original slot ID portion of the device’s port numbers. Unlike IRF, the virtual slot ID takes effect immediately after it has been configured. There is no need to reboot the PE for this purpose.

In Figure A1-11, a new PE has been added to the EIRF fabric. This can occur while the EIRF is in operation, without impact to existing traffic flows. The CB automatically computes the topology to prevent loops. After successful topology validation, Hello packets are sent over the new links, which are moved into a forwarding state. This is similar to when new line cards are installed into a traditional chassis, which also has no impact on switch operations.
Figure A1-11: EIRF Implementation: PE Identification – New PE
EIRF Implementation: PE Removal

PE removal is similar to board removal in a traditional chassis. The PE configuration is removed from the running configuration, but maintained in the operational configuration. The visible output of display commands will not include the configuration for the removed PE.
Should the running configuration be saved after PE removal, the saved configuration will no longer include the configuration for the removed PE. This is not an issue because the complete operational configuration will remain. If the PE device is replaced by a new device it will simply inherit the original configuration from the previous PE. However, upon PE removal, if you save the configuration and then reboot the EIRF fabric, the removed PE device’s configuration would be lost, and the interfaces would be added to the EIRF in their default configuration states.
EIRF Implementation: CB Chassis Option

CB devices can be deployed using higher-end Top-of-Rack switches or with chassis-based devices. With the chassis-based option, the CB is configured as a two-member IRF system. Each chassis would typically have two MPUs. Of the four total MPUs, one will be the master, and the other three will be standby MPUs. The three standby MPUs remain fully synchronized with the master MPU. This is simply a traditional IRF configuration between the two chassis-based switches. In Figure A1-12, PE1-4 will be connected and brought online as if they are line cards on this IRF chassis. The line card numbering is different from a traditional chassis deployment.
Figure A1-12: EIRF Implementation: CB Chassis Option
EIRF Implementation: CB Fixed-Port Option
As shown in Figure A1-13, the CB can be deployed using two fixed-port devices, configured as an IRF fabric. Each fixed-port device has a single MPU. One of the devices acts as the master MPU and the other acts as a standby MPU. The EIRF fabric will have the same single point of management as a chassis-based solution.
Figure A1-13: EIRF Implementation: CB Fixed-Port Option
Each PE device will be added, and become visible as an additional member device of the IRF system. Unlike the CBs, the PEs will not receive a full synchronization of the control plane.
EIRF Implementation: Forwarding Modes

PEs can operate in one of two different forwarding modes, as shown in Figure A1-14. One option is to configure a PE to operate in central forwarding mode, in which the CB processes all traffic. Another option is local forwarding mode, which allows each PE to process traffic locally.
Figure A1-14: EIRF Implementation: Forwarding Modes
EIRF Implementation: Central Forwarding Mode

Central forwarding mode is appropriate when you have relatively lower-performance PEs. This does not necessarily mean low bandwidth capability. It could simply mean that the ASICs are not designed to handle complex network protocols. As shown in Figure A1-15, in this mode, PEs simply forward all traffic to the CB. The CB makes all forwarding decisions, selects an egress interface, and traffic is forwarded out that interface.
Figure A1-15: EIRF Implementation: Central Forwarding Mode
EIRF table sizes are limited by CB device maximums. For example, a PE device may have a maximum MAC address table limitation of 60,000 entries, and its CB may support 200,000 entries. The maximum MAC address capability of the EIRF fabric is 200,000 MAC addresses. The PE limitation is not relevant, since it merely acts as a “dumb” line card. All decisions are made by the CB. Effectively, PE interfaces have the same feature set as CB interfaces, since they are all controlled by the same MPU. If the CB ASICs support features such as MPLS and TRILL, then the PE interfaces also support these features. In this way, relatively inexpensive PEs can acquire the advanced features inherent to the more capable CB devices.
EIRF Implementation: Local Forwarding Mode

Local forwarding mode is most appropriate when you have high-performance PE devices, with advanced ASICs that can handle the features you need. As shown in Figure A1-16, in this scenario, the CB creates and maintains all Layer 2
and Layer 3 forwarding entries. These table entries are replicated to the PEs, enabling them to perform local forwarding functions.
Figure A1-16: EIRF Implementation: Local Forwarding Mode
Each PE can therefore perform a local table lookup. If a received frame’s destination is out a local PE port, the PE can forward the traffic autonomously, without CB involvement. If the destination’s outgoing port is not local to the PE, it forwards the frame upstream to the CB, which acts as a fabric interconnect. Local forwarding mode requires an extensive ASIC feature set that is only available on higher-end PE device models.
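The local-versus-central decision can be sketched as follows. The table layout, MAC notation, and function name are illustrative assumptions; the real lookups happen in ASIC hardware.

```python
def forward(dst_mac: str, l2_table: dict, local_ports: set):
    """Return ('local', port) when the PE can switch the frame itself,
    or ('to-cb', port) when the CB must act as fabric interconnect."""
    egress = l2_table.get(dst_mac)   # replicated from the CB's tables
    if egress in local_ports:
        return ("local", egress)     # forwarded without CB involvement
    return ("to-cb", egress)         # egress port lives on another PE

# Toy replicated table and the local port set of one PE:
l2_table = {"0000-aaaa-0001": "Gi100/0/1", "0000-aaaa-0002": "Gi101/0/5"}
local_ports = {"Gi100/0/1", "Gi100/0/2"}
assert forward("0000-aaaa-0001", l2_table, local_ports) == ("local", "Gi100/0/1")
assert forward("0000-aaaa-0002", l2_table, local_ports)[0] == "to-cb"
```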
EIRF Implementation: Unicast 1

Figure A1-17 shows a unicast forwarding scenario in centralized forwarding mode. When a PE device receives a traditional Ethernet frame, it does not perform table lookups. The PE adds an EIRF header to the frame and sends it to the CB. The CB removes the EIRF header, and adds the source MAC address to its table with its associated inbound source port. This is very much like classic Ethernet MAC
address learning.
Figure A1-17: EIRF Implementation: Unicast 1
The CB then performs a table lookup. It compares the Ethernet frame’s destination MAC address to its learned MAC address table to determine the appropriate outbound port.
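The CB's learn-then-lookup behavior is essentially classic Ethernet switching, which can be modeled minimally (class and port names are illustrative, not HP data structures):

```python
class CbMacTable:
    """Toy model of the CB's MAC table in central forwarding mode."""

    def __init__(self):
        self.table = {}

    def learn(self, src_mac: str, ingress_port: str):
        # Source MAC is bound to the port the frame arrived on.
        self.table[src_mac] = ingress_port

    def lookup(self, dst_mac: str):
        # Known destination -> egress port; unknown -> None (flood, not shown).
        return self.table.get(dst_mac)

cb = CbMacTable()
cb.learn("0000-aaaa-0001", "pex1")
cb.learn("0000-aaaa-0002", "pex2")
assert cb.lookup("0000-aaaa-0001") == "pex1"
assert cb.lookup("0000-bbbb-0001") is None
```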
EIRF Implementation: Unicast 2

The CB could process the traffic using a typical Layer 2 table lookup, or it might process the frame using Layer 3 routing or MPLS, depending on its configuration. Either way, it makes a forwarding decision, selecting the appropriate PEX port toward the destination PE. As shown in Figure A1-18, an EIRF header is added to the frame before transmission. This header indicates the appropriate outgoing port on the PE device. The PE receives this frame, removes the EIRF header, and forwards the traffic out the egress interface indicated in the header.
Figure A1-18: EIRF Implementation: Unicast 2
EIRF Implementation: Multicast/Broadcast 1

For multi-destination traffic, the inbound traffic from PE to CB is processed the same as unicast traffic. As shown in Figure A1-19, the PE receives the frame, adds an EIRF header, and sends it upstream to the CB. The CB removes the EIRF header, and performs address learning.
Figure A1-19: EIRF Implementation: Multicast/Broadcast 1
It is the downlink traffic, from CB to destination PE, where a multicast replication operation occurs.
EIRF Implementation: Multicast/Broadcast 2

As shown in Figure A1-20, if the CB receives a broadcast frame, it sends one copy of the traffic to each PE. Broadcast frames are received by each PE and forwarded out all ports in the destination VLAN.
Figure A1-20: EIRF Implementation: Multicast/Broadcast 2
If the CB receives a multicast frame, it parses its multicast routing table, and sends one frame copy to each PE that participates in the multicast group. This could include the PE that originally received the frame inbound from an endpoint. Each PE that receives a multicast frame uses traditional multicast table lookups to determine which of its ports are connected to multicast members. The PE then forwards the frame out all of those ports.
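The CB's downstream replication logic can be sketched as follows, with illustrative PE names and group membership data:

```python
def replicate(frame_type: str, group, all_pes: set, group_members: dict):
    """Return the list of PEs that receive a copy of the frame.
    Broadcast: every PE; multicast: only PEs with group members."""
    if frame_type == "broadcast":
        return sorted(all_pes)
    return sorted(group_members.get(group, set()))

pes = {"pe1", "pe2", "pe3"}
members = {"239.1.1.1": {"pe1", "pe3"}}  # hypothetical group membership
assert replicate("broadcast", None, pes, members) == ["pe1", "pe2", "pe3"]
assert replicate("multicast", "239.1.1.1", pes, members) == ["pe1", "pe3"]
```

Each receiving PE then performs its own lookup to pick the member-facing ports, as described above.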
IRF and EIRF: PE-CB Connections

Figure A1-21 provides several examples of PE-CB connection methods. The recommended configuration is to create a two-chassis IRF system for the CB. Each PE should have at least one physical link to each physical IRF member. To optimize redundancy, LACP port groups can be attached to each CB member.
Figure A1-21: IRF and EIRF: PE-CB Connections
The second scenario on the top row shows multiple ports aggregated to a single, physical CB member. This is technically possible and supported, but redundancy and scalability are compromised. The third scenario is also supported, but even less redundant, since only a single physical link provides the PE-CB connection. The bottom row in Figure A1-21 shows various invalid configurations. Direct PE-to-PE connections are invalid and unsupported. It is not acceptable to group PE devices using IRF, and then connect that IRF system to the CB. Regardless of physical PE-CB link configurations, only single, physical PE devices can be connected to the CB.
EIRF Port Numbering

When fixed-port switches are used to form a traditional IRF group, they use a three-level port numbering schema. These levels are the slot (or IRF member ID), the sub-slot, and the port.
Note

In a fixed-port switch, the sub-slot number will usually be zero, unless the switch supports additional modules. These modules are typically installed into the back of the switch.

For example, a typical interface name for a fixed-port CB switch is TenGi1/0/1. Parsing the numbers from left to right, this port is physically located on physical IRF member 1, in sub-slot 0, port number 1. Interface TenGi2/0/1 is located on IRF
member 2, in sub-slot 0, port number 1.

When chassis-based switches are deployed into an IRF group, they use a four-level numbering schema – IRF member-ID, slot, sub-slot, and port number. For example, consider the interface named TenGi1/2/0/1. For this interface, the first “1” indicates that this port is located in IRF member-ID 1, with a line card installed into slot number 2 of the chassis, sub-slot number 0, port number 1.
Note

Most chassis-based line cards do not support a sub-slot, and so the number will usually be a zero. Some router-based line cards support the installation of modules. In this case, the sub-slot number indicates the appropriate module slot in the line card.

The port numbering schema used by traditional IRF is also used by EIRF.
EIRF Port Numbering Fixed-Port CB

Currently, fixed-port, CB-capable switches include the 5900 and 5930 series. EIRF port numbering for the CB switches is identical to that used for traditional IRF. Interface TenGi1/0/1 is the first, or left-most, port in a physical switch that has a member ID of 1. As previously noted, the sub-slot is typically 0. As an administrator, you can easily determine whether a port is physically installed in a CB or a PE device, as shown in Figure A1-22. IRF ID numbers assigned to CB devices will be in the range of 1 through 9, and PE devices start at 100.
Figure A1-22: EIRF Port Numbering Fixed-Port CB
For example, a 5900G model could be deployed as a PE device. Its interfaces could be referred to as Gi100/0/1, Gi100/0/2, and so on. A 5900XG could be installed as another PE in this fabric, with ports TenGi101/0/1, TenGi101/0/2, and so on.
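This fixed-port numbering convention lends itself to a small parsing sketch. The regular expression and the CB/PE classification by member-ID range follow the rules above for fixed-port fabrics; the function name is hypothetical.

```python
import re

def parse_interface(name: str) -> dict:
    """Split a three-level interface name (member/sub-slot/port) and
    classify the device: member IDs 1-9 are CBs, PEs start at 100."""
    m = re.match(r"[A-Za-z]+(\d+)/(\d+)/(\d+)$", name)
    member, subslot, port = (int(x) for x in m.groups())
    role = "CB" if member <= 9 else "PE"
    return {"member": member, "subslot": subslot, "port": port, "role": role}

assert parse_interface("TenGi1/0/1")["role"] == "CB"
assert parse_interface("Gi100/0/2") == {"member": 100, "subslot": 0,
                                        "port": 2, "role": "PE"}
```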
EIRF Port Numbering Chassis CB

Examples of chassis-based CBs include the 11900 and 12900 series. These CB devices use the same four-level schema as previously described for IRF. A difference between IRF and EIRF is in the number of devices supported. While up to four chassis-based switches can be deployed in a single IRF group, EIRF supports a maximum of two. The first number in the four-level numbering scheme is always the virtual chassis ID, and there can be two chassis-based CBs in an EIRF fabric. Therefore CB port numbers will start with either 1 or 2. Figure A1-23 shows two port number examples:
TenGi1/2/0/31, and TenGi2/2/0/31.
Figure A1-23: EIRF Port Numbering Chassis CB
The EIRF fabric’s PE devices use this same four-level identification method. The first number in every PE interface name is always 9. This is the virtual IRF node number assigned to all PEs connected to chassis-based CBs. This virtual IRF node is always online, independent of IRF chassis ID 1 or 2. Each PE is uniquely identified by the slot ID, starting with 1. The first PE may have interfaces Gi9/1/0/1 – Gi9/1/0/24. The third PE could have interfaces TenGi9/3/0/1 – 24.
IRF and EIRF: Network Access for Servers

There are several options for connecting server endpoints to the EIRF fabric. The best practice is to maximize redundancy by dual-homing the server to two physical PEs, as shown in Figure A1-24. The connections to each PE are accomplished with multiple physical links in a port aggregation group.
Figure A1-24: IRF and EIRF: Network Access for Servers
Another option is to configure a port aggregation group to a single PE. This is supported, but of course redundancy is compromised. There is link redundancy due to port aggregation, but there is no device redundancy. You could also decide to connect a server using a single physical link. Again, this is supported, but there is neither link nor device redundancy in this scenario.
IRF and EIRF: Network Access for Servers Example

Figure A1-25 provides an example of configuring EIRF fabric ports to accommodate server access. This example shows ports from multiple physical PE devices being configured for server connectivity. This configuration is the same as that used to configure Multi-Chassis Link Aggregation for a traditional IRF deployment.
Figure A1-25: IRF and EIRF: Network Access for Servers - Example
In the example, LACP link aggregation is configured by first defining a logical Bridge-Aggregation interface 201, and then configuring it for dynamic negotiation mode. Next, interfaces ten101/0/1 and ten102/0/1 are configured as members of this logical link aggregation group. From this configuration, you should be able to surmise two key facts. One is that this configuration is being performed on an EIRF fabric built using fixed-port CB devices. Recall that with fixed-port CBs, PE port numbers use a three-level scheme, with member IDs starting at 100. Also, the LACP member ports are on different physical switches, since one port is identified by slot ID 101, and the other by a unique slot ID of 102.
IRF and EIRF: Split Brain Situation 1

A split brain condition, illustrated in Figure A1-26, occurs when an IRF fabric is broken due to link failures. Both portions of the split fabric use the same IP address, causing routing issues and IP address conflicts.
Figure A1-26: IRF and EIRF: Split Brain Situation 1
The way to mitigate this condition is to configure Multi-Active Detection (MAD). This feature is configured on the CB devices, enabling them to notify their connected PEs that a split has occurred. Each PE receives these messages and determines the number of members in each IRF fragment to which it is attached. It joins the partial fabric with the most members. If fabric sizes are identical, it joins the fabric with the lowest master ID. The PE blocks ports connected to the other IRF fragments and continues to function. There are three variations of MAD – LACP MAD, Bidirectional Forwarding Detection (BFD) MAD, and ARP MAD. LACP MAD is the recommended method for EIRF fabrics. Detailed discussion of these methods is not covered in this study guide.
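The PE's fragment-selection tie-break described above can be expressed compactly. Representing each fragment as a (member_count, master_id) tuple is an illustrative assumption.

```python
def choose_fragment(fragments: list) -> tuple:
    """Pick the fragment a PE joins after a split: most members first;
    on a tie, the fragment with the lowest master ID."""
    return min(fragments, key=lambda f: (-f[0], f[1]))

# More members wins outright:
assert choose_fragment([(2, 5), (3, 9)]) == (3, 9)
# Equal sizes: the lowest master ID breaks the tie.
assert choose_fragment([(2, 5), (2, 1)]) == (2, 1)
```

Ports toward the losing fragments would then be blocked, as described above.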
Note

For more detailed information on MAD, see the Virtual Connect and HP A-Series switches IRF Integration Guide. http://h30507.www3.hp.com/hpblogs/attachments/hpblogs/143/797/1/VC-IRFintergration-white-paper-v2.pdf
Learning Activity: EIRF Operation Review

Write the letter of the descriptions in the right-hand column in the space provided under each numbered EIRF component in the left-hand column.
Learning Activity: Answers

1. CB
Highest layer of EIRF fabric (m)
Chassis or high-performance fixed-port device (f)
Can be two physical chassis grouped into an IRF (a)

2. PE
Lowest layer in EIRF (c)
Fixed-port device (d)
No direct connections between them (e)
Switches must be manually converted via BOOTROM menu and then rebooted (g)
Each one requires a separate PEX port, can’t be connected to two PEX ports (k)
Identified by a virtual slot ID (h)

3. PEX port
Logical port between PE and CB (i)
Can contain multiple physical ports (j)
Physical ports are aggregated automatically (l)
Uses 10Gbps or 40Gbps connections (n)
Manually created on the CB (b)
Design Considerations

Regarding cabling, PEs should only have connections to CBs, and never to each other. A PE-to-PE connection would cause a loop in the logical EIRF device. Only one layer of PE devices is supported in an EIRF fabric. All PE devices must be directly connected to the CB. It is not possible to connect PE devices to an intermediate layer of PE devices, which then connect to the CB.

This in turn brings up an important cabling consideration related to port availability. For a deployment that includes eight access switches, each one should have at least two CB connections for redundancy, for a total of sixteen. CB devices must therefore have sixteen available ports to successfully deploy an optimal EIRF fabric.
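The port-availability arithmetic above can be captured in a small helper (hypothetical function, for planning estimates only):

```python
def cb_ports_needed(access_switches: int, uplinks_per_switch: int = 2) -> int:
    """CB ports required: each access switch (PE) needs at least
    uplinks_per_switch connections to the CB for redundancy."""
    return access_switches * uplinks_per_switch

# Eight access switches with two CB uplinks each need sixteen CB ports:
assert cb_ports_needed(8) == 16
```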
Design Considerations: Deployment Planning

Figure A1-27 summarizes a high-level deployment plan.
Figure A1-27: Design Considerations: Deployment Planning
The first step is to plan the network by identifying CB member devices, PEs, and PEX ports. Next, the CB devices should be virtualized into a single, logical IRF device. The third step is to configure the PEX links for each PE. Each port is assigned a virtual slot ID, and then physical interfaces are assigned to the logical
PEX port. This completes the basic CB configuration. The next step is to integrate PEs with the CB. The operating mode of each PE must be manually changed to PEX mode from the BootROM. Once PE ports are properly cabled and booted, it is a “Plug and Play” scenario. The EIRF protocol will automatically perform the software check, and ensure correct firmware is downloaded to each PE device. After any required download, each PE reboots and comes online as part of the EIRF fabric.
Configuration Steps for EIRF

Figure A1-28 reveals the high-level configuration steps for EIRF. This scenario uses two 5900-series devices as the CB, and a 5900 as a PE. Interconnections are via 10Gbps interfaces, with a 1Gbps link for server access.
Figure A1-28: Configuration Steps for EIRF
Step 1: Configure IRF for the CB Devices

The first step is to configure the two CB devices for traditional IRF. LACP MAD should also be configured. It is assumed that traditional switch configurations such as IRF are already understood.
Step 2: Prepare the PEX Firmware Image

Step 2 involves preparation of the PEX firmware image. This image must be
installed to flash on the CB. The PE will be able to download this PEX firmware image during the initial boot process. Once downloaded, this PEX firmware is saved in local flash on the PE. From that point the PE will boot from this local image as a member of the EIRF fabric. Figure A1-29 shows the normal firmware images required for the 5900 CB device. You can also see that images for the PE device have been copied to flash as well. The top two files listed are the boot and the system images for PE devices.
Figure A1-29: Step 2: Prepare the PEX Firmware Image
PEs download this image during initial startup, and then reboot using this new image. When they come back online they receive a Hello message from the CB. This Hello message helps the PE confirm that its PEX firmware version is correct, and that additional firmware downloads are not required.
Step 2: Prepare the PEX Firmware Image 2

Simply saving these PEX images to flash on the CB is not sufficient. The CB must be able to select which files to send to each PE switch model that comes online. Both the boot and system images will be bound to the PE device model. In the example in Figure A1-30, the images are bound to a PEX device model called PEX-5900. To do this, the CB command “boot loader” is used, along with a “pex” keyword, followed by the device model. This indicates that this boot loader syntax is not for the local CB device, but for 5900 PE devices that may come online.
Figure A1-30: Step 2: Prepare the PEX Firmware Image 2
The “display boot-loader pex” command can then be used to validate which image files will be used for which PE device models.
Step 3: Create a PEX Port with Virtual Slot-ID

The third step on the CB involves defining the PEX port. The maximum number of PEX ports that can be defined depends on the platform being deployed. If a PEX port is removed and the slot ID is changed, the PE will reboot. In the example in Figure A1-31, PEX Port1 is defined and a description is provided. The description could include the rack or device number of the PE device. The “associate 101” command assigns a virtual Slot ID to the PEX port. This assignment is locally significant, and will be reflected in the interface port numbering schema for the attached PE device.
Figure A1-31: Step 3: Create a PEX Port with Virtual Slot-ID
Step 4: Associate PEX Port with a Physical Interface

Now that the logical PEX port has been defined, physical interfaces can be bound to it. Multiple interfaces are required for redundancy. It is recommended that at least one interface from the PE be attached to each physical CB device member. The maximum number of interfaces that can be connected is device dependent.
When interfaces are assigned to the PEX port, they revert to their default configuration. In the example in Figure A1-32, PEX port 1 has been defined, and an interface from each physical CB IRF member is assigned to this logical port: interfaces Ten1/0/1 and Ten2/0/1.
Figure A1-32: Step 4: Associate PEX Port with a Physical Interface
The port group command shown here is identical to the command used for traditional IRF configuration. All the commands shown thus far have been issued on the CB.
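Assuming the interfaces from the example, the binding might be sketched as follows. The syntax mirrors the IRF-style port group command described above, but verify it against your platform's documentation:

```
# Bind one physical interface from each CB IRF member to PEX port 1
[CB] pex-port 1
[CB-pex-port1] port group interface Ten-GigabitEthernet1/0/1
[CB-pex-port1] port group interface Ten-GigabitEthernet2/0/1
```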
Step 5: Configure PE in PEX Mode Using Boot Menu

The next step is to configure PE devices to operate in PEX mode. This is accomplished through the boot menu of the PE device. As the PE device is booting, use the "Ctrl + B" option to break out of the normal boot process and enter the boot menu. Some switch models may use a different key sequence to break into the boot menu. As shown in Figure A1-33, you can use the "Change Work Mode" option in this menu to configure the switch to operate as a PE device. Once this is enabled, the switch will reboot. A fairly "plug-and-play" scenario exists at this point: the PE automatically becomes part of the EIRF fabric.
Figure A1-33: Step 5: Configure PE in PEX Mode Using Boot Menu
Step 6: Verify

The final step is to verify your configuration efforts. You can use the "display pex-port" command to display logical EIRF ports, status, and associated slot IDs, along with the status of the EIRF fabric. As shown in Figure A1-34, the "display pex-port verbose" command can reveal which physical ports are participating in a PEX port.
Figure A1-34: Step 6: Verify
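The verification commands from this step can be summarized as follows. The annotations reflect the behavior described in the text; exact output formats vary by platform and software release:

```
<CB> display pex-port            # logical PEX ports, status, and associated slot IDs
<CB> display pex-port verbose    # physical member ports participating in each PEX port
```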
Step 6: Verify 2
Figure A1-35 shows variations of “working-mode” commands. These commands reveal that the CB-attached PE device is operating in PEX mode, and is associated with slot ID 101.
Figure A1-35: Step 6: Verify 2
The "display device" command allows you to see which physical CB devices are operating as Master and Standby units in the CB IRF group. All PE devices will be listed as line cards in the Normal state. PE devices can never become master of the IRF.
Step 6: Verify 3

As shown in Figure A1-36, the "display log" option can be used to see any messages that may be related to EIRF.
Figure A1-36: Step 6: Verify 3
Summary

In this appendix, you learned:
■ EIRF is based on IRF – a mature, proven technology. While traditional IRF can only group like devices, EIRF can group various models at the access and core layers into a single, centrally managed system.
■ EIRF simplifies the network topology, reduces configuration tasks, decreases configuration complexity, and eases initial deployments.
■ An EIRF fabric is composed of Controlling Bridges (CBs) at the core layer, and Port Extender (PE) devices at the access layer.
■ PEX ports are used as the PE-to-CB connections. These logical ports can have multiple, link-aggregated ports connected to multiple physical CB member switches.
■ Both unicast and multicast traffic are handled efficiently in the loop-free EIRF fabric.
■ EIRF configuration is very straightforward, and is similar in many ways to a traditional IRF configuration.
Learning Check

Answer each of the questions below.

1. EIRF is based on IRF, and so has which of the following features? (Choose three.)
a. EIRF can only group identical switch-series models and types.
b. EIRF can group different device types into a single logical entity.
c. EIRF can group access and core switches into a single logical entity.
d. EIRF is appropriate for both Data Center and campus deployments.
e. EIRF requires that all access layer switches be directly connected to each other.
2. Which three statements are true about EIRF components? (Choose three.)
a. The CB function is only supported on chassis-based switches.
b. Two physical switches can be grouped into an IRF system to serve the CB function.
c. The PE function is only supported on fixed-port devices.
d. Various models of chassis-based and fixed-port devices can serve as PE devices.
e. A PEX port is a logical port that can contain multiple physical ports.
f. Each PE must only have a single connection to a CB to avoid STP loops.
3. How are new PEs identified in an existing EIRF deployment?
a. EIRF uses a virtual slot ID to identify PEs, and the PE must reboot for this change to take effect.
b. When a PE is added, traffic flow for existing PEs is disrupted for about 3 seconds.
c. When a new PE is added, the CB automatically computes the new topology to prevent loops.
d. PE identification operates in an identical way to how the IRF member ID is used to identify member devices.
4. Which two statements are true about how EIRF forwards frames? (Choose two.)
a. Central forwarding mode is appropriate when you have PEs that lack the forwarding performance or capabilities that you require.
b. EIRF uses central, local, and broadcast forwarding modes.
c. With local forwarding mode, the CB creates and maintains forwarding tables, and shares appropriate entries with each PE device.
d. With central forwarding mode, the CB is responsible for forwarding all frames, based on table lookup information provided by the PE.
Learning Check Answers

1. b, c, d
2. b, c, e
3. c
4. a, c
Appendix 2 EVB - VEPA
EXAM OBJECTIVES

In this appendix, you learn to:
✓ Understand the EVB/VEPA protocol.
✓ Describe the advantages of the EVB model.
✓ Understand all components involved in a complete EVB solution.
✓ Understand the integration with the hypervisor.
✓ Understand the role of the HP 5900v distributed vSwitch.
✓ Describe the configuration process.
✓ Describe the operational process of EVB.
INTRODUCTION

This appendix introduces EVB and VEPA technologies. This suite of services coordinates hypervisor systems, physical switches, and the management platform to provide a more scalable, more easily managed data center environment.
EVB and VEPA

Edge Virtual Bridging (EVB) is defined in the IEEE 802.1Qbg standard. The purpose of this standard is to enable Virtual Machines (VMs) to share a common bridge port for forwarding services. This standard includes Virtual Ethernet Port Aggregator (VEPA) technologies, along with protocols that help automate the coordination and configuration of network resources. The result is enhanced network visibility for VM-to-VM configuration and communication.
Review of Hypervisor Networking

A hypervisor "VM host" is a hardware device that is capable of creating and running multiple VMs. Common hypervisor systems in the industry include VMware ESX, Microsoft Hyper-V, and Linux KVM. All of these systems can support multiple virtual environments inside a single physical host, or a collection of physical hosts. The hypervisor environment includes one or more software-based switches, or vSwitches. Each VM has a virtual NIC, or vNIC, that connects to the hypervisor's vSwitch. This vSwitch provides connectivity between the VMs running on the same hypervisor. The vSwitch is also bound to a physical NIC (pNIC), which provides connectivity to the physical network infrastructure. Figure A2-1 shows a typical VMware infrastructure that includes two physical ESX hypervisor hosts named ESX1 and ESX2. The vSwitch inside each machine connects to two VM servers and a physical NIC. The physical connection to an external switch can be one or more Ethernet links. In this scenario, each ESX host has a single connection to an external HP 5900 Series switch.
Figure A2-1: Review of Hypervisor Networking
This environment also includes vSphere, VMware's hypervisor management system. vSphere provides a centralized platform from which you can define, modify, and control all virtual environments across all physical hosts. HP's IMC management platform is also part of this environment. The IMC platform has been enhanced by installing a Virtual Application Network (VAN) module. While IMC adds significant value, it is not required for a pure hypervisor environment. However, if you are deploying an EVB solution with your hypervisor environment, the IMC VAN Connection Manager is a required component.
Traffic Flow with Classic Hypervisor Networking

Figure A2-2 shows how VM-to-VM traffic flows within the same hypervisor host. When VM-11 and VM-12 need to communicate, traffic never leaves the ESX1 host. The internal vSwitch can handle this intra-host traffic.
Figure A2-2: Traffic Flow with Classic Hypervisor Networking
This vSwitch has vPorts that connect to each VM’s vNIC. Each VM’s connected vPort is assigned to a port group, which in turn can be assigned to a VLAN ID. In this example, the vPorts for both VMs have been assigned to port group PG2, which has been configured with VLAN 2.
Therefore, when VM-11 sends an untagged broadcast frame, all other VMs in the same port group on ESX1 will receive this frame, since they are all in the same broadcast domain. Unicast frames between hosts on the same VLAN are also handled internally by the vSwitch, just as they would be with an external, physical switch. This also means that traffic between VM-11 and VM-12 never leaves the ESX1 hypervisor host. As a result, this communication is not visible to any physical network devices, such as the HP 5900 switch in this scenario.
Traffic Flow with Classic Hypervisor Networking

When VMs on different hypervisors must communicate, the scenario remains fairly similar to the previous example. As shown in Figure A2-3, the vSwitches inside ESX1 and ESX2 have both assigned the vPorts for their respective VMs to VLAN 2. VM-11 sends an untagged frame to the vSwitch, which has not learned the destination MAC address for VM-13. The vSwitch adds a standard 802.1Q tag for VLAN 2 to the frame and sends it to the physical HP 5900 switch. The physical switch has been performing normal network forwarding, and so has likely learned the destination MAC address. The switch simply forwards the frame to the ESX2 hypervisor.
Figure A2-3: Traffic Flow with Classic Hypervisor Networking
The target vSwitch removes the VLAN 2 tag and delivers the frame to VM-13. In this case, the traffic is of course visible to the physical network.
Traffic Visibility

When the physical network has visibility into VM traffic flows, additional services can be provided, as shown in Figure A2-4.
Figure A2-4: Traffic Visibility
Physical switches like the HP 5900 can offer many advanced services, in a way that is familiar to network professionals. These include services like the following:
■ QoS provides preferred and/or low-latency delivery for certain data types.
■ sFlow offers network analysis reporting, allowing visibility into application usage patterns and statistics.
■ ACLs can be applied to improve security.
■ Port mirroring enables advanced troubleshooting methods by allowing you to see all packets between certain devices, or on certain VLANs.
Few if any of these features might be available for traffic that remains inside the hypervisor environment, and the physical network no longer has control of a large portion of network traffic. The goal of the EVB solution is to ensure that the physical network handles all traffic flows. This ensures that all traffic is handled in a consistent manner, and that all traffic can be processed by the rich feature set available to physical switches.
VLAN Network Management

In a traditional hypervisor model, VLANs are configured through the hypervisor management tool, such as VMware's vSphere product. This tool is used to configure port groups, VLANs, and other aspects of vSwitch operation. These vSwitches act as the access layer switches for all VMs. The physical connections between the hypervisor vSwitch and the physical switches are configured as trunk ports. Both the hypervisor and the physical switch must be configured to support trunking with the appropriate VLANs. This means that VLANs must now be managed from two disparate tools – the hypervisor's management tool and traditional network configuration and management tools. In a classic hypervisor environment, VLANs for VMs must be configured on all physical switches, and enabled on all hypervisor-connected trunk ports. Even when a host has no VM on a specific VLAN, this VLAN should still be enabled on the interface connected to that ESX host. In the example in Figure A2-5, ESX1 contains VM-11 and VM-12, both configured with port group 2 on VLAN 2. ESX2 hosts VM-13 and VM-14, assigned to VLAN 3 in port group 3. It would seem the HP 5900 switch only needs to support VLAN 2 on the connection to ESX1, and VLAN 3 toward ESX2.
Figure A2-5: VLAN Network Management
However, this would hinder the ability to use vMotion, a feature that eases transferring VMs to different physical hosts. It is often vital that an administrator can move any VM to any physical host. For this reason, all VLANs should be supported over all trunk links. In a larger environment with many ESX hosts, this can create large broadcast domains, which wastes bandwidth, degrades performance, and reduces network scalability.
EVB Model

The EVB model provides consistent traffic flow handling, because all traffic is handled by the physical switch. No VM-to-VM traffic is handled by the internal vSwitch. This includes traffic between VM-11 and VM-12, as well as traffic between VM-11 and VM-13.
Therefore, all traffic can be treated by the same network policies, and leverage the same services. This includes QoS, sFlow, ACLs, traffic mirroring, and more. With EVB, the physical switch uses a feature called reflective relay. A traditional Ethernet switch will never forward traffic back out the interface on which it was received. In Figure A2-6, the reflective relay feature is what allows the switch to receive traffic inbound from VM-11 and forward it back out the same interface to VM-12. Special systems must be installed on the hypervisor to support this feature.
Figure A2-6: EVB Model
EVB VLAN Network Management

VLAN management is greatly improved with the EVB model, because VLANs need only be deployed where they are actually needed. With EVB, port VLAN membership is dynamically adjusted to accommodate the dynamic nature of VMs. Products like VMware's vMotion make it easy for server administrators to move VMs to a different physical server. EVB ensures that the required network configuration for these moves happens automatically. In Figure A2-7, the physical switch connection to ESX1 only supports VLAN 2, and the connection to ESX2 only supports VLAN 3. If something like vMotion is used to move VM-11 from ESX1 to ESX2, the EVB solution automatically adjusts the VLAN support: the link to ESX2 would be automatically configured to support VLAN 2. If VM-12 were also moved, and no other VMs on ESX1 required VLAN 2, then VLAN 2 would automatically be removed from the trunk port on the physical switch.
Figure A2-7: EVB VLAN Network Management
Terminology and Concepts 1
Figure A2-8 introduces EVB terminology and concepts, as described below:
Figure A2-8: Terminology and Concepts 1
■ EVB Bridge: the physical switch connected to the hypervisor.
■ EVB Station: the physical hypervisor host that connects to the EVB Bridge. This physical EVB Station can host multiple vSwitches, called Edge Relays in an EVB setting.
■ Edge Relay (ER): replaces the traditional vSwitch, and ensures that VM traffic is egressed to the physical EVB Bridge. Multiple pNICs can be installed in an EVB Station, with an ER bound to each one.
■ Uplink Relay Port (URP): one of two ER interface types; this is the connection to the EVB Bridge. ERs can have one or more URPs for link-aggregated redundancy.
■ Downlink Relay Port (DRP): the second of two ER interface types. It connects to a VM vNIC, available upon VM-sourced demand. One DRP is created per activated VM.
■ Virtual Station Interface (VSI): the EVB term for a VM's vNIC; it connects to the DRP of the Edge Relay.
Terminology and Concepts 2

The Service Channel, or S-Channel, shown in Figure A2-9, is a virtual link between the ER and the EVB Bridge. The link uses the VEPA protocol, which is very similar to the QinQ mechanism of adding an outer VLAN tag to the original frame. The S-Channel is negotiated between the ER and the EVB Bridge using an LLDP extension called the Channel Discovery and Configuration Protocol (CDCP).
Figure A2-9: Terminology and Concepts 2
The reflective relay feature must be enabled on the physical switch’s S-Channel interface. This feature allows inbound frames to be sent back out their ingress interface. A traditional Ethernet switch will never allow this, making this feature a key enabler of EVB bridge functionality.
S-Channel Identifier

Each ER-to-EVB Bridge S-Channel is identified by a configuration pair. The S-Channel name is based on the host interface. If an S-Channel is activated on interface Ten1/0/1, then the name of the logical interface is S-Channel 1/0/1, as shown in Figure A2-10. If the S-Channel were enabled on bridge aggregation interface BAGG3, then the name of the logical interface would be S-Channel Agg3.
Figure A2-10: S-Channel Identifier
The S-VLAN Identifier is the outer VLAN tag added to the S-Channel's QinQ tunnel. Together, the name and the S-VLAN Identifier form the S-Channel ID. The standard provides device support for one or more S-Channels, operating in one of two available modes. By default, VEPA mode is used to support a single S-Channel on service VLAN 1. This is reflected in the configuration as a number following the interface name, separated by a colon. For example, you might see interface S-Channel 1/0/1:1. This means that the S-Channel is configured on interface Ten1/0/1, using VLAN 1. The other mode is Multichannel VEPA mode, which supports multiple S-Channels over a single interface. This mode is not widely used, and will not be discussed further in this study guide.
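The naming convention can be illustrated with two hypothetical interface names (exact display output varies by software release):

```
interface S-Channel 1/0/1:1    # S-Channel on physical interface Ten1/0/1, service VLAN 1
interface S-Channel Agg3:1     # S-Channel on bridge aggregation BAGG3, service VLAN 1
```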
S-Channel Sub-interfaces

The S-Channel is configured with sub-interfaces. It is these sub-interfaces that provide a logical connection to a VM's vNIC. When a vNIC is activated, a new sub-interface is dynamically created on the EVB Bridge. This sub-interface is dynamically removed from the S-Channel interface when the VM is stopped, or moved to another host using vMotion. On the switch, it is the sub-interface that is configured for things like VLAN assignment, QoS, and ACLs. Figure A2-11 shows two VMs online. To accommodate this, the EVB Bridge configuration includes interface S-Channel 1/0/1:1.1 for VM-11, and interface S-Channel 1/0/1:1.2 for VM-12. This is how a single physical interface on the switch is logically separated to accommodate reflective relay. While traffic between VM-11 and VM-12 arrives on the same physical interface, the switch perceives it as arriving on separate, logical sub-interfaces.
Figure A2-11: S-Channel Sub-interfaces
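Continuing the example in Figure A2-11, the dynamically created sub-interfaces would appear with names like the following. These are created and removed by the system rather than configured by hand; the comments describe the roles from the example:

```
interface S-Channel 1/0/1:1.1    # sub-interface for VM-11's VSI; VLAN, QoS, and ACLs applied here
interface S-Channel 1/0/1:1.2    # sub-interface for VM-12's VSI
```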
S-Channel Setup Negotiation

CDCP is responsible for negotiating S-Channel setup. This EVB-defined extension to LLDP uses special TLVs to negotiate S-Channel parameters. These CDCP exchanges occur between the ER and the EVB Bridge using a special destination MAC address of 01-80-C2-00-00-03. This MAC address must be configured on the physical switch. Incidentally, this address is referred to as the "Nearest non-TPMR bridge address". TPMR stands for Two-Port MAC Relay. This is not of practical importance to EVB deployments, and is only mentioned here due to its inclusion in many descriptions of the standard. This special MAC address helps CDCP to operate on the same interface as other LLDP extensions without conflict. It also ensures that if there is a Two-Port MAC Relay device between the EVB Bridge and the ER, it will transparently pass the frame, instead of treating it as a typical LLDP frame, which would not be forwarded.
Note A TPMR is simply a bridge device that supports a subset of typical MAC bridge functions. It is transparent to all traffic flowing through it, except traffic addressed specifically to it, or for neighbor agents, such as LLDP. The special MAC address used by CDCP ensures that its frames will traverse any such device without issue.
S-Channel Configuration Negotiation

The following summarizes the key factors related to CDCP's S-Channel negotiation:
■ Reflective Relay: This setting enables the egress of frames back out their ingress port.
■ Auto-configuration: CDCP will typically request the reflective relay feature automatically, eliminating the need to manually configure it.
■ Manual Configuration: If the hypervisor's ER is incapable of negotiating the reflective relay feature, the network administrator can manually configure it on the physical switch – the EVB Bridge device.
EVB VSI Manager

The EVB Bridge services each VM's vNIC, which is referred to as a Virtual Station Interface (VSI). As shown in Figure A2-12, sub-interfaces must be configured on the EVB Bridge to support these connections. This configuration can be done manually or automatically.
Figure A2-12: EVB VSI Manager
Manual configuration of sub-interfaces is not recommended because it is difficult to maintain configuration consistency. This is especially true in a typical data center where VMs are moved between physical hosts. These moves would necessitate manual changes to support VLANs and other configurations relevant to each VM. A better solution is to allow sub-interfaces to be automatically configured by the VSI Manager. This tool integrates with the hypervisor management tool in order to learn about VM starts, stops, and migrations. It then communicates with the EVB Bridge, automatically modifying sub-interface configurations as appropriate. HP's Virtual Application Networks (VAN) Connection Manager is a software module for HP's IMC management platform. It functions as a VSI Manager by integrating with market-leading hypervisors, such as vSphere, Hyper-V, Xen, and KVM. A template-based approach is used to ensure consistent configuration across the network infrastructure. You use IMC VAN to define VSI templates with specific VLANs, QoS rules, ACL filters, and more. These templates are made available to the hypervisor, where you define traditional network profiles, called port groups. You bind an appropriate VSI template to each port group. As a result, the ER knows which VSI template to use for each VM, and announces this to the EVB Bridge. The EVB Bridge queries the VSI Manager, which delivers the appropriate configuration. In this way, physical switches are automatically configured to accommodate a dynamic data center environment.
EVB VSI Manager: VSI Templates

The templates that you define in the VSI Manager consist of two objects, as shown in Figure A2-13. One is a network object, which contains the VLAN assignment.
Figure A2-13: EVB VSI Manager: VSI Templates
The other object is the VSI Type, which is bound to the network object. The VSI Type contains the actual configuration settings for ACLs, QoS, and more. Most of these configuration settings are optional; the one requirement is that the VSI Type be bound to the network object. A properly completed template configuration is released as a VSI Type version. For example, you might define a sales server template to be applied to certain VMs. When these VMs come online, sales template version 1 will be applied. When you modify this template, you release it as a new VSI Type version, which can then be applied to the VMs. Version control is automatically enforced. HP's IMC VAN system allows various roles to be defined for administrative staff members. Role security settings can be used to define which network objects each IMC administrator can access and control. This role security can also be used to determine which VSI Types a particular administrator can manage. The hypervisor management platform is used to define port groups and associate them with a VSI Type. The VSI Type version is bound to a network on the VAN server. This ensures that the port group will be configured with the correct ACL, QoS, and VLAN service. The hypervisor manager only needs to configure vNICs and associate each one with a traditional port group, just as before the implementation of EVB. The actual control over which VSI Type is assigned to which port group is handled by IMC VAN templates.
HP 5900v: EVB Interaction with EVB Station 1

Standard hypervisor vSwitches do not support EVB or VEPA. The hypervisor can only support this functionality through an EVB-capable product like the HP 5900v (see Figure A2-14). The HP 5900v is EVB- and VEPA-capable, and so provides CDCP communication with the physical HP 5900 EVB Bridge. 5900v configuration can be orchestrated through the IMC VAN Connection Manager.
Figure A2-14: HP 5900v: EVB Interaction with EVB Station 1
The HP 5900v can coexist with a traditional vSwitch in the hypervisor. At least one physical NIC on the host server must be assigned to the HP 5900v for EVB and VEPA functionality. A traditional vSwitch can have other VMs associated with it, using a separate physical NIC on the host. The limitation is that the traditional vSwitch and the HP 5900v cannot share a physical interface. Each must have exclusive access to its own physical NIC on the hypervisor host.
HP 5900v: EVB Interaction with EVB Station 2

As shown in Figure A2-15, the HP 5900v is responsible for negotiating the S-Channel with the EVB Bridge, and also for announcing the addition or removal of VMs. A separate protocol is used for each of these responsibilities.
Figure A2-15: HP 5900v: EVB Interaction with EVB Station 2
CDCP is used for S-Channel setup between the ER and the EVB Bridge, as previously discussed. The Edge Control Protocol (ECP) is used to announce VM additions and removals, thus automating vNIC session setup and teardown. ECP's transition state machine includes the pre-associate, associate, and de-associate states. The ER sends a pre-associate message to the EVB Bridge for VMs that are in the process of coming online. This gives the EVB Bridge time to prepare and configure a new sub-interface for the VM. The associate state applies when the VM is operational and can be used in production. This state is also used when a VM is moved to another hypervisor. During the transition, the VSI remains in the associate state with the original EVB Bridge interface, while the target host sends a pre-associate announcement to the new EVB Bridge interface. When the transfer is complete, the original host sends a de-associate message to the original interface, thus removing the sub-interface from the EVB Bridge. The target host moves to an associate state with the new EVB Bridge interface. The transition is fairly seamless, since the target interface was already prepared and configured before the VM came online.
HP 5900v Components

The HP 5900v consists of a Virtual Forwarding Engine (VFE) and a Virtual Control Engine (VCE). The VFE is the HP 5900v data plane component, and is installed on each hypervisor, replacing the standard vSwitch for the EVB deployment. This component has no user interface or local management capabilities of any kind; the VFE must be configured and managed by the VCE. The VCE is the control plane of the HP 5900v, and runs as a VM on a hypervisor host. This is shown as a separate logical server in Figure A2-16, running as a VM on the physical host. Arrows pointing from the VCE to the VFEs indicate that configuration information is being transferred to modify VFE operation.
Figure A2-16: HP 5900v Components
HP 5900v Communication 1

For EVB deployments, VMware's vSphere product requires an HP 5900v plugin module, as shown in Figure A2-17. This plugin is used to deploy the HP 5900v VFEs to ESX hosts. VFEs are deployed to ESX hosts in a similar way to how traditional VMware vSwitches are distributed. However, for HP 5900v support, VMware's distributed vSwitch feature must be licensed.
Figure A2-17: HP 5900v Communication 1
The plugin is also used to bind physical NICs to HP 5900v VFE instances. The administrator can select which physical links on a host server will be used as Uplink Relay Ports (URP). Multiple interfaces can be bound to the HP 5900v for redundancy. The VFE acts as an Edge Relay on the host. It can be the sole virtual switch on the host, or it can coexist with the traditional vSwitch on the host. For example, the traditional vSwitch could use physical interfaces 0 and 1 on the hypervisor host, while the HP 5900v uses physical interfaces 2 and 3.
HP 5900v Communication 2

The management plugin also defines the VMware port group to VSI Type version mapping. As shown in Figure A2-18, the VCE component communicates this information to the vSphere host, which sends port group configurations to ESX hosts. This configuration does not include VLAN or QoS information, only the VSI Type and version data – simply an internal VSI Type index number.
Figure A2-18: HP 5900v Communication 2
This information is distributed to the VFE instances as VMs are brought online. This means that the VCE is not aware of the actual ACL and QoS rules to be applied. It simply knows the internal identifier of the VSI type profile that should be applied.
Learning Activity: EVB Operation and Component Review

Refer to the figure. Write the letter pointing to the component in the figure next to the appropriate component name listed below. Provide a brief description of each component in the space provided.
■ EVB Bridge: ______________________________________________________________________
■ EVB Station: ______________________________________________________________________
■ Edge Relay (ER): ______________________________________________________________________
■ Uplink Relay Port (URP): ______________________________________________________________________
■ Downlink Relay Port (DRP): ______________________________________________________________________
■ Virtual Station Interface (VSI): ______________________________________________________________________
■ S-Channel: ______________________________________________________________________
■ VSI Manager: ______________________________________________________________________
Learning Activity: Answers ■ EVB Bridge: (g) an HP 5900-series or similar pSwitch connected to the hypervisor. Uses the reflective relay feature to switch frames out the same interface on which they were received. ■ EVB Station: (a) the physical hypervisor host that connects to the EVB Bridge, which can host multiple Edge Relays (vSwitches). ■ Edge Relay (ER): (e) the HP 5900v replaces the traditional vSwitch so VM traffic is egressed to the EVB Bridge. ■ Uplink Relay Port (URP): (c) the EVB Station-to-Bridge link. ■ Downlink Relay Port (DRP): (d) The ER-to-vNIC link, one per active VM. ■ Virtual Station Interface (VSI): (b) The EVB term for a VM’s vNIC ■ S-Channel: (f) A virtual channel negotiated over the EVB station-to-Bridge URP links. Interface S-Channel 1/0/1:12 indicates that physical interface Ten1/0/1 is being used for a URP connection for an S-Channel, and has a sub-interface to support an active VM that exists on VLAN 12. This sub-interface is dynamically created and removed as VMs are activated and deactivated, using an extension of LLDP called CDCP. ■ VSI Manager: (h) IMC VAN Connection Manager integrates with vSphere or similar hypervisor manager to learn about VM starts, stops, and migrations. Templates on the VSI Manager control VLANs, QoS rules, ACLs applied to each
particular VM. The 5900v ER tells the 5900 EVB Bridge which template to use. The EVB Bridge then queries IMC VAN Connection Manager for this configuration.
HP 5900v Installation Prerequisites The following summarizes the prerequisites for installing an HP 5900v-based solution. This scenario assumes the use of a VMware environment. IMC 7.0 or later must be installed and running, along with the Virtual Network Manager, which is part of the basic platform. The VAN Connection Manager must also be installed. VMware vCenter Server should be installed and running, as should the VMware ESXi hosts. Key information should also be gathered in preparation for the actual installation of the HP 5900v. This includes the IMC server’s IP address, ports, and administrator login credentials. You will also need the VMware vCenter IP address and login credentials, as well as an IP address to assign to the VCE.
Installation Flow Overview 1 The HP 5900v is deployed as a VM, using the vSphere “Deploy OVF Template” option. The HP 5900v software package is available for download from the HP web site in an OVF format. The OVF template deployment wizard streamlines the installation. In the screenshot shown in Figure A2-19, the configuration wizard requests IMC login credentials and vCenter IP address and credentials. These settings will be pushed to the VM as it comes online. In addition to installing the VM, this OVF Template also includes the required vSphere plugin installation. This happens transparently. No special deployment process is required for the vSphere plugin.
Figure A2-19: Installation Flow Overview 1
Once deployed, the HP 5900v VCE can be configured with new IMC or vCenter information by directly accessing the IP address of the VCE.
Installation Flow Overview 2 The administrator connects to the VCE, shown in Figure A2-20, at http://VCE-IPAddress:8080/gui. After login, the configuration can be modified. This includes the local IP address, IMC and vCenter IP addresses, and their related credentials.
Figure A2-20: Installation Flow Overview 2
Installation Flow Overview 3 As mentioned, the VCE deployment automatically installs the vSphere plugin. This can be verified from the vSphere management interface. As shown in Figure A2-21, an additional tab appears for the HP Virtual Distributed Switch (VDS) solution. This tab includes an option for VFE installation.
Figure A2-21: Installation Flow Overview 3
The example in Figure A2-21 shows that two ESXi hypervisor hosts are available. The administrator can select the host on which to install the VFE component. After selecting the check box next to the host and clicking the install button, the VFE components are deployed as a background task.
Initial Configuration – Device Discovery and VDS
After VFEs are deployed, they should be configured. The initial configuration includes adding physical switches to the deployment, such as top-of-rack HP 5900 EVB Bridges. ESXi and vSphere hosts are added using a SOAP/HTTP template, which allows IMC to query specific VMware information related to vMotion and other VM status using SOAP. A Virtual Distributed Switch (VDS) must be created on the vSphere host; one and only one VDS is supported on each vSphere host. Port groups can then be defined on this VDS for the uplink ports. In the bottom-right screenshot in Figure A2-22, two hypervisor hosts are listed by their IP address, along with available uplink ports. The network administrator can select which interfaces can be used as uplinks for each server, and the link aggregation type can also be specified. Once the uplink has been configured, a default port group can be defined, as shown in the bottom-left screenshot.
Figure A2-22: Initial Configuration – Device Discovery and VDS
New Network/VM Configuration Flow Overview 1 Figure A2-23 shows a new VM configuration process from the IMC VAN Connection Manager. The first step is to define the network object. In this example, a new network is defined for Customer C and is assigned to VLAN 31. A maximum connection limit for this VLAN is set at 200. The 201st VM that attempts to connect to
this VLAN will be refused, and the VM will not come online. This is a convenient way to prevent the overpopulation of an IP subnet.
Figure A2-23: New Network/VM Configuration Flow Overview 1
Next, the VSI should be created. In the example in Figure A2-23, a VSI Type for Customer C’s front-end server is created, and assigned to the Cust-C-31 network that was just defined, using VLAN 31. Specific service units can also be enabled, such as bandwidth control or VM access control, as in the example. As the administrator enables these services, new configuration parameters become available. For example, IP address/mask pairs can be configured for filtering, as can the amount of bandwidth to be allowed. These Service Unit features are not required. If the network administrator simply wants to assign VLANs, all the service units can be unchecked. Some client VMs may not need bandwidth or QoS control, so a simple VLAN assignment would suffice. The VAN Connection Manager automatically translates whatever options were selected into Comware CLI commands. This can include creating access lists, traffic classifiers for QoS, and QoS traffic behaviors to filter traffic and control interface rates. The traffic classifiers and behaviors can be combined into a policy and applied to the S-Channel’s sub-interface. This is all done dynamically and automatically by IMC during the pre-associate phase of the VM connection.
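To make this concrete, the generated configuration might resemble the following Comware-style sketch. All names, the ACL number, and the CIR value are invented for illustration; the actual syntax is produced by IMC and depends on which service units were selected in the VSI Type.

```
acl number 3001
 rule permit ip source 10.1.31.0 0.0.0.255
traffic classifier Cust-C-FE operator and
 if-match acl 3001
traffic behavior Cust-C-FE
 car cir 10240
qos policy Cust-C-FE
 classifier Cust-C-FE behavior Cust-C-FE
```

IMC would then apply the resulting QoS policy to the dynamically created S-Channel sub-interface during the pre-associate phase.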
New Network/VM Configuration Flow Overview 2 Once a VSI Type has been released, it appears in the VAN Connection Manager. In the example in Figure A2-24, a VSI Type named Customer-C-Server-Front has been released as version 1. Only released versions can be deployed to VM port groups.
Figure A2-24: New Network/VM Configuration Flow Overview 2
New Network/VM Configuration Flow Overview 3 The next step happens from within the vSphere Network Configuration environment. Using the HP plugin, you can create new Port Groups. This vSphere plugin will query the IMC VAN module for the list of VSI types, such as the ones you created in the previous steps. These VSI types are made available as selections in vSphere. Now, when a new port group is defined, an appropriate VSI Type version can be bound to it. In the example in Figure A2-25, a new port group named PortGroup-VLAN600 has been defined. The administrator has bound the VSI Type version named VLAN600VSI(V1) to this port group. This VSI Type version template contains all of the VLAN, QoS, and security settings that you configured previously from IMC VAN.
Figure A2-25: New Network/VM Configuration Flow Overview 3
New Network/VM Configuration Flow Overview 4 As you recall, the port group was bound to a VSI Type version in the previous step. The final step is to bind the VM’s vNIC to a port group. This operation is exactly the same as what VMware administrators have been doing for years. This is because, as shown in Figure A2-26, the EVB port groups are all listed next to any traditional port groups that may have been locally defined.
Figure A2-26: New Network/VM Configuration Flow Overview 4
Operational VM Boot Process Flow 1 Figure A2-27 summarizes the VM boot process.
Figure A2-27: Operational VM Boot Process Flow 1
Using vSphere management tools, the administrator starts VM-11. The vSphere management tool tells ESX1 to start VM-11. vSphere also supplies the VCE with port group information, including the associated VSI Type version to be applied. It is important to understand that at this point vSphere is only providing the VSI Type ID information. The ESX host is not providing the actual QoS and ACL settings to the VCE. The VCE only receives the identifier (such as 1, 2, 3, or 20) for the VSI Type version that was bound to that VM’s assigned port group.
Operational VM Boot Process Flow 2 The VCE now knows that the VM is booting and that the network must be provisioned. The VCE informs the VFE of this fact. The VFE announces to the local
EVB Bridge that a VM is coming online. In essence, the VCE tells the VFE to initiate an ECP session with the EVB Bridge. The VFE begins an ECP exchange with the EVB Bridge. Figure A2-28 shows a packet trace of this communication. ECP includes information about the VSI type ID and the virtual ID. This is simply an internal identifier for the VSI type. It also includes information about the MAC address and VLAN ID of that VM.
Figure A2-28: Operational VM Boot Process Flow 2
The EVB Bridge now knows that a new sub-interface should be created. This sub-interface will be used to process traffic for the VM’s MAC address (00:10:95:00:00:02, in this example). The actual sub-interface configuration is unknown at this point. The EVB Bridge only knows that the sub-interface should be configured with VSI type ID 1, Version 1.
Operational VM Boot Process Flow 3 The EVB Bridge creates a new sub-interface for every new vNIC connection. If the switch has an S-Channel Interface 1/0/1:1, then the first sub-interface created will be S-Channel 1/0/1:1.0. This sub-interface will be bound to the VLAN and MAC address of the VM. This can be seen in the switch configuration as a VSI filter. This enables you to use the switch configuration to determine which sub-interface is used by a particular VM’s MAC address. As shown in Figure A2-29, this is created dynamically based
on the ECP exchange, since the HP 5900v provides the MAC address and VLAN used by the VM.
Figure A2-29: Operational VM Boot Process Flow 3
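The dynamic sub-interface allocation described above can be sketched as a small model. The naming convention mirrors the example in the text (S-Channel 1/0/1:1 yielding sub-interface 1/0/1:1.0 for the first vNIC); the allocator itself is a simplification, not the switch's actual implementation.

```python
# Illustrative model of dynamic S-Channel sub-interface creation:
# each new vNIC attached over an S-Channel gets the next sub-interface
# index, and a VSI filter binds that sub-interface to the VM's MAC/VLAN.

class SChannel:
    def __init__(self, name):
        self.name = name          # e.g. "1/0/1:1"
        self.next_index = 0
        self.vsi_filters = {}     # sub-interface name -> (mac, vlan)

    def attach_vm(self, mac, vlan):
        sub_if = f"S-Channel {self.name}.{self.next_index}"
        self.next_index += 1
        self.vsi_filters[sub_if] = (mac, vlan)
        return sub_if

ch = SChannel("1/0/1:1")
print(ch.attach_vm("00:10:95:00:00:02", 12))  # S-Channel 1/0/1:1.0
```

Looking up a MAC address in `vsi_filters` corresponds to reading the VSI filters in the switch configuration to find which sub-interface a particular VM is using.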
Operational VM Boot Process Flow 4 As shown in Figure A2-30, the EVB Bridge now has a sub-interface for VM-11, but no configuration parameters have been applied. For this to occur, the EVB Bridge must contact the VSI manager.
Figure A2-30: Operational VM Boot Process Flow 4
Two options are available. The interface could be manually configured by the network administrator. As previously stated, this is not a best practice. Instead, it is best to leverage the centralized control provided by the IMC VAN Connection Manager. The EVB Bridge sends detailed configuration information to the VSI Manager. This includes filter information (the VM’s MAC address and VLAN), as well as the VSI-ID, VSI type and version information, and sub-interface details. Essentially the EVB Bridge tells the VSI manager, “I have a new sub-interface that should be configured with VSI 51, version 1.”
Operational VM Boot Process Flow 5 The IMC VAN Connection Manager creates an entry for this VM connection in its VAN database. When it receives the EVB Bridge configuration request, it performs a lookup to find the appropriate VSI Type version configuration. Next, as shown in Figure A2-31, the IMC VAN Connection Manager opens a Telnet session to the EVB Bridge and delivers CLI configuration syntax for the sub-interface. This can include ACLs, traffic classifiers, behaviors, and QoS policies, which are then applied to the sub-interface.
Figure A2-31: Operational VM Boot Process Flow 5
Operational VM Boot Process Flow 6
Once this has been completed, the VSI Associate state is achieved. VM-11 is online with a unique sub-interface configuration on the EVB Bridge. The running configuration contains the operational VSI Type version configuration settings. In the example in Figure A2-32, VM-11 is now logically connected to S-Channel 1/0/1:1.0. Note that VSI Type version changes cannot be done on the fly. If changes were made on the IMC VAN module, this would not be reflected in the configuration on the EVB Bridge. The two would be out of sync.
Figure A2-32: Operational VM Boot Process Flow 6
Whenever a VSI configuration is modified in the IMC VAN module, it must be released as a new version and then assigned to the port group. At that point the configuration change can become active.
EVB Bridge Configuration Steps The basic steps to configure the EVB Bridge include configuring the interface for EVB support, configuring LLDP, and configuring the VSI Manager.
Step 1: Configure Interface with EVB support The interface must be configured as a trunk link, and EVB must be enabled on the interface. The “evb enable” command also enables CDCP. VLANs are permitted dynamically in coordination with the VSI manager. VLAN 1 is enabled by default and is used as a service VLAN. In the example in Figure A2-33, EVB is enabled on a physical interface. It can also be enabled on Bridge Aggregation interfaces.
Figure A2-33: Step 1: Configure Interface with EVB support
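The interface configuration from this step follows this general pattern. This is a sketch: the interface name is an assumption, and exact syntax may vary by Comware release.

```
system-view
interface Ten-GigabitEthernet 1/0/1
 port link-type trunk
 evb enable
```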
Step 2: Configure LLDP LLDP has to be enabled at the global level, and the interface must be configured to use the non-TPMR destination MAC address. This is accomplished with the “lldp agent” command, as shown in Figure A2-34.
Figure A2-34: Step 2: Configure LLDP
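The LLDP configuration can be sketched as follows. The interface name is an assumption, and exact syntax may vary by Comware release; the agent command configures the nearest-nonTPMR agent so that EVB TLVs use the required destination MAC address.

```
lldp global enable
interface Ten-GigabitEthernet 1/0/1
 lldp agent nearest-nontpmr admin-status txrx
```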
Step 3: S-Channel Reflective Relay
Typically, the reflective relay feature is automatically negotiated by CDCP. HP’s 5900v product acts as an Edge Relay device, and so will automatically negotiate this feature using CDCP. This means that the “evb reflective-relay” command is not necessary. If some other ER device that lacked this capability were deployed, you would need to enable this feature manually, as shown in Figure A2-35.
Figure A2-35: Step 3: S-Channel Reflective Relay
Step 4: VSI Manager Configuration The VSI Manager is responsible for delivering VSI network configuration to the EVB Bridge. A local manager could be used, but is not recommended. It is far more effective to deploy a product like HP’s VAN Connection Manager to act as a central VSI manager. You must tell the EVB Bridge which VSI Manager to communicate with for this purpose. This configuration can be done at the global level, as shown in Figure A2-36. Configured this way, the specified VSI Manager IP address and name are used for all interfaces on the EVB Bridge. If you wanted some interfaces to use a different VSI manager, you could configure it with similar syntax at the interface level.
Figure A2-36: Step 4: VSI Manager Configuration
Step 5: Verify Figure A2-37 lists several commands that can be used to verify your configuration efforts.
Figure A2-37: Step 5: Verify
Summary In this appendix you learned: ■ Edge Virtual Bridging (EVB) is defined in the IEEE 802.1Qbg standard. The purpose of this standard is to enable Virtual Machines (VMs) to share a common bridge port for forwarding services. ■
This standard encompasses Virtual Ethernet Port Aggregator (VEPA) technologies, and protocols to help automate the coordination and configuration of network resources. This results in enhanced network capabilities for VM-to-VM traffic, including QoS, ACLs, sFlow, and port mirroring.
■ HP’s 5900v can replace a hypervisor’s native vSwitch to enable EVB and VEPA services in the data center. It can work with physical switches like HP’s 5900 series. ■ In an EVB deployment, the physical switch is called an EVB Bridge, the HP 5900v virtual switch is called an Edge Relay (ER), and a VM’s vNIC is called a Virtual Station Interface (VSI). The EVB Bridge and ER communicate and connect over a virtual link called the S-Channel. This channel is negotiated using CDCP. ■ HP’s IMC VAN Connection Manager serves as an EVB VSI Manager. This component enables sub-interfaces to be automatically configured on an EVB Bridge’s S-Channel interface to accommodate the movement, addition, and removal of VMs in the data center.
■ The HP 5900v’s VFE component is installed on hypervisor hosts. Its VCE component is installed as a VM on a hypervisor host. The physical EVB Bridge must also be configured to support an EVB solution.
Learning Check Answer each of the questions below. 1. Which of the following is an advantage of an EVB – VEPA solution? a. More frames can be forwarded locally inside a hypervisor host. b.
EVB – VEPA solutions can be implemented purely inside a standard switched environment, with no additional components required.
c. Services such as QoS, security ACLs, and sFlow are extended into the hypervisor environment and so are supported on internal vSwitches. d. Since all frames are forwarded by the physical switch, consistent services and visibility are available for all traffic flows. e. EVB – VEPA can take advantage of a standard IMC solution. No additional modules for IMC are required. 2. Which three statements are true about EVB VLAN management? (Choose three.) a. VLANs for all VMs must be provisioned on all data center switches. b. VLANs need only be deployed where they are actually needed. c. VLANs are automatically provisioned when they are defined in IMC. d. VLANs are automatically provisioned when a VM comes online. e.
Communication between the IMC VAN module and the pSwitch help facilitate VLAN management.
3. The S-Channel is a virtual link between the ER and EVB Bridge, and uses sub-interfaces to support dynamic VLAN creation. a. True. b. False. 4. Which three statements are true about an EVB VSI manager? (Choose three.) a.
It allows sub-interfaces to be automatically configured on a switch by integrating with a hypervisor management tool
b. An EVB VSI manager is included by default with an IMC installation.
c.
The IMC VAN Connection Manager module adds EVB VSI manager capabilities to IMC.
d. Automatic VLAN creation using an EVB VSI manager is the only method for creating VLANs for Virtual Machines in an EVB deployment. e.
To aid in automatic VLAN creation, VSI Templates are defined with networks and VSI types.
5. What are two features of the HP 5900v? (Choose two.) a. Both a standard hypervisor vSwitch and the HP 5900v can be used for VEPA functionality in an EVB deployment. b. A hypervisor’s standard vSwitch can coexist with the HP 5900v, but only the 5900v provides VEPA functionality. c. The 5900v uses CDCP to communicate with an HP 5900 physical switch. d. The 5900v is not responsible for announcing the addition or removal of VLANs to an EVB Bridge.
Learning Check Answers 1. d 2. b, d, e 3. a 4. a, c, e 5. b, c
Appendix 3 VXLAN EXAM OBJECTIVES In this appendix, you learn to: ✓ Describe the VXLAN feature. ✓ Understand the VXLAN basic operations. ✓ Describe the MAC learning process in a VXLAN. ✓ Describe the virtual VXLAN to physical VLAN network integration. ✓ Understand the basic configuration of a VXLAN tunnel.
INTRODUCTION This appendix provides an introduction to Virtual eXtensible LANs, or VXLANs. You will learn how this Layer 2 overlay technology functions, and how it is used in a data center environment. This includes an understanding of the solution objectives for VXLAN, and how those objectives are fulfilled with capabilities such as MAC learning inside the VXLAN and multi-destination delivery methods. You will also learn how VXLANs can be interconnected with traditional physical networks, providing multiple deployment options. One such option includes the use and configuration of VXLAN hardware-based gateways, such as the HP model 5930.
VXLAN Overview This appendix includes a section on VXLAN introductory topics, before moving into VXLAN theory of operation and design considerations. The final section involves the configuration of VXLAN functionality on HP switches, such as the HP 5930.
VXLAN introduction
VXLAN functionality was originally defined in an IETF draft and has since been published as RFC 7348. VXLAN provides an alternative to the traditional VLAN concept. Traditional VLANs are tagged inside an 802.1Q header with a 12-bit number, providing for a maximum of 4094 distinct VLANs. This is more than adequate for typical corporate deployments. However, multi-tenant data centers can quickly exhaust this resource. Suppose a hosting site had several clients that each used two or three VLANs; the hosting service would run out of VLANs at somewhere around 2000 clients. Of course, several of these clients may require a much larger number of VLANs, limiting the client base even further.

Another challenge with traditional VLANs is the possible over-expansion of Layer 2 broadcast domains. In a data center environment, many switches connect to hypervisor servers that host Virtual Machines (VMs). In this scenario, it can be difficult for network administrators to predict where VMs will actually be hosted. On which physical servers, connected to which physical switches, might any VM need to be hosted? To accommodate this uncertainty, every client VLAN might be supported on every physical switch. This would enable any VM to operate on any server. However, in a larger data center with hundreds of switches, this could create large broadcast domains and unnecessary broadcast traffic. This is also less than optimal from a security standpoint, as well as for limiting the scope of network outages and errors. VXLAN technology offers a solution to these limitations.
Supported Products HP top-of-rack switches, such as the HP 5930, support VXLAN, as do chassis-based models like the 7900 and 12900 series switches. The 48-port 10G FC module and the 24-port 40G FC module are also supported.
VXLAN Operation VXLAN technology is most advantageous in a data center environment, since traditional campus networks do not experience the issues that VXLAN is designed to resolve.
VXLAN is an IP-based Layer 2 overlay technology. It encapsulates Layer 2 traffic inside an IP datagram for improved function and scalability. A format referred to as “MAC in UDP” is used for this encapsulation. Once this Layer 2 traffic is encapsulated, it can take advantage of any existing IP routed infrastructure inside the data center, or between data centers. All of the redundancy, resiliency, and load-sharing capabilities typical of routed infrastructures are therefore available to VXLAN services. The VXLAN ID is a 24-bit value – double the number of bits available with 802.1Q. This means that over 16 million VXLAN IDs are available, as compared to roughly 4000 VLAN IDs for traditional VLANs.
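The identifier-space comparison above reduces to simple arithmetic, sketched here:

```python
# Compare the identifier space of 802.1Q VLANs (12-bit) and VXLAN VNIs (24-bit).
VLAN_ID_BITS = 12
VNI_BITS = 24

usable_vlans = 2 ** VLAN_ID_BITS - 2   # VLAN IDs 0 and 4095 are reserved
total_vnis = 2 ** VNI_BITS             # the full 24-bit VNI space

print(usable_vlans)                # 4094
print(total_vnis)                  # 16777216
print(total_vnis // usable_vlans)  # 4098 -> roughly 4000x more segments
```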
VXLAN Concepts Figure A3-1 depicts a transport IP network composed of a number of switches, interconnected in a routed environment. Several physical server hosts are connected to this infrastructure.
Figure A3-1: VXLAN Concepts
VXLAN functionality is yet to be fully deployed, with no physical devices configured with this capability. The VMs hosted on these servers have, however, been configured with a 24-bit VXLAN Network Identifier (VNI). Two IP backbone subnets are indicated. The 10.1.1.0/24 subnet includes two physical ESX servers named ESX-11 and ESX-12. The 10.1.2.0/24 subnet also has two hosts, named ESX-21 and ESX-22.
ESX-11 is hosting VMs 1, 3 and 11. The ESX hypervisor was configured with VXLAN 1001, which was bound to the virtual links for VM1 and VM3, while VM11 was assigned a VNI of 2001. Similar configurations exist on the other servers, as shown. The ESX-11 and -12 servers will encapsulate the Layer 2 frames into an IP packet, including the appropriate VNI. The result is that VMs 1 through 5 are functionally on the same Layer 2 multipoint subnet. Similarly, virtual network 2001 contains hosts VM11, 12, and 13. VXLANs 1001 and 2001 operate like traditional VLANs without any Layer 3 IP interface in them. They are completely isolated: the VMs within a VXLAN can communicate with each other, but they cannot communicate with any device outside of their own VXLAN.
VXLAN: VTEP The VXLAN Tunnel End Point (VTEP) provides an entry point into the VXLAN, while interacting with the other VTEPs to provide connectivity. In Figure A3-2, each physical ESX host has been assigned this role. They are responsible for encapsulating traffic from the VMs into the tunnel and sending the resulting packets to their destination. The receiving, destination VTEP must decapsulate the packet and deliver it toward the intended target.
Figure A3-2: VXLAN: VTEP
Essentially, the VTEP provides an “on-ramp” to the VXLAN by performing this frame encapsulation and decapsulation service.
VXLAN Tunnel and Multicast When one VXLAN VTEP or tunnel end point communicates with another, a VXLAN tunnel is established. A VXLAN tunnel is required for each destination VTEP in the
same VXLAN. In Figure A3-3, ESX 11, 12 and 21 are the physical devices hosting VMs assigned to VXLAN ID 1001, and so each of those devices maintains a tunnel with the other two. VMs assigned to VXLAN ID 2001 are hosted on servers ESX 11, 12 and 22, with a similar set of tunnels.
Figure A3-3: VXLAN Tunnel and Multicast
The tunnel is merely a mechanism of transport through the IP network. One tunnel can be used by multiple VXLAN IDs because each VM’s encapsulated packet contains its
VNI. This VNI is then used to send packets to the appropriate tunnel endpoints. This is why traffic from multiple VXLANs can use the same tunnel.

The underlying IP transport network uses multicasting to deliver broadcast, multicast, and unknown unicast frames for each VXLAN. Suppose VM1 sends a broadcast. Since hosts ESX 12 and 21 contain VMs in the same VXLAN, they must receive this broadcast. There are two ways to recreate this broadcast domain functionality. One method is to use head-end replication. With this method, host ESX 11 encapsulates the Layer 2 broadcast frame two times. It is encapsulated once as an IP unicast to host ESX 12, and again as a unicast to ESX 21. This unicast-based solution simplifies the deployment by eliminating the need for multicast services. However, larger deployments could have scalability issues, since every broadcast must be individually unicast to each host in the VXLAN.

The second method optimizes scalability by using multicasting. Using this solution, ESX 11 can encapsulate each Layer 2 broadcast in a single IP multicast packet, addressed to some multicast address, such as 239.1.1.1 in this scenario. All hosts are configured to know that VNI 1001 is bound to this address. Hosts ESX 12 and 21 will use IGMP to join the multicast group, informing the IP transport network that they need to receive traffic destined to 239.1.1.1. If properly configured for multicasting, the IP network will deliver the original transmission to both ESX hosts. This method does add a bit of complexity to the transport network deployment, since it must support multicast functionality. However, it saves resources and improves scalability, since each Layer 2 broadcast need only be transmitted once by the VTEP device, regardless of the VXLAN’s host count. Another advantage of multicasting is that only ESX hosts that actually require a VXLAN’s traffic will join the multicast group.
Suppose an administrator were to move VM5 from ESX-21 to some other physical server. ESX-21 would realize that it is no longer hosting VMs on VXLAN 1001, and send an IGMP leave message to the IP network, leaving the multicast group 239.1.1.1. This VXLAN functionality provides a major advantage over traditional VLANs. The network can respond to the changing relationship between Virtual Machines and physical server connections, and send broadcast/multicast traffic only to required destinations.
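The two delivery methods described above can be contrasted in a minimal sketch. The VTEP addresses and multicast group are the illustrative values from the scenario; the point is the number of packets the sending VTEP must emit.

```python
# Sketch of the two multi-destination delivery options for VXLAN 1001:
# head-end replication sends one unicast copy per remote VTEP, while
# multicast delivery sends a single packet that the network replicates.

remote_vteps = {"10.1.1.12", "10.1.2.21"}   # other VTEPs hosting VXLAN 1001 VMs
mcast_group = "239.1.1.1"                   # group bound to VNI 1001

def head_end_replication(frame):
    """Encapsulate the broadcast once per remote VTEP (unicast copies)."""
    return [(vtep, frame) for vtep in sorted(remote_vteps)]

def multicast_delivery(frame):
    """Encapsulate once; the IP network replicates to IGMP joiners."""
    return [(mcast_group, frame)]

frame = "ARP-broadcast-from-VM1"
print(len(head_end_replication(frame)))  # 2 packets leave the sending VTEP
print(len(multicast_delivery(frame)))    # 1 packet leaves the sending VTEP
```

With two remote VTEPs the difference is small, but head-end replication scales linearly with the VXLAN's host count, which is the scalability concern noted above.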
VXLAN Packet Structure Figures A3-4 and A3-5 show the VXLAN packet structure. The original Layer 2 frame is
encapsulated into a UDP IP frame, with an additional 8-byte VXLAN header inserted, which contains the VXLAN ID.
Figure A3-4: VXLAN Packet Structure
Figure A3-5: VXLAN Packet Structure cont.
This additional encapsulation adds fifty bytes of overhead to the transmission. The original frame has the 8-byte VXLAN header added, an 8-byte UDP header, another 20-byte IP header, and 14 bytes for Ethernet. For the original Layer 2 frame to support 1500-byte payloads, the MTU of the IP infrastructure should be increased to at least 1550 bytes.
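The overhead figures above add up as follows:

```python
# VXLAN encapsulation overhead, per the outer header sizes listed above.
ETHERNET = 14   # outer Ethernet header
IP = 20         # outer IPv4 header
UDP = 8         # outer UDP header
VXLAN = 8       # VXLAN header carrying the 24-bit VNI

overhead = ETHERNET + IP + UDP + VXLAN
print(overhead)         # 50 bytes of added overhead
print(1500 + overhead)  # 1550 -> minimum transport MTU for 1500-byte payloads
```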
VXLAN Traffic Flow Overview This section is focused on VXLAN traffic flow. The objective is to understand how unicast, multicast, and broadcast traffic is accommodated through a VXLAN-capable infrastructure.
We will examine a scenario that focuses only on VMware ESX hypervisor connectivity to the VXLAN, before delving into typical multicast and unicast scenarios. You will understand the packet flow that supports VXLAN communications, including MAC learning and interconnecting a VXLAN-based system to traditional networks. Finally, you will learn about Comware hardware-based VXLAN Gateway operation, providing a solution to bridge VXLANs to physical VLANs.
VXLAN: Multi-Destination Delivery The scenario in Figure A3-6 describes multi-destination delivery, showing how VM1 sends an ARP broadcast intended for VM5.
Figure A3-6: VXLAN: Multi-Destination Delivery
VM1 sends an ARP broadcast, which is delivered through the virtual link to the VTEP. ESX-11 receives this frame from the VM’s virtual port, which is assigned to VXLAN 1001. ESX-11 determines that the destination MAC address is a broadcast. ESX-11 encapsulates the packet into a VXLAN frame, with VXLAN ID 1001. The source IP address is ESX-11’s IP address of 10.1.1.11, and the destination IP address is the configured multicast address of 239.1.1.1. ESX-11 sends this encapsulated frame into the transport IP network.
VXLAN: Multi-Destination Delivery
The transport IP network will deliver the IP multicast packet to all of the joined hosts. This assumes that multicast routing has been properly configured on this infrastructure. Another assumption is that the remote VTEPs have joined the IP multicast group using IGMP. If so, the IP packet is delivered to ESX-12 and ESX-21, and any other members of the 239.1.1.1 multicast group. As shown in Figure A3-7, this scenario will focus on ESX-21, since the same activity applies to ESX-12.
Figure A3-7: VXLAN: Multi-Destination Delivery
ESX-21 decapsulates the VXLAN packet and so discovers the source VM’s virtual MAC address on the tunnel interface. It binds this MAC address to the ESX-11 source IP address of 10.1.1.11. ESX-21 then reads the destination MAC address, which is a broadcast in this case. Any multicast, broadcast, or unknown unicast destination is flooded to all of ESX-21’s local virtual ports assigned to VXLAN 1001. In this case, only VM5 is active, and so it receives the ARP request sent by VM1.
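The flood-and-learn behavior just described can be sketched in a few lines of Python. This is an illustrative model only, not VTEP software; the MAC addresses and ESX-21's VTEP IP address are hypothetical, since the text gives only ESX-11's 10.1.1.11.

```python
# Minimal flood-and-learn sketch of the VTEP forwarding logic described
# above. A real VTEP implements this in the hypervisor vSwitch or in
# switch hardware; all addresses here are illustrative.

BROADCAST = "ff:ff:ff:ff:ff:ff"

class Vtep:
    def __init__(self, ip, vni, group):
        self.ip = ip            # this VTEP's source IP address
        self.vni = vni          # VXLAN ID, e.g. 1001
        self.group = group      # multicast group, e.g. 239.1.1.1
        self.mac_table = {}     # inner MAC address -> remote VTEP IP

    def encapsulate(self, dst_mac):
        """Pick the outer destination IP for a frame from a local VM."""
        if dst_mac == BROADCAST or dst_mac not in self.mac_table:
            return self.group           # multi-destination: flood via multicast
        return self.mac_table[dst_mac]  # known host: unicast to its VTEP

    def decapsulate(self, inner_src_mac, outer_src_ip):
        """Learn the inner source MAC against the sending VTEP's IP."""
        self.mac_table[inner_src_mac] = outer_src_ip

# ESX-21 receives VM1's ARP broadcast, encapsulated by ESX-11 (10.1.1.11)
esx21 = Vtep(ip="10.1.2.21", vni=1001, group="239.1.1.1")
esx21.decapsulate(inner_src_mac="00:50:56:00:00:01", outer_src_ip="10.1.1.11")

# VM5's unicast reply to VM1 can now go straight to ESX-11's IP,
# while a frame to an unknown MAC would still be flooded to the group.
print(esx21.encapsulate("00:50:56:00:00:01"))   # 10.1.1.11
print(esx21.encapsulate("00:50:56:00:00:99"))   # 239.1.1.1
```

The key point the sketch captures is that learning binds the inner (VM) MAC address to the outer (VTEP) IP address, never to the outer MAC address.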
VXLAN Unicast Delivery

VM5 has now received the ARP request and responds to VM1 with an ARP reply, which is a unicast packet. ESX-21 receives the packet from VM5’s virtual port, which is assigned to VXLAN 1001, as shown in Figure A3-8.
Figure A3-8: VXLAN Unicast Delivery
ESX-21 looks up the destination MAC address, which is VM1’s MAC address. Since ESX-21 just received an ARP request from VM1, it has an entry in its MAC address table for this host. It will use its outbound port towards ESX-11’s IP address. ESX-21 encapsulates the packet into a VXLAN frame tagged with VXLAN ID 1001. Its own IP address is the source, and the destination IP address will be ESX-11’s IP address of 10.1.1.11. A multicast is not needed in this case, since the destination is known. This unicast IP datagram is sent into the transport IP network. Once encapsulated, VXLAN packets can be treated like any other IP-based traffic, benefiting from the various equal-cost load-balancing capabilities offered by a typical routed IP infrastructure for intra-VXLAN delivery.
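One reason VXLAN traffic load-balances so well is that the encapsulating VTEP derives the outer UDP source port from a hash of the inner frame's headers (RFC 7348 recommends this), so different VM-to-VM flows get different outer 5-tuples. The sketch below illustrates the idea; the CRC32-based hash and port range are illustrative, not the algorithm any particular VTEP uses.

```python
# Illustrative sketch: outer UDP source port derived from the inner
# flow, giving ECMP routers per-flow entropy. Real VTEPs use their own
# hash; this one is only for demonstration.
import zlib

def outer_udp_src_port(inner_src_mac: str, inner_dst_mac: str) -> int:
    flow_hash = zlib.crc32(f"{inner_src_mac}->{inner_dst_mac}".encode())
    return 49152 + (flow_hash % 16384)   # stay in the dynamic port range

# Two different inner flows between the same pair of VTEPs get
# (generally) different outer source ports, so transit routers hashing
# on the outer 5-tuple can spread them over equal-cost paths.
p1 = outer_udp_src_port("00:50:56:00:00:01", "00:50:56:00:00:05")
p2 = outer_udp_src_port("00:50:56:00:00:02", "00:50:56:00:00:06")
print(p1, p2)
```

Because the port is a deterministic function of the inner flow, all packets of one flow follow one path, preserving packet ordering while still spreading distinct flows.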
As shown in Figure A3-9, ESX-11 receives and decapsulates the VXLAN packet, learning VM5’s source MAC address. It binds this MAC address to ESX-21’s source IP address as the outgoing interface. It then reads the destination MAC address, which is VM1’s address in this case, and forwards the packet to the local virtual port assigned to VM1. VM1 receives the ARP reply.
Figure A3-9: VXLAN Unicast Delivery
The result of the multicast and unicast flows that we have just analyzed is that all VMs have communication at Layer 2, and all the VTEPs have built a table that associates learned MAC addresses with VTEP IP addresses. Therefore, each source can deliver frames directly to any valid destination host.
Learning Activity: VXLAN Review
Refer to the figure above when necessary, and answer the following questions.
1. Which devices above must have a tunnel between them, and why? Assuming two VXLANs must be supported, how many tunnels must be active?
_______________________________________________________________
2. For devices in the IP network to successfully support VXLAN, what should their MTUs be set to, and why?
_______________________________________________________________
3. What are two approaches to supporting multi-destination traffic, and why might you choose one over the other?
_______________________________________________________________
4. ESX-12 has just received a frame from vm1, destined for vm4. How will ESX-12 keep track of vm1’s MAC address?
a. ESX-12 does not need to keep track of this, since it is handled by the fabric
b. It maps the source MAC address of the outermost Ethernet header to its local interface
c. It maps the source MAC address of the original Ethernet header to its local interface’s IP address
d. It maps the source MAC address of the original Ethernet header to ESX-11’s IP address
e. It maps the FC-ID from the native frame to ESX-11’s IP address
5. When ESX-12 sends the response from vm4 back to vm1, what will it use as the VXLAN ID, source IP address, and destination IP address?
_______________________________________________________________
Learning Activity: Answers

1. Which devices above must have a tunnel between them, and why? Assuming two VXLANs must be supported, how many tunnels must be active?
Each VTEP must form a tunnel with the other VTEPs in the same VXLAN (in other words, VTEPs with the same VNI assignment). Since physical hosts ESX-11, 12, and 21 all support VXLAN 1001, each must have a tunnel to the other two. Since hosts ESX-11, 12, and 22 support VXLAN 2001, each must have a tunnel to the other two. Multiple VXLANs can be supported over a single tunnel, so only one tunnel needs to form between VTEPs, regardless of the number of VXLANs supported.
2. For devices in the IP network to successfully support VXLAN, what should their MTUs be set to, and why?
VXLAN adds a 20-byte IP header, an 8-byte UDP header, an 8-byte VXLAN header, and a 14-byte Ethernet header; therefore, the MTU must be at least 1550 bytes on all device interfaces in the IP network.
3. What are two approaches to supporting multi-destination traffic, and why might you choose one over the other?
Head-end replication is one solution, in which each multi-destination frame is replicated for each VTEP that supports the VXLAN. This is a less scalable method, due to the overhead associated with processing multiple unicast frames. However, for smaller deployments it allows for a simple IP network, since multicast support need not be configured. For larger networks, multicast support can be configured on the IP network, and then leveraged by VXLAN to more efficiently forward frames. The added complexity of having to configure multicast support is likely worth the additional efficiency for large and growing deployments.
4. ESX-12 has just received a frame from vm1, destined for vm4. How will ESX-12 keep track of vm1’s MAC address?
The answer is d: it maps the source MAC address of the original Ethernet header to ESX-11’s IP address.
5. When ESX-12 sends the response from vm4 back to vm1, what will it use as the VXLAN ID, source IP address, and destination IP address?
Both VMs in this scenario are on VXLAN 1001, so that would be used. The source IP address would be 12.1.2.12, and the destination IP address would be 10.1.1.11.
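The arithmetic behind answer 2 can be checked directly. The sketch below just itemizes the encapsulation overhead; the 1500-byte figure is the standard Ethernet payload MTU assumed by the answer.

```python
# The VXLAN encapsulation overhead itemized, per answer 2 above.
overhead = {
    "outer IP header": 20,
    "outer UDP header": 8,
    "VXLAN header": 8,
    "inner Ethernet header": 14,
}
total = sum(overhead.values())
print(total)          # 50 bytes of added headers
print(1500 + total)   # minimum transport-network MTU for a 1500-byte payload
```

Note that if the transport network also carries an outer 802.1Q tag, another 4 bytes should be budgeted on top of the 1550.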
VXLAN to Physical Network

Having explored how devices on the same VXLAN communicate, the focus is now on routing between VXLANs and external networks. Some kind of gateway function must allow VMs to access external networks. This is similar to how a host connected to a traditional VLAN requires a default gateway on the same VLAN. There are three possible solutions for inter-VXLAN routing:
■ VXLAN Layer 3 VM-based: This solution places routing services inside the ESX host, using the HP Virtual Services Router.
■ VXLAN Layer 3 VMware Edge Gateway: This solution relies on a VMware product inside the ESX host to provide routing services.
■ VXLAN Layer 2 hardware: This solution exposes the VXLAN to an external VLAN connection, using a hardware-based VXLAN-to-VLAN gateway, like the HP 5930.
VXLAN to Physical Network: VM-Based IP Routing

With VM-based IP routing, you create a virtual machine that has two or more virtual NICs. One vNIC is bound to the VXLAN network object on the ESX server, while the other vNIC is bound to a traditional VMNet VLAN in a traditional vSwitch defined on the ESX server. This VMNet VLAN is associated with a physical NIC on the ESX host, which is connected to a switch port configured to support that VLAN. Figure A3-10 highlights this scenario for the ESX-11 server. To support VM-based routing, VM2 is created. This VM includes a G1/0 vNIC bound to VXLAN 1001, with IP address 192.168.1.1. Its G2/0 interface is bound to VMNet VLAN 102, and has an IP address of 192.168.2.1.
Figure A3-10: VXLAN to Physical Network: VM-Based IP Routing
Host VM1 is configured on VXLAN 1001 as before, and its default gateway is the 192.168.1.1 address. As with traditional default gateways, the VM2 virtual gateway device accepts frames from hosts and routes them out its G2/0 interface. Traffic sent out VM2’s G2/0 interface includes an 802.1Q VLAN tag of 102 as it exits the physical NIC. The attached switch port is configured to support this VLAN, and so accepts it inbound and forwards it on through the IP transport network. Of course, this VM2 virtual router must exchange routes with the physical routers on the IP transport network. The example in Figure A3-10 shows VM2 running on physical host ESX-11. However, this virtual router could operate on any ESX host that has access to VXLAN 1001 and is connected to a physical switch port that supports VLAN 102. If the virtual service router is moved using vMotion, the logical router topology does not change. This solution is viable with any operating system that allows routing. For example,
******ebook converter DEMO Watermarks*******
the HP Comware portfolio includes the HP Virtual Services Router (VSR). This is a software-based router optimized for hypervisor environments. Currently, this solution supports routed access only. It is not possible to connect the 192.168.1.1 VXLAN directly to an external, traditional VLAN.
VXLAN to Physical Network: VMware Edge Gateway

This solution, shown in Figure A3-11, is very similar to the previous solution, but deployed using a VMware product. The VMware guest operating system can provide routing, filtering, address translation, and firewall services. The fundamental operation of this solution is as described in the previous example.
Figure A3-11: VXLAN to Physical Network: VMware Edge Gateway
VXLAN to Physical Network: Hardware Gateway

The third solution available for VXLAN-to-physical network connectivity uses external hardware, such as the HP Comware 5930, as shown in Figure A3-12. The 5930 supports this functionality due to its newer Trident II chipset. The 5900, with its earlier Trident+ chipset, cannot support this VXLAN functionality.
Figure A3-12: VXLAN to Physical Network: Hardware Gateway
With this solution, hosts in a VXLAN are directly mapped to a traditional VLAN, using a Virtual Switch Instance (VSI). This is a Layer 2 connection. There is no longer a software gateway, with its associated IP addressing, required to connect the virtual and physical environments. The VXLAN is simply mapped to an external VLAN. IP routing services are not provided by the 5930 Layer 2 gateway. This requires an external routing device. Routing can be provided internally by a VM routing service, such as the aforementioned HP VSR or VMware Edge Gateway. Alternatively, routing could be provided via a physical IP routing switch. This provides two ways to deploy this solution, and the choice between them is up to the network administrator.
Note: The upcoming 7900 and 12900 LPUs will support VXLAN termination and routing on the same device.
Hardware Gateways and OVSDB

VTEP configurations on the ESX host are created and maintained by the vSphere management software, using the Open vSwitch Database (OVSDB) format, as described in Figure A3-13. The moment the virtual machines start, this management software knows that they are bound to specific VXLANs. It notifies all other VTEP devices to create the appropriate internal interfaces, providing a kind of centralized configuration management.
Figure A3-13: Hardware Gateways and OVSDB
Design Considerations

Figure A3-14 introduces some design considerations related to VXLAN connectivity.
Figure A3-14: Design Considerations
First, the IP transport network should support IP multicast. This means that PIM must be configured to accommodate routed links between the VTEPs. Also, IGMP must be configured to avoid Layer 2 flooding. Of course, if all hosts are on the same subnet, there is no routing, and therefore the need for multicast routing is eliminated. Also, you must provision IP multicast ranges for the VXLAN IDs. Multiple VXLANs can be mapped to the same transport multicast IP address. However, this will result in sub-optimal delivery for VXLAN multi-destination traffic. For example, suppose that both VXLAN 1001 and 1002 are mapped to multicast address 239.1.1.1. The broadcast traffic for VXLAN 1001 would physically arrive on all the tunnel endpoints of both VXLAN 1001 and 1002. This effectively increases the size of the broadcast domain, and causes undue processing on endpoints. VTEPs which only have VMs on VXLAN 1001 will receive broadcasts for VXLAN 1002. They will decapsulate these packets, realize that they are not needed, and then discard them.
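The shared-group trade-off just described can be made concrete with a small sketch. The VTEP membership data below is hypothetical and only serves to show which devices physically receive a flood.

```python
# Sketch of the design consideration above: when two VNIs share one
# transport multicast group, every VTEP joined to that group receives
# (and must decapsulate, then discard) the other VXLAN's floods.
vni_to_group = {1001: "239.1.1.1", 1002: "239.1.1.1"}   # shared mapping
vtep_local_vnis = {                                     # VTEP IP -> local VNIs
    "10.1.1.11": {1001},          # has VMs on VXLAN 1001 only
    "10.1.1.12": {1001, 1002},
    "10.1.2.21": {1002},
}

def flood_receivers(vni):
    """VTEPs that physically receive a multi-destination frame for this VNI."""
    group = vni_to_group[vni]
    return {ip for ip, vnis in vtep_local_vnis.items()
            if any(vni_to_group[v] == group for v in vnis)}

# A VXLAN 1002 broadcast also reaches 10.1.1.11, which has no VMs on
# 1002 and will drop the packet after decapsulating it.
print(sorted(flood_receivers(1002)))
```

Giving each VNI its own group shrinks `flood_receivers` to exactly the interested VTEPs, which is why a unique multicast address per VXLAN is recommended later in this appendix.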
Configuration Steps for VXLAN

Figure A3-15 introduces the steps to configure VXLAN, and depicts a sample topology. In this scenario, the transport IP network is multicast-enabled. Switch 5930-1 is near the top of the diagram, while 5930-2 is near the bottom. These are the two VTEPs. They will be configured to set up a tunnel interface so that VXLAN 1001 traffic can traverse the transport IP network. That specific VXLAN will also be delivered to physical VLAN 101.
Figure A3-15: Configuration Steps for VXLAN
The tunnel interface on each switch will be bound to the VSI defined for VXLAN 1001. A Service Instance will be created to bind the physical interface VLAN traffic to the VSI. The top server, with IP address 192.168.1.11, is connected to an access port in VLAN 101. That switch will tag its uplink traffic when sending it to 5930-1, which maps VLAN 101 to VXLAN 1001. It delivers frames to the VSI, which processes the traffic and sends it out over the tunnel interface. Broadcasts from the 192.168.1.11 server are delivered to the tunnel interface over the VXLAN network. This VXLAN-encapsulated packet arrives at 5930-2, on the tunnel interface bound to the VSI. The VSI performs local flooding, sending the packet on the local wire, tagged with VLAN 101. This traffic is processed by the external access switch, which recognizes it as a broadcast frame and floods it out all ports in VLAN 101, including the port connected to the server at IP address 192.168.1.12.
Step 1: Configure Global L2VPN

Shown in Figure A3-16, the first step is to enable the Layer 2 VPN globally. This is the same command that is used for Layer 2 VPN/VPLS configuration. This enables the VSI model inside the Comware 5930 switch hardware.
Figure A3-16: Step 1: Configure Global L2VPN
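Since the figure is not reproduced in this text, here is a sketch of the command in Comware 7 syntax; verify it against Figure A3-16 and your platform's documentation:

```
[5930-1] l2vpn enable
```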
Step 2: Configure VXLAN Tunnel

The next step is to create a VXLAN tunnel. This provides the VXLAN encapsulation services over the transport IP network. To ensure a stable, unchanging tunnel source IP address, a Loopback interface is created and specified. These are point-to-point tunnels, so you must create a tunnel interface for each remote VTEP. This scenario only uses a single remote 5930, so only one tunnel endpoint is defined here. As previously mentioned, this step must be done manually, since OVSDB information cannot currently be shared between hardware gateways and ESX hosts. Should this capability become available, then the vSphere management host could dynamically create new tunnels on the hardware gateway. In Figure A3-17, interface Loopback 0 is defined and an IP address is assigned to it. The tunnel interface is created with VXLAN mode, and the tunnel’s local source and remote destination loopback addresses are specified. On the 5930-2 switch, the loopback address would be 10.2.0.22, and the tunnel’s source and destination addresses would be reversed.
Figure A3-17: Step 2: Configure VXLAN Tunnel
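A sketch of what Figure A3-17 likely shows, in Comware 7 syntax. The text only states 5930-2's loopback (10.2.0.22), so the 10.1.0.11 address for 5930-1 is an assumption for illustration:

```
[5930-1] interface loopback 0
[5930-1-LoopBack0] ip address 10.1.0.11 32
[5930-1-LoopBack0] quit
[5930-1] interface tunnel 0 mode vxlan
[5930-1-Tunnel0] source 10.1.0.11
[5930-1-Tunnel0] destination 10.2.0.22
```

On 5930-2, the source and destination values would simply be swapped.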
Step 3: Create VSI VXLAN + Bind VXLAN Tunnel

Next, the VSI is defined. This VSI will support a local, physical interface, which is bound to the Service Instance, and the VXLAN tunnel interface. The VSI actually contains the VXLAN identifier. In Figure A3-18, a VSI named Customer1 is created and associated with VXLAN 1001. VXLAN 1001 is in turn bound to the previously created tunnel.
Figure A3-18: Step 3: Create VSI VXLAN + Bind VXLAN Tunnel
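In Comware 7 syntax, the VSI, VXLAN, and tunnel binding just described might look like the following sketch (verify against Figure A3-18):

```
[5930-1] vsi Customer1
[5930-1-vsi-Customer1] vxlan 1001
[5930-1-vsi-Customer1-vxlan-1001] tunnel 0
```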
If multiple tunnels to multiple VTEP endpoints were required, more tunnel end points would be created, and those tunnels would also be added as virtual ports to the virtual switches.
Step 4: Create Service Instance

A service instance defines which local traffic should be connected to the VXLAN VSI. The Service Instance ID is locally significant to the interface only. In Figure A3-19, interface Ten-GigabitEthernet 1/0/2 is configured with service-instance 10, which is configured to match VLAN 101.
Figure A3-19: Step 4: Create Service Instance
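A sketch of the service-instance configuration in Comware 7 syntax, where `encapsulation s-vid 101` matches frames tagged with VLAN 101 (verify against Figure A3-19):

```
[5930-1] interface ten-gigabitethernet 1/0/2
[5930-1-Ten-GigabitEthernet1/0/2] service-instance 10
[5930-1-Ten-GigabitEthernet1/0/2-srv10] encapsulation s-vid 101
```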
At this point in the discussion, some might remember that the point of having VXLANs was to have more than 4000 VLANs. However, once VXLANs are mapped to traditional VLANs, the old VLAN limitation seems to resurface. This doesn’t have to be the case. With this model, VXLANs can be bound to 4000 VLAN IDs on interface Ten1/0/2, and another set of VXLANs can be bound to 4000 other VLANs on another physical interface. The VLAN 101 on interface Ten1/0/2 has nothing to do with the VLAN 101-tagged traffic on interface Ten1/0/3. Therefore, you can distribute blocks of 4000 VLANs over different physical ports to different regions of the data center. This is possible because of the VSI-to-Service Instance mapping you will configure in the next step.
Step 5: Bind Service Instance to VSI

Next, the Service Instance is bound to the VSI, as shown in Figure A3-20. This creates a kind of cross-connect between the service instance and the VSI. Since this cross-connect directly maps the physical interface to a VSI, the traffic is not processed globally by the switch as VLAN 101 traffic. It is only processed inside the VSI.
Figure A3-20: Step 5: Bind Service Instance to VSI
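The cross-connect is a single command under the service-instance view. A sketch in Comware 7 syntax (verify against Figure A3-20):

```
[5930-1] interface ten-gigabitethernet 1/0/2
[5930-1-Ten-GigabitEthernet1/0/2] service-instance 10
[5930-1-Ten-GigabitEthernet1/0/2-srv10] xconnect vsi Customer1
```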
Globally defined VLAN 101 operates totally independently of the VLAN 101 that is processed by the service instance on this physical interface.
Step 6: Configure Transport IP Interface IGMP

The next step is to configure transport IP interfaces and IGMP. An IP connection to the remote VTEP loopback address is required, and this interface needs to support an IGMP client function. In the example in Figure A3-21, the gigabit interface 1/0/1 is assigned an IP address, IGMP is enabled, and the IGMP host function is set. This causes the interface to act as an IP multicast endpoint and use IGMP to join a multicast group, just like an actual client endpoint would.
Figure A3-21: Step 6: Configure Transport IP Interface IGMP
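A sketch of the transport interface configuration in Comware 7 syntax. The 10.1.1.1 address is an assumption for illustration, since the figure's addressing is not reproduced here, and multicast routing typically also needs to be enabled globally; verify against Figure A3-21:

```
[5930-1] interface gigabitethernet 1/0/1
[5930-1-GigabitEthernet1/0/1] ip address 10.1.1.1 24
[5930-1-GigabitEthernet1/0/1] igmp enable
[5930-1-GigabitEthernet1/0/1] igmp host enable
```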
Step 7: Configure VXLAN VSI Multicast Address

Now that IGMP host functionality is enabled, the multicast group to join is specified, along with the source IP address used to join that group, as shown in Figure A3-22.
Figure A3-22: Step 7: Configure VXLAN VSI Multicast Address
It is recommended to use a unique multicast IP address per VXLAN. This optimizes efficiency and reduces the unnecessary processing of frames, as previously explained.
Step 8: Verify

Figure A3-23 shows several display commands that can be used to validate your configuration efforts. This includes validating the status of the tunnel interface and the VSI, as well as checking MAC addresses.
Figure A3-23: Step 8: Verify
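Typical Comware 7 display commands for these checks are sketched below; the exact set shown in Figure A3-23 may differ:

```
[5930-1] display interface tunnel 0
[5930-1] display l2vpn vsi verbose
[5930-1] display l2vpn mac-address
```

The tunnel interface should show as up, the VSI should list the tunnel as a virtual port, and the MAC address output should show remote VM MACs learned against the tunnel.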
Summary

In this appendix, you learned that:
■ VXLANs extend the scalability of traditional VLANs to support over 16 million broadcast domains, while improving deployment and management flexibility and efficiency for data center administrators.
■ Supported on 5930 ToR and 7900, 11900, and 12900 products, VXLAN is an IP-based overlay technology that tunnels Layer 2 frames inside IP datagrams. Each physical server or switch has a tunnel endpoint called a VXLAN Tunnel Endpoint (VTEP). Both Layer 2 broadcast and unicast frames are tunneled through a traditional IP transport network to provide all the functionality of a single broadcast domain.
■ VXLAN and physical networks can be interconnected with Layer 3 routing functionality deployed inside a hypervisor environment, or via a Layer 2 hardware VXLAN-to-VLAN gateway, such as the HP 5930.
■ The IP transport network can support VXLANs using unicast or multicast services. Unicast is simple to deploy, but has scalability and processing concerns. A multicast deployment optimizes bandwidth and packet processing utilization.
■ You can configure a 5930 to operate as a VXLAN-to-VLAN gateway.
Learning Check

Answer each of the questions below.
1. What are three advantages of VXLAN? (Choose three.)
a. It is an IEEE standards-based protocol.
b. Any existing IP routed infrastructure can be used.
c. The VXLAN ID is a 16-bit number, allowing for over 64,000 VLAN IDs.
d. It provides flexibility, since it is an IP-based Layer 2 overlay technology.
e. Scalability is further enhanced through the use of BGP extensions.
2. Choose three correctly described components of a typical VXLAN deployment. (Choose three.)
a. The VTEP provides an entry point into the VXLAN.
b. Each participating VM host is assigned a VNI.
c. A VXLAN tunnel is formed between two VNI-assigned VMware hosts.
d. VXLAN can use a multicast IP address for Layer 2 multi-destination delivery.
e. VXLAN requires IP multicast capability, since that is the only method of transporting broadcast frames across the VXLAN fabric.
3. Because of the additional header information added by VXLAN systems, the MTU of the IP infrastructure should be increased to at least 1550 bytes.
a. True.
b. False.
4. What are three possible solutions for routing between VXLANs? (Choose three.)
a. The internal, native vRouter on the hypervisor.
b. The HP Virtual Services Router.
c. VXLAN Layer 3 VMware Edge Gateway.
d. A hardware-based VXLAN-to-VLAN gateway, like the HP 5930.
e. Any Layer 3-capable device.
5. What are two design considerations for VXLAN deployments? (Choose two.)
a. Avoid multicast routing protocol configuration on the IP infrastructure.
b. IGMP must be configured to avoid Layer 2 flooding.
c. You need to configure IP multicast ranges for the VXLAN IDs.
d. Mapping multiple VXLANs to the same transport IP address can improve the efficiency of packet delivery.
Learning Check Answers

1. b, d, e
2. a, b, d
3. a
4. b, c, d
5. b, c