Advanced Data Center Switching 14.a
Student Guide
Worldwide Education Services 1133 Innovation Way Sunnyvale, CA 94089 USA 408-745-2000 www.juniper.net
Course Number: EDU-JUN-ADCX
This document is produced by Juniper Networks, Inc. This document or any part thereof may not be reproduced or transmitted in any form under penalty of law, without the prior written permission of Juniper Networks Education Services. Juniper Networks, Junos, Steel-Belted Radius, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. The Juniper Networks Logo, the Junos logo, and JunosE are trademarks of Juniper Networks, Inc. All other trademarks, service marks, registered trademarks, or registered service marks are the property of their respective owners.
Advanced Data Center Switching Student Guide, Revision 14.a Copyright © 2016 Juniper Networks, Inc. All rights reserved. Printed in USA. Revision History: Revision 14.a—April 2016 The information in this document is current as of the date listed above. The information in this document has been carefully verified and is believed to be accurate for software Release 14.1X53. Juniper Networks assumes no responsibilities for any inaccuracies that may appear in this document. In no event will Juniper Networks be liable for direct, indirect, special, exemplary, incidental, or consequential damages resulting from any defect or omission in this document, even if advised of the possibility of such damages.
Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice. YEAR 2000 NOTICE Juniper Networks hardware and software products do not suffer from Year 2000 problems and hence are Year 2000 compliant. The Junos operating system has no known time-related limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036. SOFTWARE LICENSE The terms and conditions for using Juniper Networks software are described in the software license provided with the software, or to the extent applicable, in an agreement executed between you and Juniper Networks, or Juniper Networks agent. By using Juniper Networks software, you indicate that you understand and agree to be bound by its license terms and conditions. Generally speaking, the software license restricts the manner in which you are permitted to use the Juniper Networks software, may contain prohibitions against certain uses, and may state conditions under which the license is automatically terminated. You should consult the software license for further details.
Chapter 2: Next Generation Data Centers
Traditional Multitier Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
Data Center Fabric Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
Chapter 3: IP Fabric
IP Fabric Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
IP Fabric Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-12
IP Fabric Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-25
Configure an IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-30
Lab: IP Fabric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-49
Chapter 4: VXLAN
Layer 2 Connectivity Over a Layer 3 Network . . . . . . . . . . . . . . . . . . . . . . . 4-3
VXLAN Using Multicast Control Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
VXLAN Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
Lab: VXLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-42
Chapter 5: EVPN
The Benefits of EVPN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
VXLAN Using EVPN Control Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
EVPN/VXLAN Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-31
Lab: EVPN Control Plane for VXLAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-56
Chapter 6: Data Center Interconnect
DCI Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
MPLS VPN Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
DCI Options for a VXLAN Overlay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-43
EVPN Type 5 Routes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-49
DCI Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-52
Lab: Data Center Interconnect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-63
This two-day course is designed to introduce various QFX5k and MX/vMX features including, but not limited to, IP Fabric, Virtual eXtensible Local Area Network (VXLAN) Layer 2 and Layer 3 gateways, VXLAN with Ethernet VPN (EVPN) signaling, and Data Center Interconnect (DCI) for a VXLAN overlay. Students will learn to configure and monitor these features of the Junos operating system running on the QFX5100 and vMX Series platforms. Through demonstrations and hands-on labs, students will gain experience configuring, monitoring, and analyzing the above features of the Junos OS. This course is based on software Release 14.1X53.
This course benefits individuals responsible for configuring and monitoring switching features of the Junos OS running on the QFX5k and MX Series platforms, including individuals in professional services, sales, and support organizations, as well as end users.
Advanced Data Center Switching (ADCX) is an advanced-level course.
The following are the prerequisites for this course:
•	Understanding of the OSI model;
•	Junos OS configuration experience—the Introduction to the Junos Operating System (IJOS) course or equivalent;
•	Advanced routing knowledge—the Advanced Junos Enterprise Routing (AJER) course or equivalent; and
•	Intermediate switching knowledge—the Junos Enterprise Switching Using Enhanced Layer 2 Software (JEX-ELS) and Data Center Switching (DCX) courses or equivalent.
After successfully completing this course, you should be able to:
•	Describe the benefits and challenges of the traditional multitier architecture.
•	Describe the new networking requirements in a data center.
•	Describe the various data center fabric architectures.
•	Explain routing in an IP Fabric.
•	Describe how to scale an IP Fabric.
•	Configure an EBGP-based IP Fabric.
•	Explain why you would use VXLAN in your data center.
•	Describe the control and data plane of VXLAN in a controller-less overlay.
•	Describe how to configure and monitor VXLAN when using multicast signaling.
•	Describe the benefits of using EVPN signaling for VXLAN.
•	Describe the operation of the EVPN protocol.
•	Configure and monitor EVPN signaling for VXLAN.
•	Define the term Data Center Interconnect.
•	Describe the control and data plane of an MPLS VPN.
•	Describe the DCI options when using a VXLAN overlay with EVPN signaling.
Chapter 1: Course Introduction
Chapter 2: Next Generation Data Centers
Chapter 3: IP Fabric (Lab: IP Fabric)
Chapter 4: VXLAN (Lab: VXLAN)
Chapter 5: EVPN (Lab: VXLAN with EVPN Signaling)
Chapter 6: Data Center Interconnect (Lab: Data Center Interconnect)
Frequently throughout this course, we refer to text that appears in a command-line interface (CLI) or a graphical user interface (GUI). To make the language of these documents easier to read, we distinguish GUI and CLI text from chapter text according to the following conventions.

Franklin Gothic
Description: Normal text.
Usage example: Most of what you read in the Lab Guide and Student Guide.

Courier New
Description: Console text, including screen captures and noncommand-related syntax; also GUI text elements, including menu names and text field entries.
Usage examples:
    commit complete
    Exiting configuration mode
    Select File > Open, and then click Configuration.conf in the Filename text box.

You will also frequently see cases where you must enter input text yourself. Often these instances will be shown in the context of where you must enter them. We use bold style to distinguish text that is input versus text that is simply displayed.

Normal CLI and Normal GUI
Description: No distinguishing variant.
Usage examples:
    Physical interface: fxp0, Enabled
    View configuration history by clicking Configuration > History.

CLI Input and GUI Input
Description: Text that you must enter.
Usage examples:
    lab@San_Jose> show route
    Select File > Save, and type config.ini in the Filename field.

Finally, this course distinguishes between regular text and syntax variables, and it also distinguishes between syntax variables where the value is already assigned (defined variables) and syntax variables where you must assign the value (undefined variables). Note that these styles can be combined with the input style as well.

CLI Variable and GUI Variable
Description: Text where the variable value is already assigned.
Usage examples:
    policy my-peers
    Click my-peers in the dialog.

CLI Undefined and GUI Undefined
Description: Text where the variable's value is at the user's discretion, or where the variable's value as shown in the lab guide might differ from the value the user must input according to the lab topology.
Usage examples:
    Type set policy policy-name.
    ping 10.0.x.y
    Select File > Save, and type filename in the Filename field.
You can obtain information on the latest Education Services offerings, course dates, and class locations from the World Wide Web by pointing your Web browser to: http://www.juniper.net/training/education/.
The Advanced Data Center Switching Student Guide was developed and tested using software Release 14.1X53. Previous and later versions of software might behave differently, so you should always consult the documentation and release notes for the version of code you are running before reporting errors. This document is written and maintained by the Juniper Networks Education Services development team. Please send questions and suggestions for improvement to
[email protected].
You can print technical manuals and release notes directly from the Internet in a variety of formats:
•	Go to http://www.juniper.net/techpubs/.
•	Locate the specific software or hardware release and title you need, and choose the format in which you want to view or print the document.
Documentation sets and CDs are available through your local Juniper Networks sales office or account representative.
For technical support, contact Juniper Networks at http://www.juniper.net/customers/support/, or at 1-888-314-JTAC (within the United States) or 408-745-2121 (outside the United States).
Chapter 1: Course Introduction
•	Objectives and course content information;
•	Additional Juniper Networks, Inc. courses; and
•	The Juniper Networks Certification Program.
The slide asks several questions for you to answer during class introductions.
The slide lists the topics we discuss in this course.
The slide lists the prerequisites for this course.
The slide documents general aspects of classroom administration.
The slide describes Education Services materials that are available for reference both in the classroom and online.
The slide provides links to additional resources available to assist you in the installation, configuration, and operation of Juniper Networks products.
Juniper Networks uses an electronic survey system to collect and analyze your comments and feedback. Depending on the class you are taking, please complete the survey at the end of the class, or be sure to look for an e-mail about two weeks after class completion that directs you to complete an online survey form. (Be sure to provide us with your current e-mail address.) Submitting your feedback entitles you to a certificate of class completion. We thank you in advance for taking the time to help us improve our educational offerings.
Juniper Networks Education Services can help ensure that you have the knowledge and skills to deploy and maintain cost-effective, high-performance networks for both enterprise and service provider environments. We have expert training staff with deep technical and industry knowledge, providing you with instructor-led hands-on courses in the classroom and online, as well as convenient, self-paced eLearning courses. In addition to the courses shown on the slide, Education Services offers training in automation, E-Series, firewall/VPN, IDP, network design, QFabric, support, and wireless LAN.
Juniper Networks courses are available in the following formats:
•	Classroom-based instructor-led technical courses
•	Online instructor-led technical courses
•	Hardware installation eLearning courses as well as technical eLearning courses
•	Learning Bytes: short, topic-specific, video-based lessons covering Juniper products and technologies
Find the latest Education Services offerings covering a wide range of platforms at http://www.juniper.net/training/technical_education/.
A Juniper Networks certification is the benchmark of skills and competence on Juniper Networks technologies.
The Juniper Networks Certification Program (JNCP) consists of platform-specific, multitiered tracks that enable participants to demonstrate competence with Juniper Networks technology through a combination of written proficiency exams and hands-on configuration and troubleshooting exams. Successful candidates demonstrate a thorough understanding of Internet and security technologies and Juniper Networks platform configuration and troubleshooting skills. The JNCP offers the following features:
•	Multiple tracks;
•	Multiple certification levels;
•	Written proficiency exams; and
•	Hands-on configuration and troubleshooting exams.
Each JNCP track has one to four certification levels—Associate-level, Specialist-level, Professional-level, and Expert-level. The Associate-level, Specialist-level, and Professional-level exams are computer-based exams composed of multiple choice questions administered at Pearson VUE testing centers worldwide. Expert-level exams are composed of hands-on lab exercises administered at select Juniper Networks testing centers. Please visit the JNCP website at http://www.juniper.net/certification for detailed exam information, exam pricing, and exam registration.
The slide lists some options for those interested in preparing for Juniper Networks certification.
The Junos Genius application takes certification exam preparation to a new level. With Junos Genius you can practice for your exam with flashcards, simulate a live exam in a timed challenge, and even build a virtual network with device achievements earned by challenging Juniper instructors. Download the app now and Unlock your Genius today!
The slide lists some online resources to learn and share information about Juniper Networks.
If you have any questions or concerns about the class you are attending, we suggest that you voice them now so that your instructor can best address your needs during class.
Chapter 2: Next Generation Data Centers
•	The benefits and challenges of the traditional multitier architecture;
•	The networking requirements that are driving a change in data center design; and
•	The various data center fabric architectures.
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Legacy data centers are often hierarchical and consist of multiple layers. The diagram on the slide illustrates the typical layers, which include access, distribution (sometimes referred to as aggregation), and core. Each of these layers performs unique responsibilities. We cover the functions of each layer on a subsequent slide in this section. Hierarchical networks are designed in a modular fashion. This inherent modularity facilitates change and makes this design option quite scalable. When working with a hierarchical network, the individual elements can be replicated as the network grows. The cost and complexity of network changes is generally confined to a specific portion (or layer) of the network rather than to the entire network. Because functions are mapped to individual layers, faults relating to a specific function can be isolated to that function's corresponding layer. The ability to isolate faults to a specific layer can greatly simplify troubleshooting efforts.
The individual layers usually represent specific functions found within a network. It is often mistakenly thought that the access, distribution (or aggregation), and core layers must exist in clear and distinct physical devices, but this is not a requirement, nor does it make sense in some cases. The slide highlights the access, aggregation, and core layers and provides a brief description of the functions commonly implemented in those layers. If CoS is used in a network, it should be incorporated consistently in all three layers.
Data centers built utilizing a hierarchical implementation can bring some flexibility to designers:
•	Since using a hierarchical implementation does not require the use of proprietary features or protocols, a multitier topology can be constructed using equipment from multiple vendors.
•	A multitier implementation allows flexible placement of a variety of switching platforms. The simplicity of the protocols used does not require specific Junos versions or platform positioning.
Data centers built more than a few years ago face one or more of the following challenges:
•	The legacy multitier switching architecture cannot provide today's applications and users with predictable latency and uniform bandwidth. This problem is made worse when virtualization is introduced, where the performance of virtual machines (VMs) depends on the physical location of the servers hosting those VMs.
•	The management of an ever-growing data center is becoming more and more administratively taxing. While the north-to-south boundaries have been fixed for years, the east-to-west boundaries have not stopped growing. This growth of compute, storage, and infrastructure requires a new management approach.
•	The power consumed by networking gear represents a significant proportion of the overall power consumed in the data center. This challenge is particularly important today, when escalating energy costs are putting additional pressure on budgets.
•	The increasing performance and densities of modern CPUs have led to an increase in network traffic. The network is often not equipped to deal with the large bandwidth demands and the increased number of media access control (MAC) addresses and IP addresses on each network port.
•	Separate networks for Ethernet data and storage traffic must be maintained, adding to the training and management budget. Siloed Layer 2 domains increase the overall costs of the data center environment. In addition, outages related to the legacy behavior of the Spanning Tree Protocol (STP), which is used to support these legacy environments, often result in lost revenue and unhappy customers.
Given these challenges, along with others, data center operators are seeking solutions.
In the multitier topology displayed on the slide, you can see that almost half the links are not utilized. In this example you would also need to be running some type of spanning tree protocol to avoid loops, which would delay network convergence and introduce significant STP control traffic that takes up valuable bandwidth. This topology is relatively simple but allows us to visualize the lack of resource utilization. Imagine a data center with a hundred racks of servers and a hundred top-of-rack access switches. The access switches all aggregate up to the core/distribution switches, including redundant connections. In this much larger and more complicated network you would have thousands of physical cable connections that are not being utilized. Now imagine these connections are fiber; in addition to the unused cables, you would also have two transceivers per connection that are not being used. Because of this inefficient use of physical components, a significant amount of usable bandwidth sits idle.
The slide highlights the topic we discuss next.
Data centers must be flexible and change as users' needs change. This means that today's data centers must evolve, becoming flatter, simpler, and more flexible in order to keep up with constantly increasing end-user demands. Understanding why these changes are being implemented is important when trying to understand the needs of the customer. The reasons driving this change include:
•	Application flows: More east-west traffic is happening in data centers. With today's applications, a single user request can generate a lot of traffic between devices in a single data center; one request triggers a barrage of additional requests to other devices. This "go here, get this; then go there, get that" behavior of many applications occurs on such a large scale today that it is driving data centers to become flatter and to provide higher, more consistent performance.
•	Network virtualization: This means overlay networks, for example, NSX and Contrail. Virtualization is being implemented in today's data centers and will continue to gain popularity in the future. Some customers might not currently be using virtualization in their data centers, but it can definitely play a role in your design for customers that are forward-looking and eventually want to incorporate some level of virtualization.
•	Everything as a service: To be cost effective, a data center offering hosting services must be easy to scale out and scale back as demands change. The data center should be agile, making it easy to deploy new services quickly.
The graphic on the slide is designed to serve as a quick Juniper Networks data center architecture guide based strictly on the access (server) ports needed. The scaling numbers provided are calculated based on access switches that have 96 available server ports.
This combination is the recommended deployment method if the data center requires a standard multitier architecture. Multichassis link aggregation groups (MC-LAGs) are very useful in a data center when deployed at the access layer, allowing redundant connections to your servers while offering dual control planes. In addition to the access layer, MC-LAGs are also commonly deployed at the core layer. When MC-LAG is deployed in an active/active fashion, both links between the attached device and the MC-LAG peers are active and available for forwarding traffic. Using MC-LAG eliminates the need to run STP on member links and, depending on the design, can eliminate the need for STP altogether.
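The MC-LAG configuration itself is not reproduced on the slide. As a rough sketch only, an active/active MC-LAG on one of the two peers might look like the following; the interface names, ICCP peer address, and ID values are hypothetical examples, not values from this course's lab topology:

    # LACP parameters must match on both MC-LAG peers
    set interfaces ae0 aggregated-ether-options lacp active
    set interfaces ae0 aggregated-ether-options lacp system-id 00:01:02:03:04:05
    set interfaces ae0 aggregated-ether-options lacp admin-key 1
    # MC-AE parameters; chassis-id is 0 on this peer and 1 on the other
    set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
    set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
    set interfaces ae0 aggregated-ether-options mc-ae mode active-active
    set interfaces ae0 aggregated-ether-options mc-ae status-control active
    # ICCP session and protection link between the two MC-LAG peers
    set protocols iccp peer 10.0.0.2 local-ip-addr 10.0.0.1
    set multi-chassis multi-chassis-protection 10.0.0.2 interface ae1

The second peer would mirror this configuration with chassis-id 1 and status-control standby.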
The Juniper Networks Virtual Chassis Fabric (VCF) provides a low-latency, high-performance fabric architecture that can be managed as a single device. VCF is an evolution of the Virtual Chassis feature, which enables you to interconnect multiple devices into a single logical device, inside of a fabric architecture. A VCF is constructed using a spine-and-leaf architecture. In the spine-and-leaf architecture, each spine device is interconnected to each leaf device. A VCF supports up to 32 total devices, including up to four devices being used as the spine.
The QFabric System is composed of multiple components working together as a single switch to provide high-performance, any-to-any connectivity and management simplicity in the data center. The QFabric System flattens the entire data center network to a single tier where all access points are equal, eliminating the effects of network locality and making it the ideal network foundation for cloud-ready, virtualized data centers. QFabric is a highly scalable system that improves application performance with low latency and converged services in a non-blocking, lossless architecture that supports Layer 2, Layer 3, and Fibre Channel over Ethernet (FCoE) capabilities. The reason you can consider the QFabric system as a single system is that the Director software running on the Director group allows the main QFabric system administrator to access and configure every device and port in the QFabric system from a single location. Although you configure the system as a single entity, the fabric contains four major hardware components. The hardware components can be chassis-based, group-based, or a hybrid of the two.
Junos Fusion is a Juniper Networks Ethernet fabric architecture designed to provide a bridge from legacy networks to software-defined cloud networks. With Junos Fusion, service providers and enterprises can reduce network complexity and operational costs by collapsing underlying network elements into a single, logical point of management. The Junos Fusion architecture consists of two major components: aggregation devices and satellite devices. With this structure it can also be classified as a spine and leaf architecture. These components work together as a single switching system, flattening the network to a single tier without compromising resiliency. Data center operators can build individual Junos Fusion pods comprised of a pair of aggregation devices and a set of satellite devices. Each pod is a collection of aggregation and satellite devices that are managed as a single device. Pods can be small—for example, a pair of aggregation devices and a handful of satellites—or large, with up to 64 satellite devices, based on the needs of the data center operator.
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly using Layer 3, there are no proprietary features or protocols being used so this solution works very well with data centers that must accommodate multiple vendors. One of the most complicated tasks in building an IP Fabric is assigning all of the details like IP addresses, BGP AS numbers, routing policy, loopback address assignments, and many other implementation details.
Next generation data centers have different requirements than the traditional data center. One major requirement in a next generation data center is that traffic is load-balanced over the multiple paths between racks in a data center. Also, a requirement that is becoming less and less necessary is the ability of the underlying switch fabric to carry native Ethernet frames between VMs/servers in different racks. Some of the major reasons for this shift are:
1.	IP-only data: Many data centers simply need IP connectivity between racks of equipment. There is less and less need for the stretching of Ethernet networks over the fabric. For example, one popular compute and storage methodology is Apache's Hadoop. Hadoop allows for a large set of data (such as a single file that is terabits in size) to be stored in chunks across many servers in a data center. Hadoop also allows for the stored chunks of data to be processed in parallel by the same servers they are stored upon. The connectivity between the possibly hundreds of servers needs only to be IP-based.
2.	Overlay networking: Overlay networking allows for Layer 2 connectivity between racks; however, instead of Layer 2 frames being transferred natively over the fabric, they are tunneled using a different outer encapsulation. Virtual eXtensible Local Area Network (VXLAN), Multiprotocol Label Switching (MPLS), and generic routing encapsulation (GRE) are some of the common tunneling protocols used to transport Layer 2 frames across the fabric of a data center. One of the benefits of overlay networking is that when there is a change to Layer 2 connectivity between VMs/servers (the overlay network), the underlying fabric (the underlay network) can remain relatively untouched and unaware of the changes occurring in the overlay network.
The diagram above shows a typical scenario with a Layer 2 underlay network with attached servers that host VMs as well as virtual switches. The example shows the underlay network as an Ethernet fabric. The fabric solves some of the customer requirements, including load balancing over equal-cost paths (assuming Virtual Chassis Fabric) as well as having no blocked spanning tree ports in the network. However, this topology does not solve the VM agility problem or the 802.1Q VLAN overlap problem. Also, as 802.1Q VLANs are added to the virtual switches, those same VLANs must be provisioned on the underlay network. Managing the addition, removal, and movement of VMs (and their VLANs) for thousands of customers would be a nightmare for the operators of the underlay network.
Overlay networking can help solve many of the requirements and problems discussed in the previous slides. This slide shows the addition of an overlay network that includes the use of VXLAN. The overlay network consists of the virtual switches and the VXLAN tunnel endpoints (VTEPs). A VTEP will encapsulate the Ethernet frames that it receives from the virtual switch into IP and forward the resulting IP packet to the remote VTEP. The underlay network simply needs to forward IP packets between VTEPs. The receiving VTEP will de-encapsulate the VXLAN IP packets and then forward the resulting Ethernet frame to the appropriate VM. Adding and removing VMs from the data center has no effect on the underlay network. The underlay network simply needs to provide IP connectivity between the VTEPs. When implementing the underlay network in this scenario, you have a few choices. You can use an Ethernet fabric like Virtual Chassis (VC), Virtual Chassis Fabric (VCF), or Junos Fusion. All of these are valid solutions. Because all of the traffic crossing the underlay network is IP, the option for an IP fabric becomes available. The choice of underlay network comes down to scale and future growth. An IP fabric is considered to be the most scalable underlay solution.
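Chapter 4 covers the VXLAN configuration in detail; as a preview only, mapping a VLAN to a VXLAN network identifier (VNI) on a QFX5100 VTEP is conceptually as simple as the following sketch. The VLAN name, VNI, and multicast group shown are hypothetical examples:

    # The loopback address is used as the VTEP source (tunnel endpoint)
    set switch-options vtep-source-interface lo0.0
    # Map VLAN v100 to VNI 100100; BUM traffic uses the multicast group
    set vlans v100 vlan-id 100
    set vlans v100 vxlan vni 100100
    set vlans v100 vxlan multicast-group 233.252.0.100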
•	The benefits and challenges of the traditional multitier architecture;
•	The networking requirements that are driving a change in data center design; and
•	The various data center fabric architectures.
The slide lists this chapter's review questions; the answers follow below.
1. Some of the challenges of the traditional data center designs are the slow convergence times of xSTPs as well as the wasted resources of unused (blocked by xSTP) interfaces. 2. Some of the applications that are driving change in the data center are multitenancy, increase in east-to-west traffic, Hadoop, and overlay networking. 3. Layer 2 networks can be stretched over an IP network using an overlay like VXLAN or GRE.
Chapter 3: IP Fabric
•	Routing in an IP Fabric;
•	Scaling of an IP Fabric; and
•	Configuring an IP Fabric.
The slide lists the topics we will discuss. We discuss the highlighted topic first.
An IP Fabric is one of the most flexible and scalable data center solutions available. Because an IP Fabric operates strictly using Layer 3, there are no proprietary features or protocols being used so this solution works very well with data centers that must accommodate multiple vendors. Some of the most complicated tasks in building an IP Fabric are assigning all of the details like IP addresses, BGP AS numbers, routing policy, loopback address assignments, and many other implementation details.
In the 1950s, Charles Clos first wrote about his idea of a non-blocking, multistage, telephone switching architecture that would allow calls to be completed. The switches in his topology are called crossbar switches. A Clos network is based on a three-stage architecture: an ingress stage, a middle stage, and an egress stage. The theory is that there are multiple paths for a call to be switched through the network such that calls will always be connected and not "blocked" by another call. The term Clos "fabric" came about later as people began to notice that the pattern of links looked like threads in a woven piece of cloth. You should notice that the goal of the design is to provide connectivity from one ingress crossbar switch to an egress crossbar switch. Notice that there is no need for connectivity between crossbar switches that belong to the same stage.
The diagram shows an IP Clos fabric built using Juniper Networks switches. In an IP Fabric, the ingress and egress stage crossbar switches are called Leaf nodes. The middle stage crossbar switches are called Spine nodes. Most diagrams of an IP Fabric do not present the topology with three distinct stages as shown on this slide. Most diagrams show an IP Fabric with the ingress and egress stages combined into a single stage. It would be like taking the top of the diagram and folding it over onto itself, with all Spine nodes on top and all Leaf nodes on the bottom of the diagram (see the next slide).
To maximize the throughput of the fabric, each Leaf node should have a connection to each Spine node. This ensures each server-facing interface is always two hops away from any other server-facing interface. This creates a highly resilient fabric with multiple paths to all other devices. An important fact to keep in mind is that a member switch has no idea of its location (Spine or Leaf) in an IP Fabric. The Spine or Leaf function is simply a matter of a device's physical location in the fabric. In general, the choice of router to be used as a Spine node should be based partially on the interface speeds and number of ports that it supports. The example on the slide shows a design where every Spine node is a QFX5100-24Q. The QFX5100-24Q supports (32) 40GbE interfaces and was designed by Juniper to be a Spine node.
The slide shows that there are four distinct paths (one path per Spine node) between Host A and Host B across the fabric. In an IP Fabric, the main goal of your design should be that traffic is automatically load-shared over those equal-cost paths using a hash algorithm (keeping frames from the same flow on the same path).
IP Fabrics are generally structured in either a 3-stage topology or a 5-stage topology. A 3-stage topology is used in small to medium deployments. We cover the configuration of a 3-stage fabric in the upcoming slides. A 5-stage topology is used in a medium to large deployment. Although we do not cover the configuration of a 5-stage fabric, you should know that the configuration of a 5-stage fabric is quite complicated.
The slide shows some of the recommended Juniper Networks products that can act as Spine nodes. As stated earlier, you should consider port density and scaling limitations when choosing the product to place in the Spine location. Some of the pertinent features for a Spine node include overlay networking support, Layer 2 and Layer 3 VXLAN Gateway support, and number of VLANs supported.
The slide shows some of the recommended Juniper Networks products that can act as Leaf nodes.
The slide highlights the topic we discuss next.
The slide highlights the desired routing behavior of a Leaf node. Ideally, each Leaf node should have multiple next hops to use to load-share traffic over the IP fabric. Notice that router C can use two different paths to forward traffic to any remote destination.
The slide highlights the desired routing behavior of a Spine node. Ideally, each Spine node should have multiple next hops to use to load-share traffic to remote destinations attached to the IP fabric. Notice that routers D and E have one path for singly homed hosts and two paths available for multihomed hosts. It just so happens that getting these routes and their associated next hops into the forwarding table of a Spine node can be tricky. The rest of the chapter discusses the challenges as well as the solutions to the problem.
Remember that your IP Fabric will be forwarding IP data only. Each node will be an IP router. In order to forward IP packets between routers, they need to exchange IP routes. So, you have to make a choice between routing protocols. You want to ensure that your choice of routing protocol is scalable and future proof. As you can see by the chart, BGP is the natural choice for a routing protocol.
IBGP is a valid choice as the routing protocol for your fabric. IBGP peers almost always peer to loopback addresses as opposed to physical interface addresses. In order to establish a BGP session (over a TCP session), a router must have a route to the loopback address of its neighbor. To learn the route to a neighbor, an Interior Gateway Protocol (IGP) like OSPF must be enabled in the network. One purpose of enabling an IGP is simply to ensure every router knows how to get to the loopback addresses of all other routers. Another problem that OSPF will solve is determining all of the equal-cost paths to remote destinations. For example, router A will determine from OSPF that there are two equal-cost paths to reach router B. Now router A can load-share traffic destined for router B's loopback address (IBGP-learned routes, see the next few slides) across the two links toward router B.
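Although this course's labs build an EBGP fabric, a minimal sketch of the IBGP-over-OSPF idea, using hypothetical loopback addresses (192.168.100.x), interface names, and AS 65000, might look like this on one of the routers:

    # OSPF advertises the loopback and runs on the fabric links
    set protocols ospf area 0.0.0.0 interface lo0.0 passive
    set protocols ospf area 0.0.0.0 interface et-0/0/0.0
    set protocols ospf area 0.0.0.0 interface et-0/0/1.0
    # IBGP peers with neighbor loopbacks, sourced from the local loopback
    set routing-options autonomous-system 65000
    set protocols bgp group IBGP-FABRIC type internal
    set protocols bgp group IBGP-FABRIC local-address 192.168.100.1
    set protocols bgp group IBGP-FABRIC neighbor 192.168.100.2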
There is a requirement in an IBGP network that if one IBGP router needs to advertise an IBGP route, then every other IBGP router must receive a copy of that route (to prevent black holes). One way to ensure this happens is to have every IBGP router peer with every other IBGP router (a full mesh). This works fine but it does not scale (add a new router to your IP fabric and you will have to configure every router in your IP fabric with a new peer). There are two ways to help scale past the full-mesh issue: route reflection or confederations. Most often, route reflection is chosen (it is easy to implement). It is possible to have redundant route reflectors as well (shown on the slide). It is a best practice to configure one or more of the Spine nodes as route reflectors.
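Continuing the hypothetical sketch above, making a Spine node a route reflector only requires adding a cluster ID to its IBGP group; the clients need no special configuration:

    set protocols bgp group IBGP-FABRIC cluster 192.168.100.1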
Note: The next few slides highlight the problem faced by a Spine node (router D) that is not a route reflector. You must build your IP Fabric such that all routers load-share traffic over equal-cost paths (when they exist) toward remote networks. Each router should be configured for BGP multipath so that it will load-share when multiple BGP routes exist. The slide shows that routers A and B advertise the 10.1/16 network to RR-A. RR-A will use both routes for forwarding (multipath) but will choose only one of those routes (the one from router B, because router B has the lowest router ID) to send to router C (a Leaf node) and router D (a Spine node). Router C and router D will receive the route for 10.1/16. Both copies will have a BGP next hop of router B's loopback address. This is the default behavior of route advertisement and selection in the IBGP with route reflection scenario. Did you notice the load-balancing problem (hint: the problem is not on router C)? Since router C has two equal-cost paths to get to router B (learned from OSPF), router C will load-share traffic to 10.1/16 over the two uplinks toward the Spine routers. The load-balancing problem lies on router D. Since router D received a single route that has a BGP next hop of router B's loopback, it forwards all traffic destined to 10.1/16 toward router B. The path through router A (which is an equal-cost path to 10.1/16) will never be used in this case. It is also worth noting that although router C has no problem load sharing toward the 10.1/16 network, if router B were to fail, it may take some time for router C to learn about the route through router A. The next slide discusses the solution to these problems.
The problem on RR-A is that it sees the routes received from routers A and B for 10.1/16 as a single route that has been received twice. If an IBGP router receives different versions of the same route, it is supposed to make a choice between them and then advertise the one chosen route to its appropriate neighbors. One solution to this problem is to make every Spine node a route reflector. This would be fine in a small fabric but probably would not make sense when there are tens of Spine nodes. Another option would be to make each of the advertisements from routers A and B look like unique routes. How can we make the multiple advertisements of 10.1/16 from routers A and B appear to be unique routes? There is a draft RFC (draft-ietf-idr-add-paths) that defines the ADD-PATH capability, which does just that: it makes the advertisements look unique. All routers in the IP Fabric should support this capability for it to work. Once enabled, routers advertise and evaluate routes based on a tuple of the network and its path ID. In the example, routers A and B advertise the 10.1/16 route. This time, every router supports the ADD-PATH capability, so RR-A attaches a unique path ID to each route and is able to advertise both routes to all clients, including router D. When the routes arrive on the clients, the clients install both routes in their routing tables (allowing them to load-share toward routers A and B). Although router C was already able to load-share without the additional route, router C will now be able to continue forwarding traffic to 10.1/16 even in the event of a failure of either router A or router B.
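In Junos, the ADD-PATH capability is enabled per address family under BGP. As a hedged sketch (the path count and group name here are illustrative), the route reflector and its clients could be configured as follows:

    # Advertise up to four paths per prefix instead of only the best one
    set protocols bgp group IBGP-FABRIC family inet unicast add-path send path-count 4
    # Accept multiple paths per prefix from the peer
    set protocols bgp group IBGP-FABRIC family inet unicast add-path receive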
EBGP is also a valid design to use in your IP Fabric. You will notice that the load-balancing problem is much easier to fix in the EBGP scenario. For example, there will be no need for the routers to support any draft RFCs! Generally, each router in an IP Fabric should be in its own unique AS. You can use AS numbers from the private or public range or, if you will need thousands of AS numbers, you can use 32-bit AS numbers.
In an EBGP-based fabric, there is no need for route reflectors or an IGP. The BGP peering sessions parallel the physical wiring. For example, every Leaf node has a BGP peering session with every Spine node. There are no leaf-to-leaf or spine-to-spine BGP sessions, just as there is no leaf-to-leaf or spine-to-spine physical connectivity. EBGP peering is done using the physical interface IP addresses (not loopback interfaces). To enable proper load balancing, all routers need to be configured for multipath multiple-as as well as a load-balancing policy. Both of these configurations are covered later in this chapter.
The slide shows that the routers in AS 64516 and AS 64517 are advertising 10.1/16 to their two EBGP peers. Because multipath multiple-as is configured on all routers, the receiving routers in AS 64512 and AS 64513 will install both routes in their routing tables and load-share traffic destined to 10.1/16.
The slide shows that the routers in AS 64512 and AS 64513 are advertising 10.1/16 to all of their EBGP peers (all Leaf nodes). Since multipath multiple-as is configured on all routers, the receiving router shown on the slide, the router in AS 64514, will install both routes in its routing table and load-share traffic destined to 10.1/16.
When enabling an IP fabric you should follow some best practices. Remember, two of the main goals of an IP fabric design (or a Clos design) are to provide a non-blocking architecture and to provide predictable load-balancing behavior. Some of the best practices that should be followed include:
•	All Spine nodes should be the exact same type of router. They should be the same model and they should also have the same line cards installed. This helps the fabric to have predictable load-balancing behavior.
•	All Leaf nodes should be the exact same type of router. Leaf nodes do not have to be the same router as the Spine nodes. Each Leaf node should be the same model and they should also have the same line cards installed. This helps the fabric to have predictable load-balancing behavior.
•	Every Leaf node should have an uplink to every Spine node. This helps the fabric to have predictable load-balancing behavior.
•	All uplinks from Leaf node to Spine node should be the exact same speed. This helps the fabric to have predictable load-balancing behavior and also helps with the non-blocking nature of the fabric. For example, let us assume that a Leaf has one 40GbE uplink and one 10GbE uplink to the Spine. When using the combination of OSPF (for loopback interface advertisement and BGP next-hop resolution) and IBGP, the bandwidth of the links is taken into consideration when calculating the shortest path to the BGP next hop. OSPF will most likely always choose the 40GbE interface during its shortest path first (SPF) calculation and use that interface for forwarding toward remote BGP next hops. This essentially blocks the 10GbE interface from ever being used. In the EBGP scenario, the bandwidth is not taken into consideration, so traffic will be equally load-shared over the two different-speed interfaces. Imagine trying to equally load-share 60 Gbps of data over the two links; how will the 10GbE interface handle 30 Gbps of traffic? The answer is...it won't.
The slide highlights the topic we discuss next.
To increase the overall throughput of an IP Fabric, you simply need to increase the number of Spine devices (and add the appropriate uplinks from the Leaf nodes to those Spine nodes). If you add one more Spine node to the fabric, you will also have to add one more uplink to each Leaf node. Assuming that each uplink is 40GbE, each Leaf node can then forward an extra 40 Gbps over the fabric. Adding and removing both server-facing ports (downlinks from the Leaf nodes) and Spine nodes will affect the oversubscription (OS) ratio of a fabric. When designing the IP fabric, you must understand the OS requirements of your data center. For example, does your data center need line-rate forwarding over the fabric? Line-rate forwarding would equate to a 1-to-1 (1:1) OS ratio. That means the aggregate server-facing bandwidth is equal to the aggregate uplink bandwidth. Or maybe your data center would work perfectly fine with a 3:1 OS ratio, where the aggregate server-facing bandwidth is three times the aggregate uplink bandwidth. Most data centers will probably not require a 1:1 OS design. Instead, you should choose an OS ratio that makes the most sense based on the data center's normal bandwidth usage. The next few slides discuss how to calculate the OS ratios of various IP fabric designs.
The slide shows a basic 3:1 OS IP Fabric. All Spine nodes, four in total, are QFX5100-24Q routers that each have (32) 40GbE interfaces. All Leaf nodes, 32 in total, are QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE server-facing interfaces. Each of the (48) 10GbE ports on all 32 Leaf nodes will be fully utilized (that is, attached to downstream servers). That means that the total server-facing bandwidth is 48 x 32 x 10 Gbps, which equals 15360 Gbps. Each of the 32 Leaf nodes has (4) 40GbE Spine-facing interfaces in use. That means that the total uplink bandwidth is 4 x 32 x 40 Gbps, which equals 5120 Gbps. The OS ratio for this fabric is 15360:5120, or 3:1. An interesting thing to note is that if you remove any number of Leaf nodes, the OS ratio does not change. For example, what would happen to the OS ratio if there were only 31 Leaf nodes? The server-facing bandwidth would be 48 x 31 x 10 Gbps, which equals 14880 Gbps. The total uplink bandwidth would be 4 x 31 x 40 Gbps, which equals 4960 Gbps. The OS ratio for this fabric is 14880:4960, or 3:1. This fact actually makes your design calculations very simple. Once you decide on an OS ratio and determine the number of Spine nodes that will allow that ratio, you can simply add and remove Leaf nodes from the topology without affecting the original OS ratio of the fabric.
The slide shows a basic 2:1 OS IP Fabric in which two Spine nodes were added to the topology from the last slide. All Spine nodes, six in total, are QFX5100-24Q routers that each have (32) 40GbE interfaces. All Leaf nodes, 32 in total, are QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE server-facing interfaces. Each of the (48) 10GbE ports on all 32 Leaf nodes will be fully utilized (that is, attached to downstream servers). That means that the total server-facing bandwidth is still 48 x 32 x 10 Gbps, which equals 15360 Gbps. Each of the 32 Leaf nodes has (6) 40GbE Spine-facing interfaces. That means that the total uplink bandwidth is 6 x 32 x 40 Gbps, which equals 7680 Gbps. The OS ratio for this fabric is 15360:7680, or 2:1.
The slide shows a basic 1:1 OS IP Fabric. All Spine nodes, six in total, are QFX5100-24Q routers that each have (32) 40GbE interfaces. All Leaf nodes, 32 in total, are QFX5100-48S routers that have (6) 40GbE uplink interfaces and (48) 10GbE server-facing interfaces. There are many ways that a 1:1 OS ratio can be attained. In this case, although the Leaf nodes each have (48) 10GbE server-facing interfaces, we are only going to allow 24 servers to be attached at any given moment. That means that the total server-facing bandwidth is 24 x 32 x 10 Gbps, which equals 7680 Gbps. Each of the 32 Leaf nodes has (6) 40GbE Spine-facing interfaces. That means that the total uplink bandwidth is 6 x 32 x 40 Gbps, which equals 7680 Gbps. The OS ratio for this fabric is 7680:7680, or 1:1.
The slide highlights the topic we discuss next.
The slide shows the example topology that will be used in the subsequent slides. Notice that each router is the single member of a unique autonomous system. Each router will peer using EBGP with its directly attached neighbors using the physical interface addresses. Host A is singly homed to the router in AS 64514. Host B is multihomed to the routers in AS 64515 and AS 64516.
The slide shows the configuration of the Spine node in AS 64512. It is configured to peer with each of the Leaf nodes using EBGP.
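The full configuration appears on the slide; a minimal sketch of what the Spine's EBGP peering might look like, assuming link addresses in 172.16.1.0/24 (the addresses here are invented for illustration; the AS numbers come from the topology):

protocols {
    bgp {
        group fabric {
            type external;
            neighbor 172.16.1.1 {      # Leaf in AS 64514 (address assumed)
                peer-as 64514;
            }
            neighbor 172.16.1.3 {      # Leaf in AS 64515 (address assumed)
                peer-as 64515;
            }
            neighbor 172.16.1.5 {      # Leaf in AS 64516 (address assumed)
                peer-as 64516;
            }
        }
    }
}
routing-options {
    autonomous-system 64512;
}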
The slide shows the configuration of the Leaf node in AS 64515. It is configured to peer with each of the Spine nodes using EBGP.
Once you configure BGP neighbors, you can check the status of the relationships using either the show bgp summary or show bgp neighbor command.
Once BGP neighbors are established in the IP Fabric, each router must be configured to advertise routes to its neighbors and into the fabric. For example, as you attach a server to a top-of-rack (TOR) switch/router (usually a Leaf node of the fabric), you must configure the TOR to advertise the server's IP subnet to the rest of the network. The first step in advertising a route is to write a policy that matches on the route and then accepts it. The slide shows the policy that must be configured on the routers in AS 64515 and AS 64516.
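The policy itself is on the slide; a minimal sketch of a policy that matches and accepts direct routes (the name direct matches the policy referenced on the next slide):

policy-options {
    policy-statement direct {
        term accept-direct {
            from protocol direct;    # match locally connected subnets (e.g., the server's subnet)
            then accept;
        }
    }
}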
After configuring the policy, it must be applied to the router's EBGP peers. The slide shows the direct policy being applied as an export policy to as64515's EBGP neighbors.
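Assuming the EBGP group is named fabric (the group name on the slide may differ), applying the policy looks like this:

protocols {
    bgp {
        group fabric {
            export direct;    # advertise routes accepted by the direct policy to all peers in the group
        }
    }
}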
After applying the policy, the router should begin advertising any routes that were accepted by the policy. Use the show route advertising-protocol bgp command to see which routes are being advertised to a router's BGP neighbors.
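For example, to check the routes being sent to a neighbor at 172.16.1.1 (an assumed peer address):

lab@as64515> show route advertising-protocol bgp 172.16.1.1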
Assuming the routers in AS 64515 and AS 64516 are advertising Host B's subnet, the slide shows the default routing behavior on a Spine node. Notice that the Spine node has received two advertisements for the same subnet. However, because of the default behavior of BGP, the Spine node selects a single route as the active route in the routing table (you can tell which is the active route by the asterisk). Based on what is shown in the slide, the Spine node will send all traffic destined for 10.1.2/24 over the ge-0/0/2 link. The Spine node will not load-share over the two possible next hops by default.
The multipath statement overrides the default BGP routing behavior and allows two or more next hops to be used for routing. The statement by itself requires that the multiple routes be received from the same autonomous system. Use the multiple-as modifier to override that matching-AS requirement.
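In an EBGP-based fabric where every device has a unique AS, both statements are needed; a sketch, again assuming the group name fabric:

protocols {
    bgp {
        group fabric {
            multipath {
                multiple-as;    # permit equal-cost next hops learned from different ASs
            }
        }
    }
}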
View the routing table to see the results of the multipath statement. As you can see, the active BGP route now has two next hops that can be used for forwarding. Do you think the router is using both next hops for forwarding?
The slide shows that since multipath was configured on the previous slides, two next hops are associated with the 10.1.2/24 route in the routing table. However, only one next hop is pushed down to the forwarding table by default. So, at this point, the Spine node is continuing to forward traffic destined to 10.1.2/24 over a single link.
The final step in getting a router to load-share is to write and apply a policy that causes the multiple next hops in the routing table to be exported from the routing table into the forwarding table. The slide shows the details of that process.
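A common way to do this is a load-balancing policy exported to the forwarding table; a minimal sketch (the policy name is illustrative):

policy-options {
    policy-statement load-balance {
        then {
            load-balance per-packet;    # despite the name, Junos balances per flow
        }
    }
}
routing-options {
    forwarding-table {
        export load-balance;    # copy all usable next hops into the forwarding table
    }
}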
The output shows that after applying the load balancing policy to the forwarding table, all next hops associated with active routes in the routing table have been copied into the forwarding table.
The slide shows the BGP and policy configuration for the router in AS 64514.
The slide shows the BGP and policy configuration for the router in AS 64515.
The slide shows the BGP and policy configuration for the router in AS 64512.
• Routing in an IP Fabric;
• Scaling of an IP Fabric; and
• Configuring an IP Fabric.
The slide provides the objective for this lab.
1. Some of the Juniper Networks products that can be used in the Spine position of an IP Fabric are the MX, QFX10k, and QFX5100 Series routers.
2. Routing should be implemented in such a way that when multiple equal-cost physical paths exist between two points, data traffic is load-shared over those paths.
3. To allow a BGP speaker to install more than one next hop in the routing table when the same route is received from two or more neighbors, multipath must be enabled.
Chapter 4: VXLAN
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
The slide lists the topics we will discuss. We discuss the highlighted topic first.
The needs of the applications that run on the servers in a data center usually drive the design of those data centers. Many server-to-server applications have strict requirements for Layer 2 connectivity between servers. A switched infrastructure built around xSTP or a Layer 2 fabric (like Juniper Networks' Virtual Chassis Fabric or Junos Fusion) is perfectly suited for this type of connectivity. This type of infrastructure allows broadcast domains to be stretched across the data center using some form of VLAN tagging.
Many of today's next generation data centers are being built around IP Fabrics, which, as their name implies, provide IP connectivity between the racks of a data center. How can a next generation data center based on IP-only connectivity support the Layer 2 requirements of traditional server-to-server applications? The rest of this section discusses possible solutions to the Layer 2 connectivity problem.
One possible solution for providing Layer 2 connectivity over an IP-based data center is to implement some form of Layer 2 virtual private network (VPN) on the routers that directly attach to the servers in the rack. Usually these routers would be the top-of-rack (TOR) routers/switches. In this scenario, each TOR router would act as a Layer 2 VPN gateway. A gateway is the device in a VPN that performs the encapsulation and decapsulation of VPN data. In a Layer 2 VPN based on Ethernet, a gateway (the router on the left) takes Ethernet frames destined for a remote MAC address, encapsulates the original Ethernet frame in some other data type (like IP, MPLS, IPsec, etc.), and transmits the newly formed packet to the remote gateway. The receiving gateway (the router on the right) receives the VPN data, decapsulates it by removing the outer encapsulation, and then forwards the remaining original Ethernet frame to the locally attached server. Notice in the diagram that the IP Fabric simply has to forward IP data. The IP Fabric has no knowledge of the Ethernet connectivity that exists between Host A and Host B.
There are generally two components of a VPN: the data plane (as described on this slide) and the control plane (as described on the next slide). The data plane of a VPN describes the method by which a gateway encapsulates and decapsulates the original data. Also, in an Ethernet Layer 2 VPN, it might be necessary for the gateway to learn the MAC addresses of both local and remote servers, much like a normal Ethernet switch learns MAC addresses. In almost all forms of Ethernet VPNs, the gateways learn the MAC addresses of locally attached servers in the data plane (i.e., from received Ethernet frames). Remote MAC addresses can be learned either in the data plane (after decapsulating data received from remote gateways) or in the control plane.
One question that must be asked is, "How does a gateway learn about remote gateways?" This learning can happen in one of two ways: remote gateways can be statically configured on each gateway participating in a VPN, or they can be learned through some dynamic VPN signaling protocol. Static configuration works fine, but it does not really scale. For example, imagine that you have 20 TOR routers participating in a statically configured Layer 2 VPN. If you add another TOR router to the VPN, you would have to manually configure each of the 20 existing routers to recognize the newly added gateway. Usually a VPN has some form of dynamic signaling protocol for the control plane. The signaling protocol can allow for dynamic additions and deletions of gateways in the VPN. Some signaling protocols also allow a gateway to advertise its locally learned MAC addresses to remote gateways. Usually a gateway has to receive an Ethernet frame from a remote host before it can learn the host's MAC address. Learning remote MAC addresses in the control plane instead allows the MAC tables of all gateways to be more in sync. This has the positive side effect of making the forwarding behavior of the VPN more efficient (less flooding of data over the fabric).
The slide lists some of the layer 2 VPNs that exist today.
Data centers are relying on virtualization more and more. The slide shows the concept of virtualizing servers in a data center. Instead of installing a bare-metal server (BMS), a server can run as a virtual machine (VM) on a host machine. A VM is a software computer that runs the same OS and applications as a BMS. A host machine is the physical machine that houses the VMs that run inside it. One interesting piece of virtualization is how networking works between VMs. Normally, a BMS would simply need a physical network interface card (NIC) to attach to the network. In the virtualized world, the VMs also utilize NICs; however, they are virtual. VMs use their virtual NICs to communicate with other VMs. To provide connectivity between VMs on the same host machine, the virtual NICs attach to virtual switches. To allow VMs to communicate over the physical network, the virtual switches use the physical NICs of the host machine. If the physical network is a switched network (as in the diagram), the virtual switches appear to be standard switches attached to the network. VLANs can simply be stretched from one virtual switch, across the physical switched network, and terminate on one or more remote virtual switches. This works great when the physical network is some sort of switched Ethernet network. However, what happens when the physical network is based on IP routing?
As described on the previous slides, a Layer 2 VPN can solve the problem by tunneling Ethernet frames over the IP network. In the case of virtualized networks, the virtual switches running on the host machines act as the VPN gateways. Many vendors of virtualization products have chosen to support VXLAN as the Layer 2 VPN. VXLAN functionality can be found in virtual switches like VMware's Distributed vSwitch, Open vSwitch, and Juniper Networks' Contrail vRouters. If virtualizing the network is the future, it would seem that VXLAN has become the de facto Layer 2 VPN in the data center.
The slide highlights the topic we discuss next.
VXLAN is defined in RFC 7348 and describes a method of tunneling Ethernet frames over an IP network. RFC 7348 describes the data plane and a signaling plane for VXLAN. Although RFC 7348 discusses PIM and multicast in the signaling plane, other signaling methods for VXLAN exist, including Multiprotocol Border Gateway Protocol (MP-BGP) Ethernet VPN (EVPN) as well as the Open vSwitch Database (OVSDB). This chapter covers the multicast method of signaling.
The VXLAN packet consists of the following:
1. Original Ethernet Frame: The Ethernet frame being tunneled over the underlay network, minus the VLAN tagging.
2. VXLAN Header (64 bits): Consists of an 8-bit flags field, the 24-bit VNI, and two reserved fields. The I flag must be set to 1 and the other 7 flag bits must be set to 0.
3. Outer UDP Header: Usually contains the well-known destination UDP port 4789. Some VXLAN implementations allow this destination port to be configured to some other value. The source port is typically a hash of the inner Ethernet frame's header, which allows the underlay to load-balance flows across equal-cost paths.
4. Outer IP Header: The source address is the IP address of the sending VXLAN Tunnel End Point (VTEP). The destination address is the IP address of the receiving VTEP.
5. Outer MAC: As with any packet sent over a Layer 3 network, the source and destination MAC addresses change at each hop in the network.
6. Frame Check Sequence (FCS): A new FCS for the outer Ethernet frame.
The VXLAN Tunnel Endpoint (VTEP) is the VPN gateway for VXLAN. It performs the encapsulation (and decapsulation) of Ethernet frames using VXLAN encapsulation. Usually, the mapping of VLAN (VM-facing) to VNI is manually configured on the VTEP.
The slide shows how a VTEP handles an Ethernet frame from a locally attached VM that must be sent to a remote VM. Here is the step-by-step process taken by Virtual Switch 1 (VS1):
1. VS1 receives an Ethernet frame with a destination MAC of VM3.
2. VS1 performs a MAC table lookup and determines that the frame must be sent over the VXLAN tunnel to the remote VTEP, VS2.
3. VS1 removes any outer VLAN tagging on the original Ethernet frame and then encapsulates the remaining Ethernet frame using VXLAN encapsulation, setting the destination IP address to VS2's VTEP address and setting the VNI appropriately.
4. VS1 forwards the VXLAN packet towards the IP Fabric.
The slide shows how a VTEP handles a VXLAN packet from a remote VTEP that must be decapsulated and sent to a local VM. Here is the step-by-step process taken by the network and VS2:
1. The routers in the IP Fabric simply route the VXLAN packet to its destination, VS2's VTEP address.
2. VS2 receives the VXLAN packet and uses the received VNI to determine in which MAC table the lookup should be performed.
3. VS2 strips the VXLAN encapsulation, leaving the original Ethernet frame.
4. VS2 performs a MAC table lookup to determine the outgoing virtual interface on which to send the Ethernet frame.
5. VS2, if necessary, pushes on a VLAN tag and forwards the Ethernet frame to VM3.
One thing you should notice about the VLAN tagging between the VMs and the virtual switches: because the VLAN tags are stripped before the frames are sent over the IP Fabric, the VLAN tags do not have to match between remote VMs. This allows for more flexibility in VLAN assignments from server to server and rack to rack.
We have discussed VTEPs that exist on virtual switches that sit on the host machines. However, what happens when the VMs on the host machine need to communicate with a standard BMS that doesn't support VXLAN? The VXLAN RFC describes how a networking device like a router or switch can take on the VTEP role. A networking device that can perform that role is called a VXLAN Gateway. There are two types of VXLAN Gateways: Layer 2 and Layer 3. The slide shows how a VXLAN Layer 2 Gateway (the router on the right) handles VXLAN packets received from a remote VTEP. It simply provides Layer 2 connectivity between hosts on the same VLAN. As you discuss the concept of a VTEP with others, you may notice that people refer to the different types of VTEPs in different ways. For example, a VTEP that is part of a virtual switch (as shown on previous slides) is sometimes referred to as a software VTEP. A physical router or switch acting as a VXLAN Gateway (Layer 2 or Layer 3) is sometimes referred to as a hardware VTEP.
Another form of gateway is the VXLAN Layer 3 Gateway. A Layer 3 gateway acts as the default gateway for hosts on the same VXLAN segment (i.e., broadcast domain). In the slide, the default gateway for VM1 and VM2 is 10.1.1.254, which belongs to Router B's IRB interface. To send a packet to 1.1.1.1 (a remote IP subnet), VM1 must use Address Resolution Protocol (ARP) to determine the MAC address of 10.1.1.254. Once VM1 knows the MAC address for 10.1.1.254, VM1 and the devices along the way to 1.1.1.1 use the following procedure to forward an IP packet to its destination:
1. VM1 creates an IP packet destined to 1.1.1.1.
2. Since 1.1.1.1 is on a different subnet than VM1, VM1 encapsulates the IP packet in an Ethernet frame with a destination MAC address of the default gateway's MAC address and sends the Ethernet frame to VS1.
3. VS1 receives the Ethernet frame, performs a MAC table lookup, and determines that the Ethernet frame must be sent over the VXLAN tunnel to Router B. Router B appears to VS1 as the VTEP that is directly attached to the host that owns the destination MAC address. In reality, the destination MAC address is the MAC address of Router B's IRB interface for that VLAN/VXLAN segment.
4. Router B receives the VXLAN packet, determines the VNI (which maps to a particular MAC table), and strips the VXLAN encapsulation, leaving the original Ethernet frame.
5. Router B performs a MAC table lookup and determines that the destination MAC belongs to its own IRB interface.
6. Router B strips the remaining Ethernet framing and performs a routing table lookup to determine the next hop to the destination network.
7. Router B encapsulates the IP packet in the outgoing interface's encapsulation and forwards it to the next hop.
The slide shows that the standard place to implement VXLAN Layer 2 Gateways is on the Leaf nodes. Layer 3 Gateway placement is usually in the Spine or Fabric tier but can also be found on the Leaf nodes. Currently, most Juniper Networks Leaf-class platforms (QFX5100, EX4300, etc.) do not support Layer 3 Gateway functionality.
This slide discusses the MAC learning behavior of a VTEP. The next few slides will discuss the details of how remote MAC addresses are learned by VTEPs when using PIM as the control protocol.
The slide discusses the handling of BUM traffic by VTEPs according to the VXLAN standard model. In this model, you should note that the underlay network must support a multicast routing protocol, preferably some form of Protocol Independent Multicast Sparse Mode (PIM-SM). Also, the VTEPs must support the Internet Group Management Protocol (IGMP) so that they can inform the underlay network that they are members of the multicast group associated with a VNI. For every VNI used in the data center, a multicast group must also be assigned. Remember that there are 2^24 (~16M) possible VNIs, so in the extreme case 2^24 group addresses would be needed. Luckily, 239/8 is a reserved set of organizationally scoped multicast group addresses (2^24 group addresses in total) that can be used freely within your customer's data center.
The slide shows an example of a PIM-SM enabled network where the (*,G) rendezvous point tree (RPT) is established from VTEP A to R1 and finally to the rendezvous point (RP). Only this part of the RPT is shown for simplicity, but keep in mind that each VTEP that belongs to 239.1.1.1 will also build its own branch of the RPT (including VTEP B).
When VTEP B receives a broadcast packet from a local VM, VTEP B encapsulates the Ethernet frame in the appropriate VXLAN/UDP/IP headers. However, it sets the destination IP address of the outer IP header to the VNI's group address (239.1.1.1 on the slide). Upon receiving the multicast packet, VTEP B's DR (the PIM router closest to VTEP B) encapsulates the multicast packet in unicast PIM register messages destined to the IP address of the RP. Upon receiving the register messages, the RP decapsulates them and forwards the resulting multicast packets down the (*,G) tree. Upon receiving the multicast VXLAN packet, VTEP A does the following:
1. Strips the VXLAN/UDP/IP headers;
2. Forwards the broadcast packet towards the VMs using the virtual switch;
3. If VTEP B was unknown, learns the IP address of VTEP B; and
4. Learns the remote MAC address of the sending VM and maps it to VTEP B's IP address.
For all of this to work, you must ensure that the appropriate devices support PIM-SM, IGMP, and the PIM DR and RP functions. It is not shown on this slide, but once R1 receives the first native multicast packet from the RP (source address is VTEP B's address), R1 builds a shortest path tree (SPT) to the DR closest to VTEP B, which establishes (S,G) state on all routers along that path.
The slide highlights the topic we discuss next.
The slide shows the example topology that will be used for the subsequent slides.
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and AS 64513 are in the same broadcast domain as well as the same IP subnet. Also, VRRP will run between the two routers to provide a redundant default gateway to the two hosts.
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback interface is used as the VTEP interface on Juniper Networks routers. Therefore, you must make sure that the loopback addresses of the routers are reachable. Remember, the loopback interface for each router in the IP Fabric fell into the 172.16.100/24 range.
Some form of PIM must be enabled in the IP Fabric. The slide shows that the routers will run PIM-SM with a statically configured RP. The configurations of the RP as well as all other routers are shown on the slide. Notice that PIM-SM only needs to be enabled on the IP Fabric-facing interfaces.
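A sketch of what this might look like on a non-RP router, assuming the RP's loopback is 172.16.100.1 and et-0/0/48 and et-0/0/49 face the fabric (interface names and the RP address are assumptions):

protocols {
    pim {
        rp {
            static {
                address 172.16.100.1;    # statically configured RP (address assumed)
            }
        }
        interface et-0/0/48.0 {
            mode sparse;                 # PIM-SM only on fabric-facing interfaces
        }
        interface et-0/0/49.0 {
            mode sparse;
        }
    }
}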
You must decide on the source address of the VXLAN and multicast packets that will be generated by the local VTEP. Use the vtep-source-interface statement to specify the interface from which the IP address will be taken. This statement is the same for both MX and QFX5100 Series devices.
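On a QFX5100, for example, the statement sits under [edit switch-options]; a minimal sketch:

switch-options {
    vtep-source-interface lo0.0;    # VXLAN packets are sourced from lo0.0's address
}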
The slide shows the configuration necessary to enable VXLAN Layer 2 Gateway functionality on a QFX5100 Series router. It might be worth noting that you can configure the same multicast group for different VNIs on the same VXLAN gateway. However, doing so may cause a remote VXLAN gateway to receive unwanted BUM traffic for a VNI to which it does not belong.
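A sketch of the VLAN-to-VNI mapping, assuming VLAN v100 maps to VNI 1000 with group 239.1.1.1 (the names and values are illustrative):

vlans {
    v100 {
        vlan-id 100;
        vxlan {
            vni 1000;                      # VXLAN segment for this VLAN
            multicast-group 239.1.1.1;     # group used to carry this VNI's BUM traffic
        }
    }
}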
The slide shows the configuration necessary to enable VXLAN Layer 2 Gateway functionality on an MX Series router.
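On the MX, the equivalent mapping lives under a bridge domain; a sketch with the same assumed values:

bridge-domains {
    bd100 {
        vlan-id 100;
        vxlan {
            vni 1000;
            multicast-group 239.1.1.1;
        }
    }
}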
The slide shows how to enable VXLAN Layer 3 Gateway functionality on an MX Series router (not supported on the QFX5100 Series). Also, notice that VRRP has been enabled on router as64512. The VRRP/IRB configuration for router as64513 is as follows:
[edit interfaces irb]
lab@as64513# show
unit 0 {
    family inet {
        address 10.1.1.11/24 {
            vrrp-group 1 {
                virtual-address 10.1.1.254;
                priority 100;
            }
        }
    }
}
The bridge domain configuration on router as64513 would be identical to that shown on the slide.
Since VXLAN-based bridge domains do not support any form of multicast snooping, you can use the command on the slide to block the forwarding of multicast traffic over the VXLAN tunnels. As you know, multicast is used in the control plane for VXLAN; it helps in the forwarding of BUM traffic (here we care about the multicast traffic). Normally, when a VTEP receives multicast traffic from an attached server, it sends a copy to all other locally attached servers on the same VLAN. It also sends a VXLAN-encapsulated copy over the IP Fabric using the multicast group for the VXLAN segment. That is, every remote VTEP receives a copy of the original multicast packet, regardless of whether it has any attached receivers. If you know that there are no receivers attached to any remote VTEPs for a particular multicast group, you can use the command on the slide to stop the transmission of transit multicast traffic to uninterested VTEPs.
As you know, the default behavior of a Juniper Networks device acting as a VXLAN Layer 2 Gateway is to strip the original VLAN tag of Ethernet frames received from locally attached devices. Another default behavior of those same devices is to automatically discard any received VXLAN packets that, when decapsulated, contain a VLAN-tagged Ethernet frame. The slide shows the commands that can override those default behaviors. One reason you might want to preserve the VLAN tagging is to preserve the 802.1p bits for class-of-service purposes.
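On a QFX5100, the relevant statements are per-VLAN knobs under the vxlan stanza; treat the exact hierarchy as an assumption for your release:

vlans {
    v100 {
        vxlan {
            vni 1000;
            encapsulate-inner-vlan;          # keep the VLAN tag when encapsulating outbound frames
            decapsulate-accept-inner-vlan;   # accept VXLAN packets whose inner frame is tagged
        }
    }
}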
The command on the slide helps determine the current (*,G) and (S,G) state for a router. From the point of view of a VXLAN Gateway, the (*,G) state should be instantiated as soon as you commit the vxlan statement in the configuration. Any (S,G) state means that the gateway has received multicast traffic (BUM traffic encapsulated in VXLAN) from a remote VTEP, allowing it to learn the remote VTEP's IP address, so the local gateway has instantiated an SPT towards that remote VTEP.
The commands on the slide verify which PIM neighbors have been discovered and the associated settings for the neighbors.
Prior to learning any remote neighbors, a VXLAN Gateway creates a single logical VTEP interface, vtep.32768 on the slide. Although this interface is never used for forwarding, when it shows up in the output of this command it allows you to verify two things: that the local device is configured as a VXLAN Gateway, and its source IP address for VXLAN packets. For each remote VTEP learned, the gateway instantiates another logical VTEP interface, vtep.32769 on the slide. These interfaces represent the VXLAN tunnels established between the local gateway and the remote gateways. They are actually used for forwarding, as you can tell from the input and output packet counts.
The source form of the command allows you to see the locally configured values for a gateway. The remote form allows you to see the details of the remotely learned gateways/VTEPs.
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and associated interfaces that have been learned by the gateway.
• Reasons why you would use VXLAN in your data center;
• The control and data plane of VXLAN in a controller-less overlay; and
• Configuration and monitoring of VXLAN when using multicast signaling.
The slide provides the objective for this lab.
1. Major vendors of virtualization products support VXLAN to provide the Layer 2 stretch over an IP-based data center. If the vSwitches of your virtualized product support ONLY VXLAN, then more than likely your other networking devices will need to support VXLAN as well.
2. A VXLAN Gateway automatically removes the VLAN tag from Ethernet frames received from a locally attached server.
3. The show ethernet-switching vxlan-tunnel-end-point remote mac-table command on a QFX5100 Series switch or the show l2-learning vxlan-tunnel-end-point remote mac-table command on an MX Series router can be used to view the MACs learned from remote gateways.
Chapter 5: EVPN
• The benefits of using EVPN signaling for VXLAN;
• The operation of the EVPN protocol; and
• Configuring and monitoring EVPN signaling for VXLAN.
The slide lists the topics we will discuss. We discuss the highlighted topic first.
VXLAN is defined in RFC 7348 and describes a method of tunneling Ethernet frames over an IP network. RFC 7348 describes the data plane and a signaling plane for VXLAN. Although RFC 7348 discusses Protocol Independent Multicast (PIM) and multicast in the signaling plane, other signaling methods for VXLAN exist, including Multiprotocol Border Gateway Protocol (MP-BGP) Ethernet VPN (EVPN) as well as the Open vSwitch Database (OVSDB). This chapter covers the EVPN method of signaling. Although we cover EVPN as the signaling component for VXLAN in this chapter, it should be noted that EVPN can also be used as the signaling component for both MPLS and MPLS-over-GRE encapsulations as well. Those encapsulation types are not covered in this course.
The slide lists some of the benefits of using EVPN signaling instead of PIM. The subsequent slides of this section will discuss each of these benefits at a very high level. It will be in the next section of this chapter that we will take a deep dive into the EVPN protocol.
EVPN is based on Multiprotocol Border Gateway Protocol (MP-BGP). It uses the Address Family Identifier (AFI) of 25, which is the Layer 2 VPN address family, and the Subsequent Address Family Identifier (SAFI) of 70, which is the EVPN address family. BGP is a proven protocol in both service provider and enterprise networks. It has the ability to scale to millions of route advertisements. BGP also has the added benefit of being policy oriented. Using policy, you have complete control over route advertisements, allowing you to control which devices learn which routes.
When using PIM in the control plane for VXLAN, it is really not possible to have a server attach to two different top-of-rack switches with the ability to forward data over both links (i.e., both links active). When using EVPN signaling in the control plane, active/active forwarding is entirely possible. EVPN allows VXLAN gateways (Leaf1 at the top of the slide) to use multiple paths and multiple remote VXLAN gateways to forward data to multihomed hosts. Also, EVPN has mechanisms (like split horizon, etc.) to ensure that broadcast, unknown unicast, and multicast (BUM) traffic does not loop back towards a multihomed host.
The slide shows how EVPN signaling minimizes unknown unicast flooding.
1. Leaf2 receives an Ethernet frame with a source MAC address of HostB and a destination MAC address of HostC.
2. Based on a MAC table lookup, Leaf2 forwards the Ethernet frame to its destination over the VXLAN tunnel. Leaf2 also populates its MAC table with HostB's MAC address and associates it with the interface on which the frame arrived.
3. Since Leaf2 just learned a new MAC address, it advertises the MAC address to the remote VXLAN gateway, Leaf1. Leaf1 installs the newly learned MAC address in its MAC table and associates it with an outgoing interface, the VXLAN tunnel to Leaf2.
Now, when Leaf1 needs to send an Ethernet frame to HostB, it can send it directly to Leaf2 because HostB is a known MAC address. Without the sequence above, Leaf1 would have no MAC entry in its table for HostB (making a frame destined to HostB an unknown unicast Ethernet frame), so it would have to send a copy of the frame to all remote VXLAN gateways.
Although not currently supported, the EVPN RFC mentions that an EVPN Provider Edge (PE) router, Leaf1 in the example, can perform proxy ARP. It is possible that if Leaf2 knows the IP-to-MAC binding for HostB (because it was snooping some form of IP traffic from HostB), it can send a MAC advertisement for HostB that also contains HostB's IP address. Then, when HostA sends an ARP request for HostB's IP address (a broadcast Ethernet frame), Leaf1 can simply send an ARP reply back to HostA without ever having to send the broadcast frame over the fabric.
The EVPN control plane also helps enable distributed Layer 3 gateways. In the slide, notice that HostC has a configured default gateway of 10.1.1.254. SpineA and SpineB have been enabled as VXLAN Layer 3 Gateways. They both have been configured with the same virtual IP address of 10.1.1.254. If the Spine nodes are MX Series routers, they also share the same virtual MAC address, 00:00:5e:00:01:01 (the same as VRRP would use, even though VRRP is not running). SpineA and SpineB each send a MAC advertisement to LeafC for the same MAC. Now, LeafC can load-share traffic from HostC to the default gateway.
The slide highlights the topic we discuss next.
The slide highlights the terms used in a network using VXLAN with EVPN signaling.
• PE devices: These are the networking devices (Leaf nodes in the diagram) to which servers attach in a data center. These devices also act as VXLAN Tunnel Endpoints (VTEPs) or VXLAN gateways (Layer 2 or Layer 3). These devices can be any node of an IP fabric, Leaf or Spine.
• P devices: These are networking devices that only forward IP data. They do not instantiate any bridge domains related to the EVPN.
• Customer Edge (CE) devices: These are the devices that require the Layer 2 stretch over the data center. They are the servers, switches, and storage devices that need Layer 2 connectivity with other devices in the data center.
• Site: An EVPN site is a set of CEs that communicate with one another without needing to send Ethernet frames over the fabric.
• EVPN Instance (EVI): An EVPN instance spanning the PE devices participating in that EVPN.
• Bridge Domain: A MAC table for a particular VLAN associated with an EVI. There can be many bridge domains for a given EVI.
• MP-BGP Session: EVPN PEs exchange EVPN routes using MP-BGP.
• VXLAN Tunnel: A tunnel established between EVPN PE devices used to encapsulate Ethernet frames in VXLAN IP packets.
The slide lists the EVPN routes, their usage, as well as where they are defined. The subsequent slides will discuss most of these routes in detail.
The Type 2 route has a very simple purpose, which is to advertise MAC addresses. Optionally, this route can be used to advertise not just a MAC address but also an IP address that is bound to that MAC address. Leaf2, an EVPN PE, learns MAC addresses in the data plane from Ethernet frames received from CEs, CE2 in the example. Once Leaf2 learns CE2's MAC address, it automatically advertises it to remote PEs and attaches a target community, community "Orange" in the example. Leaf1, another EVPN PE, upon receiving the route must decide whether it should keep the route. It makes this decision based on the received route target community. Leaf1, in order to accept and use this advertisement, must be configured with an import policy that accepts routes tagged with the "Orange" target community. Without a configured policy that matches on the "Orange" route target, Leaf1 would just discard the advertisement. So, at a minimum, each EVI on each participating PE for a given EVPN must be configured with an export policy that attaches a unique target community to MAC advertisements, and also configured with an import policy that matches and accepts advertisements based on that unique target community.
The route distinguisher can be formatted in two ways:
• Type 0: This format uses a 2-byte administration field that codes the provider's autonomous system number, followed by a 4-byte assigned number field. The assigned number field is administered by the provider and should be unique across the autonomous system.
• Type 1: This format uses a 4-byte administration field that is normally coded with the router ID (RID) of the advertising PE router, followed by a 2-byte assigned number field that carries a unique value for each VRF table supported by the PE router.
The examples on the slide show both the Type 0 and Type 1 route distinguisher formats. The first example shows the 2-byte administration field with the 4-byte assigned number field (Type 0). RFC 7432 recommends using the Type 1 route distinguisher for EVPN signaling.
Each EVPN route advertised by a PE router contains one or more route target communities. These communities are added using VRF export policy or explicit configuration. When a PE router receives route advertisements from remote PE routers, it determines whether the associated route target matches one of its local VRF tables. Matching route targets cause the PE router to install the route into the VRF table whose configuration matches the route target. Because the application of policy determines a VPN’s connectivity, you must take extra care when writing and applying VPN policy to ensure that the tenant’s connectivity requirements are faithfully met.
VRF export policy for EVPN is applied using the vrf-target statement. In the example, the statement vrf-target target:1:1 is applied to Leaf2's orange EVI. That statement causes all locally learned MACs (in the MAC table) to be copied into the VRF table as EVPN Type 2 routes. Each of the Type 2 routes associated with locally learned MACs will be tagged with the community target:1:1. Finally, these tagged routes are then advertised to all remote PEs. In the next few slides, you will learn the details of the other EVPN route types. You should know that the vrf-target statement always sets the target community (using hidden VRF import and export policies) of Type 1 routes. By default, the vrf-target statement also sets the target community of Type 2 and Type 3 routes as well. Later in this chapter, you will learn how to set a different target community for Type 2 and Type 3 routes.
VRF import policy can be applied using the vrf-target statement, or it can be enabled by manually writing a policy and then applying it with the vrf-import statement. As you know, the vrf-target statement is used to enable export policy that advertises EVPN routes tagged with the target community. The statement also happens to enable the associated import policy, which accepts routes that are tagged with that target community. So, you must configure the vrf-target statement to enable export policy at a minimum. To override the import policy instantiated by that statement, you can apply the vrf-import statement. In the example, vrf-target target:1:1 is applied to Leaf1's EVI. When Leaf1 receives the MAC advertisement from Leaf2, it runs the route through the configured import policy, which accepts routes tagged with target:1:1. Once accepted, the route is copied into Leaf1's global RIB-IN table and then copied into the appropriate VRF table (the one configured with the vrf-target target:1:1 statement). Finally, the route is converted into a MAC entry and stored in Leaf1's MAC table for the Orange EVI. The outgoing interface associated with the MAC is the VXLAN tunnel that terminates on Leaf2.
The set of links that attaches a site to one or more PEs is called an Ethernet segment. In the slide, there are two Ethernet segments. Site 1 has an Ethernet segment that consists of links A and B. Site 2 has an Ethernet segment that consists of link C. Each Ethernet segment must be assigned a 10-octet Ethernet Segment Identifier (ESI). There are two reserved ESI values, as shown in the slide. For a single-homed site, like Site 2, the ESI should be set to 0x00:00:00:00:00:00:00:00:00:00. This is the default ESI setting for a server-facing interface on a Juniper Networks EVPN PE. For any multihomed site, the ESI should be set to a globally unique value. In the example, both link A and link B have their ESI set to 0x01:01:01:01:01:01:01:01:01:01. The commands below show how to set the ESI on the server-facing interface.
{master:0}[edit interfaces et-0/0/50]
lab@leaf1# show
esi {
    01:01:01:01:01:01:01:01:01:01;
    all-active;
}
unit 0 {
    family ethernet-switching {
        interface-mode trunk;
        vlan {
            members v100;
...
Once you have configured a non-reserved ESI value on a site-facing interface, the PE will advertise an Ethernet Autodiscovery route to all remote PEs. The route carries the ESI value as well as the ESI Label extended community. The community contains the Single-Active flag. This flag lets the remote PEs know whether they can load-share traffic over the multiple links attached to the site. If the Single-Active flag is set to 1, only one link associated with the Ethernet segment can be used for forwarding. If the Single-Active flag is set to 0, all links associated with the Ethernet segment can be used for forwarding data (we call this active/active forwarding). Juniper Networks devices only support active/active forwarding (the flag is always set to 0).
When a remote PE, Leaf3 in the example, receives the Ethernet Autodiscovery routes from Leaf1 and Leaf2, it knows that it can use either of the two VXLAN tunnels to forward data to MACs learned from Site 1. Based on the forwarding choice made by CE1, it may be that Leaf1 was the only PE attached to Site 1 that learned CE1's MAC address. That means Leaf3 may have only ever received a MAC advertisement for CE1's MAC from Leaf1. However, since Leaf1 and Leaf2 are attached to the same Ethernet segment (as advertised in their Type 1 routes), Leaf3 knows it can reach CE1's MAC through either Leaf1 or Leaf2. You can see in Leaf3's MAC table that both VXLAN tunnels have been installed as next hops for CE1's MAC address.
Another benefit of the Ethernet Autodiscovery route is that it helps enable faster convergence when a link fails. Normally, when a site-facing link fails, a PE simply withdraws each of its individual MAC advertisements. Think about the case where there are thousands of MACs associated with that link; the PE would have to send thousands of withdrawals. When the Ethernet Autodiscovery route is being advertised (because the esi statement is configured on the interface), a PE (like Leaf1 on the slide) can simply send a single withdrawal of its Ethernet Autodiscovery route, and Leaf3 can immediately update the MAC table for all of the thousands of MACs it had learned from Leaf1. This greatly improves convergence times.
When EVPN signaling is used with VXLAN encapsulation, Juniper Networks devices only support ingress replication of BUM traffic. That is, when BUM traffic arrives on a PE, the PE unicasts copies of the BUM packets to each of the individual PEs that belong to the same EVPN.
This EVPN route is very simple. The route informs remote PEs of how BUM traffic should be handled. This information is carried in the Provider Multicast Service Interface (PMSI) Tunnel attribute. It specifies whether PIM or ingress replication will be used and the addressing that should be used to send the BUM traffic. In the diagram, Leaf2 advertises that it is expecting and using ingress replication and that Leaf1 should use 4.4.4.4 as the destination address of the VXLAN packets that are carrying BUM traffic.
The slide shows the default split horizon rules that EVPN PEs follow when they receive BUM traffic from a local CE.
The slide shows the default split horizon rules that EVPN PEs follow when they receive BUM traffic from a remote PE.
Earlier we discussed how the Type 1 Ethernet Autodiscovery route can enable multipath forwarding when a site is multihomed to two or more PEs. That works great for known unicast traffic. However, the slide shows what happens when Leaf1 must send BUM traffic. In the top diagram, Leaf1 makes copies of the BUM packets and unicasts them to each remote PE belonging to the same EVPN. This causes CE2 to receive multiple copies of the same packets, which is not good. In the bottom diagram, Leaf3 receives BUM traffic from the attached CE. It makes copies and unicasts them to the remote PEs, including Leaf2. Leaf2, because of the default split horizon rules, will forward the BUM traffic back towards the source, creating a loop. Electing a designated forwarder for an ESI solves these problems.
To fix the problems described on the previous slide, all the PEs attached to the same Ethernet segment (two or more PEs advertising the same ESI) elect a designated forwarder for the Ethernet segment. A designated forwarder is elected per broadcast domain. Remember that an EVI can contain one or more broadcast domains, or VLANs. The Ethernet Segment route (Type 4) is used to help with the election of the designated forwarder.
Once you've configured an ESI on an interface, the PE advertises the Ethernet Autodiscovery route (Type 1) and also an Ethernet Segment route (Type 4). The Type 4 route solves two problems: it helps in the designated forwarder election process and it adds a new split horizon rule. Notice that Leaf2 and Leaf3 advertise a Type 4 route to every PE belonging to the EVPN. However, notice that the route is not tagged with a standard target community. Instead, it is tagged with an ES-import target community. The ES-import target community is automatically generated by the advertising PE and is based on the ESI value. Since Leaf1 does not have an import policy that matches on the ES-import target, it drops the Type 4 routes. However, since Leaf2 and Leaf3 are configured with the same ESI, the routes are accepted by a hidden policy that matches on the ES-import target community, which is only known by the PEs attached to the same Ethernet segment. Leaf2 and Leaf3 then use the Originator IP address in the Type 4 route to build a table that associates an Originator IP address (i.e., the elected designated forwarder) with each VLAN in a round-robin fashion. After the election, if a non-designated forwarder for a VLAN receives BUM traffic from a remote PE, it simply drops those packets.
It is possible to have multiple default gateways sharing the same IP address for a subnet. Notice the configuration on an MX Series router:
[edit interfaces irb]
lab@spine1# show
unit 0 {
    family inet {
        address 10.1.1.10/24 { <<<<
...
The slide highlights the topic we discuss next.
The slide shows the IP Fabric that will serve as the underlay network. It is based on EBGP, with each router being in its own autonomous system. Each router will advertise its loopback address, which will also serve as its VTEP address.
The slide shows the overlay topology. Each Leaf node will act as a VXLAN Layer 2 Gateway. Each Spine node will act as a distributed VXLAN Layer 3 Gateway and provide routing into and out of the 10.1.1/24 subnet. Host A will be dual-homed using a LAG to two Leaf nodes. The control plane for VXLAN will be EVPN using MP-IBGP. In the IBGP topology, the Spine nodes will act as route reflectors.
To help you understand the behavior of the example, the slide shows a logical view of the overlay network. With the help of VXLAN, it will appear that Host A, Host B, and the IRBs of the routers in AS 64512 and AS 64513 are in the same broadcast domain as well as the same IP subnet. Also, a matching virtual IP address and a matching virtual MAC address will be assigned to each Spine node's IRB interface, providing a redundant, distributed default gateway to the two hosts.
The slide shows the common configuration for all routers. Notice that a load-balancing policy has been applied to the forwarding table, which allows multiple next hops to be installed in the forwarding table. Also, there is a policy called direct that will be applied to the EBGP neighbors. The main purpose of this policy is to advertise each router's loopback interface (the VTEP source interface) to all routers in the fabric. Lastly, in order for each router to run BGP, the autonomous system must be set under [edit routing-options]. Looking at the example topology, you should notice that each router belongs to two autonomous systems: one AS in the underlay and one AS in the overlay. If you plan to use the automatic route target function (described on subsequent slides), you should set the AS under [edit routing-options] to the overlay network's AS number.
The slide shows the BGP configuration of the Spine nodes.
• Underlay Configuration: Each router peers with its directly connected neighbors using EBGP. The export statement allows all directly connected networks to be advertised to BGP neighbors. The local-as statement overrides the setting under routing-options just for the neighbors in this group. The multipath multiple-as statement allows multiple routes from multiple ASs to be used as active routes in the routing table.
• Overlay Configuration: Each Spine node acts as a route reflector running IBGP with its clients. The cluster statement causes the local router to act as a route reflector for the neighbors in this group. The family evpn signaling statement sets the AFI and SAFI for the IBGP sessions. The local-as configuration is probably unnecessary since the same AS is configured under routing-options. The multipath statement allows multiple similar received BGP routes to be active in the routing table.
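A sketch of the overlay group on a Spine, assuming a loopback of 172.16.100.1, an overlay AS of 64520, and Leaf loopbacks in 172.16.100/24 (all values are assumptions, not taken from the slide):

protocols {
    bgp {
        group overlay {
            type internal;
            local-address 172.16.100.1;    # peer loopback-to-loopback (address assumed)
            family evpn {
                signaling;                 # AFI 25 / SAFI 70
            }
            cluster 172.16.100.1;          # this Spine acts as a route reflector
            multipath;
            neighbor 172.16.100.11;        # Leaf RR clients (addresses assumed)
            neighbor 172.16.100.12;
        }
    }
}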
The slide shows the BGP configuration of a Leaf node. The configuration is very similar to that on the previous slide. The main difference is that in the overlay group, a Leaf node only needs to peer with the route reflectors.
You must ensure that all VTEP addresses are reachable by all of the routers in the IP Fabric. Generally, the loopback interface is used as the VTEP interface on Juniper Networks routers. Therefore, you must make sure that the loopback addresses of the routers are reachable.
Use the show bgp summary command to determine the status of your BGP neighbors. The slide shows that Leaf1 has established neighbor relationships using EVPN signaling with the route reflectors. Although no routes have been received from the route reflectors yet, you can see that the RIB-IN table that will be used for both sessions is bgp.evpn.0.
The slide shows the interface and VLAN configuration necessary to enable VXLAN Layer 2 Gateway functionality on a QFX5100 Series router. Although this slide shows the configuration on the Leaf1 device, Leaf2 will have the exact same configuration. Since Leaf1 and Leaf2 have interfaces that belong to the same Ethernet segment, both Leaf1 and Leaf2 should have the ESI value on et-0/0/50 set to the same value. When you assign an ESI value, you need to make sure that it is globally unique for the Ethernet segment.
This slide shows the minimum configuration (along with the previous slide) to enable VXLAN Layer 2 Gateway functionality using EVPN signaling. It is under [edit switch-options] that you configure the vtep-source-interface, route-distinguisher, and vrf-target statements. It is under [edit protocols evpn] that you set the encapsulation of VXLAN, the multicast mode, and the list of VNIs that will receive the benefit of EVPN signaling. The slide mentions the vrf-target statement and its behavior in the exporting of EVPN routes. It literally creates a hidden export policy that advertises all locally generated Type 1, Type 2, and Type 3 routes to remote PE routers after tagging the routes with the specified target community. Also, the vrf-target statement creates a hidden import policy that accepts any received EVPN routes that are tagged with the specified target community. We will discuss how to modify the router's import and export policies on subsequent slides.
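Pulling the pieces together, a minimal sketch for a Leaf (the RD, target, and VNI values are assumed, not taken from the slide):

switch-options {
    vtep-source-interface lo0.0;
    route-distinguisher 172.16.100.11:1;     # Type 1 style RD based on the router ID (assumed)
    vrf-target target:64520:1;
}
protocols {
    evpn {
        encapsulation vxlan;
        multicast-mode ingress-replication;  # the only supported mode with EVPN-VXLAN
        extended-vni-list 1000;              # VNIs signaled by EVPN
    }
}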
The slide shows the minimum configuration of an MX Series router acting as a VXLAN Layer 3 Gateway. Even though the slide shows the top-to-bottom view, it may help to understand what is going on if you look at it from the bottom up. Notice that the bridge domain and EVPN configuration occurs in the context of a virtual switch, tenant1_vs. Virtual switch configuration is required on an MX Series when using EVPN signaling. Everything under tenant1_vs enables the MX Series to be a VXLAN Layer 2 Gateway, similar to the QFX5100 Series configuration on the previous slide except for the routing-interface irb.0 statement.

Notice the IRB interface has been given a real IP address of 10.1.1.10. It has also been assigned a virtual-gateway-address of 10.1.1.254, which is the default gateway for the 10.1.1/24 subnet. It may not be obvious, but the virtual-gateway-address statement also binds a virtual MAC address of 00:00:5e:00:01:01 to the 10.1.1.254 address on the Spine1 router, and it causes the Spine1 router to send a MAC Advertisement route to all remote PEs advertising the virtual MAC address. Since Spine1 and Spine2 are configured with the same virtual-gateway-address (and virtual MAC), the remote PEs can load-share traffic towards the virtual MAC address (i.e., whenever a host needs to send data to the default gateway).

One last thing to mention is that, by default, the subnet associated with the IRB interface is installed in the inet.0 table. The slide shows that the IRB interface has been associated with the tenant1_vr routing instance. That means that any packet arriving on the IRB interface will be routed based on the tenant1_vr.inet.0 routing table.
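A minimal sketch of that bottom-up structure follows. Names and values mirror the slide where the text states them (tenant1_vs, tenant1_vr, irb.0, 10.1.1.10, 10.1.1.254); everything else, including the instance type of tenant1_vr and the VNI, is an illustrative assumption:

    set interfaces irb unit 0 family inet address 10.1.1.10/24 virtual-gateway-address 10.1.1.254
    set routing-instances tenant1_vr instance-type virtual-router
    set routing-instances tenant1_vr interface irb.0
    set routing-instances tenant1_vs instance-type virtual-switch
    set routing-instances tenant1_vs vtep-source-interface lo0.0
    set routing-instances tenant1_vs route-distinguisher 192.168.100.1:1
    set routing-instances tenant1_vs vrf-target target:64520:1
    set routing-instances tenant1_vs protocols evpn encapsulation vxlan
    set routing-instances tenant1_vs protocols evpn extended-vni-list 1000
    set routing-instances tenant1_vs bridge-domains bd1000 vlan-id 100
    set routing-instances tenant1_vs bridge-domains bd1000 routing-interface irb.0
    set routing-instances tenant1_vs bridge-domains bd1000 vxlan vni 1000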
vrf-target

Earlier, we saw the minimum configuration needed to allow a device to advertise EVPN routes using the vrf-target statement. Using the vrf-target statement by itself gives you very little control over the routes that get populated in the VRF tables. The slide shows that Leaf2 only needs to receive MAC Advertisements for VNI 1000. However, since each Leaf node is only configured with the vrf-target statement, Leaf2 will receive and accept MAC Advertisement routes for VNI 2000 also. Even though Leaf2 does not have a MAC table for VLAN 200, it will still install all the MAC Advertisement routes in its RIB-In table as well as its VRF table. This can be a major waste of memory on Leaf2, depending on how many MACs have been advertised for VNI 2000. The next few slides will show you how to gain control over which routes are accepted by the PE routers.
Type 1 Ethernet Autodiscovery routes are always advertised with the target associated with the vrf-target statement. Also, Type 4 Ethernet Segment routes do not carry a standard target community; instead, they carry an ES-import community. That leaves us with the Type 2 MAC Advertisement route and the Type 3 Inclusive Multicast Ethernet Tag route. As you know, both of these routes carry the VNI value. That means that these types of routes are VNI-specific (Type 1 and Type 4 routes are Ethernet Segment-specific). It is possible to set VNI-specific import and export policy using VNI-specific target communities. The slide shows how to configure the VNI-specific vrf-target export statement under the vni-options hierarchy. Although the vrf-target export statements apply a hidden export policy that advertises and tags the Type 2 and Type 3 routes for the related VNI with the configured target community, they do not apply any import policies. So, after applying the vrf-target export statements, you must also configure and apply a vrf-import policy that accepts the new target communities as well as the original target community for the EVI (target:64520:1 in the example). Your import policy will override the hidden import policy that was created by the original vrf-target statement (vrf-target target:64520:1 in the example).
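A sketch of how these pieces might fit together on Leaf2 follows. The community names and the per-VNI target value are illustrative assumptions; target:64520:1 is the EVI target from the example:

    set protocols evpn vni-options vni 1000 vrf-target export target:64520:1000
    set policy-options community evi-rt members target:64520:1
    set policy-options community vni1000-rt members target:64520:1000
    set policy-options policy-statement evpn-import term evi from community evi-rt
    set policy-options policy-statement evpn-import term evi then accept
    set policy-options policy-statement evpn-import term vni1000 from community vni1000-rt
    set policy-options policy-statement evpn-import term vni1000 then accept
    set policy-options policy-statement evpn-import then reject
    set switch-options vrf-import evpn-import

With this policy, Leaf2 accepts Type 2 and Type 3 routes for VNI 1000 plus the routes tagged with the EVI target, while MAC Advertisements for VNI 2000 are rejected.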
In the previous slide you learned how to take control of a router's Type 2 and Type 3 advertisements. If you are working with thousands of VLANs/VNIs per interface, the task of applying per-VNI route targets might become cumbersome. The slide shows how you can have your router automatically assign route targets to each configured VNI by configuring the auto statement. This statement also causes your router to automatically create hidden VRF import and export policies that advertise, and accept, received routes tagged with the automatically generated target communities. You should note that the automatically generated VRF import policies created as a result of the auto statement will override the import policy that gets instantiated with the vrf-target target:64520:1 statement on the slide (which is used for the Type 1 advertisements). So, you must configure and apply an import policy that will accept the Type 1 routes. In order for the auto statement to work nicely between PEs (so they calculate the same target communities), every PE router must be configured with the auto statement. Also, each PE router must be configured with the same autonomous system under the [edit routing-instances] hierarchy, since the automatically generated target communities are based on that AS value.
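A minimal sketch follows; the AS number and policy name are assumptions, the AS placement may vary by platform (the slide places it under the routing instance, while on a QFX default switch it typically sits under routing-options), and evpn-import is assumed to be a policy that still accepts the EVI target carried by the Type 1 routes:

    set routing-options autonomous-system 64520
    set switch-options vrf-target target:64520:1
    set switch-options vrf-target auto
    set switch-options vrf-import evpn-import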
As you know, the default behavior of a Juniper Networks device acting as a VXLAN Layer 2 Gateway is to strip the original VLAN tag of Ethernet frames received from locally attached servers. Another default behavior of those same devices is to automatically discard any received VXLAN packets that, when decapsulated, contain a VLAN-tagged Ethernet frame. The slide shows the commands that can override those default behaviors. One reason that you might want to preserve the VLAN tagging is to preserve the 802.1p bits for class-of-service purposes.
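The overriding statements on the slide are likely the following two, shown here for an assumed VLAN named v100. The first preserves the inner VLAN tag when encapsulating into VXLAN; the second accepts decapsulated frames that still carry a VLAN tag:

    set vlans v100 vxlan encapsulate-inner-vlan
    set vlans v100 vxlan decapsulate-accept-inner-vlan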
Use the show bgp summary command to determine the status of your router's BGP neighbors and the routing tables used for each session.
The slide shows how to view all the routes (for all EVPN instances) that have been accepted by VRF import policies.
The slide shows you how to view the routes for a particular EVPN instance.
The slide shows some other useful BGP troubleshooting commands.
Prior to learning any remote neighbors, a VXLAN Gateway creates a single logical VTEP interface, vtep.32768 on the slide. Although this interface is never used for forwarding, its presence in the output of this command allows you to verify two things: that the local device is configured as a VXLAN Gateway, and the source IP address it uses for VXLAN packets. For each remote VTEP learned, the gateway instantiates another logical VTEP interface, vtep.32769 on the slide. These interfaces represent the VXLAN tunnels established between the local gateway and the remote gateways. These interfaces are actually used for forwarding, as you can tell from the input and output packet counts.
The source option allows you to see the locally configured values for a gateway. The remote option allows you to see the details of the remotely learned gateways/VTEPs.
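On a QFX5100, these are likely invoked as follows:

    show ethernet-switching vxlan-tunnel-end-point source
    show ethernet-switching vxlan-tunnel-end-point remote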
A VXLAN Gateway uses a MAC table for forwarding decisions. The slide shows the two commands to verify the MACs and associated interfaces that have been learned by the gateway.
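The exact commands appear on the slide; on a QFX5100, likely candidates are the following, where the second shows the MACs learned through EVPN signaling:

    show ethernet-switching table
    show evpn database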
•
The benefits of using EVPN signaling for VXLAN;
•
The operation of the EVPN protocol; and
•
Configuring and monitoring EVPN signaling for VXLAN.
The slide provides the objective for this lab.
1. EVPN allows CE devices to multihome to more than one Leaf node such that all interfaces are actively forwarding data. EVPN signaling also minimizes unknown unicast flooding, because PE routers advertise locally learned MACs to all remote PEs.
2. An Ethernet Segment route is tagged with the ES-Import Route Target community.
3. Because configuring the auto statement overrides the hidden import policies of the vrf-target statement, you must configure and apply a VRF import policy that accepts the target community that is assigned to the Type 1 routes.
Chapter 6: Data Center Interconnect
•
The meaning of the term Data Center Interconnect;
•
The control and data plane of an MPLS VPN; and
•
The DCI options when using a VXLAN overlay with EVPN signaling.
The slide lists the topics we will discuss. We discuss the highlighted topic first.
Data center interconnect (DCI) is, at its simplest, a method of connecting multiple data centers together. As the name implies, a Layer 3 DCI uses IP routing between data centers, while a Layer 2 DCI extends the Layer 2 network (VLANs) from one data center to another. Many of the DCI options rely on an MPLS network to transport frames between data centers. Although in most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there are several advantages to using an MPLS network, including availability, cost, fast failover, traffic engineering, and scalable VPN options.
Between two data centers that need to be interconnected is a network of some type. A typical interconnect network could be a point-to-point link, an IP network, or an MPLS network. The slide shows that these networks can be owned by the customer (the owner of the data centers) or by a service provider. All the DCI options that we discuss in this chapter will work over both customer-owned and service provider-owned interconnect networks. The main difference is how much control the customer has over the DCI. Sometimes it is simply easier and more cost-effective to let the service provider manage the DCI.
In general, if there is a great distance between data centers, a point-to-point interconnect can be quite expensive. However, if the data centers are just down the street from one another, a point-to-point interconnect might make sense. This type of interconnect is usually provided as dark fiber between the data centers. The customer simply attaches equipment to the fiber and can run any type of protocol over the interconnect.
It is possible to provide a DCI over an IP network. If the DCI is meant to provide Layer 2 stretch (extending VLANs) between the data centers, then the Ethernet frames must be encapsulated in IP as they traverse the DCI; VXLAN and GRE are typical IP encapsulations that provide the Layer 2 stretch. If the DCI is to provide Layer 3 reachability between data centers, then an IP network is well suited to meet those needs. However, sometimes the DCI network may only support globally routable IP addressing while the data centers use RFC 1918 addressing. When that is the case, it might make sense to create a Layer 3 VPN between the two data centers using GRE, IPsec, or RFC 4364 (MPLS Layer 3 VPN over GRE).
The slide shows the encapsulation boundary of an MPLS transport network. The boundaries differ depending on who owns the MPLS network. If the customer owns the MPLS network, then MPLS can be used for encapsulation from end to end. If the service provider owns the MPLS network, then the encapsulation between each data center and the MPLS network depends entirely on what is allowed by the service provider. If the service provider is providing a Layer 2 VPN service, then the customer should expect that any Ethernet frame sent from one data center will arrive unchanged at the remote data center. If the service provider is providing a Layer 3 VPN service, then the customer should expect that any IP packet sent from one data center will arrive unchanged at the remote data center. In some cases, the service provider will allow a customer to establish data center-to-data center MPLS label-switched paths (LSPs).
Many of the DCI technologies that we will discuss depend on an MPLS network to transport frames between data centers. Although in most cases an MPLS network can be substituted with an IP network (i.e., by encapsulating MPLS in GRE), there are several advantages to using an MPLS network:
1. Fast failover between MPLS nodes: Fast reroute and node/link protection are two features of an MPLS network that allow for 50 ms or better recovery time in the event of a link or node failure along the path of an MPLS label-switched path (LSP).
2. Scalable VPNs: VPLS, EVPN, and MPLS Layer 3 VPNs are DCI technologies that use MPLS to transport frames between data centers. These same technologies allow for the interconnection of many sites (potentially hundreds) without the need to manually set up a full mesh of tunnels between those sites. In most cases, adding a new site only requires the administrator to configure the devices at the new site; the remote sites do not need to be touched.
3. Traffic engineering: MPLS allows the administrator to decide the path traffic takes over the MPLS network. You no longer have to use the path calculated by the IGP (i.e., all data taking the same path between sites). You can direct different traffic types to take different paths over the MPLS network.
4. Any-to-any connectivity: An MPLS backbone used for the DCI gives you the flexibility to provide any type of MPLS-based Layer 2 DCI, Layer 3 DCI, or both, in whatever combination you choose. An MPLS backbone is a network that can generally support most types of MPLS or IP-based connectivity at the same time.
The slide highlights the topic we discu ss next.
MPLS is responsible for directing a flow of IP packets along a predetermined path across a network. This path is the LSP, which is similar to an ATM VC in that it is unidirectional; that is, the traffic flows in one direction from the ingress router to an egress router. Duplex traffic requires two LSPs, one path to carry traffic in each direction. An LSP is created by the concatenation of one or more label-switched hops that direct packets between label-switching routers (LSRs) to transit the MPLS domain.

When an IP packet enters a label-switched path, the ingress router examines the packet and assigns it a label based on its destination, placing a 32-bit (4-byte) label in front of the packet's header, immediately after the Layer 2 encapsulation. The label transforms the packet from one that is forwarded based on IP addressing to one that is forwarded based on the fixed-length label. The slide shows an example of a labeled IP packet. Note that MPLS can also be used to label non-IP traffic, such as in the case of a Layer 2 VPN.

MPLS labels can be assigned per interface or per router. The Junos operating system currently assigns MPLS label values on a per-router basis. Thus, a label value of 10234 can only be assigned once by a given Juniper Networks router.

At egress, the IP packet is restored when the MPLS label is removed as part of a pop operation. The now unlabeled packet is routed based on a longest-match IP address lookup. In most cases, the penultimate (second-to-last) router pops the label stack, a behavior known as penultimate hop popping. In some cases, a labeled packet is delivered to the ultimate router, the egress LSR, where the stack is popped and the packet is forwarded using conventional IP routing.
The MPLS shim header is composed of four fields:
• 20-bit label: Identifies the packet as belonging to a particular LSP. This value changes as the packet flows on the LSP from LSR to LSR.
• Traffic Class (TC): Formerly called EXP (experimental), these three bits can be used to convey class-of-service information, specifically the forwarding class a given packet belongs to. The 3-bit width of this field makes it possible to give a frame a total of eight possible markings, each of them potentially linked to a different forwarding behavior, for example a different queuing priority and a different buffer size.
• Bottom-of-stack bit: Many MPLS applications require a packet to be tagged with several labels, one stacked on top of the other. The bottom-of-stack bit of an MPLS header is set to 1 if this is the bottom of the label stack and the payload lies below. The bottom-of-stack bit is set to zero if another MPLS header (i.e., another label) lies below this one. VPNs are one example of an application that requires label stacking. Here the outer label, or transport label, indicates which label-switching router traffic should be delivered to, while the inner label, called the service label, describes how the payload should be treated once it reaches its destination label-switching router.
• Time to live (TTL): As with the equivalent IP field, TTL limits the number of hops an MPLS packet can travel. It is decremented at each hop, and if its value drops to zero, the packet is discarded. When using MPLS for IP traffic engineering, the default behavior is to copy the value of the IP TTL field into the MPLS TTL field. This allows diagnostic tools like traceroute to continue working even when packets are encapsulated within MPLS and sent down a label-switched path.
A very important point to keep in mind is that labels have only local significance: they are assigned by each router according to its own label availability. In other words, when you establish a label-switched path across a domain between two endpoints, traffic following the path will typically be tagged with a different label at each hop. A second important point is that labels are generally global to the router, not tied to the incoming interface; a packet tagged with a given label will be subject to the same forwarding treatment regardless of the interface on which it was received. This apparently minor point plays a major role in MPLS traffic protection, a set of MPLS features that try to minimize packet loss during a link or node failure. There are only a very few exceptions to this rule, mostly to do with specific (and very advanced) MPLS applications. One example is carrier-of-carriers, where an MPLS-enabled service provider offers an MPLS transport service to other service providers.
Labels 0 through 15 are reserved (RFC 3032, MPLS Label Stack Encoding).
• A value of 0 represents the IP version 4 (IPv4) explicit null label. This label indicates that the label must be popped, and the forwarding of the packet must then be based on what is below it, either another label or the payload.
• A value of 1 represents the router alert label. This label value is legal anywhere in the label stack except at the bottom. When a received packet contains this label value at the top of the label stack, it is delivered to a local software module for processing. The label beneath it in the stack determines the actual forwarding of the packet. However, if the packet is forwarded further, the router alert label should be pushed back onto the label stack before forwarding. The use of this label is analogous to the use of the router alert option in IP packets.
• A value of 2 represents the IP version 6 (IPv6) explicit null label. This label value is legal only when it is the sole label stack entry. It indicates that the label stack must be popped, and the forwarding of the packet then must be based on the IPv6 header.
• A value of 3 represents the implicit null label. This is a label that an LSR can assign and distribute, but it never actually appears in the encapsulation. When an LSR would otherwise replace the label at the top of the stack with a new label, but the new label is implicit null, the LSR pops the stack instead of doing the replacement. Although this value can never appear in the encapsulation, it can be specified by a label signaling protocol.
The list of reserved labels (RFC 3032, MPLS Label Stack Encoding) continues:
• A value of 7 is used for the Entropy Label Indicator (ELI). After determining a load-balancing methodology, the ELI allows the ingress LSR to notify the downstream LSRs of the chosen load-balancing methodology.
• A value of 13 is used for the Generic Associated Channel Label (GAL). This label informs an LSR that a received packet belongs to a Virtual Circuit Connectivity Verification (VCCV) control channel.
• A value of 14 is used as the OAM Alert Label. This label indicates that a packet is an MPLS OAM packet as described in ITU-T Recommendation Y.1711.
• Values 4–6, 8–12, and 15 are reserved for future use.
Two labels deserve special attention: Label 0 and Label 3. These two labels can only be used at the end of an LSP, between the penultimate (that is, second-to-last) and the egress router.
• Label 0 (Explicit null): This label is always assigned an action of "decapsulate" (pop); the label-switching router will simply remove the MPLS header and take a forwarding action based on what is below it (either another label, or the actual LSP payload).
• Label 3 (Implicit null): This is a special label value that is never actually found in MPLS frames, only within MPLS signaling protocols. It is used by the egress router, i.e., the last hop in a label-switched path, to request that the previous router remove the MPLS header. This behavior, referred to as penultimate-hop popping, is the Junos OS default.
The original definition of a label-switching router is "a router that makes forwarding decisions based only on the content of the MPLS header". In other words, a label-switching router always operates in label-switching mode. We will use a slightly less restrictive definition that also includes ingress and egress routers, sometimes referred to as label edge routers. Traffic at the ingress or at the egress of a label-switched path is typically not encapsulated into MPLS, so label switching is not possible, and a forwarding decision needs to be made according to other rules. We will use the term label-switching router (LSR) to mean any router that participates in MPLS forwarding, including both the ingress and the egress nodes. For brevity, in the rest of the course we will also use the term router as a synonym for label-switching router.
The forwarding behavior of a label-switching router is defined according to three basic label operations:
• Push: Add an MPLS header (a label) to a packet. This operation is typically done by the label-switching router at the beginning of a label-switched path, to encapsulate a non-MPLS packet and allow it to be forwarded by label switching within the MPLS domain.
• Pop: Remove an MPLS header from an MPLS-encapsulated packet. This is often done either at the end of an LSP or, as we will see shortly, by the second-to-last router (the penultimate hop).
• Swap: Replace the label value of an MPLS packet with another value. This operation is typically performed by transit label-switching routers as a packet traverses a label-switched path.
After performing one of these basic MPLS operations, the packet is generally forwarded to the next-hop router. In some cases the forwarding treatment can be more complex, involving different combinations of the three basic operations. For some types of services, for example VPNs, it is common to see a double-push forwarding action, while in some traffic protection scenarios, when building a local detour to avoid a link failure, a transit router will sometimes have to perform a swap-push operation.
A label-switched path (LSP) is a unidirectional path through the network defined in terms of label-switching operations (push, pop, swap). You can think of an LSP as a tunnel: any packet that enters it is delivered to its endpoint, no matter what type of payload it contains. Establishing a label-switched path across an MPLS domain means determining the actual labels and label operations performed by the label-switching routers on the path. This can be done with manual configuration, or by some type of dynamic label distribution protocol. Often a label-switched path resides within a single MPLS domain, for example within a single service provider. However, the development of advanced BGP-based MPLS signaling allows the creation of label-switched paths that span multiple domains and multiple administrations.
The ingress router, sometimes called the head end, typically performs a push label operation, inserting the MPLS header between the Layer 2 encapsulation and the payload packet. Its role is to encapsulate non-MPLS traffic by adding one or more labels to it, and to forward it down a label-switched path. The ingress router is not a pure label-switching router: the initial decision of which traffic to forward down which LSP is made not according to the content of labels (which are not present yet), but according to other criteria, e.g., a route lookup in the case of IP MPLS traffic engineering, or even the incoming interface, in the case of point-to-point transport of Layer 2 frames over MPLS (Layer 2 circuits, circuit cross-connect).
Transit label-switching routers are LSRs that are neither at the beginning nor at the end of a label-switched path. They typically operate in pure label-switching mode, making forwarding decisions based only on the label value of incoming MPLS frames. Very often transit LSRs perform a swap operation, replacing the incoming label with the one expected by the next hop of the label-switched path. Transit LSRs are typically not aware of the content of the MPLS traffic they are forwarding, and do not know whether the payload is IP, IPv6, Layer 2 frames, or anything else.
The label information base contains the actual MPLS label-switching table, which associates incoming MPLS labels with forwarding actions, typically a label operation of either swap or pop and a forwarding next hop. Although the label information base can be populated with static entries, it is generally populated by a dynamic label distribution protocol.
Often the MPLS header is removed by the second-to-last (penultimate) router in an LSP. This removal is an optimization that helps in several cases, including using MPLS for IP traffic engineering. Removing the label at the penultimate hop facilitates the work of the last-hop (egress) router, which, instead of both removing the MPLS header and then making an IP routing decision, only needs to do the latter. Penultimate-hop popping (PHP) is the default behavior on Juniper routers; however, it can be disabled in the configuration. Some applications require PHP to be disabled, but that is often done automatically: the Junos OS is smart enough to detect the need to signal the LSP so that PHP is disabled.
The egress router (or tail end) of an LSP is the last router in the label-switched path. Exactly as in the case of the ingress LSR, it is generally not a pure label-switching router, as it has to make a forwarding decision based on factors other than an incoming label. In the case of MPLS IP traffic engineering, the egress router receives ordinary IP packets, due to penultimate-hop popping, and makes a forwarding decision based on ordinary IP routing.
On routers running the Junos OS, the label information base is stored in the mpls.0 table. As soon as you enable MPLS processing, four default entries are automatically created: label 0 (explicit null), label 1 (router alert), label 2 (IPv6 explicit null), and label 13 (Generic Associated Channel Label, used for Operations and Maintenance and defined in RFC 5586).
On top of the four predefined labels, the mpls.0 table can be populated by static configuration or, much more frequently, by dynamic label distribution protocols. Each label is associated with a forwarding action, typically composed of an MPLS label operation (push, pop, swap, or a combination of these) and a next hop. In this example, label 300576 has been installed by a dynamic protocol called LDP, while the remaining label, 1004792, has been configured statically. Note that there are two entries for this last label. This is because, in some cases, a label-switching router may have to take different forwarding actions depending on whether the label is or is not at the bottom of the label stack. In this case, the forwarding actions turn out to be the same: pop the MPLS header and send the content to 172.17.23.1 via interface ge-1/1/5.0. The IP address of the next hop must, of course, be directly connected: it is only used to derive which MAC address to use for the Layer 2 encapsulation.
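A static transit entry like the one described might be configured roughly as follows; the path name is an assumption, while the label and next hop are taken from the example above:

    set protocols mpls static-label-switched-path to-pe2 transit 1004792 pop
    set protocols mpls static-label-switched-path to-pe2 transit 1004792 next-hop 172.17.23.1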
Label distribution protocols create and maintain the label-to-forwarding equivalence class (FEC) bindings along an LSP, from the MPLS ingress label-switching router (LSR) to the MPLS egress LSR. A label distribution protocol is a set of procedures by which one LSR informs a peer LSR of the meaning of the labels used to forward traffic between them. MPLS uses this information to create the forwarding tables in each LSR. Label distribution protocols are often referred to as signaling protocols. However, label distribution is a more accurate description of their function and is preferred in this course. The label distribution protocols create and maintain an LSP dynamically with little or no user intervention. Once the label distribution protocols are configured for the signaling of an LSP, the egress router of the LSP sends label (and other) information in the upstream direction towards the ingress router, based on the configured options.
The Junos OS uses RSVP as the label distribution protocol for traffic-engineered LSPs.
• RSVP was designed to be the resource reservation protocol of the Internet and to "provide a general facility for creating and maintaining distributed reservation state across a set of multicast or unicast delivery paths" (RFC 2205). Reservations are an important part of traffic engineering, so it made sense to continue to use RSVP for this purpose rather than reinventing the wheel.
• RSVP was explicitly designed to support extensibility mechanisms by allowing it to carry what are called opaque objects. Opaque objects make no real sense to RSVP itself but are carried with the understanding that some adjunct protocol (such as MPLS) might find the information in these objects useful. This encourages RSVP extensions that create and maintain distributed state for information other than pure resource reservation. The designers believed that extensions could be developed easily to add support for explicit routes and label distribution.
• Extensions do not make the enhanced version of RSVP incompatible with existing RSVP implementations. An RSVP implementation can differentiate between LSP signaling and standard RSVP reservations by examining the contents of each message.
• With the proper extensions, RSVP provides a tool that consolidates the procedures for a number of critical signaling tasks into a single message exchange:
– Extended RSVP can establish an LSP along an explicit path that would not have been chosen by the interior gateway protocol (IGP);
– Extended RSVP can distribute label-binding information to LSRs in the LSP;
– Extended RSVP can reserve network resources in routers comprising the LSP (the traditional role of RSVP); and
– Extended RSVP permits an LSP to be established to carry best-effort traffic without making a specific resource reservation.
Thus, RSVP provides MPLS-signaled LSPs with a method of support for explicit routes ("go here, then here, finally here…"), path numbering through label assignment, and route recording (where the LSP actually goes from ingress to egress, which is very handy information to have). RSVP also gives MPLS LSPs a keepalive mechanism to use for visibility ("this LSP is still here and available") and redundancy ("this LSP appears dead…is there a secondary path configured?").
LDP associates a set of destinations (prefixes) with each data link layer LSP. This set of destinations is called the FEC. These destinations all share a common data LSP path egress and a common unicast routing path. LDP supports topology-driven MPLS networks in best-effort, hop-by-hop implementations. The LDP signaling protocol always establishes LSPs that follow the contours of the IGP's shortest path. Traffic engineering is not possible with LDP.
Three classifications exist for Layer 2 DCIs:
1. No MAC learning by the provider edge (PE) device: This type of Layer 2 DCI does not require that the PE devices learn MAC addresses.
2. Data plane MAC learning by the PE device: This type of DCI requires that the PE device learn the MAC addresses of both the local data center and the remote data centers.
3. Control plane MAC learning: This type of DCI requires that a local PE learn the local MAC addresses using the control plane and then distribute these learned MAC addresses to the remote PEs.
A Layer 3 DCI uses routing to interconnect data centers. Each data center must maintain a unique IP address space. A Layer 3 DCI can be established using just about any IP-capable link. Another important consideration for DCIs is incorporating some level of redundancy by using link aggregation groups (LAGs), IGPs using equal-cost multipath, and BGP or MP-BGP using the multipath or multihop features.
CE devices are located in the data center and usually perform standard switching or routing functions. CE devices can interface to PE routers using virtually any Layer 2 technology and routing protocol.
PE devices are located at the edge of the data center or at the edge of a service provider's network. They interface to the CE routers on one side and to the IP/MPLS core routers on the other. PE devices maintain site-specific VPN route and forwarding (VRF) tables. In a Layer 3 VPN scenario, the PE and CE routers function as routing peers (RIP, OSPF, BGP, etc.), with the PE router terminating the routing exchange between customer sites and the IP/MPLS core. In a Layer 2 VPN scenario, the PE's CE-facing interface is configured with VLAN tagging that matches the CE's PE-facing interface, and any frames received from the CE device are forwarded over the MPLS backbone to the remote site. Information is exchanged between PE routers using either MP-BGP or LDP. This information exchange allows the PE routers to map data to and from the appropriate MPLS LSPs traversing the IP/MPLS core. PE routers, acting as ingress and egress LSRs, use MPLS LSPs when forwarding customer VPN traffic between sites. LSP tunnels in the interconnect network separate VPN traffic in the same fashion as PVCs in a legacy ATM or Frame Relay network.
Provider (P) routers are located in the IP/MPLS core. These routers do not carry VPN data center routes, nor do they participate in the VPN control and signaling planes. This is a key aspect of the RFC 4364 scalability model: only PE devices are aware of VPN routes, and no single PE router must hold all VPN state information. P routers are involved in the VPN forwarding plane, where they act as label-switching routers (LSRs) performing label swapping (and popping) operations.
A VPN site is a collection of devices that can communicate with each other without the need to transit the IP/MPLS backbone (i.e., a single data center). A site can range from a single location with one switch or router to a network consisting of many geographically diverse devices.
The slide shows how VPN data is encapsulated in an MPLS VPN scenario (an MPLS Layer 3 VPN in this example). PE1 receives IP packets destined for CE2. PE1 performs a lookup in its Green VRF table (the table associated with the PE-CE interface). The route to CE2's address lists three things in terms of next hop: the outgoing interface, and the inner and outer labels that should be pushed onto the IP packet. The outer label is swapped by the P routers along the way to deliver the MPLS packet to PE2. P3 performs a penultimate-hop pop, leaving only single-labeled packets, and forwards them to PE2. PE2 receives the labeled packets, pops the inner label, and uses the inner label to determine which VRF table to use (PE2 might have many VRF tables). PE2 performs a lookup in the Green VRF table (because label 1000 = Green VRF) and forwards the original IP packets to CE2.
Sometimes data from one CE may need to pass through multiple VPNs before reaching the remote CE. The top diagram shows the situation where packets enter the green VPN at PE1, get decapsulated at PE2, and are then forwarded in their original format to PE3, where they enter the red VPN. From PE2's perspective, PE3 is the CE for the green VPN. From PE3's perspective, PE2 is the CE for the red VPN. You might think that you need two physical devices, PE2 and PE3, to "stitch" the two VPNs together. However, as the bottom diagram shows, you can actually "stitch" two VPNs together using a single MX Series router. You can use logical tunnel interfaces, which are internal interfaces that allow you to connect two virtual routers together. The two virtual routers enabled on the MX Series device simply perform the same functions as PE2 and PE3 in the top diagram.
The next few slides discuss the details of MPLS Layer 3 VPNs. One thing to remember with Juniper Networks routers is that once an LSP is established (from PE1 to PE2 in the diagram), the ingress PE installs a host route (/32) to the loopback interface of the egress router in the inet.3 table, with a next hop of the LSP (i.e., the outbound interface of the LSP and a label push). This default behavior means that not all traffic entering PE1 can be routed through the LSP. So what traffic does get routed over the LSP? Looking at the example in the slide, remember that PE1 and PE2 are MP-BGP peers. That means that PE2 will advertise VPN routes to PE1 using MP-BGP, with a BGP next hop of 2.2.2.2 (PE2's loopback). For these VPN routes to be usable by PE1, PE1 must find a route to reach 2.2.2.2 in the inet.3 table. PE1 will not look in inet.0 to resolve the next hop of MPLS VPN MP-BGP routes.
The VPN-IPv4 route has a very simple purpose, which is to advertise IP routes. PE2 installs locally learned routes in its VRF table. That includes the directly connected PE-CE interface as well as any routes PE2 learns from CE2 (RIP, OSPF, BGP, etc.). Once PE2 has locally learned routes in its VRF table, it advertises them (based on configured policy) to remote PEs and attaches a target community, target community "Orange" in the example. PE1, upon receiving the route, must decide whether it should keep the route. It makes this decision based on resolving the BGP next hop in inet.3 as well as looking at the received route target community. PE1, in order to accept and use this advertisement, must be configured with an import policy that accepts routes tagged with the "Orange" target community. Without a configured policy that matches on the "Orange" route target, PE1 would simply discard the advertisement. So, at a minimum, each VRF on each participating PE for a given VPN must be configured with an export policy that attaches a unique target community to routes, and also configured with an import policy that matches and accepts advertisements based on that unique target community.
The route distinguisher can be formatted two ways:
• Type 0: This format uses a 2-byte administration field that codes the provider's autonomous system number, followed by a 4-byte assigned number field. The assigned number field is administered by the provider and should be unique across the autonomous system.
• Type 1: This format uses a 4-byte administration field that is normally coded with the router ID (RID) of the advertising PE router, followed by a 2-byte assigned number field that carries a unique value for each VRF table supported by the PE router.
The examples on the slide show both the Type 0 and Type 1 route distinguisher formats. The first example shows the 2-byte administration field with the 4-byte assigned number field (Type 0). Illustrative examples of both formats are shown below.
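The values here are illustrative assumptions, not the values from the slide:

    route-distinguisher 64512:101;       /* Type 0: 2-byte AS number + 4-byte assigned number */
    route-distinguisher 10.255.1.1:1;    /* Type 1: 4-byte router ID + 2-byte assigned number */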
Each VPN-IPv4 route advertised by a PE router contains one or more route target communities. These communities are added using VRF export policy or explicit configuration. When a PE router receives route advertisements from remote PE routers, it determines whether the associated route target matches one of its local VRF tables. Matching route targets cause the PE router to install the route into the VRF table whose configuration matches the route target. Because the application of policy determines a VPN’s connectivity, you must take extra care when writing and applying VPN policy to ensure that the tenant’s connectivity requirements are faithfully met.
For each VRF, you must apply a VRF export policy. A VRF export policy determines which routes in a PE's VRF table will be advertised to remote PEs. A VRF export policy gives you complete control over the connectivity from one site to another, simply by either advertising or not advertising particular routes to a remote site. Another important function of the VRF export policy is that it also causes the advertised routes to be tagged with a target community. In the slide, PE2 has a locally learned route (10.1.2/24, the network between PE2 and CE2) in its VRF table. To ensure that CE1 and PE1 can send data to CE2, PE2 has a VRF export policy applied to its IBGP neighbor, PE1, which advertises locally learned routes tagged with the target community target:1:1. The next slide shows PE1's process of installing the VPN-IPv4 route in its own VRF table.
The slide shows the process that PE1 goes through when it receives a VPN-IPv4 route advertisement from PE2. There is an assumption that PE1 is configured with a VRF import policy that accepts the same route target, target:1:1, that PE2 is attaching to its VPN routes.
The slide highlights the topic we discuss next.
This slide shows the four options for DCI when the data centers are enabled for VXLAN using EVPN signaling. The next few slides discuss each option in detail.
The slide shows an example of the signaling and data plane when using EVPN/VXLAN over a Layer 3 VPN. The two MX Series devices are the PE routers for the Layer 3 VPN. The Layer 3 VPN can run over a private MPLS network or could be a purchased service provider service. From the perspective of the two QFX devices, they are separated by an IP network. The QFXs simply forward VXLAN packets between each other based on the MAC addresses learned through EVPN signaling. The MX devices have an MPLS Layer 3 VPN between each other (bidirectional MPLS LSPs, an IGP, Layer 3 VPN MP-BGP routing, and so on). The MXs advertise the local QFX's loopback address to the other MX. When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. MX1 performs a lookup for the received packet in the VRF table associated with the VPN interface (the incoming interface) and encapsulates the VXLAN packet in two MPLS headers (the outer for the MPLS LSP, the inner for MX2's VRF mapping). Upon receiving the MPLS-encapsulated packet, MX2 uses the inner MPLS header to determine the VRF table so that it can route the remaining VXLAN packet to QFX2. QFX2 strips the VXLAN encapsulation and forwards the original Ethernet frame to the destination host.
The slide shows an example of the signaling/data plane when using EVPN stitching between three EVPNs: EVPN/VXLAN between QFX1 and MX1, EVPN/MPLS between MX1 and MX2, and EVPN/VXLAN between MX2 and QFX2. Each EVPN is signaled using EVPN MP-BGP signaling, and the EVPNs are stitched together on the MX devices using logical tunnel interfaces. When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. MX1 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX1 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX1 takes the locally received Ethernet frame and encapsulates it in two MPLS headers (the outer for the MPLS LSP, the inner for MX2's VRF mapping). Upon receiving the MPLS-encapsulated packet, MX2 uses the inner MPLS header to determine the appropriate VRF and outgoing interface. MX2 forwards the remaining Ethernet frame out of a logical tunnel interface. MX2 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX2 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
The slide shows an example of the signaling/data plane when using EVPN stitching between three EVPNs: EVPN/VXLAN between QFX1 and MX1, EVPN/VXLAN between MX1 and MX2, and EVPN/VXLAN between MX2 and QFX2. Each EVPN is signaled using EVPN MP-BGP signaling, and the EVPNs are stitched together on the MX devices using logical tunnel interfaces. When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX1's loopback address. MX1 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX1 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX1 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to MX2's loopback address. MX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame out of a logical tunnel interface. MX2 receives the Ethernet frame over the associated (looped) logical tunnel interface. MX2 takes the locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
The slide shows an example of the signaling/data plane when using EVPN/VXLAN over an IP network. EVPN MP-BGP is used to synchronize MAC tables. When forwarding data from West to East, QFX1 takes a locally received Ethernet frame and encapsulates it in a VXLAN packet destined to QFX2's loopback address. QFX2 strips the VXLAN encapsulation and forwards the remaining Ethernet frame to the destination host.
The slide highlights the topic we discuss next.
The slide shows the EVPN Type 2 MAC Advertisements that must be exchanged between data centers when individual subnets are stretched between data centers. Notice that Host1 and Host2 are attached to the same subnet. The example shows the advertisement of just a single MAC address. However, in a real environment you might see thousands of MAC addresses advertised between data centers. That is a bunch of routes! MAC moves, adds, and changes in one data center will actually affect the MAC tables and EVPN routing exchanges in another data center.
The EVPN Type 5 IP Prefix route can be used in a DCI situation in which the IP subnets between data centers are completely unique. Notice that Host1 and Host2 are attached to different subnets. This fact is very important to the discussion. In this situation, if Host1 needs to send an IP packet to Host2, it will send it to its default gateway, which is the IRB of PE1. Leaf1 will encapsulate the Ethernet frames from Host1 into VXLAN and send the VXLAN packets to PE1. PE1 will strip the VXLAN header and notice that the remaining Ethernet frames from Leaf1 have a destination MAC of its own IRB. It will strip the Ethernet header and route the remaining IP packet based on the routing table associated with the IRB interface. PE1 will use the EVPN Type 5 route that was received from PE2 for the 10.1.2/24 network, and the packet will be forwarded over the VXLAN tunnel between PE1 and PE2.

You might ask yourself, "Why couldn't PE1 use a standard IP route? Why does the 10.1.2/24 network need to be advertised by an EVPN Type 5 route?" The answer is that the Type 5 route allows inter-data center traffic to be forwarded over VXLAN tunnels (i.e., the end-to-end VXLAN-based VPN is maintained between data centers). This is very similar to the stitching concept discussed earlier. PE2 then receives the VXLAN-encapsulated packet and forwards the remaining IP packet towards the destination over the IRB interface (while encapsulating the IP packet in an Ethernet header with a destination MAC of Host2). Finally, PE2 performs a MAC table lookup and forwards the Ethernet frame over the VXLAN tunnel between PE2 and Leaf2.
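On an MX Series PE, advertising a tenant VRF's prefixes as Type 5 routes might look roughly like the following sketch; the instance name, RD, target, and Layer 3 VNI are illustrative assumptions:

    set routing-instances tenant1_vr instance-type vrf
    set routing-instances tenant1_vr interface irb.0
    set routing-instances tenant1_vr route-distinguisher 192.168.1.1:10
    set routing-instances tenant1_vr vrf-target target:64512:10
    set routing-instances tenant1_vr protocols evpn ip-prefix-routes advertise direct-nexthop
    set routing-instances tenant1_vr protocols evpn ip-prefix-routes encapsulation vxlan
    set routing-instances tenant1_vr protocols evpn ip-prefix-routes vni 9999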
The slide highlights the topic we discuss next.
The slide shows the topology that will serve as the underlay network. It is based on EBGP routing between the routers in the same data center. In AS 64555, the PE and P routers will run OSPF to advertise each router's loopback address. They will run LDP to automatically establish MPLS LSPs to each other's loopback address. Finally, the PEs will establish a VPN-IPv4 MP-IBGP session with each other. The PEs will exchange locally learned routes (the loopback addresses of the Leaf nodes) so that the Leaf nodes can establish the overlay network (next slide).
Once the underlay network is established, each Leaf node will have learned a route from the local PE (using the EBGP session) to reach the loopback address (VTEP source address) of the remote Leaf. Leaf1 and Leaf2 will act as VXLAN Layer 2 Gateways and also establish an EVPN MP-IBGP session with each other to exchange EVPN routes to advertise locally learned MACs to the remote Leaf. Host A and Host B will be able to communicate as if they were on the same LAN segment.
The slide shows the MPLS configuration of PE1.
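That configuration is not reproduced in this text; a minimal sketch of the core-facing MPLS, OSPF, and LDP stanzas on PE1 follows (interface names and addresses are illustrative assumptions):

    set interfaces ge-0/0/1 unit 0 family inet address 172.16.100.1/30
    set interfaces ge-0/0/1 unit 0 family mpls
    set protocols mpls interface ge-0/0/1.0
    set protocols ospf area 0.0.0.0 interface ge-0/0/1.0
    set protocols ospf area 0.0.0.0 interface lo0.0 passive
    set protocols ldp interface ge-0/0/1.0
    set protocols ldp interface lo0.0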
An easy way to see that MPLS LSPs are established when using LDP signaling is to view the inet.3 table. If there is a route in the inet.3 table to the remote PE's loopback address, then there is a unidirectional MPLS LSP established to the remote PE. Remember, there needs to be an MPLS LSP established in each direction, so you must check the inet.3 table on both PEs.
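For example, on each PE:

    user@PE1> show route table inet.3
    user@PE2> show route table inet.3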
The slide shows the VRF configuration for PE1. Notice the use of the vrf-target statement. Originally, VRF import and export policies could only be enabled by writing explicit policies under [edit policy-options] and applying them using the vrf-import and vrf-export statements. However, more recent versions of the Junos operating system allow you to skip those steps and simply configure a single vrf-target statement. The vrf-target statement actually enables two hidden policies. One policy is a VRF export policy that takes all locally learned routes in the VRF (direct interface routes as well as routes learned from the local CE) and advertises them to the remote PE tagged with the specified target community. The other policy is a VRF import policy that accepts all VPN-IPv4 routes learned from remote PEs that are tagged with the specified target community.
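A sketch of what such a VRF might look like on PE1 follows; all names and values are illustrative assumptions, and the CE here is the local Leaf node, peering via EBGP as described for this topology:

    set routing-instances vpn1 instance-type vrf
    set routing-instances vpn1 interface ge-0/0/2.0
    set routing-instances vpn1 route-distinguisher 192.168.1.1:1
    set routing-instances vpn1 vrf-target target:64555:1
    set routing-instances vpn1 protocols bgp group ce type external
    set routing-instances vpn1 protocols bgp group ce peer-as 64701
    set routing-instances vpn1 protocols bgp group ce neighbor 172.16.200.2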
The slide shows how to enable VPN-IPv4 signaling between PEs. Use the show bgp summary command to verify that the MP-BGP neighbor relationship is established and that the PE is receiving routes from the remote neighbor.
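A sketch of the corresponding BGP group on PE1 follows; the loopback addresses are the same hypothetical values used earlier. The family inet-vpn unicast statement is what enables the VPN-IPv4 NLRI on the session:

    protocols {
        bgp {
            group IBGP-VPN {
                type internal;
                local-address 192.168.100.1;       # PE1 loopback (assumed)
                family inet-vpn {
                    unicast;                       # carry VPN-IPv4 routes
                }
                neighbor 192.168.100.2;            # PE2 loopback (assumed)
            }
        }
    }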
Remember, the main purpose of establishing the underlay network and the DCI is to ensure that the routers in each site can reach the loopback addresses (VTEP source addresses) of the remote Leaf nodes. The slide shows that PE1 has learned the loopback address of Leaf2.
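Continuing with the hypothetical addressing used in these sketches (Leaf2 loopback 192.168.1.2, a VRF named VPN-A, Leaf2 in AS 64513), a lookup in the VRF table on PE1 similar to the following would confirm this; the AS path, labels, and timers are purely illustrative:

    user@PE1> show route table VPN-A.inet.0 192.168.1.2/32

    VPN-A.inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both

    192.168.1.2/32     *[BGP/170] 00:03:41, localpref 100, from 192.168.100.2
                          AS path: 64513 I
                        > to 172.16.1.2 via ge-0/0/0.0, Push 16, Push 299776(top)

The two pushed labels reflect the MPLS VPN data plane: the inner label identifies the VRF on PE2, and the outer label is the LDP transport label toward PE2's loopback.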
The slide shows the underlay and overlay network configuration of Leaf1. Leaf2 would be configured very similarly.
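A condensed sketch of what Leaf1's configuration might look like on a QFX Series switch follows. All AS numbers, addresses, VLAN, and VNI values are assumptions, and the export policy ADVERTISE-LO0 (not shown) is a hypothetical policy that advertises the loopback/VTEP address into the underlay:

    routing-options {
        router-id 192.168.1.1;
        autonomous-system 65000;                 # overlay AS shared by Leaf1 and Leaf2 (assumed)
    }
    protocols {
        bgp {
            group UNDERLAY {
                type external;
                local-as 64512;                  # Leaf1's underlay AS (assumed)
                peer-as 64555;                   # AS of the local PE (assumed)
                neighbor 172.16.10.1;            # PE1-facing link address (assumed)
                export ADVERTISE-LO0;            # advertise lo0.0 / VTEP address (policy not shown)
            }
            group EVPN-OVERLAY {
                type internal;
                local-address 192.168.1.1;       # lo0.0, the VTEP source address (assumed)
                family evpn {
                    signaling;                   # carry EVPN NLRI between the leaves
                }
                neighbor 192.168.1.2;            # Leaf2 loopback (assumed)
            }
        }
        evpn {
            encapsulation vxlan;
            extended-vni-list all;
        }
    }
    switch-options {
        vtep-source-interface lo0.0;
        route-distinguisher 192.168.1.1:1;
        vrf-target target:65000:1;
    }
    vlans {
        v100 {
            vlan-id 100;
            vxlan {
                vni 5100;                        # VXLAN VNI mapped to VLAN 100 (assumed)
            }
        }
    }

Note the split in roles: the EBGP UNDERLAY group (using a per-group local-as) only provides loopback reachability, the IBGP EVPN-OVERLAY group between the two leaves carries the EVPN routes, and the switch-options hierarchy ties the VTEP to lo0.0.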
This chapter discussed:

•	The meaning of the term Data Center Interconnect;
•	The control and data plane of an MPLS VPN; and
•	The DCI options when using a VXLAN overlay with EVPN signaling.
1. What can provide the transport network for a DCI?
2. What is included in the VPN-IPv4 NLRI?
3. Which DCI option is available when the transport network of a DCI is a public IP network?
The slide provides the objective for this lab.
1. A DCI can be provided by a point-to-point link, an IP network, or an MPLS network.
2. The VPN-IPv4 NLRI includes an MPLS label, the route distinguisher, and an IP prefix. A target community is also attached to the route, but it is not officially part of the NLRI.
3. When the transport network of a DCI is a public IP network, the option available for a DCI is option 3.
The slide lists online resources available to learn more about Juniper Networks and technology. These resources include the following sites:

•	Pathfinder: An information experience hub that provides centralized product details.
•	Feature Explorer: Junos OS and ScreenOS software feature information to find the right software release and hardware platform for your network.
•	Content Explorer: Technical documentation for Junos OS-based products by product, task, and software release, and downloadable documentation PDFs.
•	Learning Bytes: Concise tips and instructions on specific features and functions of Juniper technologies.
•	Installation and configuration courses: Over 60 free Web-based training courses on product installation and configuration (just choose eLearning under Delivery Modality).
•	J-Net Forum: Training, certification, and career topics to discuss with your peers.
•	Juniper Networks Certification Program: Complete details on the certification program, including tracks, exam details, promotions, and how to get started.
•	Technical courses: A complete list of instructor-led, hands-on courses and self-paced, eLearning courses.
•	Translation tools: Several online translation tools to help simplify migration tasks.
Acronym List

AD . . . . . . . . . aggregation device
AFI . . . . . . . . . Address Family Identifier
BGP . . . . . . . . . Border Gateway Protocol
BUM . . . . . . . . . broadcast, unknown unicast, and multicast
CapEx . . . . . . . . capital expenses
CE . . . . . . . . . customer edge
CLI . . . . . . . . . command-line interface
CSP . . . . . . . . . Control and Status Protocol
DCI . . . . . . . . . Data Center Interconnect
EVI . . . . . . . . . EVPN instance
FCoE . . . . . . . . Fibre Channel over Ethernet
FCS . . . . . . . . . Frame Check Sequence
FEC . . . . . . . . . forwarding equivalence class
GRE . . . . . . . . . generic routing encapsulation
GUI . . . . . . . . . graphical user interface
IBGP . . . . . . . . internal BGP
IGMP . . . . . . . . Internet Group Management Protocol
IGP . . . . . . . . . interior gateway protocol
IPv6 . . . . . . . . IP version 6
JNCP . . . . . . . . Juniper Networks Certification Program
LAG . . . . . . . . . link aggregation group
LSP . . . . . . . . . label-switched path
LSR . . . . . . . . . label-switching router
MAC . . . . . . . . . media access control
MC-LAG . . . . . . . Multichassis Link Aggregation
MP-BGP . . . . . . . Multiprotocol Border Gateway Protocol
MPLS . . . . . . . . Multiprotocol Label Switching
OpEx . . . . . . . . operating expenditures
OS . . . . . . . . . operating system
P . . . . . . . . . . provider
PE . . . . . . . . . provider edge
PHP . . . . . . . . . penultimate-hop popping
PIM-SM . . . . . . . Protocol Independent Multicast Sparse Mode
RID . . . . . . . . . router ID
RP . . . . . . . . . rendezvous point
RPT . . . . . . . . . rendezvous point tree
SD . . . . . . . . . satellite device
STP . . . . . . . . . Spanning Tree Protocol
VC . . . . . . . . . Virtual Chassis
VCF . . . . . . . . . Virtual Chassis Fabric
VM . . . . . . . . . virtual machine
VPN . . . . . . . . . virtual private network
VRF . . . . . . . . . VPN routing and forwarding
VTEP . . . . . . . . VXLAN Tunnel End Point
VXLAN . . . . . . . . Virtual eXtensible Local Area Network