Table of Contents

Section 2. Summary of Features ................................................ 22
2.1 Port MAC Features ......................................................... 22
2.2 Port Trunking Features .................................................... 24
2.3 Distributed Switching Architecture (DSA) Features ......................... 24
2.4 Quality of Service Features ............................................... 25
2.5 Policy Features ........................................................... 25
2.6 Bridging Features ......................................................... 26
2.7 Unicast Routing Features .................................................. 27
2.8 Traffic Policing Features ................................................. 28
2.9 Bandwidth Management Features ............................................. 28
2.10 Secure Control Technology (SCT) Features ................................. 29
2.11 Traffic Monitoring Features .............................................. 30
Section 4. Distributed Switching Architecture ................................. 44
Cascade Ports ................................................................. 44
Single-Target Destination in a Cascaded System ................................ 45
PCI Interface ................................................................. 58
Serial Management Interfaces (SMI) ............................................ 79
Two Wire Serial Interface (TWSI) .............................................. 90
Device Address Space .......................................................... 93
CPU MII/GMII/RGMII Port ....................................................... 94
Interrupts .................................................................... 98
General Purpose Pins (GPP) ................................................... 100
Section 7. CPU Traffic Management ............................................ 102
7.1 CPU Port Number .......................................................... 102
7.2 Packets to the CPU ....................................................... 102
7.3 Packets from the CPU ..................................................... 107
Section 9. Network Interfaces and Media Access Controllers (MACs) ........... 136
9.1 Tri-Speed Port Overview .................................................. 137
9.2 HyperG.Stack Port Overview ............................................... 138
9.3 HX and QX Ports Overview ................................................. 142
9.4 MAC Operation and Configuration .......................................... 144
9.5 Tri-Speed Ports Auto-Negotiation ......................................... 161
9.6 MAC MIB Counters ......................................................... 164
9.7 MAC Error Reporting ...................................................... 171
Section 12. IPv4 and IPv6 Unicast Routing .................................... 265
12.1 Unicast Routing Features ................................................ 265
12.2 Unicast Routing Overview ................................................ 265
12.3 Policy Engine Support of Unicast Routing ................................ 266
12.4 Bridge Engine Support for Unicast Routing ............................... 270
12.5 Router Engine Processing ................................................ 271
12.6 Routed Packet Header Modification ....................................... 275
12.7 Layer 3 Control Traffic to the CPU ...................................... 278
12.8 One-Armed Router Configuration .......................................... 279
Section 13. Port Trunking .................................................... 281
13.1 Port Trunk-ID Assignment ................................................ 281
13.2 Forwarding to a Single Trunk Destination ................................ 283
13.3 Forwarding of Multi-Destination Packets ................................. 285
13.4 Trunking over Cascade Link .............................................. 291
LED Interface Overview ....................................................... 318
LED Indications .............................................................. 319
LED Indication Groups ........................................................ 324
Other Indications ............................................................ 325
LED Stream ................................................................... 326
Appendix A. DSA Tag Formats .................................................. 333
A.1 Extended DSA Tag in TO_CPU Format ........................................ 333
A.2 Extended DSA Tag in FROM_CPU Format ...................................... 336
A.3 Extended DSA Tag in TO_ANALYZER Format ................................... 339
A.4 Extended DSA Tag in FORWARD Format ....................................... 341
Appendix B. CPU Codes ........................................................ 343
Appendix C. Register Set ..................................................... 374
C.1 Registers Overview ....................................................... 374
C.2 Global, TWSI Interface and CPU Port Configuration Registers .............. 376
C.3 GPP Configuration Registers .............................................. 390
C.4 PCI SDMA Registers ....................................................... 392
C.5 Master XSMI Interface Configuration Registers ............................ 400
C.6 Router Header Alteration Configuration Registers ......................... 404
C.7 Tri-Speed Ports MAC, CPU Port MAC, and SGMII Configuration Registers ..... 410
C.8 HyperG.Stack and HX/QX Ports MAC, Status, and MIB Counters, and XAUI Control Configuration Registers ..... 426
C.9 XAUI PHY Configuration Registers ......................................... 438
C.10 HX Port Registers ....................................................... 475
C.11 LEDs, Tri-Speed Ports MIB Counters, and Master SMI Configuration Registers ..... 507
C.12 PCI Registers ........................................................... 552
C.13 Policy Engine and Bridge Engine Configuration Registers ................. 560
C.14 Policers and Unicast Routing Engine Configuration Registers ............. 679
C.15 Pre-Egress Engine Configuration Registers ............................... 692
C.16 Egress, Transmit Queue and VLAN Configuration Registers and Tables ...... 711
C.17 Buffers Memory, Ingress MAC Errors Indications, and Egress Header Alteration Configuration Tables and Registers ..... 769
C.18 Buffers Management Registers ............................................ 777
C.19 Summary of Interrupt Registers .......................................... 790
List of Tables

Per Port Protocol Table Entry ................................................ 209
VLAN Entry Fields ............................................................ 217
Spanning Tree Port State Behavior ............................................ 220
FDB Address Table Entry ...................................................... 222
Additional Address Update Fields ............................................. 227
IEEE Reserved Multicast Addresses ............................................ 245
Cisco Proprietary L2 Protocols ............................................... 245
MLD Messages over ICMPv6 ..................................................... 247
Common IPv4/6 Link-Local Multicast Addresses ................................. 250
Host Counters ................................................................ 261
Matrix Source Destination Counters ........................................... 262
Ingress Port/VLAN/Device Counters per Counter-Set ............................ 262
Egress Counters per Counter-Set .............................................. 264
Routing PCL Rule Classification Key Fields ................................... 268
Policy Action Entry As a Route Entry ......................................... 269
Configuration Range of CIR and CBS ........................................... 299
Number of 256-Byte Buffers For Each Device ................................... 302
SDWRR vs. DWRR ............................................................... 307
Tri-Speed Ports and CPU Port Indication Classes Description .................. 319
HyperG.Stack Port Indication Classes Description ............................. 321
XAUI PHY LED Indications ..................................................... 322
Group Data Description ....................................................... 324
LED Interface 0 Ordered by Class ............................................. 326
LED Interface 1 Ordered by Class ............................................. 328
LED Interface 0 Ordered by Port .............................................. 329
LED Interface 1 Ordered by Port .............................................. 331
Extended TO_CPU DSA Tag ...................................................... 333
Extended FROM_CPU DSA Tag .................................................... 336
Extended TO_ANALYZER DSA Tag ................................................. 339
Extended FORWARD DSA Tag ..................................................... 341
CPU Codes .................................................................... 343
Standard Register Field Type Codes ........................................... 374
Valid Ports for Each Device .................................................. 375
Global, TWSI Interface and CPU Port Configuration Register Map Table ......... 376
GPP Configuration Register Map Table ......................................... 390
SDMA Register Map Table ...................................................... 392
Master XSMI Interface Register Map Table ..................................... 400
Router Header Alteration Configuration Registers Map Table ................... 404
Tri-Speed Ports MAC, CPU Port MAC and SGMII Configuration Registers Map Table ..... 410
HyperG.Stack and HX/QX Ports MAC and XAUI PHYs Configuration Register Map Table ..... 426
XAUI Register Map Table ...................................................... 438
Register Map Table for the HX Port Registers ................................. 475
LEDs, Tri-Speed Ports MIB Counters, and Master SMI Register Map Table ........ 507
PCI Registers Map Table ...................................................... 552
List of Figures

98DX106, 98DX163, 98DX166, 98DX243, and 98DX246 Top Level Block Diagram ...... 32
98DX130, 98DX250, 98DX260, 98DX262, 98DX270, and 98DX803 Top Level Block Diagram ..... 33
98DX107, 98DX167, and 98DX247 Top Level Block Diagram ........................ 34
98DX133, 98DX253, 98DX263, and 98DX273 Top Level Block Diagram ............... 35
98DX169 and 98DX249 Top Level Block Diagram .................................. 36
98DX269 Top Level Block Diagram .............................................. 37
SecureSmart and Layer 2+ Switches Ingress and Egress Processing Engines ...... 38
Multilayer Stackable and SecureSmart Stackable Switches Ingress and Egress Processing Engines ..... 39
Example of Single-Target Destination Forwarding in a Cascaded System ......... 45
Example of Multi-Destination Forwarding in a Cascaded System ................. 47
DSA Tag in the Ethernet Frame ................................................ 48
Host Management Interfaces: 98DX130, 98DX133, 98DX250, 98DX253, 98DX260, 98DX263, 98DX270, 98DX273, and 98DX803 ..... 57
Host Management Interfaces: 98DX106, 98DX107, 98DX163, 98DX166, 98DX167, 98DX169, 98DX243, 98DX246, 98DX247, 98DX249, 98DX262, and 98DX269 ..... 57
CPU Descriptors and Memory Buffers ........................................... 66
Serial ROM Data Structure .................................................... 91
TWSI Bus Transaction—External Master Write to a Device Register .............. 92
TWSI Bus Transaction—External Master Read from a Device Register ............. 92
Hierarchical Interrupt Scheme ................................................ 98
QoS Processing Walkthrough ................................................... 112
Port-Based QoS Marking Operation ............................................. 117
MAC-Address-Based QoS Marker Configuration ................................... 120
QoS Enforcement Walkthrough .................................................. 123
{TC, DP} Assignment Algorithm for Data Traffic ............................... 124
{TC, DP} Assignment for Control Packets ...................................... 125
{TC, DP} Assignment of Mirrored Packets ...................................... 125
DiffServ Domains Crossing Using a Single DSCP to DSCP Mutation Table ......... 134
Functional Block Diagram of Tri-Speed Port in 1000BASE-X Mode ................ 137
Functional Block Diagram of Tri-Speed Port in SGMII Mode ..................... 138
Functional Block Diagram of the HyperG.Stack Port ............................ 139
Functional Block Diagram of the HX/QX ........................................ 142
MAC Loopback Packet Walkthrough .............................................. 155
PCS Loopback Packet Walkthrough .............................................. 155
Analog Loopback Packet Walkthrough ........................................... 156
Repeater Loopback Packet Walkthrough ......................................... 156
Ingress Pipe Block Diagram for SecureSmart and Layer 2+ Stackable Switches ... 173
Ingress Pipe Block Diagram for Multilayer Stackable Switches ................. 174
Organization of the Policy TCAM .............................................. 176
Preface
This document describes the architecture and features of the Prestera-DX SecureSmart switches, Layer 2+ stackable switches, and Multilayer stackable switches. It also provides full register definitions for these devices.

All feature descriptions and specifications in this document refer to all of the following packet processors, unless otherwise specified: 98DX106, 98DX107, 98DX130, 98DX133, 98DX163, 98DX163R, 98DX166, 98DX167, 98DX243, 98DX246, 98DX247, 98DX249, 98DX250, 98DX253, 98DX260, 98DX262, 98DX263, 98DX269, 98DX270, 98DX273, and 98DX803.

In this document, any or all of these packet processors are referred to as “the device” or “the devices”. Wherever a section is relevant for only some of these devices, this is stated in the following way at the beginning of the section:

This section is relevant for the following devices:
• SecureSmart: 98DX262
• Layer 2+ Stackable: 98DX130, 98DX260, 98DX270, 98DX803
• Multilayer Stackable: 98DX133, 98DX263, 98DX273
• SecureSmart Stackable: 98DX169, 98DX249, 98DX269
Note that if the section is not relevant for only one or two of the devices, this is emphasized as follows:

This section is relevant for the following devices:
• SecureSmart: 98DX106, 98DX163, 98DX163R, 98DX243, 98DX262
• SecureSmart Stackable: 98DX169, 98DX249, 98DX269
• Layer 2+ Stackable: 98DX130, 98DX166, 98DX246, 98DX250, 98DX260, 98DX270
• Multilayer Stackable: 98DX107, 98DX133, 98DX167, 98DX247, 98DX253, 98DX263, 98DX273
Not relevant for: 98DX803
Document Organization
The sections in this specification are organized according to architectural and functional topics. Section 3. "Functional Overview" on page 31 provides a general description of the functional units in the device, and a packet walk-through description. Subsequent chapters focus on each of the device’s architectural and functional topics. Each chapter includes a description of the particular functional behavior, which is followed by the associated hardware register and table configurations. References to registers and table entries are hyperlinks to the corresponding register definition in the appendix of this document.
Related Documentation
The following documents contain additional information related to the Prestera® family chipset:
• RFC and IEEE standards (Table 605, “Referenced Standards,” on page 821)
• Prestera-DX Packet Processors Hardware Design Guide (Document Control # MV-S300644-00)
• 98DX130/133/250/253/260/263/270/273 Hardware Specifications (Document Control # MV-S102110-00)
• 98DX166/167/246/247 Hardware Specifications (Document Control # MV-S102727-00)
• 98DX803 Hardware Specifications (Document Control # MV-S103020-00)
• 98DX163/243 Hardware Specifications (Document Control # MV-S103374-00)
• 98DX106-BCW Hardware Specifications (Document Control # MV-S103473-00)
• 98DX106-LKJ Hardware Specifications (Document Control # MV-S103381-00)
• 98DX107-BCW Hardware Specifications (Document Control # MV-S102993-00)
• 98DX107-LKJ Hardware Specifications (Document Control # MV-S103560-00)
• 98DX262 Hardware Specifications (Document Control # MV-S103020-00)
• 98DX249 and 98DX269 Hardware Specifications (Document Control # MV-S103653-00)
• 98DX169, 98DX249, and 98DX269 Product Brief (Document Control # MV-S103614-00)
Document Conventions

The following name and usage conventions are used in this document:

Signal Range
A signal name followed by a range enclosed in brackets represents a range of logically related signals. The first number in the range indicates the most significant bit (MSb) and the last number indicates the least significant bit (LSb).
Example: CPU_TXD[7:0]

Active Low Signals (n)
An n symbol at the end of a signal name indicates that the signal’s active state occurs when the voltage is low.
Example: INTn

State Names
State names are indicated in italic font.
Example: linkfail

Register Naming Conventions
Register field names are indicated as follows: Example: The field in the Global Control register. A field name in blue font indicates a hyperlink.
Register field bits are enclosed in brackets. Example: Field [1:0]
Register addresses are represented in hexadecimal format. Example: 0x0
Reserved: The contents of the register are reserved for internal use only or for future use.
Glossary of Acronyms

The acronyms in Table 1 are used in Prestera documentation.

Table 1: Acronyms

AA       Aged Address
ACL      Access Control List
AF       DiffServ “Assured Forwarding” Per-Hop Behavior
ARP      Address Resolution Protocol
AU       Address Update
BE       Best Effort
BPDU     Bridge Protocol Data Unit
CBS      Committed Burst Size
CIDR     Classless Interdomain Routing
CIR      Committed Information Rate
CoS      Class of Service
CRC      Cyclic Redundancy Check
CS       DiffServ “Class Selector” Per-Hop Behavior
DA       Destination MAC Address
DF       IPv4 header “Don’t Fragment” field
DIP      Destination IP Address
DP       Drop Precedence
DS       Differentiated Service
DSA      Distributed Switching Architecture
DSAP     IEEE 802.2 Destination Service Access Point
DSCP     DiffServ Codepoint
ECMP     Equal/Weighted Cost Multipath
EF       DiffServ “Expedited Forwarding” Per-Hop Behavior
FDB      Forwarding Database
GVRP     GARP VLAN Registration Protocol
ICMP     Internet Control Message Protocol
IVL      Independent VLAN Learning
LPM      Longest Prefix Match
MAC      Media Access Control
MF flag  IPv4 header “More Fragments” flag
MLD      Multicast Listener Discovery
MPPS     Million packets per second
MST      Multiple Spanning Tree
MTU      Maximum Transmission Unit
NA       “New Address” Address Update message
PACL     Port-based ACL
PCE      Policy Control Entry
PCL      Policy Control List
PHB      Per-Hop Behavior
PVID     Port VLAN-ID
QA       “Query Address” Address Update message
QoS      Quality of Service
QR       “Query Reply” Address Update message
RS       Reconciliation Sublayer
SA       Source MAC Address
SDWRR    Shaped Deficit Weighted Round Robin
SIP      Source IP Address
SLA      Service-Level Agreement
SP       Strict Priority
SSAP     IEEE 802.2 Source Service Access Point
SST      Single Spanning Tree
STP      Spanning Tree Protocol
SVL      Shared VLAN Learning
TA       “Transplanted Address” Address Update message
TC       Traffic Class
TOS      IPv4 header “Type of Service” field
TTL      IPv4 header “Time to Live” field
UP       User Priority
VACL     VLAN-based ACL
VID      VLAN Identification
VIDX     Multicast group index
VLAN     Virtual Local Area Network
VLSM     Variable-Length Subnet Masking
WRR      Weighted Round Robin
XAUI     10 Gigabit Attachment Unit Interface
Section 1. Product Overview

1.1 Product Family Overview
The Marvell® Prestera®-DX family of packet processors delivers the optimal desktop switching solution for Enterprise (desktop and stackable) and Small-to-Medium Size Business (SMB) networks. This functional specification describes three families of Prestera-DX devices:
• SecureSmart switches
• Layer 2+ stackable switches
• Multilayer stackable switches
1.1.1 Prestera-DX SecureSmart Switches
The Prestera-DX SecureSmart switches are targeted at the SMB market. They integrate Gigabit Ethernet ports with integrated SERDES (serializer-deserializer), as well as HyperG.Stack ports with XAUI transceivers, a Layer 2+ Switching engine, a Layer 2 through Layer 4 Policy engine, MII/GMII/RGMII Ethernet port for management, and on-chip buffer memory. These complete system-on-a-chip (SoC) packet processors provide support for line-rate Layer 2 bridging with 128-byte deep packet inspection Policy Control List and full IEEE 802.1p and DiffServ QoS Support.
The Host CPU management interface of these devices is an MII/GMII/RGMII Ethernet port for packet forwarding and a Slave SMI Interface for address-mapped entities access. These devices do not support IPv4/IPv6 Unicast routing.
The Prestera-DX SecureSmart family of switches consists of the following devices:
• 98DX106: 10 Tri-Speed Ports SecureSmart switch
• 98DX163/98DX163R: 16 Tri-Speed Ports SecureSmart switch
• 98DX243: 24 Tri-Speed Ports SecureSmart switch
• 98DX262: 24 Tri-Speed Ports + 2 HyperG.Stack Ports SecureSmart switch
98DX106, 98DX163, 98DX163R, and 98DX243
Apart from their port configurations, the 98DX106, 98DX163, 98DX163R, and 98DX243 devices are:
• Footprint compatible
• Features compatible
• Software compatible
• Footprint compatible with the 98DX160, 98DX240, 98DX162, 98DX242, 98DX166, 98DX246, 98DX107, 98DX167, and 98DX247 devices
98DX262
Apart from its pin configuration, the 98DX262 is footprint compatible with the 98DX250, 98DX260, 98DX270, 98DX803, 98DX253, 98DX263, and 98DX273 devices.
1.1.2 Prestera-DX SecureSmart Stackable Switches

The Prestera-DX SecureSmart Stackable switches are targeted at the SMB market. They integrate Gigabit Ethernet ports with integrated SERDES (serializer-deserializer), as well as HyperG.Stack ports with XAUI transceivers and HX/QX ports with integrated SERDES, a Layer 2+ switching engine, an IPv4/IPv6 Unicast routing engine, a Layer 2 through Layer 4 Policy engine, an MII/GMII/RGMII Ethernet port for management, and on-chip buffer memory. These complete system-on-a-chip (SoC) packet processors provide support for line-rate Layer 2 bridging with a 128-byte deep packet inspection Policy Control List and full IEEE 802.1p and DiffServ QoS support. The HX/QX ports provide cost-effective stacking solutions, ideal for the SMB market, by utilizing low-cost HDMI or SATA cables. The Host CPU management interface of these devices is an MII/GMII/RGMII Ethernet port for packet forwarding and a Slave SMI interface for address-mapped entities access. These devices are stackable in systems of up to 32 devices.
The Prestera-DX SecureSmart Stackable family of switches consists of the following devices:
• 98DX169: 16 Tri-Speed Ports + 2 HX/QX ports SecureSmart Stackable switch with IPv4/IPv6 Unicast Routing capabilities
• 98DX249: 24 Tri-Speed Ports + 2 HX/QX ports SecureSmart Stackable switch
• 98DX269: 24 Tri-Speed Ports + 2 HX/QX ports + 1 HyperG.Stack port, or 24 Tri-Speed Ports + 1 HX/QX port + 2 HyperG.Stack ports, SecureSmart Stackable switch with IPv4/IPv6 Unicast Routing capabilities

98DX169, 98DX249, and 98DX269
Apart from their port configurations, the 98DX169, 98DX249, and 98DX269 devices are:
• Features compatible
• Software compatible
Apart from its port configuration, the 98DX269 is footprint compatible with the 98DX250, 98DX260, 98DX262, and 98DX270.
1.1.3 Prestera-DX Layer 2+ Stackable Switches
The Prestera-DX Layer 2+ stackable switches are targeted at the Layer 2+ stackable market. They integrate Gigabit Ethernet ports with integrated SERDES (serializer-deserializer), as well as HyperG.Stack ports with XAUI transceivers, a Layer 2+ switching engine, a Layer 2 through Layer 4 Policy engine, a PCI or MII/GMII/RGMII Ethernet port for management, and on-chip buffer memory. These complete system-on-a-chip packet processors provide support for line-rate Layer 2 bridging with a 128-byte deep packet inspection Policy Control List and full IEEE 802.1p and DiffServ QoS support. The Host CPU management interface of these devices is a PCI interface, or an MII/GMII/RGMII Ethernet port for packet forwarding together with a Slave SMI interface for address-mapped entities access. These devices are stackable in systems of up to 32 devices; they do not support IPv4/IPv6 Unicast routing.
Note: The 98DX166 and 98DX246 do not incorporate a PCI interface for management. Like the SecureSmart switches, their management interface is an MII/GMII/RGMII Ethernet port for packet forwarding and a Slave SMI interface for address-mapped entities access.
98DX166 and 98DX246
Apart from their port configurations, the 98DX166 and 98DX246 devices are:
• Footprint compatible
• Features compatible
• Software compatible
• Footprint compatible with the 98DX160, 98DX240, 98DX162, 98DX242, 98DX163, 98DX243, 98DX107, 98DX167, and 98DX247 devices
98DX130, 98DX250, 98DX260, 98DX270, and 98DX803
Apart from their port configurations, the 98DX130, 98DX250, 98DX260, 98DX270, and 98DX803 devices are:
• Footprint compatible
• Features compatible
• Software compatible
• Footprint compatible with the 98DX262, 98DX253, 98DX263, and 98DX273 devices
1.1.4 Prestera-DX Multilayer Stackable Switches
The Prestera-DX Multilayer stackable switches are targeted at the stackable edge router market. They integrate Gigabit Ethernet ports with integrated SERDES (serializer-deserializer), as well as HyperG.Stack ports with XAUI transceivers, a Layer 2+ switching engine, an IPv4/IPv6 Unicast Routing engine, a Layer 2 through Layer 4 Policy engine, a PCI or MII/GMII/RGMII Ethernet port for management, and on-chip buffer memory. These complete system-on-a-chip (SoC) packet processors provide support for line-rate Layer 2 bridging with a 128-byte deep packet inspection Policy Control List and full IEEE 802.1p and DiffServ QoS support. The Host CPU management interface of these devices is a PCI interface, or an MII/GMII/RGMII Ethernet port for packet forwarding together with a Slave SMI interface for address-mapped entities access. These devices are stackable in systems of up to 32 devices, and they support IPv4/IPv6 Unicast routing.
Note: The 98DX107, 98DX167, and 98DX247 do not incorporate a PCI interface for management. Like the SecureSmart switches, their management interface is an MII/GMII/RGMII Ethernet port for packet forwarding and a Slave SMI interface for address-mapped entities access.
98DX107, 98DX167, and 98DX247
Apart from their port configurations, the 98DX107, 98DX167, and 98DX247 devices are:
• Footprint compatible
• Features compatible
• Software compatible
• Footprint compatible with the 98DX160, 98DX240, 98DX162, 98DX242, 98DX106, 98DX163, 98DX243, 98DX166, and 98DX246 devices
98DX133, 98DX253, 98DX263, and 98DX273
Apart from their port configurations, the 98DX133, 98DX253, 98DX263, and 98DX273 devices are:
• Footprint compatible
• Features compatible
• Software compatible
• Footprint compatible with the 98DX262, 98DX250, 98DX260, and 98DX270 devices
1.2 Prestera Software Suite
The Prestera Software Suite (PSS) is a set of comprehensive, production-quality drivers for managing a Prestera-based system. The Prestera Software Suite serves as a foundation for customer-developed applications, such as IEEE 802.1 bridging services, IPv4/IPv6 routing, Policy Control Lists, Traffic Conditioning, and Quality of Service. Based on a modular architecture and comprehensive APIs, the Prestera Software Suite enables software developers to integrate high-level applications with minimal effort, without requiring register-level knowledge of the Prestera chipset registers and tables. The software is written in ANSI-C and is OS and CPU independent for easy porting. See the Prestera Software Suite User Guide for additional information.
Section 2. Summary of Features
Detailed feature definition and configuration descriptions can be found in the associated sections of this document.
2.1 Port MAC Features
The device incorporates 10/12/16/24 independent 10/100/1000 Mbps Ethernet MACs with integrated 1.25 Gbps SERDES. In addition, the following devices incorporate independent HyperG.Stack MACs with integrated XAUI transceivers, or HX/QX MACs with integrated SERDES:
• 1 HyperG.Stack port: 98DX130, 98DX133
• 2 HyperG.Stack ports: 98DX260, 98DX262, 98DX263
• 3 HyperG.Stack ports: 98DX270, 98DX273, 98DX803
• 2 HX/QX ports: 98DX169, 98DX249
• 2 HX/QX ports and 1 HyperG.Stack port, or 1 HX/QX port and 2 HyperG.Stack ports: 98DX269

The MAC port features include:
• 10/100/1000 Mbps Ethernet MAC:
  – Integrated SGMII interface on all 10/12/16/24 tri-speed ports. SGMII is a serialized version of the IEEE 802.3 GMII interface, which supports a triple-speed MAC (1000/100/10 Mbps) using only four I/Os per port.
  – IEEE 802.3x Flow Control support on full-duplex links and back-pressure Flow Control on half-duplex links.
  – Two IEEE 802.3 Clause 22 compliant master SMI interfaces for external PHY management and Auto-Negotiation.
  – Support for manual or automatic setting of link, speed, duplex, and IEEE 802.3x Flow Control.
  – Support for Automatic Media Select when connected to an 88E1112 Alaska® PHY, without CPU intervention.
  – Support for Virtual Cable Tester® (VCT) technology, using the Alaska transceiver.
  – Support for 1000BASE-X for fiber and backplane applications.
  – Support for pre-emphasis on the serial driver.
  – Support for Ethernet-like and RMON EtherStats counters.
  – Support for Jumbo frames of up to 10 KB.
• HyperG.Stack MAC:
  – The HyperG.Stack port integrates a XAUI transceiver using 16 I/Os, incorporating four synchronized lanes that deliver bi-directional point-to-point data transmission of 3.125 Gbps or 3.75 Gbps per lane.
  – IEEE 802.3ae XAUI-compliant quad 3.125 Gbps/lane.
  – Support for pre-emphasis on the serial driver.
  – Exceeds IEEE 802.3ae jitter requirements in 10 Gbps applications.
  – On-chip 50 Ohm serial receiver termination.
  – Three IPG modes—LAN mode, Fixed mode, and WAN mode.
  – IEEE 802.3x Flow Control support.
  – Packet-level Flow Control support via digital pins.
  – IEEE 802.3 Clause 45 compliant master XSMI interface for configuration of the HyperG.Stack MACs and of external XFP or XENPAK PHYs.
  – Per-port IEEE 802.3 Clause 45 compliant slave XSMI interface for port configuration.
  – Two preamble modes—Standard and Enhanced.
  – Support for Ethernet-like and RMON EtherStats counters.
  – Support for Jumbo frames of up to 10 KB.
• HX/QX MAC:
  – The HX/QX port integrates two (HX) or one (QX) SERDES lanes using four (HX) or two (QX) I/Os, delivering bi-directional point-to-point data transmission of 3.125 Gbps per lane. A QX port uses a single 3.125 Gbps SERDES lane for 2.5 Gbps throughput; an HX port uses two 3.125 Gbps SERDES lanes for 5 Gbps throughput.
  – Support for pre-emphasis on the serial driver.
  – Exceeds IEEE 802.3ae jitter requirements in 10 Gbps applications.
  – On-chip 50 Ohm serial receiver termination.
  – IEEE 802.3x Flow Control support.
  – Packet-level Flow Control support via digital pins.
  – Support for Ethernet-like and RMON EtherStats counters.
  – Support for Jumbo frames of up to 10 KB.

The port MAC features and configuration are described in detail in Section 9. "Network Interfaces and Media Access Controllers (MACs)" on page 136.
2.2 Port Trunking Features
Port trunking (also known as link aggregation) allows multiple physical ports to function as a single high-bandwidth logical port between the device and other switching devices or end-stations. The device’s port trunking support is compliant with the IEEE 802.3ad Link Aggregation standard.
The following port trunk features are supported by the device:
• Support for 127 trunk groups. (Among the SecureSmart devices, the 98DX163, 98DX243, and 98DX262 support 32 trunk groups, and the 98DX106 supports 8 trunk groups.)
• Each trunk group can be configured with up to eight port members. The Marvell Distributed Switching Architecture (DSA) enables the trunk group members to reside on any device in the system.
• Unicast and Multicast packets are load-balanced among the trunk group port members using either:
  – A hash function based on the packet’s L2, L3, and/or L4 header fields
  – The ingress port number or Trunk-ID
Port trunking features and configuration are described in detail in Section 13. "Port Trunking" on page 281.
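For a concrete picture of the hash-based distribution, the following C sketch folds selected header fields into a member index. The field selection, hash function, and modulo fold shown here are illustrative assumptions only; the device's actual hash is specified in Section 13.

    #include <stdint.h>

    /* Hypothetical trunk-member selection for illustration: fold selected
     * L2/L3/L4 header fields into a hash, then map the hash onto the
     * trunk's configured members (up to eight). */
    typedef struct {
        uint8_t  mac_da[6], mac_sa[6];   /* L2 fields */
        uint32_t sip, dip;               /* L3 fields */
        uint16_t sport, dport;           /* L4 fields */
    } flow_key_t;

    static unsigned trunk_member(const flow_key_t *k, unsigned num_members)
    {
        uint32_t h = 0;
        for (int i = 0; i < 6; i++)
            h = h * 31 + (uint32_t)(k->mac_da[i] ^ k->mac_sa[i]);
        h ^= k->sip ^ k->dip;
        h ^= ((uint32_t)k->sport << 16) | k->dport;
        return h % num_members;          /* num_members is 1..8 */
    }

Because all packets of a given flow carry the same header fields, they hash to the same trunk member; this preserves per-flow packet ordering while spreading distinct flows across the members.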
2.3 Distributed Switching Architecture (DSA) Features
The device supports DSA, which allows multiple devices to be cascaded through any of its Ethernet MAC port interfaces with other devices in these three families, or with any Marvell device that supports DSA tag cascading (e.g., the 98DX240). The cascade port can be a single MAC port or a trunk group consisting of several MAC ports on the device. Up to 32 devices can be cascaded to create a single cascaded system. Any cascade topology (e.g., chain, ring, or mesh) is supported.
A cascaded system of devices in these three families supports the same features as a non-cascaded single device in these three families. This includes:
• Trunk groups with port members on multiple devices in the system.
• Mirroring to an analyzer port on any device in the system.
• Traffic to the CPU can be sent through any device in the system.
• The CPU can inject traffic to be transmitted through a port on any device in the system.
• A source-ID-based egress filtering mechanism may be used to prevent loops in the forwarding topology.
Cascaded system features and configuration are described in detail in Section 4. "Distributed Switching Architecture" on page 44.
2.4 Quality of Service Features

The device provides extensive Layer-2 and Layer-3 Quality of Service (QoS) capabilities, allowing it to support IEEE 802.1p and IETF DiffServ requirements. These device QoS features include:
• 72 global QoS Profiles.
• A QoS Profile determines the packet’s traffic class, drop precedence, user priority, and DSCP:
  – Eight traffic class assignments, for segregation into egress queues. (SecureSmart and SecureSmart Stackable devices have four traffic classes for network ports and eight queues for the CPU port.)
  – Two drop precedence level assignments, for tail-dropping on congested egress queues.
• QoS initial marking mechanisms: Port-based, Protocol-based, Policy-based, or FDB-based.
• Layer-2 and/or Layer-3 QoS Trusted Port modes: the packet’s User Priority or DSCP is mapped to a QoS profile.
• On egress, optional QoS marking of the packet’s user priority and/or DSCP: setting the packet header IEEE 802.1p User Priority and/or DSCP QoS fields.
• Optional DSCP mutation for crossing of DiffServ domains.
• Policing: out-of-profile packets may be QoS-remarked or dropped.

The Quality of Service features and configuration are described in detail in Section 8. "Quality of Service (QoS)" on page 110.
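The QoS profile indirection can be modeled as a small table lookup. The following C sketch is a behavioral illustration only; the table layout and the trusted-port mapping arrays are assumptions, not the device's register-level format (see Section 8 for the actual mechanism).

    #include <stdint.h>

    /* Behavioral model of the 72 global QoS Profiles: each profile binds
     * the four QoS attributes assigned to a packet. */
    typedef struct {
        uint8_t tc;    /* traffic class: selects the egress queue        */
        uint8_t dp;    /* drop precedence for tail-dropping              */
        uint8_t up;    /* IEEE 802.1p user priority, for optional remark */
        uint8_t dscp;  /* DiffServ codepoint, for optional remark        */
    } qos_profile_t;

    static qos_profile_t qos_profile_table[72];

    /* Trusted-port mappings: an L2-trusted port maps the packet's user
     * priority (0..7) to a profile; an L3-trusted port maps its DSCP
     * (0..63). */
    static uint8_t up_to_profile[8];
    static uint8_t dscp_to_profile[64];

    static qos_profile_t classify(int l3_trusted, uint8_t up, uint8_t dscp)
    {
        uint8_t idx = l3_trusted ? dscp_to_profile[dscp & 0x3F]
                                 : up_to_profile[up & 0x7];
        return qos_profile_table[idx];
    }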
2.5 Policy Features
The device incorporates an on-chip line-rate ingress Policy engine. The Policy engine is suited for supporting port or VLAN access control lists (ACLs/VACLs), policy-based QoS, VLANs, mirroring or trapping to the CPU, or switching. The Policy engine features include the following:
• Inspection of the first 128 bytes of the packet.
• Up to 1024 policy rules where each rule key is 24 bytes, or 512 policy rules where each rule key is 48 bytes, or a combination of 24-byte and 48-byte rules. (SecureSmart and SecureSmart Stackable devices: up to 256 policy rules where each rule key is 24 bytes, or 128 policy rules where each rule key is 48 bytes, or a combination of 24-byte and 48-byte rules.)
• Each rule key has per-bit masking capability.
• The key consists of well-known fixed Layer-2/3/4 fields, as well as user-defined fields.
• Rules are associated with a policy-ID. Packets are assigned a policy-ID based on the source port/trunk number, or on the packet’s VID.
• Support for two policy searches per packet, each with a separate policy-ID assignment.
• Rule match counters—a rule action can be bound to one of 32 global rule-match counters.
• The policy actions include the following features:
  – Accept/Deny
  – Trap/Mirror to CPU
  – Mirror to analyzer port
  – Assign QoS attributes
  – VLAN assignment or VLAN translation
  – Redirect to a target destination
  – Bind to one of 256 Policers (SecureSmart and SecureSmart Stackable devices support four Policers per port)
The Policy features and configuration are described in detail in Section 10. "Ingress Policy Engine" on page 172.
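A software model of the per-bit-masked rule match may help clarify the TCAM semantics. In the sketch below, the 24-byte key width matches the short rule format described above; the struct layout, ordering rule (lowest index wins), and action encoding are assumptions for illustration, not the hardware implementation.

    #include <stdint.h>
    #include <stddef.h>

    #define KEY_BYTES 24   /* short-key rule; 48-byte rules work the same way */

    /* A mask bit of 1 means "compare this bit"; 0 means "don't care". */
    typedef struct {
        uint8_t value[KEY_BYTES];
        uint8_t mask[KEY_BYTES];
        int     action;              /* e.g., accept/deny/trap/redirect */
    } policy_rule_t;

    static int first_match(const policy_rule_t *rules, size_t n,
                           const uint8_t key[KEY_BYTES])
    {
        for (size_t i = 0; i < n; i++) {      /* TCAM-like: first hit wins */
            size_t b;
            for (b = 0; b < KEY_BYTES; b++)
                if ((key[b] ^ rules[i].value[b]) & rules[i].mask[b])
                    break;                    /* a compared bit mismatched */
            if (b == KEY_BYTES)
                return rules[i].action;
        }
        return -1;                            /* no match: default action */
    }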
2.6 Bridging Features
The device supports wire-speed 802.1D/Q bridging, together with many additional bridging feature enhancements. The device’s bridging features include:
• 16K-entry Forwarding Database (FDB). (SecureSmart devices, SecureSmart Stackable devices, and the 98DX107: 8K-entry FDB):
  – Automatic and CPU-controlled learning and aging modes.
  – New Source Addresses can be dropped, trapped, or forwarded. This is an important security hook for IEEE 802.1X Port-Based Access Control and for the proprietary extension MAC-based access control.
  – Independent and Shared VLAN Learning.
  – CPU-triggered delete of entries by VLAN and/or port/trunk.
  – MAC-based filtering, trapping, mirroring to CPU, or mirroring to analyzer port.
  – Address transplanting from an old device, port, or trunk to a new device, port, or trunk. This is an important hook for efficient implementation of IEEE 802.1w Rapid Reconfiguration.
  – Address Update messages to/from the CPU for FDB management.
• IPv4/6 Multicast bridging based on the packet (Source-IP, Group-IP, VLAN-ID).
• VLANs:
  – 4K-entry VLAN table. (SecureSmart and SecureSmart Stackable devices have 256 active VLANs.)
  – Port, Protocol, and Policy-based VLAN assignment mechanisms.
  – Nested VLAN support for Provider Bridging.
  – VLAN ingress and egress filtering.
• 4K-entry Multicast Group table. (SecureSmart and SecureSmart Stackable devices have a 256-entry Multicast Group table.)
• Support for single spanning tree and multiple spanning tree with up to 256 spanning tree groups. (SecureSmart and SecureSmart Stackable devices do not support Multiple Spanning Tree.)
• Private VLAN Edge for secure forwarding to an uplink port.
• Trapping/Mirroring of well-known control protocols.
• Trapping/Mirroring/Dropping of Unknown or Unregistered packets.
• Rate limiting of Known and Unknown Unicast, Multicast, and Broadcast packets.
• Counters:
  – RMON 1 Host Group and Matrix Group counters.
  – Port/VLAN/Device ingress counters.
  – Port/VLAN/drop precedence/traffic class egress counters.
The Bridge features and configuration are described in detail in Section 11. "Bridge Engine" on page 203.
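The FDB lookup key can be pictured as follows: under Independent VLAN Learning (IVL) the VID is part of the key, while under Shared VLAN Learning (SVL) it is not. The C sketch below is a behavioral model only; the hash function, table layout, and single-slot lookup are assumptions (the real FDB organization is described in Section 11).

    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define FDB_SIZE 16384   /* 16K entries; 8K on the smaller devices */

    typedef struct {
        uint8_t  mac[6];
        uint16_t vid;
        uint16_t port;       /* target {device, port}, abbreviated here */
        bool     valid;
    } fdb_entry_t;

    static fdb_entry_t fdb[FDB_SIZE];

    static unsigned fdb_hash(const uint8_t mac[6], uint16_t vid, bool ivl)
    {
        uint32_t h = ivl ? vid : 0;   /* SVL ignores the VID */
        for (int i = 0; i < 6; i++)
            h = (h * 33) ^ mac[i];
        return h % FDB_SIZE;
    }

    /* Returns the egress port, or -1 to flood the packet as unknown unicast. */
    static int fdb_lookup(const uint8_t da[6], uint16_t vid, bool ivl)
    {
        const fdb_entry_t *e = &fdb[fdb_hash(da, vid, ivl)];
        if (e->valid && memcmp(e->mac, da, 6) == 0 && (!ivl || e->vid == vid))
            return e->port;
        return -1;
    }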
2.7 Unicast Routing Features

This section is relevant for the following devices:
• Multilayer Stackable: 98DX107, 98DX133, 98DX167, 98DX247, 98DX253, 98DX263, 98DX273
• SecureSmart Stackable: 98DX169, 98DX249, 98DX269
Not relevant for the SecureSmart or Layer 2+ Stackable devices.
The device supports the following Unicast routing features:
• Per-port and per-VLAN enabling of IPv4 and IPv6 Unicast routing.
• Policy-based IPv4/v6 routing lookup.
• Up to 1K prefix/host entries and 1K ARP MAC addresses. (SecureSmart Stackable devices support up to 32 static IPv4 prefix/host entries and 256 ARP MAC addresses.)
• Next-hop forwarding to any {device, port}, trunk, or VLAN group in the system.
• Per-route-entry QoS assignment.
• Per-route-entry mirroring to the CPU or mirroring to the Ingress Analyzer port.
• Router exception checking:
  – IPv4/v6 Header Error
  – TTL/Hop Limit Exceeded
  – Options
• Routed packet modifications:
  – MAC SA assignment based on port or VLAN
  – IPv4 TTL and IPv6 Hop Limit decrement
  – IPv4 Checksum update
• Support for Layer 3 control traffic:
  – RIPv1
  – IPv4/v6 control protocols running over link-layer Multicast, e.g., RIPv2, OSPFv2
  – UDP Relay
• Egress mirroring of routed packets to an Analyzer port.
The Unicast Routing features and configuration are described in detail in Section 12. "IPv4 and IPv6 Unicast Routing" on page 265.
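The header modifications listed above are standard router behavior; for instance, the TTL decrement and checksum update can be done incrementally rather than by recomputing the checksum over the whole header. The C sketch below shows the well-known RFC 1141-style update on a host-byte-order checksum field; it illustrates the arithmetic only and is not the device's implementation (RFC 1624 refines the corner case where the folded sum reaches 0xFFFF).

    #include <stdint.h>

    /* Decrement the IPv4 TTL and incrementally update the header checksum.
     * TTL occupies the high byte of its 16-bit header word, so decrementing
     * it by one raises the one's-complement checksum by 0x0100. */
    static void ipv4_decrement_ttl(uint8_t *ttl, uint16_t *checksum)
    {
        uint32_t sum;

        (*ttl)--;                      /* caller drops the packet if TTL <= 1 */
        sum = (uint32_t)*checksum + 0x0100;
        *checksum = (uint16_t)(sum + (sum >> 16));  /* fold end-around carry */
    }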
2.8 Traffic Policing Features
The device supports 256 on-chip wire-speed ingress traffic policers. (SecureSmart and SecureSmart Stackable devices support four per-port on-chip wire-speed ingress traffic policers.)
Each Policer supports the following features:
• Single meter, configurable with a maximum rate and burst size:
  – Rates range from a minimum of 1 Kbps to a maximum of 100 Gbps, with six levels of granularity, from a minimum granularity of 1 Kbps for rates under 1 Mbps to a maximum granularity of 100 Mbps for rates up to 100 Gbps.
  – A large burst size supports temporal bursts without impacting TCP’s sliding-window algorithm.
• Color-aware and color-unaware operational modes.
• Out-of-profile packets are either remarked with QoS or dropped. QoS remarking is based either on explicit QoS assignment or mapped according to the incoming DSCP.
• Conformance counters:
  – 16 global conformance counter sets. Each set counts in-profile and out-of-profile packets.
  – Each policer may be bound to a conformance counter set.
The Policing features and configuration are described in detail in Section 14. "Ingress Traffic Policing Engine" on page 292.
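A single-rate meter of this kind is commonly modeled as a token bucket: tokens accumulate at the committed information rate (CIR) up to the committed burst size (CBS), and a packet conforms if enough tokens are available. The C model below is purely behavioral; the field names and nanosecond bookkeeping are assumptions, and the device's actual rate granularity is configured through the registers described in Section 14.

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint64_t cir_bps;     /* committed information rate, bits/s  */
        uint64_t cbs_bytes;   /* committed burst size (bucket depth) */
        uint64_t tokens;      /* current bucket fill, in bytes       */
        uint64_t last_ns;     /* timestamp of the previous refill    */
    } policer_t;

    /* Returns true if the packet is in-profile. Out-of-profile packets are
     * then either dropped or QoS-remarked, as configured. */
    static bool policer_conforms(policer_t *p, uint32_t pkt_bytes,
                                 uint64_t now_ns)
    {
        /* Refill: double precision is fine for a model; hardware would use
         * fixed-point token updates. */
        uint64_t added = (uint64_t)((double)(now_ns - p->last_ns)
                                    * (double)p->cir_bps / 8e9);
        p->last_ns = now_ns;
        p->tokens  = p->tokens + added > p->cbs_bytes ? p->cbs_bytes
                                                      : p->tokens + added;
        if (p->tokens < pkt_bytes)
            return false;                /* out of profile */
        p->tokens -= pkt_bytes;
        return true;                     /* in profile */
    }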
2.9 Bandwidth Management Features
The device provides the bandwidth management features required for QoS (lossy) systems and for flow-control (lossless) systems. These features include:
• Ingress bandwidth management using flow control with XOFF/XON buffer limits.
• Eight egress traffic class queues per port (including the CPU port). (SecureSmart and SecureSmart Stackable devices have four traffic classes for the network ports and eight traffic classes for the CPU.)
• Egress tail-dropping, for congestion avoidance:
  – Based on the queued-buffers limit and queued-packets limit.
  – Two levels of drop precedence for color-aware tail-dropping.
• Egress queue scheduling algorithms:
  – Shaped Deficit Weighted Round Robin (SDWRR), for minimum bandwidth assignment.
  – Strict Priority (SP), for low-latency scheduling of high-priority traffic.
  – Hybrid scheduling of both SP and SDWRR queues.
• Egress per-port and per-queue shaping, for limiting the maximum bandwidth:
  – Byte-based shaping rates ranging from 64 Kbps to 12 Gbps, with a granularity of under 64 Kbps.
The bandwidth management features and configuration are described in detail in the Section 15. "Bandwidth Management" on page 302.
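The hybrid scheduling of SP and SDWRR queues can be sketched as a two-pass decision: strict-priority queues are always served first, and the remaining queues share the residual bandwidth by deficit-weighted round robin. The C sketch below is a simplified behavioral model (a real DWRR implementation keeps an active list and serves a queue while its deficit lasts); the quantum size and queue layout are assumptions for illustration.

    #include <stdint.h>
    #include <stdbool.h>

    #define NUM_QUEUES 8

    typedef struct {
        bool     strict;     /* SP queue: served before all DWRR queues */
        uint32_t weight;     /* DWRR weight (quantum multiplier)        */
        int32_t  deficit;    /* DWRR deficit counter, in bytes          */
        uint32_t head_len;   /* head-of-line packet length; 0 if empty  */
    } queue_t;

    /* Returns the queue to transmit from next, or -1 if nothing is eligible. */
    static int pick_queue(queue_t q[NUM_QUEUES])
    {
        /* Pass 1: strict priority, highest traffic class first. */
        for (int i = NUM_QUEUES - 1; i >= 0; i--)
            if (q[i].strict && q[i].head_len)
                return i;

        /* Pass 2: DWRR among the remaining queues. */
        for (int i = NUM_QUEUES - 1; i >= 0; i--) {
            if (q[i].strict || !q[i].head_len)
                continue;
            q[i].deficit += (int32_t)(q[i].weight * 256);  /* add quantum */
            if (q[i].deficit >= (int32_t)q[i].head_len) {
                q[i].deficit -= (int32_t)q[i].head_len;
                return i;
            }
        }
        return -1;
    }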
In managed systems, it is critical that the CPU receive only traffic that requires software processing. Unwanted traffic unnecessarily burdens the CPU and delays handling of other traffic that requires processing. Furthermore, traffic that is sent to the CPU must be properly prioritized into separate queues. This allows the CPU to process high-priority traffic with minimum delay, even when overloaded with low-priority traffic. The device provides Secure Control Technology for both selecting traffic to be sent to the CPU, as well as prioritizing and managing the bandwidth of traffic sent to the CPU.
• 8 Traffic Class CPU queues: same queueing, scheduling algorithms, and shaping as the non-CPU-port queues; see Section 7. "CPU Traffic Management" on page 102.
• For each packet type trapped or mirrored to the CPU, the user can configure the following packet attributes:
  – Traffic Class
  – Drop Precedence
  – CPU destination device
  – Packet truncation to 128 bytes
  – Statistical dropping
• Explicit mechanisms to trap or mirror well-known Multicast and Broadcast control packets to the CPU:
  – ARP Request
  – IPv4 IGMPv1/2/3
  – IPv6 MLDv1/2
  – IPv6 Neighbor Discovery
  – IP Broadcast
  – Spanning Tree BPDU
  – Other IEEE reserved Multicast packets (e.g., GVRP, LACP, PAE)
  – Cisco Layer 2 Multicast control packets
  – Unicast MAC-to-me packets
The SCT features and configuration are described in detail in Section 7. "CPU Traffic Management" on page 102.
The device's physical management interface may be the PCI interface, for both packet Rx/Tx and register access, or an MII/GMII/RGMII port for packet Rx/Tx together with the SMI interface for register access.
2.11 Traffic Monitoring Features
• Ingress and/or egress port packet sampling, compliant with RFC 3176 (sFlow: A Method for Monitoring Traffic in Switched and Routed Networks):
  – Packets may be truncated to 128 bytes and sent to any CPU in the system.
  – Sampling to the CPU is independent of ingress/egress packet mirroring to the analyzer port.
• Mirroring to Analyzer Port:
  – Independent ingress and egress analyzer port configuration.
  – Ingress mirroring enabled per port, policy rule action, VLAN, and/or FDB entry.
  – Egress mirroring enabled per port.
  – Unlimited number of ingress and egress mirrored ports.
  – Supports ingress and/or egress statistical mirroring to the destination analyzer port.
The traffic monitoring features and configuration are described in detail in Section 16. "Traffic Monitoring" on page 312.
The devices are members of the Marvell® Prestera®-DX family of networking switches. This single-chip packet processor integrates 10/12/16/24 ports of Gigabit Ethernet with integrated SERDES; one, two, or three HyperG.Stack ports, each with an integrated quad-SERDES XAUI transceiver; a Layer 2+ switching engine; an IPv4/IPv6 Unicast routing engine; a powerful Layer 2 through Layer 4 Policy engine; a PCI or MII/GMII/RGMII Ethernet port for management; and 6 Mbit of on-chip buffer memory. This complete system-on-a-chip (SoC) packet processor provides support for line-rate Layer 2 bridging, IPv4 and IPv6 Unicast routing, a deep packet inspection Policy engine, and Layer 2/3 QoS.
The device integrates the following functions:
• Store-and-forward switching architecture with on-chip packet buffering.
• Layer 2 through Layer 4 packet Policy engine.
• Ethernet Bridge engine.
• IPv4/IPv6 Unicast routing engine.
• Ingress policers.
• Support for packet header manipulation, including VLAN insertion/removal/replacement, IEEE 802.1p User Priority field remarking, and DSCP field remarking.
• Marvell Distributed Switching Architecture, based on the DSA tag, for the CPU packet interface and cascade ports between devices.
• On-chip transmit queues, including congestion handling and scheduling.
• Egress rate shapers.
• Support for a host processor interface—PCI, Ethernet + SMI, or MII/GMII/RGMII Ethernet.
In addition, Marvell provides a comprehensive set of software tools and software drivers supporting the Prestera chipset and the Alaska® transceivers. The Prestera Software Suite (PSS) provides the user with high-level APIs and OS/CPU independence.
3.6 98DX269 Block Diagram

[Figure: 98DX269 block diagram.]

3.7 High-Level Packet Walkthrough
This section provides a functional walkthrough of the device.
Note
The only difference between the Layer 2+ devices and the Multilayer devices is that the Multilayer devices incorporate an IPv4/IPv6 Unicast Routing Engine.
Figure 7: SecureSmart and Layer 2+ Switches Ingress and Egress Processing Engines

[Figure: Ingress pipeline — Ports MAC Rx, Header Decode Engine, Policy Engine, Bridge Engine, Policing Engine, Pre-Egress Engine. Egress pipeline — Egress Filtering, Multi-Target Replication, Descriptor Enqueueing, Rate Shaping, Transmit Scheduler, Headers Alteration, Ports MAC Tx.]
The device’s processing engines are pipelined. The device maintains two pipelines—an ingress pipeline and an egress pipeline, which process all the traffic received on the device and transmitted from it. Figure 7 and Figure 8 illustrate the ingress and egress pipelines and the engines in each pipeline stage.
Figure 8: Multilayer Stackable and SecureSmart Stackable Switches Ingress and Egress Processing Engines

[Figure: The same pipeline structure as Figure 7, with the IPv4/IPv6 Unicast Routing Engine in the ingress pipeline.]
3.7.1 Ingress Pipeline

Packets received on a device's port are first processed by the ingress pipeline, which consists of the following processing units:
• Port MAC Rx
• Header Decode engine
• Policy engine
• Bridge engine
• IPv4/IPv6 Unicast Routing engine (98DX107/167/247/253/263/273/169/249/269 devices only)
• Policing engine
• Pre-egress engine
3.7.1.1 Ingress Processing on a Cascade Port
Although all packets received by the device always pass through the ingress pipeline, not all ingress engines are necessarily enabled. Specifically, the Policy and Bridge engines can be independently enabled or disabled on a per-port basis. In a cascaded/stackable system based on devices in these families, the device in the stack through which the packet is received performs the policy and bridge decisions. If the packet is forwarded through a cascade port, the forwarding decision and packet descriptor information are attached to the packet using the DSA tag. The Policy and Bridge engines are configured to be disabled on the device's cascade ports, since the packet was already processed by these engines on the ingress device. In addition, any ingress trapping or mirroring of a packet to the CPU is performed only on the ingress pipeline of the ingress device. This ensures that a packet is not sent to the CPU multiple times.
3.7.1.2 Ports MAC Rx
Each port MAC Rx operates independently. The port MAC is responsible for IEEE 802.3 MAC functionality, packet reception, allocation of buffers in the device’s packet memory and DMA of the packet data into the buffers memory. Packets that contain errors such as FCS Errors, Length Errors, etc. are discarded. Error-free packets continue to be processed by the ingress pipeline.
3.7.1.3 Header Decode Engine
If the packet is not filtered by the port MAC, the packet's header, up to 128 bytes, is decoded by the Header Decode engine. This engine decodes the packet header and extracts the packet fields (e.g., MAC SA, MAC DA, EtherType, SIP, DIP) that are required by the subsequent pipeline engines.
3.7.1.4 Policy Engine
If the packet is not filtered by the port MAC, and the ingress port configuration setting enables Policy engine processing, the packet is processed by the Policy engine.
The Policy engine allows Policy Control Lists to be applied to packets based on flexible criteria, including the packet Layer 2, Layer 3, and Layer 4 field content.
The Policy engine may be used to implement user applications such as Access Control Lists (ACLs), Quality of Service (QoS), and Policy-based VLANs. The Policy engine may perform two lookups per packet.
On the 98DX107/167/247/253/263/273/169/249/269 devices, the Policy engine second lookup may be used to perform an IPv4/IPv6 Unicast Longest Prefix Match on the packet's DIP. If a match is found, the policy action entry is used as a next hop entry by the IPv4/IPv6 Unicast Routing engine.
3.7.1.5 Bridge Engine
If the packet is not filtered by the previous engines, and the ingress port configuration is enabled for Bridge engine processing, the packet is sent to the Bridge engine. The Bridge engine is responsible for the following functions:
1. IEEE 802.1Q/D bridging. This includes functions such as VLAN assignment, Spanning Tree support, MAC learning, address table entry aging, filtering, and forwarding.
2. IPv4 IGMP snooping and IPv6 MLD snooping.
3. Control packet trapping and mirroring to the CPU. This includes identifying IEEE reserved Multicast, IPv4/v6 link layer Multicast, IGMP and MLD, ARP, RIPv1, and IPv4 Broadcast packets.
4. Ingress port rate limiting of Broadcast, Multicast, and Unknown Unicast packets.
5. Filtering/trapping/mirroring of unknown and/or unregistered packets.
6. Private VLAN Edge (PVE): if enabled, overrides the bridge forwarding decision and sends the packet to a pre-configured destination.
3.7.1.6 IPv4/v6 Unicast Routing Engine

This section is relevant for the following devices:
• Multilayer Stackable: 98DX107, 98DX133, 98DX167, 98DX247, 98DX253, 98DX263, 98DX273
• SecureSmart Stackable: 98DX169, 98DX249, 98DX269
It is not relevant for the SecureSmart and Layer 2+ devices.
If the packet is IPv4 or IPv6 Unicast, it has not been filtered by the previous engines, the Policy engine has performed a Longest Prefix Match on its DIP, the packet has been triggered for routing by the Bridge engine, and its VLAN is enabled for routing, then the IPv4/IPv6 Unicast Routing engine is triggered and the packet is routed.
The IPv4/IPv6 Routing engine is responsible for the following functions:
• Router exception checking:
  – IPv4/v6 header error
  – TTL/Hop Limit exceeded
  – Options
• Next-hop forwarding to any {device, port}, trunk, or VLAN group in the system
• Per-route-entry QoS assignment
• Per-route-entry mirroring to the CPU or to the ingress analyzer port
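The router exception checks listed above can be pictured with a minimal software model. The sketch below is illustrative only: field offsets follow the standard IPv4 header, and the enum names are assumptions rather than the device's terminology:

    #include <stdint.h>

    enum route_exc { ROUTE_OK, EXC_HDR_ERROR, EXC_TTL_EXCEEDED, EXC_OPTIONS };

    /* Screens an IPv4 header for the exceptions listed above.
     * `len` is the number of header bytes available. */
    enum route_exc ipv4_route_check(const uint8_t *ip, uint32_t len)
    {
        if (len < 20 || (ip[0] >> 4) != 4)
            return EXC_HDR_ERROR;           /* truncated or not IPv4   */
        uint32_t ihl = (uint32_t)(ip[0] & 0x0F) * 4;
        if (ihl < 20 || ihl > len)
            return EXC_HDR_ERROR;           /* bad header length       */
        if (ip[8] <= 1)
            return EXC_TTL_EXCEEDED;        /* TTL expires on this hop */
        if (ihl > 20)
            return EXC_OPTIONS;             /* IPv4 options present    */
        return ROUTE_OK;                    /* eligible for next hop   */
    }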
3.7.1.7 Policing Engine

The Policing engine is invoked when a policy rule binds the matching packet flow to a policer instance. The Policing engine can meter the flow and maintain flow statistics. Non-conforming packets can be dropped, or have their QoS re-marked.

3.7.1.8 Pre-Egress Engine

At the end of the ingress pipeline, the Pre-egress engine examines the decisions made by the ingress pipeline and prepares/duplicates the packet descriptor for the egress pipeline processing.
3.7.1.8.1 Ingress Pipeline Packet Commands
The ingress pipeline packet descriptor command may be any ONE of the following:
• Drop the packet.
• Trap the packet to the CPU.
• Forward the packet to a target destination(s).
• Forward the packet to a target destination(s) AND mirror the packet to the CPU.
In addition to one of the possible packet commands listed above, the ingress pipeline may set the packet descriptor with either or both of the following commands:
• Mirror to the ingress analyzer port.
• Ingress sample to the CPU.
3.7.1.8.2 Packet Descriptor Processing
If the packet descriptor command is "Drop the packet" and the packet is neither ingress-mirrored nor sampled to the CPU, the packet is dropped and its buffer(s) released. The descriptor of this dropped packet is not forwarded to the egress pipeline.
If the packet is mirrored to the CPU and/or sampled to the CPU and/or mirrored to the ingress analyzer port, the packet descriptor is replicated by the Pre-egress engine for each target.
If, for any reason, the packet is to be sent to the CPU, the packet descriptor is set with a specific CPU code and its associated attributes, and then forwarded to the specified CPU in the system. The global CPU Code table defines the CPU code attributes that define the CPU device through which the packet is to be forwarded, its traffic class, drop precedence, the statistical sampling ratio, and an option to truncate the packet to 128 bytes.
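The CPU code attributes just described map naturally onto a small table structure. The following is a hypothetical sketch; the field names and the 256-entry table size are assumptions, not the device's register layout:

    #include <stdint.h>
    #include <stdbool.h>

    /* One CPU Code table entry, holding the attributes listed above. */
    typedef struct {
        uint8_t  target_dev;      /* CPU device through which to forward */
        uint8_t  traffic_class;   /* CPU queue (0..7)                    */
        uint8_t  drop_precedence;
        uint16_t sample_ratio;    /* statistical sampling ratio          */
        bool     truncate;        /* truncate the packet to 128 bytes    */
    } cpu_code_entry_t;

    /* The global table is indexed by the packet's assigned CPU code. */
    static cpu_code_entry_t cpu_code_table[256];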
3.7.2 Egress Pipeline
The ingress pipeline Pre-egress unit passes to the egress pipeline packet descriptors with either a Unicast or Multicast destination target. If the packet has a Unicast destination set to a trunk group, the destination is converted to one of the trunk group port members (on any device), based on the trunk hash function.
The egress pipeline is comprised of the following functional units:
• Egress Filtering
• Multi-Target Replication
• Descriptor Queueing
• Rate Shaping
• Transmit Scheduler
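The trunk resolution step mentioned above can be illustrated as a hash-based member selection. In the sketch below the XOR hash is only a stand-in; the document does not specify the device's actual trunk hash function:

    #include <stdint.h>

    typedef struct { uint8_t dev; uint8_t port; } dev_port_t;

    /* Resolve a Unicast destination that points at a trunk group to one
     * physical member port (which may live on any device in the system). */
    dev_port_t trunk_resolve(const dev_port_t *members, uint32_t n_members,
                             uint64_t mac_sa, uint64_t mac_da)
    {
        uint32_t hash = (uint32_t)(mac_sa ^ mac_da);  /* stand-in hash */
        hash ^= hash >> 16;
        return members[hash % n_members];
    }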
3.7.2.1 Egress Filtering
The following filtering is performed prior to enqueueing a packet on a given egress port queue:
1. VLAN egress filtering
2. Spanning Tree egress filtering
3. Source port/trunk group egress filtering for multi-target packets
4. Egress port unregistered Multicast and Unknown Unicast filtering
5. Source-ID multi-target filtering
Each of the egress filtering unit mechanisms can be independently enabled or disabled.
3.7.2.2 Multi-Target Replication
A multi-target destination is indicated via a multi-target group index, also known as VIDX.
If the packet is multi-target (i.e. Broadcast, Multicast, or Unknown Unicast), the packet descriptor is replicated for each egress port member of the VIDX group.
3.7.2.3 Descriptor Queueing
Non-filtered packet descriptors are enqueued on the egress traffic class queue.
To prevent head-of-line blocking, queueing is based on a tail-drop algorithm, according to the number of packets or buffers in the queue.
If the egress port is configured for egress mirroring to the analyzer port, every packet descriptor enqueued on the egress port queue is duplicated and forwarded back to the ingress pipeline Pre-egress unit. It is then forwarded to the configured egress analyzer port.
If the egress port is configured for statistical sampling of packets to the CPU, for every packet selected for sampling, the descriptor is duplicated and forwarded back to the ingress pipeline Pre-egress unit for forwarding to the configured CPU.
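The tail-drop admission decision described above can be modeled as a simple limit check per drop precedence. An illustrative sketch, assuming two drop-precedence levels as in the bandwidth management features (all names are hypothetical):

    #include <stdint.h>
    #include <stdbool.h>

    typedef struct {
        uint32_t pkts, bufs;      /* current queue occupancy           */
        uint32_t pkt_limit[2];    /* per drop precedence (DP0, DP1)    */
        uint32_t buf_limit[2];
    } eq_queue_t;

    /* Color-aware tail drop: enqueue only if both the packet count and
     * the buffer count stay below the limits for this drop precedence. */
    bool taildrop_admit(const eq_queue_t *q, uint32_t dp, uint32_t pkt_bufs)
    {
        if (q->pkts + 1 > q->pkt_limit[dp])
            return false;
        if (q->bufs + pkt_bufs > q->buf_limit[dp])
            return false;
        return true;
    }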
3.7.2.4 Rate Shaping

Each egress port traffic class queue can be configured with a token bucket for shaping traffic transmitted from the queue. In addition, the aggregate traffic of all the traffic class queues for a given egress port can be configured with a token bucket shaper.
3.7.2.5 Transmit Scheduler
Packet descriptors are de-queued from the egress port traffic class queues according to the configured scheduling algorithm:
• Deficit Weighted Round Robin
• Shaped Weighted Round Robin
• Strict Priority
3.7.2.6 Header Alteration
When the packet is read from the buffers memory for transmission, its header is altered according to its descriptor content and according to the type of port from which it is being sent. The packet's header may be modified by one or more of the following actions:
• In the 98DX107, 98DX133, 98DX167, 98DX247, 98DX253, 98DX263, 98DX273, 98DX169, 98DX249, and 98DX269 devices, if a packet is routed, its header is changed as follows:
  – MAC DA is modified to reflect the next hop MAC.
  – MAC SA is modified to reflect the router's MAC.
  – VLAN is modified to reflect the next hop subnet.
  – IPv4 header TTL field is decremented (optional).
  – IPv6 hop limit is decremented (optional).
• VLAN tag added/removed/modified.
• IEEE 802.1p user priority remarked.
• IPv4/IPv6 DSCP remarked.
• If the IPv4 header is modified, by either TTL decrement for routed packets or DSCP remarking, its checksum is recalculated (see the sketch following this list).
• If the packet is sent via a cascade port or to the CPU, a DSA tag is attached.
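The checksum recalculation noted above does not require summing the whole header again: because only one 16-bit word changes when the TTL is decremented, the checksum can be patched incrementally per RFC 1624. The following C sketch is a software model of the arithmetic, not the device's implementation:

    #include <stdint.h>

    /* Decrement the IPv4 TTL (byte 8) and patch the header checksum
     * (bytes 10-11) using RFC 1624: HC' = ~(~HC + ~m + m'). */
    void ipv4_decrement_ttl(uint8_t *iph)
    {
        /* 16-bit word holding TTL (high byte) and protocol (low byte) */
        uint16_t old_word = ((uint16_t)iph[8] << 8) | iph[9];
        iph[8]--;                               /* TTL decrement */
        uint16_t new_word = ((uint16_t)iph[8] << 8) | iph[9];

        uint16_t old_csum = ((uint16_t)iph[10] << 8) | iph[11];
        uint32_t sum = (uint16_t)~old_csum;
        sum += (uint16_t)~old_word;
        sum += new_word;
        sum  = (sum & 0xFFFF) + (sum >> 16);    /* fold end-around carry */
        sum  = (sum & 0xFFFF) + (sum >> 16);
        uint16_t new_csum = (uint16_t)~sum;
        iph[10] = (uint8_t)(new_csum >> 8);
        iph[11] = (uint8_t)(new_csum & 0xFF);
    }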
3.7.2.7 Ports MAC Tx
After the packet has been read from the buffers memory and its header has been altered, it is transmitted via the port's MAC, which performs the IEEE 802.3 MAC functionality. If the packet header has been altered, the MAC generates a new CRC and appends it to the packet. If necessary, it pads the packet to a MinFrameSize of 64 bytes.
Section 4. Distributed Switching Architecture (DSA)

This section describes the Marvell® Distributed Switching Architecture (DSA). The DSA architecture allows the device to be cascaded through any of its Ethernet MAC port interfaces with other devices in these families, or with any Marvell device that supports DSA tag cascading. Up to 32 devices can be cascaded to create a single cascaded system.
A cascaded system of devices in these three families supports the same features as a non-cascaded single device in these three families. This includes:
• Trunk groups with port members on multiple devices in the system.
• Mirroring to an analyzer port on any device in the system.
• Traffic to the CPU can be sent through any device in the system.
• The CPU can inject traffic to be transmitted out through a port on any device in the system.
4.1 Cascade Ports
A device’s port used for interconnecting Marvell devices is configured as a cascade port. All traffic sent and received on cascade ports is always DSA-tagged (Section 4.6 "DSA Tag"). Consequently, the cascade ports should only connect to other cascade ports.
Multiple cascade ports can be configured as a trunk group, to support high-bandwidth inter-device connections (Section 13.4 "Trunking over Cascade Link" on page 291).
Cascade ports should be members of all active VLANs. However, because the DSA tag replaces the packet's VLAN tag (see Section 4.6 "DSA Tag"), the VLAN tagged state for cascade ports is not relevant.
To allow the CPU to transmit a packet to any port in the system, and to learn about received packets (e.g., CPU Code, source device/port), the CPU port must be configured as a cascade port (Section 7.1 "CPU Port Number" on page 102); however, it cannot be a member of any VLAN.
Configuration
• To configure a port as a cascade port, set the field in the Cascading and Header Insertion Configuration Register (Table 528 p. 770).
• To configure a CPU port as a cascade port, set the bit in the Cascading and Header Insertion Configuration Register (Table 528 p. 770).
• To configure the cascade port as a member of all active VLANs, set the corresponding port as a member in the VLAN Entry (0<=n<4096) (Table 520 p. 754).
4.2 Single-Target Destination in a Cascaded System

Every device in a cascaded system must be assigned a unique 5-bit device number.
A packet with a single-target destination from the CPU or from a cascade or network port is associated with a destination device number and a destination port number. The destination port may be the CPU port or any of the other local ports on the destination device. If the destination device is not the local device, then the packet must be sent through a cascade port that leads to the destination device (either directly or through intermediate devices). To facilitate this, each device supports a Device Map table that maps a destination device number to a cascade port or cascade trunk group. A single-target packet whose destination device is not the local device is sent out the cascade port or trunk group specified in the Device Map table entry for the given destination device.
In the example in Figure 9, four devices are connected in a ring topology. The cascade ports on each device are labeled P0 and P1. In this example, a network port on device 0 receives a known Unicast packet, which is assigned a forwarding target destination port on device 2. The Device Map table on device 0 indicates that the packet must be egressed on P1 to reach device 2. The packet is then received on device 1, whose Device Map table indicates that the packet must be egressed on P1 to reach device 2. The packet is then received on device 2 and egressed on the target destination port.
Figure 9: Example of Single-Target Destination Forwarding in a Cascaded System

[Figure: Four devices (Dev 0 through Dev 3) connected in a ring through cascade ports P0 and P1. A known Unicast packet received on a network port of Dev 0 is carried with a FORWARD DSA tag over P1 to Dev 1, then over P1 to Dev 2, where it egresses on the target destination port.]

Device Map tables (destination device -> egress cascade port):

Dest Dev   Device 0   Device 1   Device 2   Device 3
0          N/A        P0         P1         P1
1          P1         N/A        P0         P1
2          P1         P1         N/A        P0
3          P0         P1         P1         N/A
Configuration
• To configure the local device number, set the field in the Global Control Register (Table 84 p. 377) accordingly.
• For each device number in a cascaded system (excluding the local device number), set the corresponding Device Map Table Entry (0<=n<32) (Table 471 p. 728) accordingly.
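The Device Map lookup itself is a direct indexed read. A minimal sketch using the device 0 table from Figure 9 (the array encoding is illustrative):

    #include <stdint.h>

    enum { LOCAL_DEV = 0 };   /* this device's 5-bit number (Dev 0 here) */

    /* Device Map table of Dev 0 in Figure 9: entry [d] is the cascade
     * port (0 = P0, 1 = P1) leading toward destination device d;
     * -1 means the destination is the local device. */
    static const int dev_map[4] = { -1, 1, 1, 0 };

    /* Returns the local egress port for a single-target destination. */
    int resolve_single_target(uint8_t dest_dev, uint8_t dest_port)
    {
        if (dest_dev == LOCAL_DEV)
            return dest_port;      /* destination port is on this device */
        return dev_map[dest_dev];  /* else egress via the mapped cascade port */
    }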
4.3 Multi-Target Destination in a Cascaded System
A multi-destination packet (i.e. unknown Unicast, Multicast, or Broadcast) is forwarded by the bridge egress processing according to the packet VLAN and its VIDX Multicast group assignment (Section 11.5 "Bridge Multicast (VIDX) Table" on page 240).
However, if the topology of the cascaded system is not a Spanning Tree, i.e., a loop exists in the topology (e.g., a ring topology), then devices may receive multiple copies of multi-target packets. The device supports a unique feature that allows multi-target packets to be flooded in the cascaded topology according to a Spanning Tree rooted at the source device. The packet is assigned a unique source device identifier by the ingress device, and this "source ID" value is propagated with the packet as part of the DSA tag.
The egress pipeline of each device performs egress filtering of packets based on their source ID. A port bitmap per source ID defines whether a packet originating from a given source device should be filtered or forwarded on the corresponding port; a sketch of this check follows below. Once the cascade topology is learned, the system management software calculates, for each device in the system, a Spanning Tree rooted at that device and reaching all the other devices in the system. The cascade port(s) on the leaf devices of the Spanning Tree are configured to be filtered for this source-ID Spanning Tree (Section 11.14 "Bridge Source-ID Egress Filtering" on page 257).
The cascade port(s) on the device must be fixed members of all VLAN and Multicast groups, to ensure that the packet is propagated to all devices in the system.
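The per-source-ID filtering can be sketched as a bitmap test (the 32-entry table matches the 5-bit device number space; the bitmap encoding is an assumption):

    #include <stdint.h>
    #include <stdbool.h>

    /* One forward/filter bitmap per source ID.  Bit p set = forward
     * packets carrying this source ID on port p; clear = filter. */
    static uint32_t src_id_port_bitmap[32];

    bool src_id_egress_permit(uint8_t src_id, uint8_t egress_port)
    {
        return (src_id_port_bitmap[src_id] >> egress_port) & 1u;
    }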
In the example in Figure 10, four devices are connected in a ring topology. The Source-ID table on each device keeps the packet from looping around the ring: it forwards the packet along a Spanning Tree path, reaching each device once only. For clarity, the Source-ID table in Figure 10 shows the port bitmap only for the cascade ports P0 and P1 on each device; the network ports on each device are all set to '1' (i.e., forward).
In the example, a Multicast packet is received on one of the network ports on device 0 and is assigned source-ID 0. The packet is assigned a VLAN and a VIDX Multicast group index, and is flooded to the local member ports on the device. Note that the VLAN/VIDX must always include the cascade ports. The flooding of this packet through the ring is depicted by the red arrows. Specifically, the Source-ID table on device 0 enables the packet to be forwarded on its cascade ports P0 and P1 to reach devices 3 and 1, respectively. Device 3 allows the packet to be flooded to device 2, but the Source-ID table on device 1 does not permit the packet to be forwarded to device 2, and the Source-ID table on device 2 does not permit the packet to be forwarded to device 1. Thus, multi-target packets received by device 0 are forwarded along a Spanning Tree path where device 0 is the root, and each device in the stack is reached once.
To continue the example, a Multicast packet is received on a network port on device 1. Here the packet is flooded along a different Spanning Tree path (the blue arrows) to each of the devices in the stack.
Figure 10: Example of Multi-Destination Forwarding in a Cascaded System

[Figure: Four devices (Dev 0 through Dev 3) connected in a ring through cascade ports P0 and P1. Red arrows: a Multicast packet received on a network port on Device 0 (source-ID 0) floods along a Spanning Tree rooted at Device 0. Blue arrows: a Multicast packet received on a network port on Device 1 (source-ID 1) floods along a Spanning Tree rooted at Device 1.]

Source-ID Egress Filter table on each device (forward bits for P0/P1):

SrcID   Dev 0   Dev 1   Dev 2   Dev 3
0       1/1     0/0     0/0     1/0
1       1/0     1/1     0/0     0/0
4.4 Loop Detection
In the event of a topology change or a misconfiguration of the Device Map or Source-ID egress filtering, a packet may be looped back to its original source device. To prevent such packets from continuing to loop between devices in the cascaded system, a global configuration can be enabled to discard received DSA-tagged packets whose DSA tag source device is equal to the local device number. This filter is applied to all types of DSA-tagged packets, with the exception of FROM_CPU, which does not contain such a field.
Configuration
To enable ingress filtering of a DSA-tagged packet whose source device is equal to the local device, set the field in the Bridge Global Configuration Register1 (Table 371 p. 647) accordingly.
4.5 QoS on Cascade Interface
When oversubscribed, a cascade interface may suffer from congestion on its egress traffic class queues, leading to packet loss. Traffic on cascade ports is classified as either data, control, or mirror-to-analyzer traffic.
Control traffic is defined as either traffic to the CPU, or traffic from the CPU that is specified as control traffic. Control traffic is further classified as either CPU-to-CPU traffic, or “other” traffic to/from the CPU. To segregate control traffic from data and mirror-to-analyzer traffic, control traffic is assigned a configurable traffic class on the cascade port. To segregate CPU-to-CPU traffic (typically used for internal system control) from other, less critical traffic to/from the CPU, each type of control traffic (CPU-to-CPU and Other-Control) is assigned a configurable drop precedence.
Data traffic, defined as network-to-network traffic, is sent across a cascade port according to a global table that maps the packet traffic class and drop precedence to a cascade port traffic class and drop precedence. Mirror-to-analyzer traffic is assigned a dedicated traffic class and drop precedence for ingress mirrored traffic, and a dedicated traffic class and drop precedence for egress mirrored traffic.
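The data-traffic mapping just described is a small two-dimensional lookup. An illustrative sketch, assuming eight traffic classes and two drop precedence levels as in the bandwidth management features (names are hypothetical):

    #include <stdint.h>

    typedef struct { uint8_t tc; uint8_t dp; } qos_t;

    /* Global map for data traffic: the packet's {traffic class, drop
     * precedence} selects the {traffic class, drop precedence} used on
     * the cascade port. */
    static qos_t cascade_data_map[8][2];

    qos_t cascade_qos_remap(qos_t pkt)
    {
        return cascade_data_map[pkt.tc][pkt.dp];
    }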
4.6 DSA Tag
In a cascaded system, the DSA tag records the relevant packet information that must be passed from one device to another to correctly process the packet. The DSA tag architecture is extensible, allowing additional fields to be added in new generations of devices, while remaining backward-compatible with older devices.
The first version of the DSA tag specified a 4-byte tag format. The enhanced version implements an extended 8-byte DSA tag. All packets transmitted through the device's cascade ports must contain either a 4-byte DSA tag (from legacy devices) or an extended 8-byte DSA tag. Legacy devices accept the extended tag from the device, but only recognize the first 4 bytes of the tag.
The DSA tag is a superset of the fields contained in the IEEE 802.1Q tag. Thus, a packet that arrives from a network port with an IEEE 802.1Q tag and is forwarded through a cascade port has its 4-byte Q-tag converted into an extended 8-byte DSA tag, without any loss of the original Q-tag information.
Because a port configured as a cascade port always sends and receives DSA-tagged packets, there is no need for a special EtherType to identify the DSA tag. The DSA tag appears directly after the MAC source address, as illustrated in Figure 11. A packet received from a cascade port and transmitted through a network (i.e., non-cascade) port has its DSA tag either stripped or converted to an IEEE 802.1Q tag.
Figure 11: DSA Tag in the Ethernet Frame

[Figure: Ethernet frame layout — Preamble (7 octets), SFD (1 octet), Destination Address (6 octets), Source Address (6 octets), Marvell Tag (Extended) (8 octets), Length/Type (2 octets), MAC Client Data, FCS (4 octets). The Tag Command field occupies the most significant bits of the tag (from b63 downward): 00 = TO_CPU, 01 = FROM_CPU, 10 = TO_ANALYZER, 11 = FORWARD; the remaining bits carry the DSA Tag Data.]
For further details on the DSA tag, see Appendix A. "DSA Tag Formats" on page 333.
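Per Figure 11, the receiving device can classify a DSA-tagged packet from the two most significant bits of the tag. A minimal sketch (the byte-array framing is an assumption; the command encoding is from the figure):

    #include <stdint.h>

    enum dsa_cmd { DSA_TO_CPU = 0, DSA_FROM_CPU = 1,
                   DSA_TO_ANALYZER = 2, DSA_FORWARD = 3 };

    /* `tag` holds the extended 8-byte DSA tag with tag[0] the first
     * byte on the wire; the Tag Command sits in the top two bits. */
    enum dsa_cmd dsa_tag_command(const uint8_t tag[8])
    {
        return (enum dsa_cmd)(tag[0] >> 6);
    }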
Packets sent between devices (through cascade ports) are always DSA-tagged. The DSA tag determines how the packet is processed by the receiving device.
There are four DSA tag commands:
• FORWARD
• TO_CPU
• FROM_CPU
• TO_ANALYZER
For each command there is a different DSA tag format, which carries the relevant fields for the respective command.
4.6.1.1 FORWARD DSA Tag Command
The FORWARD command indicates that the packet is forwarded to the specified destination device/port, trunk group, or Multicast group. The extended 8-byte FORWARD DSA tag includes the forwarding destination of the packet, as decided by the ingress pipeline engines of the ingress device.
In a homogeneous system, where all devices support the same ingress processing engines, it is recommended to disable the ingress processing engines on the cascade ports and to rely on the FORWARD DSA tag decision made by the ingress device (Section 11.1 "Bypassing Bridge Engine" on page 203 and Section 10.4.2 "Enabling Policy Engine Processing" on page 183). In heterogeneous systems, the cascade port may be enabled for ingress engine processing.
The legacy 4-byte DSA tag does not include the forwarding destination. Packets received on a cascade port with the 4-byte FORWARD DSA tag are therefore subject to processing by the ingress pipeline engines. This is the case when the device is connected to a legacy device.
If the packet is received on a cascade port with a FORWARD DSA tag whose destination is a remote device, or the packet is received on a network port and the forwarding destination is a remote device, the packet is sent FORWARD DSA-tagged via the cascade port configured in the Device Map table for the requested destination device (Section 4.2 "Single-Target Destination in a Cascaded System"). If the FORWARD destination is Multicast, the packet is flooded according to the packet's VLAN and Multicast group assignment (Section 11.5 "Bridge Multicast (VIDX) Table" on page 240).
If the packet is queued for transmission on a local network port(s), the packet is transmitted without the DSA tag. If the packet is queued for transmission on a cascade port, the packet is transmitted with a FORWARD DSA tag.
For further details on the DSA tag FORWARD format, see Appendix A. "DSA Tag Formats" on page 333.
4.6.1.2 TO_CPU DSA Tag Command
Packets are sent to the CPU with the TO_CPU DSA tag, which provides the CPU with all the relevant information regarding the packet. The CPU port must be configured as a cascade port to receive this information.
A packet is sent to the CPU as a result of one of the following actions:
• The packet is assigned a TRAP command by an ingress processing engine (Section 5.1.3 "TRAP Command" on page 53).
• The packet is assigned a MIRROR command by an ingress processing engine (Section 5.1.2 "Mirror-to-CPU Command" on page 53).
• The packet is assigned a FORWARD command to the virtual CPU port 63 by an ingress processing engine (Section 5.1.1 "FORWARD Command" on page 52).
• The packet is selected for sampling to the CPU by the ingress or egress port sampling mechanism (Section 16.1 "Traffic Sampling to the CPU" on page 312).
A configurable CPU Code table defines a set of configurable attributes associated with each CPU code. This set of attributes includes the packet QoS, the statistical sampling ratio, a truncation enable, and the device number through which the packet is sent to the CPU (Section 7.2.1 "CPU Code Table" on page 103). The destination device number for a given CPU code can be the local device or a remote device.
If the target is the local device CPU, the packet is sent to the local device CPU packet interface with the TO_CPU DSA tag. If the target is a remote device, the packet is sent with a TO_CPU DSA tag over a cascade port selected according to the Device Map table for the destination device (Section 4.2 "Single-Target Destination in a Cascaded System"). The packet is queued on the cascade port according to the configured control traffic class and the configured drop precedence for Other-Control traffic (Section 4.5 "QoS on Cascade Interface").
Packets received on a cascade port with the TO_CPU DSA tag are not subject to processing by the ingress pipeline engines, even if enabled on the port. The packet is forwarded to the destination CPU device derived from the DSA tag CPU code, as described above.
For further details on the DSA tag TO_CPU format, see A.1 "Extended DSA Tag in TO_CPU Format" on page 333. For further details on packets to/from the CPU, see Section 6.1 "PCI Interface" on page 58.

4.6.1.3 FROM_CPU DSA Tag Command
The CPU can inject a packet for a specific destination by creating a FROM_CPU DSA-tagged packet. The FROM_CPU DSA tag contains all the relevant information required by the device to send the packet to the specified destination. The FROM_CPU tag has special provisions for:
• Sending the packet to the CPU attached to an adjacent neighbor device, without knowing its device number. This "mailbox" mechanism is useful for topology discovery in a multi-CPU system.
• Sending to a VLAN or Multicast group, excluding a device/port or trunk group.
• Marking the packet as "control", so it is queued on the cascade control traffic class.
• Disabling egress VLAN and Spanning Tree egress filtering. This is important for sending BPDUs out ports that are in the blocked Spanning Tree state.
For more details on the FROM_CPU DSA tag, see Section 7.2 "Packets to the CPU" on page 102 and Section 7.3 "Packets from the CPU" on page 107.
Packets received on a cascade port with the FROM_CPU DSA tag are not subject to processing by the ingress pipeline engines, even if enabled on the port. The packet is forwarded to the target destination defined in the FROM_CPU DSA tag. If the FROM_CPU DSA tag has a single-target destination with a target device equal to the local device, the packet is transmitted on the local port without the DSA tag.
If the FROM_CPU DSA tag has a single-target destination with a target device not equal to the local device, the packet is sent with the FROM_CPU DSA tag on the cascade port configured in the Device Map table for the destination device (Section 4.2 "Single-Target Destination in a Cascaded System").
If the FROM_CPU destination is Multicast, the packet is flooded according to the VLAN and Multicast group assignment defined in the FROM_CPU DSA tag (Section 11.5 "Bridge Multicast (VIDX) Table" on page 240). If the packet is queued for transmission on a local network port(s), the packet is transmitted without the DSA tag. If the packet is queued for transmission on a cascade port, the packet is transmitted with a FROM_CPU DSA tag.
To prevent loops in non-Spanning Tree cascading topologies, see Section 4.3 "Multi-Target Destination in a Cascaded System".
4.6.1.4 TO_ANALYZER DSA Tag Command

The device supports ingress and egress mirroring to an analyzer port, with a configurable ingress analyzer port and a configurable egress analyzer port. The destination analyzer port may be a network port on the local device or on a remote device. When an analyzer port is not local to a device, the cascade port leading to the device supporting that analyzer port is configured as the analyzer port for this device.
If a packet is marked to be ingress- or egress-mirrored to the analyzer port, and the respective analyzer port is a cascade port, the packet is sent with the TO_ANALYZER DSA tag. Packets received on a cascade port with the TO_ANALYZER DSA tag are not subject to processing by the ingress pipeline engines, even if enabled on the port. The packet is forwarded to the ingress or egress analyzer port configured in this device. Packets transmitted out the analyzer network port are sent without the TO_ANALYZER DSA tag.
For more details on mirroring to an analyzer port, see Section 16.2 "Traffic Mirroring to Analyzer Port" on page 314.

4.7 Cascading
In a cascaded system, the Policy, Bridge, Unicast Routing, and Policing Ingress Processing engines should be enabled only on network ports and disabled on cascade ports. In this model, the ingress device of a cascaded system performs the policy, bridging, Unicast routing, and policing of traffic. If the packet is sent through a cascade port to adjacent devices in the system, the packet is processed based on its DSA tag information only.
Configuration
In the Port VLAN and QoS Configuration Entry (0<=n<27, for CPU port n=0x3F) (Table 312 p. 567):
• To disable Bridge engine processing on a port, set the bit.
• To disable Policy and Policing engine processing on a port, clear the bit.
Note
Even when the bridge engine is disabled, the Source address lookup for address learning is still performed (see Section 11.4.7 "FDB Source MAC Learning" on page 231).
This section describes packet command assignment and resolution. At each stage of processing in the ingress pipeline, a packet is associated with a specific command value.
5.1 Ingress Packet Command Assignment
Packets received on network ports are assigned the packet command FORWARD at the beginning of the ingress pipeline. The packet command may then be modified by the ingress pipeline engines enabled on the given port.
Packets received on cascade ports must always be DSA-tagged (Appendix A. "DSA Tag Formats" on page 333). DSA-tagged packets received with the FORWARD command may be processed by the ingress processing engines, according to the cascade port configuration (Section 11.1 "Bypassing Bridge Engine" on page 203 and Section 10.4.2 "Enabling Policy Engine Processing" on page 183). In a homogeneous system, where all devices support the same ingress processing engines, it is recommended to disable the ingress processing engines on the cascade ports and to rely on the FORWARD DSA tag decision made by the ingress device. In heterogeneous systems, the cascade port may be enabled for ingress engine processing.
Note
DSA-tagged packets received with the command TO_CPU, FROM_CPU, or TO_ANALYZER are not eligible for ingress processing, regardless of the cascade port configuration, i.e., the DSA tag command is final and is treated as such by all devices in the system.
The following packet commands can be assigned by the ingress processing engines to packets received from network ports and to packets received from cascade ports with a DSA tag FORWARD:
• FORWARD
• MIRROR
• TRAP
• SOFT DROP
• HARD DROP
Marking a packet for ingress-mirroring-to-analyzer and/or ingress-sampling-to-CPU is orthogonal to the packet command, i.e., a packet can have any of the above packet commands and still be ingress mirrored to the analyzer port and/or ingress sampled to the CPU (Section 16.1 "Traffic Sampling to the CPU" on page 312 and Section 16.2 "Traffic Mirroring to Analyzer Port").
5.1.1 FORWARD Command
The FORWARD packet command sends the packet to the assigned single or multi-target destination, where a single target destination is a device/port or trunk group, and a multi-target destination is a VLAN-ID (VID) and Multicast group index (VIDX). Packets received on network ports are assigned the initial packet command FORWARD and a null destination assignment.
Packets received on cascade ports with the FORWARD DSA tag are assigned the initial packet command FORWARD. If the packet has an extended FORWARD DSA tag, the destination in the DSA tag is used as the packet destination.
A packet with the FORWARD command and destination port set to the virtual port 63 is sent to the CPU with a TO_CPU DSA tag, according to the attributes defined in the CPU code table for the packet CPU code assignment; see Section 7.2.1 "CPU Code Table" on page 103.
Notes
• The destination device number is ignored when the destination port is 63. The TO_CPU packet is sent to the target device specified in the CPU Code table.
• The FORWARD packet command sends a packet to the CPU only if the destination port is port 63. Multi-target packets (Unknown Unicast, Multicast, and Broadcast) with the FORWARD packet command do not reach the CPU, as the CPU cannot be a member of a VLAN or Multicast group.

5.1.2 Mirror-to-CPU Command
The MIRROR command is a superset of the FORWARD command. In addition to sending the packet to its destination device/port, trunk group, or VID/VIDX group, the CPU is added as a destination as well. A copy of the packet is sent to the CPU with a TO_CPU DSA tag, according to the attributes defined in the CPU code table for the packet CPU code assignment (Section 7.2.1 "CPU Code Table").
5.1.3 TRAP Command
The TRAP command forwards the packet to the CPU only. The packet destination assignment is ignored.
The packet is sent to the CPU with a TO_CPU DSA tag, according to the attributes defined in the CPU code table for the packet CPU code assignment (Section 7.2.1 "CPU Code Table").

5.1.4 SOFT/HARD DROP Command

The HARD/SOFT DROP command prevents the packet from being sent to its FORWARD destination device/port, trunk group, or VID/VIDX group. The HARD DROP also prevents the packet from being trapped or mirrored to the CPU by any mechanism in the ingress pipeline. The SOFT DROP does not prevent the packet from being trapped or mirrored to the CPU, i.e., if some other mechanism assigns a MIRROR or TRAP command, a SOFT DROP command still allows the packet to be sent to the CPU (but the packet is not sent to any other destination).
5.2 Command Resolution Matrix
Each ingress pipeline engine can assign a new packet command to the packet.
However, the new command is not automatically applied to the packet. The state machine in Table 2 defines the command resolution rules for assigning the packet command. The left column is the packet command from the previous engine and the top row is the new packet command assignment. At the end of the ingress pipeline there is a single packet command that is assigned to the packet.
Table 2: Packet Command Resolution

Previous Packet   New Packet Command
Command           FORWARD     MIRROR      TRAP        SOFT DROP   HARD DROP
FORWARD           FORWARD     MIRROR      TRAP        SOFT DROP   HARD DROP
MIRROR            MIRROR      MIRROR      TRAP        TRAP        HARD DROP
TRAP              TRAP        TRAP        TRAP        TRAP        HARD DROP
SOFT DROP         SOFT DROP   TRAP        TRAP        SOFT DROP   HARD DROP
HARD DROP         HARD DROP   HARD DROP   HARD DROP   HARD DROP   HARD DROP
Notes
• If multiple mechanisms assign the TRAP command, the CPU code reflects the last mechanism to assign the TRAP command.
• If multiple mechanisms assign the MIRROR command, the CPU code reflects the last mechanism to assign the MIRROR command.
• If the packet command is updated from MIRROR to TRAP, the CPU code is always updated to reflect the TRAP CPU code.

Example
If the Policy engine has a rule whose action assigns to the packet the SOFT DROP command, and the Bridge engine has a mechanism that assigns to the packet the MIRROR command, then the resulting packet command is TRAP.
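Table 2 transcribes directly into a lookup table. The following C sketch resolves a previous and a new command exactly as the matrix specifies; for instance, resolving SOFT DROP with MIRROR yields TRAP, matching the example above:

    #include <stdint.h>

    enum cmd { FORWARD, MIRROR, TRAP, SOFT_DROP, HARD_DROP, NUM_CMDS };

    /* Table 2 transcribed: resolution[previous][new]. */
    static const enum cmd resolution[NUM_CMDS][NUM_CMDS] = {
      /* prev FORWARD   */ { FORWARD,   MIRROR,    TRAP,      SOFT_DROP, HARD_DROP },
      /* prev MIRROR    */ { MIRROR,    MIRROR,    TRAP,      TRAP,      HARD_DROP },
      /* prev TRAP      */ { TRAP,      TRAP,      TRAP,      TRAP,      HARD_DROP },
      /* prev SOFT_DROP */ { SOFT_DROP, TRAP,      TRAP,      SOFT_DROP, HARD_DROP },
      /* prev HARD_DROP */ { HARD_DROP, HARD_DROP, HARD_DROP, HARD_DROP, HARD_DROP },
    };

    enum cmd resolve(enum cmd prev, enum cmd next)
    {
        return resolution[prev][next];
        /* e.g., resolve(SOFT_DROP, MIRROR) == TRAP */
    }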
Section 6. Host Management Interfaces
Table 3: Host Management Interfaces

PCI interface
Relevant devices: 98DX130, 98DX133, 98DX250, 98DX253, 98DX260, 98DX262, 98DX263, 98DX270, 98DX273, 98DX803
• Forwarding packets to/from the host CPU.
• Asynchronous message notification of the device FDB events.
• Read and write access to all the device's address-mapped entities.
• Interrupt handling.

CPU MII/GMII/RGMII Ethernet port
• Forwarding packets to/from the host CPU, for devices that do not incorporate a PCI interface or for devices that do incorporate a PCI interface but choose to use this interface instead.

SMI register interface
• Address-mapped read and write access, for devices that do not incorporate a PCI interface or for devices that do incorporate a PCI interface but choose to use this interface instead.

Master SMI interface
• Read/write access to the PHY devices attached to the device's Tri-Speed ports.
• Out-of-band Auto-Negotiation with the PHY devices attached to the device's Tri-Speed ports.

Master XSMI interface
• Read/write access to the HyperG.Stack ports 24, 25, and 26 integrated XAUI PHY registers, via their Slave SMI interface.
• Read/write access to an external device (e.g., 88X2010 XFP PHY) attached to the device's HyperG.Stack ports.

IEEE 802.3 Clause 45 compliant Slave XSMI interface, per HyperG.Stack port
Relevant devices: 98DX130, 98DX133, 98DX260, 98DX262, 98DX263, 98DX269, 98DX270, 98DX273, 98DX803
• Read/write access to the HyperG.Stack ports 24, 25, and 26 integrated XAUI PHY registers.
Figure 12 and Figure 13 illustrate the device’s Host Management interfaces.
6.1 PCI Interface

This section is relevant for the following devices:
• Layer 2+ Stackable: 98DX130, 98DX250, 98DX260, 98DX262, 98DX270, 98DX803
• Multilayer Stackable: 98DX133, 98DX253, 98DX263, 98DX273
It is not relevant for the SecureSmart and SecureSmart Stackable devices.
The PCI interface is used for:
• Forwarding packets to/from the host CPU.
• Asynchronous message notification of the device FDB events.
• Address-mapped entities read and write access.
• Interrupt handling.
The device has a 32-bit, 66 MHz, 3.3V PCI bus, which is compliant with Revision 2.1 of the PCI specification.
Note
The device does not support 5V tolerance.
The PCI interface consists of a Master unit and a Slave unit. The Slave unit responds to all transactions from the host PCI controller and host PCI bridge. The Master unit is used by the device's internal DMAs to access the host memory for packet transmission and reception, and for sending asynchronous Address Update messages to host memory.

6.1.1 PCI Slave Unit
The PCI Slave unit is used to access the device’s address-mapped memory regions and the address-mapped memory regions on the devices attached to the device via the various SMI Interfaces.
The host processor uses the PCI Slave unit to manage the device, i.e., to read/write registers and table entry data structures. The PCI Slave unit has two internal buffers. Each buffer is dedicated to a single write transaction and can hold transactions of up to 32 bytes. These buffers are used to store data coming from and transmitted to the PCI. The PCI Slave unit is also used to configure the DMAs that utilize the PCI Master unit.
The PCI Slave unit is capable of responding to the following PCI transactions:
• Memory Read
• Memory Write
• Memory Read Line
• Memory Read Multiple
• Memory Write and Invalidate
• Configuration Read
• Configuration Write
The PCI Slave unit does not support the following PCI transactions:
• I/O Read
• I/O Write
• Special Cycle
• Dual Address Cycle
• Lock
• Interrupt Acknowledge
6.1.1.1 PCI Posted Write Operation

All PCI writes (except for configuration writes) are posted. The device incorporates dual 32-byte posted write buffers. In every posted write cycle, the data is first written to the buffers. When a buffer fills up (32 bytes) or PCI_FRAMEn is de-asserted, the data is written to the destination while the second buffer fills up. The PCI master is released after the data is stored in one of the slave buffers.
6.1.1.2 PCI Non-Posted Write Operation
PCI configuration writes are non-posted writes. The slave asserts PCI_TRDYn only when data is actually written to the configuration register. This implementation guarantees that there is never a race condition between the PCI transaction changing address mapping (Base Address registers) and subsequent transactions.
6.1.1.3 PCI Read Operation
The device's PCI Slave unit supports conventional read accesses from the PCI. Upon detection of a read cycle, the slave fetches the required data from the device's internal register files or internal RAM arrays into its 32-byte prefetch buffer, and then sends it to the host device on the PCI. The device starts sending the data as soon as it reaches the PCI slave prefetch buffer.
Most of the internal registers and tables in the device's address space may be read one DWORD (4 bytes) at a time. When the CPU reads one of the internal registers, PCI_FRAMEn should be de-asserted after one cycle only, indicating a one-DWORD read transaction. Otherwise, the slave terminates the transaction with a Disconnect target termination. The device supports burst reads from its internal memory address space.
When accessing a large memory space in one of the burst-capable address spaces of the device, it is recommended to use the memory read multiple transaction. In this kind of transaction the device’s slave pre-fetches bursts from the internal memory so that the PCI transaction is faster and more efficient.
6.1.1.4 PCI Slave Termination
The device's PCI Slave unit supports the following three types of target termination events described in the PCI specification:
• Target Abort
• Retry
• Disconnect
The PCI specification requires that the latency for the first data read not exceed 15 cycles, and that consecutive data (in the case of a burst) not exceed 7 cycles. The slave has an idle timer that counts the number of PCI clock cycles while the target awaits data from the device's resources. The timer is activated when the PCI slave identifies a transaction to one of the device's BARs, and is reset after a transaction occurs (assertion of PCI_TRDYn for reads and PCI_IRDYn for writes). In the following sections, this timer is referred to as the Idle counter.
The slave has two configuration parameters—Timeout0 and Timeout1. • Timeout0 is the latency for the first data read • Timeout1 is the latency for consecutive data transactions
Configuration
• To configure the latency for the first data read, set the field in the Timer and Retry Register (Table 308 p. 557) accordingly.
• To configure the latency for consecutive data transactions, set the field in the Timer and Retry Register (Table 308 p. 557) accordingly.
Target Abort
Target Abort is generated when the slave detects a parity error on the address sent on the PCI. In this case, the device performs the following:
• Terminates the transaction with Target Abort.
• Sets the bit in the PCI Interrupt Cause Register (Table 565 p. 797).
• When the corresponding bit is set in the PCI_SERRn Mask Register (Table 309 p. 558), asserts the PCI_SERRn signal.
Target Retry
The slave has a configurable timeout limit, which is applied to the Idle counter. The limit is configured using the field in the Timer and Retry Register (Table 308 p. 557).
The RETRY termination is activated in one of the following cases:
• In a write transaction, the Idle counter reaches the limit and the buffers are still full of data from the previous transaction.
• In a read transaction, the Idle counter reaches the limit and read data is not yet available.
• In non-posted writes (configuration writes), the Idle counter reaches the limit and the write buffer is not empty.
In any of the above cases, the slave terminates the transaction with a RETRY termination. The master that initiated the transaction is expected to issue the exact same transaction again.
Target Disconnect
The slave has a configurable timeout limit, which is applied to the Idle counter. The limit is configured using the field in the Timer and Retry Register (Table 308 p. 557). The slave generates a DISCONNECT termination in any of the following cases:
• Upon a burst read or write access with address LSB[1:0] not equal to '00'.
• Upon a burst read or write access that reaches a BAR boundary.
• Upon a burst read or write access to internal registers.
• Upon a burst write, when the timeout has expired and data n+1 cannot be stored in the slave buffers because they are full.
In any of the above cases, the slave terminates the transaction with a Disconnect termination.
6.1.1.5
PCI Configuration Transactions
The slave responds to a Type 0 PCI configuration transaction when:
• IDSEL is active.
• A configuration command is decoded.
• AD[1:0] is '00' (Type 0 Configuration Command).
See Appendix C.12.1 "PCI Registers" on page 553 for a full description of the configuration registers supported by the device. Other optional registers that are specified in the PCI standard specifications and are not implemented by the device are read as all 0 from the PCI.
6.1.1.6
Fast Back-to-Back
The device's slave is capable of supporting fast back-to-back transactions, compliant with the PCI specification and requirements. The slave can track bus transactions from a final data transfer (PCI_FRAMEn high, PCI_IRDYn low) directly to an address phase (PCI_FRAMEn low, PCI_IRDYn high) on consecutive clock cycles. Since the device's PCI_DEVSELn timer is set to medium, contentions can be avoided on the PCI_DEVSELn, PCI_PERRn, PCI_TRDYn, and PCI_STOPn signals.
6.1.1.7
I/O-Mapped Transactions
The device does not support I/O transactions on the PCI.
6.1.1.8
Memory-Mapped Transactions
This section describes the device’s PCI slave memory-mapped transactions.
Address Spaces
The device consumes two address regions on the PCI:
PCI Internal Address Space
12-bit address space (the base address is the 20 MSBs) that contains the PCI Configuration (Header Type 00) and debug registers. A PCI Internal Address Space transaction is answered by the device's PCI Slave unit if PCI_AD[31:12] equals the 20 MSBs of the PCI memory-mapped internal base address. The PCI internal address space is also accessible from the TWSI interface.
Device Address Space
26-bit address space (the base address is the 6 MSBs). This address space is used to address the majority of the device's registers, as well as all internal memories. A Device Address Space transaction is answered by the device's PCI Slave unit if PCI_AD[31:26] equals the 6 MSBs of the device's memory-mapped internal base address.
Note
All memory-mapped transactions are accepted only if the bit in the PCI Status and Command Register (Table 300 p. 553) is set.
The device’s PCI interface supports a Type 00 configuration space header, as defined in the PCI specification. The devices in these three families are single-function devices.
Configuration
• To configure the base address for the PCI internal address space, set the field in the PCI Memory Mapped Internal Base Address Register (Table 303 p. 556) accordingly.
• To configure the base address for the PCI device address space, set the field in the Device Memory Mapped Internal Base Address Register (Table 304 p. 556) accordingly.
• The PCI Internal Address Space base address and the Device Address Space base address may be configured during the PCI configuration cycle or via the TWSI Interface.
Address Completion Register
The device consumes a 26-bit address space on the PCI bus, however it uses 32 bits for internal address space. To enable the host processor to access the entire device’s address space, a window-based approach is used. The device defines four address windows (address regions) of 24 bits each. Every PCI access is mapped to one of these address regions. Whenever the device identifies an access on the PCI bus to its address space, it uses PCI_AD[25:24] pins as a region identifier and performs address completion to one of four 8-bit values. The address seen internally by the device is {8-bit address completion value, PCI_AD[23:0]}. The 8-bit address completion value is taken from the Address Completion register, where there are four possible values indexed according to the PCI_AD[25:24].
Configuration
In the Address Completion Register (Table 85 p. 379):
• To configure region0 (for PCI_AD[25:24] = 0), set the field accordingly.
• To configure region1 (for PCI_AD[25:24] = 1), set the field accordingly.
• To configure region2 (for PCI_AD[25:24] = 2), set the field accordingly.
• To configure region3 (for PCI_AD[25:24] = 3), set the field accordingly.
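As an illustration of the window scheme described above, the following minimal C sketch models the address-completion step. The addr_completion array and function name are hypothetical, standing in for the four 8-bit values programmed in the Address Completion Register; this is not driver code from this document.

```c
#include <stdint.h>

/* Hypothetical shadow of the four 8-bit region values programmed in the
 * Address Completion Register (indexed by PCI_AD[25:24]). */
static uint8_t addr_completion[4];

/* Model of the translation: a PCI access within the device's 26-bit
 * address space is completed to the 32-bit internal address
 * {addr_completion[PCI_AD[25:24]], PCI_AD[23:0]}. */
static uint32_t pci_to_internal(uint32_t pci_ad)
{
    uint8_t region = (uint8_t)((pci_ad >> 24) & 0x3);   /* PCI_AD[25:24] */
    return ((uint32_t)addr_completion[region] << 24) |  /* 8-bit completion value */
           (pci_ad & 0x00FFFFFFu);                      /* PCI_AD[23:0] */
}
```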
6.1.2
PCI Master Unit
The PCI Master unit is used by the device for the following:
• Receive packets from the host CPU memory.
• Transmit packets to the host CPU memory.
• Asynchronous message notification of device FDB events to the host CPU memory.
The device's PCI Master unit supports the following transactions:
• Memory Read
• Memory Write
• Memory Read Line
• Memory Read Multiple
• Memory Write and Invalidate
Memory Write and Invalidate and Memory Read Line cycles are performed when a transaction accessing the PCI memory space requests a data transfer size equal to a multiple of the PCI cache-line size and the transfer is also cache-line aligned. For these transactions, the bit in the PCI Status and Command Register (Table 300 p. 553) must be set to 1.
Memory Read Multiple
Memory Read Multiple is performed when a transaction accessing the PCI memory space requests a data transfer size greater than a cache-line or crosses the PCI cache-line size boundary.
Note
The device supports a cache-line size of eight 32-bit words only. If the PCI cache-line register is set to any other value, the cache-line size is regarded as being set to 0.
The PCI Master unit consists of 512 bytes of posted-write buffer data and 512 bytes of read buffer data. It can accommodate up to four write transactions plus four read transactions. In the device, the PCI master posted-write buffer permits the SDMA to complete memory writes when the PCI bus becomes available, even if the PCI bus is busy while the posted data is written to the target PCI device (Host bridge). The read buffer is used to absorb the incoming data from the PCI. Read and Write buffer implementation guarantees that there are no wait states inserted by the master.
Note
PCI_IRDYn is never de-asserted in the middle of a transaction.
6.1.2.1
PCI Master Write Operation
The PCI Master unit supports combining of memory writes: it merges consecutive write transactions when possible. This is especially useful for long DMA transfers, where a long burst write is required. Combining is always enabled in the device. For write combining, the following conditions must be met:
• The start address of the second transaction matches the address of data n+1 of the first transaction (for example, a write to a contiguous buffer in host memory).
• The request for the new transaction arrives while the first transaction is still in progress.
The benefit of write combining is seen when writing packets to CPU memory and the buffer is longer than 128 bytes.
The following transactions are not initiated by the device's PCI Master unit:
• I/O Read
• I/O Write
• Configuration Read
• Configuration Write
• Interrupt Acknowledge
• Special Cycle
• DAC Cycles
Master Fast Back-to-Back
The master supports fast back-to-back transactions. If a new transaction is pending while a transaction is in progress, the master starts the new transaction as soon as the first one ends, without inserting a dead cycle. The master issues a fast back-to-back transaction when the following conditions occur:
• The first transaction is a write.
• The new transaction request arrives while the first transaction is still in progress.
Configuration
To enable/disable Fast Back-to-Back, set the bit in the PCI Status and Command Register (Table 300 p. 553).
6.1.2.2
PCI Master Read Operation
On a read transaction, as soon as the SDMA requests PCI read access, the master drives the transaction on the bus (after obtaining bus mastership). The returned data is written into the read buffer. The master also supports combining of read transactions, which is especially useful for long SDMA reads. The PCI target is capable of driving long burst data without inserting wait states.
Combining is always enabled in the device. Whenever possible, the master combines consecutive read transactions. For read combining, the following conditions must be met:
• The start address of the second transaction matches the address of data n+1 of the first transaction.
• The request for the new transaction arrives while the first transaction is still in progress.
6.1.2.3
PCI Master Termination
This section describes the device’s PCI Master Target termination.
PCI Master Termination Overview
The master issues a Master Abort event only if there is no target response to the initiated transaction within four clock cycles. In this case, the master de-asserts PCI_FRAMEn and on the next cycle de-asserts PCI_IRDYn. When this happens, the bit in the PCI Interrupt Cause Register (Table 565 p. 797) is set. The master supports all types of target termination—Retry, Disconnect, and Target Abort.
Target Retry
If a target has terminated a transaction with Retry, the device's master re-issues the transaction. By default, the master retries a transaction until it is served. However, the number of Retry attempts can be limited. If the master reaches this number, it stops retrying, and the bit in the PCI Interrupt Cause Register (Table 565 p. 797) is set.
Configuration
To configure the number of retries, set the field in the PCI Status and Command Register (Table 300 p. 553) accordingly.
Target Disconnect
If a target terminates a transaction with Disconnect, the master re-issues the transaction from the point it was disconnected. If, for example, the master attempts to burst eight 32-bit words starting at address 0x18 and the target disconnects the transaction after the fifth data transfer, the master will re-issue the transaction with the address 0x2C, in order to burst the three remaining words.
Target Abort
If a target abnormally terminates a transaction with a Target Abort, the master does not attempt to re-issue the transaction. In this event, the bit in the PCI Interrupt Cause Register (Table 565 p. 797) is set.
6.1.3
PCI Parity and Error Support
The device implements all parity features required by the PCI specification, including PCI_PARITYn, PCI_PERRn, and PCI_SERRn generation and checking. Assertion of the PCI_PERRn and PCI_SERRn signals is configurable, depending on the configuration of the bit in the PCI Status and Command Register (Table 300 p. 553).
Configuration
• To enable/disable Parity Error support, set the bit in the PCI Status and Command Register (Table 300 p. 553).
• Assertion of PCI_SERRn depends on the setting of the PCI_SERRn Mask Register (Table 309 p. 558).
6.1.4
Disabling the PCI Interface
When the PCI Interface is not used, the PCI_EN pin must be pulled down. The remaining PCI Interface pins may be left unconnected (NC).
6.1.5
Packet Reception and Transmission
This section describes the device's packet reception and transmission via the PCI bus.
6.1.5.1
SDMA Overview
The device incorporates 16 bus-master DMAs, which transfer packets from device memory to CPU memory (receive path) and from CPU memory to device memory for transmission (transmit path). The 16 DMAs are divided into eight RxDMAs and eight TxDMAs, one for each of the eight traffic class (TC) queues. (SecureSmart devices have 4 traffic classes.) They are referred to as Serial DMAs (SDMA), as they operate on a serialized linked list of descriptors. For convenience, the SDMA is designed as a NIC-style DMA on the host processor. The SDMA uses the PCI as a Master for all its transactions: descriptor read/write and buffer read/write. The maximum burst size is 128 bytes and the minimum burst size is 8 bytes.

The device incorporates eight dedicated receive DMA queues and eight dedicated transmit DMA queues, operated by two dedicated DMA engines (one for receive and one for transmit) that operate concurrently. Each queue is managed by buffer descriptors, which are chained together and managed by the software. The Rx SDMA engine transfers packets from device memory to CPU memory (receive path), and the Tx SDMA engine transfers packets from CPU memory to device memory for transmission (transmit path). The packet data is stored in memory buffers, with a single packet spanning multiple buffers if necessary.

The buffers are allocated by the CPU and are managed through chained descriptor lists. Each descriptor points to a single memory buffer and contains all the relevant information relating to that buffer (i.e., buffer size, buffer pointer, etc.) and a pointer to the next descriptor. Each descriptor also has an ownership bit, which indicates the current owner of the descriptor and its buffer (CPU or SDMA). The SDMA may process only descriptors marked with SDMA ownership. Data is read from or written to the buffer according to information contained in the descriptor. Whenever a new buffer is needed (end of buffer or end of packet), a new descriptor is automatically fetched and the data movement operation continues using the new buffer.

Ownership of processed packet descriptors is returned by the SDMA to the CPU. (This operation is also referred to as Close Descriptor.) A maskable Tx/Rx buffer interrupt is asserted to indicate a Close Descriptor event. The SDMA also delivers status information per packet via the representative descriptor of a packet (the first descriptor at Rx and the last descriptor at Tx). Figure 14 illustrates an example of a memory arrangement for a single packet using three buffers. A cyclic ring of descriptors can be built by the software, or the chain can terminate with a NULL next-pointer.
Figure 14: Memory arrangement of a single packet spanning three buffers. Each of the three chained descriptors holds a command/status word, a byte count/buffer size word, a pointer to its packet buffer, and a next-descriptor pointer.

6.1.5.2
Packet Reception (from the Device to the CPU)
The CPU receives packets from the device directly into a pre-allocated space in host memory. To receive packets for a given CPU traffic class queue, the CPU must do the following: 1. Prepare a linked list of descriptors located in the host memory. 2. Configure one of the eight device Rx SDMAs with the address of the first descriptor in the list. 3. Enable the given device SDMA. A packet may occupy several buffers. For all descriptors except the first one, the DMA closes the descriptor by returning ownership after filling the buffer with packet data. After the entire packet has been copied to the host memory, the DMA closes the first descriptor, ownership is returned, and the packet information is written to the relevant descriptor fields. Then the DMA automatically starts transferring the next packet.
Rx SDMA Initialization
For each CPU traffic class to which traffic to the CPU is assigned, the CPU must prepare the host memory descriptor list and initialize the corresponding Rx SDMA. The host CPU may also evaluate which SDMA Rx queues have packets pending by reading the field in the Receive SDMA Status Register (Table 121 p. 397).
To initialize a receive operation, the CPU must do the following:
1. Prepare a chained list of descriptors and packet buffers with ownership = 1 (SDMA). NOTE: The Rx SDMA supports eight priority queues. To take advantage of this capability, a separate list of descriptors and buffers must be prepared for each of the priority queues.
2. Write the pointer to the first descriptor to the field in the Receive SDMA Current Descriptor Pointer Register (0<=n<8) (Table 122 p. 397) associated with the priority queue to be started. If multiple priority queues are needed, initialize this register for each queue.
3. Enable the Rx SDMA channel by setting the relevant bit to 1 in the Receive SDMA Queue Command Register (Table 115 p. 394).
After completing these steps, the Rx SDMA is ready to deliver packets destined to that queue. The Rx SDMA fetches the current descriptor according to the address in the Receive SDMA Current Descriptor Pointer Register, triggered by setting the queue enable bit. The Rx SDMA generates write bursts via the PCI to deliver the packet data to the current buffer. In parallel, the Rx SDMA updates the current descriptor pointer and fetches the next descriptor. If the buffer is filled before the end of the packet, the Rx SDMA continues to write the packet data to the next buffer. Upon every descriptor close, the Rx SDMA updates the descriptor First and Last indications: the first descriptor with First = 1 and the last descriptor with Last = 1 (see "Rx SDMA Descriptor" on page 70).
Note
The Rx SDMA closes a descriptor upon completing the writing of data to the descriptor buffer; however, the first descriptor of the packet is closed only after the packet has been received in its entirety, at which time the Rx SDMA sets the first descriptor's ownership to 'CPU', sets the First indication, and updates the status word and byte count of the packet.

Rx SDMA Descriptor Pointer
The Rx SDMA holds one 32-bit pointer register per queue. This register points to the first descriptor of a receive packet. The CPU must initialize this register before enabling DMA operation. The value used for initialization should be the address of the first descriptor to use. After the DMA channel is enabled, this register is updated by the Rx SDMA as it moves through the descriptor chain, and it reflects the current pointer location within the linked list of descriptors.
Note
The CPU must not write to this register while its corresponding DMA is enabled. Modifying this register is allowed only when the respective bit is reset. This register may be read to assess the progress of the DMA, as well as to monitor the status of the queue.
Rx SDMA Buffer Write Done Interrupt
When the Rx SDMA completes writing to the current descriptor buffer, it sets the descriptor ownership to the CPU. If the descriptor Enable Interrupt bit of the current buffer is set to 1, the Rx SDMA sets the relevant interrupt bit in the Receive SDMA Interrupt Cause Register (Table 561 p. 796).
Rx SDMA Resource Error Event
A resource error event occurs when the Rx SDMA reaches the end of the available descriptors in the linked list. The condition occurs if either:
• The current descriptor for the Rx SDMA has a NULL next-descriptor pointer, OR
• The SDMA has reached a descriptor that is owned by the CPU (i.e., when a cyclic descriptor list is implemented).
This may happen at the start of a packet or in the middle of writing it to host memory (if the packet occupies more than a single descriptor buffer). It may also happen due to a speculative pre-fetch of a descriptor (i.e., without any current packet destined to the given queue).
In the case of a resource error event, the device's Rx SDMA mechanism performs the following actions:
• The last valid descriptor is closed by setting the descriptor Last = 1, and the first descriptor of the broken packet is closed by setting the descriptor First = 1 and Resource-Error = 1 (see "Rx SDMA Descriptor" on page 70).
• The corresponding Rx SDMA Interrupt Cause bit is set in the Receive SDMA Interrupt Cause Register (Table 561 p. 796).
• If the resource error is due to reaching a descriptor with a NULL next-descriptor pointer, the relevant Rx SDMA Enable Queue bit is cleared in the Receive SDMA Queue Command Register (Table 115 p. 394). If the resource error is due to reaching a descriptor owned by the CPU, the Rx SDMA Enable Queue bit remains set, and the Rx SDMA maintains its pointer to the current descriptor (which is currently owned by the CPU).
• If the resource event occurred on a real packet (and not upon a speculative pre-fetch), the 8-bit resource event counter for the given queue is incremented (see "Rx SDMA Counters" on page 70).
Note
If the resource error is due to reaching a descriptor owned by the CPU, then the CPU must update the current descriptor with a pointer to a free buffer and set the descriptor ownership to the SDMA. On the next transmit attempt, the Rx SDMA will detect that descriptor ownership is now the SDMA and will proceed to transmit the packet to the descriptor buffer.
If the CPU does not immediately have a free buffer to supply to the descriptor, it is recommended that the CPU:
1. Disable the Rx SDMA queue by setting the relevant field to 1 in the Receive SDMA Queue Command Register (Table 115 p. 394). This prevents the Rx SDMA from performing a PCI read of the descriptor on every attempt to transmit the packet for this queue.
2. Once the descriptor is updated, re-enable the Rx SDMA by setting the relevant bit to 1 in the Receive SDMA Queue Command Register (Table 115 p. 394).
If the resource error is due to reaching a descriptor with a NULL next-descriptor pointer, the CPU must:
1. Prepare the new linked list of descriptors in host memory.
2. Write the pointer to the start of the descriptor list to the field in the Receive SDMA Current Descriptor Pointer Register (0<=n<8) (Table 122 p. 397) associated with the priority queue to be started.
3. Enable the relevant DMA channel by setting the relevant bit to 1 in the Receive SDMA Queue Command Register (Table 115 p. 394).
If the Rx SDMA successfully writes the packet to the descriptor buffers, the SDMA closes the first descriptor of the packet by returning ownership, setting the First bit, and writing the packet information in the descriptor fields.
If the resource error occurred on a real packet (and not upon a speculative pre-fetch), each queue can be independently configured to operate in one of two modes:
ABORT mode
Upon an Rx SDMA resource error, the packet that failed to be written in its entirety to host memory is dropped. The next packet is scheduled according to the CPU port scheduling algorithm (see Section 15.3.2 "Transmit Queue Scheduling" p. 306).
RETRY mode
Upon Rx SDMA resource error, the packet that failed to be written in its entirety to host memory remains scheduled for transmission. No other packet is handled from other traffic class queues until the current packet is successfully transmitted to host memory. When the resource error is resolved (i.e., the current descriptor is owned by the SDMA), the packet is transmitted again in its entirety.
To configure queue RETRY/ABORT mode, set the relevant bit of the field in the SDMA Configuration Register (Table 114 p. 393) accordingly.
Rx SDMA Disable Queue Operation
The CPU may disable any active DMA queue at any time.
To disable a queue, set the relevant bit of the field to 1 in the Receive SDMA Queue Command Register (Table 115 p. 394). If the queue is disabled while the Rx SDMA is in the middle of writing a packet to host memory, the Rx SDMA continues to process the packet until packet end or a resource error event, and then disables the queue operation by resetting its enable bit.

Rx SDMA Parity Error Handling
The Rx SDMA may encounter a parity error event, derived from the PCI bus, upon a descriptor fetch operation. This case is handled like a resource error event, except that a parity interrupt is asserted (see Section 6.1.3 "PCI Parity and Error Support" on page 65).
Rx SDMA Parity Error on Data Read from the Device’s Buffers Memory
When the device reads the packet from its buffer memory, it conducts a parity check. A parity error indicates that the packet’s data was corrupted due to a soft error on the device’s buffers memory. When the packet is transmitted via a network port, it is transmitted with a bad CRC so that the receiving device will reject it. On the Rx SDMA CPU interface this is indicated by setting the bus_error bit in the first packet descriptor to 1 (see "Rx SDMA Descriptor" on page 70).
Rx SDMA Invalid CRC
All packets forwarded to the host CPU contain a TO_CPU DSA tag. Due to the DSA tag packet modification, the packet Ethernet CRC is always invalid. The Rx SDMA does not recalculate the packet’s CRC. Instead it indicates that the packet contains the four bytes of the packet’s CRC, but that these bytes are invalid. This is indicated by setting the invalid_CRC bit in the first packet descriptor to 1 (see "Rx SDMA Descriptor" on page 70).
Rx SDMA Data Byte Order
In default mode, the device writes data to the PCI bus in Big Endian byte order. Two bits are defined to support byte swapping, for CPUs that use a different Endian byte ordering and different word sizes (32-bit or 64-bit):
Word Swap Mode
Within each 64-bit format, swaps the highest 32 bits with the lowest 32 bits.
Byte Swap Mode
Swaps bytes within every 32 bits (byte 3 swapped with byte 0; byte 2 swapped with byte 1).
Configuration
In the SDMA Configuration Register (Table 114 p. 393):
• To configure Word Swap mode, set the bit.
• To configure Byte Swap mode, set the bit.
Rx SDMA Counters
The Rx SDMA maintains the following per-queue counters:
Packet counters (32 bit)
PktCnt in the Receive SDMA Packet Count Register (0<=n<8) (Table 126 p. 398).
Byte counters (32 bit)
BCCnt in the Receive SDMA Byte Count Register (0<=n<8) (Table 127 p. 398).
Resource Error counters (8 bit)
The fields in the Receive SDMA Resource Error Count 0 Register (Table 128 p. 399) and in the Receive SDMA Resource Error Count 1 Register (Table 129 p. 399).
For each successful packet forwarded to the CPU, the relevant packet counter is incremented by one and the relevant byte counter is incremented by the packet’s byte count. If the packet is unsuccessful due to resource error, the Rx SDMA increments the respective Resource Error counter by one.
When working in Retransmit On Error (RETRY) mode, this counter is incremented upon each unsuccessful packet transfer, until the packet transfer succeeds.
Rx SDMA Descriptor
The CPU can learn information about the packet received from the device by examining the Rx SDMA descriptor and the packet TO_CPU DSA tag.
Notes
• All packets forwarded to the CPU contain a TO_CPU DSA tag (see Section 7.2 "Packets to the CPU" on page 102).
• The format of the TO_CPU DSA tag is found in Section A.1 "Extended DSA Tag in TO_CPU Format" on page 333.
Rx SDMA descriptor properties:
• The descriptor length is 16 bytes, and it must be 16-byte aligned (i.e., Descriptor_Address[3:0] == 0000).
• Descriptors may reside anywhere in the address space except for the NULL address (0x00000000), which is used to indicate the end of a descriptor chain.
• If the linked list of descriptors is in a chain formation (i.e., not a ring formation), the last descriptor in the chain must have a NULL value in the Next Descriptor Pointer field. In a ring formation, the end of the chain is indicated by arriving at a descriptor that is owned by the CPU.
• The length of Rx buffers (whose pointers reside in the Rx SDMA descriptors) is limited to 16 KB, and the buffers must have a 128-byte aligned address (i.e., Buffer_Address[6:0] == 7'b0000000).
• The minimum Rx buffer length is eight bytes.
The Rx SDMA descriptor consists of four 32-bit words: a Command/Status word, a Byte Count/Buffer Size word, a Buffer Pointer (bits [31:7]), and a Next Descriptor Pointer (bits [31:4]), described in Table 5 through Table 8.

Table 5: Rx SDMA Descriptor—Command/Status Field
Bits | Name | Description | Set By
31 | O | Ownership bit: 0 = Buffer owned by CPU; 1 = Buffer owned by the device SDMA. | CPU/Device
30 | bus_error | The packet had an error while being fetched from device memory (see "Rx SDMA Parity Error on Data Read from the Device's Buffers Memory" on page 69). NOTE: Valid only on the packet's first descriptor (First = 1). | Device
29 | EI | Enable Interrupt. When set, a maskable interrupt is generated upon closing this descriptor (see "Rx SDMA Buffer Write Done Interrupt" on page 67). | CPU
28 | Resource Error | A resource error event occurred (see "Rx SDMA Resource Error Event" on page 67). NOTE: Valid only on the packet's first descriptor (First = 1). | Device
27 | F | First. When set, indicates that the buffer associated with this descriptor is the first buffer of a packet. | Device
26 | L | Last. When set, indicates that the buffer associated with this descriptor is the last buffer of a packet. | Device
25:0 | Reserved | Must be 0. | Device
Table 6: Receive Descriptor—Byte Count Field
Bits | Name | Description | Set By
31 | Reserved | Must be 0. | Device
30 | Invalid_CRC | States whether the Ethernet packet had a valid or an invalid CRC. An invalid CRC can be caused by the device modifying the packet's content (see "Rx SDMA Invalid CRC" on page 69). 0 = Valid; 1 = Invalid. NOTE: Valid only on the packet's first descriptor (First = 1). | Device
29:16 | Packet Byte Count[13:0] | When the last descriptor is closed, this field in the first descriptor of the packet is written by the device with the total byte count of the received packet. For Ethernet packets the packet length ALWAYS includes the 4 bytes of CRC. NOTE: Valid only on the packet's first descriptor (First = 1). | Device
15:14 | Reserved | Must be 0. | Device
13:3 | Buffer Size | Buffer size in quantities of 8 bytes. This field indicates the size, in 8-byte resolution, of the CPU memory allocated for this descriptor's buffer. When the number of bytes written to this buffer equals the Buffer Size value, the DMA closes the descriptor and moves to the next descriptor. | CPU
2:0 | Reserved | Must be 0. | Device
Table 7: Rx SDMA Descriptor—Buffer Pointer
Bits | Name | Description | Set By
31:7 | Buffer Pointer | The 25 most significant bits of the 32-bit address of the buffer associated with this descriptor. NOTE: The buffer address must be 128-byte aligned. | CPU
6:0 | Reserved | Must be 0. | CPU

Table 8: Rx SDMA Descriptor—Next Descriptor Pointer
Bits | Name | Description | Set By
31:0 | Next Descriptor Pointer | 32-bit pointer that points to the beginning of the next descriptor. Bits [3:0] must be set to 0 (descriptors are 16-byte aligned). DMA operation is stopped when a NULL (all zero) value is encountered in the Next Descriptor Pointer field. | CPU
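For reference, Tables 5 through 8 can be summarized as a C view of the 16-byte descriptor. This is a sketch only; the struct and bit-mask helper names are not taken from the register tables.

```c
#include <stdint.h>

/* 16-byte Rx SDMA descriptor, 16-byte aligned (see Tables 5-8). */
struct rx_sdma_desc {
    uint32_t cmd_status;  /* O, bus_error, EI, Resource Error, F, L     */
    uint32_t byte_count;  /* Invalid_CRC, Packet Byte Count, Buffer Size */
    uint32_t buf_ptr;     /* bits [31:7]: 128-byte-aligned buffer address */
    uint32_t next_desc;   /* 16-byte aligned; NULL terminates a chain    */
};

/* Illustrative masks for the Command/Status word. */
#define RX_DESC_OWN_SDMA  (1u << 31)  /* 1 = owned by the device SDMA */
#define RX_DESC_BUS_ERR   (1u << 30)
#define RX_DESC_EI        (1u << 29)
#define RX_DESC_RES_ERR   (1u << 28)
#define RX_DESC_FIRST     (1u << 27)
#define RX_DESC_LAST      (1u << 26)
```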
6.1.5.3
Packet Transmission (from the CPU to the Device)
The CPU transmits packets to the device using the Tx Serial DMA (SDMA) mechanism. The CPU initializes a linked list of Tx SDMA descriptors in host memory space. Each descriptor is associated with a data buffer to be transmitted, where a packet can span multiple descriptors. Once the Tx SDMA descriptor linked list is set up and ready for processing, the CPU configures the device with the address of the head of the linked list and enables the relevant Tx SDMA. The device starts reading the data from the linked list and passes it to the device's buffer memory until reaching a descriptor marked with L = 1. Multiple packets can be placed in a single linked list. If the list is in a chain format, the last descriptor of the last packet must have its next-descriptor pointer set to NULL (all zeros). If a ring descriptor list format is used, the list is terminated by reaching a descriptor owned by the CPU. Once enabled, the device's Tx SDMA transmits all the packets in the linked list, stopping when it reaches a descriptor with a NULL next-descriptor pointer or a descriptor owned by the CPU.
Tx SDMA Initialization
To initialize a transmit operation, the CPU must do the following:
1. Prepare a chained list of initialized Tx SDMA descriptors. The descriptor owner must be set to '1' (SDMA). NOTE: The Tx SDMA supports eight priority queues. To take advantage of this capability, a separate list of descriptors and buffers must be prepared for each of the priority queues.
2. Write the pointer to the first descriptor to the field in the Transmit SDMA Current Descriptor Pointer Register (0<=n<8) (Table 117 p. 396) associated with the priority queue to be started. If multiple priority queues are needed, initialize this register for each queue.
3. Enable the Tx SDMA channel by setting the relevant bit to 1 in the Transmit SDMA Queue Command Register (Table 116 p. 395).
After completing these steps, the Tx SDMA starts to arbitrate between the active transmit queues, according to fixed priority, on a packet-by-packet basis. The Tx SDMA then fetches the first descriptor from the specific queue it has decided to serve (according to the address in the current descriptor pointer register) and starts transferring data from the memory buffer to the device's buffers memory. The current descriptor pointer register is updated with the next descriptor pointer. When the Tx SDMA completes sending the buffer's data to the device, it closes the respective descriptor by returning ownership to the CPU. If this is not the end of the packet, the Tx SDMA then fetches the next descriptor and continues to read packet data from the next buffer. Upon completing the processing of all the buffers of a packet, the Tx SDMA again arbitrates between the active transmit queues to schedule the next packet for transmission. On the selected queue, if the Tx SDMA has reached the end of the linked list, it resets its respective enable bit, thus disabling that DMA channel. A maskable interrupt in the Transmit SDMA Interrupt Cause Register (Table 563 p. 797) is also generated to indicate this event. The CPU can transmit packets again on the disabled channel by repeating the above initialization sequence.
Tx SDMA Descriptor Pointer
The Tx SDMA holds one 32-bit pointer register per queue. This register points to the first descriptor of a packet transmitted from the CPU. The CPU must initialize this register before enabling DMA operation. The value used for initialization should be the address of the head of the Tx SDMA descriptor list. After the DMA channel is enabled, this register is updated by the Tx SDMA as it moves through the descriptor chain, and it reflects the current pointer location within the linked list of descriptors.
Note
The CPU must not write to this register while its corresponding DMA is enabled. Modifying this register is allowed only when the respective enable bit is reset. This register may be read to assess the progress of the DMA, as well as to monitor the status of the queue.
Tx SDMA Buffer Read Done Interrupt
When the Tx SDMA completes the read operation on the current buffer, it transfers ownership of the buffer's descriptor to the CPU. In addition, if the Enable Interrupt bit in the descriptor of the current buffer is set to 1, the Tx SDMA sets the relevant interrupt bit in the Transmit SDMA Interrupt Cause Register (Table 563 p. 797).
Tx SDMA Resource Error Event
A Tx SDMA resource error event occurs when the Tx SDMA reaches the end of the descriptor linked list (a NULL next-descriptor pointer or a CPU-owned descriptor) in the middle of a packet chain (i.e., after fetching the first descriptor and before fetching the last descriptor of the packet). This event may occur only due to a bad CPU configuration. In the event of a resource error, the Tx SDMA takes the following steps:
1. The Tx SDMA disables the DMA channel of the relevant queue by resetting the relevant bit in the Transmit SDMA Queue Command Register (Table 116 p. 395).
2. The Tx SDMA asserts the relevant bit in the Transmit SDMA Interrupt Cause Register (Table 563 p. 797). Due to the chain end, the chain-end interrupt is also asserted.
3. The Tx SDMA discards the partial packet read.
Tx SDMA Recovery from Resource Error
To restart the DMA channel, the CPU must perform the initialization operations defined in "Tx SDMA Initialization" on page 73.
Tx SDMA Disable Queue Operation
The CPU may stop any active Tx SDMA queue at any time.
To disable a queue, set the relevant bit to 1 in the Transmit SDMA Queue Command Register (Table 116 p. 395). If this happens in the middle of a packet, the Tx SDMA continues to process the packet until packet end or a resource error event, and then disables the queue operation by resetting its enable bit. In addition, the relevant interrupt in the Transmit SDMA Interrupt Cause Register (Table 563 p. 797) is asserted.
Tx SDMA Parity Error Handling
The Tx SDMA may encounter a parity error event (derived from the PCI bus) upon a descriptor fetch operation or upon a buffer data read operation.
If a parity error occurs upon a descriptor fetch, the Tx SDMA performs the same operations as described in "Tx SDMA Resource Error Event" on page 74, except that a PCI parity interrupt is asserted instead of a Tx SDMA Resource Error interrupt.
If a parity error occurs upon a buffer data read, the Tx SDMA continues processing the current packet, and the packet read is discarded by the device.
Tx SDMA Data Byte Order
In default mode, the device reads data from the PCI bus in Big Endian byte order. Two bits are defined to support byte swapping, for CPUs that use a different Endian byte ordering and different word sizes (32-bit or 64-bit):
Word Swap Mode
Within each 64-bit format, swaps the highest 32 bits with the lowest 32 bits.
Byte Swap Mode
Swaps bytes within every 32 bits (byte 3 swapped with byte 0; byte 2 swapped with byte 1).
Configuration
In the SDMA Configuration Register (Table 114 p. 393):
• To configure Word Swap mode, set the bit.
• To configure Byte Swap mode, set the bit.

Tx SDMA Descriptor
The CPU indicates to the device's Tx SDMA how the packet is to be forwarded through the fields in the Tx SDMA descriptor and through the fields in the packet DSA tag.
Notes
• All packets sent from the CPU to the device must contain a FROM_CPU or FORWARD DSA tag (see Section 7.3 "Packets from the CPU" on page 107).
• The format of the FROM_CPU DSA tag is found in Section A.2 "Extended DSA Tag in FROM_CPU Format" on page 336.
• The format of the FORWARD DSA tag is found in Section A.4 "Extended DSA Tag in FORWARD Format" on page 341.

Tx SDMA descriptor properties:
• The descriptor length is 16 bytes, and it must be 16-byte aligned (i.e., Descriptor_Address[3:0] == 0000).
• Descriptors may reside anywhere in the CPU address space except for the NULL address (0x00000000), which is used to indicate the end of the descriptor chain.
• Descriptors are always fetched in a 16-byte burst.
• The Tx SDMA descriptor list is terminated either by a NULL value in the Next Descriptor Pointer field or by a Tx SDMA descriptor owned by the CPU.
• Tx buffers associated with Tx descriptors are limited in length to a maximum of 64 KB. There are no alignment restrictions for buffers with a length greater than 8 bytes; however, buffers with a payload of one to eight bytes must be aligned to a 64-bit boundary. Zero-size buffers are illegal.
Transmit Descriptor Format

Transmit Descriptor—Command/Status Field
Bits | Name | Description | Set By
31 | O | Ownership bit. When set to '1', the buffer is owned by the Tx SDMA; when set to '0', the buffer is owned by the CPU. | CPU/Device
30 | Reserved | Must be set to 0. | Device
29:24 | Reserved | Must be set to 0. | Device
23 | EI | Enable Interrupt. When set, a maskable interrupt is generated upon closing the descriptor. To limit the number of interrupts and prevent an interrupt-per-buffer situation, the user should set this bit only in descriptors associated with LAST buffers. If this is done, a TxBuffer interrupt is set only when transmission of a frame is completed. | CPU
22 | Reserved | Reserved. | Device
21 | F | If set, indicates that the descriptor is the FIRST descriptor of the packet. | CPU
20 | L | If set, indicates that the descriptor is the LAST descriptor of the packet. | CPU
19:13 | Reserved | Reserved. | Device
12 | RecalcCRC | If set, indicates that the four CRC bytes of the packet are invalid and that the packet's CRC should be recalculated when the packet is transmitted to its destination. | CPU
11:0 | Reserved | Reserved. | Device
Table 11: Transmit Descriptor—Byte Count Field
Bits | Name | Description | Set By
31:16 | Byte Count | Number of bytes in the corresponding buffer. This is the payload size of the buffer in bytes; the field contains only the byte count of the buffer pointed to by this descriptor. | CPU
15:0 | Reserved | Must be 0. |
Table 12: Transmit Descriptor—Buffer Pointer
Bits | Name | Description | Set By
31:0 | Buffer Pointer | 32-bit pointer to the beginning of the buffer associated with this descriptor. Has a 64-bit alignment requirement for buffers with a byte count of 1–8 bytes. | CPU

Table 13: Transmit Descriptor—Next Descriptor Pointer
Bits | Name | Description | Set By
31:0 | Next Descriptor Pointer | 32-bit pointer that points to the beginning of the next descriptor. Bits [3:0] must be set to '0' (the descriptor must be 16-byte aligned). A value of NULL (all zero) can be used to terminate the descriptor list. | CPU
6.1.6
Asynchronous Notifications for FDB Update Messages
The device implements an asynchronous message notification mechanism to notify the host processor about:
• Changes performed automatically by the device on the bridge Forwarding Database (FDB).
• A new source address received on a port or trunk group.
• Replies to host CPU queries to the FDB.
This mechanism is called the AUQ (Address Update Queue), as it implements an internal queue to store the messages received from the Bridge engine until their release to predefined CPU memory via the PCI bus. These messages, called Address Update (AU) messages, can be transferred over the PCI bus into an Address Update Queue (AUQ) defined in host memory. (In non-PCI systems, MAC Update messages can be read by the host CPU as regular address-mapped registers.) The AU message size is 16 bytes, and each AU message transfer is performed using a single PCI Master transaction.
For detailed information regarding the FDB Address Update messages, see Section 11.4.5 "Address Update (AU) Messages" on page 226. The FDB Address Update message format is defined in MAC Update Message Format (Table 401 p. 663).
6.1.6.1
Allocating Host Memory for the Address Update Queue (AUQ)
The host device allocates one or two contiguous buffers (i.e., AUQs) in its memory for AU messages. If two AUQs are allocated, the device automatically switches to the second AUQ when the first AUQ is full, thus allowing more efficient processing of AU messages.
Configuration
• To configure the size of the AUQ (in terms of 16-byte AU messages), set the field in the Address Update Queue Control Register (Table 94 p. 384).
• After configuring the AUQ size, configure the 16-byte aligned AUQ base address by setting the field in the Address Update Queue Base Address Register (Table 93 p. 384). When the CPU sets the AUQ base address, the device validates and loads the AUQ size and base. The AUQ loads these values only if they are valid, and AUQ loading invalidates these registers' values.
• To configure the second AUQ, reconfigure the base address and AUQ size using the same registers.
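The two-step configuration above, sketched with a hypothetical reg_write() accessor and assumed register offsets:

```c
#include <stdint.h>

extern void reg_write(uint32_t offset, uint32_t value);

/* Assumed offsets for the Address Update Queue Control Register
 * (Table 94) and Base Address Register (Table 93). */
#define AUQ_CTRL 0x0Cu
#define AUQ_BASE 0x08u

/* Program one AUQ: size first (in 16-byte AU messages), then the
 * 16-byte-aligned base address, which triggers validation and loading. */
void auq_configure(uint32_t base_addr_16B_aligned, uint32_t num_au_messages)
{
    reg_write(AUQ_CTRL, num_au_messages);
    reg_write(AUQ_BASE, base_addr_16B_aligned);
}
```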
6.1.6.2
AUQ Forwarding to the Host CPU
To initialize the AUQ after exiting reset, the host CPU must configure the AUQ size and base address. Configuring the base address triggers the AUQ to update its internal data structures. AU messages from the device's Bridge engine generate a write burst on the PCI bus, using the PCI Master. For each AU message transfer, the AUQ sets the interrupt in the Miscellaneous Interrupt Cause Register (Table 557 p. 794).
When the AUQ is filled (i.e., the last AU message entry is written to the allocated AUQ memory), the device sets the interrupt in the Miscellaneous Interrupt Cause Register (Table 557 p. 794). If the CPU has configured another AUQ memory area, the AUQ loads its values and continues to write the incoming AU messages to the new area. If there is no other valid AUQ memory area, the device stops sending AU messages until the host CPU configures a valid AUQ memory area, as described in Section 6.1.6.1 "Allocating Host Memory for the Address Update Queue (AUQ)".

Notes
• The CPU is expected to allocate a new AUQ area as soon as the interrupt is asserted. This is important to prevent a lack of address space for the AUQ, which stops FDB table updates.
• For continuous operation, it is recommended to allocate two buffers for AU messages.

6.1.7
PCI Interrupts
See Section 6.6 "Interrupts" on page 98.
6.2
Serial Management Interfaces (SMI)

Table 14: SMI Interfaces
Interface | Relevant Devices | Description
CPU Slave SMI interface | | Slave SMI interface for CPU management access to all address-mapped entities in the device and all PHY devices connected to it. This interface complies with IEEE 802.3 Clause 22.
Master SMI Interface0 | | Master SMI interface for CPU management of, and Auto-Negotiation with, PHY devices connected to the Tri-Speed device's Ethernet Port0 through Port11. This interface complies with IEEE 802.3 Clause 22.
Master SMI Interface1 | | Master SMI interface for CPU management of, and Auto-Negotiation with, PHY devices connected to the Tri-Speed device's Ethernet Port12 through Port23. This interface complies with IEEE 802.3 Clause 22.
Master XSMI interface | 98DX130, 98DX133, 98DX260, 98DX262, 98DX263, 98DX269, 98DX270, 98DX273, 98DX803 | Master SMI for packet processor access to the integrated HyperG.Stack XAUI PHYs and any external components such as XFP PHYs. The CPU may also indirectly access the integrated XAUI PHYs and any external XFP PHY through this interface. This interface complies with IEEE 802.3 Clause 45.
HyperG.Stack Port24 Slave XSMI interface | 98DX130, 98DX133, 98DX260, 98DX262, 98DX263, 98DX269, 98DX270, 98DX273, 98DX803 | Slave SMI interface for CPU management of the device HyperG.Stack Port24. This interface complies with IEEE 802.3 Clause 45. Typically, this interface is directly connected to the device's Master XSMI interface via the board.
HyperG.Stack Port25 Slave XSMI interface | 98DX260, 98DX262, 98DX263, 98DX269, 98DX270, 98DX273, 98DX803 | Slave SMI interface for CPU management of the device HyperG.Stack Port25. This interface complies with IEEE 802.3 Clause 45. Typically, this interface is directly connected to the device's Master XSMI interface via the board.
HyperG.Stack Port26 Slave XSMI interface | 98DX270, 98DX273, 98DX803 | Slave SMI interface for CPU management of the device HyperG.Stack Port26. This interface complies with IEEE 802.3 Clause 45. Typically, this interface is directly connected to the device's Master XSMI interface via the board.
Note
IEEE 802.3 clause 22 compliant SMI Interfaces are referred to as SMI interfaces. IEEE 802.3 clause 45 compliant SMI Interfaces are referred to as XSMI interfaces.
6.2.1
Serial Management Interface Overview
The following section is an overview of the serial management interface.
6.2.1.1
MDC: Serial Management Interface Clock
MDC is the Serial Management Interface clock. In the Master SMI interfaces, M_MDC0, M_MDC1, and M_XMDC are output pins of the device and run at either 1.56 MHz or 12 MHz. In the Slave SMI interface, the CPU_MDC input can run from DC to a maximum rate of 10 MHz. The S_XMDC0, S_XMDC1, and S_XMDC2 inputs can run from DC to a maximum rate of 8.33 MHz.
Configuration
• To set the M_MDC0 speed for Master SMI Interface0, set the bit in PHY Address Register0 (for Ports 0 through 5) (Table 258 p. 514).
• To set the M_MDC1 speed for Master SMI Interface1, set the bit in PHY Address Register2 (for Ports 12 through 17) (Table 260 p. 516).
• To set the M_XMDC speed for the Master XSMI Interface, set the bit in the HyperG.Stack and HX/QX Ports MIB Counters and XSMII Configuration Register (Table 163 p. 430).
6.2.1.2
MDIO: Serial Management Interface Data
MDIO, for all SMI interfaces, is the Serial Management data input/output pin. It is a bi-directional signal that runs synchronously to the relevant MDC.
6.2.1.3
IEEE 802.3 Clause 22 SMI Framing
Although the SMI Interface requires a preamble of 32 bits, the devices are permanently programmed for preamble suppression. A minimum of one Idle MDC cycle is required between two consecutive transactions.

Table 15: SMI Interface Frames
6.2.1.4
IEEE 802.3 Clause 45 SMI Framing
The XSMI implements a 16-bit address register that is used as follows: • For an address cycle, it contains the address of the register to be accessed on the next cycle. • For read, write, and post-read increment-address cycles, the field contains the data for the register. At power-up and reset, the content of the register is undefined.
Write, read, and post-read-increment-address frames access the address register, though write and read frames do not modify its contents. Table 16 outlines the XSMI Interface frames supported.
6.2.2
CPU SMI Interface
The device incorporates a Slave SMI interface, which is used for access to address-mapped entities when the PCI interface is not used. This interface may also be used in parallel with the PCI interface. The CPU can access the device memory space via read and write cycles on the SMI interface. The device registers are PCI-compatible registers with a 32-bit address and 32-bit data. As the Slave SMI interface is compliant with IEEE 802.3u Clause 22, it uses a 5-bit PHY address, a 5-bit register address, and 16-bit data. Consequently, direct access to the device's 32-bit registers is not supported. Read and write accesses are instead performed indirectly, using the 16-bit data portion of the SMI transaction: the device's 32-bit register address is written in two SMI transactions, and the register data to be written or read is transferred in two further SMI transactions.
6.2.2.1
Device SMI PHY Address
All accesses to the device are performed using the device’s PHY address.
The device uses only one PHY address for all accesses to address-mapped entities. The PHY address is a 5-bit address, sampled at reset (for details see the relevant device Hardware Specifications). The host CPU may change this address.
6.2.2.2
Device SMI Registers
As the device uses one SMI PHY address, its SMI register space consists of 32 registers. The device uses the following SMI registers:
• SMI Read-Write Status Register (Table 17 p. 83)
• SMI Write Address MSBs Register (Table 18 p. 83)
• SMI Write Address LSBs Register (Table 19 p. 83)
• SMI Write Data MSBs Register (Table 20 p. 83)
• SMI Write Data LSBs Register (Table 21 p. 84)
• SMI Read Address MSBs Register (Table 22 p. 84)
• SMI Read Address LSBs Register (Table 23 p. 84)
• SMI Read Data MSBs Register (Table 24 p. 84)
• SMI Read Data LSBs Register (Table 25 p. 85)
The following subsections define the offset and content of each SMI register.
SMI Read-Write Status Register
Read-only register used for indicating the read and write status of the SMI interface.

Table 17: SMI Read-Write Status Register
Offset: 0x1F

Bits   Field          Type/InitVal   Description
15:2   Reserved       RO             Reserved for future use.
1      SMIWriteDone   RO 0x1         SMI Interface write status. When this bit is HIGH, the device has completed the current write transaction and is ready for another read or write transaction.
0      SMIReadRdy     RO 0x1         An indication that the device has completed the read transaction and the read data is ready for the CPU to read in the SMI Read Data MSBs Register (Table 24 p. 84) and the SMI Read Data LSBs Register (Table 25 p. 85).
SMI Write Address MSBs Register
Write-only register for the 16 MSBs of the 32-bit address of the device in a write transaction.

Table 18: SMI Write Address MSBs Register
Offset: 0x0

Bits   Field           Type/InitVal   Description
15:0   SMIWrAddrMSBs   WO             16 MSBs of the 32-bit address of the device in a write transaction.
SMI Write Address LSBs Register
Write-only register for the 16 LSBs of the 32-bit address of the device in a write transaction.

Table 19: SMI Write Address LSBs Register
Offset: 0x1

Bits   Field           Type/InitVal   Description
15:0   SMIWrAddrLSBs   WO             16 LSBs of the 32-bit address of the device in a write transaction.
SMI Write Data MSBs Register
Write-only register for the 16 MSBs of the 32-bit data to be written to a device's address-mapped entity.

Table 20: SMI Write Data MSBs Register
Offset: 0x2

Bits   Field           Type/InitVal   Description
15:0   SMIWrDataMSBs   WO             16 MSBs of the 32-bit data to be written to a device's address-mapped entity.
SMI Write Data LSBs Register
Write-only register for the 16 LSBs of the 32-bit data to be written to a device's address-mapped entity.

Table 21: SMI Write Data LSBs Register
Offset: 0x3

Bits   Field           Type/InitVal   Description
15:0   SMIWrDataLSBs   WO             16 LSBs of the 32-bit data to be written to a device's address-mapped entity.
SMI Read Address MSBs Register
Write-only register for the 16 MSBs of the 32-bit address of the device in a read transaction.

Table 22: SMI Read Address MSBs Register
Offset: 0x4

Bits   Field           Type/InitVal   Description
15:0   SMIRdAddrMSBs   WO             16 MSBs of the 32-bit address of the device in a read transaction.
SMI Read Address LSBs Register
Write-only register for the 16 LSBs of the 32-bit address of the device in a read transaction.

Table 23: SMI Read Address LSBs Register
Offset: 0x5

Bits   Field           Type/InitVal   Description
15:0   SMIRdAddrLSBs   WO             16 LSBs of the 32-bit address of the device in a read transaction.
SMI Read Data MSBs Register
Read-only register for the 16 MSBs of the 32-bit data read from the device's address-mapped entity.

Table 24: SMI Read Data MSBs Register
Offset: 0x6

Bits   Field           Type/InitVal   Description
15:0   SMIRdDataMSBs   RO             16 MSBs of the 32-bit data read from the device. NOTE: This register contains valid data only when SMIReadRdy in the SMI Read-Write Status Register (Table 17 p. 83) is HIGH.
SMI Read Data LSBs Register
Read-only register for the 16 LSBs of the 32-bit data read from the device's address-mapped entity.

Table 25: SMI Read Data LSBs Register
Offset: 0x7

Bits   Field           Type/InitVal   Description
15:0   SMIRdDataLSBs   RO             16 LSBs of the 32-bit data read from the device. NOTE: This register contains valid data only when SMIReadRdy in the SMI Read-Write Status Register (Table 17 p. 83) is HIGH.
6.2.2.3
Write Transaction
To write to a device's address-mapped entity, the following SMI transaction must be performed:
1. Write the 16 MSBs of the device's 32-bit address to the SMI Write Address MSBs Register (Table 18 p. 83).
2. Write the 16 LSBs of the device's 32-bit address to the SMI Write Address LSBs Register (Table 19 p. 83).
3. Write the 16 MSBs of the device's 32-bit data to the SMI Write Data MSBs Register (Table 20 p. 83).
4. Write the 16 LSBs of the device's 32-bit data to the SMI Write Data LSBs Register (Table 21 p. 84).
5. Read the SMI Read-Write Status Register (Table 17 p. 83) until SMIWriteDone is HIGH.
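The following is a minimal C sketch of this sequence. It assumes hypothetical platform-supplied Clause 22 primitives smi_c22_read()/smi_c22_write() for raw SMI cycles; the register offsets and the SMIWriteDone bit position follow Tables 17 through 21.

#include <stdint.h>

/* Hypothetical platform primitives for raw IEEE 802.3 Clause 22 SMI cycles. */
extern uint16_t smi_c22_read(uint8_t phy_addr, uint8_t reg_addr);
extern void     smi_c22_write(uint8_t phy_addr, uint8_t reg_addr, uint16_t val);

#define SMI_WR_ADDR_MSB 0x00        /* SMI Write Address MSBs Register */
#define SMI_WR_ADDR_LSB 0x01        /* SMI Write Address LSBs Register */
#define SMI_WR_DATA_MSB 0x02        /* SMI Write Data MSBs Register    */
#define SMI_WR_DATA_LSB 0x03        /* SMI Write Data LSBs Register    */
#define SMI_STATUS      0x1F        /* SMI Read-Write Status Register  */
#define SMI_WRITE_DONE  (1u << 1)   /* SMIWriteDone (bit 1)            */

/* Indirect write of a 32-bit device register through the Slave SMI interface. */
void dev_smi_write(uint8_t dev_phy_addr, uint32_t reg_addr, uint32_t data)
{
    smi_c22_write(dev_phy_addr, SMI_WR_ADDR_MSB, (uint16_t)(reg_addr >> 16)); /* step 1 */
    smi_c22_write(dev_phy_addr, SMI_WR_ADDR_LSB, (uint16_t)reg_addr);         /* step 2 */
    smi_c22_write(dev_phy_addr, SMI_WR_DATA_MSB, (uint16_t)(data >> 16));     /* step 3 */
    smi_c22_write(dev_phy_addr, SMI_WR_DATA_LSB, (uint16_t)data);             /* step 4 */

    /* Step 5: poll SMIWriteDone until the device can accept a new access. */
    while ((smi_c22_read(dev_phy_addr, SMI_STATUS) & SMI_WRITE_DONE) == 0)
        ;
}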
6.2.2.4
Read Transaction
To read from a device's address-mapped entity, the following SMI transaction must be performed:
1. Write the 16 MSBs of the device's 32-bit address to the SMI Read Address MSBs Register (Table 22 p. 84).
2. Write the 16 LSBs of the device's 32-bit address to the SMI Read Address LSBs Register (Table 23 p. 84).
3. Read the SMI Read-Write Status Register (Table 17 p. 83) until SMIReadRdy is HIGH.
4. Read the SMI Read Data MSBs Register (Table 24 p. 84).
5. Read the SMI Read Data LSBs Register (Table 25 p. 85).
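A matching read sketch, under the same assumptions as the write sketch above:

#include <stdint.h>

extern uint16_t smi_c22_read(uint8_t phy_addr, uint8_t reg_addr);
extern void     smi_c22_write(uint8_t phy_addr, uint8_t reg_addr, uint16_t val);

#define SMI_RD_ADDR_MSB 0x04        /* SMI Read Address MSBs Register */
#define SMI_RD_ADDR_LSB 0x05        /* SMI Read Address LSBs Register */
#define SMI_RD_DATA_MSB 0x06        /* SMI Read Data MSBs Register    */
#define SMI_RD_DATA_LSB 0x07        /* SMI Read Data LSBs Register    */
#define SMI_STATUS      0x1F        /* SMI Read-Write Status Register */
#define SMI_READ_RDY    (1u << 0)   /* SMIReadRdy (bit 0)             */

/* Indirect read of a 32-bit device register through the Slave SMI interface. */
uint32_t dev_smi_read(uint8_t dev_phy_addr, uint32_t reg_addr)
{
    smi_c22_write(dev_phy_addr, SMI_RD_ADDR_MSB, (uint16_t)(reg_addr >> 16)); /* step 1 */
    smi_c22_write(dev_phy_addr, SMI_RD_ADDR_LSB, (uint16_t)reg_addr);         /* step 2 */

    /* Step 3: poll SMIReadRdy until the read data is valid. */
    while ((smi_c22_read(dev_phy_addr, SMI_STATUS) & SMI_READ_RDY) == 0)
        ;

    /* Steps 4-5: assemble the 32-bit value from the two data registers. */
    uint32_t msb = smi_c22_read(dev_phy_addr, SMI_RD_DATA_MSB);
    uint32_t lsb = smi_c22_read(dev_phy_addr, SMI_RD_DATA_LSB);
    return (msb << 16) | lsb;
}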
6.2.3
Master SMI Interfaces
This section is relevant for the following devices:
• SecureSmart: 98DX106, 98DX163, 98DX163R, 98DX243, 98DX262
• SecureSmart Stackable: 98DX169, 98DX249, 98DX269
• Layer 2+ Stackable: 98DX130, 98DX166, 98DX246, 98DX250, 98DX260, 98DX270
• Multilayer Stackable: 98DX107, 98DX133, 98DX167, 98DX247, 98DX253, 98DX263, 98DX273
Not relevant for the 98DX803.
The device maintains two IEEE 802.3 Clause 22 compliant Master Serial Management interfaces for managing external GbE PHY devices, as well as for Auto-Negotiation with an external GbE PHY device. Master SMI Interface0 (M_MDC0/M_MDIO0) is used for managing GbE PHY devices connected to the device's Tri-Speed ports 0 through 11. Master SMI Interface1 (M_MDC1/M_MDIO1) is used for managing GbE PHY devices connected to the device's ports 12 through 23.
6.2.3.1
Tri-Speed Ports PHY Address
The device holds a PHY address for each of the PHYs connected to each of the device's ports in configurable registers. The PHY address is used by the SMI interface when accessing the PHY device connected to the port.
Configuration
To configure the PHY address of the PHY devices connected to Tri-Speed ports 0 through 5, set the six PHY address fields in the PHY Address Register0 (for Ports 0 through 5) (Table 258 p. 514) accordingly; this register holds six PHY addresses for the PHY devices connected to ports 0 through 5.
• PHY Address Register1 (for Ports 6 through 11) (Table 259 p. 515) holds six PHY addresses for PHY devices connected to ports 6 through 11.
• PHY Address Register2 (for Ports 12 through 17) (Table 260 p. 516) holds six PHY addresses for PHY devices connected to ports 12 through 17.
• PHY Address Register3 (for Ports 18 through 23) (Table 261 p. 516) holds six PHY addresses for PHY devices connected to ports 18 through 23.

6.2.3.2
Tri-Speed Ports PHY Registers Management via SMI Interface
The device provides a mechanism for PHY registers read and write access. PHY registers read and write transactions are supported by both SMI interfaces, using the SMI0 Management Register (Table 256 p. 513) for accessing PHY devices connected to Master SMI interface0 and the SMI1 Management Register (Table 257 p. 514) for accessing PHY devices connected to Master SMI interface1.
6.2.3.3
Reading a PHY Register
6.2.3.4
M AR VE 4du LL un CO -fn NF zjm ID sx EN e TI * O AL pn , U et ND Te ER chn ND olo A# gie 12 s 10 17 86
To read the value of a PHY register via the PCI interface, TWSI interface, or CPU SMI interface, perform the following procedure. The procedure described is for Master SMI Interface0, but it is the same for Master SMI Interface1:
1. Read the SMI0 Management Register (Table 256 p. 513) until the busy indication is 0. When the read is performed via the TWSI interface, which is considerably slower than the SMI interface, this stage may be skipped.
2. Write the following to the SMI0 Management Register (Table 256 p. 513):
   - 1: Op code for a read transaction.
   - The address of the register to be read.
   - The PHY address from which the register is to be read.
3. Read the SMI0 Management Register (Table 256 p. 513) until the read-valid indication is set to 1. When it is 1, the content of the read register is in the data field.
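A hedged C sketch of this procedure follows. The SMI0 Management Register address and all field positions below are placeholders; the real values must be taken from Table 256 p. 513. The 32-bit accessors dev_reg_read()/dev_reg_write() are assumed (for example, the indirect SMI routines sketched earlier).

#include <stdint.h>

extern uint32_t dev_reg_read(uint32_t addr);
extern void     dev_reg_write(uint32_t addr, uint32_t value);

#define SMI0_MGMT      0x0u                            /* placeholder address     */
#define SMI_BUSY       (1u << 28)                      /* placeholder busy bit    */
#define SMI_READ_VALID (1u << 27)                      /* placeholder valid bit   */
#define SMI_OP_READ    (1u << 26)                      /* placeholder: op code 1  */
#define SMI_REG(r)     (((uint32_t)(r) & 0x1Fu) << 21) /* placeholder field       */
#define SMI_PHY(p)     (((uint32_t)(p) & 0x1Fu) << 16) /* placeholder field       */

uint16_t master_smi_phy_read(uint8_t phy_addr, uint8_t reg_addr)
{
    while (dev_reg_read(SMI0_MGMT) & SMI_BUSY)         /* step 1: wait until idle */
        ;
    dev_reg_write(SMI0_MGMT, SMI_OP_READ | SMI_REG(reg_addr) | SMI_PHY(phy_addr)); /* step 2 */

    uint32_t v;
    do {                                               /* step 3: wait for data   */
        v = dev_reg_read(SMI0_MGMT);
    } while ((v & SMI_READ_VALID) == 0);
    return (uint16_t)v;                                /* data field (placeholder position) */
}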
6.2.3.4
Writing to a PHY Register
To write the value of a PHY register via the PCI interface, TWSI interface, or CPU SMI interface, perform the following procedure. The procedure described is for Master SMI Interface0, but it is the same for Master SMI Interface1:
1. Read the SMI0 Management Register (Table 256 p. 513) until the busy indication is 0. When the read is performed via the TWSI interface, which is considerably slower than the SMI interface, this stage may be skipped.
2. Write the following to the SMI0 Management Register (Table 256 p. 513):
   - 0: Op code for a write transaction.
   - The address of the register to be written.
   - The PHY address in which the register is to be written.
   - The data to be written into the register.
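The corresponding write sketch, under the same placeholder assumptions as the read sketch above:

#include <stdint.h>

extern uint32_t dev_reg_read(uint32_t addr);
extern void     dev_reg_write(uint32_t addr, uint32_t value);

/* Placeholder layout; take the real values from Table 256 p. 513. */
#define SMI0_MGMT    0x0u                              /* placeholder address  */
#define SMI_BUSY     (1u << 28)                        /* placeholder busy bit */
#define SMI_OP_WRITE 0u                                /* op code 0 = write    */
#define SMI_REG(r)   (((uint32_t)(r) & 0x1Fu) << 21)   /* placeholder field    */
#define SMI_PHY(p)   (((uint32_t)(p) & 0x1Fu) << 16)   /* placeholder field    */

void master_smi_phy_write(uint8_t phy_addr, uint8_t reg_addr, uint16_t data)
{
    while (dev_reg_read(SMI0_MGMT) & SMI_BUSY)         /* step 1: wait until idle */
        ;
    /* Step 2: one write carries the op code, register, PHY address, and data. */
    dev_reg_write(SMI0_MGMT, SMI_OP_WRITE | SMI_REG(reg_addr) | SMI_PHY(phy_addr) | data);
}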
6.2.3.5
PHY Polling Unit
The device uses a standard master Serial Management Interface bus for reading from and writing to the PHY registers. In addition, the PHY polling unit performs Auto-Negotiation with PHY devices attached to the Tri-Speed ports via the Master SMI interface.
For ports that are Auto-Media Selection enabled, the PHY polling unit also polls the PHY registers to determine the medium connected to the port (fiber or copper). In normal operation, the device polls the Status register of each PHY in a round-robin manner. The PHY polling unit uses Master SMI Interface0 for Auto-Negotiation and for polling PHY devices connected to Tri-Speed ports 0 through 11, and uses Master SMI Interface1 for Auto-Negotiation and for polling PHY devices connected to Tri-Speed ports 12 through 23. Thus, PHY devices connected to Tri-Speed ports 0 through 11 must be connected to Master SMI Interface0, and PHY devices connected to Tri-Speed ports 12 through 23 must be connected to Master SMI Interface1.
If the device detects a change in the link from down to up on one of the ports, it performs a series of register reads from the PHY and updates the Auto-Negotiation results in the device's registers. The Port MAC Status register is updated with these results only if Auto-Negotiation is enabled (see Port Status Register0 (0<=n<24, CPUPort = 63) (Table 149 p. 418)). The device enables full configuration of the Auto-Negotiation functionality via the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415).
For further details about Auto-Negotiation settings and Auto-Media Select see Section 6.5.3 "CPU Port MAC Operation and Configuration" on page 95.
6.2.4
Master XSMI Interface
This section is relevant for the following devices:
• SecureSmart: 98DX262
• SecureSmart Stackable: 98DX269
• Layer 2+ Stackable: 98DX130, 98DX260, 98DX270, 98DX803
• Multilayer Stackable: 98DX133, 98DX263, 98DX273
The 98DX130, 98DX133, 98DX260, 98DX262, 98DX263, 98DX270, 98DX273, and 98DX803 maintain an IEEE 802.3 Clause 45 compliant Master Serial Management interface for managing their integrated XAUI PHYs and other devices, such as XFP PHYs, connected to their HyperG.Stack ports. The 98DX130 and 98DX133 integrate one XAUI PHY, the 98DX260, 98DX262, and 98DX263 integrate two, and the 98DX270, 98DX273, and 98DX803 integrate three; the XSMI Master must be connected to these XAUI PHYs' Slave XSMI interfaces, as illustrated in Figure 12. This connection is made on the board, so that additional devices with a Slave XSMI interface can also be connected to the device's Master XSMI interface.
The device provides a mechanism for register read and write access via the PCI interface, TWSI interface, or CPU SMI interface. This access consists of two or three phases. The first phase is a regular access via the PCI/SMI/TWSI interface, to configure the XSMI transaction parameters (located in the XSMI Management Register (Table 131 p. 401) and the XSMI Address Register (Table 132 p. 403)). The next phase is performed by the XSMI Master, which generates the appropriate transaction over the XSMI bus. If this is a read transaction, there is a third phase in which the CPU reads the XSMI Management Register (Table 131 p. 401) via the PCI/SMI/TWSI interface until it determines that the read data within it is valid (according to the bit).
6.2.4.1
XSMI Transactions
The XSMI Master is able to perform the following operations, which are coded in the field in the XSMI Management Register (Table 131 p. 401):
• Write only (Opcode = 1).
• Incremented address read only (Opcode = 2).
• Read only (Opcode = 3).
• Address then Write (Opcode = 5).
• Address then incremented address read (Opcode = 6).
• Address then Read (Opcode = 7).
The "Read only" and "Write only" operations consist of a content frame only, whereas the Address-then-read/write/incremented-read operations consist of an address frame and a content frame. The "Incremented address read" is a content-only access that is followed by an increment of the stored address at the PHY device. This type of operation is used to efficiently access several PHY registers mapped to successive addresses. The following subsections describe the details of several basic management operations via the XSMI Master.
6.2.4.2
Reading a PHY Register
To read the value of a PHY register, perform the following procedure:
1. Read the XSMI Management Register (Table 131 p. 401) until the busy indication is 0. When the read is performed via the TWSI interface, which is considerably slower than the PCI/SMI interfaces, this stage may be skipped.
2. Write the PHY register address to the field in the XSMI Address Register (Table 132 p. 403).
3. Write the following to the XSMI Management Register (Table 131 p. 401):
   - 7: Op code for an Address then Read transaction.
   - The address of the PHY.
   - The address of the device within the PHY.
4. Read the XSMI Management Register (Table 131 p. 401) until the read-valid indication is 1. When it is 1, the content of the read register is in the Data field.
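A hedged sketch of this Clause 45 read. The XSMI Management/Address register addresses and every field position below are placeholders standing in for Tables 131 and 132:

#include <stdint.h>

extern uint32_t dev_reg_read(uint32_t addr);
extern void     dev_reg_write(uint32_t addr, uint32_t value);

#define XSMI_MGMT       0x0u                             /* placeholder */
#define XSMI_ADDR       0x0u                             /* placeholder */
#define XSMI_BUSY       (1u << 30)                       /* placeholder */
#define XSMI_READ_VALID (1u << 29)                       /* placeholder */
#define XSMI_OP(op)     (((uint32_t)(op) & 0x7u) << 26)  /* placeholder */
#define XSMI_DEV(d)     (((uint32_t)(d) & 0x1Fu) << 21)  /* placeholder */
#define XSMI_PHY(p)     (((uint32_t)(p) & 0x1Fu) << 16)  /* placeholder */

uint16_t xsmi_phy_read(uint8_t phy, uint8_t dev, uint16_t reg)
{
    while (dev_reg_read(XSMI_MGMT) & XSMI_BUSY)          /* step 1 */
        ;
    dev_reg_write(XSMI_ADDR, reg);                       /* step 2: register address     */
    dev_reg_write(XSMI_MGMT,                             /* step 3: op code 7            */
                  XSMI_OP(7) | XSMI_DEV(dev) | XSMI_PHY(phy));

    uint32_t v;
    do {                                                 /* step 4: wait for valid data  */
        v = dev_reg_read(XSMI_MGMT);
    } while ((v & XSMI_READ_VALID) == 0);
    return (uint16_t)v;                                  /* Data field (placeholder position) */
}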
6.2.4.3
Writing to a PHY Register
To write to a PHY register, perform the following procedure:
1. Read the XSMI Management Register (Table 131 p. 401) until the busy indication is 0. When the read is performed via the TWSI interface, which is considerably slower than the PCI/SMI interfaces, this stage may be skipped.
2. Write the PHY register address to the field in the XSMI Address Register (Table 132 p. 403).
3. Write the following to the XSMI Management Register (Table 131 p. 401):
   - 5: Op code for an Address then Write transaction.
   - The address of the PHY.
   - The address of the device within the PHY.
   - The write data.
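The matching write sketch, under the same placeholder assumptions (Tables 131-132):

#include <stdint.h>

extern uint32_t dev_reg_read(uint32_t addr);
extern void     dev_reg_write(uint32_t addr, uint32_t value);

#define XSMI_MGMT   0x0u                                 /* placeholder */
#define XSMI_ADDR   0x0u                                 /* placeholder */
#define XSMI_BUSY   (1u << 30)                           /* placeholder */
#define XSMI_OP(op) (((uint32_t)(op) & 0x7u) << 26)      /* placeholder */
#define XSMI_DEV(d) (((uint32_t)(d) & 0x1Fu) << 21)      /* placeholder */
#define XSMI_PHY(p) (((uint32_t)(p) & 0x1Fu) << 16)      /* placeholder */

void xsmi_phy_write(uint8_t phy, uint8_t dev, uint16_t reg, uint16_t data)
{
    while (dev_reg_read(XSMI_MGMT) & XSMI_BUSY)          /* step 1 */
        ;
    dev_reg_write(XSMI_ADDR, reg);                       /* step 2 */
    /* Step 3: op code 5 (Address then Write) together with the write data. */
    dev_reg_write(XSMI_MGMT, XSMI_OP(5) | XSMI_DEV(dev) | XSMI_PHY(phy) | data);
}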
6.2.4.4
Read Modify Write to a PHY Register
For a "Read Modify Write" operation on a PHY register, perform the following procedure:
1. Read the XSMI Management Register (Table 131 p. 401) until the busy indication is 0. When the read is performed via the TWSI interface, which is considerably slower than the PCI/SMI interfaces, this stage may be skipped.
2. Write the PHY register address to the field in the XSMI Address Register (Table 132 p. 403).
3. Write the following to the XSMI Management Register (Table 131 p. 401):
   - 7: Op code for an Address then Read transaction.
   - The address of the PHY.
   - The address of the device within the PHY.
4. Read the XSMI Management Register (Table 131 p. 401) until the read-valid indication is 1. When it is 1, the content of the read register is in the Data field.
5. Write the following to the XSMI Management Register (Table 131 p. 401):
   - 5: Op code for an Address then Write transaction.
   - The address of the PHY.
   - The address of the device within the PHY.
   - The modified write data.
6.2.4.5
Reading Successive PHY Registers
To read several successive PHY registers, perform the following procedure:
1. Read the XSMI Management Register (Table 131 p. 401) until the busy indication is 0. When the read is performed via the TWSI interface, which is considerably slower than the PCI/SMI interfaces, this stage may be skipped.
2. Write the first PHY register address to the field in the XSMI Address Register (Table 132 p. 403).
3. Write the following to the XSMI Management Register (Table 131 p. 401):
   - 6: Op code for an Address then incremented address Read transaction.
   - The address of the PHY.
   - The address of the device within the PHY.
4. Read the XSMI Management Register (Table 131 p. 401) until the read-valid indication is 1. When it is 1, the content of the read register is in the Data field.
5. Perform steps 3 and 4 as many times as necessary to read all registers except the last one.
6. For the last register read, write the following to the XSMI Management Register (Table 131 p. 401):
   - 3: Op code for a Read only transaction.
   - The address of the PHY.
   - The address of the device within the PHY.
7. Read the XSMI Management Register (Table 131 p. 401) until the read-valid indication is 1. When it is 1, the content of the read register is in the Data field.
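A sketch of this burst read, following the procedure above literally (op code 6 for all but the final access, op code 3 for the last one) and reusing the placeholder layout of the earlier XSMI sketches:

#include <stdint.h>

extern uint32_t dev_reg_read(uint32_t addr);
extern void     dev_reg_write(uint32_t addr, uint32_t value);

#define XSMI_MGMT       0x0u                             /* placeholder */
#define XSMI_ADDR       0x0u                             /* placeholder */
#define XSMI_BUSY       (1u << 30)                       /* placeholder */
#define XSMI_READ_VALID (1u << 29)                       /* placeholder */
#define XSMI_OP(op)     (((uint32_t)(op) & 0x7u) << 26)  /* placeholder */
#define XSMI_DEV(d)     (((uint32_t)(d) & 0x1Fu) << 21)  /* placeholder */
#define XSMI_PHY(p)     (((uint32_t)(p) & 0x1Fu) << 16)  /* placeholder */

/* Read 'count' successive PHY registers starting at 'first_reg'. */
void xsmi_phy_read_burst(uint8_t phy, uint8_t dev, uint16_t first_reg,
                         uint16_t *out, unsigned count)
{
    while (dev_reg_read(XSMI_MGMT) & XSMI_BUSY)          /* step 1 */
        ;
    dev_reg_write(XSMI_ADDR, first_reg);                 /* step 2 */

    for (unsigned i = 0; i < count; i++) {
        unsigned op = (i + 1 < count) ? 6u : 3u;         /* steps 3 and 6 */
        dev_reg_write(XSMI_MGMT, XSMI_OP(op) | XSMI_DEV(dev) | XSMI_PHY(phy));

        uint32_t v;
        do {                                             /* steps 4 and 7 */
            v = dev_reg_read(XSMI_MGMT);
        } while ((v & XSMI_READ_VALID) == 0);
        out[i] = (uint16_t)v;                            /* Data field (placeholder) */
    }
}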
6.2.5
Slave XSMI Interfaces
This section is relevant for the following devices:
• SecureSmart: 98DX262
• SecureSmart Stackable: 98DX269
• Layer 2+ Stackable: 98DX260, 98DX270
• Multilayer Stackable: 98DX263, 98DX273
Each of the HyperG.Stack ports incorporates a XAUI PHY. The transceiver XGXS/XAUI function is controlled by a dedicated XMDIO management interface, as defined in IEEE 802.3ae Clause 45. The register map is composed of IEEE-defined manageable device registers and vendor-specific registers. The XGXS/XAUI can be configured as a PCS, PHY, or DTE device. Each of these Slave XSMI interfaces is connected to the device's Master XSMI interface, as illustrated in Figure 12.

Configuration
In the Port XAUI PHY Configuration Register1 (24<=n<27) (Table 166 p. 436):
• To configure the XAUI transceiver SMI PHY address, set the field accordingly.
• To configure the XAUI transceiver SMI Device address, set the field accordingly.
• To configure the XAUI transceiver to accept any device address on the SMI interface, set the bit.
6.3
Two Wire Serial Interface (TWSI)
This section describes the device’s Two-Wire Serial Interface (TWSI).
The device provides full TWSI support. It acts as a master, generating read/write requests, and as a slave, responding to read/write requests. The device fully supports multiple TWSI master environments (clock synchronization, bus arbitration, etc.). The primary use of the TWSI interface is for serial ROM initialization.
6.3.1
TWSI Overview
The TWSI is used as a master for EEPROM initialization of the device. After the EEPROM initialization phase is done, the TWSI moves to TWSI slave mode. It can then be used for read and write access to all of the device's address-mapped entities.
6.3.2
TWSI Bus Operation
The TWSI can be operated with a 100 kHz clock.
The TWSI port consists of two open-drain signals—Serial Clock (SCL) and Serial Data/Address (SDA).
The TWSI master starts a transaction by driving a start condition, followed by a 7-bit or 10-bit slave address and a read/write bit indication. The target TWSI slave responds with an acknowledge.
For a write access (R/W bit is 0), following the acknowledge, the master drives 8-bit data and the slave responds with an acknowledge. This write access (8-bit data followed by acknowledge) continues until the TWSI master ends the transaction with a stop condition.
For a read access, following the slave address acknowledge, the TWSI slave drives 8-bit data and the master responds with an acknowledge. This read access (8-bit data followed by acknowledge) continues until the TWSI master ends the transaction by responding with no acknowledge to the last 8-bit data, followed by a stop condition.
If a target slave cannot drive valid read data immediately after it has received the address, it can insert "wait states" by forcing SCL low until it has valid data to drive on the SDA line.
A master is allowed to combine two transactions: after the last data transfer, it can drive a new start condition followed by a new slave address, rather than driving a stop condition. Transaction combining guarantees that the master will not lose arbitration to another TWSI master.
6.3.3
Serial ROM Initialization
The device supports initialization of all its configuration registers and memories, as well as other system components, through the TWSI master interface. At exit from reset, the device's TWSI master starts reading initialization data from the serial ROM and writes it to the appropriate registers or RAM arrays. EEPROM initialization can be triggered by a global hard or soft reset to the device. The size of the EEPROM is determined according to the reset sampling (see the Prestera® Hardware Specifications).
6.3.3.1
Serial ROM Data Structure
The device reads eight bytes at a time. The first four bytes are treated as the address and the last four bytes are treated as the data. Address bit [31] indicates which device internal address space is accessed:
• Address[31] = 0: Access to the PCI register address space.
• Address[31] = 1: Access to the Device address space.

Figure 15: Serial ROM Data Structure
The TWSI Last Address Register (Table 97 p. 386) contains the expected value of the last serial data item (default value is 0xFFFFFFFF). When the device reaches the last data, it stops the initialization sequence.
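The 8-byte entry format lends itself to a simple parsing model. The sketch below is illustrative only; the struct name and the termination check against the TWSI Last Address Register value are assumptions based on the description above.

#include <stdint.h>
#include <stdbool.h>

#define ROM_LAST_ADDR_DEFAULT 0xFFFFFFFFu  /* default TWSI Last Address value */

/* One 8-byte serial ROM entry: four address bytes, then four data bytes
 * (Little Endian byte ordering, per the Notes below). */
typedef struct {
    uint32_t addr;   /* bit [31]: 0 = PCI register space, 1 = device space */
    uint32_t data;   /* value written to the selected register             */
} rom_entry_t;

static bool rom_addr_is_device_space(const rom_entry_t *e)
{
    return (e->addr & 0x80000000u) != 0;   /* Address[31] = 1 */
}

/* Assumed termination check: the sequence stops when the entry matches the
 * value programmed in the TWSI Last Address Register (Table 97 p. 386). */
static bool rom_entry_is_last(const rom_entry_t *e, uint32_t last_addr)
{
    return e->addr == last_addr;
}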
Notes
• The data read from the Serial ROM is assumed to have Little Endian byte ordering.
• Use EEPROMs with an 8- or 16-bit slave address, according to the size of the EEPROM device used.
• The EEPROM device slave address is sampled at reset. (For details see the relevant device Hardware Specifications.)
• The device must not be reset when it is in the middle of a master EEPROM transaction, as this may cause the EEPROM to hang.
6.3.3.2
Disabling the TWSI Interface
When the TWSI interface is not in use, it must be disabled as follows:
1. Tie SCL and SDA to a pull-up to VDD_MISC. Each pin must use a separate resistor.
2. Tie SRESET_EEPROM_DIS to a pull-up resistor (see the Prestera® Hardware Specification).
After EEPROM initialization has been completed, the TWSI interface enters TWSI slave mode. It can then be used for read and write access to all the device registers that are mapped to the device address space. The device TWSI slave interface address is 7 bits; the value of the four highest bits and the value of the lowest three bits are sampled at reset (for details see the relevant device Hardware Specifications).
To transfer the register address and operation code, the master writes four times. This is followed by four writes to, or reads from, the register data, according to the operation code. The address and data format are illustrated in the following figures.
Figure 16: TWSI Bus Transaction - External Master Write to a Device Register
Start | Device TWSI Addr[7:1] | R/W=0 | ACK | Data0[7:0] = {0,0,RegAddr[29:24]} | ACK | Data1[7:0] = RegAddr[23:16] | ACK | Data2[7:0] = RegAddr[15:8] | ACK | Data3[7:0] = RegAddr[7:0] | ACK | Data4[7:0] = RegData[31:24] | ACK | Data5[7:0] = RegData[23:16] | ACK | Data6[7:0] = RegData[15:8] | ACK | Data7[7:0] = RegData[7:0] | ACK | Stop
(The address and data states are driven by the master; the ACK states are driven by the slave.)

Figure 17: TWSI Bus Transaction - External Master Read from a Device Register
Address phase: Start | Device TWSI Addr[7:1] | R/W=0 | ACK | Data0[7:0] = {1,0,RegAddr[29:24]} | ACK | Data1[7:0] = RegAddr[23:16] | ACK | Data2[7:0] = RegAddr[15:8] | ACK | Data3[7:0] = RegAddr[7:0] | ACK | Stop
Data phase: Start | Device TWSI Addr[7:1] | R/W=1 | ACK | Data0[7:0] = RegData[31:24] | ACK | Data1[7:0] = RegData[23:16] | ACK | Data2[7:0] = RegData[15:8] | ACK | Data3[7:0] = RegData[7:0] | Stop
(The address states are driven by the master; the data bytes in the data phase are driven by the slave, with the master driving the ACKs.)
To perform a write access, the following sequence from an external TWSI master is applied:
• The master transfers a 32-bit address in which bits [31:30] are {0,0} and bits [29:0] are the offset of the register address in the device address space.
• The master transfers the 32 bits of data to be written to the addressed register.
To perform a read access, the following sequence from an external TWSI master is applied:
• The master transfers a 32-bit address in which bits [31:30] are {1,0} and bits [29:0] are the offset of the register address in the device address space.
• The master reads the 32 bits of data in four 8-bit reads.
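The following sketch renders these sequences in C, assuming hypothetical byte-level TWSI master primitives (twsi_start, twsi_stop, twsi_tx, twsi_rx) supplied by the host platform. The read uses the combined-transaction form (repeated start) permitted in Section 6.3.2.

#include <stdint.h>

extern void    twsi_start(void);            /* drive a (repeated) start   */
extern void    twsi_stop(void);             /* drive a stop condition     */
extern void    twsi_tx(uint8_t byte);       /* send a byte, consume ACK   */
extern uint8_t twsi_rx(int ack);            /* read a byte, drive ACK/NAK */

/* Write a 32-bit device register over TWSI: bits [31:30] = {0,0}. */
void twsi_dev_write(uint8_t slave_addr7, uint32_t reg_offset, uint32_t data)
{
    uint32_t addr = reg_offset & 0x3FFFFFFFu;
    twsi_start();
    twsi_tx((uint8_t)(slave_addr7 << 1));          /* slave address, R/W = 0 */
    for (int i = 24; i >= 0; i -= 8)               /* four address bytes     */
        twsi_tx((uint8_t)(addr >> i));
    for (int i = 24; i >= 0; i -= 8)               /* four data bytes        */
        twsi_tx((uint8_t)(data >> i));
    twsi_stop();
}

/* Read a 32-bit device register over TWSI: bits [31:30] = {1,0}. */
uint32_t twsi_dev_read(uint8_t slave_addr7, uint32_t reg_offset)
{
    uint32_t addr = 0x80000000u | (reg_offset & 0x3FFFFFFFu);
    twsi_start();
    twsi_tx((uint8_t)(slave_addr7 << 1));          /* address phase, R/W = 0 */
    for (int i = 24; i >= 0; i -= 8)
        twsi_tx((uint8_t)(addr >> i));
    twsi_start();                                  /* repeated start         */
    twsi_tx((uint8_t)((slave_addr7 << 1) | 1u));   /* data phase, R/W = 1    */

    uint32_t data = 0;
    for (int i = 0; i < 4; i++)                    /* four data bytes, MSB first */
        data = (data << 8) | twsi_rx(i < 3);       /* NAK the final byte     */
    twsi_stop();
    return data;
}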
6.4
Device Address Space
The address space is accessible via the PCI interface, the CPU Slave SMI interface, or the TWSI interface. Table 26 outlines the device's address space partitioning.

Table 26: Address Space Partitioning

Bit #   Definition                                  Notes
31      Reserved                                    When accessing the register address space from the TWSI bus, this bit is significant: 0 = Access to PCI space; 1 = Access to the device's space.
30      0 = Access to the device's address space    Set to 0.
29      0 = Internal registers or tables            Set to 0.
28:23   When addressing internal registers and tables ([29] = 0):
        0 = Global registers, SDMA registers, Master XSMI registers, and TWSI registers.
        1–2 = Reserved.
        3 = Transmit Queue registers.
        4 = Ethernet Bridge registers.
        5 = Reserved.
        6 = Buffer Management registers.
        7 = Reserved.
        8 = Ports group0 configuration registers (port0 through port5), LEDs interface0 configuration registers, and Master SMI interface0 registers.
        9 = Ports group1 configuration registers (port6 through port11) and LEDs interface0.
        10 = Ports group2 configuration registers (port12 through port17), LEDs interface1 configuration registers, and Master SMI interface1 registers.
        11 = Ports group3 configuration registers (port18 through port23) and LEDs interface1.
        12 = MAC Table Memory.
        13 = Internal Buffers memory Bank0 address space.
        14 = Internal Buffers memory Bank1 address space.
        15 = Buffers memory block configuration registers.
        20 = VLAN table configuration registers and VLAN table address space.
        21 = Ports registers, including the CPU port (port# 0x3F). Bits [13:8] are used as the port number; bits [7:2] are used as the register address; bits [1:0] should be "00".
        22 = Eq.
        23 = PCL.
        24 = Policer.
        25–63 = Reserved.
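For illustration, a hypothetical helper that assembles an address under this partitioning, using the per-port unit (bits [28:23] = 21) as an example:

#include <stdint.h>

/* Illustrative only: assemble a device-space address per Table 26.
 * Bits [31:29] are 0 for internal registers and tables; bits [28:23]
 * select the unit. For unit 21 (per-port registers), bits [13:8] carry
 * the port number and bits [7:2] the register address ([1:0] = 00). */
#define UNIT_PORT_REGS 21u

static inline uint32_t port_reg_addr(uint8_t port, uint8_t reg)
{
    return ((uint32_t)UNIT_PORT_REGS << 23)
         | ((uint32_t)(port & 0x3Fu) << 8)   /* port number; CPU port = 0x3F */
         | ((uint32_t)(reg  & 0x3Fu) << 2);  /* register address             */
}

For example, port_reg_addr(0x3F, r) would address register r of the CPU port.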
6.5
CPU MII/GMII/RGMII Port
The device implements a standard Gigabit Ethernet MAC over MII, GMII, or RGMII, to interface with the host CPU. This is an alternative interface for the CPU to send/receive traffic to/from the device.
• In PCI systems, the PCI is used as the CPU packet interface. For details, see 6.1.5 "Packet Reception and Transmission".
• This CPU MII/GMII/RGMII interface must not be used when the PCI interface is used.

6.5.1
Port Type Configuration
The device’s CPU port interface type is sampled at reset (For details see the relevant device Hardware Specifications). Table 27 describes the CPU interface type according to this configuration. When the CPU port is not used, set CPU_IF_TYPE[1:0] to 0.
Table 27: CPU Port Interface According to CPU_IF_TYPE[1:0]

CPU_IF_TYPE[1:0]   CPU Port Interface
0                  CPU port interface is MII MAC mode
1                  CPU port interface is MII PHY mode
2                  CPU port interface is GMII
3                  CPU port interface is RGMII
For further details regarding the CPU Interface type setting see the Prestera® Hardware Specification.
The CPU interface type is reflected in the field of the CPU Port Global Configuration Register (Table 100 p. 387).
6.5.2
CPU Port MAC Overview
The CPU Port MAC implements a standard Gigabit Ethernet MAC over MII, GMII, or RGMII, to interface with the host CPU. The MAC filters out received frames shorter than 64 bytes or longer than the Maximum Receive Unit, as well as received packets with a bad CRC or those in which a receive error occurred during packet reception. The MAC also maintains the minimum IPG restriction on transmitted packets. In 10/100 Mbps half-duplex modes, it implements the CSMA/CD protocol (collision detect and retransmit). The MAC has a set of LED indicators (Section 17. "LED Interface" on page 318).
The 10/100 Mbps speeds and half-duplex operation are supported in MII and RGMII modes only.
The CPU port does not support Auto-Negotiation and must be set manually for Link, Speed, Duplex mode and Flow Control support.
All packets received via the CPU port must be DSA-tagged (see Section 7. "CPU Traffic Management" on page 102).
6.5.3
CPU Port MAC Operation and Configuration
This section specifies the MAC operation and configuration.
6.5.3.1
CPU Port Activation
To save power, the CPU port is inactive by default. For it to be operational, the CPU port must be activated.

Configuration
To activate or de-activate the CPU port, set the bit in the CPU Port Global Configuration Register (Table 100 p. 387).
6.5.3.2
Port Enable
See Section 9.4.1 "Port Enable" on page 144 for Tri-Speed ports.
6.5.3.3
Link State
As the CPU port does not support Auto-Negotiation, its link state must be set manually.
Configuration
In the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415):
• To force the port link state to up, set the bit.
• To force the port link state to down, set the bit.

Interrupt
The field in the Port Interrupt Cause Register (0<=n<24, CPUPort = 63) (Table 571 p. 802) is set upon any change of the port link state.
6.5.3.4
Port Status Register
See Section 9.4.3 "Port Status Register" on page 145 for Tri-Speed ports.
6.5.3.5
Disable CRC Checking on Received Packets
See Section 9.4.4 "Disable CRC Checking on Received Packets" on page 145 for Tri-Speed ports.
6.5.3.6
Short Packets Padding
See Section 9.4.5 "Short Packets Padding" on page 146 for Tri-Speed ports.
6.5.3.7
Preamble Length
See Section 9.4.6 "Preamble Length" on page 146 for Tri-Speed ports.
6.5.3.8
Maximum Receive Unit (MRU)
See Section 9.4.7 "Maximum Receive Unit (MRU)" on page 147 for Tri-Speed ports that are configured as Cascading ports.
6.5.3.9
802.3x Flow Control
As the CPU port does not support Auto-Negotiation, Flow Control support must be set manually. See Section 9.4.8 "802.3x Flow Control" on page 148 for Tri-Speed ports.
6.5.3.10
Back Pressure in Half-Duplex Mode
6.5.3.11
Speed Setting
Speed Setting in MII MAC Mode or MII PHY Mode
In MII MAC mode or MII PHY mode, the MAC speed may be set to 10 Mbps or 100 Mbps. As Auto-Negotiation is not supported on the CPU port, the port speed must be set manually.
Configuration
In the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415):
• Clear the bit.
• Clear the bit.
• To set the port's speed to 10 Mbps or 100 Mbps, set the bit.
Speed Setting in GMII Mode
In GMII mode, the MAC speed must be set to 1000 Mbps. As Auto-Negotiation is not supported on the CPU port, this must be set manually.

Configuration
In the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415):
• Clear the bit.
• Set the bit.
Speed Setting in RGMII Mode
In RGMII mode, the MAC speed may be set to 10 Mbps, 100 Mbps, or 1000 Mbps.
As Auto-Negotiation is not supported on the CPU port, the port speed must be set manually.
Configuration
In the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415):
• Clear the bit.
• To set the speed to 10 Mbps:
  - Clear the bit.
  - Clear the bit.
• To set the speed to 100 Mbps:
  - Clear the bit.
  - Set the bit.
• To set the speed to 1000 Mbps:
  - Set the bit.
6.5.3.12
Duplex Mode in MII MAC Mode or MII PHY Mode
In MII MAC Mode or MII PHY mode, the MAC may operate in full-duplex or half-duplex mode.
Configuration
In the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415):
• Clear the bit.
• To configure full-duplex or half-duplex mode, set or clear the bit.
Duplex Mode in GMII Mode
In GMII mode the port speed is 1000 Mbps. As the MAC does not support half-duplex at this speed, the duplex mode must be set to full-duplex. As Auto-Negotiation is not supported on the CPU port, the duplex mode must be set manually.
Configuration
In the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415):
• Clear the bit.
• Set the bit.

Duplex Mode in RGMII Mode
In RGMII mode, half-duplex mode is supported only when the port’s speed is 10 Mbps or 100 Mbps. When the port’s speed is 1000 Mbps, Duplex mode must be set to full-duplex (see "Duplex Mode in GMII Mode" above). As Auto-Negotiation is not supported on the CPU port, the port’s duplex mode must be set manually.
Configuration
In the Port Auto-Negotiation Configuration Register (0<=n<24, CPUPort = 63) (Table 148 p. 415):
• Clear the bit.
• To configure full-duplex or half-duplex mode, set or clear the bit.
6.5.3.13
Excessive Collisions
See 9.4.13 "Tri-Speed Port: Excessive Collisions" on page 154.
6.5.4
Auto-Negotiation
The CPU port does not support Auto-Negotiation. The port’s link, speed, Duplex mode and Flow Control support must be set by the host CPU.
6.5.5
CPU Port MIB Counters
The device implements a limited number of MAC MIB Counters for the CPU port, thus providing the necessary counters to support CPU port statistics. The counters and their addresses are listed in C.2.8 "CPU Port Configuration Register and MIB Counters" on page 387.
Configuration
• The CPU port MIB counters are located in the Port MAC MIB Counters (0<=n<24, CPUPort = 63) (Table 145 p. 411).
• By setting the bit in the CPU Port Global Configuration Register (Table 100 p. 387), any CPU port MIB counter read is reset to ZERO (cleared on read).
• By clearing the bit in the CPU Port Global Configuration Register (Table 100 p. 387), any CPU port MIB counter read is not reset to ZERO (not cleared on read).
6.6
Interrupts
The device implements a hierarchical interrupt scheme. An Interrupt Cause and Interrupt Mask register is defined for each functional block. Each block provides a summary bit of all its interrupts to one Global Interrupt Cause register. The Global Interrupt Cause register contains the summary bits sent from all peripheral functional blocks and other interrupts belonging to the PCI.
The PCI interrupt pin is an open-drain output, which is asserted LOW if any bit in the Global Interrupt Summary is set. The Global Interrupt Summary is a summary of all interrupts in the Global Interrupt Cause register. It is derived from a logical OR of all unmasked interrupt bits in the Global Interrupt Cause register.
At reset, all Interrupt Mask and Cause registers are set to zero; therefore, all interrupts are masked after reset. For a summary of all of the device's interrupts, see C.19 "Summary of Interrupt Registers" on page 790. The interrupt scheme is illustrated in Figure 18.
Figure 18: Hierarchical Interrupt Scheme
(Each functional block's register file, A through N, contains an Interrupt Cause register and an Interrupt Cause Mask register; the resulting summary bits (int_sumA through int_sumN) feed the Global Interrupt Cause register and Global Cause Mask register, which drive the INTn pin.)
Note
On devices that incorporate a PCI interface, when the PCI interface is used as the management interface, the PCI_INTn pin must be used as the device's interrupt signal. When the PCI interface is not used, the INTn pin must be used as the device's interrupt signal.
6.6.1
Interrupt Types
The interrupts are divided into the following two groups:
• Summary interrupts: Logical OR of all unmasked bits in the Cause register.
• Functional interrupts: Interrupts asserted by the device after an event has occurred.
The functional interrupts are located within the Functional Block Cause register, and the summary interrupts are always held in the least significant bit of each local Interrupt Cause register. A copy of each summary interrupt is held in the Global Cause register.
6.6.2
Setting and Resetting Interrupts
All interrupts are set asynchronously by the device. The functional interrupts are set after the relevant event has occurred. Summary interrupts are set only as a result of assertion of one of the unmasked bits in the respective Interrupt Cause registers.
Notes
• The device resets the interrupts after the host device reads the Cause register.
• Functional interrupts are reset by the device immediately after the Interrupt Cause register is read (Auto Clear).
When reading the Interrupt Cause register, all interrupts in the register are cleared, including the masked interrupts. When an interrupt occurs, the corresponding Interrupt Cause is set, regardless of its interrupt mask bit.
Summary interrupts always reflect the status of the corresponding Cause register. When the Interrupt Cause register in the functional block is read, the summary bit reflecting the unmasked bit of that functional block is cleared. When reading the Global Interrupt Cause register, the summary interrupts within the register are not reset. To reset the Global Interrupt Cause register, all the summary interrupts must be cleared by reading the corresponding local Cause register.
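To make the hierarchy concrete, here is a hedged C sketch of a dispatch loop that follows the scheme described above. The register addresses are placeholders, and handle_block_events is a hypothetical handler supplied by the host software.

#include <stdint.h>

extern uint32_t dev_reg_read(uint32_t addr);
extern void     handle_block_events(int block, uint32_t cause); /* hypothetical */

#define GLOBAL_INT_CAUSE 0x0u  /* placeholder: Global Interrupt Cause register */

/* Placeholder map from a global summary bit to the corresponding
 * functional block's local Interrupt Cause register address. */
static const uint32_t block_cause_addr[32] = { 0 };

void device_isr(void)
{
    uint32_t global = dev_reg_read(GLOBAL_INT_CAUSE);
    for (int bit = 0; bit < 32; bit++) {
        if ((global & (1u << bit)) && block_cause_addr[bit] != 0) {
            /* Reading the local Cause register auto-clears its functional
             * bits and drops the corresponding global summary bit. */
            uint32_t cause = dev_reg_read(block_cause_addr[bit]);
            handle_block_events(bit, cause);
        }
    }
}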
6.6.3
Interrupt Coalescing
When CPU interrupts are generated at extremely high rates, the CPU is unable to achieve any useful processing. Interrupt coalescing forces a minimum time (in clock cycles) between successive hardware interrupts to the CPU. Unmasked interrupts that occur during this forced delay generate a hardware interrupt at the end of the delay, after which a new delay period begins. Only the generation of the hardware interrupt is delayed by this mechanism; the interrupt cause bits are still set when the interrupt occurs.
The device has a globally configurable interrupt coalescing period, which may be changed dynamically during normal operation. The default value is 0, i.e., a hardware interrupt is generated as soon as any interrupt cause bit is set. The period value defines the delay period in a resolution of 64 clock cycles.
Note
If the coalescing period is modified in the middle of an old period, the device will finish the current period and then, upon the next global summary interrupt, it will start working according to the new period.
To limit the minimum IDLE period between two consecutive assertions of the PCI_INTn pin or the INTn pin, set the field in the Interrupt Coalescing Configuration Register (Table 95 p. 384) accordingly.
6.6.3.1
Interrupt Coalescing Override Due to Ports Interrupt
Typically, a Link Change event on one of the device’s ports must be addressed immediately by the Host interface. The device enables override of the coalescing IDLE period if one of the port’s links has changed and its Link Change Interrupt is not masked. Once a port’s link status has changed, the Interrupt pin is asserted immediately, regardless of the configured coalescing IDLE period.
Configuration
To enable/disable Interrupt Coalescing IDLE period override due to a link change, set the bit in the Interrupt Coalescing Configuration Register (Table 95 p. 384).
6.7
General Purpose Pins (GPP)
This section is relevant for the following devices:
• Layer 2+ Stackable: 98DX166, 98DX130, 98DX246, 98DX250, 98DX260, 98DX270, 98DX803
• SecureSmart Stackable: 98DX169, 98DX249, 98DX269
• Multilayer Stackable: 98DX133, 98DX167, 98DX247, 98DX253, 98DX263, 98DX273
Not relevant for the SecureSmart devices.

6.7.1
GPP Overview
The device includes eight general purpose pins (GPP), which are very useful for system design. These pins receive indications from the device’s environment, e.g., from the connector and/or the backplane of a chassis. The GPPs can also be used to drive indications from the device to the peripheral devices. They can be used as inputs, outputs, or edge-sensitive interrupts.
The pins are 2.5V or 3.3V LVCMOS-compatible with LVTTL; however, they are not 5V tolerant (see the Prestera® Hardware Specification).
6.7.2
Working with GPPs
6.7.2.1
Control I/O Direction
Each of the eight GPPs may be configured as an input or an output.
To configure the direction of the GPPs, set the field in the GPP I/O Control Register (Table 112 p. 391) accordingly.
6.7.2.2
GPP as Input
When a GPP acts as an input, the value of the GPP pin is reflected in the GPP Input Register (Table 111 p. 391). As the device has an internal pulldown resistor of 50 kilohms on each of the GPPs, a GPP input that is to be connected to a LOW voltage level (logical ZERO) may be left unconnected and will be reflected as logical ZERO in the GPP Input Register (Table 111 p. 391). If the GPP acts as an output, the corresponding value in the GPP Input Register is meaningless.
6.7.2.3
GPP as Output
When a GPP acts as an output, the value of the GPP is driven with the value of the corresponding pin in the GPP Output Register (Table 110 p. 391).
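A short sketch of basic GPP usage with the registers named above. The register addresses are placeholders to be taken from Tables 110 through 112, and the direction-bit polarity (1 = output) is an assumption.

#include <stdint.h>

extern uint32_t dev_reg_read(uint32_t addr);
extern void     dev_reg_write(uint32_t addr, uint32_t value);

#define GPP_OUTPUT_REG 0x0u   /* placeholder; GPP Output Register,      Table 110 p. 391 */
#define GPP_INPUT_REG  0x0u   /* placeholder; GPP Input Register,       Table 111 p. 391 */
#define GPP_IO_CTRL    0x0u   /* placeholder; GPP I/O Control Register, Table 112 p. 391 */

/* Drive GPP 'pin' (0-7) as an output at the given level. */
void gpp_set_output(unsigned pin, int level)
{
    dev_reg_write(GPP_IO_CTRL, dev_reg_read(GPP_IO_CTRL) | (1u << pin)); /* assumed 1 = output */

    uint32_t out = dev_reg_read(GPP_OUTPUT_REG);
    if (level)
        out |= (1u << pin);
    else
        out &= ~(1u << pin);
    dev_reg_write(GPP_OUTPUT_REG, out);
}

/* Sample GPP 'pin' configured as an input. */
int gpp_get_input(unsigned pin)
{
    return (int)((dev_reg_read(GPP_INPUT_REG) >> pin) & 1u);
}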
6.7.2.4
GPP as Interrupt
When a GPP acts as an input, an interrupt can be enabled by unmasking the corresponding bit in the GPP Interrupt Mask Register (Table 560 p. 796). The interrupt is conveyed to the corresponding bit in the GPP Interrupt Cause Register (Table 559 p. 795).
M AR VE 4du LL un CO -fn NF zjm ID sx EN e TI * O AL pn , U et ND Te ER chn ND olo A# gie 12 s 10 17 86
The interrupt is edge-sensitive and is asserted on every change of the input signal.
When the pin is set as an output, the GPP Interrupt Mask register masks the corresponding bit.
6.7.2.5
Disabling the GPP
By default, GPPs are set as inputs.
As the device has an internal pulldown resistor of 50 kilohms on each of the GPPs, an unused GPP may be left unconnected.
Section 7. CPU Traffic Management
In managed systems, it is critical that the CPU receive only traffic that requires software processing. Unwanted traffic unnecessarily burdens the CPU and delays the handling of other traffic that requires processing. Furthermore, traffic that is sent to the CPU must be properly prioritized into separate queues. This allows the CPU to process high-priority traffic with minimum delay, even when overloaded with low-priority traffic.
The device supports the Marvell Secure Control Technology features for selecting only the required traffic to the CPU, as well as prioritizing and managing the bandwidth of traffic sent to the CPU.
7.1
CPU Port Number
The CPU is represented as port 63 (0x3F). The CPU port has the same configuration options and egress queueing/scheduling features as the physical MAC ports. Specifically, the CPU port has eight egress traffic class queues. Each traffic class can be assigned a minimum bandwidth using the WRR scheduler, and the maximum rate can be limited with a leaky bucket shaper. The aggregate traffic to the CPU can also be rate-limited with a leaky bucket shaper (Section 15.3 "Egress Bandwidth Management" on page 305). The CPU port must be configured as a cascade port (Section 4.1 "Cascade Ports" on page 44).
When it is configured as a cascade port, all packets sent to the CPU carry a TO_CPU DSA tag containing information about the packet (e.g., CPU code, source device, source port) (Section 7.2.2 "TO_CPU DSA Tag"). The CPU can inject FROM_CPU or FORWARD DSA-tagged packets to the CPU port (Section 7.3 "Packets from the CPU").
Note
Typical applications require the CPU port to be configured as a cascade port, to allow the CPU to inject FROM_CPU DSA-tagged packets to a specific destination, and to receive TO_CPU DSA-tagged packets containing information about the packet (e.g. CPU code, source device, source port, etc.).
7.2
Packets to the CPU
The device offers tight control of the specific types of traffic to be sent to the CPU. In the device's architecture, the CPU is not a member of the VLAN, so there is no implicit flooding of unknown Unicast, Multicast, and Broadcast traffic to the CPU port. A packet is sent to the CPU as a result of one of the following actions:
• The packet is assigned a TRAP command by an ingress processing engine (Section 5.1.3 "TRAP Command" on page 53).
• The packet is assigned a MIRROR command by an ingress processing engine (Section 5.1.2 "Mirror-to-CPU Command" on page 53).
• The packet is assigned a FORWARD command to the CPU port 63 by an ingress processing engine (Section 5.1.1 "FORWARD Command" on page 52).
• The packet is selected for sampling to the CPU by the ingress or egress port sampling mechanism (Section 16.1 "Traffic Sampling to the CPU" on page 312).
Packets sent to the CPU are assigned an 8-bit CPU code, which indicates the reason the packet is sent to the CPU. In addition, the CPU code determines the attributes controlling how the packet is sent to the CPU, according to the CPU Code table (Section 7.2.1 "CPU Code Table").
The list of CPU codes is found in Appendix B. "CPU Codes" on page 343. The Policy engine rule action is capable of Mirroring or Trapping a packet to the CPU with a specific user-defined CPU code; see Section 10.6 "Policy Actions" on page 195.
The Bridge engine has many mechanisms for Mirroring or Trapping a packet to the CPU. These include BPDUs (Section 11.3.1 "Trapping BPDUs" on page 221) and other well-known control packets (Section 11.8 "Control Traffic Trapping/Mirroring to the CPU" on page 244). In addition, the bridge VLAN configuration can Mirror or Trap to the CPU various kinds of flooded traffic: unknown Unicast, unregistered IPv4/6 and non-IP Multicast, and unregistered IPv4 Broadcast (Section 11.11.1 "Per-VLAN Unknown/Unregistered Filtering Commands" on page 253).
7.2.1
CPU Code Table
The CPU Code table is a 256-entry table that defines, for each CPU code, a set of configurable attributes controlling how the packet is sent to the CPU. The CPU code attributes are:
• Packet device destination to the CPU port
• Packet QoS on the CPU port
• Packet statistical sampling to the CPU port
• Packet truncation on the CPU port
These attributes are described in detail in the subsequent subsections.
Packet Device Destination to the CPU Port
In a single-device system, packets are sent to the CPU via the device host interface (Section 6. "Host Management Interfaces" on page 55). In a cascaded system, however, it may not always be desirable for packets to be sent to the CPU attached to the local device. For example, a CPU attached to one of the devices may serve as a master CPU for the system, and packets with a specific CPU code should be sent directly to that CPU. The device provides a CPU Destination Device table containing up to seven device numbers.
The CPU Code table entry has a 3-bit field, which serves as an index into the CPU Destination Device table. The value of 0 is reserved to indicate that the packet is sent to the local device CPU port. For a given CPU Code table entry, a value of 1–7 sends CPU traffic to the device with the corresponding number in the CPU Destination Device table. This allows distributed processing of protocols by multiple CPUs in the system, e.g., BPDUs are sent to the device attached to CPU #1, and GVRP PDUs are sent to the device attached to CPU #2.
Configuration
• To configure CPU Destination Device table entries, set the CPU Target Device Configuration Register0 (Table 443 p. 702) and the CPU Target Device Configuration Register1 (Table 444 p. 702) accordingly.
• To configure the destination CPU device index per CPU code, set the field in the respective entry in the CPU Code Table Entry (0<=n<256) (Table 437 p. 698) accordingly.
Packet QoS on the CPU Port
Like all ports in the device, the CPU port supports eight traffic class egress queues and two levels of drop precedence. (SecureSmart devices have four traffic classes.) The CPU Code table associates a traffic class and a drop precedence with each CPU code. This allows differentiation between different kinds of traffic to the CPU on a per-CPU-code basis, in terms of the traffic class assignment and the drop precedence. The following table provides an example mapping of traffic type to traffic class:
Table 28: Example Mapping of Traffic Type to Traffic Class

CPU Code                                   Traffic Type
CPU TO CPU MAIL FROM NEIGHBOR CPU          CPU-to-CPU management traffic in a multi-CPU cascaded system (Section 7.3 "Packets from the CPU" on page 107).
BPDU TRAP                                  Bridge Spanning Tree BPDUs.
IEEE RESERVED MULTICAST ADDR TRAP/MIRROR   IEEE Layer 2 control protocols (e.g., BPDUs, GARP, LACP).
MAC TABLE ENTRY TRAP/MIRROR                Unicast packets whose MAC DA is the CPU MAC Address. This includes ARP reply and IP management protocols, e.g., SNMP, Telnet, and HTTP.
The traffic class and drop precedence assignment is applied to the queueing on the CPU port only. If the packet is sent over a cascade port with a TO_CPU DSA tag, the packet is assigned the cascade control traffic class and drop precedence (Section 8.5.3 "Setting QoS Fields on Cascaded Ports" on page 127). If the packet is CPU-to-CPU, the traffic class and drop precedence assignment on the destination CPU port is taken from the FROM_CPU DSA tag and not from the CPU Code table. This applies to the case where the destination port is the CPU port 63 and when the FROM_CPU field is set (Section "CPU Mailbox to Neighbor CPU Device" on page 108).
Configuration
In the CPU Code Table Entry (0<=n<256) (Table 437 p. 698):
• To configure the CPU Code table entry with the traffic class assignment, set the field accordingly.
• To configure the CPU Code table entry with the drop precedence assignment, set the field accordingly.
Statistical Sampling
The device supports statistical sampling of packets sent to the CPU on a per-CPU code basis. Each CPU Code table entry contains a 6-bit pointer to one of thirty-two 32-bit sampling thresholds.
A pseudo-random number ranging from 1 to 2^32-1 is generated for every packet to be sent to the CPU.
If the sampling threshold pointed to by the CPU Code entry is less than the pseudo-random number generated for the packet, the packet is not sent to the CPU. (If the packet command is MIRROR, it is still sent to the network ports.) If the maximum sampling threshold value of 0xFFFFFFFF is configured, all packets with the given CPU code are forwarded to the CPU. Conversely, if the minimum sampling threshold value of 0x0 is configured, no packets with the given CPU code are forwarded to the CPU. (Note that the pseudo-random number is always > 0.)

In a cascaded system, the CPU code statistical sampling mechanism is applied only to packets received from network ports (i.e., on the ingress device), and not to DSA-tagged packets received on cascade ports.

This mechanism can be used to sample to the CPU a statistical percentage of an arbitrary traffic flow identified by the Policy engine: the Policy engine command can be to MIRROR the packet with a user-specified CPU code, and the corresponding CPU Code table entry is configured to the desired sampling percentage.
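The sampling decision reduces to a single comparison. The following C sketch models it under the behavior stated above (a per-packet draw in the range 1 to 2^32-1); the function names are illustrative, not device API.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the per-CPU-code statistical sampling decision.
   rnd is the pseudo-random draw in [1, 0xFFFFFFFF] generated per packet. */
static bool sample_to_cpu(uint32_t threshold, uint32_t rnd)
{
    /* Threshold below the draw: packet is not sent to the CPU.
       0xFFFFFFFF passes every packet; 0 passes none (rnd is never 0). */
    return threshold >= rnd;
}

/* Convert a desired sampling percentage into a threshold value. */
static uint32_t sampling_threshold(double percent)
{
    return (uint32_t)(percent / 100.0 * 0xFFFFFFFFu);
}
```

For example, a threshold of about 0x80000000 passes roughly 50% of the packets carrying the given CPU code.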
Note
The CPU code statistical sampling mechanism is orthogonal to the port statistical sampling mechanism (Section 16.1 "Traffic Sampling to the CPU" on page 312).
Configuration
• To configure the sampling threshold pointer for a given CPU code, set the field accordingly in the CPU Code Table Entry (0<=n<256) (Table 437 p. 698).
• To configure the set of 32 sampling threshold profiles, set the Statistical Rate Limits Table Entry (0<=n<32) (Table 440 p. 700) accordingly.
Packet Truncation
Some statistical sampling applications require only the packet header information and not the entire packet data. Truncating such packets conserves the host memory required for queueing received packets. The device allows packets to the CPU to be truncated to 128 bytes, on a per-CPU-code basis. The TO_CPU DSA tag contains a flag indicating that the received packet was truncated to 128 bytes. In addition, the original packet length is reported in the TO_CPU DSA tag (Section 7.2.2 "TO_CPU DSA Tag").
Configuration
To configure the CPU Code table entry to truncate packets to 128 bytes, set the bit in the respective entry in the CPU Code Table Entry (0<=n<256) (Table 437 p. 698).
7.2.2 TO_CPU DSA Tag
When the CPU port is configured as a cascade port, all packets sent to the CPU are DSA-tagged TO_CPU.
The TO_CPU DSA tag provides the following important packet attributes to the application:
• 8-bit CPU code: Indicates the mechanism that caused the packet to be sent to the CPU (Appendix B. "CPU Codes" on page 343).
• Truncation flag: Indicates that the packet was truncated to 128 bytes ("Packet Truncation" on page 106).
• Original packet byte length: Packet length (independent of truncation).
• Ingress/egress attributes: The following attributes are defined in the TO_CPU DSA tag. These attributes reflect either the packet ingress or packet egress, depending on whether the packet was sent to the CPU by the ingress or egress pipeline:
  – Ingress or egress port number
  – Ingress or egress device number
  – Ingress or egress VLAN
  – Ingress or egress tag CFI bit
Notes
• For caveats about the TO_CPU ingress port and device fields, see "CPU to CPU" on page 108 and "CPU Mailbox to Neighbor CPU Device" on page 108.
• For further information about the DSA tag TO_CPU format, see A.1 "Extended DSA Tag in TO_CPU Format" on page 333.
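For orientation only, the attributes above can be pictured as the following C structure. The field names and widths are illustrative assumptions and do not reflect the actual tag bit layout, which is specified in A.1.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software view of the TO_CPU DSA tag attributes listed
   above; not the on-the-wire encoding. */
struct to_cpu_dsa_info {
    uint8_t  cpu_code;        /* reason the packet reached the CPU */
    bool     truncated;       /* packet was truncated to 128 bytes */
    uint16_t orig_byte_len;   /* original length, independent of truncation */
    uint8_t  src_or_trg_port; /* ingress or egress port number */
    uint8_t  src_or_trg_dev;  /* ingress or egress device number */
    uint16_t vid;             /* ingress or egress VLAN */
    bool     cfi;             /* ingress or egress tag CFI bit */
};
```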
7.3 Packets from the CPU

The CPU can inject packets into the device using the FROM_CPU or FORWARD DSA tags.
7.3.1 FROM_CPU DSA Tag

The CPU can inject packets with a specific destination and QoS using the FROM_CPU DSA tag. The FROM_CPU DSA tag allows the CPU to specify the following packet attributes:
• Packet destination: The packet destination can be a single-target device/port, a multi-target destination of either a VLAN or Multicast group, or a neighboring device CPU port (useful for device/topology discovery).
• Single-destination packet to be sent on a network port as VLAN-tagged or untagged: If set, the single-target packet is sent on the egress port with a VLAN tag. This flag is not applied to Multicast packets, which are sent tagged/untagged according to the VLAN tagged state.
• Packet 802.1Q VLAN tag fields (VID, CFI, and user priority): These are the VLAN tag values used if the packet is transmitted VLAN-tagged.
• Packet traffic class and drop precedence: These attributes are applied on the network destination port(s) and not the cascade ports.
• Use Control traffic class on cascade port: If set and the packet is sent to cascade port(s), the packet is assigned the control traffic class and drop precedence on the cascade port. If disabled, the packet traffic class and drop precedence are mapped to a cascade port traffic class and drop precedence (Section 8.5.3 "Setting QoS Fields on Cascaded Ports" on page 127).
• Egress Filter Enable: If set, the packet is subject to VLAN egress filtering and spanning tree state egress filtering. Specifically, when injecting BPDUs, this field should be disabled.
• Source-ID: This is the source-ID used for egress Multicast source-ID filtering (Section 11.14 "Bridge Source-ID Egress Filtering" on page 257).
• Exclude device/port or trunk group: This option allows a Multicast packet to be injected while excluding a specific device/port or trunk group.
• Mailbox to neighbor device CPU: This option is used for topology discovery before the device number assignment of other devices in a cascaded system is known. The CPU specifies the local cascade port through which to send the packet, and the receiving device automatically sends the packet to the CPU.
Note
For further information about the DSA tag FROM_CPU format, see Appendix A. "DSA Tag Formats" on page 333.

The FROM_CPU packet destination options are discussed in the following subsections.
CPU to Network
The CPU can send a packet to any network device/port, VLAN group, or Multicast group. To set the destination to a VLAN group, the Multicast group index (VIDX) is set to the special value 0xFFF (Section 11.5 "Bridge Multicast (VIDX) Table" on page 240).

If the destination is a VLAN or Multicast group in a cascaded system, the source-ID is used to prevent forwarding loops in the cascade topology (Section 4.3 "Multi-Target Destination in a Cascaded System").

If the destination is a VLAN or Multicast group, it is also possible to exclude a specific device/port or trunk group from the flood domain. This is useful if the packet was initially trapped to the CPU and now needs to be re-injected into the VLAN but should not be resent on its original source port.
CPU to CPU
In a cascaded system, the CPU can send a packet to the CPU port on any device in the system. The FROM_CPU DSA tag destination port field is set to the CPU port 63, and the destination device is set to the device number on which the CPU port resides. The packet is sent across the cascade port as a FROM_CPU DSA-tagged packet. When it arrives at the destination device, the packet is converted to a TO_CPU packet with the following settings:
• CPU code = CPU TO CPU
• Source device = taken from the FROM_CPU DSA tag
• Source port = Undefined (based on the CPU code, it can be determined that the source port is the CPU port on the source device)

The packet is queued on the CPU port according to the traffic class and drop precedence defined in the FROM_CPU DSA tag (and not according to the CPU Code table).
CPU Mailbox to Neighbor CPU Device
In cascaded systems, the mailbox mechanism allows a CPU to communicate with its adjacent neighbor CPU before it has learned the device numbers in the system and programmed the Device Map table. To send a Mailbox packet to the CPU attached to the adjacent device on a cascade port, the FROM_CPU DSA tag must have the Mailbox field set, the destination device must be set to the local device number, and the destination port must be set to the cascade port through which the packet is to be sent. When received by the neighboring device, the packet is converted to a TO_CPU packet with the following settings:
• CPU code = MAIL FROM NEIGHBOR CPU
• Source device = local device number
• Source port = local cascade port from which the packet was received

The packet is queued on the CPU port according to the traffic class and drop precedence defined in the FROM_CPU DSA tag.
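A minimal C sketch of the conversion performed by the receiving device is shown below. The structure and the CPU code constant are hypothetical stand-ins; the real CPU code values are listed in Appendix B.

```c
#include <stdint.h>

/* Placeholder value; the actual code is defined in Appendix B. */
enum { CPU_CODE_MAIL_FROM_NEIGHBOR_CPU = 0 };

struct to_cpu_fields {
    uint8_t cpu_code;
    uint8_t src_dev;   /* source device */
    uint8_t src_port;  /* source port */
};

/* Conversion applied by the device that receives a mailbox packet. */
static struct to_cpu_fields mailbox_rx_convert(uint8_t local_dev_num,
                                               uint8_t rx_cascade_port)
{
    struct to_cpu_fields f = {
        .cpu_code = CPU_CODE_MAIL_FROM_NEIGHBOR_CPU,
        .src_dev  = local_dev_num,    /* local device number */
        .src_port = rx_cascade_port,  /* cascade port of arrival */
    };
    return f;
}
```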
7.3.2 FORWARD DSA Tag
The CPU can inject packets into the ingress pipeline using the FORWARD DSA tag.

The motivation for injecting packets with the FORWARD DSA tag is to let the ingress pipeline make all the forwarding, filtering, and QoS decisions. The packet is processed just as any FORWARD DSA-tagged packet received on a cascade port is processed; the fact that the packet is received from the CPU port does not affect the way the ingress pipeline processes it.
The forwarding destination and QoS attributes of the FORWARD DSA-tagged packet are irrelevant when it is injected by the CPU. The packet forwarding decision is made by the ingress processing engines enabled on the CPU port (e.g., Policy, Bridge, and Router).

The basic attributes of the FORWARD DSA tag that must be set by the CPU are:
• Source device/port or trunk group: This determines the Source Address location if it is learned by the bridge engine.
• VLAN assignment: If set to 0, the packet is assigned a VID by the ingress pipeline.
• User Priority assignment.
• Egress filtering source-ID: This is used to prevent loops in cascaded systems (Section 4.3 "Multi-Target Destination in a Cascaded System" on page 46).
The remaining attributes are assigned by the ingress pipeline.

The packet QoS Profile is assigned by the ingress processing engines when the CPU port is configured with the respective field set to 0 (Section 8. "Quality of Service (QoS)" on page 110).
Note
For further information about the FORWARD DSA tag format, see A.4 "Extended DSA Tag in FORWARD Format" on page 341.
7.3.3 Ethernet Frame Alignment
Typical IP stacks require that IP and TCP/UDP headers be 32-bit aligned. The Ethernet header preceding the IP header is generally 14 bytes long (6-byte MAC DA, 6-byte MAC SA, and a 2-byte EtherType). Ethernet headers may also be 18 bytes (802.1Q tagged), 22 bytes (with LLC/SNAP), or 26 bytes (802.1Q tagged with LLC/SNAP) long. In any case, they are never aligned to a multiple of 32 bits. The device can therefore prepend 2 bytes to all packets forwarded to the host CPU, via the PCI interface or the CPU port MAC, so that the IP and TCP/UDP headers are 32-bit aligned.
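A short, self-contained C check of the arithmetic: none of the four possible Ethernet header lengths is a multiple of 4 bytes, and each becomes a multiple of 4 once 2 bytes are prepended.

```c
#include <assert.h>

int main(void)
{
    /* Untagged, 802.1Q tagged, LLC/SNAP, 802.1Q tagged with LLC/SNAP. */
    const int eth_hdr_len[] = { 14, 18, 22, 26 };

    for (int i = 0; i < 4; i++) {
        assert(eth_hdr_len[i] % 4 != 0);       /* never 32-bit aligned */
        assert((eth_hdr_len[i] + 2) % 4 == 0); /* aligned after 2-byte pad */
    }
    return 0;
}
```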
Configuration
To enable/disable prepending a header of 16 bits to packets forwarded to the host CPU, set the bit in the Cascading and Header Insertion Configuration Register (Table 528 p. 770).
Section 8. Quality of Service (QoS)

This section describes how QoS is implemented in the device. Quality of Service (QoS) provides preferential treatment to specific traffic, possibly at the expense of other traffic. Without QoS, the device offers best-effort service to each packet and transmits packets without any assurance of reliability, delay bounds, or throughput. Implementing QoS in a network makes performance more predictable and bandwidth utilization more effective. The QoS implementation in the device supports the IETF DiffServ and IEEE 802.1p standards.

The typical QoS model is based on the following:
• At the network edge, the packet is assigned to a QoS service. The service is assigned based on the packet header information (i.e., the packet is trusted) or on the ingress port configuration (the packet is not trusted).
• The QoS service defines the packet's internal QoS handling (e.g., traffic class and drop precedence) and optionally the packet's external QoS marking through the 802.1p User Priority and/or the IP header DSCP field.
• Subsequent devices within the network core provide consistent QoS treatment to traffic based on the packet's 802.1p or DSCP marking. As a result, an end-to-end QoS solution is provided.
• A device may modify the assigned service if a packet stream exceeds the configured profile. In this case, the packet may be dropped or reassigned to a lower QoS service.

The device incorporates the QoS features required to implement network-edge as well as network-core switches/routers:
• The device provides flexible mechanisms to classify packets into as many as 72 different services.
• Up to 256 traffic policers may be used to control the maximum rate of specific traffic flows. (SecureSmart devices support four ingress traffic policers per port.)
• The packet header User Priority and/or DSCP may be set to reflect the QoS assignment.
• The service application mechanism is based on eight egress priority queues per port (including the CPU port), on which congestion-avoidance and congestion-resolution policies are applied.
8.1 QoS Model

This section describes the QoS model.

8.1.1 Traffic Types
The device classifies incoming traffic into data, control, and mirror-to-analyzer. Table 29 describes these traffic types.
To assure predictable performance, as well as to simplify configuration, the device uses different mechanisms to assign QoS service to each traffic type. In addition, whenever different traffic types compete for shared resources (e.g., egress queuing resources), the device can be configured to control the degree of cross-interaction between different traffic types, by limiting or completely eliminating resource sharing. See Section 8.4.1 "Traffic Class and Drop Precedence Assignment" for further details on assignment of QoS per traffic type. See Section 15. "Bandwidth Management" on page 302 for further details about configuring queuing resources.
Table 29: Traffic Types

Data
Data packets are defined as either:
• Network-to-Network traffic: Packets received on a cascade port with a DSA tag = FORWARD, or packets received on a network port that are assigned a FORWARD or MIRROR TO CPU command, or
• Data traffic from the CPU: Packets with a DSA tag = FROM_CPU and the DSA tag flag clear.
See Appendix A. "DSA Tag Formats" on page 333 for further details about DSA tag encapsulation formats. Traffic classified as Data is subject to Initial QoS Marking, as described in Section 8.2 "Initial QoS Marking" on page 115.

Control
Control packets are defined as one of the following:
• Packet to the CPU:
  – Packets sent to the CPU due to trapping, mirroring, or forwarding a Data packet to the CPU by one of the ingress pipe or egress pipe engines, or
  – Packets received via a cascading port with a DSA tag = TO_CPU, or
  – CPU-to-CPU traffic: Packets received via a cascading port with a DSA tag = FROM_CPU that are destined to another CPU
• Control packet from the CPU: Packets sent by the CPU with a DSA tag = FROM_CPU and the DSA tag flag set, and that are not targeted to another port.
Traffic classified as Control is not subject to Initial QoS Marking, as described in Section 8.2 "Initial QoS Marking" on page 115, and is not subject to policing, as described in Section 8.3 "Traffic Policing" on page 121. Traffic classified as Control is assigned a TC and DP as described in Section 8.4.1.2 "Control Packet Traffic Class and Drop Precedence Assignment" on page 124.

Mirrored to Analyzer Port
Packets mirrored to an analyzer port are defined as either:
• Packets received on a cascade port with a DSA tag = TO_ANALYZER, or
• Packets that are duplicated for either ingress or egress mirroring to the analyzer port.
An additional condition is that the target analyzer port is NOT the CPU port on the local device, which would cause the packet to be treated as Control. Traffic classified as Mirrored to Analyzer Port is not subject to Initial QoS Marking, as described in Section 8.2 "Initial QoS Marking" on page 115, and is not subject to policing, as described in Section 8.3 "Traffic Policing" on page 121. Traffic classified as Mirrored to Analyzer Port is assigned a TC and DP as described in Section 8.4.1.3 "Mirrored Packet Traffic Class and Drop Precedence Assignment" on page 125.
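The classification in Table 29 can be condensed into a small decision function. The C sketch below is an interpretive model only; the enum values, structure fields, and function are hypothetical and simplify several conditions described in the table.

```c
#include <stdbool.h>

enum dsa_tag_type { DSA_NONE, DSA_FORWARD, DSA_FROM_CPU,
                    DSA_TO_CPU, DSA_TO_ANALYZER };
enum traffic_type { TRAFFIC_DATA, TRAFFIC_CONTROL,
                    TRAFFIC_MIRROR_TO_ANALYZER };

struct pkt_info {
    enum dsa_tag_type tag;
    bool trapped_or_mirrored_to_cpu; /* sent to the CPU by a pipeline engine */
    bool from_cpu_control_flag;      /* FROM_CPU control flag set */
    bool dest_is_another_cpu;        /* FROM_CPU destined to another CPU */
    bool duplicated_to_analyzer;     /* ingress/egress mirror copy */
    bool analyzer_is_local_cpu;      /* analyzer target is the local CPU port */
};

static enum traffic_type classify(const struct pkt_info *p)
{
    if (p->trapped_or_mirrored_to_cpu ||
        p->tag == DSA_TO_CPU ||
        (p->tag == DSA_FROM_CPU &&
         (p->dest_is_another_cpu || p->from_cpu_control_flag)))
        return TRAFFIC_CONTROL;
    if (p->tag == DSA_TO_ANALYZER ||
        (p->duplicated_to_analyzer && !p->analyzer_is_local_cpu))
        return TRAFFIC_MIRROR_TO_ANALYZER;
    return TRAFFIC_DATA;   /* FORWARD traffic and FROM_CPU data */
}
```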
8.1.2 QoS Processing Walkthrough
QoS processing by the device is illustrated in Figure 19. The processing is described in four stages. The first two stages, Ingress QoS Initial Marking and Ingress Traffic Policing and QoS Remarking, are performed in the ingress pipeline; the latter two stages, QoS Enforcement and Packet QoS Marking, are performed in the egress pipeline.
Figure 19: QoS Processing Stages. Data packets pass through Ingress QoS Initial Marking, Ingress Traffic Policing and QoS Remarking, Egress QoS Enforcement, and Egress Setting of Packet Header QoS Fields. Control or Mirrored to Analyzer Port packets bypass the two ingress marking stages.
8.1.2.1 QoS Initial Marking
QoS initial marking associates every packet classified as Data with a set of QoS attributes that determines the QoS processing by subsequent stages. For further details see Section 8.2 "Initial QoS Marking".
Note
Traffic classified as Control or Mirrored to Analyzer Port (see Section 8.1.1 "Traffic Types" on page 110) is not subject to QoS Initial Marking.
8.1.2.2 Ingress Traffic Policing and QoS Remarking

If enabled on a policy-based traffic flow, and if the packet is classified as Data, the policer meters the given flow according to a configurable rate profile and classifies packets as either in-profile or out-of-profile. Out-of-profile packets may be discarded or have their QoS attributes remarked. For further details see Section 8.3 "Traffic Policing".

Note
Traffic classified as Control or Mirrored to Analyzer Port (see Section 8.1.1 "Traffic Types" on page 110) is not subject to Traffic Policing and QoS Remarking.
8.1.2.3 QoS Enforcement
In the device, QoS enforcement utilizes eight priority egress queues per port. Congestion avoidance and congestion resolution techniques are used to provide the required service. For a detailed description of QoS enforcement see Section 8.4 "QoS Enforcement".
8.1.2.4 Setting Packet Header QoS Fields
The device supports setting or modifying the packet header 802.1p User Priority and/or IP-DSCP. For a detailed description see 8.5 "Setting Packet Header QoS Fields".
8.1.3 Packet QoS Attributes
Every packet classified as Data is assigned a set of QoS attributes, which can be modified by each ingress pipeline engine. Each ingress pipeline engine contains several Initial QoS Markers, which assign the packet's initial QoS attributes, as described in Section 8.2 "Initial QoS Marking". The ingress pipeline also contains a QoS Remarker, which can modify the initial QoS attributes, as described in Section 8.3 "Traffic Policing". The packet QoS attributes are defined in the following table.

Table 30: Packet QoS Attributes

QoS Precedence
The device incorporates multiple QoS markers operating in sequence. As a result, a later marker overrides an earlier QoS attribute assignment. By setting the QoS Precedence flag to HARD, a QoS marker can prevent modification of packet QoS attributes by subsequent QoS markers.
0 = SOFT QoS Precedence: Subsequent QoS markers can override the existing packet QoS attributes assigned by a previous marker.
1 = HARD QoS Precedence: Subsequent QoS markers cannot override the existing QoS attributes that were assigned by a previous marker.
NOTE: The traffic policer remarker can modify packet QoS attributes regardless of the QoS Precedence value.

QoS Profile index
Ranges from 0–71. See Section 8.1.4 "QoS Profile".
Table 30: Packet QoS Attributes (Continued)

Modify DSCP
Enable packet DSCP field modification:
0 = Packet DSCP field is not modified when the packet egresses the device.
1 = Packet DSCP field is modified to the value of the QoS Profile entry for the packet's QoS Profile index.
NOTE: This flag is valid only for IPv4/6 packets. See Section 8.5.2 "Setting the Packet IP Header DSCP Field" for a detailed description of packet DSCP field modification rules.

Modify User Priority
Enable packet 802.1p User Priority field modification:
0 = Packet User Priority is preserved when the packet egresses the device.
1 = Packet User Priority field is modified to the value of the QoS Profile entry for the packet's QoS Profile index when the packet egresses the device.
NOTE: This flag is relevant only if a VLAN tag is added or modified when the packet egresses the device. See Section 8.5.1 "Setting the Packet Header 802.1p User Priority Field" for a detailed description of packet User Priority field modification rules.
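The SOFT/HARD precedence rule can be sketched in C as follows. Names are illustrative assumptions, and the policer-remarker exception from the NOTE above is modeled explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-packet QoS attribute assignment under
   SOFT/HARD precedence. */
struct qos_attrs {
    bool    hard_precedence;  /* 0 = SOFT, 1 = HARD */
    uint8_t qos_profile_idx;  /* 0..71 */
    bool    modify_dscp;
    bool    modify_up;
};

static void apply_marker(struct qos_attrs *pkt,
                         const struct qos_attrs *marker,
                         bool is_policer_remarker)
{
    /* An earlier HARD assignment locks the attributes against all
       subsequent markers except the traffic policer remarker. */
    if (pkt->hard_precedence && !is_policer_remarker)
        return;
    *pkt = *marker;
}
```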
Note
Traffic classified as Control or Mirrored to Analyzer Port (see Section 8.1.1 "Traffic Types" on page 110) is not assigned QoS attributes.
8.1.4 QoS Profile

The device supports up to 72 QoS Profiles.
Every packet that is classified as Data is assigned a QoS Profile attribute, which is used by the egress pipeline to apply the QoS service. The QoS Profile index is used as a direct index, ranging from 0 to 71, into the global QoS Profile table. Each entry in the QoS Profile table contains the set of attributes defined in Table 31.

Table 31: QoS Profile Table Entry

TC: Traffic class queue assigned to the packet.
DP: Drop precedence assigned to the packet.
UP: If the packet Modify User Priority QoS attribute is set, and the packet is transmitted tagged, this field is the value used in the packet 802.1p User Priority field. If the packet was received tagged, the existing User Priority is modified with this value.
DSCP: If the packet Modify DSCP QoS attribute is set, and the packet is IPv4 or IPv6, this field is the value used to modify the packet IP-DSCP field.
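As a software analogy, the QoS Profile table can be viewed as a 72-entry array of {TC, DP, UP, DSCP} records, indexed directly by the packet's QoS Profile index. The sketch below uses hypothetical names and is not a device driver interface.

```c
#include <stdint.h>

/* Hypothetical model of the global QoS Profile table (72 entries). */
struct qos_profile_entry {
    uint8_t tc;    /* traffic class queue */
    uint8_t dp;    /* drop precedence */
    uint8_t up;    /* 802.1p User Priority, used if Modify UP is set */
    uint8_t dscp;  /* IP DSCP, used if Modify DSCP is set */
};

static struct qos_profile_entry qos_profile_table[72];

/* The QoS Profile index is used as a direct index (0..71). */
static const struct qos_profile_entry *qos_profile_lookup(uint8_t idx)
{
    return &qos_profile_table[idx % 72];
}
```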
In a cascaded system, the QoS Profile is conveyed between devices through the extended FORWARD DSA tag. To ensure consistent QoS treatment within the system, the application must configure QoS Profile attributes consistently in every device in the system.
Note
Traffic classified as Control or Mirrored to Analyzer Port (see Section 8.1.1 "Traffic Types" on page 110) is not assigned a QoS Profile.
8.2 Initial QoS Marking

Initial QoS marking provides various methods of assigning QoS attributes (Section 8.1.3 "Packet QoS Attributes") to packets classified as Data. The device supports the initial QoS markers described in the following subsections, listed according to their sequential order in the ingress pipeline.

Note
Traffic classified as Control or Mirrored to Analyzer Port (see Section 8.1.1 "Traffic Types" on page 110) is not subject to Initial QoS Marking; thus, the following subsections are not relevant for Control and Mirrored to Analyzer Port traffic.

8.2.1 Port-Based QoS Marking
The port-based QoS marker provides port-based assignment of the following QoS attributes:
• QoS Precedence
• QoS Profile index
• Modify DSCP flag
• Modify User Priority flag
See Section 8.1.3 "Packet QoS Attributes" for descriptions of these fields.
In addition, port-based QoS marking supports a default port 802.1p User Priority assignment.
Note
Applications of the port-based marker, as well as recommended configuration, are described in Section 8.6.1 "Applications of Port-Based QoS Marker" and Section 8.6.2 "QoS Configuration of Cascade Ports".

Assignment of the packet QoS attributes ,