A project assignment within Dataföreningen Kompetens - Certifierad IT-arkitekt Master, K54
CHALLENGES IN ONLINE CHARGING SYSTEM VIRTUALIZATION
Magnus Abrahamsson
This paper examines and describes various virtualization architectures, their challenges for soft real-time systems, and the effects on an Online Charging System (OCS) deployment architecture.
CHALLENGES IN ONLINE CHARGING SYSTEM VIRTUALIZATION
Author: Magnus Abrahamsson, Solution Architect Charging & Rating
E-mail address: [email protected]
Study counselor: Michael Thurell (Dataföreningen)
Date: 2016-02-15
Version: 4.0
1 Abstract
Virtualization technology and cloud computing have revolutionized general-purpose computing applications in the past decade. The cloud paradigm offers advantages through reduced operating costs, server consolidation, flexible system configuration and elastic resource provisioning. However, despite the success of cloud computing for general-purpose computing, existing cloud computing and virtualization technology face tremendous challenges in supporting emerging real-time applications such as online charging, online video streaming, and other telecommunication management. These applications demand real-time performance in open, shared and virtualized computing environments. This paper studies various virtualization architectures to identify the technical challenges in supporting real-time applications therein, focusing on online charging. It also surveys recent advancements in real-time virtualization and cloud computing technology, and research directions to enable cloud-based real-time applications in the future.
Keywords: Virtualization, cloud computing, hypervisor, OCS, real-time, NFVI, charging, distributed system.
TABLE OF CONTENTS

1 Abstract
2 Terminology
3 Introduction
  3.1 Background
  3.2 Concerns
  3.3 Objectives
  3.4 Scope
  3.5 Stakeholders
  3.6 Methods
  3.7 Note to the reader
  3.8 Outline
4 Current situation
5 Stakeholders and requirements
  5.1 Business requirement
  5.2 Non-functional requirements (Quality attributes)
6 Online Charging System
  6.1 Current OCS deployment
  6.2 Service availability classification level
7 Cloud computing and virtualization
  7.1 Virtual machine architectures
  7.2 Virtualization techniques and performance characteristics
  7.3 Know the performance curve
8 Real-time challenges in cloud computing
  8.1 Terminology mapping
9 Network Functions Virtualization
  9.1 NFV framework
  9.2 VNF
  9.3 NFV Management and Orchestration
10 NFV challenges
11 Solution architecture
  11.1 Generic VNF Cluster Architecture
  11.2 Deploying OCS as VNFs
  11.3 Analysis of the solution architecture
12 Future directions
13 Conclusions
14 Discussions
15 References
16 Appendix A: VMware
17 Appendix B: OpenStack – Tracker
18 Appendix C: MANO architecture
19 Appendix D: OCS RTO calculation & Latency
2 Terminology

4G: 4th generation (LTE)
ABMF: Account Balance Management Function
BE: Back-end
BRM: Billing and Revenue Management
BSS: Business Support Systems
CAPEX: Capital Expenditures
CBA: Component Based Architecture
CDR: Charging Data Record
CPU: Central Processing Unit
CRM: Customer Relationship Management
CSP: Communications Service Provider
CTF: Charging Trigger Function
DCCA: Diameter Credit-Control Application
EM: Element Management
EPC: Evolved Packet Core
EPG: Evolved Packet Gateway (Ericsson)
ERP: Enterprise Resource Planning
ETSI: European Telecommunications Standards Institute
FE: Front-end
GGSN: Gateway GPRS Support Node
IaaS: Infrastructure-as-a-Service
IETF: Internet Engineering Task Force
IMS: IP Multimedia Subsystem
IT: Information Technology
KVM: Kernel-based Virtual Machine
LTE: Long-Term Evolution
MMS: Multimedia Messaging Service
MTAS: Multimedia Telephony Application Server
NaaS: Network-as-a-Service
NFV: Network Functions Virtualization
NFVI: NFV Infrastructure
NFV-MANO: NFV Management and Orchestration
NIC: Network Interface Controller
OCF: Online Charging Function
OCS: Online Charging System
OM: Operation & Maintenance
OPEX: Operating Expenditures
OS: Operating System
OSS: Operations Support System
PaaS: Platform-as-a-Service
PCI: Peripheral Component Interconnect
PL: Payload
QoS: Quality of Service
RT: Rating Function
RTO: Retransmission Timeout
SaaS: Software-as-a-Service
SC: System Controllers
SCUR: Session Charging with Unit Reservation
SDP: Service Data Point
SLA: Service Level Agreement
SR-IOV: Single-Root I/O Virtualization
TCO: Total Cost of Ownership
TTM: Time to Market
UDR: Usage Detail Record
vCPU: Virtual CPU
vDC: Virtual Data Center
VF: Virtual Function
VIM: Virtualized Infrastructure Manager
VM: Virtual Machine
VMM: Virtual Machine Monitor (hypervisor)
VNF: Virtualized Network Function
VNFC: VNF Component
VNFM: VNF Manager
vNIC: Virtual NIC
VoLTE: Voice over LTE
VOMS: Voucher Management System
3 Introduction

3.1 Background
The current online charging system (OCS) for TeliaSonera has provided real-time charging successfully since 2006 for three of the Nordic countries. It offers prepaid services for brands like Refill, Halebop, Netcom, Telia DK & Chess. These kinds of systems have high demands on availability and performance, which usually require software installed on proprietary hardware. The rapid pace of innovation for this type of hardware, the OCS dependencies on it, and the drive for cost-efficient solutions have reduced the system life cycle, today resulting in a more or less constant flow of tedious upgrades. The consequences are not limited to increased costs and time consumption; they also risk limiting the business delivery capabilities. As the industry demands faster time to market (TTM) and reduced total cost of ownership (TCO), this is not sustainable in the long run and calls for action. Ideas about migrating the system from today's native, dedicated hardware into a private cloud have been discussed. That would most likely simplify today's life cycle management of the system infrastructure, provide elastic scaling, reduce costs and bring other virtualization benefits. However, there are some doubts on whether a soft real-time system like OCS can provide the same overall quality attributes in a virtual environment, in particular reliability, availability, manageability, security and performance, as the current setup.
3.2 Concerns
Virtualized data centres are currently considered state-of-the-art technology in the Information Technology (IT) domain, while in the telecom domain there are no widespread deployments yet. One key differentiator between the IT and telecom domains is the level of service continuity required. In the IT domain, outages lasting seconds are tolerable and the service user typically initiates retries, whereas in the telecom domain there is an underlying expectation that outages will be below the recognizable level (i.e. in milliseconds) and that service recovery is performed automatically. One of the disruptive technologies emerging in the area of telecom cloud computing and data centre architectures is Network Functions Virtualization (NFV) [1]: a shift from hardware-based provisioning of network functions to software-based provisioning, where so-called virtualized network functions (VNF) [1] are deployed in private or hybrid clouds of communications service providers¹ (CSP) [2]. This raises the questions addressed in this paper: What are the technical challenges in supporting real-time applications like OCS in the cloud? What do the recent NFV architectural framework and cloud computing technology offer? How compliant is the OCS architecture with the Telco-cloud (NFV reference architecture)?

¹ CSP is a broad category encompassing telecommunications, entertainment and media, and Internet/Web service businesses.
3.3 Objectives
The objectives of this paper are to:
… give an overview of the virtualization technology for cloud computing, focusing on the real-time issues that appear therein.
… identify real-time challenges in cloud computing.
… present a subset of selected examples that illustrate some of the design decisions taken by virtualization technology to integrate real-time support.
… give an overview of the OCS architecture and the NFV framework.
… propose a deployment of OCS as VNFs in the Telco-cloud (NFVI) and identify the architectural impact.
… learn more about the areas mentioned above.
3.4 Scope
Because of the limited time, I will focus on OCS's integration points with the network handling real-time charging of IMS services, e.g. voice calls over 4G and messaging. This framework is regarded as future-proof for charging solutions. Online charging of 2G/3G voice calls is out of the scope of this paper. So are top-up, voucher management and balance control. The deployment scenario in this paper will focus on a Monolithic Operator, where the same organization that operates the virtualized network functions also deploys and controls the hardware and hypervisors they run on, and physically secures the premises in which they are located.
3.5 Stakeholders
- IT Solution owner
- IT solution manager
- Solution and Infrastructure architects
- CSP
- Vendor(s)
- Study counselor for the DF IT architecture training
3.6 Methods
- Unstructured interviews with various stakeholders to explore requirements and gain more knowledge.
- System observations, to provide as objective a picture of reality as possible, combined with an inductive approach.
- Literature studies and perusal of available material to gather information about the theories needed to meet the objectives.
3.7 Note to the reader
This paper can be read from start to end, or the reader can pick a section of interest with the help of the outline below.
3.8 Outline
Section 5 overviews and explains various important requirements affecting OCS.
Section 6 offers an overview of the OCS architecture and how it links to the current solution.
Section 7 offers an overview of the virtualization technology for cloud computing.
Section 8 identifies different real-time challenges in cloud computing.
Section 9 gives an overview of the technology and architecture for Network Functions Virtualization.
Section 10 identifies VNF challenges in NFV, with suggested solutions and requirements.
Section 11 proposes a solution architecture for OCS as VNFs in the NFVI.
Section 12 points out open areas of research related to NFV.
Section 13 draws conclusions.
4 Current situation
Besides what is mentioned in the background section, convergent charging has gained momentum during the last years as the IMS technology becomes more widespread. This requires operators to integrate or replace their various types of charging systems, e.g. for fixed, mobile and broadband, with a new common one. The new common charging & rating system is often an OCS or similar. Furthermore, as the company's plans for a new system slowly take shape, the legacy prepaid OCS is facing a major upgrade of its platform software version and hardware during the next year. The company can today provide a private cloud with Platform-as-a-Service (PaaS) and will soon have a "Telco-cloud as Infrastructure-as-a-Service (IaaS)" ready. This gives us different options for the technical implementation. What type of cloud is most suitable based on the Online Charging System requirements? If any? A virtualized context, where a multitude of virtual machines (VMs) share the same physical hardware to provide a plethora of services with highly varying performance requirements to independent customers/end-users, brings many challenges, some of which can be tackled as summarized in this paper.
5 Stakeholders and requirements
This section overviews and explains various important requirements affecting OCS.
5.1 Business requirement
According to the prepaid business policy, customers should be able to use their services even when charging is out of function. A negative customer account balance is prohibited as well, since different user surveys have indicated that it has a negative impact on the customer experience. This policy combination risks revenue loss at any major disturbance related to the online charging service: if the real-time cost control is out of order, the customer usage can potentially be higher than her/his account balance, as the cost control can no longer determine in real time whether the account balance is exhausted and/or cut customer usage. For flat rate (all you can eat) services there is no risk of revenue loss, but for services like Pay As You Go and quota buckets, post-charging of accounts is not allowed to cause an overdraw (negative balance). Other important services related to prepaid are top-up (refill) and balance control. A similar disturbance will affect those services as well: the customer will not be able to check account balances or perform top-ups, e.g. top up additional allowance or activate new services (e.g. 3rd party services like Spotify). Top-up, voucher management and balance control are out of the scope of this paper.
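The reservation-based cost control described above can be sketched as follows. This is a minimal illustration, not the actual OCS implementation: the class and method names are invented for the example. The key property is that units are only granted up to what the current balance can cover, so a working cost control can never produce a negative balance:

```python
class PrepaidAccount:
    """Toy model of reservation-based cost control (invented names)."""

    def __init__(self, balance):
        self.balance = balance   # monetary units on the account
        self.reserved = 0        # units currently reserved, not yet debited

    def reserve(self, requested):
        """Grant at most what the remaining balance can cover."""
        available = self.balance - self.reserved
        granted = min(requested, available)
        self.reserved += granted
        return granted

    def debit(self, used):
        """Debit actual usage; usage can never exceed what was reserved."""
        used = min(used, self.reserved)
        self.balance -= used
        self.reserved -= used
        return self.balance


account = PrepaidAccount(balance=100)
granted = account.reserve(150)   # only 100 can be granted
print(granted)                   # prints 100
print(account.debit(granted))    # prints 0: balance exhausted, never negative
```

When the cost control is out of order, usage is no longer capped by `reserve`, which is exactly the revenue-loss scenario the policy describes.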
5.2 Non-functional requirements (Quality attributes)
Online charging is considered a highly available service, with an availability requirement of 99.98% including planned and unplanned maintenance. The service availability is calculated, in a simplified form, from the ratio between the total amount of network Usage Detail Records (UDRs) and the OCS charging data records (CDRs). The expectation on OCS and its underlying functions is that response times should be predictable, outages should be below the recognizable level (i.e. in milliseconds), and service recovery should be performed automatically.
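As a back-of-the-envelope sketch of the availability figure above (assuming, for the sake of illustration, that availability is measured as the share of network UDRs that resulted in an OCS CDR; the record counts are made up):

```python
def service_availability(ocs_cdrs, network_udrs):
    """Simplified availability: fraction of network usage records (UDRs)
    that resulted in an OCS charging data record (CDR)."""
    return ocs_cdrs / network_udrs

TARGET = 0.9998  # 99.98 %, including planned and unplanned maintenance

availability = service_availability(ocs_cdrs=9_999_100, network_udrs=10_000_000)
print(f"{availability:.4%}", "OK" if availability >= TARGET else "SLA breach")
# prints: 99.9910% OK
```

Note that a 99.98% requirement leaves roughly 105 minutes of accumulated unavailability per year, which is why outages must stay below the recognizable level.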
One of the most important of these performance metrics is the Retransmission Timeout (RTO). The table below shows the figures for the OCS Rc/Re reference point (Figure 3), which is one of many integration points in an end-to-end charging control transaction, e.g. setting up a VoLTE call.
Maximum RTO: 200 ms
Minimum RTO: 80 ms

TABLE 1: MAX/MIN RTO FOR THE RC/RE REFERENCE POINT²
The charging and billing information is to be generated, processed, and transported to a desired conclusion in less than 1 second. The customer will expect the call to be set up in reasonable time, normally 3-4 seconds. Figure 1 below illustrates the expected performance requirements.
FIGURE 1: PERFORMANCE SCENARIO FOR SETTING UP A CALL.
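The timing constraints above can be combined into a simple budget check. This is a sketch: only the 200 ms RTO ceiling, the 1-second charging bound and the 3-4 second setup expectation come from the text, while the breakdown of the setup time into steps is illustrative.

```python
MAX_RTO_S = 0.200          # max retransmission timeout on Rc/Re (Table 1)
CHARGING_BUDGET_S = 1.0    # charging info generated, processed, transported
CALL_SETUP_BUDGET_S = 4.0  # customer expectation, normally 3-4 seconds

def within_budget(charging_latency_s, other_setup_steps_s):
    """Check one charging-control transaction against both budgets."""
    assert charging_latency_s <= CHARGING_BUDGET_S, "charging too slow"
    total = charging_latency_s + other_setup_steps_s
    assert total <= CALL_SETUP_BUDGET_S, "call setup too slow"
    return total

# e.g. three worst-case Rc/Re round trips plus the rest of the setup chain
print(within_budget(charging_latency_s=3 * MAX_RTO_S, other_setup_steps_s=2.5))
# prints 3.1
```

The point of the check is that charging latency is only one slice of the end-to-end setup budget, so any virtualization-induced jitter eats directly into the margin left for the other steps.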
6 Online Charging System
This section offers an overview of the OCS architecture, its different components and how they interact, with focus on the real-time issues that appear therein. An OCS allows a Communications Service Provider (CSP) to charge its customers in real time (RT) based on service usage. It is a soft RT application that demands a certain degree of service level agreement in terms of real-time performance, but does not require hard real-time performance guarantees. The OCS has integrations both with the business support systems (BSS), marked green in Figure 2, and with the network layer. The OCS integration with BSS is out of the scope of this paper.
² Statistics from the DCCA server showing max response time for CCR-I and number of PADS, initial statistics from the current OCS solution. See Appendix D: OCS RTO calculation & Latency for more details.
FIGURE 2: SIMPLE DEPLOYMENT VIEW OF THE ONLINE CHARGING STACK (billing, CRM and ERP systems on top of online charging, which sits on the network layer).
The 3rd Generation Partnership Project (3GPP) charging architecture divides OCS into three main functions: the Online Charging Function (OCF), the Account Balance Management Function (ABMF) and finally the Rating Function (RF). The different functions are described below:
The Online Charging Function (OCF) consists of two distinct charging modules, namely the Session Based Charging Function (SBCF) and the Event Based Charging Function (EBCF). Charging events are forwarded from the CTF, outside of OCS, to the OCF in order to obtain authorization for the chargeable event and/or network resource usage³ requested by the end user. The OCF communicates with the Rating Function in order to determine the value of the requested bearer resources or session, and then with the Account Balance Management Function to query and update the subscriber's account and counter status [3]. The Account Balance Management Function (ABMF) is the location of the subscriber's account balance within the OCS, e.g. recharged/top-up money or account counters like free gigabytes, free calls/messages etc. The Rating Function (RF) determines the value of the network resource usage on behalf of the Online Charging Function. To this end, the Online Charging Function furnishes the necessary information, obtained from the charging event, to the RF and receives in return the rating output (monetary or non-monetary units) via the Re reference point. The RF may handle a wide variety of ratable instances [3].
The Charging Trigger Function (CTF)⁴ is outside of OCS and is able to delay the actual resource usage until permission has been granted by the OCS. It also tracks the availability of resource usage permission ("quota supervision") during the network resource usage, and can enforce termination of the end user's network resource usage when permission by the OCS is not granted or expires. From the online charging architecture perspective, the IMS gateway function (GWF), the Gateway GPRS Support Node (GGSN) and the Telephony Application Server (TAS, called MTAS in this paper) are examples of online-charging-capable CTFs. The Ro reference point supports interaction between a Charging Trigger Function and an Online Charging Function. The information flowing across this reference point consists of online charging events. The protocol crossing this reference point, called the Diameter Credit-Control Application
³ Typical examples of network resource usage are a voice call of a certain duration, the transport of a certain volume of data, or the submission of an MMS of a certain size.

⁴ Parts of the network provide functions that implement offline and/or online charging mechanisms on the bearer (e.g. EPC), subsystem (e.g. IMS) and service (e.g. MMS) levels.
(DCCA) [4], supports real-time transactions in stateless mode ("event based charging") and stateful mode ("session based charging") [3]. Figure 3 below provides an overview of the online parts in a common charging architecture.

FIGURE 3: COMPUTATIONAL VIEW OF A UBIQUITOUS ONLINE CHARGING ARCHITECTURE (the CTF in the service element of the CN domain/subsystem connects over Ro/CAP to the OCF in the OCS, which uses the Rc and Re reference points towards the ABMF and the RT respectively).
Figure 4 shows the transactions required on the Ro reference point in order to perform Session Charging with Unit Reservation (SCUR), e.g. for an IMS voice call or a data session, between the CTF and the OCS:

Reserve Units operation:
1. Session Request
2. CCR[INITIAL_REQUEST, RSU]
3. Perform Charging Control
4. CCA[INITIAL_REQUEST, GSU, [VT]]
5. Session Delivery

Reserve Units and Debit Units operations:
6. CCR[UPDATE_REQUEST, RSU, USU]
7. Perform Charging Control
8. CCA[UPDATE_REQUEST, GSU, [FUI]]
9. Session Delivery

Debit Units operation:
10. Session Terminate
11. CCR[TERMINATION_REQUEST, USU]
12. Perform Charging Control
13. CCA[TERMINATION_REQUEST, CI]

FIGURE 4: LOGICAL VIEW OF SCUR - SESSION BASED CREDIT-CONTROL.
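The SCUR flow above can be sketched as a credit-control loop seen from the CTF side. This is a schematic, not a real Diameter/DCCA stack: the message names follow the figure (CCR/CCA, RSU/GSU/USU for Requested/Granted/Used Service Units), while the class, its simplified balance handling, and the numbers are invented for illustration:

```python
class ScurSession:
    """Toy view of one SCUR session against a toy OCS (invented names,
    not a real DCCA implementation; reservations are deliberately
    simplified and do not lock the balance)."""

    def __init__(self, ocs_balance):
        self.balance = ocs_balance   # stands in for the ABMF account balance

    def charging_control(self, request_type, rsu=0, usu=0):
        """One CCR/CCA round trip: debit used units, grant up to the balance."""
        self.balance -= usu            # Debit Units operation
        gsu = min(rsu, self.balance)   # Reserve Units operation
        return {"type": request_type, "GSU": gsu}


s = ScurSession(ocs_balance=300)
s.charging_control("INITIAL_REQUEST", rsu=100)           # steps 2-4: first quota
s.charging_control("UPDATE_REQUEST", rsu=100, usu=100)   # steps 6-8: report + renew
cca = s.charging_control("TERMINATION_REQUEST", usu=80)  # steps 11-13: final usage
print(cca)  # prints {'type': 'TERMINATION_REQUEST', 'GSU': 0}
```

Each round trip in this loop is subject to the Rc/Re RTO figures in Table 1, which is why virtualization-induced latency on any single hop multiplies across the session.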
6.1 Current OCS deployment
OCS has three tiers: a Front-End (FE) tier, a Back-End (BE) tier, and finally a database tier (also known as SDP). The Online Charging Function has been split into client/server components, where the OCF client is deployed in the FE tier and the OCF server in the BE tier. The FE is considered almost stateless⁵, with no data-centric parts, while the BE/SDP (from now on only referred to as BE) is stateful and therefore has higher demands on transaction reliability and security. The BE also has to relate to the CAP theorem [5], which in short states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: data consistency, availability, and partition tolerance. This mainly affects the account balance management function (ABMF), where there are high demands on the consistency of the account balances at all times. The BE has a geo-redundant Active/Standby high availability/disaster recovery model. The SDP is a relational database. The FE is deployed geo-redundant Active/Active with support for horizontal scalability of the application layer.
FIGURE 5: CURRENT OCS DEPLOYMENT VIEW WITH 3GPP OCS ARCHITECTURE MAPPING (the EPG and MTAS DCCA clients in the CTF connect over Gy and Ro to the DCCA server and OCF client in the Front-End, which connects to the OCF server, ABMF and RT in the Back-End over Rc/Re, backed by the Customer and Rating data in the BE-SDP).
Front-End (FE): has two components, the DCCA server and the OCF client. The DCCA server communicates externally with the CTF DCCA client via the Ro (or Gy) integration point and internally with the OCF client, which in turn communicates externally with the OCF server.
Back-End (BE): has three components, the OCF server, the ABMF and the RT. The BE can have one or many FEs connected. The OCF server communicates externally with the OCF client(s), and internally with the ABMF and RT. It also contains a Voucher Manager (VOM) component, but that is outside the scope of this paper.
BE-Service Data Point (SDP): holds the Customer and Rating information in a relational database.
⁵ Stateless during normal operation, stateful in case the front-end can't do online charging against the back-end. The front-end then stores transactions and post-charges them once the connection to the back-end is established again.
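The store-and-forward behaviour described in the footnote can be sketched as follows. This is a simplified illustration with invented names; the real front-end logic is of course more involved (and post-charging is precisely where the no-negative-balance policy can be violated):

```python
from collections import deque

class BackEnd:
    """Toy back-end that records charged transactions."""
    def __init__(self):
        self.up = True
        self.charged = []

    def charge(self, txn):
        self.charged.append(txn)


class FrontEnd:
    """Toy front-end: charge online while the back-end is reachable,
    otherwise store the transaction and post-charge on reconnect."""

    def __init__(self, backend):
        self.backend = backend
        self.pending = deque()   # the only state kept while the BE is down

    def charge(self, txn):
        if self.backend.up:
            self.backend.charge(txn)     # normal, stateless operation
        else:
            self.pending.append(txn)     # become stateful: store for later

    def on_backend_reconnected(self):
        while self.pending:              # post-charge stored transactions
            self.backend.charge(self.pending.popleft())


be = BackEnd()
fe = FrontEnd(be)
fe.charge("call-1")            # charged online
be.up = False
fe.charge("call-2")            # stored, not charged
be.up = True
fe.on_backend_reconnected()    # post-charged in arrival order
print(be.charged)              # prints ['call-1', 'call-2']
```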
6.2 Service availability classification level
As for any service, online charging needs to ensure the availability of its part of the end-to-end chain. Besides the availability of the service itself, the impact of failures is the second important factor for service continuity for the operator. I have identified OCS to have a service availability classification level of 2B [6], because of the real-time network dependencies needed to provide continuous revenue. Although OCS does not demand hard real-time guarantees, it requires that the underlying virtualization layer supports low latency and provides adequate computational resources for completion within a reasonable and predictable timeframe. In the next section, we will overview the different virtualization technologies for cloud computing, to assist in the choice of infrastructure.
7 Cloud computing and virtualization
This section offers an overview of the virtualization technology for cloud computing, focusing on the real-time issues that appear therein. The cloud computing paradigm provides a wide range of benefits through the presence of enablers, i.e. the virtualization technologies. It is seen as an efficient solution for optimal use of hardware, improved reliability and security. It has transformed the thinking from physical to logical, treating IT resources as logical resources rather than separate physical resources. In simple words, it is a technology that introduces a software abstraction layer between the underlying hardware, i.e. the physical platform/host, and the operating system(s) (OS), e.g. guest virtual machine(s) (VM) including the applications running on top of them. This software abstraction layer is known as the virtual machine monitor (VMM) or hypervisor. In practice, the specific virtualization type is always a trade-off among the proximity of control and access to the actual underlying hardware platform, the performance offered to the hosted software, and the flexibility of development and deployment. But to what extent, and what can we do about it?
7.1 Virtual machine architectures
Before describing the different types of virtualization covered in this section, it is necessary to explain the architecture of virtual machines, which is, in fact, closely related to the architecture of a hardware platform.
7.1.1 Hypervisor types
The design and implementation of the hypervisor is of highest importance because it has a direct influence on the throughput of VMs, which can be very close to that of the bare machine hardware. Hypervisors are the primary technology of choice for system virtualization since they make it possible to achieve high flexibility in how virtual resources are defined and managed. The replication provided by hypervisors is achieved by partitioning and/or virtualizing platform resources. There are two main types of hypervisors (Figure 6):
FIGURE 6: TYPE 1 (BARE METAL) HYPERVISORS VERSUS TYPE 2 HYPERVISORS (in both cases, applications and their binaries/libraries run on guest OSes inside VMs on top of the hypervisor; a type 1 hypervisor runs directly on the hardware, while a type 2 hypervisor runs on top of a host OS).
Bare metal or type 1 hypervisors run directly on the physical hardware platform. A type 1 hypervisor virtualizes the critical hardware devices, offering several independent isolated partitions, and provides basic services for inter-partition control and communication. Type 2 or hosted hypervisors run on top of an operating system that acts as a host. These are hosted hypervisors since they run within a conventional operating system environment. The hypervisor layer is typically a differentiated software level on top of the host operating system (which runs directly above the hardware), and the guest operating system runs at yet another level.
Note: the distinction between these two types is not necessarily clear. For example, Linux's Kernel-based Virtual Machine (KVM), which effectively converts the host operating system into a type 1 hypervisor, can also be categorized as a type 2 hypervisor.
Conclusion: Type 1 hypervisors are suitable for real-time systems since their VMs are close to the hardware and able to use hardware resources directly rather than going through an operating system. A type 1 hypervisor is more efficient than a hosted architecture and delivers greater scalability, robustness and performance. Therefore, type 2 hypervisors are not in the scope of this paper.
Bare-metal hypervisors are divided into two subcategories: monolithic and micro-kernelized designs. The difference between them is the way they deal with device drivers. For example, Hyper-V (Microsoft) and Xen (open source) are micro-kernelized hypervisors which leverage para-virtualization together with full virtualization, while VMware ESXi is a monolithic hypervisor which leverages hardware emulation [7].
7.1.2 Monolithic vs Microkernelized
A monolithic hypervisor (Figure 7), such as VMware ESXi, handles all the hardware access for each VM. It hosts drivers for all the hardware (storage, network, and input devices) that the VMs need to access [7].
The biggest advantage of this design is that it does not require a host operating system. The hypervisor acts as the operating platform with drivers, so it is easily possible to run multiple operating systems, even heterogeneous ones, on the same hardware.
Then, the hypervisor takes care of the necessary network emulation, in order to let the VMs communicate with the outside world and with each other.
The drawbacks are limited hardware support and instability - the device drivers are directly incorporated into the layers of functionality, which means that if one driver is hit by an update, bug, or security vulnerability, the entire virtual architecture within that physical machine will be compromised [8].
FIGURE 7: MICROKERNELIZED VS MONOLITHIC HYPERVISOR
In contrast, the micro-kernelized design (Figure 7), e.g. Hyper-V Server and Xen, does not require the device drivers to be part of the hypervisor layer. Instead, drivers for the physical hardware are installed in the operating system running in the parent partition (VM) of the hypervisor [9]. This means there is no need to install physical-hardware device drivers for each guest operating system running as a child partition; when these guest operating systems need to access physical hardware resources on the host computer, they simply do so by communicating through the parent partition [10]. This communication can go via a very fast memory-based bus if the child partition is para-virtualized, or use the emulated devices provided by the parent partition in the case of full virtualization. The micro-kernelized design is out of scope for this paper. However, it is worth mentioning that studies have shown that Xen's predictability is very close to the bare machine (non-virtualized hardware), which makes it a candidate soft real-time hypervisor [11].
7.2 Virtualization techniques and performance characteristics
The various virtualization techniques are used in slightly different ways, and even with some semantic overlap. In this section, VMware ESXi (Elastic Sky X Integrated) is put forward and summarized, and references to its different performance characteristics are given.
7.2.1 Full virtualization
Full virtualization allows running unmodified guest operating systems by fully emulating the hardware they believe to be running on top of, e.g., network adapters and other peripherals. This way, it is easily possible to run multiple operating systems, even heterogeneous ones, on the same hardware. The hypervisor then takes care of the necessary network emulation, in order to let the VMs communicate with the outside world and with each other.
We have to take a closer look at VMware ESXi, as it is a mainstream vendor's product and the hypervisor chosen by the company. ESXi is a type 1 monolithic hypervisor. It is an operating system-independent hypervisor based on the VMkernel OS, interfacing with agents and approved third-party modules that run atop it. VMkernel is a POSIX-like (Unix-style) operating system developed by VMware which provides functionality similar to that found in other operating systems, such as process creation and control, signals, a file system, and process threads.
FIGURE 8: ARCHITECTURE OF VMWARE ESXI
It is designed specifically to support running multiple virtual machines and provides core functionality such as resource scheduling, I/O stacks, and device drivers [12].
A full virtualization environment like ESXi causes continuous traps to the hypervisor, which intercepts the special privileged instructions that a guest operating system kernel believes it executes, but whose effect is actually solely emulated by the hypervisor. Examples of such privileged instructions are accesses to peripheral registers. The implied performance issues may be mitigated by resorting to hardware-assisted virtualization [13].
7.2.2 Hardware-assisted virtualization
Hardware-assisted virtualization is a technique where the hardware provides additional features that speed up the execution of VMs. For example, an additional hardware translation layer from virtualized to physical memory addresses enables unmodified guest operating systems to manipulate their page tables without trapping; this is provided by the VT-x technologies from Intel [13] or IOMMU from AMD, among others. With these modifications, VMs actually manipulate only virtual page tables, while the physical page tables are under the control of the hypervisor. Another noteworthy example is a physical network adapter capable of behaving as multiple logical ones, each with its own MAC address and send/receive queues, to be used by different VMs running on the same physical system. The Single-Root I/O Virtualization (SR-IOV) technology from Intel [14] goes beyond network peripherals, extending the concept to any PCIe peripheral, see Figure 9. These functions may be correlated to the three NFVI domains and subsequently address compute, network and storage acceleration.
FIGURE 9: NATIVELY AND SOFTWARE SHARED
The biggest advantages of this design are improved I/O throughput, reduced CPU utilization, lower latency, and improved scalability.
This design's drawbacks are reduced portability (live migration is not possible) and reduced features for VM configuration and patching. But probably the major problem is the lack of full support for SR-IOV in management and orchestration solutions for NFV.
7.2.3 Para-virtualization
It was the open-source Xen hypervisor that introduced the concept of para-virtualization, which is also used by Hyper-V (MS product name Enlightened [15]). It is a technique by which the guest operating system is modified so as to be aware that it is running within a VM. This avoids the wasteful emulation of virtualized hardware. Rather, the modified kernel and drivers of the guest operating system are capable of performing direct calls to the hypervisor (also known as hypercalls) whenever needed6. The evolution of hardware-assisted virtualization, coupled with para-virtualization techniques, allows virtualized applications nowadays to achieve an average performance only slightly below the native one.
Para-virtualization is more suitable for real-time cloud computing since it identifies the specific components of an operating system that have to be virtualized in order to optimize performance.
The guest OS must be tailored specifically to run on top of the virtual machine monitor (VMM), which eliminates the need for the virtual machine to trap privileged instructions. Trapping, a means of handling unexpected or unallowable conditions, can be time-consuming and can adversely impact performance in systems that employ full virtualization.
6
The performance advantages of para-virtualization over full-virtualization have to face the weakened isolation from a security perspective, but these aspects are outside the scope of this paper.
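The cost difference between trap-and-emulate and batched hypercalls can be illustrated with a toy model. All per-event costs and the batch size below are illustrative assumptions, not measurements from any hypervisor:

```python
# Toy cost model contrasting full virtualization (one trap per privileged
# operation) with para-virtualization (one hypercall per batch of operations).
# TRAP_COST_US, HYPERCALL_COST_US and BATCH_SIZE are assumed, illustrative values.

TRAP_COST_US = 1.0       # assumed cost of one trap-and-emulate round trip
HYPERCALL_COST_US = 1.2  # assumed cost of one explicit hypercall
BATCH_SIZE = 32          # operations an aware guest batches per hypercall

def full_virt_overhead(ops: int) -> float:
    """Every privileged operation traps to the hypervisor."""
    return ops * TRAP_COST_US

def para_virt_overhead(ops: int) -> float:
    """Operations are batched into hypercalls by the modified guest."""
    calls = -(-ops // BATCH_SIZE)  # ceiling division
    return calls * HYPERCALL_COST_US

if __name__ == "__main__":
    ops = 10_000
    print(f"full virtualization : {full_virt_overhead(ops):.0f} us")
    print(f"para-virtualization : {para_virt_overhead(ops):.0f} us")
```

Even with a slightly more expensive hypercall, batching makes the para-virtualized path far cheaper in this model, which is the intuition behind the performance gap discussed above.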
7.2.4 Container-based virtualization
A lightweight alternative to hypervisors is container-based virtualization, also called operating system-level virtualization; sometimes it is not really talked about as virtualization at all. But as container-based virtualization can have advantages in terms of performance and scalability, we will take a closer look at it.
FIGURE 10: CONTAINER-BASED VIRTUALIZATION VERSUS HYPERVISOR-BASED VIRTUALIZATION (TYPE 1)
Containers provide operating system-level virtualization through a virtual environment that has its own process and network space, usually imposing little to no overhead compared to creating a full-fledged VM. Each container is isolated but must share the same OS and, where appropriate, libs/bins.
The separation can improve security and hardware independence, and adds resource management features [16].
However, operating system-level virtualization is not as flexible as other virtualization approaches. For example, with Linux, different distributions are fine, but other operating systems such as Windows cannot be hosted [17]. Note: The Internet Engineering Task Force (IETF) has an ongoing analysis of the challenges of using VMs for NFV workloads and of how containers can potentially address these challenges [18].
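Because containers share the host kernel, a process can often tell that it runs inside one by inspecting its cgroup membership. The sketch below is a heuristic only (cgroup layouts vary by runtime and kernel version), and the sample inputs are abbreviated, illustrative copies of typical `/proc/1/cgroup` content:

```python
# Heuristic sketch: detect container execution from /proc/1/cgroup content.
# The marker list is an assumption covering common runtimes; it is not a
# guaranteed or exhaustive check.

def looks_containerized(proc_cgroup_text: str) -> bool:
    """Return True if the cgroup text hints at a known container runtime."""
    markers = ("docker", "lxc", "kubepods", "containerd")
    return any(m in line
               for line in proc_cgroup_text.splitlines()
               for m in markers)

# Illustrative sample inputs (not read from the live system):
host_sample = "1:name=systemd:/init.scope"
docker_sample = "1:name=systemd:/docker/0123abcd"

if __name__ == "__main__":
    print(looks_containerized(host_sample), looks_containerized(docker_sample))
```

This also illustrates the flexibility limit noted above: the check only makes sense because guest and host see the very same kernel interfaces.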
7.3 Know the performance curve
In the world of networking there is an old adage: "If you have a network problem, throw bandwidth at it." This type of statement is usually true, in that more bandwidth will make a network problem go away. A similar saying exists in compute environments: increase CPU (processing power) if there are application issues. When applying this to VNF performance, it is important to apply the right number of resources for the desired performance. Most VNF vendors provide guidelines on performance from a minimum to a maximum resource allocation.
FIGURE 11: CPS VS VCPU CORES PERFORMANCE CURVE7
7
Using Ixia's IxLoad to test HTTP connections/sec, increasing from 2 vCPU to 6 vCPU achieves significant performance improvements. When moving to 8 or beyond, the performance flattens or can even decrease.
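A flattening-then-declining curve of this kind is commonly modeled with Gunther's Universal Scalability Law. The coefficients below are illustrative assumptions chosen so the curve peaks in the 6-8 vCPU region, echoing the IxLoad observation; they are not measured values:

```python
# Sketch of a VNF performance curve using the Universal Scalability Law:
#   throughput(N) = N / (1 + sigma*(N-1) + kappa*N*(N-1))
# sigma models contention, kappa models coherency cost. Both values are
# illustrative assumptions, not fitted coefficients.

def usl_throughput(n_vcpu: int, sigma: float = 0.05, kappa: float = 0.02) -> float:
    """Relative throughput for a VM with n_vcpu cores under the USL model."""
    return n_vcpu / (1 + sigma * (n_vcpu - 1) + kappa * n_vcpu * (n_vcpu - 1))

if __name__ == "__main__":
    for n in (2, 4, 6, 8, 10):
        print(f"{n:2d} vCPU -> relative throughput {usl_throughput(n):.2f}")
```

With these assumed coefficients, throughput rises up to 6 vCPUs and then falls slightly at 8 and 10, reproducing the shape of the performance curve described above.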
8 Real-time challenges in cloud computing
This section will identify some of the real-time challenges in cloud computing.
8.1 Terminology mapping
In essence, both the real-time and cloud computing communities have the goal of guaranteeing the assigned resources or levels of contracted services between users and the execution platform, or between VMs and the platform. However, what it means to a cloud computing person (typically coming from the fields of networking, distributed systems, or operating systems) may differ somewhat from the idea conceived from a real-time systems point of view. Therefore, a clarification of concepts and terminology mapping is proposed in this section, as shown in Table 2.

TABLE 2 - TERMINOLOGY MAPPING FOR THE DOMAINS OF CLOUD COMPUTING AND REAL-TIME.

Multi-tenancy (cloud): Avoidance of interference of multiple workloads in the same platform.
Spatial and temporal isolation (real-time): Applications must not interfere either on the processor resource (to avoid deadline misses) or on the usage of memory (to avoid synchronization delays that may cause deadline misses).

Dynamic provisioning (cloud): Autonomous configuration, allocation, and deployment of virtual cluster resources.
Dynamic resource management (real-time): Resource managers are typically implemented close to (and even as part of) the real-time operating system. They include the needed policies to temporarily scale resource assignments vertically (up/down) if it does not cause any temporal or spatial conflict.

Service Level Agreements (SLA) (cloud): Guaranteed QoS levels for applications, including the computational resources and networking performance.
Resource contracts and budget enforcement (real-time): Applications negotiate their required resource assignments with the resource managers. Resource managers have admission policies that are able to determine whether a specific assignment can be fulfilled. If the contract is set, the system will guarantee it at all times.

QoS guarantees (cloud): Quality of service with respect to network performance parameters that guarantee the level of service for multiple users.
Temporal guarantees (real-time): Timeliness of execution is required in real-time systems. Results, i.e., either operations or communications, must fulfill specific deadlines. QoS is also considered as trading off assigned resources for the quality of the output.

In addition, redundant infrastructures should be placed in different locations. This enables a VNF instance to be implemented on different physical resources, e.g. compute resources and hypervisors, and/or be geographically dispersed, as long as its overall end-to-end service performance and other policy constraints are met.
In this vein, one may think of the increasing importance of network functions virtualization, which we take a closer look at in the next section. Still, there is an increasing need for controlling the temporal behavior of virtualized software, making its behavior more predictable.
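The resource-contract and admission-policy concept from the terminology mapping can be sketched as a minimal resource manager. Class and parameter names, and the percentage budgets, are illustrative assumptions:

```python
# Minimal sketch of "resource contracts and budget enforcement": an
# admission-controlled resource manager accepts a VM's CPU budget request
# only if the total stays within platform capacity, so every admitted
# contract can be guaranteed at all times. All names/numbers are illustrative.

class ResourceManager:
    def __init__(self, capacity_pct: int):
        self.capacity = capacity_pct
        self.contracts: dict[str, int] = {}

    def request(self, vm: str, cpu_budget_pct: int) -> bool:
        """Admission policy: accept only if capacity still allows it."""
        if sum(self.contracts.values()) + cpu_budget_pct > self.capacity:
            return False          # rejected: contract cannot be guaranteed
        self.contracts[vm] = cpu_budget_pct
        return True               # admitted: budget is now enforced

rm = ResourceManager(capacity_pct=100)
assert rm.request("ocs-fe", 40)
assert rm.request("ocs-be", 50)
assert not rm.request("best-effort", 20)  # would overcommit: rejected
```

The key property is that rejection happens at negotiation time, not at runtime, which is what distinguishes a real-time resource contract from a best-effort cloud SLA.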
9 Network Functions Virtualization
This section offers an overview of the technology and architecture for network function virtualization. In the telecommunication industry, there is a major shift from hardware-based provisioning of network functions to a software-based provisioning paradigm where virtualized network functions are deployed in private or hybrid clouds of CSPs [2]. Network function virtualization (NFV) technology has been developed by the European Telecommunications Standards Institute (ETSI) and was unveiled in 2012. ETSI NFV aims to define an architecture and a set of interfaces so that physical network functions, like routers, firewalls, content distribution networks (CDNs) and Telco applications, can be transformed: from software applications designed to run on specific dedicated hardware into decoupled applications – called virtual network functions (VNF) – deployed on VMs or containers, on commercial off-the-shelf (COTS) equipment. The VMs are deployed on high-volume servers which can be located in datacenters, at network nodes, and in end-user facilities. Most VMs provide on-demand computing resources using the cloud. Cloud-computing services are offered in various formats [19]: IaaS, PaaS, Software-as-a-Service (SaaS), and Network-as-a-Service (NaaS). There is no agreement on a standard definition of NaaS; however, it is often considered to be provided under IaaS. The NFV technology takes advantage of infrastructure and networking services (IaaS and NaaS), see Figure 12, to form the network function virtualization infrastructure (NFVI) [20], also called the Telco-cloud.
FIGURE 12: MAPPING IAAS AND NAAS WITHIN THE NFV INFRASTRUCTURE
As mentioned above, there are different service offerings in cloud computing. Where the cloud stack is split between tenant and provider defines the type of service offering (X-as-a-Service). For traditional IT systems the whole stack is managed by the owner, whereas in e.g. PaaS the tenant is responsible for the top layers (data and application) and the PaaS provider manages everything below. Compare this with the Telco-cloud (IaaS), where the tenant (for the VNF) is responsible for the OS8 layer and above, and the IaaS provider manages everything below. Figure 13 below illustrates how the different offerings of cloud computing services are divided and managed.
8
This can vary; in some cases the OS stack is split in two, as in the figure below.
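The tenant/provider split can be expressed as a simple boundary in a layered stack. Layer names and split points below are a simplified illustration of the usual X-as-a-Service layering, not a quote from any standard:

```python
# Sketch of how each cloud service model draws the tenant/provider boundary
# at a different layer of the stack. Layer names and boundaries are
# simplified, illustrative assumptions.

STACK = ["data", "application", "runtime", "os", "virtualization",
         "servers", "storage", "networking"]

# Index of the first layer managed by the provider (and everything below it).
SPLIT = {
    "traditional": len(STACK),              # owner manages the whole stack
    "iaas": STACK.index("virtualization"),  # Telco-cloud case
    "paas": STACK.index("runtime"),
}

def tenant_layers(model: str) -> list[str]:
    """Layers the tenant is responsible for under a given service model."""
    return STACK[:SPLIT[model]]

assert "os" in tenant_layers("iaas")      # Telco-cloud tenant manages the OS
assert "os" not in tenant_layers("paas")  # PaaS provider manages the OS
```

This captures the point made above: the VNF tenant in an IaaS Telco-cloud still owns the OS layer and everything above it.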
FIGURE 13: MANAGEMENT OF DIFFERENT CLOUD COMPUTING SERVICE OFFERINGS
9.1 NFV framework
The ETSI NFV group has defined the NFV architectural framework at the functional level using functional entities and reference points, without any indication of a specific implementation. The functional entities of the architectural framework and the reference points are listed and defined in [21] and shown in Figure 14.
FIGURE 14: NFV REFERENCE ARCHITECTURAL FRAMEWORK
9.2 VNF
A Virtualized Network Function (VNF) is a functional element of the NFV architecture framework as represented in Figure 14 above. A VNF could e.g. be a Telco application like an OCS or IMS component. Note: When designing and developing the software that provides the VNF, VNF Providers may structure that software into software components (implementation view of the software architecture) and package those components into one or more images (deployment view of the software architecture). In the following, these VNF Provider-defined software components are called VNF Components (VNFCs).
The VNF design patterns and properties are described in [22], but are briefly described next. The goal is to capture all practically relevant points in the design space.
9.2.1 VNF Internal Structure
VNFs are implemented with one or more VNFCs, and it is assumed, without loss of generality, that VNFC instances map 1:1 (Figure 15) to the NFVI Virtualized Container interface (Vn-Nf, see Figure 14). A VNFC in this case is a software entity deployed in a virtualization container, see Figure 16. A VNF realized by a set of one or more VNFCs appears to the outside as a single, integrated system, illustrated as circles in the figure.
FIGURE 15: VNF INSTANTIATION
Parallelizable: Instantiated multiple times per VNF instance, but with constraints on the number. Non-Parallelizable: Instantiated once per VNF.
FIGURE 16: VNF COMPOSITION
Remark: Virtualization Containers in a VNF context are not necessarily similar to those in section 7.2.4.
9.2.2 VNF states
Each VNFC of a VNF can be either stateless or stateful.
Statefulness creates another level of complexity: e.g. session (transaction) consistency has to be preserved and has to be taken into account in procedures such as load balancing. The data repository holding the externalized state may itself be a stateful VNFC in the same VNF, or it may be an external VNF.
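Externalizing session state to a repository is what lets a front-end VNFC remain stateless. The sketch below uses an in-memory dict as a stand-in for the data-repository VNFC; all class and field names are illustrative, not taken from any product:

```python
# Sketch of state externalization: the charging front-end keeps no session
# data itself and instead reads/writes a shared repository, so any FE
# instance can serve any session (supports load balancing and failover).
# The dict-backed repository and all names are illustrative assumptions.

class StateRepository:
    """Stand-in for the stateful data-repository VNFC (or external VNF)."""
    def __init__(self):
        self._sessions = {}

    def load(self, session_id: str) -> dict:
        return self._sessions.get(session_id, {"reserved": 0})

    def store(self, session_id: str, state: dict) -> None:
        self._sessions[session_id] = state

class StatelessFrontEnd:
    def __init__(self, repo: StateRepository):
        self.repo = repo

    def reserve(self, session_id: str, units: int) -> int:
        state = self.repo.load(session_id)   # pull state per request
        state["reserved"] += units
        self.repo.store(session_id, state)   # push state back
        return state["reserved"]

repo = StateRepository()
fe_a, fe_b = StatelessFrontEnd(repo), StatelessFrontEnd(repo)
fe_a.reserve("sess-1", 10)
assert fe_b.reserve("sess-1", 5) == 15  # any FE instance sees the same session
```

A real deployment would replace the dict with a replicated store and handle concurrent updates, which is exactly where the session-consistency complexity mentioned above appears.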
9.2.3 VNF Scaling
A VNF can be scaled by scaling one or more of its VNFCs, depending on the type of workload. Three basic models of VNF scaling have been identified:
Auto scaling:
- The VNFM triggers the scaling of the VNF according to the rules defined for the VNF in the VNFM. For the VNFM to trigger the scaling, the VNF instance's state has to be monitored by tracking its events on an infrastructure level and/or a VNF level. Infrastructure-level events are generated by the VIM. VNF-level events may be generated by the VNF instance or its EM. Auto scaling supports both horizontal and vertical scaling.

On-demand scaling:
- The VNFM triggers the scaling of the VNF through an explicit request from the VNF or its EM, based on monitoring of the VNF's VNFC states. On-demand scaling supports both horizontal and vertical scaling.

Scaling based on management request:
- Manually triggered scaling. The OSS/BSS triggers scaling based on rules in the NFVO, or a human operator triggers scaling through the VNFM.
Horizontal scaling (out/in) allows adding/removing VNFC instance(s) that belong to the VNF. Vertical scaling (up/down) allows dynamically adding/removing resources to/from existing VNFCs that belong to the VNF.
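The two scaling dimensions can be sketched in a few lines: horizontal scaling changes the number of VNFC instances, vertical scaling resizes an existing one, and an auto-scaling rule decides when to act. The 0.8 load threshold and all names are illustrative assumptions, not values from any VNFM:

```python
# Sketch of the two scaling dimensions plus a trivial VNFM-style auto-scaling
# rule. The load threshold and all class names are illustrative assumptions.

class Vnfc:
    def __init__(self, vcpus: int = 2):
        self.vcpus = vcpus

class Vnf:
    def __init__(self):
        self.instances = [Vnfc()]

    def scale_out(self) -> None:
        """Horizontal scaling: add a VNFC instance."""
        self.instances.append(Vnfc())

    def scale_up(self, extra_vcpus: int = 1) -> None:
        """Vertical scaling: grow an existing VNFC instance."""
        self.instances[0].vcpus += extra_vcpus

    def autoscale(self, load_per_instance: float) -> None:
        """Auto-scaling rule: scale out when instances run hot."""
        if load_per_instance > 0.8:
            self.scale_out()

vnf = Vnf()
vnf.autoscale(load_per_instance=0.9)  # hot: a second instance is added
assert len(vnf.instances) == 2
```

In a real NFV deployment the rule would live in the VNFM and act on VIM- or VNF-level events, as described in the auto-scaling model above; this sketch only shows the decision shape.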
9.2.4 VNF Load balancing
There are different types of load balancing (LB).
VNF-internal LB
The peer NF sees the VNF instance as one logical NF. The VNF has at least one VNFC that can be replicated and an internal LB (which is also a VNFC) that directs flows between the different VNFC instances, see Figure 17.
FIGURE 17: VNF INTERNAL LOAD BALANCER
VNF-external LB
The peer NF sees the many VNF instances as one logical NF. A load balancer external to the VNF (which may be a VNF itself) directs flows between the different VNF instances (importantly, not the VNFCs!), see Figure 18.
FIGURE 18: VNF EXTERNAL LOAD BALANCER
End-to-End Load Balancing
The peer NF sees the many VNF instances as many logical NFs. In this case the peer NF itself contains load balancing functionality to balance between the different logical interfaces.
FIGURE 19: VNF E2E LOAD BALANCING
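All three models need a distribution policy, and for charging traffic that policy must be sticky: every message of a session should hit the same instance. A deterministic hash over the session identifier gives that property; `zlib.crc32` here is only a stand-in for whatever policy a real LB VNFC would use, and all names are illustrative:

```python
# Sketch of a sticky distribution policy for a VNF-internal load balancer:
# a stable hash maps each session to one worker VNFC, so all messages of a
# session land on the same instance. crc32 and all names are illustrative
# stand-ins, not a real LB implementation.
import zlib

def pick_instance(session_id: str, instances: list[str]) -> str:
    """Deterministically map a session to a worker VNFC instance."""
    return instances[zlib.crc32(session_id.encode()) % len(instances)]

workers = ["vnfc-1", "vnfc-2", "vnfc-3"]
first = pick_instance("sess-42", workers)
# Repeated messages of the same session always hit the same worker.
assert all(pick_instance("sess-42", workers) == first for _ in range(10))
```

Note that this simple modulo scheme reshuffles most sessions when an instance is added or removed; a production LB would use consistent hashing or explicit session tables to limit that disruption.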
9.3 NFV Management and Orchestration
The NFV Management and Orchestration (NFV-MANO), called MANO from now on, acts as the heart and brain of the NFV architecture. The NFV decoupling of network functions from the physical infrastructure results in a new set of management functions, whose handling requires a new and different set of management and orchestration functions. MANO has the role of managing the NFVI and orchestrating the allocation of resources needed by the VNFs. Such coordination is necessary because of the decoupling of the network function software from the NFVI. A traditional network function needs just one management system, i.e. an FMS, or at most support from an OSS. The NFV network, on the other hand, needs multiple managers, i.e. a VIM Manager, a VNF Manager and an Orchestrator. See Appendix C: MANO architecture for more details.
10 NFV challenges
This section identifies some of the VNF challenges in NFV, with suggested solutions and requirements. Although NFV is a promising solution for CSPs, it faces certain challenges that could degrade its performance and hinder its implementation in the telecommunications industry. In this section, some of the NFV requirements and challenges, and proposed solutions, are discussed. Table 3 summarizes this section.
TABLE 3: NFV CHALLENGES AND REQUIREMENTS

Challenge: Security
Description: Virtualization security risks according to functional domains: 1) Virtualization environment domain (hypervisor): unauthorized access or data leakage. 2) Computing domain: shared computing resources: CPU, memory, disk, etc. 3) Infrastructure networking domain: shared logical networking layer (vSwitches); shared physical NICs; network isolation for tenants and applications.
Solution and requirements: Security implementations according to functional domains: 1) Virtualization environment domain (hypervisor): isolation of the served virtual-machine space, with access provided only with authentication controls; use of virtual data centers (VDCs) that serve as resource containers and zones of isolation. 2) Computing domain: private and shared disk/memory allocations should be erased before their re-allocation, and it should be verified that they have been erased; lawful encryption - data should be used and stored in an encrypted manner by which exclusive access is provided only to the VNF. 3) Infrastructure domain (networking): usage of secured networking techniques (TLS, IPSec, or SSH).

Challenge: Computing performance
Description: The VNF should provide performance comparable to an NF running on proprietary hardware equipment.
Solution and requirements: VNF software could achieve high performance using the following techniques: multithreading, to be executed over multiple cores or scaled over different hosts; independent memory structures to avoid operating-system deadlocks; the VNF should implement its own network stack; processor affinity techniques should be implemented; it is important to allocate the proper amount of vCPU to each VM used by the VNF - over-provisioning a VNF with too much CPU may be wasteful or even degrade performance.

Challenge: VNF interconnection
Description: A virtualized environment takes different approaches from classical network function interconnection.
Solution and requirements: VNFs should take advantage of accelerated vSwitches and use NICs that are single-root I/O virtualization (SR-IOV) [14] compliant. Caveat: this breaks live migration of VMs.

Challenge: Portability
Description: VNFs should be decoupled from any underlying hardware and software, and should be deployable on different virtual environments to take advantage of virtualization techniques like live migration.
Solution and requirements: VNF development should be based on a cross-platform virtual resource manager that ensures its portability. The NFVO should support a heterogeneous VNF environment to reduce vendor dependencies, e.g. OpenStack Tacker [23].

Challenge: Operation and management; coexistence with legacy networks; carrier-grade service assurance
Description: VNFs should be easy to manage and migrate alongside existing legacy systems without losing the characteristics of a carrier-grade service.
Solution and requirements: To achieve the desired operation and management performance, a standard template of NFV framework entities should be well-defined. It should be able to interact with legacy management systems with minimal effects on existing networks. The NFVO must monitor network function performance almost in real time.

Challenge: Scalability - resource allocation and optimization
Description: VNFs should be able to temporarily scale resource assignments vertically if it does not cause any temporal or spatial conflict.
Solution and requirements: To achieve automated dynamic scaling of the resources, the Auto scaling or On-demand scaling design patterns should be implemented. For a more manual approach, the Scaling based on management request pattern is good enough.
11 Solution architecture
This section proposes a solution architecture for OCS as a VNF in the Telco-cloud (NFVI) and identifies the architectural impacts therein. The OCS system has the following high-level requirements:

Front-End
- Site + geo redundant
- Stateless9
- Active/active
- Communicates with BE+ELM via an internal network for charging and monitoring.
- Communicates with the network domain (EPG/MTAS/DSC) via a signaling network (SIG).
- Operation and maintenance (OAM) network for administration and monitoring.
- SAN for backups.
- RTO FE-BE max 200 ms (avg. ~100 ms).

Back-End
- Site + geo redundant
- Average latency may not exceed 10 ms between the sites.
- Stateful
- Active/standby
- Communicates with FE+ELM via an internal network for charging and monitoring.
- Needs an OAM network connection for administration and integration with the Business Support System (BSS).
- Needs a SAN for backups.
- Supports horizontal scaling of the application.
- Does not support horizontal scaling of the relational database.
- DB replication.
- No single point of failure (SPOF) for OCS.

Element Manager (ELM) - alarm/event monitor
- Geo redundant (2 nodes)
- Active/standby
- Communicates with FE+BE via an internal network.
- Needs an OAM network connection for O&M and integration with the Operation Support System (OSS).
- Located on the same subnet as FE & BE to work as a deployment server.
We will start by describing a generic VNF cluster architecture for OCS and later deploy the components therein.
9
See remark in section 6.1
11.1 Generic VNF Cluster Architecture
The OCS VNF is built according to Component Based Architecture (CBA), using the same component base for network connectivity. Generally, VNFs follow the same network connectivity design independently of the VNF type/function. However, there is a degree of flexibility in network connectivity, providing, if necessary, adjustments to achieve an overall better network design. Generally speaking, every VNF can connect to a number of different Virtual Routing Functions (VRF) on the site routers. However, in order to achieve resiliency at the VNF level, a VNF must connect with at least two interfaces per VRF/VPN for external communication. Figure 20 shows an example of a generic VNF cluster architecture with connectivity to multiple subnets for the signaling (SIG) VRF.
FIGURE 20: GENERIC VNF CLUSTER ARCHITECTURE WITH CONNECTION TO MULTIPLE SIG SUBNETS
There are two distinct types of VNF included in the cluster architecture: System Controller (SC) VNFs and Payload (PL) VNFs. The SC VNFs are non-traffic nodes handling system management/control of the PL VNFs. The PL VNFs are traffic nodes connected to the signaling network. The figure above shows the LB stretched over both types of VNF: SC and PL. In order to keep a homogeneous networking design for all VNFs in the NFVI, two SCs are used for communicating externally with both the VRF-enabled OM network (OM-SC) and the emergency OM network (OM-CN). OM-SC is used for communicating with the BSS domain. OM-CN is also used for some auxiliary functions, such as NTP sync, etc. Independently of the number of PL VNFs included in the cluster, as a result of the dimensioning output, there are two dedicated PL VNFs which communicate externally through the signaling VRF. Independently of the VNF type, there is an internal ("backplane") network for intra-cluster communication. Different types of VNF may require a different number of internal networks for their operation; the number may vary between one and three. However, one internal network is always present in any VNF. The backplane is also used for backup/restore via the vSAN. It is important to note that the current VNF cluster architecture is not limited to this design; however, it is chosen for the following reasons:
- Homogeneous connectivity of all VNFs to external networks.
- Two VNFs of each kind are chosen in order to provide redundancy.
- Traffic dimensioning results show that two traffic VNFs for OM and signaling are enough to cope with the traffic load. It is possible to increase the number of traffic VNFs for each type of traffic during runtime (horizontally), although this will most likely require a cluster reboot for the changes to take effect.
Another example of network connectivity is when all PL nodes take part in sending and receiving traffic along with load balancing of processing capacity. This example, as generic as the previous one, is shown in Figure 21.
FIGURE 21: GENERIC VNF CLUSTER ARCHITECTURE WITH ALL PL VNF CONNECTED TO THE SAME SUBNET AND TAKING PART IN LOAD SHARING OF BOTH TRAFFIC AND PROCESSING CAPACITY
The BE and the ELM in OCS are not traffic nodes and will therefore not follow the generic design with separate SC and PL. They will be plain VNFs with an external load balancer and VRF. Note: ELM in this context is not the same as the element manager (EM) in NFV.
11.2 Deploying OCS as VNFs
The OCS system components are deployed as follows:

Front-Ends: Each site has two payload FE-VNFs with non-parallelizable instantiation of one VNF container with capabilities to handle real-time transactions (DCCA server) and trigger online charging (OCF client), plus two system control FE-VNFs with non-parallelizable instantiation of VNF containers to handle operation and maintenance (OAM) of all front-end VNFs. Site B has the same instantiation. The payload VNFs are active at both sites to take traffic. Of the system control VNFs only one is active across all sites; the rest are standby.
Figure 22 shows a deployment view of the OCS Front-End VNFs, with instantiation of VNF containers with associated function components, for one site.
FIGURE 22: DEPLOYMENT VIEW OF OCS FRONT-END AS VNFS - ONE SITE

Back-Ends: Each site has one BE-VNF with parallelizable instantiation of three components: VNFC1 with capabilities to handle rating (RT), account balance management (ABMF) and charging (OCF server); VNFC2 with the database (SDP); and VNFC3 acting as system manager/controller (OAM) for the back-end.

ELM: Each site has one ELM-VNF with non-parallelizable instantiation of one VNF container with capabilities to handle alarm and event handling (ELM), as well as management of the ELM itself (OAM).
Figure 23 shows a deployment view of OCS Back-End VNFs with instantiation of VNFCs with associated function components for one site.
FIGURE 23: DEPLOYMENT VIEW OF OCS BACK-END AND ELM AS VNFS – ONE SITE
The reason for having two BE-VNFs, one on each site, is to provide redundancy (no SPOF). Most likely just one BE-VNF per site is enough, but it is unclear to me whether the virtual routers could be configured redundantly for a single VNF at this point. Further research is needed.
In principle, the splitting of a cluster is done by keeping the primary database on one site and moving the standby database to another site. Furthermore, traffic handling nodes (FE) are divided between the sites. The external VNF load balancer for ELM and BE shall direct flows to the active VNF that has the appropriately configured/learned state. Replication is done between the active and standby databases. Figure 24 shows the geographically split cluster following a high availability (HA)/disaster recovery (DR) model. Note: the example has one BE-VNF per site.
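The LB rule for the geo-split cluster can be sketched as follows. This is a minimal illustration with hypothetical names, not a real LB configuration: the external LB for BE/ELM simply directs flows to whichever site currently hosts the active VNF, and must learn the new active site after a failover:

```python
def active_site(site_state: dict) -> str:
    """Return the site hosting the active VNF.

    site_state maps site name -> role ('active' or 'standby'),
    as learned/configured by the external VNF load balancer.
    """
    for site, role in site_state.items():
        if role == "active":
            return site
    raise RuntimeError("no active site: trigger disaster recovery")

# Normal operation: BE/ELM flows go to site A.
assert active_site({"site-A": "active", "site-B": "standby"}) == "site-A"
# After a site failover, the LB must point to the new active site.
assert active_site({"site-A": "standby", "site-B": "active"}) == "site-B"
```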
FIGURE 24: GEOGRAPHICALLY SPLIT CLUSTER OVERVIEW
The BE-VNF and FE-VNF are deployed in a single-tenant virtual data center (vDC) on each site for isolation of the served VM space, with access provided only through authentication controls, shown below in Figure 25. The VNFs in the vDC should be able to temporarily scale resource assignments vertically if it does not cause any temporal or spatial conflict. Carefully plan how Allocation Models are placed in Provider vDCs [24]. The average latency between site A and B must not exceed 10 ms during normal operation.
FIGURE 25: TENANT VIEW OF VIRTUAL DATA CENTERS EXECUTING OCS VNFS (10)
(10) Note that the figure is simplified, e.g. missing the load balancer for the OAM network.
The business support system (BSS) communication via the OAM network should be encrypted. The VNFs should take advantage of accelerated vSwitches and use NICs that support single-root I/O virtualization (SR-IOV). A VNFC VM should not have more than 6 vCPUs; if more capacity is needed, scale horizontally.
11.2.1 Scaling and fault handling
The ELM component monitors all of the OCS VNFs. ELM can trigger on-demand scaling to the VNFM. Horizontal scaling is supported by the FE-VNF, by adding payload VNFs. The BE-VNF will only support vertical scaling. ELM also sends alarms related to the application, middleware and OS to the fault management system on the OSS side. No EMs are introduced at this point. In case of a site failover for the BE-VNF, the external VNF LB has to identify the failover and point to the active VNF. The same goes for the system control FE-VNF.
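The scaling decision described above can be sketched in a few lines. This is a hedged illustration only: the function name, thresholds and action labels are hypothetical, not taken from any real ELM or VNFM API, but they capture the asymmetry that FE-VNFs scale horizontally while the BE-VNF can only scale vertically:

```python
def scaling_action(vnf_type: str, cpu_load: float,
                   scale_out_at: float = 0.8, scale_in_at: float = 0.3) -> str:
    """Return the action ELM would request from the VNFM.

    FE-VNFs scale horizontally (add/remove payload VNFs);
    the BE-VNF only supports vertical scaling (resize the VM).
    Thresholds are illustrative placeholders.
    """
    if cpu_load > scale_out_at:
        return "scale_out" if vnf_type == "FE" else "scale_up"
    if cpu_load < scale_in_at:
        return "scale_in" if vnf_type == "FE" else "scale_down"
    return "none"

assert scaling_action("FE", 0.9) == "scale_out"   # add a payload VNF
assert scaling_action("BE", 0.9) == "scale_up"    # resize the BE VM
assert scaling_action("FE", 0.1) == "scale_in"
assert scaling_action("BE", 0.5) == "none"
```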
11.3 Analysis of the Solution architecture
I will not draw any conclusions on whether the OCS architecture is compliant with the Telco-cloud based on this theoretical study. Further analysis and a suggested proof of concept are needed before that. But from a theoretical point of view, subject to the very short investigation time, I have not found any major obstacles that prevent a deployment. The lack of an API between EM and VNFM has not been identified as critical, as OCS is a fairly static system. My only concern, from a scalability perspective, is that the Back-End does not have the capability of horizontal scaling with its current architecture. The risk of capacity shortage is obvious, based on "the performance curve", as the vertical scaling reaches its limits. One alternative to support horizontal scaling of the Back-End is to introduce a new component, an Account Finder (AF). The AF should hold the location information of which Back-End holds which subscriber account balance state. Another alternative could be customer segmentation. How this affects the rest of the OCS architecture is out of scope for this paper, as further research is needed.
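The Account Finder idea above can be made concrete with a small sketch. Everything here is hypothetical, since the AF does not exist in the current OCS architecture; the sketch only shows the lookup a Front-End would perform to route a charging request to the Back-End shard holding the subscriber's balance state:

```python
class AccountFinder:
    """Hypothetical Account Finder (AF): maps subscriber accounts
    to the Back-End instance holding their balance state, enabling
    horizontal scaling of the Back-End."""

    def __init__(self):
        self._location = {}

    def register(self, account_id: str, backend: str) -> None:
        # Called when an account's balance state is (re)placed on a BE.
        self._location[account_id] = backend

    def locate(self, account_id: str) -> str:
        # The FE routes the charging request to the BE returned here.
        return self._location[account_id]

af = AccountFinder()
af.register("subscriber-001", "be-vnf-a")
af.register("subscriber-002", "be-vnf-b")
assert af.locate("subscriber-001") == "be-vnf-a"
assert af.locate("subscriber-002") == "be-vnf-b"
```

The open design questions, such as how the AF itself is replicated and how it stays consistent during account migration, are exactly the further research the text calls for.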
12 Future directions
This section points out open areas of research related to NFV. They are additional to the ones identified earlier in this paper and were found during the research but not highlighted.

Network virtualization for cloud computing is focused on the different networking layers (layer 3, 2, or 1). For example, Layer 3 Virtual Private Networks (L3VPN) are distinguished by their use of layer 3 protocols (e.g., IP or MPLS) in the VPN backbone to carry data between the distributed end points, with possibly heterogeneous tunneling techniques. QoS with respect to network performance has to be guaranteed even when multiple users are sharing a specific infrastructure simultaneously. Some studies (such as [25]) have shown that performance can improve significantly if virtual machines are interconnected via a high-speed cluster interconnect. There are some implementations that back up this idea based on the usage of InfiniBand [26], providing improved network latencies compared to IaaS solutions based on Ethernet. However, merging networking research with real-time design and scheduling techniques is an open area of research.

The VNFM (Appendix C: MANO architecture) is attracting a lot of interest from all directions, while the ETSI standards are not yet "set in stone" and neither are the companies' MANO strategies and products [27]. As the NFV initiative has developed over the past several years, the focus of attention in NFV management has shifted perceptibly. Initially, the VIM (e.g., OpenStack, VMware, etc.) at the bottom of the stack was the issue; then, the special requirements and complexities of Telco network orchestration kicked in and attention shifted up to the NFVO at the top. So now it is the turn of the VNFM, which, notwithstanding the awesome responsibilities of the NFVO, is the "bit in the middle" that often turns out to be the most contentious.
Put very simply, the VNFM is responsible for the lifecycle management of the VNF under the control of the NFVO, which it achieves by instructing the VIM. However, this is the big question: who is best placed to supply the VNFM? Vendors are taking different approaches to VNFM development; they will need to be harmonized if carriers are to realize multi-vendor NFV MANO. Is the mark of a good VNF supplier one that also provides its own VNFM? Is the mark of a good MANO supplier one that can accommodate a VNF without a VNFM? Is the mark of a good NFVI platform vendor one that takes away the need for a VNF supplier to even develop a VNFM? There are likely many more angles to explore around the VNFM, but from a CSP's perspective, more research on how to reduce the risk of multi-vendor NFV implementations is an open area of research.
13 Conclusions
Despite having been a reality for some years, cloud computing is a fairly new paradigm for dynamically provisioning computing services (X-as-a-service), located in data centers using virtualization technology that allows server consolidation and efficient resource usage in general. In the last decades, important advances mainly at the machine virtualization, network, and storage levels have contributed to the widespread usage and adoption of this paradigm in different domains. Real-time application domains are still behind in the full adoption of cloud computing, due to their strong timing requirements and needed predictability guarantees. Merging cloud computing with real-time is a complex problem that requires real-time virtualization technology to consolidate the predictability capabilities.

This paper has analyzed some of the problems and challenges for achieving real-time cloud computing as a first step towards presenting an abstract map of the situation today, identifying the needed elements on different levels to make it happen. The presented concerns range from the hypervisor structure and role, the different possible types of virtualization and their respective performance, and general resource management concerns, to the important role of the network in the overall picture of virtualization technology. For the latter, this paper has described the OCS architecture, the different components and how they interact, and the real-time challenges that appear therein. A terminology mapping between the cloud and real-time systems domains has been settled in order to connect both areas. Furthermore, an overview of the technology and architecture for NFV, which aims to revolutionize the telecommunication industry by decoupling network functions from the underlying proprietary hardware, has been presented.
Although NFV is a promising solution for CSPs, it faces certain challenges that could degrade its performance and hinder its implementation in the telecommunications industry. Some of the NFV challenges, and proposed solutions, have been discussed. Lastly, a proposed solution architecture for OCS as VNFs in the Telco-cloud (NFVI) has been presented, introducing a generic VNF design for the traffic nodes and pointing out concerns related to scalability with the current back-end architecture in a virtualized environment. Besides all the advantages brought by NFV to the telecommunications industry, it faces technical challenges that might hinder its progress. Therefore, IT organizations, network enterprises, telecommunication equipment vendors, and academic researchers should be aware of these challenges and explore new approaches to overcome them.
14 Discussions
It turned out that both NFVI and cloud computing are two very large areas to study. In combination with "how the OCS architecture is affected by this", the workload became more or less overwhelming. It has been difficult to keep a constant level of detail and scope, because under each stone I turned, I found something new and exciting. In retrospect, it would have been enough to cover one area in more depth, with a more formal analysis and scoping method. I feel that the OCS as VNF did not get enough attention to cover all aimed aspects. Better research before beginning would probably have changed the scope of this paper; e.g., during the writing I identified that some of the architectural decisions had already been made for the Telco-cloud (NFVI). I still hope this paper can give a new real-time perspective on the subject. However, it has been an extremely instructive and interesting area to explore.
15 References
[1] Wikipedia, "Network function virtualization," [Online]. Available: https://en.wikipedia.org/wiki/Network_function_virtualization.
[2] Wikipedia, "Communications service provider," [Online]. Available: https://en.wikipedia.org/wiki/Communications_service_provider.
[3] 3GPP TS 29.078, "Customised Applications for Mobile network Enhanced Logic (CAMEL) Phase X; CAMEL Application Part (CAP) specification," 08 01 2016. [Online]. Available: http://www.3gpp.org/DynaReport/29078.htm.
[4] IETF, "Diameter Credit-Control Application," [Online]. Available: https://tools.ietf.org/html/rfc4006.
[5] Wikipedia, "CAP theorem," [Online]. Available: https://en.wikipedia.org/wiki/CAP_theorem.
[6] ETSI, "NFV - Resiliency Requirements," [Online]. Available: https://www.etsi.org/deliver/etsi_gs/NFV-REL/001_099/001/01.01.01_60/gs_nfv-rel001v010101p.pdf.
[7] A. Syrewicze, "VMware vs. Hyper-V: Architectural Differences," 2013. [Online]. Available: http://syrewiczeit.com/vmware-vs-hyper-v-architectural-differences/.
[8] C. Bradford, "Virtualization Wars: VMware vs. Hyper-V: Which is Right For Your Virtual Environment?," 2014. [Online]. Available: http://www.storagecraft.com/blog/virtualization-wars-vmware-vs-hyper-v-which-is-right-for-your-virtual-environment/.
[9] N. Sharma, "Hyper-V and VMware vSphere Architectures: Pros and Cons," [Online]. Available: http://www.serverwatch.com/server-tutorials/microsoft-hyper-v-and-vmware-vsphere-architectures-advantages-and-disadvantages.html.
[10] Z. H. Shah, "Windows Server 2012 Hyper-V," in Deploying Hyper-V Enterprise Server Virtualization Platform, 2013.
[11] H. Fayyad-Kazan et al., "Benchmarking the Performance of Microsoft Hyper-V server, VMware ESXi and Xen Hypervisors," 2013. [Online]. Available: http://www.cisjournal.org/journalofcomputing/archive/vol4no12/vol4no12_5.pdf.
[12] VMware, "ESXi architecture," [Online]. Available: http://www.vmware.com/files/pdf/ESXi_architecture.pdf.
[13] Intel Corp., "Hardware-Assisted Virtualization Technology," [Online]. Available: http://www.intel.com/content/www/us/en/virtualization/virtualization-technology/intel-virtualization-technology.html.
[14] Intel Corp., "An Introduction to SR-IOV Technology," [Online]. Available: http://www.intel.com/content/dam/doc/application-note/pci-sig-sr-iov-primer-sr-iov-technology-paper.pdf.
[15] Microsoft, "Hyper-V Architecture," [Online]. Available: https://msdn.microsoft.com/en-us/library/cc768520%28v=bts.10%29.aspx.
[16] M. G. Xavier et al., "Performance Evaluation of Container-based Virtualization for …," [Online]. Available: http://marceloneves.org/papers/pdp2013-containers.pdf.
[17] Wikipedia, "Operating-system-level virtualization," [Online]. Available: https://en.wikipedia.org/wiki/Operating-system-level_virtualization.
[18] IETF, "An Analysis of Container-based Platforms for NFV," [Online]. Available: https://tools.ietf.org/html/draft-natarajan-nfvrg-containers-for-nfv-01.
[19] Wikipedia, "Cloud computing," [Online]. Available: https://en.wikipedia.org/wiki/Cloud_computing.
[20] ETSI, "NFV - Use Cases," [Online]. Available: http://www.etsi.org/deliver/etsi_gs/nfv/001_099/001/01.01.01_60/gs_nfv001v010101p.pdf.
[21] ETSI, "NFV - Architectural Framework," [Online]. Available: https://www.etsi.org/deliver/etsi_gs/NFV/001_099/002/01.01.01_60/gs_NFV002v010101p.pdf.
[22] ETSI, "NFV - Virtual Network Functions Architecture," [Online]. Available: https://www.etsi.org/deliver/etsi_gs/NFV-SWA/001_099/001/01.01.01_60/gs_nfv-swa001v010101p.pdf.
[23] OpenStack, "OpenStack-Foundation-NFV-Report," [Online]. Available: http://www.openstack.org/assets/telecoms-and-nfv/OpenStack-Foundation-NFV-Report.pdf.
[24] VMware, "Allocation Models for Organizations using vCloud Director," [Online]. Available: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1026290.
[25] N. Regola and J.-C. Ducom, "Recommendations for Virtualization Technologies in High Performance Computing," 2010. [Online]. Available: http://www3.nd.edu/~nregola/files/cloudcom2010.pdf.
[26] InfiniBand Trade Association, [Online]. Available: http://www.infinibandta.org/content/pages.php?pg=about_us_infiniband.
[27] D. Snow, "Networkmatter," 04 08 2014. [Online]. Available: https://networkmatter.com/2014/08/04/stuck-in-the-mano-with-you-who-supplies-the-vnfm/.
[28] MustBeGeek, "Difference between vSphere, ESXi and vCenter," [Online]. Available: http://www.mustbegeek.com/difference-between-vsphere-esxi-and-vcenter/.
[29] S. Lowe, "Mastering VMware vSphere 5.5," [Online]. Available: http://books.denisio.ru/VMware/Mastering%20VMware%20vSphere%205.5.pdf.
[30] VMware, "Creating a vCloud NFV," [Online]. Available: http://www.vmware.com/files/pdf/techpaper/vcloud-nfv-reference-architecture2.0.pdf.
[31] B. Giri, "Mustbegeek," [Online]. Available: http://www.mustbegeek.com/difference-between-vsphere-esxi-and-vcenter/.
[32] Wikipedia, "Linux Containers," [Online]. Available: https://en.wikipedia.org/wiki/LXC.
Illustration frontpage: ChannelAxis (www.channelaxis.com)
A special thanks to Mikael Gardh and Sven-Gunnar Nyberg (opponents), Michael Thurell (study counselor, Dataföreningen) and Magnus Stomfelt (Symsoft, OCS vendor).
16 Appendix A: VMware
VMware vSphere is a software suite that has many software components, such as vCenter, ESXi, and the vSphere client. vSphere is not a particular software that you can install and use, "it is just a package name which has other sub components" [28]. The core of the vSphere product suite is the hypervisor, which is the virtualization layer that serves as the foundation for the rest of the product line [29]. All the virtual machines or guest OSes are installed on the ESXi server. To install, manage and access those virtual servers which sit on top of the ESXi server, another part of the vSphere suite called the vSphere client or vCenter is needed [28].
FIGURE 26: VMWARE TECHNOLOGY MAPPING TO THE NFV REFERENCE ARCHITECTURAL FRAMEWORK [30]
17 Appendix B: OpenStack – Tacker
Figure showing the areas of the NFV MANO that OpenStack Tacker addresses (in red): NFVO and VNFM.
18 Appendix C: MANO architecture
MANO is the grey block in the diagram below. It can be broken up into three functional blocks:
1. Virtualized Infrastructure Manager (VIM): Controls and manages the NFVI infrastructure, which includes physical (server, storage etc.), virtual (virtual machines) and software (hypervisor) resources in an NFV environment.
2. VNF Manager (VNFM): The VNFM is to VNFs what the VIM is to the NFVI. It oversees lifecycle management of VNF instances; is responsible for the FCAPS of VNFs (i.e. Fault, Configuration, Accounting, Performance and Security management); and scales VNFs up/down. There may be multiple VNFMs managing separate VNFs, or one VNFM managing multiple VNFs.
3. NFV Orchestrator (NFVO): Responsible for on-boarding of the VNF packages; global resource management; validation and authorization of NFVI resource requests among different Points of Presence (PoP); and topology management of the network service instances (also called VNF Forwarding Graphs).
In addition:
4. A group of repositories: deployment templates; a detailed list of Network Service instances and related VNF instances; and a repository of NFVI resources utilized for the purpose of establishing NFV services.
5. Element Management (EM): not a part of the MANO, but if it is available it needs to coordinate with the VNFM. It is responsible for the FCAPS of the VNF.
6. OSS/BSS: the collection of systems/applications that a service provider uses to operate its business.
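The division of labor above, the VNFM handling VNF lifecycle by instructing the VIM, can be sketched in a toy model. All class and method names are hypothetical (they do not correspond to ETSI interface names or any real MANO product); the sketch only shows that instantiating a VNF decomposes into VM requests toward the VIM:

```python
class VIM:
    """Toy Virtualized Infrastructure Manager: allocates/releases VMs."""
    def __init__(self):
        self.vms = {}
    def create_vm(self, name, vcpus):
        self.vms[name] = vcpus
    def delete_vm(self, name):
        self.vms.pop(name, None)

class VNFM:
    """Toy VNF Manager: VNF lifecycle, realized by instructing the VIM."""
    def __init__(self, vim):
        self.vim = vim
        self.vnfs = {}
    def instantiate(self, vnf_name, vnfcs, vcpus_per_vnfc):
        # One VM per VNFC, as in the OCS back-end deployment.
        for i in range(vnfcs):
            self.vim.create_vm(f"{vnf_name}-vnfc{i}", vcpus_per_vnfc)
        self.vnfs[vnf_name] = vnfcs
    def terminate(self, vnf_name):
        for i in range(self.vnfs.pop(vnf_name)):
            self.vim.delete_vm(f"{vnf_name}-vnfc{i}")

vim = VIM()
vnfm = VNFM(vim)
vnfm.instantiate("ocs-be", vnfcs=3, vcpus_per_vnfc=6)
assert len(vim.vms) == 3   # VNFC1..VNFC3 each got a VM
vnfm.terminate("ocs-be")
assert len(vim.vms) == 0
```

In the real architecture the NFVO would sit above the VNFM, authorizing the resource requests; that layer is omitted here for brevity.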
FIGURE 27: NFV MANO ARCHITECTURE
19 Appendix D: OCS RTO calculation & latency
The left graph shows the maximum response time for an initial charging control request (CCR-I) between the DCCA-server in the FE and the BE. The right graph shows the number of data sessions per second between the OCF-client in the FE and the OCF-server in the BE.
FIGURE 28: RESPONSE TIME CCR-I
FIGURE 29: NUMBER OF PADS REQUESTS PER SECOND
We assume that the RTO is configured to 1 second in the DCCA-server, because none of the peaks are higher. We can also see that the average maximum response time is ~100 ms. By matching the maximum response peak of 1000 ms in the left diagram with the number of requests in the right diagram, we can see a drop at the corresponding time, but only at one of the peaks. So, by also looking at the bottom graph, showing the average response time for CCR-I, we can see
something interesting. Average peaks just above 100 ms are not causing any trouble in the maximum, but averages well over 100 ms are causing disturbance. By this we assume that the actual RTO should be 200 ms max and 80 ms min. The delta between the two bottom (average) graphs shows the network latency between the two geographically separated sites: ~10 ms.
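The RTO reasoning above can be expressed as a small calculation. The function and its parameters are hypothetical (the paper derives the bounds by reading graphs, not by formula); the sketch simply encodes the rule of thumb that the RTO maximum is about twice the steady-state average response time (~100 ms gives 200 ms) with a floor of ~80 ms:

```python
def rto_window(avg_response_ms, safety_factor=2.0, floor_ms=80.0):
    """Derive an RTO window (min_ms, max_ms) from average response times.

    Mirrors the reasoning in Appendix D: disturbances appear once the
    average stays well above ~100 ms, so cap the RTO at roughly 2x the
    steady-state average and keep a floor of ~80 ms.
    """
    steady = sum(avg_response_ms) / len(avg_response_ms)
    return floor_ms, steady * safety_factor

# Hypothetical averages read off the bottom graph (ms):
samples = [95.0, 100.0, 105.0, 98.0, 102.0]
rto_min, rto_max = rto_window(samples)
assert rto_min == 80.0
assert rto_max == 200.0
```

The ~10 ms inter-site latency would add directly to the observed averages, which is why it must stay bounded for the window to hold.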