NOC - Network Operations and Control
Handling Preventive Maintenance Tasks Training Document
version 3
© Nokia Oyj
Handling Preventive Maintenance Tasks
The information in this document is subject to change without notice and describes only the product defined in the introduction of this documentation. This document is intended for the use of Nokia Networks' customers only for the purposes of the agreement under which the document is submitted, and no part of it may be reproduced or transmitted in any form or means without the prior written permission of Nokia Networks. The document has been prepared to be used by professional and properly trained personnel, and the customer assumes full responsibility when using it. Nokia Networks welcomes customer comments as part of the process of continuous development and improvement of the documentation. The information or statements given in this document concerning the suitability, capacity, or performance of the mentioned hardware or software products cannot be considered binding but shall be defined in the agreement made between Nokia Networks and the customer. However, Nokia Networks has made all reasonable efforts to ensure that the instructions contained in the document are adequate and free of material errors and omissions. Nokia Networks will, if necessary, explain issues which may not be covered by the document. Nokia Networks' liability for any errors in the document is limited to the documentary correction of errors. Nokia Networks WILL NOT BE RESPONSIBLE IN ANY EVENT FOR ERRORS IN THIS DOCUMENT OR FOR ANY DAMAGES, INCIDENTAL OR CONSEQUENTIAL (INCLUDING MONETARY LOSSES), that might arise from the use of this document or the information in it. This document and the product it describes are considered protected by copyright according to the applicable laws. NOKIA logo is a registered trademark of Nokia Corporation. Other product names mentioned in this document may be trademarks of their respective companies, and they are mentioned for identification purposes only. Copyright © Nokia Networks Oy 2002. All rights reserved.
Contents
1 Objectives
2 Handling preventive maintenance tasks
2.1 What is preventive maintenance
2.1.1 Automatic maintenance functions in DX200 / IPA 2800
2.1.2 Handling manual maintenance functions
3 Daily checking tasks
3.1.1 Handling checking tasks
4 Preventive maintenance - Non-service-affecting tasks
5 Preventive maintenance - Service-affecting tasks
5.1 Preparing service-affecting tasks
5.1.1 Requesting maintenance work
5.1.2 Handling measurements
5.2 Techniques used to reduce alarms at the NMS
5.2.1 Maintenance mode and filtering
5.2.2 Alarm blocking
5.2.3 Using BTS maintenance modes
5.3 Types of service-affecting tasks
6 Task work instructions
6.1 Using the documentation
7 Follow-up procedures
7.1 Testing and analysing
7.1.1 Analysing test results
7.1.1.1 Handling abnormal results
7.2 Reporting and logging
7.2.1 Reporting
7.2.2 Logging
8 Summary
1 Objectives
The aim of this module is to give the participant the knowledge needed for completing preventive maintenance tasks. After completing this module, the participant should be able to:
• Depending on the individual responsibilities of the participant, identify the routine tasks that must be performed. In addition, identify when each task should be completed.
• Explain how to plan and prepare preventive maintenance tasks in terms of the effect they will have on service. Also, outline a schedule for when the tasks should be done and the techniques available to cause minimum disturbance to network service.
• Depending on the individual responsibility of the participant and the task involved, identify the task in question and where information on the procedure can be located within the system documentation.
• Explain the importance of testing the network after any maintenance work, analyse the results, and outline how any corrective actions can be initiated.
• Explain or outline any reports or maintenance logging procedures that may be necessary. In addition, should the individual tasks involve network measurements or results, describe how these results should be handled.
2 Handling preventive maintenance tasks
When operating Nokia equipment, a certain set of tasks has to be performed to keep the equipment functioning as efficiently as possible. These can include checking the system daily, installing software change notes (that is, new software code) and making back-ups of the software. The number and type of tasks performed depend on several factors, such as the responsibilities of the NOC person, the location of the equipment, and contractual agreements.
Figure 1. Overview of preventive maintenance tasks
[Figure: network elements (HLR, MSC, BSC, MGW, RNC) with example MML commands ZAHO/P; ZCEL; ZNLI; ZUSI; ZISI; and the reminders "Keep a log of events" and "Following procedures".]
Daily checking tasks:
• Alarm situation
• Speech circuit state
• Signalling link state
• RNW state
• Making back-ups
• Checking units and I/O state
Regular maintenance tasks:
• Back-ups to tape
• Safecopying
• Installing change notes
• Cleaning the I/O devices
2.1 What is preventive maintenance
Preventive maintenance routines are designed to detect faults or problems before they become network faults. To do this successfully, the operator needs to set up an appropriate maintenance strategy, with defined responsibilities and procedures. Maintenance can be run in two ways: automatically by the system, or manually by the operating and maintenance (O&M) personnel. This section gives a brief description of the automatic functions (such as fault control), but the focus is on the routine maintenance actions. The purpose of preventive maintenance is to maintain the exchange regularly in order to reduce the possibility of failures. Before performing any task, determine whether it will interrupt service. If so, certain steps need to be taken before the task is executed.
Note: If you have a maintenance contract with Nokia, Nokia personnel may carry out some of the maintenance routines included in this manual.
2.1.1 Automatic maintenance functions in DX200 / IPA 2800
The modular structure of the network elements provides not only fault tolerance but also ease of maintenance. If a plug-in unit becomes faulty, you replace it with a new plug-in unit. The redundancy schemes ensure that the network element remains operational even in fault situations. Maintenance consists of automatic functions integrated in the network element and functions carried out by the operating and maintenance personnel. Preventive maintenance focuses on the latter, although Figure 2 also gives a brief description of the automatic functions.
Figure 2. Automatic recovery actions in the DX200 and in IPA 2800
[Figure: supervision of the system, alarm system, recovery system and diagnostic system. When a fault is detected, the system takes an automatic action to repair the situation. This may include switching links or changing the functional unit.]
System maintenance handles all the fault situations and user-initiated configuration management tasks within the hardware and software of the system. System maintenance is responsible for availability performance on the network element level. It is designed to perform its task as automatically and autonomously as possible. For example, the RNC offers the following maintenance procedures:
• Fault localisation for the RNC
  - Finding the general part of the equipment in which the fault exists
  - Finding the fault, that is, determining the faulty item of the equipment
• Reconfiguration of the RNC
  - Changing the quantity, types, or arrangement of hardware and/or software (for example, swap to redundant unit)
• Reconfiguration support for the WBTS
  - The WBTS detects its own hardware faults, disables the faulty logical object and informs the RNC of the fault.
Figure 3. Automatic maintenance functions, fault management principles
[Figure: a failure leads to fault or disturbance observation. Supervision: fault detection, which performs continuous checks and tasks in order to detect irregularities. Alarm: processing of alarm events and alarm printouts, which inform the user. Recovery: activation of recovery and fault elimination. Diagnostics: activation of fault location and fault location, which identifies faulty units and informs the user.]
Usually, all activities can be performed remotely unless the action involves the hardware. The RNC's alarm system is also responsible for monitoring the radio network by gathering alarm reports from the BTSs. It reports a decrease in service level to the user, including the cause of the service loss. When necessary, fault detection leads to automatic recovery actions.
2.1.2 Handling manual maintenance functions
In addition to automatic system maintenance, there are preventive maintenance tasks that are performed manually to assess the service provided and to safeguard the service level. They can be categorised as daily checking tasks, service-affecting preventive maintenance, or non-service-affecting preventive maintenance. The routines are completed at different intervals; the intervals presented in the following sections are examples. Operators normally also have additional tasks that relate to their own organisation.
3 Daily checking tasks
Monitoring the network is an operating task that is performed continuously. When a fault is reported, it must be investigated and a repair action must be taken. O&M personnel are not only responsible for monitoring, however. There are also other tasks that should be performed on a regular basis, such as daily checking tasks. For checking tasks, it is more a question of finding the most appropriate time to perform the check than of preparing the equipment. Most of the tasks should be performed daily to ensure that the quality and state of the network have not deteriorated. At the beginning of every shift, a basic check of network functionality should be performed. The flowchart in Figure 4 outlines a way in which daily checks can be handled.
Figure 4. Handling checking tasks
[Flowchart: Perform check. Are the results normal? If yes, follow-up procedures. If no: is it a fault or due to maintenance work? Work underway leads to follow-up procedures. Fault: can the situation be fixed? If no, escalate problem. If yes, troubleshoot. Was it successfully fixed? If yes, follow-up procedures; if no, escalate to next level.]
Once the checking task has been performed, check whether you got the expected results. If the results are normal, carry on with the necessary follow-up procedure. If the results are unexpected, check whether this is due to a fault or to scheduled work. If work is underway, carry out the necessary follow-up procedures. If there is a fault, check whether the situation can be fixed. If the fault cannot be fixed, escalate the problem to the next level. If the problem can be fixed, and depending on the individual's responsibility, troubleshoot the fault. If it is not resolved within the set time limit, escalate it to the next level. If it is resolved, carry out the follow-up procedure.
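The decision flow described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, its parameters and the returned labels are assumptions made for this example and do not correspond to any Nokia tool.

```python
def next_action(results_normal, cause=None, can_fix=False, fixed_in_time=False):
    """Return the next step after a daily check.

    cause is "work" (scheduled maintenance) or "fault"; it is ignored when the
    results are normal. All names here are hypothetical.
    """
    if results_normal:
        return "follow-up"
    if cause == "work":
        return "follow-up"      # unexpected results explained by work underway
    if not can_fix:
        return "escalate"       # the fault cannot be fixed at this level
    # The fault can be fixed: troubleshoot, then check the outcome
    # against the set time limit.
    return "follow-up" if fixed_in_time else "escalate"
```

For example, `next_action(False, "fault", can_fix=True)` returns `"escalate"`, because the fault was not resolved within the time limit.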
3.1.1 Handling checking tasks
At the beginning of every shift, a number of daily tasks should be performed to check the integrity of the network.
Checking alarms
When alarms are generated, they are stored in a local database in the network element and at the NMS. Current alarms can be seen at the NMS using the Alarm Monitor tool. It is also advisable to check the NMS Alarm History tool, which gives an idea of what happened before the shift. If the connection between the NMS and the network element is lost, alarms are placed into a temporary buffer until the connection is restored.
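The buffering behaviour can be illustrated with a small model. The sketch below is a simplification under stated assumptions: the class, its attributes and the alarm names are hypothetical, and the real network element implementation is not described in this document.

```python
from collections import deque


class AlarmForwarder:
    """Hypothetical model of a network element forwarding alarms to the NMS."""

    def __init__(self):
        self.connected = True
        self.buffer = deque()   # temporary buffer used while the link is down
        self.local_db = []      # every alarm is also stored locally
        self.nms = []           # alarms that have reached the NMS

    def raise_alarm(self, alarm):
        self.local_db.append(alarm)
        if self.connected:
            self.nms.append(alarm)
        else:
            self.buffer.append(alarm)

    def restore_connection(self):
        self.connected = True
        while self.buffer:      # flush in the order the alarms were raised
            self.nms.append(self.buffer.popleft())
```

Alarms raised while `connected` is `False` appear at the NMS only after `restore_connection()` is called, but they are in the local database immediately.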
Figure 5. Distribution of alarms in the network
[Figure: where alarms can be viewed. HLR alarms: ZAHO. MSC alarms: ZAHO. BSC alarms: ZAHO. RNC alarms: ZAHP. BTS alarms: ZEOL. WBTS alarms: RNC Object Browser. Alarms are buffered before being sent to NetAct. Copies of all alarms are kept at the NetAct, where they are easily viewed with the Alarm History and where the alarm manual can be accessed.]
Checking the radio network status
The BTSs and WBTSs are controlled by the BSC and RNC respectively, which always monitor their operational state. A base station also has an administrative state, which is set by the operator. Checking the radio network status shows whether the base stations are able to carry traffic. The administrative state can be checked by changing the mode in the NetAct top-level user interface (TLUI). Alternatively, MML or NEMU can be used to check the administrative and operational states in 2G, and the operational state of all WCDMA cells, including which cells have alarms, in 3G. Expect to see the administrative state unlocked and the operational state working (WO).
Figure 6. Checking the status of 2G radio network
[Figure: change the mode in TLUI to see the state of objects, or use the MML commands ZEDO; ZEPO; and ZEEI;. The example ZEDO; printout shows the base station controller data for BSC07 (BSC administrative state UNLOCKED, BSC operational state AVAILABLE, BSC traffic limitation 0 %). The example ZEPO; printout shows the base transceiver station data for BTS6301 (BTS administrative state UNLOCKED). The example ZEEI; printout lists the radio network configuration in the BSC, giving the administrative state (U or L) and operational state (for example WO or BL-USR) of each BCF, BTS and TRX.]
Figure 7. Checking the status of 3G radio network
[Figure: change the mode in TLUI to see the state of objects.]
Checking the functional unit states
An IPA2800 network element comprises several functional units that perform various tasks. For example, one functional unit is responsible for signalling, another for call control. There should always be a working unit (WO-EX) and a redundant unit (SP-EX) that can take over if the working unit fails.
Checking the working states of I/O devices
In addition to functional units, I/O devices are also connected to IPA2800 network elements, such as the disks that store the software and data, the cartridge tape unit, the floppy disk unit and the printer. The states of these devices should also be checked. Expect to see WO-EX.
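The redundancy check on functional units can be expressed as a simple rule: each unit type should have at least one unit in WO-EX and at least one in SP-EX. The following sketch assumes the unit states have already been collected into a dictionary; the function name, the data shape and the sample states are hypothetical, and no MML printout is parsed here.

```python
def check_redundancy(units):
    """units maps a unit type to the list of states of its unit instances.

    Returns the unit types that lack a working unit (WO-EX) or a spare
    unit (SP-EX), together with the missing states.
    """
    problems = {}
    for unit_type, states in units.items():
        missing = [s for s in ("WO-EX", "SP-EX") if s not in states]
        if missing:
            problems[unit_type] = missing
    return problems


# Sample data for illustration only; "TE-EX" stands in for any non-spare state.
sample = {"OMU": ["WO-EX", "SP-EX"], "CCSU": ["WO-EX", "TE-EX"]}
print(check_redundancy(sample))  # {'CCSU': ['SP-EX']}
```

The same rule covers n+1 schemes, where several units are in WO-EX and share one spare.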
Figure 8. Checking the unit state
[Figure: block diagram of the functional units of a network element (for example ET, ECET, GSW, CLS, CCSU, BSU, CASU, CM, CMU, PAU, VLRU, CHU, STU, BDCU and OMU) with their redundancy schemes (2n, n+1, n+1/L) and their connections to the BSS, PSTN, PABX, HLR, NSS, and OMC and SMS. The states of units and I/O devices are checked with the MML command ZISI; in the MSC and with ZUSI; and ZIHI; in the RNC.]
Checking the state of the signalling links
If a problem occurs in the signalling links between network elements, an alarm is generated. However, it is also advisable to check the states of the signalling links to confirm that all signalling links in use are available. A new logical connection configuration is created to display the transmission in the Iub interface and other connection-related parameters. The CoCo object is used to reserve the transmission resources in the RNC for the WBTS.
Figure 9. Checking the signalling links
[Figure: network overview (HLR, MSC, MGW, RNC, BSC, PSTN, DCN) with example ZNLI; printouts of signalling link states from MSC02 and BSC1-KUTOJA. Each printout lists the link, link set, terminal unit, terminal function and link state (for example AV-EX), and ends with COMMAND EXECUTED. LapD links are checked with ZDTI; WBTS CoCos are checked with NEMU.]
Checking the state of ATM interface
The ATM interface is the basis of a VP link termination point or VC link termination point, which is the basis of the connection. All ATM interfaces need to be checked to ensure that the administrative state is unlocked and the operational state is enabled.
Figure 10. State of ATM interface
[Figure: example ZLAI; printout (INTERROGATE INTERFACE) for the ATM interfaces between the MSC, MGW, RNC and WBTS.]
INTERFACE ID  TYPE  EXCHANGE TERMINAL  ADMIN STATE  OPER STATE  ACC PROF  GSMP PORT
1             NNI   SET 1              UNLOCKED     ENABLED     YES       NO
2             NNI   SET 2              LOCKED       ENABLED     YES       NO
3             UNI   SET 0              UNLOCKED     ENABLED     YES       NO
COMMAND EXECUTED
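As an illustration of this check, the sketch below flags interfaces whose administrative state is not unlocked or whose operational state is not enabled. The rows are transcribed by hand from the example printout in Figure 10; the data structure and function name are assumptions for this example, and no real printout parsing is attempted.

```python
# Interface rows transcribed from the example ZLAI printout in Figure 10.
interfaces = [
    {"id": 1, "type": "NNI", "admin": "UNLOCKED", "oper": "ENABLED"},
    {"id": 2, "type": "NNI", "admin": "LOCKED", "oper": "ENABLED"},
    {"id": 3, "type": "UNI", "admin": "UNLOCKED", "oper": "ENABLED"},
]


def interfaces_needing_attention(rows):
    """Return the IDs of interfaces that are not both unlocked and enabled."""
    return [r["id"] for r in rows
            if r["admin"] != "UNLOCKED" or r["oper"] != "ENABLED"]


print(interfaces_needing_attention(interfaces))  # [2]
```

In the example printout, only interface 2 would be reported, because its administrative state is LOCKED.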
Checking the TCP/IP ATM interface
The IP over ATM (IPOA) configuration creates a VC from the external termination point at the NIU to the internal termination point in the computer unit and attaches the new VC to an IP interface.
Figure 11. Checking TCP/IP ATM interface
Checking the states of all speech circuits, circuit groups and routes
The speech or data traffic between exchanges, or between the RAN and the CS network, is carried by circuits, which are grouped into circuit groups. In 2G networks, use the ZCEL or ZRCI command to check that the circuits and circuit groups are in the working state. In 3G networks, use the ZCRL or ZRRI command to check the routes.
Printer and tape inspection
Ensure that all printers have enough paper and all tape units have enough tape for the next 24 hours, and that they are operating correctly.
Checking transmission alarms
A transmission network can be managed either locally using the Node Manager software or centrally from NetAct. NetAct provides complete management of the transmission elements including, for example, cross-connects, multiplexers and SDH products. The precondition is that the transmission views of the network have already been created.
4 Preventive maintenance - Non-service-affecting tasks
In tasks such as daily checking of the system, the main concern is handling the follow-up procedures. For more difficult maintenance tasks, planning and preparing the work becomes more important. Preventive maintenance includes all planned maintenance actions that are routine by nature and resourced periodically. These routines are designed to detect faults or problems before they become network faults.
Figure 12. Handling common maintenance tasks, non-service-affecting
[Flowchart: Is the task service-affecting? If yes: book time to do the task if needed, check that the NMC* is aware of the work, block alarms if needed and, if the system will be restarted, ensure that the system is ready (for example, measurements). Perform the task. Are the results normal? If yes, return the system back to normal and continue with the follow-up procedures. If no: can the situation be fixed? If no, escalate the problem. If yes, troubleshoot. Was it successfully fixed? If yes, return the system back to normal; if no, escalate to the next level.]
* NMC: Network Management Center, the main control center.
After the non-service-affecting task has been performed, check the results. If they are normal, continue with the follow-up procedures. If the results are not as expected, check whether the situation can be fixed. If it cannot, escalate the problem and then continue with the follow-up procedures. If it can be fixed and you are authorised to do so, troubleshoot the problem. If the fix succeeds, carry out the follow-up procedures; if it does not, escalate the problem to the next level. Always follow up once you have finished the task. A number of non-service-affecting tasks can be performed:
Safecopying
There are many instances when safecopying should be performed in the DX200 and IPA2800 platforms; for example, before and after software upgrades, after any major changes, or when updating data files in the FB package. You can use the command calendar to execute the fallback and backup commands. The command calendar runs an MML command, or a command file containing several MML commands, at the time you have specified. You do not need to type the commands by hand, but you still need to check whether the safecopying succeeded. Safecopying should also be done for NEMU, but the NEMU SW backup is taken by an external backup application that does not use the DX safecopying. Backups may also be taken by someone else, such as the system administrator or whoever is responsible for taking backups of NetAct, since all backups are stored in one central location.
Diagnostics and testing
Run fault diagnoses on all spare computer units, synchronisation units and network interface units to ensure that the hardware is in working order and the cabling is correct.
Note: In the IPA 2800 platform, ten tests can be active simultaneously. However, only one test per unit can be active at a time. The unit diagnoses may give false results if the internal cabling is faulty, since the fault location programs test plug-in units, not cabling.
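The two limits in the note (ten simultaneous tests, one test per unit) can be captured as a small admission check when planning a batch of diagnoses. This is a minimal sketch; the function name and the way active tests are tracked are assumptions for illustration, not how the platform itself enforces the limits.

```python
MAX_ACTIVE_TESTS = 10   # limit stated in the note for the IPA 2800 platform


def can_start_test(unit, active_units):
    """Return True if a diagnosis for unit may start.

    active_units is the set of units that currently have a test running.
    """
    if unit in active_units:
        return False                      # only one test per unit at a time
    return len(active_units) < MAX_ACTIVE_TESTS
```

A planner could call `can_start_test` before each diagnosis and queue the unit if the function returns `False`.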
Various checking tasks
Several other items should be checked regularly: blocked alarms, the time in the element (which must be accurate for call duration, for activating measurements, and for time-stamping alarms), database clean-up (acknowledging unacknowledged cancelled alarms in the NMS), hardware alarms (checking the alarms and alarm inputs) and the voltage in the cabinet. All equipment that is not supplied by Nokia and that may affect the performance of the exchange must be tested regularly according to the manufacturer's instructions. Examples include:
• Check at regular intervals that the standby power supplies and their alarm systems are in working order.
• Service the environmental equipment (such as fans and air conditioning) regularly and test the alarm systems.
• Test the equipment room fire alarm system regularly.
Non-service-affecting tasks should be performed at regular intervals: daily, weekly, bi-weekly, monthly, semi-annually, or annually. Table 1 gives an example of the tasks that Nokia recommends in normal network operation. An operator is likely to perform these at different intervals, so participants are advised to familiarise themselves with the tasks that they are expected to perform.
Table 1. Non-service-affecting tasks frequency intervals
Description                               Frequency
Back up the LFILES in the FB package      Daily
Copy the FB package to the DAT            Daily/Bi-weekly
Check what alarms are blocked             Daily/Weekly
Clean the CTU                             After 25 hours of continuous use
Print and save the unit states            Weekly
Check the time in the element             Weekly
Check printer ribbon                      Weekly
Database clean up                         Weekly
General cleaning of the MTU               Fortnightly
Check the hardware alarms                 Monthly
General cleaning of printer               Monthly
General cleaning of VDU                   Monthly
Diagnose computer units                   Semi-annually
Changeover computer units                 Semi-annually
Set the summer/winter time                Semi-annually
Clean hardware                            Semi-annually
Clean the FDU                             Annually
Check the voltage                         Annually
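A frequency table of this kind can be turned into a simple "what is due today" check. The sketch below uses a subset of the tasks from Table 1; the day counts assigned to each frequency (for example, 30 days for "Monthly") and the function name are assumptions made for illustration.

```python
# Interval lengths in days. These values are assumptions for illustration
# (for example, "Monthly" is approximated as 30 days).
INTERVAL_DAYS = {"Daily": 1, "Weekly": 7, "Fortnightly": 14, "Monthly": 30,
                 "Semi-annually": 182, "Annually": 365}

# A subset of the tasks listed in Table 1.
TASKS = {
    "Back up the LFILES in the FB package": "Daily",
    "Check the time in the element": "Weekly",
    "General cleaning of the MTU": "Fortnightly",
    "Check the hardware alarms": "Monthly",
    "Diagnose computer units": "Semi-annually",
    "Check the voltage": "Annually",
}


def tasks_due(days_since_last_run):
    """Return tasks whose interval has elapsed; never-run tasks count as due."""
    due = []
    for task, frequency in TASKS.items():
        elapsed = days_since_last_run.get(task)
        if elapsed is None or elapsed >= INTERVAL_DAYS[frequency]:
            due.append(task)
    return due
```

An operator who performs the tasks at different intervals would only need to change the values in `INTERVAL_DAYS` or the frequency assigned to each task.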
5 Preventive maintenance - Service-affecting tasks
A service-affecting task must be planned before it is performed. The planned work process keeps the Network Management Centre (NMC) and others informed of any work that is taking place in the network, so that everyone from the NMC to customer care knows and understands what is happening. More difficult tasks that require a physical action or that affect the service also require planning of when the task can take place. Take a change note installation as an example: the equipment must be restarted before it uses the change note. A system restart makes the entire system unavailable for a short time, so the objective is to complete it during a low-traffic period. In addition, an operator can have many MSCs and BSCs, which means that many change notes might be installed at one time.
Figure 13. Handling common maintenance tasks, service-affecting
[Flowchart: Is the task service-affecting? If yes: book time to do the task if needed, check that the NMC* is aware of the work, apply an alarm reduction technique if needed and, if the system will be restarted, ensure that the system is ready (for example, measurements). Perform the task. Are the results normal? If yes, return the system back to normal and continue with the follow-up procedures. If no: can the situation be fixed? If no, escalate the problem. If yes, troubleshoot. Was it successfully fixed? If yes, return the system back to normal; if no, escalate to the next level.]
* NMC: Network Management Center, the main control center.
Since the task affects service, it may be necessary to book time for it. If the system is going to be unavailable for a short time, the work should be done during a low-traffic period. The Network Management Centre (NMC) should also be made aware that work affecting the service is going to be performed. If needed, reduce the alarm flow to the NMS. If the work requires a system restart, ensure that the system is ready. This means making sure that the measurements are stopped, since a restart may alter the measurement results. Then proceed in the same way as for non-service-affecting tasks.
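The preparation sequence above can be written down as an ordered checklist. The sketch below is illustrative only: the function name, its parameters and the step wording are assumptions, and an operator's own workflow may order or name the steps differently.

```python
def preparation_steps(needs_booking, reduce_alarms, restart_required):
    """List the preparation steps for a service-affecting task, in order."""
    steps = []
    if needs_booking:
        steps.append("book a low-traffic time slot")
    steps.append("inform the NMC")                 # always required
    if reduce_alarms:
        steps.append("reduce the alarm flow to the NMS")
    if restart_required:
        steps.append("stop measurements")          # a restart alters the results
    steps.append("perform the task")
    return steps
```

For a change note installation that needs a restart, `preparation_steps(True, True, True)` lists all five steps.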
5.1 Preparing service-affecting tasks
Tasks that affect the service (for example, when call set-up cannot take place) need more planning. Before starting any maintenance task, it is important to establish the following:
• In many organisations, the work must be planned far ahead so that the right people are involved and the management centre is aware that there will be operational work and that subscribers will be affected.
• If several tasks have to be done and they do not affect one another, performing them separately may lead to longer outage times. If possible, plan to perform several tasks at the same time.
• If the network element is to be reset, measurement collection may have to be stopped.
• Maintenance work causes unnecessary alarms to be generated. These can be reduced in order not to disrupt the work of the NOC.
5.1.1 Requesting maintenance work
Depending on the operator's workflow, a team may plan what maintenance work is carried out and when. Once the dates have been fixed, the planned maintenance work may be entered into a log. This log may be kept manually or on a computer system. Examples include the Nokia trouble management system, or simply adding notes to elements on the NMS/2000 top-level user interface.
5.1.2 Handling measurements
The network element continuously collects measurements about its activities (such as traffic, resources and radio access). After a measurement period, the network element returns a set of values to the NMS, which uses them in quality reports to identify the state of the network. When a network element is restarted (for example, in upgrades and change note installations), the measurement values are reset to 0. As a result, the values sent to the NMS are wrong. Although the effect is not large, it is noticeable in quality reports. Therefore, it is sometimes recommended that the measurements are stopped before carrying out the task and restarted afterwards.
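A small worked example shows how a restart distorts a measurement. The sketch assumes a simplified model in which a counter accumulates events over one measurement period and is reported at the end of the period; the function name, the per-minute granularity and the traffic figures are hypothetical.

```python
def reported_value(events_per_minute, restart_at_minute=None):
    """Sum events over one measurement period; a restart zeroes the counter."""
    counter = 0
    for minute, events in enumerate(events_per_minute):
        if minute == restart_at_minute:
            counter = 0     # restart: everything counted so far is lost
        counter += events
    return counter


period = [5] * 60                       # hypothetical: 5 events per minute
print(reported_value(period))           # 300
print(reported_value(period, 45))       # 75
```

With a restart at minute 45, only the last 15 minutes are counted, so the NMS receives 75 instead of 300 for that period.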
5.2 Techniques used to reduce alarms at the NMS
Tasks such as restarts, transmission repairs and equipment-related work (that is, hardware unit replacement) can cause many alarms to be generated. This can disturb NOC operators in their task of monitoring the network, because these alarms do not indicate 'new' faults. Therefore, it is possible to reduce the alarm flow.
5.2.1 Maintenance mode and filtering
There are two methods for alarm reduction in NetAct. The first method is to use the maintenance mode in NetAct. This is the recommended approach, since it is highly visible and easily activated from the TLUI. The alarms from the selected network element are filtered out of the fault management applications. Once maintenance mode is removed, an automatic alarm upload is done to refresh the alarm situation for that element.
An alternative approach is to create a filtering rule with the FM Rule Editor that prevents all or specified alarms from being seen at NetAct. If alarms are filtered, it is important that the rule is deactivated after the maintenance work.
WARNING It may be the case that only a few people have access to the filtering tool. Also, only one user can use the tool at any time.
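The requirement that a filtering rule is always deactivated after the work can be sketched in Python as follows. The FilterRule class and the element name are hypothetical stand-ins, not a NetAct or FM Rule Editor interface. The sketch only illustrates the pattern of tying deactivation to the end of the work, even when the work fails.

```python
from contextlib import contextmanager


class FilterRule:
    """Stand-in for an alarm filtering rule (hypothetical, not a NetAct API)."""

    def __init__(self, element, alarm_numbers=None):
        self.element = element
        self.alarm_numbers = alarm_numbers  # None means "all alarms"
        self.active = False

    def activate(self):
        self.active = True

    def deactivate(self):
        self.active = False


@contextmanager
def filtered_during_maintenance(rule):
    """Activate the rule for the duration of the work, then always deactivate it."""
    rule.activate()
    try:
        yield rule
    finally:
        # Runs whether the maintenance work succeeded or raised an error.
        rule.deactivate()


rule = FilterRule("BSC-1")  # "BSC-1" is a made-up element name
with filtered_during_maintenance(rule):
    state_during_work = rule.active
state_after_work = rule.active
```

With a manual procedure, the same effect is achieved by making rule deactivation an explicit item on the work checklist.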
5.2.2 Alarm blocking
When alarm blocking commands (ZAB in 2G and ZAF in 3G) are used locally, the network element will not generate the specified alarms, or any alarms at all.
WARNING If alarms are blocked at the network element, unblocking the alarms will not re-generate them. The only way to refresh the alarm situation is by performing a restart.
It is also possible to specify the date and time during which you would like to have the alarms blocked.
5.2.3 Using BTS maintenance modes
When a BTS is subjected to any type of manual work, unnecessary and irrelevant alarms are produced. One technique to limit the alarms generated by the base station while work is being carried out is to use maintenance modes. The maintenance mode commands (ZEM) are primarily used in testing or when installing new BTSs to the BSC. Sometimes it is necessary to prevent BTS alarms from being sent to NetAct and from being output locally. This method of alarm reduction is not recommended, since it is not easy to see which sites are in maintenance mode.
5.3 Types of service-affecting tasks
Very few tasks are service affecting. They can be split into three groups:
• Software upgrades
• Hardware upgrades
• General site inspection
Maintenance on network elements is carried out when necessary. However, an operator may perform regular site inspections to check the condition of the physical site. Examples of MSC, HLR and BSC maintenance are upgrading the software or hardware units, repairing plug-in units or replacing site equipment (e.g. fuses).
Software upgrades are divided into different groups according to delivery mode. These include release-level upgrades (major updates), correction updates and feature-level upgrades (minor upgrades). Hardware upgrades may be required because of a software upgrade.
The following table describes what the tasks are and how often you need to carry out the maintenance routines.

Table 2. Service-affecting tasks frequency interval

Description                                       When performed
Installation of release-level upgrade             When it arrives and time is available
Installation of change notes                      When it arrives and time is available
Installation of feature-level update              When it arrives and time is available
Installation of technical note                    When it arrives and time is available
Upgrade of hardware                               When it arrives and time is available
General site inspection of external equipment     When the manufacturer recommends
Loop Test                                         Weekly or when testing
Note It is also a good idea to monitor any open technical notes (field support bulletins) to ensure that the ‘work around’ solution and correction plan works as recommended. Also, Nokia Online Services has an e-mail service, which can send you the latest technical notes on the products that you have selected.
6 Task work instructions
When operating a telecommunication network, all the systems must be documented. Detailed information about the equipment is needed to be able to perform maintenance work. The number of people engaged in daily work is so large that detailed operating procedures and work instructions are needed.
Personnel need some type of documentation that outlines what needs to be performed. Work instructions might explain how to perform the task or refer to manuals that explain it further or give instructions. These instructions also act as a checklist so that tasks are not forgotten.
Checking tasks can be performed from a remote location such as the NMS terminal. However, tasks that require changing software or hardware units have to be performed on site with the equipment. The documentation needed to carry out the tasks is not reproduced here, because the student is expected to find the information from the documentation.
6.1 Using the documentation
Many manuals give instructions that help in performing the maintenance tasks. References to these manuals are given in the individual practical sections.
7 Follow-up procedures
Once the task has been performed, it is necessary to take the appropriate action(s) to complete it. If the task has not been completed but has been escalated, procedures should be followed as well. These procedures can consist of one or many steps, such as testing and analysing and/or reporting and logging.
Maintenance work may disturb normal operations in the network. To keep the network service quality as high as possible and to minimise any down time, network integrity has to be assessed after any maintenance work. Hence, any recommended testing should be performed in addition to checking tasks.
7.1 Testing and analysing
Tests are useful for finding faults in the network as well as for verifying that the procedure was performed correctly. Once the work is completed, any recommended testing should be performed and the results analysed to make sure that the appropriate results were achieved. There may not be a test for every task (for example, checking tasks), but where maintenance work was performed, testing should be done to confirm that it was done successfully.
Safecopying
Whether safecopying the L-Files or taking a full FB package, check that the safecopy was successful, contains the correct information and gives the expected results. You can check the NEMU software backup procedure using the OmniBack II graphical user interface. See HP OpenView OmniBack II Administrator's Guide, Chapter 5: Backing Up Data for more details.
Upgrades
Upgrades come with recommended testing. Follow the specified testing documentation. A successful upgrade should display the expected results.
7.1.1 Analysing test results
To verify the functionality of the network element after a Change Delivery installation, some test cases should be performed. A Change Note is not a test plan, and hence it should not be followed literally. It only includes an assortment of tests that are recommended to be executed after a software change.
Test instructions outline the pre-requirements, the test execution, and the expected and unexpected results of the test. A successful change delivery should produce the expected results. A full test case list can be found in the document Testing Instructions for the Change Note.

Example 1. Example of one recommended test of BSC Change Delivery 0.2 for SW S9 9.18-0

Reason for Change Note:
Case 1. BCF stays in BL-SYS after BCF reset, because BTLARM sends all messages to kernel (PID=NULL). BTLARM found slave for BCF but slave pid is still null.
Case 2. Some TRXs may stay in BL-RST operational state after BSC restart.

Testing Instructions for S9_015
Pre-requirements:
Case 1. Working BSC
Case 2. Working BSCE, 4C32/4HX (MCMU), 128 TRXs.
Test Execution:
Case 1. Give BCF reset ZEFS:1:L; ZEFS:1:U
Case 2. Give system restart ZUSS:SYM:C=DSK;
Expected Results:
Case 1. BCF is reset and recovers properly.
Case 2. No TRXs in state BL-RST after system restart.
Unexpected Results:
Case 1. BCF reset is not executed. BCF stays in state BL-SYS.
Case 2. Some TRXs in state BL-RST after system restart without any specific reason.
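Comparing an observed result with the expected and unexpected results of Case 2 can be sketched in Python as follows. The input format (a mapping from TRX number to operational state), the state name "WO" and the sample data are assumptions made for illustration. The sketch does not parse real MML output.

```python
def trxs_stuck_in_state(trx_states, bad_state="BL-RST"):
    """Return the sorted TRX identifiers whose operational state equals bad_state."""
    return sorted(trx for trx, state in trx_states.items() if state == bad_state)


def evaluate_case_2(trx_states):
    """Classify the Case 2 outcome: expected if no TRX is left in BL-RST."""
    stuck = trxs_stuck_in_state(trx_states)
    if stuck:
        return ("unexpected", stuck)
    return ("expected", stuck)


# Illustrative observations only; the state name "WO" and the TRX numbers are made up.
after_restart_ok = {1: "WO", 2: "WO", 3: "WO"}
after_restart_bad = {1: "WO", 2: "BL-RST", 3: "WO"}

result_ok = evaluate_case_2(after_restart_ok)
result_bad = evaluate_case_2(after_restart_bad)
```

An "unexpected" classification corresponds to the situation described under Handling abnormal results, where a decision has to be made on how to proceed.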
Example 2. Example of correction change note and recommended test of MSC/HLR Change Delivery for SW M9 4.10-0

Reason for the Change Note: UNEXPECTED DATA VALUE IN SUB ADMIN COMMAND
When an attempt is made to provide CW while withdrawing any of the supplementary services AOC, HOLD, CLIP, CT, MTPY or CLIR at the same time (or vice versa), the error /***DX ERROR: 196644***/ /***UNEXPECTED DATA VALUE***/ appears.

TESTING INSTRUCTIONS FOR THE CHANGE NOTE
Pre-requirements:
1. Create a subscriber with MML command ZMIC:…;
2. Perform a location update for the subscriber.
Test execution:
1. Set the supplementary services of the subscriber. MML command: ZMSD:IMSI=…:CBO=BOIC&BOIH,CBI=N,PSW=0000,CFU=N,CFB=N,CFNR=N,CFNA=N,AOC=N,CW=N,HOLD=N,CLIP=Y,CLIR=PCBN,RDI=0,OCCF=N;
2. Check the subscriber's supplementary services in the HLR. MML command: ZMSO:IMSI=…;
3. Check the subscriber's supplementary services in the VLR. MML command: ZMVS:IMSI=…:BSERV=T11;
Expected results:
1. The command is successful. The MML response is 'COMMAND EXECUTED'.
2. and 3. The status of the services is correct in both the HLR and the VLR.
Unexpected results:
1. The command fails with error code 'UNEXPECTED DATA VALUE'.
7.1.1.1 Handling abnormal results
Sometimes maintenance work is not completed successfully, and some faults or unsuccessful test results may remain. It is then necessary to decide how to proceed. If the maintenance task was performed according to the instructions and faults remain, the options for proceeding are:
[Figure 14. Handling abnormal results. Flowchart. Decision points: Is the task service affecting? Is the NMC* aware of the work? Will the system be restarted? Are the results normal? Can the situation be fixed? Was it successfully fixed? Actions: If needed, book time to do the task; Perform task; If needed, block alarms; Ensure that the system is ready (e.g. measurements); Troubleshoot; Escalate the problem; Escalate to the next level; Return the system back to normal; Follow-up procedures.]
* NMC: Network Management Center, the main control center
Troubleshooting the problem
This option is available depending on the individual's responsibility. If the person is not authorised to troubleshoot, the problem should be escalated. If the person is authorised, a certain time should be allocated to solve the problem, for example 10 minutes. If the problem is still unresolved after that time, it should be escalated to the next level.
Escalating the problem to the next level
Not all monitoring personnel are authorised to troubleshoot. In such cases, or if the problem remains unresolved, it should be escalated. Depending on how the organisation is set up, it may be assigned to second-level surveillance, to engineering services, or perhaps to performance management.
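The troubleshoot-or-escalate decision described above can be sketched in Python as follows. The function name, its parameters and the Action values are hypothetical. The 10-minute default is only the example figure given in the text, and each organisation sets its own limit.

```python
from enum import Enum


class Action(Enum):
    TROUBLESHOOT = "troubleshoot"
    ESCALATE = "escalate to the next level"
    RETURN_TO_NORMAL = "return the system back to normal"


def next_action(authorised, minutes_spent, fixed, time_limit_minutes=10):
    """Decide the next step after abnormal results (illustrative rules only).

    - A fixed problem needs no further troubleshooting.
    - Unauthorised personnel escalate immediately.
    - Authorised personnel escalate once the allocated time is used up.
    """
    if fixed:
        return Action.RETURN_TO_NORMAL
    if not authorised:
        return Action.ESCALATE
    if minutes_spent >= time_limit_minutes:
        return Action.ESCALATE
    return Action.TROUBLESHOOT


still_working = next_action(authorised=True, minutes_spent=4, fixed=False)
out_of_time = next_action(authorised=True, minutes_spent=10, fixed=False)
not_allowed = next_action(authorised=False, minutes_spent=0, fixed=False)
solved = next_action(authorised=True, minutes_spent=6, fixed=True)
```

Where the escalation goes (second-level surveillance, engineering services or performance management) depends on the organisation and is deliberately left out of the sketch.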
7.2 Reporting and logging
Whether there is any testing or not, the next step is to report and log events. Regardless of the task performed, some final actions should be taken to recognise that the task has been completed.
7.2.1 Reporting
It is important to report the results, both expected and unexpected. In this way everyone is kept up to date and the proper people can be notified of the situation. Every network has its own reporting methods, whether you notify the shift leader, engineering support, or customer care.
7.2.2 Logging
After the results have been reported to the appropriate person, the last step is to make a log. The purpose of a log is to have a 'record of performance, events, or day-to-day activities' completed. A log is a helpful reference for checking for consistencies or inconsistencies, for identifying tasks that are performed too frequently or not frequently enough, for reporting work in progress, and for checking planned work. A log can be anything from a checklist to an exchange diary or a trouble ticket system. Every network implements its logging in a different way.
Once the testing is complete, it is recommended that an exchange diary be maintained and updated. It helps to determine whether maintenance tasks are performed too frequently or not frequently enough.
Tip Start filling in the exchange diary as soon as the exchange is being set up and installed.
It is recommended that the following events are recorded in the exchange diary:
• Hardware changes
• Software and hardware updates (change notes, correction packages etc.)
• Essential modifications to the configuration or routing in the exchange
• Safe-copying
• Operational failures
• Any other relevant information.
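The list above can be pictured as a simple record structure. The following Python sketch is one possible representation, not a Nokia format: the DiaryEntry fields, the EventType names and the sample entries are all assumptions. It shows how a diary kept in a structured form could help judge whether a task is performed too frequently or not frequently enough.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum


class EventType(Enum):
    HARDWARE_CHANGE = "hardware change"
    UPDATE = "software or hardware update"
    CONFIGURATION = "configuration or routing modification"
    SAFECOPY = "safe-copying"
    FAILURE = "operational failure"
    OTHER = "other relevant information"


@dataclass(frozen=True)
class DiaryEntry:
    """One line of an exchange diary (hypothetical layout)."""
    day: date
    event: EventType
    note: str


def events_per_type(entries):
    """Count how many entries of each event type the diary contains."""
    return Counter(entry.event for entry in entries)


def days_between(entries, event):
    """Return the gaps in days between consecutive entries of one event type."""
    days = sorted(entry.day for entry in entries if entry.event is event)
    return [(later - earlier).days for earlier, later in zip(days, days[1:])]


# Made-up diary content for one network element.
diary = [
    DiaryEntry(date(2002, 5, 1), EventType.SAFECOPY, "Safecopy before change note"),
    DiaryEntry(date(2002, 5, 1), EventType.UPDATE, "Change note installed"),
    DiaryEntry(date(2002, 5, 8), EventType.SAFECOPY, "Weekly safecopy"),
    DiaryEntry(date(2002, 5, 22), EventType.SAFECOPY, "Weekly safecopy, one week late"),
]

safecopy_gaps = days_between(diary, EventType.SAFECOPY)
counts = events_per_type(diary)
```

In this made-up diary the second gap between safecopies is twice as long as the first, which is the kind of inconsistency a log is meant to reveal.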
8 Summary
The type of routine checking and maintenance task performed by a NOC person depends greatly on the organisation and on the person's responsibility and skills. However, to keep the network working optimally, there are identifiable tasks that must be performed. On a daily basis, the system has to be checked for any problems. In addition, there are tasks that are performed on a weekly, monthly and when-needed basis.
When handling any maintenance work, it is important to plan and prepare tasks, especially in terms of the effect they will have on service. In addition, a schedule of when the tasks should be performed is essential, as is using the available techniques to cause minimum disturbance to network service.
The Nokia documentation is in-depth and concise, and should be used when performing any maintenance work, especially if a new NOC person is performing the task.
After maintenance work, it is important to test that the work has been successful and that the system is working normally again. If problems exist, these should be reported as soon as possible.
When completing any work, it is important to keep a log of all the tasks that have been performed for later reference. Each network element should have its own diary of the activities that have been applied to it.