Security Level:
eRAN3.0 Service Drop Optimization Guide
LTE Performance Maintenance Team
2012-03-05
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD.
Huawei Confidential
Change History
Date: 2012-01-10
Version: 1.0
Description: Completed the draft.
Reviewer:
Author:
R&D Personnel
Name: He Yongzhen
Employee ID: 00107656
Contact Information: See Huawei telephone directory.
Abstract
• This slide describes the formulas of key performance indicator (KPI) counters, the mechanism of traffic measurement counters, the service drop rate and the factors affecting KPIs, common fault location methods, and the deliverables to be submitted for further fault location if a fault causing the service drop cannot be located based on routine troubleshooting operations.
Contents
• Formulas of Service-Drop-Related Counters
• Common Symptoms of Service Drops
• Causes of Service Drops and Data Handling
• Checklist and Deliverables for Service Drops
• Service Drop Cases
Formulas of Service-Drop-Related Counters on the UE Side (1/2)
• On the UE side
Call Drop Rate = eRAB AbnormRel / eRAB Setup Success * 100%
eRAB AbnormRel: indicates the number of abnormal E-RAB releases.
eRAB Setup Success: indicates the number of successful E-RAB setups.
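The UE-side formula above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the choice to return 0.0 when there are no successful setups are assumptions, not part of Genex PA.

```python
def ue_call_drop_rate(erab_abnorm_rel: int, erab_setup_success: int) -> float:
    """Return the UE-side call drop rate in percent.

    erab_abnorm_rel: number of abnormal E-RAB releases.
    erab_setup_success: number of successful E-RAB setups.
    """
    if erab_setup_success <= 0:
        # Assumption: with no successful setups the ratio is undefined,
        # so this sketch reports 0.0 instead of raising an error.
        return 0.0
    return erab_abnorm_rel / erab_setup_success * 100.0


# Example: 3 abnormal releases out of 500 successful setups.
print(f"{ue_call_drop_rate(3, 500):.2f}%")
```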
Definition Stated in Huawei Genex PA
1. The UE receives an RRC Connection Reconfiguration message in a scenario where no Non-Access Stratum (NAS) message "DEACTIVATE EPS BEARER CONTEXT REQUEST" is received, no NAS message "DETACH REQUEST" is received from the MME, and no NAS message "DETACH REQUEST" is sent to the network side. If the RRC Connection Reconfiguration message carries a "drb-ToReleaseList" information element (IE), the ERABAbnormalRel counter is incremented by 1 and the number of eps-BearerIdentity entries under the release list is recorded. ERAB num indicates the number of released E-RABs. The ERAB num is decremented by 1 for each abnormal release. If the E-RAB number becomes 0, the UE state becomes RRC_IDLE; otherwise, the UE state does not change.
Formulas of Service-Drop-Related Counters on the UE Side (2/2)
2. The UE receives the RRC connection release message in a scenario where no NAS message "DEACTIVATE EPS BEARER CONTEXT REQUEST" is received, no NAS message "DETACH REQUEST" is received from the MME, and no NAS message "DETACH REQUEST" is sent to the network side. In this case, an abnormal release is counted into the ERABAbnormalRel counter if RLC transmission exists within the 4 s before the RRC connection release message is received (both uplink and downlink transmission must be considered; the condition is met as long as data is transmitted in either direction). Then, the UE state becomes RRC_IDLE.
3. An abnormal release is counted into the ERABAbnormalRel counter if the UE is in the RRC_IDLE state before receiving the RRC connection release message. The ERABAbnormalRel counter is incremented by 1 and the E-RAB num is incremented based on the number of releases.
4. An abnormal release is counted into the ERABAbnormalRel counter if the UE sends an RRC connection request message in a scenario where no RRC Connection Reconfiguration, DEACTIVATE EPS BEARER CONTEXT REQUEST, DETACH REQUEST, RRC State, or RRC Connection release message has been received.
5. An abnormal E-RAB release event is recorded simultaneously with an RRC connection reestablishment failure event. Note that some sites may count UE-initiated reestablishments as service drops because acceptance conditions differ between sites.
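Rule 2 above (an RRC connection release counts as abnormal only if RLC data was transmitted in either direction within the preceding 4 s and no deactivation or detach NAS message is involved) can be sketched as follows. The function name, the parameter names, and the representation of RLC activity as a list of timestamps in seconds are all hypothetical; Genex PA's actual implementation is not described in this document.

```python
RLC_ACTIVITY_WINDOW_S = 4.0  # the 4 s window stated in rule 2


def is_abnormal_rrc_release(release_time_s, rlc_activity_times_s,
                            deactivate_bearer_received=False,
                            detach_received_from_mme=False,
                            detach_sent_by_ue=False):
    """Return True if an RRC connection release counts as abnormal under rule 2.

    rlc_activity_times_s holds timestamps (seconds) of uplink or downlink
    RLC transmissions; either direction is enough to satisfy the rule.
    """
    # Any of these NAS messages means the release was expected.
    if deactivate_bearer_received or detach_received_from_mme or detach_sent_by_ue:
        return False
    # Abnormal only if some RLC transmission falls inside the 4 s window.
    return any(0.0 <= release_time_s - t <= RLC_ACTIVITY_WINDOW_S
               for t in rlc_activity_times_s)


# Data was sent 2.5 s before the release, so this release counts as abnormal.
print(is_abnormal_rrc_release(100.0, [90.0, 97.5]))
```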
Formulas of Service-Drop-Related Counters on the Network Side
• On the network side
Call Drop Rate = L.E-RAB.AbnormRel / (L.E-RAB.NormRel + L.E-RAB.AbnormRel) * 100%
L.E-RAB.AbnormRel: indicates the total number of abnormal E-RAB releases.
L.E-RAB.NormRel: indicates the total number of normal E-RAB releases.
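As a minimal sketch, the network-side formula can be computed from a dictionary of exported counter values. The dictionary layout and the function name are assumptions made for illustration; only the two counter names come from the text above.

```python
def network_call_drop_rate(counters: dict) -> float:
    """Return the network-side call drop rate in percent.

    counters maps counter names to values, for example
    {"L.E-RAB.AbnormRel": 6, "L.E-RAB.NormRel": 994}.
    """
    abnormal = counters.get("L.E-RAB.AbnormRel", 0)
    normal = counters.get("L.E-RAB.NormRel", 0)
    total = abnormal + normal
    if total == 0:
        # Assumption: no releases at all is reported as a 0.0 drop rate.
        return 0.0
    return abnormal / total * 100.0


sample = {"L.E-RAB.AbnormRel": 6, "L.E-RAB.NormRel": 994}
print(f"{network_call_drop_rate(sample):.2f}%")
```

Unlike the UE-side formula, the denominator here is the total number of releases rather than the number of successful setups, so the two rates are not directly comparable.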
Abnormal Release Counter on the Network Side
• As shown by point A in figure 1, when the eNodeB sends an E-RAB Release Indication and the cause value is not Normal Release, User Inactivity, cs-fallback-triggered, or inter-RAT redirection, the L.E-RAB.AbnormRel counter is incremented by 1. If the E-RAB Release Indication requires the release of multiple E-RABs, related counters are incremented based on the number of releases.
• As shown by point A in figure 2, after the eNodeB sends a UE Context Release Request to the MME, all E-RABs of the UE are released. If the release cause value is not Normal Release, User Inactivity, cs-fallback-triggered, or inter-RAT redirection, related counters are incremented.
Note: In the E-RAB release procedure, one or multiple E-RABs are released. At least one default bearer remains after the E-RAB release procedure is complete. In the UE Context Release procedure, all E-RABs of the UE are released. No bearer, not even a default bearer, remains after the UE Context Release procedure is complete.
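The counting rule above can be sketched as a simple filter: a release contributes to L.E-RAB.AbnormRel only when its cause is outside the four excluded values, and it contributes once per released E-RAB. The cause labels below are hypothetical string labels chosen for this sketch, not values read from any Huawei tool.

```python
# Causes that do NOT count as abnormal releases (labels are assumptions).
NON_DROP_CAUSES = {
    "normal-release",
    "user-inactivity",
    "cs-fallback-triggered",
    "inter-rat-redirection",
}


def abnormal_release_increment(cause: str, released_erab_count: int) -> int:
    """Return how much L.E-RAB.AbnormRel grows for one release message.

    released_erab_count is the number of E-RABs released by the message:
    the listed E-RABs for an E-RAB Release Indication, or all E-RABs of
    the UE for a UE Context Release Request.
    """
    if cause in NON_DROP_CAUSES:
        return 0
    return released_erab_count


print(abnormal_release_increment("radio-connection-with-ue-lost", 2))
print(abnormal_release_increment("user-inactivity", 2))
```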
Counters Indicating Causes of Abnormal Releases on the Network Side (1/2)
• By abnormal-release cause, the counters can be classified into five types:
L.E-RAB.AbnormRel.Radio: number of abnormal E-RAB releases caused by radio-layer problems
L.E-RAB.AbnormRel.TNL: number of abnormal E-RAB releases caused by transport-layer problems
L.E-RAB.AbnormRel.Cong: number of abnormal E-RAB releases caused by network congestion
L.E-RAB.AbnormRel.HOFailure: number of abnormal E-RAB releases caused by handover failures
L.E-RAB.AbnormRel.MME: number of abnormal E-RAB releases caused by EPC problems
• Abnormal E-RAB releases caused by EPC problems
As shown by points A in figures 1 and 2 on the right, the MME initiates an E-RAB or UE context release procedure. If the cause value of the E-RAB Release Command or the UE Context Release Command message received by the eNodeB from the MME is not Normal Release, Detach, User Inactivity, cs fallback triggered, or inter-RAT redirection, the cause is counted into the L.E-RAB.AbnormRel.MME counter.
Note: The L.E-RAB.AbnormRel.MME counter is not included in the L.E-RAB.AbnormRel counter, that is, abnormal E-RAB releases caused by EPC problems are not recorded as service drops from eRAN2.1 V100R003C00SPC400.
Counters Indicating Causes of Abnormal Releases on the Network Side (2/2)
• Abnormal E-RAB releases caused by non-EPC problems
As shown by point A in figure 3, when the eNodeB sends an E-RAB Release Indication to the MME: if the cause value indicates a radio error, the L.E-RAB.AbnormRel.Radio counter is incremented; if the cause value indicates a transport-layer problem, the L.E-RAB.AbnormRel.TNL counter is incremented; if the cause value indicates congestion, the L.E-RAB.AbnormRel.Cong counter is incremented. If the E-RAB Release Indication requires the release of multiple E-RABs, related counters are incremented based on the number of releases of corresponding causes.
As shown by point A in figure 4, after the eNodeB sends a UE Context Release Request to the MME, all E-RABs of the UE are released. If the cause value indicates a radio error, the L.E-RAB.AbnormRel.Radio counter is incremented; if the cause value indicates a transport-layer problem, the L.E-RAB.AbnormRel.TNL counter is incremented; if the cause value indicates congestion, the L.E-RAB.AbnormRel.Cong counter is incremented and records abnormal releases caused by preemption and resource congestion; if the cause value indicates a handover failure, the L.E-RAB.AbnormRel.HOFailure counter is incremented. Related counters are incremented based on the number of releases of corresponding causes. Releases are not counted again when the MME responds with a UE Context Release Command message.
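The mapping from cause category to sub-counter described above can be sketched as a small tally. The category keys ("radio", "transport", and so on) and the input format are assumptions for illustration; the counter names are the ones listed in this guide.

```python
from collections import Counter

# Hypothetical category labels mapped to the counters named in this guide.
CATEGORY_TO_COUNTER = {
    "radio": "L.E-RAB.AbnormRel.Radio",
    "transport": "L.E-RAB.AbnormRel.TNL",
    "congestion": "L.E-RAB.AbnormRel.Cong",
    "handover_failure": "L.E-RAB.AbnormRel.HOFailure",
}


def tally_abnormal_releases(releases):
    """Tally sub-counters from (category, released_erab_count) pairs.

    Each release message increments its counter once per released E-RAB.
    Categories outside the mapping are ignored in this sketch.
    """
    counters = Counter()
    for category, erab_count in releases:
        name = CATEGORY_TO_COUNTER.get(category)
        if name is not None:
            counters[name] += erab_count
    return counters


events = [("radio", 2), ("transport", 1), ("radio", 1), ("handover_failure", 3)]
for name, value in sorted(tally_abnormal_releases(events).items()):
    print(name, value)
```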
Contents
• Definition of Service-Drop-Related Counters
• Common Symptoms of Service Drops
• Causes of Service Drops and Data Handling
• Checklist and Deliverables for Service Drops
• Service Drop Cases
Symptoms of Service Drops Observed in Drive Tests
In a drive test, use the Probe, Huawei test UEs or a Huawei data card (if a commercial UE is used, install the corresponding UE signaling tracing software), and the traffic monitoring software installed on the drive test PC to observe the following information.
• The traffic volume suddenly drops to zero.
• The UE receives system messages in a scenario that involves neither handover nor reestablishment.
[Figure: drive test screenshots annotated "The UE receives system messages." and "The traffic volume drops to zero."]
Symptoms of Service Drops Observed in the Traffic Measurement Data
Service drops on commercial networks are monitored by means of traffic measurement. The service drop rate and the number of service drops are observed to determine whether a fault exists. The traffic measurement result exported from the M2000 displays the following information.
• Entire-network service drop rate, number of service drops, and number of successful connection establishments
• Service drop rate, number of service drops, and service drop time of top cells
[Figure: traffic measurement charts annotated "The entire-network service drop rate is high.", "Top cells contribute a lot to service drops.", and "Service drop occurrence period of top cells"]
Contents
• Definition of Service-Drop-Related Counters
• Common Symptoms of Service Drops
• Causes of Service Drops and Data Handling
• Checklist and Deliverables for Service Drops
• Service Drop Cases
Procedure of Analyzing Service Drops
Step 1: Identify the range of service drops. Analyze the traffic measurement data or CHR data to confirm the range where service drops occur, that is, to check whether it is a top-cell or top-eNodeB problem, an entire-network problem, a comprehensive problem, or a top-UE-type or top-UE problem.
Note 1: The method of analyzing service drops varies with the scenario. If the service drop rate deteriorates after an upgrade, compare the service drop rate before and after the upgrade and analyze the overall range where the deterioration occurs. For an existing site to be optimized (counters related to the service drop rate do not meet requirements or need to be improved), analyze only the range with a high service drop rate; no comparison of the service drop rate before and after the upgrade is required.
Step 2: Break down causes of service drops. Use various data sources to identify major causes of service drops.
Step 3: Perform routine troubleshooting operations for service drops. Follow the routine troubleshooting operation checklist to locate root causes and determine rectification measures. The routine troubleshooting operations for service drops are described in detail in the next section.
Step 4: Perform rectification measures. Perform rectification measures to solve the problem and evaluate the effect. If the rectification target is not met, repeat the preceding steps for further analysis.
Determining the Range of Service Drops: Top Cell Selection Principle
Top cells are selected according to different principles in different scenarios.
Scenario 1: The service drop rate deteriorates, for example, after an upgrade or suddenly for an unknown reason.
Top cell selection principle: For each cell, calculate the difference in the service drop rate and the difference in the number of abnormal E-RAB releases before and after the specified time (by subtracting the value before deterioration from the value after deterioration). Sort the differences in the service drop rate and in the number of abnormal E-RAB releases in descending order to determine the top cells with service drop rate deterioration and the top cells with abnormal E-RAB releases.
Scenario 2: Existing sites are to be optimized. Counters related to the service drop rate do not meet requirements or need to be improved to reach target values.
Top cell selection principle: Sort the service drop rate and the number of abnormal E-RAB releases in descending order to determine the top cells with service drop rate deterioration and the top cells with abnormal E-RAB releases.
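The Scenario 1 selection principle can be sketched as follows: compute per-cell deltas in service drop rate and in the number of abnormal E-RAB releases, then sort each list in descending order. The input layout (a dictionary of per-cell `(abnormal, normal)` release counts) and the function names are assumptions for illustration.

```python
def drop_rate(abnormal: int, normal: int) -> float:
    """Network-side service drop rate in percent for one cell."""
    total = abnormal + normal
    return abnormal / total * 100.0 if total else 0.0


def rank_deteriorated_cells(before, after, top_n):
    """Rank cells by deterioration between two measurement periods.

    before / after: dict mapping cell id -> (abnormal_releases, normal_releases).
    Returns (top cells by drop-rate delta, top cells by abnormal-release delta).
    """
    deltas = []
    for cell, (a_abn, a_norm) in after.items():
        b_abn, b_norm = before.get(cell, (0, 0))
        rate_delta = drop_rate(a_abn, a_norm) - drop_rate(b_abn, b_norm)
        count_delta = a_abn - b_abn  # value after minus value before
        deltas.append((cell, rate_delta, count_delta))
    by_rate = sorted(deltas, key=lambda d: d[1], reverse=True)[:top_n]
    by_count = sorted(deltas, key=lambda d: d[2], reverse=True)[:top_n]
    return [d[0] for d in by_rate], [d[0] for d in by_count]


before = {"A": (1, 99), "B": (5, 995), "C": (2, 198)}
after = {"A": (5, 95), "B": (30, 2970), "C": (2, 198)}
print(rank_deteriorated_cells(before, after, top_n=2))
```

The two rankings can differ: in the sample data, cell A deteriorates most in rate while cell B contributes the most additional abnormal releases, which is why the guide keeps both lists.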
Determining the Range of Service Drops: Criteria
Top-cell problem: If the top 20% of cells with service drop rate deterioration and the top 20% of cells with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters improve significantly and reach the original or target values, service drops are caused by top-cell problems.
Entire-network problem: If the top 20% of cells with service drop rate deterioration and the top 20% of cells with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters do not improve, service drops are caused by entire-network problems.
Comprehensive problem: If the top 20% of cells with service drop rate deterioration and the top 20% of cells with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters improve to a certain extent but are not as good as the original values (and still cannot meet the target values), service drops are caused by comprehensive (top-cell + entire-network) problems.
Top-UE problem: If 20% of top cells with abnormal E-RAB releases are excluded and the entire-network service-drop-rate counters improve significantly and reach the original or target values, service drops are caused by top-UE problems.
Note: Currently, the UE type cannot be obtained from the CHR. Query complaints to check whether this type of problem occurs and then analyze symptoms to check whether known problems occur on related terminals. The eNodeB cannot obtain the international mobile subscriber identities (IMSIs) of top UEs due to security restrictions and needs to use temporary mobile subscriber identities (TMSIs) to determine top UEs.
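The top-cell, entire-network, and comprehensive criteria above can be sketched as a recomputation of the entire-network drop rate with the top cells excluded. The guide does not quantify "not improved", so the `min_gain` threshold (in percentage points) is a hypothetical parameter, as are the function names and the input layout.

```python
def network_drop_rate(cells) -> float:
    """Entire-network drop rate in percent from per-cell (abnormal, normal) counts."""
    abnormal = sum(a for a, _ in cells.values())
    total = sum(a + n for a, n in cells.values())
    return abnormal / total * 100.0 if total else 0.0


def classify_range(cells, top_cells, target_rate, min_gain=0.05):
    """Classify the service drop range after excluding the top cells.

    cells: dict cell id -> (abnormal_releases, normal_releases).
    top_cells: ids of the top 20% of cells to exclude.
    target_rate: original or target drop rate in percent.
    min_gain: assumed smallest improvement that counts as "improved".
    """
    full_rate = network_drop_rate(cells)
    remaining = {c: v for c, v in cells.items() if c not in top_cells}
    reduced_rate = network_drop_rate(remaining)
    if reduced_rate <= target_rate:
        return "top-cell problem"
    if full_rate - reduced_rate < min_gain:
        return "entire-network problem"
    return "comprehensive problem"


cells = {"A": (50, 950), "B": (5, 995), "C": (5, 995), "D": (5, 995), "E": (5, 995)}
print(classify_range(cells, top_cells={"A"}, target_rate=0.6))
```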
Classification of Service Drop Causes: Obtaining Data Sources
After the service drop range is determined, use various data sources to locate causes of the service drop. Data sources include:
Traffic measurement data
Export the traffic measurement data file from the M2000 or PRS. For details, see section 2.3.3 in eRAN2.1 Service Drop Troubleshooting and Optimization Guide.
Signaling tracing result on the network side
Perform signaling tracing on the M2000 to obtain the signaling tracing result. For details, see section 2.2.2 in eRAN2.1 Service Drop Troubleshooting and Optimization Guide.
Drive test data
Perform drive tests to obtain related data. For details, see section 2.1.3 in eRAN2.1 Service Drop Troubleshooting and Optimization Guide.
Classification of Service Drop Causes: Tools
Available tools, tool functions, and approaches to obtaining the tools

Tool Name: TraceViewer
Function: Used to replay signaling messages traced on LMT.
Approach: This tool is released along with the version and is contained in the OfflineTool package.

Tool Name: PROBE
Function: Used to trace information of Huawei UEs, including signaling information, scheduling information, and signal quality.
Approach: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001099409&colID=ROOTENWEB|CO0000000174

Tool Name: ASSISTANT
Function: Used to measure and analyze information of Huawei UEs, including signaling information, scheduling information, and signal quality.
Approach: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001099389&colID=ROOTENWEB|CO0000000174

Tool Name: NIC
Function: Used to collect data in batches.
Approach: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001468041&colID=ROOTENWEB|CO0000000174

Tool Name: PRS
Function: Used to resolve traffic measurement data of the eNodeB.
Approach: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001430110&colID=ROOTWEB|CO0000000065

Tool Name: OMstar
Function: Used to resolve and analyze original traffic measurement data and CHR data, and compare parameters.
Approach: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001470066&colID=ROOTENWEB|CO0000000174
Classification of Service Drop Causes: Tracing Tool Interface
[Figure: Signaling tracing interface on the M2000]
[Figure: Probe interface]
Classification of Service Drop Causes: Analysis Tool Interface
[Figure: Probe, used to trace and analyze the data of Huawei UEs]
[Figure: TrafficReview, used to analyze the eNodeB tracing data]
Classification of Service Drop Causes: Identifying Reconfiguration Messages
For an RRC RECONFIGURATION message, use the message query software to display the details.
If the cqi-ReportConfig IE exists, the message is a Channel Quality Indicator (CQI) reconfiguration message.
If the measConfig IE exists, the message is a measurement configuration message.
If the targetPhysCellId IE exists, the RRCConnectionReconfiguration message is a handover command.
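The three identification rules above can be sketched as a lookup over the IE names found in a decoded message. The function name and the assumption that the message has already been decoded into a collection of IE names are hypothetical; no particular decoder or tool API is implied.

```python
# IE name -> message type, taken from the three rules above.
IE_TO_MESSAGE_TYPE = {
    "cqi-ReportConfig": "CQI reconfiguration",
    "measConfig": "measurement configuration",
    "targetPhysCellId": "handover command",
}


def classify_reconfiguration(ie_names):
    """Return the message types implied by the IEs present in one
    RRCConnectionReconfiguration message.

    ie_names: iterable of IE names extracted from the decoded message.
    A message may carry several of these IEs, so a list is returned.
    """
    present = set(ie_names)
    matches = [label for ie, label in IE_TO_MESSAGE_TYPE.items() if ie in present]
    return matches or ["other reconfiguration"]


print(classify_reconfiguration(["mobilityControlInfo", "targetPhysCellId"]))
print(classify_reconfiguration(["cqi-ReportConfig", "soundingRS-UL-ConfigDedicated"]))
```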
Classifying Service Drop Causes Based on Traffic Measurement Data
• Trend analysis
Obtain the entire-network service drop rate of at least one to two weeks. If an upgrade is performed, collect and analyze the service drop rate of two weeks before the upgrade and that of one week after the upgrade, as shown in the figure on the right.
• Cause analysis
Analyze traffic measurement counters to check whether the E-RAB release is caused by a radio fault or a cell resource problem, as shown in the figure on the bottom left.
• Top cell analysis
Analyze traffic measurement data to determine top cells and top periods of RRC connection or E-RAB establishment failures, as shown in the figure on the bottom right.
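For the cause analysis step, a minimal sketch is to turn the per-cause abnormal-release counters into percentage shares sorted from largest to smallest. The function name and the sample values are invented for illustration and are not measurement results from any network.

```python
def cause_distribution(cause_counters):
    """Return each cause counter's share of abnormal releases in percent,
    sorted from the largest contributor to the smallest.

    cause_counters: dict mapping counter name -> number of abnormal releases.
    """
    total = sum(cause_counters.values())
    if total == 0:
        return {}
    shares = {name: value / total * 100.0 for name, value in cause_counters.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Illustrative values only.
sample = {
    "L.E-RAB.AbnormRel.Radio": 60,
    "L.E-RAB.AbnormRel.TNL": 10,
    "L.E-RAB.AbnormRel.Cong": 5,
    "L.E-RAB.AbnormRel.HOFailure": 25,
}
for name, share in cause_distribution(sample).items():
    print(f"{name}: {share:.1f}%")
```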
Analyzing Service Drop Causes by Using Signaling Tracing
Signaling tracing can be used to locate the procedure in which a service drop occurs and is especially effective for locating drive test problems and repeatable problems. However, signaling tracing must be started before a problem occurs and requires manual analysis. Therefore, signaling tracing does not apply to unrepeatable problems or small-probability problems.
Standard interface tracing (major): After top cells and top periods are determined by using traffic measurement, perform standard interface tracing for the corresponding cells and periods to check which step triggers the service drop.
Single-UE entire-network tracing (minor): Obtain the IMSI of a top UE from the EPC based on the known TMSI, and then perform entire-network tracing on the UE. This method is especially effective for subsequent VIP maintenance. For details about the operation method, see chapter 6 in LTE OM Tracing and Data Collection Guide.doc.
Analyzing Service Drop Causes by Using Drive Test Data
Compared with eNodeB signaling tracing, a drive test has the advantage of providing not only signaling messages but also the uplink signal strength, uplink transmit power, bit error rate, and scheduling information (the information available depends on the drive test software and the UE). Its disadvantage is that only Uu tracing results (RRC and NAS messages) are available, so they need to be analyzed along with the eNodeB signaling tracing results.
Differentiating an uplink problem from a downlink problem
The drive test software can be used to determine whether the UE does not receive a message from the eNodeB or the eNodeB does not receive the response from the UE. The downlink RSRP and SINR can be observed to check the quality of the downlink channel. The uplink transmit power can be observed to check whether signal demodulation on the uplink is restricted.
Isolating UE faults from non-UE faults
Logs are analyzed to determine whether received signaling messages are properly processed or whether the UE encounters faults such as suddenly stopped data transmission.
Contents
• Definition of Service-Drop-Related Counters
• Common Symptoms of Service Drops
• Causes of Service Drops and Data Handling
• Checklist and Deliverables for Service Drops
• Service Drop Cases
Entire-Network Service Drop: Routine Operation Checklist

Routine Operation: Preliminary analysis on traffic measurement data related to service drops
Analysis Operation: 1. Quickly analyze the traffic measurement data and export the range and causes of service drops. 2. Analyze the service drop rate trend to identify the turning point.
Deliverables: 1. Distribution of service drop causes and top causes. 2. Operations performed at the turning point of the service drop rate.
Solution Operation: 1. Perform corresponding optimization operations based on top service drop causes. 2. Provide operations performed at the turning point of the service drop rate and evaluate the impact of each operation on the service drop rate.

Routine Operation: Version check
Analysis Operation: 1. Check whether the eNodeB is upgraded or has patches installed. 2. Check whether the EPC is upgraded or has patches installed.
Deliverables: Version No. before and after the upgrade.
Solution Operation: Referring to the release notes, provide the modifications between the versions before and after the upgrade that may affect the service drop rate.

Routine Operation: Equipment and transport alarms
Analysis Operation: Check alarms on the entire network.
Deliverables: List critical and major alarms.
Solution Operation: Analyze the impact of alarms on the service drop rate and check whether the service drop rate is recovered after alarms are cleared.

Routine Operation: Data configuration check
Analysis Operation: 1. Check parameter settings on the entire network. 2. Check modified parameters on the EPC.
Deliverables: 1. Parameter differences before and after the upgrade. 2. Parameter differences in comparison with the baseline parameters of the new version. 3. Objective and impact of parameter modification on the EPC.
Solution Operation: 1. Check whether parameter modification affects the service drop rate. 2. Revert parameters and check whether the service drop rate is recovered.

Routine Operation: Operation record check
Analysis Operation: Check whether a great amount of operation records exist on the entire network and whether neighboring cells and PCIs are replanned.
Deliverables: Records of operations on the entire network.
Solution Operation: Analyze the impact of operations on the service drop rate and check whether the operations can be reverted.

Routine Operation: Neighboring relationship check
Analysis Operation: Check whether neighboring cells are missing. Deploying a great number of eNodeBs between existing eNodeBs in a scattered manner may make the neighboring relationships of many adjacent sites improper.
Deliverables: Information of missing neighboring cells.
Solution Operation: Add missing neighboring cells and check whether the service drop rate is recovered.

Routine Operation: Major events check
Analysis Operation: Check whether large-scale telephone number release is implemented or other important activities such as ceremonies, holidays, and sport events are held.
Deliverables: 1. Verify the UE type involved in the telephone number release, number release amount, and subscription policies. 2. Confirm the range and period of time of important activities.
Solution Operation: Confirm the relationship between the important event and the deterioration of the service drop rate.

Note: For details about routine troubleshooting operations for a comprehensive (entire-network + top-cell) problem, see the checklists of the top-cell problem and the entire-network problem.
Top-Cell Service Drop: Routine Troubleshooting Operation Checklist

Routine Operation: Preliminary analysis on the traffic measurement data related to top-eNodeB service drops
Analysis Operation: 1. Quickly analyze the traffic measurement data and export the range and causes of service drops. 2. Analyze the service drop rate trend to identify the turning point.
Deliverables: 1. Distribution of service drop causes and top causes. 2. Operations performed at the turning point of the service drop rate.
Solution Operation: 1. Perform corresponding optimization operations based on top service drop causes. 2. Provide operations performed at the turning point of the service drop rate and evaluate the impact of each operation on the service drop rate.

Routine Operation: Top-eNodeB version check
Analysis Operation: Check whether the eNodeB is upgraded or has patches installed.
Deliverables: Version No. before and after the upgrade.
Solution Operation: Referring to the release notes, provide the modifications between the versions before and after the upgrade that may affect the service drop rate.

Routine Operation: Equipment and transport alarms of top eNodeBs
Analysis Operation: Check alarms of top eNodeBs.
Deliverables: List critical and major alarms.
Solution Operation: Analyze the impact of alarms on the service drop rate and check whether the service drop rate is recovered after alarms are cleared.

Routine Operation: Top-eNodeB parameter settings check
Analysis Operation: Check parameter settings of top eNodeBs.
Deliverables: 1. Parameter differences before and after the upgrade. 2. Parameter differences in comparison with the baseline parameters of the new version.
Solution Operation: 1. Check whether parameter modification affects the service drop rate. 2. Revert parameters and check whether the service drop rate is recovered.

Routine Operation: Top-eNodeB operation record check
Analysis Operation: Check whether a great amount of operation records exist on the entire network and whether neighboring cells and PCIs are replanned.
Deliverables: Records of operations on the entire network.
Solution Operation: Analyze the impact of operations on the service drop rate and check whether the operations can be reverted.

Routine Operation: Top-eNodeB neighboring relationship check
Analysis Operation: Check whether neighboring cells are missing. Deploying a great number of eNodeBs between existing eNodeBs in a scattered manner may make the neighboring relationships of many adjacent sites improper.
Deliverables: Information of missing neighboring cells.
Solution Operation: Add missing neighboring cells and check whether the service drop rate is recovered.

Routine Operation: Top-cell coverage check
Analysis Operation: Analyze the MCS and CQI information in the traffic measurement data, CHR data, and drive test data to check whether top cells encounter cross coverage or weak coverage.
Deliverables: Top-cell coverage evaluation report.
Solution Operation: If weak coverage exists, adjust the coverage by means of network optimization.

Routine Operation: Top-cell interference check
Analysis Operation: Analyze the real-time tracing data to check whether top cells encounter intermodulation interference and external interference.
Deliverables: Top-cell interference evaluation report.
Solution Operation: If interference exists, solve the problem by referring to the interference check manual.

Routine Operation: Major events check
Analysis Operation: Check whether large-scale telephone number release is implemented or other important activities such as ceremonies, holidays, and sport events are held.
Deliverables: 1. Verify the UE type involved in the telephone number release, number release amount, and subscription policies. 2. Confirm the range and period of time of important activities.
Solution Operation: Confirm the relationship between the important event and the deterioration of the service drop rate.
Fault Location: Radio Problems
• Symptom
According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.Radio counter, the service drop is caused by a radio interface problem on the wireless network side.
• Possible causes
A service drop with the cause value being radio occurs because RLC retransmissions reach the maximum number, out-of-synchronization occurs, or signaling message exchange fails due to weak coverage, uplink interference, or UE faults. For details about interference elimination, see LTE RF Channel Test and Check Manual.
• Handling procedure
Analyze the CHR data to check whether top UEs exist.
Analyze the CHR data to verify inner causes of abnormal releases.
If a service drop is caused by a failure in the exchange of non-procedure messages, view the L2 DRB scheduling data to check whether weak coverage or interference occurs.
If a procedure message exchange fails, observe the last ten messages to locate the faulty point and determine whether the UE does not receive the message from the eNodeB, the UE receives but does not process the message, or the eNodeB does not receive the response from the UE.
Inner release cause values in the CHR are: UEM_UECNT_REL_UE_RLC_UNRESTORE_IND, UEM_UECNT_REL_UE_RESYNC_TIMEROUT_REL_CAUSE, UEM_UECNT_REL_UE_RESYNC_DATA_IND_REL_CAUSE, UEM_UECNT_REL_UE_RLF_RECOVER_FAIL_REL_CAUSE, UEM_UECNT_REL_RRC_REEST_SRB1_FAIL, and UEM_UECNT_REL_RB_RECFG_FAIL_RRC_CONN_RECFG_CMP_FAIL.
Fault Location: Handover Failures
• Symptom
According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.HOFailure counter, service drops are caused by handover failures.
• Possible causes
A service drop with the cause value being handover failure is caused by an abnormal release due to a failure in handover out of the serving cell.
• Handling procedure
Use the outgoing handover counters between specific cells to determine the target cell with the highest service drop rate.
Analyze the CHRs of the serving cell and the target cell to check whether the UE fails to receive the handover command or fails in random access to the target cell. The corresponding inner release cause values in the CHR are UEM_UECNT_REL_HO_OUT_X2_REL_BACK_FAIL and UEM_UECNT_REL_HO_OUT_S1_REL_BACK_FAIL.
Optimize the handover relationship, including handover parameters and neighboring relationships, and then check whether related counters are recovered.
Fault Location: Transport Problems
• Symptom
According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.TNL counter, service drops are caused by transport-layer problems.
• Possible causes
A service drop with the cause value being TNL is caused by a transport fault between the eNodeB and the MME, for example, an intermittently disrupted S1 link.
• Handling procedure
Query alarms to check whether there are transport-related alarms, clear the alarms if any, and then check whether related counters are recovered.
Check whether the eNodeB encounters transport-related alarms on the M2000.
Clear alarms by referring to the alarm help.
If alarms are cleared and the L.E-RAB.AbnormRel.TNL counter still has a large value, collect and send the following information to the next fault location station.
Fault Location: Congestion Problems
• Symptom
According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.Cong counter, service drops are caused by congestion problems.
• Possible causes
A service drop with the cause value being congestion is caused by congestion of radio resources on the eNodeB, for example, when the maximum number of users is reached.
• Handling procedure
If a top cell encounters service drops caused by long-term congestion, enable the load balancing or interoperation function to reduce the load of the serving cell as a short-term solution. As a long-term solution, expand the capacity. After solving the problem, check whether related counters are recovered.
Turn on the MLB algorithm switch and check whether the situation is improved.
Fault Location: MME Problems
• Symptom
According to the definition of the traffic measurement counter on the eNodeB, if abnormal releases are counted into the L.E-RAB.AbnormRel.MME counter, the service drop is caused by an abnormal release initiated by the EPC. This type of abnormal release is not counted into the L.E-RAB.AbnormRel counter.
• Possible causes
A service drop with the cause value being MME is caused by an abnormal release initiated by the EPC.
• Handling procedure
This type of service drop is caused by non-eNodeB problems and needs to be located by using EPC-related information.
Inner release cause value in the CHR: UEM_UECNT_REL_MME_CMD. The service drop is caused by the release initiated by the EPC. Work with the EPC technical support personnel to solve this problem.
Obtain the S1 tracing result of top cells and analyze the distribution of various causes of abnormal releases initiated by the EPC.
Send measurement results and related signaling procedures to the EPC technical support personnel for further analysis.
Deliverables for Service Drops •
Check result based on the routine trou bleshooting operation checklist for service drops
•
For some diff icult p roblems, collect more logs for fur ther location.
BRD log (mandatory)
Indicates logs of the LMPT and LBBP on the eNodeB to which top cells belong.
Standard interface signaling (mandatory)
Indicates S1, X2, and Uu interface tracing results.
Network configuration (mandatory)
Includes networking information, engineering parameters, and configuration files of top eNodeBs.
TTI tracing (optional; depending on fault location requirements)
Indicates IFTS tracing results and cell tracing results. Only information of top cells in top periods needs to be collected because there is a great amount of data.
Single-UE tracing (optional; depending on fault location requirements)
Used for in-depth top-UE location and is performed on the entire network by using the IMSI that is obtained from the EPC based on the TMSI of the top UE.
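Before a problem is escalated, the log deliverables listed above can be checked for completeness. The following is a minimal sketch, assuming that the collected items are tracked by the names used on this slide. The function and constant names are hypothetical.

```python
# Log deliverables as named on this slide
MANDATORY = {"BRD log", "Standard interface signaling", "Network configuration"}
OPTIONAL = {"TTI tracing", "Single-UE tracing"}


def missing_mandatory(collected):
    """Return the mandatory deliverables not yet collected, sorted by name."""
    return sorted(MANDATORY - set(collected))


missing = missing_mandatory(["BRD log", "TTI tracing"])
print(missing)
```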
Page 35
Contents •
Definition of Service-Drop-Related Counters
•
Common Symptoms of Service Drops
•
Causes of Service Drops and Data Handling
•
Checklist and Deliverables for Service Drops
•
Service Drop Cases
Page 36
Cases: Overview •
After the network in D2 of Germany is upgraded to eRAN2.1 V100R003C00SPC420, the R&D personnel analyze the service drop rate of this site. This document uses this analysis as an example to describe the procedure of analyzing service drops and causes of service drops.
•
After D2 is upgraded, some problems encountered in the old version are solved and the average service drop rate decreases to 0.6%. Because the network is upgraded in segments, the service drop rate decreases slowly from Dec 5th to Dec 10th. The whole network is upgraded by Dec 12th.
Page 37
Case 1: Service drops are caused by top UEs that continuously fail in reestablishment.
As shown in the figure on the upper right, most abnormal releases on the eNodeB are caused by failures in exchanging the first three signaling messages of the reestablishment procedure. As shown in the figure on the middle right, most service drops occur continuously from 11:51 to 18:49 in cell 0. As shown in the figure on the bottom right, the TMSI information indicates that the service drops are caused by a single UE (TMSI C2 B0 B0 40) and that the main reestablishment cause value is reconfiguration failure. As shown in the figure on the bottom left, the reconfiguration messages are not handover commands or measurement configuration messages but may be CQI, sounding, or transmission mode (TM) reconfiguration messages. In addition, the UE does not respond to the RRC CONN REESTAB message, and therefore the eNodeB releases the E-RABs 5s later.
Page 38
Case 2: Top UEs encounter continuous faults.
The CHR of the eNodeB shows that most abnormal releases occur because RLC retransmissions reach the maximum number of times, that is, DRB retransmissions reach the maximum number of times (8 retransmissions).
From the perspective of fault occurrence time, most service drops occur in a continuous manner within a period from 10:51 to 13:49 in cell 2.
From the perspective of TMSI information, service drops are caused by a certain UE (TMSI C2 7F 20 56).
The last 16 64-ms messages of DRB scheduling information show a similar problem: a fault (similar to suddenly stopped data transmission) occurs soon after access. The release occurs within tens of seconds to two minutes after access and is unlikely to be caused by a command-driven test. In addition, the access type is MO-DATA, so this type of release occurs during actual service use.
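In Case 1 and Case 2, the decisive step is noticing that a single TMSI accounts for most of the abnormal releases. The following is a minimal sketch of that check, assuming that the CHR records have been exported as dictionaries with a "tmsi" key. The record format, the function name, and the 50% threshold are hypothetical, and the TMSIs other than the one quoted in Case 2 are placeholders.

```python
from collections import Counter


def top_ues(records, min_share=0.5):
    """Return (tmsi, count) pairs for UEs causing most abnormal releases.

    records: list of dicts with a "tmsi" key (hypothetical CHR export).
    min_share: illustrative threshold for calling a UE a top UE.
    """
    counts = Counter(r["tmsi"] for r in records)
    total = sum(counts.values())
    return [
        (tmsi, n) for tmsi, n in counts.most_common() if n / total >= min_share
    ]


# Eight releases from the Case 2 UE plus two placeholder UEs
sample = (
    [{"tmsi": "C2 7F 20 56", "cell": 2}] * 8
    + [{"tmsi": "placeholder-1", "cell": 2}, {"tmsi": "placeholder-2", "cell": 1}]
)
result = top_ues(sample)
print(result)
```

If one UE dominates, the cell-level service drop rate says little about the cell itself, and single-UE analysis is the more useful next step.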
Page 39
Case 3: The uplink link quality is poor. •
The figure on the right shows that, from the last four 512-ms messages of DRB scheduling information to the last 16 64-ms messages of DRB scheduling information, the uplink RSRP and SINR are poor. The uplink RSRP reaches –135 dBm or below. The sounding SINR and demodulation reference signal (DMRS) SINR are –3 dB or less. The service drop is possibly caused by uplink weak coverage.
•
The figure on the left shows that, from the last four 512-ms messages to the last 16 64-ms messages, the uplink RSRP is around –130 dBm. The sounding SINR and DMRS SINR are –3 dB or less. The service drop is possibly caused by small uplink interference in a weak-coverage area.
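The two examples above can be turned into a rough screening rule for CHR records. The following is a minimal sketch that uses the values quoted in this case (–135 dBm, –130 dBm, and –3 dB) as hard cut-offs, which is an assumption of the sketch. The last branch, which attributes low SINR at a stronger RSRP to interference, is also an assumption and is not stated in this case.

```python
def classify_uplink(rsrp_dbm, sinr_db):
    """Classify the uplink state shortly before a release.

    rsrp_dbm: uplink RSRP in dBm; sinr_db: sounding or DMRS SINR in dB.
    Thresholds are taken from the two Case 3 examples and are illustrative.
    """
    if sinr_db > -3:
        return "uplink quality not obviously poor"
    if rsrp_dbm <= -135:
        return "weak uplink coverage"
    if rsrp_dbm <= -130:
        return "weak coverage with small uplink interference"
    # Assumption: low SINR at a stronger RSRP points to interference
    return "possible uplink interference"


print(classify_uplink(-136, -4))
print(classify_uplink(-130, -3))
```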
Page 40
Case 4: Reconfiguration of the target cell fails. •
Release cause (Unspecified displayed in the S1 tracing result)
TGT_ENB_RB_RECFG_FAIL indicates an abnormal release caused by an RB reconfiguration failure on the target eNodeB during the handover process.
After the UE successfully hands over to the target cell, the target eNodeB receives a PATH SWITCH REQ ACK message from the MME and sends a UE CONTEXT REL REQ message about 100 ms later, carrying the S1-AP cause value of unspecified. The figure on the left displays the last ten messages.
•
Problem analysis
During the handover process, the MME sends a PATH_SWITCH_ACK message carrying a downlink AMBR value that is inconsistent with the value carried in the S1 or X2 handover request. This is a defect of the RR module. The upper-layer RR control module sends an AMBR update message to the lower-layer RB module. The RB module determines not to send a Uu reconfiguration message to the UE and then responds with a null value to the upper-layer RR control module. In this case, the upper-layer RR control module treats this response as a fault and then releases the UE. The fix for this problem is included in eRAN2.1 V100R003C00SPC430.
Page 41
Case 5: A service drop is caused by the inter-RAT redirection. •
Release cause (Inter-RAT redirection displayed in the S1 tracing result)
IRHO_REIDRECTION_TRIGER indicates a release caused by inter-RAT redirection. Releases with this cause are mistakenly counted as service drops in eRAN2.1 V100R003C00SPC400 and eRAN2.1 V100R003C00SPC401. The following figure shows related messages.
This problem will be solved in eRAN2.1 V100R003C00SPC420.
Page 42
Case 6: Releases are counted into the L.E-RAB.AbnormRel.TNL counter due to transport faults. •
On Dec 11th, 2011, the entire-network service drop rates of 900 MHz and 2.6 GHz deteriorate in Tele2 and Telenor, as shown in the following figure.
•
The field personnel have discussed this problem with the operator. It is likely that this problem is caused by EPC faults. However, no response has been received from the operator.
Page 43
Case 7: Service drops are caused by radio problems. •
Release cause
UE_RESYNC_TIMEROUT_REL_CAUSE (Radio Connection With UE Lost displayed in the S1 tracing result): indicates an L2-reported release that occurs when the resynchronization timer expires after the UE goes out of synchronization.
UE_RLC_UNRESTORE_IND (Radio resources not available displayed in the S1 tracing result): indicates the L2-reported RLC unrestore indication that is sent after the maximum number of RLC retransmissions is reached.
UE_RESYNC_DATA_IND_REL_CAUSE (Unspecified displayed in the S1 tracing result): indicates an L2-reported release caused by data-triggered resynchronization after the UE goes out of synchronization.
•
Cause analysis
From the last four 512-ms messages of DRB scheduling information to the last 16 64-ms messages of DRB scheduling information, abnormal releases are caused by faults similar to suddenly stopped data transmission in most cases. Possibly, the SIM card is removed or the UE is faulty. The following figure shows information recorded in the CHR.
Page 44
Case 8: The reestablishment procedure fails. •
Release cause (Radio Connection With UE Lost displayed in the S1 tracing result)
RRC_REEST_SRB1_FAIL: indicates a release occurring at the SRB 1 restoration stage during the RRC connection reestablishment.
The last ten messages, shown in the following figure, indicate that after the eNodeB sends an RRC_CONN_REESTAB message, it does not receive the RRC_CONN_REESTAB_CMP message from the UE before the 5s radio interface timer expires.
From the perspective of L2 scheduling, the UE responds with an ACK message after receiving the RRC_CONN_REESTAB message from the eNodeB.
That is possibly because some UEs do not send the RRC_CONN_REESTAB_CMP message. For example, Samsung UEs have this problem.
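The cases above pair each internal release cause in the CHR with the cause displayed in the S1 tracing result. Collecting these pairs in a lookup table shows that one S1 cause can correspond to several internal causes, so the S1 trace alone is not enough to locate a fault. The following is a minimal sketch that contains only the pairs named in the cases above. The table and function names are hypothetical.

```python
# Internal CHR release cause -> cause displayed in the S1 tracing result,
# as stated in the cases above (identifiers copied as they appear there)
S1_CAUSE_BY_INTERNAL_CAUSE = {
    "TGT_ENB_RB_RECFG_FAIL": "Unspecified",
    "IRHO_REIDRECTION_TRIGER": "Inter-RAT redirection",
    "UE_RESYNC_TIMEROUT_REL_CAUSE": "Radio Connection With UE Lost",
    "UE_RLC_UNRESTORE_IND": "Radio resources not available",
    "UE_RESYNC_DATA_IND_REL_CAUSE": "Unspecified",
    "RRC_REEST_SRB1_FAIL": "Radio Connection With UE Lost",
}


def internal_causes_for(s1_cause):
    """Return the internal causes that map to the given S1 cause, sorted."""
    return sorted(
        internal
        for internal, displayed in S1_CAUSE_BY_INTERNAL_CAUSE.items()
        if displayed == s1_cause
    )


print(internal_causes_for("Unspecified"))
```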
Page 45