Security Level: internal
Guide to Optimizing LTE Service Drops
www.huawei.com
HUAWEI TECHNOLOGIES CO., LTD.
Huawei Confidential
Change History
Date: 2012.1.10
Version: 1.0
Description: Completed the draft.
Reviewer:
Author:
Abstract
This document:
Defines the call drop rate.
Describes how to use the related counters to diagnose a call drop and to analyze factors influencing the KPI.
Describes common diagnosis methods and standard actions to be taken by front-line engineers to handle a call drop problem.
Describes the deliverables that front-line engineers must submit to R&D engineers if they fail to solve the problem after taking the standard actions.
Content
• Definition of the Service Drop Rate
• Symptoms of a Service Drop
• Cause Analysis and Data Processing
• Checklist and Deliverables
• Case Study
Calculation of the Call Drop Rate on the UE Side (1/3)
Call Drop Rate = eRAB AbnormRel / eRAB Setup Success x 100%
where eRAB AbnormRel is the number of e-RAB abnormal releases and eRAB Setup Success is the number of successful e-RAB setup events.
Calculation of the Call Drop Rate on the UE Side (2/3)
eRAB AbnormRel is calculated by Huawei Genex PA as follows:
I. eRAB AbnormRel increments by 1 if the UE
Does not receive the DEACTIVATE EPS BEARER CONTEXT REQUEST message and,
Does not receive the DETACH REQUEST message from the MME and,
Does not send the DETACH REQUEST message and,
Receives the RRCConnectionReconfiguration message containing the IE drb-ToReleaseList.
In this case, if the ERAB num minus the eps-BearerIdentity contained in the ReleaseList is 0, the UE transits to RRC_Idle mode.
II. eRAB AbnormRel increments by 1 if the UE
Does not receive the DEACTIVATE EPS BEARER CONTEXT REQUEST message and,
Does not receive the DETACH REQUEST message from the MME and,
Does not send the DETACH REQUEST message and,
Receives the RRCConnectionRelease message and the RLC layer performs data transmission in the last 4s in any direction.
In this case, the UE directly transits to RRC_Idle mode.
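Rules I and II above can be sketched as a small predicate. This is only an illustration, not the Genex PA implementation: the `UeRelease` record and its field names are hypothetical, and the 4-second activity window is taken from rule II.

```python
from dataclasses import dataclass


@dataclass
class UeRelease:
    """Hypothetical summary of what the UE log shows around one release."""
    got_deactivate_bearer_req: bool      # DEACTIVATE EPS BEARER CONTEXT REQUEST received
    got_detach_req_from_mme: bool        # DETACH REQUEST received from the MME
    sent_detach_req: bool                # DETACH REQUEST sent by the UE
    got_reconfig_with_drb_release: bool  # RRCConnectionReconfiguration with drb-ToReleaseList
    got_rrc_release: bool                # RRCConnectionRelease received
    seconds_since_last_rlc_data: float   # time since RLC last carried data, either direction


def is_abnormal_release(r: UeRelease, activity_window_s: float = 4.0) -> bool:
    """Return True if rule I or rule II would count this release as abnormal."""
    # Common precondition of both rules: no NAS-level release was signalled.
    no_nas_release = not (
        r.got_deactivate_bearer_req
        or r.got_detach_req_from_mme
        or r.sent_detach_req
    )
    rule_1 = no_nas_release and r.got_reconfig_with_drb_release
    rule_2 = (
        no_nas_release
        and r.got_rrc_release
        and r.seconds_since_last_rlc_data <= activity_window_s
    )
    return rule_1 or rule_2
```

A release that follows a long idle period (for example, 30 s without RLC data) is not counted under rule II, which matches the intent of excluding inactivity releases.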
Calculation of the Call Drop Rate on the UE Side (3/3)
III. ERABAbnormalRel increments by 1 for each released e-RAB if the UE has established e-RAB(s) and enters the RRC_Idle mode before receiving the RRCConnectionRelease message.
IV. ERABAbnormalRel increments by 1 if the UE initiates the RRC connection setup request without receiving the RRC Connection Reconfiguration, Deactivate EPS Bearer Context Request, Detach Request, RRC State, or RRC Connection Release message.
V. ERABAbnormalRel increments by 1 if the event RRCReestablishFail occurs. The timestamp contained in these two events is the same.
Note: The acceptance criteria of some customers may require that all RRC reestablishments initiated by the UE be counted as service drops.
Calculation of the Call Drop Rate on the Network Side
Call Drop Rate = L.E-RAB.AbnormRel / (L.E-RAB.NormRel + L.E-RAB.AbnormRel) x 100%
where L.E-RAB.AbnormRel is the number of e-RAB abnormal releases and L.E-RAB.NormRel is the number of e-RAB normal releases.
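As a quick illustration of the network-side formula, the sketch below computes the rate from the two counter values. The function name and the choice to return 0% when no e-RAB was released are assumptions made for this example.

```python
def network_call_drop_rate(abnorm_rel: int, norm_rel: int) -> float:
    """Call drop rate in percent from L.E-RAB.AbnormRel and L.E-RAB.NormRel."""
    total = abnorm_rel + norm_rel
    if total == 0:
        # Assumption: report 0% when no e-RAB was released in the period.
        return 0.0
    return abnorm_rel / total * 100.0


# Example: 12 abnormal releases and 2388 normal releases in one period.
rate = network_call_drop_rate(12, 2388)
print(f"{rate:.2f}%")
```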
Counters Recorded by the Network
• As shown at point A of Fig1, if the eNodeB sends the E-RAB RELEASE INDICATION message containing a cause value that is not "Normal Release", "User Inactivity", "cs fallback triggered", or "Inter-RAT redirection", L.E-RAB.AbnormRel increments by 1. If the E-RAB RELEASE INDICATION message requests release of multiple e-RABs, L.E-RAB.AbnormRel increments by 1 for each e-RAB.
• As shown at point A of Fig2, when the eNodeB sends the UE CONTEXT RELEASE REQUEST message to the MME, the eNodeB releases all e-RABs of the UE. If the release cause is not "Normal Release", "User Inactivity", "cs fallback triggered", or "Inter-RAT redirection", L.E-RAB.AbnormRel increments by 1 for each release.
Note: The eRAB Release procedure releases one or multiple e-RABs. After the procedure, at least the default bearer is maintained. The UE Context Release procedure releases all connections. No bearer is maintained after this procedure.
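The counting rule above can be sketched as follows. This is a minimal illustration, not eNodeB code: the function name is hypothetical, and the cause strings are compared exactly as they are written in this document.

```python
# Cause values that the text above excludes from L.E-RAB.AbnormRel.
NON_DROP_CAUSES = {
    "Normal Release",
    "User Inactivity",
    "cs fallback triggered",
    "Inter-RAT redirection",
}


def abnormal_release_increment(cause: str, released_erabs: int) -> int:
    """Amount by which L.E-RAB.AbnormRel grows for one eNodeB-initiated release.

    released_erabs is the number of e-RABs released by the message: the
    listed e-RABs for E-RAB RELEASE INDICATION, or all e-RABs of the UE
    for UE CONTEXT RELEASE REQUEST.
    """
    if cause in NON_DROP_CAUSES:
        return 0
    return released_erabs
```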
Counters That Count Abnormal Releases by the Network (1/4)
Currently, there are five counters that count e-RAB abnormal releases by the network:
L.E-RAB.AbnormRel.Radio (Number of e-RAB abnormal releases caused by the eNodeB)
L.E-RAB.AbnormRel.TNL (Number of e-RAB abnormal releases caused by the transmission network)
L.E-RAB.AbnormRel.Cong (Number of e-RAB abnormal releases caused by network congestion)
L.E-RAB.AbnormRel.HOFailure (Number of e-RAB abnormal releases caused by handover failures)
L.E-RAB.AbnormRel.MME (Number of e-RAB abnormal releases caused by the EPC)
Counters That Count Abnormal Releases by the Network (2/4)
• Abnormal releases caused by the EPC
As shown at point A of Fig1 and Fig2, if the eNodeB receives the E-RAB RELEASE COMMAND or UE CONTEXT RELEASE COMMAND message from the MME containing a cause value that is not "Normal Release", "Detach", "User Inactivity", "cs fallback triggered", or "Inter-RAT redirection", L.E-RAB.AbnormRel.MME increments by 1.
Note: L.E-RAB.AbnormRel does not include L.E-RAB.AbnormRel.MME. A release initiated by the EPC is not counted as a call drop in eRAN2.1SPC400 and later versions.
Counters That Count Abnormal Releases by the Network (3/4)
• Abnormal release not caused by the EPC
As shown at point A of Fig3, if the eNodeB sends the E-RAB RELEASE INDICATION message to the MME with a cause value indicating a radio error, L.E-RAB.AbnormRel.Radio increments by 1. If the cause value indicates a transmission network error, L.E-RAB.AbnormRel.TNL increments by 1. If the cause value indicates network congestion, L.E-RAB.AbnormRel.Cong increments by 1. If the E-RAB RELEASE INDICATION message requires release of multiple e-RABs, the concerned counter increments by 1 for each e-RAB.
Counters That Count Abnormal Releases by the Network (4/4)
• Abnormal release not caused by the EPC
As shown at point A of Fig4, the eNodeB sends the UE CONTEXT RELEASE REQUEST message to the MME to release all e-RABs of the UE. If the cause value indicates a radio error, L.E-RAB.AbnormRel.Radio increments by 1. If the cause value indicates a transmission network error, L.E-RAB.AbnormRel.TNL increments by 1. If the cause value indicates network congestion, L.E-RAB.AbnormRel.Cong increments by 1. This counter measures the abnormal releases caused by preemption and resource congestion. If the cause value indicates a handover failure, L.E-RAB.AbnormRel.HOFailure increments by 1. The concerned counter increments by 1 for each e-RAB. The counters no longer increment when the MME sends the UE CONTEXT RELEASE COMMAND message.
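The mapping from cause category to sub-counter can be sketched as a lookup table. The category labels ("radio", "transport", and so on) are hypothetical names chosen for this example; only the counter names come from this document.

```python
from collections import Counter

# Hypothetical category labels mapped to the sub-counters named above.
CATEGORY_TO_COUNTER = {
    "radio": "L.E-RAB.AbnormRel.Radio",
    "transport": "L.E-RAB.AbnormRel.TNL",
    "congestion": "L.E-RAB.AbnormRel.Cong",
    "handover_failure": "L.E-RAB.AbnormRel.HOFailure",
}


def tally_sub_counters(releases):
    """Tally sub-counters from (category, released_erab_count) pairs.

    Each abnormal release adds one count per released e-RAB to the
    sub-counter that matches its cause category. Unknown categories
    are ignored in this sketch.
    """
    counters = Counter()
    for category, erab_count in releases:
        name = CATEGORY_TO_COUNTER.get(category)
        if name is not None:
            counters[name] += erab_count
    return counters
```

Sorting the resulting tally by value gives the cause distribution used later in the cause analysis of the traffic statistics.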
Symptoms of a Service Drop
Symptoms of a Call Drop as Observed in a Drive Test
Huawei test UE and UE Probe, or other commercial UEs and their signaling trace software, are used in a drive test. Symptoms shown by the traffic monitoring software installed on the drive test computer are:
The throughput suddenly falls to a low value or zero.
The UE begins to receive system information when a handover is not complete or when the UE is not in a re-establishment scenario.
Figures: Low throughput; UE receives system information.
Symptoms of a Call Drop as Observed from the Traffic Statistics
The call drop problem of a commercial network is observed from the traffic statistics and is reflected by the call drop rate and call drop count. The symptoms shown by the traffic statistics exported from the M2000 are:
Global call drop rate, call drop count, and number of successful service setups
Call drop rate, call drop count, and time segment of top cells
Top cells occupy a high percentage of call drops
Figures: High global call drop rate; Time segment of call drops
Cause Analysis and Data Processing
Steps in Analyzing a Call Drop Problem (1/2)
Step 1: Determine the scope of the call drop problem:
Analyze the traffic statistics and CHR to determine the scope of the call drop problem, that is, whether it is a top-cell or top-site problem, an entire-network problem, a comprehensive problem, or a top-terminal/top-UE problem.
Note: The analysis method varies for different scenarios. In a scenario of degraded performance after upgrade, you need to compare the differences before and after the upgrade to determine the scope of the degradation. In a scenario of inventory optimization where the call drop performance is below expectation or to be improved, you need to determine the region of performance degradation.
Step 2: Classify the causes of a call drop problem:
Analyze the data sources to classify the causes of a call drop problem.
Steps in Analyzing a Call Drop Problem (2/2)
Step 3: Do as required by the checklist:
Do as required by the checklist to determine the root cause and the closing action.
Note: The checklist is described in the next chapter.
Step 4: Close the problem:
Close the problem and evaluate the result. If the result is unsatisfactory, repeat the preceding steps.
If the closing actions are reproducible, consider the merits of copying the closing actions to the entire network.
Determining the Scope of a Call Drop Problem – Principles of Selecting Top Cells (1/2)
The principles of selecting top cells vary for different scenarios.
Scenario 1: Performance degradation in the time dimension:
The call drop performance degrades after an upgrade, or degrades suddenly due to unknown reasons.
Principles of selecting top cells
Calculate the difference of the counters (call drop rate and number of e-RAB abnormal releases) before and after the upgrade of each cell. Sort the cells by the difference of the call drop rate and the difference of the number of e-RAB abnormal releases to obtain the top cells of degraded call drop rate and top cells of number of e-RAB abnormal releases.
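The selection principle for Scenario 1 can be sketched as below. The input layout (a dictionary per period that maps a cell ID to its call drop rate and abnormal release count) is an assumption for this example; real data would come from the exported traffic statistics.

```python
def top_degraded_cells(before, after, top_n=10):
    """Rank cells by how much their call drop KPIs degraded.

    before / after: dict mapping cell_id -> (drop_rate_percent,
    abnormal_release_count) for the period before and after the upgrade.
    Returns two lists of cell IDs: top cells by drop-rate increase and
    top cells by increase in the number of abnormal releases.
    """
    deltas = []
    # Only cells present in both periods can be compared.
    for cell_id in before.keys() & after.keys():
        rate_delta = after[cell_id][0] - before[cell_id][0]
        count_delta = after[cell_id][1] - before[cell_id][1]
        deltas.append((cell_id, rate_delta, count_delta))

    by_rate = sorted(deltas, key=lambda d: d[1], reverse=True)[:top_n]
    by_count = sorted(deltas, key=lambda d: d[2], reverse=True)[:top_n]
    return [d[0] for d in by_rate], [d[0] for d in by_count]
```

Keeping the two rankings separate is deliberate: a low-traffic cell can top the rate ranking with very few drops, while a busy cell can top the count ranking with a modest rate change.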
Determining the Scope of a Call Drop Problem – Principles of Selecting Top Cells (2/2)
Scenario 2: Performance degradation in an inventory optimization:
The call drop performance of the live network is below expectation and needs to be optimized to the target value.
Principles of selecting top cells
Sort the cells by the difference of the call drop rate and the difference of the number of e-RAB abnormal releases to obtain the top cells of degraded call drop rate and top cells of number of e-RAB abnormal releases.
Determining the Scope of a Call Drop Problem – Criteria (1/2)
Top-cell problem:
After one-fifth of the top cells of high call drop rate and large number of e-RAB abnormal releases are removed from calculation of the entire-network call drop performance, if the performance is significantly improved to the expected value, the call drop problem is defined as a top-cell problem.
Entire-network problem
After one-fifth of the top cells of high call drop rate and large number of e-RAB abnormal releases are removed from calculation of the entire-network call drop performance, if the performance is not significantly improved, the call drop problem is defined as an entire-network problem.
Determining the Scope of a Call Drop Problem – Criteria (2/2)
Comprehensive problem
After one-fifth of the top cells of high call drop rate and large number of e-RAB abnormal releases are removed from calculation of the entire-network call drop performance, if the call drop performance is improved a little to a value slightly below the expected value, the problem is defined as a comprehensive (top-cell plus entire-network) problem.
Top-UE problem
After one-fifth of the top UEs are removed from calculation of the entire-network call drop performance, if the performance is significantly improved to the expected value, the problem is defined as a top-UE problem.
Note: Currently, the CHR of the LTE system provides no information about the terminal type. The terminal type is provided by complaining users or inferred from the symptoms. Due to security concerns, the eNodeB does not provide IMSI information. Therefore, top UEs can be inferred only from the TMSI, not from the IMSI.
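The cell-based criteria above can be sketched as a function that removes one-fifth of the cells and recomputes the network-wide rate. Several points are assumptions made for this example: cells are ranked by abnormal release count only, "slightly below the expected value" is modeled as a hypothetical `near_target_margin` in percentage points, and the input is a list of per-cell counter pairs.

```python
def classify_scope(cells, target_rate, near_target_margin=0.1):
    """Classify a call drop problem as top-cell, comprehensive, or entire-network.

    cells: list of (abnormal_releases, normal_releases) tuples, one per cell.
    target_rate: expected network-wide call drop rate, in percent.
    near_target_margin: hypothetical tolerance (percentage points) used to
    decide that the recomputed rate is only slightly worse than the target.
    """
    def rate(subset):
        abnormal = sum(c[0] for c in subset)
        total = sum(c[0] + c[1] for c in subset)
        return abnormal / total * 100.0 if total else 0.0

    # Simplification: rank by abnormal release count only.
    ranked = sorted(cells, key=lambda c: c[0], reverse=True)
    removed = len(ranked) // 5          # one-fifth of the cells
    remaining = ranked[removed:]

    rate_all = rate(ranked)
    rate_rest = rate(remaining)

    if rate_rest <= target_rate:
        return "top-cell"
    if rate_rest < rate_all and rate_rest <= target_rate + near_target_margin:
        return "comprehensive"
    return "entire-network"
```

The same approach applies to the top-UE criterion if the per-cell tuples are replaced by per-TMSI tuples.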
Classifying the Causes of Call Drop Problems – Obtaining Data Source
After determining the scope of the call drop problem, analyze the following data sources to infer the causes of the problem:
Traffic statistics
Traffic statistics can be obtained from the M2000/PRS. For details, see section 2.3.3 of LTE Service Drop Troubleshooting and Optimization Guide.doc.
Signaling trace on the network side
Signaling trace can be performed on the M2000. For details, see section 2.2.2 of LTE Service Drop Troubleshooting and Optimization Guide.doc.
Drive test data
The drive test data can be obtained by performing a drive test. For details, see section 2.1.3 of LTE Service Drop Troubleshooting and Optimization Guide.doc.
Classifying the Causes of Call Drop Problems – Acquiring Tools
The following table lists the available tools, their usage, and how to acquire them.

Tool Name: TraceViewer
Usage: Plays back signaling messages traced on the LMT.
Acquisition Method: Released together with the product version and integrated in the OfflineTool file package.

Tool Name: Probe
Usage: Installed on Huawei UE and traces signaling, scheduling, and signal quality information.
Acquisition Method: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001099409&colID=ROOTENWEB|CO0000000174

Tool Name: Assistant
Usage: Installed on Huawei UE, counts and analyzes signaling, scheduling, and signal quality information.
Acquisition Method: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001099389&colID=ROOTENWEB|CO0000000174

Tool Name: NIC
Usage: Batch data collection tool.
Acquisition Method: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001468041&colID=ROOTENWEB|CO0000000174

Tool Name: PRS
Usage: Parses and analyzes traffic statistics of the eNodeB.
Acquisition Method: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001430110&colID=ROOTWEB|CO0000000065

Tool Name: OMstar
Usage: Parses and analyzes original traffic statistics and CHR. Compares parameters.
Acquisition Method: http://support.huawei.com/support/pages/editionctrl/catalog/ShowVersionDetail.do?actionFlag=clickNode&node=000001470066&colID=ROOTENWEB|CO0000000174
Classifying the Causes of Call Drop Problems – Interfaces of the Tracing Tools
Figures: Signaling Trace Management interface of the M2000; Huawei UE Probe
Classifying the Causes of Call Drop Problems – Interfaces of the Analysis Tools
Figures: Huawei UE Probe; eNodeB TrafficReview
Classifying the Causes of Call Drop Problems – Identifying Reconfiguration Messages
Identifying the RRC CONNECTION RECONFIGURATION message
Start the Message Browser to view the details of the message.
If the message contains the IE cqiReportConfig, the message is a CQI reconfiguration message.
If the message contains the IE measConfig, the message is a measurement control message.
If the message contains the IE targetPhysCellId, the message is a handover command.
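The three identification rules above can be sketched as a lookup on the IE names found in a decoded message. Representing a message as a set of IE names is an assumption for this example; a real trace tool exposes the decoded message tree instead.

```python
def classify_reconfiguration(ies):
    """Classify one RRC CONNECTION RECONFIGURATION message.

    ies: set of IE names present in the decoded message.
    Returns a list because a single message may carry several IEs.
    """
    kinds = []
    if "targetPhysCellId" in ies:
        kinds.append("handover command")
    if "measConfig" in ies:
        kinds.append("measurement control")
    if "cqiReportConfig" in ies:
        kinds.append("CQI reconfiguration")
    # Anything else (for example SRS or transmission mode changes) is
    # left unclassified by the rules in this document.
    return kinds or ["other reconfiguration"]
```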
Analyzing Traffic Statistics to Obtain Causes of Call Drop Problems
• Trend analysis
Obtain the call drop KPI of the global network for at least one to two weeks, or two weeks before and one week after the upgrade in case an upgrade has been performed. An example is shown in the upper right figure.
• Cause analysis
The counters indicate whether an abnormal release is caused by the Uu interface or cell resource congestion, as shown in the lower left figure.
• Top analysis
Analysis of the traffic statistics can show the top cells and top time segments that have the highest RRC connection setup failure and e-RAB setup failure, as shown in the lower right figure.
Analyzing Signaling Trace to Obtain Causes of a Call Drop
The signaling trace clearly shows the signaling procedure that causes the call drop and is effective for diagnosing problems found during a drive test or reproducible problems. The disadvantages are that the trace must be started before the problem is triggered and that manual analysis is required. The signaling trace cannot be used for irreproducible or small-probability problems.
Standard interface trace (a major means): Obtain top cells and top time segments by analyzing the traffic statistics, start the standard interface trace on the top cells at the top time segments, and check which signaling procedure causes the call drop.
Single-UE global-network trace (a minor means): Query the IMSI of a TMSI from the EPC, then start the global-network trace of this IMSI. This method is effective for ensuring VIP service.
Analyzing Drive Test Data to Obtain Causes of a Call Drop
The advantage of a drive test is that the downlink signal strength, uplink transmit power, bit error rate, and scheduling information can be obtained, depending on the drive test software and UE capability. The disadvantage is that in terms of signaling trace, only the signaling (including the RRC and NAS messages) of the Uu interface is traced. Therefore, it is desirable to combine a drive test with the signaling trace on the eNodeB.
Determine whether a call drop is caused by an uplink or downlink problem.
The drive test can show whether the UE or eNodeB fails to receive the signaling message; the downlink RSRP/SINR obtained from the drive test indicates the downlink channel quality; the uplink transmit power indicates whether the uplink is insufficient.
Determine whether a call drop is caused by the UE.
The UE log shows whether the UE correctly processes the received signaling messages and whether the UE suddenly stops sending data.
Checklist and Deliverables
Checklist for the Entire-Network Problem (1/2)

Standard Action: Preliminary analysis of traffic statistics
Analysis Action: 1. Analyze the traffic statistics to determine the range and cause of the call drop. 2. Analyze the trend of the call drop rate to determine change of the call drop rate.
Deliverables: 1. Distribution of the causes and top causes 2. Actions that affect the call drop rate
Closing Action: 1. Optimize the network according to the top causes of the call drop problem. 2. Describe the actions that affect the call drop rate and the impact.

Standard Action: Version check
Analysis Action: 1. Check whether the eNodeB version is upgraded or a new patch is installed. 2. Check whether the EPC version is upgraded or a new patch is installed.
Deliverables: New and old version numbers
Closing Action: Describe the changes that may affect the call drop rate based on the Release Notes.

Standard Action: Equipment and transmission alarms
Analysis Action: 1. Global alarm check
Deliverables: Critical and major alarms
Closing Action: 1. Analyze the impact of alarms on the call drop rate. 2. Clear the alarms and check whether the call drop KPI is restored.

Standard Action: Parameter configuration check
Analysis Action: 1. Global parameter configuration check 2. Inspection of EPC parameter change
Deliverables: 1. Difference of parameters before and after the upgrade 2. Difference of parameters compared with the baseline 3. Purpose and impact of the change of EPC parameters
Closing Action: 1. Determine whether the parameter change affects the call-drop KPI. 2. Roll back the parameters and check whether the call-drop KPI is restored.
Checklist for the Entire-Network Problem (2/2)

Standard Action: Operation record check
Analysis Action: Check whether batch operations affecting the global network are recorded and whether neighboring cells and PCI are re-planned.
Deliverables: Records of batch operations affecting the global network
Closing Action: Analyze the impact of batch operations on the call drop rate. Determine whether the batch operations can be rolled back.

Standard Action: Neighbor relationship check
Analysis Action: Check for missed configuration of neighbor relationship. Deployment of scattered sites causes incorrect neighbor relationship.
Deliverables: Missed configuration of neighbor relationship
Closing Action: Add neighboring cells that are not configured in the neighbor relationship. Check whether the call drop KPI is restored.

Standard Action: Major event check
Analysis Action: Check for allocation of a large quantity of phone numbers and major activity (such as ceremony, holidays, and games).
Deliverables: 1. Check the terminal type involved in the number allocation, quantity of number allocation, and subscription policy. 2. Determine the range and time segment of the major event.
Closing Action: Check whether the major event is coupled to the deterioration of the call drop rate in the time dimension.

Note: The standard actions of a comprehensive problem (entire-network plus top-cell problem) are a combination of the checklist for the entire-network problem and the checklist for the top-cell problem.
Checklist for the Top-cell Problem (1/2)

Standard Action: Preliminary analysis of traffic statistics of top sites
Analysis Action: 1. Analyze the traffic statistics to determine the range and cause of the call drop. 2. Analyze the trend of the call drop rate to determine change of the call drop rate.
Deliverables: 1. Distribution of the causes and top causes 2. Actions that affect the call drop rate
Closing Action: 1. Optimize the network according to the top causes of the call drop problem. 2. Describe the actions that affect the call drop rate and the impact.

Standard Action: Version check of top sites
Analysis Action: Check whether the eNodeB version is upgraded or a new patch is installed.
Deliverables: New and old version numbers
Closing Action: Describe the changes that may affect the call drop rate based on the Release Notes.

Standard Action: Equipment and transmission alarms of top sites
Analysis Action: Alarm check of top sites
Deliverables: Critical and major alarms
Closing Action: Analyze the impact of alarms on the call drop rate. Clear the alarms and check whether the call drop KPI is restored.

Standard Action: Parameter configuration check of top sites
Analysis Action: Parameter configuration check of top sites
Deliverables: 1. Difference of parameters before and after the upgrade 2. Difference of parameters compared with the baseline
Closing Action: 1. Determine whether the parameter change affects the call-drop KPI. 2. Roll back the parameters and check whether the call-drop KPI is restored.
Checklist for the Top-cell Problem (2/2)

Standard Action: Operation record check of top sites
Analysis Action: Check whether batch operations affecting the global network are recorded and whether neighboring cells and PCI are re-planned.
Deliverables: Records of batch operations affecting the global network
Closing Action: Analyze the impact of batch operations on the call drop rate. Determine whether the batch operations can be rolled back.

Standard Action: Neighbor relationship check of top cells
Analysis Action: Check for missed configuration of neighbor relationship. Scattered site deployment or network optimization leads to incorrect neighbor relationship.
Deliverables: Missed configuration of neighbor relationship
Closing Action: Add neighboring cells that are not configured in the neighbor relationship. Check whether the call drop KPI is restored.

Standard Action: Coverage check of top cells
Analysis Action: Analyze the MCS and CQI contained in the traffic statistics, CHR, and drive test data to check for coverage overlap or weak coverage of the top cells.
Deliverables: Coverage evaluation report of top cells
Closing Action: Perform network optimization to optimize the coverage.

Standard Action: Interference check of top cells
Analysis Action: Analyze the real-time trace data of the top cells to check for inter-modulation interference and external interference.
Deliverables: Interference evaluation report of top cells
Closing Action: Find out and remove the interference.

Standard Action: Major event check
Analysis Action: Check for allocation of a large quantity of phone numbers and major activity (such as ceremony, holidays, and games) in the vicinity of top cells.
Deliverables: 1. Check the terminal type involved in the number allocation, quantity of number allocation, and subscription policy. 2. Determine the range and time segment of the major event.
Closing Action: Check whether the major event is coupled to the deterioration of the call drop rate in the time dimension.
Diagnosing Radio Problems
• Fault Description
If the abnormal release is recorded in the counter L.E-RAB.AbnormRel.Radio, the abnormal release is caused by the Uu interface and occurs in a non-handover scenario.
• Possible Cause
The abnormal release is caused by weak coverage, uplink interference, or an abnormal UE, any of which can lead to the maximum number of RLC retransmissions being reached, out-of-sync, or failure of signaling interactions. For details about diagnosing the interference problem, see LTE RF Channel Check and Troubleshooting Guide.
• Fault Handling Procedure
Analyze the CHR to check whether some top UEs have the highest count.
Analyze the cause values recorded in the CHR.
If the call drop is caused by a factor other than the signaling procedures, analyze the DRB scheduling at layer 2 to determine whether the call drop is caused by weak coverage or interference.
If the call drop is caused by signaling procedures, observe the last ten signaling messages to determine the faulty signaling procedure. Determine whether the fault of the signaling procedure is due to failure to receive or process the signaling messages by either the UE or eNodeB.
The cause values recorded in the CHR are UEM_UECNT_REL_UE_RLC_UNRESTORE_IND, UEM_UECNT_REL_UE_RESYNC_TIMEROUT_REL_CAUSE, UEM_UECNT_REL_UE_RESYNC_DATA_IND_REL_CAUSE, UEM_UECNT_REL_UE_RLF_RECOVER_FAIL_REL_CAUSE, UEM_UECNT_REL_RRC_REEST_SRB1_FAIL, and UEM_UECNT_REL_RB_RECFG_FAIL_RRC_CONN_RECFG_CMP_FAIL.
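The first step of the procedure, finding top UEs among radio-related releases, can be sketched as a simple tally. The record layout, a `(tmsi, cause)` pair per CHR release record, is an assumption for this example; the cause names are the ones listed above.

```python
from collections import Counter

# Radio-related CHR cause values listed in this document.
RADIO_CAUSES = {
    "UEM_UECNT_REL_UE_RLC_UNRESTORE_IND",
    "UEM_UECNT_REL_UE_RESYNC_TIMEROUT_REL_CAUSE",
    "UEM_UECNT_REL_UE_RESYNC_DATA_IND_REL_CAUSE",
    "UEM_UECNT_REL_UE_RLF_RECOVER_FAIL_REL_CAUSE",
    "UEM_UECNT_REL_RRC_REEST_SRB1_FAIL",
    "UEM_UECNT_REL_RB_RECFG_FAIL_RRC_CONN_RECFG_CMP_FAIL",
}


def top_ues_by_radio_drops(chr_records, top_n=5):
    """Return the TMSIs with the most radio-related abnormal releases.

    chr_records: iterable of (tmsi, cause) pairs parsed from the CHR.
    Returns a list of (tmsi, count) pairs, highest count first.
    """
    counts = Counter(
        tmsi for tmsi, cause in chr_records if cause in RADIO_CAUSES
    )
    return counts.most_common(top_n)
```

If one or two TMSIs account for most of the tally, the problem is more likely an abnormal UE than a coverage or interference issue in the cell.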
Diagnosing Handover Failures
• Fault Description
If the abnormal release is recorded in the counter L.E-RAB.AbnormRel.HOFailure, the abnormal release is caused by outgoing handover failure.
• Fault Handling Procedure
Obtain the top cells that have the highest counter L.E-RAB.AbnormRel.HOFailure, analyze the pairs of source and target cells to obtain the top target cells that have the highest failure rate.
Analyze the CHR of the source and target cells to determine whether the handover failure is caused by failure to receive the handover command or random access failure. Examples of the cause values are UEM_UECNT_REL_HO_OUT_X2_REL_BACK_FAIL and UEM_UECNT_REL_HO_OUT_S1_REL_BACK_FAIL.
Optimize the handover parameters and neighbor relationship and check whether the call drop KPI is improved.
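The pair analysis in the first step can be sketched as below. The input layout, one `(source_cell, target_cell, succeeded)` tuple per outgoing handover attempt, and the `min_attempts` filter are assumptions for this example; the filter is there so that pairs with only a handful of attempts do not dominate the ranking.

```python
from collections import defaultdict


def top_failing_pairs(handovers, min_attempts=20, top_n=5):
    """Rank source/target cell pairs by outgoing handover failure rate.

    handovers: iterable of (source_cell, target_cell, succeeded) tuples.
    Returns a list of ((source_cell, target_cell), failure_rate) pairs,
    highest failure rate first.
    """
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for source, target, succeeded in handovers:
        attempts[(source, target)] += 1
        if not succeeded:
            failures[(source, target)] += 1

    rates = [
        (pair, failures[pair] / n)
        for pair, n in attempts.items()
        if n >= min_attempts
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)[:top_n]
```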
Diagnosing the Transmission Network Problem
• Fault Description
If the abnormal release is recorded in the counter L.E-RAB.AbnormRel.TNL, the abnormal release is caused by the transmission network.
• Possible Cause
This call drop is caused by abnormal transmission between the eNodeB and MME, such as an S1 interface break.
• Fault Handling Procedure
Check for alarms about the transmission network. Clear the alarms and check whether the problem of abnormal release is solved.
Observe the M2000 and check whether alarms about the transmission network are recorded in the M2000.
Clear the alarms.
If abnormal releases are still recorded in the counter L.E-RAB.AbnormRel.TNL, collect the logs and submit them to R&D engineers for further analysis.
Diagnosing the Congestion Problem
• Fault Description
If the abnormal release is recorded in the counter L.E-RAB.AbnormRel.Cong, the call drop is caused by resource congestion.
• Possible Cause
This call drop is caused by radio resource congestion, such as exceeding the maximum number of users.
• Fault Handling Procedure
If the long-term congestion of a top cell leads to call drops, a short-term solution is to enable the MLB algorithm or inter-operation to alleviate the load of the local cell. The long-term solution is to expand the capacity.
Enable the MLB algorithm and check whether the congestion problem is alleviated.
Diagnosing MME Faults
• Fault Description
If an abnormal release is recorded in the counter L.E-RAB.AbnormRel.MME, the abnormal release is initiated by the EPC. However, this abnormal release is not recorded in the counter L.E-RAB.AbnormRel.
• Fault Handling Procedure
Analyze the information of the EPC. The cause value recorded in the CHR is UEM_UECNT_REL_MME_CMD. Analyze the last ten signaling messages recorded in the CHR. If these messages show that the problem is not caused by the eNodeB, focus on analysis of the EPC.
Analyze the S1 interface trace of the top cells to obtain the distribution of the cause value. Discuss the analysis result and signaling messages with the EPC engineers.
Deliverables
• Output of the activities in the checklist
• If the front-line engineers fail to solve a difficult problem, collect the following information and submit it to R&D engineers for further analysis:
One-click log (Mandatory)
Standard interface signaling (Mandatory)
Signaling trace of the S1, X2, and Uu interfaces
Network configuration (Mandatory)
Logs of the LMPT and LBBP of the top cells
Topology information, engineering parameters, and configuration files of the top sites
TTI trace (Optional)
IFTS trace and cell trace. These traces generate a large amount of data. Only the data of the top cells and top time segments is collected.
Single-UE trace (Optional)
The single-UE trace is used for in-depth diagnosis of top UEs. The entire-network single-UE trace can be performed by using the IMSI queried from the EPC using the TMSI.
Case Study
Case 1: RRC Reestablishment Failure of a UE
As shown in the upper right figure, the cause value of the abnormal release is RRC_REEST_SRB1_FAIL.
As shown in the middle right figure, this problem occurs repeatedly from 11:51 to 18:49 in cell 0.
As shown in the lower right figure, the TMSI column shows that this problem is contributed by a single UE whose TMSI is C2 B0 B0 40 and the cause value is "Reconfiguration Failure".
As shown in the lower left figure, the message type indicates that this reconfiguration message is not a handover command or measurement control. This message is probably for reconfiguration of the CQI, SRS, or transmission mode (TM). Upon reception of the RRC CONN REESTAB message, the UE does not respond. Therefore, the eNodeB releases the UE in 5s.
Case 2: UE Exception
Analysis of the CHR shows that the cause value of the abnormal release is
RLC_UNRESTORE_IND. This cause value indicates that the maximum number of DRB RLC retransmissions is exceeded.
This problem occurs repeatedly from 10:51 to 13:49 in cell 2.
The TMSI column indicates that this p roblem is contributed by a single UE whose TMSI is C2 7F 20 56.
The last 16 DRB scheduling procedures at a period of 64ms indicate that the
symptoms are similar. The symptoms are that the UE encounters suddenly terminated data transmission shortly after the access. The duration from access to release is tens of seconds to 2 minutes, indicating that the problem is not caused by script test. The a ccess type is MO-DATA, indicating that the u ser is performing a service.
Case 3: Poor Uplink Quality
• As shown in the right figure, the uplink RSRP and SINR received by the eNodeB are poor from the last four 512 ms periods to the last sixteen 64 ms periods: the uplink RSRP is below –135 dBm and the SINR of the SRS and DMRS is below –3 dB, indicating that the service drop is caused by uplink weak coverage.
• As shown in the left figure, from the last four 512 ms periods to the last sixteen 64 ms periods, the uplink RSRP is about –130 dBm but the SINR of the uplink SRS and DMRS is below –3 dB, indicating that the service drop is due to weak coverage caused by weak uplink interference.
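The two readings above can be told apart with a simple rule on the averaged uplink samples. The sketch below is a rough illustration only: the thresholds are taken directly from the two examples (–135 dBm and –3 dB) and are assumptions rather than product defaults, and the function name `classify_uplink` is hypothetical.

```python
WEAK_RSRP_DBM = -135.0  # assumed coverage threshold, from the first example
LOW_SINR_DB = -3.0      # assumed SINR threshold, from both examples

def classify_uplink(rsrp_dbm, sinr_db):
    """Label a series of uplink samples by comparing averages to thresholds."""
    avg_rsrp = sum(rsrp_dbm) / len(rsrp_dbm)
    avg_sinr = sum(sinr_db) / len(sinr_db)
    if avg_sinr >= LOW_SINR_DB:
        return "uplink quality not suspected"
    if avg_rsrp < WEAK_RSRP_DBM:
        return "uplink weak coverage"
    # SINR is low although RSRP is above the weak-coverage threshold.
    return "weak coverage with uplink interference"

# Hypothetical samples resembling the two figures described above.
print(classify_uplink([-136.0, -137.5], [-4.0, -5.0]))
print(classify_uplink([-130.0, -130.5], [-4.0, -3.5]))
```

In practice the samples would come from the last four 512 ms and last sixteen 64 ms measurement periods recorded in the CHR.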
Case 4: Target Cell Reconfiguration Failure
• Release cause
TGT_ENB_RB_RECFG_FAIL is the cause value contained in the RB reconfiguration failure message during a handover.
The symptom is that after the UE is successfully handed over to the target cell, the target eNodeB sends the PATH SWITCH REQ ACK message to the MME and, within 100 ms, sends the UE CONTEXT REL REQ message containing the cause value "unspecified". The lower left figure shows the last ten signaling messages.
• Fault diagnosis
During the handover procedure, the EPC delivers the PATH_SWITCH_ACK message containing a downlink AMBR value that is inconsistent with the downlink AMBR contained in the S1/X2 handover request. Analysis shows that this is a defect of the RR module. The upper-layer control module of the RR module sends the AMBR Update message to the RB module, which determines that there is no need to deliver a reconfiguration message to the UE. Therefore, the RB module returns a null value to the upper-layer control module. However, the upper-layer control module treats this return value as an exception and releases the UE. This problem is solved in eRAN2.1SPC430.
Case 5: Service Drop Caused by Inter-RAT Redirection
• Release cause: Inter-RAT redirection
IRHO_REDIRECTION_TRIGER is the cause value of a release triggered by inter-RAT redirection. In eRAN2.1SPC400/SPH401, this cause value is counted as a call drop, as shown in the following figure.
This problem is solved in eRAN2.1 SPC420, as shown in the right figure.
Case 6: Service Drop Caused by Abnormal Transmission
• On December 11, the service drop rate of the entire network deteriorated for the Tele2 900M, Telenor 900M, and Tele2 2.6G bands, as shown in the following figure.
• Huawei field engineers discussed the problem with the customer and suspected the EPC. However, they received no definite answer.
Case 7: Service Drop Caused by Abnormal Uu Interface
• Release cause
UE_RESYNC_TIMEROUT_REL_CAUSE indicates that the abnormal release is caused by resynchronization upon timeout of the resynchronization timer. The same problem is recorded by the standard interface trace as "Radio Connection With UE Lost".
UE_RLC_UNRESTORE_IND indicates that the abnormal release is caused by restoration failure after the maximum number of RLC retransmissions is exceeded. The same problem is recorded by the standard interface trace as "Radio resources not available".
UE_RESYNC_DATA_IND_REL_CAUSE indicates that the abnormal release is caused by resynchronization triggered by L2 report data. The same problem is recorded by the standard interface trace as "Unspecified".
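When CHR records are compared with standard interface traces, the correspondence listed above can be kept as a small lookup table. This is a minimal sketch that covers only the three cause values named in this case; the helper name `standard_cause` and the fallback text are assumptions for illustration.

```python
# Internal release cause -> cause recorded by the standard interface trace,
# as listed in this case. Other cause values are deliberately not included.
STANDARD_TRACE_CAUSE = {
    "UE_RESYNC_TIMEROUT_REL_CAUSE": "Radio Connection With UE Lost",
    "UE_RLC_UNRESTORE_IND": "Radio resources not available",
    "UE_RESYNC_DATA_IND_REL_CAUSE": "Unspecified",
}

def standard_cause(internal_cause):
    """Return the standard-trace cause, or a marker if it is not listed here."""
    return STANDARD_TRACE_CAUSE.get(internal_cause, "not covered by this case")

print(standard_cause("UE_RLC_UNRESTORE_IND"))
```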
• Cause analysis
The DRB scheduling information for the last four 512 ms periods and the last sixteen 64 ms periods shows that most abnormal releases are caused by suddenly terminated data transmission, possibly because the data card was unplugged or the UE is faulty. The following figure shows the CHR information.
Case 8: RRC Connection Reestablishment Failure
• Release cause (“Radio Connection With UE Lost” recorded in the standard interface trace)
RRC_REEST_SRB1_FAIL indicates failure to restore SRB1 during RRC reestablishment.
The last 10 signaling messages, as shown in the following figure, indicate that after sending the RRC_CONN_REESTAB message, the eNodeB fails to receive the RRC_CONN_REESTAB_CMP message from the UE before the 5s timer on the Uu interface expires.
The L2 scheduling information shows that the UE sends the ACK message upon reception of the RRC_CONN_REESTAB message.
We suspect that the problem is caused by failure of some UEs to send the RRC_CONN_REESTAB_CMP message. Some Samsung UEs have such a problem.
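The check described in this case (RRC_CONN_REESTAB sent, no RRC_CONN_REESTAB_CMP within 5 s) can be sketched against a list of timestamped signaling messages. The input format (time-sorted `(seconds, name)` tuples), the function name `find_reestab_timeouts`, and the other message names in the sample are assumptions for illustration; a reestablishment still unanswered at the end of the trace is also reported, on the assumption that the trace ends with the release.

```python
REESTAB_TIMER_S = 5.0  # Uu-interface wait time stated in this case

def find_reestab_timeouts(messages, timer_s=REESTAB_TIMER_S):
    """Return send times of RRC_CONN_REESTAB messages that were not completed.

    messages: list of (timestamp_seconds, message_name), sorted by time.
    """
    timeouts = []
    pending = None  # send time of the RRC_CONN_REESTAB awaiting completion
    for ts, name in messages:
        # Anything arriving after the timer has expired is too late.
        if pending is not None and ts - pending > timer_s:
            timeouts.append(pending)
            pending = None
        if name == "RRC_CONN_REESTAB":
            pending = ts
        elif name == "RRC_CONN_REESTAB_CMP":
            pending = None
    if pending is not None:
        timeouts.append(pending)  # trace ended without a completion
    return timeouts

# Hypothetical trace: the UE never answers the reestablishment message.
trace = [
    (0.0, "RRC_CONN_REESTAB_REQ"),
    (0.1, "RRC_CONN_REESTAB"),
    (5.2, "RRC_CONN_REL"),
]
print(find_reestab_timeouts(trace))
```

Grouping the reported timeouts by UE model or TMSI would help confirm whether the failures are limited to specific terminals.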
Thank you www.huawei.com