Use Case – Surviving EPC Network Node Failures

Verify your DRA/DSC will protect HSS and PCRF from signaling spikes caused by node outages within the core network.

Scaling up an LTE core network is a major challenge for operators, particularly because of the Diameter signaling load generated by user management, dynamic policy, and charging. Service providers, operators, and original equipment manufacturers (OEMs) are scaling their networks and network elements for the extremely high volume of Diameter signaling traffic expected in the next few years.

Network elements such as the Mobility Management Entity (MME), Policy and Charging Enforcement Function (PCEF), Policy and Charging Rules Function (PCRF), and Home Subscriber Server (HSS), along with the Diameter Routing Agent/Diameter Signaling Controller (DRA/DSC), must be prepared for this traffic volume. Some of these nodes have the difficult task of maintaining subscriber state, one of the most valuable resources for a network operator. Other nodes, such as Online Charging Systems (OCS), will also begin to see high traffic volume as more LTE elements come online. Part of the preparation is devising reasonable high-availability and recovery strategies for the networks and the nodes that make them up.

When a node fails within the network, a signaling storm or spike is created. The DRA/DSC, a critical component of the LTE core network, must sort out this signaling storm, routing traffic to the remaining network nodes until the faulty node is restored to service. One of the DRA/DSC's many responsibilities is balancing Diameter signaling traffic in the Evolved Packet Core (EPC) between clients and the available destination servers.
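The balancing behavior described above can be illustrated with a minimal sketch. It assumes a plain round-robin policy that skips out-of-service servers; the class name, method names, and server names are hypothetical and do not come from dsTest or any DRA product. A real DRA/DSC applies further routing rules (for example, keeping the messages of one session on the same server) that are omitted here.

```python
class RoundRobinRouter:
    """Toy stand-in for a DRA load-balancing decision (hypothetical names)."""

    def __init__(self, servers):
        self.servers = list(servers)          # fixed, ordered server list
        self.in_service = set(self.servers)   # servers currently usable
        self._next = 0                        # index of the next candidate

    def mark_down(self, server):
        """Take a failed server out of rotation."""
        self.in_service.discard(server)

    def mark_up(self, server):
        """Return a restored server to rotation."""
        if server in self.servers:
            self.in_service.add(server)

    def route(self):
        """Return the next in-service server, or raise if none remain."""
        for _ in range(len(self.servers)):
            candidate = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if candidate in self.in_service:
                return candidate
        raise RuntimeError("no in-service destination servers")


router = RoundRobinRouter(["pcrf-1", "pcrf-2", "pcrf-3"])
before = [router.route() for _ in range(3)]   # all servers healthy
router.mark_down("pcrf-2")                    # simulated node failure
during = [router.route() for _ in range(4)]   # traffic avoids pcrf-2
router.mark_up("pcrf-2")                      # node restored to service
after = [router.route() for _ in range(3)]
print(before, during, after)
```

While "pcrf-2" is marked down, its share of requests is spread across the remaining servers, which is why those servers must be sized for the extra load during an outage.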

In this specific Case, a Developing Solutions customer used dsTest to verify that their DRA/DSC is engineered for, and capable of, handling this failure event from the time the node fails until it is back in service.

Case Setup

To verify this objective, we established a test scenario with the following parameters:

  • Two regions consisting of MMEs, PDN Gateways (PGW) and Serving Gateways (SGW);
  • Simulation of Diameter traffic when a PGW/PCEF in a region goes down and subscribers are moved to a new PGW.

A network diagram of the Case is shown in Figure 1:

[Figure 1: Network diagram of the Case]

For this simulation, we established the following criteria:

  • Two million subscribers spread equally between two EPC network regions;
  • PGW in region 1 shuts down;
  • Subscribers must return using a different packet core node (MME, SGW, PGW) but with the same eNodeB.

The message flow for the signaling of a subscriber that must be rerouted (by the DRA) is shown in Figure 2.

[Figure 2: Message flow for a subscriber rerouted by the DRA]

Case Results

dsTest maintained the state of the subscribers as they returned to the network after their sessions were lost. When sessions were lost for the 1 million subscribers in Region 1, dsTest throttled their return to a rate established as safe, based on the known capabilities of the DRA/DSC under test and the remaining in-service network nodes.
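The idea of throttling returning subscribers to a fixed rate can be sketched as follows. This is not how dsTest implements throttling; the function name is hypothetical and the numbers are deliberately small illustrative values. The sketch assumes the simplest possible policy: release at most a fixed number of subscribers in each one-second interval until none remain.

```python
def reattach_batches(total_subscribers, subscribers_per_second):
    """Yield (second, batch_size) pairs that release subscribers at a fixed rate."""
    if subscribers_per_second <= 0:
        raise ValueError("rate must be positive")
    remaining = total_subscribers
    second = 0
    while remaining > 0:
        # Never release more than the configured rate in one interval.
        batch = min(subscribers_per_second, remaining)
        yield second, batch
        remaining -= batch
        second += 1


# Illustrative values only: 10 subscribers released at 4 per second.
schedule = list(reattach_batches(10, 4))
print(schedule)  # [(0, 4), (1, 4), (2, 2)]
```

Capping the batch size keeps the load offered to the DRA/DSC at or below the chosen rate, at the cost of a longer recovery window for the affected subscribers.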

Because the rate at which subscribers reconnect depends on how quickly they can get through the new MME, we chose 30K transactions per second (TPS), based on customer requirements. This equates to 15K subscribers attempting to authenticate each second. In addition to the 30K TPS generated by the MME, downstream traffic for the Gx sessions and the related cancel messages from the HSS add further load to the network under test.
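The figures above can be checked with a short calculation. The inputs (30K TPS, 15K subscribers per second, 1 million Region 1 subscribers) come from this Case; the recovery time is our own estimate, assuming the rate stays constant and every Region 1 subscriber returns. The Gx and HSS cancel traffic is left out because no figure is given for it.

```python
import math

MME_TPS = 30_000                      # MME-generated transactions per second
AUTH_SUBSCRIBERS_PER_SEC = 15_000     # subscribers authenticating per second
REGION_1_SUBSCRIBERS = 1_000_000      # subscribers who lost their sessions

# Transactions each returning subscriber generates through the MME.
transactions_per_subscriber = MME_TPS / AUTH_SUBSCRIBERS_PER_SEC

# Time for all Region 1 subscribers to return at a constant rate (assumption).
recovery_seconds = REGION_1_SUBSCRIBERS / AUTH_SUBSCRIBERS_PER_SEC

# Total MME-generated transactions over the whole recovery window.
total_mme_transactions = REGION_1_SUBSCRIBERS * transactions_per_subscriber

print(f"transactions per subscriber: {transactions_per_subscriber:.0f}")
print(f"estimated recovery time: about {math.ceil(recovery_seconds)} seconds")
print(f"total MME transactions: {total_mme_transactions:,.0f}")
```

Under these assumptions each subscriber accounts for two MME transactions, and the full Region 1 population returns in a little over a minute.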