Decision Support Systems 73 (2015) 57–73
Contents lists available at ScienceDirect
Decision Support Systems
journal homepage: www.elsevier.com/locate/dss

Early detection of network element outages based on customer trouble calls

Željko Deljac a,⁎, Mirko Randić b, Gordan Krčelić a
a T-Hrvatski Telekom (T-HT), Technical Functions, Savska 32, Zagreb, Croatia
b Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia

⁎ Corresponding author. Tel.: +385 1 376 49 80.
E-mail address: email@example.com (Ž. Deljac).
http://dx.doi.org/10.1016/j.dss.2015.02.014
0167-9236/© 2015 Elsevier B.V. All rights reserved.

Article info

Article history:
Received 13 February 2014
Received in revised form 14 January 2015
Accepted 22 February 2015
Available online 3 March 2015

Keywords:
Early fault detection
Fault detection delay

Abstract

This paper deals with the issue of early detection of network element outages. Timeliness of outage detection as well as accuracy in finding outages on equipment in a telecommunication network depend on the monitoring system used and its performance. The intent of this paper is to investigate and propose a complementary solution to improve the performance of the existing systems in detecting faults earlier than it was able to do before. In developing our approach two constraints are given. The existing operational environment cannot be changed; threshold tuning and parameter changing cannot be done; furthermore no additional infrastructure investment has been planned. Hence, our approach relies on an alternative method based on a two-stage hybrid statistical and diagnostic detector which we designed in a way that exploits additional available data and avoids alarm monitoring system imperfections. The role of this detector is twofold: early detection of network element outages based on customer trouble calls and rule-based decision making for faulty-element isolation based on knowledge derived from fault and network management data. In this paper we present results of statistical analysis of trouble-reporting data. The analysis showed that the timing of customers' trouble reports and their content have information potential that can be utilized for early detection of outages. The detector is explained in detail and its accuracy and reduction delay is evaluated. The method presented can reduce the outage detection delay time by 2.33 h on average observed in relation to the performance of an existing fault management process which was designed to detect outages solely on the basis of an alarm monitoring system, for the "difficulties in work" type of malfunction. We attained an overall probability of correct detection of 95.3%. Out of the total number of outages that hypothetically could be detected, by using this method we were able to detect 77.5% of cases 1 h before the alarm was raised in the existing alarm system, while 23% of cases were detected 4 h before the actual alarm. The approach has been tested on real telecommunication network data over the period of one year.

1. Introduction

Broadband networks contain a multitude of hardware and software components in different locations that can be subject to fault. A typical broadband network (Fig. 1) consists of three main parts. The IP/MPLS core part (1) is based on Multiprotocol Label Switching (MPLS) technology. In addition, there are head-end servers that provide services to users such as: Internet access, access to video services, Internet Protocol TeleVision (IPTV), Video On Demand (VOD) and Voice Over Internet Protocol (VOIP) telephone services. In the access part of the network (2), Digital Subscriber Line Access Multiplexer (DSLAM) architecture is used to link customer traffic over an ADSL port to the Ethernet aggregation. The physical link between the subscriber and the DSLAM port is a twisted copper pair in subscriber cable, the most common kind of access. The end point of the access part is a distribution point behind which the customer installation begins. The third part (3) includes network termination equipment (ADSL modem, Splitter), other CPEs (IPTV STB, TV, handset and other devices) and in-house customer installations. This part of the network is spatially most extensive.

From the perspective of broadband network management, faults can be categorized as those that affect services offered only to one particular customer and those that affect a whole group of customers. For example, a network element outage mostly degrades or disables services offered to a larger number of customers. As a result, more people start to report failures. After repair of this kind of fault, problems related to services of the whole group disappear. A fault management system is designed to record data related to all detected and resolved faults as well as reported failures. Most of the data refer to alarms that are used to detect and diagnose faults. Generally, all active network elements throw alarms, but there are passive pieces of equipment in the network incapable of producing alarms. Therefore, not all the network is under the real-time supervision of the fault management system. Data about alarms is entered into the database automatically while other data (e.g. about failures reported by customers) is entered by technicians during the resolving process. Accordingly, databases give an accurate insight into faults, causes of faults and failures that have been reported.
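The abstract's central idea, that a burst of customer trouble calls tied to one network element can reveal an outage before an alarm is raised, can be illustrated with a small sketch. This is not the authors' two-stage detector, which is not specified in this section. It assumes, purely for illustration, that trouble calls per element follow a Poisson process with a known baseline rate, and it flags an element once the number of calls in a sliding window becomes improbable under that baseline. The names `poisson_tail`, `detect_bursts`, `baseline_rate`, `window_h` and `alpha`, and the sample data, are hypothetical.

```python
import math
from collections import defaultdict


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def detect_bursts(calls, baseline_rate, window_h=1.0, alpha=0.001):
    """Flag elements whose trouble-call count in a sliding window is improbable.

    calls: iterable of (time_in_hours, element_id), sorted by time.
    baseline_rate: assumed normal trouble calls per hour per element.
    Returns {element_id: time of the call that triggered the flag}.
    """
    expected = baseline_rate * window_h
    recent = defaultdict(list)  # element_id -> call times inside the window
    flagged = {}
    for t, element in calls:
        if element in flagged:
            continue
        # Keep only calls that still fall inside the window, then add this one.
        window = [u for u in recent[element] if t - u < window_h] + [t]
        recent[element] = window
        if poisson_tail(len(window), expected) < alpha:
            flagged[element] = t
    return flagged


# Hypothetical data: element "A" gets four calls within half an hour,
# element "B" gets two calls almost five hours apart.
calls = [(0.0, "A"), (0.1, "B"), (0.2, "A"), (0.3, "A"), (0.4, "A"), (5.0, "B")]
print(detect_bursts(calls, baseline_rate=0.2))
```

With these assumed values only element "A" is flagged, on its fourth call. A real deployment would have to estimate the baseline rate per element and time of day from historical trouble-report data, and would still need a separate step to isolate the faulty element.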
In earlier work we presented results of an analysis of operational data from the broadband network of the Croatian telecom T-HT. That analysis yielded the following distribution of faults by location. The majority of faults, 70.86%, occur in the customer part of the network (part 3); 26.53% occur in the access part (part 2); and the remaining 2.61% occur in the core part (part 1). Devices in part 1 are mostly duplicated, which provides redundancy and fault tolerance. By contrast, devices in parts 2 and 3 have no redundancy, so each fault in these parts causes a service failure. Early detection and diagnosis of network element outages in the access part (part 2) of a broadband network is the subject of the research presented in this paper.
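As a quick check on the fault distribution quoted above, the following sketch tabulates the three shares and confirms that they sum to 100%. It also computes the combined share of the two parts that have no redundancy. The dictionary keys and variable names are illustrative, and `Decimal` is used only to avoid floating-point rounding in the sums.

```python
from decimal import Decimal

# Fault shares by network part, in percent, as quoted in the text.
fault_share = {
    "core (part 1)": Decimal("2.61"),
    "access (part 2)": Decimal("26.53"),
    "customer (part 3)": Decimal("70.86"),
}

total = sum(fault_share.values())
# Parts 2 and 3 have no redundancy, so every fault there causes a service failure.
non_redundant = fault_share["access (part 2)"] + fault_share["customer (part 3)"]

print(f"total: {total}%")                        # total: 100.00%
print(f"parts without redundancy: {non_redundant}%")  # 97.39%
```

Under these figures, 97.39% of recorded faults fall in parts of the network where a fault translates directly into a service failure, and the access part studied in this paper accounts for roughly a quarter of all faults.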