Reliability Engineering and System Safety 94 (2009) 1618–1628
Reliability Engineering and System Safety journal homepage: www.elsevier.com/locate/ress
A practical procedure for the selection of time-to-failure models based on the assessment of trends in maintenance data
D.M. Louit a, R. Pascual b, A.K.S. Jardine c
a Komatsu Chile, Av. Americo Vespucio 0631, Quilicura, Santiago, Chile
b Centro de Minería, Pontificia Universidad Católica de Chile, Av. Vicuña Mackenna 4860, Santiago, Chile
c Department of Mechanical and Industrial Engineering, University of Toronto, 5 King’s College Road, Toronto, Ont., Canada M5S 3G8
Article info
Article history: Received 24 April 2008; received in revised form 6 April 2009; accepted 10 April 2009; available online 18 April 2009
Keywords: Trend testing; Time to failure; Model selection; Repairable systems; NHPP
Abstract
Many times, reliability studies rely on false premises such as the assumption of independent and identically distributed times between failures (renewal process). This can lead to erroneous model selection for the time to failure of a particular component or system, which can in turn lead to wrong conclusions and decisions. A strong statistical focus, the lack of a systematic approach and sometimes inadequate theoretical background seem to have made it difficult for maintenance analysts to adopt the necessary stage of data testing before the selection of a suitable model. In this paper, a framework for model selection to represent the failure process for a component or system is presented, based on a review of available trend tests. The paper focuses only on single-time-variable models and is primarily directed to analysts responsible for reliability analyses in an industrial maintenance environment. The model selection framework is directed towards the discrimination between the use of statistical distributions to represent the time to failure (“renewal approach”) and the use of stochastic point processes (“repairable systems approach”), when system ageing or reliability growth may be present. An illustrative example based on failure data from a fleet of backhoes is included. © 2009 Elsevier Ltd. All rights reserved.
1. Introduction
As described by Dekker and Scarf [1], maintenance optimization consists of mathematical models aimed at finding balances between the costs and benefits of maintenance, or the most appropriate moment to execute maintenance. Many times, these models are fairly complex and maintenance analysts have been slow to apply them, since data are often scarce or, due to a lack of statistical theoretical knowledge, models are very difficult to implement correctly in an industrial setting. Other, more qualitative techniques such as reliability centered maintenance (RCM) or total productive maintenance (TPM) have then played an important role in maintenance optimization. Nevertheless, data analysis and statistical modeling are definitely very valuable tools engineers can employ to optimize the maintenance of assets under their supervision. Acknowledging that many reliability studies or maintenance optimization programs do not require sophisticated statistical inputs, Ansell and Phillips [2] reinforce that even at a basic level, we should always be critical of the analysis and ask whether a technique is appropriate.
Corresponding author. E-mail address: [email protected] (D.M. Louit).
0951-8320/$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.ress.2009.04.001
The gap between researchers and practitioners of maintenance has resulted in the fact that, although many models rely on very specific assumptions for their proper application, these are not normally discriminated by the practitioner according to the real operating conditions of their plants or fleets, i.e. real-world data [3,22,43]. O’Connor (cited in [2]) points out that much reliability analysis is done under false premises such as independence of components, constant failure rates, identically distributed variables, etc. As critical constituents of any reliability analysis, time-to-failure models are not exempt from this situation; thus conventional time-to-failure analysis techniques are often adopted when they are, in fact, not appropriate. The aim of this paper is to provide practitioners with a review of techniques useful for the selection of a suitable time-to-failure model, specifically looking at the case when the standard use of statistical distributions is useless, given the presence of long-term trends in the maintenance failure data. The paper focuses on the selection of single-time-variable models, since they are the most commonly applied in practice, rather than on more complex multivariate models such as the proportional hazards model, which have also shown great value in their application to maintenance and reliability [3]. The above does not imply that we propose that time-to-failure models should be the center of attention in a reliability improvement study; on the contrary, they should only act as a
Fig. 1. General framework in a reliability improvement study (modified from [2]). Boxes, in order: Objectives; Maintenance data; Model selection, failure process; Optimization model; Solution.
tool for the objective the engineers assign to it. Actually, the logical priority is that of objective, data and, finally, model selection (as shown in Fig. 1). In other words, as suggested by Ansell and Phillips [4] an analysis should be problem led rather than technique or model centered. Nevertheless, correct assessment of the failure process and of time to failure is usually of critical importance to the (posterior) economic analysis required to find an optimal solution to the problem that originated the analysis. Discussion of approaches to maintenance and reliability optimization and models mixing reliability and economics can be found in several references, for example [5–7]. When dealing with reliability field data, frequently some practical problems such as the unavailability of large sets of data occur. This paper will briefly touch on this and other problems, as they are relevant to the discussion of model selection techniques. This document is structured as follows. Section 2 refers to common practical problems found in the analysis of reliability data. Section 3 describes the concept of repairable systems and identifies some of the models available for their representation. Section 4 presents a series of graphical and analytical tests used to determine the existence of trends in the data. Section 5 proposes a procedure based on these tests to correctly select a time-to-failure model, discriminating between a renewal approach and the use of an alternative, non-stationary model, such as the non-homogeneous Poisson process (NHPP). Section 6 presents numerical examples using data coming from a fleet of backhoes. Finally, Section 7 contains a summary of the paper.
2. Some practical problems in reliability data
2.1. Scarce data
One major problem associated with reliability data is, ironically, the lack of sufficient data to properly run statistical analyses, as many authors have repeatedly mentioned. In fact, Bendell [8] points out that all statistical methodologies are limited when based on small data sets, since the amount of information contained in such sets is by nature small. Furthermore, as mentioned in [9], empirical evidence indicates that sets of failure times typically contain ten or fewer observations, which emphasizes the need to develop methods to deal adequately with small data sets (naturally, the larger the data set, the more precise the statistical analysis). Also, many data sets are collected for maintenance management rather than for reliability purposes. Hence the information content is often very poor and can be misleading without careful scrutiny of the material and cleaning if necessary.
Thomas, a discussant of Ansell and Phillips [10], states that the problem of insufficient data will never disappear: since the aim of maintenance is to make failures rare events, one could expect that as maintenance improves, fewer failures should occur. Thus the solution, he says, is based on better a priori modeling. Bayesian techniques, directed to incorporate into the models all the prior information available, are useful in this case. Barlow and Proschan [11], Lindley and Singpurwalla [12], Singpurwalla [13], Walls and Quigley [44] and Guikema and Paté-Cornell [45] are relevant sources for Bayesian methods in reliability. Although it is beyond the scope of this paper, it is valuable to mention Bayesian analysis, because it provides a means to reach optimal decisions, using standard models such as the ones discussed later in this paper, when lack of data is a problem. In simple words, and as described in [14], Bayesian estimation methods incorporate information beyond that contained in the data, such as:
• previous system estimates;
• generic information coming from actual data from similar systems;
• generic information from reliability sources; and
• expert judgment and belief.
This information is converted into a prior distribution, which is then updated using new data gathered during operation into posterior distributions representing the failure process of the component or system of interest. Scarf [3] warns that the expert judgment often comes from the same people responsible for the maintenance actions; thus prior distributions may reflect current practice rather than the real underlying failure process, and special care should be taken when attempting to use a Bayesian approach.
2.2. Data censoring
Another common practical problem in a reliability study is the presence of censored data. A censored observation corresponds to an incomplete time to failure or to a non-failure event, but this does not mean that it contains no relevant information for the reliability modeler. Censoring can usually be classified as right, left or interval censoring. Truncation, which is commonly confused with censoring, may also be a practical problem in some data sets. An example of truncation is the case when the time to failure can only be registered if it lies within a certain observation window (failures that occurred outside this interval are not observed, thus no information is available to the modeler). Another situation that may arise is when data collection begins at a specified moment in time and the operating time of the items under analysis before the start of the monitoring period is unknown. The monitored life to failure of a component under observation can then be called residual life. If time-to-failure data are found to be suitable for representation by a renewal process (RP), a statistical distribution can be fitted to times between failures (see Section 3).
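As a minimal illustration of why a right-censored observation still carries information, the following Python sketch estimates a constant failure rate from a mix of complete and suspended lives. It assumes an exponential time-to-failure model under the renewal assumption, for which the maximum likelihood estimate is the number of failures divided by the total accumulated time. The function name and the data are hypothetical and are not taken from the paper.

```python
def exponential_rate_mle(durations, is_failure):
    """Maximum likelihood estimate of a constant failure rate.

    durations  -- observed times (to failure, or to suspension if censored)
    is_failure -- True where the duration ended in a failure, False if right-censored
    Assumes exponentially distributed, independent times to failure.
    """
    if len(durations) != len(is_failure):
        raise ValueError("durations and is_failure must have the same length")
    n_failures = sum(1 for flag in is_failure if flag)
    if n_failures == 0:
        raise ValueError("at least one failure is needed for a finite estimate")
    # Censored items contribute operating time but no failure event.
    return n_failures / sum(durations)


# Hypothetical data (hours): three failures and two suspensions.
durations = [120.0, 340.0, 95.0, 410.0, 260.0]
is_failure = [True, False, True, True, False]

rate_all = exponential_rate_mle(durations, is_failure)
# Discarding the suspensions ignores 600 h of failure-free operation.
failures_only = [d for d, f in zip(durations, is_failure) if f]
rate_failures_only = exponential_rate_mle(failures_only, [True] * len(failures_only))

mean_life_all = 1.0 / rate_all
mean_life_failures_only = 1.0 / rate_failures_only
```

With these illustrative numbers, dropping the suspended items roughly halves the estimated mean life, which is why censored observations should be retained in the analysis rather than discarded.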
There are well-known techniques to determine parameters for many distributions in the presence of these censoring types. Detailed descriptions of such techniques can be found in [15–18], among others. Tsang and Jardine [19] propose a methodology for the estimation of the parameters of a Weibull distribution using residual-life data.
2.3. Combining data
A valid alternative when data are scarce is the combination or pooling of data from similar pieces of equipment. This is a
normally used procedure in reliability analysis, especially in operations where a large number of identical systems are utilized, such as fleets of mobile equipment or parallel production lines. Stamatelatos et al. [14] provide a check list of coupling factors for common cause failure events (one triggering cause originates several failures), found to be helpful in the definition of conditions needed for proper data pooling, as usually these same conditions are met by equipment subject to data pooling:
• same design;
• same hardware;
• same function;
• same installation, maintenance or operations people (and conditions);
• same procedures;
• same system–component interface;
• same location; and
• same environment.
The word “same” in the list above could be replaced by “similar” in many cases, as engineers should use their judgment and experience in assessing the similarities between two or more items before they are combined for analysis. If a more rigorous analysis is needed, in the presence of two or more samples of data from possibly different populations, various statistical methods can be used to determine whether there are significant differences between two populations (the “two-sample” problem), even in heavily censored data sets [20,21].
2.4. Effect of repair actions
Usually in practice, components can be repaired or adjusted, rather than replaced, whenever a breakdown occurs. These interventions (here referred to as “repair actions”) are likely to modify the hazard rate of the component; so it can be argued that the expected time to failure after an intervention takes place is different from the expected time to the first failure of a new component. But the most common approach to reliability assessment does not take this into account, as time to failure is modeled using statistical distributions assumed to be valid for every failure of the component or system (first, second, third, etc.). This is called the “as good as new” or renewal assumption. In fact, in the reliability literature, as Ascher and Feingold [22] notice, there is “a fixation in mortality, or rather, its equivalent in reliability terms, time to failure in a non-repairable item or time to first failure of a repairable system”. In order to take repair actions into account (when they effectively affect the behavior of the component or system under study), a so-called “repairable systems approach” has to be adopted.
3. Repairable systems
A non-repairable system is one which, when it fails, is discarded (as repair is physically not feasible or non-economical). The reliability figure of interest is, then, the survival probability. The times between failures of a non-repairable system are independent and identically distributed, iid [23]. This is the most common assumption made when analyzing time-to-failure data, but as many authors mention, it might be unrealistic in some situations. Many examples have been given of systems that rather than being discarded (and replaced) on failure, are repaired. In this case, the usual non-repairable methodologies (statistical
distribution fitting such as Weibull analysis, for instance) simply cannot be appropriate [24]. Repairable systems, on the other hand, are those that can be restored to their fully operational capabilities by any method, other than the replacement of the entire system [22]. In this sense, reliability is interpreted as the probability of not failing for a particular period. This analysis does not assume that times between failures are independent or identically distributed. When dealing with repairable systems, reliability is not modeled in terms of statistical distributions, but using stochastic point processes. The number of failures in an interval of time can be represented through a stochastic point process. Furthermore, in this case the point process can be interpreted as a counting process, and what it counts is the number of events (failures) in a certain time interval. In reliability analysis, such a process is said to be time truncated when it stops counting at a particular instant. It is called failure truncated when it stops counting when a certain number of failures is reached. The five main stochastic process models applied to modeling of repairable systems are [22]
• the renewal process (RP);
• the homogeneous Poisson process (HPP);
• the branching Poisson process (BPP);
• the superposed renewal process (SRP); and
• the non-homogeneous Poisson process (NHPP).
The RP assumes that the system is returned to an “as new” condition every time it is repaired, so that it actually converts time between failures into time to first failure of a new system or, in other words, leads to a “non-repairable system approach”, in which time to failure can be modeled by a statistical distribution and the iid assumption is valid. The HPP is a special case of the RP that assumes that times between failures are independent and identically exponentially distributed, so the iid assumption is also valid and the time to failure is described by an exponential distribution (constant hazard rate).
The BPP is used to represent time-to-failure data that can be assumed to be identically distributed, but not independent. As Ascher and Feingold [22] mention, this process is applicable when a primary failure (or a sequence of primary failures having iid times to failure) can trigger one or more subsidiary failures; thus there is dependence between the subsidiary failures and the occurrence of the primary, triggering failure. Very few practical applications of this model are found in the literature. A thorough description of the BPP and its application to the study of repairable systems can be found in [25].
The SRP is a process derived from the combination of various independent RPs, and in general it is not an RP. For example, think of a set of parts within a system that are discarded and replaced every time they fail, independently. Each part can be modeled as an RP, and then the system would be modeled using an SRP. But as a possibility exists to investigate the times between failures for the system as a whole, the question of whether this approach is justified or not arises [4].
In addition, the superposition of independent RPs converges to a Poisson process (possibly non-homogeneous) when the number of superimposed processes grows (by the theorem of Grigelionis, see [26]).
Since the RP and the HPP are equivalent to the regular methods for non-repairable items, and the BPP and SRP have either not been widely applied or, in the case of the latter, can be approximated by an NHPP (with a relatively large number of constituent processes), they will not be described in greater detail here. These models are covered in detail by Ascher and Feingold [22].
When the repair or substitution of the failed part in a complex system does not involve a significant modification of the reliability of the equipment as a result of the repair action, the NHPP is able to correctly describe the failure-repair process. Then, the NHPP can be interpreted as a minimal repair model [27]. Note that for (i) hazardous maintenance, i.e. when the condition of the equipment is worse after repair than it was before failure, or for (ii) imperfect repair, i.e. when reliability after repair is better than just before failure, but not as good as new, other models have been proposed. These models are even more flexible than the NHPP, as they allow for better representation of imperfect repair scenarios (see, e.g. [28,29], among others). Nevertheless, we concentrate on the NHPP given its simplicity, along with the following reasons (as listed by Coetzee [30]):
i. it is generally suitable for the purpose of modeling data with a trend, due to the fact that the accepted formats of the NHPP are monotonically increasing/decreasing functions;
ii. NHPP models are mathematically straightforward and their theoretical base is well developed;
iii. models have been tested fairly well, and several examples are available in the literature for their application.
Under the NHPP, times between successive failures are neither independent nor identically distributed, which makes this model the most important and widely used in the modeling of repairable systems data [31,32]. Actually, whenever a trend is found to be present in time between failures data, a non-stationary model such as the NHPP is mandatory, and the regular distribution fitting methods are not valid.
The next section of the paper reviews a series of trend-testing techniques found very helpful for model selection purposes, focusing on the discrimination between a renewal approach and the need for an alternative model, such as the NHPP. Should the reader decide to pursue the modeling of times to failure using other non-stationary models (i.e. imperfect repair models), the techniques presented in this paper are equally valuable to establish the existence of trends in the data, which justify the decision of not using the standard distribution fitting methods such as Weibull analysis.
Fig. 2 shows three theoretical situations that may occur in practice, in relation to the time to failure of a particular system. From the figure, it can be noticed that the three data sets generated from systems A–C are very different. For A, no clear trend can be observed from the failure data, thus an iid assumption may be made, since time between failures is apparently independent of the age of the equipment. For B and C, however, a trend is clearly present; for B a decrease in reliability is evident, while for C reliability growth is taking place. Whenever these latter situations occur, and there is significant evidence of an ageing process taking place, the usual RP approach has to be disregarded, and an alternative non-stationary approach would usually be used to model time between failures for the system. Note that “equipment age” refers to the age of the system under analysis, measured from the moment it was first put into operation, as opposed to the time elapsed since the last repair (which is significant for an RP).
3.1. The non-homogeneous Poisson process
(Fig. 2 legend: in each of the diagrams, a marker represents the occurrence of a failure.)
In practical terms, the NHPP permits the modeling of trend in the number of failures to be found in an interval in relation to the total age of the system, through the intensity function. Two popular parameterizations for the intensity function of an NHPP are the power law intensity (also called Weibull intensity) and the log-linear intensity. The power law intensity gives its name to the power law process (NHPP with Weibull intensity) and is given by
$$\lambda(t) = \eta \beta t^{\beta - 1}, \tag{1}$$
with $\eta, \beta > 0$ and $t \geq 0$. The log-linear intensity function has the form
$$\lambda(t) = e^{\alpha + \beta t}, \tag{2}$$
with $-\infty < \alpha, \beta < \infty$ and $t \geq 0$. Several practical examples reviewed show that the power-law process is preferred, because of its similarities with the Weibull distribution fitting method used regularly for non-repairable data. Actually, $\eta$ is the scale parameter and $\beta$ the shape parameter, and the intensity function is of the same form as the failure rate of a Weibull distribution. Details concerning the fitting of power-law and log-linear intensity functions to data from water pumps in a nuclear plant are discussed in [23]. A practical example using jet engine data combined from several pieces of equipment is found in [24]. Ascher and Kobbacy [33], and Baker [34] also provide applications of both log-linear and power-law processes.
When using an NHPP, engineers could imagine it as a two-stage problem: the first (say, inner stage) relates to the fitting of an intensity function to data and the second (outer stage) uses the cumulative intensity function to estimate reliability (or probability of failure) for the system. When trends are not present and data can be assumed to be iid, only one stage is needed, where a time-to-failure distribution is fitted directly to data and reliability (or probability of failure) estimates are obtained from it.
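The outer stage described above can be sketched in Python as follows. The sketch assumes that the intensity parameters have already been fitted (the inner stage is not shown) and uses the standard NHPP result that the probability of no failure in an interval equals the exponential of minus the expected number of failures in that interval. All function names and parameter values are hypothetical illustrations, not part of the paper.

```python
import math


def power_law_intensity(t, eta, beta):
    """Power-law (Weibull) intensity of Eq. (1); requires t > 0 when beta < 1."""
    return eta * beta * t ** (beta - 1.0)


def power_law_cumulative(t, eta, beta):
    """Integral of Eq. (1) from 0 to t, i.e. eta * t**beta."""
    return eta * t ** beta


def log_linear_intensity(t, alpha, beta):
    """Log-linear intensity of Eq. (2)."""
    return math.exp(alpha + beta * t)


def log_linear_cumulative(t, alpha, beta):
    """Integral of Eq. (2) from 0 to t; beta == 0 reduces to a constant rate."""
    if beta == 0.0:
        return math.exp(alpha) * t
    return (math.exp(alpha + beta * t) - math.exp(alpha)) / beta


def nhpp_reliability(cumulative, age, mission, *params):
    """Probability of no failure in (age, age + mission] under an NHPP."""
    expected_failures = cumulative(age + mission, *params) - cumulative(age, *params)
    return math.exp(-expected_failures)


# Hypothetical fitted parameters for a deteriorating system (beta > 1).
eta, beta = 0.001, 1.5
r_young = nhpp_reliability(power_law_cumulative, 100.0, 50.0, eta, beta)
r_old = nhpp_reliability(power_law_cumulative, 1000.0, 50.0, eta, beta)
```

For the same 50-hour mission, the reliability computed at an age of 1000 hours is lower than at 100 hours, which is the dependence on system age that a renewal model cannot represent. With $\beta = 1$ the power-law result no longer depends on age and the process reduces to an HPP.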
4. Trend testing techniques
Fig. 2. Possible trends in time between failures (horizontal axis: age).
Given that a clear need for trend testing in time between failures data is identified, a first step in model selection should be that of assessing the existence of trend or time dependency in the data. Several techniques accomplish this task, and a selection of them is described here. Before presenting the testing techniques, it is of great importance to identify the possible trends one may encounter when analyzing reliability data. A trend in the pattern of failures can be monotonic or non-monotonic. In the case of a monotonic trend (such as the ones shown in Fig. 2), the system is said to be
improving (or “happy system”) if the time between arrivals tends to get longer (decreasing trend in number of failures); and it is said to be deteriorating (or “sad system”) if the times tend to get shorter (increasing trend in number of failures). Non-monotonic trends are said to occur when trends change with time or repeat themselves in cycles. One common form of non-monotonic trend is the bath-tub shape trend, in which time between failures increases in the beginning of the equipment life, then tends to be stable for a period and decreases at the end. It should be remembered, when testing for trend, that the choice of the time scale could have an impact on the pattern of failures; so special attention has to be given to the selection of the time unit (calendar hours, operating hours, production throughput, etc. [31]).
4.1. Graphical methods
4.1.1. Cumulative failures vs. time plot
The simplest method for trend testing is a plot of the cumulative failures against time for the system observed (Newton, in Ansell and Phillips [4]). When a linear plot results, data can be assumed to have no trend and the same distribution for time between failures is accepted. Fig. 3 shows the generic plots expected. Plot A clearly shows the existence of a trend in the data, while plot B shows no evidence of trend. Sometimes, a curve like plot C occurs, where instead of a smooth trend, two or more straight lines can be plotted. This may be the consequence of changes in the maintenance policy or changes in the operational conditions of the equipment, for example, dividing the failure behavior of the system into two or more clearly different periods. When this situation arises, one alternative is to discard data not representative of the current situation; thus a no-trend plot would result for the most recent data set, and a renewal assumption could then be made. When a plot like D occurs, a non-monotonic trend may be present in the data set.
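The points of such a plot are easy to build from recorded times between failures. The Python sketch below returns the (failure time, cumulative count) pairs for a single system and, as a rough numerical companion to the visual check, counts failures in equal-width windows of the observation period. The function names and the data are hypothetical, and the observation period is assumed to end at the last failure.

```python
from itertools import accumulate


def cumulative_failure_points(interarrival_times):
    """Return (failure_time, cumulative_failures) pairs for a single system."""
    failure_times = accumulate(interarrival_times)
    return [(t, count) for count, t in enumerate(failure_times, start=1)]


def failures_per_window(failure_times, total_time, n_windows):
    """Count failures in n_windows equal-width windows covering [0, total_time]."""
    width = total_time / n_windows
    counts = [0] * n_windows
    for t in failure_times:
        # A failure exactly at total_time is assigned to the last window.
        index = min(int(t / width), n_windows - 1)
        counts[index] += 1
    return counts


# Hypothetical times between failures (hours) that get shorter with age.
interarrival = [100, 90, 80, 50, 40, 30, 20, 10]
points = cumulative_failure_points(interarrival)
times = [t for t, _ in points]
counts = failures_per_window(times, total_time=times[-1], n_windows=3)
```

Plotting `points` for this data set gives a curve that bends upwards, and the window counts grow over time, both consistent with a deteriorating system. Roughly equal counts per window would correspond to the straight-line, no-trend case.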
This kind of test is very simple to perform, does not require any calculations and is very powerful when there are strong trends in the data. When slight trends are present, this solution may not be enough and an analytical test should be performed. A weakness of this test is that assessment of trend is based on interpretation (as in all graphical procedures). Ascher and Feingold [22] provide an example of the use of this graphical test for diesel propulsion engines in a US Navy ship. Also in [22], it is noted that this test may result in masking local variations when very large samples are available. An alternative procedure is to divide the total observation interval into several equally sized intervals, calculating (and then plotting, if necessary) the average rate of occurrence of failures for each of them, using
$$\lambda_i(t) = \frac{N_i(t) - N_{i-1}(t)}{\Delta t} \quad \text{with } (i-1)\Delta t \leq t \leq i\Delta t, \tag{3}$$
where $N_i(t)$ is the total number of failures observed from time zero to the $i$th interval and $\Delta t$ the length of each interval. If there is a trend in the data, then it will be reflected in the average rate of occurrences calculated. Then, if the system is improving, the successive values of $\lambda_i(t)$ calculated will decrease and vice versa.
4.1.2. Scatter plot of successive service lives
A complementary test to the cumulative failures against time plot consists of plotting the service life of the $i$th failure against that of the $(i-1)$th failure. If no trend is observable, only one cluster of points should be obtained. Two or more clusters, or linear plots, indicate trend. This test is also very helpful in checking for unusual values for the failure times in a set of data, which may be related to poor data collection, accidents or other situations not representative of the failure process, thus providing a means for identification of candidates for data filtering. Knights and Segovia [35], for example, applied this test to data coming from mining shovel cables (see Fig. 4 for an example of this type of plot; points outside the cluster suggest a revision of some failure times).
The tests described up to this point are for a single system only. When multiple systems are present, two alternatives are available for combination of the systems for assessment of trend. The first is based on the assumption that all systems follow the same failure process (independently), and leads to the use of the total time on test (TTT) transform of failure times (see definition in Section 4.1.4). This approach results in one single process with times to failure given by the TTT transformed values; thus single-system tests can be applied to the transformed data set. The second alternative assumes that all systems follow possibly different failure processes, and leads to the combination of the
Fig. 3. Cumulative failures vs. time plots, examples (A: increasing trend, B: no trend, C: two clearly different periods, D: non-monotonic trend). Axes: time (horizontal) vs. cumulative failures (vertical).
Fig. 4. Example of a successive service life plot (highlighted points indicate anomalies). Axes: service life of the (i-1)th failure (horizontal) vs. service life of the ith failure (vertical).
results of various single-system tests into what is called a combined test. This is usually performed by combining test statistics from single systems (for examples see Sections 4.2.2 and 4.2.4).
4.1.3. Nelson–Aalen plot
Another useful graphical test is the Nelson–Aalen plot. This test uses a non-parametric estimate of the cumulative intensity function of an NHPP, $\Lambda(t)$, and plots it against time [31]. The estimate is given by
$$\hat{\Lambda}(t) = \sum_{T_{ij} \leq t} \frac{1}{Y(T_{ij})}, \tag{4}$$
where T ij is the time to the ith failure of the jth process under observation, Y (T ij) the number of systems operating immediately before time T ij and L (t ) ¼ 0 for t omin{T ij}. The formula in Eq. (4) is valid for multiple systems under observation (multiple processes, j ¼ 1,2,y,m). If there is no trend, then the plot would tend to be linear, and any deviation from a straight line indicates some kind of trend. It should be noted that when only one system is observed, then the Nelson–Allen plot is equivalent to the cumulative failures vs. time plot. It is also interesting to notice that the Nelson–Aalen plot counts the number of systems operating before a certain time; thus it may include suspensions to assess trend. 4.1.4. Total time on test (TTT) plot As mentioned above, sometimes we are in the presence of several pieces of equipment. Now, the combined failure process for the entire group of components observed may or may not present a trend. This test is directed to the identification of trend for the combined behavior. So, if there are m independent processes with the same intensity function (i.e. several identical systems under observation) and the observation intervals for each one are all contained in the interval [0, S ], then the total number of failures will be N ¼ m i¼1 ni , where ni is the number of failures observed for each process in its particular observation interval. For the superposed process (combination of the m individual processes), let S k denote the time to the kth failure time. And let p(u) denote the number of processes under observation at time u. If all processes are observed from time 0 to time S , then p(u) is t equal to m. Then, T ðt Þ ¼ 0 pðuÞ du is the total time on test from time 0 to time t (this is known as the total time on test, or TTT, transform—see [36]). The TTT plot test for NHPPs is given by a plot of the total time on test statistic , calculated as
$$ \frac{T(S_k)}{T(S)} = \frac{\int_0^{S_k} p(u)\,du}{\int_0^{S} p(u)\,du}, \tag{5} $$

against the scaled failure number $k/N$, with $k = 1, 2, \ldots, N$. When $p(u) = m$, that is, when all processes are observed during the complete interval, the TTT plot is also called a scaled Nelson–Aalen plot with axes interchanged [31]. Fig. 5 shows the different forms that can be obtained when constructing a TTT plot. As in other graphical techniques, a linear plot is representative of a no-trend situation (thus validating a renewal assumption for the entire group of observed items). The TTT plot is especially useful to identify non-monotonic trends in time-to-failure data, such as the bath-tub failure intensity (see Fig. 5C). It is important to mention that the spacing between points will not be constant in a TTT plot. Rather, for an increasing trend (Fig. 5A), larger spacing between points will occur at the lower left section of the plot. In the presence of a decreasing trend (Fig. 5B), points should tend to be further from each other at the upper right section, whereas for a bath-tub shape (Fig. 5C), further spacing will occur in the middle section of the curve.

Fig. 5. Typical shape of TTT plots. A: increasing, B: decreasing, C: bath-tub shape intensities. (Modified from [31].)

Some other graphical tools, such as control charts for reliability monitoring (described by Xie et al. [37]), can also constitute a useful method to identify whether improvement or deterioration has occurred in a particular parameter of interest, such as the rate of occurrence of failures (ROCOF) or failure intensity. Nevertheless, they rely on an RP assumption and are not directed to test for trend when evaluating the use of a repairable systems approach.

4.2. Analytical methods

If preferred over the graphical approach, analytical testing methods are available to test data for trends. Additionally, the null and alternative hypotheses of these tests are of great help in the determination of the most suitable model for the data. Ascher and Feingold [22] provide a very complete survey of analytical trend tests, and present them organized according to their null hypothesis (i.e. RP, HPP, NHPP, monotonic trend, non-monotonic trend, etc.). Here, only the most popular tests will be described, following primarily Elvebakk [38]. Other methods are described and referenced in [46].

4.2.1. The Mann test

The null hypothesis for this non-parametric test is an RP. If this hypothesis is not rejected, we can continue the reliability analysis, fitting a distribution to time-to-failure data. The alternative hypothesis is a monotonic trend. The test statistic is calculated by counting the number of reverse arrangements, M, among the times between failures. Let $T_1, T_2, \ldots, T_n$ be the interarrival times of n failures. Then a reverse arrangement occurs whenever $T_i < T_j$ for $i < j$. For example, if the following times to failure were observed for a system: 21; 17; 48; 37; 64; 13;
then, for the first failure time, 3 reverse arrangements occur, as 21 < 48, 21 < 37 and 21 < 64. We have that for the sample: M = 3 + 3 + 1 + 1 + 0 = 8.
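The counting in this example can be reproduced with a short script. The following is a minimal Python sketch and not part of the original procedure; the function name `reverse_arrangements` is hypothetical, and it simply counts, for each position, how many later times are larger.

```python
def reverse_arrangements(times):
    """Return per-position counts and the total number of reverse arrangements.

    A reverse arrangement is a pair (i, j) with i < j and times[i] < times[j].
    """
    n = len(times)
    per_position = [
        sum(1 for j in range(i + 1, n) if times[i] < times[j])
        for i in range(n - 1)
    ]
    return per_position, sum(per_position)


# Times from the example in the text.
example_times = [21, 17, 48, 37, 64, 13]
counts, m_statistic = reverse_arrangements(example_times)
print(counts, m_statistic)
```

The per-position counts correspond to the individual terms of the sum given in the example.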
In general,

$$ M = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} I(T_i < T_j). \tag{6} $$
$I(\cdot)$ is an indicator variable used for counting the reverse arrangements present in the data set. It takes the value of 1 whenever the condition is met, in this case, when $T_i < T_j$. Mann [39], who originally developed the test, showed that M is approximately normally distributed for $n \ge 10$ and tabulated probabilities for smaller samples. If the hypothesis of an RP is correct, then the expected number of reverse arrangements is equal to $n(n-1)/4$, so large deviations
from this number would indicate the presence of a trend. This test considers a single system under observation.

4.2.2. The Laplace test

This well-known test has a null hypothesis of HPP vs. an alternative hypothesis of NHPP with monotonic intensity. In other words, if the null hypothesis is not rejected, then we can assume that times between failures are iid exponentially distributed. If it is rejected, then an NHPP should be used. The test is optimal for an NHPP with log-linear intensity function. The general idea behind the test is to compare the mean value of the failure times in an interval with the midpoint of the interval. If the mean of the failure times tends to deviate from the midpoint, then a trend is present and the data cannot be assumed to be independent and identically distributed. The test statistic, L, approximately follows a standard normal distribution under the null hypothesis, and is calculated as
$$ L = \frac{\sum_{j=1}^{\hat{n}} T_j - \frac{1}{2}\hat{n}(b + a)}{\sqrt{\frac{1}{12}\hat{n}(b - a)^2}}, \tag{7} $$
where $T_j$ is the age at failure for the jth failure, $[a, b]$ is the interval of observation and $\hat{n}$ is given by:

$$ \hat{n} = \begin{cases} n \ (\text{observed number of failures}) & \text{if the process is time truncated,} \\ n - 1 & \text{if the process is failure truncated.} \end{cases} $$
Note that Eq. (7) can be simplified when the starting point of observation is time t = 0, since $(b + a)$ and $(b - a)$ then both equal the end point of the observation interval. The statistic above is applicable when only one process is being observed. Generalization of the Laplace test to more than one process is fairly simple, and for m processes, the statistic is given by the following expression (combined Laplace test statistic):

$$ L = \frac{\sum_{i=1}^{m} \sum_{j=1}^{\hat{n}_i} T_{ij} - \sum_{i=1}^{m} \frac{1}{2}\hat{n}_i (b_i + a_i)}{\sqrt{\frac{1}{12}\sum_{i=1}^{m} \hat{n}_i (b_i - a_i)^2}}. \tag{8} $$
L is expected to be equal to zero (or very close to zero) when no trend is present in the data. The null hypothesis is therefore rejected for large negative or large positive values of L, and the sign is an indication of the type of trend. If L > 0, then an increasing trend (deterioration) is detected. Analogously, if L < 0, then a decreasing trend (improvement) is detected.

4.2.3. The Lewis–Robinson test

This test, used for testing of the RP assumption, was derived by Lewis and Robinson [40]. The test statistic LR is obtained by dividing the Laplace test statistic L by the estimated coefficient of variation of the times between failures, $\widehat{cv}$, which is calculated as
$$ \widehat{cv} = \frac{\hat{\sigma}(X)}{\bar{X}}, \tag{9} $$

where X is a random variable representing the times between failures of the system. Then, the LR statistic is given by

$$ LR = \frac{L}{\widehat{cv}}, \tag{10} $$

with L given by Eq. (7). If the failure times follow an HPP, then LR is asymptotically equivalent to L, as cv is equal to 1 when the times between failures are exponentially distributed. That is, LR is asymptotically standard normally distributed. As in the Laplace test, the expected value of the statistic is zero when no trend is present; thus deviations from this value indicate a trend. The sign is an indication of the type of trend.

4.2.4. The Military Handbook test

As in the Laplace test, the null hypothesis for this test is an HPP, and the alternative is an NHPP with monotonic intensity. This test is optimal for an NHPP with increasing power-law intensity (reliability deterioration with Weibull intensity function). The test statistic for a single system (process) is $\chi^2$ distributed with $2\hat{n}$ degrees of freedom under the null hypothesis, and is defined as

$$ MH = 2 \sum_{j=1}^{\hat{n}} \ln \frac{b - a}{T_j - a}, \tag{11} $$

where a, b, $T_j$ and $\hat{n}$ have the same meaning as in the Laplace test. As before, the generalization to m processes is given by (combined MH test statistic):

$$ MH = 2 \sum_{i=1}^{m} \sum_{j=1}^{\hat{n}_i} \ln \frac{b_i - a_i}{T_{ij} - a_i}. \tag{12} $$

In this case, the MH statistic is $\chi^2$ distributed with $2p$ degrees of freedom under the null hypothesis of HPPs, where $p = \sum_{i=1}^{m} \hat{n}_i$. TTT-based statistics for both the Laplace and the Military Handbook test are also available for the pooling of data from several systems (see [31]). Another test, known as the Anderson–Darling test (derived by Anderson and Darling [41]), has been found to be very powerful against non-monotonic trends, but normally simpler graphical tests are able to detect this situation. For this reason, it will not be described here.
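The single-system statistics of Eqs. (7), (10) and (11) are simple enough to compute directly. The following Python sketch is an illustration only: the function names are hypothetical, the data are made up, and the coefficient of variation is estimated with the sample standard deviation, which is an assumption since the estimator is not specified in the text.

```python
import math
from statistics import mean, stdev


def _counted_failures(ages, failure_truncated):
    # n_hat = n for a time-truncated process, n - 1 for a failure-truncated one
    return ages[:-1] if failure_truncated else ages


def laplace_statistic(ages, a, b, failure_truncated=False):
    """Eq. (7): approximately standard normal under the HPP null hypothesis."""
    used = _counted_failures(ages, failure_truncated)
    n_hat = len(used)
    return (sum(used) - 0.5 * n_hat * (b + a)) / math.sqrt(n_hat * (b - a) ** 2 / 12)


def lewis_robinson_statistic(ages, a, b, failure_truncated=False):
    """Eq. (10): Laplace statistic divided by the estimated coefficient of variation."""
    tbf = [t - s for s, t in zip([a] + ages[:-1], ages)]  # times between failures
    cv_hat = stdev(tbf) / mean(tbf)  # assumption: sample standard deviation
    return laplace_statistic(ages, a, b, failure_truncated) / cv_hat


def military_handbook_statistic(ages, a, b, failure_truncated=False):
    """Eq. (11): chi-square with 2 * n_hat degrees of freedom under the HPP null."""
    used = _counted_failures(ages, failure_truncated)
    return 2 * sum(math.log((b - a) / (t - a)) for t in used)


# Hypothetical system observed over [0, 500] h (time truncated), four failures.
ages = [100, 250, 300, 450]
l_stat = laplace_statistic(ages, 0, 500)
lr_stat = lewis_robinson_statistic(ages, 0, 500)
mh_stat = military_handbook_statistic(ages, 0, 500)
print(round(l_stat, 3), round(lr_stat, 3), round(mh_stat, 3))
```

For a failure-truncated process, `b` would be set to the age at the last failure and `failure_truncated=True` passed, so that the last failure is excluded from the sums.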
5. Model selection procedure
Vaurio [42] and Ascher and Feingold [22] proposed procedures based on various trend tests, directed to the proper selection of models for time-to-failure data. Both methodologies are robust and incorporate a set of tests leading to the selection of a model, but they could be simplified to encourage wider use of the testing techniques by maintenance analysts. Based on this, a new diagram consisting of several steps to model selection is proposed. This procedure explicitly considers only two models with practical application, the RP and the NHPP (though it leaves the option open to the user to select other non-stationary models). The procedure also reduces the number of tests considered, in order to concentrate the user’s efforts on the techniques that seem easiest to implement in practice. The diagram presented below is similar to that of Vaurio [42], which, though very complete in its procedure for model selection, appears to be too complex for regular industrial application. The Ascher and Feingold [22] flow diagram (A–F flow diagram) is simpler, but as they consider a broader review of tests and do not include them explicitly in the graphical representation of the procedure, it may mislead the practitioner. Fig. 6 presents the suggested guideline for model selection, applying the testing techniques reviewed here. As mentioned, this procedure is believed to be a simple way for maintenance analysts to correctly assess the failure processes in their operations and to discriminate whether a standard renewal approach or a “repairable systems” approach should be used to represent them.
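The core decision described above can be written as a few lines of logic. The sketch below is one possible reading of the procedure, not a transcription of Fig. 6: the function name, its arguments and the returned labels are hypothetical, and the trend decision itself is assumed to come from whichever graphical or analytical test the analyst has applied.

```python
def select_model(n_failures, trend_detected):
    """Hypothetical summary of the renewal vs. repairable-systems decision."""
    if n_failures == 0:
        # With no failure data, trend testing is not possible.
        return "evaluate Bayesian techniques"
    if trend_detected:
        # Ageing or reliability growth: the renewal assumption does not hold.
        return "NHPP (or other non-stationary model)"
    # No trend: fit a distribution (e.g. Weibull, exponential) to times to failure.
    return "RP (fit a distribution to the times to failure)"


for case in [(0, False), (7, True), (7, False)]:
    print(case, "->", select_model(*case))
```

In both the RP and the NHPP branches, the goodness of fit of the chosen model would still need to be evaluated.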
Although the use of an NHPP is suggested in this paper as it is capable of representing data with a trend, the reader should note that the NHPP is best interpreted as a minimal repair (or “as bad as old”) model; thus it will not necessarily be the most appropriate model for imperfect repair situations (system neither “as good as new” nor “as bad as old” after repair). Minimal repair is defined here as the situation when the component’s reliability
Fig. 6. Framework for time-to-failure model selection: a practitioners’ approach. (Flowchart boxes, in order: CMMS databases; collect operating time for each failure registered; 1. define object of study, 2. identify similar systems; no data? evaluate Bayesian techniques; valid to combine? order them chronologically (failures only); TEST: test for renewal assumption (graphical tests (any), Mann test), RP valid?, and test against NHPP (Laplace test, LR test, Military HB test), HPP valid? HPP rejected?; no trend: fit distributions to data (Weibull, exponential, other), evaluate goodness of fit, RP; trend: determine intensity function (Weibull, log-linear), evaluate goodness of fit, NHPP (or other non-stationary model); result: time-to-failure model.)
characteristics are not noticeably changed by the repair action. Nevertheless, the procedure remains valid for the testing of the renewal assumption, even in the scenario of imperfect repair. When imperfect repair is observed, more flexible models (e.g. models I and II by Kijima [29]) could provide a better fit to the failure data. The box for “testing for renewal assumption” and the box for “testing against NHPP” have been grouped in Fig. 6, under TEST. Intuitively, all tests considered (graphical tests, Mann, Laplace, Lewis–Robinson and Military Handbook) have the same objective of testing for trends in the data, but only the Laplace and Military Handbook tests explicitly have an HPP as null hypothesis (that is, if no trend is found when applying these tests, then an HPP can be used, and an exponential distribution should be fitted to failure times). It is important to notice that suspended data are not considered in trend testing, as, to our knowledge, no technique is currently available to assess the existence of trends incorporating the effect of censored observations, with the exception of the Nelson–Aalen plot. This plot, as it counts the number of systems in operation before each failure, may indirectly use suspensions in the trend assessment calculations. This is not necessarily a shortcoming of the framework proposed, since the tests reviewed usually need few failure times to be validated (using only failures and ignoring censored observations).
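To illustrate how the Nelson–Aalen estimate of Eq. (4) takes the number of systems in operation into account, the following Python sketch may help. It is an illustration under stated assumptions: every system is assumed to be observed from age 0 up to its own end of observation (which may be a suspension), and the function name and the data are hypothetical.

```python
def nelson_aalen(systems):
    """Nelson-Aalen estimate of the cumulative intensity at each failure time.

    systems: list of (failure_ages, end_of_observation) pairs, one per system,
    each assumed to be observed from age 0 to end_of_observation.
    """
    failure_times = sorted(t for ages, _ in systems for t in ages)
    ends = [end for _, end in systems]
    cumulative = 0.0
    curve = []
    for t in failure_times:
        # Y(T_ij): systems still under observation immediately before time t
        at_risk = sum(1 for end in ends if end >= t)
        cumulative += 1.0 / at_risk
        curve.append((t, cumulative))
    return curve


# Hypothetical data: system A fails at 100 h and 300 h and is observed to 400 h;
# system B fails at 200 h and is suspended (censored) at 250 h.
two_systems = [([100, 300], 400), ([200], 250)]
print(nelson_aalen(two_systems))

# With a single system the estimate reduces to the cumulative number of failures.
print(nelson_aalen([([10, 20, 35], 35)]))
```

The suspension of system B at 250 h does not add a point to the curve, but it reduces the number of systems counted as operating when the failure at 300 h occurs, which is how suspensions enter the calculation indirectly.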
‘‘Two-sample’’ techniques allow for evaluation of the pooling of censored data sets, so that intensity functions or distributions may be fitted using more information. In a ‘‘repairable systems’’ approach, the simplest extension of the treatment of a single system to multiple systems is when they are all observed from the same time, as in this combination the group may be thought of as a single system. In this case, the rate of occurrences of failures for the combined set must be divided by the number of systems to determine the ROCOF for a single unit.
6. Case study
Failure data from a fleet of backhoes, collected between 1998 and 2003, are used in the following numerical example to illustrate the use of the trend tests and the selection procedure described in the paper. The machines are operated by a construction firm in the United States. The data consist of the age at failure for each of 11 pieces of equipment, with a total of 43 failures. Table A1 in the Appendix presents the complete data set. The following example will consider two cases: (i) single-system analysis, for which all calculations are based on backhoe #7 (with 7 failures during the observation period) and (ii) multiple-systems analysis, using the pooled data for all 11 backhoes. Time to failure is expressed in operating hours.
6.1. Single-system example

Backhoe number 7 had seven failures during the years 1998–2003. The times between failures are: 3090, 850, 904, 166, 395, 242 and 496 h. By looking at these numbers, deterioration appears to be present in the data, that is, times between failures seem to become shorter as the equipment ages. Fig. 7 (cumulative failures vs. time plot) confirms this belief. Similar plots are obtained for 7 out of the 11 backhoes analyzed. When plotting the successive service lives, only two points lie outside the main cluster (implying that one failure time might be an anomaly, see Fig. 8). This is the first service life of 3090 h, which is much larger than the rest. This failure time (as well as all other failure times in the data set) was validated by the fleet operator; thus no need for further revision or elimination of data is identified. According to the procedure depicted in Fig. 6, the Mann test could be used to confirm the finding obtained from the cumulative failures vs. time plot, that a trend is present in the data. In this example, M = 0 + 1 + 0 + 3 + 1 + 1 = 6. This number differs significantly from the expected value of 10.5. Actually, it is significantly low, as $P(M < M \mid H_1) = 0.881$, implying that the data show a degradation trend ($H_1$ is the alternative hypothesis, of degradation trend). As a trend was identified, a renewal approach is not valid for modeling time to failure, and a “repairable systems” approach is required. Alternatively, testing through parametric tests such as the Laplace, Lewis–Robinson or Military Handbook tests was performed, with similar results (NHPP with monotonic intensity is the model selected). The Laplace statistic L equals 2.189 for backhoe #7. This value assumes that the process is failure truncated (i.e. the final point of the observation interval is given by the failure time of the last recorded failure event). This result is
statistically significant (at 0.05 significance level), indicating an increasing trend in the intensity of the failure process (i.e. degradation). The LR statistic equals 10.187 and the MH statistic equals 3.5698, leading us to the same conclusion (the latter at 0.01 significance level). Fitting of an intensity function to the data is the next step considered in the procedure presented in Fig. 6, but will not be included here for brevity. It is important to mention that not all tests need be performed in this case. On the contrary, the idea is to choose a test that accommodates the user and, only in the case that not enough evidence is available, then validate its results using a second testing technique.
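The Laplace and Military Handbook values quoted for backhoe #7 can be recomputed from the ages in Table A1, together with approximate tail probabilities. The sketch below uses only the Python standard library; the helper names are hypothetical, and the use of one-sided tail probabilities (upper tail for L, lower tail for MH, since late clustering of failures makes L large and MH small) is an assumption about how the significance levels were assessed.

```python
import math

# Ages at failure (h) for backhoe #7, from Table A1.
ages = [3090, 3940, 4844, 5010, 5405, 5647, 6143]

# Failure truncated: observation runs from a = 0 to the last failure,
# and the last failure is not counted (n_hat = n - 1).
a, b = 0, ages[-1]
counted = ages[:-1]
n_hat = len(counted)

laplace = (sum(counted) - 0.5 * n_hat * (b + a)) / math.sqrt(n_hat * (b - a) ** 2 / 12)
mh = 2 * sum(math.log((b - a) / (t - a)) for t in counted)


def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def chi2_cdf_even_dof(x, dof):
    """P(X <= x) for a chi-square variable with an even number of degrees of freedom."""
    half = x / 2
    term, total = 1.0, 0.0
    for i in range(dof // 2):
        total += term           # adds (x/2)^i / i!
        term *= half / (i + 1)
    return 1.0 - math.exp(-half) * total


p_laplace = normal_upper_tail(laplace)    # large L suggests deterioration
p_mh = chi2_cdf_even_dof(mh, 2 * n_hat)   # small MH suggests deterioration
print(round(laplace, 3), round(mh, 4), round(p_laplace, 4), round(p_mh, 4))
```

The closed-form chi-square distribution function used here holds only for an even number of degrees of freedom, which is always the case for the MH statistic ($2\hat{n}$).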
6.2. Multiple-system example

If we were interested in modeling the time to failure for the pooled group of backhoes, the existence of trends in the pooled behavior should be assessed. A common mistake is to assume that every piece of equipment operated for the same number of hours over the entire interval, which in this case is not true. As these backhoes are operated by a construction company in different projects, during the same observation period of 1998–2003, some of them operated more than 6000 h, whereas others barely reached 2500 h of operation. Then, when performing graphical tests such as the TTT or Nelson–Aalen plots, the modeler should take special care in identifying the number of units in operation for different ages of the fleet. Note that time is expressed in operating hours, so in the superposed failure process, 11 backhoes can be considered to be “in operation” only for the interval between t = 0 and t = 2028 operating hours. For more advanced ages of the fleet, the number of backhoes in operation decreases. Fig. 9 shows a TTT plot constructed for this example. From the form of the curve, a clear indication of an increasing trend in the intensity of failures is observed (i.e. degradation). Fig. 10 presents a Nelson–Aalen plot for the same data, again suggesting the same conclusion. Results for the combined Laplace and Military Handbook tests are the following: L = 3.096 (significant indicator of degradation, at 0.01 significance level) and MH = 31.859 (significant indicator of degradation, at 0.01 significance level). Then, the pooled failure behavior for the fleet effectively presents a trend, thus an RP cannot
Fig. 7. Cumulative failures vs. time plot: backhoe #7. (x-axis: age (hours); y-axis: cumulative failures.)

Fig. 8. Successive service life plot: backhoe #7. (x-axis: service life, (i-1)th failure (hours); y-axis: service life, ith failure (hours).)

Fig. 9. TTT plot: all backhoes combined. Increasing intensity is suggested. (x-axis: scaled failure number; y-axis: TTT statistic.)
Table A1
Failure times for a fleet of 11 backhoes.

Fleet of backhoes: failure data

Equipment #   Failure #   Age at failure (h)   TBF (h)
1             1           346                  346
1             2           1925                 1579
1             3           3108                 1183
1             4           3610                 502
1             5           3892                 282
2             1           875                  875
2             2           2162                 1287
2             3           3248                 1086
2             4           4422                 1174
3             1           2920                 2920
3             2           4413                 1493
3             3           4691                 278
3             4           4801                 110
4             1           1234                 1234
4             2           1911                 677
4             3           2352                 441
4             4           3063                 711
5             1           896                  896
5             2           1885                 989
5             3           2028                 143
6             1           1480                 1480
6             2           3648                 2168
6             3           5859                 2211
7             1           3090                 3090
7             2           3940                 850
7             3           4844                 904
7             4           5010                 166
7             5           5405                 395
7             6           5647                 242
7             7           6143                 496
8             1           1710                 1710
8             2           1787                 77
8             3           2297                 510
8             4           2915                 618
9             1           1885                 1885
9             2           2500                 615
9             3           2815                 315
10            1           1691                 1691
10            2           2230                 539
10            3           2500                 270
11            1           1210                 1210
11            2           2549                 1339
11            3           2621                 72

Fig. 10. Nelson–Aalen plot: all backhoes combined. Increasing intensity is suggested. (x-axis: $T_{ij}$; y-axis: cumulative intensity.)
be assumed (the use of an NHPP is suggested). This was expected not only because the graphical tests indicated a similar trend, but also because the results obtained for single systems showed that many of the backhoes presented individually a deterioration trend. The next step in this example would be, then, to estimate parameters for an NHPP representing the failure process of the backhoes. Coetzee [30] contains expressions for parameter estimation and goodness of fit procedures for an NHPP.
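The combined statistics of Eqs. (8) and (12) can be recomputed from Table A1. The following Python sketch assumes that each backhoe is observed from age 0 and is failure truncated at its last recorded failure; this is an assumption for the pooled case (it is stated in the text only for backhoe #7), and the function and variable names are hypothetical.

```python
import math

# Ages at failure (h) per backhoe, from Table A1.
fleet = {
    1: [346, 1925, 3108, 3610, 3892],
    2: [875, 2162, 3248, 4422],
    3: [2920, 4413, 4691, 4801],
    4: [1234, 1911, 2352, 3063],
    5: [896, 1885, 2028],
    6: [1480, 3648, 5859],
    7: [3090, 3940, 4844, 5010, 5405, 5647, 6143],
    8: [1710, 1787, 2297, 2915],
    9: [1885, 2500, 2815],
    10: [1691, 2230, 2500],
    11: [1210, 2549, 2621],
}


def combined_statistics(systems):
    """Return (combined Laplace L, combined MH, MH degrees of freedom)."""
    numerator = variance = log_sum = 0.0
    p = 0
    for ages in systems:
        a, b = 0.0, ages[-1]     # assumed observation interval [a_i, b_i]
        counted = ages[:-1]      # failure truncated: n_hat_i = n_i - 1
        n_hat = len(counted)
        numerator += sum(counted) - 0.5 * n_hat * (b + a)
        variance += n_hat * (b - a) ** 2 / 12
        log_sum += sum(math.log((b - a) / (t - a)) for t in counted)
        p += n_hat
    return numerator / math.sqrt(variance), 2 * log_sum, 2 * p


l_combined, mh_combined, mh_dof = combined_statistics(fleet.values())
print(round(l_combined, 3), round(mh_combined, 3), mh_dof)
```

Under these assumptions the script should give values in line with those reported above for the pooled data, with the MH statistic referred to a chi-square distribution with $2p$ degrees of freedom.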
7. Conclusions
This paper reviews several tests available to assess the existence of trends, and proposes a practical procedure to discriminate between (i) the common renewal approach to model time to failure and (ii) the use of a non-stationary model such as the NHPP, which is believed to be easy to implement in practice among the alternatives available for a “repairable systems approach”. The procedure suggested is simple, yet it is believed that it will lead to better representation of the failure processes commonly found in industrial operations. The use of the tests reviewed is illustrated through numerical examples. Some practical problems that one may encounter when analyzing reliability data are also briefly discussed and references are given in each case for further review.
Acknowledgements
We would like to thank Dr. Dragan Banjevic, of the Center for Maintenance Optimization and Reliability Engineering at the University of Toronto, for his valuable comments on an earlier version of this paper.
Appendix. Data used in the numerical example
See Table A1.
References

[1] Dekker R, Scarf PA. On the impact of optimisation models in maintenance decision making: the state of the art. Reliability Engineering and System Safety 1998;60:111–9.
[2] Ansell JI, Phillips MJ. Strategies for reliability data analysis. In: Comer P, editor. Proceedings of the 11th advances in reliability technology symposium. London: Elsevier; 1990.
[3] Scarf PA. On the application of mathematical models in maintenance. European Journal of Operational Research 1997;99:493–506.
[4] Ansell JI, Phillips MJ. Practical problems in the statistical analysis of reliability data (with discussion). Applied Statistics 1989;38:205–31.
[5] Jardine AKS, Tsang AHC. Maintenance, replacement and reliability: theory and applications. Boca Raton: CRC Press; 2006.
[6] Campbell JD. Uptime: strategies for excellence in maintenance management. Portland: Productivity Press; 1995.
[7] Campbell JD, Jardine AKS, editors. Maintenance excellence: optimizing equipment life-cycle decisions. New York: Marcel Dekker; 2001.
[8] Bendell T. An overview of collection, analysis, and application of reliability data in the process industries. IEEE Transactions on Reliability 1998;37:132–7.
[9] Percy DF, Kobbacy KAH, Fawzi BB. Setting preventive maintenance schedules when data are sparse. International Journal of Production Economics 1997;51:223–34.
[10] Ansell JI, Phillips MJ. Discussion of practical problems in the statistical analysis of reliability data (with discussion). Applied Statistics 1989;38:231–47.
[11] Barlow RE, Proschan F. Inference for the exponential life distribution. In: Serra A, Barlow RE, editors. Theory of reliability, Proceedings of the International School of Physics “Enrico Fermi”. Amsterdam: North-Holland; 1986. p. 143–64.
[12] Lindley DV, Singpurwalla ND. Reliability and fault tree analysis using expert opinions. Journal of the American Statistical Association 1986;81:87–90.
[13] Singpurwalla ND. Foundational issues in reliability and risk analysis. SIAM Review 1988;30:264–81.
[14] Stamatelatos M, et al. Probabilistic risk assessment procedures guide for NASA managers and practitioners. Washington, DC: Office of Safety and Mission Assurance NASA Headquarters; 2002.
[15] Meeker WQ, Escobar LA. Statistical methods for reliability data. New York: Wiley; 1998.
[16] O’Connor PDT. Practical reliability engineering. 3rd ed. New York: Wiley; 1991.
[17] Mann NR, Shafer RE, Singpurwalla ND. Methods for statistical analysis of reliability and life data. New York: Wiley; 1974.
[18] Barlow RE, Proschan F. Mathematical theory of reliability. New York: Wiley; 1965.
[19] Tsang AH, Jardine AKS. Estimators of 2-parameter Weibull distributions from incomplete data with residual lifetimes. IEEE Transactions on Reliability 1993;42:291–8.
[20] Bohoris GA. Parametric statistical techniques for the comparative analysis of censored reliability data: a review. Reliability Engineering and System Safety 1995;48:149–55.
[21] Bohoris GA, Walley DM. Comparative statistical techniques in maintenance management. IMA Journal of Mathematics Applied in Business and Industry 1992;3:241–8.
[22] Ascher HE, Feingold H. Repairable systems reliability. Modeling, inference, misconceptions and their causes. New York: Marcel Dekker; 1984.
[23] Saldanha PLC, de Simone EA, Frutoso e Melo PF. An application of nonhomogeneous Poisson point processes to the reliability analysis of service water pumps. Nuclear Engineering and Design 2001;210:125–33.
[24] Weckman GR, Shell RL, Marvel JH. Modeling the reliability of repairable systems in the aviation industry. Computers and Industrial Engineering 2001;40:51–63.
[25] Rigdon SE, Basu AP. Statistical methods for the reliability of repairable systems. New York: Wiley; 2000.
[26] Thompson WA. On the foundations of reliability. Technometrics 1981;23:1–13.
[27] Calabria R, Pulcini G. Inference and test in modeling the failure/repair process of repairable mechanical equipments. Reliability Engineering and System Safety 2000;67:41–53.
[28] Brown M, Proschan F. Imperfect repair. Journal of Applied Probability 1983;20:851–9.
[29] Kijima M. Some results for repairable systems with general repair. Journal of Applied Probability 1989;26:89–102.
[30] Coetzee J. The role of NHPP models in the practical analysis of maintenance failure data. Reliability Engineering and System Safety 1997;56:161–8.
[31] Kvaloy JT, Lindqvist BH. TTT-based tests for trend in repairable systems data. Reliability Engineering and System Safety 1998;60:13–28.
[32] Miller AG, Kaufer B, Carlsson L. Activities on component reliability under the OECD Nuclear Energy Agency. Nuclear Engineering and Design 2000;198:325–34.
[33] Ascher HE, Kobbacy KAH. Modelling preventive maintenance for repairable systems. IMA Journal of Mathematics Applied in Business and Industry 1995;6:85–100.
[34] Baker RD. Some new tests of the power law process. Technometrics 1996;38:256–65.
[35] Knights PF, Segovia R. Reliability model for the optimal replacement of shovel cables. Transactions of the Institution of Mining and Metallurgy, Section A: Mining Industry 1999;108:A8–A16.
[36] Bergman B. On age replacement and the total time on test concept. Scandinavian Journal of Statistics 1979;6:161–8.
[37] Xie M, Goh TN, Ranjan P. Some effective control chart procedures for reliability monitoring. Reliability Engineering and System Safety 2002;77:143–50.
[38] Elvebakk G. Extending the use of some traditional trend tests for repairable systems data by resampling techniques, 1999. http://www.math.ntnu.no/preprint/statistics/1999/S19-1999.ps
[39] Mann HB. Nonparametric tests against trend. Econometrica 1945;13:245–59.
[40] Lewis PA, Robinson DW. Testing for monotone trend in a modulated renewal process. In: Proschan F, Serfling RJ, editors. Reliability and biometry. Philadelphia: SIAM; 1974. p. 163–82.
[41] Anderson TW, Darling DA. Asymptotic theory of certain goodness of fit criteria based on stochastic processes. Annals of Mathematical Statistics 1952;23:193–212.
[42] Vaurio JK. Identification of process and distribution characteristics by testing monotonic and non-monotonic trends in failure intensities and hazard rates. Reliability Engineering and System Safety 1999;64:345–57.
[43] Tukey JW. The future of data analysis. Annals of Mathematical Statistics 1962;33:1–67.
[44] Walls L, Quigley J. Building prior distributions to support Bayesian reliability growth modelling using expert judgement. Reliability Engineering and System Safety 2001;74(2):117–28.
[45] Guikema SD, Paté-Cornell ME. Probability of infancy problems for space launch vehicles. Reliability Engineering and System Safety 2005;87:303–14.
[46] Viertävä J, Vaurio JK. Testing statistical significance of trends in learning, ageing and safety indicators. Reliability Engineering and System Safety 2009;94:1128–32.