Using Wavelet Shrinkage to Deal with the Contamination Problem in the Survival Function of the Weibull Distribution

Abstract


Introduction
Survival analysis (also called time-to-event analysis or duration analysis) is a branch of statistics aimed at analyzing the expected duration of time until one or more events happen, called survival times or duration times, such as death in biological organisms and failure in mechanical systems. In engineering this topic is called reliability theory or reliability analysis. Survival analysis is a collection of statistical procedures for data analysis in which the outcome variable of interest is the time until an event occurs; it is the study of the time between entry into observation and a subsequent event. The scope of survival analysis has now become wide. Survival analysis is a set of statistical techniques used to describe and specify time-to-event data. We use the term "failure" in survival analysis to describe the occurrence of the event of interest (even though the event may actually be a "success", such as recovery from therapy). The term "survival time" specifies the length of time taken for failure to occur (David, 2012; Singh, 2011). The Weibull distribution is a very useful distribution in survival analysis and reliability analysis. Several methods have been demonstrated for estimating the parameters of different distributions, such as the method of moments, maximum likelihood, etc. The Weibull distribution has gained much weight in the real world and is increasingly used in reliability and lifetime (survival) analysis. The two-parameter Weibull distribution has a shape parameter (β) and a scale parameter (α). Most distributions of immense interest, such as the normal, gamma, and inverse gamma, have two parameters (Nketiah, 2021).
Contamination of the estimate of the intercept has only a small impact on the estimation of the regression coefficients. Good leverage points are observations that lie on the outskirts of the design space but are near the regression line; they have a minor impact on the estimates of both the intercept and the regression coefficients, but they do affect inference. In contrast, bad leverage points are observations that are far off the regression line. Signals are typically contaminated by random noise; hence, several methods have been used to smooth noisy signals, including the Fourier transform, the Savitzky-Golay local polynomial, mean filters, the Gaussian function, and so on. However, while these methods smooth the signal to reduce the noise, in the process they also blur the signal. In recent years, a new de-noising method known as wavelet shrinkage has been introduced (Taha and Saleh, 2022). Wavelet shrinkage is a signal de-noising technique based on the idea of thresholding the wavelet coefficients. Donoho et al. (1995a) introduced the method of wavelet shrinkage for general curve estimation problems, and there are several good reasons why wavelet shrinkage can be used for function estimation (Mustafa and Taha, 2013).

Survival Analysis
Survival analysis deals with the analysis of time-to-event data, such as times of accidents in life. It models the time it takes for events to occur; the typical event is death, from which the name "survival" analysis derives. Let T be a random variable that represents the failure time of an event with probability density function f(t) and cumulative distribution function $F(t) = \Pr(T \le t)$; the survival function S(t) is defined as (David, 2012):

$$S(t) = \Pr(T > t) = 1 - F(t)$$
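As a small illustration (not from the source), the relation $S(t) = 1 - F(t)$ can be evaluated for a Weibull example in MATLAB, whose Statistics Toolbox provides wblcdf; the parameter values here are illustrative only:

t = linspace(0, 2, 200);          % evaluation grid
F = wblcdf(t, 0.5, 5);            % cumulative distribution function F(t)
S = 1 - F;                        % survival function S(t) = Pr(T > t)
plot(t, S); xlabel('t'); ylabel('S(t)');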

Weibull Distribution
The probability density function of a two-parameter Weibull distribution with scale parameter α > 0 and shape parameter β > 0 is given by (Nketiah, 2021):

$$f(t; \alpha, \beta) = \frac{\beta}{\alpha}\left(\frac{t}{\alpha}\right)^{\beta - 1} e^{-(t/\alpha)^{\beta}}, \qquad t \ge 0$$

The cumulative distribution function is

$$F(t; \alpha, \beta) = 1 - e^{-(t/\alpha)^{\beta}}$$

so that the survival function is $S(t) = e^{-(t/\alpha)^{\beta}}$. The scale and shape parameters are estimated using maximum likelihood estimation (MLE).
The method of MLE is a commonly used procedure for estimating parameters. Let $t_1, t_2, \ldots, t_n$ be a random sample of size n obtained from a population with pdf $f(t; \theta)$, where θ is an unknown vector of parameters. The likelihood function is given as

$$L(\theta) = \prod_{i=1}^{n} f(t_i; \theta)$$

The MLE of θ is the value of θ that maximizes the likelihood function or the log-likelihood function. For the Weibull distribution, the likelihood function is

$$L(\alpha, \beta) = \prod_{i=1}^{n} \frac{\beta}{\alpha}\left(\frac{t_i}{\alpha}\right)^{\beta - 1} e^{-(t_i/\alpha)^{\beta}}$$

Taking the logarithm of both sides, we get

$$\ln L = n\ln\beta - n\beta\ln\alpha + (\beta - 1)\sum_{i=1}^{n}\ln t_i - \sum_{i=1}^{n}\left(\frac{t_i}{\alpha}\right)^{\beta}$$

Differentiating with respect to β and α, we obtain the estimating equations

$$\frac{\partial \ln L}{\partial \beta} = \frac{n}{\beta} - n\ln\alpha + \sum_{i=1}^{n}\ln t_i - \sum_{i=1}^{n}\left(\frac{t_i}{\alpha}\right)^{\beta}\ln\frac{t_i}{\alpha} = 0 \qquad (8)$$

$$\frac{\partial \ln L}{\partial \alpha} = -\frac{n\beta}{\alpha} + \frac{\beta}{\alpha}\sum_{i=1}^{n}\left(\frac{t_i}{\alpha}\right)^{\beta} = 0 \qquad (9)$$

Equations (8) and (9) have no closed-form solution and are solved numerically to obtain the estimated parameters.
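As a hedged sketch of the MLE step (the paper's own MATLAB program is not reproduced here), the Statistics Toolbox function wblfit solves the estimating equations (8) and (9) numerically; the sample below is simulated with illustrative parameter values:

t = wblrnd(0.5, 5, 100, 1);       % random sample: scale 0.5, shape 5
parmhat = wblfit(t);              % MLE; parmhat = [alpha_hat beta_hat]
alphaHat = parmhat(1);            % estimated scale parameter
betaHat  = parmhat(2);            % estimated shape parameter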

Contaminated Data (Contamination Outlier and Noise)
The data come from two types of distributions: the first, called the basic distribution $F_0$, generates the good data, while the second is called the contamination distribution $F_1$. If p is the contamination ratio, then the distribution of an arbitrary observation is (Hawkins, 1980):

$$F(x) = (1 - p)F_0(x) + pF_1(x)$$

In the literature on data mining and statistics, outliers are sometimes known as abnormalities, discordants, deviants, or anomalies. In the majority of applications, the data are produced by one or more generating processes, which may reflect either system activity or observations made about entities. Noisy data are data that contain too much variation; it is presumed that the signal or observation is present but disguised by noise. The difficulty of separating the noise from the signal or observation has long been a focus in statistics, since only the useful data should inform researchers. However, the percentage of noisy data that is relevant is frequently too small to be useful. (Taha and Saleh, 2022)
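A minimal sketch of this contamination model, assuming the Weibull distribution as the basic distribution and the Cauchy distribution used later in the experimental aspect as the contamination distribution; the exact mixing scheme is an illustrative assumption:

n = 100; p = 0.10;                          % sample size, contamination ratio
t = wblrnd(0.5, 5, n, 1);                   % good data from the basic distribution
idx = rand(n, 1) < p;                       % observations drawn for contamination
u = rand(sum(idx), 1);
t(idx) = t(idx) + 0.5*tan(pi*(u - 0.5));    % Cauchy(0, 0.5) noise via inverse CDF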

Wavelet Shrinkage
Wavelet shrinkage is a well-established technique for removing the noise present in observations while preserving the significant features of the original data (Donoho, 1994). Wavelet shrinkage is based on thresholding of the wavelet coefficients. It has several good properties that explain its popularity in statistics: it is nearly minimax for a wide range of loss functions and for general function classes; it is simple, practical, and fast; it is adaptable to spatial and frequency inhomogeneities; it is readily extendable to high dimensions; and it is applicable to various problems such as density estimation and inverse problems. In statistics, applications of wavelets arise mainly in tasks involving non-parametric regression, density estimation, assessment of scaling, functional data analysis, and stochastic processes. (Donoho and Johnstone, 1995)

Wavelet
Wavelets are small waves that can be grouped together to form larger waves or different waves (Ali, et al, 2022). A few fundamental waves are used: they are stretched and shifted in infinitely many ways to produce a wavelet system that can accurately model any wave.
Consider generating an orthogonal wavelet basis for functions in $L^2(\mathbb{R})$ (the space of square-integrable real functions), starting with two parent wavelets: the scaling function φ (also called the father wavelet) and the mother wavelet ψ. Other wavelets are then generated by dilation and translation of φ and ψ (Donald et al., 2004), as defined by formulas (11) and (12):

$$\phi_{j,k}(t) = 2^{j/2}\,\phi(2^{j}t - k) \qquad (11)$$

$$\psi_{j,k}(t) = 2^{j/2}\,\psi(2^{j}t - k) \qquad (12)$$
The discrete wavelet transform (DWT) is a broadly applicable signal-processing algorithm that is useful in several fields, e.g. science, engineering, mathematics, and computer science. The DWT decomposes an observation using scaled and shifted versions of a compactly supported basis function (the mother wavelet) and provides a multiresolution representation of the observation. Given a vector of observations y consisting of $n = 2^k$ observations, where k is an integer, the DWT of y is given by formula (13) (Ali, et al, 2022):

$$W = wy \qquad (13)$$

where w is the wavelet matrix of dimension (n × n) and W is a vector of dimension (n × 1) containing both the scaling and the wavelet coefficients. The vector of wavelet coefficients can be organized into (k + 1) elements, $W = [W_1, W_2, \ldots, W_k, V_{k_0}]^T$. At each DWT level, the approximation coefficients are divided into bands using the same wavelet as before, so that the new details are appended to the details of the previous decompositions, as in the following formula (Taha and Saleh, 2022):

$$y = A_k + \sum_{j=1}^{k} D_j \qquad (14)$$

At each level k the observations can be reconstructed from the de-noised data (reducing the contamination) by the inverse DWT (Ramazan et al., 2002).
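A minimal sketch of the decomposition (13) and the reconstruction (14) using MATLAB's Wavelet Toolbox; the signal y, the level k, and the (Sym3) wavelet are illustrative choices:

y = wblrnd(0.5, 5, 128, 1);        % 2^7 observations
k = 3;
[C, L] = wavedec(y, k, 'sym3');    % DWT: C holds [A_k, D_k, ..., D_1]
A = appcoef(C, L, 'sym3', k);      % approximation coefficients at level k
D1 = detcoef(C, L, 1);             % finest-scale detail coefficients
yRec = waverec(C, L, 'sym3');      % inverse DWT; max(abs(y - yRec)) is ~0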

Thresholding
The simplest method of non-linear wavelet de-noising is thresholding, in which the wavelet coefficients are subdivided into two sets, one representing signal and the other representing noise. Different rules exist for applying thresholds to the wavelet coefficients, and several methods for choosing a threshold value exist, such as:

I. SURE
The SURE threshold, proposed by Donoho and Johnstone (1994), is based upon the minimization of Stein's unbiased risk estimator. The SURE method specifies a threshold estimate $\delta_k$ at each level k for the wavelet coefficients. For the soft threshold estimator, let $W_{jk}$ be a wavelet coefficient in the kth level, and then select the $\delta_k$ that minimizes

$$\text{SURE}(\delta; W_k) = N_k - 2\,\#\{j : |W_{jk}| \le \delta\} + \sum_{j}\min(|W_{jk}|, \delta)^2$$

where $N_k$ is the number of coefficients at level k.

II. Minimax
The optimal minimax threshold method was proposed by Donoho and Johnstone (1994) as an improvement to the universal threshold method. Minimax is based on an estimator $\tilde{\theta}$ that attains the minimax risk

$$R^{*} = \inf_{\tilde{\theta}} \sup_{\theta} E\,\|\tilde{\theta}(y) - \theta\|^{2}$$

where θ and $\tilde{\theta} = \tilde{\theta}(y)$ denote the vectors of true and estimated sample values. The minimax threshold estimator differs from its universal counterpart in that the minimax threshold method concentrates on reducing the overall mean squared error (MSE) without over-smoothing the estimates.

III. Universal Threshold
Donoho and Johnstone (1994) proposed the universal threshold, which is given by

$$\delta^{U} = \tilde{\sigma}\sqrt{2\ln N}$$

where N is the length of the data series and $\tilde{\sigma}$ is the estimator of the standard deviation of the detail coefficients, which is estimated as

$$\tilde{\sigma} = \frac{\text{MAD}}{0.6745} \qquad (20)$$

where MAD is the median absolute deviation of the wavelet coefficients at the finest scale.
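A minimal sketch of the three threshold selectors, reusing the signal y and the finest-scale detail coefficients D1 from the sketch in the wavelet section; the Wavelet Toolbox function thselect expects unit noise variance, so the coefficients are rescaled by the MAD-based estimate of formula (20):

sigma = median(abs(D1)) / 0.6745;                  % formula (20)
tSure = sigma * thselect(D1/sigma, 'rigrsure');    % SURE threshold
tMini = sigma * thselect(D1/sigma, 'minimaxi');    % minimax threshold
tUniv = sigma * sqrt(2 * log(numel(y)));           % universal threshold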

Thresholding Rules
There are two main thresholding rules:

1-Soft Thresholding
It was proposed by Donoho and Johnstone in 1995. Soft thresholding zeroes all the coefficients whose magnitudes are smaller than δ and subtracts δ from the magnitudes of those larger than δ. It is defined as follows:

$$\delta_{S}(W) = \begin{cases} \operatorname{sgn}(W)\,(|W| - \delta) & \text{if } |W| > \delta \\ 0 & \text{if } |W| \le \delta \end{cases}$$

2-Hard Thresholding
Donoho and Johnstone proposed hard thresholding, which is the simplest thresholding technique, based on the premise of "keep or kill". Hard thresholding zeroes out all the coefficients whose magnitudes are smaller than δ and keeps the remaining coefficients unchanged:

$$\delta_{H}(W) = \begin{cases} W & \text{if } |W| > \delta \\ 0 & \text{if } |W| \le \delta \end{cases}$$

(Donoho and Johnstone, 1995)
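A minimal sketch of the two rules applied with the Wavelet Toolbox function wthresh, reusing the detail coefficients D1 and the universal threshold tUniv from the previous sketches:

dSoft = wthresh(D1, 's', tUniv);   % soft: sign(W).*max(abs(W)-delta, 0)
dHard = wthresh(D1, 'h', tUniv);   % hard: W.*(abs(W) > delta)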

Proposed Method
The proposed method deals with the contamination problem of the Weibull distribution in survival analysis using wavelet shrinkage. First, the DWT coefficients (W) are computed for a chosen wavelet (Daubechies, Symlets, and Coiflets wavelets). Second, the threshold level is estimated by one of the methods (e.g. SURE, minimax, or universal threshold). Third, the soft thresholding rule is used to keep or kill the discrete wavelet coefficients. This yields the modified DWT coefficients (W*), which are then used to compute the inverse of the modified DWT (Taha and Jwana, 2022), as in formula (25):

$$y^{*} = w^{T}W^{*} \qquad (25)$$

Finally, the data for the Weibull distribution, which now contain less contamination, are used to estimate the shape and scale parameters of the Weibull distribution by the method of maximum likelihood, and the survival function is then analyzed on this basis.
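A minimal end-to-end sketch of the proposed method, assuming the (Coif4) wavelet with the universal threshold and the soft rule (the best case reported below); the level k, the contamination scheme, and the removal of non-positive reconstructed values before the MLE step are illustrative assumptions:

n = 128; p = 0.10;
t = wblrnd(0.5, 5, n, 1);                         % clean Weibull data
idx = rand(n, 1) < p;                             % contaminated positions
t(idx) = t(idx) + 0.5*tan(pi*(rand(sum(idx),1) - 0.5)); % Cauchy noise
[C, L] = wavedec(t, 3, 'coif4');                  % first: DWT coefficients W
D1 = detcoef(C, L, 1);
delta = (median(abs(D1))/0.6745) * sqrt(2*log(n)); % second: universal threshold
nA = L(1);                                        % approximation coefficients kept
C(nA+1:end) = wthresh(C(nA+1:end), 's', delta);   % third: soft rule gives W*
tStar = waverec(C, L, 'coif4');                   % formula (25): inverse DWT
tStar = tStar(tStar > 0);                         % Weibull support requires t > 0
parmhat = wblfit(tStar);                          % MLE of [alpha beta]
S = @(x) exp(-(x./parmhat(1)).^parmhat(2));       % estimated survival function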

Evaluation Criterion
To measure the accuracy of the estimated parameters (scale and shape) of the Weibull distribution, the mean squared error (MSE) can be used, as in the following formula:

$$\text{MSE}(\hat{\theta}) = \frac{1}{m}\sum_{i=1}^{m}(\hat{\theta}_i - \theta)^2 \qquad (26)$$

where m is the number of samples.
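A minimal sketch of the evaluation loop behind formula (26), with m = 1000 replications as in the experimental aspect; denoiseWeibull is a hypothetical helper assumed to wrap the pipeline sketched above and return the two MLE estimates:

m = 1000; alpha = 0.5; beta = 5;
est = zeros(m, 2);
for i = 1:m
    est(i, :) = denoiseWeibull(alpha, beta, 100, 0.10); % hypothetical helper
end
mseAlpha = mean((est(:,1) - alpha).^2);   % MSE(alpha) over m samples
mseBeta  = mean((est(:,2) - beta ).^2);   % MSE(beta) over m samples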

Experimental and Application
To compare the classical and the proposed methods in terms of efficiency and accuracy of the estimated parameters of the Weibull distribution and the reliability function, an experimental aspect was carried out by simulating the Weibull distribution, followed by an applied aspect on real data, based on the MSE criterion and using a program designed in MATLAB (version 2020a) for this purpose (Appendix).

Experimental Aspect
Four cases were selected for the scale parameter (0.5 and 1) and the shape parameter (5 and 10), with sample sizes (50 and 100) and added contamination percentages (10% and 20%) following a Cauchy distribution (α = 0 and β = 0.5). For the first experiment with n = 100, figure (1) is shown.

Figure (1): The Original Data (*), Contaminated Data (.), and De-noised Data (-)
Figure (1) shows the scatter plot of the data generated from the Weibull distribution (*) with scale parameter (0.5) and shape parameter (5), the scatter of the contaminated data (.) at 10% contamination, and the data treated for contamination (-) using the (Sym3) wavelet with the universal threshold and the soft rule. The survival function of the Weibull distribution for the contaminated and treated data is shown in figures (2) and (3) respectively.
For the purpose of comparing the proposed and classical methods in estimating the parameters of the Weibull distribution, the experiment was repeated (1000) times and the average of the MSE criterion was calculated. Three wavelets (Db2), (Sym3), and (Coif4) were used with different methods of estimating the threshold level (SURE, minimax, and universal), with the soft threshold rule, for different sample sizes (50 and 100) and contamination percentages (10% and 20%). The results are summarized in tables (1)-(4) for the average of the MSE criterion. Tables (1)-(4) show that all the proposed methods are more efficient than the classical method in estimating the scale and shape parameters of the Weibull distribution, based on the average of the MSE criterion, in all cases. Also, the (Coif4) wavelet with the universal threshold method is the most efficient compared with all the other proposed methods and with the classical method, because it has the lowest averages of the criteria MSE(α) and MSE(β). The efficiency of the estimated parameters decreases with increasing contamination percentage in all simulations. Also, the efficiency of the estimated scale parameter α is not affected by an increase in its true value, while the efficiency of the estimated shape parameter β decreases as its true value increases in all simulations.

Application Part
Real data representing the time to kidney failure were used. The distribution of the data was tested using the Kolmogorov-Smirnov test; the test statistic (0.12074) is less than the critical value (0.23059), which supports the hypothesis of a Weibull distribution for the data (p-value 0.45096 > 0.01). The test statistic (the χ² goodness of fit) for the classical and proposed methods is summarized in table (5).

Conclusions
1. All the proposed methods are more efficient than the classical method in estimating the scale and shape parameters of the Weibull distribution, based on the average of the MSE criterion, in all cases.
2. The Coif4 wavelet with the universal threshold method is the most efficient compared with all the other proposed methods and with the classical method, because it has the lowest averages of the criteria MSE(α) and MSE(β).
3. For the real data, the proposed method (Db2 with the universal threshold method) is the best.

Recommendations
1. The proposed method for estimating the parameters of the Weibull distribution is recommended.
2. The use of other types of orthogonal wavelets (bior, rbio, and dmey), other methods for estimating the threshold level, and other thresholding rules in estimating the parameters of the Weibull distribution and the survival function is also recommended.
3. Using a Bayesian approach with wavelet shrinkage in estimating the parameters of the Weibull, gamma, and Gompertz distributions is recommended as well.

Figure (2): Survival Function of the Weibull Distribution for the Contaminated Data
Figure (3): Survival Function of the Weibull Distribution for the Treated Data

Figure (4): The Probability Density Function of the Weibull Distribution for Real Data
Figure (4) shows the probability density function of the Weibull distribution of the kidney failure data for the time period (12-24).