Journal of the American Statistical Association, 85, 633-639. S-estimators are asymptotically normal with rate of convergence n^{1/2} and their asymptotic. Supandi et al.: S-estimators were first introduced in the context of regression by Rousseeuw and Yohai (1984). Least squares, for example, minimizes the variance of the residuals and is a special case of S-estimators. The following theorem shows that the estimators from Rousseeuw (1984), Rousseeuw and Leroy (1987) and Rousseeuw and van Zomeren (1990) are zero breakdown and inconsistent. Robust tests for linear regression models based on estimates.
During the last ten years, high-breakdown methods have been a very popular topic among developers of statistical methods. Following seminal papers by Box (1953) and Tukey (1960), which demonstrated the need for robust statistical procedures, the theory of robust statistics blossomed in the 1960s and 1970s. The M-estimator takes the following form. Least trimmed squares (LTS) regression is based on the subset of h cases out of n whose least squares fit possesses the smallest sum of squared residuals. S-estimation is a high breakdown value method introduced by Rousseeuw and Yohai (1984). Robust regression by means of S-estimators.
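The LTS criterion described above can be illustrated with a short Python sketch. It assumes NumPy and uses random elemental starts followed by concentration steps (refit on the h cases with the smallest squared residuals). The function name `lts_fit` and its parameters are hypothetical, and this is a simplified approximation, not the published FAST-LTS algorithm.

```python
import numpy as np

def lts_fit(X, y, h, n_starts=50, n_csteps=20, seed=0):
    """Approximate LTS fit: minimize the sum of the h smallest squared residuals.

    Hypothetical helper for illustration; X must already contain an
    intercept column if one is wanted.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    best_beta, best_obj = None, np.inf
    for _ in range(n_starts):
        # Start from a least squares fit to a random subset of p cases.
        idx = rng.choice(n, size=p, replace=False)
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        for _ in range(n_csteps):
            r2 = (y - X @ beta) ** 2
            keep = np.argsort(r2)[:h]  # the h cases with smallest squared residuals
            new_beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            if np.allclose(new_beta, beta):
                break
            beta = new_beta
        # LTS objective: sum of the h smallest squared residuals.
        obj = np.sort((y - X @ beta) ** 2)[:h].sum()
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    return best_beta, best_obj

# Example: a line y = 1 + 2x with 20% of the responses replaced by outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=100)
y[:20] = 50.0
X = np.column_stack([np.ones(100), x])
beta_lts, objective = lts_fit(X, y, h=75)
```

The choice h = 75 is arbitrary here; smaller h gives more protection against outliers at the cost of using fewer cases in the final fit.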
Outlier detection using nonconvex penalized regression. However, all these estimates are highly inefficient when all the observations satisfy the regression model with normal errors. In this note it is shown that all regression-equivariant high-breakdown. For more details see Salibian-Barrera and Yohai (2006) or Thieler, Fried and Rathjens (2016). These estimates have a very high computational complexity, and thus the usual algorithms compute only approximate solutions. If most of the large sample theory in the text is covered, then the course should be limited to Ph. S-estimators of regression parameters, proposed by Rousseeuw and Yohai (1984), search for the slope and intercept values that minimize some measure of. The S-estimate was first introduced by Rousseeuw and Yohai (1984) as the estimation method that minimizes a specified scale estimator, and can be defined as.
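For orientation, the usual textbook form of this definition can be written as follows. The notation is an assumption, since the source snippet is cut off before its formula: r_i(β) = y_i − x_i'β are the residuals, ρ is a bounded, even loss function that is nondecreasing in |u|, and b is a tuning constant, commonly taken as the expectation of ρ under the standard normal distribution.

```latex
% M-scale s(\beta): implicit solution of
\frac{1}{n} \sum_{i=1}^{n} \rho\!\left( \frac{r_i(\beta)}{s(\beta)} \right) = b ,
\qquad r_i(\beta) = y_i - x_i^{\top}\beta .

% S-estimator: the coefficient vector with the smallest M-scale
\hat{\beta}_S = \arg\min_{\beta} \; s(\beta) ,
\qquad \hat{\sigma}_S = s(\hat{\beta}_S) .
```

Taking ρ(u) = u² and b = 1 gives s(β)² = (1/n) Σ r_i(β)², so least squares appears as a (non-robust) special case.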
The errors and the regressors are stationary long-range dependent Gaussian. The Olive and Hawkins paradigm, as illustrated by this book, is to give theory for the estimator actually used. With the same breakdown value, it has a higher statistical efficiency than LTS estimation. Rousseeuw (born October 1956) is a statistician known for his work on robust statistics and cluster analysis. We will consider estimators of scale defined by a function, which satisfy. The performance of this method was improved by the FAST-LTS algorithm of Rousseeuw and Van Driessen.
Rousseeuw and Yohai (1984) introduced S-estimators in univariate regression. The use of alternative regression methods in social. The default, equivalent to init = "S", uses as initial estimator an S-estimator (Rousseeuw and Yohai, 1984) which is computed using the Fast-S algorithm of Salibian-Barrera and Yohai (2006), calling lmrob. Donoho and Huber (1983) advocated a finite-sample version of the breakdown value, in line with Hodges's (1967) study in the univariate framework.
Part of the Springer Series in Statistics book series (SSS). (Rousseeuw, 1984). The asymptotic breakdown point is then defined as (2). These are all called high-breakdown estimators since they can be tuned to resist contamination in up to 50% of the observations. A robust learning approach for regression models based on. Setting int to true adds an intercept column to the design matrix. A fast procedure for outlier diagnostics in large regression. Some examples are repeated medians (Siegel, 1982), least median of squares (Rousseeuw, 1984), S-estimators (Rousseeuw and Yohai, 1984), MM-estimators (Yohai, 1987) and. MM-estimation, which was introduced by Yohai, combines high breakdown value estimation and M-estimation.
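The idea of a finite-sample breakdown value mentioned above can be illustrated numerically in the simplest (univariate location) setting. The sketch below assumes NumPy; the function `smallest_breaking_fraction` and the constants `bad_value` and `bound` are hypothetical choices for illustration. It replaces m observations by an arbitrarily bad value and reports the smallest fraction m/n at which an estimator is carried beyond a fixed bound.

```python
import numpy as np

def smallest_breaking_fraction(estimator, x, bad_value=1e12, bound=1e6):
    """Smallest fraction m/n of replaced points that pushes |estimate| past `bound`.

    Hypothetical helper: `bad_value` stands in for "arbitrarily bad" data.
    """
    n = len(x)
    for m in range(1, n + 1):
        z = np.array(x, dtype=float)
        z[:m] = bad_value  # replace m observations by the bad value
        if abs(estimator(z)) > bound:
            return m / n
    return 1.0

x = np.linspace(-1.0, 1.0, 21)
frac_mean = smallest_breaking_fraction(np.mean, x)      # one bad point is enough
frac_median = smallest_breaking_fraction(np.median, x)  # needs a majority of bad points
```

On this sample of 21 points the mean is carried away by a single replaced observation, while the median resists until 11 of the 21 points are replaced, which is the behaviour that high-breakdown regression estimators aim to reproduce.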
S-estimators. Encyclopedia of Statistical Sciences, Tyler, Wiley. In Robust and Nonlinear Time Series, edited by Franke, Härdle and Martin. Rousseeuw and Yohai (1984) proposed S-estimates, defined by the property of minimizing an M-estimate of the residual scale. The masking breakdown point of multivariate outlier identification rules.
User-friendly covariance estimation for heavy-tailed distributions. Yuan Ke, Stanislav Minsker, Zhao Ren, Qiang Sun and Wenxin Zhou. Abstract: We provide a survey of recent results on covariance estimation for heavy. Trade regression, Eindhoven University of Technology. A combination of the high breakdown value method and M-estimation is MM-estimation (Yohai, 1987). Research partially funded by UBC and an NSERC operating grant. For this reason, Rousseeuw and Yohai (1984) propose to minimize. Proof of the breakdown point of S-estimators can be found in Müller and Neykov (2003). Least trimmed squares (LTS) regression is based on the subset of h observations out of a total of n observations whose least squares fit possesses the smallest sum of squared residuals. Later, they were applied to the multivariate scale and location estimation problem (Davies, 1992).
The regression parameters, the scale parameters and the change-point are estimated using a method introduced by Rousseeuw and Yohai (1984). The ROBUSTREG procedure uses the FAST-LTS algorithm that was proposed by Rousseeuw and Van Driessen. An empirical comparison between robust estimation and robust. He obtained his PhD in 1981 at the Vrije Universiteit Brussel, following research carried out at the ETH in Zurich in the group of Frank Hampel, which led to a book on influence functions. M-estimation (Huber, 1973), S-estimation (Rousseeuw and Yohai, 1984), and MM-estimation (Yohai, 1987). The breakdown point approach is highly attractive for a number of reasons, not the least.
S-estimators of regression parameters, proposed by Rousseeuw and Yohai (1984), search for the slope and intercept values that minimize some measure of scale associated with the residuals. That is, the estimator minimizes the M-scale implicitly defined by equation (2). Rousseeuw (1984), who considered the asymptotic behaviour of the least. A generalization is given by S-estimators (Rousseeuw and. We will focus on MM-estimates of regression (Yohai, 1987) calculated with an initial S-estimate (Rousseeuw and Yohai, 1984) but our method can, in principle, ... The LTS-estimator and the S-estimator are asymptotically normal with rate of convergence n^{1/2} and their asymptotic. Rand Wilcox, in Introduction to Robust Estimation and Hypothesis Testing (Third Edition), 2012. Rousseeuw and Yohai (1984), by permission of Springer-Verlag, New York. Most of the literature on high breakdown multivariate robust statistics follows the Rousseeuw and Yohai paradigm. Two-sample S-estimators, for robustly estimating two location vectors and a common covariance matrix, were considered by He and Fung (2000). A robust testing procedure for the equality of covariance.
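The M-scale of the residuals that an S-estimator minimizes can be computed numerically. The following Python sketch assumes NumPy, a Tukey biweight ρ rescaled to have maximum 1, and the commonly quoted constants c = 1.547 and b = 0.5 for a 50% breakdown point; the function names and the fixed-point iteration are illustrative choices, not taken from the sources quoted here.

```python
import numpy as np

def rho_biweight(u, c=1.547):
    """Tukey biweight rho, rescaled so that rho(u) = 1 for |u| >= c."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v**2) ** 3

def m_scale(r, b=0.5, c=1.547, tol=1e-10, max_iter=200):
    """Solve mean(rho(r / s)) = b for s by fixed-point iteration (a sketch)."""
    r = np.asarray(r, dtype=float)
    s = np.median(np.abs(r)) / 0.6745  # normalized MAD as starting value
    if s == 0.0:
        return 0.0
    for _ in range(max_iter):
        # Rescale s so that the average rho moves toward b.
        s_new = s * np.sqrt(np.mean(rho_biweight(r / s, c)) / b)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s

# Example: clean standard normal residuals, then 30% gross outliers.
rng = np.random.default_rng(0)
resid = rng.normal(size=2000)
s_clean = m_scale(resid)
contaminated = resid.copy()
contaminated[:600] = 1e6
s_contaminated = m_scale(contaminated)
```

An S-estimator of regression would evaluate `m_scale` on the residuals of each candidate coefficient vector and keep the candidate with the smallest value; the search strategy over candidates is a separate design choice that this sketch does not cover.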
Therefore we develop a new algorithm called FAST-LTS. The use of alternative regression methods in social sciences. Donoho (1982), Donoho and Huber (1983), Rousseeuw (1984), Rousseeuw and Yohai (1984), Yohai (1986), Hampel et al. They classified an observation as an HLP if its corresponding RMD value exceeds the cutoff point. Robust regression by means of S-estimators. Fast and robust bootstrap for multivariate inference. It should be noted that the problem of bias robustness and the desirability of optimal bias robust estimators, namely min-max bias estimates, is clearly recognized in Hampel et al. It has the high breakdown property of S-estimation. Bias robust regression. Washington Univ., Dept. of Statistics, R.
The asymptotic distribution of MM-estimates has been studied by Yohai (1987) under the assumption that H = H_0 (the central parametric model). Paper 265-27: Robust regression and outlier detection. The S-estimator in change-point random model with long. EViews offers three different methods for robust least squares. A disadvantage of the procedure is the lack of assumptions related to the distribution of errors (Rousseeuw and Yohai, 1984). The following dataset can be found in The World Almanac and Book of Facts. The asymptotics of S-estimators in the linear regression. Stefanski, Department of Statistics, North Carolina State University. However, all these estimates have very low efficiency under a regression model with normal errors.
In the latter two papers, the authors construct regression estimators which have both high breakdown points and high efficiency. The asymptotic breakdown point of the S-estimator is given by Rousseeuw and Yohai (1984). Rousseeuw (1984) proposed the least median of squares (LMS) and the least trimmed squares (LTS). Robust estimator to deal with regression models having both. Introduction to Rousseeuw (1984) Least Median of Squares. A resampling design for computing. MM-estimation, introduced by Yohai (1987), combines high breakdown value estimation and M-estimation. Rousseeuw and Yohai (1984) proposed the S-estimator, which has an objective function similar to that of the M-estimator but with some constraints. This is called the S-estimator, and it is more robust than the classical estimators. Robust least squares refers to a variety of regression methods designed to be robust, or less sensitive, to outliers. Fast and robust diagnostic technique for the detection of. A measure of dispersion of the residuals that is less sensitive to extreme values than the variance.
Almost all start with an initial high breakdown point estimate not necessarily e. Rousseeuw and Yohai (1984) proposed a class of estimates based on the minimization of a robust M-estimate of the residual scale (S-estimates). In this paper we consider the problem of performing inference for a linear regression model using robust estimators. The goal of S-estimators is to have a simple high-breakdown regression estimator which shares the flexibility and nice asymptotic properties of M-estimators. The intercept adjustment technique is also used in this implementation. Yohai (1984), and S-estimators for multivariate location and scatter have been. Detecting influential observations in principal. Silvapulle (1991) proposed a new class of ridge-type M-estimators obtained by using M-estimators instead of LS estimators.
Unfortunately, another common feature of these estimators is their time-consuming nature. It has a higher statistical efficiency than S-estimation. The paper considers two-phase random design linear regression models. Part of the Lecture Notes in Statistics book series (LNS, volume 26). It turned out that the computation time of existing LTS algorithms grew too fast with the size of the data set, precluding their use for data mining. Robust regression diagnostics of influential observations in. Robust and Nonlinear Time Series Analysis, 256-272, 1984. Regression parameter.