COMPARATIVE ANALYSIS OF THE NORMALITY OF STATISTICAL CRITERIA FOR SAMPLES OF CONTAMINATED DATA
Abstract
The paper examines how contamination of samples by anomalous observations affects the reliability of statistical analysis and of hypothesis tests for sample homogeneity. Particular attention is given to visual data analysis as an effective tool for preliminary investigation: histograms, scatter plots, and density estimates make it possible to identify outliers visually, assess the shape of the distribution, and detect differences between samples. The purpose of the study is to evaluate the robustness of popular statistical criteria for testing normality when small samples are contaminated. The scientific novelty lies in a quantitative study of how different types of contamination affect the results of these criteria, and in a practical assessment of their behavior when the assumption of data homogeneity is violated. The practical significance lies in recommendations for practitioners on selecting the most suitable criterion for samples that may contain anomalous observations, taking into account the stability of the statistical methods. The research methods include numerical simulation of samples with controlled introduction of structural contamination, estimation of the frequency of false rejection or false acceptance of the null hypothesis, and comparative analysis of the results obtained with the following statistical criteria: Student's t-test for comparing the means of two samples; the one-sample Kolmogorov-Smirnov test for checking agreement between the empirical and theoretical distributions; the Anderson-Darling test for checking normality; and the two-sample Kolmogorov-Smirnov test for checking the homogeneity of two distributions. The results show that the criterion should be chosen according to the sample size and the expected level of contamination. Reporting the mean values and ranges over N repetitions of the experiment gives a clear picture of the stability and reliability of each test on contaminated data. Based on the experiments, practical recommendations are proposed for the preliminary diagnosis of samples and for choosing an approach to hypothesis testing when the data are contaminated.
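The simulation procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a simple mixture contamination model (each observation is replaced, with probability eps, by a draw from a shifted normal distribution), tests against a fully specified N(0, 1) null, and uses only the one-sample Kolmogorov-Smirnov test with an asymptotic p-value and a commonly used small-sample correction. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random
from statistics import NormalDist


def contaminated_sample(n, eps, shift, scale, rng):
    """Draw n values from N(0, 1); each value is replaced, with
    probability eps, by a draw from N(shift, scale) (assumed model)."""
    return [
        rng.gauss(shift, scale) if rng.random() < eps else rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d, n, terms=100):
    """Approximate p-value from the asymptotic Kolmogorov distribution,
    with a small-sample correction applied to the argument."""
    t = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if t < 0.2:
        # The alternating series converges slowly here; the p-value is ~1.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def rejection_rate(n, eps, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples for which H0: N(0, 1) is rejected."""
    rng = random.Random(seed)
    cdf = NormalDist(0.0, 1.0).cdf
    rejections = 0
    for _ in range(reps):
        x = contaminated_sample(n, eps, shift=5.0, scale=1.0, rng=rng)
        if ks_pvalue(ks_statistic(x, cdf), n) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Compare a clean sample with two assumed contamination levels.
    for eps in (0.0, 0.1, 0.3):
        print(f"n=20, eps={eps}: rejection rate = {rejection_rate(20, eps):.3f}")
```

With eps = 0 the rejection rate estimates the test's actual significance level, and with eps > 0 it shows how often the contamination is detected. In practice the same loop could call library implementations of the other criteria named above, and the mean and range of the rates over repeated runs with different seeds could be reported.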
Copyright (c) 2025 Oleksii Klymenko, Serhii Vovk (Authors)

This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in the Journal of Rocket-Space Technology are licensed under the Creative Commons Attribution 4.0 International (CC BY) license. This means that you are free to:
- Share, copy, and redistribute the article in any medium or format
- Adapt, remix, transform, and build upon the article
as long as you provide appropriate credit to the original work, include the authors' names, article title, journal name, and indicate that the work is licensed under CC BY. Any use of the material should not imply endorsement by the authors or the journal.