COMPARATIVE ANALYSIS OF THE NORMALITY OF STATISTICAL CRITERIA FOR SAMPLES OF CONTAMINATED DATA

Keywords: criterion, method, distribution, stability, outliers, small samples, modeling, statistics, test

Abstract

The paper examines how contamination of samples with anomalous observations affects the reliability of statistical analysis results and of hypothesis tests for sample homogeneity. Particular attention is paid to visual data analysis as an effective tool for preliminary investigation: histograms, scatter plots, and density estimates make it possible to identify outliers visually, assess the shape of the distribution, and detect differences between samples. The purpose of the study is to evaluate the robustness of popular statistical tests of normality in the presence of contamination in small samples. The scientific novelty lies in a quantitative study of how different types of contamination affect the results of popular tests, and in a practical assessment of their behavior when the assumption of data homogeneity is violated. The practical novelty lies in recommendations for practitioners on choosing the optimal test for samples that may contain anomalous observations, taking into account the stability of the statistical methods. The research methods include numerical modeling of samples with controlled introduction of structural contamination, estimation of the frequency of false rejection and false acceptance of the null hypothesis, and a comparative analysis of the results obtained with the following tests: Student's t-test for comparing the means of two samples; the one-sample Kolmogorov-Smirnov test for checking the agreement of an empirical distribution with a theoretical one; the Anderson-Darling test for checking normality; and the two-sample Kolmogorov-Smirnov test for checking the homogeneity of two distributions. The results show the importance of choosing an appropriate test depending on the sample size and the expected level of contamination.
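To make the four tests named above concrete, the following is a minimal sketch, not the authors' code, of one modeling step: drawing a small contaminated sample and computing each test statistic. The contamination model (a location-shift mixture in which each value comes from N(0, 1) with probability 1 - eps and from N(shift, 1) with probability eps) and all function names (`contaminated_sample`, `ks_one_sample`, `anderson_darling_normal`, `ks_two_sample`, `student_t`) are hypothetical illustrations; the abstract does not specify the contamination types actually used. The statistics are written in standard-library Python so the block is self-contained; they return raw statistics only, which would then be compared against tabulated critical values.

```python
import math
import random
import statistics


def contaminated_sample(n, eps, shift, rng):
    # Hypothetical contamination model: each value is drawn from N(0, 1)
    # with probability 1 - eps, or from N(shift, 1) with probability eps.
    return [rng.gauss(shift if rng.random() < eps else 0.0, 1.0) for _ in range(n)]


def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_one_sample(xs, mu=0.0, sigma=1.0):
    # One-sample Kolmogorov-Smirnov statistic D against a fully specified N(mu, sigma^2).
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def _clip(p):
    # Keep probabilities away from 0 and 1 so that log() stays finite.
    return min(max(p, 1e-15), 1.0 - 1e-15)


def anderson_darling_normal(xs):
    # Anderson-Darling A^2 for normality, with mean and s.d. estimated from the sample.
    xs = sorted(xs)
    n = len(xs)
    mu, sd = statistics.fmean(xs), statistics.stdev(xs)
    total = 0.0
    for i in range(n):
        low = _clip(normal_cdf(xs[i], mu, sd))
        high = _clip(normal_cdf(xs[n - 1 - i], mu, sd))
        total += (2 * i + 1) * (math.log(low) + math.log(1.0 - high))
    return -n - total / n


def ks_two_sample(xs, ys):
    # Two-sample KS statistic: largest gap between the two empirical CDFs.
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def student_t(xs, ys):
    # Classical two-sample Student t statistic with pooled variance.
    n, m = len(xs), len(ys)
    pooled = ((n - 1) * statistics.variance(xs) + (m - 1) * statistics.variance(ys)) / (n + m - 2)
    return (statistics.fmean(xs) - statistics.fmean(ys)) / math.sqrt(pooled * (1 / n + 1 / m))


rng = random.Random(42)
clean = contaminated_sample(30, 0.0, 5.0, rng)  # eps = 0: uncontaminated reference
dirty = contaminated_sample(30, 0.1, 5.0, rng)  # about 10% of values shifted by +5
print("KS one-sample D:", ks_one_sample(dirty))
print("Anderson-Darling A^2:", anderson_darling_normal(dirty))
print("KS two-sample D:", ks_two_sample(clean, dirty))
print("Student t:", student_t(clean, dirty))
```

The one-sample KS test here assumes the null parameters are known in advance, whereas the Anderson-Darling variant estimates them from the data; the two therefore need different critical values.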
Reporting the mean values and ranges over N repetitions of the experiment makes it possible to assess visually the stability and reliability of each test on contaminated data. Based on the experiments, practical recommendations are proposed for the preliminary diagnosis of samples and for selecting the optimal approach to hypothesis testing when the data may be contaminated.
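The repetition scheme described above can be sketched as a small Monte Carlo loop. This is an illustrative sketch under stated assumptions, not the study's procedure: it repeats the one-sample Kolmogorov-Smirnov test against N(0, 1) N times for a hypothetical location-shift contamination (fraction `eps`, shift 4.0), then reports the mean and range of the statistic D together with the fraction of repetitions that reject the null hypothesis. The function names are hypothetical, and the critical value 1.358/sqrt(n) is the asymptotic 5% value for a fully specified null distribution, so it is only approximate for small n.

```python
import math
import random
import statistics


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(xs):
    # One-sample KS distance D between the sample and N(0, 1).
    xs = sorted(xs)
    n = len(xs)
    return max(max((i + 1) / n - normal_cdf(x), normal_cdf(x) - i / n)
               for i, x in enumerate(xs))


def repeat_experiment(n_reps, n, eps, shift, critical, seed=0):
    # Run n_reps independent repetitions on freshly contaminated samples and
    # return (mean D, min D, max D, fraction of repetitions with D > critical).
    rng = random.Random(seed)
    ds = []
    for _ in range(n_reps):
        xs = [rng.gauss(shift if rng.random() < eps else 0.0, 1.0) for _ in range(n)]
        ds.append(ks_statistic(xs))
    rejected = sum(d > critical for d in ds)
    return statistics.fmean(ds), min(ds), max(ds), rejected / n_reps


n = 30
crit = 1.358 / math.sqrt(n)  # asymptotic 5% critical value; approximate for n = 30
for eps in (0.0, 0.05, 0.1, 0.2):
    mean_d, lo, hi, freq = repeat_experiment(1000, n, eps, 4.0, crit)
    print(f"eps={eps:.2f}  mean D={mean_d:.3f}  range=[{lo:.3f}, {hi:.3f}]  reject={freq:.3f}")
```

With `eps = 0` the rejection fraction estimates the false-rejection rate (it should sit near 5% if the approximation holds); as `eps` grows it shows how often contamination drives the test to reject normality.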

Author Biography

Oleksii Klymenko, Oles Honchar Dnipro National University

Klymenko Oleksii Denysovych
Postgraduate student of the Department of Cybersecurity and Computer-Integrated Technologies
Faculty of Physics and Engineering, DNU
Specialty 174 Automation, Computer-Integrated Technologies and Robotics

Published
2025-12-29
How to Cite
Klymenko, O., & Vovk, S. (2025). COMPARATIVE ANALYSIS OF THE NORMALITY OF STATISTICAL CRITERIA FOR SAMPLES OF CONTAMINATED DATA. Journal of Rocket-Space Technology, 34(4), 97-104. https://doi.org/10.15421/452551
Section
Applied mechanics and mathematical methods