Research Article

Zero Truncated Models in Regression Analysis: An Examination of Their Advantages on Small Mean Values

Year 2025, Volume: 30 Issue: 1, 102-112, 29.04.2025
https://doi.org/10.53433/yyufbed.1590611

Abstract

This study compares two of the most commonly used zero-truncated regression models for positive count data, Zero-Truncated Poisson and Zero-Truncated Negative Binomial, with the classical Poisson and Negative Binomial regression models, and examines the role of the mean of the dependent variable in model selection. Simulations were first conducted using different mean values for the dependent variable. Model performance was then compared on two real data sets constructed from crime data published by the Turkish Statistical Institute (TSI). AIC, BIC, and residual plots were used to determine the most suitable model. The study found that zero-truncated models perform better than the classical models when the mean of the dependent variable is below 5.
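The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design: it assumes an intercept-only model with no covariates, covers only the Poisson pair (not the Negative Binomial pair), and uses only the Python standard library. The function names (`draw_ztp`, `ztp_mle`, `compare`), the sample size of 500, the seed, and the two rate values are all hypothetical choices made for illustration. The zero-truncated Poisson likelihood used here is the ordinary Poisson probability divided by P(Y > 0) = 1 - exp(-lambda).

```python
import math
import random


def draw_ztp(lam, n, rng):
    """Draw n zero-truncated Poisson values by discarding zeros."""
    out = []
    limit = math.exp(-lam)
    while len(out) < n:
        # Knuth's multiplication method for one Poisson draw
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        if k > 0:
            out.append(k)
    return out


def poisson_loglik(y, lam):
    return sum(v * math.log(lam) - lam - math.lgamma(v + 1) for v in y)


def ztp_loglik(y, lam):
    # Poisson log-likelihood minus n * log(1 - exp(-lam))
    return poisson_loglik(y, lam) - len(y) * math.log1p(-math.exp(-lam))


def ztp_mle(y):
    """Solve lam / (1 - exp(-lam)) = mean(y) by bisection (assumes mean(y) > 1)."""
    ybar = sum(y) / len(y)
    lo, hi = 1e-9, ybar
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < ybar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def info_criteria(loglik, n, k=1):
    """AIC and BIC for a model with k estimated parameters."""
    return {"AIC": 2 * k - 2 * loglik, "BIC": k * math.log(n) - 2 * loglik}


def compare(lam, n=500, seed=0):
    """Fit both intercept-only models to simulated positive counts."""
    rng = random.Random(seed)
    y = draw_ztp(lam, n, rng)
    ybar = sum(y) / n  # the Poisson MLE of the rate is the sample mean
    pois = info_criteria(poisson_loglik(y, ybar), n)
    ztp = info_criteria(ztp_loglik(y, ztp_mle(y)), n)
    return {"mean": ybar, "poisson": pois, "ztp": ztp}


if __name__ == "__main__":
    for lam in (1.5, 10.0):
        r = compare(lam)
        gap = r["poisson"]["AIC"] - r["ztp"]["AIC"]
        print(f"rate={lam}  sample mean={r['mean']:.2f}  AIC gap={gap:.3f}")
```

Because both models estimate one parameter in this sketch, the AIC and BIC gaps both equal twice the difference in maximized log-likelihood. The gap in favor of the truncated model is large when the mean is small and shrinks toward zero as the mean grows, since P(Y = 0) = exp(-lambda) becomes negligible. This is consistent with the abstract's conclusion but does not reproduce the reported threshold of 5.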

Supporting Institution

This work was financially supported by the Scientific Research Projects Council of Van Yüzüncü Yıl University under project number FDK-2019-7987.

Acknowledgements

The authors gratefully acknowledge the financial support provided by the Scientific Research Projects Council of Van Yüzüncü Yıl University under project number FDK-2019-7987.


Turkish title: Sıfır Değer Kesilmiş Modellerin Regresyon Analizindeki Avantajları: Küçük Ortalama Değerler Üzerine Bir İnceleme



Details

Primary Language: English
Subjects: Applied Statistics
Section: Natural Sciences and Mathematics
Authors

Rıdvan Kara 0000-0001-6977-4766

Abdullah Yeşilova 0000-0002-0666-8170

Publication Date: April 29, 2025
Submission Date: November 24, 2024
Acceptance Date: March 3, 2025
Published in Issue: Year 2025, Volume: 30, Issue: 1

Cite

APA Kara, R., & Yeşilova, A. (2025). Zero Truncated Models in Regression Analysis: An Examination of Their Advantages on Small Mean Values. Yüzüncü Yıl Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 30(1), 102-112. https://doi.org/10.53433/yyufbed.1590611