Zero Truncated Models in Regression Analysis: An Examination of Their Advantages on Small Mean Values
Year 2025, Volume: 30 Issue: 1, 102 - 112, 29.04.2025
Rıdvan Kara, Abdullah Yeşilova
Abstract
This study compares two of the most commonly used zero-truncated regression models for positive count data, Zero-Truncated Poisson and Zero-Truncated Negative Binomial, with the classical Poisson and Negative Binomial regression models, and examines the role of the mean of the dependent variable in model selection. Simulations were first conducted using different mean values for the dependent variable, and model performance was then compared on two real data sets constructed from crime data published by the Turkish Statistical Institute (TSI). AIC, BIC, and residual plots were used to determine the most suitable model. Zero-truncated models were found to perform better than the classical models when the mean of the dependent variable is below 5.
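The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation design: the single covariate, the coefficient values, the sample size, and all function names (`simulate_positive_counts`, `mean_neg_loglik`, `fit_aic`, `compare`) are assumptions made for illustration. The sketch fits a classical Poisson regression and a zero-truncated Poisson regression by maximum likelihood to simulated zero-free counts and reports the AIC of each. The only difference between the two likelihoods is the truncation term, which divides the Poisson probability by 1 - exp(-mu).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def simulate_positive_counts(beta, n, rng):
    """Draw n Poisson counts with a log link and keep only those with y > 0."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = rng.poisson(np.exp(beta[0] + beta[1] * x))
    keep = y > 0
    return x[keep], y[keep]


def mean_neg_loglik(beta, x, y, truncated):
    """Average negative log-likelihood of a (zero-truncated) Poisson regression."""
    eta = beta[0] + beta[1] * x
    mu = np.exp(eta)
    ll = y * eta - mu - gammaln(y + 1.0)
    if truncated:
        # Zero truncation rescales the pmf by 1 / (1 - P(Y = 0)).
        ll = ll - np.log1p(-np.exp(-mu))
    return -np.mean(ll)


def fit_aic(x, y, truncated):
    """Fit by maximum likelihood and return AIC = 2k - 2 * log-likelihood."""
    start = np.array([np.log(y.mean()), 0.0])
    res = minimize(mean_neg_loglik, start, args=(x, y, truncated), method="BFGS")
    loglik = -res.fun * len(y)
    return 2 * len(start) - 2 * loglik


def compare(intercept, n=2000, seed=0):
    """Simulate zero-free counts and compare the AIC of both models."""
    rng = np.random.default_rng(seed)
    x, y = simulate_positive_counts(np.array([intercept, 0.3]), n, rng)
    return {
        "mean": y.mean(),
        "aic_poisson": fit_aic(x, y, truncated=False),
        "aic_ztp": fit_aic(x, y, truncated=True),
    }


if __name__ == "__main__":
    for intercept in (0.0, 1.0, 3.0):  # hypothetical small, medium, large means
        print(intercept, compare(intercept))
```

Because 1 - exp(-mu) is less than 1, the truncated log-likelihood on zero-free data is never lower than the classical Poisson one at the same coefficients. The AIC gap is therefore large when the mean is small and shrinks toward zero as the mean grows, which is consistent with the pattern the abstract reports. A negative binomial version would add a dispersion parameter to both likelihoods and is omitted here to keep the sketch short.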
Supporting Institution
This study was financially supported by the Scientific Research Projects Council of Van Yüzüncü Yıl University under project number FDK-2019-7987.
Acknowledgments
The authors gratefully acknowledge the financial support provided by the Scientific Research Projects Council of Van Yüzüncü Yıl University under project number FDK-2019-7987.