Air Pollution Level Prediction and Comparative Analysis of Machine Learning Models: A Case Study of Delhi AQI
DOI:
https://doi.org/10.31305/rrijm.2025.v10.n11.027
Keywords:
Air Quality Index (AQI), Machine Learning, Linear Regression, Random Forest, XGBoost, Pollution Prediction, Delhi
Abstract
Air pollution poses significant health and environmental challenges, particularly in urban regions such as Delhi, India. Accurate prediction of the Air Quality Index (AQI) is essential for public health planning and pollution mitigation. This study compares the predictive performance of three machine learning models (Linear Regression, Random Forest, and XGBoost) on AQI data from Delhi. The models were trained on the pollutant features PM2.5, PM10, CO, NO2, SO2, and O3 and evaluated using R², Adjusted R², Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Linear Regression achieved near-perfect prediction (R² = 1.0), Random Forest showed excellent performance (R² ≈ 0.9999988), and XGBoost demonstrated very good prediction (R² ≈ 0.999956). Feature importance analysis revealed the relative influence of each pollutant, with PM2.5 the most dominant. The results highlight the strengths and limitations of each model and provide insights for designing robust AQI prediction systems.
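The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so this assumes the standard textbook definitions, including Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) for n samples and p predictors. The function name `regression_metrics` and its arguments are hypothetical and are not taken from the study.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return R^2, adjusted R^2, RMSE and MAE for paired observations.

    Assumes the standard definitions; `n_features` is the number of
    predictors (p) used by the model that produced `y_pred`.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    if n <= n_features + 1:
        raise ValueError("adjusted R^2 needs more samples than n_features + 1")

    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Residual and total sums of squares.
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")

    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalises R^2 for the number of predictors.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse, "mae": mae}
```

With the six pollutant features listed in the abstract, a call would pass `n_features=6` together with the observed and predicted AQI values from a held-out test set. Because RMSE and MAE are expressed in AQI units while R² is unitless, reporting them together gives a fuller picture than R² alone when the R² values of the compared models differ only in the later decimal places.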
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This is an open access article under the CC BY-NC-ND license Creative Commons Attribution-Noncommercial 4.0 International (CC BY-NC 4.0).