Machine Learning-Based Prediction of Air Quality Index (AQI) in Mumbai: Comparative Analysis of Linear Regression, Random Forest, and XGBoost Models

Authors

  • Ashutosh Kumar Upadhyay, School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India
  • Sapna Ratan Shah, School of Computational & Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India

DOI:

https://doi.org/10.31305/rrijm.2025.v10.n11.030

Keywords:

Air Pollution Prediction, Particulate Matter (PM2.5, PM10), Machine Learning Models, Linear Regression, Random Forest, XGBoost, Environmental Data Analytics

Abstract

Accurate prediction of the Air Quality Index (AQI) is essential for understanding pollution dynamics and informing public health interventions. This study evaluates three supervised machine learning algorithms (Linear Regression, Random Forest Regressor, and XGBoost Regressor) for AQI prediction from pollutant concentrations in Mumbai. The dataset includes six key air pollutants: PM2.5, PM10, NO₂, SO₂, CO, and O₃. Model performance was assessed using R², Adjusted R², RMSE, and MAE. The Linear Regression model achieved an R² of 1.00, an RMSE of 2.62×10⁻¹⁴, and an MAE of 1.90×10⁻¹⁴, indicating a near-perfect fit. The Random Forest and XGBoost models also performed extremely well, with R² scores of 0.9999988 and 0.9999962, respectively. Overall, the findings demonstrate that even simple linear models can predict AQI effectively when pollutant-AQI relationships are strongly linear. The study confirms the robustness of ML algorithms and highlights their suitability for real-time air pollution monitoring and policy applications.
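
The four evaluation metrics named in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the sample AQI values, and the choice of six features (one per pollutant) are hypothetical and serve only to show how R², Adjusted R², RMSE, and MAE are computed from observed and predicted values.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r2(r2, n_samples, n_features):
    """R² penalised for the number of predictors (needs n_samples > n_features + 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical AQI values, invented for illustration only (not from the study).
observed = [100.0, 150.0, 200.0, 250.0, 300.0, 120.0, 180.0, 220.0]
predicted = [102.0, 148.0, 203.0, 247.0, 301.0, 119.0, 182.0, 218.0]

# Assumption: six predictors, one per pollutant listed in the abstract.
N_FEATURES = 6

r2 = r2_score(observed, predicted)
print("R2:         ", r2)
print("Adjusted R2:", adjusted_r2(r2, len(observed), N_FEATURES))
print("RMSE:       ", rmse(observed, predicted))
print("MAE:        ", mae(observed, predicted))
```

Because Adjusted R² divides by n_samples − n_features − 1, it is only defined when the number of samples exceeds the number of features by at least two; with a large dataset and six predictors it stays very close to R².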

Published

15-11-2025

How to Cite

Upadhyay, A. K., & Shah, S. R. (2025). Machine Learning-Based Prediction of Air Quality Index (AQI) in Mumbai: Comparative Analysis of Linear Regression, Random Forest, and XGBoost Models. RESEARCH REVIEW International Journal of Multidisciplinary, 10(11), 299–307. https://doi.org/10.31305/rrijm.2025.v10.n11.030
