Assessing the Uncertainties of Machine Learning Methods for Predicting the Hydrological Responses of Low Impact Development Practices - IAHR World Congress 2019, "Water Connecting the World"

Low impact development (LID) practices, such as green roofs and bioretention cells, are regarded as environmentally friendly alternatives to conventional drainage infrastructure. Accurate prediction of the hydrological responses of LID practices is essential for assessing and optimizing LID designs. However, the accuracy of commonly used process-based hydrological models can be limited by their model structure and by the availability of field measurements. Machine learning methods can potentially avoid these issues by directly modeling the relationship between the input (e.g., rainfall time series) and the response (e.g., outflow hydrograph) of a system. However, training machine learning models involves considerable uncertainties. As a case study, the relationship between rainfall time series and outflow rates at an LID site in the U.S. is modeled using 11 commonly used machine learning models, including random forest, k-nearest neighbors, and gradient boosting machine. These models are trained on high-temporal-resolution data using formal machine learning procedures, which include feature engineering, pre-processing, model tuning, and resampling. To assess the uncertainties involved, different methods are applied within these training procedures. For example, in feature engineering, the original high-resolution time series is transformed into different sets of features (e.g., mean and peak rainfall intensity over the past two hours) that serve as inputs to the machine learning models, and different types of transformations are used to pre-process these features. The results show that some machine learning models can achieve prediction accuracy comparable to or better than that of process-based models, and that the performance of different machine learning models can vary significantly. The feature engineering and resampling procedures are found to have significant impacts on the quality of the trained models.
Evaluating multiple machine learning models and using various methods in model training are crucial for assessing the uncertainties involved in machine learning.
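The feature-engineering and pre-processing steps mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's code. It assumes rainfall is recorded as depth (mm) at a fixed interval (5 minutes by default, a hypothetical choice). For each time step it derives the mean and peak rainfall intensity (mm/h) over the trailing two hours, then applies one possible pre-processing transform, z-score standardization, fitted on training rows only. The function names `rolling_rain_features`, `fit_standardizer`, and `apply_standardizer` are hypothetical.

```python
from statistics import mean, pstdev


def rolling_rain_features(depth_mm, step_min=5, window_min=120):
    """Return (mean, peak) rainfall intensity in mm/h over the trailing window.

    Assumes depth_mm[i] is the rainfall depth (mm) recorded during step i at a
    fixed interval of step_min minutes. The window ends at (and includes) step i;
    the first few steps use a shorter, partial window.
    """
    if window_min % step_min != 0:
        raise ValueError("window_min must be a multiple of step_min")
    n_steps = window_min // step_min        # e.g. 120 min / 5 min = 24 steps
    to_mm_per_h = 60.0 / step_min           # converts depth per step to mm/h
    features = []
    for i in range(len(depth_mm)):
        window = depth_mm[max(0, i - n_steps + 1): i + 1]
        intensities = [d * to_mm_per_h for d in window]
        features.append((mean(intensities), max(intensities)))
    return features


def fit_standardizer(rows):
    """Compute per-column mean and population std from training rows only."""
    params = []
    for column in zip(*rows):
        mu = mean(column)
        sd = pstdev(column)
        params.append((mu, sd if sd > 0 else 1.0))  # guard constant columns
    return params


def apply_standardizer(rows, params):
    """Z-score each column using parameters fitted on the training data."""
    return [
        tuple((x - mu) / sd for x, (mu, sd) in zip(row, params))
        for row in rows
    ]


if __name__ == "__main__":
    # Toy series with 30-minute steps and a 60-minute window (hypothetical data).
    feats = rolling_rain_features([0.0, 1.0, 2.0, 0.5], step_min=30, window_min=60)
    print(feats)  # [(0.0, 0.0), (1.0, 2.0), (3.0, 4.0), (2.5, 4.0)]
    print(apply_standardizer(feats, fit_standardizer(feats)))
```

Fitting the standardizer on the training period only, and reusing those parameters for validation or test data, keeps information from the evaluation period out of training. Swapping in another window length, feature set, or transformation (for example, a log transform) is one way to probe how these choices affect the trained models.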