Towards a Multi-Representational Approach to Prediction, Understanding and Discovery in Hydrology

(De La Fuente, Gupta, and Condon) This paper was recently published in Water Resources Research after a lengthy review process. It compares conceptual/physical and machine-learning approaches to modeling catchment systems.

Key to model development is the selection of an appropriate representational system, including both the representation of what is observed (the data), and the formal mathematical structure used to construct the input-state-output mapping. These choices are critical, because they completely determine the questions we can ask, the nature of the analyses and inferences we can perform, and the answers we can obtain. Accordingly, a representation that is suitable for one kind of investigation might be limited in its ability to support some other kind. Arguably, how different representational approaches affect what we can learn from data is poorly understood.

This paper explores three representational strategies as vehicles for understanding how catchment-scale hydrological processes vary across hydro-geo-climatologically diverse Chile. Specifically, we test a lumped water-balance model (GR4J), a data-based dynamical systems model (LSTM), and a data-based regression tree model (Random Forest). Insights were obtained regarding system memory encoded in data, spatial transferability by use of surrogate attributes, and informational deficiencies of the data set that limit our ability to learn an adequate input-output relationship.
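To make the idea of an input-state-output mapping concrete, the following is a minimal sketch of a single-store water balance in Python. It is a toy illustration only and is not GR4J or any model from the paper; the function name `simulate_bucket` and the parameters `capacity` and `k` are hypothetical choices made for this example.

```python
def simulate_bucket(precip, pet, capacity=100.0, k=0.1, storage=0.0):
    """Toy single-store water balance (an illustration, NOT GR4J).

    precip, pet: per-step rainfall and potential evaporation (same units, e.g. mm).
    capacity:    hypothetical maximum storage before the store spills.
    k:           hypothetical linear drainage coefficient (fraction per step).
    Returns the list of simulated flows and the final storage.
    """
    flows = []
    for p, e in zip(precip, pet):
        storage += p                           # input: rainfall fills the store
        storage -= min(e, storage)             # evaporation limited by available water
        spill = max(storage - capacity, 0.0)   # overflow when the store is full
        storage -= spill
        baseflow = k * storage                 # linear-reservoir drainage
        storage -= baseflow
        flows.append(spill + baseflow)         # output: simulated streamflow
    return flows, storage


# One rainy step followed by two dry steps: flow persists after the rain stops
# because the state (storage) carries memory of earlier inputs.
flows, final_storage = simulate_bucket([10.0, 0.0, 0.0], [0.0, 0.0, 0.0], k=0.5)
print(flows, final_storage)
```

In this sketch the only memory of past rainfall is the single storage state, which the structure fixes in advance. A data-based model such as an LSTM would instead have to learn its state dynamics from the data.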

As expected, each approach exhibits specific strengths, with LSTM providing the best characterization of dynamics, GR4J being the most robust under informationally deficient conditions, and the Random Forest regression-tree method being most supportive of interpretation. Overall, the contrasting nature of the three approaches suggests the value of adopting a multi-representational framework to more fully extract information from the data and, by doing so, find information that better facilitates the goals of robust prediction and improved understanding, ultimately supporting enhanced scientific discovery.

Hoshin Gupta