
Spatial data is often spread across multiple separate databases, which makes it difficult for decision makers to see the big picture. Combining these databases usually requires cleansing the input datasets first. One aspect of this task is filling in empty attribute values. My UNIGIS master thesis put forward a simple approach for doing so in urban drainage networks, based on the attributes of neighboring objects. Special emphasis was put on the question of how to deal with inconsistencies in the underlying network topology during these attribute assignments.
How Better Water Management Can Be Achieved through Data Validation
“Will we ever invent anything this useful again?” the Economist once asked [1]. The invention the British journal referred to is one that is often overlooked or even considered mundane: modern-day sanitation. In many parts of the world, Urban Drainage Networks (UDNs) are a key part of sanitation. These systems prevent diseases, avoid floods and enable the treatment of wastewater [2, 3].
Wastewater utility databases are crucial for the construction and maintenance of these networks. They can be used, for example, to store information about the location and condition of such facilities, or as inputs for hydraulic assessments using simulations [4, 5]. Unfortunately, these databases are usually incomplete [6, 7]. Moreover, in practice, they are often organized in multiple separate instances, each containing, for example, information about a single municipality.
In Switzerland, a need for greater integration among these separate datasets was recognized [8]. This led to the establishment of a centralized database combining multiple previously distinct urban drainage network datasets [9]. To ensure the integrity of this central management system, the heterogeneous input datasets for this platform first need to be validated [10]. One part of this validation is a check of whether several attributes are correctly assigned.
A New Hybrid Approach for Wastewater Utility Data Management
Many methods for automatically allocating missing attribute values in urban drainage and water distribution networks exist in the academic literature [11, 12, 13, 14, 15]. All of them, however, assign values wherever they are missing, irrespective of how confident these assignments are. In contrast, the completion method developed in my thesis only fills in attribute values where the assignment is expected to be very accurate. The remaining objects are assigned manually in this hybrid approach. The method thus combines the speed of the fully automated methods from academia with the precision of the manual assignments common in practice.
The autocompletion method can be understood by looking at Figure 1. This sketch depicts the basic elements of an urban drainage network: manholes, where water enters the sewer system, and pipes connecting the manholes. The arrows on the pipes indicate the flow direction of the (waste)water in this gravity-driven system. Finally, the colors of the elements indicate their owner. An owner is assigned automatically to an element with a missing value when the objects directly upstream and downstream of it have the same owner. An assignment is also made when the object downstream has a particular attribute value, here “private”. Following these rules, attribute values could be assigned to the pipe between manholes 3 and 4 and to manhole 2.4. No assignments would, however, be possible for manhole 5, or for manhole 3.1 and the pipe downstream of it. The concept was implemented in Python.

Figure 1: Example of an urban drainage network. Many more attributes, similar to the one describing ownership, can be present in such a network.
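The two assignment rules described above can be illustrated with a minimal Python sketch. It is not the code from the thesis: the function name `autocomplete_owner`, the dictionary-based network representation and the element IDs are hypothetical. Two further assumptions are made here: when several elements flow into one object, all of them must agree with the downstream neighbor, and each element is evaluated against the originally recorded values only, so assignments do not propagate.

```python
from typing import Dict, List, Optional


def autocomplete_owner(
    owner: Dict[str, Optional[str]],
    downstream: Dict[str, Optional[str]],
    dominant_value: str = "private",
) -> Dict[str, str]:
    """Propose owners for elements whose owner is missing (None)."""
    # Invert the downstream links to find the upstream neighbors of each element.
    upstream: Dict[str, List[str]] = {}
    for element, target in downstream.items():
        if target is not None:
            upstream.setdefault(target, []).append(element)

    proposals: Dict[str, str] = {}
    for element, value in owner.items():
        if value is not None:
            continue  # already attributed
        target = downstream.get(element)
        down_owner = owner.get(target) if target is not None else None
        if down_owner is None:
            continue  # no attributed downstream neighbor, leave for manual work
        # Rule: a particular downstream value ("private") is taken over directly.
        if down_owner == dominant_value:
            proposals[element] = down_owner
            continue
        # Rule: upstream and downstream neighbors are identically attributed.
        up_owners = {owner.get(u) for u in upstream.get(element, [])}
        if up_owners == {down_owner}:
            proposals[element] = down_owner
    return proposals


# Hypothetical toy network (M = manhole, P = pipe); not the network of Figure 1.
owner = {
    "M1": "public", "P1": "public", "M2": None, "P2": "public", "M3": "public",
    "M4": None, "P4": "private", "P3": None, "M5": None,
}
downstream = {
    "M1": "P1", "P1": "M2", "M2": "P2", "P2": "M3", "M3": "P3",
    "P3": "M5", "M5": None, "M4": "P4", "P4": "M3",
}

proposals = autocomplete_owner(owner, downstream)
print(proposals)  # {'M2': 'public', 'M4': 'private'}
```

In this toy example, `M2` lies between two public pipes and `M4` drains into a private pipe, so both receive a value. `P3` and `M5` stay empty because no attributed downstream neighbor is available, which corresponds to the cases left for manual assignment.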
Insights on How to Deal with Topological Inconsistencies
One challenge for the methodology was posed by topological errors in the datasets, for example incorrectly attributed flow directions. To assess how best to deal with this problem, two workflows were compared. Both aimed to create a dataset in which the topological inconsistencies are removed and all objects are properly attributed. In one workflow, however, the inconsistencies were removed before the Python scripts were executed, and in the other one afterwards. This second workflow was motivated by the aspiration to combine the topological rectifications with other data cleansing operations for increased efficiency.
Three attributes were thus autocompleted in four municipalities, both with and without prior correction of the underlying network topology. The resulting proportions of assigned values can be seen in Figure 2. While this percentage varies considerably between municipalities, only a marginal difference was found between the two workflows. Where topological errors were not removed before running the autocompletion scripts, 48.4% of the missing attribute values were assigned. When these inconsistencies were removed first, this proportion increased slightly to 51.9%.
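To make the notion of an incorrectly attributed flow direction more concrete, the following Python sketch flags one possible symptom of it: a manhole that only receives water but is not a known outlet. This is an illustrative heuristic and not the procedure used in the thesis. The function name `find_unexpected_sinks`, the pipe representation as (from, to) pairs and all IDs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def find_unexpected_sinks(
    pipes: Dict[str, Tuple[str, str]],
    known_outlets: Set[str],
) -> List[str]:
    """Return manholes with inflow but no outflow that are not known outlets."""
    inflow: Counter = Counter()
    outflow: Counter = Counter()
    for from_manhole, to_manhole in pipes.values():
        outflow[from_manhole] += 1
        inflow[to_manhole] += 1

    manholes = set(inflow) | set(outflow)
    # Counter returns 0 for manholes that never appear on one side.
    return sorted(
        m for m in manholes
        if inflow[m] > 0 and outflow[m] == 0 and m not in known_outlets
    )


# Hypothetical example: pipe P3 should run from C to the outlet D,
# but its flow direction was recorded the wrong way round.
pipes = {"P1": ("A", "B"), "P2": ("B", "C"), "P3": ("D", "C")}
suspicious = find_unexpected_sinks(pipes, known_outlets={"D"})
print(suspicious)  # ['C']
```

A flagged manhole is only a candidate for inspection. Which of the connected pipes is actually wrong still has to be decided by an operator or by additional information.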

Figure 2: Percentages of autocompletion for the two workflows in the four municipalities and the aggregated percentage for all municipalities combined.
To evaluate which of the two workflows is more efficient, their net benefits, measured in minutes of work, were determined relative to the workflow currently used in practice, which relies on manual attribute assignments only. The results of this comparison are given in Table 1. They clearly indicate a superior performance of the workflow in which topological errors are removed after attribute completion. The efficiency gains of lumping topological error removal together with other data cleansing tasks therefore outweigh the marginal increase in the percentage of autocompletion.
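How a per-municipality percentage and a combined percentage of this kind can be derived from counts is sketched below in Python. The counts are made up for illustration and are not the thesis data. The function name `autocompletion_rates` is hypothetical, and the sketch assumes that the combined value is computed by pooling the counts rather than by averaging the individual percentages.

```python
from typing import Dict, Tuple


def autocompletion_rates(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute the share of autocompleted values per group and for all groups pooled.

    counts maps a name to (assigned_values, missing_values_before_completion).
    """
    rates = {
        name: 100.0 * assigned / missing
        for name, (assigned, missing) in counts.items()
    }
    total_assigned = sum(assigned for assigned, _ in counts.values())
    total_missing = sum(missing for _, missing in counts.values())
    # Pooled rate: every missing value counts equally, regardless of municipality.
    rates["all combined"] = 100.0 * total_assigned / total_missing
    return rates


# Hypothetical counts, for illustration only.
example_counts = {"Municipality A": (30, 40), "Municipality B": (10, 60)}
rates = autocompletion_rates(example_counts)
print(rates["Municipality A"], rates["all combined"])  # 75.0 40.0
```

With pooled counts, a municipality with many missing values influences the combined percentage more strongly than a small one. In the example, the pooled rate of 40.0% therefore differs from the plain average of the two individual percentages.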

Table 1: Net benefits of the two compared workflows.
Finally, it must be noted that defective network topologies could theoretically lead to erroneous attribute assignments. Such errors were searched for meticulously but not found in any of the municipalities assessed. This observation is a strong empirical indication that a correct network topology is not a precondition for the safe application of the scripts developed.
Putting the Outcome into Perspective and Identifying New Challenges
Compared with other approaches for automatic attribute assignment, the use of input data with an erroneous network topology can be identified as a novelty of this thesis. Previous studies [11, 12, 13, 14, 15] always used, implicitly or explicitly, input datasets without topological errors. The simplicity of the approach, compared to earlier, more sophisticated models, was also found to be potentially beneficial for its acceptance by decision-makers [16].
Furthermore, a quantitative comparison with selected models from the literature [11, 12] showed that these models assigned more objects correctly. The precision of the assigned values, however, was found to be superior for the simple, robust method developed here.
Ultimately, wastewater utility management could be further assisted by combining the autocompletion procedure with the models from the literature. In such a setup, the results of the implemented Python script would be inserted directly into the utility database, while the results of the more sophisticated models would first be checked by an operator to ensure correct assignments. This would allow for further efficiency gains without compromising on quality. Methods to automatically rectify topological errors can also be envisioned, especially for errors caused by wrongly attributed flow directions of pipes. Surely, these would be great topics for future UNIGIS master theses!
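The proposed combination of the two kinds of methods could be organized roughly as in the following Python sketch. It only illustrates the routing idea and does not represent an existing implementation. The function name `route_proposals` and the dictionaries holding proposed values are hypothetical, and the sketch assumes that a rule-based proposal takes precedence when both sources suggest a value for the same element.

```python
from typing import Dict, List, Tuple


def route_proposals(
    rule_based: Dict[str, str],
    model_based: Dict[str, str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Split proposals into direct inserts and an operator review queue."""
    # Rule-based results are considered precise enough to be written directly.
    accepted = dict(rule_based)
    # Model results are queued for a manual check, unless a rule already decided.
    review_queue = [
        (element, value)
        for element, value in model_based.items()
        if element not in accepted
    ]
    return accepted, review_queue


# Hypothetical proposals, for illustration only.
rule_based = {"M2": "public"}
model_based = {"M2": "private", "P3": "public"}
accepted, review_queue = route_proposals(rule_based, model_based)
print(accepted)      # {'M2': 'public'}
print(review_queue)  # [('P3', 'public')]
```

Only the entries in `review_queue` would need operator time, which is where the additional efficiency gain over purely manual completion of the remaining objects would come from.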
References
[1] Economist, “The great innovation debate,” 12 January 2013. [Online]. Available: https://www.economist.com/leaders/2013/01/12/the-great-innovation-debate.
[2] P. Bach, W. Rauch, P. Mikkelsen, D. McCarthy and A. Deletic, “A critical review of integrated urban water modelling – Urban drainage and beyond,” Environmental Modelling & Software, vol. 54, pp. 88-107, April 2014.
[3] T. Larsen and W. Gujer, “The concept of sustainable Urban Water Management,” Water Science and Technology, pp. 3-10, 1997.
[4] J. Schilling and J. Tränckner, “Generate_SWMM_inp: An Open-Source QGIS Plugin to Import and Export Model Input Files for SWMM,” Water, vol. 14, no. 14, p. 2262, 2022.
[5] C. Egger, A. Scheidegger, P. Reichert and M. Maurer, “Sewer deterioration modeling with condition data lacking historical records,” Water Research, vol. 47, no. 17, pp. 6762-6779, 2013.
[6] M. Bilal, W. Khan, J. Muggleton, E. Rustighi, H. Jenks, S. R. Pennock, P. R. Atkins and A. Cohn, “Inferring the most probable maps of underground utilities using Bayesian mapping model,” Journal of Applied Geophysics, vol. 150, pp. 52-68, 2018.
[7] N. Metje, P. Atkins, M. Brennan, D. Chapman, H. Lim, J. Machell, J. Muggleton, S. Pennock, J. Ratcliffe and M. Redfern, “Mapping the Underworld – State-of-the-art review,” Tunnelling and Underground Space Technology, vol. 22, no. 5-6, pp. 568-586, 2007.
[8] M. Maurer, F. Chawla, J. von Horn and P. Staufer, “Abwasserentsorgung 2025 in der Schweiz,” Eawag: Das Wasserforschungs-Institut des ETH-Bereichs, Dübendorf, 2012.
[9] R. Battaglia, “Aqua und Gas,” Fachverband für Wasser, Gas und Wärme, 10 October 2020. [Online]. Available: https://www.aquaetgas.ch/wasser/abwasser/20201010_ag10_ohne-daten-keine-taten/. [Accessed 6 April 2026].
[10] S. Burckhard, “Aqua und Gas,” Fachverband für Wasser, Gas und Wärme, November 2017. [Online]. Available: https://vsa.ch/wp-content/uploads/2020/04/201711_Sauber_Gewaesser_dank_hoher_Datenqualitaet_Burckhardt_AquaetGaz.pdf. [Accessed 6 April 2026].
[11] Y. Belghaddar, N. Chahinian, A. Seriai, A. Begdouri, R. Abdou and C. Delenne, “Graph convolutional networks: Application to database completion of wastewater networks,” Water, vol. 13, no. 12, p. 1681, 2021.
[12] M. Hajibabaei, S. Hesarkazzazi and R. Sitzenfrei, “Filling data gaps in urban drainage networks: An automated graph theory framework for data collection and reconstruction,” Water Research, vol. 287, part A, p. 124272, 2025.
[13] G. Kabir, S. Tesfamariam, J. Hemsing and R. Sadiq, “Handling incomplete and missing data in water network database using imputation methods,” Sustainable and Resilient Infrastructure, vol. 5, no. 6, pp. 365-377, 2020.
[14] F. Tscheikner-Gratl, R. Sitzenfrei, W. Rauch and M. Kleidorfer, “Enhancement of limited water supply network data for deterioration modelling and determination of rehabilitation rate,” Structure and Infrastructure Engineering, vol. 12, no. 3, pp. 366-380, 2016.
[15] M. Hajibabaei, S. Hesarkazzazi, A. Minaei, A. Dastgir and R. Sitzenfrei, “Using complex network theory for missing data reconstruction in water distribution networks,” Sustainable Cities and Society, vol. 101, p. 105114, 2024.
[16] O. Loyola-Gonzalez, “Black-box vs. white-box: Understanding their advantages and weaknesses from a practical point of view,” IEEE Access, vol. 7, pp. 154096-154113, 2019.
About the Author

Michael Demarmels works as a project engineer in urban drainage. He obtained a master’s degree in environmental engineering from ETH Zürich in 2018 and one in geoinformatics from UNIGIS Salzburg in 2026. Since 2022, he has been employed at Infragon Ingenieure AG in Burgdorf, Switzerland.
Master Thesis: https://unigis.at/files/Mastertheses/Full/108365.pdf
LinkedIn: https://ch.linkedin.com/in/michael-demarmels-b32466226
E-Mail: michael.demarmels@gmail.com

