Machine learning and hydrodynamic proxies for enhanced rapid tsunami vulnerability assessment | Communications … – Nature.com

Synthetic variables for shielding mechanism and debris impact as proxies for water velocity

To comprehensively analyze the individual contributions of the three approaches for accounting for water velocity, we systematically trained different eXtra Trees (XT) models33, each featuring a unique combination of input variables. The reference scenario (ID0) serves as both the initial benchmark and the foundational baseline, encompassing the minimum set of variables retained across all subsequent scenarios. This baseline incorporates only the basic input variables from the original MLIT database, enriched with those geospatial variables introduced by Di Bacco et al.23 that are the most straightforward to compute. The additional models are then generated by iteratively introducing features related, directly or indirectly, to water velocity. This stepwise approach allows us to isolate the incremental improvement in predictive accuracy attributable to each individual component. Table 1 in Methods offers a concise overview of all tested variables, with those included in the reference scenario highlighted in italics.

The core results of the analysis, which assesses how predictive performance varies among the trained models, are summarized in Fig. 1. The figure shows the global average accuracy, expressed as the hit rate (HR) on the test set, achieved by each model across ten training sessions. Each column represents a specific combination of input features, with x markers indicating the variables excluded from that model's training. The circles give insight into the importance of individual input features for the models' predictive performance: their size corresponds to the mean decrease in accuracy (mda) when each single variable is randomly shuffled.

Fig. 1: Circle size reflects the mean decrease in accuracy (mda) when individual variables are shuffled; x markers indicate variables excluded from model training.
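The mean decrease in accuracy used for the circles in Fig. 1 is a permutation importance: shuffle one input column, re-score the model, and average the drop in hit rate over several shuffles. The stdlib-only sketch below illustrates the idea; it is not the authors' implementation, and the toy `predict` function and data are hypothetical. For a fitted scikit-learn estimator such as `ExtraTreesClassifier`, `sklearn.inspection.permutation_importance` performs the same kind of computation.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def hit_rate(predict: Callable[[Row], int], X: List[List[float]], y: List[int]) -> float:
    """Fraction of records whose predicted class equals the observed class."""
    hits = sum(1 for row, label in zip(X, y) if predict(row) == label)
    return hits / len(y)


def mean_decrease_in_accuracy(predict, X, y, feature, n_repeats=10, seed=0):
    """Average drop in hit rate when column `feature` is randomly shuffled.

    Only the chosen column is permuted; the other columns and the labels stay
    aligned, so the drop measures how much the model relies on that feature.
    """
    rng = random.Random(seed)
    reference = hit_rate(predict, X, y)
    drops = []
    for _ in range(n_repeats):
        column = [row[feature] for row in X]
        rng.shuffle(column)
        # Work on a copy so the original dataset is left untouched.
        shuffled = [list(row) for row in X]
        for row, value in zip(shuffled, column):
            row[feature] = value
        drops.append(reference - hit_rate(predict, shuffled, y))
    return sum(drops) / n_repeats


# Hypothetical toy data: the label depends only on feature 0.
X = [[i / 100, (i * 37 % 100) / 100] for i in range(100)]
y = [int(row[0] > 0.5) for row in X]


def predict(row):
    return int(row[0] > 0.5)


print(mean_decrease_in_accuracy(predict, X, y, feature=0))  # large drop
print(mean_decrease_in_accuracy(predict, X, y, feature=1))  # → 0.0
```

A feature the model never uses yields an mda of exactly zero, while an essential one produces a large drop, which is how circle sizes in Fig. 1 should be read.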

The pair plot in Fig. 2, which shows the correlations and distributions of the considered velocity-related variables and of Distance across the seven damage classes in the MLIT dataset, was generated to support the interpretation of the results and enrich the discussion. It uses scatter plots to display the relationship between each pair of variables, while the diagonal shows kernel density plots of the individual features.

Fig. 2: The pie chart summarizes the distribution of the damage states within the dataset (shades from light pink to violet). The pair plot displays the relationship between each pair of variables, with kernel density plots of the individual features on the diagonal.

The baseline model (ID0), used as a reference because it excludes any velocity information, attains an average accuracy of 0.836. In ID1, the only velocity information added is the direct contribution of vsim, yielding a modest improvement to an accuracy of 0.848. The next model, ID2, which mirrors ID1 but replaces vsim with vc, shows a decline in performance, with an accuracy of 0.828. We attribute this decrease to the redundancy between vc and inundation depth (h), reflected both in the two variables sharing their importance and in the reduced importance of h compared with the previous case. Essentially, when both variables are included, the model can be confused: h, which is a relevant variable on its own, may appear less important once vc is added, since vc provides essentially the same information in a different form.
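The redundancy argument can be made concrete with a small sketch. It assumes, purely for illustration, that vc is a depth-based velocity estimate of the form vc = Fr·√(g·h) with a constant, hypothetical Froude number `FROUDE`; the actual definition of vc is given in the paper's Methods and may differ. Any strictly increasing function of h ranks buildings exactly as h does, so their Spearman rank correlation is 1, and any threshold split a tree could place on vc can be reproduced by a split on h.

```python
import math

G = 9.81       # gravitational acceleration (m/s^2)
FROUDE = 1.0   # hypothetical constant Froude number


def depth_based_velocity(h, froude=FROUDE):
    """Illustrative vc = Fr * sqrt(g * h); an assumption, not the paper's formula."""
    return froude * math.sqrt(G * h)


def ranks(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


depths = [0.3, 1.2, 2.5, 4.0, 6.8, 9.1, 12.0]  # hypothetical depths (m)
vc = [depth_based_velocity(h) for h in depths]
print(round(spearman(depths, vc), 6))  # → 1.0
```

Because tree splits depend only on the ordering of a feature's values, a variable that is a monotone transform of h adds no new partitions of the data, consistent with the drop observed for ID2.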

The analysis proceeds with the introduction of buffer-related proxies to account for possible dynamic water effects on damage. We first isolate the effect of the two mechanisms considered: shielding by structures within the buffers (NShArea and NSW; ID3) and debris impact (NDIArea; ID4). In both cases accuracy improves, reaching 0.877 and 0.865, respectively. Their combined effect is considered in model ID5, which yields only a marginal further improvement (0.878). This is due to the noticeable correlation between NShArea and NDIArea, especially for the more severe damage levels (Fig. 2), with the two variables sharing their overall importance. Combination ID6, which adds vc, does not increase accuracy relative to the previous model (0.871), confirming that a variable directly derived from another contributes only redundant information.
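Buffer-based proxies of this kind can be sketched as area shares of a buffer around each building. Later in the text, NDIArea = 0.3 is glossed as 30% of the buffer area covered by washed-away buildings; the sketch follows that reading and, as an additional assumption, computes a shielding share from all neighbouring footprints. The circular buffer, the `radius`, the centroid-in-buffer test and the `Building` fields are hypothetical simplifications; the paper's exact definitions (including NSW, not modelled here) are in its Methods, and a GIS workflow would intersect footprint polygons with the buffer instead.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Building:
    x: float            # easting in metres
    y: float            # northing in metres
    footprint: float    # plan area in m^2
    washed_away: bool   # hypothetical flag: observed as washed away


def buffer_proxies(target: Building, others: Iterable[Building],
                   radius: float) -> Tuple[float, float]:
    """Return (shielding_share, debris_share) of a circular buffer.

    shielding_share: footprint of all neighbouring buildings / buffer area
    debris_share:    footprint of washed-away neighbours / buffer area
    Both definitions are illustrative assumptions.
    """
    buffer_area = math.pi * radius ** 2
    shielding = debris = 0.0
    for b in others:
        if b is target:
            continue  # a building is not its own neighbour
        # Simplification: a neighbour counts if its centroid lies in the buffer.
        if math.hypot(b.x - target.x, b.y - target.y) <= radius:
            shielding += b.footprint
            if b.washed_away:
                debris += b.footprint
    return shielding / buffer_area, debris / buffer_area


target = Building(0.0, 0.0, 100.0, False)
stock = [
    target,
    Building(3.0, 4.0, 10 * math.pi, True),    # inside, washed away
    Building(6.0, 0.0, 20 * math.pi, False),   # inside, still standing
    Building(50.0, 0.0, 100.0, True),          # outside the buffer
]
shield, debris = buffer_proxies(target, stock, radius=10.0)
print(round(shield, 2), round(debris, 2))  # → 0.3 0.1
```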

In the subsequent three input feature combinations, we explore possible improvements in accuracy from including vsim together with the proxies. For ID7, where vsim is combined only with the shielding proxies, no enhancement is observed (0.870) compared with the corresponding ID3. Similarly, when shielding is replaced with the debris proxy (ID8), the overall accuracy is 0.867, close to that of ID4, which lacks direct velocity input. The highest accuracy (0.889) is instead obtained when all three contributions are included simultaneously (ID9). Hence, including vsim appears to enhance model performance only marginally, and vsim also shows a lower overall importance than the two proxies. From a physical perspective, although there is no noticeable correlation between the data points of vsim and NShArea (Fig. 2), this result can be explained by recognizing that flow velocity indirectly encapsulates the shielding effect of buildings, which hydrodynamic models typically represent either as obstructions to wave propagation or through increased bottom friction in urban areas8,34,35,36. Since this building-induced alteration directly shapes the onshore hydrodynamics of the tsunami, the resulting vsim values offer limited improvement to the models' predictive ability beyond what h and NShArea already provide. Moreover, the very weak correlation of the proxies with the primary explanatory variable h (Fig. 2) reinforces their value in a machine learning approach, since they provide information distinct from that of flow velocity, which is instead directly related to h, as discussed for vc. These observations support regarding the proxies as suitable variables for capturing dynamic water effects on buildings.
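The mean hit rates quoted above can be tabulated to make the incremental gains explicit. The sketch below only re-uses the numbers reported in the text; the reference model chosen for each comparison is our reading of the discussion. It says nothing about the run-to-run spread over the ten training sessions, so differences of a few thousandths should not be over-interpreted.

```python
# Mean test-set hit rates quoted in the text for models ID0-ID9.
HIT_RATE = {
    "ID0": 0.836, "ID1": 0.848, "ID2": 0.828, "ID3": 0.877, "ID4": 0.865,
    "ID5": 0.878, "ID6": 0.871, "ID7": 0.870, "ID8": 0.867, "ID9": 0.889,
}

# (model, reference) pairs mirroring the comparisons made in the discussion.
COMPARISONS = [
    ("ID1", "ID0"), ("ID2", "ID0"), ("ID3", "ID0"), ("ID4", "ID0"),
    ("ID5", "ID3"), ("ID6", "ID5"), ("ID7", "ID3"), ("ID8", "ID4"),
    ("ID9", "ID5"), ("ID9", "ID0"),
]


def gain(model: str, reference: str) -> float:
    """Change in hit rate of `model` relative to `reference`, to 3 decimals."""
    return round(HIT_RATE[model] - HIT_RATE[reference], 3)


def error_reduction(model: str, reference: str) -> float:
    """Relative reduction of the misclassification rate (1 - HR)."""
    e_ref, e_mod = 1 - HIT_RATE[reference], 1 - HIT_RATE[model]
    return (e_ref - e_mod) / e_ref


for model, reference in COMPARISONS:
    print(f"{model} vs {reference}: {gain(model, reference):+.3f}")
print(f"ID9 vs ID0 error reduction: {error_reduction('ID9', 'ID0'):.0%}")
```

Expressed this way, the +0.053 gain of ID9 over ID0 corresponds to roughly a third fewer misclassified buildings, while adding vsim to ID5 contributes +0.011.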

In all previous combinations, field-observed values (hMLIT) served as the source of inundation depth information. For a more comprehensive analysis, we also introduced feature combination ID10, which mirrors ID9 but uses simulated inundation depths (hsim) in place of hMLIT. This model achieves accuracy comparable to its counterparts and exhibits a consistent feature importance pattern, albeit with a slight increase in the importance of the Distance variable.
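The experimental design from ID0 to ID10 can be summarized as a baseline feature set plus groups of velocity-related variables. In this sketch `BASELINE` is a hypothetical placeholder (hMLIT plus the Distance, CoastType, BS and NF variables discussed elsewhere in the section); the actual ID0 list is given in Table 1 in Methods. The group contents follow the text. Enumerating the on/off states of the four groups also shows that ID0 to ID9 cover 10 of the 16 possible combinations.

```python
from itertools import product

# Hypothetical stand-in for the ID0 feature list (see Table 1 in Methods).
BASELINE = ["hMLIT", "Distance", "CoastType", "BS", "NF"]

# Velocity-related feature groups, as named in the text.
GROUPS = {
    "vsim": ["vsim"],                 # simulated flow velocity
    "vc": ["vc"],                     # velocity derived from inundation depth
    "shielding": ["NShArea", "NSW"],  # shielding proxies
    "debris": ["NDIArea"],            # debris-impact proxy
}

# Groups added to the baseline in each tested scenario.
SCENARIOS = {
    "ID0": [], "ID1": ["vsim"], "ID2": ["vc"], "ID3": ["shielding"],
    "ID4": ["debris"], "ID5": ["shielding", "debris"],
    "ID6": ["shielding", "debris", "vc"], "ID7": ["vsim", "shielding"],
    "ID8": ["vsim", "debris"], "ID9": ["vsim", "shielding", "debris"],
}


def feature_list(scenario_id):
    """Expand a scenario ID into its full list of input features."""
    if scenario_id == "ID10":
        # ID10 equals ID9 with simulated depth replacing the field-observed one.
        return ["hsim" if f == "hMLIT" else f for f in feature_list("ID9")]
    features = list(BASELINE)
    for group in SCENARIOS[scenario_id]:
        features.extend(GROUPS[group])
    return features


# All 16 on/off combinations of the four groups, to see which were tested.
all_combos = {
    frozenset(g for g, on in zip(GROUPS, flags) if on)
    for flags in product([False, True], repeat=len(GROUPS))
}
tested = {frozenset(groups) for groups in SCENARIOS.values()}
print(len(all_combos), len(tested))  # → 16 10
```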

For completeness, normalized confusion matrices describing hit and misclassification rates among the damage classes are reported in Supplementary Fig. S1. They reveal uniform error patterns across all models, with Class 5 consistently showing higher misclassification rates owing to its underrepresentation in the dataset (Fig. 2). As for the potential influence of this class imbalance, it does not alter the primary outcome of this study, namely the relative importance of the various features for damage prediction, since it affects all trained models in the same way.
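A normalized confusion matrix can be sketched as follows, assuming normalization by observed class (each row divided by its total), which is the usual convention but is not stated explicitly in the text. With this choice the diagonal holds per-class hit rates, so an underrepresented class such as Class 5 can show a poor diagonal entry while barely affecting the overall hit rate. The labels below are hypothetical.

```python
def confusion_counts(y_true, y_pred, classes):
    """Raw counts: rows are observed classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


def row_normalize(counts):
    """Divide each row by its total so the diagonal holds per-class hit rates."""
    normalized = []
    for row in counts:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def overall_hit_rate(counts):
    """Share of all records on the diagonal (dominated by frequent classes)."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / sum(sum(row) for row in counts)


# Hypothetical observed and predicted damage classes.
y_true = [1, 1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 1, 2, 2, 1, 3]
counts = confusion_counts(y_true, y_pred, classes=[1, 2, 3])
print(row_normalize(counts))  # → [[0.75, 0.25, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(overall_hit_rate(counts))
```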

Delving further into the results, the objective now shifts toward a thorough understanding of the relationships among the variables that govern the damage mechanisms. While we have shown that including water velocity components, or more generally adopting a multi-variable approach, enhances tsunami damage predictions, machine learning algorithms have often been criticized for their inherent black-box nature30,31,32.

To address this challenge, we adopt the concept of explanation through visualization, showing that explicit and informative insights can still be derived from the outcomes of a machine learning approach while retaining the complexity that arises from the multi-variable nature of the problem.

The results of the trained models are then translated into the form of traditional fragility functions, which express the probability of exceeding a given damage state as a function of inundation depth, for fixed values of the feature under investigation. The features are grouped into velocity-related (Fig. 3), site-dependent (Fig. 4) and structural building attributes (Fig. 5). In addition to the central value, the derived functions include the 10th–90th percentile confidence interval to represent the associated predictive uncertainty.
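One way to turn a multi-class damage classifier into fragility curves, consistent with the description above but not confirmed as the authors' exact procedure, is: sum the predicted class probabilities from the target damage state upward to obtain P(DS ≥ ds), evaluate every record with h set on a depth grid and the studied feature fixed, and summarize across records with the median and the 10th and 90th percentiles. Attributing the band to the spread of the other, unchanged attributes is our assumption. `toy_predict_proba`, its thresholds and the "other" attribute are hypothetical; with scikit-learn, a fitted `ExtraTreesClassifier.predict_proba` wrapped to accept a record would take its place.

```python
import math
import statistics

N_CLASSES = 7  # damage states DS1..DS7


def toy_predict_proba(record):
    """Hypothetical stand-in for a trained classifier's predict_proba.

    Class probabilities are built from cumulative logistic exceedance curves,
    so they are non-negative and sum to 1 by construction.
    """
    h = record["h"]
    shift = 3.0 * record["NDIArea"] + record["other"]  # toy feature effects
    thresholds = [0.5, 1.0, 1.8, 2.8, 4.0, 5.5]       # toy values for DS2..DS7
    exceed = [1.0]
    exceed += [1.0 / (1.0 + math.exp(-(2.0 * (h - t) + shift))) for t in thresholds]
    exceed.append(0.0)
    return [exceed[k] - exceed[k + 1] for k in range(N_CLASSES)]


def exceedance(probs, ds):
    """P(DS >= ds) from class probabilities ordered DS1..DS7."""
    return sum(probs[ds - 1:])


def fragility_band(predict_proba, records, feature, value, depths, ds):
    """Return (p10, median, p90) of P(DS >= ds) at each depth.

    Every record is evaluated with h set to the depth and `feature` fixed at
    `value`; the spread comes from the other, unchanged record attributes.
    """
    band = []
    for h in depths:
        probs = []
        for record in records:
            modified = dict(record)
            modified["h"] = h
            modified[feature] = value
            probs.append(exceedance(predict_proba(modified), ds))
        deciles = statistics.quantiles(probs, n=10, method="inclusive")
        band.append((deciles[0], statistics.median(probs), deciles[8]))
    return band


# Hypothetical records differing only in an aggregated "other" attribute.
records = [{"h": 0.0, "NDIArea": 0.0, "other": o} for o in (-1.0, -0.5, 0.0, 0.5, 1.0)]
depths = [1.0, 2.0, 4.0, 6.0]
for ndi in (0.0, 0.3):
    curve = fragility_band(toy_predict_proba, records, "NDIArea", ndi, depths, ds=7)
    print(ndi, [round(median, 2) for _, median, _ in curve])
```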

Fig. 3: Fragility functions for fixed values of (a) direct velocity information (vsim), (b) the proxy for the shielding effect (NShArea) and (c) the proxy for debris impact (NDIArea). The median fragility function is shown as a solid line, while the shaded area represents the 10th–90th percentile confidence interval.

Fig. 4: Fragility functions for fixed values of (a) coastal typology (CoastType) and (b) distance from the coastline (Distance). The median fragility function is shown as a solid line, while the shaded area represents the 10th–90th percentile confidence interval.

Fig. 5: Fragility functions for fixed values of (a) structural type (BS) and (b) number of floors (NF). The median fragility function is shown as a solid line, while the shaded area represents the 10th–90th percentile confidence interval.

Starting with the fragility functions obtained for fixed values of the velocity-related variables (Fig. 3), the substantial impact of hydrodynamic effects is evident, especially in more severe inundation scenarios. Notably, differences in the median fragility functions for the more damaging states (DS ≥ 5) appear only when velocity reaches high values (around 10 m/s), while the curves for 0.1 and 2 m/s practically overlap, albeit with a wide uncertainty band, showing how the many additional explanatory variables included in the model affect the damage process. More pronounced differences in the fragilities appear for lower damage states, under shallower water depths (h < 2 m) and slower flow velocities, although much of the predictive power for non-structural damage relies on inundation depth8,11,13. The shielding proxy (NShArea) mirrors the behavior observed for vsim, but with greater variability for DS7.

For instance, the probability of reaching DS7 with an inundation depth of 4 m drops from ~70% for an isolated building (NShArea = 0) to roughly 40% for one located in a densely built-up area (NShArea = 0.5). This substantial variation highlights the influence of this variable on the damage mechanism and also explains its strong impact on the models' predictive performance shown in Fig. 1. Conversely, for less severe damage states, the central values of the three fragility functions tend to converge onto a single line, indicating that the shielding mechanism primarily influences the process leading to the total destruction of buildings. Distinct patterns emerge for the debris-impact proxy (NDIArea), particularly for DS5, emphasizing its crucial role in predicting relevant structural damage.

For example, at an inundation depth of 4 m, the probability of reaching DS7 is 40% when NDIArea = 0 (i.e., no washed-away structures in the buffer area of the considered building), but rises to ~90% when NDIArea = 0.3 (i.e., washed-away buildings cover 30% of the buffer area). Moreover, as for NShArea, the width of the uncertainty band generally narrows with decreasing damage state, suggesting that inundation depth acts as the main predictor for low-severity damage. These results represent an advance beyond the work of Reese et al.26, who first attempted to incorporate information on shielding and debris mechanisms into fragility functions based on a limited number of field observations from the 2009 South Pacific tsunami, and of Charvet et al.8, who investigated the possible effect of debris impact (through a binary variable) on damage levels for the 2011 Great East Japan event.

Concerning morphological variables, Fig. 4 clearly shows the amplification effect induced by ria-type coasts, especially for the higher damage states, consistent with prior literature8,11,13,37,38. However, above 6 m, the median fragility curve for plain coastal areas exceeds that for ria-type coasts, in line with Suppasri et al.37,38, who described a similar pattern. Nevertheless, the variability introduced by other contributing features blurs the differences between the two coastal types, with the uncertainty band almost eclipsing the noticeable distinctions in the central values. This observation highlights the need to move beyond traditional univariate fragility functions toward multi-variable models that are intrinsically capable of accounting for these complex interactions. Distance from the coast emerged as a pivotal factor for predictive accuracy (Fig. 1), and this is also evident in the fragility functions computed for Distance values of 170, 950 and 2600 m (Fig. 4). As expected, there is a clear negative correlation between Distance and inundation depth (Fig. 2), with structures closer to the coast being more susceptible to damage, especially structural damage. In detail, the most pronounced differences in the fragility patterns are observed for DS5 and DS6: the probability of exceeding these damage states at a 2 m depth is almost null for buildings about 1 km from the coast, while it exceeds 80% for those close to the coastline. This mirrors the observations for NDIArea (Fig. 3), where greater distances correspond to less damage potential from washed-away buildings.
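For contrast, traditional univariate tsunami fragility curves are commonly expressed as a lognormal cumulative distribution in inundation depth, P(DS ≥ ds | h) = Φ(ln(h/θ)/β), with median depth θ and log-standard deviation β fitted per damage state. The sketch below uses hypothetical parameter values, not values from this study. Because h is the only input, an attribute such as coast type can enter only through separate (θ, β) pairs per category, and the influence of every other factor is folded into β, which is the limitation the paragraph above points to.

```python
import math


def lognormal_fragility(h, theta, beta):
    """P(DS >= ds | h) for a univariate lognormal fragility curve.

    theta: median inundation depth (m), where the exceedance probability is 0.5
    beta:  standard deviation of ln(h), controlling the spread of the curve
    """
    if h <= 0.0:
        return 0.0
    z = math.log(h / theta) / beta
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


# Hypothetical per-category parameters for a single damage state.
PARAMS = {"ria": (2.5, 0.6), "plain": (3.5, 0.5)}
for coast, (theta, beta) in PARAMS.items():
    print(coast, [round(lognormal_fragility(h, theta, beta), 2) for h in (1, 2, 4, 6, 8)])
```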

Figure 5 illustrates the fragility functions categorized by structural type (BS) and number of floors (NF). Overall, the observed patterns align with those discussed for the preceding figures. Focusing on the median curves, these features clearly exert minimal influence on the occurrence of non-structural damage, with overlapping curves and relatively narrow uncertainty bands for DS < 5, owing to the dominance of inundation depth as the main predictor in such cases.

For the more severe damage states, however, the distinctions become more marked. Reinforced-concrete (RC) buildings exhibit the lowest vulnerability, followed by steel, masonry and wood structures, with the latter two differing only slightly. A similar trend is evident for NF, with taller buildings being less vulnerable than lower ones under severe damage scenarios; the most relevant differences emerge in the transition from one- or two-story buildings to multi-story ones. Once again, however, beyond these general patterns, also highlighted in previous studies1,5,8,11,26,34,37, the influence of other factors tends to blur the distinctions among the central values of the different typologies. For instance, the confidence interval for steel buildings encompasses the median fragility functions of both wood and masonry structures.
