Machine learning-guided determination of Acinetobacter density in … – Nature.com

A descriptive summary of the physicochemical variables (PVs) and Acinetobacter density (AD) of the waterbodies is presented in Table 1. The mean pH, EC, TDS, and SAL of the waterbodies were 7.76 ± 0.02, 218.66 ± 4.76 µS/cm, 110.53 ± 2.36 mg/L, and 0.10 ± 0.00 PSU, respectively. The average TEMP, TSS, TBS, and DO of the rivers were 17.29 ± 0.21 °C, 80.17 ± 5.09 mg/L, 87.51 ± 5.41 NTU, and 8.82 ± 0.04 mg/L, respectively, while the corresponding DO5, BOD, and AD were 4.82 ± 0.11 mg/L, 4.00 ± 0.10 mg/L, and 3.19 ± 0.03 log CFU/100 mL.
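The reported means fit the usual definition of five-day BOD: the drop in dissolved oxygen over a five-day incubation (8.82 − 4.82 = 4.00 mg/L). The sketch below is an illustration only. It assumes that DO5 denotes the dissolved oxygen remaining after that incubation and that samples were undiluted. The helper name `bod5` is hypothetical.

```python
def bod5(do_initial_mg_l, do_day5_mg_l):
    """Five-day biochemical oxygen demand (mg/L) of an undiluted sample.

    Assumption: BOD5 = initial dissolved oxygen - dissolved oxygen after
    a 5-day incubation (no dilution or seed correction applied).
    """
    if do_day5_mg_l > do_initial_mg_l:
        raise ValueError("DO after incubation cannot exceed initial DO")
    return do_initial_mg_l - do_day5_mg_l

# The reported means are consistent with this relation:
print(round(bod5(8.82, 4.82), 2))  # 4.0
```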

The bivariate correlations between paired PVs ranged from very weak to very strong (or perfect), both positive and negative (Table 2). The correlations between individual PVs and AD likewise varied. Negligible, very weak positive correlations existed between AD and pH (r = 0.03, p = 0.422) and between AD and SAL (r = 0.06, p = 0.184). Very weak inverse (negative) correlations existed between AD and TDS (r = -0.05, p = 0.243) and between AD and EC (r = -0.04, p = 0.339). Significant but weak positive correlations occurred between AD and BOD (r = 0.26, p = 4.21E-10), TSS (r = 0.26, p = 1.09E-09), and TBS (r = 0.26, p = 1.71E-09), whereas AD had a weak inverse correlation with DO5 (r = -0.39, p = 1.31E-21). TEMP showed a moderate positive correlation with AD (r = 0.43, p = 3.19E-26), while DO showed a moderate inverse correlation with AD (r = -0.46, p = 1.26E-29).
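The r and p values above are assumed here to come from Pearson correlation tests. The sketch below shows the calculation. It approximates the p-value with the standard normal distribution, which is reasonable only for large samples. A standard test instead uses the t distribution with n − 2 degrees of freedom, as `scipy.stats.pearsonr` does. The helper name and the example readings are hypothetical.

```python
import math

def pearson_r_with_p(x, y):
    """Pearson r and an approximate two-sided p-value for H0: rho = 0.

    Assumption: the p-value uses a normal approximation to the t statistic,
    so it is only trustworthy for large n.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (n >= 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0  # perfect correlation: t is unbounded
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided tail probability under a standard normal approximation
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return r, p

# Hypothetical TEMP (°C) and AD (log CFU/100 mL) readings, for illustration only
temp = [15.2, 16.8, 17.1, 18.4, 19.0, 20.3]
ad = [2.9, 3.0, 3.2, 3.3, 3.2, 3.5]
r, p = pearson_r_with_p(temp, ad)
```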

The AD predicted by the 18 ML regression models varied in both average value and coverage (range), as shown in Fig. 1. The average predicted AD ranged from 0.0056 log units (M5P) to 3.2112 log units (SVR). In descending order of average prediction [mean (min–max)], the models were:

SVR [3.2112 (1.4646–4.4399)], DTR [3.1842 (2.2312–4.3036)], ENR [3.1842 (2.1233–4.8208)], NNT [3.1836 (1.1399–4.2936)], BRT [3.1833 (1.6890–4.3103)], RF [3.1795 (1.3563–4.4514)], XGB [3.1792 (1.1040–4.5828)], MARS [3.1790 (1.1901–4.5000)], LR [3.1786 (2.1895–4.7951)], LRSS [3.1786 (2.1622–4.7911)], GBM [3.1738 (1.4328–4.3036)], Cubist [3.1736 (1.1012–4.5300)], ELM [3.1714 (2.2236–4.9017)], KNN [3.1657 (1.4988–4.5001)], ANET6 [0.6077 (0.0419–1.1504)], ANET33 [0.6077 (0.0950–0.8568)], ANET42 [0.6077 (0.0692–0.8568)], and M5P [0.0056 (−0.6024 to 0.6916)].

In terms of range coverage, however, XGB [3.1792 (1.1040–4.5828)] and Cubist [3.1736 (1.1012–4.5300)] outperformed the other models when compared with the raw data [3.1865 (1–4.5611)]. The other models tended to overestimate AD at lower values and underestimate it at higher values.

Comparison of ML model-predicted AD in the waterbodies. RAW: raw (empirical) AD values.

Figure 2 shows the explanatory contributions of the PVs to AD prediction by each model. Subplots A–R give the absolute magnitude (representing parameter importance) by which a change in a PV from its mean value changes each model's AD prediction. The PV mean value is shown on the vertical axis.

In LR, absolute changes from the mean values of pH, BOD, TSS, DO, SAL, and TEMP corresponded to absolute changes of 0.143, 0.108, 0.069, 0.0045, 0.04, and 0.004 units, respectively, in LR's predicted AD. In LRSS, absolute changes of 0.135, 0.116, 0.069, 0.057, 0.043, and 0.0001 in predicted AD were attributed to pH, BOD, TSS, DO, SAL, and TEMP, respectively. For KNN, absolute changes in DO, BOD, TEMP, TSS, pH, and SAL produced AD prediction changes of 0.155, 0.061, 0.099, 0.144, and 0.297. In RF, the most important PV was TEMP, which increased or decreased the AD prediction by up to 0.218.

In summary, the AD prediction was most strongly influenced by:
- BOD (0.209) in XGB
- pH (0.332) in BRT
- TSS (0.265) in NNT
- TEMP (0.6) in DTR
- TSS (0.233) in SVR
- SAL (0.198) in M5P
- BOD (0.127) in ENR
- BOD (0.11) in ANET33
- DO (0.028) in ANET64
- pH (0.114) in ANET6
- pH (0.14) in ELM
- SAL (0.91) in MARS
- pH (0.427) in Cubist

PV-specific contributions to the AD forecasting capability of the eighteen ML models in MHWE-receiving waterbodies. The average (baseline) value of each PV in the ML models is shown on the y-axis. The green/red bars represent the absolute value of each PV's contribution to predicting AD.
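The contributions in Fig. 2 are measured relative to the mean value of each PV. This excerpt does not name the attribution method, so the following sketch covers only the special case where the decomposition is exact. For an additive linear model such as LR, the difference between one prediction and the mean prediction splits into per-PV terms β_j (x_j − x̄_j). The function name, coefficients, and data are hypothetical, illustrative values.

```python
import numpy as np

def linear_breakdown(coef, X, x_new):
    """Additive per-feature contributions for one observation under a
    linear model f(x) = b0 + sum_j coef[j] * x[j].

    prediction(x_new) - mean(prediction(X)) equals the sum of the
    returned terms, coef[j] * (x_new[j] - mean of column j of X).
    """
    coef = np.asarray(coef, dtype=float)
    X = np.asarray(X, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    return coef * (x_new - X.mean(axis=0))

# Hypothetical two-PV example (e.g. TEMP and DO); numbers are made up.
coef = [0.05, -0.30]
X = [[15.0, 9.0], [17.0, 8.8], [19.0, 8.6]]
contrib = linear_breakdown(coef, X, [19.0, 8.6])
magnitudes = np.abs(contrib)  # absolute sizes, as drawn in bar charts
```

For non-linear models such as RF or XGB, this exact split no longer holds. Order-dependent or averaged attribution schemes are then needed.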

Table 4 presents the performance of the eighteen regression algorithms in predicting AD from the waterbodies' PVs. In terms of MSE, RMSE, and R², XGB (MSE = 0.0059, RMSE = 0.0770, R² = 0.9912) and Cubist (MSE = 0.0117, RMSE = 0.1081, R² = 0.9827) ranked first and second, respectively, outperforming the other models in predicting AD. MSE and RMSE ranked the following models third to seventh:
- ANET6 (MSE = 0.0172, RMSE = 0.1310)
- ANET42 (MSE = 0.0220, RMSE = 0.1483)
- ANET33 (MSE = 0.0253, RMSE = 0.1590)
- M5P (MSE = 0.0275, RMSE = 0.1657)
- RF (MSE = 0.0282, RMSE = 0.1679)

Among these five models, M5P (R² = 0.9589) and RF (R² = 0.9584) performed best in terms of R². ANET6 (MAD = 0.0856) and M5P (MAD = 0.0863) performed best in terms of MAD. However, Cubist (MAD = 0.0437) and XGB (MAD = 0.0440) had lower MAD values than all five.
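The four metrics used to rank the models can be computed as in the sketch below. The excerpt does not give the formulas, so two definitions here are assumptions: R² is taken as 1 − SS_res/SS_tot, and MAD as the mean absolute deviation of the residuals. The helper `regression_metrics` is hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, R^2 and MAD for one model's predictions.

    Assumptions: R^2 = 1 - SS_res / SS_tot, and MAD is the mean absolute
    deviation of the residuals (equivalent to mean absolute error).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    mse = float(np.mean(resid ** 2))
    ss_res = float(np.sum(resid ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": 1.0 - ss_res / ss_tot,
        "MAD": float(np.mean(np.abs(resid))),
    }
```

To rank several models, compute `scores = {name: regression_metrics(y_obs, preds[name]) for name in preds}` and then sort the names by `scores[name]["RMSE"]`. Here `y_obs` and `preds` are placeholders for observed AD and each model's predictions.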

The permutation-based feature importance of each PV for the predictive capability of the ML models is presented in Table 3 and Fig. S1. The ranking of important variables differed between models.

Temperature ranked first in 10 of the 18 models, where it was responsible for the highest mean RMSE dropout loss:
- RF, XGB, Cubist, BRT, and NNT: 0.4222 (45.90%), 0.4588 (43.00%), 0.5294 (50.82%), 0.3044 (44.87%), and 0.2424 (68.77%), respectively
- ANET42, ANET10, ELM, M5P, and DTR: 0.1143 (82.31%), 0.1384 (83.30%), 0.1059 (57.00%), 0.4656 (50.58%), and 0.2682 (57.58%), respectively

Temperature also ranked second in 2 of the 18 models: ANET33 (0.0559, 45.86%) and GBM (0.0793, 21.84%).

BOD was another important variable in forecasting AD, ranking first in 3 of the 18 models and second in 8:
- First: MARS (0.9343, 182.96%), LR (0.0584, 27.42%), and GBM (0.0812, 22.35%)
- Second: KNN (0.2660, 42.69%), XGB (0.4119, 38.60%), BRT (0.2206, 32.51%), ELM (0.0430, 23.17%), SVR (0.1869, 35.77%), DTR (0.1636, 35.13%), ENR (0.0469, 21.84%), and LRSS (0.0669, 31.65%)

SAL ranked first in 2 of the 18 models (KNN: 0.2799; ANET33: 0.0633) and second in 3 (Cubist: 0.3795; ANET42: 0.0946; ANET10: 0.1359). DO ranked first in 2 of the 18 models (ENR [0.0562, 26.19%] and LRSS [0.0899, 42.51%]) and second in 3 (RF [0.3240, 35.23%], M5P [0.3704, 40.23%], and LR [0.0584, 27.41%]).
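Permutation importance of the kind summarised in Table 3 works by shuffling one PV at a time, re-predicting, and recording how much the RMSE rises above that of the unshuffled ("full") model. The sketch below assumes `predict` is any fitted model's prediction function that takes a 2-D array. The function name, the number of repeats, and the choice to report the RMSE increase (rather than the raw permuted RMSE or a percentage) are assumptions. They are not the study's exact settings.

```python
import numpy as np

def permutation_rmse_loss(predict, X, y, n_repeats=10, seed=0):
    """Permutation feature importance measured as an RMSE increase.

    Returns (baseline_rmse, losses), where losses[j] is the mean RMSE over
    `n_repeats` shuffles of column j minus the RMSE on unshuffled data.
    Larger losses mean the model relies more on that feature.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    def rmse(pred):
        return float(np.sqrt(np.mean((y - pred) ** 2)))

    baseline = rmse(predict(X))
    losses = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        permuted_scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling one column breaks its link with y but keeps its distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            permuted_scores.append(rmse(predict(X_perm)))
        losses[j] = np.mean(permuted_scores) - baseline
    return baseline, losses
```

With a fitted scikit-learn regressor, `predict` could simply be `model.predict`. scikit-learn also provides `sklearn.inspection.permutation_importance`, which reports the drop in a chosen score rather than a raw RMSE increase.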

Figure 3 shows residual diagnostic plots comparing actual AD with the AD forecast by each model. For the following models, actual and predicted AD values were skewed, and the smoothed trend did not overlap the gradient line: LR (A), LRSS (B), KNN (C), BRT (F), GBM (G), NNT (H), DTR (I), SVR (J), ENR (L), ANET33 (M), ANET64 (N), ANET6 (O), ELM (P), and MARS (Q). In contrast, actual and predicted AD values were more closely aligned, with an approximately overlapping smoothed trend, for RF (D), XGB (E), M5P (K), and Cubist (R). Among these four, RF (D) and M5P (K) overestimated AD at lower values and underestimated it at higher values. XGB and Cubist both overestimated AD at lower values, with XGB lying closer to the smoothed trend than Cubist. In general, a smoothed trend that overlaps the gradient line is desirable, as it indicates that a model fits values accurately across the whole range.

Comparison between actual and predicted AD by the eighteen ML models.
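Figure 3 judges over- and underestimation visually, from a smoothed trend plotted against the gradient line. A simpler numeric check, swapped in here and not taken from the study, is the mean prediction error in the lowest and highest bands of observed AD. The 25% band width and the function name `tail_bias` are arbitrary, hypothetical choices.

```python
import numpy as np

def tail_bias(y_true, y_pred, q=0.25):
    """Mean error (predicted - observed) in the lowest and highest
    quantile bands of the observed values.

    A positive low-band value means low values are overestimated;
    a negative high-band value means high values are underestimated.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    lo_cut, hi_cut = np.quantile(y_true, [q, 1.0 - q])
    err = y_pred - y_true
    low = float(err[y_true <= lo_cut].mean())
    high = float(err[y_true >= hi_cut].mean())
    return low, high
```

A model whose predictions are shrunk toward the mean gives a positive low-band value and a negative high-band value. This corresponds to the over- and underestimation pattern described for RF and M5P.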

The partial-dependence profiles of the PVs on AD prediction by the 18 models are compared in Figs. S2–S7, with one PV per figure for clarity. The profiles took four forms:
(i) an upward trend, in which AD prediction increased on average as the PV increased;
(ii) an inverse trend, in which an increase in the PV led to a decline in AD prediction;
(iii) a horizontal trend, in which increases or decreases in the PV had no effect on AD prediction;
(iv) a mixed trend, in which the shape switched between two or more of (i)–(iii).

The models' responses varied with changes in any of the PVs, especially beyond breakpoints, where AD prediction could increase or decrease.
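A one-dimensional partial-dependence profile can be computed by fixing one PV at each grid value for every observation and averaging the model's predictions. The sketch below does this and adds a rough version of the four-way trend classification above. `predict` stands for any fitted model's prediction function. The function names and the tolerance used to call a step flat are assumptions, and real profiles would need a coarser rule to ignore small wiggles.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is fixed at each grid value."""
    X = np.asarray(X, dtype=float)
    profile = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature] = value  # same value for every observation
        profile.append(float(np.mean(predict(X_fixed))))
    return np.array(profile)

def classify_trend(profile, tol=1e-6):
    """Label a profile as upward (i), inverse (ii), horizontal (iii) or mixed (iv)."""
    steps = np.diff(np.asarray(profile, dtype=float))
    labels = {
        "upward" if s > tol else "inverse" if s < -tol else "horizontal"
        for s in steps
    }
    # A single step type means a pure trend; several types mean the shape switches.
    return labels.pop() if len(labels) == 1 else "mixed"
```

Under this rule, an upward profile that levels off to a plateau (like the TSS profiles described below for several models) is labelled "mixed", matching type (iv).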

The partial-dependence profile (PDP) of DO showed a downward trend in the models, either from the start or after one or more breakpoints (types ii and iv). The exception was ELM, which had an upward trend (type i; Fig. S2). The TEMP PDP had an upward trend (i and iv), in most cases with one or more breakpoints, but was horizontal in LRSS (Fig. S3). SAL had a typical downward PDP (ii and iv) across all the models (Fig. S4). pH displayed a typical downward PDP in LR, LRSS, NNT, ENR, and ANET6, and a downward trend with breakpoint(s) in RF, M5P, and SVR. The other models showed a typical upward trend (i and iv) with breakpoint(s) (Fig. S5). The PDP of TSS showed an upward trend that either levelled off to a plateau after a final breakpoint (DTR, ANET33, M5P, GBM, RF, XGB, BRT) or turned into a declining trend (ANET6, SVR; Fig. S6). The BOD PDP had an upward trend with breakpoint(s) in most models (Fig. S7).
