Application of machine learning for identification of heterotic groups in sunflower through combined approach of … – Nature.com

Experiment 1

To identify heterotic grouping patterns accurately, a multi-pronged strategy was adopted in which morphological, biochemical, and molecular datasets of sunflower genotypes were analyzed with three clustering algorithms: hierarchical, K-means, and a hybrid hierarchical+K-means classification algorithm. The efficacy of these three machine learning algorithms was tested on the sunflower genotypes, and the algorithm that best explained and most accurately classified the genotypes was used for the final parental selection for hybrid development.

Figure 2 shows the dendrogram obtained with the hierarchical classification algorithm. For hierarchical clustering, the Ward.D2 method was applied to the combined morphological+biochemical+molecular dataset. The cluster diagram (Fig. 2) showed two distinct classes of genotypes: cluster 1 contains all the restorer lines, while cluster 2 contains the CMS+B-lines and the self-pollinated lines. Cluster 1 comprises 31 sunflower genotypes, and the remaining 78 genotypes are grouped in cluster 2. At a genetic distance of 18, each of these clusters can be subdivided into six smaller groups. In cluster 1, sub-group 1-A has 6 genotypes, while sub-groups 1-B, 1-C, 1-D, 1-E and 1-F have 3, 8, 6, 2 and 6 genotypes, respectively. Likewise, in cluster 2, sub-group 2-A has 8 genotypes and sub-group 2-B has 11, while sub-groups 2-C, 2-D, 2-E and 2-F have 7, 20, 20 and 12 genotypes, respectively.

Hierarchical clustering of 109 sunflower genotypes through Ward.D2 method.
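The Ward criterion behind the dendrogram can be illustrated with a minimal pure-Python sketch. It is not the authors' code (the name Ward.D2 suggests R's hclust was used, which is an assumption), and the six two-trait points are hypothetical, not study data. At each step the sketch merges the pair of clusters whose union adds the least within-cluster sum of squares, and it stops once k clusters remain.

```python
from itertools import combinations

def centroid(points):
    """Mean vector of a list of equal-length numeric tuples."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, k):
    """Greedy Ward agglomeration until k clusters remain.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Pick the pair whose merge adds the least within-cluster variance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: ward_cost(
                [points[r] for r in clusters[pair[0]]],
                [points[r] for r in clusters[pair[1]]],
            ),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return sorted(clusters)

# Hypothetical standardized trait scores for six genotypes (not study data).
toy = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (2.9, 3.1), (3.0, 3.0), (3.2, 2.8)]
groups = ward_clusters(toy, k=2)
print(groups)  # [[0, 1, 2], [3, 4, 5]]
```

This naive version recomputes every pairwise cost at each step, which is fine for a toy example but slow for large panels; library implementations use distance-update formulas instead.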

The K-means clustering algorithm is an unsupervised machine learning approach that groups similar data points into one cluster, away from dissimilar data points. More precisely, the algorithm aims to minimize the within-cluster sum of squares and, consequently, to maximize the sum of squares between clusters. In the present study, K-means clustering applied to the 109 sunflower genotypes grouped them into 2 major clusters (Fig. 3). Cluster 1 contains 31 genotypes, while cluster 2 contains 78. Cluster 1 predominantly contains restorer lines, while cluster 2 contains the self-pollinated (SFP) lines, A-lines and B-lines of the sunflower gene pool under study. Although K-means grouped the genotypes cleanly into two major clusters, it could not assign genotypes to smaller groups with greater precision, because many SFP lines lie close to the A-lines or B-lines, which makes them hard to distinguish.

K-means clustering of 109 sunflower genotypes.
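The link between the within-cluster and between-cluster sums of squares can be checked numerically. The sketch below is a plain Lloyd-style K-means on hypothetical points (not study data), with starting centers chosen by hand as an assumption. Because the total sum of squares is fixed for a given dataset and equals the within-cluster part plus the between-cluster part, reducing one necessarily increases the other.

```python
def mean(points):
    """Mean vector of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm from the given starting centers.

    Returns (labels, centers) once assignments stop changing.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return labels, centers

def sums_of_squares(points, labels, centers):
    """Within-cluster, between-cluster and total sums of squares."""
    grand = mean(points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(
        labels.count(c) * sq_dist(centers[c], grand) for c in range(len(centers))
    )
    total = sum(sq_dist(p, grand) for p in points)
    return within, between, total

# Hypothetical standardized trait scores for six genotypes (not study data).
toy = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (2.9, 3.1), (3.0, 3.0), (3.2, 2.8)]
labels, centers = kmeans(toy, centers=[toy[0], toy[3]])
within, between, total = sums_of_squares(toy, labels, centers)
print(labels)
print(round(within, 4), round(between, 4), round(total, 4))
```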

Finally, a hybrid algorithm combining hierarchical and K-means clustering was applied to the sunflower genotypes to examine whether more precise heterotic groups could be obtained. With k set to 12, two major clusters were observed, which were further divided into 12 smaller clusters (Fig. 4). Cluster 1 contains 12 genotypes (2 B-lines and 10 restorer lines), and cluster 2 contains 8 genotypes (4 CMS lines and 4 B-lines). Cluster 3 has 4 genotypes (1 B-line and 3 SFP lines), and cluster 4 has 12 genotypes (6 CMS lines, 5 B-lines and 1 SFP line). Cluster 5 gathers 15 genotypes, all of them restorer lines, while cluster 6 has 11 genotypes (5 CMS lines, 4 SFP lines, 1 restorer line and 1 B-line). Likewise, cluster 7 has 6 genotypes (5 SFP lines and 1 CMS line), and cluster 8 has 11 genotypes (6 SFP lines, 4 restorer lines and 1 CMS line). Cluster 9 has 6 genotypes (3 CMS lines, 2 SFP lines and 1 restorer line), while cluster 10 has 8 genotypes (3 CMS lines, 3 restorer lines and 2 B-lines). Cluster 11 has 8 genotypes (3 SFP lines, 2 CMS lines, 2 B-lines and 1 restorer line), and cluster 12 also has 8 genotypes (3 restorer lines, 2 CMS lines, 2 B-lines and 1 SFP line).

Clustering of 109 sunflower genotypes through hybrid (hierarchical+K-means) machine learning.
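The text does not say how the two algorithms were combined. One common construction, assumed here, is to cut the dendrogram into k groups, take the centroids of those groups as starting centers, and let K-means reassign genotypes from there. The sketch below illustrates that idea only; `hybrid_refine`, the starting partition and the points are all hypothetical, and the starting labels stand in for a dendrogram cut rather than being computed from one.

```python
def mean(points):
    """Mean vector of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def hybrid_refine(points, init_labels, max_iter=100):
    """Refine a starting partition with K-means iterations.

    init_labels must use every label in 0..k-1 at least once
    (an assumption of this sketch).
    """
    k = max(init_labels) + 1
    # Step 1: centroids of the starting (hierarchical) partition.
    centers = [
        mean([p for p, lab in zip(points, init_labels) if lab == c])
        for c in range(k)
    ]
    labels = list(init_labels)
    # Step 2: Lloyd iterations until assignments stop changing.
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = mean(members)
    return labels

# Hypothetical points and a hypothetical starting partition (not study data).
pts = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (3.0, 3.0), (3.1, 2.9)]
start = [0, 0, 1, 1, 1]
refined = hybrid_refine(pts, start)
print(refined)  # [0, 0, 0, 1, 1]
```

In this toy case the third point starts in the second group but sits closer to the first group's centroid, so the K-means step moves it. The same mechanism can regroup genotypes relative to the dendrogram, which is one possible reason a hybrid result can differ from a purely hierarchical one.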

The grouping produced by the hybrid algorithm (hierarchical+K-means) was useful to some extent, as it groups closely related genotypes together. However, it also placed genotypes with distinct characteristics, such as restorer lines and CMS lines, close together, which is confusing; this algorithm was therefore also judged a poor fit for the current study. Because the grouping obtained with the hierarchical clustering algorithm was clearer and more definitive, potential parents for the development of sunflower hybrids were selected on the basis of the hierarchical clustering.

As 12 clusters were observed with the hierarchical clustering method (at a height of 18), one genotype from each of the 12 clusters was selected for further use in the sunflower hybrid breeding program: the genotype with the highest seed yield potential in each cluster. Moreover, because all the restorer lines clustered separately from the CMS lines, a Line × Tester mating design was followed for the development of sunflower F1 hybrids.

To assess the practical efficiency of the identified heterotic groups, the selected parental lines were crossed in a Line × Tester mating design, and 36 F1 sunflower hybrids were generated. Heterosis (mid-parent heterosis and better-parent heterosis) and combining ability (general combining ability and specific combining ability) analyses were conducted to evaluate the potential of the methodology used to identify heterotic grouping patterns and, in turn, to select potential parental lines for commercial hybrid development.
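The two heterosis measures named above are usually computed as percentage deviations of the F1 mean from the mid-parent value and from the better parent. The sketch below uses those standard textbook definitions; the text does not state the exact formulas used in the study, so treat this as an assumption. The input values are hypothetical and do not come from Tables 1 to 4, and the `higher_is_better` switch is an illustrative parameter for traits such as earliness, where the lower parental value may be taken as the better one.

```python
def mid_parent_heterosis(f1, parent_1, parent_2):
    """Percent deviation of the F1 mean from the mean of its two parents."""
    mid_parent = (parent_1 + parent_2) / 2
    return (f1 - mid_parent) / mid_parent * 100

def heterobeltiosis(f1, parent_1, parent_2, higher_is_better=True):
    """Percent deviation of the F1 mean from the better parent."""
    if higher_is_better:
        better = max(parent_1, parent_2)
    else:
        better = min(parent_1, parent_2)
    return (f1 - better) / better * 100

# Hypothetical seed yields in g per plant (illustrative values only).
f1_yield, female_yield, male_yield = 90.0, 60.0, 40.0
mph = mid_parent_heterosis(f1_yield, female_yield, male_yield)
bph = heterobeltiosis(f1_yield, female_yield, male_yield)
print(f"mid-parent heterosis: {mph:.1f}%")  # mid-parent heterosis: 80.0%
print(f"heterobeltiosis: {bph:.1f}%")       # heterobeltiosis: 50.0%
```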

Table 1 presents the mean performance of the 12 parental sunflower lines planted at NARC, Islamabad, for nine agro-morphological traits. Among the lines, CMS-HAP-112 took the fewest days to initiate flowering (46.5 days), while RHP-41 took the most (56.5 days). CMS-HAP-111 completed 100% flowering the earliest, within 55 days, followed by CMS-HAP-112 at 55.5 days. RHP-41, on the other hand, took the longest to complete flowering (67.5 days). Plant height among the 12 parental lines ranged from 134.6 cm (CMS-HAP-111) to 200.14 cm (CMS-HAP-54). CMS-HAP-56 had the highest leaf area (257.48 cm²), while RHP-38 had the lowest average leaf area (141.5 cm²). The largest head diameter, 19.3 cm, was observed in CMS-HAP-99, whereas the smallest, 10.45 cm, was found in RHP-38. For stem curvature, the lowest value recorded was 6.95 cm for RHP-71, while CMS-HAP-111 and CMS-HAP-12 showed the highest stem curvatures of 48 cm and 45.7 cm, respectively. The number of leaves also varied among the parental lines: CMS-HAP-111 had the fewest (23.35) and CMS-HAP-112 the most (33.1), followed by CMS-HAP-99 (33). The 100-seed weight of the parental lines ranged from 3.48 g (RHP-69) to 6.61 g (CMS-HAP-99). CMS-HAP-112 had the highest mean seed yield per plant (68.19 g), while the lowest seed yields per plant were observed in RHP-68 (27.28 g) and RHP-41 (27.9 g) (Table 1).

Table 2 shows the mean performance of the 36 sunflower hybrids grown at NARC, Islamabad, for nine agro-morphological traits. Hybrids RHP-68 × CMS-HAP-112 and RHP-38 × CMS-HAP-112 had the shortest time to flower initiation, only 44 days, whereas hybrid RHP-71 × CMS-HAP-56 had the longest at 56.5 days. RHP-68 × CMS-HAP-112 and RHP-38 × CMS-HAP-54 needed the fewest days (50) to complete 100% flowering, whereas RHP-71 × CMS-HAP-111 needed the most (66.5 days). For mean leaf area near physiological maturity, RHP-71 × CMS-HAP-56 showed the highest value (176.53 cm²), while RHP-69 × CMS-HAP had the lowest mean leaf area. The largest head diameter, 23.95 cm, was recorded for RHP-71 × CMS-HAP-99, followed by RHP-53 × CMS-HAP-111 with 22.77 cm. Conversely, RHP-68 × CMS-HAP-112 had the smallest head diameter (17.11 cm), followed by RHP-68 × CMS-HAP-54 (17.53 cm). The tallest hybrid was RHP-71 × CMS-HAP-112, with an average height of 175.17 cm, while the shortest hybrids were RHP-53 × CMS-HAP-111 (131 cm) and RHP-41 × CMS-HAP-56 (132 cm).

For stem curvature, the lowest recorded value was 42.77 cm for RHP-68 × CMS-HAP-54, followed by RHP-53 × CMS-HAP-54 with 48.83 cm. HAP-99 and RHP-38 × CMS-HAP-112 showed the maximum stem curvatures of 77.5 cm and 74.83 cm, respectively. RHP-53 × CMS-HAP-111 had the lowest number of leaves (26), while RHP-71 × CMS-HAP-56 had the highest (36.67), followed by RHP-71 × CMS-HAP-99 (36.17). The 100-seed weight of the hybrids ranged from 4.41 g (RHP-71 × CMS-HAP-111) to 7.34 g (RHP-38 × CMS-HAP-12). The minimum seed yield per plant, 49.3 g, was recorded for hybrid RHP-53 × CMS-HAP-111, whereas RHP-71 × CMS-HAP-54 showed the highest average seed yield of 103.36 g per plant, followed by RHP-41 × CMS-HAP-111 with 99.45 g.

Results of heterosis and heterobeltiosis for the nine morphological characteristics of sunflower are presented in Tables 3 and 4. Heterosis for days to flower initiation in the present study ranged from 10.14**% (CMS-HAP-111 × RHP-71) to 13.04% (CMS-HAP-56 × RHP-68). The heterotic effects of six hybrids were positive, and six cross combinations showed non-significant heterosis; all remaining cross combinations showed highly significant heterosis for days to flower initiation. Heterobeltiotic effects for the 36 sunflower hybrids ranged from 20.35% (CMS-HAP-112 × RHP-41) to 3.65*% (CMS-HAP-111 × RHP-71), and most of them were negative.

CMS-HAP-54 × RHP-38 showed the largest negative heterotic effect for days to 100% flowering (-18.37**%), followed by CMS-HAP-56 × RHP-41 (-17.0**%) and CMS-HAP-56 × RHP-38 (-16.73**%). In contrast, hybrid CMS-HAP-111 × RHP-71 showed the highest positive heterotic effect for this trait (13.68**%), followed by CMS-HAP-12 × RHP-71 (8.94**%). The heterotic effect was significant for all hybrids except CMS-HAP-111 × RHP-53. Heterobeltiosis ranged from -23.7**% (CMS-HAP-112 × RHP-41) to 7.26**% (CMS-HAP-111 × RHP-71). The heterobeltiotic effect for days to complete flowering was statistically highly significant in all hybrid combinations except four, viz., CMS-HAP-112 × RHP-71, CMS-HAP-12 × RHP-71, CMS-HAP-54 × RHP-71 and CMS-HAP-99 × RHP-71.

The heterosis and heterobeltiosis results for leaf area show that heterosis over the mid-parent ranged from 3.63ns% to -44.26**%. The highest positive heterosis was noted for CMS-HAP-12 × RHP-38 (3.63ns%), while the largest negative heterotic effect was recorded for F1 hybrid CMS-HAP-56 × RHP-41 (-44.26**%). The largest negative heterobeltiosis was observed for CMS-HAP-56 × RHP-41 (-48.28**%), followed by CMS-HAP-56 × RHP-68 (-46.11**%). The heterobeltiotic effects of 29 hybrids were statistically significant.

Maximum heterosis for head diameter was observed for CMS-HAP-12 × RHP-38 (59.49**%), whereas the lowest mid-parent heterosis was shown by CMS-HAP-112 × RHP-68 (4.65ns%) (Table 3). All hybrids showed positive mid-parent heterosis. Maximum heterobeltiosis was observed for CMS-HAP-12 × RHP-71 (31.71**%), while minimum heterobeltiosis was recorded for CMS-HAP-99 × RHP-69 (-6.68ns%). Only six sunflower hybrids showed a negative heterobeltiotic effect for head diameter. For plant height, the maximum mid-parent heterosis recorded was 31.4**% (CMS-HAP-54 × RHP-53), while the minimum mid-parent heterosis of 13.92*% was observed for CMS-HAP-111 × RHP-38. As many as thirty hybrids showed negative mid-parent heterosis for plant height in the present study. Heterobeltiosis ranged from 35.34% (CMS-HAP-54 × RHP-68) to 5.17*% (CMS-HAP-111 × RHP-71), and 34 hybrids showed negative better-parent heterosis.

For stem curvature, heterotic effects for the 36 sunflower hybrids under study ranged from 65.87**% (CMS-HAP-111 × RHP-69) to 317.24**% (CMS-HAP-54 × RHP-71). All F1 hybrid combinations expressed highly significant positive heterotic effects for this trait. Heterobeltiosis was statistically significant for 24 hybrids, and all 36 F1 hybrids showed positive heterotic effects over the better parent. Maximum heterobeltiosis was observed for CMS-HAP-99 × RHP-68 (194.68**%), while minimum heterobeltiosis was recorded for CMS-HAP-111 × RHP-69 (10.06ns%). For number of leaves per plant, maximum positive heterosis was recorded for CMS-HAP-111 × RHP-71 (45.58**%), followed by CMS-HAP-56 × RHP-71 (31.89**%). The largest negative heterotic effect was noted for CMS-HAP-112 × RHP-53 (-9.25ns%), followed by CMS-HAP-99 × RHP-69 (-8.66ns%). Of the 36 hybrid combinations under study, 22 expressed positive heterosis for the average number of leaves per plant. The largest negative heterobeltiotic effect was recorded for CMS-HAP-111 × RHP-53 (-20.37**%), while the maximum positive better-parent heterosis was noted for CMS-HAP-111 × RHP-71 (36.02**%), followed by CMS-HAP-56 × RHP-71 (24.29**%).

Among all the hybrids tested, heterosis for 100-seed weight was statistically significant in 25 hybrids (Table 4). The maximum heterotic effect for this character was 57.72**% (CMS-HAP-56 × RHP-69), while the minimum mid-parent heterosis observed was 3.45ns% (CMS-HAP-111 × RHP-71). Only two hybrid combinations expressed negative heterosis for 100-seed weight. Heterosis over the better parent for 100-seed weight ranged from 15.49*% (CMS-HAP-111 × RHP-38) to 37.18**% (CMS-HAP-56 × RHP-53), and it was statistically significant in 10 hybrid combinations. The heterobeltiotic effects of 24 hybrids were positive (Table 4). Among the 36 hybrids tested, 35 expressed positive mid-parent heterosis for seed yield per plant. The maximum heterotic effect for this character was 134.69**% (CMS-HAP-111 × RHP-41), followed by 125.18**% (CMS-HAP-12 × RHP-71), and the minimum mid-parent heterosis observed was 1.79ns (CMS-HAP-112 × RHP-53). The maximum heterobeltiosis recorded was 74.93**% (CMS-HAP-11 × RHP-41), while the minimum heterobeltiosis was 27.58ns% (CMS-HAP-112 × RHP-53). The heterobeltiotic effects of only nine hybrids were negative, while the remaining 27 hybrids expressed a positive gain over their better parent for seed yield per plant (Table 4).

The Line × Tester mating design can evaluate a greater number of hybrids than the diallel and partial diallel mating designs. This technique of hybrid evaluation is quite successful in cases where hybrids must be developed from restorer lines and complete male sterile lines. Results for the general combining ability (GCA) of the 12 parental lines are presented in Table 5.

Perusal of the GCA estimates of all 12 parents for days to flower initiation (DFI) showed that only two parents, one CMS line, i.e., CMS-HAP-12 (7.65**), and one R-line, i.e., RHP-68 (1.07**), had positive and significant GCA effects. The same two parents also had the highest positive and significant GCA effects for days to flower completion (DFC), indicating that they contribute to late maturity. For leaf area, the GCA estimate of CMS-HAP-12 (14.73**) was highly significant and positive among the 12 parental lines examined, while CMS-HAP-99 showed the lowest GCA magnitude of 13.99**. GCA effects for average leaf area were non-significant for all six male lines. GCA estimates for head diameter ranged from 2.57** (CMS-HAP-12) to 1.17** (CMS-HAP-54), while among the male lines RHP-68 was a good general combiner for head diameter, with a GCA effect of 1.02*. The best general combining ability for plant height was recorded for CMS-HAP-12 (13.22**), while the lowest GCA estimate of 10.3** was shown by CMS-HAP-111. Stem curvature GCA estimates of all 12 parents under study were statistically non-significant. GCA effects for number of leaves per plant were highly significant for two CMS lines, viz., CMS-HAP-111 (-1.94**) and CMS-HAP-12 (4.53**). RHP-71 (0.64ns) showed the maximum GCA among the tester lines. For 100-seed weight, only two parental lines, i.e., CMS-HAP-112 (0.45*) and RHP-69 (0.41*), showed good general combining ability for this important yield-related characteristic. CMS-HAP-12 showed the highest GCA effect of 20.43** for seed yield per plant among the female lines, while no male (tester) line showed a significant positive GCA effect for seed yield.
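As a rough guide to reading these estimates, a GCA effect in a balanced Line × Tester table can be written as a parent's average hybrid performance minus the grand mean of all hybrids. The sketch below uses that simplified mean-based form on a hypothetical 2 × 2 table (the study itself crossed six CMS lines with six restorer testers, giving 36 hybrids). It is an illustration under those assumptions, not the analysis behind Table 5, and it leaves out replication and significance testing.

```python
def gca_effects(table):
    """GCA effects from a lines-by-testers table of hybrid means.

    table[i][j] is the mean of the hybrid between line i and tester j.
    Returns (line_gca, tester_gca) as deviations from the grand mean.
    """
    n_lines, n_testers = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_lines * n_testers)
    line_gca = [sum(row) / n_testers - grand for row in table]
    tester_gca = [sum(col) / n_lines - grand for col in zip(*table)]
    return line_gca, tester_gca

# Hypothetical hybrid means for 2 lines x 2 testers (not study data).
toy = [
    [10.0, 14.0],  # line A crossed with testers 1 and 2
    [6.0, 10.0],   # line B crossed with testers 1 and 2
]
line_gca, tester_gca = gca_effects(toy)
print(line_gca)    # [2.0, -2.0]
print(tester_gca)  # [-2.0, 2.0]
```

Within each parent group the effects sum to zero, so a positive GCA means a parent's hybrids are above the average of the hybrid set for that trait, not above some absolute standard.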

Results for the specific combining ability (SCA) of the thirty-six sunflower hybrids developed from the 12 parental lines in the L × T mating design, for nine agro-morphological traits, are presented in Table 6. The SCA effect of CMS-HAP-12 × RHP-68 (3.18**) was the highest for DFI, while the SCA estimate of 2.9** shown by CMS-HAP-112 × RHP-41 was the lowest in magnitude. The SCA estimate for days to flower completion was highest for CMS-HAP-12 × RHP-68 (3.60**), while the CMS-HAP-112 × RHP-68 cross combination recorded the maximum negative SCA effect for DFC, showing that this cross flowers earlier than the other hybrids under study. Significant SCA estimates for leaf area were recorded for all 36 hybrids, with a maximum SCA effect of 20.87** observed for CMS-HAP-54 × RHP-38. Only three hybrids showed a positive and significant SCA effect for head diameter, with a maximum value of 2.46* (CMS-HAP-12 × RHP-38). For head diameter, 21 hybrid combinations showed negative SCA estimates, showing that head diameter of hybrids was less than that of their respective parents. The highest SCA for plant height was shown by CMS-HAP-112 × RHP-71 (15.6*). SCA estimates for stem curvature were positive for 34 cross combinations. SCA effects for number of leaves per plant ranged from 3.47* (CMS-HAP-99 × RHP-41) to 3.53* (CMS-HAP-11 × RHP-53). Only one cross combination was found to be significant for head diameter SCA effect and in negative direction, i.e., CMS-HAP-111 × RHP-38 (-1.30**). Positive SCA effects for 100-seed weight were observed in 17 hybrids. For seed yield per plant, the SCA effect was positive for 19 cross combinations, with the maximum positive SCA shown by CMS-HAP-111 × RHP-53 (3.60**), followed by CMS-HAP-112 × RHP-53 (2.93**).
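In the same simplified, mean-based form, the SCA effect of a cross is what remains of its mean after the grand mean and both parents' GCA effects are removed: the cross mean minus its line mean, minus its tester mean, plus the grand mean. The sketch below applies that formula to a hypothetical 2 × 2 table. It is an assumption-laden illustration, not the analysis behind Table 6, and it omits replication and significance testing.

```python
def sca_effects(table):
    """SCA effects from a lines-by-testers table of hybrid means.

    table[i][j] is the mean of the hybrid between line i and tester j.
    SCA[i][j] = cell mean - line mean - tester mean + grand mean.
    """
    n_lines, n_testers = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_lines * n_testers)
    line_means = [sum(row) / n_testers for row in table]
    tester_means = [sum(col) / n_lines for col in zip(*table)]
    return [
        [table[i][j] - line_means[i] - tester_means[j] + grand
         for j in range(n_testers)]
        for i in range(n_lines)
    ]

# Hypothetical hybrid means for 2 lines x 2 testers (not study data).
toy = [
    [12.0, 14.0],  # line A crossed with testers 1 and 2
    [6.0, 12.0],   # line B crossed with testers 1 and 2
]
sca = sca_effects(toy)
print(sca)  # [[1.0, -1.0], [-1.0, 1.0]]
```

Under this definition a negative SCA indicates that a cross performed below the level expected from its parents' general combining abilities.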
