Category Archives: Machine Learning
In the AI science boom, beware: your results are only as good as your data – Nature.com
Hunter Moseley says that good reproducibility practices are essential to fully harness the potential of big data. Credit: Hunter N.B. Moseley
We are in the middle of a data-driven science boom. Huge, complex data sets, often with large numbers of individually measured and annotated features, are fodder for voracious artificial intelligence (AI) and machine-learning systems, with details of new applications being published almost daily.
But publication in itself is not synonymous with factuality. Just because a paper, method or data set is published does not mean that it is correct and free from mistakes. Without checking for accuracy and validity before using these resources, scientists will surely encounter errors. In fact, they already have.
In the past few months, members of our bioinformatics and systems-biology laboratory have reviewed state-of-the-art machine-learning methods for predicting the metabolic pathways that metabolites belong to, on the basis of the molecules' chemical structures [1]. We wanted to find, implement and potentially improve the best methods for identifying how metabolic pathways are perturbed under different conditions: for instance, in diseased versus normal tissues.
We found several papers, published between 2011 and 2022, that demonstrated the application of different machine-learning methods to a gold-standard metabolite data set derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG), which is maintained at Kyoto University in Japan. We expected the algorithms to improve over time, and saw just that: newer methods performed better than older ones did. But were those improvements real?
Scientific reproducibility enables careful vetting of data and results by peer reviewers as well as by other research groups, especially when the data set is used in new applications. Fortunately, in keeping with best practices for computational reproducibility, two of the papers [2,3] in our analysis included everything that is needed to put their observations to the test: the data set they used, the computer code they wrote to implement their methods and the results generated from that code. Three of the papers [2-4] used the same data set, which allowed us to make direct comparisons. When we did so, we found something unexpected.
It is common practice in machine learning to split a data set in two and to use one subset to train a model and another to evaluate its performance. If there is no overlap between the training and testing subsets, performance in the testing phase will reflect how well the model learns and performs. But in the papers we analysed, we identified a catastrophic data leakage problem: the two subsets were cross-contaminated, muddying the ideal separation. More than 1,700 of the 6,648 entries from the KEGG COMPOUND database (about one-quarter of the total data set) were represented more than once, corrupting the cross-validation steps.
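To make the leakage mechanism concrete, here is a minimal sketch, using pandas and scikit-learn with made-up compound identifiers rather than the authors' actual pipeline, of how duplicate entries can land on both sides of a train/test split and how deduplicating first avoids it:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical metabolite table in which some compound IDs appear more than once.
data = pd.DataFrame({
    "compound_id": ["C00022", "C00024", "C00022", "C00031", "C00024", "C00041"],
    "pathway": ["glycolysis", "tca", "glycolysis", "glycolysis", "tca", "amino_acid"],
})

# Naive split: the same compound can end up in both subsets (data leakage),
# which inflates test-set performance.
train, test = train_test_split(data, test_size=0.3, random_state=0)
print("compounds in both subsets:", set(train["compound_id"]) & set(test["compound_id"]))

# Safer: drop duplicates before splitting, so no entry is evaluated
# against a copy of itself seen during training.
deduped = data.drop_duplicates(subset="compound_id")
train, test = train_test_split(deduped, test_size=0.3, random_state=0)
print("overlap after deduplication:", set(train["compound_id"]) & set(test["compound_id"]))
```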
When we removed the duplicates in the data set and applied the published methods again, the observed performance was less impressive than it had first seemed. There was a substantial drop in the F1 score (a machine-learning evaluation metric that is similar to accuracy but is calculated in terms of precision and recall) from 0.94 to 0.82. A score of 0.94 is reasonably high and indicates that the algorithm is usable in many scientific applications. A score of 0.82, however, suggests that it can be useful, but only for certain applications and only if handled appropriately.
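For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall; a quick sketch with made-up precision and recall values (not figures from the study) shows how modest drops in both pull the score down:

```python
def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.94, 0.94), 2))  # 0.94: precision and recall both high
print(round(f1(0.85, 0.79), 2))  # 0.82: hypothetical lower precision and recall
```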
It is, of course, unfortunate that these studies were published with flawed results stemming from the corrupted data set; our work calls their findings into question. But because the authors of two of the studies followed best practices in computational scientific reproducibility and made their data, code and results fully available, the scientific method worked as intended, and the flawed results were detected and (to the best of our knowledge) are being corrected.
The third team, as far as we can tell, included neither their data set nor their code, making it impossible for us to properly evaluate their results. If all of the groups had neglected to make their data and code available, this data-leakage problem would have been almost impossible to catch. That would be a problem not just for the studies that were already published, but also for every other scientist who might want to use that data set for their own work.
More insidiously, the erroneously high performance reported in these papers could dissuade others from attempting to improve on the published methods, because they would incorrectly find their own algorithms lacking by comparison. Equally troubling, it could also complicate journal publication, because demonstrating improvement is often a requirement for successful review, potentially holding back research for years.
So, what should we do with these erroneous studies? Some would argue that they should be retracted. We would caution against such a knee-jerk reaction, at least as a blanket policy. Because two of the three papers in our analysis included the data, code and full results, we could evaluate their findings and flag the problematic data set. On one hand, that behaviour should be encouraged, for instance by allowing the authors to publish corrections. On the other, retracting studies with both highly flawed results and little or no support for reproducible research would send the message that scientific reproducibility is not optional. Furthermore, demonstrating support for full scientific reproducibility provides a clear litmus test for journals to use when deciding between correction and retraction.
Now, scientific data are growing more complex every day. Data sets used in complex analyses, especially those involving AI, are part of the scientific record. They should be made available along with the code with which to analyse them, either as supplemental material or through open data repositories, such as Figshare (Figshare has partnered with Springer Nature, which publishes Nature, to facilitate data sharing in published manuscripts) and Zenodo, that can ensure data persistence and provenance. But those steps will help only if researchers also learn to treat published data with some scepticism, if only to avoid repeating others' mistakes.
Link:
In the AI science boom, beware: your results are only as good as your data - Nature.com
Machine Learning APIs on the Horizon – Digital Engineering 24/7
Last November, the Autodesk crowd returned to familiar ground: the Venetian in Las Vegas for the annual Autodesk University. In the convention center, a few corridors away from the slot machines, roulette wheels and blackjack tables, Autodesk CEO Andrew Anagnost decided it was time to show his hand.
"For better or worse, AI [artificial intelligence] has arrived, with all the looming implications for all of us," he said. "We've been working to get you excited about AI for years. But now we're moving from talking about it to actually changing your businesses."
Autodesk's big bet comes in the form of Autodesk AI, based in part on technology from BlankAI, which it acquired. The implementation will lead to 3D models that can be rapidly created, explored and edited in real time using semantic controls and natural language, without advanced technical skills, the company announced.
More details come from Jeff Kinder, executive vice president of product development and manufacturing. "Debuting in our automotive design studio next year, BlankAI will allow you to pull from your historical library of design work, then use it to rapidly generate new concepts that build on your existing design style," he said. "This is where general AI ends, and personalized machine learning begins."
The ability to use natural language to generate 3D assets, as described by Autodesk in its announcement, would be a big achievement itself. But the next step is more tantalizing. The initial AI will no doubt be trained based on publicly available design data and Autodesk data. But the company also revealed it planned to give users a way to further refine the algorithms using their own proprietary data.
"At some point, as some of these capabilities get to a level of automation, we may actually license a model to a particular customer, and they can train and improve that model on top of their capabilities. That's the business model that Microsoft is using right now for some of their tools. I think it's a very robust model," said Anagnost during an industry press Q&A.
Although initially debuting in Autodesk's automotive portfolio, Autodesk ultimately aims to include these capabilities as part of its Fusion industry cloud, said Stephen Hooper, Autodesk vice president of design and manufacturing. If the algorithm is trained on your historical data, it understands your design cues, styling cues and brand identity, making it much more helpful in generating your preferred designs. While it's clearly on Autodesk's roadmap, the exact mechanism remains unclear. "We're still evaluating how and when we might provide a private model," Hooper said.
Last year, at PCB West in Santa Clara, CA, Kyle Miller, research and product manager at Zuken, unveiled a new offering from Zuken for its CR-8000 printed circuit board design software: the Autonomous Intelligent Place and Route (AIPR). Miller pointed out that AI-optimized layouts tend to be cleaner and simpler, with fewer clashes, because the software could process complex design hierarchy and signal clusters much better than humans could. "What takes Autorouter [another product] a set-up time of 30 minutes and auto-routing time of 15 minutes, might just take AIPR 30 seconds," he said.
AIPR is just a launchpad for the PCB software maker's long-term goal. The next step, according to Miller, is to apply machine learning to all the PCB designs available in Zuken's library. The outcome is what the company calls the Basic Brain, which enhances the user experience by routing the design utilizing the Smart Autorouter based on learned approaches and strategies.
After that, Zuken plans to offer a tool that its customers can use to apply machine learning to their own library. The company calls it the Dynamic Brain, which learns from your PCB designers, utilizing past design examples and integrating them into AI algorithms. Ultimately, the goal is the Autonomous Brain, an AI-driven powerhouse in continuous learning mode, pushing the boundaries of creativity.
Zuken's roadmap spans multiple years; therefore, the Dynamic Brain is not expected to show up in the portfolio soon. "Our first goal is to make the base product, the Basic Brain, as capable as possible before delivering the Dynamic Brain," Miller said.
The plan to let customers use the AI tool to ingest proprietary data also invites certain questions about security. "It has been very important to Zuken from the beginning of this process that we have no internal access to any customer data ... All training of the Dynamic Brain will be done on the customer site. We have no plan to use cloud-based services for this (unless specifically agreed with a customer). No data is shared with Zuken servers," said Miller.
The training will be done via the Zuken CR-8000 platform, Miller explained. "This communicates on the local network only (or wider network if the customer has a secure multisite network) with the AIPR server, which handles the training," he added.
On the first day he took the job as Ansys's CTO, Prith Banerjee decided he was going to focus on AI-powered simulation. "In 2018, we started investing in it, specifically to explore opportunities in two areas: Can AI or ML make simulation faster? Can it make simulation easier to use?" he asked.
The investment appears to be bearing fruit. Last October, Ansys launched Ansys SimAI, described as a "cloud-enabled, physics-neutral platform that will empower users across industries to greatly accelerate innovation and reduce time to market."
Once trained with customer data, Ansys SimAI predicts simulation results, which align with Ansys Fluent calculations. However, Ansys SimAI takes a mere 5 minutes. Image courtesy of Ansys.
The software is a departure from typical simulation products. It's better to think of it as a way to use AI to train the software to develop good finite element analysis (FEA) instincts.
"SimAI is a giant leap forward compared to our previous technology, in that the users do not need to parametrize the geometry," said Banerjee.
You feed the software a set of simulation results, then let the AI-like software learn the correlations between the topology and the outcomes. "The users can take their simulation results, upload them to SimAI and train the model. After uploading the data, the users select which variables they are interested in and how long they are willing to wait for the training to complete. Once the training is done, they can upload new geometries and make predictions," explained Banerjee.
This is similar to how, over time, a veteran FEA user learns to anticipate certain stress distributions, deformation and airflow patterns based on the design's topology. Except, with high-performance computing (HPC), the software can learn in a few hours what would have taken a human months or years to learn. But with HPC comes the need to rely on the cloud: the Ansys cloud, based on Amazon Web Services (AWS) infrastructure.
"AWS infrastructure provides state-of-the-art security and is used by many security-sensitive customers spanning defense, healthcare and financial organizations," Banerjee said. In 2018, Ansys launched Ansys Discovery, a fast-paced simulation tool targeting designers. Since then, cloud has become an integral part of the company's strategy and offerings.
"The personalization of machine learning is a trend that we've been seeing in the last five years," said Johanna Pingel, AI product marketing manager at MathWorks. "Essentially, you start with out-of-the-box algorithms, but then you want to incorporate your own engineering data," she added.
MathWorks offers MATLAB and Simulink. MATLAB apps "let you see how different algorithms work with your data. Iterate until you've got the results you want, then automatically generate a MATLAB program to reproduce or automate your work," according to the company.
Once you have an executable program, you may deploy it in Simulink to build system models to conduct what-if analyses.
"Suppose you're an autonomous vehicle developer. It's relatively easy to develop or find an out-of-the-box lane-detection algorithm," Pingel pointed out.
"But that's just the starting point. You may want to refine it to work for nighttime, or for the UK, where the drivers drive on the left. Without such refinement options, the algorithm's scope will likely be too broad to be effective for your enterprise's specific needs," she said.
MATLAB is ideal for such training, according to Pingel. She explained, "You can import your data through a [graphical user interface (GUI)], and train a model through a GUI. You use a low-code, app-based workflow for training models."
MathWorks has also taken note of its own customers growing interest in ChatGPT-like interactions. In response, in November 2023, the company launched the MATLAB AI Chat Playground, trained on ChatGPT. It appears as a chat panel in MATLAB, allowing users to query the software using natural language. However, the tool is experimental and still evolving, Pingel cautioned.
Although natural language-based input might make engineering tools more accessible, Pingel pointed out that the domain knowledge and expertise of the human still remain essential in crafting the input and assessing the output.
"Engineers must use their inherent knowledge of the problem when they're talking to the software about the kind of structural capabilities they want. They have to bring that to the table when they're using generative AI," she said.
Former SolidWorks CEO and Onshape cofounder John McEleney warned, "I'm not dismissing the technology, but there's a lot of AI washing happening. Everyone wants to jump on the AI bandwagon with AI this, AI that."
For AI training to be reliable, the sample data pool has to be large enough to represent a rich variety of scenarios.
"The question is, do you have enough models to train your AI engine?" he asked. "If you're a large automotive or aerospace company, sure. But for most midsize manufacturers, maybe not. If your training is based on 50 to 100 models, are you reaching a critical mass?"
McEleney revealed that Onshape is currently exploring some internal models to gain insights. "It would be logical and reasonable to assume that design assistant-type suggestions will be how we would introduce these features," he said.
Considering how speaking to AI chatbots such as Siri on smartphones has become the norm, McEleney said, "You can imagine being able to tell your software, 'Go do this,' and the system being able to find samples from your previous work to execute it for you."
He also foresees users being highly protective of their proprietary data even if they want to benefit from AI training.
"So I can see that, at least in the beginning, people will want to do that type of training internally," he added.
Most people would like access to others' data, because a larger sample pool makes the AI algorithm more reliable. But the same people are also highly protective of their proprietary data, because it contains IP that gives them a competitive advantage. That's the dilemma of the AI era.
View original post here:
Machine Learning APIs on the Horizon - Digital Engineering 24/7
Development and validation of a cuproptosis-related prognostic model for acute myeloid leukemia patients using … – Nature.com
Machine Learning Market Expected to Hit $208 Billion by 2028 – Analytics Insight
Machine Learning Market Prediction: Machine learning, a subset of artificial intelligence, empowers computers to acquire knowledge from data and algorithms without the need for direct programming. Its applications span diverse industries, including healthcare, retail, finance, manufacturing, and media. The machine learning market was valued at US$41.03 billion in revenue in 2023 and is anticipated to reach US$208.16 billion by 2028, growing at a CAGR of 38.38% over the forecast period. This remarkable growth, fueled by various factors, is reshaping industries and driving the adoption of ML technologies.
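The quoted growth rate follows directly from the two market-size figures; a quick check of the compound annual growth rate (CAGR) over the five-year forecast window:

```python
start, end, years = 41.03, 208.16, 5  # US$ billion, 2023 to 2028

cagr = (end / start) ** (1 / years) - 1
print(f"CAGR ~ {cagr:.2%}")  # prints roughly 38.38%
```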
The abundance of data, coupled with advancements in data quality, is a cornerstone for the growth of the ML market. Access to diverse and high-quality datasets empowers ML models to glean valuable insights, resulting in more accurate and effective outcomes. Industries across the spectrum are leveraging this wealth of information to make informed decisions and enhance their operations.
Industries grappling with challenges such as rising costs, inefficiencies, and inequalities are turning to ML for bespoke solutions. The adaptability of ML models allows them to be tailored to specific needs, offering innovative solutions to longstanding problems. As businesses increasingly seek efficiency gains and competitive advantages, ML becomes a critical tool in their arsenal.
The surge in ML adoption is closely linked to the widespread adoption of cloud and edge computing. These technologies provide the necessary infrastructure and scalability for deploying and running ML models. Cloud and edge computing enable businesses to harness the power of ML without the need for extensive on-premises hardware, facilitating seamless integration and operation.
Ongoing research and development in ML technology, particularly in areas such as natural language processing, deep learning, and speech synthesis, are enhancing the performance and capabilities of ML models. These advancements are driving the development of more sophisticated and versatile applications, expanding the potential use cases for ML across various domains.
The exponential growth in data usage and ML applications raises concerns about privacy and security. The potential exposure of sensitive and personal data to hackers and malicious actors poses a significant threat. Striking a balance between the benefits of ML and safeguarding user and business data is a crucial challenge that the industry must address to ensure sustained growth.
The success of ML applications hinges on user and stakeholder trust. Lack of transparency in ML algorithms can lead to skepticism and hinder widespread acceptance, especially in critical sectors like healthcare and finance. Establishing clear guidelines and fostering transparency is paramount to overcoming this challenge and ensuring the responsible deployment of ML technologies.
The shortage of skilled professionals proficient in designing, developing, and maintaining ML systems and applications is a bottleneck for the industry. As the demand for ML expertise skyrockets, addressing this skills gap becomes crucial for sustained growth. Educational initiatives, upskilling programs, and industry collaborations are essential to cultivating a robust talent pool.
The ethical use of ML is an ongoing concern, with issues such as bias, discrimination, and accountability coming to the forefront. Striking a balance between innovation and responsible deployment is essential to mitigate these ethical challenges. Establishing ethical frameworks and guidelines can help guide the development and implementation of ML technologies in a socially responsible manner.
The machine learning market forecast is indicative of its transformative impact on industries worldwide. The convergence of factors such as data availability, demand for innovation, cloud and edge computing, and R&D advancements propels the industry forward. However, addressing challenges like privacy concerns, building trust, bridging the skills gap, and navigating ethical dilemmas is crucial for sustained and responsible growth. As the machine learning landscape continues to evolve, stakeholders must work collaboratively to harness its potential while ensuring ethical and responsible deployment.
Go here to see the original:
Machine Learning Market Expected to Hit $208 Billion by 2028 - Analytics Insight
Cutting-Edge Technology Safeguards Apple Quality: Hyperspectral Imaging and Machine Learning to Combat Codling … – Spectroscopy Online
In a new technology effort to tackle postharvest losses caused by invasive pests, researchers at the University of Kentucky, led by Alfadhl Y. Khaled, Nader Ekramirad, Kevin D. Donohue, et al., have unveiled a research study utilizing non-destructive hyperspectral imaging and machine learning to predict and manage the physicochemical quality attributes of apples during storage, specifically addressing the impact of codling moth infestation. The study, titled "Non-Destructive Hyperspectral Imaging and Machine Learning-Based Predictive Models for Physicochemical Quality Attributes of Apples during Storage as Affected by Codling Moth," was published in the journal Agriculture (Volume 13, Issue 5) (1).
As the demand for high-quality apples persists globally, challenges arise in preserving fruit quality during long-term storage, especially in the face of invasive pests such as the codling moth (CM). This study focused on Gala apples, evaluating their firmness, pH, moisture content (MC), and soluble solids content (SSC) under different storage conditions.
The research employed near-infrared hyperspectral imaging (HSI) and machine learning models, utilizing partial least squares regression (PLSR) and support vector regression (SVR) methods. Data preprocessing involved Savitzky-Golay smoothing filters and standard normal variate (SNV) transformation, followed by outlier removal using the Monte Carlo sampling method. The study revealed significant effects of CM infestation on near-infrared (NIR) spectra, showcasing the potential impact of pests on apple quality.
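As a rough illustration of that kind of preprocessing-plus-regression pipeline, here is a hedged sketch using SciPy and scikit-learn on synthetic spectra; it is not the study's code, and the window sizes, component counts and data are placeholders:

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 200))                       # synthetic NIR spectra: 120 samples x 200 bands
y = 2.0 * X[:, 50] + rng.normal(scale=0.1, size=120)  # synthetic quality attribute (e.g., SSC)

# Savitzky-Golay smoothing along the wavelength axis.
X = savgol_filter(X, window_length=11, polyorder=2, axis=1)

# Standard normal variate: center and scale each spectrum individually.
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
pls = PLSRegression(n_components=10).fit(X_train, y_train)
print("R^2 on held-out spectra:", round(pls.score(X_test, y_test), 3))
```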
Results indicated highly accurate predictive models for apple quality attributes during storage at different temperatures (0 °C, 4 °C, and 10 °C), with maximum correlation coefficients of prediction (Rp) reaching 0.97 for pH, 0.95 for firmness, 0.92 for SSC, and 0.91 for MC. Additionally, the study employed the competitive adaptive reweighted sampling (CARS) method to extract effective wavelengths, enhancing real-time prediction capabilities (1).
The multispectral models derived from this approach demonstrated superior performance compared to full-wavelength HSI models, showcasing the potential for fast, real-time prediction of apple quality characteristics (1).
This new study opens avenues for the development of non-destructive monitoring and evaluation systems, offering valuable insights for the apple industry to combat postharvest losses and ensure the delivery of high-quality produce to consumers.
(1) Khaled, A. Y.; Ekramirad, N.; Donohue, K. D., et al. Non-Destructive Hyperspectral Imaging and Machine Learning-Based Predictive Models for Physicochemical Quality Attributes of Apples during Storage as Affected by Codling Moth. Agriculture 2023, 13 (5), 1086. DOI: 10.3390/agriculture13051086
Transfer learning: Everything you need to know about the ML process – Android Police
Artificial intelligence has begun to mirror a fundamental human skill: transfer learning. This approach is inspired by our cognitive abilities and leverages knowledge acquired in one task to advance in other domains. Just as humans use language to share and build upon their knowledge, artificial intelligence follows a similar path by applying insights from one dataset or problem to another. This article looks at what transfer learning is, how it works, why and when it should be used, and its benefits.
Transfer learning is a powerful technique in machine learning (ML) where a model, initially trained for a specific task, is repurposed for a new, yet related, task. This approach capitalizes on the knowledge and patterns the model acquired. Transfer learning applies insights from a task with abundant data to a new task where data is scarce.
For example, someone who speaks Spanish, a Romance language, generally finds it easier to learn other languages in the same family, like Italian or French. This ease comes from the shared vocabulary, grammar, and structure. Similarly, in AI, a neural network trained to recognize faces in photos can be modified for tasks like recognizing emotions. The network's fundamental understanding of facial features helps it notice small changes in expressions.
Transfer learning is a valuable technique in machine learning. It's beneficial in scenarios such as data scarcity, time constraints, computational limitations, domain similarity, enhanced generalization, and rapid prototyping. When data is scarce, using a pre-trained model helps avoid the overfitting that often accompanies models trained from scratch. This approach uses the knowledge acquired by these models, improving accuracy.
Transfer learning is also a practical and efficient solution when time and computational resources are limited. It reduces the extensive training periods and computational power as it builds upon pre-existing knowledge bases. By transferring relevant knowledge and patterns between the source and target tasks, this method allows for better generalization to new, unknown data. Furthermore, transfer learning facilitates rapid prototyping, allowing quicker development and deployment of models.
For example, consider a language model like GPT (Generative Pre-trained Transformer), which has been trained on large amounts of text data from the internet. Suppose you want to create a chatbot specializing in medical advice. Despite the general nature of GPT's training, you can fine-tune the model on a smaller, specialized dataset of medical dialogues and literature.
By doing this, you transfer the general language understanding capabilities of the GPT model and adapt it to the specific context of medical communication. You can leverage the extensive learning of the base model by adjusting the base model to your needs with a relatively small amount of specialized data.
Transfer learning involves essential steps, including finding pre-trained models, freezing layers, training new layers, and fine-tuning the model. Let's explore each of these steps in detail.
The first step is to find a pre-trained model. Organizations might source these models from their collections or open source repositories like PyTorch Hub or TensorFlow Hub. These platforms offer a range of pre-trained models suitable for tasks like image classification, text embeddings, and more.
Deep neural networks are organized in a hierarchical layer structure, each layer serving a distinct role in data processing. The inner layers detect basic features like edges and colors, fundamental in tasks like animal shape recognition. Middle layers increase in complexity, combining these simple patterns to form intricate structures, such as identifying animal fur patterns.
The latter layers are where the network's complex learning occurs, focusing on high-level, task-specific features like distinguishing between animal species. This layered architecture is crucial in transfer learning, where inner and middle layers often retain their learned features for general applicability. In contrast, the latter layers are retrained for specific new tasks.
In transfer learning, the inner and middle layers of the pre-trained model are often frozen, meaning they retain the features learned during the original training (like recognizing basic shapes in image recognition tasks), which are generally applicable to the new task.
After the appropriate layers have been identified and frozen, the next step involves augmenting the pre-trained model with new layers tailored to the task. These added layers bridge the pre-existing knowledge within the frozen layers and the nuances of the new dataset.
Training these new layers involves exposing the model to the new dataset, where it learns to adjust its internal parameters, weights, and biases based on the input data and the desired output. Through iterations and adjustments, the model fine-tunes itself to optimize its performance on the specific task.
Although not always necessary, fine-tuning can enhance model performance. This involves unfreezing some layers and retraining them at a low learning rate on the new dataset. It allows the model to adjust more finely to the specificities of the new task. The aim is to achieve superior performance in the targeted domain.
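A minimal PyTorch sketch of those steps, assuming torchvision's ImageNet-pretrained ResNet-18 as the source model; the class count and the training loop are placeholders, not a specific published recipe:

```python
import torch
import torch.nn as nn
from torchvision import models

# 1. Start from a pre-trained model (ResNet-18 trained on ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# 2. Freeze the existing layers so their learned features are retained.
for param in model.parameters():
    param.requires_grad = False

# 3. Replace the final layer with a new head for the target task
#    (say, 10 bird species); newly created layers are trainable by default.
model.fc = nn.Linear(model.fc.in_features, 10)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# 4. Optional fine-tuning: unfreeze the last block and retrain it at a much
#    lower learning rate than the new head.
for param in model.layer4.parameters():
    param.requires_grad = True
optimizer = torch.optim.Adam([
    {"params": model.fc.parameters(), "lr": 1e-3},
    {"params": model.layer4.parameters(), "lr": 1e-5},
])
# ... a standard training loop over the new dataset would follow here ...
```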
In practice, the decision on which layers to freeze or train is based on the level of feature similarity between the pre-trained model and the new task.
For example, consider a neural network trained for general object recognition. It can identify cars, trees, animals, and other objects. If we want to adapt this network for a more specific task, like recognizing different types of birds, we can freeze the inner and middle layers. These layers, which have learned to detect edges, colors, and basic shapes, are helpful for any image recognition task, including birds.
The latter layers, which are specialized for recognizing an array of objects, aren't as effective for the specific task of bird classification. Therefore, we would retrain these layers on a bird-specific dataset, allowing the network to develop the high-level understanding necessary for distinguishing different bird species.
Transfer learning is a versatile technology with applications in various industries. Let's explore where it can be used.
Transfer learning is instrumental in improving machine learning models for natural language processing (NLP) tasks. It empowers models to detect and understand language elements, dialects, phrases, and vocabulary.
In computer vision, transfer learning takes pre-trained models and repurposes them for tasks involving smaller datasets or specific image features. It's handy for tasks such as object detection, where models can leverage the knowledge of identifying common objects or image structures.
Transfer learning has become indispensable in deep learning and neural networks. Training complex neural networks demands substantial computational resources and time. Transfer learning alleviates this burden by transferring useful features from one network to another, making it an efficient approach for model development. These transfer learning techniques find practical application across a wide range of industries.
Transfer learning is a shortcut for AI that changes how we teach machines to be more intelligent. It makes AI more effective in understanding human behavior, which means better Health and Fitness apps, self-driving cars, AI-ready smartphones, and shopping experiences. In the words of Mark Van Doren, "The art of teaching is the art of assisting discovery." Now, AI is doing both teaching and discovering for us.
Read the original post:
Transfer learning: Everything you need to know about the ML process - Android Police
Measuring CO2 with Machine Learning The Independent – The Indy Online
Artificial intelligence seems almost inescapable in today's increasingly technology-driven world.
Deep learning models, such as OpenAI's ChatGPT, have been at the forefront of public amazement and controversy since their mainstream introduction in late 2022.
Today, Fort Lewis College students are discovering new ways that artificial intelligence can be used to reduce the costs of studying the environment.
Lincoln Scheer, a third-year computer engineering student, said he is using machine learning to measure carbon dioxide levels in areas affected by wildfires.
While one goal of this project is to map carbon dioxide levels, the project also seeks to reduce the cost necessary for environmental science, he said.
"It's really important that we lower the costs for these sensors, he said. We need lower cost tools, because a lot of these communities don't have the funding.
So what is the price difference between these tools? Scheer says the $30,000 machines typically used in this study could eventually be replaced by inexpensive alternatives that cost $60.
Scheer said the inexpensive sensors are less accurate than their costlier counterparts, but can be calibrated with AI to match the results of high-end equipment.
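A sketch of what such AI-based calibration could look like, assuming hypothetical paired readings from a low-cost sensor and a reference analyzer; the features, model and numbers are illustrative, not the project's actual data or method:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
cheap_co2 = rng.uniform(380, 1200, n)   # low-cost sensor reading (ppm)
temp = rng.uniform(5, 35, n)            # ambient temperature (deg C)
humidity = rng.uniform(10, 90, n)       # relative humidity (%)

# Pretend the reference analyzer reading drifts with temperature and humidity.
reference_co2 = 1.05 * cheap_co2 - 0.8 * temp + 0.3 * humidity + rng.normal(0, 15, n)

X = np.column_stack([cheap_co2, temp, humidity])
X_train, X_test, y_train, y_test = train_test_split(X, reference_co2, random_state=0)

# Learn a correction that maps cheap-sensor readings onto the reference values.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("calibration R^2 on held-out readings:", round(model.score(X_test, y_test), 3))
```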
Dr. Joanna Casey, assistant professor of physics and engineering, agrees with the necessity for inexpensive alternatives.
"According to the World Health Organization, 7 million people die premature deaths due to air pollution," Casey said.
"Having low-cost tools to measure air quality and levels of pollution can help people understand and minimize their exposure, and face fewer and less severe health consequences," she said.
And for Durango, an area affected by wildfire smoke, students have a perfect testing ground, Scheer said.
While Scheer's project is about a year from completion, he is currently working to collect wildfire data, such as from the recent Perins Peak fire, he said.
However, this process of machine learning is slightly different from deep learning language models, such as the previously mentioned ChatGPT.
Anders Ladow, a third-year computer engineering major and recent AI collaborator with Scheer, said that machine learning models require human intervention.
"You have to define exactly what the machine learning algorithm is doing," he said. "What you give to it to analyze has to be really specific, and the algorithm can't make any changes to that data that you're feeding to the model."
The main difference between deep learning models, like ChatGPT, and Scheer's machine learning project is that deep learning models can actively change the data sets they have been fed, Ladow said.
Despite these differences, both models are very useful for data extraction, Ladow said.
Additionally, Casey said that air quality sensing systems using machine learning have already entered the market.
"We're standing on the shoulders of giants," Casey said. "What we're able to do now is move into more complex problems that would be difficult to model or understand without these tools."
Some of the problems that artificial intelligence could assist with involve analyzing complex visual data, such as security footage, Ladow said.
While tangible effects of artificial intelligence are likely a few years away, projects like Scheer's highlight the capabilities of machine learning.
See the rest here:
Measuring CO2 with Machine Learning The Independent - The Indy Online
A computer vision and machine learning system that monitors and controls workup processes – Phys.org
A team of chemists and engineers at the University of British Columbia, working with colleagues at pharmaceutical company Pfizer, has developed a chemical processing system combining computer vision with a real-time machine-learning monitoring system for use in conducting chemical workup processes. Their paper is published in the journal Chemical Science.
In chemistry, workup processes are activities conducted to isolate a pure product through selective separation from other components. It is often tedious, which, besides being unpleasant, leads to mistakes or omissions. In this new effort, the research team has attempted to automate the process by combining computer vision with real-time monitoring techniques, a machine-learning system and computer processing, along with appropriate hardware, to carry out a workup process without assistance from human chemists.
The system developed by the team, called Heinsight2.0, as its name suggests, builds on knowledge learned from its predecessor, Heinsight1.0. Its components include a webcam (either overhead or side-mounted), reaction vessel, dosing unit, temperature probe and overhead stirrer. It also has a secondary device that allows for displaying iControl, real-time reaction trends, EasyMax and CV model output.
The system works by monitoring a workup process and controlling it by sending signals at appropriate times to direct the action as it happens. The system controls the action by responding as a chemist would as events unfold. If a material changes from one desired color to another, for example, the system can recognize that and use it as a cue to instigate a follow-up action.
The researchers note that, like a human chemist, the system is capable of monitoring multiple sensory cues and responding to them in desired ways. It can also operate under many types of scenarios, such as those involving the use of solid-liquid mixing, crystallizations, exchange distillations and liquid-to-liquid extraction.
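A heavily simplified sketch of that idea in Python with OpenCV: watch a region of interest for a color change and trigger a follow-up action. The camera index, color thresholds and dosing call are placeholders, not the team's actual code:

```python
import cv2

def reaction_turned_yellow(frame, roi=(100, 100, 200, 200)) -> bool:
    """Return True when the region of interest is mostly yellow (a hypothetical cue)."""
    x, y, w, h = roi
    patch = frame[y:y + h, x:x + w]
    hsv = cv2.cvtColor(patch, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (20, 80, 80), (35, 255, 255))  # yellow-ish band in HSV
    return mask.mean() / 255 > 0.5

def start_next_dose():
    print("cue detected: triggering the next dosing step")  # placeholder action

cap = cv2.VideoCapture(0)  # webcam pointed at the reaction vessel
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    if reaction_turned_yellow(frame):
        start_next_dose()
        break
cap.release()
```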
They also note that they have made the program script publicly available, which means other chemists could build their own units and then use the code to run their systems in the same way. They also plan to continue work on their system to give it more capabilities.
More information: Rama El-khawaldeh et al, Keeping an "eye" on the experiment: computer vision for real-time monitoring and control, Chemical Science (2023). DOI: 10.1039/D3SC05491H
See original here:
A computer vision and machine learning system that monitors and controls workup processes - Phys.org
DOD’s cutting-edge research in AI and ML to improve patient care – DefenseScoop
The Defense Department's responsibility to its active and veteran service members extends to their health and well-being. One organization driving innovation for patient care is the DOD's Uniformed Services University. And within the university is a center known as the Surgical Critical Care Initiative, SC2i, a consortium of federal and non-federal research institutions.
In a recent panel discussion with DefenseScoop, Dr. Seth Schobel, scientific director for SC2i, shared how cutting-edge research in artificial intelligence and machine learning improves patient care. Schobel elaborated on one specific tool called the WounDx Clinical Decision Support Tool which predicts the best time for surgeons to close extremity wounds.
"[These wounds] are actually one of the most common combat casualty injuries experienced by our warfighters. We believe the use of these tools will allow military physicians to close most wounds faster, and it has the potential to save costs and avoid wound infections and other complications. We believe by using this tool we'll increase the success rate of military surgeons on closing these wounds at first attempt [improving rates] from 72% to 88% of the time," he explained.
Uniformed Services University's Chief Technology and Senior Information Security Officer, Sean Baker, joined Schobel on the panel to elaborate on how, when IT and medical research teams work together, they can drive better health outcomes in patient care.
"Overall, our job is to provide cutting-edge tools into the hands of clinical experts, recognizing that risk management does not mean risk avoidance. Clinical care is not going to advance without taking some measure of digital risks," he explained.
Baker added, "We need to continue to empower our users across the healthcare space, across government, to use these emerging capabilities in a risk-informed way to take this into the next level of education, of research, of care delivery."
Schobel and Baker both underlined AI and ML's disruptive potential to positively improve patient care in the near future.
"We need to be ready for this [disruptor] by understanding how these tools are built and how they apply in different clinical settings. This will dramatically improve a data-driven and evidence-based healthcare system," Schobel explained. "By embracing these considerations, the public health sector, as well as the military, can harness the power of AI and ML to enhance patient care and improve health outcomes, and really be at the forefront of that transformation for the future of healthcare."
Google's Francisco Rubio-Bertrand, who manages federal healthcare client business, reacted to the panel interview, saying: "We believe that Google, by leveraging its vast resources and expertise, can be a driving force in advancing research and healthcare. Through access to our powerful cloud computing platforms and extensive datasets, we can significantly accelerate the development of AI/ML models specifically designed to address pressing needs in the healthcare sector."
Watch the full discussion to learn more about driving better patient care and health outcomes with artificial intelligence and machine learning.
This video panel discussion was produced by Scoop News Group for DefenseScoop, and underwritten by Google for Government.
More:
DOD's cutting-edge research in AI and ML to improve patient care - DefenseScoop
Can Artificial Intelligence assist with cybersecurity management? | Womble Bond Dickinson – JDSupra – JD Supra
AI has great capability both to harm and to protect in a cybersecurity context. As with the development of any new technology, the benefits provided through correct and successful use of AI are inevitably coupled with the need to safeguard information and to prevent misuse.
ENISA published a set of reports earlier last year focused on AI and the mitigation of cybersecurity risks. Here we consider the main themes raised and provide our thoughts on how AI can be used advantageously*.
Using AI to bolster cybersecurity
In Womble Bond Dickinson's 2023 global data privacy law survey, half of respondents told us they were already using AI for everyday business activities ranging from data analytics to customer service assistance and product recommendations and more. However, alongside day-to-day tasks, AI's 'ability to detect and respond to cyber threats and the need to secure AI-based application' makes it a powerful tool to defend against cyber-attacks when utilized correctly. In one report, ENISA recommended a multi-layered framework which guides readers on the operational processes to be followed by coupling existing knowledge with best practices to identify missing elements. The step-by-step approach for good practice looks to ensure the trustworthiness of cybersecurity systems.
Utilizing machine-learning algorithms, AI is able to detect both known and unknown threats in real time, continuously learning and scanning for potential threats. Cybersecurity software which does not utilize AI can only detect known malicious code, making it insufficient against more sophisticated threats. By analyzing the behavior of malware, AI can pinpoint specific anomalies that standard cybersecurity programs may overlook. The deep-learning-based program NeuFuzz is considered a highly favorable platform for vulnerability searches in comparison to standard machine learning AI, demonstrating the rapidly evolving nature of AI itself and the products offered.
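As an illustration of the underlying idea, detecting anomalies rather than matching known signatures, here is a hedged sketch using scikit-learn's IsolationForest on made-up network-traffic features; it is not a production detector or any vendor's implementation:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Made-up traffic features per session: [bytes sent (KB), connections/minute, failed logins].
normal_traffic = rng.normal(loc=[500, 20, 0.2], scale=[100, 5, 0.4], size=(1000, 3))

# Train only on behavior assumed to be benign; no malware signatures are involved.
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)

# A burst of unusual activity is flagged (-1) even though it matches no known malicious code.
suspicious = np.array([[50_000, 400, 12]])
print(detector.predict(suspicious))  # [-1] means anomaly
```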
A key recommendation is that AI systems should be used as an additional element to existing ICT, security systems and practices. Businesses must be aware of the continuous responsibility to have effective risk management in place with AI assisting alongside for further mitigation. The reports do not set new standards or legislative perimeters but instead emphasize the need for targeted guidelines, best practices and foundations which help cybersecurity and in turn, the trustworthiness of AI as a tool.
Amongst other factors, cybersecurity management should consider accountability, accuracy, privacy, resiliency, safety and transparency. It is not enough to rely on traditional cybersecurity software, especially where AI can be readily implemented for prevention, detection and mitigation of threats such as spam, intrusion and malware detection. Traditional models do exist, but as ENISA highlights, they are usually designed to target or 'address specific types of attack', which 'makes it increasingly difficult for users to determine which are most appropriate for them to adopt/implement.' The report highlights that businesses need to have a pre-existing foundation of cybersecurity processes which AI can work alongside to reveal additional vulnerabilities. A collaborative network of traditional methods and new AI-based recommendations allows businesses to be best prepared against the ever-developing nature of malware and technology-based threats.
In the US in October 2023, the Biden administration issued an executive order with significant data security implications. Amongst other things, the executive order requires that developers of the most powerful AI systems share safety test results with the US government, that the government will prepare guidance for content authentication and watermarking to clearly label AI-generated content and that the administration will establish an advanced cybersecurity program to develop AI tools and fix vulnerabilities in critical AI models. This order is the latest in a series of AI regulations designed to make models developed in the US more trustworthy and secure.
Implementing security by design
A security by design approach centers efforts around security protocols from the basic building blocks of IT infrastructure. Privacy-enhancing technologies, including AI, assist security by design structures and effectively allow businesses to integrate necessary safeguards for the protection of data and processing activity, but should not be considered as a 'silver bullet' to meet all requirements under data protection compliance.
This will be most effective for start-ups and businesses in the initial stages of developing or implementing their cybersecurity procedures, as conceiving a project built around security by design will take less effort than adding security to an existing one. However, we are seeing rapid growth in the number of businesses using AI. More than one in five of our survey respondents (22%), for instance, started to use AI in the past year alone.
However, existing structures should not be overlooked and the addition of AI into current cybersecurity system should improve functionality, processing and performance. This is evidenced by AI's capability to analyze huge amounts of data at speed to provide a clear, granular assessment of key performance metrics. This high-level, high-speed analysis allows businesses to offer tailored products and improved accessibility, resulting in a smoother retail experience for consumers.
Risks
Despite the benefits, AI is by no-means a perfect solution. Machine-learning AI will act on what it has been told under its programming, leaving the potential for its results to reflect an unconscious bias in its interpretation of data. It is also important that businesses comply with regulations (where applicable) such as the EU GDPR, Data Protection Act 2018, the anticipated Artificial Intelligence Act and general consumer duty principles.
Cost benefits
Alongside reducing the cost of reputational damage from cybersecurity incidents, it is estimated that UK businesses who use some form of AI in their cybersecurity management reduced costs related to data breaches by £1.6m on average. Using AI or automated responses within cybersecurity systems was also found to have shortened the average breach lifecycle by 108 days, saving time, cost and significant business resource. Further development of penetration testing tools which specifically focus on AI is required to explore vulnerabilities and assess behaviors, which is particularly important where personal data is involved, as a company's integrity and confidentiality is at risk.
Moving forward
AI can be used to our advantage, but it should not be seen as entirely replacing existing or traditional models for managing cybersecurity. While AI is an excellent long-term assistant to save users time and money, it cannot be relied upon alone to make decisions directly. In this transitional period from more traditional systems, it is important to have a secure IT foundation. As WBD suggests in our 2023 report, having established governance frameworks and controls for the use of AI tools is critical for data protection compliance and an effective cybersecurity framework.
Despite suggestions that AI's reputation is degrading, it is a powerful and evolving tool which could not only improve your business's approach to cybersecurity and privacy but, through analysis of data, could also help to anticipate behaviors and predict trends. The use of AI should be exercised with caution, but if done correctly it could have immeasurable benefits.
___
* While a portion of ENISA's commentary is focused around the medical and energy sectors, the principles are relevant to all sectors.
See more here:
Can Artificial Intelligence assist with cybersecurity management? | Womble Bond Dickinson - JDSupra - JD Supra