
Quantum Machine Learning: Exploring the Intersection of New Frontiers – DataScientest

There are already several services available for quantum Machine Learning. IBM offers its Q Experience: an online platform for accessing various quantum processor prototypes via the Cloud. The service includes a circuit composer, and supports Python and Qiskit.

For its part, Rigetti Computing offers the Forest tool suite entirely dedicated to quantum computing. It includes a programming language and development tools.

Finally, Canadian startup Xanadu provides cloud access to a photonic quantum processor. This can handle chips with 8, 12 or 24 qubits.

Quantum Machine Learning is still in its infancy. However, a number of applications are already proving highly successful, and this disruptive technology should open up new opportunities in the future.

Originally posted here:
Quantum Machine Learning: Exploring the Intersection of New Frontiers - DataScientest


Wall Street’s Favorite Machine Learning Stocks? 3 Names That Could Make You Filthy Rich – InvestorPlace

Machine learning stocks receive a lot of love in 2024


United States equities are on the rise again in 2024. The S&P 500 and Nasdaq have appreciated 7.2% and 7.4%, respectively. While stocks may be back on the rise, equities investors may want to reconsider putting money in innovative companies. Given the traction AI-related technology companies got last year, machine learning stocks may also receive a lot of love in 2024.

Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to learn from data and experience without explicit programming. Over the past decade, the technology has garnered attention for its numerous applications. ML has also received positive attention from Wall Street. Below are three machine learning stocks that could make investors rich in the long term.


UiPath (NYSE:PATH) creates and implements software allowing customers to automate various business processes using robotic process automation (RPA) and artificial intelligence.

The UiPath Business Automation Platform enables employees to quickly build automations for both existing and new processes by using software robots to perform a myriad of repetitive tasks. These range from simply logging into applications or moving folders to extracting information from documents and updating information fields and databases. UiPath also provides a number of turnkey automation solutions, allowing the company to target customers in a variety of industries including banking, healthcare and manufacturing.

Last year, shares of PATH almost doubled. Since the start of the new year, there has been a pullback across the major indices and, of course, UiPath, at its frothy valuation, saw some selling pressure. The company's share price has fallen 7% YTD. Selling pressure has continued slightly after weaker-than-expected guidance in UiPath's Q4 2023 earnings report. Outside of guidance, the company beat both revenue and earnings estimates. Q4 revenue increased 31% YOY to $405 million, and annual recurring revenue increased 22% to $1.5 billion. The company also achieved its first quarter of GAAP profitability as a public company in the fourth quarter.

Strong financial figures, despite weaker-than-expected guidance, could make UiPath a strong performer in 2024.


It's hard to make a machine learning list without listing a semiconductor name, since semiconductors help machine learning programs to work the way they do. Advanced Micro Devices (NASDAQ:AMD) has built a range of advanced hardware for gaming and other computing applications. AMD's Radeon GPUs based on the RDNA 3 architecture now support desktop-level AI and machine learning workflows.

2024 will be a big year for AMD in terms of AI and ML computing. The chipmaker announced the MI300X GPU chipset almost a year ago in its second quarter 2023 earnings report. To follow that up, in the third-quarter earnings report, AMD announced it expects to sell $2 billion in AI chips next year. Because these AI chips are still in high demand in North America, Europe and Asia, AMD will likely reap a significant profit upon entering the space.

Wall Street, notably, is loving AMD's stock. Wall Street firms have recently begun to boost their target prices for the chipmaker. The investment bank Jefferies raised its target price for AMD to $200/share from $130/share. JPMorgan, Goldman Sachs, Baird and a host of other investment banks also made significant increases to their target prices in late January 2024. Moreover, Japanese bank Mizuho Securities has recently raised its target price from $200/share to $235/share.


Last on our list of machine learning stocks is Palantir Technologies (NYSE:PLTR). Palantir has received a lot of love from some on Wall Street and a number of retail investors. Shares have risen 37% YTD. For those who don't know, Palantir initially focused on serving the defense and intelligence sectors but has since expanded its customer base to include various industries such as healthcare, energy and finance. The company provides AI and ML-based data analytics tools for a number of businesses.

Most recently, Palantir has enjoyed a lot of attention due to its new AI Platform (AIP). AIP can deploy commercial and open-source large language models onto internally held data sets and, from there, recommend business processes and actions. Although I think Palantir has become overvalued, because many believe it's a fully grown AI company when it's only at the beginning, the company certainly has the potential to make investors money in the long term.

On the date of publication, Tyrik Torres did not have (either directly or indirectly) any positions in the securities mentioned in this article. The opinions expressed in this article are those of the writer, subject to the InvestorPlace.com Publishing Guidelines.

Tyrik Torres has been studying and participating in financial markets since he was in college, and he has a particular passion for helping people understand complex systems. His areas of expertise are semiconductor and enterprise software equities. He has work experience in both investing (public and private markets) and investment banking.

See more here:
Wall Street's Favorite Machine Learning Stocks? 3 Names That Could Make You Filthy Rich - InvestorPlace


GE HealthCare and Hartford renew imaging agreement around AI and machine learning – DOTmed HealthCare Business News

Hartford HealthCare HealthCenter - Southington (Photo courtesy of Hartford HealthCare)

The collaboration dates back to 2016 and includes AI and machine learning software deployments to enhance clinical expertise, as well as phased upgrades of Hartford HealthCare's CT, PET/CT, MR, X-ray, nuclear medicine, mammography, ultrasound, and OEC 3D surgical imaging C-arm solutions. GE HealthCare will also provide its most recent patient monitoring, anesthesia, maternal infant care, and diagnostic cardiology technologies.

As part of the agreement, GE HealthCare technicians will be available in-house for repairs and maintenance. Regular upgrades will be performed, including build-in-place upgrades to some existing MR, CT, PET/CT, and X-ray systems, which refresh older equipment while minimizing construction costs, waste, equipment downtime, and disruptions to patient care.

"This is especially important now, as technologies, equipment, and training are advancing at an ever-increasing pace," said Karen Goyette, executive vice president and chief strategy and transformation officer, in a statement.

Hartford HealthCare is made up of nearly 500 locations, including two tertiary-level teaching hospitals, an acute-care community teaching hospital, an acute-care hospital and trauma center, three community hospitals, a behavioral health network, a multispecialty physician group, a clinical care organization, a regional home care system, an array of senior care services, a mobile neighborhood health program and a comprehensive physical therapy and rehabilitation network. It serves 185 towns and cities.

Many of the software and AI solutions will be deployed within various imaging modalities to accelerate speed of use and improve accuracy, including:

X-ray: GE HealthCare's Critical Care Suite 2.0 will assess scans for signs of critical conditions, such as collapsed lungs or errors in chest X-ray acquisition, using AI-powered insights and analytics, and provide feedback to ICU clinicians to help expedite diagnosis, optimize treatment decisions, and improve patient outcomes.

CT: GE HealthCare's TrueFidelity CT image-reconstruction technology is powered by a deep neural network that improves reading confidence for head, whole-body, cardiovascular, and other anatomical applications for patients of all ages.

MR: Using AI, AIR Recon DL technology reconstructs MR images, improving the quality, speed, and workflow of the scanning process by reducing artifacts, increasing clarity, and facilitating faster acquisitions. This, in turn, improves patient comfort.

As part of the initial collaboration, the jointly created Care Logistics Center, formed in 2017, will match patients based on their needs with the best care regimens.

The renewal extends the collaboration to 2030.

See the original post here:
GE HealthCare and Hartford renew imaging agreement around AI and machine learning - DOTmed HealthCare Business News


EASA Discusses Autonomous Operations in New Artificial Intelligence Paper – Inside Unmanned Systems

The European Union Aviation Safety Agency (EASA) has published Issue 2 of its Concept Paper on Artificial Intelligence (AI) and Machine Learning (ML).

AI is being adopted widely and rapidly, including in the aviation domain, with its development significantly accelerating in the last decade due to a rising capacity to collect and store massive amounts of data. Increasing computing power and the development of ever more potent algorithms and architectures are also playing a role, affecting aviation products, services and business plans.

In its new "Artificial Intelligence Concept Paper Issue 02: Guidance for Level 1 & 2 machine learning applications," EASA lays out a number of autonomous operations-related scenarios that are likely to become relevant in the near future. To cite an example, the report describes an ongoing innovation partnership contract (IPC) between Boeing and EASA involving an experimental auto-taxi system.

As currently envisaged, the system would receive, via standard radio communication, taxi clearance from ground control, provide a readback of the clearance, and plan an appropriate ground taxiing route based on that clearance. The system then executes the plan and autonomously controls the aircraft as it travels from one location to another at an airfield, such as from the boarding gate to the departure runway. While executing the plan, the system detects potential obstacles in the aircraft's path, to which it can then react accordingly. The system employs a LIDAR system for the detection of obstacles. Optical cameras can also be added to the sensor array for object classification, to support improved awareness and intent prediction capabilities for objects and people in the environment. System operations are monitored by the flight crew, who retain the ability to override and disconnect the system at any time.

More widely, the new Concept Paper focuses on strengthening four aviation pillars (safety, efficiency, sustainability, and passenger experience) while positioning ML at the forefront of aviation innovation. EASA acknowledges that the path to ML deployment brings unique challenges, particularly in terms of safeguarding operational safety.

The Concept Paper refines EASA guidance for Level 1 AI applications, i.e. those enhancing human capabilities, while broadening the discussion on topics such as learning assurance, AI explainability and ethics-based assessment. It also provides comprehensive guidance for the development and deployment of Level 2 AI-based systems. Level 2 AI includes the groundbreaking concept of human-AI teaming (HAT), setting the stage for AI systems that automatically make decisions under human oversight.

With the paper, EASA highlights its commitment to a future where AI and ML are fully integrated into aviation systems, while emphasizing the building of trust in AI applications, ensuring they complement human expertise and enhance overall aviation safety and sustainability.

As an independent and neutral body, EASA works to ensure confidence in safe air operations in Europe and worldwide by proposing and formulating rules, standards and guidance; certifying aircraft, parts, and equipment; and endorsing and overseeing organizations in all aviation domains.

Read the original here:
EASA Discusses Autonomous Operations in New Artificial Intelligence Paper - Inside Unmanned Systems


Self-supervised learning: What is it? How does it work? – DataScientest

In the case of Natural Language Processing (NLP), we use self-supervised learning to train the model on sentences from which words have been randomly omitted. It must then predict these removed words.

This method, applied to NLP, has proved effective and highly relevant. For example, the wav2vec and BERT models developed respectively by Facebook and Google AI are among the most revolutionary in NLP. Wav2vec has proved its worth in the field of Automatic Speech Recognition (ASR).

With wav2vec, certain parts of the audio are masked and the model is trained to predict these parts. BERT, an acronym for Bidirectional Encoder Representations from Transformers, is a Deep Learning model that offered state-of-the-art results on many NLP tasks when it was released.

Unlike previous models, which scan text in a single direction to predict the next word, the BERT algorithm hides words randomly in the sentence and tries to predict them. To do this, it uses the full context of the sentence, both left and right.
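The masking step described above can be illustrated with a minimal sketch. It only shows how training pairs are built from unlabeled text; it is not BERT's actual preprocessing, which is more elaborate. The function name `mask_tokens`, the `[MASK]` placeholder string and the masking probability are assumptions chosen for illustration.

```python
import random

MASK = "[MASK]"  # placeholder token (illustrative choice)

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Hide tokens at random and remember what was hidden.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token the model would have to predict.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets

sentence = "the model must predict the removed words".split()
masked, targets = mask_tokens(sentence, mask_prob=0.3, seed=0)
print(masked)   # input shown to the model
print(targets)  # labels derived from the text itself, no human annotation
```

The labels come from the data itself, which is what makes the setup self-supervised.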

Read more:
Self-supervised learning: What is it? How does it work? - DataScientest


Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in … – Nature.com

The FE-related traits and genomic information were obtained for 1,156 animals from an experimental breeding program at the Beef Cattle Research Center (Institute of Animal Science, IZ).

Animals were from an experimental breeding program at the Beef Cattle Research Center at the Institute of Animal Science (IZ) in Sertãozinho, São Paulo, Brazil. Since the 1980s, the experimental station has maintained three selection herds: Nellore Control (NeC), with animals selected for yearling body weight (YBW) with a selection differential close to zero, within birth year and herd; and Nellore Selection (NeS) and Nellore Traditional (NeT), with animals selected for YBW with a maximum selection differential, also within birth year and herd [25]. In the NeT herd, sires from commercial herds or from NeS were occasionally used in the breeding season, while NeC and NeS were closed herds (only sires from the same herd were used in the breeding season), with the inbreeding rate controlled by planned matings. In addition, the NeT herd has been selected for lower residual feed intake (RFI) since 2013. In the three herds, animal selection is based on YBW measured at 378 days of age in young bulls.

The FE-related traits were evaluated on 1,156 animals born between 2004 and 2015 in a feeding efficiency trial, in which they were either housed in individual pens (683 animals) or group pens equipped with the GrowSafe feeding system (473 animals), with animals grouped by sex. Of those, 146 animals were from the NeC herd (104 young bulls and 42 heifers), 300 from the NeS herd (214 young bulls and 86 heifers), and 710 from the NeT herd (483 young bulls and 227 heifers). Both feeding trials comprised at least 21 days for adaptation to the feedlot diet and management and at least 56 days for the data collection period. At the end of the feeding trial, the young bulls and heifers had average ages of 366 ± 27.5 and 384 ± 45.4 days, respectively.

A total of 780 animals were genotyped with the Illumina BovineHD BeadChip assay (770k, Illumina Inc., San Diego, CA, USA), while 376 animals were genotyped with the GeneSeek Genomic Profiler (GGP Indicus HD, 77K). The animals genotyped with the GGP chip were imputed to the HD panel using FImpute v.3 [26] with an expected accuracy higher than 0.97. Autosomal SNP markers with a minor allele frequency (MAF) lower than 0.10 and a significant deviation from Hardy-Weinberg equilibrium (\(P \le 10^{-5}\)) were removed, and markers and samples with a call rate lower than 0.95 were also removed. The 10% MAF threshold was used to remove less informative markers and noise in a stratified population. After this quality control procedure, genotypes from 1,024 animals and 305,128 SNP markers remained for GS analyses. Population substructure was evaluated using a principal component analysis (PCA) based on the genomic relationship matrix using the ade4 R package (Supplementary Figure S1) [27].
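The marker-level quality control described above can be sketched as follows. This is an illustrative filter, not the authors' pipeline: the function name `qc_markers`, the 0/1/2 coding with `NaN` for missing calls, the use of a one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium, and the choice to apply each criterion independently are all assumptions.

```python
import math
import numpy as np

def qc_markers(geno, maf_min=0.10, hwe_p_min=1e-5, call_rate_min=0.95):
    """Return a boolean mask of markers that pass MAF, HWE and call-rate filters.

    geno: (n_animals, n_markers) array coded 0/1/2, with np.nan for missing calls.
    """
    geno = np.asarray(geno, dtype=float)
    called = ~np.isnan(geno)
    n_called = called.sum(axis=0)
    call_rate = n_called / geno.shape[0]

    # Frequency of the counted allele, using called genotypes only.
    p = np.nansum(geno, axis=0) / (2.0 * np.maximum(n_called, 1))
    maf = np.minimum(p, 1.0 - p)

    # Observed vs. expected genotype counts under Hardy-Weinberg equilibrium.
    observed = np.stack([(geno == g).sum(axis=0) for g in (0, 1, 2)]).astype(float)
    expected = np.stack([(1 - p) ** 2, 2 * p * (1 - p), p ** 2]) * n_called
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(expected > 0, (observed - expected) ** 2 / expected, 0.0)
    chi2 = terms.sum(axis=0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    hwe_p = np.array([math.erfc(math.sqrt(x / 2.0)) for x in chi2])

    return (maf >= maf_min) & (hwe_p > hwe_p_min) & (call_rate >= call_rate_min)
```

A sample-level call-rate filter would follow the same pattern along the other axis.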

Animals were weighed without fasting at the beginning and end of the feeding trial, as well as every 14 days during the experimental period. The mixed ration (dry corn grain, corn silage, soybean, urea, and mineral salt) was offered ad libitum and formulated with 67% of total digestible nutrients (TDN) and 13% of crude protein (CP), aiming for an average daily gain (ADG) of 1.1 kg.

The following feed efficiency-related traits were evaluated: ADG, dry matter intake (DMI), feed efficiency (FE), and RFI. In the individual pens, the orts were weighed daily in the morning before the feed delivery to calculate the daily dietary intake. In the group pens, the GrowSafe feeding system automatically recorded the feed intake. Thus, the DMI (expressed as kg/day) was estimated as the feed intake by each animal with subsequent adjustments for dry matter content. ADG was estimated as the slope of the linear regression of body weight (BW) on feeding trial days, and the FE was expressed as the ratio of ADG and DMI. Finally, RFI was calculated within each contemporary group (CG), as the difference between the observed and expected feed intake considering the average metabolic body weight (MBW) and ADG of each animal (Koch et al., 1963) as follows:

$$DMI=CG+\beta_{0}+\beta_{1}ADG+\beta_{2}MBW+\varepsilon$$

where \(\beta_{0}\) is the model intercept, \(\beta_{1}\) and \(\beta_{2}\) are the linear regression coefficients for \(ADG\) and \(MBW=BW^{0.75}\), respectively, and \(\varepsilon\) is the residual of the equation representing the RFI estimate.
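As a rough illustration of the two regressions described above (ADG as a slope, RFI as a residual), the sketch below uses ordinary least squares in NumPy. The function names and the dummy-variable handling of contemporary groups are assumptions; the intercept is absorbed into the contemporary-group columns, since a full set of group indicators already spans it.

```python
import numpy as np

def average_daily_gain(days, weights):
    """ADG as the slope of the linear regression of body weight on trial day."""
    slope, _intercept = np.polyfit(days, weights, deg=1)
    return slope

def residual_feed_intake(dmi, adg, bw, cg):
    """RFI as the residual of DMI regressed on CG, ADG and MBW = BW^0.75."""
    dmi = np.asarray(dmi, dtype=float)
    mbw = np.asarray(bw, dtype=float) ** 0.75
    cg = np.asarray(cg)
    groups = np.unique(cg)
    # One indicator column per contemporary group (these also act as the intercept).
    cg_dummies = (cg[:, None] == groups[None, :]).astype(float)
    X = np.column_stack([cg_dummies, np.asarray(adg, dtype=float), mbw])
    coef, *_ = np.linalg.lstsq(X, dmi, rcond=None)
    return dmi - X @ coef  # observed minus expected intake
```

Animals with negative RFI ate less than expected for their weight and gain, which is why lower RFI indicates better efficiency.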

The contemporary groups (CG) were defined by sex, year of birth, type of feed trial pen (individual or collective) and selection herd. Phenotypic observations falling outside the interval of ±3.5 standard deviations around the mean of each CG for each trait were excluded, and the number of animals per CG ranged from 10 to 70.

The (co)variance components and heritability for FE-related traits were estimated considering a multi-trait GBLUP (MTGBLUP) as follows:

$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{a}+\mathbf{e},$$

where \(\mathbf{y}\) is the matrix of phenotypic FE-related traits (ADG, FE, DMI, and RFI) of dimension \(N \times 4\) (N individuals and four traits); \(\boldsymbol{\beta}\) is the vector of fixed effects, linear and quadratic effects of cow age, and linear effect of the animal's age at the beginning of the test; \(\mathbf{a}\) is the vector of additive genetic effects (breeding values) of the animals; and \(\mathbf{e}\) is a vector with the residual terms. \(\mathbf{X}\) and \(\mathbf{Z}\) are the incidence matrices related to the fixed (\(\boldsymbol{\beta}\)) and random (\(\mathbf{a}\)) effects, respectively. It was assumed that the random effects of animals and residuals were normally distributed, as \(\mathbf{a}\sim \text{N}(0,\mathbf{G}\otimes \mathbf{S}_{\mathbf{a}})\) and \(\mathbf{e}\sim \text{N}(0,\mathbf{I}\otimes \mathbf{S}_{\mathbf{e}})\), where \(\mathbf{G}\) is the additive genomic relationship matrix between genotyped individuals according to VanRaden [28], \(\mathbf{I}\) is an identity matrix, \(\otimes\) is the Kronecker product, and \(\mathbf{S}_{\mathbf{a}}=\left[\begin{array}{ccc}\sigma_{\text{a}1}^{2} & \cdots & \sigma_{\text{a}1,4}\\ \vdots & \ddots & \vdots \\ \sigma_{\text{a}1,4} & \cdots & \sigma_{\text{a}4}^{2}\end{array}\right]\) and \(\mathbf{S}_{\mathbf{e}}=\left[\begin{array}{ccc}\sigma_{\text{e}1}^{2} & \cdots & \sigma_{\text{e}1,4}\\ \vdots & \ddots & \vdots \\ \sigma_{\text{e}1,4} & \cdots & \sigma_{\text{e}4}^{2}\end{array}\right]\) are the additive genetic and residual (co)variance matrices, respectively. The \(\mathbf{G}\) matrix was obtained according to VanRaden [28]: \(\mathbf{G}=\frac{\mathbf{M}\mathbf{M}^{\prime}}{2\sum_{j=1}^{m}p_{j}\left(1-p_{j}\right)}\), where \(\mathbf{M}\) is the SNP marker matrix with codes 0, 1, and 2 for genotypes AA, AB, and BB, adjusted for allele frequency expressed as \(2p_{j}\), and \(p_{j}\) is the frequency of the second allele at the j-th SNP marker.
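The genomic relationship matrix formula above translates almost directly into NumPy. The sketch below assumes complete genotypes coded 0/1/2 and allele frequencies estimated from the same sample; the function name `vanraden_g` is illustrative.

```python
import numpy as np

def vanraden_g(geno):
    """Genomic relationship matrix G = M M' / (2 * sum_j p_j (1 - p_j)).

    geno: (n_animals, n_markers) array coded 0/1/2 (copies of the second allele).
    """
    geno = np.asarray(geno, dtype=float)
    p = geno.mean(axis=0) / 2.0          # frequency of the second allele per marker
    M = geno - 2.0 * p                   # center each marker by 2 * p_j
    return (M @ M.T) / (2.0 * np.sum(p * (1.0 - p)))

geno = np.array([[0, 1, 2],
                 [1, 1, 0],
                 [2, 0, 1],
                 [1, 2, 1]])
G = vanraden_g(geno)
print(G.shape)  # (4, 4): one row and column per animal
```

Because the markers are centered with sample frequencies, every row of `G` sums to zero, which is a quick sanity check on an implementation.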

The analyses were performed using the restricted maximum likelihood (REML) method through the airemlf90 software [29]. The predictf90 software [29] was used to obtain the phenotypes adjusted for the fixed effects and covariates (\(\mathbf{y}^{*}=\mathbf{y}-\mathbf{X}\widehat{\boldsymbol{\beta}}\)). The adjusted phenotypes were used as the response variable in the genomic predictions.

The GEBV accuracy (\(\text{Acc}_{\text{GEBV}}\)) in the whole population was calculated based on the prediction error variance (PEV) and the genetic variance for each FE-related trait (\(\sigma_{\text{a}}^{2}\)) using the following equation [30]: \(\text{Acc}=1-\sqrt{\text{PEV}/\sigma_{\text{a}}^{2}}\).

A forward validation scheme was applied for computing the prediction accuracies using machine learning and parametric methods, splitting the dataset based on year of birth, with animals born between 2004 and 2013 assigned as the reference population (n = 836) and those born in 2014 and 2015 (n = 188) as the validation set. For ML approaches, we randomly split the training dataset into five folds to train the models.
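The forward split by birth year, followed by a random five-fold partition of the reference population, could look like the sketch below. The function names, the fixed seed and the toy birth years are assumptions for illustration only.

```python
import random

def forward_split(birth_years, last_reference_year=2013):
    """Indices of reference animals (born up to the cutoff) and validation animals."""
    reference = [i for i, year in enumerate(birth_years) if year <= last_reference_year]
    validation = [i for i, year in enumerate(birth_years) if year > last_reference_year]
    return reference, validation

def k_folds(indices, k=5, seed=0):
    """Randomly partition indices into k folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

birth_years = [2004, 2007, 2010, 2012, 2013, 2013, 2014, 2015]  # toy data
reference, validation = forward_split(birth_years)
folds = k_folds(reference, k=5)
```

Splitting by year rather than at random mimics the practical setting, where older animals are used to predict younger ones.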

Genomic prediction for FE-related traits considering the STGBLUP can be described as follows:

$$\mathbf{y}^{*}=\boldsymbol{\mu}+\mathbf{Z}\mathbf{a}+\mathbf{e}$$

where \(\mathbf{y}^{*}\) is the \(N \times 1\) vector of adjusted phenotypic values for FE-related traits; \(\mu\) is the model intercept; \(\mathbf{Z}\) is the incidence matrix connecting observations to animals; \(\mathbf{a}\) is the vector of predicted values, assumed to follow a normal distribution given by \(\text{N}(0,\mathbf{G}\sigma_{a}^{2})\); and \(\mathbf{e}\) is the \(N \times 1\) vector of residual values, considered normally distributed as \(\text{N}(0,\mathbf{I}\sigma_{e}^{2})\), in which \(\mathbf{I}\) is an identity matrix and \(\sigma_{e}^{2}\) is the residual variance. The STGBLUP model was fitted using the blupf90+ software [29].
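For intuition, the single-trait model above has a closed-form solution when the variance ratio is treated as known. The sketch below assumes one record per animal (so \(\mathbf{Z}=\mathbf{I}\)) and a heritability supplied by the user; it is not how blupf90+ works internally, and the function name `gblup` is hypothetical.

```python
import numpy as np

def gblup(y, G, h2):
    """BLUP of breeding values for y* = mu + a + e with a ~ N(0, G * sigma_a^2).

    h2 is the assumed heritability, so lambda = sigma_e^2 / sigma_a^2 = (1 - h2) / h2.
    Returns the GLS estimate of mu and the predicted breeding values.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    lam = (1.0 - h2) / h2
    V = G + lam * np.eye(n)                      # phenotypic covariance up to sigma_a^2
    ones = np.ones(n)
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    a_hat = G @ np.linalg.solve(V, y - mu)       # a_hat = G V^{-1} (y - mu)
    return mu, a_hat
```

With unrelated animals (`G` equal to the identity) and `h2 = 0.5`, each breeding value is simply half the deviation from the mean, which shows the shrinkage the model applies.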

Genomic prediction for FE-related traits considering MTGBLUP can be described as follows:

$$\mathbf{y}^{*}=\boldsymbol{\mu}+\mathbf{Z}\mathbf{a}+\mathbf{e}$$

where \(\mathbf{y}^{*}\) is the matrix of adjusted phenotypes of dimension \(N \times 4\); \(\mu\) is the trait-specific intercept vector; \(\mathbf{Z}\) is the incidence matrix for the random effect; and \(\mathbf{a}\) is an \(N \times 4\) matrix of predicted values, assumed to follow a normal distribution given by \(\text{MVN}(0,\mathbf{G}\otimes \mathbf{S}_{\mathbf{a}})\), where \(\mathbf{S}_{\mathbf{a}}\) represents the genetic (co)variance matrix for the FE-related traits (4 × 4). The residual effects (\(\mathbf{e}\)) were considered normally distributed as \(\text{MVN}(0,\mathbf{I}\otimes \mathbf{S}_{\mathbf{e}})\), in which \(\mathbf{I}\) is an identity matrix and \(\mathbf{S}_{\mathbf{e}}\) is the residual (co)variance matrix for FE-related traits (4 × 4). The MTGBLUP was implemented in the BGLR R package [14] considering a Bayesian GBLUP with a multivariate Gaussian model with an unstructured (co)variance matrix between traits (\(\mathbf{S}_{\mathbf{a}}\)), using Gibbs sampling with 200,000 iterations, including 20,000 samples as burn-in and a thinning interval of 5 cycles. Convergence was checked by visual inspection of trace plots and distribution plots of the residual variance.
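To make the chain settings concrete, the short sketch below counts how many posterior samples are kept after burn-in and thinning. It assumes the common convention that every `thin`-th iteration after the burn-in is retained; the helper name is illustrative.

```python
def retained_samples(n_iter, burn_in, thin):
    """Number of MCMC draws kept after discarding burn-in and thinning."""
    return len(range(burn_in, n_iter, thin))

print(retained_samples(200_000, 20_000, 5))  # 36000
```

Under that convention, the settings reported above leave 36,000 draws for posterior summaries.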

Five Bayesian regression models with different priors were used for GS analyses: Bayesian ridge regression (BRR), Bayesian Lasso (BL), BayesA, BayesB, and BayesC. The Bayesian algorithms for GS were implemented using the R package BGLR version 1.09 [14]. The BGLR default priors were used for all models, with 5 degrees of freedom (\(\text{df}_{u}\)), a scale parameter (S), and \(\pi\). The Bayesian analyses were performed considering Gibbs sampling chains of 200,000 iterations, with the first 20,000 iterations excluded as burn-in and a sampling interval of 5 cycles. Convergence was checked by visual inspection of trace plots and distribution plots of the residual variance. For Bayesian regression methods, the general model can be described as follows:

$$\mathbf{y}^{*}=\mu+\sum_{w=1}^{p}x_{iw}u_{w}+e_{i}$$

where \(\mu\) is the model intercept; \(x_{iw}\) is the genotype of the i-th animal at locus w (coded as 0, 1, and 2); \(u_{w}\) is the SNP marker effect (additive) of the w-th SNP (p = 305,128); and \(e_{i}\) is the residual effect associated with the observation of the i-th animal, assumed to be normally distributed as \(\mathbf{e}\sim \text{N}(0,\mathbf{I}\sigma_{e}^{2})\).

The BRR method [14] assumes a Gaussian prior distribution for the SNP marker effects (\(u_{w}\)), with a common variance (\(\sigma_{u}^{2}\)) across markers, so that \(p\left(u_{1},\dots,u_{w}|\sigma_{u}^{2}\right)=\prod_{w=1}^{p}\text{N}(u_{w}|0,\sigma_{u}^{2})\). The variance of SNP marker effects is assigned a scaled-inverse Chi-squared distribution [\(p(\sigma_{u}^{2})=\chi^{-2}(\sigma_{u}^{2}|\text{df}_{u},\text{S}_{u})\)], and the residual variance is also assigned a scaled-inverse Chi-squared distribution with degrees of freedom (\(\text{df}_{e}\)) and scale parameter (\(\text{S}_{e}\)).
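A useful way to read the BRR prior: if both variances were known, the posterior mean of the marker effects would coincide with the classical ridge regression solution with penalty \(\lambda=\sigma_{e}^{2}/\sigma_{u}^{2}\). The sketch below shows only that special case, with a flat prior on the intercept handled by centering; the full method samples the variances as well. The function name and the toy inputs are assumptions.

```python
import numpy as np

def brr_posterior_mean(X, y, sigma2_e, sigma2_u):
    """Posterior mean of marker effects under u_w ~ N(0, sigma2_u), variances known.

    Equivalent to ridge regression with penalty lambda = sigma2_e / sigma2_u.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centering absorbs the intercept
    yc = y - y.mean()
    lam = sigma2_e / sigma2_u
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
```

A small \(\sigma_{u}^{2}\) (large penalty) shrinks all effects toward zero by the same factor, which is the uniform shrinkage that the other priors in this section relax.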

Bayesian Lasso (BL) regression [31] uses an idea from Tibshirani [32] to connect the LASSO (least absolute shrinkage and selection operator) method with Bayesian analysis. In the BL, the source of variation is split into a residual term (\(\sigma_{e}^{2}\)) and variation due to SNP markers (\(\sigma_{u_{w}}^{2}\)). The prior distribution for the additive effect of the SNP marker \(\left[p\left(u_{w}|\tau_{j}^{2},\sigma_{e}^{2}\right)\right]\) follows a Gaussian distribution with marker-specific prior variance, given by \(p\left(u_{w}|\tau_{j}^{2},\sigma_{e}^{2}\right)=\prod_{w=1}^{p}\text{N}\left(u_{w}|0,\tau_{j}^{2}\sigma_{e}^{2}\right)\). This prior distribution leads to marker-specific shrinkage of the effects, whose extent depends on the variance parameters \(\left(\tau_{j}^{2}\right)\). The variance parameters \(\left(\tau_{j}^{2}\right)\) are assigned independent and identically distributed exponential priors, \(p\left(\tau_{j}^{2}|\lambda\right)=\prod_{j=1}^{p}\text{Exp}\left(\tau_{j}^{2}|\lambda^{2}\right)\), and the squared regularization parameter (\(\lambda^{2}\)) follows a Gamma distribution (\(p\left(\lambda^{2}\right)=\text{Gamma}(r,\theta)\)), where r and \(\theta\) are the rate and shape parameters, respectively [31]. Thus, the marginal prior for SNP markers is given by a double exponential (DE) distribution as follows: \(p\left(u_{w}|\lambda\right)=\int \text{N}\left(u_{w}|0,\tau_{j}^{2}\sigma_{e}^{2}\right)\text{Exp}\left(\tau_{j}^{2}|\lambda^{2}\right)d\tau_{j}^{2}\), where the DE distribution places a higher density at zero and has thicker tails, inducing stronger shrinkage of estimates for markers with relatively small effects and less shrinkage for markers with substantial effects. The residual variance (\(\sigma_{e}^{2}\)) is assigned a scaled inverse chi-squared prior density, with degrees of freedom \(\text{df}_{e}\) and scale parameter \(\text{S}_{e}\).

The BayesA method [14,33] considers a Gaussian distribution with null mean as the prior for SNP marker effects (\(u_{w}\)), with a SNP marker-specific variance (\(\sigma_{w}^{2}\)). The variance associated with each marker effect is assigned a scaled inverse chi-square prior distribution, \(p\left(\sigma_{w}^{2}\right)=\chi^{-2}\left(\sigma_{w}^{2}|\text{df}_{u},\text{S}_{u}^{2}\right)\), with degrees of freedom (\(\text{df}_{u}\)) and scale parameter (\(\text{S}_{u}^{2}\)) treated as known [14]. Thus, BayesA places a t-distribution on the marker effects, i.e., \(p\left(u_{w}|\text{df}_{u},\text{S}_{u}^{2}\right)=\text{t}\left(0,\text{df}_{u},\text{S}_{u}^{2}\right)\), providing a thicker-tailed distribution compared to the Gaussian and allowing a higher probability of moderate to large SNP effects.

BayesB assumes that a known proportion of SNP markers have a null effect (i.e., a point mass at zero), and that a subset of markers have a non-null effect that follows univariate t-distributions [3,12], as follows:

$$p\left(u_{w}|\text{df},\pi,\text{df}_{u},S_{B}^{2}\right)=\left\{\begin{array}{cc}0 & \text{with probability } \pi\\ \text{t}\left(u_{w}|\text{df}_{u},S_{B}^{2}\right) & \text{with probability } \left(1-\pi\right)\end{array}\right.$$

where \(\pi\) is the proportion of SNP markers with null effect, and \(1-\pi\) is the probability of SNP markers with non-null effect contributing to the variability of the FE-related trait [3]. Thus, the prior distribution assigned to the variance of SNP markers with non-null effects is a scaled inverse chi-square distribution.
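Drawing marker effects from a BayesB-style prior makes the mixture easy to see. The sketch below is only a prior simulation, not the Gibbs sampler used for fitting. It assumes the scaled inverse chi-square is parameterized as \(\sigma_{w}^{2}=\text{df}_{u}S^{2}/\chi^{2}_{\text{df}_{u}}\), and the values of `pi`, `df_u` and `s2` are illustrative.

```python
import numpy as np

def sample_bayesb_prior(n_markers, pi, df_u, s2, rng):
    """Draw marker effects: zero with probability pi, scaled-t otherwise.

    The t-distribution arises by drawing a marker-specific variance from a
    scaled inverse chi-square and then a normal effect given that variance.
    """
    is_null = rng.random(n_markers) < pi
    sigma2 = df_u * s2 / rng.chisquare(df_u, size=n_markers)
    effects = rng.normal(0.0, np.sqrt(sigma2))
    effects[is_null] = 0.0
    return effects

rng = np.random.default_rng(42)
u = sample_bayesb_prior(n_markers=10_000, pi=0.9, df_u=5, s2=0.01, rng=rng)
print(np.mean(u == 0.0))  # close to pi
```

Most simulated effects are exactly zero, while the few non-null effects can be large because of the heavy tails.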

The BayesC method [34] assumes a spike-and-slab prior for marker effects, which is a mixture distribution in which a proportion \(\pi\) of SNP markers have a null effect, whereas a proportion \(1-\pi\) of markers have effects sampled from a normal distribution. The prior distribution is as follows:

$$p\left(u_{w},\sigma_{w}^{2},\pi\right)=\left\{\prod_{j=1}^{w}\left[\pi\left(u_{w}=0\right)+\left(1-\pi\right)\text{N}(0,\sigma_{w}^{2})\right]*\chi^{-2}\left(\sigma_{w}^{2}|\text{df}_{u},\text{S}_{B}^{2}\right)*\beta\left(\pi|p_{0},\pi_{0}\right)\right\},$$

where \(\sigma_{w}^{2}\) is the common variance for marker effects, \(\text{df}_{u}\) and \(\text{S}_{B}^{2}\) are the degrees of freedom and scale parameter, respectively, and \(p_{0}\) and \(\pi_{0}\in[0,1]\) are the prior shape parameters of the beta distribution.

Two machine learning (ML) algorithms were applied for genomic prediction: Multi-layer Neural Network (MLNN) and support vector regression (SVR). The ML approaches were used to relax the standard assumption adopted in the linear methods, which are restricted to additive genetic effects of markers without considering more complex gene action modes. Thus, ML methods are expected to improve predictive accuracy for different target traits. To identify the best combination of hyperparameters (i.e., parameters that must be tuned to control the learning process to obtain a model with optimal performance) in the supervised ML algorithms (MLNN and SVR), we performed a random grid search by splitting the reference population from the forward scheme into five folds [35].

In MLNN, handling a large genomic dataset, such as 305,128 SNPs, is difficult due to the large number of parameters that need to be estimated, leading to a significant increase in computational demand36. Therefore, an SNP pre-selection strategy based on GWAS results in the training population using an MTGBLUP method (Fig.1A) was used to reduce the number of markers to be considered as input on the MLNN. In addition, this strategy can remove noise information in the genomic data set. In this study, the traits displayed major regions explaining a large percentage of genetic variance, which makes using pre-selected markers useful37.

Fig. 1. (A) Manhattan plot of the percentage of genetic variance explained by each SNP marker, estimated through multi-trait GWAS in the training population and used as the pre-selection strategy for the multi-layer neural network. (B) General representation of the neural network with two hidden layers used to model nonlinear dependencies between trait and SNP marker information. The input layer (\(X=x_{i,p}\)) considered in the neural network refers to the SNP marker information (coded as 0, 1, and 2) of the ith animal. The selected node represents the initial weight (\(W=w_{p}\)), assigned as random values between -0.5 and 0.5, connecting each input node to the first hidden layer. In the second layer, \(w_{up}\) refers to the output weight from the first hidden layer, and b represents the bias, which helps to control the values in the activation function. The output (\(\widehat{y}\)) layer represents a weighted sum of the input features mapped in the second layer.

The MLNN model can be described as a two-step regression38. The MLNN approach consists of three different layer types: input layer, hidden layer, and output layer. The input layer receives the input data, i.e., SNP markers. The hidden layer contains mapping processing units, commonly called neurons, where each neuron in the hidden layer computes a non-linear function (activation) of the weighted sum of nodes on the previous layer. Finally, the output layer provides the outcomes of the MLNN. Our proposed MLNN architecture comprises two fully connected hidden layers, schematically represented in Fig.1B. The input layer in the MLNN considered SNP markers that explained more than 0.125% of the genetic variance for FE-related traits (Fig.1A; ~15k for ADG and DMI, and ~16k for FE and RFI). The input covariate \(X=x_{p}\) contains the pre-selected SNP markers (p) with dimension N × p (N individuals and p markers). The pre-selected SNP markers are combined with each neuron k (with k = 1, ..., Nr) through the weight vector (\(W\)) in the hidden layer and then summed with a neuron-specific bias (\(b_{k}\)) to compute the linear score for neuron k as \(Z_{i}^{[1]}=f\left(b_{k}^{[1]}+XW^{[1]}\right)\) (Fig.1B). Subsequently, this linear score is transformed using an activation function \(f(\cdot)\) to map the k neuron-specific scores and produce the first hidden layer output (\(f(z_{1,i})\)). In the second hidden layer, each neuron k receives a net input coming from hidden layer 1 as \(Z_{i}^{[2]}=b_{k}^{[2]}+Z_{i}^{[1]}W^{[2]}\), where \(W^{[2]}\) represents the weight matrix of dimension k × k (k = number of neurons) connecting \(Z_{i}^{[1]}\) to the second hidden layer, and \(b_{k}^{[2]}\) is a bias term in hidden layer 2. Then, the activation function is applied to map the kth hidden neuron unit in the second hidden layer and generate the output layer as \(V_{2,i}=f(z_{2,i})\). In the MLNN, a hyperbolic tangent activation function (\(\tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}\)) was adopted in the first and second layers, providing greater flexibility in the MLNN39.

The prediction of the adjusted FE-related trait was obtained as follows38:

$$\mathbf{y}^{\mathbf{*}}=\mathbf{f}\left(\mathbf{b}+\mathbf{V}_{2,\mathbf{i}}\mathbf{W}_{0}\right)+\mathbf{e}$$

where \(\mathbf{y}^{\mathbf{*}}\) represents the target adjusted feed efficiency-related trait for the ith animal; \(k\) is the number of neurons considered in the model, assumed to be the same in the first and second layers; \(\mathbf{W}_{0}\) represents the weight from neuron k in layer 2; and \(\mathbf{b}\) is the bias parameter. The optimal weights used in the MLNN were obtained by minimizing the mean square error of prediction in the training subset40.
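The forward pass described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the h2o implementation used in the study: the output activation is taken to be the identity (a common choice for regression), the layer sizes are toy values, and the helper names `dense`, `mlnn_predict`, and `init_weights` are hypothetical.

```python
import math
import random

def dense(inputs, weights, biases):
    """Return one linear score per neuron: bias + weighted sum of the inputs."""
    return [b + sum(x * w for x, w in zip(inputs, neuron_w))
            for neuron_w, b in zip(weights, biases)]

def mlnn_predict(x, w1, b1, w2, b2, w0, b0):
    """Forward pass for one animal: two tanh hidden layers, then a linear output."""
    v1 = [math.tanh(z) for z in dense(x, w1, b1)]   # first hidden layer output
    v2 = [math.tanh(z) for z in dense(v1, w2, b2)]  # second hidden layer output
    return b0 + sum(v * w for v, w in zip(v2, w0))  # identity output (assumption)

def init_weights(n_out, n_in, rng):
    """Random starting weights in [-0.5, 0.5], the range given for Fig. 1B."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(1)
p, k = 8, 4                                    # toy sizes, far below ~15k markers
x = [rng.choice([0, 1, 2]) for _ in range(p)]  # SNP genotypes coded 0/1/2
w1, w2 = init_weights(k, p, rng), init_weights(k, k, rng)
b1, b2 = [0.0] * k, [0.0] * k
w0 = [rng.uniform(-0.5, 0.5) for _ in range(k)]
y_hat = mlnn_predict(x, w1, b1, w2, b2, w0, b0=0.0)
print(y_hat)
```

Training would then adjust all weights and biases to minimize the mean square error on the training subset; that optimization step is deliberately left out of the sketch.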

The MLNN model was implemented using the R package h2o (https://github.com/h2oai/h2o-3), with the random grid search using the h2o.grid function (https://cran.r-project.org/web/packages/h2o) to determine the number of neurons that maximizes the prediction accuracy. We used the training population split into five folds to assess the best neural network architecture and then applied it to the disjoint validation set41,42. We considered a total of 1000 epochs36, numbers of neurons ranging from 50 to 2500 with intervals of 100, and applied a dropout ratio of 0.2 and L1 and L2 regularization parameters of 0.0015 and 0.0005, respectively. In this framework, the MLNN was fitted using two hidden layers with the number of neurons (k) of 750 for ADG, 1035 for DMI, 710 for FE, and 935 for RFI obtained during the training process.

Support vector regression (SVR) is a kernel-based supervised learning technique used for regression analysis43. In the context of GS, the SVR uses linear models to implement nonlinear regression by mapping the predictor variables (i.e., SNP markers) into the feature space using different kernel functions (linear, polynomial, or radial basis function) to predict the target information, e.g., the adjusted phenotype in GS44. SVR can map linear or nonlinear relationships between phenotypes and SNP markers depending on the kernel function. The best kernel function mapping genotype to phenotype (linear, polynomial, or radial basis) was determined using the training subset split into five folds. The radial basis function (RBF) was chosen as it outperformed the linear and polynomial (degree equal to 2) kernels in the training process, increasing predictive ability by 8.25% and showing the lowest MSE.

The general model for SVR using an RBF function can be described as38,45: \(\mathbf{y}_{\mathbf{i}}^{\mathbf{*}}=\mathbf{b}+\mathbf{h}\left(\mathbf{m}\right)^{\mathbf{T}}\mathbf{w}+\mathbf{e}\), where \(\mathbf{h}(\mathbf{m})^{\mathbf{T}}\) represents the radial basis kernel function used to transform the original predictor variables, i.e., SNP marker information (\(\mathbf{m}\)), \(b\) denotes the model bias, and \(w\) represents the unknown regression weight vector. In the SVR, the learned function \(\mathbf{h}(\mathbf{m})^{\mathbf{T}}\) was obtained by minimizing the loss function. The SVR was fitted using an epsilon-support vector regression that ignores residuals whose absolute value (\(\left|y_{i}^{*}-\widehat{y}_{i}^{*}\right|\)) is smaller than some constant (\(\epsilon\)) and penalizes larger residuals46.

The kernel RBF function considered in the SVR follows the form \(\mathbf{h}(\mathbf{m})^{\mathbf{T}}=\exp\left(-\gamma\Vert\mathbf{m}_{i}-\mathbf{m}_{j}\Vert^{2}\right)\), where \(\gamma\) is the gamma parameter that controls the shape of the kernel function, and \(m_{i}\) and \(m_{j}\) are the vectors of predictor variables for labels i and j. The main parameters in SVR are the cost parameter (\(C\)), the gamma parameter (\(\gamma\)), and epsilon (\(\epsilon\)). The parameters \(C\) and \(\epsilon\) were defined using the training data set information as proposed by Cherkassky and Ma47: \(C=\max\left(\left|\overline{y^{*}}+3\sigma_{y^{*}}\right|,\left|\overline{y^{*}}-3\sigma_{y^{*}}\right|\right)\) and \(\epsilon=3\sigma_{y^{*}}\sqrt{\ln(n)/n}\), in which \(\overline{y^{*}}\) and \(\sigma_{y^{*}}\) are the mean and the standard deviation of the adjusted FE-related traits in the training population, and n represents the number of animals in the training set. The gamma parameter (\(\gamma\)) was determined through a random search of values varying from 0 to 5, using the training set split into five folds. The best-trained SVR models used a \(\gamma\) of 2.097 for ADG, 0.3847 for DMI, 0.225 for FE, and 1.075 for RFI. The SVR was implemented using the e1071 R package48.
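The two closed-form rules for the cost and epsilon parameters, together with the RBF kernel, are simple enough to write out directly. The Python sketch below is illustrative only: it assumes the sample standard deviation is intended (the text does not say which estimator was used), the phenotype values are invented, and the function names are hypothetical rather than part of e1071.

```python
import math
import statistics

def cherkassky_ma_params(y):
    """Return (C, epsilon) from the adjusted phenotypes of the training set."""
    n = len(y)
    mean_y = statistics.fmean(y)
    sd_y = statistics.stdev(y)  # sample SD (assumption)
    cost = max(abs(mean_y + 3 * sd_y), abs(mean_y - 3 * sd_y))
    epsilon = 3 * sd_y * math.sqrt(math.log(n) / n)
    return cost, epsilon

def rbf_kernel(m_i, m_j, gamma):
    """exp(-gamma * squared Euclidean distance) between two marker vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(m_i, m_j))
    return math.exp(-gamma * sq_dist)

# Invented toy data: three animals, four SNPs coded 0/1/2.
y_train = [1.0, 2.0, 3.0]
cost, epsilon = cherkassky_ma_params(y_train)
similarity = rbf_kernel([0, 1, 2, 1], [0, 1, 1, 1], gamma=0.225)
print(cost, epsilon, similarity)
```

Because the kernel depends on the squared distance, a larger \(\gamma\) makes the similarity between two genotype vectors decay faster, which is why it was tuned separately for each trait.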

Prediction accuracy (acc) of the different statistical approaches was assessed by Pearson's correlation between adjusted phenotypes (\(y^{*}\)) and their predicted values (\(\widehat{y}_{i}^{*}\)) on the validation set, and by the root mean squared error (RMSE). The prediction bias was assessed using the slope of the linear regression of \(\widehat{y}_{i}^{*}\) on \(y^{*}\) for each model. The Hotelling-Williams test49 was used to assess the significance level of the difference in the predictive ability of Bayesian methods (BayesA, BayesB, BayesC, BL, and BRR), MTGBLUP, and machine learning (MLNN and SVR) against STGBLUP. The similarity between the predictive performance of the different models was assessed using Ward's hierarchical clustering method with a Euclidean distance analysis. The relative difference (RD) in the predictive ability was measured as \(\text{RD}=\frac{(r_{m}-r_{\text{STGBLUP}})}{r_{\text{STGBLUP}}}\times 100\), where \(r_{m}\) represents the acc of each alternative approach (SVR, MLNN, and MTGBLUP, or the Bayesian regression models BayesA, BayesB, BayesC, BL, and BRR), and \(r_{\text{STGBLUP}}\) is the predictive ability obtained using the STGBLUP method.
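As a worked illustration of these validation metrics, the following Python sketch computes the Pearson correlation, RMSE, regression slope, and relative difference on invented numbers. The slope follows the wording above (predicted values regressed on adjusted phenotypes); all function names and data are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pearson(y, y_hat):
    """Pearson correlation between adjusted phenotypes and predictions."""
    my, mh = mean(y), mean(y_hat)
    cov = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    ss_y = sum((a - my) ** 2 for a in y)
    ss_h = sum((b - mh) ** 2 for b in y_hat)
    return cov / math.sqrt(ss_y * ss_h)

def rmse(y, y_hat):
    """Root mean squared error of the predictions."""
    return math.sqrt(mean([(a - b) ** 2 for a, b in zip(y, y_hat)]))

def bias_slope(y, y_hat):
    """Slope of the regression of predictions on adjusted phenotypes."""
    my, mh = mean(y), mean(y_hat)
    cov = sum((a - my) * (b - mh) for a, b in zip(y, y_hat))
    return cov / sum((a - my) ** 2 for a in y)

def relative_difference(r_model, r_stgblup):
    """RD (%) of a model's accuracy relative to the STGBLUP baseline."""
    return (r_model - r_stgblup) / r_stgblup * 100

# Invented validation data, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [2.0, 4.0, 6.0, 8.0]
print(pearson(y, y_hat), rmse(y, y_hat), bias_slope(y, y_hat))
print(relative_difference(0.33, 0.30))
```

In this toy case the correlation is perfect while the slope is 2, which shows why accuracy and bias are reported separately: predictions can rank animals correctly and still be on the wrong scale.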

The animal procedures and data sampling presented in this study were approved and performed following the Animal Care and Ethical Committee recommendations of the São Paulo State University (UNESP), School of Agricultural and Veterinary Science (protocol number 18.340/16).

Read more here:
Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in ... - Nature.com


Research on lightweight algorithm for gangue detection based on improved Yolov5 | Scientific Reports – Nature.com

The following improvements were made to Yolov5s. The EfficientViT network, proposed by Liu et al.22, uses cascaded group attention modules that feed each attention head a different split of the full feature, which saves computational cost and increases attention diversity. Comprehensive experiments demonstrate that its efficiency is significantly better than that of existing efficient models, yielding a better trade-off between speed and accuracy. Mpdiou, proposed by Ma et al.23, is a bounding box similarity metric based on minimum point distance; it incorporates all the relevant factors considered in existing loss functions, i.e., overlapping or non-overlapping areas, centroid distances, and width and height deviations, while simplifying the computation. C3_Faster builds on the Partial Convolution (PConv) technique proposed by Chen et al.24, which performs spatial feature extraction more efficiently by reducing both redundant computation and memory access. Based on PConv, FasterNet, a family of neural networks, was also proposed; it achieves higher running speed than other networks on different devices without compromising accuracy on visual tasks. These methods were chosen because a lightweight Yolov5s requires a reduction in both the number of parameters and the amount of computation, which all of the above achieve, satisfying the experimental requirements.

The improved model therefore makes four changes. First, the entire backbone network of the original Yolov5s is replaced by the EfficientViT network. Second, the C3 module in the HEAD module is replaced by C3_Faster. Third, the Neck region of the Yolov5 model is streamlined by deleting the 20×20 feature map branch, which has the largest receptive field and is suited to detecting larger objects. Finally, Mpdiou is used to replace CIOU, and the SE attention mechanism is introduced to help the model fuse valuable features and improve detection performance. A schematic of the structure of the improved model is shown in Fig.2.

Fig. 2. Structure of the improved Yolov5s model.

EfficientViT is a lightweight network model. It designs a new building block with a sandwich layout, namely a single memory-bound MHSA layer between efficient FFN layers, which improves channel communication while increasing memory efficiency. EfficientViT also proposes a cascaded group attention module that feeds each attention head a different split of the full feature25; the overall framework is shown in Fig.3. The network contains three stages, each of which contains a number of sandwich structures consisting of 2N DWConv (spatially local communication) and FFN (channel communication) layers together with cascaded group attention. Cascaded group attention differs from previous MHSA in that the feature is first split across heads, and Q, K, and V are then generated for each head. In addition, to learn richer feature maps and increase model capacity, the output of each head is added to the input of the next head. Finally, the outputs of all heads are concatenated and mapped by a linear layer to obtain the final output, as expressed in Eqs. (1)-(3):

$$\widetilde{X}_{ij}=Attn\left(X_{ij}W_{ij}^{Q},X_{ij}W_{ij}^{K},X_{ij}W_{ij}^{V}\right)\tag{1}$$

$$\widetilde{X}_{i+1}=Concat\left[\widetilde{X}_{ij}\right]_{j=1:h}W_{i}^{P}\tag{2}$$

$$X^{\prime}_{ij}=X_{ij}+\widetilde{X}_{i(j-1)},\quad 1<j\le h\tag{3}$$

In Eqs. (1) and (2), the jth head computes self-attention on \(X_{ij}\), the jth split of the input feature \(X_{i}\), i.e., \(X_{i}=[X_{i1},X_{i2},\ldots,X_{ih}]\) with \(1\le j\le h\), where h is the total number of heads. \(W_{ij}^{Q}\), \(W_{ij}^{K}\), and \(W_{ij}^{V}\) are the projection layers that map the input feature split into different subspaces, and \(W_{i}^{P}\) is a linear layer that projects the concatenated output features back to a dimension consistent with the input.

In Eq. (3), \(X^{\prime}_{ij}\) is the sum of the jth input split \(X_{ij}\) and the (j-1)th head output \(\widetilde{X}_{i(j-1)}\) computed according to Eq. (1). It replaces \(X_{ij}\) as the new input feature for the jth head when computing self-attention. In addition, another token interaction layer is applied after the Q projection, which allows self-attention to jointly capture local and global relations and greatly enhances the feature representation.
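The cascade in Eqs. (1)-(3) can be traced with a small pure-Python sketch. It is a simplified illustration, not the EfficientViT implementation: it assumes standard scaled dot-product attention for Attn, square per-head projections so that each head output can be added to the next split, and it omits the token interaction layer after the Q projection. All names and sizes are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention (the scaling is an assumption of this sketch)."""
    d = len(q[0])
    scores = matmul(q, list(zip(*k)))  # q times k transposed
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)

def cascaded_group_attention(x, heads, w_proj):
    """x: tokens x channels; heads: list of (Wq, Wk, Wv), each d x d."""
    d = len(x[0]) // len(heads)
    outputs, prev = [], None
    for j, (wq, wk, wv) in enumerate(heads):
        split = [row[j * d:(j + 1) * d] for row in x]  # X_ij
        if prev is not None:                           # Eq. (3): add previous head output
            split = [[a + b for a, b in zip(r, p)] for r, p in zip(split, prev)]
        prev = attention(matmul(split, wq), matmul(split, wk), matmul(split, wv))  # Eq. (1)
        outputs.append(prev)
    concat = [sum((out[t] for out in outputs), []) for t in range(len(x))]
    return matmul(concat, w_proj)                      # Eq. (2)

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
tokens, channels, n_heads = 4, 6, 3
d_head = channels // n_heads
x = rand_matrix(tokens, channels, rng)
heads = [tuple(rand_matrix(d_head, d_head, rng) for _ in range(3)) for _ in range(n_heads)]
out = cascaded_group_attention(x, heads, rand_matrix(channels, channels, rng))
print(len(out), len(out[0]))
```

Because each head only sees one split of the channels, the per-head projections are smaller than in ordinary multi-head attention, which is where the computational saving comes from.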

The loss function is an important component of a neural network; its main role is to measure the distance between the information predicted by the network and the desired information, i.e., the closer the two are to each other, the smaller the value of the loss function. The loss functions of the YOLO algorithm family mainly include the localization loss function (lossrect), the confidence prediction loss function (lossobj), and the category loss function (losscls). The localization loss function used by Yolov5 is the CIOU function, which is computed as follows.

$$CIOU\_Loss=1-IOU+\frac{\lambda^{2}(a,a^{gt})}{c^{2}}+\alpha\mu\tag{4}$$

$$\alpha=\frac{\mu}{(1-IOU)+\mu}\tag{5}$$

$$\mu=\frac{4}{\pi}\left[\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\right]^{2}\tag{6}$$

In Eqs. (4)-(6), a and \(a^{gt}\) are the centroids of the predicted and target boxes, respectively, and \(\lambda\) is the Euclidean distance between the two centroids; c is the diagonal length of the smallest enclosing region of the predicted and target boxes; \(\alpha\) is the weight coefficient; and \(\mu\) measures the consistency of the aspect ratios of the two boxes. Here, h and w are the height and width of the predicted box, and \(h^{gt}\) and \(w^{gt}\) are the height and width of the target box, respectively. The CIOU function mainly attends to the overlapping parts of the predicted and target boxes. In the improved model, the Mpdiou loss function is used instead.
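For readers who want to see how the terms of the CIOU loss combine, here is a minimal Python sketch for two axis-aligned boxes given as (x1, y1, x2, y2). It is not the Yolov5 code. One assumption is worth flagging: the aspect-ratio term uses the factor 4/π² from the original CIoU formulation, which may differ from the constant printed in Eq. (6). The function name is hypothetical.

```python
import math

def ciou_loss(pred, target):
    """CIoU loss for boxes (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    w, h = px2 - px1, py2 - py1
    w_gt, h_gt = tx2 - tx1, ty2 - ty1

    # Intersection over union
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    iou = inter / (w * h + w_gt * h_gt - inter)

    # Squared distance between box centres, and squared diagonal of the
    # smallest box enclosing both
    dist_sq = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    diag_sq = (max(px2, tx2) - min(px1, tx1)) ** 2 \
        + (max(py2, ty2) - min(py1, ty1)) ** 2

    # Aspect-ratio consistency (4 / pi^2 factor is an assumption, see lead-in)
    mu = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    denom = (1 - iou) + mu
    alpha = mu / denom if denom > 0 else 0.0

    return 1 - iou + dist_sq / diag_sq + alpha * mu

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))   # identical boxes
print(ciou_loss((0, 0, 1, 1), (2, 2, 3, 3)))   # disjoint boxes
```

Unlike plain IoU, the loss keeps growing as disjoint boxes move apart, because the centre-distance term stays informative when the overlap is zero.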

Mpdiou is a bounding box similarity metric based on the minimum point distance that includes all the relevant factors considered in existing loss functions. It simplifies the similarity comparison between two bounding boxes and is suitable for both overlapping and non-overlapping bounding box regression. Therefore, Mpdiou can serve as an alternative to the intersection over union (IoU) as a metric in 2D/3D computer vision tasks. It also simplifies the computation by directly minimizing the distances between the upper-left points and between the lower-right points of the predicted and ground-truth bounding boxes. Mpdiou is computed as follows.

$$d_{1}^{2}=(x_{1}^{B}-x_{1}^{A})^{2}+(y_{1}^{B}-y_{1}^{A})^{2}\tag{7}$$

$$d_{2}^{2}=(x_{2}^{B}-x_{2}^{A})^{2}+(y_{2}^{B}-y_{2}^{A})^{2}\tag{8}$$

$$Mpdiou=\frac{A\cap B}{A\cup B}-\frac{d_{1}^{2}}{w^{2}+h^{2}}-\frac{d_{2}^{2}}{w^{2}+h^{2}}\tag{9}$$

In Eqs. (7)-(9), \(d_{1}\) and \(d_{2}\) denote the distances between the upper-left points and between the lower-right points of the two boxes, respectively; \(A,B\subseteq S\in R^{n}\) are two arbitrary shapes; and w and h are the width and height of the input image. The output is Mpdiou. Let \((x_{1}^{A},y_{1}^{A})\) and \((x_{2}^{A},y_{2}^{A})\) denote the coordinates of the upper-left and lower-right points of A, and let \((x_{1}^{B},y_{1}^{B})\) and \((x_{2}^{B},y_{2}^{B})\) denote the coordinates of the upper-left and lower-right points of B, respectively.
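Eqs. (7)-(9) translate almost line by line into code. The Python sketch below computes the metric for two axis-aligned boxes given as (x1, y1, x2, y2); the function name, the box format, and the example image size are assumptions made for illustration. A loss of the form 1 - Mpdiou is a common way to turn such a metric into a training objective, but the text above only defines the metric itself.

```python
def mpdiou(box_a, box_b, img_w, img_h):
    """Mpdiou of two boxes (x1, y1, x2, y2) inside an img_w x img_h image."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union of A and B
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter

    d1_sq = (bx1 - ax1) ** 2 + (by1 - ay1) ** 2   # Eq. (7): upper-left points
    d2_sq = (bx2 - ax2) ** 2 + (by2 - ay2) ** 2   # Eq. (8): lower-right points
    norm = img_w ** 2 + img_h ** 2

    return inter / union - d1_sq / norm - d2_sq / norm   # Eq. (9)

score = mpdiou((0, 0, 2, 2), (1, 1, 3, 3), img_w=10, img_h=10)
print(score)
```

Since only the four corner coordinates and the image size enter the formula, no enclosing box or aspect-ratio term has to be computed, which is the simplification the section refers to.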

The object detection head is the part of the feature pyramid used to perform object detection; it includes multiple convolutional, pooling, and fully connected layers, among others. In the Yolov5 model, the detection head module is mainly responsible for detecting multiple objects on the feature maps extracted by the backbone network. The module consists of three main parts. The C3 module is an essential part of the Yolov5 network; its main role is to increase the depth and receptive field of the network and improve its feature extraction capability. C3-Faster is built from multiple Faster_Blocks and can replace the C3 module in Yolov5 to accelerate network inference, where each Faster_Block combines the lightweight PConv convolution proposed in the literature21 with additional operations. In the improved model, the C3 module in the HEAD module is replaced with C3-Faster.

The Neck region in the Yolov5 model uses a multipath structure to aggregate features and enhance network feature fusion. Coal and gangue are small relative to the whole image, which makes the part of the Neck region dedicated to large object detection redundant. To improve detection speed, the Neck region of the Yolov5 model is therefore streamlined by removing the 20×20 feature map branch, which has the largest receptive field and is suited to detecting larger objects. This removal reduces model complexity and improves the real-time performance of detection, as shown in Fig.4.

Fig. 4. Improved neck and prediction structure.

The SE attention mechanism is introduced into the original model to improve object detection accuracy. It consists of three parts, namely Squeeze, Excitation, and Scale (feature recalibration), whose main purpose is to enhance useful features. First, the Squeeze operation compresses the feature map along the spatial dimensions using global average pooling, turning each two-dimensional feature channel into a single real number, so that the W×H×C feature map becomes a 1×1×C vector whose length matches the number of input feature channels. Next, the Excitation operation applies fully connected layers to this vector, followed by the Sigmoid activation function, to obtain normalized channel weights; these weights are learned during training. Finally, the Scale operation multiplies each channel of the original feature map by its weight obtained from the Excitation operation, recalibrating the feature map. The SE module is shown in Fig. 5.
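The three SE steps can be followed on a tiny example. The Python sketch below is a simplified illustration rather than the module used in the improved model: it assumes the usual two fully connected layers with a ReLU in between (the text above only mentions fully connected layers and a Sigmoid), and the weights, sizes, and function name are invented.

```python
import math

def se_block(feature_map, w1, b1, w2, b2):
    """feature_map: C channels of H x W values; w1: reduced x C; w2: C x reduced."""
    # Squeeze: global average pooling turns each channel into one number
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_map]

    # Excitation: FC -> ReLU -> FC -> Sigmoid gives one weight per channel
    hidden = [max(0.0, b + sum(w * z for w, z in zip(ws, squeezed)))
              for ws, b in zip(w1, b1)]
    weights = [1 / (1 + math.exp(-(b + sum(w * v for w, v in zip(ws, hidden)))))
               for ws, b in zip(w2, b2)]

    # Scale: multiply every value in a channel by that channel's weight
    scaled = [[[value * s for value in row] for row in ch]
              for ch, s in zip(feature_map, weights)]
    return scaled, weights

# Toy input: C = 2 channels of size 2 x 2, reduced dimension 1 (invented values).
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[2.0, 2.0], [2.0, 2.0]]]
scaled, weights = se_block(fmap, w1=[[0.5, -0.5]], b1=[0.0],
                           w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
print(weights)
```

Each weight lies between 0 and 1, so the block can only attenuate channels relative to one another; which channels are kept close to full strength is what training decides.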


Orange isn’t building its own AI foundation model: here’s why – Light Reading

There has been a flurry of interest in generative AI (GenAI) from telcos, each of which has taken its own nuanced approach to the idea of building its own large language models (LLMs). While Vodafone seems to dismiss the idea and Verizon appears content to build on existing foundation models, Deutsche Telekom and SK Telecom announced last year they will develop telco-specific LLMs. Orange, meanwhile, doesn't currently see the need to build a foundation model, its chief AI officer Steve Jarrett has recently told Light Reading.

Jarrett said the company is currently content with using existing models and adapting them to its needs using two main approaches. The first one is retrieval-augmented generation (RAG), where a detailed source of information is passed to the model together with the prompt to augment its response.

He said this allows the company to experiment with different prompts easily, adding that existing methodologies can be used to assess the results. "That is a very, very easy way to dynamically test different models, different styles of structuring the RAG and the prompts. And [...] that solves the majority of our needs today," he elaborated.

At the same time, Jarrett admitted that the downside of RAG is that it may require a lot of data to be passed along with the prompt, making more complex tasks slow and expensive. In such cases, he argued, fine-tuning is a more appropriate approach.

Distilling models

In this case, he explained, "you take the information that you would have used in the RAG for [...] a huge problem area. And you make a new version of the underlying model that embeds all that information." Another related option is to distill the model.

This involves not just structuring the output of the model, but downsizing it, "like you're distilling fruit into alcohol," Jarrett said, adding "there are techniques to actually chop the model down into a much smaller model that runs much faster."

This approach is, however, highly challenging. "Even my most expert people frequently make mistakes," he admitted, saying: "It's not simple, and the state of the art of the tools to fine tune are changing every single day." At the same time, he noted that these tools are improving constantly and, as a result, he expects fine-tuning to get easier over time.

He pointed out that building a foundation model from scratch would be an even more complex task, which the company currently doesn't see a reason to embark on. Nevertheless, he stressed that it's impossible to predict how things will evolve in the future.

Complexity budget

One possibility is that big foundational models will eventually absorb so much information that the need for RAG and other tools will diminish. In this scenario, Orange may never have to create its own foundation model, Jarrett said, "as long as we have the ability to distill and fine tune models, where we need to, to make the model small enough to run faster and cheaper and so on."

He added: "I think it's a very open question in the industry. In the end, will we have a handful of massive models, and everyone's doing 99% RAG and prompt engineering, or are there going to be millions of distilled and fine-tuned models?"

One factor that may determine where things will go in the future is what Jarrett calls the complexity budget. This is a concept that conveys how much computing was needed from start to finish to produce an answer.

While a very large model may be more intensive to train in the beginning, there may be less computing required for RAG and fine-tuning. "The other approach is you have a large language model that also obviously took a lot of training, but then you do a ton more compute to fine tune and distill the model so that your model is much smaller," he added.

Apart from cost, there is also an environmental concern. While hyperscalers tend to perform relatively well in terms of using clean energy, and Jarrett claimed that Orange is "fairly green as a company," he added that the carbon intensity of the energy used for on-premises GPU clusters tends to vary in the industry.

Right tool for the job

The uncertainty surrounding GenAI's future evolution is one of the reasons why Orange is taking a measured approach to the technology, with Jarrett stressing it is not a tool that's suited to every job. "You don't want to use the large language model sledge hammer to hit every nail," he said.

"I think, fairly uniquely compared to most other telco operators, we actually have the ability, the skill inside of Orange to help make these decisions about what tool to use when. So we prefer to use a statistical method or basic machine learning to solve problems because those results are more [] explainable. They're usually cheaper, and they're usually less impactful on the environment," he added.

In fact, Jarrett says one of the things Orange is investigating at the moment is how to use multiple AI models together to solve problems. The notion, he added, is called agents, and refers to a high-level abstraction of a problem, such as asking how the network in France is working on a given day. This, he said, will enable the company to solve complex problems more dynamically.

In the meantime, the company is making a range of GenAI models available to its employees, including ChatGPT, Dolly and Mistral. To do so, it has built a solution that Jarrett says provides a "secure, European-resident version of leading AI models that we make available to the entire company."

Improving customer service

Jarrett says this is a more controlled and safer way for employees to use models than if they were accessed directly. The solution also notifies the employee of the cost of running a specific model to answer a question. Available for several months, it has so far been used by 12% of employees.

Orange has already deployed GenAI in many countries within its customer service solutions to predict what the most appealing offer may be to an individual customer, Jarrett said, adding "what we're trialling right now is can generative AI help us to customize and personalize the text of that offer? Does that make the offer incrementally more appealing?"

Another potential use case is in transcribing a conversation with a customer care agent in real time, using generative AI to create prompts. The tool is still in development but could help new recruits to improve faster, raising employee and customer satisfaction, said Jarrett.

While Orange doesn't currently use GenAI for any use cases in the network, some are under development, although few details are being shared at this stage. One use case involves predicting when batteries at cell sites may need replacing.

Jarrett admits, however, that GenAI is still facing a number of challenges, such as hallucinations. "In a scenario where the outputs have to be correct 100% of the time, we're not going to use generative AI for that today, because [it's] not correct 100% of the time," he said.

Dealing with hallucinations

Yet it can be applied in areas that are less sensitive. "For example, if for internal use you want to have a summary of an enormous transcript of a long meeting that you missed, it's okay if the model hallucinates a little bit," he added.

Hallucinations cannot be stopped entirely and will likely continue to be a problem for some time, said Jarrett. But he believes RAG and fine-tuning could mitigate the issue to some extent.

"The majority of the time, if we're good at prompt engineering and we're good at passing the appropriate information with the response, the model generates very, very useful, relevant answers," Jarrett said about the results achieved with RAG.

The availability and quality of data is another issue that is often discussed, and also one that Orange is trying to address. Using data historically kept in separate silos has been difficult, said Jarrett. "[The] availability of the data from the marketing team to be able to run a campaign on where was our network relatively strong, for example, those use cases were either impossible, or took many, many, many months of manual meetings and collaboration."

As a result, the company is trying to create a marketplace where data is made widely available inside each country and appropriately labeled. Orange calls this approach data democracy.


Google’s AI correctly predicts floods up to seven days early – Android Authority


You may have heard plenty about Google's various generative AI products like Circle to Search, its AI wallpaper tool, Search Generative Experience (SGE), and more. But you may not have heard nearly as much about the other ways it is using AI, such as predicting floods. Recently, Google was able to accurately predict flooding up to seven days in advance thanks to machine learning (ML).

Today, Google announced it was able to significantly improve global-scale flood forecasting with the help of machine-learning technologies. According to the firm, it was able to improve the reliability of global nowcasts, on average, from zero to five days. In some cases, it was even able to predict floods a full week out before they happened.

One of the reasons why it can be difficult to predict floods ahead of time is the fact that most rivers don't have what's called a streamflow gauge. These gauges help provide relevant data, like precipitation and physical watershed information. However, the tech giant was able to get around this problem by training its ML technology on all available river data and applying the ML model to basins where no data was available.

Lending further credence to how impressive an accomplishment this is, the company's findings were published in Nature. For context, Nature is a leading multidisciplinary science journal that publishes peer-reviewed research.

The ultimate goal of the technology is to scale the accuracy of flood forecasting to a global level, even in areas where local data is not available. Google has been able to provide forecasts to over 80 countries through its Flood Hub. It also delivers alerts on Google Search, Maps, and Android notifications.


Artificial intelligence will radically improve health care, but only if managed carefully – The Hill

More important than the speed of bringing artificial intelligence (AI) into widespread use in American health care is ensuring we do it correctly. To unlock the innovation’s greatest positive impact, assurance of integrity and transparency must take the highest priority. This can be accomplished by applying the principles that guide clinical research, including respect for the human person, maximization of benefits and avoidance of harms to patients, just distribution of benefits, meaningful informed consent and protection of patient confidential information.

The emergence of artificial intelligence is reminiscent of the great Gold Rush, a frenzied time bursting with unlimited potential yet filled with uncertainty, speculation and unforeseen consequences. The advancement of AI brings medicine to the precipice of truly transformational change that can help reduce existing burdens and inefficiencies while at the same time improving patient care and experience. Examples range from ambient voice transcription tools that enable doctors or nurses to spend more time with their patients to diagnostic devices that detect diabetic retinopathy or colon polyps, with the list growing daily. Its applications are nearly limitless; a new revolution has arrived.

This technology has galvanized the field of health care, but its broad implementation is a road yet to be traveled. It remains to be seen how medical professionals and patients will interact with and utilize artificial intelligence. Unfortunately, the potential for harm has already been demonstrated with examples of substantial algorithmic bias and the use of AI to deny patient care authorizations. Experts use the term human-in-the-loop (HITL) to describe requisite human involvement within the system of automated processes. However, this is inadequate, as we must be not merely one dimension of the progressive machine learning system, but atop the hierarchy. The last line bears repeating: Humans must remain atop the hierarchy. We need to control AI, not the other way around.

The complexity of artificial intelligence will require significant bandwidth to properly oversee its application and erect sensible guardrails that enable innovation and at the same time protect patients and other key stakeholders. The size and scope of this undertaking far exceed what can be accomplished by the federal government alone. Unlike the top-down approaches pursued in other parts of the world, we must utilize public-private partnerships to develop these guidelines and guardrails and validate that what is produced is trustworthy and of value. This can be achieved, in part, by creating independent assurance laboratories that evaluate AI models and their applications using commonly accepted principles. We need more than one hen guarding the chicken house.

Avoiding missteps similar to those that hindered the integration of now-mature technologies, such as Electronic Health Records, is paramount. National standards are critical to establish health AI best practices for the use of emerging innovations, and adoption of these benchmarks should be as close to the end beneficiaries as possible. Federal authority has an important role to play here: that of a convener and enabler of the creation of these standards. However, their implementation should be deferred as much as possible to local governance at the health system level, with federal authorities intervening only when necessary. Progress will not be free, but we must learn from past mistakes.

In our pursuit of bringing artificial intelligence into mainstream medicine, ethical considerations must maintain supremacy. Patients in rural or low-income communities must have access to the benefits of this technology. Further, it is imperative that the AI used on or by these communities be as trustworthy as that used by premier health systems. Just as access to health care is not a guarantee of quality, access to artificial intelligence systems will not certify the capacity or reliability of what is available.

Reducing clinician burden, improving patient health and experience, and introducing new, life-saving technologies to the burgeoning world of health care is an exciting endeavor. Traversing these unknowns in a way that circumvents avoidable hazards will allow human intelligence to harness the power of unlimited computations to create better and more affordable care. Practitioners and patients alike eagerly anticipate the powerful capabilities and practical benefits of artificial intelligence in the delivery of health care. It is essential to ensure that its imminent and explosive entrance into care settings is executed judiciously and strategically to maximize its positive impact for all.

Greg Murphy, MD, a practicing urologist, represents North Carolina’s 3rd District. Michael Pencina, PhD, serves as chief data scientist in Duke Health and professor of biostatistics and bioinformatics in the Duke University School of Medicine. 

More:
Artificial intelligence will radically improve health care, but only if managed carefully - The Hill
