
Eye-opening Origin Story: Scientists Trace Key Innovation in Our … – University of California San Diego

"In order to see in different wavelengths, there needs to be enough light around and that's one of the arguments for why we can see in the dark really well: we have this enzymatic recycling system that many invertebrates don't seem to have," said Daugherty, a researcher in the Department of Molecular Biology. "Eyes are diverse and complicated, and we've gone down this path because of this system."

With more genomes from more organisms becoming available, the researchers believe that other critical functions and systems will similarly trace their roots to bacteria.

"This reshapes the way that we think about evolution and the way we think about complex structures that seem like they've emerged out of nowhere," said Daugherty.

In addition to Kalluraya, a Selma and Robert Silagi Award for Undergraduate Excellence winner at UC San Diego and now a graduate student at MIT, Weitzel, Tsu and Daugherty coauthored the paper. The research was supported by the National Institutes of Health (R35 GM133633 and T32 GM007240), the Pew Biomedical Scholars program, the Burroughs Wellcome Fund Investigators in the Pathogenesis of Infectious Disease program, UC San Diego's Halıcıoğlu Data Science Institute and the UC San Diego Triton Research and Experimental Learning Scholars program.

See the rest here:

Eye-opening Origin Story: Scientists Trace Key Innovation in Our ... - University of California San Diego


10 best degrees for the future – Study International News

Choosing what to study at university can be a difficult decision. You have to take into consideration the time, effort and money put into earning your qualification and how it will prepare you for a successful future.

As such, it is important to choose one of the best degrees for the future, one that gives you the best chance of finding employment and earning well.

While there are no guarantees when it comes to the future, there are a number of degrees that seem to be future-proof.

The US Bureau of Labor Statistics released a list of jobs that are expected to grow rapidly by 2031. These jobs are in sectors that are projected to see consistent growth over the next decade.

If you are looking for the best degrees that will earn you a successful future, we have created a list for you.

With a nursing degree, you will be able to secure employment in many different areas of healthcare. This includes caring for the sick, providing emotional support and helping patients while working in a clinic, doctors office or hospital.

If you are considering a career as a registered nurse, you'll be happy to hear that it has the highest projected growth of any occupation in the US.

According to the Bureau of Labor Statistics, the demand for registered nurses is expected to grow by 46% by the year 2031. This surpasses the average growth rate of 8% across all professions over the same period.

Registered nurses in the US earn a salary of USD92,712 a year, making it among the highest-paid jobs.

To become a registered nurse, complete your degree at one of these top universities for nursing:

Pursuing an engineering degree can lead to a successful high-paying job. Source: Marco Bertorello/AFP

A degree in electrical engineering is focused on the practical design and building of structures or machines. As part of this course, you will learn to design and analyse electrical systems.

A degree in electrical engineering can lead you to some of the most in-demand jobs. This includes the role of a wind turbine service technician.

This line of work is expected to see the highest growth after nursing. The job of a wind turbine service technician is predicted to grow by 44% in the next decade.

A wind turbine service technician earns around USD52,976 in the US. Though it is not the highest-paying job, those in this line of work do not have to worry about securing employment.

Here are the best universities to earn a degree in electrical engineering:

Culinary arts is one of the best degrees for the future, as predicted by the Bureau of Labor Statistics. Source: David Becker / GETTY IMAGES NORTH AMERICA / Getty Images/AFP

If you have a passion for cooking and creating delicious meals, then you are in luck, as the demand for chefs is rapidly increasing.

The employment of chefs and head cooks is predicted to increase by 15% by 2031, with over 24,300 jobs expected to open in this field.

Many universities around the world offer a degree in culinary arts that will ensure you are career-ready. Here are some of the best:

As a chef in the US, you can expect to earn around USD51,995 annually, or more with experience in the field.

Data science is one of the highest-paying careers. Source: Morgan Sette / AFP

With the rise of technology and the global expansion of businesses, the need for data scientists is increasing. As a data scientist, your role is to analyse data so that a company can make informed decisions about how it operates.

And with so many job openings in the field, there is no better time to pursue a degree in data science.

According to the BLS, employment of data scientists is expected to grow by 35.8% by 2031, with 40,500 job openings.

If you are looking to pursue a degree in data science, consider these top universities:

Data scientists are some of the highest paid in the US, with an average annual salary of USD129,753.

According to the BLS, employment in information technology is predicted to grow by 15% by 2031, resulting in about 682,800 job openings.

A degree in information technology can prepare you for work as a software developer, cybersecurity expert, IT consultant, business analyst and many other in-demand jobs in the field.

As a student majoring in IT, you will sharpen your skills in the design, implementation and maintenance of computer systems, as well as learn about information security.

Those working as an information technology specialist in the US can earn up to USD44,450 annually.

Kickstart your career in this field by attending these universities renowned for their information technology programme:

Computer science is one of the best degrees for the future as it leads to employment in many different industries. Source: Goh Chain Hin/AFP

Graduates of computer science are in high demand as nearly every company relies on computers to keep their organisations running. As such, whether big or small, almost all businesses need computer specialists who can swoop in and fix any tech-related problems.

According to the US BLS, employment in computer science occupations is predicted to see rapid growth of 15% by 2031.

A degree in computer science will equip you with the skills to write code in many programming languages and develop software systems, which could lead to a career in cybersecurity, information systems management or software development.

Earn your computer science qualification from one of these universities:

Enrol at the London School of Economics (LSE) to gain an economics and finance degree that will prepare you for a successful career. Source: Adrian Dennis/AFP

Though you can choose to pursue either economics or finance separately, studying both subjects will give you knowledge in both areas and the extra edge that employers are looking for.

With a degree in economics and finance, students will gain a better understanding of world trade, economic models, marketing and management.

As a graduate in this field, you will be able to find employment in many sectors, especially the banking and insurance industry.

On average, you will earn up to USD74,159 annually as a financial analyst in the US. The best part is that you don't have to worry about securing a job, as demand for financial analysts is expected to increase by 9% by 2031.

Get a degree in economics and finance from some of the top universities in the world to secure a successful career. Here are the top universities for a degree in economics and finance:

A business degree is one of the most common academic paths students take as it leads to some of the most in-demand jobs. This degree will provide you with a broad education that includes finance, advertising, marketing, economics and the art of negotiation.

With a degree in business, you will be able to work in almost any industry.

A degree in business can prepare students for a lucrative career with an annual salary of USD82,805. Not only that, those in this field will have many employment opportunities, with almost 56,500 new jobs expected to open by 2031.

Study at these universities to gain a business degree that will prepare you for the future:

A degree in artificial intelligence is a well-rounded course of study that includes advanced mathematics, engineering and computer science. This bachelor's degree allows you to create machines or systems that can solve problems without human intervention.

Those employed in the field of artificial intelligence earn an average annual salary of USD 149,833.

BLS predicts that the investigation and security services sector will grow 6.5 per cent by 2029, faster than the average of 3.7 per cent for all workers in all industries.

Arm yourself with a degree in artificial intelligence from one of these top universities:

Digital media has grown over the last decade with the increased use of computers, smartphones and tablets. These devices have changed the way we consume media.

As such, this has resulted in a growth of digital marketing jobs available. The role of a digital marketing specialist is among the ten most in-demand jobs, with over 860,000 openings.

The digital marketing sector is ever-evolving. To ensure you are up to date with the tools and techniques, arm yourself with a degree in digital marketing.

You will learn a range of skills, including graphic design, content development and social media strategy to help businesses.

Working in the digital marketing sector in the US can earn you up to USD60,251 a year.

Interested? Check out these universities around the world for the best digital marketing programme:

Read the original:

10 best degrees for the future - Study International News


Podcast: A Sleep Scientist on New Technology and Sleep as a Social Justice Issue – InventUM | University of Miami Miller School of Medicine


While the study of sleep is a relatively new field, researchers over the past two decades have revealed the powerful effects of sleep, or the lack of it, on overall health and well-being. But who are these slumber scientists, who research a realm that begins when their patients slip into a dream state?

Meet Azizi Seixas, Ph.D., associate professor of psychiatry and behavioral sciences at the University of Miami Miller School of Medicine. Dr. Seixas dedicates his career to advancing our knowledge of sleep to improve the health of communities, particularly those that are underserved. His motivation to specialize in sleep health was spurred by his own struggles and those he witnessed while growing up in a low-income inner city.

"I know exactly how the lack of sleep can have a deleterious effect on your health, your livelihood and every other facet," said Dr. Seixas, who is also the interim chair of the Department of Informatics and Health Data Science and associate director of the Center for Translational Sleep and Circadian Sciences.

On the latest edition of Inside U Miami Medicine, Dr. Seixas shares his journey from growing up in Jamaica to becoming a faculty member at prestigious academic institutions in the U.S. Much of Dr. Seixas' work focuses on disparities in health outcomes between ethnic and socioeconomic groups, disparities that are correlated with differences in quantity and quality of sleep.

"People believe that sleep is a luxury," he said. "We believe sleep is a social justice issue."

That's why he and his team are committed to translational research and creating solutions that improve the health of these communities. One such innovation is the MILBOX, a project that provides participants with a variety of in-home and wearable technology that acts as a remote health monitoring system.

Tune in to the episode to hear from Dr. Seixas about these inventions and more. Click here to listen on Spotify, or search for "Inside U Miami Medicine" wherever you listen to podcasts.

View post:

Podcast: A Sleep Scientist on New Technology and Sleep as a Social Justice Issue - InventUM | University of Miami Miller School of Medicine


Synthetic financial data: banking on it – FinTech Magazine

By banking on synthetic financial data, banks can tackle head-on the challenge highlighted by Gartner: that by 2030, 80% of heritage financial services firms will go out of business, become commoditised or exist only formally without being able to compete effectively.

A pretty dire prophecy, but nonetheless realistic, with small neobanks and big tech companies eyeing their market. "Survivalist banks and financial institutions need a strategy in which creating, using, and sharing synthetic financial data is a key component," says Tobias Hann, CEO of MOSTLY AI.

Banks and financial institutions are aware of their data and innovation gaps, and AI-generated synthetic data is one area they're investing in to gain a competitive edge. Synthetic financial data is generated by AI that's trained on real-world data. The resulting synthetic data looks, feels and means the same as the original. It's a perfect proxy for the original, since it contains the same insights and correlations, plus it's completely privacy-secure.

Easy-to-deploy data science use cases in banking demonstrate clear value from the adoption of synthetic financial data, including advanced analytics, AI, and machine learning; data sharing; and software testing.

AI and machine learning unlock a range of business benefits for retail banks. These include advanced analytics, which improve customer acquisition by optimising the marketing engine with hyper-personalised messages and precise next-best actions. Intelligence from the very first point of contact increases customer lifetime value. "Since synthetic financial data is GDPR compliant, yet contains the intelligence of the original data, no customer consent is needed to harness its power," says Hann.

Synthetic financial data also lowers operating costs when decision-making in acquisition and servicing is supported with well-trained machine learning algorithms. In addition, underserved customer segments can get the credit they need by fixing embedded biases via data synthesisation, and it facilitates mass-market AI explainability, which is increasingly demanded by tech-savvy customers.

Open financial data is the ultimate form of data sharing. According to McKinsey, economies embracing financial data sharing could see GDP gains of 1-5% by 2030, with benefits flowing to consumers and financial institutions. More data means better operational performance, better AI models, more powerful analytics, and enhanced customer-centric digital banking products.

One of the most common data sharing use cases is connected to developing and testing digital banking apps and products. Banks accumulate tons of apps, continuously developing them, onboarding new systems and adding new components. Manually generating test data for such complex systems is a hopeless task, and many revert to the risky use of production data for testing.

Generally, manual test data generation tools miss most of the business rules and edge cases that are vital for robust testing practices.

To put it simply, it's impossible to develop intelligent banking products without intelligent test data. The same goes for testing AI and machine learning models. Testing those models with synthetically simulated edge cases is extremely important, both when developing models from scratch and when recalibrating them to avoid drift.

Not all synthetic data generators are created equal. It's important to select the right synthetic data vendor, one who can match the financial institution's needs. If a synthetic data generator is inaccurate, the resulting synthetic datasets can lead your data science team astray. If it's too accurate, the generator overfits, or learns the training data too well, and could accidentally reproduce some of the original information from the training data.
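
The fidelity-versus-overfitting tradeoff above can be made concrete with a small, hedged sketch. This is not MOSTLY AI's method: it uses a basic Gaussian mixture as a stand-in generator on a made-up three-column banking table, checks that marginal statistics roughly match, and then compares nearest-neighbour distances as a rough memorization probe. All column names and numbers are hypothetical.

```python
# Simplified illustration of the accuracy-vs-overfitting tradeoff, not a vendor's method.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# Hypothetical "real" banking table: [balance, monthly_spend, age]
real = np.column_stack([
    rng.lognormal(8, 1, 5000),
    rng.lognormal(6, 0.7, 5000),
    rng.normal(45, 12, 5000),
])

# Fit a stand-in generator and draw synthetic rows.
generator = GaussianMixture(n_components=10, random_state=0).fit(real)
synthetic, _ = generator.sample(5000)

# Fidelity check: do marginal statistics roughly match?
print("real means:     ", real.mean(axis=0).round(1))
print("synthetic means:", synthetic.mean(axis=0).round(1))

# Memorization probe: if synthetic rows sit much closer to real rows than real
# rows sit to each other, the generator may be reproducing training records.
nn_real = NearestNeighbors(n_neighbors=2).fit(real)
d_real, _ = nn_real.kneighbors(real)              # column 1 skips self-distance
d_syn, _ = nn_real.kneighbors(synthetic, n_neighbors=1)
print("median real-to-real NN distance:     ", np.median(d_real[:, 1]).round(2))
print("median synthetic-to-real NN distance:", np.median(d_syn).round(2))
```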

Open-source options are also available. However, the control over quality is fairly low. Until a global standard for synthetic financial data is in place, it's important to proceed with caution when selecting vendors. Opt for synthetic data companies with extensive experience in dealing with sensitive financial data and know-how when it comes to integrating synthetic data successfully within existing infrastructures.

"Our team at MOSTLY AI has seen large banks and financial organisations from up close. We know that synthetic financial data will be the data transformation tool that will change the financial data landscape forever, enabling the flow and agility necessary for creating competitive digital services," concludes Hann.

Go here to read the rest:

Synthetic financial data: banking on it - FinTech Magazine


Faculty approve the creation of statistics major – The Middlebury Campus

The Mathematics Department, soon to be called the Department of Mathematics and Statistics, will now be home to a new statistics major. On Friday, April 7, Middlebury faculty voted to approve the new major with a vote of 68 to 26. The faculty discussed the proposal for over an hour before the vote.

Psychology Professor Mike Dash is on the Educational Affairs Committee (EAC) and has been working closely with the Mathematics Department on its proposal for the new major ahead of the faculty vote. The faculty vote was supposed to take place in February but was delayed until April to allow the EAC to write up a formal recommendation and the Mathematics Department to revise its proposal based on the feedback it received from the EAC.

"Students should be able to graduate with a major in statistics starting next academic year," Dash said.

Alex Lyford is one of the mathematics professors who worked on the proposal.

"Students interested in majoring in statistics or wanting to learn more about our statistics and data science offerings can reach out to any of our statisticians: Professors Lyford, Tang, Malcolm-White or Peterson," Lyford said on behalf of the whole department.

"We hope that this major and the courses within it allow students at Middlebury to explore how data, probabilistic thinking and mathematics can help us solve some of the world's most challenging and interesting problems," Lyford said.

[Figure: flowchart of the courses required for the newly approved statistics major]

Lily Jones '23 is an online editor and senior writer.

She previously served as a Senior News Writer and SGA Correspondent.

Jones is double majoring in Philosophy and Political Science. She also is an intern for the Rohatyn Center for Global Affairs and on the ultimate frisbee team.

View post:

Faculty approve the creation of statistics major - The Middlebury Campus


Why semantics matter in the modern data stack – VentureBeat


Most organizations are now well into re-platforming their enterprise data stacks to cloud-first architectures. The shift in data gravity to centralized cloud data platforms brings enormous potential. However, many organizations are still struggling to deliver value and demonstrate true business outcomes from their data and analytics investments.

The term "modern data stack" is commonly used to define the ecosystem of technologies surrounding cloud data platforms. To date, the concept of a semantic layer hasn't been formalized within this stack.

When applied correctly, a semantic layer forms a new center of knowledge gravity that maintains the business context and semantic meaning necessary for users to create value from enterprise data assets. Further, it becomes a hub for leveraging active and passive metadata to optimize the analytics experience, improve productivity and manage cloud costs.

Wikipedia describes the semantic layer as a business representation of data that lets users interact with data assets using business terms such as product, customer or revenue to offer a unified, consolidated view of data across the organization.


The term was coined in an age of on-premise data stores, a time when business analytics infrastructure was costly and highly limited in functionality compared to today's offerings. While the semantic layer's origins lie in the days of OLAP, the concept is even more relevant today.

While the term "modern data stack" is frequently used, there are many representations of what it means. In my opinion, Matt Bornstein, Jennifer Li and Martin Casado from Andreessen Horowitz (A16Z) offer the cleanest view in "Emerging Architectures for Modern Data Infrastructure."

I will refer to this simplified diagram based on their work below:

This representation tracks the flow of data from left to right. Raw data from various sources moves through ingestion and transport services into core data platforms that manage storage, query and processing, and transformation prior to being consumed by users in a variety of analysis and output modalities. In addition to storage, data platforms offer SQL query engines and access to artificial intelligence (AI) and machine learning (ML) utilities. A set of shared services cuts across the entire data processing flow at the bottom of the diagram.

A semantic layer is implicit any time humans interact with data: It arises organically unless there is an intentional strategy implemented by data teams. Historically, semantic layers were implemented within analysis tools (BI platforms) or within a data warehouse. Both approaches have limitations.

BI-tool semantic layers are use case specific; multiple semantic layers tend to arise across different use cases, leading to inconsistency and semantic confusion. Data warehouse-based approaches tend to be overly rigid and too complex for business users to work with directly; work groups end up extracting data to local analytics environments, again leading to multiple disconnected semantic layers.

I use the term "universal semantic layer" to describe a thin, logical layer sitting between the data platform and analysis and output services that abstracts the complexity of raw data assets so that users can work with business-oriented metrics and analysis frameworks within their preferred analytics tools.

The challenge is how to assemble the minimum viable set of capabilities that gives data teams sufficient control and governance while delivering end-users more benefits than they could get by extracting data into localized tools.

The set of transformation services in the A16Z data stack includes metrics layer, data modeling, workflow management and entitlements and security services. When implemented, coordinated and orchestrated properly, these services form a universal semantic layer that delivers important capabilities, including:

Let's step through each transformation service with an eye toward how they must interact to serve as an effective semantic layer.

Data modeling is the creation of business-oriented, logical data models that are directly mapped to the physical data structures in the warehouse or lakehouse. Data modelers or analytics engineers focus on three important modeling activities:

Making data analytics-ready: Simplifying raw, normalized data into clear, mostly de-normalized data that is easier to work with.

Definition of analysis dimensions: Implementing standardized definitions of the hierarchical dimensions used in business analysis, that is, how an organization maps months to fiscal quarters to fiscal years.

Metrics design: Logical definition of key business metrics used in analytics products. Metrics can be simple definitions (how the business defines revenue or ship quantity). They can be calculations, like gross margin ([revenue-cost]/revenue). Or they can be time-relative (quarter-on-quarter change).

I like to refer to the output of semantic layer-related data modeling as a semantic model.
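
As a rough illustration of what such a semantic model might capture, the sketch below encodes logical-to-physical mappings, a dimension hierarchy and metric definitions (including the gross margin and quarter-on-quarter examples above) as plain Python dataclasses. The table and column names are hypothetical, and real semantic-layer products define this in their own modeling languages rather than this format.

```python
# Minimal, hypothetical sketch of a semantic model; not any vendor's schema.
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str            # business-facing name, e.g. "fiscal_quarter"
    column: str          # physical column it maps to
    hierarchy: list[str] = field(default_factory=list)  # month -> quarter -> year


@dataclass
class Metric:
    name: str            # business-facing metric name
    expression: str      # logical definition, resolved against physical columns
    description: str = ""


@dataclass
class SemanticModel:
    source_table: str                 # physical table in the warehouse
    dimensions: list[Dimension]
    metrics: list[Metric]


orders_model = SemanticModel(
    source_table="warehouse.fct_orders",
    dimensions=[
        Dimension("order_month", "order_date",
                  hierarchy=["order_month", "fiscal_quarter", "fiscal_year"]),
        Dimension("customer", "customer_id"),
    ],
    metrics=[
        Metric("revenue", "SUM(net_amount)", "How the business defines revenue"),
        Metric("gross_margin", "(SUM(net_amount) - SUM(cost)) / SUM(net_amount)",
               "Calculated metric: (revenue - cost) / revenue"),
        Metric("revenue_qoq",
               "revenue / LAG(revenue) OVER (ORDER BY fiscal_quarter) - 1",
               "Time-relative metric: quarter-on-quarter change"),
    ],
)
```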

The metrics layer is the single source of metrics truth for all analytics use cases. Its primary function is maintaining a metrics store that can be accessed from the full range of analytics consumers and analytics tools (BI platforms, applications, reverse ETL, and data science tools).

The term "headless BI" describes a metrics layer service that supports user queries from a variety of BI tools. This is the fundamental capability for semantic layer success: if users are unable to interact with a semantic layer directly using their preferred analytics tools, they will end up extracting data into their tool using SQL and recreating a localized semantic layer.

Additionally, metrics layers need to support four important services:

Metrics curation: Metrics stewards will move between data modeling and the metrics layer to curate the set of metrics provided for different analytics use cases.

Metrics change management: The metrics layer serves as an abstraction layer that shields the complexity of raw data from data consumers. As a metrics definition changes, existing reports or dashboards are preserved.

Metrics discoverability: Data product creators need to easily find and implement the proper metrics for their purpose. This becomes more important as the list of curated metrics grows to include a broader set of calculated or time-relative metrics.

Metrics serving: Metrics layers are queried directly from analytics and output tools. As end users request metrics from a dashboard, the metrics layer needs to serve the request fast enough to provide a positive analytics user experience.

Transformation of raw data into an analytics-ready state can be based on physical materialized transforms, virtual views based on SQL or some combination of those. Workflow management is the orchestration and automation of physical and logical transforms that support the semantic layer function and directly impact the cost and performance of analytics.

Performance: Analytics consumers have a very low tolerance for query latency. A semantic layer cannot introduce a query performance penalty; otherwise, clever end users will again go down the data extract route and create alternative semantic layers. Effective performance management workflows automate and orchestrate physical materializations (creation of aggregate tables) as well as decide what and when to materialize. This functionality needs to be dynamic and adaptive based on user query behavior, query runtimes and other active metadata.

Cost: The primary cost tradeoff for performance is related to cloud resource consumption. Physical transformations executed in the data platform (ELT transforms) consume compute cycles and cost money. End user queries do the same. The decisions made on what to materialize and what to virtualize directly impact cloud costs for analytics programs.

Analytics performance-cost tradeoff becomes an interesting optimization problem that needs to be managed for each data product and use case. This is the job of workflow management services.
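
A toy version of that optimization problem is sketched below: decide whether to materialize an aggregate by weighing the query time saved against the compute spent refreshing it. The thresholds, refresh schedule and cost model are invented for illustration; production workflow managers draw on much richer active metadata and platform-specific pricing.

```python
# Toy materialize-vs-virtualize heuristic; numbers and rules are illustrative only.
from dataclasses import dataclass


@dataclass
class QueryStats:
    aggregate: str          # candidate aggregate, e.g. "revenue_by_month"
    queries_per_day: float  # how often users hit this rollup
    avg_runtime_s: float    # average runtime when answered virtually (raw SQL)
    refresh_cost_s: float   # compute time to rebuild the materialized table


def should_materialize(s: QueryStats, refreshes_per_day: float = 4.0) -> bool:
    # Time saved per day if queries hit a precomputed table (assume ~0.5 s to
    # scan the aggregate) versus the compute spent keeping that table fresh.
    saved = s.queries_per_day * max(s.avg_runtime_s - 0.5, 0.0)
    spent = refreshes_per_day * s.refresh_cost_s
    return saved > spent


candidates = [
    QueryStats("revenue_by_month", queries_per_day=900, avg_runtime_s=6.0, refresh_cost_s=120),
    QueryStats("orders_by_zipcode", queries_per_day=3, avg_runtime_s=4.0, refresh_cost_s=300),
]
for c in candidates:
    print(c.aggregate, "->", "materialize" if should_materialize(c) else "virtualize")
```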

Transformation-related entitlements and security services relate to the active application of data governance policies to analytics. Beyond cataloging data governance policies, the modern data stack must enforce policies at query time, as metrics are accessed by different users. Many different types of entitlements may be managed and enforced alongside (or embedded in) a semantic layer.

Access control: Proper access control services ensure all users can get access to all of the data they are entitled to see.

Model and metrics consistency: Maintaining semantic layer integrity requires some level of centralized governance of how metrics are defined, shared and used.

Performance and resource consumption: As discussed above, there are constant tradeoffs being made on performance and resource consumption. User entitlements and use case priority may also factor into the optimization.

Real-time enforcement of governance policies is critical for maintaining semantic layer integrity.
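
As a hedged sketch of what query-time enforcement could look like, the snippet below appends a role's row-level predicate to a query before it reaches the warehouse. The policy table, role names and naive SQL string handling are hypothetical simplifications, not how any particular product implements entitlements.

```python
# Hypothetical query-time policy enforcement; roles, predicates and SQL handling
# are deliberately simplified for illustration.
ROW_POLICIES = {
    # role -> predicate injected into every query against fct_orders
    "emea_analyst": "region = 'EMEA'",
    "global_finance": None,  # no restriction
}


def enforce(sql: str, role: str) -> str:
    """Append the role's row-level predicate to a single-table SELECT."""
    if role not in ROW_POLICIES:
        raise PermissionError(f"role '{role}' has no access policy")
    predicate = ROW_POLICIES[role]
    if predicate is None:
        return sql
    joiner = " AND " if " where " in sql.lower() else " WHERE "
    return sql + joiner + predicate


print(enforce("SELECT SUM(net_amount) FROM warehouse.fct_orders", "emea_analyst"))
# SELECT SUM(net_amount) FROM warehouse.fct_orders WHERE region = 'EMEA'
```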

Layers in the modern data stack must seamlessly integrate with other surrounding layers. The semantic layer requires deep integration with its data fabric neighbors, most importantly the query and processing services in the data platform and the analysis and output tools.

A universal semantic layer should not persist data outside of the data platform. A coordinated set of semantic layer services needs to integrate with the data platform in a few important ways:

Query engine orchestration: The semantic layer dynamically translates incoming queries from consumers (using the metrics layer logical constructs) to platform-specific SQL (rewritten to reflect the logical to physical mapping defined in the semantic model).

Transform orchestration: Managing performance and cost requires the capability to materialize certain views into physical tables. This means the semantic layer must be able to orchestrate transformations in the data platform.

AI/ML integration: While many data science activities leverage specialized tools and services accessing raw data assets directly, a formalized semantic layer creates the opportunity to provide business vetted features from the metrics layer to data scientists and AI/ML pipelines.

Tight data platform integration ensures that the semantic layer stays thin and can operate without persisting data locally or in a separate cluster.

A successful semantic layer, including a headless BI approach to implementing the metrics layer, must be able to support a variety of inbound query protocols including SQL (Tableau), MDX (Microsoft Excel), DAX (Microsoft Power BI), Python (data science tools), and RESTful interfaces (for application developers) using standard protocols such as ODBC, JDBC, HTTP(s) and XMLA.
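
The rewrite step described above can be illustrated with a minimal, self-contained sketch: a consumer asks for a metric by name along a dimension, and the layer emits warehouse SQL from its logical-to-physical mappings. The metric expressions, dimension mapping and table name are hypothetical, and a real semantic layer would also handle joins, dialects and security.

```python
# Hypothetical logical-to-physical query rewrite; mappings are deliberately simple.
METRICS = {
    "revenue": "SUM(net_amount)",
    "gross_margin": "(SUM(net_amount) - SUM(cost)) / SUM(net_amount)",
}
DIMENSIONS = {"fiscal_quarter": "DATE_TRUNC('quarter', order_date)"}
SOURCE = "warehouse.fct_orders"


def rewrite(metric: str, dimension: str) -> str:
    """Translate a logical metric request into warehouse SQL."""
    return (
        f"SELECT {DIMENSIONS[dimension]} AS {dimension}, "
        f"{METRICS[metric]} AS {metric} "
        f"FROM {SOURCE} GROUP BY 1 ORDER BY 1"
    )


# The same logical request could arrive over SQL, MDX, DAX or REST; only this
# rewrite step would change per warehouse dialect.
print(rewrite("gross_margin", "fiscal_quarter"))
```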

Leading organizations incorporate data science and enterprise AI into everyday decision-making in the form of augmented analytics. A semantic layer can be helpful in successfully implementing augmented analytics, for example by providing business-vetted metrics and features to AI/ML pipelines.

The A16Z model implies that organizations could assemble a fabric of home-grown or single-purpose vendor offerings to build a semantic layer. While certainly possible, success will be determined by how well-integrated individual services are. As noted, even if a single service or integration fails to deliver on user needs, localized semantic layers are inevitable.

Furthermore, it is important to consider how vital business knowledge gets sprinkled across data fabrics in the form of metadata. The semantic layer has the advantage of seeing a large portion of active and passive metadata created for analytics use cases. This creates an opportunity for forward-thinking organizations to better manage this knowledge gravity and better leverage metadata for improving the analytics experience and driving incremental business value.

While the semantic layer is still emerging as a technology category, it will clearly play an important role in the evolution of the modern data stack.

This article is a summary of my current research around semantic layers within the modern, cloud-first data stack. I'll be presenting my full findings at the upcoming virtual Semantic Layer Summit on April 26, 2023.

David P. Mariani is CTO and cofounder of AtScale, Inc.


View original post here:

Why semantics matter in the modern data stack - VentureBeat


Rao honored with Long Service Award // Mizzou Engineering – University of Missouri College of Engineering

April 11, 2023

Praveen Rao has been honored with a Long Service Award from PLOS One, a peer-reviewed open access scientific journal. Rao, an associate professor of electrical engineering and computer science, has served as an academic editor on the journal's editorial board for more than five years.

Rao directs the Scalable Data Science (SDS) Lab at Mizzou, where his research focuses on big data management, data science, health informatics and cybersecurity. He is also director of graduate studies for the PhD in Informatics program.

Last month, he was tapped to be an associate editor for a newly approved journal under the umbrella of the Association for Computing Machinery (ACM). ACM Transactions on Probabilistic Machine Learning will publish work around probabilistic methods that learn from data to improve performance on decision making or prediction tasks under uncertainty. Rao has been a senior member of ACM since 2020.

Visit link:

Rao honored with Long Service Award // Mizzou Engineering - University of Missouri College of Engineering


Neural network based integration of assays to assess pathogenic … – Nature.com

A vector representation of the SBRL assays that preserves species discrimination

The CDC SBRL dataset contains more than 30 different assays that include tests to determine substrate utilization and catalytic activities. Prior to the advent of DNA sequencing, these phenotypic assays were the only method available for bacterial species identification among bacteria that had similar gram staining and colony morphology. The dataset was narrowed down to focus on eight assays that had measurements listed in them for at least 80% of the strains (Table 1).

To determine if these eight assays can differentiate between various types of bacteria, a Uniform Manifold Approximation and Projection (UMAP) dimension reduction was performed to visualize the dataset (Fig.2A). Every point in the plot was a bacterial strain. The clusters that formed based on the results from the selected eight assays belonged to bacteria with the same species names, suggesting that this machine-learning approach of using the SBRL results to aggregate similar bacteria can recapitulate observations that human microbiologists made over the course of decades. The subset of assays that the computer scientists used maintained discriminative power across species.

Exploratory data analysis showed that the SBRL dataset discriminates between different bacterial species. (A) 2D UMAP was performed on the SBRL assays, followed by k-means clustering to assign cluster labels to the bacterial samples. Every point in the plot is a bacterial sample. The points form groups in the UMAP, suggesting that the SBRL assays can aggregate similar bacteria together. The colors in the figure are the k-means labels. (B) The neural network model pushes the samples from the same bacterial species closer together. An example output of two species, Vibrio parahaemolyticus and Yersinia enterocolitica, is shown in the UMAP before and after training to show that clusters are refined by the model. We quantified how well the samples from the same species cluster together before and after training and found the normalized mutual information went from 0.65 to 0.74.
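
For readers who want to reproduce the flavor of this exploratory step, the sketch below runs UMAP, k-means and normalized mutual information on a random placeholder matrix standing in for the eight binary assay results; the UMAP and k-means settings are not taken from the paper.

```python
# Sketch of the exploratory analysis on placeholder data, not the SBRL dataset.
import numpy as np
import umap  # pip install umap-learn
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
n_strains, n_assays, n_species = 300, 8, 14
assays = rng.integers(0, 2, size=(n_strains, n_assays)).astype(float)  # +/- results
species = rng.integers(0, n_species, size=n_strains)                   # true labels

# Reduce the assay matrix to 2D, then cluster the embedding.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(assays)
clusters = KMeans(n_clusters=n_species, n_init=10, random_state=0).fit_predict(embedding)

# How well do the assay-driven clusters recover species identity?
print("NMI:", round(normalized_mutual_info_score(species, clusters), 2))
```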

The next challenge was to develop a vector representation for the assays that would be useful to downstream machine learning models. Two solutions were investigated to address this limitation, both of which integrated the data based on species identification. The first method computed the percent of species that have a positive signal from the assay, henceforth referred to as pps (percent positive signal). The pps was considered a positive control, as it enhanced the pathogenicity assays with the SBRL dataset but did so without the use of machine learning. The second method used a neural network embedding model (NNEM) to create bacterial species vectors using the data from the biochemical assays, henceforth referred to as vectorization. Given that we only used data from eight assays and wanted to remain comparable to the pps, we did not change the dimensionality from eight. The model simply transformed the representation of the eight assays into an eight-dimensional vector per species. This process involved feeding the various bacterial strains and their biochemical characteristics into the NNEM as input, then asking the model to predict the species name for each strain based on the assays. As Fig.2A showed, this should be possible for the model. The architecture of the neural network model is shown in Supplementary Fig. 3. As the model was trained to predict the species name for each strain, it created distinct vectors for each species, and these new distinct vectors represented the species for downstream analyses. This learned vector representation of the SBRL biochemical assays was then integrated into our pathogenicity models at the species level. In a sense, this approach combined very old data with very new algorithms to enhance the predictive power of machine learning models trained to predict pathogenic potential. We observed that after the NNEM training, the Vibrio parahaemolyticus strains and Yersinia enterocolitica strains from the initial panel of 40 formed tighter clusters (Fig.2B). We quantified how much the NNEM helped the strains that belong to the same species cluster together and found an improvement in the normalized mutual information7, a metric used to measure how well groups cluster, from 0.65 to 0.74. It should be noted that we do not claim that the NNEM can distinguish between strains perfectly, as can be seen from the normalized mutual information scores; if it were perfect, NMI = 1. We instead used the vectorization to provide a species prior for our machine learning models trained only on pathogenicity assays, so they could benefit from the additional context.
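
The paper's exact NNEM architecture is given in its Supplementary Fig. 3, which is not reproduced here; the sketch below shows one plausible minimal form under the constraints stated above (eight assay inputs, an eight-dimensional learned representation, and a species-prediction objective), trained on random placeholder data.

```python
# One plausible minimal NNEM form; layer sizes, training settings and data are
# illustrative placeholders, not the paper's actual architecture.
import torch
import torch.nn as nn

torch.manual_seed(0)
n_strains, n_assays, n_species = 300, 8, 14
assays = torch.randint(0, 2, (n_strains, n_assays)).float()
species = torch.randint(0, n_species, (n_strains,))

embed = nn.Sequential(nn.Linear(n_assays, 8), nn.ReLU())  # 8-dim representation
classify = nn.Linear(8, n_species)                        # species prediction head
opt = torch.optim.Adam(list(embed.parameters()) + list(classify.parameters()), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(classify(embed(assays)), species)
    loss.backward()
    opt.step()

# Species-level vectors: average the learned 8-dim representation per species.
with torch.no_grad():
    hidden = embed(assays)
    species_vectors = torch.stack(
        [hidden[species == s].mean(dim=0) for s in range(n_species)]
    )
print(species_vectors.shape)  # torch.Size([14, 8])
```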

Previously, the PathEngine platform2 was developed to evaluate the results of four phenotypic assays that measure the pathogenic potential of a blinded set of 40 bacterial strains. These four pathogenicity assays would reasonably be expected to be associated with bacterial pathogenicity due to known biological mechanisms8,9. The host immune activation assay detected activation of NF-κB signaling in Jurkat T lymphocytes to capture the presence of pathogen-associated molecular patterns (PAMPs)10,11. The AMR assay was used to discover antibiotic resistance, providing an indication of whether any instance of infection could be efficiently treated12. The host adherence assay measured the ability of bacteria to bind to host cells, a crucial step for pathogens to establish an infection13,14. Lastly, the host toxicity assay detected host cell death induced by the bacteria to measure the cytotoxicity of these strains15,16. The data produced by the assays were used to train ML models to predict a strain's pathogenic potential from these properties. Traditionally, an expert would review the data and make a pathogenic call based on their interpretation of the data. Here, the model learns the features from each assay and then combines those features into an ensemble model that makes a pathogenic call. The model from each assay, as well as the ensemble, is compared to the friend or foe designation provided by NIST. Details can be found in our prior work2. The CDC SBRL dataset contains some of the same species as the bacteria used for PathEngine analysis. It was therefore hypothesized that by integrating the SBRL data with the results of the four pathogenicity phenotypic assays, the models would have more context about each species and achieve better performance.

However, the SBRL data was not easily integrated with the results of the pathogenicity assays, since none of the actual strains tested for pathogenic potential were present in the SBRL dataset. The two representations described in the previous section were therefore integrated at the species level, rather than at the level of actual strains. In other words, every strain was supplemented with SBRL data that was represented through the pps or vectorization. Having established two ways to integrate the SBRL biochemical data with results from the pathogenicity assays, we then performed three tests to evaluate if, and how much, the integration of the SBRL biochemical data impacted the ML results. A total of 22 bacterial strains that belong to 14 unique species were enriched with the SBRL data based on the species names. Note that we had 40 strains to use without integrating with the SBRL data but only 22 left after the integration, as the remaining species were not in the SBRL dataset (Supplementary Table 1). With many fewer strains for training and testing, the accuracy of the ML models to predict pathogenic potential was expected to be lower than we had in the original PathEngine paper2, as smaller dataset sizes are generally understood to result in lower performance for this sort of model. For each assay, we tested a model with 10-fold cross validation that used either (1) the pathogenicity assay only, (2) the pathogenicity assay combined with the pps or (3) the pathogenicity assay combined with the vector representation created by the NNEM. These models were used to test how well the PathEngine predictions matched the pathogenicity designations provided by NIST. We used balanced accuracy as the metric to ensure that the performance was not biased towards the majority class and henceforth refer to this metric as accuracy. The possibility that the observed prediction improvement was due entirely to the removal of less well-understood bacterial strains from the analysis was precluded by the fact that a control condition was included: prediction from the assay alone, without the SBRL vectors, as well as with the SBRL pps. Any and all improvement can thus be attributed to the vector representation we developed.
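
The comparison protocol can be sketched as follows, with a random-forest classifier and random placeholder features standing in for the actual models and assay data; only the evaluation scaffolding (10-fold cross validation scored with balanced accuracy over the three feature sets) mirrors the description above.

```python
# Evaluation scaffolding only; the classifier and features are stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 22                                  # strains after SBRL integration
assay = rng.normal(size=(n, 5))         # pathogenicity assay features (placeholder)
pps = rng.uniform(size=(n, 8))          # percent positive signal per SBRL assay
nnem = rng.normal(size=(n, 8))          # NNEM species vectors
labels = np.tile([0, 1], 11)            # placeholder friend (0) / foe (1) labels

feature_sets = {
    "assay only": assay,
    "assay + pps": np.hstack([assay, pps]),
    "assay + NNEM": np.hstack([assay, nnem]),
}
for name, X in feature_sets.items():
    scores = cross_val_score(RandomForestClassifier(random_state=0), X, labels,
                             cv=10, scoring="balanced_accuracy")
    print(f"{name}: {scores.mean():.2f}")
```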

For the immune activation assay, adding the pps increased the ML accuracy by up to 24% (Fig.3A,B). When the vector representations were used instead of the average values, the accuracy improved from 51% to 85% (Fig.3A,C).

Incorporation of information from SBRL enhanced the predictions of pathogenic potential from the immune activation assay by up to 34%. (A) Ten-fold cross validation of an ML model with immune activation assay data alone, (B) with the percent positive signal (pps) added and (C) with the NNEM vectors of the SBRL data added. The balanced accuracy went from 51% to 75% to 85%, respectively.

For the AMR assay, the pps increased the accuracy by 2% (Fig.4A,B) and the vectors improved the accuracy by 8% (Fig.4C). For the adherence assay, the pps increased the accuracy by 2% (Fig.5A,B) and the vectors improved the accuracy by 7% (Fig.5C). The toxicity assay is the only exception, where the performance decreased when the SBRL representations were included (Supplementary Fig. 1A-C).

Incorporation of information from SBRL enhanced the predictions of pathogenic potential from the AMR assay by up to 8%. (A) Ten-fold cross validation of an ML model with AMR assay data alone, (B) with the percent positive signal (pps) added and (C) with the NNEM vectors of the SBRL data added. The balanced accuracy went from 61% to 63% to 69%, respectively.

Incorporation of information from SBRL enhanced the predictions of pathogenic potential from the adherence assay by up to 7%. (A) Ten-fold cross validation of an ML model with adherence assay data alone, (B) with the percent positive signal (pps) added and (C) with the NNEM vectors of the SBRL data added. The balanced accuracy went from 58% to 60% to 65%, respectively.

In order to investigate the cause of the decrease in performance of the toxicity assay predictions, all the predictions were grouped into four prediction classes (− predicted as −, − predicted as +, + predicted as − and + predicted as +). Namely, each bacterial observation was classified as either non-pathogenic (−) or pathogenic (+). DAPI signals from the toxicity assay showed that the host cell death induced by the bacteria can be distinguished between different prediction classes (Supplementary Fig. 2A). After integrating the DAPI signal and the SBRL assays, we observed that the signals were masked by the presence of the SBRL assays and stayed flat throughout the time course. The assays were not as distinct between different classes as before (Supplementary Fig. 2B). Similar observations were seen when the SBRL vectors were incorporated (Supplementary Fig. 2C).

As each assay reveals different aspects of bacterial pathogenicity8,9, we combined predictions from the best performing model from each of the four assays to make a final threat assessment call. Using the models trained without using the SBRL vectors for the ensemble, we achieved accuracy of 70%, precision of 86%, recall of 73% and F1 of 79%. When the SBRL vectors were included, the ensemble performance achieved accuracy of 79%, precision of 90%, recall of 82% and F1 of 86% (Table 2). These results confirmed that adding the SBRL data provided useful context about the bacterial species for the ML models and thus improved the pathogenicity predictions.
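
The exact ensembling rule is not spelled out in this excerpt, so the sketch below assumes a simple vote across the four per-assay calls and then computes the reported metrics (balanced accuracy, precision, recall, F1) with scikit-learn; the prediction arrays are placeholders.

```python
# Assumed majority-style vote over four per-assay calls; all values are placeholders.
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score, precision_score, recall_score

truth = np.array([1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0])  # 1 = pathogenic (+)
per_assay = np.array([                                   # rows: immune, AMR, adherence, toxicity
    [1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1],
])
ensemble = (per_assay.sum(axis=0) >= 2).astype(int)      # call "+" if >= 2 of 4 assays agree

print("balanced accuracy:", round(balanced_accuracy_score(truth, ensemble), 2))
print("precision:        ", round(precision_score(truth, ensemble), 2))
print("recall:           ", round(recall_score(truth, ensemble), 2))
print("F1:               ", round(f1_score(truth, ensemble), 2))
```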

To understand which SBRL assays were useful for the model predictions, we annotated each assay based on a literature review and also quantified the assay importance with data-driven approaches. The assays are listed and annotated with their relevance for threat assessment in Table 1. For the data-driven approaches, we first examined the signals for the four prediction classes. If the − as − (non-pathogens predicted to be non-pathogens) and + as + (pathogens predicted to be pathogens) classes had dramatically different signals, it suggested that the assay is likely useful for threat assessment.

Consistent with the literature designation, MacConkey Agar (MacC) and Salmonella Shigella (SS) Agar are most relevant for threat assessment, as they have the most pronounced difference between the − as − and + as + classes (Fig.6A). This is consistent with established microbiological understanding. Specifically, growth on MacConkey agar and SS agar is highly associated with pathogenicity, because most Enterics will grow on these agars. These assays are what have always been used to separate coliform bacteria from other similar bacteria. The re-discovery of these markers by computer scientists with no training in microbiology is a testament to the usefulness of a data-driven approach. It gives us confidence that heretofore unrecognized markers of pathogenicity will be similarly detectable. Supplementary Table 1 lists all the species used in this assay. Details of these strains and associated tags have been described previously2. The rest of the assays were not as distinguishable as MacC and SS between the − as − and + as + classes, but did show noticeable differences to be considered useful for threat assessment, as supported by the literature (Table 1). To quantify the importance, we performed drop-assay tests where we dropped one assay at a time and compared the change in model performance to the baseline where no assay was dropped. The change in performance quantified the importance of the assay. We found that the majority of the assays have positive importance for predicting pathogenicity, with the exception of the lead acetate paper (TSI:H2S=paper) and oxidase (O) tests (Fig.6B).

Comparison of threat designations of the SBRL assays based on the literature and the contribution determined by the models. (A) Data-driven qualitative assessment of threat relevance of the SBRL assays based on ML predictions. Non-pathogenic strains are annotated as − and pathogenic strains as +. The predictions belong to four groups: − predicted to be −, − predicted to be +, + predicted to be − and + predicted to be +. SS and MacC are the most useful assays, as their − predicted to be − and + predicted to be + groups are differentiable. (B) The quantitative measurement of the assay contribution, determined by the change in performance when each assay is dropped one by one. If an assay is dropped and the accuracy decreases, the assay gets a positive importance score, and vice versa.
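
The drop-assay test can be sketched as below: drop one assay column at a time, re-run cross validation, and report the change in balanced accuracy relative to the full feature set. Only SS, MacC, O and the TSI:H2S paper test are named in the text above; the remaining column names, the data and the classifier are placeholders.

```python
# Drop-one-assay importance sketch on placeholder data; assay names beyond
# SS, MacC, O and TSI:H2S paper are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
assay_names = ["SS", "MacC", "O", "TSI:H2S paper",
               "assay_5", "assay_6", "assay_7", "assay_8"]
X = rng.integers(0, 2, size=(60, len(assay_names))).astype(float)
y = np.tile([0, 1], 30)                      # placeholder friend/foe labels


def cv_accuracy(features: np.ndarray) -> float:
    clf = RandomForestClassifier(random_state=0)
    return cross_val_score(clf, features, y, cv=5, scoring="balanced_accuracy").mean()


baseline = cv_accuracy(X)
for i, name in enumerate(assay_names):
    dropped = np.delete(X, i, axis=1)
    importance = baseline - cv_accuracy(dropped)   # positive = dropping it hurts
    print(f"{name:14s} importance: {importance:+.3f}")
```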

Read this article:

Neural network based integration of assays to assess pathogenic ... - Nature.com


Predictive Analytics Helps Everyone in the Enterprise | NASSCOM … – NASSCOM Community

Gartner has predicted that "predictive and prescriptive analytics will attract 40% of net new enterprise investment in the overall business intelligence and analytics market." Why the focus on predictive analytics? It's simple! Investment in predictive analytics benefits everyone in the organization, including business users and team members, data scientists and the organization in general.

When an enterprise selects an assisted predictive modeling solution, it can satisfy the needs of business users, IT, and data scientists and achieve impressive results for the organization.

In this article, we review some of the many benefits of predictive analytics:

General Benefits

Bringing together data from across the enterprise to use for analytics ensures that your organization is considering all available information when it makes a decision. By analyzing historical data and using it to test theories and to hypothesize, the business can determine the best alternative and better understand the outcome before it decides on a direction, thereby avoiding missteps. Predictive analytics provides support for data-driven, fact-based decisions and enables insight, perspective and clarity for improved business agility and efficiency. Decisions are made on a more timely basis, problem solving is easier and the business can avoid re-work and damaging missteps in the market.

Business Users/Citizen Data Scientists

Assisted Predictive Modeling with Augmented Data Science and Machine Learning allows business users without data science or analytical skills to apply predictive analytics to any use case, using forecasting, regression, clustering and other techniques, with auto-recommendations and guidance to suggest appropriate analytical approaches. With self-serve predictive analytics tools, business users can leverage sophisticated predictive techniques with auto-recommendations to choose the right kind of predictive algorithm or technique for the best results. Team members can bridge the data science skills gap so they don't have to wait for IT or data scientists to help them produce a report or perform analytics. Instead, they can use assisted predictive modeling to improve business agility and align processes, activities and tasks with business objectives and goals.

Data Scientists

Data scientists spend much of their day addressing requests from management and business users instead of focusing on strategic initiatives, where data analytics must be 100% accurate to ensure the appropriate strategy. With sophisticated predictive analytics tools founded on the latest, most effective algorithms and analytical techniques, data scientists can spend less time coding, creating queries and pulling data together manually, and less time slogging through complex systems and solutions to achieve their goals. The ability to combine data science skills with simple, easy-to-use tools and sophisticated features and functionality will make your data scientists and business analysts more productive and effective. Data scientists can create and re-purpose analytical models and focus on strategic initiatives.

These are just a few of the many benefits of predictive analytics.


Originally posted here:

Predictive Analytics Helps Everyone in the Enterprise | NASSCOM ... - NASSCOM Community


Spring Tack Faculty Lecture to challenge our notions of AI – William & Mary

Should we be frightened or excited by the rise of artificial intelligence? According to Dan Runfola, associate professor of applied science and data science at William & Mary, the answer is both.

Runfola will address the AI revolution in the spring 2023 Tack Faculty Lecture, "Everything Is AI-awesome: Rise of the Machines," on May 2 at 7 p.m. in the Sadler Center's Commonwealth Auditorium. The event is free and open to the public with a reception to follow, and attendees are asked to RSVP.

"At William & Mary, we have this wonderful nexus of individuals who care not only about the models, but also about how we are going to use these algorithms in practice," said Runfola, whose research takes place at the intersection of deep learning and satellite imagery analysis.

Runfola compared the age of AI to the Industrial Revolution, with its potential to disrupt almost every job, including his own.

"The number of jobs at which humans are better than AI is going to go down, which leads us to a more fundamental question: How do we handle reallocation of wealth in a society where entire sectors that used to be dominated by human labor no longer require human attention?"

"Because research in this field is frequently in commercial settings, commercial applications have been a key driver of innovation," explained Runfola, who is currently the principal investigator of the Geospatial Evaluation and Observation Lab. "Here at William & Mary, we're seeing students consider implications well outside of commercial opportunities. Rather than asking how do we make models more accurate, they are focusing on how to ensure that a particular modeling strategy will not result in some populations being left behind. This raises fundamentally different questions about data collection and modeling."

The group of individuals controlling the creation of algorithms is also relatively small. As reported by Runfola, over the past 20 years much of AI research has happened within the private sector.

So, what comes next? Runfola is optimistic about a future in which AI can be a copilot for our everyday lives, taking on an increasingly broad spectrum of tasks.

"Today, with the right prompts you can ask a generative algorithm to write a poem; tomorrow, an AI might decide to read you a poem it wrote because you were looking glum," said Runfola, with little doubt that such a thing can be created in the near term. "That could be a beautiful thing. But on the other side, if we don't put safeguards in place, these same technologies could be used in detrimental ways: 'I see you're looking a little tired today, Dan, would you like me to order you a bottle of wine from our online store?'"

But is it all doom and gloom? Some may find it refreshing that Runfola also mentioned AI's potential to do incredibly helpful things for our society, making us happier and more productive. His lecture promises to challenge the audience's notions of AI and provide a glimpse (with live examples) into what the future will look like.

"It is up to us to determine whether the implications of these changes will lead to a more vibrant world, or to one in which power is consolidated in an exceptionally small number of hands," concluded Runfola.

The Tack Faculty Lecture Series is made possible through a generous commitment by Martha '78 and Carl Tack '78. Initially launched in 2012, the Tacks' commitment has created an endowment for the series of speakers from the W&M faculty.

Editor's note: Data is one of four cornerstone initiatives in W&M's Vision 2026 strategic plan. Visit the Vision 2026 website to learn more.

Antonella Di Marzio, Senior Research Writer

View post:

Spring Tack Faculty Lecture to challenge our notions of AI - William & Mary
