Category Archives: Data Science

Data Science More Democratized and Dynamic: Gartner – CDOTrends

According to analyst group Gartner, Cloud Data Ecosystems and Edge AI are two of the top trends impacting the future of data science and machine learning.

Speaking at the Gartner Data & Analytics Summit in Sydney in August 2023, Peter Krensky, director analyst at Gartner, said: "As machine learning adoption continues to grow rapidly across industries, data science and machine learning (DSML) is evolving from just focusing on predictive models toward a more democratized, dynamic and data-centric discipline.

This is now also fueled by the fervor around generative AI. While potential risks are emerging, so too are the many new capabilities and use cases for data scientists and their organizations."

In addition to Cloud Data Ecosystems and Edge AI, Gartner cited three other key trends: Responsible AI, Data-Centric AI, and Accelerated AI Investment.

The Summit heard that Data Ecosystems are moving from self-contained software or blended deployments to full cloud-native solutions. By 2024, Gartner expects 50% of new system deployments in the cloud will be based on a cohesive cloud data ecosystem rather than manually integrated point solutions.

Demand for Edge AI is growing to enable data processing at the point of creation at the edge, helping organizations gain real-time insights, detect new patterns and meet stringent data privacy requirements. Edge AI also helps organizations improve AI development, orchestration, integration and deployment.

Gartner predicts that more than 55% of all data analysis by deep neural networks will occur at the point of capture in an edge system by 2025, up from less than 10% in 2021. Organizations should identify the applications, AI training and inferencing required to move to edge environments near IoT endpoints.

On Responsible AI, Gartner said this trend is about making AI a positive force rather than a threat to society and to itself. Gartner predicts that the concentration of pre-trained AI models among 1% of AI vendors by 2025 will make responsible AI a societal concern.

Data-centric AI represents a shift from a model and code-centric approach to being more data-focused to build better AI systems.

The use of generative AI to create synthetic data is one area that is rapidly growing, relieving the burden of obtaining real-world data so machine learning models can be trained effectively. By 2024, Gartner predicts 60% of data for AI will be synthetic to simulate reality, future scenarios, and de-risk AI, up from 1% in 2021.

Gartner also forecasts that investment in AI will continue to accelerate by organizations implementing solutions and industries looking to grow through AI technologies and AI-based businesses.

By the end of 2026, Gartner predicts that more than US$10 billion will have been invested in AI startups that rely on foundation models: large AI models trained on vast amounts of data.

Image credit: iStockphoto/NicoElNino


A Guide to Using ggmap in R – Built In

In R, ggmap is a package that allows users to retrieve and visualize spatial data from Google Maps, Stamen Maps or other similar map services. With it, you can create maps that display your data in a meaningful way, providing a powerful tool for data visualization and exploration.

Adding spatial and map capabilities with ggmap can be a great way to enhance your data science or analytics projects. Whether you want to showcase examples on a map or you have geographical features to feed into algorithms, the ability to combine data and maps is a great asset for any developer.

ggmap is an R package that allows users to access visual data from map services such as Google Maps or Stamen Maps and display it. It's a useful tool for working with spatial data and creating maps for data visualization and exploration.

In this post, I'll be taking you on a journey through the world of ggmap, exploring its features and capabilities and showing you how it can transform the way you work with spatial data. Whether you're a data scientist, a geographic information systems professional or simply someone with an interest in maps and spatial analysis, this post should help you grasp the basic concepts of the ggmap package using the R programming language.

Let's get started.

To use ggmap, you first need to install the package in R. This can be done by running the following command:
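The original article's code block is not shown here; the installation step would look like this:

```r
# Install ggmap from CRAN (only needed once)
install.packages("ggmap")
```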

Once the package is installed, you can load it into your R session by running the usual library command:
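A minimal sketch of that step:

```r
# Load ggmap into the current R session
library(ggmap)
```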

The first thing we need to do is to create a map. We can do that using get_map(), a function to retrieve maps using R code.

More on R: Grouping Data With R

This function takes a number of arguments that allow you to specify the location, type (such as street, satellite and terrain, etc.) and the source of a map. For example, the following code retrieves a street map of Lisbon, using Stamen as a source:
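The article's code isn't reproduced here; a sketch of what such a call might look like, assuming Stamen's "toner" map type and a zoom level of 12 (the exact arguments in the original may differ):

```r
# Retrieve a street-level map of Lisbon from Stamen
lisbon_map <- get_map(
  location = "Lisbon, Portugal",
  source   = "stamen",
  maptype  = "toner",
  zoom     = 12
)
```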

In ggmap, you can also use Google Maps as your source. To do that, we will need to set up a Google Maps API key, which we will cover later in the article.

When you run the code above, you will notice something odd in the output:

This happens because the location argument uses Google Maps to translate the location into tiles. To use get_map without relying on Google Maps, we'll need to rely on the osmdata library:
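A sketch of loading that dependency (installation included for completeness):

```r
# osmdata provides OpenStreetMap utilities, including bounding-box lookup
install.packages("osmdata")
library(osmdata)
```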

Now, we can use the function getbb (get bounding box) and feed its result to the first argument of get_map:
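The original code isn't shown; a plausible sketch, assuming the Stamen source used earlier:

```r
# Look up Lisbon's bounding box via OpenStreetMap,
# then fetch the map tiles without touching Google's geocoder
lisbon_bb  <- getbb("Lisbon, Portugal")
lisbon_map <- get_map(lisbon_bb, source = "stamen", maptype = "toner")
```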

With the code above, we've just downloaded our map into an R variable. By using this variable, we'll be able to plot our downloaded map using ggplot-like features.

Once we have retrieved the map, we can use the ggmap() function to view it. This function takes the map object that we created with get_map() and plots it in a 2D format:
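Assuming the map object downloaded above, the plotting step is a one-liner:

```r
# Render the downloaded map object as a ggplot2 plot
ggmap(lisbon_map)
```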

On the x-axis we have the longitude of our map, while on the y-axis we can see the latitude. We can also ask for other types of maps by providing a maptype argument to get_map:
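The article's example isn't shown; one possibility, using Stamen's "watercolor" style as an illustrative maptype:

```r
# Request an alternative map style via the maptype argument
lisbon_watercolor <- get_map(getbb("Lisbon, Portugal"),
                             source = "stamen", maptype = "watercolor")
ggmap(lisbon_watercolor)
```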

Also, you can pass coordinates to the function as arguments. However, in order to use this feature, you will need to set up your Google Maps API key.

Unlocking Google Maps as a source for the ggmap package will give you access to a range of useful features. With the API set up, you will be able to use the get_map function to retrieve map images based on specified coordinates, as well as unlocking new types and sizes of the map image.

Using ggmap without Google Maps will prevent you from using a number of important features, such as:

So let's continue to use ggmap, but this time using Google services by providing a Google Maps API key. To use it, you need to have an active billing address on Google Cloud Platform, so proceed at your own risk. I also recommend you set up billing alerts, in case Google Maps API pricing changes in the future.

To register your API key in R, you can use the register_google function:
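A sketch of the registration call, with a placeholder standing in for a real key:

```r
# Register your Google Maps API key for this session
# (replace the placeholder with your own key; never commit it to version control)
register_google(key = "YOUR_API_KEY")
```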

Now, you can ask for several cool things, for example, satellite maps:
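A plausible sketch of that request (the original code isn't shown):

```r
# Request satellite imagery from Google Maps
lisbon_sat <- get_map("Lisbon, Portugal",
                      source = "google", maptype = "satellite")
```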

Visualizing our new map:
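Assuming the satellite map object created above:

```r
# Plot the satellite map
ggmap(lisbon_sat)
```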

We can also tweak the zoom parameter for extra detail on our satellite:
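A sketch with an illustrative zoom value (Google zoom levels run roughly from 3, continent scale, to 21, building scale):

```r
# Higher zoom values give more detail
lisbon_sat_close <- get_map("Lisbon, Portugal",
                            source = "google", maptype = "satellite",
                            zoom = 15)
ggmap(lisbon_sat_close)
```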

Another familiar map that we can access with Google services is the famous roadmap:
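A sketch of the roadmap request:

```r
# Google's standard road map
lisbon_road <- get_map("Lisbon, Portugal",
                       source = "google", maptype = "roadmap")
ggmap(lisbon_road)
```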

Now, we can also provide coordinates to location, instead of a named version:
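The original code isn't shown; a sketch using approximate coordinates for Madrid (the article's exact values may differ):

```r
# Center the map on Madrid using a (longitude, latitude) vector
madrid_sat <- get_map(location = c(lon = -3.70, lat = 40.42),
                      source = "google", maptype = "satellite",
                      zoom = 10)
ggmap(madrid_sat)
```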

The madrid_sat map is a map centered on Madrid. We gave the Madrid coordinates to get_map by passing a vector with longitude and latitude to the location argument.

So far, we've seen the great map visualization features of ggmap. But, of course, these maps should be able to integrate with our R data. Next, we'll cover the most interesting part of this post: mixing R data with our ggmap.

You can also use ggmap to overlay your own data on the map. First, we will create a sample data frame, and then use the geom_point() function from the ggplot2 package to add those coordinates to the map.

Let's create a data frame with two famous locations in Portugal, Torre de Belém and Terreiro do Paço, and then use geom_point() to add those locations to the map:
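A sketch of that data frame, with approximate coordinates for the two landmarks (illustrative values, close to the real locations):

```r
# Sample data frame of landmark coordinates
lisbon_locations <- data.frame(
  location = c("Torre de Belém", "Terreiro do Paço"),
  lon = c(-9.2160, -9.1365),
  lat = c(38.6916, 38.7075)
)
```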

We can now overlay our lisbon_locations on top of ggmap:
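Assuming the lisbon_map and lisbon_locations objects from the sketches above, the overlay is an ordinary ggplot2 layer:

```r
# Plot the base map, then add the points on top
ggmap(lisbon_map) +
  geom_point(data = lisbon_locations,
             aes(x = lon, y = lat),
             color = "red", size = 3)
```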

Also, we can rely on geom_segment to connect the two dots:
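One way to sketch this, building a one-row data frame holding the segment's start and end coordinates:

```r
# One segment from the first location to the second
segment_df <- data.frame(
  x    = lisbon_locations$lon[1], y    = lisbon_locations$lat[1],
  xend = lisbon_locations$lon[2], yend = lisbon_locations$lat[2]
)

ggmap(lisbon_map) +
  geom_point(data = lisbon_locations,
             aes(x = lon, y = lat), color = "red", size = 3) +
  geom_segment(data = segment_df,
               aes(x = x, y = y, xend = xend, yend = yend),
               color = "blue")
```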

With just a few lines of code, we can easily retrieve and visualize spatial data from Google Maps and add our coordinates to it, providing a valuable tool for data exploration and visualization.

Finally, we can also work with shapefiles and other complex coordinate data in our plot. For example, I'll add the Lisbon cycling roads to the map above by reading a shapefile and plotting it using geom_polygon:
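The article's shapefile code isn't reproduced here; a sketch of the general pattern, assuming a hypothetical file path, polygon geometry, and the sf package for reading the file:

```r
library(sf)

# Read the cycling-roads shapefile (hypothetical path)
cycling <- st_read("data/lisbon_cycling_roads.shp")

# Flatten the geometry into a plain data frame of coordinates;
# st_coordinates returns X, Y plus L1/L2 indices identifying each ring
cycling_coords <- as.data.frame(st_coordinates(cycling))

# Overlay the polygons on the base map
ggmap(lisbon_map) +
  geom_polygon(data = cycling_coords,
               aes(x = X, y = Y, group = interaction(L1, L2)),
               fill = NA, color = "darkgreen")
```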

Although not perfect, the map above is a pretty detailed visualization of Lisbon's cycling areas, with just a few errors. We've built this map by:


More on R: The Ultimate Guide to Logical Operators in R

The ggmap package in R provides a useful tool for working with spatial data and creating maps for data visualization and exploration. It allows users to retrieve and visualize maps from various sources, such as Google Maps, Stamen Maps, and others, and provides options for customizing the map type, location and size.

Additionally, setting up a Google Maps API can unlock additional features, such as the ability to transform coordinates into map queries and access different types of maps. Overall, incorporating spatial and map capabilities into data science projects can greatly enhance your data storytelling skills and provide valuable insights.


No 10 scientist who helped during Covid pandemic dies in mountain … – The Independent


A leading scientist who helped guide Britain through the Covid-19 pandemic has been killed in a cycling accident near Lake Garda in Italy.

Susannah Boddie, 27, was a lead health data scientist at No 10 Downing Street during the pandemic, having joined their data science team in February 2021.

She suffered fatal injuries while cycling on a wooded path on the Brescia side of the lake at around 10am on Saturday.

Local reports said that Ms Boddie, from Henley-on-Thames in Oxfordshire, had been travelling down a steep downhill trail when she was thrown from the bike, with her partner calling paramedics.

Despite their best efforts, she was pronounced dead at the scene.

In a statement, her heartbroken family said: "Susannah lived life to the full and had achieved so much in her short life. She crammed more into her life than you would have thought possible.

"She was the loveliest, kindest person who always inspired and cared for others and was adored by all her many friends. She will leave the biggest hole in our family and that of Rob, her much-loved partner.

"She was the most wonderful daughter, sister, granddaughter and friend you could ever wish for, and her memory will continue to inspire us in all we do."

A Downing Street spokeswoman added: "Susannah was an incredible scientist, an inspiring sportswoman, a loved and admired colleague and friend to those at No 10 and many others within the civil service.

"Our thoughts are with her family at this difficult time."

Her LinkedIn account shows that she graduated from Cambridge University in 2018 with a degree in pharmacology, before going on to study for a master's degree in systems biology.

Ms Boddie and her partner are said to have recently finished a tour of the Dolomites and were due to fly home this week from nearby Verona.

A source told the Italian newspaper Il Giorno: "It's a very steep trail and although the woman was wearing a helmet she was thrown quite violently and there was nothing that could be done.

"It's not a tarmacked road, it's a gravel track so it can be a bit tricky getting down there."

A spokesperson from the Italian police earlier said: "I can confirm that a 27-year-old British woman has died after an accident while cycling near Lake Garda. The circumstances are still being investigated and officers are preparing a report.

"The woman's partner raised the alarm and he was taken to hospital but was not injured, although very shocked."


Grand challenges in bioinformatics education and training – Nature.com

Asif M. Khan

Present address: College of Computing and Information Technology, University of Doha for Science and Technology, Doha, Qatar

These authors contributed equally: Esra Büşra Işık, Michelle D. Brazas, Russell Schwartz.

Deceased: Christian Schönbach.

Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Istanbul, Turkey

Esra Büşra Işık & Asif M. Khan

APBioNET.org, Singapore, Singapore

Esra Büşra Işık, Harpreet Singh, Hilyatuz Zahroh, Maurice Ling & Asif M. Khan

Ontario Institute for Cancer Research, Toronto, Ontario, Canada

Michelle D. Brazas

Bioinformatics.ca, Toronto, Ontario, Canada

Michelle D. Brazas

Carnegie Mellon University, Pittsburgh, PA, USA

Russell Schwartz

School of Computer Science and Engineering, University of New South Wales, Sydney, New South Wales, Australia

Bruno Gaeta

Swiss Institute of Bioinformatics, Lausanne, Switzerland

Patricia M. Palagi

Dutch Techcentre for Life Sciences, Utrecht, the Netherlands

Celia W. G. van Gelder

Amrita School of Biotechnology, Amrita Vishwa Vidyapeetham, Clappana, India

Prashanth Suravajhala

Bioclues.org, Hyderabad, India

Prashanth Suravajhala&Harpreet Singh

Department of Bioinformatics, Hans Raj Mahila Maha Vidyalaya, Jalandhar, India

Harpreet Singh

European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK

Sarah L. Morgan

Genetics Research Centre, Universitas YARSI, Jakarta, Indonesia

Hilyatuz Zahroh

School of Applied Science, Temasek Polytechnic, Singapore, Singapore

Maurice Ling

Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, Luxembourg

Venkata P. Satagopam

International Society for Computational Biology, Leesburg, VA, USA

Venkata P. Satagopam

CSIRO Data61, Brisbane, Queensland, Australia

Annette McGrath

Institute of Medical Science, University of Tokyo, Tokyo, Japan

Kenta Nakai

Department of Biochemistry, YLL School of Medicine, National University of Singapore, Singapore, Singapore

Tin Wee Tan

National Supercomputing Centre, Singapore, Singapore

Tin Wee Tan

State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, Biomedical Pioneering Innovative Center and Beijing Advanced Innovation Center for Genomics, Center for Bioinformatics, Peking University, Beijing, China

Ge Gao

Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa

Nicola Mulder

Department of Biology, School of Sciences and Humanities, Nazarbayev University, Astana, Kazakhstan

Christian Schönbach

School of Landscape and Horticulture, Yunnan Agricultural University, Kunming, China

Yun Zheng

Cancer Research Center, Spanish National Research Council, University of Salamanca & Institute for Biomedical Research of Salamanca, Salamanca, Spain

Javier De Las Rivas

Centre for Bioinformatics, School of Data Sciences, Perdana University, Kuala Lumpur, Malaysia

Asif M. Khan

Conceptualization: M.D.B., R.S., B.G., P.M.P., C.W.G.v.G., P.S., H.S., S.L.M., H.Z., M.L., A.M., K.N., T.W.T., G.G., A.M.K. Writing, original draft preparation: E.B.I., M.D.B., R.S., B.G., P.M.P., C.W.G.v.G., P.S., H.S., S.L.M., H.Z., A.M.K. Writing, review and editing: E.B.I., M.D.B., R.S., B.G., P.M.P., C.W.G.v.G., P.S., H.S., S.L.M., H.Z., M.L., V.P.S., A.M., K.N., T.W.T., G.G., N.M., C.S., Y.Z., J.D.L.R., A.M.K. Project administration: E.B.I., M.D.B., R.S., A.M.K.


Optimizing GPT Prompts for Data Science | by Andrea Valenzuela … – Medium

DataCamp tutorial on Prompt Engineering. Self-made image: tutorial cover.

It's been a week, and at ForCodeSake we still have an emotional hangover!

Last Friday, 28 July, we ran our first ForCodeSake online tutorial, on Prompt Engineering. The tutorial was organized by DataCamp as part of their series of webinars about GPT models.

As ForCodeSake's first online debut, we decided to show different techniques for optimizing queries when using GPT models in data science or when building LLM-powered applications. Concretely, the tutorial had three main goals:

Learn the principles of Good Prompting.

Learn how to standardize and test the quality of your prompts at scale.

Learn how to moderate AI responses to ensure quality.

Feeling like you would like to follow the tutorial too? In this short article, we aim to provide the pointers to the course's material so you can benefit from the full experience.

To follow the webinar, you need to have an active OpenAI account with access to the API and to generate an OpenAI API key. No idea where to start? Then the following article is for you!

Have you ever received lackluster responses from ChatGPT?

Before solely attributing it to the model's performance, have you considered the role your prompts play in determining the quality of the outputs?

GPT models have showcased mind-blowing performance across a wide range of applications. However, the quality of the model's completion doesn't solely depend on the model itself; it also depends on the quality of the given prompt.

The secret to obtaining the best possible completion from the model lies in understanding how GPT models interpret user input and generate responses, enabling you to craft your prompt accordingly.


Insight: What happens when a $2 million gene therapy is not enough – Reuters

LOS ANGELES, Aug 12 (Reuters) - Baby Ben Kutschke was diagnosed at three months with spinal muscular atrophy, a rare inherited disorder which is the leading genetic cause of death in infancy globally. It leaves children too weak to walk, talk, swallow or even breathe.

So when in 2021 his parents heard about Zolgensma, a one-time therapy costing millions of dollars that promises to replace the genes needed for the body to control muscles, they had high hopes.

They were disappointed.

After treatment with the $2.25 million therapy at almost eight months old, Ben was able to hold his head up for a few seconds, a significant milestone, his mother Elizabeth Kutschke told Reuters. But he did not advance to rolling over or sitting up, and after a few weeks doctors recommended the family add another drug to help him.

"I just started to worry," she said from their home in Berwyn, Illinois. "He wasn't getting worse, but the gradual progress we had seen ... was no longer happening."

Ben is one of a growing number of patients with spinal muscular atrophy (SMA) whose doctors are turning to additional drugs on top of the gene therapy, six top U.S. neurologists told Reuters.

Their experience raises broader questions around other high-cost gene therapies coming to market, sometimes after accelerated regulatory approvals, drug pricing experts said.

Zolgensma, launched in 2019 by Swiss-based healthcare group Novartis as a "potential cure" for SMA, was the most expensive drug in the world at the time.

Gene therapies work by replacing genes, the body's blueprint for its development. The gene Zolgensma delivers instructs the body to make a protein vital for muscle control.

Other SMA therapies need to be taken continuously, but Zolgensma's price was justified by hopes this revolutionary approach could beat the condition once and for all.

Zolgensma has been given to more than 3,000 children globally, with 2022 sales of $1.4 billion representing 91% of gene therapy sales worldwide, according to the IQVIA Institute for Human Data Science. In the U.S., where costs are borne by government health programs such as Medicaid as well as private insurance, IQVIA estimated Zolgensma sales totaled $434 million last year.

It has worked well for many. Novartis' data presented in March shows that depending on the timing of treatment, most patients have gone on to swallow, breathe, or even walk independently, said Sitra Tauscher-Wisniewski, vice president at Novartis Gene Therapies. Some are able to run and climb.

Three of six families interviewed by Reuters whose children received Zolgensma said they were progressing as well as hoped; Ben's was the only one to turn to another treatment.

But Novartis' data also shows almost one-third of children in an ongoing study went on to be given other drugs.

If gene therapies do fall short, it becomes harder to justify prices that researchers have argued are already poor value.

"The perception that Zolgensma is going to be a complete cure ... is not coming to fruition from the data we have seen over the last four years," said Dr. Roger Hajjar, director of the Mass General Brigham Gene & Cell Therapy Institute.

"Following the injection of Zolgensma you are basically going from a deadly disease to a more chronic disease state," he said, adding that many patients would not survive without this treatment.

Novartis said Zolgensma has been "transformative," but each child's journey is unique and outcomes can vary. It says there is no evidence additional therapies can help. Notably, it has dropped the term "potentially curative," common in its analyst calls in 2018 and 2019, from descriptions of Zolgensma, instead calling it a "one-time treatment."

"People still have misconceptions about Zolgensma," Kutschke told Reuters. "It is a treatment, not a cure."

Novartis CEO Vasant Narasimhan spelled out the pricing argument in a 2018 call: "Payers appreciate that when you deliver a potentially curative therapy that takes cost out of their healthcare system and enables people to live, hopefully, a more normal life, they are willing to pay and they see the value," he said.

The multimillion-dollar pricing strategy is not unique. More recently, the first hemophilia gene therapy approved by the U.S. Food and Drug Administration was priced by CSL Behring at $3.5 million; 26 more gene therapies are in late-stage development, according to IQVIA.

Drugmakers say the long-term benefits are worth it. CSL said it is confident that its treatment can generate cost savings and has offered partial refunds if patients need to resume injections of blood-clotting proteins in the first four years after the therapy.

Novartis offers payment by installments although it said no one in the U.S. has taken them. It also has plans linking payment to how well a patient responds, but said U.S. refunds under them have been rare.

The economics of treating the relatively small number of patients make a high price tag imperative for manufacturers.

"One of the arguments is you're saving all of these millions of dollars down the line," said Stacie Dusetzina, professor of health policy at Nashville's Vanderbilt University School of Medicine.

"But if you found out later on that oh, actually you have to get most of those other treatments, I think the question is, 'was that price really a fair price to begin with?'"

The Institute for Clinical and Economic Review (ICER), a drug pricing research group, has said Zolgensma's maximum price should be $900,000, less than half its current cost.

The two other available SMA treatments are also not cheap.

Biogen's Spinraza, injected into the spine, has a U.S. list price of $800,000 in the first year followed by maintenance doses at $400,000 a year. Roche's Evrysdi, an oral solution, costs $100,000 to $340,000 per year depending on how much the patient weighs.

Both Biogen and Roche said studies of their drugs as treatments for patients who do not respond sufficiently to Zolgensma have been encouraging so far. Biogen's Chief Medical Officer Maha Radhakrishnan said a second drug could still add value given the high cost of caring for severely disabled patients.

The fact that some children need treatment with other expensive drugs after Zolgensma shows that the gene therapy represents "poor value," said Steven Pearson, ICER president.

It's a headache for health insurers, who usually cover the full cost of Zolgensma upfront: It's not yet clear how to manage a payment plan over time, and that would require more real-world data on how patients respond, said Sree Chaguturu, chief medical officer at CVS Health, which owns health insurer Aetna.

"How long do you actually need to monitor and track those patients?" he said, noting that Americans commonly move between different insurance plans.

Many parents of children with SMA say Zolgensma is worth it.

Amanda Cook, a 32-year-old bookkeeper in Lebanon, Virginia, already knew when she gave birth to her son Weston in 2021 that he was diagnosed with SMA: It had led to the death of his older brother, Jackson, at 7 months.

Weston had a flaw in the gene that produces a protein critical to the function of cells that move muscles. The lack of that protein, known as SMN (survival motor neuron), results in SMA, which affects fewer than 300 babies a year in the United States.

But he had Zolgensma at 11 days old and "is ahead of everything, honestly," Cook said.

All therapies have been stopped and he is "doing amazing as a true 2-year old."

The treatment works best when infused as soon as possible after birth, since some patients can begin to suffer from degeneration before birth, and Ben Kutschke was a relative latecomer.

His parents saw him having problems when he was about a month old. Formula spilled from the corners of his mouth. He couldn't hold a pacifier; his doctor suggested they try different sizes but it became clear his condition was serious when Elizabeth heard liquid slipping down his windpipe.

Ben was found, like about 60% of cases, to have the most severe form of SMA which often leads to paralysis and death before age two.

"The earlier and better preserved the nerves are, the more likely you are to have success," said Dr. Jerry Mendell, director of the gene therapy center at Nationwide Children's Hospital in Columbus, Ohio and principal investigator in the Zolgensma trials.

Zolgensma is becoming available to younger babies across the U.S.: Most states screen newborns for it, and health insurers typically cover Zolgensma for children with the most severe form.

But even after diagnosis, not all children are immediate candidates.

Zolgensma delivers the gene through a modified virus. Ben had already been exposed to the virus in nature and had developed antibodies, which would neutralize the replacement genes in his body.

Dr. Russell Butterfield, pediatric neurologist, University of Utah Health and Intermountain Primary Children's Hospital, said it's "not all that rare" for babies to have antibodies to the vector. He put the incidence at 15% to 20%.

Babies can also be ineligible due to issues such as liver problems, which Zolgensma's label warns it can cause. Last year, two children in Russia and Kazakhstan died of acute liver failure several weeks after receiving it.

Ben was originally treated with Evrysdi, which aims to boost SMN levels directly. It took months for his antibody levels to fall to the point he could start Zolgensma.

"When they did say the antibodies were down, oh I cried on that phone call," Kutschke said.

When he finally received Zolgensma, Ben's hand movements became smoother and he was able to lift his head, "but then after that big rocket forward in progress, it seemed to just stall out," his mother said.

The body develops antibodies to the virus used, so each therapy using it can only be administered once. Doctors recommended a third treatment, Biogen's Spinraza.

The most recent Novartis data, updated in March, shows this was not unusual. Its study found that 24 of 81 children given Zolgensma as of May 2022 had been subsequently treated with other SMA drugs. Novartis declined to provide details on them.

"There are a whole lot of things that happen in real life circumstances" to affect the utility of treatment with Zolgensma, Mendell said. For less responsive patients, Mendell said it is reasonable to use other treatments.

But it can be difficult to get insurance coverage after Zolgensma. Some health plans explicitly exclude add-on therapies if a child received it, according to policies posted on their websites.

Kutschke said her insurance company UMR, part of UnitedHealth Group, balked at reimbursing another drug. A spokesperson for UnitedHealth declined to comment.

Kutschke said UMR reasoned that Ben "should be fine" after getting Zolgensma. "Which is really heartbreaking, because we thought he would be too."

It took nearly six months of paperwork and appeals supported by Ben's medical team before payment for Spinraza was approved.

Since then Ben, who now uses a wheelchair at age 2, began talking a lot more. He was suddenly able to move his lower body, his mother said.

Additional reporting by Eric Cox in Chicago; Edited by Caroline Humer and Sara Ledwith

Our Standards: The Thomson Reuters Trust Principles.


Automated Data Mapping Tools: 5 Red Flags to Avoid – JD Supra

Data subject access requests (DSARs), records of processing activities (RoPAs), vendor risk management, and a dozen other data privacy compliance requirements: all of them depend upon, or are significantly facilitated by, a map of the personal information your organization processes.

But there's no hard-and-fast requirement for a data map in the GDPR, CPRA, or any other data privacy regulation. As a result, many privacy professionals don't think to investigate data mapping until they start diving into the day-to-day work of their privacy program. After weeks of interviews, dozens of emails, and a labyrinth of spreadsheets, they realize that they need an automated tool.

But as is always the case, not every tool is created equal. Some are inefficient; some substitute the work of manual data mapping with more work of a different kind; some just create yet more work for a privacy professional.

To help privacy professionals spot tools that are more trouble than they're worth in advance, we've identified five red flags to watch out for in an automated data mapping tool.

Data mapping isn't unique to data privacy. Knowing where organizational data lives and what types of data are available is important for a wide variety of projects. That might include:

There's a wide variety of tasks that a data scientist can accomplish, but usually the business wants them to conduct analyses that translate directly to dollars and cents. When privacy professionals need to rely on data scientists to map the organization's data for compliance purposes, they'll often find that compliance tasks are de-prioritized in favor of revenue generation.

Making a persuasive business case for your privacy program can mitigate this to a degree, but the reality is that data scientists are always going to be an in-demand resource at any organization. If your automated data mapping tool is owned and operated by the data science function, your privacy program will always be steps behind, and your organizational compliance posture will never be where you want it to be.

Software can do a lot of things, but magic isn't one of them. Invariably, an automated data mapping tool will run into edge cases, exceptions, and instances where manual effort is required. Consider how you'll map data from:

In 1955, psychologists Joseph Luft and Harrington Ingham coined the term "unknown unknowns"; that is to say, issues that you aren't aware of and which you lack insight into. Unknown unknowns always appear, and the hallmark of a good tool is being prepared to handle them.

For the unknown unknown stores of personal data at your organization, it's essential that your tool provides a way to facilitate discovery and streamline manual mapping efforts.

When automated data mapping tools make no mention of how they facilitate necessary manual work, they also tend to have a very narrow definition of automation and a very narrow scope. For example, an allegedly automated data mapping solution might automate just the discovery of personal data stores and not the metadata labeling and tagging that makes downstream compliance activities possible.

Whether you use an in-house automated data mapping tool or a third-party tool, a common issue that privacy professionals run into is being inundated by data stores that need to be investigated. Because data privacy compliance is an ongoing process, new data stores will be added to your data map all the time. Not all of these data stores pose the same level of risk. Some might not be involved in downstream data transfers, for example; they might not store sensitive data; or they might not store large volumes of data.

Some automated data mapping tools present these data stores as equally important. That means you'll have to spend time manually investigating low-risk data stores while stores that actually pose a high risk go unaddressed.

But in reality, it isn't too much to ask for an automated data mapping tool to estimate the level of risk posed by one data store or another. It's possible to assess the number of exports to vendors, the number of connected systems, the number and types of data fields stored, the number of identities handled, and so on to estimate high-risk versus low-risk data stores.
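The signals listed above can be combined into a rough triage score. The sketch below is a minimal illustration; the field names, weights, and example stores are invented for this example and do not reflect any particular vendor's tool.

```python
# Hypothetical risk-scoring sketch for discovered data stores.
# Field names and weights are illustrative assumptions only.

def risk_score(store: dict) -> float:
    """Combine simple signals into a rough high/low-risk estimate."""
    score = 0.0
    score += 2.0 * store.get("vendor_exports", 0)      # downstream transfers
    score += 1.0 * store.get("connected_systems", 0)   # integration surface
    score += 3.0 * store.get("sensitive_fields", 0)    # special-category data
    score += store.get("identities", 0) / 10_000       # volume of people affected
    return score

stores = [
    {"name": "marketing-crm", "vendor_exports": 4, "connected_systems": 6,
     "sensitive_fields": 2, "identities": 250_000},
    {"name": "office-wiki", "vendor_exports": 0, "connected_systems": 1,
     "sensitive_fields": 0, "identities": 300},
]

# Triage: investigate the highest-scoring stores first.
for store in sorted(stores, key=risk_score, reverse=True):
    print(store["name"], round(risk_score(store), 1))
```

Weighting vendor exports and sensitive fields more heavily reflects the intuition above: downstream transfers and special-category data drive most of the compliance risk.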

Since your organization's data landscape is perpetually changing, you'll need to use your automated data mapping tool to scan for data stores on a regular basis. When you do, you won't want to have to wade through a backlog of data stores you've already investigated and evaluated as being irrelevant.

Not everything that's capable of holding personal information will actually do so. Or sometimes you'll find data stores that require no further action. The right tool will provide quality-of-life capabilities that allow you to flag certain data stores as irrelevant, so you don't waste time re-reviewing something that doesn't affect compliance.

Non-privacy-focused data mapping tools are often guilty of this, but even some tools meant strictly for privacy professionals suffer the same flaw: They don't make it easy to actually do anything with your data map.

There isn't a law that specifically says you need to have a data map for your organization. However, a myriad of regulatory requirements depend upon, or are made significantly less tedious by, a data map, such as:

That's why the best data mapping tools for privacy professionals are integrated into an overall compliance platform.

Take Osano for example. Privacy professionals who use Osano as their automated data mapping tool can easily use discovered data for DSARs, to populate their RoPAs, and to quickly filter and search through data stores and associated metadata to identify redundancies, unneeded data, and data stores that are potentially responsive to a DPIA.

In fact, Osano passes all of the tests we described in this article; it:


Automated Data Mapping Tools: 5 Red Flags to Avoid - JD Supra

The first adult learners of the Saturdays in the D summer program … – University of Michigan

By: Greta Guest | Source: Michigan News | Photos by: Scott Soderberg, Michigan Photography

LaCresha Goss celebrates earning her certificate in writing and editing during the Saturdays in the D program this summer at the U-M Detroit Center.


Switching gears after losing his job with a local auto supplier, Raphael French thought a Python class offered through Saturdays in the D could be a great start.

"The program spoke to me because I wanted to get into computers. I figured I'd have to learn the language," French said. "I wanted to get back to something someone couldn't take away from me, and that's education."

He and other students who participated in the six-week pilot Saturdays in the D Summer Camp and Adult Skills Enrichment Experience graduated Saturday with a course certificate. Of the 109 who registered, 38 made it to the finish line.

The city of Detroit, Detroit Public Schools Community District, University of Michigan Center for Academic Innovation, Ally Financial and the Song Foundation partnered to offer the program this summer after a three-year COVID delay.

Melia Howard, city of Detroit director of community relations, at the Saturdays in the D graduation.

"You all are what we are calling our pioneer learners," said Melia Howard, director of community relations for the city of Detroit, noting that the program was first focused on children's extracurricular activities and tutoring. "We transitioned this year to include adults so families can have a shared experience in learning coding and other programs."

While Detroit youth headed to the University of Michigan campus in Ann Arbor for a variety of STEM camps, for the first time adult residents had the opportunity to develop new skills with free professional development classes offered at the U-M Detroit Center on Woodward Avenue.

The classes offered adults a choice of four topics: Getting Started with Python; Writing and Editing; Science of Success; and Emotional Intelligence.

Enhancing skills for new directions

LaCresha Goss started out with the Python program with her daughter Selena, a 2013 U-M graduate, but switched to the writing course. Both women want to enhance their careers.

LaCresha, who works at a Veterans Administration call center, said she wants to be more of an advocate for the veterans. "I want to use my words better. That's what I hope this program will do for me."

LaCresha Goss and her daughter Selena Goss at the Saturdays in the D graduation ceremony at the U-M Detroit Center.

And Selena wants to go from her work in food access to policy. "I want to be in rooms where policies are made for people who look like me."

The adult learners also were able to access:

Workshops on budgeting, credit and home ownership; a panel about trends in tech jobs called TechnicALLY Speaking; plus peer mentors from Ally Financial.
Admissions counseling from the U-M Admissions Office for learners interested in furthering their education.
Detroit at Work program staff to learn about top fields in need of workers.
Free laptop loans.

"Because it's all about helping communities build new skills and explore career paths, it's no surprise Saturdays in the D has been so popular over the years," said Ali Summerville, business administration executive and head of Corporate Citizenship at Ally. "Supporting our hometown communities is a top priority, and we're confident the return of Saturdays in the D will continue to inspire a brighter future as a catalyst for economic mobility."

Raphael French graduates from the Saturdays in the D program during a ceremony at the U-M Detroit Center.

French met with an admissions officer to talk about options. He's starting at Wayne County Community College in the fall to study computer information systems. "But the ultimate goal is to transfer to U-M," he said.

"When I go back into the workforce, I want to have an education," he said. "The prestige behind U-M made me want to join the Saturdays in the D program. To me, it's about the access. Without it, I wouldn't have met all these people. Going to Michigan is an ultimate dream of mine."

French said he was very inspired by his program instructor, Tamara Qawasmeh. She was a data curator at the U-M Institute for Social Research when she learned Python. Then she went for her master's degree at U-M in applied data science.

Raphael French receives his certificate for the Python course from instructor Tamara Qawasmeh.

"Honestly, Python changed my life. My coding brain got turned on," Qawasmeh said. "But you can't learn coding without help. Having a support system is the only way I was able to learn Python."

Creating the support systems

The online courses were originally created by U-M faculty in collaboration with the Center for Academic Innovation to make the content work for a global online audience. Those online courses were then selected to be featured in the Saturdays in the D program.

The next step was to work with the U-M Center for Academic Innovation content strategy team and the city of Detroit team to perfect the offerings for the in-person approach to learning. Course facilitators were able to use an adult learning framework and lesson plan templates designed by the Center for Academic Innovation as the basis for their weekly sessions, but were given the latitude to personalize them for each group.

"We are proud to offer Michigan Online's workforce development and skill-building opportunities to learners in the city of Detroit. Through the enthusiastic participation of learners, we have built on our understanding of what learning opportunities are most relevant to these adult learners, and how to best support them by blending facilitated instruction in online and in-person environments. We are using this information to inform future programs," said Sarah Dysart, senior director of online learning at the Center for Academic Innovation.

The Ally Financial team includes Bhusan Satapathy, director of Data & Analytics; Dan Lemont, executive director of Tech Strategy; Brenda Rios, public relations; and Bill Barbour, manager of Fraud Technology.

While these courses are offered through the university's online education platform, Michigan Online, they were redesigned as a hybrid (online + in-person) experience and focused on how people from various educational and socio-economic backgrounds learn best.

"We knew the program participants were adult learners, so we wanted to learn more about them. That's why we collected additional data on their backgrounds, aspirations, and needs to inform design decisions and how the course facilitators could best employ the principles of adult and inclusive learning," said Ahmed Lachheb, senior learning experience designer at the Center for Academic Innovation.

Several faculty members who created the original online courses including Charles Severance (Python), Paula Caproni (Science of Success), Jeffrey Sanchez-Burks and Cheri Alexander (Emotional Intelligence) and Patrick Barry (Writing and Editing) have also attended some of the in-person sessions.

Lauren Atkins Budde, director of academic content strategy for the U-M Center for Academic Innovation.

"At the end of the day, this program is about communities that value education. It took a lot of support and passion for lifelong learning from the city and several units and faculty at U-M to pull this together," said Lauren Atkins Budde, director of academic content strategy at the U-M Center for Academic Innovation. "And through the past two months we've seen a group of curious strangers become a dedicated, inspiring community of learners invested in themselves and each other. It's pretty magical."



How AI is transforming scientific research, with Rebecca Willett (Ep … – UChicago News

One of the first areas to be affected by AI has been science. Researchers have already started to integrate these new technologies into their work. But what does it mean to let these systems into our labs? Can they be trusted? And, if so, how radically can they push science forward?

Prof. Rebecca Willett is the faculty director of AI at the University of Chicago Data Science Institute. She is at the forefront of thinking through how AI will change science, and in this episode, she examines the opportunities and the dangers of this rapidly emerging technology.

Subscribe to Big Brains on Apple Podcasts, Stitcher and Spotify.

(Episode published August 10, 2023)


Paul Rand: Big Brains is supported by the University of Chicago Graham School. Are you a lifelong learner with an insatiable curiosity? Join us at Graham and access more than 50 open enrollment courses every quarter in literature, history, religion, science, and more. We open the doors of UChicago to learners everywhere. Expand your mind and advance your leadership. Online and in-person offerings are available. Learn more at graham.uchicago.edu/bigbrains.

Hey ChatGPT.

ChatGPT: Hello Paul. How can I assist you today?

Paul Rand: Tell me everything you know about Professor Rebecca Willett.

ChatGPT: As of my last update in September 2021, Rebecca Willett was a professor of statistics and computer science at the University of Chicago.

Paul Rand: At this point, many of you have probably had an experience just like this.

ChatGPT: Professor Willett's research has included significant contributions to the theory and methods of machine learning.

Paul Rand: Instead of simply going online when you need to research something (for instance, a guest coming on your podcast), you go to ChatGPT.

ChatGPT: She has also worked on applications of these techniques to a variety of fields including healthcare, astronomy and social science.

Paul Rand: ChatGPT isn't the first AI in our world.

Tape: ChatGPT, the revolutionary new language model developed by OpenAI with the ability to generate human-like text and hold natural conversations.

Paul Rand: But it did prove that these machine learning systems are capable of things that many people thought were impossible.

Tape: It impersonates what it's read on the internet. Type in a request and it can write legal documents, software, even school essays. People are predicting it will wipe out whole industries.

Tape: Artificial intelligence tools like ChatGPT have really, they've seemingly changed the world overnight, and many are racing right now to find ways to harness what seems like incredible power.

Paul Rand: ChatGPT, are you going to change the world?

ChatGPT: As an AI language model, I don't have personal agency or the ability to directly change the world. It is ultimately up to humans to utilize technology and knowledge to shape the world for the better.

Paul Rand: One of the ways we shape the world is with what we focus on in this podcast: research. And AI could give us the power to completely revolutionize the way that we do science. But as our ChatGPT friend said, it's up to us to use it to make things better, even though it has the power to make them worse.

Rebecca Willett: I think there's an enormous amount of hope that it's really just going to fundamentally change the way we think of the scientific method.

Paul Rand: That of course, is University of Chicago professor Rebecca Willett.

Rebecca Willett: Just as when computers became widely accessible, it changed the way that we approached science and what we could do with scientific data. I think AI is similarly going to have just an enormous impact on this entire process.

Paul Rand: There isn't a single thing that the professors and researchers on our show do that couldn't or won't be changed by AI.

Rebecca Willett: The way that we formulate hypotheses, design experiments, collect data, and then analyze that data and then kind of close the loop by using that to help us develop new hypotheses.

Paul Rand: Willett is the faculty director of AI at the Data Science Institute, where she is at the forefront of thinking through how AI will change science, both the opportunities and the dangers.

Rebecca Willett: And so the hope is that with these more complicated non-linear models that are being used in AI right now, things like neural networks, we can take large amounts of data and figure out patterns that as humans we didn't recognize or even expect.

Paul Rand: There is no one better to help us think through how these systems will fundamentally change how research happens.

Rebecca Willett: And we've seen this sort of step up in terms of the capability of these systems, and I think this has really just sparked the imagination of a lot of people. With these kinds of capabilities, what could we do now that we hadn't even imagined trying to do a few years ago?

Paul Rand: Welcome to Big Brains, where we translate the biggest ideas and complex discoveries into digestible brain food. Big Brains, little bites from the University of Chicago Podcast Network. I'm your host, Paul Rand. On today's episode: how AI will change science and research. ChatGPT, if you were going to interview Rebecca Willett for a podcast about how AI will change science and research, what would you ask?

ChatGPT: That sounds like a fascinating podcast episode. Here are some potential questions you might consider. One, what are some of the most exciting opportunities that AI offers for advancing scientific research?

Rebecca Willett: So there's a lot of work in trying to use AI to accelerate the development of pharmaceuticals and drugs. I am especially excited about this because, as we know, there are diseases that predominantly impact underserved communities, that are often under-prioritized for this kind of research or under-resourced. And so if we can use AI to accelerate this process or reduce the cost of this discovery process, hopefully we'll see a real leap forward in the treatment of disease worldwide.

Another thing that I think we will see people doing is using AI to design new materials, especially materials that are more sustainable and perhaps more biodegradable or better for the environment. Using AI to design things like microbial communities that can help break down plastics or remove nitrates from water. It could be really useful for developing sustainable climate policies. So not only do we want to predict what the climate might look like under different scenarios, but we'd like to have a better sense of the uncertainties associated with those predictions and to design better economic policies, better tax strategies, better incentive programs.

If we only have forecasting systems that can run on supercomputers, then our ability to do that is somewhat limited. But with AI systems, I think we'll be able to do this much more effectively and quickly and reliably. And so these are just a few of the things off the top of my head, and this is just in the basic sciences. If we expand our scope to also think about health sciences or healthcare, there's just a lot of potential there as well, in terms of improving our ability to analyze lab tests or medical imaging data, our ability to understand a patient's entire case history or even better evaluate how they will respond to different kinds of treatments.

Paul Rand: These are just a few of the incredible ways AI could change science. But what do they look like in practice? There are some basic steps of the scientific process (hypothesis generation, experiment design, data collection) that are going to be revolutionized by AI. But we'll start with Willett's specialty, data analysis.

Rebecca Willett: One kind of first-pass thing that's going to happen is that people are going to start using AI to analyze data being collected within a scientific context. So many of us have read, for instance, about the James Webb Space Telescope.

Paul Rand: Right, right.

Tape: NASA's James Webb Space Telescope, the largest and most powerful of its kind, launched last Christmas and released its first image in July, the deepest, sharpest view we've ever seen of the universe. Since then it has captured faraway star nurseries, cosmic cliffs and galactic clusters. Anyone can see that the images carry breathtaking beauty and astonishing scale, but what do they actually tell us about our cosmos?

Rebecca Willett: This instrument and many instruments like it are collecting just huge volumes of data that can't possibly be looked at by a human, not all of it. And so the hope is that by using these AI tools, we're going to see patterns that might escape a human, or be able to see phenomena or anomalies that kind of countermand our current understanding of the science and lead us to asking new questions that we hadn't thought about before, or questioning where our existing models are starting to break down. And so using AI to just analyze the raw data is the bare minimum of what we're going to be seeing a lot of in the future.
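As a toy illustration of flagging anomalies in instrument data, the sketch below applies a simple z-score rule; real pipelines use far richer models, and the readings here are made up. (The threshold is 2.5 rather than the conventional 3 because, with only ten points, a single outlier can never exceed three population standard deviations.)

```python
import statistics

def find_anomalies(values, threshold=2.5):
    """Return values more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > threshold * sd]

# Made-up instrument readings with one clear outlier.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 25.7, 10.0, 9.7, 10.3]
print(find_anomalies(readings))  # → [25.7]
```

The same rule scales to millions of readings, which is the point Willett makes: the machine screens everything so that humans only examine the handful of cases that stand out.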

Paul Rand: This raw power to analyze massive sets of data could solve a problem that's plagued science forever. Many times, whatever was being studied led to a negative result. For example, we thought these two compounds when mixed would create a new malaria drug, but they didn't. And because it's not a positive result, it would get discarded.

Rebecca Willett: Yeah, I think this is a common concern in the sciences. I think people refer to it as the file drawer effect. Right? You get a negative result, you put it in your filing cabinet and forget about it.

Paul Rand: Yes, yes.

Rebecca Willett: That's just sort of the nature of the field. If I have a positive result, then it'll generally get more attention. And publishers are most interested in publishing those kinds of results.

Paul Rand: But that doesn't mean the result is useless. We still learn something. As the famous saying goes, we're just discovering a thousand ways not to make a light bulb.

Rebecca Willett: And I think perhaps AI will change some of these trends. And I know that there are ongoing efforts with using things like large language models to analyze the scientific literature and to cross-reference different papers that are being published in different journals by different groups around the world, in order to kind of extract higher level themes or patterns.

Paul Rand: Fascinating.

Rebecca Willett: And I think that's a setting where these negative results could be enormously impactful and help with the development of those models. And so it's possible that this kind of file drawer effect that we've had in the sciences for decades, we might just change the way we think about it with the development of these AI tools for trying to extract information from the literature. Maybe we'll see an added value to that that was a little harder to harness in the past.

Paul Rand: But there is a concern when it comes to using AI to analyze data. The founders of ChatGPT have already admitted they're not quite sure how their AI comes to any individual result. In the context of an experiment, what if an AI analyzes the data incorrectly? If half the time AI models make predictions that contradict established scientific knowledge but turn out to be correct, how will we know when it's right or when it's wrong, especially if we don't understand how it works?

Rebecca Willett: Real science is about more than detecting patterns. It's about really understanding what the underlying mechanisms are. It's just much more than making raw predictions. And it's not clear to what extent AI tools are really reflecting understanding, as opposed to having recognized different patterns. So let's just take ChatGPT as an example, because I think a lot of people listening have maybe played around with it a little bit. And when you do, it can almost feel like you're interacting with a human. It produces very realistic text. But under the hood, what it's doing is, on the most basic level, very simple. It's saying, "I'm going to build a model of a probability distribution that's going to say: what is the most likely next word that you're going to say, given the last 400 words that you say?"

Paul Rand: Yep.

Rebecca Willett: And then when I want to generate some text, I just start drawing words from this probability distribution. And so of course, building this model is not trivial, but at the end of the day, all it's doing is generating somewhat random sequences of words from this distribution. That's a far cry from understanding what the language is telling us or actually being sentient, for instance.
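The mechanism Willett describes (draw the next word at random from a distribution conditioned on recent context) can be sketched with a toy bigram model. A real system conditions on hundreds of words with learned probabilities; the vocabulary and probabilities below are invented for illustration.

```python
import random

# Toy "language model": for each word, a distribution over possible next words.
bigram_model = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 1.0},
    "dog": {"ran": 1.0},
    "sat": {"quietly": 1.0},
    "ran": {"away": 1.0},
}

def next_word(context: str) -> str:
    """Sample the next word from the distribution conditioned on the context."""
    words, probs = zip(*bigram_model[context].items())
    return random.choices(words, weights=probs)[0]

text = ["the"]
for _ in range(3):
    text.append(next_word(text[-1]))
print(" ".join(text))  # e.g. "the cat sat quietly"
```

Nothing in this loop "knows" what a cat is; it only follows the probabilities, which is exactly the gap between generating plausible text and understanding it.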

And I think it's the same with science. Right? I think this could be an enormously useful tool, but that's a far cry from it really understanding science. And I think humans are just going to be an essential part of this process. If you're trying to use something like ChatGPT for science and having it write a scientific paper for you, you're going to be in trouble. It's definitely going to be making stuff up. Like I said, it's drawing words at random from a very sophisticated probability distribution, but it doesn't actually know anything. And the more text you have it generate, the more likely it is that it's going to be inconsistent with itself. I have two feelings about this. On one hand, people already make mistakes in science, innocent mistakes. This is why we form a scientific community. This is why all science isn't done by a handful of Nobel Prize winners.

Paul Rand: Right.

Rebecca Willett: We have thousands of people all trying to examine each other's work, find where the potential holes might be, identify real discoveries that change the way we think. And that community is going to play a critical role in analyzing ideas coming out of an AI model, evaluating whether they make any sense at all, whether it's a fresh take that nobody thought of, or whether it's just complete BS. Ultimately, just that human in the loop is essential, people with rigorous scientific training who can evaluate these systems. Having peer review determine what's ready for publication versus what's relatively more or less made up.

Paul Rand: One of the other areas, at least as I've read about AI and the sciences, one of the ones that gets talked about is this idea of hypothesis generation. And I wonder if you can tell us what that is and why that might be particularly compelling.

Rebecca Willett: We're starting to also see people thinking about using AI for things like even deciding what data to collect in the first place, or what experiments to run. So imagine, for instance, that I wanted to design a microbial community that could help somebody with a broken gut microbiome, and I want to help fix that. So we could just sort of randomly put a bunch of probiotics in their system and hope for the best. But a lot of the current approaches can be pretty short-lived if they work at all. And so what we'd like to know is what determines what's going to make a good microbial community versus a bad one. And there are maybe trillions of possibilities. I can't just build them all and test them all. It would take too many resources.

And so what I'd like to do is to integrate AI into this process: design a small number of communities, run some experiments on them, take that data and narrow down somehow the space of the hypotheses I have about what makes a good microbial community versus a bad one, and use that model, and any kind of uncertainties associated with that model, to help design my next set of experiments, or which microbial communities I want to test next. And the hope is that by using AI in this process, we'll be able to use our money and experimental resources much more effectively than if we didn't have AI helping to suggest the next new experiments to run.
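The loop Willett outlines (run a few experiments, update a model, let the model propose the next experiments) can be sketched as a simple bandit-style search. Everything here is invented for illustration: the candidate "communities" are just integers, the hidden fitness function stands in for a wet-lab measurement, and a real system would use a proper surrogate model such as a Gaussian process.

```python
import random

random.seed(42)
candidates = list(range(100))   # stand-ins for candidate microbial communities

def run_experiment(c: int) -> float:
    """Hidden, noisy objective the experimenter can only sample, not see."""
    return -abs(c - 73) + random.gauss(0, 1)

observations = {}   # candidate -> list of measured outcomes

def ucb(c: int) -> float:
    """Mean observed outcome plus a bonus that favors rarely tested candidates."""
    trials = observations.get(c, [])
    if not trials:
        return float("inf")     # untested candidates get top priority
    return sum(trials) / len(trials) + 2.0 / len(trials) ** 0.5

# Experimental budget: 30 experiments instead of testing all 100 candidates.
for _ in range(30):
    batch = random.sample(candidates, 10)    # cheap shortlist per round
    chosen = max(batch, key=ucb)             # the model proposes the experiment
    observations.setdefault(chosen, []).append(run_experiment(chosen))

best = max(observations, key=lambda c: sum(observations[c]) / len(observations[c]))
print("most promising community:", best)
```

The point of the design is the budget line: the model concentrates a fixed number of experiments where its beliefs and uncertainties say they will be most informative, rather than spreading them uniformly over trillions of possibilities.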

Paul Rand: But if we become too reliant, is there a concern about a future where our research agendas are becoming driven by AI? Could AI actually lead to a decrease in creative ideas from the scientific community, through path dependency based on the inputs we put into the system?

Rebecca Willett: It depends on the context. So if we go back to my earlier example where I want to find the best microbial community out of trillions of possibilities, and I have a very clear notion of what makes it the best, I can measure that, I have a lot to gain here. I can reduce the amount of resources I have to spend on collecting data. But that approach is not appropriate if I'm really sort of more in an exploratory mode. So if I don't know exactly what I'm looking for, then using one of these methods might mean that I just never do an experiment on something that's really interesting, but just not exactly aligned with my overall objective. And so there's this kind of inherent trade-off between exploration and exploitation.

Paul Rand: How do you mean by that?

Rebecca Willett: Part of good science is just exploring the unknown. Part of what we try to do to make products and services available to people is exploitation, trying to exploit our known knowledge to design better systems or to guide the way we design experiments.
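The trade-off Willett describes has a textbook minimal form, the epsilon-greedy rule: with some probability, explore an untried option; otherwise, exploit the best-known one. The option names and scores below are made up for illustration.

```python
import random

def choose_experiment(known_results: dict, all_options: list, epsilon: float = 0.2):
    """With probability epsilon try something untested; otherwise rerun the best known."""
    untried = [o for o in all_options if o not in known_results]
    if untried and random.random() < epsilon:
        return random.choice(untried)                    # explore the unknown
    return max(known_results, key=known_results.get)     # exploit known knowledge

# Invented example: two protocols measured so far, two never tried.
results = {"protocol_a": 0.62, "protocol_b": 0.71}
options = ["protocol_a", "protocol_b", "protocol_c", "protocol_d"]
print(choose_experiment(results, options))
```

Setting epsilon to zero recovers pure exploitation (always the best-scoring known protocol); raising it buys more chances of stumbling onto something genuinely new, at the cost of experimental resources.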

Paul Rand: Okay.

Rebecca Willett: And so depending on the context, yeah, I think using AI for experimental design would not be the right choice. And relying overly on an AI system to make predictions without kind of a thoughtful human behind the scenes is possibly a fool's errand.

Paul Rand: And of course, as our AI co-host mentioned at the beginning, who that human is behind the scenes matters a great deal. How AI could open the ability to do science up to more people and why that may not be a good thing, after the break.

If you're getting a lot out of the important research shared on Big Brains, there's another University of Chicago Podcast Network show you should check out. It's called Entitled, and it's about human rights. Co-hosted by lawyers and UChicago Law School professors Claudia Flores and Tom Ginsburg, Entitled explores the stories around why rights matter and what's the matter with rights.


There is this concern that AI will eliminate jobs, but could it be the other way around? There have always been strong barriers to doing science, like needing a deep knowledge of fields, methods and statistics, and, let's be honest, a high level of intelligence. But could these tools open the gates wider to people who may know how to ask the right questions and explore ideas, but don't have the other skills, or the time or money to acquire those skills?

Rebecca Willett: I'm not sure about the answer.

I think there's inherent value to rigorous scientific training. So as we said before, what ChatGPT is doing is generating plausible strings of text that might in no way be true. And I think it's important for somebody to be able to recognize when a string of words is at all consistent with our understanding of science, or where it might be going awry. And with no background, I think you're just unequipped to do that. On the other hand, creativity is extremely important in science. We normally associate it more with the arts and humanities, but really thinking of creative explanations for how the world works and why is essential. And so to some extent, if these tools allow people to generate more creative ideas, if we can develop AI assistants for scientists that allow them to really harness their creativity, I think it could be exciting.

And there are a lot of people who are really thinking about leveraging or developing creative AI assistants. Another way in which AI might help democratize science is in helping us to process our training data. For instance, one big citizen science initiative that's been running for many years now is called Galaxy Zoo, where humans do a little bit of training and then they're presented with images of galaxies and asked to answer some questions about those galaxies. And what this is doing is basically producing labels for the training data that might be used to analyze just millions of images of galaxies. And I think that having high-quality training data is essential to making a lot of these AI systems work well. And so these kinds of citizen science projects provide a really cool opportunity, I think, for science enthusiasts to play an important role.

I think there is also a broader category of risks that we need to think about. For instance, if we place too much trust in these AI systems, we might think, well, we need to train fewer scientists in the United States, because the AI is going to do all this work for us. And if we overestimate the capability of those systems, that's a real risk and a real missed opportunity. We still need those human thinkers.

Paul Rand: But what if those human thinkers are bad actors? We know that news organizations and people on social media will often cite case studies they've seen online, but have done very little research into. In a future where AI can generate a thousand fake studies that look legitimate in a matter of minutes, how should the scientific community be thinking about maintaining integrity?

So if you were going to build in safeguards to protect against some of these downsides, what kind of safeguards come to mind first for you?

Rebecca Willett: Yeah, it's a good question. So first I'll just tell you some of the things that people might've read about already in the news.

Paul Rand: Okay.

Rebecca Willett: So they'll say something like, "Well, I want to know what data that system was trained on." And on one hand that sounds good. I want to know if your face recognition was only trained on white men and will probably fail on Black women. That seems like a useful thing for me to know. On the other hand, when we look at something like ChatGPT, which was trained on trillions of words that no human could possibly read all of, it's kind of vacuous. Right? Telling me that doesn't tell me anything informative about what's going on under the hood of that ChatGPT system.

Another thing people have called for is building transparent or explainable AI systems, where the AI system can explain the decision it's making to a layperson. And again, this sounds good in certain contexts. If we're using AI to decide who's going to be let out on bail before defending their case in court, it sounds good for us to be able to explain what criteria the AI system is using. On the other hand, there are other tasks that are very difficult to explain, especially to a layperson. Like, how is a CAT scan image constructed from the raw data off the scanner? So there are a variety of things like this that have been proposed that, in the right context, are important and meaningful, but in general are really insufficient.

And I hate to say this because I don't have a better solution to propose. I think these are actually open technical questions. How do we build a system that's going to allow us to somehow certify it: certify that it's not too biased against vulnerable groups, certify that it's protecting people's privacy in very general ways, certify that your autonomous vehicle is not going to kill a bicyclist? Besides just designing tests and trying things out, we don't really have a good handle on this. And it's an open question whether we can actually build hooks or inroads into these systems that will allow us to test, validate and certify them more effectively.

Another risk is science misinformation, if you will. You could imagine someone maliciously trying to generate a bunch of fake scientific articles toward some end, making it very hard for earnest scientists to figure out: What is actually known? What experiments were actually run, and what's been faked? And that's going to put a drain on the resources of the whole scientific community.

And so yeah, I think there are definitely several different risks. Some are about what we need to do as academics to make sure that people are using AI in a rigorous and ethical way, and others are about outside actors potentially doing malicious things that would have a terrible effect on us all. Right now, human oversight is just essential. Here at the University of Chicago, like most US universities, we have IRBs, institutional review boards. And before I run certain experiments, I need their approval to make sure that there's no major ethical lapse. Now, for the most part, those boards are for when I'm running experiments on humans or animals. A lot of the research that I do on AI is not covered by those sorts of human oversight boards. So yeah, there certainly are risks.

Paul Rand: Here at the University of Chicago, I'm seeing your name popping up with great frequency, on all sorts of different topics involving AI and the sciences.

Rebecca Willett: One of the great things about UChicago is that there's a huge number of interactions across different departments. And so physicists and chemists, astronomers, ecologists, computer scientists, and statisticians are constantly getting together and talking with each other, and partnering to advance the rigorous use of AI in the sciences. And I think this is especially exciting, because it's not like things are somehow pigeonholed, where one little group is thinking about AI and physics, and a totally separate group is thinking about AI and chemistry, with no meeting in between. We've really been focused on trying to think about core principles in AI that will influence many of the sciences. And we're already seeing connections across different disciplines.

Paul Rand: Can you give any examples of some of those?

Rebecca Willett: The Margot and Tom Pritzker Foundation recently supported a joint conference between the University of Chicago and Caltech, bringing in worldwide experts in AI and science across multiple disciplines for a three-day conference. And this was really an experiment. Most of the conferences in this space are much more narrowly focused on a particular scientific domain, but it turned out to be great. We had a UChicago researcher, Samantha Riesenfeld, talking about how she uses clustering to understand aspects of immune responses in tissues. The idea is, I've got a lot of different data points. For example, I've just got lots of different images of dogs. And these data points, these dog images, don't have any labels. And what I want to do is group them so that everything within a group is somehow similar, and members of different groups are dissimilar.
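The grouping idea described above is the classic clustering problem. A minimal sketch of one standard approach, Lloyd's k-means algorithm, is below; the synthetic 2-D points, the farthest-point initialization, and the parameter choices are all illustrative assumptions, not details from the conversation.

```python
import numpy as np

def kmeans(points, k=2, iters=20):
    # Farthest-point initialization: start from the first point, then
    # repeatedly add the point farthest from every chosen centroid.
    # (Good enough for well-separated toy data like this.)
    centroids = [points[0]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[dists.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assignment step: label each point with its nearest centroid.
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
# Two well-separated synthetic "populations" of 2-D feature vectors,
# standing in for whatever features describe the unlabeled images.
data = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])
labels = kmeans(data, k=2)
```

The algorithm never sees any labels; it discovers the two groups purely from the geometry of the points, which is what makes the same machinery reusable across domains as different as immunology and particle physics.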

Paul Rand: Fascinating.

Rebecca Willett: And so she was using these kinds of clustering ideas to analyze data from human tissues, to understand people's immune responses to different pathogens. And there was a physicist from MIT who was listening to this talk. And he said, "This is amazing, because it turns out I'm studying particle physics, and I'm facing exactly the same challenge, but in a totally different context." And some of the specific approaches that Samantha was using turned out to be extremely relevant to the constraints of his physics problem.

Paul Rand: My goodness.

Rebecca Willett: And people were thrilled by this. They said, "Yeah, normally I just talk to the same group of people over and over, and see the same ideas in our small, insular community. And by having this conference cross different boundaries, I saw a whole different set of methods I could use."

Paul Rand: As we step into our AI future, it can sometimes feel like we're in the famous dinner scene from Jurassic Park.

Tape: The world has just changed so radically, and we're all running to catch up.

Paul Rand: It's clear that AI could be a powerful tool that scientists could use to cure diseases, solve climate change, or even take us to outer space. But-

Speaker 9: Your scientists were so preoccupied with whether or not they could, they didn't stop to think if they should.

Paul Rand: As Professor Willett explains, there are all sorts of ways these systems could go wrong, radically wrong, if they get too far ahead of human oversight, judgment and control. And even ChatGPT agrees.

ChatGPT: AI can be a powerful tool, but it doesn't replace the need for human judgment. AI is best used in partnership with human researchers, rather than as a replacement for them.

Matt Hodapp: Big Brains is a production of the University of Chicago Podcast Network. If you like what you heard, please leave us a rating and review. The show is hosted by Paul M. Rand and produced by me, Matt Hodapp, and Lea Ceasrine. Thanks for listening.

More:

How AI is transforming scientific research, with Rebecca Willett (Ep ... - UChicago News

Data Science And The UK Insurance Industry (Video) – Insurance … – Mondaq News Alerts


In this video, Tim Rourke explains the key challenges insurers in the UK face when embedding data science into their organisations.

The proliferation of data available to insurers presents new and exciting opportunities across the value chain.

As a data-rich industry, insurance is bursting with more opportunities than most sectors to use data and analytics to improve business performance and customer experience. Many insurers have taken steps in this direction, some very successfully, some less so.

There are a number of key challenges insurers need to be aware of.


