Ethics of AI: Benefits and risks of artificial intelligence – ZDNet

In 1949, at the dawn of the computer age, the French philosopher Gabriel Marcel warned of the danger of naively applying technology to solve life's problems.

Life, Marcel wrote in Being and Having, cannot be fixed the way you fix a flat tire. Any fix, any technique, is itself a product of that same problematic world, and is therefore problematic, and compromised.

Marcel's admonition is often summarized in a single memorable phrase: "Life is not a problem to be solved, but a mystery to be lived."

Despite that warning, seventy years later, artificial intelligence is the most powerful expression yet of humans' urge to solve or improve upon human life with computers.

But what are these computer systems? As Marcel would have urged, one must ask where they come from, whether they embody the very problems they would purport to solve.

Ethics in AI is essentially questioning, constantly investigating, and never taking for granted the technologies that are being rapidly imposed upon human life.

That questioning is made all the more urgent because of scale. AI systems are reaching tremendous size in terms of the compute power they require, and the data they consume. And their prevalence in society, both in the scale of their deployment and the level of responsibility they assume, dwarfs the presence of computing in the PC and Internet eras. At the same time, increasing scale means many aspects of the technology, especially in its deep learning form, escape the comprehension of even the most experienced practitioners.

Ethical concerns range from the esoteric, such as who is the author of an AI-created work of art; to the very real and very disturbing matter of surveillance in the hands of military authorities who can use the tools with impunity to capture and kill their fellow citizens.

Somewhere in the questioning is a sliver of hope that with the right guidance, AI can help solve some of the world's biggest problems. The same technology that may propel bias can reveal bias in hiring decisions. The same technology that is a power hog can potentially contribute answers to slow or even reverse global warming. The risks of AI at the present moment arguably outweigh the benefits, but the potential benefits are large and worth pursuing.

As Margaret Mitchell, formerly co-lead of Ethical AI at Google, has elegantly encapsulated, the key question is, "what could AI do to bring about a better society?"

Mitchell's question would be interesting on any given day, but it comes within a context that has added urgency to the discussion.

Mitchell's words come from a letter she wrote and posted on Google Drive following the departure of her co-lead, Timnit Gebru, in December. Gebru made clear that she was fired by Google, a claim Mitchell backs up in her letter. Jeff Dean, head of AI at Google, wrote in an internal email to staff that the company accepted the resignation of Gebru. Gebru's former colleagues offer a neologism for the matter: Gebru was "resignated" by Google.

Margaret Mitchell [right], was fired on the heels of the removal of Timnit Gebru.

I was fired by @JeffDean for my email to Brain women and Allies. My corp account has been cutoff. So I've been immediately fired 🙂

Timnit Gebru (@timnitGebru) December 3, 2020

Mitchell, who expressed outrage at how Gebru was treated by Google, was fired in February.

The departure of the top two ethics researchers at Google cast a pall over Google's corporate ethics, to say nothing of its AI scruples.

As reported by Wired's Tom Simonite last month, two academics invited to participate in a Google conference on safety in robotics in March withdrew from the conference in protest of the treatment of Gebru and Mitchell. A third academic said that his lab, which has received funding from Google, would no longer apply for money from Google, also in support of the two professors.

Google staff quit in February in protest of Gebru and Mitchell's treatment, CNN's Rachel Metz reported. And Sammy Bengio, a prominent scholar on Google's AI team who helped to recruit Gebru, resigned this month in protest over Gebru and Mitchell's treatment, Reuters has reported.

A petition on Medium signed by 2,695 Google staff members and 4,302 outside parties expresses support for Gebru and calls on the company to "strengthen its commitment to research integrity and to unequivocally commit to supporting research that honors the commitments made in Google'sAI Principles."

Gebru's situation is an example of how technology is not neutral, as the circumstances of its creation are not neutral, as MIT scholars Katlyn Turner, Danielle Wood, Catherine D'Ignazio discussed in an essay in January.

"Black women have been producing leading scholarship that challenges the dominant narratives of the AI and Tech industry: namely that technology is ahistorical, 'evolved', 'neutral' and 'rational' beyond the human quibbles of issues like gender, class, and race," the authors write.

During an online discussion of AI in December, AI Debate 2, Celeste Kidd, a professor at UC Berkeley, reflecting on what had happened to Gebru, remarked, "Right now is a terrifying time in AI."

"What Timnit experienced at Google is the norm, hearing about it is what's unusual," said Kidd.

The questioning of AI and how it is practiced, and the phenomenon of corporations snapping back in response, comes as the commercial and governmental implementation of AI make the stakes even greater.

Ethical issues take on greater resonance when AI expands to uses that are far afield of the original academic development of algorithms.

The industrialization of the technology is amplifying the everyday use of those algorithms. A report this month by Ryan Mac and colleagues at BuzzFeed found that "more than 7,000 individuals from nearly 2,000 public agencies nationwide have used technology from startup Clearview AI to search through millions of Americans' faces, looking for people, including Black Lives Matter protesters, Capitol insurrectionists, petty criminals, and their own friends and family members."

Clearview neither confirmed nor denied BuzzFeed's' findings.

New devices are being put into the world that rely on machine learning forms of AI in one way or another. For example, so-called autonomous trucking is coming to highways, where a "Level 4 ADAS" tractor trailer is supposed to be able to move at highway speed on certain designated routes without a human driver.

A company making that technology, TuSimple, of San Diego, California, is going public on Nasdaq. In its IPO prospectus, the company says it has 5,700 reservations so far in the four months since it announced availability of its autonomous driving software for the rigs. When a truck is rolling at high speed, carrying a huge load of something, making sure the AI software safely conducts the vehicle is clearly a priority for society.

TuSimple says it has almost 6,000 pre-orders for a driverless semi-truck. When a truck is rolling at high speed, carrying a huge load of something, making sure the AI software safely conducts the vehicle is clearly a priority for society.

Another area of concern is AI applied in the area of military and policing activities.

Arthur Holland Michel, author of an extensive book on military surveillance, Eyes in the Sky, has described how ImageNet has been used to enhance the U.S. military's surveillance systems. For anyone who views surveillance as a useful tool to keep people safe, that is encouraging news. For anyone worried about the issues of surveillance unchecked by any civilian oversight, it is a disturbing expansion of AI applications.

Calls are rising for mass surveillance, enabled by technology such as facial recognition, not to be used at all.

As ZDNet's Daphne Leprince-Ringuet reported last month, 51 organizations, including AlgorithmWatch and the European Digital Society, have sent a letter to the European Union urging a total ban on surveillance.

And it looks like there will be some curbs after all. After an extensive report on the risks a year ago, and a companion white paper, and solicitation of feedback from numerous "stakeholders," the European Commission this month published its proposal for "Harmonised Rules On Artificial Intelligence For AI." Among the provisos is a curtailment of law enforcement use of facial recognition in public.

"The use of 'real time' remote biometric identification systems in publicly accessible spaces for the purpose of law enforcement is also prohibited unless certain limited exceptions apply," the report states.

The backlash against surveillance keeps finding new examples to which to point. The paradigmatic example had been the monitoring of ethic Uyghurs in China's Xianxjang region. Following a February military coup in Myanmar, Human Rights Watch reports that human rights are in the balance given the surveillance system that had just been set up. That project, called Safe City, was deployed in the capital Naypidaw, in December.

As one researcher told Human Rights Watch, "Before the coup, Myanmar's government tried to justify mass surveillance technologies in the name of fighting crime, but what it is doing is empowering an abusive military junta."

Also: The US, China and the AI arms race: Cutting through the hype

The National Security Commission on AI's Final Report in March warned the U.S. is not ready for global conflict that employs AI.

As if all those developments weren't dramatic enough, AI has become an arms race, and nations have now made AI a matter of national policy to avoid what is presented as existential risk. The U.S.'s National Security Commission on AI, staffed by tech heavy hitters such as former Google CEO Eric Schmidt, Oracle CEO Safra Catz, and Amazon's incoming CEO Andy Jassy, last month issued its 756-page "final report" for what it calls the "strategy for winning the artificial intelligence era."

The authors "fear AI tools will be weapons of first resort in future conflicts," they write, noting that "state adversaries are already using AI-enabled disinformation attacks to sow division in democracies and jar our sense of reality."

The Commission's overall message is that "The U.S. government is not prepared to defend the United States in the coming artificial intelligence era." To get prepared, the White House needs to make AI a cabinet-level priority, and "establish the foundations for widespread integration of AI by 2025." That includes "building a common digital infrastructure, developing a digitally-literate workforce, and instituting more agile acquisition, budget, and oversight processes."

Why are these issues cropping up? There are issues of justice and authoritarianism that are timeless, but there are also new problems with the arrival of AI, and in particular its modern deep learning variant.

Consider the incident between Google and scholars Gebru and Mitchell. At the heart of the dispute was a research paper the two were preparing for a conference that crystallizes a questioning of the state of the art in AI.

The paper that touched off a controversy at Google: Gebru and Bender and Major and Mitchell argue that very large language models such as Google's BERT present two dangers: massive energy consumption and perpetuating biases.

The paper, coauthored by Emily Bender of the University of Washington, Gebru, Angelina McMillan-Major, also of the University of Washington, and Mitchell, titled "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" focuses on a topic within machine learning called natural language processing, or NLP.

The authors describe how language models such as GPT-3 have gotten bigger and bigger, culminating in very large "pre-trained" language models, including Google's Switch Transformer, also known as Switch-C, which appears to be the largest model published to date. Switch-C uses 1.6 trillion neural "weights," or parameters, and is trained on a corpus of 745 gigabytes of text data.

The authors identify two risk factors. One is the environmental impact of larger and larger models such as Switch-C. Those models consume massive amounts of compute, and generate increasing amounts of carbon dioxide. The second issue is the replication of biases in the generation of text strings produced by the models.

The environment issue is one of the most vivid examples of the matter of scale. As ZDNet has reported, the state of the art in NLP, and, indeed, much of deep learning, is to keep using more and more GPU chips, from Nvidia and AMD, to operate ever-larger software programs. Accuracy of these models seems to increase, generally speaking, with size.

But there is an environmental cost. Bender and team cite previous research that has shown that training a large language model, a version of Google's Transformer that is smaller than Switch-C, emitted 284 tons of carbon dioxide, which is 57 times as much CO2 as a human being is estimated to be responsible for releasing into the environment in a year.

It's ironic, the authors note, that the ever-rising cost to the environment of such huge GPU farms impacts most immediately the communities on the forefront of risk from change whose dominant languages aren't even accommodated by such language models, in particular the population of the Maldives archipelago in the Arabian Sea, whose official language is Dhivehi, a branch of the Indo-Aryan family:

Is it fair or just to ask, for example, that the residents of the Maldives (likely to be underwater by 2100) or the 800,000 people in Sudan affected by drastic floods pay the environmental price of training and deploying ever larger English LMs [language models], when similar large-scale models aren't being produced for Dhivehi or Sudanese Arabic?

The second concern has to do with the tendency of these large language models to perpetuate biases that are contained in the training set data, which are often publicly available writing that is scraped from places such as Reddit. If that text contains biases, those biases will be captured and amplified in generated output.

The fundamental problem, again, is one of scale. The training sets are so large, the issues of bias in code cannot be properly documented, nor can they be properly curated to remove bias.

"Large [language models] encode and reinforce hegemonic biases, the harms that follow are most likely to fall on marginalized populations," the authors write.

The risk of the huge cost of compute for ever-larger models, has been a topic of debate for some time now. Part of the problem is that measures of performance, including energy consumption, are often cloaked in secrecy.

Some benchmark tests in AI computing are getting a little bit smarter. MLPerf, the main measure of performance of training and inference in neural networks, has been making efforts to provide more representative measures of AI systems for particular workloads. This month, the organization overseeing the industry standard MLPerf benchmark, the MLCommons, for the first time asked vendors to list not just performance but energy consumed for those machine learning tasks.

Regardless of the data, the fact is systems are getting bigger and bigger in general. The response to the energy concern within the field has been two-fold: to build computers that are more efficient at processing the large models, and to develop algorithms that will compute deep learning in a more intelligent fashion than just throwing more computing at the problem.

Cerebras's Wafer Scale Engine is the state of the art in AI computing, the world's biggest chip, designed for the ever-increasing scale of things such as language models.

On the first score, a raft of startups have arisen to offer computers dedicate to AI that they say are much more efficient than the hundreds or thousands of GPUs from Nvidia or AMD typically required today.

They include Cerebras Systems, which has pioneered the world's largest computer chip; Graphcore, the first company to offer a dedicated AI computing system, with its own novel chip architecture; and SambaNova Systems, which has received over a billion dollars in venture capital to sell both systems but also an AI-as-a-service offering.

"These really large models take huge numbers of GPUs just to hold the data," Kunle Olukotun, Stanford University professor of computer science who is a co-founder of SambaNova, told ZDNet, referring to language models such as Google's BERT.

"Fundamentally, if you can enable someone to train these models with a much smaller system, then you can train the model with less energy, and you would democratize the ability to play with these large models," by involving more researchers, said Olukotun.

Those designing deep learning neural networks are simultaneously exploring ways the systems can be more efficient. For example, the Switch Transformer from Google, the very large language model that is referenced by Bender and team, can reach some optimal spot in its training with far fewer than its maximum 1.6 trillion parameters, author William Fedus and colleagues of Google state.

The software "is also an effective architecture at small scales as well as in regimes with thousands of cores and trillions of parameters," they write.

The key, they write, is to use a property called sparsity, which prunes which of the weights get activated for each data sample.

Scientists at Rice University and Intel propose slimming down the computing budget of large neural networks by using a hashing table that selects the neural net activations for each input, a kind of pruning of the network.

Another approach to working smarter is a technique called hashing. That approach is embodied in a project called "Slide," introduced last year by Beidi Chen of Rice University and collaborators at Intel. They use something called a hash table to identify individual neurons in a neural network that can be dispensed with, thereby reducing the overall compute budget.

Chen and team call this "selective sparsification", and they demonstrate that running a neural network can be 3.5 times faster on a 44-core CPU than on an Nvidia Tesla V100 GPU.

As long as large companies such as Google and Amazon dominate deep learning in research and production, it is possible that "bigger is better" will dominate neural networks. If smaller, less resource-rich users take up deep learning in smaller facilities, than more-efficient algorithms could gain new followers.

The second issue, AI bias, runs in a direct line from the Bender et al. paper back to a paper in 2018 that touched off the current era in AI ethics, the paper that was the shot heard 'round the world, as they say.

Buolamwini and Gebru brought international attention to the matter of bias in AI with their 2018 paper "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," which revealed that commercial facial recognition systems showed "substantial disparities in the accuracy of classifying darker females, lighter females, darker males, and lighter males in gender classification systems."

That 2018 paper, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," was also authored by Gebru, then at Microsoft, along with MIT researcher Joy Buolamwini. They demonstrated how commercially available facial recognition systems had high accuracy when dealing with images of light-skinned men, but catastrophically bad inaccuracy when dealing with images of darker-skinned women. The authors' critical question was why such inaccuracy was tolerated in commercial systems.

Buolamwini and Gebru presented their paper at the Association for Computing Machinery's Conference on Fairness, Accountability, and Transparency. That is the same conference where in February Bender and team presented the Parrot paper. (Gebru is a co-founder of the conference.)

Both Gender Shades and the Parrot paper deal with a central ethical concern in AI, the notion of bias. AI in its machine learning form makes extensive use of principles of statistics. In statistics, bias is when an estimation of something turns out not to match the true quantity of that thing.

So, for example, if a political pollster takes a poll of voters' preferences, if they only get responses from people who talk to poll takers, they may get what is called response bias, in which their estimation of the preference for a certain candidate's popularity is not an accurate reflection of preference in the broader population.

Also: AI and ethics: One-third of executives are not aware of potential AI bias

The Gender Shades paper in 2018 broke ground in showing how an algorithm, in this case facial recognition, can be extremely out of alignment with the truth, a form of bias that hits one particular sub-group of the population.

Flash forward, and the Parrot paper shows how that statistical bias has become exacerbated by scale effects in two particular ways. One way is that data sets have proliferated, and increased in scale, obscuring their composition. Such obscurity can obfuscate how the data may already be biased versus the truth.

Second, NLP programs such as GPT-3 are generative, meaning that they are flooding the world with an amazing amount of created technological artifacts such as automatically generated writing. By creating such artifacts, biases can be replicated, and amplified in the process, thereby proliferating such biases.

On the first score, the scale of data sets, scholars have argued for going beyond merely tweaking a machine learning system in order to mitigate bias, and to instead investigate the data sets used to train such models, in order to explore biases that are in the data itself.

Before she was fired from Google's Ethical AI team, Mitchell lead her team to develop a system called "Model Cards" to excavate biases hidden in data sets. Each model card would report metrics for a given neural network model, such as looking at an algorithm for automatically finding "smiling photos" and reporting its rate of false positives and other measures.

One example is an approach created by Mitchell and team at Google called model cards. As explained in the introductory paper, "Model cards for model reporting," data sets need to be regarded as infrastructure. Doing so will expose the "conditions of their creation," which is often obscured. The research suggests treating data sets as a matter of "goal-driven engineering," and asking critical questions such as whether data sets can be trusted and whether they build in biases.

Another example is a paper last year, featured in The State of AI Ethics, by Emily Denton and colleagues at Google, "Bringing the People Back In," in which they propose what they call a genealogy of data, with the goal "to investigate how and why these datasets have been created, what and whose values influence the choices of data to collect, the contextual and contingent conditions of their creation, and the emergence of current norms and standards of data practice."

Vinay Prabhu, chief scientist at UnifyID, in a talk at Stanford last year described being able to take images of people from ImageNet, feed them to a search engine, and find out who people are in the real world. It is the "susceptibility phase" of data sets, he argues, when people can be targeted by having had their images appropriated.

Scholars have already shed light on the murky circumstances of some of the most prominent data sets used in the dominant NLP models. For example, Vinay Uday Prabhu, who is chief scientist at startup UnifyID Inc., in a virtual talk at Stanford University last year examined the ImageNet data set, a collection of 15 million images that have been labeled with descriptions.

The introduction of ImageNet in 2009 arguably set in motion the deep learning epoch. There are problems, however, with ImageNet, particularly the fact that it appropriated personal photos from Flickr without consent, Prabhu explained.

Those non-consensual pictures, said Prabhu, fall into the hands of thousands of entities all over the world, and that leads to a very real personal risk, he said, what he called the "susceptibility phase," a massive invasion of privacy.

Using what's called reverse image search, via a commercial online service, Prabhu was able to take ImageNet pictures of people and "very easily figure out who they were in the real world." Companies such as Clearview, said Prabhu, are merely a symptom of that broader problem of a kind-of industrialized invasion of privacy.

An ambitious project has sought to catalog that misappropriation. Called Exposing.ai, it is the work of Adam Harvey and Jules LaPlace, and it formally debuted in January. The authors have spent years tracing how personal photos were appropriated without consent for use in machine learning training sets.

The site is a search engine where one can "check if your Flickr photos were used in dozens of the most widely used and cited public face and biometric image datasets [] to train, test, or enhance artificial intelligence surveillance technologies for use in academic, commercial, or defense related applications," as Harvey and LaPlace describe it.

Some argue the issue goes beyond simply the contents of the data to the means of its production. Amazon's Mechanical Turk service is ubiquitous as a means of employing humans to prepare vast data sets, such as by applying labels to pictures for ImageNet or to rate chat bot conversations.

An article last month by Vice's Aliide Naylor quoted Mechanical Turk workers who felt coerced in some instances to produce results in line with a predetermined objective.

The Turkopticon feedback aims to arm workers on Amazon's Mechanical Turk with honest appraisals of the work conditions of contracting for various Turk clients.

A project called Turkopticon has arisen to crowd-source reviews of the parties who contract with Mechanical Turk, to help Turk workers avoid abusive or shady clients. It is one attempt to ameliorate what many see as the troubling plight of an expanding underclass of piece workers, what Mary Gray and Siddharth Suri of Microsoft have termed "ghost work."

There are small signs the message of data set concern has gotten through to large organizations practicing deep learning. Facebook this month announced a new data set that was created not by appropriating personal images but rather by making original videos of over three thousand paid actors who gave consent to appear in the videos.

The paper by lead author Caner Hazirbas and colleagues explains that the "Casual Conversations" data set is distinguished by the fact that "age and gender annotations are provided by the subjects themselves." Skin type of each person was annotated by the authors using the so-called Fitzpatrick Scale, the same measure that Buolamwini and Gebru used in their Gender Shades paper. In fact, Hazirbas and team prominently cite Gender Shades as precedent.

Hazirbas and colleagues found that, among other things, when machine learning systems are tested against this new data set, some of the same failures crop up as identified by Buolamwini and Gebru. "We noticed an obvious algorithmic bias towards lighter skinned subjects," they write.

Read more:
Ethics of AI: Benefits and risks of artificial intelligence - ZDNet

Related Posts

Comments are closed.