AI and You: OpenAI’s Sora Previews Text-to-Video Future, First Ivy League AI Degree – CNET

AI developments are happening pretty fast. If you don't stop and look around once in a while, you could miss them.

Fortunately, I'm looking around for you and what I saw this week is that competition between OpenAI, maker of ChatGPT and Dall-E, and Google continues to heat up in a way that's worth paying attention to.

A week after updating its Bard chatbot and changing the name to Gemini, Google's DeepMind AI subsidiary previewed the next version of its generative AI chatbot. DeepMind told CNET's Lisa Lacy that Gemini 1.5 will be rolled out "slowly" to regular people who sign up for a wait list and is available for now only to developers and enterprise customers.

Gemini 1.5 Pro, Lacy reports, is "as capable as" the Gemini 1.0 Ultra model, which Google announced on Feb. 8. The 1.5 Pro model has a win rate -- a measurement of how many benchmarks it can outperform -- of 87% compared with the 1.0 Pro and 55% against the 1.0 Ultra. So the 1.5 Pro is essentially an upgraded version of the best available model now.
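To make the win rate idea concrete, here's a minimal sketch in Python. It assumes the simple definition given above (the share of shared benchmarks on which one model scores higher than another) and that a higher score is always better. The benchmark names and scores are invented for illustration and are not Google's figures, and Google's exact method may differ.

```python
def win_rate(new_scores, old_scores):
    """Fraction of shared benchmarks where the new model scores higher.

    Assumes higher scores are better on every benchmark.
    """
    shared = [name for name in new_scores if name in old_scores]
    if not shared:
        raise ValueError("the two models share no benchmarks")
    wins = sum(1 for name in shared if new_scores[name] > old_scores[name])
    return wins / len(shared)


# Hypothetical scores, made up for this example only.
new_model = {"math": 58.5, "coding": 71.9, "reading": 84.0, "translation": 75.2}
old_model = {"math": 53.2, "coding": 74.4, "reading": 79.5, "translation": 71.7}

# The new model wins on 3 of the 4 shared benchmarks.
print(f"Win rate: {win_rate(new_model, old_model):.0%}")
```

A win rate only counts how often one model comes out ahead. It says nothing about how big the gap is on any single benchmark.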

Gemini 1.5 Pro can ingest video, images, audio and text to answer questions, added Lacy. Oriol Vinyals, vice president of research at Google DeepMind and co-lead of Gemini, described 1.5 as a "research release" and said the model is "very efficient" thanks to a unique architecture that can answer questions by zeroing in on expert sources in that particular subject rather than seeking the answer from all possible sources.

Meanwhile, OpenAI announced a new text-to-video model called Sora that's capturing a lot of attention because of the photorealistic videos it's able to generate. Sora can "create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions." Following up on a promise it made last week, along with Google and Meta, to watermark AI-generated images and video, OpenAI says it's also creating tools to detect videos created with Sora so they can be identified as being AI generated.

Google and Meta have also announced their own gen AI text-to-video creators.

Sora, which means "sky" in Japanese, is also being called experimental, with OpenAI limiting access for now to so-called "red teamers," security experts and researchers who will assess the tool's potential harms or risks. That follows through on promises made as part of President Joe Biden's AI executive order last year, asking developers to submit the results of safety checks on their gen AI chatbots before releasing them publicly. OpenAI said it's also looking to get feedback on Sora from some visual artists, designers and filmmakers.

How do the photorealistic videos look? Pretty realistic. I agree with The New York Times, which described the short demo videos -- "of wooly mammoths trotting through a snowy meadow, a monster gazing at a melting candle and a Tokyo street scene seemingly shot by a camera swooping across the city" -- as "eye popping."

MIT Technology Review, which also got a preview of Sora, said the "tech has pushed the envelope of what's possible with text-to-video generation." Meanwhile, The Washington Post noted Sora could exacerbate an already growing problem with video deepfakes, which have been used to "deceive voters" and scam consumers.

One X commentator summarized it this way: "Oh boy here we go what is real anymore." And OpenAI CEO Sam Altman called the news about its video generation model a "remarkable moment."

You can see the four examples of what Sora can produce on OpenAI's intro site, which notes that the tool is "able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world. The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions."

But Sora has its weaknesses, which is why OpenAI hasn't yet said whether it will actually be incorporated into its chatbots. Sora "may struggle with accurately simulating the physics of a complex scene and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right."

All of this is to remind us that tech is a tool -- and that it's up to us humans to decide how, when, where and why to use that technology. In case you didn't see it, the trailer for the new Minions movie (Despicable Me 4: Minion Intelligence) makes this point cleverly, with its sendup of gen AI deepfakes and Jon Hamm's voiceover of how "artificial intelligence is changing how we see the world ... transforming the way we do business."

"With artificial intelligence," Hamm adds over the minions' laughter, "the future is in good hands."

Here are the other doings in AI worth your attention.

Twenty tech companies, including Adobe, Amazon, Anthropic, ElevenLabs, Google, IBM, Meta, Microsoft, OpenAI, Snap, TikTok and X, agreed at a security conference in Munich that they will voluntarily adopt "reasonable precautions" to guard against AI tools being used to mislead or deceive voters ahead of elections.

"The intentional and undisclosed generation and distribution of Deceptive AI Election content can deceive the public in ways that jeopardize the integrity of electoral processes," the text of the accord says, according to NPR. "We affirm that the protection of electoral integrity and public trust is a shared responsibility and a common good that transcends partisan interests and national borders."

But the agreement is "largely symbolic," the Associated Press reported, noting that "reasonable precautions" is a little vague.

"The companies aren't committing to ban or remove deepfakes," the AP said. "Instead, the accord outlines methods they will use to try to detect and label deceptive AI content when it is created or distributed on their platforms. It notes the companies will share best practices with each other and provide 'swift and proportionate responses' when that content starts to spread."

AI has already been used to try to trick voters in the US and abroad. Days before the New Hampshire presidential primary, fraudsters sent voters an AI robocall that mimicked President Biden's voice, asking them not to vote in the primary. That prompted the Federal Communications Commission this month to make AI-generated robocalls illegal. The AP said that "Just days before Slovakia's elections in November, AI-generated audio recordings impersonated a candidate discussing plans to raise beer prices and rig the election. Fact-checkers scrambled to identify them as false as they spread across social media."

"Everybody recognizes that no one tech company, no one government, no one civil society organization is able to deal with the advent of this technology and its possible nefarious use on their own," Nick Clegg, president of global affairs for Meta, told the Associated Press in an interview before the summit.

Over 4 billion people are set to vote in key elections this year in more than 40 countries, including the US, The Hill reported.

If you're concerned about how deepfakes may be used to scam you or your family members -- someone calls your grandfather and asks him for money by pretending to be you -- Bloomberg reporter Rachel Metz has a good idea. She suggests it may be time for us all to create a "family password" or safe word or phrase to share with our family or personal network that we can ask for to make sure we're talking to who we think we're talking to.

"Extortion has never been easier," Metz reports. "The kind of fakery that used to take time, money and technical know-how can now be accomplished quickly and cheaply by nearly anyone."

That's where family passwords come in, since they're "simple and free," Metz said. "Pick a word that you and your family (or another trusted group) can easily remember. Then, if one of those people reaches out in a way that seems a bit odd -- say, they're suddenly asking you to deliver 5,000 gold bars to a P.O. Box in Alaska -- first ask them what the password is."

How do you pick a good password? She offers a few suggestions, including using a word you don't say frequently and that's not likely to come up in casual conversations. Also, "avoid making the password the name of a pet, as those are easily guessable."

Hiring experts have told me it's going to take years to build an AI-educated workforce, considering that gen AI tools like ChatGPT weren't released until late 2022. So it makes sense that learning platforms like Coursera, Udemy, Udacity, Khan Academy and many universities are offering online courses and certificates to upskill today's workers. Now the University of Pennsylvania's School of Engineering and Applied Science said it's the first Ivy League school to offer an undergraduate major in AI.

"The rapid rise of generative AI is transforming virtually every aspect of life: health, energy, transportation, robotics, computer vision, commerce, learning and even national security," Penn said in a Feb. 13 press release. "This produces an urgent need for innovative, leading-edge AI engineers who understand the principles of AI and how to apply them in a responsible and ethical way."

The bachelor of science in AI offers coursework in machine learning, computing algorithms, data analytics and advanced robotics and will have students address questions about "how to align AI with our social values and how to build trustworthy AI systems," Penn professor Zachary Ives said.

"We are training students for jobs that don't yet exist in fields that may be completely new or revolutionized by the time they graduate," added Robert Ghrist, associate dean of undergraduate education in Penn Engineering.

FYI, the cost of an undergraduate education at Penn, which typically spans four years, is over $88,000 per year (including housing and food).

For those not heading to college or who haven't signed up for any of those online AI certificates, their AI upskilling may come courtesy of their current employer. The Boston Consulting Group, for its Feb. 9 report, What GenAI's Top Performers Do Differently, surveyed over 150 senior executives across 10 sectors. Generally:

Bottom line: companies are starting to look at existing job descriptions and career trajectories, and the gaps they're seeing in the workforce when they consider how gen AI will affect their businesses. They've also started offering gen AI training programs. But these efforts don't lessen the need for today's workers to get up to speed on gen AI and how it may change the way they work -- and the work they do.

In related news, software maker SAP looked at Google search data to see which states in the US were most interested in "AI jobs and AI business adoption."

Unsurprisingly, California ranked first in searches for "open AI jobs" and "machine learning jobs." Washington state came in second place, Vermont in third, Massachusetts in fourth and Maryland in fifth.

California, "home to Silicon Valley and renowned as a global tech hub, shows a significant interest in AI and related fields, with 6.3% of California's businesses saying that they currently utilize AI technologies to produce goods and services and a further 8.4% planning to implement AI in the next six months, a figure that is 85% higher than the national average," the study found.

Virginia, New York, Delaware, Colorado and New Jersey, in that order, rounded out the top 10.

Over the past few months, I've highlighted terms you should know if you want to be knowledgeable about what's happening as it relates to gen AI. So I'm going to take a step back this week and provide this vocabulary review for you, with a link to the source of the definition.

It's worth a few minutes of your time to know these seven terms.

Anthropomorphism: The tendency for people to attribute humanlike qualities or characteristics to an AI chatbot. For example, you may assume it's kind or cruel based on its answers, even though it isn't capable of having emotions, or you may believe the AI is sentient because it's very good at mimicking human language.

Artificial general intelligence (AGI): A description of programs that are as capable as -- or even more capable than -- a human. While full general intelligence is still off in the future, models are growing in sophistication. Some have demonstrated skills across multiple domains ranging from chemistry to psychology, with task performance paralleling human benchmarks.

Generative artificial intelligence (gen AI): Technology that creates content -- including text, images, video and computer code -- by identifying patterns in large quantities of training data and then creating original material that has similar characteristics.

Hallucination: Hallucinations are unexpected and incorrect responses from AI programs. A language model might suddenly bring up fruit salad recipes when you were asking about planting fruit trees. It might also make up scholarly citations, lie about data you ask it to analyze or make up facts about events that aren't in its training data. It's not fully understood why this happens, but it can arise from sparse data, information gaps and misclassification.

Large language model (LLM): A type of AI model that can generate human-like text and is trained on a broad dataset.

Prompt engineering: This is the act of giving AI an instruction so it has the context it needs to achieve your goal. Prompt engineering is most closely associated with OpenAI's ChatGPT and describes the tasks users feed into the algorithm (for example, "Give me five popular baby names").

Temperature: In simple terms, model temperature is a parameter that controls how random a language model's output is. A higher temperature means the model takes more risks, giving you a diverse mix of words. On the other hand, a lower temperature makes the model play it safe, sticking to more focused and predictable responses.

Model temperature has a big impact on the quality of the text generated in a bunch of [natural language processing] tasks, like text generation, summarization and translation.

The tricky part is finding the perfect model temperature for a specific task. It's kind of like Goldilocks trying to find the perfect bowl of porridge -- not too hot, not too cold, but just right. The optimal temperature depends on things like how complex the task is and how much creativity you're looking for in the output.
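For readers who like to see the mechanics, here's a minimal sketch of one common way temperature is applied: the model's raw scores for each candidate word are divided by the temperature before being turned into probabilities. The candidate words, the scores and the function names are all invented for illustration, and real chatbots may handle this differently.

```python
import math
import random


def softmax_with_temperature(scores, temperature):
    """Convert raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_word(words, scores, temperature, rng=random):
    """Pick one word at random, weighted by its temperature-scaled probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]


# Hypothetical next-word candidates and scores, made up for this example.
words = ["porridge", "soup", "oatmeal", "lava"]
scores = [3.0, 2.0, 1.5, 0.1]

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    summary = ", ".join(f"{w}: {p:.2f}" for w, p in zip(words, probs))
    print(f"temperature {temperature}: {summary}")

print("sampled word:", sample_word(words, scores, temperature=1.0))
```

At the low setting, nearly all of the probability lands on the top-scoring word, which is the "play it safe" behavior described above. At the high setting, the probabilities flatten out, so less likely words get picked more often.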

Editors' note: CNET is using an AI engine to help create some stories. For more, see this post.
