A.I. Has a Measurement Problem

There's a problem with leading artificial intelligence tools like ChatGPT, Gemini and Claude: We don't really know how smart they are.

That's because, unlike companies that make cars or drugs or baby formula, A.I. companies aren't required to submit their products for testing before releasing them to the public. There's no Good Housekeeping seal for A.I. chatbots, and few independent groups are putting these tools through their paces in a rigorous way.

Instead, we're left to rely on the claims of A.I. companies, which often use vague, fuzzy phrases like "improved capabilities" to describe how their models differ from one version to the next. And while there are some standard tests given to A.I. models to assess how good they are at, say, math or logical reasoning, many experts have doubts about how reliable those tests really are.

This might sound like a petty gripe. But I've become convinced that a lack of good measurement and evaluation for A.I. systems is a major problem.

For starters, without reliable information about A.I. products, how are people supposed to know what to do with them?

I can't count the number of times I've been asked in the past year, by a friend or a colleague, which A.I. tool they should use for a certain task. Does ChatGPT or Gemini write better Python code? Is DALL-E 3 or Midjourney better at generating realistic images of people?


See the rest here:
A.I. Has a Measurement Problem - The New York Times
