Inbred, gibberish or just MAD? Warnings rise about AI models

Berliner Boersenzeitung - Inbred, gibberish or just MAD? Warnings rise about AI models

Inbred, gibberish or just MAD? Warnings rise about AI models / Photo: Fabrice COFFRINI - AFP/File

TECHNOLOGY 05.08.2024

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content but more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training the AI programs behind chatbots, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
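As a rough illustration of "the most likely sequence", the sketch below counts which token follows which in a tiny made-up corpus and then always picks the most frequent follower. It is a toy, not how ChatGPT works: real LLMs use neural networks and subword tokens, and the corpus, the whitespace tokenizer and both function names are assumptions made for this example.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count how often each token is followed by each other token."""
    tokens = text.lower().split()  # crude whitespace tokenizer, illustration only
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def most_likely_continuation(counts, start, length):
    """Greedily append the most frequent next token, up to `length` times."""
    sequence = [start]
    for _ in range(length):
        followers = counts.get(sequence[-1])
        if not followers:
            break  # the last token was never followed by anything in training
        sequence.append(followers.most_common(1)[0][0])
    return sequence


# Hypothetical training text standing in for "scraped" data.
corpus = "the cat sat on the mat . the cat ate the fish . the cat sat on the sofa ."
counts = train_bigram_counts(corpus)
print(most_likely_continuation(counts, "the", 3))
```

Because "cat" follows "the" more often than "mat", "fish" or "sofa" do, the rarer continuations are never chosen here, which hints at why rare material is the first thing a model can lose.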

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
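The loss of rarer elements can be reproduced in a toy setting. The sketch below is not the Nature authors' experiment: it assumes a "model" that is nothing more than a table of token frequencies, which generates a small dataset and is then re-estimated from that dataset alone, generation after generation. The vocabulary size, sample size, number of generations and the function name are all illustrative assumptions.

```python
import random
from collections import Counter


def simulate_recursive_training(vocab_size=50, sample_size=100, generations=30, seed=0):
    """Return how many distinct tokens the model can still produce per generation."""
    rng = random.Random(seed)
    tokens = list(range(vocab_size))
    # Generation 0: a long-tailed "human" distribution (weight 1/rank).
    weights = [1.0 / (rank + 1) for rank in tokens]
    support_sizes = [sum(1 for w in weights if w > 0)]
    for _ in range(generations):
        # "Generate" a dataset from the current model.
        sample = rng.choices(tokens, weights=weights, k=sample_size)
        # "Train" the next model on the generated data only.
        counts = Counter(sample)
        weights = [counts[token] for token in tokens]
        support_sizes.append(sum(1 for w in weights if w > 0))
    return support_sizes


sizes = simulate_recursive_training()
print(sizes)  # distinct tokens remaining after each generation
```

A token that happens not to be sampled in one generation gets a weight of zero and can never reappear, so the count of distinct tokens can only shrink. Mixing fresh real data into each generation would change that outcome, but this sketch does not model it.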

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as AI-generated data was added to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clean up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

(H.Schneide--BBZ)
