Berliner Boersenzeitung - Inbred, gibberish or just MAD? Warnings rise about AI models


Photo: Fabrice COFFRINI - AFP/File


When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".


The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content but more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
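As a rough illustration of picking the "most likely sequence", the sketch below counts which word follows which in a tiny training text and returns the most frequent continuation. It is a toy under stated assumptions: real LLMs use neural networks over sub-word tokens and usually sample rather than always taking the top choice, and the names `train_bigram` and `most_likely_next` are hypothetical, invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    tokens = text.split()  # assumption: whitespace-separated words stand in for tokens
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows


def most_likely_next(follows, token):
    """Return the continuation seen most often after `token`, or None if unseen."""
    if token not in follows:
        return None
    return follows[token].most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)
print(most_likely_next(model, "the"))  # prints: cat
```

In this corpus "the" is followed by "cat" twice and "mat" once, so "cat" wins. A word that never appears with a successor in the training text returns None.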

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed on its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
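The loss of rarer elements can be illustrated with a toy resampling loop. This is not the method used in the Nature paper, only a minimal sketch under stated assumptions: the "model" is a table of word frequencies, each generation is trained solely on a finite sample drawn from the previous one, and the function names, starting frequencies and sample size are all hypothetical.

```python
import random
from collections import Counter


def next_generation(weights, sample_size, rng):
    """Sample from the current model, then refit on those samples only."""
    tokens = list(weights)
    samples = rng.choices(tokens, weights=[weights[t] for t in tokens], k=sample_size)
    counts = Counter(samples)
    # A token that was never sampled gets no entry, so it can never come back.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(weights, generations, sample_size, seed=0):
    """Return the number of surviving tokens per generation and the final model."""
    rng = random.Random(seed)
    history = [len(weights)]
    for _ in range(generations):
        weights = next_generation(weights, sample_size, rng)
        history.append(len(weights))
    return history, weights


# Hypothetical starting frequencies: one dominant token and a long tail.
initial = {"common": 0.90, "uncommon": 0.08, "rare": 0.015, "very_rare": 0.005}
history, final = simulate(initial, generations=30, sample_size=50)
print(history)
print(final)
```

Because a token that draws zero samples in one generation is absent from the next, the count of distinct tokens can only stay level or fall. Low-frequency tokens are the most likely to miss a sample, which is the sense in which rare material is discarded first in this toy setup.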

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as they added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field who pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

(H.Schneide--BBZ)