Berliner Boersenzeitung - Inbred, gibberish or just MAD? Warnings rise about AI models

Inbred, gibberish or just MAD? Warnings rise about AI models / Photo: Fabrice COFFRINI - AFP/File

When academic Jathan Sadowski reached for an analogy last year to describe how AI programs decay, he landed on the term "Habsburg AI".

The Habsburgs were one of Europe's most powerful royal houses, but entire sections of their family line collapsed after centuries of inbreeding.

Recent studies have shown how AI programs underpinning products like ChatGPT go through a similar collapse when they are repeatedly fed their own data.

"I think the term Habsburg AI has aged very well," Sadowski told AFP, saying his coinage had "only become more relevant for how we think about AI systems".

The ultimate concern is that AI-generated content could take over the web, which could in turn render chatbots and image generators useless and throw a trillion-dollar industry into a tailspin.

But other experts argue that the problem is overstated, or can be fixed.

And many companies are enthusiastic about using what they call synthetic data to train AI programs. This artificially generated data is used to augment or replace real-world data. It is cheaper than human-created content but more predictable.

"The open question for researchers and companies building AI systems is: how much synthetic data is too much," said Sadowski, lecturer in emerging technologies at Australia's Monash University.

- 'Mad cow disease' -

Training AI programs, known in the industry as large language models (LLMs), involves scraping vast quantities of text or images from the internet.

This information is broken into trillions of tiny machine-readable chunks, known as tokens.

When asked a question, a program like ChatGPT selects and assembles tokens in a way that its training data tells it is the most likely sequence to fit with the query.
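As a rough illustration of the "most likely sequence" idea, the Python sketch below counts which token follows which in a tiny made-up corpus and then picks the most frequent follower. It is a toy under stated assumptions: the corpus and the function names `fit_bigrams` and `most_likely_next` are hypothetical, and real LLMs use neural networks over long contexts rather than simple pair counts.

```python
from collections import Counter, defaultdict


def fit_bigrams(text):
    """Count, for every token, how often each other token follows it."""
    tokens = text.split()  # crude whitespace tokenizer (an assumption)
    following = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        following[cur][nxt] += 1
    return following


def most_likely_next(following, token):
    """Return the token seen most often after `token`, or None if unseen."""
    candidates = following.get(token)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Hypothetical training text, standing in for scraped web data.
corpus = "the cat sat on the mat and the cat slept"
model = fit_bigrams(corpus)

# "the" is followed by "cat" twice and "mat" once in the corpus.
print(most_likely_next(model, "the"))
```

Here the toy model continues "the" with "cat" simply because that pairing was most common in its training text, which is also why anything rare or absent in the training data tends not to appear in the output.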

But even the best AI tools generate falsehoods and nonsense, and critics have long expressed concern about what would happen if a model were fed its own outputs.

In late July, a paper in the journal Nature titled "AI models collapse when trained on recursively generated data" proved a lightning rod for discussion.

The authors described how models quickly discarded rarer elements in their original dataset and, as Nature reported, outputs degenerated into "gibberish".
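The loss of rarer elements can be sketched with a toy simulation. This is not the method used in the Nature paper, only a minimal illustration under stated assumptions: the "model" is just a table of token frequencies, and the vocabulary, sample sizes and function names (`train`, `generate`, `simulate`) are hypothetical. Each generation is trained only on samples produced by the previous one.

```python
import random
from collections import Counter


def train(samples):
    """'Train' a toy model: estimate each token's frequency from the samples."""
    counts = Counter(samples)
    total = len(samples)
    return {tok: n / total for tok, n in counts.items()}


def generate(model, n, rng):
    """Sample n tokens from the model's estimated distribution."""
    tokens = list(model)
    weights = [model[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def simulate(generations=30, sample_size=200, seed=0):
    """Return the number of distinct tokens each generation's model knows."""
    rng = random.Random(seed)
    # Hypothetical "human" data: a few common tokens and a long tail of rare ones.
    vocab = [f"tok{i}" for i in range(50)]
    true_weights = [1 / (i + 1) for i in range(50)]  # Zipf-like tail
    data = rng.choices(vocab, weights=true_weights, k=sample_size)

    support = []
    for _ in range(generations):
        model = train(data)                       # fit on the current data
        support.append(len(model))                # distinct tokens remembered
        data = generate(model, sample_size, rng)  # next round sees only model output
    return support


if __name__ == "__main__":
    print(simulate())
```

The list returned by `simulate()` can never increase from one generation to the next: once a rare token fails to appear in a sample, its estimated probability is zero and no later generation can produce it again. That one-way loss is the toy analogue of a model discarding the rarer parts of its original dataset.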

A week later, researchers from Rice and Stanford universities published a paper titled "Self-consuming generative models go MAD" that reached a similar conclusion.

They tested image-generating AI programs and showed that outputs became more generic and riddled with undesirable elements as they added AI-generated data to the underlying model.

They labelled model collapse "Model Autophagy Disorder" (MAD) and compared it to mad cow disease, a fatal illness caused by feeding the remnants of dead cows to other cows.

- 'Doomsday scenario' -

These researchers worry that AI-generated text, images and video are clearing the web of usable human-made data.

"One doomsday scenario is that if left uncontrolled for many generations, MAD could poison the data quality and diversity of the entire internet," one of the Rice University authors, Richard Baraniuk, said in a statement.

However, industry figures are unfazed.

Anthropic and Hugging Face, two leaders in the field that pride themselves on taking an ethical approach to the technology, both told AFP they used AI-generated data to fine-tune or filter their datasets.

Anton Lozhkov, machine learning engineer at Hugging Face, said the Nature paper gave an interesting theoretical perspective but its disaster scenario was not realistic.

"Training on multiple rounds of synthetic data is simply not done in reality," he said.

However, he said researchers were just as frustrated as everyone else with the state of the internet.

"A large part of the internet is trash," he said, adding that Hugging Face already made huge efforts to clean data -- sometimes jettisoning as much as 90 percent.

He hoped that web users would help clear up the internet by simply not engaging with generated content.

"I strongly believe that humans will see the effects and catch generated data way before models will," he said.

(H.Schneide--BBZ)