Berliner Boersenzeitung - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: OLIVIER MORIN - AFP/File


Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing the systems' internal "thought processes" against their external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

(A.Lehmann--BBZ)