Berliner Boersenzeitung - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn / Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argue in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see risks of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by checking a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

(A.Lehmann--BBZ)