Berliner Boersenzeitung - AI systems are already deceiving us -- and that's a problem, experts warn

AI systems are already deceiving us -- and that's a problem, experts warn
AI systems are already deceiving us -- and that's a problem, experts warn / Photo: OLIVIER MORIN - AFP/File

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in a paper published Friday in the journal Patterns.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk that AI could be used to commit fraud or tamper with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether an interaction is with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" against its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

(A.Lehmann--BBZ)