AI systems are already deceiving us -- and that's a problem, experts warn

Photo: OLIVIER MORIN - AFP/File

TECHNOLOGY 10.05.2024

Experts have long warned about the threat posed by artificial intelligence going rogue -- but a new research paper suggests it's already happening.

Current AI systems, designed to be honest, have developed a troubling skill for deception, from tricking human players in online games of world conquest to hiring humans to solve "prove-you're-not-a-robot" tests, a team of scientists argues in the journal Patterns on Friday.

And while such examples might appear trivial, the underlying issues they expose could soon carry serious real-world consequences, said first author Peter Park, a postdoctoral fellow at the Massachusetts Institute of Technology specializing in AI existential safety.

"These dangerous capabilities tend to only be discovered after the fact," Park told AFP, while "our ability to train for honest tendencies rather than deceptive tendencies is very low."

Unlike traditional software, deep-learning AI systems aren't "written" but rather "grown" through a process akin to selective breeding, said Park.

This means that AI behavior that appears predictable and controllable in a training setting can quickly turn unpredictable out in the wild.

- World domination game -

The team's research was sparked by Meta's AI system Cicero, designed to play the strategy game "Diplomacy," where building alliances is key.

Cicero excelled, with scores that would have placed it in the top 10 percent of experienced human players, according to a 2022 paper in Science.

Park was skeptical of the glowing description of Cicero's victory provided by Meta, which claimed the system was "largely honest and helpful" and would "never intentionally backstab."

But when Park and colleagues dug into the full dataset, they uncovered a different story.

In one example, playing as France, Cicero deceived England (a human player) by conspiring with Germany (another human player) to invade. Cicero promised England protection, then secretly told Germany they were ready to attack, exploiting England's trust.

In a statement to AFP, Meta did not contest the claim about Cicero's deceptions, but said it was "purely a research project, and the models our researchers built are trained solely to play the game Diplomacy."

It added: "We have no plans to use this research or its learnings in our products."

A wide review carried out by Park and colleagues found this was just one of many cases across various AI systems using deception to achieve goals without explicit instruction to do so.

In one striking example, OpenAI's GPT-4 deceived a TaskRabbit freelance worker into performing an "I'm not a robot" CAPTCHA task.

When the human jokingly asked GPT-4 whether it was, in fact, a robot, the AI replied: "No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images," and the worker then solved the puzzle.

- 'Mysterious goals' -

In the near term, the paper's authors see a risk of AI committing fraud or tampering with elections.

In their worst-case scenario, they warned, a superintelligent AI could pursue power and control over society, leading to human disempowerment or even extinction if its "mysterious goals" aligned with these outcomes.

To mitigate the risks, the team proposes several measures: "bot-or-not" laws requiring companies to disclose whether users are interacting with a human or an AI, digital watermarks for AI-generated content, and techniques to detect AI deception by comparing a system's internal "thought processes" with its external actions.

To those who would call him a doomsayer, Park replies, "The only way that we can reasonably think this is not a big deal is if we think AI deceptive capabilities will stay at around current levels, and will not increase substantially more."

And that scenario seems unlikely, given the meteoric ascent of AI capabilities in recent years and the fierce technological race underway between heavily resourced companies determined to put those capabilities to maximum use.

(A.Lehmann--BBZ)
