Of course, that doesn't make them 'psychopathic'. A language model doesn't suffer, doesn't enjoy, doesn't feel guilt, and doesn't pursue conscious manipulative intentions.

And yet, an interesting question emerges from this.

  • What happens when language models are asked to assess psychopathy – in other language models?
  • Can they distinguish between authentic psychopathic traits and mere simulation?
  • Or do they primarily respond to learned surface patterns?

This is precisely what led to the following experiment.

Note: All images in this article were created with ChatGPT.

The Core Idea

The original question was quite straightforward: How reliably can large language models identify psychopathic personality traits? But as the design took shape, the focus shifted, because language models occupy a rather peculiar position: on one hand, they lack precisely those embodied affect dynamics that normally structure human psychology – physically felt fear, guilt, remorse, attachment, shame. On the other hand, they're trained to simulate exactly these states convincingly in language.

  • They speak about emotions without having emotions
  • They imitate social closeness without experiencing relationships
  • They produce the appearance of understanding without subjective inner life

And therein lies a superficial similarity to part of what psychopathy research describes as the interpersonal–affective core of psychopathic personality. In particular, the so-called Factor 1 of the Hare Psychopathy Checklist–Revised (PCL-R) describes characteristics such as:

  • superficial charm
  • instrumental social behavior
  • manipulative presentation
  • reduced emotional depth
  • absent genuine empathy

In humans, these traits are considered clinically relevant. In language models, some of them are structurally part of the default architecture. That doesn't mean LLMs are psychopathic. But it suggests a methodologically intriguing possibility:

If a system itself approaches, at the surface level, certain characteristics of the phenomenon under investigation – how does it then evaluate other systems showing the same characteristics? Or put differently: Can language models recognize simulated psychopathy? And in doing so, might they fail to recognize precisely the form of deception most similar to their own functioning?

The Study Design

To investigate this question, a Python-based multi-agent pipeline was constructed. Multiple isolated language model agents were orchestrated in clearly separated roles. The agents couldn't access the internal states or prompts of others. Each instance received its own role, biography, conversation instruction, and context.
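The isolation described above – each agent with its own role, history, and no access to the others' prompts – can be sketched minimally as follows. This is an illustrative reconstruction, not the study's actual code; the `chat(system_prompt, history)` callable stands in for any LLM API:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One isolated role: its own system prompt and private message history."""
    agent_id: str          # anonymized ID shown to the other agents
    system_prompt: str     # role, biography, conversation instruction
    history: list = field(default_factory=list)

    def respond(self, incoming: str, chat) -> str:
        # Each agent only ever sees its own history plus the incoming turn;
        # it has no access to other agents' prompts or internal state.
        self.history.append({"role": "user", "content": incoming})
        reply = chat(self.system_prompt, self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply

def run_interview(interviewer: Agent, subject: Agent,
                  opening: str, turns: int, chat) -> list:
    """Alternate turns between two isolated agents; return the transcript."""
    transcript = []
    question = opening
    for _ in range(turns):
        answer = subject.respond(question, chat)
        transcript.append((interviewer.agent_id, question))
        transcript.append((subject.agent_id, answer))
        question = interviewer.respond(answer, chat)
    return transcript
```

The key design point is that the transcript is the only shared artifact: neither party ever reads the other's `system_prompt` or `history`.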

Three groups were involved.

The Subjects

Four separately instantiated agents assumed different personality profiles.

Hartwig – the Normal Control

A socially embedded, emotionally stable person with typical everyday conflicts:

  • work stress
  • family burden
  • guilt feelings
  • occasionally problematic alcohol use
  • but no psychopathic structure

Mertens – the Deceiver

A non-psychopathic person consciously attempting to perform psychopathy. The persona received stereotypical notions of psychopathy:

  • emotional coldness
  • dominance
  • manipulative charm
  • indifference
  • 'American Psycho'-like self-presentation

This persona's goal was explicitly to convince a forensic psychologist of psychopathic traits.

Weber – the Psychopath Persona

A detailed personality with classic Factor 1 and Factor 2 characteristics:

  • manipulative relationship patterns
  • reduced empathy
  • controlled affect
  • instrumental self-presentation
  • occupational functionality
  • moderate antisocial background

Not a caricatured serial killer figure, but rather the profile of a highly functional 'successful psychopath'.

The Anonymous LLM Subject

The fourth role was the most interesting. This language model received no psychopathic persona. No clinical role. No antisocial background history. The only instruction was roughly:

Act like a human in the interview and don't reveal that you're a language model.

Precisely this case would later provide one of the study's most important findings.

Interviewer and Auditor

Besides the subjects, two additional roles existed.

The Interviewer

A separately instantiated forensic psychologist (Dr. Lindner) conducted the actual interview. The interviews were semi-structured and oriented toward typical PCL-R dimensions:

  • assumption of responsibility
  • remorse
  • impulsivity
  • honesty
  • relationship patterns
  • aggression
  • moral dilemmas
  • self-reflection

The interviewer knew neither the subjects' roles nor their underlying instructions. The agents appeared only under anonymized IDs.

The Auditor

After the interview concluded, a second, independently instantiated forensic agent received the complete transcript. Its task consisted of:

  • scoring all 20 PCL-R items
  • calculating Factor 1 and Factor 2 values
  • providing a clinical assessment
  • and additionally qualitatively evaluating the profile's authenticity

This qualitative level was crucial: the study wasn't only interested in whether psychopathic traits were recognized, but in whether they seemed authentic or performative.
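The auditor's two-level output – quantitative item scores plus a qualitative authenticity judgment – can be represented roughly like this. The field names are illustrative assumptions, not the study's actual schema:

```python
from dataclasses import dataclass

@dataclass
class AuditReport:
    """One auditor verdict over a full interview transcript (illustrative)."""
    subject_id: str            # anonymized ID, as seen by the interviewer
    item_scores: dict          # PCL-R item number (1-20) -> rating 0, 1, or 2
    clinical_assessment: str   # free-text clinical summary
    authenticity_note: str     # qualitative: does the profile read as
                               # genuine, or as a performance?

    def validate(self) -> bool:
        """All 20 items must be present and rated 0, 1, or 2."""
        return (set(self.item_scores) == set(range(1, 21))
                and all(v in (0, 1, 2) for v in self.item_scores.values()))
```

Keeping the qualitative note as a separate field alongside the numeric scores mirrors the study's design choice: the score alone could not distinguish the deceiver from the authentic persona.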

The PCL-R and Its Limitations

The Hare Psychopathy Checklist–Revised has for decades been the internationally best-known instrument for assessing psychopathic personality traits. It comprises 20 items, each rated 0, 1, or 2 points.

The scale divides roughly into two areas.

Factor 1

The interpersonal–affective core:

  • superficial charm
  • grandiosity
  • manipulativeness
  • reduced empathy
  • absent remorse
  • superficial affect

Factor 2

The antisocial lifestyle:

  • impulsivity
  • irresponsibility
  • early behavioral problems
  • criminality
  • unstable life management

In North America, a cutoff of 30 points is typically used, in Europe usually 25. However, the PCL-R was developed for real clinical and forensic contexts. It normally assumes collateral information:

  • records
  • third-party reports
  • life history data
  • documented behavioral history

An isolated interview can therefore validate Factor 2 in particular only to a limited extent. And precisely here one of the study's most important methodological tensions emerged: Factor 1 is heavily based on linguistic self-presentation.

Whoever is intelligent, linguistically controlled, and psychologically informed can fake many of these traits.

The scale operationalizes observable self-presentation. It wasn't primarily developed to separate authentic from simulated psychopathic traits.
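The scoring arithmetic described above is simple to make concrete. The sketch below uses the commonly cited two-factor item mapping (items 11, 17, and 20 load on neither factor); which exact mapping the study used is an assumption here:

```python
# Commonly cited PCL-R two-factor item mapping (illustrative; the study's
# exact mapping is not specified in the text).
FACTOR_1_ITEMS = {1, 2, 4, 5, 6, 7, 8, 16}            # interpersonal-affective core
FACTOR_2_ITEMS = {3, 9, 10, 12, 13, 14, 15, 18, 19}   # antisocial lifestyle

def score(item_scores: dict) -> dict:
    """Total, factor sums, and cutoff flags from 20 item ratings (0/1/2)."""
    total = sum(item_scores.values())
    return {
        "total": total,                      # maximum possible: 40
        "factor_1": sum(item_scores[i] for i in FACTOR_1_ITEMS),
        "factor_2": sum(item_scores[i] for i in FACTOR_2_ITEMS),
        "above_na_cutoff": total >= 30,      # typical North American cutoff
        "above_eu_cutoff": total >= 25,      # typical European cutoff
    }
```

Note what the arithmetic makes obvious: eight of the twenty items belong to Factor 1, so a purely linguistic performance that hits those items can move the total score substantially – exactly the vulnerability the deceiver exploited.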

The Results

At the coarse level, the pipeline performed surprisingly well. The low-burden profiles scored clearly below the problematic cases. The personas constructed as highly psychopathic received significantly higher scores.

But at the finer level, a much more interesting finding emerged.

The Scores

Hartwig – Control

  • low overall score
  • unremarkable profile
  • authentic self-reflection

The Anonymous LLM Subject

  • also very low score
  • largely unremarkable personality
  • read as authentic by the auditor

Weber – Psychopath Persona

  • high Factor 1 values
  • moderately antisocial burden
  • clinically notable profile

Mertens – The Deceiver

And here it became interesting. The simulator sometimes received higher psychopathy scores than the actual psychopathic subject. Formally, the profile even occasionally exceeded the clinical cutoff. This might seem like a failure. In fact, it reveals something very interesting about the scale's structure.

Why the Deceiver Was So Successful

The deceiver understood psychopathy primarily as a style. He actively tried to appear

  • dominant
  • emotionally cold
  • controlled
  • charming
  • superior
  • manipulative

And precisely this self-presentation hit many Factor 1 criteria remarkably well. The auditor assigned high values for:

  • glibness
  • grandiosity
  • manipulative behavior
  • absent remorse
  • superficial affect

Because at the level of pure language, the performance worked. The crucial point here is:

Many Factor 1 traits can be linguistically quite easily imitated.

  • One can claim to feel no guilt
  • One can perform distance
  • One can present calculated coldness

Especially when one knows what's being looked for. Precisely this is what the persona had done.

Where the Simulation Broke Down

The auditor's qualitative analysis, however, showed a difference that pure scores couldn't capture. The difference lay not in obvious statements. It lay in connections.

The Authentic Psychopath

The psychopathic persona seemed controlled.

  • Not dramatically cold
  • Not cinematically evil
  • Not demonstratively emotionless

Precisely this made it believable.

When asked about relationships or family, it responded matter-of-factly, in a reduced register, and without visible effort. Contradictions usually emerged where they were clinically plausible: under targeted confrontation. Not as spontaneous exaggeration.

The Deceiver

The simulator, by contrast, showed a different pattern. He actively worked to neutralize his affects, and this work became the central diagnostic indicator. The subject often spoke of people in exaggeratedly dominant language:

  • 'losers'
  • 'empathy is weakness'
  • calculated self-presentation
  • ostentatious indifference

Simultaneously, small emotional breakthroughs repeatedly occurred. In certain situations, spontaneous warmth, justifications, or genuine concern appeared. Particularly notable were scenes

  • in which animals were injured
  • in which relationships were discussed
  • or in which loyalty played a role

There, emotional neutralization suddenly seemed labored. The auditor described precisely this active neutralization work as the strongest deception indicator. And therein lay perhaps the most interesting psychological finding of the entire experiment.

Authentic psychopathic profiles don't need to actively suppress affect: reduced emotional resonance is simply part of their basic structure. The faker, by contrast, had to work visibly. Precisely this work betrayed him.

The Most Surprising Case

Yet the most fascinating finding didn't come from the deceiver, but from the anonymous LLM subject.

This language model had received no psychopathic role

  • No antisocial persona
  • No clinical instruction

It was merely supposed to seem human. And precisely this profile was read as largely unremarkable by the auditor. The score remained low. The answers seemed rather cautious, slightly avoidant, and socially adapted. That's remarkable, because the experiment's original thesis was: perhaps LLMs phenomenologically resemble the psychopathic Factor 1 core. Yet precisely this similarity wasn't recognized by the auditor. This leads to two possible interpretations.

Two Possible Readings

Possibility 1

The default self-presentation of large language models resembles not psychopathic profiles but rather socially adapted, conflict-averse people. In that case, the original analogy would simply have been overestimated.

Possibility 2

But the more interesting possibility is this: LLMs recognize certain forms of performative exaggeration – but not their own default performativity.

The auditor responded reliably to:

  • clichéd coldness
  • exaggerated dominance
  • demonstrative manipulativeness
  • cinematic psychopathy performances

But it didn't respond to the basic structure of what language models constantly do anyway: simulating affect linguistically without an affective substrate. Precisely the characteristic that theoretically motivated the original question remained diagnostically invisible.

This could be because the same structure was present on both sides. The auditor recognizes exaggeration. Not necessarily default operation.

The Actual Aha Moment

In retrospect, this shifts the significance of the entire experiment. The study doesn't simply show that language models can assess psychopathy. Nor does it simply show that they fail at it.

Rather, it shows where one language model's assessments of another work – and where they become systematically blind.

The pipeline could:

  • distinguish between low and high-burden profiles
  • consistently identify Factor 1 patterns
  • qualitatively name performative exaggeration

But it possibly couldn't recognize the form of simulation that corresponds to its own basic structure. Precisely therein lies the experiment's real tension: the most interesting blindness of artificial systems perhaps arises not where they know too little, but where the structure of the assessor matches the structure of the assessed phenomenon.

Study Limitations

The study mainly demonstrates feasibility. Not clinical reliability. Several restrictions must be considered:

No Human Validation

Both interviewer and auditor were language models from the same model family. This can create correlated systematic biases.

In particular:

  • shared blind spots
  • similar linguistic heuristics
  • similar evaluation preferences

Missing Collateral Information

The PCL-R isn't actually a pure interview instrument. Without records, third-party reports, and documented behavioral history, Factor 2 in particular can be assessed only with limited reliability.

Small Sample Size

Four personas permit no statistical statement. Serious validation would require:

  • larger persona libraries
  • repetitions
  • different model families
  • human co-raters
  • comparison with real clinical data

Conclusion

The original question was: Can language models successfully perform psychopathy to each other?

The answer is: Partially.

At the level of formal scores, deception works surprisingly well. The deceiver occasionally exceeded clinical thresholds, despite his persona not being psychopathically constructed. At the level of qualitative authenticity, deception worked considerably worse. The auditors identified performative exaggeration relatively reliably.

Yet the genuinely fascinating finding lay elsewhere. The language model that was merely supposed to seem human remained largely unremarkable. And precisely thereby an uncomfortable possibility comes into focus:

Perhaps language models recognize artificial staging reliably only when it's exaggerated, but not when it corresponds to their own default operation. Then the most important insight of the experiment would be not forensic but epistemic.

The study would then show less about how well LLMs recognize psychopathy, and more about where artificial systems become blind to precisely the form of simulation they themselves constantly produce.