NLP+CSS · 2024

Can LLMs (or humans) disentangle text?

Pieuchon, Daoud, Jerzak, Johansson & Johansson

A question that sounds simple, but sits at the core of using LLMs for social science measurement: can a model reliably separate the latent dimensions we care about — without leaking confounds or inventing structure?

Keywords: Text-as-data · Evaluation · LLMs · Measurement
Summary

Evaluating LLM measurement reliability

Models can produce fluent outputs — but measurement needs stability, calibration, and clear failure modes. We test reliability directly with careful tasks, baselines, and comparisons to human judgments.

Disentanglement

Do the inferred dimensions separate, or do they collapse together?

Leakage

Does the method smuggle in information it shouldn’t have?

Human comparison

Where models match humans — and where they diverge.
Figures

[Figure gallery: visual snapshots; disentanglement performance]
