CuRe: Cultural Reasoning for Responsible Language Model Development

CuRe develops new evaluation methods to test and improve how language models interpret culture, using Danish literature as a rigorous testing ground for cultural reasoning.



Independent Research Fund Denmark (DFF), 2026–2030



Project Summary

CuRe investigates how language models interpret culture — not through facts or stereotypes, but through the rich, ambiguous, and historically layered world of literature. By combining NLP, literary studies, and expert-driven evaluation design, the project develops new methods for assessing and improving cultural reasoning in AI systems.


Why Culture? Why Literature?

Culture shapes meaning, and meaning is where AI struggles most.

Literature embeds cultural knowledge not as explicit fact but through ambiguity, historical layering, and multiple valid readings.

This makes literature the ideal testing ground for evaluating how AI understands culture — and for building models that respect cultural nuance rather than reducing it to stereotypes.


Objectives

CuRe addresses three core research questions:

  1. How can we build robust benchmarks for cultural reasoning where multiple interpretations are valid?
  2. How do we model and evaluate interpretive depth in AI, beyond surface pattern recognition?
  3. How can we improve cultural reasoning in AI responsibly, without reinforcing essentialism?

Work Packages

WP1 — Empirical Foundations

Construction of high-quality corpora and interpretive benchmarks based on Danish literature, including MeMo, Mini-WorldLit, and canonical texts. Data includes passages, interpretive annotations, student essays, and expert commentary, designed following the ECBD framework.
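As a rough illustration of what an interpretive benchmark record might look like, the sketch below defines a hypothetical item schema in which several readings of a passage can be valid at once. The field names, the example question, and the readings are illustrative assumptions, not the project's actual data format.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkItem:
    """One interpretive benchmark record (illustrative schema only)."""
    passage: str                  # excerpt from a Danish literary text
    source: str                   # corpus identifier, e.g. "MeMo"
    question: str                 # interpretive prompt posed to the model
    valid_readings: list[str] = field(default_factory=list)  # several may be valid
    expert_commentary: str = ""   # optional scholarly annotation

# A hypothetical item with two equally defensible readings.
item = BenchmarkItem(
    passage="...",
    source="MeMo",
    question="What attitude does the narrator take toward the protagonist?",
    valid_readings=["ironic distance", "sympathetic identification"],
)
```

Keeping `valid_readings` as a list, rather than a single gold label, is what lets an evaluation reward any of the defensible interpretations instead of forcing one "correct" answer.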

WP2 — Methodological Reflection

Evaluation of retrieval-augmented generation (RAG), long-context models, and soft-label annotation strategies. Analysis of interpretive ambiguity, multiple valid readings, and the relationship between close and distant reading.
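One way to score models under soft labels is to keep the full distribution of annotator judgments and measure how far the model's distribution over readings diverges from it. The sketch below uses total variation distance for this; the labels and the choice of metric are illustrative assumptions, not the project's chosen method.

```python
from collections import Counter

def soft_label_distribution(annotations):
    """Turn a list of annotator labels into a probability distribution."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two label distributions (0 = identical)."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)

# Three annotators disagree on the reading of a passage; no single
# label is "correct", so the benchmark keeps the whole distribution.
human = soft_label_distribution(["ironic", "ironic", "sincere"])
model = {"ironic": 0.5, "sincere": 0.5}
score = total_variation(human, model)  # lower is better
```

Because the human distribution itself encodes legitimate disagreement, a model is not penalized for hedging between readings that annotators also split on.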

WP3 — Model Adaptation & Training

Adaptation of Danish Foundation Models (DFM), experiments with pretraining mixtures, fine-tuning with expert feedback, and human-in-the-loop reinforcement learning.


Team

Principal Investigators

Researchers

International Collaborators

Advisory Board


Methods and Approach


Expected Outcomes


Publications

A rolling list of project publications (2026–2030) will be maintained here.


Contact

For inquiries or collaboration:

dh@di.ku.dk