Valentina Pyatkin

Postdoctoral Researcher
Allen Institute for AI
University of Washington


News

I am on the academic job market for faculty positions! Feel free to reach out if you have an opening in your department.

Bio

I am a postdoctoral researcher (and Young Investigator) at the Allen Institute for AI and the University of Washington, advised by Prof. Yejin Choi. I completed my PhD in Natural Language Processing at the NLP lab of Bar-Ilan University, supervised by Prof. Ido Dagan and Prof. Reut Tsarfaty. I was also a visiting PhD student at UW NLP and had the pleasure of interning twice at the Allen Institute for AI. My work has been awarded an ACL Outstanding Paper Award and the ACL Best Theme Paper Award, and I am honored to have received the AI2 Outstanding Intern of the Year Award. Previously, I did a research internship at Google, obtained an MSc from the University of Edinburgh, and received a BA from the University of Zurich.


Research

My research focuses on Post-Training and the Adaptation of Language Models, with the goal of making them better semantic and pragmatic reasoners. In the past, I worked on question generation, natural language representations, and discourse. I am also interested in underspecified, ambiguous, and implicit language, and in teaching language models how to better handle such phenomena. More specifically, my research centers on:

  • Post-Training and LM Adaptation: Finding optimal recipes for LM post-training, from analyzing preference data to developing new algorithms.
  • Natural Language Understanding: Deeper semantic reasoning across broader discourse contexts and better handling of ambiguity and underspecification in LLMs.
  • Critical Evaluation: Testing LMs' abilities to perform pragmatic inferences, extracting their implicit values, and evaluating reward models.

Publications

Below is a selection of my recent publications; for my full publication record, please see my Google Scholar page.

2024


Diverging Preferences: When do Annotators Disagree and do Models Know?

Michael J.Q. Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, Valentina Pyatkin
arXiv

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

Jing-Jing Li, Valentina Pyatkin, Max Kleiman-Weiner, Liwei Jiang, Nouha Dziri, Anne G. E. Collins, Jana Schaich Borg, Maarten Sap, Yejin Choi, Sydney Levine
arXiv

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Lester James V. Miranda*, Yizhong Wang*, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
arXiv

Superlatives in Context: Modeling the Implicit Semantics of Superlatives

Valentina Pyatkin, Bonnie Webber, Ido Dagan, Reut Tsarfaty
arXiv

Explicating the Implicit: Argument Detection Beyond Sentence Boundaries

Paul Roit, Aviv Slobodkin, Eran Hirsch, Arie Cattan, Ayal Klein, Valentina Pyatkin, Ido Dagan
In ACL 2024
arXiv

Self-Directed Synthetic Dialogues and Revisions Technical Report

Nathan Lambert, Hailey Schoelkopf, Aaron Gokaslan, Luca Soldaini, Valentina Pyatkin, Louis Castricato
arXiv

The Art of Saying No: Contextual Noncompliance in Language Models

Faeze Brahman*, Sachin Kumar*, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
arXiv

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
In NeurIPS 2024
arXiv

WILDBENCH: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi
arXiv

RewardBench: Evaluating Reward Models for Language Modeling

Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda, Bill Yuchen Lin, Khyathi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi
arXiv

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

Paul Röttger*, Valentin Hofmann*, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy
In ACL 2024
Outstanding Paper Award
arXiv

OLMo: Accelerating the Science of Language Models

Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
In ACL 2024
Best Theme Paper Award
arXiv

Promptly Predicting Structures: The Return of Inference

Maitrey Mehta, Valentina Pyatkin, Vivek Srikumar
In NAACL 2024
arXiv

2023


Camels in a Changing Climate: Enhancing LM Adaptation with TÜLU 2

Hamish Ivison*, Yizhong Wang*, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
arXiv

“You Are An Expert Linguistic Annotator”: Limits of LLMs as Analyzers of Abstract Meaning Representation

Allyson Ettinger, Jena D. Hwang, Valentina Pyatkin, Chandra Bhagavatula, Yejin Choi
In EMNLP Findings
PDF

What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations

Kavel Rao, Liwei Jiang, Valentina Pyatkin, Yuling Gu, Niket Tandon, Nouha Dziri, Faeze Brahman, Yejin Choi
In EMNLP Findings
PDF

Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement

Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
In ICLR
PDF

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties

Taylor Sorensen, Liwei Jiang, Jena Hwang, Sydney Levine, Valentina Pyatkin, Peter West, Nouha Dziri, Ximing Lu, Kavel Rao, Chandra Bhagavatula, Maarten Sap, John Tasioulas, Yejin Choi
In AAAI
PDF

PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning

Faeze Brahman, Chandra Bhagavatula, Valentina Pyatkin, Jena D. Hwang, Xiang Lorraine Li, Hirona J. Arai, Soumya Sanyal, Keisuke Sakaguchi, Xiang Ren, Yejin Choi
In ICLR
PDF

Retrieving Texts based on Abstract Descriptions

Shauli Ravfogel, Valentina Pyatkin, Amir DN Cohen, Avshalom Manevich, Yoav Goldberg
In COLM
arXiv

Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design

Valentina Pyatkin, Frances Yung, Merel C.J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg
In TACL
PDF

ClarifyDelphi: Reinforced Clarification Questions with Defeasibility Rewards for Social and Moral Situations

Valentina Pyatkin, Jena D. Hwang, Vivek Srikumar, Ximing Lu, Liwei Jiang, Yejin Choi, Chandra Bhagavatula
In ACL
PDF

Revisiting Sentence Union Generation as a Testbed for Text Consolidation

Eran Hirsch, Valentina Pyatkin, Ruben Wolhandler, Avi Caciularu, Asi Shefer, Ido Dagan
In ACL Findings
PDF

2022


Just-DREAM-about-it: Figurative Language Understanding with DREAM-FLUTE

Yuling Gu, Yao Fu, Valentina Pyatkin, Ian H. Magnusson, Bhavana Dalvi, Peter Clark
In Proceedings of the Workshop on Figurative Language Processing at EMNLP 2022
PDF

QASem Parsing: Text-to-text Modeling of QA-based Semantics

Ayal Klein, Eran Hirsch, Ron Eliav, Valentina Pyatkin, Avi Caciularu, Ido Dagan
In EMNLP 2022
PDF

Design Choices in Crowdsourcing Discourse Relation Annotations: The Effect of Worker Selection and Training

Merel C.J. Scholman, Valentina Pyatkin, Frances Yung, Ido Dagan, Reut Tsarfaty, Vera Demberg
In LREC 2022
PDF

Draw Me a Flower: Grounding Formal Abstract Structures Stated in Informal Natural Language

Royi Lachmy, Valentina Pyatkin, Avshalom Manevich, Reut Tsarfaty
In TACL
PDF

2021


Asking It All: Generating Contextualized Questions for any Semantic Role

Valentina Pyatkin*, Paul Roit*, Julian Michael, Reut Tsarfaty, Yoav Goldberg, Ido Dagan
In EMNLP 2021
PDF

The Possible, the Plausible, and the Desirable: Event-Based Modality Detection for Language Processing

Valentina Pyatkin*, Shoval Sadde*, Aynat Rubinstein, Paul Portner, Reut Tsarfaty
In ACL 2021
PDF

2020


QADiscourse - Discourse Relations as QA Pairs: Representation, Crowdsourcing and Baselines

Valentina Pyatkin, Ayal Klein, Reut Tsarfaty, Ido Dagan
In EMNLP 2020
PDF

QA-Nom: Question-Answer driven SRL for Nominalizations

Ayal Klein, Jonathan Mamou, Valentina Pyatkin, Daniela Stepanov, Hangfeng He, Dan Roth, Luke Zettlemoyer, Ido Dagan
In COLING 2020
PDF

2017


Discourse Relations and Conjoined VPs: Automated Sense Recognition

Valentina Pyatkin, Bonnie Webber
In EACL SRW 2017
PDF

*: Equal contribution.


Misc

Besides this, I love rowing (currently at Lake Washington Rowing Club) and going to the cinémathèque. My Erdős number is 3 (Paul Erdős → Noga Alon → Ido Dagan → me) and my Kevin Knight number is 2 (Kevin Knight → Yejin Choi → me).