Research Themes
Broadly, my research is motivated by making AI (1) helpful and (2) efficient.
In pursuit of these objectives, my team and I have worked toward the following more specific themes:
Augmentation: Developing computational models that synergistically augment human experience. The ultimate goal of our work is to benefit humans!
Generality: Developing models that generalize to a broader scope of tasks, abilities, modalities, or environments. Often, the challenge here is the long tail of problems.
Specificity: Developing models tailored to specific user needs or application domains to offer both utility and efficiency.
Self-supervision: Learning representations of the world from cheap signals in the wild (web data, physical environments, etc.) in an algorithmically efficient manner, in terms of both data and compute.
Reasoning: Improving models’ ability to communicate via reasons, both for novel problem-solving (e.g., recovering from earlier mistakes) and for explainability (using “reasons” to explain or justify decisions).
Interpretability: Analyzing model operations to help humans understand behaviors, e.g., how does in-context learning emerge or operate? How do LMs balance between ‘remembering’ and ‘learning’?
Safety: Making AI more transparent so that a population of users can exercise democratic oversight and governance of these systems, including their algorithmic biases and mistakes.
Interaction: Enabling models to engage effectively with humans and other AI systems, for example, for coordination.
Applications of AI: We are excited about the increased adoption of AI! We have ongoing work on AI for {science, education, and clinical applications}.
The vast majority of my research is aligned with the following research communities: natural language processing (ACL, NAACL, EMNLP), machine learning (ICLR, NeurIPS, ICML), and artificial intelligence (AAAI, IJCAI).
If you are an undergraduate or master's student and would like to work on research with my group, please fill out this form.
Recent Talks
2024, University of Cambridge, Language Technology Lab seminar (slides)
2024, Oracle Labs ML seminar (slides)
2024, Tel Aviv NLP seminar (slides)
2024, Forum on ‘‘Engineered AI Systems’’ (slides)
2024, Keynote at ‘‘Engineering for Professionals’’ quarterly meeting (slides)
2024, Workshop on ‘‘LLMs for Healthy Aging’’ (slides)
2023, NYU ‘‘Text-as-Data’’ talk series (slides)
2023, JHU's Center for Language and Speech Processing seminar (video)
2023, JHU's Electrical Engineering department seminars (slides)
2023, Amazon ‘‘Human in the Loop’’ seminar
2023, JHU's Center for Health Security seminars (slides)
2023, UMD Computational Linguistics seminar (slides)
2023, Applied Physics Lab, Intelligent Systems Center seminars
2022, University of Tehran NLP seminar
2021, University of Glasgow IR seminar (slides)
2021, Johns Hopkins University (slides)
2021, Google AI (slides)
2021, UCLA Big Data and ML seminar (slides)
2021, USC NLP seminar (slides)
2020, Tel Aviv University NLP seminar (slides)
2019, Workshop on Progress Towards the Holy Grail at the Conference on Constraint Programming (CP) (slides)
2019, CMU LTI seminar (slides)
2018, NYU NLP seminar, ‘‘Reasoning-Driven Question Answering’’
2018, Stanford NLP seminar (slides)
2018, Mid-Atlantic Student Colloquium on Speech, Language and Learning (slides)
Teaching
Publications
Disclaimer: This material is presented to ensure the timely dissemination of scholarly works. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms invoked by each author's copyright.
CARDBiomedBench: A Benchmark for Evaluating Large Language Model Performance in Biomedical Research. Owen Bianchi, Maya Willey, Chelsea X Avarado, Benjamin Danek, Marzieh Khani, Nicole Kuznetsov, Anant Dadu, Syed Shah, Mathew J Koretsky, Mary B Makarious, Cory Weller, Kristin S Levine, Sungwon Kim, Paige Jarreau, Dan Vitale, Elise Marsan, Hirotaka Iwaki, Hampton Leonard, Sara Bandres-Ciga, Andrew B Singleton, Mike A. Nalls, Shekoufeh Mokhtari, Daniel Khashabi and Faraz Faghri. bioRxiv preprint 2025.01.15.63327, 2025. [data] [code]
GenEx: Generating an Explorable World. Taiming Lu, Tianmin Shu, Junfei Xiao, Luoxin Ye, Jiahao Wang, Cheng Peng, Chen Wei, Daniel Khashabi, Rama Chellappa, Alan Yuille and Jieneng Chen. arXiv preprint arXiv:2412.09624, 2024. [project]
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta and others. Transactions on Machine Learning Research (TMLR), 2023. Finalist for outstanding certification. [data]
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ Tasks. Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, David Stap, Eshaan Pathak, Giannis Karamanolakis, Haizhi Gary Lai, Ishan Purohit, Ishani Mondal, Jacob Anderson, Kirby Kuznia, Krima Doshi, Maitreya Patel, Kuntal Kumar Pal, Mehrad Moradshahi, Mihir Parmar, Mirali Purohit, Neeraj Varshney, Phani Rohitha Kaza, Pulkit Verma, Ravsehaj Singh Puri, Rushang Karia, Shailaja Keyur Sampat, Savan Doshi, Siddhartha Mishra, Sujan Reddy, Sumanta Patro, Tanay Dixit, Xudong Shen, Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi and Daniel Khashabi. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. [data] [slides] [slides2] [poster] [project] [blog]
NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics. Ximing Lu, Sean Welleck, Peter West, Liwei Jiang, Jungo Kasai, Daniel Khashabi, Ronan Le Bras, Lianhui Qin, Youngjae Yu, Rowan Zellers, Noah A. Smith and Yejin Choi. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022. Best paper award. [code]
Prompt Waywardness: The Curious Case of Discretized Interpretation of Continuous Prompts. Daniel Khashabi, Xinxi Lyu, Sewon Min, Lianhui Qin, Kyle Richardson, Sean Welleck, Hannaneh Hajishirzi, Tushar Khot, Ashish Sabharwal, Sameer Singh and Yejin Choi. Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022. [slides] [slides2] [talk] [code]
Findings of the 2021 Conference on Machine Translation (WMT21). Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-jussà, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin and Marcos Zampieri. Conference on Machine Translation (WMT), 2021.
ParsiNLU: A Suite of Language Understanding Challenges for Persian. Daniel Khashabi, Arman Cohan, Siamak Shakeri, Pedram Hosseini, Pouya Pezeshkpour, Malihe Alikhani, Moin Aminnaseri, Marzieh Bitaab, Faeze Brahman, Sarik Ghazarian and others. Transactions of the Association for Computational Linguistics (TACL), 2021. [slides] [code]
From ‘F’ to ‘A’ on the NY Regents Science Exams: An Overview of the Aristo Project. Peter Clark, Oren Etzioni, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Niket Tandon, Sumithra Bhakthavatsalam, Dirk Groeneveld, Michal Guerquin and Michael Schmitz. AI Magazine, 2020. [talk] [coverage]
CogCompNLP: Your Swiss Army Knife for NLP. Daniel Khashabi, Mark Sammons, Ben Zhou, Tom Redman, Christos Christodoulopoulos, Vivek Srikumar, Nicholas Rizzolo, Lev Ratinov, Guanheng Luo, Quang Do, Chen-Tse Tsai, Subhro Roy, Stephen Mayhew, Zhili Feng, John Wieting, Xiaodong Yu, Yangqiu Song, Shashank Gupta, Shyam Upadhyay, Naveen Arivazhagan, Qiang Ning, Shaoshi Ling and Dan Roth. International Conference on Language Resources and Evaluation (LREC), 2018. [poster] [code]
Image demosaicing. Reinhard Sebastian Bernhard Nowozin, Danyal Khashabi, Jeremy Martin Jancsary, Bruce Justin Lindbloom and Andrew William Fitzgibbon. US Patent 9,344,690 - Google Patents, 2016.