Natural Language Reasoning, A Survey

Authors:
Fei Yu, The Chinese University of Hong Kong - Shenzhen, Shenzhen, China (ORCID: 0009-0005-0400-4599)

Hongbo Zhang, The Chinese University of Hong Kong - Shenzhen, Shenzhen, China (ORCID: 0000-0003-0425-3673)

Prayag Tiwari, School of Information Technology, Halmstad University, Halmstad, Sweden (ORCID: 0000-0002-2851-4260)

Benyou Wang, School of Data Science, The Chinese University of Hong Kong - Shenzhen, Shenzhen, China (ORCID: 0000-0002-1501-9914)

ACM Computing Surveys. Accepted in April 2024. https://doi.org/10.1145/3664194

Online AM: 09 May 2024

Abstract

This survey paper proposes a clearer view of natural language reasoning in the field of Natural Language Processing (NLP), both conceptually and practically. Conceptually, we provide a distinct definition for natural language reasoning in NLP, based on both philosophy and NLP scenarios, discuss what types of tasks require reasoning, and introduce a taxonomy of reasoning. Practically, we conduct a comprehensive literature review on natural language reasoning in NLP, mainly covering classical logical reasoning, natural language inference, multi-hop question answering, and commonsense reasoning. The paper also identifies backward reasoning as a powerful paradigm for multi-step reasoning, and introduces defeasible reasoning as one of the most important future directions in natural language reasoning research. We focus on single-modality unstructured natural language text, excluding neuro-symbolic research and mathematical reasoning.

Sewon Min, Victor Zhong, Luke Zettlemoyer, and Hannaneh Hajishirzi. 2019. Multi-hop Reading Comprehension through Question Decomposition and Rescoring. In ACL (1). Association for Computational Linguistics, 6097–6109.Google Scholar
Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James F. Allen. 2016. A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016, Kevin Knight, Ani Nenkova, and Owen Rambow (Eds.). The Association for Computational Linguistics, 839–849. https://doi.org/10.18653/v1/n16-1098Google ScholarCross Ref
Yixin Nie, Adina Williams, Emily Dinan, Mohit Bansal, Jason Weston, and Douwe Kiela. 2020. Adversarial NLI: A New Benchmark for Natural Language Understanding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R. Tetreault (Eds.). Association for Computational Linguistics, 4885–4901. https://doi.org/10.18653/V1/2020.ACL-MAIN.441Google ScholarCross Ref
Yasumasa Onoe, Michael J. Q. Zhang, Eunsol Choi, and Greg Durrett. 2021. CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge. In NeurIPS Datasets and Benchmarks.Google Scholar
Santiago Ontañón, Joshua Ainslie, Vaclav Cvicek, and Zachary Fisher. 2022. LogicInference: A New Dataset for Teaching Logical Inference to seq2seq Models. CoRR abs/2203.15099(2022). https://doi.org/10.48550/arXiv.2203.15099 arXiv:2203.15099Google ScholarCross Ref
Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, and William Yang Wang. 2021. Unsupervised Multi-hop Question Answering by Question Generation. In NAACL-HLT. Association for Computational Linguistics, 5866–5880.Google Scholar
Pruthvi Patel, Swaroop Mishra, Mihir Parmar, and Chitta Baral. 2022. Is a Question Decomposition Unit All We Need?CoRR abs/2205.12538(2022). https://doi.org/10.48550/arXiv.2205.12538 arXiv:2205.12538Google ScholarCross Ref
Charles Sanders Peirce. 1992. Reasoning and the logic of things: The Cambridge conferences lectures of 1898. Harvard University Press.Google Scholar
Ethan Perez, Patrick S. H. Lewis, Wen-tau Yih, Kyunghyun Cho, and Douwe Kiela. 2020. Unsupervised Question Decomposition for Question Answering. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 8864–8880. https://doi.org/10.18653/v1/2020.emnlp-main.713Google ScholarCross Ref
Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, and Weizhu Chen. 2022. Reasoning Like Program Executors. CoRR abs/2201.11473(2022). arXiv:2201.11473 https://arxiv.org/abs/2201.11473Google Scholar
Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. 2018. Hypothesis Only Baselines in Natural Language Inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, *SEM@NAACL-HLT 2018, New Orleans, Louisiana, USA, June 5-6, 2018, Malvina Nissim, Jonathan Berant, and Alessandro Lenci (Eds.). Association for Computational Linguistics, 180–191. https://doi.org/10.18653/v1/s18-2023Google ScholarCross Ref
Ofir Press, Muru Zhang, Sewon Min, Ludwig Schmidt, Noah A. Smith, and Mike Lewis. 2022. Measuring and Narrowing the Compositionality Gap in Language Models. CoRR abs/2210.03350(2022).Google Scholar
Ben Prystawski and Noah D. Goodman. 2023. Why think step-by-step? Reasoning emerges from the locality of experience. CoRR abs/2304.03843(2023). https://doi.org/10.48550/arXiv.2304.03843 arXiv:2304.03843Google ScholarCross Ref
Peng Qi, Haejun Lee, Tg Sido, and Christopher D. Manning. 2021. Answering Open-Domain Questions of Varying Reasoning Steps from Text. In EMNLP (1). Association for Computational Linguistics, 3599–3614.Google Scholar
Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, and Huajun Chen. 2022. Reasoning with Language Model Prompting: A Survey. CoRR abs/2212.09597(2022). https://doi.org/10.48550/arXiv.2212.09597 arXiv:2212.09597Google ScholarCross Ref
Lianhui Qin, Antoine Bosselut, Ari Holtzman, Chandra Bhagavatula, Elizabeth Clark, and Yejin Choi. 2019. Counterfactual Story Reasoning and Generation. In EMNLP/IJCNLP (1). Association for Computational Linguistics, 5042–5052.Google Scholar
Lianhui Qin, Vered Shwartz, Peter West, Chandra Bhagavatula, Jena D. Hwang, Ronan Le Bras, Antoine Bosselut, and Yejin Choi. 2020. Back to the Future: Unsupervised Backprop-based Decoding for Counterfactual and Abductive Commonsense Reasoning. In EMNLP (1). Association for Computational Linguistics, 794–805.Google Scholar
Lin Qiu, Yunxuan Xiao, Yanru Qu, Hao Zhou, Lei Li, Weinan Zhang, and Yong Yu. 2019. Dynamically Fused Graph Network for Multi-hop Reasoning. In ACL (1). Association for Computational Linguistics, 6140–6150.Google Scholar
Hanhao Qu, Yu Cao, Jun Gao, Liang Ding, and Ruifeng Xu. 2022. Interpretable Proof Generation via Iterative Backward Reasoning. In NAACL-HLT. Association for Computational Linguistics, 2968–2981.Google Scholar
Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. OpenAI.Google Scholar
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2020. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 21(2020), 140:1–140:67. http://jmlr.org/papers/v21/20-074.htmlGoogle Scholar
Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, and Richard Socher. 2019. Explain Yourself! Leveraging Language Models for Commonsense Reasoning. In ACL (1). Association for Computational Linguistics, 4932–4942.Google Scholar
Hannah Rashkin, Maarten Sap, Emily Allaway, Noah A. Smith, and Yejin Choi. 2018. Event2Mind: Commonsense Inference on Events, Intents, and Reactions. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, Iryna Gurevych and Yusuke Miyao (Eds.). Association for Computational Linguistics, 463–473. https://doi.org/10.18653/v1/P18-1043Google ScholarCross Ref
Abhilasha Ravichander, Matt Gardner, and Ana Marasovic. 2022. CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, Yoav Goldberg, Zornitsa Kozareva, and Yue Zhang (Eds.). Association for Computational Linguistics, 8729–8755. https://aclanthology.org/2022.emnlp-main.598Google ScholarCross Ref
Danilo Neves Ribeiro, Shen Wang, Xiaofei Ma, Rui Dong, Xiaokai Wei, Henghui Zhu, Xinchi Chen, Peng Xu, Zhiheng Huang, Andrew O. Arnold, and Dan Roth. 2022. Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner. In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, United States, July 10-15, 2022, Marine Carpuat, Marie-Catherine de Marneffe, and Iván Vladimir Meza Ruíz (Eds.). Association for Computational Linguistics, 465–475. https://doi.org/10.18653/v1/2022.findings-naacl.35Google ScholarCross Ref
Kyle Richardson and Ashish Sabharwal. 2022. Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022 Virtual Event, February 22 - March 1, 2022. AAAI Press, 11209–11219. https://doi.org/10.1609/AAAI.V36I10.21371Google ScholarCross Ref
Melissa Roemmele, Cosmin Adrian Bejan, and Andrew S. Gordon. 2011. Choice of Plausible Alternatives: An Evaluation of Commonsense Causal Reasoning. In Logical Formalizations of Commonsense Reasoning, Papers from the 2011 AAAI Spring Symposium, Technical Report SS-11-06, Stanford, California, USA, March 21-23, 2011. AAAI. http://www.aaai.org/ocs/index.php/SSS/SSS11/paper/view/2418Google Scholar
Rachel Rudinger, Vered Shwartz, Jena D. Hwang, Chandra Bhagavatula, Maxwell Forbes, Ronan Le Bras, Noah A. Smith, and Yejin Choi. 2020. Thinking Like a Skeptic: Defeasible Inference in Natural Language. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020(Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 4661–4675. https://doi.org/10.18653/v1/2020.findings-emnlp.418Google ScholarCross Ref
Dagobert D Runes. 2001. The dictionary of philosophy. Citadel Press.Google Scholar
Mobashir Sadat and Cornelia Caragea. 2022. SciNLI: A Corpus for Natural Language Inference on Scientific Text. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, 7399–7409. https://doi.org/10.18653/v1/2022.acl-long.511Google ScholarCross Ref
Marzieh Saeidi, Max Bartolo, Patrick S. H. Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, and Sebastian Riedel. 2018. Interpretation of Natural Language Rules in Conversational Machine Reading. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). Association for Computational Linguistics, 2087–2097. https://doi.org/10.18653/v1/d18-1233Google ScholarCross Ref
Swarnadeep Saha, Sayan Ghosh, Shashank Srivastava, and Mohit Bansal. 2020. PRover: Proof Generation for Interpretable Reasoning over Rules. In EMNLP (1). Association for Computational Linguistics, 122–136.Google Scholar
Swarnadeep Saha, Yixin Nie, and Mohit Bansal. 2020. ConjNLI: Natural Language Inference Over Conjunctive Sentences. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, Bonnie Webber, Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 8240–8252. https://doi.org/10.18653/v1/2020.emnlp-main.661Google ScholarCross Ref
Swarnadeep Saha, Prateek Yadav, and Mohit Bansal. 2021. multiPRover: Generating Multiple Proofs for Improved Interpretability in Rule Reasoning. In NAACL-HLT. Association for Computational Linguistics, 3662–3677.Google Scholar
Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. 2020. WinoGrande: An Adversarial Winograd Schema Challenge at Scale. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 8732–8740. https://ojs.aaai.org/index.php/AAAI/article/view/6399Google ScholarCross Ref
Soumya Sanyal, Zeyi Liao, and Xiang Ren. 2022. RobustLR: Evaluating Robustness to Logical Perturbation in Deductive Reasoning. CoRR abs/2205.12598(2022). https://doi.org/10.48550/arXiv.2205.12598 arXiv:2205.12598Google ScholarCross Ref
Soumya Sanyal, Harman Singh, and Xiang Ren. 2022. FaiRR: Faithful and Robust Deductive Reasoning over Natural Language. In ACL (1). Association for Computational Linguistics, 1075–1093.Google Scholar
Soumya Sanyal, Yichong Xu, Shuohang Wang, Ziyi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, and Xiang Ren. 2022. APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning. CoRR abs/2212.09282(2022). https://doi.org/10.48550/arXiv.2212.09282 arXiv:2212.09282Google ScholarCross Ref
Maarten Sap, Ronan Le Bras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, and Yejin Choi. 2019. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. In AAAI. AAAI Press, 3027–3035.Google Scholar
Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. In ACL. Association for Computational Linguistics, 5477–5490.Google Scholar
Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. 2019. Social IQa: Commonsense Reasoning about Social Interactions. In EMNLP/IJCNLP (1). Association for Computational Linguistics, 4462–4472.Google Scholar
Abulhair Saparov and He He. 2022. Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought. CoRR abs/2210.01240(2022).Google Scholar
Kumar Shridhar, Alessandro Stolfo, and Mrinmaya Sachan. 2022. Distilling Multi-Step Reasoning Capabilities of Large Language Models into Smaller Models via Semantic Decompositions. CoRR abs/2212.00193(2022). https://doi.org/10.48550/arXiv.2212.00193 arXiv:2212.00193Google ScholarCross Ref
Koustuv Sinha, Shagun Sodhani, Jin Dong, Joelle Pineau, and William L. Hamilton. 2019. CLUTRR: A Diagnostic Benchmark for Inductive Reasoning from Text. In EMNLP/IJCNLP (1). Association for Computational Linguistics, 4505–4514.Google Scholar
Robyn Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, Satinder Singh and Shaul Markovitch (Eds.). AAAI Press, 4444–4451. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972Google ScholarCross Ref
Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Santilli, Andreas Stuhlmüller, Andrew M. Dai, Andrew La, Andrew K. Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakas, and et al. 2022. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. CoRR abs/2206.04615(2022).Google Scholar
Alane Suhr, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, and Yoav Artzi. 2019. A Corpus for Reasoning about Natural Language Grounded in Photographs. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28- August 2, 2019, Volume 1: Long Papers, Anna Korhonen, David R. Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, 6418–6428. https://doi.org/10.18653/v1/p19-1644Google ScholarCross Ref
Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, and Jason Wei. 2022. Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them. CoRR abs/2210.09261(2022).Google Scholar
Oyvind Tafjord, Bhavana Dalvi, and Peter Clark. 2021. ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language. In ACL/IJCNLP (Findings)(Findings of ACL, Vol. ACL/IJCNLP 2021). Association for Computational Linguistics, 3621–3634.Google ScholarCross Ref
Oyvind Tafjord, Bhavana Dalvi Mishra, and Peter Clark. 2022. Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning. CoRR abs/2210.12217(2022). https://doi.org/10.48550/arXiv.2210.12217 arXiv:2210.12217Google ScholarCross Ref
Alon Talmor and Jonathan Berant. 2018. The Web as a Knowledge-Base for Answering Complex Questions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), Marilyn A. Walker, Heng Ji, and Amanda Stent (Eds.). Association for Computational Linguistics, 641–651. https://doi.org/10.18653/V1/N18-1059Google ScholarCross Ref
Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. 2019. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4149–4158. https://doi.org/10.18653/v1/n19-1421Google ScholarCross Ref
Alon Talmor, Oyvind Tafjord, Peter Clark, Yoav Goldberg, and Jonathan Berant. 2020. Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge. In NeurIPS.Google Scholar
Alon Talmor, Ori Yoran, Ronan Le Bras, Chandra Bhagavatula, Yoav Goldberg, Yejin Choi, and Jonathan Berant. 2021. CommonsenseQA 2.0: Exposing the Limits of AI through Gamification. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, Joaquin Vanschoren and Sai-Kit Yeung (Eds.). https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/3ef815416f775098fe977004015c6193-Abstract-round1.htmlGoogle Scholar
Alexandre Tamborrino, Nicola Pellicanò, Baptiste Pannier, Pascal Voitot, and Louise Naudin. 2020. Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning. In ACL. Association for Computational Linguistics, 3878–3887.Google Scholar
Niket Tandon, Bhavana Dalvi, Joel Grus, Wen-tau Yih, Antoine Bosselut, and Peter Clark. 2018. Reasoning about Actions and State Changes by Injecting Commonsense Knowledge. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). Association for Computational Linguistics, 57–66. https://doi.org/10.18653/v1/d18-1006Google ScholarCross Ref
Niket Tandon, Bhavana Dalvi, Keisuke Sakaguchi, Peter Clark, and Antoine Bosselut. 2019. WIQA: A dataset for ”What if...” reasoning over procedural text. In EMNLP/IJCNLP (1). Association for Computational Linguistics, 6075–6084.Google Scholar
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. 2020. Is Multihop QA in DiRe Condition? Measuring and Reducing Disconnected Reasoning. In EMNLP (1). Association for Computational Linguistics, 8846–8863.Google Scholar
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. 2022. MuSiQue: Multihop Questions via Single-hop Question Composition. Trans. Assoc. Comput. Linguistics 10 (2022), 539–554. https://doi.org/10.1162/tacl_a_00475Google ScholarCross Ref
Masatoshi Tsuchiya. 2018. Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7-12, 2018, Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Kôiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asunción Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga (Eds.). European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2018/summaries/786.htmlGoogle Scholar
Gladys Tyen, Hassan Mansoor, Peter Chen, Tony Mak, and Victor Carbune. 2023. LLMs cannot find reasoning errors, but can correct them!CoRR abs/2311.08516(2023). https://doi.org/10.48550/ARXIV.2311.08516 arXiv:2311.08516Google ScholarCross Ref
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998–6008. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.htmlGoogle ScholarDigital Library
David Vilares and Carlos Gómez-Rodríguez. 2019. HEAD-QA: A Healthcare Dataset for Complex Reasoning. In ACL (1). Association for Computational Linguistics, 960–966.Google Scholar
Douglas N Walton. 1990. What is reasoning? What is an argument?The journal of Philosophy 87, 8 (1990), 399–419.Google Scholar
Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, Hanna M. Wallach, Hugo Larochelle, Alina Beygelzimer, Florence d’Alché-Buc, Emily B. Fox, and Roman Garnett (Eds.). 3261–3275. https://proceedings.neurips.cc/paper/2019/hash/4496bf24afe7fab6f046bf4923da8de6-Abstract.htmlGoogle Scholar
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2019. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=rJ4km2R5t7Google Scholar
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, and Denny Zhou. 2022. Self-Consistency Improves Chain of Thought Reasoning in Language Models. CoRR abs/2203.11171(2022).Google Scholar
Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. 2022. Emergent Abilities of Large Language Models. CoRR abs/2206.07682(2022). https://doi.org/10.48550/arXiv.2206.07682 arXiv:2206.07682Google ScholarCross Ref
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed H. Chi, Quoc Le, and Denny Zhou. 2022. Chain of Thought Prompting Elicits Reasoning in Large Language Models. CoRR abs/2201.11903(2022).Google Scholar
Johannes Welbl, Pontus Stenetorp, and Sebastian Riedel. 2018. Constructing Datasets for Multi-hop Reading Comprehension Across Documents. Trans. Assoc. Comput. Linguistics 6 (2018), 287–302.Google ScholarCross Ref
Jason Weston, Antoine Bordes, Sumit Chopra, and Tomás Mikolov. 2016. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In ICLR (Poster).Google Scholar
Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2018. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), Marilyn A. Walker, Heng Ji, and Amanda Stent (Eds.). Association for Computational Linguistics, 1112–1122. https://doi.org/10.18653/v1/n18-1101Google ScholarCross Ref
Tomer Wolfson, Mor Geva, Ankit Gupta, Yoav Goldberg, Matt Gardner, Daniel Deutch, and Jonathan Berant. 2020. Break It Down: A Question Understanding Benchmark. Trans. Assoc. Comput. Linguistics 8 (2020), 183–198. https://doi.org/10.1162/TACL_A_00309Google ScholarCross Ref
Yuxiang Wu, Matt Gardner, Pontus Stenetorp, and Pradeep Dasigi. 2022. Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, 2660–2676. https://doi.org/10.18653/v1/2022.acl-long.190Google ScholarCross Ref
Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. 2022. An Explanation of In-context Learning as Implicit Bayesian Inference. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https://openreview.net/forum?id=RdJVFCHjUMIGoogle Scholar
Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan. 2020. CLUE: A Chinese Language Understanding Evaluation Benchmark. In Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), December 8-13, 2020, Donia Scott, Núria Bel, and Chengqing Zong (Eds.). International Committee on Computational Linguistics, 4762–4772. https://doi.org/10.18653/V1/2020.COLING-MAIN.419Google ScholarCross Ref
Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, and Johan Bos. 2019. Can Neural Networks Understand Monotonicity Reasoning?. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP@ACL 2019, Florence, Italy, August 1, 2019, Tal Linzen, Grzegorz Chrupala, Yonatan Belinkov, and Dieuwke Hupkes (Eds.). Association for Computational Linguistics, 31–40. https://doi.org/10.18653/v1/W19-4804Google ScholarCross Ref
Hitomi Yanaka, Koji Mineshima, Daisuke Bekki, Kentaro Inui, Satoshi Sekine, Lasha Abzianidze, and Johan Bos. 2019. HELP: A Dataset for Identifying Shortcomings of Neural Models in Monotonicity Reasoning. In Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics, *SEM@NAACL-HLT 2019, Minneapolis, MN, USA, June 6-7, 2019, Rada Mihalcea, Ekaterina Shutova, Lun-Wei Ku, Kilian Evang, and Soujanya Poria (Eds.). Association for Computational Linguistics, 250–255. https://doi.org/10.18653/v1/s19-1027Google ScholarCross Ref
Kaiyu Yang, Jia Deng, and Danqi Chen. 2022. Generating Natural Language Proofs with Verifier-Guided Search. CoRR abs/2205.12443(2022). https://doi.org/10.48550/arXiv.2205.12443 arXiv:2205.12443Google ScholarCross Ref
Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, and Furu Wei. 2022. Language Models as Inductive Reasoners. CoRR abs/2212.10923(2022). https://doi.org/10.48550/arXiv.2212.10923 arXiv:2212.10923Google ScholarCross Ref
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, and Christopher D. Manning. 2018. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. In EMNLP. Association for Computational Linguistics, 2369–2380.
Michihiro Yasunaga, Hongyu Ren, Antoine Bosselut, Percy Liang, and Jure Leskovec. 2021. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. In NAACL-HLT. Association for Computational Linguistics, 535–546.
Xi Ye, Srinivasan Iyer, Asli Celikyilmaz, Ves Stoyanov, Greg Durrett, and Ramakanth Pasunuru. 2022. Complementary Explanations for Effective In-Context Learning. CoRR abs/2211.13892 (2022). https://doi.org/10.48550/arXiv.2211.13892 arXiv:2211.13892
Da Yin, Liunian Harold Li, Ziniu Hu, Nanyun Peng, and Kai-Wei Chang. 2021. Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 2115–2129. https://doi.org/10.18653/v1/2021.emnlp-main.162
Wenpeng Yin, Dragomir R. Radev, and Caiming Xiong. 2021. DocNLI: A Large-scale Dataset for Document-level Natural Language Inference. In Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021 (Findings of ACL, Vol. ACL/IJCNLP 2021), Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli (Eds.). Association for Computational Linguistics, 4913–4922. https://doi.org/10.18653/v1/2021.findings-acl.435
Nathan Young, Qiming Bao, Joshua Bensemann, and Michael Witbrock. 2022. AbductionRules: Training Transformers to Explain Unexpected Inputs. In Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, May 22-27, 2022, Smaranda Muresan, Preslav Nakov, and Aline Villavicencio (Eds.). Association for Computational Linguistics, 218–227. https://doi.org/10.18653/v1/2022.findings-acl.19
Jianxing Yu, Wei Liu, Shuang Qiu, Qinliang Su, Kai Wang, Xiaojun Quan, and Jian Yin. 2020. Low-Resource Generation of Multi-hop Reasoning Questions. In ACL. Association for Computational Linguistics, 6729–6739.
Ping Yu, Tianlu Wang, Olga Golovneva, Badr AlKhamissy, Gargi Ghosh, Mona T. Diab, and Asli Celikyilmaz. 2022. ALERT: Adapting Language Models to Reasoning Tasks. CoRR abs/2212.08286 (2022). https://doi.org/10.48550/arXiv.2212.08286 arXiv:2212.08286
Weihao Yu, Zihang Jiang, Yanfei Dong, and Jiashi Feng. 2020. ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning. In ICLR. OpenReview.net.
Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah D. Goodman. 2022. STaR: Bootstrapping Reasoning With Reasoning. CoRR abs/2203.14465 (2022).
Rowan Zellers, Yonatan Bisk, Roy Schwartz, and Yejin Choi. 2018. SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii (Eds.). Association for Computational Linguistics, 93–104. https://doi.org/10.18653/v1/d18-1009
Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. HellaSwag: Can a Machine Really Finish Your Sentence? In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, Anna Korhonen, David R. Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, 4791–4800. https://doi.org/10.18653/v1/p19-1472
Honghua Zhang, Liunian Harold Li, Tao Meng, Kai-Wei Chang, and Guy Van den Broeck. 2022. On the Paradox of Learning to Reason from Data. CoRR abs/2205.11502 (2022).
Li Zhang, Qing Lyu, and Chris Callison-Burch. 2020. Reasoning about Goals, Steps, and Temporal Ordering with WikiHow. In EMNLP (1). Association for Computational Linguistics, 4630–4639.
Xikun Zhang, Antoine Bosselut, Michihiro Yasunaga, Hongyu Ren, Percy Liang, Christopher D. Manning, and Jure Leskovec. 2022. GreaseLM: Graph REASoning Enhanced Language Models. In ICLR. OpenReview.net.
Zhuosheng Zhang, Aston Zhang, Mu Li, and Alex Smola. 2022. Automatic Chain of Thought Prompting in Large Language Models. CoRR abs/2210.03493 (2022).
Chen Zhao, Chenyan Xiong, Corby Rosset, Xia Song, Paul N. Bennett, and Saurabh Tiwary. 2020. Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention. In ICLR. OpenReview.net.
Chen Zheng and Parisa Kordjamshidi. 2020. SRLGRN: Semantic Role Labeling Graph Reasoning Network. In EMNLP (1). Association for Computational Linguistics, 8881–8891.
Victor Zhong and Luke Zettlemoyer. 2019. E3: Entailment-driven Extracting and Editing for Conversational Machine Reading. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, Anna Korhonen, David R. Traum, and Lluís Màrquez (Eds.). Association for Computational Linguistics, 2310–2320. https://doi.org/10.18653/v1/p19-1223
Wanjun Zhong, Tingting Ma, Jiahai Wang, Jian Yin, Tiejun Zhao, Chin-Yew Lin, and Nan Duan. 2022. Disentangling Reasoning Capabilities from Language Models with Compositional Reasoning Transformers. CoRR abs/2210.11265 (2022).
Wanjun Zhong, Siyuan Wang, Duyu Tang, Zenan Xu, Daya Guo, Yining Chen, Jiahai Wang, Jian Yin, Ming Zhou, and Nan Duan. 2022. Analytical Reasoning of Text. In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, United States, July 10-15, 2022, Marine Carpuat, Marie-Catherine de Marneffe, and Iván Vladimir Meza Ruíz (Eds.). Association for Computational Linguistics, 2306–2319. https://doi.org/10.18653/v1/2022.findings-naacl.177
Ben Zhou, Kyle Richardson, Xiaodong Yu, and Dan Roth. 2022. Learning to Decompose: Hypothetical Question Decomposition Based on Comparable Texts. CoRR abs/2210.16865 (2022). https://doi.org/10.48550/arXiv.2210.16865 arXiv:2210.16865
Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, and Ed Chi. 2022. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. CoRR abs/2205.10625 (2022).
Pei Zhou, Rahul Khanna, Seyeon Lee, Bill Yuchen Lin, Daniel Ho, Jay Pujara, and Xiang Ren. 2021. RICA: Evaluating Robust Inference Capabilities Based on Commonsense Axioms. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 7560–7579. https://doi.org/10.18653/v1/2021.emnlp-main.598

Index Terms

Natural Language Reasoning, A Survey
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning

Published in

ACM Computing Surveys Just Accepted
ISSN:0360-0300
EISSN:1557-7341

Copyright © 2024 Copyright held by the owner/author(s).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Online AM: 9 May 2024
- Accepted: 26 April 2024
- Revised: 9 March 2024
- Received: 6 May 2023
Author Tags
natural language reasoning
pre-trained language models
Qualifiers
- survey