A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making

Online AM: 11 May 2024

Abstract

In the field of Sequential Decision Making (SDM), two paradigms have historically vied for supremacy: Automated Planning (AP) and Reinforcement Learning (RL). In the spirit of reconciliation, this paper reviews AP, RL and hybrid methods (e.g., novel learning-to-plan techniques) for solving Sequential Decision Processes (SDPs), focusing on their knowledge representation: symbolic, subsymbolic or a combination of both. Additionally, it covers methods for learning the SDP structure itself. Finally, we compare the advantages and drawbacks of the existing methods and conclude that neurosymbolic AI constitutes a promising approach for SDM, since it combines AP and RL with a hybrid knowledge representation.
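
As a minimal, purely illustrative sketch (not taken from the paper, with a hypothetical toy problem and hypothetical names and parameters), the example below casts one corridor-navigation SDP in both paradigms contrasted above: a symbolic view, in which an explicit action model is searched by a planner (plain breadth-first search here), and a subsymbolic view, in which tabular Q-learning estimates action values from reward signals alone.

```python
# Illustrative sketch only (hypothetical toy problem, not from the paper):
# a 1-D corridor SDP where the agent must move from position 0 to position 4.
import random

ACTIONS = {"left": -1, "right": +1}
N, START, GOAL = 5, 0, 4

def step(state, action):
    """Deterministic transition model of the toy SDP (positions are clamped)."""
    return min(max(state + ACTIONS[action], 0), N - 1)

# Symbolic view: the transition model is explicit, so a planner can search it.
def plan_bfs(start=START, goal=GOAL):
    frontier, seen = [(start, [])], {start}
    while frontier:
        state, plan = frontier.pop(0)
        if state == goal:
            return plan
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

# Subsymbolic view: no declarative model is consulted; action values are
# learned from reward by tabular Q-learning with an epsilon-greedy policy.
def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

    def greedy(s):  # break ties randomly so early exploration is unbiased
        best = max(q[(s, a)] for a in ACTIONS)
        return random.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s = START
        while s != GOAL:
            a = random.choice(list(ACTIONS)) if random.random() < eps else greedy(s)
            s2 = step(s, a)
            r = 1.0 if s2 == GOAL else 0.0
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
            s = s2
    return q

print(plan_bfs())                                 # ['right', 'right', 'right', 'right']
q = q_learning()
print(max(ACTIONS, key=lambda a: q[(START, a)]))  # greedy action at START after training
```

The planner exploits the declarative model directly, while the learner recovers an equivalent policy only through interaction; the hybrid and neurosymbolic methods reviewed in the paper aim to combine both sources of knowledge.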

• Published in

  ACM Computing Surveys (Just Accepted)
  ISSN: 0360-0300
  EISSN: 1557-7341

Copyright © 2024 held by the owner/author(s). Publication rights licensed to ACM.

                                                          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

                                                          Publisher

                                                          Association for Computing Machinery

                                                          New York, NY, United States

                                                          Publication History

                                                          • Online AM: 11 May 2024
                                                          • Accepted: 13 April 2024
                                                          • Revised: 5 April 2024
                                                          • Received: 5 May 2023

                                                          Qualifiers

                                                          • survey