A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making

Online AM: 11 May 2024

Abstract

In the field of Sequential Decision Making (SDM), two paradigms have historically vied for supremacy: Automated Planning (AP) and Reinforcement Learning (RL). In the spirit of reconciliation, this paper reviews AP, RL and hybrid methods (e.g., novel learning-to-plan techniques) for solving Sequential Decision Processes (SDPs), focusing on their knowledge representation: symbolic, subsymbolic or a combination of both. Additionally, it covers methods for learning the SDP structure itself. Finally, we compare the advantages and drawbacks of the existing methods and conclude that neurosymbolic AI constitutes a promising approach for SDM, since it combines AP and RL with a hybrid knowledge representation.
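
As a minimal, purely illustrative sketch (not taken from the paper, with a hypothetical toy problem and hypothetical names and parameters), the example below casts one corridor-navigation SDP in both paradigms contrasted above: a symbolic view, in which an explicit action model is searched by a planner (plain breadth-first search here), and a subsymbolic view, in which tabular Q-learning estimates action values from reward signals alone.

```python
# Illustrative sketch only (hypothetical toy problem, not from the paper):
# a 1-D corridor SDP where the agent must move from position 0 to position 4.
import random

ACTIONS = {"left": -1, "right": +1}
N, START, GOAL = 5, 0, 4

def step(state, action):
    """Deterministic transition model of the toy SDP (positions are clamped)."""
    return min(max(state + ACTIONS[action], 0), N - 1)

# Symbolic view: the transition model is explicit, so a planner can search it.
def plan_bfs(start=START, goal=GOAL):
    frontier, seen = [(start, [])], {start}
    while frontier:
        state, plan = frontier.pop(0)
        if state == goal:
            return plan
        for action in ACTIONS:
            nxt = step(state, action)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None

# Subsymbolic view: no declarative model is consulted; action values are
# learned from reward by tabular Q-learning with an epsilon-greedy policy.
def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.1):
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

    def greedy(s):  # break ties randomly so early exploration is unbiased
        best = max(q[(s, a)] for a in ACTIONS)
        return random.choice([a for a in ACTIONS if q[(s, a)] == best])

    for _ in range(episodes):
        s = START
        while s != GOAL:
            a = random.choice(list(ACTIONS)) if random.random() < eps else greedy(s)
            s2 = step(s, a)
            r = 1.0 if s2 == GOAL else 0.0
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
            s = s2
    return q

print(plan_bfs())                                 # ['right', 'right', 'right', 'right']
q = q_learning()
print(max(ACTIONS, key=lambda a: q[(START, a)]))  # greedy action at START after training
```

The planner exploits the declarative model directly, while the learner recovers an equivalent policy only through interaction; the hybrid and neurosymbolic methods reviewed in the paper aim to combine both sources of knowledge.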

• Published in

  ACM Computing Surveys (Just Accepted)
  ISSN: 0360-0300
  EISSN: 1557-7341

Copyright © 2024 held by the owner/author(s). Publication rights licensed to ACM.

                                                          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

                                                          Publisher

                                                          Association for Computing Machinery

                                                          New York, NY, United States

                                                          Publication History

                                                          • Online AM: 11 May 2024
                                                          • Accepted: 13 April 2024
                                                          • Revised: 5 April 2024
                                                          • Received: 5 May 2023

                                                          Qualifiers

                                                          • survey