In this article, we explore the challenges of developing artificial intelligence (AI) that can reason interactively and effectively in complex environments. We introduce the ScienceWorld benchmark, a platform for evaluating AI models on simulated, interactive tasks that demand human-like problem solving.
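To make the interaction model concrete, here is a minimal sketch of an agent loop against a ScienceWorld task. It assumes the open-source `scienceworld` Python package and its `ScienceWorldEnv` class; the task name, constructor arguments, and helper methods shown are assumptions based on the public API and may differ between versions.

```python
# Minimal random-agent loop against one ScienceWorld task (a sketch).
# Assumes: pip install scienceworld. Class and method names follow the
# package's public API, but exact signatures are assumptions and may
# differ between versions.
import random

from scienceworld import ScienceWorldEnv

env = ScienceWorldEnv("boil", envStepLimit=100)  # "boil" is assumed to be a valid task name
env.load("boil", variationIdx=0)                 # load one variation of the task
obs, info = env.reset()

for _ in range(10):
    # Sample from the currently valid commands rather than free-form text
    # (assumed helper that enumerates admissible action strings).
    action = random.choice(env.getValidActionObjectCombinations())
    obs, reward, done, info = env.step(action)
    print(f"> {action}\n{obs}\nreward={reward}")
    if done:
        break
```

A random agent like this establishes a floor; the capabilities discussed next are what separate it from an agent that can actually complete the tasks.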
To tackle these tasks, AI models must be capable of long-horizon planning, subgoal decomposition, spatial reasoning, exception handling, and commonsense reasoning. Traditional approaches fall short on these demands, which leads us to propose novel methods that scale instruction-finetuned language models to meet them.
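One way to ground this direction: an instruction-finetuned model can serve directly as an action policy, with the task description, action history, and current observation formatted into a prompt from which the model generates the next command. The sketch below uses the publicly available `google/flan-t5-large` checkpoint via Hugging Face Transformers; the prompt template is illustrative, not the paper's exact setup.

```python
# Sketch: an instruction-finetuned LM as a text-game action policy.
# The prompt template is an illustrative assumption, not the paper's format.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

def next_action(task: str, observation: str, history: list[str]) -> str:
    """Generate the next game command from the current context."""
    prompt = (
        f"Task: {task}\n"
        f"Previous actions: {', '.join(history) or 'none'}\n"
        f"Observation: {observation}\n"
        "What single action should be taken next?"
    )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=16)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

print(next_action("boil water",
                  "You are in the kitchen. You see a stove and a pot.",
                  []))
```

In practice a generated command may not be admissible in the environment, which is one reason capabilities such as subgoal decomposition and exception handling matter beyond raw text generation.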
We discuss related work on TextWorld, a learning environment for text-based games, which underscores the importance of interactive reasoning and the need for more capable AI systems. Our proposed approach leverages the strengths of instruction-finetuned language models while addressing their limitations on complex interactive tasks.
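For comparison, TextWorld exposes a similar step-based interface. Below is a minimal interaction sketch using its documented `textworld.start` entry point; `games/demo.z8` is a placeholder path for a game compiled with TextWorld's `tw-make` tool.

```python
# Sketch: interacting with a TextWorld game through its step-based API.
# "games/demo.z8" is a placeholder for a game built with tw-make.
import textworld

env = textworld.start("games/demo.z8")  # load a compiled game file
game_state = env.reset()
print(game_state.feedback)              # initial room description

done = False
while not done:
    command = input("> ")               # a human (or an agent) supplies the command
    game_state, reward, done = env.step(command)
    print(game_state.feedback)

env.close()
```

The structural similarity to the ScienceWorld loop above is what lets agents and policies developed for one environment transfer to the other.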
In summary, this article highlights the importance of developing AI that can reason interactively and effectively in dynamic, open-world environments. By introducing the ScienceWorld benchmark and proposing methods that overcome the limitations of traditional approaches, we take a meaningful step toward artificial general intelligence.