Reinforcement learning (RL) is a powerful tool for optimizing control laws, but it can be slow and computationally expensive. In this article, we propose an efficient approach that combines RL with a plant model identified from experimental data. This allows a control law to be learned faster and with fewer training trials than traditional RL methods.
The key idea is to learn both the plant dynamics and the control law from a single experiment. This is achieved by Algorithm 1, which exploits the discrete-time cost function used in RL. Using this algorithm avoids tedious hyperparameter tuning and improves the policy evaluation and policy improvement steps.
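As a rough illustration of the model-learning step, the sketch below fits a discrete-time linear model x[k+1] = A x[k] + B u[k] to a single recorded trajectory by least squares. The linearity assumption, the array shapes, and the helper name are ours for illustration only and are not taken from Algorithm 1 itself.

import numpy as np

def fit_linear_model(states, inputs):
    # Fit x[k+1] = A x[k] + B u[k] to one recorded trajectory by least squares.
    # states: (T+1, n) array of visited states; inputs: (T, m) array of applied inputs.
    # Hypothetical helper for illustration; the paper's Algorithm 1 may differ.
    X_now = states[:-1]
    X_next = states[1:]
    Phi = np.hstack([X_now, inputs])               # regressors [x[k], u[k]]
    Theta, *_ = np.linalg.lstsq(Phi, X_next, rcond=None)
    n = states.shape[1]
    A = Theta[:n].T                                # (n, n) state matrix estimate
    B = Theta[n:].T                                # (n, m) input matrix estimate
    return A, B

The estimated A and B could then be fed to the policy iteration sketch shown after the next paragraph.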
To keep the learning process efficient, policy evaluation and policy improvement are repeated until a termination condition is satisfied. This condition is governed by a scalar tolerance ε, which specifies how close the learned control law must be to the optimal solution.
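To make the ε-based termination concrete, the following sketch alternates policy evaluation and improvement for a linear plant under a discrete-time quadratic cost, stopping once the feedback gain changes by less than ε. The quadratic-cost setting, the gain-change test, and the function names are assumptions for illustration rather than the paper's exact procedure.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_iteration(A, B, Q, R, K0, eps=1e-6, max_iters=100):
    # Alternate policy evaluation and improvement for the discrete-time quadratic
    # cost sum_k x[k]'Q x[k] + u[k]'R u[k], stopping when the feedback gain changes
    # by less than eps. Illustrative only; the paper's termination test may differ.
    K = K0
    for _ in range(max_iters):
        # Policy evaluation: cost matrix P of the current policy u = -K x
        Acl = A - B @ K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy improvement: greedy gain with respect to P
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        if np.linalg.norm(K_new - K) < eps:        # eps-based termination condition
            return K_new, P
        K = K_new
    return K, P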
The proposed approach was tested on a simple example of a plant with known dynamics, and the results showed that it learns a control law with better performance than traditional RL methods. This demonstrates the potential of our approach for improving transient learning performance and designing a control law with a smaller cost than RL alone under the same number of training trials.
In conclusion, the proposed approach provides an efficient way to learn a control law using reinforcement learning. By combining Algorithm 1 with a termination condition based on ε, the policy evaluation and improvement steps are improved without oversimplifying the process. This is particularly relevant for applications where real-time control is critical, such as robotics and autonomous vehicles.