Machine learning for dynamic maneuvers using Bayesian gradient estimates
Program Director UROP
- +49 241 80-90695
- Project Offer-Number:
- UROP International
- Computer Science
- Organisation unit:
- Institute for Data Science in Mechanical Engineering
- Language Skills:
- Computer Skills:
- Python programming
Reinforcement learning (RL) aims to find an optimal policy by interacting with an environment. Because the policy is learned from trial and error, learning complex behavior typically requires a vast number of samples, which can be prohibitive in practice, especially on physical hardware. We recently proposed a method that increases data efficiency by using Bayesian gradient estimates. The method systematically reasons about uncertainty and actively chooses informative samples. We are looking for a UROP student researcher who will apply the method to learn a highly dynamic swing-up maneuver on an inverted pendulum on hardware. The aim is to show that complex behavior can be learned directly on hardware by using data-efficient methods.
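To give a rough idea of what a Bayesian gradient estimate can look like, the following minimal sketch fits a Gaussian process (GP) to a handful of evaluated policy parameters and reads off the posterior mean and variance of the gradient at a query point. It is an illustration only and not the project's actual algorithm: the one-dimensional objective `toy_return`, the squared-exponential kernel, and the hyperparameters `ell` and `noise_var` are all assumptions chosen for this example, and the linear solver is written out by hand only to keep the sketch free of dependencies.

```python
import math


def rbf(a, b, ell):
    """Squared-exponential kernel k(a, b) with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_gradient(thetas, returns, query, ell=1.0, noise_var=1e-4):
    """Posterior mean and variance of dJ/dtheta at `query` under a zero-mean GP."""
    n = len(thetas)
    # Kernel matrix of the observed parameters, plus observation noise.
    K = [[rbf(thetas[i], thetas[j], ell) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Derivative of k(query, theta_i) with respect to the query point.
    dk = [-(query - t) / ell ** 2 * rbf(query, t, ell) for t in thetas]
    alpha = solve(K, returns)
    mean = sum(d * a for d, a in zip(dk, alpha))
    # The prior variance of the derivative is 1 / ell^2 for this kernel.
    beta = solve(K, dk)
    var = 1.0 / ell ** 2 - sum(d * b for d, b in zip(dk, beta))
    return mean, var


def toy_return(theta):
    """Hypothetical stand-in for the return of a policy with parameter theta."""
    return -(theta - 1.0) ** 2


thetas = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
returns = [toy_return(t) for t in thetas]
grad_mean, grad_var = gp_gradient(thetas, returns, query=0.0)
print(f"gradient estimate at theta=0: {grad_mean:.3f} (variance {grad_var:.5f})")
```

The variance term is what makes the estimate useful for choosing samples: a learner could, for instance, evaluate the policy next at the parameter whose observation would reduce this variance the most, rather than sampling at random.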
The UROP student researcher will be asked to do the following, with the help of the supervisor: (1) read the current project paper(s) and a small amount of background material on the algorithm used in the project, (2) implement and perform learning experiments in simulation, (3) perform learning experiments on hardware available in our lab, and (4) present the final work at the end of the program.
- Engineering, computer science, and/or mathematics educational background
- Comfortable programming in Python
- Interest in optimization, probability theory, machine learning, and reinforcement learning
- Highly motivated, ready to problem-solve and work independently (with supervisor support)