UROP Project

Machine learning for dynamic maneuvers using Bayesian gradient estimates

Contact

Name

Jan Müller

Program Director UROP

Telephone

+49 241 80-90299

E-Mail

Key Info

Basic Information

Project Offer-Number:
1209
Category:
UROP International
Field:
Computer Science
Faculty:
4
Organisation unit:
Institute for Data Science in Mechanical Engineering
Language Skills:
English
Computer Skills:
Python programming
Professor:
Prof. Sebastian Trimpe

Reinforcement learning (RL) aims to find an optimal policy by interacting with an environment. In practice, learning complex behavior this way requires a vast number of samples, which can be prohibitive. We recently proposed a method that increases data efficiency by using Bayesian gradient estimates. The method systematically reasons about uncertainty and actively chooses informative samples. We are looking for a UROP student researcher who will apply the method to learn a highly dynamic swing-up maneuver of an inverted pendulum on hardware. The aim is to show that complex behavior can be learned directly on hardware by using data-efficient methods.
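To give a rough idea of what a Bayesian gradient estimate can look like, the following is a minimal sketch and not the project's actual method. It assumes a Gaussian process (GP) with a squared-exponential kernel as a surrogate for the return as a function of the policy parameters, and computes the posterior mean and covariance of the gradient at the current parameters from a few noisy evaluations. The names `rbf_kernel`, `gp_gradient_posterior`, and `toy_return`, the kernel hyperparameters, and the quadratic toy objective are all hypothetical choices made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale, signal_var):
    """Squared-exponential kernel matrix between rows of A (n, d) and B (m, d)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_gradient_posterior(X, y, x0, lengthscale=1.0, signal_var=1.0, noise_var=1e-4):
    """Posterior mean and covariance of the gradient at x0 under a zero-mean GP.

    X: (n, d) evaluated parameters, y: (n,) noisy returns, x0: (d,) query point.
    """
    n, d = X.shape
    K = rbf_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(n)
    k0 = rbf_kernel(x0[None, :], X, lengthscale, signal_var)[0]  # shape (n,)
    # Derivative of k(x0, x_i) with respect to x0, one row per data point.
    dk = -(x0[None, :] - X) / lengthscale**2 * k0[:, None]  # shape (n, d)
    mean = dk.T @ np.linalg.solve(K, y)
    # Prior gradient covariance of this kernel is (signal_var / lengthscale^2) * I.
    prior_cov = (signal_var / lengthscale**2) * np.eye(d)
    cov = prior_cov - dk.T @ np.linalg.solve(K, dk)
    return mean, cov


def toy_return(theta):
    """Hypothetical stand-in for a rollout return; maximal at theta = 0."""
    return -np.sum(theta**2, axis=-1)


rng = np.random.default_rng(0)
theta0 = np.array([1.0, -0.5])                      # current policy parameters
X = theta0 + rng.uniform(-0.5, 0.5, size=(30, 2))   # sampled nearby parameters
y = toy_return(X) + 0.01 * rng.standard_normal(30)  # noisy return observations

grad_mean, grad_cov = gp_gradient_posterior(X, y, theta0)
true_grad = -2.0 * theta0  # analytic gradient of the toy objective

print("estimated gradient:", grad_mean)
print("analytic gradient: ", true_grad)
print("gradient variances:", np.diag(grad_cov))
```

In a sketch like this, the gradient covariance is what makes the estimate useful for active sampling: one could, for instance, evaluate next at the parameters expected to reduce that covariance the most. How the proposed method actually selects samples is not specified here.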

Task

With the help of the supervisor, the UROP student researcher will: (1) read the current project paper(s) and a small amount of background material on the algorithm used in the project, (2) implement and perform learning experiments in simulation, (3) perform learning experiments on the hardware available in our lab, and (4) present the final work at the end of the program.

Requirements

- Engineering, computer science, and/or mathematics educational background
- Comfortable programming in Python
- Interest in optimization, probability theory, machine learning and reinforcement learning
- Highly motivated, ready to problem-solve and work independently (with supervisor support)