Amr Gomaa
Doctoral Researcher
Research Interests
- Computer Vision
- Applied Machine Learning
- Deep Learning
- Reinforcement Learning
- Multi-modal Interaction
- Human Computer Interaction
- Adaptive Interfaces
Open Thesis Topics
Task-oriented grasping is a well-known problem in the robotics domain. Modeling the complex relation between grasp types, objects, and the intended task has been studied for a long time. However, most existing methods still focus on closed-world concepts and non-anthropomorphic solutions using two-finger grippers. More recently, researchers have incorporated Large Language Models (LLMs) to analyze and describe objects based on tasks for a grasping robot. In this thesis, you will focus on an LLM-based solution for a five-finger anthropomorphic robotic hand that goes beyond the limitations of existing semantic knowledge-based methods.
Moreover, you will combine this information with vision-based input and geometric data to predict a grasp type and then plan and execute the grasp. Because the full problem is relatively complex, you should work in simulation and focus only on the most fundamental grasp types. You can refer to the following two papers (among others you might find yourself) that focus on this problem.
Prerequisites
- Please read the following papers: [1] [2] [3]
- Familiarity with deep learning concepts, Large Language Models, and simulation environments (e.g., Unity)
- Completed Statistics, Machine Learning, and/or HCI courses
- Strong programming skills
How to apply
Please send us an email with the following pieces of information:
- When you plan to start the thesis
- When you plan to finish the thesis
- A short motivational statement on why this topic interests you
- A summary of why you would be a good fit for this topic
- Your transcript of records and CV
Recent advances in Machine Learning (specifically Deep Learning) have allowed robots to understand objects and the surrounding environment on a perceptual, non-symbolic level (e.g., object detection, sensor fusion, and language understanding). However, a trending area of research is to understand objects on a conceptual, symbolic level, so that robots can reason more like humans. Deep Reinforcement Learning (RL) has recently attempted to combine these symbolic and non-symbolic learning paradigms implicitly, but it has several drawbacks: (1) it needs a very long training time compared with traditional deep learning approaches, (2) convergence to an optimal policy is not guaranteed, and the agent can get stuck in a sub-optimal policy, and (3) an RL agent is trained in a simulated environment, so it cannot foresee actions that exist only in the real environment. The goal of this thesis is to train a robot that explicitly learns on both the perceptual and conceptual levels through direct feedback from a human expert, along with its existing view (i.e., sensors) of the world.
Focus
This work will focus on Reinforcement Learning, Imitation Learning, and the combination of both. It will involve the real-time implementation of a working system.
Prerequisites
- Please read the following papers: [1] [2] [3] [4] [5]
- Background or interest in RL, Computer Vision, or AI Planning
- Completed HCI, Statistics, and/or Machine Learning courses
- Strong programming skills
- Background in Unity or other simulation environments is a plus
How to apply
Please send me an email with the following pieces of information:
- When you plan to start the thesis
- When you plan to finish the thesis
- A short motivational statement on why this topic interests you
- A summary of why you would be a good fit for this topic
- Your transcript of records and CV
Reference resolution is a trending topic that remains unsolved due to the high variance in users' behavior when performing a referencing task. Reference resolution means identifying the object a user intends to select through speech, pointing, gaze, or a multi-modal fusion of these modalities. Referencing is used in multiple HCI domains, such as Human-Robot Interaction (HRI) [Nickel et al. 2003; Whitney et al. 2016; Kontogiorgos et al. 2018; Sibirtseva et al. 2019] and vehicle and drone interaction [Rümelin et al. 2013; Roider et al. 2017; Gomaa et al. 2020]. However, most current research focuses on a stationary first-person view when interacting with the object. In this thesis, you will work on multi-modal, real-time reference resolution using speech, gaze, and/or pointing gestures from a moving source when interacting with a vehicle, industrial robot, or retail delivery drone.
Focus
This work will focus on gesture identification, gaze tracking, object detection, speech recognition, and/or modality fusion techniques. It will involve the real-time implementation of a working system.
Prerequisites
- Please read the following papers: [1] [2] [3] [4] [5] [6]
- Background or interest in gesture recognition, NLP, or gaze tracking
- Completed HCI, Statistics, and/or Machine Learning courses
- Strong programming skills
How to apply
Please send me an email with the following pieces of information:
- When you plan to start the thesis
- When you plan to finish the thesis
- A short motivational statement on why this topic interests you
- A summary of why you would be a good fit for this topic
- Your transcript of records and CV
Projects
CAMELOT (finished)
TRACTAT (finished)
APX-HMI (finished)
Teaching
Seminar: Speech-based Adaptation of Personalized User Interfaces (Winter 2022/2023)
Adaptive Human Machine Interfaces for Autonomous Systems (Winter 2021/2022)
Hybrid Machine Learning Approaches and Applications (Winter 2020/2021)
Seminar: Automotive User Interfaces (Winter 2019/2020)
Academic Services
AC at AutomotiveUI WIP 2023, PC at HITLAML Workshop 2023
Reviewer: IMWUT 2023, CHI 2024 & 2023, IEEE VR 2023, NordiCHI 2022, AutomotiveUI 2023 & 2022 & 2021, CHI PLAY 2021, ICMI 2021, and IEEE AIVR 2021.