Location: The Winokur Family Hall, SEC 1.321 at 150 Western Ave, Allston, MA 02134 (if the main room is full, overflow seating is available in SEC 1.413)
Recording: https://www.youtube.com/playlist?list=PLH1hTGoJ9GtKuJf8mQBQfBrd9e_Boic4c
Jan 23 2025: Workshop Day 1
- 8:15-8:40am: Registration, coffee, breakfast
- 8:40-9:00am: Opening remarks:
- Dr. Michael Littman, Dr. Anthony A. Maciejewski, and Dr. Anthony Kuh from NSF;
- Dean David Parkes and Prof. Na (Lina) Li from Harvard SEAS
- 9:00-9:40am: Keynote 1: Disentangling Goals from Beliefs
- Benjamin Van Roy
- Stanford University
- Recording: https://youtu.be/s4at4Mlr9E8

Abstract and Bio
As AI agents generate increasingly sophisticated behaviors, manually encoding human preferences to guide these agents becomes ever more challenging. To address this, it has been suggested that agents instead learn preferences from human choice data. Common approaches, however, conflate human beliefs and goals. As a consequence, even on tasks where an agent is better informed or more capable than humans, it tends to imitate humans rather than leverage its advantage to best serve human goals. We will discuss how to address this when learning from choices between partial trajectories of states and actions, as well as broader implications for the future evolution of AI.
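As a concrete reference point for what "learning from choices between partial trajectories" typically looks like, here is a minimal sketch of the standard Bradley-Terry-style reward-model fit used by approaches of this kind; the linear features, dimensions, and simulated chooser are illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr = 8, 0.1                       # feature dimension and step size (assumed)
theta = np.zeros(d)                  # learned linear reward: r(s, a) = theta @ phi(s, a)
theta_true = rng.normal(size=d)      # hidden "human" reward, used only to simulate choices

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dataset: pairs of 10-step trajectory segments, each row a feature
# vector phi(s_t, a_t); the simulated human picks the higher-true-return segment.
pairs = []
for _ in range(100):
    a, b = rng.normal(size=(10, d)), rng.normal(size=(10, d))
    if a.sum(axis=0) @ theta_true < b.sum(axis=0) @ theta_true:
        a, b = b, a                  # put the chosen segment first
    pairs.append((a, b))

for _ in range(200):                 # gradient ascent on the Bradley-Terry log-likelihood
    grad = np.zeros(d)
    for seg_win, seg_lose in pairs:
        diff = seg_win.sum(axis=0) - seg_lose.sum(axis=0)  # return-feature difference
        p = sigmoid(theta @ diff)    # P(chosen segment preferred) under the model
        grad += (1.0 - p) * diff     # gradient of log p with respect to theta
    theta += lr * grad / len(pairs)
```

The modeling choice the talk questions is visible here: the choice probability depends only on the segments' modeled returns, so whatever the human believed about uncertain outcomes gets folded into the learned "goal".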
Bio: Benjamin Van Roy is a Professor at Stanford University, where he has served on the faculty since 1998. His research interests center on the design and analysis of reinforcement learning agents. He founded the Efficient Agent Team at Google DeepMind, and has also led research programs at Morgan Stanley, Unica (acquired by IBM), and Enuvis (acquired by SiRF), which he co-founded. He received the SB in Computer Science and Engineering and the SM and PhD in Electrical Engineering and Computer Science, all from MIT, where his doctoral research was advised by John N. Tsitsiklis. He is a Fellow of the IEEE and INFORMS and is a recipient of the INFORMS Lanchester Prize.
- 9:40-10:20am: Keynote 2: Augmenting Our Abilities: RL for Decision Assistance
- Finale Doshi-Velez
- Harvard University
- Recording: https://youtu.be/odl6DJNHOvY

Abstract and Bio
RL is about decision making, but there are many kinds of decisions that we either cannot or should not leave fully to agents. For example, in health settings, people may have access to measurements or past medical history that are not available to the agent. They may also have preferences that have not made it into the agent’s reward function. In this talk, I’ll discuss two categories of RL problems in this context. The first is helping people identify promising policies from offline data. The second is optimizing the human+agent interaction itself to produce the best outcomes at the point of use. For each, I’ll provide examples from my own work as well as open questions. More broadly, I’ll encourage us to think deeply about the different shapes of problems in RL: we have many methods, and those methods sometimes work amazingly well. Perhaps a better method will work more often. But we could also do well by better understanding the conditions under which a method can be expected to work.
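For the first category, one standard baseline for vetting candidate policies against logged data (a common tool, not necessarily the approach discussed in the talk) is per-trajectory importance sampling. A minimal sketch, assuming the behavior policy's action probabilities were logged:

```python
import numpy as np

def is_value_estimate(trajs, pi_e, gamma=0.99):
    """Ordinary importance-sampling estimate of a candidate policy's value.

    trajs: list of trajectories, each a list of (state, action, reward, b_prob)
           tuples, where b_prob is the logged behavior-policy probability.
    pi_e:  function (state, action) -> action probability under the candidate.
    """
    estimates = []
    for traj in trajs:
        weight, ret = 1.0, 0.0
        for t, (s, a, r, b_prob) in enumerate(traj):
            weight *= pi_e(s, a) / b_prob   # cumulative likelihood ratio
            ret += (gamma ** t) * r         # discounted return
        estimates.append(weight * ret)
    est = np.asarray(estimates)
    return est.mean(), est.std(ddof=1) / np.sqrt(len(est))  # value, standard error
```

The estimate is unbiased, but its variance grows quickly with trajectory length, which is one reason surfacing a handful of promising policies for a human to inspect is more realistic than trusting a single point estimate.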
Bio: Finale Doshi-Velez is a Herchel Smith Professor in Computer Science at the Harvard Paulson School of Engineering and Applied Sciences. She completed her MSc from the University of Cambridge as a Marshall Scholar, her PhD from MIT, and her postdoc at Harvard Medical School. Her interests lie at the intersection of machine learning, healthcare, and interpretability.
- 10:20-10:35am: Break
- 10:35-11:45am: Panel 1: Bridging Theory and Practice
- Bo Dai (Georgia Tech and Google), Lead Organizer
- Mingyi Hong (UMN), Scribe
- Recording: https://youtu.be/1ZgLCXJ56Sg
- Panelists:
- Moderator (Amazon)
- Panelist (MIT)
- Panelist (CMU)
- Panelist (UMD)
- Panelist (Harvard)
- Panelist (UW)
- 11:45-12:30pm: Lightning talk session 1
- Recording: https://youtu.be/FIHONKJDhL0
- 12:30-1:45pm: Lunch and poster session
- 1:45-2:25pm: Keynote 3: Multiagent RL: Cooperation and Competition
- Peter Stone
- The University of Texas at Austin and Sony AI
- Recording: https://youtu.be/BFcGjEQkQww

Abstract and Bio
As autonomous agents proliferate in the real world, both in software and robotic settings, they will increasingly need to interact with each other, sometimes as teammates and sometimes as adversaries. This talk will begin by considering agents that need to band together for cooperative activities with previously unfamiliar teammates. In such “ad hoc” team settings, team strategies cannot be developed a priori. Rather, agents must learn to cooperate with many types of teammates: they must collaborate without pre-coordination.
In contrast, automobile racing represents an extreme example of competitive interaction. In this setting, drivers must execute complex tactical maneuvers to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating their complex multiagent interactions. The talk ends with an overview of Gran Turismo Sophy, an RL agent that won a head-to-head competition against four of the world’s best Gran Turismo drivers and is now available for people around the world to compete against.
Bio: Dr. Peter Stone holds the Truchard Foundation Chair in Computer Science at the University of Texas at Austin. He is Associate Chair of the Computer Science Department, as well as Director of Texas Robotics. In 2013 he was awarded the University of Texas System Regents’ Outstanding Teaching Award and in 2014 he was inducted into the UT Austin Academy of Distinguished Teachers, earning him the title of University Distinguished Teaching Professor. Professor Stone’s research interests in Artificial Intelligence include machine learning (especially reinforcement learning), multiagent systems, and robotics. Professor Stone received his Ph.D. in Computer Science in 1998 from Carnegie Mellon University. From 1999 to 2002 he was a Senior Technical Staff Member in the Artificial Intelligence Principles Research Department at AT&T Labs – Research. He is an Alfred P. Sloan Research Fellow, Guggenheim Fellow, AAAI Fellow, IEEE Fellow, AAAS Fellow, ACM Fellow, Fulbright Scholar, and 2004 ONR Young Investigator. In 2007 he received the prestigious IJCAI Computers and Thought Award, given biennially to the top AI researcher under the age of 35, and in 2016 he was awarded the ACM/SIGAI Autonomous Agents Research Award. Professor Stone co-founded Cogitai, Inc., a startup company focused on continual learning, in 2015, and currently serves as Chief Scientist of Sony AI.
- 2:25-3:05pm: Keynote 4: Advancing Reinforcement and Imitation Learning for Robust and Scalable Robotics
- Jonathan How
- MIT
- Recording: https://youtu.be/Pt8G9-H1vB4

Abstract and Bio
Reinforcement learning (RL) has shown great promise for enabling agents to learn complex behaviors in robotics, but real-world deployment faces challenges like generalization across dynamics, robustness to disturbances, and efficient task adaptation. This talk explores these challenges in the context of ground and aerial robotics, presenting innovative RL- and Imitation Learning-based frameworks for single- and multi-agent problems. For single-agent flight control, we introduce the Robust Tube Model Predictive Control (RTMPC) framework, which generates robust trajectories under uncertainty. Using Tube-Guided Data Augmentation, RTMPC trains policies resilient to disturbances while significantly reducing the need for large datasets. Building on this, we present GRAM, a deep RL framework for dynamics generalization that combines in-distribution adaptation with out-of-distribution robustness via an uncertainty-aware module, improving performance without expert-designed robustness. For multiagent planning, we highlight frameworks for decentralized and collaborative operations in uncertain environments. These include PUMA, a perception- and uncertainty-aware trajectory planner; PRIMER, which accelerates planning with imitation learning; and DYNUS, an optimization-based approach for fast, safe replanning. Finally, we discuss policy safety through uncertainty-aware planning and safety certification, introducing CARV for scalable reachability analysis and EVORA for probabilistic terrain modeling in navigation. Together, these frameworks demonstrate RL’s potential to address real-world challenges in robotics.
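As a rough illustration of the tube-guided data-augmentation idea mentioned above: states sampled inside the tube around a nominal trajectory can be labeled with the ancillary feedback law u = u_nom + K(x - x_nom), so a single demonstration yields many training pairs that already encode disturbance rejection. The dynamics, gain, and tube radius below are stand-ins for illustration, not the systems from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in ingredients (assumptions, not from the talk): a nominal trajectory,
# an ancillary feedback gain K, and a box-shaped tube cross-section.
x_nom = np.linspace(0.0, 1.0, 50)[:, None] * np.array([1.0, 0.5])  # (50, 2) nominal states
u_nom = np.gradient(x_nom, axis=0)    # crude stand-in for the nominal inputs
K = -0.8 * np.eye(2)                  # ancillary gain (in tube MPC, from a robust design)
tube_radius = 0.1                     # half-width of the tube around x_nom

def augment(x_nom, u_nom, K, radius, samples_per_step=20):
    """Label states sampled inside the tube with the ancillary controller."""
    X, U = [], []
    for x_bar, u_bar in zip(x_nom, u_nom):
        noise = rng.uniform(-radius, radius, size=(samples_per_step, x_bar.size))
        x = x_bar + noise             # perturbed states, still inside the tube
        u = u_bar + noise @ K.T       # u = u_nom + K (x - x_nom)
        X.append(x); U.append(u)
    return np.vstack(X), np.vstack(U)

X_train, U_train = augment(x_nom, u_nom, K, tube_radius)
# X_train, U_train can now supervise a policy network via ordinary regression,
# turning one nominal trajectory into 1000 disturbance-robust training pairs.
```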
Bio: Jonathan P. How is the Richard C. Maclaurin Professor of Aeronautics and Astronautics at the Massachusetts Institute of Technology. He received a B.A.Sc. (Aerospace) from the University of Toronto in 1987, and his S.M. and Ph.D. in Aeronautics and Astronautics from MIT in 1990 and 1993, respectively, and then spent 1.5 years at MIT as a postdoctoral associate. Prior to joining MIT in 2000, he was an assistant professor in the Department of Aeronautics and Astronautics at Stanford University. Dr. How was the editor-in-chief of the IEEE Control Systems Magazine (2015-19) and an associate editor for the AIAA Journal of Aerospace Information Systems (2012-21) and IEEE Transactions on Neural Networks and Learning Systems (2018-21). He was the Program Vice-Chair (tutorials) for the 2021 Conference on Decision and Control and is the Program Chair for the 2025 American Control Conference. He was elected to the Board of Governors of the IEEE CSS for 2020-22, is a member of the IEEE CSS Executive Committee (VP Finance, 2023-24), serves on the IEEE CSS Long Range Planning Committee (2022-), is a member of the IEEE CSS Technical Committees on Aerospace Control and on Intelligent Control, was a member of the IEEE Fellows Selection Committee for CSS (2021-22), and since 2021 has served as the AIAA Director on the American Automatic Control Council. He was a member of the USAF Scientific Advisory Board (SAB) from 2014 to 2017.
His research focuses on robust planning and learning under uncertainty, with an emphasis on multiagent systems, and he was the planning and control lead for the MIT DARPA Urban Challenge team. His work has been recognized with multiple awards, including the 2022 IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award, induction into the University of Toronto Engineering Hall of Distinction (2022), the 2020 IEEE CSS Distinguished Member Award, the 2020 AIAA Intelligent Systems Award, the 2015 AeroLion Technologies Outstanding Paper Award for Unmanned Systems, the 2015 IEEE CSS Video Clip Contest prize, the 2011 IFAC Automatica award for best applications paper, and the 2002 Institute of Navigation Burka Award. He also received the Air Force Commander’s Public Service Award in 2017. He is a Fellow of IEEE (2018) and AIAA (2016) and was elected to the National Academy of Engineering in 2021.
- 3:05-3:35pm: Break
- 3:35-4:45pm: Panel 2: RL and Control Synergies: Models, Data, and Policies
- Guannan Qu (CMU), Lead Organizer
- Kaiqing Zhang (UMD), Scribe
- Recording: https://youtu.be/uif8oWX9_T8
- Panelists:
- Moderator (USC)
- Panelist (MIT)
- Panelist (UMich)
- Panelist (BU)
- Panelist (Microsoft)
- Panelist (Harvard)
- 4:45-5:30pm: Lightning talk session 2
- Recording: https://youtu.be/FND9g-NkGDs
- 5:30-5:35pm: Concluding Remarks for Day 1 (Na (Lina) Li, Harvard)
- 5:35-6:45pm: Reception and poster session
Jan 24 2025: Workshop Day 2
- 8:15-8:45am: Breakfast and registration
- 8:45-9:20am: Special Keynote 1: A History of Machine Reinforcement Learning
- Andrew Barto
- UMass Amherst
- Recording: https://youtu.be/be_2JNwqkmw

Abstract and Bio
The idea of reinforcement learning (RL) as a key principle of animal learning has been around at least since Edward Thorndike proposed the “Law of Effect” in 1898. Machine implementation of this principle began with electro-mechanical machines in the 1930s, and the earliest proposal for a computer implementation was probably Turing’s 1948 “pleasure-pain system”. In this talk I review what followed Turing’s unimplemented proposal, from the first computer experiments in 1954 up to what we now know as modern computational RL.
Bio: Andrew Barto is Professor Emeritus of Information and Computer Sciences, University of Massachusetts Amherst, having retired in 2012. He served as Chair of the UMass Amherst Department of Computer Science from 2007 to 2011. He received a BS with distinction in mathematics from the University of Michigan in 1970, and a PhD in Computer and Communication Sciences in 1975, also from the University of Michigan. He joined the UMass Computer Science Department in 1977 as a Postdoctoral Research Associate, became an Associate Professor in 1982, and a Full Professor in 1991. Before retiring he co-directed the UMass Autonomous Learning Laboratory, which focused on reinforcement learning and produced many notable machine learning researchers. Professor Barto is a Fellow of the American Association for the Advancement of Science, a Fellow and Life Member of the IEEE, and has published over one hundred papers or chapters in journals, books, and conference and workshop proceedings. He is co-author of the book “Reinforcement Learning: An Introduction,” MIT Press, 1998, which has received over 25,000 citations. A much expanded second edition was published in 2018.
- 9:20-9:55am: Special Keynote 2: Exploration vs. Exploitation: From Adaptive Control to Reinforcement Learning
- P.R. Kumar
- Texas A&M
- Recording: https://youtu.be/S5OBefx2pmk

Abstract and Bio
We address the problem of exploration versus exploitation that lies at the heart of reinforcement learning of dynamic systems. We describe the Biased Maximum Likelihood Method proposed to address this challenge. We present a comparative study of its regret performance in a variety of contexts ranging from Bandits to Markov Decision Processes to LQG systems. We also provide an account of regulation problems where there is no intrinsic conflict between exploration and exploitation, and present a historical account of results on stability, asymptotic behavior and robustness. [Joint work with Akshay Mete, Rahul Singh, Ping-Chun Hsieh, Yu-Heng Hung, Xi Liu, and Anirban Bhattacharya].
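As a toy illustration of the reward-biasing idea in its simplest setting: for a unit-variance Gaussian bandit, adding a bias term alpha(t) * theta to each arm's log-likelihood before maximizing inflates the estimate of arm i to mu_hat_i + alpha(t)/n_i, so rarely pulled arms look optimistic and get explored. The Gaussian model and the alpha(t) = log t schedule below are assumptions chosen for illustration, not the talk's exact setting.

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.2, 0.5, 0.9])   # hidden arm means (simulation only)
K, T = len(true_means), 5000
counts = np.ones(K)                       # one forced pull per arm to start
sums = rng.normal(true_means, 1.0)        # rewards from those first pulls

for t in range(K, T):
    alpha = np.log(t + 1)                 # growing reward bias drives exploration
    mu_hat = sums / counts
    # Reward-biased MLE for unit-variance Gaussian arms: maximizing the
    # log-likelihood plus alpha * theta gives the index mu_hat + alpha / counts.
    index = mu_hat + alpha / counts
    arm = int(np.argmax(index))
    sums[arm] += rng.normal(true_means[arm], 1.0)
    counts[arm] += 1

print("pulls per arm:", counts.astype(int))  # the best arm dominates as t grows
```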
Bio: P. R. Kumar, B. Tech (1973, IIT Madras) and D.Sc. (1977, Washington Univ., St. Louis), was a faculty member in the Math Dept at University of Maryland, Baltimore County (1977-84), ECE and CSL at the University of Illinois, Urbana-Champaign (1985-2011), and has been at Texas A&M University since 2011. He has worked on problems in game theory, adaptive control, simulated annealing, machine learning, queueing networks, manufacturing systems, scheduling wafer fabrication plants, wireless networks and network information theory. His current research focus includes renewable energy, power systems, security, automated transportation, unmanned aerial vehicle traffic management, millimeter wave 5G, and cyber-physical systems. He is a member of the U.S. National Academy of Engineering, The World Academy of Sciences, and Indian National Academy of Engineering. He was awarded an honorary doctorate by ETH, Zurich. He received the Alexander Graham Bell Medal of IEEE, the IEEE Field Award for Control Systems, the Donald Eckman Award of the American Automatic Control Council, the Ellersick Prize of IEEE Communication Society, the Outstanding Contribution Award of ACM SIGMOBILE, the Infocom Achievement Award, the ACM SIGMOBILE Test-of-Time Paper Award, and COMSNETS Outstanding Contribution Award. He is a Fellow of IEEE, ACM and IFAC. He is an Honorary Professor at IIT Hyderabad.
- 9:55-10:10am: Break
- 10:10-11:20am: Panel 3: RL and Control in the GenAI Era
- Ben Eysenbach (Princeton), Lead Organizer
- Zhuoran Yang (Yale), Scribe
- Recording: https://youtu.be/6XJBiXK2cdE
- Panelists:
- Moderator (UW & Google)
- Panelist (UC Davis)
- Panelist (Georgia Tech)
- Panelist (MIT)
- Panelist (UT Austin)
- Panelist (Harvard)
- 11:25am-12:35pm: Breakout Discussion (invited participants only)
- Four breakout sessions, each devoted to a single focused topic.
- Topics: 1) RL <-> Control, 2) Theory <-> Practice, 3) Robotics/Autonomy, 4) Societal/Multiagent Systems.
- 12:45pm: Grab-and-go lunch (breakout session participants only) and departure