Wednesday, December 10, 2014

Reinforcement Learning or ADP?

I teach a course on Markov decision processes (MDPs) in the Fall. For those who have never heard of MDPs: the MDP is a widely studied tool in management science as well as in robotics. Much of my course focuses on a computational method called Reinforcement Learning (RL), which solves MDPs. Another more recent name for RL is ADP (approximate/adaptive dynamic programming). 

A student in my class asked which name is more appropriate: RL or ADP?

Now, there is some interesting history to these names that I shared with the student:

The classical approach to solving MDPs is called dynamic programming, and it was invented by Bellman and Howard in the 1950s and 1960s. Much later, in the 1980s, Reinforcement Learning emerged as a computational alternative to dynamic programming --- one that could solve MDPs with larger solution spaces than was possible with dynamic programming. 

The discovery of the Q-Learning algorithm by Watkins (1989) and that of the actor-critic algorithm by Barto, Sutton, and Anderson (1983) were two notable landmarks in the history of reinforcement learning.  It turns out that both of these algorithms were discovered in the computer science community, where the preferred application for the method was robotics.

Many robotics scientists, for very understandable reasons, prefer to directly test their algorithms in the actual system, rather than try them out in simulators first. The world "learning" is related to trials and errors that the robot has to perform before it becomes smart; essentially, the feedback (reinforcement) that the robot receives in the trials and errors allows it to get smarter. So the name reinforcement learning obviously made a lot of sense to the computer scientists.

Not surprisingly, early applications of reinforcement learning in the world of management science, where simulation is often preferred to testing something directly in the actual system, were also referred to as applications of "reinforcement learning." In recent times, however, the name approximate or adaptive dynamic programming (ADP) has gained popularity in the management science community.

But do note: Reinforcement Learning, technically speaking, is not really a dynamic programming method; rather it uses a statistical algorithm (called the Robbins-Monro algorithm) in combination with the Bellman equation.

In the early part of this century, the name ADP was not used much, but the name neuro-dynamic programming (NDP) was popular amongst many in the management science community. Unfortunately, NDP as a name did not gain traction. Although that name has not survived, the book on NDP by Bertsekas and Tsitsiklis is a seminal contribution to the field, and I still find myself referring to it every now and then.

Reinforcement Learning as a name has surprisingly survived.  In the artificial intelligence (AI) community, it is still the name to be used. Also, AI as a field is making rapid progress, and, indeed, many thinking robots use Reinforcement Learning. So it is very likely that this name will not die out.

I still like to use the name Reinforcement Learning. When I wrote my Ph.D. dissertation in 1999, I used it. When I wrote the first edition of my book on the topic in 2003, again, I used it, because ADP was almost never used then.  So you might say I am not a very unbiased observer here :) :)

For many, and I agree with them, it is just a matter of semantics, But it will be very, very interesting to see which name is used twenty years hence.  If this blog survives until Dec 10, 2034, we will be able to revisit this topic then.





Sunday, December 7, 2014

Second edition of Simulation Optimization textbook

The second edition of my textbook on Simulation Optimization is out!

Simulation-Based Optimization: Parametric Optimization Techniques and Reinforcement Learning introduces the evolving area of static and dynamic simulation-based optimization. Covered in detail are model-free optimization techniques – especially designed for those discrete-event, stochastic systems which can be simulated but whose analytical models are difficult to find in closed mathematical forms.

Please click here to order it via amazon.com.

All the codes are now available for free download here 

Also, the book has a dedicated website, which also has MATLAB programs for download. You can access it here 

The new (second) edition is a significantly improved version of the first. 

Key features of this revised and improved second edition include:
  • Extensive coverage, via step-by-step recipes, of powerful new algorithms for static simulation optimization, including simultaneous perturbation, backtracking adaptive search, and nested partitions, in addition to traditional methods, such as response surfaces, Nelder-Mead search, and meta-heuristics (simulated annealing, tabu search, and genetic algorithms)
  • Detailed coverage of the Bellman equation framework for Markov Decision Processes (MDPs), along with dynamic programming (value and policy iteration) for discounted, average, and total reward performance metrics
  • An in-depth consideration of dynamic simulation optimization via temporal differences and Reinforcement Learning: Q-Learning, SARSA, and R-SMART algorithms, and policy search, via API, Q-P-Learning, actor-critics, and learning automata
  • A special examination of neural-network-based function approximation for Reinforcement Learning, semi-Markov decision processes (SMDPs), finite-horizon MDPs, two time scales, case studies for industrial tasks, computer codes (placed online), and convergence proofs, via Banach fixed point theory and Ordinary Differential Equations
Themed around three areas in separate sets of chapters –- Static Simulation Optimization, Reinforcement Learning, and Convergence Analysis –- this book is written for researchers and students in the fields of engineering (industrial, systems, electrical, and computer), operations research, computer science, and applied mathematics.