Q-learning vs dynamic programming
Is the classic Q-learning algorithm, using lookup table (instead of function approximation), equivalent to dynamic programming?
machine-learning dynamic-programming reinforcement-learning q-learning
asked Aug 17 '16 at 5:16 by D_Wills, edited Nov 22 at 16:01 by nbro
3 Answers
Accepted answer (score 4)
Dynamic Programming is an umbrella term encompassing many algorithms, while Q-Learning is one specific algorithm. So, no, they are not the same.
Also, if by Dynamic Programming you mean Value Iteration or Policy Iteration, they are still not the same. Those algorithms are "planning" methods: you have to give them a transition function and a reward function, and they will iteratively compute a value function and an optimal policy.
Q-Learning is a model-free reinforcement learning method. It is called "model-free" not because it avoids machine learning models or anything like that, but because it neither requires nor uses a model of the environment (the MDP) to obtain an optimal policy. There are also "model-based" methods. These, unlike Dynamic Programming methods, learn a model rather than simply using one; and, unlike model-free methods, they don't throw away samples after estimating values, but instead try to reconstruct the transition and reward functions to get better performance.
Model-based methods combine model-free and planning algorithms to get comparably good results with fewer samples than model-free methods (Q-Learning) require, and without needing a model the way Dynamic Programming methods (Value/Policy Iteration) do.
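To make the "model-free" point concrete, here is a minimal sketch of tabular Q-learning. The environment (a made-up 4-state chain), action set, and hyperparameters are all illustrative, not part of the answer above; the point is that the agent only ever sees sampled transitions and never touches the MDP's transition or reward functions.

```python
import random
from collections import defaultdict

def q_learning(env_step, actions, episodes=500, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning: learns only from sampled transitions;
    it never reads the MDP's transition or reward functions."""
    Q = defaultdict(float)  # lookup table keyed by (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env_step(state, action)
            # one-step TD update toward the sampled Bellman target
            target = reward if done else reward + gamma * max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state = next_state
    return Q

def chain_step(s, a):
    """Hypothetical 4-state chain: action 1 moves right, action 0 moves left;
    reaching state 3 ends the episode with reward 1."""
    s2 = min(s + 1, 3) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

random.seed(0)
Q = q_learning(chain_step, actions=[0, 1])
```

Value Iteration, by contrast, could not even be started on this chain without first writing down its transition and reward functions explicitly.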
Answer (score 10)
From Sutton & Barto's book (Reinforcement Learning: An Introduction, chapter 4):
The term dynamic programming (DP) refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP). Classical DP algorithms are of limited utility in reinforcement learning both because of their assumption of a perfect model and because of their great computational expense, but they are still important theoretically.
So, although both share the same working principles (whether tabular Reinforcement Learning/Dynamic Programming or approximate RL/DP), the key difference between classic DP and classic RL is that the former assumes the model is known. This basically means knowing the transition probabilities (the probability of moving from state s to state s' given action a) and the expected immediate reward function.
On the contrary, RL methods only require access to a set of samples, collected either online or offline (depending on the algorithm).
Of course, there are hybrid methods that can be placed between RL and DP, for example those that learn a model from the samples and then use that model in the learning process.
NOTE: In addition to naming a set of mathematical optimization techniques related to RL, the term Dynamic Programming is also used to refer to a "general algorithmic pattern", as pointed out in a comment. In both cases the fundamentals are the same, but the meaning depends on the context.
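The "perfect model" requirement is easy to see in code. Below is a minimal Value Iteration sketch; the tiny two-state MDP is a made-up illustration. Note that the algorithm cannot even be invoked without `P` and `R`, the transition probabilities and expected rewards, handed in as inputs.

```python
def value_iteration(P, R, gamma=0.9, theta=1e-8):
    """Classic DP planning: needs the full model up front.
    P[s][a] is a list of (probability, next_state) pairs;
    R[s][a] is the expected immediate reward."""
    n_states = len(P)
    V = [0.0] * n_states
    while True:  # sweep until the value function stops changing
        delta = 0.0
        for s in range(n_states):
            v = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                    for a in range(len(P[s])))
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            break
    # extract the greedy policy from the converged values
    policy = [max(range(len(P[s])),
                  key=lambda a: R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a]))
              for s in range(n_states)]
    return V, policy

# Hypothetical two-state MDP: in state 0, action 0 stays put (reward 0)
# and action 1 jumps to the absorbing state 1 (reward 1).
P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
R = [[0.0, 1.0], [0.0]]
V, policy = value_iteration(P, R)
```

An RL method such as Q-learning would instead have to recover the same policy from sampled (s, a, r, s') tuples alone, without ever being handed P or R.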
Answer (score -1)
If you use Q-learning in an offline setup, like AlphaGo, for example, then it is equivalent to dynamic programming. The difference is that it can also be used in an online setup.
no it is not equivalent - you cannot compare these two things that way. Dynamic programming is a very general algorithmic pattern that is used in many algorithms. Your algorithm may make use of a dynamic programming strategy, but it certainly is not dynamic programming.
– cel
Aug 17 '16 at 6:16
@cel, you have to consider the context. With respect to RL, it is safe to assume that DP refers to value or policy iteration.
– Don Reba
Aug 17 '16 at 11:22
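The disagreement in these comments can be made concrete with a toy experiment (entirely made up for illustration): on the same two-action problem, a Bellman-optimality backup over the known model and tabular Q-learning over sampled transitions arrive at the same greedy action, but only the first requires the model.

```python
import random

gamma = 0.9
# Toy problem: from state 0, action 0 loops back (reward 0),
# action 1 ends the episode (reward 1).

# Planning view (needs the model): repeated Bellman optimality backups.
V0 = 0.0
for _ in range(100):
    V0 = max(0.0 + gamma * V0,   # action 0: loop back to state 0
             1.0)                # action 1: terminate with reward 1

# Sampling view (model-free): tabular Q-learning on random one-step samples.
random.seed(0)
alpha = 0.5
Q = {(0, 0): 0.0, (0, 1): 0.0}
for _ in range(200):
    a = random.choice([0, 1])                        # pure exploration
    if a == 1:
        target = 1.0                                 # terminal, no bootstrap
    else:
        target = 0.0 + gamma * max(Q[(0, 0)], Q[(0, 1)])
    Q[(0, a)] += alpha * (target - Q[(0, a)])

best_by_planning = 0 if gamma * V0 > 1.0 else 1
best_by_sampling = max([0, 1], key=lambda a: Q[(0, a)])
```

Under the distinction drawn in the comments, the first loop is dynamic programming in the classic sense; the second is not, even though it converges to the same answer.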
answered Nov 16 '17 at 15:48 by mimoralea, edited Nov 16 '17 at 15:54
answered Aug 17 '16 at 8:24 by Pablo EM, edited Nov 22 at 15:57 by nbro
answered Aug 17 '16 at 5:49 by Don Reba