--question--
--text--
Fill in the blanks to complete the following Q-Learning equation:
`Q[__A__, __B__] = Q[__A__, __B__] + LEARNING_RATE * (reward + GAMMA * np.max(Q[__C__, :]) - Q[__A__, __B__])`
--answers--
A: state
B: action
C: next_state
---
A: state
B: action
C: prev_state
---
A: state
B: reaction
C: next_state
--video-solution--
1