This chapter is a brief introduction to Reinforcement Learning (RL) and includes some key concepts associated with it.
In this chapter, we talk about Reinforcement Learning as a core concept and then define it further. We show a complete flow of how Reinforcement Learning works. We discuss exactly where Reinforcement Learning fits into artificial intelligence (AI). After that we define key terms related to Reinforcement Learning. We start with agents and then touch on environments and then finally talk about the connection between agents and environments.
What Is Reinforcement Learning?
We use Machine Learning to constantly improve the performance of machines or programs over time. The simplified way of implementing a process that improves machine performance with time is using Reinforcement Learning (RL). Reinforcement Learning is an approach through which intelligent programs, known as agents, work in a known or unknown environment to constantly adapt and learn based on giving points. The feedback might be positive, also known as rewards, or negative, also called punishments. Considering the agents and the environment interaction, we then determine which action to take.
In a nutshell, Reinforcement Learning is based on rewards and punishments . Some important points about Reinforcement Learning:
The Reinforcement Learning cycle is depicted in Figure 1-1 with the help of a robot.
A maze is a good example that can be studied using Reinforcement Learning , in order to determine the exact right moves to complete the maze. In below figure, we are applying Reinforcement Learning and we call it the Reinforcement Learning box because within its vicinity the process of RL works. RL starts with an intelligent program, known as agents, and when they interact with environments, there are rewards and punishments associated. An environment can be either known or unknown to the agents. The agents take actions to move to the next state in order to maximize rewards.
In the maze, the centralized concept is to keep moving. The goal is to clear the maze and reach the end as quickly as possible. The following concepts of Reinforcement Learning and the working scenario are discussed later this chapter.
We use the maze example to apply concepts of Reinforcement Learning. We will be describing the following steps :
The rewards predictions are made iteratively, where we update the value of each state in a maze based on the value of the best subsequent state and the immediate reward obtained. This is called the update rule. The constant movement of the Reinforcement Learning process is based on decision-making.
Reinforcement Learning works on a trial-and-error basis because it is very difficult to predict which action to take when it is in one state. From the maze problem itself, you can see that in order get the optimal path for the next move, you have to weigh a lot of factors. It is always on the basis of state action and rewards. For the maze, we have to compute and account for probability to take the step. The maze also does not consider the reward of the previous step; it is specifically considering the move to the next state. The concept is the same for all Reinforcement Learning processes.
Here are the steps of this process:
Reinforcement Learning works well with intelligent program agents that give rewards and punishments when interacting with an environment. This interaction is very important because through these exchanges, the agent adapts to the environments. When a Machine Learning program, robot, or Reinforcement Learning program starts working, the agents are exposed to known or unknown environments and the Reinforcement Learning technique allows the agents to interact and adapt according to the environment’s features.
Accordingly, the agents work and the Reinforcement Learning robot learns. In order to get to a desired position, we assign rewards and punishments.
Now, the program has to work around the optimal path to get maximum rewards if it fails (that is, it takes punishments or receives negative points). In order to reach a new position, which also is known as a state, it must perform what we call an action. To perform an action, we implement a function, also known as a policy. A policy is therefore a function that does some work.
Faces of Reinforcement Learning
As you see from the Venn diagram in Figure 1-5, Reinforcement Learning sits at the intersection of many different fields of science.
The intersection points reveal a very strong feature of Reinforcement Learning—it shows the science of decision-making . If we have two paths and have to decide which path to take so that some point is met, a scientific decision-making process can be designed. Reinforcement Learning is the fundamental science of optimal decision-making.
If we focus on the computer science part of the Venn diagram in Figure 1-5, we see that if we want to learn, it falls under the category of Machine Learning, which is specifically mapped to Reinforcement Learning. Reinforcement Learning can be applied to many different fields of science. In engineering, we have devices that focus mostly on optimal control. In neuroscience, we are concerned with how the brain works as a stimulant for making decisions and study the reward system that works on the brain (the dopamine system).
Psychologists can apply Reinforcement Learning to determine how animals make decisions. In mathematics, we have a lot of data applying Reinforcement Learning in operations research.
The Flow of Reinforcement Learning
Figure 1-6 connects agents and environments.
The interaction happens from one state to another. The exact connection starts between an agent and the environment. Rewards are happening on a regular basis. We take appropriate actions to move from one state to another. The key points of consideration after going through the details are the following:
Figure 1-7 simplifies the interaction process.
An agent is always learning and finally makes a decision. An agent is a learner, which means there might be different paths. When the agent starts training, it starts to adapt and intelligently learns from its surroundings. The agent is also a decision maker because it tries to take an action that will get it the maximum reward. When the agent starts interacting with the environment, it can choose an action and respond accordingly. From then on, new scenes are created. When the agent changes from one place to another in an environment, every change results in some kind of modification. These changes are depicted as scenes. The transition that happens in each step helps the agent solve the Reinforcement Learning problem more effectively.
Let’s look at another scenario of state transitioning:
Learn to choose actions that maximize the following :
- [ 英文學習 ]
- [ 計算機概論 ]
- [ 深入雲計算 ]
- [ 雜七雜八 ]
- [ Algorithm in Java ]
- [ Data Structures with Java ]
- [ IR Class ]
- [ Java 文章收集 ]
- [ Java 代碼範本 ]
- [ Java 套件 ]
- [ JVM 應用 ]
- [ LFD Note ]
- [ MangoDB ]
- [ Math CC ]
- [ MongoDB ]
- [ MySQL 小學堂 ]
- [ Python 考題 ]
- [ Python 常見問題 ]
- [ Python 範例代碼 ]
- [C 常見考題]
- [C 範例代碼]
- [C/C++ 範例代碼]
- [Intro Alg]
- [Java 代碼範本]
- [Java 套件]
- [Linux 小技巧]
- [Linux 小學堂]
- [Linux 命令]
- [ML In Action]
- [Python 學習筆記]
- [Quick Python]
- [Software Engineering]
- [The python tutorial]
- ActiveMQ In Action
- Big Data 研究
- Design Pattern
- Device Driver Programming
- Docker 工具
- Docker Practice
- English Writing
- ExtJS 3.x
- Git Pro
- Hadoop. Hadoop Ecosystem
- Java Framework
- Java UI
- Learn Spark
- ML Udemy
- node js
- Py DS
- Python Std Library
- Python tools
- Ruby Packages
- Windows 技巧
Source From Here Question Im trying to figure out a way to get the time difference in seconds between two dates . For example, diff...
Preface: 在這個階層中，我們只需考慮電路模組的功能，而不需考慮其硬體的詳細內容. Verilog 的時序控制為以事件為基礎的時序控制: * 接線或暫存器的值被改變。 * 模組的輸入埠接收到新的值 * 正規...
屬性 : 系統相關 - 檔案與目錄 語法 : du [參數] [檔案] 參數 | 功能 -a | 顯示目錄中個別檔案的大小 -b | 以bytes為單位顯示 -c | 顯示個別檔案大小與總和 -D | 顯示符號鏈結的來源檔大小 -h | Hum...
CNN 卷積神經網路簡介 STEP1. 卷積神經網路介紹 CNN 卷積神經網路可以分成兩大部分: * 影像的特徵提取 : 透過 Convolution 與 Max Pooling 提取影像特徵. * Fully connected Feedforward n...