Tutorial 2: Game Set-Up and Random Player
Week 3, Day 5: Reinforcement Learning for Games and Deep Learning Thinking 3
By Neuromatch Academy
Content creators: Mandana Samiei, Raymond Chua, Tim Lillicrap, Blake Richards
Content reviewers: Arush Tagade, Lily Cheng, Melvin Selim Atay, Kelson Shilling-Scrivo
Content editors: Melvin Selim Atay, Spiros Chavlis, Gunnar Blohm
Production editors: Namrata Bafna, Gagana B, Spiros Chavlis
Tutorial Objectives¶
In this tutorial, you will learn how to implement a game loop and create a random player. In future tutorials, you will be training other types of players using reinforcement learning.
The specific objectives for this tutorial:
Understand the format of two-player games, Othello specifically
Understand how to create random players
Setup¶
Install dependencies¶
# @title Install dependencies
!pip install coloredlogs --quiet
!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer
def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "public_testbed",
            "user_key": "3zg0t05r",
        },
    ).render()
# Imports
import os
import torch
import random
import logging
import coloredlogs
import numpy as np
import torch.optim as optim
log = logging.getLogger(__name__)
coloredlogs.install(level='INFO') # Change this to DEBUG to see more info.
Set random seed¶
Executing set_seed(seed=seed) sets the random seed.
# @title Set random seed

# @markdown Executing `set_seed(seed=seed)` you are setting the seed

# For DL it's critical to set the random seed so that students can have a
# baseline to compare their results to expected results.
# Read more here: https://pytorch.org/docs/stable/notes/randomness.html

# Call `set_seed` function in the exercises to ensure reproducibility.
import random
import torch

def set_seed(seed=None, seed_torch=True):
    """
    Function that controls randomness. NumPy and random modules must be imported.

    Args:
        seed : Integer
            A non-negative integer that defines the random state. Default is `None`.
        seed_torch : Boolean
            If `True` sets the random seed for pytorch tensors, so pytorch module
            must be imported. Default is `True`.

    Returns:
        Nothing.
    """
    if seed is None:
        seed = np.random.choice(2 ** 32)
    random.seed(seed)
    np.random.seed(seed)
    if seed_torch:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.cuda.manual_seed(seed)
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True

    print(f'Random seed {seed} has been set.')

# In case that `DataLoader` is used
def seed_worker(worker_id):
    """
    DataLoader will reseed workers following randomness in
    multi-process data loading algorithm.

    Args:
        worker_id: integer
            ID of subprocess to seed. 0 means that
            the data will be loaded in the main process
            Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details

    Returns:
        Nothing
    """
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)
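As a quick illustration of why the seed matters, the sketch below reseeds Python's random module and NumPy twice with the same value and checks that the draws repeat. The helper name draw_numbers is made up for this example, and PyTorch is left out to keep the sketch small.

```python
import random
import numpy as np

def draw_numbers(seed):
    """Reseed both generators, then draw a few numbers from each (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    py_draws = [random.randint(0, 99) for _ in range(3)]
    np_draws = np.random.randint(0, 100, size=3).tolist()
    return py_draws, np_draws

first = draw_numbers(2021)
second = draw_numbers(2021)
print(first == second)  # the same seed reproduces the same draws
```

This is the property the exercises rely on when they call set_seed before running a game.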
Set device (GPU or CPU). Execute set_device()¶
# @title Set device (GPU or CPU). Execute `set_device()`
# especially if torch modules used.

# Inform the user if the notebook uses GPU or CPU.

def set_device():
    """
    Set the device. CUDA if available, CPU otherwise

    Args:
        None

    Returns:
        device: String
            "cuda" if a GPU is available, "cpu" otherwise
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if device != "cuda":
        print("WARNING: For this notebook to perform best, "
              "if possible, in the menu under `Runtime` -> "
              "`Change runtime type.` select `GPU` ")
    else:
        print("GPU is enabled in this notebook.")

    return device
SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()
Random seed 2021 has been set.
GPU is enabled in this notebook.
Download the modules¶
# @title Download the modules
# @markdown Run this cell!
# @markdown Download from OSF. The original repo is https://github.com/raymondchua/nma_rl_games.git
import os, io, sys, shutil, zipfile
from urllib.request import urlopen
# download from github repo directly
#!git clone git://github.com/raymondchua/nma_rl_games.git --quiet
REPO_PATH = 'nma_rl_games'
if os.path.exists(REPO_PATH):
    download_string = "Redownloading"
    shutil.rmtree(REPO_PATH)
else:
    download_string = "Downloading"

zipurl = 'https://osf.io/kf4p9/download'
print(f"{download_string} and unzipping the file... Please wait.")

with urlopen(zipurl) as zipresp:
    with zipfile.ZipFile(io.BytesIO(zipresp.read())) as zfile:
        zfile.extractall()
print("Download completed.")
print(f"Add the {REPO_PATH} in the path and import the modules.")
# add the repo in the path
sys.path.append('nma_rl_games/alpha-zero')
# @markdown Import modules designed for use in this notebook
import Arena
from utils import *
from Game import Game
from MCTS import MCTS
from NeuralNet import NeuralNet
# from othello.OthelloPlayers import *
from othello.OthelloLogic import Board
# from othello.OthelloGame import OthelloGame
# from othello.pytorch.NNet import NNetWrapper as NNet
Downloading and unzipping the file... Please wait.
Download completed.
Add the nma_rl_games in the path and import the modules.
The hyperparameters used throughout the notebook.
args = dotdict({
    'numIters': 1,            # In training, number of iterations = 1000 and num of episodes = 100
    'numEps': 1,              # Number of complete self-play games to simulate during a new iteration.
    'tempThreshold': 15,      # To control exploration and exploitation
    'updateThreshold': 0.6,   # During arena playoff, new neural net will be accepted if threshold or more of games are won.
    'maxlenOfQueue': 200,     # Number of game examples to train the neural networks.
    'numMCTSSims': 15,        # Number of game moves for MCTS to simulate.
    'arenaCompare': 10,       # Number of games to play during arena play to determine if new net will be accepted.
    'cpuct': 1,
    'maxDepth': 5,            # Maximum number of rollouts
    'numMCsims': 5,           # Number of Monte Carlo simulations
    'mc_topk': 3,             # Top k actions for Monte Carlo rollout

    'checkpoint': './temp/',
    'load_model': False,
    'load_folder_file': ('/dev/models/8x100x50','best.pth.tar'),
    'numItersForTrainExamplesHistory': 20,

    # Define neural network arguments
    'lr': 0.001,              # lr: Learning Rate
    'dropout': 0.3,
    'epochs': 10,
    'batch_size': 64,
    'device': DEVICE,
    'num_channels': 512,
})
Section 0: Introduction¶
Video 0: Introduction¶
Submit your feedback¶
# @title Submit your feedback
content_review("W3D5_RL_for_games_intro")
Section 1: Create a game/agent loop for RL¶
Time estimate: ~20mins
Video 1: A game loop for RL¶
Submit your feedback¶
# @title Submit your feedback
content_review("W3D5_game_loop_for_RL")
Section 1.1: Introduction to OthelloGame¶
Othello is a board game played by two players on a board of 64 squares arranged in an eight-by-eight grid, with 64 playing pieces that are black on one side and white on the other.
Setup: The board starts with 2 black discs and 2 white discs at the centre of the board. The black discs lie on the North-East to South-West diagonal and the white discs on the North-West to South-East diagonal. Each player gets 32 discs and black always starts the game.
Game rules:
Players take turns placing a single disc at a time.
A move is made by placing a disc of the player’s color on the board to surround (i.e. “outflank”) discs of the opposite color. In other words, the player with black discs must place a disc so that there is a straight line between the newly placed disc and another black disc, with one or more white discs between them.
Surrounded discs get flipped (i.e. change color).
If a player does not have a valid move (they cannot place a disc that outflanks any of the opponent’s discs), they pass their turn.
A player cannot voluntarily forfeit their turn.
When neither player can make a valid move, the game ends.
There are nice rules/diagrams here if useful: https://www.eothello.com/. You can play an example Othello game there if you like!
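To make the outflanking rule concrete, here is a minimal sketch that lists which discs a move would flip. It is not the Board class used later in this notebook: the function name discs_to_flip, the list-of-lists board, and the tiny 4x4 position are all assumptions made for illustration. Cells hold +1 or -1 for the two players and 0 for an empty square.

```python
# All eight straight-line directions (row step, column step)
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def discs_to_flip(board, row, col, player):
    """Return the opponent discs outflanked by placing `player` at (row, col)."""
    n = len(board)
    if board[row][col] != 0:
        return []  # the square is already occupied
    flipped = []
    for d_row, d_col in DIRECTIONS:
        line = []
        r, c = row + d_row, col + d_col
        # Walk over a run of opponent discs
        while 0 <= r < n and 0 <= c < n and board[r][c] == -player:
            line.append((r, c))
            r, c = r + d_row, c + d_col
        # The run only counts if it ends on one of the player's own discs
        if line and 0 <= r < n and 0 <= c < n and board[r][c] == player:
            flipped.extend(line)
    return flipped

board = [
    [0,  0,  0, 0],
    [0, -1,  1, 0],
    [0,  1, -1, 0],
    [0,  0,  0, 0],
]
print(discs_to_flip(board, 0, 1, 1))  # player +1 outflanks the disc at (1, 1)
print(discs_to_flip(board, 0, 0, 1))  # nothing is outflanked, so this is not a valid move
```

Under this sketch a move is valid exactly when the returned list is non-empty, which mirrors the rule that every move must outflank at least one disc.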
Note: we will use a 6x6 board to speed up computation.
Exercise Goal: Learn how to set up a game environment with multiple players for reinforcement learning experiments.
Exercise:
Build an agent that plays random moves
Connect the agent to the Othello game
Generate games including wins and losses
Execute the following code to enable the OthelloGame class. This class represents a game board and has methods such as getInitBoard to create the initial board, getValidMoves to return the valid moves, and other helpful functionality to play the game. You do not need to understand every line of code in this class, but try to get a sense of the available methods.
class OthelloGame(Game):
    """
    Instantiate Othello Game
    """
    square_content = {
        -1: "X",
        +0: "-",
        +1: "O"
    }

    @staticmethod
    def getSquarePiece(piece):
        return OthelloGame.square_content[piece]

    def __init__(self, n):
        self.n = n

    def getInitBoard(self):
        # Return initial board (numpy board)
        b = Board(self.n)
        return np.array(b.pieces)

    def getBoardSize(self):
        # (a,b) tuple
        return (self.n, self.n)

    def getActionSize(self):
        # Return number of actions, n is the board size and +1 is for no-op action
        return self.n*self.n + 1

    def getCanonicalForm(self, board, player):
        # Return state if player==1, else return -state if player==-1
        return player*board

    def stringRepresentation(self, board):
        return board.tobytes()

    def stringRepresentationReadable(self, board):
        board_s = "".join(self.square_content[square] for row in board for square in row)
        return board_s

    def getScore(self, board, player):
        b = Board(self.n)
        b.pieces = np.copy(board)
        return b.countDiff(player)

    @staticmethod
    def display(board):
        n = board.shape[0]
        print(" ", end="")
        for y in range(n):
            print(y, end=" ")
        print("")
        print("-----------------------")
        for y in range(n):
            print(y, "|", end="")  # Print the row
            for x in range(n):
                piece = board[y][x]  # Get the piece to print
                print(OthelloGame.square_content[piece], end=" ")
            print("|")
        print("-----------------------")

    @staticmethod
    def displayValidMoves(moves):
        # Display possible moves. The board size is recovered from the length of
        # `moves` (n*n squares + 1 pass entry) rather than from a global `board`.
        n = int(round(np.sqrt(len(moves) - 1)))
        A = np.reshape(moves[0:-1], (n, n))
        print(" ")
        print("possible moves")
        print(" ", end="")
        for y in range(n):
            print(y, end=" ")
        print("")
        print("-----------------------")
        for y in range(n):
            print(y, "|", end="")  # Print the row
            for x in range(n):
                piece = A[y][x]  # Get the piece to print
                print(OthelloGame.square_content[piece], end=" ")
            print("|")
        print("-----------------------")

    def getNextState(self, board, player, action):
        """
        Helper function to make valid move
        If player takes action on board, return next (board,player)
        and action must be a valid move

        Args:
            board: np.ndarray
                Board of size n x n [6x6 in this case]
            player: Integer
                ID of current player
            action: Integer
                Index of the chosen action; n*n is the pass (no-op) action

        Returns:
            (board,player) tuple signifying next state
        """
        if action == self.n*self.n:
            return (board, -player)
        b = Board(self.n)
        b.pieces = np.copy(board)
        move = (int(action/self.n), action%self.n)
        b.execute_move(move, player)
        return (b.pieces, -player)

    def getValidMoves(self, board, player):
        """
        Helper function to list the valid moves for player on board

        Args:
            board: np.ndarray
                Board of size n x n [6x6 in this case]
            player: Integer
                ID of current player

        Returns:
            valids: np.ndarray
                Fixed size binary vector of length n*n + 1; the last entry
                is 1 only when the player has no legal move and must pass
        """
        valids = [0]*self.getActionSize()
        b = Board(self.n)
        b.pieces = np.copy(board)
        legalMoves = b.get_legal_moves(player)
        if len(legalMoves)==0:
            valids[-1]=1
            return np.array(valids)
        for x, y in legalMoves:
            valids[self.n*x+y]=1
        return np.array(valids)

    def getGameEnded(self, board, player):
        """
        Helper function to signify if game has ended

        Args:
            board: np.ndarray
                Board of size n x n [6x6 in this case]
            player: Integer
                ID of current player

        Returns:
            0 if not ended, 1 if `player` has more discs at the end,
            -1 otherwise (an equal disc count is also reported as -1)
        """
        b = Board(self.n)
        b.pieces = np.copy(board)
        if b.has_legal_moves(player):
            return 0
        if b.has_legal_moves(-player):
            return 0
        if b.countDiff(player) > 0:
            return 1
        return -1

    def getSymmetries(self, board, pi):
        """
        Get mirror/rotational configurations of board

        Args:
            board: np.ndarray
                Board of size n x n [6x6 in this case]
            pi: np.ndarray
                Policy vector of length n*n + 1 (last entry is the pass action)

        Returns:
            l: list
                (board, pi) pairs for each 90 degree rotation, with and
                without a left-right flip
        """
        assert(len(pi) == self.n**2+1)  # 1 for pass
        pi_board = np.reshape(pi[:-1], (self.n, self.n))
        l = []

        for i in range(1, 5):
            for j in [True, False]:
                newB = np.rot90(board, i)
                newPi = np.rot90(pi_board, i)
                if j:
                    newB = np.fliplr(newB)
                    newPi = np.fliplr(newPi)
                l += [(newB, list(newPi.ravel()) + [pi[-1]])]
        return l
Below, we initialize and view a board.
# Display the board
set_seed(seed=SEED)
# Set up the game
game = OthelloGame(6)
# Get the initial board
board = game.getInitBoard()
# Display the board
game.display(board)
# Observe the game board size
print(f'Board size = {game.getBoardSize()}')
# Observe the action size
print(f'Action size = {game.getActionSize()}')
Random seed 2021 has been set.
0 1 2 3 4 5
-----------------------
0 |- - - - - - |
1 |- - - - - - |
2 |- - X O - - |
3 |- - O X - - |
4 |- - - - - - |
5 |- - - - - - |
-----------------------
Board size = (6, 6)
Action size = 37
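The action size of 37 is the 36 squares of the 6x6 board plus one extra pass action. In the class above, getValidMoves writes square (x, y) to index n*x + y, and getNextState treats index n*n as a pass. The sketch below spells out that mapping; the helper names move_to_action and action_to_move are hypothetical and not part of OthelloGame.

```python
N = 6                # board size used in this tutorial
PASS_ACTION = N * N  # index 36, the extra "no move" action

def move_to_action(row, col, n=N):
    """Flatten a (row, col) square into a single action index."""
    return n * row + col

def action_to_move(action, n=N):
    """Invert the mapping; returns None for the pass action."""
    if action == n * n:
        return None
    return divmod(action, n)  # (row, col)

print(move_to_action(1, 2))
print(action_to_move(27))
print(action_to_move(PASS_ACTION))
```

Keeping actions as flat indices gives every board position a fixed-length action vector, at the cost of converting back to coordinates whenever a move is executed.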
Now let’s look at the valid actions for player 1 (the circles). game.getValidMoves returns a 1 or a 0 for every position on the board, plus one final entry for the pass action; a 1 marks a valid place to put a new disc. Note that it returns a flat array of length 37 (the first 36 entries could be reshaped into the board shape).
We also have a method to visualize the valid actions. Compare the valid actions to the board above.
# Get valid moves
valids = game.getValidMoves(board, 1)
print(valids)
# Visualize the moves
game.displayValidMoves(valids)
[0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
possible moves
0 1 2 3 4 5
-----------------------
0 |- - - - - - |
1 |- - O - - - |
2 |- O - - - - |
3 |- - - - O - |
4 |- - - O - - |
5 |- - - - - - |
-----------------------
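As a small sketch of the reshaping mentioned above, the snippet below rebuilds the printed valid-move vector by hand (so it runs without the game class), drops the pass entry, and folds the rest into a 6x6 mask. The variable names mask and coords are chosen for this example only.

```python
import numpy as np

# Recreate the vector printed above: 36 board squares + 1 pass entry,
# with ones at indices 8, 13, 22 and 27
valids = np.zeros(37, dtype=int)
valids[[8, 13, 22, 27]] = 1

n = 6
mask = valids[:-1].reshape(n, n)  # drop the pass entry, then fold into rows
coords = [tuple(int(v) for v in rc) for rc in np.argwhere(mask == 1)]

print(mask)
print(coords)  # (row, col) pairs of the valid squares
```

The coordinates line up with the circles shown by displayValidMoves.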
Section 1.2: Create a random player¶
Let’s start by setting up the game loop with a random player, so that we can test the game loop and make sure it works correctly.
To do so, we will first implement a random player in 3 steps:
determine which moves are possible at all
assign a uniform probability to each move (remember, this is a random player): 1/N for N valid moves
randomly choose a move from the possible moves
Coding Exercise 1.2: Implement a random player¶
class RandomPlayer():
    """
    Simulates Random Player
    """

    def __init__(self, game):
        self.game = game

    def play(self, board):
        """
        Simulates game play

        Args:
            board: np.ndarray
                Board of size n x n [6x6 in this case]

        Returns:
            a: int
                Randomly chosen move
        """
        #################################################
        ## TODO for students: ##
        ## 1. Please compute the valid moves using getValidMoves() and the game class self.game. ##
        ## 2. Compute the probability over actions.##
        ## 3. Pick a random action based on the probability computed above.##
        # Fill out function and remove ##
        raise NotImplementedError("Implement the random player")
        #################################################

        # Compute the valid moves using getValidMoves()
        valids = self.game.getValidMoves(board, 1)

        # Compute the probability of each move being played (random player means this should
        # be uniform for valid moves, 0 for others)
        prob = ...

        # Pick a random action based on the probabilities (hint: np.random.choice is useful)
        a = ...

        return a
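One way to think about the three steps, written as a standalone function rather than as the official solution to the exercise: normalise the binary valid-move vector into probabilities, then sample an index. The function name sample_uniform_action and the use of NumPy's Generator API are choices made for this sketch, and the valid-move vector is built by hand so the snippet runs without the game class.

```python
import numpy as np

def sample_uniform_action(valids, rng):
    """Pick one index uniformly at random among the entries of `valids` equal to 1."""
    valids = np.asarray(valids, dtype=float)
    prob = valids / valids.sum()  # 1/N for each of the N valid moves, 0 elsewhere
    return int(rng.choice(len(valids), p=prob))

rng = np.random.default_rng(0)

# Hand-built example: four valid squares out of 37 actions
valids = np.zeros(37)
valids[[8, 13, 22, 27]] = 1

samples = [sample_uniform_action(valids, rng) for _ in range(1000)]
print(sorted(set(samples)))  # only indices of valid moves can appear
```

The division is safe for vectors produced by getValidMoves, because that method sets the pass entry to 1 whenever there are no legal moves, so the sum is never zero.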
Submit your feedback¶
# @title Submit your feedback
content_review("W3D5_random_player")
Section 1.3: Create two random agents to play against each other¶
Now we create 2 random players and let them play against one another a number of times. We will use some nice functionality we imported above, including the Arena class, which plays multiple games between two players. You can check out the code here if you want, but it is not necessary: https://github.com/raymondchua/nma_rl_games
# Define the random player
player1 = RandomPlayer(game).play # Player 1 is a random player
player2 = RandomPlayer(game).play # Player 2 is a random player
# Define number of games
num_games = 20
# Start the competition
set_seed(seed=SEED)
arena = Arena.Arena(player1, player2 , game, display=None) # To see the steps of the competition set "display=OthelloGame.display"
result = arena.playGames(num_games, verbose=False) # return ( number of games won by player1, num of games won by player2, num of games won by nobody)
print(f"\n\n{result}")
Random seed 2021 has been set.
(11, 9, 0)
The results are displayed in the following way: (Number of player 1 wins, number of player 2 wins, number of ties)
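For intuition about what an arena-style game loop does, here is a self-contained sketch built around a toy game instead of Othello. TakeAwayGame, play_game, and make_random_player are invented for this example and are not the Arena implementation from nma_rl_games; the toy game only borrows the method names used by OthelloGame.

```python
import random

class TakeAwayGame:
    """Toy game (hypothetical): players alternately remove 1 or 2 stones.
    Whoever removes the last stone wins."""

    def __init__(self, stones=7):
        self.stones = stones

    def getInitBoard(self):
        return self.stones

    def getValidMoves(self, board, player):
        # Action 0 removes one stone, action 1 removes two
        return [1, 1 if board >= 2 else 0]

    def getNextState(self, board, player, action):
        return board - (action + 1), -player

    def getGameEnded(self, board, player):
        # Nonzero only when no stones are left; the player to move has then lost
        return 0 if board > 0 else -1


def play_game(game, player1, player2):
    """Alternate turns until the game ends; return +1 if player1 won, -1 otherwise."""
    players = {1: player1, -1: player2}
    board, cur_player = game.getInitBoard(), 1
    while game.getGameEnded(board, cur_player) == 0:
        action = players[cur_player](board)
        assert game.getValidMoves(board, cur_player)[action] == 1, "invalid move"
        board, cur_player = game.getNextState(board, cur_player, action)
    # Convert the result for the player to move into player1's point of view
    return cur_player * game.getGameEnded(board, cur_player)


def make_random_player(game):
    def play(board):
        valids = game.getValidMoves(board, 1)
        return random.choice([a for a, ok in enumerate(valids) if ok])
    return play


random.seed(0)
game = TakeAwayGame()
p1, p2 = make_random_player(game), make_random_player(game)
results = [play_game(game, p1, p2) for _ in range(20)]
print(results.count(1), results.count(-1))  # wins for player 1, wins for player 2
```

The loop has the same ingredients as the Othello setup: ask the current player for an action, check it against the valid moves, advance the state, and stop once the game reports an outcome.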
Section 1.4: Compute win rate for the random player (player 1)¶
print(f"Number of games won by player1 = {result[0]}, "
      f"Number of games won by player2 = {result[1]} out of {num_games} games")
win_rate_player1 = result[0]/num_games
print(f"\nWin rate for player1 over {num_games} games: {round(win_rate_player1*100, 1)}%")
Number of games won by player1 = 11, Number of games won by player2 = 9 out of 20 games
Win rate for player1 over 20 games: 55.0%
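The same calculation can be wrapped in a small helper that also reports the rates for player 2 and for draws. This is only a sketch; the function name summarize_results is hypothetical, and the input is the (11, 9, 0) tuple shown earlier.

```python
def summarize_results(result):
    """Turn (wins_player1, wins_player2, draws) into rates between 0 and 1."""
    wins1, wins2, draws = result
    total = wins1 + wins2 + draws
    if total == 0:
        raise ValueError("no games were played")
    return {
        "player1": wins1 / total,
        "player2": wins2 / total,
        "draw": draws / total,
    }

rates = summarize_results((11, 9, 0))
print(rates)
```

With only 20 games between two random players, a rate near 50% for each side is what one would expect, and small deviations should not be over-interpreted.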
Note: the random player is purely policy-based. It contains no estimates of value. Next we’ll see how to estimate and use value functions for game playing.
Summary¶
In this tutorial, you have learned about the Othello game, how to implement a game loop, and how to create a random player.