# Tutorial 1: Game Set-Up and Random Player

Week 3, Day 5: Reinforcement Learning for Games

Content creators: Mandana Samiei, Raymond Chua, Tim Lilicrap, Blake Richards

Content reviewers: Arush Tagade, Lily Cheng, Melvin Selim Atay, Kelson Shilling-Scrivo

Content editors: Melvin Selim Atay, Spiros Chavlis, Gunnar Blohm

Production editors: Namrata Bafna, Gagana B, Spiros Chavlis

# Tutorial Objectives

In this tutorial, you will learn how to implement a game loop and create a random player. In future tutorials, you will be training other types of players using reinforcement learning.

The specific objectives for this tutorial:

• Understand the format of two-player games, specifically Othello

• Understand how to create random players

# Setup

## Install dependencies

```# @title Install dependencies
!pip install coloredlogs --quiet

from evaltools.airtable import AirtableForm

# generate airtable form
```
```# Imports
import os
import torch
import random
import logging
import coloredlogs
import numpy as np
import torch.optim as optim

log = logging.getLogger(__name__)
```

## Set random seed

Executing `set_seed(seed=seed)` sets the random seed.

```python
# @title Set random seed

# @markdown Executing `set_seed(seed=seed)` sets the random seed

# For DL it's critical to set the random seed so that students can have a
# baseline to compare their results to expected results.

# Call the `set_seed` function in the exercises to ensure reproducibility.
import random
import numpy as np
import torch

def set_seed(seed=None, seed_torch=True):
  """
  Function that controls randomness. NumPy and random modules must be imported.

  Args:
    seed : Integer
      A non-negative integer that defines the random state. Default is `None`.
    seed_torch : Boolean
      If `True` sets the random seed for pytorch tensors, so pytorch module
      must be imported. Default is `True`.

  Returns:
    Nothing.
  """
  if seed is None:
    # Convert to a plain Python int so that `random.seed` accepts it
    seed = int(np.random.choice(2 ** 32))
  random.seed(seed)
  np.random.seed(seed)
  if seed_torch:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

  print(f'Random seed {seed} has been set.')

# In case that `DataLoader` is used
def seed_worker(worker_id):
  """
  Seed NumPy and `random` inside a DataLoader worker process

  Args:
    worker_id: integer
      ID of subprocess to seed. 0 means that
      the data will be loaded in the main process

  Returns:
    Nothing
  """
  worker_seed = torch.initial_seed() % 2**32
  np.random.seed(worker_seed)
  random.seed(worker_seed)
```
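To see what seeding buys you, here is a minimal sketch that uses only Python's built-in `random` module rather than the `set_seed` helper (which also seeds NumPy and PyTorch). The function name `draw_actions` and its parameters are hypothetical and exist only for this illustration.

```python
import random

def draw_actions(seed, n_draws=5, n_actions=37):
  """Seed the generator, then draw `n_draws` random action indices."""
  random.seed(seed)
  return [random.randrange(n_actions) for _ in range(n_draws)]

first = draw_actions(seed=2021)
second = draw_actions(seed=2021)

# Re-seeding with the same value replays exactly the same sequence
print(first == second)  # True
```

This is why the cells below call `set_seed(seed=SEED)` before anything random happens: rerunning a cell then reproduces the same games.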

## Set device (GPU or CPU). Execute `set_device()`

```python
# @title Set device (GPU or CPU). Execute `set_device()`
# especially if torch modules used.

# Inform the user if the notebook uses GPU or CPU.

def set_device():
  """
  Set the device. CUDA if available, CPU otherwise

  Args:
    None

  Returns:
    device: String
      "cuda" if a GPU is available, "cpu" otherwise
  """
  device = "cuda" if torch.cuda.is_available() else "cpu"
  if device != "cuda":
    print("WARNING: For this notebook to perform best, "
          "if possible, in the menu under `Runtime` -> "
          "`Change runtime type.`  select `GPU` ")
  else:
    print("GPU is enabled in this notebook.")

  return device
```
```SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()
```
```Random seed 2021 has been set.
WARNING: For this notebook to perform best, if possible, in the menu under `Runtime` -> `Change runtime type.`  select `GPU`
```

```# @title Download the modules

# @markdown Run this cell!

import os, io, sys, shutil, zipfile
from urllib.request import urlopen

#!git clone git://github.com/raymondchua/nma_rl_games.git --quiet
REPO_PATH = 'nma_rl_games'

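# Remove any earlier copy so that the download starts from a clean state
if os.path.exists(REPO_PATH):
  shutil.rmtree(REPO_PATH)

# `zipurl` must hold the address of the repository's zip archive
print("Downloading and unzipping the file... Please wait.")
with urlopen(zipurl) as zipresp:
  with zipfile.ZipFile(io.BytesIO(zipresp.read())) as zfile:
    zfile.extractall()
print("Download completed.")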

print(f"Add the {REPO_PATH} in the path and import the modules.")
# add the repo in the path
sys.path.append('nma_rl_games/alpha-zero')

# @markdown Import modules designed for use in this notebook
import Arena

from utils import *
from Game import Game
from MCTS import MCTS
from NeuralNet import NeuralNet

# from othello.OthelloPlayers import *
from othello.OthelloLogic import Board
# from othello.OthelloGame import OthelloGame
# from othello.pytorch.NNet import NNetWrapper as NNet
```
```Downloading and unzipping the file... Please wait.
```
```Download completed.
Add the nma_rl_games in the path and import the modules.
```

The hyperparameters below are used throughout the notebook.

```args = dotdict({
'numIters': 1,            # In training, number of iterations = 1000 and num of episodes = 100
'numEps': 1,              # Number of complete self-play games to simulate during a new iteration.
'tempThreshold': 15,      # To control exploration and exploitation
'updateThreshold': 0.6,   # During arena playoff, new neural net will be accepted if threshold or more of games are won.
'maxlenOfQueue': 200,     # Number of game examples to train the neural networks.
'numMCTSSims': 15,        # Number of games moves for MCTS to simulate.
'arenaCompare': 10,       # Number of games to play during arena play to determine if new net will be accepted.
'cpuct': 1,
'maxDepth':5,             # Maximum number of rollouts
'numMCsims': 5,           # Number of monte carlo simulations
'mc_topk': 3,             # Top k actions for monte carlo rollout

'checkpoint': './temp/',
'numItersForTrainExamplesHistory': 20,

# Define neural network arguments
'lr': 0.001,               # lr: Learning Rate
'dropout': 0.3,
'epochs': 10,
'batch_size': 64,
'device': DEVICE,
'num_channels': 512,
})
```
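`dotdict` comes from the repository's `utils` module, whose source is not shown in this tutorial. A reasonable assumption is that it is a dictionary whose keys can also be read as attributes (`args.lr` instead of `args['lr']`). The class `DotDict` below is a hypothetical stand-in written for illustration, not the repository's code.

```python
class DotDict(dict):
  """Hypothetical stand-in for `dotdict`: a dict whose keys can be read as attributes."""

  def __getattr__(self, name):
    # Called only when normal attribute lookup fails, so dict methods still work
    try:
      return self[name]
    except KeyError as err:
      raise AttributeError(name) from err

demo_args = DotDict({'numMCTSSims': 15, 'lr': 0.001})

print(demo_args.numMCTSSims)  # 15 (attribute-style access)
print(demo_args['lr'])        # 0.001 (regular dict access still works)
```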

# Section 1: Create a game/agent loop for RL

Time estimate: ~20 mins

## Section 1.1: Introduction to OthelloGame

Othello is a board game played by two players on a board of 64 squares arranged in an eight-by-eight grid, with 64 playing pieces that are black on one side and white on the other.

Setup: The board starts with 2 black discs and 2 white discs at the centre of the board. The black discs lie on the North-East to South-West diagonal, and the white discs on the North-West to South-East diagonal. Each player gets 32 discs, and black always starts the game.

Game rules:

• Players take turns placing a single disc at a time.

• A move is made by placing a disc of the player’s color on the board to surround (i.e., “outflank”) discs of the opposite color. In other words, the player with black discs must place a disc so that there is a straight line (horizontal, vertical, or diagonal) between the newly placed disc and another black disc, with one or more white discs between them.

• Surrounded discs get flipped (i.e., change color).

• If a player does not have a valid move (they cannot place a disc that outflanks any of the opponent’s discs), they pass their turn.

• A player cannot voluntarily forfeit their turn.

• When neither player can make a valid move, the game ends.

Rules and diagrams are available at https://www.eothello.com/, where you can also play an example game of Othello.

Note: we will use a 6x6 board to speed computations up
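The outflanking rule can be made concrete with a short sketch. The helper `discs_to_flip` below is hypothetical and independent of the `Board` class used later in this tutorial. It assumes the encoding that the `OthelloGame` class displays (`1` for "O", `-1` for "X", `0` for an empty square) and the 6x6 starting position.

```python
# The eight directions in which a line of discs can run
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def discs_to_flip(board, row, col, player):
  """Return the opponent discs outflanked if `player` moves at (row, col)."""
  n = len(board)
  if board[row][col] != 0:
    return []  # occupied squares are never valid
  flipped = []
  for d_row, d_col in DIRECTIONS:
    line = []
    r, c = row + d_row, col + d_col
    # Walk over a contiguous run of opponent discs
    while 0 <= r < n and 0 <= c < n and board[r][c] == -player:
      line.append((r, c))
      r, c = r + d_row, c + d_col
    # The run only counts if it ends on one of the player's own discs
    if line and 0 <= r < n and 0 <= c < n and board[r][c] == player:
      flipped.extend(line)
  return flipped

# 6x6 starting position
start_board = [[0] * 6 for _ in range(6)]
start_board[2][2], start_board[2][3] = -1, 1
start_board[3][2], start_board[3][3] = 1, -1

print(discs_to_flip(start_board, 1, 2, player=1))  # [(2, 2)]
print(discs_to_flip(start_board, 0, 0, player=1))  # []
```

A square is a valid move exactly when this list is non-empty, which is why a corner of the empty starting board is not playable.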

Exercise Goal: Learn how to set up a game environment with multiple players for reinforcement learning experiments.

Exercise:

• Build an agent that plays random moves

• Connect the agent to the Othello game

• Generate games including wins and losses

Execute the following code to enable the `OthelloGame` class. This class represents a game board and has methods such as `getInitBoard` to create the initial board, `getValidMoves` to return the valid moves, and other helpful functionality to play the game. You do not need to understand every line of code in this class, but try to get a sense of the available methods.

```python
class OthelloGame(Game):
  """
  Instantiate Othello Game
  """
  square_content = {
      -1: "X",
      +0: "-",
      +1: "O"
  }

  @staticmethod
  def getSquarePiece(piece):
    return OthelloGame.square_content[piece]

  def __init__(self, n):
    self.n = n

  def getInitBoard(self):
    # Return initial board (numpy board)
    b = Board(self.n)
    return np.array(b.pieces)

  def getBoardSize(self):
    # (a,b) tuple
    return (self.n, self.n)

  def getActionSize(self):
    # Return number of actions, n is the board size and +1 is for no-op action
    return self.n*self.n + 1

  def getCanonicalForm(self, board, player):
    # Return state if player==1, else return -state if player==-1
    return player*board

  def stringRepresentation(self, board):
    # Byte string that identifies the board (usable as a dictionary key)
    return board.tobytes()

  def getScore(self, board, player):
    b = Board(self.n)
    b.pieces = np.copy(board)
    return b.countDiff(player)

  @staticmethod
  def display(board):
    n = board.shape[0]
    print("   ", end="")
    for y in range(n):
      print(y, end=" ")
    print("")
    print("-----------------------")
    for y in range(n):
      print(y, "|", end="")    # Print the row
      for x in range(n):
        piece = board[y][x]    # Get the piece to print
        print(OthelloGame.square_content[piece], end=" ")
      print("|")
    print("-----------------------")

  @staticmethod
  def displayValidMoves(moves):
    # Display possible moves
    # The last entry is the pass action, so the board has len(moves) - 1 squares
    n = int(np.sqrt(len(moves) - 1))
    A = np.reshape(moves[0:-1], (n, n))
    print("  ")
    print("possible moves")
    print("   ", end="")
    for y in range(n):
      print(y, end=" ")
    print("")
    print("-----------------------")
    for y in range(n):
      print(y, "|", end="")    # Print the row
      for x in range(n):
        piece = A[y][x]    # Get the piece to print
        print(OthelloGame.square_content[piece], end=" ")
      print("|")
    print("-----------------------")

  def getNextState(self, board, player, action):
    """
    Helper function to make valid move
    If player takes action on board, return next (board,player)
    and action must be a valid move

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      player: Integer
        ID of current player
      action: Integer
        Index of the chosen action; n*n is the pass action

    Returns:
      (board,player) tuple signifying next state
    """
    if action == self.n*self.n:
      return (board, -player)
    b = Board(self.n)
    b.pieces = np.copy(board)
    move = (int(action/self.n), action%self.n)
    b.execute_move(move, player)
    return (b.pieces, -player)

  def getValidMoves(self, board, player):
    """
    Helper function to list the valid moves of a player

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      player: Integer
        ID of current player

    Returns:
      valids: np.ndarray
        Fixed size binary vector of length n*n + 1;
        the last entry is 1 only if the player has to pass
    """
    valids = [0]*self.getActionSize()
    b = Board(self.n)
    b.pieces = np.copy(board)
    legalMoves =  b.get_legal_moves(player)
    if len(legalMoves)==0:
      valids[-1]=1
      return np.array(valids)
    for x, y in legalMoves:
      valids[self.n*x+y]=1
    return np.array(valids)

  def getGameEnded(self, board, player):
    """
    Helper function to signify if game has ended

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      player: Integer
        ID of current player

    Returns:
      0 if not ended, 1 if `player` has more discs, -1 otherwise
    """
    b = Board(self.n)
    b.pieces = np.copy(board)
    if b.has_legal_moves(player):
      return 0
    if b.has_legal_moves(-player):
      return 0
    if b.countDiff(player) > 0:
      return 1
    return -1

  def getSymmetries(self, board, pi):
    """
    Get mirror/rotational configurations of board

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      pi: np.ndarray
        Vector of length n*n + 1 with one entry per action

    Returns:
      l: list
        (board, pi) pairs, one per rotation and mirror image
    """
    assert(len(pi) == self.n**2+1)  # 1 for pass
    pi_board = np.reshape(pi[:-1], (self.n, self.n))
    l = []

    for i in range(1, 5):
      for j in [True, False]:
        newB = np.rot90(board, i)
        newPi = np.rot90(pi_board, i)
        if j:
          newB = np.fliplr(newB)
          newPi = np.fliplr(newPi)
        l += [(newB, list(newPi.ravel()) + [pi[-1]])]
    return l
```
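One method worth a closer look is `getCanonicalForm`, which multiplies the board by the player ID so that the player about to move always sees their own discs as `+1`. The sketch below reproduces that idea on plain nested lists instead of NumPy arrays; the function name `canonical_form` is hypothetical.

```python
def canonical_form(board, player):
  """Flip the sign of every square when `player` is -1, so the mover's discs are always +1."""
  return [[player * square for square in row] for row in board]

# The four centre squares of the starting position (1 = "O", -1 = "X")
centre = [[-1, 1],
          [1, -1]]

print(canonical_form(centre, 1))   # [[-1, 1], [1, -1]]  (unchanged for player 1)
print(canonical_form(centre, -1))  # [[1, -1], [-1, 1]]  (signs swapped for player -1)
```

Because of this convention, a player function only ever has to reason from the point of view of player `1`.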

Below, we initialize and view a board.

```# Display the board
set_seed(seed=SEED)

# Set up the game
game = OthelloGame(6)

# Get the initial board
board = game.getInitBoard()

# Display the board
game.display(board)

# Observe the game board size
print(f'Board size = {game.getBoardSize()}')

# Observe the action size
print(f'Action size = {game.getActionSize()}')
```
```Random seed 2021 has been set.
   0 1 2 3 4 5
-----------------------
0 |- - - - - - |
1 |- - - - - - |
2 |- - X O - - |
3 |- - O X - - |
4 |- - - - - - |
5 |- - - - - - |
-----------------------
Board size = (6, 6)
Action size = 37
```

Now let’s look at the valid actions for player 1 (the circles). `game.getValidMoves` returns a 1 or a 0 for every position on the board, plus one final entry for the pass action; a 1 marks a valid place to put a new disc. Note that it returns a flat array, which can be reshaped into the board shape once the final entry is dropped.

We also have a method to visualize the valid actions. Compare the valid actions to the board above.

```# Get valid moves
valids = game.getValidMoves(board, 1)
print(valids)

# Visualize the moves
game.displayValidMoves(valids)
```
```[0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0]

possible moves
   0 1 2 3 4 5
-----------------------
0 |- - - - - - |
1 |- - O - - - |
2 |- O - - - - |
3 |- - - - O - |
4 |- - - O - - |
5 |- - - - - - |
-----------------------
```
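The flat vector and the board view are linked by the indexing used in `getValidMoves`: the square in row `x` and column `y` sits at index `n*x + y`, and index `n*n` is the pass action. The following sketch decodes the vector printed above back into coordinates; `decode_action` and `example_valids` are hypothetical names used only here.

```python
n = 6
example_valids = [0] * (n * n + 1)
for index in (8, 13, 22, 27):  # positions of the 1s in the printed vector
  example_valids[index] = 1

def decode_action(action, n):
  """Map a flat action index to (row, col), or None for the pass action."""
  if action == n * n:
    return None
  return divmod(action, n)

moves = [decode_action(a, n) for a, is_valid in enumerate(example_valids) if is_valid]
print(moves)  # [(1, 2), (2, 1), (3, 4), (4, 3)]
```

These four coordinates are the squares marked in the "possible moves" display.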

## Section 1.2: Create a random player

Let’s begin by setting up the game loop with a random player, so that we can test the loop and make sure it works correctly.

To do so, we will first implement a random player in 3 steps:

1. determine which moves are possible at all

2. assign a uniform probability to each move (remember, this is a random player): 1/N for N valid moves

3. randomly choose a move from the possible moves
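The three steps above can be sketched on a toy validity vector with nothing but the standard library. This is an illustration of the idea and not the exercise solution, which works on the output of `getValidMoves` with NumPy; the names `uniform_probs` and `pick_random_action` are hypothetical.

```python
import random

def uniform_probs(valids):
  """Step 2: probability 1/N for each of the N valid moves, 0 for all others."""
  n_valid = sum(valids)
  return [v / n_valid for v in valids]

def pick_random_action(valids, rng):
  """Step 3: sample one action index according to those probabilities."""
  probs = uniform_probs(valids)
  actions = list(range(len(valids)))
  return rng.choices(actions, weights=probs, k=1)[0]

rng = random.Random(2021)
example_valids = [0, 1, 0, 1, 1]  # step 1: actions 1, 3 and 4 are valid

print(uniform_probs(example_valids))
samples = [pick_random_action(example_valids, rng) for _ in range(1000)]
print(sorted(set(samples)))  # only valid actions can ever be drawn
```

Invalid actions have zero weight, so they are never selected no matter how many samples are drawn.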

### Coding Exercise 1.2: Implement a random player

```python
class RandomPlayer():
  """
  Simulates Random Player
  """

  def __init__(self, game):
    self.game = game

  def play(self, board):
    """
    Simulates game play

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]

    Returns:
      a: int
        Randomly chosen move
    """
    #################################################
    ## TODO for students: ##
    ## 1. Please compute the valid moves using getValidMoves() and the game class self.game. ##
    ## 2. Compute the probability over actions.##
    ## 3. Pick a random action based on the probability computed above.##
    # Fill out function and remove ##
    raise NotImplementedError("Implement the random player")
    #################################################

    # Compute the valid moves using getValidMoves()
    valids = self.game.getValidMoves(board, 1)

    # Compute the probability of each move being played (random player means this should
    # be uniform for valid moves, 0 for others)
    prob = ...

    # Pick a random action based on the probabilities (hint: np.random.choice is useful)
    a = ...

    return a

atform.add_event('Coding Exercise 1.2: Implement a random player')
```


## Section 1.3: Create two random agents to play against each other

Now we create 2 random players and let them play against one another a number of times. We will use some of the functionality we imported above, including the `Arena` class, which plays multiple games between two players. You can check out the code here if you want, but it is not necessary: https://github.com/raymondchua/nma_rl_games

```# Define the random player
player1 = RandomPlayer(game).play  # Player 1 is a random player
player2 = RandomPlayer(game).play  # Player 2 is a random player

# Define number of games
num_games = 20

# Start the competition
set_seed(seed=SEED)
arena = Arena.Arena(player1, player2 , game, display=None)  # To see the steps of the competition set "display=OthelloGame.display"
result = arena.playGames(num_games, verbose=False)  # return  ( number of games won by player1, num of games won by player2, num of games won by nobody)
print(f"\n\n{result}")
```
```(11, 9, 0)
```

The results are displayed in the following way: (Number of player 1 wins, number of player 2 wins, number of ties)
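The `Arena` source is not shown in this tutorial, so the following is only a rough sketch of what a game loop between two players can look like, not the repository's implementation. To stay self-contained it uses a toy take-away game (players alternately remove 1 to 3 stones from a pile, and whoever takes the last stone wins) instead of Othello. Every name in it is hypothetical.

```python
import random

def valid_moves(stones):
  """Moves allowed in the toy game: take 1, 2, or 3 stones (never more than remain)."""
  return [take for take in (1, 2, 3) if take <= stones]

def random_player(stones, rng):
  """Pick uniformly among the valid moves."""
  return rng.choice(valid_moves(stones))

def play_game(player_a, player_b, rng, start_stones=10):
  """Alternate turns until the pile is empty; return 1 if player_a wins, -1 otherwise."""
  players = {1: player_a, -1: player_b}
  stones, current = start_stones, 1
  while True:
    take = players[current](stones, rng)
    assert take in valid_moves(stones), "player returned an illegal move"
    stones -= take
    if stones == 0:     # whoever takes the last stone wins
      return current
    current = -current  # hand the turn to the other player

def play_games(player_a, player_b, num_games, seed=0):
  """Play several games and count the wins of each player."""
  rng = random.Random(seed)
  results = [play_game(player_a, player_b, rng) for _ in range(num_games)]
  return results.count(1), results.count(-1)

wins_a, wins_b = play_games(random_player, random_player, num_games=20)
print(wins_a, wins_b)
```

The structure is the part that carries over to Othello: ask the current player for a move, check that it is legal, apply it, test whether the game has ended, and otherwise switch players.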

## Section 1.4: Compute win rate for the random player (player 1)

```print(f"Number of games won by player1 = {result[0]}, "
f"Number of games won by player2 = {result[1]} out of {num_games} games")
win_rate_player1 = result[0]/num_games
print(f"\nWin rate for player1 over 20 games: {round(win_rate_player1*100, 1)}%")
```
```Number of games won by player1 = 11, Number of games won by player2 = 9 out of 20 games

Win rate for player1 over 20 games: 55.0%
```
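With only 20 games, a win rate of 55% says little about which player is stronger. As a rough check, the sketch below computes an approximate 95% interval for the win rate using the normal approximation to the binomial distribution. That approximation is an assumption made here for illustration (and it is crude for so few games); it is not part of the tutorial's code.

```python
import math

wins, games = 11, 20
win_rate = wins / games

# Standard error of a proportion under the normal approximation
std_err = math.sqrt(win_rate * (1 - win_rate) / games)
low, high = win_rate - 1.96 * std_err, win_rate + 1.96 * std_err

print(f"win rate = {win_rate:.2f}, approx. 95% interval = ({low:.2f}, {high:.2f})")
# win rate = 0.55, approx. 95% interval = (0.33, 0.77)
```

The interval comfortably contains 0.5, which is what we would expect when two random players face each other.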

Note: the random player is purely policy-based. It contains no estimates of value. Next we’ll see how to estimate and use value functions for game playing.

# Summary

In this tutorial, you have learned about the game of Othello, how to implement a game loop, and how to create a random player.