
Tutorial 2: Game Set-Up and Random Player

Week 3, Day 5: Reinforcement Learning for Games and Deep Learning Thinking 3

By Neuromatch Academy

Content creators: Mandana Samiei, Raymond Chua, Tim Lilicrap, Blake Richards

Content reviewers: Arush Tagade, Lily Cheng, Melvin Selim Atay, Kelson Shilling-Scrivo

Content editors: Melvin Selim Atay, Spiros Chavlis, Gunnar Blohm

Production editors: Namrata Bafna, Gagana B, Spiros Chavlis


Tutorial Objectives

In this tutorial, you will learn how to implement a game loop and create a random player. In future tutorials, you will be training other types of players using reinforcement learning.

The specific objectives for this tutorial:

  • Understand the format of two-player games, Othello specifically

  • Understand how to create random players

Tutorial slides

These are the slides for the videos in the tutorial. If you want to locally download the slides, click here.


Setup

Install dependencies

# @title Install dependencies
!pip install coloredlogs --quiet
!pip3 install vibecheck datatops --quiet

from vibecheck import DatatopsContentReviewContainer

def content_review(notebook_section: str):
    return DatatopsContentReviewContainer(
        "",  # No text prompt
        notebook_section,
        {
            "url": "https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab",
            "name": "public_testbed",
            "user_key": "3zg0t05r",
        },
    ).render()
# Imports
import os
import torch
import random
import logging
import coloredlogs
import numpy as np
import torch.optim as optim

log = logging.getLogger(__name__)
coloredlogs.install(level='INFO')  # Change this to DEBUG to see more info.

Set random seed

Executing set_seed(seed=seed) sets the random seed.

# @title Set random seed

# @markdown Executing `set_seed(seed=seed)` sets the random seed.

# For DL it's critical to set the random seed so that students can have a
# baseline to compare their results to expected results.
# Read more here: https://pytorch.org/docs/stable/notes/randomness.html

# Call `set_seed` function in the exercises to ensure reproducibility.
import random
import torch

def set_seed(seed=None, seed_torch=True):
  """
  Function that controls randomness. NumPy and random modules must be imported.

  Args:
    seed : Integer
      A non-negative integer that defines the random state. Default is `None`.
    seed_torch : Boolean
      If `True` sets the random seed for pytorch tensors, so pytorch module
      must be imported. Default is `True`.

  Returns:
    Nothing.
  """
  if seed is None:
    seed = np.random.choice(2 ** 32)
  random.seed(seed)
  np.random.seed(seed)
  if seed_torch:
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.cuda.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

  print(f'Random seed {seed} has been set.')


# In case that `DataLoader` is used
def seed_worker(worker_id):
  """
  DataLoader will reseed workers following randomness in
  multi-process data loading algorithm.

  Args:
    worker_id: integer
      ID of subprocess to seed. 0 means that
      the data will be loaded in the main process
      Refer: https://pytorch.org/docs/stable/data.html#data-loading-randomness for more details

  Returns:
    Nothing
  """
  worker_seed = torch.initial_seed() % 2**32
  np.random.seed(worker_seed)
  random.seed(worker_seed)
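
To see why seeding matters for the experiments below, here is a minimal sketch using only the standard-library random module. The helper name draw_moves and the choice of 37 actions are illustrative assumptions, not part of the course code. Reseeding with the same value reproduces the same sequence of draws.

```python
import random


def draw_moves(seed, n_draws=5, n_actions=37):
    """Draw `n_draws` pseudo-random action indices after seeding with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.randrange(n_actions) for _ in range(n_draws)]


first = draw_moves(seed=2021)
second = draw_moves(seed=2021)

print(first == second)  # the same seed reproduces the same sequence
```

The same idea is what lets two students running the notebook with SEED = 2021 compare their results.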

Set device (GPU or CPU). Execute set_device()

# @title Set device (GPU or CPU). Execute `set_device()`
# especially if torch modules used.

# Inform the user if the notebook uses GPU or CPU.

def set_device():
  """
  Set the device. CUDA if available, CPU otherwise

  Args:
    None

  Returns:
    device: string
      "cuda" if a GPU is available, "cpu" otherwise
  """
  device = "cuda" if torch.cuda.is_available() else "cpu"
  if device != "cuda":
    print("WARNING: For this notebook to perform best, "
        "if possible, in the menu under `Runtime` -> "
        "`Change runtime type.`  select `GPU` ")
  else:
    print("GPU is enabled in this notebook.")

  return device
SEED = 2021
set_seed(seed=SEED)
DEVICE = set_device()
Random seed 2021 has been set.
GPU is enabled in this notebook.

Download the modules

# @title Download the modules

# @markdown Run this cell!

# @markdown Download from OSF. The original repo is https://github.com/raymondchua/nma_rl_games.git

import os, io, sys, shutil, zipfile
from urllib.request import urlopen

# download from github repo directly
#!git clone git://github.com/raymondchua/nma_rl_games.git --quiet
REPO_PATH = 'nma_rl_games'

if os.path.exists(REPO_PATH):
  download_string = "Redownloading"
  shutil.rmtree(REPO_PATH)
else:
  download_string = "Downloading"

zipurl = 'https://osf.io/kf4p9/download'
print(f"{download_string} and unzipping the file... Please wait.")
with urlopen(zipurl) as zipresp:
  with zipfile.ZipFile(io.BytesIO(zipresp.read())) as zfile:
    zfile.extractall()
print("Download completed.")

print(f"Add the {REPO_PATH} in the path and import the modules.")
# add the repo in the path
sys.path.append('nma_rl_games/alpha-zero')

# @markdown Import modules designed for use in this notebook
import Arena

from utils import *
from Game import Game
from MCTS import MCTS
from NeuralNet import NeuralNet

# from othello.OthelloPlayers import *
from othello.OthelloLogic import Board
# from othello.OthelloGame import OthelloGame
# from othello.pytorch.NNet import NNetWrapper as NNet
Downloading and unzipping the file... Please wait.
Download completed.
Add the nma_rl_games in the path and import the modules.

The hyperparameters used throughout the notebook.

args = dotdict({
    'numIters': 1,            # In training, number of iterations = 1000 and num of episodes = 100
    'numEps': 1,              # Number of complete self-play games to simulate during a new iteration.
    'tempThreshold': 15,      # To control exploration and exploitation
    'updateThreshold': 0.6,   # During arena playoff, new neural net will be accepted if threshold or more of games are won.
    'maxlenOfQueue': 200,     # Number of game examples to train the neural networks.
    'numMCTSSims': 15,        # Number of game moves for MCTS to simulate.
    'arenaCompare': 10,       # Number of games to play during arena play to determine if new net will be accepted.
    'cpuct': 1,
    'maxDepth':5,             # Maximum number of rollouts
    'numMCsims': 5,           # Number of monte carlo simulations
    'mc_topk': 3,             # Top k actions for monte carlo rollout

    'checkpoint': './temp/',
    'load_model': False,
    'load_folder_file': ('/dev/models/8x100x50','best.pth.tar'),
    'numItersForTrainExamplesHistory': 20,

    # Define neural network arguments
    'lr': 0.001,               # lr: Learning Rate
    'dropout': 0.3,
    'epochs': 10,
    'batch_size': 64,
    'device': DEVICE,
    'num_channels': 512,
})
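
The dotdict helper comes from the downloaded utils module, whose source is not shown in this tutorial. Presumably it is a dictionary whose keys can also be read as attributes, so that args.lr works as well as args['lr']. The class below is a hypothetical stand-in that illustrates the idea; it is not the course's implementation.

```python
class DotDict(dict):
    """Dictionary whose keys can also be read as attributes (illustrative stand-in)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods still work
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


demo_args = DotDict({'numMCTSSims': 15, 'cpuct': 1, 'lr': 0.001})

print(demo_args.numMCTSSims)  # attribute-style access
print(demo_args['lr'])        # ordinary dictionary access still works
```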

Section 0: Introduction

Video 0: Introduction

Submit your feedback

# @title Submit your feedback
content_review("W3D5_RL_for_games_intro")

Section 1: Create a game/agent loop for RL

Time estimate: ~20mins

Video 1: A game loop for RL

Submit your feedback

# @title Submit your feedback
content_review("W3D5_game_loop_for_RL")

Section 1.1: Introduction to OthelloGame

Othello is a board game played by two players on a board of 64 squares arranged in an eight-by-eight grid, with 64 playing pieces that are black on one side and white on the other.

Setup: The board starts with 2 black discs and 2 white discs at the centre of the board. The black discs lie on the North-East to South-West diagonal, and the white discs lie on the North-West to South-East diagonal. Each player gets 32 discs, and black always starts the game.

Game rules:

  • Players take turns placing a single disc at a time.

  • A move is made by placing a disc of the player’s color on the board to surround (i.e. “outflank”) discs of the opposite color. In other words, the player with black discs must place a disc so that there is a straight line between the newly placed disc and another black disc, with one or more white discs between them.

  • Surrounded discs get flipped (i.e. change color).

  • If a player does not have a valid move (they cannot place a disc to outflank the opponent’s discs), they pass on their turn.

  • A player cannot voluntarily forfeit their turn.

  • When neither player can make a valid move, the game ends.
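
The outflanking rule can be sketched in a few lines of plain Python. This is an illustrative sketch, not the Board class used later in this tutorial: the function names are hypothetical, the board is a list of lists on a 6x6 grid, and discs are encoded as +1 and -1 with 0 for an empty square.

```python
# The eight straight-line directions (row step, column step)
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def initial_board(n=6):
    """Empty n x n board with the four starting discs in the centre."""
    board = [[0] * n for _ in range(n)]
    mid = n // 2
    board[mid - 1][mid - 1] = -1
    board[mid][mid] = -1
    board[mid - 1][mid] = 1
    board[mid][mid - 1] = 1
    return board


def flips_for_move(board, row, col, player):
    """Return the opponent discs outflanked if `player` places at (row, col)."""
    n = len(board)
    if board[row][col] != 0:
        return []  # occupied squares are never valid
    flipped = []
    for d_row, d_col in DIRECTIONS:
        line = []
        r, c = row + d_row, col + d_col
        # Walk over a run of opponent discs
        while 0 <= r < n and 0 <= c < n and board[r][c] == -player:
            line.append((r, c))
            r, c = r + d_row, c + d_col
        # The run counts only if it is closed off by one of the player's own discs
        if line and 0 <= r < n and 0 <= c < n and board[r][c] == player:
            flipped.extend(line)
    return flipped


def legal_moves(board, player):
    """All empty squares where `player` would flip at least one disc."""
    n = len(board)
    return [(r, c) for r in range(n) for c in range(n)
            if flips_for_move(board, r, c, player)]


start = initial_board()
print(legal_moves(start, 1))           # [(1, 2), (2, 1), (3, 4), (4, 3)]
print(flips_for_move(start, 1, 2, 1))  # [(2, 2)]
```

A player with an empty legal_moves list has to pass, and the game ends when that is true for both players.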

There are nice rules/diagrams here if useful: https://www.eothello.com/. You can play an example Othello game there if you like!

Note: we will use a 6x6 board to speed computations up

Exercise Goal: Learn how to set up a game environment with multiple players for reinforcement learning experiments.

Exercise:

  • Build an agent that plays random moves

  • Connect it to the Othello game

  • Generate games including wins and losses

Execute the following code to enable the OthelloGame class. This class represents a game board and has methods such as getInitBoard to create the initial board, getValidMoves to return the valid moves, and other helpful functionality to play the game. You do not need to understand every line of code in this class, but try to get a sense of the available methods.

class OthelloGame(Game):
  """
  Instantiate Othello Game
  """
  square_content = {
      -1: "X",
      +0: "-",
      +1: "O"
      }

  @staticmethod
  def getSquarePiece(piece):
    return OthelloGame.square_content[piece]

  def __init__(self, n):
    self.n = n

  def getInitBoard(self):
    # Return initial board (numpy board)
    b = Board(self.n)
    return np.array(b.pieces)

  def getBoardSize(self):
    # (a,b) tuple
    return (self.n, self.n)

  def getActionSize(self):
    # Return number of actions, n is the board size and +1 is for no-op action
    return self.n*self.n + 1

  def getCanonicalForm(self, board, player):
    # Return state if player==1, else return -state if player==-1
    return player*board

  def stringRepresentation(self, board):
    return board.tobytes()

  def stringRepresentationReadable(self, board):
    board_s = "".join(self.square_content[square] for row in board for square in row)
    return board_s

  def getScore(self, board, player):
    b = Board(self.n)
    b.pieces = np.copy(board)
    return b.countDiff(player)

  @staticmethod
  def display(board):
    n = board.shape[0]
    print("   ", end="")
    for y in range(n):
      print(y, end=" ")
    print("")
    print("-----------------------")
    for y in range(n):
      print(y, "|", end="")    # Print the row
      for x in range(n):
        piece = board[y][x]    # Get the piece to print
        print(OthelloGame.square_content[piece], end=" ")
      print("|")
    print("-----------------------")

  @staticmethod
  def displayValidMoves(moves):
    # Display possible moves. The last entry of `moves` is the pass action,
    # so the board side length n is recovered from the remaining n*n entries
    # instead of relying on a global `board` variable.
    n = int(round((len(moves) - 1) ** 0.5))
    A = np.reshape(moves[0:-1], (n, n))
    print("  ")
    print("possible moves")
    print("   ", end="")
    for y in range(n):
      print(y, end=" ")
    print("")
    print("-----------------------")
    for y in range(n):
      print(y, "|", end="")    # Print the row
      for x in range(n):
        piece = A[y][x]    # Get the piece to print
        print(OthelloGame.square_content[piece], end=" ")
      print("|")
    print("-----------------------")

  def getNextState(self, board, player, action):
    """
    Helper function to apply a move
    If player takes action on board, return next (board, player).
    The action must be a valid move.

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      player: Integer
        ID of current player
      action: Integer
        Flat index of the move (row * n + column); n * n is the pass action

    Returns:
      (board, player) tuple signifying next state
    """
    if action == self.n*self.n:
      return (board, -player)
    b = Board(self.n)
    b.pieces = np.copy(board)
    move = (int(action/self.n), action%self.n)
    b.execute_move(move, player)
    return (b.pieces, -player)

  def getValidMoves(self, board, player):
    """
    Helper function to get the valid moves for a player

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      player: Integer
        ID of current player

    Returns:
      valids: np.ndarray
        Binary vector of length n * n + 1; entry i is 1 if action i is valid.
        The last entry is the pass action, set only if no other move is legal.
    """
    valids = [0]*self.getActionSize()
    b = Board(self.n)
    b.pieces = np.copy(board)
    legalMoves =  b.get_legal_moves(player)
    if len(legalMoves)==0:
      valids[-1]=1
      return np.array(valids)
    for x, y in legalMoves:
      valids[self.n*x+y]=1
    return np.array(valids)

  def getGameEnded(self, board, player):
    """
    Helper function to signify if game has ended

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      player: Integer
        ID of current player

    Returns:
      0 if not ended, 1 if `player` has more discs than the opponent,
      -1 otherwise (note that a tie also returns -1)
    """
    b = Board(self.n)
    b.pieces = np.copy(board)
    if b.has_legal_moves(player):
      return 0
    if b.has_legal_moves(-player):
      return 0
    if b.countDiff(player) > 0:
      return 1
    return -1

  def getSymmetries(self, board, pi):
    """
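    Get mirror/rotational configurations of board

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]
      pi: np.ndarray
        Policy vector of length n * n + 1 (the last entry is the pass action)

    Returns:
      l: list
        (board, pi) tuples, one per rotation by a multiple of 90 degrees,
        each with and without a left-right flip (8 in total)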
    """
    assert(len(pi) == self.n**2+1)  # 1 for pass
    pi_board = np.reshape(pi[:-1], (self.n, self.n))
    l = []

    for i in range(1, 5):
      for j in [True, False]:
        newB = np.rot90(board, i)
        newPi = np.rot90(pi_board, i)
        if j:
          newB = np.fliplr(newB)
          newPi = np.fliplr(newPi)
        l += [(newB, list(newPi.ravel()) + [pi[-1]])]
    return l
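
One method worth a closer look is getCanonicalForm, which multiplies the board by the player ID. The effect is that the player to move always sees their own discs as +1, which is presumably why the random player later in this tutorial queries getValidMoves(board, 1) regardless of which side it plays. A minimal sketch of the idea on a plain list-of-lists board (the function name is illustrative):

```python
def canonical_form(board, player):
    """Return the board from `player`'s point of view (own discs become +1)."""
    return [[player * square for square in row] for row in board]


# The four centre squares of the starting position: -1 is "X", +1 is "O"
centre = [[-1, 1],
          [1, -1]]

print(canonical_form(centre, 1))   # [[-1, 1], [1, -1]]  (unchanged for player 1)
print(canonical_form(centre, -1))  # [[1, -1], [-1, 1]]  (colours swapped for player -1)
```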

Below, we initialize and view a board.

# Set the random seed for reproducibility
set_seed(seed=SEED)

# Set up the game
game = OthelloGame(6)

# Get the initial board
board = game.getInitBoard()

# Display the board
game.display(board)

# Observe the game board size
print(f'Board size = {game.getBoardSize()}')

# Observe the action size
print(f'Action size = {game.getActionSize()}')
Random seed 2021 has been set.
   0 1 2 3 4 5 
-----------------------
0 |- - - - - - |
1 |- - - - - - |
2 |- - X O - - |
3 |- - O X - - |
4 |- - - - - - |
5 |- - - - - - |
-----------------------
Board size = (6, 6)
Action size = 37

Now let’s look at the valid actions for player 1 (the circles). game.getValidMoves returns a vector of 1s and 0s with one entry for every position on the board, plus a final entry for the pass action; a 1 indicates a valid place to put a new disc. Note that it returns a flat array (the first 36 entries could be reshaped into the board shape).

We also have a method to visualize the valid actions. Compare the valid actions to the board above.

# Get valid moves
valids = game.getValidMoves(board, 1)
print(valids)

# Visualize the moves
game.displayValidMoves(valids)
[0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0]
  
possible moves
   0 1 2 3 4 5 
-----------------------
0 |- - - - - - |
1 |- - O - - - |
2 |- O - - - - |
3 |- - - - O - |
4 |- - - O - - |
5 |- - - - - - |
-----------------------
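
The flat vector can also be decoded back into board coordinates by hand. The sketch below assumes the same encoding used by getNextState above (action = row * n + column, with the last entry reserved for the pass action); the helper name decode_valids is hypothetical.

```python
n = 6

# Rebuild the vector printed above: 37 entries with 1s at indices 8, 13, 22, 27
valids = [0] * (n * n + 1)
for idx in (8, 13, 22, 27):
    valids[idx] = 1


def decode_valids(valids, n):
    """Return (list of (row, col) moves, whether passing is the valid action)."""
    squares = valids[:-1]  # the first n*n entries are board squares
    moves = [divmod(i, n) for i, v in enumerate(squares) if v == 1]
    return moves, valids[-1] == 1  # the last entry is the pass action


moves, must_pass = decode_valids(valids, n)
print(moves)      # [(1, 2), (2, 1), (3, 4), (4, 3)]
print(must_pass)  # False
```

These four coordinates match the positions marked in the "possible moves" display.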

Section 1.2: Create a random player

Let’s start by setting up the game loop with a random player, so that we can test the game loop and make sure it works correctly.

To do so, we will first implement a random player in 3 steps:

  1. determine which moves are possible at all

  2. assign a uniform probability to each move (remember, this is a random player): 1/N for N valid moves

  3. randomly choose a move from the possible moves
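
The three steps can be sketched on a plain validity vector before wiring them into the class. This sketch uses the standard-library random.choices as a stand-in for the NumPy sampling used in the exercise, and the function name pick_random_action is hypothetical. The validity vector is hard-coded here; in the exercise it would come from game.getValidMoves.

```python
import random


def pick_random_action(valids, rng=random):
    """Pick a uniformly random index among the entries of `valids` equal to 1."""
    n_valid = sum(valids)
    # Step 2: probability 1/N for each of the N valid moves, 0 for the others
    prob = [v / n_valid for v in valids]
    # Step 3: sample one action index according to those probabilities
    return rng.choices(range(len(valids)), weights=prob, k=1)[0]


# Step 1 (assumed done): the valid moves for the starting position on a 6x6 board
valids = [0] * 37
for idx in (8, 13, 22, 27):
    valids[idx] = 1

rng = random.Random(2021)
actions = [pick_random_action(valids, rng) for _ in range(1000)]
print(sorted(set(actions)))  # only indices of valid moves can appear
```

Because invalid moves get probability 0, the player can never select them, and the pass action is chosen only when it is the sole valid entry.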

Coding Exercise 1.2: Implement a random player

class RandomPlayer():
  """
  Simulates Random Player
  """

  def __init__(self, game):
    self.game = game

  def play(self, board):
    """
    Simulates game play

    Args:
      board: np.ndarray
        Board of size n x n [6x6 in this case]

    Returns:
      a: int
        Randomly chosen move
    """
    #################################################
    ## TODO for students: ##
    ## 1. Please compute the valid moves using getValidMoves() and the game class self.game. ##
    ## 2. Compute the probability over actions.##
    ## 3. Pick a random action based on the probability computed above.##
    # Fill out function and remove ##
    raise NotImplementedError("Implement the random player")
    #################################################

    # Compute the valid moves using getValidMoves()
    valids = self.game.getValidMoves(board, 1)

    # Compute the probability of each move being played (random player means this should
    # be uniform for valid moves, 0 for others)
    prob = ...

    # Pick a random action based on the probabilities (hint: np.random.choice is useful)
    a = ...

    return a

Click for solution

Submit your feedback

# @title Submit your feedback
content_review("W3D5_random_player")

Section 1.3: Create two random agents to play against each other

Now we create two random players and let them play against one another a number of times. We will use some of the functionality we imported above, including the Arena class, which allows multiple games to be played. You can check out the code here if you want, but it is not necessary: https://github.com/raymondchua/nma_rl_games
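
To give a feel for what such a game loop involves, here is a minimal sketch under stated assumptions. It is not the Arena source: TakeStones is a hypothetical toy game that only mimics the method names of OthelloGame, and play_game and play_games are illustrative helpers. The loop hands the current player the canonical board, applies the chosen action, and switches sides until the game ends.

```python
import random


class TakeStones:
    """Toy game (hypothetical): take 1 or 2 stones; taking the last stone wins."""

    def __init__(self, n_stones=7):
        self.n_stones = n_stones

    def getInitBoard(self):
        return self.n_stones

    def getCanonicalForm(self, board, player):
        return board  # the pile looks the same to both players

    def getValidMoves(self, board, player):
        # Action 0 takes one stone, action 1 takes two stones
        return [1 if board >= 1 else 0, 1 if board >= 2 else 0]

    def getNextState(self, board, player, action):
        return board - (action + 1), -player

    def getGameEnded(self, board, player):
        # With an empty pile the player to move has lost
        return -1 if board == 0 else 0


def play_game(game, player1, player2):
    """Play one game; return 1 if player1 wins and -1 if player2 wins."""
    players = {1: player1, -1: player2}
    cur_player = 1
    board = game.getInitBoard()
    while game.getGameEnded(board, cur_player) == 0:
        canonical = game.getCanonicalForm(board, cur_player)
        action = players[cur_player](canonical)
        assert game.getValidMoves(canonical, 1)[action] == 1, "invalid move"
        board, cur_player = game.getNextState(board, cur_player, action)
    # The result is relative to cur_player; convert it to player1's perspective
    return cur_player * game.getGameEnded(board, cur_player)


def play_games(game, player1, player2, num_games):
    """Return (player1 wins, player2 wins, draws) over `num_games` games."""
    results = [play_game(game, player1, player2) for _ in range(num_games)]
    return results.count(1), results.count(-1), results.count(0)


def make_random_player(game, rng):
    def play(board):
        valids = game.getValidMoves(board, 1)
        return rng.choice([a for a, v in enumerate(valids) if v == 1])
    return play


toy_game = TakeStones(n_stones=7)
rng = random.Random(2021)
toy_result = play_games(toy_game, make_random_player(toy_game, rng),
                        make_random_player(toy_game, rng), num_games=20)
print(toy_result)  # (player1 wins, player2 wins, draws)
```

Players are plain callables that map a board to an action, which is why the cell below passes RandomPlayer(game).play rather than the player object itself.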

# Define the random player
player1 = RandomPlayer(game).play  # Player 1 is a random player
player2 = RandomPlayer(game).play  # Player 2 is a random player

# Define number of games
num_games = 20

# Start the competition
set_seed(seed=SEED)
arena = Arena.Arena(player1, player2 , game, display=None)  # To see the steps of the competition set "display=OthelloGame.display"
result = arena.playGames(num_games, verbose=False)  # return  ( number of games won by player1, num of games won by player2, num of games won by nobody)
print(f"\n\n{result}")
Random seed 2021 has been set.
(11, 9, 0)

The results are displayed in the following way: (Number of player 1 wins, number of player 2 wins, number of ties)

Section 1.4: Compute win rate for the random player (player 1)

print(f"Number of games won by player1 = {result[0]}, "
      f"Number of games won by player2 = {result[1]} out of {num_games} games")
win_rate_player1 = result[0]/num_games
print(f"\nWin rate for player1 over {num_games} games: {round(win_rate_player1*100, 1)}%")
Number of games won by player1 = 11, Number of games won by player2 = 9 out of 20 games

Win rate for player1 over 20 games: 55.0%
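
Since both players are random, a 55% win rate over 20 games should not be read as one player being stronger. As a rough check, the sketch below computes the binomial standard error of the win rate, assuming each game is an independent trial with no draws; the helper name is hypothetical.

```python
import math


def win_rate_with_error(wins, num_games):
    """Return the win rate and its binomial standard error."""
    p = wins / num_games
    se = math.sqrt(p * (1 - p) / num_games)
    return p, se


rate, se = win_rate_with_error(wins=11, num_games=20)
print(f"win rate = {rate:.2f} +/- {se:.2f}")  # win rate = 0.55 +/- 0.11
```

An uncertainty of about 11 percentage points means that 11 wins out of 20 is entirely compatible with an even match; more games would be needed to detect a real difference.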

Note: the random player is purely policy-based. It contains no estimates of value. Next we’ll see how to estimate and use value functions for game playing.


Summary

In this tutorial, you have learned about the Othello game, how to implement a game loop, and how to create a random player.