Rock paper scissors game example¶
This section describes a Python implementation of the Rock, Paper, Scissors game [1], and how to connect a variety of models to it. Rock, Paper, Scissors is a game in which two players compete, each choosing one of three possible moves. It is most interesting when iterated many times, as players may be able to learn their opponent’s biases and exploit them. Note that three outcomes are possible in any round: two in which one player wins and the other loses, and a third in which the players tie. This has implications for strategies since, for example, maximizing a player’s number of wins is not the same as minimizing that player’s number of losses. The game has been extensively studied.
Click here to download a zipped archive of the various files of code described in this section, along with a requirements.txt file. The recommended way of running these examples yourself is to create and activate a virtual environment using venv or conda, run pip install -r requirements.txt in it, and then, in that environment, run Python on the desired file.
The implementation of the game, which is completely independent of PyIBL, is in rps_game.py.
# Copyright 2024 Carnegie Mellon University

"""
A framework for playing the Rock, Paper, Scissors game. Players are instances of
subclasses of RPSPlayer.

Also included is a command line interface to run pairs of players, typically of two
different types, against one another for a given number of rounds and a number of
virtual participant pairs. The players are described as a module name, dot, and a
constructor name, optionally followed by parenthesized arguments to the constructor. For
example,
    python rps_game.py wsls.WinStayLoseShiftPlayer "rand.RandomPlayer(bias=(0.8, 0.1))"
"""

import click
from importlib import import_module
import matplotlib.pyplot as plt
from os import listdir
from os.path import splitext
import pandas as pd
from re import fullmatch
import sys
from tqdm import trange


MOVES = ["rock", "paper", "scissors"]
RESULTS = ["tie", "win", "lose"]


class RPSPlayer:
    """
    Subclass this abstract class to create a kind of player. The move() method must
    be overridden to respond with one of the possible MOVES. If desired, the result()
    method may also be overridden to inform the player of the result of the most recent
    round of play. Similarly the reset() method may be overridden; it is typically
    called between virtual games to reset the virtual participant if that participant
    retains state between rounds.
    """

    def __init__(self):
        self._awaiting_result = False

    def reset(self):
        """If there's anything to do this method must be overridden"""
        pass

    def do_reset(self):
        self.reset()
        self._awaiting_result = False

    def move(self):
        """Must be overridden by a subclass"""
        raise NotImplementedError("The move() method must be overridden")

    def do_move(self):
        if self._awaiting_result:
            raise RuntimeError("Cannot make a move until the previous round has been resolved")
        m = self.move()
        self._awaiting_result = True
        return m

    def result(self, opponent_move, outcome, wins, ties, losses):
        """If there's anything to do this method must be overridden"""
        pass

    def do_result(self, opponent_move, outcome, wins, ties, losses):
        if not self._awaiting_result:
            return
        self.result(opponent_move, outcome, wins, ties, losses)
        self._awaiting_result = False


class RPSGame:
    """
    Plays one or more games between two player objects, each of a given number of rounds.
    The player objects are reset between games. Returns a Pandas DataFrame collecting
    the results of all the rounds of all the games.
    """

    def __init__(self, player1, player2, rounds=1, participants=1):
        self._players = [player1, player2]
        self._rounds = rounds
        self._participants = participants

    def play(self, show_progress=False):
        results = []
        for participant in (trange(1, self._participants + 1) if show_progress
                            else range(1, self._participants + 1)):
            wins = [0, 0]
            for p in self._players:
                p.reset()
            for round in range(1, self._rounds + 1):
                moves = [p.do_move() for p in self._players]
                outcomes = [RESULTS[(MOVES.index(moves[i]) - MOVES.index(moves[(i + 1) % 2])) % 3]
                            for i in range(2)]
                for i in range(2):
                    if outcomes[i] == "win":
                        wins[i] += 1
                for p, om, oc, win, loss in zip(self._players,
                                                reversed(moves),
                                                outcomes,
                                                wins,
                                                reversed(wins)):
                    p.do_result(om, oc, win, round - (win + loss), loss)
                results.append([participant, round,
                                moves[0], moves[1],
                                outcomes[0], outcomes[1],
                                wins[0], wins[1]])
        return pd.DataFrame(results,
                            columns=("participant pair,round,"
                                     "player 1 move,player 2 move,"
                                     "player 1 outcome,player 2 outcome,"
                                     "player 1 total wins,player 2 total wins").split(","))


def plot_wins_losses(df, player_no=1, title=None, file=None):
    if file:
        df.to_csv(file)
    other_player = 1 if player_no == 2 else 2
    df["wins"] = df.apply(lambda x: x[f"player {player_no} total wins"] / x["round"], axis=1)
    df["losses"] = df.apply(lambda x: x[f"player {other_player} total wins"] / x["round"], axis=1)
    df["ties"] = 1 - (df["wins"] + df["losses"])
    rounds = max(df["round"])
    xmargin = rounds / 80
    df.groupby("round")[["wins", "ties", "losses"]].mean().plot(figsize=(10, 6),
                                                                color=("green", "gray", "firebrick"),
                                                                ylim=(-0.03, 1.03),
                                                                title=title,
                                                                xlabel="round",
                                                                xlim=(1 - xmargin, rounds + xmargin),
                                                                xticks=(range(1, rounds + 1) if rounds < 8
                                                                        else None),
                                                                ylabel="fraction winning/losing")
    plt.show()


def make_player(s):
    if m := fullmatch(r"(\w+)\.(\w+)(\(.*\))?", s):
        mname = m.group(1)
        cname = m.group(2)
        args = m.group(3)
        module = import_module(mname)
        c = cname + (args or "()")
        return eval(f"module.{c}"), c
    else:
        raise RuntimeError(f"Don't know how to create player {s}")

@click.command()
@click.option("--rounds", "-r", type=int, default=100,
              help="The number of rounds to play")
@click.option("--participants", "-p", type=int, default=200,
              help="The number of participant pairs to play")
@click.option("--file", "-f", type=str, default=None,
              help="A CSV file into which to write the results")
@click.argument("player1")
@click.argument("player2")
def main(player1, player2, rounds=1, participants=1, file=None, show_progress=None):
    if file and not splitext(file)[1]:
        file += ".csv"
    if show_progress is None:
        show_progress = not player1.startswith("human") and not player2.startswith("human")
    p1, n1 = make_player(player1)
    p2, n2 = make_player(player2)
    plot_wins_losses(RPSGame(p1, p2, rounds, participants).play(show_progress),
                     title=f"{n1} versus\n{n2}\n(averaged over {participants} participants)",
                     file=file)


if __name__ == '__main__':
    main()
This defines a class, RPSPlayer, which is subclassed to implement various player types. There is a further RPSGame class, which is constructed with two players that are instances of subclasses of RPSPlayer; the two players are typically, though not necessarily, of different subclasses. The RPSGame object calls the players repeatedly for a number of rounds, typically for several or many virtual participant pairs, gathers the results, and returns them as a Pandas DataFrame. The rps_game.py file also contains a function, plot_wins_losses(), that takes such a DataFrame and plots the wins and losses of the first player against the second using the Matplotlib library.
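The game can also be driven programmatically rather than from the command line. Here is a minimal sketch (not part of the downloadable archive), assuming rps_game.py and the rand.py module described below are importable:

# A minimal, hypothetical sketch of using RPSGame and plot_wins_losses directly
# from Python; it assumes rps_game.py and rand.py (described below) are importable.
from rps_game import RPSGame, plot_wins_losses
from rand import RandomPlayer

# Play 100 virtual participant pairs, each for 50 rounds, between an unbiased
# random player and one biased towards rock.
df = RPSGame(RandomPlayer(), RandomPlayer(bias=(0.8, 0.1)),
             rounds=50, participants=100).play()
print(df.head())

# Plot the mean fractions of wins, ties and losses for player 1, by round.
plot_wins_losses(df, title="RandomPlayer versus biased RandomPlayer")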
When creating a subclass of RPSPlayer, its move() method must be overridden to return one of the string values "rock", "paper" or "scissors". Usually the result() method is also overridden, allowing display or capture of the results of a round of the game, though for some very simple models this may not be necessary. Similarly the reset() method may be overridden if the model retains state from round to round that may need to be reset between virtual participants.
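For illustration, here is a minimal, hypothetical subclass (not part of the downloadable archive) that always plays rock:

# A minimal, hypothetical RPSPlayer subclass, shown only for illustration; it is
# not included in the downloadable archive.
from rps_game import RPSPlayer

class AlwaysRockPlayer(RPSPlayer):

    def move(self):
        # Must return one of the strings in rps_game.MOVES.
        return "rock"

    # result() and reset() are inherited from RPSPlayer and do nothing, which is
    # fine here since this player keeps no state between rounds.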
A simple human player is defined with the subclass HumanPlayer in human.py; note that HumanPlayer overrides move() to request a move from the player and return it, and overrides result() to display the results:
# Copyright 2024 Carnegie Mellon University

"""
An RPSPlayer subclass which simply solicits moves from a human player using the terminal,
and also prints to the terminal the results of each round of play.
"""

from rps_game import RPSPlayer, MOVES
import sys

def read_move():
    while True:
        print("Enter your next move: r(ock), p(aper) or s(cissors): ", end="", flush=True)
        s = sys.stdin.readline().strip()
        if s:
            for m in MOVES:
                if m.startswith(s):
                    return m


class HumanPlayer(RPSPlayer):

    def move(self):
        self._last_move = read_move()
        return self._last_move

    def result(self, opponent_move, outcome, wins, ties, losses):
        print(f"You played {self._last_move}, your opponent played {opponent_move}, you {outcome} "
              f"(so far you have won {wins}, tied {ties} and lost {losses})")
It would also be relatively straightforward to create a web-based interface as an RPSPlayer, which would allow a human versus human game to be played.
In addition to this human player, a number of models are implemented, several using PyIBL, and are described in subsequent subsections.
Finally, rps_game.py implements a command line interface, creating an RPSGame with players of designated types, playing potentially many rounds with many virtual pairs of those players, and then plotting the results of these games. Because running large numbers of participants can require long periods of time, particularly for some kinds of models and/or large numbers of rounds, a progress indicator is usually shown while results are being computed.
For example, to run 1,000 pairs of a WinStayLoseShiftPlayer (described further below) against a RandomPlayer (also described further below), the latter biased to return rock 80% of the time, paper 20%, and scissors never, with each pair playing 60 rounds, you could call:
python rps_game.py --participants=1000 --rounds=60 wsls.WinStayLoseShiftPlayer "rand.RandomPlayer(bias=(0.8, 0.2))"
This will result in display of a graph much like the following, though it may differ slightly in detail since both models are stochastic.
By supplying a --file argument you can also save the resulting DataFrame describing the full results into a CSV file. For example, if we add --file=results.csv the first few lines of the resulting file will look something like the following, though again differing in detail since the models are stochastic.
,participant pair,round,player 1 move,player 2 move,player 1 outcome,player 2 outcome,player 1 total wins,player 2 total wins
0,1,1,scissors,rock,lose,win,0,1
1,1,2,rock,rock,tie,tie,0,1
2,1,3,scissors,rock,lose,win,0,2
3,1,4,rock,rock,tie,tie,0,2
4,1,5,paper,rock,win,lose,1,2
5,1,6,paper,rock,win,lose,2,2
6,1,7,paper,rock,win,lose,3,2
7,1,8,paper,rock,win,lose,4,2
8,1,9,paper,rock,win,lose,5,2
9,1,10,paper,rock,win,lose,6,2
10,1,11,paper,rock,win,lose,7,2
11,1,12,paper,rock,win,lose,8,2
12,1,13,paper,rock,win,lose,9,2
13,1,14,paper,paper,tie,tie,9,2
…
Or, when imported into a spreadsheet:
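The saved CSV file can also be read back into a pandas DataFrame for further analysis. A minimal sketch, assuming a results.csv file produced by the --file=results.csv example above:

# A minimal sketch of reading the saved results back into pandas; assumes a
# results.csv file produced with --file=results.csv as above.
import pandas as pd

df = pd.read_csv("results.csv", index_col=0)

# Mean fraction of rounds won by player 1, per round, averaged over participant pairs.
win_fraction = (df["player 1 total wins"] / df["round"]).groupby(df["round"]).mean()
print(win_fraction.tail())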
Random model¶
Perhaps the simplest model, RandomPlayer in rand.py, simply chooses a move at random.
# Copyright 2024 Carnegie Mellon University

"""
A player who, by default, chooses moves uniformly at random. If bias is set it should
be a 2-tuple, the probabilities of rock or paper being chosen, and a correspondingly
skewed distribution is then used for picking moves at random.
"""

import random
from rps_game import RPSPlayer, MOVES

class RandomPlayer(RPSPlayer):

    def __init__(self, bias=None):
        super().__init__()
        if bias:
            self._rock, self._paper = bias
            if not (self._rock >= 0 and self._paper >= 0
                    and (self._rock + self._paper) <= 1):
                raise RuntimeError(f"bias ({bias}) should be a tuple of the probabilities that rock and paper are chosen")
        else:
            self._rock = None

    def move(self):
        if not self._rock:
            return random.choice(MOVES)
        r = random.random()
        if r <= self._rock:
            return "rock"
        elif r <= self._rock + self._paper:
            return "paper"
        else:
            return "scissors"
By default it uses a uniform random distribution to choose between the three moves. While this is “optimal” play in the sense that no other player can exploit it to win more than one third of the time, neither can the random player exploit other players’ biases.
If desired, the distribution used to choose moves may be skewed by setting the bias parameter, which should be a 2-tuple of the probabilities of choosing rock and paper. For example, to choose rock 80% of the time, paper never, and scissors 20% of the time, use RandomPlayer(bias=(0.8, 0)).
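A quick, hypothetical sanity check of such a biased distribution (not part of the downloadable archive) might look like this:

# A quick, hypothetical sanity check of the biased distribution; not part of the
# downloadable archive. Note that move() is called directly here, bypassing the
# do_move()/do_result() bookkeeping that RPSGame uses.
from collections import Counter
from rand import RandomPlayer

player = RandomPlayer(bias=(0.8, 0))   # rock 80%, paper never, scissors 20%
counts = Counter(player.move() for _ in range(10_000))
print(counts)   # expect roughly 8,000 rock, 2,000 scissors and no paper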
Win stay lose shift model¶
A strategy that has been widely studied is Win Stay, Lose Shift, implemented here by WinStayLoseShiftPlayer in wsls.py.
# Copyright 2024 Carnegie Mellon University

"""
A player using a win stay, lose shift strategy. After a loss the next move is an upgrade
of the previous one with probability upgrade, and otherwise a downgrade. The initial move
is chosen randomly. Note that with the default upgrade value of 0.5 the shift after a
loss is simply to a uniformly random choice between the other two moves.
"""

import random
from rps_game import RPSPlayer, MOVES

class WinStayLoseShiftPlayer(RPSPlayer):

    def __init__(self, upgrade=0.5):
        super().__init__()
        if upgrade < 0 or upgrade > 1:
            raise RuntimeError(f"the upgrade parameter ({upgrade}) should be a probability between zero and one, inclusive")
        self._upgrade = upgrade
        self.reset()

    def reset(self):
        self._last_move = None
        self._last_outcome = None

    def move(self):
        if self._last_outcome == "win":
            mv = self._last_move
        elif not self._last_outcome:
            mv = random.choice(MOVES)
        elif random.random() < self._upgrade:
            # upgrade
            mv = MOVES[(MOVES.index(self._last_move) + 1) % 3]
        else:
            # downgrade
            mv = MOVES[(MOVES.index(self._last_move) - 1) % 3]
        self._last_move = mv
        return mv

    def result(self, opponent_move, outcome, wins, ties, losses):
        self._last_outcome = outcome
When shifting after a loss a choice needs to be made between the two other possible moves. By default this choice is made uniformly at random. However, an upgrade or a downgrade can be made the preferred choice by setting the upgrade parameter to the probability that after a loss the next move is an upgrade from this player’s previous move; here an upgrade is the next move in the rock, paper, scissors cycle, that is, the move that would have beaten this player’s previous move, and a downgrade is the move that previous move would have beaten. Note that strategies where a loss is always followed by an upgrade or a downgrade can easily be used simply by setting upgrade to 1 or 0, respectively.
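The upgrade and downgrade arithmetic relies on the ordering of MOVES, in which each move beats the one preceding it, cyclically; a small illustrative snippet (not part of the archive):

# Illustration of the upgrade/downgrade index arithmetic used by
# WinStayLoseShiftPlayer; in MOVES each move beats the one preceding it, cyclically.
from rps_game import MOVES

for m in MOVES:
    upgrade = MOVES[(MOVES.index(m) + 1) % 3]     # the move that beats m
    downgrade = MOVES[(MOVES.index(m) - 1) % 3]   # the move that m beats
    print(f"{m}: upgrade -> {upgrade}, downgrade -> {downgrade}")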
For example, to play a win stay, lose shift strategy, with a loss resulting in a 20% chance of an upgrade and an 80% chance of a downgrade, against a biased random strategy favoring rock 80% of the time we could do:
python rps_game.py "wsls.WinStayLoseShiftPlayer(upgrade=0.2)" "rand.RandomPlayer(bias=(0.8, 0.15))"
This results in a plot similar to the following.
IBL models¶
Neither of the above models has any dependence upon PyIBL, but we can easily create PyIBL models too, which allow us to see how various IBL models fare playing Rock, Paper, Scissors. Three such models are in ibl.py.
# Copyright 2024 Carnegie Mellon University

from itertools import repeat
from pyibl import Agent
from rps_game import RPSPlayer, MOVES, RESULTS


"""
Several different IBL models for playing Rock, Paper, Scissors.
"""

class IBLPlayer(RPSPlayer):
    """
    A base class for players using a single PyIBL Agent. The attributes argument is
    passed to the Agent constructor. The payoffs argument is a 3-tuple of the payoffs
    for the possible results in the order lose, tie, win.
    """

    def __init__(self, attributes=None, payoffs=(-1, 0, 1), kwd={}):
        super().__init__()
        self._agent = Agent(attributes)
        for a in ("noise", "decay", "temperature"):
            if v := kwd.get(a):
                setattr(self._agent, a, v)
        # The _payoffs in the object are stored in a different order than the human
        # friendly order used for the parameter, instead matching the order in which
        # they appear in rps_game.RESULTS: tie, win, lose
        self._payoffs = payoffs[1:] + payoffs[:1]
        self._initial_payoff = 1.2 * max(payoffs)

    def reset(self):
        self._agent.reset(self._agent.default_utility in (None, False))

    def respond(self, outcome):
        self._agent.respond(self._payoffs[RESULTS.index(outcome)])


class BasicIBLPlayer(IBLPlayer):
    """
    A simplistic IBL model that simply notes how well we do for each possible move. While
    this might work against a not very smart opponent, say one that almost always picks
    rock, against players that are learning and responding from the history of our moves
    it is unlikely to do well.
    """

    def __init__(self, **kwd):
        super().__init__(kwd=kwd)
        self._agent.default_utility = self._initial_payoff
        self.reset()

    def reset(self):
        super().reset()
        self._agent.reset()

    def move(self):
        return self._agent.choose(MOVES)

    def result(self, opponent_move, outcome, wins, ties, losses):
        self.respond(outcome)


class ContextualIBLPlayer(IBLPlayer):
    """
    A slightly smarter IBL model that chooses its move based on what move our opponent
    made in the last round.
    """

    def __init__(self, **kwd):
        super().__init__(["move", "opponent_previous_move"], kwd=kwd)
        self._agent.default_utility = self._initial_payoff
        self.reset()

    def reset(self):
        super().reset()
        self._opponent_previous = None

    def move(self):
        return self._agent.choose(zip(MOVES, repeat(self._opponent_previous)))[0]

    def result(self, opponent_move, outcome, wins, ties, losses):
        self.respond(outcome)
        self._opponent_previous = opponent_move


NONE_MATCH = 0.5

def move_sim(x, y):
    if x == y:
        return 1
    elif x is None or y is None:
        return NONE_MATCH
    else:
        return 0

def shift(element, list):
    # Adds element to the front of the list, shifting the existing elements towards the
    # back, with the oldest element falling off the end of the list.
    list.pop()
    list.insert(0, element)


class LagIBLPlayer(IBLPlayer):
    """
    An IBL model that keeps track of the past N (= lag) moves of both our opponent and
    ourselves, thus capturing how our opponent responds to our moves. Because there
    are so many possibilities we use partial matching with not yet seen possibilities
    viewed as half as salient as those that match perfectly.
    """

    def __init__(self, lag=1, mismatch_penalty=1, **kwd):
        self._lag = lag
        move_attrs = (["opp-" + str(i) for i in range(1, lag + 1)] +
                      ["own-" + str(i) for i in range(1, lag + 1)])
        super().__init__(["move"] + move_attrs, kwd=kwd)
        self._agent.mismatch_penalty = mismatch_penalty
        self._agent.similarity(move_attrs, move_sim)
        self.reset()
        self._agent.populate(self.choices(), self._initial_payoff)

    def reset(self):
        self._opp_prev = [None] * self._lag
        self._own_prev = [None] * self._lag

    def choices(self):
        return [[move] + lst
                for move, lst in zip(MOVES, repeat(self._opp_prev + self._own_prev))]

    def move(self):
        self._move = self._agent.choose(self.choices())[0]
        return self._move

    def result(self, opponent_move, outcome, wins, ties, losses):
        self.respond(outcome)
        shift(opponent_move, self._opp_prev)
        shift(self._move, self._own_prev)
All three models use one PyIBL Agent for each player, and the code managing that agent is shared by making each model a subclass of a base IBLPlayer class. What attributes the created Agent has is set with the attributes argument to the IBLPlayer constructor. When making such an IBL model we must decide what the utilities are of winning, losing or tying a game. By default IBLPlayer sets these to 1 point for a win, 0 points for a tie, and -1 points for a loss. These values can be adjusted with the payoffs argument to the constructor. The default prefers winning, but views tying as preferable to losing; for some investigations it may be appropriate to aim solely at maximizing wins, or at minimizing losses, in which case payoffs values of something like (0, 0, 1) or (0, 1, 1), respectively, might be appropriate. The IBLPlayer class also makes available to its subclasses a suitable value, _initial_payoff, based on these possible payoffs, for prepopulated instances and the like to encourage exploration. In addition, the usual IBL parameters (noise, decay and blending temperature) can be modified simply by changing the parameters passed to the constructor.
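For example, those IBL parameters can be passed as keyword arguments when a player is constructed, whether programmatically or in the constructor expression given on the command line; the particular values below are arbitrary illustrations:

# Passing IBL parameters as keyword arguments to a player's constructor; noise,
# decay and temperature, if supplied, are applied to the underlying Agent. The
# values shown are arbitrary illustrations, not recommendations.
from ibl import ContextualIBLPlayer

player = ContextualIBLPlayer(noise=0.5, decay=0.75)

From the command line the same player could be requested as "ibl.ContextualIBLPlayer(noise=0.5, decay=0.75)".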
The simplest IBL model, BasicIBLPlayer, simply bases its choice on the history of results from having made each move, with no concern for the moves made in the preceding rounds. This can work against unsophisticated opponents, such as one that favors rock 60% of the time:
python rps_game.py ibl.BasicIBLPlayer "rand.RandomPlayer(bias=(0.6, 0.2))"
But this basic model is readily defeated by a more sophisticated opponent, such as one employing a win stay, lose shift strategy:
python rps_game.py ibl.BasicIBLPlayer "wsls.WinStayLoseShiftPlayer(upgrade=1)"
A smarter model can use a PyIBL attribute to base its move selection on the opponent’s previous move. This does better against the win stay, lose shift opponent above:
python rps_game.py ibl.ContextualIBLPlayer "wsls.WinStayLoseShiftPlayer(upgrade=1)"
But if the win stay, lose shift opponent randomly selects what it does on a loss, the ContextualIBLPlayer will fare poorly against it:
python rps_game.py ibl.ContextualIBLPlayer "wsls.WinStayLoseShiftPlayer(upgrade=0.5)"
To improve this IBL model further we want it to capture not just which of our moves works best given the opponent’s previous move, but also how the opponent responds to our own moves. In a LagIBLPlayer we capture the results based on both players’ past moves. By default it uses only the immediately preceding move of each player, but by setting the lag parameter this can be increased to multiple past moves.
Because a full set of combinations of both players’ past moves is large, especially if we consider multiple past moves, we use partial matching to give not yet seen combinations some value, though not as much as ones we have actually seen. The similarity function, move_sim(), returns 0.5 when either value being compared is None (that is, early in a game, before there is any history to match), allowing instances only some of whose attributes match to still contribute to the blended value, albeit with less weight than those that have more matching attributes.
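For example, to use this model while matching against the two preceding moves of each player, the lag parameter can be set when the player is constructed:

# Constructing a LagIBLPlayer that matches against the two preceding moves of
# each player; mismatch_penalty=1 is simply its default value, shown explicitly.
from ibl import LagIBLPlayer

player = LagIBLPlayer(lag=2, mismatch_penalty=1)

On the command line the same player can be requested as "ibl.LagIBLPlayer(lag=2)".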
With a lag of only one, matching just the preceding moves of both players, this model handily dominates the win stay, lose shift strategy, even with an evenly distributed random selection of upgrades/downgrades:
python rps_game.py ibl.LagIBLPlayer "wsls.WinStayLoseShiftPlayer(upgrade=0.5)"
Because most of these models, both the conventional ones and the IBL ones, have various parameters that can be modified, it is easy to compare the results of a variety of differing strategies.
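For instance, such a comparison could be scripted along the following lines; this is only a sketch, and the particular players, lags, and round counts are arbitrary illustrations:

# A sketch of comparing several parameterizations programmatically; the players,
# lags, and round counts used here are arbitrary illustrations.
from rps_game import RPSGame
from ibl import LagIBLPlayer
from wsls import WinStayLoseShiftPlayer

ROUNDS = 60
for lag in (1, 2):
    # Each pairing is run for 100 virtual participant pairs of ROUNDS rounds each.
    df = RPSGame(LagIBLPlayer(lag=lag),
                 WinStayLoseShiftPlayer(upgrade=0.5),
                 rounds=ROUNDS, participants=100).play()
    final = df[df["round"] == ROUNDS]
    print(f"lag={lag}: mean fraction of rounds won by the IBL player =",
          round((final["player 1 total wins"] / ROUNDS).mean(), 3))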