# Tutorial¶

Likely the easiest way to get started with PyIBL is by looking at some examples of its use. While much of what is in this chapter should be understandable even without much knowledge of Python, to write your own models you’ll need to know how to write Python code. If you are new to Python, a good place to start may be The Python Tutorial.

## A first example of using PyIBL¶

In the code blocks that follow, lines the user has typed begin with one of these three prompts:

$
>>>
...

Other lines are printed by Python or some other command. First we launch Python, and make PyIBL available to it. While the output here was captured on a Linux distribution in which you launch Python version 3 by typing python3, your installation may differ and you may launch it with python, py, or something else entirely; or you may start an interactive session in a completely different way using a graphical IDE.

$ python3
Python 3.4.1 (default, May 26 2014, 01:12:52)
[GCC 4.8.1] on linux
>>> import pyibl
>>>


Next we create an Agent, named 'My Agent'.

>>> a = pyibl.Agent('My Agent')
>>> a
<Agent My Agent (0, 0, 0)>
>>>


IBL Theory (IBLT) depends upon several parameters, which are typically adjusted by the modeler for each agent. We’ll set the Agent.noise to 1.5 and the Agent.decay to 5.

>>> a.noise
0.0
>>> a.noise = 1.5
>>> a.noise
1.5
>>> a.decay = 5
>>>


We also have to tell the agent what to do if we ask it to choose between options it has never previously experienced. One way to do this is to set the agent’s Agent.defaultUtility property.

>>> a.defaultUtility = 10.0
>>>


Now we can ask the agent to choose between two options, which we’ll describe using two strings. When you try this yourself you may get the opposite answer, as IBL Theory deliberately includes some randomness; this is particularly apparent in cases like this one, where there is as yet no reason to prefer one answer to the other.

>>> a.choose('The Green Button', 'The Red Button')
'The Green Button'
>>>


Now we return a response to the model. We’ll supply 1.0.

>>> a.respond(1.0)
>>>


Because that value is significantly less than the default utility, when we ask the agent to make the same choice again we expect it, with high probability, to pick the other button.

>>> a.choose('The Green Button', 'The Red Button')
'The Red Button'


We’ll give this one an even lower utility than we gave the first.

>>> a.respond(-2.0)
>>>


If we stick with these responses the model will tend to favor the first button selected. Again, your results may differ in detail because of randomness.

>>> a.choose('The Green Button', 'The Red Button')
'The Green Button'
>>> a.respond(1.0)
>>> a.choose('The Green Button', 'The Red Button')
'The Red Button'
>>> a.respond(-2.0)
>>> a.choose('The Green Button', 'The Red Button')
'The Green Button'
>>> a.respond(1.0)
>>> a.choose('The Green Button', 'The Red Button')
'The Green Button'
>>> a.respond(1.0)
>>> a.choose('The Green Button', 'The Red Button')
'The Green Button'
>>> a.respond(1.0)
>>> a.choose('The Green Button', 'The Red Button')
'The Green Button'
>>> a.respond(1.0)


But doing this by hand isn’t very useful for modeling. Instead, let’s write a function that asks the model to make this choice, and automates the reply.

>>> def chooseAndRespond():
...     result = a.choose('The Green Button', 'The Red Button')
...     if result == 'The Green Button':
...         a.respond(1.0)
...     else:
...         a.respond(-2.0)
...     return result
...
>>> chooseAndRespond()
'The Green Button'
>>>


Let’s ask the model to make this choice a thousand times, and see how many times it picks each button. But let’s do this from a clean slate, so before we run it we’ll call Agent.reset() to clear the agent’s memory.

>>> a.reset()
>>> results = { 'The Green Button' : 0, 'The Red Button' : 0 }
>>> for i in range(1000):
...     results[chooseAndRespond()] += 1
...
>>> results
{'The Red Button': 22, 'The Green Button': 978}
>>>


As we expected, the model prefers the green button but, because of randomness, does try the red one occasionally.

Now let’s add some other choices. We’ll write a more complicated function that takes a dictionary of choices and the responses they generate, and see how the model does. This will use a bit more Python; if it doesn’t make sense to you, investigate Python dictionaries, that is, objects of the dict class, and their dict.keys() method. The default utility is still 10, and so long as the responses are well below that we can reasonably expect the first few trials to sample them all before favoring those that give the best results; after the model gains more experience, it will favor whichever color or colors give the highest rewards.

>>> def chooseAndRespond(choices):
...     result = a.choose(*choices.keys())
...     a.respond(choices[result])
...     return result
...
>>> a.reset()
>>> choices = { 'green': -5, 'blue': 0, 'yellow': -4,
...             'red': -6, 'violet': 0 }
>>> results = {}
>>> for color in choices.keys():
...     results[color] = 0
...
>>> for i in range(5):
...     results[chooseAndRespond(choices)] += 1
...
>>> results
{'green': 1, 'blue': 1, 'yellow': 1, 'red': 1, 'violet': 1}
>>> for i in range(955):
...     results[chooseAndRespond(choices)] += 1
...
>>> results
{'green': 18, 'blue': 441, 'yellow': 19, 'red': 19, 'violet': 463}


The results are as we expected.

## Logging¶

PyIBL can write out detailed logs of what it’s doing, including both details of what it’s being asked to do and how it is responding, and details of its internal computations.

To demonstrate this we’ll define a model that chooses between two options, 'risky' and 'safe'. When it chooses 'safe' the reward is always 1.0; when it chooses 'risky' it will receive 4.0 thirty percent of the time, and nothing the rest of the time.
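Before running the model, note the expected values of the two options (a small arithmetic aside, not part of the model itself): the risky option is slightly better on average.

```python
# Expected value of each option under this payoff scheme.
ev_risky = 0.3 * 4.0 + 0.7 * 0.0   # receives 4.0 with probability 0.3
ev_safe = 1.0

print(ev_risky, ev_safe)   # 1.2 1.0
```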

>>> from pyibl import Agent
>>> from random import random
>>> a = Agent('risky/safe')
>>> a.noise=1.5
>>> a.decay=5.0
>>> a.defaultUtility=10.0
>>> def makeChoice():
...     result = a.choose('risky', 'safe')
...     if result == 'safe':
...         a.respond(1.0)
...     elif random() <= 0.3:
...         a.respond(4.0)
...     else:
...         a.respond(0.0)
...     return result
...
>>>


By default the log is written in comma-separated values (CSV) format, though that can easily be altered. While normally we would write the log to a file by calling Population.logToFile(), for this example we’ll just let it write to standard output, which is the default if we don’t specify a file.

To enable logging we set the agent’s Agent.logging property to a set of strings describing which fields we’d like to see in the log. The possible such fields, and the strings to request them, are described in the Reference section of this manual. For starters in this example we’ll just ask to see the trial number, the choice made, and the response received.

>>> a.logging = {'tTrial', 'tChoice', 'tResponse'}
>>> for i in range(10):
...     result = makeChoice()
...
tTrial,tChoice,tResponse
1,risky,0.0000
2,safe,1.0000
3,safe,1.0000
4,risky,0.0000
5,safe,1.0000
6,risky,0.0000
7,safe,1.0000
8,risky,4.0000
9,risky,0.0000
10,safe,1.0000
>>>


It first printed three column headings, which look like the strings used to request them, and then ten lines, one for each choose/respond cycle.

Now we’ll add two further fields to the log: the possible choices the agent is choosing between at each trial, and the blended values produced; at each trial it picks the choice with the larger blended value. We could just write out the whole set of attributes we want, but instead we’ll use Python’s set union operator to add them to the existing set.

>>> a.logging |= {'oDecision', 'oBlendedValue'}
>>> a.reset()
>>> for i in range(10):
...     result = makeChoice()
...
tTrial,tChoice,tResponse,oDecision,oBlendedValue
1,safe,1.0000,risky,10.0000
1,safe,1.0000,safe,10.0000
2,risky,0.0000,risky,10.0000
2,risky,0.0000,safe,2.8242
3,safe,1.0000,risky,0.5979
3,safe,1.0000,safe,1.0943
4,safe,1.0000,risky,0.0492
4,safe,1.0000,safe,1.1291
5,safe,1.0000,risky,0.1274
5,safe,1.0000,safe,2.8958
6,risky,4.0000,risky,2.9752
6,risky,4.0000,safe,1.0126
7,risky,0.0000,risky,4.1094
7,risky,0.0000,safe,1.1840
8,risky,4.0000,risky,3.1863
8,risky,4.0000,safe,1.4147
9,safe,1.0000,risky,2.8826
9,safe,1.0000,safe,3.0415
10,risky,4.0000,risky,1.9414
10,risky,4.0000,safe,1.0655
>>>


Now two lines are printed for every trial, one for each of the alternatives (oDecision) presented to the model to pick between. On both lines the choice (tChoice) that was actually made is the same, as is the actual payoff received (tResponse). The blended value shown is that of the corresponding alternative at this trial. Note that until a real result has been experienced the blended values are the default utility, but they then start to reflect the model’s actual experiences.

We’ll add a few further fields to the log, these describing the various instances present in the model at each trial. The fields we’ll add are the utility of each instance (iUtility), the trials at which it has been experienced (iOccurrences), its activation (iActivation; the components iActivationBase and iActivationNoise can also be requested, though we won’t here) at this trial, and its retrieval probability (iRetrievalProbability) at this trial. Note that the total activation (iActivation) is just the sum of the computed base activation and the noise.
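As a rough sketch of what lies behind these columns (the helper functions below are ours, not part of PyIBL’s API), the base activation follows the ACT-R base-level learning equation, the natural log of a sum of power-law-decayed recencies of the instance’s occurrences, and the noise added to it is conventionally a draw from a logistic distribution scaled by the agent’s noise parameter; PyIBL’s internals may differ in detail.

```python
import math
import random

def base_activation(occurrences, trial, decay):
    # ACT-R base-level learning: ln of summed power-law-decayed recencies.
    return math.log(sum((trial - t) ** -decay for t in occurrences))

def activation_noise(noise):
    # A logistic draw scaled by the noise parameter (a common ACT-R convention).
    p = random.random()
    return noise * math.log((1.0 - p) / p)

# An instance experienced at trials 1 and 4, evaluated at trial 5 with decay 5.0:
total = base_activation([1, 4], 5, 5.0) + activation_noise(1.5)
```

Note that an instance experienced only on the immediately preceding trial has base activation ln(1⁻ᵈ) = 0, so its logged iActivation would then be pure noise.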

>>> a.logging |= {'iUtility', 'iOccurrences', 'iActivation', 'iRetrievalProbability'}
>>> a.reset()
>>> for i in range(10):
...     result = makeChoice()
...
tTrial,tChoice,tResponse,oDecision,oBlendedValue,iUtility,iOccurrences,iActivation,iRetrievalProbability
1,safe,1.0000,risky,10.0000,10.0000,0,0.5182,1.0000
1,safe,1.0000,safe,10.0000,10.0000,0,-3.2292,1.0000
2,risky,4.0000,risky,10.0000,10.0000,0,-4.0032,1.0000
2,risky,4.0000,safe,3.3630,10.0000,0,-1.1146,0.2626
2,risky,4.0000,safe,3.3630,1.0000,1,1.0762,0.7374
3,risky,0.0000,risky,4.6605,10.0000,0,-4.5835,0.1101
3,risky,0.0000,risky,4.6605,4.0000,2,-0.1500,0.8899
3,risky,0.0000,safe,4.0279,10.0000,0,-5.6409,0.3364
3,risky,0.0000,safe,4.0279,1.0000,1,-4.2000,0.6636
4,safe,1.0000,risky,1.5001,10.0000,0,-7.2386,0.0319
4,safe,1.0000,risky,1.5001,4.0000,2,-2.5140,0.2954
4,safe,1.0000,risky,1.5001,0.0000,3,-0.7680,0.6728
4,safe,1.0000,safe,7.7438,10.0000,0,-3.0752,0.7493
4,safe,1.0000,safe,7.7438,1.0000,1,-5.3980,0.2507
5,risky,0.0000,risky,2.4222,10.0000,0,-9.1597,0.1167
5,risky,0.0000,risky,2.4222,4.0000,2,-7.0615,0.3138
5,risky,0.0000,risky,2.4222,0.0000,3,-5.7971,0.5695
5,risky,0.0000,safe,1.0420,10.0000,0,-9.1142,0.0047
5,risky,0.0000,safe,1.0420,1.0000,"1,4",2.2639,0.9953
6,safe,1.0000,risky,1.1608,10.0000,0,-11.1392,0.0082
6,safe,1.0000,risky,1.1608,4.0000,2,-3.7375,0.2696
6,safe,1.0000,risky,1.1608,0.0000,"3,5",-1.6476,0.7221
6,safe,1.0000,safe,9.0471,10.0000,0,-6.9011,0.8941
6,safe,1.0000,safe,9.0471,1.0000,"1,4",-11.4270,0.1059
7,risky,4.0000,risky,3.1665,10.0000,0,-10.1344,0.0823
7,risky,4.0000,risky,3.1665,4.0000,2,-5.9717,0.5858
7,risky,4.0000,risky,3.1665,0.0000,"3,5",-7.1772,0.3319
7,risky,4.0000,safe,1.6158,10.0000,0,-6.3350,0.0684
7,risky,4.0000,safe,1.6158,1.0000,"1,4,6",-0.7958,0.9316
8,risky,0.0000,risky,3.8982,10.0000,0,-10.0171,0.0052
8,risky,0.0000,risky,3.8982,4.0000,"2,7",1.0505,0.9615
8,risky,0.0000,risky,3.8982,0.0000,"3,5",-6.0847,0.0333
8,risky,0.0000,safe,1.0187,10.0000,0,-14.6961,0.0021
8,risky,0.0000,safe,1.0187,1.0000,"1,4,6",-1.5994,0.9979
9,safe,1.0000,risky,0.5787,10.0000,0,-9.7556,0.0060
9,safe,1.0000,risky,0.5787,4.0000,"2,7",-3.2219,0.1298
9,safe,1.0000,risky,0.5787,0.0000,"3,5,8",0.8006,0.8643
9,safe,1.0000,safe,1.2674,10.0000,0,-10.6717,0.0297
9,safe,1.0000,safe,1.2674,1.0000,"1,4,6",-3.2763,0.9703
10,safe,1.0000,risky,0.8400,10.0000,0,-11.4662,0.0153
10,safe,1.0000,risky,0.8400,4.0000,"2,7",-6.3433,0.1716
10,safe,1.0000,risky,0.8400,0.0000,"3,5,8",-3.0439,0.8130
10,safe,1.0000,safe,1.1395,10.0000,0,-10.9373,0.0155
10,safe,1.0000,safe,1.1395,1.0000,"1,4,6,9",-2.1313,0.9845
>>>


Note that the number of lines at each trial increases as further instances are added to memory. Each instance corresponds to a unique combination of a possible decision (oDecision) and a utility (iUtility).
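To tie these columns together, here is a minimal sketch (the helper functions are ours, not PyIBL API) of how activations become retrieval probabilities, and those probabilities a blended value: IBLT commonly uses a Boltzmann softmax over activations with temperature σ·√2, where σ is the noise parameter, and the blended value is then the probability-weighted average of the instance utilities. Plugging in the trial-2 'safe' activations from the log above reproduces the logged values to within rounding:

```python
import math

def retrieval_probabilities(activations, noise):
    # Boltzmann softmax over activations with temperature noise * sqrt(2),
    # a common IBLT convention.
    tau = noise * math.sqrt(2.0)
    weights = [math.exp(a / tau) for a in activations]
    total = sum(weights)
    return [w / total for w in weights]

def blended_value(probabilities, utilities):
    # Probability-weighted average of the utilities of matching instances.
    return sum(p * u for p, u in zip(probabilities, utilities))

# Trial 2, 'safe' option: the default-utility instance (utility 10,
# activation -1.1146) and the experienced instance (utility 1,
# activation 1.0762), with noise 1.5.
probs = retrieval_probabilities([-1.1146, 1.0762], 1.5)
bv = blended_value(probs, [10.0, 1.0])
# probs ≈ [0.2626, 0.7374] and bv ≈ 3.3630, matching the logged
# iRetrievalProbability and oBlendedValue columns.
```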

Often a modeler will want to run together several experiments that are similar but differ from one another in some way, and it is useful to combine the results of all of them into a single log, with some field distinguishing them. PyIBL provides the Population.block property to address this need. The modeler can set this property to any desired value and, by asking for a tBlock column in the log, see what value it had for each of the log entries.

For example, imagine we wanted to run the above experiment for four different participants, each with its own set of experiences and memories, for five trials apiece. To do this we simply run the overall five-trial experiment in an outer loop, once through for each participant, calling Agent.reset() at the beginning of each outer loop to clear the agent’s memory. To see which participant is which in the log we also set the block property at the beginning of each outer loop, and include a tBlock column in the log.

>>> a.logging = { 'tBlock', 'tTrial', 'tChoice', 'tResponse' }
>>> for participant in range(1,5):
...     a.reset()
...     a.block = "participant-" + str(participant)
...     for trial in range(5):
...         result = makeChoice()
...
tBlock,tTrial,tChoice,tResponse
participant-1,1,risky,4.0000
participant-1,2,safe,1.0000
participant-1,3,risky,0.0000
participant-1,4,safe,1.0000
participant-1,5,safe,1.0000
participant-2,1,safe,1.0000
participant-2,2,risky,4.0000
participant-2,3,risky,0.0000
participant-2,4,safe,1.0000
participant-2,5,risky,0.0000
participant-3,1,safe,1.0000
participant-3,2,risky,0.0000
participant-3,3,safe,1.0000
participant-3,4,risky,4.0000
participant-3,5,risky,0.0000
participant-4,1,risky,0.0000
participant-4,2,safe,1.0000
participant-4,3,safe,1.0000
participant-4,4,risky,0.0000
participant-4,5,safe,1.0000
>>>


## Multiple agents and populations¶

A PyIBL model is not limited to using just one agent. It can use as many as the modeler wishes. For this example we’ll have ten players competing for rewards. Each player, at each turn, will pick either 'safe' or 'risky'. Any player picking 'safe' will always receive 1 point. All those players picking 'risky' will share 7 points evenly between them; if fewer than seven players pick 'risky' those that did will receive more than if they had picked 'safe', but if more than seven players pick 'risky' they will do worse.
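The incentive structure is worth a quick arithmetic check (the helper name below is ours, just for illustration): with n players choosing 'risky' each of them receives 7/n, so exactly seven risky players is the break-even point against the safe payoff of 1.

```python
def risky_payoff(n):
    # Payoff to each of n players who chose 'risky'; 'safe' always pays 1.
    return 7 / n

# Fewer than 7 risky players: risky beats safe. More than 7: it does worse.
print(risky_payoff(5), risky_payoff(7), risky_payoff(10))   # 1.4 1.0 0.7
```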

>>> from pyibl import Agent
>>> agents = [Agent('Agent-' + str(i+1)) for i in range(10)]
>>> for a in agents:
...     a.noise = 2.0
...     a.decay = 4.5
...     a.defaultUtility = 20
...
>>>
>>> def playRound():
...     choices = [a.choose('safe', 'risky') for a in agents]
...     risky = [a for a, c in zip(agents, choices) if c == 'risky']
...     reward = 7 / len(risky) if risky else 0   # guard: no player picked risky
...     for a in agents:
...         if a in risky:
...             a.respond(reward)
...         else:
...             a.respond(1)
...
>>>


We could enable logging for each of these ten agents, and write ten separate log files. But with multiple agents it is usually more convenient to have all the agents write to the same log file, and include an agent’s name in each line that it writes. To make this easy we collect all our agents into a PyIBL Population, by setting each agent’s Agent.population property, and configure the population’s log. Because this model will emit a lot of log information we write the log to a file.

>>> from pyibl import Population
>>> p = Population()
>>> for a in agents:
...     a.population = p
...
>>> for i in range(100):
...     playRound()
...
>>> p.logging = {'tAgent', 'tTrial', 'tChoice', 'tResponse', 'oDecision', 'oBlendedValue'}
>>> p.logToFile('logfile.csv')
'logfile.csv'
>>> for i in range(100):
...     playRound()
...
>>>


The log file is about two thousand lines long, so let’s just look at the first twenty and last twenty lines of it.

$ head -n 20 logfile.csv
tAgent,tTrial,tChoice,tResponse,oDecision,oBlendedValue
Agent-1,1,safe,1,safe,20.0000
Agent-1,1,safe,1,risky,20.0000
Agent-2,1,safe,1,safe,20.0000
Agent-2,1,safe,1,risky,20.0000
Agent-3,1,safe,1,safe,20.0000
Agent-3,1,safe,1,risky,20.0000
Agent-4,1,safe,1,safe,20.0000
Agent-4,1,safe,1,risky,20.0000
Agent-5,1,risky,3.5000,safe,20.0000
Agent-5,1,risky,3.5000,risky,20.0000
Agent-6,1,safe,1,safe,20.0000
Agent-6,1,safe,1,risky,20.0000
Agent-7,1,safe,1,safe,20.0000
Agent-7,1,safe,1,risky,20.0000
Agent-8,1,safe,1,safe,20.0000
Agent-8,1,safe,1,risky,20.0000
Agent-9,1,risky,3.5000,safe,20.0000
Agent-9,1,risky,3.5000,risky,20.0000
Agent-10,1,safe,1,safe,20.0000
$ tail -n 20 logfile.csv
Agent-1,100,risky,0.8750,safe,1.1613
Agent-1,100,risky,0.8750,risky,1.3327
Agent-2,100,safe,1,safe,2.9859
Agent-2,100,safe,1,risky,1.4528
Agent-3,100,risky,0.8750,safe,1.1551
Agent-3,100,risky,0.8750,risky,1.5729
Agent-4,100,risky,0.8750,safe,1.0251
Agent-4,100,risky,0.8750,risky,1.0777
Agent-5,100,risky,0.8750,safe,1.0058
Agent-5,100,risky,0.8750,risky,1.0116
Agent-6,100,risky,0.8750,safe,1.0234
Agent-6,100,risky,0.8750,risky,1.3029
Agent-7,100,risky,0.8750,safe,1.0082
Agent-7,100,risky,0.8750,risky,1.0312
Agent-8,100,risky,0.8750,safe,1.0963
Agent-8,100,risky,0.8750,risky,1.4120
Agent-9,100,risky,0.8750,safe,1.0147
Agent-9,100,risky,0.8750,risky,1.1045
Agent-10,100,safe,1,safe,1.0527
Agent-10,100,safe,1,risky,1.0045
$


We see that there are two lines, one corresponding to each of the potential decisions, for each agent at each trial. At the beginning the blended values are dominated by the default utility, but at the end they are dominated by experience, which for most of the agents has shown that the risky choice is usually a little bit better, though at this particular trial those making the risky bet lost. That the blended value for the safe bet is so high for Agent-2, and a little higher than you might expect for Agent-10, is the result of noise in the activation computation, which on this trial paid off for these two agents!
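Since the log is ordinary CSV it is easy to post-process with standard tools. As a sketch, Python’s csv module can tally the choices made; to keep the example self-contained we parse a small inline sample with the same columns rather than the real logfile.csv:

```python
import csv
import io
from collections import Counter

# A small inline sample with the same columns as logfile.csv above.
sample = """tAgent,tTrial,tChoice,tResponse,oDecision,oBlendedValue
Agent-1,100,risky,0.8750,safe,1.1613
Agent-1,100,risky,0.8750,risky,1.3327
Agent-2,100,safe,1,safe,2.9859
Agent-2,100,safe,1,risky,1.4528
"""

choices = Counter()
for row in csv.DictReader(io.StringIO(sample)):
    if row['tChoice'] == row['oDecision']:   # count each agent's trial once
        choices[row['tChoice']] += 1

print(dict(choices))   # {'risky': 1, 'safe': 1}
```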

## Situations¶

Often we want the choices an agent makes to be informed by a possibly varying situation, which can affect which decision is better at a given time. For a PyIBL agent a situation is simply an ordered collection of values, called “attributes”. When we create an agent, besides supplying its name, we can supply the names of one or more attributes that describe the situations relevant to its choices. Then, when asking the agent to make a choice, instead of passing just the decisions as arguments to Agent.choose(), we pass SituationDecision objects to the agent, each combining a potential decision with its current situation.

In fact, whenever we call an agent’s choose method we are always passing it SituationDecisions. The choose method is simply clever: when it sees a choice that is not a SituationDecision it creates one, with the provided value as the decision and all the attributes of the situation set to None. While we’ve been using strings as decisions, any hashable Python object, except None, can be a decision. Attribute values in situations can likewise be any hashable Python object.

As a concrete example, we’ll have our agent decide which of two buttons, 'left' or 'right', to push. But one of these buttons will be illuminated. Which is illuminated at any time is decided randomly, with even chances for either. Pushing the left button earns a base reward of 1, and the right button of 2; but when a button is illuminated its reward is doubled.

We’ll define our agent to have situations with one attribute, 'illuminated', which says whether or not the button which a SituationDecision represents is illuminated.

>>> from pyibl import Agent
>>> from random import random
>>> a = Agent('My Agent', 'illuminated')
>>> a.noise = 0.5
>>> a.decay = 3.0
>>> a.defaultUtility = 5
>>>


We’ll create two SituationDecisions, one for each button. Because a SituationDecision should have exactly the attributes an agent is expecting, all SituationDecisions are actually associated with an agent, and are created by calling that agent’s Agent.situationDecision() method.

>>> sds = {'left': a.situationDecision('left', False),
...        'right': a.situationDecision('right', False)}
>>>


While we’ve created them both with the button un-illuminated, the code that actually runs the experiment will turn one of them on, randomly. Note that while we pass SituationDecisions to the choose method, choose returns only the decision it has selected, not the whole SituationDecision.

>>> def punchButton():
...     if random() <= 0.5:
...         sds['left'].set('illuminated', True)
...         sds['right'].set('illuminated', False)
...     else:
...         sds['left'].set('illuminated', False)
...         sds['right'].set('illuminated', True)
...     result = a.choose(*sds.values())
...     reward = 1 if result == 'left' else 2
...     if sds[result].get('illuminated'):
...         reward *= 2
...     a.respond(reward)
...     return result
...
>>>


Now we’ll run it 2,000 times, counting how many times each button is picked, and how many times an illuminated button is picked.

>>> results = {'left': 0, 'right': 0, True: 0, False: 0}
>>> for i in range(2000):
...     result = punchButton()
...     results[result] += 1
...     results[sds[result].get('illuminated')] += 1
...
>>> print(results)
{False: 486, True: 1514, 'left': 505, 'right': 1495}
>>>