Mousetracker Data #3

In this post, I'm extracting some additional information about the stimuli so that we can run further analysis on participants' choices. For further background, please refer to the first post in this series.

In [4]:
import os
import re
import pandas as pd

data = pd.read_csv('./data cleaned/%s' % os.listdir('./data cleaned')[0])
data.head()

includelist = pd.read_csv('n=452 subjectID.csv', header = None)
includelist = includelist[0].values
In [5]:
data = data.loc[data['subject'].isin(includelist)]

Overview

The basic idea is that participants were shown one of two sets of poker chips that they would split between themselves and another person close to them. In every case, they could make a choice that was either selfish (gave more to themselves), or altruistic (gave more to their partner).

We wanted to know how much utility a choice would have to give a participant before they chose selfishly. In other words, how much more would the selfish choice have to give me before I stop being altruistic?

In [7]:
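# Pull the chip counts out of each image filename, e.g. '~S9_O7.jpg' -> ['9', '7'].
# Each cell ends up holding a two-element array: [self chips, other chips].
data['RESPONSE1'] = [x for x in data['resp_1'].str.extract(r'..(\d*)_.(\d*)', expand = True).values]
data['RESPONSE2'] = [x for x in data['resp_2'].str.extract(r'..(\d*)_.(\d*)', expand = True).values]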

The columns 'resp_1' and 'resp_2' hold the filenames of the images shown for each choice. The naming convention is 'S' followed by the number of chips the self gets, then 'O' followed by the number of chips the other person gets. For example, '~S9_O7.jpg' gives 9 chips to the self and 7 to the other person.
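To make the naming convention concrete, here is a minimal sketch that parses a single filename. The helper name `parse_chips` and the stricter pattern `S(\d+)_O(\d+)` are my own assumptions for illustration; they are not the code used in the analysis.

```python
import re

# Hypothetical helper (not part of the original analysis): read the
# self/other chip counts from a stimulus filename such as '~S9_O7.jpg'.
CHIP_PATTERN = re.compile(r'S(\d+)_O(\d+)')

def parse_chips(filename):
    """Return (self_chips, other_chips) as ints, or None if nothing matches."""
    match = CHIP_PATTERN.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(parse_chips('~S9_O7.jpg'))    # (9, 7)
print(parse_chips('~S10_O15.jpg'))  # (10, 15)
```

Anchoring the pattern on the literal 'S' and 'O' means a filename that does not follow the convention returns None instead of a partial match.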

We also have two other columns of interest: 'response' and 'error'. 'response' records which option the participant chose (1 for the left option, 2 for the right), and 'error' records whether that choice was selfish: '0' is a selfish choice and '1' is an altruistic choice. This will become important shortly.

In [8]:
data.head()
Out[8]:
subject trial stimfile condition code resp_1 resp_2 response distractor error ... Y_95 Y_96 Y_97 Y_98 Y_99 Y_100 Y_101 Unnamed: 226 RESPONSE1 RESPONSE2
0 75870 2 NaN 1 smglvsslgm4 ~S9_O7.jpg ~S8_O12.jpg 1 2 0 ... 1.1826 1.1825 1.1824 1.1874 1.1875 1.1875 1.1875 NaN [9, 7] [8, 12]
1 75870 3 NaN 1 smglvsslgm6 ~S10_O15.jpg ~S12_O8.jpg 2 1 0 ... 1.1823 1.1837 1.1865 1.1876 1.1875 1.1875 1.1875 NaN [10, 15] [12, 8]
2 75870 7 NaN 1 smglvsslgm18 ~S12_O7.jpg ~S8_O13.jpg 1 2 0 ... 1.1950 1.1935 1.1924 1.1925 1.1925 1.1925 1.1925 NaN [12, 7] [8, 13]
3 75870 9 NaN 1 smglvsslgm19 ~S9_O12.jpg ~S12_O8.jpg 2 1 0 ... 1.2999 1.3000 1.3000 1.3000 1.3000 1.3000 1.3000 NaN [9, 12] [12, 8]
4 75870 14 NaN 2 smglvsslgm9 ~S9_O7.jpg ~S10_O5.jpg 1 1 1 ... 1.2575 1.2577 1.2565 1.2543 1.2519 1.2500 1.2500 NaN [9, 7] [10, 5]

5 rows × 229 columns

Algorithm

This got a little more complicated because the software we were using to capture mouse telemetry data randomized the position of the stimuli (i.e., whether the selfish choice was on the left or right of the screen was randomized), as it should.

This information is not a feature/variable on its own, but can be inferred from the 'response' and 'error' variables. If a participant chose the option on the left (response == 1), and that was coded as an 'error', it means the selfish choice was on the right (because 'errors' are altruistic choices).
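The inference can be sketched as a small function. The name `selfish_side` and the 'left'/'right' return values are hypothetical and only for illustration; the coding of 'response' (1 = left, 2 = right) and 'error' (0 = selfish, 1 = altruistic) follows the description in this post.

```python
def selfish_side(response, error):
    """Infer which side of the screen showed the selfish option.

    response: 1 if the left option was chosen, 2 if the right option was chosen
    error:    0 if the chosen option was selfish, 1 if it was altruistic
    Returns 'left', 'right', or None for unexpected codes.
    """
    if response not in (1, 2) or error not in (0, 1):
        return None
    chose_left = response == 1
    chose_selfish = error == 0
    # The selfish option is on the left when the participant either chose
    # left and was selfish, or chose right and was altruistic.
    return 'left' if chose_left == chose_selfish else 'right'

print(selfish_side(1, 1))  # right
print(selfish_side(1, 0))  # left
```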

There were two pieces of information that we wanted to extract:

  1. How many more chips did the selfish choice give vs. the altruistic choice?
  2. How many more chips did the selfish choice give the group vs. the altruistic choice?

For example, let's look at the first row of the data:

In [16]:
data.head(1)
Out[16]:
subject trial stimfile condition code resp_1 resp_2 response distractor error ... Y_95 Y_96 Y_97 Y_98 Y_99 Y_100 Y_101 Unnamed: 226 RESPONSE1 RESPONSE2
0 75870 2 NaN 1 smglvsslgm4 ~S9_O7.jpg ~S8_O12.jpg 1 2 0 ... 1.1826 1.1825 1.1824 1.1874 1.1875 1.1875 1.1875 NaN [9, 7] [8, 12]

1 rows × 229 columns

In this case, our participant chose the left option, which was the selfish choice. This choice gave him/her 1 more chip (9-8), and gave the group 4 fewer chips ((9+7)-(8+12)).
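The same arithmetic, written out as a tiny function for checking other rows by hand. The name `selfish_advantage` is hypothetical; each option is passed as a (self chips, other chips) pair.

```python
def selfish_advantage(selfish, altruistic):
    """Return (extra chips for self, extra chips for the group) of the
    selfish option relative to the altruistic one."""
    self_more = selfish[0] - altruistic[0]
    group_more = sum(selfish) - sum(altruistic)
    return self_more, group_more

# First row: the selfish option is S9_O7, the altruistic option is S8_O12.
print(selfish_advantage((9, 7), (8, 12)))  # (1, -4)
```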

I didn't have time to think of an efficient way to do this for all the rows at once, so I decided to brute force it. First, I created a smaller dataframe:

In [10]:
tempdata = pd.DataFrame(columns = ('RESPONSE','ERROR','RESPONSE1','RESPONSE2'))
tempdata['RESPONSE'] = data['response']
tempdata['ERROR'] = data['error']
tempdata['RESPONSE1'] = data['RESPONSE1']
tempdata['RESPONSE2'] = data['RESPONSE2']
tempdata['SELFISHCHOICESELFMORE'] = 0
tempdata['SELFISHCHOICEGROUPMORE'] = 0

This algorithm iterates through each row in the dataframe, checks whether the selfish choice is on the left or the right, and does the math described above.

In [11]:
SELFISHCHOICESELFMORE = []
SELFISHCHOICEGROUPMORE = []

for _, row in tempdata.iterrows():
    response, error = row['RESPONSE'], row['ERROR']
    # Selfish option on the left: chose left and was selfish, or chose right and was altruistic.
    if ((response == 1) and (error == 0)) or ((response == 2) and (error == 1)):
        selfish, altruistic = row['RESPONSE1'], row['RESPONSE2']
    # Selfish option on the right: the mirror image of the case above.
    elif ((response == 2) and (error == 0)) or ((response == 1) and (error == 1)):
        selfish, altruistic = row['RESPONSE2'], row['RESPONSE1']
    else:
        selfish = altruistic = None
    try:
        # Convert everything before appending so a failure never leaves the lists out of step.
        self_s, other_s = int(selfish[0]), int(selfish[1])
        self_a, other_a = int(altruistic[0]), int(altruistic[1])
        SELFISHCHOICESELFMORE.append(self_s - self_a)
        SELFISHCHOICEGROUPMORE.append((self_s + other_s) - (self_a + other_a))
    except (TypeError, ValueError, IndexError):
        # Unparseable filename or unexpected response/error code: record a missing
        # value so both lists stay the same length as the dataframe.
        SELFISHCHOICESELFMORE.append(None)
        SELFISHCHOICEGROUPMORE.append(None)
In [12]:
tempdata = tempdata.drop(['RESPONSE1','RESPONSE2', 'RESPONSE', 'ERROR'], axis = 1)
tempdata['SELFISHCHOICESELFMORE'] = SELFISHCHOICESELFMORE
tempdata['SELFISHCHOICEGROUPMORE'] = SELFISHCHOICEGROUPMORE
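For reference, the row-by-row loop could probably be replaced with a vectorized version. The sketch below is not what was run for this post. It uses a toy dataframe `df` built from three of the rows shown earlier, a stricter `S(\d+)_O(\d+)` pattern, and the variable names `left`, `right`, `sign`, and `valid`, all of which are my own assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring three of the rows shown earlier in this post.
df = pd.DataFrame({
    'resp_1': ['~S9_O7.jpg', '~S10_O15.jpg', '~S9_O7.jpg'],
    'resp_2': ['~S8_O12.jpg', '~S12_O8.jpg', '~S10_O5.jpg'],
    'response': [1, 2, 1],
    'error': [0, 0, 1],
})

# Column 0 = chips for self, column 1 = chips for other.
left = df['resp_1'].str.extract(r'S(\d+)_O(\d+)').astype(float)
right = df['resp_2'].str.extract(r'S(\d+)_O(\d+)').astype(float)

# The selfish option is on the left when "chose left" and "chose selfish" agree.
selfish_left = (df['response'] == 1) == (df['error'] == 0)
sign = np.where(selfish_left, 1, -1)

# Rows with unexpected codes are set to NaN rather than silently assigned a side.
valid = df['response'].isin([1, 2]) & df['error'].isin([0, 1])

df['SELFISHCHOICESELFMORE'] = (sign * (left[0] - right[0])).where(valid)
df['SELFISHCHOICEGROUPMORE'] = (
    sign * ((left[0] + left[1]) - (right[0] + right[1]))
).where(valid)

print(df[['SELFISHCHOICESELFMORE', 'SELFISHCHOICEGROUPMORE']])
```

The idea is to compute the left-minus-right difference once and flip its sign when the selfish option was on the right, so no per-row branching is needed.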

Concatenating and writing to a csv:

In [14]:
outdata = pd.concat([data,tempdata], axis = 1)
outdata.to_csv('combineddata3.csv')