# Mousetracker Data #3


In this post, I'm extracting some additional information about the stimuli so that we can run further analysis on participants' choices. For further background, please refer to the first post in this series.

In [4]:
import os
import re
import pandas as pd

data = pd.read_csv('./data cleaned/%s' % os.listdir('./data cleaned')[0])

# includelist is a one-column DataFrame of subject IDs to keep;
# the cell that loads it isn't shown here
includelist = includelist[0].values

In [5]:
data = data.loc[data['subject'].isin(includelist)]


### Overview

The basic idea is that participants were shown one of two sets of poker chips that they would split between themselves and another person close to them. In every case, they could make a choice that was either selfish (gave more to themselves), or altruistic (gave more to their partner).

We wanted to know how much utility a choice would have to offer a participant before it made them selfish. In other words, how much more would a selfish choice have to give me before I stopped being altruistic?

In [7]:
data['RESPONSE1'] = [x for x in data['resp_1'].str.extract(r'..(\d*)_.(\d*)', expand = True).values]
data['RESPONSE2'] = [x for x in data['resp_2'].str.extract(r'..(\d*)_.(\d*)', expand = True).values]


The columns 'resp_1' and 'resp_2' are image files of the choices shown. The naming convention is as follows: 'S' for how many chips the self gets, followed by that number, and 'O' for how many chips the other person gets.
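To make the convention concrete, here is the extraction applied to two hypothetical filenames (the same regex idea as the cell above):

```python
import re

# Two hypothetical filenames following the 'S<self>_O<other>' convention
fnames = ['~S9_O7.jpg', '~S8_O12.jpg']
pairs = [tuple(int(g) for g in re.search(r'S(\d+)_O(\d+)', f).groups())
         for f in fnames]
print(pairs)  # [(9, 7), (8, 12)]
```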

We also have two other columns of interest: 'response', and 'error'. 'response' is which option the participant chose, and 'error' is whether or not that was the selfish choice. In this case, '0' is a selfish choice and '1' is an altruistic choice. This will become important shortly.

In [8]:
data.head()

Out[8]:
subject trial stimfile condition code resp_1 resp_2 response distractor error ... Y_95 Y_96 Y_97 Y_98 Y_99 Y_100 Y_101 Unnamed: 226 RESPONSE1 RESPONSE2
0 75870 2 NaN 1 smglvsslgm4 ~S9_O7.jpg ~S8_O12.jpg 1 2 0 ... 1.1826 1.1825 1.1824 1.1874 1.1875 1.1875 1.1875 NaN [9, 7] [8, 12]
1 75870 3 NaN 1 smglvsslgm6 ~S10_O15.jpg ~S12_O8.jpg 2 1 0 ... 1.1823 1.1837 1.1865 1.1876 1.1875 1.1875 1.1875 NaN [10, 15] [12, 8]
2 75870 7 NaN 1 smglvsslgm18 ~S12_O7.jpg ~S8_O13.jpg 1 2 0 ... 1.1950 1.1935 1.1924 1.1925 1.1925 1.1925 1.1925 NaN [12, 7] [8, 13]
3 75870 9 NaN 1 smglvsslgm19 ~S9_O12.jpg ~S12_O8.jpg 2 1 0 ... 1.2999 1.3000 1.3000 1.3000 1.3000 1.3000 1.3000 NaN [9, 12] [12, 8]
4 75870 14 NaN 2 smglvsslgm9 ~S9_O7.jpg ~S10_O5.jpg 1 1 1 ... 1.2575 1.2577 1.2565 1.2543 1.2519 1.2500 1.2500 NaN [9, 7] [10, 5]

5 rows × 229 columns

### Algorithm

This got a little more complicated because the software we used to capture mouse telemetry randomized the position of the stimuli (i.e., whether the selfish choice appeared on the left or the right of the screen), as it should.

This information is not a feature/variable on its own, but can be inferred from the 'response' and 'error' variables. If a participant chose the option on the left (response == 1), and that was coded as an 'error', it means the selfish choice was on the right (because 'errors' are altruistic choices).
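That inference can be captured in a small helper (a sketch; the function name is mine, not part of the original pipeline):

```python
def selfish_side(response, error):
    """Infer which side held the selfish option.

    response: 1 = chose left option, 2 = chose right option
    error:    0 = made the selfish choice, 1 = made the altruistic choice
    """
    if (response == 1 and error == 0) or (response == 2 and error == 1):
        return 'left'
    return 'right'

print(selfish_side(1, 1))  # chose left, coded as an 'error' -> selfish was on the right
```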

There were two pieces of information that we wanted to extract:

1. How many more chips did the selfish choice give the participant vs. the altruistic choice?
2. How many more chips did the selfish choice give the group (participant + partner) vs. the altruistic choice?

For example, let's look at the first row of data:

In [16]:
data.head(1)

Out[16]:
subject trial stimfile condition code resp_1 resp_2 response distractor error ... Y_95 Y_96 Y_97 Y_98 Y_99 Y_100 Y_101 Unnamed: 226 RESPONSE1 RESPONSE2
0 75870 2 NaN 1 smglvsslgm4 ~S9_O7.jpg ~S8_O12.jpg 1 2 0 ... 1.1826 1.1825 1.1824 1.1874 1.1875 1.1875 1.1875 NaN [9, 7] [8, 12]

1 rows × 229 columns

In this case, our participant chose the left option, which was the selfish choice. This choice gave him/her 1 more chip (9-8), and gave the group 4 fewer chips ((9+7)-(8+12)).
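The arithmetic for this row, spelled out:

```python
# Values from the first row: left option ~S9_O7, right option ~S8_O12
self_left, other_left = 9, 7
self_right, other_right = 8, 12

chips_more_for_self = self_left - self_right                              # 1
chips_more_for_group = (self_left + other_left) - (self_right + other_right)  # -4
print(chips_more_for_self, chips_more_for_group)
```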

I didn't have time to think of an efficient way to do this for all the rows at once, so I decided to brute force it. First, I created a smaller dataframe:

In [10]:
tempdata = pd.DataFrame(columns = ('RESPONSE','ERROR','RESPONSE1','RESPONSE2'))
tempdata['RESPONSE'] = data['response']
tempdata['ERROR'] = data['error']
tempdata['RESPONSE1'] = data['RESPONSE1']
tempdata['RESPONSE2'] = data['RESPONSE2']
tempdata['SELFISHCHOICESELFMORE'] = 0
tempdata['SELFISHCHOICEGROUPMORE'] = 0


The algorithm iterates through each row in the data frame, checks whether the selfish choice is on the left or the right, and does the math described above.

In [11]:
SELFISHCHOICESELFMORE = []
SELFISHCHOICEGROUPMORE = []

for row in tempdata.iterrows():
    response, error = row[1]['RESPONSE'], row[1]['ERROR']
    left, right = row[1]['RESPONSE1'], row[1]['RESPONSE2']  # [self, other] chip counts
    # selfish option on the left: chose left without error, or chose right with error
    if ((response == 1) and (error == 0)) or ((response == 2) and (error == 1)):
        try:
            SELFISHCHOICESELFMORE.append(int(left[0]) - int(right[0]))
            SELFISHCHOICEGROUPMORE.append((int(left[0]) + int(left[1])) - (int(right[0]) + int(right[1])))
        except (TypeError, ValueError):
            SELFISHCHOICESELFMORE.append(None)
            SELFISHCHOICEGROUPMORE.append(None)
    # selfish option on the right: the mirror image of the above
    elif ((response == 2) and (error == 0)) or ((response == 1) and (error == 1)):
        try:
            SELFISHCHOICESELFMORE.append(int(right[0]) - int(left[0]))
            SELFISHCHOICEGROUPMORE.append((int(right[0]) + int(right[1])) - (int(left[0]) + int(left[1])))
        except (TypeError, ValueError):
            SELFISHCHOICESELFMORE.append(None)
            SELFISHCHOICEGROUPMORE.append(None)
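Looking back, the same logic can be done for all rows at once. A minimal vectorized sketch, using hypothetical short column names (s1/o1, s2/o2) in place of the parsed RESPONSE1/RESPONSE2 pairs:

```python
import pandas as pd

# Hypothetical two-row frame; s1/o1 = left option's self/other chip counts,
# s2/o2 = right option's
df = pd.DataFrame({
    'response': [1, 2],
    'error':    [0, 0],
    's1': [9, 10], 'o1': [7, 15],
    's2': [8, 12], 'o2': [12, 8],
})
# True where the selfish option was on the left
left_selfish = ((df['response'] == 1) & (df['error'] == 0)) | \
               ((df['response'] == 2) & (df['error'] == 1))
sign = left_selfish.map({True: 1, False: -1})
df['SELFISHCHOICESELFMORE'] = sign * (df['s1'] - df['s2'])
df['SELFISHCHOICEGROUPMORE'] = sign * ((df['s1'] + df['o1']) - (df['s2'] + df['o2']))
print(df[['SELFISHCHOICESELFMORE', 'SELFISHCHOICEGROUPMORE']].values.tolist())
# [[1, -4], [2, -5]]
```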

In [12]:
tempdata = tempdata.drop(['RESPONSE1','RESPONSE2', 'RESPONSE', 'ERROR'], axis = 1)
tempdata['SELFISHCHOICESELFMORE'] = SELFISHCHOICESELFMORE
tempdata['SELFISHCHOICEGROUPMORE'] = SELFISHCHOICEGROUPMORE


Concatenating and writing to a csv:

In [14]:
outdata = pd.concat([data,tempdata], axis = 1)
outdata.to_csv('combineddata3.csv')


# Mousetracker Data #2

## Overview

Since the last post about these data, my friends and I at the NYU couples lab have collected some actual Mousetracker data from online participants. In this post, I write some code to pre-process and clean the dataset.

## Current project

As a refresher, we hypothesized that at any given time, individuals are concerned with:

1. their self-interest
2. their partner's interests
3. the interests of the group (the dyad, the relationship, them as a pair)

and these motives affect the way individuals choose to distribute resources.

To distinguish between these three motives, we generated three sets of stimuli using poker chips that pit each of these motives against each other:

1. The first set of stimuli pits participants' self-interest against the interests of their partner.
2. The second set pits a participant's concern for their partner's interests against their own self-interest and the group's interest.
3. The last set pits participants' self-interest against that of their partner and the group.

## The data

The data come in a person-period dataset. This is a "long" format where each participant has multiple rows, one per trial of the experiment (there were 60 or so trials). However, each row also contains multiple columns, each representing the average location of the participant's mouse pointer during one time bin. There are ~100 such bins.

In other words, each participant made 60 choices, and their mouse positions were averaged into ~100 time points per trial.
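A toy sketch of that shape (hypothetical values), with one row per subject-trial pair and binned positions as columns:

```python
import pandas as pd

# Person-period ("long") frame: each X_n column is the average mouse
# position in time bin n (values hypothetical)
long_df = pd.DataFrame({
    'subject': [1, 1, 2, 2],
    'trial':   [1, 2, 1, 2],
    'X_1':     [0.00, 0.00, 0.00, 0.00],
    'X_2':     [0.10, 0.20, 0.05, 0.15],
})
print(long_df.groupby('subject').size().tolist())  # [2, 2] trials per subject
```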

The first thing we're going to do is actually some cleaning. Each of the .csv files has a redundant first row, so we're just going to delete that:

In [1]:
import os
import re

files = os.listdir('./data')
files

Out[1]:
['GBPG SMGL VS. SLGM n=118.csv',
'GBPG SMGM VS. SLGL n=118.csv',
'GBPG SO n=118.csv',
'GBPR SMGL VS. SLGM n=121.csv',
'GBPR SMGM vs. SLGL n=121.csv',
'GBPR SO n=121.csv',
'RBPG SMGL vs. SLGM n=105.csv',
'RBPG SMGM vs. SLGL n=105.csv',
'RBPG SO n=105.csv',
'RBPR SMGL vs. SLGM n=120.csv',
'RBPR SMGM vs. SLGL n=120.csv',
'RBPR SO n=120.csv']
In [2]:
#for all our files
for name in files:
    with open('./data/{0}'.format(name)) as f:
        contents = f.readlines()

    out = open('./data cleaned/{0}.csv'.format(re.match(r'(.*) n=.*', name).group(1)), 'wb')

    # copy everything up to the summary section at the end of each file,
    # skipping the redundant first row
    found = 0
    for line in contents[1:]:
        if line == 'MEAN SUBJECT-BY-SUBJECT DATA\n':
            found = 1
        elif found == 0:
            out.write(line)
    out.close()


If there wasn't a section at the end I had to delete, I could have just done this:

In [3]:
import pandas as pd

testdata = pd.read_csv('./data/%s' % files[0], skiprows = 1)
testdata.head()

Out[3]:
subject trial stimfile condition code resp_1 resp_2 response distractor error ... Y_93 Y_94 Y_95 Y_96 Y_97 Y_98 Y_99 Y_100 Y_101 Unnamed: 226
0 75870 2 NaN 1 smglvsslgm4 ~S9_O7.jpg ~S8_O12.jpg 1 2 0 ... 1.1770 1.1811 1.1826 1.1825 1.1824 1.1874 1.1875 1.1875 1.1875 NaN
1 75870 3 NaN 1 smglvsslgm6 ~S10_O15.jpg ~S12_O8.jpg 2 1 0 ... 1.1825 1.1825 1.1823 1.1837 1.1865 1.1876 1.1875 1.1875 1.1875 NaN
2 75870 7 NaN 1 smglvsslgm18 ~S12_O7.jpg ~S8_O13.jpg 1 2 0 ... 1.2003 1.1955 1.1950 1.1935 1.1924 1.1925 1.1925 1.1925 1.1925 NaN
3 75870 9 NaN 1 smglvsslgm19 ~S9_O12.jpg ~S12_O8.jpg 2 1 0 ... 1.3081 1.3013 1.2999 1.3000 1.3000 1.3000 1.3000 1.3000 1.3000 NaN
4 75870 14 NaN 2 smglvsslgm9 ~S9_O7.jpg ~S10_O5.jpg 1 1 1 ... 1.2577 1.2575 1.2575 1.2577 1.2565 1.2543 1.2519 1.2500 1.2500 NaN

5 rows × 227 columns

Next, I'm going to write a loop that applies what I did in the first post to each of the separate datasets. Again, we're going to find the mean of participants' reaction time (RT), maximum deviation (MD), and area under the curve (AUC).

What we want in the end is a csv file with the overall mean RT, MD, and AUC, as well as those metrics split by whether participants were correct vs. incorrect.

I wrote two functions that basically do what I did by hand in the first post. The first combines two redundant columns, and the second finds the mean of that column, depending on whether the participant made an error or not, or whether we want the grand mean.

In [4]:
data = pd.read_csv('./data cleaned/%s' % os.listdir('./data cleaned')[0])

Out[4]:
subject trial stimfile condition code resp_1 resp_2 response distractor error ... Y_93 Y_94 Y_95 Y_96 Y_97 Y_98 Y_99 Y_100 Y_101 Unnamed: 226
0 75870 2 NaN 1 smglvsslgm4 ~S9_O7.jpg ~S8_O12.jpg 1 2 0 ... 1.1770 1.1811 1.1826 1.1825 1.1824 1.1874 1.1875 1.1875 1.1875 NaN
1 75870 3 NaN 1 smglvsslgm6 ~S10_O15.jpg ~S12_O8.jpg 2 1 0 ... 1.1825 1.1825 1.1823 1.1837 1.1865 1.1876 1.1875 1.1875 1.1875 NaN
2 75870 7 NaN 1 smglvsslgm18 ~S12_O7.jpg ~S8_O13.jpg 1 2 0 ... 1.2003 1.1955 1.1950 1.1935 1.1924 1.1925 1.1925 1.1925 1.1925 NaN
3 75870 9 NaN 1 smglvsslgm19 ~S9_O12.jpg ~S12_O8.jpg 2 1 0 ... 1.3081 1.3013 1.2999 1.3000 1.3000 1.3000 1.3000 1.3000 1.3000 NaN
4 75870 14 NaN 2 smglvsslgm9 ~S9_O7.jpg ~S10_O5.jpg 1 1 1 ... 1.2577 1.2575 1.2575 1.2577 1.2565 1.2543 1.2519 1.2500 1.2500 NaN

5 rows × 227 columns

In [5]:
#first, combine the redundant columns and type the relevant columns
def combine_columns(dddd):
    dddd['MD'] = dddd.loc[dddd['MD_1'].isnull() == False, ['MD_1']]
    dddd.loc[dddd['MD'].isnull() == True, ['MD']] = dddd.loc[dddd['MD_2'].isnull() == False]['MD_2']
    dddd['AUC'] = dddd.loc[dddd['AUC_1'].isnull() == False, ['AUC_1']]
    dddd.loc[dddd['AUC'].isnull() == True, ['AUC']] = dddd.loc[dddd['AUC_2'].isnull() == False]['AUC_2']

combine_columns(data)

In [6]:
def find_mean(datasource, participantid, metric, error=None):
    participantsdata = datasource.loc[datasource['subject'] == participantid]
    if error == 1:
        return participantsdata.loc[participantsdata['error'] == 1][metric].astype('float').mean()
    elif error == 0:
        return participantsdata.loc[participantsdata['error'] == 0][metric].astype('float').mean()
    else:
        return participantsdata[metric].astype('float').mean()


Next, we're going to test some code that calculates the mean of the aforementioned metrics for every participant in a dataset:

In [7]:
combine_columns(data)

In [8]:
participants = data['subject'].unique()

In [9]:
participantdict = {x:[] for x in participants}

In [10]:
for participant in participantdict:
    for metric in ['AUC', 'MD', 'RT']:
        try:
            participantdict[participant].append(find_mean(data, participant, metric, error = 1))
        except:
            participantdict[participant].append(None)

In [11]:
outdata = pd.DataFrame.from_dict(participantdict,orient='index')
outdata.columns = ['AUC', 'MD', 'RT']

In [12]:
outdata.head()

Out[12]:
AUC MD RT
9104386 0.222260 0.147950 618.200000
7279624 -0.039400 -0.006537 1208.375000
3452429 0.378033 0.257911 1381.444444
7354384 0.034620 0.038660 1102.700000
479766 -0.036642 -0.048183 867.166667

Let's write this as a function:

In [13]:
def sum_participants(pkeys, metrics, err, datttt):
    adictionary = {x: [] for x in pkeys}
    for participant in pkeys:
        for metric in metrics:
            try:
                adictionary[participant].append(find_mean(datttt, participant, metric, error = err))
            except:
                adictionary[participant].append(None)
    return adictionary

Definitely not production code, but it should work.
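For what it's worth, pandas' groupby can do this aggregation directly. A sketch on toy data (values hypothetical, column names matching the real frames):

```python
import pandas as pd

# Toy frame with the same column names as the real data
toy = pd.DataFrame({
    'subject': [1, 1, 2, 2],
    'error':   [0, 1, 0, 0],
    'MD':      [2.0, 4.0, 1.0, 3.0],
})
# mean MD per participant, split by error status
means = toy.groupby(['subject', 'error'])['MD'].mean()
print(means.loc[(2, 0)])  # 2.0
```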

### Combining datasets

Alright, now that we have all of that working, let's combine the datasets that we have, and add features to tell us where the data came from:

In [14]:
files = os.listdir('./data cleaned')
files

Out[14]:
['GBPG SMGL VS. SLGM.csv',
'GBPG SMGM VS. SLGL.csv',
'GBPG SO.csv',
'GBPR SMGL VS. SLGM.csv',
'GBPR SMGM vs. SLGL.csv',
'GBPR SO.csv',
'RBPG SMGL vs. SLGM.csv',
'RBPG SMGM vs. SLGL.csv',
'RBPG SO.csv',
'RBPR SMGL vs. SLGM.csv',
'RBPR SMGM vs. SLGL.csv',
'RBPR SO.csv']

What we're going to do is first create an empty DataFrame with all our columns. Next, we'll load all of our data, compute the 3 measures, and add features that capture where the data came from (i.e., one column for the color codes, another column for the comparison type).

In [15]:
combineddata = pd.DataFrame(columns= ['AUC', 'MD', 'RT', 'ERROR', 'COLORCODE', 'COMPARISON'])


Now, let's put everything together, and loop through all the filenames:

In [16]:
metrics = ['AUC', 'MD', 'RT']

for filename in files:
    tempdata = pd.read_csv('./data cleaned/%s' % filename)

    combine_columns(tempdata)

    participants = tempdata['subject'].unique()

    correctdict = sum_participants(participants, metrics, 0, tempdata)
    errordict = sum_participants(participants, metrics, 1, tempdata)

    # tag each half with its error status so the ERROR column exists
    correctdata = pd.DataFrame.from_dict(correctdict, orient='index')
    correctdata['ERROR'] = 0
    errordata = pd.DataFrame.from_dict(errordict, orient='index')
    errordata['ERROR'] = 1

    outdata = pd.concat([correctdata, errordata])
    outdata.columns = ['AUC', 'MD', 'RT', 'ERROR']
    outdata['COLORCODE'] = re.match(r'(....)', filename).group(1)
    outdata['COMPARISON'] = re.match(r'.... (.*).csv', filename).group(1)

    print filename
    print len(outdata)

    combineddata = pd.concat([combineddata, outdata])

GBPG SMGL VS. SLGM.csv
236
GBPG SMGM VS. SLGL.csv
236
GBPG SO.csv
236
GBPR SMGL VS. SLGM.csv
242
GBPR SMGM vs. SLGL.csv
242
GBPR SO.csv
242
RBPG SMGL vs. SLGM.csv
210
RBPG SMGM vs. SLGL.csv
210
RBPG SO.csv
210
RBPR SMGL vs. SLGM.csv
240
RBPR SMGM vs. SLGL.csv
240
RBPR SO.csv
240

In [17]:
combineddata.head()

Out[17]:
AUC MD RT ERROR COLORCODE COMPARISON
9104386 0.198260 0.117420 729.000000 0.0 GBPG SMGL VS. SLGM
7279624 0.147017 0.087525 1235.500000 0.0 GBPG SMGL VS. SLGM
3452429 0.530336 0.300000 1443.727273 0.0 GBPG SMGL VS. SLGM
7354384 0.006440 0.003520 1173.000000 0.0 GBPG SMGL VS. SLGM
479766 0.005037 0.016762 814.250000 0.0 GBPG SMGL VS. SLGM

This is what the data should look like, so let's write it to a csv:

In [18]:
combineddata.sort_index().to_csv('combineddata.csv')

In [19]:
outdata

Out[19]:
AUC MD RT ERROR COLORCODE COMPARISON
448004 2.707033 1.111633 906.333333 0 RBPR SO
2675720 0.352980 0.169195 953.950000 0 RBPR SO
2947082 0.160825 0.137081 1037.437500 0 RBPR SO
8594447 0.907000 0.577125 1015.250000 0 RBPR SO
515090 0.394933 0.282000 833.888889 0 RBPR SO
1283091 0.474890 0.310680 781.600000 0 RBPR SO
1591835 2.229329 0.997536 920.357143 0 RBPR SO
8448030 0.717493 0.313850 748.642857 0 RBPR SO
8767007 1.628893 0.699000 990.214286 0 RBPR SO
9444902 0.676900 0.361963 800.750000 0 RBPR SO
8873015 0.907767 0.487600 660.000000 0 RBPR SO
8469940 0.164550 0.122125 647.375000 0 RBPR SO
8325642 1.047320 0.590833 852.800000 0 RBPR SO
503358 0.058525 0.050817 682.333333 0 RBPR SO
7386178 1.234072 0.637233 924.611111 0 RBPR SO
953414 -0.118170 -0.146170 862.500000 0 RBPR SO
5312076 0.395989 0.157433 548.111111 0 RBPR SO
2515533 0.960689 0.571544 974.222222 0 RBPR SO
6012496 1.453787 0.773067 973.066667 0 RBPR SO
4717240 0.883219 0.595013 1131.687500 0 RBPR SO
3007065 0.044767 0.053333 739.555556 0 RBPR SO
3507292 -0.100017 -0.085700 537.000000 0 RBPR SO
1363052 1.691200 0.790547 985.941176 0 RBPR SO
1426541 0.000000 0.000000 404.636364 0 RBPR SO
9603603 0.744925 0.492655 1258.700000 0 RBPR SO
5480054 -0.025157 -0.026857 649.714286 0 RBPR SO
1355895 0.970286 0.480421 1062.285714 0 RBPR SO
7049336 0.971658 0.564100 852.736842 0 RBPR SO
3354234 0.390192 0.230254 1166.076923 0 RBPR SO
2585535 1.319237 0.650863 829.250000 0 RBPR SO
... ... ... ... ... ... ...
84352 0.928446 0.443062 743.384615 1 RBPR SO
3042691 NaN NaN NaN 1 RBPR SO
2597764 2.076241 0.906424 939.000000 1 RBPR SO
9324421 NaN NaN NaN 1 RBPR SO
5227406 NaN NaN NaN 1 RBPR SO
7229839 0.029771 0.035086 437.714286 1 RBPR SO
4169107 NaN NaN NaN 1 RBPR SO
6001558 -0.091046 -0.119262 596.538462 1 RBPR SO
647592 1.448136 0.694329 1193.571429 1 RBPR SO
6338986 -0.117800 -0.135150 452.000000 1 RBPR SO
3329964 0.321055 0.133191 687.454545 1 RBPR SO
4264365 0.253538 0.165177 686.923077 1 RBPR SO
3035571 0.064817 0.089733 607.333333 1 RBPR SO
6431156 0.261650 0.240750 892.500000 1 RBPR SO
7201720 0.105188 0.100663 598.250000 1 RBPR SO
5427132 0.064814 0.078571 694.571429 1 RBPR SO
1544127 0.308789 0.272522 692.666667 1 RBPR SO
265153 0.999410 0.517540 603.100000 1 RBPR SO
1399240 0.176720 0.162690 813.000000 1 RBPR SO
5108681 -0.053300 -0.107300 858.000000 1 RBPR SO
9541367 0.321488 0.219350 659.250000 1 RBPR SO
2938324 0.060600 0.097500 781.000000 1 RBPR SO
374233 0.635582 0.401791 760.272727 1 RBPR SO
8976353 0.608600 0.358400 1020.000000 1 RBPR SO
8706045 0.005117 0.013417 734.833333 1 RBPR SO
9728496 0.165137 0.152912 538.250000 1 RBPR SO
7825395 NaN NaN NaN 1 RBPR SO
6582776 0.665817 0.497900 1213.333333 1 RBPR SO
4016125 1.036192 0.571175 828.333333 1 RBPR SO
7075839 NaN NaN NaN 1 RBPR SO

240 rows × 6 columns

# Mousetracker Data #1

## Overview

I'm going to be looking at some pilot data that some colleagues and I collected using Jon Freeman's Mousetracker (http://www.mousetracker.org/). As the name suggests, Mousetracker is a program designed to track participants' mouse movements. In social psychology, researchers use it to track participants' implicit decision-making processes. Originally, it was developed to study how individuals categorize faces. An example of the paradigm would be a participant having to choose whether a face is male or female, like so:

The researcher could then vary the degree to which the face has stereotypically male or female features, and track not just what participants categorize the faces as, but also how they reach those decisions, via the paths and time course of their mouse movements.

## Current project

Anyway, some friends and I are currently working on distinguishing how individuals allocate resources in the context of a relationship. We hypothesize that at any given time, individuals are concerned with:

1. their self-interest
2. their partner's interests
3. the interests of the group (the dyad, the relationship, them as a pair)

and these motives affect the way individuals choose to distribute resources. To distinguish between these three motives, we generated three sets of stimuli using poker chips that pit each of these motives against each other.

The first set of stimuli pits participants' self-interest against the interests of their partner. For example, if red poker chips were paid out to you and green to your partner, one dilemma would be choosing between two stacks of poker chips of equal height (i.e., the group receives the same in both cases):

[Image: left option vs. right option]

The second set of stimuli pits a participant's concern for the interest of their partner vs. their own self interest and the group's interest. This captures participants' "pure" altruistic motives in the sense that choosing to favor their partner in this scenario sacrifices both their own interests and the group's interest:

[Image: left option vs. right option]

Finally, the last set of stimuli pits participants' self-interest against that of their partner and the group. In this case, one set of poker chips gives the participant more, but in the other set, his/her partner gets more and so does the pair of them:

[Image: left option vs. right option]

## The data

The data come in a person-period dataset. This is a "long" format where each participant has multiple rows, one per trial of the experiment (there were 60 or so trials). However, each row also contains multiple columns, each representing the average location of the participant's mouse pointer during one time bin. There are ~100 such bins.

In other words, each participant made 60 choices, and their mouse positions were averaged into ~100 time points per trial.

The first thing we're going to do is to load our data. To do this, we first import Pandas, read our .csv file and print a list of columns. The raw data can be found here: https://raw.githubusercontent.com/bryansim/Python/master/mousetrackerdata/mousetrackercorrected.csv

In [1]:
import pandas as pd
import re

In [2]:
data = pd.read_csv("mousetrackercorrected.csv")
data.columns.values

Out[2]:
array(['subject', 'trial', 'stimfile', 'condition', 'code', 'resp_1',
'resp_2', 'response', 'distractor', 'error', 'init time', 'RT',
'MD_1', 'MD_2', 'AUC_1', 'AUC_2', 'MD_time', 'x-flip', 'y-flip',
'z-MD-separate', 'z-MD-together', 'z-AUC-separate',
'z-AUC-together', 'comments', 'X_1', 'X_2', 'X_3', 'X_4', 'X_5',
'X_6', 'X_7', 'X_8', 'X_9', 'X_10', 'X_11', 'X_12', 'X_13', 'X_14',
'X_15', 'X_16', 'X_17', 'X_18', 'X_19', 'X_20', 'X_21', 'X_22',
'X_23', 'X_24', 'X_25', 'X_26', 'X_27', 'X_28', 'X_29', 'X_30',
'X_31', 'X_32', 'X_33', 'X_34', 'X_35', 'X_36', 'X_37', 'X_38',
'X_39', 'X_40', 'X_41', 'X_42', 'X_43', 'X_44', 'X_45', 'X_46',
'X_47', 'X_48', 'X_49', 'X_50', 'X_51', 'X_52', 'X_53', 'X_54',
'X_55', 'X_56', 'X_57', 'X_58', 'X_59', 'X_60', 'X_61', 'X_62',
'X_63', 'X_64', 'X_65', 'X_66', 'X_67', 'X_68', 'X_69', 'X_70',
'X_71', 'X_72', 'X_73', 'X_74', 'X_75', 'X_76', 'X_77', 'X_78',
'X_79', 'X_80', 'X_81', 'X_82', 'X_83', 'X_84', 'X_85', 'X_86',
'X_87', 'X_88', 'X_89', 'X_90', 'X_91', 'X_92', 'X_93', 'X_94',
'X_95', 'X_96', 'X_97', 'X_98', 'X_99', 'X_100', 'X_101', 'Y_1',
'Y_2', 'Y_3', 'Y_4', 'Y_5', 'Y_6', 'Y_7', 'Y_8', 'Y_9', 'Y_10',
'Y_11', 'Y_12', 'Y_13', 'Y_14', 'Y_15', 'Y_16', 'Y_17', 'Y_18',
'Y_19', 'Y_20', 'Y_21', 'Y_22', 'Y_23', 'Y_24', 'Y_25', 'Y_26',
'Y_27', 'Y_28', 'Y_29', 'Y_30', 'Y_31', 'Y_32', 'Y_33', 'Y_34',
'Y_35', 'Y_36', 'Y_37', 'Y_38', 'Y_39', 'Y_40', 'Y_41', 'Y_42',
'Y_43', 'Y_44', 'Y_45', 'Y_46', 'Y_47', 'Y_48', 'Y_49', 'Y_50',
'Y_51', 'Y_52', 'Y_53', 'Y_54', 'Y_55', 'Y_56', 'Y_57', 'Y_58',
'Y_59', 'Y_60', 'Y_61', 'Y_62', 'Y_63', 'Y_64', 'Y_65', 'Y_66',
'Y_67', 'Y_68', 'Y_69', 'Y_70', 'Y_71', 'Y_72', 'Y_73', 'Y_74',
'Y_75', 'Y_76', 'Y_77', 'Y_78', 'Y_79', 'Y_80', 'Y_81', 'Y_82',
'Y_83', 'Y_84', 'Y_85', 'Y_86', 'Y_87', 'Y_88', 'Y_89', 'Y_90',
'Y_91', 'Y_92', 'Y_93', 'Y_94', 'Y_95', 'Y_96', 'Y_97', 'Y_98',
'Y_99', 'Y_100', 'Y_101'], dtype=object)
In [3]:
data.iloc[0:4, 0:19]

Out[3]:
subject trial stimfile condition code resp_1 resp_2 response distractor error init time RT MD_1 MD_2 AUC_1 AUC_2 MD_time x-flip y-flip
0 455806 1 NaN 777 prac2 ~S3_O6.jpg ~S7_O8.jpg 1 1 1 256 1197 NaN 0.0922 NaN 0.0755 317 8 0
1 455806 3 NaN 777 prac8 ~S5_O4.jpg ~S10_O11.jpg 1 1 1 180 1238 NaN 0.0213 NaN 0.0030 268 4 0
2 455806 5 NaN 777 prac4 ~S5_O4.jpg ~S8_O9.jpg 1 2 0 151 858 NaN -0.0386 NaN -0.0139 0 4 0
3 455806 7 NaN 777 prac5 ~S10_O9.jpg ~S7_O8.jpg 1 1 1 151 1104 NaN 0.8836 NaN 1.3726 293 6 0

### Descriptives

In the above data, the first thing we're going to do is find the mean of participants' reaction time (RT), maximum deviation (MD), and area under the curve (AUC). The latter two measure how much participants were "attracted" to the option they did not end up choosing.

There are two columns for each (e.g., MD_1 and MD_2 depending on which option participants chose). These end up being redundant with one another, and we'll have to combine them.

x-flips and y-flips, as their names suggest, measure the number of times participants' cursors flipped on the x and y axis.

To combine the two MD columns, we create a new column, find all the rows which have data in MD_1, and then fill in the rows which don't have data in MD_1 with the rows that have data in MD_2. We do the same with AUC.
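The same merge can also be written with `fillna`, which fills the gaps in one column from another. A minimal sketch on hypothetical half-filled columns:

```python
import pandas as pd

# Hypothetical half-filled columns like MD_1/MD_2: each row has a value
# in exactly one of the two
df = pd.DataFrame({'MD_1': [0.5, None, 0.2], 'MD_2': [None, 0.7, None]})
df['MD'] = df['MD_1'].fillna(df['MD_2'])
print(df['MD'].tolist())  # [0.5, 0.7, 0.2]
```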

In [4]:
data['MD'] = data.loc[data['MD_1'].isnull() == False, ['MD_1']]
data.loc[data['MD'].isnull() == True,['MD']] = data.loc[data['MD_2'].isnull() == False]['MD_2']
#We do this to get a slice instead of data.loc[data['MD_2'].isnull() == False, ['MD_2']] which returns a dataframe

In [5]:
data['AUC'] = data.loc[data['AUC_1'].isnull() == False, ['AUC_1']]
data.loc[data['AUC'].isnull() == True, ['AUC']] = data.loc[data['AUC_2'].isnull() == False]['AUC_2']


#### Mean MD and AUC

Now, we can use the .mean() method to get the mean of the above.

In [6]:
data['AUC'].mean()

Out[6]:
0.4573244198895028
In [7]:
data['MD'].mean()

Out[7]:
0.26286099447513805

#### Means by choice type

The next thing we want to do is see whether participants differed depending on what the type of choice was (e.g., self vs. other etc.) Eventually, we will have a 3x2 table of means:

|                | self vs. other | group more w/ self less | group more w/ self more |
|----------------|----------------|-------------------------|-------------------------|
| chose selfish  |                |                         |                         |
| chose selfless |                |                         |                         |

Because of the way the conditions were coded (they include trial numbers), we'll use some regex to ignore those numbers:

In [8]:
sodata = data.loc[data['code'].str.extract(r'(so)', expand = False).isnull() == False]
smgldata = data.loc[data['code'].str.extract(r'(smgl)', expand = False).isnull() == False]
smgmdata = data.loc[data['code'].str.extract(r'(smgm)', expand = False).isnull() == False]

In [9]:
print sodata['MD'].mean()
print smgldata['MD'].mean()
print smgmdata['MD'].mean()

0.287940806045
0.240009049774
0.241194845361

In [10]:
print sodata['AUC'].mean()
print smgldata['AUC'].mean()
print smgmdata['AUC'].mean()

0.487281360202
0.4192239819
0.451306185567


AS IT TURNS OUT, this isn't very helpful, because this analysis collapses over whether or not participants chose the selfish or unselfish option, which is really what we're interested in. So let's look at that next:

In [11]:
print sodata.loc[sodata['error'] == 0]['MD'].mean()
print sodata.loc[sodata['error'] == 1]['MD'].mean()
print smgldata.loc[smgldata['error'] == 0]['MD'].mean()
print smgldata.loc[smgldata['error'] == 1]['MD'].mean()
print smgmdata.loc[smgmdata['error'] == 0]['MD'].mean()
print smgmdata.loc[smgmdata['error'] == 1]['MD'].mean()

0.283236649215
0.292302427184
0.225892405063
0.247862676056
0.181802877698
0.391294545455

In [12]:
print sodata.loc[sodata['error'] == 0]['AUC'].mean()
print sodata.loc[sodata['error'] == 1]['AUC'].mean()
print smgldata.loc[smgldata['error'] == 0]['AUC'].mean()
print smgldata.loc[smgldata['error'] == 1]['AUC'].mean()
print smgmdata.loc[smgmdata['error'] == 0]['AUC'].mean()
print smgmdata.loc[smgmdata['error'] == 1]['AUC'].mean()

0.511068062827
0.465226699029
0.499286075949
0.374682394366
0.314955395683
0.795901818182


So, that table above looks like this:

| MD             | self vs. other | group more w/ self less | group more w/ self more |
|----------------|----------------|-------------------------|-------------------------|
| chose selfish  | .28            | .23                     | .18                     |
| chose selfless | .29            | .25                     | .39                     |

| AUC            | self vs. other | group more w/ self less | group more w/ self more |
|----------------|----------------|-------------------------|-------------------------|
| chose selfish  | .51            | .50                     | .31                     |
| chose selfless | .46            | .37                     | .80                     |

Note to self: I need to check if I have the selfish vs. selfless options coded correctly. I believe error == 0 = selfish.