Mousetracker Data #1

Overview

I'm going to be looking at some pilot data that some colleagues and I collected using Jon Freeman's MouseTracker (http://www.mousetracker.org/). As the name suggests, MouseTracker is a program designed to track participants' mouse movements. In social psychology, researchers use it to study participants' implicit decision-making processes. It was originally developed to study how individuals categorize faces. In a typical version of the paradigm, a participant has to choose whether a face is male or female, like so:

[Figure: example MouseTracker face-categorization trial]

The researcher can then vary how stereotypically male or stereotypically female the face's features are, and track not just how participants categorize each face but also how they reach that decision, by recording the path and time course of their mouse movements.

Current project

Anyway, some friends and I are currently working on distinguishing the motives behind how individuals allocate resources in the context of a relationship. We hypothesize that at any given time, individuals are concerned with:

  1. their self-interest
  2. their partner's interests
  3. the interests of the group (the dyad, the relationship, or the two of them as a pair)

and these motives affect the way individuals choose to distribute resources. To distinguish between these three motives, we generated three sets of poker-chip stimuli that pit the motives against one another.

The first set of stimuli pits participants' self-interest against the interests of their partner. For example, if red poker chips were paid out to you and green ones to your partner, one dilemma would be choosing between these two stacks of poker chips of equal height (i.e., the group receives the same amount in both cases):

[Figure: self vs. other stimulus; left option so1, right option so2]

The second set of stimuli pits a participant's concern for their partner's interests against both their own self-interest and the group's interest. This captures participants' "pure" altruistic motives, in the sense that choosing to favor their partner in this scenario sacrifices both their own interests and the group's interest:

[Figure: "pure" altruism stimulus; left option smgm1, right option smgm2]

Finally, the last set of stimuli pits participants' self-interest against the interests of their partner and the group. In this case, one stack of poker chips gives the participant more, while the other gives his/her partner more and also gives the pair of them more in total:

[Figure: self vs. partner-and-group stimulus; left option slgm1, right option slgm2]

The data

The data come as a person-period dataset. This is a "long" format in which each participant has multiple rows, one for each trial of the experiment (there were 60 or so trials). However, each row also contains many columns, each holding the participant's average mouse-pointer position during one time bin: X_1 to X_101 for the x-coordinate and Y_1 to Y_101 for the y-coordinate, i.e., 101 bins per trial.

In other words, each participant made about 60 choices, and their mouse positions on each trial were averaged into 101 time points.
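To make that layout concrete, here is a minimal sketch on a made-up frame (two trials, only three bins per coordinate instead of 101; the numbers are invented, not pilot data). It uses pandas' `wide_to_long` to turn the per-bin X_/Y_ columns into one row per subject, trial and time bin, which can be handier for plotting trajectories.

```python
import pandas as pd

# Toy stand-in for the real file: two trials, three time bins per coordinate
# (the real data has X_1..X_101 and Y_1..Y_101). All values are made up.
toy = pd.DataFrame({
    "subject": [1, 1],
    "trial": [1, 2],
    "X_1": [0.0, 0.0], "X_2": [-0.2, 0.3], "X_3": [-1.0, 1.0],
    "Y_1": [0.0, 0.0], "Y_2": [0.7, 0.6], "Y_3": [1.5, 1.5],
})

# Reshape: one row per (subject, trial, bin), with X and Y as the two columns
long = pd.wide_to_long(toy, stubnames=["X", "Y"], i=["subject", "trial"],
                       j="bin", sep="_").sort_index()
print(long)
```

Each original row becomes three rows here (101 with the real columns), indexed by subject, trial and bin number.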

The first thing we're going to do is load our data. To do this, we import pandas, read our .csv file, and list its columns. The raw data can be found here: https://raw.githubusercontent.com/bryansim/Python/master/mousetrackerdata/mousetrackercorrected.csv

In [1]:
import pandas as pd
import re
In [2]:
# pd.read_csv also accepts a URL, so the raw GitHub link above can replace the local path
data = pd.read_csv("mousetrackercorrected.csv")
data.columns.values
Out[2]:
array(['subject', 'trial', 'stimfile', 'condition', 'code', 'resp_1',
       'resp_2', 'response', 'distractor', 'error', 'init time', 'RT',
       'MD_1', 'MD_2', 'AUC_1', 'AUC_2', 'MD_time', 'x-flip', 'y-flip',
       'z-MD-separate', 'z-MD-together', 'z-AUC-separate',
       'z-AUC-together', 'comments', 'X_1', 'X_2', 'X_3', 'X_4', 'X_5',
       'X_6', 'X_7', 'X_8', 'X_9', 'X_10', 'X_11', 'X_12', 'X_13', 'X_14',
       'X_15', 'X_16', 'X_17', 'X_18', 'X_19', 'X_20', 'X_21', 'X_22',
       'X_23', 'X_24', 'X_25', 'X_26', 'X_27', 'X_28', 'X_29', 'X_30',
       'X_31', 'X_32', 'X_33', 'X_34', 'X_35', 'X_36', 'X_37', 'X_38',
       'X_39', 'X_40', 'X_41', 'X_42', 'X_43', 'X_44', 'X_45', 'X_46',
       'X_47', 'X_48', 'X_49', 'X_50', 'X_51', 'X_52', 'X_53', 'X_54',
       'X_55', 'X_56', 'X_57', 'X_58', 'X_59', 'X_60', 'X_61', 'X_62',
       'X_63', 'X_64', 'X_65', 'X_66', 'X_67', 'X_68', 'X_69', 'X_70',
       'X_71', 'X_72', 'X_73', 'X_74', 'X_75', 'X_76', 'X_77', 'X_78',
       'X_79', 'X_80', 'X_81', 'X_82', 'X_83', 'X_84', 'X_85', 'X_86',
       'X_87', 'X_88', 'X_89', 'X_90', 'X_91', 'X_92', 'X_93', 'X_94',
       'X_95', 'X_96', 'X_97', 'X_98', 'X_99', 'X_100', 'X_101', 'Y_1',
       'Y_2', 'Y_3', 'Y_4', 'Y_5', 'Y_6', 'Y_7', 'Y_8', 'Y_9', 'Y_10',
       'Y_11', 'Y_12', 'Y_13', 'Y_14', 'Y_15', 'Y_16', 'Y_17', 'Y_18',
       'Y_19', 'Y_20', 'Y_21', 'Y_22', 'Y_23', 'Y_24', 'Y_25', 'Y_26',
       'Y_27', 'Y_28', 'Y_29', 'Y_30', 'Y_31', 'Y_32', 'Y_33', 'Y_34',
       'Y_35', 'Y_36', 'Y_37', 'Y_38', 'Y_39', 'Y_40', 'Y_41', 'Y_42',
       'Y_43', 'Y_44', 'Y_45', 'Y_46', 'Y_47', 'Y_48', 'Y_49', 'Y_50',
       'Y_51', 'Y_52', 'Y_53', 'Y_54', 'Y_55', 'Y_56', 'Y_57', 'Y_58',
       'Y_59', 'Y_60', 'Y_61', 'Y_62', 'Y_63', 'Y_64', 'Y_65', 'Y_66',
       'Y_67', 'Y_68', 'Y_69', 'Y_70', 'Y_71', 'Y_72', 'Y_73', 'Y_74',
       'Y_75', 'Y_76', 'Y_77', 'Y_78', 'Y_79', 'Y_80', 'Y_81', 'Y_82',
       'Y_83', 'Y_84', 'Y_85', 'Y_86', 'Y_87', 'Y_88', 'Y_89', 'Y_90',
       'Y_91', 'Y_92', 'Y_93', 'Y_94', 'Y_95', 'Y_96', 'Y_97', 'Y_98',
       'Y_99', 'Y_100', 'Y_101'], dtype=object)
In [3]:
data.iloc[0:4, 0:19]
Out[3]:
   subject  trial  stimfile  condition  code   resp_1       resp_2        response  distractor  error  init time  RT    MD_1  MD_2     AUC_1  AUC_2    MD_time  x-flip  y-flip
0  455806   1      NaN       777        prac2  ~S3_O6.jpg   ~S7_O8.jpg    1         1           1      256        1197  NaN   0.0922   NaN    0.0755   317      8       0
1  455806   3      NaN       777        prac8  ~S5_O4.jpg   ~S10_O11.jpg  1         1           1      180        1238  NaN   0.0213   NaN    0.0030   268      4       0
2  455806   5      NaN       777        prac4  ~S5_O4.jpg   ~S8_O9.jpg    1         2           0      151        858   NaN   -0.0386  NaN    -0.0139  0        4       0
3  455806   7      NaN       777        prac5  ~S10_O9.jpg  ~S7_O8.jpg    1         1           1      151        1104  NaN   0.8836   NaN    1.3726   293      6       0

Descriptives

First, we're going to find the mean reaction time (RT), maximum deviation (MD), and area under the curve (AUC). The latter two measure how strongly participants were "attracted" to the other option even though they ultimately selected the one they did: MD is the largest perpendicular distance between the cursor's actual path and a straight line from its start point to its end point, and AUC is the area between the actual path and that straight line.
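MouseTracker already computes MD and AUC for us, so the following is only an illustration of the geometry: a minimal sketch that assumes both measures are taken relative to the straight line from a trajectory's first sample to its last. The function name `md_and_auc` is hypothetical, and MouseTracker's own sign conventions and normalization may differ.

```python
import numpy as np

def md_and_auc(x, y):
    """Signed maximum deviation and area under the curve of one trajectory.

    Sketch: both are measured against the straight line from the first to the
    last sample. Positive values mean the path bowed to the left of that line
    (as seen moving from start to end); MouseTracker's conventions may differ.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Unit vector along the idealized straight-line path (start -> end)
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    length = np.hypot(dx, dy)
    ux, uy = dx / length, dy / length
    # Coordinates relative to the start point
    rx, ry = x - x[0], y - y[0]
    along = rx * ux + ry * uy   # position along the ideal path
    perp = ux * ry - uy * rx    # signed perpendicular distance from it (2D cross product)
    md = perp[np.argmax(np.abs(perp))]  # largest deviation, sign kept
    # Trapezoid rule over the along-path coordinate gives the (signed) area
    auc = np.sum((perp[1:] + perp[:-1]) / 2 * np.diff(along))
    return float(md), float(auc)

# A path that bows out to the left on its way from (0, 0) to (0, 1)
print(md_and_auc([0.0, -0.5, 0.0], [0.0, 0.5, 1.0]))  # prints (0.5, 0.25)
```

The bowed example forms a triangle with base 1 and height 0.5, so its MD is 0.5 and its AUC is 0.25; a perfectly straight path would give 0 for both.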

There are two columns for each of MD and AUC (e.g., MD_1 and MD_2); which one is filled in depends on which option participants chose. The two columns are therefore redundant with one another, and we'll combine each pair into a single column.

The x-flip and y-flip columns, as their names suggest, count the number of times participants' cursors reversed direction along the x- and y-axes, respectively.
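As an illustration of what a "flip" is, here is a minimal sketch that counts direction reversals along one axis. It assumes a flip is simply a change in the sign of movement between consecutive samples, ignoring samples with no movement; MouseTracker's exact rule may differ, and `count_flips` is a hypothetical helper.

```python
import numpy as np

def count_flips(coords):
    """Count direction reversals along one axis of a trajectory.

    Sketch only: samples with no movement on this axis are ignored, which
    may not match MouseTracker's exact definition of a flip.
    """
    steps = np.sign(np.diff(np.asarray(coords, dtype=float)))
    steps = steps[steps != 0]  # drop samples with no movement on this axis
    return int(np.count_nonzero(steps[1:] != steps[:-1]))

# Moves left, briefly back right, then left again: two reversals
x = [0.0, -0.1, -0.3, -0.2, -0.4, -0.4, -1.0]
print(count_flips(x))  # prints 2
```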

To combine the two MD columns, we create a new column, copy in all the rows that have data in MD_1, and then fill in the rows that don't with the corresponding values from MD_2. We do the same with AUC.

In [4]:
data['MD'] = data.loc[data['MD_1'].notnull(), 'MD_1']  # MD_1 where present, NaN elsewhere
data.loc[data['MD'].isnull(), 'MD'] = data.loc[data['MD_2'].notnull(), 'MD_2']  # fill the gaps from MD_2
# A scalar column label ('MD_2') selects a Series; a list (['MD_2']) would select a one-column DataFrame
In [5]:
data['AUC'] = data.loc[data['AUC_1'].notnull(), 'AUC_1']  # AUC_1 where present, NaN elsewhere
data.loc[data['AUC'].isnull(), 'AUC'] = data.loc[data['AUC_2'].notnull(), 'AUC_2']  # fill the gaps from AUC_2
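A shorter way to do the same merge is `Series.fillna` with another Series (`combine_first` would also work). The sketch below runs on a made-up frame, not the pilot data. If both columns were ever filled on the same trial, it keeps the _1 value, which is also what the cells above do.

```python
import numpy as np
import pandas as pd

# Made-up frame: on each trial only one of MD_1/MD_2 (and AUC_1/AUC_2) is filled in
df = pd.DataFrame({
    "MD_1": [np.nan, 0.10, np.nan],
    "MD_2": [0.09, np.nan, -0.04],
    "AUC_1": [np.nan, 0.20, np.nan],
    "AUC_2": [0.08, np.nan, -0.01],
})

# Take the _1 value where present, otherwise fall back to the _2 value
for measure in ["MD", "AUC"]:
    df[measure] = df[f"{measure}_1"].fillna(df[f"{measure}_2"])

print(df[["MD", "AUC"]])
```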

Mean MD and AUC

Now, we can use the .mean() method to get the mean of each combined column.

In [6]:
data['AUC'].mean()
Out[6]:
0.4573244198895028
In [7]:
data['MD'].mean()
Out[7]:
0.26286099447513805
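RT, mentioned at the start of this section, can be averaged the same way. Selecting several columns and calling `.mean()` once returns one mean per column, and pandas skips NaN values by default. A sketch on a made-up frame (the values are invented, not pilot results):

```python
import pandas as pd

# Made-up values standing in for RT and the combined MD and AUC columns
df = pd.DataFrame({
    "RT": [1197, 1238, 858],
    "MD": [0.09, 0.02, -0.04],
    "AUC": [0.08, 0.00, -0.01],
})

means = df[["RT", "MD", "AUC"]].mean()  # a Series indexed by column name
print(means)
```

On the real data, the same call would be `data[['RT', 'MD', 'AUC']].mean()`.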

Means by choice type

The next thing we want to do is see whether participants differed depending on the type of choice (e.g., self vs. other). Eventually, we will have a table of means with three columns (the choice type) and two rows (the option chosen):

self vs. other    group more w/ self less    group more w/ self more
chose selfish     chose selfish              chose selfish
chose selfless    chose selfless             chose selfless

Because of the way the conditions were coded (they include trial numbers), we'll use some regex to ignore those numbers:

In [8]:
# Keep rows whose condition code contains the stem; na=False drops rows with a missing code
sodata = data.loc[data['code'].str.contains(r'so', na=False)]
smgldata = data.loc[data['code'].str.contains(r'smgl', na=False)]
smgmdata = data.loc[data['code'].str.contains(r'smgm', na=False)]
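A slightly stricter variant, sketched on made-up codes and assuming every code is a stem (so, smgl, smgm or prac) followed by digits: anchoring the pattern means a code only counts if it starts with the stem, whereas the pattern used above matches "so" anywhere in the string. Extracting the stem into its own column also lets the later splits be done with a single groupby.

```python
import pandas as pd

# Made-up codes in the same style as the 'code' column (a stem plus a number)
codes = pd.Series(["prac2", "so1", "so12", "smgl3", "smgm7", None])

# Anchored pattern: capture the condition stem at the start, allow trailing digits.
# Anything that doesn't match (practice trials, missing codes) becomes NaN.
choice_type = codes.str.extract(r"^(so|smgl|smgm)\d*$", expand=False)
print(choice_type.tolist())
```

On the real data this would be something like `data['choice_type'] = data['code'].str.extract(r'^(so|smgl|smgm)\d*$', expand=False)`, where `choice_type` is a hypothetical column name.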
In [9]:
print(sodata['MD'].mean())
print(smgldata['MD'].mean())
print(smgmdata['MD'].mean())
0.287940806045
0.240009049774
0.241194845361
In [10]:
print(sodata['AUC'].mean())
print(smgldata['AUC'].mean())
print(smgmdata['AUC'].mean())
0.487281360202
0.4192239819
0.451306185567

AS IT TURNS OUT, this isn't very helpful, because this analysis collapses across whether participants chose the selfish or the selfless option, which is what we're really interested in. So let's look at that next:

In [11]:
print(sodata.loc[sodata['error'] == 0, 'MD'].mean())
print(sodata.loc[sodata['error'] == 1, 'MD'].mean())
print(smgldata.loc[smgldata['error'] == 0, 'MD'].mean())
print(smgldata.loc[smgldata['error'] == 1, 'MD'].mean())
print(smgmdata.loc[smgmdata['error'] == 0, 'MD'].mean())
print(smgmdata.loc[smgmdata['error'] == 1, 'MD'].mean())
0.283236649215
0.292302427184
0.225892405063
0.247862676056
0.181802877698
0.391294545455
In [12]:
print(sodata.loc[sodata['error'] == 0, 'AUC'].mean())
print(sodata.loc[sodata['error'] == 1, 'AUC'].mean())
print(smgldata.loc[smgldata['error'] == 0, 'AUC'].mean())
print(smgldata.loc[smgldata['error'] == 1, 'AUC'].mean())
print(smgmdata.loc[smgmdata['error'] == 0, 'AUC'].mean())
print(smgmdata.loc[smgmdata['error'] == 1, 'AUC'].mean())
0.511068062827
0.465226699029
0.499286075949
0.374682394366
0.314955395683
0.795901818182

So, filled in with those means (rounded to two decimals), the table looks like this:

MD               self vs. other (so)   group more w/ self less (smgl)   group more w/ self more (smgm)
chose selfish    .28                   .23                              .18
chose selfless   .29                   .25                              .39

AUC              self vs. other (so)   group more w/ self less (smgl)   group more w/ self more (smgm)
chose selfish    .51                   .50                              .31
chose selfless   .47                   .37                              .80
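The six print calls per measure can be collapsed into a single pivot table. The sketch below runs on made-up trials (not the pilot data), assumes the codes follow the so/smgl/smgm pattern, and labels error == 0 as the selfish choice, which is exactly the coding the note below still needs to check. The `choice_type` and `choice` columns are hypothetical names.

```python
import pandas as pd

# Made-up trials in the same shape as the real data (not the pilot results)
trials = pd.DataFrame({
    "code":  ["so1", "so2", "so3", "smgl1", "smgl2", "smgm1", "smgm2", "smgm3"],
    "error": [0, 1, 0, 0, 1, 0, 1, 1],
    "MD":    [0.2, 0.4, 0.3, 0.1, 0.3, 0.2, 0.5, 0.3],
})

# Condition stem (so/smgl/smgm) and an assumed label for the option chosen
trials["choice_type"] = trials["code"].str.extract(r"^(so|smgl|smgm)\d*$", expand=False)
trials["choice"] = trials["error"].map({0: "chose selfish", 1: "chose selfless"})

# Rows: option chosen; columns: choice type (in the table's order); cells: mean MD
table = (trials.pivot_table(index="choice", columns="choice_type",
                            values="MD", aggfunc="mean")
               .reindex(columns=["so", "smgl", "smgm"]))
print(table.round(2))
```

With the real data, the same last three steps would run on `data`, and swapping `values="MD"` for `values="AUC"` gives the second table.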

Note to self: I still need to check whether the selfish vs. selfless options are coded correctly. I believe error == 0 means the participant chose the selfish option.