Misc #3: Soup

In [1]:
%matplotlib inline

My friend and I once argued about whether one "eats" or "drinks" soup. I believed that you "drink" soup, whereas she thought that you "eat" it.

From our conversation, I realized the verbs people associate with various foods have to do with the state of matter they mentally conjure up when thinking about those foods. For example, I posit that if you came up to me while I was holding a Slurpee (for the sake of balance, other semi-frozen beverages are available), I would be eating it if it just came out of the machine, but drinking it if I'd been standing out in the sweltering New York summer heat for a while.

In [2]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import math

Thus, this simple dataset about soup was born. Using Amazon's Mechanical Turk, I asked 107 people: (a) Whether they thought soup was more similar to stew (closer to solid) or broth (closer to liquid), and (b) whether they were more likely to eat or drink soup.

Participants answered both questions on a 1 to 5 scale with 5 being "most likely to drink" and "closer to stew" (effectively reverse-coding their responses and erring conservatively with regard to my own hypothesis should participants cluster their responses at the ends of the scale). There was also a free-response question that was quite entertaining to read.

In [3]:
soupdata = pd.read_csv('SOUPDATA.csv')
In [4]:
soupdata.head()
Out[4]:
SIMILAR EATDRINK COMMENT
0 5 3 this little survey made me laugh
1 4 3 Sopu is awesome.
2 4 1 NaN
3 2 2 NaN
4 2 3 Soup can only be eaten if meat and other harde...

Scatterplot of responses

This first scatterplot is not a very good one, because both scales consisted of 5 discrete points, so the points on the graph overlap (we'll fix that in a second).

In [5]:
plt.figure(figsize=(12,10))
plt.scatter(soupdata['SIMILAR'],soupdata['EATDRINK']);
plt.xlabel("5 = most similar to stew", size = 24);
plt.ylabel("5 = more likely to drink", size = 24);

We can artificially jitter the points to make the variance of these points more visible. First, let's write a simple function that takes an array and jitters it:

In [6]:
def jitter(array):
    multiplier = 1 + 0.1* np.random.random(np.shape(array))
    return multiplier * array
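One thing worth noting about this jitter (a design choice, not a bug): because it multiplies by a factor between 1.0 and 1.1, it only nudges points upward, and larger values get nudged farther. If that bothers you, a zero-centered additive jitter does the same job without the bias. A minimal sketch (the jitter_additive name and the 0.1 width are just illustrative, not part of the original notebook):

def jitter_additive(array, width=0.1):
    #Add uniform noise in [-width, +width) so points spread evenly around their true values
    return array + np.random.uniform(-width, width, np.shape(array))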

Jittering the original points allows us to see how the responses cluster. Already, this pattern of data appears to reflect a negative correlation (which is what I expected), but we'll test that more formally later.

In [7]:
plt.figure(figsize=(12,10))
plt.scatter(jitter(soupdata['SIMILAR']), jitter(soupdata['EATDRINK']));
plt.xlabel("5 = most similar to stew", size = 24);
plt.ylabel("5 = more likely to drink", size = 24);

Broth vs. stew

We can then look at the distribution of responses on each scale, which gives us further insight into how the sample answered the two questions.

In [8]:
np.mean(soupdata['SIMILAR'])
Out[8]:
2.9056603773584904

This mean tells us that, on average, the sample was split on whether soup is more broth-like or more stew-like.

However, the distribution was such that there weren't two discrete groups of "broth people" vs. "stew people". Rather, most people think it's somewhere in between, or lean slightly to one side. In some ways, this serves as a "sanity check" and also tells us that people know that different types of soup exist.

In [9]:
plt.figure(figsize=(8,6));
plt.hist(soupdata['SIMILAR'], bins = range(1,7));
plt.xlabel('5 = most similar to stew', size = 24);
plt.ylabel('Frequency', size = 24);
plt.xticks(range(1,6));
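If you prefer counts to bars (this check isn't in the original notebook), a simple tabulation shows the same thing: responses pile up in the middle of the scale rather than at the ends.

#Counts of each response on the broth/stew scale
print soupdata['SIMILAR'].value_counts().sort_index()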

Eating vs. drinking soup

On the other hand, the participants' average "eating/drinking" score was slightly below the scale midpoint, indicating that they'd be more likely to "eat" soup than "drink" it. This is not surprising considering I restricted the sample to respondents living in the U.S., where my friend and I were having our argument.

In [10]:
np.mean(soupdata['EATDRINK'])
Out[10]:
2.4622641509433962

The distribution of responses is also roughly normal, with a slight positive skew.

In [11]:
plt.figure(figsize=(8,6));
plt.hist(soupdata['EATDRINK'], bins = range(1,7));
plt.xlabel('5 = more likely to drink', size = 24);
plt.ylabel('Frequency', size = 24);
plt.xticks(range(1,6));
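To put a number on that eyeballed skew (again, not part of the original notebook), scipy's sample skewness statistic is one option; a positive value indicates a longer right tail.

from scipy import stats

#Sample skewness of the eat/drink responses
print 'Skew = {0}'.format(stats.skew(soupdata['EATDRINK']))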

To tell us whether the 2.46 average we obtained (vs. the scale midpoint of 3) might simply be due to sampling error, we can run a one-sample t-test comparing 2.46 to the midpoint of the scale. This involves finding the standard error of the sample, and although there are numerous packages that will do this for you, it's easy enough to do by hand:

In [12]:
#Standard deviation (np.std uses the population formula, ddof = 0, by default;
#with ddof = 1 the numbers below change only slightly)
soupsd = np.std(soupdata['EATDRINK'])
print 'Standard deviation = {0}'.format(soupsd)

#Standard error = standard deviation/square root(sample size)
soupse = soupsd/math.sqrt(len(soupdata['EATDRINK']))
print 'Standard error = {0}'.format(soupse)

#t-value = (mean1 - mean2)/standard error, with n - 1 degrees of freedom
tvalue = (3 - np.mean(soupdata['EATDRINK']))/soupse
print 't({0}) = {1}'.format(len(soupdata['EATDRINK']) - 1, tvalue)
Standard deviation = 1.08309939875
Standard error = 0.105199913354
t(105) = 5.11156171059
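As a cross-check (not in the original notebook), scipy's one-sample t-test does the same computation in one line. It uses the sample standard deviation (ddof = 1), so its t-value comes out very slightly smaller than the hand calculation, but the conclusion doesn't change.

from scipy import stats

#One-sample t-test of the eat/drink scores against the scale midpoint of 3
t, p = stats.ttest_1samp(soupdata['EATDRINK'], 3)
print 't = {0}, p = {1}'.format(t, p)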

The hand calculation above gives us a t-value of 5.11 with 105 degrees of freedom (n - 1), which, if you look it up in a t-distribution table, returns p < .01 (for reference, a t-value of ~2 with a df of ~30 gives you p < .05). We can also calculate confidence intervals:

In [13]:
upperbound = 2.462 + (soupse *1.96)
lowerbound = 2.462 - (soupse *1.96)
print '95% confidence intervals [{0}]'.format((lowerbound,upperbound))
95% confidence intervals [(2.255808169826536, 2.6681918301734644)]

Strictly speaking, this means that if we sampled our population over and over again and built an interval this way each time, about 95% of those intervals would contain the true population mean. Our interval runs from 2.26 to 2.67, still entirely below the scale midpoint of 3. Taken together, it's quite likely (and also quite theoretically reasonable) that these participants' tendency to favor "eating" soup was not due to sampling error.
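For reference (also not in the original notebook), you can let scipy build the interval using the exact t critical value for 105 degrees of freedom rather than the normal-approximation 1.96; with a sample this size the difference is negligible.

from scipy import stats

#t-based 95% confidence interval around the sample mean, using the standard error computed above
print stats.t.interval(0.95, len(soupdata['EATDRINK']) - 1,
                       loc=np.mean(soupdata['EATDRINK']), scale=soupse)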

Predicting eating vs. drinking from broth-y vs. stew-y soups

So, what about my hypothesis? To test whether people's mental representation of soup predicts the verb they use to describe consuming it, we can run a simple OLS regression. I'm not going to write out the algorithm by hand (although I have before, see here), and will instead borrow the much prettier statsmodels one.

(CAVEAT: These are correlational data; the usual caveats with regard to establishing causality apply)

In [14]:
import statsmodels.api as sm
In [15]:
subdata = sm.add_constant(soupdata)
subdata['SIMILARC'] = subdata['SIMILAR'] - np.mean(subdata['SIMILAR'])
model = sm.OLS(subdata['EATDRINK'],subdata.loc[:,['const','SIMILARC']],hasconst=True)
results = model.fit()
In [16]:
results.summary()
Out[16]:
OLS Regression Results
Dep. Variable: EATDRINK R-squared: 0.062
Model: OLS Adj. R-squared: 0.053
Method: Least Squares F-statistic: 6.832
Date: Fri, 29 Jul 2016 Prob (F-statistic): 0.0103
Time: 17:58:57 Log-Likelihood: -155.50
No. Observations: 106 AIC: 315.0
Df Residuals: 104 BIC: 320.3
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
const 2.4623 0.103 23.933 0.000 2.258 2.666
SIMILARC -0.2701 0.103 -2.614 0.010 -0.475 -0.065
Omnibus: 3.808 Durbin-Watson: 2.014
Prob(Omnibus): 0.149 Jarque-Bera (JB): 3.819
Skew: 0.450 Prob(JB): 0.148
Kurtosis: 2.766 Cond. No. 1.00

In this analysis, the first coefficient (const), 2.46, tells us that a hypothetical person who thought soup was equally broth-like and stew-like would still be more likely to eat it than drink it (2.46 being significantly lower than the midpoint of 3).

The second coefficient, SIMILARC, is negative and statistically significant: believing soup is stew-like is associated with saying that one would eat (vs. drink) soup. Specifically, for every additional point a hypothetical respondent rates soup as similar to stew (vs. broth), he/she is predicted to score .27 points lower on the eat vs. drink scale.
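To make that concrete (a quick illustration, not part of the original analysis), we can plug the endpoints of the scale into the fitted model: a "pure broth" respondent (SIMILAR = 1) versus a "pure stew" respondent (SIMILAR = 5).

#Predicted eat/drink scores at the endpoints of the broth/stew scale,
#remembering that SIMILAR was centered before fitting
b0, b1 = results.params['const'], results.params['SIMILARC']
center = np.mean(subdata['SIMILAR'])
for similar in (1, 5):
    print 'SIMILAR = {0}: predicted EATDRINK = {1:.2f}'.format(similar, b0 + b1*(similar - center))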

Coming back to our original scatter plot, we can draw a fit-line over it:

In [17]:
#Defining a function for our fit line from the fitted coefficients
#(un-centering the intercept so it applies to the raw SIMILAR scale)
def fitline(Xs):
    slope, intercept = results.params['SIMILARC'], results.params['const']
    return [intercept + slope*(x - np.mean(subdata['SIMILAR'])) for x in Xs]
In [18]:
plt.figure(figsize=(12,10))
plt.scatter(jitter(soupdata['SIMILAR']), jitter(soupdata['EATDRINK']));
plt.xlabel("5 = most similar to stew", size = 24);
plt.ylabel("5 = more likely to drink", size = 24);
plt.plot(range(1,6), fitline(range(1,6)));

Conclusion

In summary, I learned that:

  1. Most people obviously know that there are different types of soup, but there is variability in what someone's "default" category is.
  2. Not everyone "eats" soup, but more people in the U.S. "eat" it than "drink" it.
  3. Finally, people who associate soup with stew are more likely to say they eat it than drink it.

I'm not sure who won the argument in the end, but this analysis does feel a little like winning the battle but losing the war.