%matplotlib inline

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2
from datetime import date

pd.options.display.max_columns = 40
pd.options.mode.chained_assignment = None  # default='warn'

Hack for Heat #8: Complaint resolution time revisited¶

In this post, I'm going to explore a simple way of dealing with the censoring issue in complaint resolution time: Basically, because we only care about heat complaints during the heat season, we can constrain our cases of interest to the heating season specifically (1^st October to May 31^st).

#pull from our database again:
connection = psycopg2.connect('dbname = threeoneone user= threeoneoneadmin password = threeoneoneadmin')

cursor = connection.cursor()

cursor.execute('''SELECT createddate, closeddate, borough, complainttype FROM service;''')

data = cursor.fetchall()

data = pd.DataFrame(data)

data.columns = ['createddate', 'closeddate', 'borough', 'complainttype']

data = data.loc[(data['complainttype'] == 'HEATING') | (data['complainttype'] == 'HEAT/HOT WATER') ]

So far, we've subsetted the data to only include the heating complaints. Now let's create a datetime mask. What we want is to have subsets of data for each of the heating seasons:

heatmonths = range(1, 6) + [10,11,12]

heatmonths

[1, 2, 3, 4, 5, 10, 11, 12]

createdmask = data['createddate'].map(lambda x: (x.month in heatmonths))
closedmask = data['closeddate'].map(lambda x: (x.month in heatmonths) if x != None else False)

mask = createdmask & closedmask

heatseasondata = data.loc[mask]

How many heat/hot water complaints are created and closed inside vs. outside of heating season?

len(data.loc[mask]) #inside heating season

1302688

len(data.loc[~mask])#outside heating season

68750

The next thing we want to do is ignore cases where the complaint was resolved in the next heating season:

prevmonths = range(1, 6) 
nextmonths = [10,11,12]

heatseasondata['createdheatseason'] = [x.year if (x.month in prevmonths) else (x.year-1) for x in heatseasondata['createddate']]

heatseasondata.head()

heatseasondata['closedheatseason'] = [x.year if (x.month in prevmonths) else (x.year-1) for x in heatseasondata['closeddate']]

Now that we've done this, we can select only the cases where the closed date was in the same season as the created date:

heatseasondata = heatseasondata.loc[heatseasondata['createdheatseason']  == heatseasondata['closedheatseason']]

Okay, now we can calculate some average resolution times:

heatseasondata['resolutiontime'] = heatseasondata['closeddate'] - heatseasondata['createddate']

heatseasondata['resolutiontimeint'] = heatseasondata.resolutiontime.astype('timedelta64[D]')

resolutiontimedata = heatseasondata.groupby(by='createdheatseason').mean()['resolutiontimeint']

resolutiontimedata.to_csv('resolutiontimebyyear.csv')

resolutiontimedata

createdheatseason
2009    4.605845
2010    3.695498
2011    3.666409
2012    3.623974
2013    3.514244
2014    3.137578
2015    3.406145
2016    3.446075
Name: resolutiontimeint, dtype: float64

Resolution times by year:¶

x = resolutiontimedata.index.values
y = resolutiontimedata.values

plt.figure(figsize=(12,10));
plt.plot(x,y);

Resolution time by borough:¶

restimebyboro = heatseasondata.groupby(by=['borough', 'createdheatseason']).mean()['resolutiontimeint']

restimebyboro.to_csv('restimebyboro.csv')

restimebyboro = restimebyboro.loc[[x in range(2010,2017) for x in restimebyboro.index.get_level_values('createdheatseason')]]

boros = heatseasondata.borough.unique()

boroplots = {x:[] for x in boros}
for boro in boros:
    boroplots[boro] = restimebyboro.xs(boro).values
    
boroplots.pop('Unspecified')

array([ 3.64268956,  3.89000015])

x = range(2010,2017)

plt.figure(figsize=(12,10));
for boro in boroplots:
    plt.plot(x,boroplots[boro]);
plt.legend(boroplots.keys());
plt.xticks(x, [str(label) for label in x]);

I was told beforehand that the average resolution time from a similar analysis last year was about 3-5 days, so this looks about right.

	createddate	closeddate	borough	complainttype	createdheatseason
0	2014-11-18	2014-11-22	BROOKLYN	HEAT/HOT WATER	2013
1	2014-11-18	2014-11-25	BROOKLYN	HEAT/HOT WATER	2013
2	2014-11-18	2014-11-19	BRONX	HEAT/HOT WATER	2013
3	2014-11-18	2014-11-19	MANHATTAN	HEAT/HOT WATER	2013
4	2014-11-18	2014-11-20	BRONX	HEAT/HOT WATER	2013