Hack for Heat #8: Complaint resolution time revisted
%matplotlib inline
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2
from datetime import date
pd.options.display.max_columns = 40
pd.options.mode.chained_assignment = None # default='warn'
Hack for Heat #8: Complaint resolution time revisited¶
In this post, I'm going to explore a simple way of dealing with the censoring issue in complaint resolution time: Basically, because we only care about heat complaints during the heat season, we can constrain our cases of interest to the heating season specifically (1st October to May 31st).
#pull from our database again:
connection = psycopg2.connect('dbname = threeoneone user= threeoneoneadmin password = threeoneoneadmin')
cursor = connection.cursor()
cursor.execute('''SELECT createddate, closeddate, borough, complainttype FROM service;''')
data = cursor.fetchall()
data = pd.DataFrame(data)
data.columns = ['createddate', 'closeddate', 'borough', 'complainttype']
data = data.loc[(data['complainttype'] == 'HEATING') | (data['complainttype'] == 'HEAT/HOT WATER') ]
So far, we've subsetted the data to only include the heating complaints. Now let's create a datetime mask. What we want is to have subsets of data for each of the heating seasons:
heatmonths = range(1, 6) + [10,11,12]
heatmonths
createdmask = data['createddate'].map(lambda x: (x.month in heatmonths))
closedmask = data['closeddate'].map(lambda x: (x.month in heatmonths) if x != None else False)
mask = createdmask & closedmask
heatseasondata = data.loc[mask]
How many heat/hot water complaints are created and closed inside vs. outside of heating season?
len(data.loc[mask]) #inside heating season
len(data.loc[~mask])#outside heating season
The next thing we want to do is ignore cases where the complaint was resolved in the next heating season:
prevmonths = range(1, 6)
nextmonths = [10,11,12]
heatseasondata['createdheatseason'] = [x.year if (x.month in prevmonths) else (x.year-1) for x in heatseasondata['createddate']]
heatseasondata.head()
heatseasondata['closedheatseason'] = [x.year if (x.month in prevmonths) else (x.year-1) for x in heatseasondata['closeddate']]
Now that we've done this, we can select only the cases where the closed date was in the same season as the created date:
heatseasondata = heatseasondata.loc[heatseasondata['createdheatseason'] == heatseasondata['closedheatseason']]
Okay, now we can calculate some average resolution times:
heatseasondata['resolutiontime'] = heatseasondata['closeddate'] - heatseasondata['createddate']
heatseasondata['resolutiontimeint'] = heatseasondata.resolutiontime.astype('timedelta64[D]')
resolutiontimedata = heatseasondata.groupby(by='createdheatseason').mean()['resolutiontimeint']
resolutiontimedata.to_csv('resolutiontimebyyear.csv')
resolutiontimedata
Resolution times by year:¶
x = resolutiontimedata.index.values
y = resolutiontimedata.values
plt.figure(figsize=(12,10));
plt.plot(x,y);
Resolution time by borough:¶
restimebyboro = heatseasondata.groupby(by=['borough', 'createdheatseason']).mean()['resolutiontimeint']
restimebyboro.to_csv('restimebyboro.csv')
restimebyboro = restimebyboro.loc[[x in range(2010,2017) for x in restimebyboro.index.get_level_values('createdheatseason')]]
boros = heatseasondata.borough.unique()
boroplots = {x:[] for x in boros}
for boro in boros:
boroplots[boro] = restimebyboro.xs(boro).values
boroplots.pop('Unspecified')
x = range(2010,2017)
plt.figure(figsize=(12,10));
for boro in boroplots:
plt.plot(x,boroplots[boro]);
plt.legend(boroplots.keys());
plt.xticks(x, [str(label) for label in x]);
I was told beforehand that the average resolution time from a similar analysis last year was about 3-5 days, so this looks about right.