Hack for Heat #10: NYC's top complaints
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import patches as mpatches
import pandas as pd
import numpy as np
import psycopg2
from datetime import datetime
from datetime import date
pd.options.display.max_columns = 40
connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()
Hack for Heat #10: NYC's top complaints¶
In this post, I'm going to look at what the top 311 complaints in NYC are, and how their numbers have changed over the years.
cursor.execute('''SELECT createddate, borough, complainttype FROM service''')
totalcomplaints = cursor.fetchall()
len(totalcomplaints)
Over the past 7 years, there were about 13 million complaints, which is about just 1.5 per person over the 7 years, or .22 complaints per person per year
totalcomplaints = pd.DataFrame(totalcomplaints)
totalcomplaints.columns = ['Date', 'Boro', 'Comptype']
totalcomplaints['Year'] = [x.year for x in totalcomplaints['Date']]
This is how it breaks down over the years:
totalcomplaints.groupby(by=['Year']).count()
complainttypesbyseason = totalcomplaints.groupby(by=['Year','Comptype']).count()
topfive = complainttypesbyseason['Date'].groupby(level=0,group_keys=False).apply(lambda x:x.sort_values(ascending=False).head(5))
topfive
These data gives us a rough idea of how the number of complaints have changed over time, but one problem is that the labels that were used changed each year (presumably because they were replaced by new labels that better reflected what the complaints were). Secondly, for our purposes, some categories can be combined. For example, 'Street Condition' and 'Street Light Condition' may be important for administrators, but for now, we can be content with just "stuff that's gone wrong on the street".
topfive.index.levels[1]
So, over the last 5 years, there have been 10 different categories that have made up the top five. Notice that some of these labels have changed over time. We'll have to standardize them.
years = totalcomplaints.Year.unique()
#first have a list of labels that are the same
oldlabels = totalcomplaints.loc[totalcomplaints['Year'] == years[0],['Comptype']]['Comptype'].unique()
newlabels = []
for x in years:
yearslabels = totalcomplaints.loc[totalcomplaints['Year'] == x,['Comptype']]['Comptype'].unique()
[newlabels.append(x) for x in yearslabels if x not in oldlabels and x not in newlabels]
From doing this, we get two lists: oldlabels and newlabels, that we will use to decide how to combine stuff.
#sorted(oldlabels)
#sorted(newlabels)
One modification I'm going to make is to combine 'Street Condition', 'Street Light Condition', 'Street Sign - Damaged', 'Street Sign - Dangling', and 'Street Sign - Missing' into 'Street Condition'.
For the current purposes, these are the same thing.
I'm also going to collapse 'GENERAL CONSTRUCTION', 'General Construction/Plumbing' and 'PLUMBING' into a new category called 'CONSTRUCTION/PLUMBING'.
I'll collapse noise related complaints ('Noise', 'Noise - Commercial', 'Noise - Helicopter', 'Noise - House of Worship', 'Noise - Park', 'Noise - Residential', 'Noise - Street/Sidewalk', 'Noise - Vehicle') into a label named 'NOISY'.
Finally, I'll combine 'HEAT/HOT WATER' and 'HEATING' into 'HEAT RELATED'.
streetconditioncomb = ['Street Condition', 'Street Light Condition', 'Street Sign - Damaged', 'Street Sign - Dangling', \
'Street Sign - Missing']
totalcomplaints['RevComptype'] = ['Street Condition' if x in streetconditioncomb else x for x in totalcomplaints['Comptype']]
genconstructcomb = ['GENERAL CONSTRUCTION', 'General Construction/Plumbing', 'PLUMBING']
totalcomplaints['RevComptype'] = ['CONSTRUCTION/PLUMBING' if x in genconstructcomb else x for x in totalcomplaints['RevComptype']]
noisecomb = ['Noise', 'Noise - Commercial', 'Noise - Helicopter', 'Noise - House of Worship', 'Noise - Park',\
'Noise - Residential', 'Noise - Street/Sidewalk', 'Noise - Vehicle']
totalcomplaints['RevComptype'] = ['NOISY' if x in noisecomb else x for x in totalcomplaints['RevComptype']]
heatcomb = ['HEAT/HOT WATER', 'HEATING']
totalcomplaints['RevComptype'] = ['HEAT RELATED' if x in heatcomb else x for x in totalcomplaints['RevComptype']]
revisedcomplainttypes = totalcomplaints.groupby(by=['Year','RevComptype']).count()
topfiverev = revisedcomplainttypes['Date'].groupby(level=0, group_keys=False).apply(lambda x : x.sort_values(ascending=False).head(8))
topfiverev
This gives us some different numbers. Taken together, although heat related complaints are still constantly in the top 5, what we observe over time is also a gradual decrease in construction related complaints, a constant, variable but large amount of street-related complaints, and a steady increase in noise-related complaints. Next, I'll post a version of this looking at only the heating season.