%matplotlib inline from matplotlib import pyplot as plt from matplotlib import patches as mpatches import pandas as pd import numpy as np import psycopg2 from datetime import datetime from datetime import date pd.options.display.max_columns = 40 connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin') cursor = connection.cursor()
Hack for Heat #10: NYC's top complaints¶
In this post, I'm going to look at what the top 311 complaints in NYC are, and how their numbers have changed over the years.
cursor.execute('''SELECT createddate, borough, complainttype FROM service''') totalcomplaints = cursor.fetchall()
Over the past 7 years, there were about 13 million complaints, which is about just 1.5 per person over the 7 years, or .22 complaints per person per year
totalcomplaints = pd.DataFrame(totalcomplaints)
totalcomplaints.columns = ['Date', 'Boro', 'Comptype']
totalcomplaints['Year'] = [x.year for x in totalcomplaints['Date']]
This is how it breaks down over the years:
complainttypesbyseason = totalcomplaints.groupby(by=['Year','Comptype']).count()
topfive = complainttypesbyseason['Date'].groupby(level=0,group_keys=False).apply(lambda x:x.sort_values(ascending=False).head(5)) topfive
Year Comptype 2010 HEATING 214218 GENERAL CONSTRUCTION 127131 Street Light Condition 116466 Noise - Residential 115615 PLUMBING 111504 2011 HEATING 190184 GENERAL CONSTRUCTION 133242 PLUMBING 113218 Noise - Residential 111447 Street Condition 104694 2012 HEATING 182974 Noise - Residential 127943 GENERAL CONSTRUCTION 112436 Street Light Condition 93981 PLUMBING 87561 2013 HEATING 202896 Noise - Residential 151516 GENERAL CONSTRUCTION 112169 Street Light Condition 92502 PLUMBING 86564 2014 Noise - Residential 192608 HEAT/HOT WATER 132767 Street Condition 114545 HEATING 97597 Street Light Condition 94665 2015 HEAT/HOT WATER 225706 Noise - Residential 208026 Street Condition 124378 Blocked Driveway 100881 Street Light Condition 98106 2016 HEAT/HOT WATER 126906 Noise - Residential 104324 Illegal Parking 55932 Blocked Driveway 53955 Street Condition 51977 dtype: int64
These data gives us a rough idea of how the number of complaints have changed over time, but one problem is that the labels that were used changed each year (presumably because they were replaced by new labels that better reflected what the complaints were). Secondly, for our purposes, some categories can be combined. For example, 'Street Condition' and 'Street Light Condition' may be important for administrators, but for now, we can be content with just "stuff that's gone wrong on the street".
Index([u'Blocked Driveway', u'GENERAL CONSTRUCTION', u'HEAT/HOT WATER', u'HEATING', u'Illegal Parking', u'Noise - Residential', u'PLUMBING', u'Street Condition', u'Street Light Condition'], dtype='object', name=u'Comptype')
So, over the last 5 years, there have been 10 different categories that have made up the top five. Notice that some of these labels have changed over time. We'll have to standardize them.
years = totalcomplaints.Year.unique() #first have a list of labels that are the same oldlabels = totalcomplaints.loc[totalcomplaints['Year'] == years,['Comptype']]['Comptype'].unique() newlabels =  for x in years: yearslabels = totalcomplaints.loc[totalcomplaints['Year'] == x,['Comptype']]['Comptype'].unique() [newlabels.append(x) for x in yearslabels if x not in oldlabels and x not in newlabels]
From doing this, we get two lists: oldlabels and newlabels, that we will use to decide how to combine stuff.
One modification I'm going to make is to combine 'Street Condition', 'Street Light Condition', 'Street Sign - Damaged', 'Street Sign - Dangling', and 'Street Sign - Missing' into 'Street Condition'.
For the current purposes, these are the same thing.
I'm also going to collapse 'GENERAL CONSTRUCTION', 'General Construction/Plumbing' and 'PLUMBING' into a new category called 'CONSTRUCTION/PLUMBING'.
I'll collapse noise related complaints ('Noise', 'Noise - Commercial', 'Noise - Helicopter', 'Noise - House of Worship', 'Noise - Park', 'Noise - Residential', 'Noise - Street/Sidewalk', 'Noise - Vehicle') into a label named 'NOISY'.
Finally, I'll combine 'HEAT/HOT WATER' and 'HEATING' into 'HEAT RELATED'.
streetconditioncomb = ['Street Condition', 'Street Light Condition', 'Street Sign - Damaged', 'Street Sign - Dangling', \ 'Street Sign - Missing'] totalcomplaints['RevComptype'] = ['Street Condition' if x in streetconditioncomb else x for x in totalcomplaints['Comptype']] genconstructcomb = ['GENERAL CONSTRUCTION', 'General Construction/Plumbing', 'PLUMBING'] totalcomplaints['RevComptype'] = ['CONSTRUCTION/PLUMBING' if x in genconstructcomb else x for x in totalcomplaints['RevComptype']] noisecomb = ['Noise', 'Noise - Commercial', 'Noise - Helicopter', 'Noise - House of Worship', 'Noise - Park',\ 'Noise - Residential', 'Noise - Street/Sidewalk', 'Noise - Vehicle'] totalcomplaints['RevComptype'] = ['NOISY' if x in noisecomb else x for x in totalcomplaints['RevComptype']] heatcomb = ['HEAT/HOT WATER', 'HEATING'] totalcomplaints['RevComptype'] = ['HEAT RELATED' if x in heatcomb else x for x in totalcomplaints['RevComptype']]
revisedcomplainttypes = totalcomplaints.groupby(by=['Year','RevComptype']).count()
topfiverev = revisedcomplainttypes['Date'].groupby(level=0, group_keys=False).apply(lambda x : x.sort_values(ascending=False).head(8)) topfiverev
Year RevComptype 2010 CONSTRUCTION/PLUMBING 287253 Street Condition 229584 HEAT RELATED 214218 NOISY 201834 PAINT - PLASTER 93194 Water System 70346 NONCONST 69659 Traffic Signal Condition 53773 2011 CONSTRUCTION/PLUMBING 272234 Street Condition 216865 NOISY 196453 HEAT RELATED 190184 PAINT - PLASTER 100704 NONCONST 68407 Water System 61577 Blocked Driveway 52592 2012 CONSTRUCTION/PLUMBING 225338 NOISY 220392 HEAT RELATED 182974 Street Condition 171910 PAINT - PLASTER 77287 NONCONST 60055 Water System 57678 Blocked Driveway 50818 2013 NOISY 259356 CONSTRUCTION/PLUMBING 223314 HEAT RELATED 202896 Street Condition 182618 PAINT - PLASTER 77957 Blocked Driveway 57408 NONCONST 53831 Water System 53186 2014 NOISY 337892 HEAT RELATED 230364 Street Condition 221097 CONSTRUCTION/PLUMBING 106251 Blocked Driveway 79170 PAINT/PLASTER 63503 Illegal Parking 63243 UNSANITARY CONDITION 61789 2015 NOISY 387227 Street Condition 233839 HEAT RELATED 225706 Blocked Driveway 100881 Illegal Parking 92679 CONSTRUCTION/PLUMBING 83585 UNSANITARY CONDITION 82888 Water System 71086 2016 NOISY 197186 HEAT RELATED 126906 Street Condition 105346 Illegal Parking 55932 Blocked Driveway 53955 CONSTRUCTION/PLUMBING 39460 UNSANITARY CONDITION 35357 Water System 31299 dtype: int64
This gives us some different numbers. Taken together, although heat related complaints are still constantly in the top 5, what we observe over time is also a gradual decrease in construction related complaints, a constant, variable but large amount of street-related complaints, and a steady increase in noise-related complaints. Next, I'll post a version of this looking at only the heating season.