Hack for Heat #11: NYC's top complaints pt.2

In [1]:
%matplotlib inline

from matplotlib import pyplot as plt
from matplotlib import patches as mpatches
import pandas as pd
import numpy as np
import psycopg2
from datetime import datetime
from datetime import date


pd.options.display.max_columns = 40

Hack for Heat #11: NYC's top complaints pt. 2

In this post, I'm going to post a visualization of the top complaints reported in the previous post. I have this in a csv:

In [2]:
tempdata = pd.read_csv('topfiverev.csv', names = ['Year','Category','Count'])
In [3]:
tempdata.head()
Out[3]:
Year Category Count
0 2010 CONSTRUCTION/PLUMBING 287253
1 2010 Street Condition 229584
2 2010 HEAT RELATED 214218
3 2010 NOISY 201834
4 2010 PAINT - PLASTER 93194
In [4]:
#Fixing some category names
tempdata['Category'] = ['PAINT/PLASTER' if (x == 'PAINT/PLASTER') | (x== 'PAINT - PLASTER') else x for x in tempdata['Category']]
tempdata['Category'] = ['Street Condition' if x == 'Traffic Signal Condition' else x for x in tempdata['Category'] ]

What we want is to create a data frame with as many rows as there are categories, as many columns as there are years, and the counts as the data points.

In [5]:
upcaseindex = [x.upper() for x in tempdata['Category'].unique()]
upcaseindex.remove('NONCONST') #remove this category because it refers to missing values
In [6]:
plotdata = pd.DataFrame(index = upcaseindex, columns = tempdata['Year'].unique())
In [7]:
plotdata
Out[7]:
2010 2011 2012 2013 2014 2015 2016
CONSTRUCTION/PLUMBING NaN NaN NaN NaN NaN NaN NaN
STREET CONDITION NaN NaN NaN NaN NaN NaN NaN
HEAT RELATED NaN NaN NaN NaN NaN NaN NaN
NOISY NaN NaN NaN NaN NaN NaN NaN
PAINT/PLASTER NaN NaN NaN NaN NaN NaN NaN
WATER SYSTEM NaN NaN NaN NaN NaN NaN NaN
BLOCKED DRIVEWAY NaN NaN NaN NaN NaN NaN NaN
ILLEGAL PARKING NaN NaN NaN NaN NaN NaN NaN
UNSANITARY CONDITION NaN NaN NaN NaN NaN NaN NaN

Now let's populate it with values.

In [8]:
for row in tempdata['Category'].unique():
    for year in  tempdata['Year'].unique():
        try:
            plotdata.loc[row.upper(),year] = tempdata.loc[(tempdata['Year'] == year) & (tempdata['Category'] == row)]['Count'].values[0]
        except:
            plotdata.loc[row.upper(),year] = None
In [9]:
plotdata = plotdata.drop(['NONCONST'])
plotdata
Out[9]:
2010 2011 2012 2013 2014 2015 2016
CONSTRUCTION/PLUMBING 287253 272234 225338 223314 106251 83585 39460
STREET CONDITION 229584 216865 171910 182618 221097 233839 105346
HEAT RELATED 214218 190184 182974 202896 230364 225706 126906
NOISY 201834 196453 220392 259356 337892 387227 197186
PAINT/PLASTER 93194 100704 77287 77957 63503 None None
WATER SYSTEM 70346 61577 57678 53186 None 71086 31299
BLOCKED DRIVEWAY None 52592 50818 57408 79170 100881 53955
ILLEGAL PARKING None None None None 63243 92679 55932
UNSANITARY CONDITION None None None None 61789 82888 35357
In [10]:
x = range(0,len(plotdata.columns))
xlabels = plotdata.columns

plt.figure(figsize=(16,10));
plt.xlim(-1,len(xlabels));
plt.ylim(0,500000);
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')
plt.title('Complaint Category by Year', size = 24)
plt.ylabel('Number of Complaints',size = 14)

plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

count = 0
for row in plotdata.index:
    y = plotdata.loc[row]
    plt.plot(x, y, color = plotcolors[count], lw=4, marker = '.', markersize = 20);
    count += 1

plt.legend(legendcolors, upcaseindex, bbox_to_anchor=(0.3, 1));
plt.xticks(x,xlabels);

So that's my graph. Do note that the reason these complaints dropped in 2016 is obviously because we're only midway through the year! I grabbed these data in early July, so that's when these numbers were last updated.

Hack for Heat #10: NYC's top complaints

In [1]:
%matplotlib inline

from matplotlib import pyplot as plt
from matplotlib import patches as mpatches
import pandas as pd
import numpy as np
import psycopg2
from datetime import datetime
from datetime import date


pd.options.display.max_columns = 40

connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()

Hack for Heat #10: NYC's top complaints

In this post, I'm going to look at what the top 311 complaints in NYC are, and how their numbers have changed over the years.

In [2]:
cursor.execute('''SELECT createddate, borough, complainttype FROM service''')
totalcomplaints = cursor.fetchall()
In [3]:
len(totalcomplaints)
Out[3]:
13092182

Over the past 7 years, there were about 13 million complaints, which is about just 1.5 per person over the 7 years, or .22 complaints per person per year

In [4]:
totalcomplaints = pd.DataFrame(totalcomplaints)
In [5]:
totalcomplaints.columns = ['Date', 'Boro', 'Comptype']
In [6]:
totalcomplaints['Year'] = [x.year for x in totalcomplaints['Date']]

This is how it breaks down over the years:

In [7]:
totalcomplaints.groupby(by=['Year']).count()
Out[7]:
Date Boro Comptype
Year
2010 2005730 2005730 2005730
2011 1918888 1918888 1918888
2012 1783013 1783013 1783013
2013 1848938 1848938 1848938
2014 2102177 2102177 2102177
2015 2286697 2286697 2286697
2016 1146739 1146739 1146739
In [8]:
complainttypesbyseason = totalcomplaints.groupby(by=['Year','Comptype']).count()
In [9]:
topfive = complainttypesbyseason['Date'].groupby(level=0,group_keys=False).apply(lambda x:x.sort_values(ascending=False).head(5))
topfive
Out[9]:
Year  Comptype              
2010  HEATING                   214218
      GENERAL CONSTRUCTION      127131
      Street Light Condition    116466
      Noise - Residential       115615
      PLUMBING                  111504
2011  HEATING                   190184
      GENERAL CONSTRUCTION      133242
      PLUMBING                  113218
      Noise - Residential       111447
      Street Condition          104694
2012  HEATING                   182974
      Noise - Residential       127943
      GENERAL CONSTRUCTION      112436
      Street Light Condition     93981
      PLUMBING                   87561
2013  HEATING                   202896
      Noise - Residential       151516
      GENERAL CONSTRUCTION      112169
      Street Light Condition     92502
      PLUMBING                   86564
2014  Noise - Residential       192608
      HEAT/HOT WATER            132767
      Street Condition          114545
      HEATING                    97597
      Street Light Condition     94665
2015  HEAT/HOT WATER            225706
      Noise - Residential       208026
      Street Condition          124378
      Blocked Driveway          100881
      Street Light Condition     98106
2016  HEAT/HOT WATER            126906
      Noise - Residential       104324
      Illegal Parking            55932
      Blocked Driveway           53955
      Street Condition           51977
dtype: int64

These data gives us a rough idea of how the number of complaints have changed over time, but one problem is that the labels that were used changed each year (presumably because they were replaced by new labels that better reflected what the complaints were). Secondly, for our purposes, some categories can be combined. For example, 'Street Condition' and 'Street Light Condition' may be important for administrators, but for now, we can be content with just "stuff that's gone wrong on the street".

In [10]:
topfive.index.levels[1]
Out[10]:
Index([u'Blocked Driveway', u'GENERAL CONSTRUCTION', u'HEAT/HOT WATER',
       u'HEATING', u'Illegal Parking', u'Noise - Residential', u'PLUMBING',
       u'Street Condition', u'Street Light Condition'],
      dtype='object', name=u'Comptype')

So, over the last 5 years, there have been 10 different categories that have made up the top five. Notice that some of these labels have changed over time. We'll have to standardize them.

In [11]:
years = totalcomplaints.Year.unique()

#first have a list of labels that are the same
oldlabels = totalcomplaints.loc[totalcomplaints['Year'] == years[0],['Comptype']]['Comptype'].unique()

newlabels = []

for x in years:
    yearslabels = totalcomplaints.loc[totalcomplaints['Year'] == x,['Comptype']]['Comptype'].unique()
    [newlabels.append(x) for x in yearslabels if x not in oldlabels and x not in newlabels]

From doing this, we get two lists: oldlabels and newlabels, that we will use to decide how to combine stuff.

In [12]:
#sorted(oldlabels)
In [13]:
#sorted(newlabels)

One modification I'm going to make is to combine 'Street Condition', 'Street Light Condition', 'Street Sign - Damaged', 'Street Sign - Dangling', and 'Street Sign - Missing' into 'Street Condition'.

For the current purposes, these are the same thing.

I'm also going to collapse 'GENERAL CONSTRUCTION', 'General Construction/Plumbing' and 'PLUMBING' into a new category called 'CONSTRUCTION/PLUMBING'.

I'll collapse noise related complaints ('Noise', 'Noise - Commercial', 'Noise - Helicopter', 'Noise - House of Worship', 'Noise - Park', 'Noise - Residential', 'Noise - Street/Sidewalk', 'Noise - Vehicle') into a label named 'NOISY'.

Finally, I'll combine 'HEAT/HOT WATER' and 'HEATING' into 'HEAT RELATED'.

In [14]:
streetconditioncomb = ['Street Condition', 'Street Light Condition', 'Street Sign - Damaged', 'Street Sign - Dangling', \
                       'Street Sign - Missing']

totalcomplaints['RevComptype'] = ['Street Condition' if x in streetconditioncomb else x for x in totalcomplaints['Comptype']]

genconstructcomb = ['GENERAL CONSTRUCTION', 'General Construction/Plumbing', 'PLUMBING']

totalcomplaints['RevComptype'] = ['CONSTRUCTION/PLUMBING' if x in genconstructcomb else x for x in totalcomplaints['RevComptype']]

noisecomb = ['Noise', 'Noise - Commercial', 'Noise - Helicopter', 'Noise - House of Worship', 'Noise - Park',\
             'Noise - Residential', 'Noise - Street/Sidewalk', 'Noise - Vehicle']

totalcomplaints['RevComptype'] = ['NOISY' if x in noisecomb else x for x in totalcomplaints['RevComptype']]

heatcomb = ['HEAT/HOT WATER', 'HEATING']

totalcomplaints['RevComptype'] = ['HEAT RELATED' if x in heatcomb else x for x in totalcomplaints['RevComptype']]
In [15]:
revisedcomplainttypes = totalcomplaints.groupby(by=['Year','RevComptype']).count()
In [16]:
topfiverev = revisedcomplainttypes['Date'].groupby(level=0, group_keys=False).apply(lambda x : x.sort_values(ascending=False).head(8))
topfiverev
Out[16]:
Year  RevComptype             
2010  CONSTRUCTION/PLUMBING       287253
      Street Condition            229584
      HEAT RELATED                214218
      NOISY                       201834
      PAINT - PLASTER              93194
      Water System                 70346
      NONCONST                     69659
      Traffic Signal Condition     53773
2011  CONSTRUCTION/PLUMBING       272234
      Street Condition            216865
      NOISY                       196453
      HEAT RELATED                190184
      PAINT - PLASTER             100704
      NONCONST                     68407
      Water System                 61577
      Blocked Driveway             52592
2012  CONSTRUCTION/PLUMBING       225338
      NOISY                       220392
      HEAT RELATED                182974
      Street Condition            171910
      PAINT - PLASTER              77287
      NONCONST                     60055
      Water System                 57678
      Blocked Driveway             50818
2013  NOISY                       259356
      CONSTRUCTION/PLUMBING       223314
      HEAT RELATED                202896
      Street Condition            182618
      PAINT - PLASTER              77957
      Blocked Driveway             57408
      NONCONST                     53831
      Water System                 53186
2014  NOISY                       337892
      HEAT RELATED                230364
      Street Condition            221097
      CONSTRUCTION/PLUMBING       106251
      Blocked Driveway             79170
      PAINT/PLASTER                63503
      Illegal Parking              63243
      UNSANITARY CONDITION         61789
2015  NOISY                       387227
      Street Condition            233839
      HEAT RELATED                225706
      Blocked Driveway            100881
      Illegal Parking              92679
      CONSTRUCTION/PLUMBING        83585
      UNSANITARY CONDITION         82888
      Water System                 71086
2016  NOISY                       197186
      HEAT RELATED                126906
      Street Condition            105346
      Illegal Parking              55932
      Blocked Driveway             53955
      CONSTRUCTION/PLUMBING        39460
      UNSANITARY CONDITION         35357
      Water System                 31299
dtype: int64

This gives us some different numbers. Taken together, although heat related complaints are still constantly in the top 5, what we observe over time is also a gradual decrease in construction related complaints, a constant, variable but large amount of street-related complaints, and a steady increase in noise-related complaints. Next, I'll post a version of this looking at only the heating season.

Hack for Heat #9: Total complaints over time, complaints per capita, etc.

In [1]:
%matplotlib inline

from matplotlib import pyplot as plt
from matplotlib import patches as mpatches
import pandas as pd
import numpy as np
import psycopg2
from datetime import datetime
from datetime import date


pd.options.display.max_columns = 40

connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()

Hack for Heat #9: Total complaints over time, complaints per capita, etc.

In this post, I'm going to be doing a couple of miscellaneous things to prepare for our first blog post of the summer. Last year's posts can be found here.

In [2]:
cursor.execute('''SELECT createddate, borough, complainttype FROM service;''')
borodata = cursor.fetchall()
In [3]:
borodata = pd.DataFrame(borodata)
In [4]:
borodata.columns = ['Date', 'Boro', 'Comptype']

Again, I removed entires earlier than 2011 because there are data quality issues (substantially fewer cases):

In [5]:
borodata = borodata.loc[borodata['Date'] > date(2011,3,1)]
In [6]:
borodata = borodata.loc [borodata['Date'] <date(2016,6,1)]

Remove cases where the borough was unspecified:

In [7]:
borodata = borodata.loc[borodata['Boro'] != 'Unspecified']

Total complaints by borough, year-over-year

We already have a chart of heat complaints by month; we need one for the total number of complaints received.

In [8]:
borodata['Year'] = [x.year for x in borodata['Date']]
borodata['Month'] = [x.month for x in borodata['Date']]
borodata['Day'] = [x.day for x in borodata['Date']]
In [9]:
plotdata = borodata.groupby(by=['Boro', 'Year', 'Month']).count()
plotdata.head()
Out[9]:
Date Comptype Day
Boro Year Month
BRONX 2011 3 26256 26256 26256
4 22898 22898 22898
5 22722 22722 22722
6 23677 23677 23677
7 27348 27348 27348

Now we generate a dictionary that we can use to plot these data:

In [10]:
plotdict = {x:[] for x in borodata['Boro'].unique()}
In [11]:
for boro in plotdict:
    plotdict[boro] = list(plotdata.xs(boro).Date)

Now we need to format our data for the plot. We need 5 "rows", each representing a borough, and n columns, each representing one month.

In [12]:
plotdata = np.zeros(len(plotdict['BROOKLYN']))
legendkey = []

for boro in plotdict.keys():
    plotdata = np.row_stack((plotdata, plotdict[boro]))
    legendkey.append(boro)
    
plotdata = np.delete(plotdata, (0), axis=0)
In [13]:
x = np.arange(len(plotdata[0]))

#crude xlabels
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
years = ['2011', '2012', '2013', '2014', '2015', '2016']
xlabels = []
for year in years:
    for month in months:
        xlabels.append("{0} {1}".format(month,year))

        
xlabels = xlabels[2:-7] #start from march 2011, end may2016
In [14]:
plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]

#rescaling rgb from 0-255 to 0 to 1
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

plt.figure(figsize = (15,10));
plt.stackplot(x,plotdata, colors = plotcolors);
plt.xticks(x,xlabels,rotation=90);
plt.xlim(0,len(xlabels))
plt.legend(legendcolors,legendkey, bbox_to_anchor=(0.2, 1));
plt.title('Heating Complaints by Borough', size = 24)
plt.ylabel('Number of Complaints',size = 14)

plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')
In [15]:
plotdatacsv = pd.DataFrame(plotdata)
plotdatacsv.index = legendkey
plotdatacsv.columns = xlabels
plotdatacsv.to_csv('totalcomplaintsbyboroughandmonth.csv')

Normalized total complaints

We'll also need a normalized version of the above, where everything is scaled to 100%, to see if complaints come disproportionately from one borough vs. another. There are probably better ways to do this, but I'm going to simply sum the number of complaints from each time point, and divide each column by the total for that column (i.e., number of complaints from that borough for that month divided by total number of complaints).

In [16]:
normplotdata = np.ndarray(shape=np.shape(plotdata))

It will thus be helpful to have an array of the total number of complaints:

In [17]:
totalcomplaintsbymonth = borodata.groupby(by=['Year','Month']).count().Date
In [18]:
totalcomplaintsbymonth = totalcomplaintsbymonth.values
In [19]:
totalcomplaintsbymonth = np.tile(totalcomplaintsbymonth,(5,1))
In [20]:
normplotdata = plotdata/totalcomplaintsbymonth
In [21]:
plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]

#rescaling rgb from 0-255 to 0 to 1
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

plt.figure(figsize = (15,10));
plt.stackplot(x,normplotdata, colors = plotcolors);
plt.xticks(x,xlabels,rotation=90);
plt.xlim(0,len(xlabels))
plt.ylim(0,1)
plt.legend(legendcolors,legendkey, bbox_to_anchor=(0.2, 1));
plt.title('Heating Complaints by Borough', size = 24)
plt.ylabel('Number of Complaints',size = 14)

plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')

Heat complaints per capita:

We'd also like to look at how many complaints per capita are generated. Given that we only have full years for 2012-2015, let's consider those.

In [22]:
boropop = {
    'MANHATTAN': 1636268,
    'BRONX': 1438159,
    'BROOKLYN': 2621793,
    'QUEENS': 2321580,
    'STATEN ISLAND': 473279,
    }
In [23]:
totalnycpop = reduce(lambda x, y: x +y, boropop.values())
totalnycpop
Out[23]:
8491079
In [24]:
complaintsbyyear = borodata.groupby(by=['Year']).count()
borocomplaintsbyyear = borodata.groupby(by=['Year','Boro']).count()
In [25]:
complaintsbyyear['Pop'] = [totalnycpop for x in complaintsbyyear.index]
In [26]:
borocomplaintsbyyear['Pop'] = [boropop.get(x[1]) for x in borocomplaintsbyyear.index]
In [27]:
complaintsbyyear['CompPerCap'] = complaintsbyyear['Day']/complaintsbyyear['Pop']
In [28]:
complaintsbyyear
Out[28]:
Date Boro Comptype Month Day Pop CompPerCap
Year
2011 1359368 1359368 1359368 1359368 1359368 8491079 0.160094
2012 1528597 1528597 1528597 1528597 1528597 8491079 0.180024
2013 1546188 1546188 1546188 1546188 1546188 8491079 0.182096
2014 1770357 1770357 1770357 1770357 1770357 8491079 0.208496
2015 1924917 1924917 1924917 1924917 1924917 8491079 0.226699
2016 891783 891783 891783 891783 891783 8491079 0.105026
In [29]:
borocomplaintsbyyear['CompPerCap'] = borocomplaintsbyyear['Day']/borocomplaintsbyyear['Pop']
In [30]:
borocomplaintsbyyear
Out[30]:
Date Comptype Month Day Pop CompPerCap
Year Boro
2011 BRONX 262597 262597 262597 262597 1438159 0.182592
BROOKLYN 434038 434038 434038 434038 2621793 0.165550
MANHATTAN 273282 273282 273282 273282 1636268 0.167015
QUEENS 315070 315070 315070 315070 2321580 0.135714
STATEN ISLAND 74381 74381 74381 74381 473279 0.157161
2012 BRONX 296502 296502 296502 296502 1438159 0.206168
BROOKLYN 482088 482088 482088 482088 2621793 0.183877
MANHATTAN 314437 314437 314437 314437 1636268 0.192167
QUEENS 358990 358990 358990 358990 2321580 0.154632
STATEN ISLAND 76580 76580 76580 76580 473279 0.161807
2013 BRONX 302328 302328 302328 302328 1438159 0.210219
BROOKLYN 490979 490979 490979 490979 2621793 0.187268
MANHATTAN 332487 332487 332487 332487 1636268 0.203198
QUEENS 343772 343772 343772 343772 2321580 0.148077
STATEN ISLAND 76622 76622 76622 76622 473279 0.161896
2014 BRONX 340297 340297 340297 340297 1438159 0.236620
BROOKLYN 557290 557290 557290 557290 2621793 0.212561
MANHATTAN 373895 373895 373895 373895 1636268 0.228505
QUEENS 408389 408389 408389 408389 2321580 0.175910
STATEN ISLAND 90486 90486 90486 90486 473279 0.191190
2015 BRONX 355017 355017 355017 355017 1438159 0.246855
BROOKLYN 602144 602144 602144 602144 2621793 0.229669
MANHATTAN 417223 417223 417223 417223 1636268 0.254985
QUEENS 455115 455115 455115 455115 2321580 0.196037
STATEN ISLAND 95418 95418 95418 95418 473279 0.201610
2016 BRONX 169773 169773 169773 169773 1438159 0.118049
BROOKLYN 271140 271140 271140 271140 2621793 0.103418
MANHATTAN 209840 209840 209840 209840 1636268 0.128243
QUEENS 200335 200335 200335 200335 2321580 0.086293
STATEN ISLAND 40695 40695 40695 40695 473279 0.085985

From these data, we can see two things: First, on average, about 1 in 5 people make a 311 complaint each year. Second, overall, this number are pretty consistent across the 5 boroughs.

Hack for Heat #8: Complaint resolution time revisted

In [1]:
%matplotlib inline

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2
from datetime import date

pd.options.display.max_columns = 40
pd.options.mode.chained_assignment = None  # default='warn'

Hack for Heat #8: Complaint resolution time revisited

In this post, I'm going to explore a simple way of dealing with the censoring issue in complaint resolution time: Basically, because we only care about heat complaints during the heat season, we can constrain our cases of interest to the heating season specifically (1st October to May 31st).

In [2]:
#pull from our database again:
connection = psycopg2.connect('dbname = threeoneone user= threeoneoneadmin password = threeoneoneadmin')
In [3]:
cursor = connection.cursor()
In [4]:
cursor.execute('''SELECT createddate, closeddate, borough, complainttype FROM service;''')
In [5]:
data = cursor.fetchall()
In [6]:
data = pd.DataFrame(data)
In [7]:
data.columns = ['createddate', 'closeddate', 'borough', 'complainttype']
In [8]:
data = data.loc[(data['complainttype'] == 'HEATING') | (data['complainttype'] == 'HEAT/HOT WATER') ]

So far, we've subsetted the data to only include the heating complaints. Now let's create a datetime mask. What we want is to have subsets of data for each of the heating seasons:

In [9]:
heatmonths = range(1, 6) + [10,11,12]
In [10]:
heatmonths
Out[10]:
[1, 2, 3, 4, 5, 10, 11, 12]
In [11]:
createdmask = data['createddate'].map(lambda x: (x.month in heatmonths))
closedmask = data['closeddate'].map(lambda x: (x.month in heatmonths) if x != None else False)
In [12]:
mask = createdmask & closedmask
In [13]:
heatseasondata = data.loc[mask]

How many heat/hot water complaints are created and closed inside vs. outside of heating season?

In [14]:
len(data.loc[mask]) #inside heating season
Out[14]:
1302688
In [15]:
len(data.loc[~mask])#outside heating season
Out[15]:
68750

The next thing we want to do is ignore cases where the complaint was resolved in the next heating season:

In [16]:
prevmonths = range(1, 6) 
nextmonths = [10,11,12]
In [17]:
heatseasondata['createdheatseason'] = [x.year if (x.month in prevmonths) else (x.year-1) for x in heatseasondata['createddate']]
In [18]:
heatseasondata.head()
Out[18]:
createddate closeddate borough complainttype createdheatseason
0 2014-11-18 2014-11-22 BROOKLYN HEAT/HOT WATER 2013
1 2014-11-18 2014-11-25 BROOKLYN HEAT/HOT WATER 2013
2 2014-11-18 2014-11-19 BRONX HEAT/HOT WATER 2013
3 2014-11-18 2014-11-19 MANHATTAN HEAT/HOT WATER 2013
4 2014-11-18 2014-11-20 BRONX HEAT/HOT WATER 2013
In [19]:
heatseasondata['closedheatseason'] = [x.year if (x.month in prevmonths) else (x.year-1) for x in heatseasondata['closeddate']]

Now that we've done this, we can select only the cases where the closed date was in the same season as the created date:

In [20]:
heatseasondata = heatseasondata.loc[heatseasondata['createdheatseason']  == heatseasondata['closedheatseason']]

Okay, now we can calculate some average resolution times:

In [21]:
heatseasondata['resolutiontime'] = heatseasondata['closeddate'] - heatseasondata['createddate']
In [22]:
heatseasondata['resolutiontimeint'] = heatseasondata.resolutiontime.astype('timedelta64[D]')
In [23]:
resolutiontimedata = heatseasondata.groupby(by='createdheatseason').mean()['resolutiontimeint']
In [24]:
resolutiontimedata.to_csv('resolutiontimebyyear.csv')
In [25]:
resolutiontimedata
Out[25]:
createdheatseason
2009    4.605845
2010    3.695498
2011    3.666409
2012    3.623974
2013    3.514244
2014    3.137578
2015    3.406145
2016    3.446075
Name: resolutiontimeint, dtype: float64

Resolution times by year:

In [26]:
x = resolutiontimedata.index.values
y = resolutiontimedata.values

plt.figure(figsize=(12,10));
plt.plot(x,y);

Resolution time by borough:

In [27]:
restimebyboro = heatseasondata.groupby(by=['borough', 'createdheatseason']).mean()['resolutiontimeint']
In [28]:
restimebyboro.to_csv('restimebyboro.csv')
In [29]:
restimebyboro = restimebyboro.loc[[x in range(2010,2017) for x in restimebyboro.index.get_level_values('createdheatseason')]]
In [31]:
boros = heatseasondata.borough.unique()
In [32]:
boroplots = {x:[] for x in boros}
for boro in boros:
    boroplots[boro] = restimebyboro.xs(boro).values
    
boroplots.pop('Unspecified')
Out[32]:
array([ 3.64268956,  3.89000015])
In [33]:
x = range(2010,2017)

plt.figure(figsize=(12,10));
for boro in boroplots:
    plt.plot(x,boroplots[boro]);
plt.legend(boroplots.keys());
plt.xticks(x, [str(label) for label in x]);

I was told beforehand that the average resolution time from a similar analysis last year was about 3-5 days, so this looks about right.

Hack for Heat #7: Heat complaints over time

In [1]:
%matplotlib inline

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2
from datetime import datetime
from datetime import date

pd.options.display.max_columns = 40

Hack for Heat #7: Heat complaints over time

In the last post, I plotted how the number of complaints differd by borough over time. This time around, I'm going to revisit this process, but this time focusing on heating complaints only (this is Heat Seek, after all).

Loading data:

In [2]:
connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()
In [3]:
cursor.execute('''SELECT DISTINCT complainttype FROM service;''')
complainttypes = cursor.fetchall()
In [4]:
cursor.execute('''SELECT createddate, borough, complainttype FROM service;''')
borodata = cursor.fetchall()
In [5]:
borodata = pd.DataFrame(borodata)
In [6]:
borodata.head()
Out[6]:
0 1 2
0 2011-04-15 MANHATTAN Street Condition
1 2011-04-15 MANHATTAN Street Condition
2 2011-04-15 MANHATTAN Street Condition
3 2011-04-15 MANHATTAN Street Condition
4 2011-04-14 QUEENS Street Condition
In [7]:
borodata.columns = ['Date', 'Boro', 'Comptype']
In [8]:
heatdata = borodata.loc[(borodata['Comptype'] == 'HEATING') | (borodata['Comptype'] == 'HEAT/HOT WATER')]

I removed entires earlier than 2011 because there are data quality issues (substantially fewer cases):

In [9]:
heatdata = heatdata.loc[heatdata['Date'] > date(2011,3,1)]

Heat complaints by borough:

In [10]:
len(heatdata)
Out[10]:
1092048

There were about a million heating complaints over the 3+ years we have data for.

In [11]:
heatbydate = heatdata.groupby(by='Boro').count()
In [12]:
heatbydate
Out[12]:
Date Comptype
Boro
BRONX 354335 354335
BROOKLYN 334574 334574
MANHATTAN 250296 250296
QUEENS 138477 138477
STATEN ISLAND 10601 10601
Unspecified 3765 3765

Per capita:

Again, let's look at how many heat complaints each boro generates per person:

In [13]:
boropop = {
    'MANHATTAN': 1636268,
    'BRONX': 1438159,
    'BROOKLYN': 2621793,
    'QUEENS': 2321580,
    'STATEN ISLAND': 473279,
    }
In [14]:
heatbydate['Pop'] = [boropop.get(x) for x in heatbydate.index]
In [15]:
heatbydate['CompPerCap'] = heatbydate['Date']/heatbydate['Pop']
In [16]:
heatbydate
Out[16]:
Date Comptype Pop CompPerCap
Boro
BRONX 354335 354335 1438159.0 0.246381
BROOKLYN 334574 334574 2621793.0 0.127613
MANHATTAN 250296 250296 1636268.0 0.152968
QUEENS 138477 138477 2321580.0 0.059648
STATEN ISLAND 10601 10601 473279.0 0.022399
Unspecified 3765 3765 NaN NaN

Complaints by borough over months

First, let's recreate the graph from before:

In [17]:
heatdata['Year'] = [x.year for x in heatdata['Date']]
heatdata['Month'] = [x.month for x in heatdata['Date']]
heatdata['Day'] = [x.day for x in heatdata['Date']]
In [18]:
heatdata.head()
Out[18]:
Date Boro Comptype Year Month Day
58 2015-10-17 BROOKLYN HEAT/HOT WATER 2015 10 17
728 2011-04-16 MANHATTAN HEATING 2011 4 16
1026 2011-04-16 BROOKLYN HEATING 2011 4 16
1418 2015-10-17 MANHATTAN HEAT/HOT WATER 2015 10 17
1624 2011-04-15 MANHATTAN HEATING 2011 4 15

We remove data where it was unspecified what the boro was

In [19]:
heatdata = heatdata.loc[heatdata['Boro'] != 'Unspecified']
In [20]:
heatplotdata = heatdata.groupby(by=['Boro', 'Year','Month']).count()
heatplotdata
Out[20]:
Date Comptype Day
Boro Year Month
BRONX 2011 3 5101 5101 5101
4 3339 3339 3339
5 1217 1217 1217
6 562 562 562
7 564 564 564
8 482 482 482
9 674 674 674
10 5954 5954 5954
11 7393 7393 7393
12 10536 10536 10536
2012 1 12655 12655 12655
2 7146 7146 7146
3 5049 5049 5049
4 3336 3336 3336
5 1214 1214 1214
6 672 672 672
7 688 688 688
8 711 711 711
9 675 675 675
10 3488 3488 3488
11 11853 11853 11853
12 9365 9365 9365
2013 1 13580 13580 13580
2 10037 10037 10037
3 6794 6794 6794
4 3706 3706 3706
5 1545 1545 1545
6 861 861 861
7 605 605 605
8 593 593 593
... ... ... ... ... ...
STATEN ISLAND 2013 12 261 261 261
2014 1 491 491 491
2 199 199 199
3 206 206 206
4 112 112 112
5 70 70 70
6 35 35 35
7 32 32 32
8 31 31 31
9 36 36 36
10 216 216 216
11 405 405 405
12 246 246 246
2015 1 402 402 402
2 523 523 523
3 216 216 216
4 91 91 91
5 58 58 58
6 31 31 31
7 24 24 24
8 31 31 31
9 31 31 31
10 261 261 261
11 250 250 250
12 230 230 230
2016 1 480 480 480
2 327 327 327
3 133 133 133
4 110 110 110
5 79 79 79

315 rows × 3 columns

In [21]:
boros = heatbydate.index
borodict = {x:[] for x in boros}
borodict.pop('Unspecified')

for boro in borodict:
    borodict[boro] = list(heatplotdata.xs(boro).Date)
In [22]:
plotdata = np.zeros(len(borodict['BROOKLYN']))
for boro in sorted(borodict.keys()):
    plotdata = np.row_stack((plotdata, borodict[boro]))
In [23]:
plotdata = np.delete(plotdata, (0), axis=0)
In [25]:
from matplotlib import patches as mpatches

x = np.arange(len(plotdata[0]))

#crude xlabels
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
years = ['2011', '2012', '2013', '2014', '2015', '2016']
xlabels = []
for year in years:
    for month in months:
        xlabels.append("{0} {1}".format(month,year))

xlabels = xlabels[2:-7] #start from march 2011, end may2016

A non-normalized plot:

The plot below is of raw complaint numbers, and is what we might expect: complaints about eat matter the most during the heating season!

In [26]:
plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]

#rescaling rgb from 0-255 to 0 to 1
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

plt.figure(figsize = (15,10));
plt.stackplot(x,plotdata, colors = plotcolors);
plt.xticks(x,xlabels,rotation=90);
plt.xlim(0,len(xlabels))
plt.legend(legendcolors,sorted(borodict.keys()), bbox_to_anchor=(0.2, 1));
plt.title('Heating Complaints by Borough', size = 24)
plt.ylabel('Number of Complaints',size = 14)

plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')

Normalizing data:

Next, we're going to normalize these data. This allows us to better visualize if the proportion of heating complaints changed by borough, over time (e.g., did one borough generate more/less complaints vs. others over time?).

What we want to do is divide each of the 5 rows by the total number of complaints by that column (the month)

In [27]:
totalcounts = heatdata.groupby(by=['Year', 'Month']).count().Date.values
In [40]:
boros = heatbydate.index
normdict = {x:[] for x in boros}
normdict.pop('Unspecified')

for boro in normdict:
    for i in range(0,len(plotdata[1])): # for all the values in each row
        normp = float(borodict[boro][i])/float(totalcounts[i])
        normdict[boro].append(normp*100)
In [41]:
normplotdata = np.zeros(len(borodict['BROOKLYN']))
for boro in sorted(normdict.keys()):
    normplotdata = np.row_stack((normplotdata, normdict[boro]))
In [42]:
normplotdata = np.delete(normplotdata,(0),axis = 0)
In [44]:
plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]

#rescaling rgb from 0-255 to 0 to 1
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

plt.figure(figsize = (15,10));
plt.stackplot(x,normplotdata, colors = plotcolors);
plt.xticks(x,xlabels,rotation=90);
plt.xlim(0,len(xlabels))
plt.ylim(0,100)
plt.legend(legendcolors,sorted(normdict.keys()), bbox_to_anchor=(0.2, 1));
plt.title('Heating Complaints by Borough (normalized)', size = 24)
plt.ylabel('% of Complaints',size = 14)

plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')

Overall, it looks like heating complaints were pretty consistent by borough. In the next post, I'm going to take a look at one way of addressing the afore-mentioned censoring problems.

Hack for Heat #5: Complaint resolution time

In [1]:
%matplotlib inline

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2

pd.options.display.max_columns = 40

Hack for Heat #5: How long do complaints take to resolve?

In this post, we're going to see if we can graph how long it takes for complaints to get resolved.

In [2]:
connection = psycopg2.connect('dbname= threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()

cursor.execute('''SELECT createddate, closeddate, borough FROM service;''')
data = cursor.fetchall()
data = pd.DataFrame(data)
In [3]:
data.columns = ['createddate','closeddate','borough']
In [4]:
data = data.loc[data['createddate'].notnull()]
data = data.loc[data['closeddate'].notnull()]
In [5]:
data['timedelta'] = data['closeddate'] - data['createddate']
In [6]:
data['timedeltaint'] = [x.days for x in data['timedelta']]
In [7]:
data.head()
Out[7]:
createddate closeddate borough timedelta timedeltaint
0 2013-04-26 2013-04-26 QUEENS 0 days 0
1 2013-04-26 2013-04-29 MANHATTAN 3 days 3
2 2013-04-26 2013-04-30 MANHATTAN 4 days 4
3 2013-04-26 2013-04-26 QUEENS 0 days 0
4 2013-04-26 2013-04-30 STATEN ISLAND 4 days 4
In [8]:
data.groupby(by='borough')['timedeltaint'].mean()
Out[8]:
borough
BRONX            -3.658691
BROOKLYN         -6.835417
MANHATTAN       -20.325506
QUEENS           -4.121806
STATEN ISLAND    -2.108396
Unspecified       4.699638
Name: timedeltaint, dtype: float64

Oops! Looks like something's wrong. Let's try and find out:

In [9]:
data.sort_values('timedeltaint').head()
Out[9]:
createddate closeddate borough timedelta timedeltaint
604564 2016-03-16 1900-01-01 Unspecified -42443 days -42443
596082 2016-03-16 1900-01-01 Unspecified -42443 days -42443
605654 2016-03-16 1900-01-01 QUEENS -42443 days -42443
606458 2016-03-15 1900-01-01 MANHATTAN -42442 days -42442
552013 2016-03-14 1900-01-01 Unspecified -42441 days -42441
In [10]:
data.sort_values('timedeltaint', ascending=False).head()
Out[10]:
createddate closeddate borough timedelta timedeltaint
9392808 2010-04-08 2201-05-13 BRONX 69796 days 69796
9525584 2010-06-01 2201-06-17 BROOKLYN 69777 days 69777
1581464 2011-01-20 2201-03-25 BRONX 69460 days 69460
1536326 2015-11-03 2100-01-01 QUEENS 30740 days 30740
2479091 2013-01-10 2023-05-01 BROOKLYN 3763 days 3763

Ah. Well, as a first step, let's remove any values that are before Jan 1st 2010 or after today:

In [11]:
import datetime

today = datetime.date(2016,5,29)
janone = datetime.date(2010,1,1)

Let's also remove any rows where the close date is before the created date:

In [12]:
subdata = data.loc[(data['closeddate'] > janone) & (data['closeddate'] < today)]
subdata = subdata.loc[data['closeddate'] > data['createddate']]
In [13]:
len(subdata)
Out[13]:
7979757
In [14]:
subdata.sort_values('timedeltaint').head()
Out[14]:
createddate closeddate borough timedelta timedeltaint
11371297 2015-08-28 2015-08-29 QUEENS 1 days 1
7950080 2015-06-20 2015-06-21 BROOKLYN 1 days 1
2366882 2012-11-19 2012-11-20 MANHATTAN 1 days 1
2366879 2015-06-09 2015-06-10 Unspecified 1 days 1
2366874 2012-04-17 2012-04-18 BROOKLYN 1 days 1
In [15]:
subdata.sort_values('timedeltaint',ascending = False).head()
Out[15]:
createddate closeddate borough timedelta timedeltaint
10774084 2010-01-06 2016-03-24 BROOKLYN 2269 days 2269
1506622 2010-02-19 2016-05-02 BROOKLYN 2264 days 2264
1508205 2010-03-02 2016-05-04 STATEN ISLAND 2255 days 2255
513580 2010-01-01 2016-02-17 STATEN ISLAND 2238 days 2238
589865 2010-01-22 2016-02-25 BRONX 2225 days 2225

This looks a little bit more realistic, but let's also visualize the distribution:

In [16]:
plotdata = list(subdata['timedeltaint'])
In [17]:
plt.figure(figsize=(12,10))
plt.hist(plotdata);

Okay, this still looks really wonky. Let's further subset the data, and see what happens when we remove the top and bottom 2.5%.

Pandas has a quantile function:

In [18]:
subdata.quantile([.025, .975])
Out[18]:
timedeltaint
0.025 1
0.975 138
In [19]:
quantcutdata = subdata.loc[(subdata['timedeltaint'] > 1) & (subdata['timedeltaint'] < 138) ]
In [20]:
len(quantcutdata)
Out[20]:
6278707
In [21]:
plotdata = list(quantcutdata['timedeltaint'])

plt.figure(figsize=(12,10))
plt.hist(plotdata);

That looks a little better, but there might be other ways to filter out bad data.

In [22]:
subdata.groupby(by='borough').median()
Out[22]:
timedeltaint
borough
BRONX 5
BROOKLYN 5
MANHATTAN 6
QUEENS 5
STATEN ISLAND 4
Unspecified 5
In [23]:
subdata.groupby(by='borough').mean()
Out[23]:
timedeltaint
borough
BRONX 18.789025
BROOKLYN 20.725656
MANHATTAN 20.841213
QUEENS 23.746771
STATEN ISLAND 26.283222
Unspecified 11.652647

So, I definitely wouldn't trust these data right now - I'm still working on finding a data dictionary for the 311 data, and I need to make sure the columns mean what I think they mean. The point of this is just to show that the process would be once I get my hands on reliable data.

Hack for Heat #4: Complaints by borough and date

In [2]:
%matplotlib inline

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2

pd.options.display.max_columns = 40

Hack for Heat #4: Number of complaints over time pt.2

In this post, we're going to look at the number of complaints each borough received for the last five or so years. First, let's look at the total number of complaints received:

In [3]:
connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()

Borough complaints by date

In [4]:
cursor.execute('''SELECT createddate, borough FROM service;''')
borodata = cursor.fetchall()
In [5]:
borodata = pd.DataFrame(borodata)
In [6]:
borodata.columns = ['Date', 'Boro']
In [7]:
borobydate = borodata.groupby(by='Boro').count()
In [8]:
borobydate
Out[8]:
Date
Boro
BRONX 1882215
BROOKLYN 3178206
MANHATTAN 2166983
QUEENS 2444874
STATEN ISLAND 541935
Unspecified 1157085

For fun, let's look at the number of complaints per capita in each of the 5 boroughs. The population values below are from Wikipedia.

In [9]:
boropop = {
    'MANHATTAN': 1636268,
    'BRONX': 1438159,
    'BROOKLYN': 2621793,
    'QUEENS': 2321580,
    'STATEN ISLAND': 473279,
    }
In [10]:
borobydate['Pop'] = [boropop.get(x) for x in borobydate.index]
In [11]:
borobydate['CompPerCap'] = borobydate['Date']/borobydate['Pop']
In [12]:
borobydate
Out[12]:
Date Pop CompPerCap
Boro
BRONX 1882215 1438159 1.308767
BROOKLYN 3178206 2621793 1.212226
MANHATTAN 2166983 1636268 1.324345
QUEENS 2444874 2321580 1.053108
STATEN ISLAND 541935 473279 1.145065
Unspecified 1157085 NaN NaN

Complaints by borough over months

The next thing we're going to do is make a stacked plot of complaints by borough, over months. To do this, we need to extract the day and month from the date column:

In [13]:
borodata['Year'] = [x.year for x in borodata['Date']]
borodata['Month'] = [x.month for x in borodata['Date']]
In [14]:
borodata = borodata.loc[borodata['Boro'] != 'Unspecified']

Next, we need to generate an array of Ys. We want the rows of this dataframe to be the 5 boroughs, and the columns to be the count of complaints for the year and month.

In [15]:
boroplotdata = borodata.groupby(by=['Boro', 'Year','Month']).count()
boroplotdata
Out[15]:
Date
Boro Year Month
BRONX 2010 1 9395
2 9475
3 13616
4 11786
5 12169
6 15681
7 16861
8 12920
9 11633
10 10590
11 9196
12 8815
2011 1 9599
2 11229
3 26772
4 22898
5 22722
6 23677
7 27348
8 25241
9 25138
10 30071
11 28319
12 30927
2012 1 33097
2 25259
3 24480
4 20330
5 21851
6 24170
... ... ... ...
STATEN ISLAND 2013 12 4984
2014 1 8753
2 8978
3 8211
4 7937
5 8278
6 8173
7 8111
8 6934
9 6597
10 7067
11 5558
12 5889
2015 1 6398
2 8228
3 10537
4 8489
5 8240
6 8803
7 8507
8 7684
9 7920
10 7592
11 6756
12 6264
2016 1 7186
2 7505
3 8531
4 8750
5 6688

385 rows × 1 columns

We basically need to get the above table into a graph.

In [16]:
boros = borobydate.index
borodict = {x:[] for x in boros}
borodict.pop('Unspecified')

for boro in borodict:
    borodict[boro] = list(boroplotdata.xs(boro).Date)
In [17]:
plotdata = np.zeros(len(borodict['BROOKLYN']))
for boro in borodict:
    plotdata = np.row_stack((plotdata, borodict[boro]))
In [18]:
plotdata = np.delete(plotdata, (0), axis=0)
In [19]:
plotdata
Out[19]:
array([[  5804.,   6791.,  11009.,   6578.,   6590.,   6370.,   5909.,
          5957.,   5926.,   5597.,   4823.,   4715.,   6008.,   7360.,
          8109.,   6836.,   7091.,   7257.,   6380.,   9858.,   8268.,
          8356.,   6778.,   5799.,   6169.,   5042.,   6018.,   5607.,
          6746.,   6839.,   7037.,   7024.,   6108.,   7036.,   7509.,
          5445.,   7207.,   5851.,   6352.,   6915.,   6843.,   7392.,
          7552.,   6442.,   5936.,   6152.,   4996.,   4984.,   8753.,
          8978.,   8211.,   7937.,   8278.,   8173.,   8111.,   6934.,
          6597.,   7067.,   5558.,   5889.,   6398.,   8228.,  10537.,
          8489.,   8240.,   8803.,   8507.,   7684.,   7920.,   7592.,
          6756.,   6264.,   7186.,   7505.,   8531.,   8750.,   6688.],
       [ 21753.,  22463.,  30847.,  25357.,  26545.,  27245.,  29419.,
         26279.,  26287.,  24483.,  22788.,  22361.,  22895.,  24331.,
         43807.,  41075.,  39214.,  41555.,  42120.,  46155.,  44927.,
         47186.,  44536.,  44451.,  46493.,  36073.,  37645.,  35357.,
         38451.,  38776.,  41169.,  39849.,  36523.,  41004.,  47840.,
         42908.,  50532.,  40369.,  40716.,  38720.,  40310.,  41070.,
         41466.,  37545.,  34560.,  40456.,  42451.,  42784.,  60679.,
         49657.,  45848.,  40934.,  46025.,  46460.,  46283.,  41112.,
         41180.,  45079.,  47862.,  46171.,  52516.,  55555.,  55122.,
         47002.,  47346.,  51835.,  51154.,  47652.,  47385.,  52645.,
         48386.,  45544.,  56649.,  52388.,  50952.,  52751.,  44888.],
       [  9395.,   9475.,  13616.,  11786.,  12169.,  15681.,  16861.,
         12920.,  11633.,  10590.,   9196.,   8815.,   9599.,  11229.,
         26772.,  22898.,  22722.,  23677.,  27348.,  25241.,  25138.,
         30071.,  28319.,  30927.,  33097.,  25259.,  24480.,  20330.,
         21851.,  24170.,  25939.,  22423.,  19910.,  23060.,  29396.,
         26587.,  33031.,  28243.,  24956.,  22083.,  20914.,  22673.,
         26734.,  20546.,  19060.,  23673.,  29511.,  30904.,  43650.,
         33274.,  29464.,  24904.,  24040.,  24024.,  26418.,  22771.,
         21910.,  25771.,  33015.,  31056.,  36197.,  37733.,  32791.,
         26419.,  24552.,  27103.,  29387.,  26814.,  25613.,  30593.,
         29559.,  28256.,  39647.,  36866.,  30192.,  29880.,  25408.],
       [ 15659.,  14831.,  18597.,  19642.,  20490.,  20316.,  21347.,
         19256.,  19772.,  19534.,  18897.,  14786.,  14499.,  17235.,
         27956.,  26018.,  26449.,  25271.,  25531.,  25630.,  27482.,
         31265.,  29656.,  28735.,  29867.,  24649.,  27054.,  24528.,
         25844.,  26173.,  25093.,  25044.,  23468.,  26678.,  28202.,
         27837.,  32110.,  27178.,  28061.,  26816.,  27207.,  25809.,
         26994.,  25026.,  24760.,  29790.,  30731.,  28005.,  37394.,
         29569.,  30384.,  29098.,  30921.,  30898.,  31221.,  28214.,
         28754.,  32569.,  33913.,  30960.,  34194.,  34262.,  33343.,
         33626.,  33555.,  33952.,  33896.,  34326.,  34295.,  40630.,
         37755.,  33389.,  40881.,  37560.,  40559.,  43912.,  37175.],
       [ 22951.,  22336.,  32976.,  27739.,  28528.,  28287.,  29154.,
         27223.,  34110.,  26295.,  22845.,  21670.,  22964.,  24969.,
         32932.,  29108.,  29876.,  31333.,  31231.,  38856.,  33368.,
         31570.,  29785.,  28166.,  29217.,  23758.,  26756.,  25594.,
         28361.,  30195.,  31792.,  31756.,  28628.,  34079.,  39404.,
         29450.,  33078.,  28121.,  27475.,  27384.,  29037.,  29633.,
         31044.,  27880.,  26220.,  28053.,  27561.,  28286.,  38144.,
         36074.,  35977.,  31202.,  34711.,  35843.,  36394.,  31615.,
         31129.,  32470.,  33042.,  31788.,  34332.,  38725.,  44965.,
         37836.,  36885.,  40637.,  39691.,  38212.,  36171.,  38864.,
         34327.,  34463.,  38982.,  38579.,  38548.,  40171.,  34063.]])

Awesome! Now we have 5 rows with 77 columns each denoting the complaints for each of the boros for each of the months from 2010 to 2016.

In [20]:
from matplotlib import patches as mpatches

x = np.arange(len(plotdata[0]))

#crude xlabels
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016']
xlabels = []
for year in years:
    for month in months:
        xlabels.append("{0} {1}".format(month,year))

plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]

#rescaling rgb from 0-255 to 0 to 1
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

plt.figure(figsize = (15,10));
plt.stackplot(x,plotdata, colors = plotcolors);
plt.xticks(x,xlabels,rotation=90);
plt.xlim(0,76)
plt.legend(legendcolors,borodict.keys(), bbox_to_anchor=(0.2, 1));
plt.title('311 Complaints by Borough', size = 24)
plt.ylabel('Number of Complaints',size = 14)

plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')
C:\Users\Bryan\Anaconda2\lib\site-packages\matplotlib\collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):