%matplotlib inline

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2

pd.options.display.max_columns = 40

Hack for Heat #4: Number of complaints over time pt.2¶

In this post, we're going to look at the number of complaints each borough received for the last five or so years. First, let's look at the total number of complaints received:

connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()

Borough complaints by date¶

cursor.execute('''SELECT createddate, borough FROM service;''')
borodata = cursor.fetchall()

borodata = pd.DataFrame(borodata)

borodata.columns = ['Date', 'Boro']

borobydate = borodata.groupby(by='Boro').count()

borobydate

For fun, let's look at the number of complaints per capita in each of the 5 boroughs. The population values below are from Wikipedia.

boropop = {
    'MANHATTAN': 1636268,
    'BRONX': 1438159,
    'BROOKLYN': 2621793,
    'QUEENS': 2321580,
    'STATEN ISLAND': 473279,
    }

borobydate['Pop'] = [boropop.get(x) for x in borobydate.index]

borobydate['CompPerCap'] = borobydate['Date']/borobydate['Pop']

borobydate

Complaints by borough over months¶

The next thing we're going to do is make a stacked plot of complaints by borough, over months. To do this, we need to extract the day and month from the date column:

borodata['Year'] = [x.year for x in borodata['Date']]
borodata['Month'] = [x.month for x in borodata['Date']]

borodata = borodata.loc[borodata['Boro'] != 'Unspecified']

Next, we need to generate an array of Ys. We want the rows of this dataframe to be the 5 boroughs, and the columns to be the count of complaints for the year and month.

boroplotdata = borodata.groupby(by=['Boro', 'Year','Month']).count()
boroplotdata

We basically need to get the above table into a graph.

boros = borobydate.index
borodict = {x:[] for x in boros}
borodict.pop('Unspecified')

for boro in borodict:
    borodict[boro] = list(boroplotdata.xs(boro).Date)

plotdata = np.zeros(len(borodict['BROOKLYN']))
for boro in borodict:
    plotdata = np.row_stack((plotdata, borodict[boro]))

plotdata = np.delete(plotdata, (0), axis=0)

plotdata

array([[  5804.,   6791.,  11009.,   6578.,   6590.,   6370.,   5909.,
          5957.,   5926.,   5597.,   4823.,   4715.,   6008.,   7360.,
          8109.,   6836.,   7091.,   7257.,   6380.,   9858.,   8268.,
          8356.,   6778.,   5799.,   6169.,   5042.,   6018.,   5607.,
          6746.,   6839.,   7037.,   7024.,   6108.,   7036.,   7509.,
          5445.,   7207.,   5851.,   6352.,   6915.,   6843.,   7392.,
          7552.,   6442.,   5936.,   6152.,   4996.,   4984.,   8753.,
          8978.,   8211.,   7937.,   8278.,   8173.,   8111.,   6934.,
          6597.,   7067.,   5558.,   5889.,   6398.,   8228.,  10537.,
          8489.,   8240.,   8803.,   8507.,   7684.,   7920.,   7592.,
          6756.,   6264.,   7186.,   7505.,   8531.,   8750.,   6688.],
       [ 21753.,  22463.,  30847.,  25357.,  26545.,  27245.,  29419.,
         26279.,  26287.,  24483.,  22788.,  22361.,  22895.,  24331.,
         43807.,  41075.,  39214.,  41555.,  42120.,  46155.,  44927.,
         47186.,  44536.,  44451.,  46493.,  36073.,  37645.,  35357.,
         38451.,  38776.,  41169.,  39849.,  36523.,  41004.,  47840.,
         42908.,  50532.,  40369.,  40716.,  38720.,  40310.,  41070.,
         41466.,  37545.,  34560.,  40456.,  42451.,  42784.,  60679.,
         49657.,  45848.,  40934.,  46025.,  46460.,  46283.,  41112.,
         41180.,  45079.,  47862.,  46171.,  52516.,  55555.,  55122.,
         47002.,  47346.,  51835.,  51154.,  47652.,  47385.,  52645.,
         48386.,  45544.,  56649.,  52388.,  50952.,  52751.,  44888.],
       [  9395.,   9475.,  13616.,  11786.,  12169.,  15681.,  16861.,
         12920.,  11633.,  10590.,   9196.,   8815.,   9599.,  11229.,
         26772.,  22898.,  22722.,  23677.,  27348.,  25241.,  25138.,
         30071.,  28319.,  30927.,  33097.,  25259.,  24480.,  20330.,
         21851.,  24170.,  25939.,  22423.,  19910.,  23060.,  29396.,
         26587.,  33031.,  28243.,  24956.,  22083.,  20914.,  22673.,
         26734.,  20546.,  19060.,  23673.,  29511.,  30904.,  43650.,
         33274.,  29464.,  24904.,  24040.,  24024.,  26418.,  22771.,
         21910.,  25771.,  33015.,  31056.,  36197.,  37733.,  32791.,
         26419.,  24552.,  27103.,  29387.,  26814.,  25613.,  30593.,
         29559.,  28256.,  39647.,  36866.,  30192.,  29880.,  25408.],
       [ 15659.,  14831.,  18597.,  19642.,  20490.,  20316.,  21347.,
         19256.,  19772.,  19534.,  18897.,  14786.,  14499.,  17235.,
         27956.,  26018.,  26449.,  25271.,  25531.,  25630.,  27482.,
         31265.,  29656.,  28735.,  29867.,  24649.,  27054.,  24528.,
         25844.,  26173.,  25093.,  25044.,  23468.,  26678.,  28202.,
         27837.,  32110.,  27178.,  28061.,  26816.,  27207.,  25809.,
         26994.,  25026.,  24760.,  29790.,  30731.,  28005.,  37394.,
         29569.,  30384.,  29098.,  30921.,  30898.,  31221.,  28214.,
         28754.,  32569.,  33913.,  30960.,  34194.,  34262.,  33343.,
         33626.,  33555.,  33952.,  33896.,  34326.,  34295.,  40630.,
         37755.,  33389.,  40881.,  37560.,  40559.,  43912.,  37175.],
       [ 22951.,  22336.,  32976.,  27739.,  28528.,  28287.,  29154.,
         27223.,  34110.,  26295.,  22845.,  21670.,  22964.,  24969.,
         32932.,  29108.,  29876.,  31333.,  31231.,  38856.,  33368.,
         31570.,  29785.,  28166.,  29217.,  23758.,  26756.,  25594.,
         28361.,  30195.,  31792.,  31756.,  28628.,  34079.,  39404.,
         29450.,  33078.,  28121.,  27475.,  27384.,  29037.,  29633.,
         31044.,  27880.,  26220.,  28053.,  27561.,  28286.,  38144.,
         36074.,  35977.,  31202.,  34711.,  35843.,  36394.,  31615.,
         31129.,  32470.,  33042.,  31788.,  34332.,  38725.,  44965.,
         37836.,  36885.,  40637.,  39691.,  38212.,  36171.,  38864.,
         34327.,  34463.,  38982.,  38579.,  38548.,  40171.,  34063.]])

Awesome! Now we have 5 rows with 77 columns each denoting the complaints for each of the boros for each of the months from 2010 to 2016.

from matplotlib import patches as mpatches

x = np.arange(len(plotdata[0]))

#crude xlabels
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
years = ['2010', '2011', '2012', '2013', '2014', '2015', '2016']
xlabels = []
for year in years:
    for month in months:
        xlabels.append("{0} {1}".format(month,year))

plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]

#rescaling rgb from 0-255 to 0 to 1
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

plt.figure(figsize = (15,10));
plt.stackplot(x,plotdata, colors = plotcolors);
plt.xticks(x,xlabels,rotation=90);
plt.xlim(0,76)
plt.legend(legendcolors,borodict.keys(), bbox_to_anchor=(0.2, 1));
plt.title('311 Complaints by Borough', size = 24)
plt.ylabel('Number of Complaints',size = 14)

plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')

C:\Users\Bryan\Anaconda2\lib\site-packages\matplotlib\collections.py:590: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  if self._edgecolors == str('face'):

Sanity checks¶

So, some parts of this graph bear checking. First, did Staten Island really not increase in complaints over the years? The data below (complaints by by borough by year) suggest that that was the case:

borodata.groupby(by = ['Boro', 'Year']).count()

The other thing that would be worth investigating is what happened between Februrary and March 2011 - complaints doubled during this time. Perhaps some aspect of the pipeline changed during this time, but that's a future story.

		Date	Month
Boro	Year
BRONX	2010	142137	142137
	2011	283941	283941
	2012	296502	296502
	2013	302328	302328
	2014	340297	340297
	2015	355017	355017
	2016	161993	161993
BROOKLYN	2010	305827	305827
	2011	482252	482252
	2012	482088	482088
	2013	490979	490979
	2014	557290	557290
	2015	602142	602142
	2016	257628	257628
MANHATTAN	2010	223127	223127
	2011	305727	305727
	2012	314437	314437
	2013	332487	332487
	2014	373895	373895
	2015	417223	417223
	2016	200087	200087
QUEENS	2010	324114	324114
	2011	364158	364158
	2012	358990	358990
	2013	343772	343772
	2014	408389	408389
	2015	455108	455108
	2016	190343	190343
STATEN ISLAND	2010	76069	76069
	2011	88100	88100
	2012	76580	76580
	2013	76622	76622
	2014	90486	90486
	2015	95418	95418
	2016	38660	38660

	Date
Boro
BRONX	1882215
BROOKLYN	3178206
MANHATTAN	2166983
QUEENS	2444874
STATEN ISLAND	541935
Unspecified	1157085

			Date
Boro	Year	Month
BRONX	2010	1	9395
		2	9475
		3	13616
		4	11786
		5	12169
		6	15681
		7	16861
		8	12920
		9	11633
		10	10590
		11	9196
		12	8815
	2011	1	9599
		2	11229
		3	26772
		4	22898
		5	22722
		6	23677
		7	27348
		8	25241
		9	25138
		10	30071
		11	28319
		12	30927
	2012	1	33097
		2	25259
		3	24480
		4	20330
		5	21851
		6	24170
...	...	...	...
STATEN ISLAND	2013	12	4984
	2014	1	8753
		2	8978
		3	8211
		4	7937
		5	8278
		6	8173
		7	8111