Hack for Heat #11: NYC's top complaints pt.2

In [1]:
%matplotlib inline

from matplotlib import pyplot as plt
from matplotlib import patches as mpatches
import pandas as pd
import numpy as np
import psycopg2
from datetime import datetime
from datetime import date


pd.options.display.max_columns = 40

Hack for Heat #11: NYC's top complaints pt. 2

In this post, I'm going to post a visualization of the top complaints reported in the previous post. I have this in a csv:

In [2]:
tempdata = pd.read_csv('topfiverev.csv', names = ['Year','Category','Count'])
In [3]:
tempdata.head()
Out[3]:
Year Category Count
0 2010 CONSTRUCTION/PLUMBING 287253
1 2010 Street Condition 229584
2 2010 HEAT RELATED 214218
3 2010 NOISY 201834
4 2010 PAINT - PLASTER 93194
In [4]:
#Fixing some category names
tempdata['Category'] = ['PAINT/PLASTER' if (x == 'PAINT/PLASTER') | (x== 'PAINT - PLASTER') else x for x in tempdata['Category']]
tempdata['Category'] = ['Street Condition' if x == 'Traffic Signal Condition' else x for x in tempdata['Category'] ]

What we want is to create a data frame with as many rows as there are categories, as many columns as there are years, and the counts as the data points.

In [5]:
upcaseindex = [x.upper() for x in tempdata['Category'].unique()]
upcaseindex.remove('NONCONST') #remove this category because it refers to missing values
In [6]:
plotdata = pd.DataFrame(index = upcaseindex, columns = tempdata['Year'].unique())
In [7]:
plotdata
Out[7]:
2010 2011 2012 2013 2014 2015 2016
CONSTRUCTION/PLUMBING NaN NaN NaN NaN NaN NaN NaN
STREET CONDITION NaN NaN NaN NaN NaN NaN NaN
HEAT RELATED NaN NaN NaN NaN NaN NaN NaN
NOISY NaN NaN NaN NaN NaN NaN NaN
PAINT/PLASTER NaN NaN NaN NaN NaN NaN NaN
WATER SYSTEM NaN NaN NaN NaN NaN NaN NaN
BLOCKED DRIVEWAY NaN NaN NaN NaN NaN NaN NaN
ILLEGAL PARKING NaN NaN NaN NaN NaN NaN NaN
UNSANITARY CONDITION NaN NaN NaN NaN NaN NaN NaN

Now let's populate it with values.

In [8]:
for row in tempdata['Category'].unique():
    for year in  tempdata['Year'].unique():
        try:
            plotdata.loc[row.upper(),year] = tempdata.loc[(tempdata['Year'] == year) & (tempdata['Category'] == row)]['Count'].values[0]
        except:
            plotdata.loc[row.upper(),year] = None
In [9]:
plotdata = plotdata.drop(['NONCONST'])
plotdata
Out[9]:
2010 2011 2012 2013 2014 2015 2016
CONSTRUCTION/PLUMBING 287253 272234 225338 223314 106251 83585 39460
STREET CONDITION 229584 216865 171910 182618 221097 233839 105346
HEAT RELATED 214218 190184 182974 202896 230364 225706 126906
NOISY 201834 196453 220392 259356 337892 387227 197186
PAINT/PLASTER 93194 100704 77287 77957 63503 None None
WATER SYSTEM 70346 61577 57678 53186 None 71086 31299
BLOCKED DRIVEWAY None 52592 50818 57408 79170 100881 53955
ILLEGAL PARKING None None None None 63243 92679 55932
UNSANITARY CONDITION None None None None 61789 82888 35357
In [10]:
x = range(0,len(plotdata.columns))
xlabels = plotdata.columns

plt.figure(figsize=(16,10));
plt.xlim(-1,len(xlabels));
plt.ylim(0,500000);
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')
plt.title('Complaint Category by Year', size = 24)
plt.ylabel('Number of Complaints',size = 14)

plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
            (0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]

count = 0
for row in plotdata.index:
    y = plotdata.loc[row]
    plt.plot(x, y, color = plotcolors[count], lw=4, marker = '.', markersize = 20);
    count += 1

plt.legend(legendcolors, upcaseindex, bbox_to_anchor=(0.3, 1));
plt.xticks(x,xlabels);

So that's my graph. Do note that the reason these complaints dropped in 2016 is obviously because we're only midway through the year! I grabbed these data in early July, so that's when these numbers were last updated.