Hack for Heat #11: NYC's top complaints pt.2
In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import patches as mpatches
import pandas as pd
import numpy as np
import psycopg2
from datetime import datetime
from datetime import date
pd.options.display.max_columns = 40
Hack for Heat #11: NYC's top complaints pt. 2¶
In this post, I'm going to post a visualization of the top complaints reported in the previous post. I have this in a csv:
In [2]:
tempdata = pd.read_csv('topfiverev.csv', names = ['Year','Category','Count'])
In [3]:
tempdata.head()
Out[3]:
In [4]:
#Fixing some category names
tempdata['Category'] = ['PAINT/PLASTER' if (x == 'PAINT/PLASTER') | (x== 'PAINT - PLASTER') else x for x in tempdata['Category']]
tempdata['Category'] = ['Street Condition' if x == 'Traffic Signal Condition' else x for x in tempdata['Category'] ]
What we want is to create a data frame with as many rows as there are categories, as many columns as there are years, and the counts as the data points.
In [5]:
upcaseindex = [x.upper() for x in tempdata['Category'].unique()]
upcaseindex.remove('NONCONST') #remove this category because it refers to missing values
In [6]:
plotdata = pd.DataFrame(index = upcaseindex, columns = tempdata['Year'].unique())
In [7]:
plotdata
Out[7]:
Now let's populate it with values.
In [8]:
for row in tempdata['Category'].unique():
for year in tempdata['Year'].unique():
try:
plotdata.loc[row.upper(),year] = tempdata.loc[(tempdata['Year'] == year) & (tempdata['Category'] == row)]['Count'].values[0]
except:
plotdata.loc[row.upper(),year] = None
In [9]:
plotdata = plotdata.drop(['NONCONST'])
plotdata
Out[9]:
In [10]:
x = range(0,len(plotdata.columns))
xlabels = plotdata.columns
plt.figure(figsize=(16,10));
plt.xlim(-1,len(xlabels));
plt.ylim(0,500000);
plt.gca().spines['right'].set_visible(False)
plt.gca().spines['top'].set_visible(False)
plt.gca().yaxis.set_ticks_position('left')
plt.gca().xaxis.set_ticks_position('bottom')
plt.title('Complaint Category by Year', size = 24)
plt.ylabel('Number of Complaints',size = 14)
plotcolors = [(1,0,103),(213,255,0),(255,0,86),(158,0,142),(14,76,161),(255,229,2),(0,95,57),\
(0,255,0),(149,0,58),(255,147,126),(164,36,0),(0,21,68),(145,208,203),(98,14,0)]
plotcolors = [(color[0]/float(255),color[1]/float(255),color[2]/float(255)) for color in plotcolors]
legendcolors = [mpatches.Patch(color = color) for color in plotcolors]
count = 0
for row in plotdata.index:
y = plotdata.loc[row]
plt.plot(x, y, color = plotcolors[count], lw=4, marker = '.', markersize = 20);
count += 1
plt.legend(legendcolors, upcaseindex, bbox_to_anchor=(0.3, 1));
plt.xticks(x,xlabels);
So that's my graph. Do note that the reason these complaints dropped in 2016 is obviously because we're only midway through the year! I grabbed these data in early July, so that's when these numbers were last updated.