Hack for Heat #3: Number of complaints over time
%matplotlib inline
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import psycopg2
pd.options.display.max_columns = 40
Hack for Heat #3: Number of complaints over time¶
This time, we're going to look at raw 311 complaint data. The data that I was working with previously was summarized data.
This dataset is much bigger, which is nice because it'll give me a chance to maintain my SQL-querying-from-memory-skills.
First, we're going to have to load all of this data into a postgres database. I wrote this tablebase.
connection = psycopg2.connect('dbname = threeoneone user=threeoneoneadmin password=threeoneoneadmin')
cursor = connection.cursor()
For example, we might want to extract the column names from our table:
cursor.execute('''SELECT * FROM threeoneone.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'service'; ''')
columns = cursor.fetchall()
columns = [x[3] for x in columns]
columns[0:5]
Complaints over time¶
Let's start with something simple. First, let's extract a list of all complaints, and the plot the number of complaints by month.
cursor.execute('''SELECT createddate FROM service;''')
complaintdates = cursor.fetchall()
complaintdates = pd.DataFrame(complaintdates)
complaintdates.head()
Renaming our column:
complaintdates.columns = ['Date']
Next we have to convert these tuples into strings:
complaintdates['Date'] = [x [0] for x in complaintdates['Date']]
Normally, if these were strings, we'd use the extract_dates function we wrote in a previous post. However, because I typed these as datetime objects, we can just extract the .year(), .month(), and .day() attributes:
type(complaintdates['Date'][0])
complaintdates['Day'] = [x.day for x in complaintdates['Date']]
complaintdates['Month'] = [x.month for x in complaintdates['Date']]
complaintdates['Year'] = [x.year for x in complaintdates['Date']]
This is how many total complaints we have:
len(complaintdates)
We can group them by month:
bymonth = complaintdates.groupby(by='Month').count()
bymonth
By year:
byyear = complaintdates.groupby(by='Year').count()
byyear
byday = complaintdates.groupby(by='Day').count()
bydate = complaintdates.groupby(by='Date').count()
Some matplotlib¶
plt.figure(figsize = (12,10))
x = range(0,12)
y = bymonth['Date']
plt.plot(x,y)
plt.figure(figsize = (12,10))
x = range(0,7)
y = byyear['Date']
plt.plot(x,y)
plt.figure(figsize = (12,10))
x = range(0,len(byday))
y = byday['Date']
plt.plot(x,y)
The sharp decline we see at the end is obviously because not all months have the same number of days.
plt.figure(figsize=(12,10))
x = range(0,len(bydate))
y = bydate['Year'] #This is arbitrary - year, month, and day are all series that store the counts
plt.plot(x,y)
That's all for now. In the next post, I'm going to break this down by borough, as well as polish this graph.