Hacking for heat¶

In this series, I'm going to be posting about the process that goes on behind some of the blog posts we end up writing. In this first entry, I'm going to be exploring a number of datsets.

These are the ones that I'm going to be looking at:

(for HPD datasets, some documentation can be found here)

HPD litigation database¶

First, we're going to look at the smallest dataset, one that contains cases against landlords. From the documentation, this file contains "All cases commenced by HPD or by tennants (naming HPD as a party) in [housing court] since August 2006" either seeking orders for landlords to comply with regulations, or awarding HPD civil penalties (i.e., collecting on fines).

import pandas as pd

litigation = pd.read_csv("Housing_Litigations.csv")

litigation.head()

Let's take a look at unique values for some of the columns:

litigation['Boro'].unique()

array(['BRONX', 'QUEENS', 'MANHATTAN', 'BROOKLYN', 'STATEN ISLAND'], dtype=object)

litigation.groupby(by = ['Boro','CaseJudgement']).count()

The above table tells us that Manhattan has the lowest proportion of cases that receive judgement (about 1 in 80), whereas Staten Island has the highest (about 1 in 12). It may be something worth looking into, but it's also important to note that many cases settle out of court, and landlords in Manhattan may be more willing (or able) to do so.

litigation['CaseType'].unique()

array(['Tenant Action', 'Heat and Hot Water', 'CONH',
       'Comp Supplemental Cases', 'Comprehensive',
       'Access Warrant - Non-Lead', 'Access Warrant - lead',
       'Lead False Certification', 'False Certification Non-Lead',
       'Heat Supplemental Cases', '7A', 'Failure to Register Only',
       'HLD - Other Case Type'], dtype=object)

litigation.groupby(by = ['CaseType', 'CaseJudgement']).count()

The table above shows the same case judgement proportions, but conditioned on what type of case it was. Unhelpfully, the documentation does not specify what the difference between Access Warrant - Lead and Non-Lead is. It could be one of two possibilities: The first is whether the warrants have to do with lead-based paint, which is a common problem, but perhaps still too idiosyncratic to have it's own warrant type. The second, perhaps more likely possibility is whether or not HPD was the lead party in the case.

We'll probably end up using these data by aggregating it and examining how complaints change over time, perhaps as a function of what type they are. There's also the possibility of looking up specific buildings' complaints and tying them to landlords. There's probably also an easy way to join this dataset with another, by converting the address information into something standardized, like borough-block-lot (BBL; http://www1.nyc.gov/nyc-resources/service/1232/borough-block-lot-bbl-lookup)

HPD complaints¶

Next, we're going to look at a dataset of HPD complaints.

hpdcomp = pd.read_csv('Housing_Maintenance_Code_Complaints.csv')

hpdcomp.head()

len(hpdcomp)

662672

This dataset is less useful on its own. It doesn't tell us what the type of complaint was, only the date it was received and whether or not the complaint is still open. However, it may be useful in conjunction with the earlier dataset. For example, we might be interested in how many of these complaints end up in court (or at least, have some sort of legal action taken).

HPD violations¶

The following dataset tracks HPD violations.

hpdviol = pd.read_csv('Housing_Maintenance_Code_Violations.csv')

hpdviol.head()

len(hpdviol)

1437246

These datasets all have different lengths, but that's not surprising, given they come from different years. One productive initial step would be to convert the date strings into something numerical.

HPD complaint problems database¶

hpdcompprob = pd.read_csv('Complaint_Problems.csv')

hpdcompprob.head()

Awesome! This dataset provides some more details about the complaints, and lets us join by ComplaintID.

Summary and next steps¶

In the immediate future, I'm going to be writing script to join and clean this dataset. This can either be done either in python, or by writing some SQL. I haven't decided yet. Additionally, I'm going to be writing some code to do things like convert date strings, and perhaps scrape text.

	LitigationID	BuildingID	BoroID	Boro	HouseNumber	StreetName	Zip	Block	Lot	CaseType	CaseOpenDate	CaseStatus	CaseJudgement
0	98385	806633	2	BRONX	866	EAST 221 STREET	10467.0	4679	73	Tenant Action	04/23/2009	CLOSED	NO
1	98387	55134	2	BRONX	3940	CARPENTER AVENUE	10466.0	4825	51	Tenant Action	04/23/2009	CLOSED	NO
2	98389	119044	2	BRONX	1356	WALTON AVENUE	10452.0	2841	41	Tenant Action	04/23/2009	CLOSED	NO
3	98391	889796	2	BRONX	871	EAST 179 STREET	10460.0	3123	77	Tenant Action	04/24/2009	CLOSED	NO
4	98393	56949	2	BRONX	1680	CLAY AVENUE	10457.0	2889	1	Tenant Action	04/24/2009	CLOSED	NO

		LitigationID	BuildingID	BoroID	HouseNumber	StreetName	Zip	Block	Lot	CaseType	CaseOpenDate	CaseStatus
Boro	CaseJudgement
BRONX	NO	19734	19734	19734	19734	19734	19734	19734	19734	19734	19701	19734
BRONX	YES	481	481	481	481	481	481	481	481	481	478	481
BROOKLYN	NO	19833	19833	19833	19833	19833	19831	19833	19833	19833	19807	19833
BROOKLYN	YES	762	762	762	762	762	760	762	762	762	762	762
MANHATTAN	NO	11849	11849	11849	11849	11849	11849	11849	11849	11849	11841	11849
MANHATTAN	YES	129	129	129	129	129	129	129	129	129	129	129
QUEENS	NO	9068	9068	9068	9068	9068	9061	9068	9068	9068	9063	9068
QUEENS	YES	313	313	313	313	313	313	313	313	313	308	313
STATEN ISLAND	NO	1064	1064	1064	1064	1064	1064	1064	1064	1064	1064	1064
STATEN ISLAND	YES	84	84	84	84	84	84	84	84	84	84	84

		LitigationID	BuildingID	BoroID	Boro	HouseNumber	StreetName	Zip	Block	Lot	CaseOpenDate	CaseStatus
CaseType	CaseJudgement
7A	NO	105	105	105	105	105	105	105	105	105	102	105
Access Warrant - Non-Lead	NO	5665	5665	5665	5665	5665	5665	5665	5665	5665	5665	5665
Access Warrant - Non-Lead	YES	8	8	8	8	8	8	8	8	8	8	8
Access Warrant - lead	NO	1786	1786	1786	1786	1786	1786	1786	1786	1786	1785	1786
CONH	NO	797	797	797	797	797	797	797	797	797	797	797
Comp Supplemental Cases	NO	750	750	750	750	750	750	750	750	750	702	750
Comp Supplemental Cases	YES	31	31	31	31	31	31	31	31	31	29	31
Comprehensive	NO	2274	2274	2274	2274	2274	2274	2274	2274	2274	2274	2274
Comprehensive	YES	121	121	121	121	121	121	121	121	121	121	121
Failure to Register Only	NO	60	60	60	60	60	60	60	60	60	60	60
Failure to Register Only	YES	3	3	3	3	3	3	3	3	3	3	3
False Certification Non-Lead	NO	2355	2355	2355	2355	2355	2355	2355	2355	2355	2355	2355
False Certification Non-Lead	YES	20	20	20	20	20	20	20	20	20	20	20
HLD - Other Case Type	NO	3	3	3	3	3	3	3	3	3	2	3
Heat Supplemental Cases	NO	124	124	124	124	124	124	124	124	124	108	124
Heat Supplemental Cases	YES	7	7	7	7	7	7	6	7	7	6	7
Heat and Hot Water	NO	15760	15760	15760	15760	15760	15760	15758	15760	15760	15757	15760
Heat and Hot Water	YES	1299	1299	1299	1299	1299	1299	1298	1299	1299	1294	1299
Lead False Certification	NO	239	239	239	239	239	239	239	239	239	239	239
Lead False Certification	YES	2	2	2	2	2	2	2	2	2	2	2
Tenant Action	NO	31630	31630	31630	31630	31630	31630	31623	31630	31630	31630	31630
Tenant Action	YES	278	278	278	278	278	278	278	278	278	278	278

	ComplaintID	BuildingID	BoroughID	Borough	HouseNumber	StreetName	Zip	Block	Lot	Apartment	CommunityBoard	ReceivedDate	StatusID	Status	StatusDate
0	6960137	3418	1	MANHATTAN	1989	ADAM C POWELL BOULEVARD	10026.0	1904	4	12D	10	07/07/2014	2	CLOSE	07/29/2014
1	6960832	3512	1	MANHATTAN	2267	ADAM C POWELL BOULEVARD	10030.0	1918	4	3B	10	07/08/2014	2	CLOSE	07/12/2014
2	6946867	5318	1	MANHATTAN	778	11 AVENUE	10019.0	1083	1	4P	4	06/19/2014	2	CLOSE	07/13/2014
3	6966946	5608	1	MANHATTAN	1640	AMSTERDAM AVENUE	10031.0	2073	29	5A	9	07/16/2014	2	CLOSE	07/21/2014
4	6956574	17896	1	MANHATTAN	230	EAST 88 STREET	10128.0	1533	32	1E	8	07/01/2014	2	CLOSE	07/09/2014

	ViolationID	BuildingID	RegistrationID	BoroID	Boro	HouseNumber	LowHouseNumber	HighHouseNumber	StreetName	StreetCode	...	NewCertifyByDate	NewCorrectByDate	CertifiedDate	OrderNumber	NOVID	NOVDescription	NOVIssuedDate	CurrentStatusID	CurrentStatus	CurrentStatusDate
0	2293208	444	130476	1	MANHATTAN	22	22	22	1 AVENUE	10010	...	NaN	NaN	NaN	772	338596.0	Â§ 27-2098 ADM CODE FILE WITH THIS DEPARTMENT ...	04/22/1997	19	VIOLATION CLOSED	03/10/2015
1	2293181	444	130476	1	MANHATTAN	22	22	22	1 AVENUE	10010	...	NaN	NaN	09/10/1974	775	338584.0	D26-41.05 ADM CODE FILE WITH THIS DEPARTMENT R...	07/16/1974	19	VIOLATION CLOSED	03/10/2015
2	2293249	448	135326	1	MANHATTAN	2222	2222	2222	1 AVENUE	10010	...	NaN	NaN	NaN	772	338619.0	Â§ 27-2098 ADM CODE FILE WITH THIS DEPARTMENT ...	11/19/1996	19	VIOLATION CLOSED	03/10/2015
3	2293486	467	136913	1	MANHATTAN	2250	2250	2250	1 AVENUE	10010	...	NaN	NaN	NaN	772	338733.0	D26-41.03 ADM CODE FILE WITH THIS DEPARTMENT A...	05/10/1982	19	VIOLATION CLOSED	03/10/2015
4	2293490	467	136913	1	MANHATTAN	2250	2250	2250	1 AVENUE	10010	...	NaN	NaN	NaN	772	338737.0	Â§ 27-2098 ADM CODE FILE WITH THIS DEPARTMENT ...	11/19/1996	19	VIOLATION CLOSED	03/10/2015

	ProblemID	ComplaintID	UnitTypeID	UnitType	SpaceTypeID	SpaceType	TypeID	Type	MajorCategoryID	MajorCategory	MinorCategoryID	MinorCategory	CodeID	Code	StatusID	Status	StatusDate	StatusDescription
0	14548958	6967900	91	APARTMENT	541	BATHROOM	1	EMERGENCY	9	PLUMBING	63	BATHTUB/SHOWER	2538	BROKEN OR MISSING	2	CLOSE	07/29/2014	The Department of Housing Preservation and De...
1	14548959	6967900	91	APARTMENT	541	BATHROOM	3	NON EMERGENCY	9	PLUMBING	63	BATHTUB/SHOWER	2540	FAUCET BROKEN/MISSING/LEAKING	2	CLOSE	08/04/2014	The Department of Housing Preservation and De...
2	14548960	6967900	91	APARTMENT	543	ENTIRE APARTMENT	3	NON EMERGENCY	58	FLOORING/STAIRS	343	FLOOR	2691	TILE BROKEN OR MISSING	2	CLOSE	08/04/2014	The Department of Housing Preservation and De...
3	14548961	6967900	91	APARTMENT	541	BATHROOM	3	NON EMERGENCY	9	PLUMBING	63	BATHTUB/SHOWER	2541	CHIPPED OR RUSTED	2	CLOSE	08/04/2014	The Department of Housing Preservation and De...
4	14615271	6994958	20	APARTMENT	68	ENTIRE APARTMENT	1	EMERGENCY	59	HEAT/HOT WATER	349	ENTIRE BUILDING	2717	NO HOT WATER	2	CLOSE	08/22/2014	The Department of Housing Preservation and De...