Data Munging:
As with most open data sets, there's a lot to explore. With over 18 million rows and 41 features in just the past year, there were plenty of decisions to make. A subset of 100k rows was used to develop the cleaning pipeline before deploying it on the larger data set. The first thing to figure out was:
How to get this:
To something more useful like this:
We get there with lots of coding! There were plenty of redundant features as well as null and garbage inputs (for the full cleaning analysis, see the MVP notebook on GitHub):
An Example:
print("Number of Community Boards: {}".format(api_df['community_board'].nunique()))
Number of Community Boards: 74
There are only 59 community boards in NYC!
api_df_cleaned = api_df[api_df['community_board'].isin(community_board_list)]
Number of Community Boards: 59
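That's one column of many. As a rough sketch of the broader cleanup, here's the kind of step applied across the set (the column names below are illustrative picks from the 311 schema, not the full list):
import pandas as pd
# Drop features that duplicate other columns (e.g. 'location' repeats latitude/longitude)
redundant_cols = ['location', 'landmark', 'intersection_street_1', 'intersection_street_2']
api_df_cleaned = api_df_cleaned.drop(columns=redundant_cols)
# Drop rows with nulls in fields the model can't do without
api_df_cleaned = api_df_cleaned.dropna(subset=['created_date', 'complaint_type', 'agency'])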
Regression Prediction:
Once data cleaning was completed, it was time to find a model to predict the estimated time to close a complaint.
Random Forest & Gradient Boost models were picked because of their robustness to overfitting. A Neural Net was considered but was decided against because the ability to rate feature importance was considered a good education factor. Random Forest and Gradient Boost provided this.
from sklearn.ensemble import RandomForestRegressor
# Fit a Random Forest and score it on the held-out test set
random_forest = RandomForestRegressor()
random_forest.fit(X_train, y_train)
print('R2 score {}'.format(random_forest.score(X_test, y_test)))
R2 score 0.7741774171081134
Feature importance is helpful because it allows us to make recommendations on specific items that may have a causal effect on outcomes. This has to be approached cautiously, though. Note that NYPD is rated with high feature importance. This makes sense considering that NYPD handles the largest share of complaints, and by their nature NYPD complaints also tend to be closed faster than, say, an inquiry about taxes made to the Department of Finance.
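For reference, here's a minimal sketch of pulling those importances out of the fitted model (assuming X_train is a pandas DataFrame, so its columns carry the feature names):
import pandas as pd
# Pair each feature with its importance score and rank them
importances = pd.Series(random_forest.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))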
from sklearn.ensemble import GradientBoostingRegressor
# Fit a Gradient Boosting model the same way for comparison
gradient_boost = GradientBoostingRegressor()
gradient_boost.fit(X_train, y_train)
print('R2 score {}'.format(gradient_boost.score(X_test, y_test)))
R2 score 0.7141774171081134

We can also compare feature importance by model. Note that many of the same features show up in both.
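A minimal sketch of that comparison, assuming both models above were fit on the same X_train:
import pandas as pd
# Line up both models' importance scores feature by feature
comparison = pd.DataFrame({
    'random_forest': random_forest.feature_importances_,
    'gradient_boost': gradient_boost.feature_importances_,
}, index=X_train.columns)
print(comparison.sort_values('random_forest', ascending=False).head(10))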
Data Dashboarding:
There's a problem with trying to visualize large open data sets...
It’s hard to do!
Visualization on NYC’s Open Data is erratic at best. During the course of this project I never once got it to work. This led to some contemplation on what might be a better way to display information of interest to the average data seeker.
With this in mind, I set out to create example dashboards that update daily, using SodaPy to query Open Data’s own API. This kept local storage to a minimum, freeing up resources to create impactful maps that visualize interesting data.
These dashboards can be built with SoQL queries against the existing API:
from datetime import datetime, timedelta
# Pull the past week of homeless-related complaints via the Socrata API
# (client is a sodapy Socrata client; database_311 is the 311 dataset ID)
time = datetime.utcnow() - timedelta(days=7)
time_string = time.strftime('%Y-%m-%dT00:00:00.000')
query = "created_date > '{}' AND complaint_type LIKE '%Homeless%'".format(time_string)
results = client.get(database_311, select=select_sql, where=query, limit=100000)
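From there, the results drop straight into pandas for mapping. A sketch, assuming the select clause pulls the location columns:
import pandas as pd
# client.get returns a list of dicts, one per record
results_df = pd.DataFrame.from_records(results)
print(results_df[['created_date', 'complaint_type', 'latitude', 'longitude']].head())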
Findings & Conclusion:
Regression prediction is hard on a data set as large as 311’s. Auto-closes and non-closes are particularly problematic, as the sheer number of them can mask the real signal. There are plenty of issues with the data: while there does appear to be some standardization of complaint types and descriptions, there is also a significantly large number of one-off complaint types.
311 seems to recognize this: its web app limits users to 7 specific complaint types, while directing more specific inquiries to the respective departments.
Recommendations would still include dealing with the auto-close problem. Standardizing complaints down to a core 20-30 types, or offering an “Other” category rather than over 200 descriptors, might also be a good way to get more accurate data. It’s also unclear whether agencies are held responsible for, or measured on, their 311 report performance.
The Future:
311 is a great tool to measure complaints and see what concerns New Yorkers. There is a lot of interesting data to pore over; in the course of this project I often found myself getting distracted by something interesting that popped up in the data. It’s no wonder it’s the #1 downloaded data set on NYC Open Data.
Going forward I will continue to create interesting and unique dashboards on the data. I also plan to keep refining my regression model and to build a predictor that takes user input and returns a time to close & the agency responsible (perhaps “Can 311 Fix It”?).
Here are some tools I used for this project: