Lately I have been working on a project to predict where cycling infrastructure should be built in order to increase bicycle commuting. This is an important project because it has the potential to help decrease traffic while improving the health of commuters. Cycling infrastructure is expensive to build, particularly in areas that are already developed. If accurate predictions can be made, urban planners can make informed, statistically based decisions that maximize effectiveness while minimizing capital costs.
One of my initial findings is that bicycle commuting is particularly high around universities and colleges. This is not surprising, since many of these commuters are likely students, faculty, or staff who live very close to campus, but it is important because it demonstrates a significant need for safe cycling infrastructure in these areas.
While statistics are important, visual presentations can have a much more striking impact. Using R, I created a visualization to show the amount of bicycle commuting around several major universities in Texas. The geography is broken into sections called census tracts; a census tract can be thought of as a neighborhood. Each census tract in the visual is color-coded according to the percentage of residents who commute to work by bicycle.
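In the map code, leaflet's colorNumeric() handles this color coding. To show the idea without any extra packages, here is a rough base-R sketch. The function name percent_to_color is hypothetical, and it approximates rather than reproduces leaflet's implementation. It linearly rescales each percentage between the lowest and highest values and looks up the matching color on ColorBrewer's nine-step Yellow-Orange-Red (YlOrRd) ramp, so the highest percentages get the darkest red.

```r
# Hypothetical illustration of a continuous color scale, similar in spirit to
# leaflet::colorNumeric("YlOrRd", ...); this is not leaflet's actual code.
yl_or_rd <- c("#FFFFCC", "#FFEDA0", "#FED976", "#FEB24C", "#FD8D3C",
              "#FC4E2A", "#E31A1C", "#BD0026", "#800026")  # ColorBrewer YlOrRd, 9 classes

percent_to_color <- function(pct, domain = range(pct, na.rm = TRUE)) {
  # Rescale percentages to [0, 1] relative to the lowest and highest values
  scaled <- (pct - domain[1]) / (domain[2] - domain[1])
  scaled <- pmin(pmax(scaled, 0), 1)          # clamp values outside the domain
  colors <- rep(NA_character_, length(pct))   # missing percentages stay uncolored
  ok <- !is.na(scaled)
  if (any(ok)) {
    # colorRamp() returns a matrix of red, green and blue values on a 0-255 scale
    rgb_values <- grDevices::colorRamp(yl_or_rd)(scaled[ok])
    colors[ok] <- grDevices::rgb(rgb_values, maxColorValue = 255)
  }
  colors
}

# First element is "#FFFFCC" (palest yellow), fourth is "#800026" (darkest red), last is NA
percent_to_color(c(0, 5, 11.6, 20, NA), domain = c(0, 20))
```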
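The data for the visual were obtained from the US Census Bureau's American Community Survey (ACS), using the 2012-2016 ACS 5-Year Estimates. I used table B08301 (Means of Transportation to Work) for all census tracts in Texas. This table has a large number of features, but I was only interested in a few:
- GEO.id: the geographic identifier of the census tract
- GEO.id2: the numeric tract code (GEOID), used to join the ACS data to the tract boundaries
- GEO.display.label: the human-readable tract name
- HD01_VD01: the estimated total number of workers
- HD01_VD18: the estimated number of workers who commute by bicycle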
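The script reads the ACS file with a plain read.csv() call, which assumes a single header row. If your copy of the export also contains a second, human-readable description row under the column codes (text such as "Estimate; Total:"), read.csv() treats that row as data and the estimate columns will not be numeric. A minimal sketch, assuming that two-row layout and using a hypothetical helper name, drops the description row, converts the HD01_* estimate columns to numbers, and keeps GEO.id2 as text so tract codes are never reformatted.

```r
# Hypothetical helper: read an ACS table export whose second line is a
# human-readable description row (an assumption about the file layout).
read_acs_csv <- function(path) {
  lines <- readLines(path)
  # Keep the column-code header (line 1) and the data rows (line 3 onward);
  # read every column as text first so nothing is guessed wrongly
  acs <- read.csv(text = lines[-2], colClasses = "character")
  # Convert the estimate columns (HD01_VD01, HD01_VD18, ...) to numbers;
  # GEO.id, GEO.id2 and GEO.display.label stay as character strings
  estimate_cols <- grep("^HD01_", names(acs))
  acs[estimate_cols] <- lapply(acs[estimate_cols], as.numeric)
  acs
}
```

With this helper, the first read in the script would become full_MOT_df <- read_acs_csv("ACS_16_5YR_B08301_All_TX_CT.csv"), and GEO.id2 would arrive as text, the same type as tigris's GEOID column.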
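In R, I used several libraries, most importantly tigris, to download census tract boundaries and join them to the ACS data (tracts() and geo_join()), and leaflet, to build the interactive map (colorNumeric(), addPolygons(), addLegend(), and addMarkers()).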
The code for creating the visual is below:
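#Map Percentage of Cyclists around Universities at the Census Tract level
#Load the libraries: tigris (census tract boundaries and geo_join) and leaflet (interactive map; it also provides the %>% pipe)
library(tigris)
library(leaflet)
#Read the full Means of Transportation csv (B08301) into a data frame
full_MOT_df <- read.csv("ACS_16_5YR_B08301_All_TX_CT.csv", header = TRUE)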
#Create a pared down version of the Means of Transportation data frame, keeping only the columns of interest and naming them, and calculate the percentage of workers who commute by bicycle
pared_MOT_df <- data.frame(GEOID = full_MOT_df$GEO.id,
                           GEOID2 = full_MOT_df$GEO.id2,
                           GEO_Display_ID = full_MOT_df$GEO.display.label,
                           Total_Estimate = full_MOT_df$HD01_VD01,
                           Number_of_Cyclists = full_MOT_df$HD01_VD18,
                           Percent_Cyclists = full_MOT_df$HD01_VD18 / full_MOT_df$HD01_VD01 * 100)
#Read the colleges file (one row per college; the code below uses its Name, County, GEOID2, Lat and Lon columns)
colleges_df <- read.csv("colleges.csv", header = TRUE)
#Get the unique counties from the County column in the colleges_df
unique_counties <- unique(colleges_df$County)
#Get census tracts for the various counties of interest (using the tigris library)
census_tracts <- tracts(state = 'TX', county = unique_counties)
#Join the census tract data (coordinates, etc) with the cycling information
tracts_joined <- geo_join(census_tracts, pared_MOT_df, "GEOID", "GEOID2")
#Inner-join the tract data (which now includes the cycling figures) with the college information, keeping only the census tracts that contain a college. This gives one data frame with all the data needed for the college markers
colleges_joined <- geo_join(tracts_joined, colleges_df, "GEOID", "GEOID2", how = 'inner')
#Create the color palette to represent the percentage of cyclists in a census tract - Yellow to Orange to Red
pal <- colorNumeric("YlOrRd", tracts_joined$Percent_Cyclists)
#Create the text that will display in the popup on the map when a census tract is selected (<br> starts a new line in the popup)
popup <- paste0("Census Tract: ", tracts_joined$GEOID,
                "<br>Percent Cyclists: ", sprintf("%.1f%%", tracts_joined$Percent_Cyclists),
                "<br>Estimate: ", tracts_joined$Total_Estimate,
                "<br>Number of Cyclists: ", tracts_joined$Number_of_Cyclists)
#Create map using Leaflet
#colleges_joined is the map's default data: it supplies the Lon and Lat columns used by addMarkers
#The census tract polygons come from tracts_joined and are colored by the percent of residents who commute to work via bicycle (Percent_Cyclists field)
leaflet(colleges_joined) %>%
  addTiles() %>%
  addPolygons(data = tracts_joined,
              fillColor = ~pal(Percent_Cyclists),
              fillOpacity = 0.7,
              weight = 0.2,
              smoothFactor = 0.2,
              popup = popup) %>%
  addLegend(pal = pal,
            values = tracts_joined$Percent_Cyclists,
            position = "bottomright",
            title = "% Bicycle Commuters") %>%
  addMarkers(~Lon, ~Lat,
             popup = paste0(colleges_joined$Name,
                            "<br>Census Tract: ", colleges_joined$GEOID2,
                            "<br>Percent Cyclists: ", sprintf("%.1f%%", colleges_joined$Percent_Cyclists),
                            "<br>Estimate: ", colleges_joined$Total_Estimate,
                            "<br>Number of Cyclists: ", colleges_joined$Number_of_Cyclists))
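The two geo_join() calls above match tigris's GEOID column, an 11-character string (2-digit state, 3-digit county, and 6-digit tract code), against GEOID2, which read.csv() will usually parse as a number. That happens to work for Texas because its state FIPS code, 48, has no leading zero; for a state such as Alabama (01), the leading zero would be lost. A small, hypothetical helper that turns either form back into the zero-padded 11-character code makes the join key explicit.

```r
# Hypothetical helper: turn census tract GEOIDs that may have been read as
# numbers (e.g. 1001020100) or factors back into 11-character, zero-padded
# strings (2-digit state + 3-digit county + 6-digit tract), as tigris uses.
normalize_tract_geoid <- function(geoid) {
  # as.character() first so factors give their labels, not their level codes
  numeric_id <- suppressWarnings(as.numeric(as.character(geoid)))
  out <- rep(NA_character_, length(geoid))
  ok <- !is.na(numeric_id)
  # "%011.0f": no decimal places, left-padded with zeros to 11 characters
  out[ok] <- sprintf("%011.0f", numeric_id[ok])
  out
}
```

Applied before the joins, for example pared_MOT_df$GEOID2 <- normalize_tract_geoid(pared_MOT_df$GEOID2), both sides of each join would use the same text format regardless of state.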
This will produce the following map. Each of the blue markers is a university.
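If we zoom in on one of the markers, Texas A&M in College Station, TX, we can see the outline of Brazos County, where it is located (the yellow area). The county's outline stands out because census tracts were only downloaded for the counties that contain one of the colleges.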
Zooming in a little closer and clicking on the blue marker, we see the information about Texas A&M. There is quite a bit of information in this small area. The university is located in census tract 48041002015, where 11.6% of working residents (148 of an estimated 1,272) commute to work by bicycle. That census tract is shaded a very dark red, almost purple; the legend shows that this represents more than 10% bicycle commuters. So just by looking at the color of a census tract, we can see whether it has a high percentage of bicycle commuters, and clicking on the marker lets us drill down to more detailed information. The surrounding census tracts also have high percentages of residents who commute by bicycle (notice the red and dark orange shapes). This is a great way for a non-technical audience to take in a lot of information very quickly.
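The percentage in the popup is simply the number of bicycle commuters divided by the estimated number of workers. A minimal sketch, with a hypothetical function name, reproduces the Texas A&M tract's figure and also guards against tracts with no workers, where dividing by zero would otherwise give NaN.

```r
# Hypothetical helper: share of workers who commute by bicycle, in percent.
# Tracts with zero (or missing) workers return NA instead of NaN or Inf.
percent_cyclists <- function(cyclists, workers) {
  ifelse(!is.na(workers) & workers > 0, cyclists / workers * 100, NA_real_)
}

pct <- percent_cyclists(148, 1272)
sprintf("%.1f%%", pct)  # "11.6%", matching the popup for tract 48041002015
pct > 10                # TRUE: in the "over 10%" part of the legend
```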
I hope this has been helpful.