GeoCoding: Downloading Shape Files
Updated 04/Oct/2022
Source:vignettes/GeoCoding_DownloadShapeFiles.Rmd
The ability to create choropleth maps relies on shape files containing the boundary information for the regions of interest. The shape files have various levels of detail, ranging from continents to US Census Tracts. When constructing choropleth maps in R, the shape file is used to draw the border of each region. Often, the regions are shaded to reflect values contained within the data.
Map construction relies on shape files containing the boundaries of landmasses (continents), countries, and subregions (such as shires, states, provinces, counties, and postal or ZIP codes, to name a few). Shape files are available from different organizations in various formats.
We use the term shape file to refer to a collection of data defining the “shape” of regions and subregions. When possible, downloading and using the “geopackage format” is preferred and will make life easier.
The “geopackage” format is a very good general-purpose spatial data file format (for vector data). It is built on an SQLite database container, as is the related SpatiaLite format, and can be read by software using GDAL/OGR, including R (with the sf package), QGIS, and ArcGIS.
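As a minimal sketch of reading a GeoPackage with sf, the following uses the small nc.gpkg example file that ships with the sf package. It is not one of the files downloaded in this vignette; it is used only so the sketch runs without any downloads. It assumes the sf package is installed.

```r
library(sf)

# Example GeoPackage bundled with sf (North Carolina counties);
# used here only so the sketch runs without downloading anything.
gpkg_path <- system.file("gpkg/nc.gpkg", package = "sf")

# A GeoPackage can hold several layers; list them before reading.
layers <- st_layers(gpkg_path)
print(layers$name)

# Read the first layer into an sf data frame.
# quiet = TRUE suppresses the driver messages.
nc <- st_read(gpkg_path, layer = layers$name[1], quiet = TRUE)

# An sf object is a data frame with an added geometry column.
print(class(nc))
print(nrow(nc))
```

The same two calls, st_layers() and st_read(), should apply to any other GeoPackage once its path is substituted for gpkg_path.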
World Data
GADM provides maps and spatial data for all countries and their subdivisions. Although GADM suggests downloading individual countries because of the file sizes, it is advisable to download the entire dataset. The current version of GADM’s world spatial data is 4.1 (older versions are also available), and it delimits 400,276 administrative areas.
- Go to the GADM “entire world download” webpage https://gadm.org/download_world.html
- Click the link for the “single database in the GeoPackage format.” NB: The compressed complete world database is approximately 1.47 GB and the unzipped file is approximately 2.76 GB.
Zip Code Data
The ZIP Code Tabulation Areas (ZCTAs) shape file contains the ZIP code boundaries for 2021.
Go to the 2021 TIGER/Line Shapefiles webpage and click the Web Interface link under the Download heading. Set the year to “2021”, select the “ZIP Code Tabulation Areas” option, and click “Submit” to download the 2021 ZIP Code shape file. Alternatively, click this link to directly download the zipped ZCTA file (approximately 572 MB). The expanded directory is approximately 827 MB.
US Census Cartographic Boundary Files
The US Census Cartographic Boundary Files contain American Indian Area Geographies, Census Block Groups, Census Tracts, Congressional Districts: 116th Congress, Consolidated Cities, Counties, Counties within Congressional Districts: 116th Congress, County Subdivisions, Divisions, Estates, Metropolitan and Micropolitan Statistical Areas and Related Statistical Areas, Places, Regions, School Districts, State Legislative Districts, States, Subbarrios, and the United States Outline. Download the entire boundary dataset in a single collection. The boundary data is at the 1:500,000 (national) scale; a map scale is a ratio and has no units. The zipped file is approximately 333 MB and contains individual zipped files.
Storing and Using the Shape Files
Due to the stability of the data contained within the shape files, I created a mapfiles folder within my OneDrive folder to easily store the files and access them from R.
- Create the mapfiles directory on OneDrive; e.g., ~/OneDrive/projects/mapfiles. You will notice the mapfiles directory is within a projects directory. Because all my analysis projects are stored within projects, it made the most sense to store the map files within the projects directory.
- Move (drag-and-drop) the compressed (zipped) map data into your mapfiles directory.
- Unzip the gadm_410-gpkg.zip, cb_2021_us_all_500k.zip, and tl_2021_us_zcta520.zip files.
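The unzip step can also be done from R with base R's unzip(). This is only a sketch, assuming the three archives have already been moved into the mapfiles directory named above. The helper name unzip_mapfile is hypothetical, and extracting the Census archives into their own subdirectories is an assumption chosen to keep each set of files together.

```r
# Directory from the text above; adjust to your own location.
mapfiles_dir <- path.expand("~/OneDrive/projects/mapfiles")

# Hypothetical helper: extract one archive into `exdir`,
# skipping (with a message) any archive that is not present.
unzip_mapfile <- function(zip_name, dir = mapfiles_dir, exdir = dir) {
  zip_path <- file.path(dir, zip_name)
  if (!file.exists(zip_path)) {
    message("Not found, skipping: ", zip_path)
    return(invisible(character(0)))
  }
  # utils::unzip() creates `exdir` if needed and returns the extracted paths.
  invisible(utils::unzip(zip_path, exdir = exdir))
}

# The GADM archive is extracted directly into the mapfiles directory.
unzip_mapfile("gadm_410-gpkg.zip")

# Assumption: each Census archive gets its own subdirectory.
unzip_mapfile("cb_2021_us_all_500k.zip",
              exdir = file.path(mapfiles_dir, "cb_2021_us_all_500k"))
unzip_mapfile("tl_2021_us_zcta520.zip",
              exdir = file.path(mapfiles_dir, "tl_2021_us_zcta520"))
```

Because the cartographic boundary archive itself contains individual zipped files, those inner archives would need a second pass with the same helper.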
When making maps within R, create variables containing the path to each shape file. For example:
mapfile.world <- "~/OneDrive/projects/mapfiles/gadm_410.gpkg"
mapfile.zipcodes <- "~/OneDrive/projects/mapfiles/tl_2021_us_zcta520/tl_2021_us_zcta520.shp"
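A slightly more defensive variant of the same idea is sketched here: it builds both paths from a single base directory and checks that the files exist before any map code runs. The variable mapfiles_dir and the existence check are additions for illustration, not part of the original workflow.

```r
# Base directory from the text; change this one line if the files live elsewhere.
mapfiles_dir <- path.expand("~/OneDrive/projects/mapfiles")

mapfile.world    <- file.path(mapfiles_dir, "gadm_410.gpkg")
mapfile.zipcodes <- file.path(mapfiles_dir, "tl_2021_us_zcta520",
                              "tl_2021_us_zcta520.shp")

# Report which of the expected files are in place.
paths <- c(world = mapfile.world, zipcodes = mapfile.zipcodes)
found <- file.exists(paths)
print(setNames(found, names(paths)))

if (!all(found)) {
  message("Missing map files: ", paste(names(paths)[!found], collapse = ", "))
}
```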
Because of their size, the maps can take some time to load and, if stored within the RData file, will significantly increase the size of the file and the write and read times of your RData files.
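One way to reduce both costs is to read only the rows needed and to remove the large object before the workspace is saved. The sketch below uses the query argument of sf::st_read() for this. The helper name gadm_country_query is hypothetical, and the layer name "gadm_410" and the column "GID_0" (three-letter country code) are assumptions about the GADM 4.1 GeoPackage that should be confirmed with sf::st_layers() and names() on your copy.

```r
library(sf)

# Hypothetical helper: build an SQL query selecting one country.
# ASSUMPTION: the layer is named "gadm_410" and has a "GID_0" column
# holding three-letter country codes; verify against your file.
gadm_country_query <- function(iso3, layer = "gadm_410") {
  stopifnot(grepl("^[A-Z]{3}$", iso3))  # guard against malformed codes
  sprintf("SELECT * FROM \"%s\" WHERE GID_0 = '%s'", layer, iso3)
}

mapfile.world <- path.expand("~/OneDrive/projects/mapfiles/gadm_410.gpkg")

if (file.exists(mapfile.world)) {
  # Only the rows matching the query are returned, not the whole world.
  usa <- st_read(mapfile.world, query = gadm_country_query("USA"), quiet = TRUE)

  # ... build maps with `usa` here ...

  # Drop the large object so it is not written to the RData file.
  rm(usa)
}
```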