Skip to contents

The ability to create chloropleth maps relies on shape files containing the boundary information for the regions of interest. The shape files have various levels of detail ranging from continents to US Census Tracts. When constructing chloropleth maps in R the shape file is used to create the border of individual regions. Often, the regions are shaded to reflect various information contained within the data.

Map construction relies on shape files containing the geographic boundaries of geographical landmasses (continents), countries, and subregions (commonly referred to as shires, states, provinces, counties, and postal and zip codes to name a few popular subregions). Shape files are available from different organizations in various formats.

We use the term shape file to refer to a collection of data defining the “shape” of regions and subregions. When possible, downloading and using the “geopackage format” is preferred and will make life easier.

The “geopackage” format is the a very good general spatial data file format (for vector data). It is based on the SpatiaLite format, and can be read by software using GDAL/OGR, including R (with the sf package), QGIS and ArcGIS.

World Data

GADM provides maps and spatial data for all countries and their sub-divisions. While they suggest downloading individual countries due to file sizes, it is advisable to download the entire dataset. The current, version of GADM’s world spatial data is 4.1 (though there are older versions available), and it delimits 400,276 administrative areas.

Zip Code Data

The ZIP Code Tabulation Areas (ZCTAs) contains the ZIP code boundaries for 2021.

Go to the 2021 TIGER/Line Shapefiles webpage and click the Web Interface link under the Download heading. Set the year to “2021”, select the “ZIP Code Tabulation Areas” option, and click “Submit” to download the 2021 ZIP Code shape file. Alternatively, click this link to directly download the zipped ZCTA file (approximately 572 MBs). The expanded directory is approximately 827 MBs.

US Census Cartographic Boundary Files

The US Census Cartographic Boundary Files contain American Indian Area Geographies, Census Block Groups, Census Tracts, Congressional Districts: 116th Congress, Consolidated Cities, Counties, Counties within Congressional Districts: 116th Congress, County Subdivisions, Divisions, Estates, Metropolitan and Micropolitan Statistical Areas and Related Statistical Areas, Places, Regions, School Districts, State Legislative Districts, States, Subbarrios, and the United States Outline. Download the entire boundary dataset in a single collection. The boundary data is at the 1:500,000 (national; not sure of the units) and the zipped file is approximately 333 MBs, and contains individual zipped files.

Storing and Using the Shape Files

Due to the stability of the data contained within the shape files, I created a mapfiles folder within my OneDrive folder to easily store the files and access them from R.

  • Create the mapfiles directory on OneDrive; e.g., ~/OneDrive/projects/mapfiles You will notice the mapfiles directory is within a projects directory. Because all my analyses projects are stored within projects, it made the most sense to store the mapfiles within the projects directory.
  • Move (drag-and-drop) the compressed (zipped) map data into you mapfiles directory.
  • Unzip the gadm_410-gpkg.zip, cb_2021_us_all_500k.zip, and tl_2021_us_zcta520.zip files.

When making maps within R, create variables containing the path to each shape file. For example

  mapfile.world <- "~/OneDrive/projects/mapfiles/gadm_410.gpkg"
  mapfile.zipcodes <- "~/OneDrive/projects/mapfiles/tl_2021_us_zcta520/tl_2021_us_zcta520.shp"

Because of their size, the maps can take some time to load, and if stored within the RData file, will significantly increase the size of the file and increase the write and read time of your RData files.