Commodity Flow Survey 1997-2017

Data Accumulation and Cleaning

The goal of gathering and understanding this data is to build a comprehensive dataset that can be used to compare Commodity Flow Survey results across years. The survey data is collected every five years, but it has not been consolidated, and its formatting varies from year to year.

Gathering the Data

The first step is gathering the data from 1997 to 2017. The tables gathered cover origin, destination, commodity, and mode. For 2002, commodity and mode were published in separate tables, and the Coefficients of Variation were published in separate tables as well, so these had to be merged later. For 1997, the destination tables did not include Average Miles or the Coefficient of Variation.
Because the data is public, some values are suppressed for privacy reasons and thus may not be accurately represented. Additionally, the full dataset was too large to upload to this platform, so the data was filtered by "All modes" or "All commodities".
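The 2002 merge described above can be sketched in base R as follows. This is a minimal illustration, not the project's actual code: the table names (est_2002, cv_2002), the column names, and all values are hypothetical.

```r
# Hypothetical 2002 estimates table (values are made up for illustration).
est_2002 <- data.frame(
  Origin    = c("Alabama", "Alabama", "Alaska"),
  Commodity = c("Live animals", "Cereal grains", "Live animals"),
  Value     = c(120, 340, 15),
  stringsAsFactors = FALSE
)

# Hypothetical matching Coefficient of Variation table (made-up values).
cv_2002 <- data.frame(
  Origin    = c("Alabama", "Alabama", "Alaska"),
  Commodity = c("Live animals", "Cereal grains", "Live animals"),
  CV_Value  = c(12.5, 8.1, 30.2),
  stringsAsFactors = FALSE
)

# Join on the shared key columns. all.x = TRUE keeps every estimate row
# even when no CV row matches (its CV is then NA).
merged_2002 <- merge(est_2002, cv_2002,
                     by = c("Origin", "Commodity"), all.x = TRUE)
```

Joining on explicit key columns, rather than binding columns side by side, avoids misaligned rows when the two tables are sorted differently.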
2012 and 2017     |      2002 and 2007     |     1997
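A minimal base R sketch of the size-reducing filter mentioned above. It assumes the table has columns named Mode and Commodity and that total rows carry the labels "All modes" and "All commodities"; the column names and the rows are hypothetical, and the exact filter used for the published files may differ.

```r
# Hypothetical flow table (rows are made up for illustration).
flows <- data.frame(
  Origin    = c("Alabama", "Alabama", "Alabama", "Alabama"),
  Mode      = c("All modes", "Rail", "All modes", "Rail"),
  Commodity = c("All commodities", "All commodities",
                "Cereal grains", "Cereal grains"),
  stringsAsFactors = FALSE
)

# Keep rows that are totals across modes OR totals across commodities.
# Rows that are specific in both dimensions (Rail + Cereal grains) are dropped.
keep <- flows$Mode == "All modes" | flows$Commodity == "All commodities"
filtered <- flows[keep, ]
```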

Understanding the Data

Understanding the accumulated data is necessary for building visualizations and analyzing the data. Helpful links and definitions for the Commodity Flow Survey are included below.

Helpful Links

Standard Classification of Transported Goods (SCTG)

Standard Classification of Transported Goods (SCTG) codes are used to label different commodity types. Two-digit SCTG codes can be grouped into broader categories. Below is a snippet showing some, but not all, of the code descriptions. Clicking the image will lead to the original source for further clarification.

Mode of Transportation Definitions

Mode of transportation is broken into different categories including single modes and multiple modes.
Single Mode Shipments - Shipments transported by only one of the following modes: Private truck, For-hire truck, Rail, any water mode, Pipeline, or Air.
Company-Owned truck (formerly Private Truck) – Trucks operated by employees of the establishment or the buyer/receiver of the shipment.  Includes trucks providing dedicated services to the establishment.
For-Hire Truck – Trucks operated by common or contract carriers under a negotiated rate.
Rail – Any common carrier or private railroad.
Inland Water – Vessels or barges operating primarily in navigable waters, both within and along the borders of the United States, such as:
  • Rivers (Mississippi River, Saint Lawrence Seaway, etc.)
  • Lakes (excluding Great Lakes)
  • Along the shoreline but actually in the ocean (Intracoastal Waterway along the Atlantic and Gulf coasts, Inside Passage of Alaska, etc.)
  • Canals, harbors, major bays, and inlets
Great Lakes – Vessels or barges operating on the Great Lakes.
Deep Sea – Vessels or barges operating primarily in the open waters of the ocean, outside the borders of the United States.
Multiple Waterways – Shipments sent by any combination of Inland water, Great Lakes, and Deep sea; involving a transfer between vessels.
Pipeline - Movements of oil, petroleum, gas, slurry, etc. through pipelines that extend to other establishments or locations beyond the shipper's establishment. (Aqueducts for the movement of water are not included.)
Air - Any shipment sent via air mode to its destination. (This includes shipments carried by truck to and/or from an airport.)
Multiple Mode Shipments - Parcel delivery/Courier/U.S. Parcel Post shipments, plus shipments for which two or more of the following modes of transportation were used:
  • Company-owned truck or For-hire truck
  • Railroad
  • Water (Inland water, Great Lakes, Deep Sea, and Multiple Waterways)
  • Pipeline
  • Air
  • Other mode
Parcel Delivery/Courier/U.S. Parcel Post - Includes ground and air shipments of packages and parcels that weigh 150 pounds or less, and were transported by a for-hire carrier.
(Parcel delivery/Courier/U.S. Parcel Post are considered multiple mode because this category includes all parcel shipments whether on the ground or via air tendered to a parcel or express carrier. In defining this mode, we did not combine these shipments with any other reported mode because by their nature, Parcel delivery/Courier/U.S. Parcel Post are already multimodal. For example, if the respondent reported a shipment's mode of transportation as "parcel" and "air," we treated the shipment as parcel only.)
Other Multiple Modes – Shipments sent by any other mode combinations not specifically listed in the tables.
Other Mode(s) – Includes shipments with a mode other than any of the listed modes, such as conveyor belt, animal power, etc.
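The parcel rule in the definitions above (a shipment reported as "parcel" and "air" is treated as parcel only) can be illustrated with a small base R helper. The function name apply_parcel_rule and the lowercase mode labels are hypothetical; this sketch covers only the parcel rule and does not attempt to reproduce the survey's full mode classification.

```r
# Hypothetical helper: given the modes reported for one shipment, apply the
# parcel rule. If "parcel" is among the reported modes, the shipment is
# treated as parcel only; otherwise the reported modes are returned as-is
# (lowercased and trimmed).
apply_parcel_rule <- function(reported) {
  reported <- tolower(trimws(reported))
  if ("parcel" %in% reported) "parcel" else reported
}

parcel_case <- apply_parcel_rule(c("parcel", "air"))
other_case  <- apply_parcel_rule(c("Rail ", "for-hire truck"))
```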

Data Cleaning and Organizing

Once the data has been gathered, the next step is to clean and organize it. The column names differ from year to year, so they were updated to match the most recent year. The column types were also changed so that visualizations work correctly. All the cleaning and organizing was done in R, a programming language and software environment. Below is a list of the column names grouped by type.
  • Character Fields (Text)
    • Origin
    • Destination
    • Commodity Name
    • Year
  • Numerical Fields
    • 2-digit SCTG Code
    • Value (in Millions $)
    • Coefficient of Variation for Value
    • Tons (in thousands)
    • Coefficient of Variation for Tons
    • Ton-Miles (millions)
    • Coefficient of Variation for Ton-Miles
    • Average Miles per Shipment
    • Coefficient of Variation for Average Miles per Shipment
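A minimal base R sketch of the renaming and type conversion described above, shown for a few of the listed columns. It is not the project's actual code: the old column names, the values, and the "S" suppression marker are assumptions made for illustration.

```r
# Hypothetical raw table from an older year. The old column names, the
# values, and the "S" suppression marker are invented for illustration.
raw <- data.frame(
  ORIGIN = c("Alabama", "Alaska"),
  SCTG = c("01", "02"),
  `VALUE ($ MIL)` = c("1,234", "S"),
  YEAR = c(2002, 2002),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Old name -> target name lookup (target names follow the list above).
rename_map <- c(
  "ORIGIN" = "Origin",
  "SCTG" = "2-digit SCTG Code",
  "VALUE ($ MIL)" = "Value (in Millions $)",
  "YEAR" = "Year"
)
names(raw) <- unname(rename_map[names(raw)])

# Year is stored as text, as in the list above.
raw$Year <- as.character(raw$Year)

# Numeric fields: strip thousands separators, then convert.
# Non-numeric entries (such as the assumed "S" marker) become NA.
to_number <- function(x) suppressWarnings(as.numeric(gsub(",", "", x)))
raw[["2-digit SCTG Code"]] <- to_number(raw[["2-digit SCTG Code"]])
raw[["Value (in Millions $)"]] <- to_number(raw[["Value (in Millions $)"]])
```

Keeping the old-to-new names in a single lookup vector means one mapping per survey year can be written and reviewed separately, while the conversion code stays the same.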

Methodological Changes

Labels for mode of transportation and commodity type have changed over the years. To make the data consistent, the labels were updated to match the most recent year. Included are snippets of the mode of transportation changes and a snippet of the R code used to update the labels. Clicking the image will lead to the original source for further clarification.
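A minimal base R sketch of this kind of relabeling, not the project's actual code. The only mapping shown, "Private truck" to "Company-owned truck", comes from the mode definitions in this document; the variable names and the sample vector are hypothetical.

```r
# Old label -> current label. Only this pair is taken from the definitions
# in this document; extend the lookup with other changes as needed.
mode_relabel <- c("Private truck" = "Company-owned truck")

# Hypothetical mode column from an older survey year.
modes <- c("Private truck", "Rail", "For-hire truck", "Private truck")

# Replace labels found in the lookup; leave all others unchanged.
hit <- modes %in% names(mode_relabel)
modes[hit] <- unname(mode_relabel[modes[hit]])
```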

SCTG Description Changes

SCTG Code Changes
Note: Because of the 2012 revision, the meaning of some codes differs slightly from previous years and may not be accurately represented in the data.