Plot a Histogram

Now we will use these skills together to create a plot of our data. For this, we will use the ggplot2 package, also part of the tidyverse. A ggplot2 visualization starts with a call to ggplot(), which supplies the base data. aes() maps variables to visual properties (aesthetics) such as the x-axis, geom_*() functions such as geom_histogram() add layers that draw the data, and theme() controls the overall look of the plot. We can create a vast array of plot types with ggplot, but we will start with a simple histogram of carbon density.
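As a rough sketch of that template, here is a minimal plot built from a small, made-up data frame. The name `toy_data` and its values are hypothetical and are not part of our dataset. Each `+` adds one component: the base data with its aesthetic mapping, a geometric layer, and a theme setting.

```r
library(ggplot2)

# Hypothetical example data, used only to illustrate the ggplot template
toy_data <- data.frame(value = c(0.01, 0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.05))

p <- ggplot(data = toy_data,                      # base input: the dataset
            mapping = aes(x = value)) +           # aesthetic: map 'value' to the x-axis
  geom_histogram(bins = 5) +                      # layer: draw the data as a histogram
  theme(axis.title = element_text(size = 12))     # theme: adjust non-data appearance

p
```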

First, we must calculate carbon density, which is the fraction of the sample's mass that is carbon multiplied by its dry bulk density. Some of the studies in our data reported fraction carbon directly, but not all of them did. For the rest, we will estimate fraction carbon from fraction organic matter, using the quadratic relationship derived by Holmquist et al. (2018).
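For reference, here are the two calculations written as equations. This is a sketch using the coefficients that appear in the tutorial's function; the symbols are our own shorthand: $f_{OM}$ is fraction organic matter, $f_C$ is fraction carbon, and $\rho_{dry}$ is dry bulk density.

$$
f_C = 0.014\,f_{OM}^2 + 0.421\,f_{OM} + 0.008
$$

$$
\text{carbon density} = f_C \times \rho_{dry}
$$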

To do this, we will first create a function that estimates fraction carbon using the Holmquist et al. (2018) equation. Then, inside mutate(), we will use ifelse(), R's vectorized if-else statement: for each row, if a logical test is TRUE, one action is performed; if it is FALSE, a different action is performed:
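A short base-R sketch of the two forms of if-else, using made-up values. `if` tests a single TRUE/FALSE value, while `ifelse()` tests every element of a vector and returns a vector of the same length, which is what a data frame column needs. Passing a vector longer than one element to `if` is an error from R 4.2.0 onward (earlier versions warned and used only the first element). `pmax()` is shown as one vectorized way to replace negative numbers with 0.

```r
# A single value: use if / else
x <- 0.5
if (x > 0) {
  label <- "positive"
} else {
  label <- "not positive"
}
print(label)

# A whole vector (like a data frame column): use ifelse()
values <- c(-0.2, 0.1, NA, 0.4)
labels <- ifelse(values > 0, "positive", "not positive")
print(labels)  # NA inputs give NA outputs

# pmax() compares element by element, so negatives become 0 and NAs stay NA
clamped <- pmax(values, 0)
print(clamped)
```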

# This function estimates fraction carbon as a function of fraction organic matter
estimate_fraction_carbon <- function(fraction_organic_matter) {

  # Use the quadratic relationship published by Holmquist et al. (2018) to derive
  # the fraction of carbon from the fraction of organic matter
  fraction_carbon <- (0.014 * fraction_organic_matter^2) + (0.421 * fraction_organic_matter) + 0.008

  # Replace negative values with 0. pmax() works element by element, so the function
  # accepts a whole column (as it receives inside mutate()) as well as a single value;
  # NA inputs stay NA
  fraction_carbon <- pmax(fraction_carbon, 0)

  return(fraction_carbon) # The product of the function is the estimated fraction carbon value(s)
}

depthseries_data_with_carbon_density <- depthseries_data %>% # For each row in depthseries_data...
  mutate(carbon_density = 
    ifelse(! is.na(fraction_carbon), # IF the value for fraction_carbon is not NA.... 
# Then a new variable, carbon density, equals fraction carbon * dry bulk density....
      (fraction_carbon * dry_bulk_density), 
# ELSE, carbon density is estimated using our function
      (estimate_fraction_carbon(fraction_organic_matter) * dry_bulk_density) 
    )
  )

glimpse(depthseries_data_with_carbon_density)

## Observations: 16,976
## Variables: 9
## $ study_id                <chr> "Boyd_2012", "Boyd_2012", "Boyd_2012",...
## $ core_id                 <chr> "BBRC_1", "BBRC_1", "BBRC_1", "BBRC_1"...
## $ depth_min               <int> 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 4...
## $ depth_max               <int> 2, 6, 10, 14, 18, 22, 26, 30, 34, 38, ...
## $ dry_bulk_density        <dbl> 0.28, 0.41, 0.40, 0.48, 0.54, 0.59, 0....
## $ fraction_organic_matter <dbl> 0.28, 0.23, 0.27, 0.20, 0.18, 0.16, 0....
## $ fraction_carbon         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ fraction_carbon_type    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ carbon_density          <dbl> 0.03555373, 0.04328395, 0.04907624, 0....
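As a quick sanity check, we can reproduce the first carbon_density value by hand. The first row (Boyd_2012, core BBRC_1) has no fraction_carbon, so the estimate is used; the inputs below are the displayed fraction_organic_matter and dry_bulk_density values from the glimpse() output, copied into standalone variables for illustration.

```r
# Values from the first row of the glimpse() output (as displayed)
fraction_organic_matter <- 0.28
dry_bulk_density <- 0.28

# Same quadratic relationship as in estimate_fraction_carbon()
fraction_carbon <- 0.014 * fraction_organic_matter^2 + 0.421 * fraction_organic_matter + 0.008
carbon_density <- fraction_carbon * dry_bulk_density

print(fraction_carbon)  # 0.1269776
print(carbon_density)   # 0.03555373, matching the first carbon_density value
```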

Our newly calculated values are stored in a new column, carbon_density. Let's turn this information into a visual using ggplot. A histogram is a good way to display the distribution of carbon density.

# Create a histogram of carbon density
histo <- ggplot(data = depthseries_data_with_carbon_density, # Define dataset
                aes(x = carbon_density)) + # Map carbon density to the x-axis
  geom_histogram(bins = 100) # Set the number of bins (100), which determines the bin width

histo

[Figure: histogram of carbon density]
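geom_histogram() lets you control the bars in one of two ways: `bins` sets the number of bins (as in our plot), while `binwidth` sets the width of each bin in the units of the x variable. A sketch with made-up data; the `toy` values and the binwidth of 0.005 are arbitrary illustrative choices, not recommendations for this dataset.

```r
library(ggplot2)

# Hypothetical carbon density values, for illustration only
toy <- data.frame(carbon_density = c(0.010, 0.012, 0.021, 0.024, 0.030, 0.033, 0.041))

# Fixed number of bins: ggplot2 works out the bin width from the data range
by_count <- ggplot(toy, aes(x = carbon_density)) +
  geom_histogram(bins = 10)

# Fixed bin width (in x-axis units): ggplot2 works out how many bins are needed
by_width <- ggplot(toy, aes(x = carbon_density)) +
  geom_histogram(binwidth = 0.005)

# layer_data() returns the computed bins, one row per bin, with xmin/xmax edges
bins_by_width <- layer_data(by_width)
head(bins_by_width[, c("xmin", "xmax", "count")])
```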

Notice that ggplot has its own notation, different from dplyr's, but the two packages are similar in that you can stack operations one after another. Where dplyr chains steps with the pipe, %>%, ggplot adds layers with a plus sign, +. You likely also received a warning that rows containing non-finite values were removed. This is because some of our carbon_density values are NA (rows missing the inputs needed for the calculation), and ggplot did not plot these entries.
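If you prefer to remove missing values yourself rather than let ggplot drop them with a warning, one option is to filter them out with dplyr before plotting (another is `geom_histogram(na.rm = TRUE)`, which drops them silently). This sketch also shows the two notations side by side: %>% passes data between dplyr steps, and + adds ggplot layers. The `toy` data frame is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data with a missing value, for illustration only
toy <- data.frame(carbon_density = c(0.02, NA, 0.03, 0.05, 0.04))

histo_no_na <- toy %>%
  filter(!is.na(carbon_density)) %>%   # dplyr step: drop rows with NA
  ggplot(aes(x = carbon_density)) +    # hand the filtered data to ggplot
  geom_histogram(bins = 5)             # ggplot layer, added with +

nrow(histo_no_na$data)  # 4 rows remain after filtering
```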

Our plot could look a little better if the x-axis were narrowed to fit the distribution. Let's also change the theme background and the plot labels while we're at it.

# Update our histogram
histo <- histo + # Start from the plot we already made and add to it
         xlim(0, 0.15) + # Change x-axis limit to better fit the data
         theme_classic() + # Change the plot theme
         ggtitle("Distribution of Carbon Density of CONUS") + # Give the plot a title (CONUS = contiguous US)
         xlab("Carbon Density") + # Change x axis title
         ylab("Number of Observations") # Change y axis title
histo

[Figure: updated histogram of carbon density with title, axis labels, and classic theme]

Great, much better. This is a basic plot of just one type of data, but ggplot can do much more (the ggplot2 cheatsheet provides a quick overview), and soon we will build more intricate plots that combine multiple types of data. First, though, we'll learn a couple of new skills.
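One caveat about the xlim() call in the updated histogram: xlim() sets the scale limits, so observations outside them are converted to NA and removed before the histogram is computed (adding to the "removed rows" warning), and bars that straddle a limit can be dropped. If you only want to zoom in without discarding data, coord_cartesian(xlim = ...) is an alternative. A sketch with hypothetical data containing one value beyond the zoom range:

```r
library(ggplot2)

# Hypothetical data: one large value (0.30) lies outside the 0 to 0.15 range
toy <- data.frame(carbon_density = c(0.01, 0.02, 0.02, 0.03, 0.04, 0.30))

zoom_by_limits <- ggplot(toy, aes(x = carbon_density)) +
  geom_histogram(bins = 10) +
  xlim(0, 0.15)                        # drops the 0.30 value before binning

zoom_by_coords <- ggplot(toy, aes(x = carbon_density)) +
  geom_histogram(bins = 10) +
  coord_cartesian(xlim = c(0, 0.15))   # keeps all data, only zooms the view

# Total count across bins = number of observations each approach kept
sum(layer_data(zoom_by_limits)$count)
sum(layer_data(zoom_by_coords)$count)
```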
