Add and Remove Columns

To manipulate our datasets, we are going to use the dplyr package, a component of the tidyverse (so installing tidyverse also installs dplyr). dplyr is dedicated to data wrangling and will be very useful to us.

We will make many modifications to our datasets in these exercises. To leave ourselves a trail of breadcrumbs back to the original data, we will create a new object for each modification instead of overwriting the original tibbles. Assigning a value to an object is a fundamental operation in R, as in most programming languages:

# create an object
x <- 3
x

## [1] 3
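The same idea is what preserves our breadcrumb trail: storing a result under a new name leaves the original object untouched. A minimal sketch with a made-up numeric vector (the object names and values here are illustrative only, not part of the tutorial data):

```r
# Made-up values for illustration; not taken from the tutorial datasets
original_values <- c(0.28, 0.41, 0.40)

# Store the modified version under a new name
doubled_values <- original_values * 2

# The original object still holds the values we started with
original_values
doubled_values
```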

This will come in handy when we want to modify one of our datasets. Sometimes the data we are seeking is not included in the original data but can be calculated from existing columns. For instance, it may be useful to calculate organic matter density (g cm^-3) by multiplying dry bulk density (g cm^-3) by the fraction of soil that is organic matter (unitless). To do this, we can use the mutate() operation:

# calculate organic matter density
depthseries_data_with_organic_matter_density <- mutate(depthseries_data, 
       organic_matter_density = dry_bulk_density * fraction_organic_matter)

head(depthseries_data_with_organic_matter_density)

## # A tibble: 6 x 9
##   study_id  core_id depth_min depth_max dry_bulk_density fraction_organic~
##   <chr>     <chr>       <int>     <int>            <dbl>             <dbl>
## 1 Boyd_2012 BBRC_1          0         2             0.28              0.28
## 2 Boyd_2012 BBRC_1          4         6             0.41              0.23
## 3 Boyd_2012 BBRC_1          8        10             0.4               0.27
## 4 Boyd_2012 BBRC_1         12        14             0.48              0.2 
## 5 Boyd_2012 BBRC_1         16        18             0.54              0.18
## 6 Boyd_2012 BBRC_1         20        22             0.59              0.16
## # ... with 3 more variables: fraction_carbon <dbl>,
## #   fraction_carbon_type <chr>, organic_matter_density <dbl>

Notice that we now have 9 columns instead of the 8 we had before. That's because we've successfully added organic matter density. We now have more columns than we can easily display in the console, which is one of many reasons you may want to control how many, and which, columns to include. In our present case, we want to simplify our dataset to just a few columns. For this, we select() the columns we want:

# Select just a handful of columns
depthseries_data_select <- select(depthseries_data_with_organic_matter_density, # Define dataset
                                  core_id, depth_min, depth_max, organic_matter_density) # Chosen columns

head(depthseries_data_select)

## # A tibble: 6 x 4
##   core_id depth_min depth_max organic_matter_density
##   <chr>       <int>     <int>                  <dbl>
## 1 BBRC_1          0         2                 0.0784
## 2 BBRC_1          4         6                 0.0943
## 3 BBRC_1          8        10                 0.108 
## 4 BBRC_1         12        14                 0.096 
## 5 BBRC_1         16        18                 0.0972
## 6 BBRC_1         20        22                 0.0944

This display looks much better. We can also perform both the mutate and select operations in the same command by using the pipe operator, %>%, which dplyr makes available (it originates in the magrittr package). The pipe is useful for a few reasons. First, you can call many operations in quick succession on the same dataset. Second, it makes for cleaner, more readable code: we can easily display each operation on a new line, and the dataset moves out of the first argument of each operation to the left of the pipe. You can say to yourself as you type %>%, "I'm putting the dataset on the left through the function below". Don't worry about being judged while talking to yourself; just tell people you're learning how to code in R and they will forgive a lot of strange behavior.

# Calculate organic matter density, then remove all non-critical columns
depthseries_data_select <- depthseries_data %>% # Define dataset
  # Separate operations with "%>%"
  mutate(organic_matter_density = dry_bulk_density * fraction_organic_matter) %>%
  # Even though we just created the organic_matter_density column,
  # we can immediately use it in the select() operation
  select(core_id, depth_min, depth_max, organic_matter_density)

head(depthseries_data_select)

## # A tibble: 6 x 4
##   core_id depth_min depth_max organic_matter_density
##   <chr>       <int>     <int>                  <dbl>
## 1 BBRC_1          0         2                 0.0784
## 2 BBRC_1          4         6                 0.0943
## 3 BBRC_1          8        10                 0.108 
## 4 BBRC_1         12        14                 0.096 
## 5 BBRC_1         16        18                 0.0972
## 6 BBRC_1         20        22                 0.0944
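To see that the piped and nested forms are two spellings of the same operation, here is a minimal sketch on a small made-up tibble (toy_data and its values are hypothetical, not the tutorial's depthseries_data). It also shows one more way to remove columns that the examples above do not use: putting a minus sign in front of a column name inside select() drops that column and keeps the rest.

```r
library(dplyr)

# Made-up values for illustration; not the tutorial's depthseries_data
toy_data <- tibble(
  core_id = c("core_1", "core_1", "core_2"),
  dry_bulk_density = c(0.5, 0.6, 0.4),
  fraction_organic_matter = c(0.2, 0.1, 0.3)
)

# Nested form: the dataset is the first argument of each function
nested <- select(
  mutate(toy_data,
         organic_matter_density = dry_bulk_density * fraction_organic_matter),
  core_id, organic_matter_density
)

# Piped form: the dataset on the left becomes the first argument on the right
piped <- toy_data %>%
  mutate(organic_matter_density = dry_bulk_density * fraction_organic_matter) %>%
  select(core_id, organic_matter_density)

# Compare the two results
all.equal(nested, piped)

# A minus sign drops the named columns and keeps everything else
dropped <- toy_data %>%
  mutate(organic_matter_density = dry_bulk_density * fraction_organic_matter) %>%
  select(-dry_bulk_density, -fraction_organic_matter)

names(dropped)
```

Listing the columns to keep is usually clearer when only a few are wanted; dropping by name tends to be shorter when nearly all of them should stay.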
