Coastal Carbon Database Structure

Naming Conventions for Attributes and Variables (Version 1)

 

3 July 2018

Overview

This page serves as guidance for the types and scope of data and metadata that will be archived as part of the Network’s developing tidal soil carbon synthesis. We propose the following data structure and standardized attribute names for metadata and data in order to make datasets machine-readable and interoperable. Each subheading lists a level of metadata or data hierarchy from study level metadata to site level to core level to depth series information. Each subheading also represents separate tables which can be joined by common attributes such as study_id, site_id, and core_id. We also include accompanying sets of recommended controlled vocabulary for key categorical variables (also known as factors). Some attributes have controlled units that we wish to keep uniform across datasets. Data that we curate will follow naming conventions outlined herein. Data that we ingest from outside sources will be converted to these conventions when being ingested into the central GitHub database using custom-built R scripts.
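As a minimal sketch of how the separate tables relate, the Python below joins hypothetical depth series rows to their core-level rows on the shared attributes study_id, site_id, and core_id. The row values and the helper join_on_keys are illustrative assumptions and are not part of the Network's ingestion scripts, which are written in R.

```python
# Hypothetical rows; identifiers and values are made up for illustration.
cores = [
    {"study_id": "Smith_et_al_2017", "site_id": "site_A", "core_id": "A1",
     "core_latitude": 38.01, "core_longitude": -122.10},
    {"study_id": "Smith_et_al_2017", "site_id": "site_A", "core_id": "A2",
     "core_latitude": 38.02, "core_longitude": -122.11},
]
depth_series = [
    {"study_id": "Smith_et_al_2017", "site_id": "site_A", "core_id": "A1",
     "depth_min": 0, "depth_max": 2, "dry_bulk_density": 0.31},
    {"study_id": "Smith_et_al_2017", "site_id": "site_A", "core_id": "A1",
     "depth_min": 2, "depth_max": 4, "dry_bulk_density": 0.35},
    {"study_id": "Smith_et_al_2017", "site_id": "site_A", "core_id": "A2",
     "depth_min": 0, "depth_max": 2, "dry_bulk_density": 0.28},
]

KEYS = ("study_id", "site_id", "core_id")


def join_on_keys(left, right, keys):
    """Inner-join two lists of dict rows on the shared key attributes."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            # Attributes from the left row win if a name appears in both tables.
            joined.append({**match, **row})
    return joined


# Each depth interval now carries the coordinates of the core it came from.
joined_rows = join_on_keys(depth_series, cores, KEYS)
```

Keeping the hierarchy in separate tables means core coordinates are stored once per core and attached to depth intervals only when a query needs them.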

At a minimum a submission should have the following for inclusion in soil carbon synthesis products: study_id, author information, core_id, latitude and longitude information associated with either a core or the site, depth_min, depth_max, dry_bulk_density, organic_matter_fraction and/or carbon_fraction. The more auxiliary detail that you provide, the more widely your data can be used. Throughout the tables below mandatory attributes are shown in bold.
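The minimum requirements above could be checked with something like the following sketch. It assumes one joined row per depth interval, with position supplied either as core coordinates or as a complete site bounding box. The function name and the wording of the messages are hypothetical, and author information is left out because it lives in its own table.

```python
ALWAYS_REQUIRED = ("study_id", "core_id", "depth_min", "depth_max",
                   "dry_bulk_density")
CARBON_ATTRIBUTES = ("organic_matter_fraction", "carbon_fraction")
CORE_POSITION = ("core_latitude", "core_longitude")
SITE_POSITION = ("site_latitude_min", "site_latitude_max",
                 "site_longitude_min", "site_longitude_max")


def is_present(row, name):
    """A value counts as present unless it is missing, None, or empty text."""
    value = row.get(name)
    return value is not None and value != ""


def minimum_attribute_problems(row):
    """Return a list of problems; an empty list means the row qualifies."""
    problems = ["missing " + name for name in ALWAYS_REQUIRED
                if not is_present(row, name)]
    if not any(is_present(row, name) for name in CARBON_ATTRIBUTES):
        problems.append("need organic_matter_fraction and/or carbon_fraction")
    has_core_position = all(is_present(row, n) for n in CORE_POSITION)
    has_site_position = all(is_present(row, n) for n in SITE_POSITION)
    if not (has_core_position or has_site_position):
        problems.append("need core or site latitude and longitude")
    return problems


# Hypothetical example rows.
complete_row = {"study_id": "Smith_et_al_2017", "core_id": "A1",
                "depth_min": 0, "depth_max": 2, "dry_bulk_density": 0.31,
                "carbon_fraction": 0.12,
                "core_latitude": 38.01, "core_longitude": -122.10}
incomplete_row = {"study_id": "Smith_et_al_2017", "core_id": "A2",
                  "depth_min": 0, "depth_max": 2}
```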

The depth series is the level at which carbon-relevant information is housed. This synthesis will not ingest core-level or site-level averages of variables like dry bulk density, fraction organic matter, or fraction carbon. These averages can be derived from the database, but are not immediately useful to our research questions unless those averages can be traced back to their original data.
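To illustrate how a core-level average can be derived from the depth series rather than submitted directly, the sketch below computes a thickness-weighted mean of any depth series attribute per core. The helper name and the example values are assumptions made for illustration.

```python
from collections import defaultdict


def depth_weighted_means(rows, attribute):
    """Thickness-weighted mean of one attribute for each (study_id, core_id)."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [weighted sum, total thickness]
    for row in rows:
        value = row.get(attribute)
        if value is None:
            continue  # skip intervals where the attribute was not measured
        thickness = row["depth_max"] - row["depth_min"]
        key = (row["study_id"], row["core_id"])
        totals[key][0] += value * thickness
        totals[key][1] += thickness
    return {key: weighted / thick
            for key, (weighted, thick) in totals.items() if thick > 0}


# Hypothetical depth series: a 2 cm interval followed by a 4 cm interval.
example_intervals = [
    {"study_id": "Smith_et_al_2017", "core_id": "A1",
     "depth_min": 0, "depth_max": 2, "dry_bulk_density": 0.2},
    {"study_id": "Smith_et_al_2017", "core_id": "A1",
     "depth_min": 2, "depth_max": 6, "dry_bulk_density": 0.5},
]
core_means = depth_weighted_means(example_intervals, "dry_bulk_density")
```

Because the average is recomputed from the intervals, it can always be traced back to the original measurements.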

There are many opportunities to express your data’s individuality. We refer throughout to ‘flags’ and ‘notes’. Flags refer to common methodological choices or data issues that can be coded using categorical variables. The idea behind flags is to allow users the option to query datasets based on methodology. Flags are very machine-readable but not very flexible from the standpoint of a submitter. Notes are available for almost all measured attributes and take the form of free-text allowing submitters to provide context, observations, or concerns about methods, sites, cores, or observations. These are more flexible from the perspective of a submitter but are less machine-readable.
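As a small illustration of why flags are easy to query, the sketch below selects studies by a coded methodological choice while leaving the free-text notes untouched. The rows and the helper name are hypothetical. The flag codes are taken from the Materials and Methods table.

```python
# Hypothetical study-level methods rows.
methods = [
    {"study_id": "Smith_et_al_2017",
     "compaction_flag": "compaction quantified",
     "carbon_profile_notes": "Cores were sectioned in the field."},
    {"study_id": "Lee_et_al_2015",
     "compaction_flag": "not specified",
     "carbon_profile_notes": ""},
    {"study_id": "Garcia_Chen_2012",
     "compaction_flag": "no obvious compaction",
     "carbon_profile_notes": "Some cores were shorter than planned."},
]


def studies_with_flag(rows, flag_name, accepted_codes):
    """Return sorted study_ids whose flag matches one of the accepted codes."""
    accepted = set(accepted_codes)
    return sorted({row["study_id"] for row in rows
                   if row.get(flag_name) in accepted})


# Keep only studies in which compaction was addressed in some way.
selected = studies_with_flag(
    methods, "compaction_flag",
    ["compaction quantified", "no obvious compaction"])
```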

Development Process to Date

This guidance is the culmination of three efforts:

  1. A meeting of 47 experts in Menlo Park, CA in January 2016, hosted by the United States Carbon Cycle Science Program, in order to establish community priorities.

  2. Experience with the initial curation of a dataset of ~1,500 public soil cores as part of the publication by Holmquist et al. (2018), “Accuracy and Precision of Tidal Wetland Soil Carbon Mapping in the Conterminous United States”.

  3. The results of 19 collaborators submitting commentary on an initial draft of these recommendations put up for public comment in April and May 2018.

Ongoing and Future Development

We acknowledge that this is a lot of information to process and do not want to imply >100 attributes are mandatory. They are not. While the entire entry template is available for download, we are also in the process of designing an application which will generate a custom submission template based on your answers to a questionnaire about your dataset.

Submitters can feel free to add other attributes to data submissions as long as the attributes and any associated categorical variables are defined with the submission. CCRCN personnel will accept and archive related soils data within reason, but will not be able to quality control data falling outside the outlined guidance. If attributes or variables are submitted often and there is community coordination behind their inclusion, they could be integrated into periodic updates to this guidance.

We anticipate that this guidance will evolve as we synthesize new datasets as part of five working groups. Part of each working group’s task will be to revisit this guidance and agree on new needed attribute names, definitions, variables, controlled vocabulary and units. Any further guidance based on the working group’s experience will be made available to the community via post-workshop reports and peer reviewed publications. Documentation on any changes to the data management plan and submission templates will be issued with version numbers. CCRCN products will reference these documents and version numbers. We will avoid changing attribute or variable names, and will only do so if there is a compelling reason to. If in the future there ends up being more than one acceptable redundant attribute or variable name, names will be added to a database of synonyms and working synthesis products will be updated given the most current standards.

Study Level Metadata

Study-level information is essential for formatting the Ecological Metadata Language, and is a great way for you to express your project’s history, context, and originality.

Study Information

Please provide some custom text for your study.

Study Information
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
one_liner If this is data the CCRCN curates, the submitter should include a one line description of the study. character  
study_code If this is data the CCRCN curates, the study will be assigned a 128-bit universal unique identifier. Submitters should only include this if it already exists for the data. Otherwise CCRCN personnel will generate this as part of the data ingestion process. character  
study_start_date Study start date. Date YYYY-MM-DD
study_end_date Study end date. Date YYYY-MM-DD
title If this is data the CCRCN curates, the submitter should include a study title. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source text. character  
abstract If this is data the CCRCN curates, the submitter should include a one paragraph description of the study. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source text. character  
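The sketch below shows one way the study_id and study_code attributes could be generated. It is an interpretation, not the Network's procedure: it assumes that two-author studies use both family names, that any study with more authors uses "et_al", and that spaces inside a name become underscores. The standard library's uuid4 yields a random 128-bit universally unique identifier. Both function names are hypothetical.

```python
import uuid


def make_study_id(family_names, year):
    """Build an underscore-separated study_id from author family names."""
    if not family_names:
        raise ValueError("at least one author family name is required")
    parts = [family_names[0]]
    if len(family_names) == 2:
        parts.append(family_names[1])
    elif len(family_names) > 2:
        # Assumption: 'et al.' is written as "et_al" inside the identifier.
        parts.append("et_al")
    parts.append(str(year))
    return "_".join(part.replace(" ", "_") for part in parts)


def make_study_code():
    """Return a random 128-bit UUID as a 36-character string."""
    return str(uuid.uuid4())


# Hypothetical author lists.
example_id = make_study_id(["Smith", "Jones", "Lee"], 2017)
two_author_id = make_study_id(["Garcia", "Chen"], 2012)
example_code = make_study_code()
```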

Keywords

Keywords are not necessary, but can help make your data more searchable in a database.

Keywords
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
key_words If this is data the CCRCN curates, the submitter should include five to fifteen descriptive words or phrases describing the study. If this is data the CCRCN is ingesting, this can be pulled from the metadata or source. Keywords help build some search functionality into the databases. character  

Authors

For each dataset at least one corresponding author should be specified. Specifying author names will allow users (or you in the future) to query the dataset and see how many cores you’ve submitted.

Authors
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
last_name Submitter’s family name. character  
given_name Submitter’s first name, middle name, middle initial, or any other names. character  
institution Submitter’s current institution. character  
email Submitter’s current email address. character  
address Submitter’s current mailing address. character  
phone Submitter’s current phone number. character  
corresponding_author TRUE or FALSE indicating whether the author should be contacted as the corresponding author. factor TRUE = The author should be contacted with any further questions. FALSE = The author should not be contacted with any further questions.

Funding Sources

Your funders will love being acknowledged in a data release, and will appreciate being searchable in the database. One dataset can have multiple funding sources.

Funding
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
funding_agency Agency name funding the research, spelled out, no acronyms. character  
funding_id Code used by the agency to track the project funding. character  
funding_notes Any other submitter-generated notes about the project funding. character  

Associated Publications

One dataset can be affiliated with multiple publications. This allows an original work to be cited as a primary source, as well as any secondary or synthesis papers that added value to the dataset’s archival. Submitters can simply add a BibTeX-style citation, such as one copied over from Google Scholar, or they can fill out all of the relevant attributes for the data release. It’s all the same to us. Much of this guidance came from the Wikipedia page for BibTeX.

Associated Publications
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
bibtex_citation Submitters can associate multiple BibTeX style citations with the dataset. They can also include the same information by filling out following attributes in tabular form if more convenient than BibTeX formatting. character  
publication_type Code indicating the type of publication the study originates from. factor article = Journal article. book = Book. mastersthesis = Master’s Thesis. misc = Miscellaneous publications such as online datasets. phdthesis = PhD thesis or dissertation. techreport = Technical report. unpublished = Unpublished source.
author The names of the authors, separated by “and”. character  
year The year of publication or, if unpublished, the year of creation. Date YYYY
title The title of the work. character  
journal The journal or magazine the work was published in. character  
volume The volume of a journal or multi-volume book. character  
number The “(issue) number” of a journal, magazine, or tech-report, if applicable. (Most publications have a “volume”, but no “number” field.) character  
pages Page numbers, separated either by commas or double-hyphens. character  
url Permanent web address where the work can be located. character  
doi Digital object identifier associated with the work. character  
address Publisher’s address (usually just the city, but can be the full address for lesser-known publishers). character  
annote An annotation for annotated bibliography styles (not typical). character  
booktitle The title of the book, if only part of it is being cited. character  
chapter The chapter number. character  
crossref The key of the cross-referenced entry. character  
edition The edition of a book, long form (such as “First” or “Second”). character  
editor The name(s) of the editor(s). character  
howpublished How it was published, if the publishing method is nonstandard. character  
institution The institution that was involved in the publishing, but not necessarily the publisher. character  
key A hidden field used for specifying or overriding the alphabetical order of entries (when the “author” and “editor” fields are missing). Note that this is very different from the citation key that is used to cite or cross-reference the entry. character  
month The month of publication (or, if unpublished, the month of creation). Date MM
note Miscellaneous extra information. character  
organization The conference sponsor. character  
publisher The publisher’s name. character  
school The school where the thesis was written. character  
series The series of books the book was published in (e.g. “The Hardy Boys” or “Lecture Notes in Computer Science”). character  
type The field overriding the default type of publication (e.g. “Research Note” for techreport, “{PhD} dissertation” for phdthesis, “Section” for inbook/incollection). character  
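Because the tabular attributes mirror BibTeX fields, either form can be produced from the other. The following sketch assembles a BibTeX-style entry from one row of the table above. The function name, the citation key, and the publication shown are hypothetical, and no escaping of special characters is attempted.

```python
def to_bibtex(cite_key, publication):
    """Format one row of publication attributes as a BibTeX-style entry."""
    entry_type = publication.get("publication_type") or "misc"
    # These attributes belong to the table, not to the BibTeX entry body.
    skipped = {"study_id", "publication_type", "bibtex_citation"}
    lines = ["@%s{%s," % (entry_type, cite_key)]
    for name, value in publication.items():
        if name in skipped or value in (None, ""):
            continue
        lines.append("  %s = {%s}," % (name, value))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical publication record, for illustration only.
publication = {
    "study_id": "Smith_Jones_2017",
    "publication_type": "article",
    "author": "Smith, A. and Jones, B.",
    "year": "2017",
    "title": "An example tidal wetland soil carbon dataset",
    "journal": "Journal of Examples",
    "volume": "12",
    "number": "",
    "pages": "101--110",
}
bibtex_citation = to_bibtex("Smith_Jones_2017", publication)
```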

Materials and Methods

For each study please fill out key data regarding materials and methods that are important to the soil carbon stocks meta-analysis. Some users may want to include or exclude certain methodologies, or see your commentary on the methods. Let’s make it easy for them.

Materials and Methods
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
coring_method Code indicating what type of device was used to collect soil depth profiles. factor gouge auger = A half cylinder coring device in which the coring section is open, not sealed off by a fin. hargas corer = A large diameter (>10 cm) coring device consisting of a tube, piston, and a cutting head. mcauley corer = A half cylinder coring device with the coring section sealed off by a fin attached to a rotating pivot point. mccaffrey peat cutter = U-shaped blade that extracts a core by cutting down through peat. none specified = No coring device was specified. other shallow corer = Any other type of coring device typically taking cores shallower than 30 centimeters. piston corer = A device that extrudes core into tube upward with a plunger. push core = Any number of coring types involving driving a tube into the sediment to recover a core. pvc and hammer = PVC pipe was driven into the sediment with a hammer to recover a core. russian corer = A half cylinder coring device with the coring section sealed off by a fin attached to a rotating pivot point. vibracore = A technique involving collecting a core by sinking a continuous pipe into sediment attaching a source of vibration, then recovering using a winch and pulley. surface sample = A technique involving collecting a core shallower than ~5 cm using a circular metal cutter.
roots_flag Code indicating whether live roots were included or excluded from carbon assessments. factor roots and rhizomes included = Roots and rhizomes were included in dry bulk density and/or organic matter and carbon measurements. roots and rhizomes separated = Roots and rhizomes were separated from soil before dry bulk density and/or organic matter and carbon measurements.
sediment_sieved_flag Code indicating whether or not sediment was sieved prior to carbon measurements. factor sediment sieved = Sediment was sieved prior to analysis for organics. sediment not sieved = Sediment was not sieved prior to analysis for organics.
sediment_sieve_size If sediment was sieved, the size of sieve used. numeric millimeters
compaction_flag Code indicating how the authors qualified or quantified compaction of the core. factor compaction qualified = Compaction was at least qualified and noted by the authors. compaction quantified = Compaction was quantified and corrected for in core based measurements. corer limits compaction = Authors specified that the coring device’s design minimized compaction. no obvious compaction = Authors observed no obvious compaction. not specified = Compaction was not specified.
dry_bulk_density_temperature Temperature at which samples were dried to measure dry bulk density. This can include either samples that were freeze dried or oven dried. numeric celsius
dry_bulk_density_time Time over which samples were dried to measure dry bulk density. numeric hour
dry_bulk_density_sample_volume Sample volume used for bulk density measurements, if held constant. numeric cubicCentimeters
dry_bulk_density_sample_mass Sample mass used for bulk density measurements, if held constant. numeric grams
dry_bulk_density_flag Any notable codes regarding how the authors quantified dry bulk density. factor air dried to constant mass = Methodology specified that samples were air dried to a constant mass. modeled = Bulk density was not measured, but was modeled from loss on ignition and assumptions about the particle densities of organic and inorganic matter. freeze dried = Bulk density was measured on freeze dried samples. not specified = No additional details regarding bulk density methodology were provided. removed non structural water = Bulk density methodology did not specify drying temperature or length, only that non-structural water was removed. time approximate = Bulk density time recorded herein is an approximate estimate. to constant mass = Bulk density methodology did not specify drying temperature or length, only that samples were dried to a constant mass.
loss_on_ignition_temperature Temperature at which samples were combusted to estimate fraction organic matter. numeric celsius
loss_on_ignition_time Time over which samples were combusted to estimate fraction organic matter. numeric hour
loss_on_ignition_sample_volume Sample volume used for loss on ignition, if held constant. numeric cubicCentimeters
loss_on_ignition_sample_mass Sample mass used for loss on ignition, if held constant. numeric grams
loss_on_ignition_flag Common codes regarding loss on ignition methodology. factor time approximate = Loss on ignition time recorded herein is an approximate estimate. not specified = No additional details regarding loss on ignition methodology or time were provided.
carbon_measured_or_modeled Code indicating whether fraction carbon was measured or estimated as a function of organic matter. factor measured = Fraction carbon was measured as opposed to modeled. modeled = Fraction carbon was modeled as opposed to measured.
carbonates_removed Whether or not carbonates were removed prior to calculating fraction organic carbon. factor FALSE = Carbonates were not removed before measuring organic carbon. TRUE = Carbonates were removed before measuring organic carbon.
carbonate_removal_method The method used to remove carbonates prior to measuring fraction carbon. factor direct acid treatment = Carbonates were removed using direct application of dilute acid. acid fumigation = Carbonates were removed by fumigating with concentrated acid. low carbonate soil = Organic carbon fraction was measured without removing carbonates assuming carbonate content of the soil type was minimal. carbonates not removed = Carbonates were not removed and low carbonate soil was not specified. none specified = Carbonate removal methodology was not specified.
fraction_carbon_method Code indicating the method by which fraction carbon was measured or modeled (Note: regression based models are permitted, but the use of the Bemmelen factor [0.58 gOC gOM-1] is discouraged). factor Craft regression = Used regression model from Craft et al., 1991, Estuaries, to predict fraction carbon as a function of fraction organic matter. EA = Each sample presented was measured using Elemental Analysis. Fourqurean regression = Used regression model from Fourqurean et al., 2012, Nature Geoscience, to predict fraction carbon as a function of fraction organic matter. Holmquist regression = Used regression model from Holmquist et al., 2018, Scientific Reports, to predict fraction carbon as a function of fraction organic matter. kjeldahl digestion = Each sample was measured using the Kjeldahl digestion method. local regression = A regression model fit using a subset of measurements was used to predict fraction carbon as a function of fraction organic matter. not specified = No additional details were provided regarding fraction carbon methodologies. wet oxidation = Each sample was measured using a wet oxidation method.
fraction_carbon_type Code indicating whether fraction_carbon refers to organic or total carbon. factor organic carbon = Author specified that fraction carbon measurements were of organic carbon. total carbon = Author specified that fraction carbon measurements were of total carbon.
carbon_profile_notes Any other submitter defined notes describing methodologies for determining dry bulk density, organic matter fraction, and carbon fraction. character  
cs137_counting_method Code indicating the method used for determining radiocesium activity. factor alpha = Alpha counting method used. gamma = Gamma counting method used.
pb210_counting_method Code indicating the method used for determining lead 210 activity. factor alpha = Alpha counting method used. gamma = Gamma counting method used.
excess_pb210_rate Code indicating the mass or accretion rate used in the excess_pb210_model. factor mass accumulation = Excess 210Pb modeled using mass accumulation rate. accretion = Excess 210Pb modeled using vertical accretion rate.
excess_pb210_model Code indicating the model used to estimate excess lead 210. factor CRS = Constant rate of supply model used. CIC = Constant initial concentration model used. CFCS = Constant flux constant sedimentation model used.
ra226_assumption Code indicating the assumption used to estimate the core’s background 226Ra levels. factor each sample = 226Ra was measured for each sample. total core = 226Ra was measured for the total core, at asymptote = asy
c14_counting_method Code indicating the method used for determining radiocarbon activity. factor AMS = Accelerator mass spectrometry used. beta = Beta counting used.
dating_notes Any submitter defined notes elaborating on the process of dating the core not yet made clear by the coding. character  
age_depth_model_reference Code indicating the reference or 0 year of the age depth model. factor YBP = Year zero is defined as years before present, 1950 CE. CE = Year zero is set according to Common Era and Before Common Era standards. core collection date = Year zero is set as the core’s collection year.
age_depth_model_notes Any submitter defined notes on how the age depth model was created. character  
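Controlled vocabularies make it possible to check factor attributes automatically. The sketch below validates a methods row against a handful of the code lists defined in the table above. Only a subset of attributes is included, and the dictionary layout and function name are assumptions made for illustration.

```python
# A subset of the controlled vocabulary from the Materials and Methods table.
CONTROLLED_VOCABULARY = {
    "carbon_measured_or_modeled": {"measured", "modeled"},
    "fraction_carbon_type": {"organic carbon", "total carbon"},
    "cs137_counting_method": {"alpha", "gamma"},
    "pb210_counting_method": {"alpha", "gamma"},
    "excess_pb210_model": {"CRS", "CIC", "CFCS"},
    "c14_counting_method": {"AMS", "beta"},
}


def invalid_codes(row):
    """Return {attribute: value} for values outside the controlled vocabulary."""
    problems = {}
    for attribute, allowed in CONTROLLED_VOCABULARY.items():
        value = row.get(attribute)
        # Missing or empty values are allowed; these attributes are optional.
        if value not in (None, "") and value not in allowed:
            problems[attribute] = value
    return problems


# Hypothetical row: codes are case-sensitive, so "crs" is reported.
methods_row = {
    "study_id": "Smith_et_al_2017",
    "carbon_measured_or_modeled": "measured",
    "pb210_counting_method": "alpha",
    "excess_pb210_model": "crs",
}
code_problems = invalid_codes(methods_row)
```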

Site Level

Site information provides important context for your study. You should describe the site and how it fits into your broader study, provide geographic information (although this can be generated automatically from the cores as well), and add any relevant tags and notes regarding site vegetation and inundation. Vegetation and inundation can alternatively be incorporated into the core-level data, whatever makes the most sense for your study design.

Site Information
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
site_id Site identification code unique to each study. character  
site_description Site description including relevant study details and political geographic units. Some of these descriptions can be automated by the ingestion code. character  
site_latitude_max Maximum latitude defining a bounding box for the site in decimal degree World Geodetic System of 1984 (WGS84). This can also be generated automatically by the ingestion code. numeric degree
site_latitude_min Minimum latitude defining a bounding box for the site in decimal degree WGS84. This can also be generated automatically by the ingestion code. numeric degree
site_longitude_max Maximum longitude defining a bounding box for the site in decimal degree WGS84. This can also be generated automatically by the ingestion code. numeric degree
site_longitude_min Minimum longitude defining a bounding box for the site in decimal degree WGS84. This can also be generated automatically by the ingestion code. numeric degree
site_boundaries As an alternative to submitting or automatically generating a bounding box, submitters can include a shapefile (.shp) or keyhole markup language (.kml) documenting the geographic boundaries of the site. This can be converted to and stored in well known text (WKT) format. character  
salinity_class Code based on submitter field observation or measurement indicating average annual salinity (Note: Palustrine and freshwater should only include tidal wetlands, or wetlands that are potentially/formerly tidal but artificially freshened due to artificial tidal restrictions). factor estuarine C-CAP = 5-35 parts per thousand salinity (ppt) according to the coastal change analysis program. palustrine C-CAP = < 5 ppt according to the coastal change analysis program. estuarine = 0.5-35 ppt according to most other definitions. palustrine = < 0.5 ppt according to most other definitions. brine = >50 ppt. saline = 30-50 ppt. brackish = 0.5-30 ppt. fresh = <0.5 ppt. mixoeuhaline = 30-40 ppt. polyhaline = 18-30 ppt. mesohaline = 5-18 ppt. oligohaline = 0.5-5 ppt.
salinity_method Indicate whether salinity_class was determined using a field observation or a measurement. factor field observation = Salinity inferred by field observation such as vegetation. measurement = Salinity observed from local instrument.
salinity_notes Any relevant submitter generated notes on how salinity_class was determined. character  
vegetation_class Code based on submitter field observations or measurement indicating dominant wetland vegetation type. factor emergent = Describes wetlands dominated by persistent emergent vascular plants. scrub shrub = Describes wetlands dominated by woody vegetation <= 5 meters in height. forested = Describes wetlands dominated by woody vegetation > 5 meters in height. forested to shrub = Dominated by forested to scrub/shrub biomass. forested to emergent = Dominated by forest and underlying marsh. seagrass = Describes tidal or subtidal communities dominated by rooted vascular plants.
vegetation_method Indicate whether vegetation_class was determined using a field observation or a measurement. factor field observation = Vegetation inferred by field observation. measurement = Vegetation measured by counts or plots.
vegetation_notes Any relevant submitter generated notes on how vegetation_class was determined. character  
inundation_class Code based on submitter field observation or measurement indicating how often the coring location is inundated. factor high = Study-specific definition of an elevation relatively high in the tidal frame, typically defined by vegetation type. mid = Study-specific definition of an elevation in the relative middle of the tidal frame, typically defined by vegetation type. low = Study-specific definition of an elevation relatively low in the tidal frame, typically defined by vegetation type. levee = Study-specific definition of a relatively high elevation zone built up on the edge of a river, creek, or channel. back = Study-specific definition of a relatively low elevation zone behind a levee.
inundation_method Indicate whether inundation_class was determined using a field observation or a measurement. factor field observation = Inundation inferred by field observation such as vegetation. measurement = Inundation class assessed from elevation and nearby tide gauge or other similar method.
inundation_notes Any relevant submitter generated notes on how inundation was determined. character  
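The site bounding box attributes can be derived from core coordinates, as noted above. The following sketch shows one way to do this and to express the result as a well known text polygon. It assumes WGS84 decimal degrees and a site that does not straddle the antimeridian. The function names and coordinates are hypothetical.

```python
def site_bounding_box(cores):
    """Derive the four bounding box attributes from core coordinates."""
    latitudes = [core["core_latitude"] for core in cores]
    longitudes = [core["core_longitude"] for core in cores]
    return {
        "site_latitude_max": max(latitudes),
        "site_latitude_min": min(latitudes),
        "site_longitude_max": max(longitudes),
        "site_longitude_min": min(longitudes),
    }


def bounding_box_to_wkt(box):
    """Express the bounding box as a closed WKT polygon (x = lon, y = lat)."""
    lon_min, lon_max = box["site_longitude_min"], box["site_longitude_max"]
    lat_min, lat_max = box["site_latitude_min"], box["site_latitude_max"]
    corners = [(lon_min, lat_min), (lon_max, lat_min), (lon_max, lat_max),
               (lon_min, lat_max), (lon_min, lat_min)]  # repeat first corner
    return "POLYGON ((" + ", ".join("%s %s" % c for c in corners) + "))"


# Hypothetical cores from a single site.
site_cores = [
    {"core_id": "A1", "core_latitude": 38.01, "core_longitude": -122.10},
    {"core_id": "A2", "core_latitude": 38.02, "core_longitude": -122.11},
]
box = site_bounding_box(site_cores)
site_boundaries = bounding_box_to_wkt(box)
```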

Core Level

Note that positional data can be assigned at the core level, or at the site level. However, it is important that this is specified, that site coordinates are not attributed as core coordinates, and that the method of measurement and precision is noted. Vegetation and inundation can alternatively be incorporated into the site-level data, whatever makes the most sense to your study design. In the future this level of hierarchy will be complemented by a ‘subsite level’ as this level of hierarchy can handle any sublocation information such as vegetation plot, and instrument location/description.
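When a position was not taken with RTK or a handheld GPS, the core_position_method codes in the table below distinguish techniques by positional error. A minimal sketch of that mapping follows. The function name is hypothetical, and because the table leaves an error of exactly 30 meters undefined, this sketch assigns it to the low resolution code as an assumption.

```python
def position_resolution_code(error_m):
    """Map a positional error in meters to an 'other ...' resolution code."""
    if error_m < 0:
        raise ValueError("positional error cannot be negative")
    if error_m < 1:
        return "other high resolution"
    if error_m < 30:
        return "other moderate resolution"
    # Assumption: exactly 30 meters is grouped with errors above 30 meters.
    return "other low resolution"


# Hypothetical accuracies in meters.
codes = [position_resolution_code(value) for value in (0.5, 10, 100)]
```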

Core- (or Subsite-) Level Information
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. factor NA
site_id Site identification code unique to each study. factor NA
core_id Core identification code unique to each site. factor NA
core_date Date of core collection. Date YYYY-MM-DD
core_notes Any other relevant submitter generated notes on how cores were collected. character  
core_latitude Positional latitude of the core in decimal degree WGS84. numeric degree
core_longitude Positional longitude of the core in decimal degree WGS84. numeric degree
core_position_accuracy Accuracy of latitude and longitude measurement, if determined and recorded. numeric meter
core_position_method Code indicating how latitude and longitude were determined. factor

RTK = Real-time kinematic global position system (GPS). handheld = Conventional Commercially available hand-held GPS. other high resolution = Any other technique resulting in positional error < 1 meter. other moderate resolution = Any other technique resulting in positional error < 30 meters. other low resolution = Any other technique resulting in positional error > 30 meters.

core_position_notes Any relevant submitter generated notes on how latitude and longitude were determined. character  
core_elevation Surface elevation of the core relative to defined datum. numeric meters
core_elevation_datum The datum relative to which the core elevation was measured against (For a complete list of datum names and aliases please refer to the ISO Geodedic Registry https://geodetic.isotc211.org/). factor

NAVD88 = A gravity-based geodetic datum, North American Vertical Datum of 1988. MSL = A tidal datum, Mean Sea Level as measured against a local tide gauge. MTL = A tidal datum, Mean Tidal Level as measured against a local tide gauge. MHW = A tidal datum, Mean High Water as measured against a local tide gauge. MHHW = A tidal datum, Mean Higher High Water as measured against a local tide gauge. MHHWS = A tidal datum, Mean Higher High Water for Spring Tides as measured against a local tide gauge. MLW = A tidal datum, Mean Low Water as measured against a local tide gauge. MLLW = A tidal datum, Mean Lower Low Water as measured against a local tide gauge.

core_elevation_accuracy Accuracy of elevation measurement, if determined and recorded numeric meters
core_elevation_method Code indicating how elevation was determined factor

RTK = Real-time kinematic GPS. other high resolution = Any other technique resulting in positional error < 5 cm of random error. LiDAR = Handheld GPS matched to lidar-based digital elevation model. DEM = Handheld GPS matched to another digital elevation model. other low resolution = Any other technique resulting in positional error > 5 cm of random error.

core_elevation_notes Any relevant submitter generated notes on how elevation was determined character  
salinity_class Code based on submitter field observation or measurement indicating average annual salinity (Note: Palustrine and freshwater should only include tidal wetlands, or wetlands that are potentially/formerly tidal but artificially freshened due to artificial tidal restrictions). factor

estuarine C-CAP = 5-35 parts per thousand salinity (ppt) according to the coastal change analysis program. palustrine C-CAP = < 5 ppt according to the coastal change analysis program. estuarine = 0.5-35 ppt according to most other definitions. palustrine = < 0.5 ppt according to most other definitions. brine = >50 ppt. saline = 30-50 ppt. brackish = 0.5-30 ppt. fresh = <0.5 ppt. mixoeuhaline = 30-40 ppt. polyhaline = 18-30 ppt. mesohaline = 5-18 ppt. oligohaline = 0.5-5 ppt.

salinity_method Indicate whether salinity_class was determined using a field observation or a measurement factor

field observation = Salinity inferred by field observation such as vegetation. measurement = Salinity observed from local instrument.

salinity_notes Any relevant submitter generated notes on how salinity_class was determined character  
vegetation_class Code based on submitter field observations or measurement indicating dominant wetland vegetation type. factor

emergent = Describes wetlands dominated by persistent emergent vascular plants. scrub shrub = Describes wetlands dominated by woody vegetation < 5 meters in height. forested = Describes wetlands dominated by woody vegetation > 5 meters in height. seagrass = Describes tidal or subtidal communities dominated by rooted vascular plants.

vegetation_method Indicate whether vegetation_class was determined using a field observation or a measurement factor

field observation = Vegetation inferred by field observation. measurement = Vegetation measured by counts or plots.

vegetation_notes Any relevant submitter generated notes on how vegetation_class and dominant_species were determined. character  
inundation_class Code based on submitter field observation or measurement indicating how often the coring location is inundated. factor

high = Study-specific definition of an elevation relatively high in the tidal frame, typically defined by vegetation type. mid = Study-specific definition of an elevation in the relative middle of the tidal frame, typically defined by vegetation type. low = Study-specific definition of an elevation relatively low in the tidal frame, typically defined by vegetation type. levee = Study-specific definition of a relatively high elevation zone built up on the edge of a river, creek, or channel. back = Study-specific definition of a relatively low elevation zone behind a levee.

inundation_method Indicate whether inundation_class was determined using a field observation or a measurement factor

field observation = Inundation inferred by field observation such as vegetation. measurement = Inundation class assessed from elevation and a nearby tide gauge or other similar method.

inundation_notes Any relevant submitter generated notes on how inundation_class was determined character  
core_length_flag Indicates whether the coring team believes they recovered a full sediment profile, down to bedrock or another non-marsh interface. factor

core depth limited by length of corer = The total depth of the core was limited by the length of the coring device. core depth represents deposit depth = Authors report that the depth of the core represents the depth of the wetland soil deposit. not specified = Authors did not specify whether or not the depth of the core represents the depth of the wetland soil deposit.
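
The salinity_class thresholds listed earlier in this table can be applied to a measured salinity with a small lookup. The following is only a sketch in Python: the function name classify_salinity is hypothetical, it covers just the fresh through mixoeuhaline codes, and the treatment of values that fall exactly on a boundary (lower bound included, upper bound excluded) is an assumption that the table itself does not settle.

```python
from typing import Optional

# (lower bound inclusive, upper bound exclusive, salinity_class code), in ppt.
# Which class owns a value exactly on a boundary is an assumption of this sketch.
SALINITY_BINS = [
    (0.0, 0.5, "fresh"),
    (0.5, 5.0, "oligohaline"),
    (5.0, 18.0, "mesohaline"),
    (18.0, 30.0, "polyhaline"),
    (30.0, 40.0, "mixoeuhaline"),
]


def classify_salinity(ppt: float) -> Optional[str]:
    """Return a salinity_class code for a salinity in ppt, or None if no bin matches."""
    for lower, upper, name in SALINITY_BINS:
        if lower <= ppt < upper:
            return name
    return None


print(classify_salinity(12.0))
```

A value obtained this way would be recorded with salinity_method set to measurement.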

Soil Depth Series

This level of hierarchy contains the actual depth series information. At a minimum, a submission needs to specify the minimum and maximum depth of each increment, dry bulk density, and either fraction organic matter or fraction carbon. Sample IDs should be used when there are multiple replicates of a measurement. There is plenty of room for recording raw data from various dating techniques as well as age depth models.
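
The minimum requirements above can be checked before submission. The sketch below, in Python, uses the attribute names from the table that follows; the function name check_depth_row is hypothetical, and the rule that fractions must lie between 0 and 1 is an assumption drawn from their definition as a mass relative to sample dry mass.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def check_depth_row(row: Row) -> List[str]:
    """Return a list of problems found in one depth series row (empty if none)."""
    problems = []
    depth_min = row.get("depth_min")
    depth_max = row.get("depth_max")
    if depth_min is None or depth_max is None:
        problems.append("depth_min and depth_max are required")
    elif depth_min >= depth_max:
        problems.append("depth_min must be smaller than depth_max")
    if row.get("dry_bulk_density") is None:
        problems.append("dry_bulk_density is required")
    fractions = [row.get("fraction_organic_matter"), row.get("fraction_carbon")]
    if all(value is None for value in fractions):
        problems.append("fraction_organic_matter or fraction_carbon is required")
    for value in fractions:
        # Assumption: a value above 1 usually means a percent was entered.
        if value is not None and not 0 <= value <= 1:
            problems.append("fractions must lie between 0 and 1")
    return problems


print(check_depth_row({"depth_min": 0, "depth_max": 2,
                       "dry_bulk_density": 0.4, "fraction_carbon": 0.12}))
```

An empty list means the row meets the minimum; each string otherwise names one thing to fix.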

Soil Depth Series Information
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
site_id Site identification code unique to each study character  
core_id Core identification code unique to each site character  
depth_min Minimum depth of a sampling increment. numeric centimeter
depth_max Maximum depth of a sampling increment. numeric centimeter
sample_id Sample identification unique to the core. This should be used in the case that there are relevant lab specific sample codes, or in the case that there are multiple replicate samples. character  
dry_bulk_density Dry mass per unit volume of a soil sample. This does not include ash free bulk density. numeric gramsPerCubicCentimeter
fraction_organic_matter Mass of organic matter relative to sample dry mass. Ash free bulk density should not be used here but should be expressed as a loss on ignition fraction. numeric dimensionless
fraction_carbon Mass of carbon relative to sample dry mass. numeric dimensionless
compaction_fraction Fraction of the sample depth interval reduced due to compaction. numeric dimensionless
compaction_notes Any submitter generated notes on compaction. character  
cs137_activity Radioactivity counts per unit dry weight for radiocesium (137Cs). numeric becquerelPerKilogram
cs137_activity_sd 1 standard deviation of uncertainty associated with cs137_activity. numeric becquerelPerKilogram
total_pb210_activity Total radioactivity counts per unit dry weight for lead 210 (210Pb). numeric becquerelPerKilogram
total_pb210_activity_sd 1 standard deviation of uncertainty associated with total_pb210_activity. numeric becquerelPerKilogram
ra226_activity Total radioactivity counts per unit dry weight for Radium 226 (226Ra) if measured as part of the 210Pb dating process. numeric becquerelPerKilogram
ra226_activity_sd 1 standard deviation of uncertainty associated with ra226_activity. numeric becquerelPerKilogram
excess_pb210_activity Radioactivity counts per unit dry weight for excess lead 210 (210Pb). numeric becquerelPerKilogram
excess_pb210_activity_sd 1 standard deviation of uncertainty associated with excess_pb210_activity. numeric becquerelPerKilogram
c14_age Radiocarbon age as estimated from AMS measurements. numeric radiocarbonYear
c14_age_sd Estimated uncertainty in c14_age. numeric radiocarbonYear
c14_material Description of the material selected for radiocarbon (14C) dating. character  
c14_notes Any relevant submitter generated notes on 14C dating process. character  
delta_c13 The isotopic signature of 13C. This is oftentimes measured along with c14_age and can be useful for analyzing carbon lability and provenance. numeric partsPerMillion
be7_activity Radioactivity counts per unit dry weight for 7Be. numeric becquerelPerKilogram
be7_activity_sd Estimated uncertainty in be7_activity. numeric becquerelPerKilogram
am241_activity Radioactivity counts per unit dry weight for 241Am. numeric becquerelPerKilogram
am241_activity_sd Estimated uncertainty in am241_activity. numeric becquerelPerKilogram
marker_date The age of any other dated depth horizon such as an artificial marker, pollen horizon, pollution horizon, etc. Date YYYY-MM-DD
marker_type Code indicating the type of marker. factor artificial horizon = Horizon was added to the surface artificially by using materials such as feldspar, glitter, or rare earth elements. pollen = Pollen analysis was used to tie horizon to the timing of vegetation change such as the arrival of invasives, or the beginning of local agriculture. pollution = Chemical analysis was used to tie the horizon to the timing of a pollution event. tsunami = Sediment analysis was used to tie the horizon to the timing of a tsunami event.
marker_notes Any other submitter generated notes about the origin of the marker. character  
age Most likely, median, or mean age of the depth interval from submitter generated age depth model. numeric year
age_min Minimum age of the depth interval from submitter generated age depth model. numeric year
age_max Maximum age of the depth interval from submitter generated age depth model. numeric year
age_sd Standard deviation of age estimate from submitter generated age depth model. numeric year
depth_interval_notes Any other submitter generated notes specific to the depth interval. character  
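
Because carbon-relevant values are kept per depth interval, core-level quantities can be derived from this table rather than submitted as averages. The sketch below, in Python, shows one common derivation: carbon density as dry_bulk_density times fraction_carbon, multiplied by interval thickness and summed over the core. The function names are hypothetical, rows are assumed to be non-overlapping and to carry fraction_carbon, and no compaction correction is applied. It is an illustration, not the Network's own processing code.

```python
from typing import Dict, Iterable


def interval_carbon_stock(row: Dict[str, float]) -> float:
    """Carbon stock of one depth interval, in grams of carbon per square centimeter."""
    thickness_cm = row["depth_max"] - row["depth_min"]
    carbon_density = row["dry_bulk_density"] * row["fraction_carbon"]  # g C per cm^3
    return carbon_density * thickness_cm


def core_carbon_stock(rows: Iterable[Dict[str, float]]) -> float:
    """Sum interval stocks over a core; assumes the intervals do not overlap."""
    return sum(interval_carbon_stock(row) for row in rows)


example_core = [
    {"depth_min": 0, "depth_max": 10, "dry_bulk_density": 0.5, "fraction_carbon": 0.10},
    {"depth_min": 10, "depth_max": 20, "dry_bulk_density": 1.0, "fraction_carbon": 0.05},
]
print(core_carbon_stock(example_core))
```

The values in example_core are made up for illustration.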

Multiple Special Conditions at the Level of the Site or Core

Because there may be multiple observations or conditions that are part of the study, such as species present, or degradation or restoration activities, that can affect a site or core, these are archived separately.

Dominant Species Present

You can record species codes associated with sites and/or cores. The CCRCN species code system is derived from the USDA PLANTS Database, and for most taxa, the code consists of the first two letters of the genus followed by the first two letters of the species (e.g., "Spartina alterniflora" = "SPAL").
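
The two-letter rule can be written as a short helper. This is a sketch in Python with a hypothetical function name, species_code_from_name; because the rule holds only for most taxa, a code produced this way is a first guess to check against the USDA PLANTS Database rather than a definitive code.

```python
def species_code_from_name(scientific_name: str) -> str:
    """Build a code from the first two letters of the genus and of the species."""
    parts = scientific_name.split()
    if len(parts) < 2:
        raise ValueError("expected a binomial name such as 'Spartina alterniflora'")
    genus, species = parts[0], parts[1]
    return (genus[:2] + species[:2]).upper()


print(species_code_from_name("Spartina alterniflora"))
```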

Species Present at Site or Subsite
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
site_id Site identification code unique to each study. character  
core_id Core identification code unique to each site. character  
species_code Code associated with a species or a vegetation assemblage. character  

Anthropogenic Impacts Present

You can record various codes associated with degradation or restoration conditions at sites and/or cores.

Anthropogenic Impacts at Site or Subsite
attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
site_id Site identification code unique to each study. character  
core_id Core identification code unique to each site. character  
impact_class Code indicating any major anthropogenic impacts historically and currently affecting the coring location. factor

tidally restricted = Tidal flow is muted or blocked by built structures. impounded = Water level is raised artificially by a tidal restriction, resulting in ponding of water on the wetland and/or upland surface. managed impounded = Wetland is impounded seasonally, and at other times natural or semi-natural hydrology occurs. ditched = Tidal hydrology is altered because artificial ditches have been cut to promote tidal flooding and drainage. diked and drained = The wetland has been diked and drained, with or without flapper gates, pumping, or other means. farmed = Managed impoundment or drainage in which wetland has been converted to agricultural land. tidally restored = Tidal flow has been restored by removing an artificial obstruction. revegetated = Wetland vegetation has been reintroduced by replanting on unvegetated surfaces. invasive plants removed = Natural plant communities have been restored by the active removal of invasive plant species. invasive herbivores removed = Tidal wetland vegetation has been managed by the removal of invasive herbivores. sediment added = Elevation has been managed by artificially adding sediment to the site using techniques such as thin layering or sediment diversion. wetlands built = Constructed wetland using sediments such as dredge spoils or other sediment source.
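
Since one core can carry several impacts (or species), each condition is stored as its own row and linked back to the core through study_id, site_id, and core_id. The sketch below, in Python, groups such rows by those three keys; the function name group_by_core and the identifiers in the example rows are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

KEYS = ("study_id", "site_id", "core_id")


def group_by_core(rows: List[Dict[str, str]], value_field: str) -> Dict[Tuple[str, ...], List[str]]:
    """Collect every value of value_field recorded for each (study, site, core) key."""
    grouped = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in KEYS)
        grouped[key].append(row[value_field])
    return dict(grouped)


# Hypothetical rows: one core with two impacts, one core with a single impact.
impacts = [
    {"study_id": "Smith_et_al_2018", "site_id": "site_a", "core_id": "core_1", "impact_class": "ditched"},
    {"study_id": "Smith_et_al_2018", "site_id": "site_a", "core_id": "core_1", "impact_class": "tidally restored"},
    {"study_id": "Smith_et_al_2018", "site_id": "site_a", "core_id": "core_2", "impact_class": "farmed"},
]
print(group_by_core(impacts, "impact_class"))
```

The same helper works for the species table by passing species_code as value_field.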

Submitter Defined Attributes and Definitions

Part of the reason we control these attribute and variable names is so that the dataset does not become unmanageable, and we can deliver products that run cleanly and smoothly to you. However, we know that research is complicated, and not all of the data you want to include can be represented here. As long as it fits within this hierarchy, we allow you to submit user-defined attributes.

Study Level Species Table

If species codes or common names are used anywhere in the study, there should be a separate table included defining all names using scientific names. The CCRCN species code system is derived from the USDA PLANTS Database, and for most taxa, the code consists of the first two letters of the genus followed by the first two letters of the species (e.g., "Spartina alterniflora" = "SPAL").

attribute name definition data type format, unit or codes
study_id Unique identifier for the study made up of the first author’s family name, as well as the second author’s family name or ‘et al.’ if more than three, then publication year separated by underscores. character  
species_code Code associated with a species or a vegetation assemblage. character  
genus Genus according to the most up to date classification. character  
species Species according to the most up to date classification. character  
sub_species Any nomenclature referring to subspecies special cases. character  
hybrid Any nomenclature referring to special cases of hybridization. character  
common_name Common name associated with the species, especially if it is referred to in any accompanying text. character  
species_notes Any other submitter defined notes regarding the species. character  

Other Attributes and Variables

Any submitter-defined attributes should be included in a separate table indicating the associated level of hierarchy, attribute name, and data type (date, factor, character, or numeric). Attribute names should follow good naming practices: they should be self-descriptive, should not start with a number or special character, and should contain no spaces. Dates should be stored as a character string and should have an accompanying ‘string format’ indicating the position, number of digits, and delimiters for the date time. For example, June twenty-sixth two-thousand eighteen written as 2018-06-26 would be formatted as ‘YYYY-MM-DD’. Here is a handy dateTime reference. Numeric values should have their units defined. Factors (i.e. categorical variables) should be defined in a separate table.

level of hierarchy attribute name description data type format, unit
ex. site level or core level (your column name here. [use good naming conventions]) (describe your attribute here.) Date, factor, character, or numeric (extra necessary info here)
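
A date stored as a character string can be read back with the string format supplied in the 'format, unit' column. The sketch below, in Python, translates a format such as 'YYYY-MM-DD' into the directives used by the standard datetime module. The function names and the set of supported tokens (YYYY, YY, MM, DD, hh, mm, ss) are assumptions; only 'YYYY-MM-DD' appears in this guidance.

```python
import re
from datetime import datetime

# Assumed token set; only 'YYYY-MM-DD' is taken from the guidance itself.
TOKENS = {"YYYY": "%Y", "YY": "%y", "MM": "%m", "DD": "%d",
          "hh": "%H", "mm": "%M", "ss": "%S"}
# Longer tokens come first so that 'YYYY' is matched before 'YY'.
TOKEN_PATTERN = re.compile("YYYY|YY|MM|DD|hh|mm|ss")


def to_strptime_format(string_format: str) -> str:
    """Translate a string format such as 'YYYY-MM-DD' into strptime directives."""
    return TOKEN_PATTERN.sub(lambda match: TOKENS[match.group(0)], string_format)


def parse_date(value: str, string_format: str) -> datetime:
    """Parse a date string using the submitter-supplied string format."""
    return datetime.strptime(value, to_strptime_format(string_format))


print(parse_date("2018-06-26", "YYYY-MM-DD").date())
```

Matching every token in a single pass keeps one substitution from being re-read as another token.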

Variable names, like attribute names, should be self-descriptive, such as ‘experimental’ or ‘control’ as opposed to ‘1’ and ‘2’.

level of hierarchy attribute name categorical variable name description
ex. site level or core level (parent column name here) (your variable name here.) (describe your variable)

That’s It

You now know everything there is to know about soil carbon data management.
