How to Cite a Data Set

Data citation is a developing practice. Please post feedback on the Discussion Forum. Updated: 19 June 2008
Proper citation of data sets gives data providers and publishers appropriate credit for their efforts, improves the perception of data management as a discipline, and makes it easier to track the use and impact of the data. In scientific publication, merely acknowledging the data set in the text or in the acknowledgments section is insufficient. These guidelines can help data users develop appropriate citations for data used in their publications, and can help data managers recommend appropriate citation of their holdings. These guidelines were adapted from internal guidelines used by the National Snow and Ice Data Center, which has encouraged formal data citation for more than a decade.

In general, data sets should be cited like books. The examples here use the author-date system described in the Chicago Manual of Style, 15th Edition. When users cite data, they need to use the style dictated by their publishers, but by providing an example, data publishers can give users all the important elements they should include in their citations of data sets. An example of a citation in the author-date system is:

Algire, G. H., and F. T. Legallais. 1948. Biology of Melanomas. ed. R. W. Miner. New York: New York Academy of Sciences.

As seen in this example, the elements of the citation in order are:

Author(s). Date. Title. Editor. Place of Publication. Publisher.

All these elements are common in data set citations, but other elements, as described below, are commonly used as well. Data publishers (e.g. data centers) have a responsibility to work with data providers and science teams to develop the actual content of the citation.

Citation Content

The citation should include the following elements as appropriate. Although this is shown as a literary citation, most of the elements are captured in standard metadata.
A mapping to the "Citation Information" section of the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) (FGDC-STD-001-1998) is indicated:
Author or Investigator

This is the individual(s) whose intellectual work, such as a particular field experiment or algorithm, led to the creation of the data set.

Oberbauer, S. 2000. Ecosystem carbon fluxes, Toolik Lake, Alaska 1995. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/arcss006.html.

A particular group or organization may sometimes be the author.

Arctic Climatology Project. 2000. Environmental Working Group Arctic meteorology and climate atlas. Edited by F. Fetterer and V. Radionov. Boulder, Colorado USA: National Snow and Ice Data Center. CD-ROM.

If the data set is a collection of several smaller, independent data sets, the individual data sets would have their own specific citations with an author, but the whole collection would not have an author. The collection would likely have an editor or compiler, though.

Cross, M. compiler. 1997. Greenland summit ice cores. Boulder, Colorado USA: National Snow and Ice Data Center in association with the World Data Center A for Paleoclimatology at NOAA-NGDC, and the Institute of Arctic and Alpine Research. CD-ROM.

Publication Date

For a completed data set, the publication date is simply the year of release.

Helmig, D. 2004. Vertical Boundary Layer Profiles for Ozone and Meteorological Parameters at Summit, Greenland, 2000. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/arcss100.html.

For a data set that is updated infrequently or on an irregular basis, list the first year of publication followed by "updated" with the current update information. This is appropriate when the title or version of the data set does not change; the data are simply updated.

Osterkamp, T. 1999, updated 2001. Daily air and active layer temperatures from permafrost observatories in Alaska, 1986-2001. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/arcss106.html.

For an ongoing data set that is updated on a regular or continual basis, list the first year of publication followed by the last update. Updates could occur annually or more frequently.

Maslanik, J. and J. Stroeve. 1999, updated quarterly. DMSP SSM/I daily polar gridded brightness temperatures, Jan. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0001.html.

Hall, Dorothy K., George A. Riggs, and Vincent V. Salomonson. 2007, updated daily. MODIS/Aqua Sea Ice Extent 5-Min L2 swath 1km V005, Oct. 2007–Apr. 2008. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/myd29v5.html.

A note on updates vs. new versions: Ongoing updates to a time series do change the content of the data set, but they do not typically constitute a new version or edition of a data set. New versions typically reflect changes in sampling protocols, algorithms, quality control processes, etc. Both a new version and an update may be reflected in the publication date. The title should indicate the new version.
If a particular version of a time series is discontinued, it is appropriate to indicate when the final update occurred.
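The publication date rules above can be sketched in a few lines of Python. This is only an illustration: the function name format_publication_date and its parameters are assumptions made for this example and are not part of any NSIDC tool.

```python
def format_publication_date(first_year, update=None):
    """Return the date element of a data set citation.

    first_year: year the data set was first published.
    update: None for a completed data set; otherwise the year of the
            latest update (e.g. 2001) or an update frequency
            (e.g. "quarterly", "daily") for an ongoing data set.
    """
    if update is None:
        # Completed data set: the year of release is enough.
        return f"{first_year}."
    # Updated or ongoing data set: first year, then the update information.
    return f"{first_year}, updated {update}."


print(format_publication_date(2004))               # 2004.
print(format_publication_date(1999, 2001))         # 1999, updated 2001.
print(format_publication_date(1999, "quarterly"))  # 1999, updated quarterly.
```

A new version, as opposed to an update, would be reflected in the title rather than handled by this helper.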
Title

This is the formal title of the data set. It may also include version or edition information.

Liu, H., K. Jezek, B. Li, and Z. Zhao. 2001. Radarsat Antarctic Mapping Project digital elevation model version 2. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0082.html.

Dates Used

For time series, especially continually updated time series, indicate which dates of data were used. Note that this is distinct from the publication date.
Editor or Compiler

An editor is the person or team who is responsible for creating a value-added and possibly quality-controlled product from the data. In cases where there is minimal scientific or technical input, yet still substantial effort in compiling the product, the person may be more correctly cited as a compiler. Editors and compilers may often be responsible for a larger work that includes an individual author's data set. Occasionally, there may be both a compiler and an editor. Some products will have neither.

Armstrong, R., J. Francis, J. Key, J. Maslanik, T. Scambos, and A. Schweiger. 1998. Polar Pathfinder sampler: Combined AVHRR, SMMR-SSM/I, and TOVS time series and full-resolution samples. Compiled by S. Khalsa. Boulder, CO, USA: National Snow and Ice Data Center. CD-ROM.

Cline, D., R. Armstrong, R. Davis, K. Elder, and G. Liston. 2002, updated July 2004. CLPX-Ground: ISA snow pit measurements. Edited by M. Parsons and M. J. Brodzik. Boulder, CO: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0176.html.

Bockheim, J. 2003. "University of Wisconsin Antarctic Soils Database". In International Permafrost Association Standing Committee on Data Information and Communication (comp.). 2003. Circumpolar Active-Layer Permafrost System, Version 2.0. Edited by M. Parsons and T. Zhang. Boulder, CO: National Snow and Ice Data Center/World Data Center for Glaciology. CD-ROM.

When there is an editor or compiler but no author, the editor is listed first.

Publication Place

This is the city, state (when necessary), and country of the publisher.

Cavalieri, D., C. Parkinson, P. Gloersen, and H. J. Zwally. 1996, updated 2006. Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data, March 2002–Sept. 2003. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0051.html.

Publisher

The publisher is whoever published the data set. A publisher often has an implied responsibility for stewardship of the data set. This is usually a data center and is written immediately after the place.

Cavalieri, D., C. Parkinson, P. Gloersen, and H. J. Zwally. 1996, updated 2006. Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data, March 2002–Sept. 2003. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0051.html.

Distributor or Associate Publisher

This field should be used only when it differs from the publisher, i.e. rarely. Its listing should be written in the same manner as that of the publisher. Sometimes NSIDC acts as a simple distributor; sometimes we are an associate publisher; sometimes others are associate publishers.

Environmental Working Group. 2000. Environmental Working Group: Joint U.S.-Russian Arctic sea ice atlas. Ann Arbor, MI: Environmental Research Institute of Michigan; distributed by the National Snow and Ice Data Center. CD-ROM.

Cross, M. compiler. 1997. Greenland summit ice cores. Boulder, CO: National Snow and Ice Data Center in association with the World Data Center A for Paleoclimatology at NOAA-NGDC, and the Institute of Arctic and Alpine Research. CD-ROM.

Distribution Medium and Location

If there is one fixed medium, such as CD-ROM or DVD, list it.

If data are available over the internet or through multiple digital media options, it is best to include a reference to the location of the data. Often this is through a standard URL.

Cavalieri, D., C. Parkinson, P. Gloersen, and H. J. Zwally. 1996, updated 2006. Sea ice concentrations from Nimbus-7 SMMR and DMSP SSM/I passive microwave data, March 2002–Sept. 2003. Boulder, Colorado USA: National Snow and Ice Data Center. Data set accessed 2008-05-14 at http://nsidc.org/data/nsidc-0051.html.

Ideally, a persistent identifier such as a Digital Object Identifier should be used.

König-Langlo, Gert and Hartwig Gernandt. 2006. Compilation of radiosonde data from the Antarctic Georg-Forster station of the German Democratic Republic from 1985 to 1992. Bremerhaven, Germany: Alfred Wegener Institute for Polar and Marine Research. Data set accessed 2008-05-22. doi:10.1594/PANGAEA.547983

Access Date

Because data can be dynamic and changeable in ways that are not always reflected in publication dates and versions, it is important to indicate when on-line data were accessed. It is not necessary to indicate an access date for a fixed medium like a DVD.
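As a rough illustration of how the elements described above fit together, the following Python sketch assembles a citation string in the order used by the examples in this guide. The function name format_citation and all of its parameter names are assumptions made for this example; publishers' own styles may order or punctuate the elements differently.

```python
def format_citation(authors, date, title, place, publisher,
                    editor=None, medium=None, location=None, accessed=None):
    """Assemble an author-date data set citation from its elements.

    editor:   editor or compiler phrase, e.g. "Edited by M. Parsons"
    medium:   fixed medium such as "CD-ROM" (no access date needed)
    location: URL or "doi:..." identifier for on-line data
    accessed: access date, required whenever a location is given
    """
    if location and not accessed:
        raise ValueError("on-line data need an access date")

    elements = [authors, date, title]
    if editor:
        elements.append(editor)
    elements.append(f"{place}: {publisher}")
    if medium:
        elements.append(medium)

    # End every element with exactly one period.
    text = " ".join(e.rstrip(".") + "." for e in elements)

    if location:
        if location.startswith("doi:"):
            text += f" Data set accessed {accessed}. {location}"
        else:
            text += f" Data set accessed {accessed} at {location}."
    return text


print(format_citation(
    authors="Oberbauer, S.",
    date="2000",
    title="Ecosystem carbon fluxes, Toolik Lake, Alaska 1995",
    place="Boulder, Colorado USA",
    publisher="National Snow and Ice Data Center",
    location="http://nsidc.org/data/arcss006.html",
    accessed="2008-05-14",
))
```

Called with these values, the sketch reproduces the Oberbauer example given under Author or Investigator. For a fixed medium, pass medium="CD-ROM" and leave location and accessed unset.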
Data Within a Larger Work

A particular data set may be part of a compilation, in which case it is appropriate to cite the data set somewhat like a chapter in an edited volume.
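The chapter-style pattern can be sketched as follows, modeled on the Bockheim entry shown under Editor or Compiler. The function name cite_within_larger_work and its parameters are hypothetical, chosen only for this illustration.

```python
def cite_within_larger_work(author, year, part_title, parent_citation):
    """Cite one data set as a part of a compilation, chapter-style.

    The part's title goes in quotation marks, followed by "In" and the
    full citation of the parent compilation.
    """
    return f'{author.rstrip(".")}. {year}. "{part_title}". In {parent_citation}'


parent = (
    "International Permafrost Association Standing Committee on Data "
    "Information and Communication (comp.). 2003. Circumpolar Active-Layer "
    "Permafrost System, Version 2.0. Edited by M. Parsons and T. Zhang. "
    "Boulder, CO: National Snow and Ice Data Center/World Data Center for "
    "Glaciology. CD-ROM."
)
print(cite_within_larger_work(
    "Bockheim, J.", 2003,
    "University of Wisconsin Antarctic Soils Database", parent,
))
```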
Increasingly, publishers are allowing data supplements to be published along with peer-reviewed research papers. When using the data supplement one need only cite the parent reference. For example, when using the data at doi:10.1594/PANGAEA.476007, the following reference is appropriate.