Data Management Plans: What Data?

Creating a data management plan for access, sharing, and preservation

Data Sharing and Management Snafu in 3 Short Acts

Data Life Cycle

Planning the Research

  • What data will be collected?
  • What format will the data be in?
  • How long should the data be stored?
  • Is there potential for the data to be re-used in other inquiries?
  • How large will the datasets be?
  • Who owns the data?

Create a Data Management Plan

  • What metadata or standardized tags will you use?
  • How will you share the data while your research is in progress?
  • What documentation is needed to keep the data accessible throughout the project and after?

Collect Data and Documentation

Back up data and documentation in at least three places, e.g., a hard drive, a thumb drive, and web space.
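As one way to put the three-places advice into practice, the sketch below copies a file into several backup folders and checks each copy against the original with a SHA-256 checksum. The function names `sha256_of` and `back_up` and the folder paths in the usage note are hypothetical; in real use each folder would point at a separate device or service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination folder and verify each copy.

    `destinations` is a list of folders (illustrative; for example a second
    hard drive, a thumb drive, and a synced web-storage folder).
    """
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup at {target} does not match the original")
        copies.append(target)
    return copies
```

A call might look like `back_up("survey.csv", ["/mnt/external", "/mnt/usb", "/home/me/cloud"])`, where all three paths are placeholders for your own backup locations.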

Analyze data

  • Back up data and documentation
  • Leave your original data intact; perform analyses on copies
  • Include algorithms, formulae, and methods in your documentation (use a scripting language such as R to document your analyses)
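The advice to keep original data intact and analyze copies can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the function name `make_working_copy` and the folder layout are assumptions. It copies the raw file into a working folder and then marks the original read-only so that it is harder to overwrite by accident.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(original, work_dir):
    """Copy `original` into `work_dir` and mark the original read-only.

    Returns the path of the working copy, which stays writable.
    """
    original = Path(original)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / original.name
    # copyfile copies the contents only, so the copy does not inherit
    # the read-only permission set on the original below.
    shutil.copyfile(original, working)
    # Remove write permission from the original file.
    original.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return working
```

Analysis scripts would then read and write only the file returned by `make_working_copy`, leaving the raw file as collected.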

Prepare Data For Sharing

  • Put datasets in file formats that the repository supports
  • Add metadata (tags) to enable discovery
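One simple way to attach discovery metadata to a dataset is a small "sidecar" file stored next to it. The sketch below writes such a file as JSON. The field names (`title`, `creator`, `keywords`, and so on) and the `.metadata.json` suffix are illustrative assumptions; a repository will usually prescribe its own metadata schema and deposit format, which should take precedence.

```python
import json
from pathlib import Path


def write_metadata(data_file, title, creator, keywords, description=""):
    """Write a JSON sidecar file describing `data_file`; return its path.

    The field names here are illustrative, not a repository standard.
    """
    data_file = Path(data_file)
    record = {
        "title": title,
        "creator": creator,
        "keywords": sorted(keywords),  # sorted so the output is stable
        "description": description,
        "file_name": data_file.name,
        "file_size_bytes": data_file.stat().st_size,
    }
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

For a file named `survey.csv`, this writes `survey.csv.metadata.json` in the same folder, so the description travels with the data.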

Archiving and Preservation

  • Add to the metadata, including references to published research associated with the data

Deposit Data

  • Complete forms for depositing data in repository

Open Access to Data

Panton Principles [launched February 2010 at the Panton Arms on Panton Street in Cambridge, UK]

"Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open."

Defining Research Data

  • United States Circular No. A-110

    The U.S. Federal Government's Office of Management and Budget Circular A-110 (36.d.2.i Property Standards; Intangible property; definition) states:

    Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples). Research data also do not include:


    • Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and
    • Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.
  • National Institutes of Health (NIH) Data Sharing Policy

    Scientific Data is data commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings. It includes any data needed to validate and replicate research findings. For the purposes of this policy, scientific data does not include laboratory notebooks, preliminary analyses, completed case report forms, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens.

  • National Science Foundation (NSF) Sharing Data 38.a

    NSF expects significant findings from research and education activities it supports to be promptly submitted for publication, with authorship that accurately reflects the contributions of those involved. It expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages grantees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.

TED Talk by Tim Berners-Lee

Tim Berners-Lee on the Next Web

A 16-minute talk by Berners-Lee, the inventor of the World Wide Web and of Hypertext Markup Language (HTML), about open linked datasets on the web.

Need More Help?

Ask a Science Librarian

SciTech Reference Desk
Mon-Fri, 9 am-3 pm
Phone: (808) 956-8263
