Data Life Cycle
Planning the Research
- What data will be collected?
- What format will the data be in?
- How long should the data be stored?
- Is there potential for the data to be re-used in other inquiries?
- How large will the datasets be?
- Who owns the data?
Create a Data Management Plan
- What metadata or standardized tags will you use?
- How will you share the data while your research is in progress?
- What documentation is needed to keep the data accessible throughout the project and after?
Collect Data and Documentation
- Back up data and documentation in at least three places, e.g., hard drive, thumb drive, and web space
- Leave your original data intact; perform analyses on copies
- Include algorithms, formulae, and methods in your documentation (use a scripting language such as R to document your analyses)
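The backup and working-copy advice above can be sketched in a few lines of Python. This is only an illustration, not a prescribed workflow: the function names (`sha256_of`, `back_up`, `working_copy`) and any folder locations you pass in are hypothetical, and the checksum comparison is one possible way to confirm that each copy matches the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the file system allows it
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256_of(copy) != expected:
            raise IOError(f"Backup at {copy} does not match the original")
        copies.append(copy)
    return copies


def working_copy(source: Path, work_dir: Path) -> Path:
    """Make a separate copy for analysis so the original stays untouched."""
    work_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, work_dir / source.name))
```

For example, `back_up(Path("raw/survey.csv"), [Path("/mnt/external"), Path("/mnt/usb"), Path("/mnt/webspace")])` would place three verified copies in the (hypothetical) folders listed, and `working_copy` would give you a file that is safe to modify during analysis.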
Prepare Data For Sharing
- Datasets should be in file formats that the repository supports
- Add metadata (tags) to enable discovery
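As a rough illustration of what descriptive metadata might look like, the sketch below writes a small JSON file next to a dataset. The field names (`title`, `creator`, `keywords`, `description`, `files`) and the `write_metadata` helper are assumptions made for this example; a real repository will publish its own required schema, which should take precedence.

```python
import json
from pathlib import Path


def write_metadata(dataset: Path, title: str, creator: str,
                   keywords: list[str], description: str) -> Path:
    """Write a JSON metadata file alongside the dataset and return its path."""
    # Hypothetical field names; replace them with the repository's schema.
    record = {
        "title": title,
        "creator": creator,
        "keywords": keywords,
        "description": description,
        "files": [dataset.name],
    }
    sidecar = dataset.with_name(dataset.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Keeping the tags in a plain-text file beside the data means they travel with the dataset and can be copied into a repository's deposit form later.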
Archiving and Preservation
- Extend the metadata to include published research associated with the data
- Complete the forms for depositing data in the repository
Open Access to Data
"Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open."
Defining Research Data
- United States Circular No. A-110
The U.S. Federal Government's Office of Management and Budget Circular A-110 (36.d.2.i Property Standards; Intangible property; definition) states:
Research data is defined as the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This "recorded" material excludes physical objects (e.g., laboratory samples). Research data also do not include:
- Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and
- Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.
- National Institutes of Health (NIH) Data Sharing Policy
Definition of Final Research Data
Recorded factual material commonly accepted in the scientific community as necessary to document and support research findings. This does not mean summary statistics or tables; rather, it means the data on which summary statistics and tables are based. For the purposes of this policy, final research data do not include laboratory notebooks, partial datasets, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as gels or laboratory specimens. NIH has separate guidance on the sharing of research resources, which can be found in the NIH Grants Policy Statement (NIHGPS).
- National Science Foundation (NSF)
Sharing Data 38.a
NSF expects significant findings from research and education activities it supports to be promptly submitted for publication, with authorship that accurately reflects the contributions of those involved. It expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages grantees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.
About this guide
This work by Sara Rutter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.