University of Hawaii at Manoa Library

Geology & Geophysics: Managing Data

Data Repositories

Purdue University Libraries maintains a current list of data repositories that can be searched by keyword or browsed.

Planning the Research

  • What data will be collected?
  • What format will the data be in?
  • How long should the data be stored?
  • Is there potential for the data to be re-used in other inquiries?
  • How large will the datasets be?
  • Who owns the data?

Create a Data Management Plan

  • What metadata or standardized tags will you use?
  • How will you share the data while your research is in progress?
  • What documentation is needed to keep the data accessible throughout the project and after?

Collect Data and Documentation

  • Back up data and documentation in at least three places, e.g. hard drive, thumb drive, and web space
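
One way to follow the three-places rule is to script the copies and confirm that each one matches the original. The sketch below is only an illustration: the function names `sha256_of` and `back_up` and the destination folders are made up for this example, and it assumes every backup location is reachable as an ordinary folder (a mounted drive or a synced web folder).

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: Iterable[Path]) -> List[Path]:
    """Copy `source` into each destination folder and verify every copy.

    The destination folders are hypothetical stand-ins for a hard drive,
    a thumb drive, and a synced web folder.
    """
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        # A mismatch means the copy is damaged and should not be trusted.
        if sha256_of(copy) != expected:
            raise IOError(f"Checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

Calling `back_up(Path("field_notes.csv"), [drive_a, drive_b, web_folder])` would return the three verified copies. Comparing checksums is a design choice here: it catches a corrupted copy at backup time rather than when the data is needed.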

Analyze Data

  • Back up data and documentation
  • Leave your original data intact; perform analyses on copies
  • Include algorithms, formulae, and methods in your documentation (use a scripting language such as R to document your analyses)
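
The points above can be sketched in a short script. This example uses Python rather than R, and it is a minimal illustration under stated assumptions: the helper names `make_working_copy` and `mean_of_column`, the folder layout, and the column name are all hypothetical. The original file is marked read-only, the analysis runs on a copy, and the script itself records the method used.

```python
import csv
import shutil
import stat
from pathlib import Path
from statistics import mean


def make_working_copy(original: Path, work_dir: Path) -> Path:
    """Mark the original read-only and return a writable copy for analysis."""
    original.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / original.name
    # copyfile copies the contents only, so the copy is not read-only.
    shutil.copyfile(original, working)
    return working


def mean_of_column(csv_path: Path, column: str) -> float:
    """Documented method: arithmetic mean of one numeric CSV column."""
    with csv_path.open(newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return mean(values)
```

Because the calculation lives in a script and not in a series of manual spreadsheet steps, the script doubles as documentation of the method and can be re-run by anyone who later obtains the data.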

Prepare Data For Sharing

  • Datasets should be in file formats that the repository supports
  • Add metadata (tags) to enable discovery
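
As an illustration of attaching metadata to a dataset, the sketch below writes a small JSON "sidecar" file next to a data file. The field names (`title`, `creator`, `keywords`, `description`) and the `.metadata.json` suffix are assumptions chosen for this example, not a repository standard; a real deposit should use whatever metadata schema the chosen repository requires.

```python
import json
from pathlib import Path
from typing import List


def write_metadata(data_file: Path, title: str, creator: str,
                   keywords: List[str], description: str) -> Path:
    """Write a JSON metadata record beside `data_file` and return its path.

    The field names are illustrative only; substitute the schema
    required by your repository.
    """
    record = {
        "file": data_file.name,
        "title": title,
        "creator": creator,
        "keywords": keywords,
        "description": description,
    }
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Keeping the record in a separate plain-text file means the tags travel with the dataset without altering the data file itself.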

Archiving and Preservation

  • Add to the metadata, including citations to published research associated with the data

Deposit Data

  • Complete the repository's forms for depositing data

Tim Berners-Lee on the Next Web

A 16-minute talk by Tim Berners-Lee, inventor of the World Wide Web, about open linked data on the web.

Linked Data on Wikipedia

The Wikipedia article on Linked Data has references and information about sharing datasets on the web by providing Uniform Resource Identifiers (URIs) for the data.

Citeable Data

The OECD published a white paper (revised February 2010) about making its datasets citeable. It notes that much of its data is cited without enough granularity for a reader to easily find the data on which the analyses and inferences were based. See the full report for details.

DataCite is an international collaboration of university libraries and information centers that supports making datasets discoverable and citeable. Digital Object Identifiers (DOIs) are assigned to data through DOI Registration Agencies. DataCite and Crossref are two such agencies.
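
To show how a DOI makes a dataset citeable, the sketch below turns a DOI into a resolvable link through the doi.org resolver and assembles a simple citation string. The function names, the citation layout, and the DOI `10.1234/example` are made up for illustration; the layout is an assumption, and a repository's or publisher's preferred citation format should take precedence.

```python
def doi_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI string."""
    cleaned = doi.strip()
    # Accept the common "doi:" prefix as well as a bare DOI.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    return "https://doi.org/" + cleaned


def format_citation(creator: str, year: int, title: str,
                    publisher: str, doi: str) -> str:
    """Assemble a dataset citation; the layout is illustrative only."""
    return f"{creator} ({year}): {title}. {publisher}. {doi_url(doi)}"
```

For example, `format_citation("Doe, J.", 2010, "Example Seismic Dataset", "Example Repository", "doi:10.1234/example")` produces a citation ending in `https://doi.org/10.1234/example`, which points a reader to the dataset itself, not only to the paper that used it.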