Tuesday 20 October, 3:00 - 4:00, Library 128

In this session we explore the short-term data storage, backup, and security protocols needed for effective data management before moving on to the long-term archiving and preservation needed for repository storage. We will also learn how to use a tool that helps manage documents and files across collaborators.

Data Storage, Backup, Security / Repositories, Archiving, Preservation

How you store, back up, and secure your research data is the part of data management that first comes to most people's minds. Securely stored and regularly backed-up data will save you a lot of time and hassle in the long term if something goes wrong. The final part of the data management puzzle is what you do with your data after the results are published: archiving and preservation in a repository. There are many repositories out there, and knowing how to prepare your data sets for long-term preservation in them will ensure your research data can be reused as widely as possible.
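As a small illustration of what a verified backup means in practice, here is a minimal Python sketch that copies a file to a backup folder and confirms the copy by comparing SHA-256 checksums. The function names (`sha256_of`, `backup_file`) and the folder layout are assumptions made for this example, not part of the course materials, and a real backup routine would also need scheduling and off-site copies.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy by checksum.

    Hypothetical helper for illustration only.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} failed verification")
    return target
```

Calling `backup_file(Path("results.csv"), Path("backups"))` would place a checked copy of a (hypothetical) `results.csv` in a `backups` folder. Storing the checksum alongside the data also lets you detect silent corruption later.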


The second tool is a version control system (VCS). Linked here is Bitbucket, a hosting service for Git and Mercurial repositories. Git and Mercurial are the two biggest names in VCS tools. Bitbucket can import your work from the very popular GitHub site, which some of you may be familiar with; you do not need an account on either site to use Git or Mercurial. In general, Git is the more flexible tool, but its command structure is somewhat arcane and complicated, while Mercurial is fairly user-friendly but less adaptable to different workflows.

What will we do today?

Today's class will combine a PowerPoint lecture with a hands-on demonstration of Git for file management and version control. We're using Git instead of Mercurial for this class because Git appears to be the more popular choice globally, despite the strong Mercurial user base here. The lecture notes will cover everything you need to know about managing your data in the short term during your research and in the long term after publication.

The Git repository we will use can be found here. You don't need a GitHub account to view it, and if you prefer Bitbucket, it is simple to import the repository to that site.
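For readers who want a preview of the basic version control cycle, the following Python sketch drives the standard `git init`, `git add`, `git commit`, and `git log` commands through `subprocess`. It assumes Git is installed and on your PATH; the helper names (`git`, `commit_file`, `history`) and the placeholder author identity are inventions for this example and are not taken from the class repository.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside repo and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def commit_file(repo: Path, name: str, text: str, message: str) -> None:
    """Write a file, stage it, and record a commit (illustrative helper)."""
    (repo / name).write_text(text)
    git(repo, "add", name)
    # The identity is passed per command so the sketch does not depend on
    # any global Git configuration; the name and email are placeholders.
    git(
        repo,
        "-c", "user.name=Example Researcher",
        "-c", "user.email=researcher@example.org",
        "commit", "-m", message,
    )


def history(repo: Path) -> list[str]:
    """Return the commit messages (subject lines), newest first."""
    return git(repo, "log", "--format=%s").splitlines()
```

A typical session would create an empty folder, call `git(folder, "init")`, and then call `commit_file` each time a document changes; `history(folder)` then lists every recorded version, which is the record collaborators rely on to see who changed what and when.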

Lecture Outline

Lecture materials from PowerPoint, Module 4 (Painter)
Lecture materials from PowerPoint, Module 7 (Winiarz)
Hands-on demonstration of Git (Painter)

Slides and Materials

New England Collaborative Data Management Curriculum (revised) - Modules 4 & 7