Wednesday, September 14, 2011

Lab habits + data management

The reason I had been looking for that Sydney Harris cartoon is that I was putting together a guest lecture for our university's "Responsible Conduct of Research" course. I was speaking today about data management and retention, a topic I've come to know well over the last year through university service work on policies in that area. After speaking, it occurred to me that it would be worthwhile to summarize the important points for the benefit of student readers of this blog.  In brief:
  • Everything is data.  Not just raw numbers or images, but also the final analyzed graphs, the software used to do the analysis, the descriptions of the instrument settings used to acquire the raw numbers - everything.
  • The data are the science.  The data are the foundation for all the analysis, model-building, papers, arguments, further refinements, patents, etc.  Protect the data!
  • If you didn't document it, you didn't do it.
  • Write down everything.  Fill up notebooks.  Annotate liberally, including false starts and what you were thinking when you set up the little sub-experiments or trials that go into any major research endeavor.  I guarantee, you will never, ever in your life look back and say, "I regret that I was so thorough, and I wish I had written down less."  After years of observation, I am convinced that good notebook skills genuinely reduce mean time to thesis completion in many cases.  If you actually keep track of what you've been doing, and really write down your logic, you are less likely to go down blind alleys or have to repeat mistakes.
  • You may think that you own your data.  You don't, technically.  In an academic setting, the university has legal title to the data (which gives it the legal authority it needs to adjudicate disputes about access to data, including those that arise in the rare but unfortunate cases of research misconduct), while investigators are shepherds or custodians of the data.  Both have their own responsibilities and rights.  Some of those responsibilities are inherent in good science and engineering (e.g., the duty to do your best to make sure that the published results are accurate and correct), and others are imposed externally (e.g., federal funding agencies require preservation of data for some number of years beyond the end of an award).
  • Back everything up.  In multiple ways.  With the advent of scanners, digital cameras, cheap external hard drives, laptops, thumbdrives, "the cloud" (as long as it's better than this), etc., there is absolutely no excuse for not properly backing up data.  To repeat, back everything up.  No, seriously.  Have a backup copy at an off-site location, as a sensible precaution against disaster (fire, hurricane, earthquake, zombie apocalypse).
  • Good habits are habits, and must be habituated.  It took me more than 25 years to get in the habit of really flossing.  Do yourself a favor, and get in the habit of properly caring for your data.  Please.


Chris said...

I find myself taking more notes electronically so I can have ease of search, backup, embedding of plots/tables, LaTeX, complete edit history of the page, etc. Also, my handwriting is atrocious =) and this makes it MUCH more readable.

Currently, the best solution I have found is to use an internal wiki for note taking, but I could imagine using something like Blogger as well. Do you or any of your colleagues prefer to take notes electronically? Do you know of any good software for taking notes electronically?


MisterBee said...


Have you tried OneNote? I use it on a tablet (Asus EP121) and it has changed the way I work.

Doug, do you insist your students keep a traditional research log-book (cloth bound paper) or have you moved to electronic research notes?

Douglas Natelson said...

Chris, MisterBee - I'm happy for students to take notes electronically, so long as they're (1) complete and (2) really preserved in a useful format. I don't want to have to dig up some weird proprietary reader software five years from now. In general, though, I do still prefer real, physical paper notebooks.

OneNote + a serious tablet looks interesting, and seems to be approaching what I'd like to have. What format does it use to store everything? I'd love to have everything be PDF by default, not some annoying, proprietary Microsoft format that will fail to be backward compatible. Also, for the cost of a good tablet PC today, I can get a lot of nice notebooks. This does look like the eventual trend, though: inexpensive, reliable tablets, styluses, and good archival storage formats would be fine with me.

Anonymous said...

"I regret that I was so thorough, and I wish I had written down less."

Spend enough time writing CYA documentation in the oil industry and you will, right about when you're working 18-hour days for weeks on end because of it. At some point one has to accept that there is a large and unavoidable element of chance in life.

DanM said...

I find myself curiously drawn to the idea of my data being lost due to zombie apocalypse. Or at least to the idea of actually using that excuse when we get audited.

Ruby said...

This is a great read! Whatever data you’re dealing with, it’s important to document it, especially if it’s badly needed. For all important data, make sure you have a backup so that if you lose the first copy, you can immediately turn to another. I hope that more data managers read this, and not just students, so that these tips can help them manage their data. :)

Ruby Badcoe