Scientific archives workshop 2018

I attended the Second Workshop on Scientific Archives, held at the Carnegie Institution for Science, Washington, D.C., on 13 and 14 August 2018.

The first Workshop on Scientific Archives was held at EMBL in 2016, and was organised entirely by Anne-Flore Laloe, the archivist at EMBL. It was (I think) the first time that archivists working in the scientific area had come together internationally to exchange experiences. I attended it and gave a paper (even though I’m not an archivist). After that first workshop a small international committee was formed (CAST – Committee on Contemporary Archives in Science and Technology). This committee planned the 2018 workshop which featured a good range of topics and attracted about 40 attendees.

The complete programme is here. I learnt something from most papers, but some stood out for me.

Data Management Plans and reasons for keeping data

Jean Deken, SLAC National Accelerator Laboratory, Scientific Data Management Plans in Theory and In Practice

Jean Deken noted that scientists are required to plan for how they manage research data, thanks to funders’ policies. She suspects that archivists’ concerns were not uppermost in policymakers’ minds when they made their rules.

To an archivist, a Data Management Plan (DMP) is a historical document describing the data practices of the experimental collaboration.

Tools exist to help with creating DMPs – e.g. the California Digital Library’s DMPTool and the Digital Curation Centre’s DMPonline tool in the UK.

In theory DMPs minimise the risk of data loss and maximise data accessibility, but in reality they leave many questions unanswered. Jean quoted Jeff Rothenberg’s wisecrack “Digital data lasts forever – or five years”.

After the analysis of a dataset is completed there is often no requirement to retain the original data.  Even when it is retained, it may become unusable over time even by the original researchers. Sometimes it’s better or cheaper to do a new experiment.

Here Jean mentioned the 1995 National Research Council report, which highlighted the difference between experimental science and observational science when it comes to data retention. Observational science benefits from long-term data gathering, so it makes sense to hold onto old data. Experimental scientists tend to expect that repeating an experiment in the future with better equipment will give better results, so they’d rather repeat the experiment than hold onto the original data long-term.

This ignores the issue of reproducibility, which was perhaps not so prominent back in 1995.

Record-keeping in science

Juan Ilerbaig, University of Toronto, Integrating Data and Records in Archiving Scientific Research

Ana Margarida Dias da Silva et al., Universidade de Coimbra, The Importance of the Botanic Archive in Contextualizing the Botanic Collections of the University of Coimbra

Juan Ilerbaig gave a very thought-provoking talk about the role of record-keeping in science, and the inter-relationship of different records and objects. This was new ground for me but Juan’s talk made me want to learn more.  Juan noted that the records of science include both the structured ‘minutes of science’ (the published literature) and various less structured records (communications, raw data, records).

Juan referred to the correspondence between records, data and physical objects. A published scientific paper can be seen as a proxy for the research (the data). The data and objects produced by research can be seen as possible sources for future work.

He cited the US archivist Maynard Brichford who wrote in 1969 that “Test and experimental data should be destroyed when the information they contain is condensed into published reports or statistical summaries.” (1)

Juan suggested that this point of view neglects to consider that scientific record making is an active agent in the process of science, not just a passive byproduct. Therefore models of science that rely only on the final publication risk misrepresenting what really happens in research.

To support this, Juan related an example from Charles Darwin’s voyage on the Beagle. He explained that the links between Darwin’s specimens, tags (metadata), published descriptions, labels, notebooks, specimen catalogue, and zoological diary (a rewritten diary) were all crucial to an understanding of how Darwin came to his conclusions. At first it was not clear to Darwin that the location where he had collected each specimen was important, and he had not been gathering location information. When he realised that location was a crucial part of the story he asked the ship’s crew members (many of whom had made their own notes) to provide information to fill in the gaps in his records.

Juan said that the process of recording (writing) and cross-referencing turns private experience into public information and turns itemized knowledge into generalized knowledge. I need to think a bit more about that – I’m not sure I quite grasp it. 

Some of what Juan said chimed with another talk, from Ana Margarida Dias da Silva at the University of Coimbra. She too emphasized that the whole is greater than the sum of its parts, showing how links between her institution’s botanic archive and its plant collections were synergistic. Similarly links between the archives can shed valuable light on objects in the museum collections and on the development of the library collections.

I really appreciate this holistic point of view, and the context provided by different kinds of information and evidence resources.

Archiving websites

Polina Ilieva, University of California, San Francisco, Science Online: Evaluating Appraisal, Usage, and Impact

Polina Ilieva from the UCSF archives explained their approach to archiving websites. She stated that an archive needs to collect more broadly than just records that support the published record. A contemporary scientific archive must also collect many unofficial channels of communication, including electronic information.

Polina made the point strongly that when talking about electronic records, appraisal has to occur soon after creation of the records, not decades later (2). 

At first UCSF only collected websites that linked to existing archive records but then extended their remit to archive the websites of all labs. They invited PIs to nominate websites to be archived (allowing self-nominations). Now they are archiving 128 out of 187 unique lab websites that they have identified. They crawl the websites twice a year, using Archive-It.

Lab websites often represent only the successful side of research, not the failed or rejected work. UCSF is also looking at electronic lab notebooks (ELNs) with a view to archiving them. Because ELN platforms are proprietary, it may not be possible to archive them. Maybe archivists need to start a conversation with ELN service providers.

Polina recommended Lorraine Daston’s recent book – Science in the Archives.

Appraisal

John Faundeen, U.S. Geological Survey, Science and Technology Archives: The Art and Science of Conducting Appraisals

Patrick Shea, Science History Institute, Appraising the Records of 20th century science

I enjoyed the papers from John Faundeen and from Patrick Shea on appraisal, though they were mostly talking about paper records.  This section was instructive for me, a non-archivist.

Appraisal informs the initial decision to ingest records to the archive, and subsequent decisions to retain or discard. One approach is to form an appraisal team, including an archivist, scientist(s), and a research manager.

Both John and Patrick used structured questionnaires to collect facts about the records. John used 44 questions (NARA best practice for federal agencies) while Patrick used 21 questions.

John asks scientists:  are the records somewhere else too? what was the original purpose of these records? what may be the future scientific uses of these records? He has carried out 90 appraisals in 12 years. In that time he has accepted/retained about two thirds of the material appraised.

In his talk Patrick noted that you can’t keep everything. The material’s uniqueness, form, importance, and value all come into the decision. As well as actual archives his institute will collect ephemeral material – e.g. conference proceedings, equipment catalogues.

Scientists don’t appreciate the importance of anything except the published reports. There are many challenges – not least that Records Management can end up destroying too many records.

“History in the true sense depends on the unvarnished evidence, considering not only what happened, but why it happened, what succeeded, what went wrong” said US archivist Frank Burke.

Archives for a new institute

Laura Outterside, European XFEL, New Science, New Archives: Records Management at European XFEL

Laura Outterside is records manager at the XFEL (European X-Ray Free-Electron Laser Facility). This is a new institute – though it has been some years in the planning. Her focus is on scientific records – records about the administration of science – funding, planning, and everything before the data gathering. She is also considering the need for an XFEL archive.

She noted that XFEL researchers are managing their records already, but they are all doing it differently. Laura is planning to undertake records ‘health checks’ to assess the state of RM across all research groups. She hopes to work towards a central document catalogue.

Now is a good time to focus on RM and archives as XFEL moves from a planning phase to an operational phase. A new chapter is opening, and a new generation of staff is coming in. The current scientific director is retiring. He has been involved from the start of the XFEL project and will have many paper, digital, and email records. Laura plans an oral history interview with him. She is also planning to review procedures for managing records on the departure of key staff.

Laura is starting with the records and working backwards to procedures and policies: a bottom-up, decentralised, flexible approach rather than a compliance-based one. This seems very pragmatic, and it makes sense to me. Good scientific research practice policy has some documentation and publishing guidelines relevant to archives, such as “retain all records safely”. XFEL also has an Asset Management policy which is relevant to RM.

Laura has been inspired by the examples of EMBL, CERN, and SLAC archives. Those archives were created 20-40 years after the creation of the respective institutions. Laura noted that today it is important to consider archival legacy from the start, echoing the point made by Polina that digital archives are more vulnerable than paper archives.

Archives to theatre

Christian Salewski, Alfred-Wegener-Institut Helmholtz Centre for Polar and Marine Research/ Archive for German Polar Research (AGPR), The History of German Polar Research goes Theatre – The Project “Staging Files”

The final paper of the workshop was from Christian Salewski, head of the Archive for German Polar Research.

According to its website, “The mission of the AGPR is to secure the written and oral tradition of German polar and maritime research, a 150-year-old scientific venture with deep roots in the federal state of Bremen. Founded in 2011, the AGPR archives records and other material of this research field.”

Christian told us that there is a 100-year-old tradition of documentary theatre in Germany. In 2016 the AGPR decided to create a play about early German polar research, based on their archives. The process was led by a historian, working with a theatre company. Christian taught students from the University of Bremen history department about the history of German polar research. The students were given access to material in the AGPR. Then they wrote essays about the history and these were used by the theatre company to put together a first draft of the play.

The play was developed as a stage reading. It is called Vom Eis gebissen, im Eis vergraben (Bitten by ice, buried in ice) and was put on by the Bremer Shakespeare Company.

The AGPR got great recognition for the play, including from the Institute management. It is a very creative way to exploit archives.

Other points from other papers:

  • It’s always helpful to document choices and decisions when you make them.
  • The importance of established criteria on what to collect.
  • How can technical or technology-related archives become accessible for humanities research?
  • First, persuade owners/creators of existence and significance of archive.
  • People may value the old, but do not realise the value of newer records even if they are very rare.
  • Holding public events for the community helped to change attitudes towards archives.
  • Help records creators to understand significance of things they have, and stop them throwing it away.

More about CAST

The CAST committee has been brought under the umbrella of the International Council on Archives Section for Research and University Archives (ICA-SUV). This opens up some funding streams for future events and helps to bring the workshops to a wider audience. The plan is to continue alternating between Europe and North America, and to hold a workshop every one or two years.

I’m pleased to say that I have recently been invited to become a member of CAST, which is very flattering.  I will be working with the other members of the committee to help plan the 2020 workshop, and look forward to getting involved.

References

  1. Maynard J. Brichford, Scientific and Technological Documentation: Archival Evaluation and Processing of University Records Relating to Science and Technology.
  2. Terry Cook, http://www.interpares.org/book/interpares_book_l_app03.pdf

==========

Edited 12 Nov 2018

C-CAST has changed its name to CAST, and dropped the word ‘Contemporary’ from its title.

About Frank Norman

I am a retired librarian. I spent 40 years working in biomedical research libraries.