Comment by Ruth

In our last lecture on this semester’s theme “Data: What are they? What do they do?”, Carsten Østerlund asked what hyperlink data are and what they do in a citizen science project. In his talk titled “Tracing Hyperlinks: Different forms of presence and knowledge production in an online citizen science community”, he examined hyperlinks and the knowledge they produce. He also examined practices that mix different forms of knowledge in interesting ways.

Inspired by Estrid Sørensen’s (2009) work and her differentiation of three forms of knowledge (i.e., representational, resonance and fluid knowledge), this project traces the different forms of presence and knowledge created through hyperlinks and practices of hyperlinking. It follows the online interactions of participants in the citizen science project Gravity Spy.

Gravity Spy builds on the Laser Interferometer Gravitational-wave Observatory (LIGO), which detects and documents gravitational waves reaching Earth. Gravitational waves originate when very heavy objects such as neutron stars merge. These mergers create ripples in spacetime that can be detected on Earth as a very faint noise. The detection of these waves is susceptible to noise events from various sources, called glitches, which can mask the gravitational waves. Participants aid the automated detection of these glitches by helping machines find and classify them more easily, so that they can be cleaned out of the data. Even though the project is based on sound, participating citizens classify visual representations of these sounds. The role of citizens in this project is to volunteer to train machines, improving machine learning and the automated recognition of glitches. At the same time, volunteers come into the project without prior knowledge, so they themselves have to be trained by machines. Participants begin with easy glitches and move through several levels. At each level, the glitches get more advanced and harder for machines to classify, so that humans and machines train each other to do better. A huge number of people do a little work on the project, but the vast majority of the work is done by a few people. Each classification can be commented on and discussed. Østerlund’s research project focuses on precisely this communication about the citizens’ work and the use of hyperlinks within it.

The project uses a blend of qualitative and quantitative data, approaching a quantitative data set of 5426 URLs with a qualitative research design. It focuses on what people need to know to do their work in the citizen science project better, so that they can support automated classification and find new glitch categories. It asks what knowledge people need, in addition to the visual representations, to help even more in detecting these glitches. It also looks at the role hyperlinks play in this.

These hyperlinks serve various purposes. Participants use them to add notes and to link similar glitches. They also use them to leave notes for themselves, to tag glitches with hashtags so they can find them again later, or even to suggest new glitch classes. Only a handful of highly involved participants do most of the advanced classification work. They look at glitches that other participants categorized as “none of the above” and try to classify them. Hyperlinks are used to communicate about the classification work that these participants are doing and to establish consensus among them.

The project analyzes different uses of hyperlinks, for example where they link to. Most links are internal. Hashtags can be used to show all images sharing the same hashtag, creating bespoke collections. Other internal links point to tutorials or glitch descriptions, to discussion collections that try to reach a consensus, or even to new glitch proposals that add another class of glitch. External links point to journal articles, research papers, Reddit, or any other information that seems helpful for classifying glitches.

Another angle discussed was who uses these links. Only a handful of people do most of the work and create most of the hyperlinks in the project, for example a retired chemist in France and a person with a photographic memory in Poland. Interviews with these participants reveal that the links are often not meant for anyone else. Instead, the participants create them for themselves, to support their own work and to find glitches again later in their search for new glitch classes.

The different methods used to analyze the hyperlinks also cluster them according to where they appear: in comments on specific glitches, in forum discussions, or in new glitch proposals.

The main focus, however, is on how the links are used to create knowledge and how they combine different forms of knowledge to do so. Many links carry more liquid knowledge, e.g. threads that bring other people’s discussions together. Some carry more representational knowledge, and most mix different forms of knowledge. Østerlund showed how citizens use liquid knowledge when they try to educate others about what they have researched and communicate what they think might be helpful. They link to similar discussions or summarize them. Participants use links to engage in conversations and to point to and reference the work of others. Others take the time to gather these discussions and work toward consensus, which may eventually lead to a new glitch proposal. The extensive descriptions in glitch proposals also connect participants to the scientists involved in the project, although the scientists are too slow in reviewing these proposals. The documentation of glitch proposals and their status (accepted, declined, etc.) is where involved citizens try to reach consensus and share their knowledge with the astrophysicists.

These preliminary insights into the project examined many facets of data: what they are and how they are made. The Gravity Spy project itself is concerned with data that are translated into visual representations and classified through the combined effort of machines and humans. Machines and humans educate each other on these data to help create more usable data for astrophysicists. In doing so, the involved citizens try to build consensus through comments, discussions, and hyperlinks. These in turn become the data for Carsten Østerlund’s research project, which examines what kind of knowledge is needed to better classify the data citizens work on. The lecture provided a fascinating example of how to approach large sets of quantitative data with qualitative methods and questions. It showed how to make sense of data and trace how they (inter)act in practices of knowledge production.

CARSTEN ØSTERLUND is a Professor at the iSchool at Syracuse University. His research explores the organization, creation, and use of documents in distributed environments where people’s daily practices are characterized by high mobility. He is particularly interested in the interplay between social and material structures and how they together facilitate distributed work, play and learning. Empirically, he studies these issues through in-depth qualitative and quantitative studies of everyday work practices in a range of settings, including citizen science, crowdsourcing, healthcare, distributed science teams and game design. He earned a Ph.D. in Management from Massachusetts Institute of Technology and is a former student of UC Berkeley, University of Aarhus, and University of Copenhagen, Denmark. He has been affiliated with the Work Practice and Technology Group at Xerox PARC.
