Abstract: Informatiewetenschap 1999

F.A.P. Harmsze, M.C. van der Tol and J.G. Kircz
Van der Waals-Zeeman Institute
Speech Communication, Argumentation Theory and Rhetoric
University of Amsterdam
Valckenierstraat 65
1018 XE Amsterdam
The Netherlands

A modular model for electronic scientific articles

The project

At present, we face a revolution in the dissemination and handling of scientific articles. Most major publishers make a substantial share of their publications available via an Internet site, and many new independent initiatives are being launched. With a few exceptions, these electronic publications are in fact reproductions of paper-based products. In the project 'Communication in Physics', we have gone one step further and proposed a novel model for the creation and evaluation of electronic scientific articles, in which we take into account the intrinsic features of the new medium, the requirements for adequate scientific communication and the societal traditions.

Rather than concentrating on software issues, we chose an analytical approach. We analysed the characteristics of scientific communication via articles and drafted a profile of the interactants in the communication process. This led to a series of requirements that scientific articles have to satisfy to allow for effective and efficient communication. To ensure that our model is grounded in scientific practice, we developed it in conjunction with an analysis of a coherent corpus of printed articles in the field of experimental physics. In this analysis, we identified different types of information and relations in the corpus and re-organised that information in a novel, modular structure. We found that the modular structure indeed allows for the creation of scientific articles that meet our requirements. We specify the model in terms of instructions to authors and provide recommendations for software implementation.

In an earlier presentation in this series of conferences we gave an outline of our programme (IW 1996). In this contribution we would like to present the results, which will soon be fully reported elsewhere (Harmsze 2000).

The model

Because electronic media are suitable for multiple use and reshuffling of information units, as well as for the addition of new components to published work, our guiding principle is 'modularity' (Kircz 1998). We developed a model for modular articles, starting from the idea that an electronic article can be made up of well-defined modules and links that, following the SGML philosophy, can be identified with tags. In our modular model, we define the modules that can represent the different types of information in an article. To guarantee and express the coherence of the information in different modules, we introduce a systematic way of linking the modules, both within the same article and between different publications. A modular article thus represents a subnetwork of information within the network of all published information. In our model, both modules and links are explicitly characterised 'information objects' that can be handled using database management and information retrieval techniques.

Modules

We define a module as a uniquely characterised, self-contained representation of a conceptual information unit that is aimed at communicating that information. Not its length, but the coherence and completeness of the information it contains makes it a module. Modules can be located, retrieved and consulted separately as well as in conjunction with related modules.

The relations between modules can be expressed not only in links, but also in the composition of elementary modules into higher-level, complex modules. We define a complex module as a module that consists of a coherent collection of (elementary or complex) modules and the links between them. Using a metaphor, elementary modules are 'atomic' entities that can be composed into a 'molecular' entity: a complex module.

We distinguish two types of complex modules: compound modules and cluster modules. In a compound module, related (albeit possibly dissimilar) modules are aggregated to form a new module on a higher level. An example of a compound module is the module 'Experimental methods', which is composed of lower-level modules representing the various components of a molecular beam apparatus, such as the source of the particle beam and the detector. In a cluster module, by contrast, the central concept is a generalisation of the specific concepts focused on in its constituent modules. An example of a cluster module is a module 'Raw data' composed of various elementary modules reporting the results of the same general type of measurements involving different molecules.
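The composition of elementary modules into complex ones can be pictured as a simple tree. The following is a minimal sketch only, not the project's implementation: the class names, the field names and the `flatten` helper are assumptions introduced for illustration, and the module texts are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Module:
    """A uniquely identified, self-contained unit of information."""
    module_id: str  # hypothetical identifier scheme
    title: str


@dataclass
class ElementaryModule(Module):
    """An 'atomic' module holding content directly."""
    content: str = ""


@dataclass
class ComplexModule(Module):
    """A 'molecular' module built from elementary or complex modules."""
    kind: str = "compound"  # "compound" or "cluster"
    parts: List[Module] = field(default_factory=list)

    def flatten(self) -> List[Module]:
        """Return every elementary module contained at any depth."""
        found: List[Module] = []
        for part in self.parts:
            if isinstance(part, ComplexModule):
                found.extend(part.flatten())
            else:
                found.append(part)
        return found


# Illustrative compound module modelled on the 'Experimental methods' example.
source = ElementaryModule("m1", "Beam source", "placeholder text")
detector = ElementaryModule("m2", "Detector", "placeholder text")
methods = ComplexModule("m0", "Experimental methods", "compound", [source, detector])
article = ComplexModule("a0", "Article", "compound", [methods])
```

Because a complex module is itself a module, the same structure nests to any depth, and `article.flatten()` still reaches the two elementary modules.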

To determine which 'similar information' has to be grouped in an information unit and represented in a self-contained module, and subsequently how to tag the resulting module, we need an appropriate typology of scientific information. We introduce a typology by which we characterise the information from four complementary points of view. In this typology, we incorporate the characterisations from two classical points of view: the domain-oriented characterisation that can be expressed in keywords, and the characterisation by specified bibliographic data. In addition, we introduce a characterisation by the range of the information and a characterisation by its conceptual function, i.e. by the role the information plays in the scientific problem-solving process.

Characterising information by its range, we distinguish microscopic, mesoscopic and macroscopic modules. A microscopic module represents information that belongs only in one particular article, e.g. information concerning the specific problem addressed in that article. A mesoscopic module functions at the level of an entire research project; it is created for multiple use in several articles arising from the same project. For example, information about the experimental set-up that has been used in a series of experiments can be represented in a mesoscopic module and connected to several articles. A macroscopic module represents information that transcends even the level of the research project; firmly established information of this type is found in books, for example.
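The four points of view, and the three ranges in particular, could be recorded as metadata attached to each module. This is a sketch under stated assumptions: the names `Range`, `Characterisation` and `reusable_across_articles` are invented here, and the keyword, bibliographic and conceptual-function values are placeholders rather than data from the corpus.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Range(Enum):
    MICROSCOPIC = "microscopic"  # belongs to one particular article
    MESOSCOPIC = "mesoscopic"    # shared within one research project
    MACROSCOPIC = "macroscopic"  # transcends the research project


@dataclass(frozen=True)
class Characterisation:
    """One value per point of view in the typology."""
    keywords: Tuple[str, ...]   # domain-oriented characterisation
    bibliographic: str          # bibliographic data (placeholder string)
    range: Range                # range of the information
    conceptual_function: str    # role in the problem-solving process


def reusable_across_articles(c: Characterisation) -> bool:
    """Mesoscopic and macroscopic modules can be connected to several articles."""
    return c.range is not Range.MICROSCOPIC


# Hypothetical example: a set-up description shared by a series of experiments.
setup = Characterisation(
    keywords=("molecular beam", "experimental set-up"),
    bibliographic="placeholder author, placeholder year",
    range=Range.MESOSCOPIC,
    conceptual_function="Methods",
)
problem = Characterisation(
    keywords=("specific problem",),
    bibliographic="placeholder author, placeholder year",
    range=Range.MICROSCOPIC,
    conceptual_function="Introduction",
)
```

Keeping the four characterisations as separate fields means each one can be searched independently, for instance by selecting only the modules that are reusable across articles.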

Our main division in modules is based on the characterisation of the information by its conceptual function. Our starting point was the prototypical section structure of scientific articles: Introduction, Methods, Results, Discussion and Conclusions. By the conceptual function, we distinguish the following modules:

Links

In the present practice of hypertext linking, the relations between the linked objects are often left unclear to the reader. A standard hyperlink only indicates that the author has some relation in mind between, for example, a blue underlined word and something else. In standard HTML-documents full of links, we are directed from nowhere to everywhere and back.

In our modular model, a link is defined as an explicitly characterised, directed connection between modules or parts thereof (e.g. words or sentences) that represents one or more kinds of relevant relations. Characterising links by the relations they express and by the modules they connect enables the reader, firstly, to make a well-considered choice whether or not to follow the link and, secondly, to take the links into account in the process of locating and retrieving relevant information. Each link is also characterised by the bibliographic data of the author who identified these relations and created the link, so that links become proper 'information objects' with well-defined metadata. This ensures the authenticity and priority of each 'information object' when new links or modules are added to published work.
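A link defined in this way carries more than a target address. The sketch below is one possible reading, not the authors' data format: the `Link` class, its field names, the module identifiers and the author labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Link:
    """A directed, explicitly characterised connection between modules."""
    source: str                  # module (or part of a module) the link starts from
    target: str                  # module the link points to
    relations: Tuple[str, ...]   # one or more relation types the link expresses
    created_by: str              # bibliographic data of the link's author


def links_expressing(links: List[Link], relation: str) -> List[Link]:
    """Select the links that express a given relation, e.g. for retrieval."""
    return [link for link in links if relation in link.relations]


# Hypothetical links; identifiers and author labels are placeholders.
links = [
    Link("results", "methods", ("dependency",), "Author A"),
    Link("results#term", "glossary", ("definition", "range-based"), "Author B"),
]
definitions = links_expressing(links, "definition")
```

Because the relation types are data rather than implicit in the link text, a reader or a retrieval system can filter on them before deciding which links to follow.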

In our analysis, we identified different types of relations that are relevant in modular scientific articles and formulated a typology for the links in the modular structure. We distinguish two main classes of relations: organisational relations and scientific discourse relations. In the class of organisational relations, which express the organisational coherence of the modular network, we distinguish the following six types of relations:

  1. hierarchical: an asymmetric relation between complex modules and their constituent modules,
  2. proximity-based: a symmetric relation between linked modules expressing whether they are part of the same collection (in particular, the same article or set of articles),
  3. range-based: an asymmetric relation expressing the difference in range between linked modules,
  4. administrative: an asymmetric relation between 'scientific' modules and the modules representing their meta-information,
  5. sequential: an asymmetric relation between modules linked to form a complete or easier reading path,
  6. representational: an asymmetric relation between different representations of the same information (e.g. in texts, tables and figures).

An important aspect of links based on organisational relations is that they can be assigned semi-automatically, provided the authors have appropriate authoring tools at their disposal.
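As an illustration of what semi-automatic assignment might look like, hierarchical links can be proposed directly from the composition of complex modules, leaving the author to confirm them. This is a minimal sketch under assumptions: the function name, the tuple layout and the module identifiers are invented for the example and do not describe an existing authoring tool.

```python
from typing import Dict, List, Tuple

# (source module id, target module id, relation type); layout is an assumption.
LinkTuple = Tuple[str, str, str]


def hierarchical_links(composition: Dict[str, List[str]]) -> List[LinkTuple]:
    """Propose one hierarchical link from each complex module to each constituent."""
    links: List[LinkTuple] = []
    for parent, children in composition.items():
        for child in children:
            links.append((parent, child, "hierarchical"))
    return links


# Hypothetical composition: which complex module contains which modules.
composition = {
    "methods": ["beam-source", "detector"],
    "article": ["methods", "results"],
}
proposed = hierarchical_links(composition)
```

The other organisational relations could be proposed in a similar way from information the authoring tool already holds, such as the collection a module belongs to or its range.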

The second main class of relations, scientific discourse relations, allows authors to indicate why they refer to another module or another part of the same module. Following speech communication research, we arrive at two subclasses of scientific discourse relations. The first subclass is based on the communicative function. Within this subclass, we distinguish argumentation and elucidation relations. The latter are further subdivided into explanation and clarification (in particular, specification and definition). The author can, for instance, connect a difficult term to an 'encyclopaedic' macroscopic module by a link expressing a 'definition relation'. The second subclass of scientific discourse relations comprises content relations, such as dependency, elaboration, similarity, synthesis and causality. For example, in the problem-solving process, results depend on the methods used to generate them. The author can explicitly express that dependency relation in a link connecting the Results module to the relevant Methods module.
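The subdivision of scientific discourse relations described above forms a small tree, which can be written down as nested data. The sketch below is illustrative only: the nested-dictionary encoding and the `path_to` helper are assumptions, and the content relations listed are only the examples named in the text, not necessarily the complete set.

```python
from typing import Dict, Optional, Tuple

# Each key is a relation class or type; each value holds its subdivisions.
DISCOURSE_RELATIONS: Dict[str, dict] = {
    "communicative function": {
        "argumentation": {},
        "elucidation": {
            "explanation": {},
            "clarification": {"specification": {}, "definition": {}},
        },
    },
    "content": {
        "dependency": {},
        "elaboration": {},
        "similarity": {},
        "synthesis": {},
        "causality": {},
    },
}


def path_to(
    relation: str,
    tree: Optional[Dict[str, dict]] = None,
    prefix: Tuple[str, ...] = (),
) -> Optional[Tuple[str, ...]]:
    """Return the chain of classes leading to a relation, or None if unknown."""
    if tree is None:
        tree = DISCOURSE_RELATIONS
    for name, children in tree.items():
        here = prefix + (name,)
        if name == relation:
            return here
        found = path_to(relation, children, here)
        if found is not None:
            return found
    return None


definition_path = path_to("definition")
dependency_path = path_to("dependency")
```

With such a lookup, a link tagged 'definition' can also be retrieved by a query for the broader classes 'clarification' or 'elucidation'.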

Applicability of our model

We developed the model in conjunction with an analysis of a corpus of articles on experimental molecular dynamics published by a single research group. However, a brief inspection of publications in other domains showed that modular models for other types of publications can be derived from our model.

To test the model, we rewrote two strongly related articles from our corpus as modular electronic articles (demo in progress). The modular model is explicitly intended for the creation and evaluation of new work; nevertheless, in recasting old work into the new mould, we found that modular electronic articles can meet our pre-defined requirements better than linear articles. In particular:







The URL of this page is: http://www.wins.uva.nl/projects/commphys/papers/infwet.html

Additions, corrections and comments concerning this page are always welcome. Please send them to: harmsze@wins.uva.nl. Contact webmaster@wins.uva.nl if you have problems with the server.
Last updated: 9-9 1999