Pre-Print for your information, submitted to Journal of Documentation. Final version: 23 April, 1997, 9.35 pm (A slightly shortened version was published in Journal of Documentation, vol. 54, no. 2, March 1998, pp. 210-235)
Joost G. Kircz
Van der Waals-Zeemanlaboratorium
University of Amsterdam, Valckenierstraat 65, 1018 XE Amsterdam, The Netherlands
P.O.Box 103, 1000 AC Amsterdam, The Netherlands
The development of electronic publishing heralds a new period in scientific communications. Besides the obvious advantages of an almost endless storage and transport capacity, many new features come to the fore. As each technology finds its own expressions in the ways scientific communications take form, we analyse print on paper scientific articles in order to obtain the necessary ingredients for shaping a new model for electronic communications.
A historical overview shows that the typical form of the present-day linear (essay-type) scientific article is the result of a development over the centuries. The various characteristics of print on paper are discussed and the foreseeable changes to a more modular form of communication in an electronic environment are postulated. Subsequently we take the functions of the present-day scientific article vis-à-vis the author and the reader as starting points. We then focus on the process of scientific information transfer and deal essentially with the information consumption by the reader. Different types of information, at present intermingled in the linear article, can be separated and stored in well-defined, cognitive, textual modules. To serve the scientists better in finding their way through the information overload of today, we conclude that the electronic information transfer of the future will be, in essence, a transfer of well-defined, cognitive information modules. In the last part of this article we outline the first steps towards a new heuristic model for such scientific information transfer.
The emergence of electronic means for publishing not only enhances the possibilities for information storage, retrieval, and dissemination, but also amplifies the existing problem of a massive information overload. Electronic storage and transmission of scientific information enables scientists to learn in real time about the newest thoughts, ideas, and proposals of every colleague around the world; it also puts at their immediate disposal all available works independent of their date of publication, relevance, language, or context.
The present system of scientific journal publishing is a way of structuring the available information to enable the communication between scientists; journal titles serve as signposts for fields as well as quality marks. Indexing and classification terms next to authors' names and addresses facilitate the retrievability of a work. In an electronic environment these standard indicators can be manipulated in a vastly enhanced way. The question at hand is: will the speed-up and further sophistication of indicator manipulation, complemented with new capabilities such as free text searching, be sufficient to cater for the scientific communication needs in an electronic environment? In this article we argue that the very form of scientific communication might change in such a way that a more precise and to-the-point communication system will emerge. The present structure of the scientific article itself is up for review.
In the tradition of the paper article a clear document structure emerged: a structure characterised by its linear build-up and closed character. In an electronic environment, where all articles and parts thereof can be interconnected, new ways of presenting scientific information will develop.
As in the development of every new technology, the first, introductory, phase is characterized by casting the old and trusted forms into new frames of reference. However, in the course of time, the new technology will impose its specific characteristics onto the presentation of the content. Just as the first automobiles mimicked horse-drawn coaches, the first electronic journals mimic their printed ancestors. So it is obvious that the current endeavour of simply bringing the existing culture of article writing into the electronic era is only part of the story. In this article we describe a research programme which tries to go beyond the simple transplantation of printed scientific articles into electronic media, and tries to grasp the essential new features that electronic information transfer may induce in scientific communication.
The object of this paper is to show that the standard, linear, essay type of research paper is a typical historical product of print on paper and that, in an electronic environment, this presentation form might be supplanted by a coherent set of linked modules. We also discuss the communicative and reporting functions of a scientific article needed to define such modules.
In order to set out on our road towards a new model for shaping scientific communication in the electronic era, it is important to analyse what the essential functions of the written scientific article are, how they came into being, and what their intrinsic demands are with regard to the technologies involved. It is important to understand which characteristics of scientific communications are essential for the discourse of science, and which are typical for the technologies applied. In this study we restrict ourselves to textual scientific communications, in particular in physics, in the form of the regular journal article [a], as being the most popular and universally accepted form of scientific communication. The present-day scientific article is the result of a very long development of social and technological changes and innovations. In the emerging electronic era, it is immediately clear that usage and form of scientific communication will change drastically. Electronic mail, bulletin boards, news groups, hyper-links within and between various, even geographically separated, texts, integration of sound, motion pictures and text: these all herald deep and decisive changes. Most of the examples in electronic publishing we see mushrooming now are first attempts to transpose the paper product into an electronic medium. The classical journal article is stored in an electronic medium (CD-ROM or server-memory in a network) and endowed with new technological features such as hyper-links and colour, while sound, pictures, film files, etc. are often added as digital appendices. On this vibrant testing ground, many a new technology is introduced and exuberant claims and forecasts are aired. It is explicitly not our aim to review all these technological innovations; we take them for granted in our analysis.
In the present paper we try to develop a deeper understanding of the historical context and of the functions of the present-day scientific article in its social and technological context, as a basis for the development of new, modular, ways of presenting scientific information which are in line with the established functions, but are fundamentally shaped by electronic opportunities. In this respect it is interesting to refer to the electronic publishing plan of the Association for Computing Machinery (ACM) of 1995, in which documents are defined as "object oriented, with some components being other objects already published on the Web", which is exactly in line with the central argument of this paper, that electronic publications lend themselves to modular structuring. In the subsequent sections we deal with the historical roots of the standard scientific article and discuss its characteristics vis-à-vis the pre-journal period. At the same time we try to list (media independent) features that will remain, as well as features that will change in the electronic era. In part 3 we discuss the functions of the present-day scientific article. Next to the author's intentions of presenting and disseminating new findings, we emphasize the demands of the readers in their various consumptive rôles. In part 4, we combine our analyses into a first outline of a new heuristic model for a modular representation of scientific information.
The use of written texts for scientific communications, as we know them, is relatively recent. Within the paper based scientific journal article, many historical patterns are drawn together; they form a solid fabric of social and technological trends which, in their present and also historically final form, play an essential rôle in the whole organization of scientific research. In order to carry over these trends into an electronic environment, we will now review the main historical threads.
In the centuries-long process from orality to literacy, the scarcity of durable materials for information storage constituted one of the crucial limitations for scientific debates and developments. The societal needs of record keeping for religious, tax, calendrical, and trade purposes formed a strong driving force for the development of new technologies. The difficulties of storing large amounts of information and the great skill needed to prepare the memory medium as well as the actual writing kept the art of writing confined to priests and scribes of the rich and mighty. Most early writings were related to holy texts, stocks, historical facts, and astronomical data. In all the steps from the oral tradition to the first inscriptions in stone, clay, papyrus, or wax, and further to vellum, parchment, and paper, new forms and techniques for storing and presenting knowledge developed. Not only did the ever greater storage capacity enable a change from oral mnemonic aids to typographical structures, but the whole structure of reporting changed.
The early scientific texts show a high level of factual knowledge [4,5] but it took a considerable time before a true culture of real scientific information exchange and debate emerged. In this paper we will not enter into the discussion of the social aspects of the change from orality to universal literacy and its consequences, although this discussion is certainly very important in understanding the changing patterns of communications as a result of changing technologies [6,7,8,9]. With the invention of the phonetic alphabet by the Greeks, generalized literacy emerged and consequently elaborate scientific treatises developed in Greece, heralding the birth of modern science. However, it would take until the 17th century before we witness the full power of written science.
The writing of the middle ages was to a large extent characterized by the copying of existing scripts by scribes; monks performed this tedious work in the claustrum of the monasteries, not for the benefit of science or knowledge, but as a religious activity that was economically advantageous for the monasteries. The painful and technically complicated labour of preparing the vellum, parchment, pen, and ink was a craft not necessarily linked to the intellectual consumption of the information. Troll even concludes: "The obvious conclusion is that the Middle Ages were populated by copyists who often could not read what they copied: the text could be in a foreign language or script that the scribe could not understand". Marshall McLuhan phrased it in his typical sweeping way: "The medieval book trade was a second-hand trade even as with the dealings today in `old masters'". We also have to remember that before Charlemagne (8th century) there were no word divisions, punctuation, or paragraphing. Reading in the middle ages was still not a silent activity; texts were muttered, since the written text was difficult to understand. Written texts were made to repeat existing and universally accepted knowledge. For a long period, the unique and extremely expensive hand-written texts fitted nicely into a culture where science was scholastic and characterized by interpretation of authoritative texts (mainly from Aristotle). Texts did not yet serve as universal tools for active debate, but represented eternal wisdom, which was interpreted and repeated. Science and its medium were essentially static.
Most interhuman communications remained oral and hence local. Knowledge percolated through society mainly through travelling bards and savants; this continued in the tradition of the Grand Tour for young students. The art of reading and writing as an integrated craft for shaping social knowledge took a longer route. With every new technology, humankind shaped itself a new environment for communication, distinctly different from the previous one. A very important aspect, which is the pivot of McLuhan's message in his Gutenberg Galaxy, is that with the emergence of a paper culture and a fortiori the decline of oral communication, communication loses context (in the form of music, performance, and debate in personal contact), but also becomes universal in the sense of being non-local. Text starts to stand on its own feet, and from then on can transfer information independently of time and distance. To quote Olson: "Literacy generally, and printing in particular, fixed the written record as the given against which interpretations could be compared. Writing created a fixed, original, objective `text'; printing put that text in millions of hands".
The use of text as the medium for scientific discourse resulted in stringent specific features, for example with regard to the standardization of the form of the regular scientific article; features which might again be relaxed in the many-media environment of tomorrow [b]. In this sense, McLuhan's exuberant emphasis on the role of media as a prime mover of human activities and organisation provides most interesting insights into the intertwining of the technologies available and the kinds of intellectual expression.
In the western tradition, the real breakthrough in scientific communication emerged only in the second half of the 15th century, following the invention of movable type and, later, the printing press. The establishment of the Nuremberg Press by Regiomontanus around 1470 can be taken as a starting point. The fundamental social and cultural consequences and epoch-defining applications of this invention are extensively dealt with by Eisenstein. A very important feature of the development of widely distributed texts is that the rôle of the printed text vis-à-vis the content changes. Texts as independent objects obtain a special status as carriers of truth. This is exemplified by Luther's statement that the meaning of the Scripture depended not upon the dogmas of the church, but upon a deeper reading of the text; hence text could explicitly state meaning without an interpretive context. As Olson [9, p.263] argues: "the shift in orientation.... of the `essayist technique' was one of the high points in the long history of the attempt to make meaning completely explicit".
With the breakthrough of the printing press, a series of essential new and lasting features of the printed work emerged. It is interesting for our discussion to list the most prominent ones and to relate them to the emerging features of electronic media (see section 3.1).
The new features of print on paper immediately found fertile ground in the economic and cultural development of the 16th and 17th centuries, when science secured its place as an important and often crucial production force. We see a change in patterns of communication between scientists as well as the establishment of more formal institutions which shape this communication into a more or less standardized form. In an interesting way, the social role of science and the capacity to disseminate science in printed form reinforce each other. The centuries-old tradition of secret crafts and private knowledge falls apart in the scientific revolution. Eamon describes how this "book of secrets" tradition is shattered by the new experimental approach of the emerging Baconian tradition. Secret knowledge becomes public knowledge when printed, but at the same time acquires the status of recognized intellectual property. The important aspects of scientific recognition, priority claims and rewarding systems, that took shape in the second half of the 17th century, are extensively analyzed by Merton.
The founding of scientific organizations heralded a new phase in the scientific revolution. They started to centralize the growing correspondence between scientists, correspondence which was truly international and wide ranging. Letters were copied and circulated or read at meetings. The big leap forward in scientific communication was made by the collecting of letters and reports in printed form, often as collections of correspondence of important scientists. The first real scientific journal, "Le Journal des Sçavans", appeared on 5 January 1665 in Paris under the editorship of Denis de Sallo de Coudraye. The breakthrough occurred with the publication of the "Philosophical Transactions: Giving Some Accompt of the Present Undertakings, Studies and Labours of the Ingenious in Many Considerable Parts of the World", on 6 March 1665, under the auspices of the Royal Society of London. This publication served for a very long time as a model for new scientific journals. The publication of the Transactions reflected the new ideal of "cooperative natural philosophy", mainly based on "Bacon's belief that the acquisition of knowledge was somehow an automatic process, once the correct procedure was followed". This concept shaped a uniform literary style and presentation of the articles based on a commonly accepted way of reporting. Thus a kind of neutral, that is to say collectively accepted, authority emerged which standardized the scientific article. The strong emphasis on experimental work was a distinct break with the scholastic tradition, "rather than being a generalized statement of how some aspects of the world behaves, it was instead a report on how, in one instance, the world had behaved" [22, p. 152]. This aspect introduced and institutionalized the new, and still crucial, concept of peer review, as the reports were extremely detailed about all circumstantial facts and gave listings of all those who witnessed the experiment [c].
Under the unique editorship of the world's first truly scientific journal editor, Henry Oldenburg [24,25], the essential characteristics of the modern scientific article took shape. As our research analyses ways to change scientific reporting practice as a function of changing media, it is important to underline the essential features of these printed articles. For that reason, it is useful to cite Oldenburg where he emphasizes, in a letter to the great chemist Boyle, the role of the Society in safeguarding the integrity and intellectual ownership of scientific reports, which he vested in the Transactions. "...to communicate freely to ye Society, what new discoveries he maketh, or wt new Expts he tryeth, the Society being very carefull of registering as well the person and time of any new matter, imparted to ym, as the matter itselfe; whereby the honor of ye invention will be inviolably preserved to all posterity". This is a clear statement of two enduring features of a scientific report, namely the date of receipt as formal date of claim and the pertinent claim of intellectual ownership of the work presented by the author. Even more clearly, Oldenburg writes in another letter: "This justice and generosity of our Society is exceedingly commendable, and doth rejoyce me, as often as I think on't, chiefly upon this account, yt I thence persuade myselfe, yt all Ingenious men will be therby incouraged to impart their knowledge and discoveryes, as farre as they may, not doubting of ye Observance of ye Old Law, of Suum cuique tribuere (allowing to each man his own)". It is clear that the above defined rôles, which Oldenburg anchored in the scientific article as a controllable account of a work, which is properly registered, dated, and intellectually owned, remain valid in an electronic era. The important question is how these established features change their form with changing reproduction techniques.
In the same period we see a strong emphasis on printing in vernaculars instead of Latin. This enhanced the development of national scientific activities, but shattered the real international information system which existed until then. Only after World War II do we see with English the resurrection of a single Lingua Franca and a new transparent worldwide information system.
Soon after the above mentioned trail-blazers, the number of scientific periodicals mushroomed. The first abstract journals had already appeared at the beginning of the 18th century, whilst the well known outcry "This is truly the decade of the journal, and one should seek to limit their number rather than to increase them, since there can also be too many periodicals" was already aired in 1789 in the Neues medicinisches Wochenblatt für Aerzte [21, p. 171]. The first citation analysis, in the form of cases cited by the cases printed in volumes of judicial reports, was already published in 1743. Slowly the different periodicals (almanacs, book series, news and review journals, etc.) developed into real scientific journals for the scientists' use only, in the double rôle of a repository as well as a vehicle for dissemination of information. In other words: upon first publication in a printed journal, alongside the news value, an archival function is immediately fulfilled as well, since the journal copy is the basis of a library collection.
In the emerging electronic environment, the repository function is clearly distinct from the dissemination function, which is now independent of the location of the archive. The publication itself can travel freely over the various electronic networks, whatever the location of the original hard-copy (if any) is. The use of so-called Uniform Resource Locators (URLs) in a World Wide Web environment is an attempt to maintain to a certain level the tradition of a local archive. It goes without saying that in an electronic environment the journal name or the publisher's imprint attached to each publication still acts as an identifier for a certain quality of authenticity, integrity, and certification.
The pace of the creation of new scientific journals accelerated over time and, particularly after WWII when science became a fully integrated component in the worldwide economic and military competition, the flood surged to the present complete overload.
It is worthwhile to mention some figures here. If we take the number of items for pure physics in Physics Abstracts, the major bibliographic abstracting service in physics and the producer of the INSPEC database, we see an almost stable growth rate from 1955 onwards. In that year about 10000 items were indexed, which surged to almost 174000 items in 1996, of which about 146500 are journal articles.
We have to realize that these items are mostly English publications which the database producer considered important. Next to those a very large number of less important works exist, setting aside all those tens of thousands of publications in other languages.
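As a rough illustration of what the quoted figures imply, we can work out the average annual growth rate. This is only a sketch: it assumes purely exponential growth between the two quoted years, treats the rounded counts of 10000 (1955) and 174000 (1996) as exact, and the symbols N, r and T_2 are introduced here for the purpose.

```latex
% Assumption: constant exponential growth between 1955 and 1996 (41 years).
\[
  N_{1996} = N_{1955}\,(1+r)^{41}
  \quad\Longrightarrow\quad
  r = \left(\frac{174000}{10000}\right)^{1/41} - 1
    = 17.4^{1/41} - 1 \approx 0.072 .
\]
% Corresponding doubling time of the yearly number of indexed items:
\[
  T_2 = \frac{\ln 2}{\ln(1+r)} \approx \frac{0.693}{0.070} \approx 10 \ \text{years}.
\]
```

Under these assumptions the yearly number of indexed physics items grew by roughly 7% per year, doubling about once per decade.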
With all this in mind, we have to realize that researchers do not search for information of the last year only. In the exploratory phase all available information is of interest, so the pool of searchable information reaches millions of items. The growth rate of information in pharmacy, medicine and life sciences is, in general, much higher than in physics. On top of that, studies show that, partially due to the continuing specialisation in science, the difficulty of reading an article by somebody who is not a specialist is increasing, worsening the accessibility. All in all, it means that the computer as a storage and indexing facility is a double-edged sword. On the one hand the computer enables superior searching; on the other hand it allows huge quantities of information to be included in the pool, which in turn degrades the search results.
The prime problem in electronic storage of information is therefore not the storage itself, but the formats of storing and presentation and the various methods of indexing. Search and retrieval by a scientist in a paper world are characterized by things like word spaces, page numbers, typographical distinction between different kinds of information, and foot- and end-notes, which shape the printed information. In the electronic era the intrinsic structural formats are an important subject of research.
From the above historical sketch we hope to have made clear that the features of a scientific journal article are characterized by two interwoven threads: on the one hand we have the primary uses of scientific information namely, discussion and knowledge dissemination; on the other hand we have the technical possibilities for archiving, re-using, indexing, commenting, improving, digesting, and retrieving. These two threads cannot be completely disentangled as they do to a large extent define each other. In the emerging world of electronic publishing we have to be precise in determining which functions of the "classical" article remain and to what extent their form might be altered through electronic storage. A second and equally important question is: which functions typical of the paper world will cease to exist, or be fundamentally altered, and which new functions will emerge? A simple example is the erratum, a typical print-on-paper invention. In an electronic environment one could argue that we don't need errata. If a mistake is identified the electronic file can be up-dated. The file date or its version number will then inform the reader which file is the most recent and hence the correct one. In doing so, two problems have to be dealt with namely: a) it is important to keep the very first (original) version to enable comparison with the corrected one(s), as a reader of the original version has to know what has been corrected to understand the correction [d]; b) many errata are not simply misprints but comprise arguments or in-depth corrections. In such cases up-dating blurs the uniqueness of the original and hides possibly important discussion; in short, the scientific integrity is at stake. There the erratum should be considered as a comment to a communication and hence, should be appended permanently to the original instead of being integrated.
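The two requirements on electronic errata discussed above can be made concrete in a small data-structure sketch. It is an illustration only: the class names `Version`, `Erratum` and `Article`, their methods, and the sample texts and dates are all hypothetical, not part of any existing system. Misprint corrections create a new version while every earlier version stays retrievable; substantive errata are appended as comments and never merged into the text.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass(frozen=True)
class Version:
    """One immutable state of the article text."""
    number: int
    issued: date
    text: str
    note: str  # what changed relative to the previous version


@dataclass(frozen=True)
class Erratum:
    """A substantive correction, kept as a comment rather than merged."""
    issued: date
    text: str


@dataclass
class Article:
    versions: List[Version] = field(default_factory=list)
    errata: List[Erratum] = field(default_factory=list)

    def publish(self, text: str, issued: date) -> Version:
        """Store the very first (original) version."""
        if self.versions:
            raise ValueError("already published; use correct_misprint()")
        version = Version(1, issued, text, "original")
        self.versions.append(version)
        return version

    def correct_misprint(self, text: str, issued: date, note: str) -> Version:
        """Add a corrected version; all earlier versions stay available."""
        if not self.versions:
            raise ValueError("publish() the original first")
        version = Version(len(self.versions) + 1, issued, text, note)
        self.versions.append(version)
        return version

    def append_erratum(self, text: str, issued: date) -> Erratum:
        """Attach an in-depth correction without altering any version."""
        erratum = Erratum(issued, text)
        self.errata.append(erratum)
        return erratum

    @property
    def original(self) -> Version:
        return self.versions[0]

    @property
    def latest(self) -> Version:
        return self.versions[-1]


# Hypothetical usage: one misprint fix and one substantive erratum.
article = Article()
article.publish("The gap is 1.2 meV.", date(1997, 4, 23))
article.correct_misprint("The gap is 1.2 eV.", date(1997, 5, 2),
                         "unit misprint: meV -> eV")
article.append_erratum("The fit in section 3 omits a background term.",
                       date(1997, 6, 1))
```

A reader of the original can compare `article.original` with `article.latest` and read the `note` to see what was corrected, while the erratum remains a separate, permanently attached comment.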
One of the most important aspects of a scientific communication is its portability. A text once composed must keep its integrity and form irrespective of the transporting medium. In an oral society many mnemonic techniques were developed to help the performer recite the texts. Stanza, rhythm and rhyme are typical examples. In written text we experience the conversion from speech into print and a developing bifurcation between the spoken and written text. The written text assumes its own independent form; the need to memorize the whole text ceases. Writing provides the essential memory function, hence the text can become extremely entangled and complicated as the reader is able to re-read parts and can scan forward and backward through the text. With the change from hand-written to printed texts this aspect is only enhanced. However, the memory function of the printed text is not the end of the story; a printed text is characterized by its wholeness. It is a narrative, a complete story, including the embedding of the actual work in relation to other works, discussions on one's own and others' activities, results and conclusions. A printed scientific text may be torn out of a journal and stored as an independent entity. Indeed one of the most important characteristics of the printed text is that it is a single, transportable, and unique entity. This feature is a typical result of the technology used. As mentioned in the previous section, presentation and storage are comprised in the same medium, namely a printed text. In the electronic environment this essential uniqueness will be abolished and storage and presentation become disentangled.
The outcome of the wholeness and narrative character is that the structure of printed text is linear: it has a beginning, a digression and an end. Consequently, the rhetoric of the printed text also has a particular form; the author argues for or against an opinion of others in the absence of the other. In contrast to a real debate (as in the oral environment of a scientific conference) no local context exists, hence arguments have to be tailored such that they cater for the unknown general reader.
In an electronic environment, where storage and presentation are no longer integrated, the reader is no longer obliged to follow the whole line of the author's reasoning but may select those parts that he or she fancies. It becomes then very natural for the reader to shuffle text around. This by itself might change the patterns of reasoning in an electronic environment. Of course such partial reading already happens daily when people browse and scan articles to find out if a particular paper is worthwhile to read in toto (or to cut-out or photocopy for further reference), but it is not a natural way of reading after the decision has been made to make use of a particular article.
In an electronic environment the natural human activity of scanning and browsing can essentially be catered for. In the first place, of course, by proper mark-up of the text so that the scanning reader can easily jump from, for instance, section header to section header. In this case, the typographical reading aids of the printed text are being elevated to a more conceptual level where the content (e.g. new section) and the presentation (e.g. italics or boldface letters) are disconnected. The development of text interchange projects such as Standard Generalized Mark-Up Language (SGML) [30, 31, 32, 33] is a natural consequence of this.
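The disconnection of content from presentation can be illustrated with a minimal sketch. It does not use SGML itself: the role names `section` and `para`, the sample sentences, and the three functions are assumptions made for this example, and the tagged output is only a generic tag notation, not a valid SGML document. The point is that the same logical structure supports both header-to-header scanning and several independent renderings.

```python
from typing import List, Tuple

# Each element records its logical role, not its typographical appearance.
Element = Tuple[str, str]  # (role, text)

document: List[Element] = [
    ("section", "Introduction"),
    ("para", "Electronic publishing changes scientific communication."),
    ("section", "Historical overview"),
    ("para", "The journal article developed over centuries."),
]


def section_headers(doc: List[Element]) -> List[str]:
    """Let a scanning reader jump from section header to section header."""
    return [text for role, text in doc if role == "section"]


def render_plain(doc: List[Element]) -> str:
    """One presentation: headers in capitals, paragraphs unchanged."""
    return "\n".join(text.upper() if role == "section" else text
                     for role, text in doc)


def render_tagged(doc: List[Element]) -> str:
    """Another presentation: wrap every element in its role name."""
    return "\n".join(f"<{role}>{text}</{role}>" for role, text in doc)
```

Here `section_headers(document)` yields the two headers without touching any layout information, while `render_plain` and `render_tagged` decide independently how a section header should look.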
Next to the development of better structuring of existing documents, a much more radical approach would be to further transcend this development by breaking apart the linear text into independent modules, each with its own unique cognitive character. Since normally a general reader is looking only for parts of the information stored in a scientific article, a natural consequence of the split between storage and presentation would be the separate storage of unique pieces of information. In the historical trail from orality, via writing, to electronic information processing we have reached the stage where comprehensive communication no longer needs a linear build-up. A complete set of modules, each being in itself a (small) text emphasizing one aspect of the message, which together establish a complete message from author to reader, is the next natural step in scientific communication. Such modules will also include new forms of debate, where arguments gain in context as they are clearly linked to a particular type of cognitive module. Every kind of information (pure data reporting or elaborate mathematical digressions) can then be shaped in a style and form which is best fitted for that particular kind of information, independent of the existence of other types of information, but of course coherently linked to the other modules of the same work.
With print-on-paper scientific articles we have independent articles; with an electronically stored article we have intrinsic connectivity between all parts of all works stored in the accessible memory. In a modular build-up it becomes possible to collect only those parts (modules) of a set of articles (e.g. all articles from a certain laboratory) which have a particular value for the searching reader (e.g. only equipment descriptions). In such an environment it also immediately becomes clear that much information which is, for reasons of coherence, necessarily duplicated in separated paper documents, can be merged to single entities in an electronic storage medium. It is this development which we research and want to develop.
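A minimal sketch of such a modular store may make the two operations mentioned here more tangible: collecting only one kind of module from one laboratory, and merging information that is duplicated across articles. Everything in it is hypothetical, including the `Module` fields, the module kinds, the laboratory names and the sample texts; it illustrates the idea and is not a proposed implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Module:
    article_id: str
    laboratory: str
    kind: str  # e.g. "equipment", "results", "discussion"
    text: str


def collect(modules: Iterable[Module], laboratory: str, kind: str) -> List[Module]:
    """Select only the modules of one kind from one laboratory."""
    return [m for m in modules
            if m.laboratory == laboratory and m.kind == kind]


def merge_duplicates(modules: Iterable[Module]) -> Dict[str, List[str]]:
    """Keep one copy of each distinct text and list the articles that share it."""
    merged: Dict[str, List[str]] = {}
    for m in modules:
        merged.setdefault(m.text, []).append(m.article_id)
    return merged


# Hypothetical store: two articles from one laboratory reuse the same equipment.
store = [
    Module("A1", "Lab X", "equipment", "Dilution refrigerator, 20 mK base temperature."),
    Module("A1", "Lab X", "results", "Resistance drops sharply below 1 K."),
    Module("A2", "Lab X", "equipment", "Dilution refrigerator, 20 mK base temperature."),
    Module("A3", "Lab Y", "equipment", "Pulsed magnet, 40 T peak field."),
]

equipment = collect(store, "Lab X", "equipment")
shared = merge_duplicates(equipment)
```

In this example `equipment` holds the two equipment descriptions from Lab X, and `shared` stores their identical text once, linked to both articles A1 and A2.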
In the first parts of this article we presented a short historical overview in which we indicated how the societal developments, intertwined with the available technologies, shaped the, now standard, scientific journal article characterized by its linear essay form. We also argued that the present stage of scientific communications, as a mass activity contrary to the pre-WWII situation of a discussion between a more limited number of scientists, demands new ways of presenting, storing, and retrieving scientific information.
The new electronic technologies that are called in to rescue the present situation of information overload are, at present, mainly used for massive storage of complete documents on file servers and the transport of documents from author to file server and from file server to reader. The intrinsic capability for a flexible system of well-defined cognitive modules of various kinds of information is still under construction. Its possibilities are clearly heralded by the use of hyper-text systems where texts are mixed with other texts as well as with non-textual elements.
In this part of the paper we deal in more detail with the functions of the scientific journal article in order to add, next to the historical arguments, evidence that in a fully electronic environment information transfer can be characterised by modularity instead of the paper based essay form. In the fourth part of this paper we will present the scaffolding for a heuristic model for such a new modular structure.
It is illustrative to give a comparative overview of important changes in communication due to the printing press and to relate them to the unrolling electronic developments. Using Eisenstein [16, mainly chapters 2,6 and 8] we come to the following:
i) The reusability of old works or parts thereof
i.i) The printing press quickly induced massive reprinting of old, and often in the strict scientific sense, obsolete works. Although this introduced the birth of the information overload with all its noise problems, it also unified the widely scattered knowledge and data repositories of humankind. As Eisenstein clearly points out, this general availability of the human intellectual heritage was needed since the universal mastering and assimilation of all previous knowledge was necessary before it could be properly surpassed [16, p. 516].
At the moment we are already witnessing the trend of making all kinds of works available in electronic form. It indicates that in the electronic era, more than ever before, all previous scientific reporting, discussions and controversies become available as permanent sources for referencing, inspiration, and where needed dismissal. It also means that parts of old works can be easily integrated into new works. Hence, a new period of general information reevaluation can start.
i.ii) The printing press introduced the development of dictionaries, indexes, bibliographies, compendia, catalogues, and reference works. In other words, the emergence of proper registration functions and systems. It is interesting to see that history is repeating itself now; one of the largest activities on Internet is exactly this most elementary level of registration and indexing, as demonstrated by all the various Web crawlers, search engines and so forth.
ii) An enormous growth in the dissemination of identical information
ii.i) Next to the obvious rôle in advancing the education and general cultural level of society, printing also enhanced the integrity of the information as such, since deteriorating information due to heavy use, damage or aging can be checked against other copies of the same edition.
The availability of many identical copies allowed serious scientific discourse and exchange of views based on exactly the same information. This aspect became an essential ingredient of scientific development (including the concept of certification) and is, of course, an essential feature of electronic media too. A new, unique, feature of electronic dissemination is that we can always refer back to (an exact digital copy of) the original first copy. After all, the essential characteristic of digital information is that it creates perfect clones. However, we do have to realize that wilful corruption (or "improvement") is much easier in an electronic environment than with print on paper. So the other side of the coin is that the status of a printed report as a genuine representation of a unique work is challenged in the electronic era, where reading and manipulating are merged. Electronic watermarking schemes might rescue the situation.
ii.ii) An important related aspect is the use of books for self-study overtaking the old master-apprentice relationship. Knowledge is no longer coupled to a person but is easily available for the independent student. In an electronic environment "interactive textbooks" will complete this historical line with courses adaptable to the various levels and needs of the reading and learning students and scientists. Re-use of information also means that it should be stored differently than in the form of large comprehensive linear texts: more as a collection of units, modules, or objects which can be dynamically combined.
iii) The emergence of standardization of presentation and judgement
iii.i) The emergence of widely distributed printed rules and laws changed and standardised the entire legal and bureaucratic structure of the state. In the same way, in the course of this centuries-long process well established standards for writing and reporting emerged, which now appear natural. Standards in the chain of events from scientific experiment to publication are now vested in research protocols, instructions to authors, and research funding proposal forms. The quality control and certification procedures find their expression in Journal names and Imprints of publishing houses. Although quality and certification requirements will stay, standards will partly change in an electronic environment. Different standards and ways of presentation for different kinds of information will develop. For example, the presentation in electronic form of raw experimental data demands another standard of (manipulatable) presentation and judgement (e.g. in peer review protocols) than mathematical proofs or scientific claims.
iii.ii) In the electronic era, the standardisation of mathematical symbols will allow for symbolic manipulation programs and interactive maths where 'readers' can 'play' with the works presented, in order to understand these works interactively. However, we have to understand that this prospect is not an easy one to realise. It took our own Hindu-Arabic numerals far into the 18th century to become universally used, whilst an internationally accepted notation for mathematics and logic only took shape in the late 19th and early 20th century, without yet reaching universality.
iv) The development of typography
iv.i) The emergence of type fonts for many languages, such as Arabic, Greek, and Hebrew, secured old and/or threatened knowledge through typographical fixity.
iv.ii) Increasing familiarity with regularly numbered pages (in Arabic numerals), punctuation marks, section breaks, running heads, and indices helped to order the thoughts of all readers, whatever their profession or craft. In an interesting essay Katzen analyses the development of typographical and lay-out structures in a case study of the Philosophical Transactions from 1665 until today. Highlighted text, running headlines, and all other techniques to identify different kinds of information in a printed text are now transcended in functional approaches like the Standard Generalized Markup Language (SGML), where the information content is identified separately from its typographical representation. The ordering of information will change again, since page numbers will now cease to exist. New ways of structuring of and referring to information are needed. This is the subject of the present article.
v) New forms of data handling
Large-scale data-collections were subject to new forms of use. Here, of course, the printing press reached its highest peak with the development of ingenious and complicated tables, graphs, and fold-outs. These two dimensional presentations are now supplemented with all kinds of 3D computer modelling, sometimes even with a time evolution as a fourth coordinate. In an electronic environment new data-structuring methods, now also for non-textual information, are high on the agenda.
vi) The possibility of error correction
The invention of errata allowed the continuing improvement of works in subsequent print runs. In an electronic environment the character of an erratum will change as we already discussed in section 2.3. This aspect also points to the notion that collectively working on one article in an electronic environment does not have to lead to one homogeneous text. Also an electronic document does not demand a local (group of) author(s). Using electronic networks, geographically separated authors can work together on the same article. This has to be arranged in such a way that each change or addition can be properly registered and assigned to a particular partner. Real integrated discussion can become the hallmark of a modular electronic article.
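The requirement that each change or addition be registered and assigned to a particular partner can be made concrete in a small sketch. It is only one possible arrangement under simple assumptions: the classes `Change` and `ChangeLog`, their fields, and the author names are hypothetical and not part of any existing system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Change:
    """A single registered change (hypothetical record layout)."""
    author: str
    module: str
    description: str


@dataclass
class ChangeLog:
    """Append-only register of changes to a jointly written article."""
    entries: List[Change] = field(default_factory=list)

    def record(self, author: str, module: str, description: str) -> Change:
        """Register a change and assign it to the contributing author."""
        change = Change(author, module, description)
        self.entries.append(change)
        return change

    def by_author(self, author: str) -> List[Change]:
        """Return all changes assigned to one author, in order of entry."""
        return [c for c in self.entries if c.author == author]


# Two geographically separated (invented) authors work on one article.
log = ChangeLog()
log.record("Author A", "Methods", "Added detector calibration details.")
log.record("Author B", "Discussion", "Questioned the fit model.")
log.record("Author A", "Discussion", "Replied to the objection.")
```

Because entries are only appended, the discussion between the partners stays visible instead of being merged into one homogeneous text.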
From the above, we see a difference between functions that are related to the technology used (general presentation, typography, page numbers, etc., registration and indexing systems), and functions which really enhance scientific communication per se (the organisation of the certification practice and the different ordering of unequal types of information). It is clear that the form of some functions will radically change in an electronic environment, where no single pages (or perhaps even classical documents) exist, where parts of one publication can be directly shared by a variety of other publications, and where improved editions can be made in a continuous way.
In analyzing the communicative and reporting functions of the journal article we must start by considering the scientist as both author and reader, since in the process of performing research both rôles are exercised simultaneously. Hence we can distinguish general functions, as well as typical authors' needs and requirements on the one hand, and readers' needs and requirements on the other hand.
Using the results of a survey based on extensive interviews with scientists reported by Kircz and Roosendaal, and Van Rooy, we can summarize the following general, technology independent functions of a scientific journal article:
The first three functions deal with activities of the members of the scientific research community themselves, whilst the last function is normally vested in supporting bodies like the publisher and the library.
It is clear that all these functions have, in principle, a trans-historical core and new technologies must improve the expression of these functions, compared to the existing usage, in order to become successful.
The same survey reports that researchers have rather well-defined expectations of scientific publications. With regard to information needs, Kircz and Roosendaal discuss the following aspects: reliability, relevance, timeliness, presentation, and storage.
In the process of direct information acquisition the needs are characterized by the catch-phrases: time-to-access, convenience of structure, personal adjustment of retrieval profiles, comprehensiveness and integration of sources, transportability, generative power (related to serendipity), transparency, and costs. All these are still fairly general notions, but all have a clear technology dependent component.
In the process of a research programme different information consumption needs emerge at different stages. In almost every phase in the research process, with possibly the exceptions of periods of pure data collection or performing calculations, the researcher is a reader. For instance, in the first stage of an experimental research project general information of the subject must be scrutinized in order to position one's own plans properly in their wider context. In the phase of apparatus building very precise technical information is needed. In the phase of data-reduction and measurement interpretation various models and interpretations must be reviewed and assessed. In the phase of comparison with other work and conclusions the information needs are again different. In the final stage of article writing often a complete examination of the existing relevant literature is needed to ensure that the reported results are well placed in the already existing corpus of literature on that particular subject.
Thus, in various phases of the research process the information needs vary from the demand for very precise data to general insights into a broader context of the general scientific aspects. In the various phases from the starting of a new project to the final reporting in a scientific journal, the "author" is never the same type of "reader". Nevertheless, in the present-day situation all the information needs are to be extracted from a linear, essay type article. Hence readers use the article only partly, browsing journal articles with a goal in mind, as has been analyzed by Bazerman. For this reason the various types of indexing and classification systems are insufficient for the searching scientist. Index-terms and classification-codes refer to entire articles and not to particular information in its proper cognitive context.
The breakdown of an article into its sections and subheadings such as: Introduction, Experimental Set-up, Design of the Study, Theoretical Excursion, Discussion, Conclusions, etc., suggests some relief to the reader as Line put forward. Unfortunately the linear structure of the article entangles many lines of reasoning as argued by Kircz [34, p. 356], which makes it inexpedient to simply use the section headings as discriminators for the various types of information. For that reason it would be a great advantage if we could apply electronic techniques in such a way that we can better separate the various types of information, which will introduce a new coordinate of representation next to the index-terms, classification-codes, and the very words used in the article.
From the foregoing we infer that in an electronic environment the structure of scientific presentations can be better tailored to the reader's needs, as well as facilitating more concise and to-the-point writing, by breaking up the classical article and looking for its replacement by a coherent set of well-defined modules, each with its own cognitive characteristic. In doing so we have to ensure that the technology independent functions are well preserved and that the technology dependent functions are enhanced. In a modular model, where different kinds of information are presented separately, we envision a searching scientist will find the requested information more quickly and better placed in context.
In our analyses we take the following notions into account:
1) Different readers are looking for different types of information in the different stages of their research. This means that we have to know how the information needs of a reader develop during the various stages of a research process. Unfortunately, detailed investigations into the precise readers' information needs and literature searching behaviour as function of a developing research programme are not available. It is therefore useful to follow Kircz  where a division into four general categories of readers is made. In the course of a research programme a reader can change from one typology to another and back. We distinguish the following types of reader:
2) As the scientific article is essentially a communicative tool, a modularising of articles should at least retain the qualities of readability common to 'classical' linear articles. Hence all the modules, independently as well as in various clusters, must be properly readable texts. Some readers will only read one or more modules (see point 1 above), others want to read the full work. In all cases the reasoning and comprehensiveness of the texts have to be guaranteed.
3) A scientific article is not solely a compilation of scientific information and facts. The article is the culmination of the research process, in which ideas and results are disseminated to an audience of peers. The article is not only a reporting vehicle but is, to a large extent, a way of explaining, defending and questioning scientific ideas; it follows that the scientific article is also an argumentative text. The structure of a linear article follows a certain style of argumentation. However in a modular presentation the argumentative structure will be different since different modules may represent specific components in the argumentation. In this respect we depart from the idea of using the rhetorical structure of the entire article as a starting point for extra identifiers for retrieval purposes, as was proposed earlier by Kircz  and Sillince [41, 42]. Here we go an essential step further by explicating the argumentative structure in a new modular form of presentation. As we demand that every module is in itself a readable text, every module may have its own argumentative structure, as in every part of a discourse particular arguments exist. If we want to develop a new modular structure for scientific communication, we have to realize that the argumentative structure should be reconstructable by the reader. It is important to note here that we are not entering the discussion of whether scientific texts have a rhetorical function as such . We start from the observation that authors want to convey a message and want to persuade their community (and money granting institutions) of the relevance, reliability, quality, and importance of their work. We analyse how the structure of a scientific article, as part of an ongoing scientific (argumentative) discourse, can be made productive in shaping new forms of communication. The method we use is known as the pragma-dialectical approach in argumentation theory [44, 45, 46]. 
In this theory, rules are formulated which form a code of conduct for rational interlocutors who want to act reasonably, in order to guarantee a structured, fair argumentative discourse. In this approach the starting point is that the interlocutors try to solve a difference of opinion by critically evaluating the standpoint at hand. The method is pragmatic because argumentation is considered a form of goal oriented language use and is guided by certain communicative rules, which transcend pure logical structures of premises and conclusions. The method is also dialectic because the argument is considered part of a discussion: one party (e.g. the author) puts forward a standpoint and presents evidence and arguments to support this point of view, whilst another party (e.g. a close colleague in the same field or an unknown reader) might have a totally different opinion or indeed no opinion at all.
The pragma-dialectical model serves as an analysing tool as well as a judging tool in order to assess if and when a discussion goes astray. In using the pragma-dialectical analysis we attempt to reconstruct the reasoning in order to formulate guidelines for writing modular structured articles.
Having argued that modularity is a natural next step in the history of information transfer and presentation and having analysed the various functions of a present-day scientific article vis-à-vis author and reader, we arrive at the point where we outline our efforts of building a new modular model for scientific information.
In this part of our paper we will describe how we analyse a corpus of regular articles in the field of Atomic Beam Physics in order to grasp the underlying cognitive entities. The aim is to break down the linear essay form into well defined cognitive chunks, or information modules. Of course, mapping the content of the essay form into a set of various discrete modules is not a one-to-one process. The analysis shows that often information is repeated, whilst other information is missing. We try, in fact, to envision the information contained in the author's mind before it was cast into a print-on-paper essay. We will then cast the same information into connected, but self-contained, modules.
A second line of attack in dealing with the problem is the analysis of the way the essay article is presented in an attempt to better understand the reasoning behind the presentation. We need this information to enable ourselves to structure the desired modules more coherently.
In order to specify these ideas on modularity in scientific articles we are investigating a sample of regular articles in physics. The selection of the set of articles was practical.
1) The set comprises articles of a coherent research programme in molecular collision physics by the group of Prof. J. Los from the FOM Institute for Atomic and Molecular Physics in Amsterdam. This is the field in which one investigates the reaction dynamics between two colliding chemical substances in the gas phase (e.g. the metal Sodium and the halogen Bromine). They are analysed by spectroscopic techniques which provide information on the interactions of the species which ultimately leads to valuable information about the chemical behaviour of atoms
2) The set of articles reports on internationally acclaimed top level research.
3) The articles deal with experiments as well as theory.
4) The subject is sufficiently familiar to the members of our research group to deal with, whilst the senior scientist, Prof. J. Los, participates as consultant for intrinsic physics matters.
5) The articles are written in the conventional style, hegemonic for physics.
We are dealing with closely related articles which are, by themselves, part of a large international area of research. As every article is an independent publication, it becomes immediately clear that much information is repetitive and serves to introduce each new reader of each new article into the subject's world. On the other hand each article contains sufficient new and pertinent information on many aspects of this complicated research programme to have warranted the publication of a new article.
In order to envision a modular structure for scientific articles we have first to analyse the classical article and identify which characteristics we can derive from there. We must then discuss how we can use these features for reshaping the information in another form. An important issue is that the final new structure must at least be as readable as the original version (see section 3.3 point 2).
In our first, heuristic attempt we start by identifying four different kinds of characterisation for the modules.
This division proves to be very useful since it enables us to define information "packages" which are, in principle, re-usable in a series of communications. This way the well known repetition of information in linear articles can be avoided, by separating common information into separate modules, to which, in an electronic environment, each new communication can refer (e.g., with the help of a hypertext link).
Within these four types of characterisation we consider elementary and composite modules. An elementary module is the smallest unit of an article with a precise characterisation. A composite module is a larger unit composed of various elementary sub-modules. An example is the distinction between the elementary modules Experimental and Theoretical methods within the composite module Methods.
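The distinction between elementary and composite modules can be pictured as a simple tree. The following minimal sketch assumes that a composite module does nothing more than group its sub-modules; the class names `ElementaryModule` and `CompositeModule`, the helper `elementary_names`, and the placeholder texts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ElementaryModule:
    """Smallest unit of an article with a precise characterisation."""
    name: str
    text: str


@dataclass
class CompositeModule:
    """Larger unit composed of elementary and/or composite sub-modules."""
    name: str
    parts: List[Union["ElementaryModule", "CompositeModule"]] = field(
        default_factory=list
    )


def elementary_names(module: Union[ElementaryModule, CompositeModule]) -> List[str]:
    """List the names of all elementary modules below a module, in order."""
    if isinstance(module, ElementaryModule):
        return [module.name]
    names: List[str] = []
    for part in module.parts:
        names.extend(elementary_names(part))
    return names


# The example from the text: Methods composed of two elementary modules.
methods = CompositeModule(
    "Methods",
    [
        ElementaryModule("Experimental methods", "placeholder text"),
        ElementaryModule("Theoretical methods", "placeholder text"),
    ],
)
```

Since a composite module may itself contain composite modules, the same structure can represent a further breakdown of a module into sub-modules without any change to the sketch.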
After carefully reading part of the corpus and trying to map the content into the five distinct modules, it was immediately confirmed that a simple "cutting and pasting" is impossible. A well written linear article has many interwoven and interactive lines of reasoning. The linear form forces authors to collect conceptually different kinds of information within one and the same section. The argumentative function of the article demands a certain build-up in order to be digestible for the reader (this is explained and elaborated in detail in many books on the writing of scientific articles [47, 48]) and varies over the different scientific disciplines.
Next to this, a typical paper-based problem is the available space. In a paper article the only option to explain intricate problems in detail is the use of appendices and footnotes, a practice none too popular with most scientific journals. In an electronic environment there is fundamentally no limitation on the length of a particular module. Instead of the notorious phrase "after some algebra we obtain", a full mathematical digression can be given which will only be read by those who really want to know the details. It is our firm conviction that in the electronic era, the demands on publications will change in the direction of a more compulsory completeness of presentation, simply because we can hide ("zoom-in/out") elaborate information for those kinds of readers not interested in the details.
We have to bear in mind that not every passage that fits a certain pragmatic description fits in the module of that name. For example, in describing a certain type of particle detector, the device in question is discussed not only with regard to its technical features, but also in relation to other devices which are not chosen for the experiment under investigation. So depending on the usages of the information, such text parts can belong either to a module "experimental set-up" or a module "discussion".
In this article we will not dwell at length on the ongoing work in analysing our corpus. Below we simply want to list our first results as an illustration of our methodological approach. First detailed results are published elsewhere. Our aim, here, is to show how our attempt to modularize linear articles is in principle possible and confronts us with rich and difficult analytical problems.
As mentioned above, the scientific article combines a variety of different functions. For that reason it is useful to start with the identification of those modules that are more or less clear cut and relatively less complicated. As a modular build-up breaks the various components of the work apart, it is necessary to start with a module we call Meta-Information.
This module should become the central module of a modular electronic article and caters for all pertinent questions about the article as such. It contains the following sub-modules:
The second module we can call: Goal and setting. From our first round of analysis it became clear that much descriptive information on the rationale for performing the research and the setting of the work within a broader context was gathered mainly in an introductory section. In our first meta-information module above, we have already split off the central informative meta-data from the text parts. It is therefore useful to create a special module comprising all those elements that discuss the reasons for performing a certain piece of research as well as its scientific and societal context. As argued above, informed readers who are interested only in for instance the pure results of the work can easily skip these parts.
An elaborate module is methods. In this module we would like to comprise all used methods, techniques, tools, etc. It is a central part of every article and one of the most difficult to handle. Obviously we have to break down this module into sub-modules. In a first attack on the problem we split the experimental methods from the theoretical ones. One can expand the notion of experimental methods to computational aspects for the case of a purely theoretical work in which calculations based on a certain theory are carried out using computer codes.
In the experimental physics programme we analyse, this split is easy. We find a description of the apparatus, the sample preparation, and the detection as well as the discussion on these tools and techniques. The next sub-module would then be the Measurements.
The division between the description of tools (above) and the use of them is given by the different reach of the information. Apparatus is often of a typical meso type and can be the same for a whole series of articles (take, for example, the description of a nuclear reactor as the source of neutrons which are used as a standard tool for certain kinds of spectroscopic measurements). The actual measurement is unique for each new result.
In defining these (sub-)modules a series of prescriptions for the author come almost immediately to the fore. If we can describe well defined information entities, a list of requirements automatically emerges which are essential to fulfill the general demands of integrity and certifiability. So it is advisable (and in many cases perhaps compulsory) that authors give a full account of the technical range of applicability, the inherent sources of errors, the accuracy, and so forth.
Dealing with Theoretical methods, we can make a distinction between: Models, in which the theoretical model is described in its full splendour as well as whatever approximative form is warranted by the research presented, and Calculations. In the calculation part we really deal with the mathematical approximations and the number crunching. It is here that we can envision an entity called "full mathematical digression" which can be "zoomed-in" for those readers who do want to read all details. Here too, it is important to think about clear writers' guidelines on how the reader should be informed of the constraints and validity of the methods used.
The fourth module is called: Results. This module contains the results of measurements or calculations which lead to answers of questions posed. We distinguish between the raw, unpolished results and the smoothed or fitted result due to statistics or application of certain models.
The most difficult parts to break apart into well defined structures are the Discussion and the Conclusions. In a discussion module we must distinguish between a discussion on the reliability of the obtained results and a discussion on the acceptability of an interpretation of those results in comparison with theoretical models. In analysing the parts dealing with discussion we encounter most interesting complications, as in fact a distinction has to be made between 'objective' result and 'subjective' interpretation. Obviously this is very much linked to problems where established knowledge is taken for granted, or where established knowledge is challenged by new ideas, interpretations, or data. A full argumentative analysis of these parts in the documents in the corpus will give further guidelines as to how to divide the various kinds of information in a useful and honest way.
Dealing with conclusions we can identify two types of information. Firstly, answers to the questions posed in the work linked to the module Goal and setting. Here answers may be formulated as to how and why the reported results give (partial) answers to the scientific questions at hand, as well as how the results rephrase the original quest. Secondly we have suggestions for further research; although not compulsory, it is important that authors can express themselves on how they think the course of research has to be set out for the future.
In this article we have tried to indicate that technological features are essential elements for the way scientific communications are shaped. Based on: i) a historical overview of the coming into being of the present-day scientific article and ii) an insight into the functions of a scientific journal article, it is argued that the crucial difference between the presentation on paper and a presentation in electronic form is that the linear essay form, characteristic for a paper based report, is replaced by an inherently modular form. In order to gain an understanding of this modular reporting, we analyse a coherent corpus of standard high level experimental physics articles. In our analyses we examine the purely scientific content with regard to the various kinds of information, as well as the argumentative structure which reveals the underlying reasoning. With these results we want to create a new heuristic model for a modular presentation of scientific communications. In this article we report the general outline and the first steps. As our sample corpus is prototypical for science articles in general, we present our results and ideas, in this paper, without going into technical details. In subsequent publications we will report all details. We hope that we provide sufficient evidence so that the question of modularity as the next step in scientific reporting can be answered in the affirmative. With this article we also hope to provide sufficient material for a more general discussion on the changing forms of scientific information in an electronic environment, based on solid knowledge of the different rôles and functions technology and language play in communicating scientific results.
First of all the continuing critical discussions with the other members of our research group: Maarten van der Tol and Frédérique Harmsze are gratefully acknowledged. Helpful discussions and comments also came from Antje Melissen and Hans Roosendaal, whilst Betsy Lightfoot and Jonathan Clark are thanked for correcting the English as well. This work is part of the "Communication in Physics" project of the Foundation Physica, and financially supported by Foundation Physica, KSLA, Royal Dutch Academy of Sciences, Royal Library, and Elsevier Science NL.
a) Hence we do not consider review articles or letters.
b) In our opinion, the term `many-medium environment' better describes the integration of sound, text, as well as interhuman discourse than multimedia; multi-media is commonly understood as fully digitalized information: hence, in fact, a mono-medium (zeros and ones).
c) An interesting observation is that at the beginning of the peer-review system, the authority of the spectators was important. In time this changed to the demand that research has to be reported in such a way that a detached, innocent, reader could in principle repeat the experiment. In our time of, e.g., huge and unique experiments such as those at the European particle physics research centre CERN, or large scale computer simulations, the authority of the authors again creeps into the peer-review process.