LIBREAS. Library Ideas # 4

The foundation of LIS in information science and semiotics

Søren Brier, "The foundation of LIS in information science and semiotics". LIBREAS. Library Ideas, 4.



    The technological impetus for the development of information science

    Information science did not really take off until the development of computer technology in the 1960s and its increasing use as information technology in the LIS domain. Seen from society's viewpoint, the problem has primarily been how to handle constructively and cheaply the burgeoning production of documents from science, industry and culture. An industrial aim has been to construct a technology to handle increased access to the buying and selling of knowledge; information is becoming a strategic resource on a level with capital, technology and labor.

    The information retrieval industry has become a large-scale industry in the so-called information society. The computer and communication industry has exploded since World War II and is now entering a synthetic phase. Now the computer's various technologies, including calculation technology, telecommunication and language, and more recently its sound and image treatment, are beginning to melt together into a common multimedia interaction technology.

    It seems clear that document retrieval, and therefore registration, indexing and classification, plays an unavoidable and growing role in nearly all types of extensive computer networks and all kinds of knowledge-sharing systems. The larger these systems become, the more central the document-mediating component will become and the graver the problem of indexing and intellectual access.

    As Blair (1990) has pointed out, there is a qualitative shift in the problem of document retrieval once databases exceed 100,000 documents. First, the number of retrieved documents becomes a problem: there are too many documents to sort for relevance. Second, the level of “noise” becomes intolerable, especially for full-text documents indexed automatically from natural language. Third, it becomes nearly impossible to estimate recall: nobody knows how much relevant material really exists in a database of 15 million documents, such as BIOSIS.

    The problem is, on the one hand, that the user is buried in too many documents of different levels of relevance and knowledge quality, and on the other hand, that the user may miss the most relevant high-quality documents. These are the documents especially suited to the user’s problems, interests, knowledge background, focus, knowledge level and time for reading. Anyone who has made a subject search on the Internet understands what I am referring to.

    Through the Internet we are improving physical access to electronic documents’ information for growing numbers of people, and we are increasing intellectual access for many newcomers and low-level users of document systems. But high-quality intellectual access is becoming a growing problem for those who need it in their daily work, such as researchers, teachers, journalists and managers. Overload, noise, lack of precision and ignorance of recall are modern problems of document retrieval.
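
    The notions of precision and recall used above can be made concrete with a minimal sketch. The document identifiers and the two sets below are hypothetical; the point is only that recall cannot be computed unless the full set of relevant documents is known, which is exactly what is unavailable in a very large database.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Share of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Share of all relevant documents that were actually retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical toy collection: in a real database the set of all relevant
# documents is unknown, so recall can only be estimated.
relevant_docs = {"d1", "d2", "d3", "d4", "d5"}
retrieved_docs = {"d1", "d2", "d7", "d8"}

p = precision(retrieved_docs, relevant_docs)  # 2 of the 4 retrieved are relevant
r = recall(retrieved_docs, relevant_docs)     # 2 of the 5 relevant were retrieved
print(f"precision={p:.2f} recall={r:.2f}")
```

    Precision can be judged from the retrieved set alone, whereas recall needs `relevant_docs`, a set that nobody can enumerate for a base of millions of documents.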

    One way to improve intellectual access is to create interfaces for users with domain knowledge, but without LIS-technical skills. The difficult cognitive and communicative problems arise in giving access to users who lack domain knowledge and therefore do not know the specialized meaning of the concepts employed, some of which are the same words as those used in everyday language.

    There is no doubt that technological development is transforming traditional LIS areas of document retrieval and mediation of knowledge. It is therefore important both theoretically and practically to respond constructively and purposefully to meet this challenge. This requires a scientific basis that encompasses technical, sociological, psychological, and linguistic aspects of the problem of translating people’s information needs into system-functional queries.

    From the last twenty years of computer systems development, it is clear that the manipulation of natural language by machines in a social communicative setting with humans is a major theoretical and practical problem. The meaning of language is one of the pivots around which human existence spins. The problem of how precise meanings of signs and words become fixed within a social and cultural practice is enormous from a traditional mechanistic point of view. Attempts to build interdisciplinary information sciences on the basis of the entropic conceptions of information in information theory and thermodynamics have been theoretically and practically unable to fruitfully deal with the problem of the communication of meaning between humans. The functionalist – or information-processing – paradigm in cognitive science has the additional difficulty of providing a theoretical background for approaching the problem in IR with multi-faceted meanings of words and sentences. As documents are complex semantic sign and language systems, they are some of the most difficult items to handle in computer systems for retrieval by a broad public. As Blair (1990) argues, users are the only reliable source to judge relevance, and only users can turn information into knowledge.

    No doubt, development of interfaces that are interactive and graphic could improve search quality for low-level users. But something more fundamental in the organization of our scientific understanding of document-mediating systems is at stake. It is a qualitative difference in the way computers and humans deal with the complexity of information. As Luhmann (1995) points out, humans reduce complexity through meaning. LIS must move from a mechanical information-processing understanding of document retrieval, based only on cognitive science’s information-processing paradigm, towards true integration with a more pragmatically semiotic, cybernetic and socio-linguistic theory of understanding in order to improve the design of document-mediating systems. The theoretical foundation of LIS in the IR-area must be replaced by a broader foundation that incorporates the semantic production of meaning. To summarize, LIS has four major problems:

    1. The lack of a theory of how to design the best possible document-mediating system for one or more well-known user groups.
    2. The lack of a theory of how to design interfaces for non-specialists for huge document bases originally created for documentalists within certain subject areas or domains, most often scientific and technical, such as chemistry, biology and medicine.
    3. The lack of full recognition from computer and software designers and from the arts and sciences of the interdisciplinary complexity and scientific depth of the problems of document mediation.
    4. The lack of a full theoretical scientific foundation for the practice of librarianship in the age of the computer. LIS’ lack of a fully developed theoretical and scientific self-awareness is a problem both for LIS itself and for the recognition by other scientific subjects and research groups of the seriousness of the problems it addresses and the depth of the knowledge it has already acquired through centuries of practice.

    In my opinion the information-processing paradigm has never been able to describe the central problems of mediating the semantic content of documents from producer to user that documentalists and librarians deal with. It fails in this regard because it does not address the social and phenomenological aspects of cognition (“becoming informed” in Buckland’s 1991 terminology), which is the bottom line of the mediation of documents. This leads to serious doubt about the existence of the scientific object in the form of “objective information processing”.

    My rejection of the information-processing paradigm of information science is based on views similar to the statements of Machlup (1983) and Winograd & Flores (1987) that the original meaning of the word “information” is something a person (or a living system) communicates to another person (or living system). The meaning of information can only be understood by considering living beings in a social and historical context. Furthermore, I agree with Machlup when he suggests that one cannot define information as that which reduces uncertainty. The fact is that some kinds of information will make the receiver more uncertain. But if one knows the social context precisely enough to determine the full spectrum of possible outcomes, then one can use a statistical/entropic information concept as part of the description of the characteristics of a message. I further agree with Searle (1986) that the common link between information-processing in humans and machines is not the fact that both follow rules. Machines behave according to causalities, but only conscious beings can willfully choose to follow rules.

    But since the information concept is now firmly rooted in computer informatics and in the information theories of Shannon and Weaver as well as of Wienerian cybernetics, another strategy would be to abandon the original human communicative meaning of the concept, as Shannon’s theory of information has never addressed the semantic content of messages. Shannon and Weaver write:

    The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that they are selected from a set of possible messages. (Shannon and Weaver 1969: 31-32)

    What people and animals treat as “information” is quite different from what Wiener’s theory of information suggests. Stonier therefore makes a sensible discrimination between information and meaning, as information to him is objective structure and organization. This is clear, but we must then realize and acknowledge that the theory has almost nothing to do with semantic aspects of cognition or communication between living systems. As von Foerster concludes:

    However, when we look more closely at these theories, it becomes transparently clear that they are not really concerned with information but rather with signals and the reliable transmission of signals over unreliable channels (...) (von Foerster 1980: 20-21)

    Thus, information science in the subject area of living systems and humans will not be able to explain vital aspects of the phenomena of cognition and communication, such as meaning and the constraints of social context. It is also well known that to determine the entropy in a system it is necessary to determine in advance what will qualify as macro states and the probability of every state. There is no room for the completely unexpected, and therefore the real creative complexity of nature and language is lost. Thus this approach has other limits on its own level.
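
    The dependence of entropy on a set of states fixed in advance can be seen in a minimal sketch of Shannon's measure. The two probability lists are hypothetical; the function name and the tolerance are illustrative choices, not taken from any of the cited works.

```python
import math


def shannon_entropy(probabilities: list[float]) -> float:
    """Entropy in bits of a finite set of states fixed in advance."""
    # The measure is only defined once every possible state has a probability.
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("the probabilities of the predefined states must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical source with four predefined messages and assumed probabilities.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.125, 0.125]

h_uniform = shannon_entropy(uniform)  # four equally likely messages
h_skewed = shannon_entropy(skewed)    # one message dominates
print(h_uniform, h_skewed)
```

    An outcome that was not listed beforehand has no probability and therefore no place in the calculation, which is the limitation described above.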

    In my opinion, knowledge – or knowing (to underline the process) – is a far more complicated “thing” or process than expected by the “information-processing paradigm” described above. According to, for instance, Thomas Kuhn's philosophy of science (1970), nature and the human mind are not directly connected. Nature does not speak to us. I would also argue, with Maturana and Varela (1980), that nature does not – in the usual meaning of the word – transfer information to us through our observation. We participate through science in a socially, biologically and psychologically influenced interpretation of the world.

    Warner (1994) points to the problems for LIS if the meaning of words must be partially inferred from a socio-linguistic context. It is clear that simply matching query words to index words, no matter how sophisticated a partial match and ranking algorithm one has, will always have a low precision because the semantics are not equally well-defined.

    In practice, librarians, documentalists and LIS-researchers work every day on the social and practical dynamism of signification and how to relate it to the textual representations in computer systems (not to mention the problems of understanding how the different software systems themselves use words). Now through the Internet, corporate Intranets, management information systems, GIS, and file-handling systems, more people are spending more time looking for an exponentially increasing number of documents. What LIS needs is an interdisciplinary scientific understanding broad and deep enough to encompass the communicational and organizational aspects of classification and indexing of documents, the computer systems and the producers. Therefore, the theoretical foundation of LIS in the IR-area must be broadened to incorporate the semantic production of meaning in its conception of information science.

    In his semiotics, Peirce says that the meaning of a word is its use in society (practice of living). This fits well with the language philosophy of the later Wittgenstein, who says that words’ meanings are fixed not by definition, but through the “language game” they appear in, such as dialogues, persuasions or seductions. This is what Wittgenstein calls “forms of life,” in short, the things humans do. Among these are specific kinds of science. Science is a form of life and has its own language games, as does document searching by subject.

    The hope of trans-disciplinary theories in information and communication science is that they can deepen our understanding of human to human communication through machine-mediated documents in such a way that we can improve our designs of document-mediating systems.

    LIS: The science of document-mediating systems

    The subject area of expertise of librarians, archivists and documentalists has always been the storage, indexing, retrieval and mediation of materials carrying data, knowledge, meaning and experience.

    1. One can therefore define the objective of LIS as first and foremost to promote the communication of desired information from the producer to the user.
    2. Information can include recorded measurements and observations, theoretical knowledge, meanings, visions and experiences, art and fiction.
    3. I define data as a given input with a structure that the receiver regards as reliable and usable in a given situation.
    4. I define a document as a human work with communicative intention recorded in a material way.

    I want to clarify Buckland's view, which can be construed as considering natural things to be documents in themselves. Something culturally and intentionally communicative, such as being put into a classification system, must be done to natural things before they become documents. But perhaps we can agree by saying that all things are potential documents, just as everything is potential information. They become documents when they become interesting for members of a communicative knowledge system. But that demands that the object is placed within a certain point of view so that it can be viewed from a certain interest, or as Bateson (1973) says: Information is a difference that makes a difference.

    Data becomes information when it is integrated with a given knowledge process and pre-understanding. It only becomes information when it is received and interpreted by a bio-psychological-social knowledge system. The difference between knowledge and information is that information is viewed as a minor part of a knowledge system. But both are dependent on semiotic interpretation if they are to become meaningful.

    I agree with Salthe in his view that one cannot consider the meaning of information without an interpretation. We could add to Wiener’s statement that (in itself) ‘information is information, neither matter nor energy’, that information is also not meaning until it has been interpreted by a living system.

    LIS is concerned with finding suitable rules for the design of systems and procedures for collecting, organizing, classifying, indexing, storing, retrieving and mediating those materials that support data, knowledge, meaning and experience. As an offshoot of both indexing and communication to users with different requirements, one must study the origins of the various document types, how they are produced, for which users they are created and under which economic and knowledge domain constraints they are produced.

    It has been realized that producers of documents generally have certain consumers in mind, and these consumers are often part of the group of producers themselves. In this way the system is, as seen by a cybernetician, closed in on itself. The LISA bibliographic database, for instance, is a base of information and library science with documents written by librarians and information scientists, for librarians and information scientists, and mediated by librarians and information scientists.

    The cognitive viewpoints opening toward a cybersemiotic concept of information in LIS

    In agreement with Ingwersen (1992) one can, as an answer to the humanistic-socially-oriented critique, formulate a broader and less objective concept of information than that of the information-processing paradigm. From the cognitive viewpoint, information is seen as the mental phenomena that documents (consisting of signs and text) can cause under the correct circumstances, depending on the state of knowledge of the recipient. The examination of these “correct circumstances” is an important part of information science. In connection with the design of information systems for businesses and institutions, one can now speak of information quality. The cognitive viewpoint represents three important developments:

    1. Information is understood as potential until somebody interprets it.
    2. The objective carrier of information is a sign.
    3. Interpretation is based on the total semantic network, horizons, worldviews and experiences of the person, including the emotional and social aspects.

    The aim is for the creation of information in the user’s mind to be understood as meeting social, cultural or existential needs. This is an important improvement on the intention of cognitive science to create an objective theory about information. One can therefore reformulate information science's aim as follows:

    Library and information science devotes itself primarily to the study of systems and methods for classification, indexing, storing, retrieval, and mediation of documents that can cause the creation of information in the user’s mind.

    The crucial question is that of the interpretation of the document’s meaning for the individual in a given organizational or institutional connection, and in a given historical situation. Ingwersen (1996) describes the information need as built from a cognitive state (including previous knowledge), a work task, interest and a domain.

    Neither information nor quality is a constant phenomenon; both change over time. Relevance is the keyword here, and relevance is dependent on the meaning we give to things in relation to our preconceptions. It is these social-pragmatic circumstances that form the context for understanding our informational desires and problems. Ingwersen (1996) successfully develops a matrix with four distinct cognitive forms of information needs relevant for determining search behavior and types of polyrepresentation.

    So far, we do not have an explicit theoretical treatment of how varying forms of aboutness come into existence and function in a social context. As information, in this view, develops primarily in an individual mind in front of a document-mediating system, there are no explicit theories about how information develops in social practice. This is where semiotics both as a general and as a social science of meaning generation and interpretation can contribute to the informational view of LIS.

    As Blair (1990) has shown, one of the largest problems of subject searching is that the indexer and the searcher do not participate in the same language game! Their work and social environments are different and therefore their use of words will be different. This means that their subject descriptions will be different: they will mean different things with the same words, or they will use different words for the same thing. There is no “ultimate” description of a document or of the uses of a subject term. Attempts to make a universal and correct description are pointless because there is no limit to the number of descriptors one could use to characterize a document; semiosis is unlimited (Blair 1990). To limit the spectrum of meanings in a useful way, the indexer must create descriptions in accordance with certain forms of living, such as trading and research, and their language games, also called discourse communities and their discourses.
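
    Both kinds of mismatch described above (different words for the same thing, and the same word for different things) can be illustrated with a minimal sketch of verbatim term matching. The index, the document identifiers and the descriptors are all hypothetical.

```python
# Hypothetical index: document identifiers and descriptors are made up.
index = {
    "doc_a": {"neoplasms", "therapy"},  # indexer's term for what the searcher calls "cancer"
    "doc_b": {"cancer", "screening"},   # same word, same meaning as the searcher's
    "doc_c": {"cancer", "crab"},        # same word, used here for the crab genus
}


def exact_match(query_term: str) -> set[str]:
    """Return the documents whose descriptors contain the query term verbatim."""
    return {doc for doc, descriptors in index.items() if query_term in descriptors}


hits = exact_match("cancer")
print(sorted(hits))
```

    In this toy case the search misses `doc_a`, which is relevant but described in the indexer's vocabulary, and returns `doc_c`, which shares the word but not the meaning. No refinement of the matching step alone can tell these cases apart.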

    From this view, it is possible to point to several systematic reasons why the user does not obtain a satisfactory retrieval result and why this situation is so difficult to improve. Although the large domain-specific databases attempt to make the definition of their classification system (thesauri and controlled keywords, professional indexers) clear and consistent, there is still the human factor of interpretation. At best, indexers are only about 75% consistent with one another on a detailed or “deep” indexing of a document. The likelihood that a user would use the subject term in the same way as the indexers – even if the user reads the scope notes carefully – is even smaller, say 60%. This means that the limits of the sets get very fuzzy. Every time one uses a subject term to describe the documents sought, the chance of locating the right one is 0.75 x 0.60 = 0.45, even when the system is used to the best of the user’s abilities. When one combines four different subject terms to select specific documents, the chance of selecting correctly is then 0.45^4 ≈ 0.04, or 4% (see Blair 1990:106 for an even more merciless example). This problem exists before any mechanical model of partial match using probability, weighted factor ranking, vector space, fuzzy sets or hypertext is applied. These are all based on the ideal of good and consistent indexing and a perfect match with the user’s application of search terms.
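
    The arithmetic in the preceding paragraph can be checked with a short sketch. The two rates are the figures quoted in the text; treating the subject terms as independent of one another is an assumption made here so that the probabilities can simply be multiplied.

```python
indexer_consistency = 0.75  # consistency between indexers, as quoted in the text
user_agreement = 0.60       # the text's estimate ("say 60%") for user and indexer

# Chance that a single subject term is used the same way by indexer and user.
per_term = indexer_consistency * user_agreement


def chance_all_terms_match(n_terms: int) -> float:
    """Chance that n subject terms all match, assuming they are independent."""
    return per_term ** n_terms


for n in range(1, 5):
    print(n, round(chance_all_terms_match(n), 3))
```

    With four combined terms the chance falls to roughly four in a hundred, and every additional term multiplies it by 0.45 again.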

    Thus one can only retrieve a very small portion of the relevant documents, and a great number of those retrieved will be irrelevant. The more search terms used, the more serious the problem becomes. If one knows the bibliographic base’s classification system, there are ways to diminish this problem. But most searchers do not, and even for those who do, their efforts only diminish the problem; they do not eliminate it. Information is stored in such a way that only specialists combining subject knowledge with technical retrieval knowledge – investing years of training – really have a chance to retrieve it. The system only produces knowledge with the desired precision and scope for certain groups, and then only to a limited extent. There may be physical access for all, but intellectual access only for highly trained specialists.

    The basic problem in LIS is that one must perform an intellectual analysis to determine the content of a document in order to achieve precise and useful indexing.

    Now, does the indexer determine the objective content of the document? Both practice and hermeneutic theory tell us that the content of a document depends on the context in which it is seen, that is to say, what those who read it know and what their interests are. There are at least three ways of determining the content of the document:

    1. The content of the document, as seen from the indexing system (its thesaurus or classification system). In the best cases this is constructed from a profound knowledge of the domain of knowledge in question. This is seldom the case, and furthermore the researcher who wrote the document probably did not have the classification system and its concepts in mind when writing the paper. The writer might invent some neologisms, a new interdisciplinary subject, or perhaps a whole paradigm opposed to the paradigm underlying the present classification system. Finally, the user often does not share the knowledge background of either the system or the author.

    2. The content of the document as seen from the author’s viewpoint: One can pick words from the text and the indexer can give a description with appropriate words. But the determination of this is an interpretation by the indexer. The main interest of a document retrieval system is that others who need or require the document can find it using their own language game, and this is most likely distinct from those of both the classification system and the indexer.

    3. The content of the document as seen from the user’s viewpoint: The problem here is that in most large document retrieval systems there are so many types of users that the indexer can only index in relation to the largest and most formally well-defined knowledge domains.

    As Blair (1990) suggested, one of the major problems of subject searching is that indexers and searchers do not participate in the same language games. Their work and social environments are different, and therefore their uses of words will be different. The hope of trans-disciplinary theories in information and communication science is that they can deepen our understanding of human to human communication through machine-mediated documents in such a way that we can improve our designs of document-mediating systems. To summarize, our major challenge in LIS now is how to map semantic fields of concepts and their signifying contexts into our systems in ways that move beyond the logical and statistical approaches that until now seemed the only realistic strategies given available technology.


    Søren Brier is Associate Professor in the field of philosophy of information, cognition and communication sciences at CBS (Copenhagen Business School).

    Among other things, he is editor of the journal Cybernetics & Human Knowing: A Journal of Second Order Cybernetics, Autopoiesis and Cybersemiotics.


    Homepage: http://uk.cbs.dk/content/view/full/9710