Pragmatic Applications of the Semantic Web Using SemTalk

Christian Fillies

SC4 Solution Clustering, Falkensee, Germany cfillies@semtalk.com

Frauke Weichhardt

Beratung im Netz, Potsdam, Germany fweichhardt@fweichhardt.de

Gay Wood-Albrecht

Bonapart Solutions, USA wood-albrecht@mindspring.com

Dietmar Wikarski

Fachhochschule Brandenburg, Brandenburg, Germany wikarski@fh-brandenburg.de

Summary

The Semantic Web is a new layer of the Internet that enables distributed modeling of the contents of existing web pages. Semantic webs store more than text: much like whiteboard models, they capture the most relevant associated terms or keywords. Compared to standardized ontologies, semantic webs offer powerful new search strategies. “Ambient” intelligent applications and agents can use this knowledge network in various ways.

SemTalk, which uses Microsoft Visio as its front end, offers an easy-to-use editor for Semantic Web ontologies and processes. Thanks to an open, graphically configurable meta model, Visio can easily be adapted to different model worlds such as ARIS EPCs (event-driven process chains) or Bonapart process and organizational models. With the help of Microsoft Office XP SmartTags, these models allow users to create semantic webs as a by-product of their daily work with other MS Office products such as Word, Excel or Outlook.

This paper will present two applied uses of this technology:

1. An Ontology Project: Department-wide information modeling at the Credit Suisse Bank.
The emphasis was on both linguistic standardization and the development of a centralized description of all of the decentralized applications found in the organization. Local knowledge management teams were able to take immediate advantage of the terms and solutions created by the modeling teams.

2. A BPM Project: Distributed process modeling of the Bausparkasse Deutscher Ring, a German financial institution
Several groups of students from FH Brandenburg, a university of applied sciences, explored how to develop and apply an industry-specific Semantic Web to Business Process Modeling.

1 Introduction

The next-generation Internet, or Semantic Web, is a new layer of the Internet that enables distributed modeling of the contents of existing web pages. Semantic webs store not only text but also whiteboard-like models or frameworks that include the most relevant associated terms. Compared to standardized ontologies, semantic webs offer powerful new search strategies. “Ambient” or embedded intelligent applications and agents can use this knowledge network in various ways.

The Semantic Web is still in its initial stages. The increasing number of pages available about semantic webs points to enormous possibilities for its further development. Even though concrete applications are still very rare, the definition of XML-based languages such as RDF, RDFS and DAML+OIL by the W3C suggests growing interest. It is therefore likely that an ever-increasing number of Semantic Web applications will be seen in the near future.

Based on our recent experiences, we predict that this new technology will spread first within the Intranets of larger, distributed enterprises, since there is continuous demand to fine-tune knowledge management structures between different areas of the enterprise. These knowledge structures can easily be created and fine-tuned using Semantic Web technologies. The creation of a central vocabulary within the context of ontologies and processing concepts is a necessary prerequisite.

The following sections describe two practical applications of Semantic Web technology. The goal of the first project was to create a department-wide information model within Credit Suisse, with an emphasis on both linguistic standardization and a centralized description of all of the decentralized applications used by the different departments.

The second project involved distributed process modeling of the Bausparkasse Deutscher Ring, a German financial institution. Several groups of students from FH Brandenburg, a university of applied sciences, explored how to develop and apply an industry-specific Semantic Web.

2 Comprehensive Departmental Information Modeling at Credit Suisse

In the context of this project several workshops were undertaken to create the basic repository for a growing visual glossary. This glossary was to be used as a possible basis for a knowledge management system. The results of the workshops were summarized in the form of conceptual models. These models were then published and made available in the Intranet.

2.1 Assumptions

In today's large-scale enterprises language variety is common because of rapid technological change and the integration of many smaller companies or departments into larger conglomerates. This is particularly true in the IT area, where there is an abundance of different architecture descriptions, strategy papers, technology concepts and the like. The knowledge contained in these documents is often strongly bound to the vocabulary of individuals and is therefore difficult to consolidate. Homonyms, words that look or sound the same but have different meanings, are especially frequent. Because IT is still a young field, supposed synonyms are also emerging whose meanings can differ considerably depending on the department.

2.2 Project Goals

In this project an infrastructure and a practically usable base vocabulary needed to be prepared using existing linguistically standardized documents. Glossaries and/or models were represented as flexibly as possible in reusable forms so that they could easily be inserted into technical applications such as Document Management and Content Management systems. A further application is automatic document classification.

The emphasis in this project was on both linguistic standardization and the population of a central version of a glossary to be used by people designing or managing department-specific peripheral applications. The goal was not dogmatic control or centralized specification management but rather to create awareness of available terms and solutions at the level of the local knowledge manager or member of the modeling team. To ensure that use of the glossary became a permanent part of everyday practice, a general consciousness of context had to be produced. This was most effectively accomplished by building on contexts that were already available, for example by integrating standard office applications, most importantly Office XP, into the preparation of fundamental definitions.

From the start of this project the requirements demanded that the glossary available on the Intranet be in a form suitable for many different types of users. This meant that complicated technical notations, e.g. UML diagrams, were not acceptable.

It was hoped that this project would produce a yardstick by which future knowledge management systems could be structured. “Bootstrapping” such a system is always a very complex undertaking. If not enough content is available initially, the system will not be used sufficiently and therefore will not begin to develop a life of its own. However, a complete ontology of all objects existing in the enterprise is not desirable either. The world is constantly changing and the language of the enterprise needs to reflect these changes.

Success depends on being able to publish a glossary with sufficient content and basic graphic definitions to encourage users to use and update the glossary as appropriate. This requires technology that is easy to use and integrated with standard office applications. As with the creation and indexing of textual web pages, this works best if the system appeals to the user's need to participate in the process.

Within the scope of this project only the creation and modeling of a glossary were required.

2.3 Semantic Web as a Knowledge Management System

The glossary consists of terms with definition texts and synonym/homonym relationships. In addition, explicit relationships are defined between the terms, and each term is classified relative to its superordinate and subordinate terms. The formalized representation is presented as a model. To store the information models flexibly, they are created both as topic maps and in the W3C-recommended RDFS, which is based on XML standards.
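As a minimal sketch of how one glossary entry of this kind might be written down in RDFS, the following Python snippet turns a term with its definition, synonyms and superordinate terms into N-Triples-style statements. The namespace `http://example.org/glossary#`, the sample term and the choice to store synonyms as additional `rdfs:label` values are illustrative assumptions, not SemTalk's actual schema.

```python
from dataclasses import dataclass, field

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
GLOSSARY = "http://example.org/glossary#"  # hypothetical namespace


@dataclass
class Term:
    name: str
    definition: str
    synonyms: list[str] = field(default_factory=list)
    broader: list[str] = field(default_factory=list)  # superordinate terms


def to_triples(term: Term) -> list[str]:
    """Serialize one glossary term as N-Triples-style lines.

    Kept minimal: literals are not escaped, so definitions must not
    contain double quotes or backslashes.
    """
    subject = f"<{GLOSSARY}{term.name}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{RDFS}Class> .",
        f'{subject} <{RDFS}label> "{term.name}" .',
        f'{subject} <{RDFS}comment> "{term.definition}" .',
    ]
    # Assumption: synonyms are stored as additional labels.
    lines += [f'{subject} <{RDFS}label> "{s}" .' for s in term.synonyms]
    # Superordinate terms become rdfs:subClassOf statements.
    lines += [f"{subject} <{RDFS}subClassOf> <{GLOSSARY}{b}> ." for b in term.broader]
    return lines


account = Term("SavingsAccount", "An account that earns interest.",
               synonyms=["DepositAccount"], broader=["Account"])
triples = to_triples(account)
print("\n".join(triples))
```

Each term yields one statement per definition, label, synonym and superordinate term, so models can reference each other simply by reusing the same term identifiers.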

SemTalk is used as the graphic editor. With the help of SemTalk and RDFS, the models can be stored on the Intranet as individual HTML web pages with all of their embedded hyperlinks. This type of knowledge representation requires no central maintenance of the model, and it provides a coordinated approval mechanism for the core terms that are used.

Figure 1: View of a SemTalk Model

Consistency between different partial models is ensured during the modeling process by the SemTalk Consistency Wizard. The Wizard points out which terms are already used in another model. Instead of modeling the same term again, a hyperlink to the reference term is created. The SemTalk Wizard uses index tables created by the SemTalk RDFS Crawler. This Crawler creates a directory of the available knowledge within selected areas of the Intranet, the Internet and file systems.
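The interplay between index tables and the consistency check can be illustrated with a small Python sketch. It assumes that the index is simply a mapping from each term to the models that already define it; the model file names and terms are hypothetical, and the real SemTalk index format is not described here.

```python
def build_index(models: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert {model name: terms} into an index {term: models that define it}."""
    index: dict[str, set[str]] = {}
    for model, terms in models.items():
        for term in terms:
            # Terms are compared case-insensitively (an assumption).
            index.setdefault(term.lower(), set()).add(model)
    return index


def check_new_terms(index: dict[str, set[str]], new_terms: list[str]) -> dict[str, list[str]]:
    """Return {term: existing models} for terms that should be linked, not re-modeled."""
    return {t: sorted(index[t.lower()]) for t in new_terms if t.lower() in index}


# Hypothetical models that have already been published.
published = {
    "accounts.vsd": {"Account", "Customer"},
    "contracts.vsd": {"Contract", "Customer"},
}
index = build_index(published)
reuse = check_new_terms(index, ["Customer", "Branch"])
print(reuse)  # {'Customer': ['accounts.vsd', 'contracts.vsd']}
```

A term that appears in the result would be hyperlinked to its existing definition, while a term such as "Branch" would be modeled as new.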

These index tables are also used to interface with MS Office. SemTalk SmartTag is a technology that analyzes text while the user is writing in order to mark words that are already contained in the glossary as reference terms or synonyms. Synonyms that are found can be replaced by the corresponding reference terms if necessary. The definition of a detected word is available with a single click that leads either to the Visio model or to the available HTML representation. This results in substantial savings in the otherwise laborious manual revision of texts.
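The recognition step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SmartTag API: the glossary table, which maps every known word to its reference term, and the sample sentence are hypothetical.

```python
import re

# Hypothetical glossary: each known word maps to its reference term
# (reference terms map to themselves).
GLOSSARY = {
    "customer": "customer",
    "client": "customer",
    "account": "account",
}


def tag_text(text: str, glossary: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (word as written, reference term, kind) for every glossary hit."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group()
        reference = glossary.get(word.lower())
        if reference is not None:
            kind = "reference term" if word.lower() == reference else "synonym"
            hits.append((word, reference, kind))
    return hits


def replace_synonyms(text: str, glossary: dict[str, str]) -> str:
    """Replace each recognized synonym by its reference term.

    Simplification: the capitalization of the replaced word is not preserved.
    """
    def swap(match: re.Match) -> str:
        return glossary.get(match.group().lower(), match.group())
    return re.sub(r"[A-Za-z]+", swap, text)


sentence = "The client opened an account."
hits = tag_text(sentence, GLOSSARY)
revised = replace_synonyms(sentence, GLOSSARY)
print(hits)
print(revised)  # The customer opened an account.
```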

The SemTalk Tool Suite also produces pointers to revised documents and text passages.

Specific models are also created, for example to represent detailed connections between individual documents. If these connections are not to be included in the general glossary, SemTalk can be run on the workstation during text revision. Models of individual documents or of specialty areas extend the general glossary or add specialized components to it. Each time a term is reused, it is placed in the context of existing terms. Searches using general headings therefore also cover new models and glossaries containing specialized terms.

If new terms for the general glossary emerge during document revision, they will be added after they are reviewed.

Knowledge management systems are usually created initially in workshops, often combined with expert interviews. Significant savings can be realized if the Concept Composer from the TextTech company is used to extract useful terminology.

· The Concept Composer searches large quantities of text (source text plus collocations). Together with SemTalk, it can identify relevant technical terms as well as appropriate collocations.

· The Concept Presenter, an Intranet tool with a graphic interface, can be integrated into the HTML viewer of SemTalk.

Figure 2: The Interface to Concept Composer

Different versions of definitions, associated synonyms/homonyms and text passages can be managed with the SemTalk Glossary. The SemTalk Glossary also forms the interface between SemTalk and the Concept Composer.

2.4 Project Bootstrapping Methods

1. Creation of a central, prioritized list of terms to be defined.

2. Scanning of the text of 100 representative documents with the Concept Composer (TextTech company). The results consist of a hit list of important technical terms, an infrastructure for looking up passages in the text, and collocations that show which word pairs are frequently found together. Concept Composer was initially used externally as an ASP solution.

3. Execution of three workshops of 3-5 days each, with up to five experts. During the workshops the SemTalk Glossary is used for the documentation and administration of definitions.

Figure 3: SemTalk Glossary
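To make the notion of collocations in the scanning step more concrete, the following Python sketch counts adjacent word pairs across a set of documents. It is a deliberately simple stand-in: the Concept Composer's actual algorithm is not described in this paper, and the sample sentences are invented. A realistic analysis would at least filter stop words and use a statistical association measure instead of raw counts.

```python
import re
from collections import Counter


def collocations(documents, min_count=2):
    """Count adjacent word pairs across documents.

    Returns (pair, count) tuples for pairs seen at least min_count times,
    most frequent first.
    """
    pairs = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        pairs.update(zip(words, words[1:]))
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


# Invented sample documents.
docs = [
    "The savings contract is signed. The savings contract is archived.",
    "Each savings contract needs a customer number.",
]
top = collocations(docs)
print(top[0])  # (('savings', 'contract'), 3)
```

Pairs such as "savings contract" that recur across documents are candidates for the prioritized term list discussed in the workshops.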

At the end of each workshop day the scenarios discussed during the day are modeled graphically in SemTalk. The resulting graphic models are crucial in helping to simplify the subsequent discussions. Relationships are easy to visualize and it is easy to navigate through large amounts of information. Homonyms are placed on opposite sides of the graphic representation.

By the end of the workshops the central terms have been defined and graphically modeled. The glossary with all of the graphic representations is then placed on the Intranet to be used by the enterprise.

2.5 The Knowledge Management Process

A glossary created with SemTalk acts as a knowledge foundation that is designed to grow dynamically in ways that support better decision making and communication within the enterprise, especially as the environment changes. The glossary is published on the Intranet. Periodic audits of the contents ensure that the glossary remains up-to-date and useful. Modification requests are collected centrally, and updates are made on a regular basis in collaboration with the appropriate departments. A model is only updated if the majority of general users deem the updates appropriate. Responsibility for the maintenance of the models was given to the individuals responsible for Intranet updates.

2.6 Project Results

Two hundred critical keywords were modeled over a three-month period. Approximately 10 departmental representatives defined these keywords during several workshops that lasted between two hours and three days. Project costs arose from the time participants spent away from their regular work. The SemTalk Glossary was strongly felt to be a critical factor in building a glossary in such a short period of time.

The results were published in the Intranet and updated periodically. SemTalk enabled users to access keywords in several different contexts. The graphical view made it easier to understand the meaning of the keywords in relationship to each context because both the keyword and associated words are identified when doing searches.

SemTalk structured project work in a way that enhanced communication between coworkers from different departments. Additionally, purposeful revision of the documentation made it easy to quickly identify which documents needed to be updated, especially if context for a keyword changed.

2.7 Future Perspectives

The glossary created for Credit Suisse is currently in testing. If the project is deemed successful, it will be continued in other departments throughout the enterprise.

3 Distributed Process Modeling at the Deutscher Ring Bausparkasse

The primary goal of this distributed process modeling project was to model order processing at the Deutscher Ring Bausparkasse. The project took place over several weeks and was carried out by students from FH Brandenburg.

The primary difference between this project and conventional process modeling projects was the use of an industry-specific Semantic Web. The Semantic Web allowed processes to be fine-tuned easily and terminological work to be carried out more efficiently.

3.1 Conditions and Goals

Two separate groups, each with four students, modeled all business processes in two different departments. After department members had been interviewed, the information was modeled systematically in SemTalk. Models of existing processes were shown next to models of the “to be” processes, which reflected both the wishes of each department and the feasibility of implementing the processes.

The customer's primary goals were to make the processes in the enterprise clearer and to define the processes needed for the new workflow management system.

In addition, a significant project aim consisted of examining the processes associated with distributed modeling. The project team examined how communication can be improved within modeling teams and with the end-users.

Experience from business modeling projects shows that a high-quality distributed modeling tool with a common repository is not always sufficient. In the best case, a repository can guarantee only the syntactic consistency of a model. Most modeling tools offer little assistance with the creation of a common conceptual basis for functions, processes and information. This problem becomes more important when processes span several enterprises, for example in the B2B area, where business partners must first map their enterprise languages onto each other.

3.2 SemTalk Process Modeling Methods

The most important philosophy behind the Internet, and hence the Semantic Web, is that information is not copied but referenced. Creating links to external pages does not alter the contents of those pages. A flexible information system that develops in this way does not have the consistency of a database, but it has the advantage of being able to grow dynamically. SemTalk does not create isolated models; it creates a network of linked models. While the emphasis of the Semantic Web is on pure knowledge representation, or in the case of Credit Suisse on the modeling of information classes, SemTalk process models can also be created and managed as webs. Models can be nested within one another or linked with external models such as industry-specific standards.

Semantic Web process modeling procedures consist primarily of three steps:

1. Selection of suitable reference libraries from the Internet

2. Customization of these libraries to fit project requirements

3. Creation of the process model using the reference model as a background

3.2.1 The Semantic Web Delivers Reference Models

Our methodology consists of using Internet-based reference models that are easy to adapt to users' needs. An increasing number of organizations have developed such models:

http://www.dmtf.org develops an ontology for the Telecommunication Industry
http://www.bpmi.org develops a process ontology for representing business processes
http://www.papinet.org develops global transaction standards for the paper supply chain.
http://www.hr-xml.org is dedicated to the development and promotion of standardized XML vocabularies for human resources (HR).

Different XML-based languages are also in use. Two popular repositories from the EAI area are BizTalk (www.biztalk.org) and RosettaNet.

General-purpose vocabularies in XML notation are available from Cyc (www.cyc.com) and for WordNet (www.xmlns.com).

Reference models can also be derived by analyzing source text, as described for the first project in Section 2.

3.2.2 Process Modeling

SemTalk supports different business process modeling methods, including the representation of enterprise processes using PROMET, a method developed by Österle at IMG (http://prometatweb.img.com/). In the current project, with its strong focus on internal processes, SemTalk uses the methodology of communication structural analysis (CSA) developed by Krallmann (http://www.sysedv.cs.tu-berlin.de/Homepage/SYSEDV.nsf/). The students in the Deutscher Ring Bausparkasse project were already familiar with this method because of their experience with the CSA-based modeling tool Bonapart.

In CSA, and therefore also in Bonapart, a process consists of interfaces between activities, which are connected by information flows made up of information and media. Class models act as building blocks for these process models. They help to form structured and linguistically consistent processing concepts, which improves re-use and allows better evaluation of the models being developed. This mostly concerns model elements known from the modeling tool Bonapart.

With SemTalk, the class models in the Semantic Web are written in standard RDFS and contain references to other class models. The class models can be created top-down using existing materials or bottom-up during workshops. Bottom-up modeling is generally more efficient because it helps to limit the modeling depth of the class models.
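The relationship between process models and class models can be sketched as a small data structure in Python. The sketch assumes that a process is recorded as a list of information flows between named activities and that the class model is just a set of permitted information names. All activity, information and media names are invented for illustration and are not taken from Bonapart, SemTalk or the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InformationFlow:
    source: str       # sending activity
    target: str       # receiving activity
    information: str  # what is transferred; should be a class from the class model
    medium: str       # how it is transferred


# Hypothetical class model shared by all process models of a project.
CLASS_MODEL = {"AddressChangeRequest", "CustomerRecord", "Confirmation"}

# Hypothetical process: activities connected by information flows.
flows = [
    InformationFlow("Receive request", "Update customer data", "AddressChangeRequest", "letter"),
    InformationFlow("Update customer data", "Notify customer", "CustomerRecord", "database"),
    InformationFlow("Notify customer", "Archive", "Confirmation Letter", "paper"),
]


def undefined_information(flows, class_model):
    """List information names used in flows that are missing from the class model."""
    return sorted({f.information for f in flows} - class_model)


def activities(flows):
    """Collect every activity that sends or receives information."""
    return sorted({f.source for f in flows} | {f.target for f in flows})


missing = undefined_information(flows, CLASS_MODEL)
print(missing)  # ['Confirmation Letter']
```

The check flags "Confirmation Letter" because the class model only knows "Confirmation"; this is the kind of linguistic inconsistency that a shared class model is meant to prevent.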

Thinking first about the objects and then about the processes themselves is an important step in the initial phases of the project. It is also critical to make sure that class libraries are consistent across several small related models. This makes it easier to integrate the models later.

3.2.3 An Example

To illustrate distributed modeling with SemTalk, the simple process “Change Address” (Figure 4) is presented as an example.

Figure 4: Example process “Change Address”

3.2.4 Tool Support Using SemTalk

SemTalk supports the user during the modeling process with wizards that monitor the modeling process and offer suggestions. Examples include tips about spelling (e.g. upper/lower case), the detection of synonyms and the investigation of situations where hierarchical structures appear to be incorrect. A further agent is embedded in Office XP and examines each text that is entered to see whether any of its keywords are used in models.
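Two of these checks can be illustrated with a short Python sketch: a hint for names that differ only in upper/lower case, and a test for circular class hierarchies as one example of a structure that "appears to be incorrect". Both functions and their inputs are assumptions made for illustration; the rules that the SemTalk wizards actually apply are not specified in this paper.

```python
def naming_hints(names):
    """Suggest merging names that differ only in upper/lower case."""
    seen = {}
    hints = []
    for name in names:
        key = name.lower()
        if key in seen and seen[key] != name:
            hints.append(f"'{name}' differs from '{seen[key]}' only in case")
        seen.setdefault(key, name)
    return hints


def find_cycle(parents):
    """Return a subclass cycle as a list of names, or None if there is none.

    parents maps each class to its single superordinate class (or None).
    """
    for start in parents:
        path = [start]
        current = parents.get(start)
        while current is not None:
            if current in path:
                # Cut the path down to the part that forms the loop.
                return path[path.index(current):] + [current]
            path.append(current)
            current = parents.get(current)
    return None


hints = naming_hints(["Customer", "customer", "Account"])
cycle = find_cycle({"A": "B", "B": "C", "C": "A"})
print(hints)  # ["'customer' differs from 'Customer' only in case"]
print(cycle)  # ['A', 'B', 'C', 'A']
```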

The agents are animated using the agent toolkit from MS Office. They are supported by a Crawler, which searches for available models either independently or on request and creates index files for the agents. The Crawler looks for available sources of knowledge in RDFS format not only in the local file system but also in the Semantic Web.
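The indexing part of such a crawler might look roughly like the following Python sketch, which extracts class declarations from RDF/XML text and records where each class was found. It is an assumption-laden simplification: the sample document and locations are invented, fetching from the file system or the web is left out, and only the abbreviated `<rdfs:Class rdf:ID="…">` form is recognized. A production crawler would use a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
RDFS = "{http://www.w3.org/2000/01/rdf-schema#}"


def index_classes(sources):
    """Map each rdfs:Class identifier to the sources in which it is declared.

    sources maps a location (file path or URL) to its RDF/XML text.
    """
    index = {}
    for location, xml_text in sources.items():
        root = ET.fromstring(xml_text)
        for cls in root.iter(RDFS + "Class"):
            # Classes are identified by rdf:ID or, failing that, rdf:about.
            name = cls.get(RDF + "ID") or cls.get(RDF + "about")
            if name:
                index.setdefault(name, []).append(location)
    return index


# Invented sample document with two class declarations.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Customer"/>
  <rdfs:Class rdf:ID="Account"/>
</rdf:RDF>"""

index = index_classes({"models/customers.rdf": SAMPLE, "models/copy.rdf": SAMPLE})
print(index["Customer"])  # ['models/customers.rdf', 'models/copy.rdf']
```

An index of this shape is what the agents would consult to decide whether a keyword already exists in some published model.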

4 Summary

Models created with SemTalk give context to keywords. They also provide a starting point for understanding and communicating process information. The Visio editor enables a wide range of users to use and understand models. Applying knowledge models within popular business process tools introduces processing concepts and provides a new navigation medium to support knowledge management. Integrating the technology into daily work processes increases the acceptance, and thus the usefulness, of the models. Most importantly, it provides a process context for more powerful, intelligent retrieval using Semantic Web concepts.