Formal Models from Controlled Natural Language via Cognitive Grammar and Configuration

This PhD project investigates the computer-aided formalisation of textual software requirements to improve their quality and consistency, as well as their communication between stakeholders. It develops a general framework, and a prototype implementation, for the flexible analysis of a restricted form of English that can assist in identifying errors and inconsistencies in the text. In addition, it investigates how the knowledge required by the approach can be acquired from two sources: a simple but well-defined glossary, and the textual specification itself. The results showed accurate analysis of the text and identification of ambiguity where present. Moreover, the knowledge acquisition method performed well on the glossary and achieved results similar to those of other approaches when applied to the specification directly.

Researcher: Matt Selway
Supervisors: Prof. Markus Stumptner, Dr. Wolfgang Mayer

Overview

In the past decade, the rise of Model-Driven Engineering has resulted in major changes to industrial software development. Using models to describe behaviour allows part of the coding activity to be replaced with higher-level definitions of concepts and behaviour, which is commonly cited as reducing costs, errors, and time-to-market while improving software quality and communication between stakeholders. However, we continue to see spectacular failures of large, expensive IT projects, often blamed on badly specified requirements. High-level models of business environments, independent of technical considerations, are now possible, and standards like the Semantics of Business Vocabulary and Business Rules (SBVR) emphasise business and domain experts as the owners of their business requirements. As a result, we are reaching the point where formal modelling of requirements by the business experts themselves is necessary. Nevertheless, requirements specifications written in natural language (e.g. English) still play a fundamental role in communicating the requirements.

This thesis investigates an approach to semi-automatically transforming natural language specifications written by business/domain experts into formal models. The aim is to support the user in the process of formalising their requirements as models to reduce errors, ambiguities, inconsistencies, and the time taken.

Initially, two possible approaches to achieving this aim are identified: a precise Controlled Natural Language approach, and a more flexible Information Extraction-based approach. However, both are considered unsuitable for use by business experts as they require too much technical knowledge and time, or cannot express the necessary requirements. Therefore, a new approach that overcomes these limitations is developed in this thesis.

A general framework is presented, based on Cognitive Grammar and Knowledge-based Configuration, which focuses on the semantic analysis of controlled language. It is more flexible than a formal grammar for a controlled language while preserving the desirable properties of such languages (e.g. reduced ambiguity for humans and computers), and it can identify and help correct errors, ambiguities, and inconsistencies. Most of the knowledge required for analysing the text is embedded in the parser as models used for configuration, with the exception of some simple lexical information and mapping rules that evoke lexical entries and instantiate the elements required for parsing.
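
The role of the lexical information can be illustrated with a small sketch. It is not the framework described above: it implements neither Cognitive Grammar nor configuration, and the lexicon, the category labels, the `evoke` function, and the example sentence are all assumptions made for illustration. It only shows how surface forms in a controlled sentence might evoke candidate lexical entries, and how ambiguous or unknown forms can be reported back to the author.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    form: str      # surface form as written in the text
    category: str  # hypothetical category label, e.g. "noun concept"


# Toy lexicon invented for this sketch: surface form -> candidate entries.
LEXICON = {
    "rental": [LexicalEntry("rental", "noun concept")],
    "car": [LexicalEntry("car", "noun concept")],
    "rental car": [LexicalEntry("rental car", "noun concept")],
    "is owned by": [LexicalEntry("is owned by", "verb concept")],
    "branch": [
        LexicalEntry("branch", "noun concept"),
        LexicalEntry("branch", "verb concept"),
    ],
    "each": [LexicalEntry("each", "keyword")],
    "exactly one": [LexicalEntry("exactly one", "keyword")],
}

# Longest surface form in the lexicon, measured in words.
MAX_WORDS = max(len(form.split()) for form in LEXICON)


def evoke(sentence):
    """Greedy longest-match lookup; returns (evoked, problems)."""
    words = sentence.lower().rstrip(".").split()
    evoked, problems = [], []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            form = " ".join(words[i:i + n])
            if form in LEXICON:
                candidates = LEXICON[form]
                evoked.append((form, candidates))
                if len(candidates) > 1:
                    problems.append(
                        f"ambiguous: '{form}' has {len(candidates)} candidate entries"
                    )
                i += n
                break
        else:
            # No span starting at this word is in the lexicon.
            problems.append(f"unknown: '{words[i]}'")
            i += 1
    return evoked, problems
```

Calling `evoke("Each rental car is owned by exactly one branch.")` matches "rental car" as a single form rather than as two separate words, and reports "branch" as ambiguous because the toy lexicon lists two entries for it. In a fuller analysis, later semantic processing would be expected to resolve or confirm such ambiguities rather than merely list them.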

Following this, a prototype implementation for SBVR Structured English, called CLUE4SBVR, is created, which provides concrete descriptions for each aspect of the general framework. Furthermore, two lexical acquisition components are incorporated: one that learns vocabulary from a partly formalised glossary, and another that learns candidate vocabulary entries from unrestricted text.
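
As a rough illustration of what learning vocabulary from a partly formalised glossary could involve, the sketch below reads a simplified glossary in which each term is followed by indented "Caption: value" lines. The layout, the sample entries, and the `parse_glossary` function are assumptions made for this example; they do not describe how CLUE4SBVR processes its glossary.

```python
# Simplified, invented glossary: a term on its own line, followed by
# indented "Caption: value" lines that describe it.
GLOSSARY = """\
rental car
    Definition: vehicle that is owned by a branch
    General Concept: vehicle
branch
    Definition: location from which rental cars are rented
"""


def parse_glossary(text):
    """Return {term: {caption: value}} for the simplified layout above."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if raw[0].isspace():
            # Indented line: a caption belonging to the current term.
            if current is not None and ":" in line:
                caption, value = line.split(":", 1)
                entries[current][caption.strip()] = value.strip()
            continue
        # Unindented line: start of a new term.
        current = line
        entries[current] = {}
    return entries


vocabulary = parse_glossary(GLOSSARY)
```

Here `vocabulary["rental car"]["General Concept"]` yields `"vehicle"`, so each term becomes a candidate vocabulary entry and the captions supply the partly formalised knowledge attached to it. Indented lines without a colon are ignored in this sketch; a real component would need to decide how to treat such free text.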

The prototype is evaluated on its performance in processing an SBVR Structured English (SBVR SE) specification. The lexical acquisition components are also evaluated, which includes developing a comparative framework so that existing Information Extraction (IE) approaches can be compared more reliably with the component that extracts vocabulary from unrestricted text.

The results show that CLUE4SBVR is capable of performing precise semantic interpretation of the Controlled Natural Language (CNL) specification, while identifying ambiguity as intended. Furthermore, lexical acquisition from the glossary performed very well, while acquisition of candidate vocabulary from unrestricted text obtained results similar to those of the compared approaches. However, the prototype displayed run-time performance issues that must be overcome before it can be used in practice; several possible solutions are identified.

Overview of the process of analysing controlled language specifications into formal models.

Related Publications

  • [PDF] M. Selway, “Formal Models from Controlled Natural Language via Cognitive Grammar and Configuration,” PhD Thesis, University of South Australia, Adelaide, SA, Australia, 2016.
    [Bibtex]
    @PHDTHESIS{SelwayThesis2016,
      author = {Selway, Matt},
      title = {Formal Models from Controlled Natural Language via Cognitive Grammar
      and Configuration},
      school = {School of Information Technology and Mathematical Sciences},
      year = {2016},
      address = {University of South Australia, Adelaide, SA, Australia}
    }
  • [PDF] [DOI] M. Selway, G. Grossmann, W. Mayer, and M. Stumptner, “Formalising Natural Language Specifications using a Cognitive Linguistic/Configuration Based Approach,” Information Systems, vol. 54, 2015.
    [Bibtex]
    @ARTICLE{IS2015/SelwayGMS,
      author = {Matt Selway and Georg Grossmann and Wolfgang Mayer and Markus Stumptner},
      title = {Formalising Natural Language Specifications using a Cognitive Linguistic/Configuration Based Approach},
      journal = {Information Systems},
      year = 2015,
      publisher = {Elsevier},
      doi = {10.1016/j.is.2015.04.003},
      volume = {54},
    }
  • [PDF] [DOI] M. Selway, W. Mayer, and M. Stumptner, “Semantic Interpretation of Requirements through Cognitive Grammar and Configuration,” in Proc. Pacific Rim Conference on Artificial Intelligence (PRICAI) 2014, Springer International Publishing, 2014, vol. 8862, pp. 496-510.
    [Bibtex]
    @INCOLLECTION{PRICAI2014/SelwayMS2014,
      author = {Selway, Matt and Mayer, Wolfgang and Stumptner, Markus},
      title = {Semantic Interpretation of Requirements through Cognitive Grammar
      and Configuration},
      booktitle = {Proc. Pacific Rim Conference on Artificial Intelligence ({PRICAI}) 2014},
      publisher = {Springer International Publishing},
      year = {2014},
      volume = {8862},
      series = {Lecture Notes in Computer Science},
      pages = {496--510},
      doi = {10.1007/978-3-319-13560-1_40},
    }
  • [PDF] [DOI] M. Selway, G. Grossmann, W. Mayer, and M. Stumptner, “Formalising Natural Language Specifications using a Cognitive Linguistic/Configuration Based Approach,” in Proc. EDOC 2013, Vancouver, Canada, 2013, pp. 59-68.
    [Bibtex]
    @INPROCEEDINGS{edoc13/SelwayGMS,
      author =       {Matt Selway and Georg Grossmann and Wolfgang Mayer and Markus Stumptner},
      title =        {Formalising Natural Language Specifications using a Cognitive Linguistic/Configuration Based Approach},
      booktitle =    {Proc. EDOC 2013},
      pages =        {59--68},
      month =        sep,
      year =         2013,
      address =      {Vancouver, Canada},
      isbn =         {978-0-7695-5081-7},
      publisher =    {IEEE},
      doi =          {10.1109/EDOC.2013.16},
    }
  • [PDF] M. Selway, W. Mayer, and M. Stumptner, “Configuring Domain Knowledge for Natural Language Understanding,” in Proc. 15th International Configuration Workshop (ConfWS’13), Vienna, Austria, 2013, pp. 63-70.
    [Bibtex]
    @INPROCEEDINGS{confws13/SelwayMS,
      author =       {Matt Selway and Wolfgang Mayer and Markus Stumptner},
      title =        {Configuring Domain Knowledge for Natural Language Understanding},
      booktitle =    {Proc. 15th International Configuration Workshop (ConfWS'13)},
      pages =        {63--70},
      month =        aug,
      year =         2013,
      address =      {Vienna, Austria},
      isbn =         {979-10-91526-02-9},
      url =         {http://ws-config-2013.mines-albi.fr/CWS-2013-Proceedings-Color.pdf},
    }