Unification-Based Grammar Engineering

Dan Flickinger and Stephan Oepen

  • Area: LaCo
  • Level: I
  • Week: 1
  • Time: 17:00 – 18:30
  • Room: A5.18

Abstract

Parsing and generation of natural language can benefit from the availability of manually constructed grammars that capitalize on accuracy and cross-domain flexibility; this holds true in linguistic theory building and practical, applied tasks alike. In this course we provide an introduction to the implementation of linguistically motivated grammars that can be developed to meet the broad-coverage demands of research and applications.

Motivation and Description

The implementation of linguistically based grammars for natural languages draws on a combination of engineering skills, sound grammatical theory, and software development tools. This course provides a hands-on introduction to the formalism, techniques, and tools needed for building the precise, extensible grammars required both in research and in applications. Through a combination of lectures and in-class exercises, students investigate the implementation of constraints in morphology, syntax, and semantics, working within a unification-based framework. Topics to be addressed in the course include the use of types and features, lexical rules, constructions, and monotonic vs. default inheritance. The daily implementation exercises are conducted in the freely available LKB grammar development platform, and include experience with adding and repairing lexical types, lexical entries, lexical rules, phrase structure schemata, and others.
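As a rough illustration of the core operation behind a unification-based framework, the following Python sketch unifies untyped feature structures represented as nested dictionaries. It is an assumption-laden simplification written for this description, not LKB code: the LKB works with typed feature structures, whereas this sketch has no type hierarchy and no reentrancy (structure sharing). The names unify and FS, and the feature names HEAD, AGR, NUM, and PER, are hypothetical and chosen only for the example.

```python
from typing import Optional, Union

# A feature structure is either an atomic value (a string) or a dict
# mapping feature names to feature structures. This is a simplifying
# assumption: no types and no reentrancy, unlike the structures in the LKB.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the unification of a and b, or None if they are incompatible."""
    # An empty dict carries no constraints, so it unifies with anything.
    if isinstance(a, dict) and not a:
        return b
    if isinstance(b, dict) and not b:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy; the inputs are never mutated
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values under a shared feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atomic values (or an atom against a non-empty dict) must be identical.
    return a if a == b else None


# Hypothetical example structures.
noun = {"HEAD": "noun", "AGR": {"NUM": "sg"}}
third_singular = {"AGR": {"NUM": "sg", "PER": "3"}}
plural = {"AGR": {"NUM": "pl"}}

compatible = unify(noun, third_singular)  # information from both is combined
clash = unify(noun, plural)               # None, because sg and pl conflict
```

Unification here is monotonic: the result contains all the information from both inputs, and it fails as soon as two atomic values disagree.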

The course offers a conservatively updated version of materials and instruction that have been used successfully at previous summer schools (including twice at ESSLLI, once at the LSA Summer Institute, and at several others), as well as in more extended, regular courses at Stanford University, the University of Oslo, and others. Unification- or constraint-based approaches to the representation and processing of grammatical knowledge were part of mainstream curricula in formal and computational linguistics until about a decade ago. In recent years, however, much work in natural language processing (i.e., applied computational linguistics) has focused more on aspects of ‘shallower’, robust, and efficient morpho-syntactic analysis, with applications of machine learning over linguistically annotated training data as the core acquisition technique for grammatical information. As emphasis is now gradually shifting from surface syntax to ‘deeper’ and in particular semantic analysis (or at least meaning representation), the study of the syntax–semantics interface and questions of compositionality are again gaining popularity. However, current student generations are often under-educated about relevant traditions in these areas, and thus we view an introductory course on unification-based grammar as a timely measure to help rekindle broader knowledge and curiosity about different possibilities for the division of labor between grammar induction and linguistics engineering.

Course Outline

Each session consists of a short lecture followed by a laboratory period where the students gain hands-on experience with the tools and methods of modern grammar implementation.

  • Monday: Unification-based Linguistic Description: Introduction to the LKB (Exercise 1; Starting Package)
  • Tuesday: Phrase Structure Recursion and Modification (Exercise 2; Starting Package)
  • Wednesday: Use of the Type Hierarchy for Concise Linguistic Description (Exercise 3; Starting Package)
  • Thursday: Lexical Rules for Inflectional and Derivational Morphology (Exercise 4; Starting Package)
  • Friday: Long-Distance Dependencies: Topicalization and Relative Clauses (Exercise 5; Starting Package)

Expected Level and Prerequisites

A basic knowledge of syntactic theory (in any framework) will be assumed, but no prior programming skills are required.

Development Environment

The course uses the Linguistic Knowledge Builder (LKB) software for grammar engineering. The LKB is pre-installed on the ESSLLI Linux virtual machines that are accessible from the computer laboratory on the fifth floor. Please see the first assignment sheet for further instructions on how to get going with the LKB.

It is also possible to access these virtual machines from outside the computer laboratory by installing a lightweight software client (which is available for different operating systems) on your own machine; please follow the instructions provided by the organizers, but note that access to the client software as well as to the virtual machine server appears to be blocked from the eduroam wireless network.

Some years ago, we taught a similar course at a Scandinavian summer school, where one student had to leave the final lecture early, looking clearly apologetic about not finishing the last assignment. Two years later, she presented the results of the assignment, beautifully carried through, to the instructors. To make it possible to continue your work on grammar engineering past ESSLLI, the complete development environment (and more) is available under an open-source license. We strongly recommend its use in a Linux environment, where installation should be a matter of just a few commands. Please see the instructions for what is called the LOGON distribution, which comprises the LKB, additional grammar development and parsing software, and several larger grammars.

Incentives: Certified Grammar Engineer Prize

We have available two copies of the relevant parts of the background books (see below), which we will give as a prize of honor to those students who appear most likely to pursue a career in grammar engineering. To participate in this competition, please submit the results of your work on either Exercise 3 or Exercise 4 (whichever you are most proud of) to Dan and Stephan for review, before breakfast time (09:00) on Friday morning. In case you are unsure about how best to pack up and email the contents of your grammar2 directory (or grammar4 directory, or maybe even grammar1, if you managed to build on your work from the first session), the course environment on the ESSLLI machines provides a command-line tool for that. In the terminal, make sure to re-run the lkb command and then try something like submit grammar2. Reassuring messages about the files being submitted should appear.

Background Reading

Sag, Ivan A., Wasow, Thomas, & Bender, Emily M. (2003). Syntactic theory: A formal introduction (Second ed.). Stanford, California: Center for the Study of Language and Information.

Copestake, Ann (2002). Implementing typed feature structure grammars. Stanford, California: Center for the Study of Language and Information.