Learning from Data: A Foundational Course for Linguists

Malvina Nissim and Johannes Bjerva

  • Area: LaCo
  • Level: F
  • Week: 2
  • Time: 17:00 – 18:30
  • Room: D1.02


This course is aimed at students who have a background in linguistics, can see and formulate research questions on language-related problems from an application perspective, but have no knowledge of machine learning approaches to language processing. I will assume no prior knowledge of statistical language processing. The course will balance theory and practice, by covering conceptual as well as implementation aspects.

This isn’t a theoretical course on the mathematical aspects of learning, but rather a course aimed at equipping students with the practical skills to run basic machine learning experiments, building on an introductory theoretical background. Pointers will be given to those who want to explore the theory in more depth. During the lectures, I will introduce the basic concepts and procedures of machine learning (learning by example, features, training and testing, supervised vs. unsupervised, generative vs. discriminative, etc.), the main algorithms available, feature extraction and manipulation, basic concepts in evaluation and error analysis (bias vs. variance, overfitting, learning curves, etc.), and existing tools and platforms for running experiments easily. All of this will be illustrated through both theory and practice, in class and at home: small daily practical assignments will be given so that theory can be applied and understood right away.

By the end of the course, you are expected to be able to run machine learning experiments on a given (NLP) problem. You will understand the key concepts and terminology of machine learning and its training and testing procedures, and you will be able to use existing tools that support machine learning experiments, such as Weka, NLTK, and scikit-learn. More specifically, when setting up an experiment for a given task, you will know that you have choices to make in how to represent the problem, which features to implement, and which algorithm to pick, and you will be able to interpret the results critically by understanding evaluation metrics as well as possible sources of error.
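As a small illustration of the evaluation metrics mentioned above, here is a minimal Python sketch (standard library only, not taken from the course material) that builds a confusion table and computes accuracy and per-class precision, recall, and F1. The gold and predicted labels are invented toy data.

```python
from collections import Counter

def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one class, treated as the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy gold-standard and system labels (invented for illustration).
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
pred = ["pos", "neg", "neg", "pos", "pos", "neg"]

# Confusion table: counts of (gold, predicted) pairs, a starting point for error analysis.
confusion = Counter(zip(gold, pred))
print(confusion)
print(accuracy(gold, pred))                    # 4 of 6 items correct
print(precision_recall_f1(gold, pred, "pos"))  # each value is 2/3 here
```

Looking at the off-diagonal cells of the confusion table (here one false positive and one false negative) is usually more informative than a single accuracy figure.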



LFD lecture1

LFD lecture2

LFD lecture3

LFD lecture4

LFD lecture5


Downloading material

Approach 1: Use git (updateable, recommended if you have git)

  1. In your terminal, type: ‘git clone https://github.com/bjerva/esslli-learning-from-data-students.git’
  2. Followed by ‘cd esslli-learning-from-data-students’
  3. Whenever the code is updated, type: ‘git pull’

Approach 2: Download a zip archive (static = you need to be told when a new version is up)

  1. Download the zip archive from:
  2. Whenever the code is updated, download the archive again.

Running scripts

  1. Navigate to your ‘esslli-learning-from-data-students’ (using cd in the terminal)
  2. To extract features and learn model:

python run_experiment.py --csv data/trainset-sentiment-extra.csv --nwords 1 --algorithms nb

The command above uses the trainset-sentiment-extra dataset to train a Naive Bayes (nb) model on unigram (--nwords 1) features.
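We have not checked how run_experiment.py implements its models, so the following is only a hypothetical from-scratch sketch of what a Naive Bayes n-gram classifier does, assuming that --nwords sets the n-gram order (1 = unigrams). It implements multinomial Naive Bayes with add-alpha (Laplace) smoothing; the class name NaiveBayes, the toy sentences, and their labels are illustrative inventions.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n=1):
    """Lowercased, whitespace-tokenized word n-grams (n=1 gives unigrams)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over n-gram counts."""

    def __init__(self, n=1, alpha=1.0):
        self.n = n          # n-gram order (assumed analogue of --nwords)
        self.alpha = alpha  # add-alpha smoothing constant

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text, self.n))
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        return self

    def log_posterior(self, text, label):
        """log P(label) + sum of log P(feature | label), up to a constant."""
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        counts = self.feature_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for f in ngrams(text, self.n):
            if f in self.vocab:  # features never seen in training are ignored
                score += math.log((counts[f] + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))

# Invented toy training data.
train_texts = ["a great and moving film", "great acting , loved it",
               "a dull and boring film", "boring plot , hated it"]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes(n=1).fit(train_texts, train_labels)
print(model.predict("loved the great acting"))  # pos
print(model.predict("a boring film"))            # neg
```

In practice you would rely on the course scripts or a library such as scikit-learn; writing the classifier out once mainly makes the roles of the class prior, the smoothed likelihoods, and the feature representation visible.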


Additional References

With a focus on NLP:

  • Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press. Cambridge, MA. 1999. http://nlp.stanford.edu/fsnlp/
  • Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008. http://nlp.stanford.edu/IR-book/
  • James Pustejovsky and Amber Stubbs, Natural Language Annotation for Machine Learning, O’Reilly. 2012.
  • Steven Bird, Ewan Klein, and Edward Loper, Natural Language Processing with Python, O’Reilly. 2009. http://www.nltk.org
  • Hal Daumé III. A course in Machine Learning. http://ciml.info (incomplete manuscript available online – some parts available for free.)

More generally on machine learning:

  • Tom Mitchell, Machine Learning, McGraw Hill. 1997.
  • Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, The Morgan Kaufmann Series in Data Management Systems. 2011.
  • Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin, Learning from Data, AMLBook. 2012.
  • Peter Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of Data, Cambridge University Press. 2012.

More specific to Scikit learn (and ML with Python):

  • Luis Pedro Coelho and Willi Richert, Building Machine Learning Systems with Python, PACKT Publishing. 2013.
  • Raúl Garreta and Guillermo Moncecchi, Learning scikit-learn: Machine Learning in Python, PACKT Publishing. 2013.


Natural Language Processing of Microblogs

Tatjana Scheffler and Manfred Stede

  • Area: LaCo
  • Level: I
  • Week: 2
  • Time: 14:00 – 15:30
  • Room: D1.03


Social media has become an abundant data source for NLP applications, and its analysis has gained significance as a research field of its own. In this course we introduce how to work with microblogs for linguistic theorizing as well as for developing computational models and applications. Focussing on Twitter data, we show how to collect and store social media corpora. Social media language contains many non-standard features that necessitate preprocessing or the adaptation of standard NLP tools. The course will introduce common social media processing tasks with a focus on utilizing metadata like user info, geo-tagging, or time stamps. A special feature of much social media data is its interactional nature, which we will address in a session on discourse processing for Twitter. Finally, we will apply existing tools for working with social media in a practical mini-project implementing our own linguistically-inspired Twitter bots.

Motivation & Description:

Social media such as Twitter, Facebook, blogs, chats, etc. are a generous source of user-generated data for natural language processing. On the one hand, working with this kind of data has many advantages: strong industrial and academic interest in analyzing and automatically processing user-generated content, an abundance of textual data available through public APIs, the ability to monitor newly emerging trends (social sensors), etc. On the other hand, natural language processing of social media texts faces many particular challenges (Baldwin, 2012). State-of-the-art NLP applications often cannot be applied to social media data out of the box, and even adapted tools typically score considerably lower than they do on standard text.

Some of the potential challenges for computational linguists are:

  • Volume and speed of the data stream
  • Representativeness / collection of corpora
  • Variability of style and content
  • Conversational data (essentially, written-down spoken-like dialog data)

The course is aimed at students who want to start working with social media data, especially from Twitter. We will present the available tools, methods, and approaches for the entire pipeline of linguistic or computational linguistic research on social media data, hoping to enable students to start their own research projects.

We will introduce data formats and discuss how to collect one’s own corpus from Twitter, given available tools and APIs. One under-researched area within social media NLP is work on languages other than English; we will show how to obtain data in other languages. In addition, we show methods for collecting and working with conversational corpora, which exhibit many interesting features for linguistic and computational analysis.

Language variability is a huge issue in social media texts, due to many contributing factors: variability of authors, topics, text genres, dialects, etc. (Eisenstein, 2015). Two common approaches to dealing with variability and non-standard language are an elaborate preprocessing step for normalization (Sidarenka et al., 2014) and the adaptation of standard NLP tools to social media data. In most cases, both steps are probably needed.
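A minimal sketch of rule-based tweet normalization in Python, assuming a few common heuristics: placeholder tokens for URLs and user mentions, stripping the # from hashtags, and shortening character elongations. This is much simpler than the normalization work cited above and is not that method; the placeholders <url> and <user> are an arbitrary choice.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
ELONGATION_RE = re.compile(r"(\w)\1{2,}")  # a word character repeated 3+ times

def normalize_tweet(text):
    text = text.lower()
    text = URL_RE.sub("<url>", text)         # replace links with a placeholder
    text = MENTION_RE.sub("<user>", text)    # replace @mentions with a placeholder
    text = HASHTAG_RE.sub(r"\1", text)       # keep the hashtag word, drop '#'
    text = ELONGATION_RE.sub(r"\1\1", text)  # 'sooooo' -> 'soo'
    return text

print(normalize_tweet("@anna Sooooo happy!!! #ESSLLI2016 https://t.co/abc"))
# <user> soo happy!!! esslli2016 <url>
```

Collapsing elongations to two characters rather than one is a deliberate compromise: it keeps legitimate double letters ("happy", "esslli") intact, at the cost of leaving "soo" rather than "so".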

After discussing corpus collection and preprocessing, we introduce state-of-the-art approaches to common microblog processing applications, such as (sentiment) classification and (topic) clustering. We put special focus on how to work with the specific metadata that distinguishes microblogs from other textual data: user information, geographical information, and time stamps.
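To make the metadata concrete, here is a Python sketch that pulls the user, time, language, and location out of a single tweet. It assumes the JSON layout of the classic Twitter REST API v1.1 tweet object (created_at, user.screen_name, lang, and a GeoJSON coordinates field in longitude, latitude order); the tweet itself is invented, and real tweets carry many more fields, with coordinates usually null.

```python
import json
from datetime import datetime

# Hypothetical tweet, shaped like a (heavily trimmed) Twitter API v1.1 tweet object.
RAW = """{
  "created_at": "Mon Aug 15 17:05:12 +0000 2016",
  "text": "Distributional semantics at #DSALT",
  "lang": "en",
  "user": {"screen_name": "some_user", "location": "Bolzano"},
  "coordinates": {"type": "Point", "coordinates": [11.35, 46.50]}
}"""

def extract_metadata(raw):
    tweet = json.loads(raw)
    # created_at uses a fixed format such as "Mon Aug 15 17:05:12 +0000 2016".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    coords = tweet.get("coordinates")  # often null for tweets without geo-tagging
    return {
        "user": tweet["user"]["screen_name"],
        "time": created,
        "hour": created.hour,
        "lang": tweet.get("lang"),
        "lon_lat": tuple(coords["coordinates"]) if coords else None,
    }

print(extract_metadata(RAW))
```

Once tweets are reduced to records like this, questions about users, places, and time (e.g. activity per hour) become simple grouping operations.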

Finally, we will present existing Python tools for working with microblogs (e.g., the Tweepy package). These tools, together with some scripts provided by the instructors, enable us to implement our own Twitter bots with a few lines of code (Waite, 2014). Simple linguistically informed Twitter bots could, for instance, reply to an incoming tweet with its translation (or an image of its parse) or identify its language.
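One simple way for a corpus-based bot to produce new text is a word-level bigram Markov chain. The Python sketch below shows only such a generator, trained on a toy list of course titles; it is not the instructors' scripts, the function names are hypothetical, and the Tweepy call that would actually post the result is deliberately left out. The 140-character cap reflects the tweet length limit at the time.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"

def build_chain(lines):
    """Map each word to the list of words observed right after it."""
    chain = defaultdict(list)
    for line in lines:
        tokens = [START] + line.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            chain[prev].append(nxt)
    return chain

def generate(chain, rng, max_words=20, max_chars=140):
    """Random walk from START until END (or max_words), capped at max_chars."""
    words, current = [], START
    while len(words) < max_words:
        current = rng.choice(chain[current])
        if current == END:
            break
        words.append(current)
    text = " ".join(words)
    if len(text) > max_chars:
        text = text[:max_chars].rsplit(" ", 1)[0]  # cut at a word boundary
    return text

# Toy corpus: course titles from this page.
titles = [
    "Learning from Data",
    "Natural Language Processing of Microblogs",
    "Computational Semantics",
    "Crowdsourcing Linguistic Datasets",
    "Computational Models of Events",
    "Computational Historical Linguistics",
]
chain = build_chain(titles)
print(generate(chain, random.Random(2016)))
```

Passing an explicit random.Random instance makes runs reproducible, which helps when testing a bot before letting it post.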

Tentative Outline:

  • Session 1: Collecting corpora, structure of the data, preprocessing
  • Session 2: Working with metadata: users, geo-information, time stamps
  • Session 3: Classification and clustering: sentiment analysis
  • Session 4: Discourse processing of social media text
  • Session 5: Linguistic phenomena in microblogs

Level & Prerequisites:

The course is meant as an introduction to methods and available tools for undergraduate or graduate students interested in working with social media. Basic knowledge of linguistics and computational linguistics suffices. Familiarity with Python (for the last session) is a plus, but not required.


Useful Links

  • Detecting automated Twitter accounts: BotOrNot

Building Twitter Bots

Collect your ideas here!

Quick and easy Twitter bots: Make your own @HydrateBot

Python corpus based Twitter bots: Creative Twitter bots

Bots we made at ESSLLI!

@xiejiabot: a bot that generates bot ideas

@gaebot: generates ESSLLI class names

@ESSLLIbot: generates ESSLLI class names

@BreakfastAndArt: generates new painting names that sound like restaurant dishes

A bot that debeautifies Slovenian songs

@Eugeneralissimo: a city name generator with “factual” information

@TheBotOfPuns: a bot that tweets a random joke from a previously generated list of homophone-based jokes

@WhosThere_bot: responds to “knock-knock” jokes (when it’s not over its API quota)

@drbotson: generates new Sherlock Holmes story titles

@pictureeveryday: Chuck Norris jokes

@millueh: locates ESSLLI participants

A bot that generates novel (cooking) recipes

Computational Semantics

Johan Bos

  • Area: LaCo
  • Level: I
  • Week: 2
  • Time: 11:00 – 12:30
  • Room: D1.02


This course on computational semantics studies the relationship between natural language expressions and meaning representations, and how these meaning representations can be used to draw inferences automatically. This will be done by:

  1. Comparing model-based approaches with proof-based approaches, and introducing inference techniques such as model checking, model building, and theorem proving;
  2. Discussing the ingredients of meaning representations for natural language expressions;
  3. Introducing a compositional approach for mapping natural language expressions to meaning representations, based on categorial grammar;
  4. Applying the techniques to practical applications in language technology such as contradiction checking and advanced image search.
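As a taste of the model-based side (model checking), here is a small Python sketch that evaluates first-order formulas against a finite model. The tuple encoding of formulas and the toy model are assumptions made for illustration; the tools used in the course may look quite different.

```python
# A finite first-order model: a domain plus the extensions of the predicates.
model = {
    "domain": {"d1", "d2", "d3"},
    "dog": {"d1", "d2"},
    "cat": {"d3"},
    "bark": {"d1", "d2"},
    "chase": {("d1", "d3")},  # binary relations are sets of pairs
}

def holds(formula, model, g=None):
    """Check whether a formula is true in the model under assignment g."""
    g = g or {}
    op = formula[0]
    if op == "pred":
        _, name, *args = formula
        values = tuple(g[a] if a in g else a for a in args)  # variables or constants
        return (values[0] if len(values) == 1 else values) in model[name]
    if op == "not":
        return not holds(formula[1], model, g)
    if op == "and":
        return holds(formula[1], model, g) and holds(formula[2], model, g)
    if op == "imp":
        return (not holds(formula[1], model, g)) or holds(formula[2], model, g)
    if op == "all":   # universal quantifier: true for every domain element
        _, var, body = formula
        return all(holds(body, model, {**g, var: d}) for d in model["domain"])
    if op == "some":  # existential quantifier: true for at least one element
        _, var, body = formula
        return any(holds(body, model, {**g, var: d}) for d in model["domain"])
    raise ValueError(f"unknown operator {op!r}")

every_dog_barks = ("all", "x", ("imp", ("pred", "dog", "x"), ("pred", "bark", "x")))
some_cat_barks = ("some", "x", ("and", ("pred", "cat", "x"), ("pred", "bark", "x")))
dog_chases_cat = ("some", "x", ("and", ("pred", "dog", "x"),
                  ("some", "y", ("and", ("pred", "cat", "y"), ("pred", "chase", "x", "y")))))

print(holds(every_dog_barks, model))  # True
print(holds(some_cat_barks, model))   # False
print(holds(dog_chases_cat, model))   # True
```

Model checking like this only works because the domain is finite and given; proof-based approaches instead ask whether a formula follows from others in every model.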

This is an introductory course. No special knowledge of first-order logic, computational grammar, formal semantics, or automated reasoning is required.

Day 1: Exploring Models

Day 2: Meaning Representations

Day 3: Computing Meanings with DCG

Day 4: Computing Meanings with CCG

  • C&C + Boxer
  • Further reading: Bos (2015): Open-Domain Semantic Parsing with Boxer. In: B. Megyesi (ed): Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015), pp 301–304. PDF
  • Slides: CategorialGrammar
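At the core of a CCG derivation is function application in two directions. The following Python sketch (a toy, not C&C or Boxer, whose output is far richer) pairs each category with a semantic function and implements forward application (X/Y  Y => X) and backward application (Y  X\Y => X). The lexicon, the tuple encoding of categories, and the type-raised category for "every" are illustrative assumptions; the bound variable x is hard-coded, so nested quantifiers would clash.

```python
# Atomic categories; complex categories are (slash, result, argument) tuples.
S, NP, N = "S", "NP", "N"

def fwd(left, right):
    """Forward application: X/Y : f  +  Y : a  =>  X : f(a)."""
    (lcat, lsem), (rcat, rsem) = left, right
    if isinstance(lcat, tuple) and lcat[0] == "/" and lcat[2] == rcat:
        return (lcat[1], lsem(rsem))
    return None  # rule does not apply

def bwd(left, right):
    """Backward application: Y : a  +  X\\Y : f  =>  X : f(a)."""
    (lcat, lsem), (rcat, rsem) = left, right
    if isinstance(rcat, tuple) and rcat[0] == "\\" and rcat[2] == lcat:
        return (rcat[1], rsem(lsem))
    return None

# Toy lexicon: (category, semantics); semantics build formula strings.
lexicon = {
    "vincent": (NP, "vincent"),
    "dog": (N, lambda x: f"dog({x})"),
    "barks": (("\\", S, NP), lambda x: f"bark({x})"),
    # 'every' as a type-raised determiner: (S/(S\NP))/N
    "every": (("/", ("/", S, ("\\", S, NP)), N),
              lambda p: lambda q: f"all x.({p('x')} -> {q('x')})"),
}

print(bwd(lexicon["vincent"], lexicon["barks"]))
# ('S', 'bark(vincent)')
every_dog = fwd(lexicon["every"], lexicon["dog"])
print(fwd(every_dog, lexicon["barks"]))
# ('S', 'all x.(dog(x) -> bark(x))')
```

Because syntax and semantics are combined by the same rule application, the meaning representation is built compositionally as a side effect of parsing.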

Day 5: Drawing Inferences and Meaning Banking

Crowdsourcing Linguistic Datasets

Chris Biemann

  • Area: LaCo
  • Level: I
  • Week: 2
  • Time: 11:00 – 12:30
  • Room: C3.06


The course gives a thorough introduction to crowdsourcing as an instrument to quickly acquire linguistic datasets for training and evaluation purposes. Further, the course will provide step-by-step instructions on how to realize simple and complex crowdsourcing projects on Amazon Mechanical Turk and on CrowdFlower.
While crowdsourcing seems like a straightforward solution for linguistic annotation, the success of a crowdsourcing project depends critically on multiple dimensions.
In this course, emphasis is placed on understanding these dimensions by discussing practical experiences, in order to enable participants to successfully use crowdsourcing for language-related research. This includes learning about demographics, platform mechanisms, schemes for ensuring data quality, best practices regarding the treatment of workers and, most of all, lessons learned from previous crowdsourcing projects, both as described in the literature and as conducted by the instructor.
The educational goal is to enable participants to successfully set up crowdsourcing projects and to avoid typical pitfalls.
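One common check on crowdsourced annotations is chance-corrected agreement among workers. A minimal Python sketch of Fleiss' kappa follows, assuming every item received the same number of judgments; the label lists are invented, and the function raises an error in the degenerate case where only a single label is ever used.

```python
from collections import Counter

def fleiss_kappa(item_labels):
    """item_labels: one list of labels per item, same number of raters per item."""
    n = len(item_labels[0])
    if n < 2 or any(len(labels) != n for labels in item_labels):
        raise ValueError("need the same number (>= 2) of ratings per item")
    N = len(item_labels)
    counts = [Counter(labels) for labels in item_labels]
    categories = set().union(*counts)
    # p_j: overall proportion of judgments that fall into category j
    p_j = {c: sum(cnt[c] for cnt in counts) / (N * n) for c in categories}
    # P_i: proportion of agreeing rater pairs on item i
    P_i = [(sum(v * v for v in cnt.values()) - n) / (n * (n - 1)) for cnt in counts]
    P_bar = sum(P_i) / N
    P_e = sum(p * p for p in p_j.values())  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when only one label is ever used")
    return (P_bar - P_e) / (1 - P_e)

# Invented example: 4 items, each labelled by 3 workers.
judgments = [["pos", "pos", "pos"], ["neg", "neg", "neg"],
             ["pos", "pos", "neg"], ["neg", "neg", "pos"]]
print(fleiss_kappa(judgments))  # about 0.333
```

Here observed agreement is 2/3 but chance agreement is already 1/2, so kappa is only 1/3, a reminder that raw percentage agreement can look better than it is.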

The course is organized in 5 sessions of 90 minutes each.

  1. What is Crowdsourcing? History and demographics, definitions, elementary concepts, example projects.
  2. Crowdsourcing platforms, especially Amazon Mechanical Turk (MTurk) and CrowdFlower: technical possibilities, payment schemes, Do’s and Don’ts, schemes for ensuring quality.
  3. Successful design patterns for crowdsourcing projects for language tasks.
  4. Crowdsourcing projects for language tasks, lessons learned, including non-English tasks.
  5. Quality control mechanisms, ethical considerations, how to treat your crowdworkers, requester code of conduct, Turker forums.

Short Bio

Chris is an assistant professor and head of the Language Technology group at TU Darmstadt in Germany. He received his Ph.D. from the University of Leipzig and subsequently spent three years in industrial search engine research at Powerset and Microsoft Bing in San Francisco, California. He publishes regularly in journals and top conferences in the field of Computational Linguistics.
His research targets the self-learning of structure from natural language text, especially semantic representations. Using big-data techniques, his group has built an open-source, scalable, language-independent framework for symbolic distributional semantics. To connect induced structures to tasks, Chris frequently uses crowdsourcing techniques to acquire natural language semantics data.


LECTURE 1: What is Crowdsourcing? 1 Crowdsourcing_Aug2016_ESSLLII.pptx
History and demographics, definitions, elementary concepts, example projects

LECTURE 2: Crowdsourcing platforms 2 Crowdsourcing_Aug2016_ESSLLII.pptx
esp. Amazon MTurk and CrowdFlower. Technical possibilities, payment schemes, Do’s and Don’ts, schemes for ensuring quality

LECTURE 3: Successful design patterns 3 Crowdsourcing_Aug2016_ESSLLII.pptx
illustrated with some exemplary projects

LECTURE 4: Crowdsourcing projects for language tasks 4 Crowdsourcing_Aug2016_ESSLLII.pptx
a variety of projects, and lessons learned

LECTURE 5: Quality Control and Ethical considerations 5 Crowdsourcing_Aug2016_ESSLLII.pptx
quality control mechanisms, modelling the quality of individual workers automatically, how to treat your crowdworkers, requester code of conduct, crowdworker forums
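Modelling worker quality can be as simple as scoring each worker on gold (test) questions with known answers and weighting their votes accordingly. The Python sketch below does exactly that; it is a simple heuristic rather than the generative models discussed in the references below (e.g. Felt et al. 2015), and the judgments, worker IDs, and the default weight of 0.5 for workers without gold answers are made up.

```python
from collections import Counter, defaultdict

def worker_accuracy(judgments, gold):
    """Share of gold (test) questions each worker answered correctly."""
    hits, seen = Counter(), Counter()
    for worker, item, label in judgments:
        if item in gold:
            seen[worker] += 1
            hits[worker] += int(label == gold[item])
    return {w: hits[w] / seen[w] for w in seen}

def weighted_vote(judgments, accuracy, default=0.5):
    """Aggregate labels per item, weighting each vote by the worker's accuracy."""
    votes = defaultdict(Counter)
    for worker, item, label in judgments:
        votes[item][label] += accuracy.get(worker, default)
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}

# Invented data: (worker, item, label); g1 and g2 are gold questions.
gold = {"g1": "pos", "g2": "neg"}
judgments = [
    ("w1", "g1", "pos"), ("w1", "g2", "neg"), ("w1", "t1", "pos"),
    ("w2", "g1", "pos"), ("w2", "g2", "pos"), ("w2", "t1", "neg"),
    ("w3", "g1", "neg"), ("w3", "g2", "pos"), ("w3", "t1", "neg"),
]
acc = worker_accuracy(judgments, gold)
print(acc)                                   # w1: 1.0, w2: 0.5, w3: 0.0
print(weighted_vote(judgments, acc)["t1"])   # pos
```

A plain majority vote would label t1 as "neg" (two votes against one); weighting by gold accuracy lets the single reliable worker outvote two unreliable ones.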

Additional References (Selection)

General Studies and Surveys

Robert Munro, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen, and Harry Tily. 2010. Crowdsourcing and language studies: the new generation of linguistic data. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk (CSLDAMT ’10). 122-130.

Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’08). 254-263.

Juska-Bacher, Britta and Biemann, Chris and Quasthoff, Uwe. 2013. Webbasierte linguistische Forschung: Möglichkeiten und Begrenzungen beim Umgang mit Massendaten. Linguistik online 61, 4/2013

Lexical Resource

Lafourcade, Mathieu and Zarrouk, Manel and Joubert, Alain. 2014. About Inferences in a Crowdsourced Lexical-Semantic Network. Proceedings of the EACL, Gothenburg, Sweden, 174–182

Braslavski, Pavel and Ustalov, Dmitry and Mukhin, Mikhail. 2014. A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus. Proceedings of the Demonstrations at EACL, Gothenburg, Sweden, 101–104

Fossati, Marco and Giuliano, Claudio and Tonelli, Sara. 2013. Outsourcing FrameNet to the Crowd. Proceedings of ACL (Volume 2: Short Papers), Sofia, Bulgaria, 742–747

Hartshorne, Joshua K. and Bonial, Claire and Palmer, Martha. 2014. The VerbCorner Project: Findings from Phase 1 of crowd-sourcing a semantic decomposition of verbs. Proceedings of ACL (Volume 2: Short Papers), Baltimore, Maryland, 397–402

Biemann, Chris and Nygaard, Valerie. 2010. Crowdsourcing WordNet. Proceedings of GWC-2010

Word Sense

Jurgens, David. 2014. Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels. Proceedings of NAACL-HLT, Atlanta, Georgia, 556–562

Lopez de Lacalle, Oier and Agirre, Eneko. 2015. Crowdsourced Word Sense Annotations and Difficult Words and Examples. Proceedings of the 11th International Conference on Computational Semantics, London, UK, 94–100

Biemann, Chris. 2012. Creating a system for lexical substitutions from scratch using crowdsourcing. Lang. Resources & Evaluation, vol. 47, no. 1, p. 97–112

Event entailment

Takabatake, Yu and Morita, Hajime and Kawahara, Daisuke and Kurohashi, Sadao and Higashinaka, Ryuichiro and Matsuo, Yoshihiro. 2015. Classification and Acquisition of Contradictory Event Pairs using Crowdsourcing. Proceedings of the 3rd Workshop on EVENTS, Denver, Colorado, 99–107


Steven Burrows, Martin Potthast, and Benno Stein. 2013. Paraphrase Acquisition via Crowdsourcing and Machine Learning. Transactions on Intelligent Systems and Technology (ACM TIST)

Tschirsich, Martin and Hintz, Gerold. 2013. Leveraging Crowdsourcing for Paraphrase Recognition. Proceedings of LAW and Interoperability with Discourse, Sofia, Bulgaria, 205–213

Matteo Negri, Yashar Mehdad, Alessandro Marchetti, Danilo Giampiccolo, and Luisa Bentivogli. 2012. Chinese whispers: Cooperative paraphrase acquisition. In Proceedings of LREC’12, Istanbul, Turkey


Feizabadi, Parvin Sadat and Padó, Sebastian. 2014. Crowdsourcing Annotation of Non-Local Semantic Roles. Proceedings of EACL, volume 2: Short Papers, Gothenburg, Sweden, 226–230


Yan, Rui and Gao, Mingkun and Pavlick, Ellie and Callison-Burch, Chris. 2014. Are Two Heads Better than One? Crowdsourced Translation via a Two-Step Collaboration of Non-Professional Translators and Editors. Proceedings of ACL (Long Papers), Baltimore, MD, 1134–1144

Kunchukuttan, Anoop and Chatterjee, Rajen and Roy, Shourya and Mishra, Abhijit and Bhattacharyya, Pushpak. 2013. TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain. Proceedings of ACL System Demonstrations, Sofia, Bulgaria, 175–180

Sequence Tagging

Hovy, Dirk and Plank, Barbara and Søgaard, Anders. 2014. Experiments with crowdsourced re-annotation of a POS tagging data set. Proceedings of ACL (Volume 2: Short Papers), Baltimore, Maryland, 377–382

Sentiment (inter alia)

Staiano, Jacopo and Guerini, Marco. 2014. Depeche Mood: a Lexicon for Emotion Analysis from Crowd Annotated News. Proceedings of ACL (Volume 2: Short Papers), Baltimore, MD, 427–433

Text Reuse and Simplification

Potthast, Martin and Hagen, Matthias and Völske, Michael and Stein, Benno. 2013. Crowdsourcing Interaction Logs to Understand Text Reuse from the Web. Proceedings of ACL (Volume 1: Long Papers), Sofia, Bulgaria, 1212–1221

Amancio, Marcelo and Specia, Lucia. 2014. An Analysis of Crowdsourced Text Simplifications. Proceedings of the 3rd Workshop on PITR, Gothenburg, Sweden, 123–130

Quality and how to make use of divergence

Felt, Paul and Black, Kevin and Ringger, Eric and Seppi, Kevin and Haertel, Robbie. 2015. Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models. Proceedings of NAACL-HLT, Denver, Colorado, 882–891

Ramanath, Rohan and Choudhury, Monojit and Bali, Kalika and Saha Roy, Rishiraj. 2013. Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation. Proceedings of ACL (Volume 1: Long Papers), Sofia, Bulgaria, 1713–1722

Integration in Annotation Tools

Bontcheva, Kalina and Roberts, Ian and Derczynski, Leon and Rout, Dominic. 2014. The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy. Proceedings of the Demonstrations at EACL, Gothenburg, Sweden, 97–100

Yimam, Seid Muhie and Gurevych, Iryna and Eckart de Castilho, Richard and Biemann, Chris. 2013. WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations. In Proceedings of ACL-2013, demo session, Sofia, Bulgaria

Computational Models of Events (CANCELLED)

James Pustejovsky

  • Area: LoCo
  • Level: A
  • Week: 1
  • Time: The course has been cancelled
  • Room: –


The notion of event has long been central both to modeling the semantics of natural language and to reasoning in goal-driven tasks in artificial intelligence. This course examines developments in computational models for events, bringing together recent work from the areas of semantics, logic, computer science, and computational linguistics. The goal of this course is to look at event structure from a unifying perspective, enabled by a new synthesis of how these disciplines have approached the problem. This entails examining the structure of events at all levels impacted by linguistic expressions: (a) predicate decomposition and subatomic event structure; (b) atomic events and the mapping to syntax; (c) events in discourse structure; and (d) the macro-event structure of narratives and scripts.


Additional References

DSALT: Distributional Semantics and Linguistic Theory

Gemma Boleda and Denis Paperno

  • Workshop
  • Week: 1
  • Time: 17:00 – 18:30
  • Room: D1.02
  • 15-19 August 2016, Bolzano, Italy

(Please also see the related Composes workshop, co-located with ESSLLI, to be held the day before DSALT starts)

The DSALT workshop seeks to foster discussion at the intersection of distributional semantics and various subfields of theoretical linguistics, with the goal of boosting the impact of distributional semantics on linguistic research beyond lexical semantic phenomena, as well as broadening the empirical basis and theoretical tools used in linguistics. Our contributions explore the theoretical interpretation of distributional vector spaces and their application to theoretical morphology, syntax, semantics, and pragmatics.
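For readers new to the area, the basic idea of a distributional vector space can be sketched in a few lines of Python: count which words occur near each target word, then compare words by the cosine of their count vectors. This sketch uses raw counts over a tiny invented corpus and a window of two words on each side; actual models typically reweight counts (e.g. with PMI), use far larger corpora, or learn dense embeddings.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Count, for each word, the words within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Tiny invented corpus.
corpus = ["the cat drinks milk", "the dog drinks water",
          "the cat chases the dog", "a car needs fuel"]
vectors = cooccurrence_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]))  # about 0.87: shared contexts
print(cosine(vectors["cat"], vectors["car"]))  # 0.0: no shared contexts
```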


Monday, August 15

17:00-17:45 Invited talk, Jason Weston (Facebook). Memory Networks for Language Understanding. (slides, tutorial) Abstract: There has been a recent resurgence in interest in the use of the combination of reasoning, attention and memory for solving tasks, particularly in the research area of machine learning applied to language understanding. I will focus on one of my own group’s contributions, memory networks, an architecture that we have applied to question answering, language modeling and general dialog. As we try to move towards the goal of true language understanding, I will also discuss recent datasets and tests that have been built to assess these models’ abilities to see how far we have come (hint: there’s still a long way to go!!).

17:50-18:10 Jerry R. Hobbs and Jonathan Gordon. Distribution and Inference (slides).

18:10-18:30 William Hamilton, Jure Leskovec, Dan Jurafsky. Distributional approaches to diachronic semantics (slides).

Tuesday, August 16

17:00-17:45 Invited talk, Katrin Erk (University of Texas at Austin). The probabilistic samowar: an attempt at explaining how people can learn from distributional data (slides). Abstract: There is evidence that people can learn the meaning of words from observing them in text. But how would that work? In particular, how would such learning connect a word with the entities in the world that it denotes? In this talk I discuss two proposals of how humans could learn from distributional data, both of which have a number of core assumptions in common. They both assume that the information that distributional data can contribute is property information: words that appear in similar contexts (for suitable definitions of “context”) denote entities with similar properties. Distributional data is noisy and probabilistic; for that reason, both approaches assume that an agent has a probabilistic information state (a probability distribution over worlds that could be the actual world), which can be influenced by textual context data.

17:50-18:30 Poster session 1.

Wednesday, August 17

17:00-17:45 Invited talk, Alessandro Lenci (University of Pisa). Distributional Models of Sentence Comprehension. (slides) Abstract: In this talk I will discuss the modelling of phenomena related to sentence comprehension in a distributional semantic framework. In particular, I will focus on how linguistic and neurocognitive evidence about human sentence processing can be integrated in distributional semantic models to tackle the challenges of compositional and incremental online construction of sentence representations.

17:50-18:10 Gabriella Lapesa, Max Kisselew, Sebastian Padó, Tilmann Pross, Antje Roßdeutscher. Characterizing the pragmatic component of distributional vectors in terms of polarity: Experiments on German über verbs (slides).

18:10-18:30 Enrico Santus, Alessandro Lenci, Qin Lu, Chu-Ren Huang. Squeezing Semantics out of Contexts: Automatic Identification of Semantic Relations in DSMs (slides).

Thursday, August 18

17:00-17:45 Invited talk, Aurélie Herbelot (University of Trento). Where do models come from? (slides) Abstract: When contrasted with formal semantics, distributional semantics is usually described as a natural (and very successful) way to simulate human analogical processes. There is however no essential reason to believe that formal semantics should be unable to do similarity. In this talk, I will propose that a) the inability of formal semantics to model relatedness is linked to a notion of ‘model sparsity’, and b) the strength of distributional semantics lies not so much in similarity but in having a cognitively sound basis, which potentially enables us to answer the question ‘Where do models come from?’ I will give an overview of some experimental results which support the idea that a rich model of the world can be acquired from distributional data via soft inference processes.

17:50-18:30 Poster session 2.

Friday, August 19

17:00-17:45 Invited talk, Marco Baroni (University of Trento). Living a discrete life in a continuous world (slides). Abstract: Natural language understanding requires reasoning about sets of discrete discourse entities, updating their status and adding new ones as the discourse unfolds. This fundamental characteristic of linguistic semantics makes it difficult to handle with fully trainable end-to-end architectures, that are not able to learn discrete operations. Inspired by recent proposals such as Stack-RNN (Joulin and Mikolov, 2015) and Memory Networks (Sukhbaatar et al. 2015), where a neural network learns to control a discrete memory through a continuous interface, we introduce a model that learns to create and update discourse referents, represented by distributed vectors, by being trained end-to-end on a reference resolution task. Preliminary results suggest that our approach is viable. (Work in collaboration with Gemma Boleda and Sebastian Padó.)

17:50-18:10 Kristina Gulordava. Measuring distributional semantic effects in syntactic variation (slides).

18:10-18:30 Discussion and wrap-up.


Gabor Borbely, Andras Kornai, Marcus Kracht, David Nemeskey. Denoising composition in distributional semantics.
Guy Emerson. Compositional semantics in a probabilistic framework.
Anna Gladkova and Aleksandr Drozd. King – man + woman = queen: the linguistics of “linguistic regularities”.
Dimitri Kartsaklis, Matthew Purver, Mehrnoosh Sadrzadeh. Verb Phrase Ellipsis using Frobenius Algebras in Categorical Compositional Distributional Semantics.
Reinhard Muskens and Mehrnoosh Sadrzadeh. Lambdas and Vectors.
Rossella Varvara, Gabriella Lapesa, Sebastian Padó. Quantifying regularity in morphological processes: An ongoing study on nominalization in German.
Ramon Ziai, Kordula De Kuthy, Detmar Meurers. Approximating Schwarzschild’s Givenness with Distributional Semantics.


Aki-Juhani Kyröläinen, M. Juhani Luotolahti, Kai Hakala, Filip Ginter. Modeling cloze probabilities and selectional preferences with neural networks.
Alexander Kuhnle. Investigating the effect of controlled context choice in distributional semantics.
Andrey Kutuzov. Redefining part-of-speech classes with distributional semantic models.
Edoardo Maria Ponti, Elisabetta Jezek, Bernardo Magnini. Grounding the Lexical Sets of Anti-Causative Pairs on a Vector Model.
Pascual Martínez-Gómez, Koji Mineshima, Yusuke Miyao, Daisuke Bekki. Integrating Distributional Similarity as an Abduction Mechanism in Recognizing Textual Entailment.
Michael Repplinger. A Systematic Evaluation of Current Motivation for Explicit Compositionality in Distributional Semantics.
Marijn Schraagen. Towards a dynamic application of distributional semantics.

Programme Committee

Nicholas Asher, Marco Baroni, Emily Bender, Raffaella Bernardi, Robin Cooper, Ann Copestake, Katrin Erk, Ed Grefenstette, Aurélie Herbelot, Germán Kruszewski, Angeliki Lazaridou, Alessandro Lenci, Marco Marelli, Louise McNally, Sebastian Padó, Barbara Partee, Chris Potts, Laura Rimell, Hinrich Schütze, Mark Steedman, Bonnie Webber, Galit Weidman Sassoon, Roberto Zamparelli.


Funding and Endorsements

With funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 655577 (LOVe) as well as the 7th Framework Program ERC grant 283554 (COMPOSES).

Endorsed by SIGLEX and SIGSEM of the ACL.

Workshop Description

The DSALT workshop seeks to foster discussion at the intersection of distributional semantics and various subfields of theoretical linguistics, with the goal of boosting the impact of distributional semantics on linguistic research beyond lexical semantic phenomena, as well as broadening the empirical basis and theoretical tools used in linguistics. We welcome contributions regarding the theoretical interpretation of distributional vector spaces and/or their application to theoretical morphology, syntax, semantics, discourse, dialogue, and any other subfield of linguistics. Potential topics of interest include, among others:

  • distributional semantics and morphology: How do results in the distributional semantics-morphology interface impact theoretical accounts of morphology? Can distributional models account for inflectional morphology? Can they shed light on phenomena like productivity and regularity?
  • distributional semantics and syntax: How can compositionality at the semantic level interact with syntactic structure? Can we go beyond the state of the art in accounting for the syntax-semantics interface when it interacts with lexical semantics? How can distributional accounts for gradable syntactic phenomena, e.g. selectional preferences or argument alternations, be integrated into theoretical linguistic accounts?
  • distributional semantics and formal semantics: How can distributional representations be related to the traditional components of a semantics for natural languages, especially reference and truth? Can distributional models be integrated with discourse- or dialogue-oriented semantic theories like file change semantics or inquisitive semantics?
  • distributional semantics and discourse: Distributional semantics has been shown to model some aspects of discourse coherence at a global level (Landauer and Dumais 1997, a.o.); can it also help with other discourse-related phenomena, such as the choice of discourse particles, nominal and verbal anaphora, or the form of referring expressions as discourse unfolds?
  • distributional semantics and dialogue: Distributional semantics has traditionally been mostly static, in the sense that it creates a semantic representation for a word once and for all. Can it be made dynamic so it can help model, for example, phenomena related to Questions Under Discussion (QUDs) in dialogue? Can distributional representations help predict the relations between utterance units in dialogue?
  • distributional semantics and pragmatics: Distributional semantics is based on the statistics of language use, and should therefore include information related to the pragmatics of language. How do distributional models relate to such aspects of pragmatics as focus, pragmatic presupposition, or conversational implicature?



Submissions to DSALT do not need to be anonymous. We solicit two-page (plus references) abstracts in at most 11pt font. There are no other requirements on format and citation style; you can use the ACL stylesheet if you want, but make sure to set the font size to 11pt. No proceedings will be published, so workshop submissions may discuss published as well as unpublished work, and they can report on finished or ongoing work. The abstract submission deadline is April 12, 2016 (extended). Submissions are accepted by email at dsalt2016 AT gmail.com.


Important Dates

Deadline for abstract submission: April 12, 2016
Author notification: May 15, 2016
Workshop dates: August 15-19, 2016

Computational Historical Linguistics

Gerhard Jäger

  • Area: LoCo
  • Level: A
  • Week: 1
  • Time: 09:00 – 10:30
  • Room: D1.02


Language change shares several features with biological evolution: languages and biological traits are realized in populations; they are transmitted between generations; population splits lead to diversification. The history of this diversification is studied by systematic comparison of extant (plus historical/fossilized) traits. Within the past three decades, comparative biology has turned mathematical and computational; there is a plethora of models and algorithms to infer phylogenetic information from comparative data, ranging from clustering methods to sophisticated Bayesian models. Recent applications of these methods to historical linguistics have produced remarkable but also controversial results. Computational/phylogenetic historical linguistics faces several challenges, such as the sparseness of comparative data. Also, the replication of linguistic knowledge is arguably less well understood than its biological counterpart. The course offers a recapitulation of the comparative method in historical linguistics, a primer on phylogenetic inference, and an overview of the state of the art in computational and phylogenetic historical linguistics.
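As an illustration of the simplest end of that spectrum, here is a minimal sketch of UPGMA (average-linkage) clustering, one classic distance-based method for building a tree. It assumes that pairwise distances between languages are already given (for instance, the proportion of non-cognate items in a basic vocabulary list); the function name, the input format and the taxon labels are hypothetical, and the sketch returns only the tree topology, not branch lengths.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster taxa bottom-up with UPGMA and return the tree as nested tuples.

    labels: list of taxon names.
    dist:   dict mapping a pair (a, b) of taxon names to their distance.
    """
    # Current clusters, keyed by their subtree (a label or a nested tuple),
    # with the number of taxa each cluster contains.
    size = {label: 1 for label in labels}
    # Distances keyed by unordered pairs of clusters.
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(size) > 1:
        # Merge the two currently closest clusters.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        # Distance from the new cluster to every other cluster is the
        # size-weighted average of the distances of its two parts.
        for c in size:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))
```

With four toy taxa where A and B are closest, C is next, and D is equally distant from all others, `upgma(["A", "B", "C", "D"], {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10, ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10})` returns `("D", ("C", ("A", "B")))`. UPGMA implicitly assumes a constant rate of change along all lineages, one of the simplifications that more elaborate models relax.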



Incremental Speech and Language Processing for Interactive Systems

Timo Baumann and Arne Köhn

  • Area: LoCo
  • Level: A
  • Week: 1
  • Time: 14:00 – 15:30
  • Room: C3.06


Incremental processing, that is, the processing of partial linguistic material as it unfolds, has become a highly relevant research area. Incremental processing allows for faster system reactions (with processing times folded between modules), for more natural behaviour based on partial understanding (e.g. in interactive situations), and for shaping interactions collaboratively between the system and its interlocutor (e.g. using less rigid turn-taking schemes); it is also psycholinguistically appealing. We introduce a model of incrementality, in particular establishing the need for a component to be able to re-analyze its hypothesis about the state of affairs. We describe strategies for solving incremental processing tasks in speech and language using speech recognition, syntactic parsing, and speech synthesis as examples. We discuss how to evaluate the various aspects of incremental quality (such as timing, stability, and correctness) for each of the examples. We close with a discussion of what happens when individual processors are combined into partial or full systems.
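To make the need for re-analysis concrete, here is a minimal sketch of how a module might pass revisions of its output hypothesis on to later modules as a sequence of edits. It assumes the hypothesis is a flat list of units (e.g. words from a speech recognizer) and that changes are expressed as "add" and "revoke" edits computed against the previous hypothesis; the class and method names are hypothetical, and this is not presented as the model taught in the course.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Edit = Tuple[str, str]  # ("add" or "revoke", unit)


@dataclass
class IncrementalOutput:
    """Hypothetical holder for a module's current (partial) output hypothesis."""
    units: List[str] = field(default_factory=list)

    def update(self, new_units: List[str]) -> List[Edit]:
        """Replace the current hypothesis and return the edits that turn
        the old hypothesis into the new one."""
        # Longest common prefix of old and new hypotheses: these units
        # are unchanged and need no edits.
        prefix = 0
        while (prefix < len(self.units) and prefix < len(new_units)
               and self.units[prefix] == new_units[prefix]):
            prefix += 1
        # Old units after the prefix are revoked, most recent first ...
        edits: List[Edit] = [("revoke", u) for u in reversed(self.units[prefix:])]
        # ... and new units after the prefix are added in order.
        edits += [("add", u) for u in new_units[prefix:]]
        self.units = list(new_units)
        return edits


if __name__ == "__main__":
    asr = IncrementalOutput()
    print(asr.update(["four"]))          # [('add', 'four')]
    print(asr.update(["forty"]))         # [('revoke', 'four'), ('add', 'forty')]
    print(asr.update(["forty", "two"]))  # [('add', 'two')]
```

A recognizer that first hypothesizes "four" and then, with more audio, "forty" has to revoke the earlier word before adding the new one; any downstream module that already acted on "four" must be able to undo that step, which is exactly why incremental components need re-analysis.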


Improving Language Technology with Fortuitous Data

Željko Agić, Anders Johannsen, and Barbara Plank

  • Area: LaCo
  • Level: A
  • Week: 1
  • Time: 17:00 – 18:30
  • Room: D1.01


Current successful approaches to natural language processing (NLP) are for the most part based on supervised learning. In turn, supervised learning critically depends on the availability of annotated data. Such data is generally not plentiful, as it requires time and expertise to develop annotated resources. This is the problem of data sparsity. At the same time, available annotated data is usually a sample of a particular domain or language. Thus, even if some annotated data is available, it is often not a clear fit for the problem at hand. This is the problem of data bias.

In this course, we present approaches to facilitate NLP development when supervision, in the form of annotated and often biased samples of language data, is sparse or even absent. Using part-of-speech tagging and syntactic dependency parsing as running examples, we outline modern approaches to augmenting supervised techniques for top-level performance. The approaches include semi-supervised and unsupervised techniques, domain adaptation and cross-lingual learning. We place particular emphasis on leveraging the various sources of fortuitous data that may be available even in the most severely under-resourced domains of natural language. We argue that fortuitous data often provides the ‘secret sauce’ that makes approaches based on limited supervision work.


  • Day 1: Introduction
  • Day 2: Structured input and output
  • Day 3: Representation sharing and multi-task learning
  • Day 4: Fortuitous recipes + hands-on
  • Day 5: Cross-lingual learning


The course material can be found on the fortuitous data homepage.

Additional References

See slides on course material website

Unification-Based Grammar Engineering

Dan Flickinger and Stephan Oepen

  • Area: LaCo
  • Level: I
  • Week: 1
  • Time: 17:00 – 18:30
  • Room: A5.18


Parsing and generation of natural language can benefit from the availability of manually constructed grammars that capitalize on accuracy and cross-domain flexibility; this holds true in linguistic theory building and practical, applied tasks alike. In this course we provide an introduction to the implementation of linguistically motivated grammars that can be developed to meet the broad-coverage demands of research and applications.

Motivation and Description

The implementation of linguistically-based grammars for natural languages draws on a combination of engineering skills, sound grammatical theory, and software development tools. This course provides a hands-on introduction to the formalism, techniques, and tools needed for building the precise, extensible grammars required both in research and in applications. Through a combination of lectures and in-class exercises, students investigate the implementation of constraints in morphology, syntax, and semantics, working within a unification-based framework. Topics to be addressed in the course include: the use of types and features, lexical rules, constructions and monotonic vs. default inheritance. The daily implementation exercises are conducted in the freely available LKB grammar development platform, and include experience with adding and repairing lexical types, lexical entries, lexical rules, phrase structure schemata, and others.
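The central operation in such a framework is the unification of feature structures. As a rough illustration only, here is a minimal sketch of unification over untyped feature structures encoded as nested Python dictionaries. It assumes atomic values are strings and deliberately omits the type hierarchy and reentrancy (structure sharing) of the typed feature structures used in the LKB; the function name and the encoding are hypothetical.

```python
from typing import Optional, Union

FS = Union[str, dict]  # an atomic value, or a mapping from features to FS


def unify(fs1: FS, fs2: FS) -> Optional[FS]:
    """Return the unification of two feature structures, or None on a clash."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)  # shallow copy, so the inputs are never modified
        for feature, value in fs2.items():
            if feature in result:
                # Feature present in both: its values must unify recursively.
                sub = unify(result[feature], value)
                if sub is None:
                    return None
                result[feature] = sub
            else:
                # Feature present only in fs2: carried over unchanged.
                result[feature] = value
        return result
    # Atoms unify only with identical atoms; an atom never unifies with a dict.
    return fs1 if fs1 == fs2 else None


if __name__ == "__main__":
    dogs = {"AGR": {"NUM": "pl", "PER": "3"}}  # a plural noun phrase
    bark_subj = {"AGR": {"NUM": "pl"}}         # the verb's constraint on its subject
    dog = {"AGR": {"NUM": "sg", "PER": "3"}}
    print(unify(dogs, bark_subj))  # {'AGR': {'NUM': 'pl', 'PER': '3'}}
    print(unify(dog, bark_subj))   # None, because the NUM values clash
```

Returning `None` on failure reflects the all-or-nothing character of unification: it either yields a structure at least as specific as both inputs, or it fails outright, which is how agreement constraints such as the one above rule out ungrammatical combinations.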

The course offers a conservatively updated version of materials and instruction that have been used successfully at previous summer schools (including twice at ESSLLI, once at the LSA Summer Institute, and at several others), as well as in more extended, regular courses at Stanford University, the University of Oslo, and elsewhere. Unification- or constraint-based approaches to the representation and processing of grammatical knowledge were part of mainstream curricula in formal and computational linguistics until perhaps a decade ago. In recent years, however, much work in natural language processing (i.e. applied computational linguistics) has focused more on ‘shallower’, robust and efficient morpho-syntactic analysis, with machine learning over linguistically annotated training data as the core technique for acquiring grammatical information. As emphasis is now gradually shifting from surface syntax to ‘deeper’ and in particular semantic analysis (or at least meaning representation), the study of the syntax–semantics interface and questions of compositionality are again gaining popularity. However, current student generations are often under-educated about the relevant traditions in these areas, and we therefore view an introductory course on unification-based grammar as a timely measure to help rekindle broader knowledge and curiosity about different possibilities for the division of labor between grammar induction and linguistic engineering.

Course Outline

Each session consists of a short lecture followed by a laboratory period where the students gain hands-on experience with the tools and methods of modern grammar implementation.

  • Monday: Unification-based Linguistic Description: Introduction to the LKB (Exercise 1, Starting Package)
  • Tuesday: Phrase Structure Recursion and Modification (Exercise 2, Starting Package)
  • Wednesday: Use of the Type Hierarchy for Concise Linguistic Description (Exercise 3, Starting Package)
  • Thursday: Lexical Rules for Inflectional and Derivational Morphology (Exercise 4, Starting Package)
  • Friday: Long-Distance Dependencies: Topicalization and Relative Clauses (Exercise 5, Starting Package)

Expected Level and Prerequisites

A basic knowledge of syntactic theory (in any framework) will be assumed, but no prior programming skills are required.

Development Environment

The course uses the Linguistic Knowledge Builder (LKB) software for grammar engineering. The LKB is pre-installed on the ESSLLI Linux virtual machines that are accessible from the computer laboratory on the fifth floor. Please see the first assignment sheet for further instructions on how to get going with the LKB.

It is also possible to access these virtual machines from outside the computer laboratory by installing a lightweight software client (which is available for different operating systems) on your own machine; please follow the instructions provided by the organizers, but note that access to the client software as well as to the virtual machine server appears to be blocked from the eduroam wireless network.

Some years ago, we taught a similar course at a Scandinavian summer school, where one student had to leave the final lecture early, looking clearly apologetic about not finishing the last assignment. Two years later, she presented the instructors with the results of that assignment, beautifully carried through. To make it possible to continue your work on grammar engineering after ESSLLI, the complete development environment (and more) is available under an open-source license. We strongly recommend using it in a Linux environment, where installation should take just a few commands. Please see the instructions for what is called the LOGON distribution of the LKB, additional grammar development and parsing software, and several larger grammars.

Incentives: Certified Grammar Engineer Prize

We have two copies of the relevant parts of the background books (see below) available, which we will award as a prize of honor to the students who appear most likely to pursue a career in grammar engineering. To participate in this competition, please submit the results of your work on either Exercise 3 or Exercise 4 (whichever you are most proud of) to Dan and Stephan for review before breakfast time (09:00) on Friday morning. If you are unsure how best to pack up and email the contents of your grammar2 directory (or grammar4 directory, or even grammar1, if you managed to build on your work from the first session), the course environment on the ESSLLI machines provides a command-line tool for that. In the terminal, make sure to re-run the lkb command and then try something like submit grammar2. Reassuring messages about the files being submitted should appear.

Background Reading

Sag, Ivan A., Wasow, Thomas, & Bender, Emily M. (2003). Syntactic theory: A formal introduction (Second ed.). Stanford, California: Center for the Study of Language and Information.

Copestake, Ann (2002). Implementing typed feature structure grammars. Stanford, California: Center for the Study of Language and Information.