Crowdsourcing Linguistic Datasets

Chris Biemann

  • Area: LaCo
  • Level: I
  • Week: 2
  • Time: 11:00 – 12:30
  • Room: C3.06


The course gives a thorough introduction to crowdsourcing as an instrument to quickly acquire linguistic datasets for training and evaluation purposes.  Further, the course will provide step-by-step instructions on how to realize simple and complex crowdsourcing projects on Amazon Mechanical Turk and on CrowdFlower.
While crowdsourcing seems like a straightforward solution for linguistic annotation, the success of a crowdsourcing project depends critically on multiple dimensions.
In this course, emphasis is placed on understanding these dimensions by discussing practical experiences, in order to enable participants to use crowdsourcing successfully for language-related research. This includes learning about demographics, platform mechanisms, schemes for ensuring data quality, best practices regarding the treatment of workers and, most of all, lessons learned from previous crowdsourcing projects as described in the literature and as conducted by the instructor.
The educational goal is to enable participants to successfully set up crowdsourcing projects and to avoid typical pitfalls.

The course is organized into five sessions of 90 minutes each.

  1. What is Crowdsourcing? History and demographics, definitions, elementary concepts, example projects.
  2. Crowdsourcing platforms, esp. Amazon Mechanical Turk and CrowdFlower. Technical possibilities, payment schemes, do's and don'ts, schemes for ensuring quality.
  3. Successful design patterns for crowdsourcing projects for language tasks.
  4. Crowdsourcing projects for language tasks, lessons learned, including non-English tasks.
  5. Quality control mechanisms, ethical considerations, how to treat your crowdworkers, requester code of conduct, Turker forums.
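A common instance of the quality-ensuring schemes covered in sessions 2 and 5 is to embed gold questions (items with known answers) into the task, screen out workers who fail them, and aggregate the surviving judgments by majority vote. The following is a minimal sketch of that pattern; all worker IDs, items, and labels are made-up illustration data, not from any real project.

```python
# Sketch: gold-question screening followed by majority voting.
# All data below is invented for illustration.
from collections import Counter

# worker -> {item: label}; items prefixed "gold" have known answers
annotations = {
    "w1": {"gold1": "N", "gold2": "V", "item1": "N", "item2": "V"},
    "w2": {"gold1": "N", "gold2": "V", "item1": "N", "item2": "V"},
    "w3": {"gold1": "V", "gold2": "N", "item1": "V", "item2": "N"},  # unreliable
}
gold = {"gold1": "N", "gold2": "V"}

def accuracy_on_gold(labels, gold):
    """Fraction of embedded gold questions a worker answered correctly."""
    return sum(labels.get(i) == ans for i, ans in gold.items()) / len(gold)

def aggregate(annotations, gold, threshold=0.75):
    # 1) keep only workers who pass the embedded gold questions
    trusted = {w: labels for w, labels in annotations.items()
               if accuracy_on_gold(labels, gold) >= threshold}
    # 2) majority vote over the remaining (non-gold) items
    votes = {}
    for labels in trusted.values():
        for item, label in labels.items():
            if item not in gold:
                votes.setdefault(item, Counter())[label] += 1
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}

print(aggregate(annotations, gold))  # -> {'item1': 'N', 'item2': 'V'}
```

In practice, platforms support this natively (e.g., qualification tests on Mechanical Turk, test questions on CrowdFlower), so the screening step often happens before a worker ever sees the real items.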

Short Bio

Chris is an assistant professor and head of the Language Technology group at TU Darmstadt in Germany. He received his Ph.D. from the University of Leipzig, and subsequently spent three years in industrial search engine research at Powerset and Microsoft Bing in San Francisco, California. He regularly publishes in journals and top conferences in the field of Computational Linguistics.
His research targets self-learning structure from natural language text, specifically semantic representations. Using big-data techniques, his group has built an open-source, scalable, language-independent framework for symbolic distributional semantics. To connect induced structures to tasks, Chris frequently uses crowdsourcing techniques for the acquisition of natural language semantics data.


LECTURE 1: What is Crowdsourcing? 1 Crowdsourcing_Aug2016_ESSLLII.pptx
History and demographics, definitions, elementary concepts, example projects

LECTURE 2: Crowdsourcing platforms 2 Crowdsourcing_Aug2016_ESSLLII.pptx
esp. Amazon Mechanical Turk and CrowdFlower. Technical possibilities, payment schemes, do's and don'ts, schemes for ensuring quality

LECTURE 3: Successful design patterns 3 Crowdsourcing_Aug2016_ESSLLII.pptx
illustrated with some exemplary projects

LECTURE 4: Crowdsourcing projects for language tasks 4 Crowdsourcing_Aug2016_ESSLLII.pptx
a variety of projects, and lessons learned

LECTURE 5: Quality Control and Ethical considerations 5 Crowdsourcing_Aug2016_ESSLLII.pptx
quality control mechanisms, modelling the quality of individual workers automatically, how to treat your crowdworkers, requester code of conduct, crowdworker forums
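"Modelling the quality of individual workers automatically" is usually done with latent-variable models in the spirit of Dawid-Skene or MACE. The sketch below is a deliberately simplified, hedged version of that idea (a single accuracy weight per worker, hard label estimates, no priors), on invented data, just to illustrate the alternation between estimating item labels and re-estimating worker reliability.

```python
# Simplified iterative worker-reliability estimation (Dawid-Skene-style
# alternation, heavily reduced). Data is invented for illustration.
from collections import defaultdict

# (worker, item, label) triples; w3 disagrees with the majority
data = [
    ("w1", "i1", "A"), ("w1", "i2", "B"), ("w1", "i3", "A"),
    ("w2", "i1", "A"), ("w2", "i2", "B"), ("w2", "i3", "A"),
    ("w3", "i1", "B"), ("w3", "i2", "B"), ("w3", "i3", "B"),
]

def estimate(data, iterations=5):
    workers = {w for w, _, _ in data}
    weight = {w: 1.0 for w in workers}          # start by trusting everyone
    labels = {}
    for _ in range(iterations):
        # Step 1: weighted vote per item using current worker weights
        scores = defaultdict(lambda: defaultdict(float))
        for w, item, label in data:
            scores[item][label] += weight[w]
        labels = {item: max(s, key=s.get) for item, s in scores.items()}
        # Step 2: a worker's new weight is their agreement with the
        # current consensus labels
        agree = defaultdict(list)
        for w, item, label in data:
            agree[w].append(labels[item] == label)
        weight = {w: sum(hits) / len(hits) for w, hits in agree.items()}
    return labels, weight

labels, weight = estimate(data)
```

Unlike the hard gold-question filter, this scheme needs no known answers: unreliable workers are down-weighted because they disagree with the consensus, which is the core intuition behind the probabilistic models discussed in this lecture.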

Additional References (Selection)

General Studies and Surveys

Robert Munro, Steven Bethard, Victor Kuperman, Vicky Tzuyin Lai, Robin Melnick, Christopher Potts, Tyler Schnoebelen, and Harry Tily. 2010. Crowdsourcing and language studies: the new generation of linguistic data. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk (CSLDAMT ’10). 122-130.

Rion Snow, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. Cheap and fast—but is it good?: evaluating non-expert annotations for natural language tasks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP ’08). 254-263.

Juska-Bacher, Britta and Biemann, Chris and Quasthoff, Uwe. 2013. Webbasierte linguistische Forschung: Möglichkeiten und Begrenzungen beim Umgang mit Massendaten [Web-based linguistic research: possibilities and limitations in working with mass data]. Linguistik online 61, 4/2013

Lexical Resource

Lafourcade, Mathieu and Zarrouk, Manel and Joubert, Alain. 2014. About Inferences in a Crowdsourced Lexical-Semantic Network. Proceedings of the EACL, Gothenburg, Sweden, 174–182

Braslavski, Pavel and Ustalov, Dmitry and Mukhin, Mikhail. 2014. A Spinning Wheel for YARN: User Interface for a Crowdsourced Thesaurus. Proceedings of the Demonstrations at EACL, Gothenburg, Sweden, 101–104

Fossati, Marco and Giuliano, Claudio and Tonelli, Sara. 2013. Outsourcing FrameNet to the Crowd. Proceedings of ACL (Volume 2: Short Papers), Sofia, Bulgaria, 742–747

Hartshorne, Joshua K. and Bonial, Claire and Palmer, Martha. 2014. The VerbCorner Project: Findings from Phase 1 of crowd-sourcing a semantic decomposition of verbs. Proceedings of ACL (Volume 2: Short Papers), Baltimore, Maryland, 397–402

Biemann, Chris and Nygaard, Valerie. 2010. Crowdsourcing WordNet. Proceedings of GWC-2010

Word Sense

Jurgens, David. 2014. Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels. Proceedings of NAACL-HLT, Atlanta, Georgia, 556–562

Lopez de Lacalle, Oier and Agirre, Eneko. 2015. Crowdsourced Word Sense Annotations and Difficult Words and Examples. Proceedings of the 11th International Conference on Computational Semantics, London, UK, 94–100

Biemann, Chris. 2012. Creating a system for lexical substitutions from scratch using crowdsourcing. Lang. Resources & Evaluation, vol. 47, no. 1, p. 97–112

Event entailment

Takabatake, Yu and Morita, Hajime and Kawahara, Daisuke and Kurohashi, Sadao and Higashinaka, Ryuichiro and Matsuo, Yoshihiro. 2015. Classification and Acquisition of Contradictory Event Pairs using Crowdsourcing. Proceedings of the 3rd Workshop on EVENTS, Denver, Colorado, 99–107

Paraphrase

Steven Burrows, Martin Potthast, and Benno Stein. 2013. Paraphrase Acquisition via Crowdsourcing and Machine Learning. Transactions on Intelligent Systems and Technology (ACM TIST)

Tschirsich, Martin and Hintz, Gerold. 2013. Leveraging Crowdsourcing for Paraphrase Recognition. Proceedings of LAW and Interoperability with Discourse, Sofia, Bulgaria, 205–213

Matteo Negri, Yashar Mehdad, Alessandro Marchetti, Danilo Giampiccolo, and Luisa Bentivogli. 2012. Chinese whispers: Cooperative paraphrase acquisition. In Proceedings of LREC’12, Istanbul, Turkey

Semantic Roles

Feizabadi, Parvin Sadat and Padó, Sebastian. 2014. Crowdsourcing Annotation of Non-Local Semantic Roles. Proceedings of EACL, volume 2: Short Papers, Gothenburg, Sweden, 226–230

Translation

Yan, Rui and Gao, Mingkun and Pavlick, Ellie and Callison-Burch, Chris. 2014. Are Two Heads Better than One? Crowdsourced Translation via a Two-Step Collaboration of Non-Professional Translators and Editors. Proceedings of ACL (Long Papers), Baltimore, MD, 1134–1144

Kunchukuttan, Anoop and Chatterjee, Rajen and Roy, Shourya and Mishra, Abhijit and Bhattacharyya, Pushpak. 2013. TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain. Proceedings of ACL System Demonstrations, Sofia, Bulgaria, 175–180

Sequence Tagging

Hovy, Dirk and Plank, Barbara and Søgaard, Anders. 2014. Experiments with crowdsourced re-annotation of a POS tagging data set. Proceedings of ACL (Volume 2: Short Papers), Baltimore, Maryland, 377–382

Sentiment (inter alia)

Staiano, Jacopo and Guerini, Marco. 2014. Depeche Mood: a Lexicon for Emotion Analysis from Crowd Annotated News. Proceedings of ACL (Volume 2: Short Papers), Baltimore, MD, 427–433

Text Reuse and Simplification

Potthast, Martin and Hagen, Matthias and Völske, Michael and Stein, Benno. 2013. Crowdsourcing Interaction Logs to Understand Text Reuse from the Web. Proceedings of ACL (Volume 1: Long Papers), Sofia, Bulgaria, 1212–1221

Amancio, Marcelo and Specia, Lucia. 2014. An Analysis of Crowdsourced Text Simplifications. Proceedings of the 3rd Workshop on PITR, Gothenburg, Sweden, 123–130

Quality and how to make use of divergence

Felt, Paul and Black, Kevin and Ringger, Eric and Seppi, Kevin and Haertel, Robbie. 2015. Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models. Proceedings of NAACL-HLT, Denver, Colorado, 882–891

Ramanath, Rohan and Choudhury, Monojit and Bali, Kalika and Saha Roy, Rishiraj. 2013. Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation. Proceedings of ACL (Volume 1: Long Papers), Sofia, Bulgaria, 1713–1722

Integration in Annotation Tools

Bontcheva, Kalina and Roberts, Ian and Derczynski, Leon and Rout, Dominic. 2014. The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy. Proceedings of the Demonstrations at EACL, Gothenburg, Sweden, 97–100

Yimam, Seid Muhie and Gurevych, Iryna and Eckart de Castilho, Richard and Biemann, Chris. 2013. WebAnno: A Flexible, Web-based and Visually Supported System for Distributed Annotations. In Proceedings of ACL-2013, demo session, Sofia, Bulgaria