Steven Bird, Robert Dale, Bonnie J. Dorr, Bryan Gibson, Mark T. Joseph, Min-yen Kan, ...
The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of...
OLAC: Accessing the world's language resources (2009)
Language resources are the bread and butter of language documentation and linguistic investigation. They include the primary objects of study such as texts and recordings, the outputs of research...
Curating lexical databases for minority languages (2009)
One of the biggest challenges in compiling a dictionary of a minority language is managing the large quantity of lexical data. Decisions about the format and content of the dictionary or the...
Querying linguistic trees (2009)
Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in...
Building a Search Engine to Drive Problem-Based Learning (2008)
Search engines pervade the digital world, mediating most access to information instantaneously. We have found that students can build search engine components, and even entire search engines, in the...
Representing and Rendering Linguistic Paradigms (2008)
Linguistic forms are inherently multi-dimensional. They exhibit a variety of phonological, orthographic, morphosyntactic, semantic and pragmatic properties. Accordingly, linguistic analysis involves...
Timothy Baldwin, Steven Bird, Baden Hughes (Chief Investigators)
Language occupies a central role on the web: most content is expressed in a given language, and most access takes place via natural language input and interfaces. Today, investigation of human...
LPath+: A First-Order Complete Language for Linguistic Tree Query (2008)
Annotated linguistic databases are widely used in linguistic research and in language technology development. These annotations are typically hierarchical, and represent the nested structure of...
3.1.1 Speech Acts (2008)
Olivia Catherine March, Steven Bird, Adrian Pearce
Intelligent agents should be able to communicate with each other using an extensible, expressive language. Agents should have the ability to work together in a heterogeneous environment to solve complex...
Jerry Goldman, Steve Renals, Steven Bird, Franciska Jong, Marcello Federico, Carl Fleischhauer, ...
Spoken word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental...
LPath+: A First-Order Complete Language for Linguistic Tree Query (2008)
Large databases of linguistic annotations are used for testing linguistic hypotheses, and for training language processing models. Linguistic annotations are often syntactic or prosodic and typically...
The Annotation Graph Toolkit: Software Components for Building Linguistic Annotation Tools (2008)
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid...
Transcribing with Annotation Graphs (2007)
Edouard Geoffrois, Claude Barras, Steven Bird, Zhibiao Wu
Transcriber is a tool for manual annotation of large speech files. It was originally designed for the broadcast news transcription task. The annotation file format was derived from previous formats...
Orthography and Identity in Cameroon (2007)
The tone languages of sub-Saharan Africa raise challenging questions for the design of new writing systems. Marking too much or too little tone can have grave consequences for the usability of an...
Review of: Computational Phonology: A Constraint-Based Approach by Steven Bird (2007)
Steven Bird, Deirdre Wheeler, Bob Carpenter
This book is a revised and expanded version of the author's Ph.D. thesis (Edinburgh University, 1990), entitled Constraint-Based Phonology. The field of computational phonology is...
Annotation Graphs: A Foundation for Integrating Tools, Formats and Corpora (2007)
In recent work we have presented a formal framework for linguistic annotations using labeled acyclic digraphs. These 'annotation graphs' offer a simple yet powerful method for representing...
The phonetic description of Ibibio tones (2007)
Ibibio belongs to the Lower-Cross language group of the Benue-Congo branch of the Niger-Congo language family (Williamson 1989) and it is spoken in Akwa Ibom State in the South-Eastern...
Building an Open Language Archives Community (2007)
Steven Bird, Deirdre Wheeler, Bob Carpenter
natural language processing, edited by
Phonology is the systematic study of the sounds used in language, their internal structure, and their composition into syllables, words and phrases. Computational phonology is the application of...
Annotation tools based on the annotation graph API (2007)
Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete open-source software infrastructure...
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid...
Managing Fieldwork Data with Toolbox and the Natural Language Toolkit (2007)
Stuart Robinson, Greg Aumann, Steven Bird
This paper shows how fieldwork data can be managed using the program Toolbox together with the Natural Language Toolkit (NLTK) for the Python programming language. It provides background information...
Structured Classification for Multilingual Natural Language Processing (2007)
Philip Blunsom, Timothy Baldwin, Steven Bird, James Curran
This thesis investigates the application of structured sequence classification models to multilingual natural language processing (NLP). Many tasks tackled by NLP can be framed as classification,...
Dynamic path prediction and recommendation in a museum environment (2007)
Karl Grieser, Timothy Baldwin, Steven Bird
This research is concerned with making recommendations to museum visitors based on their history within the physical environment, and textual information associated with each item in their history....
Reconsidering language identification for written language resources (2006)
Baden Hughes, Timothy Baldwin, Steven Bird, Jeremy Nicholson, Andrew Mackinlay
The task of identifying the language in which a given document (ranging from a sentence to thousands of pages) is written has been relatively well studied over several decades. Automated approaches...
Designing and evaluating an XPath dialect for linguistic queries (2006)
Linguistic research and natural language processing employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for...
Accessing the Spoken Word (2005)
Jerry Goldman, Steve Renals, Steven Bird, Franciska Jong, Mark Kornbluh, ...
Spoken word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access...
Extending XPath to support linguistic queries (2005)
Steven Bird, Yi Chen, Susan B. Davidson, Haejoong Lee, Yifeng Zheng
Linguistic research and language technology development employ large repositories of ordered trees. XML, a standard ordered tree model, and XPath, its associated language, are natural choices for...
NLTK-Lite: Efficient scripting for natural language processing (2005)
The Natural Language Toolkit is a suite of program modules, data sets, tutorials and exercises covering symbolic and statistical natural language processing. NLTK is popular in teaching and research,...
LPath+: A First-Order Complete Language for Linguistic Tree Query (2005)
PACLIC 19 / Taipei, Taiwan / December 1-3, 2005
Towards a general model for linguistic paradigms (2004)
David Penton, Catherine Bow, Steven Bird, Baden Hughes
Linguistic forms are inherently multi-dimensional. They exhibit a variety of phonological, orthographic, morphosyntactic, semantic and pragmatic properties. Accordingly, linguistic analysis involves...
Functional Requirements for an Interlinear Text Editor (2004)
Baden Hughes, Catherine Bow, Steven Bird
Interlinear text has long been considered a valuable format in the presentation of multilingual data, and a variety of software tools have facilitated the creation and processing of such texts by...
Securing Interpretability: The Case of Ega Language Documentation (2004)
Dafydd Gibbon, Catherine Bow, Steven Bird, Baden Hughes
The prime consideration in designing sustainable language resources is to ensure that they remain interpretable for coming generations of users. In this paper we adopt a new perspective on resource...
Querying and Updating Treebanks: A Critical Survey and Requirements Analysis (2004)
Language technology makes extensive use of hierarchically annotated text and speech data. These databases are stored in flat files and manipulated using corpus-specific query tools or special-purpose...
Representing and Rendering Linguistic Paradigms (2004)
David Penton, Steven Bird
Linguistic forms are inherently multi-dimensional. They exhibit a variety of phonological, orthographic, morphosyntactic, semantic and pragmatic properties. Accordingly, linguistic analysis involves...
Querying and updating treebanks: A critical survey and requirements analysis (2004)
Language technology makes extensive use of hierarchically annotated text and speech data. These databases are stored in flat files and manipulated using corpus-specific query tools or scripts. While...
Experiments with Data-Intensive NLP on a Computational Grid (2004)
Large databases of annotated text and speech are widely used for developing and testing language technologies. However, the sizes of these corpora and associated language models are outpacing the...
Ega Interlinear XML samples (2003)
Dafydd Gibbon, Steven Bird, Catherine Bow, Baden Hughes
Ega interlinear XML samples, including a Python script to convert from table format
Seven dimensions of portability for language documentation and description (2003)
http://www.ethnologue.com/show_work.asp?id=43792
Extending Dublin Core metadata to support the description and discovery of language resources (2003)
http://www.ethnologue.com/show_work.asp?id=44054
Building an Open Language Archives Community on the OAI foundation (2003)
Gary Simons, Steven Bird; edited by Tim Cole, Michael Seadle
The Open Language Archives Community (OLAC) is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources. The Dublin Core (DC)...
Extending Dublin Core metadata to support the description and discovery of language resources (2003)
As language data and associated technologies proliferate and as the language resources community expands, it is becoming increasingly difficult to locate and reuse...
Seven dimensions of portability for language documentation and description (2003)
The process of documenting and describing the world’s languages is undergoing radical transformation with the rapid uptake of new digital technologies for capture, storage, annotation and...
Encoding and Presenting Interlinear Text Using XML Technologies (2003)
Baden Hughes, Steven Bird, Catherine Bow
Interlinear text is a common presentational format for linguistic information, and its creation and management have been greatly facilitated by the development of specialised software. In earlier...
Seven dimensions of portability for language documentation and description (2003)
The process of documenting and describing the world’s languages is undergoing radical transformation with the rapid uptake of new digital technologies for capture,...
New ways of documenting and describing language via electronic media coupled with new ways of distributing the results via the World‐Wide Web offer a degree of access to language resources...
Creating annotation tools with the annotation graph toolkit (2002)
Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee
Annotation graphs (AGs) provide an efficient and expressive data model for linguistic annotations of time-series data [Bird and Liberman, 2001]. Recently, the LDC has been developing a complete...
NLTK: The Natural Language Toolkit (2002)
NLTK, the Natural Language Toolkit, is a suite of open source program modules, tutorials and problem sets, providing ready-to-use computational linguistics courseware. NLTK covers symbolic and...
NLTK: The Natural Language Toolkit (2002)
The Natural Language Toolkit is a suite of program modules, data sets, tutorials and exercises, covering symbolic and statistical natural language processing. NLTK is written in Python and...
NLTK: The Natural Language Toolkit (2002)
The Natural Language Toolkit is a suite of program modules, data sets and tutorials supporting research and teaching in computational linguistics and natural language processing. NLTK is written in...
The OLAC Metadata Set and Controlled Vocabularies (2001)
As language data and associated technologies proliferate and as the language resources community rapidly expands, it has become difficult to locate and reuse existing resources. Are there any lexical...
The Open Language Archives Community and Asian Language Resources (2001)
Steven Bird, Gary Simons, Chu-ren Huang
The Open Language Archives Community (OLAC) is a new project to build a worldwide system of federated language archives based on the Open Archives Initiative and the Dublin Core Metadata Initiative....
The annotation graph toolkit: software components for building linguistic annotation tools (2001)
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid...
The OLAC Metadata Set and Controlled Vocabularies (2001)
Steven Bird
As language data and associated technologies proliferate and as the language resources community rapidly expands, it has become difficult to locate and reuse existing resources.
Christopher Cieri, Steven Bird
Annotation graphs and annotation servers offer infrastructure to support the analysis of human language resources in the form of time-series data such as text, audio and video. This paper outlines...
A Formal Framework for Interlinear Text (2000)
Interlinear texts come in many forms and can be represented digitally in many ways, e.g. plain text with hard spacing, tables, special markup, and special-purpose data structures. There are various...
Querying Databases of Annotated Speech (2000)
Annotated speech corpora are databases consisting of signal data along with time-aligned symbolic 'transcriptions'. Such databases are typically multidimensional, heterogeneous and dynamic....
ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation (2000)
Steven Bird, David Day, John Garofolo, John Henderson, Christophe Laprun, Mark Liberman
We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract...
Many Uses, Many Annotations for Large Speech Corpora: Switchboard and TDT as Case Studies (2000)
David Graff, Steven Bird
This paper discusses the challenges that arise when large speech corpora receive an ever-broadening range of diverse and distinct annotations. Two case studies of this process are presented: the...
Towards a Query Language for Annotation Graphs (2000)
Steven Bird, Peter Buneman, Wang-chiew Tan
The multidimensional, heterogeneous, and temporal nature of speech databases raises interesting challenges for representation and query. Recently, annotation graphs have been proposed as a...
Annotation graphs as a framework for multidimensional linguistic data analysis (1999)
In recent work we have presented a formal framework for linguistic annotation based on labeled acyclic digraphs. These 'annotation graphs' offer a simple yet powerful method for...
When Marking Tone Reduces Fluency: An Orthography Experiment (1999)
Should an alphabetic orthography for a tone language include tone marks? Opinion and practice are divided along three lines: zero marking, phonemic marking and various reduced marking schemes. This...
Strategies for Representing Tone in African Writing Systems: A Critical Review (1999)
Steven Bird
Tone languages provide some interesting challenges for the designers of new orthographies.
A Formal Framework for Linguistic Annotation (1999)
Steven Bird, Mark Liberman
'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions: audio, video and/or physiological...
Multidimensional Exploration of Online Linguistic Field Data (1999)
Advances in storage technology make it possible to house virtually unlimited quantities of recorded speech data online. Advances in character-encoding technology make it possible to create...
Towards a Formal Framework for Linguistic Annotations (1999)
'Linguistic annotation' is a term covering any transcription, translation or annotation of textual data or recorded linguistic signals. While there are several ongoing efforts to provide formats...
Strategies for Representing Tone in African Writing Systems: A Critical Review (1998)
Tone languages provide some interesting challenges for the designers of new orthographies. One approach is to omit tone marks, just as stress is not marked in English (zero marking). Another approach...
When marking tone reduces fluency: an orthography experiment in Cameroon (1998)
Includes bibliographical references (p. 25-26)
A lexical database tool for quantitative phonological research (1997)
A lexical database tool tailored for phonological research is described. Database fields include transcriptions, glosses and hyperlinks to speech files. Database queries are expressed using HTML...
Dschang Syllable Structure (1997)
this article will be to explain some of the alternations and distributional asymmetries in terms of syllable structure. The only consonant clusters which occur have the form (N)C(G)(h) where N is a...
Orthography and Identity in Cameroon (1997)
this document marks the transition to a new period of orthographic history. The 1980s can be viewed as a period of fundamentalism. The orthography standard was absolute, and it had linguistic science...
A semantics for λ {} str : a calculus with overloading and late-binding (1996)
Steven Bird, Jerry L. Morgan, Iway Fong, Jennifer Cole, John Coleman, Alan M. Frisch, ...
"We... recommend this book to anyone interested in computational phonology. "--Computational Linguistics Computational phonology is one of the newest areas of computational...
Dschang Syllable Structure and Moraic Aspiration (1996)
Steven Bird
The syllable structure of Dschang is interesting for a variety of reasons. Most notable is the aspiration which can appear on most consonant types, including voiced stops. I shall argue that...
James M. Scobbie, John S. Coleman, Steven Bird
this paper we give a brief and broad characterisation of Declarative Phonology in terms of certain key aspects, both theoretical and methodological. In Section 2 we present our identification of...
The Bamileke Dschang Associative Construction: Instrumental Findings (1995)
This report is organised as follows. Section 2 is devoted to a description of the field methods which were employed. Section 3 presents our primary findings relating to tone in the associative...
One-level phonology: autosegmental representations and rules as finite automata (1994)
When phonological rules are regarded as declarative descriptions, it is possible to construct a model of phonology in which rules and representations are no longer distinguished and such procedural...
Phonological Analysis in Typed Feature Systems (1994)
this paper we suggest some strategies for reuniting phonology and the rest of grammar in the context of a uniform constraint formalism. We explain why this is a desirable goal, and we present some...
Automated Tone Transcription (1994)
In this paper I report on an investigation into the problem of assigning tones to pitch contours. The proposed model is intended to serve as a tool for phonologists working on instrumentally obtained...
Phonological analysis in typed feature systems (1994)
Research on constraint-based grammar frameworks has focussed on syntax and semantics largely to the exclusion of phonology. Likewise, current developments in phonology have generally ignored the...
This paper is structured as follows. In §2 we explain the goals and method of our experiment, paying particular attention to the technique of determining the fundamental frequency (F0) of a given...
One-Level Phonology: Autosegmental Representations and Rules as Finite Automata (1992)
this paper we present a finite-state model of phonology in which automata are the descriptions and tapes (or strings) are the objects being described. This provides the formal semantics for an...
Finite-State Phonology in HPSG (1992)
Attention on constraint-based grammar formalisms such as Head-driven Phrase Structure Grammar (HPSG) has focussed on syntax and semantics to the exclusion of phonology. This paper investigates the...
Focus and phrasing in Unification Categorial Grammar (1991)
For a long time it has been recognized that continuous speech comes in groups, and that this grouping is in terms of both form and meaning. At the most obvious level, the utterance is simultaneously...
Steven Bird, John Coleman, Janet Pierrehumbert, James Scobbie
This article consists of four sections, where each section has been contributed by a different author. The first three sections present reanalyses of phenomena that have previously been thought to...
this article concerns autosegmental representations, and not the rules which are presumed to manipulate them. Due to the expository goals of this paper we have not attempted to carry out a detailed...
When Marking Tone Reduces Fluency: An Orthography Experiment in Cameroon
Should an alphabetic orthography for a tone language include tone marks? Opinion and practice are divided along three lines: zero marking, phonemic marking and various reduced marking schemes. This...
this paper we report on some ongoing research that is directed at solving these two problems in the context of a phonological grammar development environment (GDE). The primary aim of this GDE is to...