GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2 (2009)
Release type: General
GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1 (2009)
Release type: General
Corpus Support for Machine Translation at LDC (2008)
This paper describes LDC's efforts in collecting, creating and processing different types of linguistic data, including lexicons, parallel text, multiple translation corpora, and human...
The Annotation Graph Toolkit: Software Components For... (2008)
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid...
Improving named entity recognition with co-training and unlabeled bilingual data (2008)
Supervised learning systems require a large quantity of labeled data, which is time-consuming, expensive and in some cases requires linguistic expertise to create. Semi-supervised methods combine the...
Annotation tools based on the annotation graph API (2007)
Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete open-source software infrastructure...
Stephen Intille, Charles Kukla, Xiaoyi Ma
Determining requirements for any design project involves identifying and ranking user needs and preferences. User needs are typically elicited via personal or focus...
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid...
Champollion: A Robust Parallel Text Sentence Aligner (2006)
This paper describes Champollion, a lexicon-based sentence aligner designed for robust alignment of potentially noisy parallel text. Champollion increases the robustness of the alignment by assigning...
Models and Tools for Collaborative Annotation (2002)
Xiaoyi Ma, Haejoong Lee, Steven Bird, Kazuaki Maeda
The Annotation Graph Toolkit (AGTK) is a collection of software which facilitates development of linguistic annotation tools. AGTK provides a database interface which allows applications to use a...
Creating Annotation Tools with the Annotation Graph Toolkit (2002)
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
The Annotation Graph Toolkit is a collection of software supporting the development of annotation tools based on the annotation graph model. The toolkit includes application programming interfaces...
Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee, Beth Randall, Salim Zayat
Four diverse tools built on the Annotation Graph Toolkit are described. Each tool associates linguistic codes and structures with time-series data. All are based on the same software library and tool...
Creating annotation tools with the annotation graph toolkit (2002)
Steven Bird, Kazuaki Maeda, Xiaoyi Ma, Haejoong Lee
Annotation graphs (AGs) provide an efficient and expressive data model for linguistic annotations of time-series data [Bird and Liberman, 2001]. Recently, the LDC has been developing a complete...
The annotation graph toolkit: software components for building linguistic annotation tools (2001)
Kazuaki Maeda, Steven Bird, Xiaoyi Ma, Haejoong Lee
Annotation graphs provide an efficient and expressive data model for linguistic annotations of time-series data. This paper reports progress on a complete software infrastructure supporting the rapid...
Parallel Text Collections at Linguistic Data Consortium (1999)
The Linguistic Data Consortium (LDC) is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons,...