A tool for text comparison (2008)
Text reuse is commonplace in academia and the media. An efficient algorithm for automatically detecting and measuring similar/related texts would have applications in corpus linguistics, historical...
Constructing Corpora of South Asian Languages (2008)
Paul Baker, Andrew Hardie, Tony McEnery
million word corpus of South Asian languages. In addition, the project has had to address a number of issues related to establishing a language engineering (LE) environment for South Asian language...
Project ET-10/63: Work Package 3 Report (2008)
Roger Garside, Paul Rayson, Tony McEnery
In constructing automatic Natural Language Processing (NLP) systems, we need to supply resources of various kinds to provide information about the linguistic facts of the language or sub-language...
Construction and annotation of a corpus of contemporary Nepali (2008)
Allwood, Jens, Yadava, Yogendra P, Hardie, Andrew, Lohani, R R, Regmi, Bhim, Gurung, S, ...
Project ET-10/63: Work Package 3 Report (2007)
Roger Garside, Paul Rayson, Tony McEnery
In constructing automatic Natural Language Processing (NLP) systems, we need to supply resources of various kinds to provide information about the linguistic facts of the language or...
Baker, Paul, McEnery, Tony, Gabrielatos, Costas
Refugees, asylum seekers, and immigrants (henceforth RASIM) coming into the UK have attracted increased press attention (Greenslade, 2005). As their representation in the press can construct their...
A large semantic lexicon for corpus annotation (2006)
Dawn Archer, Olga Mudraya, Paul Rayson, Roger Garside, Tony McEnery, ...
Semantic lexical resources play an important part in both corpus linguistics and NLP. Over the past 14 years, a large semantic lexical resource has been built at Lancaster University. Different from...
Collocation, Semantic Prosody, and Near Synonymy: A Cross-Linguistic Perspective (2006)
This paper explores the collocational behaviour and semantic prosody of near synonyms from a cross-linguistic perspective. The importance of these concepts to language learning is well recognized....
Comparing and combining a semantic tagger and a statistical tool for MWE extraction (2005)
Piao, Scott Songlin; Rayson, Paul; Archer, Dawn; McEnery, Tony
Automatic extraction of multiword expressions (MWEs) presents a tough challenge for the NLP community and corpus linguistics. Indeed, although numerous knowledge-based symbolic approaches and...
The Lancaster Speech, Writing and Thought Presentation Spoken Corpus (2005)
Short, Mick, Semino, Elena, McEnery, Tony, Heywood, John, McIntyre, Dan
The four major objectives of the project were: i) to establish an electronic corpus of (a) conversations, from the British National Corpus (BNC) and (b) oral narratives, from Lancaster's Centre for...
Epistemic modality in MA dissertations (2005)
Gabrielatos, Costas, McEnery, Tony
This paper reports on the compilation, and ongoing mark up and annotation, of a corpus of MA dissertations written by students at the Department of Linguistics and English Language, Lancaster...
Developing Asian language corpora: standards and practice (2004)
Zhonghua Xiao, Tony McEnery, Paul Baker, Andrew Hardie
This paper first discusses standards for developing Asian language corpora so as to facilitate international data exchange. Following this, we present two corpora of Asian languages developed at...
Evaluating lexical resources for a semantic tagger (2004)
Paul Rayson, Dawn Archer, Tony McEnery
Semantic lexical resources play an important part in both linguistic study and natural language engineering. In Lancaster, a large semantic lexical resource has been built over the past 14 years,...
Extracting Multiword Expressions with a Semantic Tagger (2003)
Andrew Wilson, Paul Rayson, Dawn Archer, Tony McEnery
Automatic extraction of multiword expressions (MWE) presents a tough challenge for the NLP community and corpus linguistics. Although various statistically driven or knowledge-based approaches have...
Porting an English semantic tagger to the Finnish language (2003)
Laura Löfberg, Dawn Archer, Scott Piao, Paul Rayson, Tony McEnery, Krista Varantola, ...
Semantic annotation is an important and challenging issue in corpus linguistics and language engineering. While such a tool is available for English in Lancaster (Wilson and Rayson 1993), few such...
A Unicode-based Environment for Creation and Use of Language Resources (2002)
Valentin Tablan, Cristian Ursu, Kalina Bontcheva, Hamish Cunningham, Diana Maynard, Oana Hamza, ...
GATE is a Unicode-aware architecture, development environment and framework for building systems that process human language. It is often thought that the character sets problem has been solved by...
Paul Baker, Andrew Hardie, Tony McEnery, Hamish Cunningham, Rob Gaizauskas
The paper describes developments to date on the EMILLE Project (Enabling Minority Language Engineering) being carried out at the Universities of Lancaster and Sheffield. EMILLE was established to...
Corpus Resources and Minority Language Engineering (2000)
Tony McEnery, Paul Baker, Lou Burnard
Low density languages are typically viewed as those for which few language resources are available. Work relating to low density languages is becoming a focus of increasing attention within language...
Issues in Transcribing a Corpus of Children's Handwritten Projects (1998)
Smith, Nicholas, McEnery, Tony
In this paper we describe a corpus, the Lancaster-Leverhulme Corpus of Children's Writing, which is nearing completion at Lancaster University. The corpus has proved a particularly challenging one to...
Corpora and Translation: Uses and Future Prospects (1993)
Although corpora have been an object of study for some decades, the nineteen eighties saw an increased interest in their use and construction. With this increased interest and awareness has come an...