How to make the most of NE dictionaries in statistical NER (2009)
Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou, ...
BMC Bioinformatics (BioMed Central), Research. ...growth of a wide range of repositories of biomedical data and literature. The automatic construction and update of scientific knowledge bases is a major research topic in Bio...
Karin Radrich, Yoshimasa Tsuruoka, Paul Dobson, Albert Gevorgyan, Neil Swainston, Jean-Marc Schwartz
The number of genome-scale metabolic models has been rising quickly in recent years, and the scope of their utilization encompasses a broad range of applications from metabolic engineering to...
An Annotation Type System for a Data-Driven NLP Pipeline (2009)
Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, John McNaught, Yoshimasa Tsuruoka, ...
We introduce an annotation type system for a data-driven NLP core system. The specifications cover formal document structure and document meta information, as well as the linguistic levels of...
Nobuo Araki, Kazuhiro Yoshida, Yoshimasa Tsuruoka
We address the problem of predicting moves in the board game of Go. We use the relative frequencies of local board patterns observed in game records to generate a ranked list of moves,...
Accelerating the annotation of sparse named entities by dynamic sentence selection (2008)
Yoshimasa Tsuruoka, Jun'ichi Tsujii, Sophia Ananiadou
Background: Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random...
How to make the most of NE dictionaries in statistical NER (2008)
Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou
Background: When term ambiguity and variability are very high, dictionary-based Named Entity Recognition (NER) is not an ideal solution even though large-scale terminological resources are...
BOOTStrep Annotation Scheme – Encoding Information for Text Mining (2008)
Scott Piao, Ekaterina Buyko, Yoshimasa Tsuruoka, Katrin Tomanek, John McNaught, Udo Hahn, ...
Annotation of information in corpora is an important aspect of text mining. It bridges between the information hidden in natural language texts and the semantic search queries for the information...
Automatic recognition of topic-classified relations between prostate cancer and genes using MEDLINE abstracts (2008)
Hong-woo Chun, Yoshimasa Tsuruoka, Jin-dong Kim, Rie Shiba, Naoki Nagata, ...
BMC Bioinformatics (BioMed Central), Proceedings, Open Access.
Normalizing biomedical terms by minimizing ambiguity and variability (2008)
Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou
Background: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of...
Yusuke Miyao, Yoshimasa Tsuruoka, Yuichiro Matsubayashi, Sophia Ananiadou
Recently, several text mining programs have reached a near-practical level of performance. Some systems are already being used by biologists and database curators. However, it has also been...
FACTA: a text search engine for finding associated biomedical concepts (2008)
Yoshimasa Tsuruoka, Jun'ichi Tsujii, Sophia Ananiadou
Summary: FACTA is a text search engine for MEDLINE abstracts, which is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds)...
Integration of Diverse Knowledge and Data into Biomedical Knowledge Matrices (2007)
Yoshimasa Tsuruoka, Teruyoshi Hishiki, Osamu Ogasawara, Kousaku Okubo
After the accomplishment of human draft sequence, more and more efforts are being made in the mapping of the data-driven patterns to background knowledge, hoping to efficiently produce hypotheses out...
Game-tree search algorithm based on realization probability (2007)
Yoshimasa Tsuruoka, Daisaku Yokoyama, Takashi Chikayama
In games like chess, the node-expansion strategy significantly affects the performance of a game-playing program. In this article we propose a new game-tree search algorithm that uses the realization...
Yoshimasa Tsuruoka, John McNaught, Jun'ichi Tsujii, Sophia Ananiadou
Motivation: One of the bottlenecks of biomedical data integration is variation of terms. Exact string matching often fails to associate a name with its biological concept, i.e. ID or accession number...
Kazuhiro Yoshida, Yoshimasa Tsuruoka, Yusuke Miyao
We aim to improve the performance of a syntactic parser that uses a part-of-speech (POS) tagger as a preprocessor. Pipelined parsers consisting of POS taggers and syntactic parsers have several...
Hong-Woo Chun, Yoshimasa Tsuruoka, Jin-Dong Kim, Rie Shiba, Naoki Nagata, Teruyoshi Hishiki, ...
Background: Automatic recognition of relations between a specific disease term and its relevant genes or protein terms is an important practice of bioinformatics. Considering the utility of...
Mining Opinion Polarity Relations of Citations (2006)
Scott S. Piao, Sophia Ananiadou, Yoshimasa Tsuruoka, Yutaka Sasaki, John McNaught
Opinion mining has been receiving increasing attention recently, and various approaches have been suggested for mining sentiment information, such as mining attitudes or opinions about a topic or...
Hong-woo Chun, Yoshimasa Tsuruoka, Jin-dong Kim, Rie Shiba, Naoki Nagata, Teruyoshi Hishiki
We describe a system that extracts disease-gene relations from MEDLINE. We constructed a dictionary for disease and gene names from six public databases and extracted relation candidates by...
Extremely lexicalized models for accurate and fast HPSG parsing (2006)
Takashi Ninomiya, Yoshimasa Tsuruoka, Takuya Matsuzaki, Yusuke Miyao
This paper describes an extremely lexicalized probabilistic model for fast and accurate HPSG parsing. In this model, the probabilities of parse trees are defined with only the probabilities of...
Yusuke Miyao, Tomoko Ohta, Katsuya Masuda, Yoshimasa Tsuruoka, Kazuhiro Yoshida, Takashi Ninomiya
This paper introduces a novel framework for the accurate retrieval of relational concepts from huge texts. Prior to retrieval, all sentences are annotated with predicate argument structures and...
Daisuke Okanohara, Yusuke Miyao, Yoshimasa Tsuruoka
This paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks with a tractable computational cost. Our framework can handle an NER task that has long named entities and many...
Tomoko Ohta, Yoshimasa Tsuruoka, Jumpei Takeuchi, Jin-dong Kim, Yusuke Miyao, Akane Yakushiji, ...
We present a practical HPSG parser for English, an intelligent search engine to retrieve MEDLINE abstracts that represent biomedical events and an efficient MEDLINE search tool helping users to find...
Bidirectional inference with the easiest-first strategy for tagging sequence data (2005)
This paper presents a bidirectional inference algorithm for sequence labeling problems such as part-of-speech tagging, named entity recognition and text chunking. The algorithm can enumerate all...
A machine learning approach to acronym generation (2005)
Yoshimasa Tsuruoka
This paper presents a machine learning approach to acronym generation. We formalize the generation process as a sequence labeling problem on the letters in the definition (expanded form) so that a...
Chunk Parsing Revisited (2005)
Yoshimasa Tsuruoka, Jun'ichi Tsujii
Chunk parsing is conceptually appealing but its performance has not been satisfactory for practical use. In this paper we show that chunk parsing can perform significantly better than previously...
Improving the performance of dictionary-based approaches in protein name recognition (2004)
Dictionary-based protein name recognition is often a first step in extracting information from biomedical documents because it can provide ID information on recognized terms. However,...
Introduction to the bio-entity recognition task at JNLPBA (2004)
Jin-dong Kim, Tomoko Ohta, Yoshimasa Tsuruoka, Yuka Tateisi, Nigel Collier
Iterative CKY parsing for Probabilistic Context-Free Grammars (2004)
This paper presents an iterative CKY parsing algorithm for probabilistic context-free grammars (PCFG). This algorithm enables us to prune unnecessary edges produced during parsing, which results in...
Training a Naive Bayes classifier via the EM algorithm with a class distribution constraint (2003)
Combining a naive Bayes classifier with the EM algorithm is one of the promising approaches for making use of unlabeled data for disambiguation tasks when using local context features including word...
Probabilistic Term Variant Generator for Biomedical Terms (2003)
This paper presents an algorithm to generate possible variants for biomedical terms. The algorithm gives each variant its generation probability representing its plausibility, which is potentially...
Boosting Precision and Recall of Dictionary-Based Protein Name Recognition (2003)
Yoshimasa Tsuruoka, Jun'ichi Tsujii
Dictionary-based protein name recognition is the first step for practical information extraction from biomedical documents because it provides ID information of recognized terms unlike machine...
Yoshimasa Tsuruoka, Takashi Chikayama
Classifiers are often required to output not only a classification result but also the probability of the classification. We focus on the decision list classifier which has successfully been applied...