Yoshimasa Tsuruoka

How to make the most of NE dictionaries in statistical NER (2009)

Yutaka Sasaki, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou, ...

BMC Bioinformatics (BioMed Central), Research. ... growth of a wide range of repositories of biomedical data and literature. The automatic construction and update of scientific knowledge bases is a major research topic in Bio...

Reconstruction of an in silico metabolic model of _Arabidopsis thaliana_ through database integration (2009)

Karin Radrich, Yoshimasa Tsuruoka, Paul Dobson, Albert Gevorgyan, Neil Swainston, Jean-Marc Schwartz

The number of genome-scale metabolic models has been rising quickly in recent years, and the scope of their utilization encompasses a broad range of applications from metabolic engineering to...

An Annotation Type System for a Data-Driven NLP Pipeline (2009)

Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, John McNaught, Yoshimasa Tsuruoka, ...

We introduce an annotation type system for a data-driven NLP core system. The specifications cover formal document structure and document meta information, as well as the linguistic levels of...

Computational Intelligence and Games (CIG 2007) Move Prediction in Go with the Maximum Entropy Method (2008)

Nobuo Araki, Kazuhiro Yoshida, Yoshimasa Tsuruoka

We address the problem of predicting moves in the board game of Go. We use the relative frequencies of local board patterns observed in game records to generate a ranked list of moves,...

Accelerating the annotation of sparse named entities by dynamic sentence selection (2008)

Tsuruoka, Yoshimasa, Tsujii, Jun'ichi, Ananiadou, Sophia

Background: Previous studies of named entity recognition have shown that a reasonable level of recognition accuracy can be achieved by using machine learning models such as conditional random...

How to make the most of NE dictionaries in statistical NER (2008)

Sasaki, Yutaka, Tsuruoka, Yoshimasa, McNaught, John, Ananiadou, Sophia

Background: When term ambiguity and variability are very high, dictionary-based Named Entity Recognition (NER) is not an ideal solution even though large-scale terminological resources are...

BOOTStrep Annotation Scheme – Encoding Information for Text Mining (2008)

Scott Piao, Ekaterina Buyko, Yoshimasa Tsuruoka, Katrin Tomanek, John McNaught, Udo Hahn, ...

Annotation of information in corpora is an important aspect of text mining. It bridges between the information hidden in natural language texts and the semantic search queries for the information...

Automatic recognition of topic-classified relations between prostate cancer and genes using MEDLINE abstracts (2008)

Hong-woo Chun, Yoshimasa Tsuruoka, Jin-dong Kim, Rie Shiba, Naoki Nagata, ...

BMC Bioinformatics (BioMed Central, Open Access), Proceedings.

Normalizing biomedical terms by minimizing ambiguity and variability (2008)

Tsuruoka, Yoshimasa, McNaught, John, Ananiadou, Sophia

Background: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of...

FILLING THE GAPS BETWEEN TOOLS AND USERS: A TOOL COMPARATOR, USING PROTEIN-PROTEIN INTERACTION AS AN EXAMPLE (2008)

Yusuke Miyao, Yoshimasa Tsuruoka, Yuichiro Matsubayashi, Sophia Ananiadou

Recently, several text mining programs have reached a near-practical level of performance. Some systems are already being used by biologists and database curators. However, it has also been...

FACTA: a text search engine for finding associated biomedical concepts (2008)

Tsuruoka, Yoshimasa, Tsujii, Jun'ichi, Ananiadou, Sophia

Summary: FACTA is a text search engine for MEDLINE abstracts, which is designed particularly to help users browse biomedical concepts (e.g. genes/proteins, diseases, enzymes and chemical compounds)...

Integration of Diverse Knowledge and Data into Biomedical Knowledge Matrices (2007)

Yoshimasa Tsuruoka, Teruyoshi Hishiki, Osamu Ogasawara, Kousaku Okubo

After the accomplishment of human draft sequence, more and more efforts are being made in the mapping of the data-driven patterns to background knowledge, hoping to efficiently produce hypotheses out...

Game-tree search algorithm based on realization probability (2007)

Yoshimasa Tsuruoka, Daisaku Yokoyama, Takashi Chikayama

In games like chess, the node-expansion strategy significantly affects the performance of a game-playing program. In this article we propose a new game-tree search algorithm that uses the realization...

Learning string similarity measures for gene/protein name dictionary look-up using logistic regression (2007)

Tsuruoka, Yoshimasa, McNaught, John, Tsujii, Jun'ichi, Ananiadou, Sophia

Motivation: One of the bottlenecks of biomedical data integration is variation of terms. Exact string matching often fails to associate a name with its biological concept, i.e. ID or accession number...

Ambiguous Part-of-Speech Tagging for Improving Accuracy and Domain Portability of Syntactic Parsers (2007)

Kazuhiro Yoshida, Yoshimasa Tsuruoka, Yusuke Miyao

We aim to improve the performance of a syntactic parser that uses a part-of-speech (POS) tagger as a preprocessor. Pipelined parsers consisting of POS taggers and syntactic parsers have several...

Automatic recognition of topic-classified relations between prostate cancer and genes using MEDLINE abstracts (2006)

Chun, Hong-Woo, Tsuruoka, Yoshimasa, Kim, Jin-Dong, Shiba, Rie, Nagata, Naoki, Hishiki, Teruyoshi, ...

Background: Automatic recognition of relations between a specific disease term and its relevant genes or protein terms is an important practice of bioinformatics. Considering the utility of...

Mining Opinion Polarity Relations of Citations (2006)

Scott S. Piao, Sophia Ananiadou, Yoshimasa Tsuruoka, Yutaka Sasaki, John McNaught

Opinion mining has been receiving increasing attention recently, and various approaches have been suggested for mining sentiment information, such as mining attitudes or opinions about a topic or...

Extraction of gene-disease relations from Medline using domain dictionaries and machine learning (2006)

Hong-woo Chun, Yoshimasa Tsuruoka, Jin-dong Kim, Rie Shiba, Naoki Nagata, Teruyoshi Hishiki

We describe a system that extracts disease-gene relations from MedLine. We constructed a dictionary for disease and gene names from six public databases and extracted relation candidates by...

Extremely lexicalized models for accurate and fast HPSG parsing (2006)

Takashi Ninomiya, Yoshimasa Tsuruoka, Takuya Matsuzaki, Yusuke Miyao

This paper describes an extremely lexicalized probabilistic model for fast and accurate HPSG parsing. In this model, the probabilities of parse trees are defined with only the probabilities of...

Semantic retrieval for the accurate identification of relational concepts in massive textbases (2006)

Yusuke Miyao, Tomoko Ohta, Katsuya Masuda, Yoshimasa Tsuruoka, Kazuhiro Yoshida, Takashi Ninomiya

This paper introduces a novel framework for the accurate retrieval of relational concepts from huge texts. Prior to retrieval, all sentences are annotated with predicate argument structures and...

Improving the scalability of semi-markov conditional random fields for named entity recognition (2006)

Daisuke Okanohara, Yusuke Miyao, Yoshimasa Tsuruoka

This paper presents techniques to apply semi-CRFs to Named Entity Recognition tasks with a tractable computational cost. Our framework can handle an NER task that has long named entities and many...

An intelligent search engine and GUI-based efficient MEDLINE search tool based on deep syntactic parsing (2006)

Tomoko Ohta, Yoshimasa Tsuruoka, Jumpei Takeuchi, Jin-dong Kim, Yusuke Miyao, Akane Yakushiji, ...

We present a practical HPSG parser for English, an intelligent search engine to retrieve MEDLINE abstracts that represent biomedical events and an efficient MEDLINE search tool helping users to find...

Chunk parsing revisited (2005)

Yoshimasa Tsuruoka

Chunk parsing is conceptually appealing but its performance has not been satisfactory for practical use. In this paper we show that chunk parsing can perform significantly better...

Bidirectional inference with the easiest-first strategy for tagging sequence data (2005)

Yoshimasa Tsuruoka

This paper presents a bidirectional inference algorithm for sequence labeling problems such as part-of-speech tagging, named entity recognition and text chunking. The algorithm can enumerate all...

A machine learning approach to acronym generation (2005)

Yoshimasa Tsuruoka

This paper presents a machine learning approach to acronym generation. We formalize the generation process as a sequence labeling problem on the letters in the definition (expanded form) so that a...

Proceedings of the ACL-ISMB Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics (2005)

Yoshimasa Tsuruoka

This paper presents a machine learning approach to acronym generation. We formalize the generation process as a sequence labeling problem on the letters in the definition (expanded form) so that a...

Chunk Parsing Revisited (2005)

Yoshimasa Tsuruoka, Jun'ichi Tsujii

Chunk parsing is conceptually appealing but its performance has not been satisfactory for practical use. In this paper we show that chunk parsing can perform significantly better than previously...

Improving the performance of dictionary-based approaches in protein name recognition (2004)

Yoshimasa Tsuruoka

Dictionary-based protein name recognition is often a first step in extracting information from biomedical documents because it can provide ID information on recognized terms. However,...

Iterative CKY parsing for Probabilistic Context-Free Grammars (2004)

Yoshimasa Tsuruoka

This paper presents an iterative CKY parsing algorithm for probabilistic context-free grammars (PCFG). This algorithm enables us to prune unnecessary edges produced during parsing, which results in...

Training a Naive Bayes classifier via the EM algorithm with a class distribution constraint (2003)

Yoshimasa Tsuruoka

Combining a naive Bayes classifier with the EM algorithm is one of the promising approaches for making use of unlabeled data for disambiguation tasks when using local context features including word...

Boosting precision and recall of dictionary-based protein name recognition (2003)

Yoshimasa Tsuruoka

Dictionary-based protein name recognition is the first step for practical information extraction from biomedical documents because it provides ID information of recognized terms unlike machine...

Probabilistic Term Variant Generator for Biomedical Terms (2003)

Yoshimasa Tsuruoka

This paper presents an algorithm to generate possible variants for biomedical terms. The algorithm gives each variant its generation probability representing its plausibility, which is potentially...

Boosting Precision and Recall of Dictionary-Based Protein Name Recognition (2003)

Yoshimasa Tsuruoka, Jun'ichi Tsujii

Dictionary-based protein name recognition is the first step for practical information extraction from biomedical documents because it provides ID information of recognized terms unlike machine...

Estimating reliability of contextual evidences in decision-list classifiers under Bayesian learning (2001)

Yoshimasa Tsuruoka, Takashi Chikayama

Classifiers are often required to output not only a classification result but also the probability of the classification. We focus on the decision list classifier which has successfully been applied...
