Robust Machine Learning Applied to Terascale Astronomical Datasets (2008)

Abstract
We present recent results from the LCDM (Laboratory for Cosmological Data Mining; http://lcdm.astro.uiuc.edu) collaboration between UIUC Astronomy and NCSA to deploy supercomputing cluster resources and machine learning algorithms for the mining of terascale astronomical datasets. This is a novel application in the field of astronomy, because we are using such resources for data mining, and not just performing simulations. Via a modified implementation of the NCSA cyberenvironment Data-to-Knowledge, we are able to provide improved classifications for over 100 million stars and galaxies in the Sloan Digital Sky Survey, improved distance measures, and a full exploitation of the simple but powerful k-nearest neighbor algorithm. A driving principle of this work is that our methods should be extensible from current terascale datasets to upcoming petascale datasets and beyond. We discuss issues encountered to date, and further issues for the transition to petascale. In particular, disk I/O will become a major limiting factor unless the necessary infrastructure is implemented.

Comment: 11 pages, 2 figures, uses llncs.cls. To appear in the 9th LCI International Conference on High-Performance Clustered Computing.

Publication details
Download http://arxiv.org/abs/0804.3417
Repository arXiv (United States)
Keywords Astrophysics, Computer Science - Distributed, Parallel, and Cluster Computing
Type text