Taming the Giants and the Monsters: Mining Large Databases for Nuggets of Knowledge (1998)

Abstract
… Consider a simple application that determines whether two rows in a data table are likely to be the same, given that it is acceptable for "a few fields" to differ. While this "find-similar" problem appears simple, and one could think of several ways to achieve it, executing it on a massive data store is far from straightforward. Large data stores are now a fact of life for most organizations. A gigabyte is a unit of information; it represents about 10^9 bytes of stored information. The word derives from the Greek gigas, meaning "giant." The next unit up is the terabyte, from the Greek teras, meaning "monster"; it represents 10^12 bytes. Quite appropriately, in certain database circles, the terabyte is also referred to as the "terrorbyte": a term I first heard used by Jim Gray. The modern information revolution is creating huge data stores which, instead of offering in…
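The find-similar test described in the abstract can be sketched in a few lines. This is a minimal illustration and not the paper's method. It assumes that "a few fields" means at most `max_diff` differing fields, and the names `likely_same`, `find_similar_pairs` and `pair_count` are hypothetical. The naive scan compares every pair of rows, so n rows require n(n-1)/2 comparisons. That quadratic growth is why a problem that looks simple becomes hard at gigabyte and terabyte scale.

```python
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple

# Hypothetical helpers: "a few fields" is modeled as max_diff (an assumption).


def likely_same(row_a: Sequence[Hashable], row_b: Sequence[Hashable], max_diff: int = 2) -> bool:
    """Return True if the two rows differ in at most max_diff fields."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same number of fields")
    differing = sum(1 for a, b in zip(row_a, row_b) if a != b)
    return differing <= max_diff


def find_similar_pairs(rows: Sequence[Sequence[Hashable]], max_diff: int = 2) -> List[Tuple[int, int]]:
    """Naive all-pairs scan: compare every row with every later row."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if likely_same(a, b, max_diff)
    ]


def pair_count(n: int) -> int:
    """Number of row pairs the naive scan must compare: n(n-1)/2."""
    return n * (n - 1) // 2


rows = [
    ("Ann", "Lee", "NYC", "555-1234"),
    ("Ann", "Lee", "NYC", "555-9999"),  # differs from row 0 only in the phone field
    ("Bob", "Kim", "LA", "555-0000"),
]

print(find_similar_pairs(rows, max_diff=1))  # [(0, 1)]
print(pair_count(10**6))                     # 499999500000
```

A table of only a million rows already needs roughly 5 x 10^11 pairwise comparisons under this naive scheme. At that scale, the per-row test is the easy part and the volume of comparisons is the real obstacle.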

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.39.8597
Source ftp://ftp.research.microsoft.com/pub/dmx/usama/papers/DBPD98/giants_monsters.ps
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.26.6422, 10.1.1.135.9883, 10.1.1.51.5219, 10.1.1.34.836, 10.1.1.33.1283, 10.1.1.69.5344, 10.1.1.73.2736, 10.1.1.33.2633, 10.1.1.33.4625