Taming the Giants and the Monsters: Mining Large Databases for Nuggets of Knowledge (1998)
Abstract
...enge. Consider a simple application that determines whether two rows in a data table are likely to be the same, given that it is acceptable for "a few fields" to differ. While this "find-similar" problem appears simple, and one could think of several ways to solve it, executing it on a massive data store is far from straightforward. Large data stores are now a fact of life for most organizations. A gigabyte is a quantity of information; it represents about 10^9 bytes of stored information. The word derives from the Latin giga, meaning "giant." The next unit up is the terabyte, from the Greek teras, meaning "monster", which represents 10^12 bytes. Quite appropriately, in certain database circles, the terabyte is also referred to as the "terrorbyte": a term I first heard used by Jim Gray. The modern information revolution is creating huge data stores which, instead of offering in...