Bulk Loading into Databases: a Declarative Approach (2007)
Abstract
We present novel optimization techniques for bulk loading into databases. The aim of this work is to capture, understand, and optimize different performance criteria (e.g., elapsed time, bandwidth, memory consumption) when populating a database with a large amount of data. This work applies to many applications that manipulate large amounts of data: data warehouses, replicated databases, etc. As opposed to commercial systems [Ora96, Obj, O2T96, Gem, Exe97, Pac91] and previous research on the same topic [Fon97, WN95, WN96, WJLG00], our approach follows the fundamental database principle of physical-logical independence. A bulk loading program is represented as a sequence of algebraic expressions; it is optimized using equivalences, physical organization directives (e.g., cluster specifications on target data), and a cost model that captures efficiency requirements (e.g., is bandwidth more important than processing time at the source system?); code is then generated from the optimized algebraic expressions. This approach offers two very desirable properties: true efficiency and reusability. (i) There is not one good way to measure the efficiency of a bulk loading program. For instance, a program that runs slowly may