Exploratory Analysis of Retail Sales of Billions of Items (2008)
Abstract
This paper describes several approaches to analyzing a data set collected over the past year from a retail grocery chain with hundreds of stores. Each record in the data set represents an individual item processed by an individual checkout laser scanner at a particular store at a particular time on a particular day. Each record also contains information such as store department and price, together with identifying information such as the particular checkout scanner and, for some transactions, customer identification. The total data set contains billions of items, which can be aggregated into hundreds of millions of transactions for millions of repeat customers. To gain insight into the data, we used several approaches, including statistical analysis, machine learning, and data mining methods. Some of these simply focused on ascertaining the “quality” of the data, while others focused more narrowly on simple questions such as “which pairs of items are most frequently purchased together” or “what is the relationship between basket size and number of baskets”. The sheer size of the data set forced us to go beyond the usual data mining methods and use Meta-Mining: the post-processing of the results of basic analysis methods.
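The two example questions in the abstract can be illustrated on a toy scale. The following is only a minimal sketch, not the authors' method: the record layout (a transaction identifier paired with an item name), the sample data, and the function names `group_baskets`, `pair_counts`, and `basket_size_histogram` are all assumptions made for illustration, and an in-memory approach like this would not scale to billions of items.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scanner records as (transaction_id, item) pairs.
# The real records also carry store, time, department, price, etc.
records = [
    ("t1", "bread"), ("t1", "milk"), ("t1", "eggs"),
    ("t2", "bread"), ("t2", "milk"),
    ("t3", "milk"),
]

def group_baskets(records):
    """Aggregate item-level records into baskets (one set of items per transaction)."""
    baskets = {}
    for transaction_id, item in records:
        # Assumption: repeated scans of the same item count once per basket.
        baskets.setdefault(transaction_id, set()).add(item)
    return baskets

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of distinct items."""
    counts = Counter()
    for items in baskets.values():
        # Sorting gives each pair a canonical order, e.g. ("bread", "milk").
        counts.update(combinations(sorted(items), 2))
    return counts

def basket_size_histogram(baskets):
    """Map basket size (number of distinct items) to the number of baskets of that size."""
    return Counter(len(items) for items in baskets.values())

baskets = group_baskets(records)
pairs = pair_counts(baskets)
sizes = basket_size_histogram(baskets)

print(pairs.most_common(3))
print(sorted(sizes.items()))
```

Counting pairs per basket is quadratic in basket size, which is one reason a data set of this scale calls for aggregation or post-processing of intermediate results instead of a single pass like this one.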