Unsupervised duplicate detection using sample non-duplicates (2006)
The problem of identifying objects in databases that refer to the same real-world entity is known, among others, as duplicate detection or record linkage. Objects may be duplicates, even though they...
Probabilistic iterative duplicate detection (2005)
The problem of identifying approximately duplicate records between databases is known, among others, as duplicate detection or record linkage. To this end, typically either rules or a weighted...
A precise blocking method for record linkage (2005)
Identifying approximately duplicate records between databases requires the costly computation of distances between their attributes. Thus duplicate detection is usually performed in two phases, an...
SWQL - A query language for data integration based on OWL (2005)
The Web Ontology Language OWL has been advocated as a suitable model for semantic data integration. Data integration requires expressive means to map between heterogeneous OWL schemas. This paper...