Similarity search: the metric space approach

Instructor: Prof. Pavel Zezula

Affiliation: Faculty of Informatics, Masaryk University in Brno, Czech Republic

Duration: 20 hours

Date: May 15 - May 19, 2006

Place: Dipartimento di Ingegneria dell'Informazione: Elettronica, Informatica, Telecomunicazioni, via Diotisalvi, meeting room

Contacts: Dott. Fausto Rabitti, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo” del CNR di Pisa


Contents

Part I: Metric searching in a nutshell

  • Foundations of metric space searching (7 hours)
  • distance searching problem in metric spaces
  • metric distance measures
  • similarity queries
  • basic partitioning principles
  • principles of similarity query execution
  • policies to avoid distance computations
  • metric space transformations
  • principles of approximate similarity search
  • advanced issues
  • Survey of exiting approaches (3 hours)
  • ball partitioning methods
  • generalized hyper-plane partitioning approaches
  • exploiting pre-computed distances
  • hybrid indexing approaches
  • approximated techniques

Part II: Metric searching in large collections of data

  • Centralized index structures for large databases (4 hours)
  • M-tree family
  • hash-based metric indexing
  • performance trials
  • Approximate similarity search (2 hours)
  • relative error approximation
  • good fraction approximation
  • small chance improvement approximation
  • proximity-based approximation
  • PAC nearest neighbor searching
  • performance trials
  • Parallel and distributed indexes (4 hours)
  • preliminaries
  • processing M-trees with parallel resources
  • scalable and distributed similarity search
  • performance trials