Issues in processing large data sets

Instructor: Prof. Dan Stefanescu

Affiliation: Department of Mathematics and Computer Science, Suffolk University, Boston, Massachusetts

Duration: 16 hours

Period: November 17 - 25, 2005

Place: Dipartimento di Ingegneria dell'Informazione: Elettronica, Informatica, Telecomunicazioni, via Diotisalvi, meeting room

Credits: 4

Contacts: Prof. Beatrice Lazzerini


Aims

Fueled by advances in computing and storage hardware, data is being acquired and retained at unprecedented rates. We shall examine parallel and sequential techniques for computing with very large data sets. Parallel methods offer the advantage of incremental computational power, but also pose the challenges of new computational models and algorithms. A number of issues in parallel algorithms, hardware, and software will be discussed, with an emphasis on the Bulk Synchronous Parallel (BSP) computational model. Sequential methods, while working on familiar territory, need to reorganize established algorithms in order to preserve scalability on modern processors. Data mining and, in particular, market basket analysis provide fertile ground for evaluating these issues. A number of methods for computing frequent itemsets will be discussed and evaluated with respect to efficiency and scalability.

Syllabus