Issues in processing large data sets
Instructor: Prof. Dan Stefanescu
Affiliation: Department of Mathematics and Computer Science, Suffolk University, Boston, Massachusetts
Duration: 16 hours
Period: November 17 - 25, 2005
Place: Dipartimento di Ingegneria dell'Informazione: Elettronica, Informatica, Telecomunicazioni, via Diotisalvi, meeting room
Credits: 4
Contacts: Prof. Beatrice Lazzerini
Aims
Fueled by advances in computing and storage hardware, data is being acquired and retained at unprecedented rates. We shall examine parallel and sequential techniques for computing with very large data sets. Parallel methods offer the advantage of incremental computational power, but also pose the challenge of new computational models and algorithms. A number of issues in parallel algorithms, hardware, and software will be discussed, with an emphasis on the Bulk Synchronous Parallel (BSP) computational model. Sequential methods, while working on familiar territory, need to reorganize established algorithms in order to preserve scalability on modern processors. Data mining and, in particular, market basket analysis provide fertile ground for evaluating these issues. A number of methods for computing frequent itemsets will be discussed and evaluated with respect to efficiency and scalability.
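As an illustration of what computing frequent itemsets means in market basket analysis, here is a minimal sequential sketch in Python. It is not taken from the course material: the function name `frequent_itemsets`, its parameters, and the sample baskets are all hypothetical, and the level-wise counting with simple item pruning is only one of many possible methods. It counts itemsets size by size and keeps those that occur in at least `min_support` baskets.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, min_support, max_size=2):
    """Return {itemset: count} for itemsets found in >= min_support baskets.

    Itemsets are sorted tuples of items. Hypothetical helper for illustration.
    """
    result = {}
    surviving_items = None  # items that appeared in a frequent itemset so far
    for size in range(1, max_size + 1):
        counts = Counter()
        for basket in baskets:
            items = sorted(set(basket))
            if surviving_items is not None:
                # Pruning: an item that is in no frequent (size-1)-itemset
                # cannot be part of any frequent itemset of this size.
                items = [i for i in items if i in surviving_items]
            for combo in combinations(items, size):
                counts[combo] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        if not level:
            break  # no frequent itemsets of this size, so none larger either
        result.update(level)
        surviving_items = {i for c in level for i in c}
    return result


# Made-up sample data: each basket is one customer's purchase.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
print(frequent_itemsets(baskets, min_support=2))
```

The pruning step relies on the fact that every subset of a frequent itemset is itself frequent. The sketch enumerates all combinations inside each basket, which becomes expensive for long baskets; that cost is exactly the kind of efficiency and scalability question the course aims to evaluate.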
Syllabus