**Administrative info**

### Contents

State-of-the-art algorithmic techniques and models for massive data sets.

### Teachers

Inge Li Gørtz, ilg@imm.dtu.dk, office hours Friday 12-13, Building 322, room 018.

### When and where

Monday 8.15-10, Bldg. 308/Aud. 11

Monday 10-12, Bldg. 324/Room 040

The course runs in the DTU spring semester (Feb 4th to May 13th). There is no teaching in the Easter break (March 25th and April 1st).

### Mandatory exercises

Use the template.tex file to prepare your hand-in exercises. Compile it with LaTeX and upload the resulting pdf file (and only this file) via CampusNet. The finished pdf must be at most 2 pages.

### Collaboration policy for mandatory exercises

- You may collaborate with fellow students on the hand-in exercises.
- Collaboration is limited to discussion of ideas only; you must write up the solutions entirely on your own.
- Do not use or seek out solutions from previous years of the course, solutions from similar courses, or solutions found on the internet.
- List your collaborators (see the template) and cite any references you have used.

**Weekplan**

**The weekplan is preliminary.** It will be updated during the course. Each week lists a number of suggestions for reading material related to that week's lecture. It is not the intention that you read ALL of the papers; it is a list of papers and notes where you can read about the subjects discussed at the lecture.

**Week 1: Introduction and Hashing: Chained, Universal, and Perfect.**

- J. Carter and M. Wegman, Universal Classes of Hash Functions, J. Comp. Sys. Sci., 1977
- M. Fredman, J. Komlós and E. Szemerédi, Storing a Sparse Table with O(1) Worst Case Access Time, J. ACM, 1984
- Scribe notes from MIT
- Peter Bro Miltersen’s notes from Aarhus
- Slides
- Exercises
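For intuition, here is a minimal Python sketch of the Carter–Wegman style universal family h(x) = ((ax + b) mod p) mod m, used inside a chained hash table. The prime p, the table size, and all class/function names are illustrative choices, not something prescribed by the readings.

```python
import random

class UniversalHash:
    """Carter-Wegman style hash: h(x) = ((a*x + b) mod p) mod m,
    with a, b drawn at random once per table."""
    def __init__(self, m, p=(1 << 31) - 1, seed=None):
        rng = random.Random(seed)
        self.p = p                      # prime larger than any key
        self.m = m                      # number of buckets
        self.a = rng.randrange(1, p)    # a uniform in [1, p-1]
        self.b = rng.randrange(0, p)    # b uniform in [0, p-1]

    def __call__(self, x):
        return ((self.a * x + self.b) % self.p) % self.m

class ChainedHashTable:
    """Hashing with chaining: each bucket is a list of keys."""
    def __init__(self, m=16, seed=None):
        self.h = UniversalHash(m, seed=seed)
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        bucket = self.buckets[self.h(key)]
        if key not in bucket:
            bucket.append(key)

    def contains(self, key):
        return key in self.buckets[self.h(key)]
```

Universality guarantees that for distinct keys x ≠ y, the collision probability over the random choice of (a, b) is at most roughly 1/m, which is what makes the expected chain length O(1 + n/m).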

**Week 2**: **Predecessor Data Structures: x-fast tries and y-fast tries.**

- P. van Emde Boas, Preserving Order in a Forest in less than Logarithmic Time, FOCS, 1975
- Dan E. Willard, Log-Logarithmic Worst-Case Range Queries are Possible in Space Θ(n), Inf. Process. Lett., 1983
- Scribe notes from MIT
- Mihai Patrascu and Mikkel Thorup, Time-space trade-offs for predecessor search, STOC 2006
- Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms, 3rd edition, Chap. 20
- Slides
- Exercises
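As a baseline for this week's material, here is the comparison-based predecessor structure that x-fast and y-fast tries are designed to beat: binary search over a sorted array answers predecessor queries in O(log n), whereas the trie-based structures achieve O(log log u) for an integer universe of size u. The class name is a made-up placeholder.

```python
import bisect

class SortedPredecessor:
    """Static predecessor queries by binary search: O(log n) per query."""
    def __init__(self, keys):
        self.keys = sorted(set(keys))

    def predecessor(self, x):
        # Largest stored key <= x, or None if no such key exists.
        i = bisect.bisect_right(self.keys, x)
        return self.keys[i - 1] if i else None
```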

**Week 3: Decremental Connectivity in Trees: Cluster decomposition, Word-Level Parallelism.**

- S. Alstrup, J. P. Secher, M. Spork: Optimal On-Line Decremental Connectivity in Trees, Inf. Process. Lett., 1997
- S. Alstrup, J. P. Secher, M. Thorup: Word encoding tree connectivity works. SODA, 2000
- Scribe notes from MIT
- Slides
- More slides
- Exercises

**Week 4**: **Nearest Common Ancestors: Distributed data structures, Heavy-path decomposition, alphabetic codes.**

- S. Alstrup, C. Gavoille, H. Kaplan, T. Rauhe, Nearest Common Ancestors: A Survey and a New Algorithm for a Distributed Environment, Theory of Comput. Sys., 2004
- D. Harel, R. E. Tarjan: Fast Algorithms for Finding Nearest Common Ancestors. SIAM J. Comput., 1984
- Scribe notes from MIT
- Slides
- Exercises
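To have a concrete reference point for nearest common ancestor queries, here is a sketch using binary lifting: O(n log n) preprocessing and O(log n) per query. Note this is a different (and simpler) classic technique than the O(1)-query structures in the readings above; it is included only for orientation, and the representation (a parent array with the root as its own parent) is an assumption of this sketch.

```python
class LCA:
    """Nearest common ancestors via binary lifting.
    parent[v] is v's parent; parent[root] == root."""
    def __init__(self, parent, root=0):
        n = len(parent)
        LOG = max(1, n.bit_length())
        # Compute depths with an explicit stack (no recursion limit issues).
        children = [[] for _ in range(n)]
        for v, p in enumerate(parent):
            if v != root:
                children[p].append(v)
        self.depth = [0] * n
        stack = [root]
        while stack:
            v = stack.pop()
            for c in children[v]:
                self.depth[c] = self.depth[v] + 1
                stack.append(c)
        # up[k][v] = the 2^k-th ancestor of v (clamped at the root).
        self.up = [list(parent)]
        for k in range(1, LOG):
            prev = self.up[k - 1]
            self.up.append([prev[prev[v]] for v in range(n)])
        self.LOG = LOG

    def query(self, u, v):
        if self.depth[u] < self.depth[v]:
            u, v = v, u
        # Lift u to the same depth as v.
        d = self.depth[u] - self.depth[v]
        for k in range(self.LOG):
            if (d >> k) & 1:
                u = self.up[k][u]
        if u == v:
            return u
        # Lift both just below their nearest common ancestor.
        for k in reversed(range(self.LOG)):
            if self.up[k][u] != self.up[k][v]:
                u, v = self.up[k][u], self.up[k][v]
        return self.up[0][u]
```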

**Week 5:** **Range Reporting: Range Trees, Fractional Cascading, and kD Trees.**

- M. de Berg, O. Cheong, M. van Kreveld and M. Overmars, Computational Geometry: Algorithms and Applications, 2008
- B. Chazelle and L. Guibas: Fractional cascading: I. A data structuring technique, Algorithmica, 1986
- J. L. Bentley and D. F. Stanat. Analysis of range searches in quad trees. Inf. Process. Lett., 1975
- Scribe notes from MIT
- Slides
- Exercises
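The one-dimensional special case of range reporting makes the O(log n + k) bound concrete: on a presorted point set, two binary searches locate the boundary of the query interval and the k points in between are reported directly. Range trees generalize exactly this idea to higher dimensions. The function name is a placeholder for this sketch.

```python
import bisect

def range_report(sorted_points, lo, hi):
    """Report all points in [lo, hi] from a presorted 1D point set
    in O(log n + k) time: the 1D special case of a range tree."""
    i = bisect.bisect_left(sorted_points, lo)   # first point >= lo
    j = bisect.bisect_right(sorted_points, hi)  # first point  > hi
    return sorted_points[i:j]
```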

**Week 6: Persistent data structures.**

- H. Kaplan, Persistent Data Structures, In Handbook on Data Structures and Applications, D. Mehta and S. Sahni, editors, CRC Press, 2005.
- N. Sarnak and R. E. Tarjan, Planar Point Location Using Persistent Search Trees, CACM, 1986.
- J. R. Driscoll, N. Sarnak, D. D. Sleator, R. E. Tarjan, Making Data Structures Persistent, JCSS, 1989.
- P.F. Dietz, Fully Persistent Arrays, WADS 1989.
- G. F. Italiano and R. Raman, Topics in Data Structures.
- Slides
- Exercises
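The simplest fully persistent structure is a stack implemented as a shared linked list: every push or pop returns a new version, and all old versions remain intact because nodes are never mutated. This is a toy warm-up for the path-copying and node-copying techniques in the readings, and the class name is invented for this sketch.

```python
class PersistentStack:
    """Fully persistent stack via structural sharing: operations
    return new versions and never mutate existing nodes."""
    __slots__ = ("head", "tail")

    def __init__(self, head=None, tail=None):
        self.head, self.tail = head, tail   # tail is None for the empty stack

    def push(self, x):
        # O(1): the new version shares the entire old stack as its tail.
        return PersistentStack(x, self)

    def pop(self):
        # Returns (top element, previous version).
        return self.head, self.tail

    def to_list(self):
        out, node = [], self
        while node.tail is not None:
            out.append(node.head)
            node = node.tail
        return out
```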

**Week 7:** **Union-Find and amortized analysis (potential method).**

*Amortized Analysis:*

- Rebecca Fiebrink’s notes on amortized analysis from Princeton.
- Pawel Winter’s notes on amortized analysis from DIKU.
- R. E. Tarjan: Amortized Computational Complexity, SIAM. J. on Algebraic and Discrete Methods Volume 6, Issue 2, pp. 306-318 (April 1985)

*Union-Find:*

- R. E. Tarjan and J. van Leeuwen: Worst-case Analysis of Set Union Algorithms, JACM, 1984.
- R. E. Tarjan: Efficiency of a Good But Not Linear Set Union Algorithm, JACM, 1975.
- Alstrup et al.: Union-Find with Constant Time Deletions, ICALP 2005.
- R. Seidel: Top-Down Analysis of Path Compression: Deriving the Inverse-Ackermann Bound Naturally (and Easily), SWAT 2006.
- Slides
- Exercises
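The union-find structure analysed in the readings, with union by rank and path compression, can be sketched in a few lines. Tarjan's analysis shows that with both heuristics the amortized cost per operation is O(α(n)), where α is the inverse Ackermann function; the class and method names here are just conventional choices.

```python
class UnionFind:
    """Disjoint sets with union by rank and path compression:
    amortized O(alpha(n)) per operation."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: path compression, point everything at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                 # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx             # union by rank: shallow under deep
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```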

**Week 8: String Indexing: Dictionaries, Tries, Suffix trees, and Suffix Sorting.**

- Martin Farach-Colton, Paolo Ferragina, S. Muthukrishnan: On the sorting-complexity of suffix tree construction, J. ACM, 2000.
- Juha Kärkkäinen, Peter Sanders, Stefan Burkhardt: Linear work suffix array construction. J. ACM, 2006
- Dan Gusfield. Algorithms on Strings, Trees, and Sequences, Chap. 5-9.
- Scribe notes from MIT
- Slides
- Exercises
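To see what suffix sorting produces, here is the naive construction: sort all suffixes directly, which costs O(n² log n) in the worst case. The linear-time constructions in the readings (Farach-Colton et al.; Kärkkäinen–Sanders–Burkhardt's DC3) exist precisely to beat this. Pattern search then binary-searches the suffix array in O(|p| log |s|). Function names are placeholders for this sketch.

```python
import bisect

def suffix_array(s):
    """Naive suffix array: sort suffix start positions lexicographically.
    O(n^2 log n) worst case due to full suffix comparisons."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def contains(s, sa, p):
    """Does pattern p occur in s? Binary search over the suffix array:
    find the first suffix whose |p|-length prefix is >= p, then check it."""
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + len(p)] < p:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and s[sa[lo]:sa[lo] + len(p)] == p
```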

**Week 9: Introduction to approximation algorithms: TSP, k-center, and vertex cover.**

- David P. Williamson and David Shmoys: The Design of Approximation Algorithms (sections 1.1, 2.2, and 2.4)
- Jeff Erickson: Non-Lecture K: Approximation Algorithms
- You can also read about some of these problems in Kleinberg and Tardos: “Algorithm Design”; V. V. Vazirani: Approximation Algorithms; and Cormen, Leiserson, Rivest, and Stein: “Introduction to Algorithms”.
- Exercises
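The classic 2-approximation for vertex cover, covered in the readings above, fits in a few lines: repeatedly take both endpoints of any uncovered edge (i.e., compute a maximal matching). Since any cover must contain at least one endpoint of each matched edge, the result is at most twice the optimum.

```python
def vertex_cover_2approx(edges):
    """Maximal-matching 2-approximation for vertex cover:
    add both endpoints of every edge not yet covered."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover
```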

**Week 10: Approximation algorithms: Stable matching and TSP.**

- Zoltán Király: Better and Simpler Approximation Algorithms for the Stable Marriage Problem (pages 3-10)
- David P. Williamson and David Shmoys: The Design of Approximation Algorithms (section 2.4)
- Exercises
- Slides
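For background, here is the classic Gale–Shapley algorithm for stable matching without ties: men propose in preference order and each woman tentatively holds on to her best proposer. Király's paper treats the harder variant with ties and incomplete lists; this sketch (with invented variable names) covers only the basic setting.

```python
from collections import deque

def gale_shapley(men_prefs, women_prefs):
    """Gale-Shapley stable matching; preferences are lists of indices,
    most preferred first. Returns wife[m] for each man m."""
    n = len(men_prefs)
    # rank[w][m] = position of man m in woman w's preference list.
    rank = [{m: r for r, m in enumerate(prefs)} for prefs in women_prefs]
    next_prop = [0] * n        # next index each man will propose to
    wife = [None] * n          # man  -> woman
    husband = [None] * n       # woman -> man
    free = deque(range(n))
    while free:
        m = free.popleft()
        w = men_prefs[m][next_prop[m]]
        next_prop[m] += 1
        if husband[w] is None:
            husband[w], wife[m] = m, w
        elif rank[w][m] < rank[w][husband[w]]:
            # w upgrades: her old partner becomes free again.
            free.append(husband[w])
            wife[husband[w]] = None
            husband[w], wife[m] = m, w
        else:
            free.append(m)     # rejected; m will try his next choice
    return wife
```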

**Week 11: Data Compression: Lempel-Ziv, Entropy, and Burrows-Wheeler Transform.**

- Jacob Ziv and Abraham Lempel: “A Universal Algorithm for Sequential Data Compression”. IEEE Transactions on Information Theory 23 (3): 337–343.
- Jacob Ziv and Abraham Lempel: “Compression of Individual Sequences via Variable-Rate Coding”. IEEE Transactions on Information Theory 24 (5): 530–536.
- Guy E. Blelloch: Introduction to Data Compression.
- Slides
- Exercise
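A minimal sketch of the LZ78 scheme from the second Ziv–Lempel paper: the input is parsed into phrases of the form "longest previously seen phrase plus one new character", and each phrase is emitted as a (dictionary index, character) pair, with index 0 denoting the empty phrase. Function names are placeholders for this sketch.

```python
def lz78_compress(s):
    """LZ78 parse: emit (index of longest known prefix, next char)."""
    dictionary = {"": 0}
    out = []
    phrase = ""
    for c in s:
        if phrase + c in dictionary:
            phrase += c                  # extend the current phrase
        else:
            out.append((dictionary[phrase], c))
            dictionary[phrase + c] = len(dictionary)
            phrase = ""
    if phrase:
        # Leftover phrase: split off its last character as the new symbol.
        out.append((dictionary[phrase[:-1]], phrase[-1]))
    return out

def lz78_decompress(pairs):
    """Rebuild the phrase dictionary on the fly and concatenate phrases."""
    phrases = [""]
    out = []
    for idx, c in pairs:
        p = phrases[idx] + c
        phrases.append(p)
        out.append(p)
    return "".join(out)
```

Because the decompressor rebuilds exactly the same dictionary the compressor used, no dictionary needs to be transmitted; this self-referential parsing is what the entropy analysis in the paper is about.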

**Week 12: External Memory: I/O Algorithms, Cache-Oblivious Algorithms, and Dynamic Programming.**

- A. Aggarwal and J. Vitter, “The Input/Output Complexity of Sorting and Related Problems“, CACM 31 (9), 1988.
- Erik Demaine, “Cache-Oblivious Algorithms and Data Structures”, Lecture Notes from the EEF Summer School on Massive Data Sets.
- Rezaul Alam Chowdhury and Vijaya Ramachandran, “Cache-oblivious dynamic programming“, SODA 2006.
- Slides
- Exercises

**Week 13: Round up, Questions, and Further Perspectives.**

**FAQ**

**How should I write my mandatory exercises?**

The ideal writing format for mandatory exercises is classical scientific writing, such as the writing found in the peer-reviewed articles listed as reading material for this course (not textbooks and other pedagogical material). One of the objectives of this course is to practice and learn this kind of writing. A few tips:

- Write things directly: Cut to the chase and avoid anything that is not essential. Test your own writing by answering the following question: “Is this the shortest, clearest, and most direct exposition of my ideas/analysis/etc.?”
- Add structure: Don’t mix up description and analysis unless you know exactly what you are doing. For a data structure, explain the following things separately: the contents of the data structure, how to build it, how to query/update it, analysis of space, analysis of query/update time, and analysis of preprocessing time. For an algorithm, explain separately what it does, the analysis of its time complexity, and the analysis of its space complexity.
- Be concise: Convoluted explanations, excessively long sentences, fancy wording, etc. have no place in scientific writing. Do not repeat the problem statement.
- Try to avoid pseudocode: Generally, aim for a human-readable description of algorithms that can be easily and unambiguously translated into code.
- Examples only as support: Do not explain your algorithms and data structures with an example. Only use examples as additional illustration of your ideas.

**How much do the mandatory exercises count in the final grade?**

The final grade is an overall evaluation of your mandatory exercises and the oral exam combined. Thus, there is no precise division of these parts in the final grade. However, expect that (in most cases, and under normal circumstances) the mandatory exercises account for a large fraction of the final grade, and the oral exam is a “fine tuning” of your scores on the mandatory exercises.

**What do I do if I want to do an MSc/BSc thesis or project in Algorithms?**

Great! Algorithms is an excellent topic to work on 🙂 and Algorithms for Massive Data Sets is designed to prepare you to write a strong thesis. Some basic tips and points.

- Let us know well in advance: Identifying an interesting problem in algorithms that matches your interests can take time. With enough time to go over the related literature and study up on relevant topics, your project will likely be more successful. It may also be a good idea to do an initial “warm up” project before a large thesis to test ideas or survey an area.
- Join the community: It is a very good idea to enter the local algorithms community at DTU to get a feel for what kind of things you could work on for your thesis and what thesis work in algorithms is about. Talk to other students doing thesis work in algorithms. Join the algolog mailing list and go to algorithms talks. Also, we strongly encourage you to attend the thesis defenses in algorithms announced on the mailing lists.
- Collaborate: We strongly encourage you to do your thesis in pairs. We think that having a collaborator to discuss with greatly helps in many aspects of thesis work in algorithms. Our experience confirms this.
- No strings attached: Choosing a topic for your thesis is important. You are welcome to discuss master’s thesis topics with us without pressure to actually write your thesis in algorithms. We encourage you to select your topic carefully.