Data Science course 

This page contains the syllabus, lecture slides, reading material, and exam information for the course "Data Science".

For any questions or comments about the lectures or this website, please contact Rajendra Akerkar.

Pre-requisites and Co-requisites

Information and Database Systems, or an equivalent course, is a prerequisite. Ideally, you will also have taken one of the following: Database System Implementation, Data Mining, or a course in Machine Learning or Natural Language Processing.

Course Objectives
In this course, we will discuss recent work in data science, with an emphasis on algorithms and systems for large-scale advanced data analysis. Each student will present one or more research articles in class and take part in discussions of the papers presented by other students. Each student will also complete a class project, which is the largest component of the coursework mark. Every student should be comfortable with programming and should preferably have prior experience with data management systems, data modeling, and data analysis.

Course Syllabus

Data Science is a dynamic and fast-growing field at the interface of Statistics and Computer Science. In the international marketplace, businesses, suppliers and customers are creating and consuming vast amounts of information. Gartner predicts that enterprise data in all forms will grow 650 percent over the next five years. According to IDC, the world's volume of data doubles every 18 months; the MIT Centre for Digital Research reports a similar doubling rate and predicts that digital information will exceed 1,000 exabytes next year. In 2010, medical centres held almost 1bn terabytes of data, which is almost 2,000bn file cabinets' worth of information. This deluge of data, often referred to as big data, poses a clear challenge for the business community and for data scientists.

The objective of this course is to cover the broad areas of data science and to give students the opportunity to specialize in data analytics, data storage and management, data visualization, or general systems management.

The course outline:

Session 1   Introduction to Data Science & Analytics

Session 2   Data Analytics Stages

Session 3   Data Analytics Opportunities (Faceted search at scale, sentiment analysis, exploratory analytics, operational analytics)

Session 4    Statistics and Data Analysis (Using R and RStudio)

Session 5   Data Mining concepts (Clustering, Link analysis, Machine learning)

Session 6    MapReduce Basics

Session 7    MapReduce Algorithm Design

Session 8    SQL Essentials for In-Database Analytics

Session 9    Analytics Case Studies

Session 10   Data Visualisation Tools

Learning outcomes

  You will

Project, Presentation Topics and Lecture Slides

The course material will be available via the course management system.

Course Structure & Resources

This is a two-week course. Each class session lasts 4 hours 30 minutes. Classes comprise lectures, hands-on practice, and discussion. Students are encouraged to participate in class discussions and will give at least one presentation during the course. Morning sessions consist mainly of lectures and group exercises; afternoons focus mainly on assessment activities and personal study. Students are expected to organise their time to cover preparatory work and assessment activities.

We will use research papers as our main source.

Readings are drawn from the following books:

  1. Big Data Computing, edited by Rajendra Akerkar, Taylor & Francis Group/CRC Press (to be published in 2013)
  2. Mining of Massive Datasets, by Anand Rajaraman and Jeff Ullman
  3. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, Springer, New York
  4. Intelligent Technologies for Web Applications, by Rajendra Akerkar and Priti S. Sajja, Taylor & Francis Group/CRC Press
  5. Building an Intelligent Web: Theory & Practice, by Rajendra Akerkar and Pawan Lingras, Jones & Bartlett


O'Reilly Strata conference

Big Data Journal

General Reading Material

Assessment

This course is assessed by coursework (40%) and an online exam (60%). There are two types of coursework. The first is a group presentation: a 30-minute talk by three students on an assigned topic, using PowerPoint slides. The second is a project report.

Your group will give its 30-minute presentation to your seminar group in the second week, followed by 5 minutes of questions. The presentation is worth 10% of the total marks for the course.

The project is a report on an information retrieval (IR) and information extraction (IE) system, chosen by students from three options. The project report is worth 30% of the total marks for the course.

The final exam, worth 60% of the marks for the course, takes place online through the course management system. Advice on what the exam consists of and how to approach it will be given in the last session of the course.

Details about the end-of-semester exam will be available on the course management system.

Marking & Grading

Candidates are evaluated on a 10-point scale, and grades are awarded as follows:

Percentage (P)      Grade (G)
96 ≤ P ≤ 100        10
90 ≤ P ≤ 95
80 ≤ P ≤ 89
70 ≤ P ≤ 79
60 ≤ P ≤ 69
55 ≤ P ≤ 59
50 ≤ P ≤ 54
40 ≤ P ≤ 49
31 ≤ P ≤ 39
0 ≤ P ≤ 30

Specific criteria for judging the assignments will vary, but generally they will be judged on:

This website is licensed under a Creative Commons Attribution-Share Alike 3.0 License.
