Module Details

The information contained in this module specification was correct at the time of publication but may be subject to change, either during the session because of unforeseen circumstances, or following review of the module at the end of the session. Queries about the module should be directed to the member of staff with responsibility for the module.
Title: Big Data
Code: CKIT525
Coordinator: Prof FP Coenen, Computer Science, Coenen@liverpool.ac.uk
Year: Session 2018-19
CATS Level: Level 7 FHEQ
Semester: Whole Session
CATS Value: 15

Aims

  1. To provide students with in-depth knowledge of the domain of Big Data and the relevant concepts and technologies involved.

  2. To provide students with a comprehensive, but critical, understanding of an open-source software framework for distributed data storage and processing.

  3. To allow students to develop practical solutions to Big Data problems using the theoretical underpinnings and know-how acquired during the module.

  4. To provide students with a critical awareness of practical issues related to the integration and deployment of Big Data management systems in enterprise settings.



Learning Outcomes

A comprehensive understanding of the theories, models and frameworks underpinning the concept of Big Data in a variety of organisational settings.

An ability to critically apply the standard techniques of Big Data so as to design and implement effective Big Data ecosystems to support business analytics.

A complete and systematic understanding of an open-source software framework for distributed data storage and distributed processing, and its practical application.

An in-depth awareness of the critical issues involved in the deployment of distributed data processing pipelines.

Knowledge of the application of Big Data techniques in a wider context, for example enterprise deployment and data security.


Syllabus

Week 1

Introduction to Big Data and Apache Hadoop, terminology and basic concepts.
 
Week 2
Big Data Ecosystems and the Big Data landscape, the six V’s of Big Data.
 
Week 3
Components of the Hadoop stack; attributes and uses of MapReduce, the Hadoop Distributed File System (HDFS) and YARN; installing Hadoop and running "large-dataset programs" with Hadoop.
 
Week 4
Modelling and managing Big Data, Big Data Management Systems (BDMS), practical work with the Cloudera Data Management Virtual Machine.
 
Week 5
Big Data Integration and Processing; configuring and working with BDMS schemes; further work with the Cloudera Virtual Machine.
 
Week 6
Data Frames and Document-Oriented Big Data systems; predictive analytics with Pandas data frames, MongoDB, Splunk and Datameer.
 
Week 7
Big Data processing pipelines and graph analytics; distributed processing with Apache Spark components (Spark Core, pipelines, transformation engines, Spark SQL and Spark GraphX).
 
Week 8
Big Data enterprise deployment, integration and security issues.

 


Teaching and Learning Strategies

Online Learning - Weekly seminar supported by asynchronous discussion in a virtual classroom environment facilitated by an online instructor.

Students are expected to spend 7.5 hours per week in the virtual classroom, participating in discussion, group work and individual assessment.


Teaching Schedule

Study hours: 60 (weekly seminar supported by asynchronous discussion in a virtual classroom environment facilitated by an online instructor)
Timetable: 7.5 hours per week in the virtual classroom (discussion, group work and individual assessment)
Private study: 90
Total hours: 150

Assessment

EXAM: none.

CONTINUOUS (all items are coursework)

Duration                     Timing          % of final mark   Notes
Weekly discussion questions  Whole session   40                Eight discussion questions
one week: 750 - 1000         Week 2                            Essay - Big Data trends and salient Hadoop ecosystem features
one week: Software f         Week 3          10                Portfolio - Managing and reporting on Big Data using Hadoop, Part 1
one week: Software f         Week 4                            Practical - using MapReduce
one week: Software f         Week 5          10                Portfolio - Managing and reporting on Big Data using Hadoop, Part 2
one week: 750 - 1000         Week 6                            Essay - Big Data analytic schemes
one week: Software f         Week 7          10                Portfolio - Managing and reporting on Big Data using Hadoop, Part 3
one week: 750 - 1000         Week 8                            Essay - Big Data systems

Resit/resubmission opportunity (all items): none. The nature of the adopted online learning paradigm is such that no reassessment opportunity is available; see note (2).
Penalty for late submission (all items): Standard UoL penalty applies; see note (3).

Notes (applying to all assessments)
(1) Due to the nature of the online mode of instruction, this work is not marked anonymously.
(2) Students who fail the module will be offered the opportunity to retake the entire module.
(3) The "Standard UoL Penalty" for late submission is the penalty agreed for online programmes offered in collaboration with Laureate Online Education.
(4) For group work assessments, groups typically comprise 3 to 4 students. Both group and individual contributions are assessed and combined to produce a final mark for each student.

Recommended Texts

Reading lists are managed at readinglists.liverpool.ac.uk.
Explanation of Reading List:

The online programmes offered by the Department of Computer Science in collaboration with Laureate Online Education use online materials wherever possible, including the online resources available within the University of Liverpool's libraries. This module does not require a specific textbook.