Module Details

The information contained in this module specification was correct at the time of publication but may be subject to change, either during the session because of unforeseen circumstances, or following review of the module at the end of the session. Queries about the module should be directed to the member of staff with responsibility for the module.
Title: Big Data
Code: CKIT525
Coordinator: Prof FP Coenen, Computer Science, Coenen@liverpool.ac.uk
Year: Session 2020-21
CATS Level: Level 7 FHEQ
Semester: Whole Session
CATS Value: 15

Aims

To provide students with in-depth knowledge of the domain of Big Data and the relevant concepts and technologies involved.

To provide students with a comprehensive, but critical, understanding of an open-source software framework for distributed data storage and processing.

To allow students to develop practical solutions to big data problems using theoretical underpinning and know-how obtained during the course of the module.

To provide students with a critical awareness of practical issues related to the integration and deployment of Big-Data management systems in the context of enterprise deployment.


Learning Outcomes

(LO1) A comprehensive understanding of the theories, models and frameworks underpinning the concept of Big-Data in a variety of organisational settings.

(LO2) An ability to critically apply the standard techniques of Big Data so as to design and implement effective Big Data ecosystems to support business analytics.

(LO3) A complete and systematic understanding of an open-source software framework for distributed data storage and distributed processing, and its practical application.

(LO4) An in-depth awareness of the critical issues involved in the deployment of distributed data processing pipelines.

(LO5) Knowledge of the application of Big Data techniques in the wider context such as with respect to enterprise deployment and data security.

(S1) Skills in using technology - Online communications skills

(S2) Communication (oral, written and visual) - Influencing skills – argumentation

(S3) Critical thinking and problem solving - Critical analysis

(S4) Critical thinking and problem solving - Evaluation

(S5) Commercial awareness - Relevant understanding of organisations


Syllabus

 

Week 1: Introduction to Big Data and Apache Hadoop, terminology and basic concepts.
Week 2: Big Data ecosystems and the Big Data landscape, the six V’s of Big Data.
Week 3: Components of the Hadoop stack, attributes and uses of MapReduce, the Hadoop Distributed File System (HDFS) and YARN, installing Hadoop and running "large-dataset programs" with Hadoop.
Week 4: Modeling and managing Big Data, Big Data Management Systems (BDMS), practical work with the Cloudera Data Management Virtual Machine.
Week 5: Big Data integration and processing; configuring and working with BDMS schemes; further work with the Cloudera Virtual Machine.
Week 6: Data frames and document-oriented Big Data systems; predictive analytics with Pandas DataFrames, MongoDB, Splunk and Datameer.
Week 7: Big Data processing pipelines and graph analytics, distributed processing with Apache Spark components (Spark Core, pipelines, transformation engines, Spark SQL and Spark GraphX).
Week 8: Big Data enterprise deployment, integration and security issues.


Teaching and Learning Strategies

Teaching Method 1 - Online Learning
Description: Weekly seminar supported by asynchronous discussion in a virtual classroom environment facilitated by an online instructor.
Attendance Recorded: Yes
Notes: Students are expected to spend 7.5 hours per week in the virtual classroom, participating in discussion, group work and individual assessment.


Teaching Schedule

Delivery types: Lectures, Seminars, Tutorials, Lab Practicals, Fieldwork Placement, Other
Study Hours: 60 (TOTAL: 60)
Timetable (if known):
Private Study: 90
TOTAL HOURS: 150

Assessment

Columns: Duration, Timing (Semester), % of final mark, Resit/resubmission opportunity, Penalty for late submission, Notes

EXAM
None listed.

CONTINUOUS

Eight discussion questions
Duration: Weekly Discussion Q
% of final mark: 40
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Whole session

Essay - Big Data trends and salient Hadoop eco-system features
Duration: one week: 750 - 1000
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Week 2

Portfolio - Managing and reporting on big-data using Hadoop, Part 1
Duration: one week; Software f
% of final mark: 10
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Week 3

Practical - using Map-Reduce
Duration: one week: Software f
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Week 4

Portfolio - Managing and reporting on big-data using Hadoop, Part 2
Duration: one week: Software f
% of final mark: 10
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Week 5

Essay - Big-Data Analytic schemes
Duration: one week: 750 - 1000
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Week 6

Portfolio - Managing and reporting on big-data using Hadoop, Part 3
Duration: one week: Software f
% of final mark: 10
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Week 7

Essay - Big-Data systems
Duration: one week: 750 - 1000
Notes: Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When): Week 8

Recommended Texts

Reading lists are managed at readinglists.liverpool.ac.uk.