Module Details |
The information contained in this module specification was correct at the time of publication but may be subject to change, either during the session because of unforeseen circumstances, or following review of the module at the end of the session. Queries about the module should be directed to the member of staff with responsibility for the module. |
Title | Big Data | ||
Code | CKIT525 | ||
Coordinator |
Prof FP Coenen Computer Science Coenen@liverpool.ac.uk |
||
Year | CATS Level | Semester | CATS Value |
Session 2020-21 | Level 7 FHEQ | Whole Session | 15 |
Aims |
|
To provide students with in-depth knowledge of the domain of Big Data and the relevant concepts and technologies involved. To provide students with a comprehensive, but critical, understanding of an open-source software framework for distributed data storage and processing. To allow students to develop practical solutions to big data problems using theoretical underpinning and know-how obtained during the course of the module. To provide students with a critical awareness of practical issues related to the integration and deployment of Big-Data management systems in the context of enterprise deployment. |
Learning Outcomes |
|
(LO1) A comprehensive understanding of the theories, models and frameworks underpinning the concept of Big-Data in a variety of organisational settings. |
|
(LO2) An ability to critically apply the standard techniques of Big Data so as to design and implement effective Big Data ecosystems to support business analytics. |
|
(LO3) A complete and systematic understanding of an open-source software framework for distributed data storage and distributed processing, and its practical application. |
|
(LO4) An in-depth awareness of the critical issues involved in the deployment of distributed data processing pipelines. |
|
(LO5) Knowledge of the application of Big Data techniques in the wider context such as with respect to enterprise deployment and data security. |
|
(S1) Skills in using technology - Online communications skills |
|
(S2) Communication (oral, written and visual) - Influencing skills – argumentation |
|
(S3) Critical thinking and problem solving - Critical analysis |
|
(S4) Critical thinking and problem solving - Evaluation |
|
(S5) Commercial awareness - Relevant understanding of organisations |
Syllabus |
|
Week 1: Introduction to Big Data and Apache Hadoop; terminology and basic concepts. |
|
Week 2: Big Data ecosystems and the Big Data landscape; the six V’s of Big Data. |
|
Week 3: Components of the Hadoop stack; attributes and uses of MapReduce, the Hadoop Distributed File System (HDFS) and YARN; installing Hadoop and running "large-dataset programs" with Hadoop. |
|
Week 4: Modeling and managing Big Data; Big Data Management Systems (BDMS); practical work with the Cloudera Data Management Virtual Machine. |
|
Week 5: Big Data integration and processing; configuring and working with BDMS schemes; further work with the Cloudera Virtual Machine. |
|
Week 6: Data frames and document-oriented Big Data systems; predictive analytics with pandas DataFrames, MongoDB, Splunk and Datameer. |
|
Week 7: Big Data processing pipelines and graph analytics; distributed processing with Apache Spark components (Spark Core, pipelines, transformation engines, Spark SQL and Spark GraphX). |
|
Week 8: Big Data enterprise deployment, integration and security issues. |
Teaching and Learning Strategies |
|
Teaching Method 1 - Online Learning |
Teaching Schedule |
Lectures | Seminars | Tutorials | Lab Practicals | Fieldwork Placement | Other | TOTAL | |
Study Hours |
60 |
60 | |||||
Timetable (if known) | |||||||
Private Study | 90 | ||||||
TOTAL HOURS | 150 |
Assessment |
||||||
EXAM | Duration | Timing (Semester) |
% of final mark |
Resit/resubmission opportunity |
Penalty for late submission |
Notes |
CONTINUOUS | Duration | Timing (Semester) |
% of final mark |
Resit/resubmission opportunity |
Penalty for late submission |
Notes |
Eight discussion questions Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Whole session | Weekly Discussion Q | 40 | ||||
Essay - Big Data trends and salient Hadoop eco-system features Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Week 2 | one week: 750 - 1000 | 7 | ||||
Portfolio - Managing and reporting on big-data using Hadoop, Part 1 Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Week 3 | one week: Software f | 10 | ||||
Practical - using Map-Reduce Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Week 4 | one week: Software f | 7 | ||||
Portfolio - Managing and reporting on big-data using Hadoop, Part 2 Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Week 5 | one week: Software f | 10 | ||||
Essay - Big-Data Analytic schemes Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Week 6 | one week: 750 - 1000 | 8 | ||||
Portfolio - Managing and reporting on big-data using Hadoop, Part 3 Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Week 7 | one week: Software f | 10 | ||||
Essay - Big-Data systems Standard UoL penalty applies for late submission. This is not an anonymous assessment. Assessment Schedule (When) :Week 8 | one week: 750 - 1000 | 8 |
Recommended Texts |
|
Reading lists are managed at readinglists.liverpool.ac.uk. |