Module Details

The information contained in this module specification was correct at the time of publication but may be subject to change, either during the session because of unforeseen circumstances, or following review of the module at the end of the session. Queries about the module should be directed to the member of staff with responsibility for the module.
Title Introduction to Data Science
Code COMP229
Coordinator Dr V Kurlin
Computer Science
Vitaliy.Kurlin@liverpool.ac.uk
Year: Session 2021-22
CATS Level: Level 5 FHEQ
Semester: First Semester
CATS Value: 15

Aims

1. To provide a foundation and overview of modern problems in Data Science.
2. To describe the tools and approaches for the design and analysis of algorithms for data clustering, dimensionality reduction, and graph reconstruction from noisy data.
3. To discuss the effectiveness and complexity of modern Data Science algorithms.
4. To review applications of Data Science to Vision, Networks, Materials Chemistry.


Learning Outcomes

(LO1) describe modern problems and tools in data clustering and dimensionality reduction,

(LO2) formulate a real data problem in a rigorous form and suggest potential solutions,

(LO3) choose the most suitable approach or algorithmic method for given real-life data,

(LO4) visualise high-dimensional data and extract hidden non-linear patterns from the data.

(S1) Critical thinking and problem solving - Critical analysis


Syllabus

 

1. Descriptive Statistics (3 lectures): average, range, median, mode, quartiles, sample deviation and variance, box plot.

2. Introduction to probability (3 lectures): probability axioms, combinatorial probabilities, probability paradoxes.

3. Probability distributions (3 lectures): uniform distribution, beta distribution, normal distribution.

4. Hypothesis testing (3 lectures): confidence intervals, P-value, statistical significance.

5. Bayesian statistics (3 lectures): conditional probabilities, Bayes formula, Bayesian vs frequentist approaches.

6. Linear regression (3 lectures): scatterplots and correlation, linear approximation to data, regression formulae.

7. Clustering (3 lectures): types of clustering algorithms, optimisation for k-means clustering, Lloyd’s algorithm.

8. Linear maps (3 lectures): matrices of linear maps, scaling, reflections, rotations, compositions.

9. Invariants of linear maps (3 lectures): determinant and eigenvalues of a matrix, a change of a linear basis.

10. Dimensionality reduction (3 lectures): principal component analysis and singular value decomposition.


Teaching and Learning Strategies

Teaching Method 1 - Lecture
Description: Formal Lectures

Teaching Method 2 - Tutorial
Description: Tutorials with 4-5 formative assessments (marked by demonstrators) - using problems similar to exam questions.

Due to Covid-19, in 2021/22, one or more of the following delivery methods will be implemented based on the current local conditions.

(a) Hybrid delivery
Teaching Method 1 - Lecture
Description: Mix of on-campus/on-line synchronous/asynchronous sessions
Teaching Method 2 - Tutorial
Description: Mix of on-campus/on-line synchronous/asynchronous sessions

(b) Fully online delivery and assessment
Teaching Method 1 - Lecture
Description: On-line synchronous/asynchronous lectures
Teaching Method 2 - Tutorial
Description: On-line synchronous/asynchronous sessions

(c) Standard on-campus delivery
Teaching Method 1 - Lecture
Description: Mix of on-campus/on-line synchronous/asynchronous sessions
Teaching Method 2 - Tutorial
Description: On-campus synchronous sessions


Teaching Schedule

Study Hours: Lectures 30, Tutorials 10, TOTAL 40
Timetable (if known):
Private Study: 110
TOTAL HOURS: 150

Assessment

EXAM
(229) Written examination: 70% of final mark

CONTINUOUS
(229.1) Class test: 30% of final mark

Notes: 4-5 formative assessments (marked by demonstrators), using problems similar to exam questions, without a contribution to the final mark.

Recommended Texts

Reading lists are managed at readinglists.liverpool.ac.uk.