CPSC 436C: CLOUD COMPUTING FOR DATA SCIENCE

Fall 2023

COURSE DESCRIPTION

This course is an introduction to cloud computing for students who wish to use the cloud for data science applications. It covers how cloud computing can support data science workflows, including data storage, processing, analysis, and visualization, as well as security considerations for the entire pipeline. Overall, the course gives students the skills and knowledge needed to use cloud computing effectively for the design, implementation, testing, and deployment of data science applications.

LECTURES & CLASSROOMS

Monday - 11:00 am to 12:00 pm, ICCS Room X150 - Demco Table 1,
Project Room - ICCS X239

Friday - 3:00 pm to 4:00 pm, ICCS Room X150 - Demco Table 1,
Project Room - ICCS X239

TEXTBOOKS

The course will rely mainly on the following textbook.

Learning Spark: Lightning-Fast Data Analytics by Jules Damji, Brooke Wenig, Tathagata Das

Topics

Cloud service delivery models
Cloud storage systems
Batch processing
Stream processing
Cloud security

Team

TA1: Richard Yang

Email: rzhyang@student.ubc.ca

TA2: Aryan Bhairaw

Email: baryan01@student.ubc.ca

SYLLABUS

Download the syllabus (v1.0)

HANDOUT

Lecture 1

Introduction to Datacentres and Cloud [SLIDES]

Lecture 2

Function as a Service & Containerization [SLIDES1], [SLIDES2]

Lecture 3

Virtualization [SLIDES]

Lecture 4

Big Data [SLIDES]

Lecture 5

Data Stores [SLIDES1], [SLIDES2], [SLIDES3]

Lecture 6

Data Management Systems [SLIDES]

Lecture 7

Data Processing [SLIDES1], [SLIDES2]

Lecture 8

Structured Data Processing - Machine Learning [SLIDES]

Lecture 9

Distributed Machine Learning [SLIDES]

Lecture 10

Stream Processing [SLIDES1], [SLIDES2]

Lecture 11

Graph Processing - Resource Management [SLIDES1], [SLIDES2]

Lecture 12

Cloud Security - Part 1: [SLIDES]

Cloud Security - Part 2: [SLIDES]

Guest Speaker - Cloud Security [SLIDES]

Lecture 13

Guest Speaker - Advanced Topics [SLIDES]

Assignments

Assignment 0: Go Serverless (5%); [AWS] [Azure] [Rubric]
Assignment 1: Containerization vs. Serverless (5%); [AWS] [Azure] [Rubric]
Assignment 2: Running Image Recognition on a Virtual Machine (5%); [AWS] [Azure] [Rubric]
Assignment 3: Running Image Recognition in a VM Using Object Store (5%); [AWS] [Azure] [Rubric]
Assignment 4: Building a Machine Learning Pipeline through Jupyter Notebook (5%); [AWS] [Azure] [Rubric]
Assignment 5: Comparing Single Node vs. Cluster Performance and Cost in Image Classification Using Amazon EMR (15%); [AWS] [Azure] [Rubric]
Assignment 6: Streaming Text Analysis using Spark; [AWS] [Azure] [Rubric]

Tutorials

Tutorial 1: [AWS] [Azure]
Tutorial 2: [AWS] [Azure]
Tutorial 3: [AWS] [Azure]
Tutorial 4: [AWS] [Azure]