CPSC 436C Cloud Computing for Data Science
Spring 2024
COURSE DESCRIPTION
This course is an introduction to cloud computing for students who wish to use the cloud for data science applications. It covers how cloud computing can support data science workflows, including data storage, processing, analysis, and visualization, as well as security considerations for the entire pipeline. Overall, the course provides students with the skills and knowledge needed to design, implement, test, and deploy data science applications in the cloud.
LECTURES & CLASSROOMS
Lecture time: Tuesday/Thursday, 8:00 AM to 9:30 AM
Classroom: Hugh Dempster Pavilion (DMP), Room 110
Office hours: Tuesday, 10:00 AM to 12:00 PM
TEXTBOOKS
The course will rely mainly on the following textbook.
- Learning Spark: Lightning-Fast Data Analytics, by Jules Damji, Brooke Wenig, Tathagata Das
TOPICS
- Cloud service delivery models
- Cloud storage systems
- Batch processing
- Stream processing
- Cloud security
TA TEAM
Arman Moztarzadeh (arman88@student.ubc.ca)
Aryan Bhairaw (baryan01@student.ubc.ca)
Ryan Dick (rdick01@student.ubc.ca)
SYLLABUS
Download the syllabus (v1.0)
HANDOUT
Lecture 1
Introduction to Data centres and Cloud [SLIDES]
Lecture 2
Cloud Computing Service Models, Serverless Computing [SLIDES]
Lecture 3
Cloud Computing Service Models, Containerization [SLIDES]
Lecture 4
Cloud Computing Service Models, Virtualization [SLIDES]
Lecture 5
Big Data [SLIDES]
Lecture 7
Data Management Systems [SLIDES]
Lecture 9
Structured Data Processing [SLIDES]
Lecture 10
Distributed Machine Learning [SLIDES]
Lecture 14
Guest Speaker: Advanced Topics [SLIDES]
ASSIGNMENTS
- Assignment 0: Go Serverless (5%); [AWS] [Azure] [Rubric]
- Assignment 1: Containerization vs. Serverless (5%); [AWS] [Azure] [Rubric]
- Assignment 2: Running Image Recognition on a Virtual Machine (5%); [AWS] [Azure] [Rubric]
- Assignment 3: Running Image Recognition in a VM Using Object Store (5%); [AWS] [Azure] [Rubric]
- Assignment 4: Comparing Single Node vs. Cluster Performance and Cost in Image Classification Using Amazon EMR (5%); [AWS] [Azure] [Rubric]
- Assignment 5: Streaming Text Analysis using Spark (5%); [AWS] [Azure] [Rubric]
- Assignment 6: Building a Machine Learning Pipeline through Jupyter Notebook (10%); [Non-simplified] [Simplified] [Rubric]