# Graduate Diploma in Data Science

- CRICOS Code: 095994M

## What will I study?

### Overview

The Graduate Diploma in Data Science is a 100-point course, made up of:

- Core statistics subjects (50 points)
- Core computer science subjects (50 points).

In the course, you'll cover:

- Statistical modelling and inference
- Algorithms, machine learning and data mining
- Database systems
- Using a range of methods to conduct analyses
- Reporting on analytical findings.

#### Tailoring the course to you

Your subjects will be tailored to you, depending on your previous academic background.

You'll be allocated to one of four streams (Engineering and Science, Computer Science, Statistics, or Commerce and Arts).

You will be automatically assessed for credit (advanced standing) during the selection process. If you've already studied some of the core subjects (or their equivalents), you may be granted exemptions. In these cases, you'll take additional subjects to yield 100 points in total (50 points of statistics subjects and 50 points of computer science subjects).

### Explore this course

Explore the subjects you could choose as part of this diploma.

#### Complete all of the following subjects:

**Algorithms and Complexity** 12.5 pts

**AIMS**

The aim of this subject is for students to develop familiarity and competence in assessing and designing computer programs for computational efficiency. Although computers manipulate data very quickly, to solve large-scale problems, we must design strategies so that the calculations combine effectively. Over the latter half of the 20th century, an elegant theory of computational efficiency developed. This subject introduces students to the fundamentals of this theory and to many of the classical algorithms and data structures that solve key computational questions. These questions include distance computations in networks, searching items in large collections, and sorting them in order.

**INDICATIVE CONTENT**

Topics covered include complexity classes and asymptotic notation; empirical analysis of algorithms; abstract data types including queues, trees, priority queues and graphs; algorithmic techniques including brute force, divide-and-conquer, dynamic programming and greedy approaches; space and time trade-offs; and the theoretical limits of algorithm power.

**Programming and Software Development** 12.5 pts

**AIMS**

The aim of this subject is for students to develop an understanding of approaches to solving moderately complex problems with computers, and to demonstrate proficiency in designing and writing programs. The programming language used is Java.

**INDICATIVE CONTENT**

Topics covered will include:

- Java basics
- Console input/output
- Control flow
- Defining classes
- Using object references
- Programming with arrays
- Inheritance
- Polymorphism and abstract classes
- Exception handling
- UML basics
- Interfaces
- Generics

**Database Systems & Information Modelling** 12.5 pts

**AIMS**

The subject introduces key topics in modern information organisation, particularly with regard to structured databases. The well-founded relational theory behind modern structured query language (SQL) engines has given them as much of a place behind an organisation's website and on the desktop as they traditionally enjoyed on corporate mainframes. Topics covered may include: the managerial view of data, information and knowledge; conceptual, logical and physical data modelling; normalisation and de-normalisation; the SQL language; data integrity; transaction processing; data warehousing; web services; and organisational memory technologies. This is a core foundation subject for both the Master of Information Systems and Master of Information Technology.

**INDICATIVE CONTENT**

This subject serves as an introduction to databases and data modelling from a data management perspective. Database design, from conceptual design through to physical implementation, will be covered. This will include Entity Relationship modelling, normalisation and de-normalisation, and SQL. Additionally, the use of databases in various contexts will be explored (web-based databases, connecting programs to databases, data warehousing, health contexts, geospatial databases).

**Elements of Data Processing** 12.5 pts

**AIMS**

Data processing is fundamental to computing and data science. This subject gives an introduction to various aspects of data processing including database management, representation and analysis of data, information retrieval, visualisation and reporting, and cloud computing. This subject introduces students to the area, with an emphasis on both tools and underlying foundations.

**INDICATIVE CONTENT**

The subject's focus is on the data pipeline, and activities known colloquially as 'data wrangling'. Indicative topics covered include:

- Capturing data (data ingress)
- Data representation and storage
- Cleaning, normalisation and filling in missing data (imputation)
- Combining multiple sources of data (data integration)
- Query languages and processing
- Scripting to support the data pipeline
- Distributing a database over multiple nodes (sharding), cloud computing file systems
- Visualisation and presentation

**Methods of Mathematical Statistics** 25 pts

This subject introduces probability and the theory underlying modern statistical inference. Properties of probability are reviewed, univariate and multivariate random variables are introduced, and their properties are developed. It demonstrates that many commonly used statistical procedures arise as applications of a common theory. Both classical and Bayesian statistical methods are developed. Basic statistical concepts including maximum likelihood, sufficiency, unbiased estimation, confidence intervals, hypothesis testing and significance levels are discussed. Computer packages are used for numerical and theoretical calculations.

**A First Course In Statistical Learning** 25 pts

Supervised statistical learning is based on the widely used linear models that model a response as a linear combination of explanatory variables. Initially this subject develops an elegant unified theory for a quantitative response that includes the estimation of model parameters, hypothesis testing using analysis of variance, model selection, diagnostics on model assumptions, and prediction. Some classification methods for qualitative responses are then developed. This subject then considers computational techniques, including the EM algorithm. Bayes methods and Monte-Carlo methods are considered. The subject concludes by considering some unsupervised learning techniques.

#### Complete all of the following subjects:

**Internet Technologies** 12.5 pts

**AIMS**

The subject will introduce the basics of computer networks to students through a study of layered models of computer networks and applications. The first half of the subject deals with data communication protocols in the lower layers of OSI and TCP/IP reference models. The students will be exposed to the working of various fundamental networking technologies such as wireless, LAN, RFID and sensor networks. The second half of the subject deals with the upper layers of the TCP/IP reference model through a study of several Internet applications.

**INDICATIVE CONTENT**

Topics covered include: Introduction to Internet, OSI reference model layers, protocols and services, data transmission basics, interface standards, network topologies, data link protocols, message routing, LANs, WANs, TCP/IP suite, detailed study of common network applications (e.g., email, news, FTP, Web), network management, and current and future developments in network hardware and protocols.

**Elements of Data Processing** 12.5 pts

**Methods of Mathematical Statistics** 25 pts

**A First Course In Statistical Learning** 25 pts

#### Select one of the following subjects:

**Database Systems & Information Modelling** 12.5 pts

**Advanced Database Systems** 12.5 pts

**AIMS**

Many applications require access to very large amounts of data. These applications often require reliability (data must not be lost even in the presence of hardware failures), and the ability to retrieve and process the data very efficiently.

The subject will cover the technologies used in advanced database systems. Topics covered will include: transactions, including concurrency, reliability (the ACID properties) and performance; and indexing of both structured and unstructured data. The subject will also cover additional topics such as: uncertain data; XQuery; the Semantic Web and the Resource Description Framework; dataspaces and data provenance; datacentres; and data archiving.

**INDICATIVE CONTENT**

Topics include:

- Introduction to High Performance Database Systems
- Issues of Performance and Reliability
- Transaction Processing
- Recovery from Failures
- MapReduce Models.

#### Plus one of the following subjects:

**Algorithms and Complexity** 12.5 pts

**Models of Computation** 12.5 pts

**AIMS**

Formal logic and discrete mathematics provide the theoretical foundations for computer science. This subject uses logic and discrete mathematics to model the science of computing. It provides a grounding in the theories of logic, sets, relations, functions, automata, formal languages, and computability, providing concepts that underpin virtually all the practical tools contributed by the discipline, for automated storage, retrieval, manipulation and communication of data.

**INDICATIVE CONTENT**

- Logic: Propositional and predicate logic, resolution proofs, mathematical proof
- Discrete mathematics: Sets, functions, relations, order, well-foundedness, induction and recursion
- Automata: Regular languages, finite-state automata, context-free grammars and languages, parsing
- Computability (briefly): Turing machines, computability, decidability

A functional programming language will be used to implement and illustrate concepts.

#### Complete all of the following subjects:

**Introduction to Programming** 12.5 pts

**AIMS**

This subject introduces the fundamental concepts of computer programming, and how to solve simple problems using a high-level procedural language, with a specific emphasis on the manipulation, transformation, and visualisation of data.

**INDICATIVE CONTENT**

Fundamental programming constructs; fundamental data structures; abstraction; basic program structures; algorithmic problem solving; use of modules.

The subject assumes no prior knowledge of computer programming.

**Algorithms and Complexity** 12.5 pts

**Programming and Software Development** 12.5 pts

**Database Systems & Information Modelling** 12.5 pts

**A First Course In Statistical Learning** 25 pts

#### Select two of the following subjects:

**Multivariate Statistics for Data Science** 12.5 pts

Modern statistics and data science deal with data having multiple dimensions. Multivariate methods are used to handle these types of data. Approaches to supervised and unsupervised learning with multivariate data are discussed. In particular, methods for classification, clustering, and dimension reduction are introduced, which are particularly suited to high-dimensional data. Both parametric and nonparametric approaches are discussed.

**Statistical Modelling for Data Science** 12.5 pts

Statistical models are central to data science applications. Modelling approaches such as linear and generalized linear models, mixed models, and non-parametric regression are developed. Applications to time series, longitudinal, and spatial data are discussed. Methods for causal inference and handling missing data are introduced.

**Computational Statistics & Data Science** 12.5 pts

Computing techniques and data mining methods are indispensable in modern statistical research and data science applications, where “Big Data” problems are often involved. This subject will introduce a number of recently developed methods and applications in computational statistics and data science that are scalable to large datasets and high-performance computing. The data mining methods to be introduced include general model diagnostic and assessment techniques, kernel and local polynomial nonparametric regression, basis expansion and nonparametric spline regression, generalised additive models, classification and regression trees, and forward stagewise and gradient boosting models. Important statistical computing algorithms and techniques used in data science will be explained in detail. These include bootstrap resampling and inference, cross-validation, the EM algorithm and Louis method, and Markov chain Monte Carlo methods including adaptive rejection and squeeze sampling, sequential importance sampling, slice sampling, the Gibbs sampler and the Metropolis-Hastings algorithm.

#### Complete all of the following subjects:

**Introduction to Programming** 12.5 pts

**Algorithms and Complexity** 12.5 pts

**Programming and Software Development** 12.5 pts

**Database Systems & Information Modelling** 12.5 pts

**Methods of Mathematical Statistics** 25 pts

**A First Course In Statistical Learning** 25 pts