The MDS curriculum at the UCI Donald Bren School of Information and Computer Sciences is designed to give you a strong foundation in modern data science. The program differentiates itself by offering a dual framework in statistical and computational methods that builds knowledge in analytics and modeling, data engineering, deep neural networks, machine learning, and artificial intelligence. You will learn from pioneering faculty and researchers in the field of data science, be part of the only computing-focused school in the UC system, and gain valuable hands-on experience through our industry-partnered Capstone Project.

The 15-month MDS program consists of 12 courses – nine required core courses and three electives – for a minimum of 48 units.

A key component of the program is the capstone project, in which you engage with our industry partners to develop models that mine data for insights. Under the supervision of faculty and industry professionals, the capstone project prepares you in project design, development, professional writing, and oral presentation so that you can be a leader in the field.

Industry is learning quickly that people can scrape and analyze data, but to do something truly meaningful with data, to make decisions that will drive the industry forward, you need the foundations of data science and an understanding of the statistics and computing methods.

Dan Gillen, Ph.D.
Chair of the Department of Statistics

Program Structure & Course Descriptions

Program Structure - Academic Year 2021-2022

STATS 200AP: Intermediate Probability and Statistical Theory I

COMPSCI 260P: Fundamentals of Algorithms with Applications

COMPSCI 273P: Machine Learning and Data Mining

STATS 200BP: Intermediate Probability and Statistical Theory II

STATS 210P: Statistical Methods I (Regression modeling strategies)

COMPSCI 220P: Databases and Data Management

STATS 211P: Statistical Methods II (Modeling and Data Visualization)

COMPSCI 271P: Introduction to Artificial Intelligence

COMPSCI 274P: Neural Networks and Deep Learning

INTERNSHIP or ELECTIVE

DS 296P: Capstone I – Professional Writing and Communication for Data Science Careers

DS 297P: Capstone II – Design Project for Data Science

Course Descriptions

COMPSCI 220P: Databases and Data Management

Introduction to the design of databases and the use of database management systems (DBMS) for managing and utilizing data. Topics include entity-relationship modeling for design, relational data model, relational algebra, relational schema design, and use of SQL (Structured Query Language). The course will also touch on topics such as data wrangling and dataframes for data analysis and new technologies for semi-structured and/or scalable data analysis.

COMPSCI 260P: Fundamentals of Algorithms with Applications

Covers fundamental concepts in the design and analysis of algorithms and is geared toward practical application and implementation. Topics include: greedy algorithms, deterministic and randomized graph algorithms, models of network flow, fundamental algorithmic techniques and NP-completeness.

COMPSCI 273P: Machine Learning and Data Mining

Introduction to principles of machine learning and data mining. Learning algorithms for classification, regression, and clustering. Emphasis is on discriminative classification methods such as decision trees, rules, nearest neighbor, linear models, and naive Bayes.

STATS 200AP: Intermediate Probability and Statistical Theory I

Fundamental probability and distribution theory needed for statistical inference. Topics include axiomatic foundations of probability theory, discrete and continuous distributions, expectation and moment generating functions, multivariate distributions, transformations, sampling distributions, and limit theorems.

STATS 200BP: Intermediate Probability and Statistical Theory II

Fundamental theory and methods for making statistical inference. Topics include principles of data reduction (sufficient, ancillary, and complete statistics), methods of finding point estimators (method of moments, maximum likelihood estimators, Bayes estimators), methods of evaluating estimators (mean squared error, best unbiased estimators, asymptotic evaluations), hypothesis testing, and confidence intervals.

STATS 210P: Statistical Methods I

Statistical methods for analyzing data from multivariable observational studies and experiments. Topics include model selection and model diagnostics for simple and multiple linear regression and generalized linear models.

STATS 211P: Statistical Methods II

Statistical methods for designing experiments, visualizing, and analyzing experimental and observational data using generalized regression models, multivariate analysis, and methods suitable for dependent data.

DS 296P: Capstone I – Professional Writing and Communication for Data Science Careers

Written and oral communication for data science careers. Students produce a detailed document describing the design, methods, analytic strategy, interpretation, and conclusions of the concurrent capstone design and analysis class, and refine the written documents and oral communication skills needed for a successful job search. Co-requisite: DS 297P.

DS 297P: Capstone II – Design Project for Data Science

Complete implementation of a data science analytic strategy for obtaining empirically driven solutions to problems from science and industry. Focuses on problem definition and analysis, data representation, algorithm selection, solution validation, and presentation of results. Co-requisite: DS 296P.

COMPSCI 222P: Principles of Data Management

Covers fundamental principles underlying data management systems. Understanding and implementation of key techniques including storage management, buffer management, record-oriented file system, access methods, query optimization, and query processing.

COMPSCI 223P: Transaction Processing and Distributed Data Management

Introduction to fundamental principles underlying transaction processing systems including database consistency, atomicity, concurrency control, database recovery, replication, commit protocols and fault-tolerance. Includes transaction processing in centralized, distributed, parallel, and client-server environments.

COMPSCI 224P: Big Data Management

A technical overview of emerging technologies for large-scale data management. The course will focus on Big Data management frameworks such as Hadoop and Spark. The course will also cover relational and non-relational database technologies, including document (“NoSQL”) databases as well as emerging cloud data management solutions. The underlying storage and security properties of these systems will also be covered.

COMPSCI 261P: Data Structures with Applications

Data structures and their associated management algorithms with analysis and examination of practical applications and implementations.

COMPSCI 271P: Introduction to Artificial Intelligence

The study of theories and computational models for systems which behave and act in an intelligent manner. Fundamental sub-disciplines of artificial intelligence including knowledge representation, search, deduction, planning, probabilistic reasoning, natural language parsing and comprehension, knowledge-based systems, and learning.

COMPSCI 274P: Neural Networks and Deep Learning

Introduction to principles of machine learning and neural networks. Architecture design. Feedforward and recurrent networks. Learning models and algorithms. Applications to data analysis and prediction problems in a wide range of areas such as machine vision, natural language processing, biomedicine, and finance.

COMPSCI 275P: Graphical Models and Statistical Learning

Introduction to principles of statistical machine learning with probabilistic graphical models. We will study efficient inference algorithms based on optimization-based variational methods, and simulation-based Monte Carlo methods. Several approaches to learning from data will be covered, including conditional models for discriminative learning, and Bayesian methods for controlling model complexity.  Methods will be motivated by applications including image and video analysis, text and language processing, sensor networks, autonomous robotics, computational biology, and social networks.

STATS 205P: Bayesian Data Analysis

Covers basic Bayesian concepts and methods with emphasis on data analysis. Special emphasis on specification of prior distributions. Development of methods and theory for one and two samples, and binary, Poisson and linear regression.

STATS 240P: Multivariate Statistical Methods

Theory and application of multivariate statistical methods. Topics include statistical inference for the multivariate normal model and its extensions to multiple samples and regression, use of statistical packages for data visualization and dimension reduction, discriminant analysis, cluster analysis, and factor analysis.

STATS 245P: Time Series Analysis

Statistical models for analysis of time series from the time and frequency domain perspectives, with particular emphasis on applications in economics, finance, climatology, engineering, and ecology. Topics include: linear time series models for trends; models for stationary time series, ARMA models; non-stationary time series, ARIMA models; forecasting and Kalman filtering; time series smoothing; seasonal models; ARCH, GARCH, and stochastic volatility models; multivariate time series; vector autoregressive models; spectral analysis; case studies. The statistical software R is used throughout the course.

STATS 262P: Theory and Practice of Sample Surveys

This course covers the basic techniques and statistical methods used in designing surveys and analyzing collected survey data. Topics to be covered include: simple random sampling, ratio and regression estimates, stratified sampling, cluster sampling, sampling with unequal probabilities, multistage sampling, and methods to handle nonresponse.

STATS 270P: Stochastic Processes

Introduction to the theory and application of stochastic processes. Topics include Poisson processes, Markov chains, continuous-time Markov processes, and Brownian motion. Applications include Markov chain Monte Carlo methods and financial modeling (for example, option pricing).

Capstone Project

The Capstone Project gives students the opportunity to work on real-life data science problems with industry partners. The projects are comprehensive in scope, allowing students to demonstrate the breadth and depth of their knowledge in data science. Each project develops an empirically driven solution for a sponsoring organization and covers the full spectrum of the analytic process, from data gathering, manipulation, and visualization through analysis and interpretation of results. Students work in small groups to bring concepts and techniques together and put them into action.