Instructor: Dr. Liping Liu, CBA360, X5947
Credits: 3 hours
Applicable Term: Spring 2025 (January 12 - May 18)
Textbooks:
- Main Text: Brett Lantz, Machine Learning with R, 2nd Ed., Packt Publishing, 2015. ISBN: 978-1-78439-390-8.
- Supplementary Text: Rui Miguel Forte, Mastering Predictive Analytics with R, Packt Publishing, 2015. ISBN: 978-1-78398-280-6.
- Supplementary Text: Matthew Taddy, Leslie Hendrix, and Matthew Harding, Modern Business Analytics: Practical Data Science and Decision Making, McGraw-Hill, 2023. ISBN: 978-1-264-07167-8.
Reference Resources:
- Book: Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer, 2016. ISBN: 978-0387848570.
- Book: Julian J. Faraway, Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, Chapman & Hall/CRC, 2022. ISBN: 978-1498720960.
- Book: Radhakrishnan Nagarajan, Marco Scutari, and Sophie Lebre, Bayesian Networks in R with Applications in Systems Biology, Springer, 2013. ISBN 978-1-4614-6445-7
- Book: François Chollet, Deep Learning with R, 2nd Ed., Manning, 2022. ISBN 9781633439849.
Office Hours: 1:30-3:30 PM on Tuesdays and Thursdays
Course Description: This course covers advanced topics in data analytics. The selected topics include generalized linear modeling (e.g., linear regression, logistic regression, gamma regression, and mixed-effects modeling), deep learning (e.g., deep neural networks, convolutional networks, recurrent neural networks, and transformers), and Bayesian learning (e.g., the Naïve Bayes classifier, hidden Markov models, and Latent Dirichlet Allocation). For each selected model, the course introduces its theory and applications, explains the rationale of its learning algorithm, and applies an implementation of the algorithm in a programming language to real-world projects. The course requires a programming language such as R, SAS, or Python for big data projects. A good free online book for reviewing R programming is R for Data Science. Prerequisite: 6500:602
Objectives:
- Understand the concepts, assumptions, and applications of generalized linear models, graphical probabilistic models, and neural networks
- Understand the rationale of the algorithms for Bayesian learning (the Naïve Bayes classifier, graphical probabilistic models, hidden Markov models, and Latent Dirichlet Allocation), generalized linear modeling (linear regression, logistic regression, gamma regression, and Poisson regression), and deep learning (regression, time-series prediction, computer vision, and natural language processing)
- Enhance R programming skills in data preparation, exploration, and visualization, and apply the R implementation of each machine learning algorithm to real-world projects
Weekly Schedule:
- Week 1: Introduction (Chapters 1 and 10): Data Analytics, Task-Model-Algorithm Alignment, Data Preparation, and Model Performance. Homework Assignments (see PPT on Machine Learning); read Chapter 4 to prepare for the Project on Spam Detection.
- Week 2: Bayesian Learning 1: Knowledge representation, graphical probabilistic models, Naïve Bayes classifier (corpus, document term matrix, word cloud), and hidden Markov models. Project (Chapter 4): Spam Message Detection using Naïve Bayes classifier
- Week 3: Bayesian Learning 2: Conditional Probability and Bayesian Inference (supplementary note for matrix computing). Project (supplementary material): Predicting promoter gene sequences using Hidden Markov Model
- Week 4: Bayesian Learning 3: Conditional independence, d-separation, moral graphs, Bayesian network visualization, and Bayesian modeling and learning using the "bnlearn" package. Homework Assignments (see PPT on Bayes Learning III)
- Week 5: Bayesian Learning 4: Local propagation, Markov trees, Markov blankets, soft and hard evidence, and Bayesian inference using the gRain package. Homework Assignments (see PPT on Bayes Learning 4)
- Week 6: Bayesian Learning 5: Probabilistic topic modeling, Dirichlet distribution and multinomial distribution, generative modeling, Latent Dirichlet Allocation, topic model for Associated Press. Project (supplementary material): topic modeling for BBC News.
- Week 7: Generalized Linear Modeling 1 (Chapter 6 and supplementary note on the exponential family and link functions). Project (Chapter 6): predicting medical expenses using linear regression (apply a Box-Cox transformation to the dependent variable, then apply gamma regression without transformation)
- Week 8: Generalized Linear Modeling 2 (Supplementary Note): Gamma regression, maximum likelihood principle, and interpretation of gamma regression results. Homework Assignment (see PPT on Generalized Linear Modeling 2)
- Week 9: Generalized Linear Modeling 3 (supplementary notes): Binomial logistic classifier (mathematics for logistic classifier model development and MLE estimates; concepts and code for deviance, null deviance, and deviance residuals). Project (Supplementary Notes): Predicting Heart Disease using logistic regression
- Week 10: Deep Learning 1 (Chapter 7): artificial neurons, activation functions, coding a feedforward neural network, and coding backpropagation. Project (Chapter 7): Modeling the strength of concrete with the "neuralnet" package
- Week 11: Deep Learning 2 (Supplementary Note): Optimization theory and batch gradient descent for minimizing SSE and cross-entropy using ReLU, sigmoid, step, and linear activation functions. Deep learning for numeric prediction and classification: TensorFlow, model specification using Keras, optimizers, loss functions, hyperparameters, computer vision, and NLP. Examples: image classification and customer reviews. Assignment: predicting housing prices.
- Week 12: Deep Learning 3 (Supplementary Note): Convolutional neural networks (convolution and pooling operations, convnet architectures, transforming image matrices, and using Keras and TensorFlow to create convnets for image recognition). Homework Assignments (see PPT on Deep Learning II)
- Week 13: Deep Learning 4 (Supplementary Note): Recurrent neural networks, long short-term memory models, word embedding, input shape conversion, and output of LSTM layers; using Keras and TensorFlow to design RNN and LSTM models for sentiment analysis and multivariate time-series analysis. Homework Assignments (see PPT on Deep Learning III)
- Week 14: Deep Learning 5 (Supplementary Note): Encoders and decoders, RepeatVector and TimeDistributed Dense layers, functional APIs, and multistep and multivariate predictions and generations. Project (supplementary note): Text Generation with LSTM and Keras
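To give a flavor of the project code used in the schedule above, here is a minimal sketch of the Week 7 gamma regression approach. The `insurance` data frame and its column names are illustrative assumptions, not the actual project files:

```r
# Minimal sketch: gamma regression for right-skewed medical expenses.
# The file name and column names (expenses, age, bmi, smoker) are
# assumptions for illustration; adjust to the actual project data.
insurance <- read.csv("insurance.csv")

# A Gamma family with a log link models strictly positive, skewed costs
# directly, without transforming the dependent variable (contrast with
# applying a Box-Cox transformation before ordinary linear regression).
fit <- glm(expenses ~ age + bmi + smoker,
           data = insurance,
           family = Gamma(link = "log"))

summary(fit)    # coefficients are on the log scale
exp(coef(fit))  # multiplicative effects on expected expenses
```

With the log link, exponentiated coefficients read as multiplicative effects, which is how the Week 8 session interprets gamma regression results.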
Exam Schedule: This course is project-oriented. It has one final exam, which includes multiple-choice, hands-on, and essay questions.
Assignments: Homework and/or a project is assigned once a week. Each weekly assignment consists of one real-world data project along with simple exercises and multiple-choice questions to check your understanding of the basic concepts and algorithms. Homework assignments are usually due at the beginning of the next class. No late homework will be graded. Please show your work in a neat and orderly fashion. Write or type your work on one side of the page, on every other line. Use standard-size paper (8 1/2'' by 11''). Do not use spiral notebook paper. For electronic submissions, it is the student's responsibility to submit the correct files in the correct formats.
Attendance: Attendance is a MUST and counts for 10% of your final grade. Attendance will be managed by the ecourse.org system. The formula for computing your attendance grade is non-linear: it takes 2 points off for the first absence and 7 points off for the second absence. If you miss the equivalent of three weeks of classes, you fail the course automatically. Under special situations, you may take a class online under the following guidelines:
- You must obtain permission from the instructor at least one day ahead of the online session
- Follow the lecture or its recordings to perform all in-class hands-on exercises and take notes. Within one day of the class, submit your notes and the finished exercises to ecourse.org as Proof of Attendance.
- All weekly assignments are due at the same time as in-person classes. All exams must be onsite.
Quizzes: I will use quizzes regularly to check your completion or preparation of assignments.
Makeup: Each student with an appropriate excuse (such as sickness) and acceptable proof may have at most one chance to make up one homework assignment or quiz. Submitting wrong files or files in wrong formats is not an eligible excuse. All makeup work must be finished within one week of the normal due date and before the answer keys are released. Note that this special favor is a privilege, not a right.
Grades: Your final grade will be calculated by the following formula:
30% (Exams) + 60% (Homework + Projects) + 10% (Attendance)
- A = 93-100%; A– = 90-92.4%; B+ = 87-89.4%; B = 83-86.4%; B– = 80-82.4%; C+ = 77-79.4%; C = 73-76.4%; C– = 70-72.4%; D = 60-69.4%; F = below 60%
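As a worked example of the weighting formula (the component scores here are hypothetical), the calculation can be checked in R:

```r
# Hypothetical component averages, each on a 0-100 scale
exam     <- 85    # exam score
homework <- 92    # combined homework + project average
attend   <- 100   # attendance score

# Apply the 30/60/10 weights from the syllabus formula
final <- 0.30 * exam + 0.60 * homework + 0.10 * attend
final  # 90.7, which falls in the A- band (90-92.4%)
```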
Misconduct: Academic misconduct by a student shall include, but is not limited to: disruption of classes, giving or receiving unauthorized aid on exams or in the preparation of assignments, unauthorized removal of materials from the library, or knowingly misrepresenting the source of any academic work. Academic misconduct by an instructor shall include, but is not limited to: grading student work by criteria other than academic performance, or repeated and willful neglect in the discharge of duly assigned academic duties.
On Collaboration: All for-credit assignments, except for those designated as group projects, must be done independently, and collaboration in providing or asking for answers to those assignments constitutes cheating.
On AI Tools: In this class, students may use AI tools to support their learning. However, submitting AI-generated work for credit is a violation of the academic code. If submitted work is suspected to be AI-generated, the student will be asked to reproduce it in front of the instructor.
School Rule Cited: For graduate students who have been caught cheating: first offense = either a zero on the exam or assignment, or an F in the course; second offense = either an F in the course or expulsion (depending upon the punishment for the first offense)