Instructor: Dr. Liping Liu, CBA360, X5947
Credits: 3 hours
Applicable Term: Spring 2025 (January 12 - May 18)
Textbooks:
- Main Text: Brett Lantz, Machine Learning with R, 2nd Ed., Packt Publishing, 2015. ISBN: 978-1-78439-390-8.
- Supplementary Text: Rui Miguel Forte, Mastering Predictive Analytics with R, Packt Publishing, 2015. ISBN: 978-1-78398-280-6.
- Supplementary Text: Matthew Taddy, Leslie Hendrix, and Matthew Harding, Modern Business Analytics: Practical Data Science and Decision Making, McGraw-Hill, 2023. ISBN: 978-1-264-07167-8.
Reference Resources:
- Book: Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer, 2016. ISBN: 978-0387848570.
- Book: Julian J. Faraway, Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, Chapman & Hall/CRC, 2022. ISBN: 978-1498720960.
- Book: Radhakrishnan Nagarajan, Marco Scutari, and Sophie Lebre, Bayesian Networks in R with Applications in Systems Biology, Springer, 2013. ISBN 978-1-4614-6445-7
- Book: François Chollet, Deep Learning with R, 2nd Ed., Manning, 2022. ISBN 9781633439849.
Office Hours: 1:30-3:30 PM on Tuesdays and Thursdays
Course Description: This course covers advanced topics in data analytics. The selected topics include generalized linear modeling (e.g., linear regression, logistic regression, gamma regression, and mixed-effects modeling), deep learning (e.g., deep neural networks, convolutional networks, recurrent neural networks, and transformers), and Bayesian learning (e.g., the Naïve Bayes classifier, hidden Markov models, and Latent Dirichlet Allocation). For each selected model, the course introduces its theory and applications, explains the rationale of its learning algorithm, and applies an implementation of the algorithm in a programming language to real-world projects. The course requires a programming language such as R, SAS, or Python for big data projects. A good free online book for reviewing R programming is R for Data Science. Prerequisite: 6500:602
Objectives:
- Understand the concepts, assumptions, and applications of generalized linear models, graphical probabilistic models, and neural networks
- Understand the rationale of the algorithms for Bayesian learning (the Naïve Bayes classifier, graphical probabilistic models, hidden Markov models, and Latent Dirichlet Allocation), generalized linear modeling (linear regression, logistic regression, gamma regression, and Poisson regression), and deep learning (regression, time-series prediction, computer vision, and natural language processing)
- Enhance R programming skills in data preparation, exploration, and visualization, and apply the R implementation of each machine learning algorithm to real-world projects
Weekly Schedule:
- Week 1: Introduction (Chapters 1 and 10): Data Analytics, Task-Model-Algorithm Alignment, Data Preparation, and Model Performance. Homework Assignments (see PPT on Machine Learning); read Chapter 4 to prepare for the Project on Spam Detection.
- Week 2: Bayesian Learning 1: Knowledge representation, graphical probabilistic models, Naïve Bayes classifier (corpus, document term matrix, word cloud), and hidden Markov models. Project (Chapter 4): Spam Message Detection using Naïve Bayes classifier
- Week 3: Bayesian Learning 2: Conditional Probability and Bayesian Inference (supplementary note for matrix computing). Project (supplementary material): Predicting promoter gene sequences using Hidden Markov Model
- Week 4: Bayesian Learning 3: Conditional independence, d-separation, moral graphs, Bayesian network visualization, and Bayesian modeling and learning using the "bnlearn" package. Homework Assignments (see PPT on Bayes Learning III)
- Week 5: Bayesian Learning 4: Local propagation, Markov trees, Markov blankets, soft and hard evidence, and Bayesian inference using the gRain package. Homework Assignments (see PPT on Bayes Learning 4)
- Week 6: Bayesian Learning 5: Probabilistic topic modeling, Dirichlet distribution and multinomial distribution, generative modeling, Latent Dirichlet Allocation, topic model for Associated Press. Project (supplementary material): topic modeling for BBC News.
- Week 7: Generalized Linear Modeling 1 (Chapter 6 and supplementary note on the exponential family and link functions). Project (Chapter 6): predicting medical expenses using linear regression (apply a Box-Cox transformation to the dependent variable, then apply gamma regression without transformation)
- Week 8: Generalized Linear Modeling 2 (Supplementary Note): Gamma regression, maximum likelihood principle, and interpretation of gamma regression results. Homework Assignment (see PPT on Generalized Linear Modeling 2)
- Week 9: Generalized Linear Modeling 3 (supplementary notes): Binomial logistic classifier (mathematics for logistic classifier model development and MLE estimates; concepts and code for deviance, null deviance, and deviance residuals). Project (Supplementary Notes): Predicting Heart Disease using logistic regression
- Week 10: Deep Learning 1 (Chapter 7): artificial neurons, activation functions, coding a feedforward neural network, and coding backpropagation. Project (Chapter 7): Modeling the strength of concrete with the "neuralnet" package
- Week 11: Deep Learning 2 (Supplementary Note): Optimization theory and batch gradient descent for minimizing SSE and cross-entropy using ReLU, sigmoid, step, and linear activation functions. Deep learning for numeric prediction and classification: TensorFlow, model specification using Keras, optimizers, loss functions, hyperparameters, computer vision, and NLP. Examples: image classification and customer reviews. Assignment: predicting housing prices.
- Week 12: Deep Learning 3 (Supplementary Note): Convolutional neural networks (convolution and pooling operations, convnet architectures, transforming image matrices, and using Keras and TensorFlow to create convnets for image recognition). Homework Assignments (see PPT on Deep Learning II)
- Week 13: Deep Learning 4 (Supplementary Note): Recurrent neural networks, long short-term memory models, word embedding, input shape conversion, and output of LSTM layers; using Keras and TensorFlow to design RNN and LSTM models for sentiment analysis and multivariate time-series analysis. Homework Assignments (see PPT on Deep Learning III)
- Week 14: Deep Learning 5 (Supplementary Note): Encoders and decoders, RepeatVector and TimeDistributed Dense layers, functional APIs, and multistep and multivariate predictions and generations. Project (supplementary note): Text Generation with LSTM and Keras
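To give a flavor of the project code used in the schedule above, here is a minimal sketch of the Week 7 gamma regression approach. The `insurance` data frame and its column names are illustrative assumptions, not the actual project files:

```r
# Minimal sketch: gamma regression for right-skewed medical expenses.
# The file name and column names (expenses, age, bmi, smoker) are
# assumptions for illustration; adjust to the actual project data.
insurance <- read.csv("insurance.csv")

# A Gamma family with a log link models strictly positive, skewed costs
# directly, without transforming the dependent variable (contrast with
# applying a Box-Cox transformation before ordinary linear regression).
fit <- glm(expenses ~ age + bmi + smoker,
           data = insurance,
           family = Gamma(link = "log"))

summary(fit)    # coefficients are on the log scale
exp(coef(fit))  # multiplicative effects on expected expenses
```

With the log link, exponentiated coefficients read as multiplicative effects, which is how the Week 8 session interprets gamma regression results.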
Exam Schedule: This course is project-oriented. It has one final exam, which includes multiple-choice, hands-on, and essay questions.
Assignments: Homework and/or a project is assigned once a week. Each weekly assignment consists of one real-world data project along with simple exercises and multiple-choice questions to check your understanding of the basic concepts and algorithms. Homework assignments are usually due at the beginning of the next class. No late homework will be graded. Please show your work in a neat and orderly fashion. Write or type your work on one side of the page, on every other line. Use standard-size paper (8 1/2'' by 11''). Do not use spiral notebook paper. For electronic submissions, it is the student's responsibility to submit the correct files in the correct formats.
Attendance: Attendance is a MUST and counts for 10% of your final grade. Attendance will be managed by the ecourse.org system. The formula for computing your attendance grade is non-linear: it takes 2 points off for the first absence and 7 points off for the second absence. If you miss the equivalent of three weeks of classes, you fail the course automatically. Under special situations, you may take a class online under the following guidelines:
- You must obtain permission from the instructor at least one day ahead of the online session
- Follow the lecture or its recordings to perform all in-class hands-on exercises and take notes. Within one day of the class, submit your notes and the finished exercises to ecourse.org as Proof of Attendance.
- All weekly assignments are due at the same time as in-person classes. All exams must be onsite.
Quizzes: I will use quizzes regularly to check your completion or preparation of assignments.
Makeup: Each student with an appropriate excuse (such as sickness) and acceptable proof may have at most one chance to make up one homework assignment or quiz. Submitting wrong files or files in wrong formats is not an eligible excuse. All makeup work must be finished within one week of the normal due date and before the answer keys are released. Note that this special favor is a privilege, not a right.
Grades: Your final grade will be calculated by the following formula:
30% (Exams) + 60% (Homework + Projects) + 10% (Attendance)
- A = 93-100%; A– = 90-92.4%; B+ = 87-89.4%; B = 83-86.4%; B– = 80-82.4%; C+ = 77-79.4%; C = 73-76.4%; C– = 70-72.4%; D = 60-69.4%; F = below 60%
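As a worked example of the weighting formula (the component scores here are hypothetical), the calculation can be checked in R:

```r
# Hypothetical component averages, each on a 0-100 scale
exam     <- 85    # exam score
homework <- 92    # combined homework + project average
attend   <- 100   # attendance score

# Apply the 30/60/10 weights from the syllabus formula
final <- 0.30 * exam + 0.60 * homework + 0.10 * attend
final  # 90.7, which falls in the A- band (90-92.4%)
```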
Misconduct: Academic misconduct by a student shall include, but is not limited to: disruption of classes, giving or receiving unauthorized aid on exams or in the preparation of assignments, unauthorized removal of materials from the library, or knowingly misrepresenting the source of any academic work. Academic misconduct by an instructor shall include, but is not limited to: grading student work by criteria other than academic performance, or repeated and willful neglect in the discharge of duly assigned academic duties.
On Collaboration: All for-credit assignments, except for those designated as group projects, must be done independently, and collaboration in providing or asking for answers to those assignments constitutes cheating.
On AI Tools: In this class, students may use AI tools to support their learning. However, submitting AI-generated work for credit is a violation of the academic code. If submitted work is suspected to be AI-generated, the student will be asked to reproduce it in front of the instructor.
School Rule Cited: For graduate students who have been caught cheating: first offense = either a zero on the exam or assignment, or an F in the course; second offense = either an F in the course or expulsion (depending upon the punishment for the first offense)