CSCI 4380/6380 Data Mining
Spring 2026: Tuesdays and Thursdays 1:15pm - 2:35pm, Geography and Geology Room 200A
& Wednesdays 1:15pm - 2:10pm, Chemistry Building Room 674
Instructor: Prof. Khaled Rasheed
Office Hours: Wednesday 2:30-4:30pm or by email appointment
Office Location: Room 543, Boyd GSRC
Email: khaled@uga.edu
Objectives:
The course aims to provide students with a broad introduction to the
field of Data Mining and related areas and to teach students how to
apply these methods to solve problems in complex domains.
The course is appropriate both for students preparing for research in
Data Mining and Machine Learning and for Bioinformatics, Science
and Engineering students who want to apply Data Mining techniques to
solve problems in their fields of study.
Recommended Background:
CSCI 2720 Data Structures. Familiarity with basic computer algorithms
and data structures. Knowledge of a modern programming language.
Topics to be Covered:
Part I: Data Mining techniques: Selected from: Association and
Classification Rule Mining, Linear Models, Decision Trees and Random
Forests, Neural Network approaches, Support Vector Machines, Bayesian
Learning, Instance-based Learning, Pre-processing and Feature
Selection, Performance evaluation, Ensemble Learning and clustering.
Part II: Data Mining applications: Selected from: Bioinformatics,
Biomedical/Physical/Chemical modeling, medical diagnosis, text/web
mining, pattern recognition and/or other contemporary applications.
Expected Work:
Reading; assignments (including running experiments using the Weka
package); a paper presentation; two midterms; and a term project (may
require programming or running existing packages) and term paper.
Unless otherwise announced by the instructor, all assignments and all
exams must be done entirely on your own.
Academic Honesty and Integrity:
All academic work must meet the standards contained in
"A Culture of Honesty." Students are responsible for informing
themselves about those standards before performing any academic
work. The penalties for academic dishonesty are severe and ignorance
is not an acceptable defense.
Grading Policy:
Assignments: 30% (Programs, homeworks, attendance, paper presentation)
Midterm Examinations: 40%
Term Project: 30% (includes term paper and presentation)
Students may work on their term projects in groups of up to
FOUR students each. The above distribution is only
tentative and may change later. The instructor will announce any
changes.
Assignment Submission Policy
Assignments must be turned in by the assigned deadline on eLC. Late
assignments will lose 10% for every calendar day they are late. Rare exceptions may be made by the
instructor only under extenuating circumstances and in accordance with
the university policies.
Course Home-page
A variety of materials will be made available on the DM Class
Home-page at
https://khaledmrasheed.github.io/DMcourse/, including handouts,
lecture notes and assignments. Announcements may be posted between
class meetings. You are responsible for being aware of whatever
information is posted there.
Lecture Notes
Copies of Dr. Rasheed's lecture notes will be
available on eLC and at the bottom of the class home page. Not all lectures
will have electronic notes, however, so students should be prepared
to take notes during any lecture.
Textbook in Bookstore
"Data Mining: Practical Machine Learning Tools and Techniques
(4th edition)", Ian Witten, Eibe Frank, Mark Hall and Christopher Pal. Morgan Kaufmann,
2016. (Required)
ISBN-10: 0128042915 & ISBN-13: 978-0128042915
Web Resources
The WEKA Machine Learning Project
University of California at Irvine ML Repository
The Kaggle data science home
Announcements:
[2-16-2026] Course project signup link: https://docs.google.com/document/d/1QTRC2YvnQOmMQeYuLvZhV-KQc1CxI-c4YpkvWitMrTs/edit?usp=sharing
[2-20-2026] The first midterm exam will be on Thursday 2-26-2026. It will cover all the topics discussed in the course through the end of Chapter 4. It will be open notes, but the use of books, laptops or phones will not be allowed. You should bring a calculator to the exam; if you do not have a calculator, you may use your phone as a calculator after asking me for permission. You should also bring your lecture notes and all handouts, and you may bring any additional notes, homeworks, etc. We shall have a review lecture on Wednesday 2-25-2026 in which we will go over the homework solutions and some additional problems from previous midterm exams.
Papers:
"A robust microbiome signature for autism spectrum disorder across different studies using machine learning" 2024. [Partha Koundinya Panguluri][4-7]
{download}
"Unbiased split selection for classification trees based on the Gini Index" 2007. [Aiden King Benise][4-7]
{download}
"Beyond Reality: The Pivotal Role of Generative AI in the Metaverse" 2023. [Joseph Vos][4-8]
{download}
"Winner-takes-all for Multivariate Probabilistic Time Series Forecasting" 2025. [Firas Astwani][4-9]
{download}
"Prediction and Prioritization of Rare Oncogenic Mutations in the Cancer Kinome Using Novel Features and Multiple Classifiers" 2014. [Pardis Sadatian][4-9]
{download}
"Predicting Post Severity in Mental Health Forums" 2016. [Quentin Boccaleri][4-9]
{download}
"DysLexML: Screening Tool for Dyslexia Using Machine Learning" 2019. [Tilak Savani][4-14]
{download}
"Automated Classification of Text Sentiment" 2018. [Chongxin Zhong][4-14]
{download}
"Application of data mining for young children education using emotion information" 2018. [Caleb Odunade][4-14]
{download}
"An automated approach to predict diabetic patients using KNN imputation and effective data mining techniques" 2024. [Nikki Azadi][4-16]
{download}
"Merging computational fluid dynamics and machine learning to reveal animal migration strategies" 2020. [Abigail Clark][4-21]
{download}
"Motor Imagery EEG Signal Processing and Classification Using Machine Learning Approach" 2017. [?][?]
{download}
"Text Similarity in Vector Space Models: A Comparative Study" 2019. [?][?]
{download}
"Edge Machine Learning: Enabling Smart Internet of Things Applications" 2018. [?][?]
{download}
"Determination of Flowing Grain Moisture Contents by Machine Learning Algorithms Using Free Space Measurement Data" 2022. [?][?]
{download}
"Clustering cancer gene expression data: a comparative study" 2013. [?][?]
{download}
"Forecast of the higher heating value based on proximate analysis by using support vector machines and multilayer perceptron in bioenergy resources" 2022. [?][?]
{download}
"Web Application Attacks Detection Using Machine Learning Techniques" 2018. [?][?]
{download}
Assignments:
Homework 1: Exercise 17.1 on pages 559 - 565 of the Weka exercises. You can download all the exercises
from https://khaledmrasheed.github.io/DMcourse/Weka-Tutorial-Exercises.pdf.
[Due 2-5-2026 on eLC] The use of Generative AI is not allowed.
Homework 2
Course Project
Homework 3: Exercise 17.6 on pages 582 - 585 of the Weka exercises. You can download all the exercises
from https://khaledmrasheed.github.io/DMcourse/Weka-Tutorial-Exercises.pdf. Include screenshots and answers to the questions.
[Due 4-1-2026 on eLC]
Homework 4
Lecture Notes:
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Weka Tutorial Slides by Roxana Attar
Chapter 5
Chapter 7
Chapter 8
Chapter 12
The course syllabus is a general plan for the course;
deviations announced to the class by the instructor may be
necessary.
Last modified: April 2, 2026.
Khaled Rasheed
(khaled[at]uga.edu)