WayneJz/COMP9321-19T1

COMP9321-19T1

COMP9321 Data Services Engineering 2019T1

ALL CODE SHOULD BE APPROPRIATELY REFERENCED; COPYING MAY RESULT IN PLAGIARISM

Lecturer in charge: Lina Yao

Assignments

  1. Data cleaning and visualization, Mark: 13/13 (Bonus 3 marks).

  2. RESTful Flask API and Swagger, Mark: 10/10.

  3. Heart disease analysis (including machine learning, backend and frontend), Mark: 16/20.

Main content

  1. Data formation and access: Fetching and collecting different types of data (PDF, XML, etc.); database access with SQL and ORM.

  2. Data quality and cleaning: Standardization, normalization (Min-max, z-score, log ...), NLP transformation (Tokenization, Lemmatization, Stemming), Pairwise matching score.

  3. Data visualization: Benefits of data visualization, different visual methods (Histograms, charts etc.), High dimension visualization, dimensionality reduction (PCA and SVD algorithms), drawbacks of PCA.

  4. RESTful API and client: HTTP request methods, XML-based and RESTful APIs, uniform interface, statelessness, caching, Swagger, SOAP vs REST, API design, API security (authentication, token-based methods, etc.), OAuth.

  5. Data analytics: Bayes' theorem, overview of data mining, correlation, similarity measures, unsupervised learning, clustering (K-Means, K-Means++), association rule mining (Apriori algorithm).

  6. Supervised learning: Linear regression, least squares error (R-squared values and p-values), logistic regression, instance-based methods (KNN), decision trees, building decision trees (entropy, ID3 algorithm), overfitting, cross-validation, bagged decision trees, random forests.

  7. Neural networks: Gradient ascent/descent, forward pass, backpropagation, activation and loss functions, learning rates, avoiding overfitting (early stopping and dropout), TensorFlow and Keras, CNN, CNN convolution, RNN, RNN backpropagation, long short-term memory (LSTM).

  8. Recommender systems: Collaborative filtering, Pearson correlation, user-based vs item-based vs content-based, latent-factor-based models, SVD-based models, TF-IDF method, cosine similarity, knowledge-based approaches, hybrid recommender systems, accuracy (MAE, RMSE).
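
As a small illustration of the normalization methods named in item 2 (min-max, z-score, log), the following is a minimal sketch in plain Python. It is not taken from the course materials: the function names `min_max`, `z_score` and `log_transform` are hypothetical, the z-score uses the population standard deviation, and the log transform is assumed to be `log(1 + x)` so that zero values are allowed.

```python
import math


def min_max(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre values on the mean and scale by the population standard deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(variance)
    if std == 0:
        # No spread in the data; every z-score is 0.0 by convention here.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def log_transform(values):
    """Compress large non-negative values using log(1 + x)."""
    return [math.log1p(v) for v in values]


if __name__ == "__main__":
    ages = [2, 4, 6]
    print(min_max(ages))  # [0.0, 0.5, 1.0]
    print(z_score(ages))
    print(log_transform(ages))
```

Min-max scaling keeps the original shape of the distribution but is sensitive to outliers, while z-score scaling expresses each value as a number of standard deviations from the mean.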

Copyright and Credits

All course slides and materials come from the lecturers. Do not share them or use them commercially without the lecturers' agreement. I take no responsibility for any such misuse.
