Some of my Data Science and Data Analysis projects I have done for academic, self-learning and hobby purposes.

HH197/Portfolio

Data Science Portfolio - Hamid Hamidi

This portfolio is a collection of the Data Science and Data Analysis projects I have done for academic, self-learning, and hobby purposes. It also lists my achievements, skills, and certificates.

Table of Contents

Projects

ZINB-Grad: A Gradient Based Linear Model Outperforming Deep Models

In this project, I designed a generalized linear model, trained it with momentum-based gradient descent, and thereby overcame the scalability and efficiency challenges inherent in fitting generalized linear models. I showed that my model outperforms state-of-the-art deep learning solutions while using 90% fewer resources. I compared ZINB-Grad with scVI and ZINB-WaVE, both developed at UC Berkeley, on a set of benchmarks including run time, goodness of fit, imputation error, clustering performance, and batch correction.
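ZINB-Grad's own code is not reproduced here. As a rough illustration of the training idea only, the sketch below fits a plain negative binomial GLM (log link, inverse dispersion θ held fixed, no zero-inflation component) with heavy-ball momentum gradient descent in NumPy. The function names, the fixed θ, the learning rate, and the simulated data are assumptions chosen for the example, not details taken from the project.

```python
import numpy as np

def nb_grad(beta, X, y, theta):
    """Gradient of the mean negative log-likelihood of an NB GLM with log link (theta fixed)."""
    mu = np.exp(X @ beta)
    # d(log-likelihood)/d(eta) for NB with log link is theta * (y - mu) / (theta + mu)
    score = theta * (y - mu) / (theta + mu)
    return -(X.T @ score) / len(y)

def fit_nb_momentum(X, y, theta, lr=0.05, momentum=0.9, n_iter=3000):
    """Hypothetical helper: heavy-ball momentum gradient descent on the NB negative log-likelihood."""
    beta = np.zeros(X.shape[1])
    velocity = np.zeros_like(beta)
    for _ in range(n_iter):
        velocity = momentum * velocity - lr * nb_grad(beta, X, y, theta)
        beta = beta + velocity
    return beta

# Simulated counts with known coefficients (illustrative data only)
rng = np.random.default_rng(0)
n, theta = 5000, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 0.5, -0.3])
mu = np.exp(X @ true_beta)
# NumPy's negative_binomial(n, p) has mean n * (1 - p) / p, which equals mu for p = theta / (theta + mu)
y = rng.negative_binomial(theta, theta / (theta + mu))

beta_hat = fit_nb_momentum(X, y, theta)
print(np.round(beta_hat, 2))  # should land near true_beta
```

Because the gradient only needs matrix-vector products over the data, each step scales linearly with the number of cells, which is the kind of property that makes gradient-based fitting attractive for large count matrices.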

Deep Generative Modeling and Probabilistic Dimension Reduction

In this project, I re-engineered the implementation of scVI, a deep generative model developed at UC Berkeley, using PyTorch, PyTorch Lightning, and Pyro. The new implementation ensures numerical stability and reproducible results while being 5% more efficient, resulting in a general-purpose tool for dimension reduction and data imputation.
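scVI models counts with a zero-inflated negative binomial (ZINB) likelihood, and evaluating that likelihood naively can underflow or overflow. A minimal sketch, assuming a mean / inverse-dispersion / dropout-logit parameterization and using hypothetical helper names (`softplus`, `log_zinb`), keeps every term in log space with the standard library only; it is not the project's or scVI's actual code.

```python
import math

def softplus(x: float) -> float:
    """log(1 + exp(x)) computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_zinb(y: int, mu: float, theta: float, pi_logit: float) -> float:
    """Log-probability of count y under a zero-inflated negative binomial.

    mu: NB mean, theta: inverse dispersion, pi_logit: logit of the dropout probability.
    """
    log_pi = -softplus(-pi_logit)            # log(sigmoid(pi_logit))
    log_one_minus_pi = -softplus(pi_logit)   # log(1 - sigmoid(pi_logit))
    log_theta_mu = math.log(theta + mu)
    # log NB(0) = theta * log(theta / (theta + mu)), kept in log space
    log_nb_zero = theta * (math.log(theta) - log_theta_mu)
    if y == 0:
        # log(pi + (1 - pi) * NB(0)) via a two-term log-sum-exp
        a, b = log_pi, log_one_minus_pi + log_nb_zero
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))
    log_nb = (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + log_nb_zero
        + y * (math.log(mu) - log_theta_mu)
    )
    return log_one_minus_pi + log_nb

# Usage: probabilities over a long enough range of counts should sum to about 1
total = sum(math.exp(log_zinb(k, 5.0, 2.0, -1.0)) for k in range(500))
print(total)
# Extreme parameters still give a finite log-probability
print(log_zinb(0, 1e6, 1.0, -800.0))
```

Working with the dropout logit rather than the probability itself avoids taking the log of values that round to 0 or 1, which is one common way such likelihoods lose stability.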

Unsupervised Cell Type Identification

Cell type identification is usually one of the key goals of scRNA-seq data analysis. It is typically framed as a clustering problem: groups of cells identified in an unsupervised manner are then annotated with cell types. However, clustering single cells by their gene expression levels is complicated by the curse of dimensionality, a low signal-to-noise ratio (SNR), and technical artifacts. This study aims to identify different cell types and cell states in publicly available single-cell data sets. Here, we propose several approaches, each consisting of three steps: pre-processing, dimension reduction, and clustering.
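The three-step structure described above can be sketched in NumPy alone. This is one possible instance under stated assumptions, not the project's pipeline: library-size normalization plus `log1p` for pre-processing, PCA via SVD for dimension reduction, and Lloyd's k-means with random restarts for clustering. The helper names and the simulated counts (three cell types, ten marker genes each) are hypothetical.

```python
import numpy as np

def preprocess(counts, target_sum=1e4):
    """Library-size normalization followed by log1p."""
    lib_size = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / lib_size * target_sum)

def pca(X, n_components):
    """Project centered data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans(Z, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns the labels with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(n_iter):
            dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
            # Keep the old center if a cluster ends up empty
            new_centers = np.array([
                Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = ((Z - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

# Simulated counts: 3 cell types x 100 cells, 60 genes, 10 marker genes per type
rng = np.random.default_rng(42)
n_per_type, n_genes, n_types = 100, 60, 3
rates = np.ones((n_types, n_genes))
for t in range(n_types):
    rates[t, t * 10:(t + 1) * 10] = 20.0
true_types = np.repeat(np.arange(n_types), n_per_type)
counts = rng.poisson(rates[true_types])

labels = kmeans(pca(preprocess(counts), n_components=5), k=3)
print(np.bincount(labels))
```

On real scRNA-seq data each step would usually be swapped for something more robust (for example, highly variable gene selection, a probabilistic embedding, or graph-based clustering), but the pre-process / reduce / cluster skeleton stays the same.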

Car Price and Heart Failure Prediction Using Generalized Linear Models (GLMs)

Here, we used two different data sets to show the broad applicability of GLMs to real-world problems. We built a Gamma regression model to predict car price and a logistic regression model to predict heart failure. Our analyses focused on model fitting and on identifying the statistically significant predictors using step-wise likelihood-ratio tests and the Akaike information criterion (AIC).
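The variable-selection step can be illustrated with the logistic-regression half of the analysis. The sketch below is an assumption-laden stand-in, not the project's code: it fits logistic regression by Newton-Raphson (IRLS) in NumPy, computes AIC as 2k minus twice the log-likelihood, and runs a likelihood-ratio test for dropping one coefficient, using the fact that a chi-square variable with 1 degree of freedom has tail probability erfc(sqrt(x / 2)). The helper names and simulated data are hypothetical; `x2` is pure noise.

```python
import math
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson (IRLS) for logistic regression; returns coefficients and log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * W[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    eta = X @ beta
    # Bernoulli log-likelihood: sum(y * eta - log(1 + exp(eta)))
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return beta, loglik

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

def lrt_pvalue_df1(ll_full, ll_reduced):
    """Likelihood-ratio test for dropping one coefficient (chi-square with 1 df)."""
    stat = 2.0 * (ll_full - ll_reduced)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

# Simulated binary outcome: x1 is informative, x2 is noise
rng = np.random.default_rng(1)
n = 1000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x1)))
y = rng.binomial(1, prob).astype(float)

X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, ll_full = fit_logistic(X_full, y)
_, ll_no_x1 = fit_logistic(X_full[:, [0, 2]], y)
_, ll_no_x2 = fit_logistic(X_full[:, [0, 1]], y)

p_x1 = lrt_pvalue_df1(ll_full, ll_no_x1)
p_x2 = lrt_pvalue_df1(ll_full, ll_no_x2)
print(f"AIC full={aic(ll_full, 3):.1f}, without x1={aic(ll_no_x1, 2):.1f}")
print(f"LRT p-values: x1={p_x1:.2e}, x2={p_x2:.2f}")
```

A step-wise procedure would repeat this comparison, dropping the least significant variable (or the one whose removal lowers AIC the most) until no further change is justified.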

Micro Projects

Achievements

  • Recipient of the Graduate Assistant Teaching Excellence Award for excellent communication skills, solid academic background, and passionate devotion to teaching, University of Calgary.
  • Publication: Signatures of Mutational Processes in Human DNA Evolution, 2021, bioRxiv.
  • Recipient of Top Student of the Academic Year Award for excellent academic performance, Iran University of Science and Technology.
  • Publication: Prediction of MEMS-based INS Error Using Interval Type-2 Fuzzy Logic System in INS/GPS Integration, 25th International Computer Conference, Computer Society of Iran (CSICC), 2020, pp. 1-5.

Core Competencies

  • Methodologies: Machine Learning, Deep Learning, Statistics, Bayesian Inference, A/B Testing and Experimentation Design, Big Data Analytics
  • Languages: Python (pandas, NumPy, scikit-learn, SciPy, PyTorch, Pyro, PyTorch Lightning, LightGBM, h5py, Matplotlib), R (dplyr, reshape, ggplot2), SQL
  • Tools: MySQL, Git, Jupyter Notebook, Apache Spark, Apache Airflow, Apache Kafka, MS Excel

Certificates

Acknowledgement
