sugatagh/README.md

Sugata Ghosh, PhD

Sugata Ghosh | Gmail

Sugata Ghosh | LinkedIn

Sugata Ghosh | GitHub

Sugata Ghosh | Kaggle

Sugata Ghosh | Twitter



Data Science and Machine Learning Portfolio Website: https://sugatagh.github.io/dsml/

Experience

Ford Motor Company 2024-Present
Reliability Data Scientist

  • Department: Global Data Insight and Analytics

Indian Institute of Science Education and Research Kolkata 2018-2024
Research Fellow and Teaching Assistant

  • Research Focus: Stochastic ordering

  • Teaching Assistantship: Served as Teaching Assistant for the courses Statistics I, Probability I, and Analysis I. Conducted tutorial sessions, prepared question papers, and took part in grading.

Education

Indian Institute of Science Education and Research Kolkata 2018-2024
Doctor of Philosophy in Statistics

Indian Institute of Technology Kanpur 2015-2017
Master of Science in Statistics

University of Calcutta 2012-2015
Bachelor of Science in Statistics

Skills

Languages: Python, SQL, R, MATLAB

Tools: LaTeX, Jupyter Notebook

Statistical Software: Minitab

Publications and Presentations

Refereed Journal Publications

Preprints

Academic Magazine Articles

  • Banerjee, P., Ghosh, S. (2016) A brief review on missing data. Prakarsho.*

  • Ghosh, S. (2014) A generalization of the Kelly gambling system. Prakarsho.

  • Dutta, T., Ghosh, S. (2014) An attempt to generate random numbers. Prakarsho.

Presentations

  • Departure-based Asymptotic Stochastic Order for Random Processes. International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022, IISER Kolkata.

  • On Some Inconsistent Multivariate Distributions. Open House'16, IIT Kanpur.

*Prakarsho: Departmental magazine published by the Department of Statistics, St. Xavier's College, Kolkata.

Scholastic Achievements

Scholarship and Research Fellowship

  • Research Fellowship from University Grants Commission, MHRD, Government of India.

  • National Scholarship from Department of Higher Education, MHRD, Government of India.

Test Performances

  • AIR-94 in Mathematical Science paper in CSIR-UGC NET-JRF (Dec 2016).

  • AIR-31 in Mathematical Statistics paper in IIT-JAM (2015).

Seminars, Workshops, and Summer/Winter Schools

Winter School on Deep Learning: From Perceptrons to Diffusion Models.
Organized by Electronics and Communication Sciences Unit, ISI Kolkata.

International Workshop on Reliability Theory and Survival Analysis (IWRTSA) 2022.
Organized by Department of Mathematics and Statistics, IISER Kolkata.

Indo-French Center for Applied Mathematics (IFCAM) Winter School 2018.
On Stochastic Methods for Uncertainty Quantification and Sensitivity Analysis of Complex Models.

National Seminar on Application of Statistics and Statistical Computing.
Organized by Xaverian Statistical Association under Department of Statistics, St. Xavier’s College, Kolkata.

Fellowship Programs

TMLC Fellowship Program 2022-2023
Conducted by The Machine Learning Company.
Contributed to the Conversational AI DeepPavlov project.

Data Science and Machine Learning Projects

Author Identification with Natural Language Processing

  • Predicted the author of a new text, given a dataset of texts with corresponding authors.
  • Trained an LSTM model with the help of GloVe embeddings and obtained a validation log loss of $0.581$.
  • GitHub repository: https://github.com/sugatagh/Spooky-Author-Identification
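For readers unfamiliar with the metric, here is a minimal sketch of multi-class log loss: the mean negative log-probability the model assigns to the true class. The function name, the clipping constant `eps`, and the toy inputs are illustrative assumptions, not code from the repository.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices.
    y_prob: list of per-class probability lists, one per sample.
    eps:    clipping constant (an assumption) to avoid log(0).
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip the probability of the true class into [eps, 1 - eps].
        p = min(max(probs[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)

# Toy example: a model that always predicts a uniform distribution
# over three classes scores ln(3), roughly 1.0986.
uniform_loss = multiclass_log_loss([0, 1, 2], [[1 / 3, 1 / 3, 1 / 3]] * 3)

# A model that puts all mass on the correct class scores close to 0.
perfect_loss = multiclass_log_loss([0, 1], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
```

Lower is better. With three equally likely classes, an uninformative uniform prediction gives about $1.099$, which is a useful reference point when reading a reported log loss.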

E-commerce Text Classification

  • Classified products into four given categories based on their descriptions available on an e-commerce platform.
  • Combined TF-IDF vectorization and Word2Vec embeddings with a number of classifiers. The hyperparameter-tuned model with the highest validation accuracy (TF-IDF + Linear SVM) obtained a test accuracy of $0.949$.
  • GitHub repository: https://github.com/sugatagh/E-commerce-Text-Classification

Anomaly Detection in Credit Card Transactions

Higgs Boson Event Detection
Conducted by The Machine Learning Company.

  • Predicted whether or not an event produced in a particle accelerator indicates the discovery of a new particle.
  • Trained a deep neural network, using GridSearchCV for hyperparameter optimization, and achieved a test AMS (approximate median significance) score of $1.200$ and a test accuracy of $0.824$.
  • GitHub repository: https://github.com/sugatagh/Higgs-Boson-Event-Detection
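As a rough illustration of the AMS metric, the sketch below uses the definition from the Kaggle Higgs Boson Machine Learning Challenge, $\mathrm{AMS} = \sqrt{2\left((s + b + b_r)\ln\left(1 + \frac{s}{b + b_r}\right) - s\right)}$ with regularization term $b_r = 10$. It assumes the project follows that definition; the function and variable names are hypothetical.

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate median significance.

    s:     weighted sum of true positives (signal events selected as signal).
    b:     weighted sum of false positives (background events selected as signal).
    b_reg: regularization term; 10 in the Kaggle challenge definition.
    """
    return math.sqrt(
        2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s)
    )

def weighted_s_b(y_true, y_pred, weights):
    """Sum the event weights of true positives (s) and false positives (b)."""
    s = sum(w for t, p, w in zip(y_true, y_pred, weights) if p == 1 and t == 1)
    b = sum(w for t, p, w in zip(y_true, y_pred, weights) if p == 1 and t == 0)
    return s, b

# Toy example with made-up labels, predictions, and weights.
s, b = weighted_s_b([1, 1, 0, 0], [1, 0, 1, 0], [4.0, 2.0, 30.0, 50.0])
score = ams(s, b)
```

The score grows with the selected signal weight and shrinks as more background is selected, so it rewards a selection region that is both pure and well populated.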

Patient Survival Prediction
Conducted by The Machine Learning Company.

Electron Energy Flux Prediction
Conducted by The Machine Learning Company.

  • Predicted the total electron energy flux based on various relevant features, in the context of modeling electron particle precipitation from the magnetosphere to the ionosphere.
  • Trained a deep neural network, using Keras Tuner for hyperparameter tuning, and achieved a test $R^2$ score of $0.699$.
  • GitHub repository: https://github.com/sugatagh/Electron-Energy-Flux-Prediction-using-Deep-Learning

Site Energy Usage Intensity Prediction
Conducted by The Machine Learning Company.

Road Traffic Accident Severity Classification
Conducted by The Machine Learning Company.

Natural Language Processing with Disaster Tweets
Jointly with Shyambhu Mukherjee.

Credit Card Fraud Detection
Jointly with Shyambhu Mukherjee.

  • Classified credit card transactions as authentic or fraudulent, based on relevant data such as time and amount.
  • Obtained a test $F_2$-score of $0.880$ with a random forest classifier after oversampling the minority class (fraudulent transactions) in the training set via the synthetic minority over-sampling technique (SMOTE).
  • GitHub repository: https://github.com/sugatagh/Credit-Card-Fraud-Detection
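To make the choice of metric concrete, here is a small sketch of the $F_\beta$-score computed from confusion-matrix counts, $F_\beta = \frac{(1 + \beta^2)\,\mathrm{TP}}{(1 + \beta^2)\,\mathrm{TP} + \beta^2\,\mathrm{FN} + \mathrm{FP}}$. With $\beta = 2$, missed frauds (false negatives) are penalized more heavily than false alarms. The function name and the counts are illustrative, not taken from the project.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positive, false positive, and false negative counts."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    # Convention (an assumption here): return 0 when there is nothing to score.
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0

# Same number of errors, different kinds:
balanced = f_beta(tp=8, fp=2, fn=2)      # precision = recall = 0.8
missed_fraud = f_beta(tp=8, fp=0, fn=2)  # two frauds missed
false_alarm = f_beta(tp=8, fp=2, fn=0)   # two false alarms
```

Here `false_alarm` comes out higher than `missed_fraud`, which reflects the usual reasoning in fraud detection that letting a fraudulent transaction through costs more than flagging an authentic one.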

Online Internships

Machine Learning Internship Program 2022
Conducted by Uniconverge Technologies and The IoT Academy.

  • Detected duplication of points of interest in a dataset of over $1.5$ million place entries.
  • Trained several algorithms and obtained a test accuracy of $0.770$ with a hyperparameter-tuned XGBoost classifier.
  • GitHub repository: https://github.com/sugatagh/Foursquare-Location-Matching

Academic Course Projects

A Time Series Analysis of Monthly Airline Revenue Passenger Mile (RPM)
Supervisor: Dr. Amit Mitra (IIT Kanpur).

  • Analyzed RPM data for 1996–2014 and built a predictive model for forecasting future revenue values.

A Study on Performances in the Olympic Games
Supervisor: Dr. Sharmishtha Mitra (IIT Kanpur).

  • Built a regression model to predict the overall performance of the countries in the Summer Olympic Games.

Students' Future Plans and the Reasons Behind
Supervisor: Dr. Shalabh (IIT Kanpur).

  • Examined the variation in career choices of the students at IIT Kanpur and how the reasons for such choices vary.

A Statistical Analysis of the Variation in Preference to Movie Genres among Spectators
Supervisors: Dr. Durba Bhattacharya and Prof. Soumya Banerjee (St. Xavier's College, Kolkata).

  • Studied how hobbies influence an individual's preferred movie genre, and checked for bias due to gender and age group.
  • Analyzed how the preference for one factor over another in a movie's success differs across age groups and genders.

Certifications

Generative AI for Everyone 2023
Authorized by DeepLearning.AI, offered by Coursera.
https://www.coursera.org/account/accomplishments/certificate/EV8T2EF4VUKN

Machine Learning Specialization 2022
Authorized by Stanford University, offered by Coursera.
https://www.coursera.org/account/accomplishments/specialization/certificate/U2MZV5HWRG5L

Data Analyst in SQL Track 2022
Offered by DataCamp.
https://www.datacamp.com/statement-of-accomplishment/track/689ba9d0ab9984f55aac593e6caacd1f9d197194

IBM Data Science Specialization 2022
Authorized by IBM, offered by Coursera.
https://www.coursera.org/account/accomplishments/specialization/certificate/9V355HMT2FB6

Applied Data Science with Python 2021
Offered by Electronics and ICT Academy, IIT Roorkee.
https://eict.iitr.ac.in/wp-content/uploads/L214613B669.jpg

Academic Courses

Statistics
Regression Analysis, Statistical Inference, Time-Series Analysis, Statistical Simulation and Data Analysis, Probabilistic Theory of Pattern Recognition, Multivariate Analysis, Analysis of Variance, Robust Statistical Methods, Nonparametric Inference, Non-linear Regression, Large Sample Theory, Sampling Theory, Matrix Theory and Linear Estimation, Design of Experiments, Statistical Quality Control, Distributions Theory in Statistics, Population Statistics, Economic Statistics.

Mathematics
Real Analysis, Linear Algebra, Multivariable Calculus, Numerical Analysis, Complex Variables, Ergodic Theory, Introduction to Graph Theory, Measure Theory.

Probability and Applications
Probability Theory, Applied Stochastic Process.

Others
Computer Programming and Data Structures, Research Methodology.
