  • Medicodio Inc.
  • New York City Metropolitan Area
  • LinkedIn in/rohitak8

rohitkulkarni08/README.md

📈 Data Scientist & Data Engineer | Master's in Statistics and Data Science @ Rutgers University | Passionate about Machine Learning & Data-Driven Insights 🌟

I'm Rohit Kulkarni, a Data Scientist and Engineer with a passion for using data to drive impactful decisions. I'm currently pursuing my Master's in Statistics & Data Science at Rutgers University. I specialize in statistical modeling, machine learning, and data engineering.

Connect with me on 📧 LinkedIn | 📧 Email

About Me

🎓 Master of Science in Statistics & Data Science at Rutgers University

💼 Data Engineer & Data Scientist at Fractal Analytics

  • Developed a portfolio optimization solution for a prominent CPG company to help identify delisting opportunities for underperforming products
  • Automated 10+ end-to-end ETL and CI/CD pipelines, reducing manual activities by over 40%
  • Migrated 60+ notebooks from Python to PySpark, improving runtime by 85%
  • Led the technical activities of the US track of the project, managing a team of 3

📊 Skills and Certifications

  • Statistical Modeling | Machine Learning | Data Wrangling | Data Engineering | Cloud Computing | Data Mining
  • Python | R | SQL | PySpark | Microsoft Azure | Power BI | Hadoop | Apache Spark
  • Microsoft Certified Azure Data Engineer Associate (DP-203)

🚀 Projects

  • Enhancing Predictive Model Reliability with Bootstrap Techniques: Enhanced the reliability and computational efficiency of predictive models by implementing Bag of Little Bootstraps (BLB) across large datasets, achieving superior scalability and accuracy in uncertainty estimation
  • Optimized E-Commerce Sales Analysis with Azure ETL Pipeline: Built an advanced ETL pipeline leveraging Microsoft Azure and PySpark to analyze and optimize e-commerce sales, providing actionable insights through detailed data processing and analysis.
  • Enhancing Predictive Model Reliability with Bootstrap Techniques: Applied both standard Bootstrap and the Bag of Little Bootstraps (BLB) methods to assess the reliability and efficiency of predictive models in large datasets, offering scalable and robust statistical analysis
  • Automated ETL Pipeline for Enhanced Movie Data Insights: Developed a comprehensive, automated ETL pipeline using Microsoft Azure to efficiently process and analyze IMDb movie ratings data, ensuring seamless integration and storage in sophisticated reporting frameworks
  • NFL Player Evaluation: Conducted regression analysis and hypothesis testing to evaluate NFL players, establishing the significance of key factors beyond physical attributes
  • Flight Price Estimation: Predicted flight prices using several regression algorithms, such as XGBoost, SVR, and RandomForestRegressor, achieving a 95% accuracy score
  • Customer Churn Rate Prediction: Analyzed customer retention in online food sales, and leveraged machine learning models to predict customer churn rate with 92% classification accuracy
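
For readers unfamiliar with the Bag of Little Bootstraps (BLB) mentioned in the projects above, here is a minimal sketch of the general idea. It is not the code from the project repository. It assumes the statistic of interest is the sample mean and the quality measure is its standard error; the function name `blb_standard_error` and the parameters `n_subsets`, `n_resamples`, and `gamma` are hypothetical choices made for illustration.

```python
import random
import statistics
from collections import Counter


def blb_standard_error(data, n_subsets=5, n_resamples=50, gamma=0.7, seed=0):
    """Estimate the standard error of the sample mean with BLB (illustrative)."""
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # little-subset size b = n^gamma, much smaller than n
    subset_errors = []
    for _ in range(n_subsets):
        # Draw b distinct points without replacement
        subset = rng.sample(data, b)
        estimates = []
        for _ in range(n_resamples):
            # Multinomial counts: n draws spread over only the b subset points
            counts = Counter(rng.choices(range(b), k=n))
            weighted_sum = sum(subset[i] * c for i, c in counts.items())
            estimates.append(weighted_sum / n)  # weighted mean of a size-n resample
        # Quality measure for this subset: spread of the resampled means
        subset_errors.append(statistics.stdev(estimates))
    # Average the per-subset quality measures
    return statistics.mean(subset_errors)


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    print(blb_standard_error(sample))
```

Each resample behaves like a full-size bootstrap sample but only ever touches `b` distinct values, which is where the scalability on large datasets comes from. A standard bootstrap would instead resample all `n` points each time.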

More projects are available in my GitHub repositories.

Languages and Tools

azure c cplusplus docker git hadoop java mongodb mssql mysql pandas python pytorch scikit_learn seaborn tensorflow

Pinned

  1. Azure-ETL-AmazonSalesAnalysis (Public)

    A comprehensive ETL pipeline and sales analysis project leveraging Microsoft Azure and PySpark, designed to optimize e-commerce sales by providing actionable insights through detailed data analysis.

    Jupyter Notebook 3 2

  2. Azure-ETL-Pipeline-MovieAnalytics (Public)

    This project demonstrates an ETL pipeline using Microsoft Azure for IMDb Movie Rating Dataset analysis. It covers data extraction from Azure Blob Storage, transformation with Azure Databricks, and …

    Jupyter Notebook 1

  3. Flight-Price-Estimation (Public)

    Predict flight prices using machine learning. This project involves data preprocessing, exploratory analysis, feature engineering, and model training with various regression algorithms to accuratel…

    Jupyter Notebook

  4. Customer-Churn-Analysis (Public)

    This is a customer churn prediction project using machine learning algorithms like Logistic Regression, Random Forest, K-Nearest Neighbors, Support Vector Machine, XGBoost, and Gradient Boosting. T…

    Jupyter Notebook

  5. Baseball-Lahman-SQL-Analysis (Public)

    Dive into the world of baseball with an in-depth analysis of players using the Lahman Baseball Database. Explore comprehensive player statistics and insights to gain a deeper understanding of playe…

    1

  6. Online-Food-Sales-Customer-Retention (Public)

    A comprehensive analysis of customer retention in online food sales using various machine learning models and data preprocessing techniques.

    Jupyter Notebook