- Dependencies
- Project Introduction
- Business Understanding
- File Descriptions
- Results
- Licensing, Authors, and Acknowledgements
The code should run without issues on Python 3. The other libraries used in this project are:
- scikit-learn
- numpy
- matplotlib
- seaborn
- pandas
Every year since 2017, Kaggle has conducted an industry-wide survey that presents a comprehensive view of the state of data science and machine learning. The survey was live for 3.5 weeks in October and drew 20,036 responses from over 55 countries and diverse demographics. The questions covered frequently used ML algorithms, frameworks, cloud platforms, and products, as well as preferred programming languages and many other topics.
As more companies enter the digital world, data science is becoming increasingly important to their growth and development, so the demand for data science practitioners will continue to rise. Data science is also continuously evolving, with new tools entering the field regularly. Aspiring data scientists therefore need to understand the field's current tools and practices in order to enter it. To understand the current trends, tools, frameworks, and practices in data science, I analyzed the Kaggle 2020 Data Science and Machine Learning Survey (dataset) by answering 43 questions through data and visualization.
Some of the questions are:
- What is the highest level of formal education attained by the practitioners in the survey?
- Which programming languages do the data science practitioners use on a regular basis?
- Which integrated development environments (IDEs) do the data science practitioners use on a regular basis?
- Which data visualization libraries or tools do data science practitioners use on a regular basis?
- What are the job titles of Data Science practitioners?
- Is there a difference in the salaries of data science practitioners in India and the USA? Is there a correlation between education status and salary?
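Several of these questions ("which languages / IDEs / libraries do you use regularly?") are multi-select, so each answer option typically ends up in its own column. The following is a minimal sketch of how such answers could be tallied with pandas, not the notebook's actual code. The helper `count_multi_select`, the `lang_Part_` column prefix, and the toy data are all hypothetical and only illustrate the idea; the real survey file uses its own column names.

```python
import pandas as pd


def count_multi_select(df, prefix):
    """Count how often each option was selected across all columns
    whose names start with `prefix` (hypothetical naming scheme)."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    # Assumption: a cell holds the option label if selected, otherwise it is missing.
    answers = df[cols].melt(value_name="answer")["answer"].dropna()
    return answers.value_counts()


# Toy data standing in for the survey responses (not real survey data).
toy = pd.DataFrame({
    "lang_Part_1": ["Python", "Python", None, "Python"],
    "lang_Part_2": ["R", None, "R", None],
    "lang_Part_3": [None, "SQL", None, None],
})

language_counts = count_multi_select(toy, "lang_Part_")
print(language_counts)
```

The resulting Series is sorted by frequency, so it can be passed directly to a bar plot.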
There is a single notebook available here to showcase work related to the above questions. The notebook contains 4 different visualization sections.
- Part 1: Insights from demographic responses of data science practitioners
- Part 2: Tools used by the data science practitioners on a regular basis
- Part 3: The skills that data science practitioners want to acquire in the next 2 years
- Part 4: Bivariate Analysis on specific columns (Comparison between India and USA)
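As an illustration of the kind of bivariate comparison described in Part 4, the sketch below computes, for each country, the share of respondents in each category of a second column. It is written under assumptions: the helper `share_by_country`, the column names `country` and `salary_band`, and the toy rows are hypothetical, and the country labels in the real survey data may be spelled differently.

```python
import pandas as pd


def share_by_country(df, country_col, category_col, countries):
    """Return a table of per-country shares for each category.

    Rows are countries, columns are categories, and each row sums to 1.
    """
    subset = df[df[country_col].isin(countries)]
    return pd.crosstab(subset[country_col], subset[category_col], normalize="index")


# Toy data standing in for the survey responses (not real survey data).
toy = pd.DataFrame({
    "country": ["India", "India", "India", "India", "USA", "USA", "Brazil"],
    "salary_band": [
        "0-999", "0-999", "0-999", "100,000-124,999",
        "100,000-124,999", "100,000-124,999", "0-999",
    ],
})

shares = share_by_country(toy, "country", "salary_band", ["India", "USA"])
print(shares)
```

Normalizing within each country makes the two groups comparable even when one has far more respondents than the other. The same helper could be reused for coding experience or education status by changing `category_col`.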
The notebook is self-explanatory, with Markdown cells provided to guide the reader through it.
There is an additional sample_images folder that contains images of the visualizations from the notebook, for a quick demonstration of the key findings in the results section below.
The key insights from the code can be found at the post available here.
Some visualizations from the data science survey
- Highest level of formal education
- Programming Languages
- IDEs
- Visualization tools/libraries
- Machine Learning Frameworks
- India vs USA (Salary Comparison)
- India vs USA (Coding experience Comparison)
- India vs USA (Education Status Comparison)
- Job Titles
Credit must go to Kaggle for the data and the Python 3 notebook. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!