This project analyzes glacier data in the USA. It uses several data sources and machine learning techniques to predict glacier types, and it visualizes US glacier trends such as location, glacier area, elevation, and temperature.
- Glacier Dataset: NASA - NSIDC Data (https://nsidc.org/home)
- Temperature Dataset: NCEI Climate at a Glance (https://www.ncei.noaa.gov)
- Location Data: Simple Maps US Cities (https://simplemaps.com/data/us-cities)
- Analysis: Contains Statistical Analysis workbook
- Dashboard: Final dashboard and resource files (images and HTML files). Open `index.html` to run the dashboard.
- Database: SQL database
- Notebook: Jupyter notebooks for data cleaning and modeling.
- Resources: Original data and resources used.
- .gitattributes
- .gitignore: Ensures unnecessary files and folders are excluded.
- README.md
- Data Acquisition: Glacier data was acquired from NASA in .kml format, along with supporting temperature and location data.
- Cleaning: The data was loaded into a Jupyter Notebook, unnecessary columns were removed, and the remaining data was cleaned.
- Saving: The cleaned data was saved to a CSV file, loaded into a database, and then read back into a DataFrame.
- Separation: Numerical data was separated from categorical data.
- Dummy Variables: Created dummy variables from categorical data and merged them back.
- Splitting: Data was split into testing and training sets.
- Model Initialization: Initialized a `RandomForestClassifier` and trained the model.
- Prediction: Initially predicted glacier existence, but shifted to predicting glacier types due to poor initial results.
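- Performance: Achieved a score of 93% when predicting glacier types. Because this is a classification task, the score is classification accuracy; R-squared is a regression metric.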
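The acquisition and saving steps above can be sketched with the standard library and pandas. This is a minimal sketch, not the project's notebook code. It assumes the KML uses the standard `http://www.opengis.net/kml/2.2` namespace, stores glacier attributes in `ExtendedData`/`Data` elements, and uses `Point` geometry. SQLite stands in for the project's SQL database. The helper names, the `glaciers` table, and the sample placemark are all hypothetical.

```python
import os
import sqlite3
import tempfile
import xml.etree.ElementTree as ET

import pandas as pd

# Assumption: the file uses the standard KML 2.2 namespace.
KML = "{http://www.opengis.net/kml/2.2}"
NS = {"kml": "http://www.opengis.net/kml/2.2"}


def kml_to_dataframe(kml_text):
    """Flatten KML Placemarks into one DataFrame row per glacier (hypothetical helper)."""
    root = ET.fromstring(kml_text)
    rows = []
    for placemark in root.iter(f"{KML}Placemark"):
        row = {"name": placemark.findtext("kml:name", default="", namespaces=NS)}
        # Assumption: attributes are stored as <Data name="..."><value>...</value></Data>.
        for data in placemark.findall(".//kml:ExtendedData/kml:Data", NS):
            row[data.get("name")] = data.findtext("kml:value", namespaces=NS)
        # KML coordinates are "longitude,latitude[,altitude]".
        coords = placemark.findtext(".//kml:Point/kml:coordinates", namespaces=NS)
        if coords:
            lon, lat = coords.strip().split(",")[:2]
            row["longitude"], row["latitude"] = float(lon), float(lat)
        rows.append(row)
    return pd.DataFrame(rows)


def round_trip_through_sqlite(df, csv_path, db_path, table="glaciers"):
    """Save to CSV, load into SQLite, then read back into a DataFrame."""
    df.to_csv(csv_path, index=False)
    cleaned = pd.read_csv(csv_path)  # read_csv also infers numeric dtypes
    conn = sqlite3.connect(db_path)
    try:
        cleaned.to_sql(table, conn, if_exists="replace", index=False)
        return pd.read_sql_query(f"SELECT * FROM {table}", conn)
    finally:
        conn.close()


# Toy sample placemark; field names are hypothetical.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2"><Document>
  <Placemark>
    <name>Example Glacier</name>
    <ExtendedData>
      <Data name="area"><value>12.5</value></Data>
      <Data name="glac_type"><value>Mountain</value></Data>
    </ExtendedData>
    <Point><coordinates>-148.9,61.2,0</coordinates></Point>
  </Placemark>
</Document></kml>"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        glaciers = round_trip_through_sqlite(
            kml_to_dataframe(SAMPLE_KML),
            os.path.join(tmp, "glaciers.csv"),
            os.path.join(tmp, "glaciers.db"),
        )
    print(glaciers)
```

Real NSIDC files may use polygon geometry or `SchemaData` instead, in which case the XPath expressions would need adjusting.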
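The modeling steps (separation, dummy variables, splitting, training, scoring) can be sketched with pandas and scikit-learn. This is a minimal sketch under stated assumptions, not the project's notebook. The target column `glac_type`, the 25% test split, `n_estimators=100`, and the toy data are hypothetical. `RandomForestClassifier.score` returns mean accuracy on the test data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_glacier_type_model(df, target="glac_type", random_state=42):
    """Hypothetical sketch of the modeling steps; column names are assumptions."""
    y = df[target]
    features = df.drop(columns=[target])
    # Separate numerical data from categorical data.
    numeric = features.select_dtypes(include="number")
    categorical = features.select_dtypes(exclude="number")
    # Create dummy variables from the categorical columns and merge them back.
    X = pd.concat([numeric, pd.get_dummies(categorical)], axis=1)
    # Split into training and testing sets, keeping class proportions similar.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    # For classifiers, score() is mean accuracy, not R-squared.
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    area = rng.uniform(0, 300, n)
    toy = pd.DataFrame({
        "area": area,
        "elevation": rng.uniform(0, 4000, n),
        "region": rng.choice(["Alaska", "Washington"], n),
        # Toy labels that depend only on area, so the forest can learn them.
        "glac_type": np.where(area > 150, "valley", "mountain"),
    })
    model, accuracy = train_glacier_type_model(toy)
    print(f"test accuracy: {accuracy:.2f}")
```

Stratifying the split keeps rare glacier types represented in both sets. It does require at least two examples of every class.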
- Temperature Time Series: Average temperatures from 1990 to 2024.
- Scatter Plot: Shows the distribution of area across clusters. Most clusters have small areas, but a few outliers are significantly larger; the average cluster area is about 144.0712 square units.
- Stacked Line Graph: Displays the difference between the average minimum and maximum elevation in each glacier cluster.
- Location Scatterplot: Plots cluster locations by latitude and longitude; the points trace the western border of Alaska, where most American glaciers are located.
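The per-cluster figures behind the scatter plot and the stacked line graph can be computed with a pandas `groupby`. This is a minimal sketch that assumes each row is one glacier with hypothetical `cluster`, `area`, `min_elev`, and `max_elev` columns, and that a cluster's area is the sum of its glaciers' areas. The project may define cluster area differently.

```python
import pandas as pd


def summarize_clusters(df, cluster_col="cluster"):
    """Per-cluster area and elevation summary (column names are hypothetical)."""
    summary = df.groupby(cluster_col).agg(
        total_area=("area", "sum"),
        avg_min_elev=("min_elev", "mean"),
        avg_max_elev=("max_elev", "mean"),
    )
    # Gap between average maximum and average minimum elevation per cluster.
    summary["elev_range"] = summary["avg_max_elev"] - summary["avg_min_elev"]
    return summary


if __name__ == "__main__":
    toy = pd.DataFrame({
        "cluster": [1, 1, 2],
        "area": [10.0, 20.0, 5.0],
        "min_elev": [100, 200, 300],
        "max_elev": [500, 600, 1300],
    })
    summary = summarize_clusters(toy)
    print(summary)
    print(summary["total_area"].mean())  # average cluster area: 17.5
```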
- Glacier Map with Interactive Features:
  - Map of Glaciers: Displays the locations of glaciers.
  - Dropdown Menu: Allows selection of a specific glacier.
  - Specifications: Shows the date of analysis, area, and location.
  - Hover Box: Indicates whether the glacier still exists or no longer exists.
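One common way to supply data to a web map like this is GeoJSON. The sketch below is an assumption, not the dashboard's actual data format. It shows how each glacier's dropdown label, specifications, and hover-box status could be packed into a GeoJSON `FeatureCollection` using only the standard library. The input keys (`name`, `lat`, `lon`, `analysis_date`, `area`, `exists`) and the sample values are hypothetical.

```python
import json


def to_geojson(glaciers):
    """Build a GeoJSON FeatureCollection for a web map (hypothetical input keys)."""
    features = []
    for g in glaciers:
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [g["lon"], g["lat"]]},
            "properties": {
                "name": g["name"],                    # dropdown label
                "analysis_date": g["analysis_date"],  # specifications panel
                "area": g["area"],
                # Text for the hover box.
                "status": "Still exists" if g["exists"] else "No longer exists",
            },
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [{"name": "Example Glacier", "lat": 61.2, "lon": -148.9,
               "analysis_date": "2005-08-01", "area": 12.5, "exists": True}]
    print(json.dumps(to_geojson(sample), indent=2))
```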
- Geographical Scope: The data is limited to the USA, primarily Alaska and other northern regions.
- Feature Imbalance: The data contains an uneven number of features compared to what its documentation describes.
- Recommendation: Use global data for a more comprehensive analysis.
This project showcases the data cleaning, modeling, and visualization process used to analyze glacier data in the United States. The model demonstrated high accuracy in predicting glacier types. Visualizations combining temperature, area, geographic location, and elevation provided meaningful insights into glacier trends. Future work could build on these findings by integrating higher-quality data and additional environmental factors, and by extending the analysis to global data over longer periods. This would deepen our understanding of glacier dynamics and improve our ability to predict how glaciers form and evolve.