Used_Car_Price

Predict car prices in used-vehicle listings from Craigslist.org

Context

Craigslist is the world's largest collection of used vehicles for sale, yet it's very difficult to collect all of them in the same place. I built a scraper for a school project and expanded upon it later to create this dataset which includes every used vehicle entry within the United States on Craigslist.

Content

The data is scraped every few months. It contains almost all of the relevant information that Craigslist provides on car sales, including columns such as price, condition, manufacturer, latitude/longitude, and 18 other categories. For ML projects, consider feature engineering on location columns such as latitude/longitude. For earlier listings, check older versions of the dataset.
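
One way to act on the latitude/longitude suggestion is a great-circle (haversine) distance feature. The sketch below is illustrative only: it assumes each listing is a dict with `lat` and `long` keys (as in the Kaggle file), and the reference point `US_CENTER` (an approximate centre of the contiguous US) is a hypothetical choice. A city centre or a regional hub would work the same way.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical reference point: approximate geographic centre of the contiguous US.
US_CENTER = (39.8283, -98.5795)


def add_distance_feature(listings):
    """Add a 'dist_to_center_km' key to every listing dict (None if coordinates are missing)."""
    for row in listings:
        lat, lon = row.get("lat"), row.get("long")
        if lat is None or lon is None:
            row["dist_to_center_km"] = None  # keep missing coordinates missing
        else:
            row["dist_to_center_km"] = haversine_km(lat, lon, *US_CENTER)
    return listings
```

A distance feature gives tree models a single smooth signal, whereas raw latitude and longitude need many splits to describe the same geography.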

The dataset is available on Kaggle at: https://www.kaggle.com/austinreese/craigslist-carstrucks-data

In this project I followed the steps of the CRISP-DM project management method. This method has been adapted to the realities of Data Science projects, and the adapted version is called CRISP-DS.

Its main principle is to run the project in multiple cycles, repeating them as often as necessary.

1 - Business Question

2 - Understanding the Business

3 - Data Collection

0.0 - IMPORTS
0.1 - Helper Function
0.2 - Loading Data
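
The loading step can be sketched with the standard `csv` module. This is not the notebook's code (which presumably uses pandas); it is a minimal stand-in, assuming a CSV with a header row. The function name `load_listings` is hypothetical, and so is the file name in the usage note. Empty cells become `None` so that missing values can be counted in the cleaning step; values stay strings until the "Change Types" step.

```python
import csv
import io


def load_listings(file_obj, usecols=None):
    """Read listings from an open CSV file into a list of dicts.

    Empty strings are converted to None so missing values can be counted later.
    If `usecols` is given, only those columns are kept.
    """
    reader = csv.DictReader(file_obj)
    rows = []
    for raw in reader:
        row = {
            col: (val if val != "" else None)
            for col, val in raw.items()
            if usecols is None or col in usecols
        }
        rows.append(row)
    return rows


# Small in-memory sample standing in for the real file.
sample = io.StringIO("id,price,year,manufacturer\n1,6000,2010,ford\n2,,2015,\n")
listings = load_listings(sample, usecols={"price", "year", "manufacturer"})
```

With the real data you would open the file with `open("vehicles.csv", newline="", encoding="utf-8")` (file name assumed) and pass the handle instead of `sample`.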

4 - Data Cleaning

1.0 - DESCRIPTION OF DATA
1.1 - Rename Columns
1.2 - Data Dimensions
1.3 - Data Types
1.4 - Check NA
1.5 - Fill Out NA
1.6 - Change Types
1.7 - Descriptive Statistics
- 1.7.1 - Numerical Attributes
- 1.7.2 - Categorical Attributes
2.0 - FEATURE ENGINEERING
2.1 - Creation of Hypotheses
- 2.1.1 - Demographic Hypotheses
- 2.1.2 - Geographic Hypotheses
- 2.1.3 - Sociocultural Hypotheses
2.2 - Final list of Hypotheses
2.3 - Feature Engineering
3.0 - VARIABLE FILTERING
3.1 - Line filtering
3.2 - Column Selection
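
The "Check NA", "Fill Out NA" and "Line filtering" steps can be illustrated with plain Python. This is a hedged sketch, not the notebook's rules. It assumes rows are dicts with `None` for missing values and that numeric columns have already been converted. The median/"unknown" fill strategy is one common choice, and the price bounds in `filter_price` are hypothetical placeholders.

```python
import statistics


def count_na(rows):
    """Number of missing (None) values per column."""
    counts = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (val is None)
    return counts


def fill_na(rows, numeric_cols, fill_text="unknown"):
    """Fill numeric columns with their median and all other columns with a placeholder."""
    medians = {}
    for col in numeric_cols:
        values = [r[col] for r in rows if r.get(col) is not None]
        medians[col] = statistics.median(values) if values else None
    filled = []
    for row in rows:
        new = {}
        for col, val in row.items():
            if val is not None:
                new[col] = val
            elif col in numeric_cols:
                new[col] = medians[col]
            else:
                new[col] = fill_text
        filled.append(new)
    return filled


def filter_price(rows, low=1_000, high=100_000):
    """Keep rows whose price falls inside a plausible range (bounds are illustrative)."""
    return [r for r in rows if r["price"] is not None and low <= r["price"] <= high]
```

In practice, filtering out implausible prices before computing medians keeps outliers such as $0 listings from distorting the fill values.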

5 - Data Exploration

4.0 - EXPLORATORY DATA ANALYSIS
4.1 - Univariate Analysis
- 4.1.1 - Response Variable
- 4.1.2 - Numerical Variable
- 4.1.3 - Categorical Variable
4.2 - Bivariate Analysis
- 4.2.1 - Summary of Hypotheses
4.3 - Multivariate Analysis
- 4.3.1 - Numerical Attributes
- 4.3.2 - Categorical Attributes
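
For the multivariate analysis of categorical attributes, one common measure is Cramér's V, computed from the chi-squared statistic of a contingency table: $V = \sqrt{\chi^2 / (n\,(k-1))}$, where $k$ is the smaller number of categories. The README does not say which measure the notebook uses, so treat this as an assumed, uncorrected (no bias correction) sketch.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two categorical sequences (0 = no association, 1 = perfect)."""
    n = len(x)
    x_counts, y_counts = Counter(x), Counter(y)
    k = min(len(x_counts), len(y_counts))
    if k < 2:
        return 0.0  # one variable is constant, so no association can be measured
    pair_counts = Counter(zip(x, y))
    chi2 = 0.0
    for xi, nx in x_counts.items():
        for yj, ny in y_counts.items():
            expected = nx * ny / n
            observed = pair_counts[(xi, yj)]  # Counter returns 0 for unseen pairs
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))
```

Computing this for every pair of categorical columns (for example `manufacturer`, `fuel`, `condition`) yields a matrix that can be shown as a heatmap, analogous to a Pearson correlation matrix for the numerical attributes.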

6 - Data Modeling

5.0 - DATA PREPARATION
5.1 - Normalization
5.2 - Rescaling
5.3 - Transformation
- 5.3.1 - Encoding
- 5.3.2 - Response Variable Transformation
- 5.3.3 - Nature Transformation
6.0 - FEATURE SELECTION
6.1 - Split the DataFrame into Training and Test Datasets
6.2 - Boruta as Feature Selection
- 6.2.1 - Best Features from Boruta
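
Several of the preparation steps above reduce to short formulas. The sketch below shows min-max rescaling, a `log1p` transform of the response variable (with its inverse), a sine/cosine encoding for cyclical features, and a seeded train/test split. It assumes "Nature Transformation" refers to the cyclical encoding, and the function names are hypothetical; Boruta itself comes from a separate library and is not reproduced here.

```python
import math
import random


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def log_price(prices):
    """log1p transform to reduce the right skew of the response variable."""
    return [math.log1p(p) for p in prices]


def inverse_log_price(log_prices):
    """Undo log_price so predictions are back in dollars."""
    return [math.expm1(v) for v in log_prices]


def cyclical_encode(value, period):
    """Map a cyclical feature (e.g. month with period=12) to a point on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]
```

The sine/cosine pair places December next to January, which a raw month number cannot do. Scaling parameters such as `min` and `max` should be computed on the training split only and then reused on the test split, so no test information leaks into training.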

7 - Machine Learning Algorithms

7.0 - MACHINE LEARNING MODELLING
7.1 - Average Model
7.2 - Linear Regression Model
- 7.2.1 - Linear Regression Model - Cross Validation
7.3 - Linear Regression Regularized Model
- 7.3.1 - Linear Regression - Lasso - Cross Validation
7.4 - Random Forest Regressor
- 7.4.1 - Random Forest Regressor - Cross Validation
7.5 - XGBoost Regressor
- 7.5.1 - XGBoost Regressor - Cross Validation
7.6 - Compare Models' Performance
- 7.6.1 - Single Performance
- 7.6.2 - Real Performance - Cross Validation
8.0 - HYPERPARAMETER FINE TUNING
8.1 - Random Search
8.2 - Final Model
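
The modelling steps share the same evaluation scaffolding: an error report (MAE, MAPE, RMSE), the average-model baseline, k-fold cross-validation, and random search over hyperparameters. The Lasso, Random Forest and XGBoost models come from their own libraries and are not reproduced here. This plain-Python sketch assumes a model is any `fit(X, y)` that returns a `predict(rows)` function. The helper names are hypothetical, and the folds are contiguous, so rows should be shuffled beforehand.

```python
import math
import random
import statistics


def ml_error(model_name, y_true, y_pred):
    """Return MAE, MAPE and RMSE for one set of predictions (assumes no zero prices)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return {"model": model_name, "MAE": mae, "MAPE": mape, "RMSE": rmse}


def fit_average_model(X, y):
    """Baseline: ignore the features and always predict the mean training price."""
    mean_price = statistics.fmean(y)
    return lambda rows: [mean_price] * len(rows)


def k_fold_cv(X, y, fit, k=5):
    """Mean MAE over k contiguous folds; `fit(X, y)` must return a predict function."""
    n = len(y)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of rows")
    fold_size = n // k
    maes = []
    for i in range(k):
        start = i * fold_size
        stop = n if i == k - 1 else start + fold_size  # last fold takes the remainder
        predict = fit(X[:start] + X[stop:], y[:start] + y[stop:])
        maes.append(ml_error("cv", y[start:stop], predict(X[start:stop]))["MAE"])
    return statistics.fmean(maes)


def random_search(param_grid, evaluate, n_iter=10, seed=0):
    """Try n_iter random combinations from param_grid and keep the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in param_grid.items()}
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Any candidate model should beat the average baseline's cross-validated MAE to be worth keeping. For random search, `evaluate` would typically wrap `k_fold_cv` around a model built with the sampled parameters.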

8 - Evaluation of Algorithms

9.0 - TRANSLATION AND INTERPRETATION OF THE ERROR
9.1 - Business Performance
9.2 - Total Performance
9.3 - Machine Learning Performance
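
Translating the error into business terms usually means reporting predicted totals together with a worst/best-case band of plus or minus the MAE. The sketch below assumes results are grouped by a key such as `region`, and the labels "worst" and "best" assume a seller's perspective (a lower price is worse). Both the grouping and the function names are illustrative, not taken from the notebook.

```python
from collections import defaultdict


def business_performance(records):
    """Per-group predicted total with a +/- MAE band.

    `records` is an iterable of (group, y_true, y_pred) tuples, e.g. group = region.
    """
    groups = defaultdict(lambda: {"y_true": [], "y_pred": []})
    for group, y_true, y_pred in records:
        groups[group]["y_true"].append(y_true)
        groups[group]["y_pred"].append(y_pred)
    report = {}
    for group, data in groups.items():
        pairs = list(zip(data["y_true"], data["y_pred"]))
        n = len(pairs)
        mae = sum(abs(t - p) for t, p in pairs) / n
        mape = sum(abs(t - p) / abs(t) for t, p in pairs) / n
        total = sum(data["y_pred"])
        report[group] = {
            "predicted_total": total,
            "MAE": mae,
            "MAPE": mape,
            "worst_scenario": total - mae * n,  # every listing under-priced by the MAE
            "best_scenario": total + mae * n,  # every listing over-priced by the MAE
        }
    return report


def total_performance(report):
    """Aggregate the per-group report into overall totals."""
    keys = ("predicted_total", "worst_scenario", "best_scenario")
    return {k: sum(r[k] for r in report.values()) for k in keys}
```

Sorting the per-group report by MAPE is a quick way to spot regions where the model is much less reliable than its overall metrics suggest.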

9 - Production Model

10.0 - DEPLOY MODEL TO PRODUCTION
10.1 - Energy Consumption Class
10.2 - API Handler
10.3 - Tester
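
The deployment pieces (a class that wraps the pipeline, an API handler, and a tester) can be sketched with the standard library alone. Everything below is hypothetical: `UsedCarPricePipeline`, its single cleaning rule and the `year`/`odometer` keys are placeholders for the notebook's real steps, and the model is assumed to predict `log1p(price)`. The notebook's actual production stack is not described in this README.

```python
import json
import math
from http.server import BaseHTTPRequestHandler


class UsedCarPricePipeline:
    """Hypothetical production class: cleaning plus prediction for raw listings."""

    def __init__(self, predict_log_price):
        self.predict_log_price = predict_log_price  # trained model, predicts log1p(price)

    def data_cleaning(self, rows):
        # Illustrative rule: drop listings missing fields the model needs.
        return [r for r in rows if r.get("year") is not None and r.get("odometer") is not None]

    def get_prediction(self, rows):
        # The model was trained on log1p(price), so invert with expm1.
        for r in rows:
            r["prediction"] = math.expm1(self.predict_log_price(r))
        return rows


def handle_request(pipeline, body):
    """Parse a JSON listing (or list of listings), predict prices, return a JSON string."""
    payload = json.loads(body)
    rows = payload if isinstance(payload, list) else [payload]
    return json.dumps(pipeline.get_prediction(pipeline.data_cleaning(rows)))


def make_handler(pipeline):
    """Build an http.server handler class that answers POST requests with predictions."""

    class PredictHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode("utf-8")
            response = handle_request(pipeline, body).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(response)))
            self.end_headers()
            self.wfile.write(response)

    return PredictHandler

# To serve locally (blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 5000), make_handler(pipeline)).serve_forever()
```

Keeping `handle_request` separate from the HTTP handler lets the tester exercise the whole pipeline by passing a JSON string, without starting a server.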

Repository: panambY/Used_Car_Price

Folders and files

Latest commit

History

Repository files navigation

Used_Car_Price

Context

Content

1 - Business Question

2 - Understanding the Business

3 - Data Collection

4 - Data Cleaning

5 - Data Exploration

6 - Data Modeling

7 - Machine Learning Algorithms

8 - Evaluation of Algorithms

9 - Production Model

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages