I love the scikit-learn library for the myriad machine learning functionalities it offers, as well as for its easy-to-follow documentation.
The sklearn.pipeline module in scikit-learn provides classes and functions (such as Pipeline and FeatureUnion) for automating a machine learning workflow. Although pipelines accept pandas DataFrame input, their output is a NumPy array by default, so column names and the index are lost, which is not always desirable.
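To make the problem concrete, here is a minimal sketch with toy data (assuming a recent scikit-learn with its default output settings). It feeds a DataFrame through a two-step Pipeline, and the result comes back as a plain NumPy array.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy DataFrame with named columns and a custom index (illustrative data only).
df = pd.DataFrame(
    {"age": [25.0, None, 47.0], "income": [50_000.0, 62_000.0, None]},
    index=["a", "b", "c"],
)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

out = pipe.fit_transform(df)
print(isinstance(out, np.ndarray))  # True
# The column names ("age", "income") and the index ("a", "b", "c") are gone.
```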
Here is my solution to this problem. In this repo, you will find two Python scripts:
- utils.py (contains custom classes written for demonstration purposes)
- test_utils.py (a Python script that demonstrates my "Pandas In, Pandas Out" pipeline scheme)
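The actual classes live in utils.py and are not reproduced here. As a rough illustration of the "Pandas In, Pandas Out" idea, the sketch below assumes a hypothetical wrapper, PandasTransformer, that fits any column-preserving scikit-learn transformer and rebuilds a DataFrame from its output using the input's column names and index. It is an assumption-based sketch, not the code in utils.py.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class PandasTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper (not the class in utils.py): fits a wrapped
    scikit-learn transformer and re-wraps its NumPy output as a DataFrame."""

    def __init__(self, transformer):
        self.transformer = transformer

    def fit(self, X, y=None):
        # Fit a clone so the estimator passed in by the user stays unfitted.
        self.transformer_ = clone(self.transformer).fit(X, y)
        self.columns_ = list(X.columns)
        return self

    def transform(self, X):
        values = self.transformer_.transform(X)
        # Assumes one output column per input column, in the same order,
        # which holds for scalers and for imputers without all-NaN columns.
        return pd.DataFrame(values, index=X.index, columns=self.columns_)


df = pd.DataFrame(
    {"age": [25.0, None, 47.0], "income": [50_000.0, 62_000.0, None]},
    index=["a", "b", "c"],
)

pipe = Pipeline([
    ("impute", PandasTransformer(SimpleImputer(strategy="mean"))),
    ("scale", PandasTransformer(StandardScaler())),
])

out = pipe.fit_transform(df)
print(type(out).__name__)  # DataFrame
print(list(out.columns))   # ['age', 'income']
```

Wrapping each step, rather than post-processing the final array, keeps column names available to every downstream step. The trade-off is that the wrapper only suits transformers whose output columns map one-to-one onto their input columns.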
This work would not have been possible without the great resources given below:
- http://flennerhag.com/2017-01-08-Recursive-Override/
- https://github.com/marrrcin/pandas-feature-union (Please find the article link here)
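The second resource above concerns a FeatureUnion that returns a DataFrame. The sketch below illustrates that idea without reproducing the linked repository's code. Instead of subclassing FeatureUnion and relying on its private methods, it defines a small, hypothetical DataFrameUnion that fits each DataFrame-returning transformer on the same input and joins the results with pd.concat(axis=1). ColumnSelector, Log1pFeatures and DataFrameUnion are all illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone


class ColumnSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: passes through a subset of DataFrame columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # stateless

    def transform(self, X):
        return X[self.columns]


class Log1pFeatures(TransformerMixin, BaseEstimator):
    """Hypothetical helper: log(1 + x) of selected columns, suffixed '_log1p'."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # stateless

    def transform(self, X):
        return np.log1p(X[self.columns]).add_suffix("_log1p")


class DataFrameUnion(TransformerMixin, BaseEstimator):
    """Hypothetical FeatureUnion-like transformer: fits each DataFrame-returning
    transformer on the same input and joins the results with pd.concat."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def fit(self, X, y=None):
        self.fitted_ = [(name, clone(t).fit(X, y)) for name, t in self.transformer_list]
        return self

    def transform(self, X):
        parts = [t.transform(X) for _, t in self.fitted_]
        # axis=1 aligns rows on the index, so each part must keep X's index.
        return pd.concat(parts, axis=1)


df = pd.DataFrame({"age": [25.0, 32.0, 47.0], "income": [50_000.0, 62_000.0, 58_000.0]})
union = DataFrameUnion([
    ("raw", ColumnSelector(["age", "income"])),
    ("log", Log1pFeatures(["age", "income"])),
])
out = union.fit_transform(df)
print(list(out.columns))  # ['age', 'income', 'age_log1p', 'income_log1p']
```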
Other insightful articles on this topic: