Skip to content

This is a Data Visualization repository of Customer data using Sankey Chart.

License

Notifications You must be signed in to change notification settings

shanzson/Customer-Data-Visualization-using-Sankey-Chart

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ask Me Anything ! made-with-Markdown GitHub license

Customer-Data-Visualization-using-Sankey-Chart

Introduction

The data consists of three columns- userid, To and From. To visualize and analyze the data this data- histograms, boxplots and Sankey Chart has been.

Histograms

Histograms allow us to see the frequency of data points that lie within a certain range.

Box plots

  • Here, the boxplots of to and from data give us the minimum and maximum values of To and From data which will help us in Plotting the Sankey Chart

Plotting the Sankey Chart

In order to plot the Sankey chart, the data points need to be divided into regions. We will be plotting will be plotting the To values from the data as source for the Sankey diagram and From values from the data as target. In order to do this, we will need to make regions so that we can label the Sankey chart. The following regions or labels have been used for these data values-

Label Range
Region One 100 to 114
Region Two 115 to 129
Region Three 130 to 144
Region Four 145 to 159
Region Five 160 to 174
Region Six 175 to 189
Region Seven 190 to 204

As you can see, a difference of 15 is used to divide the data points into these regions. We will be using this to rank the region of both To and From data.

Sankey Chart

We can see the following Sankey chart below-

In order to find out the number of unique users that have gone from one region to another, just hover the arrow over the flow path. Here we can see that 12 unique users have gone from Region Two to Region One.

The incoming flow count, outgoing flow count and the total number of users passing through any region can be found out by clicking on the label itself. Here we can see that the total number of unique users passing through Region Two is 80.

Primary packages Used

  • pandas
  • matplotlib
  • plotly

References

Contributors

anupam
Shantanu Sontakke

About

This is a Data Visualization repository of Customer data using Sankey Chart.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages