Skip to content

Pavan-Satyam/datamining-cmpe255

Repository files navigation

datamining-cmpe255Homework1-DataMining-CMPE255

Colab notebook: Homework1_CMPE255.ipynb Tableau notebook and other project details are uploaded in Github Data: Data is acquired from Opendata Telangana, which is an open-source data.(link)

Problem Description: I have data regarding revenue of Telanagana .We need to derive insights on state development from the data we have.

I have data from state of Telangana regarding the state revenue. dim_districts dim_date fact_stamps fact_TS_iPASS MetaData: Column description for dim_districts: *The table contains information about districts. dist_code: This column represents the district code or identifier for each district. district: This column represents the name of the district. Column description for dim_date: This table contains the dates at the monthly level. Please be aware that the fiscal year of Telangana spans from April to March.

month: This column contains the starting date of each month.

Mmm: This column contains the name of the month.

quarter: This column contains the associated quarter for each particular month.

fiscal_year: This column contains the corresponding fiscal year of each month.

Column description for fact_stamps: The table provides data on the revenue generated from document registrations and estamp challan payments aggregated at the district and monthly level.

dist_code: This column represents the district code.

month: This column represents the starting date of each month. documents_registered_cnt: This column represents the total count of documents registered. documents_registered_rev: This column represents the total revenue generated from the registered documents which include like stamp duty, other taxes etc .

estamps_challans_cnt: This column represents the count of e-stamps challans.

estamps_challans_rev: This column represents the revenue generated by online stamp duty submissions.

Column description for fact_TS_iPASS: The TS-iPASS dataset in Telangana comprises data concerning units or businesses established within the state under the "Industrial Project Approval and Self-Certification System" (iPASS). This government initiative aims to foster industrial growth and investment by streamlining project approvals and enabling self-certification for businesses. For further details, visit: https://ipass.telangana.gov.in/

dist_code: This column represents the district code.

month: This column represents the starting date of each month.

sector: This column represents the industry category. Examples of sectors include 'Automobiles', 'Beverages', 'Engineering', 'Food Processing', etc.

investment in cr: The column represents the investment made in the specific sector, measured in crores (a unit of currency), for the corresponding district and month. number_of_employes: This column represents the number of employees associated with that sector for given district and respective month.*

Preprocessing : In table fact_TS_iPASS there are no district name ,I merged them from the 'dim_districts' file. After joining the CSV files, the output is saved as 'mer_iPAss'. I did this to make it easier since reading district names can be a bit difficult, and everyone would be more familiar with them.

Analysis 1:

What are the top districts where most of the money is invested in?

Clearly Rangareddy , Sangareddy, Medchal are the top three districts for investment.

Analysis 2 : What if stakeholder needed for different sectors?

I have implemented a dropdown menu for selecting sectors. This allows stakeholders to view individual sectors and identify the top-performing districts in each specific sector. For example, in Electrical and Electronics Production, Sangareddy ranks as the top district. Analysis 3: Is there a relation between the Stamps revenue and Investment? I have calculated a correlation between e-stamps revenue and investments in crores. Initially, we considered data from 2021 onwards and grouped it by date. The correlation between them was found to be 0.77. This suggests that when there is an increase in investment, there is a corresponding increase in e-stamps revenue, and vice versa.

Analysis 4: Which district had the most revenue in Documents registering?

This is regarding IStamps dataset.I have created a GeoJSON file to depict the evolving financial landscape of Telangana over time. Using distinct colors to denote varying revenue levels, this visualization is hosted on a local server and accessible through Tableau after logging in. Think of it as a time-lapse of the state, showcasing the generated revenue.

This visualization serves multiple purposes. It enables us to grasp revenue trends in Telangana across time, pinpoint regions with high and low revenue, and track the impact of governmental policies and initiatives. Furthermore, it facilitates comparisons between the revenue performance of different districts and regions.

In this visualization, we can observe a significant increase in revenue per district over time, as represented on the timeline.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages