- Designed data warehouses for employment data on the labor market.
- Implementation of ETL processes loading information from the csv file to the data warehouse.
- Implementation of analytical business reports used to visualize statements of data from the data warehouse.
- The data warehouse should collect data on the number of people working with regard to: industry (branża), location (lokalizacja), gender (płeć) and age range (przedział wieku). According to the csv file, identify which of the of the listed attributes are dimensions and which represent a measure. Then a star-shape diagram should be developed with the necessary dimension tables and fact table. All designed tables should be on the graphical diagram.
- Using the Pentaho Data Integration tool, you should implement ETL processes that
will allow you to load data from the provided csv file into the data warehouse tables with
star-shape diagram. Note the following difficulties:
- The data are contaminated and should be ETL corrected, e.g. size letters in industry names (eg "Edukacja", "EdukacjA"), short or full gender name (eg "K", "kobiety"), excess spaces (eg "Transport, gospodarka magazynowa___i łączność")
- The age dimension is written in columns instead of rows; it is necessary "rotating" the columns with age brackets to a vertical layout
- the csv file uses national character encoding compatible with Windows1250
- Using Pentaho Report Designer you need to implement three analytical business reports. They must in these reports
the following elements may occur:
- presentation of data in tabular form
- presentation of data in the form of a graphical chart
- data presentation in the form of a pivot table (matrix layout)
- report calling parameter, entered by the user
- a graphic element, e.g. a logo