Skip to content

FL-Marine/Health-Analytics-Mini-Case-Study

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 

Repository files navigation

image

Health-Analytics-Mini-Case-Study

Mini Case Study is from Danny Ma's 8 Week SQL Challenge

The Objective of this case study is to debug SQL code and assist the General Manager of Health Analytics in answering questions for a board meeting.

Prior to answering the questions, SQL best practices encourage understanding the dataset and its fields.

This dataset includes 6 columns which are named as id, log_date, measure, measure_value, systolic, and diastolic. How this was determined by running this query (See Below).

SELECT * FROM health.user_logs;

Results:

id log_date measure measure_value systolic diastolic

Due to the large amount of data, I was initally unsure of how many measures were present, running the query below displayed 3 measures blood_glucose, blood_pressure, weight.

SELECT
  (DISTINCT meausure)
FROM
  health.user_logs;

Results:

measure
blood_glucose
blood_pressure
weight

Questions 1-9 will be displayed as screenshots of the the incorrect code followed by screenshots of the correct code.

Q.1

image

This will not run properly due to user_id not being a field to COUNT. The following code will produce the correct answer

SELECT
  COUNT (DISTINCT id)
FROM
  health.user_logs;

Results:

count
554

Q.2-8 I created a temporary table*

DROP TABLE IF EXISTS user_measure_count;
CREATE TEMP TABLE user_measure_count AS
SELECT 
  id,
  COUNT(*) AS measure_count,
  COUNT (DISTINCT measure) AS unique_measures
FROM health.user_logs
GROUP BY 1;

To ensure duplicate tables are not made DROP TABLE IF EXISTS is used followed by a CREATE TEMP TABLE

Q.2

image

The above query is incorrect because the mean is calculated used the AVG function. It must be aliased as the mean_value.

SELECT
  ROUND (AVG(measure_count), 2) AS mean_value
FROM user_measure_count;

Results:

mean_value
79.23

Q.3

image

The question is asking median number of measurements Per User must order by measure_count

Median isnt a funciton in PostgresSQL Ordered Set Aggregate Functions is required to get the median value.

SELECT 
   PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY measure_count) AS median_value 
FROM user_measure_count;

Results:

median_value
2

Q.4

image

This query will not work due to the HAVING clause which filters records from groups based on a specified condition. The correct clause to use is the WHERE clause because it filters data from a specific table.

SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 3;

Results:

count
209

Q.5

image

Summing id values doesn't make sense because id values are individual per user and will not produce the intended results. The question is asking how many users which needs the COUNT clause.

SELECT COUNT(*)
FROM user_measure_count
WHERE measure_count >= 1000;

Results:

count
5

Q.6

image

COUNT DISTINCT needs to be inside () 'blood_sugar' needs to be replaced with 'blood_glucose'

SELECT 
  COUNT(DISTINCT id)
FROM health.user_logs
WHERE measure = 'blood_glucose';

Results:

count
325

Q.7

image

COUNT(DISTINCT measures) does not exist because it was aliased using the AS clause to unique_measures.

SELECT 
  COUNT(*)
FROM user_measure_count
WHERE unique_measures >= 2;

Results:

count
204

Q.8

image

This question is asking how many users have all 3 measurements. User was not fully spelled in the FROM clause.

SELECT
  COUNT(*)
FROM user_measure_count
WHERE unique_measures = 3;

Results:

count
50

Q.9

image

This code will not run properly because GROUP clause is missing after each WITHIN clause. blood_pressure is a string and needs to be in single quotes 'blood_pressure'.

SELECT
  PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY systolic) AS median_systolic,
  PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY diastolic) AS median_diastolic
FROM health.user_logs
WHERE measure = 'blood_pressure';

Results:

median_systolic median_diastolic
126 79

Lessons Learned

Debugging Code

Summary statisitcs (mean & median)

Sorting & filtering Data

Dropping & Creating CTE's

Releases

No releases published

Packages

No packages published

Languages