run_analysis.R
Program submitted for course project
Submitted by: Chiau Ho ONG
The run_analysis.R program does the following:
Steps 1 to 8 produce the data set required for step 4 of the assignment:
- Reads in the test and train data sets ("X_test.txt" and "X_train.txt")
- Combines the two data sets into one (total_data)
- Reads in the descriptions of each column (variables) from file "features.txt". This file contains the description of each column of the test and train data sets.
- Gives each column of total_data a meaningful name by using the variables read from "features.txt" as the column names. The number of columns matches the number of variables.
- Extracts only the columns that record mean and std measurements and ignores all other columns. The data frame total_data now contains only columns with mean and std measurements.
- Reads in the subject ID and activity files ("subject_test.txt", "y_test.txt", "subject_train.txt" and "y_train.txt"). Merges Subject_test and Subject_train into one, and merges y_test and y_train into one.
- Adds subject and y (activities) to total_data as columns 1 and 2. The new data set is c_total_data. Gives meaningful column names to the subject and activity columns.
- Finally, replaces the activity codes in y with meaningful activity names. c_total_data is the data set required in step 4 of the assignment.
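
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in run_analysis.R: the small in-memory data frames are toy stand-ins for the real files (which would be loaded with read.table), the regular expression used to pick the mean and std columns is an assumption, and the activity labels are assumed to follow the numbering in the data set's "activity_labels.txt" file.

```r
# Toy stand-ins for the real inputs (hypothetical values, 3 features only).
# With the real files these would come from e.g. read.table("X_test.txt").
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")
x_test  <- data.frame(V1 = c(0.1, 0.2), V2 = c(0.3, 0.4), V3 = c(0.5, 0.6))
x_train <- data.frame(V1 = c(0.7, 0.8), V2 = c(0.9, 1.0), V3 = c(1.1, 1.2))

# Combine test and train rows, then name the columns from features.txt
total_data <- rbind(x_test, x_train)
names(total_data) <- features

# Keep only mean() and std() columns (assumed pattern; the actual script
# may select the columns differently)
keep <- grepl("mean\\(\\)|std\\(\\)", names(total_data))
total_data <- total_data[, keep, drop = FALSE]

# Subjects and activity codes, in the same test-then-train order
subject <- c(c(2, 2), c(1, 1))   # toy subject_test, subject_train
y       <- c(c(5, 4), c(5, 5))   # toy y_test, y_train

# Put subject and activity in columns 1 and 2
c_total_data <- cbind(Subject_ID = subject, Activities = y, total_data)

# Replace activity codes with descriptive names (assumed label order)
activity_labels <- c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
                     "SITTING", "STANDING", "LAYING")
c_total_data$Activities <- factor(c_total_data$Activities,
                                  levels = 1:6, labels = activity_labels)
```

The one thing that must hold throughout is row order: the test rows come before the train rows in the measurements, the subjects and the activities, so that the columns line up when they are bound together.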
Now for step 5, which creates a tidy data set with the average of each variable for each activity and each subject:
- Uses the plyr package, whose ddply function is handy here: it takes in a data frame and returns a data frame.
- Applies it to the c_total_data data frame, telling ddply to split by Subject_ID and Activities and then take the mean of each column within each group. For example, for Subject_ID 1 and activity STANDING, it returns the mean of each column.
- Changes the variable names to reflect that they are now mean values.
- Finally, writes the resulting data frame to the text file "tidy_data.txt".
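
A minimal sketch of this step, assuming the plyr package is installed. The toy c_total_data below uses hypothetical values, the "Mean_of_" prefix is only an illustrative naming choice, and the row.names = FALSE argument and temporary output directory are assumptions made so the example has no side effects. The actual script may differ on all of these.

```r
library(plyr)

# Toy version of c_total_data (hypothetical values)
c_total_data <- data.frame(
  Subject_ID = c(1, 1, 1, 2),
  Activities = c("STANDING", "STANDING", "SITTING", "STANDING")
)
c_total_data[["tBodyAcc-mean()-X"]] <- c(0.2, 0.4, 0.6, 0.8)
c_total_data[["tBodyAcc-std()-X"]]  <- c(1, 3, 5, 7)

measure_cols <- setdiff(names(c_total_data), c("Subject_ID", "Activities"))

# Split by subject and activity, then average every measurement column.
# check.names = FALSE keeps the original column names intact.
tidy_data <- ddply(c_total_data, .(Subject_ID, Activities), function(piece) {
  data.frame(t(colMeans(piece[measure_cols])), check.names = FALSE)
})

# Rename the measurement columns to show they now hold means
# (the prefix is a hypothetical choice)
names(tidy_data)[-(1:2)] <- paste0("Mean_of_", names(tidy_data)[-(1:2)])

# Write the result; a temporary directory is used here for illustration
out_file <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy_data, out_file, row.names = FALSE)
```

Each row of tidy_data is one subject and activity combination, so the toy input with three distinct combinations yields three rows.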