This repository contains the files for the final course project of the 'Getting and Cleaning Data' course offered by Johns Hopkins University on Coursera.
- run_analysis.R: Script that performs the required data collection, manipulation and extraction
- final_dataSet.txt: Final data set obtained by running the script on the original data set
- CodeBook.md: Explains the data set, the variables and the transformations applied to the original data to obtain the tidy data
- Download the data set (https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip) and unzip the subfolders into your working directory
- Open the script in RStudio
- Make sure your working directory is set to the folder where you unzipped the subfolders.
- Run the script and verify that "final_dataSet.txt" appears in your working directory.
- The script is broken down into five tasks:
- Acquire Data: Read "train/X_train.txt" and "test/X_test.txt" using read.table() and merge them using rbind()
- Acquire Label: Read "train/y_train.txt" and "test/y_test.txt" using read.table() and merge them using rbind()
- Acquire Subject: Read "train/subject_train.txt" and "test/subject_test.txt" using read.table() and merge them using rbind()
- Read "features.txt" to obtain the variable names
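- Using the grep() function with the regular expression -mean\(\)|-std\(\), search through the variable names to find the indices of the relevant columns. The expression matches names containing either "-mean()" or "-std()"; the parentheses are escaped so they are matched literally. Inside an R string each backslash must be doubled: "-mean\\(\\)|-std\\(\\)"
- Use the indices to subset the data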
- Read "activity_labels.txt" and map the numeric labels to activity names
- Set the column name of the Subject data frame to "Subject"
- Set the column name of the Label data frame to "Activities"
- Finally, combine the three data sets column-wise using cbind()
- Using a nested for loop, compute the mean of every variable for each activity of each subject
- Bind each result to the final data set as it is computed
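
The tasks above can be sketched in R roughly as follows. This is a minimal illustration under stated assumptions, not the actual run_analysis.R: small made-up data frames stand in for the results of read.table() and rbind(), the feature list is shortened to four names, and the object names (`data`, `label`, `subject`, `activity_names`, `merged`, `final`) are chosen for this example only.

```r
# Toy stand-ins for the objects built with read.table() + rbind().
# All values and the shortened feature list are made up for illustration.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
              "tBodyAcc-mad()-X", "fBodyAcc-meanFreq()-X")
data <- data.frame(matrix(1:24, nrow = 6, ncol = 4))
label <- data.frame(V1 = c(1, 1, 2, 2, 1, 2))
subject <- data.frame(V1 = c(1, 1, 1, 1, 2, 2))
activity_names <- data.frame(V1 = 1:2, V2 = c("WALKING", "SITTING"))

# Keep only the columns whose feature name contains "-mean()" or "-std()".
# Backslashes are doubled because the pattern is written as an R string.
keep <- grep("-mean\\(\\)|-std\\(\\)", features)
data <- data[, keep]
names(data) <- features[keep]

# Map numeric labels to activity names, then name the one-column frames.
label[, 1] <- activity_names[label[, 1], 2]
names(label) <- "Activities"
names(subject) <- "Subject"

# Combine subject, activity and measurements side by side.
merged <- cbind(subject, label, data)

# Nested loop: one row of column means per (subject, activity) pair,
# bound to the result as soon as it is computed.
final <- NULL
for (s in unique(merged$Subject)) {
  for (a in unique(merged$Activities)) {
    rows <- merged[merged$Subject == s & merged$Activities == a, ]
    if (nrow(rows) == 0) next  # skip pairs that do not occur
    means <- colMeans(rows[, -(1:2), drop = FALSE])
    final <- rbind(final, data.frame(Subject = s, Activities = a,
                                     t(means), check.names = FALSE))
  }
}
```

With the real data, `final` would then be written out with write.table() to produce "final_dataSet.txt". Base R's aggregate() could replace the nested loop; the loop is kept here because it mirrors the steps listed above.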