
This repo is NOT actively maintained and may not work out of the box as it has been 3 years since the last update. If you want to learn more about the next version of this project, check it out here: https://www.youtube.com/watch?v=tZcRYcjTwWA.

Phormatics: Using AI to Maximize Your Workout


f1: front page (the GIF may be choppy at first, but it's worth it, I promise)

by: Jason Chin, Charlie Lin, Brad Huang, Calvin Woo

A HackNYU2018 project developed in 36 hours, using A.I. and computer vision to build a virtual personal fitness trainer. It uses 2D human pose estimation with a commodity webcam to critique your form and count your repetitions.

This project won the "Most Startup-Viable Hack" award, presented by Contrary Capital.

2D Human Pose Estimation:


f2: live pose estimation in a busy environment; note: here the user has over-extended their right arm (image is mirrored), which is considered bad form in this variant of the dumbbell shoulder press, hence the message.

The pose estimation is based on tf-pose-estimation by ildoonet. The model architecture, OpenPose, developed by the CMU Perceptual Computing Lab, consists of a deep convolutional network for feature extraction (MobileNet in this implementation) and a two-branch multi-stage CNN that predicts joint confidence maps and Part Affinity Fields (PAFs).

This feature allowed us to track the position of the user's joints using a commodity webcam.
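
For reference, here is a minimal sketch of how joint coordinates can be read out of ildoonet's tf-pose-estimation; the model variant and frame source are placeholders, not necessarily what this repo uses:

```python
import cv2
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path

# Load the MobileNet-backed OpenPose graph bundled with tf-pose-estimation.
estimator = TfPoseEstimator(get_graph_path('mobilenet_thin'), target_size=(432, 368))

frame = cv2.imread('frame.jpg')  # placeholder: a single webcam capture

# Returns one Human per person detected. Each Human carries a body_parts
# dict mapping a joint index to a BodyPart with normalized (x, y) and a score.
humans = estimator.inference(frame, resize_to_default=True, upsample_size=4.0)

for human in humans:
    for idx, part in human.body_parts.items():
        print(idx, part.x, part.y, part.score)
```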

Data Flow (Web Based):

f3: pseudo data-flow diagram; note: the pose estimation output must be post-processed, as the model returns a pose for every human detected in the frame (see: Future Changes [1]).

This app runs in the browser, while pose estimation and form critique generation are performed on a Flask server. The webcam feed is captured using WebRTC, and screenshots are sent to the server as base64-encoded strings every 50ms or as fast as the server can respond, whichever is slower (see: Future Changes [2]).

This means the server could be run in the cloud on high-performance hardware and the client could be any device with a WebRTC-supported web browser and camera. There is also the option for video to be recorded and sent to the server for post-processing if the user's network connectivity is too slow to stream a live feed.
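
As a rough illustration of the server side of this flow, here is a hedged sketch of a Flask endpoint that decodes one base64-encoded frame; the route name, JSON payload shape, and `critique` helper are illustrative assumptions, not the repo's actual API:

```python
import base64

import cv2
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route('/analyze', methods=['POST'])  # hypothetical route name
def analyze():
    # The client POSTs the current webcam frame as a base64-encoded JPEG.
    b64 = request.get_json()['image'].split(',')[-1]  # strip any data-URL prefix
    buffer = np.frombuffer(base64.b64decode(b64), dtype=np.uint8)
    frame = cv2.imdecode(buffer, cv2.IMREAD_COLOR)

    # Hypothetical helper: runs pose estimation and form analysis on the frame.
    feedback, reps = critique(frame)

    return jsonify({'feedback': feedback, 'reps': reps})
```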

Currently Supported Exercise Analysis:

  • Squat: exaggerated knees-forward checking
  • Dumbbell Shoulder Press: exaggerated arm bend and extension checking
  • Bicep Curl: horizontal elbow deviation from shoulder checking (see the sketch after this list)
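
To make one of these checks concrete, here is a hedged sketch of the bicep curl criterion (horizontal elbow deviation from the shoulder) written against tf-pose's CocoPart indices; the 0.05 threshold and feedback strings are illustrative guesses, not the repo's tuned values:

```python
from tf_pose.common import CocoPart

def check_bicep_curl(human, threshold=0.05):
    """Flag a curl where the elbow drifts horizontally away from the shoulder.

    tf-pose coordinates are normalized to [0, 1]; the threshold here is an
    illustrative guess rather than the project's tuned value.
    """
    parts = human.body_parts
    if CocoPart.RShoulder.value not in parts or CocoPart.RElbow.value not in parts:
        return None  # joints not visible in this frame

    deviation = abs(parts[CocoPart.RElbow.value].x - parts[CocoPart.RShoulder.value].x)
    if deviation > threshold:
        return "Keep your elbow tucked under your shoulder!"
    return "Good form."
```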

Future Changes:

  1. Multiple Pose Estimations for One User

    Current: The model estimates joints for all subjects found in the input image; we then analyze the output and extract the pose that is most likely to be the user (a heuristic sketch follows this list).

    Possible Improvements:

    a. Modify model and training data to only estimate a single 'best' pose.

    or

    b. Implement re-identification and support multiple users at once. This is viable because forward propagation time does not increase with the number of poses estimated.

  2. Webcam Image Data Transfer

    Current: Webcam captures are encoded as base64 strings and a POST request is sent to the server with the data (note: this was done for ease of implementation due to the hackathon time constraint).

    Possible Improvements: Implement WebSockets to transfer webcam captures instead (a minimal sketch follows).
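
For Future Changes [1], here is a hedged sketch of one way the most-likely-user pose could be picked out of the multi-person output; the heuristic (tallest skeleton, then highest total confidence) is illustrative, not necessarily what this repo does:

```python
def most_likely_user(humans):
    """Pick the detection most likely to be the user at the camera.

    Illustrative heuristic: prefer the skeleton with the largest vertical
    extent (i.e. closest to the webcam), breaking ties on total keypoint
    confidence. The repo's actual selection logic may differ.
    """
    def key(human):
        ys = [part.y for part in human.body_parts.values()]
        height = max(ys) - min(ys)
        confidence = sum(part.score for part in human.body_parts.values())
        return (height, confidence)

    return max(humans, key=key) if humans else None
```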
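
And for Future Changes [2], a minimal sketch of the proposed WebSocket transport using Flask-SocketIO; the event names and payload shape are assumptions:

```python
import base64

import cv2
import numpy as np
from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

@socketio.on('frame')  # hypothetical event name
def handle_frame(b64):
    # A persistent socket avoids per-frame HTTP request/response overhead.
    buffer = np.frombuffer(base64.b64decode(b64), dtype=np.uint8)
    frame = cv2.imdecode(buffer, cv2.IMREAD_COLOR)
    # ... pose estimation and form critique would run on `frame` here ...
    emit('critique', {'feedback': 'placeholder'})

if __name__ == '__main__':
    socketio.run(app)
```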