Given an existing application which generates a report from a large data set, improve the efficiency of the report using database optimization methods.
After completing this assignment, you should...
- Understand the downsides of loops within loops in Rails.
- Understand the benefits and appropriate use of indices on database tables.
- Understand the downside of indices.
- Be able to measure the runtime of various webapp functions.
- Be able to query the database more efficiently.
- Be able to implement database indices.
- An estimate. After you read through this assignment (but before you start coding), write down a number of hours that you expect each part to take (1-3). Record your hours as you go.
- A repository. You will be working from the existing application found in this repository, but you will make your own copy.
- A README. The README should include data on all of the metrics requested below.
- A test suite. Build your application using TDD. Your test suite must include unit tests, controller tests, and at least two integration tests.
- A reflection on your estimate.
For this project, you will be starting with an application which runs very slowly. This ineffiency is due partly to the sheer amount of data present, but mostly due to the structure of the code and the database. Your task is to make it run in a reasonable amount of time.
Once you pull down the application from GitHub, run bundle install
and rake db:migrate
, then follow the steps below.
- Run
rake db:seed
, but time it. Record the amount of time it takes for the seeds to run. - Turn on your server and open your browser.
- Open Chrome's timeline in developer tools, then go to
localhost:3000
. - Determine how long it takes the index page to load. Record that time.
- Add appropriate indices to the data structure (via migrations).
- Record how long it takes to run the migrations that add indices.
- Use Chrome's developer tools to determine how long it takes the index page to load. Record that time.
- Calculate your percent improvement in runtime.
- Examine the code that is run when the root path loads. Modify the commands which access the database to make them more efficient.
- Calculate your percent improvement in runtime.
- Once you have optimized your code as much as you think you can, drop the database, run
rake db:migrate
, and then time how long it takes to runrake db:seed
. Was there an improvement or a worsening of runtime? By what percent and why? - Which is faster: (a) running
rake db:seed
without indices and then running a migration to add indices, or (b) adding indices during your initialrake db:migrate
, then runningrake db:seed
?
You've done a good job of analyzing runtime, but now take a look at storage space:
- Record the size of your database (in bytes).
- Record the size of your development log.
- Give at least one method (feel free to Google) for reducing the size of one of these, yet keeping your data intact.
- Do you think that this is smaller, about right, or larger than the size of databases you'll be working with in your career?
This data structure has a number of tables connected with a series of one-to-many relationships between them. A more advanced way to improve efficiency would be to cache the id of the upper-most (ancestor) table's id in a field in the lower-most (descendant) table. To accomplish this, do the following:
- Write a migration to add this cached foreign key.
- Write callbacks to maintain this foreign key appropriately. Hint: you will need more than one.
- Modify the report to use this new cached field instead of the actual id stored in the ancestor table.
- Measure the improvement in runtime.