Roll your own SQL session store

Posted over 7 years back at RailsExpress.blog

The alert reader may have noticed that I'm using my own SQL session store, which provides much better performance than the default ActiveRecordStore shipping with Rails.

I have decided to publish the source code here. If you are using Mysql, simply unpack the files into your lib directory and make SQLSessionStore your session storage.

require 'sql_session_store'
require 'mysql_session'
ActionController::CgiRequest::DEFAULT_SESSION_OPTIONS.
    update(:database_manager => SQLSessionStore)
SQLSessionStore.session_class = MysqlSession
If you're using components, which you probably shouldn't, you'll need to activate eager session saving:
MysqlSession.eager_session_creation = true

The improved performance is due to a number of factors:

  • fixed number of known DB columns
  • retrieve only fields into the session that are actually used to create the session for a controller
  • offloading most of the work to the DB, including updating the created_at and updated_at fields
  • hard coded Mysql statements

As a consequence, if you want another DB, other DB column names, don't want created_at or updated_at fields, you'll have to change the code.

I have uploaded HTML representations of the call trees for ActiveRecordStore and SQLSessionStore. If you inspect them closely, you'll see how much time ActiveRecordStore spends in just creating the SQL statements and how much simpler the call tree is for SQLSessionStore.

If you adapt the provided code for other database systems, I'd be happy if you sent me the code, and I'll add it to the download after reviewing. The required changes are probably pretty small, but I currently don't have the time to test other DB systems.

Enjoy!

Rails 1.0 Release

Posted over 7 years back at RailsExpress.blog

It's party time! Rails 1.0 hit the streets this week.

Somehow I didn't manage to release a performance report on the new release on the very same day, as planned ..., but here it is.

Rails 1.0 performs slightly worse than the previous 0.14.1 release for my application. However, 1.0 is still much faster than the 0.13 release series.

The following performance data table shows the speed difference for the fastest available configuration for the tested application.

page c1 totalc2 total c1 r/sc2 r/s c1 ms/rc2 ms/r c1/c2
/empty/index 1.261111.40676 793.0710.9 1.261.41 0.90
/welcome/index 1.418141.57333 705.1635.6 1.421.57 0.90
/rezept/index 1.500771.66533 666.3600.5 1.501.67 0.90
/rezept/myknzlpzl 1.493741.65309 669.5604.9 1.491.65 0.90
/rezept/show/713 3.599883.99798 277.8250.1 3.604.00 0.90
/rezept/cat/Hauptspeise 4.540244.86669 220.3205.5 4.544.87 0.93
/rezept/cat/Hauptspeise?page=5 4.647294.97066 215.2201.2 4.654.97 0.93
/rezept/letter/G 4.590544.84387 217.8206.4 4.594.84 0.95
 c1: 0.14.1, c2: 1.0, r/s: requests per second, ms/r: milliseconds per request

The reasons for this significant slowdown have been identified and are being worked on (tickets 3155 and 3174).

You can find additional information and lots of performance data in the full report.

Technoweenie - Ruby on Rails Podcast

Posted over 7 years back at Ruby on Rails Podcast

Rick Olson explains the Rails plugin system, RJS templates, and Rails-weenie.

Dan Benjamin - Ruby on Rails Podcast

Posted over 7 years back at Ruby on Rails Podcast

A List Apart Systems Developer Dan Benjamin talks about developing a high traffic volume site with Rails.

Trim your output

Posted over 7 years back at RailsExpress.blog

The 0.14 Rails series includes a new option setting for template rendering, which allows for reducing the number of newlines generated by template processing. It's a string valued class attribute for ActionView, called erb_trim_mode. Its value is passed to the ERB compiler when a template gets compiled, with a default value of "-".

Some documentation for possible settings is part of the ERB class doc.

However, the doc is incomplete, as it doesn't mention Rails' default setting "-".

erb_trim_mode affects the way template processing handles newline characters in the template. Per default, (when erb_trim_mode is set to nil), processing leaves all newline characters occurring in the template in the generated html code. Which means that many extraneous newlines get inserted into your output.

For example,

<% if expr %>
some text
<% else %>
other text
<% end %>
will generate the following processing code:
if expr ; _erbout.concat("\n")
_erbout.concat("some text\n")
else ; _erbout.concat("\n")
_erbout.concat("other text\n")
end ; _erbout.concat("\n")

With the default setting of erb_trim_mode, you can suppress extraneous newline characters by adding a - before the closing %>:

<% if expr -%>
some text
<% else -%>
other text
<% end -%>
will generate the following template code:
if expr
_erbout.concat("some text\n")
else
_erbout.concat("other text\n")
end

This version looks much cleaner, suppresses 2 superfluous newline characters in the generated HTML code and saves 2 function calls when processed. Which is no big deal, unless the above call occurs inside a loop. If the loop body gets executed 100 times, you have already saved 200 characters and, more importantly, 200 function calls.

Manually adding all the minuses to closing %> markup is somewhat tedious. erb_trim_mode comes to rescue: if set to ">", newline characters will be suppressed for <%...%> markup closing a line; if set to "<>", newline characters will be suppressed for <%...%> occurring as a line of its own.

I have measured the relative performance of setting erb_trim_mode to ">". The following table shows that my app's pages receive a performance increase ranging from 1 to 4%.

configuration 1: 12-06.uncached.untrimmed
  requests=1000, options=-bm=uncached -lib=r141 -mysql_session
                         -fast_routes -fast_readers

configuration 2: 12-06.uncached.trimmed
  requests=1000, options=-bm=uncached -lib=r141 -mysql_session
                         -fast_routes -fast_readers -trim

page   c1 real   c2 real  c1 r/s  c2 r/s   c1 ms/r  c2 ms/r  c1/c2
 1:    3.52722   3.47518   283.5   287.8      3.53     3.48   1.01
 2:    4.30330   4.26213   232.4   234.6      4.30     4.26   1.01
 3:    4.49761   4.36027   222.3   229.3      4.50     4.36   1.03
 4:    4.45229   4.27840   224.6   233.7      4.45     4.28   1.04

urls:
 1: /rezept/show/713
 2: /rezept/cat/Hauptspeise
 3: /rezept/cat/Hauptspeise?page=5
 4: /rezept/letter/G

Unfortunately, ERB doesn't support combining the "-" and ">" setting. So you'll have to decide which way to go upfront. If you choose on ">", you may find that you have to add empty lines in some places where they weren't needed before.

Chad Fowler - Ruby on Rails Podcast

Posted over 7 years back at Ruby on Rails Podcast

Chad Fowler talks about his new book, RubyGems, and RailsConf.

Don't load stuff you don't need

Posted over 7 years back at RailsExpress.blog

Using Rails 0.14 and onward, you can give your app a slight performance boost by not loading parts of the framework that you don't need. For example, if you don't expose parts of your app as a web service, then you don't need to load actionwebservice. Likewise, if you don't send mail from your app, you don't need to load actionmailer. If you're using the Rails initializer to start your app, you can do it like so:

Rails::Initializer.run do |config|
  # Skip frameworks you're not going to use
  config.frameworks -= [ :action_web_service, :action_mailer ]
  ...
end
If you're still using an old environment.rb, just comment out the corresponding require lines:
# Require Rails libraries.
require 'active_support'
require 'active_record'
require 'action_controller'
# require 'action_mailer'
# require 'action_web_service'
Additionally, you should also load only the database adapters for the database(s) which you are using. Add the following to your environment.rb, before Rails gets required:
RAILS_CONNECTION_ADAPTERS = %w(mysql)
By default, Rails will load all connection adapters it can find. Minimizing your code footprint will speed up Rails, because Ruby traverses all abstract syntax trees stored on the heap during garbage collection. If you can avoid loading significantly sized code, you will reduce the memory footprint of your app, thereby speeding up garbage collection. It might also help to increase object lookup at runtime due to smaller sized data structures containing meta information.

Speeding up the creation of new sessions

Posted over 7 years back at RailsExpress.blog

There's a tiny peculiarity in the cgi/session.rb, which slows down the creation of new sessions quite a bit. The library digest/md5 gets required on creation of new sessions in method create_new_id. This can be rather expensive to execute. Move the require line to the beginning of the file, and your session creation will run faster. In environments where require is really slow, it can make a big difference (almost 3 times faster), as you can see from this data:

perf data file 1: 11-22.new.md5_unfixed
  requests=1000, options=-bm=new_sessions 
perf data file 2: 11-22.new.md5_fixed
  requests=1000, options=-bm=new_sessions

page              c1 real   c2 real    r/s    r/s   ms/r   ms/r  c1/c2
/empty/index      7.07800   2.39067  141.3  418.3   7.08   2.39   2.96
/welcome/index    7.07800   2.65100  141.3  377.2   7.08   2.65   2.67
On my Linux machine, the difference is less pronounced, but still significant:
page              c1 real   c2 real    r/s    r/s   ms/r   ms/r  c1/c2
/empty/index      1.47967   1.25513  675.8  796.7   1.48   1.26   1.18
/welcome/index    1.63819   1.41946  610.4  704.5   1.64   1.42   1.15
Maybe the require was placed inside function create_new_id to avoid loading the digest/md5 package in CGI environments when the request already passes a session id. But since Rails will usually run under FCGI/SCGI, moving it surely doesn't hurt! If you also have Ruby CGI apps running on the same machine, you can copy the file into your load path for Rails and modify this copy instead.

Components may not be evil, but they sure can be slow

Posted over 7 years back at RailsExpress.blog

Let me start with a confession: I've been ignoring Rails components for a while, as I just couldn't figure out what they where good for. Meaning of course, that I didn't need them for my application ;-)

However, after DHH posted his opinions on components on the Rails weblog, I thought I'd better a have a closer look, what they could do for me and also inspect their implementation details. Additionally, I discovered that Typo uses components to display various parts of the front page of my blog. And it seemed pretty slow at doing that. So I smelled a performance rat.

Basically, components enable you to capture the presentation logic of a controller and embed it into another controller. This is implemented by duplicating the current request object, starting all over with request processing, capturing the rendering results and embedding them in the page that called render_component in the first place.

This comprises creating a new controller object, in order enable request processing, and a new view object for template rendering. This already sounded like it wouldn't be particularly efficient, but the results I got, after creating a benchmark configuration for my blog and running railsbench to get some performance data, were shocking. Here's my benchmark data:

page request                            total  stddev%     r/s    ms/r
1:/                                  14.86220   0.1405    6.73  148.62
2:/articles/2005/11/08/accelera...   13.83662   0.1071    7.23  138.37
3:/admin                              1.47821   0.9429   67.65   14.78
4:/admin/content                      2.95742   0.2235   33.81   29.57
5:/admin/pages                        2.33728   0.0951   42.78   23.37
6:/admin/categories                   2.09189   0.2452   47.80   20.92
7:/admin/sidebar                     13.99597   0.9474    7.14  139.96
8:/admin/themes                       5.44847   0.3377   18.35   54.48
9:/admin/users                        1.48696   1.0194   67.25   14.87

Page / measures the main blog page, the second entry measures a single article, and the remaining entries present data on the blog's admin interface.

I just couldn't believe my eyes. Only 6.7 requests per second to generate my blogs entry page? Unbelievable (on an Athlon 64 3000+). Especially since my own app runs over 200 requests per second on the same box.

After studying call patterns using Ruby Performance Validator, which is a really good tool for this task btw., I found out that Typo was loading its configuration from the database several times per request. It also tried to figure out the current page's feed url just as often. That seemed awfully wasteful of resources.

The repetitions of the aforementioned actions were caused by 2 before filters installed on the application controller:

class ApplicationController < ActionController::Base
  before_filter :reload_settings
  before_filter :auto:discovery_defaults

  def reload_settings
    config.reload
  end

  def auto_discovery_defaults
    auto_discovery_feed(:type => 'feed')
  end

  def auto_discovery_feed(options)
    options = {:only_path => false, :action => 'feed',
               :controller => 'xml'}.merge options
    @auto_discovery_url_rss = url_for({:format => 'rss20'}.merge options)
    @auto_discovery_url_atom = url_for({:format => 'atom10'}.merge options)
  end
end

And sidebar content (archive, flickr, tada, etc.) gets delivered through components, which are all derived from ApplicationController, thus inheriting the filters. Bingo!

There are, of course, several ways to fix this problem, I chose the simplest one, taking advantage of the fact that the request object passed to components gets duplicated for component processing. Upon entering the controller responsible for the request, config will be loaded and a processing flag is stored as an instance variable in the request object. Similarly, the values retrieved during feed get stored there as well. When a component gets invoked it retrieves these values from the request object.

def reload_settings
  unless @request.instance_variable_get(:@config_reloaded)
    @request.instance_variable_set(:@config_reloaded, true)
    config.reload
  end
end

def auto_discovery_defaults
  @auto_discovery_url_rss =
      @request.instance_variable_get(:@auto_discovery_url_rss)
  @auto_discovery_url_atom =
       @request.instance_variable_get(:@auto_discovery_url_atom)
  unless @auto_discovery_url_rss && @auto_discovery_url_atom
    auto_discovery_feed(:type => 'feed')
    @request.instance_variable_set(:@auto_discovery_url_rss,
                                    @auto_discovery_url_rss)
    @request.instance_variable_set(:@auto_discovery_url_atom,
                                    @auto_discovery_url_atom)
  end
end

Running railsbench again and comparing the results shows that the most important pages are now rendered 4 to 5 times as fast:

page    c1 real   c2 real  c1 r/s  c2 r/s  c1 ms/r  c2 ms/r  c1/c2
1:     14.86220   3.75458     6.7    26.6   148.62    37.55   3.96
2:     13.83662   2.70294     7.2    37.0   138.37    27.03   5.12
3:      1.47821   1.29960    67.6    76.9    14.78    13.00   1.14
4:      2.95742   2.71447    33.8    36.8    29.57    27.14   1.09
5:      2.33728   2.13827    42.8    46.8    23.37    21.38   1.09
6:      2.09189   1.73313    47.8    57.7    20.92    17.33   1.21
7:     13.99597   2.73185     7.1    36.6   139.96    27.32   5.12
8:      5.44847   1.42805    18.4    70.0    54.48    14.28   3.82
9:      1.48696   1.15061    67.3    86.9    14.87    11.51   1.29

Note: c1 and c2 label data of the original/modified application. r/s are requests per second and ms/r are milliseconds per request.

Now that's some niiiiiice improvement!

I think that it's pretty easy to shoot oneself in the foot using components. I tend to avoid them. But sometimes they can be nice, in that case: watch out for da slow filters! Either don't derive the components from ApplicationController, make sure that the filters are fast, possibly using the technique presented in this article, or turn them off completely, e.g. if you don't really need to run them and or can get at the necessary data using some other technique.

The possibility to disable filters in derived controller classes was recently added to Rails. So we could have said

  skip_before_filter :reload_config

on the component classes, instead of storing a flag inside the request object. It should be even faster than storing flags on the request object, because the filter will be removed from the filter chain during class load time. Maybe I'll measure that at a later time.

Accelerate your dates

Posted over 7 years back at RailsExpress.blog

When I started work on improving my app's speed, I noticed that on pages that display lots of date information, a large percentage of CPU time was spent on formatting date strings. I added a quick fix to my code, was satisifed and quickly forgot about it.

But recently I read a message on the ruby talk mailing list, where someone more or less rejected Rails after writing a small Ruby program for log file analysis that ran for about 27 minutes on a 100MB log file and a literally translated Java program finished in only 2.5 minutes blog entry.

It turned out that the inefficiency of the Ruby script was caused by the Ruby Date class, as someone rewrote the program using regexps to parse date info that needed only 3 minutes to complete the task.

I'm pretty sure that the Date class performance could be much improved, for example by rewriting it in C, but we'll leave this to the ruby core team. For page speedup we simply resort to avoiding construction of Date objects altogether. For Mysql, the Rails database layer will always return date information as strings. The strings are formatted as 'YYYY-mm-dd', i.e. they will have ISO date format. Such a string can of course be turned into 'dd.mm.YYYY' easily using Ruby primitives. So instead of writing

<%= r.created_at.strftime "dd.mm.YYYY" %>

or

<%= r.created_at %>

we'll change this into

<%= r.created_at_formatted %>

and add a function created_at_formatted to our model, which will access the attributes hash directly and transform the string into our desired format. One possible way to code this function is

def created_at_formatted
  d = @attributes['created_at']
  "#{d[8,2]}.#{d[5,2]}.#{d[0,4]}"
end

another is

def created_at_formatted
  @attributes['created_at'] =~ /^(\d\d\d\d)-(\d\d)-(\d\d)/
  "#{$3}.#{$2}.#{$1}"
end

This assumes that created_at is a non null column. If it isn't, you'll need to add some guards for null values.

The first version comes out slightly faster than the second, which I expected. But how much faster than the original code?

On pages with 25 dates displayed it is approximately 70% faster than the original version. On a page with 1 date, it's already 6% faster.

If you have pages with a large number of date displays, you might consider this technique to make them a tad snappier :-)

Update: As mentioned by henri, the technique does not apply to the oracle adapter, as it returns Time objects. As time permits I will look into the other adapters, to find the "bad boys" which return strings.

The case for piggy backed attributes

Posted over 7 years back at RailsExpress.blog

If you're subscribed to the rails mailing list, you might have seen some people talking about piggy backing attributes onto SQL queries.

This is a feature of the Active Record ORM mapping that I like very much, because it can be used to speedup you queries enormously. And I hope it won't be eliminated, ever.

Suppose there's a 1:n relationship between models A and B, i.e., for each A record there's exactly one B record (think B owns A). In my case that would be recipes and users.

In ActiveRecord you'd have class declarations like this:

class User < ActiveRecord::Base
  has_many :recipes
end
class Recipe < ActiveRecord::Base
  belongs_to :user
end

Your usual code to retrieve some recipes with their associated authors will probably look like this:

def show_some
  @recipes = Recipe.find(:all, :limit => 50)
end

with the show_some template being rendered implicitely and looking similar to:

<% for r in @recipes %>
  <%= r.title %>, <%= r.user.name %>
  <br>
<% end %>

There's nothing wrong with this code, except that it's kind of slow:

  • there will be 51 queries to the database, because for each recipe the owning user (author) will be fetched in a separate query

  • all these queries are constructed dynamically upon calling r.user.name

We can do a little better than that: since approx. 0.13.1 Rails supports fetching associated objects by specifying our intention to access them via the :include syntax for find (called eager loading of associations). So we'd actually write our query like this:

def show_some
  @recipes = Recipe.find(:all, :limit => 50, :include => [:user])
end

But how much faster is this? The answer is: it depends. On how fast DB access is, for example. Whether your DB lives on another host, network speed, etc.

For a simple installation on one box, I have measured this (using railsbench, 1000 requests each, patched GC):

page request               total  stddev%     r/s    ms/r
/rezept/some_default    17.83765   0.2579    56.1   17.84
/rezept/some_include    13.25508   0.3499    75.4   13.26

Speedup is 34%, which is OK, but not spectacular.

We can do better than that using piggy backed attributes. First, we'll change the query to:

Recipe.find(:all, :limit => 50,
            :conditions => "r.user_id=u.id",
            :joins => 'r, users u',
            :select => 'r.*, u.name AS user_name')

We still have only one query to the DB, this time involving a join, which could be slow. But don't worry, databases were constructed to perform such queries as fast as possible.

In addition to the recipe columns we now get 'user_name' included in the retrieved records. To make access to this attribute fast, we'll add a user_name function to our model:

class Recipe < ActiveRecord::Base
  belongs_to :user
  def user_name
    @attributes['user_name']
  end
end

and change our template to:

<% for r in @recipes %>
  <%= r.title %>, <%= r.user_name %>
  <br>
<% end %>

Is this faster? Answer: you can bet!

page request               total  stddev%     r/s    ms/r
/rezept/some_default    17.83765   0.2579    56.1   17.84
/rezept/some_include    13.25508   0.3499    75.4   13.26
/rezept/some_piggy       4.19032   0.1059   238.6    4.19

It's whopping 4 times faster than the first version. And 3 times faster than the :include version.

I can almost hear you scream "But this code sucks! The original version was sooo much nicer. I hate adding this extra function to my model." My answer is: if your app is fast enough for you, don't do it. If it isn't, this is a nice opportunity for speedup.

Also, if your rails app lives in a shared hosting environment, it means being nice to everyone else to make it faster.

And if you weren't looking for speed, you wouldn't be reading this blog in the first place ;-)

session :off !

Posted over 7 years back at RailsExpress.blog

Rails 0.14 contains the possibility to turn off sessions on a controller or even action basis.

class WelcomeController < ApplicationController
  session :off
  ...
end

This will turn off session creation and update on all actions of class WelcomeController.

Depending on the complexity of the logic inside your actions, you can get a big speedup. For example, on my welcome controller, index is action cached. Turning sessions off for this action, I see a huge performance improvement on my windows development box:

c1 real   c2 real  c1 r/s   c2 r/s  c1 ms/r  c2 ms/r  c1/c2
2.29700   0.27633   217.7   1809.4     4.59     0.55   8.31
2.45833   0.29200   203.4   1712.3     4.92     0.58   8.42

This was measured against the default ActiveRecordStore session implementation. But even with a faster session implementation, the difference is astonishing:

c1 real   c2 real  c1 r/s   c2 r/s  c1 ms/r  c2 ms/r  c1/c2
1.20333   0.26600   415.5   1879.7     2.41     0.53   4.52
1.35933   0.27633   367.8   1809.4     2.72     0.55   4.92

So it is certainly worthwhile to consider turning sessions off for controllers that don't need them.

Don't use Pstore as a session container

Posted over 7 years back at RailsExpress.blog

You really shouldn’t be using Pstore for Rails. Apart from the fact that it doesn’t scale to a setup with more than 1 box, it is much slower than using the database for session storage, for example.

Using a Mysql session store can be up to 90% faster. Under, Linux, that is.

If you’re on a Windows box, the difference can be staggering: up to 560%!

Faster Pagination

Posted over 7 years back at RailsExpress.blog

Rails version 0.14 and upward contain a massive performance improvement for pagination. Most of it is free and any app will benefit from it without even knowing.

However, there's also a little api addition to the pagination helper module, which can be used to speed up your paginated pages even more, if necessary.

We're talking about pagination_links_each. It takes the same parameters as pagination_links but instead of calling link_to to create the html code for each link, it yields the page number to a block. It is the block's responsibility to create the html code for the link.

So for example, you could write

pagination_links_each(pages, :window_size => 4) do |n|
    "<a href='?page=#{n}'>#{n}</a>"
end

instead of

pagination_links(pages, :window_size => 4)

The first version is faster than the second one, because it avoids creating lots of intermediate data structures to generate the html code. Additionally, the created html is shorter and less data gets transmitted over the wire. And the code will work for any page embedding it.

The speedup obtained will of course depend on your app. Here's some data for paginated pages of my application, showing speedups ranging from 11 to 26%. Quite an improvement for adding one line of code!

Blog opened!

Posted over 7 years back at RailsExpress.blog

So finally I managed to get me a place in the blogosphere. Hey!

The main purpose of this blog will be to keep a log of (my) activities related to improving performance of the Rails web application framework.

A good deal of work already went into performance improvement since realease 0.12.1. To sum it up: I managed to improve the throughput of a recipe database application by a factor of 4 since I started this work, going from around 50 requests per second to over 220 requests per seconds.

In order to ease performance measurements of Rails applications, I have developed the package railsbench. It really does make performance regression tests easy, so give it try!

And stay tuned for further improvements :-)