Posted 22 Nov 2005
There's a tiny peculiarity in the
cgi/session.rb, which slows down the creation of new sessions quite a bit. The library
digest/md5 gets required on creation of new sessions in method
create_new_id. This can be rather expensive to execute. Move the require line to the beginning of the file, and your session creation will run faster. In environments where require is really slow, it can make a big difference (almost
3 times faster), as you can see from this data:
perf data file 1: 11-22.new.md5_unfixed
requests=1000, options=-bm=new_sessions
perf data file 2: 11-22.new.md5_fixed
requests=1000, options=-bm=new_sessions
page c1 real c2 real r/s r/s ms/r ms/r c1/c2
/empty/index 7.07800 2.39067 141.3 418.3 7.08 2.39 2.96
/welcome/index 7.07800 2.65100 141.3 377.2 7.08 2.65 2.67
On my Linux machine, the difference is less pronounced, but still significant:
page c1 real c2 real r/s r/s ms/r ms/r c1/c2
/empty/index 1.47967 1.25513 675.8 796.7 1.48 1.26 1.18
/welcome/index 1.63819 1.41946 610.4 704.5 1.64 1.42 1.15
Maybe the require was placed inside function
create_new_id to avoid loading the
digest/md5 package in CGI environments when the request already passes a session id. But since Rails will usually run under FCGI/SCGI, moving it surely doesn't hurt!
If you also have Ruby CGI apps running on the same machine, you can copy the file into your load path for Rails and modify this copy instead.
View Comments
Posted 18 Nov 2005
Let me start with a confession: I've been ignoring Rails components for a while, as I just couldn't figure out what they where good for. Meaning of course, that I didn't need them for my application ;-)
However, after DHH posted his opinions on components on the Rails weblog, I thought I'd better a have a closer look, what they could do for me and also inspect their implementation details. Additionally, I discovered that Typo uses components to display various parts of the front page of my blog. And it seemed pretty slow at doing that. So I smelled a performance rat.
Basically, components enable you to capture the presentation logic of a controller and embed it into another controller. This is implemented by duplicating the current request object, starting all over with request processing, capturing the rendering results and embedding them in the page that called in the first place.
This comprises creating a new controller object, in order enable request processing, and a new view object for template rendering. This already sounded like it wouldn't be particularly efficient, but the results I got, after creating a benchmark configuration for my blog and running railsbench to get some performance data, were shocking.
Here's my benchmark data:
page request total stddev% r/s ms/r
1:/ 14.86220 0.1405 6.73 148.62
2:/articles/2005/11/08/accelera... 13.83662 0.1071 7.23 138.37
3:/admin 1.47821 0.9429 67.65 14.78
4:/admin/content 2.95742 0.2235 33.81 29.57
5:/admin/pages 2.33728 0.0951 42.78 23.37
6:/admin/categories 2.09189 0.2452 47.80 20.92
7:/admin/sidebar 13.99597 0.9474 7.14 139.96
8:/admin/themes 5.44847 0.3377 18.35 54.48
9:/admin/users 1.48696 1.0194 67.25 14.87
Page / measures the main blog page, the second entry measures a single article, and the remaining entries present data on the blog's admin interface.
I just couldn't believe my eyes. Only 6.7 requests per second to generate my blogs entry page? Unbelievable (on an Athlon 64 3000+). Especially since my own app runs over 200 requests per second on the same box.
After studying call patterns using Ruby Performance Validator, which is a really good tool for this task btw., I found out that Typo was loading its configuration from the database several times per request. It also tried to figure out the current page's feed url just as often. That seemed awfully wasteful of resources.
The repetitions of the aforementioned actions were caused by 2 before filters installed on the application controller:
class ApplicationController < ActionController::Base
before_filter :reload_settings
before_filter :auto:discovery_defaults
def reload_settings
config.reload
end
def auto_discovery_defaults
auto_discovery_feed(:type => 'feed')
end
def auto_discovery_feed(options)
options = {:only_path => false, :action => 'feed',
:controller => 'xml'}.merge options
@auto_discovery_url_rss = url_for({:format => 'rss20'}.merge options)
@auto_discovery_url_atom = url_for({:format => 'atom10'}.merge options)
end
end
And sidebar content (archive, flickr, tada, etc.) gets delivered through components, which are all derived from ApplicationController, thus inheriting the filters. Bingo!
There are, of course, several ways to fix this problem, I chose the simplest one, taking advantage of the fact that the request object passed to components gets duplicated for component processing. Upon entering the controller responsible for the request, config will be loaded and a processing flag is stored as an instance variable in the request object. Similarly, the values retrieved during feed get stored there as well. When a component gets invoked it retrieves these values from the request object.
def reload_settings
unless @request.instance_variable_get(:@config_reloaded)
@request.instance_variable_set(:@config_reloaded, true)
config.reload
end
end
def auto_discovery_defaults
@auto_discovery_url_rss =
@request.instance_variable_get(:@auto_discovery_url_rss)
@auto_discovery_url_atom =
@request.instance_variable_get(:@auto_discovery_url_atom)
unless @auto_discovery_url_rss && @auto_discovery_url_atom
auto_discovery_feed(:type => 'feed')
@request.instance_variable_set(:@auto_discovery_url_rss,
@auto_discovery_url_rss)
@request.instance_variable_set(:@auto_discovery_url_atom,
@auto_discovery_url_atom)
end
end
Running railsbench again and comparing the results shows that the most important pages are now rendered 4 to 5 times as fast:
page c1 real c2 real c1 r/s c2 r/s c1 ms/r c2 ms/r c1/c2
1: 14.86220 3.75458 6.7 26.6 148.62 37.55 3.96
2: 13.83662 2.70294 7.2 37.0 138.37 27.03 5.12
3: 1.47821 1.29960 67.6 76.9 14.78 13.00 1.14
4: 2.95742 2.71447 33.8 36.8 29.57 27.14 1.09
5: 2.33728 2.13827 42.8 46.8 23.37 21.38 1.09
6: 2.09189 1.73313 47.8 57.7 20.92 17.33 1.21
7: 13.99597 2.73185 7.1 36.6 139.96 27.32 5.12
8: 5.44847 1.42805 18.4 70.0 54.48 14.28 3.82
9: 1.48696 1.15061 67.3 86.9 14.87 11.51 1.29
Note: c1 and c2 label data of the original/modified application. r/s are requests per second and ms/r are milliseconds per request.
Now that's some niiiiiice improvement!
I think that it's pretty easy to shoot oneself in the foot using components. I tend to avoid them. But sometimes they can be nice, in that case: watch out for da slow filters! Either don't derive the components from ApplicationController, make sure that the filters are fast, possibly using the technique presented in this article, or turn them off completely, e.g. if you don't really need to run them and or can get at the necessary data using some other technique.
The possibility to disable filters in derived controller classes was recently added to Rails. So we could have said
skip_before_filter :reload_config
on the component classes, instead of storing a flag inside the request object. It should be even faster than storing flags on the request object, because the filter will be removed from the filter chain during class load time. Maybe I'll measure that at a later time.
View Comments
Posted 09 Nov 2005
When I started work on improving my app's speed, I noticed that on pages that display lots of date information, a large percentage of CPU time was spent on formatting date strings. I added a quick fix to my code, was satisifed and quickly forgot about it.
But recently I read a message on the ruby talk mailing list, where someone more or less rejected Rails after writing a small Ruby program for log file analysis that ran for about 27 minutes on a 100MB log file and a literally translated Java program finished in only 2.5 minutes blog entry.
It turned out that the inefficiency of the Ruby script was caused by the Ruby Date class, as someone rewrote the program using regexps to parse date info that needed only 3 minutes to complete the task.
I'm pretty sure that the Date class performance could be much improved, for example by rewriting it in C, but we'll leave this to the ruby core team. For page speedup we simply resort to avoiding construction of Date objects altogether.
For Mysql, the Rails database layer will always return date information as strings. The strings are formatted as 'YYYY-mm-dd', i.e. they will have ISO date format. Such a string can of course be turned into 'dd.mm.YYYY' easily using Ruby primitives. So instead of writing
<%= r.created_at.strftime "dd.mm.YYYY" %>
or
we'll change this into
<%= r.created_at_formatted %>
and add a function created_at_formatted to our model, which will access the attributes hash directly and transform the string into our desired format. One possible way to code this function is
def created_at_formatted
d = @attributes['created_at']
"#{d[8,2]}.#{d[5,2]}.#{d[0,4]}"
end
another is
def created_at_formatted
@attributes['created_at'] =~ /^(\d\d\d\d)-(\d\d)-(\d\d)/
"#{$3}.#{$2}.#{$1}"
end
This assumes that created_at is a non null column. If it isn't, you'll need to add some guards for null values.
The first version comes out slightly faster than the second, which I expected. But how much faster than the original code?
On pages with 25 dates displayed it is approximately 70% faster than the original version. On a page with 1 date, it's already 6% faster.
If you have pages with a large number of date displays, you might consider this technique to make them a tad snappier :-)
Update: As mentioned by henri, the technique does not apply to the oracle adapter, as it returns Time objects. As time permits I will look into the other adapters, to find the "bad boys" which return strings.
View Comments
Posted 06 Nov 2005
If you're subscribed to the rails mailing list, you might have seen some people talking about piggy backing attributes onto SQL queries.
This is a feature of the Active Record ORM mapping that I like very much, because it can be used to speedup you queries enormously. And I hope it won't be eliminated, ever.
Suppose there's a 1:n relationship between models A and B, i.e., for each A record there's exactly one B record (think B owns A). In my case that would be recipes and users.
In ActiveRecord you'd have class declarations like this:
class User < ActiveRecord::Base
has_many :recipes
end
class Recipe < ActiveRecord::Base
belongs_to :user
end
Your usual code to retrieve some recipes with their associated authors will probably look like this:
def show_some
@recipes = Recipe.find(:all, :limit => 50)
end
with the show_some template being rendered implicitely and looking similar to:
<% for r in @recipes %>
<%= r.title %>, <%= r.user.name %>
<br>
<% end %>
There's nothing wrong with this code, except that it's kind of slow:
there will be 51 queries to the database, because for each recipe the owning user (author) will be fetched in a separate query
all these queries are constructed dynamically upon calling r.user.name
We can do a little better than that: since approx. 0.13.1 Rails supports fetching associated objects by specifying our intention to access them via the :include syntax for find (called eager loading of associations).
So we'd actually write our query like this:
def show_some
@recipes = Recipe.find(:all, :limit => 50, :include => [:user])
end
But how much faster is this? The answer is: it depends. On how fast DB access is, for example. Whether your DB lives on another host, network speed, etc.
For a simple installation on one box, I have measured this (using railsbench, 1000 requests each, patched GC):
page request total stddev% r/s ms/r
/rezept/some_default 17.83765 0.2579 56.1 17.84
/rezept/some_include 13.25508 0.3499 75.4 13.26
Speedup is 34%, which is OK, but not spectacular.
We can do better than that using piggy backed attributes. First, we'll change the query to:
Recipe.find(:all, :limit => 50,
:conditions => "r.user_id=u.id",
:joins => 'r, users u',
:select => 'r.*, u.name AS user_name')
We still have only one query to the DB, this time involving a join, which could be slow. But don't worry, databases were constructed to perform such queries as fast as possible.
In addition to the recipe columns we now get 'user_name' included in the retrieved records. To make access to this attribute fast, we'll add a user_name function to our model:
class Recipe < ActiveRecord::Base
belongs_to :user
def user_name
@attributes['user_name']
end
end
and change our template to:
<% for r in @recipes %>
<%= r.title %>, <%= r.user_name %>
<br>
<% end %>
Is this faster? Answer: you can bet!
page request total stddev% r/s ms/r
/rezept/some_default 17.83765 0.2579 56.1 17.84
/rezept/some_include 13.25508 0.3499 75.4 13.26
/rezept/some_piggy 4.19032 0.1059 238.6 4.19
It's whopping 4 times faster than the first version. And 3 times faster than the :include version.
I can almost hear you scream "But this code sucks! The original version was sooo much nicer. I hate adding this extra function to my model." My answer is: if your app is fast enough for you, don't do it. If it isn't, this is a nice opportunity for speedup.
Also, if your rails app lives in a shared hosting environment, it means being nice to everyone else to make it faster.
And if you weren't looking for speed, you wouldn't be reading this blog in the first place ;-)
View Comments
Posted 05 Nov 2005
Rails 0.14 contains the possibility to turn off sessions on a controller or even action basis.
class WelcomeController < ApplicationController
session :off
...
end
This will turn off session creation and update on all actions of class WelcomeController.
Depending on the complexity of the logic inside your actions, you can get a big speedup. For example, on my welcome controller, index is action cached. Turning sessions off for this action, I see a huge performance improvement on my windows development box:
c1 real c2 real c1 r/s c2 r/s c1 ms/r c2 ms/r c1/c2
2.29700 0.27633 217.7 1809.4 4.59 0.55 8.31
2.45833 0.29200 203.4 1712.3 4.92 0.58 8.42
This was measured against the default ActiveRecordStore session implementation. But even with a faster session implementation, the difference is astonishing:
c1 real c2 real c1 r/s c2 r/s c1 ms/r c2 ms/r c1/c2
1.20333 0.26600 415.5 1879.7 2.41 0.53 4.52
1.35933 0.27633 367.8 1809.4 2.72 0.55 4.92
So it is certainly worthwhile to consider turning sessions off for controllers that don't need them.
View Comments