This is my first post on web performance. I have been wanting to write it for a while, and preparing the data for it has taken quite some time. I hope you like it, and please give me feedback.
Squeezing the bytes
We can do all kinds of things to our page source to improve its performance. But if the page source is given, what can we do to minimize its payload? Just a few examples to get things warmed up:
- Remove whitespace: An easy fix is to remove whitespace, since in most cases it doesn't affect browser rendering. Whitespace is highly redundant in itself, so it already compresses well, but removing it still has an impact.
- Remove comments: Also an easy fix, and if you are not already doing it, you should be.
- Remove quotes in some cases: While it may be a bit extreme, and may leave the result invalid, there are certainly cases where all those quotes are not needed, at least from the browser's perspective. mod_pagespeed comes with a filter, Remove Quotes, that does this.
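As a rough illustration of the first two items, here is a minimal Python sketch. It is a deliberately naive approach based on regular expressions, and the function name `strip_comments_and_whitespace` is made up for this example. A real minifier would have to leave `<pre>`, `<textarea>`, `<script>`, `<style>` and conditional comments alone.

```python
import re


def strip_comments_and_whitespace(html: str) -> str:
    """Naive minifier: drop HTML comments and collapse whitespace runs."""
    # Remove <!-- ... --> comments. Assumption: the page has no conditional
    # comments that must survive.
    without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    # Collapse every run of whitespace into one space. This is unsafe inside
    # <pre>, <textarea>, <script> and <style>.
    return re.sub(r"\s+", " ", without_comments).strip()


sample = """<div>
    <!-- navigation -->
    <a href="/home">  Home  </a>
</div>"""

minified = strip_comments_and_whitespace(sample)
print(len(sample), "->", len(minified))
print(minified)
```

The sketch collapses whitespace to a single space instead of deleting it, because a space between inline elements can be visible in the rendered page.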
What else can we do?
I think there is more to be done than outlined above. The data inside an HTML document can be categorized as either:
- elements, like <html>
- attributes, like id="value"
This leaves us with attributes, whose importance and weight haven't received much attention. I did a quick count on some pages to check the share taken up by attributes, and the results surprised me a bit.
| Website | Page Size | Attributes (1+) | Attributes (2+) |
|---|---|---|---|
| Amazon.com | 251,972 | 79,948 (31.73%) | 58,967 (23.40%) |
| Facebook.com | 41,682 | 17,348 (41.62%) | 13,593 (32.61%) |
| Google.com | 119,297 | 9,662 (8.10%) | 7,704 (6.46%) |
| Wikipedia.org | 47,687 | 29,230 (61.30%) | 27,159 (56.95%) |
| Yahoo.com | 121,751 | 64,748 (53.18%) | 42,034 (34.52%) |
| YouTube.com | 103,188 | 69,396 (67.25%) | 55,439 (53.73%) |
| Bilka.dk | 94,297 | 56,881 (60.32%) | 42,450 (45.02%) |
| DanskSupermarked.dk | 13,789 | 8,364 (60.66%) | 6,394 (46.37%) |
| Føtex.dk | 56,027 | 26,007 (46.42%) | 21,054 (37.58%) |
| Netto.dk | 80,078 | 36,340 (45.38%) | 25,268 (31.55%) |
| Salling.dk | 62,369 | 37,472 (60.08%) | 24,590 (39.43%) |
The sum of it all is that attributes on average make up approx. 35% of a document! And I don't think we give them enough attention. So let's do that.
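For anyone who wants to run a similar count on their own pages, here is a minimal Python sketch using the standard library's `html.parser`. It is not the tool used for the numbers above. The class `AttributeCounter`, the function `attribute_share` and the `min_attrs` parameter are names invented for this example, and it assumes that "(1+)" and "(2+)" mean elements with at least one or at least two attributes.

```python
from html.parser import HTMLParser


class AttributeCounter(HTMLParser):
    """Counts the characters taken up by attribute text in start tags."""

    def __init__(self, min_attrs=1):
        super().__init__()
        self.min_attrs = min_attrs  # only count elements with this many attributes
        self.attr_chars = 0

    def handle_starttag(self, tag, attrs):
        if len(attrs) < self.min_attrs:
            return
        raw = self.get_starttag_text()  # e.g. '<a href="/x" class="y">'
        # Assumption: everything between the tag name and the closing '>'
        # (or '/>') counts as attribute text.
        inner = raw[1 + len(tag):].rstrip(">").rstrip("/").strip()
        self.attr_chars += len(inner)


def attribute_share(html, min_attrs=1):
    """Return the fraction of the document made up of attribute text."""
    counter = AttributeCounter(min_attrs)
    counter.feed(html)
    counter.close()
    return counter.attr_chars / len(html)


page = '<div id="main" class="box"><p>Hello</p><a href="/x">link</a></div>'
print(f"1+ attributes: {attribute_share(page, 1):.2%}")
print(f"2+ attributes: {attribute_share(page, 2):.2%}")
```

The sketch counts characters, not bytes, so the result will differ slightly on pages with many non-ASCII characters.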
Compression: Another aspect to this
By now everyone knows that compression should be enabled to minimize the payload, and that is the case for about 76% of the resources delivered. I am no expert on compression, but I know it from a high-level perspective: it looks for redundancies in the text and "reuses" them in the output.
Compression is good for our payload, and we should try to get the most benefit from it. If we want to maximize the compression ratio, we should aim to maximize the number and length of patterns found by the compression algorithm.
We are locked on most of our document, but luckily for us the W3C specifies in the XML recommendation that the order of attributes is not significant. This means that we can move around about 35% of the page size without too many constraints. The individual elements must maintain their semantic meaning, which essentially leads to a few constraints:
- Two attributes with the same key would have to remain in the order they were specified
- Obviously, we can’t remove any attributes
But do attributes matter in a compressed document?
To test if the order of attributes matters, I set up an experiment. Based on the same websites as above, I tried shuffling all attributes on all elements, making the order totally random, and compressing it all. As a control, I used the unaltered compressed original.
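The experiment can be approximated with a few lines of Python. This is only a sketch and not the code I used. The regular expressions are a naive stand-in for a real HTML tokenizer, `zlib` stands in for whatever compression the web server applies, and the names `shuffle_attributes` and `compressed_size` as well as the generated sample markup are invented for the example. The shuffle also ignores the constraint about duplicate attribute keys.

```python
import random
import re
import zlib

# Naive patterns: good enough for a sketch, not a full HTML tokenizer.
ATTR = r"""[^\s=/>]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?"""
START_TAG_RE = re.compile(r"<([A-Za-z][^\s/>]*)((?:\s+" + ATTR + r")+)(\s*/?)>")
ATTR_RE = re.compile(ATTR)


def shuffle_attributes(html, rng):
    """Return html with the attributes of every start tag in random order."""
    def rewrite(match):
        attrs = ATTR_RE.findall(match.group(2))
        rng.shuffle(attrs)
        return "<" + match.group(1) + " " + " ".join(attrs) + match.group(3) + ">"
    return START_TAG_RE.sub(rewrite, html)


def compressed_size(html):
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(html.encode("utf-8"), 9))


# Synthetic sample: a list whose items all use the same attribute order.
item = ('<li class="item" data-id="{i}" title="Item {i}">'
        '<a href="/item/{i}" class="link" rel="nofollow">Item {i}</a></li>')
sample = "<ul>" + "".join(item.format(i=i) for i in range(200)) + "</ul>"

shuffled = shuffle_attributes(sample, random.Random(42))
print("original:", compressed_size(sample), "bytes compressed")
print("shuffled:", compressed_size(shuffled), "bytes compressed")
```

Both versions contain exactly the same characters, so any difference in compressed size comes from the attribute order alone.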
Can ordering of attributes improve compression ratio?
Since we now know the order of attributes can hurt a compressed page, it should also be possible to use this to our advantage. What if there were some dominant way of ordering attributes? This would allow us to use output mechanisms like mod_pagespeed or Servlet Filters to apply this ordering for us, and improve the compression ratio.
So I sat down, and thought about possible strategies for ordering the attributes, and this is the list I came up with:
- byName: Sorts attributes by name.
- byValueLength: Sorts an element's attributes by the length of their value. The idea here is to get attributes with short values to appear first.
- analytic-x: Analyses the document for elements with more than X attributes, and sorts their attributes based on the number of unique values. If two attributes share the same number of unique values, they are sorted by name. X can be either 0 or 2.
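The strategies above could be sketched in Python like this. Each element is modeled as a list of `(name, value)` pairs in source order, which is a simplification chosen for the example. The function names are invented, and the reading of analytic-x (count the distinct values per attribute name across the qualifying elements, fewest first) is my interpretation of the description, not a specification. Python's `sorted` is stable, so two attributes with the same key stay in the order they were specified.

```python
from collections import defaultdict


def by_name(attrs):
    """byName: sort one element's attributes by attribute name."""
    return sorted(attrs, key=lambda attr: attr[0])


def by_value_length(attrs):
    """byValueLength: attributes with short values come first."""
    return sorted(attrs, key=lambda attr: len(attr[1]))


def analytic(elements, x):
    """analytic-x: order by number of distinct values per name, then by name.

    Only elements with more than x attributes are analysed and re-ordered.
    Assumption: attributes with the fewest distinct values come first.
    """
    unique_values = defaultdict(set)
    for attrs in elements:
        if len(attrs) > x:
            for name, value in attrs:
                unique_values[name].add(value)

    def key(attr):
        return (len(unique_values[attr[0]]), attr[0])

    return [sorted(attrs, key=key) if len(attrs) > x else list(attrs)
            for attrs in elements]


elements = [
    [("href", "/a"), ("class", "link"), ("id", "first")],
    [("id", "second"), ("href", "/b"), ("class", "link")],
    [("class", "link")],
]
print(by_name(elements[0]))
print(by_value_length(elements[0]))
print(analytic(elements, 0))
```

In this small example `class` only ever has one value, so the analytic strategy moves it to the front of every element and the repeated prefix becomes longer.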
| Amazon.com | -89 (-0.16%) | 40 (0.07%) | -34 (-0.06%) | -70 (-0.12%) |
| Facebook.com | -9 (-0.08%) | 62 (0.52%) | -7 (-0.06%) | -15 (-0.13%) |
| Google.com | -11 (-0.03%) | 17 (0.05%) | 9 (0.03%) | 8 (0.02%) |
| Wikipedia.org | 45 (0.48%) | 220 (2.33%) | 294 (3.11%) | 296 (3.13%) |
| Yahoo.com | -1 (0.00%) | 48 (0.18%) | 39 (0.14%) | 45 (0.16%) |
| YouTube.com | -164 (-1.09%) | 155 (1.03%) | 0 (0.00%) | 2 (0.01%) |
| Bilka.dk | -20 (-0.10%) | 84 (0.42%) | 58 (0.29%) | 41 (0.20%) |
| DanskSupermarked.dk | -12 (-0.33%) | 1 (0.03%) | -2 (-0.05%) | -1 (-0.03%) |
| Føtex.dk | -62 (-0.40%) | -35 (-0.23%) | -52 (-0.34%) | -39 (-0.25%) |
| Netto.dk | -58 (-0.31%) | -7 (-0.04%) | -61 (-0.33%) | -90 (-0.48%) |
| Salling.dk | -76 (-0.69%) | -20 (-0.18%) | 5 (0.05%) | -2 (-0.02%) |
The results also show that there is no dominant strategy, since re-ordering does not improve the compression ratio in all cases. As for the effect of doing it in the first place, keep in mind that some of the websites listed above are among the most optimized worldwide.
The lack of a dominant strategy kind of disappoints me, but I have a few ideas that I will try out and see how the results look. Please leave a comment if you liked my post.