Table of contents
- Basic Layout
- Vertical lines
- Horizontal timings
- Common scenarios
- HTTP/1.1 vs HTTP/2
- Firefox enhanced tracking protection
- Service worker precaching
- Chrome stair-step
- HTML early flush
- Long DOM content loaded (DCL) time
- The never-ending waterfall
- Rogue image downloads (HTTP/2 with SRI)
- Requests with no response body (Error/Status Code: -1 & -2)
- HTTP/2 prioritisation
- Large Time to First Byte (TTFB)
- Inactivity after SSL negotiation
- Missing component times (DNS, Connect, TLS)
- The impact of a CPU bottleneck
- Hidden gems
I often find myself looking at WebPageTest (WPT) waterfall charts, but as I seem to have the memory of a chimpanzee (not a goldfish, that’s a myth), I tend to forget some of the details and what they all mean. So I decided to pull together many bits of information into a single blog post I can refer to at a later date. If you find it useful, or think I’ve forgotten anything, please let me know. If you are interested in learning about the Connection View that sits directly below the Waterfall View, then check out my post ‘How to read a WebPageTest Connection View chart’.
Here’s the basic layout of the waterfall chart user interface:
1 - Key
The key shows three types of information:
- Information about the connection status (DNS lookup, connection established, SSL negotiation)
- The type of resource being requested (e.g. HTML, images etc)
Each resource has 2 colours, a light shade and a dark shade. The light shade signifies the point at which the browser has made the request for the resource. The dark shade is the point at which the resource is actually downloading. See this comment from Pat Meenan for more information.
The “wait” visualisation is a new addition to WPT. It shows you the time between when the browser first discovered the resource on the page, up to when the request for the resource was made by the browser to the server.
2 - Request list
A list of assets the browser has discovered on the page and the order in which they were requested. Note the request number on the far left, as well as the yellow lock if the request is being made over a secure connection (HTTPS).
3 - Request timeline
The timeline shows time along the horizontal (x) axis versus each request made on the vertical (y) axis. From this you can see the lifecycle of a request made by the browser: from discovery (wait), through to the request being made, and finally to the asset being downloaded.
Ideally you want to make sure this timeline covers as little time as possible, as this indicates better overall performance. The thinner the timeline, the quicker the page loads for a user.
4 - CPU Utilisation
A simple graph showing the CPU utilisation of the browser process running on the device. It displays how much CPU the current webpage is using at any point in time. It ranges from 0 - 100% utilisation. See this comment from Pat Meenan for more information.
5 - Bandwidth In
This is an indicator of when data is coming into the browser. The visualisation is helpful to see when the browser is doing useful work vs wasted time. The absolute scale can be ignored, as it isn’t very accurate. Use the “Capture network packet trace (tcpdump)” option in the advanced tab on the WebPageTest homepage if you want more accurate results. See this comment from Pat Meenan for more information.
6 - Browser Main Thread
This graph visualises what the browser’s main thread is doing at any specific point in time (x-axis). The y-axis shows the percentage from 0 - 100%. The colours were copied from the Chrome DevTools CPU graph (under the performance tab). Here’s what each of the colours means:
- Orange - Script parsing, evaluation and execution
- Purple - Layout
- Green - Painting
- Blue - HTML parsing
Using this graph it is possible to see if the CPU is becoming a bottleneck in one of the above areas.
7 - Page is Interactive
This graph gives you an indication of when the main thread is blocked. The red blocks indicate that the main thread has been blocked for 100ms (which will also block inputs like button presses). Green indicates the thread isn’t blocked. Note: it still may be possible to scroll during red blocked phases, as scrolling is usually handled off the main thread for most browsers. See this comment from Pat Meenan for more information.
You can see the key for each of the coloured vertical lines below the “Waterfall View” header as seen below:
But what do each of them mean?
Start Render - Green line
This is the first point at which a user will see pixels painted to the page. The pixels could be from anything at all (background colours, borders etc), not necessarily content. Before this point the screen was blank. This metric is measured by analysing individual video frames that are captured during the page load. See comment from Pat Meenan for more information.
RUM First Paint - Light green line
This is the point where the browser renders anything to the screen that is visually different from before navigation (i.e. the blank screen for WPT). This metric is reported via the browser’s own API, so it reflects when the browser thinks it painted the first content. Because of this, the line is only visible if the browser supports the Paint Timing API.
DOM Interactive - Yellow line
The point at which the browser has finished parsing all the HTML, and DOM construction is complete. Unfortunately it’s not a reliable metric.
DOM Content Loaded - Pink line
The point at which the HTML has been loaded and parsed, and the browser has reached the end of the document. All blocking scripts have been loaded and run. The DOM at this point is completely defined. See comment from Pat Meenan for more information.
On Load - Lavender line
The point at which the window load event fires. All objects are in the DOM and all images and scripts have finished loading.
Document Complete - Blue line
The point where the
So let’s now concentrate on the request timeline (3). What do the horizontal blocks mean and what metrics do they refer to? Well, if you click on an individual request you will see a popup with a lot more information, as seen in the example below:
So let’s take a look at a few of the requests from this waterfall view, as it gives us a number of quite varied requests to look at.
Request 1 - The HTML
Here the browser is requesting the HTML document, so at this point in time it is also having to setup the connection to the server. In the request details we are given the following timings:
- Discovered: 0.011 s
- Request Start: 0.116 s
- DNS Lookup: 27 ms
- Initial Connection: 25 ms
- SSL Negotiation: 43 ms
- Time to First Byte: 315 ms
- Content Download: 40 ms
I’ve annotated the request to show what each of these timings mean:
Adding the DNS, Initial Connection, SSL negotiation, Time to First Byte (TTFB) and the Content download times gives you the 450ms that is displayed directly after the request finishes.
It’s worth noting that WPT follows a specific convention in the request details panel:
- If the time corresponds to a duration, it is measured in milliseconds (ms), e.g. the DNS lookup took 27ms.
- If the time corresponds to a starting point, it is measured in seconds (s), e.g. the request started at 0.116s.
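The 450ms figure for request 1 is easy to double check. Here is a minimal sketch in Python; the dictionary keys are illustrative labels (not names WPT exposes), and the values are the durations from the request 1 details panel.

```python
# Durations from the request 1 details panel, all in milliseconds.
# The key names are illustrative labels, not WPT field names.
request_1_ms = {
    "dns_lookup": 27,
    "initial_connection": 25,
    "ssl_negotiation": 43,
    "time_to_first_byte": 315,
    "content_download": 40,
}

# The figure shown after the request bar is the sum of all the durations.
total_ms = sum(request_1_ms.values())
print(f"Total request time: {total_ms} ms")
```

The starting points (Discovered, Request Start) are deliberately left out of the sum, since they are positions on the timeline in seconds rather than durations.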
This request is different from the other requests examined because the file is coming from a different domain. The request details give the following timings:
- Discovered: 0.473 s
- Request Start: 0.702 s
- DNS Lookup: 28 ms
- Initial Connection: 39 ms
- SSL Negotiation: 153 ms
- Time to First Byte: 48 ms
- Content Download: 9 ms
Notice how the browser needs to go through the whole connection negotiation again (DNS, Connection, SSL negotiation) because the file exists on a different domain. This adds a fair chunk of time to the request (28 + 39 + 153 = 220ms).
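To put that connection cost into perspective, this small Python sketch works out how much of the whole request is spent on connection setup. The function name is hypothetical, and the numbers are the ones listed in the request details above.

```python
def connection_overhead_ms(dns_ms, connect_ms, ssl_ms):
    """Time spent setting up a connection before the request can be sent."""
    return dns_ms + connect_ms + ssl_ms


# Values from the request details above (milliseconds).
overhead = connection_overhead_ms(dns_ms=28, connect_ms=39, ssl_ms=153)
total = overhead + 48 + 9  # plus Time to First Byte and Content Download
share = overhead / total

print(f"{overhead} ms of {total} ms is connection setup ({share:.0%})")
```

In other words, for this small file most of the request time goes on reaching the new domain, not on fetching the file itself.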
The other interesting point about this request is the script executes around 200ms after the download is complete. There’s no information about this execution in the details panel, but you can see it in the waterfall as light pink lines after the request and orange in the “Browser Main Thread” panel (6) which signifies “Script parsing, evaluation and execution”.
Request 15 - A sponsor PNG
With this request the browser has discovered a PNG image and requests it from the server. In the request details we are given the following timings:
- Discovered: 0.652 s
- Request Start: 0.824 s
- Time to First Byte: 214 ms
- Content Download: 28 ms
Wait time is calculated by subtracting the discovered time from the request start time. The wait time is the time taken from when the browser first finds the asset, to the time when it has the capacity to send a request to the server.
The duration after this request is the time taken from the request being made, to when the request is completed (Time to First Byte + Content Download). Since a connection to the domain has already been established, there’s no need for DNS, Connect, SSL negotiation.
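Both calculations for request 15 can be sketched in Python as follows. The function names are made up for illustration; the inputs follow the WPT convention of starting points in seconds and durations in milliseconds.

```python
def wait_time_ms(discovered_s, request_start_s):
    """Wait time: request start minus discovery, converted to milliseconds."""
    return round((request_start_s - discovered_s) * 1000)


def request_duration_ms(ttfb_ms, content_download_ms):
    """Duration on an already-open connection: no DNS, connect or SSL."""
    return ttfb_ms + content_download_ms


# Values from the request 15 details panel.
wait = wait_time_ms(discovered_s=0.652, request_start_s=0.824)
duration = request_duration_ms(ttfb_ms=214, content_download_ms=28)

print(f"Wait: {wait} ms, request duration: {duration} ms")
```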
Request 23 - GIF file moved
Although request 23 looks quite unremarkable, there are a couple of things going on. The background of the request is yellow. This is to signify a server response status code that isn’t the usual 200. In this instance it is a 302 status code, which signifies the GIF file has been temporarily moved. In fact all responses with a 3xx status code will have a yellow background. The request details show the following information:
- Error/Status Code: 302
Notice how request 23 doesn’t require a TCP connection to be established. This is because it has already happened for this domain on request 20.
Error status codes (4xx and 5xx) are displayed in a similar way, only the background is red, like in the example below (note this image is from a different test):
The request details show the following information:
- Error/Status Code: 404
Notice the colour of the returned resource response in this instance. Rather than being the expected purple colour for an image, it is blue signifying that it is an HTML page. Or, in other words, it is the server responding with the 404 page because the asset can’t be found.
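The background colour rules described in this section can be summarised in a small Python sketch. This is not WPT’s own code, just an illustration of the behaviour described above; the function name and the "none" return value are assumptions for the example.

```python
def row_background(status_code: int) -> str:
    """Background colour of a waterfall row, as described in this post."""
    if 300 <= status_code < 400:
        return "yellow"  # redirects and other 3xx responses
    if status_code >= 400:
        return "red"     # error responses such as 404
    return "none"        # the usual 200 has no highlight


for code in (200, 302, 304, 404):
    print(code, row_background(code))
```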
Another visual peculiarity you may be curious about is the vertical stripes in each of the requests. As mentioned earlier, the lighter colour signifies the request has been made and the browser is waiting for a response. The darker colour indicates that bytes for the specific resource are being delivered to the browser. Now sometimes this doesn’t happen all at the same time. This results in what looks like zebra striping, where the browser is being drip-fed bytes over time. These are called download chunks (red arrows).
This is very visible when an HTML early flush technique is being used (request 2 - see scenario below for more details) or if a large number of assets are being downloaded in parallel and are competing for resources (requests 3-9). Note the bandwidth graph at the bottom of the UI is maxed out from 1.6 to 2.5 seconds.
You may be asking “but this chunking also occurs after the bandwidth usage drops (2.6 seconds+), so what is happening there?”. Well, the number of parallel connections has dropped, so less is downloading in parallel. But the connections that were created in requests 12 - 15 are still in the TCP slow start phase, so the assets are still competing for (now limited) bandwidth.
Here’s a list of common patterns seen in a WPT waterfall chart. I’ll add more of these over time as I encounter them.
DNS Prefetch is part of the Resource Hints Working Draft. It gives a developer the ability to tell the browser that a DNS lookup is going to be needed for another domain in the near future. So instead of waiting, start the lookup immediately. By the time the domain is actually required, the browser will only need to complete the TCP handshake, and optional SSL negotiation. It looks similar to the preconnect example below, in that it is “floating” in the timeline. But here only the DNS lookup (green) is visible.
Notice where the dns-prefetch occurred in the timeline: almost immediately after the HTML has finished downloading and is parsed. It is easy to see the difference if you compare it to the connection negotiations happening in requests 5 and 7, where preconnect is being used.
Preconnect is part of the Resource Hints Working Draft. It allows a developer to give the browser a “hint” that a domain will need to be connected to in the near future. By connecting to the domain early, the connection won’t need to be established later in the waterfall, thus allowing assets from said domain to be requested and downloaded quicker when they are required.
As you can see in the image above, the preconnect looks to be “floating” in the timeline. It happens way before the actual request for the image is made. This is the browser using the preconnect hint to connect ahead of time before it is required. For more information on the preconnect hint I recommend reading this blog post by Andy Davies.
Prefetch is part of the Resource Hints Working Draft. It allows a developer to tell the browser to prefetch a resource (e.g. CSS, JS, HTML document) in the current navigation, as it may be required at a later date. For example, if you know the majority of your users navigate to a specific page from your homepage (e.g. maybe your login page), you could decide to prefetch it so it will already exist in the user’s browser cache when it is required. In the example below I am prefetching another HTML document that sits along a user’s journey:
The prefetch is visible at request 19 in blue (HTML). It is worth noting that this prefetched HTML is simply stored in the browser cache. It isn’t parsed by the browser. You can verify this in the waterfall chart UI by looking at the browser main thread graph. At the point the HTML is prefetched, there’s no main thread activity registered.
WebPageTest gives us some information in the popup to let us know it is a prefetch:
- Priority: IDLE (under the details tab)
- purpose: prefetch (under the request tab)
It’s important to remember the priority of a prefetch. In WebPageTest, when testing in Chrome it is listed as priority IDLE. This maps to Lowest priority in DevTools (according to the Resource Fetch Prioritization and Scheduling in Chromium document). So a prefetch is an optional and often low-priority fetch and will be loaded as late as possible by the browser. This differs from preload which is a mandatory fetch and gets a High priority in the browser. A resource loaded using preload is layout blocking, so use it sparingly else it could actually slow down perceived performance.
Prerender is part of the Resource Hints Working Draft. It gives a developer the ability to tell a browser what a user’s likely next navigation could be (assuming the developer is tracking this in some form of analytics). In December 2017, with the release of Chrome 63, Google overhauled how prerender worked. Here’s a brief before and after explanation:
Pre-Chrome 63: Chrome would look for the
prerender (since the user can’t see them), it was a very complex task to achieve this when the whole point of a prerender was to render the page. So the decision was taken to deprecate and remove the then-current implementation.
Chrome 63+: Since the release of Chrome 63, the prerender hint is still recognised and followed by Chrome, only it is handled in a much different way. Chrome now uses a technique called “NoState Prefetch” when it sees a
So what does this prerender look like in a WPT waterfall chart?
In the waterfall chart you can see the usual page and resources loading from request 1 through to 19. Request 16 is where the prerender occurs. Here you can see a request for a second HTML page. Once completed, this then triggers requests 20 through to 29. Notice how many of these requests have a yellow background with a 304 status code. This is telling us they are identical to a resource that already exists in the browser’s cache. They exist here because the homepage HTML above them (request 1) put them there only a few hundred milliseconds before. Notice how there’s very little happening in the browser main thread graph (other than the homepage parsing which is happening because request 30 (CSS) completed). This confirms that the prerendered assets and subresources are simply being stored in the browser cache for later use.
As with prefetch, WPT gives us a little information in the popup for each prerender resource to let us know the requests aren’t from a standard user navigation:
- Priority: IDLE (under the details tab)
- purpose: prefetch (under the request tab)
Note: It doesn’t explicitly tell us it comes from a prerender hint, only from a prefetch. Since “NoState Prefetch” is now being used, this actually makes sense.
Without a prerender link element you get your standard page waterfall chart. Only requests to the current page subresources can be seen.
Preloading is a W3C Candidate Recommendation and is used to increase the loading priority of selected assets. A developer can tell the browser: “this resource will absolutely be needed soon, so load it right away”. This technique is often used when loading web fonts.
Without preload, when loading a web font the browser first needs to download the HTML and CSS, then parse both to create the render tree. Only at this point can the browser request the font. This can lead to what is known as a Flash of Invisible Text (FOIT) and Flash of Unstyled Text (FOUT). A way around this issue is to request the web font file immediately using the preload directive.
If you compare both of the images above you will see the request for the preloaded WOFF2 font is made as soon as the HTML starts to be downloaded at request number 2 (dark blue strip). The browser parsed the <head> tag, saw the preload directive and made the file request immediately.
Compare this to the second image, where the browser downloads the font after waiting for the HTML and CSS to be downloaded and parsed. Only at this point can the WOFF2 font request be made. As you can see from the image, when preloading isn’t used the font is at request number 11. I’ve written more about font preloading here if you are interested.
HTTP/1.1 vs HTTP/2
HTTP/2 is the next iteration of the HTTP protocol after HTTP/1.1. Due to the fact that HTTP/2 uses a single TCP connection and multiplexes files over this single connection, it is easy to spot the difference in the resulting waterfall charts:
A browser using HTTP/1.1 requests images via separate TCP connections, and this tends to happen at slightly different times (hence the stepped nature of the waterfall). A browser using HTTP/2 on the other hand requests all the images at the same time. It is the server that decides when the images will be sent back to the browser, and in what order.
Online Certificate Status Protocol (OCSP) is an internet protocol used for obtaining the revocation status of SSL certificates. One way for a browser to verify a certificate is to connect to an OCSP server. When this happens WebPageTest will show it in the waterfall, as seen below:
This OCSP check is bad for performance. The verification requires a DNS lookup and an initial connection to the OCSP server. Only once the certificate has been verified, can the SSL negotiation take place to the original domain. As you can see in the image, the whole waterfall is being pushed back. It takes almost 2 seconds before the HTML page can even be requested!
If you compare the with & without OCSP waterfalls, you can see that the SSL negotiation is much shorter without OCSP (300ms instead of 1000ms+) and therefore the request for the HTML file happens much sooner (at 1 second versus 1.95 seconds). The OCSP check adds 950ms to the initial HTML request on a 3G Fast connection. That’s a huge number!
If you notice this on your WebPageTest timelines you should look into enabling OCSP stapling on your server. Note: If you are using Extended Validation certificates (EV), OCSP stapling won’t fully solve the issue, see this technical Twitter thread for more details on this.
Firefox enhanced tracking protection
Firefox enabled enhanced tracking protection by default for new users in June 2019, and for all users as of version 69 (September 2019). The agents on WebPageTest updated around the same time. In some rare cases the tracking protection requests could be seen in the WPT waterfalls (request 1-3):
According to Pat Meenan these requests should now be filtered out by default, so they will never be seen in the waterfall charts.
Service worker precaching
The use of service workers is gradually increasing, and one of the many features they allow is fine-grained control over what assets are cached and for how long. They also give a developer the ability to precache files for future use (e.g. for offline capabilities). An important detail to remember when precaching assets using a service worker is that the browser may need to download the same files twice: once for the HTTP cache (the standard browser cache), and again for the service worker cache (Cache API). These are two totally separate caches, and don’t share assets. These duplicate requests can be seen in a WebPageTest waterfall chart:
Chrome has included a prioritisation technique that is named after the pattern it creates in waterfall charts. It involves Chrome examining the assets in the <head> tag (before the page even has a <body>), and requesting, downloading, and parsing these assets first. The browser even goes so far as to delay requests for assets in the body until the <head> requests are complete. It is easier to see this stepping in an HTTP/1.1 graph, as seen in the example below (although it also occurs in HTTP/2):
In the above image from the BBC News website, 8 out of the first 9 requests are made for assets in the <head>. The “step” isn’t very long in terms of duration, only around 200ms. But it gives the browser enough time to concentrate all CPU and bandwidth on downloading and parsing these assets, so the <head> is then set up and ready to go before the <body> assets are downloaded and parsed.
Not much has been written about this “layout-blocking” phase in Chrome, but it can be seen in detail in the Resource Fetch Prioritization and Scheduling in Chromium document by Pat Meenan, and also in Chrome’s resource scheduler source code.
HTML early flush
HTML early flush was mentioned above in the download chunks section. It is when a web server sends a small slice of the HTML document, before the whole HTML response is ready. What this then allows the browser to do is parse the HTML it has received, and look for assets that it can then request early (compared to waiting for the whole HTML document to download, parse, and then request).
With early flush
In the example above, the small download chunk of HTML the browser receives (request 2) contains the
preload directives, and
dns-prefetch resource hints. This HTML is parsed and 16 requests are almost immediately triggered very close to each other. Note: Notice how I didn’t list CSS. CNN.com has inlined the CSS in a
Without early flush
If you compare the waterfalls for with & without early flush (from different sites unfortunately): with flush you will notice how the requests for assets are made during the HTML download response. Compare this to without, where the browser must wait for the whole HTML document response to complete. Only then can it be parsed and requests made to other page assets.
Flushing allows the browser to make requests earlier in the waterfall, and therefore makes the page load and feel faster (if implemented correctly).
Long DOM content loaded (DCL) time
The vertical lines on the waterfall chart are usually very thin, maybe only a couple of pixels thick. This shows they take very little time to actually happen. But in some cases the DOM content loaded line takes up a large chunk of time, and you will very easily see it on your waterfall chart! Take a look at the two (different) charts below:
The pink DOM Content Loaded (DCL) vertical line is usually almost instantaneous. Very thin and barely visible on the waterfall chart as it usually occurs around the same time as other events like DOM Interactive & Start Render. This is what a healthy DCL looks like.
$(document).ready() method (or some equivalent). So if you ever see a DCL line eating a large chunk of your waterfall chart, you now know where to look.
The never-ending waterfall
Occasionally, if you have a third party script set up incorrectly, you can end up in a situation where a test will take a very long time to stop, as can be seen in the waterfall chart below:
Yes you are seeing that correctly, 6150 requests logged by WebPageTest! So many in fact that my browser struggles to render all the DOM elements on the page. This is where the ability to customise the waterfall really helps, as you can limit the number of visible requests to a reasonable number.
So what exactly is happening? Well there’s a clue in the network activity graph. If you look closely you will observe that it never reaches zero. WebPageTest waits for there to be no network activity for 2 seconds, at which point it ends the test. Unfortunately this third party script is executing every 30ms (request 4) and downloading a mere 43 bytes of data each time. Just enough to keep the network alive. It’s also worth noting the CPU utilisation graph, which is maxed out over the time period (120 seconds). That’s what happens to a device when a script is executing 33 times per second.
At 120 seconds WebPageTest automatically forces the test to stop. There are a couple of ways to force a test to stop yourself (other than fixing the issue):
- by clicking the “Stop Measurement at Document Complete” check box on the advanced page before you start the test, this will stop the test at the onLoad event
- by using the “setDomElement” script command to tell the test to wait for a specific DOM element to show in the DOM
So if you see unusually long tests running, check to see what your third party scripts are doing to the network graph.
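To make the stopping rule concrete, here is a heavily simplified Python sketch of it. It is not WebPageTest’s actual logic; it only models the two rules mentioned above (2 seconds of network silence, or a forced stop at 120 seconds), and the function and parameter names are hypothetical.

```python
def simulated_end_time(activity_times_s, quiet_window_s=2.0, hard_limit_s=120.0):
    """Return when a simplified test run would stop, in seconds.

    activity_times_s: sorted timestamps at which network activity was seen.
    """
    last_activity = 0.0
    for t in activity_times_s:
        if t >= hard_limit_s:
            break  # the forced stop kicks in before this activity
        if t - last_activity >= quiet_window_s:
            # The network was quiet for long enough before this activity.
            return last_activity + quiet_window_s
        last_activity = t
    return min(last_activity + quiet_window_s, hard_limit_s)


# A healthy page: activity tails off, so the test ends 2 s after the last byte.
healthy = simulated_end_time([0.1, 0.5, 1.2, 3.0])

# A script firing every 30 ms never leaves a 2 s gap, so the limit is hit.
chatty = simulated_end_time([i * 0.03 for i in range(5000)])

print(healthy, chatty)
```

The second call shows why the waterfall above never ends on its own: a request every 30ms means the quiet window is never reached.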
Rogue image downloads (HTTP/2 with SRI)
I personally ran into this issue very recently on GOV.UK. The waterfall was showing an image download that at first glance shouldn’t actually be able to be downloaded:
Request number 2 shows an image downloading before the connection to the domain has been negotiated. I’ve now written a whole blog post “HTTP/2 and Subresource Integrity don’t always get on” about this curious case if you want to delve into the details. But a quick TL;DR for you here is: the connection to the “assets” domain in request 2 is using HTTP/2 connection coalescing, so it is using the same TCP connection from request number 1. The requests 3-10 are waiting for an
Requests with no response body (Error/Status Code: -1 & -2)
There are more status codes than those mentioned in the Request 23 - GIF file moved section. Another two are codes -1 and -2.
Notice how the above image is requesting assets from the third party domain then immediately failing. It is also possible for this to happen on requests from the same domain, as seen below:
When a request is made but WPT sees no response, it is given the error code of minus 1 or minus 2. This can either mean the response was never sent back by the server, or WPT simply didn’t detect it. This can also be seen if you click on the individual requests:
- Error/Status Code: -2
- Content Download: 0 ms
- Bytes In (downloaded): 0 B
- Uncompressed Size: 0 B
- Bytes Out (uploaded): 0.5 KB
There’s a fair amount going on in this waterfall chart. Request 13 is the important one to focus on. This is the browser requesting a version of jQuery from the Google Hosted Libraries CDN. The connection takes a while to negotiate, then the download takes a chunk of time because bandwidth is being used for other requests (11, 12, 14). Only once the script has downloaded does a whole bunch of other activity start. The JS is executed and there’s a 2nd HTML parse by the browser, which suddenly triggers lots of requests for images. Notice where the “Start render”, “RUM first paint”, and “Page is Interactive” green graph appear on the chart. They are all being pushed back along the waterfall because of this single third party script request.
And that’s why it is now recommended you Self-Host Your Static Assets. This example was sent over by Mike Herchel and if you are interested in what happens once you remove the third party JS, check out his thread here.
HTTP/2 is different to HTTP/1.1 in that it aims to use a minimal number of TCP connections (1 or 2). For ‘optimal’ HTTP/1.1 performance the browser opens 6 TCP connections to a single domain, thus allowing 6 assets to be downloaded at a time. HTTP/2 instead introduces multiplexed streams over a single TCP connection. A very simple explanation of this is:
- The browser opens a TCP connection to the server and downloads the page HTML
- Browser parses the HTML and looks for all other page assets (CSS, JS, Fonts, Images etc)
- Browser sends a list of assets it needs to load the page to the server, and the priority it would like them in
- It is then up to the server how it delivers these assets to the browser and in what order (ideally using the browser prioritisation as a reference). This is where HTTP/2 prioritisation comes in
- Server then sends multiple assets at the same time via streams across a single TCP connection
How the assets are prioritised depends on how the server has implemented the HTTP/2 specification. For a lot more detail about H2 server / CDN prioritisation, check out Andy Davies’ ‘http2-prioritization-issues’ repository, which is used to track results from Pat Meenan’s ‘HTTP/2 prioritization test’.
The test works by first warming up the TCP connection (seen in request 2 & 3). TCP slow start is part of the congestion control strategy used by the TCP protocol. It is used to stop the server flooding the TCP connection. The server gradually ramps up the speed, while looking to see if any packets are lost along the way. If packets are lost, it drops the speed.
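The ramp-up behaviour of TCP slow start can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not a real TCP implementation: the initial window of 10 segments, the doubling per round trip, and the halve-then-grow-slowly reaction to loss are all assumptions made for the example.

```python
def congestion_window(round_trips, loss_on=frozenset(), initial_window=10):
    """Toy model of how many segments can be in flight on each round trip."""
    window = initial_window
    in_slow_start = True
    history = []
    for rtt in range(round_trips):
        history.append(window)
        if rtt in loss_on:
            window = max(1, window // 2)  # packets lost: drop the speed
            in_slow_start = False
        elif in_slow_start:
            window *= 2                   # no loss yet: ramp up quickly
        else:
            window += 1                   # after a loss: grow cautiously
    return history


print(congestion_window(5))
print(congestion_window(6, loss_on={2}))
```

A brand new connection starts at the small end of this ramp, which is why the test warms the connection up before making the requests it actually wants to measure.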
Once the connection is up to speed the browser then requests 30 images concurrently at a low priority (request 4 to 33). When two low priority images have downloaded (and therefore the network buffers are filled), then it requests a high priority image. The priority of the image is sent to the server along with the request (if the browser supports it), and is decided by the browser using the rules set out in the ‘Resource Fetch Prioritization and Scheduling in Chromium’ document. Looking at this document we can see that in Chrome an ‘Image (in viewport)’ in the ‘Load in layout-blocking phase’ is given a ‘High’ priority. Once one high priority image has downloaded, then the browser requests another.
The test has been designed to see how the server copes with H2 prioritisation when it is given a higher priority file to send, while already sending low priority assets. A server that prioritises correctly should divert resources from low priority assets, to high priority assets (so the high priority assets are loaded first).
So let’s take a look at a waterfall from good and bad prioritisations, and discuss what is happening:
Good - Fastly
So what exactly is happening in this waterfall? Requests 2 and 3 are warming up the TCP connection. Requests 4 - 33 are images requested with a priority of ‘lowest’. Keep in mind that the lower the request number, the earlier it was discovered and requested by the browser, e.g. the image at request number 4 was seen and requested before the image at request 20. If the server didn’t bother with any form of prioritisation and simply sent images on a first come, first served basis, images later in the waterfall would always be delivered last, since they would always be at the back of the queue. But what we are actually seeing is the server sees a request for the high priority images and it stops streaming the ‘lowest’ priority data in favour of ‘high’ priority data, so the high priority images are completed sooner.
Now if you compare this to a CDN with poor HTTP/2 prioritisation:
Bad - Level 3
Here we can see the server essentially ignoring the information it’s been given in the request about the high priority images. The two high priority images are added to the end of the download queue. The server continues to send low priority data and makes no allowances for the fact that by doing so, it is delivering a suboptimal experience to the user. You can actually see this in the waterfall by looking at the green ‘start render’ line: for good prioritisation this is at approximately 9 seconds, for bad prioritisation it is almost 21 seconds!
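The difference between the two waterfalls can be illustrated with a toy model. The sketch below is not how Fastly, Level 3, or any real HTTP/2 server schedules data. It assumes a single connection that sends one unit of data per tick, and the `simulate` function, request sizes, and arrival times are all made up for illustration. It compares a first come, first served queue with one that always serves the highest priority request that has arrived:

```javascript
// Toy model of a server sending one unit of data per tick on one connection.
// All names, sizes and arrival times here are illustrative assumptions.
// Request sizes must be greater than zero.
function simulate(requests, usePriority) {
  const queue = requests.map((r, index) => ({ ...r, index, remaining: r.size }));
  const finishedAt = {};
  let done = 0;
  let tick = 0;

  while (done < queue.length) {
    // Requests that have arrived and still have data left to send.
    const ready = queue.filter((r) => r.arrival <= tick && r.remaining > 0);
    tick += 1;
    if (ready.length === 0) continue; // nothing to send yet, connection is idle

    ready.sort((a, b) => {
      // With prioritisation on, higher priority always goes first.
      if (usePriority && a.priority !== b.priority) return b.priority - a.priority;
      // Otherwise fall back to arrival order, then original request order.
      if (a.arrival !== b.arrival) return a.arrival - b.arrival;
      return a.index - b.index;
    });

    const current = ready[0];
    current.remaining -= 1;
    if (current.remaining === 0) {
      finishedAt[current.id] = tick;
      done += 1;
    }
  }
  return finishedAt;
}

// Four low priority images requested up front, one high priority image later.
const requests = [
  { id: 'low-1', size: 10, arrival: 0, priority: 0 },
  { id: 'low-2', size: 10, arrival: 0, priority: 0 },
  { id: 'low-3', size: 10, arrival: 0, priority: 0 },
  { id: 'low-4', size: 10, arrival: 0, priority: 0 },
  { id: 'high-1', size: 10, arrival: 5, priority: 1 },
];

const fifo = simulate(requests, false);
const prioritised = simulate(requests, true);

console.log('first come, first served:', fifo['high-1']); // 50
console.log('prioritised:', prioritised['high-1']); // 15
```

In this model the high priority image finishes at tick 15 when the server honours the priority, and at tick 50 when it is simply appended to the queue. It is the same pattern as the start render line moving between the two tests, only with invented numbers.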
Large Time to First Byte (TTFB)
Time to First Byte is the time it takes from the browser requesting the first asset, to the time at which the server sends back the first byte of data to the browser. It consists of:
- Browser requesting the asset from the server (after DNS + Connect + SSL)
- Time taken for the packets to travel from the browser to the server
- Server receives the request. It then processes and constructs the response and transmits it back
- Time taken for the response packets to travel from the server to the browser
The time taken to travel from the browser to the server is known as network latency. Data travelling there and back again is known as a round trip, and the time it takes is the Round Trip Time (RTT). So how do you use WebPageTest to identify when you have a large TTFB?
The above test was run on a standard ‘Cable’ connection as defined by WebPageTest, so the connection has an RTT of 28ms. If you compare this value to the TTFB, which is 3.1 seconds, we can see there’s an issue here. The connect negotiation time (in orange on request 3) gives you an indication of how quickly packets can travel across the network (32 ms in this case), so it’s clear that the TTFB delay isn’t caused by congestion over the network. There is zero activity over the network according to the bandwidth graph, and fairly minimal activity happening on the CPU. So the device is waiting on the server to respond before it can do anything else.
In this case it’s the processing time on the server that is causing the delay. Whatever the server is doing to construct the response is taking approximately 3 seconds. That’s a huge amount of time to construct the HTML. There are far too many reasons to list as to why this could be happening on the server, but a good starting place would be to look at the database queries, hosting settings, server resources available, and the server software that is running. Whatever is causing the issue needs to be identified and fixed, as this will be having a huge impact on the site’s users. So if you see a WebPageTest waterfall that looks like this, examine your server setup and try to reduce this time. As Harry Roberts mentions in his article ‘Time to First Byte: What It Is and Why It Matters’:
While a good TTFB doesn’t necessarily mean you will have a fast website, a bad TTFB almost certainly guarantees a slow one.
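The reasoning used above (compare the wait for the first byte against the round trip time) can be sketched in a few lines. This is a rough illustration, not how WebPageTest calculates its metrics. The field names follow the browser’s `PerformanceNavigationTiming` entries, while the `breakdown` and `estimateServerTime` helpers and the sample values are assumptions made for the example. It also treats the 3.1 second TTFB from the test as the request-to-first-byte wait:

```javascript
// Split a timing entry into the phases drawn on the waterfall (values in ms).
// Field names follow the browser's PerformanceNavigationTiming entries.
function breakdown(entry) {
  // secureConnectionStart is 0 when no TLS handshake took place.
  const tlsStart =
    entry.secureConnectionStart > 0 ? entry.secureConnectionStart : entry.connectEnd;
  return {
    dns: entry.domainLookupEnd - entry.domainLookupStart,
    connect: tlsStart - entry.connectStart,
    tls: entry.connectEnd - tlsStart,
    wait: entry.responseStart - entry.requestStart,
  };
}

// Rough estimate only: whatever is left of the wait after one round trip
// is assumed to be time spent on the server building the response.
function estimateServerTime(waitMs, rttMs) {
  return Math.max(0, waitMs - rttMs);
}

// Hypothetical sample entry, not taken from a real test.
const sample = {
  domainLookupStart: 10,
  domainLookupEnd: 40,
  connectStart: 40,
  secureConnectionStart: 70,
  connectEnd: 150,
  requestStart: 150,
  responseStart: 3250,
};

const phases = breakdown(sample);
console.log(phases); // { dns: 30, connect: 30, tls: 80, wait: 3100 }

// Using the figures quoted above: a 3.1 second wait on a 28ms RTT connection.
console.log(estimateServerTime(3100, 28)); // 3072

// In a browser, the same breakdown can be applied to the current page.
if (typeof window !== 'undefined') {
  const [nav] = performance.getEntriesByType('navigation');
  if (nav) console.log(breakdown(nav));
}
```

With a 28ms round trip, almost all of the 3.1 seconds is left over, which points at the server rather than the network.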
Inactivity after SSL negotiation
This is quite an unusual waterfall. We see a large Time to First Byte on request 3, but we also see a huge gap with apparently no activity at all:
What is interesting about this waterfall is we can see the DNS + connect + SSL negotiation happen very quickly (34ms + 33ms + 80ms respectively), then there’s zero activity on the waterfall, CPU and bandwidth graphs over this time period. This indicates that the device is idle and waiting on work to do. Right towards the end of this period of inactivity we see the browser instigate two OCSP revocation checks. But according to the waterfall the SSL negotiation has completed by this point in time.
I can’t be 100% sure why this is happening (it’s an old test with no tcpdump log to examine the network activity), but if I were to guess I’d say that there is something unusual happening with the site’s certificate. It could be there’s an additional OCSP check happening that isn’t being displayed on the waterfall, or maybe the SSL negotiation hasn’t completed properly and the browser is trying to recover from the error. But given the fact that nothing is happening on either the CPU graph or the bandwidth graph, whatever is happening isn’t very work intensive on the device. It is worth noting that the site currently uses an Extended Validation (EV) certificate that Chrome will always conduct an OCSP check on. If anyone has any other ideas about what is happening in this waterfall, I’d love to hear them, so let me know.
Missing component times (DNS, Connect, TLS)
This should no longer be an issue on WebPageTest, but it is worth pointing out if you ever look at waterfall test results from iOS devices pre-iOS 10. iOS versions before 10 didn’t report any of the connection component times back to WebPageTest. This was a limitation of what iOS reported through the Safari developer tools interface. Here’s a test from 2016 on an iPhone 5c running iOS 9.3:
As you can see the information about DNS, Connect, and SSL are all missing from the waterfall. Note that although not reported individually, the times are included in the TTFB metric so it is still correct. Now compare this to a test run in 2020 on an iPhone 5c running iOS 10.3 (the last version supported on this device):
All iOS devices currently running on WebPageTest report the DNS, Connect, and TLS times correctly. All of them are running iOS >= 10.3. NOTE: There currently (Jan 2020) seems to be a rendering issue on WPT with the iPhone 6 and iPad Mini 2 that was introduced in iOS 12.2, whereby the test is rendered midway through the waterfall chart. But by looking at the two tests listed it is possible to see some component connection information rendered. So at least that isn’t broken…
The impact of a CPU bottleneck
This is an old test but it gives you an example as to what can happen once the CPU becomes a bottleneck. You can see the full WebPageTest run here. It is filled with lots of sporadic short bursts of activity across the whole page load. I’ve focussed on the first 20 requests in the diagram below as the full chart is far too long:
More to be added soon…
As I discover more common waterfall scenarios I will add them here. If you know of any common ones that are missing, please do let me know!
Hidden gems
So there are features that are sitting in plain sight on the WebPageTest UI that you may not have even noticed before. Here are a few that I’ve found useful:
How the filmstrip view and waterfall chart are related
Now it may seem obvious to some, but it is worth pointing out nonetheless. There is a direct relationship between the filmstrip view and the corresponding waterfall chart below it:
To the far left of the filmstrip you will see a 1px vertical red line. As you scroll the filmstrip horizontally you will see an identical red line moving across the waterfall chart. These are directly related. Together they show you what the page looked like at that exact point in the waterfall chart (you would not believe how long it took me to notice this feature!).
Another feature that can be seen is the use of a thick orange border around some of the images. This orange border signifies that something has changed on the screen (compared to the previous one). This is very useful if you are trying to identify even minor changes between screenshots (like an icon loading).
You can see both of these features in action in the screenshot. There is a thick orange border around the image at 0.9s, as it shows a major change to the page compared to the image at 0.8s. Looking closer at the waterfall we can see the red line is approaching the vertical green line (start render). The image at 0.9s is actually the start render time for the page.
How to generate customised waterfall chart images
Almost all of the waterfall images in the article you see above have used this feature that is tucked away at the bottom of every waterfall chart. WebPageTest gives you the ability to customise what is included in a waterfall chart:
Clicking the ‘customize waterfall’ link directs you to a set of customisation options (see image below):
As you can see in the image above, I have customised the output image. We have:
- Set a custom image width
- Set a custom time that the whole waterfall chart covers (note this will lead to cropping of requests further down the waterfall)
- Only selected certain requests to be shown either individually or as a range (notice the ellipsis between these that identify missing items)
- Unchecked ‘Show CPU Utilization’ so the ‘CPU Utilisation’ and ‘Browser main thread’ graphs are hidden
All these options allow you to remove much of the noise in the chart so you can focus on the areas you wish to identify and highlight. Once you are happy with the image, simply right click and save the image to your device as you would any other image.
How to add custom marks using the User Timing API
Here’s a useful feature you may not know about. Using the User Timing API you have the ability to mark certain points in your page load. These points will be discovered by WebPageTest and displayed accordingly in the results.
For example, if you wanted to know when the browser got to the end of the <head> tag, you could add the following code right before the closing tag:
<head>
  <!-- head stuff here... -->
  <script>window.performance.mark('mark_head_parsed');</script>
</head>
The browser then sets a mark at this point which can be read by WebPageTest. The resulting WebPageTest run will show you results similar to this:
As you can see from the image above, I have set four User Timing marks on the page, and one of them is called mark_head_parsed. You can add as many marks as you need; it really depends on what you are trying to measure. Now if you head over to the ‘customize waterfall’ link (as mentioned in the section above), you will see one or more purple triangles and vertical lines on the waterfall chart. These are the User Timing marks we just set:
Note: If you happen to be running a private instance of WebPageTest, you can configure it to display the User Timing marks by default on the interactive waterfall chart.
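If you want to check which marks a page has set before running a test, the same API can read them back. Here is a minimal sketch using standard User Timing calls. `mark_hero_visible` and `head_to_hero` are hypothetical names used only for illustration, and the measure is simply read back in script here:

```javascript
// Runs in browsers, and in recent Node.js versions where `performance` is a global.
performance.mark('mark_head_parsed');

// 'mark_hero_visible' and 'head_to_hero' are hypothetical names for this example.
performance.mark('mark_hero_visible');

// A measure records the time between two existing marks.
performance.measure('head_to_hero', 'mark_head_parsed', 'mark_hero_visible');

// Read back everything that has been recorded so far.
const markNames = performance.getEntriesByType('mark').map((entry) => entry.name);
const [headToHero] = performance.getEntriesByName('head_to_hero', 'measure');

console.log(markNames);
console.log(headToHero.duration);
```

Pasting the last four lines into the browser console on your own page is a quick way to confirm the marks exist before you go looking for them on the waterfall chart.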
How to display the bandwidth graph on mobile devices
When you run a test on one of the real mobile devices that are sitting in Pat Meenan’s basement in Dulles, VA, by default you don’t get a bandwidth chart like you do on desktop agents, resulting in an output that looks like this:
If you do want to capture this data and display the graph in the resulting test, make sure you check ‘Capture network packet trace (tcpdump)’ under the ‘Advanced’ tab when you configure your test:
This will give you a slightly different version of the ‘Bandwidth In’ graph than you see on desktop agents:
It now gives you information about the data wasted due to the retransmission of packets over the network (although not actually seen on the example graph). An example with ‘Duplicate (wasted) Data’ can be seen here along with a little more information.
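If you start tests through the WebPageTest HTTP API rather than the web form, the equivalent setting is, as far as I’m aware, the `tcpdump=1` parameter on `runtest.php`. The sketch below only builds the request URL. Treat the parameter names as assumptions to verify against the current WebPageTest API documentation; `buildTestUrl` is a hypothetical helper, and the API key and location values are placeholders:

```javascript
// Build a WebPageTest API request URL with packet capture switched on.
// Parameter names are assumptions to check against the WebPageTest API docs.
function buildTestUrl(pageUrl, apiKey, location) {
  const params = new URLSearchParams({
    url: pageUrl, // page to test
    k: apiKey, // your API key
    location, // test agent, e.g. one of the real mobile devices
    f: 'json', // ask for a JSON response
    tcpdump: '1', // capture a network packet trace
  });
  return `https://www.webpagetest.org/runtest.php?${params.toString()}`;
}

// Placeholder values only: substitute your own key and a valid location ID.
const testUrl = buildTestUrl('https://example.com/', 'YOUR_API_KEY', 'YOUR_LOCATION_ID');
console.log(testUrl);
```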
So there you have it, a brain dump into a blog post about some of the aspects of WebPageTest that I’ve found a little mysterious. As this has been a learning exercise for me too, if there is anything I have misinterpreted, please do tweet me and let me know!
- 11/10/19: Added waterfall for ‘Firefox enhanced tracking protection’ scenario.
- 12/10/19: Added ‘service worker precaching’ scenario.
- 13/10/19: Added ‘Chrome Stair-Step’ scenario.
- 14/10/19: Added the
- 15/10/19: Added note about OCSP stapling and EV certificates (thanks Ryan Townsend for flagging).
- 16/10/19: Added information about download chunks (thanks Pat Meenan for raising and checking). Clarified the error code presentation and added a “Without OCSP” waterfall chart for comparison (thanks Andy Davies).
- 17/10/19: Added the ‘HTML early flush’ scenario.
- 18/10/19: Added
- 20/10/19: Added prerender scenario (thanks Simon Hearne & Ryan Townsend for the input).
- 24/10/19: Added new section called ‘Hidden Gems’, including ‘How the filmstrip view and waterfall chart are related’, ‘How to generate customised waterfall chart images’, and ‘How to add custom marks using the User Timing API’ (Thanks again to Pat Meenan for clarification with this).
- 08/12/19: Added the Long DOM content loaded (DCL) time scenario. Originally written for my article ‘Reading a WebPageTest Waterfall Chart’ on the Web Performance Calendar 2019.
- 12/12/19: Added the never-ending waterfall scenario (thanks to Boris Schapira for the example).
- 20/12/19: Added a table of contents for easier navigation.
- 03/01/20: Added information about good and bad HTTP/2 prioritisation (thanks Barry Pollard and Šime Vidas for feedback), and ‘How to display the bandwidth graph on mobile devices’ hidden gem (Thanks Pat Meenan & Barry Pollard).
- 11/01/20: Added the ‘Large Time to First Byte (TTFB)’ scenario.
- 12/01/20: Added the ‘Inactivity after SSL negotiation’ scenario.
- 13/01/20: Added the ‘Missing component times (DNS, Connect, TLS)’ scenario (thanks again Pat Meenan for his input), and ‘The impact of a CPU bottleneck’ scenario.