CSS and the First Meaningful Paint

Patrick Hamann speaking at London CSS in April, 2017

About this talk

In this talk we will take a journey exploring the current, past, and future best-practices for loading CSS in the browser and how we can achieve a first meaningful paint within 1000ms. Ultimately creating a faster, more resilient experience for our users.


Transcript


Patrick: Hello. Thank you for inviting me, and thank you to Pusher for hosting. I've been really impressed; Pusher have been doing some amazing work for the London developer community recently, which is really good. It's my first time at London CSS, so thank you for having me. It seems like a really nice community, and I'll definitely be coming back. My name's Patrick, you can catch me on Twitter there. Please come and chat to me afterwards. I'm not a scary monster, I love talking to people, I love finding out what you do, so just come and hang out or come and [00:00:30] shoot me some questions on Twitter. I work for Fastly. We're an edge cloud provider. What does that mean? We're a CDN; we specialise in real-time content delivery. My role there is as a web performance engineer, where I get a lot of time to think and do research on how to make our customers' websites even faster. That's what I'm hopefully here to talk to you about today, and to share some of the research that I've been doing recently.

Why am [00:01:00] I really here? And what does that loaded title of "CSS and the first meaningful paint" even mean? You're probably wondering what "first meaningful paint" even means. Hopefully all will become clear. But to start with, I want to ask you a question: how fast is your website? It might be your own personal one, or the one you're working on for a client at the moment, or your company's site. But what does being fast even mean? [00:01:30] Is it how long the website takes to load, that final load event? Or is it how long it takes a user to perform an action on that site? Because eventually, they came here to do something; they came to your site for a reason, didn't they? So maybe speed should be about the perception of people performing actions on your site.

For years, we've been searching for the golden performance metric, the one that we should all be optimising for, the metric that beats all others, so we could go back to the business and say, "Yes, I have a load event of five seconds." But [00:02:00] again, I want to ask: does that even exist? Or more importantly, should we even have a single metric that we measure our sites by? I would argue that we shouldn't. I can't stress this enough: ultimately, we are building websites for our users, or the users of our customers or of our clients' websites, not for ourselves. Do you know how your users are actually interacting with your site, and how they perceive the loading of your website? Can you reproduce [00:02:30] the experience that they are having in the real world, not just on your computer or on a test device, so that you can actually feel how users in the real world are interacting with your websites?

Often, performance metrics track how our pages are built. Time to first byte used to be the golden metric, then document complete, then the load event, but none of these metrics actually correlate to a nice user experience. Again, that's why your users are coming here: they're coming here to do something. [00:03:00] Luckily, the times are changing, and there's a new collection of metrics starting to arrive that focus purely on user experience, and we're starting to agree in the industry that these are the things we should be optimising for: start render, SpeedIndex, and the first meaningful paint, which is what I'm here to talk to you about today.
Time to interactivity, especially; that's really important if you have JavaScript-heavy websites. And, I can't stress this bottom one enough: custom metrics. You should be thinking about your business's needs and your clients' needs: what did they [00:03:30] actually come here for? Maybe on a search page that's the search results. Why not measure the time for the search results to appear on the page, and make that what you're optimising for? The time for someone to add an item to their shopping cart, if you're an ecommerce site, or the time for someone to be able to view the article, when that's why they came to the newspaper website. So, start thinking about what your custom metrics are. But today we're going to be focusing on the first meaningful paint.

So what does the first meaningful paint even mean? Hopefully [00:04:00] we're about to find out. To put it simply, first meaningful paint is the time when a page's primary content, the thing the user came here for, appeared on the screen. It's very easy for us to quickly render just some background colours or an image, but that's not what a user came here for; they came here to read the news, for instance. In more detail, it's the first paint after the biggest above-the-fold layout change. We only care about that first viewport: think about mobile, it's whatever is in the first viewport, and we don't care about anything below the fold. [00:04:30] It's the biggest above-the-fold layout change that happens, the one where the most pixels were painted. And most importantly, because we all love web fonts these days, it's when the web fonts have loaded. Web fonts are render-blocking resources, and by default in most browsers they will block for between three and five seconds whilst you're waiting for that download. Even though the text is all there, we normally don't paint it until the web fonts come in. We have to add that heuristic into the time to first meaningful paint [00:05:00] metric.

But this is probably better explained with a visual representation. I want to start by playing a game. I want you all to look at this timeline. This is the Financial Times's homepage loading, and I want you to have a think, and keep it to yourself: where do you think the time to first meaningful paint is on this timeline? Is it 3.5 seconds? Is it 4.5? Is it 5.5? Remember your [00:05:30] answer; we're going to come back to it in a second.

So, first meaningful paint was actually coined by some very clever people at Google last year. It's a very, very new metric that's not actually exposed in browsers yet. They wrote a white paper in April which you can find here; I'll share the slides with these links later. In it they detailed how they were fed up of the industry using metrics such as time to first byte, the load event and start render as indicators of page performance, and asked how [00:06:00] we could go about thinking of the user's perception, how we could measure user perception, and they detailed their methodology in this white paper. So, again, it's probably better explained visually. The paper states that the time to first meaningful paint is when the biggest number of layout objects in the above-the-fold viewport that have never been laid out before, i.e. we've never painted them, are painted to the screen. On the top here, this is the number of layout objects Chrome is building, and a layout object you can roughly translate to [00:06:30] an HTML element like a div.
It's the x and y coordinates of where that should be painted, plus its width and its height. That is a layout object. So when the browser's parsing your HTML and CSS, it has to perform layout to know where to paint things. Here, we can see Google did something very clever with their search results: they flush the header way before they even send the search query back to the databases to find the data. So you get very, very fast rendering. You can test this on your phone now; they've been doing it for years. It's very clever. [00:07:00] It means you get the header instantly, but it's not until about 1.9 seconds here that the search results came in. That correlates to the time to first meaningful paint, because that is when the largest number of objects were painted to the screen. You can see that this is a really good indication of user perception. That's why people like myself get really excited: we now have a metric that is much more useful than, say, first render or time to first byte.

So, now that we have that newfound knowledge [00:07:30] of how we go about ascertaining the time to first meaningful paint, has that changed your answer for this one? Just shout out: where do people think the time to first meaningful paint is here?

Speaker 2: [inaudible 00:07:44]

Patrick: Four? Five? Who said five?

Speaker 2: [inaudible 00:07:50]

Patrick: Bingo. The answer is five.

Speaker 2: [inaudible 00:07:53]

Patrick: Why? Because even though we've got the text, [00:08:00] which is actually what the user came here for, we also have the hamburger menu, the logo, and all the icons and images loading as well. It's the point where the largest number of layout objects finally came in.

You're probably wondering how you can measure TTFMP yourself. The only way of going about this right now is using Lighthouse, which is Google's open-source performance auditing tool. They actually made it to measure progressive web apps, but it just so happens that it has very useful auditing of performance metrics too. [00:08:30] You can install it as a browser plugin if you go there, or as a command-line interface, a CLI, so you can automate that process. It exposes the TTFMP metric for a given URL. The good news is that they want to push this further, and they're actually going to start exposing it as a JavaScript object, so we will soon be able to programmatically get access to metrics like this that are currently only exposed via Chrome's tracing. But this is the only way you can get it at the moment. But we'll see ... I [00:09:00] lie. We're going to see another way later on.

So. Let's dive right in. Now that we know what TTFMP is, how can we optimise for it to create a much better user experience? To do this, we're going to apply the optimisations to a real website, one step at a time, looking at the present and future best practices for loading your CSS and your assets in the browser, to try and optimise for that first meaningful paint. It's really easy to create [00:09:30] simple, static lab sites to prove how these things work, like Hacker News or TodoMVC, but I kind of disagree with those, because they're static sites and they're not the sites that we're building on a daily basis. I actually prefer to use real websites as an example of how we do this, because at the end of the day we are all making real websites.
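As a rough sketch of that CLI route, an automated run might look something like this; the flags shown are just one possible invocation and the output path is made up:

    npm install -g lighthouse
    lighthouse https://www.ft.com --output=json --output-path=./ft-homepage.report.json

The JSON report includes the first meaningful paint audit alongside the critical request chains that come up later, so you can keep an eye on the numbers over time.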
For the purposes of the talk and the research that I did, we're going to be using the FT.com homepage, and we're going to be applying optimisations to it. I must add a disclaimer: [00:10:00] I used to work for the Financial Times before I joined Fastly, so it's also familiar to me. I know how that page was built because I helped build it, so I can talk a bit more about it. The FT have given me permission to do this; I'm not bashing them in any way. It's actually a very fast and well-developed website, but it's a good baseline to show you. So that's the disclaimer over.

Each optimisation is going to be tested using WebPageTest. Who here has heard of WebPageTest? If you haven't, I urge you to go and [00:10:30] test it out: webpagetest.org. It's like the number one tool in the performance ninja's toolbox. The reason why I've chosen it is that it allows you to test on physical devices. When you choose this option, Moto G, there is, at the end of that, a real device sitting in a box somewhere in a server farm, and it's going to run the test on it. I can't stress this enough: you should be testing on real, physical devices, and not just your shiny iPhone or your Google Pixel [00:11:00] that, you know, we're all privileged enough to have. Actually, the real average device globally is something like this: a low-to-mid-end stock Android phone with a pretty terrible CPU and very bad memory performance. This is what we should be testing on, not on our MacBook Pros and our cable connections. WebPageTest also allows us to shape the network connection to what 3G in an emerging market is actually like. This should be the baseline profile that [00:11:30] you are testing your websites against.

To work out what our baseline should be, what we should be testing against, I want you to ask yourself these questions. What is your average user profile? Where are your users based for the website you're building at the moment? What's the device landscape, what type of devices are they using? And the most important one: in what context are they using your site? I highly doubt that there's anyone in the room building a website that has a single context; all of us have multiple contexts. In fact, [00:12:00] if we take FT.com, I myself have two contexts in which I read it. I read it on my commute on the way in, on my mobile device, on my flaky Virgin network as we go in and out of tube tunnels, and then I read it at lunchtime, when I have a very fast connection on my laptop. So even a single user can have multiple contexts. What is their network profile? I've just described two of those network profiles. We may think that we have amazing 4G in this country, but as soon as you go out of London, or you're walking around in a field, or [00:12:30] even if you're going in and out of tunnels, you get a flaky connection, what we like to call ["lie-fi" 00:12:34]. And why did they come to your website? What is the primary action that they've come to do?

DoubleClick for Publishers, Google's largest advertising network for publishers, did some amazing research ... I should have a link for it here but I don't. Again, it's another white paper, in which they detailed average load times globally, so not just in the Western world, [00:13:00] and they found the average load time for a mobile site globally, over a 3G connection, is 19 seconds. I don't know about you, but I find that astonishing.
They also found that 53% of mobile visits are abandoned if the page takes longer than 3 seconds to load. If you put those two together, it's a bit weird: we've got 19-second page loads, but people are abandoning them after 3 seconds. You can probably work out what's going on there: a lot of people are going to be leaving your site. [00:13:30] We are told on a daily basis that the future is here, but as hopefully you'll know, the future is not yet evenly distributed. The 3G and 2G profile that we have in the US and the EU and Australia and southeast Asia is very different to that of Africa, South America, Russia, and India. OpenSignal do open-source research showing you the overall coverage of 2G, 3G and 4G connections. [00:14:00] You can see that if the future is here, it definitely is not evenly distributed. As the next billion come online, as they are doing, and start to use your products, this, I must stress again, is the network profile that you should probably be basing your tests on, not that one.

So taking all of that into consideration, today we're going to focus on achieving three seconds to our time to first meaningful paint on an emerging-market profile, and one second on cable. [00:14:30] Again, I can't stress it enough: don't take the budgets that I've defined as the golden baseline. You need to go home and think about your user profiles, and then define your own custom budgets that you then test against.

Our test methodology, as I said, is going to be WebPageTest. We're going to use a Moto G to do it. We're going to do nine runs on WebPageTest and take the median of those. And here are the network profiles that we're going to run each test against. The really important thing to note is that the emerging-market 3G profile [00:15:00] has a 400 millisecond round trip time; that's the time it takes for a request to go from your client, the mobile device, to the server, and back again. That's the round trip time. A single HTML file, probably like the FT.com homepage, takes about four round trips to download, so if you do the maths, we're already at around one second, and there's nothing we can do about that. We'll only ever be able to get a one-second baseline on 3G in an [00:15:30] emerging market. 28 milliseconds is roughly how long it takes a packet to travel between London and New York on a good internet connection. The internet is very fast when it can be, but it can also be very, very slow.

So that we can measure the impact of each optimisation that we apply, we need to form a baseline. Hopefully this looks familiar to most of you. This is how we've been building webpages at least since cascading style sheets came about, so from about [00:16:00] 2000 onwards: we have an HTML document, and we reference our CSS in the head of the document using a link element. We ship that into production, and that's, you know, how we've been building our websites. If we were to run that in WebPageTest, this is the waterfall network graph that we'd get. Hands up, who's seen a waterfall before? You might see them in your network panel in dev tools as well, so for those of you who haven't, it's called a waterfall graph. On the X axis is time, [00:16:30] and on the Y axis are the requests in the order that they happened. We loaded our index.html file, we had to open that TCP connection, we waited for a bit, and there are two segments to each request. There's the light shaded one and the dark shaded one.
The dark shaded one is when the bytes are actually being downloaded, and the light one is the idle waiting time whilst we wait for the response, the time to first byte. Then WebPageTest does this really nice thing where it colour-codes the [inaudible 00:16:56] types, so we can see that HTML is blue, CSS is green, images are purple, [00:17:00] JavaScript's orange, and fonts are red. That's as simple as it is; that's how you read a waterfall chart. The important thing here is we've got a start render, which is the green one. It's at least a good indication of when we've started painting, though obviously not of when the meaningful paint happened. Running that test against our methodology, we get these results. We're way above our budget here on the emerging markets. We've got ... this block should not be here. [00:17:30] Sorry, that one's got the improvement on it. It results in an 8,000 millisecond time to first meaningful paint.

How can we optimise this? Our first experiment is going to be to inline the critical CSS, only that required to render the above-the-fold viewport, into our document. Many of you might have heard of this technique. I personally have been speaking about and advocating it since about 2014, when I applied it to theguardian.com, but let's go and have a look at [00:18:00] how we go about it. First, to understand this, we have to look at the critical rendering path of our page. This is the single path of steps the browser must take from when a user clicks on the link all the way through to when we render the page.

First we have to fetch the index.html file. We wait for the response, so we've got idle time on the main thread here. We then get the response, and HTML can be parsed incrementally. This is one of the most [00:18:30] amazing and often overlooked things about the HTML specification: we don't have to wait for all of the HTML file to download before we can start processing it. We can stream it in and start building the DOM as we go; if you've ever heard the term, HTML is parsed incrementally. We don't have to wait for the full response, we can start building the Document Object Model. At that point, we find that link element in our template, and the browser goes off to perform the networking for it. We have to wait for that. CSS is known as a render-blocking resource, you might have heard that before, because [00:19:00] if we didn't wait for the CSS and just continued building up the DOM, painting black text on a white screen, then as the CSS came back we'd cause loads of flashes of restyling. That's a really terrible user experience, so we force CSS to be render-blocking. We have to wait for all of it to be downloaded. CSS can't be parsed incrementally, unlike HTML, so we have to wait for all of the bytes of a CSS file to be downloaded before we can render the page. That's because [00:19:30] they're cascading style sheets: if we had one rule at the top of our CSS file that says "paint this red" and we painted it straight away, then as we parsed further down, something might override that in the cascade and say "paint this blue", and you'd have a horrible flashing disco on your screen. For that reason, we don't parse CSS incrementally. Once we've got all of that, we have enough information to paint to the screen. But note this idle time here. We've wasted a lot of time on the browser's [00:20:00] thread whilst we've waited for the four round trips for this. So make a note of that.
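For reference, the baseline document being described is essentially this shape; the file name is illustrative rather than FT.com's real asset path:

    <!doctype html>
    <html>
      <head>
        <!-- Render-blocking: nothing paints until all of main.css
             has been downloaded and parsed -->
        <link rel="stylesheet" href="/main.css">
      </head>
      <body>
        <!-- page content -->
      </body>
    </html>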
The other thing we've done here is we've created a single point of failure. What if I clicked on FT.com, got all the HTML, and then went into a train tunnel? At that point, my browser would have requested the CSS, but I've created a single point of failure. I have all the content for the article that I wanted to read, but I can't paint it to the screen [00:20:30] because I'm blocked waiting for that CSS file, and I'm in a tunnel.

The idea is: what if we were to inline, in a style block, all of the styles needed to paint our above-the-fold viewport into our document? We declare the rest of the CSS as asynchronous, we make the browser think that there is no more CSS, and we've got rid of that idle time. Now we've completely eliminated a round trip, and the single point of failure, and we've given the browser [00:21:00] all the information it needs to be able to paint to the screen just from inside that document. So, we need to start thinking about what our critical styles are. Are they the sharing buttons that we have on the article page? Is it the opinion block that we have down at the bottom of the [inaudible 00:21:16] homepage? No. The only styles that we should be inlining are the ones that are critical to that top viewport. Sorry for the content selfie-bomb of Theresa May. [00:21:30] That's what this looks like: we have a style element in our template with some CSS. This goes against everything that we've been taught as front-end developers about separation of concerns: styles in our CSS, behaviour in JavaScript, and content in HTML. And then we have another CSS declaration, but this time with rel="preload", which we're going to talk about later. This hides it from the browser, saying, "This is asynchronous, you're not going to block on this," and to do that, I'm using the Filament Group's loadCSS [00:22:00] function. I urge you to go and check that out. It's there because older browsers don't understand this link, so you have to polyfill it. And if there's no JavaScript at all, we just have, inside a noscript element, a normal link [inaudible 00:22:11].

A lot of people ask me how much critical CSS they can put into the document. Obviously, it would actually be detrimental if I inlined all of the CSS. To answer this, we have to understand a slightly lower level of our networking stack. When your [00:22:30] browser makes that network request to fetch your FT.com homepage, what's actually happening is it's creating a TCP socket connection underneath. TCP has this functionality called slow start: TCP slow start, or the congestion avoidance algorithm. It exists because when the client and the server are creating that connection, neither of them knows whether the network link underneath is congested. Your flatmate might be watching loads of YouTube, [00:23:00] and actually you don't have much bandwidth. To avoid packets being dropped, you start with a very small congestion window, so we only allow the server to send us 10 packets, and 10 packets is worth about 16 kilobytes. Once the browser gets those back, it acknowledges them and the window gets bigger and bigger and bigger. This is why your internet will actually get faster when you're watching videos, because the congestion window has had a long time to grow, but on websites, when we're just downloading [00:23:30] small files, we never really surpass a congestion window of 40. So, the answer to the question of "How much critical CSS can I put in?"
is that you want to give the browser enough information within the first round trip, so inside our initial congestion window, which means we can only put a maximum of about 16 KB of gzipped CSS inside that critical CSS. If you've ever wondered, that's the science behind it.

So, looking at our baseline [00:24:00] from before, here is our start render; we had to block waiting for main.css. Now, if we apply our inlining, there are two things to note here. The first is that we've started rendering whilst we're still downloading HTML bytes. This proves that HTML is parsed incrementally: it's found all those styles inline, and it's just started painting. It has everything it needs. The other interesting thing is that it's obviously not blocking on the main CSS file, which proves that our secondary, non-critical CSS is now asynchronous. [00:24:30] So now, comparing that to our baseline, we had a 63% improvement on our time to first meaningful paint on 3G in emerging markets, which I think is incredible. And we're getting close down at the bottom too: 1,300 milliseconds on cable.

This comes with some cons, though. The eagle-eyed in this room will probably note that I've made my non-critical CSS asynchronous, but what happens when that does load in? That's going to cause a massive [00:25:00] reflow or repaint as the browser reads and parses all that information, so it's really important that anything in the above-the-fold viewport goes into the critical CSS, because you don't want to cause a reflow and a bad user experience when the rest loads in. We're also making it not cacheable, because by putting it inside the HTML file, we don't have a CSS asset that the browser can cache. We're not going to benefit from browser caching now, and every time we update our critical CSS, we're going to have to invalidate all of the [00:25:30] templates that we're rendering on those pages, because they have the CSS in them. And at scale, and even not at scale, on small websites, it's quite hard to maintain, and even harder to automate. Every time that I've implemented it, I still haven't used any of the automation tools to do it, because I don't really trust them. You kind of have to do it by hand. I'd love to talk to people afterwards if you have ideas about that. But the pros are amazing. There are no blocking resources, [00:26:00] you get instant painting, and you give the browser everything it needs within that first round trip.

Now that we've been able to get our CSS down as soon as possible, how can we prioritise the other resources on the page required for that time to first meaningful paint? Specifically, as we learnt at the beginning, web fonts, because they're part of the TTFMP heuristic. Again, I want to ask you a question: think about the website that you're building now. What are the critical resources [00:26:30] on that website that are required for the time to first meaningful paint? And how many of those resources do you have? So, if we're looking at the FT.com webpage, do we consider the logo a critical resource? Is it the web fonts? Yes, because we talked about that. Is it our lovely image of Theresa May, the hero image? Fortunately, you don't have to work it out yourself. In Lighthouse, Google have done this for you, and we call these the critical request chains: [00:27:00] how many resources are required to get to that time to first meaningful paint? In their audit, they've shown that the FT.com homepage has a critical request chain of five.
You'll note that these are the fonts and the logo, and not the hero image of Theresa May. This is our critical request chain. If you take away one thing from my talk tonight, it is that to get a fast time to first meaningful paint, you need to be optimising the delivery of these assets. There should be nothing blocking [00:27:30] them, and there should be nothing coming before them in the waterfall. These are the things you care about if you want a fast-painting user experience. You should also be trying to reduce the length of the chain. I would say to the FT that they don't need three custom web fonts; they can probably get away with one, and personally I would just say get rid of the web fonts completely and you'll have very fast painting. Could you inline the logo? Just by eliminating items from your critical request chains, you're instantly going to be speeding up [00:28:00] your time to first meaningful paint.

Here's our inlined waterfall from before. Now that we know that fonts are in our critical request chain, where are they on the waterfall? They're right down at the bottom here. We haven't optimised for their delivery; in fact, for our time to first meaningful paint we're going to have to wait for all of these, which are some arrows and a play and a pause icon, before our most critical resources get loaded [00:28:30] in.

To understand why they're coming in so late, we need to take another little detour and understand how browsers go about painting to the screen. I personally find this kind of thing fascinating, and I think the more I've learned about the internals of browsers and how they go about rendering, the better a front-end developer it's made me, and it makes me think differently about how I go about optimising. We have the network request; the user clicked on that link. As we learned earlier, [00:29:00] it starts to get the bytes, starting from the HTML down, and HTML is parsed incrementally, so we can start building the Document Object Model. Most of you probably know this if you write a bit of JavaScript: you're interacting with the DOM, we've all heard that term before. The DOM is a tree structure, so there is a parent-child relationship inside that HTML. We have the body tag, that's the parent, and then any element below that goes into the tree as the next leaf down. If you think of an upside-down tree, [00:29:30] these are the leaves, which might have more children underneath them; that div might have some paragraphs. That's what the Document Object Model is: a tree-like representation of our HTML. The way the browser goes about building it is it converts the HTML bytes to strings, so from bytes into UTF-8 strings. Then it splits those strings up into tokens, in what's called the tokenization phase, saying, "I have a start body tag, and that's one token," then it'll have some children, and eventually [00:30:00] right at the end you have the end body tag, and that's another token. Those tokens then get parsed into nodes, and then put into the tree.

But as we learned earlier, the DOM construction actually has to stop when we find a critical render-blocking resource such as a link element for a CSS file. We have to stop the DOM construction, go back to the network, and go and fetch that CSS, because we can't continue creating the DOM [00:30:30] asynchronously; we'd paint black-on-white text and it'd be a horrible experience. So we have to block waiting for this.
Again, we learned that we have to wait for all of the bytes, and then it creates the CSS Object Model. This is often overlooked. It's just like the Document Object Model, a tree-like structure, again containing all the nodes, but it holds all the styling information required for them. Then the DOM construction finds a script element: our lovely friend and foe JavaScript, the thing that we all think we love but actually, deep down, we [00:31:00] sometimes hate. JavaScript is also render-blocking. That's why we're always told to declare it as asynchronous, to tell the browser we don't need it straight away. And why is this? It's for two reasons. JavaScript is blocked by the construction of the CSS Object Model, because if we were to execute that script, the JavaScript might query the style of an element; in JavaScript you can say, "How big is this element?" We can't execute it straight away if we don't have all of that information. So the first thing is that JavaScript is blocked by [00:31:30] CSS, because you need to have a full CSS Object Model to be able to query it. And if we were to parse and execute the script asynchronously, then depending on your network conditions and how much processing it has to do, say you were to call document.appendChild, you wouldn't actually know where that element was going to be injected. The element might be injected at different locations depending on the device and the network characteristics, so you'd have a horrible race condition, [00:32:00] especially as a developer, if you were relying on the location of that element. So for those reasons, JavaScript execution is blocked by CSSOM construction, and DOM construction is blocked by JavaScript execution, because JavaScript can alter the DOM. Basically, JavaScript's terrible, don't go anywhere near it, just build static websites is my advice.

Once we've constructed the DOM and the CSSOM, these two trees come together to form the render tree. Some of your elements might be hidden, they might [00:32:30] be display:none, so they are taken out, and you're left with a tree containing only the things that are visible to be painted. Why have I told you this, why have I gone off on this whole tangent? It's because of fonts. It's at this point that fonts are found by the browser, because you only reference them inside CSS, and it's only when we want to lay out and paint text that we find that declaration. At this point, we've wasted a lot of network time. [00:33:00] Fonts are found very late; they're known as a hidden subresource. Unfortunately they are hidden from the browser, and because the browser finds them this late in the game, it can't go and optimise the networking for them. I, and many other people, argue that that is far too late.

Here's a better representation of [inaudible 00:33:21] we get. The bytes get turned into characters and strings, then we turn them into tokens, and then our DOM structure. The reason why I'm showing you this [00:33:30] is that browsers have an amazing performance optimisation that we don't need to do anything to get, and it's called the preload scanner, or the speculative parser.
What this does is, even though DOM construction is blocked by JavaScript and CSS whilst we wait, the preload scanner carries on parsing the document and building these tokens, and it looks for other critical resources such as images and script [00:34:00] elements, and then it can go and perform the networking for those. It's basically queueing all of these things up. We don't have to do anything; the browsers do it for us. Google found that just by introducing the preload scanner, they improved all webpages globally by about 20% in terms of loading speed. Now, that's awesome, but as we know, fonts are hidden, because the tokens don't have the content of the CSS. The preload scanner can't find our fonts. They're still hidden.

[00:34:30] Fortunately, the Web Performance Working Group at the W3C have solved this problem by creating the preload API: rel="preload" for the link element. It gives us a way, a primitive, of declaring to the browser, "These are the hidden subresources such as fonts, and I want you to go and do the downloading. You are going to want these; these are in my critical request chain." So it provides a declarative fetch primitive that initiates an early fetch and separates fetching [00:35:00] from resource execution. Because it doesn't execute the JavaScript or CSS, it's safe for the browser to go and do the fetching, and it's us as the authors, as the web developers and designers, saying, "Browser, these are my critical resources. I have identified them and I know what they are. You need to go and fetch them now, before anything else." Which I think is awesome.

And this is how we do it; we've now got three new primitives. The first is just a link element with rel="preload". We can also do it dynamically from script, by creating and injecting a link [00:35:30] element. Imagine you have an image carousel where you have to click on a button to expand it. As the user hovers over that expand button, you could inject preload link elements for all of the images inside the carousel, so when the user does click on it, those images paint instantly. My favourite way, and the way that we're going to apply this optimisation to the FT.com homepage, is the often-overlooked Link header: a Link HTTP header on the response of your [00:36:00] document. Here, you're saying, "I want you to preload these assets." This is how we're going to apply the optimisation to the FT.com homepage. These are the HTTP response headers for when we request FT.com. Here you can see those five, well, there are actually only four there, critical resources that we identified in our request chain in Lighthouse. We're now declaring them. There are two important things to note here. The first is that the order in which you declare them [00:36:30] is the order in which the browser's going to download them, so this dictates priority. And fonts, for some crazy reason: many years ago, when the font specification was being written, someone thought that we should always treat fonts as a cross-origin resource, even if they're being served from the same origin. It's madness. So we have to declare fonts as crossorigin, or else the preloads won't be used. Take note of the nopush directive that I've put there as well; we're going to talk about that later. So let's look at our waterfall from before [00:37:00] applying the preload. The fonts are very low down in the network waterfall.
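For illustration, the kind of Link headers being described might look roughly like this; the asset paths are made up rather than FT.com's real ones:

    HTTP/1.1 200 OK
    Content-Type: text/html; charset=utf-8
    Link: </fonts/display.woff2>; rel=preload; as=font; crossorigin; nopush
    Link: </fonts/body.woff2>; rel=preload; as=font; crossorigin; nopush
    Link: </img/logo.svg>; rel=preload; as=image; nopush

The order of the Link values is the order the browser will fetch them in, the as attribute tells it what kind of resource each one is, and nopush stops an HTTP/2 server from also pushing them, which comes up in a minute.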
Just by adding those headers in, we've re-prioritised the delivery of those assets and told the browser, "These are the critical ones." So what's that done? Oh, sorry. The only bad news about this is that browser support for rel=preload at the moment is only about 50%. I still think that's enough, seeing as Chrome and Opera have shipped it, and the good news is that Firefox and Microsoft Edge have [00:37:30] both expressed their intent to implement it. So what's that done to our results? We've now got a 64% improvement on our 3G in emerging markets. We're very, very close now to our target of three seconds to first meaningful paint, and we've basically made it on cable: we're now at 1,000 milliseconds, just by introducing preload. With great power comes great responsibility, though: it's really easy to create contention on the network. You [00:38:00] could just list all of the resources on your page, all of the images, but what you'd actually be doing there is congesting the network. It's there to give us a way of indicating hidden resources.

I need to rush forward because I've just looked at the time and I'm going quite slowly this evening, which is really bad. We could just stop there; we've improved our time to first meaningful paint by 64% now. But surely we can do more. This is where HTTP/2 server push comes in. For the first time in [00:38:30] over 20 years, we have a new version of the underlying transfer protocol of the web. I could literally do a whole talk about this; in fact, I have. You can check it out on the Pusher Sessions website, where this video is going to go, from Front End London a couple of months ago. How many people are actually using HTTP/2 in production? Yeah, a very small number. I was hoping there'd be more than that. I'm not going to go into the details, that's completely out of scope. [00:39:00] But one of the features it has is something called server push.

To understand this, let's have a look at how we normally go about requesting our page. The client requests the index file, and we get a 200 response code with all of the HTML. The browser then finds that CSS link element and makes a request for it. Now, this is actually quite inefficient. When we request the index file, we as the web developers and the authors [00:39:30] know that the next resource the browser is going to request is the CSS file, because that is the highest-priority thing. We've probably used rel=preload; we've told the browser, "I want you to download main.css next." So what if we were able to push all of the bytes for the CSS file down before the client gets any of the HTML, before it can even make that request? H2 does this by using a push promise frame. The push promise frame has to be sent before any bytes of the HTML. This is the server [00:40:00] saying, "I am going to push you the bytes for main.css, I'm going to stream them down, so don't request it yourself." It's really important that it comes first, or else you'd get a race condition. Then we send the HTML bytes. Now we've used the idle time that we'd normally have on the connection waiting for that request, so we've actually made it much more efficient. How can we do this? We can do it programmatically, by indicating to our HTTP server using the Link [00:40:30] header again, as we learnt from the preload specification. But note that this time I don't have the nopush directive.
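In other words, a header like the following, with no nopush directive on the end, doubles as a push instruction on servers that support it; the path is illustrative:

    Link: </css/main.css>; rel=preload; as=style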
Any HTTP/2-enabled server, well, all of the good ones, will use the Link preload header as an indication of the resources that you want to push down. Have I ... yes. Now, for our experiment on FT.com, we're actually going to get rid of our horrible inline [00:41:00] critical CSS, because, remember, we weren't benefiting from caching. We're going to turn it into its own file. We're still going to keep the separation, because the page doesn't need all of that information for the first paint, and we're going to declare the critical file as a pushed resource.

Let's have a look at what's actually happening on the network here. We send the request for the index file, we've got a bit of idle time waiting for the server to generate and render that template, and then, because we've got inline CSS, we send the [00:41:30] bytes down and we get our start render. Then we have some more idle time waiting for the DOM to be re-parsed and for that request to happen, and there's our time to first meaningful paint. What do you think is going to happen when we push? You may actually be surprised, but unfortunately the result of pushing our critical file has had a detrimental impact on our start render and our time to first meaningful paint, because the critical file was actually pushed after the HTML, and not [00:42:00] beforehand in the idle time, which is where we actually wanted it to be.

Why is this? To understand it, we have to understand a bit more about HTTP/2. Because we were using the Link preload header on the response of our index file, the server already had all of the bytes for the index file: we'd rendered our templates, made our database requests, and sent it back. Now the server's like, "Okay, you want me to push main.css, but I've also got [00:42:30] all of the bytes for the HTML file." And H2 uses priority dependency graphs, weighting by MIME type and importance, and HTML has a much higher priority than CSS does. Because the server has both of them, it's going to prioritise pushing out the HTML bytes, even though you've told it to push the CSS file. That dependency behaviour might be slightly different in some implementations of HTTP/2, so I have to make sure I [00:43:00] mention that, but most of them prioritise HTML over CSS. So the question is: is indicating what we want to push via that Link header on the HTML response actually too late in the connection's lifetime? As we saw in our results, the push experiment had a negative effect on our time to first meaningful paint. Here you can see that we're not waiting for the request, because we never sent the request, so that's awesome, we've just got the bytes. But they're coming [00:43:30] too late. We actually want them to use the idle time here.

The good news is that H2 is well supported, we've got about 80% coverage globally, and most of the server implementations also support push now. If you're using NGINX or Apache or Varnish or things like that as your application server, they will support it; just go and enable it today. You're going to see a lot of benefits from the multiplexing anyway. But you can see that we got a negative result on our time to first meaningful paint by pushing [00:44:00] the resources: we've got 5,000 milliseconds on our emerging market here. So again, it comes with cons: we're actually creating contention on the network now, it requires some custom logic, and it's really hard to debug push.
In fact, most of the browsers are a bit iffy in their dev tools at showing you when a push actually happened. That's why you need to use things like WebPageTest, and even lower-level tools like Wireshark and Grok, to see what's happening on the network. And I have to mention this: push is [00:44:30] not cache-aware. If I push the critical CSS, the client caches that resource in the browser's HTTP cache for the repeat view. But the next time you visit the FT.com homepage, I'm going to push it again, because I don't know whether or not it's in your cache, and then we're actually wasting bytes. Not only are we congesting the network, we're pushing too many bytes down.

So even though the server push experiment gave us a negative result, there must be a better way of doing this. This is [00:45:00] what we at Fastly call asynchronous push: a way of decoupling the pushing of resources from that HTML response. Let's look at the network utilisation again. I pointed out the fact that we're wasting some idle time here whilst we wait for the server to respond. Looking at the connection again, what normally happens is that we have some server think time: you probably have to go and request the articles from the article database and pass them to your templating language of choice, and that time, whilst [00:45:30] the server is doing that thinking, is when we really should be pushing. But because we can only indicate push via the Link header, we have to wait for that think time to finish before we do it. A much more common architecture these days is that you've got another server like Apache or NGINX in front of your application, and so when we dispatch the request to the app server, that is the point at which we should be pushing our CSS, not whilst we wait for [00:46:00] the think time and the response.

So, this is how you can actually do it, by decoupling it from the Link header. This example is using Node's HTTP/2 module. We just have a middleware, as you would in Express or anything, and we're saying, "If the request was for my homepage, I want you to push and flush the bytes down the response." Then you go and do your stuff, fetching your data from the database. So don't just use the Link [00:46:30] rel=preload header: if you have the capability to access your server like this, then manually push these resources to get the best benefit, which we're going to see.

So before, our pushed critical CSS was here, but by decoupling it and using an async push, we've achieved the holy grail: we've managed to use the idle time here. That's when our critical styles were sent down, before we'd even received a single byte of HTML. Our start render has gone back to being lightning fast, and [00:47:00] the time to first meaningful paint has as well. That's given us a 65% improvement on our 3G emerging market; we're now at 3,062 milliseconds. We've reached our goal, and we've beaten it on cable, which I think is extremely impressive. The problem with it is limited availability: only certain servers are capable of this. At Fastly, we run a version of Varnish that allows you to do it, you can in Node's H2 server, and I think the [00:47:30] Ruby implementation does it as well. It's hard to debug, and it requires a lot of custom logic. But you're using that idle time, and you're ensuring the delivery of your CSS as soon as possible.
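As a sketch of the idea, not the FT's actual code: with Node's core http2 module you can push the critical CSS as soon as the request arrives, before doing the slow template and database work. The file paths and the renderHomepage helper below are made up for illustration.

    const http2 = require('http2');
    const fs = require('fs');

    // Stand-in for the real "server think time": database queries, templating, etc.
    function renderHomepage() {
      return Promise.resolve('<!doctype html><html><body>FT homepage</body></html>');
    }

    const server = http2.createSecureServer({
      key: fs.readFileSync('server-key.pem'),
      cert: fs.readFileSync('server-cert.pem'),
    });

    server.on('stream', (stream, headers) => {
      if (headers[':path'] !== '/') return;

      // Push the critical CSS immediately, so its bytes travel during the
      // idle time while the HTML is still being generated.
      stream.pushStream({ ':path': '/css/critical.css' }, (err, pushStream) => {
        if (err) return;
        pushStream.respondWithFile('./public/css/critical.css', {
          'content-type': 'text/css',
        });
      });

      // Only now do the slow work, then send the HTML down the same stream.
      renderHomepage().then((html) => {
        stream.respond({ ':status': 200, 'content-type': 'text/html; charset=utf-8' });
        stream.end(html);
      });
    });

    server.listen(8443);

Because the push happens before renderHomepage resolves, the CSS rides down in the idle time being described, and the HTML follows once it's ready.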
I mentioned briefly that where push is useful is in the first view, but what do you think will happen on the repeat view, when users have already got it and it's in their cache? I wanted to show you an experiment using service workers, taking it one final step further, to tell the server not to push any resources [00:48:00] for subsequent requests because you've already pre-cached them. But I ran out of time last week, and I actually have to do real work rather than this fun research. I'm hoping to write up some more about that research soon.

Looking at the results finally, we can see the pushed async version ... this is that visual progression graph, like we saw at the beginning with the Google search one, and compared to our inline example there, it's clearly winning. What does that look like for the user? We've really [00:48:30] improved the user's perception through their time to first meaningful paint. And what does that look like in a video? Hopefully this is going to paint the fastest ... which is awesome. Look at that baseline.

I just want to leave you with the future. We've now got a strong toolbox, a very good collection of APIs: preload, inlining critical CSS, and HTTP/2 push. [inaudible 00:48:54] of us have faster loading in the browser. But there are still some issues that we identified, the first of which is that [00:49:00] the server is not aware of what is inside the browser's cache. We're going to send it files that we probably don't need to send. This is where the cache digest specification comes in. My colleague Kazuho and Mark Nottingham from Akamai have written this. It's a working draft at the moment, but it specs out a way for the browser to send a digest, a very small hashed representation of the [00:49:30] files that are inside the browser's cache, obviously, for security reasons, scoped only to that origin, only for that domain. Then the server can say, "Oh, I was going to push main.css, but you've told me you have it, so I'm not going to push it," and the response can come down. Which I think is awesome.

Finally, the biggest weakness of push is that the initiation, the way of initiating that push, happens too late, on the HTML response in the Link headers. This is where the 103 status code comes [00:50:00] in. It's from a new spec called Early Hints. Hardly any of us have even heard that there was a 100 status code range; it's actually called the informational range, and we only really know about 200 and above. 103 is a way that, when the client makes the request, whilst the server's doing that think time and processing, it can flush down a 103 response that just has those Link headers. That's all that will be in the response. It's saying, "Here are all the critical files. [00:50:30] I'm still doing some thinking, but you should go and perform the networking for them," and then finally it can flush down the 200. Basically, it's doing what we were trying to do in my hacky async way, but formalising it as a spec.

Sorry for going massively over time. This has been a whirlwind tour of how you can load assets in the browser, and I've only really scratched the surface of each methodology, but hopefully I've left you with some techniques you can take home and use. I just want to leave you with a couple of takeaways. We've realised that resource loading [00:51:00] in the browser is hard, probably harder than we thought it was. Bandwidth is often underutilised.
We've normally got a lot of idle time on our connections that we're just wasting. So identify your critical resources; remember, I said that's the most important thing to take away from today. Use something like Lighthouse to identify your critical request chains. Then use the preload API to indicate to the browser what those resources are, especially your web fonts, and push your critical CSS, [00:51:30] but only on the first view, and if you can, only within the idle time. And I'll leave you with this: we should always be testing. Nonstop. Once you get to a baseline, keep going; you can beat it. Most importantly, test on real devices, on the real network conditions that real people in the world have, not just your shiny MacBook Pro. Thank you very much. Sorry for taking your time.