Building Scalable and Resilient Web Apps with Microsoft Azure

Steve Spencer speaking at dotnetsheff in March, 2017

About this talk

Building web apps that are both scalable and resilient is challenging especially when you have an existing application that you need to change. This talk discusses scalability and resilience, the issues you could have and highlights some of the services within Azure that you can use to help.


Transcript


This talk's more of a talk rather than demos, so the second one's more techie. This is more just a conversation. So to build a scalable web application, I'm here to talk about what we mean by that. So I want to talk about what resilient means, I want to talk about what scalable means, and also why do we care about it? So we'll have a bit of a conversation about that. Then we'll talk about what you can do to scale, what you can do to build your [inaudible]. Then we'll look at what Azure can do to help you with that, and a few sort of hints and tips that maybe will help you along the way. So what do I mean by resilient? What we want is something that will maintain an acceptable level of service whilst our application is running. So we've got a website there. We don't want it to fall over, and if something goes wrong, quite often you'll have seen some sort of error page pop up. Maybe you don't want that. You want it to be a bit more solid in what you're building and, hopefully, your application will be running normally. When we talk about scalability, we want to be able to maintain a level of performance. So whilst resilience is about making sure something still runs when it goes wrong, scalability's about making sure it runs when you've got a lot of load on your site, so you need to be able to grow your service out with the sort of demand you get, and, also, if you're using cloud computing, then you may want to shrink that down when the load's not there. So why do we actually care about this? Generally, if we're building a website, we're trying to use that to attract customers, so we want to provide a good service. So we want the website to be quick, so when our customers are there they're not getting bored waiting for the spinny things to happen or waiting for the page to load or whatever. We also want the website to still work when things go wrong.
So you don't want it just to throw an error page up. That's not really helpful; your customers are going to come along and think, "That's it. We're not going in there again." So if you put stuff in there to help them, then maybe things still work, even though there's something behind the scenes going wrong, and it'll catch back up when things come back up. And I guess what's more important is, when I've got a shop online, I don't want that to go down. So if something's going wrong in the backend, I still want to get my orders in, I still want to get the money there, and I want to make sure that the orders will get processed, even if the backend service is down. And, actually, we want to make a good impression. So with your website, you're sort of selling your site, your company. If that's rubbish, then who's going to come to you? So you want to make sure that you've actually got something there that makes people confident that you, as a business, are going to be good. So to talk about scalability, there's actually two different types of scalability. The first one, scaling up, is probably what most of us would do at home. So you've got a computer, it's running slow, you buy a bit more RAM for it. You might buy a faster graphics card. You might buy a new CPU with more cores, whatever, faster, and a bigger hard drive or SSD or whatever. So basically you're taking the hardware you've got and you're rebuilding it with more power. If you've got an application that you've already got running on a single server, scaling that up, adding more resources to it, is probably the way that most people will go. And it's actually the easiest way of doing it because you don't have to make any changes to your application, your application should still just run, but you're limited.
When you get to a certain size, the cost of a new CPU which is running however fast, with however many cores, is going to cost you too much, and also you've got physics there, so things can only go so fast. So you get to a point where you've built this server as big and powerful as you can get, but you can't really spend any more money because it's not cost-effective. But also, every time you scale up like that, you've got the old server you probably have to throw away, or the old CPU or whatever. So you're actually wasting money because you're not really using it. The other way is scaling out. So what you do is you build lots of machines that are really similar, they're all similar specs, and that way, if you want to scale, you start another machine, and then, in theory, you've got unlimited scale because you can just keep building more and more machines, and you scale out. You're not having to rebuild your application each time. Well, the problem with that is quite often your web app won't work on multiple machines. So there's things you've done, maybe accessing files or whatever, and you've got to actually change things to make it work. You also might have to change your infrastructure or whatever. So I'm just going to go to PowerPoint and an animation, and I'll sort of try and explain this. Normally you have multiple layers in your application, multiple tiers: you have your website, you have some web services, you have some back office business logic, and the database. So, if you want to scale, you might put more machines in to boost your web server and your web services, and you've got an application server sitting there in the backend running your application logic. So you've scaled that out. And the nice thing about this is you can scale each level, each layer, independently.
So as you get more and more customers there, you might actually be able to cope with the web services being at that level, but you need more and more web server instances there to actually take the extra traffic. But then if things do get really successful, you scale the rest out, and what I've actually done there is I've scaled up the database server because it was easier for me to buy a bigger box and handle that than actually try and do a cluster of machines or whatever. So there's actually quite a lot of things you can do. Equally, if things don't go so well, you can take some of those machines away. I can't actually take that database server away now because I've made it bigger. So you've got benefits for both, and actually you've got flexibility there if you're using some sort of cloud environment. Now, I've talked about the things that you could have problems with, so file access is one of them. There are quite a lot of web applications I've seen over the years that actually write things to file. That file sits on that machine. If you've now got two machines and you write to a file, then that other machine can't actually see it. So you could actually solve that by moving that to a network share or something, or a data store or whatever. If you've put in a network share, then you can have issues like, I've got to then work out who's going to write to it, because anyone can write to it at any one time. Similarly with session state: if you put your session state in memory, that memory resides on that one machine. If you've got multiple machines and it's not shared, you'd better work out a way of distributing that. So there are ways of using things like the SQL Session State or Redis or something like that to do that, and then similarly with caching, as well. So if you've got your memory caching, that again is local, so you need some sort of distributed cache to handle that. I've already mentioned this: with file access, we've got a shared resource.
If I start putting files on the network share, then there's multiple things trying to access that file at the same time. So I've got to put some controls in there to make sure that we don't get lock-outs. That actually might not just be a file. You might have some backend mainframe, for example, where only one thing can happen in there at any one time, so you've got to then control how you gain access to that. So scaling out with multiple machines actually can cause you some issues. It can also cause bottlenecks. So as in that previous diagram with the database: there I have one database, so everything's trying to access that. That then becomes a bottleneck, and we've got to work out how we mitigate those sorts of problems. You also get infrastructure issues. It's nice when I've got one box because I only have to worry about getting access to it, but if I start having multiple machines doing multiple things, I've then got to put some sort of load balancer in there or make sure that the routing is right, so that the actual traffic is routed to the right place. If we put a file share in, it's not just a matter of making sure that we can access those files. Now we've got to put some security in there so that we can only give access to the right machines or the right people. Similarly with access to shared resources: we might put some IP whitelisting in there. We might change the firewall, so that you can only access those shared resources from certain machines. It's a lot more work you've got to do when you start having these extra things. And as you're adding more servers, you've got to worry about bandwidth. So your machines are now running on this one network that you've got, and you've suddenly got loads of extra network traffic, so can your network cope? You might need to do some work there. And I've mentioned sort of legacy hardware, and you've got legacy applications. You've got to worry about how do we deal with that?
So if you've got a mainframe, or I've got some software that only works on a single machine somewhere, then how do we access those when I've got all these multiple machines all trying to sort of do it? I've got multiple machines now in my environment, so I've got to maintain the configuration of all those machines. So how do I make sure that when I change something in one place, that change gets propagated all the way through? And then try finding it when you've got one machine that's not configured correctly. It starts to become a bit of a nightmare. So you've got to put some decent controls in there and try and maintain it. If you don't put some thought into that, then you're going to get yourself in a bit of a mess, and probably spend quite a lot of time trying to find out which of those 100 machines you've got is actually not configured correctly. So when we're talking about resilience, there's a number of different things that could go wrong. We generally think of resilience in terms of some sort of disaster recovery, which could be something as small as just a machine going down or as big as the whole datacenter going down, so you've got to handle that. But there's a number of other things that can cause your system to go down, things like transient failure. So maybe I've got a network switch that somebody's accidentally tripped over and pulled the plug out, so it's been rebooted. It might take 10-30 seconds to reboot, and in that time, none of my machines are talking to each other. So what happens? Do I start getting errors? Do I start showing a web page with this big exception page, or sort of showing that it can't connect to that socket somewhere? There's also cyber attacks, so that's a similar thing. If you start getting denial of service, then they're actually putting extra load on your site, causing all the problems that you would see, and you've got to try and work out how to deal with that. And a common one is actually yourself.
So when you upgrade and you're maintaining your servers yourself, you've got to think about what you do. Can I take a machine down? Does it affect the rest of the system? Do I need to tell my customers that we're doing this upgrade? Is it actually going to affect them? I guess the thing you've got to think about is if something's going to go wrong, it will go wrong, and you've got to assume that you've got to deal with it. So whatever is going to go wrong, it will, and I'll guarantee it will when I'm doing my demos later. And I think another important point is, understand where your bottlenecks are. If you don't know where your bottlenecks are, then how can you fix them? And if you have a bottleneck, that tends to be a single point of failure, as well. So if that database server goes down, or whatever, that mainframe connection goes down, how's your application going to proceed? Is it going to fall flat on its face, or are you going to put something in place to sort of make that work? So failure can be handled in a number of different ways, and I'll talk about a couple of them now. The first one is multiple instances. It's very similar to what I talked about with scalability. So, if you've got two machines running and one goes down, you've always got something there, but there is an additional cost. So quite a lot of places will say they've got disaster recovery, but all they've got is some machines on standby somewhere waiting for somebody to turn them on, or they've got some space where they can install the software later. But if you're wanting your application to run, if you're installing stuff on your site, and it goes down, you don't want to have to wait 24 hours while somebody spins up some more machines in another location. You want to just flip it over. So you could put another mirror of your site in another datacenter. If you're doing that, you might as well run it as if it was your main site.
If you're trying to sell stuff, and you've got two sites running, then what's the point of one sitting there doing nothing? You might as well just get the two sites working together. Then if something does go wrong, if a machine falls over, you've still got capacity in both datacenters. If a plane lands on your datacenter and wipes it totally out, you've still got the data somewhere else so you've not lost anything, and this helps with a lot of them. So it helps with disaster recovery. It helps with cyber attacks. It helps with upgrade and maintenance. If you've got multiple machines there, you can take one down, upgrade it whilst the others are still running, and then swap them over. So this gives you a bit of flexibility. It has the same issues I talked about with scaling out, so all those issues like having to sort your networking out, having to sort your routing, having things running on multiple machines, and the shared resources; you have all those same sorts of issues. Then you've got transient failure. This is where, like I said with the network switch going down, for example, if you've got distributed systems, something in that network will fail at some point, and what you don't want is for that to just ripple through and cause massive disruption to your site. So one way to handle that is to use retry logic. If you're making a call to a web service and it's not there, then wait a short while, try again, and then keep doing that. But we've got to make sure that if you've got, say, 100 machines all trying to access a web service, and the network goes down, you don't want them all to back off at the same time and then all try again, because then you might end up causing your own denial of service. If you've got 1,000 machines or a million machines, whatever, then that is going to cause an issue. So one of the ways to deal with that is to actually put some randomness in the retry.
So it may be the first time it retries in a second, but not every machine's going to wait exactly a second. One might wait 1.1 seconds. One might wait 0.9 or whatever. So they all come back at different intervals. Also, if they fail multiple times, then make that wait longer. So, obviously, if it's failed two times, then there is something wrong. The first time might just be a blip. The second time there's something wrong, so let's wait a bit longer. Then you could keep trying that. So the first time it waits a second, the second time two, then four, whatever. But you don't want to keep doing that forever, because at some point everything will come back online. But this helps you with those sorts of transient failures. I think quite an important one is handling failure in the user interface. You don't just want to fail. You don't just want to put an exception page up that says you've got a problem. You want to do that in a sensible way. If you've got, say, a share price site, and you lose the link down to the stock exchange, just leave the prices up there, and say, "These were correct as of 15 minutes ago," or whatever the answer is. You don't necessarily need a connection all the time, so it's better to show something. And if you're going to take your system down for maintenance, put a screen up to tell them that it's down. Also, if you've got some important customers who are on the site, warn them. Say, "We're going to take the system down on Sunday for two hours, and you'll get this message. Don't be alarmed. We know it's happening, and please don't ring us up and cause us loads of problems." I think you need to try and show it in the UI as much as you can without having to show any errors, and if things are important and they're out-of-date, then you highlight them. Maybe you just put, "Data unavailable," or whatever.
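The retry approach described above is usually called exponential backoff with jitter. As a rough sketch of the idea (not the speaker's code; an illustration in Python, where `call_service` is a hypothetical stand-in for the flaky web service call):

```python
import random
import time

def backoff_delays(base=1.0, factor=2.0, jitter=0.2, max_retries=5):
    """Yield retry delays of roughly 1s, 2s, 4s, ... each nudged by a
    little randomness so a fleet of machines doesn't retry in lockstep."""
    delay = base
    for _ in range(max_retries):
        yield delay * random.uniform(1 - jitter, 1 + jitter)
        delay *= factor

def call_with_retry(call_service, max_retries=5):
    """Call a flaky service, backing off (with jitter) between failures."""
    for delay in backoff_delays(max_retries=max_retries):
        try:
            return call_service()
        except ConnectionError:
            time.sleep(delay)  # transient blip? wait, then try again
    return call_service()  # final attempt; if it fails, the error propagates
```

With `jitter=0.2`, one machine might wait 1.1 seconds on the first retry and another 0.9, exactly as described, so they don't all hammer the recovering service at the same instant.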
So you need to just do some sort of messaging out there so you can actually make sure that people understand what's going on, and if you can, cache things in the UI. So, if you always need to go back to the database to pull out your catalog, for example, maybe you can cache some of it locally and you don't need to keep going back. Just sort of make sure that if something goes wrong you're not stopping the service totally. Also, with the UI, you may want to put some sort of queuing in there. So if you're going to take an order and the backend order system's down, then make sure you queue those requests up, put something in there so that when the system's back up, the queues can then be processed, and all the orders are still there. So you can take all the payments. You can take everything, but just maybe not update your stock, and maybe afterwards email a couple of people saying, "The stuff's out of stock. We'll get back to you," or whatever. That's better than actually having nothing there to sell because they can't get on the site. So you just need to sort of try and think about what you're trying to do and how you do it. One technique I've seen is, if you have a lot of downloaded data that doesn't change very often, then instead of getting that from the database, actually create some static pages. So create an XML file or JSON file or whatever with that data in it, and then serve that from files, from your Content Delivery Network or your file store or wherever, rather than going back to the database. So this might be something like horse racing results, historic horse race results or football results, something like that, something where there's a lot of data but you don't need to keep pulling it out from the database because it's never going to change. It's always going to be the same, so you can pull that out and store that locally.
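The queue-the-orders idea is a store-and-forward pattern. A minimal sketch, purely as an illustration in Python (the `submit_to_backend` callable here is a hypothetical stand-in for the real order system):

```python
from collections import deque

class OrderQueue:
    """Store-and-forward: accept orders even while the backend is down,
    then drain the queue once it comes back up."""

    def __init__(self, submit_to_backend):
        self.submit = submit_to_backend   # hypothetical backend call
        self.pending = deque()

    def place_order(self, order):
        # Always accept the order; the customer sees success either way.
        try:
            self.submit(order)
        except ConnectionError:
            self.pending.append(order)    # backend down: park it for later

    def drain(self):
        """Run this when the backend recovers, e.g. from a scheduled job."""
        while self.pending:
            order = self.pending[0]
            self.submit(order)            # if this throws, order stays queued
            self.pending.popleft()
```

In a real system the queue would live somewhere durable (Service Bus queues, mentioned later in the talk, are one option), but the shape is the same: the site keeps taking orders, and the backend catches up.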
You could have that served from your web page, so if people want to look at that, they don't have to rely on the backend services being there. Bottlenecks: they're single points of failure, so they are going to cause you a problem, and if you're not sure what they are, then you don't know how to fix them. Good examples are databases. Logging is actually one that you quite often don't realize: some of the logging frameworks are single threaded, and they can actually cause your site to go down just because everything's queuing on the log file, for example. So if you've got it configured incorrectly, you've got to be aware that logging could be a bottleneck. And legacy components, so that again might be something that only works on a specific machine, on a specific operating system, and you've only got a single machine with that because you've only got one license, so that's a bottleneck. You've got to make sure that you can work around those and work out different ways of doing it. And the only real way around that is to understand your application. If you don't understand it, and you don't know where they are, then you can't fix it. So one way to do this is to load test it, so you can load your application and work out where things go wrong. In my previous job, we were doing some load testing on a website. The website was an Umbraco site using a backend database and hosted in Azure. So I thought, "I know what I'll do. I'll load test it. I'll see how many people I can get on the site before I have to increase the number of instances I have in Azure WebApps." So I ran the load test. I got to 3,000 users, and the website fell over. I thought, "Brilliant." So what I did was I increased the number of instances of my WebApp in Azure, and ran that again. I got 3,000 again. I thought, "That's a bit confusing. I was expecting to get 6,000."
Well, it turned out the database was the bottleneck, and I didn't actually realize that. So I went into Azure, spun the database up to the next level, and then ran the load test again. I got 6,000. So my understanding of the application was that it was the website that was going to be the problem. Actually, it was the database. If I hadn't done that test, then every time something went wrong, it would have been a case of trying to work out why the website was slow, rather than looking at the backend. So this talk was about building resilient WebApps in Azure, so I guess we need to talk about Azure at some point. I guess most of you are familiar with what Azure is, but if you're not, it's a cloud-based service. So it's not on-prem, although they have a lot of services that can run on-prem, and it's a multi-tenant environment, but they've built it in a way that one client shouldn't be able to affect another. It also runs Windows and Linux. You can run it in any language. So they're trying to push now that this is a platform for everyone. It's not just Microsoft technologies anymore: it's open source technologies, it's Microsoft technologies, Java, Linux, and Windows. So there's quite a lot of traction there to use different things, and there's actually different ways of doing it. So, if you just want to take your existing infrastructure that you've got on-prem and push it into the cloud, you can use infrastructure-as-a-service, or virtual machines. If you want to start being a bit more focused on what you do, for example, you might take your website out of your VM in IIS and put it in an Azure WebApp. That's what's called platform-as-a-service. They also have a number of services that run on their own infrastructure. So there's a lot of Microsoft services like Office 365 and CRM that are actually running in Azure, so Microsoft is using their own technologies to host their own services.
They've also recently released container-as-a-service, so you can have your Docker containers running as a service in Azure now. And the nice thing is, we talked about scaling, so you can do scaling in and out and, basically, you can scale up and down. So, if you need resources during the day and not during the night, you can have it automatically drop your resources at night and actually save you some money. So you only pay for the services you need, and not for building these big machines. Also, with what I talked about earlier with the animations, if that was on-prem, I'd have had to pay for all those machines myself. They'd have sat there, so when I scaled them down, I wouldn't have been able to get that money back. It would have sat there in my datacenter, pretty much unused. So using some sort of cloud provider allows me to go back down in scale as well, which quite often people don't realize. But there are issues. We've talked about a few of these already. So you've got some legacy components, and can you actually run those in the cloud? Can you actually put the operating system that thing runs on in the cloud? Can you transfer it from the machine you've got sat in your datacenter into the cloud? Some of these things can't happen. So maybe that needs to stay where it is, and you work out how you connect to it. Similarly, if I've got something...I definitely can't move my mainframe. I've got this thing that's working, sitting in the corner...well, probably not sitting in the corner, it'd probably fill the room...and if I'm accessing the data that I want from that, I can only do that on-prem. So I've got to find a way of bridging that gap between a cloud computer and my backend services in as secure a way as possible. If I've got my application using file access, can we actually use a file share in the cloud, and are there other, better ways of doing that? I think the one that is most important is security.
The cloud's a public domain, so it's not behind your own firewall now. You've got to do things yourself. You've got to make sure it's secure. Don't use Pass@word1 as your password. I did that once. When VMs first came out, I set a VM up. I set the password as Pass@word1, forgot about it for two weeks, and got an email from Microsoft telling me they'd turned it off because somebody had hacked into it and was using it for naughty things. So I don't use that anymore. Yeah, that was quite good, actually, that they did that. They told me that they'd spotted it and turned it off, and they were monitoring it, and they didn't really publicize that at the time, so I was quite pleased. And I said, "Well, I did that on purpose really." Another nice thing with Azure is it's multi-region. I have lost count now of how many regions they have. They tend to build them in pairs. So, for example, Europe's got more than it used to: there used to be Dublin and Amsterdam, but now there's Germany and the UK, and I think there's some others spinning up. You've got quite a few in the States. So Dublin is paired with Amsterdam. So when we talk about geo-redundancy, there are services where I could put some data in Dublin, it will ultimately get replicated over to Amsterdam, and I don't need to do anything. And they do a lot of these things, so that helps with disaster recovery. So there's different ways of hosting: I've talked about Azure Websites, I've talked about virtual machines, and I'll talk about services a little bit later, so I won't talk much about that now. And there's lots of different databases. So you've got SQL database. We've also got Elastic SQL, which is also called sharding. So it scales your database out, and it creates shards of data. So, you might have contacts from A to B in the first shard, and C to F in another one, and that's done for you, and it sorts the querying out and all that sort of thing.
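That range-based sharding idea (A to B in one shard, C to F in another) boils down to a shard map that routes each key to a database. Elastic SQL maintains the shard map for you; this is just a toy sketch of the routing concept in Python:

```python
import bisect

class RangeShardMap:
    """Route records to shards by key range: surnames up to 'B' go to
    shard 0, up to 'F' to shard 1, everything else to the last shard."""

    def __init__(self, upper_bounds):
        # Upper bound of each shard's key range, e.g. ["B", "F"]
        self.upper_bounds = sorted(upper_bounds)

    def shard_for(self, name):
        first = name[:1].upper()
        # Index of the first range whose upper bound covers this letter
        return bisect.bisect_left(self.upper_bounds, first)

# shard 0 holds A-B, shard 1 holds C-F, shard 2 holds G onwards
shards = RangeShardMap(["B", "F"])
```

The real service also handles moving data between shards as they fill up, which is exactly the bookkeeping you'd rather not write yourself.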
They also have other things like Redis and DocumentDB, Blobs, and Content Delivery. I'll talk a bit about those a bit later. We talked about how do we connect these things together, and connect the on-prem stuff with the cloud? So there's a few things there. Traffic Manager's one thing, and I'll go into a bit more detail about that, and there's Service Bus. Service Bus allows you to do sort of messaging, so it's got what's called topics and queues, and they also have a thing called Event Hub for getting a massive amount of data around the system, and they also have Hybrid Connections. So Hybrid Connections are a service that allows you to connect an on-premise service with something in the cloud over a secure network. It's not a VPN. Basically you put an agent on-prem, and it makes an outbound call, so it's using Service Bus behind the scenes. It's making an outbound connection to an endpoint in the cloud, and they integrate that with Azure WebApp. So I could create a WebApp with this Hybrid Connection talking to my database on my PC at home, if I wanted to. And then there's a whole load of security stuff. So there's Azure Active Directory. It's not the same as the Active Directory you get on-prem, but they're adding more and more things to make it as close as possible. At the moment it's really user management. There's Azure Business to Consumer, that sort of thing, so you can sign in with Google or...what are they calling it these days...Microsoft Account, and those sorts of providers. There's multi-factor authentication. And there's this thing called Key Vault, which is actually a way of storing your keys, and you can get them out securely without having to put configuration in everywhere. And there's a whole load of tools. So there's TFS now in the cloud. If you've got a small business or a small number of people, you can use it for free for five users.
If you've got MSDN you get a free license, so it's actually pretty good. I use it for a lot of stuff I've got. It's got Release Manager built in, builds, and load testing. When I did the load testing on that app, I used up my free load testing quite quickly. Those three sets of load tests did cost me 70 quid, I think, in the end, which I don't think was too bad, but I didn't really want to do much more, since it was on my credit card. There's also a dev test lab as well, which is a way of building your dev environments in the cloud. So the first thing I want to talk about is Blobs. This is basically scalable file storage, so I can use it for anything. If you want to put images up there, you can. They use it for videos, for any static file, if you want to just put stuff up there. And the nice thing is, as soon as you create a Blob it'll replicate that data. So if I put a Blob into Dublin, then the minimum I will get is that it will be copied into two other places in the datacenter. They'll do that across different areas, so if something fails, you've still got access to it, and you can actually configure it to be geo-redundant. So what that means is, if I've put it into Dublin, it will automatically copy into Amsterdam. As soon as it's in Amsterdam, you get three copies in Amsterdam, so you get six in total. You can also configure that to be read-only access in Amsterdam. So I could do all my changes, if I've got a file over in Dublin which I want to change, I can change it over here, it will ultimately get replicated over to Amsterdam, and I can read it from Amsterdam, so it's quite a handy feature. It also automatically load balances. So if I've got something, a video for example, that suddenly goes viral, then Microsoft will scale out the backend and all of the data, so that it can be served correctly.
Which is actually quite a useful thing to remember, because if you're pushing your static data and files into Blob storage, you're not having to worry about scaling that, because that will be done for you. It will come at an extra cost because you pay for data egress out of the datacenter, but the actual scalability is just part of that, and you can create SMB shares in there if you want. I've not needed to do that because, generally, if I'm going to use Blob storage, I will use the SDK, create them in my code, and access them like that, or I can access them using the URL. So I can make the Blob public, give you a URL, and you can access that file, but also I can make it semi-public, whatever that means. I can lock it down, but give you a URL with a key on it, so if you've got that key you can access that file, but if you give that URL to somebody else, they can also access it. So it's sort of semi-secure, but it does mean that I can put a time limit on it, as well. Those signatures have a time limit: I can make one valid for a week or a day or a couple of hours or whatever I want, so that means you can give access to something for a limited period. I've talked about SQL, but it's actually got some nice features which you don't think would be there. So you'd expect it to be able to scale, which is what you would expect from a cloud service. It actually has good threat detection and alerting, so if somebody's trying to hack your database, you'll actually get alerted by Microsoft. Also, there's point-in-time restore, and you don't have to do anything to configure this. So if something goes wrong, you can go back and pull that data back to whatever you had, whenever you wanted it. It's not fully compatible with on-prem, but if you're using, I guess, non-enterprise-y type functions, then you can pretty much move your database across. For most of the applications I've used, there might just be the odd thing that you need to do.
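Those keyed, time-limited URLs (Shared Access Signatures, in Azure's terms) work roughly like a link with an expiry time and an HMAC signature baked in: anyone with the link can use it, but only until it expires, and tampering with it breaks the signature. A generic sketch of the concept in Python, not the actual Azure URL format, with a made-up `SECRET` standing in for the storage account key:

```python
import hashlib
import hmac
import time

SECRET = b"account-key"  # hypothetical storage account key

def make_signed_url(path, valid_for_seconds, now=None):
    """Append an expiry timestamp and an HMAC over path+expiry."""
    expires = int((now or time.time()) + valid_for_seconds)
    msg = f"{path}?expires={expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires}&sig={sig}"

def check_signed_url(url, now=None):
    """Server side: recompute the signature, then check the expiry."""
    path, query = url.split("?", 1)
    params = dict(p.split("=", 1) for p in query.split("&"))
    msg = f"{path}?expires={params['expires']}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, params["sig"]):
        return False  # URL was tampered with
    return (now or time.time()) < int(params["expires"])
```

The nice property, as the talk says, is that you never hand out the account key itself: the signature proves the URL was issued by someone who holds it.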
You could have a specific type of key in there, but there's a compatibility tool you run: it looks at your database, writes all the scripts, and transforms it. It's fairly straightforward to use, and most things are there. There are things it can't do, like C# code in your database, but why bother with that anyway? It does do geo-replication, including the read-only geo-replication, so you can have a database written in one datacenter and eventually replicated to another. It also keeps three copies of the data, but it does it slightly differently from Blobs. SQL Database uses master-slave replication, so there'll be two slave copies of the data. If the master goes down, one of the slaves becomes the master, and another slave spins up with a copy of the data. Slightly different to Blobs, but you still get three copies, and similarly if it geo-replicates, you get three copies in Amsterdam as well. Then we've got the Elastic Database I mentioned, which is sharding. It has all the features of SQL Database, but it allows you to scale out. Standard SQL scaling is scaling up, basically giving you more powerful machines in the backend; Elastic Database gives you the equivalent of scaling out, and it does that via sharding, which is basically splitting the data. And then Azure Websites, which is probably the service I use the most. I build a website and push it into Azure Websites. I can publish it from Visual Studio or PowerShell or whatever; you basically just package it up and publish it. It's fairly straightforward. I use Release Manager: it hooks into TFS in the cloud and straight into my Azure subscription, and pushes it straight out. Because it's platform-as-a-service, I don't need to manage the backend servers at all, it just does that for me. So I don't need to worry about things falling over and being spun back up.
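The sharding idea — splitting the data across databases — can be sketched like this. Note that the real Elastic Database tools use a shard map rather than a plain hash, so this is only an illustration of the splitting, with made-up database names:

```python
import zlib

# Hypothetical shard databases; in practice these would be connection strings.
SHARDS = ["orders-db-0", "orders-db-1", "orders-db-2"]

def shard_for(customer_id: str, shards=SHARDS) -> str:
    """Map a sharding key to a database using a stable hash, so the same
    customer always lands on the same shard, and the data (and load)
    spreads across all of them."""
    return shards[zlib.crc32(customer_id.encode("utf-8")) % len(shards)]
```

The important property is that the mapping is deterministic: every request for a given customer goes to the same database, so each shard only ever holds its own slice of the data.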
If the server falls over, it'll automatically get spun up somewhere else; I don't need to worry about it. And you get alerts and monitoring, so you can say things like, "If the CPU gets to a certain level, tell me about it," and you can scale on that. The thing I really like is autoscale. I can say, "When the CPU gets to 70%, add another instance, and keep doing that until I tell you to stop." You can put limits on there, so I might say a minimum of 2 machines, scaling out to a maximum of 10 while the CPU's at 70%, and if it drops below 40%, scale down again. So if you've got a service that's busy during the day and quiet at night, you can turn autoscale on and actually save yourself quite a lot of money, because you're not paying for those instances overnight. I can't remember whether Websites charge by the hour or by the minute, actually, but if you're not using the instances, you don't pay for them, so it's really good. I was sent out to a customer once who said, "This Azure thing of yours is costing me too much money." I went in, and somebody had scaled their website up to some stupid number of really big machines. I said, "Why do you need all that?", turned them back down, and it paid for my consultancy fee, so they were quite happy. The other thing I really like about Websites is WebJobs. WebJobs are basically a way of running background services, and they have a really nice SDK. They've got triggers in there, so for example if I want to listen to Service Bus, there's an attribute-based trigger: it hooks into what's essentially your console app, which sits there and just waits for a message to appear on the Service Bus, then the trigger fires and you can process the message. You can actually scale these out separately.
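Going back to autoscale for a second, the rule described a moment ago — out at 70% CPU, in below 40%, between 2 and 10 instances — is simple enough to sketch. This is just an illustration of the logic, not how Azure implements it:

```python
def autoscale(instances: int, cpu_percent: float,
              scale_out_at: float = 70, scale_in_at: float = 40,
              minimum: int = 2, maximum: int = 10) -> int:
    """One evaluation of the scaling rule: add an instance when CPU is at or
    above the high threshold, remove one when at or below the low threshold,
    and never leave the configured limits."""
    if cpu_percent >= scale_out_at:
        return min(instances + 1, maximum)
    if cpu_percent <= scale_in_at:
        return max(instances - 1, minimum)
    return instances
```

The gap between the two thresholds matters: if you scaled out at 70% and back in at 69%, the service would flap between instance counts, so the rule deliberately does nothing while CPU sits in the middle band.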
They sit on the backend of the website and use its spare capacity, but if you wanted to, you could just have an empty website with only WebJobs in it. So if you had something really intensive that would take your website down, you can move it out. You could also use it overnight: if your business is busy during the day and quiet overnight, you could have your WebJobs running all the time, so at 2:00 in the morning they're doing some processing, using the spare capacity you're not using during the day. You can also set up multiple deployment slots, which are really configurable. I generally use this just for staging and production. I push my changes into staging, do all my testing there, and when I'm happy with it I do a VIP swap: my staging slot becomes production, and production becomes staging. Then if anything goes wrong, I can switch them back; it's really easy, and it probably takes about a minute or two. You can also configure it to do A/B testing. You set up a number of slots in the backend, say what percentage of your traffic you want to go to each, and the routing's automatic. I've already talked about using Release Manager to deploy to Websites, but it's fairly easy, and there are PowerShell scripts to do it as well, so it's pretty good. I mentioned Hybrid Connections. Having your website sitting there talking to something in your own backend, without having to open up firewall ports and all the rest of it, is really good, really powerful, and you can actually limit it as well. When you configure a Hybrid Connection, you basically put in the IP address or machine name of what it's allowed to connect to, and each Hybrid Connection can only connect to that one thing, so that's actually pretty good. Then we've got Traffic Manager.
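The slot swap and the percentage-based A/B routing just described amount to something like this sketch (made-up names, not the Azure API):

```python
import random

def vip_swap(slots: dict) -> None:
    """Swap the staging and production deployments, as in the VIP swap above.
    Rolling back a bad release is just calling this again."""
    slots["production"], slots["staging"] = slots["staging"], slots["production"]

def pick_slot(weights: dict, rng=random) -> str:
    """Route one request according to configured traffic percentages,
    e.g. {"production": 90, "staging": 10} for A/B testing."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]
```

The swap is why the whole thing takes a minute or two rather than a redeploy: nothing is copied, only the labels on the two running deployments change.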
So if you have a number of instances running across the world, or even across different datacenters in Europe or in the UK, Traffic Manager does DNS-based routing, and you can configure it to route your traffic in a number of different ways. Say I've got my website running in the UK datacenter, the Irish datacenter, and in Amsterdam. I could set Traffic Manager up as a failover: always go to the Amsterdam one unless there's a problem, then go to the Dublin one. But to me that's a bit of a waste; if I've got it, I might as well use it. So you can configure it for performance instead: a user gets routed to whichever of the datacenters is the most appropriate for them, the fastest one. Traffic Manager pings each of your services from different locations to work out the best combination of user to server. And the other option is just general round robin: the first person goes to Amsterdam, the second to Dublin, the third to the UK, and so on. You can also add anything with a public endpoint into Traffic Manager, so you could have something in Amazon's cloud or in your own datacenter, and use it as a load balancer across all of those if you want. Then we've got the Content Delivery Network. Microsoft has spent a lot of money building out the Azure datacenters, but they've also spent a lot of money putting other servers elsewhere in the world, outside the datacenters, to deliver content, and it integrates with Blob storage. So if you've got a video or some images in Blob storage, you can add them to the Content Delivery Network and effectively change your URL: instead of pointing to a Blob URL, you point to a CDN URL, and it pulls the data through.
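Those three Traffic Manager modes — failover, performance, and round robin — can be sketched roughly like this. It's an illustration of the routing policies, not Azure's implementation, and the endpoint data is made up:

```python
from itertools import cycle

def route_failover(endpoints):
    """Priority routing: the first healthy endpoint in the configured order."""
    for name, healthy in endpoints:
        if healthy:
            return name
    raise RuntimeError("no healthy endpoints")

def route_performance(endpoints, latency_ms):
    """Performance routing: the healthy endpoint with the lowest measured latency."""
    healthy = [name for name, ok in endpoints if ok]
    return min(healthy, key=lambda name: latency_ms[name])

def round_robin(endpoints):
    """Round-robin routing: hand out the endpoints in turn, wrapping around."""
    return cycle(name for name, _ in endpoints)
```

In the real service all of this happens at the DNS layer — the client just gets back a different address depending on the policy — but the decision logic is essentially what's above.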
The first person in a region to request the data causes it to be pulled through from the original source; subsequent requests are served straight out of the cache. So it's a good way of moving your data out of your datacenter and closer to where the user is. I've put a few tips and tricks in there, and I've tried to categorize them, although one of the ones I've labeled "no code change" is really a config change. The first one is autoscale. I've already talked about this: autoscale is probably one of the best ways of keeping your application performant, it only scales when you need it to, and you can put limits on it. You can predict fairly well what the maximum cost is, because if you set the limit to 10, the worst case is always 10, and you just need to monitor it. You can use alerts too, so if it starts getting to 10 instances you start being told things are getting busy; maybe you bump the limit up a bit, then take it back down when it's quiet. Another important one is moving load off the website. I've already hinted at this: if everything's being pulled from IIS, then the more people you've got, the more stuff's being pulled. If you push your static content into the Content Delivery Network or into Blobs or whatever, you're pushing that work out to the periphery rather than having it all served from your own site, so it takes some of the load away. Then use some sort of caching. You can use Redis for caching, and it's basically a config change plus a package, so not really a code change, though it's probably a rebuild to get it in. And then, as I've already mentioned, load test. If you don't know what the performance of your website is, load test it.
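Both the CDN behaviour and the Redis tip are forms of the same cache-aside pattern: serve from the cache if you can, go back to the origin once if you can't. A minimal sketch, with a plain in-memory dictionary standing in for the edge cache or Redis (names made up):

```python
class EdgeCache:
    """Cache-aside: the first request for a URL goes back to the origin,
    subsequent requests for it are served from the cache."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> content
        self.cache = {}
        self.origin_requests = 0  # how often we had to go back to the source

    def get(self, url):
        if url not in self.cache:
            self.origin_requests += 1
            self.cache[url] = self.fetch_from_origin(url)
        return self.cache[url]
```

A real CDN or Redis setup adds expiry and invalidation on top, but the scalability win is visible even here: however many requests arrive, the origin is only hit once per item.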
At least then you know where things are likely to go wrong, so you can prepare for them. You might not know how to fix them right now, but at least you know there's a potential problem at 6,000 users, or whatever that number is; at the moment you don't know what it is, because you've not even load tested. But there are things you can do with code changes, and I guess the obvious one is async. We should all be using asynchronous calls now rather than synchronous ones: if you're making a synchronous call, your website sits there blocked, and you don't want it waiting for something to happen, you want it working asynchronously. Another useful thing is to use queues. For example, when an order comes in, instead of waiting for the backend to be free so you can push it in, just drop a message on a queue, with some sort of auto-numbering for order numbers or whatever so you can manage it. Once it's in a queue, you can scale out the queue processing on the backend independently of your website, so you're not waiting on anything, and you've got a fairly fast way of doing it. Use geo-redundant data as well. If you're using Dublin and Amsterdam, and you're writing into Dublin, then use the Amsterdam instance to read data from the read-only replica. Two things there: you're not pulling the data from the main database, so you're taking load off it, and if something goes wrong, you've still got the data there and can access it. And then you can use sharding or partitioning. Database sharding I've already talked about. Partitioning is similar, except you'd probably do multiple databases on a per-client basis: if you've got multiple clients, you give each one a database, and each one accesses its own. It depends on what your application does. I've tried to pull all of this together.
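The order-queue idea can be sketched like this: the website side only assigns a number and enqueues, returning immediately, while the worker side is a separate process you can scale independently. An in-memory deque stands in for Service Bus here, and all the names are made up for illustration:

```python
from collections import deque
from itertools import count

order_queue = deque()     # stand-in for a Service Bus queue
order_numbers = count(1)  # the auto-numbering for order numbers mentioned above

def place_order(order: dict) -> int:
    """Website side: assign an order number, enqueue, and return immediately
    rather than waiting for the backend to be free."""
    order_id = next(order_numbers)
    order_queue.append({"id": order_id, **order})
    return order_id

def process_next():
    """Worker side (a WebJob, say), scaled independently of the website."""
    if not order_queue:
        return None
    order = order_queue.popleft()
    # ...write to the database, send a confirmation, and so on.
    return order
```

Because the customer gets their order number back straight away, a slow or busy backend never shows up as a slow page; it just means the queue drains a little later.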
Everything I've just talked about is in this diagram. I've tried to show the geo-replication: SQL geo-replicates from Dublin to Amsterdam, and similarly with Blob storage, and I'm using the Content Delivery Network and Traffic Manager. The problem I have is with Amazon Web Services in Frankfurt: I haven't got a mechanism to get my data in there, so I've got to work out some sort of data sync for that. Also, when we're putting our orders in, they're all writing to a single Service Bus in Dublin. So maybe we need to move that out and have some sort of queue in each region, with a processor in each that talks back to the Dublin queue, so that we can control it. Because as it stands, if something goes wrong in Dublin, I've lost all my orders; I can't place an order from Frankfurt or Amsterdam, because I'm waiting on that one queue. So maybe you need to replicate it or do some clever processing. If you want to make things resilient and scalable, you're probably going to have to change your application, and you're probably going to have to change your infrastructure. Things will fail: in a networked environment, things are going to go wrong, so be aware of it and be prepared. And understand your application and where your bottlenecks are, because if you don't know what your application's doing, you can't fix it. There's a massive set of services in Azure, and I can't remember them all, so you tend to pick the ones you know; I've tried to pass on the ones I've found useful. So, yeah, I hope that's been good. Use the tools and the services available to make your website scalable and resilient. Thank you for listening.