Speeding up CI with Node and Docker

Will Munn speaking at London Node User Group in May, 2017

About this talk

Tips for running faster builds with NodeJS and Docker. This talk mainly outlines lessons learned and optimisations which will hopefully help you get quicker feedback from your CI pipeline, and make your deployments run quicker.


- Right so, yeah, hi. I'm Will Munn. So I work for a company called Focusrite/Novation. We make iOS apps to make music with. That's my GitHub address. You can be friends with me if you want. I'm gonna give you some tips, these are like lessons I've learned in the last sort of couple of years using various CI tools, like for Node and Docker.

So, what is continuous integration? So it's about checking commits can integrate with the current system. So it's important, especially in like a microservice architecture, to ensure that like contracts are kept between services. And it's about alerting development teams of failures as quickly as possible. Failing fast, avoiding expensive mistakes making their way out to users. Then the CD part is continuous delivery. It's about encouraging a smooth automated deploy process, like regularly deploying small changes to production. And again, alerting development teams of failures as quickly as possible. It reduces inventory so that you don't have un-deployed code, 'cause that's expensive, that's an asset. And it allows the developers to add value frequently by releasing features and fixes continuously.

So, what tools can you use for CI/CD? So, I've got a few there. I'm sure there's a few more. So you've got things like Jenkins and TeamCity, which are kind of a bit more traditional. Usually you run a server and an agent or multiple agents, or slaves, they're sometimes called, and these slaves run your builds, your compilers or your tests, whatever you need to do. And then Circle and Travis, they're kind of more on the configuration-as-code, SaaS kind of idea. I haven't used Go that much, but I imagine it's probably a similar kind of idea.

So, configuring your tool. So you decided to use CI. The first tip I've got for you is keep your agents lean.
So, the first place I ever did continuous integration, we were using TeamCity and there was a massive Git repository called TeamCity Build Tools, which had about three different versions of Ruby in it. It had various tools, PowerShell extensions, anything you can name, and basically we would clone this repository onto every agent and it would take hours and inevitably not be at the same place on all the agents.

So basically this is the kind of situation that happens. So you start a CI system, you're like, okay, I'm gonna write one project, so I've got my team. And my team, my project's in Git. It's a Node 4 project, and I'm deploying to App Engine. So instal Node, instal Git, instal App Engine, and I'm away. Then Team B comes along and they're like, oh, we kinda like what you're doing. You're deploying things quickly and we like the CI, so we want to implement it as well. So, we're gonna use your CI server. Except their app is using ImageMagick, and it's in Python, and they're deploying to servers using FTP, like old-school style. So now you've got all these things installed on your agents. And Team C comes along and they're like well, we wanna adopt this as well. So they go, all right, we're gonna write our app in Node, but like we're brand new, so we're not gonna use Node 4, we're gonna use Node 7. So then you've got this problem. So you're gonna have to instal two versions of Node on your agents or slaves, or you're gonna have to instal something like nvm or n to manage these versions. And basically you could go down that route, but you eventually will instal everything. I mean especially when you get to things like browsers, wanting to run tests inside various browsers, and you have to instal different versions of Firefox and Chrome. And then you're installing a new build agent when you need to add another team, because the two build agents you've got can't run builds for every team in the world.
And then you've got snowflake build agents because you've got an unmaintainable list of software to instal on there. So, yeah, it's hard to scale this. It uses more resources. Yeah, basically as your company grows, you'll have more agents to keep up with the demand and have so much software installed that it'll take ages to configure and your ops team will hate doing it and basically it will become unloved. Also, agents will need loads of space to handle all the dependent software.

So, some solutions to this. If you're using Node, or yeah actually, just instal your packages locally. So a lot of npm modules, I mean things that spring to mind are things like Babel and Grunt, will go, you need to instal this globally with the -g flag. Don't listen to them, basically. You don't need to have that globally, like have everything defined inside your repository, in your devDependencies, so it's local to your application. Inside your repository you've got a definition of everything you need to run. I mean, it doesn't just apply to Node, if you're using Python there's virtualenv, or NuGet for C#, there's lots of ways you can define your dependencies inside the repository.

Scripting the agent configuration process, it's a good minimum sort of thing to do, where it's like just a bash script, or just using Chef or Puppet, or something that can take an empty machine with like just the OS installed and instal everything you need for your development teams to run their builds. The problem is that they only solve part of the problem, so it kind of depends how your infrastructure's split in terms of people and management. But like you don't really want, every time your app requires a new piece of software installed, to have to ask a sysadmin to instal something new on all of the agents, because they're not gonna wanna do that and you want to be able to be quite fluid with that.
So, I suggest moving it over to the developers and only installing Docker and Docker Compose on your agents. Everything else can be down to the developer. And like I said, that forces you to bring all your dependencies inside your repository and developers to think about what is needed for the runtime of this app.

So, tip number two. Keep your configuration in source control. So, like a lot of CI servers, like TeamCity or well basically, all of them, are built to support like a million different types of build tools. You've got things to support Maven. You've got things to support Ant. You have things to support npm. You've got like cron jobs defined, like zip file expressions, like all sorts of crazy shit. So it all sounds great. And like when you first discover it you're like, oh, this is amazing, I can do so much stuff with it, and you don't really think about that at first though. You'll just start with something like this. You'll have like check out my source code and run this: npm install, npm test, and deploy. So then you go and like discover that all these plugins exist. And then you'll go, okay, I want my build to run at 9:00 a.m. every day. Why not, 'cause I come into work and I know that it's working. And then I've got like a database and I want to like configure some canned data and my CI tool has got like a nice plugin to help me do that, so I'll use that. Then I'll check out my source code. And then run my build steps that I described before. And then I found there's another great plugin that reports great test output, so that I can know exactly, I don't know, where my code failed. And then I'll go, oh yeah, there's a plugin that also emails people when there's a release that happens. So that's really great, like people will be aware that things are going on. And then another plugin that zips your build source code and stores it somewhere, to an archive or whatever.
And then there's some other super awesome plugin that like is absolutely critical to make a release functional and is only available on the CI server. So that's all great and you're like, I'm a pro user of Circle now or whatever, and then one day this happens. So you hit your CI server and there's nothing there. So this happened to me before. It was a day where for one reason or another, all the source code disappeared from the production servers. Let's say a fat-finger error. And at the same time, the CI server went down, so we couldn't deploy. And because a lot of the logic to tell us how to deploy was on that server, the first thing to get back to life was to bring this back up, before we could deploy. So it was a lot of time wasted when we were down, where we couldn't do anything.

So, solutions to this. Keep all your logic in scripts or a task runner. As tempting as those plugins might be, keep your logic in source control. And commit it. So like a lot of CI tools kind of know about this problem now and they offer solutions for this. So Jenkins has like the single Jenkins pipeline script, which is like a little Groovy script which allows you to define each step inside of a file which you commit with your source code. Circle has got a YAML file, so does Travis. I think TeamCity has got an XML file to do the same thing. So yeah, basically I recommend, although you can have like a Maven task, or this task or whatever, just keep it to small command-line scripts that will run efficiently and that you can commit.

So, tip number three. Avoid npm bloat. So all right, you decided to be like a JavaScript ninja and like you've read about Node and you wanna use Mocha 'cause you love test-driven development and use the latest JavaScript features, and then you're gonna use a task runner like Grunt and also you love static typing now so you're gonna use TypeScript.
And like you've Googled all of this and then like the best way to make a web app, it tells you, is to run this magical command. So that's before you've written any code, so yeah, that looks pretty inoffensive at first. But let's look at that. So this is my internet connection at home, which I think is pretty good, to be honest, at 50 meg a second, I'm quite happy with that. So on that connection, it takes 10 seconds to run that command, that npm install command. So that's 50 megs of node modules, right? 199 different packages, and 4,086 files. So, I don't wanna tell you how many contributors and how much code, potentially buggy code, there is in that. So, to start a small web project, there is no need to do that.

So, the solution to this is, please spend a bit of time learning Node core. To write a web server in Node, it's those five lines of code, right? So if you just need an API to return this in JSON, static JSON, that will do it. Node core will also make web requests. At first you don't need like a million packages to do that. So yeah, instal modules as you need them. So like the whole Node ecosystem is built on the Linux philosophy of bringing small things together. Avoid massive frameworks. So my experience in Angular 2 and Ionic is just an absolute blasphemy. Start with lightweight modules and solutions.

Here I've got some alternatives for you. So you need to make some web requests, well, Request is 6.3 megabytes. You've got Needle which does most of the work for you, and unless you need to do OAuth or range requests or anything complex like that, that will be perfectly sufficient. AVA, really popular testing framework at the moment, but look, it's 34 meg for AVA. And Tape has got the same API, two megs. So start with AVA, and then, oh, start with Tape, and then you can migrate to AVA really easily. Again, Router versus Express is maybe a bit like overkill, but it's also another example.
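To make the Node core point above concrete, here is a minimal sketch of a static JSON endpoint that uses only the built-in `http` module and no npm packages. The slide's actual five lines are not in the transcript, so the `/status` route, the payload and the function names `handleRequest` and `createApiServer` are assumptions for illustration only.

```javascript
'use strict';
const http = require('http');

// Static payload; the contents are illustrative only.
const payload = JSON.stringify({ status: 'ok' });

// Hypothetical handler: answers GET /status with static JSON, 404 otherwise.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/status') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(payload);
    return;
  }
  res.writeHead(404, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ error: 'not found' }));
}

// Wraps the handler in a core HTTP server without starting it.
function createApiServer() {
  return http.createServer(handleRequest);
}

// To serve requests, call: createApiServer().listen(3000);
```

Keeping the handler separate from `listen` means it can be exercised in tests with plain objects, without opening a port. If the routing grows beyond a couple of `if` statements, that is probably the point at which a small router module starts to earn its place.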
I mean, if you're just building an API and you don't need to like render complex views or do compression and cookies and all this stuff, like having just something that routes web requests is fine. Also, build tools. So start with npm scripts, it's built into npm. You don't need Broccoli, you don't need Grunt, or all these things. Just start out small and see where you go from there. Also, something else I've seen people do is using boilerplate projects, like especially things to do with client-side code, like React and Angular. You go and look at React boilerplate projects and you'll get every module under the sun that you won't need. Just start small and build on what you need.

Tip number four. Keep your Docker images lean. So, large images take up unnecessary disc space. They take a long time to build and deploy. And having more software installed means that more things can go wrong. So basically your storage costs will increase. And so continuous delivery is about getting change out to users as soon as possible so you get fast feedback. You don't wanna be waiting, like ages, for Docker images to build. Having more software installed on your images means that more things can go wrong.

So, this is like just a suggested workflow that I found works. I didn't invent this. I think a lot of people are doing it. It's about having two Dockerfiles. You have one for development. So here you have a Node-based image. And you add your package.json and instal, and you add your app, and then run your build. So here is the explanation of the first line. So if you're not familiar with Alpine, it's a very lightweight Linux distribution. The default Node.js Docker image is based on Debian. So, as you can see, that's quite a lot larger, quite a lot more you're gonna have to wait to download than with Alpine. So I recommend basing it on something like that. The next important line is this. So add your package.json first. So this basically takes advantage of the Docker cache.
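A development Dockerfile along the lines described above might look like the following sketch. The slide itself is not reproduced in the transcript, so the working directory, the `build` script name and the use of `COPY` (the talk says "add") are assumptions rather than the speaker's actual file.

```dockerfile
# Development image: an illustrative sketch, not the file shown in the talk.
# Alpine-based Node image, smaller than the default Debian-based one.
FROM node:alpine

# Hypothetical working directory inside the image.
WORKDIR /usr/src/app

# Copy only the dependency manifest first, so the next layer can be cached.
COPY package.json ./

# Re-runs only when package.json has changed; otherwise the cached layer is reused.
RUN npm install

# Application source changes often, so it is copied after the install step.
COPY . .

# Assumes package.json defines a "build" script.
RUN npm run build
```

The ordering is the point: anything that changes on every commit goes below the `npm install` layer, so a source-only change does not invalidate the cached dependencies.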
So what happens is you add your package.json and you build this Dockerfile, right? And the next time you build it, if you haven't changed your package.json, it will skip over it, all the way up to add app. So that means that you've cached the npm install step, which takes a long time. So yeah, it will only run npm install when your package.json changed, which is usually when you add a dependency. Okay, maybe you'll run it a few more times if you add an npm script, but this saves you a lot of time. It means you don't have to do npm install every single time. If there's one thing you take away from this talk, I think that will speed up your builds the most.

So in production, you want to deploy the smallest possible artefact to your production servers. So again, Node Alpine, then add, so dot here is a directory with your built application, everything just for the runtime. So I mean your dev dependencies removed, your npm packages de-duplicated, your tests deleted. Anything that's not required to run your application should not be in there. So the build steps for this kinda workflow: build your development Docker image. Run the tests inside the development container. And like I said, remove the non-runtime-specific files. And publish your production Docker image with your built artefact. And then run some smoke tests on your test environment. And repeat in production.

So this is now kind of obsolete because as of Docker 17.05, there's something called multi-stage builds, which I haven't tried yet, but it basically allows you, inside a single Dockerfile, to base lots of little images from one Dockerfile. So you can do the build container and then at the end, end up with just the production container, all in one descriptive file. So that's quite a new thing. I haven't used it yet, but I'm quite excited to try it out.

So a final tip is use docker-compose. Say you're building a website and you wanna run some tests using Chrome.
And then that website also reads from a database and talks to an API and talks to a queue. So if you have to wait to deploy to a test environment to test integration between all of these things, I mean that's slow. Basically anything that involves having to deploy your application, it will take longer. With docker-compose you can have a truly like local environment running all of these things and test locally before you deploy anything, and you can also use it for local development. So here I've got, this is taken from an application I built a few months ago. So it's an application that talks to the OpenShift API. It's basically an API router. So those services and ssl service, they're just dummy APIs. The API is the application itself. And then the tests is a container that runs web requests against the API. So as you can see, with the links at the bottom, it means that this container can talk to those two containers. And it's basically a very descriptive way to like describe your entire environment and run quite a comprehensive suite of tests, acceptance tests, before you have to deploy, as I said.

So, some other tips. Get a build monitor. It sounds obvious, but it seems like a lot of people don't have one these days. Just having people see something red is just like a really good visual cue and it will help you. Integrate with pull requests. So GitHub and GitLab both allow you to run a suite of tests or some kind of CI check before they let you merge things. So there's no point in doing a code review for code that doesn't work, so I'd rather know; well, go fix your tests, and then I'll review the code. So Yarn, Yarn again, it is quicker than npm, so I recommend trying that. And try running your tests in parallel. So yeah, that's it from me. Does anyone have any questions?

- [Audience member] At what point should you like look to implement this kind of thing, like what are some tell-tale signs that your current system is slow?
- So yeah, I mean it's really an iterative process. I didn't sit down and go learn all of these things straight away and build it at once. It's about like you want to spend as little time maintaining your build process as possible. So like if you find yourself spending time massaging builds through, like re-running builds 'cause they're red, waiting for them, like look for little things while they're running, look up ways you can make 'em faster and just like constantly improve on what you're doing. I mean like this isn't the end. There'll be loads of other ways to optimise things. I think it's just constantly improve.

How long do my builds take? Well, right now yeah, my builds take about 30 seconds 'cause there's no tests. I've just joined somewhere new and they didn't really have CI until about a week ago. So I think it's a bit of an unfair question. I guess after a year, if you've got end-to-end tests that are taking a long time, like it's more, you want to start looking at the tests that you have, and obviously you should have a lot of small fast tests and fewer slow tests at the end. So, maybe it's time to look at the test weighting and try and like reassess what's the best way of testing these things. In terms of, if it's npm installs that are being slow, another thing that can speed things up is having your own npm registry, so the main npm registry I think is only based in the US. So when you're doing a lot of npm installs, that's very slow, so I can't remember off the top of my head but there's a few solutions to installing a local npm registry, so that like, if you were running things in AWS, you can run it inside your own VPC and that speeds things up a lot as well. But yeah, it sounds like reassessing how you're testing things and like looking at that kind of thing is probably the way to go.

- [Moderator] Cool, I think that's all the time we have for questions. Let's give another round for Will.