JSONata: A Declarative Syntax for Querying Your JSON Data

Andrew Coleman speaking at London Node User Group in April, 2017
781Views
 
Great talks, fired to your inbox 👌
No junk, no spam, just great talks. Unsubscribe any time.

About this talk

JSON is becoming the de facto standard format for exchanging data between systems over the Internet. Its key strengths are its lightweight syntax and its simple data model, which have helped propel it to be the dominant payload format in RESTful web services. JSONata provides that in an open-source module that can be embedded in a web page or used in a node.js app. This session will introduce you to JSONata and show you how to find buried treasure in your JSON data.


Transcript


- Hi, so my name is Andrew Coleman. So I work for IBM, I'm based down in the IBM development labs in Hursley down in Hampshire. And the product area I work on is integration. Integration and applications of services. And the challenge there is that the things we're integrating have no knowledge of each other. They could be legacy applications or new rest services. And they don't know they're talking to each other, and they were never designed to talk to each other, so there data formats that they understand are generally completely incompatible. So the integration for that in the middle, has to be able to speak lots of different transports and data formats and to transform these data formats in such that you can grab data out of one system and put it into another. And my area of specialisation over the last few years has been this data transformation. So lets talk about JSON, so JSON is my current area of focus. So JSON's a very lightweight format, you know it's very easy to describe, its just got numbers and strings and bouillons and a couple of simple structures arise and objects is associative to arise. But the challenge with JSON is that it's very simple on one hand, but also you can build very complex structures out of it because it's fully composable. So you can build these deep hierarchies of data. And so on the face of it, it seems really simple, but when you actually get sample data out of different systems, the fact that they all speak JSON, again doesn't really help you much because they're all wildly different formats. And so there's a challenge for doing not just extracting the data you're seeing, but also transforming, so you can pass to another system. Now we're all Java Script programmers, so it's bread and butter to us just to take some JSON data and write a Java Script programme and also traverse that in order to pick out the interesting bits of data that we want. When you think about it, there's a lot involved in just doing that simple task. Of course, you think about algorithms, and we have to set up loops, perhaps, and do there any conditional processing, we've got to check data types, and then we've got to do lots of error checking as well, if we want to make this a robust bit of code. So we're gonna spend a lot of time just doing something that's actually on the face of it, quite a simple task. And we're programmers. We're locking out all of these people who are, you know, perfectly technical computer users, but they can't get at this data within the JSON format, or any other format for that matter, because of this problem, where by you've got to, you've got to do all this navigation, and querying in code. Now, this isn't a new problem, so if we go back to the 1970's, anyone remember the 1970's? No, neither do I. But apparently, back in the 1970's the relational database was invented. This was great, all this data was suddenly being stuffed into tables, and rows and columns. And the problem with that was, all this data was now only accessible to the IT people who understood how to programme and access data out of these newly fangled relational databases. Again, the people who really cared about the data, maybe the business people, they couldn't get at it because they didn't have these programming skills. And then SQL was invented. So SQL, the Structured Query Language was invented that suddenly made this data accessible to a much wider audience. We've got these, reasonably competent computer users who are not programmers necessarily, can now access all of this data. So what is it about SQL that made the difference? That made it accessible to this wider audience? Well the difference is, it was a clarity of language. Okay, now what do I mean by the clarity of language? It's a language where you still have a syntax to learn, but you take that syntax and you use it to describe the result, okay, rather than describe it in the steps that a computer has to take in order to get that result. That's the fundamental difference of the clarity of language. And users could really get their heads around that the idea of describing the result, rather than having to implement an algorithm and do error checking and all those other hideous things that we do all the time. So, well along another two decades, and XML became popular. And that the World Wide Web Consortium quickly realised that they need to standardise another declarity of syntax in order to make this XML data accessible to a much wider audience. And also much simpler to access for programmers. And so that was the origins of XPath and XSLT and later Xquery. Okay, so I think we're at that point now with JSON, we've got so much JSON data now flooding across our networks, through your rest API's and IO2 devices and various other things. I think we're spending too much time and effort in getting out the interesting data within this JSON. And also we're locking out the non programmers who really deserve a chance at getting this data themselves. And I think what we need is that the clarity of syntax, like Xpath, learn from the heritage that SQL through to Xpath, Xquery has developed. But apply that to this nice lightweight data format of JSON. Now I need that actually today in my day job. Okay, so at the moment, my colleagues and I are building a graphical data mapper. And also allow users to do integration between these various rest services, okay, in a graphical manner. Now we're aiming this system at non programmers, you know they're technical people, they're people who would use Excel on a daily basis. So they understand writing scripts, expressions, things like that, but they're not hard core programmers. So we've created at the clarity of language for doing just that, JSONata. And it borrows very heavily from the heritage of XPath, XQuery. So my other job, by the way, is that I chair the W3C, XML Query working group. So that's the standards committee that produced the XPath and the XQuery specifications. And that work is still going on, so the first XPath spec came out 18 years ago. The latest XPath, XQuery spec version 3.1 just went to recommendation four weeks ago. So 18 years on we're still adding into that specification, we're still learning the best practises for doing the clarity of syntax. And I wanna take all of that experience that we've learned from developing XPath and XQuery, and apply it to JSON. Alright, so here we have JSONata. Now, you've probably noticed by now, I haven't prepared any slides for this talk, okay. I was just gonna do a show and tell and demonstrate this stuff running, okay, and I'll take requests. So what I'm gonna do is, I'm gonna go off and play it on our website, and this thing we call the JSONata Exerciser, which is the webpage that allows you to try out expressions. I won't play the video, 'cause that's cheating, isn't it? Alright so this is the JSONata Exerciser, and let me just zoom in on that a bit. Alright, so very simple, we've just got a panel on the left hand side where we've got some sample input JSON. And then in this little box here, we've got a space to type out our JSONata Expression, and the results will get shown down below, there. And we provide some sample data out at the box, you know, for you to play around with. Or you can just cut and paste in your own JSON data there. So lets start with really, really simple expressions. Okay, so this is gonna be very shallow learning curve, we're aiming this at people who are not programmers, we want this to be easy to use. So we're got some data on the left hand side, which has got some contact details. And our first expression, very, very simple. It's just a name, surname, what does that do? That picks out the tag there, the item surname from that object and it returns the value, Smith. What else have we got? So lets look at address. Now Address, is an object. Okay, so it has some structure to it. So just type in Address, returns that whole object. But we can carry on down navigating into that, using this thought notation. So Address.City, returns you to City, which is Winchester. And okay, no surprise here at all, this is what you've been doing in Java Script anyway. So we can extract data, probably what we want to do, is extract several bits of data and combine them in some way. So, we've got a set of operators and functions, allow you to combine and manipulate data as well, or within the expression. So lets try FirstName and here we have a concatenation operator. Okay, so we're pulling out two fields now, and we're concatenating them together with a string literal in between and just a space. Alright, so lets starts looking at array data. So Phone, is an array, okay, we type in phone and we get the array return, it's an array of objects. They're all identically formatted objects, they each have a type and a number. So, we're interested in the number, phone numbers. So lets type dot, and type number. And we get all four of them, okay. Now, this shouldn't surprise you, this is a query language, okay. So we've said I want all of the phone number. I want all of the items in that document, that match phone number. And there's four of them. It's navigated directly inside array, it's almost ignored the fact that there's an array there as far as the expression is concerned, okay. So if you did that in Java Script, that's not the result you'd get. Okay. If your interested in just one of the items in the array, then that's fine, we can put an expression in there that just picks out the first one. So we put an index in there in square brackets, and that's picked out the first one. But actually, if we're dealing with real world data, that array could be hundreds of items long and the particular item within that array, we probably don't really know the index, or people don't care about the index. What we're really interested in, is something else about it. For example, I'm interested in the mobile number. I've no idea where it is in that array, I just want the mobile number. And so, I actually want to query that. And let the engine find it for me. So you might be thinking, oh, what's any of this got to do with SQL? Well, very similar concept, syntax is completely different, but in SQL, if you're familiar with it at all, you'd be doing a select, you'd say select number from a phone where type equals mobile. But this is much more aligned to the XPath, XQuery syntax, which is designed for traverse in query and hierarchical data. Alright, so lets carry on then, and look at something else. So lets look at some other data. So this one, we've got some arrays, and we've got arrays nested within arrays, and we've got a bit more variety in the other data types there. We've got a lot of numeric data. So lets have a play around with this. So, we can see one of the items we've got here is price. So lets navigate down and select all of the prices of all the products. There we are. So, I get an array of four numbers. And then I can apply functions, we've got a function library, and some of our functions are good at aggregating data. So we have a function called sum. So we could apply the function sum of that array of four numbers and get the total of them. Before we do that, we know to start actually, as well as price, we've got a quantity field in there. And actually, it'd be wrong just to sum up the prices because, we'd want to, for each item, for each product, we'd want to multiply the price times the quantity before we did that final sum, wouldn't we? And we can do that. So we can do a little sub expression at this point, put parenthesis around there, and so actually, I want price times quantity. And so, each of those values then, is the price times quantity for each product. And then, I can apply my sum function. To get the final result. Okay, so there's a fairly simple expression we've built up, we've learnt fairly quickly. You'd have written quite a bit of Java Script code to do the same thing. So that's one of the benefits of this sort of declarative query syntax. So, we've seen how to navigate a structure, we've seen how to combine values using operators, we've seen a fairly limited set at the moment. We've seen that there are functions, there are a lot more functions in this. One of the other things that you might want to do, is actually format the output, okay, so at the moment, I've just created each of these expressions, a single value. But actually, this is capable of generating, arbitrarily complex JSON structures, as it's output. And so to do that, we need to have some syntax that builds arrays and objects. And since JSON already has that syntax in the form of square brackets and braces, then why don't we just use that one, so things like dot one, dot two, dot three, will actually generate an array and a similar natural JSON syntax for generating objects as well. In fact, JSONata is a super set of JSON, and as father, any valid bit of JSON, is also a valid JSONata expression, which produces that JSON. So I'll show you an example of that. So here, I've got some JSON from a rest API, lets just expand that. Okay. So if you can see the URL there, I'm querying the NPM JS.Org rest service and getting all the downloads statistics for the JSONata package from NPM since the first of October, last year. Okay. And that's generated this structure, it's a relatively simple structure. But it's got lots of objects within array. So lets copy that out. And first of all, I just paste it into the JSONata expression window here, just to prove to you that it's quite happy with processing that, and generating that as the output. Actually, lets do some more interesting things and put it over here. Alright so we've got... We've got this JSON data, the thing we're really interested in, is this array called downloads. So lets start off looking at that. So downloads, lets start by transforming that structure into something slightly differently. So, lets do dot, and then we'll do our object creation syntax. So at the moment, we're just creating empty object for every download. So I could do something like, oh I don't like day, I'd rather call it date. Okay. So now that's gonna start populating our output structure. And then, oh I'd rather have account instead of download. Thank you. So it didn't find anything called downloads, so it just ignored it, didn't match the query. I did that on purpose. Now the usual rules of JSON exist, the key has to be a string, and it has to be unique within the object. In JSONata, it doesn't necessarily have to be a string literal, it could be any expression that generates a string, as long as it generates a unique string. So now, I've got an array of objects, where the key is the date, and the value is the downloads. And since each one is unique, actually it'd be nice if I, rather than having an array of objects, I just had a single object, with key value pairs here. And the reason that's generating an array is because of it's dot operator here. Dot is really a sort of a for each, it's saying for each of my downloads, creates whatever we're defining on the right hand side. For each of my phones, give me the number. It's kind of a glorified for each. So if I get rid of that dot, now I'm just saying, well within the contexts of downloads, please create me this structure here, this object. And so now we've got a single object, with these nine value pairs. These keys have to be unique, within a structure. So how does JSONata behave if this expression doesn't generate a unique key? So for example, if we stripped off the day, then we'd have year and month, and then we'd get a clash. What happens then? Well we can do that, and we'll see. So there is function called sub string. And so what we'll do is we'll just select the first seven characters of that day. So we start at character null, and we select seven of them. So what it's done here, it its grouped together, all of the values that match that key. So in SQL terms, it's done group by, okay. It's done some grouping and it's gathered together all of the values in an array that's associated with that key. Now we know what we can do with values, numbers in an array we can aggregate them. So lets use our old sum function again on downloads. And so now what we've got is a transformed structure where we've got the aggregated monthly downloads of JSONata from NPM okay, since the first of October, okay, which we've derived from the daily download information. Now we've got all the aggregation functions we could have chosen, there's max, which'll give you the maximum daily downloads, or average, okay. I'll stick with sum. Alright, so the JSONata engine is actually running within the browser, it's not going off to the server at all, okay. The JSONata engine was designed to be lightweight, I want it to run full stack, it runs Node JS on the server side and it runs in the Java Script engine directly in the browser. So we've seen it running in the browser side. Lets have a look at it running server side, in Node JS. You want me to hurry up, okay, I'll hurry up. So, Node Red, so we saw this last month, this was demonstrated. What wasn't demonstrated was that Node Red embeds JSONata. Okay, so lets set up a very simple flow. What we're gonna do, we're gonna wire together a webserver, okay, so I've an HTTP get, and lets call it demo. Okay, so what's gonna happen, this is gonna get triggered whenever we do a get request into this server. Now, I guess it's request reply, we need to do an HTTP response at the end of this flow somewhere. And in the middle, we want to call out and get some data. So I'm gonna do a net http request and the thing I'm gonna request is, this URL here. Alright. And that returns the positive JSON object. So if I wire these together, and deploy that... Alright. I can drive that using Postman. So here we are, I'm going to local host 1880/demo. Cannot get demo, why is that? Maybe. Oh that's a relief. Okay, so that's the data as it comes out of NPM JS. So what we're going to do now, is we'll do some processing on there. And rather than writing and Java Script to do that processing, we're gonna use one of the built in Nodes, called a Change Node, okay. And this just allows you to change something in the message pay load, and you can set it to string literals, or numbers, or this thing called an expression, which is a JSONata expression. So lets go in there, and we'll use this one that we've created here. So we'll just cut and paste that in. And now we do have to modify this slightly within the Node Red environment. The payload goes into this structure called message.payload. So if we do message.payload.downloads that should do what we want. And now we'll break this connexion and we'll wire the flow to go through that. So I've deployed that, so now, when I do a get, it now evaluates that JSONata expression. Okay, so that's sort of a demo of it working server side. And you can embed that in your Node JS applications as well. Okay, so that's, you know, just scratched the surface of the capability of JSONata. It's got a fully functional computer engine within it's too incomplete. It supports regular expressions and all sorts of other features. Okay, so, please do go and have a play around with it.