JQL: Getting started guide

In this section, we'll explore the core concepts of JQL by working through a real world example.

Imagine we're a freemium SaaS business with two paid plans, Basic and Premium, and we want to understand how many of each plan were purchased, and how long it took users to convert from free to paid.

Running example queries

To help you work through this example, we've populated a demo project with some data you can query. At any point in the tutorial, you can execute the current version of the code to see how it works.

Try out JQL using our playground

To run a query from your browser using test data, you can use this query console. Just paste in the example code and hit "Run query".

Running & saving queries within robotmia

When you're ready to run queries against your own data in robotmia, just log in, go to Applications, and click on the application "JQL" to be able to run queries interactively and save them so your co-workers can run them too.

Running queries from the command line: Mac OS X or Linux

JQL can be executed via an HTTP endpoint, with JQL query provided in a body of a POST request. With the query saved in query.js file, the following command uses the cURL utility to upload our query and return back the results, in JSON format.

# sends the JQL code in `query.js` to the api
# to a project with sample data for this tutorial
curl https://robotmia.com/api/2.0/jql \
    -u ce08d087255d5ceec741819a57174ce5: \
    --data-urlencode script@query.js | python -m json.tool

In order to authenticate requests the -u flag lets us pass a username and password. To query the JQL API for your project, use its API secret as the username, with a blank password.

To learn more about Query Authentication, look at the Data Export API documentation.

If you get an error that says Missing required parameter: script, you probably forgot to save your code in the query.js file.

Step 1: Querying events

To get started, let's look at the simplest possible JQL query, which retrieves all of the events between two dates:

function main() {
  return Events({
    from_date: "2016-01-01",
    to_date: "2016-01-07"
  })
}

There are two requirements that all JQL queries must fulfill:

  1. They must define a main() function. We compile your JavaScript and execute the main() function against your raw data to perform the analysis
  2. The main() function must call the Events() or People() function to select a collection to analyze

As you can see, our simple example fulfills both of these criteria and no more. It uses the events collection. When you run this minimal query, it will return a JSON list of all events sent to your project between Jan 1 and Jan 7, 2016:

[
  {
    "name": "login",
    "distinct_id": 4,
    "time": 1420124166000,
    "properties": {}
  },
  {
    "name": "purchase",
    "time": 1420120626000,
    "properties": {
      "plan": "Basic",
      "signup_date": "2014-12-03"
    }
  },
  ... // more events
]

The JQL framework revolves around two concepts: collections and transformations. A collection is a list of values which can be transformed into other collections to perform analyses and answer complex questions.

One example transformation is filter(), which is used to filter down a collection to the values you care about.

Step 2: Filtering the dataset

We are only interested in "purchase" events, so the next step is to filter the events collection to just those events:

function main() {
  return Events({
    from_date: "2016-01-01",
    to_date: "2016-01-07"
  }).filter(function(event) { return event.name == "purchase" });
}

When you run this query, the output is still a list of events, but it's been filtered down to just the purchase events:

[
  {
    "name": "purchase",
    "time": 1420120626000,
    "properties": {
      "plan": "Basic",
      "signup_date": "2014-12-03"
    }
  },
  {
    "name": "purchase",
    "time": 1420380728000,
    "properties": {
      "plan": "Basic",
      "signup_date": "2014-11-17"
    }
  },
  {
    "name": "purchase",
    "time": 1420603167000,
    "properties": {
      "plan": "Premium",
      "signup_date": "2014-12-29"
    }
  },
  ... // more purchase events
]

Step 3: Counting purchase events

Now we can answer a real question: how many purchases were there? To do this, we can use another transformation, reduce(), which is used to reduce a collection to just a single value.

function main() {
  return Events({
    from_date: "2016-01-01",
    to_date: "2016-01-07"
  })
  .filter(function(event) { return event.name == "purchase" })
  .reduce(function(accumulators, events) {
    var ret = events.length;
    // Add previously accumulated reduce function results.
    for (var i = 0; i < accumulators.length; ++i) {
        ret += accumulators[i];
    }
    return ret;
  });
}

With result being equal to:

[359]

It's important to note that just calling reduce() by itself doesn't do anything -- you have to tell it what to do. Like other tranformations, reduce() has a required function argument - a reducer function. That function implements the following signature:

function (accumulators, items) {
    // Combine previously-aggregated "accumulators"
    // with new items and return the result.
}

A reducer function accepts two arguments: an array of its previous results (accumulators), and an array of items to reduce. The first argument - accumulators - is used to break down the initial set of values into many calls of the reducer function. JQL runs the reduction in a hierarchical fashion, where some subsets of inputs are reduced to accumulator values first, and multiple accumulator values are combined subsequently.

robotmia provides a number of built-in reducer functions for frequently used aggregations. To just count the number of inputs, like in the example above, we can pass robotmia.reducer.count() to .reduce():

function main() {
  return Events({
   from_date:"2016-01-01",
   to_date:"2016-01-07",
  })
  .filter(function(event){return event.name == "purchase";})
  .reduce(robotmia.reducer.count());
}

Step 4: Grouping purchases by plan

So, now we know how many purchases occurred during that week in January. But what did people buy? We have two plans, and we would like to see how many we sold of each.

We can do that using another of our transformations, groupBy(), to group these transactions by "plan" before counting them:

function main() {
  return Events({
    from_date: "2016-01-01",
    to_date: "2016-01-07"
  })
  .filter(function(event) { return event.name == "purchase" })
  .groupBy(["properties.plan"], robotmia.reducer.count());
}

You will notice that we've replaced our reduce() transformation with groupBy(). groupBy() is used to split a collection into groups. After splitting into groups, the reduce function passed to groupBy() is called on each group individually.

This query is getting complicated, so let's take a step back and reiterate what it's doing:

  1. get all the events in a 7 day period
  2. filter down to just the purchase event
  3. group the events by their "plan" property
  4. count the number of events in each group

When we execute the query, we get the following output:

[
  {
    "value": 254,
    "key": [
      "Basic"
    ]
  },
  {
    "value": 105,
    "key": [
      "Premium"
    ]
  }
]

Now this is getting interesting. Our analysis shows that we sold 254 Basic plans and 105 Premium plans in that week.

Step 5: Calculating days between signup and purchase

Let's take this one step further and answer a more complicated question: how long did it take users to sign up for these different plans? Does it take more or less time to convert a user to a Premium plan than a Basic plan?

This will require us to calculate a "days to purchase" value for each event, which we can achieve with the "signup_date" property and the timestamp of the purchase event.

We want to calculate the average days to purchase, but we can't go straight there. First, we have to get the components required to get an average: sum of numbers and the total count.

function main() {
  return Events({
    from_date: "2016-01-01",
    to_date: "2016-01-07"
  })
  .filter(function(event) { return event.name == "purchase" })
  .groupBy(
    ["properties.plan"],
    function(accumulators, items) {
        var ret = { sum: 0.0, count: 0.0};
        for (var i = 0; i < accumulators.length; ++i) {
            ret.sum += accumulators[i].sum;
            ret.count += accumulators[i].count;
        }
        ret.count += items.length;
        for (var i = 0; i < items.length; ++i) {
            ret.sum += getDaysSinceSignup(items[i]);
        }
        return ret;
    }
  );
}

// helper function to figure out how many days have
// passed since the user signed up
function getDaysSinceSignup(event) {
  var signup_date = new Date(event.properties.signup_date),
      purchase_date = new Date(event.time),
      milliseconds_per_day = 60 * 60 * 24 * 1000;

  // date subtraction yields milliseconds, so divide to get days
  return (purchase_date - signup_date) / milliseconds_per_day;
}

When we run this query, we see:

[
  {
    "value": {
      "count": 254,
      "sum": 14745.350497685187,
    },
    "key": [
      "Basic"
    ]
  },
  {
    "value": {
      "count": 105,
      "sum": 3492.4220370370363,
    },
    "key": [
      "Premium"
    ]
  }
]

For the final step, we will use the map() transformation to get the average from the sum and count. map() is used any time we want to do a one-to-one transformation of a collection, by applying the provided function to each element of the collection.

function main() {
  return Events({
    from_date: "2016-01-01",
    to_date: "2016-01-07"
  })
  .filter(function(event) { return event.name == "purchase" })
  .groupBy(
    ["properties.plan"],
    function(accumulators, items) {
        var ret = { sum: 0.0, count: 0.0};
        for (var i = 0; i < accumulators.length; ++i) {
            ret.sum += accumulators[i].sum;
            ret.count += accumulators[i].count;
        }
        ret.count += items.length;
        for (var i = 0; i < items.length; ++i) {
            ret.sum += getDaysSinceSignup(items[i]);
        }
        return ret;
    }
  )
  .map(function(item) {
    return {
      "plan": item.key[0],
      "avg days to purchase": item.value.sum / item.value.count
    }
  });
}

function getDaysSinceSignup(event) {
  var signup_date = new Date(event.properties.signup_date),
      purchase_date = new Date(event.time),
      milliseconds_per_day = 60 * 60 * 24 * 1000;

  // date subtraction yields milliseconds, so divide to get days
  return (purchase_date - signup_date) / milliseconds_per_day;
}

When we run this query, we will get the following output:

[
  {
    "plan": "Basic",
    "avg days to purchase": 58.05256101450859
  },
  {
    "plan": "Premium",
    "avg days to purchase": 33.261162257495585
  }
]

And that's the final result: users take about twice as long to buy a Basic plan as they do to purchase a Premium plan. This is a contrived example, but in the real world your next steps might be

  • calculate a distribution and figure out the median instead of the average
  • dig in further to figure out what else is different between those user groups

Step 6: Examining behavior patterns

An analytic task is often centered around people behavior - sequences of actions performed by particular users. We are often interested in what events our users do, when these events happen, and in what order.

JQL provides a groupByUser() transformation, which allows its argument function to look into the event history of each single user.

The result of this transformation is a collection that contains one object per user. Each object has the structure outlined below:

{
    key: [<user distinct id>]
    value: <return value of transformation function>
}

In this tutorial step, we use groupByUser() to find out how often users are making purchase straight after logging in.

function main() {
  return Events({
    from_date: "2016-01-01",
    to_date: "2016-02-01"
  }).groupByUser(function(state, events) {
    state = state || { last_login: false, count: 0 };
    if (events.length > 0) {
        if (state.last_login &&
            events[0].name == "purchase") {
            ++state.count;
        }
        for (var i = 0; i < events.length - 1; ++i) {
            if (events[i].name == "login" &&
                events[i + 1].name == "purchase") {
                ++state.count;
            }
        }
        if (events[events.length - 1].name == "login") {
            state.last_login = true;
        }
    }
    return state;
  });
}

Note the the transformation function provided to groupByUser() takes two arguments - the state object and list of events. When the list of events that belong to a single user is long, JQL breaks down that list into multiple calls of the reducer function.

In that case, the state argument is used for communication between calls - it contains the return value of the last call to the reducer function.

In the example above, the output collection will consist of objects that include properties computed by the reducer function, in the value field.

  {
    "value": {
      "count": 4,
      "last_login": false
    },
    "key": [
      "3f151e8d-2bf4-4ecd-91e3-dbe8a8666eb0"
    ]
  },
  {
    "value": {
      "count": 2,
      "last_login": false
    },
    "key": [
      "da1ab57e-8fed-4ae4-9b65-5ea648b0f686"
    ]
  }
  // More per-user records.

Writing your own query

This query walkthrough has introduced you to most of the concepts required to write your own JQL query. We covered:

For more information about these concepts (and a few we haven't covered yet), please read the full API reference.