guides:drawqueries

Queries

This is an introduction into the draw query language. It is supposed to help understand, read and write draw queries.

Connecting your data in DuckProve is quite easy. In this tutorial you will learn how to use DQL (Draw Query Language) in order to draw your data on charts. First, let's login to DuckProve and open the data query page using the navigation on the top right corner.

Our first query will output a list of all experiments which are configured. You can try executing the following:

draw table using name from @experiments

Did it work? Great!

Now if we have a closer look on this query, we can identify 3 main parts in this query:

draw table -- part 1 (What to draw? Could also be a line-chart, number etc.)
using name  -- part 2 (What columns are used?)
from @experiments -- part 3 (From which table is the data?)

A draw query always starts with “draw …” in the first part which identifies what kind of element you would like to draw. In this case, we are drawing a table. The second part identifies the columns we are using and the third part which table. In DuckProve, all data is stored in tables. The from directive, allows you to tell DuckProve which table you would like to access. The using directive on the other hand, allows you to tell DuckProve which columns you want to use. There are more tables available.

This was very easy. Now let's try to get data from another table, for instance the @tags table. This table contains all tags which were created using the experiments.

draw table using name from @tags

This should output a list of the tags.

Limiting Output

Now, let's just imagine that you have a huge amount of experiments. You are just interested in the first 5 items. In that case, you can use the limit directive like in the following example:

draw table using name from @experiments limit 0, 5

This tells DuckProve to use the items 0-5. Okay, let's be honest. This is not too much interesting, because you can also access this data from the navigation. Let's try to use a different table. There is the @views table. This table has a row for each page view made on your site. Because this table is potentially huge, and we don't want your browser to crash while trying to draw all data, let's take the first 5 items from this table.

draw table using url from @views limit 0, 5

You might have noticed we are using url instead of name in this example. The views table does not contain an accessible name column. If you would like to know what columns are available you can read it up in the documentation.

Ordering

So, now you just saw the first 5 urls which were accessed after DuckProve was installed on your site. Yes, those are the FIRST 5 items. What if we wanted to access the LAST 5 urls? In the next example you will see, that won't be much of a problem.

draw table using url from @views 
order by id desc 
limit 0, 5

The “id” column is available in all tables. It is actually a number, and every row which is added in the table gets one number higher than the previous one. This makes the “id” column the perfect identifier. Because the id column is always increasing, the latest values will be the biggest.

Another thing we could do now, which probably would be quite interesting, is to check WHEN those page views happened. The @views table has a column called captured which contains this time.

draw table using url, captured from @views 
order by id desc 
limit 0, 5

As you can see now, you will have two columns in your output. One is the URL of the view, and the other one is the time when this page view was captured. But DuckProve can do much more. Let's assume you would like to know for how long the user was on that specific page.

draw table using url, captured, ended from @views 
order by id desc 
limit 0, 5

Calculations

This will show you the approximated time of how long this view occurred. In most cases this time is quite accurate, but when the user closed the browser it can be wrong up to 30 seconds at the moment, this might change in future.

We are still doing very simple queries. The previous one was a bit more interesting, but it still is difficult to read in case you are interested in the page view durations. It would be way easier if there was just the time.

draw table using url, ended - captured as duration from @views 
order by id desc -- "desc" means descending, you can also use "asc" for ascending
limit 0, 5

In this query we subtract the captured time from the ended time. Because both values are in seconds, the result will be the amount of seconds the user spent on the url. As you can see, you can do arithmetic with your data which gives you a great flexibility. Using the “as” you are able to give the column a name. You always need to do this if no name can be determined, otherwise the query will fail.

Filtering

What you also can do is, filter the data based on a condition. This allows you for example, filtering for the last 5 page views which were longer than a minute:

draw table using url, captured, ended from @views 
-- "@views." is just a hint to DuckProve to use the column from the views table.
where @views.ended - @views.captured > 60 
order by id desc
limit 0, 5

You can use multiple conditions using the or and the and operator like in the following example:

draw table using url, captured, ended from @views 
where @views.ended - @views.captured > 60 and @views.captured > time() - 3600
order by id desc 
limit 0, 5

The result you should receive now, should be the last 5 page views captured within the last hour which were longer than 1 minute. In this example you can also see one more new thing, the “time()”. This is a function, and this function returns the current time. You can safely assume this time is in seconds, as in this example we subtracted 3600 seconds (1 hour).

Aggregations

Let's move forward to the next level. Using aggregations, we can “fold” this data and calculate sums, counts, averages etc. The following example will output the total amount of views made during the selected time range.

draw number
using count() as value from @views

You should be aware of this number being not entirely the total amount of page views, it is just the amount of views made during the selected time range. What happened here exactly? We use here the count() function. This function is an aggregation function. If there is an aggregation function in the using clause, all rows of a table collapse together to a single row. To be more clear, let's imagine we did not use an aggregation function in this query - it would return many rows, one row for each page view, right? But because we are using an aggregation function, there is only one row, with one column. This column contains the result of the aggregation function.

This is very useful, especially if it is used together with groupings like in the following example:

draw table 
using url, count() as TotalViews from @views
group by @views.url
order by TotalViews desc
limit 0,100 -- there might be a lot of entries, which would take forever. Lets only take the first few entries.

Now, the aggregation function still collapses the rows, but it collapses only those which have the same url. This means, the data is grouped by the url. That's fancy! We now have a list of pages which have been viewed most in the first 100 page views. What if we wanted to know the average time spent on a page? Easy.

draw table 
using url, avg(ended - captured) as AverageDuration from @views
group by @views.url
order by AverageDuration desc
limit 0,100

There are more functions available. You can read about them in the functions documentation.

So, lets do one more thing in this @views table. Let's draw a line chart. In the end we want a chart with the time on the x-axis and the amount of views on the y-axis. Easy.

draw "line-chart"
using timefloor(captured) as "x-value", count() as "y-value" from @views
group by timefloor(captured)

That's it, your line chart. We might want to mention that the timefloor function rounds the time down according to the selected time interval (hour, day, week…).

Joining Tables

In the next example, we are finally going to connect data of two tables. One case where this is very useful, are the tags and the experiments. The tags are not really helpful without the experiments, so let's count the amount of experiments in each tag.

draw table
using @tags.name, count() as experiments
from @tags
join @experimenttags on @tags.id = @experimenttags.tagid
group by @tags.id -- we could also group on the tags name, but "id" is faster!

This example included the “join” clause. The join clause takes the two tables (@tags and @experimenttags), and makes a new row for each two rows which match on the condition specified behind the “on”. Not that complicated, right? In the end we group by the tags identifier and count the amount of rows in each group. When you have a look on the tables documentation, you will see a lot of identifiers like the “tagid” from the @experimenttags table. Those table are references to another table, and they are very useful in order to connect data.

And here this instruction ends. Obviously there are way more possibilities and more complicated queries. You will find more examples in the documentation, and if you are looking for something specific which you can not find, feel free to ask in the community.