Pages

Friday, January 10, 2020

Einstein Analytics: Grouping in Dataflow

After the blog to transpose data from columns to rows, and from rows to columns. Today I have another challenge to group data based on a date.

Here is the data


I know the recipe offers this functionality to group data easily, however, I am reluctant to put a recipe in between of two dataflows, as it will cause maintenance nightmare in the future.



But, can we do this in dataflow? Dataflow does not offer data grouping by default, but as still we can achieve with it some tricks. Here we go:


The key node here is just cr1 which is a computeRelative node. I add 4 fields here:
- Sum_1
- Sum_2
- Sum_3
- IsLast

1. Partition the data with Date



2. For fields Sum_1 to Sum_3, choose SAQL (not Source Field), the Type should be Numeric and remember to enter Scale and Default Value. 
Here is the SAQL Expression 
case when previous(Sum_1) is null then current(Data_1) else current(Data_1) + previous(Sum_1) end


3. For IsLast, choose SAQL (not Source Field), the Type should be Text. Here is the SAQL Expression
case when next(Data_1) is null then "Yes" else "No" end

data after computeRelative, before cleanup


4. Delete unused rows with Filter node and unused columns with Slice node.



In another scenario, if you just need to count items in a group, change Data_1 to 1.
e.g. Count_1 is the field name in CR node
case when previous(Count_1) is null then 1 else 1 + previous(Count_1) end


Reference:


6 comments:

  1. I am receiving error messages: about Sum_1: Transform dim/mea not found in Edgemart. Would you have any ideas how to solve this?

    ReplyDelete
    Replies
    1. Do you get the error from ComputeRelative node? What is the field type, make sure it is Measures field for Sum_1 as in my sample

      Delete
    2. Hello Johan,
      Yes, the error is from ComputeRelative node. The fields used are all numeric. I am not able to use the self-referential Sum_1 field in defining the SAQL Expression without the error.

      case when previous(Sum_1) is null then current(Data_1) else current(Data_1) + previous(Sum_1) end

      The reference to previous(Sum_1) causes the error.

      Delete
    3. hmm, not sure what is the cause, can you make sure Scale is entered, and the field name with special characters such as _ or - is correct.

      Delete
  2. Thanks a lot Johan, this article saved my day...

    ReplyDelete