Pages

Saturday, April 27, 2019

Einstein Analytics: Handling null values

This blog is not related to Null Handling in Measures. As per this article, null values in Dimensions are not completely supported in Einstein Analytics, however, we often deal with null values in many scenarios.

To prevent data quality issues, Einstein Analytics will disregard any fields in Salesforce (or columns in external data) that are entirely null.

Grouping with Null
Date field by default is null, when you use it in a chart for grouping, null value will not be shown. To overcome this, if you pull data from Salesforce, set a defaultValue (e.g. 1900-01-01) to override in sfdcDigest.



Dimension field by default is null, when you use it in a chart for grouping, null value will not be shown. To overcome this, if you pull data from Salesforce, use defaultValue (e.g. NA, " is not needed) to override in sfdcDigest.

Measure field by default is 0 for null value. We cannot use the Measure field for grouping, but we can use it for the filter.


Dataflow
If you need to filter out null value for Dimension, you can't use the following in Filter node:
  Field1__c != "" && Field2__c != "" <-- this will not filter any
  Field1__c is not null && Field2__c is not null <-- this will throw error

Solution: use defaultValue in sfdcDigest (e.g. NULL, " is not needed) to override null. So, your filter nodes would be Field1__c != "NULL" && Field2__c != "NULL".
The 2nd option, adding compute expression nodes to check case when Field1__c is null then "No" else "yes" end, then use Filter nodes to check the ComputeExpression field for "Yes".

Filter null records in SAQL
q = load "DTC_Opportunity_SAMPLE";
q = filter q by 'Product_Name' is null;
q = foreach q generate 'Account_Owner' as 'Account_Owner', 'Product_Name' as 'Product_Name';
q = order q by 'Account_Owner';
q = limit q 10;

Filter null records in SAQL
q = load "DTC_Opportunity_SAMPLE";
q = foreach q generate 'Account_Owner' as 'Account_Owner', 'Product_Name' as 'Product_Name', (case when 'Product_Name' is null then "" else 'Product_Name' end) as 'PM';
q = filter q by PM == "";
q = order q by 'Account_Owner';
q = limit q 10;

Count Not null records in SAQL
q = load "Lead";
q = foreach q generate (case when 'SFDC_Lead_ID__c' is null then "" else 'SFDC_Lead_ID__c' end) as 'SFDC_Lead_ID__c';
q = filter q by 'SFDC_Lead_ID__c' != "";
q = group q by all;
q = foreach q generate count() as 'count';

To group null as NA
q = load "DTC_Opportunity_SAMPLE";
q = foreach q generate coalesce('Product_Name',"NA") as 'Product_Name', count() as count ;
q = group q by 'Product_Name';
q = foreach q generate 'Product_Name' as 'Product_Name', count() as count;
q = order q by count desc;


Augment transformation cannot find the parent will cause Null


SAQL Expression in above computeExpression case when 'Acq.Industry__c' is null then "Parent not available" else 'Acq.Industry__c' end

Data in Salesforce

Result in Einstein Analytics

Notes from above screenshot:
  • 1st row for Parent Industry = NA, because we set the default value to NA in sfdcDigest.
  • 1st and 2nd row for Account Source = null, because we set the default value to null in sfdcDigest.
  • 1st row for Employees and 2nd row for Annual Revenue = 0, Einstein Analytics auto set 0 for Measure field with a null value.
  • 3rd and 4th row for Parent Industry = Parent not available, this is because there is parent lookup value, we use computeExpression to set the value, this is different with 1st row, where the parent lookup value is available, but Industry for that parent record is null.


Last update: 23 Sep 2019


Reference:


No comments:

Post a Comment