When we create a new dataset from a CSV file, these items appear in the Monitor:
Let's understand each item and its node type. The data flow title is sample_data_4 Upload flow - Overwrite:
- sample_data_4 is the dataset name, not the CSV file name;
- the suffix Upload flow - Overwrite is the same for every CSV load.
Nodes involved for CSV data load:
- sfdcFetch
- csvDigest
- optimizer
- sfdcRegister
When we replace the dataset with a new CSV file, the title and nodes in the Monitor stay the same.
2A. Simple data fetch from Salesforce
Here we have a simple dataflow with 2 nodes: sfdcDigest and sfdcRegister.
Items in the Monitor:
The title is the dataflow name, and the nodes involved in the Monitor for this dataflow are:
- sfdcDigest
- optimizer
- sfdcRegister
What happens if we add filter conditions to the sfdcDigest node? Will it change the nodes in the Monitor? The answer is no, because the filter is applied inside the sfdcDigest node itself.
The optimizer node always runs before sfdcRegister, once for each sfdcRegister node.
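As a rough illustration, the two-node dataflow can be written out as a dataflow definition and the observed Monitor nodes modeled from it. This is only a sketch: the node names, the Opportunity object, the fields, and the dataset alias are assumptions for illustration, and `monitor_actions` is a hypothetical helper that mirrors the observations in this post, not a platform API.

```python
import json

# Hypothetical dataflow definition; names, object and fields are illustrative.
dataflow = {
    "getOpportunity": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Opportunity",
            "fields": [{"name": "Id"}, {"name": "Name"}, {"name": "StageName"}],
            # The filter lives inside the sfdcDigest node, so it adds no extra node.
            "filterConditions": [
                {"field": "StageName", "operator": "=", "value": "Closed Won"}
            ],
        },
    },
    "registerOpportunity": {
        "action": "sfdcRegister",
        "parameters": {
            "alias": "sample_opportunity",
            "name": "sample_opportunity",
            "source": "getOpportunity",
        },
    },
}


def monitor_actions(definition):
    """Model the node types seen in the Monitor (an assumption, not an API).

    Walks the nodes in definition order and places an optimizer entry
    before every sfdcRegister, as observed in this post.
    """
    actions = []
    for node in definition.values():
        if node["action"] == "sfdcRegister":
            actions.append("optimizer")
        actions.append(node["action"])
    return actions


print(json.dumps(dataflow, indent=2))
print(monitor_actions(dataflow))
```

Removing the `filterConditions` entry leaves the result of `monitor_actions` unchanged, which matches what the Monitor shows.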
2B. Adding augment nodes to Dataflow
Here are nodes in the Monitor:
From the above screenshot, we now have another sfdcDigest node, getUser, and an augment node.
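A dataflow of this shape might be defined roughly as follows. This is a sketch under assumptions: getUser and augmentAccount_User are the node names mentioned in this post, while getAccount, the fields, the join keys, the relationship name, and the dataset alias are made up for illustration. `undefined_inputs` is a hypothetical helper that only checks that every referenced node exists.

```python
# Hypothetical dataflow with two sfdcDigest nodes joined by an augment node.
dataflow = {
    "getAccount": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Account",
            "fields": [{"name": "Id"}, {"name": "Name"}, {"name": "OwnerId"}],
        },
    },
    "getUser": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "User",
            "fields": [{"name": "Id"}, {"name": "Name"}],
        },
    },
    "augmentAccount_User": {
        "action": "augment",
        "parameters": {
            "left": "getAccount",        # rows to enrich
            "left_key": ["OwnerId"],
            "right": "getUser",          # lookup rows
            "right_key": ["Id"],
            "relationship": "Owner",     # prefix for the added columns
            "right_select": ["Name"],
        },
    },
    "registerAccount": {
        "action": "sfdcRegister",
        "parameters": {
            "alias": "sample_account",
            "name": "sample_account",
            "source": "augmentAccount_User",
        },
    },
}


def undefined_inputs(definition):
    """Return (node, referenced_node) pairs that point at missing nodes."""
    missing = []
    for name, node in definition.items():
        for key in ("source", "left", "right"):
            ref = node["parameters"].get(key)
            if ref is not None and ref not in definition:
                missing.append((name, ref))
    return missing


digest_nodes = [n for n, d in dataflow.items() if d["action"] == "sfdcDigest"]
print(digest_nodes)
print(undefined_inputs(dataflow))
```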
2C. Adding sliceDataset and filter nodes to Dataflow
Here are nodes in the Monitor:
Now we have additional nodes, sliceDataset and filter, listed in the same order as in the dataflow.
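To make the ordering concrete, here is a minimal sketch of a dataflow containing sliceDataset and filter nodes, with a small helper that follows each node's `source` back to the start. The node names, fields, filter value, and dataset alias are assumptions for illustration, and `source_chain` is a hypothetical helper, not part of the platform.

```python
# Hypothetical dataflow: digest -> sliceDataset -> filter -> register.
dataflow = {
    "getOpportunity": {
        "action": "sfdcDigest",
        "parameters": {
            "object": "Opportunity",
            "fields": [{"name": "Id"}, {"name": "Name"}, {"name": "StageName"}],
        },
    },
    "dropName": {
        "action": "sliceDataset",
        "parameters": {
            "mode": "drop",                      # drop the listed fields
            "fields": [{"name": "Name"}],
            "source": "getOpportunity",
        },
    },
    "filterClosedWon": {
        "action": "filter",
        "parameters": {
            "filter": "StageName:EQ:Closed Won",
            "source": "dropName",
        },
    },
    "registerOpportunity": {
        "action": "sfdcRegister",
        "parameters": {
            "alias": "sample_opportunity",
            "name": "sample_opportunity",
            "source": "filterClosedWon",
        },
    },
}


def source_chain(definition, node_name):
    """Follow 'source' references upstream and return the nodes in run order."""
    chain = []
    current = node_name
    while current is not None:
        chain.append(current)
        current = definition[current]["parameters"].get("source")
    return list(reversed(chain))


chain = source_chain(dataflow, "registerOpportunity")
print([dataflow[name]["action"] for name in chain])
```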
2D. Add edgemart and computeExpression nodes to Dataflow
Here are nodes in the Monitor:
The edgemart node starts first, and the computeExpression node runs after augmentAccount_User, so the order follows the dataflow. In the screenshot, edgemart and computeExpression are also accompanied by a sliceDataset node whose name starts with DropSharingRulesFrom-. On further checking, this DropSharingRulesFrom- node appears to be added randomly; it can show up for sfdcDigest or augment nodes too. I am still checking what causes it.
3. Trend Salesforce Report
Next, let us see how a Trend from a Salesforce report flows into Einstein Analytics. When you set up Trend for the first time from a Salesforce report, it runs once to create the dataset and dashboard; this happens before the scheduled date/time.
Sample from Monitor:
There are only 3 nodes here:
- sfdcFetchReport
- optimizer
- sfdcRegister
But when the scheduler runs, these are the nodes:
Let us see each node:
- edgemart - to read existing dataset
- sfdcFetchReport - to get data from Salesforce
- let us ignore DropSharingRulesFrom
- append - to combine the existing dataset data read from edgemart with the new data from sfdcFetchReport
- optimizer and sfdcRegister - to optimize and register the result by overwriting the dataset
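The scheduled run described above can be sketched in a few lines. This is a simplified model of the node sequence, not how the platform implements it; the row layout and the snapshot_date field are made-up examples.

```python
def scheduled_trend_run(existing_rows, report_rows):
    """Model one scheduled Trend run over plain lists of row dicts."""
    # edgemart: read the rows already stored in the dataset
    current = list(existing_rows)
    # sfdcFetchReport: the new rows fetched from the Salesforce report
    fetched = list(report_rows)
    # append: existing rows followed by the newly fetched rows
    combined = current + fetched
    # optimizer + sfdcRegister: the combined rows overwrite the dataset
    return combined


# First run (setup): the dataset is created from the report alone.
dataset = scheduled_trend_run([], [{"snapshot_date": "2020-01-01", "open_cases": 12}])

# Scheduled run: earlier snapshots are kept and the new one is appended.
dataset = scheduled_trend_run(
    dataset, [{"snapshot_date": "2020-01-08", "open_cases": 9}]
)

print(len(dataset))
```

Because the register step overwrites the dataset with the combined rows, earlier snapshots survive only because they are read back through edgemart first.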
4. Recipe with Append
This is a simple recipe that appends one dataset to another and produces a new dataset.
When we run the recipe, here are nodes in the Monitor:
Let us see each node:
- edgemart for the appended (new) table and edgemart for the root (base) table
- let us ignore DropSharingRulesFrom
- two computeExpression nodes
- append transformation node
- sliceDataset transformation node
- optimizer and sfdcRegister nodes