Here are a few tips based on my own findings and experience. If you have additional input or a different perspective, feel free to reach out to me.
1. Combine all computeExpression nodes whenever possible
image-1
image-2
In image-1, the calcURI node contains one compute field that returns a Numeric, and the calURI2 node contains one other compute field that also returns a Numeric. Together the two nodes (calcURI1 + calURI2) took 3:41 sec.
In image-2, both compute fields are combined into the calcURI node, and it took only 2:0 sec.
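As a rough sketch of what combining means at the dataflow definition level, the Python snippet below merges the computed fields of two computeExpression nodes into one. The node layout follows the dataflow JSON format, but the source name (Extract_Opportunity), the field names, and the SAQL expressions are made up for illustration. It also assumes the second node does not depend on anything that would break when both fields sit in one node.

```python
import copy


def merge_compute_nodes(first, second):
    """Return a new computeExpression node holding the computed fields of both nodes."""
    for node in (first, second):
        if node["action"] != "computeExpression":
            raise ValueError("both nodes must be computeExpression nodes")
    merged = copy.deepcopy(first)  # keep the first node's source and settings
    merged["parameters"]["computedFields"].extend(
        copy.deepcopy(second["parameters"]["computedFields"])
    )
    return merged


# Hypothetical nodes: names, fields, and expressions are illustrative only.
calc_uri_1 = {
    "action": "computeExpression",
    "parameters": {
        "source": "Extract_Opportunity",
        "mergeWithSource": True,
        "computedFields": [
            {"name": "AmountInThousands", "type": "Numeric",
             "precision": 18, "scale": 2, "saqlExpression": "Amount / 1000"},
        ],
    },
}
calc_uri_2 = {
    "action": "computeExpression",
    "parameters": {
        "source": "calcURI1",
        "mergeWithSource": True,
        "computedFields": [
            {"name": "ProbabilityRatio", "type": "Numeric",
             "precision": 18, "scale": 2, "saqlExpression": "Probability / 100"},
        ],
    },
}

combined = merge_compute_nodes(calc_uri_1, calc_uri_2)
field_names = [f["name"] for f in combined["parameters"]["computedFields"]]
print(field_names)
```

After merging, any node that used the second node as its source would need to point at the combined node instead.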
2. Compute as early as possible, and augment as late as possible
The rationale is that a compute node placed before an augment processes fewer fields, because an augment always adds fields to the stream. The exception is when the computation needs a field that comes from the augment node.
3. Remove all unnecessary fields
Remove unnecessary fields with a slice node, or leave them out of the right source's field list when you augment. The more fields each node handles, the more power and time the system needs to process them, so slice out any field that is not needed in a dashboard or lens.
A Register node usually takes much longer when it writes a lot of fields, so always clean up before registering a dataset.
image-3
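A minimal sketch of the clean-up step, written in Python: given the fields in the stream and the fields the dashboard or lens actually needs, it builds a sliceDataset node in "drop" mode for everything else. The source node name and field names are hypothetical.

```python
def build_slice_node(source, available_fields, needed_fields):
    """Build a sliceDataset node that drops every field not listed as needed."""
    needed = set(needed_fields)
    to_drop = [name for name in available_fields if name not in needed]
    return {
        "action": "sliceDataset",
        "parameters": {
            "source": source,
            "mode": "drop",
            "fields": [{"name": name} for name in to_drop],
        },
    }


# Hypothetical source node and field names, for illustration only.
slice_node = build_slice_node(
    source="Augment_Opportunity_Account",
    available_fields=["Id", "Name", "Amount", "Description", "Account.Fax"],
    needed_fields=["Id", "Name", "Amount"],
)
dropped = [f["name"] for f in slice_node["parameters"]["fields"]]
print(dropped)
```

The Register node would then use this slice node as its source, so it only writes the fields that are kept.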
4. Combine all sfdcDigest nodes of the same object into one node, if sync is not enabled
Your org may not have sync enabled for some reason. This does not mean you "must" enable it straight away. Please DO NOT enable it without a complete analysis, as this may cause data filtering issues.
Instead, combine all sfdcDigest nodes of the same object into one node. Imagine you have 10 million Opportunity rows and every sfdcDigest node takes 10 minutes (as an example): if the dataflow designer adds 3 sfdcDigest nodes for Opportunity, data retrieval alone will need 30 minutes.
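To spot this in an existing dataflow, something like the following Python sketch could scan the definition (loaded from the dataflow JSON into a dict) and list every object that is extracted by more than one sfdcDigest node. The node names and field lists are invented for the example.

```python
from collections import defaultdict


def duplicate_digests(dataflow):
    """Map each object to its sfdcDigest node names, keeping only objects digested more than once."""
    by_object = defaultdict(list)
    for name, node in dataflow.items():
        if node.get("action") == "sfdcDigest":
            by_object[node["parameters"]["object"]].append(name)
    return {obj: names for obj, names in by_object.items() if len(names) > 1}


# Hypothetical dataflow with three extracts of the same object.
dataflow = {
    "Extract_Opp_1": {"action": "sfdcDigest",
                      "parameters": {"object": "Opportunity", "fields": [{"name": "Id"}]}},
    "Extract_Opp_2": {"action": "sfdcDigest",
                      "parameters": {"object": "Opportunity", "fields": [{"name": "Amount"}]}},
    "Extract_Opp_3": {"action": "sfdcDigest",
                      "parameters": {"object": "Opportunity", "fields": [{"name": "StageName"}]}},
    "Extract_Account": {"action": "sfdcDigest",
                        "parameters": {"object": "Account", "fields": [{"name": "Id"}]}},
}

duplicates = duplicate_digests(dataflow)
print(duplicates)
```

Each object reported here is a candidate for a single sfdcDigest node whose field list is the union of the fields from the separate nodes.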
5. Do not perform a Null check in a filter node
Instead of having something like 'Check.Id' is null in a SAQL filter, create a computeExpression node with a Yes/No compute field, then filter with CheckIdIsNull:EQ:Yes
A filter node with a Null check takes a lot of time when the dataflow runs.
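The two-node pattern could look roughly like this, sketched in Python as a helper that builds both node definitions. The source node name (Augment_Check) and the generated node names are hypothetical; the field name 'Check.Id' and the filter CheckIdIsNull:EQ:Yes come from the tip above.

```python
import json


def null_flag_nodes(source, field, flag_name):
    """Build a computeExpression node that flags nulls as Yes/No, plus a filter node on that flag."""
    compute_name = "Compute_" + flag_name
    compute_node = {
        "action": "computeExpression",
        "parameters": {
            "source": source,
            "mergeWithSource": True,
            "computedFields": [
                {
                    "name": flag_name,
                    "type": "Text",
                    "saqlExpression": f"case when '{field}' is null then \"Yes\" else \"No\" end",
                }
            ],
        },
    }
    filter_node = {
        "action": "filter",
        "parameters": {"source": compute_name, "filter": f"{flag_name}:EQ:Yes"},
    }
    return {compute_name: compute_node, "Filter_" + flag_name: filter_node}


# "Augment_Check" is a made-up upstream node name.
nodes = null_flag_nodes("Augment_Check", "Check.Id", "CheckIdIsNull")
print(json.dumps(nodes, indent=2))
```

The filter node then compares plain text values instead of evaluating a null check in SAQL.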
6. Remove unused Register nodes
We often add Register nodes across a dataflow for testing or debugging. Once the dataflow is deployed to Production, make sure those test Register nodes are removed. A Register node can take up a significant part of the dataflow run, depending on the number of fields and rows.
7. Remove all nodes that are not related to a Register node
These nodes are simply useless.
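One way to find such nodes is to walk backwards from every Register node and see what is never reached. The Python sketch below does that on a dataflow definition loaded as a dict. It assumes upstream nodes are referenced through the "source", "left", "right", and "sources" parameters; the node names are invented for the example.

```python
def upstream_sources(node):
    """Return the names of the nodes that feed the given node."""
    params = node.get("parameters", {})
    refs = [params[key] for key in ("source", "left", "right") if key in params]
    refs.extend(params.get("sources", []))
    return refs


def unused_nodes(dataflow):
    """List nodes that do not feed, directly or indirectly, any sfdcRegister node."""
    needed = set()
    stack = [name for name, node in dataflow.items()
             if node.get("action") == "sfdcRegister"]
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(upstream_sources(dataflow.get(name, {})))
    return sorted(set(dataflow) - needed)


# Hypothetical dataflow: two nodes never reach the Register node.
dataflow = {
    "Extract_Opportunity": {"action": "sfdcDigest",
                            "parameters": {"object": "Opportunity", "fields": [{"name": "Id"}]}},
    "Extract_Account": {"action": "sfdcDigest",
                        "parameters": {"object": "Account", "fields": [{"name": "Id"}]}},
    "Compute_Flags": {"action": "computeExpression",
                      "parameters": {"source": "Extract_Opportunity", "computedFields": []}},
    "Filter_Test": {"action": "filter",
                    "parameters": {"source": "Extract_Opportunity", "filter": "StageName:EQ:Closed Won"}},
    "Register_Opportunity": {"action": "sfdcRegister",
                             "parameters": {"source": "Compute_Flags", "alias": "Opportunity",
                                            "name": "Opportunity"}},
}

unused = unused_nodes(dataflow)
print(unused)
```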
8. Use Source Field in computeRelative node, not SAQL, whenever possible
Check out this blog.