The data streams must have at least one common field.
By default, only data that exists in both groups appear in the results.
example:
qs = cogroup qsd by 'State', qsp by 'State';
in this sample, data stream qsd contain field State and data stream qsp also contains field State, we can use it for grouping.
q = cogroup ops by 'Account', meetings by 'Company';
Account in ops data stream should have the same value as Company in the meetings data stream.
Use case: show death per state percentage from 2 datasets.
dsd = load "StateDeath2";
dsp = load "StatePopulation2";
ds = cogroup dsd by 'State', dsp by 'State';
ds = foreach ds generate dsp.'State' as 'State', sum(dsp.'Count') as 'Population', sum(dsd.'Count') as 'Death', (sum(dsd.'Count')/sum(dsp.'Count')*100) as 'Death (%)';
The Result
Let's try to use Opportunity and User datasets from our previous blog.
dsu = load "user1";
dso = load "opportunity";
ds = cogroup dsu by 'Id', dso by 'OwnerId';
ds = foreach ds generate first(dsu.Name) as 'Name', sum(dso.Amount) as 'Sum_Amount';
ds = order ds by Name;
If you notice, Angela is not shown on that chart, because she does not have any Opportunity records. Remember that only data that exists in both groups appear in the results.
Wait... there is more, Outer cogrouping combines groups as an outer join. For the half-matches, null rows are added. The grouping keys are taken from the input that provides the value.
result = cogroup rows by expression [left | right | full], rows by expression;
Specify left, right, or full to indicate whether to perform a left outer join, a right outer join, or a full join.
Example: z = cogroup x by (day,origin) left, y by (day,airport);
You can apply an outer cogrouping across more than 2 sets of data. This example does a left outer join from a to b, with a right join to c:
Example: result = cogroup a by keya left, b by keyb right, c by keyc;
Let's use the same SAQL from above and add left outer join.
dsu = load "user1";
dso = load "opportunity";
ds = cogroup dsu by 'Id' left, dso by 'OwnerId';
ds = foreach ds generate first(dsu.Name) as 'Name', sum(dso.Amount) as 'Sum_Amount';
ds = order ds by Name;
Reference:
No comments:
Post a Comment