How to understand the new BigQuery connector "once per page of records"
According to https://docs.celigo.com/hc/en-us/articles/6923178410523-Import-data-into-Google-BigQuery#Records you can use Handlebars to customize your insert into BigQuery.
Take this example given.
"query": [ "insert into `1234.IO_Once_Per_Page_Of_Records` values ({{batch_of_records.0.record.ID}},{{batch_of_records.0.record.NAME}},{{batch_of_records.0.record.POSITION}},{{batch_of_records.0.record.DEPARTMENT}},{{batch_of_records.0.record.DAYS_PRESENT}},{{batch_of_records.0.record.DAYS_ABSENT}},{{batch_of_records.0.record.SWIP_IN}},{{batch_of_records.0.record.SWIP_OUT}},{{batch_of_records.0.record.LEAVE_BALANCE}},{{batch_of_records.0.record.LEAVES_USED}})"
I have several questions on this.
1. There is no closing bracket. Is this on purpose? Do we not close this block? Further do we need any curly braces before or after this to make this an actual JSON object?
2. What does `1234.IO_Once_Per_Page_Of_Records` represent? dataset.table? Typically I use `project.dataset.table` when writing against BigQuery API.
3. batch_of_records.0.record.ID - why are we only taking the first record (0th index)? Don't we have multiple records typically?
I feel like I have no idea where to start even with this documentation.
Comments
Hey Jeremy,
The docs are not good here, and I passed a note to the doc team to fix this asap, and someone from doc/product team will post an update here once the docs are improved. Sorry about this.
At a very high level, the "once per page of records" option is pretty uncommon. The "bulk insert" is much easier to use. Is "bulk insert" an option for your use case? If not, then the key to get the "once per page of records" working is to use the AFE 2.0 UI to help you build the exact SQL query that you would build with any other SQL tool. Please see the following pics for a very rough example. To incorporate data from your page, then you would need to iterate through the "batch_of_records" variable using the handlebars #each function. Hopefully you dont need to go down this path, and I will skip trying to build a proper example here until i hear back that you need this.
Thanks Scott! My intention with using the once per page of records was to add an "ingestion_date" field that captures the current timestamp using Handlebars. Would you suggest using a hook for this instead of my original approach?
You could do this via a hook, which would be easier than using the "once per page of records" option.
Or (even easier), you can use a field mapping to do this. Hopefully the following screenshot explains it all. i.e. click the settings icon for any of your field mappings to see this settings drawer
Please sign in to leave a comment.