Answered
Can we specify write disposition with the new BigQuery connector?
Happy to see the new BigQuery connector is available for use.
Is there any way we can specify a write disposition for imports into a table? For example, if I want a batch load job to have a WRITE_TRUNCATE write disposition?
For now I have a workaround where I ingest into a partitioned table and then have a scheduled query create another table consisting of the latest partition of the former table, so that users don't have to worry about dealing with partitions. But this is slightly annoying to deal with, and it would be great if I could just set a WRITE_TRUNCATE write disposition on the batch load and overwrite the contents of the table on every load.
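To make it concrete, outside of the connector this is roughly the kind of load I mean, sketched with the standalone Node.js client (the dataset, table, and file names below are just placeholders):

const {BigQuery} = require('@google-cloud/bigquery');

async function loadAndOverwrite() {
  const bigquery = new BigQuery();

  // A batch load job; WRITE_TRUNCATE replaces the table's contents on
  // every run instead of appending to it.
  const [job] = await bigquery
    .dataset('my_dataset')
    .table('my_table')
    .load('./export.json', {
      sourceFormat: 'NEWLINE_DELIMITED_JSON',
      writeDisposition: 'WRITE_TRUNCATE',
    });

  console.log(`Load job ${job.id} completed`);
}

loadAndOverwrite().catch(console.error);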
Comments
Jeremy Nelson I can confirm the new connector doesn't support running queries as jobs. Having said that, we will register this as an enhancement request and update you once it is rolled out.
Thanks Jeremy Nelson for posting this. Can you confirm whether you're trying to achieve this with the new BigQuery connector or the existing REST API based connector?
I'm trying to set this up using the new BigQuery connector.
Thanks for following up; however, I'm a bit confused. In the connector itself, for the option "How would you like the records imported?", the first option is "Use bulk insert SQL query".
This implies to me that a batch insert is done, no?
If so, then this is the "write disposition" option I am referring to and would like to be able to set: https://cloud.google.com/bigquery/docs/batch-loading-data#appending_to_or_overwriting_a_table
Jeremy Nelson The batch insertion that is supported today is done through createQueryStream() instead of a query job. This enables loading the bulk data in JSON array format.
https://googleapis.dev/nodejs/bigquery/latest/RowQueue.html#insert
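As a rough sketch of the difference (the dataset and table names below are placeholders), that row-based insert path looks like this, and note that it has no writeDisposition option, which is why the disposition can't be set today:

const {BigQuery} = require('@google-cloud/bigquery');

async function insertRows() {
  const bigquery = new BigQuery();

  // Example rows in JSON form; real rows would come from the flow's records.
  const rows = [
    {id: 1, name: 'first example'},
    {id: 2, name: 'second example'},
  ];

  // insert() appends rows into the table; unlike a load job, there is
  // no writeDisposition here to truncate the table first.
  await bigquery.dataset('my_dataset').table('my_table').insert(rows);
}

insertRows().catch(console.error);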
I think this would be a nice opportunity to understand the case in detail. Is it possible to log a ticket through Celigo support, and then we can take it forward from there?
Hey Jeremy,
I am not an expert with BigQuery, but maybe you could solve this via two flows in integrator.io:
You can configure the second flow to run immediately after the first flow. Or, maybe this is no better than the workaround you are already using?
Also, not sure if it helps, but we use Snowflake + integrator.io internally at Celigo, and here is how we generally do ingestion. If you are syncing really large data sets, there are some cool efficiencies here that should translate to BigQuery too.
It is worth noting that we always sync raw JSON data into Snowflake, and we sync everything into a single table per source system (i.e. NetSuite, Salesforce, MongoDB, etc...). We then use Snowflake views to unpack the JSON into views per record type, which allows us to be really dynamic and iterative with views because we never have to sync the full data sets again. I am pretty sure Google BigQuery supports this same paradigm, where you load raw JSON and unpack it via views.
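As a rough sketch of what that could look like on the BigQuery side (the table, view, and field names below are made up for illustration), you would keep one raw-JSON table per source and unpack it with a view, for example:

const {BigQuery} = require('@google-cloud/bigquery');

async function createUnpackView() {
  const bigquery = new BigQuery();

  // Hypothetical view that unpacks a raw JSON column into named columns;
  // the source table holds one JSON document per row.
  const sql = `
    CREATE OR REPLACE VIEW my_dataset.netsuite_customers AS
    SELECT
      JSON_VALUE(raw, '$.id')          AS id,
      JSON_VALUE(raw, '$.companyName') AS company_name,
      JSON_VALUE(raw, '$.email')       AS email
    FROM my_dataset.netsuite_raw
  `;

  // Runs the DDL as a normal query job.
  await bigquery.query({query: sql});
}

createUnpackView().catch(console.error);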
Lastly, here is a picture of the merge and delete flow if it helps. You can basically use this pattern to run any random SQL stuff in serial order, before or after other flows, etc...
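For a sense of the SQL such a flow might run (all table and column names below are made up, not our actual configuration), the step could be something along these lines:

const {BigQuery} = require('@google-cloud/bigquery');

async function mergeAndClearStaging() {
  const bigquery = new BigQuery();

  // Hypothetical multi-statement script: upsert freshly loaded rows from
  // a staging table into the target, then clear the staging table.
  const sql = `
    MERGE my_dataset.target t
    USING my_dataset.staging s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.raw = s.raw
    WHEN NOT MATCHED THEN INSERT (id, raw) VALUES (s.id, s.raw);

    TRUNCATE TABLE my_dataset.staging;
  `;

  await bigquery.query({query: sql});
}

mergeAndClearStaging().catch(console.error);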