Snowflake Bulk Vs Once Query Difference
Hello everyone,
I would greatly appreciate your assistance in clarifying when to utilize bulk queries versus once per record queries.
My flow is suppose to run every 30 minutes, and I may encounter scenarios where there are more than 50 records or none at all. I'm unsure about the criteria for deciding which type of query to use in such cases.
Additionally, I'm curious if the selection of query type is determined solely by the quantity of data or if it depends on other factors. For instance, should we use the once per record query for custom queries and the bulk query for automatically generated queries?
Thank you in advance for your guidance and insights.
0
Comments
Yash Kumar you've just about hit the nail on the head. Bulk insert is great when you have thousands and millions of records to push into Snowflake because it's more efficient (up to 1 MB of data per insert) thus consumes fewer credits in the long run, you don't have to write any sql, and you can use the mapper that you're used to. However, with bulk insert, you most likely need an additional flow to run your merge query so that you can dedup your data. Outside of the bulk insert option, you have "run once per page of records" and "run once per record". These two options require you to write some sql, but would allow you to not have two flows strung together since the sql could just be a merge query that inserts and updates your data. Out of those two options, running once per page would be preferred because you could run the query once for the entire page of data (default is 20 records per page) instead of having to run a query for each individual record.
Bulk insert
Pros:
Cons:
Typical setup:
Run once per page
Pros:
Cons:
Typical setup:
Run once per record
Pros:
Cons:
Typical setup:
Same as run once per page, but with this query.
Thanks a lot Tyler!
I was looking to know regarding one more point i.e if I have to perform both Add and update on the same table does it requires two separate steps?
Yash Kumar the merge query for Snowflake handles both inserts and updates so you could use it like I showed above. The recommended approach is to have one flow that bulk inserts data into a table, then have another flow triggered to run afterwards that merges the staged data into some final production table.
Please sign in to leave a comment.