The integrator.io platform is robust, scalable, and secure, with capabilities tailored to most third-party APIs. For many flows, the default settings will provide adequate performance levels. However, you may eventually find yourself troubleshooting a slow-running flow, or wanting to improve the general performance of all your flows. In this article, we cover all the different aspects of flow performance. The Celigo platform supports a wide range of settings to debug, govern, and optimize the performance of your flows.
Exports and listeners
- Choosing the right page size
- Four different export types to retrieve data
- Exporting batches vs. single files from an FTP site
- Running SuiteScript hooks on data returned by NetSuite search APIs
- Including sublists in NetSuite real-time listeners
- Storing retry data for errors in your flows
- Further reducing the size of data being exported
- Input and output filters
Configure connections to optimize throughput and governance
Connections play a vital role in the performance of flows. A must-read article, Configure connections to optimize throughput and governance explains in great detail how connections work behind the scenes, along with how connections can be configured to optimize performance of flows and adhere to API governance limits.
Offline connections will pause your flows
Connections go offline when the authentication to the external application fails.
An offline connection will automatically pause all flows using the connection, and the usage stats for the affected flows will show zero until it is brought back online. As soon as the connection starts working, all paused flows will automatically resume processing right where they left off.
It is very important to get notified when a connection goes offline. See Get error notifications via email (You only) and also Configure error notification emails for other users. When subscribed to connection offline notifications, you will also get notified when the connection goes back online.
It is worth noting that while a connection is offline, integrator.io will automatically test the connection to see if it can be brought back online. If the connection cannot be brought back online after a few days, then integrator.io will cease testing it, and the connection will remain offline until you bring it back online by updating the credentials or clicking Test connection. For real-time flows (initiated with a listener endpoint), data will still be queued by the flow when a connection in the flow is offline, but it will not be processed until the connection is brought back online.
Debug mode can potentially log a lot of extra information
When Debug mode is on, all data going in and out of the connection is recorded in the debug logs. A lot of information is captured in order to provide the most meaningful history and troubleshooting information.
In many cases, capturing the extra information will not affect performance, but if you are processing millions of records, then there could be a noticeable lag in your flow’s performance. For troubleshooting these types of flows, it may be helpful to run the flow in test mode, or alter the search criteria in your source export to make sure the number of records being processed is not too big when running in debug mode.
Exports and listeners
Exports and listeners control how data enters your flow for processing. An export is where, for example, you define the database query, set API query parameters, or pick which directory to download files. You have lots of ways to configure exports and listeners to optimize the performance of your flows.
Choosing the right page size
Data comes in lots of different shapes and sizes – compare a 100 MB CSV file containing 1 million rows downloaded from an FTP site to a 100 KB JSON document containing 100 records returned by a REST API. For a flow to be able to consistently process data of any shape and size, exports and listeners will always break data into pages of records.
You can configure the number of records per page via the Page size property in the Advanced section. The default page size is 20 records. There is no hard limit on the number of records per page, but there is a 5 MB limit on the size of a single page. For example, a single page can hold 5,000 smaller 1 KB records, or 1 very large 5 MB single record. If the 5 MB page size limit is reached before the page contains the number of records set in the page size field, then integrator.io will automatically stop adding records to the page, and will instead start a new page. Even though a single page is limited to 5 MB, there are no limits regarding the total number of pages that can be processed by a flow. Flows can process supermassive data volumes because big data is always broken down into smaller pages.
How does page size impact the performance of your flow, and how do you pick the perfect page size for your flow? Let’s look at an extreme example. If you are exporting 1 million records from a database, and you are importing those records into another database, and you set the page size to 1, then this means that every single record in the database will get saved in its own page, and then every single record will get processed one at a time. The time spent paging, combined with the inefficiency of submitting records to the destination database one at a time, will definitely have a negative impact on the performance of your flow. For database to database flows, the ideal page size is probably going to be in the 1,000 range, which means that 1,000 records will get stored in a single page at a time, and then imported into the destination database in batches of 1,000.
Why not always set page size to 1,000? Well, let’s say you are exporting 1 million sales order transactions (each containing dozens of line items) from a database, and you are importing those transactions into your ERP. Your ERP might not allow 1,000 records to be submitted in a single API request. Or your ERP might take 1 - 5 seconds to import even a single transaction, which means that 1,000 records in one HTTP request could take an hour to process. It is very risky to submit long-running HTTP requests with lots of records, because if the HTTP connection is lost due to network failures, then you won’t know which records succeeded or failed. For database to ERP flows, a smaller page size like 20 is a much better fit.
So back to the question, how do you pick the perfect page size for your flow? Most of the time, the ideal page size can be determined by looking at the ideal batch size for the destination applications in your flow. For example, if your flows are importing data into a high-performance database, then a larger page size will be much more performant. If your flows are importing data into a slower API, then a smaller page size will be better. Another factor to consider when picking the page size is the concurrency level on the connection associated with the destination application. The concurrency level represents the number of pages (not records) that will be processed in parallel by the connection. You always want to make sure the data being exported from your source app is broken down into enough pages to fully utilize the concurrency level set on your connection. For example, if the concurrency level for a connection in your flow is 10, and the page size is 1000, but your flow only processes 1000 records total, then you will have just one page of records, and only 1/10 of the available concurrency will be used. Here it would be better to set page size to 100, so that 10 independent pages will be processed in parallel.
- When grouping records in your export, the page size dictates the number of groups per page (that is, not individual records anymore), and all records within one group must fit within the max page size (5 MB).
- Paging of data and page size is independent of the API pagination used to retrieve data from an HTTP/REST based API. When you are working with an HTTP/REST API that has its own paging mechanism, integrator.io will use the paging method of that API to retrieve data but will then create its own pages for the flow to process based on the page size set for your export.
Four different export types to retrieve data
There are four primary patterns for exporting data. Choosing the right pattern for your export will impact the number of records retrieved from the source application each time it runs. To state the obvious, choosing the right pattern to minimize the number of records will give your flow the best performance.
- All – always export all data
- Delta – export only modified data
- Once – export records only once
- Test – export only 1 record
This export type returns all records. For example, if you are exporting data from a MySQL database using All, then you will process all the records returned by the query every time the export runs. Choose this export type with caution when dealing with large numbers of records.
This export type returns only those records that have changed since the last time the flow was run. For example, if you are exporting data from a MySQL database using Delta, then integrator.io will inject a timestamp into your query to request records that have changed since the flow last ran. This export type can drastically improve performance over All, but it does require the source application to support the ability to query via a date field, such as lastModifiedDate or createdDate.
This export type returns only those records that have not yet been exported. It has a built-in mechanism to mark records as exported in the source application after they are exported, so that they are not exported again. For example, if you are exporting data from a MySQL database using Once, then integrator.io will inject a Boolean into your query to request records that have not been exported yet. Then, after the records are exported, integrator.io will submit an update back to the database to mark the records as exported using the same Boolean. This export type requires the source application to support the ability to mark records as exported. It is worth noting that updating records as exported in the source application will slow down the initial export, but the overall performance of the flow will be improved since records are processed only once.
This export type only returns one record and is really useful for testing new flows.
Exporting batches vs. single files from an FTP site
Historically, FTP exports always retrieved files one at a time, and they work great for processing large files. But if you are processing lots of smaller files, then it might slow down your flow. The Batch size setting allows you to control how many files should be retrieved at a time.
The upper bound limit is 1000. The batch size setting affects only the number of files retrieved in a single request to the FTP size; it does not control the total number of files that can be retrieved when a flow is running. When the number of files on the FTP site is greater than the batch size, integrator.io will continue to make subsequent batch requests until all files have been retrieved.
Note: If you are processing a very large number of files that are also very large in size, and you set batch size to 1000, then your flow will likely fail due to network timeout errors associated with trying to retrieve too many large files in a single request to an FTP site.
Running SuiteScript hooks on data returned by NetSuite search APIs
NetSuite search APIs always return batches of 1,000 records. This volume can cause performance problems if you are running a SuiteScript based hook on the records before they are exported. For example:
- You may run out of SuiteScript points or hit NetSuite instruction count limits
- The sum of all 1,000 records might exceed the 5 MB limit that a SuiteScript hook can process in a single invocation
For such scenarios, specify a value in the Advanced > Batch size limit setting to tell integrator.io to break down the 1,000 records into smaller batch sizes.
Including sublists in NetSuite real-time listeners
For real-time NetSuite listeners, the default behavior is to include only body-level fields (like name, phone, and email for a customer record).
If you need to export sublist data (like addresses for a customer,or the line items in a sales order), then you must explicitly include those in the listener’s Sublists to include field.
When including sublist data, keep in mind that each sublist typically requires an additional query to NetSuite to get the extra data. While individual queries run relatively fast, if you are exporting lots of different sublists, it can slow down the time it takes to retrieve a full record from NetSuite.
Storing retry data for errors in your flows
By default, when a record fails in your flow, integrator.io will store retry data in S3 so that it is easy to see the exact data that failed, and/or make changes to it, and then retry the error again. If your flow has the potential to produce millions of errors, then storing retry data for every error can negatively affect the overall performance of your flow.
When dealing with millions of errors, it is probably best to turn off storing retry data in the setting Advanced > Do not store retry data. Turning this setting off will keep the performance of your flow optimized. The better strategy for fixing millions of errors is to fix the data in the source application directly, and then re-run the flow from scratch.
Further reducing the size of data being exported
Exporting really huge single records can negatively affect the performance of your flow. For example, a contact object containing more than 1,000 fields will take longer to export than a contact object containing only 40 fields.
Exporting records that do not need to be processed at all will also negatively affect the performance of your flow. For example, if you are blindly querying all contact objects without any specific search criteria, but only a subset of the contacts actually needs to be synced, then your flow definitely has room to be optimized and sped up.
Generally speaking, if you are exporting records from a source application containing big data, then you should take advantage of the following capabilities in integrator.io to minimize the overall size of the data being processed by your flow.
- Wherever possible, refine your export search criteria to retrieve ONLY the records that you need to process and ONLY the fields within each record that you need to process. That is, avoid statements like
select * from table_xyzunless you’'re sure that you want all the data.
- If you are working with an API that does not natively support the ability to define the specific search criteria that you need, then use an output filter in your export to discard unnecessary records.
- You can use a preSavePage export hook to discard records and remove/restructure fields all in one efficient batch function call.
- For NetSuite exports, you can optimize saved searches (defined in NetSuite) to streamline the fields returned.
As a general rule, importing data takes longer than exporting data because APIs will typically perform stricter validations before allowing data into their systems. Also, writing data to disk fundamentally takes longer than reading it. That aside, there are still lots of ways that imports can be configured to optimize the performance of your flows.
Too many dynamic lookups in your mappings
Generally speaking, mapping fields is very fast, and there should not be any performance impact when mapping 10, 100, or 1,000 fields. That said, the one exception to this is when your mappings use dynamic lookups, as demonstrated below:
Dynamic lookups are inherently slow because you are telling integrator.io to run a search on the fly against the destination application to look up a single field value that is not in the source record already. This means that a separate API call to the destination application is being made for every record being processed in the flow. If you have too many dynamic lookups, or you are processing very large numbers of records, then the performance costs associated with dynamic lookups will add up fast.
If you ever need to remove dynamic lookups from your import mappings, then the most common way to do this is to build a secondary flow to sync the dynamic values from the destination application back into the source application. For example, if you are importing orders into NetSuite from your web store while doing lots of dynamic lookups to set internal ID fields, then you could build one or more secondary flows to first sync the NetSuite internal IDs to your web store so that those values can be included in your source records being exported from the web store.
Identifying existing records using a dynamic lookup
When importing records into an application, five patterns are supported by integrator.io.
- Create new records & update existing records. If your source records can be either new or existing records, then this option will dynamically create vs update accordingly.
- Create new records & ignore existing records. If your source records can be either new or existing records, then this option will create new records and ignore existing records.
- Ignore new records & update existing records. If your source records can be either new or existing records, then this option will ignore new records and update existing records.
- Update existing records. If your source records are always existing records, then this option will always update existing records.
- Create new records. If your source records are always brand new records, then this option will always create new records.
For each of the first four patterns above, you must choose how to tell if a record is new or existing (see image below). When you choose to run a dynamic search, then the same performance concerns outlined above for dynamic lookups in field mappings applies here, too.
If you want to remove the dynamic lookup from your import, then the most common way to do this is to build a secondary flow to sync the dynamic values from the destination application back into the source application. Then, you can pick the option Records have a specific field populated to reference the ID field in your source record.
Submitting records in batches to minimize network API calls
For many applications, you can choose the number of records that should be sent in each request to the destination application. Choosing a larger batch size will reduce the number of network API calls that need to be made, and should improve the performance of your flow. Please note the following parameters:
- Most applications have a hard limit on the number of records that can be submitted in a single batch request.
- Batch size cannot be greater than the page size of the flow, because integrator.io will not aggregate records across pages to build a batch.
- Batch size can be much smaller than page size. For example, page size can be 1000 and batch size can be 5, and then integrator.io will break the page into 200 batches.
Limiting the number of records sent to NetSuite per batch
By default, integrator.io will send NetSuite an entire page of records in a single API call. If your page size is too big for NetSuite APIs (due to NetSuite governance rules), and also assuming you cannot lower the page size in your source export since other steps in your flow need the larger page size, then adjust the Batch size limit setting on the NetSuite import to break the page of records into smaller batches.
Input and output filters
Output filters and input filters provide a mechanism to exclude records from being processed by your flow. To state the obvious, the more records you can discard, or ignore at specific steps in your flow, the better your flow performance will be.
Output filters and input filters are very similar, but differ in one major regard.
- Output filters allow you to remove records entirely from the flow such that the records will not get processed by any of the steps in the flow. Output filters are a great way to optimize the performance of your flows. They are especially useful if you are exporting data from an API or a file system that does not natively support the ability to exclude records through search criteria.
- Input filters allow you to skip processing for only a single step within a flow. They are useful if specific steps in your flow do not need to run for every record being processed by the flow.
If you run into performance problems in your hooks, then be sure to check the following:
- Is the hook code invoking any APIs on a per record basis? If so, can the API calls be batched up?
- Is the hook code inefficiently looping through really large arrays or building memory intensive data structures?
- Is the hook code easy to understand and well written? Maybe a refactor will shed some light on why the code is performing poorly.
I appreciate this information - thank you. I have a follow up on the FTP batch size parameter. If we're processing small files, then we should set the batch number higher? We have a use case where the might be 100 small files of about 2KB in size with a few large ones around 200KB mixed in. We're parsing these into JSON data. I set the batch size to be 20. Do you think there will be any issues?? I'm not sure where these timeouts mentioned about would be happening. Can I set it to 100?? The page size is set to 1. Does the batch size impact the page size?
Hi David Gollom!
We're checking into your question and will get back to you asap.
Hi David Gollom,
If you expect many large-sized files to export, then setting a really high batch size might experience network timeout errors. However, if you expect many small-sized files, then it is advisable to set a higher batch size value so that IO does not process one small file at a time, leading to a more performant integration. You can start with batch size 20. The new error management (to be released) has a chart that plots average processing time/success event over time, and you can use this chart to analyze your flow performance and tweak the batch size as you see fit.
The Concurrency in the FTP connection provides another control to tune your FTP export performance. Setting this field allows you to open multiple connections to the same FTP site and export multiple files from the FTP site. To be sure, please note that that the FTP connection concurrency setting only works for an export step and it is not applicable for import step.
Thank you so much for clarifying that. It makes sense now. I made the change to 20 earlier for the batch size and the performance improved greatly.
Please sign in to leave a comment.