For many custom integrations and almost all Integration Apps, the default settings in integrator.io will enable smooth and rapid data transfer the first time you sign in or install an Integration App. The integrator.io platform is a robust, scalable, and secure cloud-based application, with resources tailored to several third-party partner apps.
The in-depth explanations of integrator.io architecture in this article, along with the suggested tips and workarounds, will be most helpful if you’re an experienced web developer who is simply curious about improving an integration’s performance or who has encountered any of the following issues:
- Slow data transfer
- API governance or rate limit errors
- Import errors
integrator.io offers powerful advanced settings to allow you to control data transfer and processing for improved throughput in any real-world scenario. It will get complicated at times – many variables are at play among integrator.io settings and remote system assumptions. The sections below examine the applicable integrator.io resources and how you might fine-tune each for your integration’s requirements.
Each connection to an app, file server, or database system serves as a queue for import and export data requests.
Depending on the number and size of the records retrieved from an API’s database, it could take several minutes or much longer to process an export request. When you run a flow, for example, one query in a single queue could retrieve 1,000 or more individual records. The exported records are split into pages (see integrator.io page size, below), and these pages are ordered into a queue, which may then be sent to the same or a different app, where it enters another queue for the items to be further processed.
Concurrency, or the maximum number of requests that a connection can run in parallel, can affect performance, given the following premises:
- The same connection can apply to additional exports and imports across multiple integrations, within the same account. (For security purposes, a connection may be applied only to a single tile in an Integration App.)
- Items in a queue are sent in parallel, up to the connection’s concurrency level. However, it is possible for a single connection to act as a gateway (and on occasion, as a bottleneck) for processing large sets of data from multiple queries.
- An app’s API policy can impose restrictions on concurrency, also known as governance or rate limits.
Add to these factors your unique business requirements: some data might need to be processed immediately, and some transactions might have higher volumes of data with separate priority requirements.
To learn more about setting concurrency and the resulting throughput considerations, including establishing shared concurrency for multiple connections to the same endpoint, see Assign concurrency levels to data transfer.
When Debug mode is on for a connection, all data going in and out of that connection is recorded in the debug logs. A lot of extra information is captured for you, in order to provide the most meaningful history and troubleshooting information. This file is written and then saved to the cloud.
In most cases, the performance hit is negligible. But when thousands or millions of records have accompanying metadata, there may be a noticeable lag in the export or import completion time.
It may be helpful to disable Debug mode before running certain flows that are taking too long, after you have completed testing and any troubleshooting.
A connection will change to offline status if either of the following is true:
- Your app’s account credentials are changed or expired
- A server is down or unreachable
An offline connection pauses all data being processed in its queue. As soon as integrator.io detects that it is back online, the flow resumes processing.
However, after a few days of retrying unsuccessfully, the offline connection is abandoned until you restore it. The offline connection may well affect other resources in your flows: downstream connections cannot accept new data if the input is offline.
You can stay on top of connection errors with automatic email notifications.
Requests to export data vary according to the source:
- integrator.io calls to an API
- An API webhook that calls an integrator.io listener
- FTP file transfers
- Database queries
In addition, the many APIs and database systems with which you can integrate have very different policies and behaviors. For many apps, integrator.io includes specific logic and unique settings. In any case, before continuing to fine-tune export settings, refer to the vendor’s API documentation to learn more about the app’s data structure and HTTP requests.
When integrator.io gets data from a source, it splits the response into “pages” for queueing and further processing. Simply put, a page chunks up data into a discrete number of records to make them more manageable for processing and sending downstream to be imported into their destination.
The default page size is set to 20 records. You can change the Page size advanced setting of an export to fewer or more than 20 records, as long as each page stays within 5 MB. There is no upper limit to the number of pages or the total amount of data you can export.
For performance purposes, note that the page size does not take into account the size of each record. A page size of 10 could return a 2 KB page or a 4.9 MB page, depending on the data requested.
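To make the distinction between record count and byte size concrete, here is a minimal Python sketch. The function names and sample records are hypothetical, and the chunking is only a model of the behavior described above, not integrator.io's internal implementation.

```python
import json


def split_into_pages(records, page_size=20):
    """Chunk records into pages of at most page_size records each."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    return [records[i:i + page_size] for i in range(0, len(records), page_size)]


def page_bytes(page):
    """Approximate the serialized size of a page (JSON encoded as UTF-8)."""
    return len(json.dumps(page).encode("utf-8"))


# Two exports with the same page size but very different record sizes
small = [{"id": i} for i in range(10)]
large = [{"id": i, "notes": "x" * 100_000} for i in range(10)]

small_pages = split_into_pages(small, page_size=10)
large_pages = split_into_pages(large, page_size=10)

# Both exports produce a single page of 10 records, yet the byte sizes differ widely
print(len(small_pages), page_bytes(small_pages[0]))
print(len(large_pages), page_bytes(large_pages[0]))
```

Both exports yield one page of 10 records, but the second page is several orders of magnitude larger, which is why record size matters when you choose a page size.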
To get an idea of the size of data flowing through the integration, multiply the average page size by the number of pages. For example, if a page is typically 1 KB and your export retrieves 38 pages, you will have a total of around 38 KB of records. Navigate to your Run dashboard to see the summaries of pages exported.
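The same estimate can be written as a short calculation. This is a rough sketch: the record count of 750 is an assumed figure, chosen because it produces the 38 pages used in the example above at the default page size of 20.

```python
import math


def estimate_export(record_count, page_size, avg_page_kb):
    """Return (number of pages, approximate total size in KB) for an export."""
    pages = math.ceil(record_count / page_size)
    return pages, pages * avg_page_kb


# Assumed inputs: 750 records, default page size of 20, pages averaging 1 KB
pages, total_kb = estimate_export(record_count=750, page_size=20, avg_page_kb=1)
print(f"{pages} pages, about {total_kb} KB")
```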
Often, the biggest factor determining ideal page size is the destination app where the data will be imported. NetSuite, for example, can reject attempts to write or update large data sets in a single request, so the page size should not exceed 100 records. Even then, if the records themselves are large, containing things like sales orders with line items, you may want to reduce the page size further to 10 or 20 records for optimal throughput and to avoid governance errors.
On the other hand, when syncing with a database system, such as MongoDB, or a data warehouse app, a page size of 1,000 records might be preferred.
Finally, with respect to pagination, keep the processing overhead in mind. In general, the smaller the page size, the slower the flow will run, since pages are processed individually from the queue. With real-time listeners, for example, a page often holds just the single record that the app sends to integrator.io. The performance overhead of a smaller page size is negligible per page, but it’s something to be aware of when fine-tuning these settings.
Note: You may be familiar with the pagination procedure for a specific API, in which results are returned in a series of pages at a maximum number of records each. Do not confuse API pagination with integrator.io page size. When exporting data, you don’t have to be concerned with an API’s page count or limit; integrator.io first retrieves the complete set of data from the API and then splits it into pages according to the page size you specify.
Pagination, concurrency, and order of operations
When a page is created, it goes through the connection (in its queue, as described above). For file providers such as FTP and Amazon S3, integrator.io first makes a copy of the file before splitting records into pages and queueing them for further processing.
Recall that the connection’s concurrency level determines how many requests are processing pages of records in a queue. That level also applies to all exports and imports that reference the same connection.
For example, if the concurrency level is 10, then 10 separate pages occupy one slot in the queue. For a page size of 50 records, one full item in the queue would contain 500 records.
In practice, the pages in the queue are sent sequentially – not literally in parallel – in the order in which they arrived.
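The following Python sketch models the arithmetic in this example. It is a simplified illustration under the assumptions stated here (a concurrency level of 10, a page size of 50, and a hypothetical `drain_queue` helper), not a description of how integrator.io schedules requests internally.

```python
from collections import deque


def drain_queue(pages, concurrency):
    """Yield groups of up to `concurrency` pages, in arrival (FIFO) order."""
    queue = deque(pages)
    while queue:
        slot_size = min(concurrency, len(queue))
        yield [queue.popleft() for _ in range(slot_size)]


# 25 pages of 50 records each; records are represented by their index
pages = [list(range(i * 50, (i + 1) * 50)) for i in range(25)]
slots = list(drain_queue(pages, concurrency=10))

for number, slot in enumerate(slots, start=1):
    records_in_slot = sum(len(page) for page in slot)
    print(f"slot {number}: {len(slot)} pages, {records_in_slot} records")
```

With these numbers, a full slot holds 10 pages, or 500 records, and the final slot holds whatever pages remain.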
Data can be exported via four different methods, which will affect the quantity of the data returned, in descending order:
- All – Export all matching records each time the export is run
- Delta – Export all matching records with a date/time value greater than the last time the data flow was run
- Once – Export any data that has not been exported already and also update records to mark them as exported
- Test – Export only one record by default; used for testing, to avoid syncing lots of data
The All export type returns anything that matches the selection. For a database such as MySQL, it would include the records for anything retrieved by your MySQL query, which could be every item in a large table.
If you’re exporting a lot of data with unchanged records, consider narrowing the export type by selecting another option.
Delta flows drastically improve performance. They export only the matching records that were changed or created since one of the following benchmarks:
- The last successful flow run, meaning that the flow was completed and the last job record had a positive page size
- A custom date and offset in the past that you specified
Delta flows rely on the source system’s lastModified date.
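Conceptually, a delta export behaves like the filter below. This is a minimal sketch with hypothetical sample records. The strict "greater than" comparison follows the description of the Delta type above, while the way a real source system stores and compares its lastModified value is an assumption.

```python
from datetime import datetime, timezone


def delta_export(records, last_run):
    """Keep only the records modified after the previous successful run."""
    return [r for r in records if r["lastModified"] > last_run]


last_run = datetime(2024, 1, 10, tzinfo=timezone.utc)
records = [
    {"id": 1, "lastModified": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"id": 2, "lastModified": datetime(2024, 1, 12, tzinfo=timezone.utc)},
    {"id": 3, "lastModified": datetime(2024, 1, 10, tzinfo=timezone.utc)},
]

changed = delta_export(records, last_run)
print([r["id"] for r in changed])
```

Only the record modified after the last run is returned, so unchanged data never enters the flow.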
The Once export type lets you export a record, after which it will be marked as submitted in the source application and not exported again. It ensures that a record that you want to remain unique, such as a sales order, is not sent twice.
For example, you may want to sync all orders that have not been shipped to a warehouse. Within the same flow, you may also need to change the billing ZIP code – an operation that does not impact shipping, but doing so would update the lastModified date of the order. Setting Export type to Once will ensure that the record is marked as exported and not retrieved again.
This flow type relies on the source system’s isExported Boolean field. (If none is present, integrator.io attempts to create a custom field to store this value, which not all APIs support.) Updating isExported to true also has the effect of slowing the export’s performance, because it has to write back to the system.
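The write-back step can be pictured with the sketch below. It uses in-memory dictionaries as a stand-in for the source system, and the `export_once` helper is hypothetical. In a real flow, setting the flag is a request back to the source app, which is where the extra time goes.

```python
def export_once(records):
    """Return records not yet exported, then flag them so later runs skip them."""
    pending = [r for r in records if not r.get("isExported", False)]
    for record in pending:
        # Stand-in for the write-back request to the source system
        record["isExported"] = True
    return pending


orders = [{"id": 1}, {"id": 2, "isExported": True}, {"id": 3}]

first_run = [r["id"] for r in export_once(orders)]
second_run = [r["id"] for r in export_once(orders)]
print(first_run, second_run)
```

The second run returns nothing, even if other fields on the orders changed in the meantime.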
Use the Test export type when you first build a flow to check sample data. A test export gets just the first record, enabling you to ensure that a flow is running properly before you fill it with production data.
By default, an FTP export reads and processes each file one at a time. That is, the export stops retrieving files until the current file is completed. If you are exporting several large files, the transfer can proceed much more slowly than expected.
You can control the speed and performance of FTP file transfers in the Advanced > Batch size setting of the export. It lets you specify how many concurrent files you want processed in a single request (up to 1,000). The Batch size does not in any way limit the total number of files that you can export in a flow.
Setting a maximum number of files allows you to optimize your flows for large files and avoid timeout errors. Then, for smaller files, processing a greater number of files in a single batch keeps the flow more performant.
NetSuite search APIs will return up to 1,000 records by default when you request a new page of results. That volume can cause problems in a few instances:
- If you’re executing a SuiteScript-based hook on the records before they are exported, you may run out of SuiteScript points or hit NetSuite instruction count limits
- If the individual records you are exporting are very large, then the sum of all 1,000 records may exceed the 5 MB per page limit
- The large NetSuite batches may need to be split up for downstream app imports
In such situations, modify the advanced Batch size limit setting to tell integrator.io to break down the 1,000-record batches into smaller groups.
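One rough way to pick a starting value is to work backward from the 5 MB page limit. The helper below is a rule-of-thumb sketch, not an official formula: the function name is hypothetical, 5 MB is assumed to mean 5 × 1024 × 1024 bytes, and you would need to measure the average record size yourself.

```python
def suggest_batch_size(avg_record_bytes, page_limit_bytes=5 * 1024 * 1024, max_batch=1000):
    """Largest batch that stays under the page limit, capped at max_batch records."""
    if avg_record_bytes < 1:
        raise ValueError("avg_record_bytes must be at least 1")
    return max(1, min(max_batch, page_limit_bytes // avg_record_bytes))


# Records averaging 20 KB each need far fewer than 1,000 records per batch
print(suggest_batch_size(avg_record_bytes=20 * 1024))

# Small records can keep the default 1,000-record batches
print(suggest_batch_size(avg_record_bytes=1_000))
```

Treat the result as an upper bound. SuiteScript hooks or downstream import limits may call for a smaller value still.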
Exports from certain apps offer an advanced Do not store retry data setting. By default it is left unchecked, meaning that integrator.io will store retry data for records that fail in your flow.
Storing the extra retry data can be a drag on performance when you are exporting large data sets. In such exports, the retry data may be unnecessary anyway for data that has presumably already been imported.
In order to keep the time it takes to save a record in NetSuite lightning fast, the default behavior for a real-time export is to include only body-level fields (like name, phone, and email for a customer record).
If you do need to export sublist data (like addresses for a customer or the line items in a sales order), then you must explicitly specify it in the export’s Sublists to include setting.
When including sublist data, keep in mind that each sublist typically requires an additional query to NetSuite to get the extra data. While individual queries run relatively fast, if you are exporting many different sublists it can slow down the time it takes to get a record from NetSuite.
When exporting data from a system, such as a REST API, integrator.io makes the call via HTTP, which returns the records in pages. A record can conceivably contain a thousand fields or a small number of fields with megabytes of data.
In a standard query, there is no mechanism to tell an API not to return a particularly data-heavy field, such as a readme file that is not relevant to the integration. Even setting the page size to 1 is not enough to improve the flow’s performance in these situations. Instead, you may want to consider some combination of the following tools to reduce export size and improve data processing:
- Set up a filter to exclude unnecessary records
- Use NetSuite saved searches whenever possible
- Refine your database queries to retrieve data efficiently
- Create a transform rule to reduce the size of the exported records, deleting unwanted fields, for later processing and importing
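The filter and transform ideas in the list above can be sketched as follows. The field names (`status`, `readme`) and both helper functions are hypothetical, and in integrator.io you would configure the filter and transform rules in the flow rather than write Python.

```python
def keep_record(record):
    """Filter: keep only the records the integration actually needs."""
    return record.get("status") == "open"


def strip_fields(record, unwanted):
    """Transform: drop data-heavy fields that are irrelevant downstream."""
    return {key: value for key, value in record.items() if key not in unwanted}


exported = [
    {"id": 1, "status": "open", "readme": "x" * 50_000},
    {"id": 2, "status": "closed", "readme": "y" * 50_000},
    {"id": 3, "status": "open", "readme": "z" * 50_000},
]

slimmed = [strip_fields(r, {"readme"}) for r in exported if keep_record(r)]
print(slimmed)
```

Filtering first means the transform runs on fewer records, and every later step in the flow handles smaller payloads.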
Hook code is executed anew, with no caching, on each and every record in a flow. A single function may be called numerous times to process one or more records.
If you run into performance problems and your records pass through hooks, ask yourself:
- What is the hook trying to accomplish?
- Would it be faster to achieve the same goal with a mapping? Can I use a lookup instead to take advantage of integrator.io caching?
- Is the code well formed? What errors or extra steps might be introduced?
As a rule, the more imports that you have, the slower a flow will go. Reading – via an export – is much faster than writing – via an import – because APIs typically perform additional validation to protect your data and their systems.
Overall, imported data is processed quickly and in order, according to the parameters that you established from one or more exports, above. As always, consult your API’s documentation for its policies and best practices.
Tip: When available, authentication and API guide links are provided for each third-party connector. If you are customizing a universal connection (such as HTTP or REST API) to your app or no links are provided, contact the app vendor’s support directly.
Opportunities also exist for fine-tuning data processing before the records are submitted to the destination app.
Four options are possible for mapping data to most destination endpoints:
- Standard – Defines a 1:1 direct mapping between two fields
- Hard-coded – Overrides the exported data with a single value that you provide for that field in each imported record
- Lookup – Retrieves information from an alternate location to populate fields for import, in one of two lookup types:
  - Dynamic – Searches the destination app’s records to find a match for the exported field
  - Static – Allows you to build a lookup table to replace expected source field values
- Multi-field – Merges multiple fields from the source app into a single field in the destination app
All import types let you specify a default value for missing data.
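As an illustration of the mapping types that need no request to the destination app, the sketch below applies standard, hard-coded, static lookup, and multi-field mappings to one record. All field names and values are hypothetical, and real mappings are configured in integrator.io rather than coded.

```python
def map_record(source):
    """Apply one mapping of each simple type to a source record."""
    status_lookup = {"A": "Active", "I": "Inactive"}  # static lookup table
    return {
        # Standard: 1:1 direct mapping between two fields
        "email": source["email"],
        # Hard-coded: the same value for every imported record
        "source_system": "webstore",
        # Static lookup, with a default value for missing data
        "status": status_lookup.get(source.get("status"), "Unknown"),
        # Multi-field: merge several source fields into one
        "full_name": f'{source["first_name"]} {source["last_name"]}',
    }


mapped = map_record(
    {"email": "pat@example.com", "status": "A", "first_name": "Pat", "last_name": "Lee"}
)
print(mapped)
```

Each of these mappings is resolved from data already in the record or in the mapping definition, with no call to the destination app.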
Simple data mappings (standard, hard-coded, and multi-field) are processed very quickly.
Dynamic lookups are, by definition, more processor- and HTTP request-intensive. For every record in a queue with a dynamic lookup applied to a field, integrator.io must perform a search in the destination app to retrieve the expected value. Although dynamic lookups are cached for later use, to improve performance, multiple dynamic lookups are a common cause of slow flows.
One way to minimize dynamic lookups involves adding a flow to gather the information. For example, you might have an import that dynamically looks up NetSuite customer IDs. Before running that flow, you could sync the customer ID with the source MongoDB internal ID in a new flow, making the import mappings simpler.
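To see why caching helps but does not remove the cost, consider this sketch. The `CachedLookup` class and the in-memory `destination_customers` table are hypothetical stand-ins for a search request against the destination app, and integrator.io's actual caching rules may differ.

```python
class CachedLookup:
    """Wrap a search function and count how many real searches it performs."""

    def __init__(self, search):
        self._search = search
        self._cache = {}
        self.requests = 0

    def __call__(self, key):
        if key not in self._cache:
            self.requests += 1  # one search request to the destination app
            self._cache[key] = self._search(key)
        return self._cache[key]


# Hypothetical destination data: customer email mapped to internal ID
destination_customers = {"acme@example.com": 101, "globex@example.com": 102}
lookup = CachedLookup(destination_customers.get)

emails = [
    "acme@example.com",
    "globex@example.com",
    "acme@example.com",
    "acme@example.com",
    "globex@example.com",
]
customer_ids = [lookup(email) for email in emails]
print(customer_ids, lookup.requests)
```

Five records trigger only two searches here because the values repeat. When every record carries a distinct value, the cache saves nothing and each record costs a request.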
Imports to certain apps require you to select the import Operation and offer additional options such as adding or upserting according to a lookup.
Specifying a lookup to verify whether the record exists does add some measurable overhead to the performance.
If you’re certain that the records are always present, then an add/update lookup might not be worth the expense. Updating the same record time and again, after all, is a classic idempotent operation.
A further Operation setting is the Ignore existing [records] checkbox, which protects against adding a duplicate record. For data integrity, you may need to select this option, but the downside is that an additional transaction is required before the record can be imported.
Filters are designed to exclude records that do not match the criteria that you specify. Any filter that you apply to a destination app can limit the records that will be imported.
Tip: Keep in mind an important distinction between the filters for exported and imported data:
- Export filters remove non-matching records from the flow
- Import filters then pass through matching records from the remaining data only to a particular destination. If there are other destinations downstream, the flow continues to process all of the original records regardless of whether or not they met the filter criteria for a previous import.
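The difference between the two filter types can be modeled in a few lines. This is a simplified sketch with hypothetical record fields and destination names. It mirrors the behavior described in the tip above rather than integrator.io's internals.

```python
def run_flow(records, export_filter, imports):
    """imports is a list of (destination name, import filter) pairs."""
    # Export filter: non-matching records leave the flow entirely
    in_flow = [r for r in records if export_filter(r)]
    delivered = {}
    for name, import_filter in imports:
        # Import filter: limits one destination only; later imports
        # still start from every record remaining in the flow
        delivered[name] = [r["id"] for r in in_flow if import_filter(r)]
    return delivered


records = [
    {"id": 1, "amount": 50, "country": "US"},
    {"id": 2, "amount": 0, "country": "US"},
    {"id": 3, "amount": 75, "country": "DE"},
]

result = run_flow(
    records,
    export_filter=lambda r: r["amount"] > 0,
    imports=[
        ("us_warehouse", lambda r: r["country"] == "US"),
        ("data_warehouse", lambda r: True),
    ],
)
print(result)
```

Record 2 never reaches either destination because the export filter removed it, while record 3 is skipped by the first import but still delivered to the second.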