What is data ingestion?

Data integration, at a high level, refers to requesting data from one or more systems and uploading it to other systems programmatically. Data integration tools specialize in combining data from multiple sources for analysis, reporting, and operations.

While data integration typically focuses on bulk data loading, it resembles application integration, such as Celigo platform flows, which “connect” applications. Application integration is designed to give you maximum flexibility in modifying exported data so that it complies with business requirements and reflects correct, up-to-date information in other programs.

Data integration tools restructure source data to fit the requirements of the destination system, in the following ways:

  • Compile data from multiple sources into a rationalized object

  • Harmonize field values

  • Restructure data to support optimized querying

  • Create smart aggregates
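
As a rough illustration of the restructuring steps listed above, the following Python sketch combines records from two sources into one object per customer, harmonizes a field value, and computes a simple aggregate. The record layouts, the `COUNTRY_ALIASES` mapping, and the function names are assumptions made for this example; they do not describe how any particular tool works.

```python
from collections import defaultdict

# Hypothetical records from two source systems (illustrative data only).
crm_rows = [
    {"customer_id": "C1", "country": "usa", "name": "Acme"},
    {"customer_id": "C2", "country": "U.S.", "name": "Globex"},
]
billing_rows = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C1", "amount": 80.0},
    {"customer_id": "C2", "amount": 50.0},
]

# Assumed alias table used to harmonize inconsistent country values.
COUNTRY_ALIASES = {"usa": "US", "u.s.": "US", "united states": "US"}


def harmonize_country(value):
    """Map known spellings to one code; otherwise return the value upper-cased."""
    cleaned = value.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned.upper())


def build_customer_objects(crm, billing):
    """Compile both sources into one record per customer with a billing total."""
    totals = defaultdict(float)
    for row in billing:
        totals[row["customer_id"]] += row["amount"]  # aggregate per customer

    combined = []
    for row in crm:
        combined.append({
            "customer_id": row["customer_id"],
            "name": row["name"],
            "country": harmonize_country(row["country"]),
            "total_billed": totals[row["customer_id"]],
        })
    return combined


customers = build_customer_objects(crm_rows, billing_rows)
```

Here `customers` holds one record per customer, each with a harmonized `country` value and a `total_billed` aggregate, which is a shape that is typically easier to query than the separate source tables.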

Data integration generally follows one of two approaches:

  • Extract, transform, load (ETL): Data is extracted from a source system, transformed within the data integration technology into a format that supports analytical workloads, and then loaded into the target warehouse.

  • Extract, load, transform (ELT): Data is extracted from a source system and then loaded in its native (or close to native) form. Once loaded, the data is transformed into a structure that supports analytical workloads.
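
The difference between the two approaches comes down to where the transformation runs. The sketch below contrasts them using Python's built-in `sqlite3` module as a stand-in for a warehouse. The table names, sample rows, and cleanup rules are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical rows extracted from a source system (illustrative only).
source_rows = [("o1", "  alice ", "19.99"), ("o2", "Bob", "5.00")]


def transform(row):
    """Clean one row in application code (the 'T' in ETL)."""
    order_id, customer, amount = row
    return (order_id, customer.strip().upper(), float(amount))


def run_etl(conn, rows):
    """ETL: transform before loading, so only the cleaned rows reach the warehouse."""
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?)", [transform(r) for r in rows]
    )


def run_elt(conn, rows):
    """ELT: load the raw rows first, then transform inside the warehouse with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id,
               UPPER(TRIM(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM orders_raw
        """
    )


conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse
run_etl(conn, source_rows)
run_elt(conn, source_rows)

etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
```

Both paths end with the same cleaned rows. In the ELT path the raw `orders_raw` table also remains in the warehouse, so the transformation can be revised and rerun later without extracting the data again.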

Data ingestion, such as in a Celigo sync, is the process of exporting data from a variety of sources (such as applications, data stores, and files) and loading it into an analytics store (such as a data warehouse or data lake) so that it can be accessed and analyzed.

Think of data ingestion as a subset of the broader data integration market. Tools in this space focus only on the “extract” and “load” parts of the data integration model. In doing so, data ingestion tools follow the ELT pattern more closely: they assume that once the extract and load work is done, some other technology can be applied to handle the transformation requirements.

Data ingestion is appealing enough to earn a separate spot in the modern integration stack because it accelerates solving the core problem. With traditional data integration tools, loading a dozen tables into the warehouse would take weeks of development time. Because the extract/load pattern is simplified and standardized, data pipelines can be configured to accomplish the design, implementation, and ingestion in a fraction of the time.