ETL vs ELT — What, when and why
There's been a lot of discussion in tech industry circles recently about ETL (extract, transform, load) and ELT (extract, load, transform).
These terms and the misunderstandings about which method are not new. However, with a renewed emphasis on collecting and using business data in the cloud to streamline processes and make better decisions, it only makes sense that the discussion has been revived.
What comes first — the T or the L?
ETL and ELT are very closely related, and there's no right or wrong answer to which method organisations should be using. But to understand which method is a better fit, it's important to understand what it means when one letter comes before the other. Let's look at the common definitions of each:
- ETL — Extract, Transform, and Load: With this approach, organisations pull data from one or more sources, but before loading it into their data warehouse or data lake, the next step is to cleanse the data for use. This entails reviewing the data, putting it into the correct categorisations and formats, and ensuring it will line up with existing target databases. The data is then loaded into the target system.
- ELT — Extract, Load, and Transform: This method takes data from one or multiple remote data sources and loads it into a staging area in the data warehouse without looking at or changing any of the data beforehand. Once the data is loaded, organisations would then cleanse and transform the data into the specific target formats after the load, to make it usable by particular programs and team members.
Both are similar, and have the same end goal — the usage of data to improve the way organisations work. That said, the major difference between the two — when to cleanse and transform incoming data — needs to be looked at a bit more in detail, as the organisation's resources to handle each approach will help to dictate which is better.
When to transform data
Organisations can start by looking at the types of data they will be working with. In many cases, reviewing the following variables will inform the decision.
- Data format: If data is unstructured, it does not neatly fit into a relational structure — which most of the analytics tools will work from. In that case, organisations will need to use the ETL approach to re-shape the data to work with a relational format, which the end-user data consumers can then utilise. If the data is already in a relational form, then it can be loaded directly ELT-style into the target, and then potentially massaged into the target form, along with any sort/aggregate/joins and cleansing operations.
- Data size: Datasets come in many different shapes and sizes. This variable influences the ETL vs ELT decision as well. For large datasets, ELT is used so a large amount of data can be processed and transformed simultaneously. Improvements in the speed and power of processing have also made it possible for large datasets to be handled as one unit. Smaller amounts of data are often connected to the ETL approach.
- Cost: Alongside the above, a critically important consideration is how expensive working with data can be. Often, the data has already been landed in the target system, and the ask is to cleanse/format/aggregate the data. If there is only an ETL tool to do that, teams must physically move those bits out of the database, then into an external system to do that processing, only to then move the resulting data back into the database or warehouse. Rather, if the organisation has 'limitless processing' on demand in a cloud data warehouse, it may be beneficial to use the massive parallel processing capability to do what's necessary to the data without moving it around. This is a much more efficient process, with far fewer moving parts to coordinate as well.
- Data source: The source of the data comes into play here as well. What type of application or data source is it coming from? Does that source easily connect to what the organisation is using, or will there be a great deal of transformation work to ensure the data is usable? Is it coming from an on-premises store or the cloud? Another consideration here is that the ELT approach will use set-wise operations on the data, which are inherently very 'batchy'. This may be fine — and even well-suited — for larger volumes of data, but is simply not a good fit if the data is more akin to streaming or messaging. In this case, the ETL style will almost invariably be the better approach.
- Data destination: Closely tied to the source is the question of the destination of the data within the organisation. Does the data source easily connect to it? Are they from different companies with limited connective tissue? Will the data come from one product need to look completely different from being used by the data warehouse solution? ETL is often the preferred method for data with a different source and destination product; while for data that is going from apples to apples, ELT is often used.
- Intensity: This one is more subjective, but the idea is to look at just how much work the data transformation will take to become useful for the team's analysis and decision-making needs. Size comes into play here as well. If the transformation process is less complex, then ELT may be the right choice. If the transformations are more complex, then many organisations choose ETL, so a little is done at a time instead of all at once.
The data transformation takes place in different places depending on the method chosen. In ETL, there's an in-between stage before the data makes it to the warehouse where the transformations are done.
In ELT, the data warehouse does the transformation. ELT only requires raw data from the database to work and requires a great deal more power and overhead to store and transform the data. This, in turn, allows for a shorter time between extracting and using the data, and provides the option for a greater deal of customisation. Because of this, business teams can now quickly build their own data pipelines and immediately see insights that can change the business.
Pick technology tools wisely
There are many variables to consider when deciding to move data into a database with either the ETL or ELT method. For some, the variables may make the choice more of a non-choice.
However, at different points throughout an organisation's journey, one approach may be favoured over the other. Many tools on the market can help teams either transform data before loading, or load data before transforming.
Some solutions can make it easier for organisations to select either approach depending on the types of data and organisational requirements at the specific moment in time, instead of forcing them to choose an ETL or ELT approach forever.
The critical thing to remember is that an approach may, and should, shift over time — so invest wisely in tools that can adapt as your approach does.