Extract Transform Load vs Extract Load Transform

Aparna Bhat
Aug 4, 2020

Today we will talk about the ETL and ELT techniques. Before discussing them, we first need to understand the term Big Data. Big data describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it's not just about the data; it's about what we do with that data.

To understand both techniques, let's take an example: say there is a product company that receives a large amount of data from different sources (a small extraction sketch follows the list). These sources might be:

  1. Data from a MySQL DB (CRM data)
  2. Data from an Oracle DB (core banking data)
  3. Data in JSON format (social media data)
  4. Data in flat-file format (text-file chats and log files)
  5. Data in XML format
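For a feel of what the extract step looks like across these formats, here is a minimal Python sketch; the connection strings, file names, and table names are hypothetical, and pandas with SQLAlchemy is just one possible toolset:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection strings and file paths -- adjust to your environment.
crm_engine = create_engine("mysql+pymysql://user:password@crm-host/crm")
core_engine = create_engine("oracle+cx_oracle://user:password@core-host/?service_name=bank")

crm_customers = pd.read_sql("SELECT * FROM customers", crm_engine)   # 1. MySQL (CRM data)
accounts = pd.read_sql("SELECT * FROM accounts", core_engine)        # 2. Oracle (core banking)
posts = pd.read_json("social_media_posts.json")                      # 3. JSON (social media)
chats = pd.read_csv("chat_logs.txt", sep="\t")                       # 4. flat files (chats, logs)
orders = pd.read_xml("orders.xml")                                   # 5. XML
```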

Now, after getting the data from these different sources, what is the end target?

Target (analysis results from the data):

  1. From the social media data, they want to analyze how many people liked their product posts, to gauge how many people are interested in the product. Out of 10,000 likes, how many came from youngsters, and so on?
  2. From the chat and call-log data: how many people called, which category they dialed, and how much did they spend on calls?
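As a small, hypothetical illustration of that first question (the sample data, column names, and age cut-off are all made up):

```python
import pandas as pd

# Hypothetical sample of the social media data described above.
posts = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "age":     [19, 34, 22, 57],
    "liked":   [True, True, False, True],
})

# Out of all the likes, how many came from youngsters (say, under 25)?
likes = posts[posts["liked"]].copy()
likes["age_group"] = pd.cut(likes["age"], bins=[0, 25, 120],
                            labels=["youngsters", "others"])
print(likes.groupby("age_group", observed=False).size())
```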

ETL (Extract, Transform and Load): This is the traditional technique of extracting raw data from different sources and transforming it into a structured format. Once the data is formatted, the result is stored in a data warehouse. ETL tools such as Informatica, Panoply, and Talend apply a schema and push the resulting data to the data warehouse for analytics.
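A minimal Python sketch of such a pipeline, assuming hypothetical file paths, column names, and warehouse connection details (real ETL tools like the ones above replace this hand-written code):

```python
import pandas as pd
from sqlalchemy import create_engine

# Extract: raw chat and call logs from a flat file (path is hypothetical).
raw = pd.read_csv("chat_logs.txt", sep="\t")

# Transform: clean and reshape the data into the schema the warehouse expects.
calls = (
    raw.dropna(subset=["customer_id", "duration_seconds"])
       .assign(call_minutes=lambda df: df["duration_seconds"] / 60)
       .groupby(["customer_id", "category"], as_index=False)["call_minutes"]
       .sum()
)

# Load: push the structured result into the data warehouse
# (connection string and table name are made up for illustration).
warehouse = create_engine("postgresql+psycopg2://user:password@dwh-host/analytics")
calls.to_sql("fact_call_usage", warehouse, if_exists="append", index=False)
```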

The data warehouse is also referred to as an OLAP (Online Analytical Processing) system. BI tools such as Tableau then connect to the data warehouse to make sense of the data.

Note: the data warehouse has no direct connection to customers (only internal people can access it).


Drawbacks of the ETL process:

  1. ETL tools run on a single machine, which limits how much data they can handle. They can't pull terabytes of data, so as the data grows it becomes impractical to run the ETL tool.
  2. The data extracted from different sources is discrete, so if a company wants to capture real-time data and analyze it for purposes such as customized offers to customers, that isn't possible with ETL.
  3. It is a costlier process and needs high maintenance, because you have to select which data to load and transform.

ELT (Extract, Load and Transform): This is the technique of extracting raw data from the source and storing it directly in a data lake; later, some of the data is transformed as needed for analytical purposes, while the raw data always remains available in the data lake. Now the question arises: what is a data lake? A data lake is a central repository that allows you to store any kind of structured and unstructured data.
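A rough sketch of that load-first, transform-later flow, assuming the data lake is just a local folder and the file names are hypothetical (in practice the lake would be something like S3, ADLS, or HDFS):

```python
import shutil
from pathlib import Path

import pandas as pd

# Load first: copy the raw file into the data lake untouched.
lake = Path("data_lake/raw/social_media")
lake.mkdir(parents=True, exist_ok=True)
shutil.copy("social_media_posts.json", lake / "social_media_posts_2020-08-04.json")

# Transform later, only when a particular analysis needs it.
posts = pd.read_json(lake / "social_media_posts_2020-08-04.json")
daily_likes = posts.groupby("post_date")["liked"].sum()

curated = Path("data_lake/curated")
curated.mkdir(parents=True, exist_ok=True)
daily_likes.to_frame().to_parquet(curated / "daily_likes.parquet")
```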

ELT uses cloud-based data warehousing solutions for all different types of data, including structured, semi-structured, unstructured, and even raw data. ELT loads all data immediately, and users can determine later which data to transform and analyze. Big-data ecosystem tools such as Kafka and Spark Streaming can then be used to analyze the real-time data.
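As a hedged sketch of the real-time side, this is roughly how Spark Structured Streaming might read such events from Kafka; the broker address, topic name, and windowing logic are assumptions, and the spark-sql-kafka connector package has to be available on the cluster:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("realtime-likes").getOrCreate()

# Read the raw event stream from a Kafka topic
# (broker address and topic name are assumptions for illustration).
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker:9092")
         .option("subscribe", "product_likes")
         .load()
)

# Count events per one-minute window and print the counts as they arrive.
counts = (
    events.selectExpr("CAST(value AS STRING) AS value", "timestamp")
          .groupBy(window(col("timestamp"), "1 minute"))
          .count()
)

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```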

The key difference between the two is that ETL extracts data and loads it first into a staging server and then into the target system, whereas ELT loads data directly into the target system. Moreover, ETL is easier to implement, whereas ELT requires niche skills to implement and maintain.


Conclusion: ETL is used to process small amounts of data, whereas ELT is used to process large amounts of data. The major drawback of the ETL architecture is that once the data is stored in the data warehouse we can't modify it, whereas in the ELT architecture you transform the data as per the user's needs and the original copy always remains in the data lake.

Thank you for reading! Have a great day :)
