
Data Engineering Primer - Part 1

By Eugene Venger


Lately, I’ve been reading a great book called “Fundamentals of Data Engineering: Plan and Build Robust Data Systems”.

I’ve made my way through a third of the book, and it has been so insightful and full of knowledge that I decided to put on my blogger hat and share the most interesting bits. After all, sharing is caring.


WTF Is Data Engineering

Let’s boil it down to some shared definitions.

I really like how Joe Reis puts it: we take in raw data and turn it into information that can be used for analysis and machine learning. The way data engineering intersects with so many other disciplines is why I love this field and feel so enthusiastic about it.

Data Engineering Lifecycle

Data Engineering Cycle

Here are the stages of the data engineering lifecycle:

  1. Generation (source systems)
  2. Storage
  3. Ingestion
  4. Transformation
  5. Serving

The data engineering lifecycle starts by getting data from source systems (anything from websites to IoT devices) and storing it. Next, we transform the data and then move on to our main goal: serving data to analysts, data scientists, ML engineers, and others. In reality, storage occurs throughout the lifecycle as data flows from beginning to end. That is why the diagram shows the storage “stage” as a foundation that underpins the other stages.

In practice, the storage, ingestion, and transformation stages often overlap and blur together. That is OK.
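To make the flow a bit more concrete, here is a minimal Python sketch of the lifecycle. It is my own illustration, not something from the book: the function names (`ingest`, `transform`, `serve`, `run_pipeline`), the plain `storage` dictionary, and the `user`/`amount` record shape are all assumptions. Notice that storage is touched by more than one stage, which is the "foundation" idea from the diagram.

```python
from collections import defaultdict


def ingest(source_records, storage):
    """Ingestion: copy raw records from a source system into storage untouched."""
    storage["raw"] = list(source_records)
    return storage["raw"]


def transform(raw_records, storage):
    """Transformation: drop records without a user and cast amounts to float."""
    cleaned = [
        {"user": r["user"], "amount": float(r["amount"])}
        for r in raw_records
        if r.get("user")
    ]
    storage["clean"] = cleaned
    return cleaned


def serve(clean_records):
    """Serving: aggregate total amount per user for analysts to consume."""
    totals = defaultdict(float)
    for r in clean_records:
        totals[r["user"]] += r["amount"]
    return dict(totals)


def run_pipeline(source_records):
    """Run the stages in order; storage (a hypothetical dict) underpins them."""
    storage = {}
    raw = ingest(source_records, storage)
    clean = transform(raw, storage)
    return serve(clean), storage
```

Calling `run_pipeline` with a list of dictionaries such as `{"user": "ann", "amount": "10.5"}` returns a per-user report along with the storage dictionary, so you can inspect both the raw and the cleaned layers. A real system would swap the dictionary for object storage or a warehouse, but the shape of the flow stays the same.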

Data Engineering Cycle - Maturity Stage

Data Maturity Model

Data maturity is the progression toward higher data utilization, capabilities, and integration across the organization, but data maturity does not simply depend on the age or revenue of a company. An early-stage startup can have greater data maturity than a 100-year-old company with annual revenues in the billions. What matters is the way data is leveraged as a competitive advantage.

The data maturity model has three stages: starting with data, scaling with data, and leading with data.

Stage 1: Starting with data

When a company first begins working with data, it’s at the beginning of its data journey. At this point:

The main goal for data engineers at this stage is to move quickly and show the value of data.

Most people in the company don’t really understand how to use data effectively yet, but they want to. Reports and analyses are usually done on the fly, without much planning.

It’s tempting to jump into machine learning right away, but it’s not recommended. Many teams struggle when they try ML before they have a good data foundation. It’s possible to get some wins with ML at this stage, but it’s rare.

What Data Engineers Should Focus On

  1. Get support from company leaders. Try to find someone who will back your efforts to build data systems.
  2. Plan out the data systems (you’ll probably do this alone). Figure out what the company wants to achieve with data and design systems to support those goals.
  3. Find and check the data that will help with important projects.
  4. Build a solid base for future data work. You might need to do some analysis and reporting yourself until more people are hired.

Tips for This Stage

Remember, this is a tricky stage with many potential problems. Stay focused on providing value and building a strong foundation for future data work.

Stage 2: Scaling with data

A company has now established formal data practices and moved beyond ad hoc data requests. The next challenge is building scalable data systems and planning for a truly data-driven future. Data engineering roles shift from generalists to specialists, each focusing on specific parts of the data lifecycle.

In stage 2 of data maturity, a data engineer’s goals are to:

Key issues to be aware of include:

Stage 3: Leading with data

At this stage, the company is truly data-driven. Automated systems and pipelines built by data engineers enable self-service analytics and machine learning for everyone. New data sources can be added easily, providing clear value. Data engineers ensure data is always available through proper controls and practices. Their roles continue to become more specialized.

In stage 3 of data maturity, data engineers will:

Key issues to watch out for include:

This is it for now. In the next parts, I’d like to publish more technical nuances. Feel free to email or text me with your suggestions!

