0:00
/
0:00

Database Internals Explained: Storage, ACID, Concurrency Control & More

+ FREE access to my video courses (available for a limited time)

Databases quietly keep everything in check while your favorite apps, websites, and services dazzle you on the surface. Let’s peel back the curtain and take a closer look at what’s happening behind the scenes in a database.

In this post, I’ll dive into the database internals — a fascinating mix of algorithms, data structures, and clever engineering. Don’t worry, I’ll keep it lighthearted and digestible.

TekForge is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

The Foundation: What Makes a Database Tick?

At its core, a database is a sophisticated system for storing, retrieving, and managing data.

But it’s not just a big digital filing cabinet. No, databases are way smarter. They’ve got layers of systems working in harmony to ensure your data is safe, accessible, and consistent.

1. Storage Engine

The storage engine is the brain behind where and how your data lives. It’s like the basement of a massive library. The books (your data) are shelved in a specific order to make retrieval as efficient as possible.

Whether it’s a magnetic hard drive or a flashy SSD, the storage engine handles how data gets written to and read from disk.

  • Indexes are like the card catalog in that library. They make finding your data way faster.

  • Pages and blocks break up your data into chunks that are easier to manage and retrieve.

  • The storage engine uses clever tricks like caching, so the most frequently accessed data sits closer to the surface, ready to be grabbed in an instant.

2. Query Processor

Ever asked your favorite search engine a question and marveled at how fast it spits out an answer?

That’s the query processor in action. You ask a question in SQL (the database’s language), and the query processor translates it into something the database can understand.

  • Parsing checks that your question is valid and follows the rules of SQL.

  • Optimization is where the query processor gets crafty. It figures out the fastest way to answer your question by looking at available indexes, table structures, and more.

  • Execution is when the plan comes to life, fetching or updating the data you’re after.

3. Transaction Manager

A transaction is like a promise. You tell the database to do something, and it either does it completely or doesn’t do it at all. There’s no in-between.

ACID properties are the gold standard here:

  • Atomicity is all or nothing. If one part of a transaction fails, the whole thing rolls back.

  • Consistency means our data stays in a valid state.

  • With Isolation, transactions don’t step on each other’s toes.

  • Durability means once a transaction is complete, it’s written to disk and safe from power outages or system crashes.

The transaction manager is the referee ensuring these rules are followed.

4. Concurrency Control

Databases often juggle multiple users at once. Imagine you’re at a coffee shop and everyone is ordering their favorite latte simultaneously. The barista needs a system to manage the chaos. Concurrency control does this for databases, ensuring simultaneous actions don’t trip over each other.

  • Locking is a popular strategy. Think of it as reserving a book in the library while you’re reading it.

  • Multiversion Concurrency Control (MVCC) takes a more chill approach. It lets you read a snapshot of the data while someone else might be updating it, ensuring everyone stays happy.

Database Internals: Database Layers
Database Internals: Database Layers (image by author)

Digging Deeper: The Magic of Database Internals

Okay, now that you’ve got the basics, let’s nerd out a bit. Here’s where things get even cooler.

1. Data Structures Galore

Databases lean on some seriously clever data structures to keep things running smoothly.

  • B-trees and B+ trees are the bread and butter of indexing. They keep your data sorted and make it quick to find what you need.

  • Hash tables are great for quick lookups, like finding a single record with a unique identifier.

  • Log-structured merge (LSM) trees shine in write-heavy workloads, making them a favorite in modern NoSQL databases.

2. Write-Ahead Logging (WAL)

WAL is the insurance policy for your data. Before a change is made to the database, it’s first written to a log. If something goes wrong (like a power outage), the database can replay the log to recover.

3. Sharding and Partitioning

When your database grows too big for a single machine, you split it up. This is called sharding or partitioning. It’s like dividing a massive warehouse into sections, each handling a specific range of data.

  • Horizontal partitioning (sharding) splits data across multiple machines based on a key, like user ID.

  • Vertical partitioning splits data by columns, keeping frequently accessed columns separate from rarely used ones.

4. Caching Layers

Caching is like keeping your most-used tools right on your desk instead of walking to the toolbox every time. Databases use caches to store hot data in memory for lightning-fast access. Tools like Redis and Memcached are pros at this.

5. Replication

Replication creates copies of your database for backup and high availability. If one server goes down, another steps in seamlessly. This is the cornerstone of fault-tolerant systems.

  • Master-slave replication has a primary server (master) and one or more secondary servers (slaves).

  • Multi-master replication allows updates on any server, syncing changes across the board.

6. Query Planning and Optimization

The query optimizer is a database’s secret weapon. It’s like a GPS for your SQL queries, finding the shortest route to your data.

  • It considers factors like index availability, join types, and table size.

  • It uses algorithms like dynamic programming to explore multiple execution plans and pick the best one.

Database Internals: Query Planning and Optimization
Database Internals: Query Planning and Optimization (image by author)

7. Background Tasks

Databases are busy even when you’re not looking. They run background tasks to keep things in order:

  • Vacuuming cleans up old data and reclaims space, especially in MVCC systems.

  • Checkpointing ensures changes are periodically flushed from memory to disk.

  • Statistics gathering helps the query optimizer make smarter decisions.

Scaling the Heights: Distributed Databases

When your database needs to handle the big leagues — think global-scale apps like Netflix or Spotify — it enters the world of distributed systems. This adds a new layer of complexity and excitement.

1. Consistency vs. Availability (CAP Theorem)

Distributed systems often juggle consistency and availability. The CAP theorem says you can’t have it all when there’s a network partition:

  • Consistency: Everyone sees the same data.

  • Availability: The system is always responsive.

  • Partition tolerance: It keeps going despite network hiccups.

Databases like Cassandra favor availability, while others like MongoDB strike a balance.

2. Consensus Algorithms

Consensus algorithms help distributed databases agree on a single truth.

Paxos and Raft are two popular approaches. They ensure that even if some nodes fail, the database can still make consistent decisions.

3. Eventual Consistency

In distributed systems, eventual consistency means all copies of the data will align over time. It’s like syncing your phone’s photos to the cloud — it might not be instant, but it happens eventually.

Wrapping Up

Databases aren’t just storage boxes; they’re intricate systems designed with layers of thought and innovation.

From the humble B-tree to the ambitious distributed systems, every part of a database is a story of engineering brilliance. Whether it’s handling your app’s user data or supporting a global e-commerce empire, databases are the backbone of software systems.

Now, every time you hit “Save” or run a query, you’ll know about the magic happening behind the scenes.

So next time someone calls databases boring, you can school them with your newfound knowledge — and maybe, just maybe, they’ll be as fascinated as you and I are.


This article was originally published here: Database Internals Explained: The Invisible Engine Behind Every App You Use


Further Reading and Viewing


🎁 Special Gift for You

I’ve got a couple of great offers to help you go even deeper. FREE access to my video courses - available for a limited time, so don’t wait too long!

Until next time—stay curious and keep learning!

Best,

Rakia


Want more?

💡 🧠 I share content about engineering, technology, and leadership for a community of smart, curious people. For more insights and tech updates, join my newsletter and subscribe to my YouTube channel.

Discussion about this video