Learn how to design a blob storage system like S3 in interviews — covering partitioning, metadata separation, replication, and the 10 mistakes that cost candidates the offer.
Here's the thing — blob storage is one of those system design questions that feels deceptively simple at first. "Just store some files, right?" And then the interviewer asks how you'd handle 10 billion objects across thousands of nodes, and suddenly you're staring at the whiteboard like a deer in headlights.
I've seen this exact moment happen in hundreds of interviews. Candidates who crushed LeetCode and could recite CAP theorem cold — completely froze when asked to design something like AWS S3 or Azure Blob Storage.
The good news? Once you understand the core patterns, blob storage becomes one of the most predictable system design interviews out there. The same concepts come up every single time. This guide is going to walk you through all of them, including the 10 mistakes I see most often and exactly how to avoid them.
Before we dive in, let me give you some insider context. When an interviewer asks you to design blob storage, they're not just checking if you know what S3 is. They're looking for specific signals:
A strong answer hits all four. A weak answer treats the system like a file server that just needs "a load balancer and a database."
Let's build the mental model before we go deep. At a high level, every blob storage system does five things:
That separation between metadata and data in step 3? That's the single most important concept in this entire interview. Write it down. We'll come back to it.
When a client uploads a blob, here's what happens:
```
Client → API Gateway → Load Balancer → Frontend Server
        ↓
Auth + Quota Check
        ↓
Metadata Service (assigns storage location)
        ↓
Data Node (blob is written)
        ↓
Metadata Updated (blobId → node location)
```

And when a client downloads a blob:

```
Client requests blob by key
        ↓
Metadata Service looks up location
        ↓
Frontend redirects or proxies to correct Data Node
        ↓
Blob is streamed back to client
```

When you explain this in an interview, walk through both flows explicitly. Interviewers love when candidates can narrate a complete request lifecycle — it shows you've actually thought about the system end-to-end.
Here's what the interviewer is really checking when they ask "how would you store billions of objects?": do you understand sharding?
Blob storage works by distributing data across many nodes. The partition key determines which node stores a given blob. The most common approach is consistent hashing:
```python
import hashlib

# Simplified partition key assignment
def get_partition(blob_key: str, num_partitions: int) -> int:
    hash_value = hashlib.md5(blob_key.encode()).hexdigest()
    return int(hash_value, 16) % num_partitions

# Usage
blob_key = f"{account_id}/{container_id}/{blob_id}"
partition = get_partition(blob_key, num_partitions=1024)
```

Here's a common trap: candidates confuse the blob ID with the partition key. The blob ID identifies the blob. The partition key locates it. These are different things. A blob might have the ID img_abc123, but it lives on partition 47 — and you find partition 47 by hashing the blob key, not by reading the ID.
Why does this matter? Sequential keys (like auto-incrementing integers) create hotspots — all new uploads pile onto the same partition. You fix this with hashing or by prefixing keys with a random salt.
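To make the salting fix concrete, here's a minimal sketch: prefixing a sequential ID with a few characters of its own hash so consecutive uploads no longer cluster on one partition. The helper name and prefix length are illustrative, not from any particular SDK.

```python
import hashlib

def salted_key(sequential_id: int, prefix_len: int = 4) -> str:
    """Prefix a sequential ID with a short hash so consecutive
    uploads spread across partitions instead of piling on one."""
    salt = hashlib.md5(str(sequential_id).encode()).hexdigest()[:prefix_len]
    return f"{salt}/{sequential_id}"
```

Keys `1001` and `1002` now start with unrelated prefixes, so a range- or prefix-partitioned store no longer sees every new write land at the tail.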
This is the architectural insight that separates intermediate candidates from senior ones.
| Property | Metadata Store | Data Store |
|---|---|---|
| Size | Small (bytes per record) | Large (MBs to GBs per blob) |
| Access Pattern | Random reads, fast lookup | Sequential reads, high throughput |
| Storage Type | Relational DB or KV store | Distributed object storage |
| Consistency Needs | Strong | Eventual is usually fine |
The metadata service stores things like: blob name, size, content type, storage location, version, checksum, and access controls. The data nodes store the actual binary content.
Why split them? Because a 10-byte metadata lookup and a 4GB video download have completely different performance characteristics. Treating them the same way is like storing your index cards in the same drawer as your TV.
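A metadata record covering the fields listed above can be sketched as a small dataclass — the exact schema and field names here are illustrative, not S3's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class BlobMetadata:
    """One record in the metadata store; tens of bytes per blob."""
    blob_name: str
    size_bytes: int
    content_type: str
    storage_nodes: list[str]   # replica locations, e.g. ["node-12", "node-47", "node-83"]
    version: int
    checksum: str              # e.g. hex digest of the blob's content
    acl: dict = field(default_factory=dict)  # access controls
```

Note how nothing here is the blob itself — the record is a pointer plus bookkeeping, which is exactly why it can live in a fast, strongly consistent KV store.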
This is non-negotiable in interviews. If you don't mention replication, the interviewer loses confidence immediately.
The standard approach: every blob is replicated 3 times, ideally across different failure domains (machines → racks → availability zones). This is how systems like S3 achieve "11 nines" of durability (99.999999999%).
When a blob is written, you typically need at least 2 out of 3 replicas to acknowledge before confirming success to the client. This is your write quorum.
Here's the thing most people miss — you can't upload a 5GB file as a single HTTP request. Networks fail. Connections drop. You need multipart upload:
```python
# Multipart upload flow (storage_client is a hypothetical SDK wrapper)
def multipart_upload(file_path: str, blob_key: str, chunk_size_mb: int = 8):
    # Step 1: Initiate upload, get upload_id
    upload_id = storage_client.initiate_multipart_upload(blob_key)
    parts = []
    chunk_size = chunk_size_mb * 1024 * 1024
    with open(file_path, 'rb') as f:
        part_number = 1
        while chunk := f.read(chunk_size):
            # Step 2: Upload each part independently (retryable)
            etag = storage_client.upload_part(
                blob_key, upload_id, part_number, chunk
            )
            parts.append((part_number, etag))
            part_number += 1
    # Step 3: Commit all parts as a single blob
    storage_client.complete_multipart_upload(blob_key, upload_id, parts)
```
The key insight: each chunk can be retried independently if it fails. The client doesn't restart the entire upload — just the failed chunk. This is what makes large file uploads reliable over flaky networks.
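That per-chunk retry can be sketched as a small wrapper with exponential backoff; `upload_fn` stands in for whatever single-part call the SDK exposes:

```python
import time

def upload_part_with_retry(upload_fn, part_number, chunk, max_attempts=3):
    """Retry one chunk independently; already-uploaded parts are untouched."""
    for attempt in range(max_attempts):
        try:
            return upload_fn(part_number, chunk)  # returns an etag on success
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up on this part only; the upload can resume later
            time.sleep(2 ** attempt)  # back off before retrying: 1s, 2s, ...
```

Because each part carries its own part number and etag, the server can reassemble the blob no matter which parts needed retries or in what order they arrived.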
For hot content (profile pictures, popular videos), you put a CDN in front. The CDN caches blobs at edge nodes close to users, dramatically reducing latency and offloading your storage nodes.
For partial downloads (think: video seeking), support range reads via HTTP Range headers: Range: bytes=1048576-2097151. This lets a video player fetch only the section of a file it needs — you don't have to download a 2GB file to watch minute 47.
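Building that header for fixed-size chunks is simple arithmetic (end offset is inclusive); a minimal sketch, with the 1 MiB chunk size chosen to match the example above:

```python
def range_header(chunk_index: int, chunk_size: int = 1024 * 1024) -> str:
    """HTTP Range header for the Nth fixed-size chunk (0-based, end inclusive)."""
    start = chunk_index * chunk_size
    end = start + chunk_size - 1
    return f"bytes={start}-{end}"
```

`range_header(1)` produces exactly the example above, `bytes=1048576-2097151`; a server that supports ranges answers with `206 Partial Content` and just that slice.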
Don't just say "add rate limiting." Explain the layers:
A single global rate limiter is a bottleneck and a single point of failure. Layered rate limiting is the right answer.
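A common building block at each of those layers is a token bucket. Here's a minimal single-node sketch; a distributed deployment would keep these counters in a shared store such as Redis rather than in-process:

```python
import time

class TokenBucket:
    """Allow `rate` requests/sec on average, with bursts up to `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each frontend server runs its own buckets (per account, per container), so no single limiter sits in the hot path for every request.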
If the interviewer asks "how do you list all blobs in a bucket with a billion objects?", do NOT say offset pagination. Here's why:
to serve a page at offset N, the database must scan and discard the first N rows — that's O(N) work per page, and it gets slower the deeper you paginate.

Instead, use continuation tokens: an opaque cursor that encodes the last-seen position. The system uses this to resume from exactly the right place, regardless of what changed in between.
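A minimal sketch of cursor pagination over a sorted key list (an in-memory stand-in for the metadata store; a real implementation does an indexed seek, not a scan):

```python
import base64

def list_blobs(sorted_keys, token=None, page_size=3):
    """Return one page of keys plus an opaque continuation token
    encoding the last key returned (None when exhausted)."""
    start_after = base64.b64decode(token).decode() if token else ""
    page = [k for k in sorted_keys if k > start_after][:page_size]
    next_token = base64.b64encode(page[-1].encode()).decode() if page else None
    return page, next_token
```

Because the token names a position rather than an offset, inserts and deletes between page fetches can't shift or duplicate results — each page resumes strictly after the last key seen.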
When you draw this on the whiteboard, hit these components in order:
```
[Client]
   ↓
[CDN]  ← (cache hot blobs at edge)
   ↓
[API Gateway]  ← (auth, rate limiting, routing)
   ↓
[Load Balancer]
   ↓
[Frontend Servers]  ← (quota checks, request validation)
   ↓                       ↓
[Metadata Service]    [Data Nodes (Partitioned)]
[e.g., Cassandra,     [e.g., Shard 1, Shard 2...N]
 distributed KV]           ↓
                      [Replication Workers]
                      [Background GC / Cleanup]
```

Start simple, then layer on the CDN, background workers, and replication. Never jump to the full diagram immediately — build it incrementally. This shows structured thinking.
Let me be direct about the most common failure modes I see:
One classic example: assuming that because you have the blobId, you know where the data is. You don't — you still need the metadata service to map that ID to a physical node.

Here's the exact phrasing I coach candidates to use:
Opening the design:
"Before I start drawing, let me clarify the requirements. Are we optimizing for read-heavy or write-heavy workloads? What's the expected object size distribution? Do we need strong consistency, or is eventual consistency acceptable for reads?"
Introducing partitioning:
"To handle this at scale, I'd partition blobs across nodes using a hash of the blob key. This gives us uniform distribution and avoids the hotspot problem you'd get with sequential keys."
Explaining metadata separation:
"One important design decision is separating the control plane from the data plane. Metadata — things like blob location, size, and checksum — lives in a fast, consistent store like Cassandra or a distributed KV. The actual binary data lives on storage nodes optimized for high throughput. This separation lets us optimize each layer independently."
Handling the follow-up on failures:
"If a storage node fails, our health check service detects it within seconds. We then trigger background re-replication — any blobs that were on that node get copied to healthy nodes to restore our target replication factor of 3."
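The repair step in that answer can be sketched as a scan for under-replicated blobs; the bookkeeping structures here are illustrative (blob → replica-node map, plus the current healthy-node set):

```python
def find_under_replicated(blob_replicas, healthy_nodes, target=3):
    """Return {blob_id: copies_needed} for blobs below the replication target.
    `blob_replicas` maps blob_id -> list of node names holding a copy."""
    repairs = {}
    for blob_id, nodes in blob_replicas.items():
        live = [n for n in nodes if n in healthy_nodes]
        if len(live) < target:
            repairs[blob_id] = target - len(live)
    return repairs
```

After a node failure, every blob that had a copy there shows up needing one new replica; replication workers then copy from the surviving replicas to healthy nodes until the map comes back empty.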
The interviewer will almost certainly ask at least one of these:
These are the things that make interviewers quietly move a candidate to the "no" pile:
Here's what to burn into memory before your interview:
Nail these seven points and you'll be in the top 10% of candidates on this question. The interviewer doesn't expect you to design the next S3 from scratch — they expect you to reason clearly about scale, tradeoffs, and failure. That's exactly what this framework gives you.