Beyond the Interview: Data Structures (#4 Hash Tables & Bloom Filters)

Speed Demons With Trust Issues. You’ve probably rattled off a definition for both of these data structures in an interview before. But have you ever actually used them outside of Leetcode? If not, buckle up buttercup.

Beyond the Interview: Data Structures (#4 Hash Tables & Bloom Filters)

About This Series

We’ve all been there—grinding LeetCode, memorizing tree traversals, and pretending to care about Red-Black Trees because some Big Tech interviewer insists they’re “fundamental.”

But here’s the reality: 99% of the time, you’re just using Lists, Arrays, and HashMaps—and that’s fine.

Read more

So why bother with the rest? Because sometimes the default isn’t the best choice.

  • Maybe a different data structure completely revolutionizes your system.
  • Maybe it opens up new possibilities you hadn’t considered.
  • Maybe it’s the key to removing that one persistent bottleneck that refuses to budge no matter how many CPU cores you throw at it.

Some data structures solve real-world problems, while others exist purely to torture CS students. In this series, we’ll cut through the nonsense—breaking down:
When data structures actually matter
What trade-offs they introduce
Where they fit into practical software development

No fluff. No pointless academic exercises. Just real-world insights, benchmarks, and maybe a few laughs along the way.

Bonus: In your next tech interview, you’ll be able to talk about the pros and cons of Bloom Filters and Suffix Trees as easily as reversing an array—and that will make you stand out.

I mention some internal workings related to performance (stack vs heap, garbage collection, etc). If you’re not up to speed, check out Dig Deeper: Memory Management in .NET].

Explore This Series

Arrays & Lists | Linked Lists | Stacks & Queues | Hash Tables & Bloom Filters


🧠 Hash Tables – The Workhorse with a Panic Room

📚 What Are They?

A hash table (or hashmap) stores key-value pairs and uses a hash function to map keys to indices in an array.

🗣️ Explain Like I'm 5: Imagine a bunch of labelled mailboxes. Each mailbox has a number on it. When you want to store something, you write your name on an envelope and use a magic machine (the hash function) to turn your name into a number. That number tells you which mailbox to use. Later, you want to find your envelope? Just hash your name again, and you get the same number. Easy peasy—unless someone else hashed to the same box. Uh-oh.

🤺 What Happens in a Hash Collision?

Hash collisions happen when two different keys get mapped to the same slot in the array. This is inevitable with a finite number of slots and a potentially infinite number of keys (pigeonhole principle, anyone?).

Example: Let's say we have a hash function that gives us an index by doing hash(key) % 10, and we want to store these two keys:

  • key = "banana" → hash = 14 → index = 4
  • key = "apple" → hash = 24 → index = 4

Both keys want to live in mailbox 4. What now?

Collision strategies:

  • 🧵 Chaining – We store a list at each slot. So mailbox 4 holds both “banana” and “apple” in a mini-list. When we retrieve, we search the list.
  • 🪑 Open Addressing – We try the next mailbox (and the next, and the next...) until we find a free one. When retrieving, we follow the same trail using the hash rules.

⚙️ Performance Tuning: Load Factor & Resizing

  • 📏 Load Factor – Ratio of items stored to total capacity. A load factor of 0.7 means 70% full. Too full = lots of collisions.
  • 🏗️ Resizing – Double the table size and rehash everything. It's expensive (O(n)) but necessary for performance. (Check out my post on [Arrays & Lists](https://ciarancodes.com/beyond-the-interview-data-structures-1-arrays-lists/) for a deeper look at array resizing.)

🧰 Real-World Example: Redis

Redis is like the VIP lounge of hash tables. It uses them internally for things like sets, dictionaries, and even keys in the key-value store itself.

Example: Let’s say we want to store session data for users:

SET user:12345 "{ name: 'Bob', loggedIn: true }"

Redis takes the key user:12345, hashes it, and stores the value in a hash table. When you run:

GET user:12345

It hashes the key again and checks the table for that hash. Fast. Clean. No SQL in sight. It’s why Redis is used for caching, queuing, rate limiting, and more.

🧠 Memory Considerations

  • Pre-allocating too much = wasteful.
  • Under-allocating = constant resizing.
  • Keys and values themselves take space too (especially strings).

🧑‍💻 When to Use Hash Tables

  • 🔧 You need fast access by key.
  • 🧳 You don’t care about order.
  • 🔍 You want O(1) average case performance.

Examples:

  • Config values
  • Caches
  • Deduplication
  • Indexing (e.g., search autocomplete)

🪄 Bloom Filters – The Lying Librarian

📚 What Are They?

A probabilistic data structure used to test whether an element is possibly in a set or definitely not.

🗣️ Explain Like I'm 5: Imagine you have a bunch of light switches (bits). When someone tells you their name, you flick a few switches based on that name. Later, if someone asks if you know them, you look at those same switches. If any are off, you say “Nope, never met.” If they’re all on? “Maybe I know them.”

I know - sounds pointless, right? Like some academic exercise with no real-world use. But bear with me.

⚙️ How It Works

  • ➕ Hash the input with multiple functions.
  • 💾 Set multiple bits in a fixed-size bit array.
  • ❓ To check, rehash and check those bits.

Important: If any of the bits are 0, the item is definitely not in the set. If all are 1, the item might be there.

✅ Pros

  • 💾 Tiny memory footprint.
  • ⚡ O(1) insert and lookup.

❌ Cons

  • 🤥 False positives.
  • 🧼 Cannot delete without a fancier version (Counting Bloom Filters).

🧰 Real-World Use Case: Caching at Scale

Imagine your service handles 1 million requests per second. You want to know if a key exists in cache before you waste time hitting disk or database.

  • Use a Bloom filter to quickly say, “This key might be in cache.”
  • If it says no → skip the cache (saves a lookup).
  • If it says maybe → check the cache.

Used by systems like:

  • ✏️ Spell checkers (to skip expensive full dictionary scans).
  • 🗄️ Bigtable / Cassandra (to avoid unnecessary disk reads).
  • 🌐 CDNs / proxies (to check if a page was previously fetched).

🧑‍💻 When to Use Bloom Filters

  • Memory is limited
  • Accuracy isn't critical
  • Read performance > write/delete complexity

So if anyone asks you about bloom filters, you can tell them you might know about them, or you definitely don't.


🥊 Hash Table vs Bloom Filter: FIGHT!

FeatureHash TableBloom Filter
⚡ SpeedFastBlisteringly fast
🧠 SpaceModerate to highVery low
🎯 AccuracyExactProbabilistic (false positives)
🗑️ Can Delete?YesNot easily
🔧 Use CaseStore & retrieve real dataTest if data might exist

⚠️ Gotchas That Will Trip You Up

  • 🧨 Bad hash functions ruin everything. Too many collisions = your hash table is just a slow, sad list.
  • Hash table resizing can spike CPU. Watch out in latency-sensitive systems.
  • 🤥 Bloom filters lie. Use with caution when accuracy matters.

🎤 Final Thoughts

Hash tables are the Swiss Army knife of software dev. Bloom filters are their shady cousin who probably knows a guy. With this article, you should now:

  • Understand how collisions happen and how they’re handled.
  • Know why Redis and friends love hash tables.
  • Recognize when a Bloom filter is the best bang-for-byte option.
  • Be confident enough to reach for these tools in your own code.

Next time someone asks, “Why’d you use a hash table there?” you can look them in the eye and say:

“Because it’s fast, reliable, and fits in memory better than your last design doc.”