Random Pick with Weight: Ace LeetCode 528 in Interviews
Learn how to solve Random Pick with Weight using prefix sums and binary search — plus exactly what interviewers expect and the mistakes that tank candidates.
Loading...
Learn how to solve Random Pick with Weight using prefix sums and binary search — plus exactly what interviewers expect and the mistakes that tank candidates.
When an interviewer drops this problem in front of you, they're not actually testing whether you know how Random works. Here's what they're really checking: can you map a probability problem into a data structure problem? That mental shift — from "this is about randomness" to "this is about ranges and search" — is the entire signal they're looking for.
A strong candidate hears this problem and within 30 seconds says something like, "I'm thinking prefix sums to convert weights into ranges, then binary search to pick the right index." A weak candidate starts talking about shuffling arrays or building a giant repeated list. Don't be that candidate.
You're given an array of weights:
w = [1, 3, 2]Every time someone calls pickIndex(), you need to return:
0 with probability 1/61 with probability 3/6 (50%)2 with probability 2/6The weights aren't equal, so you can't just call random.nextInt(w.length). You need a smarter mapping.
Here's the thing most people miss in the first two minutes: this is a bucket problem.
Imagine a number line from 1 to totalSum. Each weight gets a "slice" of that line proportional to its size:
w = [1, 3, 2] → totalSum = 6
Number line: [1][2, 3, 4][5, 6]
↑ ↑ ↑
index 0 index 1 index 2Now if you throw a dart randomly at that number line, it lands in each bucket exactly as often as the weight dictates. Index 1 covers 3 out of 6 slots — exactly 50%. That's your weighted random pick.
The question becomes: how do you efficiently figure out which bucket a random number lands in? That's where prefix sums and binary search come in.
The prefix sum array is how you turn weights into those bucket ranges.
w = [1, 3, 2]
prefixSum = [1, 4, 6]Each prefixSum[i] represents the right edge of index i's bucket:
| Index | Bucket Range | prefixSum[i] |
|---|---|---|
| 0 | [1, 1] | 1 |
| 1 | [2, 4] | 4 |
| 2 | [5, 6] | 6 |
Now you generate a random target between 1 and totalSum (inclusive), and you find the first index where prefixSum[i] >= target. That index is your answer.
For example, if target = 3, you scan and find prefixSum[1] = 4 >= 3, so you return index 1. Makes sense — target 3 falls in index 1's bucket [2, 4].
Let's walk through the full implementation together.
public Solution(int[] w) {
prefixSum = new int[w.length];
rand = new Random();
int runningSum = 0;
for (int i = 0; i < w.length; i++) {
runningSum += w[i];
prefixSum[i] = runningSum;
}
totalSum = runningSum;
}This is O(n) time and O(n) space. You do this once at construction, so repeated calls to pickIndex() are fast.
public int pickIndex() {
int target = rand.nextInt(totalSum) + 1; // [1, totalSum]
int l = 0, r = prefixSum.length - 1;
while (l < r) {
int mid = l + (r - l) / 2;
if (prefixSum[mid] < target) {
l = mid + 1;
} else {
r = mid;
}
}
return l;
This runs in O(log n) — a massive improvement over any brute-force approach when pickIndex() is called thousands of times.
import java.util.Random;
class Solution {
private int[] prefixSum;
private int totalSum;
private Random rand;
public Solution(int[] w) {
prefixSum = new int[w.length];
rand = new Random();
int runningSum = 0;
for (int i = 0; i < w.length; i++) {
runningSum += w[i];
prefixSum[i] = runningSum;
I've watched hundreds of candidates stumble on this problem. Here are the exact spots where things fall apart — and how to recover.
The single most common bug I see:
// ❌ Wrong — gives [0, totalSum - 1]
target = rand.nextInt(totalSum);
// ✅ Correct — gives [1, totalSum]
target = rand.nextInt(totalSum) + 1;Without the +1, your target range is [0, totalSum-1]. But your prefix sums start at w[0] (at least 1), and the smallest target should map to index 0's bucket. Target 0 would never correctly map — it creates an off-by-one error in your bucket logic. Always use [1, totalSum].
This is a leftmost binary search — you want the first index where prefixSum[i] >= target. The condition inside the loop must be:
// ✅ Correct
if (prefixSum[mid] < target) {
l = mid + 1;
} else {
r = mid; // Don't exclude mid — it might be the answer
}If you write prefixSum[mid] <= target instead, you overshoot and skip valid answers. A common trap is mixing up strict vs. non-strict inequality here.
r Instead of l#With the while (l < r) pattern, when the loop exits, l == r and both point to the answer. Return either — but be consistent. Most candidates who use while (l <= r) then try to return r and get confused. Stick with while (l < r) and return l.
Some candidates jump straight to: "I'll build an array where index i appears w[i] times, then pick randomly." That's O(totalSum) space and time — and the interviewer will ask you to optimize it. It's fine to mention this briefly as your starting point, but immediately pivot: "That works but could be expensive if weights are large. Let me think about a smarter approach..." Then introduce prefix sums.
If the interviewer mentions weights can be up to 10^8 and there are 10^4 elements, your int will overflow. Mention this proactively:
// ✅ Safer for large weights
long totalSum;
long[] prefixSum;Bringing this up unprompted is a strong signal to your interviewer. It shows you think about production-level edge cases.
Here's example dialogue that will land well. Use this as a template — not a script to memorize word for word.
Opening (0–60 seconds):
"Before I jump into code, let me make sure I understand the goal. I need
pickIndex()to return indexiwith probability proportional tow[i]over the total weight. My first instinct is to think about this as a bucket problem — if I lay out a number line and give each index a segment proportional to its weight, I just need to throw a random dart and see which bucket it hits. I can do that efficiently with prefix sums and binary search."
Explaining the approach:
"In the constructor, I'll build a prefix sum array in
O(n). EachprefixSum[i]is the right boundary of indexi's bucket. Then inpickIndex(), I generate a random number from1tototalSumand binary search for the first prefix sum that's greater than or equal to my target. That index is my answer, and it runs inO(log n)."
Before coding:
"One edge case I want to flag — I need to make sure my random range is
[1, totalSum]inclusive, not[0, totalSum - 1], so the mapping to prefix sums stays correct. And I'll watch out for integer overflow if weights are large."
That intro alone puts you in the top 20% of candidates before you write a single line.
Interviewers love to probe deeper after you get the solution working. Here's what's coming and how to handle it.
"What if pickIndex() is called millions of times — is your solution optimal?"
Yes — O(log n) per call after O(n) preprocessing is excellent. You can mention that the constructor cost is amortized over all the calls.
"What if the weights change dynamically after construction?"
This is where things get interesting. A static prefix sum won't work. You'd need a Binary Indexed Tree (Fenwick Tree) or Segment Tree to support O(log n) updates and O(log n) queries. Mention this to show range of knowledge — you probably won't have to implement it unless it's a senior-level interview.
"Can you walk me through why l == r at the end of your binary search?"
Be ready to trace through an example. The loop invariant is that the answer always stays within [l, r]. When l and r converge, they must both be pointing at the smallest valid index.
"What's the probability distribution of your output — can you prove it's correct?"
Index i maps to w[i] values on the number line out of totalSum total values. The probability of landing there is exactly w[i] / totalSum. That's the definition of the problem. Clean.
| Operation | Time | Space |
|---|---|---|
| Constructor | O(n) | O(n) |
pickIndex() | O(log n) | O(1) |
w = [1, 3] manually. It takes 30 seconds and catches most bugs.O(n) work once so pickIndex() is fast every call.< and not <=, the interviewer will probe until you break.[1, totalSum] — the +1 matters.while (l = target, return l at the end.