Sum of the largest k ordered pair sums
This article explains a problem I recently worked on: computing the sum of the largest ordered pair sums from a 1-dimensional array. The goal is to find an solution to this problem.
Problem statement
Given an array of length , consider all ordered pairs where and . Each pair has a value .
There are such ordered sums. If we list these sums in descending order as , the task is to compute for a given integer .
A naive approach enumerates all sums, sorts them, and sums the largest .
But what if is too large, e.g., up to ? Can you think of an solution?
Observations
We do not need to list individual pairs. The key idea is:
The -th largest sum acts as a threshold that separates the top elements from the rest.
More precisely, among all pair sums:
- Some are .
- Some are exactly .
- The rest are and irrelevant.
The top sums are then:
- All sums strictly greater than .
- Plus just enough of the sums equal to to reach a total of .
Equivalently, is the unique value such that:
- At least sums are .
- Fewer than sums are .
Therefore the problem becomes:
- Find , the value of the -th largest pair sum.
- Count and sum all pairs with value .
- Add .
Solution
Viewing the problem as a matrix
Let be the sorted version of in ascending order: .
Consider the matrix .
Because is sorted, each row and each column of is sorted in non-decreasing order.
Our pair sums are exactly the entries of this matrix.
We want the sum of the largest entries in .
This matrix view is useful because the set of entries has a nice “staircase” shape in the top-right corner, and we can walk along that staircase in linear time.
Counting pairs above a threshold
Define .
We need to compute quickly.
The pair-sum matrix is sorted in each row and each column.
This structure allows counting all entries in time using a two-pointer sweep.
The process (walking the “staircase”):
- Let .
- For from down to :
- Increase while .
- Then all in that row satisfy the threshold.
- Add to the count.
Intuitively, we are starting at the bottom-left of the matrix and moving only right (increasing ) and up (decreasing ).
Once a column becomes valid for some row, it will also be valid for all rows above, so the pointer never moves left.
The total work across all rows is therefore .
Using prefix sums of we can also compute the total sum of all in the same pass.
Finding the threshold
The function is monotone non-increasing in :
- If is very small, almost every pair satisfies , so is large.
- If is very large, almost no pairs satisfy it, so is small.
- As we increase , the set can only shrink.
This is exactly the structure we need to binary search on .
Recall that the -th largest sum is characterized by:
- At least sums are .
- Fewer than sums are .
So we can search for the largest such that .
That will be exactly .
We can bound easily:
- Lower bound: (smallest possible pair sum).
- Upper bound: (largest possible pair sum).
A binary search over this range, combined with the counting routine, yields in time, where is the range of possible sums.
Computing the final answer
Once is known, we separate the contributions into:
- All sums strictly greater than .
- Some number of sums equal to .
Concretely:
- Compute using the threshold .
- Here is the number of pairs with sum .
- is the sum of all those pair sums.
- The remaining pairs among the top must all have value exactly .
- The number of such pairs is .
Therefore the final answer is .
This exactly matches the picture that the top sums are “all sums above the threshold, plus just enough copies of the threshold itself.”
Pseudocode
def sum_largest_k_pairs(A, k):
n = len(A)
# Sort ascending
B = sorted(A)
n = len(B)
# Prefix sums
prefix = [0] * (n + 1)
for i in range(n):
prefix[i+1] = prefix[i] + B[i]
# Count & sum pairs with B[i]+B[j] >= x
def count_and_sum_ge(x):
j = 0
cnt_ge = 0
sum_ge = 0
for i in range(n-1, -1, -1):
while j < n and B[i] + B[j] < x:
j += 1
if j == n:
continue
cnt_i = n - j
sum_Bj = prefix[n] - prefix[j]
sum_i = B[i] * cnt_i + sum_Bj
cnt_ge += cnt_i
sum_ge += sum_i
return cnt_ge, sum_ge
# Binary search on x to find T* = k-th largest sum
lo = B[0] + B[0] # smallest possible sum
hi = B[-1] + B[-1] # largest possible sum
while lo < hi:
mid = (lo + hi + 1) // 2 # upper mid
cnt_ge, _ = count_and_sum_ge(mid)
if cnt_ge >= k:
# mid is still small enough: at least k pairs >= mid
lo = mid
else:
# mid is too big: fewer than k pairs >= mid
hi = mid - 1
T_star = lo # k-th largest pair sum
# Now compute sums strictly greater than T*
cnt_gt, sum_gt = count_and_sum_ge(T_star + 1)
# Remaining pairs among top k must each have value exactly T*
remaining = k - cnt_gt
answer = sum_gt + remaining * T_star
return answer
Complexity and pattern
The full algorithm runs in , where is the range of possible sums.
For 32-bit integers, is at most 32.
Therefore the approach is essentially linear in , and has no dependence on .
At a higher level, this is an instance of a common pattern for “top out of ” when you cannot afford to enumerate all candidates:
- Represent all candidates as an implicit sorted matrix.
- Define a monotone function that you can compute in .
- Binary search on to find the value where crosses .
- Reconstruct the sum of the top entries from this threshold and the counts above it.
This converts what appears to be an or -dependent problem into an almost-linear-time algorithm, and gives a reusable template for similar pair-sum problems.
Some insights
The sum of the largest ordered pair sums can be computed without enumerating all pairs by:
- Using a threshold idea: the -th largest sum acts as a boundary value .
- Viewing all pair sums as a sorted matrix and counting entries in via a two-pointer sweep along the staircase boundary.
- Applying binary search on the value to find .
- Computing the final sum by separating contributions from sums and sums equal to .
This turns a seemingly quadratic problem into an almost-linear-time algorithm and reveals the underlying structure behind the trick.