Quinn’s Mind Palace

Sum of the largest k ordered pair sums

This article explains a problem I recently worked on: computing the sum of the largest k ordered pair sums from a 1-dimensional array. The goal is to find an O(nlogn) solution to this problem.

Problem statement

Given an array A of length n, consider all ordered pairs (i,j) where 0i<n and 0j<n. Each pair has a value A[i]+A[j].

There are n2 such ordered sums. If we list these sums in descending order as S1S2Sn2, the task is to compute S1+S2++Sk for a given integer k.


A naive approach enumerates all n2 sums, sorts them, and sums the largest k.

But what if k is too large, e.g., up to 109? Can you think of an O(nlogn) solution?

Observations

We do not need to list individual pairs. The key idea is:

The k-th largest sum Sk acts as a threshold T* that separates the top k elements from the rest.

More precisely, among all n2 pair sums:

The top k sums are then:

Equivalently, T* is the unique value such that:

Therefore the problem becomes:

  1. Find T*, the value of the k-th largest pair sum.
  2. Count and sum all pairs with value >T*.
  3. Add (kcount)·T*.

Solution

Viewing the problem as a matrix

Let B be the sorted version of A in ascending order: B[0]B[1]B[n1].

Consider the n×n matrix M[i,j]=B[i]+B[j].

Because B is sorted, each row and each column of M is sorted in non-decreasing order.

Our n2 pair sums are exactly the entries of this matrix.

We want the sum of the largest k entries in M.

This matrix view is useful because the set of entries x has a nice “staircase” shape in the top-right corner, and we can walk along that staircase in linear time.

Counting pairs above a threshold

Define f(x)=#{(i,j):B[i]+B[j]x}.

We need to compute f(x) quickly.

The pair-sum matrix M[i,j]=B[i]+B[j] is sorted in each row and each column.

This structure allows counting all entries x in O(n) time using a two-pointer sweep.

The process (walking the “staircase”):

Intuitively, we are starting at the bottom-left of the matrix and moving only right (increasing j) and up (decreasing i).

Once a column becomes valid for some row, it will also be valid for all rows above, so the pointer j never moves left.

The total work across all rows is therefore O(n).

Using prefix sums of B we can also compute the total sum of all B[i]+B[j]x in the same pass.

Finding the threshold T*

The function f(x) is monotone non-increasing in x:

This is exactly the structure we need to binary search on x.

Recall that the k-th largest sum T* is characterized by:

So we can search for the largest x such that f(x)k.

That x will be exactly T*.

We can bound x easily:

A binary search over this range, combined with the O(n) counting routine, yields T* in O(nlogR) time, where R is the range of possible sums.

Computing the final answer

Once T* is known, we separate the contributions into:

Concretely:

Therefore the final answer is sum_gt+(kcnt_gt)·T*.

This exactly matches the picture that the top k sums are “all sums above the threshold, plus just enough copies of the threshold itself.”

Pseudocode

def sum_largest_k_pairs(A, k):
    n = len(A)

    # Sort ascending
    B = sorted(A)
    n = len(B)

    # Prefix sums
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i+1] = prefix[i] + B[i]

    # Count & sum pairs with B[i]+B[j] >= x
    def count_and_sum_ge(x):
        j = 0
        cnt_ge = 0
        sum_ge = 0
        for i in range(n-1, -1, -1):
            while j < n and B[i] + B[j] < x:
                j += 1
            if j == n:
                continue
            cnt_i = n - j
            sum_Bj = prefix[n] - prefix[j]
            sum_i = B[i] * cnt_i + sum_Bj
            cnt_ge += cnt_i
            sum_ge += sum_i
        return cnt_ge, sum_ge

    # Binary search on x to find T* = k-th largest sum
    lo = B[0] + B[0]       # smallest possible sum
    hi = B[-1] + B[-1]     # largest possible sum

    while lo < hi:
        mid = (lo + hi + 1) // 2   # upper mid
        cnt_ge, _ = count_and_sum_ge(mid)
        if cnt_ge >= k:
            # mid is still small enough: at least k pairs >= mid
            lo = mid
        else:
            # mid is too big: fewer than k pairs >= mid
            hi = mid - 1

    T_star = lo  # k-th largest pair sum

    # Now compute sums strictly greater than T*
    cnt_gt, sum_gt = count_and_sum_ge(T_star + 1)

    # Remaining pairs among top k must each have value exactly T*
    remaining = k - cnt_gt
    answer = sum_gt + remaining * T_star
    return answer

Complexity and pattern

The full algorithm runs in O(nlogn+nlogR), where R is the range of possible sums.

For 32-bit integers, logR is at most 32.

Therefore the approach is essentially linear in n, and has no dependence on k.

At a higher level, this is an instance of a common pattern for “top k out of n2” when you cannot afford to enumerate all n2 candidates:

  1. Represent all candidates as an implicit sorted matrix.
  2. Define a monotone function f(x)=#{entriesx} that you can compute in O(n).
  3. Binary search on x to find the value where f(x) crosses k.
  4. Reconstruct the sum of the top k entries from this threshold and the counts above it.

This converts what appears to be an n2 or k-dependent problem into an almost-linear-time algorithm, and gives a reusable template for similar pair-sum problems.

Some insights

The sum of the largest k ordered pair sums can be computed without enumerating all pairs by:

  1. Using a threshold idea: the k-th largest sum acts as a boundary value T*.
  2. Viewing all pair sums as a sorted matrix and counting entries x in O(n) via a two-pointer sweep along the staircase boundary.
  3. Applying binary search on the value x to find T*.
  4. Computing the final sum by separating contributions from sums >T* and sums equal to T*.

This turns a seemingly quadratic problem into an almost-linear-time algorithm and reveals the underlying structure behind the trick.

#Algorithms