Sum of the largest k ordered pair sums

20 Nov, 2025

This article explains a problem I recently worked on: computing the sum of the largest $k$ ordered pair sums from a 1-dimensional array. The goal is to find an $O (n \log n)$ solution to this problem.

Problem statement

Given an array $A$ of length $n$ , consider all ordered pairs $(i, j)$ where $0 \leq i < n$ and $0 \leq j < n$ . Each pair has a value $A [i] + A [j]$ .

There are $n^{2}$ such ordered sums. If we list these sums in descending order as $S_{1} \geq S_{2} \geq \dots \geq S_{n^{2}}$ , the task is to compute $S_{1} + S_{2} + \dots + S_{k}$ for a given integer $k$ .

A naive approach enumerates all $n^{2}$ sums, sorts them, and sums the largest $k$ .
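
As a baseline, the naive approach can be sketched in a few lines of Python. The function name `sum_largest_k_pairs_naive` is made up for this illustration; it needs $O(n^{2})$ memory and $O(n^{2} \log n)$ time, so it is only useful as a reference for small inputs.

```python
def sum_largest_k_pairs_naive(A, k):
    """Brute force: list all n*n ordered pair sums, sort, add up the k largest."""
    sums = [a + b for a in A for b in A]   # every ordered pair (i, j), including i == j
    sums.sort(reverse=True)                # descending: S_1 >= S_2 >= ...
    return sum(sums[:k])

print(sum_largest_k_pairs_naive([1, 2, 3], 4))  # 6 + 5 + 5 + 4 = 20
```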

But what if $k$ is very large, e.g., up to $10^{9}$, so that even visiting the top $k$ sums one by one is too slow? Can you think of an $O(n \log n)$ solution?

Observations

We do not need to list individual pairs. The key idea is:

The $k$ -th largest sum $S_{k}$ acts as a threshold $T^{*}$ that separates the top $k$ elements from the rest.

More precisely, among all $n^{2}$ pair sums:

Some are $> T^{*}$ .
Some are exactly $T^{*}$ .
The rest are $< T^{*}$ and irrelevant.

The top $k$ sums are then:

All sums strictly greater than $T^{*}$ .
Plus just enough of the sums equal to $T^{*}$ to reach a total of $k$ .

Equivalently, when the entries of $A$ are integers, $T^{*}$ is the unique value such that:

At least $k$ sums are $\geq T^{*}$ .
Fewer than $k$ sums are $\geq T^{*} + 1$ .

Therefore the problem becomes:

Find $T^{*}$ , the value of the $k$ -th largest pair sum.
Count and sum all pairs with value $> T^{*}$ .
Add $(k - \text{count}) \cdot T^{*}$, where $\text{count}$ is the number of pairs found in the previous step.
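
The three steps above can be checked on a tiny input by brute force. The sketch below is quadratic and exists only to illustrate the decomposition; the helper name `top_k_via_threshold` is hypothetical and it assumes $1 \leq k \leq n^{2}$.

```python
def top_k_via_threshold(A, k):
    """Brute-force illustration of: sums above T*, plus enough copies of T*."""
    sums = sorted((a + b for a in A for b in A), reverse=True)
    t_star = sums[k - 1]                       # value of the k-th largest pair sum
    above = [s for s in sums if s > t_star]    # sums strictly greater than T*
    count = len(above)
    total = sum(above) + (k - count) * t_star  # fill the rest with copies of T*
    return total, t_star, count

print(top_k_via_threshold([1, 2, 3], 4))  # (20, 4, 3)
```

For $A = [1, 2, 3]$ and $k = 4$, the sorted sums are $6, 5, 5, 4, 4, 4, 3, 3, 2$, so $T^{*} = 4$, three sums lie strictly above it, and one copy of $4$ completes the top four.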

Solution

Viewing the problem as a matrix

Let $B$ be the sorted version of $A$ in ascending order: $B [0] \leq B [1] \leq \dots \leq B [n - 1]$ .

Consider the $n \times n$ matrix $M [i, j] = B [i] + B [j]$ .

Because $B$ is sorted, each row and each column of $M$ is sorted in non-decreasing order.

Our $n^{2}$ pair sums are exactly the entries of this matrix.

We want the sum of the largest $k$ entries in $M$ .

This matrix view is useful because the set of entries $\geq x$ has a nice “staircase” shape anchored at the bottom-right corner (where both indices are largest), and we can walk along that staircase in linear time.
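
To make the staircase concrete, here is a small sketch that prints which entries of $M$ meet a threshold. The array `B`, the threshold `x`, and the helper `staircase_rows` are illustrative choices, not part of the algorithm. Row 0 is printed first, and each row's marked region starts at or to the left of the one in the row printed before it.

```python
B = [1, 2, 4, 7]   # already sorted ascending
x = 8              # example threshold

def staircase_rows(B, x):
    """One string per row of M[i][j] = B[i] + B[j]; '#' marks entries >= x."""
    return ["".join("#" if bi + bj >= x else "." for bj in B) for bi in B]

for row in staircase_rows(B, x):
    print(row)
# ...#
# ...#
# ..##
# ####
```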

Counting pairs above a threshold

Define $f(x) = \#\{(i, j) : B[i] + B[j] \geq x\}$.

We need to compute $f (x)$ quickly.

The pair-sum matrix $M [i, j] = B [i] + B [j]$ is sorted in each row and each column.

This structure allows counting all entries $\geq x$ in $O (n)$ time using a two-pointer sweep.

The process (walking the “staircase”):

Let $j = 0$ .
For $i$ from $n - 1$ down to $0$:
- Increase $j$ while $j < n$ and $B[i] + B[j] < x$.
- Then all $j' \geq j$ in that row satisfy the threshold.
- Add $(n - j)$ to the count.

Intuitively, we are starting at the bottom-left of the matrix and moving only right (increasing $j$ ) and up (decreasing $i$ ).

Once a column fails the threshold for some row, it also fails for every row above it (where $B[i]$ is no larger), so the pointer $j$ never moves left.

The total work across all rows is therefore $O (n)$ .
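
A count-only version of this sweep might look like the sketch below. The function name `count_ge` is chosen here for illustration, and `B` is assumed to be sorted ascending already.

```python
def count_ge(B, x):
    """Number of ordered pairs (i, j) with B[i] + B[j] >= x; B sorted ascending."""
    n = len(B)
    j = 0
    count = 0
    for i in range(n - 1, -1, -1):          # rows from largest B[i] to smallest
        while j < n and B[i] + B[j] < x:    # skip columns that are too small
            j += 1
        count += n - j                      # columns j..n-1 qualify in this row
    return count

print(count_ge([1, 2, 4, 7], 8))  # 8
```

The inner `while` loop advances `j` at most $n$ times over the whole call, which is where the $O(n)$ bound comes from.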

Using prefix sums of $B$ we can also compute the total sum of all $B [i] + B [j] \geq x$ in the same pass.
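
To spell out the prefix-sum step, write $P[m] = B[0] + \dots + B[m - 1]$ (the symbol $P$ is introduced here for illustration; it plays the role of `prefix` in the code). When the sweep stops at column $j$ in row $i$, the qualifying entries of that row are $B[i] + B[j']$ for $j' = j, \dots, n - 1$, and their total is:

$$\sum_{j' = j}^{n - 1} \left( B[i] + B[j'] \right) = (n - j) \cdot B[i] + \left( P[n] - P[j] \right)$$

For example, with $B = [1, 2, 4, 7]$ and $x = 8$, the row with $B[i] = 4$ stops at $j = 2$, giving $(4 - 2) \cdot 4 + (14 - 3) = 19$, which matches $8 + 11$.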

Finding the threshold $T^{*}$

The function $f (x)$ is monotone non-increasing in $x$ :

If $x$ is very small, almost every pair satisfies $B [i] + B [j] \geq x$ , so $f (x)$ is large.
If $x$ is very large, almost no pairs satisfy it, so $f (x)$ is small.
As we increase $x$, the set $\{(i, j) : B[i] + B[j] \geq x\}$ can only shrink.

This is exactly the structure we need to binary search on $x$ .

Recall that the $k$ -th largest sum $T^{*}$ is characterized by:

At least $k$ sums are $\geq T^{*}$ .
Fewer than $k$ sums are $\geq T^{*} + 1$ .

So we can search for the largest $x$ such that $f (x) \geq k$ .

That $x$ will be exactly $T^{*}$ .

We can bound $x$ easily:

Lower bound: $2 \cdot B [0]$ (smallest possible pair sum).
Upper bound: $2 \cdot B [n - 1]$ (largest possible pair sum).

A binary search over this range (stepping through integer values of $x$), combined with the $O(n)$ counting routine, yields $T^{*}$ in $O(n \log R)$ time, where $R = 2 \cdot (B[n - 1] - B[0])$ is the width of the range of possible sums.
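
The search itself can be sketched independently of the counting routine. In the snippet below, `largest_x_with` is a hypothetical helper, and `f` is a deliberately slow quadratic counter used only to keep the example short; integer entries are assumed.

```python
def largest_x_with(pred, lo, hi):
    """Largest integer x in [lo, hi] with pred(x) true.

    Assumes pred(lo) is true and that pred never turns true again after turning false.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2   # upper midpoint, so `lo = mid` always makes progress
        if pred(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo

B = [1, 2, 4, 7]
k = 3

def f(x):
    # Quadratic stand-in for the O(n) counting routine, for clarity only.
    return sum(1 for a in B for b in B if a + b >= x)

print(largest_x_with(lambda x: f(x) >= k, 2 * B[0], 2 * B[-1]))  # 11
```

Here the three largest pair sums are $14, 11, 11$, so the third largest is $11$.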

Computing the final answer

Once $T^{*}$ is known, we separate the contributions into:

All sums strictly greater than $T^{*}$ .
Some number of sums equal to $T^{*}$ .

Concretely:

Compute $(\text{cnt\_gt}, \text{sum\_gt})$ using the threshold $T^{*} + 1$.
- Here $\text{cnt\_gt}$ is the number of pairs with sum $> T^{*}$.
- $\text{sum\_gt}$ is the sum of all those pair sums.
The remaining pairs among the top $k$ must all have value exactly $T^{*}$.
- The number of such pairs is $k - \text{cnt\_gt}$.

Therefore the final answer is $\text{sum\_gt} + (k - \text{cnt\_gt}) \cdot T^{*}$.

This exactly matches the picture that the top $k$ sums are “all sums above the threshold, plus just enough copies of the threshold itself.”
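
A small example with ties shows why only some copies of the threshold are taken. Take $B = [1, 3, 3]$ and $k = 6$ (values chosen here for illustration). The nine pair sums in descending order are $6, 6, 6, 6, 4, 4, 4, 4, 2$, so $T^{*} = 4$, and:

$$\text{cnt\_gt} = 4, \qquad \text{sum\_gt} = 4 \cdot 6 = 24, \qquad k - \text{cnt\_gt} = 2$$

$$\text{answer} = 24 + 2 \cdot 4 = 32$$

Four sums equal $T^{*}$, but only two of them are needed to reach $k = 6$.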

Pseudocode

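def sum_largest_k_pairs(A, k):
    # Assumes integer entries: the threshold search below steps by 1.
    n = len(A)
    if not 0 <= k <= n * n:
        raise ValueError("k must be between 0 and n*n")
    if k == 0:
        return 0

    # Sort ascending
    B = sorted(A)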

    # Prefix sums
    prefix = [0] * (n + 1)
    for i in range(n):
        prefix[i+1] = prefix[i] + B[i]

    # Count & sum pairs with B[i]+B[j] >= x
    def count_and_sum_ge(x):
        j = 0
        cnt_ge = 0
        sum_ge = 0
        for i in range(n-1, -1, -1):
            while j < n and B[i] + B[j] < x:
                j += 1
            if j == n:
                break  # no column qualifies for this row or any smaller B[i]
            cnt_i = n - j
            sum_Bj = prefix[n] - prefix[j]
            sum_i = B[i] * cnt_i + sum_Bj
            cnt_ge += cnt_i
            sum_ge += sum_i
        return cnt_ge, sum_ge

    # Binary search on x to find T* = k-th largest sum
    lo = B[0] + B[0]       # smallest possible sum
    hi = B[-1] + B[-1]     # largest possible sum

    while lo < hi:
        mid = (lo + hi + 1) // 2   # upper mid
        cnt_ge, _ = count_and_sum_ge(mid)
        if cnt_ge >= k:
            # mid is still small enough: at least k pairs >= mid
            lo = mid
        else:
            # mid is too big: fewer than k pairs >= mid
            hi = mid - 1

    T_star = lo  # k-th largest pair sum

    # Now compute sums strictly greater than T*
    cnt_gt, sum_gt = count_and_sum_ge(T_star + 1)

    # Remaining pairs among top k must each have value exactly T*
    remaining = k - cnt_gt
    answer = sum_gt + remaining * T_star
    return answer
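
One way to gain confidence in an implementation like the one above is to compare it with the brute-force definition on small random inputs. The harness below is a sketch; the name `check_against_brute_force` and its parameters are made up for this example, and it only generates integer inputs with $1 \leq k \leq n^{2}$.

```python
import random

def check_against_brute_force(fast_fn, trials=200, seed=0):
    """Compare fast_fn(A, k) with a quadratic reference on small random integer inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        A = [rng.randint(-5, 5) for _ in range(n)]   # negatives and ties on purpose
        k = rng.randint(1, n * n)
        sums = sorted((a + b for a in A for b in A), reverse=True)
        expected = sum(sums[:k])
        got = fast_fn(A, k)
        assert got == expected, f"mismatch for A={A}, k={k}: {got} != {expected}"
    return True
```

It would be called as `check_against_brute_force(sum_largest_k_pairs)`. Small value ranges are used deliberately so that duplicate sums, and therefore ties at the threshold, occur often.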

Complexity and pattern

The full algorithm runs in $O (n \log n + n \log R)$ , where $R$ is the range of possible sums.

For 32-bit integers, the pair sums span fewer than $2^{33}$ values, so $\log R$ is at most 33.

Therefore the approach is near-linear in $n$ (the sort contributes the $n \log n$ term), and its running time does not depend on $k$.

At a higher level, this is an instance of a common pattern for “top $k$ out of $n^{2}$ ” when you cannot afford to enumerate all $n^{2}$ candidates:

Represent all candidates as an implicit sorted matrix.
Define a monotone function $f(x) = \#\{\text{entries} \geq x\}$ that you can compute in $O(n)$.
Binary search on $x$ to find the value where $f (x)$ crosses $k$ .
Reconstruct the sum of the top $k$ entries from this threshold and the counts above it.

This converts what appears to be an $n^{2}$ or $k$ -dependent problem into an almost-linear-time algorithm, and gives a reusable template for similar pair-sum problems.

Some insights

The sum of the largest $k$ ordered pair sums can be computed without enumerating all pairs by:

Using a threshold idea: the $k$ -th largest sum acts as a boundary value $T^{*}$ .
Viewing all pair sums as a sorted matrix and counting entries $\geq x$ in $O (n)$ via a two-pointer sweep along the staircase boundary.
Applying binary search on the value $x$ to find $T^{*}$ .
Computing the final sum by separating contributions from sums $> T^{*}$ and sums equal to $T^{*}$ .

This turns a seemingly quadratic problem into an almost-linear-time algorithm and reveals the underlying structure behind the trick.
