TD 1 - Cost evaluation in worst and mean case

In [203]:
import numpy as np
import random
import time

1. Evaluate the extreme entries of a table

1.1. Finding the largest entry

Let us start by creating the table. I'll take here a numpy array of random uniform integers in $\{0,\ldots,99\}$.

In [209]:
Tsize = 10
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")
Random table T:
 [86  6 74 44 32 87  3 41 77 49] 

Sequential version of find_max

Function: find_max
INPUT : list T
OUTPUT : maximum of T, sequential approach
In [210]:
def find_max(T):
    max_element = T[0]
    for t in T[1:]:
        if t>max_element:
            max_element=t
    return max_element    

print("Max of T:\n",find_max(T),"\n")
Max of T:
 87 

Computational cost

The function calls $n-1$ comparisons, so the computational cost is of order $O(n)$. It cannot possibly be compressed as T contains $n$ elements: they must all be seen.

Assignment cost

In terms of assignments of max_element, it varies from $1$ (when the max is in first position) to $n$ (when in last position).

On average though, if the elements of T are all different and pickly uniformly at random, the calculus goes as follows:

  • T[0] is always assigned (so $+1$ in the cost)
  • T[1] is assigned as maximum if T[1]>T[0] which occurs with probability $1/2$ (so $+1\times 1/2$ on average cost)
  • T[2] is assigned as maximum if T[2]>max(T[0],T[1]) which occurs with probability $1/3$ (so $+1\times 1/3$ on average cost)
  • and so on ... until T[n-1] assigned if larger than the $n-1$ previous terms, so with probability $1/n$ (so $+1\times 1/n$ on average cost) In total, the average cost is thus: $$ 1 + \frac12 + \frac13 + \ldots + \frac1n = O(\log n).$$

Recursive version of find_max

Function: find_max_rec
INPUT : list T
OUTPUT : maximum of T, recursive approach
In [211]:
def find_max_rec(T):
    if len(T)==1:
        return T[0]
    return max(T[0],find_max_rec(T[1:]))

find_max_rec=trace(find_max_rec)  # --> convenient to "trace" the recursive call

print("\nMax of T:\n",find_max_rec(T),"\n")
|-- find_max_rec([86  6 74 44 32 87  3 41 77 49])
|  |-- find_max_rec([ 6 74 44 32 87  3 41 77 49])
|  |  |-- find_max_rec([74 44 32 87  3 41 77 49])
|  |  |  |-- find_max_rec([44 32 87  3 41 77 49])
|  |  |  |  |-- find_max_rec([32 87  3 41 77 49])
|  |  |  |  |  |-- find_max_rec([87  3 41 77 49])
|  |  |  |  |  |  |-- find_max_rec([ 3 41 77 49])
|  |  |  |  |  |  |  |-- find_max_rec([41 77 49])
|  |  |  |  |  |  |  |  |-- find_max_rec([77 49])
|  |  |  |  |  |  |  |  |  |-- find_max_rec([49])
|  |  |  |  |  |  |  |  |  |  |-- return 49
|  |  |  |  |  |  |  |  |  |-- return 77
|  |  |  |  |  |  |  |  |-- return 77
|  |  |  |  |  |  |  |-- return 77
|  |  |  |  |  |  |-- return 87
|  |  |  |  |  |-- return 87
|  |  |  |  |-- return 87
|  |  |  |-- return 87
|  |  |-- return 87
|  |-- return 87

Max of T:
 87 

1.2. Finding both minimum and maximum

Naiv approach: sequential comparisons

Function: find_minmax_naive
INPUT : list T
OUTPUT : minimum and maximum of T, naiv version
In [213]:
Tsize = int(1e6)
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")


def find_minmax_naive(T):
    min_element,max_element=T[0],T[0]
    for t in T[1:]:
        if t<min_element:
            min_element=t
        elif t>max_element:
            max_element=t
    return min_element,max_element


now=time.time()
print("Naive min of T:",find_minmax_naive(T)[0],"\nNaive max of T:",find_minmax_naif(T)[1],"\n")
print("Duration:",time.time()-now,"sec\n")
Random table T:
 [ 3 70 92 ...  0 20 28] 

Naive min of T: 0 
Naive max of T: 99 

Duration: 0.5123188495635986 sec

In the worst case, the implementation costs $2n$ conditions when the table entries are increasing and $n$ if decreasing.

On average, the cost at step $i$ of the loop is again $1\times 1/i$ if the first condition is true or $2\times (1-1/i)$ if the first condition fails. So on average: $$ 2 + \frac12 + \ldots + \frac1{n-1} + 2\left(1-\frac12\right) + \ldots + 2\left(1-\frac1{n-1}\right) = 2n - \sum_{i=2}^{n-1} \frac1i \sim 2n - \log n$$

Improved approach: adding a condition

The previous cost is dominated by the fact that $2$ conditions are almost systematically performed (inducing the $2n$ dominating term in number of conditions). The idea to improve the speed is to first filter through a less likely external condition as follows.

Function: find_minmax_naive
INPUT : list T
OUTPUT : minimum and maximum of T, improved version
In [214]:
def find_minmax(T):
    min_element,max_element=T[0],T[0]
    for t in T[1:]:
        if t<min_element or t>max_element:
            if t>max_element:
                max_element=t
            else:
                min_element=t
    return min_element,max_element

now=time.time()
print("Smart min of T:",find_minmax(T)[0],", smart max of T:",find_minmax(T)[1])
print("Duration:",time.time()-now,"sec")
Smart min of T: 0 , smart max of T: 99
Duration: 0.5963411331176758 sec

If we suppose that the outer condition has a cost of $1$, the cost per iterate is either $1$ when min_element or $2$ with the converse. The latter has probability $2/i$ at step $i$, so finally the cost: $$\sum_i 2\times \frac2i + 1 \times \left(1-\frac2i\right) = n + \sum_i \frac2i \sim n+2\log n$$

2. Iterative search

We now look for a given element in a table.

In [215]:
Tsize = 50
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")
Random table T:
 [98  9 15 91 86 90 69 14 92  6 43 52 91 37  2  2 83 54 48 92 13 16 13  2
 58 87  4 31 72 72  9 48 44 41 44 29 20 91 69 96 14  9 17 83 45 96 91 48
 50 19] 

Function: find_element
INPUT : element e,list T
OUTPUT : True if element e found in T, False otherwise
In [216]:
def find_element(e,T):
    for t in T:
        if e == t:
            return True
    return False

print("Element found?",find_element(3,T))
Element found? False

In the best case, this takes $1$ iteration to succeed. In the worst cas, $n$.

When e is for sure in T, its probability to be in position $i$ is $1/n$, so that on average it takes $$\sum_{i=1}^n \frac{i}n = \frac{n+1}2 \sim \frac{n}2$$ iterations to converge. Is instead e only has probability $\alpha$ to be found, then the total cost is on average: $$ (1-\alpha) n + \alpha\sum_{i=1}^n \frac{i}n \sim (1-\alpha) n + \alpha \frac{n}2 = \left(1-\frac{\alpha}2\right)n $$

Function: find_element_rec
INPUT : element e,list T
OUTPUT : True if element e found in T, False otherwise; recursive version
In [198]:
def find_element_rec(e,T):
    if T==[]:
        return False
    else:
        return (e==T[0]) or find_element_rec(e,T[1:])
    
find_element_rec=trace(find_element_rec)    
    
print("Element found?",find_element(3,T))
Element found? True

3. Dichotomic search

We now proceed to a search in a sorted array using a dichotomic search.

In [217]:
Tsize = 100
T=np.sort(np.random.randint(0,100,size=Tsize))
print("Random table T:",T,"\n")
Random table T: [ 0  2  3  3  4  5  5  6  6  7  7  8 10 11 13 13 14 15 16 16 17 17 19 19
 20 22 22 23 24 24 25 27 31 31 32 33 34 35 35 36 37 38 39 40 42 42 45 47
 48 48 49 50 50 52 53 53 54 54 55 55 55 56 58 58 58 62 63 64 67 69 69 71
 72 72 74 74 74 75 75 76 77 77 78 80 81 84 84 85 86 87 87 90 91 91 93 94
 97 99 99 99] 

Function: find_element_dichotomy
INPUT : element e, sorted list T
OUTPUT : True if element e found in T, False otherwise
In [201]:
def find_element_dichotomy(e,T):
    index_left = 0
    index_right = len(T)-1
    while index_left<=index_right:
        index_center = (index_left+index_right)//2
        if T[index_center] == e:
            return True
        if T[index_center] > e:
            index_right=index_center-1
        else:
            index_left=index_center+1
    return False

print("Element found?",find_element_dichotomy(3,T))
Element found? False

In the best case, e is at the center of T and $1$ comparison suffices to conclude. In the worst case, e is not in T. If the cost of searching in T of length $n$ is $C(n)$, by dichotomy, $$C(n) = 1+C(n/2)$$ so that $C(n)=2+C(n/4)$ and more generally $C(n)= k + C(n/2^k)$. Taking $n=2^k$, we get $$C(n) = k + 1 \sim \log_2(n).$$

If we know that e is in T of length $n=2^k-1$ (so that $k\sim \log n$), on average, we get the complexity: $$ 1\times \frac1n + 2\times \frac2n + 3\times \frac4n + \ldots + k \times \frac{2^{k-1}}n = \sum_{a=1}^k \frac{a 2^{a-1}}n \sim \frac{k 2^k}n \sim \log n $$ This follows from the fact that "falling right on e after $a$ divisions can occur at $2^a$ positions of T (the $2^a$ centers of dichotomy after $a$ divisions).

Function: find_element_dichotomy_rec
INPUT : element e, sorted list T
OUTPUT : True if element e found in T, False otherwise; recursive version
In [202]:
def find_element_dichotomy_rec(e,T):
    if len(T)==0:
        return False
    index_center = len(T)//2
    value_center = T[index_center]
    return value_center == e \
        or ( value_center > e and find_element_dichotomy_rec(e,T[:index_center]) ) \
        or find_element_dichotomy_rec(e,T[index_center+1:])

print("Element found?",find_element_dichotomy_rec(3,T))
Element found? False

EXTRAS

The function trace traces the calls of a function (here convenient for us for recursive functions).

In [173]:
from functools import wraps

def trace(func):
    func_name = func.__name__
    separator = '|  '

    trace.recursion_depth = 0

    @wraps(func)
    def traced_func(*args, **kwargs):
        print(f'{separator * trace.recursion_depth}|-- {func_name}({", ".join(map(str, args))})')
        trace.recursion_depth += 1
        result = func(*args, **kwargs)
        trace.recursion_depth -= 1
        print(f'{separator * (trace.recursion_depth + 1)}|-- return {result}')

        return result

    return traced_func