import numpy as np
import random
import time
Tsize = 10
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")
def find_max(T):
    max_element = T[0]
    for t in T[1:]:
        if t > max_element:
            max_element = t
    return max_element
print("Max of T:\n",find_max(T),"\n")
The function performs $n-1$ comparisons, so the computational cost is of order $O(n)$. This cannot be improved: T contains $n$ elements and each of them must be examined at least once.
The number of assignments to max_element varies from $1$ (when the maximum is in first position) to $n$ (when the entries are increasing, so that every element is a new maximum).
On average though, if the elements of T are all distinct and in uniformly random order, the $i$-th element examined is a new maximum with probability $1/i$ (it is equally likely to be any of the first $i$ elements), so the expected number of assignments is $$\sum_{i=1}^n \frac1i = H_n \sim \log n.$$
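As a quick empirical check, one can instrument find_max to count both quantities. The helper count_operations below is our own sketch (not part of the original code), run on random permutations so that all elements are distinct.
def count_operations(T):
    # instrumented copy of find_max: count comparisons and assignments to max_element
    comparisons, assignments = 0, 1      # max_element = T[0] is the first assignment
    max_element = T[0]
    for t in T[1:]:
        comparisons += 1
        if t > max_element:
            assignments += 1
            max_element = t
    return comparisons, assignments

n, trials = 10000, 200
comps, _ = count_operations(np.random.permutation(n))
avg_assign = np.mean([count_operations(np.random.permutation(n))[1] for _ in range(trials)])
print("Comparisons:", comps, " vs n-1 =", n-1)
print("Average assignments:", avg_assign, " vs H_n ≈ log n + 0.58 =", np.log(n) + 0.58)
The same search can also be written recursively: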
def find_max_rec(T):
    if len(T) == 1:
        return T[0]
    return max(T[0], find_max_rec(T[1:]))
find_max_rec = trace(find_max_rec)  # convenient to "trace" the recursive calls (trace is defined at the end of this section)
print("\nMax of T:\n",find_max_rec(T),"\n")
Tsize = int(1e6)
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")
def find_minmax_naive(T):
    min_element, max_element = T[0], T[0]
    for t in T[1:]:
        if t < min_element:
            min_element = t
        elif t > max_element:
            max_element = t
    return min_element, max_element
now = time.time()
min_T, max_T = find_minmax_naive(T)
print("Naive min of T:", min_T, "\nNaive max of T:", max_T, "\n")
print("Duration:", time.time()-now, "sec\n")
The number of comparisons ranges from $n-1$ when the entries are decreasing (the first test always succeeds) to $2(n-1)$ when they are increasing (the first test always fails and the elif is then evaluated).
On average, the cost at step $i$ of the loop is $1$ with probability $1/i$ (the $i$-th element seen is a new minimum, so the first test succeeds) and $2$ with probability $1-1/i$ (the first test fails and the elif is evaluated). The expected total number of comparisons is therefore $$\sum_{i=2}^{n}\left(\frac1i + 2\left(1-\frac1i\right)\right) = \sum_{i=2}^{n}\left(2-\frac1i\right) = 2(n-1) - \sum_{i=2}^{n}\frac1i \sim 2n - \log n.$$
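These counts can be checked empirically. The helper count_tests_naive below is our own instrumented copy of the loop (a sketch, not part of the original code):
def count_tests_naive(T):
    # instrumented copy of find_minmax_naive: count the comparisons performed
    tests = 0
    min_element, max_element = T[0], T[0]
    for t in T[1:]:
        tests += 1                     # test t < min_element
        if t < min_element:
            min_element = t
        else:
            tests += 1                 # test t > max_element
            if t > max_element:
                max_element = t
    return tests

n = int(1e6)
print("Increasing:", count_tests_naive(np.arange(n)), " vs 2(n-1) =", 2*(n-1))
print("Decreasing:", count_tests_naive(np.arange(n)[::-1]), " vs n-1 =", n-1)
print("Random    :", count_tests_naive(np.random.permutation(n)), " vs 2n - log n ≈", int(2*n - np.log(n)))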
The previous cost is dominated by the fact that $2$ comparisons are almost systematically performed (hence the dominating $2n$ term). The idea to improve speed is to filter first through an outer condition that rarely succeeds, as follows.
def find_minmax(T):
    min_element, max_element = T[0], T[0]
    for t in T[1:]:
        if t < min_element or t > max_element:
            if t > max_element:
                max_element = t
            else:
                min_element = t
    return min_element, max_element
now = time.time()
min_T, max_T = find_minmax(T)
print("Smart min of T:", min_T, ", smart max of T:", max_T)
print("Duration:", time.time()-now, "sec")
If we suppose that the outer condition has a cost of $1$, the cost per iterate is either $1$ when min_element $\le t \le$ max_element (the outer test fails, which is the common case) or $2$ when $t$ is a new extremum, which at step $i$ happens with probability $2/i$. The expected number of tests is thus $$\sum_{i=2}^n \left(1 + \frac2i\right) \sim n + 2\log n,$$ clearly better than the $2n - \log n$ of the naive version.
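The same instrumented check for the improved version (count_tests_smart is again our own sketch; following the analysis above, the short-circuited outer condition is counted as a single test):
def count_tests_smart(T):
    # instrumented copy of find_minmax: the outer condition counts as 1 test
    tests = 0
    min_element, max_element = T[0], T[0]
    for t in T[1:]:
        tests += 1                     # outer test
        if t < min_element or t > max_element:
            tests += 1                 # inner test
            if t > max_element:
                max_element = t
            else:
                min_element = t
    return tests

n = int(1e6)
print("Random:", count_tests_smart(np.random.permutation(n)), " vs n + 2 log n ≈", int(n + 2*np.log(n)))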
We now look for a given element in a table.
Tsize = 50
T=np.random.randint(0,100,size=Tsize)
print("Random table T:\n",T,"\n")
def find_element(e, T):
    for t in T:
        if e == t:
            return True
    return False
print("Element found?",find_element(3,T))
In the best case, $1$ iteration suffices; in the worst case, $n$ are needed.
When e is known to be in T, its probability to be in position $i$ is $1/n$, so that on average it takes $$\sum_{i=1}^n \frac{i}n = \frac{n+1}2 \sim \frac{n}2$$ iterations to find it. If instead e is present only with probability $\alpha$, then the total cost is on average: $$ (1-\alpha)\, n + \alpha\sum_{i=1}^n \frac{i}n \sim (1-\alpha)\, n + \alpha \frac{n}2 = \left(1-\frac{\alpha}2\right)n. $$
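A small simulation illustrates the $(n+1)/2$ average. The helper count_iterations is our own sketch, and we search for $0$, which is surely present in a random permutation of range(n):
def count_iterations(e, T):
    # number of loop iterations performed by find_element
    for i, t in enumerate(T, start=1):
        if e == t:
            return i
    return len(T)

n, trials = 1000, 2000
avg = np.mean([count_iterations(0, np.random.permutation(n)) for _ in range(trials)])
print("Average iterations:", avg, " vs (n+1)/2 =", (n+1)/2)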
def find_element_rec(e, T):
    if len(T) == 0:       # note: T == [] does not work on a numpy array
        return False
    else:
        return (e == T[0]) or find_element_rec(e, T[1:])
find_element_rec=trace(find_element_rec)
print("Element found?",find_element(3,T))
We now search for an element in a sorted array using dichotomic (binary) search.
Tsize = 100
T=np.sort(np.random.randint(0,100,size=Tsize))
print("Random table T:",T,"\n")
def find_element_dichotomy(e, T):
    index_left = 0
    index_right = len(T)-1
    while index_left <= index_right:
        index_center = (index_left+index_right)//2
        if T[index_center] == e:
            return True
        if T[index_center] > e:
            index_right = index_center-1
        else:
            index_left = index_center+1
    return False
print("Element found?",find_element_dichotomy(3,T))
In the best case, e is at the center of T and $1$ comparison suffices to conclude. In the worst case, e is not in T. If the cost of searching in T of length $n$ is $C(n)$, by dichotomy, $$C(n) = 1+C(n/2)$$ so that $C(n)=2+C(n/4)$ and more generally $C(n)= k + C(n/2^k)$. Taking $n=2^k$, we get $$C(n) = k + 1 \sim \log_2(n).$$
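To see the $k+1 \sim \log_2 n$ worst case concretely, one can count the iterations of the while loop. count_dichotomy_steps below is our own instrumented copy of find_element_dichotomy (a sketch, not part of the original code):
def count_dichotomy_steps(e, T):
    # instrumented copy of find_element_dichotomy: count loop iterations
    steps = 0
    index_left, index_right = 0, len(T)-1
    while index_left <= index_right:
        steps += 1
        index_center = (index_left+index_right)//2
        if T[index_center] == e:
            return steps
        if T[index_center] > e:
            index_right = index_center-1
        else:
            index_left = index_center+1
    return steps

for k in [10, 15, 20]:
    n = 2**k
    steps = count_dichotomy_steps(n, np.arange(n))   # e = n is absent and larger than all entries
    print("n = 2^" + str(k) + ": worst-case steps =", steps, " vs k+1 =", k+1)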
If we know that e is in T of length $n=2^k-1$ (so that $k\sim \log n$), on average we get the complexity: $$ 1\times \frac1n + 2\times \frac2n + 3\times \frac4n + \ldots + k \times \frac{2^{k-1}}n = \sum_{a=1}^k \frac{a\, 2^{a-1}}n \sim \frac{k\, 2^k}n \sim \log n. $$ This follows from the fact that landing exactly on e at the $a$-th division can occur at $2^{a-1}$ positions of T (the $2^{a-1}$ centers of dichotomy reached after $a-1$ divisions).
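This average can be verified as well, reusing the count_dichotomy_steps helper from above on a sorted table of length $2^k-1$ that contains every searched element:
k = 12
n = 2**k - 1
T_sorted = np.arange(n)     # sorted table of length 2^k - 1 containing every e in [0, n)
avg = np.mean([count_dichotomy_steps(e, T_sorted) for e in range(n)])
theory = sum(a * 2**(a-1) for a in range(1, k+1)) / n
print("Average steps:", avg, " vs sum a*2^(a-1)/n =", theory, " ~ log2(n) =", np.log2(n))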
def find_element_dichotomy_rec(e, T):
    if len(T) == 0:
        return False
    index_center = len(T)//2
    value_center = T[index_center]
    # each recursive call is guarded so that only one half is ever explored
    return value_center == e \
        or ( value_center > e and find_element_dichotomy_rec(e, T[:index_center]) ) \
        or ( value_center < e and find_element_dichotomy_rec(e, T[index_center+1:]) )
print("Element found?",find_element_dichotomy_rec(3,T))
The function trace below prints the calls of a function, indented by recursion depth; it is the helper used above to visualize the recursive calls (it must be run before them).
from functools import wraps

def trace(func):
    func_name = func.__name__
    separator = '| '
    trace.recursion_depth = 0
    @wraps(func)
    def traced_func(*args, **kwargs):
        print(f'{separator * trace.recursion_depth}|-- {func_name}({", ".join(map(str, args))})')
        trace.recursion_depth += 1
        result = func(*args, **kwargs)
        trace.recursion_depth -= 1
        print(f'{separator * (trace.recursion_depth + 1)}|-- return {result}')
        return result
    return traced_func