TD 3 - Divide and Conquer

In [46]:
import numpy as np
import random

1. Dominating element

Function: instances
INPUT : value t, list T
OUTPUT : counts the number of instance of t in T

Function: isDominant
INPUT : value t, list T
OUTPUT : evaluates (recursively) if t is dominant in T (i.e., present more than len(T)//2 times)

Function: dominant
INPUT : list T
OUTPUT : find a dominant value of T or returns None if none
In [24]:
def instances(t,T):
    if len(T)==0:
        return 0
    return (t==T[0]) + instances(t,T[1:])

def isDominant(t,T):
    if len(T)==0:
        return False
    return instances(t,T)>len(T)/2

def dominant(T):
    for t in T[:(len(T)+1)//2]:
        if isDominant(t,T):
            return t
    return None # no dominant term
In [48]:
Tsize = 8
T=np.random.randint(0,2,size=Tsize)
print("Random table T:\n",T,"\n")

print("Number of 1s in T:",instances(1,T))
print("\nIs 1 dominant?:",isDominant(1,T))

print("\nDominant term:",dominant(T))
Random table T:
 [1 1 0 1 0 1 1 1] 

Number of 1s in T: 6

Is 1 dominant?: True

Dominant term: 1

The algorithm is quite naive and has worst case complexity of $O(n^2/2)$.

1.1. Divide and Conquer

We use here a recursive dichotomic approach using the fact that a dominant value of T must dominate one of the two subvision of T in two half-lists.


Function: divideAndConquer
INPUT : list T
OUTPUT : divideAndConquer version of dominant
In [28]:
def divideAndConquer(T):
    if len(T)==1:
        return T[0]
    center= len(T)//2
    left  = divideAndConquer(T[:center])
    right = divideAndConquer(T[center:])
    if left == right:
        return left
    if isDominant(right,T):
        return right
    if isDominant(left,T):
        return left
    return None
    
In [58]:
Tsize = 10
T=np.random.randint(0,3,size=Tsize)
print("Random table T:\n",T,"\n")
print("Number of 0s, 1s, 2s: ",sum(T==0),sum(T==1),sum(T==2))

print("Dominant term by divide and conquer: ",divideAndConquer(T))
Random table T:
 [1 2 2 1 2 2 2 0 2 2] 

Number of 0s, 1s, 2s:  1 2 7
Dominant term by divide and conquer:  2

The complexity is here such that $$C(n)=2C(n/2)+2n$$ in the worst case. By the master theorem, this gives: $$C(n)=O(n\log n)$$

1.2. Variation

If we compare the values of T by successive pairs, removing those which have different values and only keeping one of the two in case of equality, starting from $n$ values, a maximum of $n/2$ is maintained (the case of all pairwise equalities).

To confirm that a dominant element e of number $k$ of occurrences in T (i.e., $k>n/2$, or more accurately $k\geq n/2-1/2$ in both odd and even $n$'s) is still dominant after this process, consider a given pair:

  • if it does not contain e, $n$ is decreased but not $k$
  • if it contains e and another value, then $n\to n-2$ and $k\to k-1$ and of course $k-1>(n-2)/2=n/2-1$
  • if it contains twice e, then $n\to n-1$ and $k\to k-1$, and $k-1\geq (n-1)/2-1/2=n/2-1$, that is $k\geq n/2\geq n/2-1/2$ both for even and off $n$'s.
Function: divideAndConquerPairs
INPUT : list T
OUTPUT : finds a dominant term in T using the previous remark
In [59]:
def divideAndConquerPairs(T):
    if len(T)==0:
        return None
    if len(T)==1:
        return T[0]
    if len(T)%2 == 1 and isDominant(T[-1],T):   # in the odd case, last one could be "tipping" the dominance
        return T[-1]        
    
    halfT = [ t1 for t1,t2 in zip(T[::2],T[1::2]) if t1==t2 ]
    candidate = divideAndConquerPairs(halfT)
    if isDominant(candidate,T):
        return candidate
In [62]:
Tsize = 10
T=np.random.randint(0,3,size=Tsize)
print("Random table T:\n",T,"\n")
print("Number of 0s, 1s, 2s: ",sum(T==0),sum(T==1),sum(T==2))

print("Dominant term by divide and conquer: ",divideAndConquerPairs(T))
Random table T:
 [1 1 0 1 0 2 0 2 1 0] 

Number of 0s, 1s, 2s:  4 4 2
Dominant term by divide and conquer:  None

For this code, say in the even $n$ case, $C(n)=C(n/2)+n$ so that, by the master theorem, $$C(n)=O(n)$$

2. Multiplying large integers – The Karatsuba algorithm

2.1. Naive algorithm

To compute the product of two large integers $A$ and $B$, an idea is to use the binary representation of each as $A=A_1A_2$ and $B=B_1B_2$ on $2\times n/2$ bits and see that $$AB = (A_1B_1)2^n + (A_1B_2+A_2B_1)2^{n/2}+A_2B_2$$ which reduces the calculus to the products $A_1B_1$, $A_1B_2$, $A_2B_1$ and $A_2B_2$, since the products by powers of two are just bit shifts.

The associated cost is then such that $C(n)=4C(n/2)+O(1)$, that is: $$C(n)=O(n^2).$$

Function: naiveProduct
INPUT : integers A, B
OUTPUT : product AB via binary representation
In [63]:
def naiveProduct(A,B):   
    # Removes the indicator of sign and 'b' in xbxxx
    a=bin(A)[2:]
    b=bin(B)[2:]
    
    len_a=len(a)
    len_b=len(b)
    
    if min(len_a,len_b) == 0:
        return '0'
    
    if len_a == 1:
        if a[0] == '1':
            return b
        else:
            return '0'
        
    if len_b == 1:
        if b[0] == '1':
            return a
        else:
            return '0'
    
    center_a = len_a//2
    center_b = len_b//2

    #Be careful to properly fill with zeros
    return bin(int(naiveProduct(int(a[center_a:],2),int(b[center_b:],2)),2)\
            +int(naiveProduct(int(a[:center_a],2),int(b[:center_b],2))+'0'.zfill(len_a-center_a+len_b-center_b),2)\
            +int(naiveProduct(int(a[center_a:],2),int(b[:center_b],2))+'0'.zfill(len_b-center_b),2)\
            +int(naiveProduct(int(a[:center_a],2),int(b[center_b:],2))+'0'.zfill(len_a-center_a),2))[2:]
In [64]:
A,B=37,14
print(naiveProduct(A,B))
print(bin(A*B)[2:])
1000000110
1000000110

2.2. Karatsuba

Here we observe that $$(A_1 + A_2)(B_1 + B_2) = A_1 B_1 + A_2 B_2 + (A_1 B_2 + A_2 B_1 )$$ so that $A_1 B_2 + A_2 B_1$ can be computed as $(A_1 + A_2)(B_1 + B_2) - A_1 B_1 - A_2 B_2$, which reduces the number of products to compute from $4$ to $3$.

The resulting complexity is $$ C(n) = 3C(n/2)+O(1) $$ or equivalently $$ C(n) = O(n^{\log_2(3)})$$


Function: karatsuba
INPUT : integers A, B
OUTPUT : product AB via karatsuba algorithm based on above remark
In [65]:
def karatsuba(A,B):
    a=bin(A)[2:]
    b=bin(B)[2:]

    if len(a)>len(b):
        b=b.zfill(len(a))
    if len(b)>len(a):
        a=a.zfill(len(b))
    
    len_=len(a)

    if len_ == 0:
        return '0'
    
    if len_ == 1:
        if a[0] == '1':
            return b
        else:
            return '0'

    center = len_//2
    
    P1 = naiveProduct(int(a[center:],2),int(b[center:],2))
    P2 = naiveProduct(int(a[:center],2),int(b[:center],2))
    P3 = naiveProduct(int(a[center:],2)+int(a[:center],2),int(b[center:],2)+int(b[:center],2))

    return bin(int(P1,2)\
               +int(P2+'0'.zfill(2*(len_-center)),2)\
               +int(P3+'0'.zfill(len_-center),2)-int(P1+'0'.zfill(len_-center),2)-int(P2+'0'.zfill(len_-center),2))[2:]
In [66]:
A,B=37,14
print(karatsuba(A,B))
print(bin(A*B)[2:])
1000000110
1000000110

EXTRAS

Reminder of the Master Theorem

If $$C(n)=a C\left({\frac {n}{b}}\right)+O(n^{d}) $$ then,

  • if $d < \log_b a$, $$C(n)=O(n^{\log _{b}a})$$
  • if $d = \log_b ⁡a$, $$C(n)=O(n^d \log ⁡n )$$
  • if $d > \log_b ⁡a$, $$C(n)=O(n^d).$$
In [ ]: