TD5 - Hash tables

1. Identifying polygons

For simplicity here, we replace polygons (so lists of points) by lists of integers. Two "polygons" are here identical if the sets of elements in their lists is identical.

In [1]:
class Polygon:
    def __init__(self,list_of_nodes):
        self.nodes=list_of_nodes
    def compare(self,p):
        return set(self.nodes)==set(p.nodes)

1.1. Simple algorithm

In [15]:
def create_classes(polygones):
    classes = []
    for p in polygones:
        test = False
        for classe in classes:
            if classe[0].compare(p):
                classe.append(p)
                test=True
                break
        if not test:
            classes.append([p])
    return classes
In [16]:
polygones = [Polygon([1,2,3]),Polygon([2,5]),Polygon([2,6]),Polygon([5,2])]
create_classes(polygones)
Out[16]:
[[<__main__.Polygon at 0x7fc4d0397130>],
 [<__main__.Polygon at 0x7fc4d0397040>, <__main__.Polygon at 0x7fc4d0397730>],
 [<__main__.Polygon at 0x7fc4d0397c40>]]

The worst case cost is $O(n^2)$ (in number of comparisons).

1.2. Improved algorithm

The idea is here to create a hash table with key the number of nodes in each polygon and value the list of polygons with this number of nodes. Then create classes by filtering over the hash table.

In [17]:
def create_hash_table(polygones):
    hash_table={}
    for p in polygones:
        key = len(p.nodes)
        if key in hash_table:
            hash_table[key].append(p)
        else:
            hash_table[key] = [p]
    return hash_table

def create_classes_improved(hash_table_of_polygones):
    classes=[]
    for key in hash_table_of_polygones:
        classes += create_classes(hash_table_of_polygones[key])
        
    return classes        
In [18]:
ht=create_hash_table(polygones)
create_classes_improved(ht)
Out[18]:
[[<__main__.Polygon at 0x7fc4d0397130>],
 [<__main__.Polygon at 0x7fc4d0397040>, <__main__.Polygon at 0x7fc4d0397730>],
 [<__main__.Polygon at 0x7fc4d0397c40>]]

2. Removing overlaps

2.1. Hashing segments

Note that overlapping segments must be aligned: a hash-table can thus be designed based on a couple (angle,intersect).

2.2. Compute $c(p)=|\{s\in S~|~s \text{ starts in }p\}|-|\{s\in S~|~s \text{ ends in }p\}|$

We can also use a hash-table collecting all extremum points (its keys) and the corresponding value of $c(p)$.

In [19]:
def compute_store_c(S):
    c = {}
    for b,e in S:
        if not (b in c):
            c[b] = 0
        c[b]+=1
        
        if not(e in c):
            c[e] = 0
        c[e]-=1
    return c
In [20]:
S = [[0,3],[1,2],[2,8],[5,7],[1,3]]
compute_store_c(S)
Out[20]:
{0: 1, 3: -2, 1: 2, 2: 0, 8: -1, 5: 1, 7: -1}
In [8]:
def print_segments(S):
    for b,e in S:
        #print(" "*b,"="*(e-b),"\n") 
        print((b,e),":\t","{}x{}x\n".format(" "*b,"="*(e-b-1)))
In [9]:
print_segments(S)
(0, 3) :	 x==x

(1, 2) :	  xx

(2, 8) :	   x=====x

(5, 7) :	      x=x

(1, 3) :	  x=x

2.3. Discard overlaps

The strategy is the following:

  • we first show that, sorting the $c(p)$'s in the order on the line of the points $p\in P$ made of the extremeties of the segments of $S$, $$C(p)\equiv \sum_{i\in P}^p c(i)$$ corresponds to the number of segments of the type $[p_1,p_2)$ on which $p$ stays. The variable $C(p)$ will be iterated over in the code, and will be named counter.

    Prove it! (Hint: by recursion; be careful that if $s = [p_1,p_2]\in S$, then $p_2\notin [p_1,p_2)$)

  • as a consequence, a segment in the final list of segments (without overlaps) must necessarily start at a $p$ with $C(p)=1$: this is when only one segment starts at $p$.
  • when a point $p'$ immediately follows $p$, the segment must be cut if $C(p')\neq 1$: which is always the case unless the unique segment starting at $p$ stops at $p'$ and another one starts at $p'$.
In [21]:
def discard_overlaps(S):
    Sout = []
    
    # get the values of c(p) for all p extremeties of segments of S
    c = compute_store_c(S)
    
    # take the opportunity to collect the extremeties in 'points' and sort them
    points = [key for key in c]
    points.sort()
    
    # identifier to know whether we have a current beginning for the next segment (so 'None' initially)
    segment_beginning = None

    # counter counts the number of segments of the form '[p1,p2)' on which the current point 'p' lies
    counter = 0    
    
    for p in points:
        counter += c[p]
        if counter != 1:
            # if counter != 1, break the segment, output it, and wait for new segment
            if segment_beginning != None:
                Sout.append([segment_beginning,p])
                segment_beginning = None
        else:
            # initiate a new segment if counter==1
            if segment_beginning==None:
                segment_beginning = p
    return Sout

The algorithm complexity is in the worst case $O(|S|\log(|S|))$ dominated by the first sorting operation. The core of the algorithm has linear complexity in $|S|$.

In [22]:
print("Original segments:",S)
print("\nOutput segments:",discard_overlaps(S))
Original segments: [[0, 3], [1, 2], [2, 8], [5, 7], [1, 3]]

Output segments: [[0, 1], [3, 5], [7, 8]]
In [23]:
print("Original segments:\n")
print_segments(S)

print("\nAfter removing overlaps:\n")
print_segments(discard_overlaps(S))
Original segments:

(0, 3) :	 x==x

(1, 2) :	  xx

(2, 8) :	   x=====x

(5, 7) :	      x=x

(1, 3) :	  x=x


After removing overlaps:

(0, 1) :	 xx

(3, 5) :	    x=x

(7, 8) :	        xx

In [ ]:
 
In [ ]: