|
CHashTable 1
|
A hash table stores data in key-value pairs. This means that each piece of data (value) has an associated key. The main benefit of a hash table is that, if the key is known, we can retrieve its associated value close to O(1) (constant) time. This is much faster than searching through an array to find something which has a time complexity of O(n). However, this advantage comes at the cost of wasted memory.
A good hash table implementation should:
O(1) timeFor simplicity, I've chosen the division method, which works by dividing the key by the table size and taking the remainder. hash = key % table_size.
A collision occurs when two different keys produce the same hash. To handle this, I have chosen the method of using linked lists. Each table slot contains a pointer to a linked list. Colliding entries are appended to the list. When we want to retrieve or delete an element, we traverse through the linked list until we find the key.
Worst case, this produces an O(n) search time but with a good hash function the list should be relatively short. This provides a close to constant search time.
The table size should always be a prime number. This prevents the table size and the keys from having a common factor. This gives an even distribution across the table. The initial size is 53.
When the table starts to fill up, it needs to be resized. To decide when to resize, we calculate the load factor: load_factor = total_number_of_entries / current_table_size. When load factor exceeds 0.75 after an insertion, the table is immediately resized. The next table size is calculated by first doubling the current size, then finding the closest prime that is larger than the new size.
All existing entries must be rehashed into the new table using the new table size.
Because the table size can be increased, the hash table must be dynamically allocated on the heap. When increasing the size, the old table array is freed after all entries are migrated to new table.
Formula: load_factor = total_number_of_entries / current_table_size
Calculated after each insertion.
intintnullptr if it wasn't found.The hash table has a foreach function that iterates over all key-value pairs. The iteration order is not guaranteed (depends on hash function and bucket order). The foreach function accepts a callback function with signature: void callback(int key, int value, void* user_data). The callback is invoked once for each key-value pair in the table.
The hash table has:
The hash table can be saved and loaded into a .txt file.
File format
count is the total number of key-value pairs.The foreach function iterates over all key-value pairs and writes them to the file.
next_prime(count / 0.75) (optimization to avoid resizing during load)